[Numpy-discussion] Psyco MA?
tim.hochberg at ieee.org
Tue Feb 11 13:05:05 CST 2003
Perry Greenfield wrote:
>Tim Hochberg writes:
>>           Overhead (c)  Overhead (nc)  TimePerElement (c)  TimePerElement (nc)
>> NumPy            10 us          10 us               85 ps                95 ps
>> NumArray        200 us         530 us               45 ps               135 ps
>> Psymeric         50 us          65 us               80 ps                80 ps
>> (c = contiguous operands, nc = noncontiguous operands)
>>The times shown above are for Float64s and are pretty approximate, and
>>they happen to be a particularly favorable array shape for Psymeric. I
>>have seen Psymeric as much as 50% slower than NumPy for large arrays of
>>The overhead for NumArray is surprisingly large. After doing this
>>experiment I'm certainly more sympathetic to Konrad wanting less
>>overhead for NumArray before he adopts it.
>Wow! Do you really mean picoseconds? I never suspected that
>either Numeric or numarray were that fast. ;-)
My bad, I meant ns. What's a little factor of 10^3 among friends?
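One way to read the table is as a simple linear cost model, total time = overhead + n * time per element. That reading is an interpretation, not something stated in the post. A minimal Python sketch using the quoted approximate contiguous figures, with the per-element column taken as ns, shows why overhead dominates for tiny arrays. `CONTIGUOUS_NS`, `estimated_ns` and `crossover` are hypothetical names introduced here for illustration.

```python
# Hypothetical linear cost model: total = overhead + n * per_element.
# Values are the approximate contiguous ("c") figures from the table,
# converted to nanoseconds (per-element column read as ns, not ps).
CONTIGUOUS_NS = {
    "NumPy": (10_000, 85),      # (overhead, time per element)
    "NumArray": (200_000, 45),
    "Psymeric": (50_000, 80),
}


def estimated_ns(lib, n):
    """Estimated time in ns for one binary operation on n elements."""
    overhead, per_element = CONTIGUOUS_NS[lib]
    return overhead + n * per_element


def crossover(low_overhead, high_overhead):
    """Element count at which the higher-overhead library catches up.

    Only defined when the higher-overhead library is also faster per element.
    """
    ov_lo, pe_lo = CONTIGUOUS_NS[low_overhead]
    ov_hi, pe_hi = CONTIGUOUS_NS[high_overhead]
    if pe_hi >= pe_lo:
        raise ValueError("%s never catches up with %s" % (high_overhead, low_overhead))
    return (ov_hi - ov_lo) / (pe_lo - pe_hi)


print(estimated_ns("NumPy", 9), estimated_ns("NumArray", 9))  # 10765 200405
print(crossover("NumPy", "NumArray"))  # 4750.0
```

Under this model a 3x3 operation is almost pure overhead, and NumArray's lower per-element cost would only pay off above roughly 4,750 elements. The figures are approximate, so the crossover is only a rough indication.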
>Anyway, this issue is timely [Err...]. As it turns out we started
>looking at ways of improving small array performance a couple weeks
>ago and are coming closer to trying out an approach that should
>reduce the overhead significantly.
>But I have some questions about your benchmarks. Could you show me
>the code that is used to generate the above timings? In particular
>I'm interested in the kinds of arrays that are being operated on.
>It turns out that the numarray overhead depends on more than
>just contiguity and it isn't obvious to me which case you are testing.
I'll send you psymeric, including all the tests, by private email to
avoid cluttering up the list. (Don't worry, it's not huge -- only 750
lines of Python at this point). You can let me know if you find any
horrible issues with it.
>For example, Todd's benchmarks indicate that numarray's overhead is
>about a factor of 5 larger than numpy when the input arrays are
>contiguous and of the same type. On the other hand, if the array
>is not contiguous or requires a type conversion, the overhead is
>much larger. (Also, these cases require blocking loops over large
>arrays; we have done nothing yet to optimize the block size or
>the speed of that loop.) If you are doing the benchmark on
>contiguous, same type arrays, I'd like to get a copy of the benchmark
>program to try to see where the disagreement arises.
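The blocking loop Perry mentions for noncontiguous or type-converting cases can be illustrated in modern NumPy. This is only a sketch of the general technique, not numarray's implementation. The helper `blocked_add` and its `block_size` default of 4096 are hypothetical choices. Each block of the inputs is converted to the output type and added into the result, so only block-sized temporaries are created instead of full-size converted copies.

```python
import numpy as np


def blocked_add(a, b, out_dtype=np.float64, block_size=4096):
    """Compute a + b in out_dtype, converting the inputs one block at a time.

    a and b are 1-D arrays of equal length; they may be noncontiguous or of
    a different dtype. Only block_size-element temporaries are allocated for
    the type conversion.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("expected two 1-D arrays of equal length")
    out = np.empty(a.shape, dtype=out_dtype)
    for start in range(0, a.size, block_size):
        stop = min(start + block_size, a.size)
        # Convert just this block to the output type...
        a_blk = a[start:stop].astype(out_dtype)
        b_blk = b[start:stop].astype(out_dtype)
        # ...then write the sum directly into the matching slice of out.
        np.add(a_blk, b_blk, out=out[start:stop])
    return out
```

The block size trades per-block loop overhead against temporary memory; tuning it is exactly the kind of optimization Perry says had not been done yet.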
Basically, I'm operating on two random, contiguous, 3x3, Float64
arrays. In the noncontiguous case the arrays are indexed using [::2,::2]
and [1::2,::2], so these arrays are 2x2 and 1x2. Hmm, that wasn't
intentional; I'm measuring axis stretching as well. However, using
[::2,::2] for both arrays doesn't change things a whole lot. The core
timing part looks like this:
    from time import perf_counter as clock  # time.clock was removed in Python 3.8

    t0 = clock()
    if op == '+': c = a + b
    elif op == '-': c = a - b
    elif op == '*': c = a * b
    elif op == '/': c = a / b
    elif op == '==': c = a == b
    else: raise ValueError("unknown op %s" % op)
    t1 = clock()
This is done N times; the first M values are thrown away and the
remaining values are averaged. Currently N is 3 and M is 1, so not a lot
of averaging is taking place.
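The measurement procedure described above (run N times, discard the first M timings, average the rest) can be sketched in modern Python and NumPy. This is a reconstruction under assumptions, not the psymeric test code: `time_op`, `OPS`, `n_runs` and `n_discard` are hypothetical names, `time.perf_counter` stands in for the old `time.clock`, and a dictionary of `operator` functions replaces the if/elif chain. The sketch also builds the 3x3 Float64 operands and the two strided views, so the 2x2 and 1x2 shapes and their broadcasting can be checked directly.

```python
import operator
from time import perf_counter

import numpy as np

# Map the benchmark's operator symbols to functions (editor's choice:
# dict dispatch instead of an if/elif chain).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "/": operator.truediv, "==": operator.eq}


def time_op(a, b, op, n_runs=3, n_discard=1):
    """Time `a op b` n_runs times, drop the first n_discard, average the rest."""
    if n_discard >= n_runs:
        raise ValueError("need at least one timing left after discarding")
    try:
        func = OPS[op]
    except KeyError:
        raise ValueError("unknown op %s" % op) from None
    times = []
    for _ in range(n_runs):
        t0 = perf_counter()
        func(a, b)
        t1 = perf_counter()
        times.append(t1 - t0)
    kept = times[n_discard:]
    return sum(kept) / len(kept)


rng = np.random.default_rng(0)
a = rng.random((3, 3))   # contiguous 3x3 Float64
b = rng.random((3, 3))
a_nc = a[::2, ::2]       # noncontiguous view, shape (2, 2)
b_nc = b[1::2, ::2]      # noncontiguous view, shape (1, 2); broadcasts against (2, 2)
```

With only three runs and one discarded, a single timing dominates the average; a larger `n_runs`, or Python's `timeit` module, would give steadier numbers for operations this short.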
>The very preliminary indications are that we should be able to make
>numarray overheads approximately 3 times higher than Numeric's for all ufunc cases.
>That's still slower, but not by a factor of 20 as shown above. How
>much work it would take to reduce it further is unclear (the main
>bottleneck at that point appears to be how long it takes to create
>new output arrays).
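Perry's observation that creating new output arrays may become the main cost can be probed in modern NumPy by comparing a plain ufunc call with one that writes into a preallocated array through the `out=` argument. This sketch is only a measurement aid under that assumption and says nothing about numarray's internals; `add_fresh`, `add_into` and `compare` are hypothetical helper names.

```python
import timeit

import numpy as np


def add_fresh(a, b):
    """a + b with a newly allocated result array on every call."""
    return np.add(a, b)


def add_into(a, b, out):
    """a + b written into a caller-supplied array; no result allocation."""
    return np.add(a, b, out=out)


def compare(number=100_000):
    """Time both variants on 3x3 Float64 operands; returns seconds per call."""
    a = np.random.random((3, 3))
    b = np.random.random((3, 3))
    out = np.empty_like(a)
    fresh = timeit.timeit(lambda: add_fresh(a, b), number=number) / number
    reused = timeit.timeit(lambda: add_into(a, b, out), number=number) / number
    return fresh, reused
```

Calling `compare()` reports the per-call time of each variant on the installed NumPy; the gap between them gives a rough sense of how much of a tiny-array call goes into allocating the result. The actual numbers depend on the machine and NumPy version.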
That's good. I think it's important to get people like Konrad on board
and that will require dropping the overhead.
>We are still mainly in the analysis and design phase of how to
>improve performance for small arrays and block looping. We believe
>that this first step will not require moving very much of the
>existing Python code into C (but some will be). Hopefully we
>will have some working code in a couple weeks.
I hope it goes well.