[Numpy-discussion] numpy ufuncs and COREPY - any info?

Andrew Friedley afriedle@indiana....
Thu May 28 07:34:11 CDT 2009

David Cournapeau wrote:
> Francesc Alted wrote: 
>> No, that seems good enough.  But maybe you can present results in cycles/item.  
>> This is a relatively common unit and has the advantage that it does not depend 
>> on the frequency of your cores.

Sure, cycles is fine, but I'll argue that in this case the number still 
depends on the frequency of the cores, particularly as it relates to the 
frequency of the memory bus/controllers.  A processor with a higher 
clock rate and a higher multiplier may show lower performance when 
measured in cycles, because memory bandwidth has not necessarily 
increased along with the CPU clock rate.  Plus, between, say, a Xeon and 
an Opteron you will see different SSE performance characteristics.  So 
really, no single number/unit is sufficient without also describing the 
system it was obtained on :)
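For concreteness, converting a wall-clock timing into cycles/item is just 
elapsed time x clock rate / n.  A minimal sketch (the 2 GHz clock rate here 
is an assumed value; substitute your own CPU's frequency):

```python
import time
import numpy as np

CPU_HZ = 2.0e9  # assumed 2 GHz clock; substitute your own CPU's frequency

x = np.random.rand(3_000_000)

start = time.perf_counter()
y = np.cos(x)
elapsed = time.perf_counter() - start

# cycles/item = elapsed seconds * cycles/second / number of elements
cycles_per_item = elapsed * CPU_HZ / x.size
print("%.1f cycles/item" % cycles_per_item)
```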

> (it seems that I do not receive all emails - I never get the emails from
> Andrew ?)

I seem to have issues with my emails just disappearing; sometimes they 
never appear on the list and I have to re-send them.

> Concerning the timing: I think generally, you should report the minimum,
> not the average. The numbers for numpy are strange: 3s to compute 3e6
> cos on a 2 GHz Core Duo (~2000 cycles/item) is very slow. In that sense,
> taking 20 cycles/item for your optimized version is much more
> believable, though :)

I can do minimum.  My motivation for using the average was to show the 
common-case performance an application might see.  If an application 
executes the ufunc many times, its performance will tend towards the 
average.
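Both numbers fall out of the same runs anyway; a sketch using 
timeit.repeat (array size and repeat count are arbitrary choices here):

```python
import timeit
import numpy as np

x = np.random.rand(1_000_000)

# Five independent timings of a single np.cos call each
times = timeit.repeat(lambda: np.cos(x), number=1, repeat=5)

best = min(times)                  # least-noise estimate
average = sum(times) / len(times)  # common-case estimate
print(best, average)
```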

> I know the usual libm functions are not super fast, especially if high
> accuracy is not needed. Music software and games usually get away with
> approximations which are quite fast (e.g. evaluating cos and sin at
> the same time), but those are generally unacceptable for scientific
> usage. I think it is critical to always check the result of your
> implementation, because getting something fast but wrong can waste a lot
> of your time :) One thing which may be hard to do is correct nan/inf
> handling. I don't know how SIMD extensions handle this.

I was waiting for someone to bring this up :)  I used an implementation 
that I'm now thinking is not accurate enough for scientific use.  But 
the question is, what is a concrete measure for determining whether a 
given cosine (or other function) implementation is accurate enough?  I 
guess we have precedent in the form of libm's implementation/accuracy 
tradeoffs, but is that precedent correct?
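One concrete measure is the maximum error in ULPs (units in the last 
place) against a higher-precision reference.  As a hedged sketch, here 
numpy's own single-precision cos stands in for the implementation under 
test:

```python
import numpy as np

# Evaluate both at the *same* float32-rounded inputs, so that argument
# rounding is not charged against the implementation under test.
xf = np.linspace(-np.pi, np.pi, 100_001).astype(np.float32)

approx = np.cos(xf).astype(np.float64)  # implementation under test (float32 cos)
exact = np.cos(xf.astype(np.float64))   # double-precision reference

abs_err = np.max(np.abs(approx - exact))

# Error measured in units of the reference's float32 spacing, i.e. ULPs
ulps = np.abs(approx - exact) / np.spacing(np.abs(exact).astype(np.float32)).astype(np.float64)
max_ulp = np.max(ulps)
print(abs_err, max_ulp)
```

Whether "accurate enough" means 0.5 ULP, a few ULPs, or something looser 
is exactly the policy question raised above.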

Really answering that question, and coming up with the best possible 
implementations that meet the requirements, is probably a GSoC project 
on its own.
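On the nan/inf point, a quick special-value spot check against the 
expected IEEE behavior catches the most common SIMD pitfalls (cos(nan) is 
nan, cos(+-inf) is nan with the invalid flag raised, cos(+-0) is 1):

```python
import numpy as np

special = np.array([np.nan, np.inf, -np.inf, 0.0, -0.0])

with np.errstate(invalid="ignore"):  # cos(+-inf) legitimately raises the invalid flag
    out = np.cos(special)

print(out)  # [nan, nan, nan, 1.0, 1.0]
```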
