[Numpy-discussion] numpy ufuncs and COREPY - any info?
Thu May 28 07:34:11 CDT 2009
David Cournapeau wrote:
> Francesc Alted wrote:
>> No, that seems good enough. But maybe you can present results in cycles/item.
>> This is a relatively common unit and has the advantage that it does not depend
>> on the frequency of your cores.
Sure, cycles is fine, but I'll argue that in this case the number still
does depend on the frequency of the cores, particularly as it relates to
the frequency of the memory bus/controllers. A processor with a higher
clock rate and higher multiplier may show lower performance when
measuring in cycles because the memory bandwidth has not necessarily
increased, only the CPU clock rate. Plus, between, say, a Xeon and an
Opteron you will have different SSE performance characteristics. So really,
any single number/unit is not sufficient without also describing the system
it was obtained on :)
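To make the unit concrete, here is a minimal sketch of how one might convert a wall-clock timing into cycles/item. The clock rate is an assumption you would substitute for your own machine; this does not account for frequency scaling or memory-bus effects, which is exactly the caveat above.

```python
import time
import numpy as np

# Assumed CPU clock rate -- substitute your own machine's frequency.
CLOCK_HZ = 2.0e9

x = np.random.rand(3_000_000)

start = time.perf_counter()
y = np.cos(x)
elapsed = time.perf_counter() - start

# cycles/item = seconds * (cycles/second) / items
cycles_per_item = elapsed * CLOCK_HZ / x.size
print(f"{cycles_per_item:.1f} cycles/item at {CLOCK_HZ / 1e9:.1f} GHz")
```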
> (it seems that I do not receive all emails - I never get the emails from
> Andrew?)
I seem to have issues with my emails just disappearing; sometimes they
never appear on the list and I have to re-send them.
> Concerning the timing: I think generally, you should report the minimum,
> not the average. The numbers for numpy are strange: 3s to compute 3e6
> cos on a 2Ghz core duo (~2000 cycles/item) is very slow. In that sense,
> taking 20 cycles/item for your optimized version is much more
> believable, though :)
I can do minimum. My motivation for the average was to show the
common-case performance an application might see. If that application
executes the ufunc many times, the performance will tend towards the
average.
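Both statistics are easy to report side by side. A sketch with `timeit.repeat`: the minimum approximates the best case with the least OS/scheduler interference (the usual benchmarking recommendation), while the mean is closer to what a long-running application would observe.

```python
import timeit
import numpy as np

x = np.random.rand(3_000_000)

# Run the ufunc several times and keep every timing.
times = timeit.repeat(lambda: np.cos(x), number=1, repeat=5)

print(f"min:  {min(times):.4f} s")                # least interference
print(f"mean: {sum(times) / len(times):.4f} s")   # common-case estimate
```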
> I know the usual libm functions are not super fast, especially if high
> accuracy is not needed. Music software and games usually get away with
> approximations which are quite fast (e.g. using cos+sin evaluation at
> the same time), but those are generally unacceptable for scientific
> usage. I think it is critical to always check the result of your
> implementation, because getting something fast but wrong can waste a lot
> of your time :) One thing which may be hard to do is correct nan/inf
> handling. I don't know how SIMD extensions handle this.
I was waiting for someone to bring this up :) I used an implementation
that I'm now thinking is not accurate enough for scientific use. But
the question is, what is a concrete measure for determining whether some
cosine (or other function) implementation is accurate enough? I guess
we have precedent in the form of libm's implementation/accuracy
tradeoffs, but is that precedent correct?
Really answering that question, and coming up with the best possible
implementations that meet the requirements, is probably a GSoC project
on its own.
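One concrete starting point for such a measure is to compare a fast implementation against libm (via numpy's cos) over a dense grid and report the maximum error, and to check the special values explicitly. The `approx_cos` below is only a toy stand-in (range reduction plus a truncated Taylor series, not any implementation discussed in this thread); a real fast version would use a minimax polynomial.

```python
import numpy as np

def approx_cos(x):
    # Toy stand-in for a fast approximation: reduce to [-pi, pi], then a
    # truncated Taylor series.  Accuracy degrades near the interval ends.
    r = np.remainder(x + np.pi, 2 * np.pi) - np.pi
    r2 = r * r
    return 1 - r2 / 2 + r2**2 / 24 - r2**3 / 720 + r2**4 / 40320

# Maximum absolute error over a dense grid, compared against libm.
x = np.linspace(-10, 10, 100_001)
err = np.abs(approx_cos(x) - np.cos(x))
print(f"max abs error vs np.cos: {err.max():.2e}")

# Special values: libm returns nan for cos(nan) and cos(inf); a
# range-reduction-based approximation may or may not match this.
with np.errstate(invalid="ignore"):
    print(approx_cos(np.array([np.nan, np.inf])))
    print(np.cos(np.array([np.nan, np.inf])))
```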