[Numpy-discussion] numpy ufuncs and COREPY - any info?

David Cournapeau david@ar.media.kyoto-u.ac...
Mon May 25 20:11:56 CDT 2009

Charles R Harris wrote:
> On Mon, May 25, 2009 at 4:59 AM, Andrew Friedley <afriedle@indiana.edu> wrote:
>     For some reason the list seems to occasionally drop my messages...
>     Francesc Alted wrote:
>     > On Friday 22 May 2009 13:52:46 Andrew Friedley wrote:
>     >> I'm the student doing the project.  I have a blog here, which contains
>     >> some initial performance numbers for a couple of test ufuncs I did:
>     >>
>     >> http://numcorepy.blogspot.com
>     >> Another alternative we've talked about, and that I may (more and more
>     >> likely) look into, is composing multiple operations together into a
>     >> single ufunc.  Again, the main idea is that memory accesses can be
>     >> reduced or eliminated.
>     >
>     > IMHO, composing multiple operations together is the most promising
>     > avenue for leveraging current multicore systems.
>     Agreed; our concern when considering this for the project was to keep
>     the scope reasonable so I can complete it in the GSoC timeframe.  If I
>     have time I'll definitely be looking into this over the summer; if not,
>     later.
>     > Another interesting approach is to implement costly operations (from
>     > the point of view of CPU resources), namely transcendental functions
>     > like sin, cos or tan, but also others like sqrt or pow, in a parallel
>     > way.  If, besides that, you can combine this with vectorized versions
>     > of them (by using the widespread SSE2 instruction set, see [1] for an
>     > example), then you would be able to achieve really good results for
>     > sure (at least Intel did with its VML library ;)
>     >
>     > [1] http://gruntthepeon.free.fr/ssemath/
>     I've seen that page before.  Using another source [1] I came up with a
>     quick/dirty cos ufunc.  Performance is crazy good compared to NumPy
>     (100x); see the latest post on my blog for a little more info.  I'll
>     look at the source myself when I get time again, but is NumPy using a
>     Python-based cos function, a C implementation, or something else?  As I
>     wrote in my blog, the performance gain is almost too good to believe.
> Numpy uses the C library version. If the long double and float versions
> aren't available, the double version is used with type conversions, but
> that shouldn't give a factor of 100x. Something else is going on.
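
The idea of composing several operations into a single ufunc, raised earlier in the thread, can be illustrated at the Python level. This is a minimal sketch under stated assumptions, not Andrew's CorePy code: the helper name fused_blocked and the 4096-element block size are hypothetical choices, and the inputs are assumed to be equal-length 1-D float arrays. It evaluates a*b + c in small blocks so the intermediate product lives in a small scratch buffer instead of a full-size temporary that is written to memory and read back.

```python
import numpy as np


def unfused(a, b, c):
    # Plain NumPy: a * b materializes a full-size temporary array, which is
    # written to memory and then read back when c is added.
    return a * b + c


def fused_blocked(a, b, c, block=4096):
    # Hypothetical helper (assumes equal-length 1-D arrays): compute a * b + c
    # one block at a time so the intermediate product only ever occupies a
    # small, cache-sized scratch buffer.
    n = a.shape[0]
    out = np.empty_like(a)
    scratch = np.empty(block, dtype=np.result_type(a, b))
    for start in range(0, n, block):
        stop = min(start + block, n)
        tmp = scratch[: stop - start]
        np.multiply(a[start:stop], b[start:stop], out=tmp)
        np.add(tmp, c[start:stop], out=out[start:stop])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a, b, c = (rng.random(1_000_003) for _ in range(3))
    # Same element-wise operations in the same order, so the results should agree.
    print(np.allclose(unfused(a, b, c), fused_blocked(a, b, c)))
```

A real composed ufunc would do this inside one compiled loop; here the per-block Python overhead means the blocked version is not necessarily faster, it only shows where the memory traffic is saved.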

I think something is wrong with the measurement method. On my machine,
computing the cos of an array of doubles takes roughly 400 cycles/item
for arrays of reasonable size (> 1e3 items). A 100x speedup would mean
about 4 cycles/item for cos, which would be very impressive :)
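
The cycles-per-item argument above can be checked with a small timing harness. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the 2.0e9 Hz clock rate is an assumed nominal frequency (frequency scaling makes any cycle count approximate), and the harness does not reproduce the benchmark from Andrew's blog. Checking a candidate against np.cos before trusting its timing guards against a "fast" version that skips elements or returns wrong values.

```python
import time

import numpy as np


def seconds_per_item(func, x, repeat=5):
    # Best-of-N wall-clock time for one call of func(x), divided by the
    # number of elements; taking the minimum reduces timing noise.
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        func(x)
        best = min(best, time.perf_counter() - t0)
    return best / x.size


def cycles_per_item(sec_per_item, clock_hz):
    # Convert seconds to CPU cycles using an *assumed* nominal clock rate.
    return sec_per_item * clock_hz


def implied_cycles(baseline_cycles, speedup):
    # E.g. a 100x speedup over a ~400 cycles/item baseline implies ~4 cycles/item.
    return baseline_cycles / speedup


def agrees_with_numpy(candidate, x):
    # A timing only means something if the candidate computes the same values.
    return np.allclose(candidate(x), np.cos(x))


if __name__ == "__main__":
    x = np.linspace(0.0, 100.0, 1_000_000)  # well above the > 1e3-item range
    assumed_clock_hz = 2.0e9                 # assumption: 2 GHz CPU
    spi = seconds_per_item(np.cos, x)
    print(f"np.cos: ~{cycles_per_item(spi, assumed_clock_hz):.0f} cycles/item")
    print(f"a 100x speedup would be ~{implied_cycles(400, 100):.0f} cycles/item")
```

To evaluate a candidate (say, a hypothetical fast_cos), call agrees_with_numpy(fast_cos, x) first and only then compare seconds_per_item(fast_cos, x) against the np.cos figure.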

