[Numpy-discussion] Objected-oriented SIMD API for Numpy
Andrew Friedley
afriedle@indiana....
Wed Oct 21 15:36:38 CDT 2009
sigh; yet another email dropped by the list.
David Warde-Farley wrote:
> On 21-Oct-09, at 9:14 AM, Pauli Virtanen wrote:
>
>> Since these are ufuncs, I suppose the SSE implementations could just be
>> put in a separate module, which is always compiled. Before importing the
>> module, we could simply check from Python side that the CPU supports the
>> necessary instructions. If everything is OK, the accelerated
>> implementations would then just replace the Numpy routines.
>
> Am I mistaken or wasn't that sort of the goal of Andrew Friedley's
> CorePy work this summer?
>
> Looking at his slides again, the speedups are rather impressive. I
> wonder if these could be usefully integrated into numpy itself?
Yes, my GSoC project is closely related, though I didn't do the CPU
detection part; that would be easy to add. Also, I wrote my code
specifically for 64-bit x86.
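A minimal sketch of the Python-side check described in the quoted text, assuming a Linux system that exposes CPU feature flags in /proc/cpuinfo. The function names (parse_cpu_flags, read_cpu_flags, select_impl) are hypothetical and are not part of CorePy or NumPy; other platforms would need a different detection method.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read CPU flags from the given file; empty set if it cannot be read."""
    try:
        with open(path) as f:
            return parse_cpu_flags(f.read())
    except OSError:
        return set()


def select_impl(flags, required, accelerated, fallback):
    """Pick the accelerated routine only if every required flag is present."""
    return accelerated if set(required) <= set(flags) else fallback


if __name__ == "__main__":
    # Hypothetical stand-ins for a SIMD routine and the default routine.
    def add_accelerated(a, b):
        return [x + y for x, y in zip(a, b)]

    def add_default(a, b):
        return [x + y for x, y in zip(a, b)]

    add = select_impl(read_cpu_flags(), {"sse2"}, add_accelerated, add_default)
    print(add.__name__)
```

The fallback is returned whenever the flags cannot be read, so the default routines stay in place on any system where detection fails.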
I didn't focus so much on the transcendental functions, though they
wouldn't be too hard to implement. It would also be possible to
provide implementations with different tradeoffs between accuracy and
performance.
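To illustrate the accuracy/performance tradeoff in general terms, here is a small sketch that approximates exp(x) with a truncated Taylor series. This is only an illustration, not the method used in CorePy; exp_taylor and its terms parameter are hypothetical names. Fewer terms means less work per element but a larger error.

```python
import math


def exp_taylor(x, terms):
    """Approximate exp(x) by summing the first `terms` Taylor series terms."""
    total = 0.0
    term = 1.0  # x**0 / 0!
    for k in range(terms):
        total += term
        term *= x / (k + 1)  # next term: x**(k+1) / (k+1)!
    return total


if __name__ == "__main__":
    x = 1.0
    for terms in (4, 8, 16):
        err = abs(exp_taylor(x, terms) - math.exp(x))
        print(terms, err)
```

A SIMD implementation could expose the same kind of choice, for example a fast variant with a short polynomial and a slower variant with more terms.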
I think the blog link got posted already, but here's relevant info:
http://numcorepy.blogspot.com
http://www.corepy.org/wiki/index.php?title=CoreFunc
I talked about this in my SciPy talk and upcoming paper, as well.
Also, people have only been talking about x86 in this thread -- other
architectures could be supported too, e.g. PPC/Altivec or even Cell SPU
and other accelerators. I actually wrote a quick-and-dirty implementation
of addition and vector normalization ufuncs for Cell SPU recently. The
basic result is that overall performance is very roughly comparable to a
similar-speed x86 chip, but this is a huge win over just running on the
extremely slow Cell PPC cores.
Andrew