[Numpy-discussion] Object-oriented SIMD API for Numpy
Wed Oct 21 03:05:24 CDT 2009
Mathieu Blondel wrote:
> I saw the video of Peter Norvig at the last SciPy conference, in which
> he suggested merging NumPy into Cython. The SIMD API would be an
> argument in favor of this too, because of the possible interactions
> between such a SIMD API and an array API.
Hm, I don't remember this; I guess I would have to look at the video.
Do you know at which point in the presentation he discussed SIMD?
> My original idea was to write the code in C with Intel/AltiVec/NEON
> intrinsics and have this code bound so that it can be called from
> Python. So the SIMD code would already be compiled, ready to be called
> from Python. Like you said, there's a risk that the overhead of
> calling Python is bigger than the benefit of using SIMD instructions.
> If it's worth trying out, an experiment can be made with Vector4f to
> see if it's even worth continuing with other types.
I am quite confident that the overhead will be far too significant for
this approach to be useful. If you have two Python objects, using + on
them will induce at least one function call, and most likely several
function calls at the Python level. Python function calls are painfully
slow (several thousand cycles per call in the most optimistic case).
Python overhead is several orders of magnitude bigger than what you can
gain by using SIMD instead of straightforward C. The only way I can see
to make this work is to generate SIMD code from Python (which would be
a poor man's replacement for a JIT, in a way); there was a presentation
following this direction at the SciPy 09 conference.
> I recently used SIMD instructions for a project and I realized that
> they cannot be activated in a standard Debian package, because the
> package has to remain general-purpose. So people who want to benefit
> from the speedup have to compile my project from source...
Yes, that's unacceptable IMHO. The real solution is to include all the
code at build time, detect at *runtime* which ISA is supported, and
select the functions accordingly. The problem is that loading shared
code at runtime in a cross-platform way is complicated: Python already
does it, but unfortunately does not provide a C API for it AFAIK, so we
would have to re-implement it in Python.