[Numpy-discussion] Objected-oriented SIMD API for Numpy

Robert Ferrell ferrell@diablotech....
Thu Oct 22 07:40:33 CDT 2009


On Oct 22, 2009, at 1:35 AM, Sturla Molden wrote:

> Robert Kern skrev:
>> No, I think you're right. Using "SIMD" to refer to numpy-like
>> operations is an abuse of the term not supported by any outside
>> community that I am aware of. Everyone else uses "SIMD" to describe
>> hardware instructions, not the application of a single syntactical
>> element of a high level language to a non-trivial data structure
>> containing lots of atomic data elements.
>>
> Then you should pick up a book on parallel computing.
>
> It is common to differentiate between four classes of computers: SISD,
> MISD, SIMD, and MIMD machines.
>
> A SISD system is the classical von Neumann machine. A MISD system is a
> pipelined von Neumann machine, for example the x86 processor.
>
> A SIMD system is one that has one CPU dedicated to control, and a
> large collection of subordinate ALUs for computation. Each ALU has a
> small amount of private memory. The IBM Cell processor is the typical
> SIMD machine.
>
> A special class of SIMD machines are the so-called "vector machines",
> of which the most famous is the Cray C90. The MMX and SSE instructions
> in Intel Pentium processors are an example of vector instructions. Some
> computer scientists regard vector machines as a subtype of MISD systems,
> orthogonal to pipelines, because there are no subordinate ALUs with
> private memory.
>
> MIMD systems have multiple independent CPUs. MIMD systems come in two
> categories: shared-memory processors (SMP) and distributed-memory
> machines (also called cluster computers). The dual- and quad-core x86
> processors are shared-memory MIMD machines.
>
> Many people associate the word SIMD with SSE due to Intel marketing.
> But to the extent that vector machines are MISD orthogonal to pipelined
> von Neumann machines, SSE cannot be called SIMD.
>
> NumPy is a software simulated vector machine, usually executed on MISD
> hardware. To the extent that vector machines (such as SSE and C90) are
> SIMD, we must call NumPy an object-oriented SIMD library.

This is not the terminology I am familiar with.  Calling NumPy an
"object-oriented SIMD library" is very confusing to me.  I worked in
the parallel computing world for a while (back in the dark ages), and
this terminology would have confused everyone I dealt with.  I've also
read many parallel computing books.  In my experience SIMD refers to
hardware, not software.  There is no reason that NumPy can't be
written to run well (get good speed-ups) on an 8-core shared-memory
system.  That would be a MIMD system, and nothing about it conflicts
with the NumPy abstraction.  And, although SIMD can be a subset of
MIMD, there are things that can be done in NumPy that can be
parallelized on MIMD machines but not on SIMD machines (e.g. the NumPy
vector type is flexible enough that it can store a list of tasks, and
the operations on that vector can be parallelized easily on a
shared-memory MIMD machine - task parallelism - but not on a SIMD
machine).
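
A minimal sketch of the task-parallel case mentioned above, assuming
the "list of tasks" is held in a NumPy array of dtype=object and run
with Python's standard thread pool.  The names jobs, tasks and run_all
are hypothetical and chosen only for illustration; nothing in NumPy
itself schedules work this way.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

import numpy as np

# Three unrelated tasks (hypothetical examples); each is a zero-argument callable.
jobs = [partial(pow, 3, 2), partial(abs, -4), partial(len, "numpy")]

# An object array can hold arbitrary Python objects, including callables.
tasks = np.empty(len(jobs), dtype=object)
for i, job in enumerate(jobs):
    tasks[i] = job


def run_all(tasks, workers=4):
    """Run every task in the array on a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda task: task(), tasks))


results = run_all(tasks)
```

Each element does different work, so there is no single instruction to
apply across the data, which is why this maps onto independent cores
rather than onto SIMD hardware.  In CPython, threads only give real
speed-ups when the tasks release the GIL; a process pool would be the
usual substitute otherwise.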

If we say that "NumPy is a software simulated vector machine" or an
"object-oriented SIMD library" we are pigeonholing NumPy in a way that
is too limiting and isn't accurate.  As a user, it feels to me that
NumPy is built around various algebraic abstractions, many of which
map well onto vector machine operations.  That means that many of the
operations are amenable to efficient implementation on SIMD hardware.
But, IMO, one of the nice features of NumPy is that it is built around
high-level operations, and I would hate to see the project go down a
path which insists that everything in NumPy be efficient on all SIMD
hardware.
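
To illustrate what "high-level operation" means here, a small sketch
comparing an explicit element loop with the equivalent whole-array
expression.  The function names axpy_loop and axpy_vectorized are
hypothetical; whether the array form actually uses SIMD instructions
depends on the NumPy build and the hardware.

```python
import numpy as np


def axpy_loop(a, x, y):
    """Compute a*x + y one element at a time."""
    out = np.empty_like(y)
    for i in range(len(y)):
        out[i] = a * x[i] + y[i]
    return out


def axpy_vectorized(a, x, y):
    """Same computation written as a single whole-array expression."""
    return a * x + y


x = np.arange(5, dtype=np.float64)
y = np.ones(5, dtype=np.float64)
```

The array expression states what to compute and leaves the how to the
implementation, which is free to use a plain loop, SIMD instructions,
or several cores.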

Of course, I would also love to see implementations which take as much  
advantage of available HW as possible (e.g. exploit SIMD HW if  
available).

That's my $0.02, worth only a couple cents less than that.

-robert


