[Numpy-discussion] Fast threading solution thoughts

Michael Abshoff michael.abshoff@googlemail....
Thu Feb 12 07:19:42 CST 2009

Sturla Molden wrote:
> On 2/12/2009 12:20 PM, David Cournapeau wrote:


>> It does if you have access to the parallel toolbox I mentioned earlier
>> in this thread (again, no experience with it, but I think it is
>> specially popular on clusters; in that case, though, it is not limited
>> to thread-based implementation).
> As has been mentioned, Matlab is a safer language for parallel computing
> as arrays are immutable. There is almost no need for synchronization of
> any sort, except barriers.
> Maybe it is time to implement an immutable ndarray subclass?
> With immutable arrays we can also avoid making temporary arrays in 
> expressions like y = a*b + c. y just gets an expression and three 
> immutable buffers. And then numexpr (or something like it) can take care 
> of the rest.
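For what it's worth, here is a minimal sketch of what such an immutable
ndarray subclass could look like. The class name and the copy-on-construction
policy are my own assumptions, not an agreed design; it just freezes the
underlying buffer via the writeable flag:

```python
import numpy as np

class ImmutableArray(np.ndarray):
    """Hypothetical sketch: an ndarray subclass whose buffer is read-only."""
    def __new__(cls, input_array):
        # Copy on construction so no one else holds a writable
        # reference to the same buffer.
        obj = np.array(input_array).view(cls)
        obj.flags.writeable = False
        return obj

a = ImmutableArray([1.0, 2.0, 3.0])
b = ImmutableArray([4.0, 5.0, 6.0])
y = a * b          # freshly allocated result; the inputs stay frozen

try:
    a[0] = 99.0    # any in-place write is rejected
except ValueError:
    print("write rejected")
```

Results of expressions are still ordinary writable allocations; only the
inputs are protected, which is the property that matters for sharing arrays
across threads without locks.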
> As for Matlab, I have noticed that they are experimenting with CUDA now, 
> to use nvidia's processors for hardware-acceleration. As even modest 
> GPUs can yield hundreds of gigaflops, 

Not even close. The current generation peaks at around 1.2 TFlops single 
precision, 280 GFlops double precision for ATI's hardware. The main 
problem with those numbers is that the memory on the graphics card 
cannot feed the data fast enough into the GPU to achieve theoretical 
peak. So those hundreds of GFlops are pure marketing :)

So in reality you might get anywhere from 20% to 60% of peak (if you are 
lucky) locally, before accounting for transfers from main memory to GPU 
memory and so on. Given that recent Intel CPUs give you about 7 to 11 
GFlops double precision per core, and libraries like ATLAS give you that 
performance today without the need to jump through hoops, these numbers 
start to look a lot less impressive.
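It is easy to check what your own BLAS delivers. This is a rough,
illustrative benchmark (timings obviously vary with the machine and with
whichever BLAS numpy happens to link against - ATLAS or otherwise):

```python
import time
import numpy as np

# Rough benchmark of double-precision matrix multiply through
# whatever BLAS this numpy build is linked against.
n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = np.dot(a, b)
elapsed = time.perf_counter() - t0

# dgemm performs roughly 2*n**3 floating-point operations
gflops = 2.0 * n**3 / elapsed / 1e9
print("%.1f GFlops double precision" % gflops)
```

No data ever leaves main memory here, which is exactly the transfer cost
the GPU route has to pay on top of its kernel time.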

And Nvidia's numbers are lower than ATI's. Nvidia's programming solution 
is much more advanced and rounded out compared to ATI's, which is largely 
in closed beta. OpenCL is mostly vaporware at this point.

> that is going to be hard to match
> (unless we make an ndarray that uses the GPU). But again, as the 
> performance of GPUs comes from massive multithreading, immutability may 
> be the key here as well.

I have a K10 system with two Tesla C1060 GPUs to play with and have 
thought about adding CUDABlas support to Numpy/Scipy, but it hasn't been 
a priority for me. My main interest here is finite field arithmetic, by 
making FFPack (via LinBox) use CUDABlas. If anyone wants an account to 
make numpy/scipy optionally use CUDABlas, feel free to ping me off list 
and I can set you up.

> Sturla Molden



> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
