[Numpy-discussion] Fast threading solution thoughts
Thu Feb 12 07:19:42 CST 2009
Sturla Molden wrote:
> On 2/12/2009 12:20 PM, David Cournapeau wrote:
>> It does if you have access to the parallel toolbox I mentioned earlier
>> in this thread (again, no experience with it, but I think it is
>> specially popular on clusters; in that case, though, it is not limited
>> to thread-based implementation).
> As has been mentioned, Matlab is a safer language for parallel computing
> as arrays are immutable. There is almost no need for synchronization
> of any sort, except barriers.
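[A barrier is indeed cheap to express once the shared array is read-only. Here is a minimal sketch using Python's stdlib threading module; the read-only flag stands in for a true immutable array, and the worker/partials layout is my own illustration, not anything from NumPy:]

```python
import threading
import numpy as np

# Sketch: with a read-only shared array, worker threads need no locks;
# a barrier is enough to separate the read phase from the reduce phase.
data = np.arange(1_000_000, dtype=np.float64)
data.flags.writeable = False  # freeze the shared buffer

n_workers = 4
barrier = threading.Barrier(n_workers)
partials = [0.0] * n_workers

def worker(i):
    chunk = data[i::n_workers]        # read-only view, safe to share
    partials[i] = float(chunk.sum())  # each thread writes only its own slot
    barrier.wait()                    # all threads finish before the reduction

threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = sum(partials)
```

Since no thread mutates `data`, the only coordination point is the barrier before summing the partial results.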
> Maybe it is time to implement an immutable ndarray subclass?
> With immutable arrays we can also avoid making temporary arrays in
> expressions like y = a*b + c. y just gets an expression and three
> immutable buffers. And then numexpr (or something like it) can take care
> of the rest.
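[A read-only ndarray is already expressible with NumPy's flag machinery. A minimal sketch of the idea; the `frozen` helper is my name for illustration, not an existing NumPy API:]

```python
import numpy as np

def frozen(arr):
    """Return a read-only view of arr (hypothetical helper, not a NumPy API)."""
    view = arr.view()
    view.flags.writeable = False
    return view

a = frozen(np.arange(3.0))       # [0., 1., 2.]
b = frozen(np.arange(3.0) * 2)   # [0., 2., 4.]
c = frozen(np.ones(3))           # [1., 1., 1.]

# Because the three buffers cannot change under us, an expression like
# y = a*b + c could in principle be recorded and fused later (numexpr-style);
# here we just evaluate it eagerly.
y = a * b + c

try:
    a[0] = 99.0                  # mutation is rejected
except ValueError:
    pass                         # writing to a read-only view raises ValueError
```

A full immutable subclass would also need to lock down `resize`, in-place operators, and views, but the flag shows the enforcement mechanism is already in the core.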
> As for Matlab, I have noticed that they are experimenting with CUDA now,
> to use nvidia's processors for hardware-acceleration. As even modest
> GPUs can yield hundreds of gigaflops,
Not even close. The current generation peaks at around 1.2 TFlops single
precision, 280 GFlops double precision for ATI's hardware. The main
problem with those numbers is that the memory on the graphics card
cannot feed data into the GPU fast enough to reach the theoretical
peak. So those hundreds of GFlops are pure marketing :)
So in reality you might get anywhere from 20% to 60% of peak (if you are
lucky) locally, before accounting for transfers from main memory to GPU
memory and so on. Given that recent Intel CPUs give you about 7 to 11
GFlops double precision per core, and that libraries like ATLAS deliver
that performance today without the need to jump through hoops, these
numbers start to look a lot less impressive.
And Nvidia's numbers are lower than ATI's. Nvidia's programming solution
is much more advanced and rounded out compared to ATI's, which is largely
in closed beta. OpenCL is mostly vaporware at this point.
> that is going to be hard to match
> (unless we make an ndarray that uses the GPU). But again, as the
> performance of GPUs comes from massive multithreading, immutability may
> be the key here as well.
I have a K10 system with two Tesla C1060 GPUs to play with and have
thought about adding CUDABlas support to Numpy/Scipy, but it hasn't been
a priority for me. My main interest here is finite field arithmetic by
making FFPack via LinBox use CUDABlas. If anyone wants an account on the
machine to make numpy/scipy optionally use CUDABlas, feel free to ping me
off list and I can set you up.
> Sturla Molden