[Numpy-discussion] Fwd: GPU Numpy
Erik Tollerud
erik.tollerud@gmail....
Thu Aug 6 13:54:52 CDT 2009
Note that this is from a "user" perspective, as I have no particular plan of
developing the details of this implementation, but I've thought for a long
time that GPU support could be great for numpy (I would also vote for OpenCL
support over CUDA, although conceptually they seem quite similar)...
But what exactly would the large-scale plan be? One of the advantages of
GPGPUs is that they are particularly suited to rather complicated
parallelizable algorithms, while the numpy-level basic operations are just the
simple arithmetic operations. So while I'd love to see it working, it's
unclear to me exactly how much is gained at the core numpy level, especially
given that it's limited to single precision on most GPUs.
Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll
admit - especially if it's in the form of a drop-in replacement for the
numpy or scipy versions.
By the way, I noticed no one mentioned the GPUArray class in pycuda (and it
looks like there's something similar in pyopencl). It seems like that already
does a fair amount of the work...
http://documen.tician.de/pycuda/array.html#pycuda.gpuarray.GPUArray
On Thu, Aug 6, 2009 at 10:41 AM, James Bergstra
<bergstrj@iro.umontreal.ca> wrote:
> On Thu, Aug 6, 2009 at 1:19 PM, Charles R Harris
> <charlesr.harris@gmail.com> wrote:
> > It almost looks like you are reimplementing numpy, in C++ no less. Is there
> > any reason why you aren't working with a numpy branch and just adding
> > ufuncs?
>
> I don't know how that would work. The ufuncs need a datatype to work
> with, and AFAIK, it would break everything if a numpy ndarray pointed
> to memory on the GPU. Could you explain what you mean a little more?
>
> > I'm also curious if you have thoughts about how to use the GPU
> > pipelines in parallel.
>
> Current thinking for ufunc-type computations:
> 1) divide up the tensors into subtensors whose dimensions have
> power-of-two sizes (this permits a fast integer -> ndarray coordinate
> computation using bit shifting),
> 2) launch a kernel for each subtensor in its own stream to use
> parallel pipelines,
> 3) sync and return.
>
> This is a pain to do without automatic code generation though.
> Currently we're using macros, but that's not pretty.
> C++ has templates, which we don't really use yet, but are planning to
> use. These have some power to generate code.
> The 'theano' project (www.pylearn.org/theano) for which cuda-ndarray
> was created has a more powerful code generation mechanism similar to
> weave. This algorithm is used in theano-cuda-ndarray.
> scipy.weave could be very useful for generating code for specific
> shapes/ndims on demand, if weave could use nvcc.
>
> James
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
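Step 1 of the scheme James describes, together with the bit-shift coordinate computation it enables, can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the cuda-ndarray or theano code: the helper names `pow2_chunks`, `subtensors` and `flat_to_coords` are hypothetical, each dimension is split along its binary representation (one simple choice among several), and the per-subtensor kernel launches in separate streams (step 2) are only noted in a comment.

```python
from itertools import product


def pow2_chunks(n):
    """Split a dimension of length n into power-of-two (start, length) pieces,
    largest first, following the binary representation of n (13 -> 8 + 4 + 1)."""
    chunks, start = [], 0
    for bit in reversed(range(n.bit_length())):
        size = 1 << bit
        if n & size:
            chunks.append((start, size))
            start += size
    return chunks


def subtensors(shape):
    """Cover an array of `shape` with subtensors whose sides are all powers of two.

    Yields (offsets, sizes) pairs. In the scheme described above, each pair would
    get its own kernel launch in its own CUDA stream (step 2); that part is not
    modelled here.
    """
    per_dim = [pow2_chunks(n) for n in shape]
    for combo in product(*per_dim):
        offsets = tuple(start for start, _ in combo)
        sizes = tuple(size for _, size in combo)
        yield offsets, sizes


def flat_to_coords(i, log2_sizes):
    """Map a flat C-order index inside a power-of-two subtensor to coordinates
    using only shifts and masks (no integer division or modulo)."""
    coords = []
    for lg in reversed(log2_sizes):          # fastest-varying dimension first
        coords.append(i & ((1 << lg) - 1))   # same as i % 2**lg
        i >>= lg                             # same as i // 2**lg
    return tuple(reversed(coords))


if __name__ == "__main__":
    for offsets, sizes in subtensors((3, 4)):
        print(offsets, sizes)
    # (0, 0) (2, 4)
    # (2, 0) (1, 4)

    # Index 5 in a 2x4 subtensor (log2 sizes 1 and 2) is row 1, column 1.
    print(flat_to_coords(5, (1, 2)))  # (1, 1)
```

Because every side of a subtensor is a power of two, a kernel indexing into it needs only shifts and masks instead of integer division and modulo. The trade-off is more, smaller launches for awkward shapes: a dimension of length 13 becomes three pieces of length 8, 4 and 1.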