[Numpy-discussion] Fwd: GPU Numpy
Thu Aug 6 14:03:15 CDT 2009
2009/8/6 Erik Tollerud <email@example.com>:
> Note that this is from a "user" perspective, as I have no particular plan of
> developing the details of this implementation, but I've thought for a long
> time that GPU support could be great for numpy (I would also vote for OpenCL
> support over cuda, although conceptually they seem quite similar)...
> But what exactly would the large-scale plan be? One of the advantages of
> GPGPUs is that they are particularly suited to rather complicated
> parallelizable algorithms,
You mean simple parallelizable algorithms, I suppose?
> and the numpy-level basic operations are just the
> simple arithmetic operations. So while I'd love to see it working, it's
> unclear to me exactly how much is gained at the core numpy level, especially
> given that it's limited to single-precision on most GPUs.
> Now linear algebra or FFTs on a GPU would probably be a huge boon, I'll
> admit - especially if it's in the form of a drop-in replacement for the
> numpy or scipy versions.
> By the way, I noticed no one mentioned the GPUArray class in pycuda (and it
> looks like there's something similar in the pyopencl) - seems like that's
> already done a fair amount of the work...
> On Thu, Aug 6, 2009 at 10:41 AM, James Bergstra <firstname.lastname@example.org> wrote:
>> On Thu, Aug 6, 2009 at 1:19 PM, Charles R Harris <email@example.com> wrote:
>> > It almost looks like you are reimplementing numpy, in C++ no less. Is
>> > there any reason why you aren't working with a numpy branch and just
>> > adding ufuncs?
>> I don't know how that would work. The Ufuncs need a datatype to work
>> with, and AFAIK, it would break everything if a numpy ndarray pointed
>> to memory on the GPU. Could you explain what you mean a little more?
>> > I'm also curious if you have thoughts about how to use the GPU
>> > pipelines in parallel.
>> Current thinking for ufunc type computations:
>> 1) divide up the tensors into subtensors whose dimensions have
>> power-of-two sizes (this permits a fast integer -> ndarray coordinate
>> computation using bit shifting),
>> 2) launch a kernel for each subtensor in its own stream to use
>> parallel pipelines.
>> 3) sync and return.
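
To make step 1 concrete, here is a minimal CPU-side sketch in plain Python of why power-of-two dimensions help. It is not taken from cuda-ndarray; the helper names split_pow2 and unravel_pow2 are hypothetical, and the chunking strategy (following the binary digits of the length) is only one assumed way of doing the split. With power-of-two sizes, the flat index -> coordinate conversion needs only shifts and masks instead of integer division and modulo.

```python
def split_pow2(n):
    """Split range(n) into (start, size) chunks whose sizes are powers of two.

    Hypothetical helper: the chunks follow the binary digits of n.
    """
    chunks, start = [], 0
    bit = 1 << max(n.bit_length() - 1, 0)
    while bit:
        if n & bit:
            chunks.append((start, bit))
            start += bit
        bit >>= 1
    return chunks


def unravel_pow2(flat_index, log2_shape):
    """Convert a row-major flat index into coordinates.

    log2_shape holds log2 of each dimension size, so shape (4, 8, 2)
    is passed as (2, 3, 1). Only shifts and masks are used.
    """
    coords = []
    # Walk from the last (fastest-varying) axis to the first.
    for bits in reversed(log2_shape):
        coords.append(flat_index & ((1 << bits) - 1))  # flat_index % 2**bits
        flat_index >>= bits                            # flat_index // 2**bits
    return tuple(reversed(coords))


if __name__ == "__main__":
    print(split_pow2(13))
    print(unravel_pow2(13, (2, 3, 1)))
```

On a GPU, each chunk would presumably map to one kernel launch in its own stream (step 2), with the shift-and-mask arithmetic running per thread inside the kernel.
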
>> This is a pain to do without automatic code generation though.
>> Currently we're using macros, but that's not pretty.
>> C++ has templates, which we don't really use yet, but were planning on
>> using. These have some power to generate code.
>> The 'theano' project (www.pylearn.org/theano) for which cuda-ndarray
>> was created has a more powerful code generation mechanism similar to
>> weave. This algorithm is used in theano-cuda-ndarray.
>> Scipy.weave could be very useful for generating code for specific
>> shapes/ndims on demand, if weave could use nvcc.
>> NumPy-Discussion mailing list
Information System Engineer, Ph.D.
Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92