[Numpy-discussion] Insights / lessons learned from NumPy design

Mike Anderson mike.r.anderson.13@gmail....
Wed Jan 9 04:35:29 CST 2013


On 8 January 2013 02:08, Chris Barker - NOAA Federal
<chris.barker@noaa.gov> wrote:

> On Thu, Jan 3, 2013 at 10:29 PM, Mike Anderson
> <mike.r.anderson.13@gmail.com> wrote:
> > In the Clojure community there has been some discussion about creating a
> > common matrix maths library / API. Currently there are a few different
> > fledgling matrix libraries in Clojure, so it seemed like a worthwhile
> > effort to unify them and have a common base to build on.
> >
> > NumPy has been something of an inspiration for this, so I thought I'd ask
> > here to see what lessons have been learned.
>
> A few thoughts:
>
> > We're thinking of a matrix library
>
> First -- is this a "matrix" library, or a general-use nd-array
> library? That will drive your design a great deal. For my part, I came
> from MATLAB, which started out very focused on matrices, then extended
> to be more generally useful. Personally, I found the matrix focus to
> get in the way more than it helped -- in any "real" code, the actual
> matrix operations are likely to be a tiny fraction of the code.
>
> One reason I like numpy is that it is array-first, with secondary
> support for matrix stuff.
>
> That being said, there is the numpy matrix type, and there are those
> who find it very useful, particularly in teaching situations. Still,
> it feels a bit "tacked-on", and that does get in the way, so if you
> want a "real" matrix object as well as a general-purpose array lib,
> thinking about both up front will be helpful.
>

This is very useful context - thanks! I've had opinions in favour of both
an nd-array style library and a matrix library. I guess which one you are
more inclined to think in depends on your use case.

I'm hoping that it should be possible for the same API to support both,
i.e. you should be able to use a 2D array of numbers as a matrix, and
vice-versa.
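For illustration, here is a minimal sketch of how current NumPy lets one 2-D array serve as both an array and a matrix: the choice lives in the operator, not the type. Note that the `@` operator (Python 3.5+, NumPy 1.10+) postdates this thread; the arrays `a` and `b` are just example values.

```python
import numpy as np

# Plain 2-D ndarrays; no separate "matrix" type is involved.
a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

# Array semantics: * is element-wise multiplication.
elementwise = a * b

# Matrix semantics on the same objects: @ (np.matmul) is the matrix product.
product = a @ b

# Linear-algebra routines accept the same 2-D arrays: solve a @ x = rhs.
x = np.linalg.solve(a, np.array([1.0, 2.0]))
```

The trade-off is that readers must watch the operator (`*` vs `@`) rather than the type, which is the "array-first, matrix second" design Chris describes.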


>
> > - Support for multi-dimensional matrices (but with fast paths for 1D
> > vectors and 2D matrices as the common cases)
>
> What is a multi-dimensional matrix? Is a 3-D one a stack of
> matrices, or something else? (Note: numpy lacks this kind of object,
> but it is sometimes asked for, i.e. a way to do fast matrix
> multiplication with a lot of small matrices.)
>
> I think fast paths for 1-D and 2-D are secondary, though you may want
> "easy paths" for those. In particular, if you want good support for
> linear algebra (matrices), then having clean and natural "row vector"
> and "column vector" types would be nice. See the archives of this list
> for a bunch of discussion about that, and about the weaknesses of the
> numpy matrix object.
>
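For comparison, later NumPy releases (from 1.10, after this thread) address the "stack of small matrices" case by treating the last two axes of an N-D array as matrices and broadcasting over the rest. The sketch below shows that, plus explicit row and column vectors as 2-D arrays with a length-1 axis. The sizes and random data are arbitrary example values.

```python
import numpy as np

# A "stack" of 1000 small 3x3 matrices stored as one array of shape (1000, 3, 3).
rng = np.random.default_rng(0)
stack_a = rng.standard_normal((1000, 3, 3))
stack_b = rng.standard_normal((1000, 3, 3))

# np.matmul multiplies over the last two axes and broadcasts over the leading
# one, computing 1000 independent 3x3 products in a single call.
stack_c = np.matmul(stack_a, stack_b)

# Sanity check: one slice matches an explicit single-matrix product.
assert np.allclose(stack_c[42], stack_a[42] @ stack_b[42])

# A 1-D array is neither a row nor a column vector; reshape makes it explicit.
v = np.array([1.0, 2.0, 3.0])
row = v.reshape(1, 3)   # shape (1, 3)
col = v.reshape(3, 1)   # shape (3, 1)
outer = col @ row       # (3, 1) @ (1, 3) -> (3, 3) outer product
inner = row @ col       # (1, 3) @ (3, 1) -> (1, 1) inner product
```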
> > - Immutability by default, i.e. matrix operations are pure functions that
> > create new matrices.
>
> I'd be careful about this -- the purity and predictability are nice,
> but these days a lot of time is spent allocating and moving memory
> around. The mutability of numpy arrays is a key feature -- indeed,
> the key performance issues with numpy surround the fact that many
> copies may be made unnecessarily (note, Dag's suggestion of lazy
> evaluation may mitigate this to some extent).
>

Interesting and very useful to know. Sounds like we should definitely allow
for mutable arrays / zero-copy operations, given that unnecessary copying
is proving to be a big bottleneck.
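To make the mutability point concrete, here is a small NumPy sketch contrasting views, in-place operations, explicit copies, and a read-only flag. It is one way a library can offer zero-copy mutation while still letting callers opt into "pure" or immutable behaviour; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(10.0)

# Basic slicing returns a view: no data is copied.
evens = a[::2]
assert np.shares_memory(a, evens)

# Writing through the view mutates the original array.
evens[:] = 0.0

# a + 1.0 allocates a new array; the in-place forms reuse a's buffer.
b = a + 1.0                 # new allocation
np.multiply(a, 2.0, out=a)  # result written into a's existing memory
a += 1.0                    # also in place

# Callers wanting "pure" semantics can take an explicit copy.
snapshot = a.copy()
assert not np.shares_memory(a, snapshot)

# A read-only view gives cheap immutability without copying the data.
frozen = a.view()
frozen.flags.writeable = False
```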


>
> > - Support for 64-bit double precision floats only (this is the standard
> > float type in Clojure)
>
> Not a bad start, but another major strength of numpy is the multiple
> data types -- you may want to design that concept in from the start.
>

Sounds like good advice, and that should be possible to accommodate in the
design.

But I'm curious: what is the main use case for the alternative data types
in NumPy? Is it for columns of data of heterogeneous types? or something
else?
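As a rough illustration of the question above (not an answer given in this thread), the sketch below shows a few common reasons to use non-float64 dtypes in NumPy: smaller memory footprint, exact integer/boolean data, heterogeneous records via a structured dtype, and matching external binary layouts. The field names in the hypothetical `station` dtype are made up for the example.

```python
import numpy as np

n = 1_000_000

# 1. Memory and bandwidth: float32 uses half the bytes of float64.
f64 = np.zeros(n, dtype=np.float64)
f32 = np.zeros(n, dtype=np.float32)

# 2. Exact integer and boolean data, e.g. 8-bit image pixels and masks.
pixels = np.array([[0, 128], [255, 64]], dtype=np.uint8)
mask = pixels > 100  # comparison yields a bool array

# 3. Heterogeneous records: a structured dtype (hypothetical fields) where
#    each array element is one row with typed columns.
station = np.dtype([("id", np.int32), ("lat", np.float64), ("temp", np.float32)])
records = np.array([(1, 47.6, 12.5), (2, 21.3, 27.0)], dtype=station)

# 4. Byte-exact layouts for files or C libraries: little-endian int16 here.
raw = np.array([1, 2, 3], dtype="<i2").tobytes()
```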


>
> > - Ability to support multiple different back-end matrix implementations
> > (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
>
> This ties in to another major strength of numpy -- ndarrays are both
> powerful python objects, and wrappers around standard C arrays -- that
> makes it pretty darn easy to interface with external libraries for
> core computation.
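A minimal sketch of the "wrapper around standard C arrays" idea, using Python's `ctypes` to stand in for memory owned by an external C library (an assumption for illustration; real bindings would typically go through a compiled extension). It shows wrapping a C buffer as an ndarray without copying, and handing an ndarray's data pointer back out as a `double*`.

```python
import ctypes
import numpy as np

# A plain C array of 5 doubles allocated via ctypes, standing in for a
# buffer owned by some external C library.
c_buf = (ctypes.c_double * 5)(1.0, 2.0, 3.0, 4.0, 5.0)

# Wrap it as an ndarray without copying: both share the same memory.
arr = np.ctypeslib.as_array(c_buf)
arr *= 10.0
assert c_buf[0] == 10.0  # the in-place change is visible on the C side

# The other direction: a C-contiguous ndarray exposes a raw data pointer
# that could be passed to C code expecting a double*.
data = np.ascontiguousarray(np.linspace(0.0, 1.0, 4))
ptr = data.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
assert ptr[3] == 1.0
```

On the JVM the analogous mechanism would differ (e.g. direct buffers or each backend's own storage), so this only shows the NumPy side of the design.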


Great - good to know we are on the right track with this one.

Thanks Chris for all your comments / suggestions!

