# [Numpy-discussion] Insights / lessons learned from NumPy design

Mike Anderson mike.r.anderson.13@gmail....
Wed Jan 9 04:49:06 CST 2013

On 4 January 2013 16:00, Dag Sverre Seljebotn <d.s.seljebotn@astro.uio.no> wrote:

> On 01/04/2013 07:29 AM, Mike Anderson wrote:
> > Hello all,
> >
> > In the Clojure community there has been some discussion about creating a
> > common matrix maths library / API. Currently there are a few different
> > fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
> > effort to unify them and have a common base on which to build.
> >
> > NumPy has been something of an inspiration for this, so I thought I'd ask
> > here to see what lessons have been learned.
> >
> > We're thinking of a matrix library with roughly the following design
> > (subject to change!)
> > - Support for multi-dimensional matrices (but with fast paths for 1D
> > vectors and 2D matrices as the common cases)
>
> Food for thought: Myself I have vectors that are naturally stored in 2D,
> "matrices" that can be naturally stored in 4D and so on (you can't view
> them that way when doing linear algebra, it's just that the indices can
> have multiple components) -- I like that NumPy calls everything "array";
> I think vector and matrix are higher-level mathematical concepts.
>

Very interesting. Can I ask what the application is? And is it equivalent
from a mathematical perspective to flattening the 2D vectors into very long
1D vectors?
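
To make the flattening question concrete, here is a minimal NumPy sketch. The shapes and the names `x2` and `A4` are made up for illustration and are not taken from any particular application. It checks that applying a 4D "matrix" to a 2D "vector" with `np.tensordot` gives the same numbers as first reshaping to an ordinary 2D matrix and 1D vector:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: a "vector" indexed by (i, j) and a
# "matrix" indexed by (k, l, i, j).
x2 = rng.standard_normal((3, 4))
A4 = rng.standard_normal((2, 5, 3, 4))

# Contract the last two axes of A4 against both axes of x2.
y2 = np.tensordot(A4, x2, axes=([2, 3], [0, 1]))  # shape (2, 5)

# The same operation after flattening each index pair into one index.
A_flat = A4.reshape(2 * 5, 3 * 4)
x_flat = x2.reshape(3 * 4)
y_flat = A_flat @ x_flat  # shape (10,)

# Both routes agree up to floating-point rounding.
same = np.allclose(y2.reshape(-1), y_flat)
```

So at least for this kind of contraction the two views are numerically interchangeable; the multi-dimensional layout only keeps the index components separate.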

>
> > - Immutability by default, i.e. matrix operations are pure functions
> > that create new matrices. There could be a "backdoor" option to mutate
> > matrices, but that would be unidiomatic in Clojure
>
> Sounds very promising (assuming you can reuse the buffer if the input
> matrix had no other references and is not used again?). It's very common
> for NumPy arrays to fill a large chunk of the available memory (think
> 20-100 GB), so for those users this would need to be coupled with buffer
> reuse and good diagnostics that help remove references to old
> generations of a matrix.
>

Yes it should be possible to re-use buffers, though to some extent that
would depend on the underlying matrix library implementation. The JVM makes
things a bit interesting here - the GC is extremely good but it doesn't
play particularly nicely with non-Java native code.

20-100GB is pretty ambitious and I guess reflects the maturity of NumPy -
I'd be happy with good handling of 100MB matrices right now.
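
For comparison, buffer reuse on the NumPy side is explicit rather than automatic: the caller asks for it with an in-place operator or the `out=` argument of a ufunc. A small sketch, with arbitrary array sizes chosen only for illustration:

```python
import numpy as np

a = np.ones(1_000_000)
b = np.full(1_000_000, 2.0)

# Address of a's data buffer before any in-place work.
addr_before = a.__array_interface__["data"][0]

c = a + b            # pure style: allocates a new array for the result
np.add(a, b, out=a)  # writes the result into a's existing buffer
a *= 0.5             # in-place operator, again no new allocation

addr_after = a.__array_interface__["data"][0]
buffer_reused = addr_before == addr_after  # a still uses the same memory
c_is_separate = not np.shares_memory(a, c)  # c got its own buffer
```

An immutable-by-default API would have to make the equivalent decision internally, which is where knowing that the input has no other references comes in.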

>
> > - Support for 64-bit double precision floats only (this is the standard
> > float type in Clojure)
> > - Ability to support multiple different back-end matrix implementations
> > (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
> > - A full range of matrix operations. Operations would be delegated to
> > back end implementations where they are supported, otherwise generic
> > implementations could be used.
> >
> > Any thoughts on this topic based on the NumPy experience? In particular,
> > it would be very interesting to know:
> > - Features in NumPy which proved to be redundant / not worth the effort
> > - Features that you wish had been designed in at the start
> > - Design decisions that turned out to be a particularly big mistake /
> > success
> >
> > Would love to hear your insights, any ideas+advice greatly appreciated!
>
> Travis Oliphant noted some of his thoughts on this in the recent thread
> "DARPA funding for Blaze and passing the NumPy torch" which is a must-read.
>