[Numpy-discussion] Insights / lessons learned from NumPy design
Dag Sverre Seljebotn
Fri Jan 4 02:00:42 CST 2013
On 01/04/2013 07:29 AM, Mike Anderson wrote:
> Hello all,
> In the Clojure community there has been some discussion about creating a
> common matrix maths library / API. Currently there are a few different
> fledgeling matrix libraries in Clojure, so it seemed like a worthwhile
> effort to unify them and have a common base on which to build on.
> NumPy has been something of an inspiration for this, so I though I'd ask
> here to see what lessons have been learned.
> We're thinking of a matrix library with roughly the following design
> (subject to change!)
> - Support for multi-dimensional matrices (but with fast paths for 1D
> vectors and 2D matrices as the common cases)
Food for thought: Myself I have vectors that are naturally stored in 2D,
"matrices" that can be naturally stored in 4D and so on (you can't view
them that way when doing linear algebra, it's just that the indices can
have multiple components) -- I like that NumPy calls everything "array";
I think vector and matrix are higher-level mathematical concepts.
> - Immutability by default, i.e. matrix operations are pure functions
> that create new matrices. There could be a "backdoor" option to mutate
> matrices, but that would be unidiomatic in Clojure
Sounds very promising (assuming you can reuse the buffer if the input
matrix had no other references and is not used again?). It's very common
for NumPy arrays to fill a large chunk of the available memory (think
20-100 GB), so for those users this would need to be coupled with buffer
reuse and good diagnostics that help remove references to old
generations of a matrix.
> - Support for 64-bit double precision floats only (this is the standard
> float type in Clojure)
> - Ability to support multiple different back-end matrix implementations
> (JBLAS, Colt, EJML, Vectorz, javax.vecmath etc.)
> - A full range of matrix operations. Operations would be delegated to
> back end implementations where they are supported, otherwise generic
> implementations could be used.
> Any thoughts on this topic based on the NumPy experience? In particular
> would be very interesting to know:
> - Features in NumPy which proved to be redundant / not worth the effort
> - Features that you wish had been designed in at the start
> - Design decisions that turned out to be a particularly big mistake /
> Would love to hear your insights, any ideas+advice greatly appreciated!
Travis Oliphant noted some of his thoughts on this in the recent thread
"DARPA funding for Blaze and passing the NumPy torch" which is a must-read.
More information about the NumPy-Discussion