[Numpy-discussion] Insights / lessons learned from NumPy design

Chris Barker - NOAA Federal chris.barker@noaa....
Wed Jan 9 11:38:46 CST 2013


On Wed, Jan 9, 2013 at 2:35 AM, Mike Anderson wrote:

>> First -- is this a "matrix" library, or a general use nd-array
>> library? That will drive your design a great deal.

> This is very useful context - thanks! I've had opinions in favour of both an
> nd-array style library and a matrix library. I guess it depends on your use
> case which one you are more inclined to think in.
>
> I'm hoping that it should be possible for the same API to support both, i.e.
> you should be able to use a 2D array of numbers as a matrix, and vice-versa.

sure, but the API can/should be different -- in some sense, the numpy
matrix object is really just syntactic sugar -- you can use a 2-d
array as a matrix, but then you have to explicitly call linear algebra
functions to get things like matrix multiplication, etc. and do some
hand work to make sure you've got things the right shape -- i.e. a
column or row vector where called for.
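
A rough sketch of what that looks like with plain arrays (the values
and variable names here are made up for illustration):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
v = np.array([1.0, 1.0])       # 1-d: neither a row nor a column

elementwise = a * a            # `*` on arrays is element-by-element
matprod = np.dot(a, a)         # matrix product needs an explicit call

# The "hand work": reshape the 1-d array into an explicit column vector
col = v.reshape(-1, 1)         # shape (2, 1)
result = np.dot(a, col)        # shape (2, 1), a column vector again
```

With a 1-d `v`, `np.dot(a, v)` also works, but it gives back a 1-d
array, so you have to track the row/column orientation yourself.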

tacking on the matrix object helped with this, but in practice, it gets
tricky to prevent operations on a matrix from accidentally returning a
plain array.
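
One way this can happen, as a small sketch: the two helper functions
below are hypothetical, not part of numpy, and they differ only in
whether they call `np.asarray` (which drops the matrix subclass) or
`np.asanyarray` (which keeps it):

```python
import numpy as np

def scale_strict(x, factor):
    # hypothetical helper: np.asarray converts a matrix to a plain ndarray
    return np.asarray(x) * factor

def scale_preserving(x, factor):
    # hypothetical helper: np.asanyarray passes ndarray subclasses through
    return np.asanyarray(x) * factor

m = np.matrix([[1, 2], [3, 4]])

plain = scale_strict(m, 2)       # comes back as a plain ndarray
kept = scale_preserving(m, 2)    # still a matrix

# Same values, but `*` now means two different things:
plain_sq = plain * plain         # element-by-element product
kept_sq = kept * kept            # matrix product
```

Nothing warns you about the switch -- the only symptom is that `*`
quietly changes meaning further down the line.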

Also numpy's matrix concept does not include the concept of a row or
column vector, just 1xN or Nx1 matrices -- which works OK, but then
when you iterate through a vector, you get 1x1 matrices, rather than
scalars -- a bit odd.
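
A quick illustration, using a made-up 3x1 column "vector" next to the
equivalent 1-d array:

```python
import numpy as np

col = np.matrix([[1], [2], [3]])     # an Nx1 matrix standing in for a vector
items = [x for x in col]             # each item is itself a 1x1 matrix
item_shapes = [x.shape for x in items]

arr = np.array([1, 2, 3])            # a plain 1-d array
elems = [x for x in arr]             # each item is a zero-dimensional scalar
elem_ndims = [np.ndim(x) for x in elems]
```

Here every entry of `item_shapes` is `(1, 1)`, while every entry of
`elem_ndims` is `0`.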

Anyway, it takes some thought to have two clean APIs sharing one core object.

>> not a bad start, but another major strength of numpy is the multiple
>> data types - you may want to design that concept in from the start.

> But I'm curious: what is the main use case for the alternative data types in
> NumPy? Is it for columns of data of heterogeneous types? or something else?

heterogeneous data types were added relatively recently in numpy, and
are great mostly for interacting with other libraries (and some
syntactic sugar uses...) that may store data in arrays of structures.
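
For example, something like a C array of structs can be described
with a structured dtype. The field names and layout below are invented
for illustration, not taken from any particular library:

```python
import numpy as np

# Roughly: struct { double x; double y; int32_t id; } packed without padding
point_dtype = np.dtype([("x", np.float64), ("y", np.float64), ("id", np.int32)])

pts = np.zeros(3, dtype=point_dtype)
pts["x"] = [1.0, 2.0, 3.0]       # assign a whole field at once
pts["id"] = [10, 20, 30]

second_id = int(pts[1]["id"])    # or pull one field out of one record
record_size = pts.dtype.itemsize # bytes per record: 8 + 8 + 4
```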

But multiple homogeneous data types are critical for saving memory,
speeding up operations, doing integer math when that's really called
for, manipulating images, etc.
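
To put some numbers on the memory side (the array sizes here are
arbitrary, picked just for the example):

```python
import numpy as np

n = 1000000
as_float64 = np.zeros(n, dtype=np.float64)   # 8 bytes per element
as_float32 = np.zeros(n, dtype=np.float32)   # 4 bytes per element

# A 640x480 RGB image stored as one byte per channel
image = np.zeros((480, 640, 3), dtype=np.uint8)

sizes = (as_float64.nbytes, as_float32.nbytes, image.nbytes)

# Integer math stays integer when the dtype says so
counts = np.array([7, 8, 9], dtype=np.int32)
halves = counts // 2                          # floor division, still int32
```

The same image stored as float64 would take eight times the space.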

> 20-100GB is pretty ambitious and I guess reflects the maturity of
> NumPy -  I'd be happy with good handling of 100MB matrices right
> now.....

100MB is pretty darn small these days -- if you're only interested in
smallish problems, then you can probably forget about performance
issues, and focus on a really nice API. But I'm not sure I'd bother
with that -- once people start using it, they'll want to use it for
big problems!

-Chris


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

