[Numpy-discussion] Insights / lessons learned from NumPy design

Mike Anderson mike.r.anderson.13@gmail....
Sun Jan 13 06:08:18 CST 2013


On 10 January 2013 05:19, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:

> On Wed, Jan 9, 2013 at 2:57 AM, Mike Anderson
>
> > I'm hoping the API will be independent of storage format - i.e. the
> > underlying implementations can store the data any way they like. So the
> > API will be written in terms of abstractions, and the user will have the
> > choice of whatever concrete implementation best fits the specific needs.
> > Sparse matrices, tiled matrices etc. should all be possible options.
>
> A note about that -- as I think of it, numpy arrays are two things:
>
> 1) a python object for working with numbers, in a wide variety of ways
>
> 2) a wrapper around a C-array (or data block) that can be used to
> provide an easy way for Python to interact with C (and Fortran, and...)
> libraries, etc.
>
> As it turns out a LOT of people use numpy for (2) -- what this means
> is that while you could change the underlying data representation,
> etc, and keep the same Python API -- such changes would break a lot of
> non-pure-python code that relies on that data representation.
>
> This is a big issue with the numpy-for-PyPy project -- they could
> write a numpy clone, but it would only be useful for the pure-python
> stuff.
>
> Even then, a number of folks do tricks with numpy arrays in python
> that rely on the underlying structure.
>
> Not sure how all this would play out for Clojure, but it's something
> to keep in mind.


Thanks Chris - this is a really helpful insight.
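
To make 2) concrete for myself, here is a minimal sketch in Python with
NumPy. It only uses standard ndarray attributes; the array contents and
variable names are my own, picked purely for illustration.

```python
import numpy as np

a = np.arange(6, dtype=np.int32).reshape(2, 3)

# Face 1: a Python object for working with numbers.
total = int(a.sum())

# Face 2: a thin wrapper around a contiguous data block.
# These attributes describe the storage layout that C/Fortran code relies on.
is_c_contiguous = bool(a.flags['C_CONTIGUOUS'])
strides = a.strides      # bytes to step along each axis
itemsize = a.itemsize    # bytes per element

# A layout-dependent "trick": reinterpret the same bytes as uint8
# without copying. This only makes sense for a dense data block.
raw = a.view(np.uint8)
shares = bool(np.shares_memory(a, raw))
```

Anything like the last two lines would have no obvious meaning for a sparse
or tiled implementation, which I take to be the compatibility problem Chris
is describing.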

Trying to translate that into the Clojure world, I think it roughly
corresponds to separating the API (the equivalent of the ndarray methods
referred to in 1) from the specific implementations (which will probably
include an ndarray-style data block wrapper like 2, but would also leave
open other implementation options).

That way the majority of users can code purely against the API, and they
won't be affected if (when?) the underlying implementation changes. In this
way, they should be able to get the benefits of 2) without building a
direct dependency on it.
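
A rough sketch of the separation I have in mind, written in Python rather
than Clojure only so it sits next to the NumPy discussion. Every name here
(Matrix, DenseMatrix, SparseMatrix, trace) is hypothetical and made up for
illustration; none of it comes from an existing library.

```python
from abc import ABC, abstractmethod


class Matrix(ABC):
    """Hypothetical storage-independent matrix API."""

    @property
    @abstractmethod
    def shape(self):
        """Return (rows, cols)."""

    @abstractmethod
    def get(self, i, j):
        """Return the element at row i, column j."""


class DenseMatrix(Matrix):
    """ndarray-style implementation: one flat, row-major data block."""

    def __init__(self, rows, cols, data):
        data = list(data)
        if len(data) != rows * cols:
            raise ValueError("data length does not match shape")
        self._shape = (rows, cols)
        self._data = data

    @property
    def shape(self):
        return self._shape

    def get(self, i, j):
        # Row-major indexing into the flat block.
        return self._data[i * self._shape[1] + j]


class SparseMatrix(Matrix):
    """Alternative implementation: only non-zero entries are stored."""

    def __init__(self, rows, cols, entries):
        self._shape = (rows, cols)
        self._entries = dict(entries)  # {(row, col): value}

    @property
    def shape(self):
        return self._shape

    def get(self, i, j):
        return self._entries.get((i, j), 0)


def trace(m):
    """User code written purely against the Matrix API."""
    return sum(m.get(k, k) for k in range(min(m.shape)))


dense = DenseMatrix(2, 2, [1, 0, 0, 4])
sparse = SparseMatrix(2, 2, {(0, 0): 1, (1, 1): 4})
```

Here trace never touches _data or _entries, so it gives the same answer for
both objects and keeps working if the storage behind either one changes.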

Of course, I still expect some users to circumvent the API and build a
dependency on the underlying implementation. Nothing we can do to stop
that, and they may even have good reasons like hardcore performance
optimization. We have to assume at that point they know what they are doing
and are prepared to live with the consequences :-)