[Numpy-discussion] future directions
Dag Sverre Seljebotn
Fri Aug 28 12:52:22 CDT 2009
Fons Adriaensen wrote:
> Some weeks ago there was a post on this list requesting feedback
> on possible future directions for numpy. As I was quite busy at that
> time I'll reply to it now.
> My POV is that of a novice user, who at the same time wants quite
> badly to use the numpy framework for his numerical work which in
> this case is related to (some rather advanced) multichannel audio
I'm reluctantly joining the discussion... (reluctant because, as
interesting as these discussions may be, (relatively) simple things that
everyone agrees about, like Python 3 compatibility and PEP 3118 support,
are still some way off. Agreeing on things doesn't make them happen.)
> From that POV, I'd suggest the following:
> 1. Adopt an object based on Python-3's buffer protocol as the
> basic array type. It's immensely more powerful than ndarray,
> while at the same time it's close enough to ndarray to allow
> a gradual adoption.
Is it really immensely more powerful? It allows pointers, that's right,
but that's primarily for exporting data from data providers...
For things like "pointers to images" (which PEP 3118 could be used for),
Python lists usually work better anyway because they can be appended to.
I think the whole idea of the protocol is that you can start passing
around data in *various* containers. Adopting a new array type as the
"basic array type" basically defeats this purpose.
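To make the "various containers" point concrete, here is a minimal
sketch using only the standard memoryview type and NumPy. The helper
name describe is made up for illustration; it is not part of any
library.

```python
import array
import numpy as np

def describe(obj):
    """Return (format, shape, itemsize) for any PEP 3118 buffer exporter."""
    # memoryview() asks the object for its buffer; no data is copied.
    with memoryview(obj) as view:
        return view.format, view.shape, view.itemsize

containers = [
    bytearray(b"abcd"),                   # raw bytes
    array.array("d", [1.0, 2.0, 3.0]),    # stdlib typed array
    np.zeros((2, 3), dtype=np.float64),   # NumPy ndarray
]
results = [describe(c) for c in containers]
```

The consumer never checks the container type; it only relies on the
buffer being exported.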
My way of thinking of it is: the focus shifts to the NumPy library
providing ufuncs, not the array container. I think that in some years
we'll be doing np.sin(x, out=y) without x or y being ndarrays at all.
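A rough sketch of how close this already is, assuming the data lives in
stdlib array.array objects: out= still wants an ndarray, so the sketch
wraps the buffers with np.frombuffer, which shares memory instead of
copying. The variable names are illustrative only.

```python
import array
import numpy as np

x = array.array("d", [0.0, 0.5, 1.0])   # input, not an ndarray
y = array.array("d", [0.0, 0.0, 0.0])   # output, not an ndarray

# Zero-copy ndarray wrappers around the exported buffers.
x_view = np.frombuffer(x, dtype=np.float64)
y_view = np.frombuffer(y, dtype=np.float64)

# The ufunc writes straight into y's memory through the wrapper.
np.sin(x_view, out=y_view)
```

After the call, y itself holds the sines of x, although neither
container is an ndarray.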
One conclusion: all of this might call for a new library that is more
focused and supports a wider set of memory layouts. Well, anyone is
free to just go ahead and write it! But I don't think NumPy can be
turned into it, nor are the NumPy developers likely to have time to
spare for that.
If you wait a year, such a library might be a 100-liner in Cython :-)
Actually, right now I think the best way of getting such a library
implemented is to help out on Cython's array features, and then export
Cython's arrays to Python-space in a library.
One BIG gotcha people should be aware of here is that PEP 3118
supports "fancy indexing as views".
I.e. with an object based on PEP 3118's memory model you could do
    b = a[a == 2]
    b[...] = 3
and have that change a!
I believe these semantics to be superior myself (because you can always
do "b = a[a==2].copy()" to get NumPy's behaviour).
But it does raise some interesting questions about consistency vs.
subtle API breakage etc.
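For comparison, a small sketch of what NumPy does today: indexing with a
boolean mask returns a copy, so writing into b leaves a alone, while
assigning through the mask on the left-hand side does modify a.

```python
import numpy as np

a = np.array([1, 2, 2, 3])

b = a[a == 2]            # boolean-mask indexing: NumPy returns a copy
b[...] = 3               # writes into the copy only
after_copy_write = a.copy()

a[a == 2] = 3            # masked assignment writes into a itself
after_masked_assign = a.copy()
```

With view semantics, the first write would already have changed a.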
> 2. Adopting that format will make it even more important to
> clearly define in which cases data gets copied and when not.
> This should be based on some simple rules that can be evaluated
> by a code author without requiring a lookup in the reference
> docs each time.
I think NumPy's already doing quite well here, except for the case of
fancy indexing as mentioned above. Cleaning up various incarnations of
"reshape" etc. to be consistent here would be good too (my vote is for
never doing any automatic copying in methods like reshape, but I
haven't actually checked what the semantics ended up being).
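As a hedged illustration of the reshape question, this sketch shows the
behaviour of current NumPy releases (not necessarily the 2009 one):
reshape returns a view when the new shape can be expressed with strides
and silently copies otherwise. np.shares_memory makes the difference
visible.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)    # C-contiguous data

v = a.reshape(3, 2)               # expressible with strides: a view
view_shares = np.shares_memory(a, v)

t = a.T                           # transposed, no longer C-contiguous
c = t.reshape(6)                  # not expressible with strides: a copy
copy_shares = np.shares_memory(a, c)
```

Here view_shares is True and copy_shares is False, which is exactly the
kind of automatic copying the vote above is against.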
(BTW, I was recently observed saying I might chip in and implement PEP
3118 for NumPy around November. If anyone wants to beat me to it then
I'd be happy of course.)