[Numpy-discussion] Numarray header PEP
perry at stsci.edu
Thu Jul 1 13:57:02 CDT 2004
Collin J. Williams wrote:
> I feel lower on the understanding tree with respect to what is being
> proposed in the draft PEP, but would still like to offer my 2 cents
> worth. I get the feeling that numarray is being bent out of shape to
> fit Numeric.
Todd and Gerard address this point well.
> It was my understanding that Numeric had certain weaknesses which made it
> unacceptable as a Python component and that numarray was intended to
> provide the same or better functionality within a pythonic framework.
Let me reiterate what our motivations were. We wanted to use
an array package for our software, and Numeric had enough
shortcomings that we needed some changes in behavior (e.g.,
type coercion for scalars), changes in performance (particularly
with regard to memory usage), and enhancements in capabilities
(e.g., memory mapping, record arrays, etc.). It was the opinion
of some (Paul Dubois, for example) that a rewrite was in order in
any case since the code was not that maintainable (not everyone felt
this way, though at the time that wasn't as clear).
At the same time there was some hope that Numeric could be accepted
into the standard Python distribution. That's something we thought
would be good (but wasn't the highest priority for us) and I've
come to believe that perhaps a better solution with regard to that
is what this PEP is trying to address. In any case Guido made it
clear that he would not accept Numeric in its (then) current form.
That it be written mostly in Python was something suggested by
Guido, and we started off that way, mainly because it would get
us going much faster than writing it all in C. We definitely
understood that it would also have the consequence of making
small array performance worse. We said as much when we started;
it wasn't as clear as it is now that many users objected to a factor
of a few slower performance (as it turned out, a mostly Python-based
implementation was more than an order of magnitude slower for small
> numarray has not achieved the expected performance level to date, but
> progress is being made and I believe that, for larger arrays, numarray
> has been shown to be superior to Numeric - please correct me if I'm
> wrong here.
We never expected numarray to ever reach the performance level for small
arrays that Numeric has. If it were within a factor of two I would be
thrilled (it's more like a factor of 3 or 4 currently for simple ufuncs).
I still don't think it ever will be as fast for small arrays. The
focus all along was on handling large arrays, which I think it does
quite well, both with regard to memory and speed. Yes, there are some
functions and operations that may be much slower. Mainly they
need to be called out so they can be improved. Generally we
only notice performance issues that affect our software. Others
need to point out remaining large discrepancies.
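
As a rough illustration of why a fixed per-call cost matters for small
arrays but fades for large ones, here is a toy cost model. It is only a
sketch: the function names and every number in it are hypothetical, in
arbitrary units, and are not measurements of numarray or Numeric. It also
assumes both packages pay the same per-element cost, which is a
simplification.

```python
# Toy cost model: time for one ufunc call = fixed overhead + n * per-element cost.
# All values are hypothetical, in arbitrary units; they are NOT measurements.

def call_time(n, overhead, per_element):
    """Modelled time for a single call on an array of n elements."""
    return overhead + n * per_element

def slowdown(n, overhead_slow, overhead_fast, per_element):
    """Ratio of modelled times for a high-overhead vs. a low-overhead package."""
    return (call_time(n, overhead_slow, per_element)
            / call_time(n, overhead_fast, per_element))

if __name__ == "__main__":
    # Hypothetical overheads: 40 units vs. 10 units, 1 unit per element.
    for n in (1, 10, 1000, 1000000):
        print(n, round(slowdown(n, 40.0, 10.0, 1.0), 3))
```

In this model the ratio is set almost entirely by the overheads when n is
small and tends toward 1 as n grows, which is the pattern described above.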
I'm still of the opinion that if small array performance is really
important, a very different approach with a completely different
implementation should be used. I would think that improvements
of an order of magnitude over what Numeric does now are possible.
But since that isn't important to us (STScI), don't expect us to work on
> The shock came for me when Todd Miller said:
> I looked at this some, and while INCREFing __dict__ may be the right
> idea, I forgot that there *is no* Python NumArray.__init__ anymore.
> Wasn't it the intent of numarray to work towards the full use of the
> Python class structure to provide the benefits which it offers?
> The Python class has two constructors and one destructor.
> The constructors are __init__ and __new__, the latter only provides the
> shell of an instance which later has to be initialized. In version 0.9,
> which I use, there is no __new__, but there is a new function which has
> a functionality similar to that intended for __new__. Thus, with this
> change, numarray appears to be moving further away from being pythonic.
I'll agree that optimization is driving the underlying implementation to
one that is more complex and that is the drawback (no surprise there).
There's Pythonic in use and Pythonic in implementation. We are certainly
receptive to better ideas for the implementation, but I doubt that
a heavily Python-based implementation is ever going to be competitive
for small arrays (unless something like psyco becomes universal, but
I think there are a whole mess of problems to be solved for that kind
of approach to work well generically).
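
For readers less familiar with the __new__/__init__ distinction raised in
the quoted text, here is a minimal sketch in plain Python. The class name
Shell and its shape attribute are hypothetical and have nothing to do with
numarray's actual implementation; the sketch only shows that __new__
returns a bare instance and __init__ fills it in afterwards.

```python
class Shell:
    """Hypothetical class used only to show two-step construction."""

    log = []  # records the order in which the constructors run

    def __new__(cls, *args, **kwargs):
        # Step 1: allocate the bare instance; no attributes are set yet.
        instance = super().__new__(cls)
        cls.log.append("__new__")
        return instance

    def __init__(self, shape):
        # Step 2: initialize the instance that __new__ returned.
        self.shape = shape
        type(self).log.append("__init__")


# Normal construction runs __new__ first and then __init__.
a = Shell((2, 3))

# Calling __new__ directly yields only the uninitialized "shell".
b = Shell.__new__(Shell)

print(Shell.log)
print(hasattr(a, "shape"), hasattr(b, "shape"))
```

An extension type written in C can supply the equivalent of both steps
itself, which is one way a Python-level __init__ can disappear without the
class changing how it is used.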