[Numpy-discussion] Proposal for making of Numarray a real Numeric 'NG'

Francesc Altet faltet at carabos.com
Fri Jan 21 05:16:12 CST 2005


Hi List,

I would like to make a formal proposal regarding the subject of previous
discussions on this list. This message is a bit long, but I've tried my best
to express my thoughts as clearly as possible.

Is Numarray a good replacement for Numeric?
===========================================

There has been some debate lately about whether it is appropriate to claim
that numarray is a replacement for Numeric. Perhaps the main source of this
claim has been the home page of the Numeric project [1]:

"""
If you are new to Numerical Python, please use Numarray. The older module,
Numeric, is unsupported.  At this writing Numarray is slower for very small
arrays but faster for large ones. Numarray contains facilities to help you
convert older code to use it. Some parts of the community have not made the
switch yet but the Numarray libraries have been carefully named differently
so that Numeric and Numarray can coexist in one application.
"""

This paragraph gives the impression that Numeric is going to be deprecated.
I admit that I was among those whom this statement led to think of numarray
as a kind of 'Next Generation Numeric', but it now seems (from the previous
discussions) that this was a somewhat unfortunate, misleading statement. In
fact, Perry Greenfield, one of the main authors of numarray, will be taking
steps to correct it in the near future [2].

However, I'd like to believe (and I'm sure quite a few other people do too)
that this statement, apart from creating some confusion, could eventually
ease the long-term convergence of both packages. That would be great not only
for unifying efforts, but also for allowing the inclusion of Numeric/Numarray
in the Python Standard Library, which would be a Good Thing.

Numarray vs Numeric: Pros and Cons
==================================

It's worth remembering that Numeric was a major breakthrough: it made it
possible to handle large (homogeneous) datasets in Python very efficiently.
In my opinion, Numarray is, generally speaking, a very good package as well,
with many interesting new features that Numeric lacks. Among the main
advantages of Numarray over Numeric I would list the following (although my
view may be a bit biased by my own use cases for both libraries):

- Memory-mapped objects: Allow working with on-disk numarray objects as if
  they were in memory.
  
- RecArrays: Objects for handling heterogeneous datasets (tables)
  efficiently. This should be very beneficial in many fields.
  
- CharArrays: Allow working with large amounts of fixed- and variable-length
  strings. I find this implementation much more powerful than Numeric's.
  
- Index arrays within subscripts: e.g. if ind = array([4, 4, 0, 2])
  and x = 2*arange(6), x[ind] results in array([8, 8, 0, 4]).

- New interface design: We should not forget that numarray was designed
  from the ground up with integration into the Python Standard Library in
  mind (or at least, that is my impression). So it should have a better
  chance than Numeric (if there is any chance at all) of entering the
  Standard Library.
  
[See [3] for a more accurate description of the differences]
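The semantics of index arrays can be modelled in plain Python as a "gather":
each element of the index array selects one element of x, indices may repeat
or come in any order, and a new sequence is returned. The sketch below uses
lists as stand-ins for arrays and a hypothetical helper take(); it only
illustrates the behaviour and is not numarray's implementation.

```python
def take(x, ind):
    """Return a new list whose k-th element is x[ind[k]] (gather semantics).

    Plain lists stand in for arrays here; indices may repeat and appear in
    any order, so the result can reorder x or be longer than it.
    """
    return [x[i] for i in ind]


ind = [4, 4, 0, 2]
x = [2 * i for i in range(6)]  # plays the role of 2*arange(6): [0, 2, 4, 6, 8, 10]
print(take(x, ind))            # [8, 8, 0, 4]
```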

At this point, it is also fair to recognize the significant effort the
Numarray crew (and others) have made to create a fairly good replacement for
Numeric: the two APIs are getting closer bit by bit; the numerix module makes
it easier for an application to support both Numeric and numarray (see [5]
for a concrete case of switching between Numeric and Numarray in SciPy, or
[6] for matplotlib); there is the current effort to support Numarray in
SciPy; and, last but not least, they have been very responsive to
enhancement requests in this area.
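The general idea behind a numerix-style compatibility layer can be sketched
as a function that imports the first available array package and hands the
module back to the application. This is a hypothetical illustration, not
matplotlib's actual numerix code: the name load_numerix and the simple
"try in order" policy are assumptions (the real module may choose the package
differently, e.g. from a configuration setting).

```python
import importlib


def load_numerix(candidates=("numarray", "Numeric")):
    """Import and return the first installed package named in `candidates`.

    Hypothetical sketch: application code would call this once and then use
    only the returned module, so switching packages needs no other changes.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("no array package available; tried " + "; ".join(errors))
```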

The real problem for Numarray: Object Creation Time
===================================================

On the other hand, the main drawback of Numarray compared with Numeric is, in
my opinion, its poor object-creation performance. This might look like a
minor issue at first glance, but in many cases it is not. One example
recently reported on this list is:

>>> from timeit import Timer
>>> setup = 'import Numeric; a = Numeric.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
0.12782907485961914
>>> setup = 'import numarray; a = numarray.arange(2000);a.shape=(1000,2)'
>>> Timer('for i in range(len(a)): row=a[i]', setup).timeit(number=100)
1.2013700008392334

So numarray is about 10 times slower than Numeric here, not because its
indexing code is 10 times slower, but mainly because object creation is
between 5 and 10 times slower, and the loop above creates a new object (the
row) on each iteration.
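To make the per-iteration argument concrete, the timings quoted above can be
converted into a cost per row. This is only arithmetic on the figures already
shown in the session; the variable names are illustrative.

```python
# Back-of-the-envelope per-row cost from the timings in the session above.
rows = 1000                      # len(a) after a.shape = (1000, 2)
repetitions = 100                # timeit(number=100)
iterations = rows * repetitions  # 100,000 row objects created in total

numeric_total = 0.12782907485961914   # seconds, Numeric
numarray_total = 1.2013700008392334   # seconds, numarray

# Each figure includes loop overhead and indexing, not only object creation.
numeric_per_row = numeric_total / iterations    # about 1.28 microseconds
numarray_per_row = numarray_total / iterations  # about 12.0 microseconds

print(f"Numeric:  {numeric_per_row * 1e6:.2f} us per row")
print(f"numarray: {numarray_per_row * 1e6:.2f} us per row")
print(f"ratio:    {numarray_total / numeric_total:.1f}x")
```

The ratio comes out at roughly 9.4, in line with the "about 10 times" figure.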

Another use case where object creation time matters can be seen in [4].

Proposal for making Numarray a real Numeric 'NG' (Next Generation)
==================================================================

Given that the most important reason (IMO) not to consider Numarray a good
replacement for Numeric is object creation time, I would like to propose a
coordinated effort to solve precisely this issue.

First of all, it would be nice if the people most experienced with Numarray
(i.e. the Numarray crew) could analyze this in depth and produce a series of
small, self-contained benchmark files that clearly expose the possible
bottlenecks. This may be hard to do, but it is crucial.
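As a rough idea of what one of these small, self-contained benchmark files
might look like, here is a minimal timing harness built on the standard
timeit module. The helper name bench and the two cases are assumptions for
illustration only; in a real benchmark the setup strings would import Numeric
or numarray as in the session quoted earlier, with one case isolating object
creation and another isolating element access.

```python
import timeit


def bench(stmt, setup="pass", number=100, repeat=5):
    """Return the best per-loop time, in seconds, of `stmt` over `repeat` runs."""
    timer = timeit.Timer(stmt, setup)
    # Each run executes `stmt` `number` times; keep the fastest run.
    return min(timer.repeat(repeat=repeat, number=number)) / number


# Hypothetical stand-in cases using only the standard library. One case creates
# a new small object on every loop; the other only reads an existing element.
CASES = {
    "object creation": ("array.array('l', (0, 1))", "import array"),
    "element access": ("row[1]", "row = [0, 1]"),
}

if __name__ == "__main__":
    for name, (stmt, setup) in CASES.items():
        per_loop = bench(stmt, setup, number=100_000)
        print(f"{name:16s} {per_loop * 1e9:8.1f} ns per loop")
```

Reporting the minimum of several repeats, rather than the mean, is a common
choice for micro-benchmarks because it is the figure least affected by other
activity on the machine.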

Once the problem has been reduced to optimizing these small, self-contained
benchmarks, they can be made publicly available together with an explanation
of what the problem is and what each benchmark is intended to measure. After
that, I suggest issuing a call for contributions (on this list and the scipy
list, for example) to optimize this code, and sparking discussion about it (a
Wiki could work great here). I'm pretty sure there are enough brainy,
challenge-hungry people on these lists to help solve the problem.

If, after these efforts, some issues still can't be solved, at least the
problem would be much more focused and many more people could think about it
(hopefully, the solution will not depend on the intricacies of
Numeric/Numarray), so it might be possible to send it to the general Python
list in the hope that some guru would be willing to help us.

Well, this is my proposal. Uh, sorry for the length of the message. You may
think I've smoked too much, and maybe you are right. However, I'm so
convinced that such a Numeric/Numarray unification would be a Very Good
Thing that I have willingly spent some time on this proposal (and I look
forward to contributing in one way or another if it goes ahead).

Cheers,

[1] http://www.pfdubois.com/numpy/
[2] http://sourceforge.net/mailarchive/message.php?msg_id=10608642
[3] http://stsdas.stsci.edu/numarray/numarray-1.1.html/node18.html
[4] http://sourceforge.net/mailarchive/message.php?msg_id=10582525
[5] http://aspn.activestate.com/ASPN/Mail/Message/scipy-dev/2299767
[6] http://matplotlib.sourceforge.net/matplotlib.numerix.html


-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""
