[Numpy-discussion] Re: Re-implementation of Python Numerical arrays (Numeric) available for download
perry at stsci.edu
Mon Nov 26 12:59:03 CST 2001
> From: Chris Barker <chrishbarker at home.net>
> To: Perry Greenfield <perry at stsci.edu>,
> numpy-discussion at lists.sourceforge.net
> Subject: [Numpy-discussion] Re: Re-implementation of Python
> Numerical arrays (Numeric) available
> for download
> I used poor wording. When I wrote "datatypes", I meant data types in a
> much higher order sense. Perhaps structures or classes would be a better
> term. What I mean is that it should be easy to use and manipulate the
> same multidimensional arrays from both Python and C/C++. In the current
> Numeric, most folks generate a contiguous array, and then just use the
> array->data pointer to get what is essentially a C array. That's fine if
> you are using it in a traditional C way, with fixed dimension, one
> datatype, etc. What I'm imagining is having an object in C or C++ that
> could be easily used as a multidimensional array. I'm thinking C++ would
> probably be necessary, and probably templates as well, which is why blitz++
> looked promising. Of course, blitz++ only compiles with a few up-to-date
> compilers, so you'd never get it into the standard library that way!
Yes, that was an important issue (C++ and the Python Standard Library).
And yes, it is not terribly convenient to access multi-dimensional
arrays in C (of varying sizes). We don't solve that problem in the
way a C++ library could. But I suppose that some might say that C++
libraries may introduce their own, new problems. But coming up with
the one solution to all scientific computing appears well beyond our
grasp at the moment. If someone does see that solution, let us know!
> I agree, but from the newsgroup, it is clear that a lot of folks are
> very reluctant to use something that is not part of the standard
We agree that getting into the standard library is important.
> > > > We estimate
> > > > that numarray is probably another order of magnitude worse,
> > > > i.e., that 20K element arrays are at half the asymptotic
> > > > speed. How much should this be improved?
> > >
> > > A lot. I use arrays smaller than that most of the time!
> > >
> > What is good enough? As fast as current Numeric?
> As fast as current Numeric would be "good enough" for me. It would be a
> shame to go backwards in performance!
> > (IDL does much
> > better than that for example).
> My personal benchmark is MATLAB, which I imagine is similar to IDL in
We'll see if we can match current performance (or at least present usable
alternative approaches that are faster).
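
One way to read the "half the asymptotic speed" figure quoted above, as a rough sketch: assume the time for an operation on n elements is a fixed per-call overhead plus a constant cost per element. Under that assumption (a simplification, not a measurement of Numeric or numarray), the fraction of asymptotic speed reached at size n is n / (n + n_half), where n_half is the array size at which half speed is reached. The function name and the 2K comparison value below are illustrative only.

```python
def fraction_of_asymptotic_speed(n, n_half):
    """Fraction of peak per-element throughput reached at array size n.

    Assumed model: time(n) = overhead + n * cost_per_element, so that
    n_half = overhead / cost_per_element.
    """
    return n / (n + n_half)


# n_half = 20000 is the estimate quoted in the thread for numarray;
# n_half = 2000 is a hypothetical value one order of magnitude smaller,
# included only for comparison.
for n in (10, 100, 1000, 20000):
    print(n,
          fraction_of_asymptotic_speed(n, 20000),
          fraction_of_asymptotic_speed(n, 2000))
```

If the model holds, arrays of a few hundred elements would be dominated almost entirely by the fixed overhead, which is why the per-call cost matters so much for small arrays.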
> > 10 element arrays will never be
> > close to C speed in any array based language embedded in an
> > interpreted environment.
> Well, sure, I'm not expecting that
> > 100, maybe, but will be very hard.
> > 1000 should be possible with some work.
> I suppose MATLAB has it easier, as all arrays are doubles, and (until
> recently anyway) all variables were arrays, and all arrays were 2-d.
> NumPy is a lot more flexible than that. Is it the type and size checking
> that takes the time?
Probably, but we haven't started serious benchmarking yet so I wouldn't
put much stock in what I say now.
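
For anyone who wants to look at the small-array overhead themselves, here is a minimal timing sketch. It assumes a modern NumPy installation (imported as np) rather than the Numeric or numarray packages discussed in this thread, and the helper name seconds_per_element is made up for illustration. It only shows how the cost per element changes with array size. It does not say where the overhead comes from.

```python
import timeit

import numpy as np


def seconds_per_element(n, repeats=3, number=1000):
    """Best observed time per element for adding two length-n float arrays."""
    a = np.arange(n, dtype=np.float64)
    b = np.arange(n, dtype=np.float64)
    # timeit.repeat returns the total time for `number` calls, once per repeat.
    best = min(timeit.repeat(lambda: a + b, repeat=repeats, number=number))
    return best / (number * n)


# Sizes taken from the discussion: 10, 100, 1000 and 20K elements.
for n in (10, 100, 1000, 20000):
    print(n, seconds_per_element(n))
```

The absolute numbers depend on the machine and library version, so only the trend across sizes is meaningful.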
> One of the things I do a lot with are coordinates of points and
> polygons. Sets of points I can handle easily as an Nx2 array, but
> polygons don't work so well, as each polygon has a different number of
> points, so I use a list of arrays, which I have to loop over. Each
> polygon can have from about 10 to thousands of points (mostly 10-20,
> however). One way I have dealt with this is to store a polygon set as a
> large array of all the points, and another array with the indexes of the
> start and end of each polygon. That way I can transform the coordinates
> of all the polygons in one operation. It works OK, but sometimes it is
> more useful to have them in a sequence.
This is a good example of an ensemble of variable sized arrays.
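
The packed layout described in the quoted text can be sketched roughly as follows. This uses modern NumPy (imported as np) instead of Numeric, and the polygon coordinates and variable names are hypothetical, chosen only to illustrate the idea of one large point array plus start and end indexes.

```python
import numpy as np

# Three hypothetical polygons with different numbers of vertices.
polygons = [
    np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]),
    np.array([[2.0, 2.0], [3.0, 2.0], [3.0, 3.0], [2.0, 3.0]]),
    np.array([[5.0, 5.0], [6.0, 5.0], [6.0, 6.0], [5.5, 6.5], [5.0, 6.0]]),
]

# Pack every vertex into a single (N, 2) array.
points = np.concatenate(polygons)

# Record where each polygon starts and ends inside the packed array.
counts = np.array([len(p) for p in polygons])
ends = np.cumsum(counts)
starts = ends - counts

# Transform all polygons in one operation: scale by 2, then shift by (10, 20).
transformed = points * 2.0 + np.array([10.0, 20.0])

# When a sequence is more convenient, slice the packed array back apart.
as_sequence = [transformed[s:e] for s, e in zip(starts, ends)]
```

The slices in as_sequence are views into the packed array, so going back to a per-polygon sequence does not copy the point data.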
> > As mentioned,
> > we tend to deal with large data sets and so I don't think we have
> > a lot of such examples ourselves.
> I know large datasets were one of your driving factors, but I really
> don't want to make performance on smaller datasets secondary.
> Christopher Barker,
That's why we are asking, and it seems so far that there are enough
of those that do care about small arrays to spend the effort to
significantly improve the performance.