[Numpy-discussion] Comments on the Numarray/Numeric discussion
Travis E. Oliphant
oliphant at ee.byu.edu
Wed Jan 21 16:00:01 CST 2004
I would like to thank the contributors to the discussion as I think one
of the problems we have had lately is that people haven't been talking
much. Partly because we have some fundamental differences of opinion
caused by different goals and partly because we are all busy working on
a variety of other pressing projects.
The impression has been that Numarray will replace Numeric. I agree
with Perry that this has always been less of a consensus and more of a
hope. I am more than happy for Numarray to replace Numeric as long as
it doesn't mean all my code slows down. I would say the threshold is
that my code can't slow down by more than 10%.
If there is a code-base out there (Numeric) that can allow my code to
run 10% faster it will get used.
I also don't think it's ideal to have multiple N-D arrays running around
out there, but if they all have the same interface then it doesn't really
matter.
The two major problems I see with Numarray replacing Numeric are:
1) How is UFunc support? Can you create ufuncs in C easily (with a
single function call or something similar)?
2) Speed for small arrays (array creation is the big one).
It is actually quite a common thing to have a loop during which many
small arrays get created and destroyed. Yes, you can usually make such
code faster by "vectorizing" (if you can figure out how). But the
average scientist just wants to (and should be able to) just write a loop.
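The contrast between the plain loop and its "vectorized" form can be sketched as follows. This is only an illustration under stated assumptions: it uses the later NumPy package as a stand-in (not the Numeric or Numarray APIs discussed in this message), and the function names `norms_loop` and `norms_vectorized` are hypothetical.

```python
import numpy as np  # assumption: modern NumPy, not Numeric or Numarray


def norms_loop(points):
    """Compute the length of each 3-vector with an explicit loop.

    Every iteration creates and discards a small temporary array,
    which is the usage pattern described in the message.
    """
    out = []
    for p in points:
        v = np.array(p, dtype=float)  # small array created on each pass
        out.append(float(np.sqrt((v * v).sum())))
    return out


def norms_vectorized(points):
    """Same computation written as one whole-array expression."""
    a = np.asarray(points, dtype=float)  # a single (n, 3) array
    return np.sqrt((a * a).sum(axis=1))


points = [(3.0, 4.0, 0.0), (1.0, 2.0, 2.0), (0.0, 0.0, 5.0)]
loop_result = norms_loop(points)
vec_result = norms_vectorized(points)
print(loop_result)
print(vec_result)
```

Both functions return the same values. The loop version pays the array-creation cost once per point, so its running time is dominated by how quickly the library can build small arrays, while the vectorized version pays that cost once for the whole input.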
Regarding speed issues: there are actually situations where I am very
unsatisfied with Numeric's speed, and so the goal for
Numarray should not be to achieve some percentage of Numeric's
performance but to beat it.
Frankly, I don't see how you can get the speed that I'm talking about while
carrying around a lot of extras like byte-swapping support,
memory-mapping support, and record-array support.
*Question*: Is there some way to turn on a flag in Numarray so that all
of the extra stuff is ignored (i.e. create a small-array that looks on a
binary level just like a Numeric array)? It would seem to me that
this is the only way that the speed issue will go away.
Given that 1) Numeric already works and all of my code
depends on it, 2) Numarray doesn't seem to have support for general-purpose
ufuncs (can the scipy.special package be ported to
Numarray?), 3) Numarray is slower for the common tasks I end up using
SciPy for, and 4) I actually understand the Numeric code base quite well,
I have a hard time justifying switching over to Numarray.
Thanks again for the comments.
Konrad Hinsen wrote:
> On 21.01.2004, at 19:44, Joe Harrington wrote:
>> This is a necessarily long post about the path to an open-source
>> replacement for IDL and Matlab. While I have tried to be fair to
>> those who have contributed much more than I have, I have also tried to
>> be direct about what I see as some fairly fundamental problems in the
>> way we're going about this. I've given it some section titles so you
> You raise many good points here. Some comments:
> I'd say the fundamental problem is that "we" don't exist as a coherent
> group. There are a few developer groups (e.g. at STScI and Enthought) who
> write code primarily for their own need and then make it available. The
> rest of us are what one could call "power users": very interested in the
> code, knowledgeable about its use, but not contributing to its
> development other than through testing and feedback.
>> THE PROBLEM
>> We are not following the open-source development model. Rather, we
> True. But is it perhaps because that model is not so well adapted to our
> situation? If you look at Linux (the OpenSource reference), it started
> out very differently. It was a fun project, done by hobby programmers
> who shared an idea of fun (kernel hacking). Linux was not goal-oriented
> in the beginning. No deadlines, no usability criteria, but lots of
> technical challenges.
> Our situation is very different. We are scientists and engineers who
> want code to get our projects done. We have clear goals, and very
> limited means, plus we are mostly someone's employees and thus not free
> to do as we would like. On the other hand, our project doesn't provide
> the challenges that attract the kind of people who made Linux big. You
> don't get into the news by working on NumPy, you don't work against
> Microsoft, etc. Computational science and engineering just isn't the
> same as kernel hacking.
> I develop two scientific Python libraries myself, more specialized and
> thus with a smaller market share, but the situation is otherwise
> similar. And I work much like the Numarray people do: I write the code
> that I need, and I invest minimal effort in distribution and marketing.
> To get the same code developed in the Linux fashion, there would have
> to be many more developers. But they just don't exist. I know of three
> people worldwide whose competence in both Python/C and in the
> application domain is good enough that they could work on the code base.
> This is not enough to build a networked development community. The
> potential NumPy community is certainly much bigger, but I am not sure it
> is big enough. Working on NumPy/Numarray requires the combination of
> not-so-frequent competences, plus availability. I am not saying it can't
> be done, but it sure isn't obvious that it can be.
>> Release it in a way that as many people as possible will get it,
>> install it, use it for real work, and contribute to it. Make the main
>> focus of the core development team the evaluation and inclusion of
>> contributions from others. Develop a common vision for the program,
> This requires yet different competences, and thus different people. It
> takes people who are good at reading others' code and communicating with
> them about it.
> Some people are good programmers, some are good scientists, some are
> good communicators. How many are all of that - *and* available?
>> I know that Perry's group at STScI and the fine folks at Enthought
>> will say they have to work on what they are being paid to work on.
>> Both groups should consider the long term cost, in dollars, of
>> spending those development dollars 100% on coding, rather than 50% on
>> coding and 50% on outreach and intake. Linus himself has written only
> You are probably right. But does your employer think long-term? Mine
>> applications, yet in much less than 7 years Linux became a viable
>> operating system, something much bigger than what we are attempting
> Exactly. We could be too small to follow the Linux way.
>> 1. We should identify the remaining open interface questions. Not,
>> "why is numeric faster than numarray", but "what should the syntax
>> of creating an array be, and of doing different basic operations".
> Yes, a very good point. Focus on the goal, not on the legacy code.
> However, a technical detail that should not be forgotten here: NumPy and
> Numarray have a C API as well, which is critical for many add-ons and
> applications. A C API is more closely tied to the implementation than a
> Python API. It might thus be difficult to settle on an API and then work
> on efficient implementations.
>> 2. We should identify what we need out of the core plotting
>> capability. Again, not "chaco vs. pyxis", but the list of
>> requirements (as an astronomer, I very much like Perry's list).
> 100% agreement. For plotting, defining the interface should be easier
> (no C stuff).