[Numpy-discussion] updating Numeric

Joe Harrington jh at oobleck.astro.cornell.edu
Mon Jan 24 08:55:41 CST 2005


Hi Travis,

Perry may not see a problem with updating numeric, but I do, at least
in the short term.  It takes resources and time away from the issue at
hand, which I believe you yourself raised.  Certainly they are your
resources to allocate as you wish (this being open source), but please
consider the following.

This whole dispute arises from a single question to which we don't yet
know the answer:

	WHY is numarray slower than numeric for small arrays?

Why not just do the work to answer the question?  Then we can have a
discussion on the direction we want to go in that is based on actual
information.  At this point we know each other's opinions and why we
hold them, but we don't have the key information to make any
decisions.

Let's say, hypothetically, that there is a way to fix numarray to be
fast for small arrays without breaking it in other important ways.
Would it really be worth perpetuating numeric rather than working on
unifying the packages and the community?  If the problems are not
fundamental to our respective values, they can be fixed, and we can
move forward with the great volume of work that's needed to make this
a viable data analysis environment for the masses.

If the problems *are* fundamental to our values, we can work on
compromise solutions knowing *what* we are actually working around,
and unifying elsewhere when possible.  We wouldn't waste any more
years wringing our hands about unification.

Perry (and others) have summarized a few ideas on why numarray might
be slower.  One of those ideas, namely the use of new-style classes in
numarray, might mean that all the code-bumming in the world won't fix
the problem.  Numeric fans would likely say that speed is worth the
inconvenience of old-style coding.  Numarray fans wouldn't, and that
would be that: we'd be in the realm of co-existence solutions.  We'd
move forward implementing them, documenting them, etc.

I would think it worthwhile to check at least that possibility before
proceeding with other work.  To check it, someone who is familiar with
numeric needs to convert it (or an appropriate subset of it) to
new-style classes, and profile both versions.  If the array creation
time jumps by the factor we've seen, we need look no further.  Rather,
we'd need to focus the discussion on whether to continue using
new-style classes in numarray.

If the two packages *do* have irreconcilable differences, then a
coexistence approach makes a lot of sense, and a numeric update would
be an important first step.  We've talked about two approaches: the user
chooses a package at runtime, or things start "light" and the software
detects cases where "heavy" features get used and augments those
arrays on the fly.

We know what needs to be done to figure out where the problems lie.
Why not work on that next, and put this argument to bed once and for
all?

--jh--
