[Numpy-discussion] the direction and pace of development
hinsen at cnrs-orleans.fr
Wed Jan 21 13:27:00 CST 2004
On 21.01.2004, at 19:44, Joe Harrington wrote:
> This is a necessarily long post about the path to an open-source
> replacement for IDL and Matlab. While I have tried to be fair to
You raise many good points here. Some comments:
> those who have contributed much more than I have, I have also tried to
> be direct about what I see as some fairly fundamental problems in the
> way we're going about this. I've given it some section titles so you
I'd say the fundamental problem is that "we" don't exist as a coherent
group. There are a few developer groups (e.g. at STScI and Enthought)
who write code primarily for their own needs and then make it available.
The rest of us are what one could call "power users": very interested
in the code, knowledgeable about its use, but not contributing to its
development other than through testing and feedback.
> THE PROBLEM
> We are not following the open-source development model. Rather, we
True. But is it perhaps because that model is not so well adapted to
our situation? If you look at Linux (the OpenSource reference), it
started out very differently. It was a fun project, done by hobby
programmers who shared an idea of fun (kernel hacking). Linux was not
goal-oriented in the beginning. No deadlines, no usability criteria,
but lots of technical challenges.
Our situation is very different. We are scientists and engineers who
want code to get our projects done. We have clear goals, and very
limited means, plus we are mostly someone's employees and thus not free
to do as we would like. On the other hand, our project doesn't provide
the challenges that attract the kind of people who made Linux big. You
don't get into the news by working on NumPy, you don't work against
Microsoft, etc. Computational science and engineering just isn't the
same as kernel hacking.
I develop two scientific Python libraries myself, more specialized and
thus with a smaller market share, but the situation is otherwise
similar. And I work much like the Numarray people do: I write the code
that I need, and I invest minimal effort in distribution and marketing.
To get the same code developed in the Linux fashion, there would have
to be many more developers. But they just don't exist. I know of three
people worldwide whose competence in both Python/C and in the
application domain is good enough that they could work on the code
base. This is not enough to build a networked development community.
The potential NumPy community is certainly much bigger, but I am not
sure it is big enough. Working on NumPy/Numarray requires a
combination of fairly uncommon skills, plus availability. I am not
saying it can't be done, but it sure isn't obvious that it can be.
> Release it in a way that as many people as possible will get it,
> install it, use it for real work, and contribute to it. Make the main
> focus of the core development team the evaluation and inclusion of
> contributions from others. Develop a common vision for the program,
This requires yet another set of skills, and thus different people. It
takes people who are good at reading others' code and communicating
with them about it.
Some people are good programmers, some are good scientists, some are
good communicators. How many are all of that - *and* available?
> I know that Perry's group at STScI and the fine folks at Enthought
> will say they have to work on what they are being paid to work on.
> Both groups should consider the long term cost, in dollars, of
> spending those development dollars 100% on coding, rather than 50% on
> coding and 50% on outreach and intake. Linus himself has written only
You are probably right. But does your employer think long-term? Mine
> applications, yet in much less than 7 years Linux became a viable
> operating system, something much bigger than what we are attempting
Exactly. We could be too small to follow the Linux way.
> 1. We should identify the remaining open interface questions. Not,
> "why is numeric faster than numarray", but "what should the syntax
> of creating an array be, and of doing different basic operations".
Yes, a very good point. Focus on the goal, not on the legacy code.
However, a technical detail that should not be forgotten here: NumPy
and Numarray have a C API as well, which is critical for many add-ons
and applications. A C API is more closely tied to the implementation
than a Python API. It might thus be difficult to settle on an API and
then work on efficient implementations.
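To illustrate the Python half of that point, here is a minimal sketch of
an interface that is fixed independently of how the data is stored. All
names in it (ArrayBackend, ListBackend, TypedArrayBackend, demo) are
hypothetical and are not part of NumPy or Numarray; it uses only the
standard library. A C API could not be swapped this easily, because
C extensions typically depend on the memory layout directly.

```python
from abc import ABC, abstractmethod
from array import array as typed_array


class ArrayBackend(ABC):
    """Hypothetical Python-level interface: it specifies behaviour only,
    not how the elements are stored."""

    @abstractmethod
    def from_sequence(self, values):
        """Create a 1-D array of floats from a Python sequence."""

    @abstractmethod
    def add(self, a, b):
        """Return the element-wise sum of two arrays of equal length."""

    @abstractmethod
    def to_list(self, a):
        """Convert an array back to a plain Python list of floats."""


class ListBackend(ArrayBackend):
    """Stores the elements in an ordinary Python list."""

    def from_sequence(self, values):
        return [float(v) for v in values]

    def add(self, a, b):
        if len(a) != len(b):
            raise ValueError("arrays must have the same length")
        return [x + y for x, y in zip(a, b)]

    def to_list(self, a):
        return list(a)


class TypedArrayBackend(ArrayBackend):
    """Stores the elements in a contiguous buffer of C doubles."""

    def from_sequence(self, values):
        return typed_array("d", (float(v) for v in values))

    def add(self, a, b):
        if len(a) != len(b):
            raise ValueError("arrays must have the same length")
        return typed_array("d", (x + y for x, y in zip(a, b)))

    def to_list(self, a):
        return a.tolist()


def demo(backend):
    """User code written against the interface only."""
    a = backend.from_sequence([1, 2, 3])
    b = backend.from_sequence([10, 20, 30])
    return backend.to_list(backend.add(a, b))
```

The function demo() runs unchanged with either backend, which is what
settling the interface first would buy at the Python level.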
> 2. We should identify what we need out of the core plotting
> capability. Again, not "chaco vs. pyxis", but the list of
> requirements (as an astronomer, I very much like Perry's list).
100% agreement. For plotting, defining the interface should be easier
(no C stuff).