[Numpy-discussion] the direction and pace of development

Perry Greenfield perry at stsci.edu
Wed Jan 21 13:29:01 CST 2004

Joe Harrington writes:
> This is a necessarily long post about the path to an open-source
> replacement for IDL and Matlab.  While I have tried to be fair to
> those who have contributed much more than I have, I have also tried to
> be direct about what I see as some fairly fundamental problems in the
> way we're going about this.  I've given it some section titles so you
> can navigate, but I hope that you will read the whole thing before
> posting a reply.  I fear that this will offend some people, but please
> know that I value all your efforts, and offense is not my intent.
No offense taken.


> We are not following the open-source development model.  Rather, we
> pay lip service to it.  Open source's development mantra is "release
> early, release often".  This means release to the public, for use, a
> package that has core capability and reasonably-defined interfaces.
> Release it in a way that as many people as possible will get it,
> install it, use it for real work, and contribute to it.  Make the main
> focus of the core development team the evaluation and inclusion of
> contributions from others.  Develop a common vision for the program,
> and use that vision to make decisions and keep efforts focused.
> Include contributing developers in decision making, but do make
> decisions and move on from them.
> Instead, there are no packages for general distribution.  The basic
> interfaces are unstable, and not even being publicly debated to decide
> among them (save for the past 3 days).  The core developers seem to
> spend most of their time developing, mostly out of view of the
> potential user base.  I am asked probably twice a week by different
> fellow astronomers when an open-source replacement for IDL will be
> available.  They are mostly unaware that this effort even exists.
> However, this indicates that there are at least hundreds of potential
> contributors of application code in astronomy alone, as I don't nearly
> know everyone.  The current efforts look rather more like the GNU
> project than Linux.  I'm sorry if that hurts, but it is true.
I both agree and disagree with this. I agree in the sense that 
many consider these desirable traits of an open source project.
I disagree in the sense that many projects don't meet all of these
traits and yet are still useful to some degree. Even Python is not
released often, nor is it generally packaged by the core group. You
will find packaging by special interest groups that may or may not be
up to date for various platforms. There is a whole spectrum of other
useful open source projects that don't satisfy these requirements.
I don't mean that in a defensive way; it's certainly fair to ask
what is going wrong in the Python numeric world. But doing the above
alone doesn't necessarily guarantee that you will be successful in
attracting feedback and contributions; other factors also influence
how a project develops.

We have had experience with the packaging issue for PyRAF, and it
isn't quite so simple. The binary package approach didn't always
make life simpler for the user (arguably, we have found the source
distribution approach more trouble-free than our original release).
Packaging one's own version of Python as a binary raises issues
with LD_LIBRARY_PATH for which there are just no good solutions.

> I know that Perry's group at STScI and the fine folks at Enthought
> will say they have to work on what they are being paid to work on.
> Both groups should consider the long term cost, in dollars, of
> spending those development dollars 100% on coding, rather than 50% on
> coding and 50% on outreach and intake.  Linus himself has written only
> a small fraction of the Linux kernel, and almost none of the
> applications, yet in much less than 7 years Linux became a viable
> operating system, something much bigger than what we are attempting
> here.  He couldn't have done that himself, for any amount of money.
> We all know this.
I'd say we have tried our best to solicit input (and accept
contributed code as well). You have to remember that how readily
contributions come depends on the critical mass needed for
usefulness. For something like numarray or Numeric, that critical
mass is quite large. Few are interested in contributing when a package
can do very little and an older package exists that can do more.
By the time it has comparable functionality, it is already quite
large. Many projects like that start with a small group before
others join in. There are others where the critical mass is low,
and many join in while functionality is still relatively limited.

> Here is what I suggest:
> 1. We should identify the remaining open interface questions.  Not,
>    "why is numeric faster than numarray", but "what should the syntax
>    of creating an array be, and of doing different basic operations".
>    If numeric and numarray are in agreement on these issues, then we
>    can move on, and debate performance and features later.
Well, there have been, and continue to be, those who can't come to
an agreement on even the interface. These issues have been raised
many times in the past, and consensus was often hard to achieve.
We tended to lean towards backward compatibility unless the change
seemed really necessary. For type coercion and error handling, 
we thought it was. But I don't think we have tried to shield the 
decision-making process from the community. I do think the difficulty
in achieving a sense of consensus is a problem.

Perhaps we are going about the process in the wrong way; I'd welcome
suggestions as to how to improve that. 

> 2. We should identify what we need out of the core plotting
>    capability.  Again, not "chaco vs. pyxis", but the list of
>    requirements (as an astronomer, I very much like Perry's list).
> 3. We should collect or implement a very minimal version of the
>    featureset, and document it well enough that others like us can do
>    simple but real tasks to try it out, without reading source code.
>    That documentation should include lists of things that still need
>    to be done.
> 4. We should release a stand-alone version of the whole thing in the
>    formats most likely to be installed by users on the four most
>    popular OSs: Linux, Windows, Mac, and Solaris.  For Linux, this
>    means .rpm and .deb files for Fedora Core 1 and Debian 3.0r2.
>    Tarballs and CVS checkouts are right out.  We have seen that nobody
>    in the real world installs them.  To be most portable and robust,
>    it would make sense to include the Python interpreter, named such
>    that it does not stomp on versions of Python in the released
>    operating systems.  Static linking likewise solves a host of
>    problems and greatly reduces the number of package variants we will
>    have to maintain.
Static linking introduces its own problems. We have gone this
route in the past, so we have some knowledge of what it entails.

> 5. We should advertize and advocate the result at conferences and
>    elsewhere, being sure to label it what it is: a first-cut effort
>    designed to do a few things well and serve as a platform for
>    building on.  We should also solicit and encourage people either to
>    work on the included TODO lists or to contribute applications.  One
>    item on the TODO list should be code converters from IDL and Matlab
>    to Python, and compatibility libraries.
> 6. We should then all continue to participate in the discussions and
>    development efforts that appeal to us.  We should keep in mind that
>    evaluating and incorporating code that comes in is in the long run
>    much more efficient than writing the universe ourselves.
> 7. We should cut and package new releases frequently, at least once
>    every six months.  It is better to delay a wanted feature by one
>    release than to hold a release for a wanted feature.  The mountain
>    is climbed in small steps.
> The open source model is successful because it follows closely
> something that has worked for a long time: the scientific method, with
> its community contributions, peer review, open discussion, and
> progress mainly in small steps.  Once basic capability is out there,
> we can twiddle with how to improve things behind the scenes.
In general, I can't disagree much with most of these. I'm happy for
others to smack us when we stray from this sort of process.
Please do; it would be the only way we (and others) would learn how to
really do it. But we have released fairly frequently, if not with
rpms. We provide pretty good support as well. We have incorporated
most of the code sent to us, and considered and acted on many
feature requests and performance issues. But the numarray core is
not something one would casually change without spending some time
understanding how it works; I suspect that is the biggest inhibitor
to changes to the core. We are happy to work with others on it if 
they have the time to do so.

If anyone feels we have discouraged people contributing, please let 
me know (privately if you wish).

> The recipe above sounds a lot like SciPy.  SciPy began as a way to
> integrate the necessary add-ons to numeric for real work.  It was
> supposed to test, document, and distribute everything together.  I am
> aware that there are people who use it, but the numbers are small and
> they seem to be tightly connected to Enthought for support and
> application development.  Enthought's focus seems to be on servicing
> its paying customers rather than on moving SciPy development along,
> and I fear they are building an installed customer base on interfaces
> that were not intended to be stable.
I don't feel this is fair to Enthought. It is not my impression
that they have made any money off of the scipy distribution directly
(Chaco is a different issue). As far as I can tell, the only benefit
they've generally gotten from it is from the visibility of sponsoring
it, and perhaps from their own use of a few of the tools they have included
as part of it. I doubt that their own clients have driven its
development in any significant way. I'd guess they have sunk far 
more money into scipy than gotten out of it. I don't want others to 
get the impression that it is the other way around.

In fact, on a number of occasions I have heard users complain 
about the documentation, and the standard response is "please
help us improve it", which draws very little in response. They have
gone the extra mile in soliciting contributions and help with
maintaining it. Perhaps it is part of my open source blind spot, but
I have trouble seeing what else they could be doing to encourage
others to contribute to scipy (besides paying them, which they have
done as well!). 

The only thing I can think of is that because they are doing it,
others feel that they don't have to. Perhaps there is a similar issue
with numarray. I don't know.

> So, I will raise the question: is SciPy the way?  Rather than forking
> the plotting and numerical efforts from what SciPy is doing, should we
> not be creating a new effort to do what SciPy has so far not
> delivered?  These are not rhetorical or leading questions.  I don't
> know enough about the motivations, intentions, and resources of the
> folks at Enthought (and elsewhere) to know the answer.  I do think
> that such a fork will occur unless SciPy's approach changes
> substantially.  The way to decide is for us all to discuss the
> question openly on these lists, and for those willing to participate
> and contribute effort to declare so openly.  I think all that is
> needed, either to help SciPy or replace it, is some leadership in the
> direction outlined above.  I would be interested in hearing, perhaps
> from the folks at Enthought, alternative points of view.  Why are
> there no packages for popular OSs for SciPy 0.2?  Why are releases so
> infrequent?  If the folks running the show at scipy.org disagree with
> many others on these lists, then perhaps those others would like to
> roll their own.  Or, perhaps stable/testing/unstable releases of the
> whole package are in order.
I think the answer is simple. Supporting distributions of the software
they have pulled into scipy is a hell of a lot of work; work that
nobody is paying them for. It gives me the shivers to think of our
taking on all they have for scipy. 

> Judging by the number of PhDs in sigs, there are a lot of researchers
> on this list.  I'm one, and I know that our time for doing core
> development or providing the aforementioned leadership is very
> limited, if not zero.  Later we will be in a much better position to
> contribute application software.  However, there is a way we can
> contribute to the core effort even if we are not paid, and that is to
> put budget items in grant and project proposals to support the work of
> others.  Those others could be either our own employees or
> subcontractors at places like Enthought or STScI.  A handful of
> contributors would be all we'd need to support someone to produce OS
> packages and tutorial documentation (the stuff core developers find
> boring) for two releases a year.
By all means, if there is a groundswell of support for development, 
please let us know. 

Perry Greenfield 