[Numpy-discussion] Numeric3

Gary Strangman strang at nmr.mgh.harvard.edu
Mon Feb 7 08:42:17 CST 2005

Hi all,

[Long post: a long-time lurker here sharing his numerical 
experiences/input re: "the numeric split" and Numeric3.]

I'm a long-time python user (and follower of this list), a heavy user in 
the numerical realm, and an "incidental" contributor to SciPy (statistical 
functions in stats.py). First, I'm impressed (and indeed amazed) at all 
the hard work folks have put in--usually without pay--to make Numpy and 
Numarray and SciPy and Matplotlib (to name the major players in this 
discussion). I am indebted to each of you. Second, I must say that I think 
this protracted discussion is EXACTLY what the python numerical community 
needs. It appears as though we have critical mass in terms of code and 
interest (and opinions), and just need to bring them all together.

Since the inception of numarray, I've just been standing back and waiting 
to see how this all sorts itself out. My stats functions work for lists 
and numpy arrays. I didn't want to convert them to numarray (given my lack 
of spare time) unless that was going to be the "new path". It appears, 
however, even after all this time, there isn't (quite) a consensus on a 
new path. After the recent message-storm, however, I am very hopeful. I 
see 4 issues at stake here, with the caveat that I'm not the code writer, 
just a user ...

1) Multiarray in Python core: I agree that this (as already stated) is (1) 
mostly irrelevant for heavy-duty numerical folks, BUT (2) is critical to 
provide for python a standardized exchange data format. Being able to 
trivially (i.e., out of the python box) load, save, pickle and load again 
on a new platform N-D array objects would be a big deal for me (and many I 
work with). Such a core object can't favor any particular size array ... 
so it would need to provide good (or excellent) performance on both little 
arrays (a la numpy) and on big arrays (a la numarray). To be in keeping 
with other python objects, it seems this object would need to be tight, 
fast and easily extensible. I *think* this is exactly what Numeric3 is 
intended to do. Getting this right is tricky, but it seems like current 
solutions are EXTREMELY close.
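
To make the "load, save, pickle and load again" idea concrete, here is a minimal sketch using only the standard library. It is not Numeric3 and not any existing multiarray API: the stdlib `array` type is 1-D only, so the shape is carried alongside the flat buffer, and the names `pack` and `roundtrip` are hypothetical helpers invented for this illustration.

```python
import pickle
from array import array


def pack(shape, values):
    """Bundle a flat typed buffer with its shape (hypothetical exchange record)."""
    count = 1
    for dim in shape:
        count *= dim
    if count != len(values):
        raise ValueError("shape does not match the number of values")
    # 'd' is the stdlib typecode for C doubles.
    return {"shape": tuple(shape), "typecode": "d", "data": array("d", values)}


def roundtrip(record):
    """Serialize to bytes and back, as one would when moving data between machines."""
    payload = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
    return pickle.loads(payload)


original = pack((2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
restored = roundtrip(original)
```

A core multiarray object would presumably make the shape bookkeeping above unnecessary, which is the point of having it "out of the python box".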

2) Numerical function "packaging": Looking at this from a distance (i.e., 
as a user) numerical packaging is too complex. The python spirit seems to 
call for being a bit more of a splitter (and encapsulator) than a lumper. 
For example, to do web programming in python, one often depends on several 
separate modules (html, xml, cgi, etc) rather than one all-encompassing 
one. To give numerical work the same modular feel (as well as structure 
and insulation from installation headaches), it seems that collections of 
numerical operations should be similarly organized on themes (e.g., 
timeseries analysis, morphology (nd_image?), statistics (stats.py), etc). 
This way, if you're doing timeseries analysis you import the relevant 
modules and go to work ... no worries about installing stuff required for 
morphology or statistics that you don't need. I realize this might require 
(in some cases) more refactoring, but I don't think I'm supporting 
anything *that* different from what already exists. Granted, the notion of 
what's "basic" vs. advanced is relative (e.g., where do you put fft, or 
linear_algebra?). But if made modular and encapsulated (e.g., an fft.py, 
linear_algebra.py, integration.py, morphology.py) and made available both 
individually and as part of one or more suites (see #4 below), it's easier 
to build on existing code rather than reinvent. Interestingly, although 
not obvious, this is how Matlab works too. Your first $500 pays for basic 
array-based functionality (fft, psd, etc). Then there are add-on toolkits 
(at $500 each) specifically for timeseries analysis, imaging, wavelets, 
engineering simulation, etc.

3) Plotting: Until perhaps a year ago, I did almost all my computations in 
python, then saved data out to disk, and read it into matlab to plot it. I 
hated that situation, but it was the only way to quickly and easily look 
at data interactively, with zoom, easy subplotting, etc. Matplotlib has 
all but solved this problem (thanks!!). John indicates that the ultimate 
goal with matplotlib is to provide plotting, not just scientific plotting, 
which is even better! In that case, though, and in keeping with my 
previous comment, perhaps the name matplotlib is a little misleading 
(suggesting scientific plotting only). Again, if I were familiar with 
python but just starting timeseries analysis, I would expect to load my 
data into a (multiarray) python object, import timeseries.py, import 
plotlib.py (i.e., matplotlib) and go to work doing timeseries analysis ... 
be that at LLNL, Wall St, or in my neuro lab.

4) Matlab-like Environment: Both SciPy and Matplotlib have a stated goal 
of creating a matlab-style environment. This is great, as it might help 
wean more folks off of Matlab or IDL and into the python community. 
However, I think that this (as has been suggested ... sorry, I forgot who) 
should be a separate goal from any of the above. Building an environment 
with python is different from providing functionality to python (think 
website design *environment* vs. tools for handling web content ... 
they're different). SciPy, with its integration goal, plus matplotlib's 
plotting goal would be an outstanding combination to this end.

In sum, I pretty much agree with most previous contributors, with one 
exception. I do agree with Konrad that, in theory, the best analysis approach 
is to build a custom class for each domain of investigation. I've even 
done that a couple times. However, of the dozens of researchers (i.e., 
users, not developers) I've talked about this with, almost none have the 
time (or even desire) to design and develop a class-based approach. Their 
goal is data analysis and evaluation. So, they pull the data into python 
(or, more typically, matlab or IDL), analyze it, plot the results, try to 
get a plot that's pretty enough for publication, and then go on to the 
next project ... no time for deciding what functionality should be a 
method or an attribute, subclassing hierarchies, etc. In those rare 
cases where they have the desire, matlab (I don't know about IDL) doesn't 
give you the option, meaning that users (as opposed to developers) 
probably won't go that (unfamiliar and more time-consuming) route.

I apologize for the long post that "simply" supports others' opinions, 
particularly when my opinion cannot count for much (after all, I'm not 
likely to be doing much of the coding). But, I did want to express my 
appreciation for ALL the hard work that's been done, and to give the 
strongest encouragement to hashing things out now. I would LOVE to see 
some consensus on (1) what a core multiarray object should look like, 
(2-3) how to imbue python with numerical functionality and plotting for 
generations to come ;-) and (4) how to create environments for scientific 
exploration within python. I think we're SOOO close ...


Gary Strangman, PhD        |  Director, Neural Systems Group
Office: 617-724-0662       |  Massachusetts General Hospital
Fax:    617-726-4078       |  149 13th Street, Ste 10018
strang/@/nmr.mgh.harvard.edu |  Charlestown, MA  02129
