[Numpy-discussion] Meta: too many numerical libraries doing the same
eric at enthought.com
Tue Nov 27 16:16:02 CST 2001
Blitz++ is very cool, but I'm not sure it would make a very good
underpinning for reimplementing Numeric. There are 2 (well maybe 3) main
issues.
The first issue deals with how you declare arrays in Blitz++. Both the
element type and the number of dimensions are template arguments, as in
Array<float,2> for a 2D array of floats.
The big deal here is that the dimensionality of Array is a template
parameter, not a constructor parameter. In other words, 2D arrays are
effectively a different type than 3D arrays. Numeric, on the other hand,
represents arrays of all dimensions with a single class/type. For Python,
this makes the most sense. I think you could finagle some way of getting
blitz to work, but I'm not sure it would be the desired elegant solution.
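
To make the distinction concrete, here is a minimal sketch. It is not
Blitz++ or Numeric code: StaticRankArray, DynamicRankArray, element_count
and rank_of are hypothetical names made up for illustration. It only shows
what changes when the rank moves from a template parameter to a
constructor argument.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Number of elements implied by a shape (product of the extents).
template <typename Shape>
std::size_t element_count(const Shape& shape) {
    std::size_t n = 1;
    for (std::size_t extent : shape) n *= extent;
    return n;
}

// Hypothetical stand-in for a Blitz++-style array: the rank N is part of
// the type, so it must be known at compile time.
template <typename T, std::size_t N>
class StaticRankArray {
public:
    explicit StaticRankArray(const std::array<std::size_t, N>& shape)
        : shape_(shape), data_(element_count(shape)) {}
    std::size_t size() const { return data_.size(); }
    T& flat(std::size_t i) { assert(i < data_.size()); return data_[i]; }
private:
    std::array<std::size_t, N> shape_;
    std::vector<T> data_;
};

// Hypothetical stand-in for a Numeric-style array: a single type whose
// rank is just the length of the shape passed to the constructor.
class DynamicRankArray {
public:
    explicit DynamicRankArray(const std::vector<std::size_t>& shape)
        : shape_(shape), data_(element_count(shape)) {}
    std::size_t rank() const { return shape_.size(); }
    std::size_t size() const { return data_.size(); }
private:
    std::vector<std::size_t> shape_;
    std::vector<double> data_;
};

// 2D and 3D static-rank arrays are unrelated types.
static_assert(!std::is_same<StaticRankArray<float, 2>,
                            StaticRankArray<float, 3>>::value,
              "arrays of different rank are different types");

// With static rank, code that accepts "any array" must itself be a
// template, instantiated once per (T, N) pair it is used with.
template <typename T, std::size_t N>
std::size_t rank_of(const StaticRankArray<T, N>&) { return N; }

// With dynamic rank, one ordinary function covers every dimensionality.
inline std::size_t rank_of(const DynamicRankArray& a) { return a.rank(); }
```

A Python wrapper needs one object type that can hold an array of any
rank, which is what the second class gives for free. Wrapping the first
would mean choosing among the instantiations at run time.
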
I've also tinkered with building a simple C++ templated (non-blitz)
implementation of Numeric for kicks, but kept coming back to using the
dreaded void* to store the data arrays. I still haven't completely given up
on a templated solution, but it wasn't as obvious as I thought it would be.
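
For illustration, one way the void* tends to creep back in looks roughly
like the sketch below. This is an assumed design, not the code described
above: ElemType, RawArray, sum_typed and sum are hypothetical names. The
array object is a single type holding untyped storage plus a run-time
tag, and templates are only used for the inner loops.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical element-type tag, playing the role of a typecode.
enum class ElemType { Int32, Float64 };

// Type-erased array: one C++ type regardless of element type.
struct RawArray {
    ElemType type;
    std::size_t length;
    void* data;  // the element type is only known at run time
};

// Templated kernel: written once, compiled once per element type.
template <typename T>
double sum_typed(const void* data, std::size_t length) {
    const T* p = static_cast<const T*>(data);
    double total = 0.0;
    for (std::size_t i = 0; i < length; ++i) {
        total += static_cast<double>(p[i]);
    }
    return total;
}

// Run-time dispatch from the tag to the matching instantiation.
inline double sum(const RawArray& a) {
    switch (a.type) {
        case ElemType::Int32:
            return sum_typed<std::int32_t>(a.data, a.length);
        case ElemType::Float64:
            return sum_typed<double>(a.data, a.length);
    }
    assert(false && "unknown element type");
    return 0.0;
}
```

The templates remove the duplicated loops, but the container itself still
has to forget the element type so that a single class can represent every
array.
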
Compiling Blitz++ is slooooow. scipy.compiler spits out 200-300 line
extension modules at the most. Depending on how complicated the expressions
are, it can take 0.5-1.5 minutes to compile a single extension function on an
850 MHz PIII. I can't imagine how long it would take to compile Numeric
arrays for 1 through 11 dimensions (the most blitz supports as I remember)
for all the different data types with 100s of extension functions. The cost
wouldn't be linear because you do pay a one-time hit for some of the
template instantiation. Also, I've heard gcc 3.0 might be better. Still,
it'd be a painful development process.
Portability. This comes at two levels. The first is that blitz++ makes
heavy-duty demands of the compiler. gcc works fine, which is a huge plus, but
a lot of other compilers don't. MSVC is the most notable of these because
it is so heavily used on Windows.
The second level is the portability of C++ extension modules in general.
I've run into this on Windows, but I think it is an issue pretty much
everywhere. For example, MSVC and GCC compiled C extension libraries can
call each other on Windows because they are binary compatible. C++
classes are _not_ binary compatible. This has come up for me with wxPython.
The standard version that Robin Dunn distributes is compiled with MSVC. If
you build a small extension with gcc that makes wxPython calls, it'll link
just fine, but it seg-faults during execution.
Does anyone know if the same sorta thing is true on the Unices? If it is,
and Numeric were written in C++, then you'd have to compile extension modules
that use Numeric arrays with the same compiler that was used to compile
Numeric. This can lead to all sorts of hassles, and it has made me lean
back towards C as the preferred language for something as fundamental as
Numeric. (Note that I do like C++ for modules that don't really define an
API called by other modules).
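
A common middle ground, sketched here as an assumption rather than
anything proposed for Numeric, is to keep the C++ classes private and
expose only a C-linkage API built around an opaque handle. The names
ArrayImpl, na_array and the na_array_* functions are hypothetical.
Callers never see a C++ class layout or a mangled name, so they only
depend on the C calling convention.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Internal C++ implementation; never exposed across the module boundary.
class ArrayImpl {
public:
    explicit ArrayImpl(std::size_t n) : data_(n, 0.0) {}
    std::size_t size() const { return data_.size(); }
    double* data() { return data_.data(); }
private:
    std::vector<double> data_;
};

extern "C" {

// Opaque handle: callers only hold a pointer to an incomplete struct.
typedef struct na_array na_array;

// Create an array of n zeros; returns a null pointer on failure.
na_array* na_array_new(std::size_t n) {
    try {
        return reinterpret_cast<na_array*>(new ArrayImpl(n));
    } catch (...) {
        return nullptr;  // never let a C++ exception cross the C boundary
    }
}

// Destroy an array inside the module that allocated it.
void na_array_free(na_array* a) {
    delete reinterpret_cast<ArrayImpl*>(a);
}

// Number of elements in the array.
std::size_t na_array_size(const na_array* a) {
    assert(a != nullptr);
    return reinterpret_cast<const ArrayImpl*>(a)->size();
}

// Pointer to the first element, for callers that want raw access.
double* na_array_data(na_array* a) {
    assert(a != nullptr);
    return reinterpret_cast<ArrayImpl*>(a)->data();
}

}  // extern "C"
```

Allocation and deallocation both stay inside the module, so the caller
and the library do not have to share a C++ runtime. The price is that
the public API looks like C even though the implementation is not.
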
Ok, so maybe there's a 4th point. Paul D. pointed out that blitz isn't much
of a win unless you have lazy evaluation (which scipy.compiler already
provides). I also think improved speed _isn't_ the biggest goal of a
reimplementation (although it can't be sacrificed either). I'm more excited
about a code base that more people can comprehend. Perry G. et al.'s mixed
Python/C implementation with the code generators is a very good idea and a
step in this direction. I hope the speed issues for small arrays can be
solved. I also hope the memory-mapped aspect doesn't complicate the code