[Numpy-discussion] Re: Trying out Numeric3

Scott Gilbert xscottg at yahoo.com
Wed Mar 23 03:00:04 CST 2005


--- Michiel Jan Laurens de Hoon <mdehoon at ims.u-tokyo.ac.jp> wrote:
> >> Do 4 gigabyte 1D numerical python arrays occur in practice?
> 
> Why? There needs to be a good reason to break compatibility. Who needs
> this?
> 

I (and others I work with) routinely deal with 1D datasets that are
multiple gigabytes in length.  Working with terabyte datasets is on my near
horizon.  For lots of reasons, I don't/can't use Python/Numeric for very
much of this sort of thing, but it would be nice if I could.  The "32 bits
is enough for anyone" design has bitten me with lots of tools (not just
Python).  The Python core will fix its int/intp problem eventually; I
can't see why Numeric3 shouldn't avoid the problem now.

As a concrete case that I'm sure has been done, consider memory mapped file
arrays.  64 bit platforms can mmap huge files without using huge amounts
of real memory.
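To make that concrete, here's a minimal sketch of the kind of access I
mean (the file name is made up); on a 64 bit box the map costs address
space, not RAM:

     import mmap

     # Map a multi-gigabyte file read-only; the OS pages data in on
     # demand, so very little real memory is used.  On a 32 bit
     # platform this breaks once the file outgrows the address space.
     f = open("samples.dat", "rb")
     buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
     # An array package with 64 bit sizes could wrap 'buf' as a 1D
     # array directly, without copying.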



I try to remain a lurker on this list, but since I've already broken my
silence, let me add a few other notes and then I'll go back to being
silent...  I'll try to sort them by priority.



Pickling performance is important to us at my work.  We use pickling to
pass data across Unix pipes, through shared memory, across sockets on
Gig-E, etc.  Typically we'll have a dictionary containing some metadata
and a few large chunks (1-100 MBytes would be common) of Numeric array
data.  We'd like to transfer hundreds of these per second.  Currently, we
pickle the array into a string in memory, then pickle the string across the
conduit (pipe, socket, or shared memory).  For some reason, pickling a
Numeric array directly to the file object is slower than the two-stage
process...
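For illustration, here's roughly what our two-stage send looks like
(send_array and the metadata tuple are my own shorthand, not anything
from Numeric):

     import pickle

     def send_array(conduit, arr):
         # Stage 1: flatten the array into a raw byte string in memory.
         payload = arr.tostring()
         # Stage 2: pickle metadata and the string to the pipe/socket.
         # For whatever reason this beats pickling 'arr' directly.
         pickle.dump((arr.shape, arr.typecode()), conduit, 2)
         pickle.dump(payload, conduit, 2)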

If the new Numeric3 didn't break too much compatibility with the original
Numeric but pickled much faster, we'd probably be in a hurry to upgrade
based on this feature alone.

The new pickling protocol, which allows a generator to copy small chunks
at a time instead of making one entire binary string copy, could save the
cost of duplicating a 100 MByte array into a 100 MByte string.
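Something like the following is what I have in mind; stream_array is my
name for it, and it assumes Numeric3 keeps supporting the buffer protocol:

     def stream_array(arr, fileobj, chunk=1 << 20):
         # Copy the data out a megabyte at a time instead of
         # materializing one monolithic string with tostring().
         b = buffer(arr)              # read-only view of the raw bytes
         for start in range(0, len(b), chunk):
             fileobj.write(b[start:start + chunk])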




The reason we use pickling like we do is to pass data between processes. 
Almost all of our work machines have multiple processors (typically 4).  A
lot of times, the multi-process design is cleaner and less buggy, but there
are also times when we'd prefer to use multiple threads in a single
process.

It's unfortunate that the GIL prevents much real concurrency with
multiple threads.  It would be nice if the ufuncs and other numerical
algorithms released the GIL whenever possible.  I know the limitations of
the Python buffer protocol add significant headaches in this area, but
it's something to think about.
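The pattern we'd like to write is simple enough; as things stand, the
multiply below holds the GIL for its whole inner loop, so the two threads
serialize onto one processor:

     import threading
     from Numeric import ones, zeros

     n = 10 ** 6
     a1, b1, out1 = ones(n, 'd'), ones(n, 'd'), zeros(n, 'd')
     a2, b2, out2 = ones(n, 'd'), ones(n, 'd'), zeros(n, 'd')

     def worker(a, b, out):
         # If the ufunc released the GIL around its C loop, two of
         # these could genuinely run on two processors at once.
         out[:] = a * b

     t1 = threading.Thread(target=worker, args=(a1, b1, out1))
     t2 = threading.Thread(target=worker, args=(a2, b2, out2))
     t1.start(); t2.start()
     t1.join(); t2.join()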



We have a wide group of smart engineering folks using Python/Numeric, but
most of them are not computer scientists or software engineers.  That is,
they spend all day writing software, but they know just enough about
programming to solve their problems, and almost none of them know anything
about the internals of Python or Numeric.  Complicated rules about whether
something returns a scalar versus an array, or a copy versus a view, add
frustration and hard-to-find bugs.

This has been beaten up on this list quite a bit, and there is probably too
much momentum behind the case-by-case strategy that is now in place, but
please count my vote for always getting an array copy (copy on write) from
subscripting unless you explicitly ask for a view, and for always returning
a rank-0 array instead of a scalar.
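A two-line example of the copy/view trap I mean, given Numeric's
slice-is-a-view rule:

     from Numeric import zeros

     a = zeros(10, 'd')
     b = a[2:5]          # a view in Numeric, not a copy
     b[0] = 99.0         # silently changes a[2] too -- exactly the
                         # hard-to-find bug our engineers stumble into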

I agree with the other poster who pointed out that arrays are mutable and
that, likewise, rank-0 arrays should be mutable.  I know it's unlikely to
happen, but it would also be nice to see the Python parser change slightly
to treat a[] as a[()].  Then the mutability of rank-0 arrays would fit
elegantly with that of rank-(n >= 1) arrays.  It's a syntax error now, so
there wouldn't be a backwards compatibility issue.
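Spelled out, with a[] as the proposed sugar (and assuming rank-0 arrays
get the usual subscript machinery):

     from Numeric import array

     a = array(3.14)     # a rank-0 array
     a[()] = 2.71        # mutate it in place, just like a[i] = x on a
                         # rank-1 array; the proposed a[] would mean
                         # exactly the same thing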




We commonly use data types that aren't in Numeric.  The most prevalent
example at my work is complex-short.  It looks like I can wrap the new
"Void" type to handle this to some extent.  Will indexing (subscripting) a
class derived from a Numeric3 array return the derived class?

     class Derived(Numeric3.ArrayType):
         pass

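     # 's' = short; the trailing axis of length 2 would hold the
     # (real, imag) halves of each complex-short element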
     d = Derived(shape=(200, 200, 2), typecode='s')
     if isinstance(d[0], Derived):
         print "This is what I mean"

I don't really expect Numeric3 to add all of the possible oddball types,
but I think it's important to remember that other types are out there
(fixed point for DSP, mu-law for audio, 16 bit floats for graphics, IBM's
decimal64/decimal128 types, double-double and quad-double for increased
precision, quaternions of standard types, ...).  It's one thing to treat
these like "record arrays"; it's another thing for them to have overloaded
arithmetic operators.
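As a rough sketch of that difference, here's a toy complex-short with an
overloaded multiply (the class, its layout, and the lack of any overflow
handling are all mine):

     from Numeric import zeros

     class ComplexShort:
         # An (n, 2) array of shorts; the last axis holds (real, imag).
         # Record-array treatment gives you the storage; overloaded
         # operators make it behave like numbers.
         def __init__(self, data):
             self.data = data
         def __mul__(self, other):
             ar, ai = self.data[:, 0], self.data[:, 1]
             br, bi = other.data[:, 0], other.data[:, 1]
             out = zeros(self.data.shape, self.data.typecode())
             out[:, 0] = ar * br - ai * bi
             out[:, 1] = ar * bi + ai * br
             return ComplexShort(out)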

Since Numeric3 can't support every type under the sun, it would be nice
if, when the final version goes into the Python core, the C API and Python
library functions used "duck typing" so that other array implementations
could work to whatever extent possible.  In other words, users should not
be required to derive from the Numeric3 type in order to create new kinds
of arrays that work with sufficiently generic Numeric3 routines.  Simply
having the required attributes (shape, strides, itemsize, ...) of a
Numeric3 array should be enough to be treated like one.
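For instance, a library routine like this one (nbytes is just my example
name) shouldn't have to care what class it was handed:

     def nbytes(arr):
         # Duck typed: anything exposing .shape and .itemsize will do,
         # whether or not it derives from the Numeric3 array type.
         n = arr.itemsize
         for dim in arr.shape:
             n *= dim
         return n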




This last one is definitely pie-in-the-sky, but I thought I'd mention it.
Since the 64 bit Alphas are expensive and pretty much on the way out of
production, we've stepped back to 32 bit versions of x86/Linux.  The Linux
boxes are cheaper, faster, and smaller, but not 64 bit.  It would be really
great to use Numeric to directly manipulate huge (greater than 2**32 bytes)
files on a 32 bit platform.  This would require a smarter paging scheme
than simply mmapping the whole thing, and I don't think any of the Python
array packages has proposed a good solution for it.  I realize it adds
considerable complexity to switch from a single buffer object pointing to
the entire block of data to multiple buffers pinning down pieces of the
data at a time, but the result would be pretty useful.
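A sketch of the windowing idea, assuming an mmap that accepts an offset
(map_window is my name for the helper):

     import mmap

     def map_window(f, offset, length):
         # Map only a window of a huge file, so a 32 bit process can
         # walk a >4 GB dataset one piece at a time.  'offset' must be
         # a multiple of mmap.ALLOCATIONGRANULARITY.
         return mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ,
                          offset=offset)

A smarter array object would create and release such windows behind the
scenes as indexing walks across different regions of the file.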





I realize this is a lot of commentary from someone who doesn't contribute
much of anything back to the Numeric/Numarray/SciPy community.  If you got
this far, thanks for taking the time to read it.  I appreciate the work
you're doing.

Cheers,
    -Scott






