[Numpy-discussion] memory-mapped Numeric arrays: arrayfrombuffer version 2

Kragen Sitaker kragen at pobox.com
Mon Feb 18 12:13:19 CST 2002


(I thought I had sent this mail on January 30, but I guess I was
mistaken.)

Eric Nodwell writes:
> Since I have a 2.4GB data file handy, I thought I'd try this
> package with it.  (Normally I process this data file by reading
> it in a chunk at a time, which is perfectly adequate.)  Not
> surprisingly, it chokes:

Yep, that's pretty much what I expected.  I think that adding code to
support mapping some arbitrary part of a file should be fairly
straightforward --- do you want to run the tests if I write the code?
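Roughly what I have in mind is below.  This is only a sketch, written
against modern Python and numpy for illustration (the real module uses
Numeric plus arrayfrombuffer, and the function name and signature here
are made up):

    import mmap
    import os
    import numpy  # stand-in here for Numeric plus arrayfrombuffer

    def maparray_region(filename, dtype, offset=0, count=-1):
        # mmap offsets must be multiples of the allocation
        # granularity, so round down and skip the remainder inside
        # the buffer.
        itemsize = numpy.dtype(dtype).itemsize
        fd = os.open(filename, os.O_RDONLY)
        try:
            filesize = os.fstat(fd).st_size
            aligned = offset - offset % mmap.ALLOCATIONGRANULARITY
            skip = offset - aligned
            if count < 0:
                count = (filesize - offset) // itemsize
            m = mmap.mmap(fd, skip + count * itemsize,
                          access=mmap.ACCESS_READ, offset=aligned)
        finally:
            os.close(fd)  # the mapping outlives the descriptor
        return numpy.frombuffer(m, dtype=dtype, count=count,
                                offset=skip)

So maparray_region('big.dat', 'float64', offset=8, count=1000) would
map just those thousand elements without touching the rest of the
file.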

>   File "/home/eric/lib/python2.2/site-packages/maparray.py", line 15, in maparray
>     m = mmap.mmap(fn, os.fstat(fn)[stat.ST_SIZE])
> OverflowError: memory mapped size is too large (limited by C int)

The specific error, though, was *not* what I expected.  It's a sort of
alarming message: it suggests that mapping >2G files won't work even
on LP64 systems, where longs and pointers are 64 bits but ints are
only 32.  The comments in the mmap module say:

   The map size is restricted to [0, INT_MAX] because this is the current
   Python limitation on object sizes. Although the mmap object *could* handle
   a larger map size, there is no point because all the useful operations
   (len(), slicing(), sequence indexing) are limited by a C int.

Horrifyingly, this is true.  Even the buffer-interface function that
arrayfrombuffer uses to get the size of the buffer returns an int, not
a size_t.  This is a serious bug in the buffer interface, IMO, and I
doubt it will be fixed: the buffer interface is apparently due for a
revamp soon at any rate, so small changes won't be welcomed,
especially ones that break binary backwards compatibility, as this fix
would on LP64 platforms.
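(For what it's worth, the LP64 layout is easy to confirm from Python
itself; ctypes here is just a modern convenience for illustration, not
something the module needs:)

    import ctypes

    # On an LP64 platform (typical 64-bit Unix), int stays 32 bits
    # while long and pointers widen to 64:
    print(ctypes.sizeof(ctypes.c_int))     # 4
    print(ctypes.sizeof(ctypes.c_long))    # 8 on LP64
    print(ctypes.sizeof(ctypes.c_void_p))  # 8 on LP64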

Fixing this, so that LP64 Pythons can mmap >2G files (their
birthright!), is a bit of work: probably a matter of writing a
modified mmap module that supports a saner version of the buffer
interface (named methods instead of a type-object slot) and whose
mappings can't be close()d while arrays still reference them, to boot.
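In Python terms, the shape of that wrapper would be something like the
following.  This is a caricature only; the names are invented and the
real work would have to happen in C:

    import mmap

    class PinnedMap:
        """Caricature of the proposed wrapper: the size is reported
        through a named method, which can return any Python integer
        rather than squeezing through a C-int type slot, and close()
        is refused so arrays can't be left pointing at unmapped
        memory."""

        def __init__(self, fileno, length, offset=0):
            self._m = mmap.mmap(fileno, length,
                                access=mmap.ACCESS_READ, offset=offset)

        def buffer_size(self):
            return len(self._m)

        def close(self):
            raise TypeError("mapping is pinned; arrays may still "
                            "reference it")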

Until then, this module only lets you memory-map files up to two gigs.

> (details: Python 2.2, numpy 20.3, Pentium III, Debian Woody, Linux
> kernel 2.4.13, gcc 2.95.4)

My kernel is 2.4.13 too, but I don't have any large files, and I don't
know whether any of my kernel, my libc, or my Python even support
them.

> I'm not a big C programmer, but I wonder if there is some way for
> this package to overcome the 2GB limit on 32-bit systems.  That
> could be useful in some situations.

I don't know, but I think it would probably require extensive code
changes throughout Numpy.
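That said, one workaround that stays under the limit without touching
Numpy at all is to map the file one window at a time, an mmap-flavored
version of your chunk-at-a-time reading.  A rough sketch (modern numpy
again for illustration; `process` is whatever per-window work you
want):

    import mmap
    import os
    import numpy

    WINDOW = 1 << 28  # 256 MB per mapping, comfortably under INT_MAX

    def for_each_window(filename, dtype, process):
        # Assumes the item size divides WINDOW (true for the usual
        # power-of-two dtypes), so every offset stays aligned.
        itemsize = numpy.dtype(dtype).itemsize
        size = os.path.getsize(filename)
        fd = os.open(filename, os.O_RDONLY)
        try:
            offset = 0
            while offset < size:
                length = min(WINDOW, size - offset)
                length -= length % itemsize  # whole elements only
                if length == 0:
                    break  # trailing partial element, if any
                m = mmap.mmap(fd, length, access=mmap.ACCESS_READ,
                              offset=offset)
                process(numpy.frombuffer(m, dtype=dtype))
                m.close()  # raises BufferError if process kept a view
                offset += length
        finally:
            os.close(fd)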

-- 
<kragen at pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The sages do not believe that making no mistakes is a blessing. They believe, 
rather, that the great virtue of man lies in his ability to correct his 
mistakes and continually make a new man of himself.  -- Wang Yang-Ming



