[Numpy-discussion] memory-mapped Numeric arrays: arrayfrombuffer version 2

Mathew Yeates mathew at fugue.jpl.nasa.gov
Mon Feb 18 12:26:58 CST 2002


Has anyone checked out VMaps at http://snafu.freedom.org/Vmaps/ ?
This might be what you're looking for.

Mathew
> (I thought I had sent this mail on January 30, but I guess I was
> mistaken.)
> 
> Eric Nodwell writes:
> > Since I have a 2.4GB data file handy, I thought I'd try this
> > package with it.  (Normally I process this data file by reading
> > it in a chunk at a time, which is perfectly adequate.)  Not
> > surprisingly, it chokes:
> 
> Yep, that's pretty much what I expected.  I think that adding code to
> support mapping some arbitrary part of a file should be fairly
> straightforward --- do you want to run the tests if I write the code?
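> 
> For concreteness, a rough sketch of what I mean (untested, and it
> assumes an mmap.mmap that accepts an offset argument, plus an
> ALLOCATIONGRANULARITY constant to align against --- that's exactly
> the part that needs the new code):
> 
>     import mmap
> 
>     def maparray_slice(fn, start, nbytes):
>         # Map only bytes [start, start+nbytes) of file descriptor fn.
>         # mmap offsets must be a multiple of the allocation
>         # granularity, so round down and remember the slop.
>         gran = mmap.ALLOCATIONGRANULARITY
>         aligned = (start // gran) * gran
>         slop = start - aligned
>         m = mmap.mmap(fn, nbytes + slop, access=mmap.ACCESS_READ,
>                       offset=aligned)
>         return m, slop   # the caller's data is m[slop:slop+nbytes]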
> 
> >   File "/home/eric/lib/python2.2/site-packages/maparray.py", line 15,
> >   in maparray
> >     m = mmap.mmap(fn, os.fstat(fn)[stat.ST_SIZE])
> >   OverflowError: memory mapped size is too large (limited by C int)
> 
> The wording of that error message led me to something that was *not*
> what I expected.
> 
> That's a sort of alarming message --- it suggests that it won't work
> on >2G files even on LP64 systems, where longs and pointers are 64
> bits but ints are 32 bits.  The comments in the mmap module say:
> 
>    The map size is restricted to [0, INT_MAX] because this is the current
>    Python limitation on object sizes. Although the mmap object *could* handle
>    a larger map size, there is no point because all the useful operations
>    (len(), slicing(), sequence indexing) are limited by a C int.
> 
> Horrifyingly, this is true.  Even the buffer-interface function that
> arrayfrombuffer uses to get the size of the buffer returns an int
> size, not a size_t.  This is a serious bug in the buffer interface,
> IMO, and I doubt it will be fixed --- the buffer interface is
> apparently due for a revamp soon anyway, so small changes won't be
> welcome, especially ones that break binary backwards compatibility,
> as this one would on LP64 platforms.
> 
> Fixing this, so that LP64 Pythons can mmap >2G files (their
> birthright!), is a bit of work --- probably a matter of writing a
> modified mmap() module that supports a saner version of the buffer
> interface (with named methods instead of a type object slot), and
> can't be close()d, to boot.
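> 
> The interface I have in mind is nothing fancy.  In Python terms the
> shape would be roughly this (a sketch only --- the real thing has to
> be a C module calling mmap(2) itself):
> 
>     class BigMap:
>         # Sizes come back from an ordinary named method as a Python
>         # long, instead of through the getreadbuffer slot, which
>         # truncates them to a C int.
>         def __init__(self, fileno, nbytes):
>             self._fileno = fileno   # the C version calls mmap(2) here
>             self._nbytes = nbytes   # full size, no 2**31 ceiling
>         def size(self):
>             return self._nbytes     # named method, not a type slot
>         # deliberately no close(): the mapping lives as long as the
>         # object does, so arrays built on it can't be yanked away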
> 
> Until then, this module only lets you memory-map files up to two gigs.
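> 
> A guard along these lines would at least make it fail up front with a
> plain answer, instead of the OverflowError from deep inside mmap:
> 
>     import os, stat
> 
>     TWO_GB = 2**31 - 1   # the C-int ceiling
> 
>     def checked_size(fn):
>         # What maparray could do before calling mmap.mmap: report
>         # the limit in English rather than in C types.
>         nbytes = os.fstat(fn)[stat.ST_SIZE]
>         if nbytes > TWO_GB:
>             raise ValueError("%d-byte file exceeds the 2 GB mmap limit"
>                              % nbytes)
>         return nbytes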
> 
> > (details: Python 2.2, numpy 20.3, Pentium III, Debian Woody, Linux
> > kernel 2.4.13, gcc 2.95.4)
> 
> My kernel is 2.4.13 too, but I don't have any large files, and I don't
> know whether any of my kernel, my libc, or my Python even support
> them.
> 
> > I'm not a big C programmer, but I wonder if there is some way for
> > this package to overcome the 2GB limit on 32-bit systems.  That
> > could be useful in some situations.
> 
> I don't know, but I think it would probably require extensive code
> changes throughout Numpy.
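> 
> That said, you don't need Numpy's help to stay under the limit: map
> the file a window at a time, much the way you already read it in
> chunks.  With an offset-capable mmap (the same assumption as the
> earlier sketch), that's about ten lines:
> 
>     from __future__ import generators   # for yield, in 2.2
>     import mmap
> 
>     WINDOW = 256 * 1024 * 1024   # comfortably under the 2 GB ceiling
> 
>     def mapped_windows(fn, total, window=WINDOW):
>         # Yield successive read-only windows over a big file.  A
>         # window size that's a multiple of the allocation granularity
>         # keeps every offset aligned.
>         assert window % mmap.ALLOCATIONGRANULARITY == 0
>         pos = 0
>         while pos < total:
>             n = min(window, total - pos)
>             m = mmap.mmap(fn, n, access=mmap.ACCESS_READ, offset=pos)
>             yield m
>             m.close()
>             pos += n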
> 
> -- 
> <kragen at pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
> The sages do not believe that making no mistakes is a blessing. They believe, 
> rather, that the great virtue of man lies in his ability to correct his 
> mistakes and continually make a new man of himself.  -- Wang Yang-Ming
> 





