[Numpy-tickets] [NumPy] #971: numpy.memmap 'offset' parameter docs are almost entirely wrong.

NumPy numpy-tickets@scipy....
Fri Dec 19 00:55:15 CST 2008


#971: numpy.memmap 'offset' parameter docs are almost entirely wrong.
--------------------+-------------------------------------------------------
 Reporter:  0ion9   |       Owner:  somebody
     Type:  defect  |      Status:  new     
 Priority:  normal  |   Milestone:  1.3.0   
Component:  Other   |     Version:  devel   
 Severity:  normal  |    Keywords:          
--------------------+-------------------------------------------------------
 The current docstring for the 'offset' parameter reads:

 "In the file, array data starts at this offset. offset should be a
 multiple of the byte-size of dtype. Requires shape=None. The default is 0"

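 In actual fact, shape != None is usable with offset, and is crucial when
 the file only contains *some* records of the type you want to memmap, with
 other data included in the file. Nor does offset have to be a multiple of
 the byte-size of dtype: the example below uses offset = 7 with 10-byte
 records.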

 Supposing you have a file of 27 bytes, with a 7-byte header and 10-byte
 records.
 Let's create one:

 echo -n "HEADER!-onerecord-tworecord" > /tmp/npmmap

 If you want to read the records, you can do:

 >>> import numpy as np

 >>> mydtype = "S10"

 >>> mmappedarray = np.memmap('/tmp/npmmap', mode='r', offset=7, dtype=mydtype)

 >>> mmappedarray
 memmap(['-onerecord', '-tworecord'],
       dtype='|S10')

 Now suppose there is some other data at the end of the file, so that the
 24 bytes after the header are no longer a multiple of the 10-byte record
 size:


 echo -n "HEADER!-onerecord-tworecordJUNK" > /tmp/npmmap


 You can handle that by explicitly stating the shape:

 >>> mmappedarray = np.memmap('/tmp/npmmap', mode='r', offset=7, dtype=mydtype, shape=2)

 >>> mmappedarray
 memmap(['-onerecord', '-tworecord'],
       dtype='|S10')
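
 The two cases above can be reproduced without touching /tmp by hand. What
 follows is only a sketch: read_records is a hypothetical helper name, the
 scratch file lives in a temporary directory, and under Python 3 the records
 come back as bytes objects rather than str. It also checks the case the
 examples imply but do not show, namely that omitting shape when trailing
 junk is present raises ValueError.

```python
import os
import tempfile

import numpy as np


def read_records(contents, offset, dtype, shape=None):
    """Hypothetical helper: write `contents` to a scratch file, memmap the
    records that start at `offset`, and return them as a plain list."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "npmmap")
        with open(path, "wb") as f:
            f.write(contents)
        mm = np.memmap(path, mode="r", offset=offset, dtype=dtype, shape=shape)
        records = mm.tolist()
        del mm  # release the mapping before the directory is removed
        return records


# 7-byte header followed by exactly two 10-byte records: shape may be omitted.
exact = read_records(b"HEADER!-onerecord-tworecord", 7, "S10")

# Trailing junk: an explicit shape selects just the two records.
with_junk = read_records(b"HEADER!-onerecord-tworecordJUNK", 7, "S10", shape=(2,))

# Trailing junk and no shape: 24 remaining bytes are not a multiple of 10.
try:
    read_records(b"HEADER!-onerecord-tworecordJUNK", 7, "S10")
    junk_without_shape = "ok"
except ValueError:
    junk_without_shape = "ValueError"
```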


 Therefore, here is a statement to replace "offset should be a multiple of
 the byte-size of dtype. Requires shape=None.":

 "If the number of bytes remaining after offset is evenly divisible by the
 byte-size of dtype, shape can be omitted. Otherwise, when you specify
 offset you must also specify shape.
 When creating or overwriting files, offset can be any non-negative
 integer, subject to file size limitations (see Notes)."

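 The divisibility rule in the proposed offset text is plain arithmetic on the
 file size, and can be sketched as below. shape_can_be_omitted is a
 hypothetical name used only for illustration; it is not part of NumPy.

```python
import numpy as np


def shape_can_be_omitted(file_size, offset, dtype):
    """Hypothetical helper: True if the bytes remaining after `offset`
    hold a whole number of `dtype` items, so memmap can infer the shape."""
    itemsize = np.dtype(dtype).itemsize
    remaining = file_size - offset
    return remaining >= 0 and remaining % itemsize == 0


# 27-byte file, 7-byte header: 20 remaining bytes, two whole 10-byte records.
clean_file = shape_can_be_omitted(27, 7, "S10")

# 31-byte file (4 bytes of trailing junk): 24 remaining bytes, not a multiple of 10.
junk_file = shape_can_be_omitted(31, 7, "S10")
```
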
 And for shape: "When reading, the bytes remaining in the file after offset
 must be >= the byte-size of an array of the specified dtype and shape, if
 shape is given.
 When writing a new file, shape must always be given. When overwriting
 (mode == 'w+') or appending (mode == 'r+') to an existing file, specifying
 a shape that extends beyond the end of the current file contents causes
 the file to be immediately extended to fit."

 for 'w+' mode docs, "overwriting an existing file may cause file size to
 grow or shrink as needed."
 (specified because, in combination with 'offset' > 0, the total file size
 after memmapping may differ from the memmapped array size)
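
 A sketch of the file-size behaviour described above, assuming the NumPy
 implementation still sizes the file as offset plus the array's byte count
 (that is how I read the memmap source; treat it as an assumption). The path
 is a scratch file in a temporary directory.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "npmmap")

    # 'w+' creates (or overwrites) the file; shape is mandatory here.
    mm = np.memmap(path, mode="w+", dtype="S10", offset=7, shape=(2,))
    mm[:] = [b"-onerecord", b"-tworecord"]
    mm.flush()
    array_nbytes = mm.nbytes
    del mm  # release the mapping before inspecting the file
    size_after_create = os.path.getsize(path)

    # 'r+' with a shape reaching past the end of the file extends the file.
    mm = np.memmap(path, mode="r+", dtype="S10", offset=7, shape=(3,))
    del mm
    size_after_extend = os.path.getsize(path)
```

 Here the array occupies 20 bytes while the file on disk is 7 bytes larger,
 which is the difference the parenthetical above is pointing at.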



 (I set the version for this bug to 'devel', but it is actually 1.2, which
 is not available in the bug-tracker)

-- 
Ticket URL: <http://scipy.org/scipy/numpy/ticket/971>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.

