[Numpy-tickets] [NumPy] #971: numpy.memmap 'offset' parameter docs are almost entirely wrong.
NumPy
numpy-tickets@scipy....
Fri Dec 19 00:55:15 CST 2008
#971: numpy.memmap 'offset' parameter docs are almost entirely wrong.
--------------------+-------------------------------------------------------
Reporter: 0ion9 | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone: 1.3.0
Component: Other | Version: devel
Severity: normal | Keywords:
--------------------+-------------------------------------------------------
"In the file, array data starts at this offset. offset should be a
multiple of the byte-size of dtype. Requires shape=None. The default is 0"
In fact, a non-None shape can be used together with offset, and it is
essential when the file contains only *some* records of the type you want
to memmap, alongside other data.
Suppose you have a 27-byte file consisting of a 7-byte header followed by
two 10-byte records.
Let's create one:
echo -n "HEADER!-onerecord-tworecord" > /tmp/npmmap
If you want to read the records, you can do:
>>> mydtype = "S10"
>>> mmappedarray = np.memmap('/tmp/npmmap', mode='r', offset=7, dtype=mydtype)
>>> mmappedarray
memmap(['-onerecord', '-tworecord'],
       dtype='|S10')
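Now suppose there is some other data at the end of the file:
echo -n "HEADER!-onerecord-tworecordJUNK" > /tmp/npmmap
The 24 bytes after the header are no longer a multiple of the 10-byte
record size, so omitting shape fails. You can handle that by stating the
shape explicitly: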
>>> mmappedarray = np.memmap('/tmp/npmmap', mode='r', offset=7, dtype=mydtype, shape=2)
>>> mmappedarray
memmap(['-onerecord', '-tworecord'],
       dtype='|S10')
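The two cases above can also be reproduced from Python alone. This is a
minimal sketch, assuming a recent NumPy release; the temporary path and the
variable names are illustrative and not part of the ticket.

```python
import os
import tempfile

import numpy as np

# Illustrative temporary file: 7-byte header, two 10-byte records, 4 junk bytes.
path = os.path.join(tempfile.mkdtemp(), "npmmap")
with open(path, "wb") as f:
    f.write(b"HEADER!-onerecord-tworecordJUNK")

# Without shape, the 24 bytes after the offset are not a multiple of 10,
# so memmap is expected to refuse with a ValueError.
try:
    np.memmap(path, mode="r", offset=7, dtype="S10")
    omitted_shape_ok = True
except ValueError:
    omitted_shape_ok = False

# With an explicit shape, only the two records are mapped and the
# trailing junk is ignored.
records = np.memmap(path, mode="r", offset=7, dtype="S10", shape=(2,))
first, second = bytes(records[0]), bytes(records[1])
```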
Therefore, here is a statement to replace "offset should be a multiple of
the byte-size of dtype. Requires shape=None.":
"If the number of bytes remaining after offset is evenly divisible by the
byte-size of dtype, shape can be omitted. Otherwise, when you specify
offset you must also specify shape.
When creating or overwriting files, offset can be any non-negative
integer, subject to file size limitations (see Notes)."
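The divisibility rule in the proposed wording is simple arithmetic on the
file size. The sketch below spells it out; `shape_can_be_omitted` is a
hypothetical helper written for this illustration, not a NumPy function,
and the file names are made up.

```python
import os
import tempfile

import numpy as np


def shape_can_be_omitted(path, dtype, offset=0):
    """Hypothetical helper: True if the bytes after `offset` hold a whole
    number of items of `dtype` (and at least one)."""
    remaining = os.path.getsize(path) - offset
    return remaining > 0 and remaining % np.dtype(dtype).itemsize == 0


tmpdir = tempfile.mkdtemp()
clean = os.path.join(tmpdir, "clean")
junk = os.path.join(tmpdir, "junk")
with open(clean, "wb") as f:
    f.write(b"HEADER!-onerecord-tworecord")      # 27 bytes
with open(junk, "wb") as f:
    f.write(b"HEADER!-onerecord-tworecordJUNK")  # 31 bytes

clean_ok = shape_can_be_omitted(clean, "S10", offset=7)  # (27 - 7) % 10 == 0
junk_ok = shape_can_be_omitted(junk, "S10", offset=7)    # (31 - 7) % 10 == 4
```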
And for shape: "When reading, the number of bytes remaining in the file
after offset must be >= the byte-size of an array of the specified dtype
and shape, if shape is given.
When writing a new file, shape must always be given. When overwriting
(mode == 'w+') or appending (mode == 'r+') to an existing file, specifying
a shape that extends beyond the end of the current file contents causes
the file to be immediately extended to fit."
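The extension behaviour for mode 'r+' can be checked directly. This is a
sketch under the assumption of a recent NumPy release; the path and the
third record's contents are invented for the example.

```python
import os
import tempfile

import numpy as np

# Illustrative file: 7-byte header plus two 10-byte records (27 bytes).
path = os.path.join(tempfile.mkdtemp(), "npmmap")
with open(path, "wb") as f:
    f.write(b"HEADER!-onerecord-tworecord")
size_before = os.path.getsize(path)

# Ask for three records: the mapping ends at byte 7 + 3 * 10 = 37, which is
# past the current end of file, so the file is grown when the map is created.
arr = np.memmap(path, mode="r+", offset=7, dtype="S10", shape=(3,))
size_after = os.path.getsize(path)

# Fill in the new record and push the change to disk.
arr[2] = b"-333record"
arr.flush()
del arr

with open(path, "rb") as f:
    contents = f.read()
```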
For the 'w+' mode docs: "Overwriting an existing file may cause the file
size to grow or shrink as needed."
(This is worth stating because, when 'offset' > 0, the total file size
after memmapping differs from the size of the memmapped array.)
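A short sketch of the 'w+' case, assuming a recent NumPy release: an
existing 100-byte file (an arbitrary size chosen for illustration) is
overwritten, and the resulting file size is offset plus the array size
rather than the array size alone.

```python
import os
import tempfile

import numpy as np

# Illustrative pre-existing file that is larger than what will be mapped.
path = os.path.join(tempfile.mkdtemp(), "npmmap")
with open(path, "wb") as f:
    f.write(b"x" * 100)
size_before = os.path.getsize(path)

# 'w+' discards the old contents; the old header bytes are not preserved.
arr = np.memmap(path, mode="w+", offset=7, dtype="S10", shape=(2,))
array_bytes = arr.nbytes           # 2 records * 10 bytes
size_after = os.path.getsize(path)  # offset + array_bytes
```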
(I set the version for this bug to 'devel', but it is actually 1.2, which
is not available in the bug-tracker)
--
Ticket URL: <http://scipy.org/scipy/numpy/ticket/971>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.