[Numpy-discussion] How to start at line # x when using numpy.memmap
Fri Aug 19 09:07:54 CDT 2011
On 08/19/2011 05:01 PM, Brent Pedersen wrote:
> On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin<firstname.lastname@example.org> wrote:
>> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen<email@example.com> wrote:
>>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote:
>>>> I would like to use numpy's memmap on some data files I have. The first
>>>> 12 or so lines of the files contain text (header information) and the
>>>> remainder has the numerical data. Is there a way I can tell memmap to
>>>> skip a specified number of lines instead of a number of bytes?
>>> First use standard Python I/O functions to determine the number of
>>> bytes to skip at the beginning and the number of data items. Then pass
>>> in `offset` and `shape` parameters to numpy.memmap.
>> Thanks for that suggestion. However, I'm unfamiliar with the I/O
>> functions you are referring to. Can you point me to do the
>> Thanks again,
> this might get you started:
> import numpy as np
> # make some fake data with 12 header lines.
> # binary mode, so the text header and the raw array bytes can share one file.
> with open('test.mm', 'wb') as fhw:
>     fhw.write(b"\n".join(b'header' for i in range(12)) + b"\n")
>     np.arange(100, dtype=np.uint).tofile(fhw)
> # use normal python io to determine the offset after 12 lines.
> with open('test.mm', 'rb') as fhr:
>     for i in range(12):
>         fhr.readline()
>     offset = fhr.tell()
I think that before reading a line, the program should
check whether the line starts with "#". Otherwise fhr.readline()
may return a very large chunk of data (maybe the rest of the file
content) that ought to be read only via memmap.
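
A rough sketch of that check, in case it helps. It assumes every header
line starts with "#" (which is not true of the 'header' lines in the fake
data above) and that the first byte of the binary data is not "#". The
helper name header_offset and the file layout are made up for
illustration. It peeks at a single byte before each readline(), so
readline() is only ever called on a header line, and then passes the
resulting `offset` and `shape` to numpy.memmap:

```python
import os
import tempfile

import numpy as np


def header_offset(path, marker=b"#"):
    """Return the byte offset of the first line that does not start with marker."""
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()
            # Peek at one byte so readline() is never called on binary data.
            if fh.read(1) != marker:
                return pos
            fh.readline()  # consume the rest of this header line


# Build a small test file: 12 '#' header lines followed by raw binary data.
path = os.path.join(tempfile.mkdtemp(), "test.mm")
data = np.arange(100, dtype=np.uint64)
with open(path, "wb") as fh:
    for i in range(12):
        fh.write(b"# header line %d\n" % i)
    data.tofile(fh)

# Skip the header by bytes and derive the item count from the file size.
offset = header_offset(path)
n_items = (os.path.getsize(path) - offset) // data.dtype.itemsize
mm = np.memmap(path, dtype=data.dtype, mode="r", offset=offset, shape=(n_items,))
```

If the data could legitimately begin with a "#" byte, this check is not
enough, and a fixed header line count (as in Brent's example) or a header
length stored in the file would be safer.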