[Numpy-discussion] Possible roadmap addendum: building better text file readers

Erin Sheldon erin.sheldon@gmail....
Mon Feb 27 12:02:41 CST 2012


Excerpts from Nathaniel Smith's message of Mon Feb 27 12:07:11 -0500 2012:
> On Mon, Feb 27, 2012 at 2:44 PM, Erin Sheldon <erin.sheldon@gmail.com> wrote:
> > What I've got is a solution for writing and reading structured arrays to
> > and from files, both in text files and binary files.  It is written in C
> > and python.  It allows reading arbitrary subsets of the data efficiently
> > without reading in the whole file.  It defines a class Recfile that
> > exposes an array-like interface for reading, e.g. x = rf[columns][rows].
> 
> What format do you use for binary data? Something tiled? I don't
> understand how you can read in a single column of a standard text or
> mmap-style binary file any more efficiently than by reading the whole
> file.

For binary, I just seek to the appropriate bytes on disk and read them;
no mmap is involved.  The user must, of course, supply an accurate dtype
describing the rows in the file.  This saves a lot of memory and time on
big files if you only need small subsets.
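
To make the binary case concrete, here is a minimal sketch of the
seek-and-read idea.  It is not the actual Recfile code: the function name
read_column and the demo file are made up for illustration, and it
assumes a flat file of packed, fixed-size records with no header.

```python
import os
import tempfile

import numpy as np


def read_column(path, dtype, name, rows):
    """Read one field of selected rows from a flat binary file of records.

    Hypothetical helper for illustration only; assumes packed records
    laid out exactly as described by `dtype`, with no file header.
    """
    dtype = np.dtype(dtype)
    # dtype.fields maps field name -> (field dtype, byte offset[, title])
    field_dtype, offset = dtype.fields[name][:2]
    out = np.empty(len(rows), dtype=field_dtype)
    with open(path, "rb") as f:
        for i, row in enumerate(rows):
            # Jump straight to the bytes of this field in this row.
            f.seek(row * dtype.itemsize + offset)
            buf = f.read(field_dtype.itemsize)
            out[i] = np.frombuffer(buf, dtype=field_dtype)[0]
    return out


# Demo: write five records, then read field "x" of rows 1 and 3 only.
data = np.zeros(5, dtype=[("id", "i4"), ("x", "f8")])
data["id"] = np.arange(5)
data["x"] = np.arange(5) * 1.5
path = os.path.join(tempfile.mkdtemp(), "demo.rec")
data.tofile(path)

x_subset = read_column(path, data.dtype, "x", [1, 3])
```

Only 8 bytes per requested row are read here; the rest of the file is
never touched.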

For ASCII, the approach is similar, except that care must be taken when
skipping over unread fields and rows.
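
A rough sketch of what skipping means for text, under simplifying
assumptions that are mine and not necessarily those of the real code:
one row per line, a single-character delimiter, and no quoted fields.
The name read_text_subset is hypothetical.  Unlike the binary case, the
offsets cannot be computed up front, so every line up to the last
requested row still has to be scanned, but unwanted rows are never split
or converted.

```python
import io


def read_text_subset(f, rows, cols, converters, delim=","):
    """Return the requested columns of the requested rows of a text file.

    Hypothetical helper for illustration only.  `converters` maps a
    column index to a callable that parses that column's text.
    """
    if not rows:
        return []
    wanted = set(rows)
    last = max(wanted)
    found = {}
    for lineno, line in enumerate(f):
        if lineno > last:
            break  # nothing further is needed; stop reading the file
        if lineno not in wanted:
            continue  # skipped row: scanned for its newline, never parsed
        fields = line.rstrip("\n").split(delim)
        # Convert only the requested fields; the others stay as raw text.
        found[lineno] = tuple(converters[c](fields[c]) for c in cols)
    return [found[r] for r in rows]


# Demo: four rows of (int, float, str); read columns 0 and 1 of rows 1 and 3.
text = io.StringIO("0,1.5,a\n1,2.5,b\n2,3.5,c\n3,4.5,d\n")
subset = read_text_subset(text, [1, 3], [0, 1], {0: int, 1: float})
```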

For writing binary, I just call tofile(), so the bytes correspond
directly between array and file.  For ASCII, I use the appropriate
format for each type.
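
As an illustration of both write paths (a sketch only: the FORMATS table
and format_row are hypothetical, and the particular format strings are
my choice for the example, not necessarily the ones used in the real
code):

```python
import os
import tempfile

import numpy as np

# Hypothetical per-kind formats: integers as %d, floats with 17
# significant digits, which is enough to round-trip a float64.
FORMATS = {"i": "%d", "u": "%d", "f": "%.17g"}


def format_row(row, dtype, delim=","):
    """Format one structured-array row as a delimited text line."""
    parts = []
    for name in dtype.names:
        kind = dtype[name].kind  # 'i', 'u', 'f', ...
        parts.append(FORMATS[kind] % row[name])
    return delim.join(parts)


data = np.zeros(3, dtype=[("id", "i4"), ("x", "f8")])
data["id"] = np.arange(3)
data["x"] = np.arange(3) * 1.5

# Binary: tofile() writes the array's bytes unchanged, so the file
# can be read back with the same dtype.
path = os.path.join(tempfile.mkdtemp(), "demo.rec")
data.tofile(path)
back = np.fromfile(path, dtype=data.dtype)

# ASCII: one line per row, one format per field type.
lines = [format_row(row, data.dtype) for row in data]
```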

Does this answer your question?
-e
-- 
Erin Scott Sheldon
Brookhaven National Laboratory

