[Numpy-discussion] Possible roadmap addendum: building better text file readers

Chris Barker chris.barker@noaa....
Tue Mar 6 16:45:34 CST 2012


On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque <jayvius@gmail.com> wrote:

> 1. Loading text files using loadtxt/genfromtxt need a significant
> performance boost (I think at least an order of magnitude increase in
> performance is very doable based on what I've seen with Erin's recfile code)

> 2. Improved memory usage. Memory used for reading in a text file shouldn’t
> be more than the file itself, and less if only reading a subset of file.

> 3. Keep existing interfaces for reading text files (loadtxt, genfromtxt,
> etc). No new ones.

> 4. Underlying code should keep IO iteration and transformation of data
> separate (awaiting more thoughts from Travis on this).

> 5. Be able to plug in different transformations of data at low level (also
> awaiting more thoughts from Travis).

> 6. memory mapping of text files?

> 7. Eventually reduce memory usage even more by using same object for
> duplicate values in array (depends on implementing enum dtype?)

> Anything else?

Yes -- I'd like to see the solution be able to do high-performance
reads of a portion of a file -- not always the whole thing. I seem to
have a number of custom text files that I need to read that are laid
out in chunks: a bit of a header, then a block of numbers, another
header, another block. I'm happy to read and parse the header sections
with pure Python, but would love a way to read the blocks of numbers
into a numpy array fast. This will probably come out of the box with
any of the proposed solutions, as long as they start at the current
position of a passed-in file object, can be told how much to read,
and then leave the file pointer in the correct position.
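
A rough sketch of that chunked-read pattern, for illustration only. It
assumes a hypothetical header format of "<name> <nrows>" on a single line;
the helper names read_block and read_chunks are made up for this example
and are not part of numpy. The rows are pulled with readline() so the file
object ends up positioned right after the block, and np.loadtxt is only
used to parse the already-read lines, so this shows the interface, not the
speed-up being asked for.

```python
import io

import numpy as np


def read_block(fobj, nrows):
    """Read `nrows` lines from the current position of `fobj`.

    Returns a 2-D float array and leaves `fobj` positioned on the
    line immediately after the block.
    """
    lines = [fobj.readline() for _ in range(nrows)]
    if nrows and not lines[-1]:
        raise ValueError("file ended before the block was complete")
    # np.loadtxt accepts a list of strings as well as a file name.
    return np.loadtxt(lines, ndmin=2)


def read_chunks(fobj):
    """Parse a file laid out as header, block, header, block, ...

    Assumed (hypothetical) header format: "<name> <nrows>".
    """
    chunks = {}
    while True:
        header = fobj.readline()
        if not header:          # end of file
            break
        if not header.strip():  # skip blank separator lines
            continue
        name, nrows = header.split()
        chunks[name] = read_block(fobj, int(nrows))
    return chunks


if __name__ == "__main__":
    sample = io.StringIO(
        "temps 2\n"
        "1.0 2.0\n"
        "3.0 4.0\n"
        "depth 1\n"
        "5.0 6.0 7.0\n"
    )
    for name, block in read_chunks(sample).items():
        print(name, block.shape)
```

The header parsing stays in pure Python, and only read_block would need
to be swapped for a faster reader that honors the same contract: start at
the current position, read a given number of rows, stop there.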

Great to see this moving forward.

-Chris



-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

