[Numpy-discussion] Possible roadmap addendum: building better text file readers
Chris Barker
chris.barker@noaa....
Tue Mar 6 16:45:34 CST 2012
On Thu, Mar 1, 2012 at 10:58 PM, Jay Bourque <jayvius@gmail.com> wrote:
> 1. Loading text files using loadtxt/genfromtxt needs a significant
> performance boost (I think at least an order of magnitude increase in
> performance is very doable based on what I've seen with Erin's recfile code)
> 2. Improved memory usage. Memory used for reading in a text file shouldn’t
> be more than the file itself, and less if only reading a subset of the file.
> 3. Keep existing interfaces for reading text files (loadtxt, genfromtxt,
> etc). No new ones.
> 4. Underlying code should keep IO iteration and transformation of data
> separate (awaiting more thoughts from Travis on this).
> 5. Be able to plug in different transformations of data at low level (also
> awaiting more thoughts from Travis).
> 6. memory mapping of text files?
> 7. Eventually reduce memory usage even more by using same object for
> duplicate values in array (depends on implementing enum dtype?)
> Anything else?
Yes -- I'd like to see the solution be able to do high-performance
reads of a portion of a file, not always the whole thing. I seem to
have a number of custom text files that I need to read that are laid
out in chunks: a bit of a header, then a block of numbers, another
header, another block. I'm happy to read and parse the header sections
with pure Python, but would love a way to read the blocks of numbers
into a numpy array fast. This will probably come out of the box with
any of the proposed solutions, as long as they start at the current
position of a passed-in file object, can be told how much to read,
and then leave the file pointer in the correct position.
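A minimal sketch of the interface described above, built only on the existing np.loadtxt (which accepts a list of lines). It is not the fast reader under discussion; it just shows the contract: start at the file object's current position, read a caller-specified number of rows, and leave the file positioned at the next header. The helper name read_block and the "BLOCK <nrows>" header format are hypothetical, made up for illustration.

```python
import io

import numpy as np


def read_block(f, nrows, dtype=float):
    """Hypothetical helper: read exactly `nrows` lines of numbers starting
    at the current position of the open text file `f`.

    Lines are consumed with readline(), so afterwards the file is positioned
    at the start of the line following the block.
    """
    lines = []
    for _ in range(nrows):
        line = f.readline()
        if not line:
            raise EOFError(f"expected {nrows} rows, got {len(lines)}")
        lines.append(line)
    # np.loadtxt accepts a list of strings; ndmin=2 keeps a one-row
    # block two-dimensional.
    return np.loadtxt(lines, dtype=dtype, ndmin=2)


# Hypothetical chunked file: a header line "BLOCK <nrows>" followed by rows.
data = io.StringIO(
    "BLOCK 2\n"
    "1.0 2.0 3.0\n"
    "4.0 5.0 6.0\n"
    "BLOCK 1\n"
    "7.0 8.0 9.0\n"
)

blocks = []
while True:
    header = data.readline()
    if not header:
        break
    # The header is parsed in pure Python; only the numeric block goes
    # through numpy.
    nrows = int(header.split()[1])
    blocks.append(read_block(data, nrows))
```

A faster reader could keep the same read_block(f, nrows) shape and replace only the body, which is the point: header parsing stays in Python while the bulk numeric parsing is delegated.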
Great to see this moving forward.
-Chris
--
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov