[SciPy-User] IO of large ASCII table data
Tue Aug 17 13:04:24 CDT 2010
Excerpts from Dan Lussier's message of Tue Aug 17 13:41:26 -0400 2010:
> I am looking to read in large (many million rows) ASCII space
> separated tables into numpy arrays.
> In the past I have heard of people using Miller's TableIO to do this
> but was wondering if a similarly fast method has been more recently
> integrated into scipy/numpy?
> In consulting the documentation the most likely candidate is
> numpy.genfromtxt(...). Is this function pure Python or does it rely
> on a C extension as was the case with Miller's TableIO?
> Any advice here would be great as my application could get seriously
> bogged down (both time and memory) in reading these files into arrays
> if I get onto the wrong track.
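(On the function mentioned above: the correct name is numpy.genfromtxt, and it is implemented in pure Python, so it is flexible but can be slow on multi-million-row files. A minimal sketch of its use, with made-up inline data standing in for a real file:)

```python
import io
import numpy as np

# genfromtxt accepts a filename or any file-like object; here we use
# an in-memory buffer with whitespace-separated values for illustration.
text = io.StringIO("1.0 2.0 3.0\n4.0 5.0 6.0\n")
data = np.genfromtxt(text, dtype=float)
print(data.shape)  # (2, 3)
```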
The recfile package is designed specifically for this purpose: it can
read ASCII or binary data into record arrays. It is a C++ extension,
so it is efficient.
# Open an ASCII file. The number of rows will be determined from
# the file data and dtype if not given explicitly.
robj = recfile.Open(fname, dtype=dtype, delim=',')

# read all rows and columns
data = robj[:]

# read a subset of columns
data = robj[['x', 'y']][:]

# read a subset of rows
data = robj[25:50]
data = robj[rowlist]

# combine column and row selections
data = robj[['x', 'y']][rowlist]
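(For comparison, the same kind of column and row selection can be done in pure NumPy by reading into a record array with genfromtxt; slower than recfile, but with no extra dependency. The field names and sample data below are made up for illustration:)

```python
import io
import numpy as np

# Read comma-separated data into a record array with named fields.
dtype = [('x', 'f8'), ('y', 'f8'), ('z', 'f8')]
text = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n7.0,8.0,9.0\n")
data = np.genfromtxt(text, dtype=dtype, delimiter=',')

# select a subset of columns by field name
xy = data[['x', 'y']]

# select a subset of rows by slice (or an index list)
rows = data[1:3]
```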
Erin Scott Sheldon