[SciPy-User] IO of large ASCII table data

Erin Sheldon erin.sheldon@gmail....
Tue Aug 17 13:04:24 CDT 2010


Excerpts from Dan Lussier's message of Tue Aug 17 13:41:26 -0400 2010:
> I am looking to read in large (many million rows) ASCII space
> separated tables into numpy arrays.
> 
> In the past I have heard of people using Miller's TableIO to do this
> but was wondering if a similarly fast method has been more recently
> integrated into scipy/numpy?
> 
> In consulting the documentation the most likely candidate is
> numpy.genfromtxt(...).  Is this function pure python or does it rely
> on a C extension as was the case with Miller's TableIO?
> 
> Any advice here would be great as my application could get seriously
> bogged down (both time and memory) in reading these files into arrays
> if I get onto the wrong track.
> 
> Thanks.
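(For reference, numpy.genfromtxt is pure Python, so it is flexible but
slow on multi-million-row files.  A minimal sketch of its use with a
structured dtype -- the file contents here are made up for illustration:)

```python
import io
import numpy as np

# Simulated space-separated table; genfromtxt also accepts a filename.
text = io.StringIO("1.0 2 3\n4.0 5 6\n")

dtype = [('field1', 'f8'), ('field2', 'i4'), ('field3', 'i8')]
data = np.genfromtxt(text, dtype=dtype)

# Columns are accessed by field name on the resulting structured array.
print(data['field1'])
```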

The recfile package is designed specifically for this purpose:

    http://code.google.com/p/recfile/

It can read ASCII or binary files into record arrays.  It is a C++
extension, so it is efficient.

    import recfile

    # Read from an ascii file.  Number of rows will be determined from
    # the file data and dtype if not entered.
    fname='test.csv'
    dtype=[('field1','f8'),('field2','2i4'),('field3','i8')]

    robj = recfile.Open(fname, dtype=dtype, delim=',')

    # read all rows and columns
    data = robj[:]

    # read a subset of columns (field names from the dtype above)
    data = robj[ ['field1','field2'] ][:]

    # read a subset of rows
    data = robj[25:50]
    rowlist = [1, 17, 42]   # arbitrary row indices
    data = robj[rowlist]

    # combine column and row subsets
    data = robj[ ['field1','field2'] ][rowlist]

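(Once the data is in memory, plain NumPy structured arrays support the
same kinds of row and column subsetting.  A sketch, not part of recfile;
the array here is zero-filled just to show the indexing:)

```python
import numpy as np

dtype = [('field1', 'f8'), ('field2', 'i4'), ('field3', 'i8')]
data = np.zeros(10, dtype=dtype)

# subset of columns: still a structured array, with only those fields
sub_cols = data[['field1', 'field2']]

# subset of rows: slices and integer lists both work
sub_rows = data[2:5]
row_list = [1, 4, 7]
picked = data[row_list]
```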

Erin Scott Sheldon
