[SciPy-user] efficiently importing ascii data

Darren Dale dd55 at cornell.edu
Sun Nov 13 09:59:36 CST 2005


Hi Alan,

On Saturday 12 November 2005 5:37 pm, Alan G Isaac wrote:
> On Thu, 10 Nov 2005, Darren Dale apparently wrote:
> > I'm reading arrays of data from an ascii file and
> > converting to appropriate numerical types. The data files
> > can get pretty big, I was wondering if someone here might
> > have a suggestion on how to speed things up.
>
> This is a common request on the SciPy Users list.

Now that you mention it, I think I may have asked once before.

> I asked Mike Miller to consider releasing TableIO
> http://php.iupui.edu/~mmiller3/python/
> under a more Pythonic license so that SciPy could use it.
> He initially sounded willing, but he never actually sent
> a message releasing the code under another license.  Nor did
> he say that he was ultimately unwilling to do so.
> If you can work with GPL'd code, you might try TableIO.
> I'll bcc: him on this to see if he has decided.

Thank you for the suggestion. I looked at TableIO, but I haven't been able to
get it working properly. I tried to read a file that had '1e7,' repeated a
million times, and it gave me an array that looked like this:

array([[ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.],
       [ 10000000.],
       [        0.]])

I am considering using scipy's fromfile function, which gives a big speed 
boost over io.read_array, but I don't understand what this docstring is 
trying to tell me:

    WARNING: This function should be used sparingly, as it is not
    a robust method of persistence.  But it can be useful to
    read in simply-formatted or binary data quickly.
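
For illustration, here is a minimal sketch of reading simply-formatted text with fromfile. It assumes the present-day numpy.fromfile signature (with its `sep` argument), which may differ from the scipy function available at the time of this message. The helper name read_ascii_values and the test file are hypothetical. The test data deliberately has no trailing separator, header, or comments.

```python
import os
import tempfile

import numpy as np


def read_ascii_values(path, sep=","):
    """Read a flat sequence of text-formatted numbers into a float array.

    Assumes the file holds nothing but numbers separated by `sep`
    (no header, no comments, no trailing separator).
    """
    # A non-empty `sep` makes numpy.fromfile parse text instead of raw bytes.
    return np.fromfile(path, dtype=float, sep=sep)


# Hypothetical test file: '1e7' repeated 1000 times, comma-separated.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write(",".join(["1e7"] * 1000))

values = read_ascii_values(path)
os.remove(path)
print(values.shape, values[:3])
```

The result is always one-dimensional, so a multi-column file would need a reshape with a column count known in advance. As far as I can tell, that is the sense of "not robust": the file itself carries no shape, data type, or byte-order information, so the reader has to supply all of it correctly.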


