[Numpy-discussion] How to read data from text files fast?
Fernando.Perez at colorado.edu
Thu Jul 1 13:28:01 CDT 2004
Chris Barker wrote:
> Hi all,
> I'm looking for a way to read data from ascii text files quickly. I've
> found that using the standard python idioms like:
> data = array((M,N),Float)
> for in range(N):
> Can be pretty slow. What I'd like is something like Matlab's fscanf:
> data = fscanf(file, "%g", [M,N] )
> I may have the syntax a little wrong, but the gist is there. What Matlab
> does is keep recycling the format string until the desired number of
> elements has been read.
> It is quite flexible, and ends up being pretty fast.
> Has anyone written something like this for Numeric (or numarray, but I'd
> prefer Numeric at this point) ?
> I was surprised not to find something like this in SciPy, maybe I didn't
> look hard enough.
I haven't timed it, because it's been 'fast enough' for my needs.
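
For the ASCII case Chris describes, the fscanf-style behaviour (keep reading numbers until M*N values have been collected) can be sketched in present-day NumPy rather than the Numeric of this thread. This is only a minimal illustration under stated assumptions: `read_ascii_block` is a hypothetical helper name, the input is assumed to be whitespace-separated numbers, the whole file is read into memory at once, and the array is filled in row-major order.

```python
import io

import numpy as np


def read_ascii_block(fobj, shape):
    """Read prod(shape) whitespace-separated numbers from an open text file.

    Hypothetical helper (not part of NumPy or Numeric): it ignores line
    boundaries, much like recycling a "%g" format until enough values
    have been read.
    """
    count = int(np.prod(shape))
    tokens = fobj.read().split()  # split on any run of whitespace
    if len(tokens) < count:
        raise ValueError("expected %d values, found %d" % (count, len(tokens)))
    # Convert only the values that were asked for, then give them a shape.
    return np.array(tokens[:count], dtype=float).reshape(shape)


# Usage: six numbers spread unevenly over three lines, read as a 2x3 array.
text = "1 2\n3 4 5\n6\n"
data = read_ascii_block(io.StringIO(text), (2, 3))
```

Splitting once and converting in a single call avoids a Python-level loop over lines, which is usually where the time goes in the line-by-line idiom.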
For reading binary data files, I have this little utility which is basically a
wrapper around Numeric.fromstring (N below is Numeric imported 'as N'). Note
that it can read binary .gz files directly, a _huge_ gain for very sparse
files representing 3d arrays (I can read a 400k gz file which blows up to
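~60MB when unzipped in no time at all, while reading the unzipped file is very
slow):

import gzip
import Numeric as N

# NOTE: the def line is missing from the archived post; the function name
# and argument order below are assumed.
def read_bin(fname, dims, typecode, recast_type=None, offset=0):
    """Read in a binary data file.

    Does NOT check for endianness issues.

    fname - can be .gz
    offset=0: # of bytes to skip in file *from the beginning* before data starts
    """
    # config parameters
    item_size = N.zeros(1,typecode).itemsize()  # size in bytes
    data_size = N.product(N.array(dims))*item_size
    # read in data (gzip.open handles .gz files transparently)
    if fname.endswith('.gz'):
        data_file = gzip.open(fname)
    else:
        data_file = file(fname, 'rb')
    data_file.seek(offset)  # skip any header bytes before the data
    data = N.fromstring(data_file.read(data_size),typecode)
    data_file.close()
    data.shape = dims
    #print 'Read',data_size/item_size,'data points. Shape:',dims
    print 'Read',N.size(data),'data points. Shape:',dims
    if recast_type is not None:
        data = data.astype(recast_type)
    return data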
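
For readers on a current Python, the same idea (read a fixed number of raw bytes, optionally through gzip, and reinterpret them as an array) can be sketched with NumPy instead of Numeric. This is a rough equivalent, not the original utility: the function name `read_bin` and its argument order are assumptions, and like the original it does no endianness checking.

```python
import gzip

import numpy as np


def read_bin(fname, dims, dtype, recast_type=None, offset=0):
    """Read a raw binary array from fname, which may be gzip-compressed.

    Assumed signature, for illustration only. offset is the number of
    bytes to skip from the beginning of the (uncompressed) data.
    """
    dtype = np.dtype(dtype)
    count = int(np.prod(dims))
    # Pick the opener from the file extension, as the docstring above suggests.
    opener = gzip.open if str(fname).endswith(".gz") else open
    with opener(fname, "rb") as data_file:
        data_file.seek(offset)
        raw = data_file.read(count * dtype.itemsize)
    # frombuffer reinterprets the bytes without parsing; it raises
    # ValueError if fewer bytes were read than count items require.
    data = np.frombuffer(raw, dtype=dtype, count=count).reshape(dims)
    if recast_type is not None:
        data = data.astype(recast_type)
    return data
```

Note that `np.frombuffer` returns a read-only view of the bytes; calling `.copy()` on the result (or passing `recast_type`) gives a writable array.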