[Numpy-discussion] reading gzip compressed files using numpy.fromfile
Wed Oct 28 14:33:11 CDT 2009
On Wed, Oct 28, 2009 at 14:31, Peter Schmidtke <email@example.com> wrote:
> Dear Numpy Mailing List Readers,
> I have a fairly simple problem for which I have not found a solution so far.
> I have a gzipped file lying around that has some numbers stored in it and I
> want to read them into a numpy array as fast as possible but only a bunch
> of data at a time.
> So I would like to use numpy's fromfile function.
> For now I have roughly the following code:
> f=gzip.open( "myfile.gz", "r" )
> So I would read 400 entries from the file, keep it open, process my data,
> come back and read the next 400 entries. If I do this, numpy is complaining
> that the file handle f is not a normal file handle:
> IOError: first argument must be an open file
> but in fact it is a zlib file handle. But gzip gives access to the normal
> filehandle through f.fileobj.
np.fromfile() requires a true file object, not just a file-like
object. np.fromfile() works by grabbing the FILE* pointer underneath
and using C system calls to read the data, not by calling the .read()
method.
> So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400)
> But there I get just meaningless values (not the actual data) and when I
> specify the sep=" " argument for npy.fromfile I get just .1 and nothing
> else.
This is reading the compressed data, not the data that you want.
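To illustrate the point, here is a minimal sketch that builds a tiny gzip stream in memory and compares its raw bytes with the decompressed payload. The sample values and variable names are made up for illustration, and it assumes the payload is raw binary float32 data.

```python
import gzip
import io

import numpy as np

# Hypothetical sample data standing in for the contents of the file.
data = np.arange(4, dtype="float32")

# Write the gzip stream into an in-memory buffer instead of a file on disk.
raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
    gz.write(data.tobytes())

# These are the bytes that the underlying file object (f.fileobj) holds.
compressed = raw.getvalue()
print(compressed[:2])  # gzip magic number, not float data

# Only after decompression do the bytes describe the original numbers.
restored = np.frombuffer(gzip.decompress(compressed), dtype="float32")
print(restored)
```

Interpreting `compressed` directly as float32 would decode the gzip header and the deflate stream as numbers, which is why the values look meaningless.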
> Can you tell me why and how to fix this problem? I know that I could read
> everything to memory, but these files are rather big, so I simply have to
> avoid this.
Read in reasonably-sized chunks of bytes at a time, and use
np.fromstring() to create arrays from them.
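The chunked approach can be sketched roughly as follows. This is only an illustration under some assumptions: the file holds raw binary float32 values in native byte order, and the helper name `read_chunks` is made up here. It uses np.frombuffer(), since recent NumPy releases deprecate np.fromstring() for binary input.

```python
import gzip

import numpy as np


def read_chunks(path, dtype="float32", count=400):
    """Yield arrays of up to `count` items from a gzip-compressed binary file."""
    itemsize = np.dtype(dtype).itemsize
    with gzip.open(path, "rb") as f:
        while True:
            # Read decompressed bytes through the gzip object's .read() method.
            buf = f.read(count * itemsize)
            # Ignore a trailing partial item at the end of the file, if any.
            usable = len(buf) - len(buf) % itemsize
            if usable == 0:
                break
            # frombuffer returns a read-only view; call .copy() if the
            # array needs to be modified in place.
            yield np.frombuffer(buf[:usable], dtype=dtype)
```

Looping with `for block in read_chunks("myfile.gz"):` then processes 400 entries at a time while keeping only one chunk in memory. For text data separated by spaces, the bytes would need to be parsed instead of reinterpreted.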
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco