[Numpy-discussion] reading gzip compressed files using numpy.fromfile
Peter Schmidtke
pschmidtke@mmb.pcb.ub...
Thu Oct 29 06:38:11 CDT 2009
> Date: Wed, 28 Oct 2009 20:31:43 +0100
> From: Peter Schmidtke <pschmidtke@mmb.pcb.ub.es>
> Subject: [Numpy-discussion] reading gzip compressed files using
> numpy.fromfile
> To: numpy-discussion@scipy.org
> Message-ID: <fc345224bfa26132e9474287e32e083b@mmb.pcb.ub.es>
> Content-Type: text/plain; charset="UTF-8"
>
> Dear Numpy Mailing List Readers,
>
> I have a fairly simple problem for which I have not found a solution so
> far.
> I have a gzipped file that has some numbers stored in it, and I want to
> read them into a numpy array as fast as possible, but only a chunk of
> data at a time.
> So I would like to use numpy's fromfile function.
>
> For now I have roughly the following code:
>
> import gzip
> import numpy as npy
>
> f = gzip.open("myfile.gz", "r")
> xyz = npy.fromfile(f, dtype="float32", count=400)
>
>
> So I would read 400 entries from the file, keep it open, process my data,
> come back and read the next 400 entries. If I do this, numpy complains
> that the file handle f is not a normal file handle:
> IOError: first argument must be an open file
>
> but in fact it is a zlib file handle. But gzip gives access to the normal
> filehandle through f.fileobj.
>
> So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400)
>
> But there I get just meaningless values (not the actual data) and when I
> specify the sep=" " argument for npy.fromfile I get just .1 and nothing
> else.
>
> Can you tell me why and how to fix this problem? I know that I could read
> everything to memory, but these files are rather big, so I simply have to
> avoid this.
>
> Thanks in advance.
>
>
> --
>
> Peter Schmidtke
>
> ----------------------
> PhD Student at the Molecular Modeling and Bioinformatics Group
> Dep. Physical Chemistry
> Faculty of Pharmacy
> University of Barcelona
>
>
>
> ------------------------------
>
> Message: 2
> Date: Wed, 28 Oct 2009 14:33:11 -0500
> From: Robert Kern <robert.kern@gmail.com>
> Subject: Re: [Numpy-discussion] reading gzip compressed files using
> numpy.fromfile
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Message-ID:
> <3d375d730910281233r5cadd0fcubea14676a3a978f1@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> On Wed, Oct 28, 2009 at 14:31, Peter Schmidtke <pschmidtke@mmb.pcb.ub.es>
> wrote:
>> Dear Numpy Mailing List Readers,
>>
>> I have a fairly simple problem for which I have not found a solution so
>> far.
>> I have a gzipped file that has some numbers stored in it, and I want to
>> read them into a numpy array as fast as possible, but only a chunk of
>> data at a time.
>> So I would like to use numpy's fromfile function.
>>
>> For now I have roughly the following code:
>>
>> import gzip
>> import numpy as npy
>>
>> f = gzip.open("myfile.gz", "r")
>> xyz = npy.fromfile(f, dtype="float32", count=400)
>>
>>
>> So I would read 400 entries from the file, keep it open, process my
>> data, come back and read the next 400 entries. If I do this, numpy
>> complains that the file handle f is not a normal file handle:
>> IOError: first argument must be an open file
>>
>> but in fact it is a zlib file handle. But gzip gives access to the
>> normal filehandle through f.fileobj.
>
> np.fromfile() requires a true file object, not just a file-like
> object. np.fromfile() works by grabbing the FILE* pointer underneath
> and using C system calls to read the data, not by calling the .read()
> method.
>
>> So I tried xyz=npy.fromfile(f.fileobj,dtype="float32",count=400)
>>
>> But there I get just meaningless values (not the actual data) and when I
>> specify the sep=" " argument for npy.fromfile I get just .1 and nothing
>> else.
>
> This is reading the compressed data, not the data that you want.
>
>> Can you tell me why and how to fix this problem? I know that I could
>> read everything to memory, but these files are rather big, so I simply
>> have to avoid this.
>
> Read in reasonably-sized chunks of bytes at a time, and use
> np.fromstring() to create arrays from them.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
> -- Umberto Eco
>
>
> ------------------------------
>
> Message: 3
> Date: Wed, 28 Oct 2009 13:26:41 -0700
> From: Christopher Barker <Chris.Barker@noaa.gov>
> Subject: Re: [Numpy-discussion] reading gzip compressed files using
> numpy.fromfile
> To: Discussion of Numerical Python <numpy-discussion@scipy.org>
> Message-ID: <4AE8A901.3060403@noaa.gov>
> Content-Type: text/plain; charset=UTF-8; format=flowed
>
> Robert Kern wrote:
>>> f=gzip.open( "myfile.gz", "r" )
>>> xyz=npy.fromfile(f,dtype="float32",count=400)
>
>> Read in reasonably-sized chunks of bytes at a time, and use
>> np.fromstring() to create arrays from them.
>
> Something like:
>
> count = 400
> xyz = np.fromstring(f.read(count*4), dtype=np.float32)
>
> should work (untested...)
>
> -Chris
>
>
>
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R (206) 526-6959 voice
> 7600 Sand Point Way NE (206) 526-6329 fax
> Seattle, WA 98115 (206) 526-6317 main reception
>
> Chris.Barker@noaa.gov
>
>
Thanks Robert and Chris...indeed I managed to read it quite fast this way.
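
For the archive, a minimal sketch of that chunked approach. It assumes the
file holds raw float32 values in native byte order with no header; the
function name iter_float32_chunks and its parameters are only illustrative
and do not come from the thread. It uses np.frombuffer because newer NumPy
releases deprecate np.fromstring for binary data.

```python
import gzip

import numpy as np


def iter_float32_chunks(path, count=400):
    """Yield arrays of up to `count` float32 values from a gzip file.

    Hypothetical helper: assumes raw, headerless, native-endian float32 data.
    """
    itemsize = np.dtype(np.float32).itemsize  # 4 bytes per value
    with gzip.open(path, "rb") as f:
        while True:
            # f.read() returns decompressed bytes, so only one chunk of the
            # data is held in memory at a time.
            raw = f.read(count * itemsize)
            # Ignore a trailing partial value if the decompressed size is
            # not a multiple of the item size.
            usable = len(raw) - (len(raw) % itemsize)
            if usable == 0:
                break
            # frombuffer returns a read-only view; call .copy() on the
            # result if a writable array is needed.
            yield np.frombuffer(raw[:usable], dtype=np.float32)
```

Looping with "for xyz in iter_float32_chunks("myfile.gz"):" then gives 400
entries per iteration, with a shorter final array if the file does not
divide evenly.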
++
Peter Schmidtke
----------------------
PhD Student at the Molecular Modeling and Bioinformatics Group
Dep. Physical Chemistry
Faculty of Pharmacy
University of Barcelona