[Numpy-discussion] Reading a big netcdf file

Christopher Barker Chris.Barker@noaa....
Wed Aug 3 16:15:06 CDT 2011


On 8/3/11 1:57 PM, Gökhan Sever wrote:
> This is what I get here:
>
> In [1]: a = np.zeros((21601, 10801), dtype=np.uint16)
>
> In [2]: a.tofile('temp.npa')
>
> In [3]: del a
>
> In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
> 1 loops, best of 3: 251 ms per loop

So that's about 10 times faster than on my machine. I didn't think disks 
had gotten much faster -- they are still generally 7200 rpm (or slower 
in laptops).

So I've either got a really slow disk, or you have a really fast one (or 
both), or maybe you're seeing a cache effect, since you wrote the file just 
before reading it.
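
For what it's worth, here is a rough back-of-the-envelope check of what the two timings imply as read rates. It is only a sketch: it assumes the file holds nothing but the raw array bytes (which is what tofile() writes), and the helper name throughput_mb_per_s is made up for this illustration.

```python
# Back-of-the-envelope read rates for the two timings quoted in this thread.
# Assumes the file holds only the raw array bytes (no header), as tofile() writes.
rows, cols = 21601, 10801
itemsize = 2  # bytes per np.uint16 element

n_bytes = rows * cols * itemsize
size_mib = n_bytes / 2**20


def throughput_mb_per_s(n_bytes, seconds):
    """Average read rate in MB/s (1 MB = 10**6 bytes). Hypothetical helper."""
    return n_bytes / seconds / 1e6


fast = throughput_mb_per_s(n_bytes, 0.251)  # the 251 ms timing quoted above
slow = throughput_mb_per_s(n_bytes, 2.53)   # the 2.53 s timing on my machine

print(f"file size: {n_bytes} bytes ({size_mib:.0f} MiB)")
print(f"251 ms -> {fast:.0f} MB/s")
print(f"2.53 s -> {slow:.0f} MB/s")
```

The 251 ms figure works out to well over 1 GB/s, which would be hard to get from a single 7200 rpm disk, so it fits the cache-effect guess better than the fast-disk one.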

Repeating exactly what you did:

In [8]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 2.53 s per loop

Then I wrote a bunch of other files to disk, and tried again:

In [17]: timeit a = np.fromfile('temp.npa', dtype=np.uint16)
1 loops, best of 3: 2.45 s per loop

So it seems I'm not seeing cache effects, but maybe you are.
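
If anyone wants to repeat the comparison outside of ipython, something like the following should do roughly the same write-then-read test. It is a minimal sketch, not a careful benchmark: time_fromfile is a made-up helper name, it reports the best of a few runs the way timeit does, and it makes no attempt to flush the OS file cache, so a read right after the write may well come from memory.

```python
import os
import tempfile
import time

import numpy as np


def time_fromfile(shape=(21601, 10801), dtype=np.uint16, repeats=3):
    """Write a zero-filled array to a temporary raw file, then time reading it back.

    Returns (size in bytes, best read time in seconds). Hypothetical helper.
    """
    a = np.zeros(shape, dtype=dtype)
    n_bytes = a.nbytes

    # Create a temporary file name; the file is removed again at the end.
    fd, path = tempfile.mkstemp(suffix=".npa")
    os.close(fd)
    try:
        a.tofile(path)  # raw bytes only, no header
        del a

        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            b = np.fromfile(path, dtype=dtype)
            timings.append(time.perf_counter() - start)

        # Sanity check: we read back as many bytes as we wrote.
        assert b.nbytes == n_bytes
    finally:
        os.remove(path)

    return n_bytes, min(timings)
```

Calling time_fromfile() with the defaults writes and reads the same 445 MiB array as above; a smaller shape, e.g. time_fromfile(shape=(1000, 1000)), is enough to check that it runs.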

Anyway, we haven't heard from the OP -- I'm not sure what s/he thought 
was slow.

-Chris



>
> On Wed, Aug 3, 2011 at 10:50 AM, Christopher Barker
> <Chris.Barker@noaa.gov> wrote:
>
>     On 8/3/11 9:30 AM, Kiko wrote:
>      > I'm trying to read a big netcdf file (445 Mb) using netcdf4-python.
>
>     I've never noticed that netCDF4 was particularly slow for reading
>     (writing can be pretty slow sometimes). How slow is slow?
>
>      > The data are described as:
>
>     please post the results of:
>
>     ncdump -h the_file_name.nc
>
>     So we can see if there is anything odd in the structure (though I don't
>     know what it might be)
>
>     Post your code (in the simplest form you can).
>
>     and post your timings and machine type
>
>     Is the file netcdf4 or 3 format? (the python lib will read either)
>
>     As a reference, reading that much data in from a raw file into a numpy
>     array takes 2.57 s on my machine (a rather old Mac, but disks haven't
>     gotten much faster). You can test that like this:
>
>     a = np.zeros((21601, 10801), dtype=np.uint16)
>
>     a.tofile('temp.npa')
>
>     del a
>
>     timeit a = np.fromfile('temp.npa', dtype=np.uint16)
>
>     (using ipython's timeit)
>
>     -Chris
>
>
>
>     --
>     Christopher Barker, Ph.D.
>     Oceanographer
>
>     Emergency Response Division
>     NOAA/NOS/OR&R            (206) 526-6959   voice
>     7600 Sand Point Way NE   (206) 526-6329   fax
>     Seattle, WA  98115       (206) 526-6317   main reception
>
>     Chris.Barker@noaa.gov
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion@scipy.org
>     http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
>
> --
> Gökhan


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov

