[Numpy-discussion] Reading a big netcdf file

Kiko kikocorreoso@gmail....
Thu Aug 4 05:46:55 CDT 2011


Hi, all.

Thank you very much for your replies.

I am running into some issues. I am using both the netcdf4-python and
scipy.io.netcdf libraries:

In [4]: import netCDF4 as n4
In [5]: from scipy.io import netcdf as nS
In [6]: import numpy as np
In [7]: gebco4 = n4.Dataset('GridOne.grd', 'r')
In [8]: gebcoS = nS.netcdf_file('GridOne.grd', 'r')

Now, if I do:

In [9]: z4 = gebco4.variables['z']

I have no problems and I get:

In [14]: type(z4); z4.shape; z4.size
Out[14]: <type 'netCDF4.Variable'>
Out[14]: (233312401,)
Out[14]: 233312401

But if I do:

In [15]: z4 = gebco4.variables['z'][:]
------------------------------------------------------------
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "netCDF4.pyx", line 2466, in netCDF4.Variable.__getitem__
(netCDF4.c:22943)
  File "C:\Python26\lib\site-packages\netCDF4_utils.py", line 278, in
_StartCountStride
    n = len(range(beg,end,inc))
MemoryError

I get a MemoryError. But if I select a smaller array, it works:

In [16]: z4 = gebco4.variables['z'][:10000000]
In [17]: type(z4); z4.shape; z4.size
Out[17]: <type 'numpy.ndarray'>
Out[17]: (10000000,)
Out[17]: 10000000

What's the difference between z4 as a netCDF4.Variable and as a
numpy.ndarray?
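
For what it's worth, my understanding is that z4 is just a handle to the data
on disk, and the data is only read into a numpy array when I index it. So, as
a workaround, I think I can process the variable in chunks. A minimal sketch
computing the min and max, where the chunk size of 10,000,000 is just what
fits comfortably in my memory:

import netCDF4 as n4

gebco4 = n4.Dataset('GridOne.grd', 'r')
z4 = gebco4.variables['z']               # lazy handle, nothing read yet

chunk = 10000000                         # slice size that fits in RAM
zmin, zmax = None, None
for start in range(0, z4.shape[0], chunk):
    block = z4[start:start + chunk]      # only this slice is loaded
    zmin = block.min() if zmin is None else min(zmin, block.min())
    zmax = block.max() if zmax is None else max(zmax, block.max())
print("min=%s max=%s" % (zmin, zmax))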

Now, if I use scipy.io.netcdf:

In [18]: zS = gebcoS.variables['z']
In [20]: type(zS); zS.shape
Out[20]: <class 'scipy.io.netcdf.netcdf_variable'>
Out[20]: (233312401,)

In [21]: zS = gebcoS.variables['z'][:]
In [22]: type(zS); zS.shape
Out[22]: <type 'numpy.ndarray'>
Out[22]: (233312401,)

What's the difference between zS as a scipy.io.netcdf.netcdf_variable and as
a numpy.ndarray?
Why do I not get a MemoryError with scipy.io.netcdf?
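
My guess, and please correct me if I am wrong, is that scipy.io.netcdf
memory-maps the file by default, so zS[:] is an array backed by the file on
disk rather than the whole thing read into RAM. If I read the docstring
correctly, the mmap argument controls this:

from scipy.io import netcdf as nS

# mmap seems to default to True when reading from a regular file; with
# mmap=False the data would actually be loaded into memory.
gebcoS = nS.netcdf_file('GridOne.grd', 'r', mmap=True)
zS = gebcoS.variables['z'][:]   # a view onto the mapped file, not a copy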

Finally, I did the following (maybe it's a silly thing to do), using Eric's
suggestion to clear the cache:

In [32]: zS = gebcoS.variables['z']
In [38]: timeit -n1 -r1 zSS = np.array(zS[:100000000]) # 100,000,000 out of
233,312,401, because the full array gives a MemoryError
1 loops, best of 1: 73.1 s per loop

(If I force a copy, timeit -n1 -r1 zSS = np.array(zS[:100000000], copy=True),
I get a MemoryError and I have to reduce the size to 50,000,000, but then it's
quite fast.)
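
If I understand np.array correctly, it copies by default, so both calls above
should end up allocating a new in-memory array anyway. To keep only a view of
the mapped data without allocating anything, I suppose np.asarray is the way
(a sketch, assuming the mmap explanation above is right):

import numpy as np

view = np.asarray(zS[:100000000])   # no copy: still backed by the file
data = np.array(zS[:100000000])     # copies ~200 MB into RAM (z is short)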

Thank you very much for your replies, and excuse me if some of the questions
are very basic.

Best regards.

***********************************************************************
The results of ncdump -h
netcdf GridOne {
dimensions:
        side = 2 ;
        xysize = 233312401 ;
variables:
        double x_range(side) ;
                x_range:units = "user_x_unit" ;
        double y_range(side) ;
                y_range:units = "user_y_unit" ;
        short z_range(side) ;
                z_range:units = "user_z_unit" ;
        double spacing(side) ;
        short dimension(side) ;
        short z(xysize) ;
                z:scale_factor = 1. ;
                z:add_offset = 0. ;
                z:node_offset = 0 ;

// global attributes:
                :title = "GEBCO One Minute Grid" ;
                :source = "1.02" ;
}
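
By the way, since z is stored as a flat z(xysize) array, I reshape it into a
2-D grid using the dimension variable. A sketch; I am assuming the values are
stored row by row, which may be wrong:

nx, ny = gebcoS.variables['dimension'][:]        # grid size from the header
grid = gebcoS.variables['z'][:].reshape(ny, nx)  # assumes row-major storage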

The file is publicly available from:
http://www.gebco.net/data_and_products/gridded_bathymetry_data/