[Numpy-discussion] Reading a big netcdf file

Kiko kikocorreoso@gmail....
Thu Aug 4 05:46:55 CDT 2011

Hi, all.

Thank you very much for your replies.

I am running into some issues. This is what I get if I use netcdf4-python or scipy.io.netcdf:

In [4]: import netCDF4 as n4
In [5]: from scipy.io import netcdf as nS
In [6]: import numpy as np
In [7]: gebco4 = n4.Dataset('GridOne.grd', 'r')
In [8]: gebcoS = nS.netcdf_file('GridOne.grd', 'r')

Now, if I do:

In [9]: z4 = gebco4.variables['z']

I get no problems, and I have:

In [14]: type(z4); z4.shape; z4.size
Out[14]: <type 'netCDF4.Variable'>
Out[14]: (233312401,)
Out[14]: 233312401

But if I do:

In [15]: z4 = gebco4.variables['z'][:]
Traceback (most recent call last):
  File "<ipython console>", line 1, in <module>
  File "netCDF4.pyx", line 2466, in netCDF4.Variable.__getitem__
  File "C:\Python26\lib\site-packages\netCDF4_utils.py", line 278, in
    n = len(range(beg,end,inc))
MemoryError

I get a MemoryError. But if I select a smaller array, I get:

In [16]: z4 = gebco4.variables['z'][:10000000]
In [17]: type(z4); z4.shape; z4.size
Out[17]: <type 'numpy.ndarray'>
Out[17]: (10000000,)
Out[17]: 10000000
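
As a rough back-of-the-envelope check (my own assumption, not something I have
measured carefully): 233312401 int16 values take about 0.43 GB, but if the
library honours the scale_factor/add_offset attributes and promotes the data
to float64, the same array needs roughly 1.7 GB, which does not fit in a
32-bit process:

n = 233312401              # xysize, from the ncdump output below
print(n * 2 / 1024.**3)    # ~0.43 GB as int16 (short z)
print(n * 8 / 1024.**3)    # ~1.74 GB if promoted to float64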

What's the difference between z4 as a netCDF4.Variable and as a numpy.ndarray?
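
My current understanding (only a sketch, and possibly wrong) is that the
netCDF4.Variable is just a handle to the data on disk, and that slicing it is
what actually reads values into memory as a numpy.ndarray:

import netCDF4 as n4

nc = n4.Dataset('GridOne.grd', 'r')
var = nc.variables['z']     # netCDF4.Variable: metadata only, nothing read yet
chunk = var[:1000]          # slicing performs the read and returns an
                            # in-memory numpy.ndarray with 1000 values
print(type(var))
print(type(chunk))
nc.close()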

Now, if I use scipy.io.netcdf:

In [18]: zS = gebcoS.variables['z']
In [20]: type(zS); zS.shape
Out[20]: <class 'scipy.io.netcdf.netcdf_variable'>
Out[20]: (233312401,)

In [21]: zS = gebcoS.variables['z'][:]
In [22]: type(zS); zS.shape
Out[22]: <type 'numpy.ndarray'>
Out[22]: (233312401,)

What's the difference between zS as a scipy.io.netcdf.netcdf_variable and as
a numpy.ndarray?
Why don't I get a MemoryError with scipy.io.netcdf?
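
If I understand correctly (this is an assumption on my part), scipy.io.netcdf
memory-maps the file when it is opened read-only, so zS[:] gives back a view
into the mapped buffer rather than a fresh 233-million-element copy, which
would explain why there is no MemoryError. Something like:

import numpy as np
from scipy.io import netcdf as nS

f = nS.netcdf_file('GridOne.grd', 'r')
zS = f.variables['z']
view = zS[:]                 # presumably a view backed by the mapped file,
                             # so no full-size buffer is allocated here
small = np.array(zS[:1000])  # an explicit np.array() call is what forces a copy
print(type(view))
print(type(small))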

Finally, if I do the following (maybe it's a silly thing to do), using Eric's
suggestion to clear the cache:

In [32]: zS = gebcoS.variables['z']
In [38]: timeit -n1 -r1 zSS = np.array(zS[:100000000]) # 100,000,000 out of
233,312,401, because with the full array I get a MemoryError
1 loops, best of 1: 73.1 s per loop

(If I use a copy, timeit -n1 -r1 zSS = np.array(zS[:100000000], copy=True),
I get a MemoryError and I have to set the size to 50,000,000, but it's quite
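
One workaround I am considering (just a sketch, with the chunk size picked
arbitrarily) is to preallocate the destination once and copy the variable in
slices, so that only one full-size array ever has to exist at a time:

import numpy as np
from scipy.io import netcdf as nS

f = nS.netcdf_file('GridOne.grd', 'r')
zS = f.variables['z']
out = np.empty(zS.shape, dtype=zS[:1].dtype)  # single full-size allocation
step = 10000000                               # arbitrary chunk size
for start in range(0, out.size, step):
    stop = min(start + step, out.size)
    out[start:stop] = zS[start:stop]          # copy one chunk at a time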

Thank you very much for your replies, and excuse me if some questions are very
basic.
Best regards.

The results of ncdump -h are:

netcdf GridOne {
dimensions:
        side = 2 ;
        xysize = 233312401 ;
variables:
        double x_range(side) ;
                x_range:units = "user_x_unit" ;
        double y_range(side) ;
                y_range:units = "user_y_unit" ;
        short z_range(side) ;
                z_range:units = "user_z_unit" ;
        double spacing(side) ;
        short dimension(side) ;
        short z(xysize) ;
                z:scale_factor = 1. ;
                z:add_offset = 0. ;
                z:node_offset = 0 ;

// global attributes:
                :title = "GEBCO One Minute Grid" ;
                :source = "1.02" ;
}
The file is publicly available from: