[Numpy-discussion] Reading a big netcdf file
Kiko
kikocorreoso@gmail....
Thu Aug 4 05:46:55 CDT 2011
Hi, all.
Thank you very much for your replies.
I am running into some issues. If I use the netcdf4-python or scipy.io.netcdf
libraries:
In [4]: import netCDF4 as n4
In [5]: from scipy.io import netcdf as nS
In [6]: import numpy as np
In [7]: gebco4 = n4.Dataset('GridOne.grd', 'r')
In [8]: gebcoS = nS.netcdf_file('GridOne.grd', 'r')
Now, if I do:
In [9]: z4 = gebco4.variables['z']
I get no problems and I have:
In [14]: type(z4); z4.shape; z4.size
Out[14]: <type 'netCDF4.Variable'>
Out[14]: (233312401,)
Out[14]: 233312401
But if I do:
In [15]: z4 = gebco4.variables['z'][:]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
File "netCDF4.pyx", line 2466, in netCDF4.Variable.__getitem__
(netCDF4.c:22943)
File "C:\Python26\lib\site-packages\netCDF4_utils.py", line 278, in
_StartCountStride
n = len(range(beg,end,inc))
MemoryError
I get a MemoryError. But if I select a smaller slice, I get:
In [16]: z4 = gebco4.variables['z'][:10000000]
In [17]: type(z4); z4.shape; z4.size
Out[17]: <type 'numpy.ndarray'>
Out[17]: (10000000,)
Out[17]: 10000000
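A note on the traceback above: it ends at n = len(range(beg,end,inc)). The C:\Python26 path points to Python 2, where range() builds a complete list, so for 233312401 items that line alone has to allocate a very large list before any data is read. The following is only a sketch of how the same count can be computed arithmetically; slice_length is a hypothetical helper, not part of netCDF4.

```python
def slice_length(beg, end, inc):
    """Number of items in range(beg, end, inc), computed without building it."""
    if inc == 0:
        raise ValueError("inc must be non-zero")
    if inc > 0:
        # Ceiling division of (end - beg) by inc, clamped at zero.
        return max(0, (end - beg + inc - 1) // inc)
    # Negative step: count downwards from beg to just above end.
    return max(0, (beg - end - inc - 1) // (-inc))


# Size of the full 'z' slice, computed without allocating anything large.
full_length = slice_length(0, 233312401, 1)
print(full_length)
```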
What's the difference between z4 as a netCDF4.Variable and as a
numpy.ndarray?
Now, if I use scipy.io.netcdf:
In [18]: zS = gebcoS.variables['z']
In [20]: type(zS); zS.shape
Out[20]: <class 'scipy.io.netcdf.netcdf_variable'>
Out[20]: (233312401,)
In [21]: zS = gebcoS.variables['z'][:]
In [22]: type(zS); zS.shape
Out[22]: <type 'numpy.ndarray'>
Out[22]: (233312401,)
What's the difference between zS as a scipy.io.netcdf.netcdf_variable and as
a numpy.ndarray?
Why do I not get a MemoryError with scipy.io.netcdf?
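One possible explanation, offered as an assumption rather than a confirmed diagnosis: scipy.io.netcdf can memory-map the file, so gebcoS.variables['z'][:] may return an array backed by the file instead of a fresh copy in RAM. The sketch below illustrates that general difference with numpy.memmap on a small temporary file; the file name z.bin and its contents are made up for the illustration and have nothing to do with GridOne.grd.

```python
import os
import tempfile

import numpy as np

# Write a small int16 file standing in for the 'z' variable (hypothetical data).
path = os.path.join(tempfile.mkdtemp(), "z.bin")
np.arange(1000, dtype=np.int16).tofile(path)

# Memory-mapped array: slicing gives a view backed by the file on disk,
# so nothing close to the full size has to be held in process memory.
z_map = np.memmap(path, dtype=np.int16, mode="r")
view = z_map[:100]

# Explicit copy: these 100 values now live in ordinary process memory.
z_copy = np.array(z_map[:100])

print(type(view).__name__, type(z_copy).__name__)
```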
Finally, if I do the following (maybe it is a silly thing to do), using Eric's
suggestions to clear the cache:
In [32]: zS = gebcoS.variables['z']
In [38]: timeit -n1 -r1 zSS = np.array(zS[:100000000]) # 100.000.000 out of 233.312.401 because I've got a MemoryError
1 loops, best of 1: 73.1 s per loop
(If I use a copy, timeit -n1 -r1 zSS = np.array(zS[:100000000], copy=True),
I get a MemoryError and I have to reduce the size to 50.000.000, but then it is
quite fast).
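Since slices of 10.000.000 items load without trouble, one way to work with the whole variable is to process it block by block. This is a minimal sketch, assuming the variable is one-dimensional and supports slicing and .shape (as both variable objects above do); iter_chunks and min_max are hypothetical helpers, and a small NumPy array stands in for the real data.

```python
import numpy as np


def iter_chunks(var, chunk_size=10000000):
    """Yield successive 1-D blocks of var as NumPy arrays."""
    n = var.shape[0]
    for start in range(0, n, chunk_size):
        yield np.asarray(var[start:start + chunk_size])


def min_max(var, chunk_size=10000000):
    """Minimum and maximum of var, holding one block in memory at a time."""
    lo, hi = None, None
    for block in iter_chunks(var, chunk_size):
        bmin, bmax = block.min(), block.max()
        lo = bmin if lo is None else min(lo, bmin)
        hi = bmax if hi is None else max(hi, bmax)
    return lo, hi


# Stand-in data (made up for the example); a netCDF variable would be
# passed in the same way.
demo = np.array([3, -7, 12, 0, 5, -2, 9, 1, 4, -11], dtype=np.int16)
print(min_max(demo, chunk_size=3))
```

The block size trades memory for the number of reads, so it may need tuning for a given machine.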
Thank you very much for your replies, and excuse me if some questions are very
basic.
Best regards.
***********************************************************************
The results of ncdump -h:
netcdf GridOne {
dimensions:
side = 2 ;
xysize = 233312401 ;
variables:
double x_range(side) ;
x_range:units = "user_x_unit" ;
double y_range(side) ;
y_range:units = "user_y_unit" ;
short z_range(side) ;
z_range:units = "user_z_unit" ;
double spacing(side) ;
short dimension(side) ;
short z(xysize) ;
z:scale_factor = 1. ;
z:add_offset = 0. ;
z:node_offset = 0 ;
// global attributes:
:title = "GEBCO One Minute Grid" ;
:source = "1.02" ;
}
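The header above is enough to estimate how much memory 'z' needs. This sketch multiplies the xysize dimension by the item size of a short (int16) and, for comparison, of a double (float64). Whether either library actually promotes the values to float64 here (for instance when applying scale_factor and add_offset) is an assumption, not something shown in the session above.

```python
import numpy as np

XYSIZE = 233312401  # 'xysize' dimension from the ncdump header


def nbytes(n_items, dtype):
    """Bytes needed to hold n_items values of the given dtype."""
    return n_items * np.dtype(dtype).itemsize


as_short = nbytes(XYSIZE, np.int16)     # 'z' as stored in the file
as_double = nbytes(XYSIZE, np.float64)  # if promoted to double (assumption)

print("int16:   %.0f MiB" % (as_short / 2.0 ** 20))
print("float64: %.2f GiB" % (as_double / 2.0 ** 30))
```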
The file is publicly available from:
http://www.gebco.net/data_and_products/gridded_bathymetry_data/