[Numpy-discussion] numpy.percentile multiple arrays
Tue Jan 24 18:55:48 CST 2012
This is probably not the best way to do it, but I think it would work:
You could take two passes through your data: first calculate and store
the median and the number of elements for each file. From those, you can
derive a lower bound on the 95th percentile of the combined dataset. For
example, if all the files are the same size and you've got 100 of them,
then the 95th percentile of the full dataset is at least as large as the
90th percentile of the individual file medians: each of the top 10 files
(by median) has at least half of its values above that cutoff, so at
least 5% of all values exceed it. Once you've got that cutoff value, go
back through your files and pull out only the values larger than it.
Then you just need to work out which rank within this subset corresponds
to the 95th percentile of the full dataset, which you can compute from
the file sizes recorded in the first pass.
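The two passes above can be sketched roughly as follows, with in-memory arrays standing in for the per-file data (reading each netCDF file would go where the loops over `chunks` are). The name `percentile_two_pass` is hypothetical, not from the thread, and this sketch returns the nearest-rank percentile rather than numpy.percentile's interpolated one:

```python
import numpy as np

def percentile_two_pass(chunks, q=95.0):
    # Pass 1: per-chunk medians and sizes (for files, this would read
    # each netCDF file once and keep only these two numbers per file).
    medians = np.array([np.median(c) for c in chunks])
    sizes = np.array([c.size for c in chunks])
    n_total = int(sizes.sum())

    # Lower bound on the q-th percentile, assuming equal-sized chunks
    # and a high q: each chunk in the top 2*(100 - q) percent of chunks
    # (ranked by median) has at least half of its values at or above
    # this cutoff, so at least (100 - q) percent of all values do.
    cutoff = np.percentile(medians, 100.0 - 2.0 * (100.0 - q))

    # Pass 2: keep only the tail of values at or above the cutoff.
    tail = np.concatenate([c[c >= cutoff] for c in chunks])
    tail.sort()

    # Nearest-rank percentile: the q-th percentile is the value with
    # 1-based rank ceil(q/100 * n_total) in the full sorted data; the
    # (n_total - tail.size) discarded values all rank below the tail.
    rank = int(np.ceil(q / 100.0 * n_total))
    return tail[rank - (n_total - tail.size) - 1]
```

Peak memory then scales with one file plus the retained tail (roughly the top 5% of the data for q=95), rather than with the full concatenated dataset.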
On Tue, Jan 24, 2012 at 7:22 PM, questions anon <firstname.lastname@example.org> wrote:
> I need some help understanding how to loop through many arrays to
> calculate the 95th percentile.
> I can easily do this by using numpy.concatenate to make one big array and
> then finding the 95th percentile using numpy.percentile, but this causes a
> memory error when I want to run this on hundreds of netcdf files (see code
> below). Any alternative methods will be greatly appreciated.
> for (path, dirs, files) in os.walk(MainFolder):
>     for dir in dirs:
>         print dir
>     for ncfile in files:
>         if ncfile[-3:]=='.nc':
>             print "dealing with ncfiles:", ncfile
>             ncfile=Dataset(ncfile, 'r+', 'NETCDF4')
> Percentile95th=N.percentile(big_array, 95, axis=0)