[Numpy-discussion] Help making better use of numpy array functions
Vincent Schut
schut@sarvision...
Thu Nov 26 07:27:44 CST 2009
mdekauwe wrote:
> Thanks...I have adopted that and as you said it is a lot neater! Though I
> need to keep the pixel count for a weighting in another piece of code so
> have amended your logic slightly.
Alternatively, you could simply take the sum over axis=0 of the weight
array to get the pixel count (e.g. "pixelcount=weight.sum(axis=0)"). I
don't know if performance is an issue, but I'd say that skipping these
numpy.where's and increments of count and running total, and use
numpy.average instead, should give you a speedup. Though the size of
your arrays might be too small for it to become very noticeable.
However, in your current version, it makes no sense to allocate an
8.nx.ny array to read the data, as you anyways calculate a running
total. The 8,nx,ny array only makes sense to postpone calculations till
you've read all 8 days, and only then calculate the average (and pixel
count). Currently you preserve the previous days of data, but you don't
use them anywhere anymore.
Either way will work, though, I guess. Choose what feels most naturally
for you, as that will help you maintain and understand your code later on.
Oh, and minor issue: creating a array of zeros and then multiplying with
-999 still makes an array of zeros... I'd incorporated an array of
*ones* multiplied with -999, because for the last chunk of days you
could end up with a 8day array only partly filled with real data. E.g.
if you'd have only 3 days of data left in your last chunk, 8dayData[0:3]
would be data, and to prevent the rest ([3:8]) to be incorporated into
the average calculation, these need to be -999. Currently these pixels
will be 0, which is > nodatavalue, and thus infuencing your average (the
pixelcount will be incremented for these 0 values).
Vincent.
>
> #!/usr/bin/env python
>
>
> import sys, os, glob
> import numpy as np
>
> def averageEightDays(filenamesList, numrows, numcols, year):
> doy = 122
> nodatavalue = -999.0
>
> for fchunk in filenamesList:
> # fill with nodata values, in case there are less than 8 days
> data8days = np.zeros((8, numrows, numcols), dtype=np.float32) * -999.0
> pixCount = np.zeros((numrows, numcols), dtype=np.int)
> avg8days = np.zeros((numrows, numcols), dtype=np.float32)
> for day in xrange(len(fchunk)):
> f = fchunk[day]
> data8days[day] = np.fromfile(f, dtype=np.float32).reshape(numrows,
> numcols)
> avg8days = np.where(data8days[day] > nodatavalue,
> avg8days+data8days[day], avg8days)
> pixCount = np.where(data8days[day] > nodatavalue, pixCount+1, pixCount)
>
> avg8days = np.where(pixCount > 0, avg8days / np.float32(pixCount), -999.0)
> doy += 8
> print year,':',doy
>
> outfile = "lst_8day1030am_" + str(year) + str(doy) + ".gra"
> write_outputs(outfile, avg8days)
> outfile = "pixelcount_8day1030am_" + str(year) + str(doy) + ".gra"
> write_outputs(outfile, pixCount)
>
> def write_outputs(outfile, data):
> opath = "/users/eow/mgdk/research/HOFF_plots/LST/tmp"
>
> try:
> of = open(os.path.join(opath, outfile), 'wb')
> except IOError:
> print "Cannot open outfile for write", outfile
> sys.exit(1)
>
> # empty stuff
> data.tofile(of)
> of.close()
>
>
> if __name__ == "__main__":
>
> numrows = 332
> numcols = 667
> path = "/users/eow/mgdk/research/HOFF_plots/LST/gridded_03/"
>
> files = 'lst_scr_2006*.gra'
> filenames = glob.glob(os.path.join(path, files))
> filenames.sort()
> filenamesList = [filenames[n:n+8] for n in xrange(0, len(filenames), 8)]
> averageEightDays(filenamesList, numrows, numcols, year=2006)
>
> files = 'lst_scr_2007*.gra'
> filenames = glob.glob(os.path.join(path, files))
> filenames.sort()
> filenamesList = [filenames[n:n+8] for n in xrange(0, len(filenames), 8)]
> averageEightDays(filenamesList, numrows, numcols, year=2007)
>
More information about the NumPy-Discussion
mailing list