[Numpy-discussion] Help making better use of numpy array functions

Vincent Schut schut@sarvision...
Thu Nov 26 07:27:44 CST 2009


mdekauwe wrote:
> Thanks...I have adopted that and as you said it is a lot neater! Though I
> need to keep the pixel count for a weighting in another piece of code so
> have amended your logic slightly.

Alternatively, you could simply take the sum over axis=0 of the weight 
array to get the pixel count (e.g. "pixelcount=weight.sum(axis=0)"). I 
don't know if performance is an issue, but I'd say that skipping these 
numpy.where's and increments of count and running total, and use 
numpy.average instead, should give you a speedup. Though the size of 
your arrays might be too small for it to become very noticeable.
However, in your current version, it makes no sense to allocate an 
8.nx.ny array to read the data, as you anyways calculate a running 
total. The 8,nx,ny array only makes sense to postpone calculations till 
you've read all 8 days, and only then calculate the average (and pixel 
count). Currently you preserve the previous days of data, but you don't 
use them anywhere anymore.
Either way will work, though, I guess. Choose what feels most naturally 
for you, as that will help you maintain and understand your code later on.

Oh, and minor issue: creating a array of zeros and then multiplying with 
-999 still makes an array of zeros... I'd incorporated an array of 
*ones* multiplied with -999, because for the last chunk of days you 
could end up with a 8day array only partly filled with real data. E.g. 
if you'd have only 3 days of data left in your last chunk, 8dayData[0:3] 
would be data, and to prevent the rest ([3:8]) to be incorporated into 
the average calculation, these need to be -999. Currently these pixels 
will be 0, which is > nodatavalue, and thus infuencing your average (the 
pixelcount will be incremented for these 0 values).


Vincent.
> 
> #!/usr/bin/env python
> 
> 
> import sys, os, glob
> import numpy as np
> 
> def averageEightDays(filenamesList, numrows, numcols, year):
> 	doy = 122
> 	nodatavalue =  -999.0
> 	
> 	for fchunk in filenamesList:
> 		# fill with nodata values, in case there are less than 8 days
> 		data8days = np.zeros((8, numrows, numcols), dtype=np.float32) * -999.0 
> 		pixCount = np.zeros((numrows, numcols), dtype=np.int)
> 		avg8days = np.zeros((numrows, numcols), dtype=np.float32)
> 		for day in xrange(len(fchunk)):
> 			f = fchunk[day]
> 			data8days[day] = np.fromfile(f, dtype=np.float32).reshape(numrows,
> numcols)
> 			avg8days = np.where(data8days[day] > nodatavalue,
> avg8days+data8days[day], avg8days)
> 			pixCount = np.where(data8days[day] > nodatavalue, pixCount+1, pixCount)
> 		
> 		avg8days = np.where(pixCount > 0, avg8days / np.float32(pixCount), -999.0)
> 		doy += 8
> 		print year,':',doy
> 		
> 		outfile = "lst_8day1030am_" + str(year) + str(doy) + ".gra"
> 		write_outputs(outfile, avg8days)
> 		outfile = "pixelcount_8day1030am_" + str(year) + str(doy) + ".gra"
> 		write_outputs(outfile, pixCount)
> 
> def write_outputs(outfile, data):
> 	opath = "/users/eow/mgdk/research/HOFF_plots/LST/tmp"
> 	
> 	try:
> 		of = open(os.path.join(opath, outfile), 'wb')
> 	except IOError:
> 		print "Cannot open outfile for write", outfile
> 		sys.exit(1)
> 	
> 	# empty stuff
> 	data.tofile(of)
> 	of.close()
> 	
> 	
> if __name__ == "__main__":
> 
> 	numrows = 332
> 	numcols = 667
> 	path = "/users/eow/mgdk/research/HOFF_plots/LST/gridded_03/"
> 	
> 	files = 'lst_scr_2006*.gra'
> 	filenames = glob.glob(os.path.join(path, files))
> 	filenames.sort()
> 	filenamesList = [filenames[n:n+8] for n in xrange(0, len(filenames), 8)]
> 	averageEightDays(filenamesList, numrows, numcols, year=2006)
> 	
> 	files = 'lst_scr_2007*.gra'
> 	filenames = glob.glob(os.path.join(path, files))
> 	filenames.sort()
> 	filenamesList = [filenames[n:n+8] for n in xrange(0, len(filenames), 8)]
> 	averageEightDays(filenamesList, numrows, numcols, year=2007)
> 



More information about the NumPy-Discussion mailing list