[Numpy-discussion] Help with np.where and datetime functions
John [H2O]
washakie@gmail....
Wed Jul 8 06:03:58 CDT 2009
Hello,
I have several issues which require me to iterate through a fairly large
array (300000+ records).
The first case is calculating an hourly average from irregularly sampled
data. The second is screening one array based on data in a second array.
The functions are defined below, but inherent to each is the following
snippet:
ind = np.where( (t1 < X[:,0]) & (X[:,0] < t2) )
where X is an (n,2) array and X[:,0] is a vector of datetime objects.
What I am trying to do (obviously?) is find all the values of X that fall
within a time range.
Specifically, one point I do not understand is why the following two methods
fail:
--> 196 ind = np.where( (t1 < Y[:,0] < t2) ) #same result with/without inner parens
TypeError: can't compare datetime.datetime to numpy.ndarray
OR trying the 'and' method:
--> 196 ind = np.where( (Y[:,0]>t1) and (Y[:,0]<t2) )
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
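The `and` failure can be reproduced in isolation. Python evaluates a chained comparison `t1 < a < t2` as `(t1 < a) and (a < t2)`, and `and` asks its first operand for a single truth value, which a multi-element array refuses to give. The sketch below is only an illustration: it assumes the times are held in a `datetime64` array (the arrays in this post hold `datetime.datetime` objects), and the sample values are made up.

```python
import numpy as np

# Hypothetical sample times, stored as datetime64 rather than datetime objects.
times = np.array(["2009-07-08T05:10", "2009-07-08T05:40", "2009-07-08T06:20"],
                 dtype="datetime64[m]")
t1 = np.datetime64("2009-07-08T05:00")
t2 = np.datetime64("2009-07-08T06:00")

# Elementwise AND: each comparison yields a boolean array, and & combines them.
mask = (t1 < times) & (times < t2)
ind = np.where(mask)[0]          # indices of the samples inside (t1, t2)

# `and` (and therefore the chained form) needs one truth value for the whole array.
try:
    (times > t1) and (times < t2)
    and_failed = False
except ValueError:
    and_failed = True
```

`np.logical_and(a, b)` is the function form of `a & b` for boolean arrays, if the operator precedence of `&` is a concern.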
Is there a way that I can use np.where more efficiently, say, to pass a
vector of dates to a function, and return all indexes where the array has
times within a certain range of those times?
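One possible way to handle a whole vector of query times at once is `np.searchsorted`, sketched here under two assumptions: the times can be converted to `datetime64`, and it is acceptable to sort the array being searched. For each query time it gives the start and stop positions of the window in the sorted array. The function name `window_bounds` and the sample data are hypothetical.

```python
import numpy as np

def window_bounds(y_times, x_times, half_width):
    """For each x time, locate the y times within +/- half_width (exclusive).

    Returns (order, start, stop); the y indices for x_times[i] are
    order[start[i]:stop[i]].
    """
    order = np.argsort(y_times)              # y need not be sorted on input
    y_sorted = y_times[order]
    # side="right" on the lower bound skips y == t - half_width (strict >);
    # side="left" on the upper bound stops before y == t + half_width (strict <).
    start = np.searchsorted(y_sorted, x_times - half_width, side="right")
    stop = np.searchsorted(y_sorted, x_times + half_width, side="left")
    return order, start, stop

y_times = np.array(["2009-07-08T06:30", "2009-07-08T05:00", "2009-07-08T05:50"],
                   dtype="datetime64[m]")
x_times = np.array(["2009-07-08T05:30", "2009-07-08T09:00"], dtype="datetime64[m]")
order, start, stop = window_bounds(y_times, x_times, np.timedelta64(35, "m"))
hits = [order[a:b] for a, b in zip(start, stop)]   # one index array per x time
```

The sort is done once, so each query costs a binary search instead of a full pass over the array as with a per-row `np.where`.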
I would also be interested in suggestions on how to improve/optimize the code
below. In particular, I assume there is a better way to create/build the new
arrays than using lists and append and then converting to np.array. How do I
set up the assignments?
Thank you!
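On the question of building the result without lists and append, one common pattern is to allocate the output once and assign into it by index. This is a minimal sketch, assuming the number of rows is known before the loop starts; the start hour, the bin count, and the placeholder values are hypothetical.

```python
import datetime as dt
import numpy as np

start = dt.datetime(2009, 7, 8, 5)          # hypothetical first hour
n_hours = 3                                  # hypothetical number of hourly bins

# Allocate once; object dtype lets column 0 hold datetimes and column 1 floats,
# matching the (n, 2) layout described for X.
X_hr = np.empty((n_hours, 2), dtype=object)
for k in range(n_hours):
    X_hr[k, 0] = start + dt.timedelta(hours=k)   # bin start time
    X_hr[k, 1] = np.nan                          # placeholder until the average is computed
```

When the final row count is not known in advance, as in the screening case where rows are kept or dropped, building a boolean keep-mask and indexing with it is an alternative to preallocation.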
import datetime as dt
import numpy as np

def calc_hravg(X):
    """Calculates hourly average from input data"""
    X_hr = []
    minT = X[:,0].min() #array is not necessarily sorted
    maxT = dt.datetime(*X[:,0].max().timetuple()[0:4])
    minT = dt.datetime(*minT.timetuple()[0:4]) #truncate the time to the HOUR
    t1 = minT
    while t1 <= maxT:
        t2 = t1 + dt.timedelta(hours=1)
        ind = np.where( (t1 < X[:,0]) & (X[:,0] < t2) )
        vals = X[ind,1][0].T
        try:
            #hr_avg = np.sum(vals) / len(vals)
            hr_avg = np.average(vals)
        except Exception:
            hr_avg = np.nan
        #assumes each bin is labelled by its start time t1 (hr was never defined)
        X_hr.append([t1,hr_avg])
        t1 = t2
    return np.array(X_hr)
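A loop-free variant of the hourly average can be sketched by truncating every timestamp to its hour and grouping on the result. This is an illustration under stated assumptions, not a drop-in replacement: it assumes the times and values are held in two separate arrays, with the times as `datetime64`, and the name `hourly_mean` and the sample data are hypothetical.

```python
import numpy as np

def hourly_mean(times, values):
    """Mean of `values` grouped by the hour each timestamp falls in.

    Returns (hours, means) for the hours that contain at least one sample.
    """
    hours = times.astype("datetime64[h]")            # truncate each time to its hour
    uniq, inverse = np.unique(hours, return_inverse=True)
    sums = np.bincount(inverse, weights=values)      # per-hour sum
    counts = np.bincount(inverse)                    # per-hour sample count
    return uniq, sums / counts

times = np.array(["2009-07-08T05:10", "2009-07-08T06:20", "2009-07-08T05:40"],
                 dtype="datetime64[m]")
values = np.array([1.0, 10.0, 3.0])
hours, means = hourly_mean(times, values)
```

Two differences from calc_hravg are worth noting: a sample that falls exactly on the hour is counted in the hour it starts, whereas the strict inequalities in calc_hravg exclude it, and hours with no samples are omitted instead of being reported as nan.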

def screen_xfory(X,Y,rng=[(248,360),(0,111)]):
    """ screens data in X for criteria (within) range in Y
    where rng is a list of low/high tuples
    assumes 2-d arrays of x,y pairs, screening on y"""
    newX = []
    for i in range(len(X)):
        # define a 70 minute range of time to find data within:
        t1 = X[i,0] - dt.timedelta(minutes=35)
        t2 = X[i,0] + dt.timedelta(minutes=35)
        ind = np.where( (Y[:,0]>t1) & (Y[:,0]<t2) )
        #ind = np.where( (Y[:,0]>t1) and (Y[:,0]<t2) )
        #ind = np.where( (t1 < Y[:,0] < t2) )
        screen_vals = Y[ind,1][0]
        dflag = True
        #test the length: the truth value of a multi-element array is ambiguous
        if len(screen_vals) > 0:
            if dflag != True:
                break
            for r in rng:
                low = r[0]
                high = r[1]
                for s in screen_vals:
                    if s != 999.0:
                        if s > low:
                            if s < high:
                                dflag = False
                    else:
                        print('MISSING')
        else:
            print('no data available')
            dflag = False
        if dflag:
            print('''%s ::: data: %s, OKAY''' % (X[i,0],screen_vals))
            newX.append([X[i,0],X[i,1]])
        else:
            print('''%s ::: BAD: %s''' % (X[i,0],screen_vals))
    return np.array(newX)