[Numpy-discussion] Help with np.where and datetime functions

John [H2O] washakie@gmail....
Wed Jul 8 06:03:58 CDT 2009


Hello,

I have several issues which require me to iterate through a fairly large
array (300000+ records).

The first case is calculating an hourly average from irregularly sampled
data. The second is screening one array based on data in a second array.
The functions are defined below, but inherent to each is the following
snippet:
    
    ind = np.where( (t1 < X[:,0]) & (X[:,0] < t2) )

where X is an (n,2) array and X[:,0] is a vector of datetime objects.
    
What I am trying to do (obviously?) is find all the values of X that fall
within a time range.

Specifically, one point I do not understand is why the following two methods
fail:
    
--> 196         ind = np.where( (t1 < Y[:,0] < t2) ) #same result with/without inner parens
TypeError: can't compare datetime.datetime to numpy.ndarray


OR trying the 'and' method:

--> 196         ind = np.where( (Y[:,0]>t1) and (Y[:,0]<t2) )
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all() 
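
For reference, here is a self-contained reproduction of the three forms. The timestamps are made up and try_form is just a helper name invented for this illustration. Depending on the numpy version, the chained form may raise either the TypeError shown above or the same ValueError as the 'and' form, so the sketch accepts both:

```python
import datetime as dt
import numpy as np

# Made-up sample data: an object array of (datetime, value) rows.
Y = np.array([[dt.datetime(2009, 7, 8, h), float(h)] for h in range(6)],
             dtype=object)
t1 = dt.datetime(2009, 7, 8, 1)
t2 = dt.datetime(2009, 7, 8, 4)

# Elementwise form: each comparison yields a boolean array and & combines them.
ind = np.where((t1 < Y[:, 0]) & (Y[:, 0] < t2))


def try_form(build):
    """Return the name of the exception raised by build(), or None."""
    try:
        build()
    except (TypeError, ValueError) as err:
        return type(err).__name__
    return None


# Chained comparison: Python expands it to (t1 < Y) and (Y < t2).
chained_error = try_form(lambda: np.where(t1 < Y[:, 0] < t2))

# Explicit 'and': asks for the truth value of a whole boolean array.
and_error = try_form(lambda: np.where((Y[:, 0] > t1) and (Y[:, 0] < t2)))
```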
    
Is there a way that I can use np.where more efficiently, say, to pass a
vector of dates to a function, and return all indexes where the array has
times within a certain range of those times?    
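
To make the question concrete, this is roughly the kind of function I have in mind. It is only a sketch under assumptions: it assumes the datetime objects can be converted to numpy's datetime64 type, and the name indices_within and its parameters are invented here. It sorts the times once and then uses np.searchsorted to locate each window, instead of scanning the whole array per window:

```python
import datetime as dt
import numpy as np


def indices_within(times, centers, half_width):
    """For each center, return the indices of `times` strictly inside
    (center - half_width, center + half_width)."""
    times = np.asarray(times, dtype="datetime64[us]")
    centers = np.asarray(centers, dtype="datetime64[us]")
    order = np.argsort(times, kind="stable")   # input need not be sorted
    sorted_times = times[order]
    # side="right": first position whose value is > lower bound.
    lo = np.searchsorted(sorted_times, centers - half_width, side="right")
    # side="left": first position whose value is >= upper bound.
    hi = np.searchsorted(sorted_times, centers + half_width, side="left")
    # Map positions in the sorted array back to original indices.
    return [order[a:b] for a, b in zip(lo, hi)]


# Made-up example: one window of +/- 35 minutes around 10:00.
times = [dt.datetime(2009, 7, 8, 10, 0), dt.datetime(2009, 7, 8, 10, 20),
         dt.datetime(2009, 7, 8, 11, 30), dt.datetime(2009, 7, 8, 9, 40)]
centers = [dt.datetime(2009, 7, 8, 10, 0)]
result = indices_within(times, centers, np.timedelta64(35, "m"))
```

The returned indices refer to the original (unsorted) array, so they could be used the same way as the output of np.where.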

I would be interested in suggestions on how to improve/optimize the code
below. In particular, I assume there is a better way to build the new
arrays than appending to lists and converting to np.array at the end. How do
I set up the assignments?
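
As far as I can tell, the two usual patterns would look something like the sketch below. The data and the variable names are made up for illustration: preallocate and assign by index when the output size is known, or fill a boolean mask and index once at the end when it is not.

```python
import numpy as np

# Made-up rows standing in for the real data.
X = np.array([[0.0, 10.0], [1.0, 999.0], [2.0, 30.0]])

# Case 1: output size known up front -> allocate once, assign by index.
out = np.full((len(X), 2), np.nan)   # unfilled rows would stay NaN
for i in range(len(X)):
    out[i, 0] = X[i, 0]
    out[i, 1] = X[i, 1] * 2.0

# Case 2: number of kept rows unknown -> record each decision in a mask,
# then select all kept rows with a single indexing operation.
keep = np.zeros(len(X), dtype=bool)
for i in range(len(X)):
    keep[i] = X[i, 1] != 999.0
newX = X[keep]
```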

Thank you!

    
    
import datetime as dt
import numpy as np

def calc_hravg(X):
    """Calculates hourly average from input data"""

    X_hr = []
    minT = X[:,0].min() #array is not necessarily sorted
    maxT = dt.datetime(*X[:,0].max().timetuple()[0:4])
    minT = dt.datetime(*minT.timetuple()[0:4]) #truncate down to the start of the HOUR
    t1 = minT 
    while t1 <= maxT:
        t2 = t1 + dt.timedelta(hours=1)
        ind = np.where( (t1 < X[:,0]) & (X[:,0] < t2) )
        vals = X[ind,1][0].T
        try:
            #hr_avg = np.sum(vals) / len(vals)
            hr_avg = np.average(vals)

        except:
            hr_avg = np.nan
        X_hr.append([t1,hr_avg]) #label each average with the start of its hour
        t1 = t2
    
    return np.array(X_hr)
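
A possible loop-free variant of the hourly average, sketched under assumptions: it assumes the times convert cleanly to datetime64, the name hourly_mean is invented here, and unlike calc_hravg it only returns hours that actually contain data (no NaN rows) and uses bins that include the start of the hour.

```python
import datetime as dt
import numpy as np


def hourly_mean(times, values):
    """Return (hours, means): one mean per hour that contains samples."""
    t = np.asarray(times, dtype="datetime64[us]")
    v = np.asarray(values, dtype=float)
    hours = t.astype("datetime64[h]")            # truncate to start of hour
    uniq, inverse = np.unique(hours, return_inverse=True)
    sums = np.bincount(inverse, weights=v)       # per-hour sum of values
    counts = np.bincount(inverse)                # per-hour sample count
    return uniq, sums / counts


# Made-up, unsorted samples.
times = [dt.datetime(2009, 7, 8, 11, 30), dt.datetime(2009, 7, 8, 10, 5),
         dt.datetime(2009, 7, 8, 10, 55)]
values = [10.0, 1.0, 3.0]
hours, means = hourly_mean(times, values)
```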
    
    
    
def screen_xfory(X,Y,rng=[(248,360),(0,111)]):
    """ screens data in X for criteria (within) range in Y
    where rng is a list of low/high tuples
    
    assumes 2-d arrays of x,y pairs, screening on y"""
    newX = []
    for i in range(len(X)):
        # define a 70 minute range of time to find data within:
        t1 = X[i,0] - dt.timedelta(minutes=35)
        t2 = X[i,0] + dt.timedelta(minutes=35)
        ind = np.where( (Y[:,0]>t1) & (Y[:,0]<t2) )
        #ind = np.where( (Y[:,0]>t1) and (Y[:,0]<t2) )
        #ind = np.where( (t1 < Y[:,0] < t2) )
        screen_vals = Y[ind,1][0]
        dflag = True
        if len(screen_vals) > 0: #truth value of a multi-element array is ambiguous
            if dflag != True:
                break
            for r in rng:
                low = r[0]
                high = r[1]
                for s in screen_vals:
                    if s != 999.0:
                        if s > low:
                            if s < high:
                                dflag = False
                    else:
                        print('MISSING')
        else:
            print('no data available')
            dflag = False
        if dflag:
            print('''%s ::: data: %s, OKAY''' % (X[i,0],screen_vals))
            newX.append([X[i,0],X[i,1]])
        else:
            print('''%s ::: BAD:  %s''' % (X[i,0],screen_vals))
        
    return np.array(newX)
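
The nested loops over rng and screen_vals could probably be collapsed into a single broadcast comparison. A sketch only: any_in_ranges and its missing parameter are names invented here, it drops the 'MISSING' printout, and it is meant as a stand-in for the `for r in rng` block (dflag would then be `not any_in_ranges(screen_vals, rng)`).

```python
import numpy as np


def any_in_ranges(vals, rng, missing=999.0):
    """True if any non-missing value lies strictly inside any (low, high)."""
    v = np.asarray(vals, dtype=float)
    v = v[v != missing]                               # drop missing markers
    bounds = np.asarray(rng, dtype=float).reshape(-1, 2)
    low = bounds[:, 0][:, None]                       # shape (k, 1)
    high = bounds[:, 1][:, None]                      # shape (k, 1)
    inside = (v[None, :] > low) & (v[None, :] < high) # shape (k, m)
    return bool(inside.any())


rng = [(248, 360), (0, 111)]
flag_a = any_in_ranges([250.0, 999.0], rng)   # 250 is inside (248, 360)
flag_b = any_in_ranges([200.0, 999.0], rng)   # 200 is outside both ranges
flag_c = any_in_ranges([], rng)               # no data at all
```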
-- 
View this message in context: http://www.nabble.com/Help-with-np.where-and-datetime-functions-tp24389447p24389447.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.


