[SciPy-User] Masking multiple fields in a structured timeseries object.

Pierre GM pgmdevlist@gmail....
Fri Jan 8 12:20:42 CST 2010


On Jan 8, 2010, at 11:16 AM, Dharhas Pothina wrote:
> Hi,
> 
> I have a structured time series object I have read in from a file. I am providing my script the following parameters
> filename, startdate, enddate, parameter (All, Salinity, Temp., etc), Max , Min, Instrument type (2 digit code contained in the filename).
> 
> My timeseries is structured like :
> 
> timeseries([ ('JOB_20090812_CXT_MW9999.csv', 0, --, --, --, --, --, 22.0, 13.199999999999999, 28.949999999999999, --, 0.39928999999999998, --, --)
....
> ('JOB_20090812_CXT_MW9999.csv', 0, --, --, --, --, 3.5899999999999999, 25.899999999999999, 15.699999999999999, 25.859999999999999, --, 1.6398200000000001, --, --)],
>   dtype = [('Filename', '|S27'), ('Year', '<i8'), ('Month', '<f8'), ('Day', '<f8'), ('Hour', '<f8'), ('Minute', '<f8'), ('Second', '<f8'), ('DOSat', '<f8'), ('AirPressure', '<f8'), ('AirTemperature', '<f8'), ('BatteryVoltage', '<f8'), ('DO', '<f8'), ('EC_Norm', '<f8'), ('Salinity', '<f8')],
>   dates = [11-Jun-1996 21:00 11-Jun-1996 22:00 11-Jun-1996 23:00 ...,
> 05-Oct-2000 09:00 05-Oct-2000 10:00 05-Oct-2000 11:00],
>   freq  = T)
> 
> 
> 
> I want to mask the data in the following way:
> 
> Mask all values between start & end dates that meet the following criteria:
> 
> 1) selected parameter (mask all if blank) 
> 2) selected filename (mask all if blank)
> 3) selected instrument (mask all if blank). Note the instrument is the 18 & 19 character in the filename, ie 'MW' in the example above.
> 4) parameter value lies between the given max and min values.
> 
> I'm having trouble working out how to check all these conditions at once or sequentially before masking.


Step by step, it's gonna be easier to debug.
Take the simpler example:
>>> import numpy as np
>>> import numpy.ma as ma
>>> import scikits.timeseries as ts
>>> ndtype=[('name','|S3'),('v1',float),('v2',float)]
>>> series=ts.time_series([("ABC",1.1,10.),("ABD",2.2,20.),("ABE",3.3,30)],
...                       dtype=ndtype, start_date=ts.now('D'))
>>> _series=series.series

_series is just the underlying masked array, which keeps things nice and easy (no need to carry the dates around).

Mask a record (viz, a full row) if v2>25
>>> series[_series['v2']>25]=ma.masked

Mask a record if the last character of the name is "C". This one is trickier, as we need to test whether the field 'name' is masked
>>> maskonnames = []
>>> for _ in _series['name']:
...     if _ is ma.masked:
...         maskonnames.append(False)
...     else:
...         maskonnames.append(_[-1]=='C')
...
>>> series[np.array(maskonnames)] = ma.masked

(maskonnames is a list that we need to transform into a bool ndarray to get fancy indexing. Otherwise, we would just take the first or second record, depending on whether each entry of maskonnames is False (0) or True (1), and that's not what we want.)
So far, we have:
>>> series
timeseries([(--, --, --) ('ABD', 2.2000000000000002, 20.0) (--, --, --)],
   dtype = [('name', '|S3'), ('v1', '<f8'), ('v2', '<f8')],
   dates = [08-Jan-2010 ... 10-Jan-2010],
   freq  = D)
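
For longer series, the loop over names can be replaced by a vectorized test. The following is only a sketch under assumptions: it uses a plain numpy.ma structured array instead of a timeseries object, a unicode 'U3' field instead of '|S3', and made-up records in which the second name is already masked.

```python
import numpy as np
import numpy.ma as ma

# Illustrative data only; the second record starts with a masked name.
ndtype = [('name', 'U3'), ('v1', float), ('v2', float)]
data = ma.array([("ABC", 1.1, 10.), ("XYC", 2.2, 20.), ("ABE", 3.3, 30.)],
                mask=[(False, False, False),
                      (True, False, False),
                      (False, False, False)],
                dtype=ndtype)

names = data['name']
# Test the raw strings in one call instead of looping over them...
ends_with_c = np.char.endswith(ma.getdata(names), 'C')
# ...then drop the entries whose name was masked to begin with.
ends_with_c &= ~ma.getmaskarray(names)

# ends_with_c is already a bool ndarray, so it can index directly.
data[ends_with_c] = ma.masked
```

Only the first record ends up fully masked here: the second name also ends in "C", but it was masked beforehand, so the guard leaves that record alone.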

Now mask v1 if v1 < 3
>>> _series['v1'][_series['v1']<3]=ma.masked
>>> series
timeseries([(--, --, --) ('ABD', --, 20.0) (--, --, --)],
   dtype = [('name', '|S3'), ('v1', '<f8'), ('v2', '<f8')],
   dates = [08-Jan-2010 ... 10-Jan-2010],
   freq  = D)
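
The difference between masking a whole record and masking a single field can be reproduced without the timeseries package. This is a minimal sketch, assuming a plain numpy.ma structured array with the same three records (and 'U3' rather than '|S3' for the names):

```python
import numpy as np
import numpy.ma as ma

ndtype = [('name', 'U3'), ('v1', float), ('v2', float)]
arr = ma.array([("ABC", 1.1, 10.), ("ABD", 2.2, 20.), ("ABE", 3.3, 30.)],
               dtype=ndtype)

# Record level: indexing the array itself masks every field of the row.
arr[ma.filled(arr['v2'] > 25, False)] = ma.masked

# Field level: indexing one field masks only that field.
# filled(False) makes already-masked values count as "no match".
low_v1 = ma.filled(arr['v1'] < 3, False)
arr['v1'][low_v1] = ma.masked

v1_mask = ma.getmaskarray(arr['v1'])
v2_mask = ma.getmaskarray(arr['v2'])
```

After both steps, 'v1' is masked everywhere (two entries from the field-level test, one from the record-level test), while 'v2' is masked only in the last record.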

Then of course, you can add extra conditions.

Another approach is to create a global_condition bool array, OR each condition into it, and mask once at the end:
>>> global_condition = np.zeros(len(series), dtype=bool)
>>> global_condition |= (_series['v2']>25).filled(False)
>>> global_condition |= np.array(maskonnames)
>>> series[global_condition]=ma.masked
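
Applied to the original question, the same idea might look like the sketch below. Everything in it is an assumption made for illustration: build_condition is a hypothetical helper, the dates live in a separate datetime64 array instead of a timeseries object, the conditions are combined with AND because the question asks for records meeting all criteria, and the records (including the 'YS' instrument code) are made up.

```python
import numpy as np
import numpy.ma as ma

def build_condition(dates, records, start, end, field,
                    vmin, vmax, instrument=""):
    """Hypothetical helper: bool ndarray, True for entries to mask."""
    # 1) restrict to the requested date window (inclusive)
    cond = (dates >= start) & (dates <= end)
    # 2) optional instrument filter: characters 18 and 19 of the filename
    if instrument:
        filenames = ma.getdata(records['Filename'])
        codes = np.array([fn[17:19] for fn in filenames])
        cond &= (codes == instrument)
    # 3) value test; filled(False) keeps already-masked values out
    values = records[field]
    cond &= ma.filled((values >= vmin) & (values <= vmax), False)
    return cond

# Made-up records, for illustration only.
dtype = [('Filename', 'U27'), ('Salinity', float), ('DO', float)]
records = ma.array([('JOB_20090812_CXT_MW9999.csv', 22.0, 0.4),
                    ('JOB_20090812_CXT_MW9999.csv', 25.9, 1.6),
                    ('JOB_20090812_CXT_YS9999.csv', 26.0, 1.7),
                    ('JOB_20090812_CXT_MW9999.csv', 27.0, 1.8)],
                   dtype=dtype)
dates = np.array(['1996-06-11T21:00', '1996-06-11T22:00',
                  '1996-06-11T23:00', '1996-06-12T00:00'],
                 dtype='datetime64[m]')

cond = build_condition(dates, records,
                       np.datetime64('1996-06-11T21:00'),
                       np.datetime64('1996-06-11T23:00'),
                       'Salinity', 25.0, 30.0, instrument='MW')

# Mask only the selected parameter; use records[cond] = ma.masked
# instead to mask the whole record when no parameter is given.
records['Salinity'][cond] = ma.masked
```

Only the second record satisfies all of the conditions: the first fails the value test, the third has a different instrument code, and the fourth falls outside the date window.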

HIH
P.

