[SciPy-user] A first proposal for dataset organization

Matt Knox mattknox_ca@hotmail....
Thu Sep 20 15:31:41 CDT 2007


> David (Huard) already highlighted one problem with my proposal (time 
> series representation). I would really be interested in comments about 
> using MaskedArrays to handle missing data (I've never used it myself), 
> and the use of record arrays for the data; for example, I can see cases 
> where record arrays may be a problem (if all your data are homogeneous, 
> you cannot treat the data as a big numpy array), but I don't know if 
> this is significant.

Well, the sandbox already has tools that cover both of these cases. The 
new maskedarray implementation in the sandbox has a "MaskedRecords" class, 
which allows for missing values in record arrays. The timeseries package 
handles time series of various frequencies, and its series class is a 
subclass of MaskedArray, so it handles missing values as well. There is also 
a "TimeSeriesRecords" class, which is a subclass of "MaskedRecords". This 
would probably be a really nice way to represent a lot of this data, but it 
is hard to say when (or if) these packages will move out of the sandbox and 
into the core numpy/scipy distribution.
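
To make the idea of missing values in record-style data concrete, here is a 
minimal sketch. It uses the numpy.ma module that ships with current NumPy, 
not the sandbox maskedarray or timeseries packages discussed above, and the 
field names and values are made up purely for illustration.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical dataset: each record has a float field and an int field.
# The second temperature reading is missing (stored as a placeholder).
records = ma.array(
    [(20.5, 1), (-999.0, 2), (22.1, 3)],
    mask=[(False, False), (True, False), (False, False)],
    dtype=[("temperature", float), ("station", int)],
)

# Each field comes back as a MaskedArray carrying its own mask.
temps = records["temperature"]

# Reductions skip masked entries, so the placeholder never leaks in.
n_valid = temps.count()
mean_temp = temps.mean()

print(n_valid, mean_temp)
```

The mask is kept per field, so a record can have one value missing while 
its other fields stay usable.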

If you have specific questions about the maskedarray or timeseries module, or 
the current numpy.ma module, start up a new thread and I'll answer what I can, 
and I'm sure others can fill in any gaps.

- Matt


