[SciPy-user] A first proposal for dataset organization
Thu Sep 20 15:31:41 CDT 2007
> David (Huard) already highlighted one problem with my proposal (time
> series representation). I would really be interested in comments about
> using MaskedArrays to handle missing data (I've never used it myself),
> and the use of record arrays for the data; for example, I can see cases
> where record arrays may be a problem (if all your data are homogenous,
> you cannot treat the data as a big numpy array), but I don't know if
> this is significant.
Well, there are tools in the sandbox that handle all this kind of stuff. The
new maskedarray implementation in the sandbox has a "MaskedRecords" class
which allows for missing values in record arrays. The timeseries package
handles time series of various frequencies; its series class is a subclass of
MaskedArray, so it handles missing values too. There is also a "TimeSeriesRecords"
class which is a subclass of the "MaskedRecords" class. This would probably be
a really nice way to represent a lot of this data, but it is hard to say
when/if this stuff will move out of the sandbox and into the core numpy/scipy.
If you have specific questions about the maskedarray or timeseries module, or
the current numpy.ma module, start up a new thread and I'll answer what I can,
and I'm sure others can fill in any gaps.
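
To give a rough idea of what masked arrays look like for missing data, here is a minimal sketch. It uses the numpy.ma module as it ships in current numpy, not the sandbox maskedarray or timeseries packages, so the sandbox API may differ. The station/temp fields and all the values are made up purely for illustration.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical measurements; NaN marks a missing observation.
raw = np.array([1.0, 2.0, np.nan, 4.0])

# Mask the invalid entries so that reductions skip them.
temps = ma.masked_invalid(raw)

mean_temp = temps.mean()                 # averages only the unmasked values
n_missing = int(ma.count_masked(temps))  # number of masked entries
filled = temps.filled(-999.0)            # plain ndarray with a sentinel value

# A record-style (structured) masked array: each field carries its own mask.
# The field names "station" and "temp" are illustrative assumptions.
records = ma.array(
    [(1, 2.5), (2, 0.0)],
    dtype=[("station", int), ("temp", float)],
    mask=[(False, False), (False, True)],  # second temp is missing
)
temp_mean = records["temp"].mean()       # ignores the masked second row
```

Pulling a single field out of the structured array gives an ordinary homogeneous masked array, which is one way around the "cannot treat the data as a big numpy array" concern, at least field by field.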