[SciPy-user] A first proposal for dataset organization

David Cournapeau david@ar.media.kyoto-u.ac...
Mon Sep 24 00:23:25 CDT 2007

Matt Knox wrote:
>> David (Huard) already highlighted one problem with my proposal (time 
>> series representation). I would really be interested in comments about 
>> using MaskedArrays to handle missing data (I've never used it myself), 
>> and the use of record arrays for the data; for example, I can see cases 
>> where record arrays may be a problem (even if all your data are homogeneous, 
>> you cannot treat the data as a big numpy array), but I don't know if 
>> this is significant.
> Well, there are tools in the sandbox that handle all this kind of stuff. The 
> new maskedarray implementation in the sandbox has a "MaskedRecords" class 
> which allows for missing values in record arrays. The timeseries package 
> handles time series of various frequencies, and is a subclass of MaskedArray 
> so it handles missing values too. There is also a "TimeSeriesRecords" 
> class which is a subclass of the "MaskedRecords" class. This would probably be 
> a really nice way to represent a lot of this data, but it is hard to say 
> when/if this stuff will move out of the sandbox and into the core numpy/scipy 
> distribution.
This sounds great. I am a bit worried about depending on sandboxed 
packages, though. My understanding, although I did not follow the 
discussion in detail, was that the new MaskedArray implementation would 
replace the current one in numpy, right?
> If you have specific questions about the maskedarray or timeseries module, or 
> the current numpy.ma module, start up a new thread and I'll answer what I can, 
> and I'm sure others can fill in any gaps.
Ok, I will take a look at those, because I am totally unfamiliar with them.
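
For reference, here is a minimal sketch of the two concerns raised in this thread: a record array cannot be used directly as one big 2-D array even when every field has the same type, and a masked array can stand in for missing values. It assumes the present-day numpy.ma and numpy.lib.recfunctions APIs rather than the sandbox maskedarray package discussed here, and the field names and the -999.0 "missing" sentinel are made up for illustration.

```python
import numpy as np
import numpy.ma as ma
from numpy.lib import recfunctions as rfn

# Hypothetical dataset: three observations with two float fields.
# -999.0 is an assumed sentinel for "missing", not a numpy convention.
records = np.array(
    [(1.0, 10.0), (2.0, -999.0), (3.0, 30.0)],
    dtype=[("x", "f8"), ("y", "f8")],
)

# A record array is one-dimensional: each element is a whole record,
# so it is not a (3, 2) array even though both fields are float64.
print(records.shape)

# With homogeneous fields, an explicit conversion gives a plain 2-D array.
plain = rfn.structured_to_unstructured(records)
print(plain.shape)

# Missing data: mask the sentinel so that reductions skip it.
y = ma.masked_values(records["y"], -999.0)
print(y.count(), y.mean())
```

The conversion step is the cost mentioned above: it is cheap to write, but code that expects a plain array has to do it explicitly.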


