[Numpy-discussion] missing data discussion round 2
Dag Sverre Seljebotn
Wed Jun 29 01:53:43 CDT 2011
On 06/28/2011 11:52 PM, Matthew Brett wrote:
> On Tue, Jun 28, 2011 at 5:38 PM, Charles R Harris
> <email@example.com> wrote:
>> Nathaniel, an implementation using masks will look *exactly* like an
>> implementation using na-dtypes from the user's point of view. Except that
>> taking a masked view of an unmasked array allows ignoring values without
>> destroying or copying the original data. The only downside I can see to an
>> implementation using masks is memory and disk storage, and perhaps memory
>> mapped arrays. And I rather expect the former to solve itself in a few
>> years, eight gigs is becoming a baseline for workstations and in a couple of
>> years I expect that to be up around 16-32, and a few years after that.... In
>> any case we are talking 12% - 25% overhead, and in practice I expect it
>> won't be quite as big a problem as folks project.
> Or, in the case of 16 bit integers, 50% memory overhead.
> I honestly find it hard to believe that I will not care about memory
> use in the near future, and I don't think it's wise to make decisions
> on that assumption.
In many sciences, waiting for the future makes things worse, not better,
simply because the amount of available data easily grows at a faster
rate than the amount of memory you can get per dollar :-)
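
As a rough illustration of the figures quoted above: the 12% - 25% and 50% numbers fall out directly if one assumes a mask costing one byte per element, as with a plain boolean array. That layout is an assumption here, since the thread does not fix how a mask would be stored, and the helper name `mask_overhead` is made up for this sketch.

```python
import numpy as np


def mask_overhead(dtype, mask_bytes=1):
    """Extra memory for a mask, as a fraction of the data's own size.

    Assumes `mask_bytes` bytes of mask per array element (hypothetical
    layout; one byte matches a plain boolean mask array).
    """
    return mask_bytes / np.dtype(dtype).itemsize


# Relative overhead for a few common element types.
for name in ("float64", "int32", "int16"):
    print(f"{name}: {mask_overhead(name):.1%}")

# Cross-check against real arrays: int16 data with a boolean mask.
data = np.zeros(1000, dtype=np.int16)
mask = np.zeros(data.shape, dtype=bool)
measured = mask.nbytes / data.nbytes
```

Under the same assumption, a bit-packed mask (`mask_bytes=1/8`) would cut each figure by a factor of eight, at the cost of more work to read and write individual mask entries.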