[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Thu Jun 30 09:50:53 CDT 2011
On Wed, Jun 29, 2011 at 1:51 PM, Lluís <firstname.lastname@example.org> wrote:
> Mark Wiebe writes:
> > I think that deciding on the value of NA signal values boils down to
> > this question: should 3rd party code be able to interpret missing
> > information stored in the separate mask array?
> > I'm tossing around some variations of ideas using the iterator to
> > provide a buffered mask-based interface that works uniformly with both
> > masked arrays and NA dtypes. This way 3rd party C code only needs to
> > implement one missing data mechanism to fully support both of NumPy's
> > missing data mechanisms.
> Nice. If non-numpy C code is bound to see it as an array (i.e., _always_
> oblivious to the mask concept), then you should probably do what I said
> about "(un)merging" the bit pattern and mask-based NAs, but in this case
> it can be done on each block given by the iteration window.
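
As a rough illustration of the block-wise "(un)merging" idea quoted above, here is a minimal Python sketch. It is not NumPy API: the helper names (merge_block, unmerge_block, iter_merged) are hypothetical, NaN merely stands in for an NA bit pattern, and the boolean array `valid` (True means the element is present) is an assumed mask convention.

```python
import numpy as np

def merge_block(data, valid):
    """Return a copy of one block with NaN written wherever `valid` is False."""
    out = data.astype(np.float64, copy=True)
    out[~valid] = np.nan  # NaN plays the role of the NA bit pattern (assumption)
    return out

def unmerge_block(merged):
    """Split a merged block back into (data, valid); hidden slots get 0.0."""
    valid = ~np.isnan(merged)
    data = np.where(valid, merged, 0.0)
    return data, valid

def iter_merged(data, valid, block=4):
    """Yield merged blocks one window at a time instead of copying everything."""
    for start in range(0, data.shape[0], block):
        stop = start + block
        yield merge_block(data[start:stop], valid[start:stop])

# Hypothetical 1-D example: elements 2 and 7 are missing.
data = np.arange(10, dtype=np.float64)
valid = np.ones(10, dtype=bool)
valid[[2, 7]] = False

merged = np.concatenate(list(iter_merged(data, valid)))
data2, valid2 = unmerge_block(merged)
```

A consumer that only understands bit patterns would see `merged`, while mask-aware code could work with `data2` and `valid2`; only one window's worth of merged data needs to exist at a time.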
My hands are a little bit tied because of ABI compatibility, but I'm
thinking of ways I can cause 3rd party C code to fail if it doesn't ask for
the data with the mask when it's masked.
> There's still the possibility of giving a finer granularity interface
> where both are explicitly accessed, but this will probably add yet
> another set of API functions (although the merging interface can be
> implemented on top of this explicit raw iteration interface).
Things should be as simple as possible, but having layers of lower level
stuff and higher level stuff is good. This is why, for instance, I
introduced the where= parameter to ufuncs, because it's another useful way
of using the same low-level mechanisms.
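
For readers unfamiliar with it, a small sketch of the where= parameter as it appears in released NumPy: the ufunc only computes and writes the elements selected by a boolean array, and leaves the other elements of out= untouched. The array names and values here are made up for illustration.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])
keep = np.array([True, False, True, False])

# Pre-fill the output so the untouched slots are easy to see.
out = np.full_like(a, -1.0)

# Only positions where `keep` is True are computed and written.
np.add(a, b, out=out, where=keep)
```

Passing out= matters here: without it, the positions where the condition is False would be left uninitialized in the result.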
> BTW, this overlaps somewhat with a mail Travis sent long ago about
> dynamically filling the backing buffer contents (in this case with the
> "merged" NA data for 3rd parties).
> It might prove completely unsatisfactory (w.r.t. performance), but you
> could also fake a bit-pattern-only sequential array by using mprotect to
> detect the memory accesses and then trigger the production of the merged
> data. This provides a means for code using the simple buffer protocol,
> without duplicating the whole structure for NA merges.
> This can be complicated even more with some simple strided pattern
> detection to diminish the number of segfaults, as the shape is known.
Someone else will have to do stuff like this... ;)
> "And it's much the same thing with knowledge, for whenever you learn
> something new, the whole world becomes that much richer."
> -- The Princess of Pure Reason, as told by Norton Juster in The Phantom