[Numpy-discussion] feedback request: proposal to add masks to the core ndarray
Fri Jun 24 19:17:02 CDT 2011
On Fri, Jun 24, 2011 at 3:38 PM, Lluís <firstname.lastname@example.org> wrote:
> Mark Wiebe writes:
> > It should also be possible to accomplish a general
> > solution at the dtype level. We could have a 'dtype
> > factory' used like: np.zeros(10, dtype=np.maybe(float))
> > where np.maybe(x) returns a new dtype whose storage size
> > is x.itemsize + 1, where the extra byte is used to store
> > missingness information. (There might be some annoying
> > alignment issues to deal with.) Then for each ufunc we
> > define a handler for the maybe dtype (or add a
> > special-case to the ufunc dispatch machinery) that checks
> > the missingness value and then dispatches to the ordinary
> > ufunc handler for the wrapped dtype.
> > The 'dtype factory' idea builds on the way I've structured
> > datetime as a parameterized type, but the thing that kills it
> > for me is the alignment problems of 'x.itemsize + 1'. Having
> > the mask in a separate memory block is a lot better than
> > having to store 16 bytes for an 8-byte int to preserve the
> > alignment.
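To make the size argument above concrete, here is a minimal sketch using a structured dtype as a stand-in for the hypothetical np.maybe(int64) (np.maybe does not exist in NumPy; the field names "value" and "missing" are made up for illustration). It compares a packed layout, an aligned layout, and a separate mask array. The exact padded size depends on the platform's alignment rules, so no specific aligned size is asserted here.

```python
import numpy as np

# Stand-in for a hypothetical np.maybe(np.int64): the value plus one flag byte.
fields = [("value", np.int64), ("missing", np.uint8)]

# Packed layout: itemsize is x.itemsize + 1, so most elements are misaligned.
packed = np.dtype(fields)

# Aligned layout: NumPy pads each element up to a multiple of the alignment.
aligned = np.dtype(fields, align=True)

print("packed itemsize: ", packed.itemsize)
print("aligned itemsize:", aligned.itemsize, "(alignment", aligned.alignment, ")")

# Separate mask block: values keep their natural alignment, and the mask
# costs one byte per element instead of a full padded slot.
n = 10
values = np.zeros(n, dtype=np.int64)
mask = np.zeros(n, dtype=np.bool_)
separate_bytes = values.nbytes + mask.nbytes

print("aligned array bytes: ", n * aligned.itemsize)
print("separate mask bytes: ", separate_bytes)
```

On a typical 64-bit platform the aligned layout is where the "16 bytes for an 8-byte int" figure comes from, while the separate mask costs 9 bytes per element without misaligning the data.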
> > Yes, but that assumes it is appended to the existing types in the
> > dtype individually instead of the dtype as a whole. The dtype with
> > mask could just indicate a shadow array, an alpha channel if you
> > will, which is essentially what you are already doing but just
> > provides a different place to track it.
> > This would seem to change the definition of a dtype - currently it
> > represents a contiguous block of memory. It doesn't need to use all of
> > that memory, but the dtype conceptually owns it. I kind of like it
> > that way, with the whole strides idea, where data can be spread all
> > over memory, belonging to ndarray, not dtype.
> I don't have any knowledge of the numpy or ma internals, so this might
> well be nonsense.
> Increasing the dtype item size would certainly decrease performance when
> using big structures, as it will require higher memory bandwidth.
> Why not use structured arrays? (assuming each struct element has indeed
> its own buffer, otherwise it's the same as having a "bigger" dtype) Then
> you can have some "blessed" struct elements, like the mask, which
> influence how to print the array or how other struct elements must be
Structured arrays do put their fields next to each other in memory, so this
is basically like having a bigger dtype.
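A quick way to see this is to look at the strides of a single field. The sketch below is only an illustration; the field names "value" and "mask" are made up and are not part of any proposal in this thread.

```python
import numpy as np

# A structured array with a value field and a one-byte "mask" field.
rec = np.zeros(4, dtype=[("value", np.float64), ("mask", np.bool_)])

# Selecting a field returns a view into the same buffer, not a separate block.
values = rec["value"]

print(rec.dtype.itemsize)             # size of one whole record
print(values.strides)                 # step between consecutive values
print(values.flags["C_CONTIGUOUS"])   # whether the values are densely packed
print(np.shares_memory(values, rec))  # whether the field aliases the records
```

The stride of the field view equals the record size rather than the size of a float64, so the values are interleaved with the mask bytes in one buffer.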
> Besides, using "blessed" struct elements falls in line with the recent
> "_ufunc_wrapper_" proposal.
> "And it's much the same thing with knowledge, for whenever you learn
> something new, the whole world becomes that much richer."
> -- The Princess of Pure Reason, as told by Norton Juster in The Phantom