[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe@gmail....
Fri Jun 24 19:17:02 CDT 2011


On Fri, Jun 24, 2011 at 3:38 PM, Lluís <xscript@gmx.net> wrote:

> Mark Wiebe writes:
>
> >             It should also be possible to accomplish a general
> >             solution at the dtype level. We could have a 'dtype
> >             factory' used like:  np.zeros(10, dtype=np.maybe(float))
> >             where np.maybe(x) returns a new dtype whose storage size
> >             is x.itemsize + 1, where the extra byte is used to store
> >             missingness information.  (There might be some annoying
> >             alignment issues to deal with.) Then for each ufunc we
> >             define a handler for the maybe dtype (or add a
> >             special-case to the ufunc dispatch machinery) that checks
> >             the missingness value and then dispatches to the ordinary
> >             ufunc handler for the wrapped dtype.
>
>
> >         The 'dtype factory' idea builds on the way I've structured
> >         datetime as a parameterized type, but the thing that kills it
> >         for me is the alignment problems of 'x.itemsize + 1'. Having
> >         the mask in a separate memory block is a lot better than
> >         having to store 16 bytes for an 8-byte int to preserve the
> >         alignment.
>
>
> >     Yes, but that assumes it is appended to the existing types in the
> >     dtype individually instead of the dtype as a whole. The dtype with
> >     mask could just indicate a shadow array, an alpha channel if you
> >     will, which is essentially what you are already doing but would
> >     just provide a different place to track it.
>
>
> > This would seem to change the definition of a dtype - currently it
> > represents a contiguous block of memory. It doesn't need to use all of
> > that memory, but the dtype conceptually owns it. I kind of like it
> > that way, where the whole strides idea, with data being all over
> > memory space, belongs to ndarray, not dtype.
>
> I don't have any knowledge of the numpy or ma internals, so this might
> well be nonsense.
>
> Increasing the dtype item size would certainly decrease performance when
> using big structures, as it will require higher memory bandwidth.
>
> Why not use structured arrays? (assuming each struct element has indeed
> its own buffer, otherwise it's the same as having a "bigger" dtype) Then
> you can have some "blessed" struct elements, like the mask, which
> influence how the array is printed or how other struct elements must
> be operated on.
>

Structured arrays do put their fields next to each other in memory, so this
is basically like having a bigger dtype.

-Mark
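
A minimal sketch of the layout point, assuming a hypothetical two-field
structured dtype (the field names 'value' and 'mask' are illustrative only
and are not part of any proposal in this thread). It shows that the fields
of a structured array are interleaved within each element, and that asking
for aligned fields pads every element, which is the cost discussed above:

```python
import numpy as np

# Hypothetical fields: an 8-byte integer plus a 1-byte flag per element.
fields = [("value", np.int64), ("mask", np.bool_)]

packed = np.dtype(fields)               # fields packed back to back
aligned = np.dtype(fields, align=True)  # padded as a C compiler would

print(packed.itemsize)            # 9: eight data bytes plus one flag byte
print(packed.fields["mask"][1])   # 8: the flag sits right after the value
print(aligned.itemsize)           # padded; typically 16 on 64-bit platforms

# Selecting one field gives a strided view into the same records,
# so each field does not get its own buffer.
arr = np.zeros(4, dtype=packed)
print(arr["value"].strides)       # (9,)

# For comparison: the same data with the mask held in a separate array.
values = np.zeros(4, dtype=np.int64)
mask = np.zeros(4, dtype=np.bool_)
separate_bytes = values.nbytes + mask.nbytes
aligned_bytes = np.zeros(4, dtype=aligned).nbytes
print(separate_bytes, aligned_bytes)
```

The packed form avoids the padding but leaves the 8-byte values unaligned,
while the separate-array form keeps both the values aligned and the total
size small.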



> Besides, using "blessed" struct elements falls in line with the recent
> "_ufunc_wrapper_" proposal.
>
>
>
> Lluis
>
> --
>  "And it's much the same thing with knowledge, for whenever you learn
>  something new, the whole world becomes that much richer."
>  -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
>  Tollbooth
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>