[Numpy-discussion] feedback request: proposal to add masks to the core ndarray

Mark Wiebe mwwiebe@gmail....
Thu Jun 23 17:06:13 CDT 2011

On Thu, Jun 23, 2011 at 4:48 PM, Gael Varoquaux <
gael.varoquaux@normalesup.org> wrote:

> On Thu, Jun 23, 2011 at 03:53:31PM -0500, Mark Wiebe wrote:
> >    concluded that adding masks to the core ndarray appears to be the best
> >    way to deal with the problem in general.
> It seems to me that this is going to make the numpy array a way more
> complex object. Although it is currently quite simple, that object
> already has a hard time getting acceptance beyond the scientific community,
> whereas it should really be used in many other places.

I don't think it's the simplicity or complexity of the object which is
preventing more acceptance, but rather the implementation issues it
currently has. I've already done a lot to smooth out the rough edges; for
example, the cleanup to structured arrays I just merged fixes a number of
things which just worked poorly or couldn't be done before.

> Right now, the numpy array can be seen as an extension of the C array,
> basically a pointer, a data type, and a shape (and strides). This enables
> easy sharing with libraries that have not been written with numpy in
> mind.

It's not that simple, unfortunately: the C API does not provide a good
interface for this. I think a good C++ API needs to be created, because C
doesn't have enough expressive power to make interoperability simple.
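
To make the "pointer, data type, shape, and strides" description above
concrete, here is a minimal sketch of how those four pieces can be read
from Python. It only illustrates the memory model under discussion; it is
not the C API itself, and the array contents are made up for the example.

```python
import numpy as np

# A small C-contiguous array of 32-bit integers (illustrative data only).
a = np.zeros((3, 4), dtype=np.int32)

# __array_interface__ exposes the raw buffer description that other
# libraries can consume without knowing anything else about numpy.
iface = a.__array_interface__
data_pointer, read_only = iface["data"]  # integer address, read-only flag

print("pointer :", hex(data_pointer))
print("dtype   :", iface["typestr"])
print("shape   :", iface["shape"])
print("strides :", a.strides)  # bytes to step along each axis

# Slicing changes only shape and strides; the data buffer is shared.
view = a[:, ::2]
print("view shape/strides:", view.shape, view.strides)
print("shares memory     :", np.shares_memory(a, view))
```

The strided view shows why a bare pointer is not enough on its own: a
consumer also has to honor the shape and strides to walk the same elements.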

> The limitations of the subclassing approach that you mention do not seem
> fundamental to me. For instance, the impossibility of mixing subclasses could
> perhaps be solved using the Mixin Pattern. Ufuncs need work, but I have
> the impression that your proposal is simply to solve the special case of
> masked data in the ufunc by breaking the simple numpy array model.

This doesn't break the numpy array model, that stays exactly as it is. It
does provide a consistent way to deal with masks that align with those
arrays. The subclassing mechanism doesn't provide a way to give a robust
abstraction of the missing value phenomenon, and that is what my proposal is
providing. A simple API call for default behaviors when masks aren't
supported seems reasonable to me.
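
As a rough illustration of what "a mask that aligns with an array" means,
the sketch below pairs a value array with a boolean array of the same
shape. It uses only facilities that exist in numpy today (the numpy.ma
subclass and the ufunc `where=` argument); it is not the API being
proposed in this thread, and the sentinel value and names are assumptions
chosen for the example.

```python
import numpy as np

# Values with one missing entry; -999.0 is just a placeholder (assumption).
values = np.array([1.0, 2.0, -999.0, 4.0])
valid = np.array([True, True, False, True])  # same shape as values

# Existing subclass-based approach: numpy.ma masks where mask is True,
# so the "valid" flags are inverted.
masked = np.ma.masked_array(values, mask=~valid)
total_ma = masked.sum()

# Plain ndarray plus an aligned boolean mask: the ufunc only writes to
# positions where valid is True and leaves the rest of `out` untouched.
out = np.zeros_like(values)
np.add(values, 10.0, out=out, where=valid)

print(total_ma)  # sum over the unmasked entries
print(out)       # the masked position keeps its initial 0.0
```

In both variants the value array keeps its ordinary pointer, dtype, shape,
and strides; the mask is a second array that travels alongside it.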

> By moving a growing amount of functionality into the core, it seems to me
> that you are going to make it more and more complex while losing its
> genericity. Each new feature will need to go in the core and induce a
> high cost. Making inheritance and ufuncs more generic seems to me like a
> better investment.

Putting masks into the core increases the genericity, because it makes that
feature orthogonal to all the other features in a very clean way. The
problems you are raising are potential issues if the implementation is done
poorly, not fundamental issues with the idea.

> My 2 cents,
> Gael
> PS: I am on the verge of conference travel, so I will not be able to
> participate any further in the discussion.

Thanks for taking the time to respond,

