[Numpy-discussion] consensus (was: NA masks in the next numpy release?)

Benjamin Root ben.root@ou....
Fri Oct 28 22:38:45 CDT 2011


On Friday, October 28, 2011, Matthew Brett <matthew.brett@gmail.com> wrote:
>> Forget about rudeness or decision processes.
> No, that's a common mistake, which is to assume that any conversation
> about things which aren't technical is not important.   Nathaniel's
> point is important.  Rudeness is important. The reason we've got into
> this mess is because we clearly don't have an agreed way of making
> decisions.  That's why countries and open-source projects have
> constitutions, so this doesn't happen.

Don't get me wrong. In general, you are right.  And maybe we should all
discuss something to that effect for NumPy.  But I would rather do that when
there is less contention and tempers have cooled.

As for allegations of rudeness, I believe that we are actually so close to
consensus that I immediately wanted to squelch any sort of
meta-meta-disagreement about who was being rude to whom.  As a quick
band-aid, anybody who felt slighted by me gets a drink on me at the next
SciPy conference.  From this point on, let's institute a 10-minute rule:
write your email, wait ten minutes, read it again, and edit it.

>> I will start by saying that I am willing to separate ignore and absent,
>> but only on the write side of things.  On read, I want a single way to
>> identify the missing values.  I also want only a single way to perform
>> computations (either skip or propagate).
> Thank you - that is very helpful.
> Are you saying that you'd be OK setting missing values like this?
>>>> a.mask[0:2] = False

Probably not that far, because that would be an attribute that may or may
not exist.  Rather, I might like the idea of NA to "always" mean absent
(assigning it destroys the underlying value, even through views), and MA (or
some other name) to always mean ignore (with masking behavior through views).
This ties specific behaviors distinctly to specific objects.
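To make the absent/ignore distinction concrete with tools that already exist, here is a rough sketch. It is only an analogy under stated assumptions: writing NaN stands in for "absent" (destructive, visible through views) and numpy.ma stands in for "ignore" (non-destructive masking). Neither is the proposed NA or MA object, and the variable names are made up for illustration.

```python
import numpy as np

# Stand-in for "absent": overwriting through a view destroys the value in
# the base array as well; the original 1.0 cannot be recovered.
base = np.array([1.0, 2.0, 3.0])
view = base[:2]
view[0] = np.nan

# Stand-in for "ignore": numpy.ma hides element 0 from computations,
# but the underlying data value is left untouched.
ignored = np.ma.array([1.0, 2.0, 3.0])
ignored[0] = np.ma.masked

base_lost = np.isnan(base[0])   # the base array sees the destroyed value
kept_value = ignored.data[0]    # still 1.0 underneath the mask
ignored_sum = ignored.sum()     # masked element is skipped: 2.0 + 3.0
```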

> For the read side, do you mean you're OK with this
>>>> a.isna()
> To identify the missing values, as is currently the case?  Or something

Yes.  A missing value is a missing value, regardless of whether it is absent
or marked as ignored.  But it is a bit more subtle than that.  I should just be
able to add two arrays together and the "data should know what to do". When
the core ufuncs get this right (like min, max, sum, cumsum, diff, etc.), then
I don't have to do much to prepare higher-level functions for missing data.
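As a rough illustration of "the data should know what to do", numpy.ma (used here only as a stand-in, not the NA proposal) already propagates missingness through elementwise operations and skips masked entries in reductions. The arrays below are made-up example data.

```python
import numpy as np

# Example data; masked entries play the role of missing values.
a = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])
b = np.ma.array([10.0, 20.0, 30.0], mask=[False, False, True])

c = a + b             # elementwise: missing in either input -> missing output
total = c.sum()       # reduction: masked entries are skipped (only 1 + 10)
lo, hi = a.min(), a.max()  # min/max ignore the masked 2.0
running = a.cumsum()  # masked positions stay masked in the running sum
```

No extra preparation is needed at the call site here, which is the property a higher-level function would want to rely on.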

> If so, then I think we're very close; it's just a discussion about names.

And what does ignore + absent equal? ;-)

>> An indicator of success would be that people stop using NaNs and magic
>> numbers (-9999, anyone?) and we could even deprecate nansum(), or at
>> least strongly suggest in its docs to use NA.
> That is an excellent benchmark,
> Best,
> Matthew
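For comparison, here is a small sketch of the conventions that benchmark is about: a -9999 magic number, NaN plus nansum(), and a mask. The data is hypothetical, and numpy.ma is used in place of the proposed NA, which is assumed rather than shown.

```python
import numpy as np

# Hypothetical raw data using -9999 as a "magic number" for missing values.
raw = np.array([1.5, -9999.0, 2.5])

# Magic number left in place: the sentinel silently corrupts the result.
naive_total = raw.sum()

# NaN convention: convert the sentinel, then call the nan-aware function.
as_nan = np.where(raw == -9999.0, np.nan, raw)
nan_total = np.nansum(as_nan)

# Mask convention: hide the sentinel so the ordinary sum() just works.
masked = np.ma.masked_values(raw, -9999.0)
masked_total = masked.sum()
```

The point of the benchmark is the last case: once missingness lives in the array itself, a plain sum() is enough and a separate nansum() becomes unnecessary.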

Ben Root