[Numpy-discussion] alterNEP - was: missing data discussion round 2

Matthew Brett matthew.brett@gmail....
Thu Jun 30 19:02:15 CDT 2011


On Thu, Jun 30, 2011 at 9:01 PM, Lluís <xscript@gmx.net> wrote:
> Matthew Brett writes:
>> Hi,
>> On Thu, Jun 30, 2011 at 7:27 PM, Lluís <xscript@gmx.net> wrote:
>>> Matthew Brett writes:
>>> [...]
>>>> I'm afraid, like you, I'm a little lost in the world of masking,
>>>> because I only need the NAs.  I was trying to see if I could come up
>>>> with an API that picked up some of the syntactic convenience of NAs,
>>>> without conflating NAs with IGNOREs.   I guess we need some feedback
>>>> from the 'NA & IGNORE Share the API' (NISA?) proponents to get an idea
>>>> of what we've missed.  @Mark, @Chuck, guys - what have we lost here by
>>>> separating the APIs?
>>> As I tried to convey in my other mail, separating both will force you to
>>> either:
>>> * Make a copy of the array before passing it to another routine (because
>>>  the routine will assign np.NA but you still want the original data)
>> You have an array ``arr``.  The array does support NAs, but it doesn't
>> have a mask.  You want to pass ``arr`` to another routine ``func``.
>> You expect ``func`` to set NAs into the data but you don't want
>> ``func`` to modify ``arr`` and you don't want to copy ``arr`` either.
>> You are saying the following:
>> "with the fused API, I can make ``arr`` be a masked array, and pass it
>> into ``func``, and know that, when func sets elements of arr to NA, it
>> will only modify the mask and not the underlying data in ``arr``."
> Yes.
>> It does seem to me this is a very obscure case.  First, ``func`` is
>> modifying the array but you want an unmodified array back.  Second,
>> you'll have to do some view trick to recover the not-NA case to arr,
>> when it comes back.
> I know, the example is just silly and convoluted.
>> It seems to me, that what ``func`` should do, if it wants you to be
>> able to unmask the NAs, is to make a masked array view of ``arr``, and
>> return that.   And indeed the simplicity of the separated API
>> immediately makes that clear - in my view at least.
> I agree on this example. My only concern is about the API's ability to
> foresee as many future use-cases as possible, without impacting
> performance.

But, of course, there's a great danger in trying to cover every
possible use-case.

My argument is that the kinds of cases that you are describing are - I
believe - very rare and are even a little difficult to make up.  Is
that fair?

To my mind, the separate NA and IGNORE API is easier to understand and
explain.   If that isn't true, please do say, and say why - because
that point is key.

If it is true that the separate API is clearer, then the fused API has
to offer a large benefit in terms of power and extensibility in order
to justify choosing it.
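
For concreteness, here is a rough sketch of the separated-API suggestion
quoted above, where ``func`` makes a masked array view of ``arr`` and
returns it.  It uses the existing ``numpy.ma`` module as a stand-in for
the IGNORE (mask) semantics, which is an assumption on my part: the NA
interface under discussion is only a proposal, so nothing here is meant
as its real API.  The body of ``func`` (masking negative values) is
purely hypothetical.

```python
import numpy as np

def func(arr):
    """Hypothetical routine that flags negative entries as ignored.

    It never writes to ``arr``: it builds a masked view that shares the
    data buffer and records the flags in a separate mask.
    """
    masked = arr.view(np.ma.MaskedArray)  # a view, no copy of the data
    masked[arr < 0] = np.ma.masked        # only the mask is modified
    return masked

arr = np.array([1.0, -2.0, 3.0, -4.0])
result = func(arr)

# The caller's array is untouched, and the flagged values can still be
# recovered from the data underlying the masked view.
recovered = np.ma.getdata(result)
```

The caller keeps ``arr`` as it was, no copy of the data is made, and
it is obvious from the return value that masking is in play.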


