[Numpy-discussion] Missing data again

Skipper Seabold jsseabold@gmail....
Sat Mar 3 16:10:44 CST 2012

On Sat, Mar 3, 2012 at 4:46 PM, Mark Wiebe <mwwiebe@gmail.com> wrote:
> On Sat, Mar 3, 2012 at 12:30 PM, Travis Oliphant <travis@continuum.io>
> wrote:
>>        * the reduction operations need to default to "skipna" --- this is
>> the most common use case which has been re-inforced again to me today by a
>> new user to Python who is using masked arrays presently
> This is a completely trivial change. I went with the default as I did
> because it's what R, the primary inspiration for the NA design, does. We'll
> have to be sure this is well-marked in the documentation about "NumPy NA for
> R users".

It may be trivial to change the code, but this isn't a trivial change.
"Most common use case" is hard for me to swallow, since there are so
many. Of the different statistical packages I've used, none that I
recall ignores missing data (silently) by default. This sounds
dangerous to me. It's one thing to make missing data convenient to
work with, but it's another to try to sweep the problem under the rug. I
imagine the choice of the R developers was a thoughtful one.
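
To make the two defaults concrete, here is a minimal sketch that uses NaN
as a stand-in for a missing value. That stand-in is an assumption for
illustration only; the NA design under discussion is a separate mechanism.
np.mean propagates the missing value, while np.nanmean skips it without
any warning.

```python
import numpy as np

# NaN stands in for a missing observation (illustrative assumption).
data = np.array([1.0, np.nan, 3.0])

# Propagating default: the result is itself missing, so the gap is visible.
propagated = np.mean(data)

# "skipna"-style reduction: the gap is dropped and the mean is taken
# over the two remaining values.
skipped = np.nanmean(data)

print(propagated, skipped)
```

With a skipping default, nothing in the second result tells the caller
that the mean was computed from two observations rather than three.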

Perhaps something like np.seterr should also be implemented for
missing data since there's probably no resolution to what's most
sensible here.

