[Numpy-discussion] numpy.filled, again

Nathaniel Smith njs@pobox....
Fri Jun 14 12:22:44 CDT 2013

On Wed, Jun 12, 2013 at 7:43 PM, Eric Firing <efiring@hawaii.edu> wrote:
> On 2013/06/12 2:10 AM, Nathaniel Smith wrote:
>> Personally I think that overloading np.empty is horribly ugly, will
>> continue confusing newbies and everyone else indefinitely, and I'm
>> 100% convinced that we'll regret implementing such a warty interface
>> for something that should be so idiomatic. (Unfortunately I got busy
>> and didn't actually say this in the previous thread though.) So I
>> think we should just merge the PR as is. The only downside is the
>> np.ma inconsistency, but, np.ma is already inconsistent (cf.
>> masked_array.fill versus masked_array.filled!), somewhat deprecated,
> "somewhat deprecated"?  Really?  Since when?  By whom?  Replaced by what?

Sorry, not trying to start a fight, just trying to summarize the
situation. As far as I can tell:

Despite heroic efforts on the part of its authors, numpy.ma has a
number of weird quirks (masked data can still trigger invalid value
errors), misfeatures (hard versus soft masks), and just plain old pain
points (ongoing issues with whether any given operation will respect
or preserve the mask).
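
To make the hard-versus-soft distinction concrete, here is a minimal sketch. The variable names are made up for illustration; it relies only on the `hard_mask` argument of `np.ma.array`. The same assignment has a different effect on the mask depending on a flag set when the array was created:

```python
import numpy as np

data = [1, 2, 3]
mask = [False, True, False]

soft = np.ma.array(data, mask=mask)                  # default: soft mask
hard = np.ma.array(data, mask=mask, hard_mask=True)  # hard mask

soft[1] = 99   # with a soft mask, assignment unmasks the element
hard[1] = 99   # with a hard mask, the element stays masked

print(soft.mask[1], hard.mask[1])
```

Code that receives a masked array cannot tell from the assignment itself which of the two behaviors it will get.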

It's been in deep maintenance mode for some time; we merge the
occasional bug fix that people send in, and that's it. (To be fair,
numpy as a whole is fairly slow-moving, but numpy.ma still gets much
less attention.)

Even if there were active maintainers, no-one really has any idea how
to fix any of the problems above; they're not so much bugs as
intrinsic limitations of the design.

Therefore, my impression is that a majority (not all, but a majority)
of numpy developers strongly recommend against the use of numpy.ma in
new projects.

I could be wrong! And I know there's nothing to really replace it. I'd
like to fix that. But I think "semi-deprecated" is not an unfair
shorthand for the above.

(I'll even admit that I'd *like* to actually deprecate it. But what I
mean by that is, I don't think it's possible to fix it to the point
where it's actually a solid/clean/robust library, so I'd like to reach
a point where everyone who's currently using it is happier switching
to something else and is happy to sign off on deprecating it.)

>> and AFAICT there are far more people who will benefit from a clean
>> np.filled idiom than who actually use np.ma (and in particular its
>> fill-value functionality). So there would be two
> I think there are more np.ma users than you realize.  Everyone who uses
> matplotlib is using np.ma at least implicitly, if not explicitly.  Many
> of the matplotlib examples put np.ma to good use.  np.ma.filled is an
> essential long-standing part of the np.ma API.  I don't see any good
> rationale for generating a conflict with it, when an adequate
> non-conflicting alternative ('np.initialized', maybe others) exists.

I'm aware of that. If I didn't care about the opinions of numpy.ma
users, I wouldn't go starting long and annoying mailing list threads
about features that are only problematic because of their effect on
numpy.ma :-).

But, IMHO, given the issues with numpy.ma, our #1 priority ought
to be making numpy proper as clean and beautiful as possible; my
position that started this thread is basically just that we shouldn't
make numpy proper worse just for numpy.ma's sake. That's the tail
wagging the dog. And this 'conflict' seems a bit overstated given that
(1) np.ma.filled already has multiple names (and 3/4 of the uses in
matplotlib use the method version, not the function version), (2) even
if we give it a non-conflicting name, np.ma's lack of maintenance
means that it'd probably be years before someone got around to
actually adding a parallel function to np.ma. [Unless this thread
spurs someone into submitting one just to prove me wrong ;-).]
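
As a small illustration of point (1), and only a sketch with made-up variable names, the function and method spellings of the existing np.ma operation do the same thing, replacing masked entries with a given value and returning a plain ndarray:

```python
import numpy as np

a = np.ma.array([1, 2, 3], mask=[False, True, False])

via_function = np.ma.filled(a, -1)  # module-level function spelling
via_method = a.filled(-1)           # method spelling

print(via_function, via_method)
```

Note that this is a different operation from the array-creation function under discussion; only the name overlaps.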

But anyway, that was when the comparison was between np.filled() and
np.empty(..., fill_value=...). Of the new things on the table:

- I agree with Tom that 'np.values(...)' is so generic as to be
unguessable. np.fromvalues() was also suggested, but this is even
worse, because it suggests that it's analogous to
np.from{buffer,file,function,regex,...}. But the analogous
fromvalues() function already has a name: np.array.

- np.filled_with and np.initialized are both gratuitously cumbersome.
(It's the gratuitous that bothers me more than the cumbersome. No-one
enjoys using APIs that feel like they're annoying for no good reason.)

- np.full is... huh. It's quirky, and compared to np.filled it's more
confusing (all arrays are full of *something*, but not all have been
filled with a particular value) and it's less consistent with things
like 'sorted'. But at least it's short, simple, and -- once you see it
-- memorable. And at least it isn't immediately obvious when looking
at it that it's a fallback choice because all the good names were
taken. I could probably live with np.full.
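
For reference, whichever name wins, the operation being named is roughly the following. This is a hypothetical helper written with np.empty and ndarray.fill, not the implementation in the PR; the name `filled`, the argument order, and the float default for `dtype` (copied from np.empty) are assumptions made for illustration:

```python
import numpy as np

def filled(shape, fill_value, dtype=float):
    """Hypothetical sketch: allocate an array, then set every element."""
    out = np.empty(shape, dtype=dtype)  # uninitialized memory
    out.fill(fill_value)                # overwrite every element in place
    return out

grid = filled((2, 3), 7.5)
zeros_like_ints = filled(4, 0, dtype=int)
print(grid)
```

The two-step empty-then-fill spelling is what users write by hand today, which is why a short, guessable name for it matters.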

