[Numpy-discussion] New numpy functions: filled, filled_like

Dave Hirschfeld dave.hirschfeld@gmail....
Mon Jan 14 03:02:36 CST 2013


Robert Kern <robert.kern <at> gmail.com> writes:

> 
> >>> >
> >>> > One alternative that does not expand the API with two-liners is to let
> >>> > the ndarray.fill() method return self:
> >>> >
> >>> >   a = np.empty(...).fill(20.0)
> >>>
> >>> This violates the convention that in-place operations never return
> >>> self, to avoid confusion with out-of-place operations. E.g.
> >>> ndarray.resize() versus ndarray.reshape(), ndarray.sort() versus
> >>> np.sort(), and in the broader Python world, list.sort() versus
> >>> sorted(), list.reverse() versus reversed(). (This was an explicit
> >>> reason given for list.sort to not return self, even.)
> >>>
> >>> Maybe enabling this idiom is a good enough reason to break the
> >>> convention ("Special cases aren't special enough to break the rules. /
> >>> Although practicality beats purity"), but it at least makes me -0 on
> >>> this...
> >>>
> >>
> >> I tend to agree with the notion that inplace operations shouldn't return
> >> self, but I don't know if it's just because I've been conditioned this way.
> >> Not returning self breaks the fluent interface pattern [1], as noted in a
> >> similar discussion on pandas [2], FWIW, though there's likely some way to
> >> have both worlds.
> >
> > Ah-hah, here's the email where Guido officially proclaims that there
> > shall be no "fluent interface" nonsense applied to in-place operators
> > in Python, because it hurts readability (at least for Dutch people):
> >   http://mail.python.org/pipermail/python-dev/2003-October/038855.html
> 
> That's a statement about the policy for the stdlib, and just one
> person's opinion. You, and numpy, are permitted to have a different
> opinion.
> 
> In any case, I'm not strongly advocating for it. Its violation of
> principle ("no fluent interfaces") is roughly in the same ballpark as
> np.filled() ("not every two-liner needs its own function"), so I
> thought I would toss it out there for consideration.
> 
> --
> Robert Kern
> 

FWIW I'm +1 on the idea. Perhaps that's because I just don't see many practical
downsides to breaking the convention, whereas I regularly see a big problem caused
by there being no way to instantiate an array with a particular value.
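For reference, `ndarray.fill()` fills the array in place and returns `None`, which is exactly the convention described in the quoted messages. The chained form `np.empty(...).fill(20.0)` therefore binds `None` rather than an array. A small sketch checking this, along with the `list.sort()` / `sorted()` pair mentioned above, against current NumPy and Python:

```python
import numpy as np

# ndarray.fill() modifies the array in place and returns None,
# following the same convention as list.sort() and ndarray.resize().
a = np.empty(3)
result = a.fill(20.0)
print(result)  # None
print(a)       # [20. 20. 20.]

# So the chained form proposed in the thread binds None, not an array:
b = np.empty(3).fill(20.0)
print(b is None)  # True

# The in-place / out-of-place pair from the standard library:
values = [3, 1, 2]
print(sorted(values))  # [1, 2, 3]  (new list; values is unchanged)
print(values.sort())   # None       (values is now sorted in place)
```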

The one obvious way to do it is to use ones and multiply by the value you want. I
work with a lot of inexperienced programmers and I see this idiom all the time. 
It takes a fair amount of numpy knowledge to know that you should do it in two 
lines by using empty and setting a slice.
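The two-line idiom can be wrapped in helpers along the lines of the `filled` / `filled_like` functions named in this thread's subject. The sketch below is a minimal version under assumed signatures; `filled` and `filled_like` here are hypothetical helpers, not a NumPy API. For comparison, NumPy 1.8 later added `np.full` and `np.full_like` with this behaviour, and the test uses them as a reference.

```python
import numpy as np

def filled(shape, value, dtype=float):
    """Hypothetical helper: new array of `shape` with every element set to `value`."""
    a = np.empty(shape, dtype=dtype)  # allocate without initializing
    a.fill(value)                     # write `value` in place (fill returns None)
    return a

def filled_like(other, value, dtype=None):
    """Hypothetical helper: like filled(), taking shape (and dtype unless given) from `other`."""
    a = np.empty_like(other, dtype=dtype)
    a.fill(value)
    return a

# Usage: one allocation and one pass over memory, unlike np.nan * np.ones(n),
# which allocates and writes a temporary array of ones before multiplying.
x = filled(5, np.nan)
y = filled_like(np.zeros((2, 3), dtype=int), 7)
```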

In [1]: %timeit NaN*ones(10000)
1000 loops, best of 3: 1.74 ms per loop

In [2]: %%timeit
   ...: x = empty(10000, dtype=float)
   ...: x[:] = NaN
   ...: 
10000 loops, best of 3: 28 us per loop

In [3]: 1.74e-3/28e-6
Out[3]: 62.142857142857146


Even when not in the mythical "tight loop", setting an array to ones and then
multiplying uses up a lot of cycles: in the timings above it's about 62 times
(nearly 2 orders of magnitude) slower than what we know they *should* be doing.
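The IPython `%timeit` session above can be reproduced outside IPython with the standard `timeit` module. This is a rough sketch: the function names are illustrative, and `via_full` assumes NumPy 1.8 or later. Absolute timings depend on hardware and NumPy version, so no figures are claimed here.

```python
import timeit
import numpy as np

N = 10_000

def via_ones():
    # allocate ones, then multiply: two allocations and two passes over memory
    return np.nan * np.ones(N)

def via_empty_slice():
    # allocate uninitialized memory, then assign the scalar to every element
    x = np.empty(N, dtype=float)
    x[:] = np.nan
    return x

def via_full():
    # built-in equivalent of the two-liner (NumPy >= 1.8)
    return np.full(N, np.nan)

if __name__ == "__main__":
    for fn in (via_ones, via_empty_slice, via_full):
        # best of 3 repeats, each timing 1000 calls; report per-call time
        t = min(timeit.repeat(fn, number=1000, repeat=3)) / 1000
        print(f"{fn.__name__:16s} {t * 1e6:8.2f} us per call")
```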

I'm agnostic as to whether fill should be modified or new functions provided, but
I think numpy is currently missing this functionality and that providing it
would save a lot of new users from shooting themselves in the foot
performance-wise.

-Dave
