[Numpy-discussion] numpy.trapz() doesn't respect subclass

josef.pktd@gmai... josef.pktd@gmai...
Sun Mar 28 00:11:16 CDT 2010


On Sat, Mar 27, 2010 at 11:37 PM, Ryan May <rmay31@gmail.com> wrote:
> On Sat, Mar 27, 2010 at 8:23 PM,  <josef.pktd@gmail.com> wrote:
>> Matrices have been part of numpy for a long time and your patch would
>> break backwards compatibility in a pretty serious way.
>
> Yeah, and I should admit that I realize that makes this particular
> patch a no-go. However, that to me doesn't put the issue to bed for
> any future code that gets written (see below).
>
>> subclasses of ndarray, like masked_arrays and quantities, and classes
>> that delegate to array calculations, like pandas, can redefine
>> anything. So there is not much that can be relied on if any subclass
>> is allowed to be used inside a function
>>
>> e.g. quantities redefines sin, cos,...
>> http://packages.python.org/quantities/user/issues.html#umath-functions
>> What happens if you call fft with a quantity array?
>
> Probably ends up casting to an ndarray. But that's a complex operation
> that I can live with not working. It's coded in C and can't be
> implemented quickly using array methods. And in this
>
>> Except for simple functions and ufuncs, it would be a lot of work and
>> fragile to allow asanyarray. And, as we discussed in a recent thread
>> on masked arrays (and histogram), it would push the work on the
>> function writer instead of the ones that are creating new subclasses.
>
> I disagree in this case.  I think the function writer should only be
> burdened to try to use array methods rather than numpy functions, if
> possible, and avoiding casts other than asanyarray() at all costs.  I
> think we shouldn't be scared of getting an error when a subclass is
> passed to a function, because that's an indication to the programmer
> that it doesn't work with what you're passing in and you need to
> *explicitly* cast it to an ndarray. Having the function do the cast
> for you is: 1) magical and implicit 2) Forces an unnecessary cast on
> those who would otherwise work fine. I get errors when I try to pass
> structured arrays to math functions, but I don't see numpy casting
> that away.

The problem is quality control and testing.
If the cast is done automatically, then whatever array_like I feed
into the function, I only have to make sure that casting to
ndarray works as intended (e.g. it doesn't work with masked arrays
with inappropriate masked values).
If the casting is correct, I know that I get correct numbers back out,
even if I have to reattach the meta information or convert the type
again. With asanyarray, anything could happen, including getting
numbers back that are wrong, maybe obviously so, maybe not.
(structured arrays at least have the advantage that an exception is
thrown pretty fast.)
And from what I have seen so far, testing is not a high priority for
many users.
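
As a rough illustration of the two failure modes, here is a minimal sketch
with a masked array. The helper names (mean_cast, mean_anyarray) and the
999.0 placeholder are made up for this example; only np.asarray,
np.asanyarray and numpy.ma are real NumPy API.

```python
import numpy as np
import numpy.ma as ma

def mean_cast(x):
    # np.asarray drops the subclass: the result is always a plain ndarray.
    a = np.asarray(x)
    return a.sum() / a.size

def mean_anyarray(x):
    # np.asanyarray passes ndarray subclasses through unchanged.
    a = np.asanyarray(x)
    return a.sum() / a.size

# 999.0 is an "inappropriate" value hidden under the mask.
data = ma.masked_array([1.0, 2.0, 999.0], mask=[False, False, True])

cast_result = mean_cast(data)            # mask dropped, 999.0 is counted
any_result = mean_anyarray(data)         # masked sum, but unmasked size
explicit = mean_cast(data.compressed())  # caller handles the mask first

print(cast_result, any_result, explicit)
```

Neither automatic path gives the masked mean here. The cast version is wrong
because the masked value is bad, which is the one thing to check. The
asanyarray version is wrong because .sum() and .size no longer mean the same
thing for the subclass, and nothing raises.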

Overall, I think that there are very few functions that are simple
enough that asanyarray would work without wrapping and casting (at
least internally). In scipy.stats we converted a few, but for most
functions I don't want to spend the time thinking about how they can be
written so that the internal code can use asanyarray. (The main problem
is matrices; masked_array and arrays with nans need special code
anyway, and structured dtypes are mostly useless for calculations
without creating a view.)
E.g. what is the quantities outcome of a call to np.corrcoef or np.cov?

And as long as not every function returns a consistent result, changes
in implementation details can affect the outcome, and then it will be
a lot of fun hunting for bugs. e.g. stats.linregress, dot or explicit
sum of squares or np.cov, or ... Are you sure you always get the same
result with quantities?
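
A small sketch of how an implementation detail can change the answer once
subclasses are let through, using np.matrix rather than quantities. The two
function names are hypothetical; both are meant to compute a sum of squares.

```python
import numpy as np

def sumsq_mul(x):
    # Uses the * operator, whose meaning a subclass may redefine.
    x = np.asanyarray(x)
    return (x * x).sum()

def sumsq_ufunc(x):
    # Uses the np.multiply ufunc, which stays elementwise for np.matrix.
    x = np.asanyarray(x)
    return np.multiply(x, x).sum()

arr = np.array([[1.0, 2.0], [3.0, 4.0]])
mat = np.matrix(arr)

print(sumsq_mul(arr), sumsq_ufunc(arr))  # both elementwise for ndarray
print(sumsq_mul(mat), sumsq_ufunc(mat))  # * is a matrix product for np.matrix
```

For the plain ndarray the two versions agree. For the matrix, the first one
silently returns the sum of the entries of the matrix product instead, so
swapping one implementation for the other changes the result only for some
input types.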

>
>> Of course, the behavior in numpy and scipy can be improved, and trapz
>> may be simple enough to change, but I don't think a patch that breaks
>> backwards compatibility pretty seriously and is not accompanied by
>> sufficient tests should go into numpy or scipy.
>
> If sufficient tests is the only thing holding this back, let me know.
> I'll get to coding.
>
> But I can't argue with the backwards incompatibility. At this point, I
> think I'm more trying to see if there's any agreement that: casting
> *everyone* because some class breaks behavior is a bad idea.  The
> programmer can always make it work by explicitly asking for the cast,
> but there's no way for the programmer to ask the function *not* to
> cast the data. Hell, I'd be happy if trapz took a flag just telling it
> subok=True.

I thought a while ago that subclasses or even classes that implement
an array_like interface should have an attribute to signal this, like
iamalgebrasafe, or subok or dontcast.
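
A minimal sketch of what such a signal could look like. The attribute name
iamalgebrasafe, the coerce helper and the Tagged class are all hypothetical;
NumPy has no such attribute or protocol.

```python
import numpy as np

def coerce(x):
    # Hypothetical opt-in: only classes that declare themselves safe are
    # passed through; everything else is cast to a plain ndarray.
    if getattr(x, "iamalgebrasafe", False):
        return np.asanyarray(x)
    return np.asarray(x)

class Tagged(np.ndarray):
    """Hypothetical subclass that does not change numerical behavior."""
    iamalgebrasafe = True

tagged = np.arange(3.0).view(Tagged)
mat = np.matrix([[1.0, 2.0]])

print(type(coerce(tagged)))  # stays Tagged
print(type(coerce(mat)))     # cast to ndarray, np.matrix did not opt in
```

With something like this the decision sits with the author of the subclass,
not with the writer of every function.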

The freedom or choice not to get cast to ndarray is desirable, but the
increase in bugs and bug reports won't be much fun. And the user with
the next subclass will argue that numpy/scipy should do the casting
because it's too much wrapping code that has to be built around every
function.

(Just as a related aside, in statsmodels I'm also still trying hard to
keep the main models using ndarrays only; either it becomes too
specialized if it is based on a specific class, or it requires a lot
of wrapping code. I don't think your proposal, to just let any array
class in, will get very far before raising an exception or producing
incorrect numbers, except maybe for array subclasses that don't change
any numerical behavior.)

That's my opinion, maybe others see it in a different way.
But in any case, it should be possible to change individual functions
even if the overall policy doesn't change.
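
For a single function like trapz, the opt-in flag mentioned above could look
roughly like the following. This is only a sketch, not the numpy
implementation: trapz_sketch and its subok argument are hypothetical, and it
assumes x, if given, already broadcasts against y along the chosen axis.

```python
import numpy as np

def trapz_sketch(y, x=None, dx=1.0, axis=-1, subok=False):
    # Default keeps the old behavior (cast); subok=True lets subclasses in.
    convert = np.asanyarray if subok else np.asarray
    y = convert(y)
    d = dx if x is None else np.diff(convert(x), axis=axis)

    # Slices selecting y[1:] and y[:-1] along the integration axis.
    upper = [slice(None)] * y.ndim
    lower = [slice(None)] * y.ndim
    upper[axis] = slice(1, None)
    lower[axis] = slice(None, -1)

    # Trapezoidal rule: interval width times average of the two end values.
    return (d * (y[tuple(upper)] + y[tuple(lower)]) / 2.0).sum(axis=axis)

class Tagged(np.ndarray):
    """Hypothetical subclass that does not change numerical behavior."""

y = np.array([[0.0, 1.0, 2.0], [0.0, 2.0, 4.0]]).view(Tagged)

print(type(trapz_sketch(y)))              # cast: plain ndarray comes back
print(type(trapz_sketch(y, subok=True)))  # opt-in: the subclass survives
```

The default stays backwards compatible, and whoever passes subok=True takes
on the job of checking that the numbers are right for their class.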

Josef

>
>> (On the other hand, I'm very slowly getting used to the pattern that
>> for a simple function, 10% is calculation and 90% is interface code.)
>
> Yeah, it's kind of annoying, since the 10% is the cool part you want,
> and that 90% is thorny to design and boring to code.
>
> Ryan
>
> --
> Ryan May
> Graduate Research Assistant
> School of Meteorology
> University of Oklahoma

