[Numpy-discussion] numpy.trapz() doesn't respect subclass

josef.pktd@gmai... josef.pktd@gmai...
Sat Mar 27 21:23:03 CDT 2010


On Sat, Mar 27, 2010 at 2:31 PM, Ryan May <rmay31@gmail.com> wrote:
> On Sat, Mar 27, 2010 at 11:12 AM,  <josef.pktd@gmail.com> wrote:
>> On Sat, Mar 27, 2010 at 1:00 PM, Ryan May <rmay31@gmail.com> wrote:
>>> On Mon, Mar 22, 2010 at 8:14 AM, Ryan May <rmay31@gmail.com> wrote:
>>>> On Sun, Mar 21, 2010 at 11:57 PM,  <josef.pktd@gmail.com> wrote:
>>>>> On Mon, Mar 22, 2010 at 12:49 AM, Ryan May <rmay31@gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I found that trapz() doesn't work with subclasses:
>>>>>>
>>>>>> http://projects.scipy.org/numpy/ticket/1438
>>>>>>
>>>>>> A simple patch (attached) to change asarray() to asanyarray() fixes
>>>>>> the problem fine.
>>>>>
>>>>> Are you sure this function works with matrices and other subclasses?
>>>>>
>>>>> Looking only very briefly at it: the multiplication might be a problem.
>>>>
>>>> Correct, it probably *is* a problem in some cases with matrices.  In
>>>> this case, I was using quantities (Darren Dale's unit-aware array
>>>> package), and the result was that units were stripped off.
>>>>
>>>> The patch can't make trapz() work with all subclasses. However, right
>>>> now, you have *no* hope of getting a subclass out of trapz().  With
>>>> this change, subclasses that don't redefine operators can work fine.
>>>> If you're passing a Matrix to trapz() and expecting it to work, IMHO
>>>> you're doing it wrong.  You can still pass one in by using asarray()
>>>> yourself.  Without this patch, I'm left with copying and maintaining a
>>>> copy of the code elsewhere, just so I can loosen the function's input
>>>> processing. That seems wrong, since there's really no need in my case
>>>> to drop down to an ndarray. The input I'm giving it supports all the
>>>> operations it needs, so it should just work with my original input.
>>
>> With asarray it gives correct results for matrices and all array_like
>> and subclasses, it just doesn't preserve the type.
>> Your patch would break matrices and possibly other types, masked_arrays?, ...
>
> It would break matrices, yes.  I would argue that masked arrays are
> already broken with trapz:
>
> In [1]: x = np.arange(10)
>
> In [2]: y = x * x
>
> In [3]: np.trapz(y, x)
> Out[3]: 244.5
>
> In [4]: ym = np.ma.array(y, mask=(x>4)&(x<7))
>
> In [5]: np.trapz(ym, x)
> Out[5]: 244.5
>
> In [6]: y[5:7] = 0
>
> In [7]: ym = np.ma.array(y, mask=(x>4)&(x<7))
>
> In [8]: np.trapz(ym, x)
> Out[8]: 183.5
>
> Because of the call to asarray(), the mask is completely discarded and
> you end up with identical results to an unmasked array,
> which is not what I'd expect.  Worse, the actual numeric value of the
> positions that were masked affect the final answer. My patch allows
> this to work as expected too.
>
>> One solution would be using arraywrap as in numpy.linalg.
>
> By arraywrap, I'm assuming you mean:
>
> def _makearray(a):
>    new = asarray(a)
>    wrap = getattr(a, "__array_prepare__", new.__array_wrap__)
>    return new, wrap
>
> I'm not sure if that's identical to just letting the subclass handle
> what's needed.  To my eyes, that doesn't look as though it'd be
> equivalent, both for handling masked arrays and Quantities. For
> quantities at least, the result of trapz will have different units
> than either of the inputs.
>
>> for related discussion:
>> http://mail.scipy.org/pipermail/scipy-dev/2009-June/012061.html
>
> Actually, that discussion kind of makes my point.  Matrices are a pain
> to make work in a general sense because they *break* ndarray
> conventions--to me it doesn't make sense to help along classes that
> break convention at the expense of making well-behaved classes a pain
> to use.  You should need an *explicit* cast of a matrix to an ndarray
> instead of the function quietly doing it for you. ("Explicit is better
> than implicit") It just seems absurd that if I make my own ndarray
> subclass that *just* adds some behavior to the array, but doesn't
> break *any* operations, I need to do one of the following:
>
> 1) Have my own copy of trapz that works with my class
> 2) Wrap every call to numpy's own trapz() to put the metadata back.
>
> Does it not seem backwards that the class that breaks conventions
> "just works" while those that don't break conventions, will work
> perfectly with the function as written, need help to be treated
> properly?

(trying again, firefox or gmail deleted my earlier response instead of
sending it)

Matrices have been part of numpy for a long time and your patch would
break backwards compatibility in a pretty serious way.

subclasses of ndarray, like masked_arrays and quantities, and classes
that delegate to array calculations, like pandas, can redefine
anything. So there is not much that can be relied on if any subclass
is allowed to be used inside a function

e.g. quantities redefines sin, cos,...
http://packages.python.org/quantities/user/issues.html#umath-functions
What happens if you call fft with a quantity array?

For ndarrays  I know that  (x*y).sum(0) is equivalent to np.dot(x.T,y)
 for appropriate y and x (e.g. for a faster calculation of means), so,
it's just an implementation detail what I use. If a subclass redefines
ufuncs but not linalg, then these two can be different. Just a
thought, what happens if an existing function for ndarrays is
cythonized? I guess the behavior for subclasses might change if they
were not converted to ndarrays. So it appears to me that the only
feasible way *in general* is to have separate functions or wrappers
like np.ma.

Except for simple functions and ufuncs, it would be a lot of work and
fragile to allow asanyarray. And, as we discussed in a recent thread
on masked arrays (and histogram), it would push the work on the
function writer instead of the ones that are creating new subclasses.

Of course, the behavior in numpy and scipy can be improved, and trapz
may be simple enough to change, but I don't think a patch that breaks
backwards compatibility pretty seriously and is not accompanied by
sufficient tests should go into numpy or scipy.

(On the other hand, I'm very slowly getting used to the pattern that
for a simple function, 10% is calculation and 90% is interface code.)

Josef

>
> Ryan
>
> --
> Ryan May
> Graduate Research Assistant
> School of Meteorology
> University of Oklahoma
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


More information about the NumPy-Discussion mailing list