[SciPy-dev] scipy.stats._chk_asarray

josef.pktd@gmai...
Wed Jun 3 09:26:17 CDT 2009


On Wed, Jun 3, 2009 at 10:05 AM, Bruce Southey <bsouthey@gmail.com> wrote:
> josef.pktd@gmail.com wrote:
>> On Wed, Jun 3, 2009 at 12:55 AM, Robert Kern <robert.kern@gmail.com> wrote:
>>
>>> On Tue, Jun 2, 2009 at 23:50, Pierre GM <pgmdevlist@gmail.com> wrote:
>>>
>>>> On Jun 2, 2009, at 11:09 PM, josef.pktd@gmail.com wrote:
>>>>
>>>>> I tried to see if I can introduce a second version, _check_asanyarray,
>>>>> that doesn't convert to a basic np.array, but I didn't get very far.
>>>>> nanmedian and nanstd are not easy to convert to work with matrices:
>>>>> nanstd uses multiplication and nanmedian uses np.compress.
>>>>>
>>>> Well, what about that:
>>>> * convert the inputs to ndarray w/ _chk_asarray
>>>> * compute as usual
>>>> * return a view of the result using the type of the input (using the
>>>> type keyword of view)
>>>> That should work w/ nanmedian. Some adjustment might be needed
>>>> for nanstd (a problem with dimensions?)
>>>>
>>> That is what I was suggesting, only in decorator form so it could be
>>> applied everywhere. It's not worth wasting time making a small handful
>>> of functions work and be inconsistent with all of the others.
>>>
>>>
>>
>>
>> If someone gives me this decorator, I will use it, but I don't know
>> how to write a decorator that works for all input and output cases,
>> and doesn't screw up our documentation system.
>>
>> But I can change 2 lines per function, and I know I still have the
>> same signature and docstring. It looks like it will work for all
>> descriptive statistics and data transformations in scipy.stats. It
>> won't be relevant for most of the remaining functions.
>>
>> Josef
>>
> Hi,
> Using stats._chk_asarray should be completely unnecessary because most
> numpy functions accept array-like inputs and use flattened arrays by
> default unless the axis keyword is used. That is why I did not use it
> for the stats.gmean and stats.hmean patches.
>
> I am also curious why nanmean is so involved when I would think
> that, for some array b and axis, you can just do:
> numpy.nansum(b,axis=axis)/numpy.sum(numpy.isfinite(b), axis=axis)

For large, badly scaled arrays this might not be a numerically precise
way of doing it. But I agree that many functions could be written as
one-liners, where the only advantage I see is that we don't have to
remember the formula.
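As a rough illustration of the precision point, the sketch below puts Bruce's one-liner next to a two-pass variant. The names `nanmean_oneliner` and `nanmean_refined` are hypothetical, the count uses `~np.isnan(b)` instead of `np.isfinite(b)` so that it matches exactly what `np.nansum` skips (only NaN), and the refinement step is a standard round-off trick, not necessarily what scipy.stats does.

```python
import numpy as np

def nanmean_oneliner(b, axis=None):
    # Bruce's formula; the count here uses ~np.isnan (the email uses np.isfinite)
    b = np.asarray(b, dtype=float)
    return np.nansum(b, axis=axis) / np.sum(~np.isnan(b), axis=axis)

def nanmean_refined(b, axis=None):
    # Hypothetical two-pass variant: use the one-liner as a provisional mean,
    # then add the mean of the residuals about it to absorb first-pass round-off.
    b = np.asarray(b, dtype=float)
    m0 = nanmean_oneliner(b, axis=axis)
    # Re-insert the reduced axis so m0 broadcasts against b
    m0_b = m0 if axis is None else np.expand_dims(m0, axis)
    correction = nanmean_oneliner(b - m0_b, axis=axis)
    return m0 + correction

b = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])
row_means = nanmean_refined(b, axis=1)
```

In exact arithmetic the correction term is zero, so it only picks up rounding error from the first sum; that matters most when the values share a large common offset relative to their spread.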

>
> Granted nanstd is more complex and, in both cases, these probably should
> be part of numpy.
>

a**2 and a*b have completely different meanings for matrices than for
ndarrays. Without conversion, writing any more complex statistical
function would be a major hassle.
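A minimal sketch of the difference, using `np.asmatrix` to get a matrix with the same data (recent NumPy versions may emit a PendingDeprecationWarning, since the matrix subclass is discouraged):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])   # plain ndarray
m = np.asmatrix(a)                       # same data as an np.matrix

sq_array = a ** 2      # elementwise square: [[1, 4], [9, 16]]
sq_matrix = m ** 2     # matrix power m @ m: [[7, 10], [15, 22]]

prod_array = a * a     # elementwise product, same as a ** 2
prod_matrix = m * m    # matrix product, same as m ** 2

# Converting to a plain ndarray first restores elementwise semantics
sq_converted = np.asarray(m) ** 2
```

A variance-style formula such as `((x - mean) ** 2).mean()` therefore silently changes meaning for square matrix input, which is why the stats functions convert first.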

As I mentioned before, I tried with nanmedian and nanstd and gave up
very quickly, since many functions don't work correctly on matrices or
have a different meaning. Code that is not allowed to use `*` is
pretty hard to read and to write. I haven't tried what happens if
someone throws a sparse matrix at the stats functions, but we get
wrong results using, for example, np.dot.
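One way to get the convert, compute, view-back approach that Pierre GM and Robert Kern described, while keeping the name, docstring and signature Josef was worried about, is a decorator built on `functools.wraps`. This is a hypothetical sketch: `preserve_input_type` and `demeaned_square` are made-up names, only the first argument's type is restored, masks on masked arrays are not handled, and sparse matrices (which are not ndarray subclasses) would need separate treatment.

```python
import functools
import numpy as np

def preserve_input_type(func):
    """Hypothetical decorator: run `func` on a plain ndarray, then view the
    result as the ndarray subclass of the first argument (e.g. np.matrix)."""
    @functools.wraps(func)  # copies __name__ and __doc__, sets __wrapped__
    def wrapper(a, *args, **kwargs):
        # Remember the input's subclass; lists and other array-likes map to ndarray
        cls = type(a) if isinstance(a, np.ndarray) else np.ndarray
        result = func(np.asarray(a), *args, **kwargs)
        if isinstance(result, np.ndarray) and cls is not np.ndarray:
            return result.view(cls)  # the `type` argument of ndarray.view
        return result
    return wrapper

@preserve_input_type
def demeaned_square(a, axis=0):
    """Squared deviations from the mean along `axis` (toy example)."""
    d = a - a.mean(axis=axis, keepdims=True)
    return d ** 2  # elementwise, because `a` is always a plain ndarray here

m = np.asmatrix([[1.0, 2.0, 3.0],
                 [3.0, 6.0, 5.0]])
r = demeaned_square(m)  # np.matrix holding [[1, 4, 1], [1, 4, 1]]
```

Because the wrapped function only ever sees a plain ndarray, `**` stays elementwise even for a non-square matrix. `inspect.signature` follows `__wrapped__`, so documentation tools that rely on it should still see `(a, axis=0)`. A 1-D result viewed as `np.matrix` would come back as a row of shape (1, n), which is the kind of dimension issue Pierre GM mentioned for nanstd.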

Josef

> Bruce
>

