[SciPy-User] stats.pearsonr divide by zero warning

josef.pktd@gmai... josef.pktd@gmai...
Mon Aug 9 15:35:49 CDT 2010


On Mon, Aug 9, 2010 at 4:29 PM, Skipper Seabold <jsseabold@gmail.com> wrote:
> On Mon, Aug 9, 2010 at 4:18 PM,  <josef.pktd@gmail.com> wrote:
>> On Mon, Aug 9, 2010 at 3:46 PM,  <josef.pktd@gmail.com> wrote:
>>> On Mon, Aug 9, 2010 at 3:19 PM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
>>>> Hello,
>>>>
>>>> I just svn-up'd scipy, and now find that stats.pearsonr is causing
>>>> divide-by-zero warnings foolishly.
>>>>
>>>> the function contains the following stanzas:
>>>>
>>>>     rs = np.corrcoef(ar,br,rowvar=axisout)
>>>>
>>>>     t = rs * np.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
>>>>     prob = distributions.t.sf(np.abs(t),n-2)*2
>>>>
>>>>     if rs.shape == (2,2):
>>>>         return rs[1,0], prob[1,0]
>>>>     else:
>>>>         return rs, prob
>>>>
>>>> Given that the diagonal of the correlation matrix returned by corrcoef
>>>> will *always* be 1s, the t matrix will have divide-by-zero issues on
>>>> the diagonal, and give inf values -- which get zero values for the t-
>>>> distribution's survival function, so everything's fine, output-wise.
>>>> Presumably, though, the t-calculating line should be flanked by err =
>>>> np.seterr(divide='ignore') / np.seterr(**err), right?
>>>>
>>>> Should I add a bug in the tracker? Someone want to just commit this fix?
>>>
>>> I guess you mean spearmanr,  pearsonr hasn't been rewritten as far as I can see.
>>>
>>> The old trick (still used in pearsonr) was to add TINY in the
>>> calculation of the test statistic.
>>> Maybe we should add TINY to the diagonal, which would keep a zero
>>> division warning if any of the series are perfectly correlated.
>>>
>>> seterr is also fine with me.
>>>
>>> a ticket is always good, at least for the record so we know what to
>>> watch out for. I have warnings turned off globally, so no zero
>>> division problems for me.
>>>
>>> np.corrcoef might throw a warning if there is zero variance, but I'm
>>> not sure this applies in this case
>>>
>>> Josef
>>
>> just a follow-up because I think there is a similar case in the
>> contingency table code
>>
>> mut_inf = np.nansum(self.probability * np.log(self.observed / self.expected))
>>
>> Do we really need to protect everywhere for zero division warnings?
>>
>
> Is this protecting against a zero division warning?

No, it will cause a warning by design whenever there is a zero
probability. The question is whether we need a work-around in the code
as we did in statsmodels. I never tried to see how many zero division
and other warnings are raised in the regular usage in scipy.stats, but
I guess a lot.

Josef

>
>> I think in statsmodels we worked around the warning in 0*np.log(0) or
>> something like this.
>>
>
> IIRC, we use a mask for these kind of cases in Shannon entropy to
> avoid the ugly warning.  I don't know about speed.  The current
> implementation of stats.entropy uses where, but this doesn't avoid the
> warning.
>
> Skipper
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>


More information about the SciPy-User mailing list