[SciPy-User] stats.pearsonr divide by zero warning

josef.pktd@gmai... josef.pktd@gmai...
Mon Aug 9 14:46:27 CDT 2010


On Mon, Aug 9, 2010 at 3:19 PM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
> Hello,
>
> I just svn-up'd scipy, and now find that stats.pearsonr is causing
> divide-by-zero warnings foolishly.
>
> the function contains the following stanzas:
>
>     rs = np.corrcoef(ar,br,rowvar=axisout)
>
>     t = rs * np.sqrt((n-2) / ((rs+1.0)*(1.0-rs)))
>     prob = distributions.t.sf(np.abs(t),n-2)*2
>
>     if rs.shape == (2,2):
>         return rs[1,0], prob[1,0]
>     else:
>         return rs, prob
>
> Given that the diagonal of the correlation matrix returned by corrcoef
> will *always* be 1s, the t matrix will have divide-by-zero issues on
> the diagonal, and give inf values -- which get zero values for the t-
> distribution's survival function, so everything's fine, output-wise.
> Presumably, though, the t-calculating line should be flanked by err =
> np.seterr(divide='ignore') / np.seterr(**err), right?
>
> Should I add a bug in the tracker? Someone want to just commit this fix?

I guess you mean spearmanr,  pearsonr hasn't been rewritten as far as I can see.

The old trick (still used in pearsonr) was to add TINY in the
calculation of the test statistic.
Maybe we should add TINY to the diagonal, which would keep a zero
division warning if any of the series are perfectly correlated.

seterr is also fine with me.

a ticket is always good, at least for the record so we know what to
watch out for. I have warnings turned off globally, so no zero
division problems for me.

np.corrcoef might throw a warning if there is zero variance, but I'm
not sure this applies in this case

Josef



>
> Zach
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>


More information about the SciPy-User mailing list