[SciPy-User] nan's in stats.spearmanr

Wes McKinney wesmckinn@gmail....
Wed Apr 4 19:51:52 CDT 2012


On Wed, Apr 4, 2012 at 6:34 PM,  <josef.pktd@gmail.com> wrote:
> On Wed, Apr 4, 2012 at 3:54 PM, Ben <benwhalley@gmail.com> wrote:
>> Apologies if this seems obvious to others, but I'm using both functions from
>> pandas and stats.spearmanr in different bits of my code and noticed something
>> odd.  Is the following output expected?
>>
>> from numpy import nan
>> from pandas import DataFrame
>> from scipy import stats
>> a = [1, nan, 2]
>> b = [1, 2, 2]
>> df = DataFrame(list(zip(a, b)))
>> stats.spearmanr(a, b)
>>
>> gives: (0.86602540378443871, 0.3333333333333332)
>>
>> df.corr(method="spearman")
>>   0  1
>> 0  1  1
>> 1  1  1
>>
>> Removing the nan from a produces identical results. I had expected the first
>> output, but perhaps I'm not understanding how scipy likes to handle nan.
>
> scipy.stats doesn't handle nans in most cases. They are not treated
> specially or dropped, so the outcome depends on implementation details.
>
> The correct answer should come from stats.mstats, which uses masked
> arrays to handle nan cases:
>
>>>> import numpy as np
>>>> am = np.ma.fix_invalid(a)
>>>> bm = np.ma.fix_invalid(b)
>>>> stats.mstats.spearmanr(am, bm)
> (1.0, 0.0)
>
> Josef
>
>
>>
>> Any advice much appreciated.
>>
>> Regards,
>>
>> Ben
>>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user

pandas excludes NaNs by default, so its result of 1 looks correct and
agrees with the stats.mstats output Josef posted.
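
For anyone who wants the same behaviour from plain stats.spearmanr, one
option is to drop the incomplete pairs by hand before calling it. This is
only a rough sketch: the data below is made up for illustration (it is not
the a and b from Ben's message), and the name `valid` is just a local
helper, not anything pandas or scipy provides.

```python
import numpy as np
from scipy import stats

# Made-up illustrative data, not the values from the thread.
a = np.array([1.0, np.nan, 2.0, 4.0, 3.0])
b = np.array([1.0, 2.0, 2.0, 5.0, 3.0])

# True where both series have a value; a pair is dropped if either is NaN.
valid = ~(np.isnan(a) | np.isnan(b))

# Rank correlation computed on the complete pairs only.
rho, pvalue = stats.spearmanr(a[valid], b[valid])
print(rho, pvalue)
```

Dropping a pair when either side is missing is, as far as I understand it,
the same idea as what pandas and the masked-array version are doing.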

