[Numpy-discussion] Using matplotlib's prctile on masked arrays

Gökhan Sever gokhansever@gmail....
Tue Oct 27 06:56:33 CDT 2009


Hello,

Consider this sample of two columns of data:

 999999.9999 999999.9999
 999999.9999 999999.9999
 999999.9999 999999.9999
 999999.9999   1693.9069
 999999.9999   1676.1059
 999999.9999   1621.5875
    651.8040   1542.1373
    691.0138   1650.4214
    678.5558   1710.7311
    621.5777 999999.9999
    644.8341 999999.9999
    696.2080 999999.9999

Putting this data into a file, say "sample.data", and loading it with:

a,b = np.loadtxt('sample.data', dtype="float").T

I[16]: a
O[16]:
array([  1.00000000e+06,   1.00000000e+06,   1.00000000e+06,
         1.00000000e+06,   1.00000000e+06,   1.00000000e+06,
         6.51804000e+02,   6.91013800e+02,   6.78555800e+02,
         6.21577700e+02,   6.44834100e+02,   6.96208000e+02])

I[17]: b
O[17]:
array([ 999999.9999,  999999.9999,  999999.9999,    1693.9069,
          1676.1059,    1621.5875,    1542.1373,    1650.4214,
          1710.7311,  999999.9999,  999999.9999,  999999.9999])
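
The two columns print differently above even though they hold the same kind of values. A quick way to check whether the stored numbers differ or only the display does is sketched below. It assumes a NumPy recent enough to provide np.printoptions, and it uses a few values typed in by hand rather than read from sample.data. NumPy's array printer switches to exponent notation when the ratio of the largest to the smallest magnitude exceeds 1000, which is the case for a (min 621.5777) but not for b (min 1542.1373).

```python
import numpy as np

# A few entries from each column, typed in by hand (not read from sample.data).
a = np.array([999999.9999, 651.8040, 621.5777])
b = np.array([999999.9999, 1693.9069, 1542.1373])

# The stored value is unchanged; only the printed form differs.
print(a[0] == 999999.9999)

# Ratio of largest to smallest magnitude in each column.
ratio_a = a.max() / a.min()
ratio_b = b.max() / b.min()
print(ratio_a, ratio_b)

# Default display versus forced fixed-point display.
default_text = str(a)
with np.printoptions(suppress=True):
    fixed_text = str(a)
print(default_text)
print(fixed_text)
```

With suppress=True the printer keeps fixed-point notation for this column, so the display difference is cosmetic and does not affect the masking that follows.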

### Interestingly, the second column is displayed as is, but the values of a
are reformatted a little. Why could this be happening? Any ideas? Anyway,
back to masked arrays:

I[24]: am = ma.masked_values(a, value=999999.9999)

I[25]: am
O[25]:
masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
644.8341 696.208],
             mask = [ True  True  True  True  True  True False False False
False False False],
       fill_value = 999999.9999)


I[30]: bm = ma.masked_values(b, value=999999.9999)

I[31]: am
O[31]:
masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777
644.8341 696.208],
             mask = [ True  True  True  True  True  True False False False
False False False],
       fill_value = 999999.9999)


So far so good. A few basic checks:

I[33]: am/bm
O[33]:
masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712
0.39664667346 -- -- --],
             mask = [ True  True  True  True  True  True False False False
True  True  True],
       fill_value = 999999.9999)


I[34]: mean(am/bm)
O[34]: 0.41266624676580849
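
The division above masks an element wherever either operand is masked, so only the three rows where both columns are valid survive. A minimal self-contained sketch of that check follows, with the sample values typed in directly instead of loaded from sample.data; the names FILL and combined are illustrative only.

```python
import numpy as np
import numpy.ma as ma

FILL = 999999.9999  # missing-data marker used in the sample

a = np.array([FILL] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([FILL] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [FILL] * 3)

am = ma.masked_values(a, value=FILL)
bm = ma.masked_values(b, value=FILL)
ratio = am / bm

# For this data the result mask is the element-wise OR of the input masks.
combined = ma.getmaskarray(am) | ma.getmaskarray(bm)
print(np.array_equal(ma.getmaskarray(ratio), combined))

print(ratio.count())       # number of unmasked entries
print(ratio.compressed())  # only the valid ratios, as a plain ndarray
print(ratio.mean())        # mean over the unmasked entries only
```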

Unfortunately, matplotlib.mlab's prctile cannot handle the result of this
division:

I[54]: prctile(am/bm, p=[5,25,50,75,95])
O[54]:
array([  3.96646673e-01,   6.21577700e+02,   1.00000000e+06,
         1.00000000e+06,   1.00000000e+06])


This also results in wrong-looking box-and-whisker plots.
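
The prctile output above is consistent with the function sorting the underlying data buffer and ignoring the mask: values such as 1e6 and 621.5777 are what sits at the masked positions. That is a guess from the numbers, not something checked against the mlab source. A possible workaround, sketched below, is to drop the masked entries with compressed() before computing percentiles. np.percentile stands in for prctile here, and the data is typed in by hand.

```python
import numpy as np
import numpy.ma as ma

FILL = 999999.9999  # missing-data marker used in the sample

a = np.array([FILL] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([FILL] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [FILL] * 3)

ratio = ma.masked_values(a, FILL) / ma.masked_values(b, FILL)

# Keep only the unmasked entries as a plain ndarray, so a function that
# does not understand masks never sees the fill values.
valid = ratio.compressed()

pcts = np.percentile(valid, [5, 25, 50, 75, 95])
print(valid)
print(pcts)
```

The same valid array could be handed to a box-plot routine. Percentile definitions differ between libraries, so with only three valid points the numbers need not match scoreatpercentile exactly.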


Testing further with scipy.stats functions yields the expected, correct results:

I[55]: stats.scoreatpercentile(am/bm, per=5)
O[55]: 0.40877012449846228

I[49]: stats.scoreatpercentile(am/bm, per=25)
O[49]:
masked_array(data = --,
             mask = True,
       fill_value = 1e+20)

I[56]: stats.scoreatpercentile(am/bm, per=95)
O[56]:
masked_array(data = --,
             mask = True,
       fill_value = 1e+20)


Can anyone confirm this?







-- 
Gökhan