[Numpy-discussion] Coverting ranks to a Gaussian

Robert Kern robert.kern@gmail....
Mon Jun 9 21:35:43 CDT 2008


On Mon, Jun 9, 2008 at 21:06, Keith Goodman <kwgoodman@gmail.com> wrote:
> On Mon, Jun 9, 2008 at 4:45 PM, Robert Kern <robert.kern@gmail.com> wrote:
>> On Mon, Jun 9, 2008 at 18:34, Keith Goodman <kwgoodman@gmail.com> wrote:
>>> Does anyone have a function that converts ranks into a Gaussian?
>>>
>>> I have an array x:
>>>
>>>>> import numpy as np
>>>>> x = np.random.rand(5)
>>>
>>> I rank it:
>>>
>>>>> x = x.argsort().argsort()
>>>>> x_ranked = x.argsort().argsort()
>>>>> x_ranked
>>>   array([3, 1, 4, 2, 0])
>>
>> There are subtleties in computing ranks when ties are involved. Take a
>> look at the implementation of scipy.stats.rankdata().
>
> Good point. I had to deal with ties and missing data. I bet
> scipy.stats.rankdata() is faster than my implementation.

Actually, it's pretty slow. I think there are opportunities to make it
faster, but I haven't explored them.

>>> I would like to convert the ranks to a Gaussian without using scipy.
>>
>> No dice. You are going to have to use scipy.special.ndtri somewhere. A
>> basic transformation (off the top of my head, I have no idea if this
>> is statistically meaningful):
>>
>>  scipy.special.ndtri((ranks + 1.0) / (len(ranks) + 1.0))
>>
>> Barring tied first or last items, this should give equal weight to
>> each of the tails outside of the range of your data.
>
> Nice. Thank you. It passes the never wrong chi-by-eye test:
>
> r = np.arange(1000)
> g = special.ndtri((r + 1.0) / (len(r) + 1.0))
> pylab.hist(g, 50)
> pylab.show()

BTW, what is your use case? If you are trying to compare your data to
the normal distribution, you might want to go the other way: find the
empirical quantiles of your data and compare them against the
hypothetical quantiles of the data on the distribution. This *is* an
established statistical technique; when you graph one versus the
other, you get what is known as a Q-Q plot.

> I wasn't able to use scipy.special.ndtri (after import scipy) like you
> did.

Actually, I didn't specify any import. The behavior you see is
correct. Importing scipy alone doesn't import any of the subpackages.
If you like, you can pretend there was an implicit "import
scipy.special" before the code I gave.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


More information about the Numpy-discussion mailing list