[Numpy-discussion] Converting ranks to a Gaussian
Tue Jun 10 08:17:01 CDT 2008
On Tue, Jun 10, 2008 at 12:56 AM, Anne Archibald wrote:
> 2008/6/9 Keith Goodman <firstname.lastname@example.org>:
>> Does anyone have a function that converts ranks into a Gaussian?
>> I have an array x:
>>>> import numpy as np
>>>> x = np.random.rand(5)
>> I rank it:
>>>> x = x.argsort().argsort()
>>>> x
>> array([3, 1, 4, 2, 0])
>> I would like to convert the ranks to a Gaussian without using scipy.
>> So instead of the equal distance between ranks in array x, I would
>> like the distance between them to follow a Gaussian distribution.
>> How far out in the tails of the Gaussian should 0 and N-1 (N=5 in the
>> example above) be? Ideally, or arbitrarily, the areas under the
>> Gaussian to the left of 0 (and the right of N-1) should be 1/N or
>> 1/(2N). Something like that. Or a fixed value is good too.
> I'm actually not clear on what you need.
> If what you need is for rank i of N to be the 100*i/N th percentile in
> a Gaussian distribution, then you should indeed use scipy's functions
> to accomplish that; I'd use scipy.stats.norm.ppf().
> Of course, if your points were drawn from a Gaussian distribution,
> they wouldn't be exactly 1/N apart, there would be some distribution.
> Quite what the distribution of (say) the maximum or the median of N
> points drawn from a Gaussian is, I can't say, though people have
> looked at it. But if you want "typical" values, just generate N points
> from a Gaussian and sort them:
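> def typical_values(ranks):  # hypothetical wrapper name
>     N = len(ranks)
>     V = np.random.randn(N)  # N draws from a standard normal
>     V = np.sort(V)
>     return V[ranks]  # rank i gets the i-th smallest draw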
> Of course they will be different every time, but the distribution will be right.
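Since a single draw changes on every call, one possible variation on the quoted idea is to average the sorted draws over many repetitions, which approximates the expected position of each rank in a Gaussian sample. This is only a sketch using numpy alone; the function name, `n_draws`, and `seed` are assumptions made for illustration and are not from the thread.

```python
import numpy as np

def mean_sorted_normals(ranks, n_draws=2000, seed=0):
    """Map integer ranks 0..N-1 to averaged sorted standard-normal draws.

    Hypothetical helper (not from the thread): averaging n_draws sorted
    samples gives a stable result instead of one that changes every call.
    """
    ranks = np.asarray(ranks)
    n = ranks.size
    rng = np.random.default_rng(seed)           # seeded so results repeat
    draws = rng.standard_normal((n_draws, n))   # one row per simulated sample
    draws.sort(axis=1)                          # sort each row ascending
    typical = draws.mean(axis=0)                # approximate expected order statistics
    return typical[ranks]                       # rank i gets the i-th typical value

ranks = np.array([3, 1, 4, 2, 0])
values = mean_sorted_normals(ranks)
```

The output keeps the ordering of the ranks, and with a fixed seed it is the same on every call. More draws give a smoother estimate at the cost of memory.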
I guess I botched the description of my problem.
I have data that contains outliers and other noise. I am trying
various transformations of the data to preprocess it before plugging
it into my prediction algorithm. One such transformation is to rank
the data and then convert that rank to a Gaussian. The particular
details of the transformation don't matter. I just want something
smooth and normal-like.
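For illustration, here is a minimal sketch of such a rank-to-Gaussian transform without scipy. It assumes Python 3.8+ so that the standard library's `statistics.NormalDist().inv_cdf` can stand in for `scipy.stats.norm.ppf`, and it assumes rank i is mapped to probability (i + 0.5)/N, which leaves an area of 1/(2N) in each tail. The function name and the sample data are made up.

```python
from statistics import NormalDist  # standard library, Python 3.8+

import numpy as np

def ranks_to_gaussian(x):
    """Replace each value of 1-D x by a standard-normal quantile of its rank.

    Hypothetical helper (not from the thread).
    """
    x = np.asarray(x)
    n = x.size
    ranks = x.argsort().argsort()      # 0 = smallest, n - 1 = largest
    p = (ranks + 0.5) / n              # assumed offset: area 1/(2N) in each tail
    inv_cdf = NormalDist().inv_cdf     # inverse CDF of the standard normal
    return np.array([inv_cdf(float(pi)) for pi in p])

x = np.array([0.62, 0.11, 0.97, 0.35, 0.02])  # made-up data; ranks are [3, 1, 4, 2, 0]
z = ranks_to_gaussian(x)
```

The result depends only on the ordering of the data, so an outlier cannot land further out than the quantile assigned to the top or bottom rank. Ties are broken in whatever order `argsort` happens to return them.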
> P.S. why the "no scipy" restriction? it's a bit unreasonable. -A
I'd rather not pull in a scipy dependency for one function if there is
a numpy alternative. I think it is funny that you picked up on my
brief mention of scipy and called it unreasonable.