[Numpy-discussion] Statistical distributions on samples

Christopher Jordan-Squire cjordan1@uw....
Fri Aug 12 09:53:12 CDT 2011


Hi Andrea--An easy way to get something like this would be

import numpy as np
import scipy.stats as stats

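sigma = 20.0  # example value; use a standard deviation that suits your application
x = stats.norm.rvs(size=1000, loc=125, scale=sigma)
# drop the few samples outside (50, 200); x may end up slightly shorter than 1000
x = x[x>50]
x = x[x<200]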

That will give a roughly normal distribution to your velocities, as long as,
say, sigma<25. (I'm using the rule of thumb for the normal distribution that
normal random samples lie more than 3 standard deviations away from the mean
about 1 out of 370 times.) You may end up with slightly fewer than 1000 values
if any samples get dropped, though, and you won't be able to get exactly
normal errors about your mean, since normal random samples can theoretically
take any value.
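As a quick check of that rule of thumb, here is a small sketch that computes the chance of a single draw landing outside the interval. The value sigma = 25 is only an assumption for illustration; it puts the bounds 50 and 200 exactly 3 standard deviations from the mean of 125.

```python
from scipy import stats

mean, low, high = 125.0, 50.0, 200.0
sigma = 25.0  # assumed value: the bounds then sit exactly 3 sigma from the mean

# Probability that one normal draw falls below `low` or above `high`
p_outside = (stats.norm.cdf(low, loc=mean, scale=sigma)
             + stats.norm.sf(high, loc=mean, scale=sigma))

print(p_outside)        # about 0.0027
print(1.0 / p_outside)  # about 370
```

With a smaller sigma the probability drops quickly, so fewer samples get thrown away by the two filtering lines.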

You can use this same process for any other distribution, as long as you've
chosen a scale parameter so that the probability of samples being outside
your desired interval is really small. Of course, once again your random
errors won't be exactly from the distribution you drew your original samples
from.
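If you need exactly 1,000 values, one option is to keep drawing until enough samples fall inside the interval. The sketch below is only one way to do that; the helper name sample_within is made up for this example, and the distribution parameters (sigma = 20 for the normal, a symmetric triangular on [50, 200]) are assumptions, not anything from the original question.

```python
import numpy as np
from scipy import stats

def sample_within(dist, low, high, n, rng):
    """Draw from the frozen distribution `dist` until n values lie in [low, high].

    Hypothetical helper. It loops forever if `dist` has no probability
    mass inside the interval, so choose the parameters sensibly.
    """
    kept = np.empty(0)
    while kept.size < n:
        draws = dist.rvs(size=n, random_state=rng)
        inside = draws[(draws >= low) & (draws <= high)]
        kept = np.concatenate([kept, inside])
    return kept[:n]

rng = np.random.default_rng(0)  # seeded only to make the example repeatable

# Normal centred on 125; sigma = 20 is an assumed value
velocities = sample_within(stats.norm(loc=125, scale=20), 50, 200, 1000, rng)

# Triangular on [50, 200] with its mode at 125 (c=0.5 is an assumed shape)
tri = sample_within(stats.triang(c=0.5, loc=50, scale=150), 50, 200, 1000, rng)

print(velocities.size, velocities.min(), velocities.max())
```

The result follows the original distribution restricted to the interval, not the original distribution itself. For the normal case, scipy.stats.truncnorm samples from that restricted distribution directly.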

-Chris JS

On Fri, Aug 12, 2011 at 8:32 AM, Andrea Gavana <andrea.gavana@gmail.com> wrote:

> Hi All,
>
>     I am working on something that appeared to be a no-brainer issue (at
> the beginning), but my complete ignorance in statistics is overwhelming and I
> got stuck.
>
> What I am trying to do can be summarized as follows
>
> Let's assume that I have to generate a sample of 1,000 values for a
> variable (let's say, "velocity") using a normal distribution (but later I
> will have to do it with log-normal, triangular and a couple of others). The
> only thing I know about this velocity sample is the minimum and maximum
> values (let's say 50 and 200 respectively) and, obviously for the normal
> distribution (but not so for the other distributions), the mean value (125
> in this case).
>
> Now, I would like to generate this sample of 1,000 points, in which none of
> the points has velocity smaller than 50 or bigger than 200, and the number of
> samples close to the mean (125) should be higher than the number of samples
> close to the minimum and the maximum, following some kind of normal
> distribution.
>
> What I have tried up to now is summarized in the code below, but as you can
> easily see, I don't really know what I am doing. I am open to every
> suggestion, and I apologize for the dumbness of my question.
>
> import numpy
>
> from scipy import stats
> import matplotlib.pyplot as plt
>
> minval, maxval = 50.0, 200.0
> x = numpy.linspace(minval, maxval, 500)
>
> samp = stats.norm.rvs(size=len(x))
> pdf = stats.norm.pdf(x)
> cdf = stats.norm.cdf(x)
> ppf = stats.norm.ppf(x)
>
> ax1 = plt.subplot(2, 2, 1)
> ax1.plot(range(len(x)), samp)
>
> ax2 = plt.subplot(2, 2, 2)
> ax2.plot(x, pdf)
>
> ax3 = plt.subplot(2, 2, 3)
> ax3.plot(x, cdf)
>
> ax4 = plt.subplot(2, 2, 4)
> ax4.plot(x, ppf)
>
> plt.show()
>
>
> Andrea.
>
> "Imagination Is The Only Weapon In The War Against Reality."
> http://xoomer.alice.it/infinity77/