[Numpy-discussion] numpy.random.pareto, m equal zero
josef.pktd@gmai...
Fri Aug 7 21:38:04 CDT 2009
Does it make any (statistical) sense to have numpy.random.pareto
produce random numbers that start at zero?
Can we change it to start at 1 which is the usual default?
Notation from http://docs.scipy.org/numpy/docs/numpy.random.mtrand.RandomState.pareto/
The probability density for the Pareto distribution is
.. math:: p(x) = \frac{am^a}{x^{a+1}}
where :math:`a` is the shape and :math:`m` the location
Constraints, from Johnson, Kotz, Balakrishnan, vol. 1, page 574:
m>0, a>0, x>=m
1) as m goes to zero, the pdf goes to zero at every fixed point x > 0 (the
mean and variance, where they exist, also go to zero; essentially a mass point at zero)
2) quote from http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/parpdf.htm
(their `a` is our `m`)
" Note that although the a (=m JP) parameter is typically called a
location parameter (and it is in the sense that it defines the lower
bound), it is not a location parameter in the technical sense that the
following relation does not hold:
f(x;gamma,a) = f((x-a);gamma,0)
For this reason, Dataplot treats a (=m JP) as a shape parameter. In
Dataplot, the a (=m JP) shape parameter is optional with a default
value of 1. "
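Both points can be checked numerically from the density given above. This is only a sketch in plain Python; `pareto_pdf` is a hypothetical helper name, not a NumPy or SciPy function, and the test values are arbitrary.

```python
def pareto_pdf(x, a, m):
    """Density a * m**a / x**(a + 1) for x >= m, and 0 below the lower bound."""
    if a <= 0 or m <= 0:
        raise ValueError("requires a > 0 and m > 0")
    if x < m:
        return 0.0
    return a * m**a / x ** (a + 1)

a, x = 3.0, 0.5

# Point 1: at a fixed x > 0 the density shrinks toward zero as m -> 0.
densities = [pareto_pdf(x, a, m) for m in (0.1, 0.01, 0.001)]

# Point 2: m is not a location parameter. Moving the bound from 1 to 2 is
# not the same as shifting the m=1 density one unit to the right ...
shifted = pareto_pdf(3.0 - 1.0, a, 1.0)
direct = pareto_pdf(3.0, a, 2.0)

# ... whereas rescaling does reproduce it: f(x; a, m) = f(x/m; a, 1) / m.
rescaled = pareto_pdf(3.0 / 2.0, a, 1.0) / 2.0

print(densities, shifted, direct, rescaled)
```

In this sketch m behaves like a scale parameter rather than a location parameter, which is consistent with the Dataplot remark quoted above.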
my conclusion:
---------------------
What numpy.random.pareto actually produces are random numbers from a
Pareto distribution with lower bound m=1 but location parameter
loc=-1, which shifts the distribution to the left so that it starts at zero.
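A quick empirical check of this reading, as a sketch only: the seed, sample size and shape value are arbitrary choices of mine, and `RandomState.pareto` is used so that the draws are reproducible.

```python
import numpy as np

a = 3.0
rng = np.random.RandomState(12345)  # seeded only to make the check repeatable
samples = rng.pareto(a, size=100000)

# The smallest draw sits close to 0, not close to 1.
smallest = samples.min()

# If the draws are a Pareto with m=1 shifted by loc=-1, then
# P(X > x) = (1 + x)**(-a) for x >= 0.
x = 1.0
empirical_tail = np.mean(samples > x)
expected_tail = (1.0 + x) ** (-a)

print(smallest, empirical_tail, expected_tail)
```

If the interpretation is right, the empirical and expected tail probabilities should agree up to sampling noise.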
To actually get useful random numbers (that are correct in the usual
usage http://en.wikipedia.org/wiki/Pareto_distribution), we need to
add 1 to them.
stats.distributions doesn't use mtrand.pareto (why?), so I never
needed to check this before.
rvs_pareto = 1 + numpy.random.pareto(a, size)
For the correction in some calculations, see the thread on the power distribution.
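The shifted draw `1 + numpy.random.pareto(a, size)` can be checked the same way. The sketch below also rescales by a lower bound m, which goes slightly beyond the m=1 case discussed here; `classical_pareto` is a hypothetical helper name, and the mean formula a*m/(a-1) assumes a > 1.

```python
import numpy as np

def classical_pareto(a, m, size, rng):
    """Draw from a Pareto with shape a and lower bound m (hypothetical helper)."""
    # rng.pareto starts at 0; adding 1 gives lower bound 1, scaling by m moves it to m.
    return m * (1.0 + rng.pareto(a, size))

a, m = 5.0, 2.0
rng = np.random.RandomState(2009)  # arbitrary seed, for repeatability
rvs_pareto = classical_pareto(a, m, 200000, rng)

lowest = rvs_pareto.min()             # should sit just above m
sample_mean = rvs_pareto.mean()
theoretical_mean = a * m / (a - 1.0)  # valid only for a > 1

print(lowest, sample_mean, theoretical_mean)
```

With m=1 the helper reduces to the one-line correction above.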
Do we have to live with loc=-1, or can we change it, or am I
misinterpreting something (which wouldn't be the first time either)?
Josef