[SciPy-user] scipy.stats rv objects from data

Erik Tollerud erik.tollerud@gmail....
Sun Apr 27 18:29:21 CDT 2008


I'm finding the scipy.stats documentation somewhat difficult to
follow, so maybe the answer to this question is in there... I can't
really find it, though.

What I have is a sequence of numbers X_i. There are two things I'd like
to be able to do with it:
1. Create a discrete probability distribution (class rv_discrete) from
this data so as to use the utility functions that take rv_discrete
objects.
The rv_discrete documentation suggests this should be easy.  I did the following:
>>> ddist = rv_discrete(values=(x, [1/len(x) for i in x]), name='test')
>>> ddist.pmf(50)
array(0.0)

Every pmf value I ask for seems to be 0.  Do I have to
explicitly subclass rv_discrete with my data and a _pmf method or
something? This seems like a very natural thing to want to do, so
it seems odd that there is no helper like
make_dist(x,name='whatever').  I can take a shot at writing such a
function, but I don't want to if one already exists.
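
For reference, a helper along these lines might look like the sketch
below. It is only an illustration: make_dist is the hypothetical name
used above, not an existing scipy function, and the sample data is made
up. It assumes integer-valued data and a NumPy recent enough to support
return_counts in np.unique. Repeated values are collapsed into counts,
and the probabilities are computed with true division. Under Python 2,
1/len(x) is integer division and evaluates to 0 whenever len(x) > 1,
which may be why every pmf value above comes out as 0.

```python
import numpy as np
from scipy import stats


def make_dist(x, name="empirical"):
    """Hypothetical helper: build an empirical rv_discrete from integer samples."""
    # Collapse repeated observations into unique support points and counts.
    xk, counts = np.unique(np.asarray(x), return_counts=True)
    # Convert to float before dividing so the probabilities are not truncated to 0.
    pk = counts.astype(float) / counts.sum()
    return stats.rv_discrete(values=(xk, pk), name=name)


# Made-up sample data for illustration only.
data = [50, 50, 51, 53]
ddist = make_dist(data, name="test")

print(ddist.pmf(50))  # 50 appears in 2 of 4 samples, so this should be 0.5
print(ddist.pmf(52))  # 52 never appears, so this should be 0.0
```

Values that are not in the data still give a pmf of 0, so a quick check
is to evaluate the pmf at a value known to be in x.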

2. Create a continuous probability distribution from something like
spline fitting or simple linear interpolation of the data in X_i.
Does this require explicit subclassing, or is there a straightforward
way to do it that's built in?  I'm not sure if this step is strictly
necessary - what I really want to do is be able to draw from the
discrete distribution in item 1 just by sampling the cdf... maybe this
is how it's supposed to work with the discrete distribution, but when I
tried to sample it using ddist.rvs, I would always get the input
values I specified rather than random values sampled from the cdf.
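
A discrete distribution only has mass at its support points, so getting
back the input values from ddist.rvs is consistent with that. One
possible way to get values in between, sketched here under stated
assumptions, is inverse-transform sampling from a piecewise-linear CDF
drawn through the sorted data. The function name sample_interpolated is
hypothetical, the sketch uses only NumPy (np.interp and
np.random.default_rng) rather than any scipy.stats class, and it
assumes at least two data points.

```python
import numpy as np


def sample_interpolated(x, size, rng=None):
    """Hypothetical helper: draw samples by inverting a linearly interpolated CDF."""
    rng = np.random.default_rng() if rng is None else rng
    xs = np.sort(np.asarray(x, dtype=float))
    # Assign evenly spaced CDF levels from 0 to 1 to the sorted data points.
    cdf = np.linspace(0.0, 1.0, len(xs))
    # Draw uniform numbers and map them through the inverse CDF.
    u = rng.random(size)
    return np.interp(u, cdf, xs)


# Made-up sample data for illustration only.
samples = sample_interpolated([1, 2, 4, 8], size=5)
print(samples)  # values fall between 1 and 8, not only on the input points
```

This treats the data as defining the quantiles of a continuous
distribution. A spline could replace the linear interpolation, but it
would need to stay monotonic to remain a valid CDF.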

I'm on scipy 0.6.0 and numpy 1.0.4.

