[SciPy-user] scipy.stats rv objects from data
Erik Tollerud
erik.tollerud@gmail....
Fri May 2 16:36:34 CDT 2008
On Mon, Apr 28, 2008 at 12:29 AM, Stéfan van der Walt <stefan@sun.ac.za> wrote:
> That should be 1.0/len(x), otherwise all the probabilities are 0.
>
> Cheers
> Stéfan
I had >>>from __future__ import division above when I actually tested
this, so they aren't zero (although you're right they would be
otherwise), but you're right that they would be if this was run as-is.
I figured the pmf part out, though based on Michael's examples - the
pmf SHOULD be zero everywhere other then exactly at one of the
values... when I generate the cdf, it has non-zero values.
On Mon, Apr 28, 2008 at 2:02 AM, Michael <mnandris@btinternet.com> wrote:
> There are at least 2 ways of using rv_discrete
>
> e.g. 2 ways to calculate the next element of a simple Markov Chain with
> x(n+1)=Norm(0.5 x(n),1)
>
> from scipy.stats import rv_discrete
> from numpy.random import multinomial
>
> x = 3
> n1 = stats.rv_continuous.rvs( stats.norm, 0.5*x, 1.0 )[0]
> print n1
> n2 = stats.rv_discrete.rvs( stats.rv_discrete( name='sample',
> values=([0,1,2],[3/10.,5/10.,2/10.])), 0.5*x, 1.0 )[0]
> print n2
> print
> sample = stats.rv_discrete( name='sample',
> values=([0,1,2],[3/10.,5/10.,2/10.]) ).rvs( size=10 )
> print sample
>
> The multinomial distribution from numpy.random is somewhat faster (40
> times or so) but has a different idiom:
>
> SIZE = 100000
> VALUES = [0,1,2,3,4,5,6,7]
> PROBS = [1/8.,1/8.,1/8.,1/8.,1/8.,1/8.,1/8.,1/8.]
>
> The idiom for rv_discrete is
> rv_discrete( name='sample', values=(VALUES,PROBS) )
>
> The idiom for numpy.multinomial is different; if memory serves, you get
> frequencies as output instead of the actual values
> multinomial( SIZE, PROBS )
>
> >>> from numpy.random import multinomial
> >>> multinomial(100,[ 0.2, 0.4, 0.1, 0.3 ])
> array([12, 44, 10, 34])
> >>> multinomial( 100, [0.2, 0.0, 0.8, 0.0] ) <-- don't do this
> ...
> >>> multinomial( 100, [0.2, 1e-16, 0.8, 1e-16] ) <-- or this
> >>> multinomial( 100, [0.2-1e-16, 1e-16, 0.8-1e-16, 1e-16] ) <-- ok
> array([21, 0, 79, 0])
>
> the last one is ok since the probability adds up to 1... painful, but it
> works
As explained above, I think I got the discrete to work (spurred on by
your simpler example) - but what's the utility of using the
multinomial idiom over the rv_discrete syntax? Is it faster?.
> Continuous v's discrete: i found this in ./stats/scstats.py
>
> from scipy import stats, r_
> from pylab import show, plot
> import copy
>
> # SOURCE: ./stats/scstats.py
>
> SPREAD = 10
>
> class cdf( object ):
> """ Baseclass for the task of determining a sequence of numbers {vi}
> which is distributed as a random variable X
> """
> def integerDensityFunction( self ):
> """
> Outputs an integer density function: xs (ints) and ys (probabilities)
> which are the correspondence between the whole numbers on the x axis
> to the probabilities on the y axis, according to a normal distribution.
> """
> opt = []
> for i in r_[-SPREAD:SPREAD:100j]: # 2-tailed test (?)
> opt.append(( i, stats.norm.cdf(i) )) # ( int, P(int) )
> return zip(*opt) # [ (int...), (P...) ]
>
> def display( self ):
> xs, ys = self.integerDensityFunction()
> plot( xs, ys )
> show()
>
> if __name__=='__main__':
> d = cdf()
> d.display()
I don't really understand what you're suggesting, but anyway, I don't
see a stats/scstats.py file in the scipy directory (at least in
0.6)...
> Continuous: i can only suggest using rv_continuous
>
> stats.rv_continuous.rvs( stats.norm, 0.5*x, 1.0 ).whatever
>
> .rvs( shape, loc, scale ) is the random variates
> .pdf( x, shape, loc, scale ) is the probability density function which,
> i think, is or should be genuinely continuous
My interpretation of this is that it is using the normal distribution
- I want a distribution that is a smoothed/interpolated version of the
discrete distribution I generated above. I take this to mean there's
no built-in utility to do this, so I just have to make my own - this
seems like a useful thing for data analysis, though, so I may submit
it later to be added to SVN.
More information about the SciPy-user
mailing list