[SciPy-user] scipy.stats rv objects from data

Erik Tollerud erik.tollerud@gmail....
Fri May 2 16:36:34 CDT 2008


On Mon, Apr 28, 2008 at 12:29 AM, Stéfan van der Walt <stefan@sun.ac.za> wrote:
>  That should be 1.0/len(x), otherwise all the probabilities are 0.
>
>  Cheers
>  Stéfan

I had  >>>from __future__ import division above when I actually tested
this, so they aren't zero (although you're right they would be
otherwise), but you're right that they would be if this was run as-is.
 I figured the pmf part out, though based on Michael's examples - the
pmf SHOULD be zero everywhere other then exactly at one of the
values... when I generate the cdf, it has non-zero values.

On Mon, Apr 28, 2008 at 2:02 AM, Michael <mnandris@btinternet.com> wrote:
>  There are at least 2 ways of using rv_discrete
>
>  e.g. 2 ways to calculate the next element of a simple Markov Chain with
>  x(n+1)=Norm(0.5 x(n),1)
>
>  from scipy.stats import rv_discrete
>  from numpy.random import multinomial
>
>     x = 3
>     n1 = stats.rv_continuous.rvs( stats.norm, 0.5*x, 1.0 )[0]
>     print n1
>     n2 = stats.rv_discrete.rvs( stats.rv_discrete( name='sample',
>  values=([0,1,2],[3/10.,5/10.,2/10.])), 0.5*x, 1.0 )[0]
>     print n2
>     print
>     sample = stats.rv_discrete( name='sample',
>  values=([0,1,2],[3/10.,5/10.,2/10.]) ).rvs( size=10 )
>     print sample
>
>  The multinomial distribution from numpy.random is somewhat faster (40
>  times or so) but has a different idiom:
>
>  SIZE = 100000
>  VALUES = [0,1,2,3,4,5,6,7]
>  PROBS = [1/8.,1/8.,1/8.,1/8.,1/8.,1/8.,1/8.,1/8.]
>
>  The idiom for rv_discrete is
>         rv_discrete( name='sample', values=(VALUES,PROBS) )
>
>  The idiom for numpy.multinomial is different; if memory serves, you get
>  frequencies as output instead of the actual values
>         multinomial( SIZE, PROBS )
>
>  >>> from numpy.random import multinomial
>  >>> multinomial(100,[ 0.2, 0.4, 0.1, 0.3 ])
>  array([12, 44, 10, 34])
>  >>> multinomial( 100, [0.2, 0.0, 0.8, 0.0] ) <-- don't do this
>  ...
>  >>> multinomial( 100, [0.2, 1e-16, 0.8, 1e-16] ) <-- or this
>  >>> multinomial( 100, [0.2-1e-16, 1e-16, 0.8-1e-16, 1e-16] ) <-- ok
>  array([21,  0, 79,  0])
>
>  the last one is ok since the probability adds up to 1... painful, but it
>  works

As explained above, I think I got the discrete to work (spurred on by
your simpler example) - but what's the utility of using the
multinomial idiom over the rv_discrete syntax? Is it faster?.

>  Continuous v's discrete: i found this in ./stats/scstats.py
>
>  from scipy import stats, r_
>  from pylab import show, plot
>  import copy
>
>  # SOURCE: ./stats/scstats.py
>
>  SPREAD = 10
>
>  class cdf( object ):
>     """ Baseclass for the task of determining a sequence of numbers {vi}
>  which is distributed as a random variable X
>     """
>     def integerDensityFunction( self ):
>         """
>         Outputs an integer density function: xs (ints) and ys (probabilities)
>  which are the           correspondence between the whole numbers on the x axis
>  to the probabilities on the y axis, according to a normal distribution.
>         """
>         opt = []
>         for i in r_[-SPREAD:SPREAD:100j]: # 2-tailed test (?)
>             opt.append(( i, stats.norm.cdf(i) )) # ( int, P(int) )
>         return zip(*opt)                        # [ (int...), (P...) ]
>
>     def display( self ):
>         xs, ys = self.integerDensityFunction()
>         plot( xs, ys )
>         show()
>
>  if __name__=='__main__':
>     d = cdf()
>     d.display()

I don't really understand what you're suggesting, but anyway, I don't
see a stats/scstats.py file in the scipy directory (at least in
0.6)...

>  Continuous: i can only suggest using rv_continuous
>
>         stats.rv_continuous.rvs( stats.norm, 0.5*x, 1.0 ).whatever
>
>         .rvs( shape, loc, scale ) is the random variates
>         .pdf( x, shape, loc, scale ) is the probability density function which,
>  i think, is or should be genuinely continuous

My interpretation of this is that it is using the normal distribution
- I want a distribution that is a smoothed/interpolated version of the
discrete distribution I generated above.  I take this to mean there's
no built-in utility to do this, so I just have to make my own - this
seems like a useful thing for data analysis, though, so I may submit
it later to be added to SVN.


More information about the SciPy-user mailing list