[SciPy-user] Sparse Random Variables

Tom Johnson tjhnson@gmail....
Wed Oct 17 14:31:31 CDT 2007


I have some stats questions.

1) Please excuse my ignorance here, but how does one use rv_discrete
without initializing it with the 'values' keyword?  For example, if I use
values=((1,2),(.3,.7)), then xk and pk will both be defined, so it
seems strange (and presumably slower) that pmf() should even bother
using the cdf to compute the probability.
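
For concreteness, here is a minimal sketch of the values-based construction I mean. It assumes a SciPy version where rv_discrete accepts values=(xk, pk); the variable names are only illustrative, and it says nothing about how pmf() is computed internally.

```python
import numpy as np
from scipy import stats

# Support points and their probabilities, as in values=((1,2),(.3,.7)).
xk = (1, 2)
pk = (0.3, 0.7)
rv = stats.rv_discrete(name='test', values=(xk, pk))

# Evaluate the pmf on the support and at a point outside it,
# and the cdf on the support for comparison.
pmf_at_support = rv.pmf(np.array(xk))
pmf_outside = rv.pmf(3)
cdf_at_support = rv.cdf(np.array(xk))
```

Here the pmf at the support points should simply reproduce pk, which is why going through the cdf looks like a detour to me.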

2) Suppose I want to store a log distribution.  Is this easily achievable?

3) I didn't do extensive tests, but it seemed like _entropy() was
usually faster than entropy(), even when the distribution had 1e6
possible values.  Is there a reason that default calls to entropy() use
the vectorized function?  It seems like most use cases will involve
random variables with far fewer than 1e6 values...but perhaps not.

4) Also, for some reason entropy() doesn't always work on the first try...

>>> from scipy import *
>>> x = 1000
>>> v = rand(x)
>>> v = v/sum(v)
>>> a = stats.rv_discrete(name='test', values=(range(x), v))
>>> a.entropy()
>>> a.entropy()

The first call to entropy() raises an error.  The second works.  The
problem seems to be with:

/home/me/lib/python/scipy/stats/distributions.py in entropy(self, *args, **kwds)
-> 3794         place(output,cond0,self.vecentropy(*goodargs))

/home/me/lib/python/numpy/lib/function_base.py in __call__(self, *args)
    941         if self.nout == 1:
--> 942             _res =
    943         else:
    944             _res = tuple([array(x,copy=False).astype(c) \

<type 'exceptions.TypeError'>: function not supported for these types,
and can't coerce safely to supported types
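
As a cross-check on the value entropy() should return, here is a sketch that computes the entropy directly from pk. It assumes the probabilities are normalized by their own sum, uses natural logarithms, and relies on numpy's default_rng and scipy.stats.entropy from a current install; the names n, pk, and h_direct are only illustrative.

```python
import numpy as np
from scipy import stats

# Build a random, normalized probability vector over 1000 outcomes.
rng = np.random.default_rng(0)
n = 1000
pk = rng.random(n)
pk /= pk.sum()

rv = stats.rv_discrete(name='test', values=(np.arange(n), pk))

# Direct Shannon entropy in nats, skipping zero-probability entries.
nonzero = pk[pk > 0]
h_direct = -np.sum(nonzero * np.log(nonzero))

# The same quantity via the module-level function and via the distribution.
h_function = float(stats.entropy(pk))
h_method = float(rv.entropy())
```

If the three numbers agree, the distribution itself is fine and the failure is confined to the first call.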

5) I really need to have random variables where the xk are tuples whose
elements all have the same type (integers xor floats xor strings ...).
For example:

p( (0,0) ) = .25
p( (0,1) ) = .25
p( (1,0) ) = .25
p( (1,1) ) = .25


Passing the tuples directly as xk fails:

>>> a = stats.rv_discrete(name='test', values=(((0,0),(0,1),(1,0),(1,1)), [.25]*4))


/home/me/lib/python/numpy/core/fromnumeric.py in take(a, indices, axis, out, mode)
     79     except AttributeError:
     80         return _wrapit(a, 'take', indices, axis, out, mode)
---> 81     return take(indices, axis, out, mode)

<type 'exceptions.IndexError'>: index out of range for array

My initial thought would be that the xk could be anything that is
hashable.  For dictionary-based discrete distributions, I do use
tuples, but I would like to start using scipy.stats.  Am I fishing
for too much, or in the wrong lake?
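
One workaround I can imagine, sketched under the assumption that rv_discrete only needs numeric xk: keep the tuples in a separate list and give rv_discrete their integer positions. The helpers index_of, pmf, and sample are hypothetical names of my own, not part of scipy.stats.

```python
import numpy as np
from scipy import stats

# Hashable outcomes and their probabilities.
outcomes = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [0.25] * 4

# Map each outcome to an integer index and build the distribution on indices.
index_of = {outcome: i for i, outcome in enumerate(outcomes)}
rv = stats.rv_discrete(name='test', values=(np.arange(len(outcomes)), probs))

def pmf(outcome):
    """Probability of a tuple outcome; 0.0 if it is not in the support."""
    i = index_of.get(outcome)
    return 0.0 if i is None else float(rv.pmf(i))

def sample(size, seed=None):
    """Draw `size` outcomes by sampling indices and mapping them back."""
    idx = np.atleast_1d(rv.rvs(size=size, random_state=seed))
    return [outcomes[int(i)] for i in idx]
```

This keeps the dictionary lookup I already use for the tuples, while the probabilities, sampling, and entropy go through scipy.stats.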

