[SciPy-user] Sparse Random Variables
Tom Johnson
tjhnson@gmail....
Wed Oct 17 14:31:31 CDT 2007
Hi,
I have some stats questions.
1) Please excuse my ignorance here, but how does one use rv_discrete
without initializing it with the 'values' keyword? Also, if I use
values=((1,2),(.3,.7)), then xk and pk are both defined, so it
seems strange (and presumably slower) that pmf() should bother going
through the cdf to compute the probability.
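One pattern that appears to be intended for the no-'values' case is to subclass rv_discrete and override _pmf. The sketch below is only an illustration under that assumption: the class name TwoPoint and the support bounds a=1, b=2 are made up here, and the same two-point distribution is also built with 'values' so the two can be compared.

```python
import numpy as np
from scipy import stats


class TwoPoint(stats.rv_discrete):
    """Hypothetical two-point distribution defined through _pmf only."""

    def _pmf(self, k):
        # P(X=1) = 0.3, P(X=2) = 0.7, zero elsewhere.
        return np.where(k == 1, 0.3, np.where(k == 2, 0.7, 0.0))


# a and b give the lower and upper end of the support.
two_point = TwoPoint(a=1, b=2, name='two_point')

# The same distribution, built from explicit (xk, pk) pairs.
from_values = stats.rv_discrete(name='from_values',
                                values=((1, 2), (0.3, 0.7)))

pmf_subclass = two_point.pmf([1, 2])
pmf_values = from_values.pmf([1, 2])
print(pmf_subclass, pmf_values)
```

Both objects should report the same probabilities; whether pmf() takes a detour through the cdf in either case depends on the scipy version in use.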
2) Suppose I want to store a log distribution. Is this easily achievable?
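If "log distribution" is read as keeping log-probabilities rather than probabilities (an assumption about the intent), one possible sketch is to normalize in log space and query the distribution through logpmf(). The weights below are illustrative values only, and logsumexp comes from scipy.special in current releases.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

xk = np.arange(4)

# Unnormalized log-weights (made-up numbers for illustration).
log_weights = np.log([1.0, 2.0, 3.0, 4.0])

# Normalize in log space so that exp(log_pk) sums to 1.
log_pk = log_weights - logsumexp(log_weights)

# rv_discrete itself stores plain probabilities, so convert back once.
dist = stats.rv_discrete(name='from_logs', values=(xk, np.exp(log_pk)))

# logpmf() returns the log-probabilities again.
log_probs = dist.logpmf(xk)
print(log_probs)
```

This keeps the log values in a separate array; the distribution object is still parameterized by ordinary probabilities.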
3) I didn't do extensive tests, but it seemed like _entropy() was
usually faster than entropy(), even when the distribution had 1e6
possible values. Is there a reason that default calls to entropy use
the vectorized function? It seems like most use cases will involve
random variables with far fewer than 1e6 values...but perhaps not.
4) Also, for some reason entropy() doesn't always work on the first try...
>>> from scipy import *
>>> x = 1e3
>>> v = rand(x)
>>> v = v/sum(v)
>>> a = stats.rv_discrete(name='test', values=(range(x), v))
>>> a.entropy()
>>> a.entropy()
The first call to entropy() raises an error. The second works. The
problem seems to be with:
/home/me/lib/python/scipy/stats/distributions.py in entropy(self, *args, **kwds)
-> 3794 place(output,cond0,self.vecentropy(*goodargs))
/home/me/lib/python/numpy/lib/function_base.py in __call__(self, *args)
940
941 if self.nout == 1:
--> 942 _res = array(self.ufunc(*args),copy=False).astype(self.otypes[0])
943 else:
944 _res = tuple([array(x,copy=False).astype(c) \
<type 'exceptions.TypeError'>: function not supported for these types,
and can't coerce safely to supported types
5) I really need to have random variables where the xk are tuples of
the same type (integers xor floats xor strings ...)
p( (0,0) ) = .25
p( (0,1) ) = .25
p( (1,0) ) = .25
p( (1,1) ) = .25
but
a = stats.rv_discrete(name='test', values=(((0,0),(0,1),(1,0),(1,1)), [.25]*4))
yields
/home/me/lib/python/numpy/core/fromnumeric.py in take(a, indices, axis, out, mode)
79 except AttributeError:
80 return _wrapit(a, 'take', indices, axis, out, mode)
---> 81 return take(indices, axis, out, mode)
82
83
<type 'exceptions.IndexError'>: index out of range for array
My initial thought would be that the xk could be anything that is
hashable. For my dictionary-based discrete distributions, I do use
tuples...but I would like to start using scipy.stats. Am I fishing
for too much or in the wrong lake?
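To make the request concrete, here is a minimal sketch of the dictionary-based form next to one possible workaround: give rv_discrete integer indices and translate to and from the tuples outside of it. The names index_of, indexed and p are hypothetical helpers for illustration, not part of scipy.stats.

```python
import numpy as np
from scipy import stats

outcomes = [(0, 0), (0, 1), (1, 0), (1, 1)]
probs = [0.25] * 4

# Dictionary-based distribution: any hashable outcome works as a key.
pmf_dict = dict(zip(outcomes, probs))

# Workaround sketch: rv_discrete only sees the integer positions 0..3.
index_of = {outcome: i for i, outcome in enumerate(outcomes)}
indexed = stats.rv_discrete(name='indexed',
                            values=(np.arange(len(outcomes)), probs))


def p(outcome):
    """Probability of a tuple outcome, looked up via its integer index."""
    return float(indexed.pmf(index_of[outcome]))


# Draw integer indices and map them back to the tuple outcomes.
samples = [outcomes[int(i)] for i in indexed.rvs(size=5)]
print(p((0, 1)), samples)
```

The mapping has to be carried around separately, which is exactly the bookkeeping that hashable xk would avoid.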
Thanks.