# [SciPy-User] generic expectation operator

josef.pktd@gmai... josef.pktd@gmai...
Wed Aug 22 18:48:56 CDT 2012

On Wed, Aug 22, 2012 at 5:03 PM,  <josef.pktd@gmail.com> wrote:
> On Wed, Aug 22, 2012 at 4:49 PM, nicky van foreest <vanforeest@gmail.com> wrote:
>> Hi,
>>
>> For numerous purposes I need to compute the expectation of some
>> function with respect to some probability distribution.  i.e. E f(X) =
>> \int f(x) dG(x), where f is the function and G the distribution. When
>> G is continuous, this is simple, just use scipy.integrate.quad.
>> However, if G is discrete, things become more complicated since quad
>> often misses the spikes in the density. To deal with this problem, in
>> part, I came up with the following code, but it is not completely ok
>> yet, see below.
>>
>> #!/usr/bin/env python
>>
>> import scipy.stats as stats
>> import scipy.integrate
>> from math import sqrt
>>
>> def E(f, X):
>>     if hasattr(X,'dist'):
>>         if isinstance(X.dist, stats.rv_discrete):  # hint from Ralph
>>             g = X.pmf
>>         else:
>>             g = X.pdf
>>         return scipy.integrate.quad(lambda x: x*g(x), X.dist.a, X.dist.b)
>>     else:
>>         return sum(f(x)*p for x,p in zip(X.xk, X.pk))
>>
>> # now the following works:
>> X = stats.norm(3,sqrt(3))
>> print E(sqrt, X)

I don't know why this works, sqrt of a normal distributed random
variable has lots of nans (or complex numbers)

>>> f = lambda x: np.sqrt(np.maximum(x,0))
>>> stats.norm.expect(f, loc=3, scale=np.sqrt(3))
1.6394037171420484

>>> f = lambda x: np.sqrt(np.abs(x))
>>> stats.norm.expect(f, loc=3, scale=np.sqrt(3))
1.6708150951924068

>>
>> # this is ok too
>> grid = [50, 100, 150, 200, 250, 300]
>> probs=[2,3,2,1.,0.5,0.5]
>> tot = sum(probs)
>> probs = [p/tot for p in probs]
>> X =  stats.rv_discrete(values=(grid,probs), name='sizeDist')
>> print E(sqrt, X)
>>
>> # but this is not ok
>>
>> X = stats.poisson(3)
>> print E(sqrt, X)

>>> stats.poisson.expect(np.sqrt, args=(3,))
1.6309535375094115

Josef

>>
>> ------------------------------------------
>>
>> This is the output:
>>
>> 11.0383463182
>> (7.771634331185016e-19, 1.916327339868129e-17)
>> (2.9999999999999996, 1.1770368678494054e-08)
>>
>> So indeed, quad misses the spikes in the poisson distribution.
>>
>> Is there any nice way to resolve this? Moreover, is there a better way
>> to build the expectation operator? Perhaps, if all works, it would be
>> a useful add-on to distributions.py, or else I can include it in the
>> distribution tutorial, if other people also think this is useful.
>
> quad works only for continuous functions
>
> did you look at
>
>>>> stats.poisson.expect
> <bound method poisson_gen.expect of
> <scipy.stats.distributions.poisson_gen object at 0x0320FC90>>
>>>> stats.norm.expect
> <bound method norm_gen.expect of <scipy.stats.distributions.norm_gen
> object at 0x031EB890>>
>
> improvements welcome, especially making it more robust
>
> Josef
>
>>
>> thanks
>>
>> Nicky
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user