[SciPy-User] scipy.stats convolution of two distributions

josef.pktd@gmai... josef.pktd@gmai...
Mon Apr 9 17:36:30 CDT 2012


On Mon, Apr 9, 2012 at 6:06 PM,  <josef.pktd@gmail.com> wrote:
> On Mon, Apr 9, 2012 at 5:04 PM, nicky van foreest <vanforeest@gmail.com> wrote:
>> Hi,
>>
>> In one of my projects I built some code that depends in a nice and
>> generic way on the methods of rv_continuous in scipy.stats. Now it
>> turns out that I shot myself in the foot because I need to run a test
>> with the sum (convolution) of three such distributions. As far as I
>> can see there is no standard way to achieve this in scipy.stats. Does
>> anybody know of a good (generic) way to do this? If not, would it
>> actually be useful to add such functionality to scipy.stats, if
>> possible at all?
>>
>> After some struggling I wrote the code below, but this is a first
>> attempt. I am very interested in obtaining feedback to turn this into
>> something that is useful for a larger population than just me.
>
> Sounds fun.
>
> The plots are a bit misleading, because matplotlib is doing the
> interpolation for you
>
> pl.plot(grid,D1.cdf(grid), '.')
> pl.plot(grid,conv.cdf(grid), '.')
> pl.plot(grid,conv2.cdf(grid), '.')
>
> The main problem is the mixing of continuous distribution and discrete
> grid. pdf, cdf, .... want to evaluate at a point and with floating
> points it's perilous (as we discussed before).
>
> As I read it, neither your cdf nor pdf return specific points.
>
> I think the easiest would be to work with linear interpolation
> interp1d on a fine grid, but I never checked how fast this is for a
> fine grid.
> If cdf and pdf are defined with a piecewise polynomial, then they can
> be evaluated at any points and the generic class, rv_continuous,
> should be able to handle everything.
> The alternative would be to work with the finite grid and use the
> fixed spacing to define a lattice distribution, but that doesn't exist
> in scipy.
>
> I haven't thought about the convolution itself yet, (an alternative
> would be using fft to work with the characteristic function.)
>
> If you use this for a test, how much do you care about having accurate
> tails, or is effectively truncating the distribution ok?

The other question I thought about in a similar situation is, what is
the usage or access pattern.

For many cases, I'm not really interested in evaluating the pdf at
specific points, but over a range or interval of points. In this case
relying on floating point access doesn't look like the best way to go,
and I spend more time on the `expect` method. Calculating expectation
of a function w.r.t the distribution that can use the internal
representation instead of the generic integrate.quad.

Josef


>
> Josef
>
>>
>> Thanks in advance
>>
>> Nicky
>>
>> import numpy as np
>> import scipy.stats
>> from scipy.stats import poisson, uniform, expon
>> import pylab as pl
>>
>> # I need a grid since np.convolve requires two arrays.
>>
>> # choose the grid such that it covers the numerically relevant support
>> # of the distributions
>> grid = np.arange(0., 5., 0.001)
>>
>> # I need P(D1+D2+D3 <= x)
>> D1 = expon(scale = 1./2)
>> D2 = expon(scale = 1./3)
>> D3 = expon(scale = 1./6)
>>
>> class convolved_gen(scipy.stats.rv_continuous):
>>    def __init__(self, D1, D2, grid):
>>        self.D1 = D1
>>        self.D2 = D2
>>        delta = grid[1]-grid[0]
>>        p1 = self.D1.pdf(grid)
>>        p2 = self.D2.pdf(grid)*delta
>>        self.conv = np.convolve(p1, p2)
>>
>>        super(convolved_gen, self).__init__(name = "convolved")
>>
>>    def _cdf(self, grid):
>>        cdf = np.cumsum(self.conv)
>>        return cdf/cdf[-1] # ensure that cdf[-1] = 1
>>
>>    def _pdf(self, grid):
>>        return self.conv[:len(grid)]
>>
>>    def _stats(self):
>>        m = self.D1.stats("m") + self.D2.stats("m")
>>        v = self.D1.stats("v") + self.D2.stats("v")
>>        return m, v, 0., 0.
>>
>>
>>
>> convolved = convolved_gen(D1, D2, grid)
>> conv = convolved()
>>
>> convolved2 = convolved_gen(conv, D3, grid)
>> conv2 = convolved2()
>>
>> pl.plot(grid,D1.cdf(grid))
>> pl.plot(grid,conv.cdf(grid))
>> pl.plot(grid,conv2.cdf(grid))
>> pl.show()
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user


More information about the SciPy-User mailing list