[Numpy-discussion] Multivariate hypergeometric distribution?
Skipper Seabold
jsseabold@gmail....
Mon Jul 2 21:31:43 CDT 2012
On Mon, Jul 2, 2012 at 9:35 PM, <josef.pktd@gmail.com> wrote:
> On Mon, Jul 2, 2012 at 8:08 PM, <josef.pktd@gmail.com> wrote:
> > On Mon, Jul 2, 2012 at 4:16 PM, Fernando Perez <fperez.net@gmail.com> wrote:
> >> Hi all,
> >>
> >> in recent work with a colleague, the need came up for a multivariate
> >> hypergeometric sampler; I had a look in the numpy code and saw we have
> >> the bivariate version, but not the multivariate one.
> >>
> >> I had a look at the code in scipy.stats.distributions, and it doesn't
> >> look too difficult to add a proper multivariate hypergeometric by
> >> extending the bivariate code, with one important caveat: the hard part
> >> is the implementation of the actual discrete hypergeometric sampler,
> >> which lives inside of numpy/random/mtrand/distributions.c:
> >>
> >>
> >> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L743
> >>
> >> That code is hand-written C, and it only works for the bivariate case
> >> right now. It doesn't look terribly difficult to extend, but it will
> >> certainly take a bit of care and testing to ensure all edge cases are
> >> handled correctly.
> >
> > My only foray into this was:
> >
> > http://projects.scipy.org/numpy/ticket/921
> > http://projects.scipy.org/numpy/ticket/923
> >
> > This looks difficult to add without a good reference and clear
> > description of the algorithm.
> >
> >>
> >> Does anyone happen to have that implemented lying around, in a form
> >> that would be easy to merge to add this capability to numpy?
> >
> > Not me; I have never even heard of the multivariate hypergeometric
> > distribution.
> >
> >
> > Maybe http://hal.inria.fr/docs/00/11/00/56/PDF/perm.pdf (p. 11),
> > with some properties at
> > http://www.math.uah.edu/stat/urn/MultiHypergeometric.html
> >
> > I've seen one other algorithm that seems to need N random variables
> > (N being the number of draws in the hypergeometric) for a single
> > multivariate hypergeometric draw, which seems slow to me.
> >
> > But maybe someone has it lying around.
>
> I now have a pure NumPy/SciPy/Python version.
>
> It took a bit more than an hour, so no guarantees, but the sampled
> frequencies and the pmf look close enough.
I could be wrong, but I think PyMC has sampling and likelihood.
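
For reference, one standard way to get a multivariate draw out of the
existing bivariate sampler is to go through the colors one at a time,
treating "this color" as good and "everything not yet visited" as bad.
The sketch below is only an illustration of that idea. It is not the
PyMC code and not the version mentioned above; the function name and
its arguments are made up for this example, and it has not been tuned
for speed.

```python
import numpy as np


def multivariate_hypergeometric(colors, nsample, rng=None):
    """Hypothetical helper: draw nsample items without replacement from an
    urn holding colors[i] items of color i, and return the count per color.
    """
    # Fall back to numpy's global sampler; a RandomState also works here.
    rng = np.random if rng is None else rng
    colors = np.asarray(colors, dtype=np.int64)
    if colors.ndim != 1 or colors.size == 0 or np.any(colors < 0):
        raise ValueError("colors must be a non-empty 1-d array of counts >= 0")
    if nsample < 0 or nsample > colors.sum():
        raise ValueError("nsample must lie between 0 and colors.sum()")

    counts = np.zeros_like(colors)
    remaining_items = int(colors.sum())   # items in colors not yet visited
    remaining_draws = int(nsample)        # draws not yet assigned to a color

    for i, ngood in enumerate(colors[:-1]):
        if remaining_draws == 0:
            break
        ngood = int(ngood)
        remaining_items -= ngood          # now counts only the later colors
        if ngood == 0:
            k = 0                         # nothing of this color to draw
        elif remaining_items == 0:
            k = remaining_draws           # only this color is left
        else:
            # Conditional bivariate draw: this color vs. all later colors.
            k = int(rng.hypergeometric(ngood, remaining_items, remaining_draws))
        counts[i] = k
        remaining_draws -= k

    # Whatever has not been assigned must come from the last color.
    counts[-1] = remaining_draws
    return counts
```

This needs at most one bivariate draw per color rather than one random
number per item drawn, e.g. multivariate_hypergeometric([5, 3, 0, 7], 6)
returns four counts that sum to 6.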
Skipper