[SciPy-User] scipy.stats: Sampling from an arbitrary probability distribution
Christoph Deil
Deil.Christoph@googlemail....
Tue Jun 5 04:05:39 CDT 2012
On Jun 4, 2012, at 3:21 PM, Sturla Molden wrote:
> On 03.06.2012 13:20, Daniel Sabinasz wrote:
>> Hi all,
>>
>> I need to sample a random number from a distribution whose probability
>> density function I specify myself. Is that possible using scipy.stats?
>
> Sampling a general distribution is typically an MCMC problem, which can
> be solved with, e.g., the Metropolis-Hastings sampler.
>
> http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm
>
> Because of its recursive nature, a Markov chain like this is better
> written in Cython, or you can use NumPy to run multiple chains in
> parallel. (It depends on how many samples you need, of course; anything
> below a million should be fast enough in Python.)
>
> You might also take a look at PyMCMC:
> https://github.com/rdenham/pymcmc
>
>
> Sturla
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
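For reference, the Metropolis-Hastings approach mentioned above can be sketched in pure Python roughly as follows. This is only a minimal illustration, not code from PyMCMC or scipy: the function name metropolis_hastings, the Gaussian random-walk proposal, and the step and burn_in values are assumptions made for the example, and the target is the same unnormalized x * x shape on [-1, 1] used in the ROOT example.

```python
import random


def metropolis_hastings(pdf, x0, n_samples, step=0.5, burn_in=1000, rng=None):
    """Sample an unnormalized 1D pdf with a Gaussian random-walk proposal.

    Hypothetical helper for illustration; step and burn_in are untuned guesses.
    """
    rng = rng or random.Random()
    x, px = x0, pdf(x0)
    if px <= 0:
        raise ValueError("pdf(x0) must be positive")
    samples = []
    for i in range(burn_in + n_samples):
        y = rng.gauss(x, step)      # propose a move around the current point
        py = pdf(y)
        # The proposal is symmetric, so accept with probability min(1, py / px).
        if rng.random() * px < py:
            x, px = y, py
        if i >= burn_in:            # discard the initial burn-in steps
            samples.append(x)
    return samples


def my_pdf(x):
    # Unnormalized density: x^2 on [-1, 1], zero elsewhere.
    return x * x if -1.0 <= x <= 1.0 else 0.0


samples = metropolis_hastings(my_pdf, x0=0.5, n_samples=20000,
                              rng=random.Random(0))
```

Note that consecutive samples from a chain like this are correlated, so it usually takes more draws than an independent sampler to reach the same accuracy.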
If you are willing to look outside of scipy, there are nice methods to generate random numbers from arbitrary distributions in ROOT, a C++ physics data analysis package with Python bindings (PyROOT):
http://root.cern.ch/drupal/content/pyroot
import ROOT
# Define the function and limits you want:
# TF1::TF1(const char* name, const char* formula, Double_t xmin = 0, Double_t xmax = 1)
f = ROOT.TF1("my_pdf", "x * x / 10.", -1, 1)
# Generate 100 random numbers from that distribution
[f.GetRandom() for _ in range(100)]
You can sample from an arbitrary 2D distribution as well:
# TF2::TF2(const char* name, const char* formula, Double_t xmin = 0, Double_t xmax = 1, Double_t ymin = 0, Double_t ymax = 1)
f2 = ROOT.TF2("my_pdf2", "x * x / 10. + pow(y, 4)", -1, 1, 3, 4)
# x and y are passed by reference; GetRandom2 fills them in place
x, y = ROOT.Double(), ROOT.Double()
f2.GetRandom2(x, y)
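Without ROOT, a pdf on a bounded 2D box can also be sampled by plain rejection sampling. The sketch below is an assumption-laden illustration, not what TF2::GetRandom2 does internally: the helper name rejection_sample_2d is hypothetical, and pdf_max must be a known upper bound of the pdf on the box (here worked out by hand for the formula above, which is largest at |x| = 1, y = 4).

```python
import random


def rejection_sample_2d(pdf, xlim, ylim, pdf_max, rng=random):
    """Draw one (x, y) pair from an unnormalized pdf on a rectangular box.

    Hypothetical helper; pdf_max must bound pdf from above on the box.
    """
    while True:
        x = rng.uniform(*xlim)
        y = rng.uniform(*ylim)
        # Keep the point with probability pdf(x, y) / pdf_max.
        if rng.random() * pdf_max <= pdf(x, y):
            return x, y


def my_pdf2(x, y):
    # Same shape as the TF2 formula above.
    return x * x / 10.0 + y ** 4


# Upper bound on [-1, 1] x [3, 4]: reached at |x| = 1, y = 4.
pdf_max = 1.0 / 10.0 + 4.0 ** 4

rng = random.Random(0)
points = [rejection_sample_2d(my_pdf2, (-1.0, 1.0), (3.0, 4.0), pdf_max, rng)
          for _ in range(5000)]
```

Rejection sampling is simple but gets slow when the pdf is sharply peaked, because most proposed points are thrown away.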
If you only want a histogram of the values, not the array itself, you can avoid the Python call overhead by letting ROOT fill the histogram directly:
# TH1D::TH1D(const char* name, const char* title, Int_t nbinsx, Double_t xlow, Double_t xup)
h = ROOT.TH1D("my_hist", "my_hist", 1000, -1, 1)
# void TH1::FillRandom(const char* fname, Int_t ntimes = 5000)
In [49]: %timeit h.FillRandom("my_pdf", int(1e6))
10 loops, best of 3: 171 ms per loop
In [48]: %timeit [f.GetRandom() for _ in range(int(1e6))]
1 loops, best of 3: 2.62 s per loop
Here you can see the method used (parabolic approximations):
http://root.cern.ch/root/html/src/TF1.cxx.html#gYdi6C
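The general idea behind this kind of sampler (tabulate the cumulative integral of the pdf on a grid, then invert it for a uniform random number) can be sketched in pure Python as below. This is a simplified illustration and not ROOT's code: it interpolates linearly within each bin rather than using parabolic approximations, and the names make_sampler and n_bins are assumptions made for the example.

```python
import bisect
import random


def make_sampler(pdf, xmin, xmax, n_bins=1000):
    """Tabulate the CDF on a grid and return a function that inverts it.

    Hypothetical helper; uses the trapezoid rule and linear interpolation.
    """
    dx = (xmax - xmin) / n_bins
    xs = [xmin + i * dx for i in range(n_bins + 1)]
    cdf = [0.0]
    for a, b in zip(xs[:-1], xs[1:]):
        cdf.append(cdf[-1] + 0.5 * (pdf(a) + pdf(b)) * dx)
    total = cdf[-1]
    cdf = [c / total for c in cdf]  # normalize so the last entry is 1.0

    def sample(rng=random):
        u = rng.random()
        # Find the bin i with cdf[i] <= u < cdf[i + 1].
        i = min(bisect.bisect_right(cdf, u) - 1, n_bins - 1)
        width = cdf[i + 1] - cdf[i]
        frac = (u - cdf[i]) / width if width > 0 else 0.5
        return xs[i] + frac * dx

    return sample


sample = make_sampler(lambda x: x * x, -1.0, 1.0)
rng = random.Random(1)
draws = [sample(rng) for _ in range(20000)]
```

The table is built once, so each draw afterwards costs only a binary search, and the draws are independent of each other.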
Even if most users don't want to install ROOT, it might be worth comparing the accuracy / speed to the method in scipy.
ROOT also contains the UNURAN package, which implements several methods to sample from arbitrary one- or multi-dimensional distributions.
http://root.cern.ch/root/html/MATH_UNURAN_Index.html
http://statmath.wu.ac.at/unuran/
Unfortunately it is GPL-licensed and, as far as I know, doesn't have Python bindings of its own.
Christoph