[NumPy-Tickets] [NumPy] #1450: Patch with Ziggurat method for Normal distribution
NumPy Trac
numpy-tickets@scipy....
Fri Apr 9 21:30:38 CDT 2010
#1450: Patch with Ziggurat method for Normal distribution
---------------------------------+------------------------------------------
Reporter: ilan | Owner: somebody
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: numpy.random | Version: 1.4.0
Keywords: normal distribution |
---------------------------------+------------------------------------------
Comment(by charris):
We've had better ziggurat methods than those in Doornik for some years
now, but there has been little push to get them into numpy.
{{{
Hi there.
I've completed timing and robustness tests that should
allow apples to apples comparisons of the mtrand normal distribution
possibilities. The short story goes like this:
0) One of the 'Harris' ziggurats, zig7, passes all TestU01 crush tests
when running atop each of five PRNGs.
1) Depending on how fast the underlying PRNG is, zig7
outperforms Box_Muller by ~1.3X to ~8.5X.
2) Using gcc, mwc8222+zig7 outperforms mt19937+Box_Muller by 7X.
See you both at SciPy.
Bruce
And here are the details:
In what follows the uniform variate generators are:
lcg64
mwc8222
mt19937
mt19937_64
yarn5
And the normal distribution codes are:
trng - default normal distribution code in TRNG
boxm - Box-Muller, mtrand lookalike, remembers/uses 2nd value
zig7 - a 'Harris' ziggurat indexed by 7 bits
zig8 - a 'Harris' ziggurat indexed by 8 bits
zig9 - a 'Harris' ziggurat indexed by 9 bits
Here are the numbers in more detail:
# Timings from icc -O2 running on a 2.4 GHz Core 2
lcg64 trng: 6.52459e+06 ops per second
lcg64 boxm: 2.18453e+07 ops per second
lcg64 zig7: 1.80616e+08 ops per second
lcg64 zig8: 2.01865e+08 ops per second
lcg64 zig9: 2.05156e+08 ops per second
mwc8222 trng: 6.52459e+06 ops per second
mwc8222 boxm: 2.08787e+07 ops per second
mwc8222 zig7: 9.44663e+07 ops per second
mwc8222 zig8: 1.05326e+08 ops per second
mwc8222 zig9: 1.03478e+08 ops per second
mt19937 trng: 6.41112e+06 ops per second
mt19937 boxm: 1.64986e+07 ops per second
mt19937 zig7: 4.23762e+07 ops per second
mt19937 zig8: 4.52623e+07 ops per second
mt19937 zig9: 4.52623e+07 ops per second
mt19937_64 trng: 6.42509e+06 ops per second
mt19937_64 boxm: 1.93226e+07 ops per second
mt19937_64 zig7: 5.8762e+07 ops per second
mt19937_64 zig8: 6.17213e+07 ops per second
mt19937_64 zig9: 6.29146e+07 ops per second
yarn5 trng: 5.95781e+06 ops per second
yarn5 boxm: 1.19156e+07 ops per second
yarn5 zig7: 1.48945e+07 ops per second
yarn5 zig8: 1.54809e+07 ops per second
yarn5 zig9: 1.53201e+07 ops per second
# Timings from g++ -O2 running on a 2.4 GHz Core 2
lcg64 trng: 6.72163e+06 ops per second
lcg64 boxm: 1.50465e+07 ops per second
lcg64 zig7: 1.31072e+08 ops per second
lcg64 zig8: 1.48383e+08 ops per second
lcg64 zig9: 1.6036e+08 ops per second
mwc8222 trng: 6.64215e+06 ops per second
mwc8222 boxm: 1.44299e+07 ops per second
mwc8222 zig7: 8.903e+07 ops per second
mwc8222 zig8: 1.00825e+08 ops per second
mwc8222 zig9: 1.03478e+08 ops per second
mt19937 trng: 6.52459e+06 ops per second
mt19937 boxm: 1.28223e+07 ops per second
mt19937 zig7: 5.00116e+07 ops per second
mt19937 zig8: 5.41123e+07 ops per second
mt19937 zig9: 5.47083e+07 ops per second
mt19937_64 trng: 6.58285e+06 ops per second
mt19937_64 boxm: 1.42988e+07 ops per second
mt19937_64 zig7: 6.72164e+07 ops per second
mt19937_64 zig8: 7.39591e+07 ops per second
mt19937_64 zig9: 7.46022e+07 ops per second
yarn5 trng: 6.25144e+06 ops per second
yarn5 boxm: 8.93672e+06 ops per second
yarn5 zig7: 1.50465e+07 ops per second
yarn5 zig8: 1.57496e+07 ops per second
yarn5 zig9: 1.56038e+07 ops per second
}}}
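For readers unfamiliar with the methods being compared: the 'boxm' code above is described as an mtrand lookalike that remembers and reuses the second value of each generated pair. A minimal polar-form Box-Muller sketch of that caching behavior (illustrative only; the class name `BoxMullerGaussian` is made up here, and this is not the actual mtrand or benchmark code) looks like:

```python
import math
import random

class BoxMullerGaussian:
    """Polar-form Box-Muller that generates normals in pairs and caches
    the second value for the next call (a sketch of the 'boxm' behavior
    described above, not the benchmarked implementation)."""

    def __init__(self, rng=None):
        self.rng = rng if rng is not None else random.Random()
        self._cached = None

    def normal(self):
        # Return the remembered second value, if we have one
        if self._cached is not None:
            z, self._cached = self._cached, None
            return z
        # Draw a point uniformly inside the unit circle (rejection step)
        while True:
            u = 2.0 * self.rng.random() - 1.0
            v = 2.0 * self.rng.random() - 1.0
            s = u * u + v * v
            if 0.0 < s < 1.0:
                break
        # Transform into two independent N(0, 1) deviates; keep one for later
        factor = math.sqrt(-2.0 * math.log(s) / s)
        self._cached = v * factor
        return u * factor

gen = BoxMullerGaussian(random.Random(42))
vals = [gen.normal() for _ in range(100_000)]
mean = sum(vals) / len(vals)
var = sum((x - mean) ** 2 for x in vals) / len(vals)
```

The caching is why Box-Muller's cost is effectively amortized over two calls, yet the tables above still show it trailing the ziggurats by a wide margin.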
You should contact Bruce Carneal <bcarneal@gmail.com> and see if you can
get his code. It is in C++ but can probably be translated to C without
too much trouble. If we are going in the ziggurat direction, it would be
nice to implement the exponential distribution as well. As you can see,
the overall speedup also depends on the underlying RNG, so if you can
figure out a way of making choices available there, that will also help.
Of course, for small sets of values, call overhead will dominate
everything.
Thanks for looking into this problem; all it needs is someone to take it
in hand and push it through.
--
Ticket URL: <http://projects.scipy.org/numpy/ticket/1450#comment:1>
NumPy <http://projects.scipy.org/numpy>