[Numpy-discussion] Random number generators
Robert Kern
robert.kern at gmail.com
Mon Sep 4 03:31:02 CDT 2006
Charles R Harris wrote:
> What sort of api should this be? It occurs to me that there are already
> 4 sources of random bytes:
>
> Initialization:
>
> /dev/random (pseudo random, I think)
> /dev/urandom
> crypto system on windows
>
> Pseudo random generators:
>
> mtrand
>
> I suppose we could add some cryptologically secure source as well. That
> indicates to me that one set of random number generators would just be
> streams of random bytes, possibly in 4 byte chunks. If I were doing this
> for linux these would all look like file systems, FUSE comes to mind.
> Another set of functions would transform these into the different
> distributions. So, how much should stay in numpy? What sort of API are
> folks asking for?
It would be good to also support quasi-random generators like Sobol and Halton
sequences. They tend to only generate floating point numbers in [0, 1] (or (0,
1) or [0, 1) or (0, 1]; I think it varies). There are probably other PRNGs that
only output floating point numbers, but I doubt we want to implement any of
them. You can look at the 1.6 version of RandomKit for an implementation of
Sobol sequences and ISAAC, a cryptographic PRNG.
http://www.jeannot.org/~js/code/index.en.html
I'm thinking that the core struct that we're going to pass around will look
something like this:
struct random_state {
void* state;
unsigned int (*gen_uint32)(struct random_state*);
double (*gen_double)(struct random_state*);
}
state -- A pointer to the generator-specific struct.
gen_uint32 -- A function pointer that yields a 32-bit unsigned integer. Possibly
NULL if the generator can not implement such a generator.
gen_double -- A function pointer that yields a double-precision number in [0, 1]
(possibly omitting one or both of the endpoints depending on the implementation).
Possibly this struct needs to be expanded to include function pointers for
destroying the state struct and for a copying the state to a new object. The
seeding function(s) probably don't need to travel in the struct. We should
determine if C++ will work for us, here. Unfortunately, that will force all
extension modules that want to use this API to be C++ modules.
One other feature of the current implementation of state struct is that a few of
the distributions cache information between calls. Notably, the Box-Müller
Gaussian distribution algorithm generates two normal deviates at a time; one is
returned, the other is cached until the next call. The binomial distribution
computes some intermediate values that are reused if the next call has the same
parameters. We should probably store such things separately from the core state
struct this time.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
More information about the Numpy-discussion
mailing list