[Numpy-discussion] Random number generators

Charles R Harris charlesr.harris at gmail.com
Mon Sep 4 10:30:04 CDT 2006


On 9/4/06, A. M. Archibald <peridot.faceted at gmail.com> wrote:
>
> On 04/09/06, Robert Kern <robert.kern at gmail.com> wrote:
> > Charles R Harris wrote:
> >
> > > What sort of api should this be? It occurs to me that there are
> already
> > > 4 sources of random bytes:
> > >
> > > Initialization:
> > >
> > > /dev/random (pseudo random, I think)
>
> /dev/random is (supposed to be) true randomness derived from various
> sources. It's also fed through a bunch of hashing and CRC functions
> chosen for convenience, but is probably pretty unbiased.
>
> > > /dev/urandom
> > > crypto system on windows
> > >
> > > Pseudo random generators:
> > >
> > > mtrand
> > >
> > > I suppose we could add some cryptologically secure source as well.
> That
> > > indicates to me that one set of random number generators would just be
> > > streams of random bytes, possibly in 4 byte chunks. If I were doing
> this
> > > for linux these would all look like file systems, FUSE comes to mind.
> > > Another set of functions would transform these into the different
> > > distributions. So, how much should stay in numpy? What sort of API are
> > > folks asking for?
>
> "Cryptographically secure random number generator" means something
> different to everybody, and implementing one that everyone is happy
> with is going to be a colossal pain. Look at Yarrow and its design
> documents for some idea of the complexities involved.
>
> However, a random number generator based on a stream cipher is
> probably going to be pretty good, if slow, so implementingone if those
> can't hurt. But I think leaving the possibility open that one could
> use custom sources of random bytes is a very good idea. There's no
> particular need that custom ones should be fast, as they're for use
> when you really want *good* random numbers.


I was thinking it might be nice to have one, just to generate seeds if
nothing else. This could actually be written in python and use something
like the python cryptography
toolkit<http://www.amk.ca/python/writing/pycrypt/pycrypt.html>,
no need to reinvent the wheel. It might be worth looking at that code just
to see how someone else thought through the interface. It's under the Python
license, so I need to know if that license is compatible with the BSD
license used for numpy.

> It would be good to also support quasi-random generators like Sobol and
> Halton
> > sequences. They tend to only generate floating point numbers in [0, 1]
> (or (0,
> > 1) or [0, 1) or (0, 1]; I think it varies). There are probably other
> PRNGs that
> > only output floating point numbers, but I doubt we want to implement any
> of
> > them. You can look at the 1.6 version of RandomKit for an implementation
> of
> > Sobol sequences and ISAAC, a cryptographic PRNG.
> >
> >    http://www.jeannot.org/~js/code/index.en.html
>
> Sobol' sequences may require special care in constructing non-uniform
> distributions in order to preserve their uniformity properties. If we
> can simply import ISAAC, it won't hurt, but I'd rather trust (say) AES
> in counter mode than some custom cryptosystem that hasn't been
> analyzed as thoroughly.
>
> > I'm thinking that the core struct that we're going to pass around will
> look
> > something like this:
> >
> > struct random_state {
> >    void* state;
> >    unsigned int (*gen_uint32)(struct random_state*);
> >    double (*gen_double)(struct random_state*);
> > }
> >
> > state -- A pointer to the generator-specific struct.
> >
> > gen_uint32 -- A function pointer that yields a 32-bit unsigned integer.
> Possibly
> > NULL if the generator can not implement such a generator.
> >
> > gen_double -- A function pointer that yields a double-precision number
> in [0, 1]
> > (possibly omitting one or both of the endpoints depending on the
> implementation).
>
> Are 32-bit numbers really the right least common denominator? There
> are plenty of 64-bit platforms out there...


MWC8222 runs faster (2x) on some 64 bit platforms because it multiplies two
32 bit numbers and uses the 64 bit product. The mtrand generator really uses
a polynomial with coefficients in z_2 and has 624 words worth, an even
number, so could probably be adapted to run directly with 64 bit registers
although the gain might not be much. Most other generators I know of produce
32 bit numbers. Not that I am an expert on the subject.

Given this API, implementing a subclassable class that exports it
> should satisfy most people's needs for interchangeable generators.




A. M. Archibald


Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://projects.scipy.org/pipermail/numpy-discussion/attachments/20060904/d8c7ba3f/attachment.html 


More information about the Numpy-discussion mailing list