[Numpy-discussion] Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

Todd Miller jmiller at stsci.edu
Tue Jul 23 11:56:04 CDT 2002


Eric Maryniak wrote:

>Dear crunchers,
>
>According to the _Numpy_ manual for RandomArray.seed(x=0, y=0)
>(with /my/ emphasis):
>
>  The seed() function takes two integers and sets the two seeds
>  of the random number generator to those values. If the default
>  values of 0 are used for /both/ x and y, then a seed is generated
>  from the current time, providing a /pseudo-random/ seed.
>
>Note: in numarray, the RandomArray2 package is provided but it's
>description is not (yet) included in the numarray manual.
>
>I have some questions about this:
>
>1. The implementation of seed(), which is, by the way, identical
>   both in Numeric's RandomArray.py and numarray's RandomArray2.py
>   seems to contradict it's usage description:
>
The 2 in RandomArray2 is there to support side-by-side testing with 
Numeric, not to imply something new and improved.  The point of 
providing RandomArray2 is to provide a migration path for current 
Numeric users.  To that end, RandomArray2 should be functionally 
identical to RandomArray.  

That should not, however, discourage you from writing a new and improved 
random number package for numarray.

>
>
>---cut---
>def seed(x=0,y=0):
>    """seed(x, y), set the seed using the integers x, y;
>    Set a random one from clock if  y == 0
>    """
>    if type (x) != IntType or type (y) != IntType :
>        raise ArgumentError, "seed requires integer arguments."
>    if y == 0:
>        import time
>        t = time.time()
>        ndigits = int(math.log10(t))
>        base = 10**(ndigits/2)
>        x = int(t/base)
>        y = 1 + int(t%base)
>    ranlib.set_seeds(x,y)
>---cut---
>
>   Shouldn't the second 'if' be:
>
>    if x == 0 and y == 0:
>
>  With the current implementation:
>
>  - 'seed(3)' will actually use the clock for seeding
>  - it is impossible to specify 0's (0,0) as seed: it might be
>    better to use None as default values?
>
>2. With the current time.time() based default seeding, I wonder
>   if you can call that, from a mathematical point of view,
>   pseudo-random:
>
>---cut---
>$ python
>Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
>[GCC 2.95.3 20010315 (SuSE)] on linux2
>Type "help", "copyright", "credits" or "license" for more information.
>
>>>>from numarray import *
>>>>from RandomArray2 import *
>>>>import time
>>>>numarray.__version__
>>>>
>'0.3.5'
>
>>>>for i in range(5):
>>>>
>...     time.time()
>...     RandomArray2.seed()
>...     RandomArray2.get_seed()
>...     time.sleep(1)
>...     print
>...
>1027434978.406238
>(102743, 4979)
>
>1027434979.400319
>(102743, 4980)
>
>1027434980.400316
>(102743, 4981)
>
>1027434981.40031
>(102743, 4982)
>
>1027434982.400308
>(102743, 4983)
>---cut---
>
>   It is incremental, and if you use default seeding within
>   one (1) second, you get the same seed:
>
>---cut---
>
>>>>for i in range(5):
>>>>
>...     time.time()
>...     RandomArray2.seed()
>...     RandomArray2.get_seed()
>...     time.sleep(0.1)
>...     print
>...
>1027436537.066677
>(102743, 6538)
>
>1027436537.160303
>(102743, 6538)
>
>1027436537.260363
>(102743, 6538)
>
>1027436537.360299
>(102743, 6538)
>
>1027436537.460363
>(102743, 6538)
>---cut---
>
>3. I wonder what the design philosophy is behind the decision
>   to use 'mathematically suspect' seeding as default behavior.
>
Using time for a seed is fairly common.   Since it's an implementation 
detail,  I doubt anyone would object if you can suggest a better default 
seed.

>
>   Apart from the fact that it can hardly be called 'random', I also
>   have the following problems with it:
>
>   - The RandomArray2 module initializes with 'seed()' itself, too.
>     Reload()'s of RandomArray2, which might occur outside the
>     control of the user, will thus override explicit user's seeding.
>     Or am I seeing ghosts here?
>
Overriding a user's explicit seed as a result of a reload sounds correct 
to me.   All of the module's top level statements are re-executed during 
a reload.

>
>   - When doing repeated run's of one's neural net simulations that
>     each take less than a second, one will get identical streams of
>     random numbers, despite seed()'ing each time.
>     Not quite what you would expect or want.
>
This is easy enough to work around: don't seed or re-seed.     If you 
then need to make multiple simulation runs, make a separate module and 
call your simulation like:

import simulation

RandomArray2.seed(something_deterministic, something_else_deterministic)
for i in range(number_of_runs):
    simulation.main()

>
>   - From a purist software engineering point of view, I don't think
>     automagical default behavior is desirable: one wants programs to
>     be deterministic and produce reproducible behavior/output.
>
I don't know.  I think by default,  random numbers *should be* random.

>
>     If you use default seed()'ing now and re-run your program/model
>     later with identical parameters, you will get different output.
>
When you care about this, you need to set the seed to something 
deterministic.

>
>     In Eiffel, object attributes are always initialized, and you will
>     almost never have irreproducible runs. I found that this is a good
>     thing for reproducing ones bugs, too ;-)
>
This sounds like a good design principle, but I don't see anything in 
RandomArray2 which is keeping you from doing this now.

>
>To summarize, my recommendation would be to use None default arguments
>and use, when no user arguments are supplied, a hard (built-in) seed
>tuple, like (1,1) or whatever.
>
Unless there is a general outcry from the rest of the community,  I 
think the (existing) numarray extensions (RandomArray2, LinearAlgebra2, 
FFT2) should try to stay functionally identical with Numeric.

>
>Sometimes a paper on a random number generator suggests seeds (like 4357
>for the MersenneTwister), but of course, a good random number generator
>should behave well independently of the initial seed/seed-tuple.
>I may be completely mistaken here (I'm not an expert on random number
>theory), but the random number generators (Ahrens, et. al) seem 'old'?
>After some studying, we decided to use the Mersenne Twister:
>
An array enabled version might make a good add-on package for numarray.

>
>
>    http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html
>    http://www.math.keio.ac.jp/~matumoto/emt.html
>
>PDF article:
>
>    http://www.math.keio.ac.jp/~nisimura/random/doc/mt.pdf
>
>    M. Matsumoto and T. Nishimura,
>        "Mersenne Twister: A 623-dimensionally equidistributed uniform
>         pseudorandom number generator",
>         ACM Trans. on Modeling and Computer Simulation Vol. 8, No. 1,
>         January pp.3-30 1998
>
>There are some Python wrappers and it has good performance as well.
>
>Bye-bye,
>
>Eric
>
Bye,
Todd







More information about the Numpy-discussion mailing list