[Numpy-discussion] Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug
Todd Miller
jmiller at stsci.edu
Tue Jul 23 11:56:04 CDT 2002
Eric Maryniak wrote:
>Dear crunchers,
>
>According to the _Numpy_ manual for RandomArray.seed(x=0, y=0)
>(with /my/ emphasis):
>
> The seed() function takes two integers and sets the two seeds
> of the random number generator to those values. If the default
> values of 0 are used for /both/ x and y, then a seed is generated
> from the current time, providing a /pseudo-random/ seed.
>
>Note: in numarray, the RandomArray2 package is provided but it's
>description is not (yet) included in the numarray manual.
>
>I have some questions about this:
>
>1. The implementation of seed(), which is, by the way, identical
> both in Numeric's RandomArray.py and numarray's RandomArray2.py
> seems to contradict it's usage description:
>
The 2 in RandomArray2 is there to support side-by-side testing with
Numeric, not to imply something new and improved. The point of
providing RandomArray2 is to provide a migration path for current
Numeric users. To that end, RandomArray2 should be functionally
identical to RandomArray.
That should not, however, discourage you from writing a new and improved
random number package for numarray.
>
>
>---cut---
>def seed(x=0,y=0):
> """seed(x, y), set the seed using the integers x, y;
> Set a random one from clock if y == 0
> """
> if type (x) != IntType or type (y) != IntType :
> raise ArgumentError, "seed requires integer arguments."
> if y == 0:
> import time
> t = time.time()
> ndigits = int(math.log10(t))
> base = 10**(ndigits/2)
> x = int(t/base)
> y = 1 + int(t%base)
> ranlib.set_seeds(x,y)
>---cut---
>
> Shouldn't the second 'if' be:
>
> if x == 0 and y == 0:
>
> With the current implementation:
>
> - 'seed(3)' will actually use the clock for seeding
> - it is impossible to specify 0's (0,0) as seed: it might be
> better to use None as default values?
>
>2. With the current time.time() based default seeding, I wonder
> if you can call that, from a mathematical point of view,
> pseudo-random:
>
>---cut---
>$ python
>Python 2.2.1 (#1, Jun 25 2002, 20:45:02)
>[GCC 2.95.3 20010315 (SuSE)] on linux2
>Type "help", "copyright", "credits" or "license" for more information.
>
>>>>from numarray import *
>>>>from RandomArray2 import *
>>>>import time
>>>>numarray.__version__
>>>>
>'0.3.5'
>
>>>>for i in range(5):
>>>>
>... time.time()
>... RandomArray2.seed()
>... RandomArray2.get_seed()
>... time.sleep(1)
>... print
>...
>1027434978.406238
>(102743, 4979)
>
>1027434979.400319
>(102743, 4980)
>
>1027434980.400316
>(102743, 4981)
>
>1027434981.40031
>(102743, 4982)
>
>1027434982.400308
>(102743, 4983)
>---cut---
>
> It is incremental, and if you use default seeding within
> one (1) second, you get the same seed:
>
>---cut---
>
>>>>for i in range(5):
>>>>
>... time.time()
>... RandomArray2.seed()
>... RandomArray2.get_seed()
>... time.sleep(0.1)
>... print
>...
>1027436537.066677
>(102743, 6538)
>
>1027436537.160303
>(102743, 6538)
>
>1027436537.260363
>(102743, 6538)
>
>1027436537.360299
>(102743, 6538)
>
>1027436537.460363
>(102743, 6538)
>---cut---
>
>3. I wonder what the design philosophy is behind the decision
> to use 'mathematically suspect' seeding as default behavior.
>
Using time for a seed is fairly common. Since it's an implementation
detail, I doubt anyone would object if you can suggest a better default
seed.
>
> Apart from the fact that it can hardly be called 'random', I also
> have the following problems with it:
>
> - The RandomArray2 module initializes with 'seed()' itself, too.
> Reload()'s of RandomArray2, which might occur outside the
> control of the user, will thus override explicit user's seeding.
> Or am I seeing ghosts here?
>
Overriding a user's explicit seed as a result of a reload sounds correct
to me. All of the module's top level statements are re-executed during
a reload.
>
> - When doing repeated run's of one's neural net simulations that
> each take less than a second, one will get identical streams of
> random numbers, despite seed()'ing each time.
> Not quite what you would expect or want.
>
This is easy enough to work around: don't seed or re-seed. If you
then need to make multiple simulation runs, make a separate module and
call your simulation like:
import simulation
RandomArray2.seed(something_deterministic, something_else_deterministic)
for i in range(number_of_runs):
simulation.main()
>
> - From a purist software engineering point of view, I don't think
> automagical default behavior is desirable: one wants programs to
> be deterministic and produce reproducible behavior/output.
>
I don't know. I think by default, random numbers *should be* random.
>
> If you use default seed()'ing now and re-run your program/model
> later with identical parameters, you will get different output.
>
When you care about this, you need to set the seed to something
deterministic.
>
> In Eiffel, object attributes are always initialized, and you will
> almost never have irreproducible runs. I found that this is a good
> thing for reproducing ones bugs, too ;-)
>
This sounds like a good design principle, but I don't see anything in
RandomArray2 which is keeping you from doing this now.
>
>To summarize, my recommendation would be to use None default arguments
>and use, when no user arguments are supplied, a hard (built-in) seed
>tuple, like (1,1) or whatever.
>
Unless there is a general outcry from the rest of the community, I
think the (existing) numarray extensions (RandomArray2, LinearAlgebra2,
FFT2) should try to stay functionally identical with Numeric.
>
>Sometimes a paper on a random number generator suggests seeds (like 4357
>for the MersenneTwister), but of course, a good random number generator
>should behave well independently of the initial seed/seed-tuple.
>I may be completely mistaken here (I'm not an expert on random number
>theory), but the random number generators (Ahrens, et. al) seem 'old'?
>After some studying, we decided to use the Mersenne Twister:
>
An array enabled version might make a good add-on package for numarray.
>
>
> http://www-personal.engin.umich.edu/~wagnerr/MersenneTwister.html
> http://www.math.keio.ac.jp/~matumoto/emt.html
>
>PDF article:
>
> http://www.math.keio.ac.jp/~nisimura/random/doc/mt.pdf
>
> M. Matsumoto and T. Nishimura,
> "Mersenne Twister: A 623-dimensionally equidistributed uniform
> pseudorandom number generator",
> ACM Trans. on Modeling and Computer Simulation Vol. 8, No. 1,
> January pp.3-30 1998
>
>There are some Python wrappers and it has good performance as well.
>
>Bye-bye,
>
>Eric
>
Bye,
Todd
More information about the Numpy-discussion
mailing list