[Numpy-discussion] Numarray: question on RandomArray2.seed(x=0, y=0) system clock default and possible bug

Eric Maryniak e.maryniak at pobox.com
Wed Jul 24 09:24:14 CDT 2002


On Tuesday 23 July 2002 22:15, paul at pfdubois.com wrote:
> RandomArray got a "special" position as part of Numeric simply by
> historical accident in being there first. I think in the conversion to
> Numarray we will be able to remove such things from the "core" and make
> more of a marketplace of equals for the "addons". As it is now there is
> some implication that somehow one is "better" than the other, which is
> unjustified either mathematically or in the sense of design.
>
> RNG's design is based on my experience with large codes needing many
> independent streams. The mathematics is from a well-tested Cray algorithm.
> I'm sure it could use fluffing up but a good case can be made for it.

A famous quote from Linus is "Nice idea. Now show me the code."

Perhaps a detailed example makes my problem clearer, because as it is now,
RNG and RandomArray2 are not orthogonal in design, in the sense that RNG's
default seed is fixed and RandomArray's is automagical (clock), not
reproducible and mathematically suspect, which I think is not good for the
more naive Python user.

Below I will give intended usage in a provocative way, but please don't
take me too seriously (I know, I don't ;-)

Let's say you have a master shell script that runs a neural net paradigm
(size 20x20) 10 times, each time with the same parameters, to see if it's
stable or chaotic, i.e. whether it fails to 'converge' and its outcome depends
on the initial values (it should not be chaotic, but this should always be
checked).

    run10.sh
       tracelink.py 20 20 inputpat.dat > hippocamp01.out
       ... 8 more ...
       tracelink.py 20 20 inputpat.dat > hippocamp10.out

    tracelink.py
       ... import numarray, RandomArray2 _or_ RNG ...
       # Case 1: RandomArray2
       # User uses default clock seed, which is the same
       # during 1 second (see my previous posting).
       # ignlgi(void)'s seeds 1234567890L,123456789L
       # are _not_ used (see com.c).
       RandomArray2.seed()
       # But if omitted, RandomArray2.py does it, too.
       ... calculations
       ... the runs differ in outcome _only_ if each run takes > 1 second,
       ... otherwise they will all have the same result.
       # Case 2: RNG
       # A 'standard_generator = CreateGenerator(-1)' is automatically done.
       #   seed < 0  ==>  Use the default initial seed value.
       #   seed = 0  ==>  Set a "random" value for the seed from system clock.
       #   seed > 0  ==>  Set seed directly (32 bits only).
       # Thus, the fixed seeds used are 0,0 (see Mixranf() in ranf.c).
       ... calculations
       ... all 10 programs have the same outcome when using ranf(),
       ... because it always starts from the same seed, the sequence is always:
       ... 0.58011364857958725, 0.95051273498076583, 0.78637142533060356 etc.
       
The problem with RandomArray's seed is that it is not truly random itself.
In its current (time.time based) implementation it increments linearly
every second, and therefore suffers from auto-correlation.
Moreover, in the above example, if 10 separate .py runs complete in 1 second
they'll all have the same seed (and outcome). This is not what the user,
if accustomed to clock seeding, would expect.
But if the seed is different each time, a problem is that runs are not
reproducible. Let's say that run hippocamp06.out produced some strange
output: now unless the user saved the seed (with get_seed), it can never
be reproduced.
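
The one-second collision can be sketched as follows. This is only an
illustration, not RandomArray2's code: the standard library's random.Random
stands in for the generator, clock_seed() is a hypothetical helper that
truncates a timestamp to whole seconds, and the three start times are made up.

```python
import random


def clock_seed(timestamp):
    """Hypothetical helper: a seed taken from the whole seconds of a timestamp."""
    return int(timestamp)


def first_draws(seed, n=3):
    """Return the first n values of a generator started from the given seed."""
    rng = random.Random(seed)  # stand-in for RandomArray2, not the real thing
    return [rng.random() for _ in range(n)]


# Two runs started 0.4 s apart, inside the same wall-clock second.
start_a = 1027520654.10
start_b = 1027520654.50
# A third run started in the following second.
start_c = 1027520655.20

run_a = first_draws(clock_seed(start_a))
run_b = first_draws(clock_seed(start_b))
run_c = first_draws(clock_seed(start_c))

print(run_a == run_b)  # same second, same seed, same sequence
print(run_a == run_c)  # next second, seed differs by exactly 1
```

Runs a and b are indistinguishable, while run c differs only because its seed
is one higher, which is the linear increment described above.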

Therefore, I think RNG's design is better and should be applied to
RandomArray2, too, because RandomArray2's seeding is flawed anyway.
A user should be aware of proper seeding, agreed, and now will be:
when doing multiple identical runs, the same (and thus reproducible)
output will result and so the user is made aware of the fact that,
as an example, he or she should seed or pickle it between runs.
So my suggestion would be to re-implement RandomArray2.seed(x=0,y=0)
as follows:

  for the x and y seeds:

    either seed < 0     ==>  Use the default initial seed values.
    either seed = None  ==>  Set a "random" value for the seeds from the system clock.
    both seeds >= 0     ==>  Set the seeds directly (32 bits only).

and, en passant, do a better job than clock-based seeding:

---cut---
def seed(x=None,y=None):
    """seed(x, y), set the seed using the integers x, y;
    ...
    """
    # Parentheses around the whole condition allow it to span two lines.
    if ((x is not None and type(x) != IntType) or
        (y is not None and type(y) != IntType)):
        raise ArgumentError, "seed requires integer arguments (or None)."
    if x is None or y is None:
        import dev_random_device  # uses /dev/random or equivalent
        x = dev_random_device.nextvalue()   # egd.sf.net is a user space
        y = dev_random_device.nextvalue()   # alternative
    elif x < 0 or y < 0:
        x = 1234567890L
        y = 123456789L
    ranlib.set_seeds(x,y)
---cut---
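
For readers who want to try the proposed semantics without the hypothetical
dev_random_device module or ranlib, here is a self-contained sketch in current
Python. It makes several assumptions of its own: os.urandom is swapped in as
the entropy source, the function is called resolve_seeds() and returns the
seed pair instead of calling ranlib.set_seeds(), and a TypeError replaces the
module's ArgumentError.

```python
import os
import struct

# The ignlgi() default seeds quoted earlier in this message.
DEFAULT_SEEDS = (1234567890, 123456789)


def resolve_seeds(x=None, y=None):
    """Return the (x, y) pair that the proposed seed(x, y) would install.

    None for either argument  -> both seeds drawn from the OS entropy pool
    a negative value in either -> the fixed default seeds
    otherwise                  -> the given seeds, unchanged
    """
    for value in (x, y):
        if value is not None and not isinstance(value, int):
            raise TypeError("seed requires integer arguments (or None).")
    if x is None or y is None:
        # 8 random bytes become two unsigned 32-bit little-endian integers.
        x, y = struct.unpack("<II", os.urandom(8))
    elif x < 0 or y < 0:
        x, y = DEFAULT_SEEDS
    return x, y


print(resolve_seeds(-1, -1))  # the fixed, reproducible defaults
print(resolve_seeds(3, 4))    # explicit seeds pass through unchanged
print(resolve_seeds())        # unpredictable, differs from run to run
```

Returning the pair keeps the sketch testable and makes it easy to log the
seeds of a run so that a strange result can be reproduced later.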

But: I realize that this is different behavior from Python's standard
random and whrandom, where no arg or None uses the clock. But, if that
behavior is kept for RandomArray2 (and RNG should then be adapted, too)
then I'd urge at least to use a better initial seed.
In certain applications, e.g. generating session IDs in crypto programs,
non-predictability of initial seeds is crucial. But if you have a look
at GPG's or OpenSSL's source for a PRNG (especially for Windows), it looks
like an art in itself. So perhaps RNG's 'clock code' should replace
RandomArray2's: it uses microseconds (in gettimeofday), too, and thus will
not have the 1-second problem.
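
A sketch of why the microsecond field helps, under stated assumptions: this is
not RNG's Mixranf() code, just an illustration that takes a gettimeofday-style
(seconds, microseconds) pair and uses it as the two seeds. The helper names
seeds_from_timeval() and current_timeval() are hypothetical.

```python
import time


def seeds_from_timeval(tv_sec, tv_usec):
    """Map a (seconds, microseconds) pair to an (x, y) pair of 32-bit seeds."""
    return tv_sec & 0xFFFFFFFF, tv_usec & 0xFFFFFFFF


def current_timeval():
    """Return the current time as (seconds, microseconds), like gettimeofday."""
    now_ns = time.time_ns()
    return now_ns // 1_000_000_000, (now_ns // 1_000) % 1_000_000


# Two runs started within the same second still get different seed pairs,
# because their microsecond fields differ.
seeds_a = seeds_from_timeval(1027520654, 100000)
seeds_b = seeds_from_timeval(1027520654, 500000)
print(seeds_a != seeds_b)

print(seeds_from_timeval(*current_timeval()))
```

Such seeds remain predictable to anyone who knows roughly when the program
started, so this addresses the collision problem only, not the crypto one.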

Bye-bye, Eric
-- 
Eric Maryniak <e.maryniak at pobox.com>
WWW homepage: http://pobox.com/~e.maryniak/
Mobile phone: +31 6 52047532, or (06) 520 475 32 in NL.

Just because you're not paranoid, that doesn't mean that they're not
after you.



