[Numpy-discussion] Documenting `zipf`

Robert Kern robert.kern@gmail....
Thu Jul 24 14:31:01 CDT 2008


On Thu, Jul 24, 2008 at 10:15, Stéfan van der Walt <stefan@sun.ac.za> wrote:
> Hi,
>
> Does anybody know how Zipf's law or Zipfian distributions work, and
> how they relate to NumPy's `np.random.zipf`?  I'm afraid I can't make
> head or tail of these results:
>
> In [106]: np.random.zipf(2, size=(10))
> Out[106]: array([ 1,  1,  1, 29,  1,  1,  1,  1,  1,  2])
>
> (8x1, 1x2, 1x29)
>
> In [107]: np.random.zipf(2, size=(10))
> Out[107]: array([75,  1,  1,  3,  1,  1,  1,  1,  1,  4])
>
> (7x1, 1x3, 1x4, 1x75)
>
> In [108]: np.random.zipf(2, size=(10))
> Out[108]: array([ 6, 17,  2,  1,  1,  2,  1, 20,  1,  2])
>
> (4x1, 3x2, 1x6, 1x17, 1x20)

With only 10 samples apiece, it's hard to evaluate what's going on.
zipf(s) samples from a Zipfian distribution with N=inf, in the
terminology of the Wikipedia article:

  http://en.wikipedia.org/wiki/Zipf%27s_law

It's a long-tailed distribution, so even in 10 samples you would expect
to see one or two big numbers with s=2. For example, here is the
survival function for the distribution (sf(x) = 1 - cdf(x)):

In [23]: import numpy as np

In [24]: def harmonic_number(s, k):
   ....:     # Generalized harmonic number: sum of 1/n**s for n = 1..k
   ....:     x = 1.0 / np.arange(1, k+1) ** s
   ....:     return x.sum()
   ....:

In [25]: from scipy.special import zeta

In [26]: def sf(x,s):
   ....:     return 1.0 - harmonic_number(s, int(x)) / zeta(s,1)
   ....:

In [27]: sf(10, 2.0)
Out[27]: 0.057854194645034718

In [28]: sf(20, 2.0)
Out[28]: 0.029649105042033996

In [29]: sf(60, 2.0)
Out[29]: 0.010048153098031198
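
To see that the sampler agrees with these numbers, one can compare the
survival function against a large batch of draws. The following is only
a sketch: it assumes the N=inf mass function p(k) = k**(-s) / zeta(s),
uses the newer np.random.default_rng interface rather than the
np.random.zipf call from the original question, and the helper name
empirical_sf, the seed, and the sample size are arbitrary choices made
for illustration.

```python
import numpy as np
from scipy.special import zeta


def sf(x, s):
    """Survival function P(X > x) for the Zipf distribution with N = inf."""
    k = np.arange(1, int(x) + 1)
    # Partial sum of k**(-s) divided by the full sum zeta(s) gives the cdf.
    return 1.0 - np.sum(1.0 / k ** s) / zeta(s, 1)


def empirical_sf(samples, x):
    """Fraction of samples strictly greater than x (hypothetical helper)."""
    return np.mean(samples > x)


rng = np.random.default_rng(0)  # seed picked arbitrarily for repeatability
samples = rng.zipf(2.0, size=200_000)

# Print the analytic and the empirical tail probability side by side.
for x in (10, 20, 60):
    print(x, sf(x, 2.0), empirical_sf(samples, x))
```

With sf(10, 2.0) near 0.058, a batch of 10 draws contains at least one
value above 10 with probability 1 - (1 - 0.058)**10, roughly 0.45, so
the occasional 29 or 75 in the quoted output is not surprising.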

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco

