[Numpy-discussion] Documenting `zipf`
Robert Kern
robert.kern@gmail....
Thu Jul 24 14:31:01 CDT 2008
On Thu, Jul 24, 2008 at 10:15, Stéfan van der Walt <stefan@sun.ac.za> wrote:
> Hi,
>
> Does anybody know how Zipf's law or Zipfian distributions work, and
> how they relate to NumPy's `np.random.zipf`? I'm afraid I can't make
> head or tail of these results:
>
> In [106]: np.random.zipf(2, size=(10))
> Out[106]: array([ 1, 1, 1, 29, 1, 1, 1, 1, 1, 2])
>
> (8x1, 1x2, 1x29)
>
> In [107]: np.random.zipf(2, size=(10))
> Out[107]: array([75, 1, 1, 3, 1, 1, 1, 1, 1, 4])
>
> (7x1, 1x3, 1x4, 1x75)
>
> In [108]: np.random.zipf(2, size=(10))
> Out[108]: array([ 6, 17, 2, 1, 1, 2, 1, 20, 1, 2])
>
> (4x1, 3x2, 1x6, 1x17, 1x20)
With only 10 samples apiece, it's hard to evaluate what's going on.
zipf(s) samples from a Zipfian distribution with N=inf, using the
terminology of the Wikipedia article:
http://en.wikipedia.org/wiki/Zipf%27s_law
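To make the N=inf case concrete, here is a minimal sketch of the probability mass function, P(X = k) = k**-s / zeta(s). The name zipf_pmf is hypothetical (it is not a NumPy function), and the default normalizer assumes s = 2, where zeta(2) = pi**2 / 6 is known in closed form; for any other s the caller would have to supply zeta(s).

```python
import math

def zipf_pmf(k, s=2.0, zeta_s=math.pi ** 2 / 6):
    """P(X = k) = k**-s / zeta(s).

    Illustrative helper only. The default zeta_s is zeta(2), so it is
    valid only together with the default s = 2.0.
    """
    return k ** -s / zeta_s

# Probability of the first few outcomes when s = 2.
for k in (1, 2, 3, 10):
    print(k, zipf_pmf(k))
```

Under these assumptions P(X = 1) is 6 / pi**2, about 0.61, which is in line with the quoted samples being mostly 1s.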
It's a long-tailed distribution, so you would expect to see one or two
big numbers with s=2. For example, here is the survival function for
the distribution, sf(x) = 1 - cdf(x):
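In [23]: from numpy import *
In [24]: def harmonic_number(s, k):
   ....:     # generalized harmonic number: sum of 1/n**s for n = 1..k
   ....:     x = 1.0 / arange(1, k+1) ** s
   ....:     return x.sum()
   ....:
In [25]: from scipy.special import zeta
In [26]: def sf(x, s):
   ....:     # P(X > x); zeta(s, 1) normalizes the N=inf distribution
   ....:     return 1.0 - harmonic_number(s, int(x)) / zeta(s, 1)
   ....: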
In [27]: sf(10, 2.0)
Out[27]: 0.057854194645034718
In [28]: sf(20, 2.0)
Out[28]: 0.029649105042033996
In [29]: sf(60, 2.0)
Out[29]: 0.010048153098031198
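As a rough cross-check of the tail probabilities above, one can draw many more samples and compare the empirical tail fractions with sf. This is only a sketch: it assumes a NumPy version that provides np.random.default_rng, the seed is arbitrary, and it hard-codes s = 2 so that zeta(2) = pi**2 / 6 can stand in for scipy.special.zeta. The name sf_s2 is made up for this example.

```python
import numpy as np

S = 2.0
ZETA_2 = np.pi ** 2 / 6  # closed form for zeta(2); only valid for s = 2

def sf_s2(x):
    """Analytic P(X > x) for the s = 2 case."""
    k = np.arange(1, int(x) + 1)
    return 1.0 - np.sum(1.0 / k ** S) / ZETA_2

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
samples = rng.zipf(S, size=1_000_000)

# Compare analytic and empirical tail probabilities.
for x in (10, 20, 60):
    empirical = np.mean(samples > x)
    print(f"x={x:3d}  analytic={sf_s2(x):.4f}  empirical={empirical:.4f}")

# Chance that at least one of 10 independent draws exceeds 10.
p_any_large = 1.0 - (1.0 - sf_s2(10)) ** 10
print(f"P(at least one of 10 draws > 10) = {p_any_large:.3f}")
```

The last figure comes out at roughly 45%, so a batch of 10 draws containing a value like 29 or 75 is not unusual for s=2.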
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco