[Numpy-tickets] [NumPy] #923: numpy.random.logseries - incorrect convergence for k=1, k=2

NumPy numpy-tickets@scipy....
Thu Oct 2 11:28:03 CDT 2008


#923: numpy.random.logseries - incorrect convergence for k=1, k=2
--------------------------+-------------------------------------------------
 Reporter:  josefpktd     |       Owner:  somebody
     Type:  defect        |      Status:  new     
 Priority:  normal        |   Milestone:          
Component:  numpy.random  |     Version:  none    
 Severity:  normal        |    Keywords:          
--------------------------+-------------------------------------------------
 random numbers generated by numpy.random.logseries do not converge to the
 theoretical distribution:

 Note: I checked with sample size 1 million, but numpy.random.logseries
 converges already for smaller sample sizes to the wrong values for k=1 and
 k=2.

 For probability paramater pr = 0.8, the random number generator converges
 to a
 frequency for k=1 at 39.8 %, while the theoretical probability mass is
 49.71%, k=2 is oversampled, other k's look ok

 {{{
 check frequency of k=1 and k=2 at N = 1000000
 0.398406 0.296465
 pmf at k = 1 and k=2 with formula
 [ 0.4971  0.1988]
 }}}

 For probability paramater pr = 0.3, the results are not as bad, but are
 still off:
 frequency for k=1 at 82.6 %, while the theoretical probability mass is
 84.11%

 {{{
 check frequency of k=1 and k=2 at N = 1000000
 0.826006 0.141244
 pmf at k = 1 and k=2 with formula
 [ 0.8411  0.1262]
 }}}

 below is a quick script for checking this

 Josef

 {{{
 import numpy as np
 from scipy import stats

 pr = 0.8
 np.set_printoptions(precision=2, suppress=True)

 # calculation for N=1million takes some time
 for N in [1000, 10000, 10000, 1000000]:
     rvsn=np.random.logseries(pr,size=N)
     fr=stats.itemfreq(rvsn)
     pmfs=stats.logser.pmf(fr[:,0],pr)*100
     print 'log series sample frequency and pmf (in %) with N = ', N
     print np.column_stack((fr[:,0],fr[:,1]*100.0/N,pmfs))

 np.set_printoptions(precision=4, suppress=True)

 print 'check frequency of k=1 and k=2 at N = ', N
 print np.sum(rvsn==1)/float(N),
 print np.sum(rvsn==2)/float(N)

 k = np.array([1,2])
 print 'pmf at k = 1 and k=2 with formula'
 print -pr**k * 1.0 / k / np.log(1-pr)
 }}}

-- 
Ticket URL: <http://scipy.org/scipy/numpy/ticket/923>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.


More information about the Numpy-tickets mailing list