[SciPy-User] NumPy Binomial BTPE method Problem

josef.pktd@gmai... josef.pktd@gmai...
Wed Oct 3 14:59:05 CDT 2012


On Wed, Oct 3, 2012 at 3:07 PM,  <josef.pktd@gmail.com> wrote:
> On Wed, Oct 3, 2012 at 2:42 PM, Josh Lawrence <josh.k.lawrence@gmail.com> wrote:
>> Hey all,
>>
>> I received access to the paper and it seems it was originally based
>> purely on the paper written by Kachitvichyanukul in 1988. I still
>> think there's a whoopsies with the if ... else if ... else, block
>> though.
>
> the c code "else" looks strange to me,
> however, checking a few cases with large p*n for a large sample (1
> million draws), I don't see any difference of the frequency count to
> the theoretical distribution from scipy.binom.


I'm pretty sure you are right.
(If my reading as non c programmer is correct)
The else block means that Step 50 is never used, instead it uses Step
52, which uses a different approximation that is intended for the
tails.
If Step 52 is relatively close to the result of Step 50, then it will
not be very visible in the final results.
>From my reading of the code there should be a small distortion around the mean.

Josef

>
> (but with all the goto's I'm not sure if I really trigger that path.)
>
> Josef
>
>>
>> On Wed, Oct 3, 2012 at 11:54 AM, Josh Lawrence
>> <josh.k.lawrence@gmail.com> wrote:
>>> Hello all,
>>>
>>> I am implementing a binomial random variable in MATLAB. The default
>>> method in the statistics toolbox is extremely slow for large
>>> population/trial size. I am needing to do trials for n as large as
>>> 2**28. I found in NumPy some code that implements a binomial random
>>> draw in numpy/random/mtrand/distributions.c. I was trying to convert
>>> the code to MATLAB and the BTPE method seems to have an error in lines
>>> 337-341 of distributions.c. The if ... else if ... else statement I
>>> think is incorrect. I think it should be an if ... else ... statement
>>> followed by the contents of the original else which starts on line
>>> 337.
>>>
>>> The if ... else if ... else block is as follows:
>>>
>>> #### begin code snippet ####
>>> if (m < y)
>>> {
>>>     for (i=m; i<=y; i++)
>>>     {
>>>         F *= (a/i - s);
>>>     }
>>> }
>>> else if (m > y)
>>> {
>>>     for (i=y; i<=m; i++)
>>>     {
>>>         F /= (a/i - s);
>>>     }
>>> }
>>> else
>>> {
>>>     if (v > F) goto Step10;
>>>     goto Step60;
>>> }
>>> #### end code snippet ####
>>>
>>> From what I can tell, the variable F is only used in the comparison
>>> within the else{} statment (i.e. the if(v > F) statement) and nowhere
>>> else within the scope of the function.
>>>
>>> I also found a fortran implementation here:
>>> http://wstein.org/home/wstein/www/home/mhansen/spkgs_in_progress/octave-3.2.4/src/libcruft/ranlib/ignbin.f
>>> and it appears this is from where the code was originally adapted as
>>> the variable names are the same.
>>>
>>> My parsing of fortran GOTOs is a bit rusty, but I think the contents
>>> of the else block in above snippet should be not be conditional.
>>>
>>> I don't understand the underlying algorithm very well and don't have
>>> access the the BTPE paper, so I can't comment on the validity of the
>>> fortran code. There just seems to be an error in logic in the above
>>> code. So please have someone who understands it look at it. It appears
>>> Robert Kern wrote the function a decent portion of the file at some
>>> point in the past.
>>>
>>> I hope this helps.
>>>
>>> Cheers,
>>>
>>> --
>>> Josh Lawrence
>>>
>>>
>>> P.S. I apologize if my email is inconvenient, but I could not figure
>>> out how to tell gmail to set the reply-to field to be
>>> scipy-user@scipy.org.
>>
>>
>>
>> --
>> Josh Lawrence
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User@scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user


More information about the SciPy-User mailing list