[SciPy-dev] nbinom.ppf

Robert Kern robert.kern@gmail....
Mon Jul 27 18:21:42 CDT 2009


On Mon, Jul 27, 2009 at 18:10, <josef.pktd@gmail.com> wrote:
> On Mon, Jul 27, 2009 at 6:20 PM, Pierre GM<pgmdevlist@gmail.com> wrote:
>>
>> On Jul 27, 2009, at 4:58 PM, Robert Kern wrote:
>>
>>> On Mon, Jul 27, 2009 at 15:29, <josef.pktd@gmail.com> wrote:
>>>> On Mon, Jul 27, 2009 at 3:55 PM, Robert Kern<robert.kern@gmail.com>
>>>> wrote:
>>>>>
>>>>> ceil() is correct; round() is not. round() would be okay if the only
>>>>> inputs are expected to be outputs of the CDF, but one frequently
>>>>> needs
>>>>> the PPF to take all values in [0,1], like for example doing random
>>>>> number generation via inversion.
>>
>> Fair enough, but a bit frustrating nevertheless in this particular case.
>>
>>>> Do you think it would be useful to round if we are epsilon (?) close
>>>> to the next integer? It is more likely that users have a case like
>>>> Pierre's where the answer might be the closest integer, instead of an
>>>> epsilon below. It would move the floating point error, but to an, in
>>>> actual usage, less likely location.
>>
>>
>>
>>>> I think, it is in scipy.special where I saw some code that treats
>>>> anything that is within 1e-8 (?) of an integer as the integer.
>>>
>>> I think the code after the first line is an attempt to remedy this
>>> situation:
>>>
>>>    def _ppf(self, q, n, pr):
>>>        vals = ceil(special.nbdtrik(q,n,pr))
>>>        vals1 = vals-1
>>>        temp = special.nbdtr(vals1,n,pr)
>>>        return where(temp >= q, vals1, vals)
>>>
>>> It is possible that "temp >= q" should be replaced with "temp >= q-
>>> eps".
>>
>> Doesn't work in the case I was presenting, as temp is here an array of
>> NaNs. Using
>>  >>> vals = ceil(special.nbdtrik(q,n,pr)-eps)
>> seems to work well enough, provided that eps is around 1e-8 as Josef
>> suggested.
>>
>
> I finally remembered, that in the test for the discrete distribution
> or for histogram with integers, I always use the 1e-8 correction, or
> 1e-14, depending on what is the more likely numerical error in the
> calculation. I'm usually not sure what the appropriate correction is.
>
> special.nbdtrik doesn't have any documentation, so I don't even know
> what it is supposed to do.

Just grep for it in the scipy/special sources. It's the inverse of
nbdtr, hence the "i", inverting for the "k" parameter given "n" and
"p".

The problem here is that we are using nbdtr() inside this function to
compute temp, but nbdtr doesn't handle n<1. The _cdf() method uses
betainc() instead. If one replaces these lines:

  vals1 = vals-1
  temp = special.nbdtr(vals1,n,pr)

with these:

  vals1 = (vals-1).clip(0.0, np.inf)
  temp = special.betainc(n,vals,pr)

then you get the desired answer.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


More information about the Scipy-dev mailing list