[SciPy-dev] nbinom.ppf

josef.pktd@gmai... josef.pktd@gmai...
Mon Jul 27 20:11:17 CDT 2009


On Mon, Jul 27, 2009 at 7:21 PM, Robert Kern<robert.kern@gmail.com> wrote:
> On Mon, Jul 27, 2009 at 18:10, <josef.pktd@gmail.com> wrote:
>> On Mon, Jul 27, 2009 at 6:20 PM, Pierre GM<pgmdevlist@gmail.com> wrote:
>>>
>>> On Jul 27, 2009, at 4:58 PM, Robert Kern wrote:
>>>
>>>> On Mon, Jul 27, 2009 at 15:29, <josef.pktd@gmail.com> wrote:
>>>>> On Mon, Jul 27, 2009 at 3:55 PM, Robert Kern<robert.kern@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> ceil() is correct; round() is not. round() would be okay if the only
>>>>>> inputs are expected to be outputs of the CDF, but one frequently
>>>>>> needs
>>>>>> the PPF to take all values in [0,1], like for example doing random
>>>>>> number generation via inversion.
>>>
>>> Fair enough, but a bit frustrating nevertheless in this particular case.
>>>
>>>>> Do you think it would be useful to round if we are epsilon (?) close
>>>>> to the next integer? It is more likely that users have a case like
>>>>> Pierre's where the answer might be the closest integer, instead of an
>>>>> epsilon below. It would move the floating point error, but to an, in
>>>>> actual usage, less likely location.
>>>
>>>
>>>
>>>>> I think, it is in scipy.special where I saw some code that treats
>>>>> anything that is within 1e-8 (?) of an integer as the integer.
>>>>
>>>> I think the code after the first line is an attempt to remedy this
>>>> situation:
>>>>
>>>>    def _ppf(self, q, n, pr):
>>>>        vals = ceil(special.nbdtrik(q,n,pr))
>>>>        vals1 = vals-1
>>>>        temp = special.nbdtr(vals1,n,pr)
>>>>        return where(temp >= q, vals1, vals)
>>>>
>>>> It is possible that "temp >= q" should be replaced with "temp >= q-
>>>> eps".
>>>
>>> Doesn't work in the case I was presenting, as temp is here an array of
>>> NaNs. Using
>>>  >>> vals = ceil(special.nbdtrik(q,n,pr)-eps)
>>> seems to work well enough, provided that eps is around 1e-8 as Josef
>>> suggested.
>>>
>>
>> I finally remembered, that in the test for the discrete distribution
>> or for histogram with integers, I always use the 1e-8 correction, or
>> 1e-14, depending on what is the more likely numerical error in the
>> calculation. I'm usually not sure what the appropriate correction is.
>>
>> special.nbdtrik doesn't have any documentation, so I don't even know
>> what it is supposed to do.
>
> Just grep for it in the scipy/special sources. It's the inverse of
> nbdtr, hence the "i", inverting for the "k" parameter given "n" and
> "p".
>
> The problem here is that we are using nbdtr() inside this function to
> compute temp, but nbdtr doesn't handle n<1. The _cdf() method uses
> betainc() instead. If one replaces these lines:
>
>  vals1 = vals-1
>  temp = special.nbdtr(vals1,n,pr)
>
> with these:
>
>  vals1 = (vals-1).clip(0.0, np.inf)
>  temp = special.betainc(n,vals,pr)
>
> then you get the desired answer.

That's better. It took me a while to understand the logic behind the
way the ceiling error is corrected. The same pattern is also followed
by the other discrete distributions that define a _ppf method. It is
cleaner then the epsilon correction, but takes longer to figure out
what it does.

To understand the logic more easily and to be DRY, it would be better
to replace the duplication of the _cdf method directly with a call to
self._cdf.
For example, in changeset 4673, Robert, you changed the _cdf method to
use betainc instead of nbdtr, but not the _ppf method. Without the
code duplication, partial corrections could be more easily avoided.

Is there a reason not to call self._cdf instead?



A temporary workaround would be to add, in this case, at least 1e-10
to the cdf in the roundtrip to avoid the floating point ceiling error.

>>> nbinom(.3,.15).ppf(nbinom(.3,.15).cdf(np.arange(20))+1e-8)
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.])
>>> nbinom(.3,.15).ppf(nbinom(.3,.15).cdf(np.arange(20))+1e-10)
array([  1.,   2.,   3.,   4.,   5.,   5.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.])

>>> nbinom(.3,.15).ppf(nbinom(.3,.15).cdf(np.arange(20))+1e-11)
array([  1.,   2.,   2.,   4.,   5.,   5.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.])


Josef

>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>


More information about the Scipy-dev mailing list