[SciPy-dev] nbinom.ppf

Robert Kern robert.kern@gmail....
Mon Jul 27 15:58:44 CDT 2009


On Mon, Jul 27, 2009 at 15:29, <josef.pktd@gmail.com> wrote:
> On Mon, Jul 27, 2009 at 3:55 PM, Robert Kern<robert.kern@gmail.com> wrote:
>> On Mon, Jul 27, 2009 at 14:47, Pierre GM<pgmdevlist@gmail.com> wrote:
>>> All,
>>> I'm puzzled by this result:
>>>  >>> from scipy.stats.distributions import nbinom
>>>  >>>  nbinom(.3,.15).ppf(nbinom(.3,.15).cdf(np.arange(20)))
>>> array([  1.,   2.,   2.,   3.,   5.,   5.,   6.,   8.,   9.,  10.,  11.,
>>>         12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.])
>>>
>>> I would have naturally expected np.arange(20).
>>
>> Floating point shenanigans. The CDF and PPF of discrete distributions
>> are step functions. Round-tripping involves evaluating the PPF around
>> the step. Naturally, floating point errors are going to nudge you to
>> the left or right of that step.
>>
>>> Using np.round instead of np.ceil in nbinom_gen._ppf seems to solve
>>> the issue.
>>> Is there any reason not to do it ?
>>
>> ceil() is correct; round() is not. round() would be okay if the only
>> inputs are expected to be outputs of the CDF, but one frequently needs
>> the PPF to take all values in [0,1], like for example doing random
>> number generation via inversion.
>
> Do you think it would be useful to round if we are epsilon (?) close
> to the next integer? It is more likely that users have a case like
> Pierre's where the answer might be the closest integer, instead of an
> epsilon below. It would move the floating point error, but to an, in
> actual usage, less likely location.
>
> I think, it is in scipy.special where I saw some code that treats
> anything that is within 1e-8 (?) of an integer as the integer.

I think the code after the first line is an attempt to remedy this situation:

    def _ppf(self, q, n, pr):
        vals = ceil(special.nbdtrik(q,n,pr))
        vals1 = vals-1
        temp = special.nbdtr(vals1,n,pr)
        return where(temp >= q, vals1, vals)

It is possible that "temp >= q" should be replaced with "temp >= q-eps".

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


More information about the Scipy-dev mailing list