[SciPy-dev] Fwd: tests and partial patch for scipy.stats.distributions

Per.Brodtkorb@f... Per.Brodtkorb@f...
Wed Oct 1 06:14:34 CDT 2008

I have looked into the bugs in the patch of the discrete distributions you sent. The error in the _ppf method of dlaplace can be corrected by replacing the method with:

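class dlaplace_gen(rv_discrete):
    def _ppf(self, q, a):
        # const equals cdf(0): q below it maps to k <= 0, otherwise to k >= 0
        const = 1.0/(1+exp(-a))
        cons2 = 1+exp(a)
        ind = q < const
        # invert the cdf on each branch, then round up to the next integer
        return ceil(where(ind, log(q*cons2)/a-1, -log((1-q)*cons2)/a))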
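
As a rough sanity check of the replacement above (a standalone sketch, not part of the patch), the same formula can be restated with explicit numpy imports and compared against a closed-form cdf. The cdf below is an assumption derived from the pmf tanh(a/2)*exp(-a*|k|), and the names dlaplace_cdf and dlaplace_ppf are made up for this illustration. The check is that ppf(q) is the smallest integer k with cdf(k) >= q.

```python
import numpy as np

def dlaplace_cdf(k, a):
    # Assumed closed form, obtained by summing pmf(k) = tanh(a/2) * exp(-a*|k|).
    k = np.floor(np.asarray(k, dtype=float))
    return np.where(k >= 0,
                    1.0 - np.exp(-a * k) / (np.exp(a) + 1.0),
                    np.exp(a * (k + 1.0)) / (np.exp(a) + 1.0))

def dlaplace_ppf(q, a):
    # Same formula as the proposed _ppf, written as a free function.
    q = np.asarray(q, dtype=float)
    const = 1.0 / (1.0 + np.exp(-a))   # equals cdf(0)
    cons2 = 1.0 + np.exp(a)
    return np.ceil(np.where(q < const,
                            np.log(q * cons2) / a - 1.0,
                            -np.log((1.0 - q) * cons2) / a))

a = 0.8
q = np.linspace(0.01, 0.99, 99)
k = dlaplace_ppf(q, a)
# ppf(q) must be the smallest integer whose cdf reaches q.
ok = np.all(dlaplace_cdf(k, a) >= q) and np.all(dlaplace_cdf(k - 1, a) < q)
print(ok)
```

Values of q that coincide exactly with a cdf value can still land on the wrong side of the ceil because of rounding, so a production version would probably want an extra correction step.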

The tests written for the generic moment fail for those distributions that set the limits of the support (self.a and/or self.b) in the self._argcheck method. I think the generic_moment method is meant to be private to the self._munp method, since it does only limited input checks.
Perhaps it would be better to rename generic_moment to _generic_moment.
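
The failure mode can be illustrated with a toy class. This is purely a sketch and not scipy's rv_discrete; ToyRandint and all of its methods are hypothetical. The upper support limit self.b is only set inside _argcheck, so calling the moment helper directly sums over a stale support, while the public entry point gives the expected value.

```python
class ToyRandint:
    """Toy stand-in (not scipy code): uniform on {0, ..., n-1}."""

    def __init__(self):
        self.a = 0
        self.b = 0  # placeholder until _argcheck has seen the shape parameter

    def _argcheck(self, n):
        # Side effect: the support limit depends on the shape parameter.
        self.b = n - 1
        return n > 0

    def _pmf(self, k, n):
        return 1.0 / n

    def _generic_moment(self, order, n):
        # Trusts self.a and self.b; does no input checking of its own.
        return sum(k ** order * self._pmf(k, n)
                   for k in range(self.a, self.b + 1))

    def moment(self, order, n):
        # Public entry point: validate first, which also sets the support.
        if not self._argcheck(n):
            raise ValueError("invalid shape parameter")
        return self._generic_moment(order, n)


d = ToyRandint()
stale = d._generic_moment(1, 5)   # support is still {0}
checked = d.moment(1, 5)          # support is {0, ..., 4}
print(stale, checked)
```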

Per A.

-----Original Message-----
From: scipy-dev-bounces@scipy.org [mailto:scipy-dev-bounces@scipy.org] On behalf of josef.pktd@gmail.com
Sent: 29 September 2008 14:51
To: scipy-dev@scipy.org
Subject: [SciPy-dev] Fwd: tests and partial patch for scipy.stats.distributions

removed patch because of file size, mail bounced

---------- Forwarded message ----------
From: josef.pktd@gmail.com
Date: Mon, 29 Sep 2008 00:00:25 -0400
Subject: tests and partial patch for scipy.stats.distributions
To: scipy-dev@scipy.org

To help in bug hunting and patching of the discrete distributions in
scipy.stats, I wrote several test scripts that could be incorporated
in scipy tests. The attachment contains the corrected
scipy.stats.distributions.py file and 2 test files.

The tests check all discrete distributions at predefined (not randomly
chosen) values. Some distributions give wrong results for other
ranges of parameters.

To try out the patch, it is enough to back up the current file
and copy the patched version in its place.

Right now, I still left many comments (and some commented out failed
attempts for fixing some persistent bugs). In direct comparison with
the current trunk file this might help in understanding what the
underlying bug is.

Since there are a large number of bug fixes in this patch, if any of
them get incorporated in scipy, then they can be selectively copied.
Many bugs only show up once other bugs are fixed.

In terms of importance and sequence the bug fix categories are roughly:

* nin correction for vectorize with *args: without this, some methods
raise exceptions, e.g. _ppf (I think after many shaky attempts, I
found the correct solution)
* _ppf did not handle infinite support properly and requires a .tolist
in call to ``place`` (no idea why, but this works)
* _drv2_moment(self, n, *args) was completely broken, it didn't even
have a return and is the base for generic moment calculations
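
To make the infinite-support item concrete, here is a minimal sketch of a generic ppf that does not rely on a finite upper bound. It is not the code from the patch or from scipy; generic_ppf and poisson_cdf are hypothetical helpers written for this illustration. It grows the search bracket geometrically until the cdf reaches q and then bisects on the integers.

```python
import math

def generic_ppf(cdf, q, lower=0):
    """Smallest integer k >= lower with cdf(k) >= q (support unbounded above)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if cdf(lower) >= q:
        return lower
    # Expand the bracket instead of starting from a fixed upper limit.
    lo, step = lower, 1
    hi = lower + step
    while cdf(hi) < q:
        lo = hi
        step *= 2
        hi += step
    # Invariant: cdf(lo) < q <= cdf(hi); bisect on the integers.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if cdf(mid) >= q:
            hi = mid
        else:
            lo = mid
    return hi

def poisson_cdf(k, mu=3.0):
    # Plain summation of the Poisson pmf, used only as an example cdf.
    return sum(math.exp(-mu) * mu ** i / math.factorial(i)
               for i in range(k + 1))

median = generic_ppf(poisson_cdf, 0.5)
print(median)
```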

individual fixes:
* recipinvgauss_gen: rvs added, didn't work before
* def moment: for the 2nd moment it returned the mean (both discrete and
continuous distributions)
* randint: -1 missing in _ppf, definition differs from the one in the description pdf
* dlaplace: _ppf gives wrong results in tests, replace with generic calculation
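
For the randint item, a small sketch of the off-by-one, assuming randint(low, high) is uniform on the integers low, ..., high-1. The helper names are hypothetical and the functions are scalar-only, unlike the vectorized methods in distributions.py.

```python
import math

def randint_cdf(k, low, high):
    # Uniform on low, ..., high-1 (assumed definition).
    k = math.floor(k)
    if k < low:
        return 0.0
    if k >= high - 1:
        return 1.0
    return (k - low + 1) / (high - low)

def randint_ppf(q, low, high):
    # With the -1, the result is the smallest k with cdf(k) >= q.
    return math.ceil(q * (high - low) + low) - 1

def randint_ppf_without_correction(q, low, high):
    # Without the -1, the result is shifted by one and can leave the support.
    return math.ceil(q * (high - low) + low)

quantiles = [randint_ppf(q, 0, 5) for q in (0.1, 0.3, 0.5, 0.7, 0.9)]
print(quantiles)
```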

(Note: the first vectorize nin correction in continuous distributions
is still at the wrong spot. I was only looking at discrete
distributions in the last few days, since I figured out how vectorize
nin works)

If you want a cleaner patch or more information, I can provide it next
week. Because I never used numpy/scipy heavily, it took me a long
time to figure out what is going on, but it should be more obvious to
an expert. Overall, my impression is that using features in
scipy.stats.distributions that are not commonly used is pretty risky:
until recently they had no stringent tests, and they still lack
reasonably large test coverage. For serious work, I would want to
make sure that the results are actually correct.

There are still several things that are not covered, e.g. handling of
loc and scale, other methods, correctness over full range of
parameters. I haven't checked the continuous distributions since the
initial fuzz tests, but the test coverage for all methods looks pretty


Test for Basic Properties

The first test just tests for basic properties and their consistency,
e.g. cdf, pmf, ppf, stats and moments, and internal methods such as
_ppf. The tests check quite a bit for internal consistency (private
methods) since I used the tests for debugging and trying out fixes.

>python C:\Programs\Python25\Scripts\nosetests-script.py  -s test_discrete_basic.py

with current trunk, I get for this test:

Ran 112 tests in 0.140s
FAILED (errors=41, failures=15)

after the bugfixes, I get:

Ran 112 tests in 8.672s
FAILED (failures=5)

some failures in binom, dlaplace, zipf are still remaining

Chisquare Test

the second test is a chisquare test for the random variables to be
close to the theoretical distribution as defined by .cdf
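
A minimal sketch of this kind of check, not the actual test_ndiscrete.py: it draws integers with numpy.random, bins them into cells, and compares the observed counts with the expected counts implied by a cdf using scipy.stats.chisquare. The function name, the choice of cells and the sample size are assumptions made for the illustration.

```python
import numpy as np
from scipy import stats

def chisquare_against_cdf(sample, cdf, lo, hi):
    """Chi-square p-value for integer draws against a discrete cdf.

    Cells are {k <= lo}, {lo+1}, ..., {hi}, {k > hi}; lo and hi should be
    chosen so that every cell has a reasonable expected count.
    """
    sample = np.asarray(sample)
    edges = np.arange(lo, hi + 1)
    # Cell probabilities are differences of the cdf at the cell boundaries.
    probs = np.diff(np.concatenate(([0.0], cdf(edges), [1.0])))
    # Index of the cell each draw falls into.
    cell = np.searchsorted(edges, sample, side="left")
    observed = np.bincount(cell, minlength=len(probs))
    return stats.chisquare(observed, probs * len(sample))[1]

rng = np.random.default_rng(0)
sample = rng.poisson(3.0, size=2000)
# Draws compared with the matching cdf, and with a deliberately wrong one.
pval_same = chisquare_against_cdf(
    sample, lambda k: stats.poisson.cdf(k, 3.0), 1, 6)
pval_other = chisquare_against_cdf(
    sample, lambda k: stats.poisson.cdf(k, 5.0), 1, 6)
print(pval_same, pval_other)
```

A test would then assert that the p-value against the matching cdf exceeds some alpha, accepting that such a test fails by chance with probability alpha unless the seed is fixed.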

with current trunk I get for this test:

>python C:\Programs\Python25\Scripts\nosetests-script.py  -s test_ndiscrete.py

Ran 12 tests in 0.453s
FAILED (errors=5)

after patching scipy.stats.distributions, the only test failure left is
in logser, where I think the numpy.random random numbers are wrong

>python C:\Programs\Python25\Scripts\nosetests-script.py  -s test_ndiscrete.py

bernoulli. binom. boltzmann. dlaplace. geom. hypergeom. logserF nbinom. planck.
poisson. randint. zipf.
FAIL: test_ndiscrete.test_discrete_rvscdf('logser', (0.59999999999999998,))
Traceback (most recent call last):
  File "c:\programs\python25\lib\site-packages\nose-0.10.3-py2.5.egg\nose\case.p
y", line 182, in runTest
  File "C:\Josef\work-oth\sort\pypi\test_ndiscrete.py", line 76, in checkchisqua
    assert (pval > alpha), 'chisquare - test for %s at arg = %s' % (distname,str
AssertionError: chisquare - test for logser at arg = (0.59999999999999998,)

Ran 12 tests in 19.781s

FAILED (failures=1)

scipy.stats tests: no failures

running the current trunk tests for scipy.stats on the pre- and
post-patch distributions.py gives no errors:

>>> from scipy import stats
>>> stats.test()
Running unit tests for scipy.stats
NumPy version 1.2.0rc2
NumPy is installed in C:\Programs\Python25\lib\site-packages\numpy
SciPy version 0.7.0.dev
SciPy is installed in C:\Josef\_progs\virtualpy25\envscipy\lib\site-packages\sci
Python version 2.5.2 (r252:60911, Feb 21 2008, 13:11:45) [MSC v.1310 32 bit (Int
nose version 0.10.3
618: UserWarning: Ties preclude use of exact statistic.
  warnings.warn("Ties preclude use of exact statistic.")
\lib\site-packages\numpy\lib\function_base.py:343: Warning:
            The semantics of histogram has been modified in
            the current release to fix long-standing issues with
            outliers handling. The main changes concern
            1. the definition of the bin edges,
               now including the rightmost edge, and
            2. the handling of upper outliers, now ignored rather
               than tallied in the rightmost bin.
            The previous behaviour is still accessible using
            `new=False`, but is scheduled to be deprecated in the
            next release (1.3).

            *This warning will not printed in the 1.3 release.*

            Use `new=True` to bypass this warning.

            Please read the docstring for more information.

  """, Warning)
Ran 233 tests in 2.266s

<nose.result.TextTestResult run=233 errors=0 failures=0>
