[SciPy-dev] the state of scipy unit tests
Mon Nov 24 17:21:35 CST 2008
On Mon, Nov 24, 2008 at 16:58, <email@example.com> wrote:
> On Mon, Nov 24, 2008 at 4:39 PM, Xavier Gnata <firstname.lastname@example.org> wrote:
>>> Nathan Bell wrote:
>>>> I don't understand your argument. You propose to make 'fast' be the
>>>> thing that developers run before committing changes to SVN and then
>>>> argue that this will lead to more tests being run? Who runs the slow
>>> Users. But well, it looks like I am in the minority, so let's go for your
>> Well looks like "unitary tests" versus "integration tests".
>> Sounds good. Many users use the svn (for various reasons).
>> From their point of view, it could be a problem when the svn is really broken.
>> Small test to be quite sure the svn is not broken and extensive tests
>> run once per X and/or by users (after a fresh install)
> Now that 0.7 has been tagged, shall I decorate my tests as slow?
> nosetests -A "not slow" or scipy.test() will then exclude the 4-5
> minutes of distributions tests.
> Without my tests (default setting with not slow) scipy.stats takes 4-6 seconds.
> I started to profile one of the tests and some distributions are very
> slow: they provide the correct results, but the generic way of
> calculating takes a lot of time.
> Example: for the R distribution, rdist, the test runs two
> Kolmogorov-Smirnov tests and makes about 4 million calls to the _pdf
> function, I guess mostly to generate 2000 random variates in a
> generic way based only on the pdf.
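
For readers unfamiliar with how `nosetests -A "not slow"` picks tests: nose's attribute plugin evaluates the expression against attributes set on each test function. The following is only a rough sketch of that idea. The `slow` decorator, the `select_not_slow` helper and the two test names are hypothetical and written here for illustration; they are not the decorators SciPy itself uses.

```python
def slow(func):
    """Tag a test function as slow (hypothetical helper for illustration)."""
    func.slow = True
    return func


@slow
def test_all_distributions():
    # Stand-in for the distribution tests that take several minutes.
    pass


def test_quick():
    # Stand-in for a test that should always run.
    pass


def select_not_slow(tests):
    """Mimic the effect of -A "not slow": keep tests without a true `slow` attribute."""
    return [t for t in tests if not getattr(t, "slow", False)]


selected = select_not_slow([test_all_distributions, test_quick])
print([t.__name__ for t in selected])
```

With this tagging, the long-running tests stay in the suite but are skipped by default, and can still be run explicitly when a full check is wanted.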
I don't think we should be doing any K-S tests of the distributions in
the test suite. Once we have validated that our algorithms work (using
these tests, with large sample sizes), we should generate a small
number of variates from each distribution using a fixed seed. The unit
tests in the main test suite will simply generate the same number of
variates with the same seed and directly compare the results. If we
start to get failures, then we can recheck using the K-S tests that
the algorithm is still good, and regenerate the reference variates.
The only problem I can see is if there are platform-dependent results
for some distributions, but that would be very good to figure out now,
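
The fixed-seed scheme described above could look roughly like the sketch below. It is an illustration under assumptions, not SciPy's test code: it uses the standard library's `random.Random.gammavariate` as a stand-in sampler so that it runs without SciPy, and the names `SEED`, `N_VARIATES`, `draw_variates`, `matches_reference` and `REFERENCE` are made up for this example. In a SciPy test one would presumably seed NumPy's generator and call the distribution's `rvs` method instead. The comparison uses a relative tolerance, which is one possible way to absorb small platform-dependent floating-point differences.

```python
import math
import random

SEED = 1234        # assumed fixed seed, chosen arbitrarily
N_VARIATES = 5     # a small number of variates per distribution


def draw_variates(seed=SEED, n=N_VARIATES):
    """Draw n gamma variates from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.gammavariate(2.0, 1.0) for _ in range(n)]


# Generated once, after the sampler has been validated separately
# (for example with K-S tests on large samples). In a real test file
# these numbers would be stored as literals rather than recomputed.
REFERENCE = draw_variates()


def matches_reference(variates, reference, rel_tol=1e-12):
    """Compare variates element by element against the stored reference."""
    return len(variates) == len(reference) and all(
        math.isclose(v, r, rel_tol=rel_tol)
        for v, r in zip(variates, reference)
    )


def test_gamma_variates_reproducible():
    # Same seed and same count must reproduce the reference values.
    assert matches_reference(draw_variates(), REFERENCE)


test_gamma_variates_reproducible()
```

A failure of such a test does not by itself say the sampler is wrong, only that its output changed; that is the point at which the slower statistical checks would be rerun and the reference regenerated.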
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco