[SciPy-Dev] proper way to test distributions
Mon Jun 14 22:31:11 CDT 2010
On Mon, Jun 14, 2010 at 9:26 PM, Robert Kern <firstname.lastname@example.org> wrote:
> On Mon, Jun 14, 2010 at 22:07, Vincent Davis <email@example.com> wrote:
>> I was reviewing how the tests of distributions were done in scipy, with
>> the thought of applying the same methods to numpy.random. I have a lot
>> to learn here and appreciate your suggestions.
>> Link to the scipy test
>> If I understand correctly, the tests create a sample of 2000 from a
>> given distribution and then compare stats (mean, var...) calculated
>> with functions from numpy with those returned by the distribution
>> instance's .stats. I am not sure how the mean is calculated within the
>> distribution (is it just using the scipy mean?). Anyway, this seems a
>> little circular.
>> Maybe I am missing something, but here are my thoughts.
>> 1) Using seed() and then comparing the actual results (arrays) helps to
>> make sure the code is stable but tells you nothing about the quality
>> of the distribution.
>> 2) Using seed() and then calculating the moments (with numpy and
>> dist.stats) is not really any different than (1).
>> 3) Drawing a large sample (possibly using seed()) and calculating the
>> moments and comparing them to the theoretical moments seems like the
>> best option. But this could be slow.
>> What is the best way?
>> What is desired in numpy?
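For concreteness, option (3) might look roughly like the sketch below. This is only an illustration, not how scipy's tests are written: the helper name moments_match, the seed, the sample size, and the 5-standard-error tolerance are all assumptions, and the standard error used for the variance is only valid for normally distributed data.

```python
import numpy as np

def moments_match(sample, true_mean, true_var, n_sigma=5.0):
    """Hypothetical helper: check the sample mean and variance against
    theoretical values, allowing n_sigma standard errors of slack.
    The variance tolerance assumes the data are normally distributed."""
    n = sample.size
    se_mean = np.sqrt(true_var / n)              # standard error of the mean
    se_var = true_var * np.sqrt(2.0 / (n - 1))   # standard error of the variance (normal data only)
    mean_ok = abs(sample.mean() - true_mean) < n_sigma * se_mean
    var_ok = abs(sample.var(ddof=1) - true_var) < n_sigma * se_var
    return bool(mean_ok and var_ok)

# Seeding keeps the check deterministic from run to run.
rng = np.random.RandomState(12345)
loc, scale = 2.0, 3.0
sample = rng.normal(loc=loc, scale=scale, size=100000)

# Compare against the theoretical mean (loc) and variance (scale**2).
result_ok = moments_match(sample, loc, scale**2)

# A deliberately wrong theoretical mean should be rejected.
result_bad = moments_match(sample, loc + 0.5, scale**2)
```

The tolerance is expressed in standard errors so that it shrinks as the sample grows; a fixed absolute tolerance would be either too loose for large samples or too tight for small ones.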
> While it's worthwhile to have both, you really only want (1) in the
> standard unit test suite. (3) is good for working out the bugs in the
> initial implementation (or retroactively doing so after the grad
> student who wrote the initial implementation suddenly ran off and got
> a real job. <ahem>). You can provide them, if you wish to do that
> verification, but it doesn't need to be in the main test suite. (1)
> provides the first layer of protection. If we make an unintentional
> change to the results, (1) will catch it. If we make an intentional
> change, we can use (3) to verify that our changes are good. But we
> don't need to write (3) until we are actually faced with that task.
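A type-(1) test as described above could be sketched along these lines. The function names are hypothetical, and the expected values are the first three uniform draws I believe the legacy RandomState stream gives for seed 0, rounded to 8 decimals; in a real suite they would be recorded from a trusted run of the code under test.

```python
import numpy as np
from numpy.testing import assert_allclose

# Assumed reference values: first three draws of the legacy Mersenne Twister
# stream for seed 0, rounded to 8 decimal places.
EXPECTED_UNIFORM = np.array([0.5488135, 0.71518937, 0.60276338])

def draw_uniform(seed, size):
    """Hypothetical helper: draw `size` uniform [0, 1) variates from a
    freshly seeded legacy generator."""
    rng = np.random.RandomState(seed)
    return rng.random_sample(size)

def test_uniform_stream_unchanged():
    # Fails if a code change alters the generated stream in any way,
    # intentional or not; it says nothing about distribution quality.
    actual = draw_uniform(seed=0, size=3)
    assert_allclose(actual, EXPECTED_UNIFORM, rtol=1e-6)

test_uniform_stream_unchanged()
```

If the stream is changed on purpose, the stored values have to be regenerated, and that is the point at which a type-(3) check earns its keep.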
>> And a little off topic but isn't numpy.random duplicating scipy or
>> scipy duplicating numpy?
> Not really. scipy is using those routines from numpy for most of the
> duplicated distributions. numpy needed that functionality to match
> Numeric's. Of course, this means that scipy's (3)-type tests should be
> providing us coverage for many of numpy's distributions.
Thanks for the feedback, makes sense to me.
> Robert Kern
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
> -- Umberto Eco