[SciPy-User] peer review of scientific software
Thu Jun 6 09:44:20 CDT 2013
On Thu, Jun 6, 2013 at 8:59 AM, Matthew Brett <firstname.lastname@example.org> wrote:
> On Thu, Jun 6, 2013 at 1:56 PM, <email@example.com> wrote:
>>>>> I found bugs in scipy.ndimage.shift and in scipy.stats.linregress.
>>>>> The first took me ages to spot, as I was assuming the error was on
>>>>> my side because scipy was seen as a "large library widely used".
>> Ok, I found the stats.linregress case
>> There is no way I can write unit tests for all the edge cases that I
>> never expect to show up.
>> For sure you find bugs/behavior like this in many packages, and I
>> wouldn't trust any package for extreme cases, no matter what their
>> test suite is.
> I guess that means the user has to know what you thought an extreme case was?
Anything that gets close to machine precision in a special case
requires special attention.
I assume many scipy.special distribution functions were written with
statistical tests in mind, with maybe good accuracy in the 0.0001 to
0.5 percentiles. I wouldn't trust any of them for extreme tails such as
1e-30 until I have verified them. And I know in which cases Pauli and
others expanded the range with good precision.
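To make the precision point concrete, here is a small standard-library sketch comparing an upper-tail normal probability computed as 1 - cdf(x) with one computed directly from the complementary error function. It is not how `scipy.special` or `scipy.stats` compute anything; only `math.erf` and `math.erfc` are real APIs, and the two function names are made up for illustration. (scipy.stats distributions provide a separate `sf` method, which their documentation notes can be more accurate than `1 - cdf`.)

```python
import math


def normal_sf_naive(x):
    """Upper-tail probability P(Z > x) for a standard normal, as 1 - cdf(x).

    For large x the cdf rounds to exactly 1.0 in double precision, so the
    subtraction returns 0.0 and the tail information is lost.
    """
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return 1.0 - cdf


def normal_sf_direct(x):
    """Same probability from the complementary error function, which keeps
    its relative accuracy far into the upper tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


for x in (2.0, 5.0, 10.0):
    print(x, normal_sf_naive(x), normal_sf_direct(x))
```

At x = 10 the naive version returns exactly 0.0, while the direct version returns roughly 7.6e-24.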
It was fixed by https://github.com/scipy/scipy/pull/2494, but it never
went high on *my* priorities.
> I think the point of test driven development is precisely in order to
> specify the edges before you've locked yourself down to an
> implementation. If one writes the implementation first, one often
> does forget the edges.
"A common mistake that people make when trying to design something
completely foolproof is to underestimate the ingenuity of complete
fools."
It's a question of priorities. I don't spend my time coming up with
edge cases where something might fail, only to still cover 50% of the
things users might run into. Some edge cases are important; others
are just a numerical curiosity.
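Matthew's test-first suggestion can be sketched minimally as follows, assuming a hypothetical helper `pearson_r` (not `scipy.stats.linregress`, and not necessarily related to the linregress bug mentioned in this thread). The tests pin down two edges before the implementation is settled: constant input has no defined correlation, and rounding must never push the result outside [-1, 1].

```python
import math
import unittest


def pearson_r(x, y):
    """Hypothetical correlation helper, written to satisfy the tests below.

    Edge cases specified up front:
    - constant input has zero variance, so r is undefined: raise ValueError
    - rounding can push |r| slightly above 1: clip the result to [-1, 1]
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for constant input")
    r = sxy / math.sqrt(sxx * syy)
    return max(-1.0, min(1.0, r))


class TestPearsonEdges(unittest.TestCase):
    # Written before the implementation: each test states one edge case.

    def test_perfect_positive_stays_in_range(self):
        r = pearson_r([0.1, 0.2, 0.3, 0.7], [0.3, 0.6, 0.9, 2.1])
        self.assertLessEqual(r, 1.0)
        self.assertAlmostEqual(r, 1.0, places=12)

    def test_perfect_negative_stays_in_range(self):
        r = pearson_r([1, 2, 3], [3, 2, 1])
        self.assertGreaterEqual(r, -1.0)
        self.assertAlmostEqual(r, -1.0, places=12)

    def test_constant_input_raises(self):
        with self.assertRaises(ValueError):
            pearson_r([2, 2, 2], [1, 2, 3])
```

Under Python's `unittest` runner all three tests pass; the point is that the edges are written down before the arithmetic is.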
example: the minimum sample size for time series analysis in statsmodels
is not checked.
I have an open issue for it, but I have no idea why someone would do
time series analysis with 5 observations. It doesn't worry me enough
to drop everything and fix the "bug".
skew and kurtosis tests in scipy.stats now enforce the correct minimum sample size.
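An explicit sample-size guard of the kind described above might look like the following sketch. The helpers `require_min_nobs` and `lag1_autocorr` are hypothetical, and the default of 10 observations is an arbitrary placeholder, not the minimum that statsmodels, `scipy.stats.skewtest`, or `scipy.stats.kurtosistest` actually use.

```python
def require_min_nobs(x, min_nobs, name="this statistic"):
    """Return len(x), raising ValueError if fewer than min_nobs observations.

    The appropriate min_nobs has to come from the derivation of the
    statistic being computed; this helper only enforces it.
    """
    n = len(x)
    if n < min_nobs:
        raise ValueError(f"{name} requires at least {min_nobs} observations; got {n}")
    return n


def lag1_autocorr(x, min_nobs=10):
    """Lag-1 sample autocorrelation with an explicit sample-size check.

    min_nobs=10 is an arbitrary placeholder threshold for illustration.
    """
    n = require_min_nobs(x, min_nobs, "lag1_autocorr")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0.0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    return num / denom


try:
    lag1_autocorr([1.0, 2.0, 3.0, 4.0, 5.0])
except ValueError as err:
    print(err)  # lag1_autocorr requires at least 10 observations; got 5
```

Failing loudly on a tiny sample costs one comparison, which is why such checks are cheap to add once someone actually asks for them.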
example: almost perfect collinearity in estimating a linear regression.
The model produces nonsense, but what a statistical package does in
this case, and how close to perfect collinearity it can get without
breaking down, varies widely.
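One common way to quantify how close two regressors are to perfect collinearity is the variance inflation factor, VIF = 1 / (1 - r^2), where r is their sample correlation. The sketch below (function name hypothetical, two regressors only) shows the VIF growing without bound as one column approaches a copy of the other, and returning infinity once 1 - r^2 rounds to zero.

```python
def vif_two_regressors(x1, x2):
    """Variance inflation factor 1 / (1 - r**2) for a model with two regressors.

    r is the sample correlation between the regressors. As they approach
    perfect collinearity, 1 - r**2 -> 0 and the VIF blows up; once 1 - r**2
    rounds to zero or below, the result is reported as infinite.
    """
    n = len(x1)
    m1 = sum(x1) / n
    m2 = sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    one_minus_r2 = 1.0 - s12 * s12 / (s11 * s22)
    if one_minus_r2 <= 0.0:
        return float("inf")  # exactly (or numerically) collinear
    return 1.0 / one_minus_r2


x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
pattern = [1.0, -1.0, 1.0, -1.0, 0.0]  # direction of the small perturbation
for eps in (1e-1, 1e-3, 1e-6):
    x2 = [a + eps * d for a, d in zip(x1, pattern)]
    print(eps, vif_two_regressors(x1, x2))
```

Shrinking the perturbation by a factor of 100 multiplies the VIF by roughly 10,000, so the regression coefficients' variance explodes long before the design matrix is exactly singular.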
my priorities are usually: check that something is correct for 99.5%
of use cases and worry about the other 0.5% when they actually show up.
And sometimes we have to revise our evaluation when an edge case that
we never thought of actually occurs pretty regularly.
(if you want an example: problems with perfect prediction in Logit
that neither Skipper nor I knew about until someone ran into it.)
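For readers unfamiliar with the Logit problem: with perfectly separated data the log-likelihood keeps increasing as the slope grows, so no finite maximum likelihood estimate exists. The sketch below, with hypothetical helper names and a single regressor plus intercept, checks for complete separation and evaluates the log-likelihood along a growing slope. It is a generic illustration, not statsmodels' implementation.

```python
import math


def softplus(t):
    """log(1 + exp(t)), written to avoid overflow for large |t|."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def logit_loglik(b0, b1, x, y):
    """Log-likelihood of P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))).

    log p = -softplus(-z) and log(1 - p) = -softplus(z), so no log(0) occurs
    even when p rounds to 0 or 1.
    """
    ll = 0.0
    for xi, yi in zip(x, y):
        z = b0 + b1 * xi
        ll -= softplus(-z) if yi == 1 else softplus(z)
    return ll


def is_completely_separated_1d(x, y):
    """True if some threshold on x splits the 0s from the 1s exactly.

    Assumes both classes are present. Ties at the boundary (quasi-complete
    separation) are not detected by this simple check.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    return max(x0) < min(x1) or max(x1) < min(x0)


x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]  # every y=1 lies to the right of every y=0
print(is_completely_separated_1d(x, y))
for b1 in (1.0, 5.0, 10.0, 50.0):
    # the log-likelihood climbs toward 0 and never turns over
    print(b1, logit_loglik(0.0, b1, x, y))
```

An optimizer run on this data simply keeps pushing the slope outward until it hits an iteration limit, which is why such cases are easy to miss until a user reports them.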
to come back to the original point:
I think edge cases are an area where having a large user base that
does implicit functional testing is an advantage, and where I would
trust popular packages more than packages with a larger test suite
(when the two are not the same).
<making up percentages>