[SciPy-User] peer review of scientific software

Matthew Brett matthew.brett@gmail....
Wed Jun 5 04:05:23 CDT 2013


On Tue, Jun 4, 2013 at 4:27 AM,  <josef.pktd@gmail.com> wrote:
> On Tue, Jun 4, 2013 at 4:07 AM, Suzen, Mehmet <msuzen@gmail.com> wrote:
>> On 28 May 2013 20:23, Calvin Morrison <mutantturkey@gmail.com> wrote:
>>> http://arxiv.org/pdf/1210.0530v3.pdf
>>> Pissed-off Scientific Programmer,
>>> Calvin Morrison
>> Those recent papers and discussions all talk about good practices. I
>> was thinking today on the bus about why there is so little literature
>> on scientific software development methodologies. One explicit paper I
>> found is from the 80s:
>> A Development Methodology for Scientific Software
>> Cort, G. et al.
>> http://dx.doi.org/10.1109/TNS.1985.4333629
>> It is a pretty classic approach by today's standards. There is also a
>> book about generic style and good practice; it's a pretty good book
>> (it might have been mentioned on this list before):
>> Writing Scientific Software: A Guide to Good Style
>> Suely Oliveira and David E. Stewart
>> http://www.cambridge.org/9780521858960
>> but I don't see any reference to modern development methodologies that
>> specifically address scientific software. For example: extensions of
>> test-driven development, which would suit it better than the classic
>> specification-design-coding-testing cycle. Test cases would be
>> directly related to what we would like to achieve in the first place,
>> for example a generic density of something. I haven't heard of anyone
>> developing scientific software this way... yet.
> I think functional (not unit) testing is pretty much the standard in
> the area of developing statistical algorithms, even if nobody calls it
> that. And I don't know of any references on software development for
> it.
> When writing a library function for existing algorithms, it is
> standard to test it against existing results. Many (or most) software
> packages, or the articles that describe the software, show that they
> reproduce existing results as test cases.
> (And that's the way we work on statsmodels.)
> For new algorithms, it is standard to publish Monte Carlo studies
> showing that the new algorithm is "better" than the existing
> algorithms (or statistical estimators and tests) in at least some
> cases or directions, and often they use published case studies or
> applied results to show how the conclusions would differ or be
> unchanged.
> (Just for illustration, the workflow of some friends of mine who are
> theoretical econometricians: first write the paper with the heavy
> theory and proofs, then start to write the Monte Carlo. The first
> version doesn't deliver the results expected from the theory, so look
> for bugs and fix them, rerun the Monte Carlo, and iterate. Then find
> different test cases and simulated data-generating processes, show
> where it works and where it doesn't, and check the theoretical
> explanation/intuition for why it doesn't work in some cases. Submit
> only the cases that work, and write a footnote for the others.)
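A minimal sketch of that kind of functional testing (all names and values here are hypothetical, not from statsmodels or any published package): fit a simple model on data whose true coefficients are known by construction, standing in for published reference results, and check that the estimates are reproduced within tolerance.

```python
def slope_intercept(xs, ys):
    # Least-squares fit of y = a + b*x via the closed-form formulas
    # (illustrative implementation only).
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

# Functional test: the data come from an exact line (intercept 2, slope 3),
# so the expected coefficients play the role of "existing results".
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * x for x in xs]
a, b = slope_intercept(xs, ys)
assert abs(a - 2.0) < 1e-10 and abs(b - 3.0) < 1e-10
```

In real use the expected values would come from a published table or another package's output rather than being generated in the test itself.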

Here is an example of incorrect theory combined with a simulation
showing correct results.  It turned out there were two separate errors
in the theory which balanced each other out in the particular case
used for the simulation.

This paper reviews and corrects the previous paper:


Quote from section 2.2:

"In general the variance of the parameter estimates is underestimated
by equation (3) but the estimator of the variance is overestimated by
equation (6), so that the two tend to cancel each other out in the T
statistic (5). It can be shown that they do cancel out almost exactly
for the random regressors that were chosen for validating the methods,
which explains why the biases were not observed. However for other
non-random regressors these effects do not cancel and large
discrepancies can occur."

I think that points to the need to write tests for all the parts, not just the whole.
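A toy sketch of how that failure mode looks in code (the numbers and the bugs are invented for illustration): a statistic of the form estimate / sqrt(variance) can come out exactly right even when both the numerator and the denominator are individually wrong, so an end-to-end test passes while per-part tests would fail.

```python
import math

def biased_estimate(true_value):
    # Deliberate bug: overestimates by a factor of 2.
    return 2.0 * true_value

def biased_variance(true_var):
    # Deliberate bug: overestimates by a factor of 4.
    return 4.0 * true_var

true_value, true_var = 1.5, 0.25
t_buggy = biased_estimate(true_value) / math.sqrt(biased_variance(true_var))
t_correct = true_value / math.sqrt(true_var)

# The whole-system check passes: the factor of 2 cancels in the ratio.
assert abs(t_buggy - t_correct) < 1e-12
# Part-level checks would catch each bug on its own.
assert biased_estimate(true_value) != true_value
assert biased_variance(true_var) != true_var
```

This mirrors the quoted paper: the cancellation held only for the particular inputs used for validation, which is exactly what tests of the individual parts would have exposed.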
