[SciPy-User] peer review of scientific software

josef.pktd@gmai... josef.pktd@gmai...
Tue Jun 4 06:27:05 CDT 2013

On Tue, Jun 4, 2013 at 4:07 AM, Suzen, Mehmet <msuzen@gmail.com> wrote:
> On 28 May 2013 20:23, Calvin Morrison <mutantturkey@gmail.com> wrote:
>> http://arxiv.org/pdf/1210.0530v3.pdf
>> Pissed-off Scientific Programmer,
>> Calvin Morrison
> Those recent papers and discussions all talk about good practices. I was
> thinking today on the bus about why there is not much literature on
> scientific software development methodologies. One explicit paper I found
> is from the 80s:
> A Development Methodology for Scientific Software
> Cort, G. et al.
> http://dx.doi.org/10.1109/TNS.1985.4333629
> It is a pretty classic approach by today's standards. There is also a book
> about generic style and good practice; it's a pretty good book (it might
> have been mentioned on this list before):
> Writing Scientific Software: A Guide to Good Style
> Suely Oliveira and David E. Stewart
> http://www.cambridge.org/9780521858960
> But I don't see any reference to modern development methodologies
> specifically addressed to scientific software, for example extensions of
> test-driven development, which would suit better than the classic
> specification-design-coding-testing. Test cases would be directly related
> to what we would like to achieve in the first place, for example a generic
> density of something. I haven't heard of anyone developing scientific
> software in this way... yet.

I think functional (not unit) testing is pretty much the standard in
the area of developing statistical algorithms, even if nobody calls it
that. I don't know of any references on software development
methodology for it, though.

When writing a library function for an existing algorithm, it is
standard to test it against existing results. Many (or most) software
packages, or the articles that describe them, show that they
reproduce existing results as test cases.
(And that's the way we work for statsmodels.)
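
To make that concrete, here is a minimal sketch of such a test. It is not
taken from statsmodels; `fit_line` is a hypothetical function under test,
and the data are Anscombe's first data set, used only because it is small
and well known. The reference line y = 3.0001 + 0.5001 x is what the
closed-form formula gives for these data, rounded to four decimals. In real
use the reference numbers would come from a published table or from another
package.

```python
import math

# Anscombe's first data set (small, well-known example data).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

# Reference (intercept, slope), rounded to four decimals.
# Assumption: stands in for numbers taken from a publication or other software.
REFERENCE = (3.0001, 0.5001)


def fit_line(x, y):
    """Hypothetical function under test: simple OLS, returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def matches_reference(estimate, reference, atol=5e-5):
    """True if every estimate agrees with the reference to the printed precision."""
    return all(
        math.isclose(e, r, rel_tol=0.0, abs_tol=atol)
        for e, r in zip(estimate, reference)
    )


# The functional test: the implementation must reproduce the existing result.
assert matches_reference(fit_line(x, y), REFERENCE)
```

The tolerance is tied to the precision of the reference numbers, which are
rounded to four decimals, not to machine precision.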

For new algorithms, it is standard to publish Monte Carlo studies that
show that the new algorithm is "better" in at least some cases or
directions than the existing algorithms (or statistical estimators and
tests). Often they also use published case studies or applied results
to show how the conclusions would differ or be unchanged.
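
A toy version of such a Monte Carlo study, as a sketch only: it compares
the sample median (standing in for the "new" estimator) with the sample
mean as estimators of a location parameter that is truly 0. The sample
size, replication count, seed and the two distributions are arbitrary
choices of mine, and it uses only the standard library (in practice one
would use numpy).

```python
import random
import statistics


def draw_normal(rng, n):
    """n draws from N(0, 1)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def draw_laplace(rng, n):
    """n draws from Laplace(0, 1): difference of two independent Exp(1) draws."""
    return [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]


def mse(estimator, draw, n=25, reps=2000, seed=12345):
    """Monte Carlo mean squared error; the true location is 0 in both designs."""
    rng = random.Random(seed)  # same seed, so both estimators see the same samples
    squared_errors = [estimator(draw(rng, n)) ** 2 for _ in range(reps)]
    return statistics.fmean(squared_errors)


results = {
    (dist, name): mse(estimator, draw)
    for dist, draw in [("normal", draw_normal), ("laplace", draw_laplace)]
    for name, estimator in [("mean", statistics.fmean), ("median", statistics.median)]
}

for key, value in sorted(results.items()):
    print(key, round(value, 4))
```

Theory says the mean should have the smaller MSE under the normal design
and the median the smaller MSE under the heavy-tailed Laplace design, which
is the "better in at least some cases" pattern.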

(Just for illustration: the workflow of some friends of mine who are
theoretical econometricians.
First write the paper with the heavy theory and proofs, then start to
write the Monte Carlo. The first version doesn't deliver the results
that can be expected based on the theory, so look for bugs and fix those,
rerun the Monte Carlo, and iterate. Then find different test cases
(simulated data generating processes), show where it works and where it
doesn't, and check the theoretical explanation/intuition for why it
doesn't work in some cases. Submit only cases that work, and write a
footnote for the other cases.)

And after that, there are many published articles that present
Monte Carlo studies to show that an algorithm does not work properly if
some assumptions are violated, and that something else is better.
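
A sketch of this second kind of study, again with made-up settings: the
usual large-sample 95% confidence interval for a mean assumes independent
observations. The simulation checks its coverage when the data are
independent (rho = 0) and when they follow an AR(1) process with rho = 0.5.
The function names and the design are my assumptions, not from any
particular paper.

```python
import math
import random
import statistics

Z95 = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile


def ar1_sample(rng, n, rho):
    """n draws from a stationary AR(1) with mean 0 and variance 1 (rho=0: i.i.d.)."""
    innovation_sd = math.sqrt(1.0 - rho ** 2)  # keeps the marginal variance at 1
    value = rng.gauss(0.0, 1.0)
    sample = []
    for _ in range(n):
        sample.append(value)
        value = rho * value + rng.gauss(0.0, innovation_sd)
    return sample


def coverage(rho, n=100, reps=2000, seed=0):
    """Share of replications in which the i.i.d.-based interval contains the true mean 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        data = ar1_sample(rng, n, rho)
        mean = sum(data) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
        half_width = Z95 * sd / math.sqrt(n)  # standard error that assumes independence
        if abs(mean) <= half_width:
            hits += 1
    return hits / reps


iid_coverage = coverage(rho=0.0)
ar1_coverage = coverage(rho=0.5)
print("coverage, independent data:", iid_coverage)
print("coverage, AR(1) rho=0.5:   ", ar1_coverage)
```

With positive autocorrelation the variance of the sample mean is inflated
by roughly (1 + rho) / (1 - rho) in large samples, so the interval is too
narrow and the coverage should drop well below the nominal 95%.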

(This doesn't mean that they produce a "pretty" piece of software, but
it shows that it works as advertised.)

I don't think I ever heard of unit or functional testing for applied
research, that is testing the workflow and not the computational


> Best,
> -m
