[SciPy-User] peer review of scientific software
Sat Jun 1 16:39:37 CDT 2013
On Sat, Jun 1, 2013 at 8:35 AM, <firstname.lastname@example.org> wrote:
> On Tue, May 28, 2013 at 10:34 PM, Matthew Brett <email@example.com> wrote:
>> On Tue, May 28, 2013 at 7:18 PM, Paulo Jabardo <firstname.lastname@example.org> wrote:
>>> I'm an engineer working in research but I spend a good deal of time coding.
>>> What I've seen with most of my colleagues and friends is that they will only
>>> code whenever it is extremely necessary for an immediate application in an
>>> experiment or for their PhD. The problem starts very early, when I was
>>> beginning my studies, we were taught C (and that is still the case almost 20
>>> years later). A small percentage of the students (10%?) enjoy programming
>>> and they will profit. I really loved pointers and doing neat tricks. For the
>>> rest it was torture, plain and simple torture. And completely useless. Most
>>> students couldn't do anything useful with programming. All their suffering
>>> was for nothing. What happened later was obvious: they would avoid
>>> programming at all costs and if they had to do something they would use
>>> MS-Excel. The spreadsheets I've seen... I still have nightmares. The things
>>> they accomplished humbles me, proves that I'm a lower being. I've seen
>>> people solve partial differential equations where each cell was an element
>>> in the solution and it was colored according to the result. Beautiful but
>>> I'd rather suffer acute physical pain than do something like that, or
>>> worse, debug such a "program". By the way, this sort of application was not
>>> a joke or a neat hack, it was actually the only way those guys knew how to
>>> solve a problem.
>>> 15 years later... I have a physics undergraduate student working with me.
>>> Very smart and interested. They still learn C and later on when they need to
>>> do something, what is it they do? Most professors use Origin. A huge
>>> improvement over Excel, but still. A couple of months ago, he had to turn in
>>> a report and since we don't have Origin, he was using Excel. I kind of felt
>>> sorry for him and I helped him out to do it in Python. He couldn't believe
>> Oh - dear; you probably saw this stuff?
> I think that's a good example that peer review works.
It's a good example of how peer-review should work, but it's very
uncommon for the reviewer to have the original spreadsheet, and that
was the key to the problem.
>>> I did my Masters and PhD in CFD. Most other students had almost no
>>> background in programming and did most things using Excel! When they had to
>>> modify some code, it was almost by accident that things worked. You can
>>> imagine what sort of code comes out of this. The professors didn't know
>>> programming much better. Just getting them to understand the concept of
>>> version control took a while.
>>> In my opinion, if schools taught, at the beginning, something like
>>> Python/Octave/R instead of C, students would be able to use this knowledge
>>> easily and productively throughout their courses and eventually learn C when
>>> they really needed it.
>> That's surely one of the big arguments for Python - it is a great
>> first language, and it is capable across a wider range than Octave or
>> R - or even Excel :)
> We can make mistakes in any language.
> I just read this
> [Correction Notice: An Erratum for this article was reported in
> Vol 17(4) of Psychological Methods (see record 2012-33502-001). The R
> code for arriving at adjusted p values for one of the methods is
> incorrect. The specific changes that need to be made are provided in
> the erratum.]
> It's still functioning peer review if a mistake is found after an
> article has been published, or after a pull request has landed in
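The erratum quoted above concerns incorrect code for adjusted p values. As an illustration of the kind of calculation at stake (not the code from the cited article), here is a minimal pure-Python sketch of the Holm step-down adjustment, one of the standard multiple-comparison corrections:

```python
# Hypothetical sketch of Holm step-down p-value adjustment.
# This is NOT the R code from the cited article; it only illustrates
# the type of multiple-comparison computation such errata concern.

def holm_adjust(pvals):
    """Return Holm-adjusted p-values for a list of raw p-values."""
    m = len(pvals)
    # Indices of the p-values, smallest first.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Scale the k-th smallest p-value by the number of remaining tests,
        # capping at 1, and enforce monotonicity over the sorted sequence.
        value = min((m - rank) * pvals[idx], 1.0)
        running_max = max(running_max, value)
        adjusted[idx] = running_max
    return adjusted

print(holm_adjust([0.01, 0.04, 0.03, 0.005]))
```

A bug in exactly this kind of loop (e.g. forgetting the monotonicity step) produces plausible-looking but wrong adjusted p values, which is why having the actual code available to reviewers matters.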
The problem is that the peers don't get to review what has been done,
in general, they get to review what the author said had been done.
Donoho's point - about computational science - is that this can be
The question is then: does this matter? Are most published
research findings false?
> in general:
> in the research areas that I know, the vast majority of researchers
> use Windows, and everything that is not core task is point and click.
> As long as Matlab, Stata and GAUSS, or whatever else, don't have
> version control built in, VC won't be used by the majority of
> researchers that I know. We didn't grow up when version control was
> popular. And we don't have IT guys to manage it for us.
> (There is the old fashioned version control of starting new
> directories at crucial stages, or for specific conference talks and
> paper submissions.)
> (DVCS are only a few years old, and it will take a few more years for
> diffusion to "non-programmers" to happen.)
We get taught some complicated things when we are training - calculus,
for example. Does it make sense that we don't teach less complicated
things like version control and programming?
> Even after using git some time, I only find it usable because I can do
> all the regular stuff with git gui (and for unusual stuff I can use
> commandline and git gui at the same time).
> (just in case I'm misunderstood:
> I'm all in favor of best practices and unit and functional tests, but
> I don't expect that researchers will adopt it (fast) if it goes
> against their usual pattern of using tools.
> example: If you teach a software carpentry course that uses Linux,
> then I wouldn't be surprised if some users go back to their office and
> the first thing they do is use Excel. :)
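The unit and functional tests mentioned above need not be heavyweight; even a few assertions with Python's standard-library unittest module catch many mistakes. A minimal sketch (the function under test is a hypothetical example, not from this thread):

```python
# Minimal sketch of a unit test for research code, using only the
# Python standard library. "trapezoid_area" is a hypothetical example
# of the small numerical helpers researchers typically write.
import unittest

def trapezoid_area(a, b, h):
    """Area of a trapezoid with parallel sides a, b and height h."""
    return 0.5 * (a + b) * h

class TestTrapezoidArea(unittest.TestCase):
    def test_rectangle_case(self):
        # When a == b the trapezoid degenerates to a rectangle.
        self.assertAlmostEqual(trapezoid_area(2.0, 2.0, 3.0), 6.0)

    def test_general_case(self):
        self.assertAlmostEqual(trapezoid_area(1.0, 3.0, 2.0), 4.0)
```

Saved as a file, this runs with `python -m unittest <filename>` - no extra tooling, no Linux, no command-line git required.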
In general as you know I agree completely that it doesn't make sense
to persuade people to switch from Windows to Linux at the same time as
persuading them to use good software tools. We should teach people
stuff that they will and can use, and it's a common theme among
software-carpentry types that it would be better to teach Windows
people how to best use Windows rather than teaching them on a virtual
machine that they are unlikely to use for their work.