[SciPy-dev] Scipy workflow (and not tools).

josef.pktd@gmai...
Wed Feb 25 13:15:24 CST 2009

On Wed, Feb 25, 2009 at 1:16 PM, Bruce Southey <bsouthey@gmail.com> wrote:
> Travis E. Oliphant wrote:
>> Charles R Harris wrote:
>>> I don't think there are enough eyes at this point for a strict review
>>> policy. How many of the current packages have any maintainer? Who was
>>> maintaining the stats package before Josef got involved? How many
>>> folks besides Robert could look over the changes usefully? How many
>>> folks looked over Travis' recent addition to optimize?  Who is working
>>> on the interpolation package?
>>> I think at this point we would be better off trying to recruit at
>>> least one person to "own" each package. For new packages that is
>>> usually the person who committed it but we also need ownership of
>>> older packages. Someone with a personal stake in a package is likely
>>> to do more for quality assurance at this point than any amount of
>>> required review.
>> Yes, my feelings exactly. Quality goes up when people who have a
>> personal stake or attachment to the code are engaged. How do we get
>> more of this to happen? Formal review processes can actually have at
>> least some negative impact on getting people engaged. Let's make a
>> tweak here and a tweak there. Right now, I'm of the opinion that
>> whatever makes the *workflow* of people like David, Pauli, Jarrod,
>> Robert K, Robert C, Nathan, Matthew, Charles, Anne, Andrew, Gael, and
>> Stefan (and other big contributors I may have missed) easier, I'm
>> totally in favor of. If that is a DVCS and/or something different
>> than Trac, then let's do that.
>> It sounds like we are making steps in that direction which is excellent.
> Based on the discussion (including the later comments), it appears to
> me that it has moved towards what sort of development structure scipy
> should use with a DVCS.
> Much of the discussion seemed to me to follow what happens in Linux
> kernel development since it adopted a DVCS, starting with BitKeeper.
> Jonathan Corbet has an interesting article on this:
> http://ldn.linuxfoundation.org/book/how-participate-linux-community .
> Essentially, sub-maintainer trees feed into the testing tree (-mm), the
> staging tree (the tree that patches are applied against, which should
> minimize tree divergence) and, hopefully, Linus's tree. Informal code
> review happens during that process, at least for bug fixes; new or
> major features still have a problem getting code review. In some
> respects, the manpower restriction on Linux kernel development has been
> removed because code no longer has to flow through a single node. This
> also lets a user easily get code not only from these trees but also
> from other developers.
> So scipy could do something similar with a DVCS, which would hopefully
> reduce the burden on people like Robert and you.
> I do not see a real need at this time to say 'you own that module and
> you must control its development'. I suspect that 'ownership' of scipy
> components will develop naturally over time and, thus, should not be
> forced upon anyone.
> If sub-trees were created, they would permit sharing incomplete code,
> so that the burden of developing appropriate tests, writing
> documentation and testing can be distributed among interested parties.
> This would also foster mentoring and getting hands dirty in a positive way.
>>> I don't have a problem with folks complaining about missing tests,
>>> etc., but I worry that if we put too many review steps into the
>>> submission path there won't be enough people to make it work.
>> This is exactly the way I feel. I don't want to imply at all that we
>> shouldn't be bugging each other about documentation and testing. I
>> personally welcome any reminders in that direction. I am just worried
>> about whether we are really solving the problems that make it hard to
>> contribute by instituting policy rather than providing examples of
>> code to model.
> From this it appears that in order to get code into scipy you have to
> have all this documentation and tests. But my real concern is that
> strict requirements for tests and documentation will get us minimal
> tests and inferior documentation, or nothing at all. Rather, I hope
> that if the burden can be shifted from one person to a group of people,
> we can get more extensive tests and documentation, as well as people
> actually testing the code on different systems in, hopefully, realistic
> situations. That way there is at least some fix in some tree for a
> problem, and the rest will follow by the time everything is ready for
> mainline inclusion.
> Regards
> Bruce

R has recently had a discussion on quality control for statistical
functions, especially for use in the health industry, because SAS
criticized open source software for providing insufficient guarantees
of correctness.

Scipy is not in the same group, but I think a review process before
commit, if it attracts more users, will make it more likely that
problems are caught. There are many good statistical tools in scipy;
however, until recently I wasn't sure what I would use in a "serious"
application, since there were too many possibly incorrect results.
Second, more eyes might catch problems with refactoring, given that
test coverage is still shaky, and might reduce the amount of dead code.
Two examples from stats-related functions: the recent removal of var
and mean from scipy.stats broke several functions that had no test
coverage, so the breakage didn't show up in the tests. The second is
the recent addition of curve_fit, where the documentation didn't
correspond to what was actually calculated.
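A minimal, hypothetical sketch of the coverage point (plain Python, not scipy code; `center`, `spread`, `_mean` and `smoke_test` are names invented for illustration): Python only resolves a missing helper when the function actually runs, so even a smoke test that calls each public function once on trivial input would have flagged this kind of breakage.

```python
def center(x):
    """Subtract the sample mean; still works after the refactor."""
    m = sum(x) / len(x)
    return [xi - m for xi in x]


def spread(x):
    """Range above the mean; still calls a helper that a refactor deleted."""
    return max(x) - _mean(x)  # NameError at call time: _mean no longer exists


PUBLIC_FUNCTIONS = [center, spread]


def smoke_test(funcs, sample=(1.0, 2.0, 3.0, 4.0)):
    """Call each function once on simple input; collect (name, error type)."""
    failures = []
    for f in funcs:
        try:
            f(list(sample))
        except Exception as exc:
            failures.append((f.__name__, type(exc).__name__))
    return failures


print(smoke_test(PUBLIC_FUNCTIONS))  # [('spread', 'NameError')]
```

Such a test says nothing about whether the numbers are right (the curve_fit case), but it does catch functions that can no longer run at all.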

In both cases the review and corrections happened after the commit,
because I keep an eye on all stats-related commits. Without that
review we might get misleading (or incorrect) numbers and broken
code, and I've seen a lot of both in stats.

But I also hope that any changes in the workflow help spread the
work of testing and documentation and make adding new code easier
and safer.

