[SciPy-dev] Scipy workflow (and not tools).

josef.pktd@gmai...
Tue Feb 24 19:59:08 CST 2009


On Tue, Feb 24, 2009 at 7:20 PM, Anne Archibald
<peridot.faceted@gmail.com> wrote:
> 2009/2/24 Robert Kern <robert.kern@gmail.com>:
>> On Tue, Feb 24, 2009 at 15:13, Charles R Harris
>> <charlesr.harris@gmail.com> wrote:
>>
>>> I think at this point we would be better off trying to recruit at least one
>>> person to "own" each package. For new packages that is usually the person
>>> who committed it but we also need ownership of older packages. Someone with
>>> a personal stake in a package is likely to do more for quality assurance at
>>> this point than any amount of required review.
>>
>> "Ownership" has a bad failure mode. Case in point: nominally, I am the
>> "owner" of scipy.stats and numpy.random and completely failed to move
>> Josef's patches along.
>
> It seems to me that scipy's development model is a classic open-source
> "scratch an itch": it bothered me that people were forever asking
> questions that needed spatial data structures, so I took a weekend and
> wrote some. I don't foresee this changing without some major change
> (e.g. a company suddenly hiring ten people to work full-time on
> scipy). So the question is how to make this model produce reliable
> code.
>
> Suggestions people have made to accomplish this:
>
> (1) Don't allow anything into SVN without tests and documentation.
> (2) Make sure everything gets reviewed before it goes in.
> (3) Appoint owners for parts of scipy.

I think the main problem is having someone who feels responsible for
each part of scipy. Whatever we do to make this easier, and to expand
the number of active participants, will be an improvement.

I don't feel like the "owner" of stats; it's more a case of adoption.
I like the centralized trac timeline because it makes it easy to monitor
new tickets and changes to svn. I'm also doing code review ex-post
(after commits) to minimize new problems. This is an incentive to
increase test coverage, so that the tests complain immediately if
something breaks. (My main problem with trac has been monitoring old
tickets, which I haven't figured out how to do efficiently.)

I think that for packages with a responsible and responsive "maintainer",
my experience with the mailing list was pretty good. On the other hand,
looking at the mailing list history, I saw many comments and threads about
the problems in stats, and while some problems got fixed, many reports
of problems were never followed by any action, which is also pretty
frustrating for users.

A new workflow and code review might help, but if nobody adopts the
orphaned subpackages, it will be just another place to store comments.

I'm a huge fan of full test coverage, but writing fully verified tests is
a lot of work for me, and I still have a backlog of bug fixes because I
haven't had time to write sufficient tests.

Also, I think that a commitment to maintain and increase test
coverage should be sufficient in some cases.
For example, in stats.mstats Pierre rewrote and added statistics
functions for masked arrays. The test coverage is good, but there
are still quite a few functions not covered and some rough edges;
overall, though, it looks in better condition than scipy.stats did. In
this case I find it more useful to have the full set of functions that
Pierre wrote available immediately than to add them piecemeal as he
finds time to write tests.

The documentation editor is a good example where easier access for
new contributors increased the number of participants, and maybe
collective writing and review of code and tests can lower the entry
barrier in the same way.
But for now, I think I still need to be able to get some bug fixes
into stats without a large bureaucracy, or with an expiration date
on any code review.

Josef

