[SciPy-dev] Scikits and stuff
Thu Dec 27 22:01:02 CST 2007
On Dec 27, 2007 8:34 PM, Travis E. Oliphant <firstname.lastname@example.org> wrote:
> Hey everyone,
> In preparation for doc-day tomorrow, I've been thinking about scikits
> and its relationship to scipy. There seem to be three distinct
> purposes of scikits:
> 1) A place for specialized tool boxes that build on numpy and/or scipy
> that live under a common name-space, but are too specialized to live in
> scipy itself.
> 2) A place for GPL or similarly-licensed tools so that people can
> understand what they are getting and appropriate use can be made.
> 3) A place for modularly-installed scipy tools.
> Given these three purposes, it seems that we should have three name-spaces:
> 1) scikits
> 2) ??? perhaps scigpl
Loses points for ugly :)
> 3) scipy (why not use the same namespace --- there is a technological
> hurdle that may prevent egg distribution of these until we fix it. But,
> I think we can fix it and would rather not invent another name-space for
> something that should be scipy).
I don't like the idea of the base scipy package being a moving target.
Much like the standard library (for any given python version) is a
known quantity, I'd like scipy to be the same.
Rather than rehash it, I'm just going to copy here for the public
discussion the reply I sent in our private chat on this topic. I
think it states my view clearly, and others can then disagree. I'm
pasting it in full so it reads normally, even if you've already
addressed the domain-specific aspects above.
What about domain-specific functionality, for example? I think that
it's important that 'scipy version x.y' is a known, fixed quantity, so
that installing it means having a well-defined set of tools. But over
time, I can foresee lots of domain-specific functionality that is
scipy-based being developed, and I simply don't think it's realistic
(for many reasons) to pull all of it into scipy itself.
Much like Matlab has 'toolboxes' and Mathematica has a similar
concept, I think there's value for the users in having a well-defined
location where they can find lots of extra tools that are related to
scipy, but somewhat independent of it in their development. The
scikits would all honor similar naming conventions and documentation,
we could have a centralized page listing them so they are easy to
find, and users could add (via namespace packages) their own scikits
without necessarily having write privileges over the central scipy
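
To make the namespace-package point concrete, here is a minimal sketch of two independently installed packages sharing one top-level name. It assumes Python 3.3+ implicit namespace packages (PEP 420), which postdate this thread; at the time the usual route was setuptools' namespace support. The names `scikits_demo`, `clustering`, `dnaseq` and the helper `make_distribution` are hypothetical, chosen only for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

NAMESPACE = "scikits_demo"  # hypothetical umbrella name, not a real package


def make_distribution(site_dir, namespace, name, source):
    """Create <site_dir>/<namespace>/<name>/__init__.py.

    The <namespace> directory deliberately has no __init__.py, so Python
    treats it as one portion of an implicit namespace package.
    """
    pkg_dir = Path(site_dir) / namespace / name
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(source)


def demo():
    # Two separate install locations: one "central", one owned by a user.
    with tempfile.TemporaryDirectory() as core_site, \
            tempfile.TemporaryDirectory() as user_site:
        make_distribution(core_site, NAMESPACE, "clustering",
                          "LABEL = 'centrally listed kit'\n")
        make_distribution(user_site, NAMESPACE, "dnaseq",
                          "LABEL = 'user-contributed kit'\n")
        sys.path[:0] = [core_site, user_site]
        try:
            importlib.invalidate_caches()
            clustering = importlib.import_module(NAMESPACE + ".clustering")
            dnaseq = importlib.import_module(NAMESPACE + ".dnaseq")
            # The shared namespace spans both install locations.
            portions = len(list(sys.modules[NAMESPACE].__path__))
            return clustering.LABEL, dnaseq.LABEL, portions
        finally:
            sys.path.remove(core_site)
            sys.path.remove(user_site)


result = demo()
print(result)
```

The user-side package never touches the central directory, yet both import under the same umbrella name, which is the property the paragraph above relies on.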
Basically in my mind the distinction is not "we did a poor job
modularising scipy" but rather:
- scipy: core library with large amounts of Fortran (as much of netlib
as is reasonable) and functionality that can be reasonably considered
to be of wide appeal. All of it BSD-compatible.
- scikits: toolkits under a single umbrella namespace, easy to find
(we can provide tools for this), with unified naming, coding,
documentation and example conventions. Domain-specific codes go here,
as well as GPL or patent-encumbered codes (but still open source).
[Edit: I'm not sure if *any* patent-encumbered code is really a good
idea, so perhaps this last sentence should be removed].
In addition, scikits could be the staging area for new projects to be
developed until they mature a bit, for eventual inclusion into scipy
itself. This would give us a monitoring mechanism to ensure that a
contributor is developing a package according to the scipy standards
of naming, quality, documentation, etc. while allowing the developer
to proceed at his own pace without locking into the scipy release
schedule. Eventually if a project turns out to work very well and is
deemed of full general interest, it can be folded into scipy itself
(like what happened to ElementTree or optik in the stdlib, for
example). This way developers can also get users to follow their own
release schedule, without the problems we have today with the sandbox
(scikits should be available via eggs, so users can easily grab and
update scikits they're interested in).
For the above scipy.foo discussion, if foo==clustering, it probably
belongs in scipy itself (people in all disciplines use that), but a
DNA sequence analysis tool that finds clustering patterns directly
operating on standard bioinformatics formats should probably be a
scikit.
I don't know about the others, but I find the above distinction
reasonably clear and useful in practice. But perhaps I'm totally
missing the mark.