[SciPy-dev] Package organization

Robert Kern rkern at ucsd.edu
Wed Oct 12 18:15:02 CDT 2005


Travis Oliphant wrote:
> Robert Kern wrote:
> 
>>I have more opinions on this, scipy package organization, and What
>>Belongs In Scipy, but I think I've used up my opinion budget for the week.
> 
> I don't know,  people who contribute code get much, much larger opinion 
> budgets.... ;-)

Okay. I warned you.

I would like to see scipy's package organization become flatter and more
oriented towards easy, lightweight, modular packaging rather than
subject matter. For example, some people want bindings to SUNDIALS for
ODEs. They could go into scipy.integrate, but that introduces a large,
needless dependency for those who just want to compute integrals. So I
would suggest putting the SUNDIALS bindings in their own scipy.sundials
package.

As another example, I might also suggest moving the simulated annealing
module out into scipy.globalopt along with diffev.py and pso.py that are
currently in my sandbox. They're all optimizers, but functionally they
are unrelated to the extension-heavy optimizers that make up the
remainder of scipy.optimize.

The wavelets library that Fernando is working on would go in as
scipy.wavelets rather than being stuffed into scipy.signal. You get the
idea.

This is also why I suggested naming the scipy_core versions of fftpack
and linalg scipy.corefft and scipy.corelinalg rather than aliasing them
to scipy.fftpack and scipy.linalg. The "core" names reflect their
packaging and their limited functionality. For one thing, this naming
allows us to try to import the possibly optimized versions from the full
scipy when it is installed:

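  # scipy/corelinalg/__init__.py
  # Always available: the lightweight implementations in scipy_core.
  import basic_lite
  svd = basic_lite.singular_value_decomposition
  ...

  # If the full scipy is installed, rebind the same names to its
  # (possibly optimized) versions; otherwise keep the lite ones.
  try:
    from scipy import linalg
    svd = linalg.svd
    ...
  except ImportError:
    pass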
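The try/except fallback above can be exercised without scipy_core installed. The sketch below generalizes it into a small helper built on the standard library's importlib. The helper name `first_importable` and the module name `full_linalg_backend` are hypothetical, and the standard `math` module merely stands in for the always-available lite implementation.

```python
import importlib
from types import ModuleType


def first_importable(*module_names: str) -> ModuleType:
    """Return the first module in module_names that imports successfully.

    List the preferred (possibly optimized) implementation first and the
    always-available lightweight one last.
    """
    failures = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            failures.append(f"{name}: {exc}")
    raise ImportError("no candidate module could be imported:\n" + "\n".join(failures))


# Hypothetical: "full_linalg_backend" is not installed, so the lookup falls
# through to the stdlib "math" module, standing in for the lite version.
backend = first_importable("full_linalg_backend", "math")
print(backend.__name__)  # math
```

Listing candidates in preference order keeps the policy in one place, which is the same idea as rebinding `svd` only when the full package imports.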

The explicit "core" names help us and others keep control over
dependencies. Lots of the scipy subpackages need some linear algebra
functions, but AFAICT, none actually require anything beyond what's in
scipy_core. With the "core" names, we won't accidentally add a
dependency on the full scipy.linalg without due deliberation.

Okay, What Belongs In Scipy. It's somewhat difficult to answer the
question, "Does this package belong in scipy?" without having a common
answer to, "What is scipy?" I won't pretend to have the single answer to
that last question, but I will start the dialogue based on the
rationalizations I've come up with to defend my gut feelings.

Things scipy is not:

  * A framework. You shouldn't have to restructure your programs to use
the algorithms implemented in scipy. Sometimes the algorithms themselves
may require it (e.g. reverse communication solvers), but that's not
imposed by scipy.

  * Everything a scientist will need to do computing. For a variety of
reasons, it's just not an achievable goal and, more importantly, it's
not a good standard for making decisions. A lot of scientists need a
good RDBMS, but there's no reason to put pysqlite into scipy. Enthon,
package repositories, and specialized LiveCDs are better places to
collect "everything."

  * A plotting library. (Sorry, had to throw that in.)
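Reverse communication, mentioned in the first item above, inverts the usual control flow: the solver never calls your function; it hands back the point it needs evaluated and waits for the value. Fortran codes typically do this with a status flag and repeated calls. The sketch below is a Python analogue using a generator; the bisection solver `bisect_rc` and the driver `drive` are hypothetical illustrations.

```python
def bisect_rc(lo, hi, tol=1e-12):
    """Bisection root finder written in reverse-communication style.

    The solver never calls f. It yields each x where it needs f(x) and
    receives the value back through send(); the caller owns the loop.
    """
    f_lo = yield lo
    f_hi = yield hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = yield mid
        if f_lo * f_mid <= 0:
            hi = mid  # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)


def drive(solver, f):
    """Caller-side loop: evaluate f wherever the solver asks for it."""
    x = next(solver)
    try:
        while True:
            x = solver.send(f(x))
    except StopIteration as finished:
        return finished.value


root = drive(bisect_rc(0.0, 2.0), lambda x: x * x - 2.0)
print(round(root, 6))  # 1.414214
```

Because the caller evaluates f, it is free to batch evaluations, run them elsewhere, or interleave other work, which is exactly the restructuring such solvers demand of your program.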

Things scipy is:

  * A loose collection of slightly interdependent modules for numerical
computing.

  * A common build environment that handles much of the annoying work
for numerical extension modules. Does your module rely on a library that
needs LAPACK or BLAS? If you put it in scipy, your users can configure
the location of their optimized libraries *once*, and all of the scipy
modules they build can use that information.

  * A good place to put numerical modules that don't otherwise have a
good home.
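To illustrate the "configure once" point in the build-environment item, here is a minimal sketch assuming a single user-edited, INI-style file that every subpackage's build reads through the standard `configparser` module. The section names, keys, paths, and helper names are hypothetical and do not describe the actual format of scipy's build configuration.

```python
import configparser

# Hypothetical site-wide file: the user records library locations once.
# Section names, keys, and paths are illustrative only.
SITE_CONFIG = """
[blas]
library_dirs = /opt/atlas/lib
libraries = f77blas, cblas, atlas

[lapack]
library_dirs = /opt/atlas/lib
libraries = lapack, f77blas, cblas, atlas
"""


def load_site_config(text: str) -> configparser.ConfigParser:
    config = configparser.ConfigParser()
    config.read_string(text)
    return config


def link_settings(config: configparser.ConfigParser, section: str):
    """Return (library_dirs, libraries) for whichever build asks."""
    def as_list(key):
        return [item.strip() for item in config.get(section, key).split(",")]
    return as_list("library_dirs"), as_list("libraries")


config = load_site_config(SITE_CONFIG)
# Two hypothetical extension modules reuse the same user-supplied answer.
for module, needs in [("linalg", "lapack"), ("some_ode_wrapper", "blas")]:
    print(module, link_settings(config, needs))
```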

Things scipy *could* be:

  * An *excellent* build environment for library-heavy extension
modules. To realize this, we would need to integrate the configuration
portion of PETSc's BuildSystem or something equivalent; its automatic
discovery/download/build works quite well. If we were to do this,
some packages might make more sense as subpackages of scipy. For
example, matplotlib and pytables don't have much reason to be part of
scipy right now, but if the libraries they depend on could be
automatically detected/downloaded/built and shared with other scipy
subpackages, then I think it might make sense for them to live in scipy,
too.
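The detection half of automatic discovery/download/build can be sketched with the standard library's `ctypes.util.find_library`, which asks the platform's loader for a shared library by short name and returns `None` when it cannot find one. This is an assumption-laden illustration, not PETSc's BuildSystem: the `discover` helper and its `overrides` parameter are hypothetical, and the download/build step for missing libraries is not shown.

```python
from ctypes.util import find_library


def discover(libraries, overrides=None):
    """Map each short library name to a location, or None if not found.

    Explicit user settings win; otherwise ask the platform loader via
    ctypes.util.find_library. Downloading/building missing ones is not shown.
    """
    overrides = overrides or {}
    return {
        name: overrides.get(name) or find_library(name)
        for name in libraries
    }


# Hypothetical override path; "blas" is left to automatic detection.
found = discover(["lapack", "blas"], overrides={"lapack": "/opt/lapack/liblapack.so"})
missing = [name for name, location in found.items() if location is None]
print(found["lapack"])  # /opt/lapack/liblapack.so
print("would need to download/build:", missing)
```

Letting explicit user settings win over detection keeps the "configure once" promise while still working out of the box when nothing is configured.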

As Pearu suggested, we should audit and label the scipy packages as we
port them to the new scipy_core. To that end:

  * gui_thread, gplt, and plt are dead, I think.

  * xplt shambles along, but shouldn't return as scipy.xplt. It can
never be *the* plotting library for scipy, and leaving it as scipy.xplt
gives people that impression.

  * scipy.cluster is sorta broken and definitely incomplete. We should
port over what's in Bio.Cluster. For that matter, there's quite a bit in
biopython we should stea^Wport (Bio.GA, Bio.HMM, Bio.KDTree,
Bio.NeuralNetwork, Bio.NaiveBayes, Bio.MaxEntropy, Bio.MarkovModel,
Bio.LogisticRegression, Bio.Statistics.lowess).

  * The bindings to ODEPACK, QUADPACK, FITPACK, and MINPACK are
handwritten. Should we mark them as, "f2py when you get the chance"?
Otherwise, they probably count as "state-of-the-art," although we could
always expand our offerings, for example by exposing more of the
functions in ODEPACK.

  * scipy.optimize: I think I recently ran into a regression in the old
scipy. fmin() wasn't finding the minimum of the Rosenbrock function in
the tutorial. I'll have to check that again. The simulated annealing
code could use some review.

  * scipy.special: cephes.round() seems to be buggy on some platforms,
and I think we got a bug report about one of the other functions.

  * I will maintain much of scipy.stats. Of course, that will probably
mean, "throwing anova() into the sandbox never to return." Many of the
other functions in stats.py need vetting.
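For the scipy.optimize item, a regression check might look like the following. fmin() is a Nelder-Mead downhill simplex minimizer; to keep the sketch self-contained it uses a small pure-Python Nelder-Mead (`nelder_mead`, a hypothetical stand-in, not scipy's code) on the 2-D Rosenbrock function from the classic starting point (-1.2, 1). A real test would call scipy.optimize.fmin directly with the same assertion.

```python
def rosen(x):
    """N-dimensional Rosenbrock function; global minimum 0 at x = (1, ..., 1)."""
    return sum(100.0 * (b - a * a) ** 2 + (1.0 - a) ** 2 for a, b in zip(x[:-1], x[1:]))


def nelder_mead(f, x0, step=0.1, xtol=1e-8, ftol=1e-12, max_iter=10_000):
    """Minimal Nelder-Mead simplex: reflect, expand, inside-contract, shrink."""
    n = len(x0)
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        size = max(abs(p[j] - best[j]) for p in simplex[1:] for j in range(n))
        if size < xtol and f(worst) - f(best) < ftol:
            break  # simplex has collapsed onto (approximately) one point
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f_r else reflected
        elif f_r < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:  # shrink every vertex halfway toward the best one
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)


x_min = nelder_mead(rosen, [-1.2, 1.0])
assert all(abs(v - 1.0) < 1e-3 for v in x_min), f"regression: search ended at {x_min}"
```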

Now I'm sure I've used up my opinion budget.

-- 
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die."
  -- Richard Harter



