[SciPy-dev] layering discussion
Fernando Perez
Fernando.Perez at colorado.edu
Fri Sep 23 00:21:19 CDT 2005
sent from scipy bof...
Level 0: A multidimensional array object (possibly in the core Python language)
-------------------------------------------------------------------------------
These arrays should support at least arithmetic operations, and hopefully have
a well-defined ufunc API. It would be even better if the math library
functions could handle them either transparently or with a mirror
'arraymath/amath' library of equivalent array functions.
In terms of additional functionality, only random arrays should be supported,
perhaps via the random module in the stdlib. More sophisticated things like
FFTs, linear algebra, Lapack access, etc., do not belong in the core.
I'm torn about whether a Matrix object (for which A*B performs true matrix
multiplication rather than elementwise multiplication) should go there as
well. It feels like a bit much, but it would round out the basic datatype
support for numerical work really nicely.
The library writers can then add the _functionality_ that would build upon
these basic types.
[ Note added by Pearu Peterson:
Note that Matrix could use optimized blas/lapack libraries. So either
Matrix should not go into the level 0 layer, or there should be a
well-defined interface for enhancing matrix operations (matrix*vector,
matrix*matrix, inverse(matrix), etc.). Maybe level 1, where linear
algebra routines are available, is a better place for Matrix.
]
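The array-vs-Matrix distinction above is easy to see in code. Here is a
minimal sketch in plain Python; lists stand in for the proposed array type
purely to illustrate the two multiplication semantics being discussed:

```python
# Elementwise product vs. true matrix product. In the proposed design,
# A*B on arrays would give the first behavior, and a Matrix object the
# second.

def elementwise_mul(a, b):
    """Multiply two 2-D 'arrays' entry by entry (array semantics)."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def matmul(a, b):
    """True matrix product (Matrix semantics)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

elementwise_mul(A, B)  # [[5, 12], [21, 32]]
matmul(A, B)           # [[19, 22], [43, 50]]
```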
Level 1 - Generic scientific computing functionality
-----------------------------------------------------
The rest of what today goes into numerix, perhaps merged into a scipy_base
package and made very easy to install, with all dependencies shipped and
binary installers provided regularly for all platforms. This would provide
the minimal set of functionality for common 'light' scientific use, and it
should include plotting. It would be something at the level of a basic matlab
or octave, without any extension toolboxes.
Matplotlib seems like a good candidate for integration at this level. Yes,
choice is good. But having to study the pros and cons of 40 plotting
libraries before you can plot a line is a sure way to send new potential users
running for the hills. The python community is having the same problem with
the whole 'web frameworks' topic, and sometimes these issues are best
addressed simply by making a solid choice, and living with it. Individual
users can still use whatever personal favorite they want, but people can
assume _something_ to look at data will work out of the (level 1) box.
No compilers should be required for end-users at this point (obviously
packagers will need compilers, including Fortran ones for lapack/blas
wrappers).
This level would include FFTs, basic linear algebra (lapack/blas wrapping),
and perhaps numerical integration.
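As a flavor of what "numerical integration" means at this level, here is a
pure-Python trapezoid rule. This is only an illustration of the kind of
facility meant; the actual level 1 would wrap quadpack rather than
reimplement anything:

```python
import math

def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] with n trapezoids."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# The integral of sin(x) over [0, pi] is exactly 2.
approx = trapezoid(math.sin, 0.0, math.pi)
```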
This level will also include any additional packages which satisfy these
constraints:
a - Are 100% guaranteed, proven, to be easy and risk-free installations,
with the possibility of ensuring up-to-date binary distributions for
*nix, OSX and win32.
b - Do not require a compiler in the target system to be _used_.
c - Have a reasonably wide potential audience (tools of interest only to
super-specialized communities belong in level 3).
Once this Level is reasonably well defined, a tutorial written specifically to
target this would be extremely useful. It could be based on the current
scipy/numerix/pylab documentation.
Level 2 - Infrastructure and extension support level
----------------------------------------------------
A set of tools to support a more complex package infrastructure and extensions
in other languages. Note that this is mostly a 'glue' layer, without much
scientific functionality proper (except that the external code tools do end up
being useful to scientists who do significant code development).
Something like scipy_distutils, weave, f2py, and whatever else is needed for
easy packaging, installation and distribution of the more sophisticated
functionality.
At this point, even end users (not only packagers) will be expected to have
Fortran/C/C++ compilers available.
This level means both a set of tools AND of guidelines for where things go,
how to express dependencies, etc. Everything below this will assume this
level to be satisfied, AND will conform to these guidelines. This means that
some naming and packaging conventions will need to be clearly spelled out and
documented.
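To make the "express dependencies" guideline concrete, one possible shape is
a small metadata record per package that level 2 tools can check
mechanically. All names and keys below are invented for illustration;
nothing here is an actual scipy convention:

```python
# Hypothetical package metadata, sketched as plain dictionaries. The
# keys ('level', 'requires') and the package list are made up.
PACKAGES = {
    "scipy_base": {"level": 1, "requires": []},
    "f2py":       {"level": 2, "requires": ["scipy_base"]},
    "sparse":     {"level": 3, "requires": ["scipy_base", "f2py"]},
}

def missing_requirements(name, installed):
    """Return the declared requirements of `name` absent from `installed`."""
    return [dep for dep in PACKAGES[name]["requires"]
            if dep not in installed]

missing_requirements("sparse", {"scipy_base"})  # ['f2py']
```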
Level 3 - High-level, specialized and third-party packages
----------------------------------------------------------
These packages are installable independently of one another (though some may
depend on others, in which case the dependency should be clearly stated, and
hopefully an automatic dependency mechanism will handle it). These could be
considered 'toolboxes', and will range from fairly common things to
domain-specific tools only of interest to specialists.
Much of what today is scipy would live on level 3, and a scipy-full could
still be distributed for most users. This scipy-full would encompass a lot of
functionality, since as long as it comes packaged together, all it does is
take space on disk for people. Much like what Mathematica is today: a huge
library of mathematical functions, the bulk of which most users never need,
but which is great to have at your fingertips just in case.
But the specs laid out in level 2 (and the functionality level 2 provides)
would also enable third-party projects to distribute their own level3 tools,
even outside of the scipy project management and release cycles. In this
manner, very specialized tools (only of interest to small communities, but
potentially very important for them) could be distributed and live in harmony
with the rest of the system, taking advantage of the common facilities
provided by levels 0-2.
At this level, pretty much anything goes. Difficult installations,
platform-specific packages, compiler-dependent code, ... The scipy project may
not distribute such nasties, but third parties are free to do as they wish.
But we do hope that third-party packages will rely on the level 0-2
functionality and conventions, to make life easier for the community at large.
It should be noted that scipy developers have started to move wrappers
for various libraries such as blas, lapack, fftpack, quadpack, etc. into a
"lib" package that should be used by higher-level (>=3) packages as basic
computational tools. So all packages at this level can use
scipy.lib.foo* as engines for their internal needs.
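A level-3 package could adopt the usual optional-engine pattern here: try
the shared scipy.lib wrapper, and fall back to a slow pure-Python version
when it is absent. The import path below is only indicative of the idea; a
real package would use whatever name the level 2 conventions settle on:

```python
def _py_dot(x, y):
    """Pure-Python fallback dot product."""
    return sum(a * b for a, b in zip(x, y))

try:
    # Hypothetical import path, shown only to illustrate the pattern.
    from scipy.lib.blas import ddot as dot
except ImportError:
    # No optimized engine available; use the slow fallback.
    dot = _py_dot

dot([1.0, 2.0], [3.0, 4.0])  # 11.0
```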
Other notes by Pearu
--------------------
The main difficulty in installing scipy is providing the optimized
blas/lapack libraries that scipy.linalg can _optionally_ use. As a
rule, when ATLAS or the blas/lapack libraries are properly installed,
the scipy build scripts have no problems using them. However, some
systems ship broken or incomplete blas/lapack libraries, and when
scipy uses them all sorts of bad things can happen. This is imho the
main reason people find scipy difficult to install: in such cases they
end up rebuilding the ATLAS/lapack libraries themselves, which may
take some experience to get right on the first iteration. So, imho
there are no "difficult to install" packages in this layer, since
linalg, the package most sensitive to broken systems, is already in
level 1.
Originally scipy also used linear leveling to organize its tools, but
that did not work for long. The ordering of mathematical concepts and
the ordering of the corresponding implementation details often do not
match, so such a simplified organization into linear levels may not be
practical. A trivial example is where Matrix should go: mathematically
it belongs at a low level, while its implementation (of useful
methods) may require higher-level tools. So maybe we should first
recognize which packages (scipy, numerix, etc.) can be implemented
independently of the others, work out the complete tree of
dependencies among the implementation details, and then see what a
practical structure for the "scientific computing functionality for
python" concept would be.