[SciPy-User] Scipy stack: standard packages (poll)
Wed Oct 3 20:00:10 CDT 2012
On Wed, Oct 3, 2012 at 12:06 PM, Thomas Kluyver <email@example.com> wrote:
> Following on from recent discussion here and on the numfocus list, I'm
> trying to work out the set of packages that should make up a
> standardised 'scipy stack'. We've determined that Python, numpy,
> scipy, matplotlib and IPython are to be included. Then there's a list
> that have got a 'maybe': pandas, statsmodels, sympy, scikits-learn,
> scikits-image, PyTables, h5py, NetworkX, nose, basemap & netCDF4.
> My aim is to have a general set of packages that you can do useful
> work with, and will stand up to the competition (particularly Matlab &
> R), but without gaining too many subject-specific packages. But I
> don't know what's generally useful and what's subject specific.
> Vote at: http://www.doodle.com/ma6rnpnbfc6wivu9
> It's set up so you can vote for or against a package, or abstain if
> you're not sure - I've abstained on most of them myself.
Why I'm in favor of a "Big Scipy":
Using Travis's popularity criterion: Google reports "About 104,000
results" for "from scipy import stats".
scipy.stats is a bit of an outlier among the scipy subpackages in that
it is more application oriented. It uses many tools from other
scipy subpackages.
scipy.stats is in turn used by many application packages that
don't want to bother coding their own version of the statistics.
If you are in a field with a strong python background, then there are
field specific packages available, cars, sherpa in the recent spectra
discussion, nipy/pymvpa, pysal, ...
If you are not in one of those python fields (or want to try something
non-standard), then you have to use a general purpose library, or code
it yourself.
scikit-learn, statsmodels and scikit-image try to be the general
purpose extension of scipy (the package), and there is a lot of useful
and reusable code.
For example, clustering with sklearn, or a linear regression (or a
polyfit if you have outliers) with statsmodels:
that's not field specific.
(I'm not using scikits-image, but I assume there are similar features,
given the mailing list)
(I would also like to use a scikits-signal, but it's still vaporware.)
As a user I don't care (much) about a new meta-package, python-xy and
Gohlke have (almost) all I need an easy_install away, and a lot more
than is under discussion here.
Where I do see a potentially big advantage as a maintainer of
statsmodels is in code sharing and being able to rely on more
consistent package versions by users.
Currently we are reluctant to add any additional dependencies to
statsmodels not only because it requires more work by users, but also
because it requires work for us to keep track of changes across
versions of the different packages.
We currently maintain compatibility modules for python between 2.5 and
3.2, and for numpy >= 1.4, scipy >= 0.7 and pandas > 0.7.1. Increasing
the number of dependencies increases the number of version
combinations that need to be tested.
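
To make the combinatorics concrete, here is a small sketch using only the standard library. The version lists are hypothetical placeholders (only the lower bounds above come from this post), and `extra_dep` is an invented name for one additional dependency:

```python
from itertools import product

# Hypothetical release lists per dependency; not the actual statsmodels test matrix.
supported = {
    "python": ["2.5", "2.6", "2.7", "3.2"],
    "numpy": ["1.4", "1.5", "1.6"],
    "scipy": ["0.7", "0.8", "0.9", "0.10", "0.11"],
    "pandas": ["0.8", "0.9"],
}


def count_combinations(versions):
    """Number of distinct version combinations: the product of the list lengths."""
    total = 1
    for releases in versions.values():
        total *= len(releases)
    return total


def iter_combinations(versions):
    """Yield every combination as a dict mapping package name to version."""
    names = list(versions)
    for combo in product(*(versions[name] for name in names)):
        yield dict(zip(names, combo))


baseline = count_combinations(supported)
# Adding one more dependency with three supported releases multiplies the matrix by three.
with_extra = count_combinations({**supported, "extra_dep": ["1.0", "1.1", "1.2"]})
print(baseline, with_extra)
```

The count grows multiplicatively, not additively, which is why each new dependency is expensive even if it only has a handful of supported releases.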
That's also a good reason for me not to split up scipy: keeping track
of the versions of 9 packages (linalg, optimize, signal, sparse, stats,
fftpack, integrate, interpolate, special, and maybe some others)
sounds like a lot of fun. (I wouldn't mind splitting off
I would prefer to go the other way, and have a "scipy-big", where I
can use any functions from any of the packages without having to worry
too much about whether they are available on a user's machine or about
version compatibility across packages.
As a statsmodels developer I would be glad about the additional
advertising and the hopefully faster development of, or convergence to,
a standard through the scipy-stack discussed here. But, at least in
the "data-analysis" area, I think we are well on our way to getting to
the "big-scipy" and filling in the major gaps compared to other
languages or data analysis packages.