[SciPy-User] ANN: statsmodels 0.4.0
Fri Apr 27 13:52:00 CDT 2012
We are pleased to announce the release of statsmodels 0.4.0.
The big changes in this release are that most models can now be used
with Pandas dataframes, and that we dropped the scikits namespace.
Importing scikits.statsmodels is still possible but will be removed in
Pandas is now a required dependency.
For more changes including some breaks in backwards compatibility see below.
Josef and Skipper
What it is
Statsmodels is a Python package that provides a complement to scipy
for statistical computations including descriptive statistics and
estimation and inference for statistical models.
Documentation for the 0.4 version is currently at
Main Changes and Additions in 0.4.0
* Added pandas dependency.
* Cython source is built automatically if Cython and a compiler are present
* Support for dates in time series models
* Improved plots
- Violin plots
- Bean plots
- QQ plots
* Added lowess function
* Support for pandas Series and DataFrame objects. Results instances return
pandas objects if the models are fit using pandas objects.
* Full Python 3 compatibility
* Fixed bugs in genfromdta, which converts the Stata .dta format to a
structured array, preserving all types. Conversion is much faster now.
* Improved documentation
* Models and results are pickleable via save/load, optionally saving the model
* Kernel Density Estimation now uses Cython and is considerably faster.
* Diagnostics for outlier and influence statistics in OLS
* Added El Nino Sea Surface Temperatures dataset
* Numerous bug fixes
* Internal code refactoring
* Improved documentation including examples as part of HTML
*Changes that break backwards compatibility*
* Deprecated scikits namespace. The recommended import is now::
import statsmodels.api as sm
* The signature of the model.predict methods is now (params, exog, ...).
Previously, predict assumed that the model had been fit and omitted the
params argument. (This removes the circularity between model and results
instances.)
* For consistency with other multi-equation models, the parameters of MNLogit
are now transposed.
* tools.tools.ECDF -> distributions.ECDF
* tools.tools.monotone_fn_inverter -> distributions.monotone_fn_inverter
* tools.tools.StepFunction -> distributions.StepFunction
Main Features
* linear regression models: generalized least squares (including
weighted least squares and least squares with autoregressive errors),
ordinary least squares.
* glm: Generalized linear models with support for all of the one-parameter
exponential family distributions.
* discrete: regression with discrete dependent variables, including
Logit, Probit, MNLogit, Poisson, based on maximum likelihood
* rlm: Robust linear models with support for several M-estimators.
* tsa: models for time series analysis
- univariate time series analysis: AR, ARIMA
- vector autoregressive models, VAR and structural VAR
- descriptive statistics and process models for time series analysis
* nonparametric: (univariate) kernel density estimators
* datasets: Datasets to be distributed and used for examples and in testing.
* stats: a wide range of statistical tests
- diagnostics and specification tests
- goodness-of-fit and normality tests
- functions for multiple testing
- various additional statistical tests
- Tools for reading Stata .dta files into numpy arrays.
- printing table output to ascii, latex, and html
* miscellaneous models
* sandbox: statsmodels contains a sandbox folder with code in various stages of
development and testing which is not considered "production ready".
This covers among others Mixed (repeated measures) Models, GARCH
models, general method
of moments (GMM) estimators, kernel regression, various extensions
panel data models, generalized additive models and information
Where to get it
The master branch on GitHub is the most up to date code
Source downloads of release tags are available on GitHub
Binaries and source distributions are available from PyPI