[Numpy-discussion] ANN: statsmodels 0.4.0

josef.pktd@gmai... josef.pktd@gmai...
Fri Apr 27 13:52:00 CDT 2012

We are pleased to announce the release of statsmodels 0.4.0.

The big changes in this release are that most models can now be used
with Pandas dataframes, and that we dropped the scikits namespace.
Importing scikits.statsmodels is still possible but will be removed in
the future.
Pandas is now a required dependency.

For more changes, including some breaks in backwards compatibility, see below.

Josef and Skipper

What it is

Statsmodels is a Python package that provides a complement to scipy
for statistical computations including descriptive statistics and
estimation and inference for statistical models.

Documentation for the 0.4 version is currently at

Main Changes and Additions in 0.4.0

* Added pandas dependency.
* Cython source is built automatically if Cython and a compiler are present
* Support use of dates in timeseries models
* Improved plots
  - Violin plots
  - Bean plots
  - QQ plots
* Added lowess function
* Support for pandas Series and DataFrame objects. Results instances return
  pandas objects if the models are fit using pandas objects.
* Full Python 3 compatibility
* Fix bugs in genfromdta. Convert Stata .dta format to structured array
  preserving all types. Conversion is much faster now.
* Improved documentation
* Models and results are pickleable via save/load, optionally saving the model
* Kernel Density Estimation now uses Cython and is considerably faster.
* Diagnostics for outlier and influence statistics in OLS
* Added El Nino Sea Surface Temperatures dataset
* Numerous bug fixes
* Internal code refactoring
* Improved documentation including examples as part of HTML
* ...
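
To illustrate the save/load item above, here is a minimal sketch of the
general pattern, using only the standard library. ToyResults, its
remove_data flag and the file name are hypothetical stand-ins chosen for
this illustration; they are not the statsmodels implementation, and the
reading of "optionally" as "with or without the underlying data" is an
assumption.

```python
import os
import pickle
import tempfile


class ToyResults:
    """Hypothetical results container (not a statsmodels class)."""

    def __init__(self, params, data=None):
        self.params = params
        self.data = data  # potentially large arrays used during estimation

    def save(self, fname, remove_data=False):
        # Assumption for this sketch: the caller may drop the data
        # before pickling to keep the file small.
        state = {"params": self.params,
                 "data": None if remove_data else self.data}
        with open(fname, "wb") as fh:
            pickle.dump(state, fh)

    @classmethod
    def load(cls, fname):
        # Rebuild the object from the plain dict stored by save().
        with open(fname, "rb") as fh:
            state = pickle.load(fh)
        return cls(state["params"], state["data"])


results = ToyResults(params=[0.5, 2.0], data=[[1.0, 2.0], [1.0, 3.0]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "results.pickle")
    results.save(path, remove_data=True)
    restored = ToyResults.load(path)
```

After the round trip, restored.params holds the estimates while
restored.data is None, because the data was dropped on save.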

*Changes that break backwards compatibility*

* Deprecated scikits namespace. The recommended import is now::
      import statsmodels.api as sm

* The signature of the model.predict methods is now (params, exog, ...).
  Previously, predict assumed that the model had already been fit and
  omitted the params argument. (This removes the circularity between
  model and results instances.)
* For consistency with other multi-equation models, the parameters of MNLogit
  are now transposed.
* tools.tools.ECDF -> distributions.ECDF
* tools.tools.monotone_fn_inverter -> distributions.monotone_fn_inverter
* tools.tools.StepFunction -> distributions.StepFunction
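
The new predict calling convention can be sketched with a toy example.
ToyLinearModel is a hypothetical pure-Python class written for this
illustration, not statsmodels code; it only shows why passing params
explicitly means the model no longer needs a reference to a results
instance.

```python
class ToyLinearModel:
    """Hypothetical model: holds the design matrix, knows nothing about fits."""

    def __init__(self, exog):
        self.exog = exog  # list of rows, each row a list of regressors

    def predict(self, params, exog=None):
        # params is passed in explicitly, so the model never has to look
        # up a results instance to find the estimates.
        if exog is None:
            exog = self.exog
        return [sum(p * x for p, x in zip(params, row)) for row in exog]


model = ToyLinearModel(exog=[[1.0, 2.0], [1.0, 3.0]])
params = [0.5, 2.0]  # pretend these came from an earlier fit

in_sample = model.predict(params)                      # uses the model's own exog
out_of_sample = model.predict(params, [[1.0, 10.0]])   # uses new exog
```

Code written against the old convention, which called predict on a model
without params, would need to pass the estimated parameters explicitly.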

Main Features

* linear regression models: Generalized least squares (including
  weighted least squares and least squares with autoregressive errors),
  ordinary least squares.
* glm: Generalized linear models with support for all of the one-parameter
  exponential family distributions.
* discrete: regression with discrete dependent variables, including
  Logit, Probit, MNLogit, Poisson, based on maximum likelihood
* rlm: Robust linear models with support for several M-estimators.
* tsa: models for time series analysis
  - univariate time series analysis: AR, ARIMA
  - vector autoregressive models, VAR and structural VAR
  - descriptive statistics and process models for time series analysis
* nonparametric: (Univariate) kernel density estimators
* datasets: Datasets to be distributed and used for examples and in testing.
* stats: a wide range of statistical tests
  - diagnostics and specification tests
  - goodness-of-fit and normality tests
  - functions for multiple testing
  - various additional statistical tests
* iolib
  - Tools for reading Stata .dta files into numpy arrays.
  - printing table output to ascii, latex, and html
* miscellaneous models
* sandbox: statsmodels contains a sandbox folder with code in various stages of
  development and testing which is not considered "production ready".
  This covers, among others, Mixed (repeated measures) Models, GARCH models,
  general method of moments (GMM) estimators, kernel regression, various
  extensions to scipy.stats.distributions, panel data models, generalized
  additive models and information theoretic measures.

Where to get it

The master branch on GitHub has the most up-to-date code


Source downloads of release tags are available on GitHub


Binaries and source distributions are available from PyPI

