[SciPy-user] Modular toolkit for Data Processing (MDP) released

Pietro Berkes and Tiziano Zito p.berkes at biologie.hu-berlin.de
Tue Aug 24 13:05:17 CDT 2004

We are pleased to announce the first public release of the MDP library
for Python (http://mdp-toolkit.sourceforge.net). This package has been
developed in the context of computational neuroscience research, but
it should fit the needs of a larger audience of scientists and

Modular toolkit for Data Processing (MDP) is a Python library for
implementing data processing elements (nodes) and combining them into
data processing sequences (flows).

A node corresponds to a learning algorithm or to a generic data
processing unit. Each node can have a training phase, during which the
internal structures are learned from training data (e.g. the weights
of a neural network are adapted or the covariance matrix is estimated),
and an execution phase, in which new data can be processed forwards (by
passing the data through the node) or backwards (by applying the
inverse of the transformation computed by the node, if defined). MDP is
designed to make the implementation of new algorithms easy and
intuitive, for example by automatically setting the input and output
dimensions and by casting the data to match the typecode (e.g. float or
double) of the internal structures. The nodes are designed to be
applied to arbitrarily long sets of data: the internal structures can
be updated successively by sending chunks of the input data (this is
equivalent to online learning if the chunks consist of single
observations, or to batch learning if the whole data set is sent in a
single chunk). Already implemented nodes include Principal Component
Analysis (PCA), Independent Component Analysis (ICA), and Slow
Feature Analysis (SFA).
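To make the node life cycle concrete, here is a minimal pure-Python sketch of a node that standardizes each input dimension. The class name `StandardizeNode` and the method names `train`, `stop_training`, `execute` and `inverse` are illustrative assumptions, not necessarily the MDP API, and plain lists stand in for numerical arrays. The node keeps only running sums, so sending the data one observation at a time (online) or in a single chunk (batch) yields the same internal structures; it also fixes its input dimension from the first observation and casts the input to its internal type.

```python
import math


class StandardizeNode:
    """Hypothetical node: learns per-dimension mean and standard deviation.

    Only running sums are stored, so training can be split into any
    number of chunks without changing the result.
    """

    def __init__(self, dtype=float):
        self.dtype = dtype        # internal type the input is cast to
        self.input_dim = None     # set automatically from the first observation
        self._n = 0
        self._sum = None
        self._sumsq = None
        self.mean = None
        self.std = None

    def train(self, chunk):
        """Update the running sums with a chunk (a list of observations)."""
        if self.mean is not None:
            raise RuntimeError("the training phase is already closed")
        for obs in chunk:
            x = [self.dtype(v) for v in obs]
            if self.input_dim is None:
                self.input_dim = len(x)
                self._sum = [0.0] * self.input_dim
                self._sumsq = [0.0] * self.input_dim
            elif len(x) != self.input_dim:
                raise ValueError("observation has the wrong dimension")
            for i, v in enumerate(x):
                self._sum[i] += v
                self._sumsq[i] += v * v
            self._n += 1

    def stop_training(self):
        """Close the training phase and compute the internal structures."""
        self.mean = [s / self._n for s in self._sum]
        var = [q / self._n - m * m for q, m in zip(self._sumsq, self.mean)]
        # guard against constant dimensions (zero or rounding-negative variance)
        self.std = [math.sqrt(v) if v > 0 else 1.0 for v in var]

    def _check_trained(self):
        if self.mean is None:
            raise RuntimeError("the node has not been trained yet")

    def execute(self, chunk):
        """Forward processing: standardize each observation."""
        self._check_trained()
        return [[(self.dtype(v) - m) / s for v, m, s in zip(obs, self.mean, self.std)]
                for obs in chunk]

    def inverse(self, chunk):
        """Backward processing: undo the standardization."""
        self._check_trained()
        return [[v * s + m for v, m, s in zip(obs, self.mean, self.std)]
                for obs in chunk]


data = [[1, 2], [3, 4], [5, 9]]          # integers are cast to float

online = StandardizeNode()
for obs in data:                          # one observation per chunk: online learning
    online.train([obs])
online.stop_training()

batch = StandardizeNode()
batch.train(data)                         # all data in a single chunk: batch learning
batch.stop_training()

print(online.mean == batch.mean and online.std == batch.std)  # True
print(batch.inverse(batch.execute(data)))  # the original data up to rounding
```

Computing the variance from running sums of squares keeps each chunk update cheap, but it can lose precision when the mean is large compared to the spread; a real implementation might prefer a numerically stabler update such as Welford's algorithm.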

A flow consists of an acyclic graph of nodes (currently only node
sequences are implemented). The data is sent to an input node and is
successively processed by the following nodes in the graph. The
general flow implementation automates the training, execution and
inverse execution (if defined) of the whole graph. A subclass of the
basic flow class allows user-supplied checkpoint functions to be
executed at the end of each phase, for example to save the internal
structures of a node for later analysis.
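The flow mechanism can be sketched in the same hedged spirit. The `Flow` and `CheckpointFlow` classes and the two toy nodes below are hypothetical stand-ins, not the MDP classes, and they only handle a linear sequence of nodes: each node is trained in turn on the chunks after they have been processed by the nodes already trained, execution runs the nodes in order, and inversion runs their inverses in reverse order. The subclass calls a user-supplied checkpoint function at the end of each training phase; here it merely records node names, where a real checkpoint might save the node's internal structures to disk.

```python
class MeanNode:
    """Hypothetical node: removes the per-dimension mean learned in training."""

    def __init__(self):
        self.n, self.total, self.mean = 0, None, None

    def train(self, chunk):
        for obs in chunk:
            self.total = list(obs) if self.total is None else [t + v for t, v in zip(self.total, obs)]
            self.n += 1

    def stop_training(self):
        self.mean = [t / self.n for t in self.total]

    def execute(self, chunk):
        return [[v - m for v, m in zip(obs, self.mean)] for obs in chunk]

    def inverse(self, chunk):
        return [[v + m for v, m in zip(obs, self.mean)] for obs in chunk]


class ScaleNode:
    """Hypothetical node: divides each dimension by its largest absolute value."""

    def __init__(self):
        self.scale = None

    def train(self, chunk):
        for obs in chunk:
            mags = [abs(v) for v in obs]
            self.scale = mags if self.scale is None else [max(a, b) for a, b in zip(self.scale, mags)]

    def stop_training(self):
        self.scale = [s if s > 0 else 1.0 for s in self.scale]  # avoid division by zero

    def execute(self, chunk):
        return [[v / s for v, s in zip(obs, self.scale)] for obs in chunk]

    def inverse(self, chunk):
        return [[v * s for v, s in zip(obs, self.scale)] for obs in chunk]


class Flow:
    """Hypothetical flow restricted to a linear sequence of nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def train(self, chunks):
        chunks = list(chunks)  # every chunk is replayed once per node
        for k, node in enumerate(self.nodes):
            for chunk in chunks:
                for prev in self.nodes[:k]:  # preprocess with the already trained nodes
                    chunk = prev.execute(chunk)
                node.train(chunk)
            node.stop_training()
            self._end_of_phase(node)

    def _end_of_phase(self, node):
        pass  # hook for subclasses; the basic flow does nothing here

    def execute(self, chunk):
        for node in self.nodes:
            chunk = node.execute(chunk)
        return chunk

    def inverse(self, chunk):
        for node in reversed(self.nodes):  # undo the nodes in reverse order
            chunk = node.inverse(chunk)
        return chunk


class CheckpointFlow(Flow):
    """Calls a user-supplied function at the end of each training phase."""

    def __init__(self, nodes, checkpoint):
        super().__init__(nodes)
        self.checkpoint = checkpoint

    def _end_of_phase(self, node):
        self.checkpoint(node)


saved = []
flow = CheckpointFlow([MeanNode(), ScaleNode()],
                      checkpoint=lambda node: saved.append(type(node).__name__))
flow.train([[[1.0, 2.0], [3.0, 4.0]], [[5.0, 9.0]]])  # two chunks
out = flow.execute([[5.0, 9.0]])
print(out)                # [[1.0, 1.0]]
print(flow.inverse(out))  # [[5.0, 9.0]]
print(saved)              # ['MeanNode', 'ScaleNode']
```

Replaying the chunks once per node keeps memory usage independent of the data length, at the cost of passing over the data several times; this is one plausible training strategy, not a description of MDP's internals.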

Best regards,

     Pietro Berkes and Tiziano Zito

{p.berkes, t.zito}@biologie.hu-berlin.de
Institute for Theoretical Biology
Humboldt University
Invalidenstrasse 43
D-10115 Berlin, Germany
