[SciPy-User] ANN: pandas 0.4.0 release

Travis Oliphant oliphant@enthought....
Mon Sep 12 15:56:34 CDT 2011

Congratulations Wes.

All that shiny functionality looks inviting.


(mobile phone of)
Travis Oliphant
Enthought, Inc.

On Sep 12, 2011, at 3:50 PM, Wes McKinney <wesmckinn@gmail.com> wrote:

> Dear all,
> I'm very pleased to announce the long-awaited release of the newest
> version of pandas. It's the product of an absolutely huge amount of
> development work primarily over the last 4 months. By the numbers:
> - Over 550 commits over 6 months
> - Codebase increased more than 60% in size
> - More than 300 new test functions, with overall > 97% line coverage
> The list of new features, improvements, and other changes is large,
> but the main bullet points are:
> - Significantly enhanced GroupBy functionality
> - Hierarchical indexing
> - New pivoting and reshaping methods
> - Improved PyTables/HDF5-based IO class
> - Improved flat file (CSV, delimited text) parsing functions
> - More advanced label-based indexing (getting/setting)
> - Refactored former DataFrame/DataMatrix class into a single unified
> DataFrame class
> - Host of new methods and speed optimizations
> - Memory-efficient "sparse" versions of data structures for mostly NA
> or mostly constant (e.g. 0) data
> - Better mixed-dtype handling and missing data support
> For the full list of new features and enhancements since the 0.3.0
> release, I refer interested people to the release notes on GitHub (see
> link below).
> In addition, the documentation (see below) has been nearly completely
> rewritten and expanded to cover almost all of the features of the
> library in great detail:
> http://pandas.sourceforge.net
> I expect more frequent releases of pandas going forward, especially
> given the breadth and scope of the new functionality. I look forward
> to user feedback (good and bad) on all the new functionality. Special
> thanks to all the users who contributed bug reports, feature requests,
> and ideas to this release.
> best,
> Wes
>
> Links
> =====
> Release Notes: https://github.com/wesm/pandas/blob/master/RELEASE.rst
> Documentation: http://pandas.sourceforge.net
> Installers: http://pypi.python.org/pypi/pandas
> Code Repository: http://github.com/wesm/pandas
> Mailing List: http://groups.google.com/group/pystatsmodels
> Blog: http://blog.wesmckinney.com
>
> What is it
> ==========
> **pandas** is a `Python <http://www.python.org>`__ package providing fast,
> flexible, and expressive data structures designed to make working with
> "relational" or "labeled" data both easy and intuitive. It aims to be the
> fundamental high-level building block for doing practical, **real world** data
> analysis in Python. Additionally, it has the broader goal of becoming **the
> most powerful and flexible open source data analysis / manipulation tool
> available in any language**. It is already well on its way toward this goal.
> pandas is well suited for many different kinds of data:
>  - Tabular data with heterogeneously-typed columns, as in an SQL table or
>    Excel spreadsheet
>  - Ordered and unordered (not necessarily fixed-frequency) time series data
>  - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
>    column labels
>  - Any other form of observational / statistical data sets. The data actually
>    need not be labeled at all to be placed into a pandas data structure
> The two primary data structures of pandas, :class:`Series` (1-dimensional)
> and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
> cases in finance, statistics, social science, and many areas of
> engineering. For R users, :class:`DataFrame` provides everything that R's
> ``data.frame`` provides and much more. pandas is built on top of `NumPy
> <http://www.numpy.org>`__ and is intended to integrate well within a scientific
> computing environment with many other 3rd party libraries.
> Here are just a few of the things that pandas does well:
>  - Easy handling of **missing data** (represented as NaN) in floating point as
>    well as non-floating point data
>  - Size mutability: columns can be **inserted and deleted** from DataFrame and
>    higher dimensional objects
>  - Automatic and explicit **data alignment**: objects can be explicitly
>    aligned to a set of labels, or the user can simply ignore the labels and
>    let `Series`, `DataFrame`, etc. automatically align the data for you in
>    computations
>  - Powerful, flexible **group by** functionality to perform
>    split-apply-combine operations on data sets, for both aggregating and
>    transforming data
>  - Make it **easy to convert** ragged, differently-indexed data in other
>    Python and NumPy data structures into DataFrame objects
>  - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
>    of large data sets
>  - Intuitive **merging** and **joining** of data sets
>  - Flexible **reshaping** and pivoting of data sets
>  - **Hierarchical** labeling of axes (possible to have multiple labels per
>    tick)
>  - Robust IO tools for loading data from **flat files** (CSV and delimited),
>    Excel files, databases, and saving / loading data from the ultrafast **HDF5
>    format**
>  - **Time series**-specific functionality: date range generation and frequency
>    conversion, moving window statistics, moving window linear regressions,
>    date shifting and lagging, etc.
> Many of these principles are here to address the shortcomings frequently
> experienced using other languages / scientific research environments. For data
> scientists, working with data is typically divided into multiple stages:
> munging and cleaning data, analyzing / modeling it, then organizing the results
> of the analysis into a form suitable for plotting or tabular display. pandas
> is the ideal tool for all of these tasks.
> _______________________________________________
> SciPy-User mailing list
> SciPy-User@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
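
For anyone skimming the feature list quoted above, here is a minimal sketch of three of the items it names: automatic data alignment, missing data represented as NaN, and group by. It assumes a recent pandas installation, so the calls may not match the 0.4.0 API exactly. The data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels (hypothetical data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels, not positions. The label "x" has no
# counterpart in b, so its result is NaN (missing).
total = a + b

# Reductions skip missing values by default.
total_sum = total.sum()

# Split-apply-combine: split rows by a key column, then aggregate each group.
df = pd.DataFrame({
    "key": ["a", "b", "a", "b"],
    "value": [1.0, 2.0, 3.0, 4.0],
})
means = df.groupby("key")["value"].mean()

print(total)
print(total_sum)
print(means)
```

The point of the sketch is that none of the steps needs manual index bookkeeping: the labels carry through the arithmetic and the grouping.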
