[SciPy-User] ANN: pandas 0.4.2

Wes McKinney wesmckinn@gmail....
Mon Oct 3 00:12:28 CDT 2011


I'm pleased to announce the pandas 0.4.2 release. It includes a number
of bug fixes for the recent 0.4.1 version, along with a host of new
speed optimizations, primarily in the core data alignment and
joining/merging routines. The most significant enhancement is the
introduction of a specialized int64-based Index class, which will help
enable some of the fastest open source time series processing
available, using the new NumPy datetime64 dtype. Please see the full
release notes. Thanks to all for the feedback on recent releases and
for the bug reports.

best,
Wes

What is it
==========
pandas is a Python package providing fast, flexible, and expressive
data structures designed to make working with “relational” or
“labeled” data both easy and intuitive. It aims to be the fundamental
high-level building block for doing practical, real world data
analysis in Python. Additionally, it has the broader goal of becoming
the most powerful and flexible open source data analysis /
manipulation tool available in any language.

Links
=====
Release Notes: https://github.com/wesm/pandas/blob/master/RELEASE.rst
Documentation: http://pandas.sourceforge.net
Installers: http://pypi.python.org/pypi/pandas
Code Repository: http://github.com/wesm/pandas
Mailing List: http://groups.google.com/group/pystatsmodels
Blog: http://blog.wesmckinney.com

pandas 0.4.2 Release Notes
==========================

**Release date:** 10/3/2011

This is a performance optimization release with several bug fixes. The main
additions are the new Int64Index and the new merging / joining Cython code,
along with the related Python infrastructure.

**New features / modules**

  - Added fast `Int64Index` type with specialized join, union,
    intersection. Will result in significant performance enhancements for
    int64-based time series (e.g. using NumPy's datetime64 one day) and also
    faster operations on DataFrame objects storing record array-like data.
  - Refactored `Index` classes to have a `join` method and associated data
    alignment routines throughout the codebase to be able to leverage optimized
    joining / merging routines.
  - Added `Series.align` method for aligning two series with choice of join
    method
  - Wrote faster Cython data alignment / merging routines resulting in
    substantial speed increases
  - Added `is_monotonic` property to `Index` classes with associated Cython
    code to evaluate the monotonicity of the `Index` values
  - Added `get_level_values` method to `MultiIndex`
  - Implemented shallow copy of `BlockManager` object in `DataFrame` internals
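
As a rough illustration of the index operations listed above, here is a
minimal sketch. It is written against a recent pandas rather than 0.4.2, so
some spellings are assumptions on my part: the integer index is built with
`pd.Index(..., dtype="int64")` instead of naming `Int64Index` directly, and
monotonicity is checked with `is_monotonic_increasing` rather than the
`is_monotonic` property named in these notes. All data values are made up.

```python
import pandas as pd

# Two integer-labelled indexes with partial overlap (illustrative values).
left = pd.Index([1, 2, 3, 4], dtype="int64")
right = pd.Index([3, 4, 5], dtype="int64")

# Set-style operations on the index labels.
union = left.union(right)                # labels present in either index
intersection = left.intersection(right)  # labels present in both

# Index.join picks the resulting labels according to the join method.
outer = left.join(right, how="outer")
inner = left.join(right, how="inner")

# Monotonicity check (spelled `is_monotonic` in the 0.4.2 notes).
is_sorted = left.is_monotonic_increasing

# Series.align reindexes both series onto a common index.
s1 = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3])
s2 = pd.Series([1.0, 2.0], index=[3, 4])
a1, a2 = s1.align(s2, join="outer")      # missing labels become NaN

# MultiIndex.get_level_values returns the labels of one level.
mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["key", "num"]
)
first_level = mi.get_level_values(0)
```

After `align`, both `a1` and `a2` share the index `[1, 2, 3, 4]`, so
element-wise arithmetic between them needs no further alignment.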

**Improvements to existing features**

  - Improved performance of `isnull` and `notnull`, a regression from v0.3.0
    (GH #187)
  - Wrote templating / code generation script to auto-generate Cython code for
    various functions which need to be available for the 4 major data types
    used in pandas (float64, bool, object, int64)
  - Refactored code related to `DataFrame.join` so that intermediate aligned
    copies of the data in each `DataFrame` argument do not need to be
    created. Substantial performance increases result (GH #176)
  - Substantially improved performance of generic `Index.intersection` and
    `Index.union`
  - Improved performance of `DateRange.union` with overlapping ranges and
    non-cacheable offsets (like Minute). Implemented analogous fast
    `DateRange.intersection` for overlapping ranges.
  - Implemented `BlockManager.take` resulting in significantly faster `take`
    performance on mixed-type `DataFrame` objects (GH #104)
  - Improved performance of `Series.sort_index`
  - Significant groupby performance enhancement: removed unnecessary integrity
    checks in DataFrame internals that were slowing down slicing operations to
    retrieve groups
  - Added informative Exception when passing dict to DataFrame groupby
    aggregation with axis != 0

**API Changes**

None

**Bug fixes**

  - Fixed minor unhandled exception in Cython code implementing fast groupby
    aggregation operations
  - Fixed bug in unstacking code manifesting with more than 3 hierarchical
    levels
  - Throw exception when step specified in label-based slice (GH #185)
  - Fix isnull to correctly work with np.float32. Fix upstream bug described in
    GH #182
  - Finish implementation of as_index=False in groupby for DataFrame
    aggregation (GH #181)
  - Raise SkipTest for pre-epoch HDFStore failure. Real fix will be sorted out
    via datetime64 dtype
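
Two of the fixes above are easy to check from user code. The sketch below
assumes a recent pandas and NumPy and uses made-up data: it calls `isnull` on
a float32 array and runs a groupby aggregation with `as_index=False`, which
keeps the group key as an ordinary column.

```python
import numpy as np
import pandas as pd

# isnull on float32 data: the NaN entry should be flagged.
arr = np.array([1.0, np.nan], dtype=np.float32)
mask = pd.isnull(arr)

# as_index=False returns the group key as a column instead of the index.
df = pd.DataFrame({"key": ["a", "b", "a"], "val": [1, 2, 3]})
flat = df.groupby("key", as_index=False).sum()
```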

Thanks
------

- Uri Laserson
- Scott Sinclair

