[Numpy-discussion] ANN: PyTables 1.3 released
faltet at carabos.com
Sat Apr 1 11:01:05 CST 2006
Announcing PyTables 1.3
This is a new major release of PyTables. The most remarkable feature
added in this version is complete support (well, almost, because
unicode arrays are not there yet) for NumPy objects. Improved support
for native HDF5 is there as well. As an aside, I'm happy to inform you
that the PyTables web site (http://www.pytables.org) has been converted
into a wiki so that users can contribute to the project with recipes or
any other document. Try it out!
Go to the (new) PyTables web site for downloading the beast:
or keep reading for more info about the new features and bugs fixed.
Changes more in depth
- Support for NumPy objects in all the objects of PyTables, namely:
Array, CArray, EArray, VLArray and Table. All the numerical and
character (except unicode arrays) flavors are supported as well as
plain and nested heterogeneous NumPy arrays. PyTables leverages the
adoption of the array interface
(http://numeric.scipy.org/array_interface.html) for very efficient
conversion between numarray objects (numarray continues to be the
native flavor for PyTables) and NumPy/Numeric objects.
- The FLAVOR schema in PyTables has been refined and simplified. Now,
the only 'flavors' allowed for data objects are: "numarray", "numpy",
"numeric" and "python". The changes have been made so that they are
fully backward compatible with existing PyTables files. However, when
users try to use old flavors (like "Numeric" or "Tuple") in
existing code, a ``DeprecationWarning`` will be issued in order to
encourage them to migrate to the new flavors as soon as possible.
- Nested fields can be specified in the "field" parameter of Table.read
by using a '/' as a separator between fields (e.g. 'Info/value').
- The Table.Cols accessor has received a new ``__setitem__()`` method
that allows doing things like:
    table.cols[4] = record
    table.cols.x[4:1000:2] = array # homogeneous column
    table.cols.Info[4:1000:2] = recarray # nested column
- A clean-up function (using ``atexit``) has been registered so that
any files still open are closed when a user hits ^C, for
example. This helps to avoid ending up with corrupted files.
- Native HDF5 compound datasets that are contiguous are supported
now. Before, only chunked datasets were supported.
- Updated (and much improved) sections about compression issues in the
User's Guide. It includes new benchmarks made with PyTables 1.3 and an
exhaustive comparison between Zlib, LZO and bzip2.
- The HTML version of the manual is now built with the docbook2html package
for an improved look (IMO).
- Solved a problem when trying to save CharArrays with itemsize = 0 as
attributes of nodes. Now, these objects are pickled in order to
prevent HDF5 from crashing.
- Fixed some alignment issues with nested record arrays under certain
architectures (e.g. PowerPC).
- Fixed automatic conversions when a VLArray is read on a platform with
a byte ordering different from the file.
- Due to recurrent problems with the UCL compression library, it has
been declared deprecated from this version on. You can still compile
PyTables with UCL support (using the --force-ucl flag), but you are urged
not to use it anymore and to convert any existing data files that use UCL
to another supported library (zlib, lzo or bzip2) with the ``ptrepack``
utility.
- Please see the ``RELEASE-NOTES.txt`` file.
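The flavor migration described in the changes above (old names such as
"Numeric" or "Tuple" are still accepted, but trigger a
``DeprecationWarning``) could look roughly like the sketch below. It is
only an illustration and not the actual PyTables code: the
``normalize_flavor`` helper and the old-to-new mapping are assumptions
made up for the example.

```python
import warnings

# The four flavors named in the announcement.
NEW_FLAVORS = ("numarray", "numpy", "numeric", "python")

# Hypothetical mapping from old flavor names to new ones; the exact
# table used by PyTables is not given in the announcement.
OLD_TO_NEW = {"Numeric": "numeric", "Tuple": "python"}


def normalize_flavor(flavor):
    """Return a valid new-style flavor, warning about deprecated names."""
    if flavor in NEW_FLAVORS:
        return flavor
    if flavor in OLD_TO_NEW:
        new = OLD_TO_NEW[flavor]
        warnings.warn(
            "flavor %r is deprecated; use %r instead" % (flavor, new),
            DeprecationWarning,
            stacklevel=2,
        )
        return new
    raise ValueError("unsupported flavor: %r" % (flavor,))
```

Existing code keeps working with the old names, and the warning points
at the caller (``stacklevel=2``) so the line to migrate is easy to find.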
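To illustrate the '/' separator for nested fields mentioned in the
changes above (e.g. 'Info/value'), here is a small sketch that resolves
such a path in a plain nested dictionary. It does not use PyTables at
all: the ``read_nested_field`` helper and the sample ``row`` are
hypothetical and only show the idea of walking the path one field at a
time.

```python
def read_nested_field(record, field):
    """Resolve a '/'-separated field path in a nested mapping."""
    value = record
    for name in field.split("/"):
        value = value[name]
    return value


# Sample record with one plain field and one nested field.
row = {"x": 1.5, "Info": {"value": 42, "name": "probe"}}

print(read_nested_field(row, "Info/value"))  # prints 42
print(read_nested_field(row, "x"))           # prints 1.5
```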
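For readers who want a quick feel for the kind of compressor
comparison covered in the User's Guide, the sketch below compares zlib
and bzip2 using Python's standard library. It is not the benchmark from
the manual: LZO is left out because it has no standard-library module,
the sample data is made up, and results depend heavily on the data
being compressed.

```python
import bz2
import zlib

# Repetitive sample data; real results depend heavily on the data.
data = b"PyTables stores large tables in HDF5 files. " * 2000


def compression_report(payload):
    """Return {name: compressed size in bytes} for zlib and bzip2."""
    sizes = {}
    for name, module in (("zlib", zlib), ("bzip2", bz2)):
        packed = module.compress(payload)
        # Round-trip check: decompression must give back the input.
        assert module.decompress(packed) == payload
        sizes[name] = len(packed)
    return sizes


for name, size in compression_report(data).items():
    print("%s: %d -> %d bytes" % (name, len(data), size))
```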
Important note for Windows users
If you want to use PyTables with Python 2.4 on Windows platforms,
you will need to get the HDF5 library compiled for MSVC 7.1, aka .NET
2003. It can be found at:
Users of Python 2.3 on Windows will have to download the version of HDF5
compiled with MSVC 6.0, available at:
What it is
**PyTables** is a package for managing hierarchical datasets and
designed to efficiently cope with extremely large amounts of data (with
support for full 64-bit file addressing). It features an
object-oriented interface that, combined with C extensions for the
performance-critical parts of the code, makes it a very easy-to-use tool
for high performance data storage and retrieval.
PyTables runs on top of the HDF5 library and the numarray package (NumPy
and Numeric are also supported) to achieve maximum throughput.
Besides, PyTables I/O for table objects is buffered, implemented in C
and carefully tuned so that you can reach much better performance with
PyTables than with your own home-grown wrappers for the HDF5 library.
PyTables sports indexing capabilities as well, allowing selections
in tables exceeding one billion rows to complete in just seconds.
This version has been extensively checked on quite a few platforms, like
Linux on Intel32 (Pentium), Win on Intel32 (Pentium), Linux on Intel64
(Itanium2), FreeBSD on AMD64 (Opteron), Linux on PowerPC (and PowerPC64)
and MacOSX on PowerPC. For other platforms, chances are that the code
can be easily compiled and run without further issues. Please contact
us if you experience any problems.
Go to the PyTables web site for more details:
About the HDF5 library:
To know more about the company behind the PyTables development, see:
Thanks to the various users who provided feature improvements, patches,
bug reports, support and suggestions. See the ``THANKS`` file in the
distribution package for an (incomplete) list of contributors. Many
thanks also to SourceForge who have helped to make and distribute this
package! And last but not least, a big thank you to THG
(http://www.hdfgroup.org/) for sponsoring many of the new features
recently introduced in PyTables.
Share your experience
Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.
-- The PyTables Team
>0,0< Francesc Altet http://www.carabos.com/
V V Cárabos Coop. V. Enjoy Data