[Numpy-discussion] PyTables 0.5 released
falted at openlc.org
Sat May 10 04:40:01 CDT 2003
Announcing PyTables 0.5
This is the second public beta release. This release brings a 20%
I/O speed improvement over the previous one (0.4); some bugs have
been fixed, support for a couple of compression libraries (LZO and
UCL) has been added, and... a long-awaited Windows version is
available!
In more detail:
- As a consequence of some tweaking, the write/read performance has
been improved by 20% overall. One particular case where performance has
largely increased (0.5 is up to 6 times faster than 0.4) is when
column elements are unidimensional arrays. This impressive speed-up
is mainly because of the recent improvements in numarray 0.5
performance (good work, folks!). With that, the reading speed is
reaching its theoretical maximum (at least when using the current
data access schema).
- When reading a Table object, if the user wants to fetch column
elements which are unidimensional arrays, a copy of the array from
the I/O buffer is now delivered automatically, so there is no need
to call the .copy() method of the numarray arrays anymore. I think
this is more comfortable for the user.
- Compression was enabled by default in version 0.4, despite what
was stated in the documentation. This has now been corrected and
compression is *disabled* by default.
- Support for two new compression libraries: LZO and UCL
(http://www.oberhumer.com/opensource/). These libraries are made by
Markus F.X.J. Oberhumer, and they stand out for allowing *very* fast
decompression. Now, if your data is compressible, you can obtain
better reading speed than when not using compression at all! The
improvement is even more noticeable if you are dealing with
extremely large (and compressible) data sets. Read the online
documentation for more info about that:
- A couple of memory leaks have been isolated and fixed (it was
hard, but I finally did it!).
- A bug with column ordering of tables that happened in some special
situations has been fixed (thanks to Stan Heckman for reporting this
and suggesting the patch).
- The File class now has an 'isopen' attribute to check whether a
file is open or not.
- Updated documentation, especially to give advice about the use of
the new compression libraries. See the "Compression issues" subsection
(also on the web:
- Added more unit tests (up to 218 now!)
- PyTables has been tested against the newest numarray 0.5 and it works
just fine. It even works well with Python 2.3b1.
- And last, but not least, a Windows version is available! Thanks to
Alan McIntyre for porting it! There is even a binary ready to
click and install.
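
To make the compression point above more concrete, here is a minimal
sketch of why compressible data can be cheaper to read: fewer bytes
have to come off the disk. It assumes zlib from the Python standard
library as a stand-in (LZO and UCL are not in the standard library),
and the row layout is made up for illustration. It does not use the
PyTables API and it measures size only, not speed.

```python
import zlib

# Hypothetical, highly repetitive "table" payload: 10,000 short text rows.
raw = b"".join(b"%08d,0.000000,OK\n" % (i % 10) for i in range(10_000))

# zlib is only a stand-in for LZO/UCL here; level 1 favours speed over ratio.
packed = zlib.compress(raw, 1)

# Decompressing must give back exactly the original bytes.
restored = zlib.decompress(packed)

ratio = len(raw) / len(packed)
print(f"raw bytes:    {len(raw)}")
print(f"packed bytes: {len(packed)}")
print(f"ratio:        {ratio:.1f}x")
```

Whether this turns into a real reading speed-up depends on how fast
the decompressor is compared with your disk, which is exactly where
very fast decompressors are interesting.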
What it is
In short, PyTables provides a powerful and very Pythonic interface to
process and organize your table and array data on disk.
Its goal is to enable the end user to easily manipulate scientific
data tables and Numerical and numarray Python objects in a persistent
hierarchical structure. The foundation of the underlying hierarchical
data organization is the excellent HDF5 library
A table is defined as a collection of records whose values are stored
in fixed-length fields. All records have the same structure and all
values in each field have the same data type. The terms
"fixed-length" and strict "data types" seem to be quite a strange
requirement for an interpreted language like Python, but they serve a
useful function if the goal is to save very large quantities of data
(such as is generated by many scientific applications, for example) in
an efficient manner that reduces demand on CPU time and I/O resources.
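
As an illustration of the fixed-length idea (not of how PyTables
stores tables internally), the following sketch uses the standard
struct module. The record layout, the field names and the read_row
helper are all hypothetical. Because every record has the same size,
row N can be found by seeking directly to N times the record size,
without scanning the rows before it.

```python
import io
import struct

# Hypothetical record layout: 16-byte name, 32-bit int, 64-bit float.
# "<" means little-endian with no padding, so every record has the same size.
RECORD = struct.Struct("<16sid")

rows = [(b"alpha", 1, 1.5), (b"beta", 2, 2.5), (b"gamma", 3, 3.5)]

# Write all records back to back into an in-memory "file".
buf = io.BytesIO()
for name, ident, value in rows:
    buf.write(RECORD.pack(name, ident, value))


def read_row(f, index):
    """Fetch one record by seeking straight to index * record size."""
    f.seek(index * RECORD.size)
    name, ident, value = RECORD.unpack(f.read(RECORD.size))
    # Short names are padded with NUL bytes when packed; strip them again.
    return name.rstrip(b"\x00"), ident, value


third = read_row(buf, 2)
print(RECORD.size, third)
```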
Quite a bit of effort has been invested to make browsing the hierarchical
data structure a pleasant experience. PyTables implements just two
(orthogonal) easy-to-use methods for browsing.
What is HDF5?
For those people who know nothing about HDF5, it is a general-purpose
library and file format for storing scientific data made at
NCSA. HDF5 can store two primary objects: datasets and groups. A
dataset is essentially a multidimensional array of data elements, and
a group is a structure for organizing objects in an HDF5 file. Using
these two basic constructs, one can create and store almost any kind of
scientific data structure, such as images, arrays of vectors, and
structured and unstructured grids. You can also mix and match them in
HDF5 files according to your needs.
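
The two constructs can be pictured with a toy model like the one
below. This is only a sketch in plain Python: the Group and Dataset
classes, the walk helper and the node names are invented for
illustration and are neither the HDF5 nor the PyTables API. It shows
groups nesting other objects and datasets carrying a multidimensional
shape.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy stand-in for an HDF5 dataset: a named array with a shape."""
    name: str
    shape: tuple


@dataclass
class Group:
    """Toy stand-in for an HDF5 group: a named container of children."""
    name: str
    children: dict = field(default_factory=dict)

    def add(self, node):
        self.children[node.name] = node
        return node


def walk(group, prefix=""):
    """Yield the slash-separated path of every node below `group`."""
    for child in group.children.values():
        path = f"{prefix}/{child.name}"
        yield path
        if isinstance(child, Group):
            yield from walk(child, path)


root = Group("")
detector = root.add(Group("detector"))
detector.add(Dataset("image", (512, 512)))       # a 2-D image
detector.add(Dataset("velocities", (1000, 3)))   # an array of 3-vectors
root.add(Dataset("grid", (64, 64, 64)))          # a structured 3-D grid

paths = list(walk(root))
print("\n".join(paths))
```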
I'm using Linux as the main development platform, but PyTables should
be easy to compile/install on other UNIX machines. This package has
also passed all the tests on an UltraSparc platform with Solaris 7 and
Solaris 8. It also compiles and passes all the tests on an SGI
Origin2000 with MIPS R12000 processors and running IRIX 6.5.
On Windows, PyTables has been tested with Windows 2000 Professional SP1
and Windows XP, but it should also work with other flavors.
For online code examples, have a look at
Go to the PyTables web site for more details:
Share your experience
Let me know of any bugs, suggestions, gripes, kudos, etc. you may
have.
-- Francesc Alted