[Numpy-discussion] [ANN] carray 0.2: an in-memory compressed data container
Fri Aug 27 10:22:53 CDT 2010
Announcing carray 0.2
What it is
carray is a container for numerical data that can be compressed
in-memory. The compression process is carried out internally by Blosc,
a high-performance compressor that is optimized for binary data.
Having data compressed in-memory can reduce the stress of the memory
subsystem. The net result is that carray operations can be faster
than using a traditional ndarray object from NumPy.
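To make the idea of a chunked, compressed in-memory container concrete, here is a minimal sketch in plain Python. It is only an illustration under loud assumptions: the `ToyCompressedArray` class, its `chunklen` parameter and its `nbytes`/`cbytes` properties are hypothetical names chosen for this example, and `zlib` from the standard library stands in for Blosc. It does not show how carray is actually implemented and will not reproduce its speed.

```python
import zlib
from array import array


class ToyCompressedArray:
    """Hypothetical illustration only; this is NOT carray's implementation."""

    def __init__(self, values, chunklen=4096, typecode="d"):
        self.typecode = typecode
        self.length = 0
        self.chunks = []  # each entry is one zlib-compressed block of values
        buf = array(typecode)
        for v in values:
            buf.append(v)
            if len(buf) == chunklen:
                self._flush(buf)
                buf = array(typecode)
        if buf:
            self._flush(buf)

    def _flush(self, buf):
        # zlib is a stand-in for Blosc here (an assumption of this sketch).
        self.chunks.append(zlib.compress(buf.tobytes()))
        self.length += len(buf)

    def __len__(self):
        return self.length

    def __iter__(self):
        # Decompress one chunk at a time, so only a single chunk is held
        # uncompressed while iterating.
        for blob in self.chunks:
            chunk = array(self.typecode)
            chunk.frombytes(zlib.decompress(blob))
            yield from chunk

    @property
    def nbytes(self):
        # Size the data would occupy uncompressed.
        return self.length * array(self.typecode).itemsize

    @property
    def cbytes(self):
        # Size actually held in memory in compressed form.
        return sum(len(blob) for blob in self.chunks)


data = ToyCompressedArray(float(i % 100) for i in range(100_000))
print(len(data), data.nbytes, data.cbytes)
```

With repetitive data like this, `cbytes` comes out much smaller than `nbytes`, which is the effect the announcement relies on: less data has to travel through the memory subsystem.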
This release adds two new iterators, `__iter__()` and
`iter(start, stop, step)`, that allow you to perform potentially complex
operations much faster than with plain ndarrays. For example::
    In : a = np.arange(1e6)
    In : time sum((v for v in a if v < 4))
    CPU times: user 6.51 s, sys: 0.00 s, total: 6.51 s
    Wall time: 6.52 s
    In : b = ca.carray(a)
    In : time sum((v for v in b if v < 4))
    CPU times: user 0.73 s, sys: 0.04 s, total: 0.78 s
    Wall time: 0.75 s  # 8.7x faster than ndarray
The `iter(start, stop, step)` iterator also lets you select slices
specified by the `start`, `stop` and `step` parameters. Example::
    In : time sum((v for v in a[2::3] if v < 10))
    CPU times: user 2.18 s, sys: 0.00 s, total: 2.18 s
    Wall time: 2.19 s
    In : time sum((v for v in b.iter(start=2, step=3) if v < 10))
    CPU times: user 0.26 s, sys: 0.03 s, total: 0.30 s
    Wall time: 0.30 s  # 7.3x faster than ndarray
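The slice semantics of `iter(start, stop, step)` can be mimicked in plain Python with `itertools.islice`, which may help to see what the call above selects. This is an analogy only: `iter_slice` is a hypothetical helper written for this example, and it says nothing about how carray implements `iter()` internally or why it is faster.

```python
from itertools import islice


def iter_slice(iterable, start=0, stop=None, step=1):
    """Hypothetical helper: lazily yield the items of iterable[start:stop:step]."""
    return islice(iterable, start, stop, step)


values = range(1_000_000)

# Lazy: elements are produced one at a time; no sliced copy is built.
lazy_total = sum(v for v in iter_slice(values, start=2, step=3) if v < 10)

# Eager: materializes the data and then a sliced copy before iterating.
eager_total = sum(v for v in list(values)[2::3] if v < 10)

print(lazy_total, eager_total)
```

Both expressions visit the elements at positions 2, 5, 8, ... and give the same total; the difference is that the lazy version never builds the intermediate slice.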
The main advantage of these iterators is that you can use them in
generators, so you don't need to waste memory creating temporaries,
which can be important when dealing with large arrays.
For more info, see the new ``Using iterators`` section in USAGE.txt.
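The point about temporaries can be checked with the standard library alone. The sketch below is a rough illustration rather than a benchmark of carray: it uses `tracemalloc` to compare the peak memory of summing a list comprehension (which builds a temporary list) against summing a generator expression (which does not). The `peak_bytes` helper and the value of `N` are assumptions made for this example.

```python
import tracemalloc


def peak_bytes(fn):
    """Run fn() and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    try:
        result = fn()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


N = 200_000

# List comprehension: the whole temporary list exists before sum() starts.
list_total, list_peak = peak_bytes(lambda: sum([v * 2 for v in range(N)]))

# Generator expression: values are produced and consumed one at a time.
gen_total, gen_peak = peak_bytes(lambda: sum(v * 2 for v in range(N)))

print(list_total == gen_total, list_peak, gen_peak)
```

The two totals are identical, while the peak for the generator version is far lower because no temporary container is ever allocated.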
Visit the main carray site repository at:
You can download a source package from:
Home of Blosc compressor:
Share your experience
Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.