[Numpy-discussion] [ANN] carray: an in-memory compressed data container

Francesc Alted faltet@pytables....
Fri Aug 20 18:31:03 CDT 2010


2010/8/20, Zbyszek Szmek <zbyszek@in.waw.pl>:
> OK, I've got a case where carray really shines :|
>
> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
> bench/concat.py numpy 800000 1000 4 1
> problem size: (800000) x 1000 = 10^8.90309
> time for concat: 4.806s
> size of the final container: 6103.516 MB
> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
> bench/concat.py concat 800000 1000 4 1
> problem size: (800000) x 1000 = 10^8.90309
> time for concat: 3.475s
> size of the final container: 6103.516 MB
> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
> bench/concat.py carray 800000 1000 4 1
> problem size: (800000) x 1000 = 10^8.90309
> time for concat: 1.434s
> size of the final container: 373.480 MB
>
> Size is set to NOT hit the swap. This is still the easily compressible
> arange... but still, the results are very nice.

Wow, the results with your processor are indeed much nicer than with
my Atom.  But yes, I somewhat expected this, because Blosc works much
faster on recent processors, as can be seen in:

http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks

BTW, the difference between memcpy and memmove times for this
benchmark is almost 40% on your computer, which is really large :-/
Hmm, something must be going really wrong with memcpy in some glibc
distributions...
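
As a rough check of that figure, here is a minimal sketch of the
arithmetic.  It assumes that the two uncompressed runs quoted above
(4.806s and 3.475s) are the ones being compared, and the helper name
`relative_difference` is just made up for illustration:

```python
# Timings copied from the quoted benchmark output above (seconds).
# Assumption: the "numpy" and "concat" runs are the two uncompressed
# variants being compared; which one maps to memcpy is not stated here.
t_numpy = 4.806
t_concat = 3.475


def relative_difference(slow, fast):
    """Return how much longer `slow` takes than `fast`, as a fraction of `fast`."""
    return (slow - fast) / fast


diff = relative_difference(t_numpy, t_concat)
print(f"slower run takes {diff:.1%} longer than the faster one")
```

Measured relative to the slower run instead, the gap would come out
smaller, so the exact percentage depends on which baseline is used.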

At any rate, for real data that is less compressible the advantages of
carray will be less apparent, but at least the proof of concept seems
to work as intended, so I'm very happy with it.  I'm also expecting
that the combination carray/numexpr will perform faster than plain
computations programmed in C, especially on modern processors, but we
will see how much faster exactly.
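
To illustrate how much the compression ratio depends on the data, here
is a small sketch using only the Python standard library.  Note that it
uses zlib as a stand-in compressor, not Blosc, and the sizes and names
are arbitrary choices for illustration, so the absolute numbers will
not match what carray gives:

```python
import random
import zlib
from array import array

N = 100_000  # number of 8-byte integers; a small, illustrative size

# Highly regular data: 0, 1, 2, ... stored as 8-byte signed integers,
# similar in spirit to the arange used in the benchmark.
regular = array("q", range(N)).tobytes()

# Essentially incompressible data of the same length, from a seeded generator.
noise = random.Random(0).getrandbits(8 * len(regular)).to_bytes(
    len(regular), "little"
)


def compression_ratio(raw, level=1):
    """Uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw, level))


ratio_regular = compression_ratio(regular)
ratio_noise = compression_ratio(noise)
print(f"arange-like data: {ratio_regular:.2f}x")
print(f"random data:      {ratio_noise:.2f}x")
```

Real datasets usually fall somewhere between these two extremes.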

> Of course when the swap is hit, the ratio between carray and a normal array
> can grow to infinity :)
>
> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
> bench/concat.py numpy 1000000 1000 3 1
> problem size: (1000000) x 1000 = 10^9
> time for concat: 35.700s
> size of the final container: 7629.395 MB
> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
> bench/concat.py carray 1000000 1000 3 1
> problem size: (1000000) x 1000 = 10^9
> time for concat: 1.751s
> size of the final container: 409.633 MB

Exactly.  This is another scenario where the carray concept can be
really useful.
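
For reference, the speed-ups and size ratios in both scenarios can be
worked out from the figures quoted above.  This is only a sketch of the
arithmetic; the 8 bytes per element is an assumption on my side that
happens to reproduce the reported container sizes:

```python
# (time in seconds, container size in MB) copied from the quoted runs.
runs = {
    "in RAM, 800000 x 1000": {
        "numpy": (4.806, 6103.516),
        "carray": (1.434, 373.480),
    },
    "swapping, 1000000 x 1000": {
        "numpy": (35.700, 7629.395),
        "carray": (1.751, 409.633),
    },
}

summary = {}
for scenario, result in runs.items():
    t_plain, mb_plain = result["numpy"]
    t_comp, mb_comp = result["carray"]
    # Speed-up and size ratio of carray relative to the plain numpy run.
    summary[scenario] = (t_plain / t_comp, mb_plain / mb_comp)

for scenario, (speedup, size_ratio) in summary.items():
    print(f"{scenario}: {speedup:.1f}x faster, {size_ratio:.1f}x smaller")

# Assumption: 8-byte elements, with sizes reported in units of 2**20 bytes.
expected_mb = 800000 * 1000 * 8 / 2**20
print(f"uncompressed size for 800000 x 1000: {expected_mb:.3f} MB")
```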

-- 
Francesc Alted

