[Numpy-discussion] [ANN] carray: an in-memory compressed data container

Sebastian Haase seb.haase@gmail....
Sat Aug 21 15:34:22 CDT 2010


Hi Francesc,

another exciting project ... congratulations !
Am I correct in thinking that memmapping a carray would also give a
great speed advantage over memmapped ndarrays?  Let's say I have a
2 GB ndarray memmapped over an NFS network connection: should the
speed increase simply scale with the compression factor?
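
As a rough back-of-the-envelope check of that reasoning: if the read is
I/O-bound and Blosc decompression is much faster than the link, the
transfer time should shrink by roughly the compression ratio.  A minimal
sketch, with all figures assumed for illustration:

    # Hypothetical figures: a 2 GB array read over NFS, with an assumed
    # link throughput and an assumed Blosc compression ratio.
    array_size_mb = 2048.0      # 2 GB ndarray, as in the question above
    nfs_bandwidth_mb_s = 50.0   # assumed effective NFS throughput
    compression_ratio = 16.0    # assumed Blosc compression ratio

    uncompressed_s = array_size_mb / nfs_bandwidth_mb_s
    compressed_s = (array_size_mb / compression_ratio) / nfs_bandwidth_mb_s

    print("uncompressed read: %.1f s" % uncompressed_s)  # ~41.0 s
    print("compressed read:   %.1f s" % compressed_s)    # ~2.6 s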

Regards,
Sebastian


On Sat, Aug 21, 2010 at 1:31 AM, Francesc Alted <faltet@pytables.org> wrote:
> 2010/8/20, Zbyszek Szmek <zbyszek@in.waw.pl>:
>> OK, I've got a case where carray really shines :|
>>
>> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
>> bench/concat.py numpy 800000 1000 4 1
>> problem size: (800000) x 1000 = 10^8.90309
>> time for concat: 4.806s
>> size of the final container: 6103.516 MB
>> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
>> bench/concat.py concat 800000 1000 4 1
>> problem size: (800000) x 1000 = 10^8.90309
>> time for concat: 3.475s
>> size of the final container: 6103.516 MB
>> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
>> bench/concat.py carray 800000 1000 4 1
>> problem size: (800000) x 1000 = 10^8.90309
>> time for concat: 1.434s
>> size of the final container: 373.480 MB
>>
>> Size is set so as NOT to hit swap.  This is still the easily
>> compressible arange... but even so, the results are very nice.
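
A quick sanity check of the ratios implied by the figures quoted above
(a reader's check on the output, not part of the benchmark script):

    # Numbers copied from the benchmark output quoted above.
    numpy_time, carray_time = 4.806, 1.434        # seconds
    numpy_size, carray_size = 6103.516, 373.480   # MB

    print("speedup:           %.2fx" % (numpy_time / carray_time))  # ~3.35x
    print("compression ratio: %.1fx" % (numpy_size / carray_size))  # ~16.3x
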
>
> Wow, the results with your processor are indeed much nicer than with
> my Atom.  But yeah, I somewhat expected this, because Blosc works much
> faster on recent processors, as can be seen at:
>
> http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks
>
> BTW, the difference between memcpy and memmove times for this
> benchmark is almost 40% on your computer, which is really large :-/
> Hmm, something must be going really wrong with memcpy in some glibc
> builds...
>
> At any rate, for real data that is less compressible, the advantages
> of carray will be less apparent, but at least the proof of concept
> seems to work as intended, so I'm very happy with it.  I'm also
> expecting that the carray/numexpr combination will perform faster than
> plain computations programmed in C, especially on modern processors,
> but we'll see exactly how much faster.
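
For context, this is what the numexpr half of that combination looks
like on plain NumPy arrays; feeding it compressed carray chunks is the
plan Francesc describes above, not something this sketch implements:

    import numpy as np
    import numexpr as ne

    a = np.arange(1e7)
    b = np.arange(1e7)

    # numexpr compiles the expression and evaluates it blockwise,
    # avoiding the large temporaries that "2*a + 3*b" would create
    # in plain NumPy.
    result = ne.evaluate("2*a + 3*b")
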
>
>> Of course when the swap is hit, the ratio between carray and a normal array
>> can grow to infinity :)
>>
>> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
>> bench/concat.py numpy 1000000 1000 3 1
>> problem size: (1000000) x 1000 = 10^9
>> time for concat: 35.700s
>> size of the final container: 7629.395 MB
>> zbyszek@escher:~/python/numpy/carray-0.1.dev$ PYTHONPATH=. python
>> bench/concat.py carray 1000000 1000 3 1
>> problem size: (1000000) x 1000 = 10^9
>> time for concat: 1.751s
>> size of the final container: 409.633 MB
>
> Exactly.  This is another scenario where the carray concept can be
> really useful.
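
The arithmetic behind the swap effect is simple: the uncompressed
container no longer fits in RAM, while the compressed one easily does.
A minimal illustration, with the RAM size assumed (but consistent with
the two runs quoted above, where 6103 MB fit and 7629 MB did not):

    ram_mb = 7000.0                            # assumed usable RAM
    numpy_mb, carray_mb = 7629.395, 409.633    # sizes quoted above

    print("numpy fits in RAM: ", numpy_mb < ram_mb)    # False -> swaps
    print("carray fits in RAM:", carray_mb < ram_mb)   # True  -> no swap
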
>
> --
> Francesc Alted

