[Numpy-discussion] How to efficiently reduce a 1-dim 100-10000 element array with user defined binary function
Sat Nov 15 16:29:21 CST 2008
Charles R Harris <charlesr.harris <at> gmail.com> writes:
> On Fri, Nov 14, 2008 at 6:22 PM, Kim Hansen <slaunger <at> gmail.com> wrote:
> Dear numpy-discussion,
> I am quite new to Python and numpy.
> I am working on a Python application using Scipy, where I need to
> unpack and pack quite large amounts of data (GBs) into data structures
> and convert them into other data structures. Until now the program has
> been running amazingly efficiently (processing 10-125 MB/s depending
> on the task) because I have managed to use only built-in array
> functions for all computationally intensive operations (and because I
> have fairly good hardware to run it on).
> However, I have run into problems now, as I need to compute new
> checksums for modified data structures using a ones' complement add on
> one-dimensional uint16 arrays with 50-10000 elements.
> The best procedure I have come up with so far is the following implementation:
>
> def complement_add_uint16(a, b):
>     """Return the ones' complement addition of two unsigned 16-bit integers.
>
>     The values are added and, if the carry is set, the result is incremented.
>     It is assumed that a and b are both in the range [0, 0xFFFF].
>     """
>     c = a + b
>     c += c >> 16
>     return c & 0xFFFF
>
> def complement_add_checksum(ints):
>     """Return a ones' complement checksum based on a
>     one-dimensional array of dtype=uint16.
>     """
>     return reduce(complement_add_uint16, ints, 0)
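For reference, here is a self-contained sketch of the quoted reduce-based approach. The `int()` casts are my own addition, to keep the intermediate sum from wrapping under uint16 arithmetic, and `reduce` is imported from `functools` as required in Python 3 (the original post predates that move):

```python
from functools import reduce

import numpy as np

def complement_add_uint16(a, b):
    # Cast to Python int so the sum cannot wrap at 16 bits,
    # then fold any carry out of bit 16 back into the low word.
    c = int(a) + int(b)
    c += c >> 16
    return c & 0xFFFF

def complement_add_checksum(ints):
    # Ones' complement checksum of a 1-D uint16 array,
    # applied one element at a time.
    return reduce(complement_add_uint16, ints, 0)

data = np.array([0xFFFF, 0x0001, 0x1234], dtype=np.uint16)
print(hex(complement_add_checksum(data)))  # 0x1235
```

Because each element passes through a Python-level function call, this runs at interpreter speed, which is the bottleneck the thread goes on to remove.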
> Can't you add them up in a higher precision, then do a = (a & 0xffff) + (a >> 16)
> until the high-order bits are gone? The high-order bits will count the
> number of ones you need to add. Something like
>
> def complement_add_uint16(x):
>     y = sum(x, dtype=uint64)
>     while y >= 2**16:
>         y = (y & uint64(0xffff)) + (y >> uint64(16))
>     return y
>
> You might need to block the data to avoid overflow of uint64, but you get an
> extra 48 bits in any case. Mind, I'm not sure this is exactly the same thing
> as what you are trying to do, so it bears checking.
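Charles's sketch can be written out as a runnable function. The name and the use of plain Python ints in the folding loop (sidestepping NumPy's uint64 casting rules) are my own adaptation, so treat this as a sketch of the suggestion rather than the exact code used:

```python
import numpy as np

def complement_add_checksum_vectorized(x):
    # Sum all 16-bit words at 64-bit precision in one vectorized pass.
    y = int(np.sum(x, dtype=np.uint64))
    # Fold the accumulated carries back into the low 16 bits; each pass
    # applies the "increment on carry" rule to all pending carries at once.
    while y >= 1 << 16:
        y = (y & 0xFFFF) + (y >> 16)
    return y

data = np.array([0xFFFF, 0x0001, 0x1234], dtype=np.uint16)
print(hex(complement_add_checksum_vectorized(data)))  # 0x1235
```

Because ones' complement addition is associative and commutative, folding the wide sum gives the same result as the element-by-element reduce. The uint64 accumulator leaves 48 bits of headroom, so overflow would need on the order of 2**48 words, far beyond the 50-10000 element arrays discussed here.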
Charles, you are great!
That does seem to work alright!
It passes my quite elaborate unit tests of the checksum function, where I use
several real-world examples, and the performance boost is enormous!
Whereas my previous method never got me processing speeds higher than
413 kB/s, I am now at, hold on, 47400 kB/s checksum processing speed, which is
a ~100x performance boost. Once again I see that numpy rules!
Fantastic! You saved me a lot of trouble there. I had sort of thought of the
same idea, but somehow convinced myself that it would not work.