[Numpy-discussion] Index Array Performance
Robert Kern
robert.kern@gmail....
Tue Feb 14 03:58:02 CST 2012
On Mon, Feb 13, 2012 at 23:23, Marcel Oliver
<m.oliver@jacobs-university.de> wrote:
> Hi,
>
> I have a short piece of code where the use of an index array "feels
> right", but incurs a severe performance penalty: It's about an order
> of magnitude slower than all other operations with arrays of that
> size.
>
> It comes up in a piece of code which is doing a large number of "on
> the fly" histograms via
>
> hist[i,j] += 1
>
> where i is an array with the bin index to be incremented and j is
> simply enumerating the histograms. I attach a full short sample code
> below which shows how it's being used in context, and corresponding
> timeit output from the critical code section.
Other people have explained that yes, applying index arrays is slow. I
would just like to add the tangential point that this code does not
behave the way that you think it does. You cannot make histograms like
this. The statement "hist[i,j] += 1" gets broken down into three
separate statements by the Python compiler:

  tmp = hist.__getitem__((i,j))
  tmp = tmp.__iadd__(1)
  hist.__setitem__((i,j), tmp)

Note that tmp is a new array with copies of the data in hist at the
(i,j) locations, possibly multiple copies of the same cell if the same
(i,j) pair occurs more than once. Each one of these copies gets
incremented by 1, then __setitem__() writes each of them in turn to the
appropriate cell in hist, each one simply overwriting the previous one.
A cell that is indexed several times therefore ends up incremented by 1,
not by the number of times it occurs.
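
A small sketch of the effect described above, using made-up data (the
shapes and index values are hypothetical, chosen so that one (i,j) pair
repeats). For comparison it also uses np.add.at, the unbuffered ufunc
method found in NumPy releases newer than this thread; that is an
assumption about your installed version, not something the original code
relies on.

```python
import numpy as np

# Hypothetical data: 2 histograms (columns) with 4 bins (rows) each.
hist = np.zeros((4, 2), dtype=int)
i = np.array([1, 1, 3, 1])  # bin index of each sample
j = np.array([0, 0, 1, 0])  # histogram each sample belongs to

# Fancy-indexed in-place add: the pair (1, 0) occurs three times,
# but the three incremented copies overwrite one another.
buffered = hist.copy()
buffered[i, j] += 1

# Unbuffered add: every occurrence of an (i, j) pair is accumulated.
counted = hist.copy()
np.add.at(counted, (i, j), 1)

print(buffered[1, 0], counted[1, 0])
```

With this input the first number printed is 1 and the second is 3, so
only the unbuffered version counts every sample.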
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco