[Numpy-discussion] Index Array Performance

Robert Kern robert.kern@gmail....
Tue Feb 14 03:58:02 CST 2012

On Mon, Feb 13, 2012 at 23:23, Marcel Oliver
<m.oliver@jacobs-university.de> wrote:
> Hi,
> I have a short piece of code where the use of an index array "feels
> right", but incurs a severe performance penalty: It's about an order
> of magnitude slower than all other operations with arrays of that
> size.
> It comes up in a piece of code which is doing a large number of "on
> the fly" histograms via
>  hist[i,j] += 1
> where i is an array with the bin index to be incremented and j is
> simply enumerating the histograms.  I attach a full short sample code
> below which shows how it's being used in context, and corresponding
> timeit output from the critical code section.

Other people have explained that yes, applying index arrays is slow. I
would just like to add the tangential point that this code does not
behave the way that you think it does. You cannot make histograms like
this. The statement "hist[i,j] += 1" gets broken down into three
separate statements by the Python compiler:

  tmp = hist.__getitem__((i,j))
  tmp = tmp.__iadd__(1)
  hist.__setitem__((i,j), tmp)
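Here is a minimal sketch that is not part of the original message. It uses small made-up arrays to check that the augmented assignment gives the same result as the explicit three-step form above:

```python
import numpy as np

# Hypothetical data: 3 bins x 2 histograms; cell (0, 0) is hit twice.
hist = np.zeros((3, 2), dtype=int)
i = np.array([0, 0, 2])  # bin indices (note the repeated 0)
j = np.array([0, 0, 1])  # histogram indices

# Augmented assignment on a fancy-indexed array...
a = hist.copy()
a[i, j] += 1

# ...matches this explicit getitem / iadd / setitem sequence.
b = hist.copy()
tmp = b.__getitem__((i, j))  # fancy indexing returns a copy
tmp = tmp.__iadd__(1)        # increments the copy, not b itself
b.__setitem__((i, j), tmp)   # writes the copy back, element by element

print(np.array_equal(a, b))  # True
```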

Note that tmp is a new array holding copies of the data in hist at the
(i,j) locations, with multiple copies of the same cell if the (i,j)
index pairs contain repetitions. Each of these copies is incremented by
1, and then __setitem__() writes them back in turn to the corresponding
cells in hist, each one simply overwriting the previous one. A cell
that appears k times in the index therefore ends up incremented by 1,
not by k.
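The next sketch illustrates the overwriting with hypothetical arrays. The original message does not mention either of the two counting alternatives shown here. The first is np.add.at, an unbuffered in-place addition that NumPy added in version 1.8, after this post was written. The second is np.bincount over flattened indices.

```python
import numpy as np

hist = np.zeros((3, 2), dtype=int)
i = np.array([0, 0, 0, 2])  # cell (0, 0) appears three times
j = np.array([0, 0, 0, 1])

# Buffered fancy-index increment: repeated cells are overwritten.
buggy = hist.copy()
buggy[i, j] += 1
print(buggy[0, 0])  # 1, not 3

# np.add.at applies every increment, so repeats accumulate.
correct = hist.copy()
np.add.at(correct, (i, j), 1)
print(correct[0, 0])  # 3

# Alternative: flatten (i, j) to linear indices and count them.
flat = np.ravel_multi_index((i, j), hist.shape)
counted = hist + np.bincount(flat, minlength=hist.size).reshape(hist.shape)
print(np.array_equal(counted, correct))  # True
```

np.bincount is usually the faster of the two for pure counting. np.add.at is more general because it accepts arbitrary increments and ufuncs.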

Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
