[SciPy-User] Accumulation sum using indirect indexes

Alexander Kalinin alec.kalinin@gmail....
Sun Feb 5 01:17:12 CST 2012


Yes, numpy.take() is much faster than "fancy" indexing, and the "pure
numpy" solution is now about two times faster than pandas. Below are the
timing results:

The data shape:
     (1062, 6348)

Pandas solution:
    0.16610 seconds

"Pure numpy" solution:
    0.08907 seconds

Timing of the "pure numpy" solution by block:
block (a) (sorting and obtaining groups):
    0.00134 seconds
block (b) (copying data to ordered_data):
    0.05517 seconds
block (c) (reduceat):
    0.02698 seconds
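
For reference, the three blocks can be put together roughly as follows. This
is only a minimal sketch, not the exact code that was timed: the function name
group_sum_columns and the small test arrays are made up for illustration, it
assumes the labels apply to the columns of data and that there is at least one
column, and block (a) finds the group starts directly from the sorted labels
instead of going through np.unique.

```python
import numpy as np


def group_sum_columns(data, labels):
    """Sum the columns of `data` that share a label (hypothetical helper)."""
    labels = np.asarray(labels)

    # block (a): sort the column labels and find where each group starts
    order = np.argsort(labels, kind="stable")
    sorted_labels = labels[order]
    # a group starts wherever a sorted label differs from its predecessor
    is_start = np.concatenate(([True], sorted_labels[1:] != sorted_labels[:-1]))
    groups_at = np.flatnonzero(is_start)

    # block (b): copy the columns into label order with take()
    # (same result as data[:, order])
    ordered_data = np.take(data, order, axis=1)

    # block (c): sum each contiguous run of columns
    sums = np.add.reduceat(ordered_data, groups_at, axis=1)
    return sorted_labels[groups_at], sums


# made-up example data: 2 rows, 6 columns, 3 distinct labels
data = np.arange(12.0).reshape(2, 6)
labels = np.array([2, 0, 1, 0, 2, 1])
keys, sums = group_sum_columns(data, labels)
```

Here keys holds the distinct labels in sorted order and sums has one column
per label. The only change relative to the slower version is in block (b),
where np.take(data, order, axis=1) replaces data[:, order].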

Alexander.

On Sun, Feb 5, 2012 at 4:01 AM, <josef.pktd@gmail.com> wrote:

> On Sat, Feb 4, 2012 at 2:27 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> > On Sat, Feb 4, 2012 at 2:23 PM, Alexander Kalinin
> > <alec.kalinin@gmail.com> wrote:
> >> I have compared the performance of the "pure numpy" solution with the
> >> pandas solution on my task. The "pure numpy" solution is about two
> >> times slower.
> >>
> >> The data shape:
> >>     (1062, 6348)
> >> Pandas "group by sum" time:
> >>     0.16588 seconds
> >> Pure numpy "group by sum" time:
> >>     0.38979 seconds
> >>
> >> Interestingly, the main bottleneck in the numpy solution is the data
> >> copying. I have divided the solution into three blocks:
> >>
> >> # block (a):
> >>     s = np.argsort(labels)
> >>     keys, inv = np.unique(labels, return_inverse=True)
> >>     i = inv[s]
> >>     groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0]
> >>
> >> # block (b):
> >>     ordered_data = data[:, s]
>
> can you try with numpy.take? Keith and Wes were showing that take is
> much faster than advanced indexing.
>
> Josef
>
>