[SciPy-User] Accumulation sum using indirect indexes
Alexander Kalinin
alec.kalinin@gmail....
Sun Feb 5 01:17:12 CST 2012
Yes, numpy.take() is much faster than "fancy" indexing, and now the "pure
numpy" solution is two times faster than pandas. Below are the timing results:
The data shape: (1062, 6348)
Pandas solution: 0.16610 seconds
"Pure numpy" solution: 0.08907 seconds
Timing of the "pure numpy" solution by block:
block (a) (sorting and obtaining groups): 0.00134 seconds
block (b) (copying data to ordered_data): 0.05517 seconds
block (c) (reduceat): 0.02698 seconds
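Putting the three blocks together, a minimal sketch of the "pure numpy" group-by-sum might look like the following. It assumes `data` is a 2-D array whose columns are grouped by a 1-D `labels` array (as the `data[:, s]` indexing in the quoted code implies). The function name `group_sum_columns` is hypothetical, and block (c) is reconstructed from the "reduceat" label in the timings, so the exact `np.add.reduceat` call is an assumption rather than the code that was actually timed.

```python
import numpy as np


def group_sum_columns(data, labels):
    """Sum the columns of a 2-D `data` array that share the same label.

    Hypothetical helper: `data` has shape (n_rows, n_cols) and `labels`
    is a 1-D array of length n_cols. Returns (keys, sums), where `keys`
    are the sorted unique labels and sums[:, k] is the sum of all
    columns labelled keys[k].
    """
    labels = np.asarray(labels)

    # block (a): sort the labels and find where each group starts
    s = np.argsort(labels)
    keys, inv = np.unique(labels, return_inverse=True)
    i = inv[s]  # group index of each column, in sorted order
    groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0]

    # block (b): reorder the columns so equal labels are contiguous;
    # np.take is used instead of the fancy index data[:, s]
    ordered_data = np.take(data, s, axis=1)

    # block (c): sum each contiguous run of columns in one call
    sums = np.add.reduceat(ordered_data, groups_at, axis=1)
    return keys, sums


# Usage: columns 1 and 4 share label 0, columns 3 and 5 share label 1,
# and columns 0 and 2 share label 2
data = np.arange(12).reshape(2, 6)
labels = np.array([2, 0, 2, 1, 0, 1])
keys, sums = group_sum_columns(data, labels)
# keys -> [0, 1, 2]
# sums -> [[5, 8, 2], [17, 20, 14]]
```

Because `np.unique` returns its keys sorted and the columns are reordered by the same sort, the k-th run found in block (a) always corresponds to `keys[k]`, so no extra bookkeeping is needed to match sums to labels.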
Alexander.
On Sun, Feb 5, 2012 at 4:01 AM, <josef.pktd@gmail.com> wrote:
> On Sat, Feb 4, 2012 at 2:27 PM, Wes McKinney <wesmckinn@gmail.com> wrote:
> > On Sat, Feb 4, 2012 at 2:23 PM, Alexander Kalinin
> > <alec.kalinin@gmail.com> wrote:
> >> I have checked the performance of the "pure numpy" solution against the
> >> pandas solution on my task. The "pure numpy" solution is about two times
> >> slower.
> >>
> >> The data shape:
> >> (1062, 6348)
> >> Pandas "group by sum" time:
> >> 0.16588 seconds
> >> Pure numpy "group by sum" time:
> >> 0.38979 seconds
> >>
> >> But interestingly, the main bottleneck in the numpy solution is the
> >> data copying. I have divided the solution into three blocks:
> >>
> >> # block (a):
> >> s = np.argsort(labels)
> >> keys, inv = np.unique(labels, return_inverse=True)
> >> i = inv[s]
> >> groups_at = np.where(i != np.concatenate(([-1], i[:-1])))[0]
> >>
> >> # block (b):
> >> ordered_data = data[:, s]
>
> can you try with numpy.take? Keith and Wes were showing that take is
> much faster than advanced indexing.
>
> Josef
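Josef's suggestion can be checked directly: for a 1-D index array `s`, `np.take(data, s, axis=1)` selects the same columns as `data[:, s]`. The sketch below uses random stand-in data with the shape reported in the thread (the real `data` and `labels` are not shown, and the 100 label values are an arbitrary assumption) and times both forms with `timeit`. The relative speed depends on the NumPy version and hardware; newer releases may show a much smaller gap than the one reported here.

```python
import timeit

import numpy as np

# Hypothetical stand-in data with the shape reported in the thread
rng = np.random.default_rng(0)
data = rng.random((1062, 6348))
labels = rng.integers(0, 100, size=data.shape[1])
s = np.argsort(labels)

# Both forms reorder the columns of data by the permutation s
fancy = data[:, s]
taken = np.take(data, s, axis=1)
assert np.array_equal(fancy, taken)

# Average seconds per call over a few repetitions
n = 5
t_fancy = timeit.timeit(lambda: data[:, s], number=n) / n
t_take = timeit.timeit(lambda: np.take(data, s, axis=1), number=n) / n
print(f"data[:, s]:            {t_fancy:.5f} seconds")
print(f"np.take(data, s, 1):   {t_take:.5f} seconds")
```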