[Numpy-discussion] Optimization of loops

Kurt Smith kwmsmith@gmail....
Mon Oct 27 09:45:31 CDT 2008


On Thu, Oct 23, 2008 at 5:55 PM, Pierre Yger <pierre.yger@gmail.com> wrote:

>
> import bisect
> my_list.sort()
> time, ind = zip(*my_list)
> for i in id_list :
>     beg = bisect.bisect_left(ind,i)
>     end = bisect.bisect_right(ind,i)
>     mylist.append(time[beg:end])
>

I've always found itertools.groupby to be the ideal function for such
groupings.  In the benchmark below (100,000 random (index, time) pairs
spread over 1,000 indices) it takes about 30% less time than the bisect
version.
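
To show what groupby does before the timing script, here is a minimal
sketch on made-up toy data, written for Python 3.  The names `pairs` and
`group_times` are illustrative only and do not come from the original
thread.  The point to keep in mind is that groupby only merges adjacent
items with equal keys, so the pairs have to be sorted by index first.

```python
from itertools import groupby
from operator import itemgetter

def group_times(pairs):
    """Group (index, time) pairs by index.

    Returns a list of (index, (time, time, ...)) tuples, one per index
    that actually occurs in the input.
    """
    # groupby only merges consecutive items that share a key, so sort first.
    ordered = sorted(pairs)
    return [(key, tuple(t for _, t in grp))
            for key, grp in groupby(ordered, key=itemgetter(0))]

# Hypothetical toy data: (index, time) pairs in arbitrary order.
pairs = [(2, 0.5), (0, 0.1), (2, 0.2), (0, 0.9), (5, 0.3)]
print(group_times(pairs))
# [(0, (0.1, 0.9)), (2, (0.2, 0.5)), (5, (0.3,))]
```

Indices with no entries (1, 3 and 4 here) simply never show up, which is
why no separate filtering step is needed with this approach.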

[code]
from random import randrange, random
import bisect
from itertools import groupby

N = 1000
nn = 100000

dta = []
for i in xrange(nn):
    dta.append((randrange(0,N), random()))

def bsct(dta):
    dta.sort()
    ind, time = zip(*dta)
    ret_list = []
    for i in xrange(N):
        beg = bisect.bisect_left(ind, i)
        end = bisect.bisect_right(ind, i)
        ret_list.append((i,time[beg:end]))
    # drop indices that had no entries, so the result matches groupby's
    ret_list = [subl for subl in ret_list if len(subl[1]) > 0]
    return ret_list

def gpby(dta):
    dta.sort()
    keyfunc = lambda x: x[0]
    res = []
    for key, subit in groupby(dta, keyfunc):
        res.append((key, tuple(time for ind, time in subit)))
    return res

if __name__ == '__main__':
    from time import clock
    dta1 = dta
    dta2 = dta[:]
    c1 = clock()
    bd = bsct(dta1)
    print "bisect: %f" % (clock() - c1)
    c1 = clock()
    gb = gpby(dta2)
    print "groupby: %f" % (clock() - c1)
    assert bd == gb
[/code]

bisect: 1.430000
groupby: 0.970000
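
The script above is Python 2 (print statement, xrange, time.clock), and
time.clock was removed in Python 3.8.  For anyone rerunning the comparison
on Python 3, the following is a hedged port rather than the original
script: the `bench` helper and the upper-case `NN` are naming choices made
here, and perf_counter is swapped in for clock.  Timings depend on the
machine and interpreter, so the figures above are not re-verified by it.

```python
import bisect
from itertools import groupby
from operator import itemgetter
from random import randrange, random
from time import perf_counter

N = 1000      # number of distinct indices
NN = 100000   # number of (index, time) pairs

def bsct(dta):
    """Group by index using two bisections per index."""
    dta = sorted(dta)
    ind, time = zip(*dta)
    out = []
    for i in range(N):
        beg = bisect.bisect_left(ind, i)
        end = bisect.bisect_right(ind, i)
        if end > beg:              # skip indices with no entries
            out.append((i, time[beg:end]))
    return out

def gpby(dta):
    """Group by index using itertools.groupby on the sorted pairs."""
    dta = sorted(dta)
    return [(key, tuple(t for _, t in grp))
            for key, grp in groupby(dta, key=itemgetter(0))]

def bench(func, dta):
    """Run func(dta) once; return (result, elapsed seconds)."""
    start = perf_counter()
    result = func(dta)
    return result, perf_counter() - start

dta = [(randrange(0, N), random()) for _ in range(NN)]
bd, t_bisect = bench(bsct, dta)
gb, t_groupby = bench(gpby, dta)
assert bd == gb
print("bisect:  %f" % t_bisect)
print("groupby: %f" % t_groupby)
```

Both functions sort their own copy of the input, so each timing includes
one sort, as in the original script.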

Hope this helps,

Kurt