[Numpy-discussion] For loop tips

Tim Hochberg tim.hochberg at ieee.org
Tue Aug 29 13:48:19 CDT 2006


Tim Hochberg wrote:
> Keith Goodman wrote:
>   
>> I have a very long list that contains many repeated elements. The
>> elements of the list can be either all numbers, or all strings, or all
>> dates [datetime.date].
>>
>> I want to convert the list into a matrix where each unique element of
>> the list is assigned a consecutive integer starting from zero.
>>   
>>     
> If what you want is for the first unique element to get zero, the second
> to get one, and so on, I don't think the code below will work in general,
> since a dict does not preserve order. You might want to look at the
> results for the character case to see what I mean. If you're looking for
> something else, you'll need to elaborate a bit. Since list2index doesn't
> return anything, it's not entirely clear what the answer consists of.
> Just idx? idx plus uL?
>
>   
>> I've done it by brute force below. Any tips for making it faster? (5x
>> would make it useful; 10x would be a dream.)
>>   
>>     
> Assuming I understand what you're trying to do, this might help:
>
>     def list2index2(L):
>         idx = ones([len(L)])
>         map = {}
>         for i, x in enumerate(L):
>             index = map.get(x)
>             if index is None:
>                 map[x] = index = len(map)
>             idx[i] = index
>         return idx
>
>
> It's almost 10x faster for numbers and about 40x faster for characters
> and dates. However, it produces different results from list2index in the
> latter two cases. That may or may not be a good thing, depending on what
> you're really trying to do.
>   
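
To make the ordering question above concrete, here is a minimal sketch
(written for present-day Python 3, not the Python 2 of this thread) of two
possible numbering conventions: codes assigned in order of first appearance,
which is what list2index2 does, and codes assigned by sorted rank. The names
first_seen_codes and sorted_codes are made up for illustration. The original
list2index follows neither convention exactly: it numbers elements in
whatever order the Python 2 dict happens to yield its keys.

```python
def first_seen_codes(items):
    """Number each distinct item in order of first appearance."""
    codes = {}
    # len(codes) is evaluated before setdefault runs, so a new item
    # receives the next unused integer; a known item keeps its code.
    return [codes.setdefault(x, len(codes)) for x in items]


def sorted_codes(items):
    """Number each distinct item by its rank among the sorted distinct items."""
    rank = {x: i for i, x in enumerate(sorted(set(items)))}
    return [rank[x] for x in items]


letters = ["c", "a", "c", "b", "a"]
print(first_seen_codes(letters))  # [0, 1, 0, 2, 1]
print(sorted_codes(letters))      # [2, 0, 2, 1, 0]
```

Both outputs are valid "consecutive integers starting from zero", so which
one is wanted depends on what the codes are used for downstream.
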
Ugh! I fell victim to premature optimization disease. The following is
both clearer and faster. Sigh.

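    from numpy import ones

    def list2index3(L):
        # One slot per input element; every slot is overwritten below.
        idx = ones([len(L)])
        # Maps each distinct element to its code, in order of first appearance.
        codes = {}
        for i, x in enumerate(L):
            if x not in codes:
                codes[x] = len(codes)  # next unused integer
            idx[i] = codes[x]
        return idx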
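
For comparison, a loop-free variant can be sketched with numpy.unique. This
is not what the thread uses, and no timing against list2index3 is claimed
here. It assumes the elements are sortable and that the installed NumPy
supports the return_index and return_inverse arguments; the name
index_first_seen is hypothetical. numpy.unique hands back codes by sorted
rank, so the sketch re-maps them to first-appearance order to match
list2index3.

```python
import numpy as np


def index_first_seen(values):
    """Return integer codes for a 1-D sequence, numbered by first appearance."""
    arr = np.asarray(values)
    # uniq is sorted; first_pos[k] is where uniq[k] first occurs in arr;
    # inverse[i] is the sorted rank of arr[i].
    uniq, first_pos, inverse = np.unique(
        arr, return_index=True, return_inverse=True
    )
    # order lists the sorted ranks in order of first appearance.
    order = np.argsort(first_pos)
    # Invert that permutation: rank[sorted_rank] = first-appearance code.
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    return rank[inverse]


print(index_first_seen(["c", "a", "c", "b", "a"]))  # [0 1 0 2 1]
```

Unlike list2index3, this returns an integer array rather than a float one,
and it needs the elements to be mutually comparable, which the dict-based
loop does not.
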



> -tim
>
>   
>>   
>>     
>>>> list2index.test()
>>>>       
>>>>         
>> Numbers: 5.84955787659 seconds
>> Characters: 24.3192870617 seconds
>> Dates: 39.288228035 seconds
>>
>>
>> import datetime, time
>> from numpy import nan, asmatrix, ones
>>
>> def list2index(L):
>>
>>   # Find unique elements in list
>>   uL = dict.fromkeys(L).keys()
>>
>>   # Convert list to matrix
>>   L = asmatrix(L).T
>>
>>   # Initialize return matrix
>>   idx = nan * ones((L.size, 1))
>>
>>   # Assign numbers to unique L values
>>   for i, uLi in enumerate(uL):
>>     idx[L == uLi,:] = i
>>
>> def test():
>>
>>     L = 5000*range(255)
>>     t1 = time.time()
>>     idx = list2index(L)
>>     t2 = time.time()
>>     print 'Numbers:', t2-t1, 'seconds'
>>
>>     L = 5000*[chr(z) for z in range(255)]
>>     t1 = time.time()
>>     idx = list2index(L)
>>     t2 = time.time()
>>     print 'Characters:', t2-t1, 'seconds'
>>
>>     d = datetime.date
>>     step = datetime.timedelta
>>     L = 5000*[d(2006,1,1)+step(z) for z in range(255)]
>>     t1 = time.time()
>>     idx = list2index(L)
>>     t2 = time.time()
>>     print 'Dates:', t2-t1, 'seconds'
>>
>> _______________________________________________
>> Numpy-discussion mailing list
>> Numpy-discussion at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/numpy-discussion





