[Numpy-discussion] For loop tips
Tue Aug 29 12:57:26 CDT 2006
I have a very long list that contains many repeated elements. The
elements of the list can be either all numbers, or all strings, or all
dates [datetime.date].
I want to convert the list into a column matrix of indices, in which each
unique element of the list is replaced by a consecutive integer starting
from zero.
I've done it by brute force below. Any tips for making it faster? (5x
would make it useful; 10x would be a dream.)
>> list2index.test()
Numbers: 5.84955787659 seconds
Characters: 24.3192870617 seconds
Dates: 39.288228035 seconds
import datetime, time
from numpy import nan, asmatrix, ones

def list2index(L):
    # Find unique elements in list
    uL = dict.fromkeys(L).keys()
    # Convert list to matrix
    L = asmatrix(L).T
    # Initialize return matrix
    idx = nan * ones((L.size, 1))
    # Assign numbers to unique L values
    # (each pass compares the whole matrix against one unique value)
    for i, uLi in enumerate(uL):
        idx[L == uLi,:] = i
    # Return the index matrix so that test() receives a result
    return idx

def test():
    L = 5000*range(255)
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Numbers:', t2-t1, 'seconds'
    L = 5000*[chr(z) for z in range(255)]
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Characters:', t2-t1, 'seconds'
    d = datetime.date
    step = datetime.timedelta
    L = 5000*[d(2006,1,1)+step(z) for z in range(255)]
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Dates:', t2-t1, 'seconds'
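
The brute-force loop above scans the whole list once per unique value. One possible way to avoid that, sketched here as an untimed illustration and not as a measured answer, is to look each element up in a dictionary so that the list is walked only once. The name list2index_dict is hypothetical, and the sketch is written for current Python 3 and NumPy, not the 2006 versions used above. It differs from list2index in two ways that are assumptions on my part: integers are assigned in order of first appearance, and the result is an integer column array, not a float matrix.

```python
import numpy as np

def list2index_dict(L):
    """Map each element of L to a consecutive integer, starting from zero.

    Hypothetical helper, not from the original post. Works for any
    hashable elements (numbers, strings, datetime.date).
    """
    index_of = {}
    # setdefault stores a new integer only the first time an element is
    # seen, and returns the stored integer on every later occurrence.
    codes = [index_of.setdefault(item, len(index_of)) for item in L]
    # Column array of shape (len(L), 1), mirroring the layout used above.
    idx = np.array(codes, dtype=np.intp).reshape(-1, 1)
    return idx, index_of

idx, index_of = list2index_dict(['b', 'a', 'b', 'c'])
# idx.ravel() is [0, 1, 0, 2]; index_of is {'b': 0, 'a': 1, 'c': 2}
```

Returning the dictionary alongside the indices makes it possible to translate an integer back to the element it stands for. If sorted order of the unique values is acceptable, numpy.unique with return_inverse=True is another option for numbers and strings.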
