[Numpy-discussion] Speed up function on cross product of two sets?

Tim Hochberg tim.hochberg at cox.net
Sun Apr 2 17:53:01 CDT 2006

Zachary Pincus wrote:

> Tim -
> Thanks for your suggestions -- that all makes good sense.
> It sounds like the general take home message is, as always: "the  
> first thing to try is to vectorize your inner loop."

Exactly and far more pithy than my meanderings. If I were going to make 
a list it would look something like:

0. Think about your algorithm.
1. Vectorize your inner loop.
2. Eliminate temporaries
3. Ask for help
4. Recode in C.
5 Accept that your code will never be fast.

Step zero should probably be repeated after every other step ;)


> Zach
>>> I have a inner loop that looks like this:
>>> out = []
>>> for elem1 in l1:
>>>   for elem2 in l2:
>>>     out.append(do_something(l1, l2))
>> this is do_something(elem1, elem2), correct?
>>> result = do_something_else(out)
>>> where do_something and do_something_else are implemented with  only  
>>> numpy ufuncs, and l1 and l2 are numpy arrays.
>>> As an example, I need to compute the median distance from any  
>>> element  in one set to any element in another set.
>>> What's the best way to speed this sort of thing up with numpy  
>>> (e.g.  push as much down into the underlying C as possible)? I  
>>> could re- write do_something with the numexpr tools (which are  very 
>>> cool), but  that doesn't address the fact that I've still got  
>>> nested loops living  in Python.
>> The exact approach I'd take would depend on the sizes of l1 and l2  
>> and a certain amount of trial and error. However, the first thing  
>> I'd try is:
>>    n1 = len(l1)
>>    n2 = len(l2)
>>    out = numpy.zeros([n1*n2], appropriate_dtype)
>>    for i, elem1 in enumerate(l1):
>>        out[i*n2:(i+1)*n2] = do_something(elem1, l1)
>>    result = do_something_else(out)
>> That may work as is, or you may have to tweak do_something slightly  
>> to handle l1 correctly. You might also try to do the operations in  
>> place and stuff the results into out directly by using X= and three  
>> argument ufuncs. I'd not do that at first though.
>> One thing to consider is that, in my experience, numpy works best  on 
>> chunks of about 10,000 elements. I believe that this is a  function 
>> of cache size. Anyway, this may choice of which of l1 and  l2 you 
>> continue to loop over, and which you vectorize. If they both  might 
>> get really big, you could even consider chopping up l1 when  you 
>> vectorize it. Again I wouldn't do that unless it really looks  like 
>> you need it.
>> If that all sounds opaque, feel free to ask more questions. Or if  
>> you have questions about microoptimizing the guts of do_something,  I 
>> have a bunch of experience with that and I like a good puzzle.
>>> Perhaps there's some way in numpy to make one big honking array  
>>> that  contains all the pairs from the two lists, and then just run  
>>> my  do_something on that huge array, but that of course scales  poorly.
>> I know of at least one way, but it's a bit of a kludge. I don't  
>> think I'd try that though. As you said, it scales poorly.  As long  
>> as you can vectorize your inner loop, it's not necessary and  
>> sometimes makes things worse, to vectorize your outer loop as well.  
>> That's assuming your inner loop is large, it doesn't help if your  
>> inner loop is 3 elements long for instance, but that doesn't seem  
>> like it should be a problem here.
>> Regards,
>> -tim

More information about the Numpy-discussion mailing list