# [Numpy-discussion] Speed up function on cross product of two sets?

Zachary Pincus zpincus at stanford.edu
Sun Apr 2 17:17:06 CDT 2006

Tim -

Thanks for your suggestions -- that all makes good sense.

It sounds like the general take home message is, as always: "the
first thing to try is to vectorize your inner loop."

Zach

>> I have a inner loop that looks like this:
>> out = []
>> for elem1 in l1:
>>   for elem2 in l2:
>>     out.append(do_something(l1, l2))
>
> this is do_something(elem1, elem2), correct?
>
>> result = do_something_else(out)
>>
>> where do_something and do_something_else are implemented with
>> only  numpy ufuncs, and l1 and l2 are numpy arrays.
>>
>> As an example, I need to compute the median distance from any
>> element  in one set to any element in another set.
>>
>> What's the best way to speed this sort of thing up with numpy
>> (e.g.  push as much down into the underlying C as possible)? I
>> could re- write do_something with the numexpr tools (which are
>> very cool), but  that doesn't address the fact that I've still got
>> nested loops living  in Python.
>
> The exact approach I'd take would depend on the sizes of l1 and l2
> and a certain amount of trial and error. However, the first thing
> I'd try is:
>
>    n1 = len(l1)
>    n2 = len(l2)
>    out = numpy.zeros([n1*n2], appropriate_dtype)
>    for i, elem1 in enumerate(l1):
>        out[i*n2:(i+1)*n2] = do_something(elem1, l1)
>    result = do_something_else(out)
>
> That may work as is, or you may have to tweak do_something slightly
> to handle l1 correctly. You might also try to do the operations in
> place and stuff the results into out directly by using X= and three
> argument ufuncs. I'd not do that at first though.
>
> One thing to consider is that, in my experience, numpy works best
> on chunks of about 10,000 elements. I believe that this is a
> function of cache size. Anyway, this may choice of which of l1 and
> l2 you continue to loop over, and which you vectorize. If they both
> might get really big, you could even consider chopping up l1 when
> you vectorize it. Again I wouldn't do that unless it really looks
> like you need it.
>
> If that all sounds opaque, feel free to ask more questions. Or if
> you have questions about microoptimizing the guts of do_something,
> I have a bunch of experience with that and I like a good puzzle.
>
>>
>> Perhaps there's some way in numpy to make one big honking array
>> that  contains all the pairs from the two lists, and then just run
>> my  do_something on that huge array, but that of course scales
>> poorly.
>
> I know of at least one way, but it's a bit of a kludge. I don't
> think I'd try that though. As you said, it scales poorly.  As long
> as you can vectorize your inner loop, it's not necessary and
> sometimes makes things worse, to vectorize your outer loop as well.
> That's assuming your inner loop is large, it doesn't help if your
> inner loop is 3 elements long for instance, but that doesn't seem
> like it should be a problem here.
>
> Regards,
>
> -tim
>