# [Numpy-discussion] Speed up function on cross product of two sets?

Tim Hochberg tim.hochberg at cox.net
Sun Apr 2 16:53:05 CDT 2006

```Zachary Pincus wrote:

> Hi folks,

Hi Zach,

>
> I have a inner loop that looks like this:
> out = []
> for elem1 in l1:
>   for elem2 in l2:
>     out.append(do_something(l1, l2))

this is do_something(elem1, elem2), correct?

> result = do_something_else(out)
>
> where do_something and do_something_else are implemented with only
> numpy ufuncs, and l1 and l2 are numpy arrays.
>
> As an example, I need to compute the median distance from any element
> in one set to any element in another set.
>
> What's the best way to speed this sort of thing up with numpy (e.g.
> push as much down into the underlying C as possible)? I could re-
> write do_something with the numexpr tools (which are very cool), but
> that doesn't address the fact that I've still got nested loops living
> in Python.

The exact approach I'd take would depend on the sizes of l1 and l2 and a
certain amount of trial and error. However, the first thing I'd try is:

n1 = len(l1)
n2 = len(l2)
out = numpy.zeros([n1*n2], appropriate_dtype)
for i, elem1 in enumerate(l1):
out[i*n2:(i+1)*n2] = do_something(elem1, l1)
result = do_something_else(out)

That may work as is, or you may have to tweak do_something slightly to
handle l1 correctly. You might also try to do the operations in place
and stuff the results into out directly by using X= and three argument
ufuncs. I'd not do that at first though.

One thing to consider is that, in my experience, numpy works best on
chunks of about 10,000 elements. I believe that this is a function of
cache size. Anyway, this may choice of which of l1 and l2 you continue
to loop over, and which you vectorize. If they both might get really
big, you could even consider chopping up l1 when you vectorize it. Again
I wouldn't do that unless it really looks like you need it.

If that all sounds opaque, feel free to ask more questions. Or if you
have questions about microoptimizing the guts of do_something, I have a
bunch of experience with that and I like a good puzzle.

>
> Perhaps there's some way in numpy to make one big honking array that
> contains all the pairs from the two lists, and then just run my
> do_something on that huge array, but that of course scales poorly.

I know of at least one way, but it's a bit of a kludge. I don't think
I'd try that though. As you said, it scales poorly.  As long as you can
vectorize your inner loop, it's not necessary and sometimes makes things
worse, to vectorize your outer loop as well. That's assuming your inner
loop is large, it doesn't help if your inner loop is 3 elements long for
instance, but that doesn't seem like it should be a problem here.

Regards,

-tim

```