[Numpy-discussion] fast way of doing "cross-multiplications" ?
tim.hochberg at cox.net
Thu Jul 20 18:35:35 CDT 2006
Fernando Perez wrote:
> On 7/18/06, Tim Hochberg <tim.hochberg at cox.net> wrote:
>> Eric Emsellem wrote:
>> > thanks for the tips. (indeed your "add.reduce" is correct: I just
>> > this down too quickly, in the script I have a "sum" included).
>> > And yes you are right for the memory issue, so I may just keep the
>> > in and try to make it work on a fast PC...(or use parallel processes)
>> > (is "sum" different than "add.reduce"?)
>> > thanks again to both Bill Baxter and Perry Greenfield for their fast
>> > (and helpful!) answers.
>> I just wanted to add that there are faster, but considerably complicated
>> ways to attack this class of problems. The one I've looked at in the
>> past was the fast multipole method and I believe there are others. I'm
>> not sure whether these can be implemented efficiently in numpy, but you
>> may want to take a look into this kind of more sophisticated/complicated
>> approach if brute forcing the calculation doesn't work.
> Indeed, FMM is the best known method that can turn this O(n^2) problem
> into O(n*log(n)). As Tim indicates, there is no easy way out of this
> one. Incidentally, there is a talk about FMM on the scipy'06
> schedule, in case you are going to attend.
> An alternative approach to FMM (conceptually similar in some ways) is
> described here:
> Unfortunately this isn't exactly a half-hour optimization effort, as
> the required machinery is pretty heavy duty. And yes, this has all
> been done in python and runs on current numpy/scipy (though it has
> Fortran, C and C++ sprinkled as needed).
I'm interested in hearing in what situations people end up dropping into
C/C++ or Fortran. It would be interesting to see if general solutions
could be found for making some of them fast enough at the Python level.
Here are some situations that I've run into where numpy has trouble
computing results efficiently.
1. The value at a given point is the computed one of two (or more)
ways depending on some condition. The subcomputations are
expensive, so one does not want to compute them both and then
chose a result using 'where'. I think I have a decent way to
attack this at the Python level using argsort: basically, sort all
of the operands, find the boundaries between the regions where
different computations are done, compute the expensive
subcomputations on the appropriate blocks and then unsort the
data. It would be interesting to see if this could be automated in
a clean way, perhaps within numexpr.
2. Generalized reduce. We want to reduce an array using an arbitrary
function f(x, y) that is not a standard ufunc. In the particular
case I've run into, x and y would be vectors or perhaps record
arrays, and the functions would couple the various records when
producing the output. I don't see how this case can be handled
efficiently at the Python level, short of using something like
Weave to produce a custom ufunc and calling reduce on it. I don't
see how something like numexpr could handle it, unfortunately,
since it gains it's speed by operating on blocks of data.
I'm sure there are other cases that I've run into, but that's all I'm
coming up with right now.
More information about the Numpy-discussion