[Numpy-discussion] odd performance of sum?
Sat Feb 12 21:30:47 CST 2011
I think everybody would be happy if .sum() were as fast as possible, and patches for review are always welcome. There are distinct things that could be done to improve the performance of ufuncs and their methods as several people have shown. In fact, there is a lot of low-hanging fruit inside of NumPy for optimization.
One of the advantages of an open source community is that different needs get addressed by the people who need them. So, if you really need a faster sum and are willing to do the hard work to make it happen (or convince someone else to do it for you), then there will be many to say thank you when you are finished. You should know, though, that in my experience it is harder than you might think at first to "convince someone else to do it for you."
On Feb 12, 2011, at 6:02 PM, eat wrote:
> Hi Sturla,
> On Sat, Feb 12, 2011 at 5:38 PM, Sturla Molden <email@example.com> wrote:
> Den 10.02.2011 16:29, skrev eat:
> > One would expect sum to outperform dot by a clear margin. Do there
> > exist any 'tricks' to increase the performance of sum?
> First of all, thanks for still replying. Well, I'm still a little unsure how I should proceed with this discussion... I may have used poor wording and created unnecessary mayhem with my original question (:. Trust me, I'm only trying to discuss this with constructive criticism in mind.
> Now, I'm not pretending to know what kind of person a 'typical' numpy user is. But I'm assuming there are others besides me with roughly similar questions on their (our) minds who wish to utilize numpy in a more 'pythonic; all batteries included' way. Occasionally I (we) may ask really stupid questions, but please bear with us.
> That said, I'm still very confident that (from a user's point of view) there's some real substance to the issue I raised.
> I see that others have answered already. The ufunc np.sum is not
> going to beat np.dot. You are racing the heavy machinery of NumPy (array
> iterators, type checks, bound checks, etc.) against the level-3 BLAS routine
> DGEMM, the most heavily optimized numerical kernel ever written.
> Fair enough.
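The comparison being discussed can be reproduced with a small sketch (array sizes here are illustrative, not from the thread): the same row-sum reduction phrased once through the generic ufunc machinery and once as a BLAS matrix-vector product.

```python
import numpy as np

# Row sums of a matrix, computed two ways.
a = np.random.rand(1000, 1000)
ones = np.ones(a.shape[1])

s_ufunc = a.sum(axis=1)  # ufunc reduction: iterators, type/bound checks
s_blas = a.dot(ones)     # dispatches to an optimized BLAS kernel

# Both produce the same row sums (up to floating-point rounding);
# on a tuned BLAS the dot version is typically faster for large arrays.
assert np.allclose(s_ufunc, s_blas)
```

Timing the two with `timeit` on a given machine shows the gap the thread is about.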
> Beware that computation is much cheaper than memory access.
> Sure, that's exactly where I expected the performance boost to emerge.
> DGEMM does more arithmetic, and is even O(N^3) in that respect, yet it is
> always faster except for very sparse arrays. If you need fast loops, you
> can always write your own Fortran or C, and even insert OpenMP pragmas.
> That's a very important potential, but surely not all numpy users are expected to master that ;-)
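Even without dropping to Fortran or C, the overhead hierarchy Sturla describes is visible from Python alone: the ufunc reduction beats an interpreted loop by a wide margin, and a BLAS inner product sits at the fast end. A rough sketch (sizes illustrative):

```python
import numpy as np
from timeit import timeit

a = np.random.rand(100_000)

# The same reduction three ways; only the dispatch machinery differs.
t_python = timeit(lambda: sum(a.tolist()), number=10)       # interpreted loop
t_ufunc = timeit(lambda: a.sum(), number=10)                # NumPy ufunc
t_blas = timeit(lambda: a.dot(np.ones_like(a)), number=10)  # BLAS inner product

# All three agree numerically, up to floating-point rounding.
assert np.isclose(a.sum(), sum(a.tolist()))
assert np.isclose(a.sum(), a.dot(np.ones_like(a)))
```

The absolute timings are machine-dependent, but the ordering (Python loop slowest, BLAS fastest) is robust.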
> But don't expect that to beat optimized high-level BLAS kernels by any
> margin. The first chapters of "Numerical Methods in Fortran 90" might be
> worth reading. It deals with several of these issues, including
> dimensional expansion, which is important for writing fast numerical
> code -- but not intuitively obvious. "I expect this to be faster because
> it does less work" is a fundamental misconception in numerical
> computing. Whatever cause less traffic on the memory BUS (the real
> bottleneck) will almost always be faster, regardless of the amount of
> work done by the CPU.
> And I'm totally aware of it; actually, that was exactly the intended logic of my original question: "how about if sum could follow the steps of dot; then, since it executes fewer instructions, its execution time must be bounded above by that of dot". But as R. Kern gently pointed out already, it's not a fruitful enough avenue to pursue. And I can live with that.
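Sturla's point that memory traffic, not arithmetic, is the bottleneck can be sketched directly (sizes are illustrative): the two reductions below perform the same number of additions, but the strided view drags roughly twice as many cache lines across the memory bus, so it is typically slower.

```python
import numpy as np

a = np.random.rand(4_000_000)
contig = a[:2_000_000]  # one contiguous block of 2M elements
strided = a[::2]        # same element count, twice the stride

# Same arithmetic work, different memory traffic: the strided view
# skips every other float64, wasting half of each cache line fetched.
assert contig.size == strided.size
assert strided.strides[0] == 2 * contig.strides[0]
```

Timing `contig.sum()` against `strided.sum()` on a given machine makes the cost of the extra memory traffic concrete.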
> A good rule is to use high-level BLAS whenever
> you can. The only exception, as mentioned, is when matrices get very sparse.
> NumPy-Discussion mailing list