[Numpy-discussion] Scipy dot
Nathaniel Smith
njs@pobox....
Fri Nov 9 17:05:34 CST 2012
In this case it's even possible the blas is being smart enough to notice
that you've passed it the same pointer twice with the transpose switch on
one of them, and is just skipping half the multiplies since the output is
provably symmetric. Don't know if they actually do that but...
-n
On 9 Nov 2012 22:57, "Matthieu Brucher" <matthieu.brucher@gmail.com> wrote:
> Hi,
>
> A.A slower than A.A' is not a surprise for me. The latter is far more
> cache friendly that the former. Everything follows cache lines, so it is
> faster than something that will use one element from each cache line. In
> fact it is exactly what "proves" that the new version is correct.
> Good job (if all the tests were made and still pass ;) )
>
> Cheers,
>
> Matthieu
>
>
> 2012/11/9 Nicolas SCHEFFER <scheffer.nicolas@gmail.com>
>
>> Ok: comparing apples to apples. I'm clueless on my observations and
>> would need input from you guys.
>>
>> Using ATLAS 3.10, numpy with and without my changes, I'm getting these
>> timings and comparisons.
>>
>> #
>> #I. Generate matrices using regular dot:
>> #
>> big = np.array(np.random.randn(2000, 2000), 'f');
>> np.savez('out', big=big, none=big.dot(big), both=big.T.dot(big.T),
>> left=big.T.dot(big), right=big.dot(big.T))"
>>
>> #
>> #II. Timings with regular dot
>> #
>> In [3]: %timeit np.dot(big, big)
>> 10 loops, best of 3: 138 ms per loop
>>
>> In [4]: %timeit np.dot(big, big.T)
>> 10 loops, best of 3: 166 ms per loop
>>
>> In [5]: %timeit np.dot(big.T, big.T)
>> 10 loops, best of 3: 193 ms per loop
>>
>> In [6]: %timeit np.dot(big.T, big)
>> 10 loops, best of 3: 165 ms per loop
>>
>> #
>> #III. I load these arrays and time again with the "fast" dot
>> #
>> In [21]: %timeit np.dot(big, big)
>> 10 loops, best of 3: 138 ms per loop
>>
>> In [22]: %timeit np.dot(big.T, big)
>> 10 loops, best of 3: 104 ms per loop
>>
>> In [23]: %timeit np.dot(big.T, big.T)
>> 10 loops, best of 3: 138 ms per loop
>>
>> In [24]: %timeit np.dot(big, big.T)
>> 10 loops, best of 3: 102 ms per loop
>>
>> 1. A'.A': great!
>> 2. A.A' becomes faster than A.A !?!
>>
>> #
>> #IV. MSE on differences
>> #
>> In [25]: np.sqrt(((arr['none'] - none)**2).sum())
>> Out[25]: 0.0
>>
>> In [26]: np.sqrt(((arr['both'] - both)**2).sum())
>> Out[26]: 0.0
>>
>> In [27]: np.sqrt(((arr['left'] - left)**2).sum())
>> Out[27]: 0.015900515
>>
>> In [28]: np.sqrt(((arr['right'] - right)**2).sum())
>> Out[28]: 0.015331409
>>
>> #
>> # CCl
>> #
>> While the MSE are small, I'm wondering whether:
>> - It's a bug: it should be exactly the same
>> - It's a feature: BLAS is taking shortcuts when you have A.A'. The
>> difference is not significant. Quick: PR that asap!
>>
>> I don't have enough expertise to answer that...
>>
>> Thanks much!
>>
>> -nicolas
>> On Fri, Nov 9, 2012 at 2:13 PM, Nicolas SCHEFFER
>> <scheffer.nicolas@gmail.com> wrote:
>> > I too encourage users to use scipy.linalg for speed and robustness
>> > (hence calling this scipy.dot), but it just brings so much confusion!
>> > When using the scipy + numpy ecosystem, you'd almost want everything
>> > be done with scipy so that you get the best implementation in all
>> > cases: scipy.zeros(), scipy.array(), scipy.dot(), scipy.linalg.inv().
>> >
>> > Anyway this is indeed for another thread, the confusion we'd like to
>> > fix here is that users shouldn't have to understand the C/F contiguous
>> > concepts to get the maximum speed for np.dot()
>> >
>> > To summarize:
>> > - The python snippet I posted is still valid and can speed up your
>> > code if you can change all your dot() calls.
>> > - The change in dotblas.c is a bit more problematic because it's very
>> > core. I'm having issues right now to replicate the timings, I've got
>> > better timing for a.dot(a.T) than for a.dot(a). There might be a bug.
>> >
>> > It's a pain to test because I cannot do the test in a single python
>> session.
>> > I'm going to try to integrate most of your suggestions, I cannot
>> > guarantee I'll have time to do them all though.
>> >
>> > -nicolas
>> > On Fri, Nov 9, 2012 at 8:56 AM, Nathaniel Smith <njs@pobox.com> wrote:
>> >> On Fri, Nov 9, 2012 at 4:25 PM, Gael Varoquaux
>> >> <gael.varoquaux@normalesup.org> wrote:
>> >>> On Fri, Nov 09, 2012 at 03:12:42PM +0000, Nathaniel Smith wrote:
>> >>>> But what if someone compiles numpy against an optimized blas (mkl,
>> >>>> say) and then compiles SciPy against the reference blas? What do you
>> >>>> do then!? ;-)
>> >>>
>> >>> This could happen. But the converse happens very often. What happens
>> is
>> >>> that users (eg on shared computing resource) ask for a scientific
>> python
>> >>> environment. The administrator than installs the package starting from
>> >>> the most basic one, to the most advanced one, thus starting with numpy
>> >>> that can very well build without any external blas. When he gets to
>> scipy
>> >>> he hits the problem that the build system does not detect properly the
>> >>> blas, and he solves that problem.
>> >>>
>> >>> Also, it used to be that on the major linux distributions, numpy
>> would not
>> >>> be build with an optimize lapack because numpy was in the 'base' set
>> of
>> >>> packages, but not lapack. On the contrary, scipy being in the
>> 'contrib'
>> >>> set, it could depend on lapack. I just checked, and this has been
>> fixed
>> >>> in the major distributions (Fedora, Debian, Ubuntu).
>> >>>
>> >>> Now we can discuss with such problems should not happen, and put the
>> >>> blame on the users/administrators, the fact is that they happen
>> often. I
>> >>> keep seeing environments in which np.linalg is unreasonnably slow.
>> >>
>> >> If this is something that's been a problem for you, maybe we should
>> >> start another thread on things we could do to fix it directly? Improve
>> >> build instructions, advertise build systems that set up the whole
>> >> environment (and thus do the right thing), make numpy's setup.py
>> >> scream and yell if blas isn't available...?
>> >>
>> >> -n
>> >> _______________________________________________
>> >> NumPy-Discussion mailing list
>> >> NumPy-Discussion@scipy.org
>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
>
>
> --
> Information System Engineer, Ph.D.
> Blog: http://matt.eifelle.com
> LinkedIn: http://www.linkedin.com/in/matthieubrucher
> Music band: http://liliejay.com/
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20121109/5acc8f84/attachment.html
More information about the NumPy-Discussion
mailing list