[Numpy-discussion] [help needed] associativity and precedence of '@'

Nathaniel Smith njs@pobox....
Wed Mar 19 14:45:30 CDT 2014

On Sat, Mar 15, 2014 at 3:41 AM, Nathaniel Smith <njs@pobox.com> wrote:
> I think we need to
> know something about how often the Mat @ Mat @ vec type cases arise in
> practice. How often do non-scalar * and np.dot show up in the same
> expression? How often does it look like a * np.dot(b, c), and how often
> does it look like np.dot(a * b, c)? How often do we see expressions like
> np.dot(np.dot(a, b), c), and how often do we see expressions like
> np.dot(a, np.dot(b, c))? This would really help guide the debate. I don't
> have this data, and I'm not sure the best way to get it. A super-fancy
> approach would be to write a little script that uses the 'ast' module to
> count things automatically. A less fancy approach would be to just pick
> some code you've written, or a well-known package, grep through for calls
> to 'dot', and make notes on what you see. (An advantage of the less-fancy
> approach is that as a human you might be able to tell the difference
> between scalar and non-scalar *, or check whether it actually matters what
> order the 'dot' calls are in.)

Okay, I wrote a little script [1] to scan Python source files and look for
things like 'dot(a, dot(b, c))' or 'dot(dot(a, b), c)', or the ndarray.dot
method equivalents. So what we get out is:
- a count of how many 'dot' calls there are
- a count of how often we see left-associative nestings: dot(dot(a, b), c)
- a count of how often we see right-associative nestings: dot(a, dot(b, c))
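
For readers who want a feel for how such counting can work, here is a
minimal sketch using the standard 'ast' module. It is not the script from
[1]; the function names (is_dot_call, dot_operands, count_dots) are made up
for this illustration, and it assumes that any call whose name or attribute
is 'dot' is a matrix product, which will occasionally be wrong.

```python
import ast


def is_dot_call(node):
    """True for calls shaped like dot(a, b), np.dot(a, b), or a.dot(b)."""
    if not isinstance(node, ast.Call):
        return False
    func = node.func
    if isinstance(func, ast.Name):
        return func.id == "dot"
    if isinstance(func, ast.Attribute):
        return func.attr == "dot"
    return False


def dot_operands(node):
    """Return (left, right) operands of a dot call, or None if unrecognized."""
    func, args = node.func, node.args
    # Function form: dot(a, b) or np.dot(a, b).
    if len(args) == 2:
        return args[0], args[1]
    # Method form: a.dot(b); the left operand is the object before '.dot'.
    if len(args) == 1 and isinstance(func, ast.Attribute):
        return func.value, args[0]
    return None


def count_dots(source):
    """Count dot calls and how many nest another dot call on each side."""
    counts = {"dots": 0, "left": 0, "right": 0}
    for node in ast.walk(ast.parse(source)):
        if not is_dot_call(node):
            continue
        counts["dots"] += 1
        operands = dot_operands(node)
        if operands is None:
            continue
        left, right = operands
        if is_dot_call(left):    # dot(dot(a, b), c) or a.dot(b).dot(c)
            counts["left"] += 1
        if is_dot_call(right):   # dot(a, dot(b, c)) or a.dot(b.dot(c))
            counts["right"] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "x = np.dot(np.dot(a, b), c)\n"
        "y = dot(a, dot(b, c))\n"
        "z = a.dot(b).dot(c)\n"
        "w = a.dot(b.dot(c))\n"
    )
    print(count_dots(sample))
```

To scan a project, one could read each .py file and sum the dictionaries
returned by count_dots; files that fail to parse would need to be skipped.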

Running it on a bunch of projects, I get:

| project      | dots | left | right | right/left |
| scipy        |  796 |   53 |    27 |       0.51 |
| nipy         |  275 |    3 |    19 |       6.33 |
| scikit-learn |  472 |   11 |    10 |       0.91 |
| statsmodels  |  803 |   46 |    38 |       0.83 |
| astropy      |   17 |    0 |     0 |        nan |
| scikit-image |   15 |    1 |     0 |       0.00 |
| total        | 2378 |  114 |    94 |       0.82 |

(Any other projects worth trying? This is something that could vary a lot
between different projects, so it seems more important to get lots of
projects here than to get a few giant projects. Or if anyone wants to run
the script on their own private code, please do! Running it on my personal
pile of random junk finds 3 left-associative and 1 right.)

Two flaws with this approach:
1) Probably some proportion of those nested dot calls are places where it
doesn't actually matter which evaluation order one uses -- dot() forces you
to pick one, so you have to. If people prefer to, say, use the "left" form
in cases where it doesn't matter, then this could bias the left-vs-right
results -- hard to say. (Somewhere in this thread it was suggested that the
use of the .dot method could create such a bias, because a.dot(b).dot(c) is
more natural than a.dot(b.dot(c)), but only something like 6% of the dot
calls here use the method form, so this probably doesn't matter.)

OTOH, this also means that the total frequency of @ expressions where
associativity even matters at all is probably *over*-estimated by the above.
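
As a rough illustration of when the grouping matters and when it does not,
the sketch below counts scalar multiplications for the two groupings of a
three-way product, assuming the textbook cost of n*m*p for an (n x m) times
(m x p) product. The helper names (matmul_cost, grouping_costs) and the
shapes are made up for the example; real timings depend on the BLAS in use.

```python
def matmul_cost(shape_a, shape_b):
    """Return (multiplication count, result shape) for an (n x m)(m x p) product."""
    n, m = shape_a
    m2, p = shape_b
    if m != m2:
        raise ValueError("inner dimensions do not match")
    return n * m * p, (n, p)


def grouping_costs(shape_a, shape_b, shape_c):
    """Return (left_cost, right_cost) for dot(dot(a, b), c) vs dot(a, dot(b, c))."""
    cost_ab, shape_ab = matmul_cost(shape_a, shape_b)
    cost_ab_c, _ = matmul_cost(shape_ab, shape_c)
    cost_bc, shape_bc = matmul_cost(shape_b, shape_c)
    cost_a_bc, _ = matmul_cost(shape_a, shape_bc)
    return cost_ab + cost_ab_c, cost_bc + cost_a_bc


if __name__ == "__main__":
    # Mat @ Mat @ vec: the right grouping avoids the matrix-matrix product.
    print(grouping_costs((1000, 1000), (1000, 1000), (1000, 1)))
    # Three square matrices of the same size: both groupings cost the same.
    print(grouping_costs((10, 10), (10, 10), (10, 10)))
```

Under this cost model the first case gives 1,001,000,000 multiplications for
the left grouping against 2,000,000 for the right, while the second gives
2,000 either way, which is the kind of nesting where the order was an
arbitrary choice.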

2) This approach misses cases where the cumbersomeness of dot has caused
people to introduce temporary variables, like 'foo = np.dot(a, b); bar =
np.dot(foo, c)'. So this causes us to *under*-estimate how often
associativity matters. I did read through the 'dot' uses in scikit-learn
and nipy, though, and only caught a handful of such cases, so I doubt it
changes anything much.


[1] https://gist.github.com/njsmith/9157645#file-grep-dot-dot-py

Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh