[Numpy-discussion] Rank-0 arrays - reprise

Nathaniel Smith njs@pobox....
Sat Jan 5 16:58:04 CST 2013

On Sat, Jan 5, 2013 at 10:10 PM, David Cournapeau <cournape@gmail.com> wrote:
> Thanks for the entertaining explanation.

Procrastination is a hell of a drug.

> I don't think 0-dim array being slow is such a big drawback. I would
> be really surprised if there was no way to make them faster, and
> having unspecified, nearly duplicated type handling code in multiple
> places is likely one reason why nobody took time to really make them
> faster.

I agree!

> Regarding ufunc combination caching, couldn't we do the caching on
> demand ? I am not sure how you arrived at a 600 bytes per ufunc, but
> in many real world use cases, I would suspect only a few combinations
> would be used.

600 bytes is for an implementation that just kept a table like
  chosen_ufunc_offset = np.empty((24, 24), dtype=uint8)  # 576 bytes
and looked up the proper ufunc loop by doing
  ufunc_loops[chosen_ufunc_offset[left_arg_typenum, right_arg_typenum]]
I suspect we could pre-fill such tables extremely quickly (basically
just fill in all the exact matches, and then do a flood-fill along the
can_cast graph), or we could fill them on-demand. (Also even 24 * 24
is currently an over-estimate since some of those 24 types are
parametrized, and currently ufuncs can't handle parametrized types,
but hopefully that will get fixed at some point.)

The numpy main namespace and scipy.special together contain only 35 +
47 = 82 ufuncs that take 2 arguments[1], so loading those two modules
using this scheme would add a total of *50 kilobytes* to numpy's
memory overhead...

(Interestingly, scipy.special does include 48 three-argument ufuncs,
15 four-argument ufuncs, and 6 five-argument ufuncs, which obviously
cannot use a table lookup scheme. Maybe we can add a check for
symmetry -- if all the loops are defined on matching types, like
"dd->d" and "ff->f", then really it's a one-dimensional lookup problem
-- find a common type for the inputs ("d" or "f") and then find the
best loop for that one type.)

Another option, like you suggest (?), would be to keep a little
fixed-size LRU cache for each ufunc and do lookups in it by linear
search. It's hard to know how many different types get used in
real-world programs, though, and I'd worry about performance falling
off a cliff as soon as someone tweaked their inner loop so it used 6
different (type1, type2) combinations instead of 5 or whatever (maybe
in one place they do int * float and in another float * int, etc.).

Anyway the point is yes, this particular thing is eminently fixable.

> Scalar arrays are ones of the most esoteric feature of numpy, and a
> fairly complex one in terms of implementation. Getting rid of it would
> be a net plus on that side. Of course, there is the issue of backward
> compatibility, whose extend is hard to assess.

You mean array scalars, not scalar arrays, right?[2]


[1] len([v for v in np.__dict__.values() if isinstance(v, np.ufunc)
and v.nin == 2])
[2] The fact that this sentence means something is certainly evidence
for... something.

More information about the NumPy-Discussion mailing list