[SciPy-Dev] GSoC -- numpy interactions with sparse matrices
Pauli Virtanen
pav@iki...
Tue Jul 2 15:15:18 CDT 2013
(cc'd Numpy list --- the discussion is about making np.multiply(A, B) et
al. do sensible rather than non-sensible things if A or B is a sparse
matrix)
Hi Blake,
02.07.2013 06:40, Blake Griffith kirjoitti:
> Today marks the first day of the next phase of my GSoC proposal, numpy
> interactions and overriding ufuncs for sparse matrices.
>
> I've been thinking about how I will handle the ufunc interactions. I'm
> still considering how to implement a ufunc override mechanism. However I
> think the existing functionality in numpy, (__array_prepare__,
> __array_finalize__, etc) would be enough for a few cases:
I think using __array_prepare__ and __array_wrap__ et al. for this will
run into difficulties:
- array_prepare requires that the strides and dimensions stay the same
as in the "original array".
- Sparse matrices are not subclasses of ndarrays, so I think they are
considered 0-dim (or NxN?) object arrays by the ufuncs.
This then wreaks all sorts of havoc.
The issue is sort of that the whole wrapping mechanism is written under
the assumption that the data is still contained in an ndarray stored in
memory in a strided layout with the correct shape etc., which is not
true for sparse matrices.
You can perhaps check the above for yourself, in case I'm mistaken.
However, as far as I know, the facilities in Numpy don't currently allow
getting sensible behavior out.
***
One option is to try to make array_prepare to allow returning arbitrary
arrays.
I think this approach has a number of problems:
- It leads to an unnatural way to write the code --- you are trying to
trick the chosen ufunc to do what you want, rather than coding up
straightforwardly the operation that should give the result.
- It seems somewhat a nasty job to do: the ufunc loop selection
(which IIRC can do stuff like optimizing iteration order etc)
runs *before* array_prepare, so you cannot change the memory layout
any more...
So I think extending this mechanism does not sound very promising...
***
The simplest approach seems to me to add a way to completely override
the ufuncs immediately on call.
The logic could be of the style: On entry to ufunc:
- if an input has __array_priority__, check if it also has
__ufunc_override__
- if some of the inputs have __ufunc_override__, choose the one with the
maximum __array_priority__
- call it, passing it the arguments of the ufuncs, and any keywords (in
a dict?)
- if it returns NotImplemented, try the next-highest priority one
(or just give up)
It's perhaps easiest to prototype the solution first in pure Python,
following Nathaniel's example:
https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py
I.e., monkeypatch all ufuncs in Numpy, and insert them also to the
numerical operations using set_numeric_ops. This should allow you to try
out different solutions before diving deeply into the Numpy source code.
Another things to consider: does this add significantly to the runtime
cost? I think not, because the ufunc machinery makes many checks for
__array_priority__, so adding one more attribute check will probably not
significantly hurt performance. But I think this needs to be measured at
the end.
So I would suggest the following starting procedure: re-check if the
array_prepare/array_wrap machinery can be used (I think not), and then
we need to spec for the ufunc override. Here, prototyping it out will
help, so I suggest you take a simple scheme how the override should work
(eg. as above), and then try to use it in scipy.sparse with a
pure-Python prototype monkeypatch implementation. This then hopefully
shows how workable the scheme is, and whether we need to consider some
other option.
The drawback here is that this would add yet another knob for the ufunc
vs. custom classes interaction for the users to fiddle with. On the
other, it seems that currently there is no way to do it.
***
The option C would be to add __array__ to sparse matrices, which would
make np.multiply and other routines just cast the sparse matrices to
dense ones. Or perhaps __array__ should raise an error?
This can also be considered if the ufunc override ends up being too radical.
--
Pauli Virtanen
More information about the SciPy-Dev
mailing list