[Numpy-discussion] ndarray and lazy evaluation

James Bergstra bergstrj@iro.umontreal...
Mon Feb 20 13:26:43 CST 2012


On Mon, Feb 20, 2012 at 1:01 PM, James Bergstra <james.bergstra@gmail.com> wrote:

> On Mon, Feb 20, 2012 at 12:28 PM, Francesc Alted <francesc@continuum.io> wrote:
>
>> On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
>> > You need at least a slightly different Python API to get anywhere, so
>> > numexpr/Theano is the right place to work on an implementation of this
>> > idea. Of course it would be nice if numexpr/Theano offered something as
>> > convenient as
>> >
>> > with lazy:
>> >     arr = A + B + C # with all of these NumPy arrays
>> > # compute upon exiting…
>>
>> Hmm, that would be cute indeed.  Do you have an idea on how the code in
>> the with context could be passed to the Python AST compiler (à la
>> numexpr.evaluate("A + B + C"))?
>>
>>
> The biggest problem with the numexpr approach (e.g. evaluate("A + B + C")),
> whether the programmer has to type the quotes or not, is that the
> sub-program has to be completely expressed in the sub-language.
>
> If I write
>
> >>> def f(x): return x[:3]
> >>> numexpr.evaluate("A + B + f(C)")
>
> I would like that to be fast, but it's not obvious at all how that would
> work. We would be asking numexpr to introspect arbitrary callable python
> objects, and recompile arbitrary Python code, effectively setting up the
> expectation in the user's mind that numexpr is re-implementing an entire
> compiler. That can be fast, obviously, but it seems to me to represent a
> significant departure from numpy's focus, which I always thought was the
> data container rather than expression evaluation (though maybe this
> firestorm of discussion is aimed at changing this?).
>
> Theano went with another option which was to replace the A, B, and C
> variables with objects that have a modified __add__. Theano's back-end can
> be slow at times and the codebase can feel like a heavy dependency, but my
> feeling is still that this is a great approach to getting really fast
> implementations of compound expressions.
>
> The context syntax you suggest is a little ambiguous in that the indented
> block of a with statement contains *statements*, whereas what you mean to
> build in that block is a *single expression* graph. You could maybe get
> the right effect with something like
>
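> import numpy as np
>
> A, B, C = np.random.rand(3, 5)
>
> # hypothetical API: neither function exists in NumPy
> expr = np.compound_expression()
> with np.expression_builder(expr) as foo:
>     arr = A + B + C
>     brr = A + B * C
>     # `return` is a reserved word, so the method needs another name
>     foo.return_((arr, brr))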
>
> # compute arr and brr as quickly as possible
> a, b = expr.run()
>
> # modify one of the arrays that the expression was compiled to use
> A[:] += 1
>
> # re-run the compiled expression on the new value
> a, b = expr.run()
>
> - JB
>

I should add that the biggest benefit of expressing things as compound
expressions in this way is not saving temporaries (though that is nice);
it is being able to express enough computational work at a time to offset
the time required to ship the arguments off to a GPU for evaluation. This
has been a *huge* win for the Theano approach, and it works really well.
The abstraction boundary offered by this sort of expression graph has been
really effective.
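
As a rough illustration of the expression-graph idea, here is a minimal
sketch of the operator-overloading approach: wrapper objects record
`+` and `*` instead of computing them, and the recorded graph can be run
again after an input is modified in place. The names `Node`, `Leaf`,
`BinOp` and `as_node` are made up for this example and are not part of
NumPy, numexpr or Theano. The `run` method here simply calls NumPy
ufuncs one at a time, so it still creates temporaries; a real back-end
would compile the whole graph instead.

```python
import numpy as np


class Node:
    """Base class for lazy expression nodes (hypothetical, not a NumPy API)."""

    def __add__(self, other):
        return BinOp(np.add, self, as_node(other))

    def __mul__(self, other):
        return BinOp(np.multiply, self, as_node(other))


class Leaf(Node):
    """Holds a reference to an existing array, so in-place edits are seen."""

    def __init__(self, array):
        self.array = array

    def run(self):
        return self.array


class BinOp(Node):
    """Records an operation and its operands instead of performing it."""

    def __init__(self, func, left, right):
        self.func, self.left, self.right = func, left, right

    def run(self):
        # Naive evaluation: one ufunc call per node, no fusion.
        return self.func(self.left.run(), self.right.run())


def as_node(value):
    """Wrap plain values so they can take part in the graph."""
    return value if isinstance(value, Node) else Leaf(np.asarray(value))


# Three rows of a 3x5 array play the role of A, B and C.
A_data, B_data, C_data = np.arange(15.0).reshape(3, 5)
A, B, C = (Leaf(x) for x in (A_data, B_data, C_data))

arr = A + B + C   # builds a graph, no arithmetic happens here
brr = A + B * C   # Python precedence puts B * C below the addition

a, b = arr.run(), brr.run()      # evaluate once

A_data[:] += 1                   # modify an input in place
a2, b2 = arr.run(), brr.run()    # re-run the same graphs on the new values
```

Because the graph is built once and only `run` is repeated, the cost of
constructing (and, in a real system, compiling) the expression can be
amortized over many evaluations.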

This speaks even more to the importance of distinguishing between the data
container (e.g. numpy, Theano's internal ones, PyOpenCL's one, PyCUDA's
one) and the expression compilation and evaluation infrastructures (e.g.
Theano, numexpr, Cython).  The goal should be to separate these two as much
as possible, so that programs can be expressed in a natural way and then
evaluated using containers that are suited to the program.
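
To make that separation concrete, here is a small sketch in which the
expression is plain data and the container-specific work lives in a
swappable "backend". Everything in it (the tuple encoding, `evaluate`,
and the two backend dictionaries) is invented for illustration; Python
lists and the standard library's `array` type merely stand in for
containers such as NumPy or GPU arrays.

```python
from array import array
from operator import add, mul

# An expression is plain data: ("add", lhs, rhs), ("mul", lhs, rhs),
# or a variable name. Nothing here refers to a container type.
EXPR = ("add", "A", ("mul", "B", "C"))   # A + B * C


def evaluate(expr, env, backend):
    """Walk the expression tree, delegating each operation to `backend`."""
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    return backend[op](evaluate(lhs, env, backend),
                       evaluate(rhs, env, backend))


# Backend 1: elementwise operations on Python lists.
list_backend = {
    "add": lambda x, y: [p + q for p, q in zip(x, y)],
    "mul": lambda x, y: [p * q for p, q in zip(x, y)],
}

# Backend 2: elementwise operations on typed arrays of doubles.
array_backend = {
    "add": lambda x, y: array("d", map(add, x, y)),
    "mul": lambda x, y: array("d", map(mul, x, y)),
}

values = {"A": [1.0, 2.0], "B": [3.0, 4.0], "C": [5.0, 6.0]}

# The same expression, evaluated with two different containers.
as_lists = evaluate(EXPR, values, list_backend)
as_arrays = evaluate(
    EXPR, {name: array("d", v) for name, v in values.items()}, array_backend
)
```

The expression itself never changes; only the environment and the
backend passed to `evaluate` decide which container does the work.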

- JB

-- 
James Bergstra, Ph.D.
Research Scientist
Rowland Institute, Harvard University