[Numpy-discussion] low level optimization in NumPy and minivect
Wed Jun 19 06:45:42 CDT 2013
On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien <email@example.com> wrote:
> On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor
> <firstname.lastname@example.org> wrote:
>> On 17.06.2013 17:11, Frédéric Bastien wrote:
>> > Hi,
>> > I saw that Julian Taylor has recently been doing many low-level
>> > optimizations, such as using SSE instructions. I think that is great.
>> > Last year, Mark Florisson released the minivect project, which he
>> > worked on during his master's thesis. minivect is a compiler for
>> > element-wise expressions that does some of the same low-level
>> > optimizations that Julian is doing in NumPy right now.
>> > Mark wrote minivect in a way that allows it to be reused by other
>> > projects. It is now used by Cython and Numba, I think. I had planned to
>> > reuse it in Theano, but I haven't had the time to integrate it so far.
>> > What about reusing it in NumPy? I think that some of Julian's
>> > optimizations aren't in minivect (I didn't check to confirm). But from
>> > what I heard, minivect doesn't implement reductions, and there is a pull
>> > request to optimize this in NumPy.
>> what I vectorized is just the really easy cases of unit stride
>> contiguous operations, so the min/max reduction which is now in numpy
>> is in essence pretty trivial.
>> minivect goes much further in optimizing general strided access and
>> broadcasting via loop optimizations (it seems to have a lot of overlap
>> with the graphite loop optimizer available in GCC) so my code is
>> probably not of very much use to minivect.
>> The most interesting part in minivect for numpy is probably the
>> optimization of broadcasting loops, which seem to be pretty inefficient
>> in numpy.
>> Concerning the rest I'm not sure how much of a bottleneck general
>> strided operations really are in common numpy-using code.
>> I guess a similar discussion about adding an expression compiler to
>> numpy has already happened when numexpr was released?
>> If yes, what was the outcome of that?
> I don't recall a discussion when numexpr was released, as that was before I
> read this list. numexpr does an optimization that can't be done by NumPy:
> fusing element-wise operations into one call. So I don't see how it could be
> reused in NumPy.
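As a rough illustration of the fusion mentioned above: this is a pure-Python
sketch only, not numexpr's actual implementation, and the function names
`unfused` and `fused` are made up for the example. Evaluating a*b + c one
operation at a time makes one pass and one temporary per operation, while a
fused loop evaluates the whole expression per element in a single pass.

```python
def unfused(a, b, c):
    # One pass per operation, as when each element-wise op is a separate call.
    tmp = [x * y for x, y in zip(a, b)]     # temporary holding a*b
    return [t + z for t, z in zip(tmp, c)]  # second pass adds c


def fused(a, b, c):
    # Single pass and no intermediate list: the whole expression is
    # evaluated for each element before moving to the next one.
    return [x * y + z for x, y, z in zip(a, b, c)]


if __name__ == "__main__":
    a, b, c = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [0.5, 0.5, 0.5]
    print(unfused(a, b, c) == fused(a, b, c))
```

Both functions return the same values; the difference is only in how many
passes over memory are made, which is where the fused version would save time
on large arrays.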
> You call your optimizations trivial, but I don't. In the git log of NumPy,
> the first commit is from 2001. This is the first time someone has done this
> in 12 years! Also, it gives a 1.5-8x speedup (from memory, from your PR
> description). This is not negligible. But how much time did you spend on
> them? Also, some of them are processor dependent; how many people on this
> list have already done this? I suppose not many.
> Yes, your optimizations don't cover all the cases that minivect does. I see
> 2 levels of optimization: 1) the inner loop/contiguous cases, 2) the
> strided, broadcasted level. We don't need all the optimizations to be done
> for them to be useful. Any of them is useful.
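To make the two levels concrete, here is a minimal pure-Python sketch. It is
not NumPy or minivect code: the function `add_1d` and its convention of
strides counted in elements are assumptions made for this example. The
contiguous branch is the kind of loop that is easy to vectorize (level 1); the
general branch handles arbitrary strides, with a stride of 0 standing in for
broadcasting (level 2).

```python
def add_1d(out, a, b, n, a_stride=1, b_stride=1):
    """Compute out[i] = a[i * a_stride] + b[i * b_stride] for i in range(n).

    Strides are counted in elements (an assumption of this sketch);
    a stride of 0 keeps reusing one element, i.e. broadcasts it.
    """
    if a_stride == 1 and b_stride == 1:
        # Level 1: contiguous fast path, the easy case for SIMD in C.
        for i in range(n):
            out[i] = a[i] + b[i]
    else:
        # Level 2: general strided/broadcast path.
        ia = ib = 0
        for i in range(n):
            out[i] = a[ia] + b[ib]
            ia += a_stride
            ib += b_stride
    return out


if __name__ == "__main__":
    print(add_1d([0] * 3, [1, 2, 3], [4, 5, 6], 3))            # contiguous
    print(add_1d([0] * 3, [1, 2, 3], [10], 3, b_stride=0))     # broadcast b
    print(add_1d([0] * 3, [1, 2, 3, 4, 5, 6], [1, 1, 1], 3, a_stride=2))
```

Dispatching on the strides like this means the fast path can be improved
independently of the general path, which is the sense in which either level is
useful on its own.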
> So what I think is that we could reuse/share that work. NumPy has C code
> generators. They could call the minivect code generator for some of them
> when compiling NumPy. This would make optimizations done to those code
> generators reusable by more people. For example, when new processors are
> launched, we would need to change only 1 place for many projects. Or, for
> example, if the call to the MKL vector library is done there, more people
> will benefit from it. Right now, only numexpr does it.
> About the level 2 optimizations (strides, broadcast), I have never read the
> NumPy code that deals with that. Does someone who knows it have an idea
> whether it would be possible to reuse minivect for this?
Would someone be able to guide some of the numpy C experts into a room
to do some thinking / writing on this at the scipy conference?
I completely agree that these kinds of optimizations and code sharing
seem likely to be very important for the future.
I'm not at the conference, but if there's anything I can do to help,
please someone let me know.