# [Numpy-discussion] NEP for faster ufuncs

Travis Oliphant oliphant@enthought....
Thu Dec 23 23:24:21 CST 2010

```
This is very cool!

I would like to see this get into NumPy 2.0.

Thanks for all the great work!

-Travis

On Dec 21, 2010, at 6:53 PM, Mark Wiebe wrote:

> Hello NumPy-ers,
>
> After some performance analysis, I've designed and implemented a new iterator to speed up ufuncs and allow for easier multi-dimensional iteration.  The new code is fairly large, but works quite well already.  If some people could read the NEP and give some feedback, that would be great!  Here's a link:
>
>
> I would also love it if someone could try building the code and play around with it a bit.  The github branch is here:
>
>
> To give a taste of the iterator's functionality, below is an example from the NEP for how to implement a "Lambda UFunc."  With just a few lines of code, it's possible to replicate something similar to the numexpr library (numexpr still gets a bigger speedup, though).  In the example expression I chose, execution time went from 138ms to 61ms.
>
> Hopefully this is a good Christmas present for NumPy. :)
>
> Cheers,
> Mark
>
> Here is the definition of the ``luf`` function.::
>
>     def luf(lamdaexpr, *args, **kwargs):
>         """Lambda UFunc
>
>             e.g.
>             c = luf(lambda i,j:i+j, a, b, order='K',
>                                 casting='safe', buffersize=8192)
>
>             c = np.empty(...)
>             luf(lambda i,j:i+j, a, b, out=c, order='K',
>                                 casting='safe', buffersize=8192)
>         """
>
>         nargs = len(args)
>         op = args + (kwargs.get('out',None),)
>         it = np.newiter(op, ['buffered','no_inner_iteration'],
>                 order=kwargs.get('order','K'),
>                 casting=kwargs.get('casting','safe'),
>                 buffersize=kwargs.get('buffersize',0))
>         while not it.finished:
>             it[-1] = lamdaexpr(*it[:-1])
>             it.iternext()
>
>         return it.operands[-1]
>
> Then, by using ``luf`` instead of straight Python expressions, we
> can gain some performance from better cache behavior.::
>
>     In [2]: a = np.random.random((50,50,50,10))
>     In [3]: b = np.random.random((50,50,1,10))
>     In [4]: c = np.random.random((50,50,50,1))
>
>     In [5]: timeit 3*a+b-(a/c)
>     1 loops, best of 3: 138 ms per loop
>
>     In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c)
>     10 loops, best of 3: 60.9 ms per loop
>
>     In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, c))
>     Out[7]: True
>

---
Travis Oliphant
Enthought, Inc.
oliphant@enthought.com
1-512-536-1057
http://www.enthought.com

```
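The quoted example targets `np.newiter` as it existed in Mark's development branch. For readers trying it against a released NumPy, the following is a minimal sketch that assumes the iterator is available as `np.nditer` and that the `no_inner_iteration` flag corresponds to `external_loop`. The per-operand flags and the choice to put the output operand first are assumptions borrowed from the pattern in the `np.nditer` documentation, not something stated in the email.

```python
import numpy as np

def luf(lamdaexpr, *args, **kwargs):
    """Lambda UFunc sketch for released NumPy (assumes np.nditer).

    The output operand is placed first so that it can be addressed
    as it[0] and the inputs as it[1:].
    """
    nargs = len(args)
    op = (kwargs.get('out', None),) + args
    it = np.nditer(
        op,
        ['buffered', 'external_loop'],
        # Output: write-only, allocated by the iterator when out is None.
        # Inputs: read-only.
        [['writeonly', 'allocate']] + [['readonly']] * nargs,
        order=kwargs.get('order', 'K'),
        casting=kwargs.get('casting', 'safe'),
        buffersize=kwargs.get('buffersize', 0),
    )
    with it:
        while not it.finished:
            # Each it[k] is a 1-D chunk; evaluate the expression chunk-wise.
            it[0] = lamdaexpr(*it[1:])
            it.iternext()
        return it.operands[0]

# Smaller arrays than in the email, same broadcasting pattern.
a = np.random.random((5, 5, 5, 10))
b = np.random.random((5, 5, 1, 10))
c = np.random.random((5, 5, 5, 1))
result = luf(lambda a, b, c: 3*a + b - (a/c), a, b, c)
```

Timings will differ from the 138 ms and 60.9 ms figures quoted above, which were measured on the 2010 branch with larger arrays.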