[NumPy-Tickets] [NumPy] #1901: Speed up np.copyto with 'where=', and any routines using nditer in its masked mode
NumPy Trac
numpy-tickets@scipy....
Mon Jul 11 14:57:31 CDT 2011
#1901: Speed up np.copyto with 'where=', and any routines using nditer in its
masked mode
-------------------------+--------------------------------------------------
Reporter: mwiebe | Owner: somebody
Type: enhancement | Status: new
Priority: normal | Milestone: Unscheduled
Component: Other | Version: devel
Keywords: |
-------------------------+--------------------------------------------------
As part of the missing value functionality, I have implemented a masked
data copying and casting mechanism. It has all the hooks in place to be
fast, but I have only implemented a slow wrapper around the unmasked
routines. One example where it currently performs poorly is filling based
on a mask:
{{{
In [1]: import numpy as np
In [2]: a = np.zeros((100,100,100))
In [3]: m = np.random.rand(100,100,100) > 0.5
In [4]: timeit np.copyto(a, 1, where=m)
100 loops, best of 3: 9.22 ms per loop
In [5]: timeit np.putmask(a, m, 1)
100 loops, best of 3: 6.02 ms per loop
}}}
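The timing comparison above assumes the two calls do the same work. A minimal sketch to confirm that for this fill case follows; the helper name `masked_fill_results` and the fixed seed are illustrative assumptions, not part of NumPy or of the ticket:

```python
import numpy as np

def masked_fill_results(shape=(100, 100, 100), value=1, seed=0):
    """Fill two zero arrays through the same mask, once per routine.

    Hypothetical helper for illustration only.
    """
    rng = np.random.RandomState(seed)
    m = rng.rand(*shape) > 0.5          # boolean mask, roughly half True
    a = np.zeros(shape)
    b = np.zeros(shape)
    np.copyto(a, value, where=m)        # masked copy of a broadcast scalar
    np.putmask(b, m, value)             # the older routine used as baseline
    return a, b, m

a, b, m = masked_fill_results()
# Both routines should write `value` where m is True and leave the rest alone.
print(np.array_equal(a, b))
```

This only checks that the results agree; it says nothing about which internal code path either routine takes.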
To implement this optimization:
1) Learn how dtype_transfer.c works, by reading its code and the
documentation in comments. In particular, the function
PyArray_GetDTypeTransferFunction is the main thing to understand. Its
arguments are documented in private/lowlevel_strided_loops.h.
2) Learn how lowlevel_strided_loops.c.src works, just as for 1). In
particular, understand how zero strides, contiguous strides, and
alignment affect the nature of the inner loop functions.
3) Create specialized inner loops for various masked transfer functions.
Just specializing aligned data is probably ok. The important cases are
likely:
* contiguous src, dst, and mask
* zero stride src, contiguous dst and mask
* zero stride src, general strided dst and mask
This should be done for both straight data copies and cast operations.
4) Edit PyArray_GetMaskedDTypeTransferFunction to return these specialized
masked loops where appropriate, analogously to how
PyArray_GetDTypeTransferFunction does it.
5) Demonstrate that it's working with some before/after benchmarks of the
different cases.
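For step 5, one possible benchmark harness is sketched below. It builds Python-level inputs whose strides match the three cases listed in step 3 and times np.copyto on each. The function names `make_cases` and `time_cases`, the 1-D layout, and the array sizes are assumptions made for illustration. Whether NumPy actually selects a specialized loop for a given input is not visible from Python, so the numbers are only meaningful when compared before and after the change.

```python
import timeit
import numpy as np

def make_cases(n=1000000, seed=0):
    """Return {name: (dst, src, mask)} for the three stride layouts.

    Hypothetical helper; the layouts mirror the cases named in step 3.
    """
    rng = np.random.RandomState(seed)
    # A 0-d array broadcast to length n has stride 0 along that axis.
    zero_stride_src = np.broadcast_to(np.array(1.0), (n,))
    return {
        "contiguous src, dst, and mask": (
            np.zeros(n), rng.rand(n), rng.rand(n) > 0.5),
        "zero stride src, contiguous dst and mask": (
            np.zeros(n), zero_stride_src, rng.rand(n) > 0.5),
        # Taking every second element gives non-contiguous dst and mask.
        "zero stride src, general strided dst and mask": (
            np.zeros(2 * n)[::2], zero_stride_src,
            (rng.rand(2 * n) > 0.5)[::2]),
    }

def time_cases(cases, repeat=5, number=10):
    """Best-of-`repeat` seconds per np.copyto call for each case."""
    results = {}
    for name, (dst, src, mask) in cases.items():
        timer = timeit.Timer(lambda: np.copyto(dst, src, where=mask))
        results[name] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results

for name, seconds in time_cases(make_cases()).items():
    print("%-48s %.3f ms" % (name, seconds * 1e3))
```

Running the same script against builds with and without the specialized loops gives the before/after comparison. Cast operations could be covered the same way by giving src a dtype different from dst.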
--
Ticket URL: <http://projects.scipy.org/numpy/ticket/1901>
NumPy <http://projects.scipy.org/numpy>