[NumPy-Tickets] [NumPy] #1901: Speed up np.copyto with 'where=', and any routines using nditer in its masked mode

NumPy Trac numpy-tickets@scipy....
Mon Jul 11 14:57:31 CDT 2011


#1901: Speed up np.copyto with 'where=', and any routines using nditer in its
masked mode
-------------------------+--------------------------------------------------
 Reporter:  mwiebe       |       Owner:  somebody   
     Type:  enhancement  |      Status:  new        
 Priority:  normal       |   Milestone:  Unscheduled
Component:  Other        |     Version:  devel      
 Keywords:               |  
-------------------------+--------------------------------------------------
 As part of the missing value functionality, I have implemented a masked
 data copying and casting mechanism. It has all the hooks in place to be
 fast, but I have only implemented a slow wrapper around the unmasked
 routines. One example where it currently performs poorly is filling based
 on a mask:

 {{{
 In [1]: import numpy as np
 In [2]: a = np.zeros((100,100,100))
 In [3]: m = np.random.rand(100,100,100) > 0.5

 In [4]: timeit np.copyto(a, 1, where=m)
 100 loops, best of 3: 9.22 ms per loop

 In [5]: timeit np.putmask(a, m, 1)
 100 loops, best of 3: 6.02 ms per loop
 }}}
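 A standalone script can reproduce the comparison above outside IPython and check that the two calls produce the same array. It also times np.copyto(..., where=...) for the three stride patterns listed in step 3. Several choices belong to this sketch only: a broadcast Python scalar stands in for the zero-stride source, and `[:, :, ::2]` slicing produces a strided dst and mask. np.random.default_rng requires NumPy >= 1.17. No timings are claimed here, because they depend on the machine and the NumPy version.

```python
import timeit

import numpy as np

shape = (100, 100, 100)
rng = np.random.default_rng(0)
m = rng.random(shape) > 0.5

# Semantics check: both calls write 1 wherever m is True and leave 0 elsewhere.
a1 = np.zeros(shape)
np.copyto(a1, 1, where=m)
a2 = np.zeros(shape)
np.putmask(a2, m, 1)
assert np.array_equal(a1, a2)

# Stride patterns from step 3, as (dst, src, mask) triples. Passing a Python
# scalar as src lets copyto broadcast it (assumed here to give a zero-stride src).
wide = (100, 100, 200)
cases = {
    "contiguous src, dst, mask": (np.zeros(shape), np.ones(shape), m),
    "zero-stride src, contiguous dst/mask": (np.zeros(shape), 1.0, m),
    "zero-stride src, strided dst/mask": (
        np.zeros(wide)[:, :, ::2],             # non-contiguous view, shape (100, 100, 100)
        1.0,
        (rng.random(wide) > 0.5)[:, :, ::2],   # matching strided mask
    ),
}

timings = {}
n = 20
for name, (dst, src, mask) in cases.items():
    # Bind the current operands as defaults so the lambda does not late-bind.
    stmt = lambda d=dst, s=src, k=mask: np.copyto(d, s, where=k)
    timings[name] = timeit.timeit(stmt, number=n) / n * 1e3  # ms per call
    print(f"{name}: {timings[name]:.2f} ms per loop")
```

 Running this before and after a change gives the per-case comparison that step 5 asks for.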

 To implement this optimization:

 1) Learn how dtype_transfer.c works by reading its code and the
 documentation in its comments. In particular, PyArray_GetDTypeTransferFunction
 is the main function to understand. Its arguments are documented in
 private/lowlevel_strided_loops.h.

 2) Learn how lowlevel_strided_loops.c.src works, in the same way as for 1).
 In particular, understand how zero strides, contiguous strides, and
 alignment affect the nature of the inner loop functions.

 3) Create specialized inner loops for various masked transfer functions.
 Specializing only for aligned data is probably sufficient. The important
 cases are likely:
   * contiguous src, dst, and mask
   * zero stride src, contiguous dst and mask
   * zero stride src, general strided dst and mask
 This should be done for both straight data copies and cast operations.

 4) Edit PyArray_GetMaskedDTypeTransferFunction to return these specialized
 masked loops where appropriate, analogously to how
 PyArray_GetDTypeTransferFunction does it.

 5) Demonstrate that it's working with some before/after benchmarks of the
 different cases.
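 The loop specialization in step 3 and the loop selection in step 4 can be modeled in Python. This is only a sketch, and the following are assumptions made for illustration rather than the actual C signatures in private/lowlevel_strided_loops.h:
 * the function names;
 * the (dst, dst_stride, src, src_stride, mask, mask_stride, n) argument order;
 * the use of element strides, so that "contiguous" means a stride of 1, whereas the C loops would use the itemsize in bytes.
 Casting and alignment are not modeled.

```python
import numpy as np

# Hypothetical model of masked strided inner loops. Each loop copies
# src[i * src_stride] to dst[i * dst_stride] for i in range(n) wherever
# mask[i * mask_stride] is nonzero. Strides are in elements (not bytes)
# and assumed non-negative.


def masked_copy_general(dst, dst_stride, src, src_stride, mask, mask_stride, n):
    """Reference loop: any strides, one element at a time."""
    for i in range(n):
        if mask[i * mask_stride]:
            dst[i * dst_stride] = src[i * src_stride]


def masked_copy_contiguous(dst, dst_stride, src, src_stride, mask, mask_stride, n):
    """Contiguous src, dst, and mask: one vectorized masked assignment."""
    sel = mask[:n].astype(bool)
    dst[:n][sel] = src[:n][sel]


def masked_fill_contiguous(dst, dst_stride, src, src_stride, mask, mask_stride, n):
    """Zero-stride src, contiguous dst and mask: the src value is read once."""
    dst[:n][mask[:n].astype(bool)] = src[0]


def masked_fill_strided(dst, dst_stride, src, src_stride, mask, mask_stride, n):
    """Zero-stride src, general strided dst and mask."""
    idx = np.arange(n)
    sel = mask[idx * mask_stride].astype(bool)
    dst[idx[sel] * dst_stride] = src[0]


def get_masked_transfer_function(src_stride, dst_stride, mask_stride):
    """Hypothetical stand-in for the loop selection described in step 4:
    pick a specialized loop from the strides, else use the general one."""
    if src_stride == 0:
        if dst_stride == 1 and mask_stride == 1:
            return masked_fill_contiguous
        return masked_fill_strided
    if src_stride == 1 and dst_stride == 1 and mask_stride == 1:
        return masked_copy_contiguous
    return masked_copy_general


# Usage: fill a single value into every position where the mask is set.
dst = np.zeros(8)
mask = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
src = np.array([5.0])
loop = get_masked_transfer_function(src_stride=0, dst_stride=1, mask_stride=1)
loop(dst, 1, src, 0, mask, 1, 8)
print(dst)  # [5. 0. 5. 5. 0. 0. 5. 0.]
```

 In the common cases, the specialized variants avoid per-element stride arithmetic, and for a zero-stride source they also avoid re-reading the source value. In C, these would be separate inner loops chosen once, before iteration starts.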

-- 
Ticket URL: <http://projects.scipy.org/numpy/ticket/1901>
NumPy <http://projects.scipy.org/numpy>