[Numpy-tickets] [NumPy] #517: Mean, median, and std dev need optimization in numpy

NumPy numpy-tickets@scipy....
Thu May 10 15:30:59 CDT 2007


#517: Mean, median, and std dev need optimization in numpy
-----------------------+----------------------------------------------------
 Reporter:  cwarner62  |       Owner:  somebody
     Type:  defect     |      Status:  new     
 Priority:  normal     |   Milestone:          
Component:  Other      |     Version:  none    
 Severity:  normal     |    Keywords:          
-----------------------+----------------------------------------------------
 I have been using numpy/scipy while writing software to deal with sets of
 large (2048x2048) astronomical images and have discovered that the built-
 in mean, median, and std methods for numpy are very slow -- much slower
 than calculating those quantities by hand.  There is no reason for this
 and I am not sure why they were written in such a way as to be so slow but
 for instance m = reduce(add,z)/z.shape[0] is nearly twice as fast as m =
 z.mean(0).  See below for full results from some time tests I ran:

 >>> z.shape
 (10, 2048, 2048)
 >>> z
 array([[[ 186.,  154.,  134., ...,  385.,  359.,  251.],
         [ 232.,  208.,  210., ...,  344.,  333.,  220.],
         [ 236.,  208.,  226., ...,  330.,  324.,  226.],
         ...,
         [ 428.,  400.,  475., ...,  334.,  304.,  371.],
         [ 447.,  488.,  455., ...,  296.,  285.,  340.],
         [ 495.,  480.,  533., ...,  236.,  226.,  275.]],

        [[ 138.,  119.,  111., ...,  309.,  278.,  189.],
         [ 185.,  168.,  183., ...,  289.,  269.,  176.],
         [ 212.,  175.,  188., ...,  280.,  282.,  188.],
         ...,
         [ 360.,  346.,  398., ...,  277.,  246.,  275.],
         [ 362.,  407.,  378., ...,  258.,  233.,  241.],
         [ 382.,  381.,  424., ...,  176.,  175.,  209.]],

        [[ 136.,  117.,  106., ...,  304.,  268.,  182.],
         [ 174.,  169.,  180., ...,  285.,  275.,  169.],
         [ 199.,  167.,  183., ...,  280.,  278.,  188.],
         ...,
         [ 352.,  349.,  398., ...,  279.,  245.,  267.],
         [ 367.,  410.,  378., ...,  250.,  227.,  251.],
         [ 379.,  373.,  418., ...,  188.,  186.,  200.]],

        ...,
        [[ 144.,  115.,  101., ...,  304.,  274.,  193.],
         [ 183.,  168.,  173., ...,  285.,  274.,  171.],
         [ 209.,  169.,  184., ...,  273.,  278.,  182.],
         ...,
         [ 356.,  346.,  395., ...,  272.,  244.,  264.],
         [ 361.,  412.,  377., ...,  244.,  235.,  243.],
         [ 384.,  378.,  428., ...,  194.,  186.,  208.]],

        [[ 133.,  117.,  107., ...,  295.,  273.,  183.],
         [ 180.,  167.,  185., ...,  286.,  273.,  174.],
         [ 199.,  168.,  189., ...,  277.,  285.,  189.],
         ...,
         [ 362.,  344.,  399., ...,  283.,  243.,  262.],
         [ 362.,  408.,  383., ...,  249.,  227.,  242.],
         [ 377.,  376.,  421., ...,  181.,  175.,  204.]],

        [[ 139.,  121.,  105., ...,  309.,  273.,  189.],
         [ 177.,  174.,  181., ...,  289.,  280.,  177.],
         [ 209.,  177.,  185., ...,  273.,  273.,  183.],
         ...,
         [ 353.,  339.,  396., ...,  264.,  248.,  266.],
         [ 367.,  414.,  386., ...,  246.,  228.,  249.],
         [ 387.,  373.,  423., ...,  183.,  178.,  202.]]])

 1. Mean is slower than reduce(add)
 >>> t = time.time(); m = z.mean(0); print time.time()-t
 1.68424701691
 >>> m
 array([[ 142.4,  121.9,  108.8, ...,  311.9,  279.9,  193.2],
        [ 184.4,  173.9,  181.9, ...,  293.1,  280.4,  179.8],
        [ 206.5,  174.1,  190.9, ...,  284.2,  284.7,  190.7],
        ...,
        [ 365.2,  349.9,  406.3, ...,  279.7,  255.3,  276.4],
        [ 373. ,  418.6,  389.1, ...,  254.1,  237.7,  252.9],
        [ 392.3,  385.4,  433.4, ...,  190.3,  183.3,  209.8]])
 >>> t = time.time(); m = reduce(add,z)/z.shape[0]; print time.time()-t
 1.00891780853
 >>> m
 array([[ 142.4,  121.9,  108.8, ...,  311.9,  279.9,  193.2],
        [ 184.4,  173.9,  181.9, ...,  293.1,  280.4,  179.8],
        [ 206.5,  174.1,  190.9, ...,  284.2,  284.7,  190.7],
        ...,
        [ 365.2,  349.9,  406.3, ...,  279.7,  255.3,  276.4],
        [ 373. ,  418.6,  389.1, ...,  254.1,  237.7,  252.9],
        [ 392.3,  385.4,  433.4, ...,  190.3,  183.3,  209.8]])

 2. Std Dev is much slower than calculating by hand:
 >>> t = time.time(); s = z.std(0); print time.time()-t
 16.9463210106
 >>> s
 array([[ 14.90771612,  11.42322196,   8.87468309, ...,  24.65542537,
          26.60996054,  19.55914109],
        [ 16.13815355,  11.81058847,  10.22203502, ...,  17.3634674 ,
          18.25486237,  13.99142595],
        [ 10.92931837,  11.86970935,  12.16100325, ...,  16.22837022,
          13.55027675,  12.09173271],
        ...,
        [ 21.27345764,  16.87275911,  23.2467202 , ...,  18.83640093,
          17.51028269,  31.8031445 ],
        [ 24.95195383,  23.26886332,  22.26858774, ...,  14.54269576,
          16.81695573,  29.48711583],
        [ 34.40363353,  31.65185619,  33.29324256, ...,  16.14341971,
          15.36912489,  22.18017132]])
 >>> t = time.time(); m = reduce(add,z)/z.shape[0]; s =
 sqrt(reduce(add,z*z)*(1./z.shape[0])-m*m); print time.time()-t
 2.93472313881
 >>> s
 array([[ 14.90771612,  11.42322196,   8.87468309, ...,  24.65542537,
          26.60996054,  19.55914109],
        [ 16.13815355,  11.81058847,  10.22203502, ...,  17.3634674 ,
          18.25486237,  13.99142595],
        [ 10.92931837,  11.86970935,  12.16100325, ...,  16.22837022,
          13.55027675,  12.09173271],
        ...,
        [ 21.27345764,  16.87275911,  23.2467202 , ...,  18.83640093,
          17.51028269,  31.8031445 ],
        [ 24.95195383,  23.26886332,  22.26858774, ...,  14.54269576,
          16.81695573,  29.48711583],
        [ 34.40363353,  31.65185619,  33.29324256, ...,  16.14341971,
          15.36912489,  22.18017132]])

 3. Median slower than sorting and taking middle value
 >>> t = time.time(); m = median(z); print time.time()-t
 18.8376431465
 >>> m
 array([[ 138. ,  120. ,  106.5, ...,  304. ,  273. ,  188. ],
        [ 179. ,  169. ,  180.5, ...,  287. ,  274.5,  176.5],
        [ 202. ,  169.5,  188. , ...,  280. ,  281. ,  188. ],
        ...,
        [ 360.5,  344. ,  398.5, ...,  274.5,  248. ,  265. ],
        [ 366.5,  410.5,  381.5, ...,  249.5,  233.5,  242.5],
        [ 382. ,  374.5,  422.5, ...,  187. ,  179. ,  203. ]])

 >>> from arraymedian import *
 >>> t = time.time(); m = arraymedian(z,axis="Y"); print time.time()-t
 11.0808348656
 >>> m
 array([[ 138. ,  120. ,  106.5, ...,  304. ,  273. ,  188. ],
        [ 179. ,  169. ,  180.5, ...,  287. ,  274.5,  176.5],
        [ 202. ,  169.5,  188. , ...,  280. ,  281. ,  188. ],
        ...,
        [ 360.5,  344. ,  398.5, ...,  274.5,  248. ,  265. ],
        [ 366.5,  410.5,  381.5, ...,  249.5,  233.5,  242.5],
        [ 382. ,  374.5,  422.5, ...,  187. ,  179. ,  203. ]])


 My arraymedian method simply sorts and takes the middle value.  I can send
 it to you if you wish.  I have also noticed that doing this is still a
 factor of 2.5 slower than IDL's median method.  Perhaps the built-in sort
 is not optimized to do a quick sort?  Thanks.

 Craig

 warner@astro.ufl.edu

-- 
Ticket URL: <http://projects.scipy.org/scipy/numpy/ticket/517>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.


More information about the Numpy-tickets mailing list