[Numpy-tickets] [NumPy] #517: Mean, median, and std dev need optimization in numpy
NumPy
numpy-tickets@scipy....
Thu May 10 15:30:59 CDT 2007
#517: Mean, median, and std dev need optimization in numpy
-----------------------+----------------------------------------------------
Reporter: cwarner62 | Owner: somebody
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version: none
Severity: normal | Keywords:
-----------------------+----------------------------------------------------
I have been using numpy/scipy while writing software to deal with sets of
large (2048x2048) astronomical images and have discovered that the built-
in mean, median, and std methods for numpy are very slow -- much slower
than calculating those quantities by hand. There is no reason for this
and I am not sure why they were written in such a way as to be so slow but
for instance m = reduce(add,z)/z.shape[0] is nearly twice as fast as m =
z.mean(0). See below for full results from some time tests I ran:
>>> z.shape
(10, 2048, 2048)
>>> z
array([[[ 186., 154., 134., ..., 385., 359., 251.],
[ 232., 208., 210., ..., 344., 333., 220.],
[ 236., 208., 226., ..., 330., 324., 226.],
...,
[ 428., 400., 475., ..., 334., 304., 371.],
[ 447., 488., 455., ..., 296., 285., 340.],
[ 495., 480., 533., ..., 236., 226., 275.]],
[[ 138., 119., 111., ..., 309., 278., 189.],
[ 185., 168., 183., ..., 289., 269., 176.],
[ 212., 175., 188., ..., 280., 282., 188.],
...,
[ 360., 346., 398., ..., 277., 246., 275.],
[ 362., 407., 378., ..., 258., 233., 241.],
[ 382., 381., 424., ..., 176., 175., 209.]],
[[ 136., 117., 106., ..., 304., 268., 182.],
[ 174., 169., 180., ..., 285., 275., 169.],
[ 199., 167., 183., ..., 280., 278., 188.],
...,
[ 352., 349., 398., ..., 279., 245., 267.],
[ 367., 410., 378., ..., 250., 227., 251.],
[ 379., 373., 418., ..., 188., 186., 200.]],
...,
[[ 144., 115., 101., ..., 304., 274., 193.],
[ 183., 168., 173., ..., 285., 274., 171.],
[ 209., 169., 184., ..., 273., 278., 182.],
...,
[ 356., 346., 395., ..., 272., 244., 264.],
[ 361., 412., 377., ..., 244., 235., 243.],
[ 384., 378., 428., ..., 194., 186., 208.]],
[[ 133., 117., 107., ..., 295., 273., 183.],
[ 180., 167., 185., ..., 286., 273., 174.],
[ 199., 168., 189., ..., 277., 285., 189.],
...,
[ 362., 344., 399., ..., 283., 243., 262.],
[ 362., 408., 383., ..., 249., 227., 242.],
[ 377., 376., 421., ..., 181., 175., 204.]],
[[ 139., 121., 105., ..., 309., 273., 189.],
[ 177., 174., 181., ..., 289., 280., 177.],
[ 209., 177., 185., ..., 273., 273., 183.],
...,
[ 353., 339., 396., ..., 264., 248., 266.],
[ 367., 414., 386., ..., 246., 228., 249.],
[ 387., 373., 423., ..., 183., 178., 202.]]])
1. Mean is slower than reduce(add)
>>> t = time.time(); m = z.mean(0); print time.time()-t
1.68424701691
>>> m
array([[ 142.4, 121.9, 108.8, ..., 311.9, 279.9, 193.2],
[ 184.4, 173.9, 181.9, ..., 293.1, 280.4, 179.8],
[ 206.5, 174.1, 190.9, ..., 284.2, 284.7, 190.7],
...,
[ 365.2, 349.9, 406.3, ..., 279.7, 255.3, 276.4],
[ 373. , 418.6, 389.1, ..., 254.1, 237.7, 252.9],
[ 392.3, 385.4, 433.4, ..., 190.3, 183.3, 209.8]])
>>> t = time.time(); m = reduce(add,z)/z.shape[0]; print time.time()-t
1.00891780853
>>> m
array([[ 142.4, 121.9, 108.8, ..., 311.9, 279.9, 193.2],
[ 184.4, 173.9, 181.9, ..., 293.1, 280.4, 179.8],
[ 206.5, 174.1, 190.9, ..., 284.2, 284.7, 190.7],
...,
[ 365.2, 349.9, 406.3, ..., 279.7, 255.3, 276.4],
[ 373. , 418.6, 389.1, ..., 254.1, 237.7, 252.9],
[ 392.3, 385.4, 433.4, ..., 190.3, 183.3, 209.8]])
2. Std Dev is much slower than calculating by hand:
>>> t = time.time(); s = z.std(0); print time.time()-t
16.9463210106
>>> s
array([[ 14.90771612, 11.42322196, 8.87468309, ..., 24.65542537,
26.60996054, 19.55914109],
[ 16.13815355, 11.81058847, 10.22203502, ..., 17.3634674 ,
18.25486237, 13.99142595],
[ 10.92931837, 11.86970935, 12.16100325, ..., 16.22837022,
13.55027675, 12.09173271],
...,
[ 21.27345764, 16.87275911, 23.2467202 , ..., 18.83640093,
17.51028269, 31.8031445 ],
[ 24.95195383, 23.26886332, 22.26858774, ..., 14.54269576,
16.81695573, 29.48711583],
[ 34.40363353, 31.65185619, 33.29324256, ..., 16.14341971,
15.36912489, 22.18017132]])
>>> t = time.time(); m = reduce(add,z)/z.shape[0]; s =
sqrt(reduce(add,z*z)*(1./z.shape[0])-m*m); print time.time()-t
2.93472313881
>>> s
array([[ 14.90771612, 11.42322196, 8.87468309, ..., 24.65542537,
26.60996054, 19.55914109],
[ 16.13815355, 11.81058847, 10.22203502, ..., 17.3634674 ,
18.25486237, 13.99142595],
[ 10.92931837, 11.86970935, 12.16100325, ..., 16.22837022,
13.55027675, 12.09173271],
...,
[ 21.27345764, 16.87275911, 23.2467202 , ..., 18.83640093,
17.51028269, 31.8031445 ],
[ 24.95195383, 23.26886332, 22.26858774, ..., 14.54269576,
16.81695573, 29.48711583],
[ 34.40363353, 31.65185619, 33.29324256, ..., 16.14341971,
15.36912489, 22.18017132]])
3. Median slower than sorting and taking middle value
>>> t = time.time(); m = median(z); print time.time()-t
18.8376431465
>>> m
array([[ 138. , 120. , 106.5, ..., 304. , 273. , 188. ],
[ 179. , 169. , 180.5, ..., 287. , 274.5, 176.5],
[ 202. , 169.5, 188. , ..., 280. , 281. , 188. ],
...,
[ 360.5, 344. , 398.5, ..., 274.5, 248. , 265. ],
[ 366.5, 410.5, 381.5, ..., 249.5, 233.5, 242.5],
[ 382. , 374.5, 422.5, ..., 187. , 179. , 203. ]])
>>> from arraymedian import *
>>> t = time.time(); m = arraymedian(z,axis="Y"); print time.time()-t
11.0808348656
>>> m
array([[ 138. , 120. , 106.5, ..., 304. , 273. , 188. ],
[ 179. , 169. , 180.5, ..., 287. , 274.5, 176.5],
[ 202. , 169.5, 188. , ..., 280. , 281. , 188. ],
...,
[ 360.5, 344. , 398.5, ..., 274.5, 248. , 265. ],
[ 366.5, 410.5, 381.5, ..., 249.5, 233.5, 242.5],
[ 382. , 374.5, 422.5, ..., 187. , 179. , 203. ]])
My arraymedian method simply sorts and takes the middle value. I can send
it to you if you wish. I have also noticed that doing this is still a
factor of 2.5 slower than IDL's median method. Perhaps the built-in sort
is not optimized to do a quick sort? Thanks.
Craig
warner@astro.ufl.edu
--
Ticket URL: <http://projects.scipy.org/scipy/numpy/ticket/517>
NumPy <http://projects.scipy.org/scipy/numpy>
The fundamental package needed for scientific computing with Python.
More information about the Numpy-tickets
mailing list