[Numpy-discussion] Numpy vs PIL in image statistics

David Cournapeau david@ar.media.kyoto-u.ac...
Thu May 28 03:40:04 CDT 2009


cp wrote:
>>> The image I tested initially is a 2000x2000 RGB TIFF, ~11 MB in size.
>>>       
> I continued testing with the initial PIL approach
> and three alternative numpy scripts:
>
> #Script 1 - indexing
> for i in range(10):
>     imarr[:,:,0].mean()
>     imarr[:,:,1].mean()
>     imarr[:,:,2].mean()
>
> #Script 2 - slicing
> for i in range(10):
>     imarr[:,:,0:1].mean()
>     imarr[:,:,1:2].mean()
>     imarr[:,:,2:3].mean()
>
> #Script 3 - reshape
> for i in range(10):
>     imarr.reshape(-1,3).mean(axis=0)
>
> #Script 4 - PIL
> for i in range(10):
>     stats = ImageStat.Stat(img)
>     stats.mean
>
> After profiling the four scripts separately, I got the following timings:
> script 1: 5.432sec
> script 2: 10.234sec
> script 3: 4.980sec
> script 4: 0.741sec
>
> When I profiled scripts 1-3 without calculating the mean, I got
> similar results of about 0.45sec for 1000 cycles, meaning that even
> if a copy is involved, it takes only a small fraction of the whole
> procedure. Getting back to my initial statement: I cannot explain why
> PIL is very fast in calculations over whole images, but very slow in
> calculations over small sub-images.
>   

I don't know anything about PIL and its implementation, but I would not
be surprised if the cost is mostly in accessing items which are not
contiguous in memory and in bounds checking (to know where you are in
the sub-image). Conditionals inside loops often kill performance, and
the actual computation (one addition per item for a naive average
implementation) is negligible in this case.
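
As a minimal sketch of the memory-layout effect (a stand-in array of
the same shape as the test image, not cp's actual scripts), you can see
it in numpy itself:

import numpy as np
import timeit

imarr = np.zeros((2000, 2000, 3), dtype=np.uint8)  # stand-in image

strided = imarr[:, :, 0]                # view: items 3 bytes apart
contig = np.ascontiguousarray(strided)  # copy: items adjacent

print(timeit.timeit(strided.mean, number=10))  # strided access
print(timeit.timeit(contig.mean, number=10))   # contiguous access

On most machines the contiguous mean should be noticeably faster, even
though both loops perform the same number of additions; the difference
is purely the memory access pattern.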

cheers,

David

