[Numpy-discussion] Speed performance on array constant set
Travis Oliphant
oliphant.travis at ieee.org
Fri Jan 20 08:42:01 CST 2006
Mark Heslep wrote:
> Travis Oliphant wrote:
>
>> This is actually a bit surprising that opencv can create and fill so
>> quickly. Perhaps they are using optimized SSE functions for the
>> Intel platform, or something?
>> -Travis
>>
> Ah, sorry, I'm an unintentional fraud. Yes, I have Intel's optimization
> library (IPP) turned on and had forgotten about it. So one more time:
>
> With IPP on, as before. cvUseOptimized returns the number of Cv
> functions available with IPP:
>
>> python -m timeit -s "import opencv.cv as cv; print
>> cv.cvUseOptimized(1); im =cv.cvCreateImage(cv.cvSize(1000,1000), 8,
>> 1)" "cv.cvSet( im, cv.cvRealScalar( 7 ) )"
>> 305
>> 305
>> 305
>> 305
>> 305
>> 100 loops, best of 3: 2.24 msec per loop
>
>
> And without:
>
>> python -m timeit -s "import opencv.cv as cv; print
>> cv.cvUseOptimized(0); im =cv.cvCreateImage(cv.cvSize(1000,1000), 8,
>> 1)" "cv.cvSet( im, cv.cvRealScalar( 7 ) )"
>> 0
>> 0
>> 0
>> 0
>> 0
>> 100 loops, best of 3: 6.94 msec per loop
>
>
> So IPP gives me roughly a 3x speedup (6.94 msec down to 2.24 msec), which
> leads me to ask about plans for IPP / SSE support in NumPy, no offense
> intended to non-Intel users. I believe I recall a post saying that
> automatic code generation in Numarray was the roadblock?
There was some talk of using liboil for this (similar to what _dotblas
does). There could definitely be some gains. I don't see any roadblock
other than time and effort...
In my own tests against a ctypes-wrapped function that just mallocs memory
and fills it, numpy comes out about 3x slower than that function.
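For readers who want to try a comparison of this kind, here is a minimal sketch. It is not the test code referred to above, which was not posted. It assumes a POSIX-like system where ctypes can reach the C library's malloc, memset and free, and it uses memset as a stand-in for "fills it". The helper names (c_alloc_and_fill, numpy_alloc_and_fill, best_msec) are made up for illustration, and the buffer size simply mirrors the 1000x1000 8-bit image from the quoted benchmark.

```python
import ctypes
import ctypes.util
import timeit

import numpy as np

# Assumption: a POSIX-like platform. If find_library returns None,
# CDLL(None) exposes the symbols already loaded into the process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.memset.restype = ctypes.c_void_p
libc.memset.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

NBYTES = 1000 * 1000  # one byte per pixel, as in the 1000x1000 8-bit image


def c_alloc_and_fill(value=7, nbytes=NBYTES):
    """malloc a buffer and memset it; the caller must free the pointer."""
    ptr = libc.malloc(nbytes)
    if not ptr:
        raise MemoryError("malloc failed")
    libc.memset(ptr, value, nbytes)
    return ptr


def numpy_alloc_and_fill(value=7, nbytes=NBYTES):
    """Allocate an uninitialized uint8 array and fill it with a scalar."""
    arr = np.empty(nbytes, dtype=np.uint8)
    arr.fill(value)
    return arr


def c_round_trip():
    """Allocate, fill and release, so repeated timing does not leak."""
    libc.free(c_alloc_and_fill())


def best_msec(func, number=100, repeat=3):
    """Best-of-`repeat` time per call, in milliseconds."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number * 1e3


if __name__ == "__main__":
    print("ctypes malloc+memset: %.3f msec" % best_msec(c_round_trip))
    print("numpy empty+fill:     %.3f msec" % best_msec(numpy_alloc_and_fill))
```

The ratio will depend on the platform, the allocator and the numpy version, so the numbers printed here should not be expected to reproduce the 3x figure.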
The scalar fill function of numpy could definitely be made faster.
Right now, it's still pretty generic.
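As a rough way to see how the scalar fill behaves on a given installation, the sketch below times two equivalent ways of setting every element of an existing array to a constant: the fill method and assignment through an ellipsis index. The function names and the time_fills helper are illustrative only, and the array shape and dtype are chosen to match the 1000x1000 8-bit case discussed in this thread.

```python
import timeit

import numpy as np


def fill_with_method(arr, value):
    """Set every element using ndarray.fill."""
    arr.fill(value)


def fill_with_assignment(arr, value):
    """Set every element using broadcast assignment."""
    arr[...] = value


def time_fills(shape=(1000, 1000), dtype=np.uint8, value=7,
               number=100, repeat=3):
    """Return best-of-`repeat` msec per call for each fill variant."""
    arr = np.empty(shape, dtype=dtype)  # allocated once, outside the timing
    results = {}
    for name, func in (("ndarray.fill", fill_with_method),
                       ("arr[...] = value", fill_with_assignment)):
        best = min(timeit.repeat(lambda: func(arr, value),
                                 number=number, repeat=repeat))
        results[name] = best / number * 1e3
    return results


if __name__ == "__main__":
    for name, msec in time_fills().items():
        print("%-18s %.3f msec per loop" % (name, msec))
```

Because the array is created once before timing, this measures only the fill itself, which is the closest analogue to the cvSet call timed in the quoted message.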
-Travis