[Numpy-discussion] Speed performance on array constant set

Travis Oliphant oliphant.travis at ieee.org
Mon Jan 23 20:42:02 CST 2006

Mark Heslep wrote:

> Travis Oliphant wrote:
>>> 1. numdata.h  example that works well for simple Cv structures  
>>> containing uniform types:  CvPoint2D32F  => struct { float x, float 
>>> y }.  Can I use a record array here, or is there some Numpy overhead 
>>> interlaced with fields?
>> You can use a record array, but for uniform types I don't know what 
>> the advantage is (except for perhaps named-field slicing---which may 
>> actually be faster I'm not sure) over a x2 float array.
> So that I can step through the array record by record and not field by 
> field.

Right, but you could do that using steps of length 2 if you want to 
manage it yourself (or use a complex data type).    Again, I'm not sure 
of the performance implications at this point, as I haven't done enough 
testing.
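To make the alternatives concrete, here is a small sketch using the modern NumPy API (not the 2006-era one being discussed): the same four (x, y) float32 points held as a record array with named fields, as a plain (N, 2) array stepped row by row, and as a complex view. The `CvPoint2D32f`-style field names are taken from the OpenCV structure mentioned above.

```python
import numpy as np

# Record array: two float32 fields per point, like CvPoint2D32f.
pts_rec = np.zeros(4, dtype=[('x', np.float32), ('y', np.float32)])
pts_rec['x'] = [0, 1, 2, 3]   # named-field access
pts_rec['y'] = 10

# The same data as a plain (4, 2) float32 array: one point per row,
# so iterating over rows steps record by record, not field by field.
pts_2d = np.zeros((4, 2), dtype=np.float32)
pts_2d[:, 0] = [0, 1, 2, 3]
pts_2d[:, 1] = 10

# A complex view packs (x, y) into one scalar per point -- the
# "complex data type" trick for stepping a point at a time.
pts_c = pts_2d.view(np.complex64).ravel()
print(pts_c[1])   # (1+10j)
```

All three layouts describe the same bytes; which is fastest to traverse is exactly the open performance question above.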

>>> 2. numarray.h  Attempt to replace the main Cv Image structures 
>>> CvMat, IplImage.  Needs work. Some success but there's segfault or 
>>> two in there somewhere.
>> These can be ported fairly easily, I think (actually, the old Numeric 
>> typemaps would still work --- and work with Numarray), so the basic 
>> Numeric C-API still presents itself as the simplest way to support 
>> all the array packages.
> Well, good to know.  I've been proceeding directly from the NumArray 
> User's Manual 1.5 by Greenfield/Miller et al., which describes the NA 
> High Level API as the fastest of the APIs available to NA.  I thought 
> that NA also took steps to ensure that unnecessary mem copies were not 
> made, unlike Numeric?

Numeric doesn't make unnecessary copies either.  It depends on what you 
mean by necessary.  I think they may be referring to the fact that most 
people used ContiguousFromObject, which always gave you contiguous memory 
from an array (copying if necessary); but if you learn how to access 
strided memory, then FromObject (which would never copy) was just fine. 
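The same distinction is easy to see from Python with the modern NumPy API (a sketch, not the C-API calls being discussed): taking a column gives a strided view that shares memory, as FromObject would, while forcing contiguity copies, as ContiguousFromObject would.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

# A column is a strided view: no copy, analogous to FromObject.
col = a[:, 1]
col[0] = 99.0              # writes through to a
assert a[0, 1] == 99.0

# Forcing contiguous memory copies the strided data,
# analogous to ContiguousFromObject.
c = np.ascontiguousarray(col)
assert c.flags['C_CONTIGUOUS']
assert not np.shares_memory(a, c)
```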

There are still some algorithms that don't deal with strided memory and 
so will make a copy.  Some of these can be changed, others would be more 
difficult.   It really becomes a question of cache-misses versus memory 
copy time, which I'm not sure how to optimize generally except by 
per-system experimentation like ATLAS does.
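One concrete case of an algorithm that copies only when handed strided memory is `ravel()`, which must return contiguous data. A short sketch (modern NumPy API, for illustration):

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
sub = a[:, ::2]            # strided, non-contiguous view

# On a strided input, ravel() has no choice but to copy...
flat = sub.ravel()
assert not np.shares_memory(sub, flat)

# ...while on an already-contiguous array it is a free view.
flat2 = a.ravel()
assert np.shares_memory(a, flat2)
```

Whether paying that copy up front beats running a cache-unfriendly strided loop is precisely the trade-off described above.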
