[Numpy-discussion] object array alignment issues

Sturla Molden sturla@molden...
Fri Oct 16 11:05:05 CDT 2009

Francesc Alted skrev:
> The response is clear: avoid memcpy() if you can.  It is true that memcpy() 
> performance has improved quite a lot in recent gcc versions (it has been 
> quite good in Win versions for many years), but working with data in-place 
> (i.e. avoiding a memory copy) is always faster (especially for large arrays 
> that don't fit in the processor caches).
> My own experiments say that, with an Intel Core2 processor, the typical 
> speed-up from avoiding memcpy() is 2x. 
If the underlying array is strided, I have seen the opposite as well. 
"Copy-in copy-out" is a common optimization used by Fortran compilers 
when working with strided arrays. The catch is that the work array has 
to fit in cache for this to make any sense. Anyhow, you cannot use 
memcpy for this kind of optimization - it assumes both buffers are 
contiguous. But working with arrays directly instead of copies is not 
always the faster option.
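
As a rough illustration of the copy-in copy-out pattern, here is a sketch 
in plain Python using the standard array module. It is not what a Fortran 
compiler or NumPy actually emits; the function names and the scaling 
operation are made up for the example.

```python
from array import array


def scale_in_place(buf, start, step, factor):
    """Work directly on the strided elements of buf."""
    for i in range(start, len(buf), step):
        buf[i] *= factor


def scale_copy_in_copy_out(buf, start, step, factor):
    """Same result, but via a contiguous work array."""
    # Copy-in: an extended slice of an array.array is a new, contiguous array.
    work = buf[start::step]
    # Do the work on the contiguous copy.
    for i in range(len(work)):
        work[i] *= factor
    # Copy-out: scatter the results back to the strided positions.
    buf[start::step] = work


a = array("d", range(12))
b = array("d", range(12))
scale_in_place(a, 1, 4, 10.0)
scale_copy_in_copy_out(b, 1, 4, 10.0)
```

In pure Python this says nothing about speed; it only shows the data 
movement. Note that the copy-in and copy-out steps are strided gathers and 
scatters, which is why a plain memcpy cannot do them.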


>  And I've read somewhere that both AMD and 
> Intel are trying to make unaligned operations even faster in their next 
> architectures (the goal is that there should be no speed difference 
> between accessing aligned and unaligned data).
>> I believe the memcpy approach is used for other unaligned parts of void
>> types. There is an inherent performance penalty there, but I don't see how
>> it can be avoided when using what are essentially packed structures. As to
>> memcpy, its performance seems to depend on the compiler and compiler version;
>> old versions of gcc had *horrible* implementations of memcpy. I believe the
>> situation has since improved. However, I'm not sure we should be coding to
>> compiler issues unless it is unavoidable or the gain is huge.
> IMO, NumPy can be improved for unaligned data handling.  For example, Numexpr 
> is using this small snippet:
> from cpuinfo import cpu
> if cpu.is_AMD() or cpu.is_Intel():
>     is_cpu_amd_intel = True
> else:
>     is_cpu_amd_intel = False
> for detecting AMD/Intel architectures and allowing the code to avoid memcpy() 
> calls for the unaligned arrays.
> The above code uses the excellent ``cpuinfo.py`` module from Pearu Peterson, 
> which is distributed with NumPy, so it should not be too difficult to take 
> advantage of this to avoid unnecessary copies in this scenario.
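
To make the "packed structures" point above concrete, here is a small 
sketch using the standard struct module. The record layout (one int8 
followed by one float64) is an arbitrary example, not taken from NumPy.

```python
import struct

# The same two fields in two layouts.
packed = struct.Struct("=bd")  # standard sizes, no padding between fields
native = struct.Struct("@bd")  # native sizes and native alignment

record = packed.pack(1, 2.5)

# In the packed record the double starts right after the 1-byte field.
double_offset_packed = struct.calcsize("=b")
# In the native layout padding is inserted so the double is aligned.
double_offset_native = native.size - struct.calcsize("@d")

# Reading the double out of the packed record at its byte offset.
value = struct.unpack_from("=d", record, double_offset_packed)[0]
```

The packed record is 9 bytes with the double at offset 1, so the double 
cannot be naturally aligned; the native layout pays for alignment with 
padding bytes instead.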
