[Numpy-discussion] Byte aligned arrays
Wed Dec 19 10:47:25 CST 2012
On Wed, 2012-12-19 at 15:57 +0000, Nathaniel Smith wrote:
> Not sure which interface is more useful to users. On the one hand,
> using funny dtypes makes regular non-SIMD access more cumbersome, and
> it forces your array size to be a multiple of the SIMD word size,
> which might be inconvenient if your code is smart enough to handle
> arbitrary-sized arrays with partial SIMD acceleration (i.e., using
> SIMD for most of the array, and then a slow path to handle any partial
> word at the end). OTOH, if your code *is* that smart, you should
> probably just make it smart enough to handle a partial word at the
> beginning as well and then you won't need any special alignment in the
> first place, and representing each SIMD word as a single numpy scalar
> is an intuitively appealing model of how SIMD works. OTOOH, just
> adding a single argument to np.array() is much simpler to explain than
> some elaborate scheme involving the creation of special custom dtypes.
If it helps, my use-case is in wrapping the FFTW library. This _is_
smart enough to deal with unaligned arrays, but it just results in a
performance penalty. In the case of an FFT, there are clearly going to
be issues with the power-of-two indices in the array not lying on a
suitable n-byte boundary (which would be the case with a misaligned
array), but I imagine this problem is not unique to FFTs.
The other point is that it's easy to create a suitable power-of-two
array that should always bypass any special-case unaligned code (e.g.
with floats, any multiple-of-4 array length will fill every 16-byte
word).
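To make the arithmetic concrete, here is a small sketch. The function name is hypothetical, and it assumes 4-byte floats, a 16-byte SIMD word, and an array whose first element already sits on a 16-byte boundary:

```python
def trailing_partial_bytes(length, itemsize=4, word_bytes=16):
    """Bytes left over in the final SIMD word of an array.

    Assumes the start of the array is already aligned to `word_bytes`;
    a result of 0 means every SIMD word is completely filled.
    """
    return (length * itemsize) % word_bytes


# Any multiple-of-4 length of 4-byte floats leaves no partial word.
full_words_only = all(trailing_partial_bytes(n) == 0 for n in (4, 8, 1024))

# A length of 6 leaves half of the last 16-byte word unfilled.
leftover = trailing_partial_bytes(6)
```

With 8-byte doubles the same holds for any even length, since two doubles fill one 16-byte word.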
Finally, I think there is significant value in auto-aligning the array
based on an appropriate inspection of the cpu capabilities (or
alternatively, a function that reports back the appropriate SIMD
alignment). Again, this makes it easier to wrap libraries that may
function with any alignment, but benefit from optimum alignment.
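For illustration, this is roughly the workaround one can use today without any change to numpy: over-allocate a byte buffer and take a view that starts on the desired boundary. The helper name aligned_empty and the default 16-byte alignment are my own assumptions, not an existing numpy API, and the alignment would ideally come from inspecting the CPU rather than being hard-coded:

```python
import numpy as np


def aligned_empty(shape, dtype=np.float64, alignment=16):
    """Return an uninitialised array whose data starts on an `alignment`-byte boundary.

    Hypothetical helper: it over-allocates a uint8 buffer and returns a
    view into it, so the result does not own its memory.
    """
    dtype = np.dtype(dtype)
    n_bytes = int(np.prod(shape)) * dtype.itemsize
    # Allocate `alignment` extra bytes so an aligned start address must exist.
    buf = np.empty(n_bytes + alignment, dtype=np.uint8)
    # Number of bytes to skip to reach the next multiple of `alignment`.
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + n_bytes].view(dtype).reshape(shape)


a = aligned_empty((1024,), dtype=np.complex128, alignment=16)
is_aligned = (a.ctypes.data % 16 == 0)
```

The returned array keeps the underlying buffer alive through its base reference, so it can be passed to a wrapped library such as FFTW like any other array.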