Aligning data improves the performance of intrinsics. When using the Intel® Streaming SIMD Extensions (Intel® SSE) intrinsics, you should align data to 16 bytes in memory operations. Specifically, you must align the __m128 objects whose addresses are passed to the _mm_store intrinsics. If you want to declare arrays of floats and treat them as __m128 objects by casting, you need to ensure that the float arrays are properly aligned.
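As a minimal sketch of the casting case, the example below declares 16-byte aligned float arrays, views them as __m128 objects, and stores the sum with _mm_store_ps. The names ALIGN16, is_aligned16, add4_by_cast, xs, ys, and sums are hypothetical and chosen only for illustration. The ALIGN16 macro is an assumption that selects __declspec(align(16)) when _MSC_VER is defined and the GCC-style aligned attribute otherwise.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* Intel SSE: __m128, _mm_add_ps, _mm_store_ps */

/* Hypothetical portability macro (an assumption, not part of any header):
   pick a 16-byte alignment specifier the compiler understands. */
#if defined(_MSC_VER)
#  define ALIGN16 __declspec(align(16))
#else
#  define ALIGN16 __attribute__((aligned(16)))
#endif

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p % 16) == 0;
}

/* Adds two groups of four floats by treating the float arrays as
   __m128 objects. All three pointers must be 16-byte aligned. */
void add4_by_cast(const float *a, const float *b, float *out) {
    assert(is_aligned16(a) && is_aligned16(b) && is_aligned16(out));
    const __m128 *va = (const __m128 *)a;  /* valid only because a is aligned */
    const __m128 *vb = (const __m128 *)b;
    _mm_store_ps(out, _mm_add_ps(*va, *vb));  /* aligned store */
}

/* Float arrays declared with 16-byte alignment so the casts above are safe. */
ALIGN16 float xs[4]   = {1.0f, 2.0f, 3.0f, 4.0f};
ALIGN16 float ys[4]   = {10.0f, 20.0f, 30.0f, 40.0f};
ALIGN16 float sums[4];
```

Calling add4_by_cast(xs, ys, sums) leaves 11, 22, 33, and 44 in sums. Without the alignment specifier, the arrays would only be guaranteed the 4-byte alignment of float, and the aligned accesses could fault.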
Use __declspec(align) to direct the compiler to align data more strictly than it otherwise would. For example, a data object of type int is allocated at a byte address which is a multiple of 4 by default. By using __declspec(align), you can direct the compiler to instead use an address which is a multiple of 8, 16, or 32 (with the following restriction on IA-32 architecture: 16-byte addresses can be locally or statically allocated).
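The following sketch shows how the int example might look in practice. The macro ALIGN_AS, the variables counter8, counter16, and counter32, and the helper functions are hypothetical names. ALIGN_AS is an assumption that expands to __declspec(align(n)) when _MSC_VER is defined and falls back to the GCC-style aligned attribute elsewhere, so the example can also be built with compilers that lack __declspec.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper (an assumption): request n-byte alignment using
   whichever extension the compiler provides. */
#if defined(_MSC_VER)
#  define ALIGN_AS(n) __declspec(align(n))
#else
#  define ALIGN_AS(n) __attribute__((aligned(n)))
#endif

/* Statically allocated ints with stricter-than-default alignment. */
ALIGN_AS(8)  int counter8  = 1;
ALIGN_AS(16) int counter16 = 2;
ALIGN_AS(32) int counter32 = 3;

/* Returns 1 if the address p is a multiple of n, 0 otherwise. */
int address_is_multiple_of(const void *p, uintptr_t n) {
    return ((uintptr_t)p % n) == 0;
}

/* Confirms at run time that each object received the requested alignment. */
void check_alignments(void) {
    assert(address_is_multiple_of(&counter8, 8));
    assert(address_is_multiple_of(&counter16, 16));
    assert(address_is_multiple_of(&counter32, 32));
}
```

The alignment request changes only where each object is placed. In this sketch sizeof(counter32) is still sizeof(int).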
You can use this data alignment support to optimize cache line usage. By clustering small objects that are commonly used together into a struct, and forcing the struct to be allocated at the beginning of a cache line, you can effectively guarantee that each object is loaded into the cache as soon as any one is accessed, provided the struct fits within a single cache line. This can result in a significant performance benefit.
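A sketch of the cache line technique follows. It assumes a 64-byte cache line, which is common on current x86 processors but is not stated in this document, so check the value for your target. The names CACHE_LINE_SIZE, CACHE_ALIGNED, hot_state, state, and the helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Adjust for the target processor. */
#define CACHE_LINE_SIZE 64

/* Hypothetical wrapper (an assumption): align a struct type to a cache line. */
#if defined(_MSC_VER)
#  define CACHE_ALIGNED __declspec(align(CACHE_LINE_SIZE))
#else
#  define CACHE_ALIGNED __attribute__((aligned(CACHE_LINE_SIZE)))
#endif

/* Small fields that are typically read together, clustered in one struct
   whose type is aligned to the start of a cache line. */
struct CACHE_ALIGNED hot_state {
    int   head;
    int   tail;
    float scale;
    float bias;
};

struct hot_state state = {0, 0, 1.0f, 0.0f};

/* Returns 1 if p is the first byte of a cache line, 0 otherwise. */
int starts_on_cache_line(const void *p) {
    return ((uintptr_t)p % CACHE_LINE_SIZE) == 0;
}

/* Confirms that touching any field brings in the line holding all of them:
   the object starts on a line boundary and its fields end within that line. */
void check_hot_state_layout(void) {
    const char *begin = (const char *)&state;
    const char *end   = (const char *)(&state.bias + 1);
    assert(starts_on_cache_line(&state));
    assert(end - begin <= CACHE_LINE_SIZE);
}
```

Because the alignment is attached to the struct type in this sketch, the compiler also pads sizeof(struct hot_state) to a multiple of the alignment, so each element of an array of hot_state begins on its own cache line.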
For 16-byte alignment, you can use the macro _MM_ALIGN16, which other compilers can support by including header files. This macro enables you to write portable code that does not rely on compiler support for