Tutorial

  • 03/26/2021
  • Public Content

Improving Performance by Aligning Data

The vectorizer can generate faster code when operating on aligned data. In this activity you will improve vectorizer performance by aligning the arrays a, b, and c in driver.f90 on a 16-byte boundary, so that the vectorizer can use aligned load instructions for all arrays rather than the slower unaligned load instructions, and can avoid runtime tests of alignment. Using the ALIGNED macro inserts an alignment directive for a, b, and c in driver.f90 with the following syntax:
!dir$ attributes align : 16 :: a,b,c
This instructs the compiler to create arrays that are aligned on a 16-byte boundary, which should facilitate the use of SSE aligned load instructions.
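One way to see whether the directive took effect is to print each array's starting address modulo 16. The following is a minimal sketch, not the tutorial's driver.f90: the program name, the array sizes, and the helper offset16 are hypothetical, and the target attribute is added only so that c_loc can be applied. Compilers that do not recognize !dir$ directives treat the line as a comment, so a nonzero offset is possible there.

```fortran
program check_alignment
  use, intrinsic :: iso_c_binding, only: c_loc, c_ptr, c_intptr_t
  implicit none
  integer, parameter :: n = 8            ! hypothetical size, for illustration only
  real, target :: a(n, n), b(n), c(n)
  ! Request 16-byte alignment for the start of each array (Intel directive)
  !dir$ attributes align : 16 :: a, b, c

  a = 0.0
  b = 0.0
  c = 0.0

  ! An offset of 0 means the array starts on a 16-byte boundary
  print *, 'a offset:', offset16(c_loc(a))
  print *, 'b offset:', offset16(c_loc(b))
  print *, 'c offset:', offset16(c_loc(c))

contains

  ! Hypothetical helper: distance in bytes past the previous 16-byte boundary
  function offset16(p) result(off)
    type(c_ptr), intent(in) :: p
    integer(c_intptr_t) :: off
    off = modulo(transfer(p, 0_c_intptr_t), 16_c_intptr_t)
  end function offset16

end program check_alignment
```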
In addition, the column height of the matrix a must be padded out to a multiple of 16 bytes, so that each individual column of a maintains the same 16-byte alignment. In practice, maintaining a constant alignment between columns is much more important than aligning the start of the arrays.
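The padding arithmetic can be sketched as follows: divide the alignment by the element size to get the number of elements per aligned block, then round the row count up to the next multiple of that number. The row count of 101, the default real element type, and the function name padded_rows are assumptions made for illustration; they are not taken from driver.f90.

```fortran
program pad_column_height
  implicit none
  integer, parameter :: boundary = 16    ! required alignment in bytes
  integer, parameter :: nrows = 101      ! hypothetical unpadded column height
  real :: sample                         ! used only to query the element size
  integer :: ebytes, padded

  ebytes = storage_size(sample) / 8      ! storage_size returns bits
  padded = padded_rows(nrows, ebytes, boundary)

  ! With 4-byte default reals, 4 elements fill 16 bytes, so 101 rounds up to 104
  print *, 'element bytes:', ebytes
  print *, 'padded column height:', padded

contains

  ! Round n up to a whole number of align_bytes-sized blocks of elements
  pure function padded_rows(n, elem_bytes, align_bytes) result(np)
    integer, intent(in) :: n, elem_bytes, align_bytes
    integer :: np, per_block
    per_block = align_bytes / elem_bytes
    np = ((n + per_block - 1) / per_block) * per_block
  end function padded_rows

end program pad_column_height
```

Declaring the matrix with the padded value as its first dimension keeps the start of every column at the same offset from a 16-byte boundary.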
To derive the maximum benefit from this alignment, we also need to tell the vectorizer that it can safely assume the arrays in matvec.f90 are aligned, by using the directive:
!dir$ vector aligned
If you use !dir$ vector aligned, you must be sure that all the arrays or subarrays in the loop are 16-byte aligned. Otherwise, you may get a runtime error. Aligning data may still give a performance benefit even if !dir$ vector aligned is not used. See the code under the ALIGNED macro in matvec.f90.
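As a rough illustration of how the two directives fit together, the sketch below aligns the arrays in the caller, pads the leading dimension of the matrix, and places !dir$ vector aligned immediately before the inner loop. It is not the code from matvec.f90: the routine name matvec_sketch, the dimensions, and the assumption of 4-byte default reals (hence padding to a multiple of 4) are all hypothetical.

```fortran
program aligned_matvec_sketch
  implicit none
  integer, parameter :: nrows = 101      ! logical column height
  integer, parameter :: lda   = 104      ! nrows padded to a multiple of 4 reals (16 bytes)
  integer, parameter :: ncols = 101
  real :: a(lda, ncols), b(ncols), c(lda)
  ! Align the start of each array on a 16-byte boundary
  !dir$ attributes align : 16 :: a, b, c

  a = 1.0
  b = 2.0
  call matvec_sketch(lda, nrows, ncols, a, b, c)

  ! Each entry sums 101 products of 1.0 * 2.0, so c(1) is 202.0
  print *, 'c(1) =', c(1)

contains

  subroutine matvec_sketch(lda, nrows, ncols, a, b, c)
    integer, intent(in) :: lda, nrows, ncols
    real, intent(in)    :: a(lda, ncols), b(ncols)
    real, intent(out)   :: c(lda)
    integer :: i, j

    c = 0.0
    do j = 1, ncols
      ! Safe only because c(1) and every a(1, j) start on a 16-byte boundary:
      ! the arrays are aligned in the caller and lda is a multiple of 4 reals
      !dir$ vector aligned
      do i = 1, nrows
        c(i) = c(i) + a(i, j) * b(j)
      end do
    end do
  end subroutine matvec_sketch

end program aligned_matvec_sketch
```

If lda were left at 101, the columns of a would start at varying offsets and the assertion made by the directive would be false.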
If your compilation targets the Intel® AVX instruction set, you should try to align data on a 32-byte boundary. This may result in improved performance. In this case, !dir$ vector aligned advises the compiler that the data is 32-byte aligned.
Rebuild the program after adding the ALIGNED preprocessor definition to ensure consistently aligned data:
Fortran > Preprocessor > Preprocessor Definitions
Once the project is rebuilt, the report will reflect aligned access.

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.