This chapter covers topics in vectorization. Vectorization is a form of data-parallel programming in which the processor performs the same operation simultaneously on N data elements of a vector (a one-dimensional array of scalar data objects such as single-precision floating-point values, integers, or double-precision floating-point values).
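As an illustration, the loop below applies the same operation (an addition) to every element of two arrays. A vectorizing compiler can turn a loop of this shape into instructions that process several elements at once. The function name vec_add is hypothetical and is used only for this sketch.

```c
#include <stddef.h>

/* vec_add is a hypothetical example function.
   The same operation (+) is applied to every element and no iteration
   depends on another, so the compiler can process several elements per
   vector instruction instead of one at a time. */
void vec_add(float *c, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```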
Knowledge of vectorization is essential to getting good performance on the Intel® Xeon product family. In the ideal case, vectorization can speed up an application by as much as 8x for double-precision or 16x for single-precision floating-point code on processors with 512-bit vector registers (Intel® AVX-512). Your application may not reach these speedups, but if your code is not vectorized at all, it will not use all the compute features available on the Intel® Xeon product family.
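The 8x and 16x figures follow from how many scalar elements fit in one vector register. The sketch below assumes a 512-bit register width; the helper name lanes is hypothetical.

```c
#include <stddef.h>

/* lanes is a hypothetical helper: the number of scalar elements that fit
   in one vector register of vector_bits bits.
   Assumption: 512 bits corresponds to an Intel AVX-512 register. */
size_t lanes(size_t vector_bits, size_t element_bytes)
{
    return vector_bits / (8 * element_bytes);
}
```

With 512-bit registers, lanes(512, sizeof(double)) is 8 and lanes(512, sizeof(float)) is 16, which are the upper bounds quoted above. Real code usually gains less, for example because of memory bandwidth limits or leftover iterations that do not fill a whole vector.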
As a first step, it is essential to understand what vectorization is and how to use the vectorization report to determine where the compiler was able to vectorize your application. Use the compiler option -qopt-report to generate this report. It is equally important to know which sections of code the compiler cannot vectorize, and why.
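To see the report distinguish the two cases, you could compile a file like the one below with something such as icc -O2 -qopt-report=2 -qopt-report-phase=vec. The exact option spelling and report wording vary between compiler versions, so treat this command as an assumption and check your compiler's documentation. The function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical example 1: each iteration touches only a[i], so the
   iterations are independent and the loop is a good candidate for
   auto-vectorization. */
void scale_in_place(double *a, double s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] * s + 1.0;
}

/* Hypothetical example 2: a running (prefix) sum. Iteration i reads the
   value written by iteration i - 1. This loop-carried dependence
   typically prevents straightforward vectorization, and the report
   should say why the loop was not vectorized. */
void prefix_sum(double *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] += a[i - 1];
}
```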
Because it is unrealistic to expect the compiler to do all of the vectorization work, you need to know the pragmas and directives that help it. One key technique for efficient vectorization is data alignment. This chapter describes how to control data alignment and how to help the compiler recognize aligned data.
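A minimal sketch of both halves of that technique, using only standard C11 and OpenMP 4.0: aligned_alloc controls where the data starts, and the aligned clause of the OpenMP simd construct tells the compiler about it. The 64-byte alignment (one 512-bit vector, and also a common cache-line size) and the helper names are assumptions for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define ALIGN 64 /* assumption: 64 bytes = one 512-bit vector register */

/* Hypothetical helper: allocate n doubles starting on a 64-byte boundary.
   C11 aligned_alloc expects the size to be a multiple of the alignment,
   so the byte count is rounded up first. Release with free(). */
double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + ALIGN - 1) / ALIGN * ALIGN;
    return aligned_alloc(ALIGN, bytes);
}

/* Hypothetical kernel: the aligned clause promises the compiler that a
   and b start on a 64-byte boundary, so it can use aligned vector loads
   and stores without first peeling off unaligned iterations.
   Compile with OpenMP SIMD support enabled for the pragma to take effect. */
void axpy_aligned(double *a, const double *b, double s, size_t n)
{
#pragma omp simd aligned(a, b : 64)
    for (size_t i = 0; i < n; i++)
        a[i] += s * b[i];
}
```

The promise made by the aligned clause must be true; passing a pointer that is not 64-byte aligned leads to undefined behavior. The Intel compiler also offers its own facilities for this (for example _mm_malloc/_mm_free and __assume_aligned), while this sketch sticks to portable C11 and OpenMP.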
This chapter also discusses pointer aliasing, its effects on optimization and vectorization, and ways to tell the compiler that pointer arguments are not aliased.
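One standard way to tell the compiler that pointer arguments are not aliased is the C99 restrict qualifier, sketched below (the function name multiply is hypothetical). In C++, compilers commonly accept __restrict as an extension with the same intent.

```c
#include <stddef.h>

/* multiply is a hypothetical example function.
   Without restrict, the compiler must assume that out may overlap in1 or
   in2, so writing out[i] could change an element read by a later
   iteration; it then either skips vectorization or adds run-time overlap
   checks. restrict promises that, inside this function, the three arrays
   are accessed only through these pointers and do not overlap. */
void multiply(double *restrict out, const double *restrict in1,
              const double *restrict in2, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in1[i] * in2[i];
}
```

The promise is the caller's responsibility: calling multiply with overlapping arrays is undefined behavior.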
Finally, this chapter presents novel approaches to outer loop vectorization.
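The general idea of outer loop vectorization can be sketched with the standard OpenMP simd construct: the directive goes on the outer loop, so each vector lane handles a different outer iteration and the short inner loop runs for all lanes together. This is a generic illustration, not necessarily the approach the chapter's subtopics describe; the Newton square-root kernel and the name newton_sqrt are hypothetical.

```c
#include <stddef.h>

/* newton_sqrt is a hypothetical example function.
   Each element x[i] (assumed non-negative) runs its own inner loop: a
   fixed number of Newton steps toward sqrt(x[i]). Putting "omp simd" on
   the OUTER loop asks the compiler to process several values of i side
   by side, one per vector lane, executing the inner loop for all lanes
   at once. */
void newton_sqrt(double *r, const double *x, size_t n, int steps)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++) {
        /* max(x, 1) is never below sqrt(x), so Newton converges from above */
        double g = x[i] > 1.0 ? x[i] : 1.0;
        for (int k = 0; k < steps; k++)
            g = 0.5 * (g + x[i] / g);
        r[i] = g;
    }
}
```

Vectorizing the inner loop instead would not help here, because each Newton step depends on the previous one; the parallelism lies across the independent outer iterations.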
Note that OpenMP* 4.0 and subsequent versions of the OpenMP standard include directives to enable vectorization. These should be used in preference to the older Intel® Cilk™ Plus directives. The Intel Compiler supports the following OpenMP features:
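One of these OpenMP 4.0 features is the simd construct, shown in the sketch below together with its reduction clause. It plays the role of the older Intel-specific #pragma simd. The function name dot is hypothetical, and the compiler must be invoked with OpenMP (or OpenMP SIMD) support enabled, for example -qopenmp-simd with the Intel compiler; check your compiler's documentation for the exact option.

```c
#include <stddef.h>

/* dot is a hypothetical example function.
   "omp simd" asserts that the loop is safe to vectorize. The reduction
   clause tells the compiler that sum is accumulated across iterations,
   so each vector lane can keep its own partial sum; the partial sums are
   combined after the loop. */
double dot(const double *a, const double *b, size_t n)
{
    double sum = 0.0;
#pragma omp simd reduction(+ : sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Because the partial sums are added in a different order than a sequential loop would use, floating-point results may differ slightly in the last bits.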
The following subchapters provide more information on vectorization topics.
The following topics should be considered required reading:
The following topics present some optional techniques to take vectorization to the next level:
In this chapter, various vectorization methods and optimizations were presented. You will not get good performance on the Intel® Xeon product family without good vectorization. It is essential that you understand the following:
How to use the compiler option -qopt-report to determine which portions of your application are vectorized, which are not, and why.
How critical data alignment is for vectorization, and how to force alignment of your data.
Pointer aliasing, and how to tell the compiler that pointers do not alias the same data.
Elemental functions and how they assist vectorization.
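In OpenMP 4.0, the counterpart of an elemental function is a function marked with declare simd, which asks the compiler to generate a vector version alongside the scalar one. A minimal sketch, with the hypothetical names poly and apply_poly:

```c
#include <stddef.h>

/* poly is a hypothetical elemental function. "declare simd" asks the
   compiler to also emit a vector variant that takes and returns a whole
   vector of x values at once. */
#pragma omp declare simd
static double poly(double x)
{
    return 3.0 * x * x + 2.0 * x + 1.0;
}

/* Calling poly inside a simd loop lets the compiler use the vector
   variant instead of making one scalar call per element. */
void apply_poly(double *y, const double *x, size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = poly(x[i]);
}
```

The uniform and linear clauses of declare simd can describe arguments that are the same for all lanes or that advance by a fixed stride, which lets the compiler generate a more efficient vector variant.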
This chapter also covered optional techniques for outer loop optimization and vectorized random number generation. Refer to the article Intel Vectorization Tools for many additional resources.
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on the Intel® Xeon product family. The paths provided in this guide reflect the steps necessary to get the best possible application performance.