This chapter covers topics in vectorization. Vectorization is a form of data-parallel programming in which the processor performs the same operation simultaneously on N data elements of a vector (a one-dimensional array of scalar data objects such as integers or single and double precision floating point numbers).
Skill and knowledge in vectorization are absolutely ESSENTIAL to gaining performance on the Intel® Xeon Phi™ product family. Vectorization of an application can give as much as an 8x (double precision) or 16x (single precision) speedup in the ideal case. Your application may not reach these potential speedups, but what should be clear is that if your code does not vectorize, it will not run efficiently on the Intel® Xeon Phi™ product family.
As a first step, it is essential to understand what vectorization is and how to use the vectorization report to determine where the compiler is able to vectorize your application. Use the -vec-report option with Intel compiler versions 14.0 and earlier, and -qopt-report to generate a vectorization report with version 15.0 and later. It is equally important to know which sections of code the compiler cannot vectorize, and why.
Since it is not realistic to expect the compiler to do all the work for vectorization, you need to know compiler pragmas and directives to assist the compiler with vectorization. One key technique to aid efficient vectorization is data alignment. This chapter describes how to control data alignment and assist the compiler to recognize aligned data.
This chapter also discusses pointer aliasing and its effects on optimization and vectorization, as well as ways to tell the compiler that pointer arguments are not aliased.
Finally, this chapter also presents novel approaches to outer loop vectorization.
Note that OpenMP* 4.0 and subsequent versions of the OpenMP standard include new directives to enable vectorization. These should be used in preference to the older Intel® Cilk™ Plus directives. The Intel compiler supports the following OpenMP 4.0 features starting with the named releases:
- OpenMP 4.0 Features in Intel Fortran Composer XE 2013
- OpenMP 4.0 Features in Intel C++ Composer XE 2013
- Updated Support for OpenMP* 4.0 Features Added in Composer XE 2013 SP1 (C++ and Fortran)
- OpenMP* 4.0 Features in Intel Compiler 15.0
The following subchapters provide more information on vectorization topics.
The following topics should be considered required reading:
The following topics present some optional techniques to take vectorization to the next level:
In this chapter, various vectorization methods and optimizations were presented. You will not get good performance on the Intel® Xeon Phi™ product family without good vectorization. It is essential that you understand the following:
- Compiler option -qopt-report (-vec-report is deprecated as of version 15.0) should be used to determine which portions of your application vectorize, which do not, and why not.
- Data alignment is critical for vectorization, and you should know how to force alignment of your data.
- Pointer aliasing inhibits vectorization, and you should know how to tell the compiler that pointers do not alias the same data.
- Elemental functions assist vectorization.
This chapter also covered optional techniques in outer loop optimization and vectorized random number generation. Refer to the article Intel Vectorization Tools for many additional resources.
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on the Intel® Xeon Phi™ product family. The paths provided in this guide reflect the steps necessary to achieve the best possible application performance.
The next chapter, Advanced MIC Optimizations, presents some advanced optimizations to get the most performance out of the Intel® Many Integrated Core Architecture (Intel® MIC Architecture) and the Intel® Xeon Phi™ product family.