Compiler Methodology for Intel® MIC Architecture
This article is part of the Intel® Modern Code Developer Community documentation, which helps developers improve application performance through a systematic, step-by-step optimization methodology. This article addresses parallelization.
This methodology helps you determine whether your application is likely to gain performance on Intel® Many Integrated Core Architecture (Intel® MIC Architecture). The following chapters describe the programming environment and help you evaluate how well your application suits the Intel® Xeon® processor and Intel MIC Architecture environment.
The Intel® MIC Architecture provides two principal programming models: the native model, in which applications are compiled to run directly on the coprocessor, and the heterogeneous offload model, in which a main program runs on the host and offloads work to the coprocessor (this includes both standard offload and the Cilk_Offload model). The following chapter explains how these models apply to your application.
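As a rough illustration of the offload model, the sketch below marks a loop for execution on the coprocessor using the Intel compiler's offload pragma, `#pragma offload target(mic)`, whose `in` and `out` clauses describe which data is copied to and from the coprocessor. The function name `scale_on_coprocessor` is a hypothetical helper, not part of any Intel API. Compilers without offload support ignore the unknown pragma, so the loop then simply runs on the host. Under the native model, no pragma is needed: the whole program, including this loop, is compiled for the coprocessor (with the Intel compiler, via the `-mmic` switch).

```c
/* Hypothetical helper: scale an array, offloading the loop to the coprocessor. */
void scale_on_coprocessor(const float *a, float *b, int n, float factor)
{
    /* Intel offload extension: copy n elements of a to the coprocessor,
       run the block there, then copy n elements of b back to the host.
       Compilers without offload support ignore this pragma and run the
       block on the host instead. */
#pragma offload target(mic) in(a : length(n)) out(b : length(n))
    {
        for (int i = 0; i < n; ++i)
            b[i] = factor * a[i];
    }
}
```

The host code calls the function normally; only the pragma decides where the loop body executes.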
The next chapter covers parallelization at two levels: rank parallelization (distributing work across processes, such as MPI ranks) and thread parallelization. It points to various parallelization methods and resources and offers tips and techniques for getting optimal parallel performance. In this chapter, you will learn techniques for the Intel OpenMP* runtime library provided with the Intel compilers, Intel® MPI, Intel® Cilk™ Plus, and Intel® Threading Building Blocks (Intel® TBB).
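For a flavor of thread parallelization with OpenMP, here is a minimal sketch of a parallel loop with a reduction. The function name `parallel_sum` is hypothetical. It assumes the file is built with the compiler's OpenMP option (for example `-qopenmp` with recent Intel compilers or `-fopenmp` with GCC); without that option the pragma is ignored and the loop runs serially with the same result.

```c
/* Hypothetical helper: sum the integers 0..n-1 using OpenMP threads.
   Each thread accumulates a private partial sum; the reduction clause
   combines the partial sums into 'sum' when the loop finishes. */
long long parallel_sum(long long n)
{
    long long sum = 0;
#pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i)
        sum += i;
    return sum;
}
```

The reduction clause avoids a data race on `sum` without a lock, which is usually the first pattern to reach for when threads must combine results.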
The third level of parallelism in code modernization is vectorization using SIMD instructions. The Intel compilers recognize a broad range of vectorizable constructs and can deliver significant speedups for both scalar and vector code. The following chapter describes ways to maximize your vector performance.
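A minimal sketch of an explicitly vectorized loop, assuming a compiler that supports the OpenMP 4.0 `omp simd` directive (enabled with the OpenMP option, or with GCC's `-fopenmp-simd`). The function `saxpy` here is an illustrative helper, not a library routine. The `restrict` qualifiers tell the compiler that `x` and `y` do not overlap, which removes a common obstacle to vectorization.

```c
/* Illustrative helper: y[i] = a * x[i] + y[i].
   restrict promises the arrays do not alias; omp simd asks the
   compiler to vectorize the loop. If the directive is not enabled,
   it is ignored and the compiler may still auto-vectorize. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
#pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Each iteration is independent of the others, which is exactly the property that lets the compiler process several elements per SIMD instruction.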
Because Intel Xeon processors and Intel Xeon Phi coprocessors provide rich and varied programming environments, the Intel compilers offer a wide variety of switches and options for controlling the executable code they produce. This chapter provides the information needed to ensure that you get the maximum benefit from the compilers.
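One visible effect of these switches is which predefined macros the compiler sets. The sketch below assumes the Intel compiler's behavior of defining `__MIC__` when generating coprocessor code (a native `-mmic` build, or the coprocessor side of an offload build); host builds leave it undefined. The function name `build_target` is hypothetical.

```c
/* Hypothetical helper: report which target this code was compiled for.
   __MIC__ is predefined by the Intel compiler only when generating
   coprocessor code; every other build takes the #else branch. */
const char *build_target(void)
{
#ifdef __MIC__
    return "coprocessor";
#else
    return "host";
#endif
}
```

Guarding code this way lets a single source file carry coprocessor-specific tuning while still building unchanged for the host.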
The final chapter in this section covers advanced optimization topics, including floating-point accuracy, data movement, thread scheduling, and more. It is a good chapter for users who are not yet seeing the performance they want, or who are looking for the last increment of performance.