Scheduling for 1-4 Threads Per Core Using Compiler Option -qopt-threads-per-core
This option gives the compiler a hint about the number of hardware threads per core that may be used by an application. The hint lets the compiler optimize the code more effectively, particularly in instruction scheduling.
-qopt-threads-per-core=N, where N is 1, 2, 3, or 4 (default is 4)
- This option does not affect the number of threads per core used at run time. That number is controlled by settings such as KMP_AFFINITY, OMP_NUM_THREADS, and KMP_PLACE_THREADS.
- Code compiled with this option runs correctly with any hardware-supported number of threads per core.
Choose the value of N to match the number of threads that will run on each core when the application executes; the compiler uses this information to optimize more effectively, especially during instruction scheduling. For example, if the application is parallelized using OpenMP, set N to the same number of threads per core that the OpenMP affinity settings will request when the application runs on Intel MIC Architecture. Please refer to the Intel
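The matching described above can be sketched as a small shell script that derives both the compile-time hint and the run-time settings from a single value. This is only an illustration under stated assumptions: the core count (60), the file names (app.c, app.mic), and the additional flags (-mmic, -qopenmp, -O3) are hypothetical choices for a native OpenMP build and do not come from this guide. The script only prints the compile command so that it can be inspected before use.

```shell
#!/bin/sh
# Hypothetical example values: adjust to the target coprocessor and workload.
CORES=60               # assumed number of cores the application will use
THREADS_PER_CORE=2     # N: threads the application will run on each core (1-4)

# Compile-time hint only. The source file, output name, and the other flags
# (-mmic, -qopenmp, -O3) are assumptions for a native OpenMP build.
COMPILE_CMD="icc -mmic -qopenmp -O3 -qopt-threads-per-core=${THREADS_PER_CORE} -o app.mic app.c"

# Run-time settings: these, not the compiler option, determine how many
# threads actually run on each core.
export KMP_PLACE_THREADS="${CORES}c,${THREADS_PER_CORE}t"
export OMP_NUM_THREADS=$((CORES * THREADS_PER_CORE))
export KMP_AFFINITY=compact

echo "compile: ${COMPILE_CMD}"
echo "run with: KMP_PLACE_THREADS=${KMP_PLACE_THREADS} OMP_NUM_THREADS=${OMP_NUM_THREADS} KMP_AFFINITY=${KMP_AFFINITY}"
```

Keeping THREADS_PER_CORE in one variable means the hint and the run-time placement cannot drift apart. If the run-time settings are later changed without recompiling, the code still runs correctly; only the scheduling hint no longer matches.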
It is essential that you read this guide from start to finish, using the built-in hyperlinks to guide you along a path to a successful port and tuning of your application(s) on the Intel® Xeon Phi™ Coprocessor. The paths provided in this guide reflect the steps necessary to get the best possible application performance.