Intel® Advisor includes capabilities for analyzing applications for Intel® Xeon Phi™ coprocessors. This article walks through modeling an application for an Intel Xeon Phi coprocessor and outlines some of the new capabilities.
Building the application
The application we use is one of the samples included with Intel Advisor. It is located in C:\Program Files (x86)\Intel\Advisor\samples\en\C++\tachyon_Advisor.zip. To build the application on the Microsoft Windows* OS:
- Set up the environment for the Intel® compiler you are using:
  - Run "C:\Program Files (x86)\Intel\compiler_xe_2015\bin\compilervars.bat" intel64 (the quotes are needed because the path contains spaces).
- Unzip the sample into a directory where you have write permission. We will unzip it to C:\advisor_samples.
- Build the application:
  - Open the solution file C:\advisor_samples\tachyon_Advisor\tachyon_Advisor.sln in Microsoft Visual Studio* 2012.
  - In the Microsoft Visual Studio IDE, right-click 2_tachyon_annotated and select Set As Startup Project.
  - Make sure the Release configuration is selected, then click Build > Rebuild Solution.
Running the application with the Intel Advisor suitability analysis
First, open the Intel Advisor Workflow and start the analysis:
- Click Tools > Advisor XE 2015 > Open Advisor Workflow.
- Click the Collect Suitability Analysis button.
Some key observations:
- By default, Intel Advisor models your application running on the host CPU. In this case it assumes 8 CPUs; you can change the count with the CPU Count drop-down list.
- Note the estimated speedup for the parallel region.
- Also note the scalability graph, which indicates whether your workload will scale (that is, get faster) as you add CPUs.
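As rough intuition for why a scalability curve flattens, the sketch below applies Amdahl's law, a textbook upper bound on speedup. It is not the model Intel Advisor uses: Advisor works from the task and site timings it measures and also accounts for effects such as load imbalance, runtime overhead, and lock contention. The function name `amdahl_speedup` and the 95% parallel fraction are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cpus: int) -> float:
    """Amdahl's law: upper bound on speedup when only `parallel_fraction`
    of the serial run time can be spread across `cpus` CPUs."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


if __name__ == "__main__":
    # Hypothetical: 95% of the run time is spent in the parallel region.
    for cpus in (1, 2, 4, 8, 16, 32, 64, 128):
        print(f"{cpus:4d} CPUs -> at most {amdahl_speedup(0.95, cpus):5.2f}x")
```

With 5% of the time left serial, the bound can never exceed 20x (1 / 0.05), however many CPUs you add.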
Showing suitability for an Intel Xeon Phi coprocessor
Click the Target System drop-down list and select Intel Xeon Phi.
Some key observations:
- By default, Intel Advisor models your application with 128 coprocessor threads. You can change this with the Coprocessor Threads drop-down list.
- In the scalability graph, the green area indicates whether your parallel region has enough parallelism to be ready to run on an Intel Xeon Phi coprocessor.
- The following indicators show where to look to improve performance:
  - Load Imbalance
  - Runtime Overhead
  - Lock Contention
One interesting point about this application is that with a CPU Count of 8 there is no significant load imbalance; you can confirm this by expanding the Load Imbalance item on the graph.
If you increase the number of CPUs, however, the load imbalance does become significant. Here we have set the CPU Count to 128.
Note: The Task Modeling area shows that this parallel region has an average of 512 tasks. With only 8 CPUs there is plenty of work for each CPU (about 64 tasks apiece), so differences in task duration average out; with 128 CPUs each CPU gets only about 4 tasks, and the load imbalance appears. To model an algorithm with more work, use the slider in the Task Modeling area: slide it to 5x and then click Apply.
As you can see, increasing the amount of work reduces the load imbalance.
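The toy simulation below illustrates the 8-CPU versus 128-CPU arithmetic in the note above. It is not how Intel Advisor computes its Load Imbalance estimate (Advisor uses the task durations it measured). The task durations here are random, made-up values, and handing each task to the least-loaded worker is an assumption standing in for the threading runtime's scheduler. The names `worker_loads`, `load_imbalance`, and `hypothetical_tasks` are hypothetical.

```python
import heapq
import random


def worker_loads(task_costs, workers):
    """Give each task, in order, to the currently least-loaded worker
    (a simple stand-in for dynamic scheduling); return each worker's total load."""
    heap = [(0.0, w) for w in range(workers)]  # (accumulated load, worker id)
    heapq.heapify(heap)
    for cost in task_costs:
        load, w = heapq.heappop(heap)
        heapq.heappush(heap, (load + cost, w))
    return [load for load, _ in heap]


def load_imbalance(task_costs, workers):
    """Fraction of total worker time spent idle waiting for the busiest
    worker: 1 - mean_load / max_load (0.0 means perfectly balanced)."""
    loads = worker_loads(task_costs, workers)
    busiest = max(loads)
    if busiest == 0.0:
        return 0.0
    return 1.0 - (sum(loads) / len(loads)) / busiest


def hypothetical_tasks(count, seed=42):
    """Made-up task durations, uniformly spread between 0.5 and 1.5 time units."""
    rng = random.Random(seed)
    return [rng.uniform(0.5, 1.5) for _ in range(count)]


if __name__ == "__main__":
    base = hypothetical_tasks(512)        # ~512 tasks, as in the Task Modeling area
    scaled = hypothetical_tasks(512 * 5)  # the 5x "what-if"
    for label, tasks, cpus in (("512 tasks, 8 CPUs", base, 8),
                               ("512 tasks, 128 CPUs", base, 128),
                               ("2560 tasks, 128 CPUs", scaled, 128)):
        print(f"{label:>20}: {load_imbalance(tasks, cpus):.1%} idle")
```

With least-loaded assignment, the busiest worker finishes at most one task duration after the average, so that lag is a small fraction of about 64 tasks per CPU but a large fraction of about 4 tasks per CPU; multiplying the work by 5 shrinks it again.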
Showing suitability for offloading your parallel region to an Intel Xeon Phi coprocessor
Click the Target System drop-down list and select Offload to Intel Xeon Phi.
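Offloading differs from running natively because the data must travel between the host and the coprocessor card. The sketch below is a simple cost model, assuming the offloaded region's compute time shrinks by a fixed factor while each offload also pays a data-transfer time and a fixed startup cost. It is not Intel Advisor's offload model; the function name `offload_speedup` and every number in it (including the 6 GB/s link bandwidth) are illustrative assumptions.

```python
def offload_speedup(host_time_s, coprocessor_speedup, bytes_moved,
                    link_bytes_per_s, fixed_overhead_s=0.0):
    """Speedup of offloading a region versus running it on the host.

    host_time_s:         time the region takes on the host
    coprocessor_speedup: how much faster the region itself runs on the coprocessor
    bytes_moved:         data copied to and from the coprocessor per offload
    link_bytes_per_s:    effective transfer bandwidth (hypothetical)
    fixed_overhead_s:    per-offload startup cost (hypothetical)
    """
    if coprocessor_speedup <= 0 or link_bytes_per_s <= 0:
        raise ValueError("speedup and bandwidth must be positive")
    compute_s = host_time_s / coprocessor_speedup
    transfer_s = bytes_moved / link_bytes_per_s
    return host_time_s / (compute_s + transfer_s + fixed_overhead_s)


if __name__ == "__main__":
    host = 2.0  # seconds on the host; all numbers are illustrative only
    for megabytes in (10, 100, 1000, 10000):
        s = offload_speedup(host, coprocessor_speedup=8.0,
                            bytes_moved=megabytes * 1e6,
                            link_bytes_per_s=6e9, fixed_overhead_s=0.01)
        print(f"{megabytes:6d} MB moved -> {s:5.2f}x")
```

In this model, offloading pays off only when the compute time saved exceeds the time spent moving data; moving 10,000 MB over the assumed link costs about 1.7 s, which erases most of the 8x compute gain.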
Modeling different data sets using Intel Advisor
You can also model the scalability of your parallel regions with different data sets. For example, you can ask “what if” there were 5, 25, or 125 times as many tasks (or as much work): what would the resulting speedup and scalability be?
- Go to the Task Modeling area.
- Use the slider to select 5x.
- Click Apply.
You can also model a different average task duration with the Avg. Task Duration slider.
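The sketch below shows, under strong simplifying assumptions, why both sliders matter. It assumes every task has the same duration, tasks are spread as evenly as possible over the threads, and each task pays a fixed runtime overhead. Intel Advisor's model is based on measured data and is more detailed. The function name `modeled_speedup`, the 240-thread count, and the overhead value are hypothetical.

```python
import math


def modeled_speedup(task_count, avg_task_s, threads, overhead_per_task_s):
    """Speedup of a parallel region made of equal-length tasks, spread as evenly
    as possible over `threads`, where each task also pays a fixed runtime overhead."""
    serial_s = task_count * avg_task_s
    tasks_on_busiest_thread = math.ceil(task_count / threads)
    parallel_s = tasks_on_busiest_thread * (avg_task_s + overhead_per_task_s)
    return serial_s / parallel_s


if __name__ == "__main__":
    threads = 240   # hypothetical coprocessor thread count
    overhead = 1e-6  # hypothetical per-task runtime overhead, seconds
    for scale in (1, 5, 25, 125):      # the Task Modeling "what-if" multipliers
        for avg_task in (1e-6, 1e-4):  # short vs. long tasks, seconds
            s = modeled_speedup(512 * scale, avg_task, threads, overhead)
            print(f"{512 * scale:6d} tasks x {avg_task:.0e} s: {s:6.1f}x")
```

Two effects show up. With few tasks relative to threads, rounding up leaves threads idle: 512 tasks on 240 threads means the busiest thread runs 3 tasks while the average is about 2.1. And when tasks are as short as the per-task overhead, the overhead halves the speedup, which is the kind of problem the Runtime Overhead indicator points to.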
Advanced modeling of Intel Xeon Phi coprocessor vectorization
Intel Advisor can also model how your application would run on an Intel Xeon Phi coprocessor both with and without vectorization.
Click Intel Xeon Phi Advanced Modeling to compare the suitability with and without vectorization.
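Intel Xeon Phi coprocessors have 512-bit vector units, so one vector instruction can operate on 16 single-precision or 8 double-precision values. The sketch below is an Amdahl-style estimate of what that could add on top of threading, assuming the vectorized part of each thread's work runs `lanes * lane_efficiency` times faster and that threading and vectorization gains simply multiply. Real gains depend on memory bandwidth and how well the compiler vectorizes the loops. The functions `vector_speedup` and `combined_speedup`, the 100x threaded speedup, and the 50% lane efficiency are hypothetical.

```python
def vector_speedup(vectorized_fraction, lanes, lane_efficiency=1.0):
    """Amdahl-style estimate of the extra speedup from SIMD: the vectorized part
    of each thread's work runs `lanes * lane_efficiency` times faster."""
    if not 0.0 <= vectorized_fraction <= 1.0:
        raise ValueError("vectorized_fraction must be between 0 and 1")
    effective_lanes = lanes * lane_efficiency
    return 1.0 / ((1.0 - vectorized_fraction) + vectorized_fraction / effective_lanes)


def combined_speedup(thread_speedup, vectorized_fraction, lanes, lane_efficiency=1.0):
    """Assumes threading and vectorization gains multiply (an idealization)."""
    return thread_speedup * vector_speedup(vectorized_fraction, lanes, lane_efficiency)


if __name__ == "__main__":
    thread_gain = 100.0  # hypothetical threaded speedup without vectorization
    for frac in (0.0, 0.5, 0.9):
        # 16 lanes = 512-bit vectors of single-precision floats; 50% efficiency is made up.
        total = combined_speedup(thread_gain, frac, lanes=16, lane_efficiency=0.5)
        print(f"vectorized fraction {frac:.0%}: {total:7.1f}x")
```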
Intel Advisor is a powerful tool for modeling the scalability of your application. With the new features for modeling on an Intel Xeon Phi coprocessor, you can quickly tell whether your workload will scale to the large number of coprocessor threads. The ability to change the data set size dynamically lets you see whether your algorithm would benefit from additional scaling without having to make any code changes.