Intel® Advisor includes some new capabilities for modeling the scalability of your application. This article steps through runtime modeling and data set size modeling, and outlines these new capabilities.
Run your application using the Intel Advisor suitability analysis
- First, add annotations and rebuild your application in Release mode.
- Next, bring up the Intel Advisor Workflow by clicking Tools > Advisor XE 2015 > Open Advisor Workflow.
- Click the Collect Suitability Data button.
The key observation from the scalability graph is that the application, as designed, does not scale well. Intel Advisor XE also reports that your tasks are too fine-grained.
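To see why fine-grained tasks limit scaling, consider a minimal sketch that compares the useful work in a task with the fixed cost of scheduling it. This is not how Intel Advisor computes its estimates; the function name `useful_work_fraction` and all of the timings are hypothetical values chosen for illustration.

```python
def useful_work_fraction(iteration_time_us, task_overhead_us, chunk_size=1):
    """Fraction of a task's time spent on real work.

    Assumption: every task pays one fixed scheduling overhead, no matter
    how many loop iterations (chunk_size) are grouped into it.
    """
    useful = iteration_time_us * chunk_size
    return useful / (useful + task_overhead_us)


# Hypothetical numbers: 2 us of work per iteration, 10 us to schedule a task.
for chunk in (1, 10, 100):
    fraction = useful_work_fraction(2.0, 10.0, chunk)
    print(f"{chunk:>3} iterations per task: {fraction:.0%} useful work")
```

In this simplified picture, grouping more iterations into each task spreads the fixed overhead over more work, which is the intuition behind making tasks coarser.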
Intel Advisor offers several runtime modeling options that let you see how the application would perform if you changed how it applies parallelism.
Check all of the recommended changes.
We can see clear benefits. Notice that in the case of 32 CPUs we go from less than a 1x improvement to over a 4x maximum gain from our parallel region.
But our graph shows that the maximum gain occurs at 16 CPUs; adding more CPUs beyond that decreases performance. There are several possible reasons for this:
- There is not enough work to keep all of the CPUs busy.
- The work is not balanced across the CPUs.
- The runtime overhead of keeping track of additional threads is too high.
- There is lock contention.
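Two of these effects, too little work and runtime overhead that grows with the thread count, can be illustrated with a toy model. This is a rough sketch under stated assumptions, not the model Intel Advisor uses: the function names, the linear overhead term, and the workload numbers are all hypothetical.

```python
import math


def modeled_speedup(cpus, n_tasks, task_time, overhead_per_cpu):
    """Speedup of a toy parallel region relative to running all tasks serially."""
    serial_time = n_tasks * task_time
    # The busiest CPU runs ceil(n_tasks / cpus) tasks; once there are fewer
    # tasks than CPUs, the extra CPUs have nothing to do.
    rounds = math.ceil(n_tasks / cpus)
    # Assumption: runtime bookkeeping cost grows linearly with the CPU count.
    parallel_time = rounds * task_time + overhead_per_cpu * cpus
    return serial_time / parallel_time


def best_cpu_count(cpu_counts, **workload):
    """Return the CPU count with the highest modeled speedup."""
    return max(cpu_counts, key=lambda p: modeled_speedup(p, **workload))


CPU_COUNTS = (2, 4, 8, 16, 32, 64)
# Hypothetical workload: 64 tasks of 1 time unit each, 0.25 units of overhead per CPU.
WORKLOAD = dict(n_tasks=64, task_time=1.0, overhead_per_cpu=0.25)

for p in CPU_COUNTS:
    print(f"{p:>2} CPUs: {modeled_speedup(p, **WORKLOAD):.2f}x")
print("best CPU count:", best_cpu_count(CPU_COUNTS, **WORKLOAD))
```

With these made-up numbers the modeled gain peaks at 16 CPUs and falls off beyond that. Load imbalance and lock contention are left out of the sketch to keep it short.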
Model Data Set Size
Intel Advisor has a way for you to model how your parallel region will perform under different workloads. It allows you to change the size of your data set and modify how long each task iteration takes, thereby testing the scalability without making any code changes. This feature is particularly useful for CPU-bound workloads.
Look at the Loops iteration (Task) Modeling area of Intel Advisor XE.
Using the two sliders, you can increase or decrease the number of tasks (that is, the size of your workload) and how long each task takes. Testing your design under different workloads is critical to understanding whether the design suits the amount of work your problem involves. Intel Advisor XE lets you increase or decrease the number of tasks by factors of 5x, 25x, and 125x. You can increase or decrease the duration of tasks in the same way.
In our example let’s multiply both the number of tasks and the duration of each task by 25.
You can see below that the algorithm performs differently when run under this new workload.
Not only does the algorithm now scale as we increase the CPU count past 16, the maximum site gain for 32 CPUs also rises from the previous 4x to 32x. In the original workload, the parallel region did not have enough work and each task was too fine-grained to run efficiently. We have shown that we can achieve good scalability by increasing the data set size and the amount of work each task performs.
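The effect of the two sliders can be sketched with a simple model in which both the number of tasks and the duration of each task are multiplied by a chosen factor. The code below is an illustration only: the speedup formula, the helper `scale_workload`, and the baseline numbers are assumptions, and its output is not Intel Advisor output.

```python
import math


def modeled_speedup(cpus, n_tasks, task_time, overhead_per_cpu=0.25):
    """Toy speedup: serial time divided by a simple parallel-time estimate."""
    serial_time = n_tasks * task_time
    # Assumption: overhead grows linearly with the number of CPUs.
    parallel_time = math.ceil(n_tasks / cpus) * task_time + overhead_per_cpu * cpus
    return serial_time / parallel_time


def scale_workload(n_tasks, task_time, task_multiplier=1, duration_multiplier=1):
    """Mimic the two sliders: scale the task count and the per-task duration."""
    return n_tasks * task_multiplier, task_time * duration_multiplier


# Hypothetical baseline: 64 tasks that each take 1 time unit.
BASE = (64, 1.0)
baseline = scale_workload(*BASE)
scaled = scale_workload(*BASE, task_multiplier=25, duration_multiplier=25)

for cpus in (16, 32):
    before = modeled_speedup(cpus, *baseline)
    after = modeled_speedup(cpus, *scaled)
    print(f"{cpus} CPUs: {before:.1f}x at baseline, {after:.1f}x with 25x tasks and 25x duration")
```

In this sketch the baseline workload loses ground when going from 16 to 32 CPUs, while the scaled workload keeps improving, because the fixed overhead becomes small relative to the work available.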
Intel Advisor is a powerful tool for modeling the scalability of your application. By using runtime modeling and the new features for dynamically changing the size of your data set, you can see how to tune your algorithm without making any code changes.