Weighing My Options With Intel® Parallel Advisor Lite

Intel® Parallel Advisor Lite lets me model the effects of adding parallelism to my serial application without actually parallelizing my code. This is handy because, on multi-core processors, adding parallelism can greatly improve a program's performance, but debugging real parallel code that produces non-deterministic results can be arduous and frustrating.

By profiling my application with Parallel Advisor Lite, I can quickly see where it spends most of its time. These hotspots are good candidates for parallelism because they offer the largest potential speedups. Figure 1 shows the breakdown of time spent in functions in my application.

Figure 1
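To make the point about hotspots concrete, here is a minimal sketch of Amdahl's law, which bounds the overall speedup when only part of a program is parallelized. The function name amdahlSpeedup is my own, not something the tool provides, and the fractions mentioned afterwards are hypothetical rather than taken from Figure 1.

```cpp
#include <cassert>

// Amdahl's law: upper bound on overall speedup when a fraction p of the
// serial runtime is spread perfectly across n cores and the remaining
// (1 - p) stays serial.
// p and n are illustrative inputs, not measurements from the profiler.
double amdahlSpeedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0);
    assert(n > 0);
    return 1.0 / ((1.0 - p) + p / n);
}
```

On four cores, for example, a function that accounts for a hypothetical 45% of the runtime gives amdahlSpeedup(0.45, 4), roughly 1.5x overall, while a function that accounts for 5% gives amdahlSpeedup(0.05, 4), only about 1.04x. That is why the largest hotspots are the ones worth examining first.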

Two functions, chainedSeed and orderedSeed, consume most of the runtime, and one init function contributes only a small share. The top two functions deserve the most attention for parallelization; Figure 2 shows what they look like.

Figure 2

Both functions populate an array in a for() loop whose iterations may be parallelizable. Adding Parallel Advisor Lite annotations (Figure 3) and running the Parallel Advisor Lite correctness modeling tool reveals the potential correctness issues I may face when I add real parallelization (Figure 4).

Figure 3

Figure 4
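As a rough idea of what an annotated loop can look like, here is a minimal sketch. The macro names are modeled on the site and task annotations used by Intel's Advisor tools, but I define them here as no-op placeholders so the sketch compiles on its own. The real header, the exact macro names, and their argument lists in Parallel Advisor Lite are assumptions and may differ. RNG() is a hypothetical stand-in for the generator in my application.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder no-op macros so this sketch is self-contained. In a real
// project they would come from the annotation header shipped with the
// tool; the names and argument lists used here are assumptions.
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END(name)
#define ANNOTATE_TASK_BEGIN(name)
#define ANNOTATE_TASK_END(name)

// Hypothetical stand-in for RNG(): one step of a linear congruential
// generator.
unsigned RNG(unsigned seed) {
    return seed * 1664525u + 1013904223u;
}

// The loop still runs serially. The annotations only mark the region
// (site) and the unit of work (task) that the modeling tool should
// treat as if it were parallel. a[0] is expected to hold the seed.
void chainedSeed(std::vector<unsigned>& a) {
    assert(!a.empty());
    ANNOTATE_SITE_BEGIN(chainedSeedSite);
    for (std::size_t i = 1; i < a.size(); ++i) {
        ANNOTATE_TASK_BEGIN(chainedSeedTask);
        a[i] = RNG(a[i - 1]);  // reads the element written one iteration earlier
        ANNOTATE_TASK_END(chainedSeedTask);
    }
    ANNOTATE_SITE_END(chainedSeedSite);
}
```

Because the annotations do not change how the program executes, the annotated build produces the same results as the original serial code.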

Clicking on the read or write in the data communication observations takes me to the offending line. In this case, the statement a[i]=RNG(a[i-1]) would be a problem if the iterations were parallelized. This is a data dependence in the chainedSeed function: each iteration uses the previously computed value to compute the next one, which makes the iterations of this loop difficult to parallelize. The correctness tool reports no issues in the orderedSeed function.
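The difference between the two loops can be shown with a small sketch. The chained loop uses the statement from my code, a[i]=RNG(a[i-1]). The body of orderedSeed is not reproduced here, so I assume each element is derived only from its own index. RNG() is again a hypothetical stand-in, and the "reversed" variants are my own device for imitating a schedule in which iterations do not run in serial order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RNG(): one step of a linear congruential generator.
unsigned RNG(unsigned seed) { return seed * 1664525u + 1013904223u; }

// Chained: iteration i reads a[i-1], which iteration i-1 wrote.
std::vector<unsigned> chainedSeed(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::vector<unsigned> a(n);
    a[0] = seed;
    for (std::size_t i = 1; i < n; ++i) a[i] = RNG(a[i - 1]);
    return a;
}

// Same statement, iterations visited in reverse. Each a[i-1] is read
// before it has been computed, so most elements come out wrong.
std::vector<unsigned> chainedSeedReversed(std::size_t n, unsigned seed) {
    assert(n > 0);
    std::vector<unsigned> a(n);  // elements start at zero
    a[0] = seed;
    for (std::size_t i = n - 1; i >= 1; --i) a[i] = RNG(a[i - 1]);
    return a;
}

// Ordered (assumed body): iteration i depends only on i.
std::vector<unsigned> orderedSeed(std::size_t n) {
    std::vector<unsigned> a(n);
    for (std::size_t i = 0; i < n; ++i) a[i] = RNG(static_cast<unsigned>(i));
    return a;
}

// Same statement, iterations visited in reverse. The result is unchanged
// because no iteration reads what another iteration writes.
std::vector<unsigned> orderedSeedReversed(std::size_t n) {
    std::vector<unsigned> a(n);
    for (std::size_t i = n; i-- > 0;) a[i] = RNG(static_cast<unsigned>(i));
    return a;
}
```

Comparing orderedSeed(8) with orderedSeedReversed(8) gives identical arrays, while chainedSeed(8, 1) and chainedSeedReversed(8, 1) differ. A loop whose result changes with iteration order is exactly the kind the correctness tool flags.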

Now that I have my hotspots and correctness information, I have some thinking to do before I start parallelizing. I know I want to focus on the top functions, because parallelizing a small function like init won't provide much performance gain. The chainedSeed function consumes the most runtime, but there is one correctness issue associated with parallelizing it. The orderedSeed function consumes only slightly less runtime and should be easier to parallelize because it has no correctness issues.

By looking at both the hotspots and the correctness information that Parallel Advisor Lite provides, I can make educated decisions about where to focus my efforts to add parallelism to my application, and about what kinds of debugging issues to expect once the code is parallelized. In this case I would parallelize the orderedSeed function first, since it has no correctness issues and takes up nearly half of the total runtime.

You can download Intel® Parallel Advisor Lite for free and use it with a free evaluation copy of Intel® Parallel Studio to parallelize your favorite program.
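As a sketch of what parallelizing the orderedSeed function first might look like, here is one option using an OpenMP parallel loop. This is only an illustration under assumptions: the loop body (each element computed from its own index) and the RNG() stand-in are hypothetical, the name orderedSeedParallel is mine, and OpenMP is just one of several ways to express the parallelism.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RNG(): one step of a linear congruential generator.
unsigned RNG(unsigned seed) { return seed * 1664525u + 1013904223u; }

// Assumed shape of orderedSeed: iteration i writes only a[i] and reads
// only i, so the iterations can safely run concurrently.
std::vector<unsigned> orderedSeedParallel(int n) {
    assert(n >= 0);
    std::vector<unsigned> a(static_cast<std::size_t>(n));
    // With OpenMP enabled, the iterations are divided among threads.
    // Without it, the pragma is ignored and the loop runs serially
    // with the same result.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        a[static_cast<std::size_t>(i)] = RNG(static_cast<unsigned>(i));
    }
    return a;
}
```

The same pragma on the chainedSeed loop would introduce a data race, because each of its iterations reads a value that another iteration writes.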
