Intel® Parallel Amplifier Under the Hood

Intel® Parallel Amplifier offers three analysis types, each designed to give you insight into a different aspect of your program’s performance. Each successive type collects more information than the one before it. Here’s how they work.

Hotspots



The first type of analysis, Hotspots, is the most lightweight. Running it takes only about as long as running your application normally (outside of Intel® Parallel Amplifier). While your app executes, Intel Parallel Amplifier’s data collector periodically takes samples. For each sample, the collector cooperates with the operating system to interrupt your program and gather data. It records the instruction pointer (IP) for each CPU core that is executing your app, as well as the call stack (saved as part of the operating system’s data structures). Once your application finishes executing, Intel Parallel Amplifier uses the IP samples to estimate how long each function was executing, and it uses the call stack samples to create a Call Tree for the whole program. To map the sampled addresses back to function names when it builds the hotspots list and the call tree, Intel Parallel Amplifier also uses information about your program’s data and instruction space stored in its program debug database (.pdb file).
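The aggregation step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Intel Parallel Amplifier’s implementation: the sample data, the function names (hotspots, call_tree) and the 10 ms sampling interval are all assumptions made for the example, and each sample is assumed to have already been resolved from raw addresses to function names.

```python
from collections import Counter

SAMPLE_INTERVAL_MS = 10  # assumed sampling interval, for illustration only


def hotspots(samples, interval_ms=SAMPLE_INTERVAL_MS):
    """Estimate per-function self time from sampled call stacks.

    Each sample is a call stack listed from the outermost caller down to
    the function that was executing when the sample was taken.
    """
    self_time = Counter()
    for stack in samples:
        # Only the innermost frame was actually executing at sample time.
        self_time[stack[-1]] += interval_ms
    return self_time


def call_tree(samples, interval_ms=SAMPLE_INTERVAL_MS):
    """Merge sampled call stacks into a tree of nested dicts.

    Every node records the total time attributed to it and its callees.
    """
    root = {"time_ms": 0, "children": {}}
    for stack in samples:
        node = root
        node["time_ms"] += interval_ms
        for frame in stack:
            node = node["children"].setdefault(
                frame, {"time_ms": 0, "children": {}}
            )
            node["time_ms"] += interval_ms
    return root


# Hypothetical samples, already symbolized.
samples = [
    ["main", "render", "blend"],
    ["main", "render", "blend"],
    ["main", "render"],
    ["main", "load"],
]

for name, ms in hotspots(samples).most_common():
    print(name, ms, "ms")
```

The more samples that land in a function, the more time is attributed to it, which is why the result is a statistical estimate rather than an exact measurement.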

Concurrency



The second type of analysis, Concurrency, works in the same way but collects a bit more information. It also records the status of each of the application’s threads: running, ready to run, or blocked. Once all the samples have been collected, Intel Parallel Amplifier analyzes the data to determine how many of your application’s threads were active at any given time. This is called the Concurrency Level, and it is broken down per function. Ideally, the concurrency level for your app should match the number of processors on the system. This is what Intel Parallel Amplifier calls Fully Utilized.
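As a rough sketch of that calculation, the Python below counts the running threads in each sample and reports how often each concurrency level occurred. It assumes that the concurrency level of a sample is simply the number of threads in the running state; the function names, the state strings and the sample data are hypothetical and do not come from Intel Parallel Amplifier.

```python
from collections import Counter


def concurrency_histogram(state_samples):
    """Map each concurrency level to the fraction of samples at that level.

    Each sample lists the state of every thread at one point in time:
    "running", "ready" or "blocked".
    """
    if not state_samples:
        return {}
    levels = Counter(
        sum(1 for state in states if state == "running")
        for states in state_samples
    )
    total = len(state_samples)
    return {level: count / total for level, count in sorted(levels.items())}


def fully_utilized_fraction(state_samples, num_cores):
    """Fraction of samples in which running threads matched the core count."""
    return concurrency_histogram(state_samples).get(num_cores, 0.0)


# Hypothetical samples for a four-thread app on a four-core machine.
state_samples = [
    ["running", "running", "blocked", "blocked"],
    ["running", "running", "running", "running"],
    ["running", "ready", "blocked", "blocked"],
    ["running", "running", "running", "running"],
]

print(concurrency_histogram(state_samples))
print(fully_utilized_fraction(state_samples, num_cores=4))
```

Grouping the same samples by the function at the top of each call stack would give the per-function breakdown mentioned above.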

Locks and Waits



The last type of analysis is called Locks and Waits, and it is the most processor-intensive. Your application may take longer to run under Locks and Waits analysis, and here’s why. In addition to collecting the data above, Intel Parallel Amplifier adds instructions to your compiled program. These instructions are placed wherever your program uses threading and synchronization API calls, and their purpose is to measure how long a thread waits whenever it is not active. This timing information is combined with symbol information found in the .pdb file to create a picture of where your application is waiting, and what it is waiting on. Locks and other structures that might cause threads to wait are called Synchronization Objects. After running this analysis, Intel Parallel Amplifier shows you a list of these objects along with the wait time for each and the concurrency of your app during the wait.
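To give a feel for what such instrumentation measures, here is a source-level analogy in Python. Intel Parallel Amplifier instruments the compiled binary, which this sketch does not attempt to reproduce. Instead, a hypothetical TimedLock wrapper records how long each thread waits to acquire a lock and totals the wait time per synchronization object. The class, the wait_times table and the "cache_lock" name are all invented for the example.

```python
import threading
import time
from collections import defaultdict

# Total seconds spent waiting, keyed by synchronization object name.
wait_times = defaultdict(float)
_wait_times_guard = threading.Lock()  # protects wait_times itself


class TimedLock:
    """A lock that records how long callers wait to acquire it."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()  # the thread is inactive while it waits here
        waited = time.perf_counter() - start
        with _wait_times_guard:
            wait_times[self.name] += waited
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()


def worker(lock):
    with lock:
        time.sleep(0.05)  # simulate work done while holding the lock


lock = TimedLock("cache_lock")
threads = [threading.Thread(target=worker, args=(lock,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

for name, seconds in wait_times.items():
    print(f"{name}: waited {seconds:.3f} s in total")
```

Because the three workers contend for one lock, the later threads spend most of their time waiting, and that wait shows up against "cache_lock" in the totals.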
For more information about compiler optimizations, see the Optimization Notice.