Pane: Analysis Type - Concurrency

To access this pane:

  1. Click the New Analysis button on the Intel® VTune™ Amplifier toolbar.

    The New Amplifier Result tab opens with the Analysis Type window active.

  2. Select the Algorithm Analysis > Concurrency analysis type from the analysis tree on the left pane.

    The Concurrency pane opens on the right.

Use this pane to explore and edit the predefined configuration of the Concurrency analysis type. This analysis type helps you find where your application does not use the available logical CPUs effectively.
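To make the idea of ineffective logical CPU usage concrete, the following minimal sketch computes how much wall-clock time is spent with exactly k threads running at once. It is an illustration only and is not the algorithm VTune Amplifier uses. The function name, the interval data, and the 4-CPU machine are assumptions made for this example.

```python
from collections import defaultdict


def concurrency_histogram(intervals):
    """Return {k: seconds}: wall time during which exactly k threads were running.

    intervals: iterable of (start, end) run intervals in seconds.
    """
    events = []
    for start, end in intervals:
        events.append((start, +1))  # a thread starts running
        events.append((end, -1))    # a thread stops running or starts waiting
    events.sort()

    histogram = defaultdict(float)
    running = 0
    previous_time = None
    for time, delta in events:
        if previous_time is not None and time > previous_time:
            histogram[running] += time - previous_time
        running += delta
        previous_time = time
    return dict(histogram)


# Hypothetical run intervals (seconds) for three threads on a 4-CPU machine.
intervals = [(0.0, 4.0), (1.0, 3.0), (2.0, 3.0)]
histogram = concurrency_histogram(intervals)

total = sum(histogram.values())
average = sum(k * seconds for k, seconds in histogram.items()) / total

print(histogram)  # {1: 2.0, 2: 1.0, 3: 1.0}
print(average)    # 1.75
```

In this made-up case the application never has four threads running at the same time, and for half of the elapsed time only one thread is running. That is the kind of underutilization the Concurrency analysis is meant to expose.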

Use This

To Do This

CPU sampling interval, ms spin box

Specify the interval between CPU samples, in milliseconds.

Analyze DirectX pipeline events check box

Analyze GPU usage and frame rate based on the data provided by DirectX*, and identify whether your application is GPU-bound or CPU-bound.

Analyze user tasks check box

Analyze tasks specified in your code via Task API.

Analyze Intel runtimes and user synchronization check box

Analyze thread synchronization by profiling the User synchronization API used by Intel runtimes, such as OpenMP* and Intel® Threading Building Blocks (Intel TBB), or by the user. This option causes higher overhead and increases result size.

Trace OpenCL kernels on Processor Graphics check box

Capture the execution time of OpenCL™ kernels on a GPU, identify performance-critical GPU computing tasks, and analyze the performance of OpenCL kernels per GPU hardware metrics.

Analyze Processor Graphics hardware events drop-down menu

Analyze performance data from Intel HD Graphics based on the predefined groups of GPU metrics.

Details button

Expand or collapse a section listing the default non-editable settings used for this analysis type. To modify these settings for the Concurrency analysis, click the Copy button in the upper right corner. VTune Amplifier creates an editable copy of this analysis type configuration and places it under the Custom Analysis branch in the analysis tree.

The Details section provides information on the following default collection settings used for the Concurrency analysis:

Use This Option

To Do This

Default Concurrency Value

CPU sampling interval, ms

Set the interval between collected CPU samples in milliseconds.

10

Collect highly accurate CPU time

Obtain more accurate CPU time data. This option causes more runtime overhead and increases result size. Administrator privileges are required.

Yes

Collect CPU sampling data

Enable sampling with stack unwinding, so that the respective result windows and panes contain information about function call stacks.

With stacks

Collect signalling API data

Identify synchronization transitions in the timeline and signalling call stacks for associated waits. The collector instruments signalling APIs, which causes higher overhead and increases result size. The specified option value enables stack unwinding for signalling calls, so that the respective result windows and panes contain information about calling sequences for signalling calls.

With stacks

Collect synchronization API data

Identify where threads are waiting and compute thread concurrency. The collector instruments synchronization APIs, which causes higher overhead and increases result size. The specified option value enables stack unwinding for synchronization wait calls, so that the respective result windows and panes contain information about calling sequences for synchronization wait calls.

With stacks

Collect I/O API data

Identify where threads are waiting on I/O and compute thread concurrency. The collector instruments I/O APIs, which causes higher overhead and increases result size. The specified option value enables stack unwinding for I/O calls, so that the respective result windows and panes contain information about calling sequences for I/O calls.

With stacks

Analyze user tasks

Analyze tasks in your code specified via Task API. This option causes higher overhead and increases result size.

No

Analyze Intel runtimes and user synchronization

Analyze thread synchronization by profiling the User synchronization API used by Intel runtimes, such as OpenMP* and Intel® Threading Building Blocks (Intel TBB), or by the user. This option causes higher overhead and increases result size.

No

Analyze Processor Graphics hardware events

Analyze performance data from Intel HD Graphics based on the predefined groups of GPU metrics.

No

Analyze DirectX* pipeline events

Analyze GPU usage and frame rate based on the data provided by DirectX*, and identify whether your application is GPU-bound or CPU-bound.

No

Trace OpenCL kernels on Processor Graphics

Capture the execution time of OpenCL kernels on a GPU, identify performance-critical GPU computing tasks, and analyze the performance of OpenCL kernels per GPU hardware metrics.

No

GPU sampling interval, us

Specify the interval between GPU samples, in microseconds.

1000

Stack unwinding mode

Enable stack unwinding after collection finishes (offline mode). Offline mode reduces analysis overhead and is typically recommended.

After collection

Stitch stacks

For applications that use Intel Threading Building Blocks (Intel TBB) or OpenMP* with Intel runtime libraries, restructure the call flow to attach stacks to the point that introduces a parallel workload.

Yes

Collect timeline data

Enable collecting and retaining overhead data to display the Timeline pane. This mode increases result size.

Yes

Collect frequency data

Collect data about processor frequency changes. This type of data collection is supported only for Linux* systems based on Intel® Xeon® processors.

No

Collect sleep data

Analyze when the hardware wakes up from a sleep state and what causes it. This type of data collection is supported only for Linux* systems based on Intel® Xeon® processors.

No
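The two sampling intervals in the table use different units (milliseconds for CPU, microseconds for GPU), which is easy to misread. The sketch below is a rough estimate that ignores collection overhead and skipped samples. It converts both defaults to microseconds and computes how many samples each one produces over a hypothetical 5-second run. The helper name and the run length are assumptions for illustration.

```python
CPU_INTERVAL_MS = 10    # default "CPU sampling interval, ms"
GPU_INTERVAL_US = 1000  # default "GPU sampling interval, us"


def expected_samples(duration_us, interval_us):
    """Approximate sample count for one sampling source over duration_us."""
    return duration_us // interval_us


run_us = 5 * 1_000_000                   # hypothetical 5-second run
cpu_interval_us = CPU_INTERVAL_MS * 1000  # 10 ms = 10,000 us

cpu_samples = expected_samples(run_us, cpu_interval_us)
gpu_samples = expected_samples(run_us, GPU_INTERVAL_US)

print(cpu_samples)  # 500
print(gpu_samples)  # 5000
```

With the defaults, GPU sampling is ten times more frequent than CPU sampling. Lowering either interval increases the number of samples and therefore the result size.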

Note

You can use the Command Line... button at the bottom of the pane to copy the command line for this configuration and run the analysis remotely.
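The Command Line... button produces the exact command for your configuration, and that command is the one to rely on. As a rough sketch of its general shape, the snippet below assembles a basic Concurrency collection for the `amplxe-cl` command-line tool, assuming it is available on the target system. The application path, its arguments, and the result directory name are hypothetical, and the command is only printed, not executed.

```python
import shlex


def concurrency_command(app_args, result_dir):
    """Build an amplxe-cl argument list for a Concurrency collection."""
    return [
        "amplxe-cl",
        "-collect", "concurrency",
        "-result-dir", result_dir,
        "--",        # everything after this is the application and its arguments
        *app_args,
    ]


# Hypothetical application and result directory.
command = concurrency_command(["./my_app", "--input", "data.bin"], "r001cc")
print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the command as a list of arguments, rather than one string, avoids quoting problems if you later pass it to a process launcher or an SSH wrapper.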
