Intel® Advisor

FLOPS Analysis

The Intel® Advisor FLOPS analysis is part of the Advisor Trip Counts and FLOPS analysis, which runs your target program, collects data about loops, and displays the collected information in the Survey Report.

Note

You must run the Survey analysis at least once before or after collecting FLOPS data to see the results.

Before you run the Trip Counts and FLOPS analysis

Mixing and Matching Tasks

You can combine the data parallel and task parallel patterns. Continuing with the display/update example, suppose that you can parallelize the update operation, but not the display operation. Then you could execute the display operation in parallel with multiple tasks from the update operation. Consider this C/C++ code:
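The code that followed this paragraph is not present in this excerpt. As a hedged sketch of the pattern it describes, the following C++ fragment (with hypothetical `Frame`, `display`, and `update_chunk` names invented for illustration) runs the serial display of the previous frame as one task while the update of the current frame proceeds data-parallel in chunks:

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical frame type and operations, for illustration only.
using Frame = std::vector<int>;

void display(const Frame& f) { /* render f (omitted) */ }

// Data-parallel update: each worker advances one chunk of the frame.
void update_chunk(Frame& f, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) f[i] += 1;
}

void update_and_display(Frame& current, const Frame& previous) {
    // Task parallelism: display the previous frame while...
    std::thread display_task(display, std::cref(previous));

    // ...data parallelism updates the current frame in chunks.
    const std::size_t n = current.size(), half = n / 2;
    std::thread t1(update_chunk, std::ref(current), std::size_t{0}, half);
    std::thread t2(update_chunk, std::ref(current), half, n);

    t1.join();
    t2.join();
    display_task.join();
}
```

The display task and the update tasks touch different frames, so no synchronization between them is needed beyond the joins.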

Reducing Lock Overhead

Lock overhead is the time spent creating, destroying, acquiring, and releasing locks. Lock overhead does not include the time spent waiting for a lock held by another task; that is called lock contention. You can think of lock overhead as the cost of the lock operations themselves, assuming the lock is always available.
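One common way to reduce lock overhead, shown as a sketch below, is to coarsen the locking: accumulate into a task-local variable and acquire the lock once per batch instead of once per element (names here are illustrative, not from the original):

```cpp
#include <mutex>
#include <vector>

std::mutex m;
long shared_total = 0;

// High lock overhead: one acquire/release per element.
void add_fine(const std::vector<int>& v) {
    for (int x : v) {
        std::lock_guard<std::mutex> g(m);
        shared_total += x;
    }
}

// Lower overhead: accumulate privately, then acquire the lock once.
void add_coarse(const std::vector<int>& v) {
    long local = 0;
    for (int x : v) local += x;
    std::lock_guard<std::mutex> g(m);
    shared_total += local;
}
```

Both functions produce the same total; the second pays the cost of the lock operations once per call rather than once per iteration.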

Before Running the Memory Access Patterns Tool

The Intel® Advisor Memory Access Patterns tool runs your serial program's executable and observes its memory access operations in detail to predict possible issues. After you fix any issues it finds in the source code, run the Memory Access Patterns tool again to check the modified program's memory access strides.

Before you run the Memory Access Patterns tool, do the following:

  • Mark loops for deeper analysis:
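To illustrate what the tool reports, the sketch below (illustrative functions, not from the original) contrasts unit-stride and constant-stride access over a row-major matrix; the Memory Access Patterns tool would flag the large constant stride so you can restructure the loop nest:

```cpp
#include <vector>

// Row-major matrix stored in one contiguous vector.
// Unit-stride traversal: consecutive memory locations.
long sum_row_major(const std::vector<int>& a, int rows, int cols) {
    long s = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            s += a[r * cols + c];     // stride 1 in memory
    return s;
}

// Constant-stride traversal: stride equal to 'cols', which the
// Memory Access Patterns tool reports as a non-unit stride.
long sum_col_major(const std::vector<int>& a, int rows, int cols) {
    long s = 0;
    for (int c = 0; c < cols; ++c)
        for (int r = 0; r < rows; ++r)
            s += a[r * cols + c];     // stride = cols in memory
    return s;
}
```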

Eliminating Incidental Sharing

Sharing problems involving a task and a memory location are incidental if the memory location does not carry information into or out of the task. Therefore, if you replace all uses of the shared memory location in the task with uses of some non-shared memory location, you eliminate the sharing problem without changing the behavior of the program.
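As a minimal sketch of this fix (variable and function names invented for illustration), a shared scratch variable that carries no information between iterations can simply become a local:

```cpp
#include <cstddef>
#include <vector>

// Incidental sharing: 'shared_temp' carries no information into or
// out of an iteration, so each task can use its own copy instead.
int shared_temp;  // problematic if iterations run as parallel tasks

void scale_shared(std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        shared_temp = v[i] * 2;   // every task writes the same location
        v[i] = shared_temp + 1;
    }
}

// Fix: make the scratch value local to the iteration; the program's
// behavior is unchanged and the sharing problem disappears.
void scale_private(std::vector<int>& v) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        int temp = v[i] * 2;      // task-local, no sharing
        v[i] = temp + 1;
    }
}
```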

The following sections describe incidental sharing problems and their solutions.

OpenMP Critical Sections

Use OpenMP critical sections to prevent multiple threads from executing the critical section's code at the same time, so that only one thread at a time can update the data the code references. Critical sections are useful for a non-nested mutex.

Unlike OpenMP atomic operations, which provide fine-grained synchronization for a single operation, critical sections can provide coarse-grained synchronization for multiple operations.

Use:
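The directive list that followed "Use:" is truncated in this excerpt. As a hedged sketch, the fragment below shows a named OpenMP critical section guarding two related updates in one region (compile with OpenMP support, e.g. -fopenmp; without it the pragmas are ignored and the loop runs serially):

```cpp
#include <vector>

long count = 0;
long sum = 0;

void tally(const std::vector<int>& v) {
    #pragma omp parallel for
    for (long i = 0; i < (long)v.size(); ++i) {
        // Coarse-grained: two related updates protected together,
        // unlike a single-operation '#pragma omp atomic'.
        #pragma omp critical(tally_update)
        {
            ++count;
            sum += v[i];
        }
    }
}
```

Naming the critical section (here `tally_update`, a name chosen for this sketch) keeps it from serializing against unrelated unnamed critical sections elsewhere in the program.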

Site and Task Annotations for Simple Loops With One Task

Parallel site annotations mark the beginning and end of a parallel site. In contrast, to mark an entire simple loop body as a task, you need only a single iteration task annotation. This is the common case when the Survey tool identifies a single simple loop that consumes much of an application's time; such a loop may be the only task needed within its parallel site. This annotation form is also the easiest to convert to parallel code.
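A minimal sketch of this annotation form follows. The `ANNOTATE_SITE_BEGIN`, `ANNOTATE_SITE_END`, and `ANNOTATE_ITERATION_TASK` macros come from the Intel Advisor annotation header `advisor-annotate.h`; no-op stubs stand in here so the sketch is self-contained, and the site/task names are placeholders:

```cpp
#include <cstddef>
#include <vector>

// In a real build you would #include <advisor-annotate.h>, which
// defines these Intel Advisor annotation macros; no-op stubs stand
// in here so the sketch compiles on its own.
#ifndef ANNOTATE_SITE_BEGIN
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_ITERATION_TASK(name)
#endif

// A single iteration-task annotation marks the entire loop body as
// the one task inside the parallel site.
void scale(std::vector<double>& v, double factor) {
    ANNOTATE_SITE_BEGIN(scale_site);
    for (std::size_t i = 0; i < v.size(); ++i) {
        ANNOTATE_ITERATION_TASK(scale_task);
        v[i] *= factor;
    }
    ANNOTATE_SITE_END();
}
```

The annotations do not change the program's behavior; they only tell the Suitability and Correctness tools where the proposed parallelism would be.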

Subscribe to Intel® Advisor