Amdahl's law: A theoretical formula for predicting the maximum performance benefit of parallelizing an application program. Amdahl's law states that the achievable speedup in execution time is limited by the part of the program that is not parallelized (the part that executes serially). To approach this theoretical maximum, overhead must be minimized and all cores must be fully utilized. See also Using Amdahl's Law and Measuring the Program.
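Amdahl's law is commonly written as S(n) = 1 / ((1 - p) + p / n), where p is the fraction of the original serial run time that can be parallelized and n is the number of cores. The following C++ sketch evaluates that formula. It assumes the parallel part scales perfectly and ignores all overhead. The function name amdahl_speedup is illustrative.

```cpp
#include <stdexcept>

// Predicted upper bound on speedup according to Amdahl's law.
// parallel_fraction: share of the serial run time that can be parallelized (0..1).
// cores: number of cores the parallel part is spread across.
// Assumes perfect scaling of the parallel part and zero parallel overhead.
double amdahl_speedup(double parallel_fraction, unsigned cores) {
    if (parallel_fraction < 0.0 || parallel_fraction > 1.0 || cores == 0) {
        throw std::invalid_argument("fraction must be in [0, 1] and cores > 0");
    }
    const double serial_fraction = 1.0 - parallel_fraction;
    // The serial part is unchanged; only the parallel part is divided by cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

For example, with p = 0.9 on 10 cores the predicted speedup is 1 / (0.1 + 0.09), about 5.3. No number of cores can push it past 1 / (1 - p) = 10.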
annotation: A method of conveying information about proposed parallel execution. In Intel® Advisor, you create annotations by inserting macros or function calls. Intel Advisor tools use these annotations to predict parallel execution. For example, the C/C++ ANNOTATE_SITE_BEGIN(sitename) macro identifies where a parallel site begins. Later, to allow this code to execute in parallel, you replace the annotations with the code needed to use a parallel framework. See also parallel framework and Summary of Annotation Types.
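atomic operation: An operation performed by a thread on one or more memory locations that is guaranteed to complete without interference from other threads. Other threads observe either the state before the operation or the state after it, never an intermediate state. See also synchronization.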
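Standard C++ provides atomic operations through std::atomic. This is not an Intel Advisor or framework API. In the minimal sketch below, several threads increment one shared counter with fetch_add, and no increment is lost even though no lock is used. The helper name count_with_atomic is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each thread increments the shared counter with an atomic read-modify-write.
// Relaxed ordering suffices because only the final count matters, and join()
// makes every increment visible to the caller.
long count_with_atomic(int num_threads, int increments_per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}
```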
chunking: The ability of a parallel framework to aggregate multiple instances of a task into groups (chunks) for more efficient parallel processing. For loops with many iterations that each do only a small amount of computation, chunking can minimize task overhead. You can also chunk manually by restructuring a single loop into an outer loop over chunks and an inner loop over the iterations within each chunk (strip-mining). See also task and Enabling Task Chunking.
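Strip-mining can be sketched in C++ as follows. This serial sketch only shows the loop restructuring. The function name and the chunk_size parameter are illustrative. In a parallel version, each outer iteration would become one task.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Original loop, one tiny unit of work per iteration:
//   for (std::size_t i = 0; i < n; ++i) out[i] = in[i] * 2.0;
//
// Strip-mined version: the outer loop walks over chunks and the inner loop
// processes the elements of one chunk, so per-task overhead would be paid
// once per chunk instead of once per element.
void scale_by_two_chunked(const std::vector<double>& in, std::vector<double>& out,
                          std::size_t chunk_size) {
    if (chunk_size == 0) chunk_size = 1;  // guard against an infinite loop
    const std::size_t n = in.size();
    out.resize(n);
    for (std::size_t begin = 0; begin < n; begin += chunk_size) {  // outer: chunks
        const std::size_t end = std::min(begin + chunk_size, n);
        for (std::size_t i = begin; i < end; ++i) {                // inner: elements
            out[i] = in[i] * 2.0;
        }
    }
}
```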
critical section: A synchronization construct that allows only one thread to enter its associated code region at a time. Critical sections enforce mutual exclusion on enclosed regions of code. With Intel Advisor, mark critical sections by using ANNOTATE_LOCK_ACQUIRE() and ANNOTATE_LOCK_RELEASE() annotations.
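When you replace the annotations with real framework code, a critical section is typically implemented with that framework's lock. The following sketch uses std::mutex and std::lock_guard from the C++ standard library as the lock. The function name is hypothetical.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Appends values to a shared vector from several threads. std::vector is not
// thread-safe, so each push_back happens inside a critical section: only one
// thread at a time can hold results_mutex.
std::vector<int> collect_in_critical_section(int num_threads, int items_per_thread) {
    std::vector<int> results;
    std::mutex results_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (int i = 0; i < items_per_thread; ++i) {
                std::lock_guard<std::mutex> guard(results_mutex);  // enter critical section
                results.push_back(t * items_per_thread + i);
            }  // guard destroyed at the end of each iteration: leave critical section
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```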
data race: A data race can occur when multiple threads access the same memory location, at least one of them writes to it, and the program does not control the order of these concurrent accesses. One thread can then inadvertently overwrite data written by another thread, or read or write stale data. The resulting execution errors are difficult to detect and reproduce. For example, the same executable may calculate different results when run on different systems, or even in different runs on the same system. To prevent data races, you can insert synchronization constructs that restrict access to the shared memory location to one thread at a time, or you can eliminate the sharing. See also Common Issues When Adding Parallelism and deadlock.
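The second remedy, eliminating the sharing, can be sketched in standard C++. Instead of every thread updating one shared total, each thread accumulates a private partial sum, and the partial sums are combined after the threads finish. The function name and the block partitioning are illustrative assumptions.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Racy version (do NOT do this): every thread executes `total += data[i]` on
// one shared variable, so concurrent updates can be lost.
//
// Race-free alternative that eliminates the sharing: each thread writes only
// its own slot of `partial`, and the slots are summed after all threads join.
long long sum_without_sharing(const std::vector<int>& data, unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t n = data.size();
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Thread t handles the contiguous block [begin, end).
            const std::size_t begin = n * t / num_threads;
            const std::size_t end = n * (t + 1) / num_threads;
            long long local = 0;  // private to this thread
            for (std::size_t i = begin; i < end; ++i) local += data[i];
            partial[t] = local;   // only thread t writes slot t
        });
    }
    for (auto& w : workers) w.join();  // join makes every partial[t] visible here
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```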
data parallelism: Occurs when a single portion of code is paired with multiple portions of data, and each pairing executes as a task. For example, tasks are made by pairing a loop body with each element of an array iterated by the loop, and the tasks execute in parallel. See also About Task Patterns. Contrast task parallelism.
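The loop example can be sketched in C++ using std::async from the standard library as the task mechanism. The same loop body (the illustrative function square) is paired with each array element, and each pairing runs as a separate task.

```cpp
#include <future>
#include <vector>

// The "loop body": the same code applied to every element.
double square(double x) { return x * x; }

// Data parallelism in its most literal form: each (loop body, element) pair
// becomes one task. std::launch::async runs each task asynchronously, as if on
// a new thread; a real parallel framework would group elements into chunks.
std::vector<double> square_all(const std::vector<double>& in) {
    std::vector<std::future<double>> tasks;
    tasks.reserve(in.size());
    for (double x : in) {
        tasks.push_back(std::async(std::launch::async, square, x));
    }
    std::vector<double> out;
    out.reserve(in.size());
    for (auto& task : tasks) out.push_back(task.get());  // results keep input order
    return out;
}
```

Creating one task per element is reasonable only for tiny arrays. Frameworks normally chunk such tasks, as described under chunking.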
data set: The input data used to run the application or, for an interactive application, the way you interact with it, which determines which portions of the application execute. Because the Correctness tool watches each memory access in a parallel site in great detail, the parallel site's code takes much longer to run than usual. To limit the time needed for Correctness analysis, reduce the data (such as the number of loop iterations) and, when using an interactive program, create a very small test case. See also Choosing a Small, Representative Data Set for the Correctness Tool.
deadlock: A situation in which each thread in a set has acquired some locks and is waiting for a lock held by another thread in the set. Because no thread can proceed and release its locks, all of them wait indefinitely. See also Common Issues When Adding Parallelism.
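A classic way to produce a deadlock is for two threads to acquire the same two locks in opposite orders. The following C++17 sketch avoids this with std::scoped_lock, which acquires several mutexes together using a deadlock-avoidance algorithm. The Account, transfer, and run_opposite_transfers names are hypothetical. Always acquiring locks in one fixed global order is another common remedy.

```cpp
#include <mutex>
#include <thread>

struct Account {
    std::mutex m;
    long balance = 0;
};

// Deadlock-prone pattern (shown only as a comment):
//   lock(from.m); lock(to.m);
// If one thread transfers A->B while another transfers B->A, each can hold
// one mutex and wait forever for the other.
void transfer(Account& from, Account& to, long amount) {
    if (&from == &to) return;  // locking the same mutex twice would be undefined
    std::scoped_lock both(from.m, to.m);  // locks both without risk of deadlock
    from.balance -= amount;
    to.balance += amount;
}

// Runs A->B and B->A transfers concurrently: the situation that could
// deadlock with inconsistent manual lock ordering.
void run_opposite_transfers(Account& a, Account& b, int count) {
    std::thread t1([&] { for (int i = 0; i < count; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < count; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
}
```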
dynamic extent: All code that may be executed by a parallel site or task. For example, a dynamic extent might include a loop, all functions called from the loop, all functions those functions may in turn call, and so on. Contrast static extent. See also Task Organization and Annotations.
framework: See parallel framework
hotspot: A small code region that consumes much of the program's run time. Hotspots can be identified by a profiler, such as the Intel Advisor Survey tool. See also Using Amdahl's Law and Measuring the Program and About Workflows
Intel® Cilk™ Plus: A high-level parallel framework included with the Intel® C++ Compiler (part of Intel® Composer XE and similar Intel software suites). With this compiler support, C and C++ programs can use the three Intel Cilk Plus keywords: cilk_for, cilk_spawn, and cilk_sync. C++ programs can also use hyperobjects called reducers to synchronize access to shared data without using locks. See parallel framework.
Intel® Threading Building Blocks: A C++ template library for writing programs that take advantage of multiple cores. You can use this library to write scalable programs that specify tasks rather than threads, emphasize data-parallel programming, and take advantage of concurrent collections and parallel algorithms. It is available both as an Intel software product (Intel Threading Building Blocks) and as open source. Threading Building Blocks is one of several parallel frameworks. Abbreviation TBB.
load balancing: The even division of work among cores. When the load is balanced, all cores are busy most of the time, and no core sits idle waiting for the others to finish their share.
lock: A synchronization mechanism that allows one thread to wait until another thread allows it to continue. A lock can be used to synchronize threads accessing a specific memory location. See also synchronization and nested lock.
multi-core: A processor that combines two or more independent cores. Although all cores share the interconnect to the rest of the system, each core executes instructions independently by using its own execution resources, architectural state, and interrupt controller, as well as private and/or shared cache. Most multi-core systems use identical cores. Depending on the number of cores, a processor is called dual-core (2), quad-core (4), and so on. A processor with a large number of cores is called many-core.
multithreaded processing: See parallel processing
mutual exclusion: A type of locking that prevents multiple threads from executing a given code region or accessing a given resource at the same time. Abbreviation mutex. See also synchronization.
nested lock: A type of lock that can be locked again by a task when the task already owns the lock. Nested locks are convenient when several inter-related functions use the same lock. See also synchronization and lock
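In standard C++, std::recursive_mutex behaves like a nested lock, although it tracks ownership per thread rather than per task. This sketch uses a hypothetical Inventory class to show two inter-related functions sharing one lock, where add_all() calls add() while already holding it.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// add_all() calls add() while already holding the lock. With a plain
// std::mutex, this second lock by the same thread would be undefined behavior
// (in practice usually a self-deadlock); std::recursive_mutex allows it.
class Inventory {
public:
    void add(int item) {
        std::lock_guard<std::recursive_mutex> guard(m_);
        items_.push_back(item);
    }
    void add_all(const std::vector<int>& batch) {
        std::lock_guard<std::recursive_mutex> guard(m_);  // outer lock
        for (int item : batch) add(item);                 // nested lock by the same thread
    }
    std::size_t size() {
        std::lock_guard<std::recursive_mutex> guard(m_);
        return items_.size();
    }

private:
    std::recursive_mutex m_;
    std::vector<int> items_;
};
```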
OpenMP*: A high-level parallel framework and language extension for shared-memory parallel programming, consisting of compiler directives (C/C++ pragmas and Fortran directives), library functions, and environment variables. The OpenMP specification was developed by multiple hardware and software vendors to provide a scalable, portable interface for parallel programming on a variety of platforms. OpenMP is one of several parallel frameworks. See also http://openmp.org.
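The following C++ sketch illustrates the compiler-directive style by summing an array with an OpenMP parallel for loop and a reduction clause. The function name is illustrative. The flag that enables OpenMP depends on your compiler (GCC, for example, uses -fopenmp).

```cpp
#include <vector>

// The reduction clause gives each thread a private copy of `sum` and combines
// the copies when the loop ends, so threads never race on the shared variable.
// If OpenMP is not enabled at compile time, the pragma is ignored and the
// loop runs serially with the same result.
long long openmp_sum(const std::vector<int>& data) {
    long long sum = 0;
    const long long n = static_cast<long long>(data.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```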
parallel framework: A combination of libraries, language features, or other software techniques that enable code for a program to execute in parallel. Examples include OpenMP, Threading Building Blocks, Message Passing Interface (MPI), Intel® Concurrent Collections for C/C++, Intel® Cilk™ Plus, Microsoft Task Parallel Library* (TPL), and low-level, basic threading APIs, like POSIX* threads (Pthreads). Some parallel frameworks support shared-memory parallel processing, while others like MPI support non-shared-memory parallel processing. See also Threading Building Blocks, Intel Cilk Plus, and Overview of Parallel Frameworks.
parallel site: A region of code that contains tasks that can execute in parallel. See also annotation and Task Organization and Annotations
parallel processing: The use of multiple threads during execution of a program. Intel Advisor focuses on parallel processing for shared-memory systems. Other types of parallel processing target, for example, clusters or grids. Often shortened to parallelism. See also hotspot and thread.
pipeline: An approach to organizing task computations that uses both data parallelism and task parallelism, and organizes the computation into stages that run in a predetermined order.
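The staged, ordered part of a pipeline can be sketched in standard C++ with two threads connected by a FIFO queue. In this illustrative sketch the Channel class and the two stages are hypothetical. It does not show the data-parallel side, in which a single stage could also process several items at once.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking FIFO that hands items from one stage to the next.
// An empty std::optional serves as the end-of-stream marker.
template <typename T>
class Channel {
public:
    void send(std::optional<T> item) {
        {
            std::lock_guard<std::mutex> guard(m_);
            q_.push(std::move(item));
        }
        cv_.notify_one();
    }
    std::optional<T> receive() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });  // block until an item arrives
        std::optional<T> item = std::move(q_.front());
        q_.pop();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::optional<T>> q_;
};

// Two-stage pipeline: stage 1 squares each input, stage 2 sums the squares.
// The stages run on different threads, so stage 2 can process item k while
// stage 1 works on item k+1; the FIFO channel preserves the item order.
long long pipeline_sum_of_squares(const std::vector<int>& input) {
    Channel<long long> squares;
    long long total = 0;
    std::thread stage1([&] {
        for (int x : input) squares.send(static_cast<long long>(x) * x);
        squares.send(std::nullopt);  // end of stream
    });
    std::thread stage2([&] {
        while (auto item = squares.receive()) total += *item;
    });
    stage1.join();
    stage2.join();
    return total;
}
```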
self time: In the Survey Report window, the time spent in a particular function or loop itself, excluding the time spent in the functions it calls. Contrast total time.
site: See parallel site
shared-memory parallelism: See parallel processing
static extent: The code between a site's or a task's _BEGIN and _END annotations. A static extent might not be lexically paired; for example, a parallel site may have one _BEGIN point, but may require multiple independent _END exit points. Contrast with dynamic extent. See also annotation, parallel site, and Task Organization and Annotations.
synchronization: Coordinating the execution of multiple threads. For example, a lock or an Intel Cilk Plus reducer can be used to restrict access to shared data. See also About Synchronizing Independent Updates.
task parallelism: Occurs when two different portions of the code are made into tasks and execute in parallel. For example, a task is made by pairing a display algorithm with the state to display, another task by pairing a compute-next-state algorithm with the same state, and the two tasks execute in parallel. See also About Task Patterns. Contrast data parallelism
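The display/compute-next-state example can be sketched in C++ with std::async. The State type and the render and next_state functions are hypothetical stand-ins for the two algorithms. Because both tasks only read the shared state, they can run in parallel without synchronization.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>
#include <vector>

using State = std::vector<int>;

// Task 1, the "display" algorithm: renders the current state as text.
std::string render(const State& s) {
    std::string out;
    for (int v : s) out += std::to_string(v) + ' ';
    return out;
}

// Task 2, the "compute next state" algorithm: derives a new state.
State next_state(const State& s) {
    State next(s.size());
    for (std::size_t i = 0; i < s.size(); ++i) next[i] = s[i] + 1;
    return next;
}

// Task parallelism: two different pieces of code run in parallel on the same
// state. Neither task modifies `current`, so there is no data race.
std::pair<std::string, State> step(const State& current) {
    auto shown = std::async(std::launch::async, [&current] { return render(current); });
    auto advanced = std::async(std::launch::async, [&current] { return next_state(current); });
    return {shown.get(), advanced.get()};
}
```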
TBB: See Intel® Threading Building Blocks
Threading Building Blocks: A high-level parallel framework included with Intel® Composer XE. See Intel Threading Building Blocks and parallel framework. Abbreviation Intel TBB.
thread: A thread executes instructions within a process. Each process has one or more active threads at a time. Threads share the address space of their process, but each thread has its own stack, program counter, and other registers.
total time: In the Survey Report window, the time spent in a particular function or loop plus the time spent in everything it calls. Contrast self time.