Parallel Programming

Title Tag(s) Modified Date
Using Intel Cluster Checker to check that MPI applications will properly run over InfiniBand
Using Intel Cluster Checker to check that MPI applications will properly run over InfiniBand.
Type: Technical Article
Intel Cluster Ready Intel Cluster Checker intel_mpi_rt_internode 02/09/2012
Using Intel® Inspector XE 2011 to Find Data Races in Multithreaded Code
Intel Inspector XE 2011 automatically finds memory errors, deadlocks, data races, and other conditions that could lead to threading errors. This article discusses some specific issues associated with debugging multithreaded applications.
Type: Technical Article
critical section OpenMP Debugger data races Intel Parallel Inspector threading 02/03/2012
90 errors in open-source projects
There are actually 91 errors described in the article, but the number 90 looks nicer in the title. The article is intended for C/C++ programmers, but developers working with other languages may also find it interesting.
Type: Technical Article
errors C++ open source bugs cpp PVS-Studio code review static code analyzer Security Community 02/02/2012
Multi-threaded Rendering and Physics Simulation
By Rajshree Chabukswar, Adam T. Lake, and Mary R. Lee, Intel® Software Solutions Group. Introduction: Learn how to decouple rendering and physical simulation in a multi-threaded environment with a ...
Type: Technical Article
Multithreading physics visual computing 01/12/2012
Introduction to OpenCL™
Open Compute Language (OpenCL™) provides a framework for writing programs in a C-like language that can run on heterogeneous cores such as CPUs, GPUs, or specialized hardware. This white paper provides a br ...
Type: Technical Article
01/05/2012
OpenCL™ – Using Events
Introduction This white paper is the fourth in a series of white papers on OpenCL describing how to set up and use events in multithreaded design. This white paper will go over various design choices ...
Type: Technical Article
12/21/2011
OpenCL™ - Programming for CPU Performance
This white paper is the third in a series of white papers on OpenCL™ describing how to best utilize the underlying Intel hardware architecture using OpenCL. This white paper will go over programming conside ...
Type: Technical Article
12/21/2011
Intel® Performance Counter Monitor - A better way to measure CPU utilization
The Intel® Performance Counter Monitor provides sample C++ routines and utilities to estimate the internal resource utilization of the latest Intel® Xeon® and Core™ processors and gain a significant performance boost.
Type: Technical Article
monitoring Intel Performance Counter Monitor simultaneous multithreading out-of-order execution Intel® Performance Counter Monitor Intel® Xeon® Core™ processors multi-level caches pipelining 11/30/2011
Avoiding AVX-SSE Transition Penalties
Avoiding AVX-SSE Transition Penalties (PDF 678 KB) Transitioning between 256-bit Intel® AVX instructions and legacy Intel® SSE instructions within a program may cause performance penalties because the ...
Type: Technical Article
Intel AVX Sandy Bridge Intel® SSE 11/10/2011
Use Non-blocking Locks When Possible
Non-blocking system calls allow a competing thread to return after an unsuccessful attempt to acquire the lock and do useful work in the meantime, thereby avoiding wasteful utilization of execution resources.
Type: Technical Article
critical section synchronization threading non-blocking lock context switch spin-wait 11/04/2011
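The non-blocking idea in the entry above can be illustrated with a minimal sketch. This is not code from the article: it assumes Python's standard `threading.Lock`, and the helper name `try_update` and its arguments are hypothetical.

```python
import threading

# One lock guarding a shared dictionary (illustrative setup, not from the article).
lock = threading.Lock()

def try_update(shared, key, value):
    """Attempt the update without blocking; return True if it happened."""
    # acquire(blocking=False) returns immediately instead of waiting:
    # True if the lock was obtained, False if another holder has it.
    if lock.acquire(blocking=False):
        try:
            shared[key] = value
            return True
        finally:
            lock.release()
    # The lock is busy: the caller is free to do other work and retry later.
    return False
```

A caller that receives `False` can process another work item and come back to this update later, rather than sitting idle while the lock is held.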
Choosing Appropriate Synchronization Primitives to Minimize Overhead
Currently, there are a number of synchronization mechanisms available, and it is left to the application developer to choose an appropriate one to minimize overall synchronization overhead.
Type: Technical Article
atomic operations synchronization threading Win32 threads system overhead mutual exclusion PPGuide 11/04/2011
Use Synchronization Routines Provided by the Threading API Rather than Hand-Coded Synchronization
Application programmers sometimes write hand-coded synchronization routines rather than using constructs provided by a threading API in order to reduce synchronization overhead or provide different functionality than existing constructs offer.
Type: Technical Article
Hyper-Threading OpenMP synchronization threading Pthreads Win32 threads spin-wait PPGuide 11/04/2011
Managing Lock Contention: Large and Small Critical Sections
This topic introduces the concept of critical section size, defined as the length of time a thread spends inside a critical section, and its effect on performance.
Type: Technical Article
11/04/2011
Using AVX Without Writing AVX Code
Using AVX Without Writing AVX Code (PDF 260KB) Abstract Intel® Advanced Vector Extensions (Intel® AVX) is a new 256-bit instruction set extension to Intel® Streaming SIMD Extensions (Intel® SSE) and ...
Type: Technical Article
11/04/2011
Exploiting Data Parallelism in Ordered Data Streams
This article identifies some of these challenges and illustrates strategies for addressing them while maintaining parallel performance.
Type: Technical Article
data parallelism I/O threading order dependence PPGuide 11/04/2011
Using Tasks Instead of Threads
Tasks are a lightweight alternative to threads that provide faster startup and shutdown times, better load balancing, an efficient use of available resources, and a higher level of abstraction.
Type: Technical Article
11/04/2011
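As a rough illustration of the task model described in the entry above, the sketch below hands small units of work to a pool of reusable worker threads instead of creating one thread per unit. It assumes Python's standard `concurrent.futures` module, which the article does not mention; `square` and `run_as_tasks` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    """A small unit of work, standing in for a real task body."""
    return x * x

def run_as_tasks(values, max_workers=4):
    """Run one task per value on a fixed pool of reusable worker threads."""
    # The pool starts at most max_workers threads and reuses them for all
    # tasks, so the cost of thread startup and shutdown is not paid per task.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() distributes the tasks and returns results in input order.
        return list(pool.map(square, values))
```

The caller describes what work exists and leaves the assignment of work to threads to the pool, which is the higher level of abstraction the entry refers to.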
Expose Parallelism by Avoiding or Removing Artificial Dependencies
Many applications and algorithms contain serial optimizations that inadvertently introduce data dependencies and inhibit parallelism. One can often remove such dependencies through simple transforms, or even avoid them altogether.
Type: Technical Article
11/04/2011
Load Balance and Parallel Performance
Load balancing an application workload among threads is critical to performance. The key objective for load balancing is to minimize idle time on threads.
Type: Technical Article
11/04/2011
Granularity and Parallel Performance
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
Type: Technical Article
11/04/2011
Loop Modifications to Enhance Data-Parallel Performance
When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
Type: Technical Article
11/04/2011
Predicting and Measuring Parallel Performance
The success of parallelization is typically quantified by measuring the speedup of the parallel version relative to the serial version. It is also useful to compare that speedup relative to the upper limit of the potential speedup.
Type: Technical Article
11/04/2011
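The two quantities mentioned in the entry above can be sketched as follows. The listing does not say how the upper limit is computed; this sketch assumes Amdahl's law, and the function names are hypothetical.

```python
def speedup(serial_time, parallel_time):
    """Measured speedup: serial run time divided by parallel run time."""
    return serial_time / parallel_time

def amdahl_limit(parallel_fraction, n_threads):
    """Upper bound on speedup under Amdahl's law (an assumption here).

    parallel_fraction is the share of serial run time that can be
    parallelized, between 0.0 and 1.0.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)

def fraction_of_limit(serial_time, parallel_time, parallel_fraction, n_threads):
    """How close the measured speedup comes to the Amdahl bound."""
    return speedup(serial_time, parallel_time) / amdahl_limit(
        parallel_fraction, n_threads
    )
```

For example, a program that runs in 10 seconds serially and 4 seconds on four threads has a measured speedup of 2.5; if 90% of its run time is parallelizable, the Amdahl bound on four threads is about 3.08, so the measured result reaches roughly 81% of the limit.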
Optimize Data Structures and Memory Access Patterns to Improve Data Locality
Optimize Data Structures and Memory Access Patterns to Improve Data Locality (PDF 782KB) Abstract Cache is one of the most important resources of modern CPUs: it’s a smaller and faster part of the m ...
Type: Technical Article
11/02/2011
Getting Code Ready for Parallel Execution with Intel® Parallel Composer
This article provides an overview of the methods available in Intel® Parallel Composer, along with a comparison of their key benefits.
Type: Technical Article
OpenMP Vectorization Parallel Composer compiler threading auto-parallelization 11/02/2011
Curing Thread Imbalance Using Intel® Parallel Amplifier
Curing Thread Imbalance Using Intel® Parallel Amplifier (PDF 302KB) Abstract One of the performance-inhibiting factors in threaded applications is load imbalance. Balancing the workload among thread ...
Type: Technical Article
concurrency scheduling parallel amplifier threading scalability hotspot utilization PPGuide 11/02/2011
Threading and Intel® Integrated Performance Primitives
Threading and Intel® Integrated Performance Primitives (PDF 230KB) Abstract There is no universal threading solution that works for all applications. Likewise, there are multiple ways for application ...
Type: Technical Article
11/02/2011