Using Intel Cluster Checker to check that MPI applications will properly run over InfiniBand
Using Intel Cluster Checker to check that MPI applications will properly run over InfiniBand. Type: Technical Article |
Intel Cluster Ready Intel Cluster Checker intel_mpi_rt_internode |
02/09/2012
|
Using Intel® Inspector XE 2011 to Find Data Races in Multithreaded Code
Intel Inspector XE 2011 automatically finds memory errors, data races, deadlocks, and other conditions that could lead to deadlocks. Some specific issues associated with debugging multithreaded applications will be discussed in this article. Type: Technical Article |
critical section OpenMP Debugger data races Intel Parallel Inspector threading |
02/03/2012
|
90 errors in open-source projects
There are actually 91 errors described in the article, but the number 90 looks nicer in the title. The article is intended for C/C++ programmers, but developers working with other languages may also find it interesting. Type: Technical Article |
errors C++ open source bugs cpp PVS-Studio code review static code analyzer Security Community |
02/02/2012
|
Multi-threaded Rendering and Physics Simulation
by Rajshree Chabukswar, Adam T. Lake, and Mary R. Lee, Intel® Software Solutions Group
Introduction
Learn how to decouple rendering and physical simulation in a multi-threaded environment with a ... Type: Technical Article |
Multithreading physics visual computing |
01/12/2012
|
Introduction to OpenCL™
Open Compute Language (OpenCL™) provides a framework to write programs in a C-like language that can run on heterogeneous cores such as CPUs, GPUs or specialized hardware. This white paper provides a br ... Type: Technical Article |
|
01/05/2012
|
OpenCL™ – Using Events
Introduction
This white paper is the fourth in a series of white papers on OpenCL describing how to set up and use events in a multithreaded design. This white paper will go over various design choices ... Type: Technical Article |
|
12/21/2011
|
OpenCL™ - Programming for CPU Performance
This white paper is the third in a series of white papers on OpenCL™ describing how to best utilize the underlying Intel hardware architecture using OpenCL. This white paper will go over programming conside ... Type: Technical Article |
|
12/21/2011
|
Intel® Performance Counter Monitor - A better way to measure CPU utilization
The Intel® Performance Counter Monitor provides sample C++ routines and utilities to estimate the internal resource utilization of the latest Intel® Xeon® and Core™ processors and gain a significant performance boost. Type: Technical Article |
monitoring Intel Performance Counter Monitor simultaneous multithreading out-of-order execution Intel® Performance Counter Monitor Intel® Xeon® Core™ processors multi-level caches pipelining |
11/30/2011
|
Avoiding AVX-SSE Transition Penalties
Avoiding AVX-SSE Transition Penalties (PDF 678 KB)
Transitioning between 256-bit Intel® AVX instructions and legacy Intel® SSE instructions within a program may cause performance penalties because the ... Type: Technical Article |
Intel AVX Sandy Bridge Intel® SSE |
11/10/2011
|
Use Non-blocking Locks When Possible
Non-blocking system calls let a competing thread return immediately from an unsuccessful attempt to acquire the lock and do useful work in the meantime, thereby avoiding wasteful utilization of execution resources. Type: Technical Article |
critical section synchronization threading non-blocking lock context switch spin-wait |
11/04/2011
|
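The non-blocking pattern summarized in the entry above can be sketched in Python. This is a minimal illustration, not the article's own code: the names `try_update`, `counter`, and `fallback_log` are hypothetical, and the standard-library `threading.Lock` stands in for whatever lock primitive the article discusses.

```python
import threading

lock = threading.Lock()

def try_update(counter, fallback_log):
    """Try to take the lock without blocking; do other work if it is busy."""
    # acquire(blocking=False) returns at once: True if the lock was taken.
    if lock.acquire(blocking=False):
        try:
            counter["value"] += 1  # protected update
            return True
        finally:
            lock.release()
    # The lock is held elsewhere: record deferred work instead of waiting.
    fallback_log.append("deferred")
    return False
```

A caller would typically retry the deferred update later, so the thread keeps doing useful work rather than idling on the lock.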
Choosing Appropriate Synchronization Primitives to Minimize Overhead
Currently, there are a number of synchronization mechanisms available, and it is left to the application developer to choose an appropriate one to minimize overall synchronization overhead. Type: Technical Article |
atomic operations synchronization threading Win32 threads system overhead mutual exclusion PPGuide |
11/04/2011
|
Use Synchronization Routines Provided by the Threading API Rather than Hand-Coded Synchronization
Application programmers sometimes write hand-coded synchronization routines rather than using constructs provided by a threading API in order to reduce synchronization overhead or provide different functionality than existing constructs offer. Type: Technical Article |
Hyper-Threading OpenMP synchronization threading Pthreads Win32 threads spin-wait PPGuide |
11/04/2011
|
Managing Lock Contention: Large and Small Critical Sections
This topic introduces the concept of critical section size, defined as the length of time a thread spends inside a critical section, and its effect on performance. Type: Technical Article |
|
11/04/2011
|
Using AVX Without Writing AVX Code
Using AVX Without Writing AVX Code (PDF 260KB)
Abstract
Intel® Advanced Vector Extensions (Intel® AVX) is a new 256-bit instruction set extension to Intel® Streaming SIMD Extensions (Intel® SSE) and ... Type: Technical Article |
|
11/04/2011
|
Exploiting Data Parallelism in Ordered Data Streams
This article identifies some of these challenges and illustrates strategies for addressing them while maintaining parallel performance. Type: Technical Article |
data parallelism I/O threading order dependence PPGuide |
11/04/2011
|
Using Tasks Instead of Threads
Tasks are a lightweight alternative to threads that provide faster startup and shutdown times, better load balancing, efficient use of available resources, and a higher level of abstraction. Type: Technical Article |
|
11/04/2011
|
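As a rough illustration of the task-based style described in the entry above, the sketch below submits small units of work to a pool instead of creating one thread per unit. It assumes Python's standard `concurrent.futures` module as the task runtime; the article itself may target a different library, and `square` and `run_tasks` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

def square(x):
    """A small unit of work, submitted as a task."""
    return x * x

def run_tasks(values, workers=4):
    """Run one task per value on a fixed pool of reusable worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() schedules the tasks and returns results in input order.
        return list(pool.map(square, values))
```

The pool reuses its worker threads across tasks, so the caller thinks in terms of work items rather than thread lifetimes.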
Expose Parallelism by Avoiding or Removing Artificial Dependencies
Many applications and algorithms contain serial optimizations that inadvertently introduce data dependencies and inhibit parallelism. One can often remove such dependencies through simple transforms, or even avoid them altogether through ... Type: Technical Article |
|
11/04/2011
|
Load Balance and Parallel Performance
Load balancing an application workload among threads is critical to performance. The key objective for load balancing is to minimize idle time on threads. Type: Technical Article |
|
11/04/2011
|
Granularity and Parallel Performance
One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead. Type: Technical Article |
|
11/04/2011
|
Loop Modifications to Enhance Data-Parallel Performance
When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive. Type: Technical Article |
|
11/04/2011
|
Predicting and Measuring Parallel Performance
The success of parallelization is typically quantified by measuring the speedup of the parallel version relative to the serial version. It is also useful to compare that speedup to the upper limit of the potential speedup. Type: Technical Article |
|
11/04/2011
|
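The two quantities mentioned in the entry above can be computed as follows. This sketch assumes the upper limit is the one given by Amdahl's law, which the summary does not name explicitly; `speedup` and `amdahl_limit` are hypothetical helper names.

```python
def speedup(serial_time, parallel_time):
    """Measured speedup: serial run time divided by parallel run time."""
    return serial_time / parallel_time

def amdahl_limit(parallel_fraction, n_threads):
    """Upper bound on speedup under Amdahl's law (an assumption here).

    parallel_fraction is the share of serial run time that can be parallelized.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)
```

For example, a program whose run time is half parallelizable cannot exceed a speedup of 2 no matter how many threads are used, so a measured speedup should be judged against that bound rather than against the thread count.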
Optimize Data Structures and Memory Access Patterns to Improve Data Locality
Optimize Data Structures and Memory Access Patterns to Improve Data Locality (PDF 782KB)
Abstract
Cache is one of the most important resources of modern CPUs: it’s a smaller and faster part of the m ... Type: Technical Article |
|
11/02/2011
|
Getting Code Ready for Parallel Execution with Intel® Parallel Composer
This article provides an overview of the methods available in Intel® Parallel Composer, along with a comparison of their key benefits. Type: Technical Article |
OpenMP Vectorization Parallel Composer compiler threading auto-parallelization |
11/02/2011
|
Curing Thread Imbalance Using Intel® Parallel Amplifier
Curing Thread Imbalance Using Intel® Parallel Amplifier (PDF 302KB)
Abstract
One of the performance-inhibiting factors in threaded applications is load imbalance. Balancing the workload among thread ... Type: Technical Article |
concurrency scheduling parallel amplifier threading scalability hotspot utilization PPGuide |
11/02/2011
|
Threading and Intel® Integrated Performance Primitives
Threading and Intel® Integrated Performance Primitives (PDF 230KB)
Abstract
There is no universal threading solution that works for all applications. Likewise, there are multiple ways for application ... Type: Technical Article |
|
11/02/2011
|