Intel® Parallel Studio

Superscalar Programming 101 (Matrix Multiply) Part 1 of 5

By Jim Dempsey

This article shows how to tune a well-known algorithm, matrix multiplication, for parallel performance. We start with the (small) serial algorithm, look at a common approach to parallelizing it, then a better approach, and finally produce a fully cache-sensitized parallel version. The aim is to teach a methodology: how to interpret the statistics gathered during test runs, and how to use those interpretations to improve your parallel code.
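
As a point of reference for the "common approach" mentioned above, here is a minimal sketch of the textbook triple-loop multiply with its outer loop shared among threads. It is not taken from the article itself: the function name `matmul`, the row-major `std::vector` layout, and the use of an OpenMP pragma are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiply two n x n row-major matrices, c = a * b.
// Each thread computes a disjoint set of rows of c, so no locking is needed.
std::vector<double> matmul(const std::vector<double>& a,
                           const std::vector<double>& b,
                           std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    // Assumption: OpenMP is enabled at compile time. Without it the pragma
    // is ignored and the loop simply runs serially.
    #pragma omp parallel for
    for (long row = 0; row < static_cast<long>(n); ++row) {
        const std::size_t i = static_cast<std::size_t>(row);
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];  // b is read down a column
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

The inner loop walks `b` down a column, which strides through memory by `n` elements per step. That access pattern is the kind of cache behavior a cache-sensitized version would aim to improve.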

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Advanced
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Cilk Plus Software Development Kit
  • Parallel Computing
  • Using Intel® Inspector XE 2011 to Find Data Races in Multithreaded Code

    Intel Inspector XE 2011 automatically finds memory errors, deadlocks and other conditions that could lead to deadlocks, and data races. This article discusses some specific issues associated with debugging multithreaded applications.
  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Beginner
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Inspector XE
  • Intel® Parallel Inspector
  • critical section
  • data races
  • Learning Lab
  • OpenMP*
  • Parallel Computing
  • Threading
  • Loop Modifications to Enhance Data-Parallel Performance

    When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
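
One way to picture the loop merging mentioned above is the sketch below, which collapses a two-level loop nest into a single loop so that there are `rows * cols` iterations to distribute instead of only `rows`. The function name `scale_merged`, the row-major layout, and the OpenMP pragma are assumptions for illustration, not code from the article.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: multiply every element of a rows x cols row-major
// grid by `factor`, using one merged loop instead of a nested pair.
void scale_merged(std::vector<double>& grid,
                  std::size_t rows, std::size_t cols, double factor) {
    assert(grid.size() == rows * cols);
    const long total = static_cast<long>(rows * cols);
    // Assumption: OpenMP is enabled; otherwise the pragma is ignored and
    // the loop runs serially.
    #pragma omp parallel for
    for (long idx = 0; idx < total; ++idx) {
        // Recover the original (i, j) indices from the merged index.
        const std::size_t i = static_cast<std::size_t>(idx) / cols;
        const std::size_t j = static_cast<std::size_t>(idx) % cols;
        grid[i * cols + j] *= factor;
    }
}
```

If `rows` were 2 and only the outer loop were parallelized, at most two threads could do useful work. The merged loop exposes every element as a separate iteration, so the work can be spread over more threads.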
  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Advanced
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Optimization
  • Parallel Computing
  • Granularity and Parallel Performance

    One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
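
The sketch below makes granularity an explicit parameter, assuming an OpenMP dynamic schedule as the work-distribution mechanism. The function name `sum_squares` and the `chunk` parameter are hypothetical. `chunk` is the number of loop iterations handed to a thread per scheduling request, so it directly sets how much real work each parallel task carries.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sum the squares of `data`, handing out work to
// threads in blocks of `chunk` iterations.
double sum_squares(const std::vector<double>& data, int chunk) {
    assert(chunk > 0);
    const long n = static_cast<long>(data.size());
    double total = 0.0;
    // Assumption: OpenMP is enabled; otherwise the pragma is ignored and
    // the loop runs serially with the same result.
    #pragma omp parallel for schedule(dynamic, chunk) reduction(+ : total)
    for (long i = 0; i < n; ++i) {
        const double x = data[static_cast<std::size_t>(i)];
        total += x * x;
    }
    return total;
}
```

With `chunk` set to 1, each scheduling request covers a single multiply-add, so the cost of handing out work can outweigh the work itself. A larger `chunk` amortizes that overhead, at the price of fewer tasks available for balancing load across threads.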
  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Advanced
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Parallel Computing