Intel® Parallel Studio

Superscalar Programming 101 (Matrix Multiply) Part 1 of 5

By Jim Dempsey

This article shows how to optimally tune a well-known algorithm. We will take this small, well-known algorithm, examine a common approach to parallelizing it, then a better approach, and finally produce a fully cache-sensitized parallel version. The intention is to teach you a methodology for interpreting the statistics gathered during test runs and then using those interpretations to improve your parallel code.
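
For orientation, the title identifies the well-known algorithm as matrix multiplication. The sketch below is a minimal illustration and not the article's own code: it shows a serial triple-loop baseline and one common way of parallelizing it, by splitting the outermost loop across threads. The names `Matrix`, `matmul_serial`, and `matmul_parallel_outer` are hypothetical, the row-major flat storage is an assumption, and the OpenMP pragma is assumed here as the parallelization mechanism (it is ignored, and the loop runs serially, if OpenMP is not enabled in the compiler).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical representation: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Serial baseline: C = A * B using the classic triple loop.
void matmul_serial(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Outer-loop parallel version (assumed approach): each thread computes whole
// rows of C, so no two threads ever write to the same element.
void matmul_parallel_outer(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    const long rows = static_cast<long>(n);  // signed index for OpenMP loops
    #pragma omp parallel for
    for (long i = 0; i < rows; ++i) {
        const std::size_t row = static_cast<std::size_t>(i);
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[row * n + k] * b[k * n + j];
            }
            c[row * n + j] = sum;
        }
    }
}
```

Both functions produce the same result. The parallel version changes only how rows are distributed, so the inner loop still walks down a column of B with a stride of n elements, and that access pattern is the kind of thing a cache-sensitive version would need to address.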

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Cilk Plus Software Development Kit
  • Intel® Parallel Studio XE Composer Edition
  • Parallel Computing
  • Loop Modifications to Enhance Data-Parallel Performance (Chinese-language version)

    When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® Parallel Studio XE
  • Intel® Parallel Studio XE Composer Edition
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Optimization
  • Parallel Computing
  • Loop Modifications to Enhance Data-Parallel Performance

    When confronted with nested loops, the granularity of the computations that are assigned to threads will directly affect performance. Loop transformations such as splitting and merging nested loops can make parallelization easier and more productive.
  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® Parallel Studio XE
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE Composer Edition
  • Optimization
  • Parallel Computing
  • Granularity and Parallel Performance

    One key to attaining good parallel performance is choosing the right granularity for the application. Granularity is the amount of real work in the parallel task. If granularity is too fine, then performance can suffer from communication overhead.
  • Developers
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Parallel Studio XE Composer Edition
  • Parallel Computing