Intel® Parallel Studio

Threaded Programming Methodology with Parallel Studio

In this 3-hour module, participants will learn about the evolution of parallel processing architectures. After completing this module, a student should be able to describe how threading architectures relate to software development, rapidly estimate the effort required to thread time-consuming regions, and prototype the solution.

Topics covered include:

Using Intel VTune to identify bottlenecks

This is the first Accelerate contest in which we have had such easy access to Intel tools such as Intel Inspector and Intel VTune.
In the previous edition, Maxime and I suffered from the lack of them. We identified a bottleneck in our code too late, by connecting to the MTL in desktop mode over SSH to run VTune and looking at the execution profile.

Superscalar programming 101 (Matrix Multiply) Part 5 of 5

In part 4 we saw the effects of the QuickThread Parallel Tag Team Transpose method of Matrix Multiplication performed on a dual Xeon 5570 system with two sockets and two L3 caches, each shared by four cores (8 threads), and with each processor having four L2 and four L1 caches, each shared by one core and two threads. We find:

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Cilk Plus Software Development Kit
  • Parallel Computing

Superscalar programming 101 (Matrix Multiply) Part 3 of 5

By Jim Dempsey

In the previous article (part 2) we saw that reorganizing the loops and using a temporary array yields a performance gain from SSE small-vector optimizations (which the compiler performs), but that a larger gain came from better cache utilization due to the layout change and array access order. The improvements pushed us into a memory bandwidth limitation whereby the Serial method now outperforms the Parallel method (that is, the parallelized version of the Serial method).
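
The loop and layout change described above can be illustrated with a minimal sketch. This is not the author's code: it assumes square, row-major matrices stored in flat vectors, and it assumes the temporary array is a transposed copy of the second operand. The names `Matrix`, `multiply_naive`, and `multiply_transposed` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumption for this sketch: an n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<double>;

// Baseline: the inner loop reads b down a column, i.e. with a stride of n elements.
Matrix multiply_naive(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}

// Variant with a temporary array: b is copied into bt in transposed order,
// so the inner loop reads both a and bt with unit stride.
Matrix multiply_transposed(const Matrix& a, const Matrix& b, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    Matrix bt(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            bt[j * n + i] = b[i * n + j];
        }
    }
    Matrix c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[i * n + k] * bt[j * n + k];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

Both functions compute the same product; only the memory access pattern of the inner loop differs. The contiguous inner loop of the second version is also the kind of loop a compiler can typically vectorize, although whether it does so depends on the compiler and its options.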

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Cilk Plus Software Development Kit
  • Parallel Computing

Superscalar Programming 101 (Matrix Multiply) Part 1 of 5

By Jim Dempsey

The subject of this article is how to optimally tune a well-known algorithm. We will take this well-known (small) algorithm, look at a common approach to parallelizing it and then at a better approach, and finally produce a fully cache-sensitized approach to parallelizing it. The intention of this article is to teach you a methodology for interpreting the statistics gathered during test runs and then using those interpretations to improve your parallel code.

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Cilk Plus Software Development Kit
  • Parallel Computing