Intel® Parallel Studio XE

Superscalar programming 101 (Matrix Multiply) Part 5 of 5

In part 4 we saw the effects of the QuickThread Parallel Tag Team Transpose method of Matrix Multiplication performed on a Dual Xeon 5570 system with 2 sockets and two L3 caches, each shared by four cores (8 threads), and with each processor having four L2 and four L1 caches, each shared by one core and 2 threads. We find:

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Cilk Plus Software Development Kit
  • Parallel Computing

Superscalar programming 101 (Matrix Multiply) Part 3 of 5

    By Jim Dempsey

    In the previous article (part 2) we saw that by reorganizing the loops and using a temporary array we can observe a performance gain from SSE small vector optimizations (the compiler does this), but a larger gain came from better cache utilization due to the layout change and array access order. The improvements pushed us into a memory bandwidth limitation whereby the Serial method now outperforms the Parallel version of that same method.
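
    The layout change described in this teaser can be illustrated with a minimal sketch. It is not the article's code: the function names `matmul_naive` and `matmul_transposed`, the flat row-major storage, and the use of `double` are all assumptions made for illustration. The second function copies `b` into a temporary transposed array so that the inner loop reads both operands with unit stride, which is the kind of access order that tends to help both cache utilization and compiler vectorization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical baseline: the classic i-j-k loop over row-major n x n
 * matrices. The inner loop reads b with a stride of n elements. */
void matmul_naive(int n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical transposed variant: b is first copied into a temporary
 * array bt holding its transpose, so the inner loop walks a row of a and
 * a row of bt, both contiguous in memory.
 * Returns 0 on success, -1 if the temporary array cannot be allocated. */
int matmul_transposed(int n, const double *a, const double *b, double *c)
{
    assert(n > 0);
    double *bt = malloc((size_t)n * (size_t)n * sizeof *bt);
    if (bt == NULL)
        return -1;

    /* Build the transpose once: bt[j][i] = b[i][j]. */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            bt[j * n + i] = b[i * n + j];

    /* Each c[i][j] is now a dot product of two contiguous rows. */
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += a[i * n + k] * bt[j * n + k];
            c[i * n + j] = sum;
        }
    }

    free(bt);
    return 0;
}
```

    Both functions compute the same product; only the memory access pattern of the inner loop differs. Whether the transposed form is faster on a given machine, and by how much, depends on matrix size, cache sizes, and compiler settings, so it should be measured rather than assumed.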

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Cilk Plus Software Development Kit
  • Parallel Computing

Superscalar Programming 101 (Matrix Multiply) Part 1 of 5

    By Jim Dempsey

    The subject matter of this article is how to optimally tune a well-known algorithm. We will take this well-known (small) algorithm, a common approach to parallelizing it, a better approach to parallelizing it, and then produce a fully cache-sensitized approach to parallelizing it. The intention of this article is to teach you a methodology for interpreting the statistics gathered during test runs and then using those interpretations to improve your parallel code.

  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Server
  • Intermediate
  • Intel® C++ Compiler
  • Intel® Fortran Compiler
  • Intel® Parallel Composer
  • Intel® Parallel Studio
  • Intel® Parallel Studio XE
  • Intel® Cilk Plus Software Development Kit
  • Parallel Computing