Intel® Parallel Studio XE

The Chronicles of Phi - part 5 - Plesiochronous phasing barrier – tiled_HT3

For the next optimization, I knew what I wanted to do; I just didn’t know what to call it. In looking for words that describe loosely synchronous behavior, I came across plesiochronous:

In telecommunications, a plesiochronous system is one where different parts of the system are almost, but not quite, perfectly synchronized.

The Chronicles of Phi - part 4 - Hyper-Thread Phalanx – tiled_HT2

The prior part (3) of this blog showed the effects of the first-level implementation of the Hyper-Thread Phalanx. The change in programming yielded a 9.7% improvement in performance for the small model, and little to no improvement in the large model. This left part 3 of this blog with the questions:

What is non-optimal about this strategy?
And: What can be improved?

There are two things: one is obvious, and the other is not so obvious.

Data alignment

The Chronicles of Phi - part 3 - Hyper-Thread Phalanx – tiled_HT1 continued

The prior part (2) of this blog provided a header and a set of functions that can be used to determine the logical core and the logical Hyper-Thread number within the core. This determination is to be used in an optimization strategy called the Hyper-Thread Phalanx.

Intel® Advisor 2015 Beta Tutorials: Windows* OS

Discover how to find where to add parallelism to a serial application using the Intel® Advisor and the nqueens_Advisor C++ sample application.

This short tutorial demonstrates an end-to-end workflow you can ultimately apply to your own applications:

  1. Survey the target to locate the loops and functions where the target spends the most time.

  • Developers
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • C/C++
  • Intel® Parallel Studio XE
  • Intel® Advisor
  • Intel® Parallel Studio XE Cluster Edition

Intel® Advisor 2015 Beta Tutorials: Linux* OS

Discover how to find where to add parallelism to a serial application using the Intel® Advisor and the nqueens_Advisor C++ sample application.

This short tutorial demonstrates an end-to-end workflow you can ultimately apply to your own applications:

  1. Survey the target to locate the loops and functions where the target spends the most time.

  • Developers
  • Linux*
  • C/C++
  • Intel® Parallel Studio XE
  • Intel® Advisor
  • Intel® Parallel Studio XE Cluster Edition

Diagnostic 15002: loop was vectorized (Fortran)

Cause:

For the Intel® Compiler, vectorization is the unrolling of a loop combined with the generation of packed SIMD instructions. Because the packed instructions operate on more than one data element at a time, the loop can execute more efficiently. The above message indicates that the loop was successfully vectorized using packed SIMD instructions.
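To make the idea concrete, here is a minimal sketch in C++ (not Fortran, and not taken from the original diagnostic article) of the kind of loop a vectorizing compiler can usually handle: unit-stride accesses and no dependency between iterations. The function name `scale_and_add` is hypothetical, chosen only for illustration. Whether a given compiler actually vectorizes this loop depends on the compiler, its options, and the target, so its optimization report is the place to confirm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration: computes y[i] = a * x[i] + y[i].
// Every iteration is independent of the others and the memory accesses are
// contiguous, so a vectorizing compiler can typically process several
// elements per packed SIMD instruction instead of one element at a time.
void scale_and_add(float a, const std::vector<float>& x, std::vector<float>& y)
{
    // Both arrays must have the same length for the loop below to be valid.
    assert(x.size() == y.size());

    const float* xp = x.data();
    float* yp = y.data();
    const std::size_t n = y.size();

    // Simple counted loop with a known trip count and unit stride.
    for (std::size_t i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }
}
```

The loop is written as a plain counted loop over raw pointers so that the trip count and the access pattern are easy for a compiler to analyze. A loop in which iteration `i` read a value written by iteration `i - 1` would be a much poorer candidate.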

Example:

  • Apple OS X*
  • Linux*
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • Fortran
  • Intel® Fortran Compiler
  • Intel® Parallel Studio XE
  • Intel® Parallel Studio XE Composer Edition
  • Intel® Parallel Studio XE Professional Edition
  • Diagnostics
  • Optimization
  • Vectorization