Intel® Parallel Studio XE 2015 Update 2 Cluster Edition Readme

The Intel® Parallel Studio XE 2015 Update 2 Cluster Edition for Linux* and Windows* combines all Intel® Parallel Studio XE and Intel® Cluster Tools into a single package. This multi-component software toolkit contains the core libraries and tools to efficiently develop, optimize, run, and distribute parallel applications for clusters with Intel processors. The package is for cluster users who develop on and build for IA-32 and Intel® 64 architectures on Linux* and Windows*, as well as customers running on the Intel® Xeon Phi™ coprocessor on Linux*.

    Using L1/L2 cache as a scratchpad memory

    Dear all,

    Explicit cache control is an important feature on Xeon Phi (MIC). How can I use the L1 or L2 cache as scratchpad memory and also share its data between the cores?

    In addition, is there any way to manipulate the MESI state of a cache line in the distributed tag directory (DTD)?

    Thanks in advance.


    Performance comparison between Intel TBB task_list, openMP task and parallel for

    I am planning to parallelize a hotspot in a project, and I would like your opinion on how three approaches compare in performance: an OpenMP parallel for, an omp single followed by tasks, and an Intel TBB task_list. I am interested in two cases: the ideal one, where the number of threads equals the number of computation items, and the one where the number of computation items is much greater than the number of available threads, so that the scheduling overhead becomes visible and the most efficient scheduler can be identified. I will also write some sample test programs to evaluate this myself, but I wanted to know whether anybody has already made these comparisons.

    Thanks in advance.
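
The comparison described in this post can be sketched for the two OpenMP variants as follows. This is a minimal sketch under stated assumptions, not a definitive benchmark: `work()`, `N_ITEMS`, `now_sec()`, and `benchmark()` are hypothetical names chosen for illustration, the per-item workload is a stand-in, and the TBB `task_list` variant is left out because it needs C++. Build it with the compiler's OpenMP option; without it the pragmas are ignored and both variants run serially.

```c
#include <stdio.h>
#include <time.h>

#define N_ITEMS 100000   /* hypothetical item count; vary it relative to the thread count */

static double out_for[N_ITEMS];
static double out_task[N_ITEMS];

/* Hypothetical stand-in for one computation item. */
static double work(int i) {
    double x = (double)i;
    for (int k = 0; k < 100; k++)
        x = x * 0.999 + 1.0;
    return x;
}

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_sec(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Variant 1: worksharing loop. */
static void run_parallel_for(void) {
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < N_ITEMS; i++)
        out_for[i] = work(i);
}

/* Variant 2: one thread creates a task per item; the team executes them. */
static void run_tasks(void) {
    #pragma omp parallel
    {
        #pragma omp single
        {
            for (int i = 0; i < N_ITEMS; i++) {
                #pragma omp task firstprivate(i)
                out_task[i] = work(i);
            }
        }
    }
}

/* Times both variants, prints the timings, and returns the number of
   items on which the two variants disagree (0 means they match). */
int benchmark(void) {
    double t0 = now_sec();
    run_parallel_for();
    double t1 = now_sec();
    run_tasks();
    double t2 = now_sec();

    printf("parallel for: %.6f s\n", t1 - t0);
    printf("single+task : %.6f s\n", t2 - t1);

    int mismatches = 0;
    for (int i = 0; i < N_ITEMS; i++)
        if (out_for[i] != out_task[i])
            mismatches++;
    return mismatches;
}
```

Calling `benchmark()` from `main` and repeating the run several times (discarding the first, which includes thread-team startup) gives a rough picture of scheduling overhead. Shrinking the inner loop in `work()` makes the overhead dominate; growing it hides the overhead.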

    Further information about different barrier algorithms


    I'm researching barrier algorithms that use SIMD instructions, and I'm trying to understand in depth the different versions included in the RTL.

    I've noticed that a new barrier algorithm (hierarchical) has been added since I last looked.

    Where can I find a more detailed description of them? Could someone from Intel provide me with further information?


    Thank you in advance.

    Kind regards.

    an interesting and serious topic

    Hello there:

    I have found an interesting behavior which I cannot explain. OK, here goes.

    I use "micsmc" to monitor the running state of the offload on the MIC. The critical code looks like this:

    #pragma offload target(mic:0) inout(XXXX) in (XXXX)
    #pragma omp parallel for schedule (dynamic)
    for (int i = 0; i < num_cluster; i++) // num_cluster ranges from 60 to 300, mostly 90~150
      do something....

    I then set the environment variables:

    export OMP_NUM_THREADS=X
    export KMP_AFFINITY=compact
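
One thing that can be checked independently of micsmc is how many threads actually receive iterations of such a loop. With `schedule(dynamic)` the default chunk size is 1, so at most `num_cluster` threads can be busy, whatever `OMP_NUM_THREADS` is set to. The following is a minimal host-side sketch, assuming no offload; `count_busy_threads`, `thread_id`, and `MAX_THREADS` are hypothetical names introduced for illustration and do not come from the post.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 256   /* hypothetical upper bound on the team size */

/* Thread number inside a parallel region; 0 when built without OpenMP. */
static int thread_id(void) {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Runs a dynamically scheduled loop of num_cluster iterations, records how
   many iterations each thread executed in per_thread[], and returns the
   number of threads that executed at least one iteration. */
int count_busy_threads(int num_cluster, int per_thread[MAX_THREADS]) {
    for (int t = 0; t < MAX_THREADS; t++)
        per_thread[t] = 0;

    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < num_cluster; i++) {
        int t = thread_id();
        /* Each thread writes only its own slot, so no atomic is needed.
           Threads numbered MAX_THREADS or higher are not counted. */
        if (t < MAX_THREADS)
            per_thread[t]++;
    }

    int busy = 0;
    for (int t = 0; t < MAX_THREADS; t++)
        if (per_thread[t] > 0)
            busy++;
    return busy;
}
```

Running this with different `OMP_NUM_THREADS` values and printing `per_thread[]` shows how unevenly a loop of 60 to 300 iterations can spread over a large thread team.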

    can't start because libiomp5md.dll is missing from your computer

    Hi all:

    I ran into a problem using XE 2015 with Visual Studio 2010 on 64-bit Windows 7.

    The program compiles successfully, but when I run it in a CMD command window, I get an error saying that the program can't start because libiomp5md.dll is missing from my computer.

    Can anyone tell me how to fix this? I am new to this, so please let me know the detailed procedure for solving the problem. Thanks in advance!

    Introduction to OpenMP* on YouTube

    Tim Mattson (Intel) has authored an extensive series of excellent videos as an introduction to OpenMP*. Not only does he walk through a series of programming exercises in C, he also starts with a background introduction to parallel programming.

    Check out the series:
