About Intel Cilk™ Plus and How To Get Started

Intel® Cilk™ Plus is a new way to add SIMD (SSE) vectorization and parallelism to C/C++ programs. It is provided by the Intel Parallel Composer 2011 product and by Intel C++ Composer XE (previously called Intel C++ Compiler Professional Edition) for Windows and Linux*.

When adding parallelism to an application for better performance or scaling, do not forget hardware-level data parallelism: vectorization. An application usually benefits most when vectorization is combined with a multi-threading technology such as the Intel Cilk Plus keywords described below.

The steps for adding parallelism with vectorization are:
  1. Identify hotspot loops or functions in the application using Intel Parallel Advisor, Intel Parallel Amplifier, or Intel VTune™ Amplifier XE.
  2. Take advantage of hardware-level SIMD support using Intel Parallel Composer 2011 or Intel C++ Composer XE:
    1. Build your application at -O2 to enable auto-vectorization. This option is available for both Intel® and non-Intel microprocessors, but it may result in more optimizations for Intel microprocessors than for non-Intel microprocessors. You may use processor-specific options to take advantage of additional features of your processor, such as /QxSSE2,SSE3,SSE4.1 or /QaxSSE2,SSE3,SSE4.1 (-xSSE.. or -axSSE for Linux) or /arch:SSE2,SSE3,SSE4.1 (-msse2,sse3,sse4.1 for Linux). For more details see /en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations.
      • Use /Qvec-report[n] (-vec-report[n] for Linux) to check whether the loops are vectorized.
    2. Apply array notation or elemental functions to hotspot loops and functions where auto-vectorization did not work.
      • Use /Qvec-report[n] (-vec-report[n] for Linux) to check whether the loops are vectorized.
  3. Add the Intel Cilk Plus keywords to parallelize the application using Intel Parallel Composer 2011 or Intel C++ Composer XE.
  4. Test and debug the application using Intel Parallel Debugger Extension to Visual Studio or Intel Parallel Inspector 2011 or XE on Windows, and Intel Debugger on Linux*.
  5. Note: At the time of this writing, Intel Cilk Plus is supported by Intel C++ Composer XE and Intel Parallel Composer 2011. To keep the source code compilable by other compilers, the following methods can be used:
    • Include the following in the header file where the Intel Cilk Plus keywords are used:
      #ifdef __cilk 
          #include <cilk/cilk.h> 
      #else 
          /* Serial fallback for compilers without Intel Cilk Plus */ 
          #define cilk_spawn 
          #define cilk_sync 
          #define cilk_for for 
      #endif 
      
    • Use "#ifdef __cilk" where array notation or elemental functions are used:
      #ifdef __cilk
         c[:] = foo_calc(a[:], b[:]);
      #else
         for (int i = 0; i < ARRSIZE; i++)
              c[i] = foo_calc(a[i], b[i]);
      #endif
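
The keyword guard described above can be sketched as a small task-parallel function. This is only an illustration, assuming the compiler defines __cilk when Intel Cilk Plus is available; the function name fib is hypothetical and is not part of the original article. On other compilers the keywords expand to nothing and the code runs serially.

```cpp
#ifdef __cilk
  #include <cilk/cilk.h>
#else
  // Serial fallback: the keywords disappear and cilk_for becomes a plain for.
  #define cilk_spawn
  #define cilk_sync
  #define cilk_for for
#endif

// Hypothetical example: naive Fibonacci, chosen only to show spawn/sync.
long fib(int n)
{
    if (n < 2)
        return n;
    long x = cilk_spawn fib(n - 1);  // may run in parallel with the next call
    long y = fib(n - 2);
    cilk_sync;                       // wait for the spawned call before using x
    return x + y;
}
```

Because the fallback macros leave valid serial C++ behind, the same source file can be built with or without Intel Cilk Plus support.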
      
The rest of this article gives more information about Intel Cilk Plus. It consists of the following features:
  1. For adding SIMD vectorization to the program:
    1. Array Notation - a language extension for data parallelism on whole arrays or sections of arrays and the operations on them. It maps parallel constructs to the underlying SIMD hardware.
      • Single dimension: A[<lower bound>:<length>:<stride>]; <stride> is optional and defaults to 1.
        • A[:] - refers to the entire array
        • A[10:5] - refers to A[10], A[11], A[12], A[13], A[14]
        • A[1:4:2] - refers to A[1], A[3], A[5], A[7]
      • Multiple dimensions: A[<lower bound>:<length>:<stride>][<lower bound>:<length>:<stride>]
        • A[:][:] - refers to the entire 2-dimensional array
    2. Elemental Functions - data parallelism for whole functions or operations, which can then be used with array notation. The declaration syntax is: __declspec (vector [clauses]) return_type function_name (arguments)


    Example of array notation and elemental function used together:

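    /* Elemental function: written for one element, vectorized by the compiler */ 
    __declspec (vector) double foo_calc(double a, double b) 
    { 
        if (a > b) 
            return a; 
        else 
            return a+b; 
    } 
    void foo() 
    { 
        double a[ARRSIZE], b[ARRSIZE]; 
        double c[ARRSIZE]; 
        .......... 
        /* Apply foo_calc to every element of a and b */ 
        c[:] = foo_calc(a[:], b[:]); 
    } 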
    
  2. For adding data or task parallelism to the program:
    1. New C/C++ language keywords:
      • cilk_for
      • cilk_spawn
      • cilk_sync
    2. The following hyperobjects (reducers), to be used with the above keywords when creating multi-threaded code:
      • reducer_min<Type>
      • reducer_max<Type>
      • reducer_opadd<Type>
      • reducer_opand<Type>
      • reducer_opor<Type>
      • reducer_opxor<Type>
      • reducer_basic_string<Elem, Traits, Alloc>
      • reducer_list_append<Type, Allocator>
      • reducer_list_prepend<Type, Allocator>
      • class reducer_ostream
  3. Note: It is possible to create a customized reducer to suit specific needs. Use the provided templates and class as examples.
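
As a minimal sketch of how a reducer from the list above might be used with cilk_for, the following sums an array with reducer_opadd. The function name sum_array is hypothetical, and the serial branch is an assumption added here so the code still builds on compilers that do not define __cilk.

```cpp
#ifdef __cilk
  #include <cilk/cilk.h>
  #include <cilk/reducer_opadd.h>
#else
  #define cilk_for for   // serial fallback for compilers without Intel Cilk Plus
#endif

// Hypothetical example: sums n integers without a data race on the total.
long sum_array(const int* data, int n)
{
#ifdef __cilk
    cilk::reducer_opadd<long> total(0);  // each worker updates its own view
#else
    long total = 0;                      // plain accumulator in the serial build
#endif

    cilk_for (int i = 0; i < n; i++)
        total += data[i];

#ifdef __cilk
    return total.get_value();            // views are merged by addition
#else
    return total;
#endif
}
```

A plain shared variable updated inside cilk_for would be a data race; the reducer avoids this without explicit locking.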

Example of using an elemental function and cilk_for together:
__declspec (vector) double foo_calc(double a, double b) 
{ 
    if (a > b) 
        return a; 
    else 
        return a+b; 
} 
void bar() 
{ 
    double a[ARRSIZE], b[ARRSIZE]; 
    double c[ARRSIZE]; 
    .......... 
    /* Iterations of the loop may run in parallel on different workers */ 
    cilk_for (int i = 0; i < ARRSIZE; i++) 
        c[i] = foo_calc(a[i], b[i]); 
} 
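
One possible way to combine threading with vectorization is to let cilk_for distribute blocks of the array across workers and use an array section inside each block. The sketch below is an assumption-laden illustration, not a method from the original article: the names bar_blocked, BLOCK, and ELEMENTAL are hypothetical, the block size is arbitrary, and the serial branch exists only so the code builds when __cilk is not defined.

```cpp
#include <algorithm>

#ifdef __cilk
  #include <cilk/cilk.h>
  // Windows-style spelling from the article; Linux builds may need
  // __attribute__((vector)) instead.
  #define ELEMENTAL __declspec(vector)
#else
  #define cilk_for for
  #define ELEMENTAL
#endif

const int BLOCK = 256;  // hypothetical block size, not tuned

ELEMENTAL double foo_calc(double a, double b)
{
    return (a > b) ? a : a + b;
}

// Hypothetical example: blocks run in parallel, each block is vectorized.
void bar_blocked(const double* a, const double* b, double* c, int n)
{
    cilk_for (int start = 0; start < n; start += BLOCK) {
        int len = std::min(BLOCK, n - start);  // last block may be shorter
#ifdef __cilk
        // Section syntax: [lower bound : length]
        c[start:len] = foo_calc(a[start:len], b[start:len]);
#else
        for (int i = 0; i < len; i++)
            c[start + i] = foo_calc(a[start + i], b[start + i]);
#endif
    }
}
```

Whether blocking like this helps depends on the array size and the work per element, so it is worth checking the vectorization report and measuring before adopting it.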

Refer to the Intel Optimization Notice for more information regarding performance and optimization choices in Intel software products.

Notice revision #20110804