About Intel Cilk™ Plus and How To Get Started

This article has been updated for Intel C++ Composer XE 2013 and 2013 SP1 for Windows, Linux* and Mac OS* X.

Intel® Cilk™ Plus is a new method for implementing SIMD vectorization (for example, with SSE instructions) and parallel programs. It is supported by the following Intel software development products:

  • Intel Parallel Composer 2011
  • Intel C++ Composer XE for Windows and Linux*
  • Intel C++ Composer XE 2011 for Windows, Linux
  • Intel C++ Composer XE 2011 for Mac OS* X update 6 or above
  • Intel C++ Composer XE 2013 for Windows, Linux and Mac OS X
  • Intel C++ Composer XE 2013 SP1 for Windows, Linux and Mac OS X

When adding parallelism to an application for better performance or scaling, do not forget hardware-level data parallelism: vectorization. An application usually benefits most when vectorization is combined with a multi-threading technology such as the Intel Cilk Plus keywords described below.

The steps for adding parallelism with vectorization are:

  1. Identify hotspot loops or functions in the application using Intel Parallel Advisor, Intel Parallel Amplifier, or Intel VTune™ Amplifier XE. For Mac OS X, you can use Apple's performance analyzer Shark.
  2. Take advantage of the hardware-level SIMD support using the Intel C++ compiler that comes with the Intel software products listed above.
    1. Build your application at -O2 to enable auto-vectorization. This option is available for both Intel® and non-Intel microprocessors, but it may result in more optimizations for Intel microprocessors than for non-Intel microprocessors. You may use processor-specific options to take advantage of additional features of your processor, such as /QxSSE2,SSE3,SSE4.1 or /QaxSSE2,SSE3,SSE4.1 (-xSSE.. or -axSSE for Linux) or /arch:SSE2,SSE3,SSE4.1 (-msse2,sse3,sse4.1 for Linux). For more details see /en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations.
      • Use /Qvec-report[n] (-vec-report[n] for Linux) to check whether the loops are vectorized.
    2. Apply array notation or SIMD-enabled functions to hotspot loops and functions where auto-vectorization did not work.
      • Use /Qvec-report[n] (-vec-report[n] for Linux) to check whether the loops are vectorized.
  3. Add the Intel Cilk Plus keywords to parallelize the application using the Intel C++ compiler.
  4. Test and debug the application with the available tools: Intel Parallel Inspector or Intel Inspector XE on Windows, and Intel Debugger on Linux* and Mac OS* X.
Note: Intel Cilk Plus is supported by the Intel C++ compiler. To keep the source code compilable by other compilers, the following methods can be used:
  • Include the following in the header file where the Intel Cilk Plus keywords are used:
    #ifdef __cilk
        #include <cilk/cilk.h>
    #else
        /* Serial fallback: the keywords reduce to plain C/C++. */
        #define cilk_spawn
        #define cilk_sync
        #define cilk_for for
    #endif
  • use "#ifdef __cilk" where the array notation or SIMD-enabled functions are used:
    #ifdef __cilk
       c[:] = foo_calc(a[:], b[:]);
    #else
       for (int i = 0; i < ARRSIZE; i++)
            c[i] = foo_calc(a[i], b[i]);
    #endif
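
As a minimal sketch of the two methods above, the following file compiles both with and without Intel Cilk Plus support. The function names `compute` and `compute_parallel` are hypothetical and introduced only for illustration, and the `<cilk/cilk.h>` header path assumes a standard Intel Cilk Plus installation.

```cpp
#include <cassert>

#ifdef __cilk
    #include <cilk/cilk.h>   // assumed header path for Intel Cilk Plus
#else
    // Serial fallback: the keywords reduce to plain C++.
    #define cilk_spawn
    #define cilk_sync
    #define cilk_for for
#endif

// Element-wise operation, mirroring foo_calc from this article.
static double foo_calc(double a, double b)
{
    return (a > b) ? a : a + b;
}

// Hypothetical wrapper: array notation when available, a plain loop otherwise.
void compute(const double* a, const double* b, double* c, int n)
{
    assert(n >= 0);
#ifdef __cilk
    c[0:n] = foo_calc(a[0:n], b[0:n]);
#else
    for (int i = 0; i < n; i++)
        c[i] = foo_calc(a[i], b[i]);
#endif
}

// Hypothetical wrapper: cilk_for becomes an ordinary for loop in the fallback.
void compute_parallel(const double* a, const double* b, double* c, int n)
{
    assert(n >= 0);
    cilk_for (int i = 0; i < n; i++)
        c[i] = foo_calc(a[i], b[i]);
}
```

With a compiler that does not define `__cilk`, both functions run serially and give the same results, so the fallback can also serve as a correctness check for the parallel build.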

The following provides more information on using Intel Cilk Plus in your application:

  1. For adding SIMD vectorization to the program:
    1. Array Notation - a language extension for data parallelism on whole arrays or sections of arrays and the operations on them. It maps parallel constructs to the underlying SIMD hardware.
      • single dimension: A[<lower bound>:<length>:<stride>]; <stride> is optional and has a default value of 1.
        • A[:] - refers to the entire array
        • A[10:5] - refers to A[10], A[11], A[12], A[13], A[14]
        • A[1:4:2] - refers to A[1], A[3], A[5], A[7]
      • multiple dimensions: A[<lower bound>:<length>:<stride>][<lower bound>:<length>:<stride>]
        • A[:][:] - refers to the entire two-dimensional array
    2. SIMD-enabled Functions - data parallelism for whole functions or operations, which can then be used with array notation. Syntax: __declspec (vector [clauses]) return_type function_name (arguments)


    Example of array notation and SIMD-enabled function used together:

    __declspec (vector) double foo_calc(double a, double b) 
    { 
        if (a > b) 
            return a; 
        else 
           return a+b; 
    } 
    void foo() 
    { 
        double a[ARRSIZE], b[ARRSIZE];   /* element type matches foo_calc */
        double c[ARRSIZE];
        .......... 
        c[:] = foo_calc(a[:], b[:]);
    } 
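
To make the section syntax <lower bound>:<length>:<stride> concrete, the sketch below expands a section into the element indices it refers to, using standard C++ only. The helpers `section_indices` and `scale_section` are hypothetical and are not part of Intel Cilk Plus; they only model the serial meaning of a section.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Intel Cilk Plus): lists the element
// indices that the section [lower:length:stride] refers to.
std::vector<int> section_indices(int lower, int length, int stride = 1)
{
    assert(length >= 0);
    std::vector<int> idx;
    idx.reserve(length);
    for (int k = 0; k < length; ++k)
        idx.push_back(lower + k * stride);
    return idx;
}

// Serial model of:  dst[lower:length:stride] = src[lower:length:stride] * factor;
void scale_section(const double* src, double* dst,
                   int lower, int length, int stride, double factor)
{
    const std::vector<int> idx = section_indices(lower, length, stride);
    for (std::vector<int>::size_type k = 0; k < idx.size(); ++k)
        dst[idx[k]] = src[idx[k]] * factor;
}
```

For example, `section_indices(1, 4, 2)` yields the indices 1, 3, 5, 7, matching A[1:4:2] above. Note that the second field is a length, not an upper bound.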
  2. For adding data or task parallelism to the program:
    1. New C/C++ language keywords:
      • cilk_for
      • cilk_spawn
      • cilk_sync
    2. The following hyperobjects (reducers) can be used with the above keywords when creating multi-threaded code:
      • reducer_min<Type>
      • reducer_min_index<Type>
      • reducer_max<Type>
      • reducer_max_index<Type>
      • reducer_opadd<Type>
      • reducer_opand<Type>
      • reducer_opor<Type>
      • reducer_opxor<Type>
      • reducer_basic_string<Elem, Traits, Alloc>
      • reducer_string
      • reducer_wstring
      • reducer_list_append<Type, Allocator>
      • reducer_list_prepend<Type, Allocator>
      • class reducer_ostream

Note: It is possible to create a customized reducer to suit specific needs. Use the provided templates and class as examples.
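
As a sketch of how a reducer from the list above might be combined with cilk_for, the following sums an array with reducer_opadd when Intel Cilk Plus is available and falls back to a plain serial loop otherwise. The function name `sum_array` is hypothetical, and the header paths assume a standard Intel Cilk Plus installation.

```cpp
#include <cassert>

#ifdef __cilk
    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>   // assumed header path for reducer_opadd
#else
    #define cilk_for for               // serial fallback
#endif

// Hypothetical example: returns the sum of data[0..n).
long sum_array(const int* data, int n)
{
    assert(n >= 0);
#ifdef __cilk
    // Each worker accumulates into its own view; the views are combined
    // by the runtime, so no lock is needed around the += below.
    cilk::reducer_opadd<long> total(0);
    cilk_for (int i = 0; i < n; ++i)
        total += data[i];
    return total.get_value();
#else
    long total = 0;
    cilk_for (int i = 0; i < n; ++i)
        total += data[i];
    return total;
#endif
}
```

A plain shared `long` updated inside a parallel cilk_for would be a data race; the reducer is what makes the parallel version safe without explicit locking.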

Example of using a SIMD-enabled function and cilk_for together:

__declspec (vector) double foo_calc(double a, double b) 
{ 
    if (a > b) 
        return a; 
    else 
        return a+b; 
} 
void bar() 
{ 
    double a[ARRSIZE], b[ARRSIZE];   /* element type matches foo_calc */
    double c[ARRSIZE];
    .......... 
    cilk_for (int i = 0; i < ARRSIZE; i++) 
        c[i] = foo_calc(a[i], b[i]);
} 
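
The cilk_spawn and cilk_sync keywords express task parallelism. A minimal sketch, assuming the familiar recursive Fibonacci as the workload (the function `fib` is illustrative and not taken from this article), with the same serial fallback used for other compilers:

```cpp
#include <cassert>

#ifdef __cilk
    #include <cilk/cilk.h>   // assumed header path for Intel Cilk Plus
#else
    // Serial fallback: the keywords expand to nothing.
    #define cilk_spawn
    #define cilk_sync
#endif

// Illustrative task-parallel function: returns the n-th Fibonacci number.
long fib(int n)
{
    assert(n >= 0);
    if (n < 2)
        return n;
    long x = cilk_spawn fib(n - 1);  // may run in parallel with the call below
    long y = fib(n - 2);
    cilk_sync;                       // wait for the spawned call before reading x
    return x + y;
}
```

Only one of the two recursive calls is spawned; the caller computes the other itself, which avoids creating a task whose parent would only wait for it.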
 
For more information about compiler optimizations, see the Optimization Notice.