PLDI Tutorial: Using the Intel(R) C++ Compiler for General Purpose Computation Offload to Intel(R) Processor Graphics

By Anoop Madhusoodhanan Prabha, Published: 02/27/2015, Last Updated: 02/27/2015

When: Saturday, June 13th, 2015 (9am to 12:30pm)
Where: At PLDI in Portland, OR, USA


Since the 2nd Generation Intel® Core processors, most Intel processors include on-chip Intel® HD Graphics, which provides high-performance graphics without the added cost and power of a discrete graphics add-in card. When an application does not rely on or fully utilize display and graphics, as in many embedded applications, the GPU can be used to offload parallel computations, taking advantage of both thread parallelism and vector instructions for data parallelism. A workload can run entirely on the GPU or be partitioned between the CPU and the GPU. One advantage of on-chip processor graphics is that the CPU and GPU share physical memory, so data can be shared between them without copy overhead.

The C/C++ Cilk Plus parallel programming model, with small extensions for offload, is used to take advantage of the computational capabilities of the GPU. The C++ compiler handles all aspects of both the host and target side compilation including setting up data to be shared and kernel parameters. This provides an easy heterogeneous programming model that is similar between host and target and lets the programmer focus on the algorithm and performance.

The tutorial will consist of a presentation on the programming model and demonstrations of selected examples. It will give insight into parallel programming using Cilk™ Plus, offload to processor graphics, and tuning and debugging. Specific examples will show how to port an application to take advantage of offload.

Topics Covered

  • Motivation for using Intel(R) Processor Graphics for General Purpose Computing
  • Performance gain using CPU and GPU cores
  • Intel(R) Cilk(TM) Plus Programming Model
    • cilk_for keyword
    • Explicit Vectorization Tools
      • SIMD-enabled function
      • Array Notation
      • #pragma simd
  • Offload Support
    • Synchronous Offload
    • Asynchronous Offload
    • OpenMP Offload Support
  • Debugger Support
  • Intel(R) Processor Graphics Architecture
  • Memory Model of Processor Graphics
  • Code Generation options
    • Virtual Instruction Set Architecture (ISA) - Jitter approach
    • Native ISA
  • New features from 5th Generation Intel(R) Core(TM) Processor
    • Shared Virtual Memory
    • Shared Local Memory
  • Hardware Platform Support
  • Operating System Support
  • Limitations of Offload Model
  • Customer Feedback
  • Performance demonstration using sample applications


Knud J. Kirkegaard is a Principal Engineer in Intel’s Mobile Computing and Compilers group. He currently works as an architect on the C/C++ compiler with Cilk™ Plus, supporting heterogeneous computing on Intel® Graphics Technology. Since joining Intel, he has worked on scalar optimizations, interprocedural optimizations, profile-guided optimizations, and Cilk™ Plus. His current interests are in parallel computing, heterogeneous computing, optimized C++ code, and compiler architecture. He has an M.S. degree in Information and Control Systems Engineering from Aalborg University, Denmark. His e-mail is

Anoop Madhusoodhanan Prabha is a Software Engineer in Intel's Software and Services Group. He currently works as a Technical Consulting Engineer on the C/C++ compiler support team. He joined Intel on August 1, 2009, and has since worked on optimizing various customer applications by enabling multi-threading and vectorization. He has experience working with OpenMP, Cilk™ Plus, TBB, and CUDA, among others. His current interests are in processor and GPU architecture, heterogeneous computing, and high-performance computing. He has an M.S. degree in Electrical Engineering from the State University of New York at Buffalo, US. His e-mail is


Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804