OpenMP 4.0 may offer important solutions for targeting and vectorization

The upcoming OpenMP 4.0 will be discussed at SC12, and there are two additions I'm particularly excited to see coming from OpenMP: "SIMD extensions" and "targeting extensions."  The first helps a developer's intention to have code vectorized efficiently actually be realized. The second provides, for the first time, an industry-standard way to designate code and data to be targeted to an attached device. The specification for the "targeting extensions" is available now from OpenMP to encourage comment before full standardization. It is titled OpenMP Technical Report 1 on Directives for Attached Accelerators, and it will be discussed along with other future OpenMP features at the OpenMP BoF at SC12.

Both are worthy problems for the OpenMP standards body to solve by bringing together many vendors and users.  OpenMP has brought together representatives from across the industry with many points of view to produce standards that give developers a chance to write code that can span multiple architectures, while giving hardware vendors a chance to have their offerings well supported.

The "SIMD extensions" are more powerful and better specified than the vague but commonly supported "IVDEP" pragma. Many compilers support "IVDEP," but the definition of what it means, and the guarantees it gives, vary a lot. This is a perfect place for a standards body to provide a more consistent and guaranteed approach.  Most commonly, "IVDEP" tells a compiler that it can ignore assumed loop-carried dependencies, which may be just the trick needed for the compiler to vectorize a loop. This carries two flaws: it does not tell the compiler that it must vectorize, and it does not help vectorize code when assumed loop-carried dependencies are not the barrier. The new "SIMD extensions" in OpenMP 4.0 will address these flaws. They offer an industry-standard way to tell the compiler that it must vectorize a loop, and they allow some finer-grained control over how it is vectorized. "IVDEP" pragmas never had a rich set of clauses to give the kind of control the "SIMD extensions" will have. We can all learn more with the release of the "SIMD extensions" proposal at SC12.  I say "proposal" because OpenMP will be releasing for public comment the pieces that will probably become OpenMP 4.0. Discussions will occur at the OpenMP BoF at SC12 (details are at the end of this post).
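To make the idea concrete, here is a minimal sketch in C of what such a directive can look like. It uses the `#pragma omp simd` spelling and the `reduction` clause as they appear in the final OpenMP 4.0 specification, so the proposal discussed at SC12 may differ in detail. The function names `saxpy` and `dot` are hypothetical helpers chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: y[i] = a * x[i] + y[i].
 * The simd directive states that the iterations may be executed
 * together with SIMD instructions; the programmer, not the compiler,
 * is responsible for x and y not overlapping. */
void saxpy(size_t n, float a, const float *x, float *y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Hypothetical helper: dot product of x and y.
 * The reduction clause is an example of the finer-grained control a
 * clause can give: each SIMD lane keeps a partial sum, and the partial
 * sums are combined into sum after the loop. */
float dot(size_t n, const float *x, const float *y)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}
```

A compiler without OpenMP support (or with it switched off) simply ignores the pragmas and runs the loops as ordinary scalar code, so the functions stay portable either way. Note that the reduction may add the terms in a different order than the scalar loop, which can change floating-point rounding slightly.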

The "targeting directives" also fill a strong need: "offloading" code and data to an attached device. These have been called "accelerator directives" by PGI and others, "offload directives" by Intel, and now "targeting directives" by OpenMP. OpenMP took a general and inclusive approach in order to let code span a great variety of devices. This resembles the commitment to inclusiveness we've seen from groups such as Khronos with its OpenCL efforts. Over the years, OpenMP has consistently been inclusive with two objectives: (1) code can be written that shows off any given hardware to its potential, and (2) code is as portable as possible, so that it does not need rewriting for each piece of hardware.
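As a rough illustration of what offloading with a directive can look like, here is a minimal C sketch. It uses the `target` construct and `map` clauses as they appear in the final OpenMP 4.0 specification, so the syntax in Technical Report 1 may differ. The function name `scale_on_device` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: out[i] = a * in[i], computed on an attached
 * device when one is available.
 * map(to: ...)   copies the input array to the device before the region.
 * map(from: ...) copies the output array back to the host afterwards. */
void scale_on_device(size_t n, float a, const float *in, float *out)
{
    #pragma omp target map(to: in[0:n]) map(from: out[0:n])
    #pragma omp parallel for
    for (size_t i = 0; i < n; ++i)
        out[i] = a * in[i];
}
```

The same source is meant to work whether the device is a GPU, a coprocessor, or absent altogether: with no device available, or with OpenMP disabled, the loop runs on the host and gives the same result.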

The "targeting directives" intended for OpenMP 4.0 have the challenge of spanning NVIDIA GPUs (SIMT oriented), Intel Xeon Phi coprocessors (SMP-on-a-chip), Intel HD Graphics (vector oriented GPUs), other GPUs (like AMD's), and other attached processing devices, including future ones. It is not easy to bring multiple companies together to find common ground.  In the interim, NVIDIA has been happy with OpenACC, which was designed for their GPUs and has been supported for the past year, and Intel has had its offload directives for Intel Xeon Phi coprocessors for about two years now. Each has demonstrated capabilities that can inspire and inform an inclusive standard. Now OpenMP will share a specification that it believes does that, and ask for input and comment from users and implementers alike. No doubt, there is work remaining to be done - but the result is well worth the work. Neither OpenACC nor Intel's offload directives come close to the inclusiveness a standard requires to be useful.  OpenMP has done well to bring us closer to that goal. OpenMP’s open process has published their current draft for comment as 

OpenMP has a BoF at SC12 on Tuesday, Nov 13th, 5:30 - 7:00pm in Room 355-A (check the official SC12 site in case it changes).  I will be in another BoF sharing thoughts on directives for accelerators (same time - different room).  I'll be talking about "targeting directives" there as part of comments on what we should aspire to in order for developers to be able to truly code with confidence, performance and portability.  I look forward to the conversations and interactions we'll have at SC12.

OpenMP is fifteen years old, and according to their website that apparently means cake at 12:30pm and beer at 3:30pm on Tuesday and Wednesday at their booth at Supercomputing.  I hope I can drop by to sample at least some beer.

Intel has been part of OpenMP, along with other founders, for all fifteen years. I was one of a handful of people involved at Intel from the earliest days, even before formation. I’ve been proud to be part of supporting OpenMP over the years with Intel’s leadership compilers. OpenMP 4.0 should be no different. Users have told us how much they want better vectorization, and a standard for targeting devices well. We plan to move quickly to support OpenMP 4.0, as we did for OpenMP 3.0, OpenMP 2.0 and OpenMP 1.0.  It has been an effective and valuable standard for compiler writers and compiler users alike.

For more complete information about compiler optimizations, see our Optimization Notice.