Feedback on use of the Open source runtime

Folks,

SC13 is approaching and we'll be talking about OpenMP in various places, such as the OpenMP BoF. I'd therefore be very interested to receive feedback from people who have downloaded the runtime and are using it.

  • What have you done with it?
      • Integrated with your tools? Which?
  • Have you had success?
  • Have you made changes?
      • If so, will you be submitting them back for integration into the mainstream code, or are you happy to keep paying the cost of re-integrating into each new release?

The more users we have, the better (and the easier it is to justify the effort to our management :-)).

Please reply here, or, if you'd rather not say in public but are happy for me to know for use inside Intel, contact me directly at james.h.cownie@intel.com
Thanks

p.s. I'll be at SC all week, and am happy to meet any or all of you more informally (i.e. over a beer!), so by all means contact me as above if that appeals.

Hi James!

I have not done much yet except for playing with undocumented environment variables, but there is a group here that will be adding instrumentation to automatically accumulate information about load imbalances.   This is especially convenient on Xeon Phi, where the overhead of the RDTSC instruction is so low (~5 cycles).  (IMHO this should have been a feature of every implementation of OpenMP since day 1.)
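As a rough illustration of the kind of load-imbalance instrumentation described above (not the group's actual design), the sketch below records how long each thread spends in a statically scheduled loop and reports the ratio of the slowest thread's time to the mean. Everything named here (`now_seconds`, `imbalance_factor`, `skewed_work`, `measure_loop`, `MAX_THREADS`) is hypothetical, and a portable `clock_gettime` call stands in for RDTSC so that the code builds on any POSIX system. Without OpenMP enabled the pragmas are ignored and the loop runs on one thread.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 512   /* hypothetical upper bound for this sketch */

/* Monotonic wall-clock time in seconds (portable stand-in for RDTSC). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Slowest thread's time divided by the mean; 1.0 means perfectly balanced. */
static double imbalance_factor(const double *busy, int nthreads)
{
    assert(nthreads > 0);
    double max = busy[0], sum = 0.0;
    for (int i = 0; i < nthreads; i++) {
        if (busy[i] > max) max = busy[i];
        sum += busy[i];
    }
    double mean = sum / nthreads;
    return mean > 0.0 ? max / mean : 1.0;
}

/* Deliberately uneven work: iteration i costs about 100*i additions. */
static double skewed_work(int i)
{
    volatile double x = 0.0;
    for (int k = 0; k < i * 100; k++) x += 1.0;
    return x;
}

/* Runs the skewed loop, stores each thread's elapsed time in busy[]
 * (which must hold MAX_THREADS entries), and returns the team size. */
static int measure_loop(int n, double *busy, double *total)
{
    int used = 1;
    double sum = 0.0;
    for (int t = 0; t < MAX_THREADS; t++) busy[t] = 0.0;

    #pragma omp parallel reduction(+:sum)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
        #pragma omp single
        used = omp_get_num_threads();
#endif
        double t0 = now_seconds();
        /* nowait: each thread stops its clock when its own share is done */
        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++) sum += skewed_work(i);
        if (tid < MAX_THREADS) busy[tid] = now_seconds() - t0;
    }
    *total = sum;
    return used;
}
```

Calling `measure_loop` and then `imbalance_factor(busy, used)` gives one number per loop; with a static schedule and this triangular workload the last thread does the most work, so the factor should come out noticeably above 1.0 on a multi-threaded run.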

Using one of these undocumented environment variables, I have already gotten a ~33% reduction in the overhead of the OpenMP Reduction operator from the EPCC OpenMP benchmarks in C (version 2) when running 240 OpenMP threads on a Xeon Phi SE10P.   The specific change was to set KMP_FORCE_REDUCTIONS to "atomic", while it appears to default to "tree".  
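For anyone who wants to try a comparison like this without the full EPCC suite, here is a minimal sketch of a reduction-overhead timer. It is not the EPCC code, only a loop in the same spirit: it repeatedly enters a parallel region whose sole work is a `+` reduction and reports the average time per region. The names `now_seconds` and `time_reductions` are hypothetical. The reduction method is chosen by the environment variable set before launch, not by anything in the program; the variable name and value are as given in the post, and the exact spelling should be checked against the runtime source.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <time.h>

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Executes `reps` parallel regions, each containing only a sum reduction.
 * Returns the mean seconds per region; *result accumulates the reduced
 * values so the compiler cannot discard the work. */
static double time_reductions(int reps, long *result)
{
    assert(reps > 0);
    long total = 0;
    double t0 = now_seconds();
    for (int r = 0; r < reps; r++) {
        long partial = 0;
        /* Each thread contributes 1, so partial ends up equal to the
         * team size (or 1 when OpenMP is not enabled). */
        #pragma omp parallel reduction(+:partial)
        {
            partial += 1;
        }
        total += partial;
    }
    double elapsed = now_seconds() - t0;
    *result = total;
    return elapsed / reps;
}
```

Running the same binary once with the default settings and once with the variable set to "atomic" gives two per-region times to compare. The measured time includes the fork/join cost of the parallel region, so subtracting the time of an empty parallel region (as the EPCC benchmarks do with a reference measurement) isolates the reduction itself.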

I expect to be posting questions about the interpretation of the components of KMP_REDUCTION_BARRIER and KMP_REDUCTION_BARRIER_PATTERN, but if I can dig up the manpower, I hope to supervise the construction of a replacement implementation using memory addresses that map to DTDs that are "close to" the cores involved in each stage of the tree.
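To make the "stages of the tree" concrete, here is a serial sketch of a pairwise tree combine over cache-line-padded slots, one slot per thread. It is an assumption-laden illustration, not the runtime's implementation: `padded_slot`, `tree_reduce` and the 64-byte `CACHE_LINE` are hypothetical, the pairs within each stage would be handled by different threads with synchronization between stages in a real runtime, and the placement of slots relative to the DTDs (the point of the proposed work) is not modelled at all.

```c
#include <assert.h>

#define CACHE_LINE 64   /* assumed cache-line size in bytes */

/* One partial result per thread, padded so slots never share a line. */
typedef struct {
    double value;
    char pad[CACHE_LINE - sizeof(double)];
} padded_slot;

/* Combines n partial results in about log2(n) stages. In stage s
 * (stride = 2^s), slot i absorbs slot i + stride for every i that is a
 * multiple of 2 * stride. The final result is left in slots[0]. */
static double tree_reduce(padded_slot *slots, int n)
{
    assert(n > 0);
    for (int stride = 1; stride < n; stride *= 2) {
        for (int i = 0; i + stride < n; i += 2 * stride) {
            slots[i].value += slots[i + stride].value;
        }
    }
    return slots[0].value;
}
```

With 240 slots this takes 8 stages. Which physical addresses back the slots touched in each stage is exactly the knob the replacement implementation would tune.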

John D. McCalpin, PhD "Dr. Bandwidth"

Hello James,

In a collaboration between TU Dresden and RWTH Aachen University, we plan to extend the Intel OpenMP Runtime to collect information for correctness-checking and performance-analysis tools, with a particular focus on OpenMP 4.0. For instance, we plan to extend the MPI correctness checking tool MUST (http://www.vi-hps.org/tools/must.html) to cover hybrid and OpenMP applications. Furthermore, we hope to benefit from the work done by the group around John Mellor-Crummey on the OMPT interface (http://code.google.com/p/intel-openmp-rtl).

 

Best regards,

Tim Cramer

RWTH Aachen University

 
