Using Profile-Guided Optimization (PGO) with OpenMP* may substantially increase the time needed to generate the profile (.dyn file). This is a known issue that is being addressed. Using PGO in conjunction with OpenMP is not recommended.
As part of my focus on software performance, I also support and consult on implementing scalable parallelism in applications. There are many reasons to implement parallelism, and many methods for doing it, but this blog is not about either of those things. It is about the performance advantages of one particular way of implementing parallelism and, luckily, that way is supported by several of the available programming models.
Program analysis tools can be valuable for debugging correctness and performance issues, even more so in multi-threaded programs. Some of these tools need to know about certain events in the program. For example, race detection for Intel® Cilk™ Plus programs requires knowing precisely when spawn and sync events happen. Similar events are needed to analyze Intel® TBB and OpenMP programs.
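To make the idea of spawn and sync events concrete, here is a minimal sketch in standard C++. It does not use Intel® Cilk™ Plus, Intel® TBB, or OpenMP: it stands in `std::thread` creation and `join()` for spawn and sync, and every name in it (`EventLog`, `TracedTask`, `run_demo`) is hypothetical, invented only for illustration. It shows one way a program could report fork and join points to an analysis tool, not how any actual tool is implemented.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical event kinds an analysis tool might want to observe.
enum class EventKind { Spawn, Sync };

struct Event {
    EventKind kind;
    int task_id;  // id of the child task being spawned or joined
};

// Hypothetical thread-safe log standing in for the tool's event sink.
class EventLog {
public:
    void record(EventKind kind, int task_id) {
        std::lock_guard<std::mutex> lock(mutex_);
        events_.push_back({kind, task_id});
    }
    std::vector<Event> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return events_;
    }
private:
    mutable std::mutex mutex_;
    std::vector<Event> events_;
};

// Hypothetical wrapper: reports a Spawn event before the child starts
// and a Sync event after the parent has joined it.
class TracedTask {
public:
    TracedTask(EventLog& log, int id, std::function<void()> body)
        : log_(log), id_(id) {
        log_.record(EventKind::Spawn, id_);
        thread_ = std::thread(std::move(body));
    }
    void sync() {
        assert(thread_.joinable());  // each task must be synced exactly once
        thread_.join();
        log_.record(EventKind::Sync, id_);
    }
private:
    EventLog& log_;
    int id_;
    std::thread thread_;
};

// Spawns two children that write to distinct variables, then syncs both.
int run_demo(EventLog& log) {
    int left = 0;
    int right = 0;
    TracedTask a(log, 1, [&left] { left = 20; });
    TracedTask b(log, 2, [&right] { right = 22; });
    a.sync();
    b.sync();
    return left + right;  // safe to read: both children have been joined
}
```

Calling `run_demo` with a fresh `EventLog` leaves four events in the log: the two spawns followed by the two syncs, all recorded by the parent thread. A race detector could use such a sequence to decide which memory accesses are logically parallel (between a task's spawn and its sync) and which are ordered.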