OpenMP very slow when run outside of Visual Studio

Since we are using the Intel MKL library, we have to load Intel's OpenMP library (libiomp5md.dll) at run time and exclude vcomp.lib at link time, but we still have to compile and link with VC++. When I run my 64-bit release build directly, part of my code does not fully utilize the cores I specified and runs very slowly. It appears to use multiple cores but may be even slower than a single core. If I attach the Visual Studio debugger to the same release build, without doing anything else, it fully utilizes the cores I specified. Does anybody have any ideas?

We are using Visual Studio 2010 on Windows 7 Professional. libiomp5md.dll shows a file version of 5.0.2012.803.

Hi Isaac Liu,

Does the app contain #pragma omp? If yes, does the app call MKL from OMP sections?

Thanks,

Evgueni.

Hi Evgueni,

This is a very big application. The part with the issue uses OpenMP but not MKL; other parts of the application use MKL. My code uses a lot of OpenMP. Most of it works great, and the code in trouble is actually very similar to other parts.

Thanks,

Isaac

As I read the original post, it was recognized that vcomp.lib has to be excluded so that only the single Intel OpenMP instance is active, as that will support the vcomp calls.

This raises the possibility of working with KMP_AFFINITY and number of threads so as to improve the distribution of work across cores.

If Intel Hyper-Threading is active, MKL will use a single thread per core, but you will need to set OMP_NUM_THREADS and KMP_AFFINITY to get a similar effect from the C++ parallel regions, e.g.

KMP_AFFINITY=compact,1,1

to spread threads out 1 per core.

I don't know what effects might be produced by transitioning from 1 thread per core in MKL to something different in the C++ code.

If you have a 2-socket platform, affinity will be particularly important.
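
Since the behavior differs between a direct run and a run under Visual Studio, it may help to print the settings each run actually sees. The following is only a rough sketch: the helper names env_or_unset, threads_in_parallel_region and report_openmp_settings are made up for illustration, and it uses only std::getenv and the standard omp_get_num_threads call.

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: value of an environment variable, or "(not set)".
std::string env_or_unset(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("(not set)");
}

// Hypothetical helper: number of threads the runtime actually provides
// for a parallel region (1 when compiled without OpenMP support).
int threads_in_parallel_region() {
    int count = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // One thread records the team size; the others wait at the
        // implicit barrier at the end of the single construct.
        #pragma omp single
        count = omp_get_num_threads();
    }
#endif
    return count;
}

// Hypothetical helper: print the settings discussed in this thread.
void report_openmp_settings() {
    const char* names[] = { "OMP_NUM_THREADS", "KMP_AFFINITY", "KMP_BLOCKTIME" };
    for (std::size_t i = 0; i < sizeof(names) / sizeof(names[0]); ++i)
        std::cout << names[i] << " = " << env_or_unset(names[i]) << '\n';
    std::cout << "threads in parallel region = "
              << threads_in_parallel_region() << '\n';
}
```

Calling report_openmp_settings() once at startup, in a direct run and in a run under the debugger, would show whether the two environments differ. Note that VC++ may warn that getenv is deprecated.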

It is hard to guess what may be happening without knowing details of the application. Does the application create threads itself, for example (I mean non-OpenMP threads)? If it does, then resource oversubscription is possible. Some applications gain from setting the environment variable KMP_BLOCKTIME=0, especially in the case of oversubscription, when idle-spinning OpenMP worker threads slow down active OpenMP threads.

If the problem is different, then you can try to create a small reproducer and submit a support request.

- Andrey

After some trial and error, the issue is resolved. Part of my code is called repeatedly, millions of times, and it uses a few local std::vector objects of some data type, with sizes in the hundreds of bytes. The memory management should be very simple compared to the complexity of the computations involved, but somehow the memory management brings down the whole process.
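
For readers who hit the same thing, here is a minimal sketch of the pattern described above and one common way to take the allocation out of the hot path. It is not the actual code from the application. The Sample type (about 100 bytes here) and all function names are hypothetical, and the arithmetic is a placeholder for the real computation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type of about 100 bytes, standing in for the real one.
struct Sample {
    double values[12];
};

// Pattern from the post: the vector is constructed and destroyed on every
// call, so each call allocates from and frees to the heap.
double sum_with_local_vector(std::size_t n) {
    std::vector<Sample> scratch(n);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i].values[0] = static_cast<double>(i);
        total += scratch[i].values[0];
    }
    return total;
}

// Alternative: the caller owns the buffer and reuses it across calls.
// resize() allocates only when n exceeds the current capacity.
double sum_with_reused_buffer(std::vector<Sample>& scratch, std::size_t n) {
    scratch.resize(n);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        scratch[i].values[0] = static_cast<double>(i);
        total += scratch[i].values[0];
    }
    return total;
}

// Driver: one scratch buffer per thread, declared inside the parallel
// region so that it is private to each thread.
double run_many_calls(int calls, std::size_t n) {
    double grand_total = 0.0;
    #pragma omp parallel reduction(+:grand_total)
    {
        std::vector<Sample> scratch;
        #pragma omp for
        for (int c = 0; c < calls; ++c)
            grand_total += sum_with_reused_buffer(scratch, n);
    }
    return grand_total;
}
```

With the reused buffer, each thread allocates only when its buffer has to grow, instead of on every one of the millions of calls, so the threads spend far less time in the shared heap.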

Well, thanks for letting us know about that cause.

>>...Part of my code is called repeatedly, in the millions, and it uses a few local std::vector of some data type of
>>size about 100s bytes. The memory management should be very simple compared to the complexity of
>>the computations involved. But somehow the memory management brings down the whole process...

It is hard to tell exactly what could be wrong, but I would assume that there is a problem with heap fragmentation.
