ICC + OpenMP + other threads "bug" / annoyance

I have a test application that combines OpenMP with another threading tool.

When the test application starts and main performs

OpenMP parallel region
Other threading tool parallel code
OpenMP parallel region
...

This works fine.
However, when the test application starts and main performs

Other threading tool parallel code ** this first
OpenMP parallel region
Other threading tool parallel code
OpenMP parallel region
...

OpenMP does not complain
OpenMP initializes its thread pool
*** but it only runs using 1 thread

MS VS 2005 C++ OpenMP does not exhibit this problem

The work-around is to place the following near the top of main (before starting the other threading tool, when that tool runs first):

#pragma omp parallel
{
Sleep(0); // or other OpenMP initialization here
}
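
For reference, a minimal sketch of the same warm-up idea in portable C++. The helper name warm_up_openmp is hypothetical and not part of the original test application; it replaces Sleep(0) with a query of the team size so the result can be checked. It assumes the compiler's OpenMP switch is enabled; without it the function simply returns 1.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (not from the original post): enters one parallel
// region so the OpenMP runtime builds its thread pool, and returns the
// team size observed inside that region.
int warm_up_openmp() {
    int team_size = 1;  // value reported when compiled without OpenMP
#ifdef _OPENMP
    #pragma omp parallel
    {
        // One thread records the team size; the others wait at the
        // implicit barrier that ends the single construct.
        #pragma omp single
        team_size = omp_get_num_threads();
    }
#endif
    return team_size;
}
```

The idea would be to call it once near the top of main, before the other threading tool creates its threads.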

Just thought I would pass the work-around along to this forum.

Jim Dempsey

www.quickthreadprogramming.com
Interesting Jim, thanks for the work-around. Which other threading tool do you use in your test code? Just curious. Also, I'll try to reproduce this and file an issue with the developers, just FYI.
-regards,
Kittur

I have my own threading tool - QuickThread

We will be discussing this on

Parallel Programming Talk - Internet Radio : Feb 24th at 8:00AM PST

http://www.blogtalkradio.com/MulticoreSoftware

Jim

www.quickthreadprogramming.com

I think you could reproduce the problem with

int _tmain(int argc, _TCHAR* argv[])
{
#if 0 // enable to correct problem
#pragma omp parallel
{
Sleep(0);
}
#endif
OpenWindowsThreads();
#pragma parallel for
for(int i=0; i<100; ++i)
Sleep(i); // put Break on here, observe Thread ID's
}

www.quickthreadprogramming.com

Thanks again Jim. I'll file a tracker with the developers on this issue, appreciate much.
-regards,
Kittur

Hi Jim,
I passed your test case along to our developers, and they made two observations:

1. The test case uses "#pragma parallel for", which is not valid OpenMP syntax. It should be "#pragma omp parallel for". Probably this is just a typo, but please confirm that the correct syntax is used in the real code.

2. Also, we noticed that the test case indicates that you're using a breakpoint in a debugger to figure out how many threads there are. This is not a reliable way of determining the number of threads, because it creates a race condition: the result depends on how long it takes to create a thread versus how long it takes for the main thread to reach the breakpoint. If the main thread reaches the breakpoint before the other threads are created, no other threads will appear. A more reliable way to determine the number of threads is to either have each thread print its thread number inside the parallel region or to call omp_get_num_threads().
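
The check described in point 2 can be sketched roughly as follows. This is only an illustration: the function name count_participating_threads is hypothetical, and the code assumes OpenMP is enabled at compile time (without it, the pragmas are ignored and the result is 1).

```cpp
#include <set>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: runs a parallel loop and returns how many distinct
// OpenMP thread numbers actually executed at least one iteration.
int count_participating_threads(int iterations) {
    std::set<int> seen;
    #pragma omp parallel for
    for (int i = 0; i < iterations; ++i) {
        int id = 0;  // thread number is always 0 without OpenMP
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        // std::set is not thread-safe, so serialize the insert.
        #pragma omp critical
        seen.insert(id);
    }
    return static_cast<int>(seen.size());
}
```

A result of 1 from a loop with many iterations would suggest that only the entering thread ran the region. The count is a lower bound on the team size, since a thread that receives no iterations is not recorded.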

Please let me know your input on the above, and whether you still see an issue after verifying the number of threads with the above approach (if so, I would again appreciate a test case that I can pass on to the developers). I have yet to file an issue on this, just FYI.

Appreciate your input.

-regards,
Kittur

1. That was a typo (the code is on another system); I just typed in the sample.

2. I am not determining the number of threads with the breakpoint. I am determining whether each thread reaches the breakpoint.

The breakpoint is in the code section shared by all threads. As each thread reaches the breakpoint, the highlight arrow in the Threads window will (should) hop from one thread to another _provided_ that each thread is running.
After the 1st break, F5 should advance to one of the other threads, but it does not. If you do not want to use a breakpoint, insert an infinite compute loop in place of the Sleep(0) and look at Task Manager | Performance | CPU usage; you will see only one thread running.

If I were to make an educated guess at the problem:
a) program starts
b) creation of additional non-OpenMP threads by the application
c) resuming in the main thread
d) then entering an OpenMP parallel region for the 1st time
e) OpenMP notes 1st use and correctly creates its thread pool

Then OpenMP incorrectly assumes the parallel region is a nested parallel region with nesting off, meaning it runs with only the thread that entered the region.

or

Then OpenMP thinks the additional threads are OpenMP threads (which they aren't), considers them busy, and won't distribute work to the other threads.

Just fix the typo and run the test. Or use the explanation and write your own test case.
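
A test case following steps a) through e) might look like the sketch below. It is an assumption-laden stand-in, not the failing program: std::thread workers take the place of the other threading tool, the function name team_size_after_foreign_threads is hypothetical, and nothing here is known to reproduce the behaviour described above.

```cpp
#include <chrono>
#include <thread>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical test: start non-OpenMP threads first, then enter the first
// OpenMP parallel region from the main thread and return the team size.
int team_size_after_foreign_threads(int foreign_threads) {
    std::vector<std::thread> others;
    for (int i = 0; i < foreign_threads; ++i) {
        // Stand-in for the other threading tool's workers (assumption).
        others.emplace_back([] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        });
    }

    int team_size = 1;  // stays 1 when compiled without OpenMP
#ifdef _OPENMP
    #pragma omp parallel
    {
        #pragma omp single
        team_size = omp_get_num_threads();
    }
#endif

    for (std::thread& t : others) {
        t.join();
    }
    return team_size;
}
```

On a multi-core machine with OMP_NUM_THREADS unset, a return value of 1 would match the symptom reported at the top of the thread.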

Jim

www.quickthreadprogramming.com

Thanks Jim for your take. Let me go over this carefully and investigate further. Will update you thereafter, appreciate much.

-regards,
Kittur

Pardon me, Jim, for the delay. I was busy with conference demo work etc. and couldn't get to this sooner.

Well, I did create the test case as you suggested (see attached .cpp & .out files) and I couldn't reproduce the problem with any version of the compiler, including the upcoming 11.1 beta (& the Composer version too). The program generated threads according to the value set in OMP_NUM_THREADS too.

My only hunch is: can you check that OMP_NUM_THREADS is not by chance set to 1 in your test run?

If you still believe it's a bug, I suggest you file the issue in Premier with the test case attached. Also, attach the run output after setting KMP_SETTINGS=1 and KMP_VERSION=1 to help the investigation, appreciate much.
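
The OMP_NUM_THREADS check can also be done from inside the test program. A small sketch, with hypothetical helper names env_or_unset and omp_limited_to_one_thread; it only reads the environment and does not query the OpenMP runtime itself.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or "(unset)" if absent.
std::string env_or_unset(const char* name) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : std::string("(unset)");
}

// Hypothetical check: true when OMP_NUM_THREADS explicitly requests a
// single thread, which would explain a one-thread parallel region.
bool omp_limited_to_one_thread() {
    return env_or_unset("OMP_NUM_THREADS") == "1";
}
```

Printing env_or_unset("OMP_NUM_THREADS") at startup would show the setting the run actually used.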

-regards,
Kittur

I can structure a test case, however, the submission will have to include a .LIB file that performs the non-OpenMP threading. The main will be in source so you could verify what is happening.

Jim Dempsey

www.quickthreadprogramming.com

Thanks Jim, appreciate much. BTW, I didn't see the test program files I attached in my previous post. So, I am attaching them now, just FYI.

-regards,
Kittur

Attachments: 

openmp-run.cpp (885 bytes)
openmp-run.out (2.07 KB)

Kittur,

Your code is not correct. It is missing the thread proc parameter (3rd arg). You have

printf("Kicking-off windows threads... \n");
wthread = CreateThread(
	NULL,	// LPSECURITY_ATTRIBUTES lpThreadAttributes (ok)
	0, 	// SIZE_T dwStackSize (ok)
	NULL, 	// LPTHREAD_START_ROUTINE lpStartAddress (*** not good ***)
	&tdata, // LPVOID lpParameter (ok)
	0, 	// DWORD dwCreationFlags (ok)
	NULL);	// LPDWORD lpThreadId (ok)

You have no thread start address.
Try something like

DWORD WINAPI ThreadProc(
  LPVOID lpParameter
)
{
  printf("windows thread\n");
  Sleep(10000);
  return 0; // a thread procedure must return a DWORD exit code
}


...
printf("Kicking-off windows threads... \n");
wthread = CreateThread(
	NULL,	// LPSECURITY_ATTRIBUTES lpThreadAttributes (ok)
	0, 	// SIZE_T dwStackSize (ok)
	ThreadProc, 	// LPTHREAD_START_ROUTINE lpStartAddress (OK)
	&tdata, // LPVOID lpParameter (ok)
	0, 	// DWORD dwCreationFlags (ok)
	NULL);	// LPDWORD lpThreadId (ok)
...

Jim

www.quickthreadprogramming.com

Hi Jim,
Pardon me again, I was off all week for the conference and just remembered I had to look into this :-(

Oops, you're correct, I didn't have the thread start address in the code.

Well, I corrected that as you suggested and ran it, but couldn't reproduce the problem. I tried all
the 11.0, 11.1 and Composer versions of the compiler too. The program runs fine and the OpenMP threads are created (in addition to the Windows thread we create beforehand).

Please see the attached code & output files (main.cpp, main.out). Thanks much, for your patience on this :-)

-regards,
Kittur

Attachments: 

main.cpp (2.75 KB)
main.out (3.9 KB)

Kittur,

Well "dang!" When using the other thread tool in your test app (taking out the Windows thread), the problem does not appear???

A fleeting problem. I will have to try to produce a simple failing example.

Jim

www.quickthreadprogramming.com

Interesting. Sure Jim, let me know when you have a test that can reproduce the problem. In the meantime, have a great weekend.

-Cheers,
Kittur
