omp_get_thread_num() sometimes delivers wrong values

Urs's picture

Hi,

maybe this is a general Intel OpenMP problem, but I observed it when testing the Parallel Studio beta.

I would have expected to get different values from omp_get_thread_num() when 2 blocks of code run in different threads.

But when I add:
#pragma omp critical
{
cout << omp_get_thread_num() << "/" << GetCurrentThreadId() << " ";
}

to a parallel section, I get output like 0/8640 0/9268 0/8640 0/9268 0/8640... So obviously I have the threads 8640 and 9268, but omp_get_thread_num() always returns 0.

How can this be?

Cheers,
Urs

Kittur Ganesh (Intel)'s picture

Hi Urs,

I couldn't reproduce this problem; it works fine for me in a proper, non-nested parallel region. Note that when called from a serial region, omp_get_thread_num() returns 0. If called from within a nested parallel region that is serialized, it also returns 0 (that is, if nested parallelism is disabled, which is the default).
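To make the two non-nested cases concrete, here is a minimal sketch (not taken from the original test case). The function names are made up for illustration, and the stub in the #else branch is only an assumption so the code still builds when OpenMP is not enabled.

```cpp
#include <set>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumed fallback so the sketch also builds without OpenMP support.
static int omp_get_thread_num() { return 0; }
#endif

// Hypothetical helper: called outside any parallel region, where the
// caller is the only thread of the implicit team, so the result is 0.
int thread_num_in_serial_code() {
    return omp_get_thread_num();
}

// Hypothetical helper: collects the distinct thread numbers seen inside
// one non-nested parallel region that requests two threads.
std::set<int> thread_nums_in_parallel_region() {
    std::set<int> seen;
    #pragma omp parallel num_threads(2)
    {
        int id = omp_get_thread_num();  // private to each thread
        #pragma omp critical
        seen.insert(id);
    }
    return seen;
}
```

With OpenMP enabled and two threads granted, the set should contain 0 and 1; the runtime may also grant fewer threads, in which case only 0 appears.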

Please attach a small test case that shows the parallel region and the invocation of that function; it would be much appreciated.
Also, could you set the following environment variables and attach the output of the testcase that you get also:
KMP_SETTINGS=1 and KMP_VERSION=1

-thanks,
Kittur

Urs's picture

Hello Kittur,

yes, I also thought it would be because of serialized regions. But why does the output of omp_get_thread_num() not match GetCurrentThreadId()? (Of course the ID values differ, but I would expect the same Windows thread ID for all threads having the OpenMP number "0".)

Example, take a tree:

class Tree
{
public:
Tree* Left;
Tree* Right;
int Data;
int ProcessItem();
};

And a recursive traversal, e.g.:

static void WalkTreeOmpParReg(Tree* tree)
{
if (tree == NULL) return;
#pragma omp parallel sections firstprivate(tree)
{
#pragma omp section
WalkTreeOmpParReg(tree->Left);
#pragma omp section
WalkTreeOmpParReg(tree->Right);
}
tree->ProcessItem();
}

In ProcessItem I do a spin wait and do an output of the thread number and ID:
#pragma omp critical
{
cout << omp_get_thread_num() << "/" << GetCurrentThreadId() << " ";
}

And I get results like 0/8640 0/9268 0/8640 0/9268 0/8640...
This means OpenMP says: I am always in thread "0" (actually, I get "1" only once).
Windows says: you are working in two different threads.
The Task Manager shows 100% utilization of a dual core, so I believe the Windows output rather than OpenMP.
Further, compared to the sequential version the speedup is close to 2, which is another indication of two threads, each processing about the same number of tree nodes.

omp_set_nested() crashes for some reason. I have not figured out why yet.

Thanks, cheers,
Urs

om-sachan (Intel)'s picture
I can help you investigate this if you give me a code segment that compiles and runs. We will also need the compilation command and test data, if any.

Thanks,

Om

Kittur Ganesh (Intel)'s picture

Hi Urs,

Using the code snippet you've given, I tried a small test case. Both the Microsoft compiler (cl) and Composer give the identical results you describe. The reason is that omp_get_thread_num(), when called within a nested parallel region that is serialized during the tree traversal, returns 0 even though the thread ID is different. Enabling nesting with omp_set_nested will only output the correct omp_num_threads value; the omp_get_thread_num output will still be 0. Try compiling with the MS compiler and you should see similar results, just FYI.

-regards,
Kittur

Urs's picture

Hi Kittur,

thanks for the test. I also tried the Microsoft compiler a minute ago. You are right, I see the same behavior with the Microsoft compiler as well.
Anyway, this is quite strange behavior. I definitely see that the work is done by 2 threads.
So for nested regions omp_get_thread_num() is completely useless.

So my hint for everybody having the same problem: use ThreadID::get() of the following helper class instead.

--------------------------------ThreadID.h--------------------------------

class ThreadID {
__declspec(thread) static LONG myID;
static LONG counter;
public:
static int get();
};

--------------------------------ThreadID.cpp--------------------------------
#include <windows.h>
#include "ThreadID.h"
LONG ThreadID::counter = 0;
__declspec(thread) LONG ThreadID::myID = 0;

int ThreadID::get() {
if (myID == 0)
{
myID = InterlockedIncrement(&counter);
}
return myID;
}
--------------------------------------------------------------------------------
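For anyone not on Windows, the same idea can be sketched with standard C++11 facilities instead of __declspec(thread) and InterlockedIncrement. This is only an illustrative variant; the class name PortableThreadID is made up and is not from the helper above.

```cpp
#include <atomic>

// Hypothetical portable variant of the ThreadID helper.
class PortableThreadID {
public:
    // Returns a process-wide ID, starting at 1, that is assigned the
    // first time the calling thread asks for it and is stable afterwards.
    static int get() {
        static std::atomic<int> counter{0};  // shared by all threads
        thread_local int my_id = 0;          // one copy per thread
        if (my_id == 0) {
            my_id = counter.fetch_add(1) + 1;  // atomic increment
        }
        return my_id;
    }
};
```

As with the original helper, the IDs reflect the order in which threads first call get(), not any OpenMP team numbering.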

Cheers,
Urs

Kittur Ganesh (Intel)'s picture

Hi Urs,

That omp_get_thread_num() always returns 0 in a serialized region, even when different threads execute it at different times, is a feature of OpenMP and is how it is supposed to work; it makes data values in a parallel program more predictable. Also, by definition a thread number is assigned to a thread only when a parallel directive is encountered in the code, so the compilers are doing the correct thing, just FYI.

-regards,
Kittur

jimdempseyatthecove's picture

Urs,

I did not see this mentioned in any of the replies.

omp_get_thread_num() returns the 0-based thread number within the current OpenMP team, which is not the same as a 0-based thread number across all OpenMP threads.

In the main portion of the program (prior to entering a parallel region), the thread receives 0 (as explained by the other poster).
At some point in your code the main thread will execute an OpenMP statement that creates an OpenMP team of threads; this team need not contain all the threads available to OpenMP.

The important thing is that the new team has its own set of 0-based thread numbers. The confusion about the thread number (team member number) arises because, the first time you encounter this in your programming experience, the thread that creates the new team has number 0 before entering the parallel region and also becomes team member number 0 of the new team. This reinforces the impression that OpenMP thread numbers are universal.

In your tree search example main (0) creates a team of two threads with OpenMP thread team member numbers 0 and 1. For clarity I will use a different notation. The thread numbers are (0.0 and 0.1) where the current level team member number is on the right (and is the number returned by omp_get_thread_num()). The number immediately to the left of the right most number is the OpenMP team member number of the next upper level OpenMP team that created the current OpenMP thread team. This repeats until you get up to the main level 0 team member number.

In your code, after you enter the 1st level, each of your two threads 0.0 and 0.1 will create a new parallel sections region, thus creating two new teams, each with OpenMP thread team member numbers 0 and 1. In the notation used above you have (0.0.0, 0.0.1) and (0.1.0, 0.1.1). The second level runs with 4 threads (two of which receive 0 from omp_get_thread_num() and two of which receive 1). At the 3rd level, each thread creates a new team, yielding teams (0.0.0.0, 0.0.0.1), (0.0.1.0, 0.0.1.1), (0.1.0.0, 0.1.0.1), (0.1.1.0, 0.1.1.1). You now have 4 teams, each with team member numbers 0 and 1, and you are using 8 out of your specified 10 threads.

At the 4th level something different will happen, because of your specified restriction of using 10 threads max. Two of the current 8 running threads will be able to split into a new team with two team members. The other six 3rd-level threads will not have an available thread to use. For those unlucky threads, the parallel sections run sequentially. However, should any of the threads (and threads spawned from threads, etc.) reach the end of the tree without result and expire, then, depending on the implementation, those global OpenMP threads may become available to be enlisted as members of a team formed by any of the other threads. Some implementations permit this, some do not.

Now, with the above stated: if and only if you have OpenMP nesting disabled, the main code will be able to form a team, but all other created parallel sections will run sequentially (i.e. only have team member number 0).
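The dotted notation used above can be produced at run time. The sketch below is only an illustration and assumes an OpenMP 3.0 or later runtime, which provides omp_get_level() and omp_get_ancestor_thread_num(); the function name team_path and the fallback stubs in the #else branch are made up for this example.

```cpp
#include <string>
#if defined(_OPENMP) && _OPENMP >= 200805
#include <omp.h>
#else
// Assumed fallback stubs so the sketch builds without OpenMP 3.0 support.
static int omp_get_level() { return 0; }
static int omp_get_ancestor_thread_num(int) { return 0; }
#endif

// Hypothetical helper: builds a path such as "0.1.0", with one team
// member number per nesting level. The rightmost number is the value
// that omp_get_thread_num() would return for the calling thread.
std::string team_path() {
    std::string path;
    const int level = omp_get_level();  // enclosing parallel regions
    for (int l = 0; l <= level; ++l) {
        if (l > 0) path += '.';
        path += std::to_string(omp_get_ancestor_thread_num(l));
    }
    return path;
}
```

Called from the main thread outside any parallel region this yields "0"; printing it next to GetCurrentThreadId() inside ProcessItem would show which team each Windows thread belongs to.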

Jim Dempsey

www.quickthreadprogramming.com
Urs's picture

Thanks, Jim!

This helps!

Cheers,
Urs
