How do I use TBB eliminate competition

How do I use TBB eliminate competition

Frank.F的头像

    Hello everyone!     

When i use paralle_for parallel the program,then i use inspector detected there has data race(read and write race).But i cannot find a detailed methods that can used eliminate data race.Who can give me a suggestion.Thank you very much!

19 帖子 / 0 new
最新文章
如需更全面地了解编译器优化,请参阅优化注意事项
Adri C. S.的头像

Hi Frank!

It's hard to help you with that little info. It's better if you post the code you have and maybe a screenshot of the Inspector when the race is detected.

 

Cheers!
jimdempseyatthecove的头像

Read: http://en.wikipedia.org/wiki/Race_condition

to give you an overview of race conditions.

Jim Dempsey

www.quickthreadprogramming.com
Frank.F的头像

Quote:

Adri C. S. wrote:

Hi Frank!

It's hard to help you with that little info. It's better if you post the code you have and maybe a screenshot of the Inspector when the race is detected.

 

It is read and write data race.Just like the picture show,the parameter jy appears read and write race.What can i do? If atomic ok? Or use mutexes or lock?

 

附件: 

附件尺寸
下载 QQ截图20131227101043.png128.84 KB
jimdempseyatthecove的头像

The question we need you to answer is:

a) Do all worker threads use the same [iter->number] range?
or
b) Does each worker have its own (different) [iter->number] range?

In the event of a) then consider using a reducer (each worker accumulates += values independently, which are then reduced once later).
In the event of b) then you only have an issue between one worker thread issuing first [iter->number] against adjacent worker thread issuing last [iter->number]. In this case you only require a mutex (or reducer) at the first and last iter->number's

Jim Dempsey

www.quickthreadprogramming.com
Frank.F的头像

Quote:

jimdempseyatthecove wrote:

The question we need you to answer is:

a) Do all worker threads use the same [iter->number] range?

or

b) Does each worker have its own (different) [iter->number] range?

In the event of a) then consider using a reducer (each worker accumulates += values independently, which are then reduced once later).

In the event of b) then you only have an issue between one worker thread issuing first [iter->number] against adjacent worker thread issuing last [iter->number]. In this case you only require a mutex (or reducer) at the first and last iter->number's

Jim Dempsey

Can you give me some information about the case a:using a reducer,I am a novice.

jimdempseyatthecove的头像

You can look at the TBB parallel_reduce examples or you might want to consult a recursion example such as: C:\Program Files (x86)\Intel\Composer XE 2011 SP1\tbb\examples\task\tree_sum

(or wherever you installed the samples on Linux.

or (untested code)

double sum(double A[], size_t n)
{
  if(n < MIN_N_FOR_SUM)
  {
    double ret = 0.0;
    for(int i=0; i<n; ++i)
      ret += A[i];
    return ret;
  }
  else
  {
    size_t n1 = (n / 2) - ((n / 2) % (64 / sizeof(double)));
    size_t n2 = n - n1;
    double sum1, sum2;
    parallel_invoke(
      []() { sum1 = sum(A, n1); },
      []() { sum2 = sum(A+n1, n2); });
    return sum1 + sum2;
  }
}

Jim Dempsey

 

www.quickthreadprogramming.com
Frank.F的头像

Quote:

jimdempseyatthecove wrote:

You can look at the TBB parallel_reduce examples or you might want to consult a recursion example such as: C:\Program Files (x86)\Intel\Composer XE 2011 SP1\tbb\examples\task\tree_sum

(or wherever you installed the samples on Linux.

or (untested code)

double sum(double A[], size_t n)
{
  if(n < MIN_N_FOR_SUM)
  {
    double ret = 0.0;
    for(int i=0; i<n; ++i)
      ret += A[i];
    return ret;
  }
  else
  {
    size_t n1 = (n / 2) - ((n / 2) % (64 / sizeof(double)));
    size_t n2 = n - n1;
    double sum1, sum2;
    parallel_invoke(
      []() { sum1 = sum(A, n1); },
      []() { sum2 = sum(A+n1, n2); });
    return sum1 + sum2;
  }
}

Jim Dempsey

 

Thank you very much! my dear friend.My code like this,and i want it be parallel_for,then the problem had arisen.

for( iter=cell_n+3; iter!=iter_end-3; iter++ )

{                        

    iter->fp = (iter-1)->fp_old1 - pidt1 * (iter-1)->jy;    

    iter->gm = (iter-1)->gm_old1 - pidt1 * (iter-1)->jz;

    iter->fm = (iter+1)->fm - pidt1 * iter->jy;

    iter->gp = (iter+1)->gp - pidt1 * iter->jz;

    iter->ey = iter->fp + iter->fm;   iter->bz = iter->fp - iter->fm;

    iter->ez = iter->gp + iter->gm;   iter->by = iter->gp - iter->gm;

    iter->ex -= 2.0 * pidt1 * iter->jx;

}

in the parallel_for code,each thread accesses the "fm" and "gp" ,int the code 5th and 6th line,there will be any data race.How do I remove this competition?Thank you!

Raf Schietekat的头像

(Note that Jim's code sample emphasises clarity, not performance.)

At first sight fm and gp are the target of a "scan" operation over jy and jz from right to left, so you would have to move them into a separate loop running in that direction. Use tbb::parallel_scan() for large intervals (one forward for fp and gm, one backward for fm and gp). You could parallel_invoke() those scans and then use tbb::parallel_for to compute e{x,y,z} and b{y,z} (clearest), or run one of the scans first and then everything else in the other scan (tbb::parallel_for() stuff integrated in the final pass of the second scan, which might make better use of the cache).

Frank.F的头像

Quote:

Raf Schietekat wrote:

(Note that Jim's code sample emphasises clarity, not performance.)

At first sight fm and gp are the target of a "scan" operation over jy and jz from right to left, so you would have to move them into a separate loop running in that direction. Use tbb::parallel_scan() for large intervals (one forward for fp and gm, one backward for fm and gp). You could parallel_invoke() those scans and then use tbb::parallel_for to compute e{x,y,z} and b{y,z} (clearest), or run one of the scans first and then everything else in the other scan (tbb::parallel_for() stuff integrated in the final pass of the second scan, which might make better use of the cache).

Thank you! Could  you recommend me some  material to learn TBB, some books,some Web ? And also i have a question like 

   for(iter = part_n,iter_end=part_n+(vector_part.end()-vector_part.begin());iter!=iter_end;iter++)
   {
          has_to_change_cell( cell_n, iter );   // put particles on stack
   }
inline void propagate::has_to_change_cell( struct cell *cell, struct particle *part )
{
  if ( part->x < cell[part->number].x ) 
	  {
		cell[part->number].np[part->species]--;
		cell[part->number-1].np[part->species]++;
		cell[part->number].npart--;
		cell[part->number-1].npart++;
		part->number--;
      }
  else if ( part->x >= cell[part->number].x  + dx ) 
	  {
		cell[part->number].np[part->species]--;
		cell[part->number+1].np[part->species]++;
		cell[part->number].npart--;
		cell[part->number+1].npart++;
		part->number++;
      }
}

Then between the threads ,

		"cell[part->number].np[part->species]"will have data race ,what can i do?Thank you very much!
 
Frank.F的头像

Quote:

Raf Schietekat wrote:

(Note that Jim's code sample emphasises clarity, not performance.)

At first sight fm and gp are the target of a "scan" operation over jy and jz from right to left, so you would have to move them into a separate loop running in that direction. Use tbb::parallel_scan() for large intervals (one forward for fp and gm, one backward for fm and gp). You could parallel_invoke() those scans and then use tbb::parallel_for to compute e{x,y,z} and b{y,z} (clearest), or run one of the scans first and then everything else in the other scan (tbb::parallel_for() stuff integrated in the final pass of the second scan, which might make better use of the cache).

Thank you! Could  you recommend me some  material to learn TBB, some books,some Web ? And also i have a question like 

   for(iter = part_n,iter_end=part_n+(vector_part.end()-vector_part.begin());iter!=iter_end;iter++)
   {
          has_to_change_cell( cell_n, iter );   // put particles on stack
   }
inline void propagate::has_to_change_cell( struct cell *cell, struct particle *part )
{
  if ( part->x < cell[part->number].x ) 
	  {
		cell[part->number].np[part->species]--;
		cell[part->number-1].np[part->species]++;
		cell[part->number].npart--;
		cell[part->number-1].npart++;
		part->number--;
      }
  else if ( part->x >= cell[part->number].x  + dx ) 
	  {
		cell[part->number].np[part->species]--;
		cell[part->number+1].np[part->species]++;
		cell[part->number].npart--;
		cell[part->number+1].npart++;
		part->number++;
      }
}

Then between the threads ,

		"cell[part->number].np[part->species]"will have data race ,what can i do?Thank you very much!
 
Raf Schietekat的头像

At TBB's own website you will find the online "User Guide and Design Patterns", and a picture of the book "Intel Threading Build Blocks" with a link to a large online shop for books and other stuff. The book might not be up to date anymore, but it's still a good read.

Unless the problem has some relevant structure that I missed, you could make np and npart atomic.

Frank.F的头像

Quote:

Raf Schietekat wrote:

At TBB's own website you will find the online "User Guide and Design Patterns", and a picture of the book "Intel Threading Build Blocks" with a link to a large online shop for books and other stuff. The book might not be up to date anymore, but it's still a good read.

Unless the problem has some relevant structure that I missed, you could make np and npart atomic.

Thank you! I tried to use atomic operations before,may be i wrote the wrong statement,could you tell me which the statement I can use?

Raf Schietekat的头像

++ and -- are implemented by tbb::atomic (at least for integers).

jimdempseyatthecove的头像

While you can solve this with using atomics, the problem is with performance. I would suggest something along the following:

inline void propagate::has_to_change_cell( struct cell *cell, struct particle *part )
{
  if ( part->x < cell[part->number].x ) 
   {
  cell[part->number].np[part->species]--;
  cell[part->number].np_prior[part->species]++;
  cell[part->number].npart--;
  part->number--;
      }
  else if ( part->x >= cell[part->number].x  + dx ) 
   {
  cell[part->number].np[part->species]--;
  cell[part->number].np_following[part->species]++;
  cell[part->number].npart--;
  part->number++;
      }
}

inline int propagate::get_np( struct cell *cell, struct particle *part )
{
 return cell[part->number].np[part->species]
   + cell[part->number-1].np_following[part->species]
   + cell[part->number+1].np_prior[part->species];
}

Jim Dempsey

www.quickthreadprogramming.com
jimdempseyatthecove的头像

It goes without mentioning that you can run (occasionally) a parallel_for to reconcile (collapse) the np_following and np_prior into np.

Jim Dempsey

www.quickthreadprogramming.com
Raf Schietekat的头像

I like the idea, but I'm afraid it won't work here, because the -1 and +1 are on "part->number", not on "part".

A simple situation that would fail is if all "number" and "species" values are the same for each *part.

jimdempseyatthecove的头像

In both the original code and my suggested code there is a requirement that all part->number's are unique. Though, all part->species need not be unique.

My code suggestion removes the +1 and -1 on the cell subscript, and adds two arrays of size of the np array. The tally of the adjacent cells will contain the proper tallies for the changes.

These tallies are what would have been added/subtracted from the +1 and -1 entries (which can be summed when fetching the new counts).

Jim Dempsey

www.quickthreadprogramming.com
Frank.F的头像

Quote:

jimdempseyatthecove wrote:

In both the original code and my suggested code there is a requirement that all part->number's are unique. Though, all part->species need not be unique.

My code suggestion removes the +1 and -1 on the cell subscript, and adds two arrays of size of the np array. The tally of the adjacent cells will contain the proper tallies for the changes.

These tallies are what would have been added/subtracted from the +1 and -1 entries (which can be summed when fetching the new counts).

Jim Dempsey

Thank you very much! There is the situation like Raf Schietekat says,A simple situation that would fail is if all "number" and "species" values are the same for each *part.In a cell number,there are some particles,the particles'number is same to the cell's number.

登陆并发表评论。