How do I use TBB to eliminate contention?


Frank.F:

    Hello everyone!     

When I use parallel_for to parallelize the program and then run Inspector, it detects a data race (a read/write race). But I cannot find a detailed method for eliminating the data race. Can anyone give me a suggestion? Thank you very much!

Adri C. S.:

Hi Frank!

It's hard to help you with that little info. It's better if you post the code you have and maybe a screenshot of the Inspector when the race is detected.

 

Cheers!
jimdempseyatthecove:

Read http://en.wikipedia.org/wiki/Race_condition for an overview of race conditions.

Jim Dempsey

www.quickthreadprogramming.com
Frank.F:

Quote:

Adri C. S. wrote:

Hi Frank!

It's hard to help you with that little info. It's better if you post the code you have and maybe a screenshot of the Inspector when the race is detected.

 

It is a read/write data race. As the picture shows, the variable jy has a read-and-write race. What can I do? Would atomic work? Or should I use a mutex or lock?

 

Attachment: QQ截图20131227101043.png (128.84 KB)
jimdempseyatthecove:

The question we need you to answer is:

a) Do all worker threads use the same [iter->number] range?
or
b) Does each worker have its own (different) [iter->number] range?

In the event of a) then consider using a reducer (each worker accumulates += values independently, which are then reduced once later).
In the event of b), you only have an issue between one worker thread processing its first [iter->number] and the adjacent worker thread processing its last [iter->number]. In that case you only require a mutex (or reducer) at the first and last iter->numbers.
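To make case a) concrete, here is a sketch of the reducer idea using plain C++ threads rather than TBB (the name reduce_sum and the explicit chunking are illustrative only): each worker sums its own chunk into a private accumulator, so there are no shared writes and no race, and the partials are combined once at the end. tbb::parallel_reduce packages exactly this pattern with automatic range splitting.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Each worker accumulates into its own slot of `partial` (no shared
// writes, hence no data race); the partials are reduced once afterwards.
double reduce_sum(const std::vector<double>& a, std::size_t nworkers)
{
    std::vector<double> partial(nworkers, 0.0);
    std::vector<std::thread> pool;
    std::size_t chunk = (a.size() + nworkers - 1) / nworkers;
    for (std::size_t w = 0; w < nworkers; ++w) {
        pool.emplace_back([&, w] {
            std::size_t lo = w * chunk;
            std::size_t hi = std::min(a.size(), lo + chunk);
            for (std::size_t i = lo; i < hi; ++i)
                partial[w] += a[i];               // private accumulator
        });
    }
    for (auto& t : pool) t.join();
    // Final (sequential) reduction of the per-worker partial sums.
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```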

Jim Dempsey

www.quickthreadprogramming.com
Frank.F:

Quote:

jimdempseyatthecove wrote:

The question we need you to answer is:

a) Do all worker threads use the same [iter->number] range?

or

b) Does each worker have its own (different) [iter->number] range?

In the event of a) then consider using a reducer (each worker accumulates += values independently, which are then reduced once later).

In the event of b), you only have an issue between one worker thread processing its first [iter->number] and the adjacent worker thread processing its last [iter->number]. In that case you only require a mutex (or reducer) at the first and last iter->numbers.

Jim Dempsey

Can you give me some information about case a), using a reducer? I am a novice.

jimdempseyatthecove:

You can look at the TBB parallel_reduce examples, or you might want to consult a recursion example such as: C:\Program Files (x86)\Intel\Composer XE 2011 SP1\tbb\examples\task\tree_sum (or wherever you installed the samples on Linux).

or (untested code)

double sum(double A[], size_t n)
{
  if(n < MIN_N_FOR_SUM)
  {
    double ret = 0.0;
    for(size_t i=0; i<n; ++i)
      ret += A[i];
    return ret;
  }
  else
  {
    size_t n1 = (n / 2) - ((n / 2) % (64 / sizeof(double)));
    size_t n2 = n - n1;
    double sum1, sum2;
    tbb::parallel_invoke(   // lambdas must capture A, n1, n2, sum1, sum2 by reference
      [&]() { sum1 = sum(A, n1); },
      [&]() { sum2 = sum(A+n1, n2); });
    return sum1 + sum2;
  }
}

Jim Dempsey

 

www.quickthreadprogramming.com
Frank.F:

Quote:

jimdempseyatthecove wrote:

You can look at the TBB parallel_reduce examples or you might want to consult a recursion example such as: C:\Program Files (x86)\Intel\Composer XE 2011 SP1\tbb\examples\task\tree_sum

(or wherever you installed the samples on Linux.

or (untested code)

double sum(double A[], size_t n)
{
  if(n < MIN_N_FOR_SUM)
  {
    double ret = 0.0;
    for(size_t i=0; i<n; ++i)
      ret += A[i];
    return ret;
  }
  else
  {
    size_t n1 = (n / 2) - ((n / 2) % (64 / sizeof(double)));
    size_t n2 = n - n1;
    double sum1, sum2;
    tbb::parallel_invoke(   // lambdas must capture A, n1, n2, sum1, sum2 by reference
      [&]() { sum1 = sum(A, n1); },
      [&]() { sum2 = sum(A+n1, n2); });
    return sum1 + sum2;
  }
}

Jim Dempsey

 

Thank you very much, my dear friend! My code looks like this, and when I tried to make it a parallel_for, the problem arose.

for( iter=cell_n+3; iter!=iter_end-3; iter++ )
{
    iter->fp = (iter-1)->fp_old1 - pidt1 * (iter-1)->jy;
    iter->gm = (iter-1)->gm_old1 - pidt1 * (iter-1)->jz;
    iter->fm = (iter+1)->fm - pidt1 * iter->jy;
    iter->gp = (iter+1)->gp - pidt1 * iter->jz;
    iter->ey = iter->fp + iter->fm;   iter->bz = iter->fp - iter->fm;
    iter->ez = iter->gp + iter->gm;   iter->by = iter->gp - iter->gm;
    iter->ex -= 2.0 * pidt1 * iter->jx;
}

In the parallel_for version, each thread accesses fm and gp (the lines reading (iter+1)->fm and (iter+1)->gp) while another thread may be writing them, so there is a data race. How do I remove this contention? Thank you!

Raf Schietekat:

(Note that Jim's code sample emphasises clarity, not performance.)

At first sight fm and gp are the target of a "scan" operation over jy and jz from right to left, so you would have to move them into a separate loop running in that direction. Use tbb::parallel_scan() for large intervals (one forward for fp and gm, one backward for fm and gp). You could parallel_invoke() those scans and then use tbb::parallel_for to compute e{x,y,z} and b{y,z} (clearest), or run one of the scans first and then everything else in the other scan (tbb::parallel_for() stuff integrated in the final pass of the second scan, which might make better use of the cache).
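A sequential sketch of the right-to-left recurrence inferred above for fm (gp is analogous, with jz); the function name backward_fm and the flattened arrays are illustrative only. tbb::parallel_scan would parallelize this by accumulating the pidt1*jy contributions of each subrange in a pre-scan pass and combining them in the final pass.

```cpp
#include <cstddef>
#include <vector>

// Right-to-left recurrence: fm[i] = fm[i+1] - pidt1 * jy[i].
// Run sequentially here; each fm[i] depends on the already-updated
// fm[i+1], which is why the original left-to-right parallel_for races.
std::vector<double> backward_fm(std::vector<double> fm,
                                const std::vector<double>& jy,
                                double pidt1)
{
    if (fm.size() < 2) return fm;
    for (std::size_t i = fm.size() - 1; i-- > 0; )  // i = size-2 .. 0
        fm[i] = fm[i + 1] - pidt1 * jy[i];
    return fm;
}
```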

Frank.F:

Quote:

Raf Schietekat wrote:

(Note that Jim's code sample emphasises clarity, not performance.)

At first sight fm and gp are the target of a "scan" operation over jy and jz from right to left, so you would have to move them into a separate loop running in that direction. Use tbb::parallel_scan() for large intervals (one forward for fp and gm, one backward for fm and gp). You could parallel_invoke() those scans and then use tbb::parallel_for to compute e{x,y,z} and b{y,z} (clearest), or run one of the scans first and then everything else in the other scan (tbb::parallel_for() stuff integrated in the final pass of the second scan, which might make better use of the cache).

Thank you! Could you recommend some material for learning TBB: books, websites? I also have another question, like this:

for( iter = part_n, iter_end = part_n + (vector_part.end() - vector_part.begin()); iter != iter_end; iter++ )
{
    has_to_change_cell( cell_n, iter );   // put particles on stack
}

inline void propagate::has_to_change_cell( struct cell *cell, struct particle *part )
{
  if ( part->x < cell[part->number].x )
  {
    cell[part->number].np[part->species]--;
    cell[part->number-1].np[part->species]++;
    cell[part->number].npart--;
    cell[part->number-1].npart++;
    part->number--;
  }
  else if ( part->x >= cell[part->number].x + dx )
  {
    cell[part->number].np[part->species]--;
    cell[part->number+1].np[part->species]++;
    cell[part->number].npart--;
    cell[part->number+1].npart++;
    part->number++;
  }
}

Then, between the threads, cell[part->number].np[part->species] will have a data race. What can I do? Thank you very much!
 
Raf Schietekat:

At TBB's own website you will find the online "User Guide and Design Patterns", and a picture of the book "Intel Threading Building Blocks" with a link to a large online shop for books and other stuff. The book might not be up to date anymore, but it's still a good read.

Unless the problem has some relevant structure that I missed, you could make np and npart atomic.

Frank.F:

Quote:

Raf Schietekat wrote:

At TBB's own website you will find the online "User Guide and Design Patterns", and a picture of the book "Intel Threading Build Blocks" with a link to a large online shop for books and other stuff. The book might not be up to date anymore, but it's still a good read.

Unless the problem has some relevant structure that I missed, you could make np and npart atomic.

Thank you! I tried to use atomic operations before; maybe I wrote the wrong statement. Could you tell me which statement I should use?

Raf Schietekat:

++ and -- are implemented by tbb::atomic (at least for integers).
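For illustration, std::atomic (the standard-library counterpart of tbb::atomic) provides the same lost-update-free ++/--; the helper count_concurrently below is hypothetical, just to show that concurrent increments on an atomic all land.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// N threads each increment a shared atomic counter; because ++ on
// std::atomic is a single read-modify-write, no increment is lost.
int count_concurrently(int nthreads, int per_thread)
{
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i)
                ++counter;                   // atomic increment
        });
    for (auto& t : pool) t.join();
    return counter.load();
}
```

With a plain int counter the same test would intermittently lose updates, which is exactly the race Inspector reports.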

jimdempseyatthecove:

While you can solve this using atomics, the problem is performance. I would suggest something along the following lines:

inline void propagate::has_to_change_cell( struct cell *cell, struct particle *part )
{
  if ( part->x < cell[part->number].x )
  {
    cell[part->number].np[part->species]--;
    cell[part->number].np_prior[part->species]++;
    cell[part->number].npart--;
    part->number--;
  }
  else if ( part->x >= cell[part->number].x + dx )
  {
    cell[part->number].np[part->species]--;
    cell[part->number].np_following[part->species]++;
    cell[part->number].npart--;
    part->number++;
  }
}

inline int propagate::get_np( struct cell *cell, struct particle *part )
{
  return cell[part->number].np[part->species]
       + cell[part->number-1].np_following[part->species]
       + cell[part->number+1].np_prior[part->species];
}

Jim Dempsey

www.quickthreadprogramming.com
jimdempseyatthecove:

It goes without saying that you can (occasionally) run a parallel_for to reconcile (collapse) the np_following and np_prior tallies into np.
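A sequential sketch of that reconciliation pass; the CellTally struct collapses the per-species arrays to a single counter for brevity (real code would loop over species). Cell i absorbs np_following from its left neighbour and np_prior from its right neighbour, matching get_np, and the tallies are then cleared in a separate pass so that, run in parallel, no worker reads a tally another worker has already zeroed.

```cpp
#include <cstddef>
#include <vector>

// One counter per cell instead of the per-species arrays, for brevity.
struct CellTally { int np; int np_prior; int np_following; };

void reconcile(std::vector<CellTally>& cell)
{
    // Pass 1 (the parallel_for body): fold neighbours' tallies into np.
    for (std::size_t i = 0; i < cell.size(); ++i) {
        if (i > 0)               cell[i].np += cell[i - 1].np_following;
        if (i + 1 < cell.size()) cell[i].np += cell[i + 1].np_prior;
    }
    // Pass 2: clear the tallies only after every cell has collected them.
    for (auto& c : cell) { c.np_prior = 0; c.np_following = 0; }
}
```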

Jim Dempsey

www.quickthreadprogramming.com
Raf Schietekat:

I like the idea, but I'm afraid it won't work here, because the -1 and +1 are on "part->number", not on "part".

A simple situation that would fail is if all "number" and "species" values are the same for each *part.

jimdempseyatthecove:

In both the original code and my suggested code there is a requirement that all part->number's are unique. Though, all part->species need not be unique.

My code suggestion removes the +1 and -1 on the cell subscript, and adds two arrays of size of the np array. The tally of the adjacent cells will contain the proper tallies for the changes.

These tallies are what would have been added/subtracted from the +1 and -1 entries (which can be summed when fetching the new counts).

Jim Dempsey

www.quickthreadprogramming.com
Frank.F:

Quote:

jimdempseyatthecove wrote:

In both the original code and my suggested code there is a requirement that all part->number's are unique. Though, all part->species need not be unique.

My code suggestion removes the +1 and -1 on the cell subscript, and adds two arrays of size of the np array. The tally of the adjacent cells will contain the proper tallies for the changes.

These tallies are what would have been added/subtracted from the +1 and -1 entries (which can be summed when fetching the new counts).

Jim Dempsey

Thank you very much! There is a situation like the one Raf Schietekat describes, where all "number" and "species" values are the same for each *part: within one cell number there are several particles, and those particles' number is the same as the cell's number.
