Hello,

First of all, I am new to TBB, so excuse me if my question seems easy.

I have a process that I want to parallelize. Simply put, I have a bunch of nodes, each representing an ordinary differential equation (so the amount of computation done by each node is significant). The nodes are interconnected, and I want to solve all of them for each time step. I found a way to write the problem as a single loop, provided I can send a finished computation back into the ready pool.

Here is the idea I had:

  1. Create a dummy parent.
  2. The dummy parent generates N children (I cannot use a tree architecture, for various reasons).
  3. Spawn the tasks.
  4. Depending on a double condition (Have I reached the final time? && Am I too fast?), update the internal state of each child and send it back into the ready pool of the parent.
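To make the scheme above concrete, here is a minimal single-threaded sketch in plain C++ (no TBB). It only illustrates the control flow of steps 1 to 4: a ready pool, a "too fast?" check, and re-queuing until the final time is reached. The names `Node`, `runNetwork`, and `finalStep` are hypothetical and not part of my real code, and the ODE update is replaced by a placeholder increment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for one ODE node: a local step counter plus a dummy state.
struct Node {
    int step = 0;        // local time, counted in steps
    double state = 0.0;  // placeholder for the ODE state
};

// Advance every node to finalStep using a FIFO ready pool.
// Returns how many times a node was taken out of the pool.
int runNetwork(std::vector<Node>& nodes, int finalStep) {
    assert(finalStep > 0);

    // Steps 1-3: the "parent" puts every child into the ready pool.
    std::deque<std::size_t> ready;
    for (std::size_t i = 0; i < nodes.size(); ++i) ready.push_back(i);

    int visits = 0;
    while (!ready.empty()) {
        const std::size_t i = ready.front();
        ready.pop_front();
        ++visits;

        // "Am I too fast?": a node may advance only if no other node is behind it.
        int slowest = nodes[i].step;
        for (const Node& n : nodes) slowest = std::min(slowest, n.step);
        const bool tooFast = nodes[i].step > slowest;

        if (!tooFast) {
            nodes[i].state += 1.0;  // placeholder for one ODE time step
            ++nodes[i].step;
        }

        // "Have I reached the final time?": if not, go to the BACK of the pool
        // so that the other nodes get a turn before this one runs again.
        if (nodes[i].step < finalStep) ready.push_back(i);
    }
    return visits;
}
```

Calling `runNetwork` on a vector of five default-constructed nodes with `finalStep = 10` leaves every node at step 10. In this sketch a node that is too fast is put at the back of the pool rather than being re-run immediately; if it were re-run right away, the lagging nodes would never get a turn.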

To do so, I used the following implementation:

NetworkClassTask.cpp

    #include "NetworkClassTask.hpp"

    RootTask::RootTask(int NNode, IntAVector* ResultIN) {
        root = allocate_task();
        NNODE = NNode;
        AllResults = ResultIN;
        root->NNODE = NNode;
        root->AllResults = AllResults;
    }

    task* RootTask::execute() {
        int count = 1;
        for (int ii = 0; ii < NNODE; ii++) {
            ++count;
            listT.push_back(*new (task::allocate_additional_child_of((*this))) BaseTask(ii, root));
        }
        set_ref_count(count);
        spawn_and_wait_for_all(listT);
        return NULL;
    }

    RootTask* RootTask::allocate_task() {
        return scalable_allocator<RootTask>().allocate(1);
    }

    BaseTask::BaseTask(int idx, RootTask* parent) {
        daddy = parent;
        IND = idx;
        sum = 0;
        TF = 10;
        T = 0;
    }

    task* BaseTask::execute() {
        bool OK = true;
        int NNode = daddy->NNODE;
        //printf("IND:%d\n",IND);
        int CVal = (*(daddy->AllResults))[IND];
        for (int ii = 0; ii < NNode; ii++) {
            if (ii != IND)
                if (CVal > (*(daddy->AllResults))[ii])
                    OK = false;
        }
        if (OK) {
            task_list TMP = daddy->listT;
            for (int ii = 0; ii < 10; ii++)
                sum += 1;
            (*(daddy->AllResults))[IND] = sum;
            printf("Node IND:%d\t t:%d\t tf:%d\t sum:%d \tNNODE:%d \n", IND, T, TF, sum, NNode);
            T++;
            if (T < TF)
                recycle_as_child_of(*parent());
            else
                return NULL;
        } else {
            recycle_as_child_of(*parent());
        }
        return this;
    }


NetworkClassTask.hpp

    #ifndef NETWORKCLASSTASK_HPP_
    #define NETWORKCLASSTASK_HPP_

    #include "tbb/scalable_allocator.h"
    #include "tbb/task_scheduler_init.h"
    #include "tbb/tick_count.h"
    #include "tbb/task.h"
    #include "tbb/concurrent_vector.h"
    #include
    #include
    #include
    #include

    const bool tbbmalloc = true;
    const bool stdmalloc = false;

    using namespace tbb;
    using namespace std;

    typedef concurrent_vector > IntAVector;

    class RootTask: public task {
    public:
        int NNODE;
        RootTask* root;
        IntAVector* AllResults;
        RootTask(int NNode, IntAVector* ResultIN);
        task* execute();
        static RootTask* allocate_task();
        task_list listT;
    };

    class BaseTask: public task {
    public:
        int IND;
        int sum;
        int TF;
        int T;
        RootTask* daddy;
        BaseTask(int idx, RootTask* parent);
        task* execute();
    };

    #endif /* NETWORKCLASSTASK_HPP_ */


TestNetworkTask.cpp

    #include "TestNetworkTask.hpp"
    #include "tbb/task_scheduler_init.h"

    int main() {
        int NNode = 20;
        task_scheduler_init my_tbb;
        IntAVector AllResults;
        for (int ii=0;ii
        task& my_root = *new (task::allocate_root()) RootTask(NNode, &AllResults);
        return 0;
    }

TestNetworkTask.hpp

    #ifndef TESTNETWORKTASK_HPP_
    #define TESTNETWORKTASK_HPP_

    #include "NetworkClassTask.hpp"

    #endif /* TESTNETWORKTASK_HPP_ */

The compilation went fine, and the program runs if NNode in TestNetworkTask.cpp is smaller than my number of processors. But if I have more nodes than CPUs, the program hangs when it reaches the number of CPUs...

Can someone help me find the mistake?

Thanks,
Pierre
