The program I am looking at is supposed to perform load balancing of a 2-D pipeline-parallel loop of tasks.
Imagine a 2-D grid of 100 tasks (i = 0...9, j = 0...9).
A task (i, j) is ready when tasks (i-1, j) and (i, j-1) have executed.
Tasks for which i-1 < 0 or j-1 < 0 are ready from the start (the environment provides the proper tags and data).
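To make the rule concrete, here is a minimal plain-Python sketch (not CnC; all names here are mine) of the readiness condition and the execution waves it induces:

```python
# Plain-Python sketch of the task-grid dependences described above.
N = 10  # the grid is (i = 0..9, j = 0..9)

def deps(i, j):
    """Existing predecessors of task (i, j); border tasks have fewer."""
    d = []
    if i > 0:
        d.append((i - 1, j))
    if j > 0:
        d.append((i, j - 1))
    return d

def run():
    """Repeatedly execute every not-yet-done task whose predecessors are done."""
    done, order = set(), []
    while len(done) < N * N:
        wave = [(i, j) for i in range(N) for j in range(N)
                if (i, j) not in done
                and all(p in done for p in deps(i, j))]
        order.append(wave)
        done.update(wave)
    return order
```

Running this yields 19 waves, one per anti-diagonal i + j = k, which is exactly the wavefront order the control tags are meant to enumerate.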
Since a step can only be prescribed by one tag collection, I thought I would rely on data availability to help synchronize my computation steps.
I have a problem with the execution order of tasks, which I reduced to the following simple case.
// Input from the environment
env -> [A], [B], <T>;
// Steps Prescription
<T> :: (tag_compute), (problem_compute);
// Steps Execution
(tag_compute) -> <T>;
(problem_compute) -> [A], [B];
[A], [B] -> (problem_compute);
Basically, the control tag collection enumerates the diagonal wavefronts, and each task produces data for its (i+1, j) and (i, j+1) neighbors.
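Concretely, the intended data-flow can be sketched like this (plain Python, not CnC; the dicts stand in for the item collections, and the arithmetic is only a placeholder):

```python
# Plain-Python sketch of the intended data-flow: A and B are dicts
# standing in for the item collections, keyed by the consumer's tag.
N = 10

def problem_compute(i, j, A, B):
    # gets: A[(i, j)] was put by neighbor (i - 1, j), B[(i, j)] by
    # neighbor (i, j - 1); border instances read what the environment put.
    a, b = A[(i, j)], B[(i, j)]
    result = a + b  # placeholder for the real work
    # puts for the (i + 1, j) and (i, j + 1) neighbors
    if i + 1 < N:
        A[(i + 1, j)] = result
    if j + 1 < N:
        B[(i, j + 1)] = result
```

Seeding A[(0, j)] and B[(i, 0)] from the environment and executing the instances in wavefront order then never hits a missing item.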
Since I am having ordering issues I do not yet understand, I ran the following experiment:
I removed all the data puts, both from the environment and from the (problem_compute) steps,
and I only generated the control tag for task (0, 0).
My expectation was that all the (tag_compute) steps would execute but none of the (problem_compute) steps, since I deliberately starved the production of data.
Instead, all steps get executed.
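The semantics I expected, as a sketch (plain Python; the names are mine): a prescribed step instance should only fire once every item it gets is available, so with no puts at all only the get-free (tag_compute) instances can run:

```python
# Sketch of the semantics I expected: a prescribed step instance can
# only fire once every item it gets has been put.
def can_fire(gets, available):
    """True iff every item the step gets is available."""
    return all(item in available for item in gets)

available = set()                  # the experiment removes every data put
tag_compute_gets = []              # (tag_compute) gets no items
problem_compute_gets = ["A", "B"]  # (problem_compute) gets [A] and [B]

# Under these semantics, (tag_compute) instances can fire while every
# (problem_compute) instance stays blocked.
```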
Was I wrong to assume that missing data would block the (problem_compute) steps?
Also, if I understand properly, the way multiple prescribing tags are implemented in practice is by cascading tags (as in the Cholesky example).
Is the intuition of using one tag collection plus data reasonable, or should I just stick to doing it the Cholesky way?