Mathematical Parallelization By Compilers

This is not to say that compilers can automatically parallelize code today. I would, however, really like to see that happen, and here is an interesting and reliable way to think about parallelizing operations. If a compiler can use this way of thinking, it can also serve as a set of hints for developers writing code today.

The C and C++ languages are built around mathematical expressions, so much so that 1; is a legal statement in C++. Other languages such as C#, Java, VB and Delphi also use mathematical operations to perform actions. For example:
  MyInteger = GetCount() + GetLength()
Both are function calls that do some work.

Many mathematical operations are commutative; for example, X = 1 + 2 is the same as X = 2 + 1. This means that:
  X = CallFunc_A() + CallFunc_B()
is the same as:
  X = CallFunc_B() + CallFunc_A()

This is a hint telling us that CallFunc_A and CallFunc_B can run in parallel (unless internal resources are shared).
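To make the hint concrete, here is a minimal sketch of what a compiler (or a developer acting on the hint) could produce today with C++ futures. CallFunc_A and CallFunc_B are the hypothetical functions from the example, given stand-in bodies and assumed not to share internal resources:

  #include <future>

  int CallFunc_A() { return 1; }  // stand-in body for the example
  int CallFunc_B() { return 2; }  // stand-in body for the example

  int Compute()
  {
      // Launch both operands of the '+' concurrently.
      auto a = std::async(std::launch::async, CallFunc_A);
      auto b = std::async(std::launch::async, CallFunc_B);
      // Commutativity means the order of the two get() calls is irrelevant.
      return a.get() + b.get();
  }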

A more complex example would be X = 3 * (1 + 2), and the hint now is that one operation (the addition) must complete before the other (the multiplication) can continue.
Here is the code equivalent:
  X = CallFunc_C() * ( CallFunc_A() + CallFunc_B() )

Here is another:
  X = CallFunc_C( CallFunc_A() + CallFunc_B() )

These concepts apply to logical operations as well.

It looks like, as a general rule, closing a parenthesis gives us a Conjunction Point (a Join). This makes sense because we don't really need to wait for an operation to complete until we need its return value.
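Continuing the same hypothetical sketch, the join falls exactly where a value is first consumed; for X = CallFunc_C() * (CallFunc_A() + CallFunc_B()) that looks like:

  #include <future>

  int CallFunc_A() { return 1; }  // stand-in bodies, as before
  int CallFunc_B() { return 2; }
  int CallFunc_C() { return 3; }

  int Compute()
  {
      auto a = std::async(std::launch::async, CallFunc_A);
      auto b = std::async(std::launch::async, CallFunc_B);
      auto c = std::async(std::launch::async, CallFunc_C);
      // The closing parenthesis of (A + B) is the first join point.
      int sum = a.get() + b.get();
      // C is joined only when the multiplication finally needs its value.
      return c.get() * sum;
  }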

The question we are left with is how to cancel an operation whose result is no longer needed. Take X = A or B: what if we execute A and B in parallel and B returns TRUE while A is still executing?

Your thoughts?

11 comments

anonymous:

Hi Asaf.

This is not a new idea; with whole-program analysis (for most languages), and with a very simple analysis for pure declarative languages, we can easily determine which operations are independent and therefore decide to run them in parallel. Furthermore, futures [1] can be used where computations are dependent, to create safe, dependent parallelism.
This can be done automatically - the programmer need not be aware of the communication that their program performs [2].
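(As a hypothetical C++ illustration of the future idea - not the Mercury mechanism of [2] - a dependent computation can be expressed like this:)

#include <future>

int Produce() { return 42; }                 // stand-in for an expensive computation
int Consume(int value) { return value + 1; } // depends on Produce's result

int Run()
{
    // Starting Produce is decoupled from needing its result.
    std::future<int> f = std::async(std::launch::async, Produce);
    // ... unrelated work can run here in parallel ...
    // The dependency is enforced at the point of use:
    return Consume(f.get());
}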

The tricky part is knowing which parallelisations make the program faster and which make it slower. This is very important: I have only four cores in my system, so I don't need much parallelism, and I want parallelism that has small overheads and doesn't block for long. But I believe it can be done, and can be done by the compiler to automatically parallelise my code; this is the topic of my PhD thesis and more details can be found in [3].

Regarding possibly canceling tasks whose answer is no longer needed, such as in the case of the boolean operation: I prefer never to parallelise them to begin with. This is speculative execution and can easily be wasteful if done too aggressively. Concentrate on introducing parallelism in the cases where you _know_ you'll need the answer, then benchmark the program and determine whether it's fast enough before handling trickier cases like this.

Enjoy.

1. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.1841&rep=rep1&type=pdf

2. http://www.mercury.csse.unimelb.edu.au/information/papers.html#dep_and_parallel

3. http://www.mercury.csse.unimelb.edu.au/information/papers.html#dep_and_parallel

anonymous:

In XKAAPI (http://kaapi.gforge.inria.fr/dokuwiki/doku.php?id=start), we have something called preemption.

It is explicit, i.e. left to the developer and not guessed by the compiler.
It is cooperative: a computation task can "regularly" check whether it has been preempted. If it has, it can abort and, if needed, a reduction is performed.

This is a basic abstraction we use to implement early cancellation (for instance, think about a find_one_of() algorithm, or the special OR operator mentioned above).
More importantly, it is used in "prefix"-like algorithms where spawning new computation tasks means adding work to the sequential algorithm (due to the associated reduction operations needed): in this case, it is better to "preempt" as soon as possible and execute parallel tasks that won't generate reduction operations (i.e. either prior reductions in the same algorithm or other tasks in the same program).
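A hand-rolled sketch of the cooperative check in plain C++ (my illustration only - this is not the XKAAPI API):

#include <atomic>

std::atomic<bool> preempted{false};

// A worker that "regularly" checks whether it has been preempted.
bool find_one_of(const int* data, int count, int target)
{
    for (int i = 0; i < count; ++i)
    {
        if (preempted.load(std::memory_order_relaxed))
            return false;          // abort cooperatively
        if (data[i] == target)
        {
            preempted = true;      // signal the other workers to stop
            return true;
        }
    }
    return false;
}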

Best regards,

f.

jimdempseyatthecove:

Asaf,

A cancellation parameter is an incomplete solution. You really need the equivalent of a reduction operator (with implied cancellation for (||) operators). Otherwise you get no answer, or the wrong answer, when answers conflict.

Functionally you have

bool foo(arg1T arg1, arg2T arg2[,...], reductT reduct)

Except that ", reductT reduct" is not visible in the source code.
It is added with the #pragma.
Nor is the reduct object used in the source code by the programmer.
It could be explicitly used by the programmer via another #pragma at points in the code where it might be wise to check for the bailout condition.

The reduction variable can be extended to contain structured exception information too.

if(foo_ABC() (||) foo_123())

could not only return the first true, with termination of other function
but could also return an exception usable with try { ... } catch { ... }

The interesting things about this suggestion are:

Functions, other than for the #pragma, are unchanged from the serial code.
Such #pragma'd functions can be called from either a serial if(func()) or a parallel if(funcA() (||) funcB()).

The "fly in the ointment" though is what threading model is supported?
OpenMP, TBB, pthread, Windows.

I would prefer it if the compiler permitted the programmer to specify an overloaded operator/function that can fork and join the tasks. Thus any threading model could be used (as well as non-threaded).
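Something along those lines can be written by hand today; here is a hypothetical parallel_or helper where std::async merely stands in for whichever fork/join mechanism the programmer supplies:

#include <future>

// A user-replaceable fork/join point: swap std::async for OpenMP, TBB,
// pthreads, Windows threads, or a serial fallback, without touching callers.
template <typename F, typename G>
bool parallel_or(F f, G g)
{
    auto rhs = std::async(std::launch::async, g); // fork the right operand
    bool lhs = f();                               // left operand on this thread
    bool rhsResult = rhs.get();                   // join (no short-circuit here)
    return lhs || rhsResult;
}

A call site then reads if (parallel_or(foo_ABC, foo_123)) ..., which keeps the serial and parallel forms looking alike.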

Asaf Shelly:

I see your point.
So now I am back to being confused. The first question I asked myself was: do we want the programmer to be aware of the fact that the function is being called concurrently?
Igor suggested that we just keep going, because it is most likely that the calls would be very short.
You are suggesting that we have a variable passed as a parameter to the function so that developers can test that variable, which covers all cases but requires code updates and makes the mechanism a bit less transparent.

How about we say that this version of the compiler changes the signatures of all functions to add a cancellation parameter? The last parameter on the stack is the cancellation parameter, sized and aligned for atomic access. Linking to old libraries would require some tagging such as extern "C" or extern "C++".

This actually makes sense to me as the opposite of exceptions: instead of an inner scope cancelling operations in the outer scope, have a way for an outer scope to cancel operations in the inner scope.
It is more reasonable that the outer scope, or the caller, has the bigger picture and knows the nature of the process taking place.

Inside the function you would probably have to test the variable __cancel and return if it is true.
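In today's C++ the same convention can be sketched explicitly (hypothetical code; the proposal is that the compiler would hide this parameter):

#include <atomic>

// What the compiler-rewritten signature might amount to: the caller passes
// a flag, sized for atomic access; 'cancel' plays the role of __cancel above.
bool DoWork(int iterations, const std::atomic<bool>& cancel)
{
    for (int i = 0; i < iterations; ++i)
    {
        if (cancel.load(std::memory_order_relaxed))
            return false;  // the outer scope cancelled the inner scope
        // ... work ...
    }
    return true;
}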

How does that sound?

jimdempseyatthecove:

You also have the requirement to terminate execution in the other functions in those circumstances where it is appropriate. My suggestion was to use an analog of a reduction variable (which includes not only the variable but also an operator) that can be automagically tested during the execution of the functions (tasks). Igor suggested that in some cases we just let all functions complete. I think in other cases (long-running functions) you would want to terminate. While you could use a kill thread, this might not be clean.

#pragma early_bailout
bool foo(char* arg)
{
Work work work…
return( true / false )
}
...
if(foo("ABC") (||) foo("123")) foundIt();

Assuming foo takes a long time when it does not find a match, you would like both paths to terminate when one path terminates with true.

The pragma would mark foo as having a hidden reference to the reduction variable (and operator). When foo is called inside or outside of the parallel if, the hidden reduction variable is created on the stack. Thus the call is the same inside or outside the parallel if. The foo function itself will periodically test the hidden reduction variable for the early bailout condition:

#pragma early_bailout
bool foo(char* match)
{
// has hidden reduction variable, e.g. reduct
{ // implicit hidden code
if(reduct.bailout) return false;
} // end implicit hidden code
...
// some foo loop
for(i=0;...
{
{ // implicit hidden code inserted by compiler
if(reduct.bailout) return false;
} // end implicit hidden code
// loop code here
if(success)
return true; // this is written to reduct.result and sets bailout
}
return false; // not found
}

This function will run slightly slower because of the test for bailout.
Alternatively, the compiler would insert the reduction variable, but the test for bailout would be explicit.
Initially by #pragma, later maybe by a new keyword.
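For comparison, here is roughly what that hidden machinery could look like if written by hand in standard C++ today (my sketch, not a compiler feature; the "work" is faked with a string scan):

#include <atomic>
#include <cstring>
#include <functional>
#include <future>

// Hand-written stand-in for the hidden OR-reduction variable.
struct OrReduct
{
    std::atomic<bool> bailout{false};
};

// Scans 'data' for 'match'; periodically tests the reduction variable.
bool foo(const char* data, const char* match, OrReduct& reduct)
{
    const size_t len = std::strlen(match);
    for (const char* p = data; *p != '\0'; ++p)
    {
        if (reduct.bailout.load(std::memory_order_relaxed))
            return false;             // another path finished first
        if (std::strncmp(p, match, len) == 0)
        {
            reduct.bailout = true;    // a true result cancels the others
            return true;
        }
    }
    return false;
}

// The analog of: if(foo("ABC") (||) foo("123")) foundIt();
bool foundEither(const char* data)
{
    OrReduct reduct;
    auto rhs = std::async(std::launch::async, foo, data, "123", std::ref(reduct));
    bool lhs = foo(data, "ABC", reduct);
    bool rhsResult = rhs.get();       // join; a true lhs has already set bailout
    return lhs || rhsResult;
}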

Asaf Shelly:

Hi Jim,

It does add some complexity, but using (A || B) you can maintain sequencing and using (A (||) B) you can get faster results. I am guessing it would be possible to combine both, for example:
If (A || (B (||) C)) …
I am not sure whether or not this would be good for application flow management, like this:
Load_String() || (Search_For_ABC() (||) Search_For_123());
It might work, assuming that functions return false or zero for failure and true or non-zero for success.

From what I understand you would expect (A (||) B (||) C) to behave like this:
Start A, Start B, Start C
Wait for any of: A, B, C (whichever completes)
If all returned false then the result is false
If the one that returned now is true then result is true
else go back to wait

For AND operations it would probably be (A (&&) B (&&) C):
Start A, Start B, Start C
Wait for any of: A, B, C (whichever completes)
If all returned true then the result is true
If the one that returned now is false then result is false
else go back to wait

With that we can say that we just let incomplete operations continue and ignore their results, so if evaluating A was enough we continue to the next statement and let B and C complete in the background.
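That wait-for-any loop can be approximated in portable C++; here is a sketch (polling, since the standard library has no true wait-any primitive):

#include <chrono>
#include <future>
#include <vector>

// Returns true as soon as any task completes with true, and false once all
// have completed with false. Stragglers keep running in the background until
// the caller destroys their futures.
bool parallel_any(std::vector<std::future<bool>>& tasks)
{
    std::vector<bool> done(tasks.size(), false);
    size_t remaining = tasks.size();
    while (remaining > 0)
    {
        for (size_t i = 0; i < tasks.size(); ++i)
        {
            if (done[i])
                continue;
            if (tasks[i].wait_for(std::chrono::milliseconds(1)) ==
                std::future_status::ready)
            {
                done[i] = true;
                --remaining;
                if (tasks[i].get())
                    return true;   // the first true decides the OR
            }
        }
    }
    return false;                  // all of A, B, C returned false
}

The (&&) version is the mirror image: return false on the first false, and true once all complete with true.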

Would the following function prototype be enough according to the description above?
bool Foo()
{
Work work work…
return( true / false )
}

jimdempseyatthecove:

Asaf,

Assuming you have a syntax in C/C++ that specifies concurrency is permitted (required) in Boolean expressions, your (||) for example, the use of that syntax could also have the following characteristics:

if(….(||)….) ….

Would not only imply that the left and right … expressions/calls can occur concurrently or in either order, but would also imply (require) that the results be accumulated into an implicit reduction variable, whose identity is reserved for the if(…) encapsulated expression.

Furthermore, the reduction variable (for Boolean) is initialized to “undefined” and is monitored by each (all) participating functions in the if(expression). When used in an .OR. (||) expression, the functions that are not first with the result may optionally bail out early. This also requires the called functions to be declared such that they can be called with or without the reduction variable.

// or declspec(reduction_terminable)
#pragma reduction_terminable
bool foo()
{
statement;
statement;
}

Function calls to foo will then create a hidden pointer to the reduction variable (initialized to “undefined” by the caller). Such marked functions would consult the hidden reduction variable reference/pointer at compiler-determined points. This is not so different from the compiler deciding to inline or to unroll loops, and is similar to structured exception handling (throw(bailout)).

The hidden reduction variable would also contain information as to whether the result is used in an .OR. (||), an .AND. (&&), or some other Boolean expression. That is to say: does an early result from another thread terminate your function's execution or not?

Asaf Shelly:

jfifield is making a valid point. I am not sure I was after a language-specific solution. I guess it is possible to use OR differently in certain languages. How about adding another language extension so that we get conditional lambda expressions?

If (CallFunc_A() (||) CallFunc_B())

Now I am asking myself whether or not the compiler should start evaluating all conditional lambda expressions when entering the parent scope, for example:

If (CallFunc_A() (||) CallFunc_B())
{
If (CallFunc_C() (||) CallFunc_D())
{ ; }
}
else
{
If (CallFunc_E() (||) CallFunc_F())
{ ; }
}

Can the code above start evaluating all calls (A through F) when entering the function and then wait for the results only when required to do so?

Asaf Shelly:

I guess that it is going to be cheaper to add cores than to add code that verifies whether or not to keep going.
Sounds like a good way to make sure that upgrading to a CPU with more cores would increase performance... I can't find any use for that :)

Igor Levicki:

If A is still executing, and if it is a short operation (which you could figure out from, say, the loop trip count), then I would just let it run but stop waiting on it. Otherwise you would just be using another core/thread to cancel it, which may end up more costly in terms of performance (OS calls, context switches, page faults, cache cooldowns, etc.).


