cross-join in CnC

cross-join in CnC

Hello,I am using CnC in my PhD research, as a compilation-target for a dataflow quantum simulator.During this work it is common to create the 'cross join' product of two (gigantic) sets, representing the tensor- or kronecker product of two vectors.What would be the recommended way to do this efficiently, given that each set is a simple item collection with an associated tag collection? Currently I can think of two ways, each of which seems to suspend or depend too much.a) a generator step (prescribed by a single tag element) that generates all pairs as tags. A compute step prescribed by the pairs will get the two items and multiply.

(gen_pairs) -> ;[vec1] -> (product) -> [vec_result];[vec2] -> (product);(gen_pairs) :: ;(product) :: ;

cons: Possibly suspends the product step size(v1) * size(v2) times.This could perhaps be mitigated by using the Tuner to declare a dependency for (product) to [vec1] and [vec2], as sizes are known beforehand. But I'm not sure of queuing up size(v1) * sive(v2) dependencies up front is a good idea.
b) similar to a), but letting the pair generated be prescribed by a tag collection associated with one of the two vectors

...(gen_pairs) ::

cons: still suspends in the worst case (size(v1) - 1) * size(v2) steps. By using the tuner again for (product) we can perhaps lower the number of suspended steps by depending on v2 elements. This queues up size(v2) dependencies, which is lot better but can still be large.Is there an elegant solution to this cross-join pattern? One that retains the inherit parallelism, but stays efficient.Thanks in advance!-Yves VandriesscheSoftware Languages LabVrije Universiteit Brussel

8 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Glad to hear that you are using CnC!

Your instinct about the efficiency of large numbers of dependents is correct, in the 0.6 CnC runtime system. It's also a great idea to have CnC patterns for all sorts of computations. This is on our list of things to do.

Another thing to keep in mind is the granularity of the CnC step code. In the 0.6 update, the steps should be fairly coarse grain: a big chunk of calculations.

I'm not familiar with the mathematics that you're talking about "tensor product"

From reading on wikipedia, i think the serial computation for tensor product is something like this:

for i in 1..m {

for j in 1..n {

result[i,j] = Func( v1[i] , v2[j] );



As a first try, you might try calculating the entire top row of the result matrix in a single step instance, and so forth. The tag value could be the row index. The loop across the result matrix becomes the body of the jLoop step code.

; // tag declaration

[MatrixRow resultRow ];

env -> ; // The environment (~~main program) produces tag values 1..m

:: ( jLoop ) ; // Each i tag value prescribes a jLoop step to execute

(jLoop) -> [resultRow] ; // Each jLoop step produces a row

[resultRow] -> env; // the environment consumes all the rows

Since V1 and V2 are effectively constant, the jLoop could use the values from the environment (shared memory only). If you're going to use distributed CnC, you might put V1 and V2 into an Item collection.

The CnC runtime API has a way to specify usage counts, so that all the "resultRow" will be removed from the Item collection and the storage deallocated. The 0.6 sample code fib_getcount has an example of how to do that.

Let us know how it goes!

Ifyour environment produces the [vec?]-items (and that's what I assume from your description) then there should be no step-suspensions at all. All items have been produced before the tags are put, so every (product)-step can execute andfinishimmediately.
The same is true if (gen_pairs) puts the [vec?]-items and thenthe -tags.

If neither of the above is true, could you please let us know where the [vec?]-items are produced?

Thank you for the speedy response!Some clarifications:The Kronecker product of vectors which I am confronted with works rather like this:

vec[a b] X vec[c d e f] = vec[a*c a*d a*e a*f b*c b*d b*e b*f]

Such that in essence you get a vector where every element from one set has been multiplied with another.The elements of these vectors are in the middle of a pipeline, no actual elements of these vectors come from the environment.Each element of the vectors is not guaranteed to be put into the item collection in sequence. The computations in my pipeline perform several permutations so that a typical trace of the vector's item collectionmight be:


I was already afraid that my fine-grained approach wouldn't perform well. After everything 'works' I will try to tune everything to a be a bit more coarse. Are there any planned cnc-compiler optimization? Something in the line of the StreaMIT compiler that would fuse together parts of the graph.

I generated code similar to method b) in my original post. With a bigger dataset this causes the program to end prematurely, with no error code at all.My hunch is that it generates too many suspended steps. Is there a way to actually see what is going on?

I am sorry to see you are having problems.
You can try 2 things: First compile with NDEBUG being defined and link against the debug library. If that doesn't give you more insight, try to enable tracing
CnC::debug::trace( , "name" );
(preferabley in the constructor of your context.

I am sorry for the topic change, my intent was not to start a debugging thread.

As far as I can see, there is no debug version of supplied with the installation. Doing the above with the tbb debug library linked does not return any new information.

I was already using trace: trace_all and trace() on all steps. The program simply stops program execution and ends gracefully without any errors, somewhere in the middle of the graph.

> grep -c 'item not ready' log


Steps created( 287848 )

Steps scheduled( 337020 ) inflight( 151040, 0 )

Steps requeued( 200212 ) resumed( 49172 )

Hello Yves,
We'd like to get in touch with you directly, to see if we could help debug the problem.
You could send us your contact info directly by replying with a "private post"
(There is a radio button "Mark this post Private" -- select "yes")

Leave a Comment

Please sign in to add a comment. Not a member? Join today