Glad to hear that you are using CnC!

Your instinct about the cost of large numbers of dependencies is correct for the 0.6 CnC runtime system. It's also a great idea to have CnC patterns for all sorts of computations; this is on our list of things to do.

Another thing to keep in mind is the granularity of the CnC step code. In the 0.6 update, steps should be fairly coarse-grained: each one a big chunk of calculation.

I'm not familiar with the mathematics of the tensor product that you're describing.

From reading Wikipedia, I think the serial computation for a tensor product is something like this:

    for i in 1..m {
        for j in 1..n {
            result[i,j] = Func( v1[i], v2[j] );
        }
    }

As a first try, you might calculate the entire top row of the result matrix in a single step instance, then the second row in another, and so forth. The tag value could be the row index. The inner loop (over j) then becomes the body of the jLoop step code.

    <int i>;                   // tag declaration
    [MatrixRow resultRow];
    env -> <i>;                // The environment (roughly, the main program) produces tag values 1..m
    <i> :: (jLoop);            // Each i tag value prescribes a jLoop step to execute
    (jLoop) -> [resultRow];    // Each jLoop step produces a row
    [resultRow] -> env;        // The environment consumes all the rows
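To make the row-per-step idea concrete, here is a minimal sketch in plain C++. It deliberately does not use the CnC API: `jloop_step`, `tensor_product_by_rows`, and the `Row` alias are hypothetical names for illustration, `func` is assumed to be plain multiplication, and a simple loop stands in for the runtime scheduling one step instance per tag. Indices are 0-based here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Hypothetical stand-in for Func in the serial loop; assumed to be multiplication.
double func(double a, double b) { return a * b; }

// Body of one jLoop step instance. The tag is the row index i, and the
// step computes the whole row result[i][0..n-1] in one coarse-grained chunk.
Row jloop_step(std::size_t i, const std::vector<double>& v1,
               const std::vector<double>& v2) {
    assert(i < v1.size());
    Row row(v2.size());
    for (std::size_t j = 0; j < v2.size(); ++j) {
        row[j] = func(v1[i], v2[j]);
    }
    return row;
}

// Stand-in for the environment: it produces tags 0..m-1 and collects the rows.
// Each call to jloop_step is independent of the others, which is what would
// let a runtime execute the step instances in parallel.
std::vector<Row> tensor_product_by_rows(const std::vector<double>& v1,
                                        const std::vector<double>& v2) {
    std::vector<Row> result;
    result.reserve(v1.size());
    for (std::size_t i = 0; i < v1.size(); ++i) {
        result.push_back(jloop_step(i, v1, v2));
    }
    return result;
}
```

With this decomposition there are only m step instances (one per row) rather than m*n (one per pair), and each step reads v1 and v2 directly instead of waiting on individual items.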

Since v1 and v2 are effectively constant, the jLoop step could read them directly from the environment; this works in shared memory only. If you're going to use distributed CnC, you might put v1 and v2 into an item collection instead.

The CnC runtime API has a way to specify usage counts, so that each "resultRow" item is removed from the item collection and its storage deallocated once it is no longer needed. The 0.6 sample code fib_getcount has an example of how to do that.
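The idea behind usage counts can be illustrated with a toy container in plain C++. This is only a sketch of the concept, not the CnC API: the class `CountedItems` and its `put`/`get`/`size` members are hypothetical, and the real mechanism is the one shown in the fib_getcount sample.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

// Toy item collection with per-item usage counts (illustration only).
class CountedItems {
public:
    // Store a row under `tag`, declaring that it will be read `get_count` times.
    void put(int tag, std::vector<double> row, int get_count) {
        assert(get_count > 0);
        items_[tag] = Entry{std::move(row), get_count};
    }

    // Return the row for `tag`. After the last declared read, the entry is
    // erased so its storage can be reclaimed.
    std::vector<double> get(int tag) {
        auto it = items_.find(tag);
        if (it == items_.end()) throw std::out_of_range("item not available");
        Entry& e = it->second;
        if (--e.remaining == 0) {
            std::vector<double> row = std::move(e.row);
            items_.erase(it);
            return row;
        }
        return e.row;  // copy; the item stays until its count reaches zero
    }

    std::size_t size() const { return items_.size(); }

private:
    struct Entry {
        std::vector<double> row;
        int remaining;
    };
    std::map<int, Entry> items_;
};
```

For the resultRow collection, each row is read once by the environment, so a count of 1 per row would fit: the row disappears as soon as it has been consumed.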

Let us know how it goes!

## cross-join in CnC

Hello,

I am using CnC in my PhD research, as a compilation target for a dataflow quantum simulator. During this work it is common to create the 'cross join' product of two (gigantic) sets, representing the tensor or Kronecker product of two vectors.

What would be the recommended way to do this efficiently, given that each set is a simple item collection with an associated tag collection? Currently I can think of two ways, each of which seems to suspend or depend too much.

a) A generator step (prescribed by a single tag element) that generates all pairs as tags. A compute step prescribed by the pairs will get the two items and multiply.

cons: Possibly suspends the product step size(v1) * size(v2) times. This could perhaps be mitigated by using the Tuner to declare a dependency for (product) on [vec1] and [vec2], as sizes are known beforehand. But I'm not sure if queuing up size(v1) * size(v2) dependencies up front is a good idea.

b) Similar to a), but letting the pair generator be prescribed by a tag collection associated with one of the two vectors.

cons: Still suspends, in the worst case, (size(v1) - 1) * size(v2) steps. By using the tuner again for (product) we can perhaps lower the number of suspended steps by depending on v2 elements. This queues up size(v2) dependencies, which is a lot better but can still be large.

Is there an elegant solution to this cross-join pattern? One that retains the inherent parallelism, but stays efficient.

Thanks in advance!

-Yves Vandriessche
Software Languages Lab
Vrije Universiteit Brussel