What happens with duplicate tag values?

What happens with duplicate tag values?

Portrait de Clay Breshears (Intel)

I was playing around with one of the CnC sample codes and found that if I have duplicate identifiers (tag components?) within an item collection, all but one of the items put into the collection are inaccessible. My code was taking a single integer pair (N, M), generating N steps, which in turn generate M steps each, and compute a simple value. For the results item collection, I was using the values of the second step (1..M) as tag components. The execution yielded N results, not N*M as expected.

I assume that there is a hash function used for item collections. If so, how are duplicates handled? Is there a good method for creating unique tag components for elements that are put into an item collection? One that might not require some "state" information with all the items or prescribing tags for step collections. (An obvious way in my example is to know the maximum value of N or M and use N*(maxM)+M or something similar.)

Thanks.

--clay

6 posts / 0 nouveau(x)
Dernière contribution
Reportez-vous à notre Notice d'optimisation pour plus d'informations sur les choix et l'optimisation des performances dans les produits logiciels Intel.
Portrait de Melanie Blower (Intel)

Hi Clay,

I'm glad to hear that you are still experimenting with CnC!

It sounds like the program you created was basically erroneous, but the error condition was unchecked -- it was just ignored. As I understand it, previous versions of CnC checked for the error condition, but the 0.6 update doesn't check for this error.

You could modify your program to use a tuple as the tag value e.g. std::pair using (i,j) as thetag value instead of a simple int.

There are some choices here: we could define an error return code for the item_collection::put() signalling that a duplicate put has occurred. This kind of checking incurs a runtime overhead. Should we only enable this kind of checking when in debug mode?Alternatively, if the same item value was put multiple times -- should that be an error? Also note that, with the get_count functionality, the item value will be removed from the item collection after the get_count has been reached. In that case, if the item value for that particular tag was put again, would there be -- should there be -- any constraint on the item value (e.g. should it be the same value as when it was originally put?)

We don't have any method for automatically generating a hash function -- but our team has been discussing this idea recently.

Best regards
--Melanie

Portrait de Clay Breshears (Intel)

I got around the problem by using a pair tag.

Is there a (documented) restriction for duplicate tag or tag components? I can think of situations where it would be easier and more natural to the original calculation to perform the same computations over. (The recursive algorithm for computing Fibonacci numbers comes to mind.) If the same computation arises from a step that is then supported by a different "instantiation" of the step collection, does the programmer need to come up with a unique tag and tag component in order to distinguish the current computation from the same computation created by some other step instantiation? If so, that adds another layer of complexity to the domain expert's job of translating from serial to CnC, IMO.

If there is some underlying reason that tag component for items cannot have duplicates, then that restriction needs to made well-knownin the CnC documentation. I don't thinkit should be an error to have such duplicates (in the abstract).Ifmy computation reaches a similar state in two different step executions and puts outtwo similarly tagged items, I would expect thatboth of the itemswould be subsequentlyget'ted and used.

If there is a fundamental restriction in the library that cannot be overcome, maybe CnC should include a tag generatorfunction.Then, when I'm computing Fibonacci numbers, I can eschew the simpleinteger parameter (N)as a tag, which would lead to many duplicates,and call the genTag() method to get a unique tag for the item and the tag to trigger the subsequent computation on that item.

--clay

Portrait de Melanie Blower (Intel)

This concept is covered in CnC_tutorial.pdf, I'llpaste the text here:

1.2.4.3 Rules for valid step code

IN GENERAL

A step may not reference (read or write) any global values.

- it accesses global state only by reading its tag or by getting items

- it modifies the global state only by putting items or tags

Gets and puts in the steps must be consistent with the associated program graph

The steps adheres to dynamic single assignment (during execution, the program never generates multiple instances of an item with the same name and tag but different contents.)

You can think of an item collection as an array of values, and the index to this array is a function of the tag value that the step is executed with. Any element in this array should be assigned only one time. That's called 'dynamic single assignment'. (Or, if it is assigned multiple times, the same value should be assigned each time).

The CnC specification doesn't say anything about when a particular step instancewill be executed, only if the step instance will be executed. That's why there is the requirement for dynamic single assignment.

It's a kind of "golden handcuffs". If you set up your program to obey the rules, then your steps can execute in parallel, and the results are deterministic. (The item collection(s) containing the result of execution will always contain the same set of values.)

--Melanie

Portrait de Clay Breshears (Intel)
Quoting Melanie Blower (Intel)

This concept is covered in CnC_tutorial.pdf, I'llpaste the text here:

1.2.4.3 Rules for valid step code
. . .

You can think of an item collection as an array of values, and the index to this array is a function of the tag value that the step is executed with. Any element in this array should be assigned only one time. That's called 'dynamic single assignment'. (Or, if it is assigned multiple times, the same value should be assigned each time).

D'oh! There it is in black and white. My bad. I must admit that Iwasn't too focused when I read that section (or understood what"dynamic single assignment" meant,even though it is defined right there). Consider expanding these rules in a future edition of the tutorial document. The array analogy you quote here is a good explanation of how a programmer could think of things like this and avoid the problem.

Even with that, I would still advocate some CnC tag generation function. In a tree traversal problem where the leaves of the tree are independent computations I can have two phases of computation, generating child nodes from a parent and computing at the leaves. If the identity of a node (and state of thetree traversal)can be detected by the contents of the node/item itself, then each node (parent or leaf) is practically identical. How do I tag each node uniquely?

I could somehow encode the (unique) path to the node that was traversed. There are some schemes for this used in the CnC tutorial, but these have space limitations and, therefore, tree depth limits. What about cases that have more than 2 children per parent, like a game tree? I can see that the devising (and coding and debugging)of a suitable tagging scheme could overshadow the other (more important) development considerations. A simple global counter could be used in this case, but updating the shared variables that would need go against the other Rules. So, why not include one in Concurrent Collections?

I realize that my Fibonacci example isn't the best argument for adding a tag generator. That computation needs to retrace the computation tree to deliver a final answer and "random" tags would not hold that information. The same would go for something like a minimax game tree evaluation since parents are needed to pass evaluation scores back up the tree.

For something like depth- or breadth-first search, the current node and the previous nodes visited could be used to tag. However, there can be many different paths through the same nodes to arrive at a given "goal" node. If the computation needs to explore all paths and those paths are unique to the computation, I need a more sophisticated tagging structure, which could be more complex than the computation I'm trying to create. On the other hand, if all the nodes visited and current state of the search were in the data part of an item collection, a simple unique tag easily differentiates all similar states without burdening the programmer to develop a method or data structure to encode all the unique possibilities.

I can see many simplifying ways to use a tag generating API in CnC. It would require some rules for correct use, like the tag value couldn't be used to index some global array. If the tag is opaque to the programmer, it shouldn't be less of a problem to abuse the feature.

Just an idea to ponder on for a future release of Concurrent Collections.

--clay

Portrait de Melanie Blower (Intel)

Thanks for your suggestion, Clay. The CnC team agrees that finidng a suitable control tag mapping can be tricky in certain programming problems. We'll ponder this issue!

It turns out that, for shared memory (not distributed memory!) the address of the item can be used as the tag value.

There's detailed information about tagtypes in the CnC Runtime API that's available online. Here's a link:
"http://software.intel.com/en-us/sites/whatif/runtime_api/class_cn_c_1_1t...">
tags

Connectez-vous pour laisser un commentaire.