non-destructive consumption of item instances?

Hi,

I have a question about the FaceDetection example in the CnC sample directory. The three step collections, C1, C2 and C3, consume items from one item collection, "image", but are controlled by different tag collections. From the term "consumption" used in the tutorial, I assumed that when C1 "gets" an item from the "image" collection by invoking c.image.get(t, image_item), the retrieved item would no longer exist in the item collection. But apparently the item data is still there, and can be retrieved again when C2 is prescribed a matching tag and executes c.image.get(t, image_item). In my understanding, items are not exactly "consumed" by get() calls, and can be retrieved any time a step collection receives a matching tag; is this correct? I guessed so because when I tried to put the consumed data item back into the collection with the same tag (right after c.classifier2_tags.put(t) in Classifier1::execute(...)), I got a warning message saying "Warning: multiple assignments to same tag ( ... )". It'd be great if you could correct me if I'm missing something here and clarify the semantics of get() calls for me.

Thanks,

 Hyojin


Hi Hyojin,
You are right, "consuming" is by default non-destructive and you can issue as many gets on the same item instance as you want. In general, it is important to know that, in the abstract, the CnC model treats items
1. by value (and not by (object-)reference) and
2. as dynamic single assignment (DSA).
That's why the runtime issues the warning you see: every item instance (identified by a tag) can be written only once (but it can be consumed multiple times). Hence, whenever an item is semantically altered and should be put for other steps to consume, the altered "value" needs to be put with a different, unique tag.
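To make the put-once/get-many semantics concrete, here is a small toy model in plain C++. This is not the CnC API, just a sketch of the contract described above: a second put on the same tag is rejected with a warning, while get() never removes the item.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Toy model of an item collection with dynamic-single-assignment
// semantics (NOT the CnC API, just an illustration): each tag may
// be written once, but read any number of times.
template <typename Tag, typename Item>
class toy_item_collection {
    std::map<Tag, Item> store_;
public:
    // put() succeeds only for a fresh tag; a second put on the same
    // tag mimics the "multiple assignments to same tag" warning.
    bool put(const Tag& t, const Item& i) {
        bool fresh = store_.insert({t, i}).second;
        if (!fresh)
            std::fprintf(stderr, "Warning: multiple assignments to same tag\n");
        return fresh;
    }
    // get() is non-destructive: the item stays in the collection,
    // so any number of step instances can retrieve it.
    bool get(const Tag& t, Item& out) const {
        auto it = store_.find(t);
        if (it == store_.end()) return false;
        out = it->second;
        return true;
    }
};
```

In this model both C1 and C2 can call get() on tag t and see the same value, while a put() of an "altered" item under the same tag fails, matching the warning you observed.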

BTW:
There is a garbage collection facility in the tuners called get_count. If this is specified, the runtime will delete the item after it has been "consumed" that many times. The tutorial (http://software.intel.com/sites/landingpage/icc/api/tutorial.html) explains this in more detail with an example. Some of the provided, more complex examples (like RTM and cholesky) combine smart pointers and tuner::get_count to minimize the memory footprint and avoid unnecessary copying.

Does this help?

frank

Thank you for the detailed response! I am exploring how existing pipeline-parallel programs using buffers/queues can be expressed in Intel CnC, so non-destructive retrieval of data items was rather new to me. I have one more question: how can I limit the number of items processed in parallel for a certain step? For example, regardless of how many items are ready, I'd like to process them one by one in a serialized fashion. I looked at the tuner interface, but couldn't find a way to limit the parallelism for an individual step collection. Thank you in advance!

The tuners don't have such a feature yet. We are designing a more powerful tuning language, but that is also not yet available.

There are certainly ways to achieve what you want within the "domain" code. I don't know exactly what your code is doing (e.g. what is the relation between the data and a step-tag, and how do they get produced?), but I would really like to understand your motivation for wanting serial execution. Can you say a few words?

In any case, here are some random thoughts:
- using env-var CNC_NUM_THREADS=1 at runtime fully serializes the execution
- putting the next step-tag within the step itself just before exiting serializes execution of a certain step-collection
- an "artificial" producer-consumer relationship between two steps also forces the runtime to serialize
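The self-prescription idea above can be sketched with a toy model (plain C++ standing in for the CnC runtime and tag collections): because a step puts its successor tag only just before it finishes, at most one instance of that step collection is ever ready.

```cpp
#include <deque>
#include <vector>

// Toy illustration (NOT the CnC runtime) of serializing a step
// collection by having each step put the next step-tag itself.
std::deque<int> tag_queue;   // stands in for the step's tag collection
std::vector<int> order;      // records the order steps actually ran in

void step(int t, int last) {
    // ... do the real work for tag t here ...
    order.push_back(t);
    if (t < last)
        tag_queue.push_back(t + 1);  // prescribe the successor only now
}

void run(int last) {
    tag_queue.push_back(0);          // the environment puts only the first tag
    while (!tag_queue.empty()) {     // at most one tag is ever pending,
        int t = tag_queue.front();   // so execution is fully serialized
        tag_queue.pop_front();
        step(t, last);
    }
}
```

In real CnC code the analogue would be the step's execute() doing a put on its own prescribing tag collection as its last action, while the environment seeds only the first tag.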

Quote:

how can I limit the number of items processed in parallel for a certain step? For example, regardless of how many items are ready, I'd like to process them one by one in a serialized fashion. I looked at the tuner interface, but couldn't find a way to limit the parallelism for an individual step collection. Thank you in advance!

That's what I was trying to do with a "Log" class that processes each logged message in a delayed step (so as not to block the execution of the program): a message step can start only when the preceding message has been processed.
I did the trick with a step_tuner to set up the dependency, a dummy item_collection to populate the dependency, and an item_tuner to pop the message queue. The complete code is attached (there's some code / defines related to my framework, but nothing complicated).
Yes, I know it's not parallelized code, but it will be part of a much wider task-based system, and in this case it makes sense!

Attachments:
usullog.h (3.02 KB)
usullog.cpp (2.76 KB)
