Intel Concurrent Collections for C/C++

We have just released Intel Concurrent Collections for C/C++ and are interested in your feedback. Please take a moment to download and install it, and let us know what you think.

What are your thoughts on this parallel programming model? How does it compare with other techniques you have used?
Features of Intel Concurrent Collections for C/C++. What missing features are important to you?
Syntax of the Intel Concurrent Collections for C/C++ textual notation
Ease of use of the current implementation. What did you find difficult to use or to understand?
Are there restrictions that present problems for your application domain?
What did you like?

I love this idea!

But why make it a runtime environment? How about translating the textual notation into a new C/C++ program parallelized with OpenMP/MPI? We could then use our own favourite compilers to compile the generated code, and we would not be confined to a specific OS or architecture.

Good idea. Our current runtime is a thin layer over Intel Threading Building Blocks. We could work to make the system produce a C++ program that could run directly on top of TBB. We chose TBB because its work-stealing task scheduler is a good match for Concurrent Collections steps.

Note that our Concurrent Collections translator does produce a C++ source file which you include in your program. It also builds a "hints" file to give you advice on how to structure the code for your steps.

This is a good idea. One of our goals is to separate the concerns of the domain expert from issues of tuning. This separation allows us to support a wide variety of execution models, including C/C++ parallelized with OpenMP/MPI. The same Intel Concurrent Collections program should work on all the execution models.

Here is some explanation of how Intel Concurrent Collections works which may provide you with some insight into the benefit of using this model.

In Intel Concurrent Collections terms, we call the part of the application to be executed in parallel a graph, which is analogous to a white-board design sketch of the tasks with their input and output. A graph may be associated with the whole application or just part of it. A graph specifies a collection of steps (tasks), items (data), tags (task or data IDs) and the relations among these components.

Steps are tasks that might run in parallel and they correspond to functions in the code. In the Intel Concurrent Collections model there are no explicit invocations of these functions in the C++ source program (in contrast to a serial program). Rather, the executions of these functions are invoked and scheduled by the Intel Concurrent Collections runtime. The user initiates parallel execution of the graph via a runtime call. The runtime invokes a step when its controlling tag (essentially an iteration index) is available. How is a step-tag made available? A step-tag becomes available when the user does a put of the tag via a runtime call. A put of the tag may be done either in the serial code before the parallel execution starts, or by a step code during the parallel execution. The application needs to put at least one step-tag before the parallel execution starts; otherwise there would be no steps to execute.

As an example of generating the step-tags up front, consider this loop in the serial code:

for (i = 0; i < n; i++) {
    step1(...);
}

The corresponding CnC-style code does not have the explicit call to step1. Instead there is a loop to put the step-tags, then a call to execute the graph:

for (i = 0; i < n; i++) {
    // We choose to use the loop index as the tag value
    myGraph.stepTag1.Put(i);
}

myGraph.run();
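To make the put-then-run pattern concrete, here is a minimal toy model in plain C++. It does not use the CnC runtime; ToyTagCollection, its Put and run members, and sum_of_squares are names made up for this sketch, and the "runtime" here simply executes the step instances serially.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a tag collection that prescribes a single step.
// This is NOT the CnC runtime API; all names here are illustrative.
class ToyTagCollection {
public:
    explicit ToyTagCollection(std::function<void(int)> step)
        : step_(std::move(step)) {}

    // Putting a tag only records it; no step code runs yet.
    void Put(int tag) { pending_.push_back(tag); }

    // Execute one step instance per recorded tag. A real runtime is free
    // to run the instances in parallel and in any order; this sketch runs
    // them serially in put order.
    void run() {
        for (std::size_t k = 0; k < pending_.size(); ++k) {
            const int tag = pending_[k];  // copy the tag before calling the step
            step_(tag);
        }
        pending_.clear();
    }

private:
    std::function<void(int)> step_;
    std::vector<int> pending_;
};

// Hypothetical driver mirroring the loop above: the step squares its tag
// and accumulates the result.
int sum_of_squares(int n) {
    int total = 0;
    ToyTagCollection stepTag1([&total](int i) { total += i * i; });
    for (int i = 0; i < n; i++) {
        stepTag1.Put(i);  // loop index used as the tag value
    }
    stepTag1.run();
    return total;
}
```

Note that the step body is never called by name from the driver; it only runs because tags were put and the graph was run.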

Each step is conceptually atomic, and each step instance has a unique step-tag. A step performs computations which may use values from the step-tag and local variables, as well as data input to the step (each data instance is also associated with a unique data-tag). The availability of the input data is another controlling factor for the execution of the step, aside from the availability of the step-tag. A step should get (using runtime calls) all of its input data at the beginning. If an input item is not available, the runtime requeues the step until that item becomes available.
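The requeue behaviour can be illustrated with a small serial simulation. This is only a sketch of the idea, not how the CnC runtime is implemented: ToyItems and run_chain are hypothetical names, the scheduler is a simple queue, and it assumes every missing input is eventually produced (otherwise the loop would never finish).

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Toy item collection: Get reports whether the item has been put yet.
// Illustrative only; not the CnC item collection interface.
struct ToyItems {
    std::map<int, int> data;

    void Put(int tag, int value) { data[tag] = value; }

    bool Get(int tag, int& out) const {
        std::map<int, int>::const_iterator it = data.find(tag);
        if (it == data.end()) return false;  // input not available yet
        out = it->second;
        return true;
    }
};

// Step instance `tag` reads item tag-1 and produces item `tag`.
// Returns the order in which step instances completed and counts how
// many times an instance had to be requeued for a missing input.
std::vector<int> run_chain(std::deque<int> tags, ToyItems& items, int& requeues) {
    std::vector<int> completed;
    requeues = 0;
    while (!tags.empty()) {
        const int tag = tags.front();
        tags.pop_front();

        int input = 0;
        if (!items.Get(tag - 1, input)) {
            tags.push_back(tag);  // input missing: try this instance again later
            ++requeues;
            continue;
        }
        items.Put(tag, input + 1);  // the step's output item
        completed.push_back(tag);
    }
    return completed;
}
```

Even if the tags are put in the order 3, 2, 1, the instances complete in the order 1, 2, 3, because each one can only proceed once its input item exists.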

A step may output data, which can either be used by other steps or by the caller of the graph. A step may also output step-tags to drive the execution of other steps.

Using step-tags and data availability to control parallel execution therefore satisfies the dependence relations among the tasks (control dependences and data dependences, respectively) without imposing arbitrary ordering constraints on them.

The beauty of the Intel Concurrent Collections model also lies in the fact that for the application writer, the knowledge of the tasks and the input and output for the tasks is inherent to the application algorithm, which the application writer already is familiar with.

To help the user in using the Intel Concurrent Collections runtime interface, a simple graph textual notation was designed for the user to represent the high level components in the graph. In this textual notation, the user describes in a separate file (suffixed .cnc) the name of the steps and their controlling step-tags and the producer and consumer relations of the data for the steps. The provided translator converts the textual representation into a header file, which makes it simple to use the runtime interface. The generated header file contains definitions of the classes for the user graph, steps, tags, and items derived from base classes in the Intel Concurrent Collections runtime. A generated coding-hints file also makes it easy for the user to write the step code interface, and shows how to execute the graph.

Use of the Intel Concurrent Collections model requires that all steps follow the single assignment rule with regard to the output data. Violations of the rule need to be fixed before the Intel Concurrent Collections model can be used. Typically, most of the violations are readily identified when the user describes the producer/consumer relations in the graph. Frequently a violation can be fixed by adding another tag component to the data-tag (the same concept as adding a data dimension).
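As a rough illustration of the single assignment rule and of the "extra tag component" fix, consider a value that is updated on every iteration. The sketch below is plain C++ with made-up names (SingleAssignment, relax_with_cell_tag, relax_with_iteration_tag); it models an item collection as a map that refuses to overwrite an existing tag, which is an assumption of this example rather than documented runtime behaviour.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Toy single-assignment item collection: Put refuses to overwrite.
// Illustrative only; not the CnC item collection interface.
template <typename Tag>
class SingleAssignment {
public:
    // Returns false when the tag already holds a value (a rule violation).
    bool Put(const Tag& tag, double value) {
        return items_.insert(std::make_pair(tag, value)).second;
    }
    double Get(const Tag& tag) const { return items_.at(tag); }

private:
    std::map<Tag, double> items_;
};

// Violation: the same data-tag (cell 0) is written on every iteration.
// Returns true only if no Put was rejected.
bool relax_with_cell_tag(int iterations) {
    SingleAssignment<int> cell;
    bool ok = cell.Put(0, 1.0);
    for (int it = 1; it <= iterations; ++it) {
        ok = cell.Put(0, 0.5 * cell.Get(0)) && ok;
    }
    return ok;
}

// Fix: add the iteration number as a second tag component, so every
// produced value gets its own unique data-tag (cell, iteration).
double relax_with_iteration_tag(int iterations) {
    SingleAssignment<std::pair<int, int> > cell;
    cell.Put(std::make_pair(0, 0), 1.0);
    for (int it = 1; it <= iterations; ++it) {
        const double previous = cell.Get(std::make_pair(0, it - 1));
        cell.Put(std::make_pair(0, it), 0.5 * previous);
    }
    return cell.Get(std::make_pair(0, iterations));
}
```

The second version trades memory for determinism: every intermediate value stays addressable by its tag, so no consumer can observe a half-updated cell.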

The Intel Concurrent Collections for C/C++ team has just released version 0.2.

New with Release 0.2.0

  • Support for the Windows* Intel 64 architecture.
  • Intel Threading Building Blocks 2.1 compatibility. To use this release, you must have Intel Threading Building Blocks 2.1 installed.
  • Textual notation enhancements added, including more checking to ensure the correctness of the graph.
  • New functionality added to allow specification of the number of threads to be used when executing the graph.
  • New performance and memory usage improvements.
  • New samples added.
  • Resolution of the library path problem for Microsoft Visual Studio* 2008 sample projects.

Quoting - Aaron Tersteeg (Intel)

We have just released Intel Concurrent Collections for C/C++ and are interested in your feedback. Please take a moment to download and install it, and let us know what you think.

What are your thoughts on this parallel programming model? How does it compare with other techniques you have used?
Features of Intel Concurrent Collections for C/C++. What missing features are important to you?
Syntax of the Intel Concurrent Collections for C/C++ textual notation
Ease of use of the current implementation. What did you find difficult to use or to understand?
Are there restrictions that present problems for your application domain?
What did you like?

Intel has discovered that a subset of Microsoft Visual Studio 2008 users are encountering an unexpected error when building an Intel Concurrent Collections application. The error message is:

Translating Concurrent Collections Graph...

The system cannot execute the specified program.

Project : error PRJ0019: A tool returned an error code from "Translating Concurrent Collections Graph..."

The above error message is due to a missing side-by-side manifest for a particular version of a Microsoft ATL DLL. To verify that this is the problem, execute %CNC_INSTALL_DIR%\bin\ia32\cnc.exe in a Windows Command Prompt. If you see "The system cannot execute the specified program.", then this is the problem.

To correct the above problem, please install the Microsoft Visual C++ 2005 SP1 Redistributable Package (x86). To do this:

We are planning to fix this issue in a future CNC update.

Intel Concurrent Collections for C/C++ version 0.3 is now available!

Intel Concurrent Collections for C/C++ simplifies software parallelism by providing a mechanism to adapt a C++ program to execute in parallel while allowing the application developer to ignore complex issues of parallelism such as low-level threading constructs or the scheduling and distribution of computations.

For more information and a free download go to:

http://software.intel.com/en-us/whatif/

and click on the Intel Concurrent Collections link.

In Intel Concurrent Collections for C/C++ version 0.3 we added two new features:

1) Garbage collection by reference counting, for improving memory usage during parallel execution. The user provides a refcount function and a dealloc function for an item collection. The reference count for each item is initialized at Put and decremented at Get. The memory allocated for the item is released when all uses are done.
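The reference-counting scheme described in 1) can be sketched in a few lines of plain C++. This is a toy model, not the 0.3 API: RefCountedItems and its member names are invented for illustration, and the sketch assumes the user-supplied refcount function returns at least 1 for every tag.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>

// Toy reference-counted item collection (illustrative names only).
// The user supplies refcount(tag), the number of Gets expected for an
// item, and dealloc(item), which releases the item's memory.
class RefCountedItems {
public:
    RefCountedItems(std::function<int(int)> refcount,
                    std::function<void(int*)> dealloc)
        : refcount_(refcount), dealloc_(dealloc) {}

    // The count is initialized when the item is put.
    void Put(int tag, int* item) {
        Entry e = { item, refcount_(tag) };
        entries_[tag] = e;
    }

    // Each Get returns a copy of the value and decrements the count;
    // the last expected Get releases the item.
    int Get(int tag) {
        std::map<int, Entry>::iterator it = entries_.find(tag);
        assert(it != entries_.end());
        const int value = *it->second.item;
        if (--it->second.remaining == 0) {
            dealloc_(it->second.item);
            entries_.erase(it);
        }
        return value;
    }

    // Number of items still held by the collection.
    std::size_t live() const { return entries_.size(); }

private:
    struct Entry {
        int* item;
        int remaining;
    };
    std::function<int(int)> refcount_;
    std::function<void(int*)> dealloc_;
    std::map<int, Entry> entries_;
};
```

With a refcount function that always returns 2, an item survives its first Get and is handed to dealloc on the second, so memory is reclaimed while the graph is still running rather than at the end.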

2) Performance tuning with step priorities, for improving runtime performance. The user provides a step priority function whose value is used by the runtime to optimize scheduling of the steps.
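To give a feel for what a step priority function does, here is a serial toy scheduler in plain C++. It is only a sketch: schedule_by_priority is a hypothetical name, and the convention that a larger value means "run earlier" is an assumption of this example, not something stated in the release notes.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the order in which a serial toy scheduler would start the step
// instances, given a user-supplied priority function over step-tags.
// Assumption for this sketch: a larger priority value runs earlier.
std::vector<int> schedule_by_priority(const std::vector<int>& tags,
                                      const std::function<int(int)>& priority) {
    // Max-heap of (priority, tag); ties are broken by the larger tag.
    std::priority_queue<std::pair<int, int> > ready;
    for (std::size_t k = 0; k < tags.size(); ++k) {
        ready.push(std::make_pair(priority(tags[k]), tags[k]));
    }

    std::vector<int> order;
    while (!ready.empty()) {
        order.push_back(ready.top().second);
        ready.pop();
    }
    return order;
}
```

For example, with tags put in the order 2, 0, 3, 1 and a priority of 10 - tag, the instances start in the order 0, 1, 2, 3. A priority function like this lets the user favour steps on the critical path without changing the graph itself.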

The new sample MatrixInverter shows how to use these new features. Your feedback is welcome.

Quoting - shin.lee

Here is some explanation of how Intel Concurrent Collections works which may provide you with some insight into the benefit of using this model.

...

Each step is conceptually atomic, and each step instance has a unique step-tag. A step performs computations which may use values from the step-tag and local variables, as well as data input to the step (each data instance is also associated with a unique data-tag). The availability of the input data is another controlling factor to the execution of the step aside from the availability of the step-tag. A step should get (using runtime calls) all input data in the beginning. If an input data is not available, the runtime requeues the step until that input data becomes available.

...

A step may output data, which can either be used by other steps or by the caller of the graph. A step may also output step-tags to drive the execution of other steps.

...

Use of the Intel Concurrent Collections models requires that all steps follow the single assignment rule with regard to the output data. Violations of the rule need to be fixed before the Intel Concurrent Collections model can be used. Typically, most of the violations are readily identified when the user describes the producer/consumer relations in the graph. Frequently a violation can be fixed by adding another tag component in the data-tag (same concept as adding a data dimension).

Consider the graph:

:: (tile);
(tile) -> ;
[data]->(tile);
(tile)->[data];

Currently there seem to be only puts and gets of Items and Tags, but it would be incredibly useful to have an operation similar to 'boolean PutIfNotExists(Tag_t tag)'. This method would return true if the tag did not exist before the Put, and false if it was already there; either way, the Tag will be there after the operation. I imagine that this would help alleviate the cost of re-queueing Steps, given that the programmer decides he wants to produce dependent tags only after some of the input data has also been produced. Such a method would also be useful for Item collections.

Is there any plan to implement this kind of feature in CnC?
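The semantics requested above can be sketched in plain C++ as follows. This is only an illustration of the proposed behaviour, not an existing CnC call: ToyTagSet is a made-up class, tags are plain ints, and a mutex stands in for whatever synchronization a real runtime would use.

```cpp
#include <cassert>
#include <mutex>
#include <set>

// Toy tag collection offering the proposed test-and-put operation.
// Illustrative only; not part of the CnC runtime.
class ToyTagSet {
public:
    // Returns true if the tag was newly inserted and false if it was
    // already present. Either way the tag is present afterwards. The
    // check and the insert happen under one lock, so two concurrent
    // callers can never both see `true` for the same tag.
    bool PutIfNotExists(int tag) {
        std::lock_guard<std::mutex> lock(mutex_);
        return tags_.insert(tag).second;
    }

    bool Contains(int tag) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return tags_.count(tag) != 0;
    }

private:
    mutable std::mutex mutex_;
    std::set<int> tags_;
};
```

The point of the single combined call is atomicity: a separate "does it exist?" check followed by a Put would leave a window in which another step could put the same tag.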

Intel Concurrent Collections for C++ version 0.4 is now available!

Intel Concurrent Collections for C++ simplifies software parallelism by providing a mechanism to adapt a C++ program to execute in parallel while allowing the application developer to ignore complex issues of parallelism such as low-level threading constructs or the scheduling and distribution of computations.

New and noteworthy in version 0.4

Linux* support, in addition to updated MS Windows* support
A new and improved runtime API

For more information and a free download, go to:

http://software.intel.com/en-us/whatif/

and click on the Intel Concurrent Collections for C++ link.

Is there a roadmap for the official release (not whatif)? And will Concurrent Collections have a license like TBB?

I would like to know if there is some roadmap for the official release too, good post. :)

Intel Concurrent Collections for C++ remains under active development and continues to evolve. We will continue to provide Intel Concurrent Collections releases via the whatif site for now. When we have a firm productization plan in place, we will share it with our users.

If you would like to provide feedback about Intel Concurrent Collections, please visit the Whatif Alpha Software Forum, via the link below, and share your thoughts in an Intel Concurrent Collections for C++ discussion thread (search for "concurrent collections" via the search box in the upper right hand corner) or create a new thread.

http://software.intel.com/en-us/forums/whatif-alpha-software/

We appreciate your continued interest in Intel Concurrent Collections!

Intel Concurrent Collections for C++ version 0.5 is now available!

Intel Concurrent Collections for C++ simplifies software parallelism by providing a mechanism to adapt a C++ program to execute in parallel while allowing the application developer to ignore complex issues of parallelism such as low-level threading constructs or the scheduling and distribution of computations.

New and noteworthy in version 0.5:

* GraphBuilder, a GUI tool to design and build an Intel Concurrent Collections for C++ graph, has been added to the Windows product. GraphBuilder translates the graph either to a .cnc file or directly to a CnC C++ header file and a corresponding codinghints.txt file. Please see the GraphBuilder User's Guide in the online documentation for information on using GraphBuilder.
* Tag-ranges and the parallel_for construct have been added.
* CnC for distributed memory (distCnC) for the socket-based communication model has been added. (See the Runtime API document for detailed information.)

For more information and a free download, go to:

http://software.intel.com/en-us/whatif/

and click on the Intel Concurrent Collections for C++ link.

Please use this forum for feedback. We appreciate all feedback, in particular feedback on the following:

1. Your thoughts on this parallel programming model. How does it compare with other techniques you have used?

2. Features of Intel Concurrent Collections for C++.

3. What missing features are important to you?

4. Syntax of the Intel Concurrent Collections for C++ textual notation.

5. Ease of use of the current implementation. What did you find difficult to use or to understand?

6. Are there restrictions that present problems for your application domain?

7. What did you like?

--Melanie
