Parallel programming extensions to C/C++

At Intel we have been wondering whether parallel programming features built into mainstream compiled languages such as C and C++ would speed up adoption of parallel programming. Parallel programs would make better use of available hardware and enable more efficient solutions to day-to-day problems. The C++ language committee has been considering parallel programming extensions, but it moves slowly. As an experiment, we've introduced some simple extensions to C/C++ that allow asynchronous execution of any statement. A compiler that supports these features is available at Intel's whatif.intel.com site: http://softwarecommunity.intel.com/articles/eng/3689.htm.

We were motivated to do this because even though multi-core processors are everywhere, programs that make use of more than one processor are relatively rare. Granted, multi-threading is in common use within the high-performance computing community. They've used parallel processing for decades using techniques such as MPI for clusters and OpenMP for multi-processor systems. On the desktop, with the availability of "hyper-threading", or logical multi-processors, most of the commonly used pre-packaged software has also been adapted to use multiple processors. However, we have not yet reached the point where all programmers designing a new program or extension to some existing software ask themselves "how do I make my program use multiple processors if they are available and do the job in parallel?"

Writing a parallel program is far more complex than writing a sequential program. Until you actually write one, you won't appreciate how true that is. There are any number of pitfalls and subtle issues when global state is modified simultaneously. But that complexity aside, there is also a lack of easy access to parallel programming features in compiled languages such as C and C++. Yes, there are APIs provided by Microsoft Windows and standard threading packages such as pthreads, but they remain outside the language, are harder to use than, say, just writing a for-loop, and are non-portable in general.

The language extensions we chose are a very small set: __parallel and __spawn for structured parallel execution in a fork-join model, a __par construct for for-loops whose iterations can be executed in parallel, and __critical for protecting against concurrent updates to shared variables.
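To make the constructs concrete, here is a rough sketch of how code using them might look. This is a hypothetical illustration based only on the descriptions above; the exact syntax accepted by the experimental compiler may differ, and the function names (`process`, `build_left`, `build_right`) are invented for the example.

```c
/* Hypothetical sketch of the experimental keywords described above;
   the exact syntax accepted by the whatif.intel.com compiler may differ. */

int sum = 0;

/* __par: the iterations of this loop may execute in parallel */
__par for (int i = 0; i < n; i++) {
    int t = process(a[i]);          /* independent per-iteration work */
    __critical { sum += t; }        /* __critical serializes the shared update */
}

/* __parallel/__spawn: structured fork-join execution */
__parallel {
    __spawn build_left();           /* runs asynchronously */
    __spawn build_right();          /* runs asynchronously */
}                                   /* implicit join at the end of the block */
```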

We're eager to have you check out the compiler and tell us what you think. Your opinion on the following would be very welcome:

    1. Are these extensions expressive enough to parallelize programs?

    2. Are they easy to use?

    3. Do they provide sufficient protection mechanisms against race conditions?

    4. Is it a good idea to build these features into the language?

    5. Do you see performance gains from using these features?




Other comments are also welcome.

For more complete information about compiler optimizations, see our Optimization Notice.

Comments

I have to go with Marc on this one; whether an otherwise identical feature gets labelled "__parallel" or "#pragma omp parallel" really falls below any reasonable threshold of interest.

There must be more important things to tackle! For example, instead of playing defense against inherently nondeterministic constructs (see point 3 above, "provide sufficient protection...against race conditions"), how about introducing deterministic mechanisms and starting to move beyond threads?


As Marc has correctly pointed out, these language extensions do the same thing as OpenMP. In fact, under the hood, we have implemented them using OpenMP, so that they share the same reliable, scalable, robust runtime support.

However, the point of these extensions was to gauge accessibility. Do keywords make sense? Should C/C++ evolve to include parallel constructs as part of the language? Or should parallelism remain “outside” the language, as a library, or optional compiler-supported feature accessed through pragmas?


OpenMP is a standard implemented by many compilers, and I believe it lets you do everything you can do with the few extensions you listed, so I don't really understand the point.


On Matthew Wolf's blog http://software.intel.com/en-us/blogs/2007/12/20/multicore-in-the-classroom-say-it-three-times/ he discusses multi-core in the classroom. His blog and yours got me thinking about the connection. In my opinion, teaching parallel processing will be more straightforward and more quickly integrated into the core curriculum if instructors can simply focus on the code and not the integration of multiple pieces of technology.