At Intel we have been wondering whether parallel programming features built into mainstream compiled languages such as C and C++ would speed up adoption of parallel programming. Parallel programs would make better use of available hardware and enable more efficient solutions to day-to-day problems. The C++ language committee has been considering parallel programming extensions, but it moves slowly. As an experiment, we've introduced some simple extensions to C/C++ that allow asynchronous execution of any statement. A compiler that supports these features is available at Intel's whatif.intel.com site: http://softwarecommunity.intel.com/articles/eng/3689.htm.
We were motivated to do this because even though multi-core processors are everywhere, programs that make use of more than one processor are relatively rare. Granted, multi-threading is in common use within the high-performance computing community, which has relied on parallel processing for decades through techniques such as MPI for clusters and OpenMP for multi-processor systems. On the desktop, with the availability of "hyper-threading", or logical multi-processors, most of the commonly used pre-packaged software has also been adapted to use multiple processors. However, we have not yet reached the point where every programmer designing a new program, or an extension to existing software, asks "how do I make my program use multiple processors if they are available and do the job in parallel?"
Writing a parallel program is far more complex than writing a sequential one, something most people do not fully appreciate until they have written one themselves. There are any number of pitfalls and subtle issues when global state is modified simultaneously. But that complexity aside, parallel programming features are also not easy to reach from compiled languages such as C and C++. Yes, there are APIs provided by Microsoft Windows and standard threading packages such as pthreads, but they remain outside the language, are harder to use than, say, just writing a for-loop, and are in general non-portable.
The language extensions we chose are a very small set: __parallel and __spawn for structured parallel execution in a fork-join model, a __par construct for for-loops whose iterations can be executed in parallel, and __critical for protecting against concurrent updates to shared variables.
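To make the three ideas concrete (fork-join, a loop whose iterations run in parallel, and a critical section), here is a minimal sketch written with standard C++ threads rather than with the extensions themselves, whose exact syntax is not shown in this post. The function name parallel_sum and its parameters are made up for illustration. It also shows how much bookkeeping a library-based approach needs for what is conceptually a single for-loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not part of the Intel extensions): sums `data`
// by splitting the loop's iteration space across `workers` threads.
long parallel_sum(const std::vector<int>& data, std::size_t workers) {
    assert(workers > 0);
    long total = 0;          // shared state updated by every thread
    std::mutex total_lock;   // stands in for a critical section
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    // Fork: each thread handles one contiguous chunk of iterations.
    for (std::size_t w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(w * chunk, data.size());
        const std::size_t end = std::min(begin + chunk, data.size());
        pool.emplace_back([&data, &total, &total_lock, begin, end] {
            long partial = 0;
            for (std::size_t i = begin; i < end; ++i) {
                partial += data[i];
            }
            // Only one thread at a time may update the shared total.
            std::lock_guard<std::mutex> guard(total_lock);
            total += partial;
        });
    }

    // Join: wait for every forked thread before reading the result.
    for (std::thread& t : pool) {
        t.join();
    }
    return total;
}
```

Calling parallel_sum(values, 4) on a vector of integers returns the same result as a sequential loop. Without the lock around the update to total, the threads would race on the shared variable, which is exactly the kind of pitfall described above.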
We're eager to have you check out the compiler and tell us what you think. Your opinion on the following would be very welcome:
- are these extensions expressive enough to parallelize programs?
- are they easy to use?
- do they provide sufficient protection mechanisms against race conditions?
- is it a good idea to build these features into the language?
- do you see performance gains from using these features?
Other comments are also welcome.