There seems to be a widespread notion that in order to do parallel programming, you have to write an inherently parallel program. By inherently parallel, I mean one that must have more than one thread to run correctly. A simple example is a producer-consumer program with a bounded buffer. A single thread cannot execute such a program by running the producer to completion first (the buffer would overflow) or by running the consumer to completion first (there would be nothing to consume!).
Thus instead of sequential vs. parallel, consider three categories:
- sequential programs
- sequential programs with compute-ahead parallelism
- inherently parallel programs
I'm a big fan of category 2, because it enables me to debug the single-threaded case first. That lets me eliminate most of the logic errors without any parallel pain. Then I crank up the number of processors and shake out the parallelism bugs. Programs in category 3 are inherently painful to debug from the start. Furthermore, they have performance pitfalls when the number of required threads exceeds the number of hardware threads, because then time-slicing artifacts can hit hard.
Packages like OpenMP, Intel® TBB, and Cilk are designed for category 2. They do not guarantee any parallelism; they merely provide it when it is available. The extra threads execute tasks ahead of time that would otherwise eventually be executed by the main thread. Optional parallelism makes the program easier to understand and explain. You can understand the program's sequential execution first, and then understand how the look-ahead computations don't upset it.
The distinction between category 2 and category 3 needs to be widely understood. Perhaps category 2 needs a catchy name?