What Is So Hard About Parallel Programming?

I've seen it again.  One more claim that "Parallel programming is hard," followed by the claimant launching into some complex and convoluted solution to make parallel programming easier.  (Matt Wolfe has some interesting observations in this regard.) How hard is it really to do or learn parallel programming?
Disclaimer: I've been doing parallel/concurrent/multithreaded programming in one form or another for over 20 years. I want my readers to understand that my opinions in this area may be tainted by my experience. I watch chef Gordon Ramsay whip up something in the kitchen, making it look easy, but I know I'd never be able to do the same thing with the same ease. He's had years of practice, and I know that I might be able to get closer to his culinary skills with some instruction and lots of practice. It's the same with parallel programming, IMO.

I do know something that is hard: Theory of Computation. O-M-G! All that stuff about grammars, finite state machines, context-free languages, pumping lemmas, Turing Machines, recursive functions, uncomputability, Church, Post, and Kleene. It was enough to make my head swim when first confronted with all of this. (If your head is swimming with the memory of these topics, too, pause now and take a deep calming breath.) With mega-liters of water under the bridge since those harrowing days, I no longer wake up in the middle of the night screaming about being asked to solve the Halting Problem.

Recently, I started reading Feynman Lectures on Computation (Perseus Publishing, 1996), edited by Tony Hey and Robin W. Allen. These are transcriptions of lectures delivered by Richard P. Feynman during the mid-1980s at the California Institute of Technology. Chapter 3 deals with the Theory of Computation. It is 40 pages long and covers a range of topics from finite state machines to Turing Machines to uncomputable functions and the Halting Problem.  Forty pages (!) on stuff that took me a whole semester to cover. Can it really be that hard? Prof. Feynman (a physics professor) certainly makes it look easy.

Granted, there was a lot of stuff I covered in my semester course that was left out of Feynman's lecture. But, on further reflection, he didn't need to cover it to make his connections and to give his students (likely non-CS majors) enough background to allow them to be conversant with the topic and to get started on their own investigations. I believe that Parallel Programming can well take this path, too.

My take on the proponents of the "Parallel Programming is Hard" line is that they are probably thinking about the totality of all the things one needs to have mastered to be an expert and effective parallel programmer in all situations. That is a lot of stuff, and it could well be thought of as "hard." But do we really need to teach it all at once, in one sitting, as it were?  Couldn't we start by hitting the highlights and previewing the issues to come, giving students enough information to be practical parallel programmers?

I think that learning parallel programming is no harder than learning a second programming language. C++ and Fortran and Java have syntax rules and methods of coding that must be mastered in order to write programs, as do Perl, Lisp, COBOL, Python, Ruby, Scheme, APL, C#, and a thousand other languages. Any given parallel or multithreaded programming model/method/API also has syntax rules and methods of coding that must be mastered in order to write programs.

Some may say that the methods and pitfalls of parallel programming are so different from those of sequential programming that this is what makes it "hard." Well, COBOL and Lisp are two very different programming languages, but I've been able to learn both during my career.  While I may not be proficient in either right now, I feel I could easily brush up on the details and achieve at least a journeyman level of skill.

For any university degree program that involves computer programming, journeyman seems like a sufficient level of parallel programming skill to ask a sophomore to have achieved. (Is there any university or college that wouldn't expect its rising juniors to have at least two programming languages under their belts?) If more specific parallel programming skills are needed, those niggling, more advanced details can be taught during the junior and senior years. To get students to the journeyman level, we need to add parallel programming fundamentals and principles (not all the esoteric details) to the first programming course and continue to exercise those skills beyond that.

What about the creative side of parallel programming? I've often heard that programming is more art than science. I think this is doubly true for parallel programming. We can easily teach the mechanics of programming. But if there isn't an innate talent for logical thinking and breaking a problem down into a series of steps, a student isn't going to stick around in a CS program for very long. Once parallel programming becomes part of university curricula, especially if it starts on Day One, we'll still have people enroll in courses only to discover later that they can't grasp the art.

To summarize, I don't think parallel programming is any harder than learning a second programming language, and, taking a cue from Richard Feynman, we don't need to fill in all the details of the problems and pitfalls inherent in concurrent algorithms in order to start teaching parallel programming at the outset of a student's academic career. We don't need to breed a race of "Atomic Super-Programmers." Give students enough to get started with some chance of success in a controlled environment and fill in the blanks later, as needed. That doesn't sound all that "hard," does it?

Comments

anoid:

Not even wrong.

In single-threaded programming, you have to keep in mind all the invariants that hold in the procedure or method you are writing. That's hard enough. If there are other threads, you also have to create and remember invariants that hold no matter where those other threads are executing. This is harder - fundamentally harder because of asynchrony. Without a lot of discipline it leads to a combinatorial state space explosion and an endless list of unreproducible bugs.
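For instance (a minimal C++ sketch, with made-up names), consider a counter whose invariant is 0 <= count <= capacity. Single-threaded, the invariant only has to hold between statements; with other threads running, it has to hold at every instant another thread could observe the object, which is why the lock is not optional:

    #include <mutex>

    class BoundedCounter {
    public:
        explicit BoundedCounter(int capacity) : capacity_(capacity) {}

        // Invariant: 0 <= count_ <= capacity_, visible to ALL threads
        // at ALL times. The mutex makes the check and the update one
        // indivisible step.
        bool try_increment() {
            std::lock_guard<std::mutex> guard(mutex_);
            if (count_ == capacity_)
                return false;          // full: invariant preserved
            ++count_;                  // still <= capacity_
            return true;
        }

    private:
        std::mutex mutex_;
        int count_ = 0;
        const int capacity_;
    };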

The logical conclusion of your article is that designing hardware - the ultimate parallel computing system - is no harder than learning another programming language. This is manifestly false, as you will find out by talking to your fellow Intel engineers. Hardware and software design have been two different ways of thinking for 50 years, and that's almost entirely due to the very different way each deals with parallelism. The best hardware engineers I have known are terrible programmers (invariant? what's an invariant?), but have an intuitive grasp of parallelism.

Clay Breshears (Intel):

anoid - You've actually helped my point here. You talk about "invariants," but I have never thought about those. At least not consciously. I'm not even sure I've ever seen a book on a programming language that talks about them. (Arrrgh! If only I had an electronic book that could search for that term.) I may have been using invariants, or the skills related to using them, but I've been programming for almost 30 years without needing to know the word. These are the things that "master" programmers may think or talk about, but it's a really, really fine detail that doesn't need to be explicitly spelled out or drilled into the impressionable heads of young programmers for them to become proficient.

As for parallel invariants and the horrors of asynchrony: covering simple data races and deadlocks, and how to think about interleavings of parallel execution to determine if and when these might occur, must be done. I wouldn't couch it in terms of invariants, and I wouldn't expect to be able to equip students for all possible cases, either. If they are aware of these problems and have a few tricks to identify and combat them, then they can become proficient enough.
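The classic deadlock, for example, fits on a slide (a C++ sketch; the lock names are made up):

    #include <mutex>

    std::mutex lock_a, lock_b;

    // Thread 1 takes lock_a then lock_b; thread 2 takes lock_b then
    // lock_a. If each grabs its first lock before the other grabs its
    // second, both wait forever: deadlock.
    void thread1_work() {
        std::lock_guard<std::mutex> first(lock_a);
        std::lock_guard<std::mutex> second(lock_b);
        // ... use both resources ...
    }

    void thread2_work() {
        std::lock_guard<std::mutex> first(lock_b);   // opposite order!
        std::lock_guard<std::mutex> second(lock_a);
        // ... use both resources ...
    }

    // One trick that fixes it: always acquire locks in a single global
    // order, or acquire them together and let the library avoid deadlock.
    void thread2_work_fixed() {
        std::scoped_lock both(lock_a, lock_b);       // C++17
        // ... use both resources ...
    }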

Your last paragraph confuses me. You claim that the logical conclusion of my argument is that parallel hardware design would be no tougher than learning a new programming language. Then you state that hardware and software design are totally different ways of thinking. If they are so different, how can my thoughts on parallel programming (software) logically lead to any claims about hardware design?

My whole thesis is that *basic* parallel programming is no harder to learn than a second programming language. How different is it to track down a data race (assuming you know what data races are, several ways to spot them, and some methods of fixing them) than it is to wrestle your first sizeable Modula-2 code through the compiler? I've seen people struggle for a whole week on both of these problems. Yes, there is more to consider in parallel code than in serial code, but that's why you teach the basics: data races, load balance, mutual exclusion, and synchronization contention. You don't need parallel programming patterns or dataflow languages or data-parallel programming or all the intricacies of vast libraries of object-oriented message-passing APIs. All that can come later, because that is the part that makes parallel programming "hard."
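To show how small the basic lesson is, here is roughly the data race I would put in front of a beginner, plus the critical-region fix (a C++ sketch; the names are illustrative):

    #include <iostream>
    #include <mutex>
    #include <thread>

    long counter = 0;            // shared by both threads
    std::mutex counter_mutex;

    void add_unsafe(int n) {
        for (int i = 0; i < n; ++i)
            ++counter;           // data race: unprotected read-modify-write
    }

    void add_safe(int n) {
        for (int i = 0; i < n; ++i) {
            std::lock_guard<std::mutex> guard(counter_mutex);  // critical region
            ++counter;
        }
    }

    int main() {
        std::thread t1(add_safe, 1000000);
        std::thread t2(add_safe, 1000000);
        t1.join();
        t2.join();
        // Prints 2000000 every time with add_safe; with add_unsafe it
        // usually prints less, and a different value on each run.
        std::cout << counter << "\n";
    }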

Anonymous:

I fully agree with user anoid.

Learning sequential programming is simply a matter of learning syntax and internalizing common functions. There is little uncertainty: program flow is sequential and can be mapped out linearly as a set of predictable steps. Debugging such programs is easy, because a bug is simply one or more highly localized, easy-to-extract problems in this friendly sequence of predictable steps.

Parallel programming introduces unpredictability. It introduces bizarre, difficult-to-reproduce bugs. Solving parallel programming bugs requires a large investment of time, a strong sense of intuition, and luck. These are deeply fundamental differences.

Clay Breshears (Intel):

I don't disagree. However, I contend that I can also say "Fortran programming introduces unpredictability." With default variable types, if I mistype a variable name, the compiler is happy to allocate a new variable for me, and my outcome may not be what I predicted. How do you find this error? If you know about the default declarations, you can search for a mistyped variable name, or you can put in a statement (IMPLICIT NONE) to turn off the default behavior and declare all variables explicitly, so the compiler complains about undeclared variables. Thus, if we know about a possible error situation and some possible solutions for it, we can attack the problems that arise. Parallel or serial.

Yes, I understand that the Fortran example will give me the same (wrong) results each time, whereas an incorrect parallel code may not (or will give me a completely different result every time). Still, if you let students know about the possible parallel programming problems (data races), their causes (unprotected access), and some solutions (critical regions or local variables), is that really all that much different from my Fortran example? Multiplied by a couple of factors, of course, but nothing that can't be anticipated.
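The "local variables" solution, for instance, is one slide's worth of code: give each thread a private partial result so nothing shared is written inside the loop (a hypothetical C++ sketch):

    #include <cstddef>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Each thread accumulates into its own slot of 'partial', so the
    // loop body writes no shared data: no race, no lock needed.
    long parallel_sum(const std::vector<int>& data, int num_threads) {
        std::vector<long> partial(num_threads, 0);
        std::vector<std::thread> workers;
        const std::size_t chunk = data.size() / num_threads;
        for (int t = 0; t < num_threads; ++t) {
            const std::size_t begin = t * chunk;
            const std::size_t end =
                (t == num_threads - 1) ? data.size() : begin + chunk;
            workers.emplace_back([&partial, &data, begin, end, t] {
                for (std::size_t i = begin; i < end; ++i)
                    partial[t] += data[i];   // private accumulator
            });
        }
        for (auto& w : workers)
            w.join();
        // Combine the local results in one serial step at the end.
        return std::accumulate(partial.begin(), partial.end(), 0L);
    }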

Granted, if you're doing it by hand, finding and solving parallel bugs will take all those things you mention. On the other hand, I've pored over printouts for hours looking for a Fortran bug where I had typed '1' for 'I' or '0' for 'O'. Believe me, tracking down something like that can take intuition and luck, too, even if you know what you're looking for.

I'm not saying that parallel programming is easy. Serial programming isn't easy. Parallel programming, for me, is maybe twice as hard as serial programming, but I get the feeling that those saying "Parallel Programming is Hard" want me to believe that it is 10 or 100 or 1000 times as hard. And I don't need to know about false sharing or the ABA problem or livelock to do parallel programming. (I may get bitten by these at some point, but when you're starting out, they aren't important.)

jseigh:

For me it's too easy. I was doing lock-free algorithms for more of a challenge, but even that got kind of boring. So it's a mystery to me why they think it's so hard. It does, however, take more than a day to learn and master, which seems to be about the length of time some of the more vocal threading critics have spent on it.
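For anyone wondering what "lock-free" even looks like, the core pattern is just a compare-and-swap retry loop (a minimal C++ sketch; the operation is made up for illustration):

    #include <atomic>

    std::atomic<int> value{0};

    // Lock-free increment-if-even: re-read and retry until no other
    // thread has changed 'value' between our read and our write.
    bool increment_if_even() {
        int expected = value.load();
        while (expected % 2 == 0) {
            if (value.compare_exchange_weak(expected, expected + 1))
                return true;   // our update went through atomically
            // on failure, 'expected' was reloaded; loop and try again
        }
        return false;          // value was odd; nothing to do
    }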

Part of the problem is that I don't think they teach multithreaded programming right. The way it is now, you either get it or you don't. And if you don't get it, you can get into trouble pretty quickly.

Anonymous:

Oh, I agree that it is possible to educate students about parallel programming issues. Operating systems courses and dedicated courses on parallel computer architecture are available. These courses teach students how to write multi-threaded code and how to use mechanisms for concurrency control.

However, many CS students who are exposed to these concepts still have difficulty writing parallel code that is as efficient, bug-free, and maintainable as sequential code. There may be some individuals with the ability to solve strange, intermittent resource-contention bugs, but they are certainly not the norm.

When an issue is intermittent or difficult to reproduce reliably, it stands to reason that the time needed to solve it grows with how difficult it is to reproduce and with how many entities (threads, processes, etc.) are involved. The sequential bugs in a system are often hard enough to solve. Adding a layer of parallel bugs greatly increases the complexity of the system and the number of configurations one must mentally explore to solve a problem.

Parallel code has a larger collection of possible states that can't be easily determined due to the probabilistic nature of parallel bugs. A sequential code base with an equal number of lines of code has a smaller collection of possible states that *can* be easily determined.

Anonymous:

Interesting topic and discussion.

One thing I disagree with: "I don't think parallel programming is any harder than learning a second programming language." My sense is that it may take considerably longer to really master parallel programming, perhaps by an order of magnitude. To date, it's been a sufficiently large barrier to preclude the vast majority of programmers from really mastering parallel techniques. In that time, I bet they learned more than one language.

I do agree with the notion that we can start teaching parallel programming without filling in all the details/pitfalls. A friend of mine prides himself on being able to successfully introduce parallel programming concepts very early in his school's curriculum.

For what it's worth, we at Cilk Arts recently published an e-book as a little primer on multicore programming, and we have been pretty encouraged by the response and the downloads from thousands of developers: http://www.cilk.com/multicore-e-book/

Anonymous:

Interesting article, but I think it is important not to mix everything up. Parallel programming is not, as a first step, a language problem. It is more a matter of mindset and the way people tackle the problem.

For me it is the same as object-oriented programming. The most important thing is to understand what OO programming means; then a language can support this approach with more or less efficiency.

In parallel programming it is the same, and I agree that as a first step there is no need to submerge students in all the intricacies of parallel execution; just make sure they get the basic good practices when entering this new world.

A computer language is always a way to express something, not the thing itself. I work in a big European semiconductor company, and I have a software background, but I think we should look at the silicon designers, who have a strong knowledge of parallel execution; the languages VHDL and Verilog are just a way to express it efficiently.

I developed a big multiprocessor system in the '80s, and I agree with Clay that multiprocessing shouldn't be more complex than learning a new language, assuming we address the problem from the right angle. This means it is important to educate people about the basics of parallelism (concurrency, race conditions, ...) as well as how to make their applications observable and debuggable, concepts that are fully valid in single-CPU/single-threaded applications but even more important in the case of parallel execution. On those assumptions, parallel programming shouldn't be more complex than learning a new language.

Anonymous:

I agree "Parallel Programming" is 10x harder than "Sequential Programming".

Why? "Parallel Programming" includes reentry, timing, message, event, and synchronization, .etc. These instruction and data flows are dynamic and usually unpredictable. You can not use a debugger to trace them step by step. Plus dynamic resource allocation, all of this makes programming harder.

"Parallel Programming" in theory is not very hard, but in the practical large program, the complex is more than 10x higher than sequential programming. The "learning" programs are different from "commercial" programs.

Anonymous:

I think what Clay is suggesting is akin to what most other sciences do all the time.

For example, consider the Calculus classes that most people in CS had to take during a college education. It introduces the concepts of differentiation and integration in a simple and incomplete setting. Then, later classes in Real Analysis, Complex Analysis, Differential Geometry and others can extend these concepts far beyond the starting point. There are complexities in how to handle divergences in integrals, what to do with functions that can't be written as simple equations, the differences between functions, relations and distributions, and other such things that separate the novice from the expert. Those details also don't show up in a majority of cases. They are all true, and they can make a problem blow up in your face if ignored at the wrong time, but they are not part of many real uses of the techniques.

Similar things could be said about all of the other sciences. However, for some reason the people who complain about how hard parallel programming is don't separate this introductory core from the more advanced details. This produces an entry barrier that appears insurmountable to the new initiate. Think of Clay's learning-a-new-language comparison. If I told a new student that they had to learn the intricacies of generative programming techniques and expression templates as part of their first C++ class, everyone would give up at the start. In any new language, it takes several years to move from novice through journeyman to expert. The same is true for parallel programming. However, very few language experts are currently graduating from undergraduate institutions, so why should we require expert parallel programming at that juncture?

My interactions with parallel programming have been in the academic and research end of the market. It is a common tool for physical simulation programming. In this setting, I have found it to be somewhat harder, but not by a factor of 10 or more. Those in the commercial software market may have different experiences, and since I have none of those experiences I can't argue with them. One thing I would say is that designing for parallelism is easier than trying to intelligently modify existing serial code. Even with tools like OpenMP, I prefer working with code where parallelism has been part of the thought process from the start.
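When a loop is designed with independent iterations from the start, the parallel version can be a one-line change (a sketch in C++ with OpenMP; compile with an OpenMP flag such as -fopenmp):

    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> x(1000000, 0.5);
        double sum = 0.0;

        // The iterations are independent and the accumulation is
        // declared as a reduction, so one pragma parallelizes the
        // loop safely.
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < (long)x.size(); ++i)
            sum += x[i] * x[i];

        std::printf("sum = %f\n", sum);
        return 0;
    }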

That suggests a starting place for educating new students (an idea that is not original with me).

The first CS class usually includes some programming and a lot of learning to turn processes into algorithms. Starting from the first constructions, make the distinction between serial and parallel, and look at the different issues that crop up in each. I personally even favor some hands-on activities as part of this process, where individual students or groups of students work through something by hand. In academic talks I have used simple examples, like having groups of audience members sort playing cards, to show some of the different processes and the challenges of each.

Before anyone complains that there are layers of complications that I'm leaving out at this level of presentation, I already know that. I also know that when I teach about light propagation in introductory physics I never include a single Feynman diagram calculation from quantum electrodynamics. The goal is to get them moving on the road to someday becoming an expert, not to try and make them experts in their first exposure. Those who go no further at least have some awareness, while others have a foundation of thinking about and confronting problems that will be addressed later in the curriculum.

Each class after the first can continue this thinking, and by the time the students graduate they will be immersed in it and have a much smaller barrier between themselves and expertise.

John
