COMMENTS
posted @ Tuesday, July 29, 2008 2:56 PM by Kevin Kerr
Having spent some time working on memory consistency in programming languages, I think it is important to be careful about a couple of the assumptions made in the book:
- Data races in C or C++ with pthreads are a bug. The POSIX standard is reasonably clear about that. So is the current draft for the next C++ standard. A data race gives the implementation license to produce any output whatsoever, including referencing an invalid address. Even if it appears that any interleaving of the load and store instructions generated by the compiler would produce an acceptable outcome, the compiler is allowed to optimize based on the assumption that there are no data races. The code may not be what you think it should be. The architecture may allow or require the loads and stores to be performed in smaller pieces instead of as a single unit. This of course makes race detectors even more useful.
- The assumption that "the underlying hardware supports the sequential consistency memory model, ..." makes it easier to understand the examples. But it's important to remember that it has to apply to more than just the hardware, and it's completely unrealistic. It applies to almost no current hardware. And even if the hardware enforces sequential consistency, your optimizing compiler will rearrange your code sufficiently that you won't be able to tell.
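To make the sequential-consistency point concrete, here is a minimal sketch of the classic "store buffering" test. It assumes the `std::atomic` interface from the draft C++ standard mentioned above (later published as C++11); the names `Outcome`, `run_store_buffering`, and `count_both_zero` are hypothetical and chosen only for illustration. With the default sequentially consistent ordering, the two threads can never both read 0.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Result of one run: what each thread read from the other's variable.
struct Outcome { int r1; int r2; };

// Store-buffering litmus test. Each thread writes its own flag, then reads
// the other thread's flag. store() and load() default to
// std::memory_order_seq_cst, under which r1 == 0 && r2 == 0 is forbidden.
Outcome run_store_buffering() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;  // each is written by exactly one thread
    std::thread a([&] { x.store(1); r1 = y.load(); });
    std::thread b([&] { y.store(1); r2 = x.load(); });
    a.join();
    b.join();
    return {r1, r2};
}

// Counts how many of `trials` runs ended with both threads reading 0.
int count_both_zero(int trials) {
    assert(trials >= 0);
    int both_zero = 0;
    for (int i = 0; i < trials; ++i) {
        Outcome o = run_store_buffering();
        if (o.r1 == 0 && o.r2 == 0) ++both_zero;
    }
    return both_zero;
}
```

If `x` and `y` were plain `int` variables, the program would contain a data race and its behavior would be undefined. With relaxed atomics the both-zero outcome becomes permitted, and common hardware can actually produce it.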
posted @ Wednesday, July 30, 2008 7:21 PM by Hans Boehm
Your comments about memory consistency are well taken. That's a good topic for another blog post.
posted @ Saturday, August 02, 2008 9:10 PM by Charles E. Leiserson
I have found Erlang's OTP to be a very useful platform for building highly-available, widely-scalable concurrent applications. It's experiencing something of a revival, particularly among Rubyists, and finding a new lease on life in web-based message-oriented applications.
Its share-nothing, message-passing concurrency semantics combined with its clean, functional programming style and "process-oriented" design paradigm offer a compelling alternative to lower-level languages with bolt-on libraries and keywords. The "process-oriented" paradigm ensures that the serialized code is minimal, comparatively speaking--that is to say, you almost have to go out of your way to write code that cannot be parallelized.
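For readers more at home in C++, the message-passing style can be approximated with a small mailbox sketch. This is only an analogy under stated assumptions: `Mailbox` and `sum_via_messages` are hypothetical names, the sentinel value -1 is an arbitrary convention, and the worker here is a full OS thread rather than one of Erlang's lightweight processes. The only state the two sides share is the mailbox itself.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical mailbox: a blocking queue of integer messages.
class Mailbox {
public:
    void send(int msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(msg);
        }
        cv_.notify_one();
    }
    int receive() {  // blocks until a message is available
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        int msg = q_.front();
        q_.pop();
        return msg;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<int> q_;
};

// The worker sums the messages it receives until it sees the sentinel -1,
// then sends the total back on a second mailbox.
int sum_via_messages(int n) {
    assert(n >= 0);
    Mailbox inbox, reply;
    std::thread worker([&] {
        int total = 0;
        for (int msg = inbox.receive(); msg != -1; msg = inbox.receive())
            total += msg;
        reply.send(total);
    });
    for (int i = 1; i <= n; ++i) inbox.send(i);
    inbox.send(-1);  // tell the worker to finish
    int result = reply.receive();
    worker.join();
    return result;
}
```

Calling `sum_via_messages(100)` sends the numbers 1 through 100 to the worker and returns their sum. The caller never touches the worker's running total directly.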
One thing I wonder about is what would happen if Erlang, or some evolutionary successor thereto, were written in Cilk++ instead of C. That idea intrigues me a lot.
There are some quirky things about Erlang that could be improved upon, but overall I've not found a language with better built-in support for concurrency and scaling to multiple cores.
posted @ Monday, August 04, 2008 7:13 AM by Bob Calco
And its pattern-matching and binary-parsing syntax are to die for! I almost feel naked without them when I go back to C, C++, Java, C# etc.
I love languages in the C/C++ family, but having been smitten by Erlang's overall value proposition, I would use C/C++ only in remote corners of my applications that Erlang cannot easily reach, to write port programs or linked-in drivers if I got desperate.
In these cases I'd love to try Cilk++ out. And like I said I'd love to consider a future Erlang or Erlang-like language actually written in Cilk++.
I neglected to mention Erlang's "lightweight" threading mechanism, which does NOT expose OS-level threads to the programmer. The programmer thinks solely in terms of these lightweight processes, and is encouraged to create hundreds or even thousands of them, and to string them together in arbitrarily complex supervision models. They can behave like general purpose servers or finite state machines, or whatever you invent as new kinds of behaviors.
So Erlang as a platform for concurrent programming is something well worth covering in your book and exploring for ideas for future applications of Cilk++.
posted @ Monday, August 04, 2008 7:46 AM by Bob Calco
My personal familiarity with Erlang is minimal (I've never written an Erlang program), but my understanding is that Erlang addresses distributed-processing concerns, such as fault-tolerance and soft real-time quality assurance, not processor performance. Since multicore technology addresses processor-performance bottlenecks, and Erlang doesn't pretend to address this issue, I don't feel negligent in failing to include mention of it.
Indeed, my understanding is that Erlang is slower on one core than even most functional languages, such as OCaml or Haskell, and far, far slower than C++. For example (and here I speak from profound ignorance -- enlighten me if I'm wrong), I doubt that a 4-core Erlang quicksort program could rival a decent 1-core C++ code. Like many message-passing environments, Erlang seems to need a whole bunch of cores before it can outperform a single-core implementation in a conventional programming language.
Erlang seems well suited, however, as a platform for distributed processing, as opposed to multicore processing, where communication exacts a high toll and processor performance is not a bottleneck. In particular, its support for fault-tolerance and hot-swapping of software modules makes it attractive for many distributed applications. Because of its performance limitations, however, I don't see Erlang as having the potential to be a big player for multicore, but for distributed processing, it seems to occupy an important niche.
posted @ Monday, August 04, 2008 11:50 PM by Charles E. Leiserson
I agree C++ is always going to give you better performance on a single core--that point is spot-on. Languages like Erlang and Oz and OCaml and Haskell are written *in* C/C++ for a reason!
There are a lot of ways to look at this question of performance. It's not only about processor bottlenecks, but also about large-scale application design. I guess if I had to distill what I'm saying it's that Erlang would be my choice for the distributed part of any real-world concurrent application, almost bar none--and that is no less a part of modern real-world applications than the multi-core part.
Performance-critical code can always be pushed to a "port" or a C/C++ node or linked-in driver--all three different ways of interfacing to Erlang from native code are there for a reason.
Don't underestimate the lightweight threading advantage, would be my only criticism. Erlang *does* do a fantastic job of evenly scheduling thousands of these lightweight processes over however many cores you have (provided they follow the rules regarding avoiding side-effects, which are pretty easy to grok), and you can configure SMP support at the command line when you load an Erlang node to make its scheduling even better.
Erlang crushes native-thread solutions in any heavy load test and the more "parallel-friendly" the code, the more impressive the blow-out. The famous example is the comparison between Apache and Yaws, a web server written in Erlang (see http://www.sics.se/~joe/apachevsyaws.html). The same task that kills Apache at about 4,000 concurrent connections chugs right along in Yaws at 80,000 concurrent connections, on a commodity Linux box. The difference is not single-core perf but lightweight vs. heavyweight processes, which are then distributed over multiple cores (albeit naively).
The concurrent-programming promise in Erlang is that performance of your code will dramatically (if not linearly) improve the more cores you have, without having to change a single line of code, and it keeps this promise so long as you write code that is inherently process-oriented and side-effect-free. However badly it may comparatively perform on a single core machine given this or that algorithm in C++ (and it does NOT promise this kind of performance advantage), the future is indeed multi-core, even massively multi-core some day, and so for a technology as arguably old as Erlang, it's doing quite well and hardly showing its age--indeed it seems to get younger the more cores you have. There is a lot of C++ code out there that cannot make this promise.
Incidentally: There is another built-in compiler option for improving Erlang's performance called "HiPE" or "High Performance Erlang" that also comes out of the box. It creates native code that performs more like Concurrent SML than "plain old" Erlang. So this is an option short of dropping down to hand-writing native code to get performance of particular pieces of code up to spec.
My other point was that I can imagine an even more high-performance Erlang emulator, one written in, say, Cilk++, that would improve its performance without having to resort to native code. I happen to like some of Erlang's HLL trade-offs because the language practically makes certain kinds of errors, like corruption of shared data, by definition impossible. It forces you to write clean, elegant code that is easy to parallelize; indeed, you learn to *think* that way. You no longer rely on convenient but dangerous global memory schemes, nor do you write functions with unintended consequences. You structure your programs as many interrelated processes that communicate simply and naturally with each other, whether they're on another node on the same machine or another node on a remote machine, and can be arranged in arbitrarily complex supervision trees.
Given the ease-of-development, time-to-market, amazing library support for distributed programming, and built-in multi-core "consciousness" and discipline of Erlang, I see it at a minimum as an evolutionary parent of the next generation of concurrent programming platforms. They got so many things right when they designed that language and its supporting libraries for the distributed world we live in that it would be a shame to ignore Erlang's "total package" value proposition as a platform for concurrent programming when designing future platforms for a universe that is *both* multi-core *and* highly distributed.
Comparatively speaking, Erlang is like John the Baptist. It is not the Messiah, but points to the Messiah, and makes straight the way of repentance when the Kingdom of Multi-Core Distributed Programming finally comes....indeed it is already at hand. ;)
Sorry to get all evangelical on you, but I really do see Erlang as something worth studying in this context.
Specifically, I imagine an Erlang with Cilk++'s "work stealing" scheduling algorithms for its lightweight threading execution model, and I get jazzed.
To recap: What gets me excited about Cilk++ and related technologies is the hope they hold out of building an even better evolutionary successor to distributed programming languages like Erlang, because I think it's the combined benefits of multi-core and distributed programming that are going to change the way we build systems in the future. Even with newfound multi-core mojo, C++ remains difficult to program for distributed applications---much more difficult and bug-prone, in fact, than Erlang.
The issues (multi-core vs. distributed programming) may seem orthogonal, but one without the other is only half the solution.
OK, I'll get off my soapbox now. Your points are well-taken and indeed I wouldn't be as interested as I am in Cilk++ if I didn't know the rough spots of my beloved Erlang... indeed even of plain ol' C/C++, which I also love. :)
posted @ Tuesday, August 05, 2008 6:26 AM by Bob Calco
There is another way of looking at this. If you have libraries that support the syntax sugar of 'spawn' and other Cilk++ keywords/directives for multi-core programming, then it's not so hard to imagine libraries for network and inter-process programming that borrow from Erlang's lessons in distributed application design, with perhaps some keyword/syntax sugar support.
So rather than bringing Cilk++ to Erlang, maybe it's worth looking at Erlang's process-oriented programming concepts and lightweight threading mechanisms, and building support for them into Cilk++ some day?
Personally I'd be a lot more willing to make a decisive switch. As it is, Erlang has a killer story to tell for distributed programming, and a better-than-most story to tell for multi-core. (Yea, its story for single core perf is a rather short one...) On balance, though, this makes it my first choice, and so I don't see it as a "niche" language so much as a "total package" language.
Cilk++ could have a killer story in both respects--high perf on one to many cores, and easy to build distributed concurrent applications. *That* would make it in my opinion an even more compelling platform for future development.
In any case, thanks again for the dialog, and sorry for chewing up so much bandwidth to explain myself. :)
posted @ Tuesday, August 05, 2008 7:01 AM by Bob Calco
One thing I am curious about is your point regarding the "distributed" attribute of real world apps being no less important than the "multicore-enabled" attribute. It's probably an interesting Venn diagram, and I wonder how it'll evolve over the next 5-10 years. Couple thoughts on this point:
First, they may not always be distinct: there's a class of apps where throughput matters more than response time, and running multiple apps with something like Erlang may fit the bill well, in terms of putting all the cores to work.
Second, I would say that there's a huge class of desktop apps that don't have a distributed notion (for example, CAD/CAE tools, graphics tools, etc); and the mainstream desktop productivity apps. Google docs and other browser-based apps have made an interesting dent in desktop software, but I think the mainstream apps will be on the client for a while. And many of these are written in C/C++, and need a path to multicore today (or very soon.) Thoughts? Do you see it differently? Can Erlang help them somehow?
posted @ Tuesday, August 05, 2008 7:17 AM by Ilya Mirman
Erlang currently sucks for writing those kinds of *rich client* applications, though there are Erlang bindings to wxWidgets which makes it possible to write them.
Technologies like Adobe AIR/Flex, .NET Silverlight, and JavaFX are still the way to go for pretty GUIs. But threading in these technologies is a pain point.
Erlang's sucky UI story is just for lack of attention, I think. Erlang's focus has been on distributed processes.
But for the *backend* of any RIA, mobile app or peer-to-peer system, I see Erlang's "Distributed + Multi-Core" story to be better than popular .NET and Java alternatives--though both VMs are evolving rapidly.
.NET for example has added a host of features to the common language runtime to support generic, functional and even dynamic programming.
However, neither Java nor .NET have lightweight threading or very good built-in support for scaling up to multiple cores. These are left, as usual, as an exercise for the coder. .NET does not yet even support creating fibers (a fact that astonished me when I read pre-release versions of Joe Duffy's book, "Concurrent Programming on Windows Vista" at Safari)--though folklore has it that MS made some heroic efforts to include them in the last release, and they are focusing heavily on concurrent programming as a general area of improvement.
I do see a lot of apps that aren't web-oriented still being relevant today, and for those I admit the case for Erlang is less compelling. There is however a case to be made (provided better UI support in the future) and that lies in the inherent improvement in reliability and fault tolerance that Erlang's process-oriented programming discipline brings to the table, quite apart from either multi-core or distributed programming. The ability to fail gracefully by design is another black art in programming that is often neglected for sexier buzzwords. Erlang has an interesting story to tell in stability and reliability as well. For example, the following desktop 3D subdivision modeling application is written in Erlang:
http://wings.sourceforge.net/
http://internap.dl.sourceforge.net/sourceforge/wings/wings3d_manual1.6.1.pdf
But there's no question that Erlang's strength is in high-volume messaging--which is inherently a "Distributed + Multi-core" story.
I guess my view is that the future is multi-core AND distributed, and a proper focus on performance would address both the low-level per-core efficiency of the code and the large-scale, multi-tier, multi-code design patterns that make the world a safer place even for erroneous code.
Erlang seems to point the way, but I expect a future platform to fit the bill more perfectly some day.
posted @ Tuesday, August 05, 2008 8:17 AM by Bob Calco
I wrote a reply to you that did not appear to take, and I did not save a copy externally. I'll try it again, from memory, with probably some additional and subtracted points--I apologize in advance if this ends up being a duplicate.
To answer your questions:
1. I see the future of desktop application development being focused on "smart client"/occasionally disconnected (aka, Rich Internet) applications and peer-to-peer applications. I also see the shift to smaller computing devices amplifying the case for BOTH multi-core AND distributed programming support. Just consider the iPhone.
2. Erlang is in its current incarnation best suited for server-side programming that is inherently distributed and must take advantage of multiple cores as well. As a back end for RIAs it's got a very compelling value proposition.
3. There are impressive desktop apps written in Erlang, for example:
http://www.wings3d.com
http://internap.dl.sourceforge.net/sourceforge/wings/wings3d_manual1.6.1.pdf
There are also bindings to various GUI toolkits, like wxWidgets, to make client applications *possible*. See http://sourceforge.net/project/screenshots.php?group_id=151173.
The main case for using them in this scenario is the fact that Erlang has a lot of mechanisms for fault-tolerance, reliability and graceful recovery in the face of erroneous code. But the lightweight threading is another advantage that can be leveraged in a single node, UI-centric application--it's the main motivator I believe in Wings3D being written in Erlang.
4. Java and .NET are the other major players in the internet space, and they are evolving many language features that support generic, functional and even dynamic programming. However, they both lack great support for concurrent programming, though there is a lot of increasing focus on both platforms to improve the state of the art in this regard. For example, if you read Joe Duffy's "Concurrent Programming on Windows Vista," currently in early-access state at Safari, you learn that .NET does not yet even support creating Fibers--a fact that astonished me, considering the feature has been promised for a while. But really, OS-level threading should be a black box to developers. Well-factored code that is designed to be easy to parallelize should just scale--that's Erlang's main contribution to the future requirements of the next-generation platform for concurrent programming. It has other things to say, as has been noted, about large-scale application design for high performance, as well.
posted @ Tuesday, August 05, 2008 8:38 AM by Bob Calco
We agree that this is a problem with the current C/C++ and pthreads standards. Locks can be expensive.
However, simply allowing races on ordinary variables causes problems with existing compiler optimizations, on hardware that doesn't support atomic references of the right kind, and in that it makes it very difficult (a clean solution is an open research problem) to specify what races mean.
The working paper for the next C++ standard contains an alternate solution: We support "atomic" objects that may be concurrently accessed without introducing a data race. By default, you get clean semantics, at some expense, but significantly less expense than locks. (Think Java volatiles, with atomic increment and decrement, etc.) If even that's too expensive, you can give up memory ordering guarantees, and get back to very close to the cost of a plain memory reference.
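A minimal sketch of what this looks like in code, assuming the `std::atomic` interface as it was eventually standardized in C++11 (the working-paper spelling may have differed in detail). The helper `count_with_atomics` is a hypothetical name used only for illustration. Several threads increment one shared counter without a lock and without a data race, using the relaxed ordering described above as the cheapest option.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: `num_threads` threads each add 1 to a shared counter
// `per_thread` times; the final value of the counter is returned.
long count_with_atomics(int num_threads, int per_thread) {
    assert(num_threads >= 0 && per_thread >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write. Relaxed ordering gives up
                // ordering guarantees relative to other memory accesses,
                // but no increment is ever lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();  // default ordering: sequentially consistent
}
```

Dropping the `std::memory_order_relaxed` argument gives the default, sequentially consistent behavior, which corresponds to the "clean semantics, at some expense" case.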
Although this standard is officially still at least a couple of years out, it looks like atomics will start to appear in some implementations well before that.
Hans
posted @ Tuesday, August 05, 2008 11:16 AM by Hans Boehm
Therefore, it must be relevant! ;)
He also made another comment about the design mistakes they made in making object locks public in Java, instead of private. I could never understand why they did that.
I do wish I had that guy's job, I'm a polyglot when it comes to programming languages (I can even speak Russian fluently, and vaguely recall Latin and Greek from high school). Unfortunately he made me scratch my head when he said the following to the question about the top 3-5 languages an engineer should know to be paradigmatically broad-minded.
"More recently you might want to take a look at Ruby and Pascal, to pick a functional language, and COBOL, which has some interesting ideas in it you won't find in some of the other languages."
Pascal and Ruby are functional languages? That's news to me. Should have said Haskell, OCaml or even (dare I say?) Erlang instead! Though I do like Ruby and enjoy newer flavors of Object Pascal, like RemObjects' "Oxygene" -- which by the way also supports parallelism in its syntax.
posted @ Tuesday, August 05, 2008 12:00 PM by Bob Calco
posted @ Tuesday, August 05, 2008 12:04 PM by Bob Calco
Firstly, Charles E. Leiserson implied that C++ is uniformly faster than OCaml. Languages like OCaml are often faster (sometimes a lot faster) than languages like C++. For example, exceptions are ~6x faster in OCaml than C++ because longjmp with a GC is much faster than unwinding slowly and checking for destructors to call. Some standard library calls are asymptotically slower in C++ than OCaml and I have seen equivalent programs run 100x slower in C++ as a consequence.
Secondly, Bob Calco claimed that Erlang, Oz, OCaml and Haskell are implemented in "C/C++" when, in fact, they are all written in a mix of assembler, C and themselves. C++ has nothing to do with any of them.
Thirdly, Bob Calco claimed that multicore programming is left as an exercise for the coder on .NET. In fact, Microsoft published their Task Parallel Library in 2007 and it does an excellent job of making multicore programming easy.
Finally, I note that this article and many respondents have confused concurrent and parallel programming. Erlang is great for concurrent programming but it offers nothing of benefit in the context of multicore computing (better scaling is irrelevant if it is always slower in absolute terms).
posted @ Monday, August 11, 2008 7:25 PM by Jon Harrop
Actually check your facts on Oz. It is in fact written in part also in C++. That's the only reason I included C++ in the list. The point about assembler is well taken.
"Thirdly, Bob Calco claimed that multicore programming is left as an exercise for the coder on .NET. In fact, Microsoft published their Task Parallel Library in 2007 and it does an excellent job of making multicore programming easy."
TPL is still very new, but the point is taken. Note I also mentioned RemObjects' "Oxygene" language, which uses the PFX (of which TPL is a subcomponent) to support parallelism in the language itself. I had that kind of support in mind when I made my statement. Oxygene lets you drop the keyword "parallel" in for loops and it behaves rather like cilk_for, for example, in semantics.
Sorry if I was unclear about that.
I do take issue with your last statement that Erlang offers nothing of benefit in the context of multi-core computing. Just because it executes some kinds of algorithms slower compared to this or that other language doesn't mean, either necessarily or in reality, that it doesn't benefit from multiple cores, relative to its own performance on a single core. I also suggest you pick up Joe Armstrong's book on Erlang. Of particular note is Chapter 20, "Programming Multicore CPUs", in which he hints at Erlang's capabilities in this area with a simple demo-level implementation of the map-reduce algorithm. Erlang has a lot to offer for both concurrency and parallelism, not to mention its undisputed credentials for distributed programming on top of all that.
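As a rough illustration of the map-reduce pattern mentioned here, the sketch below is written in C++ rather than Erlang. It is not the code from Armstrong's book, and `sum_of_squares` is a hypothetical name. The input is split into chunks, each chunk is squared and summed in its own task (the map step), and the partial results are then added together (the reduce step).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical map-reduce: square every element (map) and add up the
// results (reduce), splitting the input into at most `chunks` parallel tasks.
long sum_of_squares(const std::vector<int>& data, std::size_t chunks) {
    assert(chunks > 0);
    // Chunk length, rounded up so that every element is covered.
    const std::size_t step = (data.size() + chunks - 1) / chunks;
    std::vector<std::future<long>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += step) {
        const std::size_t end = std::min(begin + step, data.size());
        // Map: each task reads only its own slice and shares no mutable state.
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            long local = 0;
            for (std::size_t i = begin; i < end; ++i)
                local += static_cast<long>(data[i]) * data[i];
            return local;
        }));
    }
    // Reduce: combine the partial sums as the tasks finish.
    long total = 0;
    for (auto& p : parts) total += p.get();
    return total;
}
```

Because the tasks share no mutable state, the same function works unchanged whether `chunks` is 1 or matches the number of cores, which is the scaling property under discussion.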
posted @ Monday, August 11, 2008 9:02 PM by Bob Calco
Regarding the data-parallel languages and Fortran, I would like to draw your attention to the relevant related work, such as the Vienna Fortran of Hans Zima (http://www.par.univie.ac.at/~zima/shortvita.html) or the dHPF Compiler of Ken Kennedy (http://www.cs.rice.edu/~ken/).
- What other content might be of interest to you?
It would be good to address more specifically the issue of programming heterogeneous multi-core processors such as the Cell.
posted @ Tuesday, September 30, 2008 7:18 AM by Sabri Pllana
