English | 中文 | Русский | Français
2,595 Posts served
8,341 Conversations started
In my recent "Superlinearity Is Impossible; We Just Don't Always Think Correctly" I argued that algorithmic processing superlinearity is impossible. It might appear that a parallel application was achieving superlinearity, but that appearance was due to factors other than the algorithm itself engendering a more efficient computation. For example, memory access might be more efficient in the parallel algorithm. But this would not mean the limits defined by Amdahl's Law had been surpassed. I agreed with the statement Herb Sutter makes in his "Going Superlinear" article in the February Dr. Dobb's Journal:
"But wait," someone could complain, "your example so far is unfair because you've stacked the deck. The truth is that, when you find a superlinear speedup, what you've really found is an inefficiency in the sequential algorithm."
When parallelism is simpler
I had the good fortune to be able to speak with Herb Sutter today about these issues. We found ourselves in agreement in most areas. We agree, for example, that memory issues can make an enormous performance difference when you compare an algorithm run in parallel with a serial algorithm that retraces the data processing of the parallel algorithm in an element-by-element manner.
But even when I asked Herb to ignore memory issues (which is where I believe the greatest advantages will lie for some parallel algorithms), he remained quite adamant that superlinearity is possible. How so? Well, Herb mentioned things like algorithm complexity, and the known or unknown characteristics of the data. As I pondered this, Herb asked that I re-look at the last section of his "Going Superlinear" article. Here we find (among other things) a comparison between a simple parallel algorithm and the serial algorithm that retraces the parallel algorithm's data processing element-by-element. The serial algorithm, in this case, is:
more complex. It has to do more bookkeeping than a simple linear traversal. This additional work can be a small additional source of performance overhead ...
This is indeed correct. The "bread-slicing" example in my "Superlinearity Is Impossible" post would involve slightly more complicated programming than the simple incremented loop that could be easily parallelized using something like TBB.
Here's Herb's conclusion about the serial algorithm that mimics the parallel processing and the parallel algorithm:
when we're comparing the proposed algorithm with simple parallel search, we're not really comparing apples with apples. We are comparing:
- A complex sequential algorithm that has been designed to optimize for certain expected data distributions, and
- A simple parallel algorithm that doesn't make assumptions about distributions, works well for a wide range of distributions, and naturally takes advantage of the special ones the optimized one is trying to exploit ...
So, is superlinearity possible?
Have I been proven wrong? I don't think so. When we talk about "superlinear" speedups, I think it all comes down to a matter of definition: what do we each mean by "superlinear performance"? The comments posted to my "Superlinearity Is Impossible" post revealed that different people think about this quite differently.
My background is physics, and mathematical modeling and simulation. I'm accustomed to thinking in terms of an ideal world that doesn't really exist when I think of equations (a world full of infinitely-long wires, which can be approximated as having an infinitesimal thickness, etc.). I view Amdahl's Law as existing in this type of realm. Hence, I asked Herb to pretend memory access is instantaneous as we discussed whether or not superlinearity is possible.
In this "purist" theoretical way of looking at superlinearity and Amdahl's Law, I still consider superlinearity impossible. I view Amdahl's Law as akin to the "law" of gravity. Just because I see an airplane go up into the air doesn't mean the law of gravity has been broken. Other factors were involved.
The fact that duplication of the parallel algorithm's element-by-element processing can require a more complex serial algorithm means you really aren't comparing apples with apples, as Herb says. Nor can you; this situation seems inescapable.
Indeed, I ran several actual tests tonight on my quad-core system, writing some very simple programs where I eliminated advantages due to memory cache as much as possible. I found many cases where the more complex serial algorithm was slower than the simple algorithm that would have been parallelized; but surprisingly, sometimes my compiler's optimization actually made the more complex serial algorithm run faster than it's simpler cousin! If I turned the optimizer off, then the more complex algorithm always took more time to complete.
So, this is a complex realm. Looking at it all through a practical (i.e., non-purist, not rigidly theoretical) lens, I certainly agree that there are software engineering techniques which, when applied to software that is intended for a parallel environment, will produce programs that will complete their tasks in an amount of time that implies superlinear performance when compared with an equivalent serial algorithm. I do not, however, believe that this means that the computational limits represented by Amdahl's Law are being violated.
The implications for programming education
All this complexity implies that parallel programming might be best considered as being its own unique realm when it comes to performance optimization techniques. Even discounting the standard problems involving thread-safety, race conditions, deadlocks, etc., optimized parallel programming is a very different creature from optimized serial programming. To consider parallel programming as being a "next natural stage" that is somehow in sync with the serial programming most developers already understand is quite possibly a mistake, in varying ways. Software performance optimization within the parallel realm can be very different, as Herb Sutter and other are showing.
So, once again: is superlinear performance through parallelization possible? Can Amdahl's Law be broken? You tell me...
Kevin Farnham, O'Reilly Media TBB Open Source Community, Freenode IRC #tbb, TBB Mailing Lists
| March 20, 2008 5:05 AM PDT
knut Grunwald |
I am a computer scientist. Superlinearity is impossible, if you look at the algorithms. But if you have a real machine, you add registers, cache and memory access paths. This will result in an acceleration, which may make the outcome look like superlinearity. In fact it's only the constant factor of the algorithm complexity, like better optimization or a faster machine. Using parallel algorithms on a single core, multithreaded processor may improve the runtime of a program, by making better use of the processors capabilities. So if you talk about algorithms, superlinearity is impossible. If you talk about a realization, you may achieve a more than count of threads speedup by more efficient use of the underlying hardware. |
| March 20, 2008 9:32 AM PDT
Jules | I'm not a computer scientist or physicist, but it's quite obvious to me that superlinearity is impossible. You can easily simulate a parallel processor on a sequential one such that the slowdown is linear. QED. |
| March 21, 2008 3:52 AM PDT
jmuffat
| The mention of the plane is where I've loved this post! It isn't important if we break gravity or not : what matters is that we travel the sky... |
| March 21, 2008 5:04 AM PDT
jseigh |
Given some set of inputs each of which has a certain probability of occurring, if you sum those inputs multiplied times their probability times the minimum execution time of some set of algorithms and that sum is less than that for any single algorithm, then the former is superlinear. for all i SUM i * p(i) * MIN(g(i), h(i), ...) if less than for all i, f in (g, h, ...) SUM i * p(i) * f(i) then superlinear We're assuming processors are ubiquitous otherwise factor by the amount of reduced resources each algorithm instance gets. Hardware has sort of been doing this for ages. Look at how virtual address translation is done, TLB and page table lookups in parallel. |

mikedeskevich