# Superlinearity and Algorithmic Complexity; or, My Interesting Conversation with Herb Sutter

In my recent "Superlinearity Is Impossible; We Just Don't Always Think Correctly" I argued that superlinearity arising from algorithmic processing alone is impossible. A parallel application might appear to achieve superlinearity, but that appearance is due to factors other than the algorithm itself engendering a more efficient computation. For example, memory access might be more efficient in the parallel algorithm. But this would not mean the limits defined by Amdahl's Law had been surpassed. I agreed with the statement Herb Sutter makes in his "Going Superlinear" article in the February Dr. Dobb's Journal:
> "But wait," someone could complain, "your example so far is unfair because you've stacked the deck. The truth is that, when you find a superlinear speedup, what you've really found is an inefficiency in the sequential algorithm."
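For concreteness, Amdahl's Law caps the speedup available from N processors at 1 / (s + (1 − s)/N), where s is the fraction of the work that must remain serial. A minimal sketch (the function name is mine, not from either article):

```python
def amdahl_speedup(serial_fraction, n_processors):
    """Amdahl's Law bound: S(N) = 1 / (s + (1 - s) / N)."""
    s = serial_fraction
    return 1.0 / (s + (1.0 - s) / n_processors)

# Even a 10% serial fraction caps a quad-core at roughly 3.08x,
# and a perfectly parallel job (s = 0) at exactly N.
```

Within this idealized model the bound never exceeds N, which is the sense in which I have been calling superlinearity impossible.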

## When parallelism is simpler

I had the good fortune to be able to speak with Herb Sutter today about these issues. We found ourselves in agreement in most areas. We agree, for example, that memory issues can make an enormous performance difference when you compare an algorithm run in parallel with a serial algorithm that retraces the data processing of the parallel algorithm in an element-by-element manner.

But even when I asked Herb to ignore memory issues (which is where I believe the greatest advantages will lie for some parallel algorithms), he remained quite adamant that superlinearity is possible. How so? Well, Herb mentioned things like algorithmic complexity, and the known or unknown characteristics of the data. As I pondered this, Herb asked that I take another look at the last section of his "Going Superlinear" article. Here we find (among other things) a comparison between a simple parallel algorithm and the serial algorithm that retraces the parallel algorithm's data processing element by element. The serial algorithm, in this case, is:
> more complex. It has to do more bookkeeping than a simple linear traversal. This additional work can be a small additional source of performance overhead ...

This is indeed correct. The "bread-slicing" example in my "Superlinearity Is Impossible" post would involve slightly more complicated programming than the simple incremented loop that could be easily parallelized using something like TBB.

Here's Herb's conclusion about the serial algorithm that mimics the parallel processing and the parallel algorithm:
> when we're comparing the proposed algorithm with simple parallel search, we're not really comparing apples with apples. We are comparing:
>
> • A complex sequential algorithm that has been designed to optimize for certain expected data distributions, and
>
> • A simple parallel algorithm that doesn't make assumptions about distributions, works well for a wide range of distributions, and naturally takes advantage of the special ones the optimized one is trying to exploit ...
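The extra bookkeeping in question can be made concrete. Here is a minimal sketch (my own illustration, not Herb's code) of a plain sequential search next to a serial algorithm that mimics a 4-way parallel search by advancing one step in each chunk per round — note how much more index machinery the mimic needs:

```python
def simple_search(data, target):
    # Plain left-to-right scan: the "simple" sequential algorithm.
    for i, x in enumerate(data):
        if x == target:
            return i
    return -1

def interleaved_search(data, target, n_ways=4):
    # Serial mimic of an n-way parallel search: visit one element from
    # each chunk per round. The extra offsets and bounds checks are the
    # bookkeeping overhead the quoted passage describes.
    chunk = (len(data) + n_ways - 1) // n_ways
    offsets = [w * chunk for w in range(n_ways)]
    progressed = True
    while progressed:
        progressed = False
        for w in range(n_ways):
            i = offsets[w]
            if i < min((w + 1) * chunk, len(data)):
                if data[i] == target:
                    return i
                offsets[w] = i + 1
                progressed = True
    return -1
```

If the target sits early in a late chunk, the interleaved traversal finds it after far fewer comparisons than the plain scan — the data-distribution effect behind Herb's superlinear examples.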

## So, is superlinearity possible?

Have I been proven wrong? I don't think so. When we talk about "superlinear" speedups, I think it all comes down to a matter of definition: what do we each mean by "superlinear performance"? The comments posted to my "Superlinearity Is Impossible" post revealed that different people think about this quite differently.

My background is physics, and mathematical modeling and simulation. When I think of equations, I'm accustomed to thinking in terms of an ideal world that doesn't really exist (a world full of infinitely long wires of infinitesimal thickness, and the like). I view Amdahl's Law as existing in this type of realm. Hence, I asked Herb to pretend memory access is instantaneous as we discussed whether or not superlinearity is possible.

In this "purist" theoretical way of looking at superlinearity and Amdahl's Law, I still consider superlinearity impossible. I view Amdahl's Law as akin to the "law" of gravity. Just because I see an airplane go up into the air doesn't mean the law of gravity has been broken. Other factors were involved.

The fact that duplication of the parallel algorithm's element-by-element processing can require a more complex serial algorithm means you really aren't comparing apples with apples, as Herb says. Nor can you; this situation seems inescapable.

Indeed, I ran several actual tests tonight on my quad-core system, writing some very simple programs in which I eliminated memory-cache advantages as much as possible. I found many cases where the more complex serial algorithm was slower than the simple algorithm that would have been parallelized; but surprisingly, sometimes my compiler's optimization actually made the more complex serial algorithm run faster than its simpler cousin. If I turned the optimizer off, the more complex algorithm always took more time to complete.
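My test programs were throwaway code, but the shape of the experiment can be sketched. Assume a simple traversal and a "more complex" strided serial mimic of 4-way processing, timed with a best-of-N wall clock (all names here are hypothetical, not my original programs):

```python
import time

def time_it(fn, *args, repeats=5):
    # Best-of-N wall time; taking the minimum damps scheduler noise.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - t0)
    return best

def linear_max(data):
    # Simple traversal: the loop a parallelizer would split.
    m = data[0]
    for x in data:
        if x > m:
            m = x
    return m

def strided_max(data, n_ways=4):
    # Serial mimic of 4-way processing: one pass per stride offset.
    # The extra loop and index arithmetic is the "bookkeeping" cost.
    m = data[0]
    for start in range(n_ways):
        for i in range(start, len(data), n_ways):
            if data[i] > m:
                m = data[i]
    return m
```

Comparing `time_it(linear_max, data)` against `time_it(strided_max, data)` reproduces the shape of the comparison, though which variant wins depends on the runtime and optimizer — which is exactly what my tests showed.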

So, this is a complex realm. Looking at it all through a practical (i.e., non-purist, not rigidly theoretical) lens, I certainly agree that there are software engineering techniques which, applied to software intended for a parallel environment, will produce programs whose completion times imply superlinear performance when compared with an equivalent serial algorithm. I do not, however, believe that this means the computational limits represented by Amdahl's Law are being violated.

## The implications for programming education

All this complexity implies that parallel programming might best be considered its own unique realm when it comes to performance optimization techniques. Even setting aside the standard problems of thread safety, race conditions, deadlocks, etc., optimized parallel programming is a very different creature from optimized serial programming. To treat parallel programming as a "next natural stage" that follows naturally from the serial programming most developers already understand is quite possibly a mistake. Software performance optimization within the parallel realm can be very different, as Herb Sutter and others are showing.

So, once again: is superlinear performance through parallelization possible? Can Amdahl's Law be broken? You tell me...

*Kevin Farnham, O'Reilly Media TBB Open Source Community, Freenode IRC #tbb, TBB Mailing Lists*