Don't worry about sublinear speedup, be happy!

Beginning programmers often expect linear speedup in parallel programs, thinking two cores should be twice as fast as one. Of course, this is usually not the case. Limitations such as serial portions, bus bandwidth, cache, communication etc. start to kick in.

For Amdahl's law kicks in, the answer is the Gustafson-Basis Law http://en.wikipedia.org/wiki/Gustafson's_Law - more cores let you handle a large problem. Bus bandwidth, cache limitations, and communication limits can be helped with by improving decompositions. But ultimately most programs do reach a limit, and unless the first limit reached is CPU speed, the speedup will be sublinear in the number of processors (P).

But here's the "be happy" part. If the CPU vendors kept on building more and more complex (and power hungry) serial processors, we would need a lot more power. The relationship is cubic. See, for example, the top slide on page 2 of http://www.netlib.org/utk/people/JackDongarra/SLIDES/dongarra-cetraro-2006.pdf . That requires an 8x power to achieve 2x speedup, and 64x power to get 4x speedup.

In contrast, power for multi-core is proportional to the number of cores (modulo cache power). So if you get "only" 2x speedup on 4 cores, or "only" 4x speedup on 16 cores, you are way ahead of the alternative history where serial processors continued to roam the earth.

Thus sqrt(P) speedup is not bad. Anything more is a free bonus.

有关编译器优化的更完整信息,请参阅优化通知