To improve the performance of applications and kernels, we are constantly searching for novel Best Known Methods, or BKMs, but as our searches grow more esoteric, it is important to keep the basics in mind and to remember how many performance improvements rely on them. This article describes some common BKMs for improving parallel performance and shows their application across this spectrum of processor architectures. The advice collected here should help you speed up your code, whether it runs on an Intel® Xeon Phi™ coprocessor or an Intel Xeon processor.
The lecture given here is the eleventh (and penultimate) part in the “Introduction to Parallel Programming” video series. This part starts by explaining why less-than-optimal serial algorithms can be easier to parallelize. The concepts of temporal and data locality are defined, and the lecture explains why maximizing these within parallel programs pays performance dividends. The latter part of the lecture demonstrates how loop fusion, loop fission, and loop inversion can be used to create or improve opportunities for parallel execution.
The lecture given here is the tenth part in the “Introduction to Parallel Programming” video series. This part offers definitions for the performance metrics speedup and efficiency, using a fence-painting example to illustrate how to compute them. The use of Amdahl’s Law to predict maximum speedup is explained, along with the derivation of the model. The lecture also explains why Amdahl’s Law is overly optimistic in its prediction of possible speedup.
Running time: 15:03