This article is part of the Intel® Modern Code Developer Community documentation, which helps developers improve application performance through a systematic, step-by-step optimization framework. This article addresses thread-level parallelization.
This is the second article in a series of articles about High Performance Computing with the Intel Xeon Phi.
By Clay P. Breshears, Parallel Applications Engineer
Vectorizing improves performance, and achieving high performance can save power. An introduction to tools for vectorizing compute-intensive processing.
Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on the concepts discussed in Part 1 to consider parallelism, covering both vectorization (single instruction, multiple data, or SIMD) and shared-memory parallelism (threading), as well as distributed-memory computing.
In this article, I discuss some common performance pitfalls in Cilk™ Plus programs that prevent users from seeing speedups in their code, and describe some techniques for avoiding these pitfalls.
Download this guide for developing multithreaded applications, which also includes general topics such as application threading and synchronization.
This article focuses on the steps to improve software performance with vectorization. Included are examples of full applications along with some simpler cases to illustrate the steps to vectorization.
Part one of a five-part series, this article teaches a methodology to interpret statistics gathered during test runs and use those interpretations to improve parallel code.
By Jim Dempsey