This is the first article in a series about High Performance Computing with the Intel Xeon Phi. The Intel Xeon Phi is Intel's first commercial product to incorporate the Many Integrated Core architecture. In this article I will present the basics of the Xeon Phi architecture, the programming models, and how to measure performance in cycles for microbenchmarks.
We have assembled answers to the questions raised during the April session of our Introduction to High Performance Application Development for Intel® Xeon® & Intel® Xeon Phi™ processors class. Some questions were duplicates, and others we could not decipher, either because of the wording or because of implied context that was not spelled out. We tried to address the rest, which appear below:
Typical reductions in OpenMP* involve using an associative operator op to do local reductions, and then using a reduction clause to collect those local reductions. For example, the following code computes a dot product by computing local sums on each thread and then summing them.
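A minimal sketch of such a dot-product reduction in C might look like the following. The function name dot_product and its signature are illustrative assumptions rather than the original listing; only the standard OpenMP reduction clause is relied on. If the file is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* Hypothetical helper (name and signature are assumptions for illustration):
 * returns the dot product of x and y, each holding n elements. */
double dot_product(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    long i;

    /* Each thread accumulates into its own private copy of sum (the local
     * reduction); the reduction clause then combines the private copies
     * with + when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

With GCC, building with the -fopenmp option enables the pragma. Because floating-point addition is not strictly associative, the parallel result can differ from a serial sum in the last bits for general inputs.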
The code used in the examples (Chapters 2-4) in our book can be downloaded from the book's website. We appreciate attribution, but there are no restrictions on use of the code - please use and enjoy! You can use the step-by-step instructions in the book or, if you prefer, we've included a Makefile for each of the chapter examples to make life a little easier.
This document describes a set of source code examples that are available as part of the Intel® Composer 2013 package. These examples demonstrate the basic concepts of offload programming for Intel® Xeon Phi™ coprocessors and are installed to the following default location on Linux*. The default value for <install_dir> is /opt/intel/composer_xe_2013.
Fortran code examples: <install_dir>/Samples/en_US/Fortran/mic_samples/LEO_Fortran_intro/
C code examples: <install_dir>/Samples/en_US/C++/mic_samples/intro_sampleC/