Intel® Developer Zone:
Intel® Xeon Phi™ Coprocessor

Productivity via architecture innovation coupled with familiar software. Intel® Xeon Phi™ coprocessor:

  • Extends hardware support to higher degrees of parallelism with power savings
  • Uses familiar and standard programming models to preserve investments
  • Shares parallel programming models with general-purpose processors
Getting Started
Is Intel® Xeon Phi™ coprocessor right for you?
Intel® Xeon Phi™ Coprocessor Architecture
Site maps: Administrators, Developers, Investigators
Guides & Manuals
Intel® Xeon Phi™ Coprocessor Developer’s Quick Start Guide
Intel® Xeon Phi™ Coprocessor Instruction Set Architecture Reference Manual
System Administration Guide

Parallel programming is a key part of computing's future. Intel processors and coprocessors offer a converged approach, so you can use the same programming models and tools on both.

  • Standards-driven parallel programming models that scale for today and tomorrow
  • Use established development workflows and code base to scale forward
  • Techniques benefit both processors and coprocessors, thereby preserving past and future investments
Programming for Multicore and Many-core Products
Code recipes for Intel® Xeon Phi™ Coprocessor

Programming

Intel® Xeon Phi™ Coprocessor Software Developer’s Guide

Building Native Applications

Programming and Compiling

Cheatsheet: Directives and Functions

Math Kernel Library Automatic Offload

Using Intel® MPI

Using OpenMP* extensions

OpenCL* Design and Programming

System V Application Binary Interface

Differences in Floating-point Arithmetic

Run to Run Reproducibility

Power Analysis and Configuration

Migrating Fortran projects

Debugging

Debugging on Linux*

Debugging on Windows*

Optimization

Optimization – Part 1: Essentials

Optimization – Part 2: Hardware Events

Performance Monitoring Units

Loop Optimization

Best Known Methods for Performance

Software Developer Workshop Videos

A technical guide to the software development environment for the Intel® Xeon Phi™ coprocessor

Name/Description | Programming Language | User Experience Level

BeginningSlides_ExtractedCode.zip
Samples extracted from slides for the Intel® Xeon Phi™ coprocessor Beginning Workshop, including Fortran translations.

C++, Fortran | Beginner

BeginningLabs_FortranVersion.zip
Lab exercises for the Intel® Xeon Phi™ coprocessor Beginning Workshop – Fortran version.

Fortran | Beginner

BeginningLabs_CVersion.zip
Lab exercises for the Intel® Xeon Phi™ coprocessor Beginning Workshop – C++ version.

C/C++ | Beginner

Advanced Workshop Labs
Labs covering more advanced concepts, including Intel® MKL, Intel® MPI, debugging, memory optimization, tuning, and vectorization.

C/C++, Fortran | Advanced

Importance of Vectorization (Fortran example)
To get good performance out of the Intel® Many Integrated Core architecture (Intel® MIC architecture) and systems that include Intel® Xeon Phi™ coprocessors, applications need to take advantage of both the many cores and the 512-bit SIMD registers, which are 16 single-precision (or 8 double-precision) values wide.

Fortran | Advanced

Many Faces of Parallelism
This lab contains a number of examples (Riemann Sums, SGEMM, Fibonacci, Qsort, Cholesky Decomposition, Algorithm, Mandelbrot Set) that walk through the steps from a serial problem to a parallel solution running on the Intel® Xeon Phi™ coprocessor.

C/C++ | Advanced

Intel® SDK for OpenCL* Applications XE Samples

OpenCL | Beginner, Intermediate

iXPTC 2013 Financial Services conference 

C/C++ | Beginner, Intermediate

SHOC MD Lab Exercises
Uses a simple implementation of an n-body pairwise computation, based on the Lennard-Jones potential from molecular dynamics, as an example of porting and optimizing applications.

C/C++ | Beginner, Intermediate
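The SHOC MD lab above works from an n-body pairwise Lennard-Jones computation, and the Importance of Vectorization entry stresses using the SIMD registers. As a minimal sketch (not the SHOC or lab source), the C below computes all-pairs forces and total energy for V(r) = 4*eps*((sig/r)^12 - (sig/r)^6), with the inner loop marked `#pragma omp simd`. The `Particles` structure-of-arrays layout and the name `lj_forces` are hypothetical, there is no cutoff radius or neighbor list, and a compiler without OpenMP support simply ignores the pragma.

```c
#include <stddef.h>

/* Hypothetical structure-of-arrays particle layout. */
typedef struct {
    size_t n;
    const double *x, *y, *z;  /* positions */
    double *fx, *fy, *fz;     /* forces, written by lj_forces */
} Particles;

/* All-pairs Lennard-Jones forces and total potential energy,
   V(r) = 4*eps*((sig/r)^12 - (sig/r)^6), with no cutoff radius. */
double lj_forces(const Particles *p, double eps, double sig)
{
    const double sig2 = sig * sig;
    double energy = 0.0;

    for (size_t i = 0; i < p->n; ++i) {
        const double xi = p->x[i], yi = p->y[i], zi = p->z[i];
        double fxi = 0.0, fyi = 0.0, fzi = 0.0;

        /* Iterations over j are independent apart from the sums,
           so the inner loop is a candidate for SIMD execution. */
        #pragma omp simd reduction(+:fxi, fyi, fzi, energy)
        for (size_t j = 0; j < p->n; ++j) {
            const double dx = xi - p->x[j];
            const double dy = yi - p->y[j];
            const double dz = zi - p->z[j];
            const double r2 = dx * dx + dy * dy + dz * dz;
            if (j != i) {
                const double s2 = sig2 / r2;
                const double s6 = s2 * s2 * s2;
                const double s12 = s6 * s6;
                /* Force on i: 24*eps*(2*s12 - s6)/r^2 times (r_i - r_j). */
                const double fscale = 24.0 * eps * (2.0 * s12 - s6) / r2;
                fxi += fscale * dx;
                fyi += fscale * dy;
                fzi += fscale * dz;
                energy += 4.0 * eps * (s12 - s6);
            }
        }
        p->fx[i] = fxi;
        p->fy[i] = fyi;
        p->fz[i] = fzi;
    }
    /* Every pair (i, j) was visited twice, once from each side. */
    return 0.5 * energy;
}
```

Two particles at distance sig feel a purely repulsive force, and at distance 2^(1/6)*sig the force vanishes and the energy is -eps. The all-pairs loop costs O(N^2); production MD codes typically add a cutoff radius and neighbor lists.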

Structured Parallel Programming: Patterns for Efficient Computation
by Michael McCool, James Reinders, and Arch Robison | Publication Date: July 9, 2012 | ISBN-10: 0124159931 | ISBN-13: 978-0124159938


Intel® Xeon Phi™ Coprocessor High Performance Programming
by Jim Jeffers and James Reinders – Now available!


Parallel Programming and Optimization with Intel® Xeon Phi™ Coprocessors
by Colfax International


Intel® Xeon Phi™ Coprocessor Architecture and Tools - The Guide for Application Developers
by Reza Rahman

This article contains a growing compendium of commonly accessible or downloadable code that can be run on Intel® Xeon Phi™ Coprocessors.

If you have completed an upstream promotion of a community code, please post a thread on the Intel® Many Integrated Core Architecture Forum to let us know, so that we can update this list.

Latest changes: 4/14/2014 -- Added recipe for tHogbom Clean; 3/20/2014 -- Added recipe for running WRF with the conus2.5km benchmark in symmetric mode

Code (in alphabetical order) | Description | Segment | Where to download | Install recipe (if needed)
Embree

Collection of high-performance ray tracing kernels developed at Intel.

Digital Content Creation

http://embree.github.io

See this recipe for how to render the Crown model on the Intel(R) Xeon Phi(tm) Coprocessor
GEMM, STREAM, Linpack

GEMM and Linpack both exercise basic dense matrix operations and target floating-point performance on the coprocessor. STREAM measures memory bandwidth and targets GDDR memory performance.

Academic

These benchmarks are included in the Intel® MPSS download, in optionally installed performance packages that put the benchmarks and related documentation into /opt/intel/mic/perf (MPSS 2.x) or /usr/share/micperf (MPSS 3.1.* releases).

Intel® MPSS 2.1 users: Follow guidance from Chapter 5 of the Intel® MPSS Readme on installation and configuration.

Intel® MPSS 3.1 users: Follow guidance from Chapter 4 of the MPSS_Users_Guide on installation and configuration.

For STREAM, if you prefer to download the source yourself, the compilation and optimization recipe is here.

GTC-P (Gyrokinetic Toroidal Code - Princeton)

The gyrokinetic toroidal code (GTC) is a massively parallel particle-in-cell (PIC) code for turbulence simulation in support of the burning plasma experiment, the crucial next step in the quest for fusion energy. This is a 2D domain decomposition version of the GTC global gyrokinetic PIC code for studying microturbulent core transport.

Academia

Submit a request for the code here.

Follow instructions here to build and run

LBS3D

Simulation tools for multiphase flows based on the free-energy Lattice Boltzmann Method (LBM), important for Computational Fluid Dynamics. The code can simulate quasi-incompressible two-phase flows and uses multiphase models that allow for large density ratios.

Manufacturing

mplabs

Follow compilation instructions here

(See also the whitepaper.)

Mantevo MiniFE

Self-contained, stand-alone mini-application which encapsulates the most significant performance characteristics (generation, assembly, solution) of an implicit finite element method application in C++ code. The physical domain is a three-dimensional box modeled by hexahedral elements (sometimes called "brick" elements). The box is discretized as a structured grid but treated as unstructured. The domain is decomposed for parallel execution using recursive coordinate bisection (RCB).

Academic

mantevo.org > Download

Follow guidance from this MiniFE Case Study to understand what flags/options to use to run MiniFE on host, coprocessor, or both

MPI-HMMER

A version of HMMER, a tool that uses hidden Markov models to analyze protein sequences. In this version, two routines, hmmsearch and hmmpfam, have been modified to use MPI for parallelism.

Academic

http://mpihmmer.org

See this recipe for compilation and optimization

SHOC

The Scalable Heterogeneous Computing Benchmark Suite (SHOC) can be used to measure the performance and stability of coprocessor-based systems. The benchmark has been ported to the Intel® Xeon Phi™ coprocessor using the offload programming constructs of the Intel® Compiler, which is available as part of the Intel® Composer XE 2013 package.

Academic

GitHub

See this recipe for configuration and compilation

tHogbom Clean

Benchmark that implements the kernel of the Hogbom Clean deconvolution algorithm. It is part of the ASKAP benchmark package, which is used to benchmark platforms for the Australian SKA Pathfinder (ASKAP) Science Data Processor.

Astronomy, Academic

GitHub

See this recipe for configuration and compilation
WRF

The Weather Research and Forecasting (WRF) model is a numerical weather prediction system designed to serve both atmospheric research and operational forecasting needs. WRF is used by academic atmospheric scientists, forecast teams at operational centers, application scientists, and others. See http://www.wrf-model.org/index.php for more details about WRF.

Weather, Academic

WRF Users Page

See this recipe for configuration and compilation

See this recipe for running the conus2.5km benchmark in Symmetric mode
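The GEMM, STREAM, Linpack entry in the table above describes STREAM as a test of memory bandwidth. As a minimal sketch of how a STREAM-style triad is timed (not the STREAM source and not the MPSS micperf version), the C below times a[i] = b[i] + q*c[i] and converts the best time to GB/s, counting 3 * 8 bytes per element (two loads and one store), which is how STREAM counts triad traffic. It assumes C11 `timespec_get` for wall-clock timing; `triad` and `triad_bandwidth_gbs` are hypothetical names.

```c
#include <stddef.h>
#include <time.h>

/* STREAM-style triad: a[i] = b[i] + q * c[i]. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double q, size_t n)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; ++i)
        a[i] = b[i] + q * c[i];
}

/* Wall-clock seconds via C11 timespec_get. */
static double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
}

/* Best-of-ntrials triad bandwidth in GB/s (1e9 bytes per second),
   counting 3 * sizeof(double) bytes per element: two loads, one store. */
double triad_bandwidth_gbs(double *a, const double *b, const double *c,
                           size_t n, int ntrials)
{
    double best = -1.0;
    for (int t = 0; t < ntrials; ++t) {
        const double t0 = wall_seconds();
        triad(a, b, c, 3.0, n);
        const double dt = wall_seconds() - t0;
        if (best < 0.0 || dt < best)
            best = dt;
    }
    return 3.0 * sizeof(double) * (double)n / best / 1e9;
}
```

For a meaningful figure, make the arrays much larger than the last-level cache, initialize them before timing so their pages are already mapped, and report the best of several trials, as STREAM does.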

This is a compendium of success stories and publications that we will add to periodically as we hear about how Intel® Xeon Phi™ coprocessor technology was applied to solve particular problems or how performance was obtained. Most of these are external publications.

    General

    Computational Physics and Astrophysics

    Energy

    Financial Services

    Health and Life Sciences

    Molecular Dynamics

    Weather and Climate Forecasting

    Do you have a success story to share? Please let us know by posting a topic in our Community Forum with details about where to find it and what industry or topic area it targets (if specific to a particular line of business).

Latest Posts

    List of Useful Power and Power Management Articles, Blogs and References
    By Taylor Kidd (Intel), posted 04/17/2014
    INTRODUCTION AND PURPOSE: This article endeavors to provide a single point of reference to Power Management blogs, articles and other resources relevant to the Intel® Xeon Phi™ coprocessor. There are many excellent resources out there on power, power management and tools; this article cannot ho...
    Power Management States: P-States, C-States, and Package C-States
    By Taylor Kidd (Intel), posted 04/17/2014
    (For a PDF version of this article, download the attachment.) Contents: Preface: What, Why and from Where; Chapter 1: Introduction and inquiring minds; Chapter 2: P-States, Reducing power consumption without impacting performance; Chapter 3: Core C-States, The Details; Chapter 4: ...
    Resolving Symbols for Intel® Manycore Platform System Stack (Intel® MPSS) in Intel® VTune™ Amplifier XE Analysis
    By Sumedh Naik (Intel), posted 04/09/2014
    Background Whenever Intel VTune Amplifier XE is unable to resolve symbols for libraries or the operating system, it lumps all the counts for that module together. Often, these lumped counts end up at the top of the hotspot list, skewing the analysis. By setting the correct search library path in...
    Recipe: Building and Optimizing the Hogbom Clean Benchmark for Intel® Xeon Phi™ Coprocessors
    By Sumedh Naik (Intel), posted 04/09/2014
    Overview: This article provides a recipe for compiling and running the Hogbom Clean benchmark for the Intel® Xeon Phi™ coprocessor and discusses the various optimizations applied to the code. Introduction: Hogbom Clean is a part of the ASKAP benchmark package. The ASKAP benchmark package is use...
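The Hogbom Clean recipe post above benchmarks the kernel of the Hogbom Clean deconvolution algorithm. As a minimal textbook sketch (not the ASKAP tHogbomClean source), the C below repeatedly finds the peak of the residual image, adds a gain-scaled component to the model, and subtracts a gain-scaled copy of the point spread function (PSF) centered on that peak, stopping when the peak falls below a threshold. It assumes row-major float images and a PSF normalized to a peak of 1.0; `hogbom_clean`, `find_peak`, and their parameters are hypothetical names.

```c
/* Locate the pixel with the largest absolute value in a w x h row-major image. */
static void find_peak(const float *img, long w, long h,
                      long *px, long *py, float *val)
{
    long best = 0;
    float best_abs = -1.0f;
    for (long k = 0; k < w * h; ++k) {
        const float a = img[k] < 0.0f ? -img[k] : img[k];
        if (a > best_abs) {
            best_abs = a;
            best = k;
        }
    }
    *px = best % w;
    *py = best / w;
    *val = img[best];
}

/* Textbook Hogbom CLEAN.
   residual: dirty image on entry, residual image on exit (w x h)
   model:    clean components are added here (w x h, zeroed by the caller)
   psf:      point spread function (pw x ph), assumed to peak at 1.0
   Returns the number of components subtracted. */
int hogbom_clean(float *residual, float *model, long w, long h,
                 const float *psf, long pw, long ph,
                 float gain, float threshold, int max_iters)
{
    long pcx, pcy;
    float psf_peak;
    find_peak(psf, pw, ph, &pcx, &pcy, &psf_peak); /* PSF center */

    int it = 0;
    for (; it < max_iters; ++it) {
        long px, py;
        float val;
        find_peak(residual, w, h, &px, &py, &val);
        if ((val < 0.0f ? -val : val) < threshold)
            break; /* converged */

        const float amp = gain * val;
        model[py * w + px] += amp;

        /* Subtract amp * PSF, shifted so the PSF peak lands on (px, py). */
        for (long j = 0; j < ph; ++j) {
            const long y = py + j - pcy;
            if (y < 0 || y >= h)
                continue;
            for (long i = 0; i < pw; ++i) {
                const long x = px + i - pcx;
                if (x < 0 || x >= w)
                    continue;
                residual[y * w + x] -= amp * psf[j * pw + i];
            }
        }
    }
    return it;
}
```

Each iteration is a full-image peak search followed by a PSF-sized update, so both loops are natural candidates for threading and vectorization on the coprocessor.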

    Intel(R) Xeon Phi(tm) Coprocessor -- Cluster training - call for demand!
    By BELINDA L. (Intel)
    Intel is considering offering a 4-hour web-based basic tutorial covering the fundamental principles of integrating an Intel Xeon Phi coprocessor into a Linux-based cluster. During the course, each attendee would have remote access to a Linux server and be able to perform each step shown in the outline below. The course will be given free of charge. Requirements are an Internet connection, a web browser, and PuTTY. We are settling on the sharing technology we will use and will publish that at a later date. If you are interested in such an offer, please reply to this forum thread (you can reply privately if you don't want to be identified). If we have enough interest, we'll pull it together! Topics: Finding information on the Intel Xeon Phi coprocessor on the web; Downloading the driver software; Unpacking the driver software package, with an explanation of its components; Discussion of the prerequisites of the compute server (for instance what software needs to be installed, reserv...
    Invitation to evaluate Intel® MKL Sparse Matrix Vector Multiply Format Prototype Package for Intel® Xeon Phi™ coprocessors
    By Zhang Z (Intel)
    We are seeking interested parties to evaluate Intel® MKL SpMV Format Prototype Package for Intel® Xeon Phi™ coprocessors. Sparse Matrix Vector Multiply (SpMV) is an important operation in many scientific applications, and its performance can be a critical part of overall application performance. On Intel® Xeon Phi™ coprocessors, Intel® MKL 11.0 and later provide highly-tuned SpMV kernels for the compressed sparse row (CSR) sparse matrix storage format. But the existing standard (NIST*) sparse BLAS interface has limitations that prevent us from realizing further performance improvements, especially for matrices with non-uniform sparsity structures. The Intel® MKL SpMV Format Prototype Package tries to address these limitations by introducing a new interface that supports a staged approach: First, the input matrix structure is analyzed and an appropriate computational kernel and workload balancing algorithm are chosen. Then, repeated SpMV calls can be made for matrices of the same st...
    MICRAS Log User Guide
    By MARC B. (Intel)
    The attached document describes how to interpret the messages in the micras.log file.
    Flash Issues & Remedies
    By MARC B. (Intel)
    The attached document describes some common issues and questions that have been reported and how they might be addressed.
    New Tools: Simple Performance Tools for the Intel® Xeon® processor line and the Intel® Xeon Phi™ coprocessor
    By Sumedh Naik (Intel)
    Larry Meadows from Intel Corporation has developed two simple tools for the Intel® Xeon® processor line as well as the Intel® Xeon Phi™ coprocessor that allow a user to determine how well their application is using the machine. Speedometer: Speedometer measures the resource usage of a system while running an application and reports that usage as a percentage of the peak value of the corresponding resource. The resources that are tracked include memory bandwidth, instruction bandwidth, and vector or floating-point unit use. Average values for each resource are reported after the program executes. It is also possible to record the resource usage over time, and GUI tools are provided to plot such recordings. Speedometer is intended to give you a general idea of how well your code is using the system. Overhead: Overhead uses statistical profiling to determine how the application's CPU time is allocated. The hardware periodically interrupts the application and saves the current instruc...
    Troubleshooting HOWTO: Bad hardware? MPSS? Configuration?
    By BELINDA L. (Intel)
    Are you having problems with your hardware (cannot see your Intel(R) Xeon Phi(tm) coprocessor? sporadic accessibility?) or with running the Intel(R) Manycore Platform Software Stack (Intel(R) MPSS) reliably? Attached to this post is a PDF "flowchart" that explains how you can troubleshoot the problem (note: this applies if you are running the Linux* operating system on your host) and shows what information you will want to collect if you need to escalate your issue to your OEM provider or Intel. We hope this is useful to you! Please let us know if you find a boundary condition that this "flow" does not handle properly.
    What collateral/documentation do you want to see?
    By BELINDA L. (Intel)
    Do you have questions that you are not finding the answers to in our documentation? Do you need more training or source code examples, and on what specifically? Help us understand what's missing so that we can make sure we develop the documentation you care about (what is important, and what is nice to have)! Thank you
    FAQS: Compilers, Libraries, Performance, Profiling and Optimization.
    By Sumedh Naik (Intel)
    In the period prior to the launch of the Intel® Xeon Phi™ coprocessor, Intel collected questions from developers who had been involved in pilot testing. This document contains some of the most common questions asked. Additional information and Best Known Methods for the Intel Xeon Phi coprocessor can be found here. The Intel® Compiler reference guides can be found at: C/C++: http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/cpp-lin/index.htm Fortran: http://software.intel.com/sites/products/documentation/doclib/stdxe/2013/composerxe/compiler/fortran-lin/index.htm Addendum: http://software.intel.com/sites/default/files/article/327178/intelmpi4.1-releasenotes-linux-addendum-for-mic.pdf The Intel® Math Kernel Library (Intel® MKL) reference guide can be found at: http://software.intel.com/sites/products/documentation/doclib/mkl_sa/11/mkl_userguide_lnx/index.htm ...
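The Intel® MKL SpMV invitation above concerns sparse matrix-vector multiply in the compressed sparse row (CSR) format. For reference, a plain CSR SpMV looks like the sketch below; `csr_spmv` is a hypothetical name with zero-based indices, and this is neither the MKL interface nor its tuned kernel. Splitting rows across threads is simple, but when row lengths vary widely the per-thread work becomes uneven, which is one reason matrices with non-uniform sparsity structures are harder to run efficiently.

```c
#include <stddef.h>

/* y = A * x for an m-row matrix A stored in zero-based CSR form:
   row i's nonzeros are vals[row_ptr[i] .. row_ptr[i+1]-1], and
   col_idx holds their column indices at the same positions. */
void csr_spmv(size_t m, const size_t *row_ptr, const size_t *col_idx,
              const double *vals, const double *x, double *y)
{
    /* Rows are independent, so they can be divided among threads;
       the work per row is proportional to its nonzero count. */
    #pragma omp parallel for
    for (size_t i = 0; i < m; ++i) {
        double sum = 0.0;
        for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += vals[k] * x[col_idx[k]];
        y[i] = sum;
    }
}
```

For the 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]], the CSR arrays are row_ptr = {0,2,3,5}, col_idx = {0,2,1,0,2}, and vals = {1,2,3,4,5}; with x = {1,2,3} the result is y = {7,6,19}.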

    Performance BKMs: Introduction and Super-secret Intel Tools
    By Taylor Kidd (Intel), posted 03/27/2014
    At SC13 (Super Computing 2013)*, someone commented that Intel seems to have some super-secret set of tricks in its pocket, allowing us to optimize “far beyond those of mortal man”+. We don’t really have any super-secret tricks. Even if we did, we wouldn’t use them. We want mortal man (you) to be ...
    Notification: Update to Resource Guides for Developer and Administrator published
    By Taylor Kidd (Intel), posted 03/26/2014
    Hi all, I just wanted to let whoever is listening know that I just published updates to the Resource Guide for Intel® Xeon Phi™ Coprocessor Developers and Resource Guide for Intel® Xeon Phi™ Coprocessor Administrators documents. -- Taylor
    The Chronicles of Phi - part 5 - Plesiochronous phasing barrier – tiled_HT3
    By jimdempseyatthecove, posted 03/25/2014
    For the next optimization, I knew what I wanted to do; I just didn’t know what to call it. In looking for words that describe loosely synchronous, I came across plesiochronous: In telecommunications, a plesiochronous system is one where different parts of the system are almost, but not quite, p...
    BKMs on the use of the SIMD directive
    By Taylor Kidd (Intel), posted 03/25/2014
    We had a request from one of the various “Birds of a Feather” meetings Intel® holds at venues such as the Super Computing* (SC) and International Super Computing* (ISC) conferences. The customer wanted to know the BKMs (Best Known Methods) for proper usage of the new OpenMP* 4.0 / Intel® Cilk™ Plu...
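The BKMs post above concerns the OpenMP* 4.0 SIMD directive. As a minimal sketch (not taken from that post), the C below vectorizes a midpoint-rule Riemann sum for pi, the integral of 4/(1+x^2) over [0,1]: `#pragma omp declare simd` asks the compiler to also generate a vector version of the integrand, and `#pragma omp simd reduction(+:sum)` lets the loop run across SIMD lanes with per-lane partial sums. The names `integrand` and `pi_midpoint` are hypothetical. Compile with OpenMP SIMD support enabled (for example GCC's `-fopenmp-simd`); otherwise the pragmas are ignored and the loop runs as ordinary serial code.

```c
/* Integrand whose integral over [0, 1] is pi. 'declare simd' requests
   an additional vector version callable from SIMD loops. */
#pragma omp declare simd
static double integrand(double x)
{
    return 4.0 / (1.0 + x * x);
}

/* Midpoint-rule Riemann sum with n intervals. The simd construct allows
   the loop to be vectorized; reduction(+:sum) combines per-lane sums. */
double pi_midpoint(long n)
{
    const double h = 1.0 / (double)n;
    double sum = 0.0;
    #pragma omp simd reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += integrand(((double)i + 0.5) * h);
    return sum * h;
}
```

Because the reduction reassociates the additions, the vectorized result can differ in the last bits from a serial sum, the kind of effect covered by the floating-point arithmetic and run-to-run reproducibility guides listed earlier on this page.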
