This document describes the process for taking the Intel® Xeon Phi™ processor from the point where the hardware has been received up to the point where the processor is ready to be used by the programmer.
This document does:
- Provide a high-level overview of the architecture of the processor, focusing on those parts of the architecture that differ from other Intel® processors.
- Provide configuration options specific to the Intel Xeon Phi processor.
This document does not:
- Provide information on basic system administration.
- Provide information on optimizing code for the Intel Xeon Phi processor.
Related documentation and resources:
- Intel® 64 and IA-32 Architectures Software Developer Manuals: Documents the model-specific registers (MSRs) for the Intel® Xeon Phi™ x200 product family; documentation for the instructions specific to the processor will be added in a later revision.
- Intel Xeon Phi Processor Software Optimization Manual: Documents important features of the Intel Xeon Phi x200 product family and how to take advantage of them.
- Intel Xeon Phi Processor Performance Monitoring Reference Manual: Documents the performance monitoring registers and events for the Intel Xeon Phi processor.
- Intel Xeon Phi Processor Software: A set of software and utilities that enable functionalities of the Intel Xeon Phi x200 product family.
- Intel Xeon Phi x200 Product Family “micperf” User Guide: Included with the Intel Xeon Phi Processor Software release.
- Intel Xeon Phi Processor Software User’s Guide: Included with the Intel Xeon Phi Processor Software release.
- Memkind and jemalloc libraries and documentation project: Source code and documentation for memkind and jemalloc libraries.
- Intel Xeon Phi Product Family
- Intel Xeon Phi Processor SKU Details
Basic system architecture
The Intel® Xeon Phi™ x200 product family is the second-generation Intel Xeon Phi product. It is a many-core processor based on modern Intel Atom® microarchitecture with considerable High Performance Computing (HPC)-focused improvements. As shown in Figure 1, it has a maximum of 72 cores with 4 threads per core, giving a total of 288 CPUs as viewed by the operating system. The cores are laid out in units called tiles. Each tile contains a pair of cores, a shared L2 cache, and a hub connecting the tile to the interprocessor interconnect.
Figure 1. Intel® Xeon Phi™ processor architecture.
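As a quick sanity check on the core and thread counts described above, the following Python sketch computes the expected number of tiles and logical CPUs for a given core count and prints what the operating system reports. The function name `expected_topology` and the default of 72 cores are illustrative assumptions; parts with fewer active cores will report fewer CPUs.

```python
import os

CORES_PER_TILE = 2      # each tile holds a pair of cores (from the text above)
THREADS_PER_CORE = 4    # four hardware threads per core (from the text above)


def expected_topology(cores=72):
    """Return (tiles, logical_cpus) for a processor with `cores` cores."""
    if cores <= 0 or cores % CORES_PER_TILE != 0:
        raise ValueError("core count must be a positive multiple of 2")
    return cores // CORES_PER_TILE, cores * THREADS_PER_CORE


if __name__ == "__main__":
    tiles, cpus = expected_topology(72)
    print(f"expected: {tiles} tiles, {cpus} logical CPUs")
    print(f"reported by the OS: {os.cpu_count()} logical CPUs")
```

If the two CPU counts differ, the installed part may simply have fewer than 72 active cores, or hardware threads may be disabled in the BIOS.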
Major architectural innovations include the addition of on-package MCDRAM and clustering modes.
MCDRAM is high-bandwidth memory located in the same package as the processor. It can be configured in one of three modes: cache mode, flat mode, or hybrid mode. In cache mode, the MCDRAM is used as an L3 cache; in flat mode, it is used as additional addressable memory; in hybrid mode, a portion of each unit of MCDRAM is used as L3 cache with the remainder being used as additional addressable memory. The MCDRAM configuration is set at boot time and cannot be changed without a reboot.
When all or part of the MCDRAM is used as an L3 in-memory cache, access to the cache is transparent to software and requires no code modifications on the part of the user. When all or part of the MCDRAM is used as flat memory, it can be used transparently, as an extension of the DDR memory address space, or non-transparently, by explicitly allocating space in MCDRAM with the hbwmalloc library; the two approaches can also be combined. Transparent use brings no automatic performance gain. To get performance gains, software must be aware of the increased bandwidth of MCDRAM and use it effectively for bandwidth-critical data structures.
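One way to confirm after boot that flat or hybrid mode took effect is to look for NUMA nodes that have memory but no CPUs. The Python sketch below assumes a Linux system where addressable MCDRAM is exposed as such a CPU-less node under `/sys/devices/system/node`; that exposure is not described in this document, and the function names are illustrative.

```python
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist such as '0-3,8' into a list of CPU numbers."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def memory_only_nodes(sysfs_root="/sys/devices/system/node"):
    """Return NUMA node ids with no CPUs attached (candidate MCDRAM nodes)."""
    if not os.path.isdir(sysfs_root):
        return []
    result = []
    for entry in os.listdir(sysfs_root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue
        try:
            with open(os.path.join(sysfs_root, entry, "cpulist")) as handle:
                cpus = parse_cpulist(handle.read())
        except OSError:
            continue
        if not cpus:
            result.append(int(match.group(1)))
    return sorted(result)


if __name__ == "__main__":
    print("CPU-less NUMA nodes:", memory_only_nodes())
```

An empty result is consistent with cache mode, where MCDRAM is not addressable. On a system that does list such a node, a tool such as `numactl --membind=<node>` can direct an unmodified program's allocations to it.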
Clustering refers to dividing the available cores into contiguous blocks of cores, called clusters. The clustering mode affects the memory latency between the tiles and MCDRAM, and therefore affects performance. The three clustering modes are: Quadrant, Sub-NUMA, and All-to-All.
The default clustering mode is Quadrant. Quadrant mode, which divides the cores into four sections called quadrants and attempts to decrease intra-process communication time by keeping all threads of a single process close together, provides good overall performance and is transparent to the programmer. Sub-NUMA clustering mode, which attempts to increase memory performance by keeping shared memory accesses to MCDRAM closer to the quadrant where the request originated, offers the possibility of greater performance but requires software redesign to achieve it. All-to-All mode does not divide the cores into multiple clusters and is generally used only as a fail-safe mode.
The cluster mode is set at boot time and cannot be changed without rebooting. Both Quadrant and Sub-NUMA clustering modes require that the same number of equal-capacity DIMMs be installed on each memory controller; if this condition is not met, All-to-All mode is automatically selected as a fallback.
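The fallback rule can be expressed as a small check that an administrator might run against a planned DIMM layout before ordering or installing memory. This Python sketch is only a model of the rule as stated above, not firmware behavior; the function name and input format are hypothetical, and it assumes that "equal capacity" means every installed DIMM has the same size.

```python
def effective_cluster_mode(requested, dimm_sizes_per_controller):
    """Return the cluster mode expected after boot.

    `dimm_sizes_per_controller` holds one list per memory controller with
    the capacity (in GB) of each DIMM installed on that controller.
    """
    modes = {"Quadrant", "Sub-NUMA", "All-to-All"}
    if requested not in modes:
        raise ValueError(f"unknown cluster mode: {requested!r}")
    if requested == "All-to-All":
        return requested
    # Quadrant and Sub-NUMA need the same number of equal-capacity DIMMs
    # on every memory controller.
    all_dimms = [size for ctrl in dimm_sizes_per_controller for size in ctrl]
    same_count = len({len(ctrl) for ctrl in dimm_sizes_per_controller}) == 1
    same_capacity = len(set(all_dimms)) == 1
    if same_count and same_capacity:
        return requested
    return "All-to-All"
```

For example, two controllers with three 16 GB DIMMs each keep the requested Quadrant mode, while removing one DIMM from either controller yields All-to-All.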
New Instruction Set
The Intel Xeon Phi x200 product family uses the standard Intel® Architecture (IA) instruction set architecture (ISA), similar to that of other Intel® processors, including 256-bit Intel® Advanced Vector Extensions (Intel® AVX) and Intel® Advanced Vector Extensions 2 (Intel® AVX2).
The Intel Xeon Phi x200 product family also supports Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions. Each core has two vector processing units (VPUs) that operate on 512-bit vector registers. Four subsets of Intel AVX-512 instructions are available in the Intel Xeon Phi processor:
- AVX512-F: Fundamental instruction set
- AVX512-CD: Conflict Detection instruction set
- AVX512-ER: Exponential and Reciprocal instruction set
- AVX512-PF: Prefetch instruction set
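On Linux, the four subsets listed above can be checked against the CPU feature flags that the kernel reports. The sketch below assumes the flag names `avx512f`, `avx512cd`, `avx512er`, and `avx512pf` as they appear in the `flags` line of `/proc/cpuinfo`; the function name is illustrative, and a kernel without Intel AVX-512 support will not report these flags even on capable hardware.

```python
KNL_AVX512_FLAGS = {
    "avx512f": "Fundamental",
    "avx512cd": "Conflict Detection",
    "avx512er": "Exponential and Reciprocal",
    "avx512pf": "Prefetch",
}


def avx512_subsets(cpuinfo_text):
    """Map each of the four subsets to True/False based on /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            break  # every logical CPU reports the same flags line
    return {name: name in flags for name in KNL_AVX512_FLAGS}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            report = avx512_subsets(handle.read())
    except OSError:
        report = {}
    for name, present in report.items():
        label = KNL_AVX512_FLAGS[name]
        print(f"{name:9s} {label:27s} {'yes' if present else 'no'}")
```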
Installing the Software Stack
The Intel Xeon Phi processor should be able to use any operating system that executes on other Intel processors, as long as that operating system supports Intel AVX-512 registers. Table 1 lists some of the latest operating systems that can run on the Intel Xeon Phi processor.
|Operating System|Enabling Status|
|CentOS* 7.3 64-bit|kernel 3.10.0-514|
|CentOS 7.2 64-bit|kernel 3.10.0-327|
|Red Hat* Enterprise Linux* Server 7.3 64-bit|kernel 3.10.0-514|
|Red Hat Enterprise Linux Server 7.2 64-bit|kernel 3.10.0-327|
|SUSE* Linux Enterprise Server SLES 12 SP2|kernel 4.4.21-69-default|
|SUSE Linux Enterprise Server SLES 12 SP1|kernel 3.12.49-11-default|
Table 1. Operating system support for the Intel® Xeon Phi™ processor.
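When preparing many nodes, it can help to compare each node's running kernel with the versions in Table 1. The following Python sketch copies the kernel versions from the table and treats them as minimums, which is an assumption: the table lists the kernels that were enabled, not a formal lower bound. The dictionary keys and function names are illustrative.

```python
import re

# Kernel versions copied from Table 1.
TABLE_1_KERNELS = {
    "CentOS 7.3": "3.10.0-514",
    "CentOS 7.2": "3.10.0-327",
    "Red Hat Enterprise Linux Server 7.3": "3.10.0-514",
    "Red Hat Enterprise Linux Server 7.2": "3.10.0-327",
    "SUSE Linux Enterprise Server 12 SP2": "4.4.21-69",
    "SUSE Linux Enterprise Server 12 SP1": "3.12.49-11",
}


def kernel_key(release):
    """Turn '3.10.0-514.el7.x86_64' into (3, 10, 0, 514) for comparison."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_table_1(distro, release):
    """True if `release` is at least the Table 1 kernel for `distro`."""
    return kernel_key(release) >= kernel_key(TABLE_1_KERNELS[distro])
```

The running kernel release can be obtained with `platform.release()` from the Python standard library and passed as the `release` argument.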
Although the processor should be able to run any operating system that supports the IA ISA, including the Intel AVX-512 registers, Intel validates against a limited number of operating systems. The initial validation is being done against Red Hat* Enterprise Linux* (RHEL) 7, CentOS* 7, and SUSE* Linux Enterprise Server (SLES) 12. As delivered, these distributions do not support the Intel Xeon Phi processor; they require patches that are distributed as part of Intel Xeon Phi Processor Software. To obtain them, go to the Intel® Developer Zone page for the Intel Xeon Phi processor and, under “Software and Tools,” select “Intel Xeon Phi Processor Software.” Then download and install the package that matches the operating system installed on your system; the supported operating systems are listed in Table 1.
After installing the software stack, you can configure the Cluster and Memory Modes on your system. Please refer to the following article for information related to configuring Cluster and Memory modes supported by the Intel Xeon Phi processor.
Intel provides the micperf tool for monitoring and evaluating the Intel Xeon Phi processor. micperf is designed to incorporate a variety of benchmarks into a simple user experience with a single interface for execution.
This tool and its documentation can be obtained from the Intel Xeon Phi Processor Software page. The README and User’s Guide of micperf can also be found in
Because the Intel Xeon Phi processor implements the IA ISA, it can run other tools available from Intel.
In addition to the operating system, the OpenFabrics* Enterprise Distribution (OFED) software should be installed if a high-performance network is utilized. For the Intel Xeon Phi processor without an integrated fabric interface, any version of this software supported by a normal Intel Xeon processor should be usable, including OpenFabrics, Mellanox*, and Intel® True Scale Fabric. For Intel Xeon Phi processors with an integrated Intel® Omni-Path Fabric (Intel® OP Fabric) interface, use the Intel® Omni-Path Software (Intel® OP Software) instead. Intel OP Software can be downloaded from https://downloadcenter.intel.com/search?keyword=Omni-Path.
The user environment is, in part, dictated by the system administrator’s choice of operating system.
Unlike the first-generation Intel Xeon Phi coprocessor, for which Intel provided a minimal user environment as part of the Intel® Manycore Platform Software Stack (Intel® MPSS), Intel provides no user environment for the latest generation processor. Administrators can install an environment with as many features as they choose.
Also, unlike the coprocessor, all tools will run natively on the processor. Running the compilers natively removes the complications that previously occurred when attempting to build third-party and open source software that relied on configure scripts to determine the architecture and available compilers.
Intel® Parallel Studio XE
The following Intel® products support (or, for tools not yet released, will support) program development on the Intel Xeon Phi processor:
- Intel® C Compiler/Intel® C++ Compiler/Intel® Fortran Compiler
- Intel® Math Kernel Library (Intel® MKL)
- Intel® Data Analytics Acceleration Library (Intel® DAAL)
- Intel® Integrated Performance Primitives (Intel® IPP)
- Intel® Cilk™ Plus
- Intel® Threading Building Blocks (Intel® TBB)
- Intel® VTune™ Amplifier XE
- Intel® Advisor XE
- Intel® Inspector XE
- Intel® MPI Library
- Intel® Trace Analyzer and Collector
- Intel® Cluster Ready
- Intel® Cluster Checker
Open Source Tools
Support for the processor is included in the mainline for GDB 7.12 (included in Intel® Parallel Studio XE 2018).
Programmers should note three major changes: the MCDRAM on-package high-bandwidth memory, the new clustering modes, and the new Intel AVX-512 instructions. This section provides a very brief description of the unique Intel software and libraries to facilitate the use of MCDRAM and clustering modes.
For more specific information on these three topics, see the pre-release version of Intel 64 and IA-32 Architectures Software Developer Manuals.
To simplify the use of both MCDRAM and the new clustering modes, Intel is working with the Open Source community to develop the hbwmalloc library. Figure 2 shows the syntax for calling this library, which is based on the jemalloc and memkind APIs and libraries. It offers a simple way to exploit both new capabilities: replace a program’s malloc() calls with hbw_malloc() calls in C/C++, or use the FASTMEMORY directive in Fortran.
Figure 2: High Bandwidth malloc (hbwmalloc) APIs
hbwmalloc is intended to be a simple and low-cost way of allowing developers to take advantage of both MCDRAM and cluster modes. hbw_malloc() allocates memory from MCDRAM when possible. It is also aware of the clustering mode and, where possible, automatically allocates memory that is close to the tile running the allocating thread.
Figure 3 shows the syntax for both C/C++ and Fortran to allocate memory from MCDRAM.
Figure 3: Code snippets illustrating the use of hbw_malloc() to exploit MCDRAM and clustering modes
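Before handing a system to programmers, an administrator may want to confirm that the hbwmalloc interface actually sees high-bandwidth memory. The sketch below is not a reproduction of the figures above. It is a Python check that assumes the memkind shared library is installed and exports `hbw_check_available()`, which returns 0 when high-bandwidth memory can be allocated. The wrapper name `hbw_available` is illustrative. In C or C++ code, the corresponding allocation calls are `hbw_malloc()` and `hbw_free()`, declared in `hbwmalloc.h`.

```python
import ctypes
import ctypes.util


def hbw_available():
    """Return True/False from hbw_check_available(), or None if memkind is missing."""
    path = ctypes.util.find_library("memkind")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        check = lib.hbw_check_available
    except (OSError, AttributeError):
        return None
    check.restype = ctypes.c_int
    check.argtypes = []
    # hbw_check_available() returns 0 when high-bandwidth memory is usable.
    return check() == 0


if __name__ == "__main__":
    state = hbw_available()
    if state is None:
        print("memkind library not found; cannot query hbwmalloc")
    else:
        print("high-bandwidth memory available:", state)
```

A result of `False` on a system configured for cache mode is expected, because in that mode no MCDRAM is exposed as addressable memory.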
||WHAT IS IT|ENABLING REQUIRED (BIOS / OS / VMM / SW)|
||Intel® AVX-512 supported functionality (F, CD, ER, PF)||
||Supports 3 types of clustering in the inter-processor mesh:||
||Six channels of DDR4 – Up to 384 GB||
|Legacy Intel® Xeon® processor compatibility|Binary compatible with legacy code for the Intel Xeon processor||
|Future Intel Xeon processor compatibility|The same ISA with some minor exceptions:|SW: Intel C++ and Intel Fortran Compilers 2015 (version 15.0 or later), GCC 4.9+|
|Languages, libraries, and tools|Currently released Intel® languages and tools recognize Intel AVX-512 instructions|SW: Intel® Parallel Studio XE 2015 Update 2, GCC 4.9+|
|Open Source support|GCC generates Intel AVX-512 and other Intel AVX-512 extensions for KNL; GDB generates Intel AVX-512 and other Intel AVX-512 extensions for KNL|GCC 4.9+/GDB 7.8.1+|
|OS support for Intel Xeon Phi processor||Linux kernels 3.15+|
Table 2: Software feature enabling for the Intel® Xeon Phi™ processor