The Intel® Xeon® processor E7 V2 family, codenamed “Ivy Bridge EX”, is a 2-, 4-, or 8-socket platform based on Intel’s most recent microarchitecture. Ivy Bridge is the 22-nanometer shrink of the “Sandy Bridge” microarchitecture. This product family brings additional capabilities to data centers: more cores, more memory bandwidth, and extended Reliability, Availability, and Serviceability (RAS) features. As a result, platforms based on this product family can yield up to a 2X performance improvement over the previous-generation Intel Xeon processor E7 family. Newly introduced features (such as Intel® Advanced Vector Extensions (Intel® AVX), Intel® Secure Key, and additional RAS capabilities) provide opportunities to create faster, more secure, and more resilient applications.
The Intel Xeon processor E7 V2 family is based on the Ivy Bridge EX microarchitecture, an enhanced version of the Sandy Bridge microarchitecture. The platform supporting the Intel Xeon processor E7 V2 family is named “Brickland.” This paper discusses the new features available compared to the previous-generation Intel Xeon processor E7 family. Each section describes what developers need to do to take advantage of the new features to improve application performance, security, and reliability.
Some of the new features that come with the Intel Xeon processor E7 V2 family include Intel Secure Key, Intel OS Guard, Intel AVX, APIC virtualization, PCIe atomic operations, and extended RAS capabilities; each is discussed in the sections that follow.
Figure 1. The Intel® Xeon® processor E7-4800 V2 product family Microarchitecture
Figure 1 shows a block diagram of the 4-socket Intel Xeon processor E7-4800 V2 family microarchitecture. Processors in the family have up to 15 cores (compared to 10 cores in the predecessor), bringing additional computing power. They also have 25% more cache (37.5 MB), as well as higher memory capacity and bandwidth. With its 22-nm process technology, the Intel Xeon processor E7 V2 family consumes less power during idle periods than its predecessor platform.
Table 1 compares the features of the Intel Xeon processor E7-4800 V2 product family with those of its predecessor, the Intel Xeon processor E7-4800 product family.
Table 1. Comparison of the Intel® Xeon® processor E7–4800 product family to the Intel® Xeon® processor E7–4800 V2 product family
On Jordan Creek based platforms, exact memory speeds will depend on the memory configuration and population rules, as well as the memory controller mode selected in the BIOS (Performance or Lockstep).
The rest of this paper discusses some of the main enhancements in this product family.
The C104/C102 scalable memory buffers available for Intel Xeon processor E7 V2 platforms significantly increase memory capacity: with 24 DDR3 DIMMs per socket (up to 64 GB each), a 4-socket platform can support up to 6 TB of memory. The Intel Xeon processor E7 V2 family supports DDR3 speeds of up to 1600 MHz. The memory controller has two modes of operation: Performance mode and Lockstep mode. Performance mode is the normal (default) mode of operation and delivers higher I/O throughput and memory bandwidth. Lockstep mode operates two memory channels as a single channel: each write and read operation moves a data word two channels wide, with half of the cacheline stored in a DIMM on one channel and the other half in a DIMM on the next. In three-channel memory systems, the third channel is unused and left unpopulated. Lockstep mode is the most reliable, but it reduces total system memory bandwidth by one-third in most systems. The mode of operation is configurable from the BIOS, which is usually set to Performance mode by default.
Intel Secure Key (Digital Random Number Generator: DRNG) is a hardware approach to high-quality and high-performance entropy and random number generation. The entropy source is thermal noise within the silicon.
Figure 2. Digital Random Number Generator using RDRAND instruction
Figure 2 shows a block diagram of the Digital Random Number Generator. The entropy source outputs a random stream of bits at a rate of 3 GHz, which is sent to the conditioner for further processing. The conditioner takes pairs of 256-bit raw entropy samples generated by the entropy source and reduces them to a single 256-bit conditioned entropy sample. This sample is passed to a deterministic random bit generator (DRBG) that spreads it into a large set of random values, thus increasing the number of random numbers available from the module. The DRNG is compliant with ANSI X9.82 and NIST SP 800-90, and is certifiable to FIPS 140-2.
Since DRNG is implemented in hardware as a part of the processor, both the entropy source and DRBG execute at processor clock speeds. There is no system I/O required to obtain entropy samples and no off-chip bus latencies to slow entropy transfer. DRNG is scalable enough to support heavy server application workloads and multiple VMs.
The DRNG can be accessed through a new instruction named RDRAND. RDRAND takes the random value generated by the DRNG and stores it in a 16-bit, 32-bit, or 64-bit destination register (the size of the destination register determines the size of the random value). Support for RDRAND can be enumerated via CPUID.1.ECX, and the instruction is available at all privilege levels and in all operating modes. The performance of the RDRAND instruction depends on the bus infrastructure and varies between processor generations and families.
Software developers can use the RDRAND instruction either through cryptographic libraries (OpenSSL* 1.0.1 and later) or directly in application code (for example, via intrinsics or assembly functions). The Intel® Compiler (starting with version 12.1), Microsoft Visual Studio* 2012, and GCC* 4.6 support the RDRAND instruction.
Microsoft Windows* 8 uses the DRNG as an entropy source to improve the quality of output from its cryptographically secure random number generator. Linux* distributions based on the 3.2 kernel use DRNG inside the kernel for random timings. Linux distributions based on the 3.3 kernel use it to improve the quality of random numbers coming from /dev/random and /dev/urandom, but not the quantity. That being said, Red Hat Fedora* Core 18 ships with the rngd daemon enabled by default, which will use DRNG to increase both the quality and quantity of random numbers in /dev/random and /dev/urandom.
For more details on DRNG and RDRAND instruction, refer to the Intel DRNG Software Implementation Guide.
Intel OS Guard (Supervisor Mode Execution Protection: SMEP) prevents execution out of untrusted application memory while operating at a more privileged level. By doing this, Intel OS Guard helps prevent Escalation of Privilege (EoP) security attacks. Intel OS Guard is available in both 32-bit and 64-bit operating modes and can be enumerated via CPUID.7.0.EBX.
Figure 3. Pictorial description of Intel® OS Guard operation
Support for Intel OS Guard needs to be in the operating system (OS) or Virtual Machine Monitor (VMM) you are using. Please contact your OS or VMM providers to determine which versions include this support. No changes are required at the BIOS or application level to use this feature.
Intel® AVX is a new 256-bit instruction set extension designed for applications that are floating-point (FP) intensive. This product family also introduces a new set of instructions (F16C) to convert between single-precision and half-precision floating-point formats.
Figure 4. Intel® Advanced Vector Extensions Instruction Format
Intel AVX introduces the following architectural enhancements:
Intel AVX employs an instruction encoding scheme using a new prefix (known as a “VEX” prefix). Instruction encoding using the VEX prefix can directly encode a register operand within the VEX prefix. This supports two new instruction syntaxes in Intel 64 architecture:
Two-operand instruction syntax previously expressed as
ADDPS xmm1, xmm2/m128
now can be expressed in three-operand syntax as
VADDPS xmm1, xmm2, xmm3/m128
In four-operand syntax, the extra register operand is encoded in the immediate byte. The introduction of three- and four-operand syntaxes helps reduce the number of register-to-register copies, making programming more efficient.
Intel AVX also brings new data manipulation and arithmetic compute primitives, including broadcast, permute, and fused multiply-add.
Intel AVX improves performance through wider vectors, a new extensible syntax, and rich functionality that enables better data management. Applications that could benefit from Intel AVX include general-purpose workloads such as image and audio/video processing, scientific simulation, financial analytics, and 3D modeling and analysis.
Operating system and compiler support are needed to execute applications that use Intel AVX. Supporting operating systems include Linux* kernel 2.6.30 or later, Windows* 7 SP1 or later, and Windows Server* 2008 R2 SP1 or later. Compilers supporting Intel AVX include the Intel® C/C++ and Fortran compilers version 11.1 or later, Microsoft Visual Studio* 2010 or later, and GCC* 4.4.1 or later.
There are a couple of ways developers can make use of Intel AVX in their applications: they can let the compiler auto-vectorize their code, or they can program directly with Intel AVX intrinsics or assembly.
To best illustrate how AVX can be used, here is an example of how AVX was used to significantly improve performance of a financial services application: Case Study: Computing Black-Scholes with Intel® Advanced Vector Extensions
For more details on Intel AVX, please go to Intel® Advanced Vector Extensions (Intel® AVX)
A significant amount of the performance overhead in machine virtualization is due to Virtual Machine (VM) exits. Every VM exit can cost approximately 2,000–7,000 CPU cycles (see Figure 5), and a significant portion of these exits is for APIC and interrupt virtualization. Whenever a guest operating system tries to read an APIC register, the VM has to exit, and the Virtual Machine Monitor (VMM) has to fetch and decode the instruction.
The Intel Xeon processor E7 V2 family introduces support for APIC virtualization (APICv); in this context, the guest OS can read most APIC registers without requiring VM exits. Hardware and microcode emulate (virtualize) the APIC controller, thus saving thousands of CPU cycles and improving VM performance.
Figure 5. APIC Virtualization
This feature must be enabled at the VMM layer: please contact your VMM supplier for their roadmap on APICv support. No application-level changes are required to take advantage of this feature.
The Intel Xeon processor E7 V2 family supports PCIe atomic operations (as a completer). Today, PCIe devices typically use message-based transactions, which rely on interrupts and can experience long latencies, unlike CPU updates to main memory, which use atomic transactions. An Atomic Operation (AtomicOp) is a single PCIe transaction that targets a location in memory space, reads the location’s value, potentially writes a new value back to the location, and returns the original value. This “read-modify-write” sequence is performed atomically. AtomicOps were added in the PCIe 3.0 specification, which defines three atomic transactions: FetchAdd, Swap, and CAS (Compare and Swap).
The benefits of atomic operations include lower latency than interrupt-based message transactions and reduced synchronization overhead between I/O devices and the host. The Intel Xeon processor E7 V2 family also supports an x16 non-transparent bridge (NTB). All of these features contribute to better I/O performance.
These PCIe features are inherently transparent and require no application changes.
For more details on these PCIe features, refer to the PCI Express* Base Specification, Revision 3.0.
The new RAS features require additional enabling. Please refer to the Appendix for the supported Operating Systems and VMMs that support these new features.
In summary, the Intel Xeon processor E7 V2 family, combined with the Brickland platform, provides many new and improved features that could significantly change your performance and power experience on enterprise platforms. Developers can make use of most of these new features without making any changes to their applications.
Figure 6: Intel® Xeon® Processor E7 Family RAS Features OS Support Summary
** New features will be supported in upcoming OS releases. Please contact OS vendors for additional details
¥ denotes new features.
Figure 7: Intel® Xeon® Processor E7 Family RAS Features Virtualization (VMM) Support Summary
** Additional features will be supported in upcoming releases. Please contact vendors for additional details
¥ denotes new features.
Sree Syamalakumari is a software engineer in the Software & Service Group at Intel Corporation. Sree holds a Master's degree in Computer Engineering from Wright State University, Dayton, Ohio.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804