

Refer to the disclosure and list of affected processors for more information on CVE-2018-12207.
Intel processors that support machine check architecture have a mechanism to detect and report hardware errors (such as system bus errors, ECC errors, parity errors, cache errors, and TLB errors) to system software. The machine check architecture enables the processor to generate a machine check exception to signal the detection of a machine check error. This allows system software and OS developers to gracefully shut down the system when hardware error conditions are detected on the platform.
Recently, Intel discovered that a previously published erratum on some Intel platforms can be exploited by malicious software to potentially cause a denial of service by triggering a machine check that will crash or hang the system. The error code logged for this machine check is 0150H.
This article describes the conditions for the machine check error erratum, details how the 0150H error can be exploited to potentially enable a denial of service, and describes methods to mitigate this issue. Refer to the List of Affected Processors section for a full list of affected processors and errata.
The CVE assigned for the page size change MCE vulnerability is CVE-2018-12207 with a CVSS score of 6.5.
Modern OSes use paging for programs, execution, and memory accesses. The processor translates programs' linear memory address accesses (for example, instruction fetch or data access) to physical addresses using page tables.
Page tables are data structures in memory used by the processor to translate linear addresses to physical addresses. These translations happen on certain granularity, also known as the page size, because the translation of linear address to physical address stays the same for the entire page. Intel processors support page sizes of 4 KB, 2 MB, 4 MB and 1 GB. More details on paging can be found in the Intel SDM Vol. 3, Section 4.9 Paging and Memory Typing.
To save the performance cost of walking through the page tables on each linear memory access within the same page, the processor uses a hardware structure called the translation lookaside buffer (TLB) to cache resolved translations from linear addresses to physical addresses. The processor can use the cached translations instead of performing a page table walk on the next access to a linear address within the same page. The page size change MCE issue is related to the instruction TLB (ITLB) that contains the code fetch address translations.
Privileged system software, like the OS or VMM, is responsible for platform memory management and thus owns the page tables. The OS or VMM may perform page table modifications to create a new translation or to alter an existing translation for a linear address. When the OS or VMM modifies an existing page table translation and the TLB already contains a translation for that page, the processor may choose either translation (for example, either from the TLB or from the page table in memory). The implications of this software-induced inconsistency are described in the Intel SDM Vol. 3, Section 4.10.4.4 Delayed Invalidation.
There are a series of errata that have been filed for the instruction fetch unit in multiple generations of Intel processors. The full list is available in the List of Affected Processors section. An example for 6th generation Intel® Core™ processors is shown below.
SKL002 | Instruction Fetch May Cause Machine Check if Page Size Was Changed Without Invalidation |
---|---|
Problem | This erratum may cause a machine-check error (IA32_MCi_STATUS.MCACOD=0150H) on the fetch of an instruction that crosses a 4 KB address boundary. It applies only if all of the following are true:
|
Implication | Due to this erratum an unexpected machine check with error code 0150H may occur, possibly resulting in a shutdown. Intel has not observed this erratum with any commercially available software. |
Workaround | Software should not write to a paging-structure entry in a way that would change, for any linear address, both the page size and the memory type. It can instead use the following algorithm: first clear the P flag in the relevant paging-structure entry (for example, PDE); then invalidate any translations for the affected linear addresses, and then modify the relevant paging-structure entry to set the P flag and establish the new page size and memory type. |
Software sequences that may lead to machine check error code 0150H can be summarized as follows:
IA32_MCi_STATUS.MCACOD=150H
), which can result in a system hang or shutdown.These errata are applicable only to code pages. There is no known issue with the data pages in similar page size change conditions.
Applications cannot cause an OS to make changes to page tables that would trigger the conditions described in the erratum. Intel has worked with industry partners to ensure that OSes follow the guidelines documented in the Intel Software Developer manual. There is no known security vulnerability created by this erratum in bare metal OS environments.
In virtualized environments, a malicious guest has direct access to its own page tables and could make changes to them in ways that would trigger this issue. Note that when an off-the-shelf commercial operating system is used in a Platform as a Service (PAAS) or Software as a Service (SAAS) environment, the applications in that environment do not have the necessary access to force the OS to make changes to page tables that would trigger the issue. Infrastructure as a service (IAAS) cloud services allow users to provision their own operating system and other privileged software, like device drivers. In this case, the privileged guest software is outside of the trusted computing base (TCB) of the host cloud service provider and could potentially contain malicious code sequences.
The page size change MCE issue can be mitigated by applying software algorithms to the VMM/hypervisor. This section describes:
Intel has added a new bit in the IA32_ARCH_CAPABILITIES
MSR to current and future generation CPUs to help VMMs and hypervisor software determine if the processor is vulnerable to the page size change MCE issue. Your system may need to apply the latest MCUs to correctly detect the vulnerability.
Bit | Scope | Attribute | Description |
6 | Thread | Read only | IF_PSCHANGE_MC_NO: The processor is not susceptible to a machine check error due to modifying the size of a code page without TLB invalidation. |
IA32_ARCH_CAPABILITIES
MSR (address 10AH) exists. If IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO]
(bit 6) is set, then the CPU is not vulnerable to the page size change MCE issue.IA32_ARCH_CAPABILITIES
MSR does not exist, or if IF_PSCHANGE_MC_NO
is clear, then the CPU may be vulnerable to the page size change MCE issue.
Pseudocode that represents the algorithms is shown below:
If (CPUID.(EAX=07H,ECX=0H).EDX[29] == 1 && IA32_ARCH_CAPABILITIES[IF_PSCHANGE_MC_NO] == 1) { CPU is not vulnerable; no mitigation needed } else { CPU may be vulnerable Consult CPUID Family/Model/Stepping for further determination Use SW mitigation if needed }
VMMs can mitigate the page size change MCE vulnerability by using the software techniques described below.
The VMM can use Extended Page Tables (EPT) to enforce that each guest physical address is 4 KB in size and that guest software cannot change the hardware page size for translations. Guests running on the VMM with EPT enabled in this way will protect the hardware from the page size change MCE vulnerability, but may encounter impacts from longer page walks, and reduced TLB coverage.
One approach to minimize these impacts is to limit the use of 4 KB entries to only those locations where they are strictly required to mitigate the MCE issue. Since only executable translations can be cached in the ITLB, 2 MB, 4 MB, and 1 GB page EPT entries may be used as long as they are always marked nonexecutable. The following example shows how VMMs can accomplish this using a 2 MB large page:
The sequence above requires bit 10 (Execute access for user-mode linear address) to be treated in the same manner as bit 2 when mode-based execution controls are active (the mode-based execute control for EPT feature is present and the VM execution control is set to 1).
As soon as the guest software attempts to execute code mapped in a 2 MB, 4 MB, or 1 GB page, the VMM gets a notification of an EPT violation due to lack of execution permission. The exception handler in the VMM can convert the large page into 4 KB pages. When the VMM resumes the guest software, the software can continue to run, and the page size information obtained by the ITLB will be 4 KB.
The large page used by the EPT paging structures can be a 1 GB or 2 MB page. To resolve such EPT violations, the VMM can convert the executable 2 MB area into 4 KB pages and add execution permission. If the large page is 1 GB, the VMM can convert the nonexecutable areas into 2 MB pages with nonexecutable permission as an intermediate step instead of converting the entire 1 GB region into 4 KB pages. Through the detection of EPT violations described above, only code pages will get converted pages in a selective manner.
One of the notable consequences of the above algorithm is that the number of 4 KB pages could constantly increase as the guest software runs more workloads. If pages used for code execution are randomly chosen in the guest software, the large pages are converted to 4 KB pages probabilistically. This would eventually start to cause performance impacts on workloads. To avoid this situation, the VMM can periodically scan the EPT paging table structures to find and clean up 4 KB pages by combining 512 of them into a 2 MB page with the execute permission bit cleared at runtime.
Intel has been working with industry partners to make necessary software changes to enable software mitigations. It is possible that your latest system software or VMM/hypervisor already contains a security mitigation that is either enabled by default or can be optionally enabled. Intel strongly recommends that users keep their systems up-to-date with the latest software, and also recommends confirming the mitigation status of your VMM/hypervisor with your software provider to ensure that mitigations are properly enabled in exposed environments. Some VMMs/hypervisors may support configuring these mitigation on a per-VM basis. The guidelines below will help you evaluate your VM environment.
As described in earlier sections, it is important to note that the page size change MCE vulnerability applies only to certain environments. The following general guidelines can help you determine whether it is necessary to enable the mitigation in your environment. This decision can be highly dependent on the actual environment and requirements of your system.
IA32_ARCH_CAPABILITIES
MSR with IF_PSCHANGE_MC_NO
set. Enabling the software mitigation in the nested guest VMMs would be ineffective.It is possible to unnecessarily enable the software mitigation due to incorrect diagnosis of the vulnerability (described in the Enumeration of Vulnerability section). However, the suggested software techniques in the Software Mitigation Algorithms section is compatible with non-vulnerable CPUs and should not cause functional issues.
Applying the software mitigation described here may result in reduced TLB coverage and increased TLB fill costs, which may affect system behavior.
On certain processors (family model 06_2DH with Intel® Virtualization Technology (Intel® VT) for Directed I/O (Intel® VT-d), Intel has observed performance degradation when using 2 MB pages for Intel® VT-d translations. Intel recommends only using 4 KB pages for Intel VT-d.
Furthermore, Intel has observed that when sharing page tables with EPT and Intel VT-D, breaking up 2 MB mappings to 4 KB mappings while DMA is in progress may result in spurious unrecoverable Intel VT-d faults on this processor family.
Therefore, Intel suggests an additional mitigation step for this processor family to ensure that EPT tables and Intel VT-d tables are not shared.