This technical deep dive expands on the information in the Load Value Injection (LVI) disclosure overview for software developers. Note that this documentation will use more precise (but different) terminology for transient execution side channel methods than we have used in past documents. Be sure to review the updated terminology guide and the list of affected processors.
Triggering a Fault, Assist or Abort
OS access using application pointer
Induced Victim Memory Access in Application
Clearing of accessed or present bits causing memory pressure
Attacker manipulation of page tables
Intel TSX Abort
Impact to OS from application (including when virtualized)
Impact to VMM from VM
Impact between guests in virtualized environments
Between different applications
Inside an application
LVI Mitigations for Intel SGX
Mitigating Load+Load+Transmit and Load+Branch
Tooling Support to Automate LVI Mitigation
Instructions that Require Special Treatment
On some processors, faulting or assisting1 load operations may transiently receive data from a microarchitectural buffer2.
If an adversary can cause a specified victim load to fault, assist, or abort, the adversary may be able to select the data to have forwarded to dependent operations by the faulting/assisting/aborting load. For certain code sequences, those dependent operations may create a covert channel with data of interest to the adversary. The adversary may then be able to infer the data's value through analyzing the covert channel. This transient execution attack3 is called load value injection (LVI) and is an example of a cross-domain transient execution attack.
Because LVI methods require several complex steps4 to be chained together when the victim is executing, they are primarily applicable to synthetic victim code developed by researchers or to attacks against SGX by malicious operating systems (OSes) or virtual machine managers (VMMs). LVI has been assigned CVE-2020-0551 with a base score of 5.6 Medium, CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:C/C:H/I:N/A:N.
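The base score can be reproduced from the vector string. The following is a minimal sketch, assuming the metric weights and the Roundup rule from the CVSS v3.1 specification; the function names are illustrative and are not part of any Intel or FIRST tooling.

```c
#include <assert.h>

/* CVSS v3.1 "Roundup": smallest number with one decimal place that is
 * greater than or equal to x, computed on integers to avoid
 * floating-point artifacts. Assumes x is non-negative. */
double cvss31_roundup(double x)
{
    long scaled = (long)(x * 100000.0 + 0.5);
    assert(scaled >= 0);
    if (scaled % 10000 == 0)
        return (double)scaled / 100000.0;
    return (double)(scaled / 10000 + 1) / 10.0;
}

/* Base score for CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:C/C:H/I:N/A:N. */
double lvi_cvss31_base_score(void)
{
    const double av = 0.55;   /* Attack Vector: Local */
    const double ac = 0.44;   /* Attack Complexity: High */
    const double pr = 0.68;   /* Privileges Required: Low (Scope Changed) */
    const double ui = 0.85;   /* User Interaction: None */
    const double conf = 0.56; /* Confidentiality: High */
    const double integ = 0.0; /* Integrity: None */
    const double avail = 0.0; /* Availability: None */

    double iss = 1.0 - (1.0 - conf) * (1.0 - integ) * (1.0 - avail);

    /* (iss - 0.02)^15, computed with a loop so no math library is needed. */
    double p15 = 1.0;
    for (int k = 0; k < 15; k++)
        p15 *= iss - 0.02;

    /* Scope Changed uses the 7.52 / 3.25 impact formula and the 1.08 factor. */
    double impact = 7.52 * (iss - 0.029) - 3.25 * p15;
    double exploitability = 8.22 * av * ac * pr * ui;

    if (impact <= 0.0)
        return 0.0;

    double base = 1.08 * (impact + exploitability);
    if (base > 10.0)
        base = 10.0;
    return cvss31_roundup(base);
}
```

With these weights the impact sub-score is roughly 3.99 and the exploitability sub-score roughly 1.15, which rounds up to the published 5.6.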
There are four types of hardware behavior for which we will discuss LVI applicability: LVI stale data, LVI zero data, Zero-at-ret, and No forwarding.
The types of hardware behavior that might allow LVI methods are:
The types of hardware behavior on processors not affected by LVI are:
Due to the numerous, complex requirements that must be satisfied to implement the LVI method successfully, LVI is not a practical exploit in real-world environments where the OS and VMM are trusted. Because of Intel SGX's strong adversary model, attacks on Intel SGX enclaves loosen some of these requirements. Notably, the strong adversary model of Intel SGX assumes that the OS or VMM may be malicious, and therefore the adversary may manipulate the victim enclave's page tables to cause arbitrary enclave loads to fault or assist. Where the OS and VMM are not malicious, LVI attacks are significantly more difficult to perform, even against Intel SGX enclaves. Accordingly, system administrators and application developers should carefully consider the particular threat model applicable to their systems when deciding whether and where to mitigate LVI.
There are three multi-step LVI methods that malicious actors could potentially use to infer secret data or linear memory addresses from a victim application's load operations. Note that the first two LVI methods require the attacker to be able to provide input to the victim application that will be read from memory and written to memory. All three methods require the adversary to perform surveillance and find a suitable code sequence in the victim's program that satisfies all of the requirements enumerated in each section below.
The first method is to use LVI in conjunction with pre-existing patterns in the victim code as a universal read gadget that allows the attacker to select which values in the victim's memory it wishes to infer.
Example victim code (LVI MSBDS universal read gadget):
*b = a;                   // Prime attacker data 'a' in a store buffer
d = *c;                   // Load faults, attacker data 'a' forwarded to 'd'
e = *d;                   // Load secret from attacker-controlled address 'd'
leak = oracle[e * 4096];  // Transmit secret over covert channel
Note that steps 2, 3 and 4 are referred to later as the "Load+Load+Transmit" pattern.
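The data flow through the four statements of the universal read gadget can be traced with a toy software model. This is only a sketch under strong assumptions: `struct toy_cpu`, `toy_init`, and `toy_gadget` are hypothetical names, the "store buffer" is a single variable, and nothing here reflects how a real processor forwards data. It only shows which value reaches the oracle index when the second statement's load does or does not fault, and what changes when the load must retire first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TOY_MEM_CELLS = 16, TOY_ORACLE_SLOTS = 256 };

/* Hypothetical toy state: victim memory, a one-entry "store buffer",
 * and a record of which oracle slots were touched. */
struct toy_cpu {
    uint8_t memory[TOY_MEM_CELLS];
    uint8_t store_buffer;
    uint8_t oracle_touched[TOY_ORACLE_SLOTS];
};

void toy_init(struct toy_cpu *cpu)
{
    memset(cpu, 0, sizeof *cpu);
}

/* Runs the gadget once. Returns the oracle slot that was touched, or -1
 * if the faulting load retired before any dependent operation ran. */
int toy_gadget(struct toy_cpu *cpu, uint8_t a, uint8_t b, uint8_t c,
               int load_faults, int fenced)
{
    assert(a < TOY_MEM_CELLS && b < TOY_MEM_CELLS && c < TOY_MEM_CELLS);

    /* Step 1: *b = a;  attacker data 'a' also sits in the store buffer. */
    cpu->memory[b] = a;
    cpu->store_buffer = a;

    /* Step 2: d = *c;  a faulting load is modeled as receiving the
     * store-buffer contents instead of the value in memory. */
    uint8_t d;
    if (!load_faults)
        d = cpu->memory[c];
    else if (fenced)
        return -1;              /* fault is taken; dependents never run */
    else
        d = cpu->store_buffer;  /* injected value */

    /* Step 3: e = *d;  the load address is now whatever reached 'd'. */
    uint8_t e = cpu->memory[d % TOY_MEM_CELLS];

    /* Step 4: leak = oracle[e * 4096];  modeled as marking a slot. */
    cpu->oracle_touched[e] = 1;
    return e;
}
```

In this model, if the attacker stores the address of a secret cell as 'a', an unfenced faulting load makes the oracle slot equal to the secret, while a non-faulting load only ever touches the slot for the data the victim intended to read.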
The second method is to use LVI to redirect transient control flow to jump to other code inside the victim that reads and transmits a secret:
Inject data, hijack control flow: There is a subsequent branching instruction (for example, an indirect call/jump) within the transient execution window that depends on the result of the faulting/assisting/aborting load in step 2. The adversary-provided input from step 1 may be forwarded to the address operand of this branching instruction. Hence, the branching instruction may redirect the instruction pointer to the adversary-provided address. Note that the same constraints on data forwarding that apply to the universal read gadget above also apply here.
Another technique is to overwrite the victim code's stack pointer, typically through LVI zero data, and to cause the RET instruction to retrieve the RIP value from the stack memory under the attacker's control. This stack pointer hijacking method is primarily relevant to Intel SGX enclaves.
Example victim code (LVI MSBDS control-flow hijacking gadget):
*b = a;  // Prime attacker data 'a' in a store buffer
d = *c;  // Load faults, attacker data 'a' forwarded to 'd'
d();     // Branch to attacker-controlled address 'd'
Note that steps 2 and 3 in the example above are also referred to later as the "Load+Branch" pattern.
There is also a third related method known as Load+Transmit that can allow an attacker to read non-arbitrary secrets from a victim application. This method is not exactly LVI, since it does not involve an injection of attacker data. Instead, it can be characterized as triggering MDS in the victim application so that the victim leaks a specific secret that a pre-existing portion of the victim code already loads or stores (a non-universal read gadget). The steps are as follows:
Example victim code (LVI MSBDS non-universal read gadget):
*b = s;                   // Victim secret 's' allocated to a store buffer
d = *c;                   // Load faults, victim secret 's' forwarded to 'd'
leak = oracle[d * 4096];  // Transmit secret over covert channel
Note that steps 2 and 3 in the above are also referred to later as the Load+Transmit pattern.
The previous section outlined the steps that the attacker must perform in attempting to implement an LVI method. All of these methods are as complex as the MDS-style methods that they build on. They also require an additional, extremely complicated step: the attacker must trigger a fault, assist, or Intel TSX abort in the victim's context, while the victim program is executing, on a specific load instruction in a section of victim code that meets all of the previously described requirements.
This section lists several methods that an attacker could use to attempt to trigger the fault/assist/abort.
A malicious application makes a system call to the OS and passes a parameter that requires the OS to access application memory. Then, the malicious application chooses a memory location for the parameter that would cause either a page fault or assist. Note that hardware features such as Supervisor Mode Access Prevention (SMAP), as well as existing software mitigations for bounds check bypass (Spectre variant 1), may prevent malicious actors from triggering a fault/assist in this manner. Refer to the LVI Impact on OS/VMM section for more details.
A malicious application or guest can attempt to manipulate a victim's pointer such that the victim's usage of that pointer transiently signals a fault or assist.
There are two classes of induced access:
Type confusion is an example of a programmatically invalid access. It is possible for typed languages to transiently load from a number that is not a pointer. Such a load may use a non-canonical address and could thus receive incorrect data. These programmatically invalid accesses only occur transiently and might be made to any location in the address space. These can be mitigated with speculation control before the access, as described in the Typecasting and indirect calls section in the Bounds Check Bypass Deep Dive.
A page fault that occurs when accessing a memory mapped file is an example of a programmatically valid access. The memory location passes all validity checks, and the access can occur in both the speculative and retired instruction streams.
Certain vector load instructions may also generate a fault when they are not aligned and may receive incorrect data. Because modern OSes do not use segment limits and applications rarely enable alignment checks, these faults are not useful to attackers.
A malicious application or guest may reference enough memory to cause the OS or VMM to take actions to reclaim memory pages. Note that causing the victim application or guest's memory to be paged out may be an undesirable outcome for the attacker since, on Intel® Core™ i processor family models, forwarding from store buffers does not occur when the page is marked not-present and forwarding from fill buffers or load port does not occur on such faults until the load is ready to retire. So the attacker would want to cause just enough memory pressure so that the OS/VMM clears the accessed bit in the page table or extended page table (EPT) so that an assist occurs when the victim accesses memory.
Causing memory pressure on a victim requires sharing the OS's Least-Recently-Used (LRU) list with the victim, which is typically not the case when the victim runs in a different container, or in a different VM. It is also not possible when the targeted victim memory is pinned, as can be the case for many VMs and some applications.
It is theoretically possible for an application to cause memory pressure that results in the OS paging out or clearing A/D bits for OS data in a way that allows an LVI method to be possible. It is also theoretically possible for a guest application to cause memory pressure in a way that results in the VMM paging out or clearing EPT A/D bits for data belonging to the guest OS.
This type of attack is of concern only to Intel SGX, as an Intel SGX enclave is the only environment where the adversary has the potential to directly control Intel® architecture (IA) or EPT page tables of the victim. The malicious OS/VMM can potentially arbitrarily manipulate the victim enclave's page tables to induce faults or assists on attacker-chosen enclave pages.
Regarding Intel TSX transactions, the vast majority of code execution is outside of Intel TSX transactions. However, if the victim program uses Intel TSX, then Intel TSX aborts are a possible avenue for an LVI method. In addition to the fault cases discussed above, Intel TSX aborts can potentially be caused by L1 cache evictions. Although conflicts typically cannot be caused by different applications, one exception is when a different application executes on a sibling thread on the same physical core. However, this case is mitigated by the MDS mitigations for simultaneous multithreading (SMT). Refer to the Deep Dive: Microarchitectural Data Sampling for more details.
During out-of-order execution of a load operation, the processor may speculatively select a value from a microarchitectural data source as the result of the load. If this speculative value matches the correct value, then subsequent speculative instructions that depend on this value may eventually retire. Otherwise, these (transient) speculative instructions will eventually be squashed. However, transient instructions that depend on a mis-speculated value may have microarchitectural side effects that can be observed via a covert channel.
The following transient execution attacks may be used to enable the LVI method:
Microarchitectural Data Sampling or TAA methods may cause faulting, assisting or aborting loads to receive the incorrect data from the fill buffers (MFBDS), store buffers (MSBDS), or load ports (MLPDS). Refer to the MDS Deep Dive and the TAA Deep Dive for background information.
An LVI method using MSBDS works through the attacker causing a victim load to fault, assist, or Intel TSX abort so that the load operation is transiently forwarded data that the attacker desires from a store buffer entry. This forwarded data might be a secret or the memory address of a secret that the attacker wants to infer. On the Intel® Core™ i processor family, loads that fault due to not present pages or not present EPT pages (RWX are all 0) do not transiently forward store buffer data to dependent operations and thus cannot cause MSBDS-based LVI on such faulting loads.
Because the vast majority of code execution is outside of Intel TSX transactions, and because the number of loads done within Intel TSX is relatively low compared to software executing outside of an Intel TSX transaction, Intel TSX aborts that may cause MSBDS are expected to be less useful to an attacker attempting to use the LVI method. The most likely LVI vector using MSBDS from a non-system software attacker would be to cause the victim process to take a microarchitectural assist to update a paging accessed bit or cause a non-canonical address violation in a memory location that the attacker desires. Refer to the LVI Impact on OS/VMM section for further discussions of the applicability of using MSBDS to mount LVI methods outside of Intel SGX.
An LVI method using MFBDS works through the attacker causing a victim load to fault, assist, or Intel TSX abort so that the load is transiently forwarded data that the attacker desires from a fill buffer entry. Because this only occurs if the fill buffer entry has the same physical address as the victim load, MFBDS LVI is more relevant to system software attackers against Intel SGX than to non-system software attackers, who are the adversaries for non-SGX LVI victims. A system software adversary of Intel SGX has direct control of the Intel SGX victim's page tables and thus may be able to put secret data or a memory address desired by the attacker at the exact physical address at the leaf level of the page walk for the faulting loads. Non-system software (for example, an application adversary) does not have direct control of page tables that map the victim and thus cannot put the exact address the attacker wants in the victim's page tables.
MFBDS may still occur without such direct control. This occurs when a fill buffer is allocated but the data portion of the fill buffer entry has not yet been updated and thus is stale. If a later assisting/faulting/aborting load matches the physical address of this newly allocated fill buffer, it may be forwarded the stale data, which may be of use to the adversary. Unlike with the MFBDS attack, an LVI method using MFBDS needs to induce this in victim code. Specifically, it needs a faulting/assisting/aborting load to hit a fill buffer entry that is currently in use by a non-faulting/assisting/aborting operation with the same physical address, where that entry's data has not yet been updated and happens to be useful to the attacker. Inducing this in victim code is more complex than the MFBDS attack, which occurs within the attacker's code.
There are two causes for MLPDS:
There are a number of limitations for vector MLPDS based LVI attacks. A faulting/assisting/aborting vector load will only forward non-zero data in the upper bits of the vector register; the lower 64 bits will be zeroed. There are fewer victim code sequences that use vector registers in a way that creates a covert channel based on their contents, because pointers are generally handled in the general purpose registers instead of the vector registers. The data forwarded by MLPDS, the retained data on the load ports, is a small set of data. This makes it more difficult for malicious actors to cause their desired data to be forwarded to an MLPDS faulting operation, making LVI using vector MLPDS even more difficult to exploit than other variants.
A faulting/assisting/aborting load that spans a 64-byte boundary may also enable the conditions for MLPDS. The set of data that can be forwarded is small, as discussed above for vector MLPDS LVI, and fewer of the victim code's loads are likely to split 64-byte boundaries.
Unlike MSBDS, MLPDS may be caused by not present faults or EPT violations on the Intel® Core™ processor family. Loads that take not present faults or EPT violations are not executed transiently, only at retirement. This creates a much smaller window of time for the disclosure gadget to execute and cause a covert channel and has a much more specific set of conditions to create an exploitable gadget (loads that split cache lines and have a disclosure gadget immediately following the load), both of which make split MLPDS LVI even more difficult to exploit than other variants.
As with the MFBDS LVI method, an L1TF or E2E method can only inject data into a victim load from the same physical address. Because system software has direct control of the page tables, it may be able to put secrets or an attacker-desired linear address at the exact physical address of the faulting loads. In general, non-system software does not have direct control of page tables that map the victim and thus cannot do that. Thus, L1TF and E2E methods are primarily of concern with respect to system software attackers (for example, against Intel SGX enclaves).
Many processors may forward a fixed value of 0 to a faulting/assisting load's dependent instructions, for example when the targeted address is not present in the L1D cache. Some processors mitigate general cases of RDCL, L1TF, MDS, or TAA by forwarding a value of 0 to dependent operations of the load (instead of forwarding other data values that may contain secret data or be controlled by a malicious actor). Although this mitigation reduces the risk of an LVI method in typical OS environments, there are certain situations where an adversary injecting a value of 0 to dependent operations may lead to a victim transiently creating a covert channel desired by the adversary. Since mainstream OSes mark the low page containing address 0 as not present, this LVI zero data method is primarily relevant to Intel SGX enclaves with a system software adversary.
Some processors will generally forward a value of 0 to dependent operations, but only when the faulting load is the next operation to retire. This behavior is called Zero-at-ret. Such behavior ensures the processor will not transiently forward 0 to dependent operations before previous instructions have resolved (for example, before an older jump mispredicts). This significantly constrains speculation: only a few dependent operations will execute in the transient execution window.
Using Zero-at-ret to target and leak memory contents would require a dependency chain longer than allowed by the at-ret cancellation window, and therefore is impractical on processors with Zero-at-ret behavior. Accordingly, there are enormous difficulties in finding and exploiting a Zero-at-ret vulnerability in real-world production software9.
Unlike domain-bypass attacks like MDS or L1TF, where the attacker has direct control over the instructions executed, LVI is a cross-domain method and thus requires manipulating the victim code's behavior. As described in the Steps and elements to cause LVI section, the malicious actor needs to:
Needing to perform all these steps increases the complexity of the attack beyond the already significant complexities present in other transient execution vulnerabilities.
We describe the potential impacts for Intel SGX, OS/VMM and applications separately below:
Intel SGX's threat model identifies all software running outside of an Intel SGX enclave as untrusted, including privileged OS (or hypervisor) software. In the context of LVI, a malicious OS can cause arbitrary loads to fault or assist during enclave execution by marking an enclave page as not present, and then resuming the enclave. The next time the enclave code attempts to load from any address within the page marked not present, the memory access will fault, and stale data or a value of 0 may be forwarded to dependent instructions.
As explained in the previous sections, on processors affected by L1TF or MDS, stale data might be forwarded to the faulting/assisting instruction if the specific conditions for stale data forwarding are met. On these processors, with the microcode mitigations for L1TF and MDS applied, any interrupt or exception (including the single-stepping timer interrupt generated by, for example, the SGX-Step tool10) in the attempt to modify the page tables at a specific moment flushes the L1D and the microarchitectural buffers that can be exploited by MDS. Therefore, malicious actors would need uninterrupted enclave execution between the instruction that created the stale data and the load instruction that might fault/assist to ensure the success of the stale data forwarding.
On processors that mitigate L1TF and MDS by forwarding a value of 0 to dependent operations, the value 0 is forwarded to the faulting/assisting load instruction instead of any stale data. This limits the scope of what the attacker may be able to achieve to the LVI zero data variant.
To construct an LVI exploit, forwarding stale data or a value of 0 to the faulting/assisting instruction is a necessary but insufficient requirement. The exploit must also make sure the dependent instructions inside the enclave access secret data and transmit the secret data through a covert channel, all within the transient execution window.
It is worth clarifying that in an environment where a malicious OS (or hypervisor) is not involved (for example, the platform owner does not intentionally load a malicious OS to attack an Intel SGX enclave, but instead the system was infected by unprivileged malware), it is much harder for an unprivileged attacker than for a malicious OS to mount an LVI attack on the Intel SGX enclave. The scenario of unprivileged malware attacking an Intel SGX enclave should be considered a special case of unprivileged malware attacking another application, discussed in the Between different applications section later.
An unprivileged adversary has few points of leverage to induce faults or assists into code executing at a higher privilege level. OSes and VMMs that have already been mitigated against Spectre and L1TF/MDS will significantly reduce the risk of LVI attacks against the OS or VMM.
Refer to the OS access using application pointer section. The values of user-supplied parameters are not trusted by the kernel, and hence the ability to transiently inject arbitrary values does not supply the current process with any additional control of the kernel's speculative execution. Existing kernels should already be hardened against transient execution attacks on user application interfaces for Spectre variant 1.
If the OS makes use of Supervisor Mode Access Prevention (SMAP) on processors with SMAP enabled, then LVI on kernel load from user pages will be mitigated. This is because the STAC instructions have LFENCE semantics on processors affected by LVI, and this serves as a speculation fence around kernel loads from user pages.
An OS that pages its own memory may provide more opportunities for malicious actors to find a gadget that follows code that takes an assist on a kernel page where the OS has cleared the accessed bit. But malicious actors have no control over when the OS may clear accessed bits, and the rate at which the OS does so is low.
When executing sandboxed code in a kernel that relies on language based security, mitigations against other transient execution attacks (for example, bounds check bypass, branch target injection, and speculative store bypass) would greatly increase the difficulty of LVI methods, since LVI relies on similar code patterns as these methods.
Similar to the impact an application has on an OS, a VMM responding to VM calls by a guest can access guest-controlled addresses. Typically VMMs walk page tables in software, which doesn't allow faults while accessing the guest's memory. This makes it difficult for guests to cause faults in the hypervisor.
OSes that do not page their own memory may be theoretically vulnerable to taking faults and assists while executing if they are running as a guest of a VMM that is clearing EPT accessed/present bits due to memory pressure. Pinned VMs, or VMs running with separated LRU lists in containers, are not impacted. Even for non-pinned VMs, the necessary attack scenario is very complex and is highly unlikely to be practical. Malicious actors would need to take similar steps as described in the application section below.
Malicious applications may attempt to use LVI stale data11 with the attacker directly injecting data into internal CPU buffers to infer data of other applications. However, with the specified MDS mitigation applied on affected CPUs, internal CPU buffers are cleared on MD_CLEAR operations (including when switching to an application) and may be protected through appropriate SMT scheduling for sibling hyperthreads.
This implies that attackers cannot directly inject values for the prime data step in the Load+Load+Transmit and Load+Branch LVI variants on systems that already mitigate MDS. Malicious actors would instead need to rely on values already present in the victim process. These values might be present in the victim process because it is interacting with the attacker (for example, if the attacker is passing data to the victim process). If the attacker cannot inject data values into the victim's data, the attacker will not be able to accomplish the prime data step and thus cannot perform those LVI variants.
As discussed in the Speculative Microarchitectural Data Sources section, an LVI mechanism that avoids some of these restrictions for a non-system-software adversary (for example, a malicious application) is MSBDS on a paging accessed bit update assist. The methods to cause such an assist are detailed in the Triggering a Fault, Assist or Abort section.
A successful LVI stale data method on another application using paging accessed bits requires the following preconditions:
VERW(for example, on context switch or system call).
When memory pressure increases, the OS uses the page accessed bit to identify which pages are least recently used and are therefore candidates for swapping. The OS does this by periodically clearing the accessed bits and later reviewing which pages have the accessed bit set (those pages are not candidates for swapping).
There are a number of challenges for an application to influence the memory pressure of another application, and in many cases it is not possible at all. Refer to the Clearing of accessed or present bits causing memory pressure section for more details. Adversaries would need to influence the OS to clear accessed bits on the correct victim process pages. This requires generating memory pressure, which is possible, but generally requires touching a lot of memory and is quite slow and noisy. The exact timing of accessed bit clearing is hard to control.
Even if a malicious adversary is able to clear the accessed bit, any retiring load or store to a page that the malicious adversary selects to cause an assist will trigger a hardware paging table update assist, which will set the accessed bit on that page. This means that the attacker would likely need to clear the accessed bit again (for example, by cycling through the LRU list again), and to repeat that necessary step with each attempt to infer data.
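The interaction between OS accessed-bit clearing and the hardware assist can be illustrated with a small simulation. This is a sketch under simplifying assumptions: `struct toy_page_table` and the `toy_*` functions are hypothetical, a "page table" is just an array of flags, and the OS scan is reduced to clearing every bit at once. It shows only the bookkeeping described above: the first access after a clear takes an assist and sets the bit, so another assist on that page requires another clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_PAGES = 4 };

/* Hypothetical model of per-page accessed bits plus an assist counter. */
struct toy_page_table {
    bool accessed[TOY_PAGES];
    unsigned assists[TOY_PAGES];
};

/* OS side: clear every accessed bit, as a periodic reclaim scan might. */
void toy_os_clear_accessed(struct toy_page_table *pt)
{
    for (size_t i = 0; i < TOY_PAGES; i++)
        pt->accessed[i] = false;
}

/* OS side: pages whose accessed bit is still clear are swap candidates. */
size_t toy_os_count_candidates(const struct toy_page_table *pt)
{
    size_t n = 0;
    for (size_t i = 0; i < TOY_PAGES; i++)
        if (!pt->accessed[i])
            n++;
    return n;
}

/* Hardware side: a retiring load or store to 'page'. Returns true when
 * the access needed an assist to set the accessed bit. */
bool toy_touch(struct toy_page_table *pt, size_t page)
{
    assert(page < TOY_PAGES);
    if (pt->accessed[page])
        return false;          /* bit already set: no assist */
    pt->accessed[page] = true; /* assist sets the bit ... */
    pt->assists[page]++;       /* ... and is counted here */
    return true;
}
```

In this model, two back-to-back accesses to the same page produce only one assist. A second assist appears only after `toy_os_clear_accessed` runs again, which mirrors why each attempt to infer data would need the accessed bit to be cleared anew.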
Lastly, the gadget in the victim must leak values via a covert channel (for example, cache) and the attacker will need to infer the values leaked by the covert channel (for example, through monitoring the cache), before system noise (like normal cache traffic) obscures the signal.
Although the usage of MSBDS with paging accessed assists avoids some of the restrictions that limit other techniques, lining up all these conditions to successfully execute this method in a non-contrived scenario, with the necessary precision to extract meaningful data, is extremely complex. Accordingly, software developers should carefully evaluate their environment and workloads before choosing to mitigate this method in actual applications.
It is also possible for LVI to be used as an in-domain method. In this situation, untrusted code running within a sandbox could employ the same steps described earlier in the application-to-application section in order to infer data values in the same process.
A sandbox running untrusted code would require the same steps described in the application-to-application section to mount an LVI method against higher privileged code in the same process. To trigger accessed bit assists, the sandboxed application may need to create significant memory pressure. Given that the sandbox is within the same process as the higher privileged code, memory pressure can have a more direct effect on the higher privileged code. However, existing resource limits should help mitigate the issue. Lining up all of the required steps will increase the difficulty of a practical method.
In general, in-domain transient execution attacks are able to leverage the fact that the adversary has more control over code generation and can more easily generate the desired gadgets instead of needing to find them in victim code. This applies to previously disclosed in-domain transient execution attacks like bounds check bypass (Spectre variant 1), branch target injection (Spectre variant 2), and speculative store bypass (Spectre variant 4), as well as to in-domain LVI. For unmitigated runtimes, the risk of in-domain LVI methods is less than the risk of existing in-domain transient execution attacks due to the complexity of LVI methods. Intel has already published a deep dive for managed runtimes, and the primary recommended mitigation discussed there is also effective against in-domain LVI.
The threat model for Intel SGX assumes that a malicious OS/hypervisor may arbitrarily manipulate an Intel SGX enclave's page tables. This allows the attacker to cause arbitrary loads to fault or assist during enclave execution.
Because any load may fault or assist, and because it is difficult to determine at compile time whether adversary-desired data may be forwarded by a faulting/assisting load, mitigation techniques may need to consider all possible gadgets, even if many of them might not be exploitable.
The following are summary characterizations of LVI exploits:
This section will describe software mitigation techniques that can be applied to enclaves in order to mitigate LVI attacks against those enclaves. Additionally, updates to the Intel SGX SDK will be released that apply these software mitigations. There is no additional microcode update needed to mitigate LVI (either for Intel SGX or in general).
The Load+Transmit LVI variant requires a faulting/assisting load from memory and a subsequent operation that may transmit the loaded value over a covert channel. For example:
MOV rbx, QWORD PTR [rdi]   # Load
MOV rcx, QWORD PTR [rbx]   # Transmit
If the first load faults/assists, then a stale value may be forwarded to the second load's memory operand. Hence the stale value will be used as an address to access memory, potentially disclosing that value through a covert channel (for example, the last level cache (LLC)). Note that this is only a valid LVI exploit if the stale value is a program secret.
In general, it is not possible to statically determine whether any given load may forward a secret. Therefore, a comprehensive mitigation strategy must consider all Load+Transmit "gadgets" (even if not all of them are exploitable). For each Load+Transmit gadget, the developer should ensure that at least one LFENCE instruction will be executed in between the load and the transmit, along all viable control flow paths. The LFENCE ensures that if the load faults/assists, then the load will retire before a stale value can be transiently forwarded to the transmit instruction.
Because load and branch instructions transmit their memory operands, any mitigations deployed for Load+Transmit gadgets are inherently adequate to mitigate Load+Load+Transmit and Load+Branch gadgets.
For example, consider the following code that fits the Load+Load+Transmit pattern:
MOV rbx, QWORD PTR [rdi]    # Load may fault and inject stale value into rbx
MOV rcx, QWORD PTR [rbx]    # Attacker uses stale value to load secret into rcx
MOV QWORD PTR [rcx], r8     # Transmit secret over cache based covert channel
Notice that the second and third instructions fit the Load+Transmit pattern, and so do the first and second instructions. Hence the Load+Transmit mitigation described in the prior section would yield:
MOV rbx, QWORD PTR [rdi]    # Load
LFENCE                      # Forces prior Load to retire
MOV rcx, QWORD PTR [rbx]    # Load -- rbx guaranteed to be non-stale
LFENCE                      # Forces prior Load to retire
MOV QWORD PTR [rcx], r8     # Store -- rcx guaranteed to be non-stale
The Load+Transmit mitigation applies similarly to Load+Branch gadgets. Consider the following:
MOV rcx, QWORD PTR [rsi]    # Load
JMP rcx                     # Branch/Transmit
If the MOV instruction is used to inject stale data into rcx, then the JMP instruction can be used to either branch to an attacker-chosen instruction sequence, or to transmit the stale data over a covert channel (by fetching instructions into caches from the jump target). The latter case is analogous to Jump Oriented Programming-style methods. Either way, the gadget can be mitigated by inserting an LFENCE after the load.
The performance impact of mitigating all potential Load+Transmit, Load+Load+Transmit, and Load+Branch gadgets depends on the execution properties of the Intel SGX enclave workload (for example, CPU-bound vs. I/O-bound, cache locality, etc.) and may be significant in some cases. If the overhead imposed by mitigating all loads is unacceptable, independent software vendors (ISVs) may opt to apply only partial mitigations, provided their particular threat model allows for it.
For example, universal read gadgets of the form of the Load+Load+Transmit LVI variant may be very difficult to find, and even more difficult to exploit, due to needing to satisfy all of the factors described in Steps and elements for attackers to cause LVI. Potential Load+Branch (control flow) gadgets, such as RET or indirect jump/call, if exploitable, may give malicious adversaries more options to locate a code sequence to read and transmit a secret through a covert channel within the transient execution window. Accordingly, ISVs should evaluate the risk of the potential LVI methods and should carefully consider whether to mitigate only the Load+Branch sequences, which will likely have a lower performance impact (depending on the workload). By constraining the amount of speculation in the program, mitigating only Load+Branch may also indirectly prevent some Load+Transmit and Load+Load+Transmit gadgets present in the program.
Intel and industry partners provide toolchain support for compiler and assembler tools that yield object files that satisfy the following property:
For all Load+Transmit gadgets in each procedure/function, every path in the control flow graph from Load to Transmit is "cut" by at least one LFENCE.
This property suffices to mitigate all Load+Transmit, Load+Load+Transmit, and Load+Branch gadgets (known as all-load-mitigation) in Intel SGX enclaves, assuming the mitigation is applied to all code that runs inside the enclave, including any code downloaded into or generated (for example, enclave with a JIT engine) inside the enclave at enclave runtime. This property also mitigates bounds check bypass vulnerabilities. Similar to LVI, the bounds check bypass gadget also necessarily consists of a load and a transmit instruction. Squashing the mispredicted transient execution after the load and before the transmit mitigates any bounds check bypass methods. Therefore, Intel SGX enclave code that is compiled with the LVI all-load-mitigation does not also require other bounds check bypass mitigations, such as speculative load hardening.
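The "cut" property can be checked mechanically on a control flow graph. The sketch below is a minimal illustration, assuming a hypothetical representation in which the graph is a dictionary from node names to successor lists and the fence nodes are given as a set; it is not how any particular compiler implements the check.

```python
from collections import deque


def every_path_is_cut(cfg, load, transmit, fences):
    """Return True if no path from `load` to `transmit` avoids every fence node.

    cfg maps each node name to a list of successor node names.
    """
    seen = {load}
    queue = deque([load])
    while queue:
        node = queue.popleft()
        for succ in cfg.get(node, []):
            if succ in fences or succ in seen:
                continue  # a fence cuts this path; visited nodes need no rework
            if succ == transmit:
                return False  # reached the transmit without crossing a fence
            seen.add(succ)
            queue.append(succ)
    return True
```

The search simply treats fence nodes as removed from the graph: if the transmit is still reachable from the load, at least one path is unprotected.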
In general, it is difficult to analyze assembly code to discover data dependency chains that can form LVI gadgets. Therefore, Intel is making a patched GNU assembler available that trivially achieves the above property by inserting an LFENCE instruction after each instruction that performs a load (the Instructions that Require Special Treatment section discusses instructions requiring special handling). Microsoft* is also releasing an update to the Visual C/C++ compiler with similar capability. The C and C++ languages have semantics that are more amenable to static analysis. To take advantage of this, Intel is collaborating with industry partners to develop an extension to the clang compiler (a part of the LLVM framework) that optimally inserts LFENCE instructions to achieve the property stated above. This optimization approach is further explained in this article.
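The "fence after every load" strategy can be sketched as a simple text pass. The following Python is a toy in the spirit of that approach, not the patched assembler itself: the function names are hypothetical, and the load detection is a deliberately narrow assumption that recognizes only a two-operand MOV with a memory source, whereas a real assembler must handle every load-capable instruction.

```python
def strip_comment(line):
    """Drop a trailing `# ...` comment and surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def is_mov_load(line):
    """Toy heuristic: a two-operand MOV whose source operand is in memory."""
    code = strip_comment(line)
    mnemonic, _, rest = code.partition(" ")
    if mnemonic.upper() != "MOV":
        return False
    operands = [op.strip() for op in rest.split(",")]
    return len(operands) == 2 and "[" in operands[1]


def insert_lfences(lines):
    """Return a copy of `lines` with an LFENCE after every recognized load."""
    hardened = []
    for line in lines:
        hardened.append(line)
        if is_mov_load(line):
            hardened.append("LFENCE")
    return hardened
```

Applied to a Load+Load+Transmit listing, this pass fences both loads and leaves the final store untouched, because a store is not a load.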
There are several x86 instructions that combine both a load and a dependent memory access or branch. For these instructions, the mitigation is more complicated than simply inserting an LFENCE instruction. The first special case is the handling of function returns (for example, RET instructions). A compiler can replace all RET instructions with a safe alternative. Specifically, it can identify an available scratch register, and replace each RET with the following:
POP <scratch register>
LFENCE                    # Forces the pop to retire
JMP <scratch register>
This sequence has the same semantics as a RET instruction but is not vulnerable to LVI. Unlike the compiler for C/C++ source code, the assembler is not able to infer liveness for registers, and thus it cannot reliably identify a scratch register. Instead, the assembler replaces each RET instruction with the following sequence:
SHL QWORD PTR [rsp], 0
LFENCE
RET
The rationale behind this sequence is explained in the Elaboration on ad-hoc Load+Branch mitigations section12.
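The two RET replacements can be expressed as one rewriting rule that depends on whether a scratch register is known to be free. The Python sketch below is illustrative only: `harden_ret` and `rewrite_rets` are hypothetical names, and the choice of scratch register is left to the caller because this toy pass performs no liveness analysis.

```python
def harden_ret(scratch_register=None):
    """Return the LFENCE-protected replacement for a single RET."""
    if scratch_register is not None:
        # Compiler-style sequence: requires a register known to be dead here.
        return [f"POP {scratch_register}", "LFENCE", f"JMP {scratch_register}"]
    # Assembler-style sequence: needs no scratch register.
    return ["SHL QWORD PTR [rsp], 0", "LFENCE", "RET"]


def rewrite_rets(lines, scratch_register=None):
    """Replace every bare RET in `lines`, leaving other lines unchanged."""
    out = []
    for line in lines:
        code = line.split("#", 1)[0].strip()
        if code.upper() == "RET":
            out.extend(harden_ret(scratch_register))
        else:
            out.append(line)
    return out
```

For example, `rewrite_rets(["POP rbp", "RET"])` uses the scratch-free sequence, while passing a register name such as `"r11"` (an assumption that the register is free at that point) selects the POP/LFENCE/JMP form.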
The second exception is related to indirect call and indirect branch instructions with a memory operand. For example:
JMP QWORD PTR [rsi]
For the example above, a compiler can instead generate:
MOV <scratch register>, [rsi]
LFENCE                    # Forces the prior MOV to retire
JMP <scratch register>
If a scratch register is not available, a compiler might instead replace the indirect jump/call from memory with the following instruction sequence that uses a general purpose register (GPR):
XOR <some GPR>, QWORD PTR [rsi]
XOR <some GPR>, QWORD PTR [rsi]
LFENCE
JMP QWORD PTR [rsi]
Because the same value is XORed into the GPR twice, the two XOR instructions together leave the GPR contents unchanged, but they do change flags. The compiler should only use this sequence if the changes to flags are acceptable.
Some compilers have options that prevent the compiler from generating indirect calls or branches through memory, which is clearly helpful in mitigating LVI. The Intel SGX SDK takes advantage of this and facilitates Intel SGX developers doing likewise.
Unlike a compiler, an assembler is not able to infer liveness for registers or flags, and thus cannot use either sequence. If the assembly source code contains indirect calls or branches through memory, manual inspection and modification is required to apply the LVI mitigation, considering whether a scratch register is available or flags can be changed. The updated GNU assembler discussed in the Tooling Support to Automate LVI Mitigation section will output a warning if it encounters indirect calls or branches through memory.
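The decision described above (scratch register if available, XOR pair if flags may be clobbered, otherwise a warning for manual review) can be sketched as follows. This is a hypothetical helper, not the behavior of any real tool: the function name, parameters, warning text, and the placement of the size keyword on the memory operand are all assumptions, and the caller is assumed to supply liveness facts that an assembler could not infer.

```python
import re

_INDIRECT = re.compile(r"^(JMP|CALL)\s+QWORD PTR\s+(\[[^\]]+\])$", re.IGNORECASE)


def harden_indirect_branch(line, scratch=None, gpr=None, flags_dead=False):
    """Return (instruction_list, warning_or_None) for one source line."""
    code = line.split("#", 1)[0].strip()
    match = _INDIRECT.match(code)
    if match is None:
        return [line], None  # not an indirect branch through memory
    op, mem = match.group(1).upper(), match.group(2)
    if scratch is not None:
        # Load the target into a dead register, fence, then branch via register.
        return [f"MOV {scratch}, QWORD PTR {mem}", "LFENCE", f"{op} {scratch}"], None
    if gpr is not None and flags_dead and gpr.lower() not in mem.lower():
        # XOR twice restores the GPR; the GPR must not be part of the address.
        xor = f"XOR {gpr}, QWORD PTR {mem}"
        return [xor, xor, "LFENCE", f"{op} QWORD PTR {mem}"], None
    return [line], f"warning: unmitigated indirect branch through memory: {code}"
```

Returning the original line together with a warning mirrors the text's point that, without liveness information, the rewrite has to be left to manual inspection.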
Note that the above mitigations which use indirect CALLs are incompatible with retpoline (which replaces all such indirect CALL instructions with RET instructions). Retpoline is intended to mitigate branch target injection. Intel SGX-enabled processors with recent microcode updates will enumerate IBRS support and thus already mitigate branch target injection inside enclaves by ensuring that the predicted targets of near indirect branches executed inside an enclave cannot be controlled by software that executes outside the enclave. More details on this are in the guidance on Branch Target Injection.
There are also two REP string instructions that require special treatment. Specifically, the compare string (CMPS) and scan string (SCAS) instructions set EFLAGS in a manner that depends on the data being compared/scanned. Therefore, when used with a REP prefix, the number of iterations may vary depending on this data. If the data is a program secret chosen by the adversary using an LVI method, then this data-dependent behavior may leak some aspect of the secret. The solution is to unfold any REP CMPS and REP SCAS operations into a loop, and insert an LFENCE after the CMPS or SCAS instruction. For example, REPNZ SCAS can be unfolded to:
.RepLoop:
    JRCXZ .ExitRepLoop    # or JECXZ (see next line)
    DEC rcx               # or ecx if the REPNZ SCAS uses a 32-bit address size
    SCAS
    LFENCE
    JNZ .RepLoop
.ExitRepLoop:
    ...
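The unfolded loop is intended to preserve the architectural behavior of the REP-prefixed instruction. The Python sketch below models both forms for the byte variant (SCASB) so the equivalence can be checked. It is a simplified model under stated assumptions: the direction flag is clear, only ZF is tracked (with `None` meaning "flags unchanged"), and the function names are hypothetical.

```python
def repnz_scasb(memory, rdi, rcx, al):
    """Model of REPNZ SCASB with DF=0. Returns (rdi, rcx, zf)."""
    zf = None  # flags are left unchanged when rcx is initially 0
    while rcx != 0:
        zf = memory[rdi] == al  # SCASB compares AL with the byte at [rdi]
        rdi += 1
        rcx -= 1
        if zf:  # REPNZ stops once a match sets ZF
            break
    return rdi, rcx, zf


def unfolded_scasb(memory, rdi, rcx, al):
    """Model of the unfolded loop. Returns (rdi, rcx, zf)."""
    zf = None
    while True:
        if rcx == 0:            # JRCXZ .ExitRepLoop
            break
        rcx -= 1                # DEC rcx
        zf = memory[rdi] == al  # SCAS
        rdi += 1
        # LFENCE sits here: the SCAS retires before the branch below resolves.
        if zf:                  # JNZ .RepLoop falls through when ZF is set
            break
    return rdi, rcx, zf
```

In the model, the only difference between the two forms is the point marked for the LFENCE, which executes once per scanned element.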
For Intel SGX, enclave developers should evaluate the risk of a potential LVI attack and the performance implications of the mitigation, and decide whether to apply mitigations to their enclaves. For LVI-affected processors, the Intel SGX Attestation Service will report a new status code, SW_HARDENING_NEEDED, to indicate the platform is affected by a security advisory for which software hardening is recommended.
The Intel SGX SDK will support building enclaves with different levels of software hardening against the potential LVI attack. In particular:
LFENCE instructions in developers' code, nor in the linked enclave libraries provided by the SDK. Developers can manually modify the code to apply
JMP instructions with an LFENCE-protected instruction stream in developers' C/C++/assembly source code and linker configuration that selects the set of SDK-provided enclave libraries with the same mitigation.
LFENCE instruction after each instruction that performs a load and replaces JMP instructions with an LFENCE-protected instruction stream in developers' C/C++/assembly source code, and linker configuration that selects the set of SDK-provided enclave libraries with the same mitigation.
Both the Control-Flow-Mitigation and the All-Loads-Mitigation options have performance impacts that vary depending on the specific enclave code over which the mitigation is applied. The effect of the mitigations will vary by workload and in some cases may be significant, especially for the All-Loads-Mitigation. As the Intel SGX application includes both enclave code and non-enclave code and the mitigation is only applicable to the enclave code, the overall overhead at the Intel SGX application level is determined not only by the mitigation overhead introduced to the enclave, but also by the amount of time the code executes inside the enclave compared to execution outside of the enclave before the mitigation is applied.
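The relationship between enclave-level and application-level overhead can be written as a simple weighted sum. The sketch below is a back-of-the-envelope model, assuming execution time splits cleanly into enclave and non-enclave portions and that only the enclave portion slows down. The function name and the numbers in the usage note are hypothetical, not measurements.

```python
def application_slowdown(enclave_time_fraction, enclave_slowdown):
    """Overall slowdown factor of the application after mitigation.

    enclave_time_fraction: share of pre-mitigation run time spent inside the
        enclave, between 0.0 and 1.0.
    enclave_slowdown: factor by which enclave code slows down (1.0 = no change).
    """
    if not 0.0 <= enclave_time_fraction <= 1.0:
        raise ValueError("enclave_time_fraction must be between 0.0 and 1.0")
    outside = 1.0 - enclave_time_fraction
    return outside + enclave_time_fraction * enclave_slowdown
```

For instance, if an application spent a quarter of its time in the enclave and the enclave code became twice as slow, `application_slowdown(0.25, 2.0)` gives an overall factor of 1.25.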
Developers who choose to mitigate LVI can use LVI mitigation-enabled compilers and assemblers discussed in the Tooling Support to Automate LVI Mitigation section to apply the selected level of mitigation to their C/C++ and assembly source code for the enclave. The Intel SGX SDK simplifies this by letting developers choose the mitigation level rather than requiring developers to understand the tools' specific command line options. The SDK documentation has been updated to reflect these changes. It is worth noting that any library binary that is not recompiled or reassembled using the tool chain and configuration recommended by the Intel SGX SDK might not include the desired mitigations; neither will any dynamically generated code within the enclave at enclave runtime, if supported by the enclave.
Developers using third party SGX SDKs should consult their SGX SDK provider for mitigation plans and release timelines.
Enclave developers who want to support Intel SGX-enabled platforms should determine the level of software hardening that their environment requires, based on risk analysis and an evaluation of the performance impacts of mitigation.
If none of the supported platforms are affected by LVI, including LVI zero data, no additional action is required. If only some of the supported platforms are affected by LVI, developers could choose to release one version of the enclave with the selected level of mitigations enabled for all platforms. Alternatively, developers could release multiple versions of the enclave, with one version for platforms that are not affected by LVI which does not include mitigations, and another version which does include mitigations for platforms that are affected by LVI. As in all usages of Intel SGX that utilize Intel SGX remote attestation, developers should provide the identities of the enclaves (ISVSVN and other relevant fields in the enclave SIGSTRUCT) to the relying parties (verifier of the Intel SGX remote attestation data) so the relying parties can determine which enclave or version of an enclave they are communicating with.
Developers who choose to support multiple versions of enclaves should sign the enclaves to identify which enclaves include software mitigations against LVI. Furthermore, data sealed by an enclave that includes software hardening should not be unsealable by alternative versions of the enclave that do not include software hardening. One way to achieve this is to assign a higher enclave ISVSVN value to the enclave version with software hardening than you do to the enclave version without software hardening.
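The effect of assigning a higher ISVSVN to the hardened enclave can be illustrated with a toy key-derivation model. This sketch assumes the usual rule that an enclave may request seal keys only for an ISVSVN less than or equal to its own. The function name is hypothetical, and SHA-256 over concatenated inputs is a stand-in chosen for illustration, not the actual hardware key derivation.

```python
import hashlib


def derive_seal_key(platform_secret, signer, requested_isvsvn, caller_isvsvn):
    """Toy stand-in for seal key derivation bound to signer and ISVSVN."""
    if requested_isvsvn > caller_isvsvn:
        # An enclave may not obtain keys belonging to a newer version.
        raise PermissionError("requested ISVSVN exceeds the caller's ISVSVN")
    material = platform_secret + signer + requested_isvsvn.to_bytes(2, "little")
    return hashlib.sha256(material).digest()
```

Under this model, the hardened enclave (higher ISVSVN) can still derive the older key to migrate existing data, while the unhardened enclave cannot derive the key that protects data sealed by the hardened version.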
Intel has not been able to identify and successfully exploit any Load+Transmit or Load+Load+Transmit code gadgets inside the Intel enclaves involved in Intel SGX remote attestation. Nonetheless, out of an abundance of caution, Intel will release updates to those enclaves with All-Loads-Mitigation applied and conduct an Intel SGX trusted computing base (TCB) Recovery event to enable relying parties to tell whether the updated Intel SGX attestation enclaves were utilized.
Intel SGX Attestation services will indicate whether the platform from which the attestation request originated is affected by LVI (LVI stale data and/or LVI zero data), through a new status code, SW_HARDENING_NEEDED. A platform with the required version of microcode and Intel SGX attestation software stack, that is properly configured according to the relevant Intel SGX security advisories (for example, INTEL-SA-00233 and INTEL-SA-00219), will receive one of the two following status codes:
UP-TO-DATE: The platform is not affected by LVI.
SW_HARDENING_NEEDED: The platform is affected by LVI.
The relying party should evaluate the potential risk of an attack on platforms affected by LVI and whether the attesting enclave employs adequate software hardening to mitigate the risk, which is reflected in the enclave identity (ISVSVN and other relevant fields in the attestation data). The relying party might reject attestations from enclaves without appropriate LVI mitigations.
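A relying party's decision can be sketched as a small policy function. This is one possible policy under stated assumptions, not a prescribed one: the function name and the `min_hardened_isvsvn` threshold are hypothetical, the status strings are used as they are spelled in this section, and rejecting every other status is a conservative choice made for the example.

```python
def accept_attestation(status, isvsvn, min_hardened_isvsvn):
    """Decide whether to trust an attesting enclave.

    status: attestation status string reported for the platform.
    isvsvn: ISVSVN of the attesting enclave.
    min_hardened_isvsvn: lowest ISVSVN the ISV ships with LVI hardening.
    """
    if status == "UP-TO-DATE":
        return True  # platform not affected by LVI; any version is acceptable
    if status == "SW_HARDENING_NEEDED":
        # Platform is affected: accept only enclave versions built with hardening.
        return isvsvn >= min_hardened_isvsvn
    return False  # conservative default for statuses this sketch does not model
```

With a threshold of 2, an enclave at ISVSVN 1 is accepted on an unaffected platform but rejected on a platform that reports `SW_HARDENING_NEEDED`.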
Because malicious adversaries have limited ability to influence the paging behavior of victim processes to cause faults or assists in non-SGX environments, LVI is not a practical exploit in real-world non-SGX environments. Developers can mitigate potentially vulnerable code by inserting additional LFENCE instructions to block speculative activity, or by using techniques like array index masking to prevent leaking data via covert channels. But because of the complexity of lining up all these conditions to successfully execute this method in a non-contrived scenario, software developers should carefully evaluate their environments and workloads before choosing to mitigate this.
An OS or VMM can discover its potential susceptibility to LVI stale data by determining whether the processor is affected by L1TF, MDS, and TAA. In particular, processors with the combination of the three following properties are not affected by LVI stale data:
TAA_NO, or does not support Intel TSX, or has disabled TSX/RTM using
On processors that are affected by TAA but not by MDS, software that does not use loads within an Intel TSX region cannot be impacted by LVI stale data.
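The enumeration check can be sketched as a predicate over the IA32_ARCH_CAPABILITIES MSR value. This sketch assumes that the first two properties correspond to the RDCL_NO and MDS_NO enumeration bits, which the text implies by naming L1TF and MDS but does not spell out here. The bit positions follow Intel's published description of that MSR, and the function and parameter names are hypothetical. Reading the MSR itself requires privileged, platform-specific code and is outside this sketch.

```python
# Bit positions in IA32_ARCH_CAPABILITIES (per Intel's MSR documentation).
RDCL_NO = 1 << 0  # not susceptible to L1TF
MDS_NO = 1 << 5   # not susceptible to MDS
TAA_NO = 1 << 8   # not susceptible to TAA


def affected_by_lvi_stale_data(arch_capabilities, tsx_supported, tsx_disabled):
    """Return True unless all three 'not affected' properties hold."""
    taa_safe = (
        bool(arch_capabilities & TAA_NO)
        or not tsx_supported
        or tsx_disabled
    )
    not_affected = (
        bool(arch_capabilities & RDCL_NO)
        and bool(arch_capabilities & MDS_NO)
        and taa_safe
    )
    return not not_affected
```

A processor that enumerates RDCL_NO and MDS_NO but supports Intel TSX without TAA_NO is still reported as affected unless TSX has been disabled.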
Intel SGX usage may need an alternative mechanism to detect whether the CPU is affected by LVI, as it does not trust the OS. Through Intel SGX remote attestation, a relying party can examine the remote attestation evaluation status code and tell whether the remote attestation request is from a platform affected by LVI (LVI stale data and/or LVI zero data). Refer to the Relying Parties section for details.
The effect of the SHL instruction is to assert that the stack pointer refers to a valid page, without changing the contents of memory or clobbering any registers (including flags). The LFENCE then ensures that the SHL retires before finally issuing the RET. This ensures that instructions dependent on the RET will not transiently execute if the SHL instruction signals a fault/assist. Enclave entry will clear out buffers affected by MDS or L1TF, and thus an attacker cannot inject non-zero data to the RET if they enter the enclave between the SHL and the RET.
The soundness of the SHL+LFENCE+RET sequence should not depend only on the length of the transient window. For (non-NULL) LVI, microarchitectural buffers and the data cache unit are cleared at the end of ERESUME, so if the ERESUME hits after the SHL, then malicious actors would have nothing to inject. For LVI zero data, it is possible to inject 0 as the return address, which will cause transient execution to jump to LIP 0. If LIP 0 is (architecturally) outside of the enclave, the RET instruction will fault and AEX will be delivered. The microarchitecture will not allow instructions to be transiently fetched and executed in this case. If LIP 0 is (architecturally) inside of the enclave and LIP 0 is executable with valid instructions at LIP 0, then these instructions may be transiently executed. Note however that the Intel SGX SDK will not build enclaves with instructions or execute permission at the beginning of the enclave. For other SGX SDKs, a trivial mitigation is to similarly build these enclaves such that they do not have instructions at the beginning of the address space. In summary, even if a malicious OS maps the beginning of the enclave at LIP 0, there will be no executable instructions at LIP 0.
An LVI zero data attack can be used to hijack the Intel SGX enclave stack pointer during transient execution. In the following example, on a processor affected by LVI zero data, the attack is able to cause the load from 0x58(%rsp) to fault and the dependent instructions in the code gadget to forward value 0 to the rsp register. At that point, all subsequent RET instructions dereference the malicious memory page mapped at virtual address 0, outside of the enclave. By filling the memory at virtual address 0 with specifically crafted content, the attacker is able to cause the transient execution to branch to any desired code gadget within the enclave. The attacker might also be able to mount a transient ROP attack by chaining together multiple subsequent RET instructions.
MOV rbp, QWORD PTR [rsp+58h]    # Fault, rbp <- 0
...
MOV rsp, rbp                    # rsp <- 0
POP rbp                         # rbp <- *(0)
RET                             # rip <- *(0 + 8)
When the Control-Flow-Mitigation (for example, the SHL-LFENCE sequence inserted before the RET instruction) is applied to the code above, the LFENCE before the RET instruction ensures that the Load fault will retire before the next instruction can execute. As a result, the fault will be signaled and transient execution cannot reach the RET instruction to branch to the attacker's desired code gadget or any subsequent RET instructions.
MOV rbp, QWORD PTR [rsp+58h]    # Fault, rbp <- 0
...
MOV rsp, rbp                    # rsp <- 0
POP rbp                         # rbp <- *(0)
SHL QWORD PTR [rsp], 0          # *(8) <- *(8)
LFENCE                          # Forces prior Load from [rsp+58h] to retire. Fault signaled.
RET
SHL-LFENCE-RET sequence has the same security properties, but requires one less instruction.