Security Software

Guidelines for Mitigating Timing Side Channels Against Cryptographic Implementations

The primary concern with side channels is the disclosure of secrets. Secrets are broadly defined as any data that should not be seen or known by other users, applications, or even other code modules. The most common types of secrets that malicious actors pursue through side channel methods are API keys, user passwords, and cryptographic keys, because these may allow malicious actors to decrypt or access other protected secrets.

Timing side channels can be classified into two broad categories: traditional side channels, which take advantage of architecturally committed operations, and speculative execution side channels, which take advantage of operations that only execute speculatively and thus are not committed into the architectural state.

Traditional timing side channels take advantage of execution timing differences. These timing differences can be found in the mutual access of shared system resources (for example, branch predictors, cache lines, translation lookaside buffers (TLB), execution ports), instructions that have varying execution timing based on input, or observable differences in API execution time. The methods used to observe these timing differences vary between specific side channels, but developers can follow general best practices to defend against them.

Most traditional timing side channels can be mitigated by applying all three of the following general principles, listed here at a high level. We discuss details and examples of these principles later.

  1. Ensure runtime is identical regardless of secret or input values [1].
  2. Ensure code access patterns [2] are identical regardless of secret or input values.
  3. Ensure data access patterns [3] are identical regardless of secret or input values.

These principles are conceptually simple, but in practice they can be difficult to implement, depending on the complexity of your algorithms. Luckily, most developers won’t need to work on the most complex situations.

Speculative execution side channel methods may need different mitigations than those discussed in this article. In particular, some speculative execution side channel methods may be able to generate a universal read gadget in either the cryptographic software or other software in the same address space as secret data. You can find additional resources and techniques for mitigating speculative execution side channels on the Security Findings and Mitigations site. These techniques include compiler techniques that constrain speculation as well as broader software techniques like moving secrets to another address space.

Constant Time Code and Compiler Optimizations

Before we describe constant-time principles, a word about compiler optimizations and the effect on generated code: In short, be wary of code generated from source code that appears to adhere to all of these recommendations. Compilers with optimizers will look for ways to make the code faster, and sometimes the changes they introduce are quite unexpected. For any code that you write—whether C or hand-coded assembly—you must verify constant-time operation in the environment where you expect to use it.

In the C language, keywords like volatile, register, inline, and others can change the way compiler optimizations are applied. Using compiler command line options, such as gcc’s -O0 through -O3, can also generate vastly different machine code for the same simple lines of source code. The C language keywords can successfully mitigate some unexpected compiler optimizations for some compilers, but you must not rely on these keywords since they are only considered to be hints to the code generators. Code generator implementations may change at any time.

Many languages do not have good support—or any support—for providing code generation hints to the compiler or just-in-time (JIT) compiler for managed and dynamic languages. When considering constant time coding principles in these environments, it’s important to be extra vigilant about what is actually executing on the processor.

Examples in This Article

Code examples in this article are (mostly) similar to C, and are intended to illustrate code flow and logic structure. They assume a compiler that does not introduce unexpected optimizations that defeat the intended code flow. It is up to the reader to understand their own toolset and account for specific behavior in their environment.

Real-World Example

Many of the techniques discussed later in this article are implemented by a set of patches to OpenSSL listed below. You may find it instructive to look at these patches and see real-world examples of the recommendations in this article.

Principles of Traditional Side Channel Resistant Coding

Let’s start by elaborating on the high-level principles enumerated above.

  1. Ensure the execution time is identical regardless of secret values [4].

    Timing side channels take advantage of the smallest differences in execution time, sometimes at the resolution of a single instruction. Ensure that algorithms never skip over bits of a secret for performance reasons, and use only instructions whose latency is invariant to their data values. This requirement applies to the primary operation as well as any subfunctions.

  2. Ensure code access patterns are identical regardless of secret values.

    There must be no conditional branches based on secret values and no indirect branches whose target is affected by secret values. In either case, the value of a secret must never change the offset, within a cache line, of the next instruction to execute.

  3. Ensure data access patterns are identical regardless of secret values.

    Secret values should not change the address or data size of loads/stores (including by changing the order of the loads/stores). This means ensuring no secret values change the offset within a cache line of a load/store. For example, don’t use a secret value to load a specific location from a table representing an s-box without precautions.

Example: Information Leakage Based on Conditional State Change

Principles: identical runtime, identical code access, identical data access

Some algorithms require conditional state changes and use simple structures like the example below:

int cond_update(int newstate, int a, int x) {
  static int state = 0;

  if (a == x) {
    state = newstate;
  }

  return state;
}

Compiling this code with gcc 7.3.0 with -O1 produces the following assembly code [5]:

cond_update:
  cmp    %edx,%esi
  je     do_update
do_return:
  mov    state(%rip),%eax
  retq
do_update:
  mov    %edi,state(%rip)
  jmp    do_return

This simple code flow will exhibit execution time differences depending on whether or not the condition is true. Code access patterns may be visible by timing cache accesses or via resource contention. Data access patterns will also vary when the condition is true.

The solution for simple cases like this is to use the time-invariant CMOV family of instructions. A CMOV instruction performs its memory read regardless of whether the condition is true, and its latency does not depend on the condition [6]. Because CMOV cannot write to memory, the constant-time sequence loads the current state, conditionally replaces it in a register, and stores the result back unconditionally, so the memory reads and writes are identical whether the condition evaluates to true or false. The resulting instruction sequence should look like the following.

cond_update:
  mov    state(%rip),%eax
  cmp    %edx,%esi
  cmove  %edi,%eax
  mov    %eax,state(%rip)
  retq
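
For comparison, here is a branch-free formulation written directly in C. This is an illustrative sketch rather than code taken from the patches mentioned above; many compilers lower this pattern to a CMOV-based sequence, but, as discussed in the compiler section, the generated assembly must still be verified.

int cond_update_ct(int newstate, int a, int x) {
  static int state = 0;

  /* mask is all ones when a == x and all zeros otherwise
     (assumes two's complement integers) */
  int mask = -(a == x);

  /* select between newstate and state without branching,
     and store to state on every call */
  state = (newstate & mask) | (state & ~mask);

  return state;
}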

Example: Information Leakage Based on Comparison Timing

Principles: identical runtime, identical code access

Operations such as message authentication code (MAC) verification and password processing are especially sensitive to timing side channels. Both cases include a step that compares two values. If the comparison time depends on the inputs, malicious actors can use the timing data to learn valuable information. These attacks, known as oracle attacks, can even target processes that are not vulnerable to speculative execution side channels, and they can be mounted at the API level.

Consider the following example applied to simple password comparison, where the correct password is passed as a and the adversary’s guess is passed as b.

bool equals(byte a[], size_t a_len,
            byte b[], size_t b_len) {
  if (a_len != b_len)
    return false;
  for (size_t i = 0; i < a_len; i++) {
    if (a[i] != b[i]) {
      return false;
    }
  }
 
  return true;
}

If we assume that each comparison operation takes one unit of time to execute, there are a couple of oracles in this function. First, the malicious actor can determine the correct length of the password by repeatedly causing the function to be invoked with values of increasing length until a runtime measurement of more than one unit is observed.

Second, the malicious actor can efficiently find the password one byte at a time by causing the function to be invoked with b values of length a_len (discovered using the first oracle), and then increment through valid byte values (for example, for the first byte, the malicious actor sends aaaa, baaa, caaa, etc.) until an increase in execution time is detected. The increase in execution time means the byte being tested has matched, so the adversary can then send queries that increment the next byte.

To put the above mechanism into more concrete terms, let’s look at an example system that uses a password between 3 and 5 bytes long and only allows ASCII characters a through z. If a malicious actor wants to find a user's password, and the user's password is oops, the malicious actor uses the following invocations to gain information:

  • aaa (time=1), aaaa (time=2, secret length=4)
  • baaa (time=2), … oaaa (time=3, first byte is o)
  • obaa (time=3), … ooaa (time=4, second byte is o)
  • ooba (time=4), … oopa (time=5, third byte is p)
  • oopb (time=5), … oops (login successful! User’s password is oops)

The solution is to use a constant time comparison and to always maintain a and b in fixed-length buffers. In an actual implementation, the buffers would be allocated to the maximum size of a password (perhaps 64 bytes). The comparison between the two passwords would then always iterate over the entire length of the buffer (64 bytes), even when the actual password and the guess are smaller than the buffer size. It is then possible to use a constant time comparison function like the following:

bool constant_equals(byte a[], byte b[], size_t len) {
  byte x = 0;
 
  for (size_t i = 0; i < len; i++) {
    x |= a[i] ^ b[i];
  }

  return (x==0);
}

The constant_equals function uses the XOR operator on each byte to compute differences and the bitwise OR operator to accumulate the differences. If all bytes are the same, it returns true because x will be zero. The important thing to understand is that the time to compare the two values is constant for all invocations of the comparison, preventing the side channel demonstrated above.

The example above uses raw passwords for illustration only. In production systems, the recommendation is to compare hashes instead of raw passwords (salted, hashed passwords are the preferred approach), but the same advice on constant time comparison applies. All bytes of the password hashes should be compared using the XOR approach above, which ORs the result into an accumulation flag to indicate whether the hash comparison succeeded or failed.

Example: Information Leakage Based on Secret Data

Principles: identical runtime, identical code access, identical data access

Some of the underlying mathematical primitives used to perform cryptographic operations can exhibit detectable timing differences depending on the key material being used when the implementation is simple or is heavily optimized for speed over security. An example of this is the primitive used to implement the RSA and Diffie-Hellman algorithms: modular exponentiation.

To most clearly illustrate the side channel, let’s consider the most naïve implementation of modular exponentiation. To compute the value c = a^b mod n, the simplest implementation does the following:

  1. Set c to 1.
  2. Scan from the most significant to the least significant bit of exponent b. For each bit:
    1. Square c (c=c*c mod n).
    2. If the current bit is 1, multiply c by a (c=c*a mod n).

Assuming all multiplies take the same time to execute, the amount of time the modular exponentiation computation takes is directly related to the number of 1 bits in the exponent. Therefore, this method causes the computation timing to vary depending on the key material, both for the overall computation and for each individual step involving a bit of b. Observing the time required for each step reveals the bits of b. When performing a private key operation using RSA, b is the private key, so discovering the bits of b is equivalent to stealing the private key.
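
For illustration, the naive loop can be sketched on small unsigned integers as follows. A real RSA implementation operates on multi-word big numbers, so treat this only as a model of the control flow; the point is the data-dependent multiply inside the if statement.

#include <stdint.h>

/* Naive square-and-multiply: computes a^b mod n.
   The conditional multiply makes the runtime, code access pattern, and
   data access pattern depend on the bits of the secret exponent b. */
uint32_t modexp_naive(uint32_t a, uint32_t b, uint32_t n) {
  uint64_t c = 1;

  for (int i = 31; i >= 0; i--) {
    c = (c * c) % n;                /* square */
    if ((b >> i) & 1) {
      c = (c * a) % n;              /* multiply only when the current bit is 1 */
    }
  }

  return (uint32_t)c;
}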

A method that always executes the same code path and has a constant execution time uses an intermediate value r and performs the calculation like this:

  1. Set c to 1.
  2. Scan from the most significant to the least significant bit of exponent b. For each bit:
    1. Square c (c=c*c mod n).
    2. Compute r=(a-1)*current_bit_of(b)+1.
    3. Multiply c by r (c=c*r mod n).

With the revised process, the intermediate value r will be either 1 or a if the current bit of b is 0 or 1, respectively. The code will always take the same execution path and will always execute the same number of multiplies. The downside of this method is that performance is significantly slower than the simple method: the simple method executes an average of 1.5 multiplies per bit of the exponent, whereas the revised method always performs 3 multiplies per bit.
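
Continuing the small-integer sketch from above, the revised method can be written as follows; both the code path and the number of multiplies are now independent of the bits of b.

#include <stdint.h>

/* Always-multiply variant: r is 1 when the current bit of b is 0 and a when
   it is 1, so every iteration performs the same three multiplies (the square,
   the computation of r, and the multiply by r). */
uint32_t modexp_always_multiply(uint32_t a, uint32_t b, uint32_t n) {
  uint64_t c = 1;

  for (int i = 31; i >= 0; i--) {
    uint64_t bit = (b >> i) & 1;
    uint64_t r = ((uint64_t)a - 1) * bit + 1;   /* 1 or a, without branching */

    c = (c * c) % n;
    c = (c * r) % n;
  }

  return (uint32_t)c;
}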

 

A method to enhance the performance of RSA computations is to use a fixed sliding window method of computation. With this technique, multiple bits of the exponent are processed at the same time [7]. The method works as follows:

  1. Choose a window size, w.
  2. Precompute the 2^w powers of a as apre = [a^0 mod n, a^1 mod n, …, a^(2^w - 1) mod n], and set c to 1 as before.
  3. Scan the exponent b from the most significant bit to the least significant bit, w bits at a time.
  4. For each window, square c w times (in other words, for each bit in the window, square c just as in the previous algorithm).
  5. Then multiply c by the precomputed power of a, using the current set of w bits, treated as an integer, as the index into apre.

In addition to achieving constant execution times, the fixed sliding window method also achieves the goal of not branching based on the content of secret data, because the fixed sliding window method follows the same code path regardless of the private key value.
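
To make the steps above concrete, here is a small-integer sketch of the fixed sliding window computation with w = 2; as before, it only models the control flow of a real big-number implementation. Note that the direct, secret-indexed load from apre in this sketch is exactly the data access problem discussed next and is left visible here on purpose.

#include <stdint.h>

/* Fixed window modular exponentiation sketch with w = 2: the exponent is
   consumed two bits at a time, and every window performs two squares and one
   multiply regardless of the bit values. */
uint32_t modexp_fixed_window(uint32_t a, uint32_t b, uint32_t n) {
  enum { W = 2, TABLE_SIZE = 1 << W };
  uint64_t apre[TABLE_SIZE];
  uint64_t c = 1;

  /* Precompute a^0 .. a^(2^w - 1) mod n. */
  apre[0] = 1;
  for (int i = 1; i < TABLE_SIZE; i++) {
    apre[i] = (apre[i - 1] * a) % n;
  }

  /* Scan the exponent w bits at a time, most significant window first. */
  for (int shift = 32 - W; shift >= 0; shift -= W) {
    for (int s = 0; s < W; s++) {
      c = (c * c) % n;              /* square once per bit in the window */
    }

    int window = (b >> shift) & (TABLE_SIZE - 1);
    c = (c * apre[window]) % n;     /* secret-indexed table access: a real
                                       implementation must select this entry
                                       with a constant access pattern */
  }

  return (uint32_t)c;
}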

Note that this implementation is still vulnerable to side channels based on the pattern of data access, so it should not be used alone. To satisfy the principle of ensuring data access patterns are identical, the code that selects the precomputed apre value must access all values of apre consistently: each access must have an identical size (for example, 4 bytes), the addresses accessed must be 4-byte aligned, and the accesses must occur in exactly the same order. There are a couple of ways to accomplish this. One is the scatter-gather method employed by libraries like OpenSSL. Another is to scan through apre with a loop, using instructions such as the CMOV family discussed earlier to select an entry based on a comparison between the current w bits of b and the loop index into apre.
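
As a sketch of the second approach, the selection can be done by scanning the whole table and masking, as shown below. The entries here are hypothetical multi-word values (LIMBS 64-bit words each, standing in for the big-number precomputed powers), and every word of every entry is read in the same order and with the same width no matter which index is requested. As with all of these examples, the generated code must be checked to confirm that the compiler has not reintroduced a data-dependent branch or load.

#include <stddef.h>
#include <stdint.h>

#define LIMBS 16   /* hypothetical number of 64-bit words per precomputed power */

/* Copy apre[idx] into out by scanning the entire table. The order, addresses,
   and sizes of the loads are identical for every possible idx. */
void constant_select(uint64_t out[LIMBS], const uint64_t apre[][LIMBS],
                     size_t entries, size_t idx) {
  for (size_t j = 0; j < LIMBS; j++) {
    out[j] = 0;
  }

  for (size_t i = 0; i < entries; i++) {
    /* mask is all ones when i == idx and all zeros otherwise */
    uint64_t mask = 0 - (uint64_t)(i == idx);

    for (size_t j = 0; j < LIMBS; j++) {
      out[j] |= apre[i][j] & mask;
    }
  }
}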

An important consideration when implementing these numeric methods is to ensure that the base multiplication is constant time, has identical execution path for all values, and has identical data access for all values.

Example: Observation of Lookup Table Access

Many block ciphers make use of lookup tables. For example, the AES algorithm defines a procedure called SubBytes() that performs a byte-for-byte substitution using a table called an S-box. The S-box is a 256-byte constant, which spans four cache lines on most processors. The table is public, but the indexing into the table is secret. By watching which one of the four cache lines is accessed in a straightforward implementation of SubBytes(), a spy process can obtain 2 bits of information about a secret round key. If the spy process can determine the 32-bit word within a cache line where the lookup occurs, it can obtain 6 bits of information about a secret round key. Although the spy process cannot know the exact values of the S-box accessed, repeating this observation many times for many encryptions may reveal the entire secret key.

To ensure memory access patterns are the same regardless of the input value, the code must access all table data with the same data width, in exactly the same address order, and with identical data alignment, regardless of the table entry to look up. The required value can be extracted using instructions such as the CMOV family. For performance, we recommend that tables are aligned on a 64-byte boundary (the typical size of a cache line) and that table entries are aligned on a 4-byte boundary. If a spy process observes the cache, it observes the same accesses in the same order, preventing anomalies that can disclose secrets through this side channel.
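
For a byte-oriented table such as the AES S-box, the same technique applies directly. The sketch below reads all 256 entries for every substitution, so the memory access pattern is independent of the secret index; it illustrates the access-pattern principle only, and a real implementation would also follow the alignment recommendations above and verify the generated code.

#include <stdint.h>

/* Constant access pattern S-box lookup: every byte of the table is read, in
   the same order, on every call, and a mask selects the byte at the secret
   index. */
uint8_t sbox_lookup(const uint8_t sbox[256], uint8_t index) {
  uint8_t result = 0;

  for (unsigned i = 0; i < 256; i++) {
    /* mask is 0xFF when i == index and 0x00 otherwise */
    uint8_t mask = (uint8_t)(0 - (unsigned)(i == index));
    result |= (uint8_t)(sbox[i] & mask);
  }

  return result;
}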

For algorithms like AES where the S-box values are computed deterministically, it is also an option to compute the replacement value for each substitution instead of performing a table lookup.

Summary

Developers who wish to protect secret data against timing side channel methods should ensure that their code execution time, data access patterns, and code access patterns are identical regardless of secret or input values. Even developing simple cryptographic libraries that are resistant to timing side channel methods requires adhering to these guidelines as well as having a thorough background in cryptographic principles, algorithms, and implementations. Most developers should use existing tools and libraries that already follow the best practices described here, instead of trying to develop their own routines. Developers can refer to Security Best Practices for Side Channel Resistance for more generic techniques. Additional precautions are required to protect secret data from speculative execution side channel methods. Refer to the Software Security Findings and Mitigations site for specific mitigations for speculative execution side channel methods.

Footnotes

  1. For some constant set of algorithm parameters—for example: AES block encrypt with same length key, RSA signature generation with same size modulus.
  2. A program's code access pattern is the order and address of instructions that it executes.
  3. A program's data access pattern is the order and address of memory operands that it loads and stores.
  4. For some constant set of algorithm parameters—for example: AES block encrypt with same length key, RSA signature generation with same size modulus.
  5. See https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI for calling convention description.
  6. In the future, it is possible that new ISA or modes will be created where the latency varies based on the condition.
  7. A reference for the basic sliding window algorithm is Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1996, Algorithm 14.109, p. 624.

