Security Software

Guidelines for Mitigating Timing Side Channels Against Cryptographic Implementations

The primary concern with side channels is the disclosure of secrets. Secrets are broadly defined as any data that should not be seen or known by other users, applications, or even other code modules. The most common type of secrets that malicious actors pursue through side channel methods include API keys, user passwords, or cryptographic keys because these may allow malicious actors to decrypt or access other protected secrets.

Software side channels can be classified into two broad categories: traditional side channels, which take advantage of architecturally committed operations, and speculative execution side channels, which take advantage of operations that only execute speculatively and thus are not committed into the architectural state.

Traditional timing side channels take advantage of execution timing differences. These timing differences can be found in the mutual access of shared system resources (for example, branch predictors, cache lines, translation lookaside buffers (TLB), execution ports), instructions that have varying execution timing based on input, or observable differences in API execution time. The methods used to observe these timing differences vary between specific side channels, but developers can follow general best practices to defend against them.

Most traditional timing side channels can be mitigated by applying all three of the following general principles, listed here at a high level. We discuss details and examples of these principles later.

  1. Ensure runtime is independent of secret values1.
  2. Ensure code access patterns2 are independent of secret values.
  3. Ensure data access patterns3 are independent of secret values.

These principles are conceptually simple, but in practice sometimes can be difficult to implement depending on the complexity of your algorithms. Luckily, most developers won’t need to work on the most complex situations.

Speculative execution side channels methods may need different mitigations than those discussed in this article. In particular, some speculative execution side channel methods may be able to generate a universal read gadget in either the cryptographic software or other software in the same address space as secret data. You can find additional resources and techniques for mitigating speculative execution side channels on the Security Findings and Mitigations site.

Constant Time Code and Compiler Optimizations

Before we describe constant-time principles, a word about compiler optimizations and the effect on generated code: In short, be wary of code generated from source code that appears to adhere to all of these recommendations. Compilers with optimizers will look for ways to make the code faster, and sometimes the changes they introduce are quite unexpected. For any code that you write—whether C or hand-coded assembly—you must verify constant-time operation in the environment where you expect to use it.

In the C language, keywords like volatile, register, inline, and others can change the way compiler optimizations are applied. Using compiler command line options, such as gcc’s -O0 through -O3, can also generate vastly different machine code for the same simple lines of source code. The C language keywords can successfully mitigate some unexpected compiler optimizations for some compilers, but you must not rely on these keywords since they are only considered to be hints to the code generators. Code generator implementations may change at any time.

Many languages do not have good support—or any support—for providing code generation hints to the compiler or just-in-time (JIT) compiler for managed and dynamic languages. When considering constant time coding principles in these environments, it’s important to be extra vigilant about what is actually executing on the processor.

Real-World Example

Many of the techniques discussed later in this document are implemented by a set of patches to OpenSSL listed below. You may find it instructive to look at these patches and see real-world examples of the recommendations in this document.

Principles of Traditional Side Channel Resistant Coding

Let’s start by elaborating on the high level principles enumerated above.

Timing side channels take advantage of the smallest differences in execution time, sometimes at the resolution of a single clock cycle. Ensure algorithms never ignore bits of a secret for performance, and only use instructions whose latency is invariant to the data values. This requirement applies to the primary operation as well as any subfunctions.

The value of a secret must not affect a conditional branch or the target of an indirect branch. The address sequence of executed instructions must be independent of secret values.

Secret values should not cause a change to the order of accessed addresses or data size of loads/stores. For example, don’t use a secret value to load a specific location from a table representing an s-box without precautions.

  1. Ensure the execution time is independent of secret values.4
  2. Ensure code access patterns are independent of secret values.
  3. Ensure data access patterns are independent of secret values.

Principle: Information Leakage Based on Comparison Timing

Note: Code examples for these principles are being updated.

Operations such as message authentication code (MAC) and password processing are especially sensitive to timing side channels. Both cases include a step that compares two values. If the comparison time is dependent on the inputs, malicious actors can use the timing data to learn valuable information. These exploits, known as oracle attacks, can even target processes that are not vulnerable to speculative execution side channels and be exploited at an API level.

Principle: Information Leakage Based on Secret Data

Some of the underlying mathematical primitives used to perform cryptographic operations can exhibit detectable timing differences depending on the key material being used when the implementation is simple or is heavily optimized for speed over security. An example of this is the primitive used to implement the RSA and Diffie-Hellman algorithms: modular exponentiation.

To most clearly illustrate the side channel, let’s consider the most naïve implementation of modular exponentiation. To compute the value c=ab mod n, the simplest implementation does the following:

  1. Set c to 1.
  2. Scanning from most significant to least significant bit of exponent b.
    1. Square c (c=c*c mod n).
    2. Multiply c with a (c=c*a mod n) if the current bit is a 1.

Assuming all multiplies take the same time to execute, the amount of time the modular exponentiation computation takes is directly related to the number of 1 bits in the exponent. Therefore, this method causes the computation timing to vary depending on the key material, both for the overall computation and for each individual step involving a bit of b. Observing the time required for each step reveals the bits of b. When performing a private key operation using RSA, b is the private key, so discovering the bits of b is equivalent to stealing the private key.

A method that always executes the same code path and has a constant execution time uses an intermediate value r and performs the calculation like this:

  1. Set c to 1.
  2. Scanning from most significant to least significant bit of exponent b.
    1. Square c (c=c*c mod n).
    2. Compute r=(a-1)*current_bit_of(b)+1.
    3. Multiply c with r (c=c*r mod n).

With the revised process, the intermediate value r will be either 1 or a if the current bit of b is 0 or 1, respectively. The code will always take the same execution path and will always execute the same number of multiplies. The downside of this method is that performance is significantly slower than the simple method. The simple method will execute an average of 1.5 multiplies per bit in the exponent, where the revised method will always perform 3 multiples per bit.

A method to enhance the performance of RSA computations is to use a fixed sliding window method of computation. With this technique, multiple bits of the exponent are processed at the same time5.

In addition to achieving constant execution times, the fixed sliding window method also achieves the goal of not branching based on the content of secret data, because the fixed sliding window method follows the same code path regardless of the private key value.

Note that this implementation is still vulnerable to side channels based on the pattern of data access, so it should not be used alone. To satisfy the principle of ensuring data access patterns are identical, the code that selects the precomputed apre value must access all values of apre consistently. Each access must have identical size—for example, 4 bytes—the addresses accessed must be 4 byte aligned, and accessed in exactly the same order. This can be accomplished with a scan loop through apre using instructions such as the CMOV family using a test on the w bits of b and the index into apre.

An important consideration when implementing these numeric methods is to ensure that the base multiplication is constant time, has identical execution path for all values, and has identical data access for all values.

Principle: Observation of Lookup Table Access

Many block ciphers make use of lookup tables. For example, the AES algorithm defines a procedure called SubBytes() that performs a byte-for-byte substitution using a table called an S-box. The S-box is a 256-byte constant, which spans four cache lines on most processors. The table is public, but the indexing into the table is private. By watching which one of the four cache lines is accessed in a straightforward implementation of SubBytes(), a spy process can obtain 2 bits of information about a secret round key. If the spy process can determine the 32-bit word within a cache line where the lookup occurs it can obtain 6 bits of information about a secret round key. Although the spy process cannot know the exact values of the S-box accessed, repeating this observation many times for many encryptions may reveal the entire private key.

To ensure memory access patterns are the same independent of the input value, the code must access all table data with the same data width, the exact same address order, and identical data alignment regardless of the table entry to lookup. The required value can be extracted using instructions such as the CMOV family. For performance, we recommend that tables are aligned on a 64-byte boundary—the typical size of a cache line—and table entries are aligned on a 4-byte boundary. If a spy process observes the cache, it observes the same accesses in the same order, preventing anomalies that can disclose secrets through this side channel.

For algorithms like AES where the S-box values are computed deterministically, it is also an option to compute the replacement value for each substitution instead of performing a table lookup.

Conclusion

Developers who wish to protect secret data against timing side channel methods should ensure that their code execution time, data access patterns, and code access patterns are identical independent of secret values. Even developing simple cryptographic libraries that are resistant to timing side channel methods requires adhering to these guidelines as well as having a thorough background in cryptographic principles, algorithms, and implementations. Most developers should use existing tools and libraries that already follow the best practices described here, instead of trying to develop their own routines. Developers can refer to Security Best Practices for Side Channel Resistance for more generic techniques. Additional precautions are required to protect secret data from speculative execution side channel methods. Refer to the Software Security Findings and Mitigations site for specific mitigations for speculative execution side channel methods.

Footnotes

  1. For some constant set of algorithm parameters—for example: AES block encrypt with same length key, RSA signature generation with same size modulus.
  2. A program's code access pattern is the order and address of instructions that it executes.
  3. A program's data access pattern is the order and address of memory operands that it loads and stores.
  4. For some constant set of algorithm parameters—for example: AES block encrypt with same length key, RSA signature generation with same size modulus.
  5. A reference for the basic sliding window algorithm is Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone. The Handbook of Applied Cryptography, CRC Press, 1996. Algorithm 14.109, pg 624.

Was this article helpful?YesNo
0% of users found this helpful

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software, or service activation. Performance varies depending on system configuration. Check with your system manufacturer or retailer or learn more at www.intel.com.

All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest Intel product specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Intel provides these materials as-is, with no express or implied warranties.

No product can be absolutely secure.

Intel, the Intel logo, Intel Core, Intel Atom, Intel Xeon, Intel Xeon Phi, Intel® C Compiler, Intel Software Guard Extensions, and Intel® Trusted Execution Engine are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.