In this article we’ll take a closer look at the AES counter (CTR) mode implementation in the Intel® AES-NI sample library (it can be downloaded from http://software.intel.com/en-us/articles/download-the-intel-aesni-sample-library/).
AES stands for Advanced Encryption Standard, a symmetric encryption standard. More detailed information about AES is available at http://de.wikipedia.org/wiki/Advanced_Encryption_Standard.
AES-NI refers to the Intel® Advanced Encryption Standard (AES) New Instructions, a set of 7 new instructions targeting different phases of AES encryption and decryption. More details can be found at http://software.intel.com/en-us/articles/intel-advanced-encryption-standard-instructions-aes-ni/.
Block ciphers can be used to encrypt/decrypt a stream of data in a number of ways, which are called modes of operation. The details of the counter mode can be looked up in a number of places (e.g. http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation), but at a high level it encrypts successive values of a “counter”, and then XORs the input data with the encrypted counter values.
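The high-level description above can be sketched in a few lines of Python. This is only an illustration: toy_block_encrypt is a hypothetical stand-in built from SHA-256 (it is not AES and not secure), and the increment shown, which adds 1 to the whole block, is just one possible choice.

```python
import hashlib

BLOCK_SIZE = 16  # bytes; AES has a 128-bit block


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for a block cipher (NOT AES, not secure)."""
    return hashlib.sha256(key + block).digest()[:BLOCK_SIZE]


def increment(counter: bytes) -> bytes:
    """Treat the whole block as a big-endian integer and add 1 (one possible choice)."""
    value = (int.from_bytes(counter, "big") + 1) % (1 << (8 * BLOCK_SIZE))
    return value.to_bytes(BLOCK_SIZE, "big")


def ctr_xor(key: bytes, initial_counter: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR the data with the encrypted counter values."""
    out = bytearray()
    counter = initial_counter
    for offset in range(0, len(data), BLOCK_SIZE):
        keystream = toy_block_encrypt(key, counter)
        chunk = data[offset:offset + BLOCK_SIZE]
        # zip() stops at the shorter input, so a partial last block just works.
        out.extend(b ^ k for b, k in zip(chunk, keystream))
        counter = increment(counter)
    return bytes(out)
```

Because the data is only XORed with the keystream, calling ctr_xor a second time with the same key and initial counter recovers the original input, and the input length does not need to be a multiple of the block size.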
The size of the counter is the same as the block size of the cipher, which in the case of AES is 128 bits. The definition of counter mode does not specify how the counter is “incremented”, as long as values do not repeat for a sufficiently large number of increments. Something like a linear feedback shift register (LFSR) could be used as a counter, but in practice the increment function is almost always some form of addition by 1.
The exact definition of the increment function is therefore left to a higher level, and in general each application defines it differently. Typically, the counter consists of a fixed portion (usually the IV or Initialization Vector) and a variable portion. Only the variable portion changes during the increment operation, so if the variable portion is n bits wide, the counter will repeat after 2^n increments.
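A minimal sketch of such an increment, assuming the variable portion occupies the trailing n bits of the counter block and is treated as a big-endian integer (the function name and layout are illustrative, not taken from the library):

```python
def increment_variable_part(counter: bytes, n_bits: int) -> bytes:
    """Add 1 to the trailing n_bits of the counter; the fixed portion is untouched."""
    value = int.from_bytes(counter, "big")
    mask = (1 << n_bits) - 1
    fixed = value & ~mask            # leading bits (e.g. the IV) stay as they are
    variable = (value + 1) & mask    # trailing n bits wrap around modulo 2**n
    return (fixed | variable).to_bytes(len(counter), "big")
```

With n_bits set to 8, for example, the counter returns to its starting value after 256 calls, which is the 2^n repetition described above.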
The implementation of counter mode in the Intel AES-NI sample library uses a Big-Endian 32-bit increment. That is, the last 32 bits of the counter block (its least significant 32 bits when the block is viewed as a big-endian integer) are incremented by 1, and the remaining 96 bits are unchanged. This is the definition required by GCM (Galois/Counter Mode).
In the following code excerpt from the iEnc192_CTR function in the intel_aes_lib/asm/x64/iaesx64.s file, the paddd SIMD instruction is used to implement the 32-bit Big-Endian increment function:
pshufb xmm0, xmm6 ; byte swap counter back
paddd xmm5, [counter_add_one wrt rip] ; 32-bit counter increment
add rdx, 16
pxor xmm0, xmm4
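To see why a byte swap combined with paddd yields a big-endian 32-bit increment, the two instructions can be modelled in Python. This is a sketch under two assumptions that are not shown in the excerpt: that the shuffle mask in xmm6 reverses all 16 bytes, and that counter_add_one holds 1 in its lowest 32-bit lane and 0 in the other three.

```python
import struct

COUNTER_ADD_ONE = (1, 0, 0, 0)  # assumed contents: 1 in the lowest 32-bit lane


def byte_swap(reg: bytes) -> bytes:
    """Model of pshufb with a mask that reverses all 16 bytes (assumed mask)."""
    return reg[::-1]


def paddd(reg: bytes, addend: tuple) -> bytes:
    """Model of paddd: four independent 32-bit additions, each wrapping mod 2**32."""
    lanes = struct.unpack("<4I", reg)
    return struct.pack("<4I", *((a + b) & 0xFFFFFFFF for a, b in zip(lanes, addend)))


def next_counter_block(block: bytes) -> bytes:
    """Swap into register order, add 1 to the low lane, swap back to memory order."""
    swapped = byte_swap(block)
    swapped = paddd(swapped, COUNTER_ADD_ONE)
    return byte_swap(swapped)
```

In this model the last four bytes of the block behave as a big-endian 32-bit integer, and because paddd does not carry between lanes, an overflow wraps to zero without touching the other 96 bits. The model swaps on every call for clarity; the excerpt instead appears to keep the counter in swapped form and swaps a copy back for each block.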
If some other increment function is desired (e.g. a Big-Endian 64-bit increment, or a Little-Endian increment), then there are two options:
- To modify the existing code to implement the different increment function (which will in general give the best performance) or
- To write a new function which implements the desired increment function and uses the AES-NI library functions iEnc128(), iEnc192(), or iEnc256() to encrypt the counter values.
The first option could, for example, be achieved by replacing paddd with paddq in the above code excerpt, which changes the 32-bit Big-Endian increment function into a 64-bit one. To get the correct behavior for any input stream length, paddd must be replaced in the load_and_inc4 macro as well.
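For the second option, the increment function is written separately and the resulting counter blocks are handed to the block encryption routine. The following Python sketch shows two alternative increments and a generator of counter blocks. All names are hypothetical, and the layouts (a 64-bit big-endian counter in the trailing 8 bytes, a 32-bit little-endian counter in the leading 4 bytes) are assumptions chosen for illustration.

```python
def inc_be64(block: bytes) -> bytes:
    """Big-endian 64-bit increment of the trailing 8 bytes (assumed layout)."""
    low = (int.from_bytes(block[8:], "big") + 1) & 0xFFFFFFFFFFFFFFFF
    return block[:8] + low.to_bytes(8, "big")


def inc_le32(block: bytes) -> bytes:
    """Little-endian 32-bit increment of the leading 4 bytes (assumed layout)."""
    low = (int.from_bytes(block[:4], "little") + 1) & 0xFFFFFFFF
    return low.to_bytes(4, "little") + block[4:]


def counter_blocks(initial: bytes, count: int, increment):
    """Yield `count` successive counter blocks using the given increment function."""
    block = initial
    for _ in range(count):
        yield block
        block = increment(block)
```

Each block produced by counter_blocks would then be encrypted with one of the library's block encryption functions and XORed with the corresponding 16 bytes of input. Keeping the increment as a separate function makes it easy to swap, at the cost of the performance that the modified assembly would give.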