The digital random number generator (DRNG) behind Intel® Data Protection Technology with Secure Key provides high-quality random numbers that are accessible via the CPU instruction RDRAND. This easy-to-use feature is of great benefit to virtualized environments, where limited system entropy must be divided among a large number of virtual machines. Secure Key’s extremely high data rates (measured in the hundreds of MB/sec), combined with its accessibility via a single CPU instruction, ensure that it can supply sufficient entropy to all of the virtual machines on a single system, even under heavy load, without starving any of them.
In a virtual environment without the benefit of Intel® Secure Key, the operating system must rely on hardware interrupts from system activity as a source of entropy. While this can be an acceptable solution for a single client system, this method does not scale well to virtual hosts for several reasons:
The end result is that virtual machines are dividing up a very limited entropy source and assuming that there is more entropy in their pools than is actually available. Secure Key solves this problem by providing a reliable source of entropy with extremely high throughput that can be distributed to individual processes. Each RDRAND instruction results in a random number delivered only to the thread on the virtual machine that requested it, allowing each machine to have its own discrete source of entropy.
Information about Intel® Secure Key and the DRNG can be found in the Software Implementation Guide.
To test Secure Key’s ability to meet the entropy demands of a large virtual environment, we built a test configuration designed to maximize the entropy demands of each virtual host. The hypervisor software, VMware* ESXi 5.1, was installed on a system with two pre-production Intel® Xeon® E5-2650 v2 processors and 64 GB of RAM. This hardware configuration provides 24 physical cores and 48 hardware threads.
Within ESXi we created a total of sixty virtual machines, all clones of a single OS image: Ubuntu* 12.04.2 LTS 64-bit, with one virtual processor each. Note that this setup oversubscribes the hardware.
The Ubuntu guests all ran the latest build of the rngd daemon from the rng-tools package. This build was obtained from the source repository on GitHub* and includes support for Secure Key. The purpose of rngd is to monitor the kernel’s entropy pool and fill it as needed from external hardware sources of random bytes.
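The monitoring half of what rngd does can be illustrated with a minimal sketch. It is not rngd's actual implementation: it only reads the kernel's entropy estimate from the standard Linux procfs file and applies a hypothetical refill policy. The function names and the 2048-bit watermark are assumptions made for illustration.

```python
from pathlib import Path

# Standard Linux procfs file holding the kernel's entropy estimate, in bits.
ENTROPY_AVAIL = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path: str = ENTROPY_AVAIL) -> int:
    """Return the kernel's current entropy estimate in bits."""
    return int(Path(path).read_text().strip())


def needs_refill(available_bits: int, watermark_bits: int = 2048) -> bool:
    """Hypothetical policy: ask for a refill when the pool is below the watermark."""
    return available_bits < watermark_bits


if __name__ == "__main__":
    # Only query the kernel when the procfs file is present (Linux guests).
    if Path(ENTROPY_AVAIL).exists():
        bits = read_entropy_avail()
        print(f"entropy_avail={bits} bits, refill needed: {needs_refill(bits)}")
```

A daemon in the style of rngd would run a check like this in a loop and, when a refill is needed, feed bytes from its hardware source into the kernel pool.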
The Secure Key-enabled rngd uses the DRNG as an input source. The DRNG guarantees a reseed of its hardware-based pseudorandom number generator after producing 512 128-bit samples, and thus can produce seed-grade entropy that is acceptable to the Linux kernel by employing AES mixing to combine intermediate samples per the DRNG Software Implementation Guide.
To place a maximum load on the kernel’s entropy pools, the rngtest utility from the rng-tools package was run using /dev/random as an input source. Per the man page, rngtest uses the FIPS 140-2 tests to verify the randomness of its input data and also produces statistics about the speed of the input stream. Used in this manner, rngtest consumes entropy from /dev/random faster than it can be supplied by rngd so that any bottlenecks in the system occur in the source.
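The speed statistic that rngtest reports can be approximated with a short sketch that drains a byte source and converts the result to Kibit/sec. This is not rngtest itself and performs none of the FIPS 140-2 checks. The function names are hypothetical, and the 2500-byte chunk size is an assumption chosen to correspond to a 20,000-bit block.

```python
import time
from typing import BinaryIO, Tuple


def kibits_per_sec(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and a duration into Kibit/sec (1 Kibit = 1024 bits)."""
    return (num_bytes * 8 / 1024) / seconds


def measure_source(stream: BinaryIO, total_bytes: int, chunk: int = 2500) -> Tuple[int, float]:
    """Read up to total_bytes from stream; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    got = 0
    while got < total_bytes:
        data = stream.read(min(chunk, total_bytes - got))
        if not data:  # source exhausted
            break
        got += len(data)
    return got, time.perf_counter() - start


def demo(path: str, total_bytes: int = 250_000) -> float:
    """Measure the input channel speed of a device such as /dev/random."""
    with open(path, "rb", buffering=0) as stream:
        got, elapsed = measure_source(stream, total_bytes)
    # Guard against a zero-length interval on very fast sources.
    return kibits_per_sec(got, max(elapsed, 1e-9))
```

Calling `demo("/dev/random")` inside a guest would give a figure comparable in units to the input channel speed discussed here, though the absolute numbers depend on the kernel version and on whether rngd is running.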
The test methodology was as follows:
This procedure resulted in an increasing entropy demand on Secure Key. The more VMs active, the more random bytes the DRNG needed to deliver to the various rngd instances.
The performance limits of Secure Key gave us a rough idea of what to expect. On the E5-2650 v2 processor, the bus connecting the CPU cores to the DRNG limits the total number of RDRAND transactions across all hardware threads on the CPU to about 47.5 million RDRANDs/second. The round-trip latency of an RDRAND transaction limits each individual hardware thread to about 9 million RDRANDs/second. On a 64-bit OS, an RDRAND transaction can return up to 64 bits, so these rates correspond to throughput limits of about 380 MB/sec per CPU and about 73 MB/sec per hardware thread.
RDRAND throughput scales linearly with the number of threads until the total throughput limit is reached (in this case, 380 MB/sec). However, we have two CPUs in the test system, so that doubles the maximum throughput to 760 MB/sec. Hence, we expect the DRNG to maintain a supply rate of 73 MB/sec to each VM until we have more than 10 active VMs.
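The arithmetic behind these limits can be checked with a few lines of Python. This is a sketch under two assumptions: 1 MB is taken as 10^6 bytes, and the constant and function names are illustrative. Note that exactly 9 million RDRANDs/second works out to 72 MB/sec. The 73 MB/sec quoted in the text presumably reflects an unrounded per-thread rate.

```python
BYTES_PER_RDRAND = 8          # a 64-bit RDRAND returns 8 bytes
BUS_LIMIT_RDRANDS = 47.5e6    # per CPU, all hardware threads combined (from the text)
THREAD_LIMIT_RDRANDS = 9e6    # per hardware thread, approximate (from the text)
CPUS = 2                      # sockets in the test system


def mb_per_sec(rdrands_per_sec: float) -> float:
    """Convert an RDRAND rate to MB/sec, assuming 1 MB = 10**6 bytes."""
    return rdrands_per_sec * BYTES_PER_RDRAND / 1e6


def full_rate_vm_count(total_mb: float, per_thread_mb: float) -> int:
    """Largest number of VMs that can each be served at the per-thread limit."""
    return int(total_mb // per_thread_mb)


if __name__ == "__main__":
    per_cpu = mb_per_sec(BUS_LIMIT_RDRANDS)
    system_total = CPUS * per_cpu
    print(f"per-CPU ceiling:    {per_cpu:.0f} MB/sec")
    print(f"system ceiling:     {system_total:.0f} MB/sec")
    print(f"per-thread ceiling: {mb_per_sec(THREAD_LIMIT_RDRANDS):.0f} MB/sec")
    print(f"VMs at full rate:   {full_rate_vm_count(system_total, 73.0)}")
```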
When our test is running in 11 VMs, the throughput ceiling is reached, and the fixed, total entropy supply of 760 MB/sec will get divided up amongst the VMs. As more VMs are added, it should be divided even further, with each VM getting a smaller and smaller share, averaging out to 760/n MB/sec where n is the number of virtual machines. There may, however, be some jitter in the results due to congestion on the bus.
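The expected per-VM supply described above amounts to taking the smaller of the per-thread limit and an equal share of the system ceiling. A minimal sketch, using the 73 MB/sec and 760 MB/sec figures from the text and a hypothetical function name:

```python
def expected_rate_mb(n_vms: int, per_thread_mb: float = 73.0, total_mb: float = 760.0) -> float:
    """Idealized RDRAND supply per VM in MB/sec, ignoring bus congestion and jitter."""
    if n_vms < 1:
        raise ValueError("n_vms must be at least 1")
    # Each VM is capped by its own thread until the shared ceiling is divided too thinly.
    return min(per_thread_mb, total_mb / n_vms)


if __name__ == "__main__":
    for n in (1, 10, 11, 24, 48, 60):
        print(f"{n:2d} VMs: {expected_rate_mb(n):6.2f} MB/sec per VM")
```

The model is deliberately simple: it treats the ceiling as perfectly shared, so it gives an upper bound rather than a prediction of measured rates.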
The next transition should occur at 25 VMs, where the number of active guests exceeds the number of physical cores in the test system. Here, we expect to see even more jitter in the results as the CPU relies on Hyper-Threading Technology to manage the additional software threads. Though DRNG performance scales with Hyper-Threading, the guest OS (and rngd) is doing more than just requesting random numbers. The average entropy rate per VM will continue to trail off, but there should be some variation in each VM’s individual supply.
The last transition is at 49 VMs. Here, the number of guest machines exceeds the physical resources of the CPU. As the threads stack up, the RDRAND requests just get serialized so each VM should see a roughly equal share of entropy, but some threads may get more than others. We expect to see the average entropy rate per VM trail off as we keep adding machines, but with some bumps in each VM’s individual supply rate.
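The three transitions can be summarized as a small classifier. The thresholds (10 VMs at full rate, 24 physical cores, 48 hardware threads) come from the text. The function name and regime labels are hypothetical shorthand, not terms used by the test tools.

```python
def regime(n_vms: int, full_rate_vms: int = 10, physical_cores: int = 24,
           hardware_threads: int = 48) -> str:
    """Name the expected bottleneck for a given number of active VMs."""
    if n_vms < 1:
        raise ValueError("n_vms must be at least 1")
    if n_vms <= full_rate_vms:
        return "thread-limited"   # each VM runs at the per-thread RDRAND limit
    if n_vms <= physical_cores:
        return "bus-limited"      # the fixed total supply is divided among VMs
    if n_vms <= hardware_threads:
        return "hyper-threaded"   # more guests than physical cores
    return "oversubscribed"       # more guests than hardware threads


if __name__ == "__main__":
    for n in (1, 10, 11, 24, 25, 48, 49, 60):
        print(n, regime(n))
```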
Rngtest reports the input channel speed, in our case the bit rate coming from /dev/random, in Kibits/sec, and rngd is performing a data reduction of 512:1 when generating seed-grade entropy from RDRAND. Converting MB/sec to Kibits/sec and dividing by 512 results in the following expectations from rngtest:
|VM Count|Average Input Channel Speed (Kibit/sec)|
Table 1. Expected input channel speeds
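The conversion described above can be sketched as follows. The code assumes that 1 MB is 10^6 bytes and 1 Kibit is 1024 bits, and it uses the 73 MB/sec and 760 MB/sec limits and the 512:1 reduction from the text. The function names are illustrative, and the values it prints are computed from this model rather than copied from the original table.

```python
def rngtest_kibits(mb_per_sec: float, reduction: int = 512) -> float:
    """Convert a raw RDRAND supply in MB/sec to seed-grade Kibit/sec after reduction."""
    bits_per_sec = mb_per_sec * 1e6 * 8   # assumes 1 MB = 10**6 bytes
    return bits_per_sec / 1024 / reduction


def expected_kibits(n_vms: int, per_thread_mb: float = 73.0, total_mb: float = 760.0) -> float:
    """Expected rngtest input channel speed per VM for n_vms active guests."""
    if n_vms < 1:
        raise ValueError("n_vms must be at least 1")
    return rngtest_kibits(min(per_thread_mb, total_mb / n_vms))


if __name__ == "__main__":
    print("VM count  Kibit/sec per VM")
    for n in (1, 10, 11, 24, 25, 48, 49, 60):
        print(f"{n:8d}  {expected_kibits(n):10.1f}")
```

Under these assumptions the model gives roughly 1,114 Kibit/sec per VM while each guest is thread-limited, falling to about 193 Kibit/sec per VM at 60 guests.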
Again, the guest OS is doing more than just requesting random numbers from the DRNG, so we should expect to see slightly lower performance figures, but these values serve as a useful theoretical limit.
The theoretical and actual performance figures are shown in Figure 1.
Figure 1. Actual vs. Expected Entropy Rates per VM
With only a few exceptions, the measured bit rate per VM closely matched expectations. At one VM, five VMs, and nine VMs, there is a curious, unexplained drop in the average bit rate. It is interesting that these anomalies occur at the start of a group of four, but the underlying architectural cause is unknown.
Above nine simultaneous VMs, the bit rate drops more quickly than expected, probably because of saturation on the bus. Still, the bit rates stay within about 10% of expectations. Above 24 VMs, the difference between expected and actual throughput is barely noticeable.
When the VM count exceeds the number of physical cores, the per-VM throughput varies significantly for each guest, as shown in Figure 2. At this point, the hypervisor is relying on Hyper-Threading to handle the additional workload. When the VM count exceeds the number of physical threads, the hypervisor is oversubscribed and thread scheduling becomes the dominant performance driver. Despite these extreme demands on the system, entropy is still available to every guest OS. At our test limit of 60 virtual machines, each VM was seeing an entropy supply rate of about 200 Kbits/sec (roughly 25 KB/sec).
Through our tests we were able to validate that Intel® Secure Key has sufficient throughput to supply entropy to a large number of VMs, and at very high bit rates. Even when the number of active virtual machines on the system exceeds the cores and physical threads, there is still entropy available at bitrates measured in KB/sec.
In a production data center we would not expect to see such continuous taxing of the DRNG from multiple concurrent VMs, much less on an oversubscribed system. Although this is clearly an artificial test, it does show that Intel® Secure Key is capable of serving entropy to a large number of virtual machines even under the most extreme conditions.
Figure 2. Entropy rates per VM