Optimized Pseudo Random Number Generators with AVX2

Intel® Math Kernel Library includes powerful and versatile random number generators that have been optimized to take full advantage of Intel® Advanced Vector Extensions 2 (aka Intel® AVX2) introduced with the Haswell CPUs.

In this post, I’ll explain how to use a random number generator that benefits from Intel® AVX2 and how easy it is for developers to use it in C++11 without having to learn specialized instructions but still taking full advantage of the new instructions introduced in Haswell. I’ll provide an example with Intel® Parallel Studio XE 2013 SP1 and Visual Studio 2013.

Both Big Data and Internet of Things (aka IoT) are increasing the total amount of data we need to process. The additional instructions are extremely powerful: they process multiple data elements with a single instruction, and they perform with a single instruction operations that previously required dozens. Thus, these instructions are very useful to optimize code that has to run as fast as possible in projects related to Big Data and IoT. You can boost your code performance when you take advantage of the latest additions to the Intel® CPUs.

Since 1996, I’ve been explaining how the instruction set improvements introduced in successive Intel® CPUs help improve code performance in different application domains, in step with how IT trends have been evolving. So, as you might guess, I’m a big fan of using new instruction sets to boost performance. Intel® AVX2 instructions follow the same programming model introduced by their predecessor: Intel® AVX instructions.

The generation of pseudorandom numbers is a very common requirement in number crunching applications. The good news is that you don’t need to learn the details about the new instruction set to write code that generates random numbers taking full advantage of Intel® AVX2 instructions in C++. In fact, you don’t need to write your own optimized algorithm. You can take advantage of the MRG32k3a pseudorandom number generator included in Intel® Math Kernel Library, a component of Intel® Parallel Studio XE 2013. MRG32k3a is a combined multiple recursive pseudorandom generator with two components of order 3 that is highly optimized for Haswell CPUs and uses Intel® AVX2 instructions.
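To make "combined multiple recursive generator with two components of order 3" more concrete, here is a minimal scalar sketch of the MRG32k3a recurrence as published by L'Ecuyer. It is only an illustration of the algorithm: it is not the Intel® MKL implementation, it uses no Intel® AVX2 instructions, and the Mrg32k3a struct name and the way the single seed fills all six state words are my own assumptions, so its output will not necessarily match an MKL stream created with the same seed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative scalar MRG32k3a (hypothetical type, not part of Intel MKL).
struct Mrg32k3a {
    static constexpr std::int64_t m1 = 4294967087LL;  // modulus of component 1
    static constexpr std::int64_t m2 = 4294944443LL;  // modulus of component 2

    // Each component keeps its last three values, oldest first (order 3).
    std::array<std::int64_t, 3> s1;
    std::array<std::int64_t, 3> s2;

    // Assumption: one seed fills all six state words; it must be in (0, m2).
    explicit Mrg32k3a(std::int64_t seed = 12345) {
        assert(seed > 0 && seed < m2);
        s1.fill(seed);
        s2.fill(seed);
    }

    // Returns the next uniform value in the open interval (0, 1).
    double next() {
        // Component 1: x[n] = (1403580 * x[n-2] - 810728 * x[n-3]) mod m1
        std::int64_t p1 = (1403580LL * s1[1] - 810728LL * s1[0]) % m1;
        if (p1 < 0) p1 += m1;
        s1 = {s1[1], s1[2], p1};

        // Component 2: y[n] = (527612 * y[n-1] - 1370589 * y[n-3]) mod m2
        std::int64_t p2 = (527612LL * s2[2] - 1370589LL * s2[0]) % m2;
        if (p2 < 0) p2 += m2;
        s2 = {s2[1], s2[2], p2};

        // Combine both components and scale by 1 / (m1 + 1).
        std::int64_t z = p1 - p2;
        if (z <= 0) z += m1;
        return static_cast<double>(z) / static_cast<double>(m1 + 1);
    }
};
```

You would create one Mrg32k3a object and call next() in a loop to get one value at a time. The point of using the Intel® MKL generator instead is that it produces whole buffers of numbers per call with vectorized code.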

With a few lines of code, you can take advantage of the most modern SIMD instructions introduced in Intel® CPUs. Because Intel® Parallel Studio XE and Intel® Math Kernel Library have very frequent updates, you can rest assured the algorithms are going to be improved to take advantage of future micro-architecture features and instruction sets. Thus, you can focus on using the generated pseudo random numbers in your application domain. You can think of the highly optimized pseudo random generator as your silver bullet.

Before moving to the code, let me dive a bit deeper on Intel® Math Kernel Library (aka Intel® MKL). Intel® Vector Statistical Library (aka VSL) is a component within MKL that provides optimized routines that implement pseudo-random and quasi-random number generators with continuous and discrete distributions. Thus, the code will use the MRG32k3a pseudo random generator included in VSL. You can read more information about the MRG32k3a pseudo random generator here.

Notice that Intel® Vector Statistical Library provides a wide range of Basic Random Number Generators (aka BRNG). You can use them to obtain random numbers of various statistical distributions and you should choose the appropriate Basic Random Number Generator based on your application requirements. In this case, I’m using the MRG32k3a pseudo random generator because it includes specific optimizations that take advantage of Intel® AVX2. However, depending on your application requirements, other Basic Random Number Generators might be more suitable.

The following steps allow you to create a project that uses MKL and compiles with Intel® C++ Compiler in Visual Studio 2013. The great integration that Intel® Parallel Studio XE 2013 has with Visual Studio 2013 makes it really easy to start working with Intel® MKL with just a few clicks.

1. Use the Launch Intel® Parallel Studio XE 2013 with VS 2013 shortcut to launch the IDE.

2. Create a Windows console application.

3. Select Project | Intel® Composer XE 2013 SP1 | Use Intel® C++ Compiler.

4. Now, right click on the project name in Solution Explorer and select Properties.

5. Select Configuration Properties | Intel® Performance Libraries. Click on the dropdown at the right-hand side of Use Intel® MKL, under Intel® Math Kernel Library. Select the desired working mode based on your needs. In my case, I’ve selected Parallel to use parallel Intel® MKL libraries. See the following figure.

Selecting the desired working mode for Intel® Math Kernel Library in Visual Studio 2013.

I will use a very useful include file, errcheck.inc, that is part of the Intel® Math Kernel Library samples. This file defines the CheckVslError function that receives the int status code returned by any Intel® MKL function call and displays a message explaining the problem with that call when something went wrong. In order to access this file, you have to decompress the examples_core.zip file located in the mkl\examples folder within the Intel® Composer XE 2013 installation folder. So, for example, if you are working with a 64-bit Windows version, the default installation folder for Intel® Composer XE 2013 will be C:\Program Files (x86)\Intel\Composer XE 2013 SP1, and the full path for examples_core.zip will be C:\Program Files (x86)\Intel\Composer XE 2013 SP1\mkl\examples. It is usually a good idea to copy this zip file to another folder and decompress it there. Once you decompress the file, you will find the errcheck.inc file within the vslc\source folder. For example, if you decompressed examples_core.zip in C:\mkl_samples, you will find errcheck.inc in C:\mkl_samples\vslc\source. I know this takes a few steps, but believe me, errcheck.inc is very useful when you work with Intel® MKL.

The following lines show C++11 code that generates 1,000 pseudo random numbers by using the MRG32k3a pseudo random generator with the BOXMULLER2 method. This method generates normally distributed random numbers. You can read more information about the different methods and their related formulas here. The Intel® MKL functions are C-style calls, but as I cannot stop using C++11 features, I’ve made the C-style calls in a C++ Windows console application that uses some C++11 features to display all the generated pseudo random numbers.

#include <iostream>
#include <stdio.h>

#include "mkl.h"
#include "mkl_vsl.h"
// Replace with your own path to errcheck.inc
#include "C:\mkl_samples\vslc\source\errcheck.inc"

#define SEED  7777777
#define RANDOM_NUMBERS     1000

using namespace std;

int main()
{
       // Buffer for RANDOM_NUMBERS pseudo random numbers
       float pseudorandom[RANDOM_NUMBERS];

       VSLStreamStatePtr stream;

       // Create the stream and initialize it with the 32-bit integer seed
       auto status = vslNewStream(&stream, VSL_BRNG_MRG32K3A, SEED);
       CheckVslError(status);

       // Mean value
       float mean = 0.0f;

       // Standard deviation
       float sigma = 1.0f;

       // Generate normally distributed random numbers
       status = vsRngGaussian(VSL_METHOD_SGAUSSIAN_BOXMULLER2,
              stream, RANDOM_NUMBERS, pseudorandom, mean, sigma);
       CheckVslError(status);

       // Delete the stream
       status = vslDeleteStream(&stream);
       CheckVslError(status);

       // Display all the generated numbers with a C++11 range-based for loop
       cout << "Pseudo random numbers:\n";
       for (auto n : pseudorandom) {
              cout << n << '\n';
       }

       return 0;
}

If your CPU doesn’t support the instructions required by the selected generator, the status will be equal to VSL_ERROR_CPU_NOT_SUPPORTED and CheckVslError will display an appropriate message. The code is very easy to understand and the generator is taking advantage of Intel® AVX2.

First, the code declares a buffer to hold the number of float pseudo random numbers defined in RANDOM_NUMBERS: 1,000. Then, a call to the vslNewStream function generates the stream and initializes it specifying the generator, VSL_BRNG_MRG32K3A, and the seed defined in SEED. Notice that each Intel® MKL function call is followed by a call to CheckVslError with the status returned by the Intel® MKL function call as an argument.

Then, the call to vsRngGaussian generates normally distributed random numbers with the BOXMULLER2 method (VSL_METHOD_SGAUSSIAN_BOXMULLER2). The mean value is 0 and the standard deviation (sigma) is 1. The random numbers will be stored in the previously declared pseudorandom buffer. Finally, the code deletes the stream and displays all the generated pseudo random numbers.
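If you are curious about what a Box-Muller style method does with the uniform numbers that come out of the basic generator, the following sketch shows the textbook transform in portable C++11. It turns each pair of uniform values into a pair of normally distributed values and then applies the mean and sigma. Everything here is an assumption made for illustration: box_muller_normal is a hypothetical helper name, std::mt19937 stands in for MRG32k3a, and nothing in it reflects how Intel® MKL implements or vectorizes VSL_METHOD_SGAUSSIAN_BOXMULLER2 internally.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Illustrative helper (hypothetical name, not an Intel MKL function).
// Fills a vector with `count` normally distributed floats.
std::vector<float> box_muller_normal(std::size_t count, float mean,
                                     float sigma, unsigned seed)
{
    assert(sigma > 0.0f);
    const double two_pi = 6.283185307179586;

    // Stand-in uniform source; the article's code uses MRG32k3a instead.
    std::mt19937 engine(seed);
    std::uniform_real_distribution<double> uniform(0.0, 1.0);  // [0, 1)

    std::vector<float> out;
    out.reserve(count);
    while (out.size() < count) {
        // Shift u1 to (0, 1] so that log(u1) is always finite.
        const double u1 = 1.0 - uniform(engine);
        const double u2 = uniform(engine);
        const double radius = std::sqrt(-2.0 * std::log(u1));

        // One pair of uniforms yields two independent normal values.
        out.push_back(static_cast<float>(
            mean + sigma * radius * std::sin(two_pi * u2)));
        if (out.size() < count) {
            out.push_back(static_cast<float>(
                mean + sigma * radius * std::cos(two_pi * u2)));
        }
    }
    return out;
}
```

Calling box_muller_normal(1000, 0.0f, 1.0f, 7777777u) mirrors the parameters used in the MKL example, although the individual numbers will differ because the underlying uniform generator is different.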

As you can learn from this small example, it is extremely easy to work with Intel® Math Kernel Library in Visual Studio 2013 thanks to the great integration that Intel® Parallel Studio XE 2013 provides with this IDE. With just a few lines of code, you can start taking full advantage of the Intel® AVX2 instructions introduced in Haswell CPUs in your C++ applications.

Intel® Math Kernel Library is a commercial product, but you can download a free 30-day evaluation version here.


For more complete information about compiler optimizations, see our Optimization Notice.