Faster random number generation in Intel® Distribution for Python*

The update 1 of the Intel® Distribution for Python* 2017 Beta introduces numpy.random_intel, an extension to numpy which closely mirrors the design of numpy.random and uses Intel® MKL's vector statistics library to achieve significant performance boost.

Unlocking the performance benefits is as simple as replacing numpy.random with numpy.random_intel:

In [1]: import numpy as np

In [2]: from numpy import random, random_intel

In [3]: %timeit np.random.rand(10**5)
1000 loops, best of 3: 1.05 ms per loop

In [4]: %timeit np.random_intel.rand(10**5)
10000 loops, best of 3: 146 µs per loop

All the outputs and timing measurements were obtained using Intel® Distribution for Python 3.5.1 2017, Beta update 1 (Jun. 8, 2016) with Numpy 1.11.0 and using Intel® MKL 11.3.3 run on a computer with the following configuration info: Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores, HTT=OFF), 64GB of RAM, Operating system: Ubuntu 14.04 LTS.

The following table illustrates performance improvements for select common distributions.

Total timing of sampling of 100,000 variates repeated 256 times
Distributiontiming(random) in secstiming(random_intel) in secsspeedup factor
uniform(-1, 1)0.3570.03410.52
normal(0, 1)0.8340.08110.35
gamma(5.2, 1)1.3990.2675.25
beta(0.7, 2.5)3.6770.5566.61
    
randint(0, 100)0.2280.0534.33
poisson(7.6)2.9900.05257.44
hypergeometric(214, 97, 83)11.3530.51721.96

The module numpy.random_intel allows users to take advantage of different basic pseudo-random number generators supported in Intel® MKL, which can be specified using brng argument to RandomState class constructor, or its initialization method seed. The default basic pseudo-random generator is the same as in the numpy.random.

The following code measures performance of sampling of 100,000 standard uniform random variables across different basic pseudo-random number generators relative to the Mersenne twister basic random number generator 'MT19937':

import numpy as np
from timeit import timeit, Timer
from numpy import random_intel
from operator import itemgetter

def timer_brng(brng_name):
    return Timer('numpy.random_intel.uniform(0,1,size=10**5)', 
        setup='import numpy.random_intel; numpy.random_intel.seed(77777, brng="{}")'.format(brng_name))

_brngs = ['WH', 'PHILOX4X32X10', 'MT2203', 'MCG59', 'MCG31', 'MT19937', 'MRG32K3A', 'SFMT19937', 'R250']
tdata = sorted([(brng, timer_brng(brng).timeit(number=1000)) for brng in _brngs], key=itemgetter(1))

def relative_perf(tdata):
	base = dict(tdata).get('MT19937')
	return [(name, t/base) for name, t in tdata]

relative_perf(tdata)

Which produces the output:

In [9]: relative_perf(tdata)
Out[9]:
[('MCG31', 0.607988362606216),
 ('SFMT19937', 0.6520599397085953),
 ('MCG59', 0.6522106463148241),
 ('MT2203', 0.8041550837721791),
 ('MT19937', 1.0),
 ('PHILOX4X32X10', 1.021985386332785),
 ('R250', 1.2991300341128584),
 ('WH', 1.4285221807604567),
 ('MRG32K3A', 2.148446327408357)]

Furthermore, numpy.random_intel exposes leapfrog and skipahead methods for initialization of RandomState class, which allow one to ensure that random streams used in different threads are uncorrelated.

For example:

rs1 = np.random_intel.RandomState(77777, brng='SFMT19937')
rs2 = np.random_intel.RandomState(77777, brng='SFMT19937')
# skip the first stream 2**60 steps ahead
rs1.skipahead(2**60)

s1 = rs1.randint(0,2**31,size=10**5)
s2 = rs2.randint(0,2**31,size=10**5)

We can now use the Pearson's r to test the null hypothesis that the two samples are uncorrelated:

from scipy.stats.stats import pearsonr
pearsonr(s1, s2)

which produces an output:

In[12]: pearsonr(s1, s2)
Out[12]:
(0.0034069712002612325, 0.28131564831676692)

meaning that data do not contradict the hypothesis.

For additional details see documentation of the module, as well we Intel® MKL notes for vector statistics.

As always, your feedback is most appreciated. Speedy computing.

For more complete information about compiler optimizations, see our Optimization Notice.