Faster random number generation in Intel® Distribution for Python*

By Oleksandr Pavlyk,

Published:06/15/2016   Last Updated:06/15/2016

The update 1 of the Intel® Distribution for Python* 2017 Beta introduces numpy.random_intel, an extension to numpy which closely mirrors the design of numpy.random and uses Intel® MKL's vector statistics library to achieve significant performance boost.

Unlocking the performance benefits is as simple as replacing numpy.random with numpy.random_intel:

In [1]: import numpy as np

In [2]: from numpy import random, random_intel

In [3]: %timeit np.random.rand(10**5)
1000 loops, best of 3: 1.05 ms per loop

In [4]: %timeit np.random_intel.rand(10**5)
10000 loops, best of 3: 146 µs per loop

All the outputs and timing measurements were obtained using Intel® Distribution for Python 3.5.1 2017, Beta update 1 (Jun. 8, 2016) with Numpy 1.11.0 and using Intel® MKL 11.3.3 run on a computer with the following configuration info: Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz (2 sockets, 16 cores, HTT=OFF), 64GB of RAM, Operating system: Ubuntu 14.04 LTS.

The following table illustrates performance improvements for select common distributions.

Total timing of sampling of 100,000 variates repeated 256 times
Distribution timing(random) in secs timing(random_intel) in secs speedup factor
uniform(-1, 1) 0.357 0.034 10.52
normal(0, 1) 0.834 0.081 10.35
gamma(5.2, 1) 1.399 0.267 5.25
beta(0.7, 2.5) 3.677 0.556 6.61
       
randint(0, 100) 0.228 0.053 4.33
poisson(7.6) 2.990 0.052 57.44
hypergeometric(214, 97, 83) 11.353 0.517 21.96

The module numpy.random_intel allows users to take advantage of different basic pseudo-random number generators supported in Intel® MKL, which can be specified using brng argument to RandomState class constructor, or its initialization method seed. The default basic pseudo-random generator is the same as in the numpy.random.

The following code measures performance of sampling of 100,000 standard uniform random variables across different basic pseudo-random number generators relative to the Mersenne twister basic random number generator 'MT19937':

import numpy as np
from timeit import timeit, Timer
from numpy import random_intel
from operator import itemgetter

def timer_brng(brng_name):
    return Timer('numpy.random_intel.uniform(0,1,size=10**5)', 
        setup='import numpy.random_intel; numpy.random_intel.seed(77777, brng="{}")'.format(brng_name))

_brngs = ['WH', 'PHILOX4X32X10', 'MT2203', 'MCG59', 'MCG31', 'MT19937', 'MRG32K3A', 'SFMT19937', 'R250']
tdata = sorted([(brng, timer_brng(brng).timeit(number=1000)) for brng in _brngs], key=itemgetter(1))

def relative_perf(tdata):
	base = dict(tdata).get('MT19937')
	return [(name, t/base) for name, t in tdata]

relative_perf(tdata)

Which produces the output:

In [9]: relative_perf(tdata)
Out[9]:
[('MCG31', 0.607988362606216),
 ('SFMT19937', 0.6520599397085953),
 ('MCG59', 0.6522106463148241),
 ('MT2203', 0.8041550837721791),
 ('MT19937', 1.0),
 ('PHILOX4X32X10', 1.021985386332785),
 ('R250', 1.2991300341128584),
 ('WH', 1.4285221807604567),
 ('MRG32K3A', 2.148446327408357)]

Furthermore, numpy.random_intel exposes leapfrog and skipahead methods for initialization of RandomState class, which allow one to ensure that random streams used in different threads are uncorrelated.

For example:

rs1 = np.random_intel.RandomState(77777, brng='SFMT19937')
rs2 = np.random_intel.RandomState(77777, brng='SFMT19937')
# skip the first stream 2**60 steps ahead
rs1.skipahead(2**60)

s1 = rs1.randint(0,2**31,size=10**5)
s2 = rs2.randint(0,2**31,size=10**5)

We can now use the Pearson's r to test the null hypothesis that the two samples are uncorrelated:

from scipy.stats.stats import pearsonr
pearsonr(s1, s2)

which produces an output:

In[12]: pearsonr(s1, s2)
Out[12]:
(0.0034069712002612325, 0.28131564831676692)

meaning that data do not contradict the hypothesis.

For additional details see documentation of the module, as well we Intel® MKL notes for vector statistics.

As always, your feedback is most appreciated. Speedy computing.

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804