Intel® Summary Statistics Library: why not take advantage of multiple cores?

In my previous posts I described some features and the usage model of Intel® Summary Statistics Library. However, many available statistical packages already provide similar functionality. Does Intel® Summary Statistics Library make a difference and bring something new and specific? The answer is yes.


The new era raises new problems of large dimension. For example, the human genome has about 3 billion DNA base pairs and 20,000-25,000 protein-coding genes. That is a huge amount of information to process. Fortunately, multi-core processors help make processing such data arrays easier.


One of the important estimators in the library is the algorithm for computing the variance-covariance matrix. I decided to find out how fast this algorithm in Intel® Summary Statistics Library is compared with the same feature in other popular libraries. For the comparison I chose the covariance estimation algorithm, written in C, that underlies the R* project (the GNU suite of functions for data processing), version 2.8.0. Performance was measured on a two-way quad-core Intel® Xeon® processor E5440 Series system running at 2.83 GHz with 8 GB RAM and 2x6 MB L2 cache. The total number of available cores is 8, and I used the function omp_set_num_threads() to set the maximum number of threads, and hence cores, used in each measurement. The dimension of the task was 100 and the number of observations was 1,000,000. The dataset was generated from a multivariate Gaussian distribution, and the one-pass method was used to compute the variance-covariance matrix. With all 8 cores available, the algorithm in Intel® Summary Statistics Library is ~16.7 times faster than the algorithm in R*.
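To make the one-pass idea more concrete, here is a minimal sketch in plain Python. It is not the implementation used in Intel® Summary Statistics Library or in R*; it is a textbook-style streaming update of means and co-moments, and the names accumulate, merge, and covariance are made up for this illustration. The point it tries to show is that partial results computed on separate blocks of observations can be combined afterwards, which is what lets each core work on its own block.

```python
from typing import List, Sequence, Tuple

# Accumulator layout (a choice made for this sketch): (count, means, co-moments).
Acc = Tuple[int, List[float], List[List[float]]]


def accumulate(rows: Sequence[Sequence[float]], p: int) -> Acc:
    """Single pass over `rows`, updating means and co-moment sums as we go."""
    n = 0
    mean = [0.0] * p
    comoment = [[0.0] * p for _ in range(p)]
    for x in rows:
        n += 1
        delta = [x[i] - mean[i] for i in range(p)]  # deviation from the old mean
        for i in range(p):
            mean[i] += delta[i] / n
        for i in range(p):
            for j in range(i + 1):  # lower triangle only, the matrix is symmetric
                comoment[i][j] += delta[i] * (x[j] - mean[j])
    return n, mean, comoment


def merge(a: Acc, b: Acc) -> Acc:
    """Combine two partial accumulators, e.g. results from two cores."""
    na, mean_a, ca = a
    nb, mean_b, cb = b
    if na == 0:
        return b
    if nb == 0:
        return a
    n = na + nb
    p = len(mean_a)
    delta = [mean_b[i] - mean_a[i] for i in range(p)]
    mean = [mean_a[i] + delta[i] * nb / n for i in range(p)]
    weight = na * nb / n
    comoment = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            comoment[i][j] = ca[i][j] + cb[i][j] + delta[i] * delta[j] * weight
    return n, mean, comoment


def covariance(acc: Acc) -> List[List[float]]:
    """Unbiased sample covariance matrix from a finished accumulator (needs n >= 2)."""
    n, _, comoment = acc
    p = len(comoment)
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            cov[i][j] = cov[j][i] = comoment[i][j] / (n - 1)
    return cov


# Toy data (not the dataset from the measurements): 4 observations, p = 2.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 9.0]]
chunks = [data[:2], data[2:]]  # each chunk could be handed to a separate core

total: Acc = (0, [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]])
for chunk in chunks:
    total = merge(total, accumulate(chunk, 2))
cov = covariance(total)
```

Here the chunks are processed one after another for simplicity. In a threaded setting each call to accumulate would run on its own core and only the cheap merge step would be sequential.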


 




The chart below gives an additional idea of how well the covariance estimator in Intel® Summary Statistics Library scales with the number of cores. In these measurements the number of observations stays the same, 1,000,000, for all task dimensions p = 20, 40, 60, 80, and 100. In a nutshell, the more cores I have, the faster I get the results.
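Scaling of this kind is usually summarized with two numbers per core count: the speedup relative to a single core, and the parallel efficiency (speedup divided by the number of cores). The small Python helper below is only a sketch of that arithmetic. The function name scaling_table is hypothetical, and the timings in the example are invented placeholders, not measurements from this post.

```python
from typing import Dict, Tuple


def scaling_table(seconds_by_cores: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Map each core count to (speedup vs. 1 core, parallel efficiency)."""
    base = seconds_by_cores[1]  # the single-core run is the reference
    table = {}
    for cores, seconds in sorted(seconds_by_cores.items()):
        speedup = base / seconds
        table[cores] = (speedup, speedup / cores)
    return table


# Placeholder timings in seconds, chosen only to keep the arithmetic easy to follow.
example = scaling_table({1: 8.0, 2: 4.0, 8: 2.0})
```

An efficiency close to 1.0 means the extra cores are almost fully used; a value that drops as cores are added means some part of the work is not scaling.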


 

For more complete information about compiler optimizations, see our Optimization Notice.