Intel® Summary Statistics Library: why not take advantage of multiple cores?

By Dmitry Kabaev (Intel) (11 posts) on October 28, 2008 at 5:23 am

In my previous posts I described some features and the usage model of Intel® Summary Statistics Library. However, there are many statistical packages available that provide similar functionality. Does Intel® Summary Statistics Library make a difference and bring something new and specific? The answer is yes.

The new era brings new problems of large dimension. For example, the human genome has at least 3 billion DNA base pairs and 20,000-25,000 protein-coding genes. This is a really huge amount of information to process. Fortunately, multi-core processors come to the rescue and make processing such data arrays easier.

One of the important estimators in the library is the algorithm for computing the variance-covariance matrix. I decided to find out how fast this algorithm in Intel® Summary Statistics Library is compared with the same feature in other popular libraries. For the comparison I chose the covariance estimation algorithm, written in C, that underlies the R* project, a GNU suite of functions for data processing, version 2.8.0. Performance measurements were done on a two-way quad-core Intel® Xeon® processor E5440 system running at 2.83 GHz with 8 GB RAM and 2x6 MB L2 cache. The total number of available cores is 8, and I used the function omp_set_num_threads() to set the maximum number of threads, and hence cores, used in the measurements. The task dimension was 100 and the number of observations was 1,000,000. The dataset was generated from a multivariate Gaussian distribution, and a one-pass method was used to compute the variance-covariance matrix. With all 8 cores available, the algorithm in Intel® Summary Statistics Library is about 16.7 times faster than the algorithm in R*.
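To illustrate why a one-pass covariance computation lends itself to multi-core execution, here is a minimal Python sketch. It is not the algorithm used inside Intel® Summary Statistics Library or R*; it assumes the simplest textbook formulation, in which each block of observations is reduced to a count, column sums, and cross-product sums, and the blocks are then merged by plain addition. The names partial_sums, merge, and covariance are hypothetical helpers introduced only for this example. Note that this raw-sums formula can lose precision when the means are large relative to the spread of the data.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# (number of observations, column sums, lower-triangle cross-product sums)
Partial = Tuple[int, Vector, Matrix]


def partial_sums(rows: List[Vector]) -> Partial:
    """Single pass over one block of observations."""
    p = len(rows[0])
    s = [0.0] * p
    q = [[0.0] * p for _ in range(p)]
    for x in rows:
        for i in range(p):
            s[i] += x[i]
            for j in range(i + 1):  # lower triangle only; mirrored later
                q[i][j] += x[i] * x[j]
    return len(rows), s, q


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two blocks; each block could come from a different core."""
    na, sa, qa = a
    nb, sb, qb = b
    p = len(sa)
    s = [sa[i] + sb[i] for i in range(p)]
    q = [[qa[i][j] + qb[i][j] for j in range(p)] for i in range(p)]
    return na + nb, s, q


def covariance(part: Partial) -> Matrix:
    """Unbiased sample covariance matrix from the accumulated sums."""
    n, s, q = part
    p = len(s)
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            c = (q[i][j] - s[i] * s[j] / n) / (n - 1)
            cov[i][j] = cov[j][i] = c
    return cov


# Tiny illustrative dataset: 4 observations, dimension 2.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 7.0]]
whole = covariance(partial_sums(data))
split = covariance(merge(partial_sums(data[:2]), partial_sums(data[2:])))
```

Here whole and split agree, which is the property that lets independent workers process separate blocks of observations and combine their results at the end.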

 

The chart below gives an additional idea of how well the covariance estimator in Intel® Summary Statistics Library scales as cores are added. In these performance measurements the number of observations remains the same, 1,000,000, for all task dimensions p=20, 40, 60, 80, and 100. In a nutshell, the more cores I have, the faster I get the results.

 

Categories: Financial Services Industry, Parallel Prog. & Multi-Core, Software Engineering, What If Software

Comments (4)

November 11, 2008 12:27 AM PST


Sergey Maidanov
Nice demonstration of how variance-covariance computation scales on an Intel multi-core system. Does the Intel(R) Summary Statistics Library scale well on non-Intel processors? Also, what if I want to estimate the mean and variance of a large multidimensional dataset? Would the library scale as well?
November 11, 2008 7:43 AM PST

Ilya Burylov (Intel)
Sergey, Intel(R) Summary Statistics Library was designed to be scalable on all supported platforms, though the scaling numbers depend on the actual hardware configuration. We use threading opportunities in the computation of algebraic and central moments as well. The algorithms in the library for computing mean and variance scale slightly worse than the algorithm discussed above. However, if we compute the full set of algebraic and central moments (up to 4th order), skewness, kurtosis, and variation coefficient, scalability is about 6.98 (the performance experiment was done on a two-way quad-core Intel® Xeon® processor E5440, 8 cores in total; the task dimension is 64 and the number of observations is 1,000,000).
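As a rough illustration of the quantities named in this reply, the following Python sketch computes them for a single variable in one pass. It is an assumption-laden example, not the library's implementation: it uses population (divide by n) central moments, non-excess kurtosis, and the variation coefficient defined as standard deviation over mean, and the library's own definitions may differ. The function name summary_moments is hypothetical.

```python
import math
from typing import Dict, List


def summary_moments(xs: List[float]) -> Dict[str, float]:
    """One pass over the data, then closed-form conversions."""
    n = len(xs)
    s1 = s2 = s3 = s4 = 0.0
    for x in xs:
        x2 = x * x
        s1 += x
        s2 += x2
        s3 += x2 * x
        s4 += x2 * x2
    # Algebraic (raw) moments of order 1..4.
    m1, m2, m3, m4 = s1 / n, s2 / n, s3 / n, s4 / n
    # Central moments expressed through the raw moments.
    c2 = m2 - m1 ** 2
    c3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    c4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return {
        "mean": m1,
        "variance": c2,              # population variance (divides by n)
        "skewness": c3 / c2 ** 1.5,
        "kurtosis": c4 / c2 ** 2,    # not excess kurtosis
        "variation": math.sqrt(c2) / m1,
    }


stats = summary_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the pass only accumulates four running sums per variable, blocks of observations can be summed independently and added together, which is what makes this kind of computation a natural fit for threading. If the figure of 6.98 is read as the speedup on 8 cores relative to one core, it corresponds to a parallel efficiency of roughly 87%.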
November 12, 2008 7:08 AM PST


Sergey Maidanov
Ilya, thank you. It answers my questions. More performance charts are welcome.
February 5, 2009 2:34 AM PST


idrees
this is my com i won't to install the grapics
