In one of my previous posts I described a scheme for detecting outliers in datasets, an important component of the Intel® Summary Statistics Library. We included an optimized version of this algorithm in the recently released Update to the first version of the package. To get an idea of the algorithm's speed, I measured its performance on two Intel CPU based machines: Intel® Xeon® E5440, 2.83 GHz, and Intel® Core™ i7, 2.93 GHz. For these experiments I generated the dataset from a multivariate Gaussian distribution. The dimension p of the Gaussian vector varied from 50 to 1,000, and the number of observations n from 20,000 to 100,000. Outliers were generated in the same way as in my previous post. The two graphs below show the performance of outlier detection in Intel® Summary Statistics Library 1.0 Update. For p=50 the algorithm completes in less than 0.5 seconds, so this case is not shown on the graphs.
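The experimental setup described above can be sketched as follows. This is a minimal illustration of the assumed data generation, not the library's own code: the contamination fraction, the identity covariance, and the shift applied to outliers are all my assumptions, since the post defers the details of outlier generation to the earlier article.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n = 50, 20_000          # smallest dimension and observation count from the post
outlier_fraction = 0.01    # assumed contamination level

# Clean data: n observations of a p-dimensional Gaussian vector
# (zero mean, identity covariance -- an assumption for this sketch).
data = rng.standard_normal((n, p))

# Outliers: shift a random subset of rows far away from the bulk of the
# data (assumed mechanism; the original post generates them as described
# in the previous article).
n_out = int(outlier_fraction * n)
idx = rng.choice(n, size=n_out, replace=False)
data[idx] += 10.0

print(data.shape)
```

Scaling p to 1,000 and n to 100,000 reproduces the largest configuration timed in the benchmarks below.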
If the dimension of the task p equals 1,000 and the number of observations is 100,000, the whole procedure takes less than one minute on the Intel® Core™ i7 based machine, and a little longer on the Intel® Xeon® E5440.
In other words, the Intel® Core™ i7 CPU is up to 2x faster than the Intel® Xeon® E5440 in this specific application. The graph below, which compares the two platforms, normalizes for CPU frequency. Since the Intel® Core™ i7 runs at a higher clock rate, the overall speed-up of the outlier detection algorithm on this platform is even higher.