Introduction to Intel® Data Analytics Acceleration Library in Intel® System Studio: What It Has to Offer

By Jonghak Kim, Published: 09/13/2017, Last Updated: 09/12/2017


Introduction to Intel® Data Analytics Acceleration Library (Intel® DAAL)

 For most data analytics tasks across industries and domains, computational speed is a key ingredient for success. The Intel® Data Analytics Acceleration Library (Intel® DAAL) helps software developers reduce the time it takes to develop high-performance analytics applications. For example, Intel® DAAL covers basic statistics for data sets, matrix transformations, statistical dependence and similarity between variables, feature suggestions, finding co-occurrence patterns, and detecting anomalies.

 Intel® DAAL is a set of libraries to boost machine learning and big data analytics performance. It optimizes the data ingestion and algorithmic compute together for the highest performance. 

 Intel® DAAL enables applications to make better predictions faster and analyze larger data sets with available compute resources. The library also takes advantage of next-generation processors even before they're available. Just link to the newest version and your code is ready for when those new chips hit the market.

Intel® DAAL is available as part of Intel® System Studio, starting with the 2018 version, and Intel® Parallel Studio XE. Free stand-alone and open-source versions are also available. A license purchase includes Priority Support.


Why use Intel® DAAL

 The answer is performance. Many data analytics workloads are compute intensive. Intel® DAAL is designed and optimized to extract high performance from today's and tomorrow's processors. The "Intel® DAAL vs. Apache Spark MLlib Performance" benchmark provides a performance comparison with Spark* MLlib.


Machine Learning in Embedded Systems

 Predictive capabilities and the elimination of irrelevant computations are especially useful in embedded systems, which usually lack the resources to handle heavy tasks. Code modernization and optimization therefore become more important than ever. From that perspective, applying Intel® DAAL translates directly into cost savings.

 Of course, heavy machine learning algorithms should run on server platforms, which have relatively abundant resources. However, if embedded systems contribute as much as they can by using their own resources, the whole process becomes more efficient.

 Therefore, Intel® System Studio now includes Intel® DAAL so that data analytics techniques of all kinds can benefit.


Faster Machine Learning and Data Analytics

  • Features highly tuned functions for classical machine learning and analytics performance across the spectrum of Intel® architecture devices
  • Optimizes data ingestion together with algorithmic computation for highest analytics throughput
  • Includes Python*, C++, and Java* APIs and connectors to popular data sources including Spark* and Hadoop*
  • Free and open source community-supported versions are available, as well as paid versions that include premium support.


Data Transformation and Analysis in Intel® DAAL

 From statistics to matrix operations, Intel® DAAL addresses all stages of the data analytics pipeline: pre-processing, transformation, analysis, modeling, validation, and decision making.

 The diagram below shows the functions and algorithms that Intel® DAAL supports for pre-processing, data transformation, and analysis. Intel® DAAL also supports batch, online, and distributed processing.
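
 The difference between the three processing modes can be illustrated without the library itself. The following is a minimal plain C++ sketch, not Intel® DAAL code; the names batchMean and OnlineMean are hypothetical and exist only for this illustration. Batch mode sees the whole data set at once, online mode updates a partial result block by block and finalizes it at the end, and distributed mode merges partial results that were computed separately.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Batch mode: the whole data set is available at once.
double batchMean(const std::vector<double>& data) {
    assert(!data.empty());
    double sum = 0.0;
    for (double x : data) sum += x;
    return sum / static_cast<double>(data.size());
}

// Online mode: keep a partial result and update it block by block.
// (Hypothetical helper type, not part of any library.)
struct OnlineMean {
    double sum = 0.0;
    std::size_t count = 0;

    // Fold one block of data into the partial result.
    void update(const std::vector<double>& block) {
        for (double x : block) sum += x;
        count += block.size();
    }

    // Distributed mode: combine with a partial result computed elsewhere.
    void merge(const OnlineMean& other) {
        sum += other.sum;
        count += other.count;
    }

    // Turn the accumulated partial result into the final mean.
    double finalize() const {
        assert(count > 0);
        return sum / static_cast<double>(count);
    }
};
```

 Feeding the blocks {1, 2} and {3, 4} to OnlineMean::update and then calling finalize gives the same mean as batchMean on {1, 2, 3, 4}, which is the property that makes the online and distributed modes interchangeable with batch mode for this statistic.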



Machine Learning in Intel® DAAL

 You can find the list of Machine Learning algorithms Intel® DAAL offers.


Intel® DAAL Examples (K-means)

Intel® DAAL supports C++, Python* and Java* APIs and also provides examples in those programming languages.

The examples cover many functions, such as boosting, Cholesky decomposition, covariance computation, distance calculation, K-means, Naïve Bayes, neural networks, and outlier detection.

Here we look at a simple example, K-means, which is among the most popular and simplest clustering methods. It partitions a data set into a small number of clusters such that feature vectors within a cluster are more similar to one another than to feature vectors from other clusters. Each cluster is characterized by a representative point, called a centroid, and a cluster radius. In other words, clustering methods reduce the problem of analyzing the entire data set to the analysis of clusters. There are numerous ways to define the measure of similarity and the centroids. For K-means, the centroid is defined as the mean of the feature vectors within the cluster.
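
To make the centroid definition concrete, here is a minimal plain C++ sketch of one K-means iteration (an assignment step followed by a centroid update). It is not Intel® DAAL code and says nothing about how the library implements the algorithm. The names Point, squaredDistance, assignClusters, and updateCentroids are hypothetical, and squared Euclidean distance is assumed as the similarity measure.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // one feature vector

// Squared Euclidean distance between two feature vectors of equal length.
double squaredDistance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) {
        const double diff = a[j] - b[j];
        d += diff * diff;
    }
    return d;
}

// Assignment step: index of the nearest centroid for every point.
std::vector<std::size_t> assignClusters(const std::vector<Point>& points,
                                        const std::vector<Point>& centroids) {
    assert(!centroids.empty());
    std::vector<std::size_t> assignments(points.size(), 0);
    for (std::size_t i = 0; i < points.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < centroids.size(); ++k) {
            const double d = squaredDistance(points[i], centroids[k]);
            if (d < best) {
                best = d;
                assignments[i] = k;
            }
        }
    }
    return assignments;
}

// Update step: each centroid becomes the mean of the points assigned to it.
// A cluster that received no points keeps its previous centroid.
std::vector<Point> updateCentroids(const std::vector<Point>& points,
                                   const std::vector<std::size_t>& assignments,
                                   const std::vector<Point>& oldCentroids) {
    assert(points.size() == assignments.size());
    const std::size_t k = oldCentroids.size();
    std::vector<Point> sums(k);
    std::vector<std::size_t> counts(k, 0);
    for (std::size_t c = 0; c < k; ++c) sums[c].assign(oldCentroids[c].size(), 0.0);

    for (std::size_t i = 0; i < points.size(); ++i) {
        const std::size_t c = assignments[i];
        assert(c < k && points[i].size() == sums[c].size());
        for (std::size_t j = 0; j < points[i].size(); ++j) sums[c][j] += points[i][j];
        ++counts[c];
    }

    std::vector<Point> centroids = oldCentroids;
    for (std::size_t c = 0; c < k; ++c) {
        if (counts[c] == 0) continue;  // empty cluster: keep the old centroid
        for (std::size_t j = 0; j < sums[c].size(); ++j)
            centroids[c][j] = sums[c][j] / static_cast<double>(counts[c]);
    }
    return centroids;
}
```

Repeating the two steps a fixed number of times, starting from some initial centroids, gives the basic iterative scheme; the iteration count plays the same role as nIterations in the library example.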

The following is the K-means code example in C++ from Intel® DAAL. To learn more about K-means with Intel® DAAL, please refer to the K-means article.

/* file: kmeans_csr_batch.cpp */
/*
* Copyright 2014-2017 Intel Corporation
* All Rights Reserved.
* If this  software was obtained  under the  Intel Simplified  Software License,
* the following terms apply:
* The source code,  information  and material  ("Material") contained  herein is
* owned by Intel Corporation or its  suppliers or licensors,  and  title to such
* Material remains with Intel  Corporation or its  suppliers or  licensors.  The
* Material  contains  proprietary  information  of  Intel or  its suppliers  and
* licensors.  The Material is protected by  worldwide copyright  laws and treaty
* provisions.  No part  of  the  Material   may  be  used,  copied,  reproduced,
* modified, published,  uploaded, posted, transmitted,  distributed or disclosed
* in any way without Intel's prior express written permission.  No license under
* any patent,  copyright or other  intellectual property rights  in the Material
* is granted to  or  conferred  upon  you,  either   expressly,  by implication,
* inducement,  estoppel  or  otherwise.  Any  license   under such  intellectual
* property rights must be express and approved by Intel in writing.
* Unless otherwise agreed by Intel in writing,  you may not remove or alter this
* notice or  any  other  notice   embedded  in  Materials  by  Intel  or Intel's
* suppliers or licensors in any way.
* If this  software  was obtained  under the  Apache License,  Version  2.0 (the
* "License"), the following terms apply:
* You may  not use this  file except  in compliance  with  the License.  You may
* obtain a copy of the License at
* Unless  required  by   applicable  law  or  agreed  to  in  writing,  software
* distributed under the License  is distributed  on an  "AS IS"  BASIS,  WITHOUT
* WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the   License  for the   specific  language   governing   permissions  and
* limitations under the License.
*/

/*
!  Content:
!    C++ example of sparse K-Means clustering in the batch processing mode
*/

/**
 * \example kmeans_csr_batch.cpp
 */

#include "daal.h"
#include "service.h"

using namespace std;
using namespace daal;
using namespace daal::algorithms;

typedef float algorithmFPType; /* Algorithm floating-point type */

/* Input data set parameters */
string datasetFileName     = "../data/batch/kmeans_csr.csv";

/* K-Means algorithm parameters */
const size_t nClusters   = 20;
const size_t nIterations = 5;

int main(int argc, char *argv[])
{
    checkArguments(argc, argv, 1, &datasetFileName);

    /* Retrieve the data from the input file */
    CSRNumericTablePtr dataTable(createSparseTable<float>(datasetFileName));

    /* Get initial clusters for the K-Means algorithm */
    kmeans::init::Batch<algorithmFPType, kmeans::init::randomCSR> init(nClusters);

    init.input.set(kmeans::init::data, dataTable);

    /* Run the initialization so that its result is available */
    init.compute();

    NumericTablePtr centroids = init.getResult()->get(kmeans::init::centroids);

    /* Create an algorithm object for the K-Means algorithm */
    kmeans::Batch<algorithmFPType, kmeans::lloydCSR> algorithm(nClusters, nIterations);

    algorithm.input.set(kmeans::data,           dataTable);
    algorithm.input.set(kmeans::inputCentroids, centroids);

    /* Run the K-Means algorithm */
    algorithm.compute();

    /* Print the clusterization results */
    printNumericTable(algorithm.getResult()->get(kmeans::assignments), "First 10 cluster assignments:", 10);
    printNumericTable(algorithm.getResult()->get(kmeans::centroids  ), "First 10 dimensions of centroids:", 20, 10);
    printNumericTable(algorithm.getResult()->get(kmeans::objectiveFunction), "Objective function value:");

    return 0;
}


Results :



The K-means clustering algorithm outputs the results described below.

Assignments : Use when assignFlag=true. Pointer to the n x 1 numeric table with assignments of cluster indices to feature vectors in the input data. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Centroids : Pointer to the nClusters x p numeric table with the cluster centroids. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix, PackedSymmetricMatrix, and CSRNumericTable.

Objective function : Pointer to the 1 x 1 numeric table with the value of the goal function. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except CSRNumericTable.
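
To show how the three results relate to one another, the sketch below recomputes a single objective value from n feature vectors, their n cluster assignments, and the nClusters centroids. It is plain C++, not Intel® DAAL code. It assumes that the goal function is the sum of squared Euclidean distances from each feature vector to its assigned centroid, which is the common K-means objective; that is an assumption here, not a statement about the library's definition. The names FeatureVector and kmeansObjective are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using FeatureVector = std::vector<double>;  // p values per vector

// Sum of squared Euclidean distances from each feature vector to the
// centroid of the cluster it is assigned to (assumed objective).
double kmeansObjective(const std::vector<FeatureVector>& data,         // n vectors
                       const std::vector<std::size_t>& assignments,    // n cluster indices
                       const std::vector<FeatureVector>& centroids) {  // nClusters vectors
    assert(data.size() == assignments.size());
    double objective = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        assert(assignments[i] < centroids.size());
        const FeatureVector& c = centroids[assignments[i]];
        assert(c.size() == data[i].size());
        for (std::size_t j = 0; j < c.size(); ++j) {
            const double diff = data[i][j] - c[j];
            objective += diff * diff;
        }
    }
    return objective;
}
```

With two points (0, 0) and (0, 2) assigned to a single centroid at (0, 1), each point contributes a squared distance of 1, so the function returns 2.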

For more details about the function, please see the Developer Guide and the K-means article.


Performance Comparison

 The following data compares the performance of Intel® DAAL and Spark* MLlib running Alternating Least Squares, Correlation, and PCA (Principal Component Analysis).


  • Configuration: 2x Intel® Xeon® E5-2660 CPU @ 2.60GHz, 128 GB, Intel® DAAL 2018
  • Alternating Least Squares: Users=1M, Products=1M, Ratings=10M, Factors=100, Iterations=1; MLlib time=165.9 sec, DAAL time=40.5 sec, Gain=4.1x
  • Correlation: N=1M, P=2000, Size=37 GB; MLlib time=169.2 sec, DAAL time=12.9 sec, Gain=13.1x
  • PCA: n=10M, p=1000, Partitions=360, Size=75 GB; MLlib time=246.6 sec, DAAL (seq) time=17.4 sec, Gain=14.2x
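
 The gain figures in the list above are the ratio of the MLlib time to the DAAL time, rounded to one decimal place. The short C++ sketch below reproduces that arithmetic from the published timings; the function names gain and roundToTenth are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Speedup of DAAL over MLlib: how many times shorter the DAAL run is.
double gain(double mllibSeconds, double daalSeconds) {
    assert(daalSeconds > 0.0);
    return mllibSeconds / daalSeconds;
}

// Round to one decimal place, matching the precision used in the list.
double roundToTenth(double x) {
    return std::round(x * 10.0) / 10.0;
}
```

 For example, roundToTenth(gain(165.9, 40.5)) evaluates to 4.1 for Alternating Least Squares, and the same calculation gives 13.1 for Correlation and 14.2 for PCA.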

Performance Optimization with other components in Intel® System Studio

 Using Intel® VTune™ Amplifier, we can easily profile an application's performance and find hotspots and tuning opportunities. We can then apply the Intel® C++ Compiler and other libraries for acceleration. Most vectorization and parallelization can be done with components in Intel® System Studio, which can save a lot of time and effort.

 Please refer to the image below for the concept of the optimization flow with Intel® System Studio.



More Information

Intel® System Studio 2018 beta ( Including DAAL ) :

Intel® DAAL :

Accelerating Scikit-learn with the Intel® DAAL : /content/www/us/en/develop/videos/accelerating-scikit-learn-with-the-intel-daal-performance-library.html

Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804