## Features

### Optimized for Your Hardware

Intel® Data Analytics Acceleration Library (Intel® DAAL) is tuned for a broad range of Intel® processors, including Intel Atom®, Intel® Core™, and Intel® Xeon® processors. Because applications may benefit from splitting analytics processing across several platforms, it targets systems ranging from IoT gateways to back-end servers.

For maximum calculation speed, each function is highly tuned to the instruction set, vector width, core count, and memory architecture of each target processor.

### Optimized for Developer Productivity

Access advanced data analytics functions for Python*, C++, and Java* that span all data processing stages. The functions are pre-optimized and ready to use, which reduces software development time.

#### Scikit-learn* Optimizations

Scikit-learn*, a Python* machine learning library, has been accelerated with Intel DAAL to improve the performance of common routines and algorithms. For more information, see Tech.Decoded.

### Analytics at the Edge or in a Cluster Environment

Batch, streaming, and distributed compute models are supported to cover a range of data set sizes and performance requirements. Mathematical routines in Intel DAAL are built from scratch to process incremental-learning and master-slave distributed workloads efficiently. A unified API covers all of these processing models.
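The sketch below, written in plain Python rather than with the Intel DAAL API, illustrates why one set of routines can serve batch, streaming, and distributed models for a simple statistic: each data block (a streaming chunk or one node's share) yields a small partial result, partial results are merged, and a final step produces the statistics. The names `PartialMoments`, `compute_partial`, `merge`, and `finalize` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PartialMoments:
    """Hypothetical per-feature partial result: row count, sums, sums of squares."""
    n: int
    sums: List[float]
    sums_sq: List[float]


def compute_partial(block: List[List[float]]) -> PartialMoments:
    """Process one block of rows (one streaming chunk or one node's data)."""
    p = len(block[0])
    sums = [0.0] * p
    sums_sq = [0.0] * p
    for row in block:
        for j, x in enumerate(row):
            sums[j] += x
            sums_sq[j] += x * x
    return PartialMoments(len(block), sums, sums_sq)


def merge(parts: Iterable[PartialMoments]) -> PartialMoments:
    """Combine partial results from several blocks or nodes by adding them."""
    parts = list(parts)
    p = len(parts[0].sums)
    n = sum(pt.n for pt in parts)
    sums = [sum(pt.sums[j] for pt in parts) for j in range(p)]
    sums_sq = [sum(pt.sums_sq[j] for pt in parts) for j in range(p)]
    return PartialMoments(n, sums, sums_sq)


def finalize(total: PartialMoments):
    """Turn the merged partial result into per-feature means and sample variances."""
    means = [s / total.n for s in total.sums]
    variances = [(sq - total.n * m * m) / (total.n - 1)
                 for sq, m in zip(total.sums_sq, means)]
    return means, variances
```

Calling `finalize(merge(compute_partial(b) for b in blocks))` gives the same answer as processing all rows at once. The sum-of-squares formula is the simplest to merge but can lose precision when the mean is large relative to the spread; Welford-style or pairwise update formulas are the usual remedy.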

## Algorithms

### Data Analysis: Characterization, summarization, and transformation

#### Low-Order Moments

Compute basic dataset characteristics such as sums, means, second-order raw moments, variances, and standard deviations.
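As an illustration of what these statistics are (not of the library's API), the following plain-Python sketch computes them per feature for a small row-major dataset. The function name `low_order_moments` is hypothetical, and the unbiased `n - 1` variance denominator is an assumption of this sketch.

```python
import math
from typing import Dict, List


def low_order_moments(rows: List[List[float]]) -> Dict[str, List[float]]:
    """Per-feature sums, means, second-order raw moments, variances, std devs."""
    n = len(rows)
    p = len(rows[0])
    cols = [[row[j] for row in rows] for j in range(p)]
    sums = [sum(c) for c in cols]
    means = [s / n for s in sums]
    # Second-order raw moment: the mean of the squared values, E[x^2].
    raw2 = [sum(x * x for x in c) / n for c in cols]
    # Sample variance with the unbiased (n - 1) denominator (an assumption here).
    variances = [sum((x - m) ** 2 for x in c) / (n - 1) for c, m in zip(cols, means)]
    stds = [math.sqrt(v) for v in variances]
    return {"sum": sums, "mean": means, "raw_moment_2": raw2,
            "variance": variances, "std": stds}
```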

#### Quantile

Compute quantiles that summarize the distribution of data across equal-sized groups.
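Several quantile definitions are in common use. The sketch below assumes linear interpolation between the two nearest order statistics, which is one widespread convention and not necessarily the one Intel DAAL uses; `quantile` and `quartiles` are hypothetical names.

```python
import math
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Return the q-th quantile (0 <= q <= 1) by linear interpolation
    between the two nearest order statistics."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    xs = sorted(values)
    pos = q * (len(xs) - 1)           # fractional rank in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


def quartiles(values: Sequence[float]) -> List[float]:
    """Cut points that split the data into four equal-sized groups."""
    return [quantile(values, q) for q in (0.25, 0.5, 0.75)]
```

For example, `quartiles([5.0, 1.0, 4.0, 2.0, 3.0])` returns `[2.0, 3.0, 4.0]`.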

#### Correlation and Variance-Covariance Matrices

Quantify a pairwise statistical relationship between feature vectors.
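A plain-Python sketch of the two matrices, treating each column of `rows` as one feature: the covariance matrix averages products of deviations from the feature means, and the correlation matrix rescales it by the standard deviations. Function names are hypothetical, and the `n - 1` denominator is an assumption.

```python
import math
from typing import List

Matrix = List[List[float]]


def covariance_matrix(rows: Matrix) -> Matrix:
    """Sample variance-covariance matrix of the columns (features) of rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def correlation_matrix(rows: Matrix) -> Matrix:
    """Pearson correlation: covariance divided by the product of standard
    deviations. A constant feature (zero variance) makes this undefined."""
    cov = covariance_matrix(rows)
    p = len(cov)
    sd = [math.sqrt(cov[i][i]) for i in range(p)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]
```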

#### Cosine Distance Matrix

Measure pairwise dissimilarity between feature vectors as the cosine distance: one minus the cosine of the angle between two vectors.
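The following plain-Python sketch computes a cosine distance matrix, assuming the common definition `1 - (x . y) / (|x| |y|)` applied to every pair of rows (observations). The name `cosine_distance_matrix` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_distance_matrix(rows: Matrix) -> Matrix:
    """Pairwise cosine distance 1 - (x . y) / (|x| |y|) between rows.
    Undefined for all-zero rows (zero norm)."""
    norms = [math.sqrt(sum(v * v for v in r)) for r in rows]
    n = len(rows)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(rows[i], rows[j]))
            out[i][j] = 1.0 - dot / (norms[i] * norms[j])
    return out
```

Distances range from 0 (same direction) through 1 (orthogonal) to 2 (opposite directions), and vector length does not matter: `[1, 0]` and `[2, 0]` are at distance 0.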

#### Correlation Distance Matrix

Measure pairwise dissimilarity between feature vectors as the correlation distance: one minus the Pearson correlation coefficient of two vectors.
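Correlation distance is cosine distance applied after centering each vector on its own mean. A plain-Python sketch, assuming the definition `1 - r(x, y)` with `r` the Pearson correlation of two rows; the function name is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def correlation_distance_matrix(rows: Matrix) -> Matrix:
    """Pairwise correlation distance 1 - r(x, y) between rows, where r is the
    Pearson correlation of the two vectors' components. Constant rows are undefined."""
    centered = []
    for r in rows:
        m = sum(r) / len(r)
        centered.append([v - m for v in r])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    n = len(rows)
    return [[1.0 - sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(n)] for i in range(n)]
```

Unlike cosine distance, this ignores a constant offset: `[1, 2, 3]` and `[2, 4, 6]` are at distance 0, and `[1, 2, 3]` and `[3, 2, 1]` are at distance 2.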

#### Cholesky Decomposition

Decompose a symmetric positive-definite matrix into the product of a lower triangular matrix and its transpose. Use this basic operation to solve linear systems and in nonlinear optimization, Kalman filtering, and more.
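To make the "solve linear systems" use concrete, here is a plain-Python sketch (not the Intel DAAL API) of the row-by-row Cholesky-Banachiewicz factorization `A = L L^T`, followed by two triangular solves. The names `cholesky` and `solve_spd` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with A = L L^T for symmetric positive-definite A
    (Cholesky-Banachiewicz, computed row by row)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def solve_spd(a: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b: forward substitution for L y = b, then back substitution for L^T x = y."""
    L = cholesky(a)
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x
```

For `A = [[4, 12, -16], [12, 37, -43], [-16, -43, 98]]`, the factor is `L = [[2, 0, 0], [6, 1, 0], [-8, 5, 3]]`. Factoring once and reusing `L` for many right-hand sides is what makes Cholesky attractive in repeated solves such as Kalman filter updates.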

#### QR Decomposition

Decompose a general matrix into the product of an orthogonal matrix and an upper triangular matrix. Use this fundamental operation to solve linear inverse and least squares problems and to find eigenvalues and eigenvectors.
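A plain-Python sketch of a thin QR decomposition using modified Gram-Schmidt (one of several QR methods; Householder reflections are the other common choice, and which one the library uses is not stated here), followed by its classic application to least squares. Function names are hypothetical, and the input is assumed to have full column rank.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def qr_decompose(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Thin QR of an m x n matrix (m >= n, full column rank) by modified
    Gram-Schmidt: A = Q R, Q has orthonormal columns, R is upper triangular."""
    m, n = len(a), len(a[0])
    v = [[a[i][j] for i in range(m)] for j in range(n)]  # columns of A
    q_cols: List[List[float]] = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in v[j]))  # zero here means rank deficiency
        q = [x / r[j][j] for x in v[j]]
        q_cols.append(q)
        # Remove the component along q from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(qi * vi for qi, vi in zip(q, v[k]))
            v[k] = [vi - r[j][k] * qi for qi, vi in zip(q, v[k])]
    q_mat = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q_mat, r


def least_squares(a: Matrix, b: List[float]) -> List[float]:
    """Minimize ||A x - b|| by solving R x = Q^T b with back substitution."""
    q, r = qr_decompose(a)
    n = len(r)
    qtb = [sum(q[i][j] * b[i] for i in range(len(b))) for j in range(n)]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (qtb[i] - sum(r[i][k] * x[k] for k in range(i + 1, n))) / r[i][i]
    return x
```

For example, fitting a line `y = c0 + c1 * t` to the points `t = 0, 1, 2, 3`, `y = 1, 3, 5, 7` with `A = [[1, 0], [1, 1], [1, 2], [1, 3]]` recovers `c0 = 1`, `c1 = 2` up to rounding.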

#### Singular Value Decomposition (SVD)

Decompose a matrix into the product of a matrix of left singular vectors, a diagonal matrix of singular values, and the transpose of a matrix of right singular vectors. SVD is the basis of principal component analysis (PCA), linear inverse problem-solving, and data fitting.
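In conventional notation (symbols chosen here for illustration, not taken from the library documentation), the decomposition of an $m \times n$ matrix $A$, and its link to PCA for a centered data matrix $X$ with $N$ observations and $p$ features, can be written as:

```latex
\begin{aligned}
A &= U \Sigma V^{\top}, \qquad A \in \mathbb{R}^{m \times n},\;
  U^{\top} U = I_m,\; V^{\top} V = I_n, \\
\Sigma &= \begin{bmatrix} \operatorname{diag}(\sigma_1, \dots, \sigma_r) & 0 \\ 0 & 0 \end{bmatrix}
  \in \mathbb{R}^{m \times n},
  \qquad \sigma_1 \ge \dots \ge \sigma_r > 0,\; r = \operatorname{rank}(A), \\
C &= \frac{1}{N-1} X^{\top} X
   = V \, \frac{\Sigma^{\top} \Sigma}{N-1} \, V^{\top}
   \quad \text{where } X = U \Sigma V^{\top} \in \mathbb{R}^{N \times p} \text{ is centered.}
\end{aligned}
```

The last line follows from $U^{\top} U = I$: the columns of $V$ are the principal directions of the data, and $\sigma_i^2 / (N-1)$ is the variance along the $i$-th direction, which is why PCA can be computed from the SVD of the data without forming the covariance matrix $C$ explicitly.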

#### Principal Component Analysis (PCA)

Reduce the dimensionality of data by transforming input feature vectors into a new set of principal components that are orthogonal to each other.
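The following plain-Python sketch finds the first principal component by power iteration on the sample covariance matrix. This is a deliberately simple stand-in, not the method the library uses; production implementations typically rely on an eigendecomposition or SVD. Function names and the fixed iteration count are assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def first_principal_component(rows: Matrix, iters: int = 200) -> Tuple[List[float], float]:
    """Return the leading principal direction (unit vector) and the variance
    along it, found by power iteration on the sample covariance matrix."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p  # starting guess; must not be orthogonal to the leading direction
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, variance


def project(rows: Matrix, direction: List[float]) -> List[float]:
    """Score of each centered observation along one principal direction."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [sum((r[j] - means[j]) * direction[j] for j in range(p)) for r in rows]
```

Projecting onto the first few directions reduces each observation to a few scores. Further components can be found by removing ("deflating") the found direction from the covariance matrix and repeating; the sign of each direction is arbitrary.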

#### K-Means

Partition a dataset into clusters of similar data points. Each cluster is represented by its centroid, the mean of all data points in the cluster.
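A plain-Python sketch of Lloyd's algorithm, the classic K-Means iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points, until assignments stop changing. The initial centroids are supplied by the caller here; how the library initializes them is not covered. Function names are hypothetical.

```python
from typing import List, Tuple

Point = List[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], init: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate assignment and centroid-update steps
    until the assignments stop changing or max_iter is reached."""
    centroids = [list(c) for c in init]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its assigned points.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

With points `[0, 0], [0, 1], [10, 10], [10, 11]` and initial centroids `[0, 0], [10, 10]`, the result is labels `[0, 0, 1, 1]` and centroids `[0, 0.5], [10, 10.5]`.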

#### Expectation-Maximization (EM)

Find maximum likelihood estimates of model parameters when the data is incomplete or the model depends on unobserved (latent) variables. Use it to fit a Gaussian mixture model for clustering, for nonlinear dimensionality reduction, for missing value problems, and more.
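To show the E-step/M-step structure, here is a plain-Python sketch that fits a one-dimensional Gaussian mixture with EM. It is an illustration only: the 1-D restriction, the caller-supplied initial means, the fixed iteration count, and the variance floor `min_var` are all assumptions of this sketch, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs: List[float], means: List[float], iters: int = 100,
              min_var: float = 1e-6) -> Tuple[List[float], List[float], List[float]]:
    """Fit a 1-D Gaussian mixture (one component per initial mean) with EM.
    Returns (weights, means, variances)."""
    k = len(means)
    means = list(means)
    weights = [1.0 / k] * k
    overall_mean = sum(xs) / len(xs)
    var0 = sum((x - overall_mean) ** 2 for x in xs) / len(xs)
    variances = [var0] * k  # start every component with the overall variance
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from the responsibility-weighted points.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var,
                               sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj)
    return weights, means, variances
```

On two well-separated groups around 0 and 10, starting from means `[1.0, 9.0]`, the weights converge to 0.5 each and the means to the two group centers. Real implementations usually work with log-densities (log-sum-exp) to avoid underflow and stop on a likelihood tolerance rather than a fixed count.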

#### Outlier Detection

Identify observations that lie at an abnormal distance from the rest of the data. To decide whether an observation is an outlier, consider either its entire feature vector (multivariate) or a single feature value (univariate).
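As one common univariate approach (not necessarily the one Intel DAAL implements), the sketch below flags values whose robust z-score, based on the median and the median absolute deviation (MAD), exceeds a threshold. The function name and the default threshold of 3.5 are assumptions.

```python
import statistics
from typing import List


def univariate_outliers(values: List[float], threshold: float = 3.5) -> List[bool]:
    """Flag values whose robust (median/MAD-based) z-score exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    if mad == 0.0:
        # Most values coincide with the median; flag anything that differs from it.
        return [x != med for x in values]
    # 0.6745 scales the MAD to match a standard deviation for normally distributed data.
    return [abs(0.6745 * (x - med) / mad) > threshold for x in values]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value inflates the standard deviation enough to hide itself in small samples. For the multivariate case, the usual analogue replaces the scaled deviation with the Mahalanobis distance of the whole feature vector from the data's center.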

#### Association Rules

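Discover relationships between variables, such as items that frequently occur together in transactions, and measure each rule's strength with its support and confidence.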
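The two standard rule measures are support (how often an itemset occurs) and confidence (how often the right-hand side occurs given the left-hand side). The plain-Python sketch below computes them for single-item rules `{a} -> {b}` by brute force; it is an illustration under that simplification, not the library's mining algorithm, and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """Enumerate rules {a} -> {b} over item pairs and keep those meeting both
    thresholds. confidence(a -> b) = support({a, b}) / support({a})."""
    items = sorted(set().union(*transactions))
    rules: List[Tuple[str, str, float, float]] = []
    for a, b in combinations(items, 2):
        s_ab = support(frozenset((a, b)), transactions)
        if s_ab < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s_ab / support(frozenset((lhs,)), transactions)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s_ab, conf))
    return rules
```

For the transactions `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}`, `{milk}`, the rule `butter -> bread` has support 0.5 and confidence 1.0, while `bread -> butter` has the same support but confidence of only about 0.67. Algorithms such as Apriori extend this to larger itemsets by pruning candidates whose subsets are already infrequent.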

#### Linear and Radial Basis Functions

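Compute linear and radial basis function (RBF) kernels, which measure similarity between observations as an inner product in a feature space (for the RBF kernel, a higher-dimensional one) for use in kernel methods.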
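A plain-Python sketch of the two kernels and of the Gram (kernel) matrix that kernel methods consume. The parameter names `k`, `b`, and `sigma` are this sketch's own choices; the library's parameterization may differ.

```python
import math
from typing import Callable, List

Vector = List[float]


def linear_kernel(x: Vector, y: Vector, k: float = 1.0, b: float = 0.0) -> float:
    """K(x, y) = k * <x, y> + b."""
    return k * sum(a * c for a, c in zip(x, y)) + b


def rbf_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = sum((a - c) ** 2 for a, c in zip(x, y))
    return math.exp(-sq / (2.0 * sigma * sigma))


def kernel_matrix(rows: List[Vector],
                  kernel: Callable[[Vector, Vector], float]) -> List[List[float]]:
    """Gram matrix K[i][j] = kernel(rows[i], rows[j])."""
    return [[kernel(xi, xj) for xj in rows] for xi in rows]
```

The RBF kernel equals an inner product in an infinite-dimensional feature space, yet it is computed directly from the original vectors. This "kernel trick" is how methods such as support vector machines work in that space without ever constructing it.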

#### Quality Metrics

Compute a set of numeric values to characterize quantitative properties of the results that analytical algorithms return. These metrics include a confusion matrix, accuracy, precision, recall, and F-score.
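For a binary classifier, all of these metrics derive from the four confusion-matrix counts. A plain-Python sketch follows; the function name and the convention of returning 0.0 when a ratio's denominator is zero are assumptions.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int], positive: int = 1) -> Dict[str, float]:
    """Confusion-matrix counts plus accuracy, precision, recall, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1-score: the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": (tp + tn) / len(pairs),
            "precision": precision, "recall": recall, "f_score": f1}
```

For `y_true = [1, 1, 1, 0, 0, 0, 0, 1]` and `y_pred = [1, 1, 0, 0, 0, 1, 0, 1]`, the counts are TP = 3, FP = 1, FN = 1, TN = 3, and accuracy, precision, recall, and F-score are all 0.75.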

#### Decision Trees

This method, commonly used in data mining, maps observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
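The core step in growing a classification tree is choosing the test at each branch. The plain-Python sketch below searches every feature and midpoint threshold for the split `x[f] <= t` that minimizes weighted Gini impurity; Gini is one common criterion (information gain is another), and the choice and function names here are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a set of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows: List[List[float]],
               labels: List[int]) -> Optional[Tuple[int, float, float]]:
    """Return (feature, threshold, weighted impurity) of the best test x[f] <= t,
    or None if no feature has two distinct values."""
    n = len(rows)
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds lie halfway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best
```

A full tree applies `best_split` recursively to each child's rows until the leaves are pure or a stopping rule (such as a maximum depth) is reached; each leaf then predicts the majority class of its rows.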

#### Decision Forests

This ensemble learning method constructs a multitude of decision trees at training time. It outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees.
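The sketch below shows, in plain Python, the two ensemble pieces around the individual trees: drawing a bootstrap sample of rows for each tree (bagging, the usual way forests make their trees differ) and aggregating the trees' predictions. Tree training itself is omitted, and the function names are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def bootstrap_sample(n: int, rng: random.Random) -> List[int]:
    """Row indices drawn with replacement; each tree trains on one such sample."""
    return [rng.randrange(n) for _ in range(n)]


def forest_classify(tree_predictions: Sequence[int]) -> int:
    """Classification: the mode (most common class) of the trees' votes.
    Ties go to the class that appears first in tree_predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_regress(tree_predictions: Sequence[float]) -> float:
    """Regression: the mean of the trees' predictions."""
    return sum(tree_predictions) / len(tree_predictions)
```

For example, votes `[1, 0, 1, 2, 1]` give class 1, and regression outputs `[1.0, 2.0, 6.0]` give 3.0. Random forests usually also consider only a random subset of features at each split, which further decorrelates the trees.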

#### k-Nearest Neighbors (k-NN)

Classify an observation by the majority class among its k nearest neighbors in the training data. In this type of instance-based (lazy) learning, the function is only approximated locally, and all computation is deferred until classification.
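A brute-force plain-Python sketch makes the "lazy" part visible: there is no training step beyond storing the data, and all distance computations happen at query time. Euclidean distance and simple majority voting are assumptions of this sketch; `knn_classify` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_classify(train_x: List[Sequence[float]], train_y: List[int],
                 query: Sequence[float], k: int = 3) -> int:
    """Brute-force k-NN: rank all training points by Euclidean distance to the
    query and return the majority class among the k closest."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Each query costs a pass over the whole training set. Optimized implementations usually reduce this with spatial index structures such as k-d trees.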

For more complete information about compiler optimizations, see our Optimization Notice.