Intel® Data Analytics Acceleration Library Release Notes and New Features

This page provides the current Release Notes for Intel® Data Analytics Acceleration Library. The notes are categorized by year, from newest to oldest, with individual releases listed within each year.

Each version's entry summarizes the new features, changes, and known issues in that version since the previous release. Each major release also links to important information, such as prerequisites, software compatibility, and installation instructions.

To get product updates, log in to the Intel® Software Development Products Registration Center.
For questions or technical support, visit Intel® Software Developer Support.

2019 Beta

Installation Guide | System Requirements

Initial Release
  • Enabled support for a user-defined data modification procedure in CSV and ODBC data sources. This makes it possible to implement a wide range of feature extraction and transformation techniques on the user side (see the sketch below).
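A minimal C++ sketch of the intended usage follows. It is written against the modifier interface as described for this beta; the names modifiers::csv::FeatureModifier, Context, getTokenAs, getOutputBuffer, features::list, and modifiers::csv::custom are assumptions recalled from the beta examples, so verify them against the shipped headers before use.

    #include "daal.h"

    using namespace daal::data_management;

    /* Hypothetical modifier that replaces each selected token with its square.
       Base-class and method names are assumptions; see the note above. */
    class SquaringModifier : public modifiers::csv::FeatureModifier
    {
    public:
        /* Assumed to be invoked once per parsed row */
        virtual void apply(modifiers::csv::Context &context)
        {
            const size_t n = context.getNumberOfTokens();
            for (size_t i = 0; i < n; i++)
            {
                const float x = context.getTokenAs<float>(i);
                context.getOutputBuffer()[i] = x * x;
            }
        }
    };

    int main()
    {
        FileDataSource<CSVFeatureManager> ds("data.csv", /* placeholder file */
            DataSource::doAllocateNumericTable,
            DataSource::doDictionaryFromContext);

        /* Attach the custom modifier to two columns (column names assumed) */
        ds.getFeatureManager().addModifier(
            features::list("f1", "f2"), modifiers::csv::custom<SquaringModifier>());

        ds.loadDataBlock();
        NumericTablePtr table = ds.getNumericTable();
        return 0;
    }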

2018

Installation GuideSystem Requirements

Update 3
  • Bug fixes.
Update 2
  • Added a host application interface that enables algorithm-level cancellation of computations through a user-defined callback. This interface is available in the Decision Forest and Gradient Boosting Trees algorithms. New example code is provided (see the sketch after this list).
  • New technical previews of experimental Intel DAAL features and the Intel DAAL extension library:
    1. Introduced distributed k-Nearest Neighbors classifiers for both training and prediction. Included a new sample that demonstrates how to use this algorithm with Intel® MPI.
    2. Developed an experimental extension library on top of the existing Intel DAAL Python APIs that provides an easy-to-use API for Intel® DAAL neural networks. This extension library supports configuring and training neural network models in a few lines of code and allows use of existing TensorFlow* and Caffe* inference models.
  • The Gradient Boosting Trees training algorithm has been extended with an inexact split calculation mode: continuous features are bucketed into discrete bins, and the possible splits are restricted to the bin borders.
  • Removed the Intel® Threading Building Blocks (Intel® TBB) dependency in the library's sequential mode.
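A minimal C++ sketch of the cancellation callback follows. It assumes a HostAppIface-style base class with an isCancelled() method and a setHostApp() setter on the algorithm, as described for this update; the exact class names and method signatures are assumptions and should be checked against the services headers in the installed package.

    #include "daal.h"

    using namespace daal;
    using namespace daal::algorithms;

    /* User-defined host application that the library polls during compute().
       Assumption: HostAppIface declares a virtual isCancelled() hook; verify
       the exact signature in the installed headers before use. */
    class MyHostApp : public services::HostAppIface
    {
    public:
        MyHostApp() : stopRequested(false) {}
        volatile bool stopRequested;

        /* Returning true asks the running algorithm to cancel */
        virtual bool isCancelled() { return stopRequested; }
    };

    /* Usage with a supported algorithm, e.g. decision forest training:
         services::SharedPtr<MyHostApp> app(new MyHostApp());
         decision_forest::classification::training::Batch<> algorithm(nClasses);
         algorithm.setHostApp(app);   // setter name assumed
         algorithm.compute();         // returns early once isCancelled() is true
    */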

Known Issues

  • Online linear regression with the QR method incorrectly merges results from successive compute calls. The bug is expected to be fixed in a future release.
  • The Gradient Boosting Trees algorithm has a bug in categorical feature processing. The bug is expected to be fixed in a future release.
Update 1
  • Introduced a gradient boosted trees algorithm for classification and regression, implemented as a stochastic gradient boosting machine with regularization and second-order numerical optimization in the training procedure (xgboost-like) and an exact split mode. The implementation employs multiple levels of parallelism in tree construction and prediction (see the sketch after this list).
  • Developed an experimental extension library on top of the existing pyDAAL package that provides an easy-to-use API for Intel® DAAL neural networks. The extension library allows configuring and training a neural network model in a few lines of code, and using existing TensorFlow* or Caffe* models at the inference stage.
  • Fixed an issue in the multi-class classifier so that it now supports boosting binary classifiers in addition to SVM. The boosting algorithm now clones the weak learner before using it, so different threads in the multi-class classifier work with different weak-learner objects.
  • Introduced new experimental distributed k-Nearest Neighbors classifiers for both the training and prediction stages. Added a new sample that demonstrates how to use this algorithm with MPI. The experimental distributed kNN is available at https://github.com/01org/daal/tree/daal_2018_experimental.
  • Added support in the PCA algorithm for wide matrices (fewer rows than columns) with the correlation method.
  • The PCA algorithm can now optionally compute the means and variances of the input data set and supports sign-deterministic output. The library is extended with a PCA Transformation algorithm, which applies the PCA transformation to a data set with optional data normalization and data whitening. Introduced quality metrics for PCA: explained variances, explained variance ratios, and noise variance.
  • The Z-score algorithm can now optionally compute the means and variances of the input data set.
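A minimal batch training-and-prediction sketch for the new gradient boosted trees, modeled on the DAAL 2018 C++ examples, follows. The toy data values are illustrative, and the parameter and identifier names (parameter().maxIterations, gbt::regression::training::dependentVariable, and so on) should be verified against the gbt examples shipped with the release.

    #include "daal.h"

    using namespace daal;
    using namespace daal::algorithms;
    using namespace daal::data_management;

    int main()
    {
        /* Toy data set: 4 observations, 2 features (values are illustrative) */
        float x[] = {1.0f, 2.0f, 2.0f, 1.0f, 3.0f, 4.0f, 4.0f, 3.0f};
        float y[] = {1.5f, 1.0f, 3.5f, 3.0f};
        NumericTablePtr trainData(new HomogenNumericTable<float>(x, 2, 4));
        NumericTablePtr trainY(new HomogenNumericTable<float>(y, 1, 4));

        /* Train gradient boosted trees for regression */
        gbt::regression::training::Batch<> training;
        training.input.set(gbt::regression::training::data, trainData);
        training.input.set(gbt::regression::training::dependentVariable, trainY);
        training.parameter().maxIterations = 50; /* number of boosting rounds */
        training.compute();
        gbt::regression::ModelPtr model =
            training.getResult()->get(gbt::regression::training::model);

        /* Predict on the training data for illustration */
        gbt::regression::prediction::Batch<> prediction;
        prediction.input.set(gbt::regression::prediction::data, trainData);
        prediction.input.set(gbt::regression::prediction::model, model);
        prediction.compute();
        return 0;
    }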
Initial Release
  • Introduced API modifications to streamline library usage and enable consistency across functionality.
  • Introduced support for Decision Tree for both classification and regression. The feature includes the Gini index and information gain as classification split criteria, mean squared error (MSE) as the regression split criterion, and reduced error pruning.
  • Introduced support for Decision Forest for both classification and regression. The feature includes calculation of the Gini index for classification and variance for regression as split criteria, generalization error, and variable importance measures such as Mean Decrease Impurity and Mean Decrease Accuracy (see the sketch after this list).
  • Introduced support for varying learning rate in the Stochastic Gradient Descent algorithm for neural network training.
  • Introduced support for filtering in the Data Source, including loading selected features/columns from a CSV data source and binary representation of categorical features.
  • Extended Neural Network layers with Element Wise Add layer.
  • Introduced new samples that allow easy integration of the library with Spark* MLlib.
  • Introduced a service method for enabling thread pinning.
  • Performance improvements in various algorithms on Intel® Xeon® processors supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server).
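As an illustration of the new Decision Forest API with variable importance, here is a minimal C++ sketch modeled on the classification examples shipped with this release; the toy data values are illustrative, and the parameter and result names (nTrees, varImportance, resultsToCompute, variableImportance) should be verified against those examples.

    #include "daal.h"

    using namespace daal;
    using namespace daal::algorithms;
    using namespace daal::data_management;

    int main()
    {
        const size_t nClasses = 2;
        /* Toy data set: 4 observations, 2 features; labels in {0, 1} */
        float x[] = {0.1f, 0.2f, 0.9f, 0.8f, 0.2f, 0.1f, 0.8f, 0.9f};
        float y[] = {0.0f, 1.0f, 0.0f, 1.0f};
        NumericTablePtr data(new HomogenNumericTable<float>(x, 2, 4));
        NumericTablePtr labels(new HomogenNumericTable<float>(y, 1, 4));

        decision_forest::classification::training::Batch<> algorithm(nClasses);
        algorithm.input.set(classifier::training::data, data);
        algorithm.input.set(classifier::training::labels, labels);
        algorithm.parameter.nTrees = 100;
        /* Request Mean Decrease Impurity variable importance and the
           out-of-bag generalization error (identifier names per the examples) */
        algorithm.parameter.varImportance = decision_forest::training::MDI;
        algorithm.parameter.resultsToCompute =
            decision_forest::training::computeOutOfBagError;
        algorithm.compute();

        NumericTablePtr importance = algorithm.getResult()->get(
            decision_forest::classification::training::variableImportance);
        return 0;
    }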

Known Issues:

  • The Intel DAAL Python API (a.k.a. pyDAAL) is provided as source. When building it on Windows*, users may see warning messages. These warnings do not indicate critical issues and do not affect the library's functionality.
  • The Intel DAAL Python API (a.k.a. pyDAAL) built from source does not work on OS X* El Capitan (version 10.11). Workaround: Users can get the Intel® Distribution for Python* as an Anaconda package (http://anaconda.org/intel/), which contains a pre-built pyDAAL that works on OS X* El Capitan.

2017

Installation Guide | System Requirements | Bug Fix Log

Update 4
  • Small fixes for the Python examples
  • Tuned the Microsoft Visual Studio* solution for the C++ examples: disabled debugging for release configurations, set the starting point for relative paths, and added the ability to run examples from the IDE
  • Enabled support for macOS* with Xcode* 8.3
  • Performance tuning for a few algorithms to address earlier regressions
  • Fixes in the documentation
Update 3
  • Intel DAAL (on Linux and macOS) can now be installed directly from yum, apt, and conda repositories.
  • Bug fixes and performance improvements
  • Intel DAAL (for Linux and macOS) switched to the Apache License 2.0
Update 2
  • Numerous improvements to the neural networks API:
    • Added the transposed convolution layer
    • Added the reshape layer
    • Extended the interface of the softmax cross-entropy loss layer to support input tensors of arbitrary dimensions
    • Added sigmoid cross-entropy criterion
    • Added truncated Gaussian initializer for tensors
    • Extended support for distributed computing by adding the objective function with pre-computed characteristics
    • Improved performance of neural network layers used in topologies such as AlexNet
  • Added more samples to demonstrate the usage of this library. You can find and download the latest samples from: Intel® Data Analytics Acceleration Library Code Samples
Update 1
  • Added the K-Nearest Neighbors (KNN) algorithm for batch computing mode
  • Added distributed processing mode for neural network training to support distributed parallel data processing
  • Introduced diagonal variance-covariance matrices in EM GMM and controls for treating degenerate covariance matrices
  • Introduced the k-means++ and k-means|| initialization methods for K-Means clustering (see the sketch after this list)
  • Introduced the Gaussian initializer for neural network model parameters (weights and biases) initialization
  • Introduced min-max normalization algorithm
  • Added support for multiple ground-truth tensors and multiple result tensors for the neural network training and inference stages, respectively
  • Added optional arguments and results in the SGD solver to enable computation resumption from a paused state
  • Added support for merging numeric tables by rows
  • Added support for symmetric and triangular packed numeric tables in Java
  • Performance improvements for the following functions:
    • Neural network training and inference, including support for batch mode on the inference stage
    • Local response normalization layer and 2D max pooling layer
    • Abs and Tanh backward layers
    • Cosine distance for result in lower triangular layout, correlation distance for result in full, lower- and upper triangular layouts
    • Lower order moments
    • z-score normalization
    • PCA 
    • Kernel functions for CSR NumericTables
    • CSV feature manager
  • Bug fixes for the following components:
    • Multi-class classifier
    • lBFGS optimization solver
    • Documentation
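For instance, the new k-means++ initialization can seed K-Means clustering as in the following minimal C++ sketch; the data values are illustrative, and identifier names follow the K-Means examples shipped with the library (e.g. kmeans::init::plusPlusDense).

    #include "daal.h"

    using namespace daal;
    using namespace daal::algorithms;
    using namespace daal::data_management;

    int main()
    {
        const size_t nClusters = 2, nIterations = 10;
        /* Toy data set: two well-separated groups of 2-D points */
        float x[] = {0.0f, 0.1f, 0.2f, 0.0f, 5.0f, 5.1f, 5.2f, 5.0f};
        NumericTablePtr data(new HomogenNumericTable<float>(x, 2, 4));

        /* k-means++ seeding via the plusPlusDense initialization method */
        kmeans::init::Batch<float, kmeans::init::plusPlusDense> init(nClusters);
        init.input.set(kmeans::init::data, data);
        init.compute();
        NumericTablePtr centroids = init.getResult()->get(kmeans::init::centroids);

        /* Run K-Means from the k-means++ seeds */
        kmeans::Batch<> algorithm(nClusters, nIterations);
        algorithm.input.set(kmeans::data, data);
        algorithm.input.set(kmeans::inputCentroids, centroids);
        algorithm.compute();
        NumericTablePtr finalCentroids =
            algorithm.getResult()->get(kmeans::centroids);
        return 0;
    }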
Initial Release
  • Introduced the Python programming language API
  • Introduced Neural Networks functionality
    • Uniform and Xavier initialization methods
    • Layers
      • Two-dimensional convolutional
      • One-, two-, and three-dimensional max pooling
      • One-, two-, and three-dimensional average pooling
      • Spatial pyramid pooling, stochastic pooling and locally connected layers
      • Fully connected
      • Dropout
      • Logistic
      • Hyperbolic tangent
      • Rectifier Linear Unit (ReLU)
      • Parametric Rectifier Linear Unit (pReLU)
      • Smooth Rectifier Linear Unit (smooth ReLU)
      • Softmax with cross-entropy loss
      • Absolute value (abs)
      • Batch normalization
      • Local response normalization
      • Local contrast normalization
      • Concat
      • Split
    • Optimization solvers
      • Stochastic gradient descent
      • Mini-batch stochastic gradient descent
      • Stochastic limited memory Broyden–Fletcher–Goldfarb–Shanno (lBFGS)
      • Mini-batch Adagrad optimization solver
    • Objective functions
      • Mean squared error (MSE)
    • Tensor: Support multiple data layouts, axes control, and computation of tensor size
    • Other: Support for user-defined memory allocation to store layer results in Neural Networks
  • Added the ridge linear regression algorithm in batch, online, and distributed processing modes (see the sketch after this list)
  • Added support for quality metrics for linear regression
  • Added z-score normalization
  • Improved performance for QR, SVD, PCA, variance-covariance, linear regression, Expectation Maximization (EM) for Gaussian Mixture Models (GMM), K-means, and the Naïve Bayes algorithms on the 2nd generation of Intel® Xeon Phi™ processors (codenamed Knights Landing), as well as on the Intel® Xeon® E5-xxxx v3 (codenamed Haswell) and the Intel® Xeon® E5-xxxx v4 (codenamed Broadwell) processors. 
  • Bug fixes and other improvements in the library and its documentation
  • The Intel DAAL User's Guide and the API documentation are available for online browsing and have been removed from the installer packages
  • Intel DAAL samples are now available as an online download and have been removed from the installer packages
  • Support removed for installation on IA-32 architecture hosts. The 32-bit library continues to exist and can be used on Intel® 64 architecture hosts.
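A minimal batch-mode sketch of the new ridge regression algorithm, modeled on the shipped C++ examples, follows; the toy data values are illustrative, and identifier names (ridge_regression::training::dependentVariables, and so on) should be checked against the release's examples.

    #include "daal.h"

    using namespace daal;
    using namespace daal::algorithms;
    using namespace daal::data_management;

    int main()
    {
        /* Toy data set: 4 observations, 2 features (values are illustrative) */
        float x[] = {1.0f, 2.0f, 2.0f, 1.0f, 3.0f, 3.0f, 4.0f, 5.0f};
        float y[] = {3.1f, 2.9f, 6.2f, 9.0f};
        NumericTablePtr trainData(new HomogenNumericTable<float>(x, 2, 4));
        NumericTablePtr trainY(new HomogenNumericTable<float>(y, 1, 4));

        /* Train the ridge regression model in batch mode */
        ridge_regression::training::Batch<> training;
        training.input.set(ridge_regression::training::data, trainData);
        training.input.set(ridge_regression::training::dependentVariables, trainY);
        training.compute();
        services::SharedPtr<ridge_regression::Model> model =
            training.getResult()->get(ridge_regression::training::model);

        /* Predict on the training data for illustration */
        ridge_regression::prediction::Batch<> prediction;
        prediction.input.set(ridge_regression::prediction::data, trainData);
        prediction.input.set(ridge_regression::prediction::model, model);
        prediction.compute();
        return 0;
    }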

2016

Installation Guide | System Requirements | Bug Fix Log

Update 4
  • Fixed a bug in the SVM example related to the size of the training data set
  • Fixed a bug in the C++ examples on OS X* 10.11.4
  • Other minor bug fixes and improvements in the documentation
Update 3
  • Fixed a bug in the initialization of the Expectation-Maximization algorithm
  • Added examples of using the CSR format of sparse matrices with kernel functions
  • Fixed a bug in the MPI samples for linear regression
  • Fixed a memory leak in AOS NumericTables
  • Other minor bug fixes and improvements in the documentation
Update 2
  • Improved numerical stability and error handling for EM GMM algorithm.
  • Performance improvements for multi-class classifiers, SVM, kernel functions, Apriori, and ALS algorithms.
  • Introduced support for Sorting algorithm in batch processing mode.
  • Introduced support for the CSR data layout format in the initialization phase of the K-Means algorithm.
  • Bug fixes and other improvements in the library and its documentation. 
Update 1
  • Introduced support for Alternating Least Squares algorithm in batch and distributed processing modes.
  • Added support for the compressed sparse row (CSR) sparse matrix storage format in the Principal Component Analysis, Naïve Bayes, and K-Means algorithms (see the sketch after this list).
  • Introduced new features in Data Management component:
    • Data loading from the Data Source into several numeric tables
    • Data loading with an unknown number of feature vectors
    • Performance improvements in data serialization and deserialization
  • Bug fixes and other improvements in the library and its documentation. 
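Sparse input in the CSR format is supplied through a CSRNumericTable, as in this minimal C++ sketch; the constructor signature and the default one-based indexing follow the library's CSR examples and should be verified against the data management headers.

    #include "daal.h"

    using namespace daal::data_management;

    int main()
    {
        /* 3x3 sparse matrix in CSR format with one-based indexing
           (the library's default, as used in the CSR examples):
             | 1 0 2 |
             | 0 3 0 |
             | 4 0 5 |                                              */
        float  values[]     = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
        size_t colIndices[] = {1, 3, 2, 1, 3};  /* one-based column indices */
        size_t rowOffsets[] = {1, 3, 4, 6};     /* nRows + 1 entries        */

        CSRNumericTable table(values, colIndices, rowOffsets,
                              3 /* nColumns */, 3 /* nRows */);
        /* The table can now be passed as input to PCA, Naïve Bayes, or K-Means */
        return 0;
    }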
Initial Release
  • C++ and Java programming languages API.
  • Optimized performance for a range of Intel architectures, including Intel® Xeon®, Intel® Core™, and Intel® Atom™.
  • Data mining and analysis algorithms for
    • Computing correlation distance and cosine distance
    • PCA (Correlation, SVD)
    • Matrix decomposition (SVD, QR, Cholesky)
    • Computing statistical moments (see the usage sketch at the end of this release's feature list)
    • Computing variance-covariance and correlation matrices
    • Computing quantiles
    • Univariate and multivariate outlier detection
    • Association rule mining
    • Linear and RBF kernel functions
  • Algorithms for supervised and unsupervised machine learning:
    • Linear regressions
    • Naïve Bayes classifier
    • AdaBoost, LogitBoost, and BrownBoost classifiers
    • SVM classifier
    • K-Means clustering
    • Expectation Maximization (EM) for Gaussian Mixture Models (GMM)
    • Support for validation metrics for classifiers, including Confusion Matrix, Accuracy, Precision, Recall, and F-score.
  • Support for batch, online, and distributed processing modes:
    • Algorithms supporting batch processing: All
    • Algorithms supporting online processing: Statistical moments, Variance-covariance matrix, Correlation matrix, SVD, QR, PCA, Linear regression, Naïve Bayes
    • Algorithms supporting distributed processing: Statistical moments, Variance-covariance matrix, Correlation matrix, SVD, QR, PCA, Linear regression, Naïve Bayes, K-Means
  • Support for local and distributed data sources:
    • In-file and in-memory CSV
    • MySQL
    • HDFS
    • Support for Resilient Distributed Dataset (RDD) objects for Apache Spark*.
  • Data compression and decompression:
    • ZLIB
    • LZO
    • RLE
    • BZIP2
  • Data serialization and deserialization.    
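To illustrate the batch API introduced in this release, here is a minimal C++ sketch that loads a CSV file and computes statistical moments; the file name is a placeholder, and identifier names follow the low order moments examples shipped with the library.

    #include "daal.h"

    using namespace daal;
    using namespace daal::algorithms;
    using namespace daal::data_management;

    int main()
    {
        /* Load a CSV file into a numeric table ("data.csv" is a placeholder) */
        FileDataSource<CSVFeatureManager> dataSource("data.csv",
            DataSource::doAllocateNumericTable,
            DataSource::doDictionaryFromContext);
        dataSource.loadDataBlock();

        /* Compute statistical moments in batch processing mode */
        low_order_moments::Batch<> algorithm;
        algorithm.input.set(low_order_moments::data, dataSource.getNumericTable());
        algorithm.compute();

        services::SharedPtr<low_order_moments::Result> result = algorithm.getResult();
        NumericTablePtr means     = result->get(low_order_moments::mean);
        NumericTablePtr variances = result->get(low_order_moments::variance);
        return 0;
    }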
For more complete information about compiler optimizations, see our Optimization Notice.