Developer Guide

Details

Given $n$ feature vectors of dimension $p$, $X = \{ x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np}) \}$, and a vector of class labels $y = (y_1, \ldots, y_n)$, where $y_i \in \{0, 1, \ldots, C - 1\}$ describes the class to which the feature vector $x_i$ belongs and $C$ is the number of classes, the problem is to build a decision forest classifier.
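For a concrete picture of this setup, here is a hypothetical toy dataset in NumPy (illustrative values only, not part of the library's API):

    import numpy as np

    # n = 6 feature vectors of dimension p = 2, with C = 2 classes.
    X = np.array([[0.1, 1.2], [0.3, 0.8], [1.5, 0.2],
                  [1.7, 0.4], [0.2, 1.0], [1.6, 0.1]])
    y = np.array([0, 0, 1, 1, 0, 1])  # each y_i is in {0, 1, ..., C - 1}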

Training Stage

The decision forest classifier follows the algorithmic framework of decision forest training, with the Gini impurity used as the impurity metric. It is calculated as follows:

$I_{Gini}(D) = 1 - \sum_{i=0}^{C-1} p_i^2,$

where $p_i$ is the fraction of observations in the subset $D$ that belong to the $i$-th class.
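As an illustration, here is a minimal NumPy sketch of this metric; the helper name gini_impurity is hypothetical, and this is not the library's implementation:

    import numpy as np

    def gini_impurity(labels, n_classes):
        # I_Gini(D) = 1 - sum_i p_i^2, where p_i is the fraction of
        # observations in the subset D that belong to the i-th class.
        counts = np.bincount(labels, minlength=n_classes)
        p = counts / labels.size
        return 1.0 - np.sum(p ** 2)

    print(gini_impurity(np.array([0, 0, 0, 0]), 2))  # 0.0 (pure subset)
    print(gini_impurity(np.array([0, 1, 0, 1]), 2))  # 0.5 (evenly mixed)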

Prediction Stage

Given the decision forest classifier and vectors $x_1, \ldots, x_r$, the problem is to calculate the labels for those vectors. To solve the problem for each given query vector $x_i$, the algorithm finds the leaf node in each tree in the forest that gives the classification response of that tree. The forest chooses the label $y$ that the majority of the trees in the forest vote for.
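A minimal sketch of this voting step, assuming the per-tree leaf responses for the query vectors have already been collected into an array (the helper forest_predict is hypothetical, not the library's implementation):

    import numpy as np

    def forest_predict(tree_responses, n_classes):
        # tree_responses: shape (n_trees, r); entry (t, i) is the class
        # label that the leaf node of tree t assigns to query vector x_i.
        n_trees, r = tree_responses.shape
        votes = np.zeros((n_classes, r), dtype=int)
        for labels in tree_responses:          # one row of responses per tree
            votes[labels, np.arange(r)] += 1   # each tree votes once per query
        return votes.argmax(axis=0)            # majority label per query vector

    responses = np.array([[0, 1], [0, 1], [1, 1]])  # 3 trees, r = 2 queries
    print(forest_predict(responses, n_classes=2))   # [0 1]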

Out-of-bag Error

The decision forest classifier follows the algorithmic framework for calculating the decision forest out-of-bag (OOB) error, where the out-of-bag predictions of all trees are aggregated and the OOB error of the decision forest is calculated as follows (an illustrative sketch appears after this list):
  • For each vector $x_i$ in the dataset $X$, predict its label $\hat{y}_i$ by having the majority of votes from the trees that contain $x_i$ in their OOB set, and vote for that label.
  • Calculate the OOB error of the decision forest $T$ as the average of misclassifications:

    $OOB(T) = \frac{1}{|D'|} \sum_{x_i \in D'} I(y_i \neq \hat{y}_i),$

    where $D'$ is the set of vectors that belong to the OOB set of at least one tree in the forest.
  • If the OOB error value per observation is required, calculate the prediction error for $x_i$ as $OOB(x_i) = I(y_i \neq \hat{y}_i)$.
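A minimal sketch of this aggregation, assuming the OOB votes have already been tallied into an $n \times C$ matrix (the helper oob_error is hypothetical):

    import numpy as np

    def oob_error(y, oob_votes):
        # oob_votes: shape (n, C); entry (i, c) counts the trees that contain
        # x_i in their OOB set and vote for class c. Vectors that appear in
        # every bootstrap sample receive no votes and are excluded.
        voted = oob_votes.sum(axis=1) > 0
        y_hat = oob_votes[voted].argmax(axis=1)      # majority-vote prediction
        per_obs = (y[voted] != y_hat).astype(float)  # I(y_i != y_hat_i)
        return per_obs.mean(), per_obs

    y = np.array([0, 1, 1, 0])
    votes = np.array([[3, 1], [2, 1], [1, 3], [0, 0]])  # last vector never OOB
    err, per_obs = oob_error(y, votes)
    print(err, per_obs)  # 0.3333... [0. 1. 0.] -- one OOB misclassification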

Variable Importance

The library computes the Mean Decrease Impurity (MDI) importance measure, also known as the Gini importance or Mean Decrease Gini, by using the Gini index as the impurity metric.
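A minimal sketch of the MDI computation, assuming each tree is represented as a list of its internal-node splits; the (feature, sample_fraction, impurity_decrease) node layout and the helper mdi_importance are hypothetical:

    import numpy as np

    def mdi_importance(trees, n_features):
        # For every internal node, weight the Gini impurity decrease of its
        # split by the fraction of training samples reaching that node,
        # accumulate the result per split feature, then average over trees.
        importance = np.zeros(n_features)
        for tree in trees:
            for feature, sample_fraction, impurity_decrease in tree:
                importance[feature] += sample_fraction * impurity_decrease
        return importance / len(trees)

    # Two toy trees; each internal node is (feature, fraction, decrease):
    trees = [[(0, 1.0, 0.30), (1, 0.40, 0.10)],
             [(0, 1.0, 0.25)]]
    print(mdi_importance(trees, n_features=2))  # [0.275 0.02 ]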

Training Alternative

If you already have a set of precomputed values for the nodes in each tree, you can use the Model Builder class to get a trained Intel DAAL Decision Forest Classification model based on that external model.
For general information on using the Model Builder class, see Training and Prediction. For details on using the Model Builder class for Decision Forest Classification, see Usage of training alternative.
