Developer Guide

Details

Given a set X = {x_1 = (x_{11}, …, x_{1p}), …, x_n = (x_{n1}, …, x_{np})} of n p-dimensional feature vectors and n responses Y = {y_1, …, y_n}, the problem is to build a decision forest classification or regression model.

Training Stage

The library uses the following algorithmic framework for the training stage. Let S = (X, Y) be the set of observations. Given positive integer parameters, such as the number of trees B, the bootstrap parameter N = f * n, where f is the fraction of observations used for training of one tree, and the number of features per node m, the algorithm does the following for b = 1, …, B:
  • Selects randomly with replacement the set D_b of N vectors from the set S. The set D_b is called a bootstrap set.
  • Trains a decision tree classifier T_b on D_b using parameter m for each tree.
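As a concrete illustration of this framework (not the library's API), the sketch below builds B bootstrap sets of size N = f * n and trains one tree per set; scikit-learn's DecisionTreeClassifier stands in for the library's internal tree learner, and the function name train_decision_forest is made up for the example.

    # Illustrative training loop: one bootstrap set D_b and one tree T_b per iteration.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier  # stand-in for the internal tree learner

    def train_decision_forest(X, Y, B, f, m, seed=0):
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        N = int(f * n)                                     # bootstrap parameter N = f * n
        forest = []
        for b in range(B):
            idx = rng.integers(0, n, size=N)               # D_b: N vectors drawn with replacement from S
            tree = DecisionTreeClassifier(max_features=m)  # m features considered per node
            tree.fit(X[idx], Y[idx])                       # train T_b on D_b
            forest.append((tree, idx))                     # keep idx to derive the OOB set later
        return forest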
A decision tree T is trained using the training set D of size N. Each node t in the tree corresponds to the subset D_t of the training set D, with the root node corresponding to D itself. Each internal node t represents a binary test (split) dividing its subset X_t into two subsets X_{t_L} and X_{t_R}, corresponding to its children t_L and t_R.
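A minimal sketch of this node structure, with names chosen for the example rather than taken from the library:

    # Each node t stores its subset D_t; internal nodes also store the split and children t_L, t_R.
    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    @dataclass
    class Node:
        indices: np.ndarray                # indices of the observations in D_t
        feature: Optional[int] = None      # feature tested by the split at t (internal nodes only)
        threshold: Optional[float] = None  # observations with x[feature] < threshold go to t_L
        left: Optional["Node"] = None      # child t_L with subset X_{t_L}
        right: Optional["Node"] = None     # child t_R with subset X_{t_R}
        response: Optional[float] = None   # response stored in a leaf node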
The metric for measuring the best split is called impurity, i(t). It generally reflects the homogeneity of responses within the subset D_t in the node t. For the detailed definition of i(t) metrics, see the description of a specific algorithm.
Let the impurity decrease in the node t be

    ∆i(t) = i(t) − |D_{t_L}|/|D_t| · i(t_L) − |D_{t_R}|/|D_t| · i(t_R)

The library supports the following termination criteria of decision forest training:
  • Minimal number of observations in a leaf node. Node t is not processed if |D_t| is smaller than the predefined value. Splits that produce nodes with the number of observations smaller than that value are not allowed.
  • Maximal tree depth. Node t is not processed if its depth in the tree reaches the predefined value.
  • Impurity threshold. Node t is not processed if its impurity is smaller than the predefined threshold.
The decision tree follows the recursive procedure below, applied to each terminal node t:
  • Stop if the termination criteria are met.
  • Choose randomly without replacement m feature indices J_t ⊂ {0, 1, …, p−1}.
  • For each j ∈ J_t, find the best split s_{j,t} that partitions subset D_t and maximizes impurity decrease ∆i(t).
  • Get the best split s_t that maximizes impurity decrease ∆i among all s_{j,t} splits.
  • Apply this procedure recursively to t_L and t_R.
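The sketch below puts the termination criteria and the per-node split search together, using Gini impurity as an example i(t) for classification with integer class labels and reusing the Node class from the earlier sketch; it is an illustration of the procedure, not the library's implementation.

    # Recursive node-splitting sketch: random feature subset J_t, best split by impurity decrease.
    import numpy as np

    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def split_node(X, y, idx, m, rng, depth=0, max_depth=16, min_obs=2, min_impurity=0.0):
        i_t = gini(y[idx])
        # Stop if a termination criterion is met: leaf size, tree depth, or impurity threshold.
        if len(idx) < min_obs or depth >= max_depth or i_t <= min_impurity:
            return Node(indices=idx, response=np.bincount(y[idx]).argmax())
        # Choose randomly without replacement m feature indices J_t from {0, 1, ..., p-1}.
        J_t = rng.choice(X.shape[1], size=m, replace=False)
        best_j, best_thr, best_delta = None, None, 0.0
        for j in J_t:
            for thr in np.unique(X[idx, j])[1:]:
                left = idx[X[idx, j] < thr]
                right = idx[X[idx, j] >= thr]
                if len(left) < min_obs or len(right) < min_obs:
                    continue  # splits producing nodes smaller than the minimum are not allowed
                # Impurity decrease delta_i(t) of the candidate split s_{j,t}:
                delta = (i_t - len(left) / len(idx) * gini(y[left])
                             - len(right) / len(idx) * gini(y[right]))
                if delta > best_delta:               # keep the best split s_t over all s_{j,t}
                    best_j, best_thr, best_delta = j, thr, delta
        if best_j is None:
            return Node(indices=idx, response=np.bincount(y[idx]).argmax())
        node = Node(indices=idx, feature=best_j, threshold=best_thr)
        # Apply the same procedure recursively to the children t_L and t_R.
        node.left = split_node(X, y, idx[X[idx, best_j] < best_thr], m, rng, depth + 1,
                               max_depth, min_obs, min_impurity)
        node.right = split_node(X, y, idx[X[idx, best_j] >= best_thr], m, rng, depth + 1,
                                max_depth, min_obs, min_impurity)
        return node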
To create a bootstrap set and choose feature indices in a performant way, the training algorithm requires a source of random numbers capable of producing sequences of random numbers in parallel. See Algorithms > Analysis > Engines for the description of the respective techniques.
Initialization of the engine in the decision forest is based on the scheme below.
The state of the engine is updated once the training of the decision forest model is completed. The library provides support for retrieving the instance of the engine with the updated state, which can be used in other computations. The update of the state is engine-specific and depends on the parallelization technique used, as defined earlier:
  • Family: the updated state is the set of states that represent individual engines in the family.
  • Leapfrog: the updated state is the state of the sequence with the rightmost position on the sequence. The example below demonstrates the idea for the case of two subsequences (‘x’ and ‘o’) of the random number sequence:
    Leapfrog parallelization algorithm
  • SkipAhead: the updated state is the state of the independent sequence with the rightmost position on the sequence. The example below demonstrates the idea for the case of two subsequences (‘x’ and ‘o’) of the random number sequence:
    SkipAhead parallelization algorithm
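The difference between the two techniques can be pictured on a plain index sequence; the sketch below only illustrates how one base sequence is partitioned into two subsequences, not how the engines are actually implemented.

    # How a base random sequence can be split into two subsequences ('x' and 'o').
    import numpy as np

    base = np.arange(16)                 # stand-in for the base random number sequence
    # Leapfrog: the subsequences interleave, taking every 2nd element (x o x o ...).
    leapfrog = [base[0::2], base[1::2]]
    # SkipAhead: each subsequence skips ahead to its own contiguous block (x x ... o o ...).
    skipahead = [base[:8], base[8:]]
    # In both cases the updated engine state corresponds to the rightmost
    # position consumed on the base sequence.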

Prediction Stage

Given a decision forest classifier and vectors x_1, …, x_r, the problem is to calculate the responses for those vectors. To solve the problem for each given query vector x_i, the algorithm finds the leaf node in a tree in the forest that gives the response by that tree. The response of the forest is based on an aggregation of responses from all trees in the forest. For the detailed definition, see the description of a specific algorithm.
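For example, aggregation of the per-tree responses can be sketched as below for the forest produced by the training sketch above; majority voting is used for classification and averaging for regression, while the exact aggregation rule is defined by the specific algorithm.

    # Aggregate per-tree responses for r query vectors (illustrative, not the library's API).
    import numpy as np

    def predict_forest(forest, X_query, classification=True):
        per_tree = np.array([tree.predict(X_query) for tree, _ in forest])  # shape (B, r)
        if classification:
            # Majority vote over the B per-tree responses for each query vector x_i.
            return np.array([np.bincount(col.astype(int)).argmax() for col in per_tree.T])
        # Mean of the per-tree responses for regression.
        return per_tree.mean(axis=0)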

Additional Characteristics Calculated by the Decision Forest

Decision forests can produce additional characteristics, such as an estimate of generalization error and an importance measure (relative decisive power) of each of the p features (variables).

Out-of-bag Error

The estimate of the generalization error based on the training data can be obtained and calculated as follows:
  • For each tree T_b in the forest, trained on the bootstrap set D_b, the set S \ D_b of observations not included in D_b is called the out-of-bag (OOB) set.
  • Predict the data from the OOB set by T_b.
  • For each vector x_i in the dataset X, predict its response ŷ_i by the trees that contain x_i in their OOB set.
  • Aggregate the out-of-bag predictions in all trees and calculate the OOB error of the decision forest.
  • If the OOB error value per each observation is required, then calculate the prediction error for x_i.
For the detailed definition, see the description of a specific algorithm.
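A possible computation of the OOB error for the forest from the training sketch above, using the 0/1 classification error as an example; the aggregation and error measure are algorithm-specific.

    # OOB error sketch: each observation is predicted only by trees that did not train on it.
    import numpy as np

    def oob_error(forest, X, Y):
        n = X.shape[0]
        votes = [[] for _ in range(n)]
        for tree, idx in forest:
            oob = np.setdiff1d(np.arange(n), idx)      # OOB set: observations not in D_b
            for i, pred in zip(oob, tree.predict(X[oob])):
                votes[i].append(int(pred))             # prediction of T_b for x_i
        errors = []
        for i in range(n):
            if not votes[i]:
                continue                               # x_i appeared in every bootstrap set
            y_hat = np.bincount(votes[i]).argmax()     # aggregate OOB predictions for x_i
            errors.append(float(y_hat != Y[i]))        # per-observation OOB error
        return float(np.mean(errors))                  # OOB error of the forest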

Variable Importance

There are two main types of variable importance measures:
  • Mean Decrease Impurity importance (MDI).
    Importance of the j-th variable for predicting Y is the sum of weighted impurity decreases p(t)∆i(s_t, t) for all nodes t that use x_j, averaged over all B trees in the forest:

        MDI(x_j) = (1/B) Σ_{b=1..B} Σ_{t ∈ T_b : v(s_t) = j} p(t) ∆i(s_t, t),

    where p(t) is the fraction of observations reaching node t in the tree T_b, and v(s_t) is the index of the variable used in split s_t.
  • Mean Decrease Accuracy (MDA).
    Importance of the j-th variable for predicting Y is the average increase in the OOB error over all trees in the forest when the values of the j-th variable are randomly permuted in the OOB set. For that reason, this latter measure is also known as permutation importance.
    In more detail, the library calculates MDA importance as follows:
    • Let π(X, j) be the set of feature vectors where the j-th variable is randomly permuted over all vectors in the set.
    • Let E_b be the OOB error calculated for T_b on its out-of-bag dataset.
    • Let E_{b,j} be the OOB error calculated for T_b using its out-of-bag dataset with the j-th variable permuted. Then
      • D_{b,j} = E_{b,j} − E_b is the OOB error increase for the tree T_b.
      • Raw MDA(x_j) = (1/B) Σ_{b=1..B} D_{b,j} is the MDA importance.
      • Scaled MDA(x_j) = Raw MDA(x_j) / (σ_j / √B), where σ_j² is the variance of D_{b,j}.
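The MDA computation can be sketched as follows for a single variable j, reusing the forest and OOB conventions from the earlier sketches; the 0/1 error and the names are illustrative, not the library's API.

    # Permutation (MDA) importance sketch for the j-th variable.
    import numpy as np

    def mda_importance(forest, X, Y, j, seed=0):
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        increases = []                                      # D_{b,j} for each tree
        for tree, idx in forest:
            oob = np.setdiff1d(np.arange(n), idx)           # out-of-bag set of T_b
            e_b = np.mean(tree.predict(X[oob]) != Y[oob])   # E_b
            X_perm = X[oob].copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])    # permute the j-th variable
            e_bj = np.mean(tree.predict(X_perm) != Y[oob])  # E_{b,j}
            increases.append(e_bj - e_b)                    # D_{b,j} = E_{b,j} - E_b
        increases = np.asarray(increases)
        raw = increases.mean()                              # Raw MDA(x_j)
        scaled = raw / (increases.std(ddof=1) / np.sqrt(len(increases)))  # Scaled MDA(x_j)
        return raw, scaled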
