Decision Forest
Details
Given n feature vectors
of
n
p
dimensional feature vectors and n responses
, the problem is to build a decision forest
classification or regression model.Training Stage
Library uses the following algorithmic framework for the training
stage. Let
be the set of observations. Given a positive
integer parameters, such as the number of trees
B
, the bootstrap
parameter
, where f
is a fraction of observations used for
a training of one tree, and the number of features per node m
, the
algorithm does the following for
: Selects randomly with replacement the set ofNvectors from the setS. The set is called abootstrapset.
 Trains a decision tree classifier on using parametermfor each tree.
Decision tree
T
is trained using the training set D
of size N
.
Each node t
in the tree corresponds to the subset
of
the training set D
, with the root node being D
itself. Its
internal nodes t
represent a binary test (split) dividing their
subset
in two subsets
and
,
corresponding to their children
and
.Inexact Histogram Computation Method
In inexact histogram method only a selected subset of splits is considered for computation of a best split.
This subset is computed for each feature at the initialization stage of the algorithm. After the set of splits is computed,
each value from initially provided data is substituted with the value of the corresponding bin.
The bins are continuous intervals between the selected splits.
Split Criteria
The metric for measuring the best split is called
impurity
,
. It generally reflects the homogeneity of responses within
the subset
in the node t
.
For the detailed definition of
metrics, see the description
of a specific algorithm.Let the
impurity decrease
in the node t
beTermination Criteria
The library supports the following termination criteria of
decision forest training:
 Minimal number of observations in a leaf node
 Nodetis not processed if is smaller than the predefined value. Splits that produce nodes with the number of observations smaller than that value are not allowed.
 Minimal number of observations in a split node
 Nodetis not processed if is smaller than the predefined value. Splits that produce nodes with the number of observations smaller than that value are not allowed.
 Minimum weighted fraction of the sum total of weights of all the input observations required to be at a leaf node
 Nodetis not processed if is smaller than the predefined value. Splits that produce nodes with the number of observations smaller than that value are not allowed.
 Maximal tree depth
 Nodetis not processed if its depth in the tree reached the predefined value.
 Impurity threshold
 Nodetis not processed if its impurity is smaller than the predefined threshold.
 Maximal number of leaf nodes
 Grow trees with positive maximal number of leaf nodes in a bestfirst fashion. Best nodes are defined by relative reduction in impurity. If maximal number of leaf nodes equals zero, then this criterion does not limit the number of leaf nodes, and trees grow in a depthfirst fashion.
Tree Building Strategies
Maximal number of leaf nodes defines the strategy of tree building:
depthfirst or bestfirst.
Depthfirst Strategy
If maximal number of leaf nodes equals zero, a decision tree is built using depthfirst strategy.
In each terminal node
t
, the following recursive procedure is applied: Stop if the termination criteria are met.
 Choose randomly without replacementmfeature indices .
 For each , find the best split that partitions subset and maximizes impurity decrease .
 A node is a split if this split induces a decrease of the impurity greater than or equal to the predefined value. Get the best split that maximizes impurity decrease in all splits.
 Apply this procedure recursively to and .
Bestfirst Strategy
If maximal number of leaf nodes is positive, a decision tree is built using bestfirst strategy.
In each terminal node
t
, the following steps are applied: Stop if the termination criteria are met.
 Choose randomly without replacementmfeature indices .
 For each , find the best split that partitions subset and maximizes impurity decrease .
 A node is a split if this split induces a decrease of the impurity greater than or equal to the predefined value and the number of split nodes is less or equal to . Get the best split that maximizes impurity decrease in all splits.
 Put a node into a sorted array, where sort criterion is the improvement in impurity . The node with maximal improvement is the first in the array. For a leaf node, the improvement in impurity is zero.
 Apply this procedure to and and grow a tree one by one getting the first element from the array until the array is empty.
Random Numbers Generation
To create a
bootstrap
set and choose feature indices in the
performant way, the training algorithm requires the source of
random numbers, capable to produce sequences of random numbers in
parallel.Initialization of the engine in the decision forest is based on
the scheme below:
The state of the engine is updated once the training of the
decision forest model is completed. The library provides support
to retrieve the instance of the engine with updated state that can
be used in other computations. The update of the state is
enginespecific and depends on the parallelization technique used
as defined earlier:
 Family: the updated state is the set of states that represent individual engines in the family.
 Leapfrog: the updated state is the state of the sequence with the rightmost position on the sequence. The example below demonstrates the idea for case of 2 subsequences (‘x’ and ‘o’) of the random number sequence:
 SkipAhead: the updated state is the state of the independent sequence with the rightmost position on the sequence. The example below demonstrates the idea for case of 2 subsequences (‘x’ and ‘o’) of the random number sequence:
Prediction Stage
Given decision forest classifier and vectors
,
the problem is to calculate the responses for those vectors. To
solve the problem for each given query vector
, the
algorithm finds the leaf node in a tree in the forest that gives
the response by that tree. The response of the forest is based on
an aggregation of responses from all trees in the forest. For the
detailed definition, see the description of a specific algorithm.
Additional Characteristics Calculated by the Decision Forest
Decision forests can produce additional characteristics, such as
an estimate of generalization error and an importance measure
(relative decisive power) of each of p features (variables).
Outofbag Error
The estimate of the generalization error based on the training
data can be obtained and calculated as follows:
 For each tree in the forest, trained on the bootstrap set , the set is called the outofbag (OOB) set.
 Predict the data from set by .
 For each vector in the dataset X, predict its response by the trees that contain in their OOB set.
 Aggregate the outofbag predictions in all trees and calculate the OOB error of the decision forest.
 If OOB error value per each observation is required, then calculate the prediction error for .
For the detailed definition, see the description of a specific
algorithm.
Variable Importance
There are two main types of variable importance measures:
 Mean Decrease Impurityimportance (MDI).
Importance of thejth variable for predictingYis the sum of weighted impurity decreases for all nodestthat use , averaged over allBtrees in the forest:where is the fraction of observations reaching nodetin the tree , and is the index of the variable used in split .
 Mean Decrease Accuracy(MDA).
Importance of thejth variable for predictingYis the average increase in the OOB error over all trees in the forest when the values of thejth variable are randomly permuted in the OOB set. For that reason, this latter measure is also known aspermutation importance.In more details, the library calculates MDA importance as follows:
Let be the set of feature vectors where thejth variable is randomly permuted over all vectors in the set. Let be the OOB error calculated for on its outofbag dataset . Let be the OOB error calculated for using , and its outofbag dataset is permuted on thejth variable. Then
is the OOB error increase for the tree . is MDA importance. , where is the variance of
Batch Processing
Decision forest classification and regression follows the general
workflow described in Classification Usage Model.
Training
At the training stage, decision forest regression has the following parameters:
Parameter  Default Value  Description 

nTrees  100  The number of trees in the forest. 
observationsPerTreeFraction  1  Fraction of the training set S used to form the bootstrap set for a
single tree training,
. The
observations are sampled randomly with replacement. 
featuresPerNode  0  The number of features tried as possible splits per node. If the
parameter is set to 0 , the library uses the square root of the number of
features,
, for classification
and
features for regression. 
maxTreeDepth  0  Maximal tree depth. Default is 0 (unlimited). 
DEPRECATED: seed  777  The seed for random number generator, which is used to choose the
bootstrap set, split features in every split node in a tree, and
generate permutation required in computations of MDA variable importance.This parameter is deprecated and will be removed in future releases. Use engine instead. 
engine  SharePtr< engines:: mt2203:: Batch>()  Pointer to the random number generator engine. The random numbers produced by this engine are used to choose the bootstrap set,
split features in every split node in a tree, and generate permutation required in computations
of MDA variable importance. 
impurityThreshold  0  The threshold value used as stopping criteria: if the impurity value in
the node is smaller than the threshold, the node is not split anymore. 
varImportance  none  The variable importance computation mode. Possible values:

resultsToCompute  0  The 64bit integer flag that specifies which extra characteristics of
the decision forest to compute. Provide one of the following values to
request a single characteristic or use bitwise OR to request a
combination of the characteristics:

bootstrap  true  If true, the training set for a tree is a bootstrap of the whole training set.
If false, the whole training set is used to build trees. 
minObservationsInLeafNode  1 for classification, 5 for regression  Minimum number of observations in the leaf node. 
minObservationsInSplitNode  2  Minimum number of samples required to split an internal node; it can be any nonnegative number. 
minWeightFractionInLeafNode  0.0  Minimum weighted fraction of the sum total of weights of all the input observations required to be at a leaf node,
from 0.0 to 0.5 .All observations have equal weights if the weights of the observations are not provided. 
minImpurityDecreaseInSplitNode  0.0  Minimum amount of impurity decrease required to split a node; it can be any nonnegative number. 
maxLeafNodes  0  Grow trees with positive maximal number of leaf nodes in a bestfirst fashion.
Best nodes are defined as relative reduction in impurity.
If maximal number of leaf nodes equals zero,
then this parameter does not limit the number of leaf nodes, and trees grow in a depthfirst fashion. 
Output
In addition to regression or classifier output, decision forest
calculates the result described below. Pass the
Result ID
as a
parameter to the methods that access the result of your algorithm.Result ID  Result 

outOfBagError  A numeric table
containing outofbag error computed when the
computeOutOfBagErroroption option is on.By default, this result is an object of the HomogenNumericTable class,
but you can define the result as an object of any class derived from NumericTable . 
variableImportance  A numeric table
that contains variable importance values for each
feature. If you set the varImportance parameter to none, the library
returns a null pointer to the table.By default, this result is an object of the HomogenNumericTable class,
but you can define the result as an object of any class derived from NumericTable
except PackedTriangularMatrix and PackedSymmetricMatrix . 
outOfBagErrorPerObservation  A numeric table of size
that contains the computed outofbag error
when the computeOutOfBagErrorPerObservation option is enabled. The value
in the table indicates that no OOB value was computed because this
observation was not in OOB set for any of the trees in the model (never
left out during the bootstrap).By default, this result is an object of the HomogenNumericTable class,
but you can define the result as an object of any class derived from NumericTable . 
updatedEngine  Engine instance with state updated after computations. 
Performance Considerations
To get the best performance of the decision forest variable
importance computation, use the Mean Decrease Impurity (MDI) rather
than the Mean Decrease Accuracy (MDA) method.
Optimization Notice 

Intel’s compilers may or may not optimize to the same degree for
nonIntel microprocessors for optimizations that are not unique to
Intel microprocessors. These optimizations include SSE2, SSE3, and
SSSE3 instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any
optimization on microprocessors not manufactured by Intel.
Microprocessordependent optimizations in this product are intended
for use with Intel microprocessors. Certain optimizations not
specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and
Reference Guides for more information regarding the specific
instruction sets covered by this notice. Notice revision #20110804 