
# Details

Given $n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of size $p$, their non-negative sample weights $w = (w_1, \ldots, w_n)$, and the vector of responses $y = (y_1, \ldots, y_n)$, the problem is to build a decision tree.

## Split Criteria

The library provides the decision tree classification algorithm based on the Gini index [Breiman84] and information gain [Quinlan86], [Mitchell97] split criteria. See Classification: Decision tree > Details > Split criteria for more information.
The library also provides the decision tree regression algorithm based on the mean-squared error (MSE) [Breiman84]. See Regression: Decision tree > Details > Split Criterion for details.
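To make the three criteria concrete, here is a minimal sketch of how each one scores a set of observations. It is not the library's implementation: the function names (`gini`, `entropy`, `information_gain`, `mse`) are hypothetical, sample weights are ignored, and the base-2 logarithm for entropy is an assumption.

```python
import math
from collections import Counter


def class_shares(labels):
    """Fraction of observations that belong to each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini index: 1 - sum_k p_k^2 (equals 0 for a pure node)."""
    return 1.0 - sum(p * p for p in class_shares(labels))


def entropy(labels):
    """Shannon entropy using a base-2 logarithm (an assumption)."""
    return -sum(p * math.log2(p) for p in class_shares(labels))


def information_gain(parent, partitions):
    """Parent entropy minus the size-weighted entropy of its partitions."""
    n = len(parent)
    return entropy(parent) - sum(len(part) / n * entropy(part) for part in partitions)


def mse(responses):
    """Mean-squared error of responses around their mean (regression criterion)."""
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

For example, splitting `["a", "a", "b", "b"]` into `["a", "a"]` and `["b", "b"]` yields an information gain of 1.0, because both partitions are pure.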

## Types of Tests

The library induces decision trees with the following types of tests:
1. For continuous features, the test has the form $f_j < \text{constant}$, where $f_j$ is a feature, $j \in \{1, \ldots, p\}$. While enumerating all possible tests for each continuous feature, the constant can be any threshold midway between consecutive values in the sorted list of unique values of feature $f_j$ among the observations that reach the node.
2. For categorical features, the test has the form $f_j = \text{constant}$, where $f_j$ is a feature, $j \in \{1, \ldots, p\}$. While enumerating all possible tests for each categorical feature, the constant can be any value of feature $f_j$ among the observations that reach the node.
3. For ordinal features, the test has the form $f_j < \text{constant}$, where $f_j$ is a feature, $j \in \{1, \ldots, p\}$. While enumerating all possible tests for each ordinal feature, the constant can be any unique value of feature $f_j$ among the observations that reach the node, except the smallest one (the first in ascending order).
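The following sketch enumerates the candidate constants for each test type, given the values of one feature among the observations that reach a node. The function names are hypothetical and the code only illustrates the rules listed above; it is not the library's code.

```python
def continuous_thresholds(values):
    """Candidates for f_j < t: midpoints between consecutive sorted unique values."""
    uniq = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]


def categorical_constants(values):
    """Candidates for f_j == c: every distinct value, in first-seen order."""
    return list(dict.fromkeys(values))


def ordinal_constants(values):
    """Candidates for f_j < c: every unique value except the smallest one."""
    return sorted(set(values))[1:]
```

The smallest ordinal value is skipped because the test $f_j < \min f_j$ is false for every observation and therefore produces no split.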

## Post-pruning

Optionally, the decision tree can be post-pruned using $m$ feature vectors $x_1^{\text{pruning}} = (x_{11}^{\text{pruning}}, \ldots, x_{1p}^{\text{pruning}}), \ldots, x_m^{\text{pruning}} = (x_{m1}^{\text{pruning}}, \ldots, x_{mp}^{\text{pruning}})$ of size $p$, together with a vector of class labels $y^{\text{pruning}} = (y_1^{\text{pruning}}, \ldots, y_m^{\text{pruning}})$ for classification or a vector of responses $y^{\text{pruning}} = (y_1^{\text{pruning}}, \ldots, y_m^{\text{pruning}})$ for regression. For more details about pruning, see [Quinlan87].
The pruning dataset can be a fraction of the original training dataset (for example, a randomly chosen 30% of observations), but in this case those observations must be excluded from the training dataset.
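As an illustration of holding out a pruning dataset, the sketch below shuffles observation indices and returns two disjoint index lists, so no pruning observation is also used for training. The function name `split_for_pruning` and the default 30% fraction are assumptions taken from the example above, not library parameters.

```python
import random


def split_for_pruning(n, fraction=0.3, seed=0):
    """Return disjoint (train_idx, prune_idx) lists covering range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # reproducible random order
    m = round(n * fraction)                # size of the pruning dataset
    return sorted(idx[m:]), sorted(idx[:m])
```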

## Training Stage

The library uses the following algorithmic framework for the training stage.
The decision tree grows recursively from the root node, which corresponds to the entire training dataset. This process takes into account the pre-pruning parameters: maximum tree depth and minimum number of observations in a leaf node. For each feature, every possible test is evaluated according to the given split criterion. The best test partitions the feature space into a set of hypercubes, and each hypercube contains the part of the training dataset used to construct the corresponding node at the next level of the decision tree.
After the decision tree is built, it can optionally be pruned by Reduced Error Pruning (REP) [Quinlan87] to avoid overfitting. REP assumes that there is a separate pruning dataset, each observation of which is passed through the original (unpruned) tree to get a prediction. For every non-leaf subtree, REP examines the change in mispredictions over the pruning dataset that would occur if this subtree were replaced by the best possible leaf:

$$\Delta E = E_{\text{leaf}} - E_{\text{subtree}}$$

where
• $E_{\text{subtree}}$ is the number of errors (for classification) or the mean-squared error (MSE) (for regression) for a given subtree
• $E_{\text{leaf}}$ is the number of errors (for classification) or the MSE (for regression) for the best possible leaf, which replaces the given subtree.
If the new tree gives an equal or smaller number of mispredictions ($\Delta E \le 0$) and the subtree contains no subtree with the same property, the subtree is replaced by the leaf. The process continues until any further replacement would increase mispredictions over the pruning dataset. The final tree is the most accurate subtree of the original tree with respect to the pruning dataset and is the smallest tree with that accuracy.
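A minimal sketch of REP for classification follows. Several choices here are assumptions rather than the library's design: the tree is a hypothetical nested `dict` (a leaf is `{"leaf": label}`, a split is `{"feature": j, "threshold": t, "left": ..., "right": ...}` with only `<` tests), the "best possible leaf" is taken to be the majority class of the pruning observations that reach the node, and nodes reached by no pruning observation are left unchanged. Children are pruned before their parent, which enforces the rule that a subtree is replaced only when it contains no prunable subtree.

```python
from collections import Counter


def predict_one(node, x):
    """Follow the x[j] < t tests down to a leaf label."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
    return node["leaf"]


def rep_prune(node, X, y):
    """Bottom-up Reduced Error Pruning; X, y are the pruning observations reaching node."""
    if "leaf" in node or not y:
        return node
    j, t = node["feature"], node["threshold"]
    left = [i for i in range(len(y)) if X[i][j] < t]
    right = [i for i in range(len(y)) if X[i][j] >= t]
    # Prune the children first so that deeper prunable subtrees are handled before this node.
    node = dict(node,
                left=rep_prune(node["left"], [X[i] for i in left], [y[i] for i in left]),
                right=rep_prune(node["right"], [X[i] for i in right], [y[i] for i in right]))
    e_subtree = sum(predict_one(node, xi) != yi for xi, yi in zip(X, y))
    best_label, hits = Counter(y).most_common(1)[0]
    e_leaf = len(y) - hits
    if e_leaf - e_subtree <= 0:  # Delta E <= 0: the leaf is at least as accurate
        return {"leaf": best_label}
    return node
```

For regression, the error counts would be replaced by the MSE of the subtree and of a leaf predicting the mean pruning response.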
The training procedure contains the following steps:
1. Grow the decision tree (subtree):
• If all observations have the same class label (for classification) or the same value of the dependent variable (for regression), or the pre-pruning parameters disallow further growth of the decision tree, construct a leaf node.
• Otherwise:
  • For each feature, sort the feature values and evaluate the appropriate split criterion for every possible test (see Split Criteria and Types of Tests for details).
  • Construct a node with the test that gives the best split criterion value.
  • Partition the observations according to the outcomes of the found test and recursively grow a decision subtree for each partition.
2. Post-prune the decision tree (see Post-pruning for details).
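The growing step can be sketched as a short recursive function. This is an illustrative assumption, not the library's algorithm: it handles only continuous features with midpoint thresholds, uses the Gini index (minimizing the size-weighted child impurity, which is equivalent to maximizing the impurity decrease), ignores sample weights, and uses hypothetical `max_depth` and `min_leaf` arguments for the two pre-pruning parameters. The resulting tree uses the same hypothetical `dict` layout as the REP sketch.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow(X, y, depth=0, max_depth=3, min_leaf=1):
    """Recursively grow a classification tree with x[j] < t tests."""
    majority = Counter(y).most_common(1)[0][0]
    # Leaf: the node is pure or the depth limit is reached.
    if len(set(y)) == 1 or depth >= max_depth:
        return {"leaf": majority}
    best = None  # (weighted child impurity, feature, threshold)
    for j in range(len(X[0])):
        uniq = sorted(set(row[j] for row in X))
        for a, b in zip(uniq, uniq[1:]):
            t = (a + b) / 2
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # violates the minimum leaf size
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return {"leaf": majority}
    _, j, t = best
    li = [i for i, row in enumerate(X) if row[j] < t]
    ri = [i for i, row in enumerate(X) if row[j] >= t]
    return {"feature": j, "threshold": t,
            "left": grow([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_leaf),
            "right": grow([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_leaf)}
```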

## Prediction Stage

The library uses the following algorithmic framework for the prediction stage.
Given the decision tree and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors.
To solve the problem for each given vector $x_i$, the algorithm passes $x_i$ through the tests in the split nodes, starting from the root, until it reaches a leaf node, which contains the predicted response.
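The traversal can be sketched in a few lines. The nested `dict` tree layout and the `predict` name are hypothetical, and only `<` tests (continuous and ordinal features) are shown; a categorical node would compare with `==` instead.

```python
def predict(tree, vectors):
    """Return one response per input vector by walking from the root to a leaf."""
    responses = []
    for x in vectors:
        node = tree
        while "leaf" not in node:
            # Test f_j < threshold: true goes left, false goes right.
            node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
        responses.append(node["leaf"])
    return responses
```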
