# Details

Given `n` feature vectors `x_1 = (x_{11}, ..., x_{1p}), ..., x_n = (x_{n1}, ..., x_{np})` of size `p`, a vector of weights `w = (w_1, ..., w_n)`, and a vector of class labels (for classification) or responses (for regression) `y = (y_1, ..., y_n)`, the problem is to build a decision tree.

## Split Criteria
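The split criteria formulas are not reproduced in this section. For classification trees, Gini impurity and information gain (entropy reduction) are the usual choices, with MSE reduction for regression; the following is a minimal sketch of the two classification criteria (the function names are illustrative, not the library's API):

```python
from collections import Counter
import math

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_gain(parent, left, right, impurity=gini):
    """Impurity decrease achieved by splitting `parent` into `left` and `right`."""
    n = len(parent)
    return (impurity(parent)
            - (len(left) / n) * impurity(left)
            - (len(right) / n) * impurity(right))
```

A lower child-impurity (equivalently, a higher gain) marks the better test; passing `impurity=entropy` to `split_gain` yields information gain instead of Gini gain.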

## Types of Tests

- For continuous features, the test has the form `f_j < constant`, where `f_j` is a feature, `j ∈ {1, ..., p}`. While enumerating all possible tests for each continuous feature, the `constant` can be any threshold placed midway between sequential sorted unique values of the given feature `f_j` that reach the node.
- For categorical features, the test has the form `f_j = constant`, where `f_j` is a feature, `j ∈ {1, ..., p}`. While enumerating all possible tests for each categorical feature, the `constant` can be any value of the given feature `f_j` that reaches the node.
- For ordinal features, the test has the form `f_j < constant`, where `f_j` is a feature, `j ∈ {1, ..., p}`. While enumerating all possible tests for each ordinal feature, the `constant` can be any unique value, except for the first one in ascending order, of the given feature `f_j` that reaches the node.
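The candidate-constant enumeration for continuous and ordinal features can be sketched as follows (function names are illustrative, not the library's API):

```python
def continuous_thresholds(values):
    """Candidate constants for a continuous-feature test f_j < constant:
    midpoints between sequential sorted unique values reaching the node."""
    uniq = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]

def ordinal_thresholds(values):
    """Candidate constants for an ordinal-feature test f_j < constant:
    every unique value except the first one in ascending order."""
    return sorted(set(values))[1:]
```

For categorical features no sorting is involved: every distinct value reaching the node yields one equality test `f_j = constant`.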

## Post-pruning

Given `m` feature vectors `x_1^pruning = (x_{11}^pruning, ..., x_{1p}^pruning), ..., x_m^pruning = (x_{m1}^pruning, ..., x_{mp}^pruning)` of size `p` and a vector of class labels `y^pruning = (y_1^pruning, ..., y_m^pruning)` for classification or a vector of responses `y^pruning = (y_1^pruning, ..., y_m^pruning)` for regression, this pruning dataset is used to post-prune the built decision tree. For more details about pruning, see [Quinlan87].
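The pruning pass over a held-out pruning set can be sketched as reduced error pruning for classification, assuming a simple dict-based node layout (hypothetical, not the library's actual data structure):

```python
from collections import Counter

# A node is either a leaf {'label': ...} or a split node
# {'feature': j, 'threshold': t, 'left': ..., 'right': ...} (hypothetical layout).

def predict(node, x):
    """Follow split-node tests until reaching a leaf; return its response."""
    while 'label' not in node:
        node = node['left'] if x[node['feature']] < node['threshold'] else node['right']
    return node['label']

def prune(node, X, y):
    """Reduced error pruning: replace a subtree with the majority-class leaf
    whenever the leaf makes no more errors on the pruning set than the subtree."""
    if 'label' in node or not X:
        return node
    go_left = [x[node['feature']] < node['threshold'] for x in X]
    left = [(x, t) for x, t, l in zip(X, y, go_left) if l]
    right = [(x, t) for x, t, l in zip(X, y, go_left) if not l]
    node['left'] = prune(node['left'], [x for x, _ in left], [t for _, t in left])
    node['right'] = prune(node['right'], [x for x, _ in right], [t for _, t in right])
    e_subtree = sum(predict(node, x) != t for x, t in zip(X, y))
    best_label = Counter(y).most_common(1)[0][0]
    e_leaf = sum(t != best_label for t in y)
    if e_leaf <= e_subtree:
        return {'label': best_label}
    return node
```

For regression the same structure applies with MSE on the pruning set in place of the error count and the mean response in place of the majority label.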

## Training Stage

The training procedure is controlled by the pre-pruning parameters *maximum tree depth* and *minimum number of observations in the leaf node*. For each feature, each possible test is examined to determine whether it is the best one according to the given split criterion. The best test is used to partition the feature space into a set of hypercubes, and each hypercube represents the appropriate part of the training dataset for constructing each node at the next level of the decision tree.

During post-pruning, the following error estimates are compared for each non-leaf subtree:

- `E_subtree` is the number of errors (for classification) or the mean-squared error (MSE) (for regression) for a given subtree.
- `E_leaf` is the number of errors (for classification) or the MSE (for regression) for the best possible leaf, which replaces the given subtree.

If `E_leaf ≤ E_subtree`, the subtree is replaced with that leaf.

- Grow the decision tree (subtree):
  - If all observations contain the same class label (for classification) or the same value of the dependent variable (for regression), or if the pre-pruning parameters disallow further tree growth, construct a leaf node.
  - Otherwise:
    - For each feature, sort the given feature values and evaluate an appropriate split criterion for every possible test (see Split Criteria and Types of Tests for details).
    - Construct a node with the test corresponding to the best split criterion value.
    - Partition observations according to the outcomes of the found test and recursively grow a decision subtree for each partition.
- Post-prune the decision tree (see Post-pruning for details).
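The growing steps above can be sketched for classification with continuous features, using midpoint thresholds and Gini impurity; the dict-based node layout and parameter names are illustrative assumptions, not the library's API:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a class-label list."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(X, y, depth=0, max_depth=5, min_leaf=1):
    """Recursively grow a classification tree (hypothetical dict-based nodes)."""
    # Pre-pruning: pure node, depth limit, or too few observations -> leaf node.
    if len(set(y)) == 1 or depth >= max_depth or len(y) <= min_leaf:
        return {'label': Counter(y).most_common(1)[0][0]}
    best = None  # (weighted child impurity, feature index, threshold)
    for j in range(len(X[0])):
        uniq = sorted(set(x[j] for x in X))
        for a, b in zip(uniq, uniq[1:]):
            t = (a + b) / 2  # midpoint between sequential unique values
            left = [yi for x, yi in zip(X, y) if x[j] < t]
            right = [yi for x, yi in zip(X, y) if x[j] >= t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:  # no valid split: all feature values identical
        return {'label': Counter(y).most_common(1)[0][0]}
    _, j, t = best
    # Partition observations by the best test and recurse on each part.
    return {
        'feature': j, 'threshold': t,
        'left': grow([x for x in X if x[j] < t],
                     [yi for x, yi in zip(X, y) if x[j] < t],
                     depth + 1, max_depth, min_leaf),
        'right': grow([x for x in X if x[j] >= t],
                      [yi for x, yi in zip(X, y) if x[j] >= t],
                      depth + 1, max_depth, min_leaf),
    }
```

This sketch omits weights, categorical and ordinal tests, and post-pruning; it only illustrates the recursive grow step.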

## Prediction Stage

Given the decision tree and vectors `x_1, ..., x_r`, the problem is to calculate the responses for those vectors. To solve the problem for each given vector `x_i`, the algorithm examines `x_i` by the tests in split nodes to find the leaf node, which contains the prediction response.
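The traversal can be sketched as follows, assuming the same hypothetical dict-based node layout used above for illustration:

```python
def predict(node, x):
    """Apply split-node tests to x until a leaf is reached; return its response."""
    while 'label' not in node:  # split nodes hold 'feature' and 'threshold'
        node = node['left'] if x[node['feature']] < node['threshold'] else node['right']
    return node['label']

# A two-leaf tree splitting on feature 0 at threshold 1.5.
tree = {'feature': 0, 'threshold': 1.5,
        'left': {'label': 0}, 'right': {'label': 1}}
```

Each vector follows exactly one root-to-leaf path, so prediction cost is bounded by the tree depth.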