# Decision Tree

Decision trees partition the feature space into a set of hyperrectangles
(axis-aligned boxes), and then fit a simple model in each hyperrectangle.
The simple model can be a constant model that ignores all predictors and
predicts the majority (most frequent) class (or, for regression, the mean
of the dependent variable); this model is also known as 0-R or the
constant classifier.
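As a minimal sketch, assuming plain Python and a list of training labels
(the helper name `zero_r` is hypothetical), such a constant model could
look like this:

```python
from collections import Counter

def zero_r(labels):
    """Fit a 0-R (constant) classifier: always predict the majority class."""
    majority_class, _ = Counter(labels).most_common(1)[0]
    return lambda sample: majority_class  # prediction ignores the sample entirely

predict = zero_r(["spam", "ham", "spam"])
print(predict({"any": "features"}))  # -> "spam"
```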

Decision tree induction forms a tree-like graph structure as shown in
the figure below, where:

- Each internal (non-leaf) node denotes a test on one of the features
- Each branch descending from a non-leaf node corresponds to an outcome of the test
- Each external node (leaf) denotes a simple model of the kind described above

A test is a rule, based on feature values, for partitioning the feature
space. Each outcome of a test corresponds to a hyperrectangle associated
with both the test and one of the descending branches.

If a test is a Boolean expression (for example, *f* < *c* or *f* = *c*,
where *f* is a feature and *c* is a constant fitted during decision tree
induction), the induced decision tree is a binary tree, so its non-leaf
nodes have exactly two branches, ‘true’ and ‘false’, each corresponding
to a result of the Boolean expression.
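To make this concrete, here is a minimal sketch of such a binary tree
node in plain Python; the `Node` class and its field names are
hypothetical, not taken from any particular library:

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    # Internal node: the Boolean test is feature < threshold (i.e., f < c).
    feature: Optional[int] = None          # index of the tested feature f
    threshold: Optional[float] = None      # constant c fitted during induction
    true_branch: Optional["Node"] = None   # subtree for the 'true' outcome
    false_branch: Optional["Node"] = None  # subtree for the 'false' outcome
    # Leaf node: the output of the simple (constant) model.
    prediction: Optional[Any] = None

    def is_leaf(self) -> bool:
        return self.prediction is not None
```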

Prediction is performed by starting at the root node of the tree,
applying the test specified at that node, and then moving down the
branch corresponding to the test's outcome for the given sample. This
process is repeated for the subtree rooted at the child node reached
along that branch, until a leaf is reached. The final result is the
prediction of the simple model at that leaf.
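Assuming the hypothetical `Node` structure sketched above, this
traversal is a short loop:

```python
def predict(node, sample):
    """Walk from the root to a leaf, following the test outcome at each node."""
    while not node.is_leaf():
        # Evaluate the Boolean test f < c and descend the matching branch.
        if sample[node.feature] < node.threshold:
            node = node.true_branch
        else:
            node = node.false_branch
    return node.prediction  # the simple model's output at the leaf

# Example: a one-test tree (stump) that compares feature 0 against 2.5.
root = Node(feature=0, threshold=2.5,
            true_branch=Node(prediction="small"),
            false_branch=Node(prediction="large"))
print(predict(root, [1.0]))  # -> "small"
```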

Decision trees are often used as base learners in ensemble algorithms such as boosting, bagging, and random forests.
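For illustration, a short sketch using scikit-learn's ensemble
estimators, assuming scikit-learn is installed; each of these uses
decision trees as base learners by default:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)

X, y = load_iris(return_X_y=True)
for model in (BaggingClassifier(), RandomForestClassifier(),
              GradientBoostingClassifier()):
    # Fit an ensemble of decision trees and report training accuracy.
    print(type(model).__name__, model.fit(X, y).score(X, y))
```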