Given a set X = {x_1 = (x_11, …, x_1p), …, x_n = (x_n1, …, x_np)} of n p-dimensional feature vectors and a vector of class labels y = (y_1, …, y_n), where y_i ∈ {0, 1, …, C - 1} describes the class to which the feature vector x_i belongs and C is the number of classes, the problem is to build a decision forest classifier.

Training Stage

The decision forest classifier follows the algorithmic framework of decision forest training, using the Gini impurity as the impurity metric. For a subset D of observations, it is calculated as

    Gini(D) = 1 - Σ_{i=0}^{C-1} p_i²,

where p_i is the fraction of observations in the subset D that belong to the i-th class.
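The formula above can be sketched in a few lines of Python; the data in the example is illustrative only:

```python
from collections import Counter

def gini_impurity(labels):
    """Gini(D) = 1 - sum_i p_i^2, where p_i is the fraction of
    observations in D that belong to class i."""
    n = len(labels)
    counts = Counter(labels)  # occurrences of each class label
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

gini_impurity([0, 0, 1, 1])  # two evenly mixed classes -> 0.5
gini_impurity([1, 1, 1, 1])  # pure node -> 0.0
```

A pure node has impurity 0; for two classes the impurity is maximal (0.5) when the classes are evenly mixed, which is why splits are chosen to maximize the impurity decrease.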

Prediction Stage

Given a decision forest classifier and r feature vectors x_1, …, x_r, the problem is to calculate the labels for those vectors. To solve the problem for each given query vector x_i, the algorithm finds the leaf node in each tree of the forest that gives the classification response of that tree. The forest then chooses the label y that receives the majority of votes among the trees in the forest.
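The majority-voting step can be sketched as follows; the "trees" here are stand-in threshold rules, not the library's actual tree objects:

```python
from collections import Counter

def forest_predict(trees, x):
    """Each tree maps a feature vector to a class label; the forest
    returns the label with the most votes across all trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Three toy "trees": simple threshold rules on the first feature.
trees = [lambda x: 0 if x[0] < 0.5 else 1,
         lambda x: 0 if x[0] < 0.6 else 1,
         lambda x: 1]

forest_predict(trees, [0.4])  # two of three trees vote for class 0
```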

Out-of-bag Error

The decision forest classifier follows the algorithmic framework for calculating the decision forest out-of-bag (OOB) error. The out-of-bag predictions of all trees are aggregated, and the OOB error of the decision forest is calculated as follows:

  • For each vector x_i in the dataset X, predict its label ŷ_i by taking the majority of votes among the trees that contain x_i in their OOB set.
  • Calculate the OOB error of the decision forest T as the average number of misclassifications:

        OOB(T) = (1/|D'|) Σ_{x_i ∈ D'} I(y_i ≠ ŷ_i),

    where D' is the set of observations that are out-of-bag for at least one tree and I(·) is the indicator function.
  • If the OOB error value per observation is required, calculate the prediction error for x_i:

        OOB(x_i) = I(y_i ≠ ŷ_i).
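The steps above can be sketched as follows; the trees, OOB index sets, and data are toy stand-ins, not library objects:

```python
from collections import Counter

def oob_error(trees, oob_sets, X, y):
    """trees[t] maps a feature vector to a label; oob_sets[t] holds the
    indices of X that tree t did NOT see during training. Returns the
    forest OOB error and the per-observation 0/1 prediction errors."""
    per_obs = {}
    for i, (x, label) in enumerate(zip(X, y)):
        # Majority vote among trees for which x_i is out-of-bag.
        votes = Counter(trees[t](x) for t in range(len(trees))
                        if i in oob_sets[t])
        if votes:  # x_i is OOB for at least one tree
            per_obs[i] = int(votes.most_common(1)[0][0] != label)
    return sum(per_obs.values()) / len(per_obs), per_obs

# Toy forest of two "trees" and their OOB index sets.
trees = [lambda x: 0 if x[0] < 0.5 else 1,
         lambda x: 1]
oob_sets = [{0, 1}, {1}]
X = [[0.2], [0.8]]
y = [0, 0]

err, per_obs = oob_error(trees, oob_sets, X, y)
```

Here observation 0 is predicted correctly by its single OOB tree, while observation 1 is misclassified by both of its OOB trees, giving a forest OOB error of 0.5.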

Variable Importance

The library computes the Mean Decrease Impurity (MDI) importance measure, also known as the Gini importance or Mean Decrease Gini, using the Gini index as the impurity metric.
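MDI attributes to each feature the impurity decrease achieved by the splits on that feature, weighted by the fraction of observations reaching each split. A minimal sketch for a tiny hand-built tree (the data and tree structure are illustrative, not library output):

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def node_decrease(parent, left, right, n_total):
    """Weighted impurity decrease contributed by one split node."""
    w = len(parent) / n_total           # fraction of data reaching the node
    wl = len(left) / len(parent)        # fraction sent to the left child
    wr = len(right) / len(parent)       # fraction sent to the right child
    return w * (gini(parent) - wl * gini(left) - wr * gini(right))

# Tiny tree: the root splits on feature 0; its left child splits on feature 1.
root = [0, 0, 0, 1, 1, 1]            # class labels at the root
left, right = [0, 0, 1], [0, 1, 1]   # labels after the root split
ll, lr = [0, 0], [1]                 # labels after the left-child split

# MDI per feature: sum of weighted decreases over the nodes that split on it.
mdi = {0: node_decrease(root, left, right, len(root)),
       1: node_decrease(left, ll, lr, len(root))}
```

In a forest, these per-feature sums are additionally averaged over all trees; here feature 1 scores higher because its split yields two pure children.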
