Details

Given:

  • $n$ feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of size $p$
  • The vector of class labels $y = (y_1, \ldots, y_n)$ that describes the class to which the feature vector $x_i$ belongs, where $y_i \in \{0, 1, \ldots, C-1\}$ and $C$ is the number of classes.

The problem is to build a decision tree classifier.

Split Criteria

The library provides the decision tree classification algorithm based on two split criteria: the Gini index [Breiman84] and information gain [Quinlan86], [Mitchell97]:

  1. Gini index

     $$\mathrm{Gini}(D) = 1 - \sum_{i=0}^{C-1} p_i^2$$

    where

    • $D$ is a set of observations that reach the node

    • $p_i$ is the observed fraction of observations with class $i$ in $D$

    To find the best test using the Gini index, each possible test $\tau$ is examined using

    $$\Delta \mathrm{Gini}(D, \tau) = \mathrm{Gini}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} \, \mathrm{Gini}(D_v)$$

    where

    • $O(\tau)$ is the set of all possible outcomes of test $\tau$

    • $D_v$ is the subset of $D$ for which the outcome of $\tau$ is $v$, for example, $D_{\mathrm{true}} = \{\, d \in D \mid \tau(d) = \mathrm{true} \,\}$.

    The test to be used in the node is selected as $\arg\max_{\tau} \Delta \mathrm{Gini}(D, \tau)$. For a binary decision tree with 'true' and 'false' branches, this is equivalent to selecting $\arg\min_{\tau} \left( \frac{|D_{\mathrm{true}}|}{|D|} \mathrm{Gini}(D_{\mathrm{true}}) + \frac{|D_{\mathrm{false}}|}{|D|} \mathrm{Gini}(D_{\mathrm{false}}) \right)$.

  2. Information gain

     $$\mathrm{InfoGain}(D, \tau) = \mathrm{Entropy}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} \, \mathrm{Entropy}(D_v)$$

    where

    • $O(\tau)$, $D$, $D_v$ are defined above
    • $\mathrm{Entropy}(D) = -\sum_{i=0}^{C-1} p_i \log_2 p_i$, with $p_i$ defined above in Gini index.

      Similarly to the Gini index, the test to be used in the node is selected as $\arg\max_{\tau} \mathrm{InfoGain}(D, \tau)$. For a binary decision tree with 'true' and 'false' branches, this is equivalent to selecting $\arg\min_{\tau} \left( \frac{|D_{\mathrm{true}}|}{|D|} \mathrm{Entropy}(D_{\mathrm{true}}) + \frac{|D_{\mathrm{false}}|}{|D|} \mathrm{Entropy}(D_{\mathrm{false}}) \right)$. A sketch that evaluates both criteria in code follows this list.
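
To make the two criteria concrete, here is a minimal sketch (in Python with NumPy, not the library's implementation) that evaluates a candidate binary test under both the Gini index and information gain. The function names, the toy data, and the thresholds are illustrative assumptions.

```python
import numpy as np

def gini(y, n_classes):
    # Gini(D) = 1 - sum_i p_i^2, with p_i the class fractions in D.
    p = np.bincount(y, minlength=n_classes) / len(y)
    return 1.0 - np.sum(p ** 2)

def entropy(y, n_classes):
    # Entropy(D) = -sum_i p_i * log2(p_i), taking 0 * log(0) = 0.
    p = np.bincount(y, minlength=n_classes) / len(y)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def impurity_decrease(y, mask, n_classes, impurity):
    # Delta(D, tau) = impurity(D)
    #   - |D_true|/|D| * impurity(D_true)
    #   - |D_false|/|D| * impurity(D_false),
    # for a binary test tau whose 'true' outcome is given by mask.
    d_true, d_false = y[mask], y[~mask]
    if len(d_true) == 0 or len(d_false) == 0:
        return 0.0  # degenerate test: one branch is empty
    return (impurity(y, n_classes)
            - len(d_true) / len(y) * impurity(d_true, n_classes)
            - len(d_false) / len(y) * impurity(d_false, n_classes))

# Toy example: compare two thresholds t for the test x_1 <= t on feature 1.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 1, 1])
for t in (1.5, 2.5):
    mask = X[:, 0] <= t
    print(t,
          impurity_decrease(y, mask, 2, gini),      # Delta Gini(D, tau)
          impurity_decrease(y, mask, 2, entropy))   # InfoGain(D, tau)
```

Both criteria agree here: the threshold 2.5 separates the classes perfectly and receives the larger score under either measure.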

Training Stage

The classification decision tree follows the algorithmic framework of decision tree training described in Classification and Regression > Decision tree > Training stage.
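
The framework itself is specified in the referenced section. Purely as an illustration of greedy tree induction with the Gini criterion, the following is a generic sketch; the dictionary node layout and the max_depth stopping rule are assumptions for the example, not the library's API or data structures.

```python
import numpy as np

def gini(y, n_classes):
    # Gini(D) = 1 - sum_i p_i^2 over the class fractions in D.
    p = np.bincount(y, minlength=n_classes) / len(y)
    return 1.0 - np.sum(p ** 2)

def build_tree(X, y, n_classes, depth=0, max_depth=5):
    # Stop on a pure node or at the depth limit; the leaf predicts the
    # majority class among the observations that reached it.
    if gini(y, n_classes) == 0.0 or depth == max_depth:
        return {"leaf": True, "label": int(np.bincount(y).argmax())}
    best = None
    for j in range(X.shape[1]):                # candidate feature j
        for t in np.unique(X[:, j])[:-1]:      # candidate threshold t
            mask = X[:, j] <= t                # test tau: x_j <= t
            d_true, d_false = y[mask], y[~mask]
            # Impurity decrease Delta Gini(D, tau) for this binary test.
            delta = (gini(y, n_classes)
                     - len(d_true) / len(y) * gini(d_true, n_classes)
                     - len(d_false) / len(y) * gini(d_false, n_classes))
            if best is None or delta > best[0]:
                best = (delta, j, t, mask)
    if best is None:                           # no admissible test: make a leaf
        return {"leaf": True, "label": int(np.bincount(y).argmax())}
    _, j, t, mask = best
    return {"leaf": False, "feature": j, "threshold": t,
            "true": build_tree(X[mask], y[mask], n_classes,
                               depth + 1, max_depth),
            "false": build_tree(X[~mask], y[~mask], n_classes,
                                depth + 1, max_depth)}
```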

Prediction Stage

The classification decision tree follows the algorithmic framework of decision tree prediction described in Classification and Regression > Decision tree > Prediction stage.

Given the decision tree and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors.
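
As a rough illustration, prediction reduces to routing each vector down the tree: starting at the root, evaluate the node's test and follow the corresponding branch until a leaf is reached. The sketch below assumes the dictionary node layout from the training sketch above, not the library's internal representation.

```python
def predict_one(node, x):
    # Walk from the root: at each internal node evaluate the stored test
    # x[feature] <= threshold and follow the 'true' or 'false' branch.
    while not node["leaf"]:
        if x[node["feature"]] <= node["threshold"]:
            node = node["true"]
        else:
            node = node["false"]
    return node["label"]

def predict(tree, X):
    # Responses for vectors x_1, ..., x_r.
    return [predict_one(tree, x) for x in X]
```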
