Developer Guide and Reference

  • 2021.1
  • 12/04/2020
  • Public Content

Classification Decision Tree

A classification decision tree is a kind of decision tree, as described in Decision Tree.

Details

Given:
  • n feature vectors $x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})$ of size p
  • The vector of class labels $y = (y_1, \ldots, y_n)$ that describes the class to which the feature vector $x_i$ belongs, where $y_i \in \{0, 1, \ldots, C-1\}$ and C is the number of classes.
The problem is to build a decision tree classifier.
Split Criteria
The library provides the decision tree classification algorithm based on two split criteria, the Gini index [Breiman84] and information gain [Quinlan86], [Mitchell97]:
  1. Gini index
    $I_{Gini}(D) = 1 - \sum_{i=0}^{C-1} p_i^2$
    where
    • $D$ is the set of observations that reach the node
    • $p_i$ is the observed fraction of observations with class $i$ in $D$
    To find the best test using the Gini index, each possible test is examined using
    $\Delta I_{Gini}(D, \tau) = I_{Gini}(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} I_{Gini}(D_v)$
    where
    • $O(\tau)$ is the set of all possible outcomes of test $\tau$
    • $D_v$ is the subset of $D$ for which the outcome of $\tau$ is $v$, that is, $D_v = \{ d \in D \mid \tau(d) = v \}$
    The test to be used in the node is selected as $\underset{\tau}{\operatorname{argmax}}\, \Delta I_{Gini}(D, \tau)$. For a binary decision tree with 'true' and 'false' branches, $\Delta I_{Gini}(D, \tau) = I_{Gini}(D) - \frac{|D_{true}|}{|D|} I_{Gini}(D_{true}) - \frac{|D_{false}|}{|D|} I_{Gini}(D_{false})$
  2. Information gain
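    $InfoGain(D, \tau) = Info(D) - \sum_{v \in O(\tau)} \frac{|D_v|}{|D|} Info(D_v)$
    where
    • $O(\tau)$, $D$, and $D_v$ are defined above
    • $Info(D) = -\sum_{i=0}^{C-1} p_i \log_2 p_i$, with $p_i$ defined above in Gini index
    Similarly to the Gini index, the test to be used in the node is selected as $\underset{\tau}{\operatorname{argmax}}\, InfoGain(D, \tau)$. For a binary decision tree with 'true' and 'false' branches, $InfoGain(D, \tau) = Info(D) - \frac{|D_{true}|}{|D|} Info(D_{true}) - \frac{|D_{false}|}{|D|} Info(D_{false})$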
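The two split criteria can be illustrated with a short stand-alone sketch. It assumes the standard textbook definitions (Gini index as one minus the sum of squared class fractions, information as entropy in bits) and a binary test of the form "feature value <= threshold". The function names are hypothetical and are not part of the library's API; the library's own implementation may differ.

```python
import math
from collections import Counter


def class_fractions(labels):
    """Return the observed fraction p_i of each class present in labels."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def gini_index(labels):
    """Gini index: 1 minus the sum of squared class fractions."""
    return 1.0 - sum(p * p for p in class_fractions(labels))


def info(labels):
    """Information (entropy in bits): minus the sum of p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in class_fractions(labels))


def split_gain(labels, outcomes, impurity):
    """Impurity of D minus the size-weighted impurity of each subset D_v.

    outcomes[k] is the outcome v of the test for observation k.
    """
    subsets = {}
    for label, outcome in zip(labels, outcomes):
        subsets.setdefault(outcome, []).append(label)
    weighted = sum(len(d_v) / len(labels) * impurity(d_v) for d_v in subsets.values())
    return impurity(labels) - weighted


def best_threshold(values, labels, impurity):
    """Pick the threshold t whose test `value <= t` maximizes the gain.

    Returns None when all values are equal, so no split is possible.
    """
    candidates = sorted(set(values))[:-1]  # the largest value would leave 'false' empty
    if not candidates:
        return None
    return max(candidates, key=lambda t: split_gain(labels, [x <= t for x in values], impurity))


values = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]
print(gini_index(labels))                          # 0.5
print(info(labels))                                # 1.0
print(best_threshold(values, labels, gini_index))  # 2.0
```

Passing the impurity function as an argument keeps the gain computation identical for both criteria, which mirrors how the two formulas differ only in the impurity term.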
Training Stage
The classification decision tree follows the algorithmic framework of decision tree training described in Decision Tree.
Prediction Stage
The classification decision tree follows the algorithmic framework of decision tree prediction described in Decision Tree.
Given a decision tree and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors.
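As a rough illustration of the prediction stage, the sketch below walks each vector from the root to a leaf and returns the leaf's class label. The `Node` class, its fields, and the "feature value <= threshold" test are assumptions made for this example only; they do not describe the library's model format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """Hypothetical binary tree node; not the library's model format."""
    label: Optional[int] = None        # class label, set for leaf nodes only
    feature: int = 0                   # index of the feature tested in a split node
    threshold: float = 0.0             # the test is x[feature] <= threshold
    if_true: Optional["Node"] = None   # child for the 'true' branch
    if_false: Optional["Node"] = None  # child for the 'false' branch


def predict_one(root: Node, x: Sequence[float]) -> int:
    """Follow the tests from the root until a leaf is reached."""
    node = root
    while node.label is None:
        node = node.if_true if x[node.feature] <= node.threshold else node.if_false
    return node.label


def predict(root: Node, vectors):
    """Compute one response per input vector."""
    return [predict_one(root, x) for x in vectors]


tree = Node(
    feature=0,
    threshold=2.0,
    if_true=Node(label=0),
    if_false=Node(feature=1, threshold=0.5, if_true=Node(label=1), if_false=Node(label=2)),
)
print(predict(tree, [[1.0, 0.0], [3.0, 0.0], [3.0, 1.0]]))  # [0, 1, 2]
```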

Batch Processing

Decision tree classification follows the general workflow described in Classification Usage Model.
Training
In addition to the common input for a classifier, decision trees can accept the following inputs that are used for post-pruning:

| Input ID | Input |
|---|---|
| dataForPruning | Pointer to the $m \times p$ numeric table with the pruning data set. This table can be an object of any class derived from NumericTable. |
| labelsForPruning | Pointer to the $m \times 1$ numeric table with class labels. This table can be an object of any class derived from NumericTable except PackedSymmetricMatrix and PackedTriangularMatrix. |

At the training stage, the decision tree classifier has the following parameters:

| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | The computation method used by the decision tree classification. The only training method supported so far is the default dense method. |
| nClasses | Not applicable | The number of classes. A required parameter. |
| splitCriterion | infoGain | Split criterion to choose the best test for split nodes. Available split criteria for decision trees: gini (the Gini index), infoGain (the information gain). |
| pruning | reducedErrorPruning | Method to perform post-pruning. Available options for the pruning parameter: reducedErrorPruning (reduced error pruning; provide the dataForPruning and labelsForPruning inputs if you use pruning), none (do not prune). |
| maxTreeDepth | 0 | Maximum tree depth. Zero value means unlimited depth. Can be any non-negative number. |
| minObservationsInLeafNodes | 1 | Minimum number of observations in the leaf node. Can be any positive number. |

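The defaults and value constraints of the training parameters can be summarized in a small stand-alone sketch. `TrainingParameters` and its `validate` method are hypothetical names used only for illustration; they are not the library's classes, and the library performs its own parameter checks.

```python
from dataclasses import dataclass

SPLIT_CRITERIA = ("gini", "infoGain")
PRUNING_METHODS = ("reducedErrorPruning", "none")


@dataclass
class TrainingParameters:
    """Hypothetical container mirroring the documented training parameters."""
    nClasses: int                          # required, has no default
    splitCriterion: str = "infoGain"
    pruning: str = "reducedErrorPruning"
    maxTreeDepth: int = 0                  # 0 means unlimited depth
    minObservationsInLeafNodes: int = 1

    def validate(self, has_pruning_inputs: bool) -> None:
        """Raise ValueError if a documented constraint is violated."""
        if self.splitCriterion not in SPLIT_CRITERIA:
            raise ValueError(f"splitCriterion must be one of {SPLIT_CRITERIA}")
        if self.pruning not in PRUNING_METHODS:
            raise ValueError(f"pruning must be one of {PRUNING_METHODS}")
        if self.maxTreeDepth < 0:
            raise ValueError("maxTreeDepth must be non-negative")
        if self.minObservationsInLeafNodes < 1:
            raise ValueError("minObservationsInLeafNodes must be positive")
        if self.pruning == "reducedErrorPruning" and not has_pruning_inputs:
            raise ValueError("reducedErrorPruning needs dataForPruning and labelsForPruning")


# Defaults are accepted when the pruning inputs are supplied.
TrainingParameters(nClasses=3).validate(has_pruning_inputs=True)

# Without pruning data, pruning has to be switched off explicitly.
TrainingParameters(nClasses=3, pruning="none").validate(has_pruning_inputs=False)
```

Note that the default pruning method is reducedErrorPruning, so a configuration that leaves pruning at its default is expected to supply both pruning inputs.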
Prediction
At the prediction stage, the decision tree classifier has the following parameters:

| Parameter | Default Value | Description |
|---|---|---|
| algorithmFPType | float | The floating-point type that the algorithm uses for intermediate computations. Can be float or double. |
| method | defaultDense | The computation method used by the decision tree classification. The only prediction method supported so far is the default dense method. |

Examples
C++ (CPU)
Batch Processing:
Java*
There is no support for Java on GPU.
Batch Processing:
