Developer Guide

Batch Processing

Decision forest classification and regression follows the general workflow described in Training and Prediction > Classification > Usage Model.

Training

For the description of the input and output, refer to Training and Prediction > Classification > Usage Model.
At the training stage, the decision forest classification and regression algorithms have the following parameters:
Parameter: seed
Default Value: 777
Description: The seed for the random number generator, which is used to choose the bootstrap set, split features in every split node in a tree, and generate the permutation required in computations of MDA variable importance.

Parameter: nTrees
Default Value: 100
Description: The number of trees in the forest.

Parameter: observationsPerTreeFraction
Default Value: 1
Description: Fraction of the training set S used to form the bootstrap set for a single tree training, 0 < observationsPerTreeFraction <= 1. The observations are sampled randomly with replacement.

Parameter: featuresPerNode
Default Value: 0
Description: The number of features tried as possible splits per node. If the parameter is set to 0, the library uses the square root of the number of features for classification and (the number of features)/3 for regression.

Parameter: maxTreeDepth
Default Value: 0
Description: Maximal tree depth. Default is 0 (unlimited).

Parameter: minObservationsInLeafNode
Default Value: 1 for classification, 5 for regression
Description: Minimum number of observations in the leaf node.

Parameter: impurityThreshold
Default Value: 0
Description: The threshold value used as a stopping criterion: if the impurity value in the node is smaller than the threshold, the node is not split anymore.

Parameter: varImportance
Default Value: none
Description: The variable importance computation mode. Possible values:
  • none - variable importance is not calculated
  • MDI - Mean Decrease of Impurity, also known as the Gini importance or Mean Decrease Gini
  • MDA_Raw - Mean Decrease of Accuracy (permutation importance)
  • MDA_Scaled - the MDA_Raw value scaled by its standard deviation

Parameter: resultsToCompute
Default Value: 0
Description: The 64-bit integer flag that specifies which extra characteristics of the decision forest to compute. Provide one of the following values to request a single characteristic or use bitwise OR to request a combination of the characteristics:
  • computeOutOfBagError
  • computeOutOfBagErrorPerObservation

Parameter: engine
Default Value: SharedPtr<engines::mt2203::Batch>()
Description: Pointer to the random number generator engine.
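For illustration, the sketch below shows how these parameters might be set when training a decision forest classifier with the classic DAAL C++ API. The variable names (trainData, trainLabels, nClasses), the chosen parameter values, and the exact way the parameter structure is accessed (parameter vs. parameter()) are assumptions that can differ between library versions, so treat this as a sketch rather than a verbatim example.

/* Sketch: training a decision forest classifier and requesting variable
   importance and out-of-bag error (assumes the classic DAAL C++ API). */
#include "daal.h"

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;

decision_forest::classification::training::ResultPtr trainForest(
    const NumericTablePtr &trainData,   /* assumed to be loaded elsewhere */
    const NumericTablePtr &trainLabels, /* assumed to be loaded elsewhere */
    size_t nClasses)
{
    decision_forest::classification::training::Batch<> algorithm(nClasses);

    /* Pass the training data set and labels to the algorithm */
    algorithm.input.set(classifier::training::data, trainData);
    algorithm.input.set(classifier::training::labels, trainLabels);

    /* Set the training parameters described in the table above */
    algorithm.parameter().nTrees = 100;
    algorithm.parameter().observationsPerTreeFraction = 1.0;
    algorithm.parameter().featuresPerNode = 0;        /* 0 = sqrt(p) for classification */
    algorithm.parameter().minObservationsInLeafNode = 1;
    algorithm.parameter().varImportance = decision_forest::training::MDI;
    algorithm.parameter().resultsToCompute =
        decision_forest::training::computeOutOfBagError |
        decision_forest::training::computeOutOfBagErrorPerObservation;

    /* Build the forest and return the training result */
    algorithm.compute();
    return algorithm.getResult();
}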

Output

In addition to the regression or classifier output, decision forest calculates the results described below. Pass the Result ID as a parameter to the methods that access the result of your algorithm. For more details, see Algorithms.
Result ID: outOfBagError
Result: Numeric table of size 1 x 1 that contains the out-of-bag error computed when the computeOutOfBagError option is enabled. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable.

Result ID: variableImportance
Result: Numeric table of size 1 x p that contains variable importance values for each feature. If you set the varImportance parameter to none, the library returns a null pointer to the table. By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable except PackedTriangularMatrix and PackedSymmetricMatrix.

Result ID: outOfBagErrorPerObservation
Result: Numeric table of size 1 x n that contains the computed out-of-bag error when the computeOutOfBagErrorPerObservation option is enabled. The value -1 in the table indicates that no OOB value was computed for the observation because it was not in the OOB set of any tree in the model (it was never left out during the bootstrap). By default, this result is an object of the HomogenNumericTable class, but you can define the result as an object of any class derived from NumericTable.

Result ID: updatedEngine
Result: Engine instance with a state updated after computations.
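As an illustration, the following sketch shows one way to read these results from the training result object returned by the training sketch above, again assuming the classic DAAL C++ API. The printTrainingExtras function name is hypothetical, and the result identifiers are taken from the table above; exact namespaces may differ between library versions.

/* Sketch: reading the extra decision forest training results described above
   (assumes the classic DAAL C++ API; namespaces may differ by version). */
#include "daal.h"
#include <iostream>

using namespace daal;
using namespace daal::algorithms;
using namespace daal::data_management;

void printTrainingExtras(
    const decision_forest::classification::training::ResultPtr &trainingResult)
{
    /* outOfBagError: 1 x 1 table, present when computeOutOfBagError was requested */
    NumericTablePtr oobError =
        trainingResult->get(decision_forest::classification::training::outOfBagError);

    /* variableImportance: 1 x p table, null pointer if varImportance was none */
    NumericTablePtr varImportance =
        trainingResult->get(decision_forest::classification::training::variableImportance);

    if (oobError)
    {
        BlockDescriptor<double> block;
        oobError->getBlockOfRows(0, 1, readOnly, block);
        std::cout << "Out-of-bag error: " << block.getBlockPtr()[0] << std::endl;
        oobError->releaseBlockOfRows(block);
    }

    if (varImportance)
    {
        std::cout << "Variable importance computed for "
                  << varImportance->getNumberOfColumns() << " features" << std::endl;
    }
}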
