Developer Guide and Reference

  • 2021.1
  • 12/04/2020
  • Public Content
Contents

Gradient Boosted Trees

Details

Given n feature vectors LaTex Math image. of
n
p
-dimensional feature vectors and
n
responses LaTex Math image. , the problem is to build a gradient boosted trees classification or regression model.
The tree ensemble model uses M additive functions to predict the output LaTex Math image. where LaTex Math image. is the space of regression trees,
T
is the number of leaves in the tree,
w
is a leaf weights vector, LaTex Math image. is a score on
i
-th leaf. LaTex Math image. represents the structure of each tree that maps an observation to the corresponding leaf index.
Training procedure is an iterative functional gradient descent algorithm which minimizes objective function over function space by iteratively choosing a function (regression tree) that points in the negative gradient direction. The objective function is
LaTex Math image.
where LaTex Math image. is twice differentiable convex loss function and LaTex Math image. is a regularization term that penalizes the complexity of the model defined by the number of leaves T and the L2 norm of the weights LaTex Math image. for each tree, LaTex Math image. and LaTex Math image. are regularization parameters.
Training Stage
Library uses the second-order approximation of objective function
LaTex Math image.
where LaTex Math image. , LaTex Math image. and following algorithmic framework for the training stage.
Let LaTex Math image. be the set of observations. Given the training parameters, such as the number of iterations
M
, loss function LaTex Math image. , regression tree training parameters, regularization parameters LaTex Math image. and LaTex Math image. , shrinkage (learning rate) parameter LaTex Math image. , the algorithm does the following:
  • Find an initial guess LaTex Math image. ,
    i = 1, …, n
  • For LaTex Math image. :
    • Update LaTex Math image. and LaTex Math image. ,
      i = 1, …, n
    • Grow a regression tree LaTex Math image. that minimizes the objective function LaTex Math image. , where LaTex Math image.LaTex Math image. , LaTex Math image. , LaTex Math image. .
    • Assign an optimal weight LaTex Math image. to the leaf
      j
      , LaTex Math image. .
    • Apply shrinkage parameter LaTex Math image. to the tree leafs and add the tree to the model
    • Update LaTex Math image.
The algorithm for growing the tree:
  • Generate a bootstrap training set if required (stochastic gradient boosting) as follows: select randomly without replacement LaTex Math image. observations, where
    f
    is a fraction of observations used for training of one tree.
  • Start from the tree with depth
    0
    .
  • For each leaf node in the tree:
    • Choose a subset of feature for split finding if required (stochastic gradient boosting).
    • Find the best split that maximizes the gain:
    LaTex Math image.
    • Stop when a termination criterion is met.
For more details, see [Chen2016].
The library supports the following termination criteria when growing the tree:
  • Minimal number of observations in a leaf node.
    Node t is not processed if the subset of observations is smaller than the predefined value. Splits that produce nodes with the number of observations smaller than that value are not allowed.
  • Maximal tree depth.
    Node t is not processed, if its depth in the tree reached the predefined value.
  • Minimal split loss.
    Node t is not processed, if the best possible split is smaller than parameter LaTex Math image. .
Prediction Stage
Given a gradient boosted trees model and vectors LaTex Math image. , the problem is to calculate the responses for those vectors. To solve the problem for each given query vector LaTex Math image. , the algorithm finds the leaf node in a tree in the ensemble which gives the response by that tree. Resulting response is based on an aggregation of responses from all trees in the ensemble. For detailed definition, see description of a specific algorithm.
Split Calculation Mode
The library supports two split calculation modes:
  • exact - all possible split values are examined when searching for the best split for a feature.
  • inexact - continuous features are bucketed into discrete bins and the possible splits are restricted by the buckets borders only.

Batch Processing

Gradient boosted trees classification and regression follows the general workflow described in Classification Usage Model and Regression Usage Model.
Training
For description of the input and output, refer to .
At the training stage, the gradient boosted trees batch algorithm has the following parameters:
Parameter
Default Value
Description
splitMethod
inexact
Split computation mode.
Possible values:
  • inexact
    - continuous features are bucketed into discrete bins and the buckets borders are examined only
  • exact
    - all possible splits for a given feature are examined
maxIterations
50
Maximal number of iterations when training the model, defines maximal number of trees in the model.
maxTreeDepth
6
Maximal tree depth. If the parameter is set to
0
then the depth is unlimited.
shrinkage
0.3
Learning rate of the boosting procedure. Scales the contribution of each tree by a factor LaTex Math image.
minSplitLoss
0
Loss regularization parameter. Minimal loss reduction required to make a further partition on a leaf node of the tree. Range: LaTex Math image.
lambda
1
L2 regularization parameter on weights. Range: LaTex Math image.
observationsPerTreeFraction
1
Fraction of the training set S used for a single tree training, LaTex Math image. . The observations are sampled randomly without replacement.
featuresPerNode
0
The number of features tried as the possible splits per node. If the parameter is set to
0
, all features are used.
minObservationsInLeafNode
5
Minimal number of observations in the leaf node.
memorySavingMode
false
If true then use memory saving (but slower) mode.
engine
SharePtr< engines:: mt19937:: Batch>()
Pointer to the random number generator.
maxBins
256
Used with inexact split method only. Maximal number of discrete bins to bucket continuous features. Increasing the number results in higher computation costs
minBinSize
5
Used with inexact split method only. Minimal number of observations in a bin.

Product and Performance Information

1

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.