Gradient Boosted Trees
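- Find an initial guess $\hat{y}_i^{(0)}$, $i = 1, \ldots, n$.
- For $k = 1, \ldots, M$:
  - Update $g_i$ and $h_i$, $i = 1, \ldots, n$.
  - Grow a regression tree that minimizes the objective function $-\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j + \lambda} + \gamma T$, where $G_j = \sum_{i \in I_j} g_i$, $H_j = \sum_{i \in I_j} h_i$, $I_j = \{i \mid q(x_i) = j\}$, $j = 1, \ldots, T$.
  - Assign an optimal weight $w_j^* = -\frac{G_j}{H_j + \lambda}$ to the leaf $j$, $j = 1, \ldots, T$.
  - Apply the shrinkage parameter $\theta$ to the tree leaves and add the tree to the model.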
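One iteration of the loop above can be sketched in plain Python. This is a minimal illustration, not the library's implementation. It takes the tree structure (the leaf index $q(x_i)$ of each observation) as given instead of growing it. It also assumes a squared-error loss, for which $g_i = \hat{y}_i - y_i$ and $h_i = 1$. Assuming the usual second-order approximation of the loss, $w_j^*$ minimizes the per-leaf quadratic $G_j w + \frac{1}{2}(H_j + \lambda) w^2$, and substituting it back yields the per-leaf term $-\frac{1}{2}G_j^2/(H_j + \lambda)$ of the objective. The names `shrunken_leaf_weights` and `boosting_round` are hypothetical.

```python
def shrunken_leaf_weights(leaf_of, g, h, n_leaves, lam, gamma, theta):
    """Return (theta * w_j^* for each leaf, objective value) for a fixed tree.

    leaf_of[i] is the 0-based leaf index q(x_i) of observation i.
    Assumes every leaf is non-empty or lam > 0 (no division by zero).
    """
    G = [0.0] * n_leaves
    H = [0.0] * n_leaves
    for j, g_i, h_i in zip(leaf_of, g, h):
        G[j] += g_i  # G_j = sum of g_i over I_j
        H[j] += h_i  # H_j = sum of h_i over I_j
    # Objective: -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T
    objective = (-0.5 * sum(Gj * Gj / (Hj + lam) for Gj, Hj in zip(G, H))
                 + gamma * n_leaves)
    # Optimal weight w_j^* = -G_j / (H_j + lambda), scaled by the shrinkage theta
    weights = [-theta * Gj / (Hj + lam) for Gj, Hj in zip(G, H)]
    return weights, objective


def boosting_round(y, y_hat, leaf_of, n_leaves, lam, gamma, theta):
    """One iteration k, assuming squared-error loss l = (y - y_hat)^2 / 2."""
    # For this loss: g_i = y_hat_i - y_i and h_i = 1.
    g = [p - t for t, p in zip(y, y_hat)]
    h = [1.0] * len(y)
    weights, objective = shrunken_leaf_weights(
        leaf_of, g, h, n_leaves, lam, gamma, theta)
    # Adding the shrunken tree moves each prediction by the weight of its leaf.
    new_y_hat = [p + weights[j] for p, j in zip(y_hat, leaf_of)]
    return new_y_hat, weights, objective


new_y_hat, weights, objective = boosting_round(
    y=[1.0, 1.0, 3.0, 3.0], y_hat=[2.0] * 4, leaf_of=[0, 0, 1, 1],
    n_leaves=2, lam=0.0, gamma=0.0, theta=0.5)
print(weights, new_y_hat, objective)  # [-0.5, 0.5] [1.5, 1.5, 2.5, 2.5] -2.0
```

With $\theta = 0.5$, each tree only moves the predictions halfway towards the values that a full step ($\theta = 1$) would reach, which is the effect of shrinkage.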
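Each regression tree is grown as follows:
- Generate a bootstrap training set if required (stochastic gradient boosting): randomly select $N = f \cdot n$ observations without replacement, where $f$ is the fraction of observations used to train one tree.
- Start from a tree of depth 0.
- For each leaf node in the tree:
  - Choose a subset of features for split finding, if required (stochastic gradient boosting).
  - Find the best split, that is, the split that maximizes the gain $\frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$, where $G_L, H_L$ and $G_R, H_R$ are the sums of $g_i$ and $h_i$ over the observations sent to the left and right child nodes, respectively.
- Stop when one of the following termination criteria is met: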
  - Minimal number of observations in a leaf node. Node t is not processed if it contains fewer observations than the predefined value. Splits that produce nodes with fewer observations than that value are not allowed.
  - Maximal tree depth. Node t is not processed if its depth in the tree has reached the predefined value.
  - Minimal split loss. Node t is not processed if the loss reduction of the best possible split is smaller than the parameter $\gamma$.
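The split search and the termination checks above can be sketched in plain Python for the exact method. All function names are hypothetical, and several details are assumptions of this sketch rather than statements about the library. A split sends observations with a feature value at most the threshold to the left child. Thresholds are placed halfway between consecutive distinct values. The optional feature subset is drawn with Python's `random.Random`, which is a Mersenne Twister (MT19937) generator, rather than with the library's engine.

```python
import random


def can_process_node(depth, n_obs, max_tree_depth, min_obs_in_leaf):
    """Depth and node-size termination checks (max_tree_depth == 0: unlimited)."""
    if max_tree_depth != 0 and depth >= max_tree_depth:
        return False
    return n_obs >= min_obs_in_leaf


def best_split_exact(x, g, h, lam, gamma, min_obs_in_leaf):
    """Exact greedy search over one feature: (gain, threshold) or None."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    G, H = sum(g), sum(h)
    parent = G * G / (H + lam)
    g_left = h_left = 0.0
    best = None
    for pos in range(n - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += g[i]
        h_left += h[i]
        if x[i] == x[nxt]:
            continue  # thresholds only between distinct values
        n_left, n_right = pos + 1, n - pos - 1
        if n_left < min_obs_in_leaf or n_right < min_obs_in_leaf:
            continue  # a child node would be too small
        g_right, h_right = G - g_left, H - h_left
        gain = 0.5 * (g_left ** 2 / (h_left + lam)
                      + g_right ** 2 / (h_right + lam)
                      - parent) - gamma
        # Keep the split only if its loss reduction exceeds gamma (gain > 0).
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, 0.5 * (x[i] + x[nxt]))
    return best


def best_split_at_node(X, rows, g, h, lam, gamma, min_obs_in_leaf,
                       features_per_node, rng):
    """Best (gain, feature, threshold) over a random feature subset, or None."""
    p = len(X[0])
    if features_per_node == 0 or features_per_node >= p:
        features = list(range(p))  # 0 means all features are tried
    else:
        features = rng.sample(range(p), features_per_node)  # without replacement
    best = None
    for f in features:
        split = best_split_exact([X[r][f] for r in rows], [g[r] for r in rows],
                                 [h[r] for r in rows], lam, gamma, min_obs_in_leaf)
        if split is not None and (best is None or split[0] > best[0]):
            best = (split[0], f, split[1])
    return best


X = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0]]  # feature 1 is constant
g = [-1.0, -1.0, 1.0, 1.0]
h = [1.0] * 4
print(best_split_at_node(X, [0, 1, 2, 3], g, h, 1.0, 0.0, 1, 0, random.Random(0)))
# (1.3333333333333333, 0, 2.5)
```

A per-tree observation subset of size $N = f \cdot n$ can be drawn the same way, for example `sorted(rng.sample(range(n), int(f * n)))`. Rounding $f \cdot n$ down is another assumption of this sketch.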
Split computation mode. Possible values:
- exact: all possible split values are examined when searching for the best split for a feature.
- inexact: continuous features are bucketed into discrete bins, and the possible splits are restricted to the bin borders only.
Maximal number of iterations when training the model, which defines the maximal number of trees in the model.
Maximal tree depth. If the parameter is set to 0, the depth is unlimited.
Learning rate of the boosting procedure. Scales the contribution of each tree by a factor in $(0, 1]$.
Loss regularization parameter. Minimal loss reduction required to make a further partition of a leaf node of the tree. Range: $[0, \infty)$.
L2 regularization parameter on the leaf weights. Range: $[0, \infty)$.
Fraction of the training set S used to train a single tree, $0 < f \le 1$. The observations are sampled randomly without replacement.
The number of features tried as possible splits per node. If the parameter is set to 0, all features are used.
Minimal number of observations in the leaf node.
If true, the memory-saving (but slower) mode is used.
Pointer to the random number generator. Default value: SharedPtr<engines::mt19937::Batch>().
Used with the inexact split method only. Maximal number of discrete bins used to bucket continuous features. Increasing this number increases the computation cost.
Used with inexact split method only. Minimal number of observations in a bin.
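For the inexact method, one simple way to bucket a continuous feature is equal-frequency binning that respects both limits above (at most a maximal number of bins, at least a minimal number of observations per bin). This is an assumption for illustration only, because this section does not say how the library chooses bin borders. `bin_borders` and `bin_index` are hypothetical names, and the parameters `max_bins` and `min_bin_size` merely mirror the two parameters described above.

```python
import bisect


def bin_borders(values, max_bins, min_bin_size):
    """Upper borders of equal-frequency bins for one continuous feature."""
    xs = sorted(values)
    n = len(xs)
    # Target bin size: at least ceil(n / max_bins), so there are at most
    # max_bins bins, and at least min_bin_size observations.
    target = max(min_bin_size, -(-n // max_bins))
    borders = []
    start = 0
    while start < n:
        end = min(start + target, n)
        while end < n and xs[end] == xs[end - 1]:
            end += 1  # equal values always share a bin
        if 0 < n - end < min_bin_size:
            end = n  # merge a too-small tail into the current bin
        borders.append(xs[end - 1])
        start = end
    return borders


def bin_index(x, borders):
    """Bin containing x; values above the last border go to the last bin."""
    return min(bisect.bisect_left(borders, x), len(borders) - 1)


borders = bin_borders(range(10), max_bins=4, min_bin_size=2)
print(borders)  # [2, 5, 9]
print([bin_index(v, borders) for v in (0, 3, 5, 9)])  # [0, 1, 1, 2]
```

In this sketch, the candidate thresholds for a feature would then be limited to its borders (except the last), so a split scan touches at most `max_bins` per-bin sums of $g_i$ and $h_i$ instead of all $n$ sorted values.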