# Details

Given `n` feature vectors `X = {x_1 = (x_11, ..., x_1p), ..., x_n = (x_n1, ..., x_np)}` of `n` `p`-dimensional feature vectors and `n` responses `Y = {y_1, ..., y_n}`, the problem is to build a gradient boosted trees classification or regression model.

Here `T` is the number of leaves in the tree, `w` is a leaf weights vector, and `w_i` is a score on the `i`-th leaf. `q(x)` represents the structure of each tree that maps an observation `x` to the corresponding leaf index. The loss function measures the difference between the prediction `f(x_i)` and the response `y_i`, and `γ` and `λ` are regularization parameters.

## Training Stage

Let `S = (X, Y)` be the set of observations. Given the training parameters, such as the number of iterations `M`, the loss function, the regularization parameters `γ` and `λ`, and the shrinkage (learning rate) parameter `θ`, the algorithm does the following:

- Find an initial guess `ŷ_i^(0)`, `i = 1, ..., n`.
- For `k = 1, ..., M`:
  - Update the gradients `g_i` and Hessians `h_i`, `i = 1, ..., n`.
  - Grow a regression tree `f_k(x)`.
  - Assign an optimal weight `w_j` to each leaf `j`.
  - Apply the shrinkage parameter `θ` to the tree leaves and add the tree to the model.
  - Update `ŷ_i^(k)`.
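As a concrete illustration, the loop above can be sketched for squared-error loss, where the gradient is `g_i = f(x_i) - y_i` and the Hessian is `h_i = 1`. The depth-1 `fit_stump` helper and all parameter defaults are illustrative stand-ins, not the library's implementation:

```python
# Minimal gradient-boosting loop for squared-error loss l(y, f) = (y - f)^2 / 2,
# so g_i = f(x_i) - y_i and h_i = 1. Trees are depth-1 stumps for brevity.

def fit_stump(x, g, h, lam):
    """Grow a depth-1 regression tree: one split, leaf weights w = -G / (H + lam)."""
    best = None
    for s in sorted(set(x))[1:]:                      # candidate thresholds
        left = [i for i in range(len(x)) if x[i] < s]
        right = [i for i in range(len(x)) if x[i] >= s]
        GL, HL = sum(g[i] for i in left), sum(h[i] for i in left)
        GR, HR = sum(g[i] for i in right), sum(h[i] for i in right)
        score = GL * GL / (HL + lam) + GR * GR / (HR + lam)
        if best is None or score > best[0]:
            best = (score, s, -GL / (HL + lam), -GR / (HR + lam))
    _, s, wl, wr = best
    return lambda v: wl if v < s else wr

def train(x, y, M=20, theta=0.3, lam=1.0):
    pred = [0.0] * len(x)                             # initial guess f_0 = 0
    trees = []
    for _ in range(M):                                # for k = 1, ..., M
        g = [pred[i] - y[i] for i in range(len(x))]   # update gradients g_i
        h = [1.0] * len(x)                            # Hessians h_i (constant here)
        tree = fit_stump(x, g, h, lam)                # grow a regression tree
        trees.append(tree)
        # apply shrinkage theta to the leaf scores and update the predictions
        pred = [pred[i] + theta * tree(x[i]) for i in range(len(x))]
    return trees, pred
```

On a separable toy set the predictions approach the responses geometrically, because each round removes a fixed fraction of the remaining residual.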

To grow a regression tree, the algorithm does the following:

- Generate a bootstrap training set if required (stochastic gradient boosting): randomly select without replacement `n = N * f` observations, where `f` is the fraction of observations used for training of one tree.
- Start from the tree with depth 0.
- For each leaf node in the tree:
  - Choose a subset of features for split finding if required (stochastic gradient boosting).
  - Find the best split that maximizes the gain.
- Stop when a termination criterion is met.
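The best-split search above can be sketched as a single left-to-right scan over sorted feature values, accumulating gradient and Hessian sums so each candidate threshold costs O(1). The document's own gain formula did not survive extraction, so the second-order regularized gain used here (with parameters `λ` and `γ`) is an assumption:

```python
def split_gain(GL, HL, GR, HR, lam, gamma):
    """Gain of splitting a node with gradient/Hessian sums (GL+GR, HL+HR)
    into children (GL, HL) and (GR, HR); lam and gamma are the
    regularization parameters. (Standard second-order gain, assumed.)"""
    return 0.5 * (GL ** 2 / (HL + lam)
                  + GR ** 2 / (HR + lam)
                  - (GL + GR) ** 2 / (HL + HR + lam)) - gamma

def best_split(xs, g, h, lam=1.0, gamma=0.0):
    """Scan sorted feature values, keeping running left-child sums."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    GT, HT = sum(g), sum(h)
    GL = HL = 0.0
    best_gain, best_thr = float("-inf"), None
    for pos, i in enumerate(order[:-1]):
        GL += g[i]
        HL += h[i]
        nxt = order[pos + 1]
        if xs[i] == xs[nxt]:
            continue                       # cannot split between equal values
        gain = split_gain(GL, HL, GT - GL, HT - HL, lam, gamma)
        if gain > best_gain:
            best_gain, best_thr = gain, 0.5 * (xs[i] + xs[nxt])
    return best_gain, best_thr
```

The "minimal split loss" criterion below corresponds to rejecting any split whose gain does not exceed `γ`.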

Training uses the following termination criteria:

- **Minimal number of observations in a leaf node.** Node `t` is not processed if the subset of observations is smaller than the predefined value. Splits that produce nodes with the number of observations smaller than that value are not allowed.
- **Maximal tree depth.** Node `t` is not processed if its depth in the tree reached the predefined value.
- **Minimal split loss.** Node `t` is not processed if the best possible split is smaller than the parameter `γ`.

## Training Alternative

## Prediction Stage

Given the gradient boosted trees model and the vectors `x_1, ..., x_r`, the problem is to calculate the responses for those vectors. To solve the problem for each given query vector `x_i`, the algorithm finds the leaf node in each tree of the ensemble that gives the response by that tree. The resulting response is based on an aggregation of responses from all trees in the ensemble. For a detailed definition, see the description of a specific algorithm.
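A sketch of this aggregation for regression, assuming each tree is represented as a callable mapping a feature vector to its leaf score and that the shrinkage `θ` has not already been folded into the leaf weights:

```python
def predict(trees, x, theta=1.0, bias=0.0):
    """Response for one query vector x: route x to a leaf in every tree
    and sum the (shrunk) leaf scores. Classification algorithms apply a
    link function (e.g. softmax) on top of this raw sum."""
    return bias + theta * sum(tree(x) for tree in trees)
```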

## Split Calculation Mode

- `exact`: all possible split values are examined when searching for the best split for a feature.
- `inexact`: continuous features are bucketed into discrete bins and the possible splits are restricted by the bucket borders only.
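A minimal sketch of the bucketing behind the `inexact` mode, assuming quantile-based borders (the actual binning strategy is implementation-defined and may differ):

```python
def quantile_bins(values, n_bins):
    """Bucket borders for an 'inexact'-style search: split candidates are
    restricted to (at most) n_bins - 1 quantile borders instead of every
    distinct feature value."""
    s = sorted(values)
    borders = [s[b * len(s) // n_bins] for b in range(1, n_bins)]
    out = []                          # deduplicate while preserving order
    for v in borders:
        if not out or v != out[-1]:
            out.append(v)
    return out
```

With `n_bins` borders the split search per feature costs O(`n_bins`) instead of O(`n`) once the histogram is built, which is the motivation for the mode.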

## Variable Importance

- **Weight** represents the number of nodes that split samples by the given feature:

  `Weight = Σ_j Σ_k featureSplit_jk`

  where `featureSplit_jk` is the number of features used to split samples in the `k`-th node of the `j`-th tree.

- **Total cover** represents the number of samples passing through a node that splits the entire set by the given feature:

  `TotalCover = Σ_j Σ_k featureSplit_jk * samples_jk`

  where:

  - `featureSplit_jk` is the number of features used to split samples in the `k`-th node of the `j`-th tree
  - `samples_jk` is the number of samples passing through the `k`-th node of the `j`-th tree

- **Cover** represents the average number of samples passing through one split of the given feature:

  `Cover = TotalCover / Weight`

- **Total gain** is the reduction of the loss function of the given feature in the entire model:

  `TotalGain = Σ_j Σ_k featureSplit_jk * (impurity_jk - impurityLeft_jk - impurityRight_jk)`

  where:

  - `featureSplit_jk` is the number of features used to split samples in the `k`-th node of the `j`-th tree
  - `impurityLeft_jk` is the impurity of the left child of the `k`-th node of the `j`-th tree
  - `impurityRight_jk` is the impurity of the right child of the `k`-th node of the `j`-th tree
  - `impurity_jk` is the impurity of the `k`-th node of the `j`-th tree

- **Gain** is the reduction of the loss function by one split for the given feature:

  `Gain = TotalGain / Weight`
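All five measures can be computed in a single pass over per-node split records. The record layout `(feature, samples, impurity, impurity_left, impurity_right)` is a hypothetical one chosen for illustration, not the library's internal representation:

```python
from collections import defaultdict

def variable_importance(trees):
    """Compute the five importance measures from per-node split records.
    Each tree is a list of internal-node tuples:
    (feature, samples, impurity, impurity_left, impurity_right)."""
    weight = defaultdict(int)         # number of splits on the feature
    total_cover = defaultdict(float)  # samples passing through those splits
    total_gain = defaultdict(float)   # impurity reduction over those splits
    for tree in trees:
        for feat, samples, imp, imp_l, imp_r in tree:
            weight[feat] += 1
            total_cover[feat] += samples
            total_gain[feat] += imp - imp_l - imp_r
    cover = {f: total_cover[f] / weight[f] for f in weight}
    gain = {f: total_gain[f] / weight[f] for f in weight}
    return weight, total_cover, cover, total_gain, gain
```

Note that `Cover` and `Gain` are simply the per-split averages of their "total" counterparts, so only the three accumulators need to be tracked.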