Developer Guide

Details

Given:
  • $n$ feature vectors $X = \{x_1 = (x_{11}, \ldots, x_{1p}), \ldots, x_n = (x_{n1}, \ldots, x_{np})\}$ of size $p$
  • their non-negative sample weights $w = \{w_1, \ldots, w_n\}$
  • the vector of responses $y = (y_1, \ldots, y_n)$
The problem is to build a decision forest regression model that minimizes the mean squared error (MSE) between the predicted and true values.

Training Stage

Decision forest regression follows the algorithmic framework of the decision forest training algorithm, using the mean squared error (MSE) as the impurity metric [Breiman84]. If sample weights are provided as input, the library uses a weighted version of the algorithm.
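MSE is an impurity metric calculated as follows, where $D$ is the set of observations that reach the node:
  • Without sample weights:
    $I_{\mathrm{MSE}}(D) = \frac{1}{W(D)} \sum_{i \in D} \left( y_i - \frac{1}{W(D)} \sum_{j \in D} y_j \right)^2$, where $W(S) = \sum_{s \in S} 1$, which is equivalent to the number of elements in $S$
  • With sample weights:
    $I_{\mathrm{MSE}}(D) = \frac{1}{W(D)} \sum_{i \in D} w_i \left( y_i - \frac{1}{W(D)} \sum_{j \in D} w_j y_j \right)^2$, where $W(S) = \sum_{s \in S} w_s$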
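
The MSE impurity of a node can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `mse_impurity` is hypothetical and is not part of the library's API, and the responses and weights are assumed to be plain Python sequences. The unweighted case is treated as the weighted case with every weight equal to 1, so that $W(D)$ becomes the number of observations in the node.

```python
from typing import Optional, Sequence


def mse_impurity(y: Sequence[float], w: Optional[Sequence[float]] = None) -> float:
    """MSE impurity of the responses y of the observations that reach a node.

    w holds the non-negative sample weights. None means "no sample weights",
    which is handled as a weight of 1 for every observation.
    (Hypothetical helper for illustration, not a library function.)
    """
    if w is None:
        w = [1.0] * len(y)
    total = sum(w)  # W(D): sum of weights, or the element count when unweighted
    if total == 0:
        raise ValueError("the weights in a node must not sum to zero")
    # Weighted mean of the responses in the node.
    mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted mean of the squared deviations from that mean.
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y)) / total


unweighted = mse_impurity([1.0, 2.0, 3.0])
weighted = mse_impurity([1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
print(unweighted, weighted)
```

In the second call the middle observation has weight 0, so it affects neither the mean nor the impurity, which is the behavior expected of a weighted impurity.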

Prediction Stage

Given a decision forest regression model and vectors $x_1, \ldots, x_r$, the problem is to calculate the responses for those vectors. For each query vector $x_i$, the algorithm finds the leaf node that $x_i$ reaches in each tree of the forest; the response of that tree is the mean of the dependent variables in that leaf. The forest predicts the response as the mean of the responses from the trees.
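
The prediction step can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Node` structure, the function names `predict_tree` and `predict_forest`, and the split rule "go left when the feature value is less than or equal to the threshold" are hypothetical and do not describe the library's internal representation. Each leaf is assumed to already store the mean of the dependent variables that reached it during training.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    # Leaf node: `value` is set and `feature` is None.
    # Split node: `feature`, `threshold`, `left`, and `right` are set.
    value: Optional[float] = None
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict_tree(root: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf and return the mean stored in that leaf."""
    node = root
    while node.feature is not None:
        # Assumed split rule: values <= threshold go to the left child.
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def predict_forest(trees: Sequence[Node], x: Sequence[float]) -> float:
    """Forest response: the mean of the responses of the individual trees."""
    return sum(predict_tree(tree, x) for tree in trees) / len(trees)


# Two tiny hand-built trees: one with a single split, one that is a single leaf.
trees = [
    Node(feature=0, threshold=0.5, left=Node(value=1.0), right=Node(value=3.0)),
    Node(value=2.0),
]
print(predict_forest(trees, [0.2]))
print(predict_forest(trees, [0.9]))
```

For the query `[0.2]` the first tree answers from its left leaf and the second from its only leaf, and the forest response is the average of the two.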

Out-of-bag Error

Decision forest regression follows the algorithmic framework for calculating the decision forest out-of-bag (OOB) error. The out-of-bag predictions of all trees are aggregated and the OOB error of the decision forest is calculated as follows:
  • For each vector $x_i$ in the dataset $X$, predict its response $\hat{y}_i$ as the mean of the predictions from the trees that contain $x_i$ in their OOB set.
  • Calculate the OOB error of the decision forest $T$ as the mean squared error (MSE) over the set $D'$ of vectors that appear in the OOB set of at least one tree: $\mathrm{OOB}(T) = \frac{1}{|D'|} \sum_{x_i \in D'} (y_i - \hat{y}_i)^2$.
  • If the OOB error value for each observation is required, calculate the prediction error for $x_i$: $\mathrm{OOB}(x_i) = (y_i - \hat{y}_i)^2$.
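
The OOB aggregation can be sketched in Python as follows. The function name `oob_error` and its inputs are hypothetical: per-tree predictions are assumed to be precomputed lists, and each tree's OOB set is assumed to be a set of row indices. The sketch also assumes that the per-observation error is the squared difference between the true response and the averaged OOB prediction, and that vectors that were in-bag for every tree are left out of the forest-level mean.

```python
from typing import List, Optional, Sequence, Set, Tuple


def oob_error(
    y: Sequence[float],
    tree_predictions: Sequence[Sequence[float]],
    oob_sets: Sequence[Set[int]],
) -> Tuple[float, List[Optional[float]]]:
    """Return (forest OOB error, per-observation OOB errors).

    y[i]                   true response of vector i
    tree_predictions[b][i] prediction of tree b for vector i
    oob_sets[b]            indices of the vectors that are out-of-bag for tree b
    A per-observation entry is None when the vector is in no tree's OOB set.
    """
    per_observation: List[Optional[float]] = []
    for i, y_true in enumerate(y):
        # Use only the trees that did not see vector i during training.
        preds = [tree_predictions[b][i] for b, oob in enumerate(oob_sets) if i in oob]
        if not preds:
            per_observation.append(None)
            continue
        y_hat = sum(preds) / len(preds)  # mean of the OOB predictions
        per_observation.append((y_true - y_hat) ** 2)

    scored = [err for err in per_observation if err is not None]
    if not scored:
        raise ValueError("no vector is out-of-bag for any tree")
    return sum(scored) / len(scored), per_observation


y = [1.0, 2.0, 4.0]
tree_predictions = [[1.5, 3.0, 4.0], [2.0, 2.0, 0.0]]
oob_sets = [{0, 1}, {1}]  # vector 2 was in-bag for both trees
error, per_observation = oob_error(y, tree_predictions, oob_sets)
print(error, per_observation)
```

Vector 1 is out-of-bag for both trees, so its prediction is the average of both tree outputs, while vector 2 receives no OOB prediction and does not contribute to the forest-level error.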
