Developer Guide and Reference

  • 2021.1
  • 12/04/2020
  • Public Content

Decision Forest Classification and Regression (DF)

Decision Forest (DF) classification and regression algorithms are based on an ensemble of tree-structured classifiers known as decision trees. A decision forest is built using the general technique of bagging (bootstrap aggregation) combined with a random choice of features. A decision tree is a binary tree graph. Its internal (split) nodes represent a decision function used to select the child node at the prediction stage. Its leaf (terminal) nodes represent the corresponding response values, which are the result of the prediction from the tree. For more details, see [Breiman84] and [Breiman2001].

Mathematical formulation

Training
Refer to Decision Forest.
Training method: Dense
In the dense training method, all possible split variants for each feature (from the subset of features selected for the current node) are evaluated to find the best split.
Training method: Hist
In the hist training method (also called “inexact” or “histogram”), only a selected subset of splits is considered when computing the best split. This subset of splits is computed for each feature at the initialization stage of the algorithm. After the subset of splits is computed, each value in the initially provided data is replaced with the value of the corresponding bin. Bins are continuous intervals between the selected splits.
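The binning step of the hist method can be sketched as follows. This is a minimal, self-contained illustration rather than the library's implementation: the helper names `compute_bin_edges` and `bin_index`, and the choice of equally spaced order statistics as split candidates, are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: pick up to (max_bins - 1) split candidates for one
// feature by taking equally spaced order statistics of the sorted values.
std::vector<double> compute_bin_edges(std::vector<double> values, std::size_t max_bins) {
    assert(max_bins > 1 && !values.empty());
    std::sort(values.begin(), values.end());
    std::vector<double> edges;
    for (std::size_t b = 1; b < max_bins; ++b) {
        const std::size_t pos = b * values.size() / max_bins;
        const double candidate = values[pos];
        // Skip repeated candidates so that the edges are strictly increasing.
        if (edges.empty() || candidate > edges.back()) {
            edges.push_back(candidate);
        }
    }
    return edges;
}

// Hypothetical helper: map a raw value to the index of its bin.
// Bin k covers [edges[k-1], edges[k]); the first bin is unbounded below
// and the last bin is unbounded above.
std::size_t bin_index(const std::vector<double>& edges, double value) {
    return static_cast<std::size_t>(
        std::upper_bound(edges.begin(), edges.end(), value) - edges.begin());
}
```

With the values 1 through 8 and `max_bins = 4`, this sketch selects the edges 3, 5 and 7, so split search only has to examine three candidate thresholds instead of every distinct value.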
Inference methods: Dense and Hist
The dense and hist inference methods perform prediction in the same way:
  1. For classification, $y_i \in \{0, \ldots, C-1\}$, where $C$ is the number of classes, the tree ensemble model predicts the output by selecting the response $y$ that is voted for by the majority of the trees in the forest.
  2. For regression, the tree ensemble model uses the mean of the results of $M$ functions to predict the output, i.e. $y(x) = \frac{1}{M} \sum_{k=1}^{M} f_k(x)$, $x \in \mathbb{R}^p$, where $f_k \in F$ and $F = \{ f(x) = W_{q(x)},\ q: \mathbb{R}^p \to \{1, \ldots, T\},\ W \in \mathbb{R}^T \}$ is a set of regression trees, $W$ is a set of tree leaves’ scores and $T$ is the number of leaves in the tree. In other words, each tree maps an observation to the corresponding leaf’s score.
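The two aggregation rules above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's code: the function names `majority_vote` and `mean_prediction` are hypothetical, and ties in the vote are resolved in favor of the lowest class index, which the text does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Classification: each tree votes for one class in {0, ..., C-1};
// the class with the most votes is returned.
// Assumption of this sketch: ties go to the lowest class index.
std::int64_t majority_vote(const std::vector<std::int64_t>& tree_votes,
                           std::int64_t class_count) {
    assert(class_count > 0 && !tree_votes.empty());
    std::vector<std::int64_t> counts(static_cast<std::size_t>(class_count), 0);
    for (std::int64_t vote : tree_votes) {
        assert(vote >= 0 && vote < class_count);
        ++counts[static_cast<std::size_t>(vote)];
    }
    std::size_t best = 0;
    for (std::size_t c = 1; c < counts.size(); ++c) {
        if (counts[c] > counts[best]) {
            best = c;
        }
    }
    return static_cast<std::int64_t>(best);
}

// Regression: y(x) = (1/M) * sum_k f_k(x), where tree_outputs[k] holds f_k(x).
double mean_prediction(const std::vector<double>& tree_outputs) {
    assert(!tree_outputs.empty());
    double sum = 0.0;
    for (double value : tree_outputs) {
        sum += value;
    }
    return sum / static_cast<double>(tree_outputs.size());
}
```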

Programming Interface

All types and functions in this section shall be declared in the oneapi::dal::decision_forest namespace and be available via inclusion of the oneapi/dal/algo/decision_forest.hpp header file.
Enum classes
enum class error_metric_mode
error_metric_mode::none
Do not compute error metric.
error_metric_mode::out_of_bag_error
Train produces a $1 \times 1$ table with the cumulative prediction error for out-of-bag observations.
error_metric_mode::out_of_bag_error_per_observation
Train produces an $n \times 1$ table with the prediction error for each out-of-bag observation, where $n$ is the number of observations in the training set.
enum class variable_importance_mode
variable_importance_mode::none
Do not compute variable importance.
variable_importance_mode::mdi
Mean Decrease Impurity. Computed as the sum of weighted impurity decreases for all nodes where the variable is used, averaged over all trees in the forest.
variable_importance_mode::mda_raw
Mean Decrease Accuracy (permutation importance). For each tree, the prediction error on the out-of-bag portion of the data is computed (error rate for classification, MSE for regression). The same is done after permuting each predictor variable. The differences between the two errors are then averaged over all trees.
variable_importance_mode::mda_scaled
Mean Decrease Accuracy (permutation importance). This is the mda_raw value scaled by its standard deviation.
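The averaging behind mda_raw can be illustrated with a short sketch. It assumes the per-tree out-of-bag errors have already been measured; the function name `mda_raw` and the sign convention (permuted error minus original error) are assumptions of this example, not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Raw permutation importance of a single variable.
// oob_error[t]          : out-of-bag error of tree t on the original data
// oob_error_permuted[t] : out-of-bag error of tree t after permuting the variable
// Returns the per-tree error increase averaged over all trees.
double mda_raw(const std::vector<double>& oob_error,
               const std::vector<double>& oob_error_permuted) {
    assert(!oob_error.empty() && oob_error.size() == oob_error_permuted.size());
    double sum = 0.0;
    for (std::size_t t = 0; t < oob_error.size(); ++t) {
        sum += oob_error_permuted[t] - oob_error[t];
    }
    return sum / static_cast<double>(oob_error.size());
}
```

A variable whose permutation leaves the error unchanged in every tree gets an importance of zero under this convention.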
enum class infer_mode
infer_mode::class_labels
Infer produces an $n \times 1$ table with the predicted labels, where $n$ is the number of observations.
infer_mode::class_probabilities
Infer produces an $n \times C$ table with the predicted class probabilities for each observation, where $C$ is the number of classes.
enum class voting_mode
voting_mode::weighted
The final prediction is obtained by weighted majority voting.
voting_mode::unweighted
The final prediction is obtained by simple majority voting.
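The difference between the two voting modes can be sketched as follows. The text does not define the weights, so this example assumes that in weighted voting each tree contributes its per-class probability estimates, while in unweighted voting each tree contributes one vote for its most probable class. The function `vote` and the input layout are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// tree_probs[t][c]: probability that tree t assigns to class c
// (hypothetical input layout, chosen for this sketch).
std::size_t vote(const std::vector<std::vector<double>>& tree_probs, bool weighted) {
    assert(!tree_probs.empty() && !tree_probs.front().empty());
    const std::size_t class_count = tree_probs.front().size();
    std::vector<double> score(class_count, 0.0);
    for (const auto& probs : tree_probs) {
        assert(probs.size() == class_count);
        if (weighted) {
            // Assumed weighting: accumulate the full probability vector.
            for (std::size_t c = 0; c < class_count; ++c) {
                score[c] += probs[c];
            }
        } else {
            // Simple majority: one vote for the tree's most probable class.
            const auto top = std::max_element(probs.begin(), probs.end());
            score[static_cast<std::size_t>(top - probs.begin())] += 1.0;
        }
    }
    const auto best = std::max_element(score.begin(), score.end());
    return static_cast<std::size_t>(best - score.begin());
}
```

For the three trees {0.9, 0.1}, {0.4, 0.6} and {0.45, 0.55}, simple voting picks class 1 (two votes to one), whereas the assumed weighted scheme picks class 0 because one confident tree outweighs two marginal ones.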
Descriptor
template<typename Task = task::by_default> class descriptor_base
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
descriptor_base()
Creates a new instance of the class with the default property values.
Properties
double observations_per_tree_fraction = 1.0
The fraction of observations used to build each tree.
Getter & Setter


double get_observations_per_tree_fraction() const

Invariants


observations_per_tree_fraction > 0.0
observations_per_tree_fraction <= 1.0

double impurity_threshold = 0.0
The impurity threshold. A node is split if the split induces a decrease of the impurity greater than or equal to this value.
Getter & Setter


double get_impurity_threshold() const

Invariants


impurity_threshold >= 0.0

double min_weight_fraction_in_leaf_node = 0.0
The minimum weighted fraction of the total sum of weights (of all the input observations) required at a leaf node.
Getter & Setter


double get_min_weight_fraction_in_leaf_node() const

Invariants


min_weight_fraction_in_leaf_node >= 0.0
min_weight_fraction_in_leaf_node <= 0.5

double min_impurity_decrease_in_split_node = 0.0
The minimum impurity decrease in a split node, used as a threshold for early stopping in tree growth. A node is split if its impurity is above the threshold; otherwise it becomes a leaf.
Getter & Setter


double get_min_impurity_decrease_in_split_node() const

Invariants


min_impurity_decrease_in_split_node >= 0.0

std::int64_t tree_count = 100
The number of trees in the forest.
Getter & Setter


std::int64_t get_tree_count() const

Invariants


tree_count > 0

std::int64_t features_per_node = task::classification ? sqrt(p) : p/3, where p is the total number of features
The number of features to consider when looking for the best split for a node.
Getter & Setter


std::int64_t get_features_per_node() const
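The default above is simple arithmetic on the feature count $p$. The sketch below is illustrative only: the enum `task_kind` is a hypothetical stand-in for the task tags, and rounding down with a lower bound of one feature is an assumption, since the text does not state how fractional results are handled.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in for task::classification / task::regression.
enum class task_kind { classification, regression };

// Default number of features examined at each node: sqrt(p) for
// classification, p / 3 for regression.
// Assumptions of this sketch: results are rounded down and never fall below 1.
std::int64_t default_features_per_node(task_kind task, std::int64_t p) {
    assert(p > 0);
    const std::int64_t value =
        (task == task_kind::classification)
            ? static_cast<std::int64_t>(std::sqrt(static_cast<double>(p)))
            : p / 3;
    return value > 0 ? value : 1;
}
```

For a dataset with 16 features this gives 4 features per node for classification and 5 for regression.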

std::int64_t max_tree_depth = 0
The maximal depth of the tree. If 0, nodes are expanded until all leaves are pure or until every leaf contains no more than min_observations_in_leaf_node observations.
Getter & Setter


std::int64_t get_max_tree_depth() const

std::int64_t min_observations_in_leaf_node = task::classification ? 1 : 5
The minimal number of observations in a leaf node.
Getter & Setter


std::int64_t get_min_observations_in_leaf_node() const

Invariants


min_observations_in_leaf_node > 0

std::int64_t min_observations_in_split_node = 2
The minimal number of observations in a split node.
Getter & Setter


std::int64_t get_min_observations_in_split_node() const

Invariants


min_observations_in_split_node > 1

std::int64_t max_leaf_nodes = 0
The maximal number of leaf nodes. If 0, the number of leaf nodes is not limited.
Getter & Setter


std::int64_t get_max_leaf_nodes() const

std::int64_t max_bins = 256
The maximal number of discrete bins used to bucket continuous features. Used with the method::hist split finding method only. Increasing the number results in higher computation costs.
Getter & Setter


std::int64_t get_max_bins() const

Invariants


max_bins > 1

std::int64_t min_bin_size = 5
The minimal number of observations in a bin. Used with the method::hist split finding method only.
Getter & Setter


std::int64_t get_min_bin_size() const

Invariants


min_bin_size > 0

bool memory_saving_mode = false
The memory saving (but slower) mode.
Getter & Setter


bool get_memory_saving_mode() const

bool bootstrap = true
The bootstrap mode. If true, the training set for each tree is a bootstrap sample of the whole training set. If false, the whole dataset is used to build each tree.
Getter & Setter


bool get_bootstrap() const
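How a per-tree training sample and its out-of-bag complement might be drawn is sketched below. This is an illustration, not the library's sampling code: the names `tree_sample` and `draw_sample` are hypothetical, and the sample size is assumed to be the fraction times the row count, rounded down and at least one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

struct tree_sample {
    std::vector<std::size_t> in_bag;      // row indices used to train the tree
    std::vector<std::size_t> out_of_bag;  // row indices never drawn for the tree
};

// With bootstrap == true, rows are drawn uniformly with replacement;
// otherwise every row is used once and nothing is left out of bag.
tree_sample draw_sample(std::size_t row_count, double fraction, bool bootstrap,
                        std::mt19937& rng) {
    assert(row_count > 0 && fraction > 0.0 && fraction <= 1.0);
    tree_sample sample;
    std::vector<bool> used(row_count, false);
    if (bootstrap) {
        // Assumed sample size: floor(fraction * row_count), but at least 1.
        const std::size_t draws = std::max<std::size_t>(
            1, static_cast<std::size_t>(fraction * static_cast<double>(row_count)));
        std::uniform_int_distribution<std::size_t> pick(0, row_count - 1);
        for (std::size_t i = 0; i < draws; ++i) {
            const std::size_t row = pick(rng);
            sample.in_bag.push_back(row);
            used[row] = true;
        }
    } else {
        for (std::size_t row = 0; row < row_count; ++row) {
            sample.in_bag.push_back(row);
            used[row] = true;
        }
    }
    for (std::size_t row = 0; row < row_count; ++row) {
        if (!used[row]) {
            sample.out_of_bag.push_back(row);
        }
    }
    return sample;
}
```

The out-of-bag rows are the ones the out-of-bag error metrics and the permutation importance modes are evaluated on, which is why those outputs are only meaningful when some rows are left out of each tree's sample.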

error_metric_mode error_metric_mode = error_metric_mode::none
The error metric mode.
Getter & Setter


error_metric_mode get_error_metric_mode() const

variable_importance_mode variable_importance_mode = variable_importance_mode::none
The variable importance mode.
Getter & Setter


variable_importance_mode get_variable_importance_mode() const

std::int64_t class_count = 2
The class count. Used with task::classification only.
Getter & Setter


template<typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const

infer_mode infer_mode
The infer mode. Used with task::classification only.
Getter & Setter


template<typename T = Task, typename None = detail::enable_if_classification_t<T>> infer_mode get_infer_mode() const

voting_mode voting_mode
The voting mode. Used with task::classification only.
Getter & Setter


template<typename T = Task, typename None = detail::enable_if_classification_t<T>> voting_mode get_voting_mode() const

template<typename Float = detail::descriptor_base<>::float_t, typename Method = detail::descriptor_base<>::method_t, typename Task = detail::descriptor_base<>::task_t> class descriptor
Template Parameters
  • Float – The floating-point type that the algorithm uses for intermediate computations. Can be float or double.
  • Method – Tag-type that specifies an implementation of the algorithm. Can be method::dense or method::hist.
  • Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Public Methods
auto & set_observations_per_tree_fraction(double value)
auto & set_impurity_threshold(double value)
auto & set_min_weight_fraction_in_leaf_node(double value)
auto & set_min_impurity_decrease_in_split_node(double value)
auto & set_tree_count(std::int64_t value)
auto & set_features_per_node(std::int64_t value)
auto & set_max_tree_depth(std::int64_t value)
auto & set_min_observations_in_leaf_node(std::int64_t value)
auto & set_min_observations_in_split_node(std::int64_t value)
auto & set_max_leaf_nodes(std::int64_t value)
auto & set_max_bins(std::int64_t value)
auto & set_min_bin_size(std::int64_t value)
auto & set_error_metric_mode(error_metric_mode value)
auto & set_memory_saving_mode(bool value)
auto & set_bootstrap(bool value)
auto & set_variable_importance_mode(variable_importance_mode value)
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_class_count(std::int64_t value)
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_infer_mode(infer_mode value)
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_voting_mode(voting_mode value)
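Because every setter returns a reference to the descriptor, calls can be chained. The sketch below shows that pattern on a small stand-in class so that it compiles without the library. `forest_params` is hypothetical and is not the oneDAL descriptor; it mirrors three of the documented properties with their defaults and invariants, and reporting a violated invariant through `std::domain_error` is an assumption of this example.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in that mimics the chained-setter style of the descriptor.
class forest_params {
public:
    auto& set_tree_count(std::int64_t value) {
        if (value <= 0) throw std::domain_error("tree_count must be > 0");
        tree_count_ = value;
        return *this;
    }
    auto& set_max_bins(std::int64_t value) {
        if (value <= 1) throw std::domain_error("max_bins must be > 1");
        max_bins_ = value;
        return *this;
    }
    auto& set_observations_per_tree_fraction(double value) {
        if (!(value > 0.0 && value <= 1.0))
            throw std::domain_error("observations_per_tree_fraction must be in (0, 1]");
        observations_per_tree_fraction_ = value;
        return *this;
    }
    std::int64_t get_tree_count() const { return tree_count_; }
    std::int64_t get_max_bins() const { return max_bins_; }
    double get_observations_per_tree_fraction() const {
        return observations_per_tree_fraction_;
    }

private:
    std::int64_t tree_count_ = 100;  // documented default
    std::int64_t max_bins_ = 256;    // documented default
    double observations_per_tree_fraction_ = 1.0;  // documented default
};

// Usage: configure several properties in one expression, then check that
// a value violating an invariant is rejected and leaves the state unchanged.
void example_usage() {
    forest_params params;
    params.set_tree_count(50).set_max_bins(64).set_observations_per_tree_fraction(0.8);
    assert(params.get_tree_count() == 50);
    bool rejected = false;
    try {
        params.set_max_bins(1);
    } catch (const std::domain_error&) {
        rejected = true;
    }
    assert(rejected && params.get_max_bins() == 64);
}
```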
Method tags
struct dense
Tag-type that denotes the dense computational method.
struct hist
Tag-type that denotes the hist computational method.
using by_default = dense
Alias tag-type for the dense computational method.
Task tags
struct classification
Tag-type that parameterizes entities used for solving a classification problem.
struct regression
Tag-type that parameterizes entities used for solving a regression problem.
using by_default = classification
Alias tag-type for the classification task.
Model
template<typename Task = task::by_default> class model
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
model()
Creates a new instance of the class with the default property values.
Properties
std::int64_t tree_count = 100
The number of trees in the forest.
Getter & Setter


std::int64_t get_tree_count() const

Invariants


tree_count > 0

std::int64_t class_count = 2
The class count. Used with task::classification only.
Getter & Setter


template<typename T = Task, typename None = detail::enable_if_classification_t<T>> std::int64_t get_class_count() const

Training
train(...)
Input
template<typename Task = task::by_default> class train_input
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
train_input(const table &data, const table &labels)
Creates a new instance of the class with the given data and labels property values.
Properties
const table & data = table{}
The training set $X$.
Getter & Setter


const table & get_data() const
auto & set_data(const table &value)

const table & labels = table{}
Vector of labels $y$ for the training set $X$.
Getter & Setter


const table & get_labels() const
auto & set_labels(const table &value)

Result
template<typename Task = task::by_default> class train_result
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
train_result()
Creates a new instance of the class with the default property values.
Properties
const model<Task> & model = model<Task>{}
The trained Decision Forest model.
Getter & Setter


const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

const table & oob_err = table{}
A $1 \times 1$ table containing the cumulative out-of-bag error value. Computed when error_metric_mode is set to error_metric_mode::out_of_bag_error.
Getter & Setter


const table & get_oob_err() const
auto & set_oob_err(const table &value)

const table & oob_err_per_observation = table{}
An $n \times 1$ table containing the out-of-bag error value for each of the $n$ observations. Computed when error_metric_mode is set to error_metric_mode::out_of_bag_error_per_observation.
Getter & Setter


const table & get_oob_err_per_observation() const
auto & set_oob_err_per_observation(const table &value)

const table & var_importance = table{}
A $1 \times p$ table containing the variable importance value for each of the $p$ features. Computed when variable_importance_mode != variable_importance_mode::none.
Getter & Setter


const table & get_var_importance() const
auto & set_var_importance(const table &value)

Inference
infer(...)
Input
template<typename Task = task::by_default> class infer_input
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
infer_input(const model<Task> &trained_model, const table &data)
Creates a new instance of the class with the given model and data property values.
Properties
const model<Task> & model = model<Task>{}
The trained Decision Forest model.
Getter & Setter


const model< Task > & get_model() const
auto & set_model(const model< Task > &value)

const table & data = table{}
The dataset for inference $X'$.
Getter & Setter


const table & get_data() const
auto & set_data(const table &value)

Result
template<typename Task = task::by_default> class infer_result
Template Parameters
Task – Tag-type that specifies the type of the problem to solve. Can be task::classification or task::regression.
Constructors
infer_result()
Creates a new instance of the class with the default property values.
Properties
const table & labels = table{}
The $n \times 1$ table with the predicted labels.
Getter & Setter


const table & get_labels() const
auto & set_labels(const table &value)

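const table & probabilities
An $n \times C$ table with the predicted class probabilities for each observation, where $C$ is the number of classes.
Getter & Setter


template<typename T = Task, typename None = detail::enable_if_classification_t<T>> const table & get_probabilities() const
template<typename T = Task, typename None = detail::enable_if_classification_t<T>> auto & set_probabilities(const table &value)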
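The relation between the two result tables can be sketched by reducing each row of class probabilities to the index of its largest entry. This is only an illustration: the function `labels_from_probabilities` is hypothetical, the probabilities are assumed to be stored row by row in a flat array, and whether the library's labels always coincide with this argmax (for example under a different voting mode) is not stated in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// probs holds one row of class_count probabilities per observation,
// stored row by row. Returns one label per observation: the index of the
// most probable class (the first such index in case of a tie).
std::vector<std::int64_t> labels_from_probabilities(const std::vector<double>& probs,
                                                    std::size_t class_count) {
    assert(class_count > 0 && probs.size() % class_count == 0);
    const std::size_t row_count = probs.size() / class_count;
    std::vector<std::int64_t> labels(row_count);
    for (std::size_t i = 0; i < row_count; ++i) {
        const auto row = probs.begin() + static_cast<std::ptrdiff_t>(i * class_count);
        const auto last = row + static_cast<std::ptrdiff_t>(class_count);
        labels[i] = static_cast<std::int64_t>(std::max_element(row, last) - row);
    }
    return labels;
}
```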

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.