Developer Guide

Usage Model: Training and Prediction

Training

Given a (p+1)-dimensional tensor x of size n_1 x n_2 x ... x n_p x n_{p+1}, where each element is a sample, a (p+1)-dimensional tensor y of size n_1 x n_2 x ... x n_p x n_{p+1}, where each element is a stated result for the corresponding sample, and a neural network that consists of n layers, the problem is to train the neural network. For more details, see Training and Prediction.
Intel DAAL supports only supervised learning with a known vector of class labels.
The key mechanism used to train a neural network is a backward propagation of errors [Rumelhart86]. During the training stage the algorithm performs forward and backward computations.
The training stage consists of one or several epochs. An epoch is the time interval when the network processes the entire input data set performing several forward passes, backward passes, and updates of weights and biases in the neural network model.
Each epoch consists of several iterations. An iteration is the time interval when the network performs one forward and one backward pass using a part of the input data set called a batch. At each iteration, the optimization solver performs an optimization step and updates weights and biases in the model.
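For illustration, the loop structure implied by this description can be sketched in C++. The sketch below is not Intel DAAL code: the sample counts and the forwardPass, backwardPass, and updateModel names are hypothetical placeholders that only show how epochs, iterations, and batches relate to each other.

    #include <cstddef>
    #include <iostream>

    int main()
    {
        const std::size_t nSamples  = 1000;  // size of the entire input data set
        const std::size_t batchSize = 100;   // samples processed simultaneously in one iteration
        const std::size_t nEpochs   = 5;     // full passes over the input data set

        // Hypothetical placeholders for the real computations:
        auto forwardPass  = [](std::size_t firstSample) { (void)firstSample; }; // x_{i+1} = f_i(x_i), loss
        auto backwardPass = []() {};                                            // gradients for every layer
        auto updateModel  = []() {};                                            // optimization solver step on w, b

        for (std::size_t epoch = 0; epoch < nEpochs; ++epoch)                   // one epoch
        {
            for (std::size_t first = 0; first < nSamples; first += batchSize)   // one iteration per batch
            {
                forwardPass(first);
                backwardPass();
                updateModel();
            }
            std::cout << "epoch " << epoch << ": "
                      << nSamples / batchSize << " iterations\n";
        }
        return 0;
    }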
Forward Computation
Follow these steps:
  1. Provide the neural network with the input data for training. You can provide either one sample or a set of samples. The batchSize parameter specifies the number of simultaneously processed samples.
  2. Compute x_{i+1} = f_i(x_i), where:
    • x_i is the input data for the layer i
    • x_{i+1} is the output value of the layer i
    • f_i(x) is the function corresponding to the layer i
    • i = 0, ..., n-1 is the index of the layer
    For some layers, the computation can also use weights w and biases b. For more details, see Layers.
  3. Compute an error as the result of a loss layer: e = f_loss(x_{n-1}, y). For available loss layers, see Layers.
In the descriptions of specific forward layers in the Layers section, the preceding layer for the layer i is the layer i-1.
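As an illustration of the forward computation, the following self-contained C++ sketch treats each layer as a scalar function f_i and uses a squared error as a stand-in for the loss layer; real layers operate on tensors and may also use weights and biases. It is not Intel DAAL code, and all values are made up for the example.

    #include <functional>
    #include <iostream>
    #include <vector>

    int main()
    {
        // f_0, ..., f_{n-1}: the functions corresponding to the layers
        std::vector<std::function<double(double)>> f = {
            [](double x) { return 2.0 * x + 1.0; },      // a fully-connected-style layer (uses w and b)
            [](double x) { return x > 0.0 ? x : 0.0; }   // a ReLU-style activation layer
        };

        // Forward pass: x_{i+1} = f_i(x_i) for i = 0, ..., n-1
        double x = 0.5;                                  // x_0: input data for the first layer
        for (const auto& layer : f) x = layer(x);

        // Error as the result of a loss layer: e = f_loss(x_{n-1}, y)
        const double y = 1.5;                            // stated result for the sample
        const double e = 0.5 * (x - y) * (x - y);        // squared error used as a stand-in loss
        std::cout << "output = " << x << ", error = " << e << "\n";
        return 0;
    }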
Backward Computation
Follow these steps:
  1. Compute the input gradient for the penultimate layer as the gradient of the loss layer: grad_n = ∇f_loss(x_{n-1}, y).
  2. Compute grad_i = ∇f_i(x_i) * grad_{i+1}, where:
    • grad_i is the gradient obtained at the i-th layer
    • grad_{i+1} is the gradient obtained at the (i+1)-th layer
    • i = n-1, ..., 0
    For some layers, the computation can also use weights w and biases b. For more details, see Layers.
  3. Apply one of the optimization methods to the results of the previous step. Compute w, b = optimizationSolver(w, b, grad_0), where w = (w_0, w_1, ..., w_{n-1}), b = (b_0, b_1, ..., b_{n-1}). For available optimization solver algorithms, see Optimization Solvers.
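The following self-contained C++ sketch illustrates one backward pass and one optimization step for a two-layer scalar model (a linear layer followed by ReLU) with a squared-error loss; a plain gradient-descent update stands in for the optimization solver. It is not Intel DAAL code, and the layer choice, values, and learning rate are assumptions made for the example.

    #include <iostream>

    int main()
    {
        double w = 0.3, b = 0.1;              // weights and biases of the linear layer
        const double x0 = 0.5, y = 1.5;       // one input sample and its stated result
        const double learningRate = 0.1;      // assumption made for the example

        for (int iteration = 0; iteration < 3; ++iteration)
        {
            // Forward computation
            const double x1 = w * x0 + b;                 // linear layer
            const double x2 = x1 > 0.0 ? x1 : 0.0;        // ReLU layer
            const double e  = 0.5 * (x2 - y) * (x2 - y);  // squared-error loss

            // Backward computation: start from the gradient of the loss layer
            const double gradLoss = x2 - y;                          // gradient of the loss w.r.t. the output
            const double grad1 = (x1 > 0.0 ? 1.0 : 0.0) * gradLoss;  // ReLU gradient times gradient from the next layer
            const double gradW = grad1 * x0;                         // gradient w.r.t. the weight
            const double gradB = grad1;                              // gradient w.r.t. the bias

            // Optimization step standing in for the optimization solver: update w and b
            w -= learningRate * gradW;
            b -= learningRate * gradB;

            std::cout << "iteration " << iteration << ": error = " << e << "\n";
        }
        return 0;
    }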
As a result of the training stage, you receive the trained model with the optimum set of weights and biases. Use the getPredictionModel method to get the model you can use at the prediction stage. This method performs the following steps to produce the prediction model from the training model:
  1. Clones all the forward layers of the training model except the loss layer.
  2. Replaces the loss layer with the layer returned by the getLayerForPrediction method of the forward loss layer. For example, the loss softmax cross-entropy forward layer is replaced with the softmax forward layer.
In the descriptions of specific backward layers in the Layers section, the preceding layer for the layer i is the layer i+1.
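The replacement of the loss layer can be illustrated with a small C++ sketch: during training the last layer combines softmax with a cross-entropy loss, while the prediction model keeps only the softmax part. The sketch shows the idea only and is not the Intel DAAL implementation of getPredictionModel; the score and label values are made up for the example.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Softmax over a vector of scores (numerically stabilized).
    std::vector<double> softmax(const std::vector<double>& scores)
    {
        const double maxScore = *std::max_element(scores.begin(), scores.end());
        std::vector<double> p(scores.size());
        double sum = 0.0;
        for (std::size_t i = 0; i < scores.size(); ++i) sum += (p[i] = std::exp(scores[i] - maxScore));
        for (double& v : p) v /= sum;
        return p;
    }

    int main()
    {
        const std::vector<double> scores = {1.0, 2.0, 0.5};  // output of the last hidden layer (made-up values)
        const std::size_t label = 1;                          // known class label, used during training only

        // Training: the loss softmax cross-entropy layer combines softmax with the cross-entropy loss
        const std::vector<double> probs = softmax(scores);
        const double loss = -std::log(probs[label]);

        // Prediction: the loss layer is replaced by the softmax layer; no label is needed
        const auto predicted = std::max_element(probs.begin(), probs.end()) - probs.begin();

        std::cout << "training loss = " << loss << ", predicted class = " << predicted << "\n";
        return 0;
    }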

Prediction

Given the trained network (with the optimum set of weights w and biases b) and a new (p+1)-dimensional tensor x of size n_1 x n_2 x ... x n_p x n_{p+1}, the algorithm determines the result for each sample (one of the elements of the tensor y). Unlike the training stage, during prediction the algorithm performs the forward computation only.
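As an illustration, the following self-contained C++ sketch performs the forward computation only, using already trained weights and biases to produce one result per sample. The scalar samples and the two-layer model are assumptions made for brevity; this is not Intel DAAL code.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main()
    {
        const double w = 2.8, b = 0.1;                        // trained weights and biases
        const std::vector<double> samples = {0.2, 0.5, 0.9};  // new input data, one scalar value per sample

        std::vector<double> y(samples.size());                // result for each sample
        for (std::size_t i = 0; i < samples.size(); ++i)
        {
            const double x1 = w * samples[i] + b;             // linear layer
            y[i] = x1 > 0.0 ? x1 : 0.0;                       // ReLU layer; no loss layer at prediction
        }

        for (std::size_t i = 0; i < y.size(); ++i)
            std::cout << "sample " << i << " -> " << y[i] << "\n";
        return 0;
    }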
