# Usage Model: Training and Prediction

## Training

Given a (p+1)-dimensional tensor x of size n_1 × n_2 × ... × n_p × n_{p+1} where each element is a sample, a (p+1)-dimensional tensor y of size n_1 × n_2 × ... × n_p × n_{p+1} where each element is a stated result for the corresponding sample, and a neural network that consists of n layers, the problem is to train the neural network. For more details, see Training and Prediction.

**Forward Computation**

- Provide the neural network with the input data for training. You can provide either one sample or a set of samples. The `batchSize` parameter specifies the number of simultaneously processed samples.
- Compute x_{i+1} = f_i(x_i), where:
  - x_i is the input data for the layer i
  - x_{i+1} is the output value of the layer i
  - f_i() is the function corresponding to the layer i
  - i = 0, ..., n-1 is the index of the layer
- Compute an error as the result of a loss layer: e = f_loss(x_{n-1}, y). For available loss layers, see Layers.

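The forward computation above can be sketched in plain Python. The two-layer network, its weights, and the choice of a softmax cross-entropy loss are all illustrative assumptions, not part of the library's API; the point is the chain x_{i+1} = f_i(x_i) followed by e = f_loss(x_{n-1}, y):

```python
import math

# Hypothetical 2-layer network: a fully-connected layer with ReLU,
# then a fully-connected layer feeding a softmax cross-entropy loss.

def dense(x, w, b):
    # One fully-connected layer: out_j = sum_k w[j][k] * x[k] + b[j]
    return [sum(wj[k] * x[k] for k in range(len(x))) + bj
            for wj, bj in zip(w, b)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax_cross_entropy(scores, label):
    # e = -log(softmax(scores)[label]); shift by max for stability
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[label] / sum(exps))

# Layer functions f_0, f_1 with fixed example weights (assumed values).
w0, b0 = [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1]
w1, b1 = [[1.0, -1.0], [-0.5, 0.5]], [0.0, 0.0]
layers = [lambda x: relu(dense(x, w0, b0)),
          lambda x: dense(x, w1, b1)]

x = [1.0, 2.0]          # one input sample (batchSize = 1)
for f in layers:        # forward computation: x_{i+1} = f_i(x_i)
    x = f(x)
error = softmax_cross_entropy(x, label=0)  # loss layer result e
print(error)
```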

**Backward Computation**

- Compute the input gradient for the penultimate layer as the gradient of the loss layer: grad_n = ∇f_loss(x_{n-1}, y).
- Compute grad_i = ∇f_i(x_i) * grad_{i+1}, where:
  - grad_i is the gradient obtained at the i-th layer
  - grad_{i+1} is the gradient obtained at the (i+1)-th layer
  - i = n-1, ..., 0
- Apply one of the optimization methods to the results of the previous step: compute (w, b) = optimizationSolver(w, b, grad_0), where w = (w_0, w_1, ..., w_{n-1}) and b = (b_0, b_1, ..., b_{n-1}). For available optimization solver algorithms, see Optimization Solvers.
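The backward recurrence grad_i = ∇f_i(x_i) * grad_{i+1} is easiest to see on a chain of scalar layers f_i(x) = w_i * x with a squared-error loss. Plain gradient descent stands in here for the library's optimizationSolver; the layer shapes, loss, and learning rate are all assumptions for illustration:

```python
# One training step on a chain of scalar layers f_i(x) = w_i * x
# with loss f_loss(x, y) = 0.5 * (x - y)^2.
def train_step(weights, x0, y, lr=0.1):
    # forward pass: keep each layer input x_i for the backward pass
    xs = [x0]
    for w in weights:
        xs.append(w * xs[-1])          # x_{i+1} = f_i(x_i)
    # gradient of the loss wrt the last output: x_{n-1} - y
    grad = xs[-1] - y
    # backward pass: i = n-1, ..., 0
    w_grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        w_grads[i] = xs[i] * grad      # d e / d w_i
        grad = weights[i] * grad       # grad_i = ∇f_i(x_i) * grad_{i+1}
    # optimization step (plain SGD standing in for optimizationSolver)
    return [w - lr * g for w, g in zip(weights, w_grads)]

w = train_step([0.5, 2.0], x0=1.0, y=3.0)
print(w)
```

After the step the network output w_0 * w_1 * x0 moves toward the target y, which is all the optimization step is meant to achieve.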

To obtain a model for the prediction stage from the trained model, the library:

- Clones all the forward layers of the training model except the loss layer.
- Replaces the loss layer with the layer returned by the `getLayerForPrediction` method of the forward loss layer. For example, the loss softmax cross-entropy forward layer is replaced with the softmax forward layer.

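A minimal sketch of that conversion, assuming illustrative class and method names rather than the library's actual API: every forward layer is kept, and the loss layer contributes only its prediction counterpart (here, softmax cross-entropy yields softmax):

```python
# Hypothetical layer classes; names are illustrative only.
class SoftmaxLayer:
    name = "softmax"

class SoftmaxCrossEntropyLossLayer:
    name = "softmax_cross_entropy"
    def get_layer_for_prediction(self):
        # the loss wraps a softmax, so prediction keeps only the softmax
        return SoftmaxLayer()

def to_prediction_model(training_layers):
    # clone all forward layers except the loss layer, then append the
    # layer the loss layer designates for prediction
    *forward, loss = training_layers
    return list(forward) + [loss.get_layer_for_prediction()]

model = to_prediction_model(["dense1", "relu", "dense2",
                             SoftmaxCrossEntropyLossLayer()])
names = [getattr(layer, "name", layer) for layer in model]
print(names)
```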

## Prediction

Given a neural network trained as described above (with weights w and biases b) and a new (p+1)-dimensional tensor x of size n_1 × n_2 × ... × n_p × n_{p+1}, the algorithm determines the result for each sample (one of the elements of the tensor y). Unlike the training stage, during prediction the algorithm performs the forward computation only.
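The forward-only prediction stage can be sketched as follows; the one-layer network, its weights, and the argmax decision rule are assumptions for illustration:

```python
# Prediction sketch: run the forward computation only for each sample of
# the new input tensor; the result is the index of the largest output.
def predict(layers, samples):
    results = []
    for x in samples:
        for f in layers:           # x_{i+1} = f_i(x_i); no backward pass
            x = f(x)
        results.append(max(range(len(x)), key=x.__getitem__))
    return results

# toy one-layer network with assumed weights
w = [[1.0, -1.0], [-1.0, 1.0]]
layers = [lambda x: [sum(wi * xi for wi, xi in zip(row, x)) for row in w]]
preds = predict(layers, [[2.0, 1.0], [0.0, 3.0]])
print(preds)
```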