For an input tensor X ∈ R^(n_1 × n_2 × ... × n_k × ... × n_p), selected dimension k of size n_k, and ground truth tensor T ∈ R^(n_1 × n_2 × ... × 1 × ... × n_p), the forward loss softmax cross-entropy layer computes a one-dimensional tensor with the cross-entropy value. For more details, see Forward Loss Softmax Cross-entropy Layer.
The backward loss softmax cross-entropy layer computes the gradient values z_m = s_m − δ_m, where s_m are the probabilities computed by the forward layer and δ_m are indicator functions derived from t_m, the ground truth values computed on the preceding layer.
- The p-dimensional tensor T = (t_{j_1...j_k...j_p}) ∈ R^(n_1 × n_2 × ... × 1 × ... × n_p) that contains ground truths, where each t_{j_1...j_p} is the index of the ground truth class along dimension k
- The p-dimensional tensor S = (s_{j_1...j_k...j_p}) ∈ R^(n_1 × n_2 × ... × n_k × ... × n_p) with the probabilities that the sample j_1...j_p corresponds to the ground truth t_{j_1...j_p}
The problem is to compute the p-dimensional tensor Z = (z_{j_1...j_k...j_p}) ∈ R^(n_1 × n_2 × ... × n_k × ... × n_p) such that:

z_{j_1...j_k...j_p} = s_{j_1...j_k...j_p} − δ_{j_1...j_k...j_p},

where δ_{j_1...j_k...j_p} = 1 if j_k equals the ground truth t_{j_1...j_p}, and 0 otherwise.
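The computation above (z = s − δ, with δ a one-hot indicator of the ground truth class along dimension k) can be sketched in NumPy as follows. The function name and argument order are illustrative only, not the library API:

```python
import numpy as np

def softmax_xent_backward(s, t, k):
    """Hypothetical sketch of the backward loss softmax cross-entropy step.

    s: probabilities from the forward layer, shape n_1 x ... x n_k x ... x n_p
    t: ground truth class indices, shape n_1 x ... x 1 x ... x n_p
    k: the selected dimension of size n_k
    Returns z = s - delta, where delta is 1 at the ground truth index
    along dimension k and 0 elsewhere.
    """
    delta = np.zeros_like(s)
    # Place a 1 at the ground-truth position along dimension k for each sample.
    np.put_along_axis(delta, t.astype(int), 1.0, axis=k)
    return s - delta
```

For example, with a batch of two samples over three classes (k = 1), each gradient row is the probability row with 1 subtracted at the ground truth index, so it sums to zero when the probabilities sum to one.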