Softmax Backward Layer

For any $x_{i_1 \ldots i_p}$ from $X \in \mathbb{R}^{n_1 \times \ldots \times n_p}$ and for dimension $k$ of size $n_k$, the softmax activation layer applies the transform defined as

$$y_{i_1 \ldots i_k \ldots i_p} = \frac{\exp\left(x_{i_1 \ldots i_k \ldots i_p}\right)}{\sum_{j=1}^{n_k} \exp\left(x_{i_1 \ldots j \ldots i_p}\right)}$$

The softmax function is also known as the normalized exponential (see [Bishop2006] for an exact definition of softmax).
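To make the transform concrete, here is a minimal NumPy sketch of the forward softmax along a chosen dimension $k$. The function name and the max-subtraction trick for numerical stability are illustrative choices, not part of the layer specification above.

```python
import numpy as np

def softmax_forward(x: np.ndarray, k: int) -> np.ndarray:
    """Apply softmax along dimension k of a p-dimensional tensor x."""
    # Subtract the per-slice maximum for numerical stability;
    # this does not change the value of the transform.
    shifted = x - x.max(axis=k, keepdims=True)
    e = np.exp(shifted)
    # Normalize by the sum over dimension k, as in the definition above.
    return e / e.sum(axis=k, keepdims=True)
```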

The backward softmax layer for dimension $k$ of size $n_k$ computes the value

$$z_{i_1 \ldots i_k \ldots i_p} = y_{i_1 \ldots i_k \ldots i_p} \left( g_{i_1 \ldots i_k \ldots i_p} - \sum_{j=1}^{n_k} g_{i_1 \ldots j \ldots i_p}\, y_{i_1 \ldots j \ldots i_p} \right)$$

where $g_{i_1 \ldots i_p}$ is the input gradient computed on the preceding layer.

Problem Statement

Given $p$-dimensional tensors of size $n_1 \times n_2 \times \ldots \times n_p$:

  • $G = (g_{i_1 \ldots i_p})$ with the gradient computed on the preceding layer

  • $Y = (y_{i_1 \ldots i_p})$ with the output of the forward softmax layer

The problem is to compute the $p$-dimensional tensor $Z = (z_{i_1 \ldots i_p})$ of size $n_1 \times n_2 \times \ldots \times n_p$ such that:

$$z_{i_1 \ldots i_k \ldots i_p} = y_{i_1 \ldots i_k \ldots i_p} \left( g_{i_1 \ldots i_k \ldots i_p} - \sum_{j=1}^{n_k} g_{i_1 \ldots j \ldots i_p}\, y_{i_1 \ldots j \ldots i_p} \right)$$
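A minimal NumPy sketch of this computation follows; the function name and the example shapes are assumptions for illustration, and $Y$ is assumed to have been produced by the forward softmax along the same dimension $k$.

```python
import numpy as np

def softmax_backward(g: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Compute Z per the formula above: z = y * (g - sum_j g_j * y_j over dim k)."""
    # Inner product of G and Y over dimension k, kept broadcastable.
    inner = (g * y).sum(axis=k, keepdims=True)
    return y * (g - inner)

# Example usage with illustrative shapes: a batch of 4 vectors of length 10,
# softmax taken along dimension k = 1.
rng = np.random.default_rng(0)
g = rng.random((4, 10))
x = rng.random((4, 10))
y = np.exp(x) / np.exp(x).sum(axis=1, keepdims=True)  # forward softmax output
z = softmax_backward(g, y, k=1)
```

Computing the inner term $\sum_j g_j y_j$ once per slice and broadcasting it avoids materializing the full $n_k \times n_k$ softmax Jacobian for each slice.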
