Saturday, 11 January 2020

DNN Gradient Descent Backpropagation with Mean Squared Error

Backpropagation (a dynamic-programming technique) computes the gradients of the loss with respect to the weights, and gradient descent then uses those gradients to update the weights. Mean Squared Error (L2) is commonly used as the loss instead of Mean Absolute Error (L1).

MSE = mean[ sum[ (Out - Y)^2 ] ]

Here Out - Y is the delta. Mean squared error is also called mean squared loss; note that it is not the loss that is squared: the deltas are squared, summed over the output nodes, and then averaged. Differentiating with respect to a single output u gives dMSE/du = 2*(u - y)/N, which is exactly the factor that appears at the output layer in the backpropagation step below.
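As a small illustration, the following sketch (assuming NumPy and made-up example values for Out and Y) computes the loss and its gradient exactly as in the formula above:

import numpy as np

# Hypothetical batch of N = 2 samples, each with 2 output nodes.
Out = np.array([[0.8, 0.3],
                [0.2, 0.9]])   # network outputs
Y   = np.array([[1.0, 0.0],
                [0.0, 1.0]])   # expected results (labels)

N = Out.shape[0]
delta = Out - Y                              # Out - Y is the delta
mse = np.mean(np.sum(delta ** 2, axis=1))    # square the deltas, sum over nodes, average over samples
dMSE_dOut = 2 * delta / N                    # gradient with respect to each output u

print(mse)
print(dMSE_dOut)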

The algorithm (a runnable sketch follows the list):
  • Notation:
    • H: Hidden layer output
    • U: Output layer output
    • u: Single node output (at output layer)
    • B: Backpropagation intermediate value
    • f: Activation function
    • fd: Activation function derivative
    • X: Input to network
    • Y: Labels or expected results
    • y: Single node expected result
    • R: Learning rate
    • N: Number of samples (batch size)
    • dot (without arguments): the pre-activation dot product saved from the feedforward pass
  • Feedforward:
    • First hidden layer:
      • H = f(dot(X,W))
    • Other hidden layers:
      • H = f(dot(Hprev,W))
    • Output layer:
      • U = f(dot(Hprev,W))
  • Backpropagate:
    • Output layer:
      • B = 2*(u-y)/N * fd(dot)
    • Hidden layers:
      • B = dot(Bright, Wright^T) * fd(dot)   (Bright, Wright: the B and W of the layer to the right, i.e. the next layer toward the output)
    • Update weights (Optimise after having all B(s))
      • G = B*Hprev   (an outer product: G_ij = Hprev_i * B_j)
      • W -= R*G
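
The following is a minimal runnable sketch of the whole procedure for a network with one hidden layer, assuming NumPy, a sigmoid activation, and small made-up shapes; the names (X, Y, H, U, B, G, R, f, fd) follow the notation above, and the dot in the hidden-layer backpropagation step is written with the transpose of the next layer's weights so the matrix shapes line up.

import numpy as np

def f(z):                      # activation function (sigmoid assumed here)
    return 1.0 / (1.0 + np.exp(-z))

def fd(z):                     # activation function derivative
    s = f(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
X  = rng.random((4, 3))                  # 4 samples, 3 input features
Y  = rng.random((4, 2))                  # 4 samples, 2 output nodes
W1 = rng.standard_normal((3, 5)) * 0.1   # input -> hidden weights
W2 = rng.standard_normal((5, 2)) * 0.1   # hidden -> output weights
R  = 0.1                                 # learning rate
N  = X.shape[0]

for step in range(1000):
    # Feedforward: keep the pre-activation dot products for backpropagation.
    dot1 = X @ W1
    H = f(dot1)                          # hidden layer output
    dot2 = H @ W2
    U = f(dot2)                          # output layer output

    # Backpropagate.
    B2 = 2 * (U - Y) / N * fd(dot2)      # output layer: B = 2*(u-y)/N * fd(dot)
    B1 = (B2 @ W2.T) * fd(dot1)          # hidden layer: B = dot(Bright, Wright^T) * fd(dot)

    # Update weights (optimise after having all B(s)): G = B*Hprev, W -= R*G.
    W2 -= R * (H.T @ B2)
    W1 -= R * (X.T @ B1)

print(np.mean(np.sum((U - Y) ** 2, axis=1)))  # MSE after training

Running the loop, the printed loss is noticeably smaller than at the start, since each step moves the weights against the gradient of the mean squared error.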
