Saturday, 11 January 2020

DNN Gradient Descent Backpropagation with Mean Squared Error

Backpropagation (a dynamic-programming technique) computes the gradients of the loss with respect to the weights, and gradient descent then uses those gradients to update the weights. Mean Squared Error (L2) is commonly used as the loss instead of Mean Absolute Error (L1).

MSE = mean[ sum[ (Out - Y)^2 ] ]

Here Out - Y is the delta. Mean squared error is also called mean squared loss; note that it is not the loss that is squared: the deltas are squared, summed over the output nodes, and then averaged. Differentiating with respect to a single output u gives dMSE/du = 2*(u - y)/N, which is exactly the factor that appears at the output layer in the backpropagation step below.
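As a small illustration, the following sketch (assuming NumPy and made-up example values for Out and Y) computes the loss and its gradient exactly as in the formula above:

import numpy as np

# Hypothetical batch of N = 2 samples, each with 2 output nodes.
Out = np.array([[0.8, 0.3],
                [0.2, 0.9]])   # network outputs
Y   = np.array([[1.0, 0.0],
                [0.0, 1.0]])   # expected results (labels)

N = Out.shape[0]
delta = Out - Y                              # Out - Y is the delta
mse = np.mean(np.sum(delta ** 2, axis=1))    # square the deltas, sum over nodes, average over samples
dMSE_dOut = 2 * delta / N                    # gradient with respect to each output u

print(mse)
print(dMSE_dOut)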

The algorithm (a runnable sketch follows the list):
  • Notation:
    • H: Hidden layer output
    • U: Output layer output
    • u: Single node output (at output layer)
    • B: Backpropagation intermediate value
    • f: Activation function
    • fd: Activation function derivative
    • X: Input to network
    • Y: Labels or expected results
    • y: Single node expected result
    • R: Learning rate
    • N: Number of samples (batch size)
    • dot (without arguments): the pre-activation dot product saved from the feedforward pass
  • Feedforward:
    • First hidden layer:
      • H = f(dot(X,W))
    • Other hidden layers:
      • H = f(dot(Hprev,W))
    • Output layer:
      • U = f(dot(Hprev,W))
  • Backpropagate:
    • Output layer:
      • B = 2*(u-y)/N * fd(dot)
    • Hidden layers:
      • B = dot(Bright, Wright^T) * fd(dot)   (Bright, Wright: the B and W of the layer to the right, i.e. the next layer toward the output)
    • Update weights (Optimise after having all B(s))
      • G = B*Hprev   (an outer product: G_ij = Hprev_i * B_j)
      • W -= R*G
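
The following is a minimal runnable sketch of the whole procedure for a network with one hidden layer, assuming NumPy, a sigmoid activation, and small made-up shapes; the names (X, Y, H, U, B, G, R, f, fd) follow the notation above, and the dot in the hidden-layer backpropagation step is written with the transpose of the next layer's weights so the matrix shapes line up.

import numpy as np

def f(z):                      # activation function (sigmoid assumed here)
    return 1.0 / (1.0 + np.exp(-z))

def fd(z):                     # activation function derivative
    s = f(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
X  = rng.random((4, 3))                  # 4 samples, 3 input features
Y  = rng.random((4, 2))                  # 4 samples, 2 output nodes
W1 = rng.standard_normal((3, 5)) * 0.1   # input -> hidden weights
W2 = rng.standard_normal((5, 2)) * 0.1   # hidden -> output weights
R  = 0.1                                 # learning rate
N  = X.shape[0]

for step in range(1000):
    # Feedforward: keep the pre-activation dot products for backpropagation.
    dot1 = X @ W1
    H = f(dot1)                          # hidden layer output
    dot2 = H @ W2
    U = f(dot2)                          # output layer output

    # Backpropagate.
    B2 = 2 * (U - Y) / N * fd(dot2)      # output layer: B = 2*(u-y)/N * fd(dot)
    B1 = (B2 @ W2.T) * fd(dot1)          # hidden layer: B = dot(Bright, Wright^T) * fd(dot)

    # Update weights (optimise after having all B(s)): G = B*Hprev, W -= R*G.
    W2 -= R * (H.T @ B2)
    W1 -= R * (X.T @ B1)

print(np.mean(np.sum((U - Y) ** 2, axis=1)))  # MSE after training

Running the loop, the printed loss is noticeably smaller than at the start, since each step moves the weights against the gradient of the mean squared error.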
