Abivin: Case Study: Relation between Normalisation, ReLU, and Learning Rate

ReLU is the usual choice in modern ANNs because it performs much better in practice than sigmoid-like activation functions. The learning rate is typically chosen in the range (0,1), i.e. each update changes the variables by only a fraction of the gradient.
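To make the role of the learning rate concrete, here is a minimal sketch of a plain gradient-descent update on a toy one-variable loss. The loss `(w - target)**2`, the function names, and the target value 3.0 are illustrative assumptions, not part of the network in this post.

```python
def loss_gradient(w, target=3.0):
    """Gradient of the toy loss (w - target)**2 with respect to w."""
    return 2.0 * (w - target)


def gradient_step(w, grad, learning_rate):
    """One plain gradient-descent update: move against the gradient,
    scaled down by the learning rate."""
    return w - learning_rate * grad


w = 0.0
for _ in range(50):
    w = gradient_step(w, loss_gradient(w), learning_rate=0.1)

print("w after 50 steps:", w)  # ends up close to the target 3.0
```

With a learning rate of 0.1, each step covers only part of the remaining distance to the minimum, which is the sense in which the variables change "by a fraction" per update.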

Normalising the input values X is important. When the X values are much larger than 1, the ReLU activations, which are unbounded above, become large as well. A small learning rate then fails to change the weight variables effectively, and the loss curve flattens into a horizontal line above zero.
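One plausible mechanism behind this, sketched below in plain Python, is that large ReLU outputs push the sigmoid output layer into saturation, where its gradient is almost zero. The single-path network and the weights 0.8 and 0.9 are hypothetical values chosen for illustration; they are not taken from the trained model.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output_gradient(x, w_hidden=0.8, w_out=0.9):
    """Derivative of the sigmoid output with respect to its pre-activation,
    for one ReLU unit feeding one sigmoid unit (illustrative weights)."""
    hidden = relu(w_hidden * x)   # unbounded above, grows with x
    s = sigmoid(w_out * hidden)
    return s * (1.0 - s)          # close to 0 when the sigmoid saturates


raw_x = 20.0
normalised_x = raw_x / 21.0       # same divisor as in the source code of this post

print("raw input gradient:       ", output_gradient(raw_x))
print("normalised input gradient:", output_gradient(normalised_x))
```

Under these assumptions the raw input yields a gradient several orders of magnitude smaller than the normalised one, so a learning rate below 1 multiplies an already tiny number and the weights barely move.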

Furthermore, the weights should be initialised randomly in the range [-1,1] rather than [0,1], so that the weighted sums can move in both directions (up and down). Each hidden layer (a ReLU layer, as opposed to the output layer, which holds the class probabilities) should be able to perform one linear separation.
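The effect of the initialisation range can be checked with a small sketch. It counts how often a first-layer pre-activation is negative, i.e. how often ReLU actually clips, for the normalised inputs of this post. The helper name, the seed, and the use of Python's `random` module instead of `tf.random_uniform` are assumptions made to keep the example self-contained.

```python
import random


def negative_preactivation_fraction(low, high, inputs, units=20, seed=0):
    """Fraction of (sample, unit) pairs whose pre-activation is below zero
    when weights and biases are drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    weights = [[rng.uniform(low, high) for _ in range(units)] for _ in range(2)]
    biases = [rng.uniform(low, high) for _ in range(units)]
    clipped = 0
    for x in inputs:
        for j in range(units):
            z = x[0] * weights[0][j] + x[1] * weights[1][j] + biases[j]
            if z < 0:
                clipped += 1
    return clipped / (len(inputs) * units)


# The inputs of this post after dividing by 21 (all non-negative).
inputs = [[0, 0], [0, 1 / 21], [1 / 21, 0], [10 / 21, 10 / 21], [20 / 21, 1.0]]

print("[0,1] :", negative_preactivation_fraction(0.0, 1.0, inputs))
print("[-1,1]:", negative_preactivation_fraction(-1.0, 1.0, inputs))
```

With non-negative inputs and parameters drawn from [0,1], every pre-activation is non-negative, so the first fraction is 0.0 and ReLU initially passes everything through unchanged, leaving the layer linear. Drawing from [-1,1] puts some units on the clipped side from the start, which gives the layer a non-linearity to work with.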

Source code:

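#libs
#tf.placeholder and tf.Session are TensorFlow 1.x APIs; on TensorFlow 2.x use:
#  import tensorflow.compat.v1 as tf; tf.disable_v2_behavior();
import tensorflow        as tf;
import matplotlib.pyplot as pyplot;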

#data
X = [[0,0],   [0,1],   [1,0],   [10,10], [20,21]];
Y = [[1,0,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1]]; #probabilities
Batch_Size = 5;

#normalise
#BECAUSE RELU IS USED (FOR PERFORMANCE) AND LEARNING RATE IS SMALL (<1),
#DNN WON'T CONVERGE WITHOUT THE FOLLOWING 3 LINES FOR NORMALISING.
for I in range(len(X)):
  X[I][0] = X[I][0]/21;
  X[I][1] = X[I][1]/21;
#end for

#model
Input     = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,2]);
Expected  = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,3]); #3 classes

Weight1   = tf.Variable(tf.random_uniform(shape=[2,20], minval=-1, maxval=1));
Bias1     = tf.Variable(tf.random_uniform(shape=[  20], minval=-1, maxval=1));
Hidden1   = tf.nn.relu(tf.matmul(Input,Weight1) + Bias1);

Weight2   = tf.Variable(tf.random_uniform(shape=[20,10], minval=-1, maxval=1));
Bias2     = tf.Variable(tf.random_uniform(shape=[   10], minval=-1, maxval=1));
Hidden2   = tf.nn.relu(tf.matmul(Hidden1,Weight2) + Bias2);

Weight3   = tf.Variable(tf.random_uniform(shape=[10,3], minval=-1, maxval=1));
Bias3     = tf.Variable(tf.random_uniform(shape=[   3], minval=-1, maxval=1));
Output    = tf.sigmoid(tf.matmul(Hidden2,Weight3) + Bias3);

Loss      = tf.reduce_sum(tf.square(Expected-Output));
Optimiser = tf.train.GradientDescentOptimizer(1e-1);
Training  = Optimiser.minimize(Loss);

#train
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

Losses = [];
for I in range(10000):
  if (I%1000==0):
    Lossvalue = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
    Losses += [Lossvalue];
    print("Loss:",Lossvalue);
  #end if
  
  Sess.run(Training, feed_dict={Input:X, Expected:Y});
#end for

#result: loss
Lastloss = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
Losses  += [Lastloss];
print("Loss:",Lastloss,"(Last)");

#result: eval
Evalresult = Sess.run(Output, feed_dict={Input:X, Expected:Y});
for I in range(Batch_Size):
  Evalresult[I][0] = round(Evalresult[I][0],3);  
  Evalresult[I][1] = round(Evalresult[I][1],3);
  Evalresult[I][2] = round(Evalresult[I][2],3);
#end for
print("Eval:\n"+str(Evalresult));

#result: diagram
print("Loss curve:");
pyplot.plot(Losses);
#eof

Colab link:
https://colab.research.google.com/drive/1BONVbohZicRYmeNy7JUBgvu4mqgiKAOq

Thursday, 5 September 2019
