Normalising the X input values is important: when the inputs are much larger than 1, the ReLU activations produce large outputs, a small learning rate can barely change the weight variables, and the loss curve flattens out at a non-zero horizontal line instead of converging towards zero.
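For example, a minimal sketch of the scaling used in the full listing below (plain Python; 21 is the largest coordinate in this data set, so every input ends up in [0,1]):

#Scale each coordinate by the data maximum (21) into [0,1].
X = [[0,0], [0,1], [1,0], [10,10], [20,21]];
for I in range(len(X)):
  X[I][0] = X[I][0]/21;
  X[I][1] = X[I][1]/21;
#end for
print(X); #e.g. [1,0] becomes [0.047..., 0.0]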
Furthermore, the weights must be initialised randomly in the range [-1,1] instead of [0,1], so that the computation can move in both directions (up and down). Each hidden layer (a ReLU layer, as opposed to the output layer, which holds the class probabilities) can then perform one stage of linear separation.
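As a small illustration (a sketch, not part of the original listing): drawing the initial weights uniformly from [-1,1] gives roughly half of them a negative sign, whereas a [0,1] initialisation starts every ReLU unit pushing the same way.

#Sketch: symmetric vs one-sided initialisation for a 2x20 ReLU layer.
import tensorflow as tf;

#Symmetric: signs are mixed, so gradient updates can move
#activations both up and down from the very first step.
Weight_Symmetric = tf.Variable(tf.random_uniform(shape=[2,20], minval=-1, maxval=1));

#One-sided: all weights start positive, which biases every unit
#in the same direction and slows early learning.
Weight_Onesided  = tf.Variable(tf.random_uniform(shape=[2,20], minval=0,  maxval=1));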
Source code:
#libs
import tensorflow as tf;
import matplotlib.pyplot as pyplot;

#data
X = [[0,0], [0,1], [1,0], [10,10], [20,21]];
Y = [[1,0,0], [1,0,0], [1,0,0], [0,1,0], [0,0,1]]; #probabilities
Batch_Size = 5;

#normalise
#BECAUSE RELU IS USED (FOR PERFORMANCE) AND LEARNING RATE IS SMALL (<1),
#DNN WON'T CONVERGE WITHOUT THE FOLLOWING 3 LINES FOR NORMALISING.
for I in range(len(X)):
  X[I][0] = X[I][0]/21;
  X[I][1] = X[I][1]/21;
#end for

#model
Input    = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,2]);
Expected = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,3]); #3 classes

Weight1 = tf.Variable(tf.random_uniform(shape=[2,20], minval=-1, maxval=1));
Bias1   = tf.Variable(tf.random_uniform(shape=[  20], minval=-1, maxval=1));
Hidden1 = tf.nn.relu(tf.matmul(Input,Weight1) + Bias1);

Weight2 = tf.Variable(tf.random_uniform(shape=[20,10], minval=-1, maxval=1));
Bias2   = tf.Variable(tf.random_uniform(shape=[   10], minval=-1, maxval=1));
Hidden2 = tf.nn.relu(tf.matmul(Hidden1,Weight2) + Bias2);

Weight3 = tf.Variable(tf.random_uniform(shape=[10,3], minval=-1, maxval=1));
Bias3   = tf.Variable(tf.random_uniform(shape=[   3], minval=-1, maxval=1));
Output  = tf.sigmoid(tf.matmul(Hidden2,Weight3) + Bias3);

Loss      = tf.reduce_sum(tf.square(Expected-Output));
Optimiser = tf.train.GradientDescentOptimizer(1e-1);
Training  = Optimiser.minimize(Loss);

#train
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

Losses = [];
for I in range(10000):
  if (I%1000==0):
    Lossvalue = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
    Losses   += [Lossvalue];
    print("Loss:",Lossvalue);
  #end if
  Sess.run(Training, feed_dict={Input:X, Expected:Y});
#end for

#result: loss
Lastloss = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
Losses  += [Lastloss];
print("Loss:",Lastloss,"(Last)");

#result: eval
Evalresult = Sess.run(Output, feed_dict={Input:X, Expected:Y});
for I in range(Batch_Size):
  Evalresult[I][0] = round(Evalresult[I][0],3);
  Evalresult[I][1] = round(Evalresult[I][1],3);
  Evalresult[I][2] = round(Evalresult[I][2],3);
#end for
print("Eval:\n"+str(Evalresult));

#result: diagram
print("Loss curve:");
pyplot.plot(Losses);
pyplot.show(); #needed outside notebook environments
#eof
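Note: the listing above targets the TensorFlow 1.x API (tf.placeholder, tf.Session). On a runtime that ships TensorFlow 2.x, the same graph-mode code should still run through the compatibility module; a minimal shim, assuming TF 2.x is installed:

#Run the TF 1.x listing above under TensorFlow 2.x:
import tensorflow.compat.v1 as tf;
tf.disable_eager_execution(); #restore graph mode and placeholders
#tf.placeholder, tf.random_uniform, tf.Session and
#tf.train.GradientDescentOptimizer then work as in TF 1.x.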
Colab link:
https://colab.research.google.com/drive/1BONVbohZicRYmeNy7JUBgvu4mqgiKAOq