Friday, 27 September 2019

Sample TensorFlow 2 DNN: Tensor Maths Style

TensorFlow 2 is slightly different from TensorFlow 1: eager mode is now the default, and everything is a value instead of an op plus a value. The output of any `tf.*` function is now a concrete value.
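
For example, a call like tf.add now evaluates immediately and returns a concrete tensor (a quick check in any TF 2 environment):

import tensorflow as tf;
A = tf.add(1,2);  #no graph, no session: runs eagerly
print(A);         #tf.Tensor(3, shape=(), dtype=int32)
print(A.numpy()); #3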

Source code:
#!pip install tensorflow==2.0.0rc2
#%tensorflow_version 2.x
%reset -f

#libs
import tensorflow as tf;

#data
X = [[0,0],[0,1],[1,0],[1,1]];
Y = [[0],  [1],  [1],  [0]  ];
X = tf.convert_to_tensor(X,tf.float32);
Y = tf.convert_to_tensor(Y,tf.float32);

#model
W1 = tf.Variable(tf.random.uniform([2,20],-1,1));
B1 = tf.Variable(tf.random.uniform([  20],-1,1));

W2 = tf.Variable(tf.random.uniform([20,2],-1,1));
B2 = tf.Variable(tf.random.uniform([   2],-1,1));

@tf.function
def feedforward(X):
  H1  = tf.nn.leaky_relu(tf.matmul(X,W1) + B1);
  Out = tf.sigmoid(tf.matmul(H1,W2) + B2);
  return Out;
#end def

#train
Optim = tf.keras.optimizers.SGD(1e-1);
Steps = 1000;

for I in range(Steps):
  if I%(Steps/10)==0:
    Out  = feedforward(X);
    Loss = tf.reduce_sum(tf.square(Y-Out));
    print("Loss:",Loss.numpy());
  #end if

  with tf.GradientTape() as T:
    Out  = feedforward(X);
    Loss = tf.reduce_sum(tf.square(Y-Out));
  #end with

  Grads = T.gradient(Loss,[W1,B1,W2,B2]);
  Optim.apply_gradients(zip(Grads,[W1,B1,W2,B2]));
#end for

Out  = feedforward(X);
Loss = tf.reduce_sum(tf.square(Y-Out));
print("Loss:",Loss.numpy(),"(Last)");

print("\nDone.");
#eof

Thursday, 26 September 2019

Move from TensorFlow 1 to TensorFlow 2

TensorFlow was released as a beta in 2015, and the first official release, 1.0, came in early 2017. Now, in 2019, TensorFlow is at 2.0 with a release candidate out.

Why switch to TF 2:
  • TensorFlow 1 will eventually be phased out, although that might take many years as too many people are already using 1.x
  • TensorFlow 2 Python is more similar to TensorFlow in C++
  • TensorFlow 2 makes tensor maths just like regular numeric maths, with eager mode as the default and only mode.
What to change in TF 1 code:
  • Bear in mind that there is only eager execution mode; no more sessions
  • Replace session graphs with the @tf.function annotation, or just call tf.function to make a graph
  • No more tf.placeholder, Session.run, or feed_dict; just use tf.convert_to_tensor to convert regular numeric data to tensors before calling TensorFlow ops.
  • tf.random_uniform is now tf.random.uniform
  • Optimisers in tf.train are all now in tf.keras.optimizers, and there is no more GD (tf.train.GradientDescentOptimizer); use SGD (tf.keras.optimizers.SGD) instead.
  • No more magical Someoptimiser.minimize; use these together (see the sketch after this list):
    • tf.GradientTape
    • tf.GradientTape.gradient
    • Someoptimiser.apply_gradients
  • More to look into...
Bottom line: TF 2 Python is good, more like maths, more similar to C/C++.
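
A minimal before/after sketch of the training-step change (variable names follow the DNN sample above; this is an illustration, not a full programme):

#TF 1 style: build a graph op, run it in a session
#Training = tf.train.GradientDescentOptimizer(1e-1).minimize(Loss);
#Sess.run(Training, feed_dict=Feed);

#TF 2 style: record with a tape, apply gradients explicitly
Optim = tf.keras.optimizers.SGD(1e-1);
with tf.GradientTape() as T:
  Out  = feedforward(X);
  Loss = tf.reduce_sum(tf.square(Y-Out));
Grads = T.gradient(Loss,[W1,B1,W2,B2]);
Optim.apply_gradients(zip(Grads,[W1,B1,W2,B2]));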

TensorFlow: Get Gradients of Any Function Using Auto-Differentiation

Gradient-based optimisers need the derivatives of activation functions, and the partial derivatives of the loss function, to optimise and minimise loss. TensorFlow has tf.GradientTape to record operations on watched tensors and compute the gradient value dy/dx.

Source code:
#ipython
%reset -f

#libs
import tensorflow as tf;
import numpy      as np;

#init
tf.enable_eager_execution();

#code
X = tf.convert_to_tensor([-3,-2,-1,0,1,2,3],dtype=tf.float32);

with tf.GradientTape() as T:
  T.watch(X);
  Y = Fx = X**2 + X + 1;

Dy_Dx = T.gradient(Y,X);
print(Dy_Dx);
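#dy/dx of x^2 + x + 1 is 2x + 1, so the expected gradients are
#[-5,-3,-1,1,3,5,7] for X = [-3,-2,-1,0,1,2,3]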

print("\nDone.");
#eof

Tuesday, 24 September 2019

TensorFlow RNN: Text Generation Using LSTM

Text generation can be done with a DNN by feeding in a sequence of words, where the class is the next word. However, an RNN with a time dimension does much better at this job, although basic RNN neurons make the network face the vanishing gradients problem; an LSTM network solves this problem. The following source code learns to generate a few sentences.

Source code:
import tensorflow as tf;
tf.reset_default_graph();

#data
'''
t0       t1       t2
british  gray     is  cat
0        1        2   (3)  <=x
1        2        3        <=y
white    samoyed  is  dog
4        5        2   (6)  <=x
5        2        6        <=y 
'''
Bsize = 2;
Times = 3;
Max_X = 5;
Max_Y = 6;

X = [[[0],[1],[2]], [[4],[5],[2]]];
Y = [[[1],[2],[3]], [[5],[2],[6]]];

#normalise
for I in range(len(X)):
  for J in range(len(X[I])):
    X[I][J][0] /= Max_X;

for I in range(len(Y)):
  for J in range(len(Y[I])):
    Y[I][J][0] /= Max_Y;

#model
Input    = tf.placeholder(tf.float32, [Bsize,Times,1]);
Expected = tf.placeholder(tf.float32, [Bsize,Times,1]);

#single LSTM layer
'''
Layer1   = tf.keras.layers.LSTM(20);
Hidden1  = Layer1(Input);
'''

#multi LSTM layers
#'''
Layers = tf.keras.layers.RNN([
  tf.keras.layers.LSTMCell(30), #hidden 1
  tf.keras.layers.LSTMCell(20)  #hidden 2
],
return_sequences=True);
Hidden2 = Layers(Input);
#'''

Weight3  = tf.Variable(tf.random_uniform([20,1], -1,1));
Bias3    = tf.Variable(tf.random_uniform([   1], -1,1));
Output   = tf.sigmoid(tf.matmul(Hidden2,Weight3) + Bias3); #sequence of 2d * 2d

Loss     = tf.reduce_sum(tf.square(Expected-Output));
Optim    = tf.train.GradientDescentOptimizer(1e-1);
Training = Optim.minimize(Loss);

#train
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

Feed   = {Input:X, Expected:Y};
Epochs = 10000;

for I in range(Epochs): #number of feeds, 1 feed = 1 batch
  if I%(Epochs/10)==0: 
    Lossvalue = Sess.run(Loss,Feed);
    print("Loss:",Lossvalue);
  #end if

  Sess.run(Training,Feed);
#end for

Lastloss = Sess.run(Loss,Feed);
print("Loss:",Lastloss,"(Last)");

#eval
Results = Sess.run(Output,Feed).tolist();
print("\nEval:");
for I in range(len(Results)):
  for J in range(len(Results[I])):
    for K in range(len(Results[I][J])):
      Results[I][J][K] = round(Results[I][J][K]*Max_Y);
#end for i      
print(Results);
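
#to map the rounded indices back to words, a vocabulary list can be kept
#alongside the index encoding (Vocab below is an assumption read off the
#data comment above, not part of the original code):
Vocab = ["british","gray","is","cat","white","samoyed","dog"];
for Seq in Results:
  print([Vocab[Word[0]] for Word in Seq]);
#end for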

print("\nDone.");
#eof

Colab link:
https://colab.research.google.com/drive/1C4jZfMb0YLoLWDj5T6gecjjJHjrYpdJc

Friday, 20 September 2019

TensorFlow RNN: Text Classification

TensorFlow has RNN features which are similar to Keras. An RNN can be used to do classification or generation; it classifies best on sequential data with similarity over time. The following example shows how to classify text with a single LSTM layer or with multiple LSTM layers. An LSTM layer is better than a basic RNN layer, because a basic RNN layer faces the vanishing gradients problem: backpropagation through a large time dimension behaves just like backpropagation through a too-deep network.

Source code:
import tensorflow as tf;
tf.reset_default_graph();

#data
'''
t0      t1      t2
british gray    is => cat (y=0)
0       1       2
white   samoyed is => dog (y=1)
3       4       2 
'''
Bsize = 2;
Times = 3;
Max_X = 4;
Max_Y = 1;

X = [[[0],[1],[2]], [[3],[4],[2]]];
Y = [[0],           [1]          ];

#normalise
for I in range(len(X)):
  for J in range(len(X[I])):
    X[I][J][0] /= Max_X;

for I in range(len(Y)):
  Y[I][0] /= Max_Y;

#model
Inputs   = tf.placeholder(tf.float32, [Bsize,Times,1]);
Expected = tf.placeholder(tf.float32, [Bsize,      1]);

#single LSTM layer
#'''
Layer1   = tf.keras.layers.LSTM(20);
Hidden1  = Layer1(Inputs);
#'''

#multi LSTM layers
'''
Layers = tf.keras.layers.RNN([
  tf.keras.layers.LSTMCell(30), #hidden 1
  tf.keras.layers.LSTMCell(20)  #hidden 2
]);
Hidden2 = Layers(Inputs);
'''

Weight3  = tf.Variable(tf.random_uniform([20,1], -1,1));
Bias3    = tf.Variable(tf.random_uniform([   1], -1,1));
Output   = tf.sigmoid(tf.matmul(Hidden1,Weight3) + Bias3);

Loss     = tf.reduce_sum(tf.square(Expected-Output));
Optim    = tf.train.GradientDescentOptimizer(1e-1);
Training = Optim.minimize(Loss);

#train
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

Feed = {Inputs:X, Expected:Y};
for I in range(1000): #number of feeds, 1 feed = 1 batch
  if I%100==0: 
    Lossvalue = Sess.run(Loss,Feed);
    print("Loss:",Lossvalue);
  #end if

  Sess.run(Training,Feed);
#end for

Lastloss = Sess.run(Loss,Feed);
print("Loss:",Lastloss,"(Last)");

#eval
Results = Sess.run(Output,Feed);
print("\nEval:");
print(Results);
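#sigmoid outputs near 0 indicate cat (y=0), near 1 indicate dog (y=1)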

print("\nDone.");
#eof

Colab link:
https://colab.research.google.com/drive/1_TRH5ZshDApJC6JRHdRJAiBWLw3f4wRF

Tuesday, 17 September 2019

Guide to DNN: Learn a Text Sequence

DNN regression can be used to learn a sequence of numbers or a sequence of words (a text). The following example learns a simple sentence.

Source code:
#libs
import tensorflow        as tf;
import matplotlib.pyplot as pyplot;

#data
Text = "the quick brown fox jumps over the lazy dog";
Data = [0,  1,    2,    3,  4,    5,   0,  6,   7];

#make training data
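#sliding window: each 3 consecutive word indices predict the next index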
X = [];
Y = [];
Max   = 7; #x & y
Bsize = 6; #batch size, take the whole epoch.

for I in range(len(Data)-3):
  X += [Data[I:I+3]];
  Y += [[Data[I+3]]];
#end for
print("X =",X);
print("Y =",Y);

#normalise
for I in range(len(X)):
  Y[I][0] /= Max;
  for J in range(len(X[0])):
    X[I][J] /= Max;
#end for

#model
Input     = tf.placeholder(dtype=tf.float32, shape=[Bsize,3]);
Expected  = tf.placeholder(dtype=tf.float32, shape=[Bsize,1]);

Weight1   = tf.Variable(tf.random_uniform(shape=[3,20], minval=-1, maxval=1));
Bias1     = tf.Variable(tf.random_uniform(shape=[  20], minval=-1, maxval=1));
Hidden1   = tf.nn.leaky_relu(tf.matmul(Input,Weight1) + Bias1);

Weight2   = tf.Variable(tf.random_uniform(shape=[20,1], minval=-1, maxval=1));
Bias2     = tf.Variable(tf.random_uniform(shape=[   1], minval=-1, maxval=1));
Output    = tf.sigmoid(tf.matmul(Hidden1,Weight2) + Bias2);

Loss      = tf.reduce_sum(tf.square(Expected-Output));
Optimiser = tf.train.GradientDescentOptimizer(1e-1);
Training  = Optimiser.minimize(Loss);

#training
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

Epochs = 10000;
Losses = [];
Feed   = {Input:X, Expected:Y};

for I in range(Epochs):
  if (I%(Epochs/10)==0):
    Lossvalue = Sess.run(Loss, feed_dict=Feed);
    Losses   += [Lossvalue];
    print("Loss:",Lossvalue);
  #end if

  Sess.run(Training, feed_dict=Feed);
#end for

Lastloss = Sess.run(Loss, feed_dict=Feed);
Losses  += [Lastloss];
print("Loss:",Lastloss,"(last)");

#eval
print("\nEval:");
Evalresults = Sess.run(Output, feed_dict=Feed).tolist();
for I in range(len(Evalresults)): Evalresults[I][0] = round(Evalresults[I][0]*Max);
print(Evalresults);
Sess.close();

print("\nDone.");
#eof

Colab link:
https://colab.research.google.com/drive/1caY8GUts-BOUjl-uQ1WXRrO94wx_n8uT

Friday, 13 September 2019

Case Study: Avoid Dead ReLU Units, Exploding Gradients, and Vanishing Gradients

The previous blog article (https://blog.abivin.vn/2019/09/case-study-separate-heavily-mixed-up-ys.html) is about separating heavily mixed-up classes. However, it uses sigmoid as the activation, which slows the computation down; not in the matrix multiplication itself, and only a bit overall, since the number of activations is much smaller than the number of multiplications inside the matmuls.

Another matter is that sigmoid is not useful when the network is very deep, as sigmoid-like activation functions face the vanishing gradients problem.

To achieve better performance, ReLU activation should be used; but ReLU has the dead ReLU units problem, so Leaky ReLU is better.
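
As a reminder, Leaky ReLU keeps a small slope on the negative side so units cannot die completely; tf.nn.leaky_relu uses alpha=0.2 by default (a quick eager-mode check, not part of the session-style code below):

#leaky_relu(x) = x        if x > 0
#              = alpha*x  if x <= 0
print(tf.nn.leaky_relu([-2.0, 3.0]).numpy());  #[-0.4  3. ]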

There is still one more problem: ReLU and Leaky ReLU may drive all outputs to their max values when the network is too deep, i.e. unnecessarily deep. Thus reducing the number of layers is important to make Leaky ReLU work as expected.

Training data & separation:

Source code:
#core
import time,os,sys;

#libs
import tensorflow        as tf;
import matplotlib.pyplot as pyplot;

#exit
def exit():
  #os._exit(1);
  sys.exit();
#end def

#mockup to emphasize value name
def units(Num):
  return Num;
#end def

#PROGRAMME ENTRY POINT==========================================================
#data
#https://i.imgur.com/uVOxZR7.png
X = [[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3],[4,1],[4,2],[4,3],[5,1],[6,1]];
Y = [[0],  [1],  [0],  [1],  [0],  [1],  [0],  [2],  [1],  [1],  [1],  [0],  [0],  [1]  ];
Max_X      = 6;
Max_Y      = 2;
Batch_Size = 14;

#convert Y to probabilities P
In   = tf.placeholder(dtype=tf.int32, shape=[Batch_Size]);
Out  = tf.one_hot(In, depth=Max_Y+1);
Temp = [];
for I in range(len(Y)): Temp+=[Y[I][0]];

Sess = tf.Session();
P = Sess.run(Out, feed_dict={In:Temp}).tolist();
Sess.close();
#print(P);

#normalise
for I in range(len(X)):
  X[I][0] /= Max_X;
  X[I][1] /= Max_X;
  Y[I][0] /= Max_Y; #unused when using probs
#end for

#model
Input     = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,2]);
#regress:
#Expected  = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,1]);
#probs:
Expected  = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,Max_Y+1]);

#SIGMOID WORKS BUT SLOW. USE LEAKY_RELU FOR PERFORMANCE, BUT REDUCE NUMBER OF 
#LAYERS TO AVOID LOSS STUCK AS ALL OUTPUT PROBS ARE 1, EXPLODING GRADIENTS?
activation_fn = tf.nn.leaky_relu;
#activation_fn = tf.sigmoid;
'''
#1
Weight1   = tf.Variable(tf.random_uniform(shape=[2,units(60)], minval=-1, maxval=1));
Bias1     = tf.Variable(tf.random_uniform(shape=[  units(60)], minval=-1, maxval=1));
Hidden1   = activation_fn(tf.matmul(Input,Weight1) + Bias1);

#2
Weight2   = tf.Variable(tf.random_uniform(shape=[60,units(50)], minval=-1, maxval=1));
Bias2     = tf.Variable(tf.random_uniform(shape=[   units(50)], minval=-1, maxval=1));
Hidden2   = activation_fn(tf.matmul(Hidden1,Weight2) + Bias2);

#3
Weight3   = tf.Variable(tf.random_uniform(shape=[50,units(40)], minval=-1, maxval=1));
Bias3     = tf.Variable(tf.random_uniform(shape=[   units(40)], minval=-1, maxval=1));
Hidden3   = activation_fn(tf.matmul(Hidden2,Weight3) + Bias3);
'''
#4
Weight4   = tf.Variable(tf.random_uniform(shape=[2,units(30)], minval=-1, maxval=1));
Bias4     = tf.Variable(tf.random_uniform(shape=[  units(30)], minval=-1, maxval=1));
Hidden4   = activation_fn(tf.matmul(Input,Weight4) + Bias4);

#5
Weight5   = tf.Variable(tf.random_uniform(shape=[30,units(20)], minval=-1, maxval=1));
Bias5     = tf.Variable(tf.random_uniform(shape=[   units(20)], minval=-1, maxval=1));
Hidden5   = activation_fn(tf.matmul(Hidden4,Weight5) + Bias5);

#out
#regress:
#Weight6   = tf.Variable(tf.random_uniform(shape=[20,units(1)], minval=-1, maxval=1));
#Bias6     = tf.Variable(tf.random_uniform(shape=[   units(1)], minval=-1, maxval=1));

#probs:
#N_Classes = Max_Y+1
Weight6   = tf.Variable(tf.random_uniform(shape=[20,units(Max_Y+1)], minval=-1, maxval=1));
Bias6     = tf.Variable(tf.random_uniform(shape=[   units(Max_Y+1)], minval=-1, maxval=1));
Output    = tf.sigmoid(tf.matmul(Hidden5,Weight6) + Bias6);

Loss      = tf.reduce_sum(tf.square(Expected-Output));
Optimiser = tf.train.GradientDescentOptimizer(1e-1);
Training  = Optimiser.minimize(Loss);

#training
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

#regress:
#Feed = {Input:X, Expected:Y};

#probs:
Feed = {Input:X, Expected:P};

Epochs = 5000;
Losses = [];
Start  = time.time();
for I in range(Epochs):
  if (I%(Epochs/10)==0):
    Lossvalue = Sess.run(Loss, feed_dict=Feed);
    Losses   += [Lossvalue];
    
    if (I==0):
      print("Loss:",Lossvalue,"(first)");
    else:
      print("Loss:",Lossvalue);
  #end if
  
  Sess.run(Training, feed_dict=Feed);
#end for

Lastloss = Sess.run(Loss, feed_dict=Feed);
Losses  += [Lastloss];
print("Loss:",Lastloss,"(last)");

Finish = time.time();
print("Time:",Finish-Start,"seconds");

#eval
Evalresults = Sess.run(Output,feed_dict=Feed).tolist();

Sse = 0;
for I in range(len(P)):
  for J in range(len(P[I])):
    Sse    += (P[I][J]-Evalresults[I][J])**2;
    P[I][J] = round(P[I][J]);    
  #end for

for I in range(len(Evalresults)):
  for J in range(len(Evalresults[I])):
    #regress:
    #Evalresults[I][J] = round(Evalresults[I][J]*Max_Y);
    #probs:
    Evalresults[I][J] = round(Evalresults[I][J]);    
  #end for
#end for

print("\nSSE = {}".format(Sse));
print("Probs (Expected):");
print(P);    
print("Probs (Eval):");
print(Evalresults);
Sess.close();

#result: diagram
print("\nLoss curve:");
pyplot.plot(Losses,"-bo");
#eof

Colab link:
https://colab.research.google.com/drive/1mixW4_wPM3_c_KQwh-hWLuW5Aay92Pit

Colab link (Regression version, instead of class probabilities):
https://colab.research.google.com/drive/1h6yrLPbGnzj5cPW0er9LBLRR5rNM3rmE

Thursday, 12 September 2019

Case Study: Separate Heavily Mixed-up Ys, and Dead ReLU Problem

Separation has to do hard work when the class distribution on a plane, in space, or in hyperspace gets mixed up as if randomised.

The training data in the drawing below need something like 5 hidden layers to separate, but when using ReLU the output doesn't change, because of dead ReLU nodes (exploding gradients were suspected at first). The problem here is dead ReLU, not the vanishing gradients problem of sigmoid-like activations.
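
A minimal sketch of why a dead unit stops learning, runnable with eager execution enabled as in the auto-differentiation post above: once a ReLU's input goes negative, both its output and its gradient are zero, so gradient descent can never revive it.

X = tf.convert_to_tensor([-2.0, 3.0]);
with tf.GradientTape() as T:
  T.watch(X);
  Y = tf.nn.relu(X);
Dy_Dx = T.gradient(Y,X);
print(Dy_Dx.numpy());  #[0. 1.]: zero gradient through the dead unit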

Training data & class distribution:

Source code:
#libs
import tensorflow        as tf;
import matplotlib.pyplot as pyplot;

#mockup to emphasize value name
def units(Num):
  return Num;
#end def

#PROGRAMME ENTRY POINT==========================================================
#data
#https://i.imgur.com/uVOxZR7.png
X = [[1,1],[1,2],[1,3],[2,1],[2,2],[2,3],[3,1],[3,2],[3,3],[4,1],[4,2],[4,3],[5,1],[6,1]];
Y = [[0],  [1],  [0],  [1],  [0],  [1],  [0],  [2],  [1],  [1],  [1],  [0],  [0],  [1]  ];
Max_X      = 6;
Max_Y      = 2;
Batch_Size = 14;

#normalise
for I in range(len(X)):
  X[I][0] /= Max_X;
  X[I][1] /= Max_X;
  Y[I][0] /= Max_Y;
#end for

#model
Input     = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,2]);
Expected  = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,1]);

#RELU DOESN'T WORK, DEAD RELU? SIGMOID WORKS BUT SLOW.
#1
Weight1   = tf.Variable(tf.random_uniform(shape=[2,units(60)], minval=-1, maxval=1));
Bias1     = tf.Variable(tf.random_uniform(shape=[  units(60)], minval=-1, maxval=1));
Hidden1   = tf.sigmoid(tf.matmul(Input,Weight1) + Bias1);

#2
Weight2   = tf.Variable(tf.random_uniform(shape=[60,units(50)], minval=-1, maxval=1));
Bias2     = tf.Variable(tf.random_uniform(shape=[   units(50)], minval=-1, maxval=1));
Hidden2   = tf.sigmoid(tf.matmul(Hidden1,Weight2) + Bias2);

#3
Weight3   = tf.Variable(tf.random_uniform(shape=[50,units(40)], minval=-1, maxval=1));
Bias3     = tf.Variable(tf.random_uniform(shape=[   units(40)], minval=-1, maxval=1));
Hidden3   = tf.sigmoid(tf.matmul(Hidden2,Weight3) + Bias3);

#4
Weight4   = tf.Variable(tf.random_uniform(shape=[40,units(30)], minval=-1, maxval=1));
Bias4     = tf.Variable(tf.random_uniform(shape=[   units(30)], minval=-1, maxval=1));
Hidden4   = tf.sigmoid(tf.matmul(Hidden3,Weight4) + Bias4);

#5
Weight5   = tf.Variable(tf.random_uniform(shape=[30,units(20)], minval=-1, maxval=1));
Bias5     = tf.Variable(tf.random_uniform(shape=[   units(20)], minval=-1, maxval=1));
Hidden5   = tf.sigmoid(tf.matmul(Hidden4,Weight5) + Bias5);

#out
Weight6   = tf.Variable(tf.random_uniform(shape=[20,units(1)], minval=-1, maxval=1));
Bias6     = tf.Variable(tf.random_uniform(shape=[   units(1)], minval=-1, maxval=1));
Output    = tf.sigmoid(tf.matmul(Hidden5,Weight6) + Bias6);

Loss      = tf.reduce_sum(tf.square(Expected-Output));
Optimiser = tf.train.GradientDescentOptimizer(1e-1);
Training  = Optimiser.minimize(Loss);

#training
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

Losses = [];
for I in range(20000):
  if (I%2000==0):
    Lossvalue = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
    Losses   += [Lossvalue];
    
    if (I==0):
      print("Loss:",Lossvalue,"(first)");
    else:
      print("Loss:",Lossvalue);
  #end if
  
  Sess.run(Training, feed_dict={Input:X, Expected:Y});
#end for

Lastloss = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
Losses  += [Lastloss];
print("Loss:",Lastloss,"(last)");

#eval
print("\nEval:");
Evalresults = Sess.run(Output,feed_dict={Input:X, Expected:Y}).tolist();
for I in range(len(Evalresults)):
  Evalresults[I] = [round(Evalresults[I][0]*Max_Y)];
#end for
print(Evalresults);
Sess.close();

#result: diagram
print("\nLoss curve:");
pyplot.plot(Losses);
#eof

Colab link:
https://colab.research.google.com/drive/1F8G1ug09IJo3-haZVV5lHgIP_wveQAzk

Wednesday, 11 September 2019

Case Study: 2-node Linear Separation

This is a case where a single linear separator won't do, but a hidden layer of at least 2 nodes (2 separating lines) will.
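
A minimal variant of the hidden layer below, assuming the 2-node theory holds (the rest of the network unchanged; training may be less stable with so few units):

Weight1 = tf.Variable(tf.random_uniform(shape=[2,units(2)], minval=-1, maxval=1));
Bias1   = tf.Variable(tf.random_uniform(shape=[  units(2)], minval=-1, maxval=1));
Hidden1 = tf.nn.relu(tf.matmul(Input,Weight1) + Bias1);
#Weight2 would then need shape [2,units(1)] to match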

Source code:
#see:
#https://i.imgur.com/RWCpYNw.png

#libs
import tensorflow        as tf;
import matplotlib.pyplot as pyplot;
import numpy             as np;

#mockup function
def units(Num):
  return Num;
#end def

#data
X = [[0,0],[0,1],[1,0],[1,1],[2,0]];
Y = [[0],  [1],  [1],  [2],  [0]  ];
Batch_Size = 5;
Max_X      = 2;
Max_Y      = 2;

#normalise
for I in range(len(X)):
  X[I][0] /= Max_X;
  X[I][1] /= Max_X;
  Y[I][0] /= Max_Y;
#end for

#model
Input     = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,2]);
Expected  = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,1]);

#just units(2) would do, THEORETICALLY:
Weight1   = tf.Variable(tf.random_uniform(shape=[2,units(20)], minval=-1, maxval=1));
Bias1     = tf.Variable(tf.random_uniform(shape=[  units(20)], minval=-1, maxval=1));
Hidden1   = tf.nn.relu(tf.matmul(Input,Weight1) + Bias1);

#output of 1 neuron only, units(1):
Weight2   = tf.Variable(tf.random_uniform(shape=[20,units(1)], minval=-1, maxval=1));
Bias2     = tf.Variable(tf.random_uniform(shape=[   units(1)], minval=-1, maxval=1));
Output    = tf.sigmoid(tf.matmul(Hidden1,Weight2) + Bias2);

Loss      = tf.reduce_sum(tf.square(Expected-Output));
Optimiser = tf.train.GradientDescentOptimizer(1e-1);
Training  = Optimiser.minimize(Loss);

#train
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

Losses = [];
for I in range(10000):
  if (I%1000==0):
    Lossvalue = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
    print("Loss:",round(Lossvalue,18));
    Losses += [Lossvalue];
  #end if
  
  Sess.run(Training, feed_dict={Input:X, Expected:Y});
#end for

Lastloss = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
Losses  += [Lastloss];
print("Loss:",round(Lastloss,18),"(last)");

#eval
Evalresults = Sess.run(Output, feed_dict={Input:X, Expected:Y});
Evalresults = Evalresults.tolist();
for I in range(len(Evalresults)):
  Evalresults[I] = [round(Evalresults[I][0]*Max_Y),round(Evalresults[I][0]*Max_Y,18)];

print("\nEval:");
print(Evalresults);

#plot
print("\nLoss curve:");
pyplot.plot(Losses,"-bo");
#eof

Colab link:
https://colab.research.google.com/drive/1eZ1uEsq4TA0Qwe4_1moc-AGd08y26rVk

Monday, 9 September 2019

Guide: Convert Labels to Probabilities for Network with Probability Outputs

A simple regression model comes with just 1 neuron in the output layer for the class index. However, there can be multiple neurons when having probability outputs instead of a class index. For example, outputting over 4 classes requires 4 probability neurons at the output layer.

The matter is that training data might be in the form [x1...xN] => class. The following code uses one-hot encoding to transform the classes in the training data into probabilities.

Source code:
#libs
import tensorflow as tf;

tf.enable_eager_execution();
print(tf.executing_eagerly()); #confirm eager mode is on

#for single neuron class-index output layer
Y_Labels = [0,1,2,3,2,3];

#for multi-neuron probabilities output layer
#depth is number of classes
Y_Probs  = tf.one_hot(Y_Labels, depth=4);
print(Y_Probs);
#eof

Result:
tf.Tensor(
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]], shape=(6, 4), dtype=float32)
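
To go the other way, from probability rows back to class indices, tf.argmax along axis 1 can be used (a quick sketch):

Y_Back = tf.argmax(Y_Probs, axis=1);
print(Y_Back);  #tf.Tensor([0 1 2 3 2 3], shape=(6,), dtype=int64)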

Colab link:
https://colab.research.google.com/drive/1jgBlqQLS4j5pMP5W6Ae3uInrxT2DNQUz

Friday, 6 September 2019

Case Study: Separate Mixed-up (Messily Distributed) Ys

Consider this sample:
The wandering Y value 0 at the middle of two Ys of class 3 makes it hard to separate. It takes at least 4 lines.

This case study is related to the problem of finding types of products from full product names. A full product name is roughly 20 words, which means the Y values are separated not by lines (2 axes) nor planes (3 axes), but by hyperplanes.

The training data consist of ~20,000 samples and 100 classes (product types). It seems hard to separate those 20k Y points in the hyperspace.

Back to the simple sample above: it can be separated with 4 hidden layers; see the source code below.

Source code:
#libs
import tensorflow        as tf;
import matplotlib.pyplot as pyplot;

#data
#https://i.imgur.com/PUw7NWv.png
X = [[0,0], [0,1], [1,0], [10,10], [15,15], [20,20]];
Y = [[0],   [1],   [2],   [3],     [0],     [3]    ];
Batch_Size = 6;
MAX        = 20;

#normalise
for I in range(len(X)):
  X[I][0] = X[I][0]/MAX;
  X[I][1] = X[I][1]/MAX;
  Y[I][0] = Y[I][0]/3;
#end for

#model
Input     = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,2]);
Expected  = tf.placeholder(dtype=tf.float32, shape=[Batch_Size,1]);

Weight1   = tf.Variable(tf.random_uniform(shape=[2,40], minval=-1, maxval=1));
Bias1     = tf.Variable(tf.random_uniform(shape=[  40], minval=-1, maxval=1));
Hidden1   = tf.nn.relu(tf.matmul(Input,Weight1) + Bias1);

Weight2   = tf.Variable(tf.random_uniform(shape=[40,30], minval=-1, maxval=1));
Bias2     = tf.Variable(tf.random_uniform(shape=[   30], minval=-1, maxval=1));
Hidden2   = tf.nn.relu(tf.matmul(Hidden1,Weight2) + Bias2);

Weight3   = tf.Variable(tf.random_uniform(shape=[30,20], minval=-1, maxval=1));
Bias3     = tf.Variable(tf.random_uniform(shape=[   20], minval=-1, maxval=1));
Hidden3   = tf.nn.relu(tf.matmul(Hidden2,Weight3) + Bias3);

Weight4   = tf.Variable(tf.random_uniform(shape=[20,10], minval=-1, maxval=1));
Bias4     = tf.Variable(tf.random_uniform(shape=[   10], minval=-1, maxval=1));
Hidden4   = tf.nn.relu(tf.matmul(Hidden3,Weight4) + Bias4);

Weight5   = tf.Variable(tf.random_uniform(shape=[10,1], minval=-1, maxval=1));
Bias5     = tf.Variable(tf.random_uniform(shape=[   1], minval=-1, maxval=1));
Output    = tf.sigmoid(tf.matmul(Hidden4,Weight5) + Bias5);

Loss      = tf.reduce_sum(tf.square(Expected-Output));
Optimiser = tf.train.GradientDescentOptimizer(1e-1);
Training  = Optimiser.minimize(Loss);

#train
Sess = tf.Session();
Init = tf.global_variables_initializer();
Sess.run(Init);

Losses = [];
for I in range(20000):
  if (I%2000==0):
    Lossvalue = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
    Losses += [Lossvalue];
    print("Loss:",round(Lossvalue,18));
  #end if
  
  Sess.run(Training, feed_dict={Input:X, Expected:Y});
#end for

#result: loss
Lastloss = Sess.run(Loss, feed_dict={Input:X, Expected:Y});
Losses  += [Lastloss];
print("Loss:",Lastloss,"(Last)");

#result: eval
Evalresult = Sess.run(Output, feed_dict={Input:X, Expected:Y});
for I in range(Batch_Size):
  Evalresult[I][0] = round(Evalresult[I][0],18);

print("Eval (0 0.33 0.66 1 0 1):\n"+str(Evalresult));

#result: diagram
print("Loss curve:");
pyplot.plot(Losses);
#eof

Colab link:
https://colab.research.google.com/drive/11n78tFnKPgLpPsi6Ff3NgZUM0UYunWy7