Back in the saddle / C# neural network demo
I felt like I was making good progress in my AI research back in 2016. But I realized it was not going to turn into an income-yielding venture anytime soon, so I moved on to other ventures. This was during a sabbatical from "real work". I eventually gave up and got a real job again. While that job pays reasonably well, it's also fairly boring. Kira and I moved from Madison to Miami this year. For various reasons I decided to do a YouTube channel for a while to give my impressions of life here. That was fun for a while. Recently I got somewhat bored with it and put it on hold.
I was in a bit of a funk about what to do with my free time outside my regular work. I decided to resume my AI research. I've been dipping my toes in this week, reading back through my own blog to reconnect with my most recent project. I decided to dive back into the subject of artificial neural networks (ANNs).
I'm a little embarrassed to admit that I had never written a traditional ANN until last night. I don't just want to play with existing algorithms. I feel compelled to write one from scratch so that the experience makes sure I genuinely understand how they work. This is actually something I've wanted to do since around 1991 and just never got around to. One key reason is that I've struggled to understand the code I've seen in the past, and struggled even more to translate the arcane mathematics into algorithms. Each time I tried, what I wrote didn't actually learn anything. The documented explanations always seemed to be missing some ingredient I needed for a complete enough understanding.
I ran into that same problem last night when I started in on it again. I decided to stick with C# this time so I wouldn't get sidetracked by C++ development again. I looked for online articles about ANNs in C# and found quite a few. As usual, most of them made it difficult for me to make sense of what was going on in their algorithms. I didn't want to just copy one and call it done. Moreover, most of them seemed to make poor use of memory. I ultimately settled on a super simplified, fairly hardcoded demo that trains to mimic a logical XOR gate. Simple training set. Easy to verify. And the sample code was compact and readable. Off to a great start. I pasted it into my test program and ran it as-is. Then I mirrored what I gleaned from its design in a brand new set of classes designed to generally optimize memory and performance for large networks. I even chose the same XOR test case.
What was driving me bonkers was that, once again, it never learned this trivial function. I spent most of the few hours I worked on it staring at the two pieces of code, trying to figure out where the bugs in mine were. Most of them had to do with addressing the wrong array elements. Yet even after all that, the network never settled into meaningful behavior. Along the way I discovered that the demo code I had copied and pasted didn't either. It suffered the same problem mine did. I honestly can't believe the author went to all that trouble to craft the code and blog about it without meaningfully proving that it works.
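XOR looks trivial, but that is what makes it a good smoke test. No single neuron can compute it, because no straight line separates the inputs that should give 1 from the inputs that should give 0. One hidden layer of two neurons is enough. That is the same 2-2-1 shape the demo's Main builds.

The sketch below hand-wires such a network using a step activation. The class name and weights are hypothetical, picked by hand for illustration. A trained network would find its own values. The step function also has no useful derivative, which is why training uses smooth functions like tanh instead.

```csharp
using System;

// Hypothetical hand-wired 2-2-1 network showing that one hidden layer of two
// neurons can represent XOR. Weights and biases are chosen by hand, not trained.
static class HandWiredXor
{
    // Step activation: fires (1) only when the weighted sum is positive
    static int Step(double sum) => sum > 0 ? 1 : 0;

    public static int Evaluate(int a, int b)
    {
        int orOut = Step(1.0 * a + 1.0 * b - 0.5);       // hidden neuron 1 acts like a OR b
        int andOut = Step(1.0 * a + 1.0 * b - 1.5);      // hidden neuron 2 acts like a AND b
        return Step(1.0 * orOut - 1.0 * andOut - 0.5);   // output: OR but not AND
    }
}
```

Calling HandWiredXor.Evaluate on the four input pairs reproduces the same truth table the demo trains on.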
I eventually figured out that the activation functions (AFs) I was experimenting with were critical to get right. I had been blindly implementing them from examples I found online. I never made sure they were meaningful, or that I had chosen the correct first derivatives of them for backpropagation purposes. Eventually I got a proper implementation of the hyperbolic tangent (aka "tanh") AF and its correct first derivative, 1 - tanh²(x). And finally my demo program worked. First time ever for me.
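The tanh derivative can be written two ways:

- In terms of the weighted input x, as 1 - tanh²(x).
- In terms of the neuron's output y = tanh(x), as 1 - y².

The demo's ActivationFnDerivative uses the second form, because each neuron only keeps its Output after the feed-forward pass. Both forms give the same number only if each is fed the right argument. Passing the output into the input form would effectively apply tanh twice.

The minimal sketch below checks both forms against a finite-difference estimate. The class and method names are illustrative assumptions, not part of the demo.

```csharp
using System;

// Illustrative sketch: tanh and its first derivative written two ways,
// plus a numeric check.
static class TanhDemo
{
    // Forward pass: squash the weighted sum x into the range -1..1
    public static double Tanh(double x) => Math.Tanh(x);

    // Derivative in terms of the input x: 1 - tanh(x)^2
    public static double DerivativeFromInput(double x)
    {
        double t = Math.Tanh(x);
        return 1 - t * t;
    }

    // Same derivative in terms of the already-computed output y = tanh(x).
    // This mirrors the "1 - value * value" form used in the demo.
    public static double DerivativeFromOutput(double y) => 1 - y * y;

    // Central finite difference, used only as a sanity check
    public static double NumericDerivative(double x, double h = 1e-5) =>
        (Math.Tanh(x + h) - Math.Tanh(x - h)) / (2 * h);
}
```

A quick comparison against the numeric estimate over a handful of x values is a cheap way to catch a wrong derivative before it silently stops the network from learning.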
In some ways I credit the wealth of relatively new web pages out there discussing ANNs for programmers. Most of them weren't around back in 2016 when I was last exploring classifier systems. But I mainly credit my absolute determination to finally crack this. I wasn't going to accept a nonfunctional algorithm and set the code aside for later again.
I was intrigued to learn about the rectified linear unit (ReLU) activation function, so I implemented it as an alternative AF in my program. Sometimes it works. And sometimes its weights veer off to infinity. I was afraid that would happen because of its partial linearity. I still haven't fully wrapped my head around ReLU and its "leaky" version (LReLU). But it sounds like in recent years most ANNs have shifted toward ReLUs. The calculations are far less expensive than for sigmoid-type AFs. And the fact that their outputs aren't squished into a narrow range (0 to 1 or -1 to 1) apparently makes deep learning with multiple hidden layers practical. I get why in a basic sense, but I need to study this a lot further before I'll feel confident about it.
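The commonly given reason ReLUs help deeper networks is about gradients rather than range. Tanh and the logistic sigmoid flatten out for large inputs, so their derivatives shrink toward zero. Backpropagation then multiplies those small factors together layer after layer, which is known as the vanishing-gradient problem. A ReLU's slope, by contrast, is exactly 1 for any positive input. The flip side is that nothing bounds a ReLU's output, which is consistent with the runaway weights described above.

Below is a minimal sketch comparing the slopes. The class and method names are hypothetical. The 0.01 leak factor matches the demo's LReLU.

```csharp
using System;

// Hypothetical helper comparing how steep each activation function is
// at a given weighted sum x.
static class ActivationSlopes
{
    public static double ReLU(double x) => x > 0 ? x : 0;
    public static double ReLUSlope(double x) => x > 0 ? 1 : 0;

    // Leaky ReLU lets a small slope (0.01 here) through for negative inputs
    public static double LeakyReLU(double x) => x > 0 ? x : 0.01 * x;
    public static double LeakyReLUSlope(double x) => x > 0 ? 1 : 0.01;

    // tanh's slope: 1 - tanh(x)^2, which approaches 0 as |x| grows
    public static double TanhSlope(double x)
    {
        double t = Math.Tanh(x);
        return 1 - t * t;
    }
}
```

At x = 5, for example, TanhSlope is below 0.001 while ReLUSlope is still 1.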
I'm just getting started with my ANN experiments. I need to construct better training scenarios and shake this out some more. If you are interested in seeing my code, here it is. It's a simple console project using .NET 5. I find that on most runs it gets fully trained and working correctly by around iteration 1,000. But sometimes it does not settle at all before it stops at 10,000. You'll know it has successfully settled when you stop seeing the "------" lines that indicate an incorrect prediction. Note that the NeuralNetwork class constructor will let you choose as many middle layers as you want. But so far I've only tested with one.
*** 10/6/2021 update ***
After a couple of weeks of experiments I decided to replace this code with a trimmed-down demo version of my latest. There was a critical bug that was causing the training to eventually collapse. So many of the C# code samples I studied along the way had their own bugs and/or were extremely difficult for me to follow. Most of them appeared to be tweaked versions of one single flawed demo shared nearly a decade ago. And all had significant limitations on what you could do with the final product. I've stripped out some L1+L2 regularization experiments I'm dabbling with, as well as JSON serialization for loading and saving state.
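The L1+L2 regularization code isn't included here. What follows is only a hypothetical sketch of what such a weight update commonly looks like, not the stripped-out experiment code.

It follows the demo's sign convention: weight += learningRate * error * input, where error already has "target minus output" baked in. It then subtracts two penalty terms:

- The L1 term pulls every weight toward zero by a constant amount.
- The L2 term pulls each weight in proportion to its size.

The l1 and l2 strengths are assumed parameters.

```csharp
using System;

// Hypothetical sketch (not the demo's code): one weight update with
// L1 and L2 penalties, using the demo's "weight += rate * error * input" convention.
static class Regularization
{
    public static float UpdateWeight(
        float weight, float error, float input, float learningRate,
        float l1, float l2)
    {
        float gradient = error * input;          // same term as the demo's backprop
        float penalty = l1 * Math.Sign(weight)   // L1: constant pull toward zero
                      + l2 * weight;             // L2: pull proportional to the weight
        return weight + learningRate * (gradient - penalty);
    }
}
```

With l1 and l2 both set to zero, this reduces to the plain update in Layer.Backpropagate.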
I'm hoping that my demo will help others struggling with understanding basic ANNs and applying them to their own projects.
I think I've pounded out all the real bugs. If you find a bug, please do let me know!
Typical output:
Iteration Inputs     Output Valid?    Accuracy
        0 1 xor 0 =   0.599             100.0% |----------|
    1,000 1 xor 0 =   0.557              42.0% |----      |
    2,000 1 xor 1 =   0.520 (wrong)      42.0% |----      |
    3,000 0 xor 1 =   0.490 (wrong)      50.0% |-----     |
    4,000 1 xor 0 =   0.551              70.0% |-------   |
    5,000 1 xor 1 =   0.592 (wrong)      62.0% |------    |
    6,000 1 xor 0 =   0.521              29.0% |---       |
    7,000 0 xor 0 =   0.368              41.0% |----      |
    8,000 1 xor 1 =   0.532 (wrong)      65.0% |-------   |
    9,000 0 xor 0 =   0.284              82.0% |--------  |
   10,000 1 xor 1 =   0.587 (wrong)      70.0% |-------   |
   11,000 1 xor 0 =   0.569              78.0% |--------  |
   12,000 1 xor 1 =   0.417             100.0% |----------|
I've had 1,000 flawless predictions recently. Continue anyway?
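The Accuracy column is the share of correct predictions among the most recent 100 training samples. The demo pauses once that window has stayed at 100% for 1,000 consecutive iterations.

The demo's Main keeps the window in a List&lt;bool&gt;, calling RemoveAt(0) and re-counting on every iteration. A Queue&lt;bool&gt; with a running count is a simple O(1) alternative. Here is a minimal sketch; the RollingAccuracy class is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical rolling accuracy window: the fraction of the most recent
// `capacity` predictions that were correct, tracked with a running count.
class RollingAccuracy
{
    private readonly Queue<bool> window = new Queue<bool>();
    private readonly int capacity;
    private int correctCount;

    public RollingAccuracy(int capacity) => this.capacity = capacity;

    public void Add(bool isCorrect)
    {
        window.Enqueue(isCorrect);
        if (isCorrect) correctCount++;
        // Once the window is over capacity, drop the oldest result
        if (window.Count > capacity)
        {
            bool dropped = window.Dequeue();
            if (dropped) correctCount--;
        }
    }

    // Fraction correct in the current window (0 when empty)
    public float Fraction => window.Count == 0 ? 0 : (float)correctCount / window.Count;
}
```

With a capacity of 100, Fraction * 100 corresponds to the percentage printed in the Accuracy column.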
Code:
using System;
using System.Collections.Generic;
namespace BasicNeuralNetworkDemo {
class Program {
static void Main(string[] args) {
var nn = new NeuralNetwork();
nn.AddLayer(2);
nn.AddLayer(2, true, ActivationFunctionEnum.TanH, 0.01f);
nn.AddLayer(1, true, ActivationFunctionEnum.TanH, 0.01f);
float[][] training = new float[][] {
new float[] { 0, 0, 0 },
new float[] { 0, 1, 1 },
new float[] { 1, 0, 1 },
new float[] { 1, 1, 0 },
};
Console.WriteLine("Iteration Inputs     Output Valid?    Accuracy");
int maxIterations = 1000000;
var corrects = new List<bool>();
int flawlessRuns = 0;
int i = 0;
while (i < maxIterations) {
int trainingCase = NeuralNetwork.NextRandomInt(0, training.Length);
var trainingData = training[trainingCase];
nn.SetInputs(trainingData);
nn.FeedForward();
nn.TrainingOutputs[0] = trainingData[2];
bool isCorrect = (nn.OutputLayer.Neurons[0].Output < 0.5 ? 0 : 1) == nn.TrainingOutputs[0];
corrects.Add(isCorrect);
while (corrects.Count > 100) corrects.RemoveAt(0);
float percentCorrect = 0;
foreach (var correct in corrects) if (correct) percentCorrect += 1;
percentCorrect /= corrects.Count;
if (percentCorrect == 1) flawlessRuns++;
else flawlessRuns = 0;
nn.Backpropagate();
if (i % 100 == 0) {
#region Output state
Console.WriteLine(
RightJustify(i.ToString("#,##0"), 9) + " " +
trainingData[0] +
" xor " +
trainingData[1] + " = " +
RightJustify("" + nn.OutputLayer.Neurons[0].Output.ToString("0.000"), 7) + " " +
(isCorrect ? "       " : "(wrong)") +
RightJustify((percentCorrect * 100).ToString("0.0") + "% ", 12) +
RenderPercent(percentCorrect * 100)
);
#endregion
}
if (flawlessRuns == 1000) {
Console.WriteLine("I've had " + flawlessRuns.ToString("#,##0") + " flawless predictions recently. Continue anyway?");
Console.Beep();
Console.ReadLine();
}
i++;
}
Console.WriteLine("Done");
Console.Beep();
Console.ReadLine();
}
static string RenderPercent(float percent) {
float value = percent / 10f;
if (value < 0.5) return "|          |";
if (value < 1.5) return "|-         |";
if (value < 2.5) return "|--        |";
if (value < 3.5) return "|---       |";
if (value < 4.5) return "|----      |";
if (value < 5.5) return "|-----     |";
if (value < 6.5) return "|------    |";
if (value < 7.5) return "|-------   |";
if (value < 8.5) return "|--------  |";
if (value < 9.5) return "|--------- |";
return "|----------|";
}
static string RightJustify(string text, int width) {
while (text.Length < width) text = " " + text;
return text;
}
}
public enum ActivationFunctionEnum {
/// <summary> Rectified Linear Unit </summary>
ReLU,
/// <summary> Leaky Rectified Linear Unit </summary>
LReLU,
/// <summary> Logistic sigmoid </summary>
Sigmoid,
/// <summary> Hyperbolic tangent </summary>
TanH,
/// <summary> Softmax function </summary>
Softmax,
}
public class NeuralNetwork {
/// <summary>
/// The layers of neurons from input (0) to output (N)
/// </summary>
public Layer[] Layers { get; private set; }
/// <summary>
/// Equivalent to Layers.Length
/// </summary>
public int LayerCount { get; private set; }
/// <summary>
/// Equivalent to InputLayer.NeuronCount
/// </summary>
public int InputCount { get; private set; }
/// <summary>
/// Equivalent to OutputLayer.NeuronCount
/// </summary>
public int OutputCount { get; private set; }
/// <summary>
/// Equivalent to Layers[0]
/// </summary>
public Layer InputLayer { get; private set; }
/// <summary>
/// Equivalent to Layers[LayerCount - 1]
/// </summary>
public Layer OutputLayer { get; private set; }
/// <summary>
/// Provides the desired output values for use in backpropagation training
/// </summary>
public float[] TrainingOutputs { get; private set; }
public NeuralNetwork() { }
/// <summary>
/// Constructs and adds a new neuron layer to .Layers
/// </summary>
public Layer AddLayer(
int neuronCount,
bool randomize = false,
ActivationFunctionEnum activationFunction = ActivationFunctionEnum.TanH,
float learningRate = 0.01f
) {
// Since we can't expand the array we'll construct a new one
var newLayers = new Layer[LayerCount + 1];
if (LayerCount > 0) Array.Copy(Layers, newLayers, LayerCount);
// Interconnect layers
Layer previousLayer = null;
if (LayerCount > 0) previousLayer = newLayers[LayerCount - 1];
// Construct the new layer
var layer = new Layer(neuronCount, previousLayer);
layer.ActivationFunction = activationFunction;
layer.LearningRate = learningRate;
if (randomize) layer.Randomize();
newLayers[LayerCount] = layer;
// Interconnect layers
if (LayerCount > 0) previousLayer.NextLayer = layer;
// Cache some helpful properties
if (LayerCount == 0) {
InputLayer = layer;
InputCount = neuronCount;
}
if (LayerCount == newLayers.Length - 1) {
OutputLayer = layer;
OutputCount = neuronCount;
TrainingOutputs = new float[neuronCount];
}
// Emplace the new array and move on
Layers = newLayers;
LayerCount++;
return layer;
}
/// <summary>
/// Copy the array of input values to the input layer's .Output properties
/// </summary>
public void SetInputs(float[] inputs) {
for (int n = 0; n < InputCount; n++) {
InputLayer.Neurons[n].Output = inputs[n];
}
}
/// <summary>
/// Copy the output layer's .Output property values to the given array
/// </summary>
public void GetOutputs(float[] outputs) {
for (int n = 0; n < OutputCount; n++) {
outputs[n] = OutputLayer.Neurons[n].Output;
}
}
/// <summary>
/// Interpret the output array as a single category (0, 1, 2, ...), or -1 (none) if no output is greater than zero
/// </summary>
public int Classify() {
float maxValue = 0;
int bestIndex = -1;
for (int o = 0; o < OutputCount; o++) {
float value = OutputLayer.Neurons[o].Output;
if (value > maxValue) {
bestIndex = o;
maxValue = value;
}
}
if (maxValue == 0) return -1;
return bestIndex;
}
/// <summary>
/// Copy the given array's values to the .TrainingOutputs property
/// </summary>
public void SetTrainingOutputs(float[] outputs) {
Array.Copy(outputs, TrainingOutputs, OutputCount);
}
/// <summary>
/// Flipside of .Classify() that sets .TrainingOutputs to all zeros and the given index to one
/// </summary>
public void SetTrainingClassification(int value) {
for (int o = 0; o < OutputCount; o++) {
if (o == value) {
TrainingOutputs[o] = 1;
} else {
TrainingOutputs[o] = 0;
}
}
}
/// <summary>
/// Feed the input layer's .Output values forward through each later layer to populate the output layer's .Output values
/// </summary>
public void FeedForward() {
for (int l = 1; l < LayerCount; l++) {
var layer = Layers[l];
layer.FeedForward();
}
}
/// <summary>
/// One iteration of backpropagation training, using the current inputs and .TrainingOutputs after .FeedForward() was called on the same inputs
/// </summary>
public void Backpropagate() {
for (int l = LayerCount - 1; l > 0; l--) {
var layer = Layers[l];
layer.Backpropagate(TrainingOutputs);
}
}
/// <summary>
/// Returns a random float that is at least min and less than max
/// </summary>
public static float NextRandom(float min, float max) {
return (float)random.NextDouble() * (max - min) + min;
}
/// <summary>
/// Returns a random int that is at least min and less than max
/// </summary>
public static int NextRandomInt(int min, int max) {
return random.Next(min, max);
}
private static Random random = new Random();
}
public class Layer {
/// <summary>
/// All the neurons in this layer
/// </summary>
public Neuron[] Neurons;
/// <summary>
/// Reference to the earlier layer that I get my input from
/// </summary>
public Layer PreviousLayer;
/// <summary>
/// Reference to the later layer that gets its input from me
/// </summary>
public Layer NextLayer;
/// <summary>
/// Step size for weight updates: larger values train faster but can overshoot or diverge; smaller values train more slowly but can settle more precisely
/// </summary>
public float LearningRate = 0.01f;
/// <summary>
/// How to transform the summed-up scalar output value of each neuron during feed forward
/// </summary>
public ActivationFunctionEnum ActivationFunction = ActivationFunctionEnum.TanH;
/// <summary>
/// Equivalent to Neurons.Length
/// </summary>
public int NeuronCount { get; private set; }
public Layer(int neuronCount, Layer previousLayer) {
PreviousLayer = previousLayer;
NeuronCount = neuronCount;
Neurons = new Neuron[NeuronCount];
for (int n = 0; n < NeuronCount; n++) {
Neuron neuron = new Neuron(this);
Neurons[n] = neuron;
}
}
/// <summary>
/// Forget all prior training by randomizing all input weights and biases
/// </summary>
public void Randomize() {
// Put weights in the range of -0.5 to 0.5
const float randomWeightRadius = 0.5f;
foreach (Neuron neuron in Neurons) {
neuron.Randomize(randomWeightRadius);
}
}
/// <summary>
/// Feed-forward algorithm for this layer
/// </summary>
public void FeedForward() {
foreach (var neuron in Neurons) {
// Sum up the previous layer's outputs multiplied by this neuron's weights for each
float sigma = 0;
for (int i = 0; i < PreviousLayer.NeuronCount; i++) {
sigma += PreviousLayer.Neurons[i].Output * neuron.InputWeights[i];
}
sigma += neuron.Bias; // Add in this neuron's bias too
// Shape the output using the activation function
float output = ActivationFn(sigma);
neuron.Output = output;
}
// The Softmax activation function requires extra processing of aggregates
if (ActivationFunction == ActivationFunctionEnum.Softmax) {
// Find the max output value
float max = float.NegativeInfinity;
foreach (var neuron in Neurons) {
if (neuron.Output > max) max = neuron.Output;
}
// Compute the scale
float scale = 0;
foreach (var neuron in Neurons) {
scale += (float)Math.Exp(neuron.Output - max);
}
// Shift and scale the outputs
foreach (var neuron in Neurons) {
neuron.Output = (float)Math.Exp(neuron.Output - max) / scale;
}
}
}
/// <summary>
/// Backpropagation algorithm
/// </summary>
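public void Backpropagate(float[] trainingOutputs) {
// Compute the error term for each neuron
for (int n = 0; n < NeuronCount; n++) {
var neuron = Neurons[n];
float output = neuron.Output;
if (NextLayer == null) { // Output layer
var error = trainingOutputs[n] - output;
neuron.Error = error * ActivationFnDerivative(output);
} else { // Hidden layer
// NextLayer's weights have not been updated yet this iteration, so these are
// the same weights that were used during the feed-forward pass
float error = 0;
for (int o = 0; o < NextLayer.NeuronCount; o++) {
var nextNeuron = NextLayer.Neurons[o];
var iw = nextNeuron.InputWeights[n];
error += nextNeuron.Error * iw;
}
neuron.Error = error * ActivationFnDerivative(output);
}
}
// NeuralNetwork.Backpropagate() visits layers from the output backward, so the next
// layer's weights have now been used to compute this layer's errors and can be updated
if (NextLayer != null) NextLayer.UpdateWeights();
// The first hidden layer is visited last, so it also updates its own weights
if (PreviousLayer.PreviousLayer == null) UpdateWeights();
}
/// <summary>
/// Adjust this layer's biases and input weights using the errors computed by Backpropagate()
/// </summary>
private void UpdateWeights() {
for (int n = 0; n < NeuronCount; n++) {
var neuron = Neurons[n];
// Update this neuron's bias
var gradient = neuron.Error;
neuron.Bias += gradient * LearningRate;
// Update this neuron's input weights
for (int i = 0; i < PreviousLayer.NeuronCount; i++) {
gradient = neuron.Error * PreviousLayer.Neurons[i].Output;
neuron.InputWeights[i] += gradient * LearningRate;
}
}
}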
private float ActivationFn(float value) {
switch (ActivationFunction) {
case ActivationFunctionEnum.ReLU:
if (value < 0) return 0;
return value;
case ActivationFunctionEnum.LReLU:
if (value < 0) return value * 0.01f;
return value;
case ActivationFunctionEnum.Sigmoid:
return (float)(1 / (1 + Math.Exp(-value)));
case ActivationFunctionEnum.TanH:
return (float)Math.Tanh(value);
case ActivationFunctionEnum.Softmax:
return value; // This is only the first part of summing up all the values
}
return value;
}
private float ActivationFnDerivative(float value) {
switch (ActivationFunction) {
case ActivationFunctionEnum.ReLU:
if (value > 0) return 1;
return 0;
case ActivationFunctionEnum.LReLU:
if (value > 0) return 1;
return 0.01f;
case ActivationFunctionEnum.Sigmoid:
return value * (1 - value);
case ActivationFunctionEnum.TanH:
return 1 - value * value;
case ActivationFunctionEnum.Softmax:
return (1 - value) * value;
}
return 0;
}
}
public class Neuron {
/// <summary>
/// The weight I put on each of my inputs when computing my output; these weights are my essential learned memory
/// </summary>
public float[] InputWeights;
/// <summary>
/// My bias is also part of my learned memory
/// </summary>
public float Bias;
/// <summary>
/// My feed-forward computed output
/// </summary>
public float Output;
/// <summary>
/// My back-propagation computed error term (the raw error times the activation function's derivative)
/// </summary>
public float Error;
public Neuron(Layer layer) {
if (layer.PreviousLayer != null) {
InputWeights = new float[layer.PreviousLayer.NeuronCount];
}
}
/// <summary>
/// Forget all prior training by randomizing my input weights and bias
/// </summary>
public void Randomize(float radius) {
if (InputWeights != null) {
for (int i = 0; i < InputWeights.Length; i++) {
InputWeights[i] = NeuralNetwork.NextRandom(-radius, radius);
}
}
Bias = NeuralNetwork.NextRandom(-radius, radius);
}
}
}
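For reference, here are the update rules Layer.Backpropagate applies, written out as equations. The symbols are shorthand introduced here, not names from the code:

- t_n is TrainingOutputs[n].
- y is a neuron's Output, and δ is its Error.
- η is the layer's LearningRate, and b is a neuron's Bias.
- w_{ji} is InputWeights[i] of neuron j.
- y_i is Output i of the previous layer.

The derivative f' is written in terms of the output y, because that is what ActivationFnDerivative receives: 1 - y² for tanh and y(1 - y) for the logistic sigmoid. Each step is gradient descent on the squared error E.

In textbook backpropagation, every δ is computed from the weights used in the forward pass before any weight is changed.

```latex
\begin{aligned}
E &= \tfrac{1}{2} \sum_n (t_n - y_n)^2 \\
\delta_n &= (t_n - y_n)\, f'(y_n) && \text{output layer} \\
\delta_j &= f'(y_j) \sum_k w_{kj}\, \delta_k && \text{hidden layer} \\
-\frac{\partial E}{\partial w_{ji}} &= \delta_j\, y_i \\
b_j &\leftarrow b_j + \eta\, \delta_j \\
w_{ji} &\leftarrow w_{ji} + \eta\, \delta_j\, y_i
\end{aligned}
```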