
Saturday, October 9, 2021

Neural network in C# with multicore parallelization / MNIST digits demo

I've been working for a couple of weeks on building my first fully functional artificial neural network (ANN). I'm not blazing any new trails here by doing so. I'm a software engineer. I can barely follow the mathematical explanations of how ANNs work. For the most part I have turned to the source code others have shared online for inspiration. In most cases I've struggled to understand even that, despite programming for a living.

Part of the challenge is that more than a few of those demos have surprised me by being nonfunctional. They did something, for sure. They just didn't learn anything or perform significantly better than random chance at making correct predictions, no matter how many iterations they went through. Or they had bugs that kept them from implementing the well-worn basic backpropagation algorithm correctly.

I mostly worked from C# examples when I could find them. One genuine struggle was my sense that they all derived from a single source from a decade and a half ago that itself had bugs, and which struck me as poorly structured to begin with. In short, I found most of the source code samples hard to read because, in my opinion, they were written in cryptic ways. Along the way I wrote and rewrote from scratch. If I couldn't duplicate what a demo did, I might download it and run it directly within my project. And usually I would find it didn't work for one reason or another. I was amazed that people blogged about the subject without apparently confirming that their own code worked properly.

For a while I was very frustrated because I was seeing a strange behavior nobody else had documented. My models would train and become very accurate. And then their accuracy would start falling off as though rolling down the other side of a hill. I spent over a week trying to figure out the cause. Ultimately I discovered an extra loop in my code that influenced training in a way that didn't demolish it completely, but which compounded over time and eventually undid all of the training. Once I fixed that I immediately started seeing my code behave like everybody else's. Hooray!

I know there are lots of code samples out there already. But here is my own. My previous blog post showed a simplified version of this, short enough to paste directly on the page. This time I'm instead going to point you to a GitHub repository with my complete program in it:

https://github.com/JimCarnicelli/BasicNeuralNetwork

I'm also going to skip trying to write an extensive explanation of how an ANN works, including the backpropagation algorithm. I think that has been very well covered on so many other websites that I would have little more of value to add. So I'll just tell you a little more about what's in my demo code.

For starters, my demo has a solidly OOP structure. The reusable core is a set of NeuralNetwork, Layer, and Neuron classes. These classes are oriented toward the basics of both training and later practical use. NeuralNetwork features .FromJson() and .ToJson() methods for serializing the trained state of a model. Each layer can be separately configured with its own learning rate and activation function, including Softmax, TanH, Sigmoid, ReLU, and LReLU. You can have as many hidden layers as you want, too. NeuralNetwork offers various ways to inject input values and get your output, including the .Classify() method, which returns an integer indicating which output neuron had the highest value and is thus the predicted class. I've added a lot of inline comments to help explain everything, both for the practical programmer and for the programmer looking to understand the inner workings.
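
To give a feel for the API, here is a minimal usage sketch based on those classes (the variable values below are placeholders of my own; the repo's samples may differ in the details):

  // Build a 784-100-10 classifier, e.g. for MNIST digits
  var nn = new NeuralNetwork();
  nn.AddLayer(784);                                              // Input layer
  nn.AddLayer(100, true, ActivationFunctionEnum.Sigmoid, 0.1f);  // Hidden layer
  nn.AddLayer(10, true, ActivationFunctionEnum.Sigmoid, 0.1f);   // Output layer

  // One training iteration: feed an example forward, declare the right answer, backpropagate
  float[] pixels = new float[784];   // Fill with one digit's pixel intensities (0 to 1)
  int label = 7;                     // The digit that image actually shows
  nn.SetInputs(pixels);
  nn.FeedForward();
  nn.SetTrainingClassification(label);
  nn.Backpropagate();

  // Later, practical use: which output neuron had the highest value?
  nn.SetInputs(pixels);
  nn.FeedForward();
  int predictedDigit = nn.Classify();

The repo version layers the .ToJson() and .FromJson() calls on top of this for saving and restoring the trained state.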

I didn't want to focus only on readability. I also put a lot of thought into performance, starting with memory. You might think that instantiating one Neuron instance for each logical neuron would be very wasteful of memory, but it's not. I tested this with very large test networks with thousands and even millions of neurons. As the number of neurons grows, and with it the number of interconnections among them, the network's memory footprint approaches 4 bytes times the total number of input weights. That's 4 bytes per floating point number, which is the common currency for this code. So if you had a network with 1,000 hidden-layer neurons and 1,000 output-layer neurons, that layer pair alone accounts for 1,000,000 input weights, and the total network will take up around 4 MB of memory. Which is quite compact. One thing my code does not do during training or behaving is allocate temporary arrays or collections that then get thrown away. That saves memory and speeds things up.
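
To make that rule of thumb concrete using the MNIST configuration described below (a rough estimate of my own that ignores biases and per-object overhead):

  784 inputs × 100 hidden neurons + 100 hidden neurons × 10 outputs = 79,400 input weights
  79,400 weights × 4 bytes ≈ 318 KB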

The structure of my code lends itself to speedy execution too. I just configured a network for the MNIST demo with 784 inputs, 100 hidden-layer neurons, and 10 output neurons. The latter two layers use the sigmoid activation function. In about 75 seconds it churns through 100k training iterations on my laptop, which has an 8-core, 16-thread CPU running at around 4.27 GHz. I think this is decent performance and is the result of reasonably optimized coding. But I also added a switch that lets each layer spread its training and behaving calculations across all the computer's logical processors. In my fairly small tests this roughly doubles the speed. With larger networks I start seeing 7x speedups. I haven't tried it on any very large networks with millions of nodes yet. Hopefully it approaches a 16x speedup for 16 logical processors, and so forth.

Windows CPU utilization sample
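
The repository has the actual implementation of that per-layer switch. As a rough sketch of the idea (the Parallelize flag and ComputeNeuronOutput helper names here are mine, not necessarily the repo's), spreading a layer's feed-forward pass across cores with Parallel.For looks something like this:

  // Requires: using System.Threading.Tasks;

  // Inside the Layer class: an optional per-layer switch (illustrative name)
  public bool Parallelize = false;

  public void FeedForward() {
      if (Parallelize) {
          // Each neuron's weighted sum is independent of its siblings,
          // so the loop over neurons can be split across all available processors
          Parallel.For(0, NeuronCount, n => ComputeNeuronOutput(Neurons[n]));
      } else {
          for (int n = 0; n < NeuronCount; n++) ComputeNeuronOutput(Neurons[n]);
      }
      // (Softmax post-processing of the whole layer would still follow here, as in the full code)
  }

  private void ComputeNeuronOutput(Neuron neuron) {
      // Sum up the previous layer's outputs multiplied by this neuron's weights
      float sigma = 0;
      for (int i = 0; i < PreviousLayer.NeuronCount; i++) {
          sigma += PreviousLayer.Neurons[i].Output * neuron.InputWeights[i];
      }
      sigma += neuron.Bias;
      neuron.Output = ActivationFn(sigma);
  }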

My code includes a modest set of demos. One is a classic XOR gate demo. This one is great for study because the network is so small that you can visualize the whole thing fairly easily.

The second demo similarly involves synthetic data: the problem of learning to classify all of the ASCII characters from 32 (space) to 126 (tilde) as Whitespace, Symbol, Letter, Digit, or None. This network has 7 inputs representing the 7 bits needed for these characters, 10 hidden-layer neurons, and 5 output neurons representing the character classes. This one uses the LReLU activation function.
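
Wiring that up with the same AddLayer API looks roughly like this (the bit-unpacking below is my own illustrative encoding, and I'm assuming both trainable layers use LReLU; the repo may prepare its inputs differently):

  var nn = new NeuralNetwork();
  nn.AddLayer(7);                                              // The 7 bits of the character code
  nn.AddLayer(10, true, ActivationFunctionEnum.LReLU, 0.01f);  // Hidden layer
  nn.AddLayer(5, true, ActivationFunctionEnum.LReLU, 0.01f);   // Whitespace, Symbol, Letter, Digit, None

  // Unpack a character's 7 low bits into the 7 input values
  char c = 'A';
  float[] inputs = new float[7];
  for (int bit = 0; bit < 7; bit++) {
      inputs[bit] = (c >> bit) & 1;
  }
  nn.SetInputs(inputs);
  nn.FeedForward();
  int characterClass = nn.Classify();  // Index of the winning class neuron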

The final demo uses the aforementioned MNIST data set. The files can be found on the MNIST project page. I was frustrated by how slowly the source data files loaded each time, so I included a utility function to convert the training and test file pairs into plain .PNG files. Here's the smaller test image:

If you zoom in very close you will see a single red pixel in the upper left corner of each digit tile. Since that corner pixel is white in every source tile, I decided to pack the digit's value into its blue channel. In the example below you can see that the "7" digit's actual value is packed into the (255, 0, 7) RGB value.


Doing this one-time transformation of these 4 files into 2 PNGs is great. The PNG files are smaller and load much faster. And they are easier to visualize using an ordinary image viewer or paint program.
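
Here is a hedged sketch of the packing idea using System.Drawing (the repo's utility function may well do this differently, and these method names are my own):

  // Requires: using System.Drawing;  (the System.Drawing.Common package on .NET 5)

  // Pack: mark the upper-left pixel of a digit tile with that digit's label (0-9)
  static void PackLabel(Bitmap sheet, int tileX, int tileY, int label) {
      // Red = 255 as a visible marker, green = 0, blue = the digit's value
      sheet.SetPixel(tileX, tileY, Color.FromArgb(255, 0, label));
  }

  // Unpack: read the label back out of the blue channel when loading the PNG
  static int UnpackLabel(Bitmap sheet, int tileX, int tileY) {
      return sheet.GetPixel(tileX, tileY).B;
  }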

My results seem in line with those documented on the MNIST project site. I experimented with a lot of network configurations and parameter values. But for one concrete example, I configured the single hidden layer with 100 neurons using the sigmoid activation function and a learning rate of 0.1. My accuracy rate on the test set was 97.3% after 1M training iterations. That's a 2.7% error rate. The closest comparison I can see in the table of results is listed as "2-layer NN, 1000 hidden units" with no preprocessing (same as mine). The error rate on that is reported as 4.5%. That was from a 1998 paper by LeCun et al.

Quick note. You're going to need to edit this line in Program.cs to point to your own project data folder immediately under the project root folder:

  static string dataDirectory = @"G:\My Drive\Ventures\MsDev\BasicNeuralNetwork\Data\";

I want to emphasize that while I have worked on this code for a couple of weeks and feel very good about it, I can't guarantee it is bug-free. I encourage you to comment here or reach out to me if you find any bugs, no matter how small.

I'd also welcome hearing from you about your experiences in using this code for your own projects. Cheers.


Thursday, September 23, 2021

Back in the saddle / C# neural network demo

I felt like I was making good progress back in 2016 in my AI research. But I realized it was not going to turn into an income-yielding venture anytime soon, so I moved on to other ventures. This was during a sabbatical from "real work". I eventually gave up and got a real job again. While that job pays reasonably well, it's also fairly boring. Kira and I moved from Madison to Miami this year. For various reasons I decided to do a YouTube channel for a while to give my impressions of life here. That was fun for a while. Recently I got somewhat bored with it and put it on hold.

I was in a bit of a funk about what to do with my free time when I'm not doing my regular work. I decided to resume my AI research. I've been dipping my toes in this week, reading back through my own blog to reconnect with my most recent project. I decided to dig into the subject of artificial neural networks (ANNs) again.

I'm a little embarrassed to admit that I never wrote a traditional ANN until last night. I don't just want to play with existing algorithms. I feel compelled to write one from scratch to make sure I can genuinely understand how they work from the experience. This is actually something I've wanted to do since somewhere around 1991 and just never got around to. One key reason is that I've struggled to understand the code I have seen in the past and even more to translate the arcane mathematics into algorithms. Each time I tried in the past I found that what I wrote did not functionally learn anything. The documented explanations were always missing some ingredient required for me to have a complete enough understanding.

I was running into that same problem last night when I started in on it again. I decided to stick with C# this time so I wouldn't get sidetracked into doing C++ development again. I looked for online articles about ANNs in C# and found quite a few. And as usual, most of them made it difficult for me to make sense of what was going on in their algorithms. I didn't want to just copy one and call it done. Moreover, most of them seemed to make poor use of memory. I ultimately settled on a super simplified, fairly hardcoded demo that trains to mimic a logical XOR gate. Simple training set. Easy to verify. And the sample code was compact and readable. Off to a great start. I pasted it into my test program and ran it as-is. Then I mirrored what I gleaned from its design in a brand new set of classes designed to optimize memory and performance for large networks. I even chose the same XOR test case.

What was driving me bonkers was that, once again, it was never learning this trivial function. Most of the few hours I worked on it were devoted to staring at the two pieces of code and trying to figure out where the bugs in mine were. Most had to do with addressing the wrong array elements. And yet after all that the network was still never settling into a meaningful behavior. Along the way I discovered that the demo code I had copy-and-pasted never learned either. It suffered the same problem my code did. I honestly can't believe the author went through all that trouble to craft the code and blog about it without meaningfully proving that it works.

I eventually figured out that the activation functions (AF) I was experimenting with were critical to get right. I had been blindly implementing them from examples I found online, without working to ensure that they were meaningful and that I chose the correct first derivatives for backpropagation purposes. I eventually got a proper implementation of the hyperbolic tangent (aka "tanh") AF and its correct first derivative, 1 - tanh(x)^2. And finally my demo program worked. First time ever for me.
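
In code terms (matching the classes in the demo below), the convenient part is that by the time the derivative is needed during backpropagation, the neuron's stored output is already tanh(x), so the derivative can be computed straight from that output:

  // Feed-forward: the neuron's output is tanh of the weighted sum
  float output = (float)Math.Tanh(sigma);

  // Backpropagation: d/dx tanh(x) = 1 - tanh(x)^2, which in terms of the stored output is simply
  float derivative = 1 - output * output;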

In some ways I credit the wealth of relatively new web pages out there discussing ANNs for programmers. Most of them weren't around back in 2016 when I was last exploring classifier systems. But I mainly credit my absolute determination to finally crack this. I wasn't going to accept a nonfunctional algorithm and set the code aside for later again.

I was intrigued to learn about the rectified linear unit (ReLU) activation function. I implemented it as an alternative AF in my program. Sometimes it works. And sometimes its weights veer off to infinity. I was afraid that would happen because of its partial linearity. I still haven't fully wrapped my head around ReLU and the "leaky" version (LReLU) yet. But it sounds like in recent years most ANNs have shifted toward ReLUs. The calculations are way less expensive than for sigmoid-type AFs. And the fact that they are not squished into a narrow range (0 to 1 or -1 to 1) apparently makes deep learning algorithms with multiple hidden layers possible. I get why in a basic sense. But I need to study this a lot further before I'll feel more confident in it.

I'm just getting started with my ANN experiments. I need to construct better training scenarios and shake this out some more. If you are interested in seeing my code, here it is. It's a simple console project using .NET 5. I find that on most runs it gets fully trained and working correctly by around iteration 1,000. But sometimes it does not settle at all before it stops at 10,000. You'll know it has successfully settled when you stop seeing the "------" lines that indicate an incorrect prediction. Note that the NeuralNetwork class will let you add as many middle layers as you want, but so far I've only tested with one.

*** 10/6/2021 update ***

After a couple of weeks of experiments I decided to replace this code with a trimmed-down demo version of my latest. There was a critical bug that was causing the training to eventually collapse. So many of the C# code samples I studied along the way had their own bugs and/or were extremely difficult for me to follow. Most of them appeared to be tweaked versions of one single flawed demo shared nearly a decade ago. And all had significant limitations on what you could do with the final product. I've stripped out the L1+L2 regularization experiments I'm dabbling with, as well as the JSON serialization for loading and saving state.

I'm hoping that my demo will help others struggling with understanding basic ANNs and applying them to their own projects.

I think I've pounded out all the real bugs. If you find a bug, please do let me know!


Typical output:

Iteration     Inputs    Output   Valid?      Accuracy
        0    1 xor 0 =   0.599              100.0% |----------|
    1,000    1 xor 0 =   0.557               42.0% |----      |
    2,000    1 xor 1 =   0.520  (wrong)      42.0% |----      |
    3,000    0 xor 1 =   0.490  (wrong)      50.0% |-----     |
    4,000    1 xor 0 =   0.551               70.0% |-------   |
    5,000    1 xor 1 =   0.592  (wrong)      62.0% |------    |
    6,000    1 xor 0 =   0.521               29.0% |---       |
    7,000    0 xor 0 =   0.368               41.0% |----      |
    8,000    1 xor 1 =   0.532  (wrong)      65.0% |-------   |
    9,000    0 xor 0 =   0.284               82.0% |--------  |
   10,000    1 xor 1 =   0.587  (wrong)      70.0% |-------   |
   11,000    1 xor 0 =   0.569               78.0% |--------  |
   12,000    1 xor 1 =   0.417              100.0% |----------|
I've had 1,000 flawless predictions recently. Continue anyway?

Code:

using System;
using System.Collections.Generic;

namespace BasicNeuralNetworkDemo {

    class Program {

        static void Main(string[] args) {

            var nn = new NeuralNetwork();
            nn.AddLayer(2);
            nn.AddLayer(2, true, ActivationFunctionEnum.TanH, 0.01f);
            nn.AddLayer(1, true, ActivationFunctionEnum.TanH, 0.01f);

            float[][] training = new float[][] {
                new float[] { 0, 0,   0 },
                new float[] { 0, 1,   1 },
                new float[] { 1, 0,   1 },
                new float[] { 1, 1,   0 },
            };

            Console.WriteLine("Iteration     Inputs    Output   Valid?      Accuracy");

            int maxIterations = 1000000;
            var corrects = new List<bool>();
            int flawlessRuns = 0;
            int i = 0;
            while (i < maxIterations) {

                int trainingCase = NeuralNetwork.NextRandomInt(0, training.Length);
                var trainingData = training[trainingCase];
                nn.SetInputs(trainingData);
                nn.FeedForward();

                nn.TrainingOutputs[0] = trainingData[2];

                bool isCorrect = (nn.OutputLayer.Neurons[0].Output < 0.5 ? 0 : 1) == nn.TrainingOutputs[0];
                corrects.Add(isCorrect);
                while (corrects.Count > 100) corrects.RemoveAt(0);
                float percentCorrect = 0;
                foreach (var correct in corrects) if (correct) percentCorrect += 1;
                percentCorrect /= corrects.Count;

                if (percentCorrect == 1) flawlessRuns++;
                else flawlessRuns = 0;

                nn.Backpropagate();

                if (i % 100 == 0) {
                    #region Output state

                    Console.WriteLine(
                        RightJustify(i.ToString("#,##0"), 9) + "    " +
                        trainingData[0] +
                        " xor " +
                        trainingData[1] + " = " +
                        RightJustify("" + nn.OutputLayer.Neurons[0].Output.ToString("0.000"), 7) + "  " +
                        (isCorrect ? "       " : "(wrong)") +
                        RightJustify((percentCorrect * 100).ToString("0.0") + "% ", 12) +
                        RenderPercent(percentCorrect * 100)
                    );

                    #endregion
                }

                if (flawlessRuns == 1000) {
                    Console.WriteLine("I've had " + flawlessRuns.ToString("#,##0") + " flawless predictions recently. Continue anyway?");
                    Console.Beep();
                    Console.ReadLine();
                }

                i++;
            }

            Console.WriteLine("Done");
            Console.Beep();
            Console.ReadLine();
        }

        static string RenderPercent(float percent) {
            float value = percent / 10f;
            if (value < 0.5) return "|          |";
            if (value < 1.5) return "|-         |";
            if (value < 2.5) return "|--        |";
            if (value < 3.5) return "|---       |";
            if (value < 4.5) return "|----      |";
            if (value < 5.5) return "|-----     |";
            if (value < 6.5) return "|------    |";
            if (value < 7.5) return "|-------   |";
            if (value < 8.5) return "|--------  |";
            if (value < 9.5) return "|--------- |";
            return "|----------|";
        }

        static string RightJustify(string text, int width) {
            while (text.Length < width) text = " " + text;
            return text;
        }

    }


    public enum ActivationFunctionEnum {
        /// <summary> Rectified Linear Unit </summary>
        ReLU,
        /// <summary> Leaky Rectified Linear Unit </summary>
        LReLU,
        /// <summary> Logistic sigmoid </summary>
        Sigmoid,
        /// <summary> Hyperbolic tangent </summary>
        TanH,
        /// <summary> Softmax function </summary>
        Softmax,
    }

    public class NeuralNetwork {

        /// <summary>
        /// The layers of neurons from input (0) to output (N)
        /// </summary>
        public Layer[] Layers { get; private set; }

        /// <summary>
        /// Equivalent to Layers.Length
        /// </summary>
        public int LayerCount { get; private set; }

        /// <summary>
        /// Equivalent to InputLayer.NeuronCount
        /// </summary>
        public int InputCount { get; private set; }

        /// <summary>
        /// Equivalent to OutputLayer.NeuronCount
        /// </summary>
        public int OutputCount { get; private set; }

        /// <summary>
        /// Equivalent to Layers[0]
        /// </summary>
        public Layer InputLayer { get; private set; }

        /// <summary>
        /// Equivalent to Layers[LayerCount - 1]
        /// </summary>
        public Layer OutputLayer { get; private set; }

        /// <summary>
        /// Provides the desired output values for use in backpropagation training
        /// </summary>
        public float[] TrainingOutputs { get; private set; }

        public NeuralNetwork() { }

        /// <summary>
        /// Constructs and adds a new neuron layer to .Layers
        /// </summary>
        public Layer AddLayer(
            int neuronCount,
            bool randomize = false,
            ActivationFunctionEnum activationFunction = ActivationFunctionEnum.TanH,
            float learningRate = 0.01f
        ) {
            // Since we can't expand the array we'll construct a new one
            var newLayers = new Layer[LayerCount + 1];
            if (LayerCount > 0) Array.Copy(Layers, newLayers, LayerCount);

            // Interconnect layers
            Layer previousLayer = null;
            if (LayerCount > 0) previousLayer = newLayers[LayerCount - 1];

            // Construct the new layer
            var layer = new Layer(neuronCount, previousLayer);
            layer.ActivationFunction = activationFunction;
            layer.LearningRate = learningRate;
            if (randomize) layer.Randomize();
            newLayers[LayerCount] = layer;

            // Interconnect layers
            if (LayerCount > 0) previousLayer.NextLayer = layer;

            // Cache some helpful properties
            if (LayerCount == 0) {
                InputLayer = layer;
                InputCount = neuronCount;
            }
            if (LayerCount == newLayers.Length - 1) {
                OutputLayer = layer;
                OutputCount = neuronCount;
                TrainingOutputs = new float[neuronCount];
            }

            // Emplace the new array and move on
            Layers = newLayers;
            LayerCount++;
            return layer;
        }

        /// <summary>
        /// Copy the array of input values to the input layer's .Output properties
        /// </summary>
        public void SetInputs(float[] inputs) {
            for (int n = 0; n < InputCount; n++) {
                InputLayer.Neurons[n].Output = inputs[n];
            }
        }

        /// <summary>
        /// Copy the output layer's .Output property values to the given array
        /// </summary>
        public void GetOutputs(float[] outputs) {
            for (int n = 0; n < OutputCount; n++) {
                outputs[n] = OutputLayer.Neurons[n].Output;
            }
        }

        /// <summary>
        /// Interpret the output array as a singular category (0, 1, 2, ...) or -1 (none)
        /// </summary>
        public int Classify() {
            float maxValue = 0;
            int bestIndex = -1;
            for (int o = 0; o < OutputCount; o++) {
                float value = OutputLayer.Neurons[o].Output;
                if (value > maxValue) {
                    bestIndex = o;
                    maxValue = value;
                }
            }
            if (maxValue == 0) return -1;
            return bestIndex;
        }

        /// <summary>
        /// Copy the given array's values to the .TrainingOutputs property
        /// </summary>
        public void SetTrainingOutputs(float[] outputs) {
            Array.Copy(outputs, TrainingOutputs, OutputCount);
        }

        /// <summary>
        /// Flipside of .Classify() that sets .TrainingOutputs to all zeros and the given index to one
        /// </summary>
        public void SetTrainingClassification(int value) {
            for (int o = 0; o < OutputCount; o++) {
                if (o == value) {
                    TrainingOutputs[o] = 1;
                } else {
                    TrainingOutputs[o] = 0;
                }
            }
        }

        /// <summary>
        /// Feed the input layer's values forward through the network to populate the output layer's values
        /// </summary>
        public void FeedForward() {
            for (int l = 1; l < LayerCount; l++) {
                var layer = Layers[l];
                layer.FeedForward();
            }
        }

        /// <summary>
        /// One iteration of backpropagation training using inputs and training outputs after .FeedForward() was called on the same inputs
        /// </summary>
        public void Backpropagate() {
            for (int l = LayerCount - 1; l > 0; l--) {
                var layer = Layers[l];
                layer.Backpropagate(TrainingOutputs);
            }
        }

        /// <summary>
        /// Returns a random float in the range from min (inclusive) to max (exclusive)
        /// </summary>
        public static float NextRandom(float min, float max) {
            return (float)random.NextDouble() * (max - min) + min;
        }
        /// <summary>
        /// Returns a random int that is at least min and less than max
        /// </summary>
        public static int NextRandomInt(int min, int max) {
            return random.Next(min, max);
        }
        private static Random random = new Random();

    }


    public class Layer {

        /// <summary>
        /// All the neurons in this layer
        /// </summary>
        public Neuron[] Neurons;

        /// <summary>
        /// Reference to the earlier layer that I get my input from
        /// </summary>
        public Layer PreviousLayer;

        /// <summary>
        /// Reference to the later layer that gets its input from me
        /// </summary>
        public Layer NextLayer;

        /// <summary>
        /// A tunable parameter that trades shorter training times for greater final accuracy
        /// </summary>
        public float LearningRate = 0.01f;

        /// <summary>
        /// How to transform the summed-up scalar output value of each neuron during feed forward
        /// </summary>
        public ActivationFunctionEnum ActivationFunction = ActivationFunctionEnum.TanH;

        /// <summary>
        /// Equivalent to Neurons.Length
        /// </summary>
        public int NeuronCount { get; private set; }

        public Layer(int neuronCount, Layer previousLayer) {
            PreviousLayer = previousLayer;
            NeuronCount = neuronCount;
            Neurons = new Neuron[NeuronCount];
            for (int n = 0; n < NeuronCount; n++) {
                Neuron neuron = new Neuron(this);
                Neurons[n] = neuron;
            }
        }

        /// <summary>
        /// Forget all prior training by randomizing all input weights and biases
        /// </summary>
        public void Randomize() {
            // Put weights in the range of -0.5 to 0.5
            const float randomWeightRadius = 0.5f;
            foreach (Neuron neuron in Neurons) {
                neuron.Randomize(randomWeightRadius);
            }
        }

        /// <summary>
        /// Feed-forward algorithm for this layer
        /// </summary>
        public void FeedForward() {
            foreach (var neuron in Neurons) {

                // Sum up the previous layer's outputs multiplied by this neuron's weights for each
                float sigma = 0;
                for (int i = 0; i < PreviousLayer.NeuronCount; i++) {
                    sigma += PreviousLayer.Neurons[i].Output * neuron.InputWeights[i];
                }
                sigma += neuron.Bias;  // Add in each neuron's bias too

                // Shape the output using the activation function
                float output = ActivationFn(sigma);
                neuron.Output = output;
            }

            // The Softmax activation function requires extra processing of aggregates
            if (ActivationFunction == ActivationFunctionEnum.Softmax) {
                // Find the max output value
                float max = float.NegativeInfinity;
                foreach (var neuron in Neurons) {
                    if (neuron.Output > max) max = neuron.Output;
                }
                // Compute the scale
                float scale = 0;
                foreach (var neuron in Neurons) {
                    scale += (float)Math.Exp(neuron.Output - max);
                }
                // Shift and scale the outputs
                foreach (var neuron in Neurons) {
                    neuron.Output = (float)Math.Exp(neuron.Output - max) / scale;
                }
            }
        }

        /// <summary>
        /// Backpropagation algorithm
        /// </summary>
        public void Backpropagate(float[] trainingOutputs) {

            // Compute error for each neuron
            for (int n = 0; n < NeuronCount; n++) {
                var neuron = Neurons[n];
                float output = neuron.Output;

                if (NextLayer == null) {  // Output layer
                    var error = trainingOutputs[n] - output;
                    neuron.Error = error * ActivationFnDerivative(output);
                } else {  // Hidden layer
                    float error = 0;
                    for (int o = 0; o < NextLayer.NeuronCount; o++) {
                        var nextNeuron = NextLayer.Neurons[o];
                        var iw = nextNeuron.InputWeights[n];
                        error += nextNeuron.Error * iw;
                    }
                    neuron.Error = error * ActivationFnDerivative(output);
                }
            }

            // Adjust weights of each neuron
            for (int n = 0; n < NeuronCount; n++) {
                var neuron = Neurons[n];

                // Update this neuron's bias
                var gradient = neuron.Error;
                neuron.Bias += gradient * LearningRate;

                // Update this neuron's input weights
                for (int i = 0; i < PreviousLayer.NeuronCount; i++) {
                    gradient = neuron.Error * PreviousLayer.Neurons[i].Output;
                    neuron.InputWeights[i] += gradient * LearningRate;
                }
            }

        }

        private float ActivationFn(float value) {
            switch (ActivationFunction) {
                case ActivationFunctionEnum.ReLU:
                    if (value < 0) return 0;
                    return value;
                case ActivationFunctionEnum.LReLU:
                    if (value < 0) return value * 0.01f;
                    return value;
                case ActivationFunctionEnum.Sigmoid:
                    return (float)(1 / (1 + Math.Exp(-value)));
                case ActivationFunctionEnum.TanH:
                    return (float)Math.Tanh(value);
                case ActivationFunctionEnum.Softmax:
                    return value;  // This is only the first part of summing up all the values
            }
            return value;
        }

        private float ActivationFnDerivative(float value) {
            switch (ActivationFunction) {
                case ActivationFunctionEnum.ReLU:
                    if (value > 0) return 1;
                    return 0;
                case ActivationFunctionEnum.LReLU:
                    if (value > 0) return 1;
                    return 0.01f;
                case ActivationFunctionEnum.Sigmoid:
                    return value * (1 - value);
                case ActivationFunctionEnum.TanH:
                    return 1 - value * value;
                case ActivationFunctionEnum.Softmax:
                    return (1 - value) * value;
            }
            return 0;
        }

    }


    public class Neuron {

        /// <summary>
        /// The weight I put on each of my inputs when computing my output as my essential learned memory
        /// </summary>
        public float[] InputWeights;

        /// <summary>
        /// My bias is also part of my learned memory
        /// </summary>
        public float Bias;

        /// <summary>
        /// My feed-forward computed output
        /// </summary>
        public float Output;

        /// <summary>
        /// My back-propagation computed error
        /// </summary>
        public float Error;

        public Neuron(Layer layer) {
            if (layer.PreviousLayer != null) {
                InputWeights = new float[layer.PreviousLayer.NeuronCount];
            }
        }

        /// <summary>
        /// Forget all prior training by randomizing my input weights and bias
        /// </summary>
        public void Randomize(float radius) {
            if (InputWeights != null) {
                for (int i = 0; i < InputWeights.Length; i++) {
                    InputWeights[i] = NeuralNetwork.NextRandom(-radius, radius);
                }
            }
            Bias = NeuralNetwork.NextRandom(-radius, radius);
        }

    }

}