A step by step forward pass and backpropagation example

Published April 23, 2021 | By Rabindra Lamsal | Categorized as Neural Networks

The neural network that we'll be solving in this article.

There are multiple libraries (PyTorch, TensorFlow) that can help you implement almost any neural network architecture. This article is not about solving a neural net with one of those libraries; there are already plenty of articles and videos on that. Instead, we’ll work through a step-by-step example of a forward pass (forward propagation) and a backward pass (backpropagation). We’ll take a neural network with a single hidden layer and solve one complete cycle of forward propagation and backpropagation.

Getting to the point, we will work step by step to understand how weights are updated in neural networks. A neural network learns by updating its weight parameters during the training phase. Fully understanding how neural networks work requires several concepts: linear algebra, probability and calculus. I’ll try my best to revisit calculus for the chain rule. I will set linear algebra (vectors, matrices, tensors) aside for this article. We’ll work through each and every computation, and in the end we’ll update all the weights of the example neural network for one complete cycle of forward propagation and backpropagation. Let’s get started.

Here’s a simple neural network on which we’ll be working.

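The example neural network above is fairly self-explanatory. There are two units in the Input Layer, two units in the Hidden Layer and two units in the Output Layer. w1, w2, w3, …, w8 represent the weights, and b1 and b2 are the biases for the Hidden Layer and the Output Layer, respectively (each bias is shared by both units of its layer). Following the equations below, w1 and w3 connect i1 and i2 to h1; w2 and w4 connect i1 and i2 to h2; w5 and w6 connect h1 and h2 to o1; and w7 and w8 connect h1 and h2 to o2. The values used throughout the calculations are i1 = 0.1, i2 = 0.5, w1 = 0.1, w2 = 0.2, w3 = 0.3, w4 = 0.4, w5 = 0.5, w6 = 0.6, w7 = 0.7, w8 = 0.8, b1 = 0.25 and b2 = 0.35.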

In this article, we’ll pass two inputs, i1 and i2, perform a forward pass to compute the total error, and then perform a backward pass to distribute the error through the network and update the weights accordingly.

Before getting started, let’s go over two basic concepts that should be sufficient to follow this article.

Peeking inside a single neuron

Inside h1 (first unit of the hidden layer)

Inside a unit, two operations happen: (i) computation of a weighted sum and (ii) squashing of that weighted sum using an activation function. The result of the activation function becomes an input to the next layer (for units in the Output Layer, it is the network’s output). In this example, we’ll be using the sigmoid function (logistic function) as the activation function. The sigmoid function takes an input and squashes it into the range between 0 and 1. We’ll discuss activation functions in later articles; what you should note here is that these two operations happen inside every unit of the network. The input layer can be thought of as applying a linear (identity) function that passes its input through unchanged.
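To make the two operations concrete, here is a minimal Python sketch of a single unit. The helper names `sigmoid` and `unit_output` are illustrative, not from the article; the numbers plugged in are the inputs, weights (w1, w3) and bias (b1) that feed h1 in the forward pass. Printed with five decimals, `output_h1` shows as 0.60109 rather than 0.60108, because the article appears to truncate values rather than round them.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, bias):
    """One unit: (i) weighted sum of the inputs plus bias, (ii) sigmoid of that sum."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return weighted_sum, sigmoid(weighted_sum)

# h1 receives i1 = 0.1 (via w1 = 0.1) and i2 = 0.5 (via w3 = 0.3), plus b1 = 0.25.
sum_h1, out_h1 = unit_output([0.1, 0.5], [0.1, 0.3], 0.25)
print(f"sum_h1 = {sum_h1:.5f}, output_h1 = {out_h1:.5f}")
```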

Chain Rule in Calculus

If we have $y = f(u)$ and $u = g(x)$, then we can write the derivative of y with respect to x as:

\frac{dy}{dx} = \frac{dy}{du} * \frac{du}{dx}
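A quick numeric illustration of the chain rule, using made-up functions chosen only for this example (they are not part of the network): y = f(u) = u^2 and u = g(x) = 3x + 1. The product dy/du * du/dx is compared against a central finite-difference estimate of the slope of the composite function at x = 0.5; both should come out at 15.

```python
def f(u: float) -> float:
    return u ** 2            # outer function: y = f(u) = u^2

def g(x: float) -> float:
    return 3 * x + 1         # inner function: u = g(x) = 3x + 1

x = 0.5
u = g(x)
dy_du = 2 * u                # derivative of u^2 with respect to u
du_dx = 3                    # derivative of 3x + 1 with respect to x
dy_dx_chain = dy_du * du_dx  # chain rule: dy/dx = dy/du * du/dx

# Independent check: central finite difference of the composite y = f(g(x)).
h = 1e-6
dy_dx_numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)
print(dy_dx_chain, dy_dx_numeric)
```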

The Forward Pass

Remember that each unit of a neural network performs two operations: it computes a weighted sum and passes that sum through an activation function. The output of the activation function determines how strongly that particular unit activates, i.e. how much signal it passes on to the next layer.

Let’s get started with the forward pass.

For h1,

sum_{h1} = i_{1}*w_{1}+i_{2}*w_{3}+b_{1}

sum_{h1} = 0.1*0.1+0.5*0.3+0.25 = 0.41

Now we pass this weighted sum through the logistic (sigmoid) function to squash it into the range (0, 1). The logistic function is the activation function used by every hidden and output unit of our example neural network.

output_{h1}=\frac{1}{1+e^{-sum_{h1}}}

output_{h1}=\frac{1}{1+e^{-0.41}} = 0.60108

Similarly, for h2 we compute the weighted sum $sum_{h2}$ and the activation value $output_{h2}$.

sum_{h2} = i_{1}*w_{2}+i_{2}*w_{4}+b_{1} = 0.1*0.2+0.5*0.4+0.25 = 0.47

output_{h2} = \frac{1}{1+e^{-sum_{h2}}} = \frac{1}{1+e^{-0.47}} = 0.61538

Now, $output_{h1}$ and $output_{h2}$ will be considered as inputs to the next layer.

For o1,

sum_{o1} = output_{h1}*w_{5}+output_{h2}*w_{6}+b_{2} = 0.60108*0.5+0.61538*0.6+0.35 = 1.01977

output_{o1}=\frac{1}{1+e^{-sum_{o1}}} = 0.73492

Similarly for o2,

sum_{o2} = output_{h1}*w_{7}+output_{h2}*w_{8}+b_{2} = 0.60108*0.7+0.61538*0.8+0.35 = 1.26306

output_{o2}=\frac{1}{1+e^{-sum_{o2}}} = 0.77955
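The whole forward pass can be reproduced in a few lines of Python. This is a sketch, assuming the parameter values used in the calculations above (i1 = 0.1, i2 = 0.5, w1 to w8 = 0.1 to 0.8, b1 = 0.25, b2 = 0.35). Because the script rounds when printing while the article appears to truncate, a few values may differ from the article’s in the fifth decimal.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic activation used by every hidden and output unit."""
    return 1.0 / (1.0 + math.exp(-x))

# Parameters of the example network, as used in the calculations above.
i1, i2 = 0.1, 0.5
w1, w2, w3, w4, w5, w6, w7, w8 = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8
b1, b2 = 0.25, 0.35

# Hidden layer: weighted sum, then sigmoid (b1 is shared by h1 and h2).
sum_h1 = i1 * w1 + i2 * w3 + b1
sum_h2 = i1 * w2 + i2 * w4 + b1
out_h1, out_h2 = sigmoid(sum_h1), sigmoid(sum_h2)

# Output layer: the hidden activations are its inputs (b2 is shared by o1 and o2).
sum_o1 = out_h1 * w5 + out_h2 * w6 + b2
sum_o2 = out_h1 * w7 + out_h2 * w8 + b2
out_o1, out_o2 = sigmoid(sum_o1), sigmoid(sum_o2)

for name, value in [("sum_h1", sum_h1), ("output_h1", out_h1),
                    ("sum_h2", sum_h2), ("output_h2", out_h2),
                    ("sum_o1", sum_o1), ("output_o1", out_o1),
                    ("sum_o2", sum_o2), ("output_o2", out_o2)]:
    print(f"{name} = {value:.5f}")
```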

Computing the total error

We started off by assuming the expected (target) outputs to be 0.05 and 0.95 for $output_{o1}$ and $output_{o2}$, respectively. Now we will compute the errors from the outputs we have just obtained and these expected outputs.

We’ll use the following error formula,

E_{total} = \sum \frac{1}{2}(target-output)^{2}

To compute $E_{total}$, we first need to find the respective errors at $o1$ and $o2$.

E_{1} = \frac{1}{2}(target_{1}-output_{o1})^{2}

E_{1} = \frac{1}{2}(0.05-0.73492)^{2} = 0.23456

Similarly for E2,

E_{2} = \frac{1}{2}(target_{2}-output_{o2})^{2}

E_{2} = \frac{1}{2}(0.95-0.77955)^{2} = 0.01452

Therefore, $E_{total} = E_{1} + E_{2} = 0.24908$
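A small sketch of the error computation, taking the two outputs from the forward pass (as rounded above) as given. `squared_error` is an illustrative helper name, not a library function.

```python
def squared_error(target: float, output: float) -> float:
    """Per-output error E = 1/2 * (target - output)^2."""
    return 0.5 * (target - output) ** 2

targets = [0.05, 0.95]          # expected outputs for o1 and o2
outputs = [0.73492, 0.77955]    # output_o1 and output_o2 from the forward pass

errors = [squared_error(t, o) for t, o in zip(targets, outputs)]
E_total = sum(errors)
print(f"E1 = {errors[0]:.5f}, E2 = {errors[1]:.5f}, E_total = {E_total:.5f}")
```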

The Backpropagation

The aim of backpropagation (the backward pass) is to distribute the total error back through the network so that the weights can be updated to minimize the cost function (loss). The weights are updated in such a way that when the next forward pass uses the updated weights, the total error is reduced (for a suitable learning rate). This is repeated until a minimum is reached.

For weights in the output layer (w5, w6, w7, w8)

For w5,

Let’s compute how much w5 contributes to the error. Once we are clear on how w5 is updated, it will be easy to generalize the same steps to the rest of the weights. Looking closely at the example neural network, we can see that $E_{1}$ is affected by $output_{o1}$, $output_{o1}$ is affected by $sum_{o1}$, and $sum_{o1}$ is affected by $w5$. Since w5 does not feed o2, it affects $E_{total}$ only through $E_{1}$. It’s time to recall the Chain Rule.

\frac{\partial E_{total}}{\partial w5} = \frac{\partial E_{total}}{\partial output_{o1}} * \frac{\partial output_{o1}}{\partial sum_{o1}} * \frac{\partial sum_{o1}}{\partial w5}

Let’s deal with each component of the above chain separately.

Component 1: partial derivative of Error w.r.t. Output

E_{total} = \sum \frac{1}{2}(target-output)^{2}

E_{total} = \frac{1}{2}(target_{1}-output_{o1})^{2} + \frac{1}{2}(target_{2}-output_{o2})^{2}

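The second term does not depend on $output_{o1}$, so its derivative is zero and only the first term contributes. Therefore,

\frac{\partial E_{total}}{\partial output_{o1}} = 2*\frac{1}{2}*(target_{1}-output_{o1})*(-1)

= output_{o1} - target_{1}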

Component 2: partial derivative of Output w.r.t. Sum

Each unit passes its weighted sum through a non-linear activation function, which in this example is the logistic function. Its definition and derivative are:

\sigma(x) = \frac{1}{1+e^{-x}}

\frac{\mathrm{d}}{\mathrm{d}x}\sigma(x) = \sigma(x)(1-\sigma(x))

Therefore, the derivative of the logistic function can be written in terms of its own output: output multiplied by (1 - output).

\frac{\partial output_{o1}}{\partial sum_{o1}} = output_{o1} (1 - output_{o1})
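One way to convince yourself of the identity sigma'(x) = sigma(x)(1 - sigma(x)) is to compare it with a finite-difference estimate. A minimal sketch, evaluated at sum_o1 = 1.01977 from the forward pass; the step size h = 1e-6 is an arbitrary choice.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x: float) -> float:
    """d sigma/dx expressed through the function's own output: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

sum_o1 = 1.01977                       # weighted sum at o1 from the forward pass
analytic = sigmoid_derivative(sum_o1)

# Central difference: slope of sigmoid between sum_o1 - h and sum_o1 + h.
h = 1e-6
numeric = (sigmoid(sum_o1 + h) - sigmoid(sum_o1 - h)) / (2 * h)
print(f"analytic = {analytic:.5f}, numeric = {numeric:.5f}")
```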

Component 3: partial derivative of Sum w.r.t. Weight

sum_{o1} = output_{h1}*w_{5}+output_{h2}*w_{6}+b_{2}

Therefore,

\frac{\partial sum_{o1}}{\partial w5} = output_{h1}

Putting them together,

\frac{\partial E_{total}}{\partial w5} = \frac{\partial E_{total}}{\partial output_{o1}} * \frac{\partial output_{o1}}{\partial sum_{o1}} * \frac{\partial sum_{o1}}{\partial w5}

\frac{\partial E_{total}}{\partial w5} = [output_{o1} - target_{1} ]* [output_{o1} (1 - output_{o1})] * [output_{h1}]

\frac{\partial E_{total}}{\partial w5} = 0.68492 * 0.19480 * 0.60108

\frac{\partial E_{total}}{\partial w5} = 0.08020

The $new\_w_{5}$ is,

$new\_w_{5} = w5 - n * \frac{\partial E_{total}}{\partial w5}$, where n is the learning rate (0.6 in this example).

new\_w_{5} = 0.5 - 0.6 * 0.08020

new\_w_{5} = 0.45187
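The three components and the update for w5 translate directly into code. A minimal sketch that reuses the rounded values from the forward pass; the variable names are illustrative. Because it starts from five-decimal inputs and rounds when printing, the last digit of `new_w5` prints as 0.45188 rather than the article’s truncated 0.45187.

```python
# Values from the forward pass above (five decimals) and the example's settings.
target_1 = 0.05
out_o1 = 0.73492
out_h1 = 0.60108
w5 = 0.5
n = 0.6  # learning rate

dE_dout_o1 = out_o1 - target_1          # component 1: dE_total / d output_o1
dout_dsum_o1 = out_o1 * (1 - out_o1)    # component 2: sigmoid derivative at o1
dsum_o1_dw5 = out_h1                    # component 3: d sum_o1 / d w5

dE_dw5 = dE_dout_o1 * dout_dsum_o1 * dsum_o1_dw5   # chain rule
new_w5 = w5 - n * dE_dw5                            # gradient-descent step
print(f"dE_total/dw5 = {dE_dw5:.5f}, new_w5 = {new_w5:.5f}")
```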

We can proceed similarly for w6, w7 and w8.

For w6,

\frac{\partial E_{total}}{\partial w6} = \frac{\partial E_{total}}{\partial output_{o1}} * \frac{\partial output_{o1}}{\partial sum_{o1}} * \frac{\partial sum_{o1}}{\partial w6}

The first two components of this chain have already been calculated. The last component is $\frac{\partial sum_{o1}}{\partial w6} = output_{h2}$.

\frac{\partial E_{total}}{\partial w6} = 0.68492 * 0.19480 * 0.61538 = 0.08211

The $new\_w_{6}$ is,

new\_w_{6}= w6 - n * \frac{\partial E_{total}}{\partial w6}

new\_w_{6} = 0.6 - 0.6 * 0.08211

new\_w_{6} = 0.55073

For w7,

\frac{\partial E_{total}}{\partial w7} = \frac{\partial E_{total}}{\partial output_{o2}} * \frac{\partial output_{o2}}{\partial sum_{o2}} * \frac{\partial sum_{o2}}{\partial w7}

For the first component of the above chain, let’s recall how the partial derivative of the error w.r.t. the output is computed.

\frac{\partial E_{total}}{\partial output_{o2}} = output_{o2} - target_{2}

For the second component,

\frac{\partial output_{o2}}{\partial sum_{o2}} = output_{o2} (1 - output_{o2})

For the third component,

\frac{\partial sum_{o2}}{\partial w7} = output_{h1}

Putting them together,

\frac{\partial E_{total}}{\partial w7} = [output_{o2} - target_{2}] * [output_{o2} (1 - output_{o2})] * [output_{h1}]

\frac{\partial E_{total}}{\partial w7} = -0.17044 * 0.17184 * 0.60108

\frac{\partial E_{total}}{\partial w7} = -0.01760

The $new\_w_{7}$ is,

new\_w_{7} = w7 - n * \frac{\partial E_{total}}{\partial w7}

new\_w_{7} = 0.7 - 0.6 * (-0.01760)

new\_w_{7} = 0.71056

Proceeding similarly, we get $new\_w_{8} = 0.81081$ (with $\frac{\partial E_{total}}{\partial w8} = -0.01802$).
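Because the first two components depend only on the output unit, a sketch can compute them once per output unit (often called that unit’s "delta") and multiply by the hidden activation feeding each weight. The dictionaries below encode the wiring implied by the sum_o1 and sum_o2 equations; all names are illustrative.

```python
# Hidden activations and outputs from the forward pass (five decimals).
out_h = {"h1": 0.60108, "h2": 0.61538}
out_o = {"o1": 0.73492, "o2": 0.77955}
target = {"o1": 0.05, "o2": 0.95}
n = 0.6  # learning rate

# Each output-layer weight as (value, source hidden unit, destination output unit).
weights = {
    "w5": (0.5, "h1", "o1"),
    "w6": (0.6, "h2", "o1"),
    "w7": (0.7, "h1", "o2"),
    "w8": (0.8, "h2", "o2"),
}

# Components 1 and 2 of the chain depend only on the output unit: compute them once.
delta = {o: (out_o[o] - target[o]) * out_o[o] * (1 - out_o[o]) for o in out_o}

grads, new_weights = {}, {}
for name, (w, src, dst) in weights.items():
    grads[name] = delta[dst] * out_h[src]   # component 3: d sum / d w = source activation
    new_weights[name] = w - n * grads[name]
    print(f"{name}: dE_total/d{name} = {grads[name]:+.5f}, new value = {new_weights[name]:.5f}")
```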

For weights in the hidden layer (w1, w2, w3, w4)

Similar calculations are used to update the weights in the hidden layer, but this time the chain becomes a bit longer. No matter how deep the neural network is, all we need to find out is how much a particular weight contributes to the total error of the network. For that, we compute the partial derivative of the error w.r.t. that particular weight. Let’s work on updating w1; the same calculations then generalize to the rest of the weights.
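A handy way to check chain-rule results like the ones below is a finite-difference gradient check: nudge one weight up and down by a tiny amount, rerun the forward pass, and measure how E_total changes. This is a standard debugging technique swapped in for illustration, not part of the article’s method; `total_error` and `numeric_grad` are hypothetical helpers, and h = 1e-6 is an arbitrary step size.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def total_error(p: dict) -> float:
    """Forward pass of the example network, then E_total = sum of 1/2 (target - output)^2."""
    i1, i2 = 0.1, 0.5
    out_h1 = sigmoid(i1 * p["w1"] + i2 * p["w3"] + p["b1"])
    out_h2 = sigmoid(i1 * p["w2"] + i2 * p["w4"] + p["b1"])
    out_o1 = sigmoid(out_h1 * p["w5"] + out_h2 * p["w6"] + p["b2"])
    out_o2 = sigmoid(out_h1 * p["w7"] + out_h2 * p["w8"] + p["b2"])
    return 0.5 * (0.05 - out_o1) ** 2 + 0.5 * (0.95 - out_o2) ** 2

params = {"w1": 0.1, "w2": 0.2, "w3": 0.3, "w4": 0.4,
          "w5": 0.5, "w6": 0.6, "w7": 0.7, "w8": 0.8, "b1": 0.25, "b2": 0.35}

def numeric_grad(p: dict, name: str, h: float = 1e-6) -> float:
    """Central-difference estimate of dE_total / d(name), nudging one parameter only."""
    plus, minus = dict(p), dict(p)
    plus[name] += h
    minus[name] -= h
    return (total_error(plus) - total_error(minus)) / (2 * h)

grads = {name: numeric_grad(params, name) for name in ["w1", "w2", "w3", "w4"]}
for name, g in grads.items():
    print(f"dE_total/d{name} ~= {g:.5f}")
```

Passing "b1" or "b2" to `numeric_grad` estimates the bias gradients in the same way.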

For w1 (with respect to E1),

For simplicity, let us compute $\frac{\partial E_{1}}{\partial w1}$ and $\frac{\partial E_{2}}{\partial w1}$ separately, and later add them to get $\frac{\partial E_{total}}{\partial w1}$. Both terms are needed because h1 feeds both o1 and o2, so w1 affects both $E_{1}$ and $E_{2}$.

\frac{\partial E_{1}}{\partial w1} = \frac{\partial E_{1}}{\partial output_{o1}} * \frac{\partial output_{o1}}{\partial sum_{o1}} * \frac{\partial sum_{o1}}{\partial output_{h1}} * \frac{\partial output_{h1}}{\partial sum_{h1}} * \frac{\partial sum_{h1}}{\partial w1}

Let’s quickly go through the above chain. We know that $E_{1}$ is affected by $output_{o1}$, $output_{o1}$ is affected by $sum_{o1}$, $sum_{o1}$ is affected by $output_{h1}$, $output_{h1}$ is affected by $sum_{h1}$, and finally $sum_{h1}$ is affected by $w1$. It is quite easy to comprehend, isn’t it?

For the first component of the above chain,

\frac{\partial E_{1}}{\partial output_{o1}} = output_{o1} - target_{1}

We’ve already computed the second component, $\frac{\partial output_{o1}}{\partial sum_{o1}} = 0.19480$. This is one of the benefits of using the chain rule: as we go deeper into the network, earlier computations can be reused.

For the third component,

sum_{o1} = output_{h1}*w_{5}+output_{h2}*w_{6}+b_{2}

\frac{\partial sum_{o1}}{\partial output_{h1}} = w5

For the fourth component,

\frac{\partial output_{h1}}{\partial sum_{h1}} = output_{h1}*(1-output_{h1})

For the fifth component,

sum_{h1} = i_{1}*w_{1}+i_{2}*w_{3}+b_{1}

\frac{\partial sum_{h1}}{\partial w1} = i_{1}

Putting them all together,

\frac{\partial E_{1}}{\partial w1} = \frac{\partial E_{1}}{\partial output_{o1}} * \frac{\partial output_{o1}}{\partial sum_{o1}} * \frac{\partial sum_{o1}}{\partial output_{h1}} * \frac{\partial output_{h1}}{\partial sum_{h1}} * \frac{\partial sum_{h1}}{\partial w1}

\frac{\partial E_{1}}{\partial w1} = 0.68492 * 0.19480 * 0.5 * 0.23978 * 0.1 = 0.00159

Similarly, for w1 (with respect to E2),

\frac{\partial E_{2}}{\partial w1} = \frac{\partial E_{2}}{\partial output_{o2}} * \frac{\partial output_{o2}}{\partial sum_{o2}} * \frac{\partial sum_{o2}}{\partial output_{h1}} * \frac{\partial output_{h1}}{\partial sum_{h1}} * \frac{\partial sum_{h1}}{\partial w1}

For the first component of the above chain,

\frac{\partial E_{2}}{\partial output_{o2}} = output_{o2} - target_{2}

The second component, $\frac{\partial output_{o2}}{\partial sum_{o2}} = 0.17184$, was already computed for w7.

For the third component,

sum_{o2} = output_{h1}*w_{7}+output_{h2}*w_{8}+b_{2}

\frac{\partial sum_{o2}}{\partial output_{h1}} = w7

The fourth and fifth components have already been computed while working out $\frac{\partial E_{1}}{\partial w1}$.

Putting them all together,

\frac{\partial E_{2}}{\partial w1} = \frac{\partial E_{2}}{\partial output_{o2}} * \frac{\partial output_{o2}}{\partial sum_{o2}} * \frac{\partial sum_{o2}}{\partial output_{h1}} * \frac{\partial output_{h1}}{\partial sum_{h1}} * \frac{\partial sum_{h1}}{\partial w1}

\frac{\partial E_{2}}{\partial w1} = -0.17044 * 0.17184 * 0.7 * 0.23978 * 0.1 = -0.00049

Now we can compute $\frac{\partial E_{total}}{\partial w1} = \frac{\partial E_{1}}{\partial w1} + \frac{\partial E_{2}}{\partial w1}$:

$\frac{\partial E_{total}}{\partial w1} = 0.00159 + (-0.00049) = 0.00110$.

The $new\_w_{1}$ is,

new\_w_{1} = w1 - n * \frac{\partial E_{total}}{\partial w1}

new\_w_{1}= 0.1 - 0.6 * 0.00110

new\_w_{1} = 0.09933
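The two paths for w1 can be written out the same way. A sketch using the rounded forward-pass values and the original (not yet updated) w5 and w7, as in the calculation above; variable names are illustrative.

```python
i1 = 0.1
out_h1, out_o1, out_o2 = 0.60108, 0.73492, 0.77955   # from the forward pass
target_1, target_2 = 0.05, 0.95
w1, w5, w7 = 0.1, 0.5, 0.7                           # original weights
n = 0.6                                              # learning rate

# Components 4 and 5 are shared by both paths.
dout_h1_dsum_h1 = out_h1 * (1 - out_h1)
dsum_h1_dw1 = i1

# Path through o1 (third component is w5) and path through o2 (third component is w7).
dE1_dw1 = (out_o1 - target_1) * out_o1 * (1 - out_o1) * w5 * dout_h1_dsum_h1 * dsum_h1_dw1
dE2_dw1 = (out_o2 - target_2) * out_o2 * (1 - out_o2) * w7 * dout_h1_dsum_h1 * dsum_h1_dw1

dE_total_dw1 = dE1_dw1 + dE2_dw1     # w1 reaches E_total through both outputs
new_w1 = w1 - n * dE_total_dw1
print(f"dE1/dw1 = {dE1_dw1:.5f}, dE2/dw1 = {dE2_dw1:.5f}, "
      f"dE_total/dw1 = {dE_total_dw1:.5f}, new_w1 = {new_w1:.5f}")
```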

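Proceeding similarly, we can update the other hidden-layer weights (w2, w3 and w4). Since w2 and w4 feed h2, their chains go through w6 and w8 (instead of w5 and w7) and use $output_{h2}(1-output_{h2})$; since w3 and w4 multiply i2, their last component is $i_{2}$ instead of $i_{1}$. The results are: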

new\_w_{2} = 0.19919

new\_w_{3} = 0.29667

new\_w_{4} = 0.39597
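The same pattern covers all four hidden-layer weights. This sketch (names and dictionary layout are illustrative) sums each hidden unit’s contributions over both output paths, using the old output-layer weights as the article does, and should reproduce the new values of w1 to w4 to within rounding in the last decimal.

```python
inputs = {"i1": 0.1, "i2": 0.5}
out_h = {"h1": 0.60108, "h2": 0.61538}      # hidden activations (forward pass)
out_o = {"o1": 0.73492, "o2": 0.77955}      # network outputs (forward pass)
target = {"o1": 0.05, "o2": 0.95}
n = 0.6                                      # learning rate

# Old output-layer weights keyed by (hidden unit, output unit): w5, w6, w7, w8.
w_out = {("h1", "o1"): 0.5, ("h2", "o1"): 0.6, ("h1", "o2"): 0.7, ("h2", "o2"): 0.8}
# Hidden-layer weights: name -> (value, input it multiplies, hidden unit it feeds).
w_hidden = {"w1": (0.1, "i1", "h1"), "w2": (0.2, "i1", "h2"),
            "w3": (0.3, "i2", "h1"), "w4": (0.4, "i2", "h2")}

# dE_total / d sum_o for each output unit (components 1 and 2 of the chain).
delta_o = {o: (out_o[o] - target[o]) * out_o[o] * (1 - out_o[o]) for o in out_o}

# dE_total / d sum_h: add the contributions through o1 and o2, then apply
# the sigmoid derivative of the hidden unit.
delta_h = {h: sum(delta_o[o] * w_out[(h, o)] for o in out_o) * out_h[h] * (1 - out_h[h])
           for h in out_h}

new_w = {}
for name, (w, src, h) in w_hidden.items():
    grad = delta_h[h] * inputs[src]      # last component: d sum_h / d w = the input
    new_w[name] = w - n * grad
    print(f"new_{name} = {new_w[name]:.5f}")
```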

Once we’ve computed all the new weights, we replace the old weights with them. That completes one cycle of forward propagation and backpropagation. The next forward pass uses the updated weights to compute a new total error, and based on this newly computed error the weights are updated again. This goes on until the loss converges to a minimum. In this way, a neural network starts with random values for its weights and gradually moves them toward values that minimize the loss.
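Repeating the cycle is a short loop. A minimal sketch, assuming plain gradient descent on the single training example from this article, with the biases b1 and b2 held fixed as in the article (they could be updated with the same chain-rule approach). The 1000 cycles are an arbitrary choice for illustration, and `train_step` is a hypothetical helper, not a library function.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w, b, x, t, lr):
    """One forward + backward pass. Returns E_total before the update and the new w1..w8."""
    # Forward pass.
    h = [sigmoid(x[0] * w[0] + x[1] * w[2] + b[0]),    # h1 uses w1, w3
         sigmoid(x[0] * w[1] + x[1] * w[3] + b[0])]    # h2 uses w2, w4
    o = [sigmoid(h[0] * w[4] + h[1] * w[5] + b[1]),    # o1 uses w5, w6
         sigmoid(h[0] * w[6] + h[1] * w[7] + b[1])]    # o2 uses w7, w8
    error = sum(0.5 * (tk - ok) ** 2 for tk, ok in zip(t, o))

    # Backward pass: dE/dsum at each output unit, then at each hidden unit (old weights).
    d_o = [(o[k] - t[k]) * o[k] * (1 - o[k]) for k in range(2)]
    d_h = [(d_o[0] * w[4] + d_o[1] * w[6]) * h[0] * (1 - h[0]),
           (d_o[0] * w[5] + d_o[1] * w[7]) * h[1] * (1 - h[1])]

    grads = [d_h[0] * x[0], d_h[1] * x[0], d_h[0] * x[1], d_h[1] * x[1],   # w1..w4
             d_o[0] * h[0], d_o[0] * h[1], d_o[1] * h[0], d_o[1] * h[1]]   # w5..w8
    return error, [wi - lr * gi for wi, gi in zip(w, grads)]

weights = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]   # w1..w8
biases = [0.25, 0.35]                                # b1, b2 (held fixed here)
errors = []
for _ in range(1000):
    err, weights = train_step(weights, biases, [0.1, 0.5], [0.05, 0.95], lr=0.6)
    errors.append(err)

print(f"E_total: cycle 1 = {errors[0]:.5f}, cycle 1000 = {errors[-1]:.5f}")
```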

I hope you found this article useful. I’ll see you in the next one.


By Rabindra Lamsal

Ph.D. Candidate (Computer Science) at the University of Melbourne.


55 comments

Ben Hardy says:

January 11, 2022 at 9:13 pm

Thanks for this straightforward explanation! Once I separately went away and learned about partial derivatives it made complete sense. Very helpful for this programmer. Kudos!

1. Rabindra Lamsal says:
  
  July 12, 2022 at 1:51 am
  
  Hello Ben. Glad to know that the article was helpful.
  
Joe Roland says:

January 27, 2022 at 6:06 pm

Thank you for this excellent article, Rabindra.

I’ve stepped through multiple tutorials similar to this, but in each case there was a problem with the tutorial. Either it was incomplete, or it contained errors. Also, it seems that the values for one tutorial in particular were copied into numerous other tutorials, including one YouTube video, and in each, there were incomplete steps. This tutorial had none of those issues. It was well written, concise, and accurate. I thank you for that. I do not have a math background and most tutorials displayed endless calculus equations that I couldn’t read. What I needed was a complete step-by-step walkthrough of the actual numbers for one complete forward and one complete backward pass, and that is exactly what you provided. To underscore how much your example helped me: In other articles, where the author left it to the reader to determine the new/updated w2, w3 and w4 values, I was hopelessly lost, but by going through your articles, I was able to compute those values and verify their accuracy against your results. I can’t thank you enough. I searched for days trying to find an article or video to help me grasp the concepts of a NN, and this is the only article that truly helped me.

1. Rabindra Lamsal says:
  
  July 12, 2022 at 1:51 am
  
  Hello Joe. Thank you so much for the words.
  
Amin says:

June 21, 2022 at 5:32 am

Hi, thanks a lot for the excellent article.

1. Rabindra Lamsal says:
  
  July 12, 2022 at 1:52 am
  
  Thanks, Amin.
  
Bipul says:

August 17, 2022 at 11:54 pm

Thanks for the detailed explanation. Finally got to go through this popular blog. Helped me a lot in understanding. We are at the same Uni, we should catch up sometime.

1. Rabindra Lamsal says:
  
  August 19, 2022 at 7:33 am
  
  Thanks, brother. Nobody knows about the blog :p, only this post seems to be getting lots of attention. Yeah, we should catch up sometime. HAHA!!
  
Vladimir says:

August 21, 2022 at 1:17 pm

Thanks for the article; it’s indeed the most complete one compared to a bunch of others online.
However, the network would not work properly, as the biases are initialized and used for forward propagation but never updated… which means at any point of the function there would be an offset, not equal to zero but to other constants (.25 and .35), for the layers, not for individual neurons.
Otherwise thanks!

1. Rabindra Lamsal says:
  
  October 3, 2022 at 9:03 am
  
  Yes, the biases also need to be updated accordingly.
  
Yonten Jamtsh says:

September 6, 2022 at 9:07 am

Thank you for the wonderful blog. While computing the update for w5, can’t we use E1 instead of E_total?

1. Rabindra Lamsal says:
  
  October 3, 2022 at 9:12 am
  
  Yes, we can. However, this applies only to the weights in the final layer. Once you go back inside the network (the other layers), the weights there affect both E1 and E2. So it’s good to follow a general representation.
  
Anaïs says:

October 14, 2022 at 4:47 pm

Thank you for this great article 🙂 ! It was really helpful, especially concerning the use of the Chain Rule !

1. Rabindra Lamsal says:
  
  October 20, 2022 at 9:42 pm
  
  Glad to know that you found the article useful! Thanks.
  
Mumin Adam says:

October 18, 2022 at 5:51 am

You’re the best, man. I learned a lot from your article. I salute you.

1. Rabindra Lamsal says:
  
  October 20, 2022 at 9:45 pm
  
  Thank you, Mumin. Glad to know that the article was helpful.
  
Fredrick Kimingi says:

October 22, 2022 at 6:59 am

This was really helpful. Quite simplified and well explained. Thank you.

1. Rabindra Lamsal says:
  
  October 24, 2022 at 12:15 am
  
  Glad that the article was of help!
  
Pooja says:

October 24, 2022 at 3:31 am

Thank you so much for the detailed and clear explanation! I finally understood the topic.

1. Rabindra Lamsal says:
  
  November 1, 2022 at 11:48 am
  
  Glad to know that, Pooja.
  
jorge says:

November 12, 2022 at 3:36 pm

Very nice work… very well explained, and interesting!!!

1. Rabindra Lamsal says:
  
  November 17, 2022 at 8:46 pm
  
  Thanks, jorge.
  
Roger Green says:

December 12, 2022 at 2:57 pm

Good as far as it goes, despite leaving out the biases.
I’d like to see you do one with 2 or 3 hidden layers.
The chain rule mathematics is fine, but summing the error derivatives for weights and biases as you move back through the hidden layers gets a bit more complicated.

1. Rabindra Lamsal says:
  
  December 16, 2022 at 5:28 am
  
  The deeper the network, the longer the chain of derivatives. The steps discussed above generalize.
  
Urgesa says:

March 12, 2023 at 9:58 am

It’s really interesting, but I’m a bit confused: how come n = 0.6 (learning rate)?

1. Rabindra Lamsal says:
  
  May 18, 2023 at 4:10 am
  
  Hi Urgesa,
  
  I just assumed the learning rate to be 0.6. To understand in depth how the learning rate affects the training of neural networks, refer to this article: https://theneuralblog.com/gradient-descent-algorithm/
  
Ade says:

April 1, 2023 at 5:40 pm

I am glad I got this post just by chance. It is really concise and simple to understand. You are indeed a teacher. Thank you for taking time out to help others through your knowledge.

1. Rabindra Lamsal says:
  
  May 18, 2023 at 11:13 pm
  
  Great to know that the article was helpful, Ade!
  
Dr JAY says:

April 7, 2023 at 5:46 am

Hi Rabindra, thanks a lot for the excellent worked example.

1. Rabindra Lamsal says:
  
  May 18, 2023 at 11:12 pm
  
  Thank you, JAY.
  
Vasundhara Sharma says:

April 25, 2023 at 1:58 pm

Great article! Helped me finish my neural networks homework problem.

1. Rabindra Lamsal says:
  
  May 18, 2023 at 11:12 pm
  
  Thank you, Vasundhara.
  
Ayman says:

July 3, 2023 at 10:53 am

Thank you for this article.
I think you did forward and backward propagation for one input case. As I understand from Keras, for example, the forward pass is done for a lot of input data and then the weights are updated. Is that right?

1. Rabindra Lamsal says:
  
  September 18, 2023 at 1:16 am
  
  Yes.
  
Jordan says:

December 11, 2023 at 6:26 am

Hi, it looks like in the visualizations, w6 and w7 are switched, making it look like w7 is connected to h1 and not h2, which does not align with your calculations.

1. Rabindra Lamsal says:
  
  December 19, 2023 at 12:22 pm
  
  Hi Jordan. As per the example neural network, we are computing (in the chain) sumO1 w.r.t. w6 and sumO2 w.r.t. w7.
  
zyad says:

December 18, 2023 at 12:43 am

I understood everything in this amazing tutorial. English isn’t my first language and I easily understood everything. That’s how high your level is. Good job, man.

1. Rabindra Lamsal says:
  
  December 18, 2023 at 8:57 am
  
  Glad it was helpful, Zyad.
  
Pratap Simha says:

January 6, 2024 at 12:39 am

Super article and very well explained

RK says:

January 8, 2024 at 6:31 am

Hi… can we update the values of the biases b1 and b2 as well? If yes, b1 and b2 come out different for the two output nodes and the two hidden nodes. Is that correct, or should they be the same for both nodes in one layer? Please reply.

1. Rabindra Lamsal says:
  
  May 3, 2024 at 9:51 am
  
  The chain rule applies. All you need to compute is the derivative of the loss function with respect to the parameter you want to update.
  
hezrone mujawo says:

January 28, 2024 at 8:02 am

Very well explained, thank you so very much.

1. Rabindra Lamsal says:
  
  May 3, 2024 at 9:50 am
  
  Thank you!
  
pradip says:

February 8, 2024 at 4:03 am

Please clarify the learning rate for me.

1. Rabindra Lamsal says:
  
  May 3, 2024 at 9:49 am
  
  Please refer to this article for understanding the significance of a learning rate.
  
  https://theneuralblog.com/gradient-descent-algorithm/
  
Mahit says:

February 12, 2024 at 5:31 am

Just beautifully explained

1. Rabindra Lamsal says:
  
  May 3, 2024 at 9:51 am
  
  Thank you!
  
Ermy says:

February 23, 2024 at 7:15 pm

This is the best article I could find in the web, and I’ve searched for just some time. Usually articles fly over the differentiation formulas for the back propagation. I think, one couldn’t get the essence of a neural network in a more concentrated and crystal clear way. Thank you so much! Please continue posting.

P.S.: … and I would also have a suggestion for another post or post completion: your example takes a single feature value as an input and outputs a single value to the hidden layer: how does it work with multiple feature inputs with a single value output for the input layer? How with a matrix input? Thanks again!

1. Rabindra Lamsal says:
  
  May 3, 2024 at 9:55 am
  
  Thank you. Glad to know that you found the article useful.
  
  I am preparing some other related articles; maybe they will start appearing next month.
  
Howard U. Dewing says:

April 18, 2024 at 8:06 am

Is that error formula of 0.5(|actual – calculated|)^2 an arbitrary decision? or is that *THE* error formula that Neural Network always uses??

1. Rabindra Lamsal says:
  
  May 3, 2024 at 9:46 am
  
  It is a loss function. There are plenty of other ones. Fundamentally, every loss function measures the difference between actual values and predicted values.
  
Prince says:

May 10, 2024 at 1:48 am

Well explained. But images are not showing. Can you just fix it?

Alican says:

May 22, 2024 at 2:47 am

Thanks for being helpful for writing some code.

Nick H says:

June 14, 2024 at 3:45 pm

Thanks for this tutorial. With the recent explosion in deep learning, almost every search result I find about neural networks immediately assumes you want to build a massive network using some complicated library. This is one of the only fully worked examples (except for calculating the biases 😉 ) that shows you what happens as the network trains.

I am building a simple C# library to enable me to build some shallow networks and this has really helped me iron out the wrinkles, by being able to trace values through the network as it trains.

Thank you!
