The sigmoid function is a non-linear function commonly used as an activation function in neural networks. A unit of a neural network performs two operations: first it computes the weighted sum of its inputs, and then it passes that sum through an activation function, which squeezes the sum into a fixed range such as (-1, 1) or (0, 1). Based on the value the activation function returns, the unit is treated as active or inactive.
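As a rough illustration of those two operations, here is a minimal Python sketch of a single unit. The names `sigmoid` and `unit_output`, as well as the sample inputs and weights, are made up for this example, and the unit is assumed to have no bias term.

```python
import math

def sigmoid(x):
    # Logistic sigmoid: maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights):
    # Operation 1: weighted sum of the inputs (no bias term, by assumption)
    total = sum(w * i for w, i in zip(weights, inputs))
    # Operation 2: pass the sum through the activation function
    return sigmoid(total)

# Illustrative values only: 0.5*1.0 + (-0.25)*2.0 = 0.0
out = unit_output([1.0, 2.0], [0.5, -0.25])
print(out)  # 0.5, because sigmoid(0) = 1 / (1 + 1)
```

A weighted sum of zero lands exactly in the middle of the sigmoid's range; large positive sums push the output towards 1 and large negative sums push it towards 0.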
When computing the partial derivative of the output with respect to the sum, we have been using the following expression (when the activation function is the sigmoid):
\frac{\partial output_{o1}}{\partial sum_{o1}} = output_{o1} (1 - output_{o1})
How is the above expression derived?
For this, we must differentiate the Sigmoid Function. We know the Sigmoid Function is written as,
\sigma(x) = \frac{1}{1+e^{-x}}
Differentiating both sides with respect to x, we get,
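\frac{\mathrm{d}}{\mathrm{d}x}\sigma(x) = \frac{\mathrm{d}}{\mathrm{d}x}\left(\frac{1}{1+e^{-x}}\right) = \frac{\mathrm{d}}{\mathrm{d}x}\left(1+e^{-x}\right)^{-1}
Let’s apply the derivative, using the power rule on the outer exponent and the chain rule on the inner term 1+e^{-x}, whose derivative is -e^{-x}.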
= -1 \cdot \left(1+e^{-x}\right)^{-1-1} \cdot \left(-e^{-x}\right) = -\left(1+e^{-x}\right)^{-2} \cdot \left(-e^{-x}\right)
Simplifying,
= \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}}
Adding and subtracting 1 in the numerator of the second factor,
= \frac{1}{1+e^{-x}} \cdot \frac{1+e^{-x}-1}{1+e^{-x}}
= \frac{1}{1+e^{-x}} \cdot \left(\frac{1+e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}}\right)
= \frac{1}{1+e^{-x}} \cdot \left(1 - \frac{1}{1+e^{-x}}\right)
Substituting \frac{1}{1+e^{-x}} = \sigma(x) in the above equation, we get,
\frac{\mathrm{d}}{\mathrm{d}x}\sigma(x) = \sigma(x)(1-\sigma(x))
Therefore, the derivative of the sigmoid function is the sigmoid function itself multiplied by (1 - the sigmoid function). Since output_{o1} = \sigma(sum_{o1}), this is exactly the expression output_{o1} (1 - output_{o1}) that we started with. Quite elegant, isn’t it?
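If you would like to convince yourself of the result numerically, one option is to compare the closed form against a finite-difference estimate of the slope. The sketch below is only a sanity check under a few assumptions: the helper names (`sigmoid_derivative`, `central_difference`), the step size `h`, and the sample points are arbitrary choices for this example.

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    # Closed form derived above: sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def central_difference(f, x, h=1e-5):
    # Numerical slope estimate: (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Arbitrary sample points; the two columns should agree closely
for x in (-2.0, 0.0, 0.5, 3.0):
    analytic = sigmoid_derivative(x)
    numeric = central_difference(sigmoid, x)
    print(f"x={x:+.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```

The closed form is also cheap to evaluate: once the sigmoid output of a unit is known, its derivative needs only one subtraction and one multiplication, with no further call to the exponential.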
Thanks for reading this article. I will see you in the next one.