Understanding Activation Functions in Depth

What is an Activation Function?
An activation function defines the output of a node given its inputs: the node computes a weighted sum of its inputs and passes that sum through the activation function.

Example of a binary classification problem:
Each of the m input features is multiplied by a weight (w1, w2, …, wm) and the products are summed together, so that node output = activation(weighted sum of inputs):

(1)   output = activation(w1*x1 + w2*x2 + … + wm*xm)
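A minimal sketch of this node computation, assuming NumPy and a sigmoid activation (both illustrative choices, not fixed by the text):

```python
import numpy as np

def sigmoid(z):
    # Squashes the weighted sum into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

def node_output(x, w):
    # Weighted sum of the inputs, then the activation function.
    z = np.dot(w, x)          # w1*x1 + w2*x2 + ... + wm*xm
    return sigmoid(z)

# Example: three features and three weights.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.3, -0.2])
print(node_output(x, w))
```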

Some important terminology and mathematical concepts –
  • Forward propagation is the procedure of passing the inputs through the network, layer by layer, to compute the actual output.
  • Hidden Layers are layers of neuron nodes stacked between the inputs and outputs, allowing neural networks to learn more complicated features (such as XOR logic).
  • Backpropagation is a procedure to repeatedly adjust the weights so as to minimize the difference between the actual output and the desired output. It lets information flow backward from the cost through the network in order to compute the gradient: loop over the nodes starting from the final node, in reverse topological order, computing the derivative of the final output with respect to each node. Doing so tells us which parameters are responsible for the most error, so we can change them appropriately in that direction.
  • Gradient Descent is an optimization algorithm used while training a machine learning model. Based on a convex function, it tweaks the parameters iteratively to minimize a given function to its local minimum. A gradient measures how much the output of a function changes if you change the inputs a little bit. Note: if gradient descent is working properly, the cost function should decrease after every iteration (see the sketch after this list).
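A minimal sketch of gradient descent on a single parameter; the quadratic cost and the learning rate below are illustrative assumptions, not from the article:

```python
def cost(w):
    # Illustrative convex cost with its minimum at w = 3.
    return (w - 3.0) ** 2

def gradient(w):
    # Derivative of the cost with respect to the parameter.
    return 2.0 * (w - 3.0)

w = 0.0                  # initial parameter
learning_rate = 0.1
for step in range(25):
    w -= learning_rate * gradient(w)   # move against the gradient
    # If gradient descent is working, cost(w) decreases every iteration.
print(w, cost(w))        # w approaches 3, cost approaches 0
```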

Types of Activation Functions:

1. Linear Activation Function –
Equation: f(x) = x
Range: (-infinity, infinity)
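A tiny illustrative sketch of the linear (identity) activation and its constant derivative; the function names are my own:

```python
def linear(x):
    # Identity activation: output equals input, range (-inf, inf).
    return x

def linear_derivative(x):
    # The slope is constant (1), independent of the input.
    return 1.0

print(linear(-2.5), linear(4.0))   # -2.5 4.0
```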
2. Non-linear Activation Functions –
Two terms are useful when comparing them:
1. Derivative (or differential): the change in the y-axis with respect to a change in the x-axis, also known as the slope.
2. Monotonic function: a function which is either entirely non-increasing or entirely non-decreasing.
The non-linear activation functions are mainly divided on the basis of their range or curves, as follows:
Let's take a deeper look at each activation function.
1. Sigmoid:
Equation: f(x) = 1 / (1 + e^(-x))
Range: (0, 1)
It is widely used as a binary classifier and is also known as the logistic activation function.
Derivation of the sigmoid's derivative:
f'(x) = e^(-x) / (1 + e^(-x))^2 = f(x) * (1 - f(x))
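A minimal NumPy sketch of the sigmoid and its derivative, with a finite-difference check; the helper names are my own:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # f(x) * (1 - f(x))

x = np.linspace(-5, 5, 11)
print(sigmoid(x))          # values squashed into (0, 1)

# Check the closed-form derivative against a central finite difference.
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(np.allclose(numeric, sigmoid_derivative(x)))   # True
```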

2. ReLU (Rectified Linear Unit):
f(x) = max(0, x)
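A short illustrative sketch of ReLU and its derivative; taking the derivative to be 0 at x = 0 is my own convention:

```python
import numpy as np

def relu(x):
    # Passes positive values through, clamps negatives to zero.
    return np.maximum(0.0, x)

def relu_derivative(x):
    # 1 for positive inputs, 0 otherwise (0 chosen at x = 0).
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))              # [0.  0.  0.  0.5 2. ]
print(relu_derivative(x))   # [0. 0. 0. 1. 1.]
```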
3. Leaky ReLU:

(2)   f(x) = max(0.01x, x), i.e. f(x) = x for x >= 0 and f(x) = 0.01x for x < 0
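An illustrative sketch of Leaky ReLU with the usual 0.01 negative slope; the parameter name alpha is my own:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Like ReLU, but negative inputs keep a small slope instead of
    # being zeroed out, so their gradient is not exactly zero.
    return np.where(x >= 0, x, alpha * x)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(leaky_relu(x))   # [-0.03  -0.005  0.     0.5    3.   ]
```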

4. Tanh or hyperbolic tangent:
Equation: tanh(x) = 2 * sigmoid(2x) - 1
Range: (-1, 1)
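A small sketch that checks this identity numerically (purely illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
lhs = np.tanh(x)
rhs = 2 * sigmoid(2 * x) - 1   # tanh(x) = 2 * sigmoid(2x) - 1
print(np.allclose(lhs, rhs))   # True
print(lhs)                     # values lie in (-1, 1) and are zero-centred
```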
5. Softmax:
Equation: softmax(z)_j = e^(z_j) / (e^(z_1) + e^(z_2) + … + e^(z_K))
where z is the vector of inputs to the output layer (if you have 10 output units, then there are 10 elements in z), and j indexes the output units, so j = 1, 2, …, K.
Properties of Softmax Function –
1. The calculated probabilities lie in the range 0 to 1.
2. The sum of all the probabilities equals 1.
Softmax Function Usage –
1. Used in multi-class classification, e.g. in a multinomial logistic regression model.
2. In neural networks it is typically used in the output layer, turning raw scores into class probabilities.
Example:

(3)   For z = [1, 2, 3]: softmax(z) = [e^1, e^2, e^3] / (e^1 + e^2 + e^3) ≈ [0.090, 0.245, 0.665]

The outputs all lie between 0 and 1 and sum to 1, as the properties above require.
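A minimal NumPy sketch of softmax that reproduces the example above; subtracting the maximum before exponentiating is a standard numerical-stability trick and my own addition:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is unchanged.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
p = softmax(z)
print(p)           # approximately [0.090 0.245 0.665]
print(p.sum())     # 1.0 -- the outputs form a probability distribution
```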


