Algorithm for Gradient Descent
Steps should be taken in proportion to the negative of the function's gradient at the current point (i.e., moving in the direction opposite to the gradient) to find a local minimum. Gradient ascent is the analogous procedure for approaching a local maximum of a function: steps are taken in proportion to the positive of the gradient (moving in the direction of the gradient).
repeat until convergence {
    w = w - (learning_rate * (dJ/dw))
    b = b - (learning_rate * (dJ/db))
}
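The update rule above can be sketched in Python for a simple linear model. The cost function J, its partial derivatives, and the sample data below are illustrative assumptions, not part of the article's later implementation:

```python
# One gradient-descent step for a linear model y_hat = w*x + b with
# mean-squared-error cost J = (1/n) * sum((y_hat - y)^2).
# Data and starting values are illustrative assumptions.

def gradient_step(w, b, x, y, learning_rate):
    n = len(x)
    # Partial derivatives of J with respect to w and b
    dJ_dw = (2 / n) * sum((w * xi + b - yi) * xi for xi, yi in zip(x, y))
    dJ_db = (2 / n) * sum((w * xi + b - yi) for xi, yi in zip(x, y))
    # Move against the gradient, scaled by the learning rate
    w = w - learning_rate * dJ_dw
    b = b - learning_rate * dJ_db
    return w, b

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]   # perfectly fit by w = 2, b = 0
w, b = 0.0, 0.0
for _ in range(1000):
    w, b = gradient_step(w, b, x, y, learning_rate=0.05)
```

Repeating the step drives w towards 2 and b towards 0, the parameters of the line that fits the data.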
Step 1: Initialize all the necessary parameters and derive the gradient function for the parabolic equation y = 4x². Since the derivative of x² is 2x, the derivative of 4x² is 8x.
x0 = 3 (random initialization of x)
learning_rate = 0.01 (to determine the step size while moving towards the local minimum)
gradient = 8x (the gradient function derived in Step 1)
Step 2: Let us perform 3 iterations of gradient descent:
In each iteration, update the value of x using the gradient descent formula.
Iteration 1:
x1 = x0 - (learning_rate * gradient)
x1 = 3 - (0.01 * (8 * 3))
x1 = 3 - 0.24
x1 = 2.76

Iteration 2:
x2 = x1 - (learning_rate * gradient)
x2 = 2.76 - (0.01 * (8 * 2.76))
x2 = 2.76 - 0.2208
x2 = 2.5392

Iteration 3:
x3 = x2 - (learning_rate * gradient)
x3 = 2.5392 - (0.01 * (8 * 2.5392))
x3 = 2.5392 - 0.203136
x3 = 2.336064
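The three hand-computed iterations can be checked with a short script, using the x0, learning rate, and gradient 8x from the steps above:

```python
x = 3.0                       # x0, the random initialization
learning_rate = 0.01
gradient = lambda x: 8 * x    # derivative of y = 4x^2

history = []
for _ in range(3):
    # Gradient descent update: step against the gradient
    x = x - learning_rate * gradient(x)
    history.append(round(x, 6))

print(history)   # [2.76, 2.5392, 2.336064]
```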
From these three iterations of gradient descent, we can see that the value of x decreases iteration by iteration and will slowly converge to 0 (the local minimum) if we run gradient descent for more iterations. This raises a question: for how many iterations should we run gradient descent?
We can set a stopping threshold: when the difference between the previous and current values of x becomes smaller than this threshold, we stop iterating. When implementing gradient descent for machine learning and deep learning algorithms, the quantity we minimize is the algorithm's cost function. Now that we are clear on gradient descent's internal working, let us look at the Python implementation of gradient descent, where we will minimize the cost function of the linear regression algorithm and find the best-fit line. In our case, the parameters are mentioned below:
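The stopping threshold can be added to the same loop from the example above; the threshold value 1e-6 below is an arbitrary choice for illustration:

```python
x = 3.0
learning_rate = 0.01
stopping_threshold = 1e-6     # arbitrary illustrative value
gradient = lambda x: 8 * x    # derivative of y = 4x^2

iterations = 0
while True:
    x_new = x - learning_rate * gradient(x)
    # Stop when successive values of x barely change
    if abs(x_new - x) < stopping_threshold:
        break
    x = x_new
    iterations += 1

print(x)   # very close to 0, the local minimum
```

With these settings the loop stops after roughly 150 iterations, with x within about 1e-5 of the minimum at 0.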
How to implement gradient descent in Python to find a local minimum?
Gradient Descent is an iterative algorithm used to minimize a function by finding the optimal parameters. It can be applied to functions of any dimension: 1-D, 2-D, 3-D, and so on. In this article, we will find the minimum of a parabolic function and implement gradient descent in Python to find the optimal parameters for the linear regression equation. Before diving into the implementation, let us go over the set of parameters required to implement the gradient descent algorithm: a cost function to be minimized, the number of iterations, a learning rate that determines the step size at each iteration while moving towards the minimum, partial derivatives with respect to weight and bias to update the parameters at each iteration, and a prediction function.
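The parameters listed above can be collected into a small generic driver. The function name, its signature, and the example gradient are illustrative assumptions, not a fixed API:

```python
# A generic gradient-descent driver built from the pieces listed above:
# a gradient function, a starting point, a learning rate, and an
# iteration count. Names and defaults are illustrative assumptions.

def gradient_descent(gradient, start, learning_rate=0.01, n_iterations=100):
    """Minimize a 1-D function given its gradient."""
    x = start
    for _ in range(n_iterations):
        # Step against the gradient, scaled by the learning rate
        x = x - learning_rate * gradient(x)
    return x

# Example: minimize y = x^2 (gradient 2x); the minimum is at x = 0
minimum = gradient_descent(lambda x: 2 * x, start=5.0, n_iterations=500)
```

Any differentiable 1-D function can be plugged in by supplying its gradient; the returned value approaches the local minimum as the iteration count grows.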
So far we have seen the parameters required for gradient descent. Now let us map these parameters onto the gradient descent algorithm and work through an example to understand it better. Consider the parabolic equation y = 4x². By inspecting the equation, we can see that the function is minimized at x = 0, i.e., at x = 0, y = 0. Therefore x = 0 is the local minimum of the parabolic function y = 4x². Now let us see the algorithm for gradient descent and how we can obtain the local minimum by applying it: