The learning rate is a hyperparameter that controls the size of each update to the coefficients, i.e. how far they are shifted in either direction at each step. Since it is a hyperparameter, we are free to tune its value.
m[i] = m[i-1] - α * (slope_m)
- m = coefficient value
- α = learning rate
- slope_m = slope (gradient) of the loss with respect to m
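The update rule above can be sketched in a few lines of Python. Here the loss function f(m) = (m - 3)² and its slope 2(m - 3) are illustrative assumptions, chosen only so the minimum (m = 3) is known in advance:

```python
def slope(m):
    """Slope (derivative) of the assumed loss f(m) = (m - 3)**2."""
    return 2 * (m - 3)

alpha = 0.1   # learning rate (hyperparameter)
m = 0.0       # initial coefficient value

for _ in range(100):
    m = m - alpha * slope(m)   # m[i] = m[i-1] - α * slope_m

print(round(m, 4))   # → 3.0, the minimum of the assumed loss
```

With a moderate learning rate the coefficient converges smoothly toward the minimum; the two cases below show what happens at the extremes.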
Case 1: If Learning Rate is very High
If the learning rate is set very high, each weight update is large: the weights overshoot the minimum and bounce from one value to another, growing larger and larger (much like the exploding gradient problem), and may never reach the global minimum.
Case 2: If Learning Rate is too Low
Keeping the learning rate very low makes each weight update tiny. The updates do move in the right direction, but convergence becomes extremely slow: the weights can take a huge number of iterations to reach the global minimum, and may get stuck in a shallow local minimum along the way.
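Both failure modes can be demonstrated with the same illustrative loss f(m) = (m - 3)², whose minimum is at m = 3 (the loss, learning-rate values, and step count here are assumptions chosen to make the effect visible):

```python
def slope(m):
    """Slope of the assumed loss f(m) = (m - 3)**2."""
    return 2 * (m - 3)

def run(alpha, steps=50):
    """Run `steps` gradient-descent updates from m = 0."""
    m = 0.0
    for _ in range(steps):
        m = m - alpha * slope(m)
    return m

m_high = run(alpha=1.1)    # too high: each step overshoots and the error grows
m_low = run(alpha=0.001)   # too low: barely moves toward the minimum at 3

print(abs(m_high - 3))  # huge: the weights have "exploded" away from 3
print(abs(m_low - 3))   # still far from 0: convergence is very slow
```

With α = 1.1 the error is multiplied by |1 - 2α| = 1.2 every step, so it diverges; with α = 0.001 the error shrinks by only 0.2% per step, so after 50 steps the coefficient is still far from the minimum.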
How to Find Optimal Value for Learning Rate?
Start with a relatively high learning rate; if you find the loss oscillating or growing (a sign of overshooting), decrease the learning rate and retry, repeating until training converges smoothly to an optimal value.
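One simple way to automate this "start high, shrink on overshoot" idea is to halve the learning rate whenever a step makes the loss worse. This is a hedged sketch of that heuristic, not a standard library routine; the loss f(m) = (m - 3)², the starting rate 2.0, and the halving factor are all illustrative assumptions:

```python
def loss(m):
    """Assumed loss f(m) = (m - 3)**2, minimized at m = 3."""
    return (m - 3) ** 2

def slope(m):
    """Slope of the assumed loss."""
    return 2 * (m - 3)

alpha = 2.0   # deliberately high starting learning rate
m = 0.0
for _ in range(100):
    m_new = m - alpha * slope(m)
    if loss(m_new) >= loss(m):
        alpha = alpha / 2    # step overshot (loss got no better): shrink α
    else:
        m = m_new            # step improved the loss: accept it

print(round(m, 4))   # → 3.0, reached after α has been reduced enough
```

Production frameworks offer more refined versions of the same idea, such as learning-rate schedules or reduce-on-plateau callbacks, but the core loop is just this: try a step, and if the loss gets worse, lower the learning rate.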