Adjusted R-squared is a modified version of R-squared that accounts for the number of independent variables in the model. Both statistics are used to evaluate the performance of regression models.
The R-squared statistic gives the proportion of the variation in the target variable that is explained by the linear regression model. It is also known as the coefficient of determination and is a common measure of goodness of fit.
Let’s break down the formula and understand each term individually.
Residual Sum of Squares (RSS)
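A residual is the difference between the actual output and the predicted output of the regression model for a single data point.
Residual = $y_i - \hat{y}_i$, where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for the $i$-th data point.
In the above graph, the red lines represent the residuals.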
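The sum of the squared differences between the actual output and the predicted output in the regression model is called the Residual Sum of Squares.
RSS = $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, summed over all $n$ data points, with $y_i$ the actual value and $\hat{y}_i$ the predicted value.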
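To make the definitions of the residual and the RSS concrete, here is a minimal Python sketch. The `actual` and `predicted` values are made-up toy numbers, not taken from any dataset in this article.

```python
# Hypothetical toy data: actual targets and a model's predictions.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]

# Residual for each data point: actual output minus predicted output.
residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]

# RSS: the sum of the squared residuals.
rss = sum(e ** 2 for e in residuals)

print(residuals)  # [0.5, -0.5, 0.0, 1.0]
print(rss)        # 1.5
```

Squaring matters here: the first two residuals (0.5 and -0.5) would cancel each other in a plain sum, while their squares both count toward the RSS.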
Total Sum of Squares (TSS)
The Total Sum of Squares is the total variation in the target variable. It is calculated as the sum of the squared differences between the actual values and their mean.
TSS = $\sum_{i=1}^{n} (y_i - \bar{y})^2$, where $\bar{y}$ is the mean of the actual values $y_i$.
R-Squared Formula
R-squared = (TSS - RSS) / TSS
R-squared = Explained variation / Total variation
R-squared = 1 - (Unexplained variation / Total variation)
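The formula can be sketched in a few lines of Python. The function name `r_squared` and the toy data are illustrative assumptions, and the sketch assumes the target is not constant (otherwise TSS is zero and the ratio is undefined).

```python
def r_squared(actual, predicted):
    """Return R-squared = 1 - RSS / TSS for paired sequences of numbers."""
    mean_actual = sum(actual) / len(actual)
    # Unexplained variation: squared distance from each prediction.
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    # Total variation: squared distance from the mean of the actual values.
    tss = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - rss / tss


# Hypothetical toy data.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]

r2 = r_squared(actual, predicted)
print(round(r2, 3))  # 0.925
```

For this toy data RSS is 1.5 and TSS is 20, so the model explains 92.5% of the variation in the target.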
Problem with R-Squared
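When a new independent variable is added to a least-squares regression model, R-squared never decreases, and it almost always increases, even when the new variable adds no real value to the model. The fit can always reproduce the previous model by giving the new variable a coefficient of zero, so any change can only lower the RSS. As a result, R-squared alone cannot tell us whether an extra variable is worth including.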
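This behaviour can be checked with a small experiment. The sketch below is illustrative only: it uses synthetic data, a hand-written least-squares fit via the normal equations (a real project would more likely use a statistics library), and helper names (`solve`, `ols_r_squared`) that are assumptions made for this example.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols_r_squared(features, y):
    """Fit y on the feature columns (plus an intercept) and return R-squared."""
    rows = [[1.0] + list(vals) for vals in zip(*features)]
    p = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    predicted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    rss = sum((t - t_hat) ** 2 for t, t_hat in zip(y, predicted))
    tss = sum((t - mean_y) ** 2 for t in y)
    return 1 - rss / tss


random.seed(0)
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
noise_feature = [random.gauss(0, 1) for _ in range(n)]  # unrelated to y
y = [2.0 * xi + 1.0 + random.gauss(0, 2) for xi in x]

r2_one = ols_r_squared([x], y)                 # model with the real feature only
r2_two = ols_r_squared([x, noise_feature], y)  # same model plus pure noise
print(r2_one, r2_two)
```

Even though `noise_feature` carries no information about `y`, the second R-squared should come out at least as large as the first, up to floating-point rounding.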
Adjusted R-Squared
Adjusted R-squared addresses this problem by including the number of independent variables in the formula. It increases only when a new variable improves the fit by more than would be expected by chance, which helps us judge whether including the new feature adds value to the model.
Adjusted R-Squared = $1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$
where
- $n$ = number of data points in our dataset.
- $k$ = number of independent variables.
- $R^2$ = R-squared value of the model.
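As a rough illustration, the standard adjusted R-squared formula, 1 - (1 - R²)(n - 1) / (n - k - 1), can be written as a small Python function. The function name and the example numbers are hypothetical and chosen only to show the penalty at work.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared from R-squared r2, n data points, and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more than k + 1 data points")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical numbers: R-squared creeps up from 0.900 to 0.901 after a
# second variable is added to a model fitted on 50 data points.
adj_before = adjusted_r_squared(0.900, n=50, k=1)
adj_after = adjusted_r_squared(0.901, n=50, k=2)

print(round(adj_before, 4))  # 0.8979
print(round(adj_after, 4))   # 0.8968
```

Here R-squared rises slightly, but adjusted R-squared falls, which suggests that the second variable does not improve the model enough to justify the extra term.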