The adjusted R squared (or adjusted coefficient of determination) is used in multiple regression to measure how well the independent variables explain the dependent variable.
In simpler words, the adjusted R squared tells us what percentage of the variation in the dependent variable is collectively explained by all the independent variables, once the number of variables in the model is taken into account.
The use of this coefficient is justified because, as we add variables to a regression, the unadjusted coefficient of determination tends to increase, even when the marginal contribution of each newly added variable is not statistically relevant.
Therefore, when adding variables to the model, the coefficient of determination could increase and we could think, erroneously, that the chosen set of variables is capable of explaining a greater part of the variation of the dependent variable. This problem is commonly known as “model overestimation”.
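The effect described above can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions: the data are randomly generated, the helper names `solve` and `r_squared` are hypothetical, and every predictor added after the first is pure noise by construction, so it has no real explanatory power.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (m[i][size] - known) / m[i][i]
    return x

def r_squared(columns, y):
    """Fit ordinary least squares with an intercept and return the unadjusted R^2."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[a] * row[b] for row in rows) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

random.seed(0)
n = 30
x1 = [random.gauss(0, 1) for _ in range(n)]       # the only relevant predictor
y = [2.0 * xi + random.gauss(0, 1) for xi in x1]  # assumed data-generating process

columns = [x1]
results = []
for k in range(1, 6):
    results.append((k, r_squared(columns, y)))
    columns.append([random.gauss(0, 1) for _ in range(n)])  # add an irrelevant noise predictor

for k, r2 in results:
    print(f"k={k}  R2={r2:.4f}")
```

Because each model contains all the variables of the previous one, the printed R squared should never go down from one row to the next, even though only the first variable is related to the dependent variable.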
Adjusted coefficient of determination formula
To solve the problem described above, many researchers suggest adjusting the coefficient of determination using the following formula:
$$R_a^2 = 1 - \left(\frac{n-1}{n-k-1}\right)\left(1 - R^2\right)$$
$R_a^2$ → Adjusted R squared or adjusted coefficient of determination
$R^2$ → R squared or coefficient of determination
$n$ → Number of observations in the sample
$k$ → Number of independent variables
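As a worked illustration of the standard adjustment, 1 - (n - 1)/(n - k - 1) × (1 - R squared), here is a minimal Python sketch. The function name `adjusted_r_squared` and the input values (an R squared of 0.80, 50 observations, 5 independent variables) are hypothetical and chosen only for the example.

```python
def adjusted_r_squared(r2: float, n: int, k: int) -> float:
    """Return 1 - (n - 1) / (n - k - 1) * (1 - r2).

    r2: unadjusted coefficient of determination
    n:  number of observations in the sample
    k:  number of independent variables
    """
    if n - k - 1 <= 0:
        raise ValueError("the adjustment requires n > k + 1")
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

# Hypothetical inputs: R^2 = 0.80, n = 50 observations, k = 5 independent variables.
value = adjusted_r_squared(0.80, n=50, k=5)
print(round(value, 4))  # prints 0.7773
```

With these numbers the quotient is 49/44, so the unexplained share 0.20 is scaled up to about 0.2227 and the adjusted value falls slightly below the original 0.80.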
If we hold $1 - R^2$ fixed and take into account that n is greater than k + 1, then as we add variables to the model the quotient in parentheses, (n - 1)/(n - k - 1), becomes larger. Consequently, so does the result of multiplying it by $1 - R^2$. We can therefore see that the formula is built to adjust for and penalize the inclusion of additional variables in the model.
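The growth of the penalty can be seen numerically. The sketch below assumes a hypothetical sample of 30 observations and deliberately holds R squared fixed at 0.60, which would not happen in a real regression (there R squared usually rises as variables are added); the point is only to isolate the effect of the quotient (n - 1)/(n - k - 1).

```python
n = 30      # hypothetical number of observations
r2 = 0.60   # held fixed on purpose to isolate the penalty

rows = []
for k in (1, 5, 10, 15):
    quotient = (n - 1) / (n - k - 1)         # grows as k grows
    adjusted = 1.0 - quotient * (1.0 - r2)   # shrinks as the quotient grows
    rows.append((k, quotient, adjusted))

for k, quotient, adjusted in rows:
    print(f"k={k:2d}  quotient={quotient:.4f}  adjusted R2={adjusted:.4f}")
```

Each additional variable raises the quotient and therefore lowers the adjusted value, unless the variable increases R squared by enough to compensate.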
In addition to the previous advantage, the adjustment also allows us to compare models with different numbers of independent variables. Again, the formula adjusts for the number of variables in each model and so allows us to make a homogeneous comparison.
Returning to the previous formula, we can deduce that the adjusted coefficient of determination will always be equal to or less than $R^2$. Unlike the coefficient of determination, which varies between 0 and 1, the adjusted coefficient of determination can be negative, and this becomes more likely in two situations:
- The closer k gets to n.
- The lower the coefficient of determination.
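Both conditions can be combined into a single check. Rearranging the usual adjustment, 1 - (n - 1)/(n - k - 1) × (1 - R squared), shows that the adjusted value is negative exactly when R squared is below k/(n - 1). The sketch below verifies this on a few hypothetical cases; the function names and the sample size of 21 are assumptions made for the illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Standard adjustment: 1 - (n - 1) / (n - k - 1) * (1 - r2)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

def negativity_threshold(n, k):
    """R^2 value below which the adjusted R^2 becomes negative: k / (n - 1)."""
    return k / (n - 1)

n = 21  # hypothetical sample size
cases = [(0.10, 1), (0.10, 5), (0.40, 5), (0.40, 15)]  # (R^2, k) pairs

checks = []
for r2, k in cases:
    adjusted = adjusted_r_squared(r2, n, k)
    threshold = negativity_threshold(n, k)
    checks.append((r2, k, threshold, adjusted))
    print(f"R2={r2:.2f}  k={k:2d}  threshold={threshold:.2f}  adjusted R2={adjusted:.3f}")
```

For the same number of variables, the lower R squared is the one that turns negative, and for the same R squared, the model with k closer to n is the one that turns negative.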