(Tenth in a series)
Today we resume our discussion of multiple regression analysis. Last week, we built a model to determine the extent of any relationship between U.S. savings & loan associations’ percent profit margin and two independent variables, net revenues per deposit dollar and number of S&L offices. Today, we will compute the 95% confidence interval for each parameter estimate; determine whether the model is valid; check for autocorrelation; and use the model to forecast. Recall that our resulting model was:
Y_{t} = 1.56450 + 0.23720X_{1t} – 0.000249X_{2t}
Where Y_{t} is the percent profit margin for the S&L in Year t; X_{1t} is the net revenues per deposit dollar in Year t; and X_{2t} is the number of S&L offices in the U.S. in Year t. Recall that the R^{2} is .865, indicating that 86.5% of the change in percentage profit margin is explained by changes in net revenues per deposit dollar and number of S&L offices.
Determining the 95% Confidence Interval for the Partial Slope Coefficients
In multiple regression analysis, because there are multiple independent variables, the parameter estimates for the independent variables jointly determine the slope of the regression surface; hence the coefficients β_{1} and β_{2} are referred to as partial slope coefficients. As with simple linear regression, we need to determine the 95% confidence interval for each parameter estimate, so that we can get an idea of where the true population parameter lies. Recall from our June 3 post that we did this by determining the standard error of the estimate, s_{ε}, and then the standard error of the regression slope, s_{b}. That worked well for simple regression, but multiple regression is more complicated: deriving the standard errors of the partial regression coefficients requires linear algebra, and would be too complicated to discuss here. Fortunately, most statistical programs, as well as Excel, compute these values for us. So we will simply state the values of s_{b1} and s_{b2} and go from there.
S_{b1}=0.05556
S_{b2}=0.00003
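Although the linear algebra is beyond the scope of this post, the mechanics can be sketched in a few lines. The function below is a minimal illustration of how a statistical package derives these standard errors from the diagonal of s²(X′X)⁻¹; the raw S&L data are not reproduced here, so any dataset you pass in is your own.

```python
import numpy as np

def partial_slope_std_errors(X, y):
    """Return OLS coefficients (intercept first) and their standard errors.

    X: (n, k) matrix of independent variables (no intercept column)
    y: (n,) vector of the dependent variable
    """
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])       # prepend intercept column
    b, *_ = np.linalg.lstsq(A, y, rcond=None)  # OLS parameter estimates
    residuals = y - A @ b
    s2 = residuals @ residuals / (n - k - 1)   # error variance: ESS / df
    cov = s2 * np.linalg.inv(A.T @ A)          # covariance of the estimates
    return b, np.sqrt(np.diag(cov))            # standard errors on diagonal
```

With the actual S&L data, this computation is what produces the s_{b1} and s_{b2} values stated above.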
Also, we need our critical t-value for 22 degrees of freedom (n − k − 1 = 25 − 2 − 1), which is 2.074.
Hence, our 95% confidence interval for β_{1} is denoted as:
0.23720 ± 2.074 × 0.05556
=0.12197 to 0.35243
Hence, we are saying that we can be 95% confident that the true parameter β_{1} lies somewhere between the values of 0.12197 and 0.35243.
The procedure for β_{2} is similar, remembering that its estimate is negative:
−0.000249 ± 2.074 × 0.00003
= −0.00031 to −0.00019
Hence, we can be 95% confident that the true parameter β_{2} lies somewhere between the values of −0.00031 and −0.00019. Also, the confidence interval for the intercept, α, ranges from 1.40 to 1.73.
Note that in all of these cases, the confidence interval does not contain zero within its range. The confidence intervals for α and β_{1} are entirely positive; that for β_{2} is entirely negative. If any parameter's confidence interval crossed zero, that parameter estimate would not be statistically significant.
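As a quick sketch, the interval arithmetic above can be reproduced in Python; the 2.074 critical t-value is the one stated earlier for 22 degrees of freedom.

```python
def confidence_interval(estimate, std_error, t_crit=2.074):
    """95% confidence interval: estimate ± t_crit × standard error.

    t_crit = 2.074 is the critical t-value for 22 degrees of freedom.
    """
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

# Interval for beta_1: roughly 0.12197 to 0.35243, as computed above
b1_low, b1_high = confidence_interval(0.23720, 0.05556)

# Interval for beta_2: entirely negative, so zero is excluded
b2_low, b2_high = confidence_interval(-0.000249, 0.00003)
```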
Is Our Model Valid?
The next thing we want to do is determine if our model is valid. When validating our model we are trying to prove that our independent variables explain the variation in the dependent variable. So we start with a hypothesis test:
H_{0}: β_{1} = β_{2} = 0
H_{A}: at least one β ≠ 0
Our null hypothesis says that our independent variables, net revenue per deposit dollar and number of S&L offices, explain none of the variation in an S&L's percentage profit margin, and hence that our model is not valid. Our alternative hypothesis says that at least one of our independent variables explains some of the variation in an S&L's percentage profit margin, and thus that the model is valid.
So how do we do it? Enter the F-test. Like the t-test, the F-test is a means of hypothesis testing. Let's start by calculating the F-statistic for our model, using the following equation:

F = (RSS / k) / (ESS / (n − k − 1))

Remember that RSS is the regression sum of squares and ESS is the error sum of squares. The May 27th Forecast Friday post showed you how to calculate RSS and ESS. For this model, our RSS = 0.4015 and our ESS = 0.0625; k is the number of independent variables (2), and n is the sample size (25). Our equation reduces to:

F = (0.4015 / 2) / (0.0625 / 22)
= 70.66
If our F_{calc} is greater than the critical F-value for the distribution, then we can reject our null hypothesis and conclude that there is strong evidence that at least one of our independent variables explains some of the variation in an S&L's percentage profit margin. How do we determine our critical F? There is yet another table in any statistics book or statistics Web site, called the "F Distribution" table. In it, you look up two sets of degrees of freedom – one for the numerator and one for the denominator of your F_{calc} equation. In the numerator, we have 2 degrees of freedom; in the denominator, 22. In the F Distribution table, the columns represent numerator degrees of freedom and the rows denominator degrees of freedom. When we find column 2, row 22 (at the 1% significance level), we end up with a critical F-value of 5.72.
Our F_{calc} is greater than that, so we can conclude that our model is valid.
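The F computation above reduces to a one-line function; the values plugged in are the RSS, ESS, n, and k stated for this model.

```python
def f_statistic(rss, ess, n, k):
    """F = (RSS / k) / (ESS / (n - k - 1))."""
    return (rss / k) / (ess / (n - k - 1))

# Values from the model: RSS = 0.4015, ESS = 0.0625, n = 25, k = 2
f_calc = f_statistic(0.4015, 0.0625, 25, 2)  # about 70.66
```

Since 70.66 far exceeds the critical value of 5.72, the null hypothesis is rejected.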
Is Our Model Free of Autocorrelation?
Recall from our assumptions that none of our error terms should be correlated with one another. If they are, autocorrelation results, rendering our parameter estimates inefficient. To check for autocorrelation, we need to look at our error terms, which we obtain by comparing our predicted percentage profit margin, Ŷ, with the actual, Y (parentheses denote negative values):
| Year | Actual (Y_t) | Predicted (Ŷ_t) | Error |
|------|--------------|------------------|----------|
| 1 | 0.75 | 0.68 | (0.0735) |
| 2 | 0.71 | 0.71 | 0.0033 |
| 3 | 0.66 | 0.70 | 0.0391 |
| 4 | 0.61 | 0.67 | 0.0622 |
| 5 | 0.70 | 0.68 | (0.0162) |
| 6 | 0.72 | 0.71 | (0.0124) |
| 7 | 0.77 | 0.74 | (0.0302) |
| 8 | 0.74 | 0.76 | 0.0186 |
| 9 | 0.90 | 0.79 | (0.1057) |
| 10 | 0.82 | 0.79 | (0.0264) |
| 11 | 0.75 | 0.80 | 0.0484 |
| 12 | 0.77 | 0.83 | 0.0573 |
| 13 | 0.78 | 0.80 | 0.0222 |
| 14 | 0.84 | 0.80 | (0.0408) |
| 15 | 0.79 | 0.75 | (0.0356) |
| 16 | 0.70 | 0.73 | 0.0340 |
| 17 | 0.68 | 0.70 | 0.0249 |
| 18 | 0.72 | 0.69 | (0.0270) |
| 19 | 0.55 | 0.64 | 0.0851 |
| 20 | 0.63 | 0.61 | (0.0173) |
| 21 | 0.56 | 0.57 | 0.0101 |
| 22 | 0.41 | 0.48 | 0.0696 |
| 23 | 0.51 | 0.44 | (0.0725) |
| 24 | 0.47 | 0.40 | (0.0746) |
| 25 | 0.32 | 0.38 | 0.0574 |
The next thing we need to do is subtract the previous period's error from the current period's error, and then square the result. Note that we will have only 24 differences (there is nothing to subtract from the first observation):
| Year | Error | Difference in Errors | Squared Difference in Errors |
|------|-----------|----------------------|------------------------------|
| 1 | (0.07347) | | |
| 2 | 0.00334 | 0.07681 | 0.00590 |
| 3 | 0.03910 | 0.03576 | 0.00128 |
| 4 | 0.06218 | 0.02308 | 0.00053 |
| 5 | (0.01624) | (0.07842) | 0.00615 |
| 6 | (0.01242) | 0.00382 | 0.00001 |
| 7 | (0.03024) | (0.01781) | 0.00032 |
| 8 | 0.01860 | 0.04883 | 0.00238 |
| 9 | (0.10569) | (0.12429) | 0.01545 |
| 10 | (0.02644) | 0.07925 | 0.00628 |
| 11 | 0.04843 | 0.07487 | 0.00561 |
| 12 | 0.05728 | 0.00884 | 0.00008 |
| 13 | 0.02217 | (0.03511) | 0.00123 |
| 14 | (0.04075) | (0.06292) | 0.00396 |
| 15 | (0.03557) | 0.00519 | 0.00003 |
| 16 | 0.03397 | 0.06954 | 0.00484 |
| 17 | 0.02489 | (0.00909) | 0.00008 |
| 18 | (0.02697) | (0.05185) | 0.00269 |
| 19 | 0.08509 | 0.11206 | 0.01256 |
| 20 | (0.01728) | (0.10237) | 0.01048 |
| 21 | 0.01012 | 0.02740 | 0.00075 |
| 22 | 0.06964 | 0.05952 | 0.00354 |
| 23 | (0.07252) | (0.14216) | 0.02021 |
| 24 | (0.07460) | (0.00208) | 0.00000 |
| 25 | 0.05738 | 0.13198 | 0.01742 |
If we sum up the last column, we get 0.1218. If we then divide that by our ESS of 0.0625, we get a value of 1.95. What does this mean?
We have just computed what is known as the Durbin-Watson statistic, which is used to detect the presence of autocorrelation. The Durbin-Watson statistic, d, can be anywhere from zero to 4. Generally, a value of d close to zero suggests the presence of positive autocorrelation; a value close to 2 indicates no autocorrelation; and a value close to 4 indicates negative autocorrelation. In any case, you want your Durbin-Watson statistic to be as close to 2 as possible, and ours is.
Hence, our model seems to be free of autocorrelation.
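The Durbin-Watson computation above can be sketched directly in Python. The error values below are taken from the table (parenthesized entries entered as negatives):

```python
# Errors from the residuals table above
errors = [
    -0.07347, 0.00334, 0.03910, 0.06218, -0.01624,
    -0.01242, -0.03024, 0.01860, -0.10569, -0.02644,
    0.04843, 0.05728, 0.02217, -0.04075, -0.03557,
    0.03397, 0.02489, -0.02697, 0.08509, -0.01728,
    0.01012, 0.06964, -0.07252, -0.07460, 0.05738,
]

def durbin_watson(e):
    """d = sum of squared successive error differences / sum of squared errors."""
    numerator = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    denominator = sum(x ** 2 for x in e)
    return numerator / denominator

d = durbin_watson(errors)  # roughly 1.95, matching the calculation above
```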
Now, Let’s Go Forecast!
Now that we have validated our model and seen that it is free of autocorrelation, we can be comfortable forecasting. Let's say that for years 26 and 27 we have the following forecasts for net revenues per deposit dollar, X_{1t}, and number of S&L offices, X_{2t}:
X_{1,26} = 4.70 and X_{2,26} = 9,350
X_{1,27} = 4.80 and X_{2,27} = 9,400
Plugging each of these into our equation, we generate the following forecasts:
Ŷ_{26} = 1.56450 + 0.23720 * 4.70 – 0.000249 * 9,350
= 0.3512
Ŷ_{27} = 1.56450 + 0.23720 * 4.80 – 0.000249 * 9,400
= 0.3625
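As a sketch, the fitted equation can be wrapped in a small function. Note that it uses the rounded coefficients printed in this post, so results computed at full precision in a statistical package may differ slightly in the last decimal place.

```python
def forecast_profit_margin(x1, x2):
    """Forecast percent profit margin from net revenues per deposit
    dollar (x1) and number of S&L offices (x2), using the fitted model."""
    return 1.56450 + 0.23720 * x1 - 0.000249 * x2

y26 = forecast_profit_margin(4.70, 9350)  # about 0.35
y27 = forecast_profit_margin(4.80, 9400)  # about 0.36
```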
Next Week’s Forecast Friday Topic: The Effect of Omitting an Important Variable
Now that we’ve walked you through this process, you know how to run a multiple regression and use it to forecast. Next week, we will discuss what happens when a key independent variable is omitted from a regression model, and the problems caused when we violate the regression assumption that “all relevant and no irrelevant independent variables are included in the model.” Next week’s post will show a complete demonstration of such an impact. Stay tuned!
Tags: autocorrelation, confidence interval, dependent variable, durbin-watson, f distribution, f statistic, Forecast Friday, Forecasting, independent variable, multiple regression, partial slope coefficients, partial slope parameters, predictive modeling, regression, simple regression, statistical modeling, validation of regression model
July 6, 2010 at 12:05 am 