Forecast Friday Topic: Multiple Regression Analysis (continued)

(Tenth in a series)

Today we resume our discussion of multiple regression analysis. Last week, we built a model to determine the extent of any relationship between U.S. savings & loan associations’ percent profit margin and two independent variables, net revenues per deposit dollar and number of S&L offices. Today, we will compute the 95% confidence interval for each parameter estimate; determine whether the model is valid; check for autocorrelation; and use the model to forecast. Recall that our resulting model was:

Yt = 1.56450 + 0.23720X1t – 0.000249X2t

Where Yt is the percent profit margin for the S&L in Year t; X1t is the net revenues per deposit dollar in Year t; and X2t is the number of S&L offices in the U.S. in Year t. Recall that the R² is 0.865, indicating that 86.5% of the variation in percentage profit margin is explained by changes in net revenues per deposit dollar and number of S&L offices.

Determining the 95% Confidence Interval for the Partial Slope Coefficients

In multiple regression analysis, there are several independent variables, and each one’s parameter estimate contributes only part of the overall slope; hence the coefficients β1 and β2 are referred to as partial slope coefficients, each measuring the effect of its variable with the other held constant. As with simple linear regression, we need to determine the 95% confidence interval for each parameter estimate, so that we can get an idea of where the true population parameter lies. Recall from our June 3 post that we did that by first computing the standard error of the estimate, sε, and then the standard error of the regression slope, sb. That worked well for simple regression, but for multiple regression, it is more complicated. Unfortunately, deriving the standard error of the partial regression coefficients requires the use of linear algebra, and would be too complicated to discuss here. Several statistical programs and Excel compute these values for us. So, we will state the values of sb1 and sb2 and go from there.
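For readers curious about the linear algebra mentioned above, the standard textbook result can be stated compactly without deriving it. This is only a sketch in matrix notation, which this series has not used: X is assumed to be the n × (k+1) data matrix with a leading column of ones, b is the vector of parameter estimates, and s² is the error variance estimated from the error sum of squares (ESS).

```latex
s^2 = \frac{ESS}{n - k - 1}, \qquad
\widehat{\operatorname{Var}}(\mathbf{b}) = s^2 \,(X^\top X)^{-1}, \qquad
s_{b_j} = \sqrt{\, s^2 \left[ (X^\top X)^{-1} \right]_{jj} }
```

Each standard error is the square root of the matching diagonal entry of the estimated covariance matrix, which is the calculation statistical software performs for us.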

sb1 = 0.05556

sb2 = 0.00003

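Also, we need our critical t value. With n = 25 observations and k = 2 independent variables, we have n – k – 1 = 22 degrees of freedom, and the two-tailed critical t value at the 95% confidence level is 2.074.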

Hence, our 95% confidence interval for β1 is computed as:

0.23720 ± 2.074 × 0.05556

=0.12197 to 0.35243

Hence, we are saying that we can be 95% confident that the true parameter β1 lies somewhere between the values of 0.12197 and 0.35243.

For β2, the procedure is the same:

-0.000249 ± 2.074 × 0.00003

=-0.00032 to -0.00018

Hence, we can be 95% confident that the true parameter β2 lies somewhere between the values of -0.00032 and -0.00018. Also, the confidence interval for the intercept, α, ranges from 1.40 to 1.73.

Note that none of these confidence intervals contains zero. The confidence intervals for α and β1 are entirely positive; that for β2 is entirely negative. If any parameter’s confidence interval crossed zero, then that parameter estimate would not be statistically significant at the 5% level.
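The interval arithmetic above is easy to script. Here is a minimal Python sketch, assuming the rounded estimates and standard errors quoted in this post; the function names are our own, for illustration only. Because sb2 is quoted to just one significant digit, the β2 bounds computed this way can differ from the ones shown above in the last decimal place.

```python
# Values quoted in the post (rounded). T_CRIT is the two-tailed
# critical t value at the 95% confidence level for 22 degrees of freedom.
T_CRIT = 2.074


def confidence_interval(estimate, std_error, t_crit=T_CRIT):
    """Return (lower, upper) bounds: estimate +/- t_crit * std_error."""
    margin = t_crit * std_error
    return estimate - margin, estimate + margin


def contains_zero(interval):
    """True if the interval crosses zero (estimate not significant)."""
    lower, upper = interval
    return lower <= 0.0 <= upper


ci_b1 = confidence_interval(0.23720, 0.05556)    # net revenues per deposit dollar
ci_b2 = confidence_interval(-0.000249, 0.00003)  # number of S&L offices

for name, ci in (("b1", ci_b1), ("b2", ci_b2)):
    print(f"{name}: {ci[0]:.5f} to {ci[1]:.5f}, crosses zero: {contains_zero(ci)}")
```

Neither interval crosses zero, which matches the conclusion above that both partial slope coefficients are significant.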

Is Our Model Valid?

The next thing we want to do is determine if our model is valid. When validating our model, we are trying to show that our independent variables explain the variation in the dependent variable. So we start with a hypothesis test:

H0: β1 = β2 = 0

HA: at least one β ≠ 0

Our null hypothesis says that our independent variables, net revenue per deposit dollar and number of S&L offices, explain nothing of the variation in an S&L’s percentage profit margin, and hence, that our model is not valid. Our alternative hypothesis says that at least one of our independent variables explains some of the variation in an S&L’s percentage profit margin, and thus that our model is valid.

So how do we do it? Enter the F-test. Like the t-test, the F-test is a means for hypothesis testing. Let’s start by calculating the F-statistic for our model. We do that with the following equation:

Fcalc = (RSS / k) / (ESS / (n – k – 1))

Remember that RSS is the regression sum of squares and ESS is the error sum of squares. The May 27th Forecast Friday post showed you how to calculate RSS and ESS. For this model, our RSS = 0.4015 and our ESS = 0.0625; k is the number of independent variables (2), and n is the sample size (25). Our equation reduces to:

Fcalc = (0.4015 / 2) / (0.0625 / 22)

= 70.66

If our Fcalc is greater than the critical F value for the distribution, then we can reject our null hypothesis and conclude that there is strong evidence that at least one of our independent variables explains some of the variation in an S&L’s percentage profit margin. How do we determine our critical F? Any statistics book or statistics Web site has yet another table, called the “F Distribution” table. In it, you look for two sets of degrees of freedom: one for the numerator and one for the denominator of your Fcalc equation. In the numerator, we have two degrees of freedom; in the denominator, 22. In the F Distribution table, the columns represent numerator degrees of freedom, and the rows, denominator degrees of freedom. When we find column (2), row (22) in the table for the 1% significance level, we end up with an F-value of 5.72. (At the 5% significance level, the critical value is smaller still, at 3.44.)

Our Fcalc of 70.66 is far greater than that, so we reject the null hypothesis and conclude that our model is valid.
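The F-test arithmetic can be checked with a few lines of Python. This is a minimal sketch assuming the RSS, ESS, sample size, and critical value quoted in this post; the function name is ours, for illustration. As a side check, it also recovers R² from the same two sums of squares, using the standard identity that RSS and ESS add up to the total sum of squares.

```python
RSS = 0.4015       # regression sum of squares (from the post)
ESS = 0.0625       # error sum of squares (from the post)
N = 25             # number of observations (years)
K = 2              # number of independent variables
F_CRITICAL = 5.72  # table value quoted in the post for 2 and 22 degrees of freedom


def f_statistic(rss, ess, n, k):
    """F = (RSS / k) / (ESS / (n - k - 1))."""
    return (rss / k) / (ess / (n - k - 1))


f_calc = f_statistic(RSS, ESS, N, K)
r_squared = RSS / (RSS + ESS)  # share of total variation explained by the model

print(f"F calculated: {f_calc:.2f}")
print(f"R squared:    {r_squared:.3f}")
print(f"Reject H0:    {f_calc > F_CRITICAL}")
```

The calculated F lands at about 70.66 and R² at about 0.865, consistent with the figures reported for this model.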

Is Our Model Free of Autocorrelation?

Recall from our assumptions that none of our error terms should be correlated with one another. If they are, autocorrelation results, rendering our parameter estimates inefficient. To check for autocorrelation, we need to look at our error terms, which we get by subtracting our actual percentage profit margin, Y, from our predicted, Ŷ (errors shown in parentheses are negative):

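| Year | Actual Percentage Profit Margin (Yt) | Predicted by Model (Ŷt) | Error |
|------|------|------|----------|
| 1 | 0.75 | 0.68 | (0.0735) |
| 2 | 0.71 | 0.71 | 0.0033 |
| 3 | 0.66 | 0.70 | 0.0391 |
| 4 | 0.61 | 0.67 | 0.0622 |
| 5 | 0.70 | 0.68 | (0.0162) |
| 6 | 0.72 | 0.71 | (0.0124) |
| 7 | 0.77 | 0.74 | (0.0302) |
| 8 | 0.74 | 0.76 | 0.0186 |
| 9 | 0.90 | 0.79 | (0.1057) |
| 10 | 0.82 | 0.79 | (0.0264) |
| 11 | 0.75 | 0.80 | 0.0484 |
| 12 | 0.77 | 0.83 | 0.0573 |
| 13 | 0.78 | 0.80 | 0.0222 |
| 14 | 0.84 | 0.80 | (0.0408) |
| 15 | 0.79 | 0.75 | (0.0356) |
| 16 | 0.70 | 0.73 | 0.0340 |
| 17 | 0.68 | 0.70 | 0.0249 |
| 18 | 0.72 | 0.69 | (0.0270) |
| 19 | 0.55 | 0.64 | 0.0851 |
| 20 | 0.63 | 0.61 | (0.0173) |
| 21 | 0.56 | 0.57 | 0.0101 |
| 22 | 0.41 | 0.48 | 0.0696 |
| 23 | 0.51 | 0.44 | (0.0725) |
| 24 | 0.47 | 0.40 | (0.0746) |
| 25 | 0.32 | 0.38 | 0.0574 |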

The next thing we need to do is subtract the previous period’s error from the current period’s error. After that, we square our result. Note that we will only have 24 observations (we can’t subtract anything from the first observation):

| Year | Error | Difference in Errors | Squared Difference in Errors |
|------|-----------|-----------|---------|
| 1 | (0.07347) | | |
| 2 | 0.00334 | 0.07681 | 0.00590 |
| 3 | 0.03910 | 0.03576 | 0.00128 |
| 4 | 0.06218 | 0.02308 | 0.00053 |
| 5 | (0.01624) | (0.07842) | 0.00615 |
| 6 | (0.01242) | 0.00382 | 0.00001 |
| 7 | (0.03024) | (0.01781) | 0.00032 |
| 8 | 0.01860 | 0.04883 | 0.00238 |
| 9 | (0.10569) | (0.12429) | 0.01545 |
| 10 | (0.02644) | 0.07925 | 0.00628 |
| 11 | 0.04843 | 0.07487 | 0.00561 |
| 12 | 0.05728 | 0.00884 | 0.00008 |
| 13 | 0.02217 | (0.03511) | 0.00123 |
| 14 | (0.04075) | (0.06292) | 0.00396 |
| 15 | (0.03557) | 0.00519 | 0.00003 |
| 16 | 0.03397 | 0.06954 | 0.00484 |
| 17 | 0.02489 | (0.00909) | 0.00008 |
| 18 | (0.02697) | (0.05185) | 0.00269 |
| 19 | 0.08509 | 0.11206 | 0.01256 |
| 20 | (0.01728) | (0.10237) | 0.01048 |
| 21 | 0.01012 | 0.02740 | 0.00075 |
| 22 | 0.06964 | 0.05952 | 0.00354 |
| 23 | (0.07252) | (0.14216) | 0.02021 |
| 24 | (0.07460) | (0.00208) | 0.00000 |
| 25 | 0.05738 | 0.13198 | 0.01742 |

If we sum up the last column, we get 0.1218. If we then divide that by our ESS of 0.0625, we get a value of 1.95. What does this mean?

We have just computed what is known as the Durbin-Watson Statistic, which is used to detect the presence of autocorrelation. The Durbin-Watson statistic, d, can be anywhere from zero to 4. Generally, when d is close to zero, it suggests the presence of positive autocorrelation; a value close to 2 indicates no autocorrelation; while a value close to 4 indicates negative autocorrelation. In any case, you want your Durbin-Watson statistic to be as close to two as possible, and ours is.

Hence, our model seems to be free of autocorrelation.
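The Durbin-Watson calculation above can be reproduced in a few lines of Python. This is a minimal sketch that assumes the 25 error terms exactly as listed in the table above, with the parenthesized values entered as negatives; the function name is ours, for illustration. Since the listed errors are rounded to five decimals, the result may differ from the post's in the third decimal place.

```python
# Error terms for years 1 through 25, copied from the table in the post
# (values shown in parentheses there are negative).
errors = [
    -0.07347, 0.00334, 0.03910, 0.06218, -0.01624,
    -0.01242, -0.03024, 0.01860, -0.10569, -0.02644,
    0.04843, 0.05728, 0.02217, -0.04075, -0.03557,
    0.03397, 0.02489, -0.02697, 0.08509, -0.01728,
    0.01012, 0.06964, -0.07252, -0.07460, 0.05738,
]


def durbin_watson(residuals):
    """d = sum of squared successive differences / sum of squared residuals."""
    numerator = sum(
        (current - previous) ** 2
        for previous, current in zip(residuals, residuals[1:])
    )
    denominator = sum(e ** 2 for e in residuals)  # this is the ESS
    return numerator / denominator


ess = sum(e ** 2 for e in errors)
d = durbin_watson(errors)

print(f"ESS: {ess:.4f}")
print(f"Durbin-Watson d: {d:.2f}")
```

The denominator is simply the error sum of squares, which is why dividing the summed squared differences by our ESS gives the statistic directly.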

Now, Let’s Go Forecast!

Now that we have validated our model and seen that it is free of autocorrelation, we can be comfortable forecasting. Let’s say that for years 26 and 27, we have the following forecasts for net revenues per deposit dollar, X1t, and number of S&L offices, X2t:

X1,26 = 4.70 and X2,26 = 9,350

X1,27 = 4.80 and X2,27 = 9,400

Plugging each of these into our equation, we generate the following forecasts:

Ŷ26 = 1.56450 + 0.23720 * 4.70 – 0.000249 * 9,350

=0.3504

Ŷ27 = 1.56450 + 0.23720 * 4.80 – 0.000249 * 9,400

=0.3617
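The forecasting step can be wrapped in a small Python function. This is a minimal sketch assuming the rounded coefficients shown in the model equation; the function and variable names are ours, for illustration. With those rounded coefficients, the arithmetic comes out to roughly 0.3512 and 0.3625, slightly above the figures shown above, presumably because the posted forecasts were generated with unrounded coefficients.

```python
# Rounded coefficients from the model equation in the post.
INTERCEPT = 1.56450
B1 = 0.23720     # net revenues per deposit dollar
B2 = -0.000249   # number of S&L offices


def forecast_margin(net_revenue_per_deposit_dollar, num_offices):
    """Forecast percent profit margin from the fitted regression equation."""
    return INTERCEPT + B1 * net_revenue_per_deposit_dollar + B2 * num_offices


# Assumed inputs for years 26 and 27, as given in the post.
scenarios = {26: (4.70, 9350), 27: (4.80, 9400)}
forecasts = {
    year: forecast_margin(x1, x2) for year, (x1, x2) in scenarios.items()
}

for year, margin in forecasts.items():
    print(f"Year {year}: forecast percent profit margin = {margin:.4f}")
```

Keeping the coefficients in one place makes it easy to rerun the forecast when new estimates of X1t and X2t arrive.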

Next Week’s Forecast Friday Topic: The Effect of Omitting an Important Variable

Now that we’ve walked you through this process, you know how to forecast and run multiple regression. Next week, we will discuss what happens when a key independent variable is omitted from a regression model and all the problems it causes when we violate the regression assumption that “all relevant and no irrelevant independent variables are included in the model.” Next week’s post will show a complete demonstration of such an impact. Stay tuned!
