(Fifteenth in a series)
We have spent the last few Forecast Friday posts discussing violations of different assumptions in regression analysis. So far, we have discussed the effects of specification bias and multicollinearity on parameter estimates, and their corresponding effect on your forecasts. Today, we will discuss another violation, autocorrelation, which occurs when sequential residual (error) terms are correlated with one another.
When working with time series data, autocorrelation is the most common problem forecasters face. When the assumption of uncorrelated residuals is violated, we end up with models that have inefficient parameter estimates and upwardly biased t-ratios and R^{2} values. These inflated values make our forecasting model appear better than it really is, and can cause our model to miss turning points. Hence, if your model is predicting an increase in sales and you, in actuality, see sales plunge, it may be due to autocorrelation.
What Does Autocorrelation Look Like?
Autocorrelation can take on two types: positive or negative. In positive autocorrelation, consecutive errors usually have the same sign: positive residuals are almost always followed by positive residuals, while negative residuals are almost always followed by negative residuals. In negative autocorrelation, consecutive errors typically have opposite signs: positive residuals are almost always followed by negative residuals and vice versa.
In addition, there are different orders of autocorrelation. The simplest, most common kind of autocorrelation, first-order autocorrelation, occurs when consecutive errors are correlated. Second-order autocorrelation occurs when error terms two periods apart are correlated, and so forth. Here, we will concentrate solely on first-order autocorrelation.
You will see a visual depiction of positive autocorrelation later in this post.
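One common way to write down first-order autocorrelation, offered here as a sketch rather than something this post specifies, is to let each error depend on the one before it. The symbols \rho (the autocorrelation coefficient) and u_t (a fresh, uncorrelated disturbance) are assumptions introduced purely for illustration:

```latex
e_t = \rho \, e_{t-1} + u_t, \qquad -1 < \rho < 1
```

Under this form, \rho > 0 corresponds to positive autocorrelation (errors tend to keep their sign), \rho < 0 to negative autocorrelation (errors tend to flip sign), and \rho = 0 to no first-order autocorrelation.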
What Causes Autocorrelation?
The two main culprits for autocorrelation are sluggishness in the business cycle (also known as inertia) and omitted variables from the model. At various turning points in a time series, inertia is very common. At the time when a time series turns upward (downward), its observations build (lose) momentum, and continue going up (down) until the series reaches its peak (trough). As a result, successive observations and the error terms associated with them depend on each other.
Another example of inertia happens when forecasting a time series where the same observations can be in multiple successive periods. For example, I once developed a model to forecast enrollment for a community college, and found autocorrelation to be present in my initial model. This happened because many of the students enrolled during the spring term were also enrolled in the previous fall term. As a result, I needed to correct for that.
The other main cause of autocorrelation is omitted variables from the model. When an important independent variable is omitted from a model, its effect on the dependent variable becomes part of the error term. Hence, if the omitted variable has a positive correlation with the dependent variable, it is likely to cause error terms that are positively correlated.
How Do We Detect Autocorrelation?
To illustrate how we go about detecting autocorrelation, let’s first start with a data set. I have pulled the average hourly wages of textile and apparel workers for the 18 months from January 1986 through June 1987. The original source was the Survey of Current Business, September issues from 1986 and 1987, but this data set was reprinted in Data Analysis Using Microsoft ® Excel, by Michael R. Middleton, page 219:
Month    t    Wage
Jan86    1    5.82
Feb86    2    5.79
Mar86    3    5.80
Apr86    4    5.81
May86    5    5.78
Jun86    6    5.79
Jul86    7    5.79
Aug86    8    5.83
Sep86    9    5.91
Oct86    10   5.87
Nov86    11   5.87
Dec86    12   5.90
Jan87    13   5.94
Feb87    14   5.93
Mar87    15   5.93
Apr87    16   5.94
May87    17   5.89
Jun87    18   5.91
Now, let’s run a simple regression model, using time period t as the independent variable and Wage as the dependent variable. Using the data set above, we derive the following model:
Ŷ = 5.7709 + 0.0095t
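For readers who would rather check the fit in code than in a spreadsheet, here is a minimal sketch in plain Python. It assumes ordinary least squares with the usual closed-form formulas; the function name fit_simple_regression is a hypothetical helper, not something taken from Middleton's book.

```python
# Average hourly wages, Jan 1986 through Jun 1987 (from the table above).
wages = [5.82, 5.79, 5.80, 5.81, 5.78, 5.79, 5.79, 5.83, 5.91,
         5.87, 5.87, 5.90, 5.94, 5.93, 5.93, 5.94, 5.89, 5.91]
periods = list(range(1, len(wages) + 1))  # t = 1..18


def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_simple_regression(periods, wages)
print(f"Y-hat = {intercept:.4f} + {slope:.4f}t")  # Y-hat = 5.7709 + 0.0095t
```

Any regression tool should give the same coefficients; the closed form is used here only to keep the sketch free of dependencies.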
Examine the Model Output
Notice also the following model diagnostic statistics:
R^{2} = 0.728

Variable    Coefficient    t-ratio
Intercept   5.7709         367.62
t           0.0095         6.55
You can see that the R^{2} is a high number, with changes in t explaining nearly three-quarters of the variation in average hourly wage. Note also the t-ratios for both the intercept and the parameter estimate for t. Both are very high. Recall that a high R^{2} and high t-ratios are symptoms of autocorrelation.
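The diagnostics above can be reproduced with a short sketch. It assumes the standard textbook formulas for R^{2} and for the standard errors of a simple regression with n - 2 degrees of freedom; the variable names are illustrative only.

```python
import math

# Average hourly wages, Jan 1986 through Jun 1987, and the time index t.
wages = [5.82, 5.79, 5.80, 5.81, 5.78, 5.79, 5.79, 5.83, 5.91,
         5.87, 5.87, 5.90, 5.94, 5.93, 5.93, 5.94, 5.89, 5.91]
periods = list(range(1, len(wages) + 1))
n = len(wages)

t_mean = sum(periods) / n
w_mean = sum(wages) / n
sxx = sum((t - t_mean) ** 2 for t in periods)
sxy = sum((t - t_mean) * (w - w_mean) for t, w in zip(periods, wages))
syy = sum((w - w_mean) ** 2 for w in wages)  # total variation in Wage

slope = sxy / sxx
intercept = w_mean - slope * t_mean

# Sum of squared residuals, then the share of variation explained.
sse = sum((w - (intercept + slope * t)) ** 2 for t, w in zip(periods, wages))
r_squared = 1 - sse / syy

# Standard errors use the residual variance with n - 2 degrees of freedom.
mse = sse / (n - 2)
se_slope = math.sqrt(mse / sxx)
se_intercept = math.sqrt(mse * (1 / n + t_mean ** 2 / sxx))

t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"R-squared:           {r_squared:.3f}")
print(f"t-ratio (intercept): {t_intercept:.2f}")
print(f"t-ratio (t):         {t_slope:.2f}")
```

The printed values should line up with the table above, apart from possible differences in the last decimal place.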
Visually Inspect Residuals
Just because a model has a high R^{2} and parameters with high t-ratios doesn’t mean autocorrelation is present. More work must be done to detect autocorrelation. Another way to check for autocorrelation is to visually inspect the residuals. The best way to do this is through plotting the average hourly wage predicted by the model against the actual average hourly wage, as Middleton has done:
Notice the green line representing the Predicted Wage. It is a straight, upward line. This is to be expected, since the independent variable is sequential and shows an increasing trend. The red line depicts the actual wage in the time series. Notice that the model’s forecast is higher than actual for months 5 through 8, and for months 17 and 18. The model also underpredicts for months 12 through 16. This clearly illustrates the presence of positive, first-order autocorrelation.
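Without the chart in front of you, a rough stand-in is to print the sign of each residual in time order and count how often the sign flips. This is only a sketch of the visual check, not a formal test, and the variable names are illustrative.

```python
# Average hourly wages, Jan 1986 through Jun 1987, and the time index t.
wages = [5.82, 5.79, 5.80, 5.81, 5.78, 5.79, 5.79, 5.83, 5.91,
         5.87, 5.87, 5.90, 5.94, 5.93, 5.93, 5.94, 5.89, 5.91]
periods = list(range(1, len(wages) + 1))
n = len(wages)

# Least-squares fit of Wage on t.
t_mean = sum(periods) / n
w_mean = sum(wages) / n
slope = (sum((t - t_mean) * (w - w_mean) for t, w in zip(periods, wages))
         / sum((t - t_mean) ** 2 for t in periods))
intercept = w_mean - slope * t_mean

# Residual = actual minus predicted; "+" means the model underpredicted.
residuals = [w - (intercept + slope * t) for t, w in zip(periods, wages)]
signs = "".join("+" if e > 0 else "-" for e in residuals)

# Count how many times consecutive residuals change sign.
sign_changes = sum(1 for a, b in zip(signs, signs[1:]) if a != b)

print(signs)  # ++++----++-+++++--
print(f"{sign_changes} sign changes in {len(signs) - 1} transitions")  # 5 in 17
```

Long runs of the same sign, with few flips, are the pattern the plot shows: once the model is over or under the actual wage, it tends to stay there for several months.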
The Durbin-Watson Statistic
Examining the model components and visually inspecting the residuals are intuitive, but not definitive, ways to diagnose autocorrelation. To really be sure whether autocorrelation exists, we must compute the Durbin-Watson statistic, often denoted as d.
In our June 24 Forecast Friday post, we demonstrated how to calculate the Durbin-Watson statistic. The actual formula is:
d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^{2}}{\sum_{t=1}^{n} e_t^{2}}
Here e_t is the error term for observation t and n is the number of observations. That is, beginning with the error term for the second observation, we subtract the immediately preceding error term from it; then we square the difference. We do this for each observation from the second one onward. Then we sum all of those squared differences together. Next, we square the error terms for each observation, and sum those together. Then we divide the sum of squared differences by the sum of squared error terms, to get our Durbin-Watson statistic.
For our example, we have the following:
t     Error      Squared Error   e_t - e_{t-1}   Squared Difference
1     0.0396     0.0016
2     0.0001     0.0000          (0.0395)        0.0016
3     0.0006     0.0000          0.0005          0.0000
4     0.0011     0.0000          0.0005          0.0000
5     (0.0384)   0.0015          (0.0395)        0.0016
6     (0.0379)   0.0014          0.0005          0.0000
7     (0.0474)   0.0022          (0.0095)        0.0001
8     (0.0169)   0.0003          0.0305          0.0009
9     0.0536     0.0029          0.0705          0.0050
10    0.0041     0.0000          (0.0495)        0.0024
11    (0.0054)   0.0000          (0.0095)        0.0001
12    0.0152     0.0002          0.0205          0.0004
13    0.0457     0.0021          0.0305          0.0009
14    0.0262     0.0007          (0.0195)        0.0004
15    0.0167     0.0003          (0.0095)        0.0001
16    0.0172     0.0003          0.0005          0.0000
17    (0.0423)   0.0018          (0.0595)        0.0035
18    (0.0318)   0.0010          0.0105          0.0001
Sum:             0.0163                          0.0171
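To obtain our Durbin-Watson statistic, we plug our sums into the formula, dividing the sum of squared differences by the sum of squared errors:
d = 0.0171 / 0.0163 = 1.050
(The result uses the unrounded sums; dividing the four-decimal values shown gives about 1.049.)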
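The whole calculation can be sketched in a few lines of Python. The function name durbin_watson is a hypothetical helper written for this example; it simply follows the steps described above (sum of squared successive differences divided by sum of squared errors) on the residuals from the wage regression.

```python
# Average hourly wages, Jan 1986 through Jun 1987, and the time index t.
wages = [5.82, 5.79, 5.80, 5.81, 5.78, 5.79, 5.79, 5.83, 5.91,
         5.87, 5.87, 5.90, 5.94, 5.93, 5.93, 5.94, 5.89, 5.91]
periods = list(range(1, len(wages) + 1))
n = len(wages)


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    squared_diffs = sum((curr - prev) ** 2
                        for prev, curr in zip(residuals, residuals[1:]))
    squared_errors = sum(e ** 2 for e in residuals)
    return squared_diffs / squared_errors


# Least-squares fit of Wage on t, then the residuals (actual minus predicted).
t_mean = sum(periods) / n
w_mean = sum(wages) / n
slope = (sum((t - t_mean) * (w - w_mean) for t, w in zip(periods, wages))
         / sum((t - t_mean) ** 2 for t in periods))
intercept = w_mean - slope * t_mean
residuals = [w - (intercept + slope * t) for t, w in zip(periods, wages)]

# The two column sums from the table, then their ratio.
sum_sq_errors = sum(e ** 2 for e in residuals)
sum_sq_diffs = sum((curr - prev) ** 2
                   for prev, curr in zip(residuals, residuals[1:]))
d = durbin_watson(residuals)

print(f"Sum of squared errors:      {sum_sq_errors:.4f}")  # 0.0163
print(f"Sum of squared differences: {sum_sq_diffs:.4f}")   # 0.0171
print(f"Durbin-Watson d:            {d:.3f}")              # 1.050
```

Note that the differences start with the second residual, so there are 17 squared differences but 18 squared errors.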
What Does the Durbin-Watson Statistic Tell Us?
Our Durbin-Watson statistic is 1.050. What does that mean? The Durbin-Watson statistic is interpreted as follows:
 If d is close to zero (0), then positive autocorrelation is probably present;
 If d is close to two (2), then the model is likely free of autocorrelation; and
 If d is close to four (4), then negative autocorrelation is probably present.
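These reference points follow from a standard approximation, sketched here under the assumption that the sample is large enough for the first and last squared errors to be negligible. The symbol r, the lag-one correlation of the residuals, is introduced for illustration and is not used elsewhere in this post:

```latex
d = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^{2}}{\sum_{t=1}^{n} e_t^{2}}
  = \frac{\sum_{t=2}^{n} e_t^{2} + \sum_{t=2}^{n} e_{t-1}^{2} - 2\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^{2}}
  \approx 2(1 - r),
\qquad r = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^{2}}
```

So r near +1 pushes d toward 0, r near 0 leaves d near 2, and r near -1 pushes d toward 4. With a short series such as 18 months, the approximation is rough.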
As we saw from our visual examination of the residuals, we appear to have positive autocorrelation, and the fact that our Durbin-Watson statistic is about halfway between zero and two suggests the presence of positive autocorrelation.
Next Forecast Friday Topic: Correcting Autocorrelation
Today we went through the causes and effects of autocorrelation, and how to suspect and detect its presence. Next week, we will discuss how to correct for autocorrelation and eliminate it so that we can have more efficient parameter estimates.