Posts Tagged ‘forecast’

Forecast Friday Topic: Judgmental Bias in Forecasting

March 17, 2011

(Fortieth in a series)

Over the last several weeks, we have discussed many of the qualitative forecasting methods, approaches that rely heavily on judgment and less on analytical tools. Because judgmental forecasting techniques rely upon a person’s thought processes and experiences, they can be highly susceptible to bias. Today, we will complete our coverage of judgmental forecasting methods with a discussion of some of the common biases they inspire.

Inconsistency and Conservatism

Two very opposite biases in judgmental forecasting are inconsistency and conservatism. Inconsistency occurs when decision-makers apply different decision criteria to similar situations. Sometimes memories fade; other times, a manager may overestimate the impact of some new or extraneous event that makes the current situation seem different from the previous one; he or she may be swayed by that day’s mood; or he or she may simply want to try something new out of boredom. Inconsistency can have serious negative repercussions.

One way to overcome inconsistency is to have a set of formal decision rules, or “expert systems,” that set objective criteria for decision-making, which must be applied to each similar forecasting situation. These criteria would be the factors to measure, the weight each one gets, and the objective of the forecasting project. When formal decision rules are imposed and applied consistently, forecasts tend to improve. However, it is important to monitor your environment as your expert systems are applied, so that they can be changed as your market evolves. Otherwise, failing to change a process in light of strong new information or evidence is a new bias, conservatism.
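As a minimal sketch, a formal decision rule can be as simple as a fixed set of weighted factors applied identically to every forecasting situation. The factor names and weights below are illustrative assumptions, not from any real expert system:

```python
# Hypothetical decision rule: the same factors and weights are applied
# to every similar forecasting situation, guarding against inconsistency.
weights = {"market_growth": 0.4, "competitor_activity": 0.3, "seasonal_demand": 0.3}

def score(factors):
    """Apply the fixed weighted criteria to one situation's factor ratings."""
    return sum(weights[k] * factors[k] for k in weights)

# Ratings for one situation, each on a 0-1 scale (made-up numbers)
situation = {"market_growth": 0.8, "competitor_activity": 0.2, "seasonal_demand": 0.6}
print(round(score(situation), 2))  # 0.56
```

Because every situation is scored with the same function and weights, two analysts (or the same analyst on different days) cannot quietly shift the criteria; changing the weights becomes an explicit, reviewable decision.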

Now, have I just contradicted myself? No. Learning must always be applied in any expert system. We live in a dynamic world, not a static one. However, most changes to our environment, and hence to our expert systems, don’t occur dramatically or immediately; they occur gradually and more subtly. It’s important to apply your expert systems and practice them for some time, monitoring the environment as well as the quality of the forecasts your expert systems are producing. If the gap between your forecast and actual performance is growing consistently, then it might be time to revisit your criteria. Perhaps you assigned too much or too little weight to one or more factors; perhaps new technologies are being introduced in your industry.

Decision-makers walk a fine line between inconsistency and conservatism in judgmental forecasts. Trying to reduce one bias may inspire another.

Recency

Often, when there are shocks in the economy, or disasters, these recent events tend to dominate our thoughts about the future. We tend to believe these conditions are permanent, so we downplay or ignore relevant events from the past. To avoid recency bias, then, we must remember that business cycles exist and that ups and downs don’t last forever. Moreover, we should keep expert systems in place that force us to consider all factors relevant to forecasting the event of interest.

Optimism

I’m guilty of this bias! Actually, many people are. Our projections are often clouded by the future outcomes we desire. Sometimes, we feel compelled to provide rosy projections because of pressure from higher-up executives. Unfortunately, optimism in forecasting can be very dangerous, and its repercussions severe when it is discovered how different our forecasted and actual results are. Many a company’s stock price has plunged because of overly optimistic forecasts. The best ways to avoid optimism are to have a disinterested third party generate the forecasts, or to have several individuals make their own independent forecasts.

************

These are just a sample of the biases common in judgmental forecasting methods. And as you’ve probably guessed, deciding which biases you can live with and which you cannot is also a subjective decision! In general, for your judgmental forecasts to be accurate, you must consistently guard against biases and have set decision-making procedures in place that include learning as you go along.

*************************************

Next Forecast Friday Topic: Combining Forecasts

For the last 10 months, I have introduced you to the various ways by which forecasts are generated and the strengths and limitations of each approach. Organizations frequently generate multiple forecasts based on different approaches, decision criteria, and assumptions. Finding a way to combine these forecasts into a representative composite forecast for the organization, and evaluating each forecast, is crucial to the learning process and, ultimately, to the success of the organization. So, beginning with next week’s Forecast Friday post, we begin our final Forecast Friday mini-series on combining and evaluating forecasts.


Forecast Friday Topic: Other Judgmental Forecasting Methods

March 3, 2011

(Thirty-ninth in a series)

Over the last several weeks, we discussed a series of non-quantitative forecasting methods: the Delphi Method, Jury of Executive Opinion, Sales Force Composite Forecasts, and Surveys of Expectations. Today, we’ll finish with a brief discussion of three more judgmental forecasting methods: Scenario Writing, La Prospective, and Cross-Impact Analysis.

Scenario Writing

When a company’s or industry’s long-term future is far too difficult to predict (whose isn’t!), it is common for experts in that company or industry to ponder possible situations in which the company or industry may find itself in the distant future. The documentation of these situations – scenarios – is known as scenario writing. Scenario writing seeks to get managers thinking in terms of possible outcomes at a future time where quantitative forecasting methods may be inadequate. Unfortunately, much of the literature on this approach suggests that writing multiple scenarios does not yield forecasts of much better quality than the other judgmental forecasting methods we’ve discussed to date.

La Prospective

Developed in France, La Prospective eschews quantitative models and emphasizes several potential futures that may result from the activities of individuals. Interactions among several events – many of which are dynamic in structure and constantly evolving – are studied, their impacts are cross-analyzed, and their effect on the future is assessed. La Prospective devotes considerable attention to the power, strategies, and resources of the individual “agents” whose actions will influence the future. Because the components being analyzed can be dynamic, the forecasting process for La Prospective is often not linear; stages can progress in a different order or simultaneously. The company doing the forecasting may itself be one of the influential agents involved, which helps it assess the value of any actions it might take. After the La Prospective process is complete, scenarios of the future are written, from which the company can formulate strategies.

Cross-Impact Analysis

Cross-impact analysis seeks to account for the interdependence of uncertain future events. Quite often, one future event’s occurrence is caused or determined by the occurrence of another. And often, an analyst may have strong knowledge of one event and little or no knowledge of the others. For example, in trying to predict the future price of tissue, experts at companies like Kimberly-Clark, along with resource economists, forest experts, and conservationists, may all have useful views. If a country with vast acreages of timber imposes more stringent regulations on the cutting down of trees, the price of tissue can rise sharply. Moreover, a major increase, or even a sharp reduction, in the incidence of influenza or the common cold – the realm of epidemiologists – can also influence the price of tissue. And even the current tensions in the Middle East – the realm of foreign policy experts – can affect it: if those tensions escalate, the price of oil shoots up, driving up the price of the energy required to convert timber into paper, and the price of the gas needed to transport the timber to the paper mill and the tissue to wholesalers and retailers. Cross-impact analysis measures the likelihood that each of these events will occur and attempts to assess the impact they will have on the future of the event of interest.
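As a rough sketch of the arithmetic behind this idea, suppose we assign probabilities to each driver event and to a price increase conditional on each. All of the numbers below are hypothetical, and treating the drivers as mutually exclusive is a deliberate simplification of full cross-impact analysis; the law of total probability then gives an overall likelihood:

```python
# Hypothetical probabilities, for illustration only:
# each entry is (P(event), P(tissue price increase | event)).
events = {
    "logging restrictions": (0.30, 0.70),
    "flu outbreak":         (0.20, 0.50),
    "Middle East tensions": (0.25, 0.60),
}
baseline = 0.10  # assumed P(price increase | no driver event occurs)

# Simplifying assumption: the driver events are mutually exclusive,
# so the law of total probability applies directly.
p_none = 1.0 - sum(p for p, _ in events.values())
p_rise = baseline * p_none + sum(p * q for p, q in events.values())
print(round(p_rise, 3))  # 0.485
```

A full cross-impact analysis goes further, updating each event’s probability conditional on the occurrence of the others, but even this toy calculation shows how the experts’ separate judgments combine into one forecast.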

Next Forecast Friday Topic: Judgmental Bias in Forecasting

Now that we have discussed several of the judgmental forecasting techniques available to analysts, it is obvious that, unlike quantitative methods, these techniques are not objective. Because, as their name implies, judgmental forecasting methods are based on judgment, they are highly susceptible to biases. Next week’s Forecast Friday post will discuss some of the biases that can result from judgmental forecasting methods.

Forecast Friday Topic: Stationarity in Time Series Data

January 13, 2011

(Thirty-fifth in a series)

In last week’s Forecast Friday post, we began our coverage of ARIMA modeling with a discussion of the Autocorrelation Function (ACF). We also learned that in order to generate forecasts from a time series, the series needed to exhibit no trend (either up or down), fluctuate around a constant mean and variance, and have covariances between terms in the series that depended only on the time interval between the terms, and not their absolute locations in the time series. A time series that meets these criteria is said to be stationary. When a time series appears to have a constant mean, then it is said to be stationary in the mean. Similarly, if the variance of the series doesn’t appear to change, then the series is also stationary in the variance.

Stationarity is nothing new in our discussions of time series forecasting. While we may not have discussed it in detail, we did note that the absence of stationarity made moving average methods less accurate for short-term forecasting, which led into our discussion of exponential smoothing. When the time series exhibited a trend, we relied upon double exponential smoothing to adjust for nonstationarity; in our discussions of regression analysis, we ensured stationarity by decomposing the time series (removing the trend, seasonal, cyclical, and irregular components), adding seasonal dummy variables into the model, and lagging the dependent variable. The ACF is another way of detecting nonstationarity, and that is what we’ll discuss today.

Recall our ACF from last week’s Forecast Friday discussion:

Because there is no discernable pattern, and because the lags pierce the ±1.96 standard error boundaries less than 5% (in fact, zero percent) of the time, this time series is stationary. Let’s do a simple plot of our time series:

A simple eyeballing of the time series plot shows that the series’ mean and variance both seem to hold fairly constant for the duration of the data set. But now let’s take a look at another data set. In the table below, which I snatched from my graduate school forecasting textbook, we have 160 quarterly observations on real gross domestic product:

160 Quarters of U.S. Real Gross Domestic Product

  t      Xt        t      Xt        t      Xt        t      Xt
  1   1,148.2    41   1,671.6    81   2,408.6    121   3,233.4
  2   1,181.0    42   1,666.8    82   2,406.5    122   3,157.0
  3   1,225.3    43   1,668.4    83   2,435.8    123   3,159.1
  4   1,260.2    44   1,654.1    84   2,413.8    124   3,199.2
  5   1,286.6    45   1,671.3    85   2,478.6    125   3,261.1
  6   1,320.4    46   1,692.1    86   2,478.4    126   3,250.2
  7   1,349.8    47   1,716.3    87   2,491.1    127   3,264.6
  8   1,356.0    48   1,754.9    88   2,491.0    128   3,219.0
  9   1,369.2    49   1,777.9    89   2,545.6    129   3,170.4
 10   1,365.9    50   1,796.4    90   2,595.1    130   3,179.9
 11   1,378.2    51   1,813.1    91   2,622.1    131   3,154.5
 12   1,406.8    52   1,810.1    92   2,671.3    132   3,159.3
 13   1,431.4    53   1,834.6    93   2,734.0    133   3,186.6
 14   1,444.9    54   1,860.0    94   2,741.0    134   3,258.3
 15   1,438.2    55   1,892.5    95   2,738.3    135   3,306.4
 16   1,426.6    56   1,906.1    96   2,762.8    136   3,365.1
 17   1,406.8    57   1,948.7    97   2,747.4    137   3,451.7
 18   1,401.2    58   1,965.4    98   2,755.2    138   3,498.0
 19   1,418.0    59   1,985.2    99   2,719.3    139   3,520.6
 20   1,438.8    60   1,993.7   100   2,695.4    140   3,535.2
 21   1,469.6    61   2,036.9   101   2,642.7    141   3,577.5
 22   1,485.7    62   2,066.4   102   2,669.6    142   3,599.2
 23   1,505.5    63   2,099.3   103   2,714.9    143   3,635.8
 24   1,518.7    64   2,147.6   104   2,752.7    144   3,662.4
 25   1,515.7    65   2,190.1   105   2,804.4    145   2,721.1
 26   1,522.6    66   2,195.8   106   2,816.9    146   3,704.6
 27   1,523.7    67   2,218.3   107   2,828.6    147   3,712.4
 28   1,540.6    68   2,229.2   108   2,856.8    148   3,733.6
 29   1,553.3    69   2,241.8   109   2,896.0    149   3,781.2
 30   1,552.4    70   2,255.2   110   2,942.7    150   3,820.3
 31   1,561.5    71   2,287.7   111   3,001.8    151   3,858.9
 32   1,537.3    72   2,300.6   112   2,994.1    152   3,920.7
 33   1,506.1    73   2,327.3   113   3,020.5    153   3,970.2
 34   1,514.2    74   2,366.9   114   3,115.9    154   4,005.8
 35   1,550.0    75   2,385.3   115   3,142.6    155   4,032.1
 36   1,586.7    76   2,383.0   116   3,181.6    156   4,059.3
 37   1,606.4    77   2,416.5   117   3,181.7    157   4,095.7
 38   1,637.0    78   2,419.8   118   3,178.7    158   4,112.2
 39   1,629.5    79   2,433.2   119   3,207.4    159   4,129.7
 40   1,643.4    80   2,423.5   120   3,201.3    160   4,133.2

Reprinted from Introductory Business & Economic Forecasting, 2nd Ed., Newbold, P. and Bos, T., Cincinnati, 1994, pp. 362-3.

Let’s plot the series:

As you can see, the series is on a steady, upward climb. The mean of the series appears to be changing, and moving upward; hence the series is likely not stationary. Let’s take a look at the ACF:

Wow! The ACF for the real GDP is in sharp contrast to our random series example above. Notice the lags: they are not cutting off; each lag is quite strong. And the fact that most of them pierce the ±1.96 standard error line is clear proof that the series is not white noise. Since the lags in the ACF decline very slowly, terms in the series are correlated several periods into the past. Because this series is not stationary, we must transform it into a stationary time series before we can build a model with it.
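The contrast between the two ACFs is easy to reproduce. The sketch below (my own illustration, not the article’s data) computes a sample ACF for simulated white noise and for a trending series, then counts how many of the first 20 lags pierce the approximate ±1.96/√n bounds:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r_1 .. r_nlags."""
    x = np.asarray(x, float) - np.mean(x)
    denom = np.sum(x * x)
    return np.array([np.sum(x[:-k] * x[k:]) / denom for k in range(1, nlags + 1)])

rng = np.random.default_rng(42)
noise = rng.normal(size=160)                    # stationary: white noise
trend = np.arange(160) + rng.normal(size=160)   # nonstationary: steady upward trend

bound = 1.96 / np.sqrt(160)                     # approx. 95% band under white noise
frac_noise = np.mean(np.abs(acf(noise, 20)) > bound)
frac_trend = np.mean(np.abs(acf(trend, 20)) > bound)
print(frac_noise, frac_trend)  # few (if any) noise lags pierce; every trend lag does
```

For the white-noise series the lags hug zero and only chance excursions cross the band, while for the trending series every lag is large and decays slowly, exactly the pattern we see in the real GDP ACF.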

Removing Nonstationarity: Differencing

The most common way to remove nonstationarity is to difference the time series. We talked about differencing in our discussion on correcting multicollinearity, and we mentioned quasi-differencing in our discussion on correcting autocorrelation. The concept is the same here. Differencing a series is pretty straightforward: we subtract the first value from the second, the second value from the third, and so forth. Subtracting a period’s value from its immediate subsequent period’s value is called first differencing. The formula for a first difference is:

ΔX(t) = X(t) − X(t−1)

Let’s try it with our series:

When we difference our series, our plot of the differenced data looks like this:

As you can see, the differenced series is much smoother, except towards the end where we have two points where real GDP dropped or increased sharply. The ACF looks much better too:

As you can see, only the first lag breaks through the ±1.96 standard errors line. Since it is only 5% of the lags displayed, we can conclude that the differenced series is stationary.
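Using the first eight quarters of real GDP from the table above, first differencing is a one-line operation:

```python
import numpy as np

# First eight quarters of real GDP from the table above
x = np.array([1148.2, 1181.0, 1225.3, 1260.2, 1286.6, 1320.4, 1349.8, 1356.0])

dx = np.diff(x)   # dx[t] = x[t+1] - x[t]: the first-differenced series
print(dx)         # the steady upward level is gone; only period-to-period changes remain
```

Note that the differenced series is one observation shorter than the original, which is why differencing costs us a data point, just as the moving average did.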

Second Order Differencing

Sometimes, first differencing doesn’t eliminate all nonstationarity, so a difference must be taken of the differenced series. This is called second order differencing. Differencing can be repeated multiple times, but an analyst rarely needs to go beyond second order differencing to achieve stationarity. The formula for second order differencing is:

Δ²X(t) = ΔX(t) − ΔX(t−1) = X(t) − 2X(t−1) + X(t−2)

We won’t show an example of second order differencing in this post, but it is important to note that second order differencing is not to be confused with second differencing, which subtracts the value two periods prior to the current period from the current period’s value: X(t) − X(t−2).
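A quick sketch of that distinction, again using the first few quarters from the table above, shows the two operations give very different results:

```python
import numpy as np

x = np.array([1148.2, 1181.0, 1225.3, 1260.2])  # first four quarters from the table

d1 = np.diff(x)                  # first differences
d2 = np.diff(d1)                 # second ORDER differences: difference of the differences
assert np.allclose(d2, np.diff(x, n=2))

lag2 = x[2:] - x[:-2]            # "second differences": value minus the value two periods prior
print(d2, lag2)                  # the two are not the same thing
```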

Seasonal Differencing

Seasonality can greatly affect a time series and make it appear nonstationary. As a result, the data set must be differenced for seasonality, very similar to seasonally adjusting a time series before performing a regression analysis. We will discuss seasonal differencing later in this ARIMA miniseries.

Recap

Before we can generate forecasts upon a time series, we must be sure our data set is stationary. Trend and seasonal components must be removed in order to generate accurate forecasts. We built on last week’s discussion of the autocorrelation function (ACF) to show how it could be used to detect stationarity – or the absence of it. When a data series is not stationary, one of the key ways to remove the nonstationarity is through differencing. The concept behind differencing is not unlike the other methods we’ve used in past discussions on forecasting: seasonal adjustment, seasonal dummy variables, lagging dependent variables, and time series decomposition.

Next Forecast Friday Topic: MA, AR, and ARMA Models

Our discussion of ARIMA models begins to hit critical mass with next week’s discussion on moving average (MA), autoregressive (AR), and autoregressive moving average (ARMA) models. This is where we begin the process of identifying the model to build for a dataset, and how to use the ACF and partial ACF (PACF) to determine whether an MA, AR, or ARMA model is the best fit for the data. That discussion will lay the foundation for our next three Forecast Friday discussions, where we delve deeply into ARIMA models.

 

*************************

What is your biggest gripe about using data? Tell us in our discussion on Facebook!

Is there a recurring issue about data analysis – or manipulation – that always seems to rear its ugly head?  What issues about data always seem to frustrate you?  What do you do about it?  Readers of Insight Central would love to know.  Join our discussion on Facebook. Simply go to our Facebook page and click on the “Discussion” tab and share your thoughts!   While you’re there, be sure to “Like” Analysights’ Facebook page so that you can always stay on top of the latest insights on marketing research, predictive modeling, and forecasting, and be aware of each new Insight Central post and discussions!  You can even follow us on Twitter!  So get this New Year off right and check us out on Facebook and Twitter!

Forecast Friday Topic: Decomposing a Time Series

September 9, 2010

(Twentieth in a series)

Welcome to our 20th Forecast Friday post. The last four months have been quite a journey, as we went through the various time series methods like moving average models, exponential smoothing models, and regression analysis, followed by in-depth discussions of the assumptions behind regression analysis and the consequences and remedies of violating those assumptions. Today, we resume the more practical aspects of time series analysis, with a discussion of decomposing a time series. If you recall from our May 3 post, a time series consists of four components: a trend component; a seasonal component; a cyclical component; and an irregular, or random, component. Today, we will show you how to isolate and control for these components, using the fictitious example of Billie Burton, a self-employed gift basket maker.

Billie Burton’s Gifts

Billie Burton has always loved making gift baskets and care packages, and has run her own business for the last 10 years. Billie knows that business seems to be increasing year after year, but she also knows that her business is seasonal. Billie is also certain that people don’t buy as many care packages and gift baskets when the economy is slow. She is trying to assess the impact of each of these components on her business. Since Billie’s business is a one-person shop and all her gift baskets are handmade (she doesn’t make the baskets or their contents, but assembles them, wraps them decoratively, and ships them), she is more concerned right now with forecasting the number of gift basket orders, rather than sales, so that she could estimate her workload.

So Billie pulls together her monthly orders for the years 2005-2009. They look like this:

TOTAL GIFT BASKET ORDERS

Month        2005   2006   2007   2008   2009
January        15     18     22     26     31
February       30     36     43     52     62
March          25     18     22     43     32
April          15     30     36     27     52
May            13     16     19     23     28
June           14     17     20     24     29
July           12     14     17     20     24
August         22     26     31     37     44
September      20     24     29     35     42
October        14     17     20     24     29
November       35     42     50     60     72
December       40     48     58     70     84

Trend Component

When a variable exhibits a long-term increase or decrease over the course of time, it is said to have a trend. Billie’s gift basket orders for the past five years exhibit a long-term, upward trend, as shown by the time series plot below:

Although the graph looks pretty busy and bumpy, you can see that Billie’s monthly orders seem to be moving upward over the course of time. Notice that we fit a straight line across Billie’s time series. This is a linear trend line. Most times, we plot the data in a time series and then draw a straight line freehand to show whether a trend is increasing or decreasing. Another approach to fitting a trend line – like the one I used here – is to use simple regression analysis, using each time period, t, as the independent variable, and numbering each period in sequential order. Hence, January 2005 would be t=1 and December 2009 would be t=60. This is very similar to the approach we discussed in our May 27 blog post when we demonstrated how our other fictitious businesswoman, Sue Stone, could forecast her sales.

In using regression analysis, to fit our trend line, we would get the following equation:

Ŷ = 0.518t + 15.829

Since the slope of the trend line is positive, we know that the trend is upward. Billie’s orders seem to increase by slightly more than half an order each month, on average. However, the R² is just .313, suggesting the trend line doesn’t fit the actual data well. But that is because of the drastic seasonality in the data set, which we will address shortly. For now, we at least know that the trend is increasing.
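The trend line can be reproduced directly from the order counts in Billie’s table; fitting t = 1 through 60 by ordinary least squares recovers the same slope, intercept, and R²:

```python
import numpy as np

# Billie's monthly orders, Jan 2005 (t=1) through Dec 2009 (t=60), from the table above
orders = np.array([15,30,25,15,13,14,12,22,20,14,35,40,   # 2005
                   18,36,18,30,16,17,14,26,24,17,42,48,   # 2006
                   22,43,22,36,19,20,17,31,29,20,50,58,   # 2007
                   26,52,43,27,23,24,20,37,35,24,60,70,   # 2008
                   31,62,32,52,28,29,24,44,42,29,72,84])  # 2009
t = np.arange(1, 61)

slope, intercept = np.polyfit(t, orders, 1)   # ordinary least squares, degree 1
fitted = slope * t + intercept
r2 = 1 - np.sum((orders - fitted)**2) / np.sum((orders - orders.mean())**2)
print(round(slope, 3), round(intercept, 3), round(r2, 3))  # ≈ 0.518, 15.83, 0.313
```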

Seasonal Component

When a time series shows a repeating pattern over time, usually during the same time of the year, that pattern is known as the seasonal component of the time series. Some time series have more than one period of the year in which seasonality is strong; others have no seasonality. If you look at the January points, you’ll notice that each is far lower than the preceding December and the following February. Also, if you look at each December, you’ll see that it is the highest point of orders for its year. This strongly suggests seasonality in the data.

But what is the impact of the seasonality? We find out by isolating the seasonal component and creating a seasonal index, known as the ratio to moving average. Computing the ratio to moving average is a four-step process:

First, take the moving average of the series

Since our data is monthly, we will be taking a 12-month moving average. If our data was quarterly, we would do a 4-quarter moving average. We’ve essentially done this in the third column of the table below.

Second, center the moving averages

Next, we center the moving averages by taking the average of each successive pair of moving averages; the result is shown in the fourth column.

Third, compute the ratio to moving average

To obtain the ratio to moving average, divide the number of orders for a given month by the centered 12-month moving average that corresponds to that month. Notice that July 2005 is the first month to have a centered 12-month moving average; that is because we lose data points when we take a moving average. For July 2005, we divide its number of orders, 12, by its centered 12-month moving average, 21.38, and get .561 (the ratios are multiplied by 100 and shown as percentages in this example).
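The July 2005 calculation can be sketched in a few lines, using the orders for January 2005 through January 2006 from the table above:

```python
import numpy as np

# Orders, Jan 2005 through Jan 2006, from Billie's table
x = np.array([15, 30, 25, 15, 13, 14, 12, 22, 20, 14, 35, 40, 18], float)

ma = np.convolve(x, np.ones(12) / 12, mode="valid")  # successive 12-month moving averages
centered = (ma[:-1] + ma[1:]) / 2                    # center between each successive pair
ratio = 100 * x[6] / centered[0]                     # Jul-05 orders vs. its centered MA
print(round(ma[0], 2), round(centered[0], 2), round(ratio, 1))  # 21.25 21.38 56.1
```

The same three steps, rolled across the whole series, produce every entry in the table that follows.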

Month    Orders   12-Month MA   Centered 12-Month MA   Ratio to MA (%)
Jan-05     15
Feb-05     30
Mar-05     25
Apr-05     15
May-05     13
Jun-05     14       21.25
Jul-05     12       21.50            21.38                 56.1
Aug-05     22       22.00            21.75                101.1
Sep-05     20       21.42            21.71                 92.1
Oct-05     14       22.67            22.04                 63.5
Nov-05     35       22.92            22.79                153.6
Dec-05     40       23.17            23.04                173.6
Jan-06     18       23.33            23.25                 77.4
Feb-06     36       23.67            23.50                153.2
Mar-06     18       24.00            23.83                 75.5
Apr-06     30       24.25            24.13                124.4
May-06     16       24.83            24.54                 65.2
Jun-06     17       25.50            25.17                 67.5
Jul-06     14       25.83            25.67                 54.5
Aug-06     26       26.42            26.13                 99.5
Sep-06     24       26.75            26.58                 90.3
Oct-06     17       27.25            27.00                 63.0
Nov-06     42       27.50            27.38                153.4
Dec-06     48       27.75            27.63                173.8
Jan-07     22       28.00            27.88                 78.9
Feb-07     43       28.42            28.21                152.4
Mar-07     22       28.83            28.63                 76.9
Apr-07     36       29.08            28.96                124.3
May-07     19       29.75            29.42                 64.6
Jun-07     20       30.58            30.17                 66.3
Jul-07     17       30.92            30.75                 55.3
Aug-07     31       31.67            31.29                 99.1
Sep-07     29       33.42            32.54                 89.1
Oct-07     20       32.67            33.04                 60.5
Nov-07     50       33.00            32.83                152.3
Dec-07     58       33.33            33.17                174.9
Jan-08     26       33.58            33.46                 77.7
Feb-08     52       34.08            33.83                153.7
Mar-08     43       34.58            34.33                125.2
Apr-08     27       34.92            34.75                 77.7
May-08     23       35.75            35.33                 65.1
Jun-08     24       36.75            36.25                 66.2
Jul-08     20       37.17            36.96                 54.1
Aug-08     37       38.00            37.58                 98.4
Sep-08     35       37.08            37.54                 93.2
Oct-08     24       39.17            38.13                 63.0
Nov-08     60       39.58            39.38                152.4
Dec-08     70       40.00            39.79                175.9
Jan-09     31       40.33            40.17                 77.2
Feb-09     62       40.92            40.63                152.6
Mar-09     32       41.50            41.21                 77.7
Apr-09     52       41.92            41.71                124.7
May-09     28       42.92            42.42                 66.0
Jun-09     29       44.08            43.50                 66.7
Jul-09     24
Aug-09     44
Sep-09     42
Oct-09     29
Nov-09     72
Dec-09     84

We have exactly 48 months of ratios to examine. Let’s plot each year’s ratios on a graph:

At first glance, it appears that there are only two lines on the graph, those for years three and four. However, all four years are represented. It’s just that all the turning points are the same, and the ratio to moving average for each month is nearly identical across years. The only exception is Year 3 (July 2007 to June 2008). Notice how the green line for Year 3 doesn’t follow the same pattern as the other years from February to April: Year 3’s ratio to moving average is actually higher for March than in the other years, and lower for April. This is because Easter Sunday fell in late March 2008, so the Easter gift basket season arrived a couple of weeks earlier than in prior years.

Finally, compute the average seasonal index for each month

We now have the ratio to moving averages for each month. Let’s average them:

RATIO TO MOVING AVERAGES

Month        Year 1   Year 2   Year 3   Year 4   Average
July           0.56     0.55     0.55     0.54      0.55
August         1.01     1.00     0.99     0.98      1.00
September      0.92     0.90     0.89     0.93      0.91
October        0.64     0.63     0.61     0.63      0.62
November       1.54     1.53     1.52     1.52      1.53
December       1.74     1.74     1.75     1.76      1.75
January        0.77     0.79     0.78     0.77      0.78
February       1.53     1.52     1.54     1.53      1.53
March          0.76     0.77     1.25     0.78      0.89
April          1.24     1.24     0.78     1.25      1.13
May            0.65     0.65     0.65     0.66      0.65
June           0.68     0.66     0.66     0.67      0.67

Hence, we see that August is a normal month (its average seasonal index is 1.00). However, look at December: its seasonal index is 1.75. That means that Billie’s December orders are generally 75 percent above the monthly average – that is, 175 percent of it. Given the Christmas gift-giving season, that’s expected in Billie’s gift basket business. We also notice higher seasonal indices in November (when the Christmas shopping season kicks off), February (Valentine’s Day), and April (Easter). The other months tend to be below average.

Notice that April isn’t far above the baseline, and that March had one year where its index was 1.25 (when in other years it was under 0.80). That’s because Easter sometimes falls in late March. Things like this are important to keep track of, since they can dramatically impact planning. Also, if a given month has five weekends one year and only four the next, or if leap year adds one day to February every four years, then depending on your business, these events can make a big difference in the accuracy of your forecasts.

The Cyclical and Irregular Components

Now that we’ve isolated the trend and seasonal components, we know that Billie’s orders exhibit an increasing trend and that orders tend to be above average during November, December, February, and April. Now we need to isolate the cyclical and irregular components. Cyclical variations don’t repeat themselves in a regular pattern, but they are not random variations either. Cyclical patterns are recognizable, but they almost always vary in intensity (the height from peak to trough) and timing (the frequency with which the peaks and troughs occur). Since they cannot be accurately predicted, they are often analyzed together with the irregular component.

The way we isolate the cyclical and irregular components is by first isolating the trend and seasonal components like we did above. So we take our trend regression equation from above, plug in each month’s sequence number to get the trend value. Then we multiply it by that month’s average seasonal ratio to moving average to derive the statistical normal. To derive the cyclical/irregular component, we divide the actual orders for that month by the statistical normal. The following table shows us how:

| Month | Orders (Y) | Time Period (t) | Trend Value (T) | Seasonal Index Ratio (S) | Statistical Normal (T*S) | Cyclical-Irregular Component, % (100*Y/(T*S)) |
|---|---|---|---|---|---|---|
| Jan-05 | 15 | 1 | 16 | 0.78 | 12.72 | 117.92 |
| Feb-05 | 30 | 2 | 17 | 1.53 | 25.80 | 116.27 |
| Mar-05 | 25 | 3 | 17 | 0.89 | 15.44 | 161.91 |
| Apr-05 | 15 | 4 | 18 | 1.13 | 20.19 | 74.31 |
| May-05 | 13 | 5 | 18 | 0.65 | 12.01 | 108.20 |
| Jun-05 | 14 | 6 | 19 | 0.67 | 12.63 | 110.86 |
| Jul-05 | 12 | 7 | 19 | 0.55 | 10.71 | 112.09 |
| Aug-05 | 22 | 8 | 20 | 1.00 | 19.88 | 110.64 |
| Sep-05 | 20 | 9 | 20 | 0.91 | 18.69 | 107.02 |
| Oct-05 | 14 | 10 | 21 | 0.62 | 13.13 | 106.63 |
| Nov-05 | 35 | 11 | 22 | 1.53 | 32.92 | 106.31 |
| Dec-05 | 40 | 12 | 22 | 1.75 | 38.48 | 103.95 |
| Jan-06 | 18 | 13 | 23 | 0.78 | 17.56 | 102.52 |
| Feb-06 | 36 | 14 | 23 | 1.53 | 35.31 | 101.94 |
| Mar-06 | 18 | 15 | 24 | 0.89 | 20.96 | 85.86 |
| Apr-06 | 30 | 16 | 24 | 1.13 | 27.20 | 110.30 |
| May-06 | 16 | 17 | 25 | 0.65 | 16.07 | 99.57 |
| Jun-06 | 17 | 18 | 25 | 0.67 | 16.77 | 101.34 |
| Jul-06 | 14 | 19 | 26 | 0.55 | 14.13 | 99.10 |
| Aug-06 | 26 | 20 | 26 | 1.00 | 26.07 | 99.72 |
| Sep-06 | 24 | 21 | 27 | 0.91 | 24.36 | 98.53 |
| Oct-06 | 17 | 22 | 27 | 0.62 | 17.02 | 99.91 |
| Nov-06 | 42 | 23 | 28 | 1.53 | 42.43 | 98.99 |
| Dec-06 | 48 | 24 | 28 | 1.75 | 49.33 | 97.30 |
| Jan-07 | 22 | 25 | 29 | 0.78 | 22.40 | 98.23 |
| Feb-07 | 43 | 26 | 29 | 1.53 | 44.83 | 95.92 |
| Mar-07 | 22 | 27 | 30 | 0.89 | 26.49 | 83.06 |
| Apr-07 | 36 | 28 | 30 | 1.13 | 34.21 | 105.23 |
| May-07 | 19 | 29 | 31 | 0.65 | 20.13 | 94.41 |
| Jun-07 | 20 | 30 | 31 | 0.67 | 20.92 | 95.60 |
| Jul-07 | 17 | 31 | 32 | 0.55 | 17.55 | 96.88 |
| Aug-07 | 31 | 32 | 32 | 1.00 | 32.26 | 96.08 |
| Sep-07 | 29 | 33 | 33 | 0.91 | 30.03 | 96.58 |
| Oct-07 | 20 | 34 | 33 | 0.62 | 20.90 | 95.69 |
| Nov-07 | 50 | 35 | 34 | 1.53 | 51.94 | 96.27 |
| Dec-07 | 58 | 36 | 34 | 1.75 | 60.19 | 96.37 |
| Jan-08 | 26 | 37 | 35 | 0.78 | 27.23 | 95.47 |
| Feb-08 | 52 | 38 | 36 | 1.53 | 54.34 | 95.70 |
| Mar-08 | 43 | 39 | 36 | 0.89 | 32.01 | 134.34 |
| Apr-08 | 27 | 40 | 37 | 1.13 | 41.22 | 65.50 |
| May-08 | 23 | 41 | 37 | 0.65 | 24.18 | 95.12 |
| Jun-08 | 24 | 42 | 38 | 0.67 | 25.07 | 95.75 |
| Jul-08 | 20 | 43 | 38 | 0.55 | 20.97 | 95.38 |
| Aug-08 | 37 | 44 | 39 | 1.00 | 38.45 | 96.22 |
| Sep-08 | 35 | 45 | 39 | 0.91 | 35.70 | 98.05 |
| Oct-08 | 24 | 46 | 40 | 0.62 | 24.79 | 96.83 |
| Nov-08 | 60 | 47 | 40 | 1.53 | 61.44 | 97.65 |
| Dec-08 | 70 | 48 | 41 | 1.75 | 71.04 | 98.54 |
| Jan-09 | 31 | 49 | 41 | 0.78 | 32.07 | 96.66 |
| Feb-09 | 62 | 50 | 42 | 1.53 | 63.85 | 97.10 |
| Mar-09 | 32 | 51 | 42 | 0.89 | 37.53 | 85.26 |
| Apr-09 | 52 | 52 | 43 | 1.13 | 48.23 | 107.81 |
| May-09 | 28 | 53 | 43 | 0.65 | 28.24 | 99.16 |
| Jun-09 | 29 | 54 | 44 | 0.67 | 29.21 | 99.27 |
| Jul-09 | 24 | 55 | 44 | 0.55 | 24.39 | 98.40 |
| Aug-09 | 44 | 56 | 45 | 1.00 | 44.64 | 98.56 |
| Sep-09 | 42 | 57 | 45 | 0.91 | 41.37 | 101.53 |
| Oct-09 | 29 | 58 | 46 | 0.62 | 28.67 | 101.14 |
| Nov-09 | 72 | 59 | 46 | 1.53 | 70.95 | 101.48 |
| Dec-09 | 84 | 60 | 47 | 1.75 | 81.89 | 102.58 |
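
The arithmetic behind the table can be sketched in a few lines of Python. This is an illustration, not Billie’s actual worksheet; note that the trend column in the table is rounded, so figures computed from it will differ slightly in the second decimal place from the table’s unrounded results.

```python
# Sketch of the decomposition arithmetic:
#   statistical normal       = T * S
#   cyclical-irregular (%)   = 100 * Y / (T * S)
# Jan-05 values (Y=15, T=16, S=0.78) are taken from the table above;
# the table's trend column is rounded, so results differ slightly
# from its unrounded figures (12.72 and 117.92).

def cyclical_irregular(orders, trend, seasonal_index):
    """Return (statistical normal, cyclical-irregular %) for one month."""
    statistical_normal = trend * seasonal_index
    return statistical_normal, 100.0 * orders / statistical_normal

normal, ci = cyclical_irregular(orders=15, trend=16, seasonal_index=0.78)
print(round(normal, 2), round(ci, 2))  # 12.48 120.19
```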

 

For the most part, Billie’s orders don’t seem to exhibit much cyclical or irregular behavior: in most months, the cyclical-irregular ratio is pretty close to 100. Given her kind of business, we would expect this to be either misleading or a fluke, since the recession of 2008 through 2009 would likely have meant a reduction in orders; in many of those months, we would expect to see ratios well below 100. We do see that in much of 2005, the cyclical-irregular component of Billie’s gift basket orders is well above 100. It is very likely that in that year, Billie’s business was seeing a positive cyclical pattern. We then see irregular spikes in March and April of later years, where the cyclical-irregular component is also well above 100. That again reflects the irregularity of when Easter falls. Not surprisingly, Easter has both a seasonal and an irregular component!

This does not mean that Billie can kick up her feet, resting assured that her business doesn’t suffer much from cyclical or irregular patterns. A deepening of the recession could ultimately sink her orders; a war could cut off the materials used to produce her gift baskets; a shortage or drastic price increase in those materials could force her prices higher, which in turn lowers her orders; her workshop could be destroyed in a flood or fire; and so on. To handle some of these irregular events, which are almost impossible to plan for, Billie would purchase insurance.

**********************************

Knowing the composition of a time series is an important element of forecasting. Decomposing the time series helps decision makers understand and explain the variability in their data and how much of it to attribute to the trend, seasonal, cyclical, and irregular components. In next week’s Forecast Friday post, we’ll discuss how to forecast using seasonally adjusted data.

Forecast Friday Topic: Detecting Autocorrelation

July 29, 2010

(Fifteenth in a series)

We have spent the last few Forecast Friday posts discussing violations of different assumptions in regression analysis. So far, we have discussed the effects of specification bias and multicollinearity on parameter estimates, and their corresponding effect on your forecasts. Today, we will discuss another violation, autocorrelation, which occurs when sequential residual (error) terms are correlated with one another.

When working with time series data, autocorrelation is the most common problem forecasters face. When the assumption of uncorrelated residuals is violated, we end up with models that have inefficient parameter estimates and upwardly biased t-ratios and R2 values. These inflated values make our forecasting model appear better than it really is and can cause our model to miss turning points. Hence, if your model predicts an increase in sales but sales actually plunge, autocorrelation may be the culprit.

What Does Autocorrelation Look Like?

Autocorrelation can take on two types: positive or negative. In positive autocorrelation, consecutive errors usually have the same sign: positive residuals are almost always followed by positive residuals, while negative residuals are almost always followed by negative residuals. In negative autocorrelation, consecutive errors typically have opposite signs: positive residuals are almost always followed by negative residuals and vice versa.

In addition, there are different orders of autocorrelation. The simplest, most common kind of autocorrelation, first-order autocorrelation, occurs when the consecutive errors are correlated. Second-order autocorrelation occurs when error terms two periods apart are correlated, and so forth. Here, we will concentrate solely on first-order autocorrelation.

You will see a visual depiction of positive autocorrelation later in this post.

What Causes Autocorrelation?

The two main culprits for autocorrelation are sluggishness in the business cycle (also known as inertia) and omitted variables from the model. At various turning points in a time series, inertia is very common. At the time when a time series turns upward (downward), its observations build (lose) momentum, and continue going up (down) until the series reaches its peak (trough). As a result, successive observations and the error terms associated with them depend on each other.

Another example of inertia happens when forecasting a time series where the same observations can be in multiple successive periods. For example, I once developed a model to forecast enrollment for a community college, and found autocorrelation to be present in my initial model. This happened because many of the students enrolled during the spring term were also enrolled in the previous fall term. As a result, I needed to correct for that.

The other main cause of autocorrelation is omitted variables from the model. When an important independent variable is omitted from a model, its effect on the dependent variable becomes part of the error term. Hence, if the omitted variable has a positive correlation with the dependent variable, it is likely to cause error terms that are positively correlated.

How Do We Detect Autocorrelation?

To illustrate how we go about detecting autocorrelation, let’s first start with a data set. I have pulled the average hourly wages of textile and apparel workers for the 18 months from January 1986 through June 1987. The original source was the Survey of Current Business, September issues from 1986 and 1987, but this data set was reprinted in Data Analysis Using Microsoft ® Excel, by Michael R. Middleton, page 219:

| Month | t | Wage |
|---|---|---|
| Jan-86 | 1 | 5.82 |
| Feb-86 | 2 | 5.79 |
| Mar-86 | 3 | 5.80 |
| Apr-86 | 4 | 5.81 |
| May-86 | 5 | 5.78 |
| Jun-86 | 6 | 5.79 |
| Jul-86 | 7 | 5.79 |
| Aug-86 | 8 | 5.83 |
| Sep-86 | 9 | 5.91 |
| Oct-86 | 10 | 5.87 |
| Nov-86 | 11 | 5.87 |
| Dec-86 | 12 | 5.90 |
| Jan-87 | 13 | 5.94 |
| Feb-87 | 14 | 5.93 |
| Mar-87 | 15 | 5.93 |
| Apr-87 | 16 | 5.94 |
| May-87 | 17 | 5.89 |
| Jun-87 | 18 | 5.91 |

Now, let’s run a simple regression model, using time period t as the independent variable and Wage as the dependent variable. Using the data set above, we derive the following model:

Ŷ = 5.7709 + 0.0095t

Examine the Model Output

Notice also the following model diagnostic statistics:

R2 = 0.728

| Variable | Coefficient | t-ratio |
|---|---|---|
| Intercept | 5.7709 | 367.62 |
| t | 0.0095 | 6.55 |

 

You can see that the R2 is high, with changes in t explaining nearly three-quarters of the variation in average hourly wage. Note also the t-ratios for both the intercept and the parameter estimate for t; both are very high. Recall that a high R2 and high t-ratios are symptoms of autocorrelation.
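
As a sanity check, this regression output can be reproduced with a few lines of ordinary-least-squares arithmetic. This is a sketch in plain Python, with the wage figures typed in from the table above:

```python
# Refit the simple trend regression Wage = b0 + b1*t by ordinary least
# squares, using the 18 monthly wages from the table above.

wages = [5.82, 5.79, 5.80, 5.81, 5.78, 5.79, 5.79, 5.83, 5.91,
         5.87, 5.87, 5.90, 5.94, 5.93, 5.93, 5.94, 5.89, 5.91]
periods = list(range(1, len(wages) + 1))  # t = 1..18

n = len(wages)
t_bar = sum(periods) / n
y_bar = sum(wages) / n
s_ty = sum((t - t_bar) * (y - y_bar) for t, y in zip(periods, wages))
s_tt = sum((t - t_bar) ** 2 for t in periods)
b1 = s_ty / s_tt               # slope on the time period
b0 = y_bar - b1 * t_bar        # intercept
s_yy = sum((y - y_bar) ** 2 for y in wages)
r_squared = b1 * s_ty / s_yy   # coefficient of determination

print(round(b0, 4), round(b1, 4), round(r_squared, 3))  # 5.7709 0.0095 0.728
```

The coefficients and R2 match the model output shown above, confirming the fit.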

Visually Inspect Residuals

Just because a model has a high R2 and parameters with high t-ratios doesn’t mean autocorrelation is present; more work must be done to detect it. Another way to check for autocorrelation is to visually inspect the residuals. The best way to do this is by plotting the average hourly wage predicted by the model against the actual average hourly wage, as Middleton has done:

Notice the green line representing the Predicted Wage. It is a straight, upward line. This is to be expected, since the independent variable is sequential and shows an increasing trend. The red line depicts the actual wage in the time series. Notice that the model’s forecast is higher than actual for months 5 through 8, and for months 17 and 18. The model also underpredicts for months 12 through 16. This clearly illustrates the presence of positive, first-order autocorrelation.

The Durbin-Watson Statistic

Examining the model components and visually inspecting the residuals are intuitive, but not definitive ways to diagnose autocorrelation. To really be sure if autocorrelation exists, we must compute the Durbin-Watson statistic, often denoted as d.

In our June 24 Forecast Friday post, we demonstrated how to calculate the Durbin-Watson statistic. The actual formula is:

d = Σ(e_t − e_{t−1})² / Σ(e_t)²

where the numerator sums from t = 2 to n and the denominator sums from t = 1 to n. That is, beginning with the error term for the second observation, we subtract the immediately previous error term from it; then we square the difference. We do this for each observation from the second one onward, and sum all of those squared differences together. Next, we square the error terms for each observation and sum those together. Then we divide the sum of squared differences by the sum of squared error terms to get our Durbin-Watson statistic.

For our example, we have the following:

| t | Error | Squared Error | e_t − e_{t−1} | Squared Difference |
|---|---|---|---|---|
| 1 | 0.0396 | 0.0016 | | |
| 2 | 0.0001 | 0.0000 | (0.0395) | 0.0016 |
| 3 | 0.0006 | 0.0000 | 0.0005 | 0.0000 |
| 4 | 0.0011 | 0.0000 | 0.0005 | 0.0000 |
| 5 | (0.0384) | 0.0015 | (0.0395) | 0.0016 |
| 6 | (0.0379) | 0.0014 | 0.0005 | 0.0000 |
| 7 | (0.0474) | 0.0022 | (0.0095) | 0.0001 |
| 8 | (0.0169) | 0.0003 | 0.0305 | 0.0009 |
| 9 | 0.0536 | 0.0029 | 0.0705 | 0.0050 |
| 10 | 0.0041 | 0.0000 | (0.0495) | 0.0024 |
| 11 | (0.0054) | 0.0000 | (0.0095) | 0.0001 |
| 12 | 0.0152 | 0.0002 | 0.0205 | 0.0004 |
| 13 | 0.0457 | 0.0021 | 0.0305 | 0.0009 |
| 14 | 0.0262 | 0.0007 | (0.0195) | 0.0004 |
| 15 | 0.0167 | 0.0003 | (0.0095) | 0.0001 |
| 16 | 0.0172 | 0.0003 | 0.0005 | 0.0000 |
| 17 | (0.0423) | 0.0018 | (0.0595) | 0.0035 |
| 18 | (0.0318) | 0.0010 | 0.0105 | 0.0001 |
| Sum: | | 0.0163 | | 0.0171 |

 

To obtain our Durbin-Watson statistic, we plug our sums into the formula:

d = 0.0171 / 0.0163 ≈ 1.050
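
The same calculation is easy to script. The sketch below refits the trend regression on the wage data from scratch and computes d from the resulting residuals; small differences from the hand-computed table are expected because the table’s entries are rounded to four decimal places.

```python
# Compute the Durbin-Watson statistic
#   d = sum((e_t - e_{t-1})^2, t=2..n) / sum(e_t^2, t=1..n)
# from the residuals of the trend regression on the wage data above.

wages = [5.82, 5.79, 5.80, 5.81, 5.78, 5.79, 5.79, 5.83, 5.91,
         5.87, 5.87, 5.90, 5.94, 5.93, 5.93, 5.94, 5.89, 5.91]
periods = list(range(1, len(wages) + 1))

# Ordinary least squares fit: Wage = b0 + b1*t
n = len(wages)
t_bar = sum(periods) / n
y_bar = sum(wages) / n
b1 = (sum((t - t_bar) * (y - y_bar) for t, y in zip(periods, wages))
      / sum((t - t_bar) ** 2 for t in periods))
b0 = y_bar - b1 * t_bar

# Residuals, then the Durbin-Watson ratio
residuals = [y - (b0 + b1 * t) for t, y in zip(periods, wages)]
numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
denominator = sum(e ** 2 for e in residuals)
d = numerator / denominator
print(round(d, 2))  # about 1.05, matching the hand computation
```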

What Does the Durbin-Watson Statistic Tell Us?

Our Durbin-Watson statistic is 1.050. What does that mean? The Durbin-Watson statistic is interpreted as follows:

  • If d is close to zero (0), then positive autocorrelation is probably present;
  • If d is close to two (2), then the model is likely free of autocorrelation; and
  • If d is close to four (4), then negative autocorrelation is probably present.
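
These rules of thumb can be encoded as a rough helper. The cutoffs below (1.5 and 2.5) are illustrative assumptions, not part of the original post; a rigorous test compares d against the Durbin-Watson critical bounds dL and dU for the given sample size and number of regressors.

```python
def interpret_durbin_watson(d):
    """Rough rule-of-thumb reading of the Durbin-Watson statistic.

    The 1.5/2.5 cutoffs are illustrative assumptions; a formal test
    uses the dL/dU critical bounds for the sample at hand.
    """
    if d < 1.5:
        return "positive autocorrelation likely"
    if d > 2.5:
        return "negative autocorrelation likely"
    return "little evidence of first-order autocorrelation"

print(interpret_durbin_watson(1.05))  # positive autocorrelation likely
```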

As we saw from our visual examination of the residuals, we appear to have positive autocorrelation, and the fact that our Durbin-Watson statistic is about halfway between zero and two suggests the presence of positive autocorrelation.

Next Forecast Friday Topic: Correcting Autocorrelation

Today we went through the causes and effects of autocorrelation and how to detect its presence. Next week, we will discuss how to correct for autocorrelation and eliminate it so that we can have more efficient parameter estimates.

*************************

If you Like Our Posts, Then “Like” Us on Facebook and Twitter!

Analysights is now doing the social media thing! If you like Forecast Friday – or any of our other posts – then we want you to “Like” us on Facebook! By “Like-ing” us on Facebook, you’ll be informed every time a new blog post has been published, or when other information comes out. Check out our Facebook page! You can also follow us on Twitter.