Forecast Friday Topic: Seasonal Dummy Variables

(Twenty-third in a series)

Last week, I introduced you to the use of dummy variables as a means of incorporating qualitative information into a regression model. Dummy variables can also be used to account for seasonality. A couple of weeks ago, we discussed adjusting your data for seasonality before constructing your model. As you saw, that could be pretty time-consuming. A faster approach is to take the raw time series data and add a dummy variable for each season of the year less one. So, if you’re working with quarterly data, you would use three dummy variables; if you have monthly data, you would add 11 dummy variables.

For example, the fourth quarter of the year is often the busiest for most retailers. If a retail chain didn’t seasonally adjust its data, it might choose to create three dummy variables: D1, D2, and D3. The first quarter of the year would be D1; the second quarter, D2; and the third quarter, D3. As we discussed last week, we always want to have one fewer dummy variable than we have outcomes, so the fourth quarter gets no dummy of its own and serves as the baseline. In our example, if the fourth quarter really is the busiest quarter, then we would expect the coefficients on all three dummy variables to be significant and negative.
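To make the quarterly coding concrete, here is a minimal Python sketch. The function name `quarter_dummies` and its `baseline` parameter are my own illustrative choices, not something from a particular library; the idea is simply that the baseline quarter gets zeros everywhere.

```python
def quarter_dummies(quarter, baseline=4):
    """Return the dummy values [D1, D2, D3] for a quarter numbered 1 to 4.

    The baseline quarter (the fourth, by assumption here) has no dummy
    of its own, so it is coded as all zeros.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return [1 if quarter == q else 0 for q in (1, 2, 3, 4) if q != baseline]


# Show the coding for each quarter of the year.
for q in (1, 2, 3, 4):
    print(f"Q{q}: {quarter_dummies(q)}")
```

With the fourth quarter as the baseline, each dummy coefficient in the regression is read as the difference between that quarter and the fourth quarter.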

Revisiting Billie Burton

A couple of weeks ago, while discussing how to decompose a time series, I used the example of Billie Burton, a businesswoman who makes gift baskets. Billie had been trying to forecast orders for planning and budgeting purposes. She had five years of monthly order data:

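TOTAL GIFT BASKET ORDERS

| Month     | 2005 | 2006 | 2007 | 2008 | 2009 |
|-----------|------|------|------|------|------|
| January   | 15   | 18   | 22   | 26   | 31   |
| February  | 30   | 36   | 43   | 52   | 62   |
| March     | 25   | 18   | 22   | 43   | 32   |
| April     | 15   | 30   | 36   | 27   | 52   |
| May       | 13   | 16   | 19   | 23   | 28   |
| June      | 14   | 17   | 20   | 24   | 29   |
| July      | 12   | 14   | 17   | 20   | 24   |
| August    | 22   | 26   | 31   | 37   | 44   |
| September | 20   | 24   | 29   | 35   | 42   |
| October   | 14   | 17   | 20   | 24   | 29   |
| November  | 35   | 42   | 50   | 60   | 72   |
| December  | 40   | 48   | 58   | 70   | 84   |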

You recall the painstaking effort we went through to adjust Billie’s orders for seasonality. Is there a simpler way? Yes. We can use dummy variables. Let’s first assume Billie ran her regression on the data just as it is, with no adjustment for seasonality. She ends up with the following regression equation:

Ŷ = 0.518t + 15.829

This model suggests an upward trend with each passing month but doesn’t fit the data quite as well as we would like: R² is just 0.313 and the F-statistic is just 26.47.
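If you would rather check the trend-only model outside of Excel, the following is a rough sketch in plain Python. It uses the closed-form formulas for a one-variable regression; the variable names are mine, and the data are simply Billie's orders from the table, arranged one list per year.

```python
# Billie's monthly orders, one list per year (January through December).
orders_by_year = {
    2005: [15, 30, 25, 15, 13, 14, 12, 22, 20, 14, 35, 40],
    2006: [18, 36, 18, 30, 16, 17, 14, 26, 24, 17, 42, 48],
    2007: [22, 43, 22, 36, 19, 20, 17, 31, 29, 20, 50, 58],
    2008: [26, 52, 43, 27, 23, 24, 20, 37, 35, 24, 60, 70],
    2009: [31, 62, 32, 52, 28, 29, 24, 44, 42, 29, 72, 84],
}

# Flatten into one series of 60 observations, with t running from 1 to 60.
y = [v for year in sorted(orders_by_year) for v in orders_by_year[year]]
n = len(y)
t = list(range(1, n + 1))

t_mean = sum(t) / n
y_mean = sum(y) / n

# Closed-form least squares estimates for a single independent variable.
sxx = sum((ti - t_mean) ** 2 for ti in t)
sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
slope = sxy / sxx
intercept = y_mean - slope * t_mean

# Sums of squares, R-squared, and the F-statistic (1 independent variable).
fitted = [intercept + slope * ti for ti in t]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_mean) ** 2 for yi in y)
ssr = sst - sse
r_squared = ssr / sst
f_stat = (ssr / 1) / (sse / (n - 2))

print(f"Y-hat = {slope:.3f}t + {intercept:.3f}")
print(f"R-squared = {r_squared:.3f}, F = {f_stat:.2f}")
```

The printed slope, intercept, R² and F-statistic should come out very close to the figures quoted above, allowing for rounding.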

Imagine now that Billie decides to use seasonal dummy variables. Since her data is monthly, Billie must use 11 dummy variables. Since December is her busiest month, Billie decides to make one dummy variable for each month from January to November. D1 is January; D2 is February; and so on until D11, which is November. Hence, in January, D1 will be flagged as a 1 and D2 to D11 will be 0. In February, D2 will equal 1 while all the other dummies will be zero. And so forth. Note that all dummies will be zero in December, which makes December the baseline month against which the others are compared.

Picture in your mind a table with 60 rows and 13 columns. Each row contains the monthly data from January 2005 to December 2009. The first column is the number of orders for the month; the second is the time period, t, which is 1 to 60. That is our independent variable from our original model. The next eleven columns are the dummy variables. Billie enters these into Excel and runs her regression. What does she get?
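For readers who prefer code to spreadsheets, the table described above could be laid out along these lines. This is only a sketch: `seasonal_row` is a name I made up, and it builds just the 12 predictor columns (t plus D1 to D11); the orders themselves would be the thirteenth column.

```python
def seasonal_row(t, baseline_month=12):
    """Return [t, D1, ..., D11] for period t, where t = 1 is January 2005.

    The baseline month (December, by assumption) gets no dummy, so its
    rows carry zeros in every dummy column.
    """
    month = (t - 1) % 12 + 1  # 1 = January, ..., 12 = December
    dummies = [1 if month == m else 0 for m in range(1, 13) if m != baseline_month]
    return [t] + dummies


# One row per month from January 2005 (t = 1) to December 2009 (t = 60).
design = [seasonal_row(t) for t in range(1, 61)]

# Print the first two years so the repeating pattern is visible.
for row in design[:24]:
    print(row)
```

Each January row has a 1 in the D1 position, each February row a 1 in the D2 position, and every December row is all zeros apart from t.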

I’m going to show you the resulting equation in tabular form, as it would look far too complicated in standard form. Billie gets the following output:

| Parameter       | Coefficients | t Stat |
|-----------------|--------------|--------|
| Intercept       | 42.93        | 15.93  |
| t               | 0.47         | 12.13  |
| D1 (January)    | -32.38       | -9.88  |
| D2 (February)   | -10.66       | -3.26  |
| D3 (March)      | -27.73       | -8.48  |
| D4 (April)      | -24.21       | -7.41  |
| D5 (May)        | -36.88       | -11.31 |
| D6 (June)       | -36.35       | -11.16 |
| D7 (July)       | -40.23       | -12.35 |
| D8 (August)     | -26.10       | -8.02  |
| D9 (September)  | -28.58       | -8.79  |
| D10 (October)   | -38.25       | -11.76 |
| D11 (November)  | -7.73        | -2.38  |

Billie gets a great model: notice that all the parameter estimates are significant, and the coefficients on all 11 dummy variables are negative, indicating December as the busiest month. Billie’s R² has now shot up from 0.313 to 0.919, indicating a much better fit. And the F-statistic is up to 44.73, and it is more significant.
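Billie used Excel, but the same regression can be sketched in plain Python. The version below is an illustration under my own assumptions: it solves the normal equations with a small hand-written Gaussian elimination routine (`solve` and `ols` are hypothetical helper names) so that nothing beyond the standard library is needed. In practice, a statistics package would be the more usual choice.

```python
# Billie's 60 monthly orders, January 2005 through December 2009.
y = [
    15, 30, 25, 15, 13, 14, 12, 22, 20, 14, 35, 40,
    18, 36, 18, 30, 16, 17, 14, 26, 24, 17, 42, 48,
    22, 43, 22, 36, 19, 20, 17, 31, 29, 20, 50, 58,
    26, 52, 43, 27, 23, 24, 20, 37, 35, 24, 60, 70,
    31, 62, 32, 52, 28, 29, 24, 44, 42, 29, 72, 84,
]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    size = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (m[i][size] - tail) / m[i][i]
    return x


def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Each row: constant, t, then D1..D11 (January..November); December is all zeros.
X = [[1, t] + [1 if (t - 1) % 12 + 1 == m else 0 for m in range(1, 12)]
     for t in range(1, 61)]
beta = ols(X, y)

# Fit statistics: k independent variables, n - k - 1 residual degrees of freedom.
n, k = len(y), len(beta) - 1
y_mean = sum(y) / n
fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
sst = sum((yi - y_mean) ** 2 for yi in y)
ssr = sst - sse
r_squared = ssr / sst
f_stat = (ssr / k) / (sse / (n - k - 1))

labels = ["Intercept", "t"] + [f"D{m}" for m in range(1, 12)]
for label, b in zip(labels, beta):
    print(f"{label:<10}{b:8.2f}")
print(f"R-squared = {r_squared:.3f}, F = {f_stat:.2f}")
```

The printed coefficients should line up with the table above to within rounding. This sketch does not compute the t statistics, which would also require the standard errors of the coefficients.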

How does this compare to Billie’s model on her seasonally-adjusted data? Recall that when doing her regressions on seasonally adjusted data, Billie got the following results:

Ŷ = 0.47t + 17.12

Her model had an R² of 0.872, but her F-statistic was almost 395! So, even though Billie gained a few more points in R² with the seasonal dummies, her F-statistic wasn’t nearly as large. However, Billie’s F-statistic using the dummy variables is still very strong, and I would argue more stable. Recall that the F-statistic is determined by dividing the mean squared error of the regression by the mean squared error of the residuals. The mean squared error of the regression is the sum of squares regression (SSR) divided by the number of independent variables in the model; the mean squared error of the residuals is the sum of squared errors (SSE) divided by the number of observations less the number of independent variables and less one more. To illustrate, here is a side-by-side comparison:

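|                                  | Seasonally Adjusted Model | Seasonal Dummy Model |
|----------------------------------|---------------------------|----------------------|
| # Observations                   | 60                        | 60                   |
| SSR                              | 3,982                     | 14,179               |
| # Independent Variables          | 1                         | 12                   |
| Mean Square Error of Regression  | 3,982                     | 1,182                |
| SSE                              | 585                       | 1,241                |
| Degrees of Freedom               | 58                        | 47                   |
| Mean Squared Error of Residuals  | 10.08                     | 26.41                |
| F-Statistic                      | 394.91                    | 44.73                |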

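So, although the F-statistic is much lower for the seasonal dummy model, much of that drop comes from dividing the regression sum of squares among 12 independent variables instead of one, which pulls the mean square error of the regression down from 3,982 to 1,182. The F-statistic is still quite significant, and I would argue the model is more stable than our one-variable model built on the seasonally-adjusted data.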
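The arithmetic behind the comparison above is easy to reproduce. Here is a short sketch; the function name `f_statistic` and its parameter names are my own, and the inputs are the rounded sums of squares from the comparison, so the results will differ slightly from the F-statistics Excel reported.

```python
def f_statistic(ssr, sse, n_obs, n_vars):
    """F = (SSR / number of independent variables) / (SSE / residual d.f.)."""
    ms_regression = ssr / n_vars
    residual_df = n_obs - n_vars - 1
    ms_residual = sse / residual_df
    return ms_regression / ms_residual


# Rounded sums of squares taken from the side-by-side comparison.
f_adjusted = f_statistic(ssr=3982, sse=585, n_obs=60, n_vars=1)
f_dummy = f_statistic(ssr=14179, sse=1241, n_obs=60, n_vars=12)

print(f"Seasonally adjusted model: F = {f_adjusted:.2f}")
print(f"Seasonal dummy model:      F = {f_dummy:.2f}")
```

Both values come out close to the reported 394.91 and 44.73; the small gaps are due to the rounding of SSR and SSE.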

It is important to note that sometimes data sets do not lend themselves well to seasonal dummies, and that the manual adjustment process we worked through a few weeks ago may be a better approach.

Next Forecast Friday Topic: Slope Dummy Variables

The dummy variables we worked with last week and this week are intercept dummies. These dummy variables alter the Y-intercept of the regression equation. Sometimes, it is necessary to affect the slope of the equation. We will discuss how slope dummies are used in next week’s Forecast Friday post.
