## Forecast Friday Topic: Seasonal Dummy Variables

(Twenty-third in a series)

Last week, I introduced you to the use of dummy variables as a means of incorporating qualitative information into a regression model. Dummy variables can also be used to account for seasonality. A couple of weeks ago, we discussed adjusting your data for seasonality before constructing your model. As you saw, that could be pretty time-consuming. One faster approach is to take the raw time series data and add a dummy variable for each season of the year less one. So, if you’re working with quarterly data, you would use three dummy variables; if you have monthly data, you would add 11 dummy variables.

For example, the fourth quarter of the year is often the busiest for most retailers. If a retail chain didn’t seasonally adjust its data, it might choose to create three dummy variables: D1, D2, and D3. D1 would flag the first quarter of the year; D2, the second quarter; and D3, the third quarter. As we discussed last week, we always want to have one fewer dummy variable than we do outcomes. The fourth quarter, which gets no dummy of its own, serves as the baseline. In our example, if we know the fourth quarter is the busiest quarter, then we would expect the coefficients on our three dummy variables to be significant and negative.
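To make the quarterly coding concrete, here is a minimal Python sketch. The function name `quarter_dummies` is my own invention for illustration; the only thing taken from the discussion above is the rule that the fourth quarter is the baseline and gets all zeros.

```python
def quarter_dummies(quarter):
    """Return (D1, D2, D3) for a quarter numbered 1 to 4.

    The fourth quarter is the baseline, so all three dummies are 0 there.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    # Each dummy is 1 only in its own quarter.
    return tuple(1 if quarter == q else 0 for q in (1, 2, 3))


for q in (1, 2, 3, 4):
    print(f"Q{q}: (D1, D2, D3) = {quarter_dummies(q)}")
```

Because the fourth quarter is the row of zeros, each dummy's coefficient is read as the difference between that quarter and the fourth quarter.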

Revisiting Billie Burton

A couple of weeks ago, while discussing how to decompose a time series, I used the example of Billie Burton, a businesswoman who makes gift baskets. Billie had been trying to forecast orders for planning and budgeting purposes. She had five years of monthly order data:

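Total Gift Basket Orders

| Month | 2005 | 2006 | 2007 | 2008 | 2009 |
|---|---|---|---|---|---|
| January | 15 | 18 | 22 | 26 | 31 |
| February | 30 | 36 | 43 | 52 | 62 |
| March | 25 | 18 | 22 | 43 | 32 |
| April | 15 | 30 | 36 | 27 | 52 |
| May | 13 | 16 | 19 | 23 | 28 |
| June | 14 | 17 | 20 | 24 | 29 |
| July | 12 | 14 | 17 | 20 | 24 |
| August | 22 | 26 | 31 | 37 | 44 |
| September | 20 | 24 | 29 | 35 | 42 |
| October | 14 | 17 | 20 | 24 | 29 |
| November | 35 | 42 | 50 | 60 | 72 |
| December | 40 | 48 | 58 | 70 | 84 |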

You recall the painstaking effort we went through to adjust Billie’s orders for seasonality. Is there a simpler way? Yes. We can use dummy variables. Let’s first assume Billie ran her regression on the data just as it is, with no adjustment for seasonality. She ends up with the following regression equation:

Ŷ = 0.518t + 15.829

This model suggests an upward trend of about half an order with each passing month, but it doesn’t fit the data as well as we would like: R² is just 0.313 and the F-statistic is just 26.47.
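If you would rather check the trend-only model outside of Excel, the following is a rough sketch using NumPy's least-squares solver. The order values are typed in from the table above in chronological order; the variable names are mine, and this is just one way to reproduce the fit, not necessarily how Billie ran it.

```python
import numpy as np

# Monthly orders, January 2005 through December 2009 (one row per year).
orders = np.array([
    15, 30, 25, 15, 13, 14, 12, 22, 20, 14, 35, 40,   # 2005
    18, 36, 18, 30, 16, 17, 14, 26, 24, 17, 42, 48,   # 2006
    22, 43, 22, 36, 19, 20, 17, 31, 29, 20, 50, 58,   # 2007
    26, 52, 43, 27, 23, 24, 20, 37, 35, 24, 60, 70,   # 2008
    31, 62, 32, 52, 28, 29, 24, 44, 42, 29, 72, 84,   # 2009
], dtype=float)

n = orders.size
t = np.arange(1, n + 1)                    # time period 1..60

# Design matrix: a column of ones for the intercept, then t.
X = np.column_stack([np.ones(n), t])
coef, *_ = np.linalg.lstsq(X, orders, rcond=None)
intercept, slope = coef

fitted = X @ coef
sse = float(np.sum((orders - fitted) ** 2))          # sum of squared error
ssr = float(np.sum((fitted - orders.mean()) ** 2))   # sum of squares regression
r_squared = ssr / (ssr + sse)
f_stat = (ssr / 1) / (sse / (n - 1 - 1))             # one independent variable

print(f"Y-hat = {slope:.3f}t + {intercept:.3f}")
print(f"R-squared = {r_squared:.3f}, F = {f_stat:.2f}")
```

The slope, intercept, R², and F-statistic should land very close to the figures quoted above, allowing for rounding.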

Imagine now that Billie decides to use seasonal dummy variables. Since her data is monthly, Billie must use 11 dummy variables. Since December is her busiest month, Billie decides to make it the baseline and create one dummy variable for each month from January to November. D1 is January; D2 is February; and so on until D11, which is November. Hence, in January, D1 will be flagged as a 1 and D2 to D11 will be 0. In February, D2 will equal 1 while all the other dummies will be zero. And so forth. Note that all dummies will be zero in December.

Picture in your mind a table with 60 rows and 13 columns. Each row contains the monthly data from January 2005 to December 2009. The first column is the number of orders for the month; the second is the time period, t, which is 1 to 60. That is our independent variable from our original model. The next eleven columns are the dummy variables. Billie enters these into Excel and runs her regression. What does she get?
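For readers who prefer code to a mental picture, here is a small sketch of the regressor columns of that table: t plus the 11 monthly dummies. Adding the orders column in front would give the 13 columns described above. The variable names are hypothetical, and I am assuming period 1 is January 2005, so the month can be recovered from t.

```python
import numpy as np

n_periods = 60
t = np.arange(1, n_periods + 1)        # 1 = January 2005 ... 60 = December 2009
month = (t - 1) % 12 + 1               # 1 = January ... 12 = December

# One column per month from January (D1) to November (D11).
# December matches none of the columns, so its rows are all zeros.
dummies = (month[:, None] == np.arange(1, 12)).astype(int)

regressors = np.column_stack([t, dummies])
print(regressors.shape)                # (60, 12)
print(regressors[0])                   # January 2005: t = 1, D1 = 1
print(regressors[11])                  # December 2005: t = 12, all dummies 0
```

Each row has at most one dummy switched on, which is exactly the flagging scheme described for D1 through D11.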

I’m going to show you the resulting equation in tabular form, as it would look far too complicated in standard form. Billie gets the following output:

| Parameter | Coefficient | t Stat |
|---|---|---|
| Intercept | 42.93 | 15.93 |
| t | 0.47 | 12.13 |
| D1 (January) | -32.38 | -9.88 |
| D2 (February) | -10.66 | -3.26 |
| D3 (March) | -27.73 | -8.48 |
| D4 (April) | -24.21 | -7.41 |
| D5 (May) | -36.88 | -11.31 |
| D6 (June) | -36.35 | -11.16 |
| D7 (July) | -40.23 | -12.35 |
| D8 (August) | -26.10 | -8.02 |
| D9 (September) | -28.58 | -8.79 |
| D10 (October) | -38.25 | -11.76 |
| D11 (November) | -7.73 | -2.38 |
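To see how the table turns into a forecast, here is a short sketch that plugs the rounded coefficients into the equation: intercept, plus the trend times t, plus the dummy coefficient for the month in question. The function name `forecast_orders` is made up for this illustration, and treating January 2010 as t = 61 is my assumption that the time index simply keeps counting.

```python
# Rounded coefficients copied from the regression output.
INTERCEPT = 42.93
TREND = 0.47
MONTH_EFFECTS = {
    1: -32.38, 2: -10.66, 3: -27.73, 4: -24.21,
    5: -36.88, 6: -36.35, 7: -40.23, 8: -26.10,
    9: -28.58, 10: -38.25, 11: -7.73,
    12: 0.0,   # December is the baseline, so every dummy is zero
}


def forecast_orders(t):
    """Forecast orders for time period t (1 = January 2005)."""
    month = (t - 1) % 12 + 1
    return INTERCEPT + TREND * t + MONTH_EFFECTS[month]


print(round(forecast_orders(61), 2))   # January 2010
print(round(forecast_orders(72), 2))   # December 2010
```

With these rounded figures, January 2010 comes out to roughly 39 orders and December 2010 to roughly 77. Unrounded coefficients would shift those numbers slightly.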

Billie gets a great model: notice that all the parameter estimates are significant, and the dummy coefficients are all negative, confirming December as the busiest month. Billie’s R² has now shot up from 0.313 to 0.919, indicating a much better fit. And the F-statistic is up from 26.47 to 44.73, so the model as a whole is more significant.
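The seasonal dummy regression can also be reproduced outside of Excel. The sketch below is one possible way to do it with NumPy, under the same assumptions as before: the orders are typed in from the table in chronological order, period 1 is January 2005, and December is the baseline. The variable names are mine.

```python
import numpy as np

# Monthly orders, January 2005 through December 2009 (one row per year).
orders = np.array([
    15, 30, 25, 15, 13, 14, 12, 22, 20, 14, 35, 40,   # 2005
    18, 36, 18, 30, 16, 17, 14, 26, 24, 17, 42, 48,   # 2006
    22, 43, 22, 36, 19, 20, 17, 31, 29, 20, 50, 58,   # 2007
    26, 52, 43, 27, 23, 24, 20, 37, 35, 24, 60, 70,   # 2008
    31, 62, 32, 52, 28, 29, 24, 44, 42, 29, 72, 84,   # 2009
], dtype=float)

n = orders.size
t = np.arange(1, n + 1)
month = (t - 1) % 12 + 1

# D1..D11 for January..November; December rows are all zeros.
dummies = (month[:, None] == np.arange(1, 12)).astype(float)
X = np.column_stack([np.ones(n), t, dummies])

coef, *_ = np.linalg.lstsq(X, orders, rcond=None)
fitted = X @ coef

k = X.shape[1] - 1                                   # 12 independent variables
sse = float(np.sum((orders - fitted) ** 2))
ssr = float(np.sum((fitted - orders.mean()) ** 2))
r_squared = ssr / (ssr + sse)
f_stat = (ssr / k) / (sse / (n - k - 1))

labels = ["Intercept", "t"] + [f"D{i}" for i in range(1, 12)]
for label, value in zip(labels, coef):
    print(f"{label:>9}: {value:8.2f}")
print(f"R-squared = {r_squared:.3f}, F = {f_stat:.2f}")
```

The printed coefficients should line up with the table above, apart from rounding in the last digit.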

How does this compare to Billie’s model on her seasonally-adjusted data? Recall that when doing her regressions on seasonally adjusted data, Billie got the following results:

Ŷ = 0.47t + 17.12

Her model had an R² of 0.872, but her F-statistic was almost 395! So, even though Billie gained a few more points in R² with the seasonal dummies, her F-statistic wasn’t quite as significant. However, Billie’s F-statistic using the dummy variables is still very strong, and I would argue more stable. Recall that the F-statistic is determined by dividing the mean squared error of the regression by the mean squared error of the residuals. The mean squared error of the regression is the sum of squares regression (SSR) divided by the number of independent variables in the model; the mean squared error of the residuals is the sum of squared error (SSE) divided by the number of observations less the number of independent variables and less one more. To illustrate, here is a side-by-side comparison:

| | Seasonally Adjusted Model | Seasonal Dummy Model |
|---|---|---|
| # Observations | 60 | 60 |
| SSR | 3,982 | 14,179 |
| # Independent Variables | 1 | 12 |
| Mean Square Error of Regression | 3,982 | 1,182 |
| SSE | 585 | 1,241 |
| Degrees of Freedom | 58 | 47 |
| Mean Squared Error of Residuals | 10.08 | 26.41 |
| F-Statistic | 394.91 | 44.73 |
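The arithmetic behind the comparison is simple enough to check in a few lines of Python. This sketch only re-applies the F-statistic definition to the rounded sums of squares in the table; `f_statistic` is a helper name I made up, not a library function.

```python
def f_statistic(ssr, sse, n_obs, n_vars):
    """F = (SSR / number of variables) / (SSE / (n - number of variables - 1))."""
    ms_regression = ssr / n_vars
    ms_residual = sse / (n_obs - n_vars - 1)
    return ms_regression / ms_residual


# Rounded inputs taken from the side-by-side comparison.
f_adjusted = f_statistic(ssr=3982, sse=585, n_obs=60, n_vars=1)
f_dummy = f_statistic(ssr=14179, sse=1241, n_obs=60, n_vars=12)

print(f"Seasonally adjusted model: F = {f_adjusted:.2f}")
print(f"Seasonal dummy model:      F = {f_dummy:.2f}")
```

Because the sums of squares in the table are rounded to whole numbers, the results will differ from 394.91 and 44.73 by a small amount in the decimals.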

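So, although the F-statistic is much lower for the seasonal dummy model, much of that drop comes from dividing the sum of squares regression across 12 independent variables instead of one, which makes the mean square error of the regression much lower. The F-statistic is still quite significant, and I would argue it is much more stable than the one from our one-variable model built on the seasonally-adjusted data.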

It is important to note that sometimes data sets do not lend themselves well to seasonal dummies, and that the manual adjustment process we worked through a few weeks ago may be a better approach.

Next Forecast Friday Topic: Slope Dummy Variables

The dummy variables we worked with last week and this week are intercept dummies. These dummy variables alter the Y-intercept of the regression equation. Sometimes, it is necessary to affect the slope of the equation. We will discuss how slope dummies are used in next week’s Forecast Friday post.
