(Twelfth in a series)
We’ve spent the last six weeks discussing regression analysis as a forecasting method. As you have seen, simple regression is a bit tedious to work out by hand, but for multiple regression analysis, you almost always need the aid of a computerized software package. Today I will demonstrate for you how to use the Regression Analysis feature of Microsoft Excel’s^{1} Analysis ToolPak AddIn. Excel 2007 comes with the Analysis ToolPak AddIn, which you can choose to activate. One way to know if Analysis ToolPak is activated on your version of Excel is to click on the Data tab on your workspace and see if there is a “Data Analysis” icon. The following thumbnail will illustrate:
Checking for the Data Analysis AddIn
Notice towards the upper right corner of the image, “Data Analysis” is highlighted in orange. The presence of the Data Analysis icon means that we have activated the Analysis ToolPak AddIn. If it wasn’t there, you would need to activate the AddIn, which you could do very easily by clicking the “Office” button in the top left hand corner, then clicking the “Excel Options” button, which will take you through the process of activating your addin.
Setting up a Regression – Our Data Set
Generally in direct mail marketing, three components that often determine how much one spends – or whether he/she buys at all – are recency, frequency, and monetary value – known in short as RFM. Generally, the longer it has been since one’s last purchase (recency), the less he/she is likely to spend. Hence, we would expect a minus sign by the coefficient for recency. Also, RFM theorizes that the more frequently one buys, the greater his/her purchase. So, we would expect a plus sign by the coefficient for frequency. Finally, the higher the customer’s average purchases (monetary value), the greater his/her spending, so we would also expect a plus sign here. RFM is also used heavily by nonprofits in their capital and contributor campaigns, since they are often heavily reliant upon direct mail.
In our example here, a local nonprofit decided to test whether each RFM component had a relationship to a donor’s contribution, so it randomly selected 20 donors who contributed to its last appeal. Naturally, our dependent variable, Y, was the Contribution amount. The nonprofit also looked at three independent variables: months since last contribution (X_{1}), times donated in last 12 months (X_{2}), and average contribution over the last 12 months (X_{3}). These independent variables represent recency, frequency, and monetary value, respectively. The table below shows our dataset:
Giving Patterns of 20 Donors

Donor

Contribution

Months Since Last Donation

Times Donated in last 12 months

Average Contribution in last 12 months

1 
95

10

1

85

2

110

8

2

95

3

100

10

2

90

4

115

8

3

75

5

100

9

1

95

6

120

6

2

100

7

105

9

1

90

8

125

10

1

125

9

105

9

2

100

10

130

4

3

150

11

135

7

4

125

12

150

2

8

150

13

140

4

3

125

14

155

2

9

140

15

140

2

8

130

16

160

2

10

150

17

145

3

6

135

18

165

1

12

150

19

150

3

4

160

20

170

1

12

140 
The thumbnail below shows what the data set looks like in Excel:
Regression Data Set in Excel
Running the Regression
To run the regression, we need to select the regression tool from the Analysis ToolPak. We do this by clicking on the Data Analysis Tab. The next thumbnail shows us what we need to do:
Selecting the Regression option from the Data Analysis ToolPak
After selecting the regression tool, we need to select our independent variables and our dependent variables. It is best to make sure all columns containing your independent variables are adjacent to each other, as they are in columns D, E, and F. Notice that column C from rows 2 to 22 contains our Yrange values (including the column label). In columns D, E, and F, rows 2 through 22 contain their respective Xrange values. Notice in the thumbnail how we indicate those column/row positions for Yrange and Xrange values.
Regression Options
We also need to decide where to place the regression output and what data we want the output to contain. In the thumbnail below, we choose to have the output placed in a new worksheet, called “Regression Output”, and we also check the box indicating that we want the residuals printed. Also notice that we checked the box “Labels”, so that row 2 won’t be inadvertently added into the model.
Regression Options  Continued
Looking at the Output
Now we run the regression and get the following output:
Regression Output (residuals not shown)
As you can see, cell B5 contains our R^{2}, equal to .933, indicating that 93.3% of the variation in a donor’s contribution amount is explained by changes in recency, frequency, and monetary value. Also, notice the Fstatistic in cell E12. It’s a large, strong 73.90, and cell F12 to the right is 0.00, suggesting the model is significant. (Note, the Significance F in cell F12 and the PValues in cells E17E20 for each parameter estimate are quick cues to significance. If you’re using a 95% confidence interval – which we are here – then you want those values to be no higher than 0.05).
Now let’s look at each parameter estimate. Cells B17B20 contain our regression coefficients. We have the following equation:
Contribution Estimate = 87.27 – 1.80 *Months_Since_Last_Donation + 2.45 *Times_Donated_Last_12_Months + 0.35 *Average_Contribution_Last_12_Months
Simplifying, we have:
Contribution Estimate = 87.27 – 1.80*RECENCY + 2.45*FREQUENCY + 0.35*MONETARY_VALUE
Ŷ = 87.27 – 1.80X_{1} + 2.45X_{2} + 0.35X_{3}
Note that even though we opted to display the residuals for each observation, I chose not to show them here. It would have run below the fold, and would have been difficult to see. Besides, for our analysis, we’re not going to worry about residuals right now.
Interpreting the Output
As we can see, each month since a donor’s last contribution reduces his contribution by an average of $1.80, when we hold frequency and monetary value constant. Likewise, for each time a donor has given in the last 12 months, the size of his contribution increases by an average of $2.45, holding the other two variables constant. In addition, each onedollar increase in a donor’s average contribution increases his contribution by an average of 35 cents. Hence, all of our coefficients have the signs we expect.
TStatistics and PValues
Next, we need to look at the tstatistics and Pvalues. As mentioned above, for a 95% confidence interval, a parameter estimate must have a pvalue no greater than 0.05 (or 0.10 for a 90% confidence interval, etc.), in order to be significant. In like manner, for a 95% confidence interval, tstatistics should be values of at least 1.96 (slightly higher for small samples, but 1.96 will work) or less than 1.96 if the coefficient is negative, to be significant:
Parameter

Coefficient

Tstatistic

Significant?

Intercept 
87.27

4.32

Yes

Months since Last 
(1.80)

(1.44)

No

Times Donated 
2.45

2.87

Yes

Average Contribution 
0.35

3.26

Yes

Notice that the coefficient for Months Since Last Donation has a tstatistic of 1.44. It is not significant. Another way to tell whether the parameter estimates are significant is to look at the Lower 95% and Upper 95% values in columns F and G. If the lower and upper 95% confidence interval values for a parameter estimate are both negative or both positive, they are significant. However, if the lower 95% value is negative and the upper 95% is positive (as is the case with Months Since Last Donation), then the parameter estimate is not significant, since its confidence interval range crosses zero. Hence Months Since Last Donation is not significant. Yet, the model still has a 93.3% coefficient of determination. Does this mean we can drop this variable from our regression? Not so fast!
Regression Violation Present!
Generally, when an independent variable we expect to be an important predictor of our dependent variable comes up as statistically insignificant, it is sometimes a sign of multicollinearity. And that is definitely the case with the nonprofit’s model. That will be our topic in next week’s Forecast Friday post.
Forecasting with the Output
Since we’re going to take on multicollinearity next week, let’s pretend our model is AOK, and generate some forecasts.
We’ll go to our regression output worksheet, select cells A17 through B20, which contain our regression variables and coefficients, and then click Copy (or do a CTRLC):
Selecting the Coefficients
Next, let’s paste those coefficients and transpose them in another worksheet. Here’s how to select the “Transpose” option when pasting:
Pasting Data Using the Transpose Option
Next, this is what the result of our transpose will be:
Transposed Data
Now, the nonprofit organization looks at five prospective donors whom they are planning to solicit. They look at their past giving history as shown in the next thumbnail:
Prospective Donors  Before Applying Model
Knowing this information, we want to multiply those values by their respective coefficients. Take a look at the formula in cell F7 as we do just that, in the next thumbnail:
Forecasting with Regression Output
Note how the cell numbers containing the coefficients have their column letters enveloped in ‘$’. The dollar signs tell Excel that when we copy the formula down the next four rows, that it still reference those cells. Otherwise, for each row down, Excel would multiply each blank cell below the coefficients by the next donor’s information. Here’s are the forecasts generated:
Forecasts made with Model
Next Forecast Friday Topic: Multicollinearity
Today you learned how to develop regression models using Excel and how to use Excel to interpret the output. You also found out that our model exhibited multicollinearity, a violation of one of the key regression assumptions. Next week and the week after, we will discuss multicollinearity in depth: how to detect it, how to correct it, and when to live with it. We will again be using the nonprofit’s model. As I’ve said before, models are far from perfect and, as such, should only aid – not replace – the decisionmaking process.
^{1} Note: Excel is a registered trademark of Microsoft Corporation. Use of Microsoft Excel in this post is intended only for a demonstration of how to use Excel for regression analysis and does not constitute an endorsement of Microsoft Excel or any other Microsoft product by Analysights, LLC.
*************************
If you Like Our Posts, Then “Like” Us on Facebook!
Analysights is now doing the social media thing! If you like Forecast Friday – or any of our other posts – then we want you to “Like” us on Facebook! By “Likeing” us on Facebook, you’ll be informed every time a new blog post has been published, or when other information comes out. Check out our Facebook page!