Posts Tagged ‘Direct marketing’

Forecast Friday Topic: Building Regression Models With Excel

July 8, 2010
 (Twelfth in a series)
We’ve spent the last six weeks discussing regression analysis as a forecasting method. As you have seen, simple regression is a bit tedious to work out by hand, but for multiple regression analysis, you almost always need the aid of a computerized software package. Today I will demonstrate for you how to use the Regression Analysis feature of Microsoft Excel’s1 Analysis ToolPak Add-In. Excel 2007 comes with the Analysis ToolPak Add-In, which you can choose to activate. One way to know if Analysis ToolPak is activated on your version of Excel is to click on the Data tab on your workspace and see if there is a “Data Analysis” icon. The following thumbnail will illustrate:

 

Seeing if the Data Analysis ToolPak Add-In is activated.

Checking for the Data Analysis Add-In

Notice towards the upper right corner of the image, “Data Analysis” is highlighted in orange. The presence of the Data Analysis icon means that we have activated the Analysis ToolPak Add-In. If it wasn’t there, you would need to activate the Add-In, which you could do very easily by clicking the “Office” button in the top left hand corner, then clicking the “Excel Options” button, which will take you through the process of activating your add-in.

Setting up a Regression – Our Data Set

Generally in direct mail marketing, three components that often determine how much one spends – or whether he/she buys at all – are recency, frequency, and monetary value – known in short as RFM. Generally, the longer it has been since one’s last purchase (recency), the less he/she is likely to spend. Hence, we would expect a minus sign by the coefficient for recency. Also, RFM theorizes that the more frequently one buys, the greater his/her purchase. So, we would expect a plus sign by the coefficient for frequency. Finally, the higher the customer’s average purchases (monetary value), the greater his/her spending, so we would also expect a plus sign here. RFM is also used heavily by nonprofits in their capital and contributor campaigns, since they are often heavily reliant upon direct mail.

In our example here, a local nonprofit decided to test whether each RFM component had a relationship to a donor’s contribution, so it randomly selected 20 donors who contributed to its last appeal. Naturally, our dependent variable, Y, was the Contribution amount. The nonprofit also looked at three independent variables: months since last contribution (X1), times donated in last 12 months (X2), and average contribution over the last 12 months (X3). These independent variables represent recency, frequency, and monetary value, respectively. The table below shows our dataset:

Giving Patterns of 20 Donors

Donor

Contribution

Months Since Last Donation

Times Donated in last 12 months

Average Contribution in last 12 months

1

95

10

1

85

2

110

8

2

95

3

100

10

2

90

4

115

8

3

75

5

100

9

1

95

6

120

6

2

100

7

105

9

1

90

8

125

10

1

125

9

105

9

2

100

10

130

4

3

150

11

135

7

4

125

12

150

2

8

150

13

140

4

3

125

14

155

2

9

140

15

140

2

8

130

16

160

2

10

150

17

145

3

6

135

18

165

1

12

150

19

150

3

4

160

20

170

1

12

140

The thumbnail below shows what the data set looks like in Excel:

Data Set in Excel

Regression Data Set in Excel

Running the Regression

To run the regression, we need to select the regression tool from the Analysis ToolPak. We do this by clicking on the Data Analysis Tab. The  next thumbnail shows us what we need to do:

Regression Option

Selecting the Regression option from the Data Analysis ToolPak

After selecting the regression tool, we need to select our independent variables and our dependent variables. It is best to make sure all columns containing your independent variables are adjacent to each other, as they are in columns D, E, and F. Notice that column C from rows 2 to 22 contains our Y-range values (including the column label). In columns D, E, and F, rows 2 through 22 contain their respective X-range values. Notice in the thumbnail how we indicate those column/row positions for Y-range and X-range values.

Regression Options

Regression Options

We also need to decide where to place the regression output and what data we want the output to contain. In the thumbnail below, we choose to have the output placed in a new worksheet, called “Regression Output”, and we also check the box indicating that we want the residuals printed. Also notice that we checked the box “Labels”, so that row 2 won’t be inadvertently added into the model.

Regression Options - Continued

Regression Options - Continued

Looking at the Output

Now we run the regression and get the following output:

Regression Output

Regression Output (residuals not shown)

As you can see, cell B5 contains our R2, equal to .933, indicating that 93.3% of the variation in a donor’s contribution amount is explained by changes in recency, frequency, and monetary value. Also, notice the F-statistic in cell E12. It’s a large, strong 73.90, and cell F12 to the right is 0.00, suggesting the model is significant. (Note, the Significance F in cell F12 and the P-Values in cells E17-E20 for each parameter estimate are quick cues to significance. If you’re using a 95% confidence interval – which we are here – then you want those values to be no higher than 0.05).

Now let’s look at each parameter estimate. Cells B17-B20 contain our regression coefficients. We have the following equation:

Contribution Estimate = 87.27 – 1.80 *Months_Since_Last_Donation + 2.45 *Times_Donated_Last_12_Months + 0.35 *Average_Contribution_Last_12_Months

Simplifying, we have:

Contribution Estimate = 87.27 – 1.80*RECENCY + 2.45*FREQUENCY + 0.35*MONETARY_VALUE

Ŷ = 87.27 – 1.80X1 + 2.45X2 + 0.35X3

Note that even though we opted to display the residuals for each observation, I chose not to show them here.  It would have run below the fold, and would have been difficult to see.  Besides, for our analysis, we’re not going to worry about residuals right now.

Interpreting the Output

As we can see, each month since a donor’s last contribution reduces his contribution by an average of $1.80, when we hold frequency and monetary value constant. Likewise, for each time a donor has given in the last 12 months, the size of his contribution increases by an average of $2.45, holding the other two variables constant. In addition, each one-dollar increase in a donor’s average contribution increases his contribution by an average of 35 cents. Hence, all of our coefficients have the signs we expect.

T-Statistics and P-Values

Next, we need to look at the t-statistics and P-values. As mentioned above, for a 95% confidence interval, a parameter estimate must have a p-value no greater than 0.05 (or 0.10 for a 90% confidence interval, etc.), in order to be significant. In like manner, for a 95% confidence interval, t-statistics should be values of at least 1.96 (slightly higher for small samples, but 1.96 will work) or less than -1.96 if the coefficient is negative, to be significant:

Parameter

Coefficient

T-statistic

Significant?

Intercept

87.27

4.32

Yes

Months since Last

(1.80)

(1.44)

No

Times Donated

2.45

2.87

Yes

Average Contribution

0.35

3.26

Yes

Notice that the coefficient for Months Since Last Donation has a t-statistic of -1.44. It is not significant. Another way to tell whether the parameter estimates are significant is to look at the Lower 95% and Upper 95% values in columns F and G.  If the lower and upper 95% confidence interval values for a parameter estimate are both negative or both positive, they are significant.  However, if the lower 95% value is negative and the upper 95% is positive (as is the case with Months Since Last Donation), then the parameter estimate is not significant, since its confidence interval range crosses zero.  Hence Months Since Last Donation is not significant.  Yet, the model still has a 93.3% coefficient of determination. Does this mean we can drop this variable from our regression? Not so fast!

Regression Violation Present!

Generally, when an independent variable we expect to be an important predictor of our dependent variable comes up as statistically insignificant, it is sometimes a sign of multicollinearity. And that is definitely the case with the nonprofit’s model. That will be our topic in next week’s Forecast Friday post.

Forecasting with the Output

Since we’re going to take on multicollinearity next week, let’s pretend our model is A-OK, and generate some forecasts.

We’ll go to our regression output worksheet, select cells A17 through B20, which contain our regression variables and coefficients, and then click Copy (or do a CTRL-C):

Selecting the Coefficients

Selecting the Coefficients

Next, let’s paste those coefficients and transpose them in another worksheet.  Here’s how to select the “Transpose” option when pasting:

Pasting Data Using the Transpose Option

Next, this is what the result of our transpose will be:

Transposed Data

Transposed Data

Now, the nonprofit organization looks at five prospective donors whom they are planning to solicit. They look at their past giving history as shown in the next thumbnail:

Prospective Donors - Before Applying Model

Knowing this information, we want to multiply those values by their respective coefficients. Take a look at the formula in cell F7 as we do just that, in the next thumbnail:

Forecasting with Regression Output

Note how the cell numbers containing the coefficients have their column letters enveloped in ‘$’. The dollar signs tell Excel that when we copy the formula down the next four rows, that it still reference those cells. Otherwise, for each row down, Excel would multiply each blank cell below the coefficients by the next donor’s information.  Here’s are the forecasts generated:

Forecasts made with Model

Next Forecast Friday Topic: Multicollinearity

Today you learned how to develop regression models using Excel and how to use Excel to interpret the output. You also found out that our model exhibited multicollinearity, a violation of one of the key regression assumptions. Next week and the week after, we will discuss multicollinearity in depth: how to detect it, how to correct it, and when to live with it. We will again be using the nonprofit’s model.  As I’ve said before, models are far from perfect and, as such, should only aid – not replace – the decision-making process.

 1 Note: Excel is a registered trademark of Microsoft Corporation. Use of Microsoft Excel in this post is intended only for a demonstration of how to use Excel for regression analysis and does not constitute an endorsement of Microsoft Excel or any other Microsoft product by Analysights, LLC.

*************************

If you Like Our Posts, Then “Like” Us on Facebook!

Analysights is now doing the social media thing! If you like Forecast Friday – or any of our other posts – then we want you to “Like” us on Facebook! By “Like-ing” us on Facebook, you’ll be informed every time a new blog post has been published, or when other information comes out. Check out our Facebook page!

Advertisements

The Man Who Feared Analytics

June 9, 2010

A business owner had once been referred to me by a colleague with whom he had already been doing business. For many years, the businessman’s photography business had been sustained through direct mail advertising, and he often received a 5%-7% response rate, an accomplishment that would boggle most direct marketers. But the recent economic downturn combined with photography’s being a discretionary expense, he soon found his direct mail solicitations bringing in a puny 0.8% response rate. The business owner had a great product, a great price, and a great offer, but at that response rate, he was no longer breaking even.

My colleague and I spoke with the businessman about his dilemma. We talked through his business; we looked at his most recent mailer, learned how he obtained his mailing lists, and discussed his promotion schedule. We found that the photographer would buy a list of names, mail them once, and then use a different list, not giving people enough opportunity to develop awareness of his business. We also found that he didn’t have much information about the people he was mailing.

We recommended that analytics could help the photographer maximize his margin by improving both the top and bottom line. Analytics would first help him understand which customers were responding to his mailings. Then he could purchase lists of people with characteristics similar to those past respondents. His response rate would go up, since he would be sending to a list of people most receptive to his photography. He would also be able to mail fewer people, cutting out those with little likelihood of response. He could then use the savings to remail the members of his target segments who hadn’t responded to his earlier mailing, and thus increase their awareness. It all sounded good to the photographer.

And then, he decided he was going to wait to see if things got better!

Why the Fear of Analytics?

The photographer’s decision is a common refrain of marketers. Marketers and business owners who are introduced to analytics are like riders on a roller coaster: thrilled and nervous at the same time. While marketers are excited about the benefits of analytics, they are also concerned about its cost; they’re afraid of change; and they’re intimidated by the perceived complexity of analytics. We’ll tackle each of these fears here.

FEAR #1: Analytics could be expensive.

REALITY: Analytics is an investment that pays for itself.

The cost of analytics can appear staggering, especially in lean times. Some of the most sophisticated analytics techniques can run into tens – if not hundreds – of thousands of dollars for a large corporation. However, for many smaller companies, analytics can run a few thousand dollars, but still a lot of money. But analytics is not an expense; you are getting something great in return: the insights you need to make better informed marketing decisions and identify the areas in your marketing that you can improve or enhance; the ability to target customers and prospects more effectively, resulting in increased sales and reduced costs; and the chance to establish long-term continuous improvement systems.

Had the photographer gone through with the analytics for his upcoming mail, the entire analysis would have cost him somewhere between $1,300 and $1,800. But that fee would have enabled him to identify where his mailings were getting the greatest bang for his buck and he might have made up for it in reduced mailing costs and increased revenues. Once the analytics had saved or made the photographer at least $1,800, it would have paid for itself.

FEAR #2: Analytics means a change in the way we do things.

REALITY: Analytics brings about change gradually and seamlessly.
The photographer had been using direct mail over and over again, because it worked over and over again – until recently. In fact, having lost so much money on his recent direct mails, he’s probably leery of new approaches, so he stays the course out of familiarity. That’s quite common. But this is the nice part about analytics: change can be gradual! Analytics is about testing the waters, so to reduce risk. Perhaps the photographer could have done a test where half of his mailings were executed the traditional way, and half done the way the analytics recommended. Over the course of a short period, the photographer could then decide for himself what approach was working best.

FEAR #3: Analytics is “over my head.”
REALITY: You need only understand a few high level concepts.

Those complicated and busy mathematical formulas, in all their Greek symbol glory, can be intimidating to people who are not mathematicians, statisticians, or economists. In fact, even I get intimidated from those equations. We must remember, however, that these formulas were developed to improve the way we do things! With analytics, all you need to know is what approach was employed, what it does, why it’s important, and how to apply it – all of which are very simple. Analysts like me deal with all the complicated stuff – finding the approach, employing it, debugging it, refining it, and then packaging it in a way that you can apply seamlessly. And if you don’t understand something about the analytical approach employed, by all means, ask! And any good analyst will give you all the guidance you need until you’re able to apply the analytics on your own.

Forgoing Analytics Can Cost Your Business Three Times Over!

Analytics is one of those tools that many marketers know can enhance their businesses, yet decide to hold off on – either for cost, perceived complexity, or just plain fear. This inaction can be very dangerous. Analytics is not just a tool that improves your business decision making; it also helps you diagnose problems, identify opportunities, and make predictions about the future. Failure to do these properly costs you in three ways. First, you market incorrectly, wasting money. Second, you market to the wrong people; they don’t buy, and you lose revenue you could have made marketing to the right people. Third, you fail to recognize opportunities, and you forgo any sales those missed opportunities may have brought. Analytics is an investment that pays for itself, pays dividends down the road, brings about change in an easy and acceptable way, and whose benefits are easy to grasp and financially rewarding.

Charities are Spying on You – But That’s Not Necessarily a Bad Thing!

May 26, 2010

The June 2010 issue of SmartMoney magazine contained an interesting article, “Are Charities Spying On You?,” which discussed the different ways nonprofit organizations are trying to find out information – available from public sources – on current and prospective donors. As one who has worked in the field of data mining and predictive analytics, I found the article interesting in large part because of how well the nonprofit sector has made use of these very techniques in designing their campaigns, solicitations, and programming.

At first glance, it can seem frightening what charities can learn about you. For instance, the article mentions how some charities’ prospect-research departments look at LinkedIn profiles, survey your salary history, and even use satellite images to get information on the home in which you live. And there is a wealth of information out there about us: Zillow.com gives info about the value of our homes and those around it; if you write articles or letters to the editor of your newspaper, online versions can often be found on Google; buy or sell any real estate? That too gets published in the online version of the newspaper; and online bridal and baby shower registries, graduation and wedding announcements, and any other news are fair game. And your shopping history! If you buy online or through a catalog, your name ends up on mailing lists that charities buy. Face it, there’s a lot of information about us that is widely and publicly available.

But is this so terrible? For the most part, I don’t think so. Surely, it’s bad if that information is being used against you. But think of the ways this data mining proves beneficial:

Customization

Let’s assume that you and I are both donors to the Republican National Committee. That suggests we’re both politically active and politically conservative. But are we engaged with the RNC in the same way? Most likely not. You might have donated to the RNC because you’re a wealthy individual who values low taxes and opposes a national health care plan; I might have donated because I am a social conservative who wants prayer in public schools, favors school choice, and opposes abortion. By seeking out information on us, the RNC can tailor its communications in a manner that speaks to each of us individually, sending you information about how it’s fighting proposed tax hikes in various states, and sending me information about school choice initiatives. In this way, the RNC maintains its relevance to each of us.

In addition, it’s very likely, in this example, that you’re donating a lot more money to the RNC than I am. Hence, that would likely lead the RNC to offer you special perks, such as free passes for you and a guest to meet various candidates or attend special luncheons or events. For me, I might at best be given an autographed photo of the event – in exchange for a donation of course – or an invite to the same events, but with a donation of a lot of money requested. I might get information about when the next Tea Party rally in my area will be held. Or even a brief newsletter. One can argue that the treatment you’re getting vs. that of what I’m getting is unfair. However, think of it like this: at a casino, people who gamble regularly and heavily are given all sorts of complimentary perks: drinks, food, a host to attend to their needs, and even special reduced rate stays. That’s because these gamblers are making so much money for the casino, that the cost of these “comps” is small in comparison. In addition, the casino wants to make it more fun for these gamblers to lose money, so that they’ll keep on playing. In short, the special treatment you’re getting is something you’re paying for, if indirectly. I’m getting less because I’m giving less; you’re getting more because you’re giving more. And the charity will give you more to keep you giving more!

Reduced Waste

Before direct marketing got so sophisticated, mass marketing was the only tactic. If you had a product to sell, you sent the same solicitation to thousands, if not millions of people and hoped for a 1-2% response rate. Most people simply threw your solicitation in the garbage when it came in the mail. Many recipients didn’t have a need for the item you were selling or the appeal for which you were soliciting, and disregarded your piece. As a result, lots of paper was wasted, and the phrase “junk mail” came into existence. In addition, if you used follow-up methods, such as phone calls after the mailing, that got costly trying to qualify the leads, just because of the labor involved.

Now, with targeted marketing and list rental, sales, and sharing, charities can build predictive models that estimate each current and prospective donor’s likelihood of responding to a promotion. As a result, the charity doesn’t need to send out quite a large mailing; it can mail solely to those with the best chance of responding, reducing the amount of paper, print, and postage involved, not to mention reduced labor costs involved, both in the production of the piece and in the staffing of the outbound call center. In short, the charity’s data mining is helping the environment, reducing overhead, and increasing the top and bottom lines.

Better Programming

By knowing more about you, the charity can know what makes you “tick,” so that it can come up with programs that fit your needs. Even if you’re not a large donor, if you and other donors feel strongly about certain issues, or value certain programs, the charity can develop programs that are suitable to its members at large. And while many larger donors may be granted special privileges, their large donations can help fund the programs of those who donate less. Everybody wins.

Not bad at all

The data mining tactics charities use aren’t bad. People don’t want to be bombarded with solicitations for which they see no value in it for themselves. Data mining makes it very possible to give you an offer that is relevant to your situation, is cost-effective and resource-efficient, and design programs from which you’re likely to benefit. It is important to note, that while major donors get several great perks, charities must not ignore those whose donations are smaller, for two reasons: first, they have the potential to become major donors, and second, because of their smaller donations, it’s very likely their frequency of giving is greater. This can mean a great stream of gifts to the charity over time. Hence, charities should do things that show these donors they’re appreciated – and, quite often, this too is often accomplished by data mining.

We welcome replies to our blog post!

Using Statistics to Evaluate a Promotion

May 25, 2010

Marketing – as much as cashflow – is the lifeblood of any business. No matter how good your product or service may be, it’s worthless if you can’t get it in front of your customers and get them to buy it. So all businesses, large and small, must engage in marketing. And we see countless types of marketing promotions or tactics being tried: radio and TV commercials, magazine and newspaper advertisements, public relations, coupons, email blasts, and so forth. But are our promotions working? The merchant John Wannamaker, often dubbed the father of modern advertising is said to have remarked, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.”

Some basic statistics can help you evaluate the effectiveness of your marketing and take away much of the mystique Wannamaker complained about. When deciding whether to do a promotion, managers and business owners have no way of knowing whether it will succeed; in fact, in today’s economy, budgets are still tight. The cost to roll out a full promotion can wipe out an entire marketing budget if it proves to be a fiasco. This is why many businesses do a test before doing a complete rollout. The testing helps to reduce the amount of uncertainty involved in an all-out campaign.

Quite often, large companies need to choose between two or more competing campaigns for rollout. But how do they know which will be effective? Consider the example of Jenny Kaplan, owner of K-Jen, a New Orleans-style restaurant. K-Jen serves up a tasty jambalaya entrée, which is priced at $10.00. Jenny believes that the jambalaya is a draw to the restaurant and believes that by offering a discount, she can increase the average amount of the table check. Jenny decides to issue coupons via email to patrons who have opted-in to receive such promotions. She wants to knock a dollar off the price of the jambalaya as the offer, but doesn’t know whether customers would respond better to an offer worded as “$1.00 off” or as “10% off.” So, Jenny decides to test the two concepts.

Jenny goes to her database of nearly 1,000 patrons and randomly selects 200 patrons. She decides to send half of those a coupon for $1.00 off for jambalaya, and the other half a coupon for 10% off. When the coupon offer expires 10 days later, Jenny finds that 10 coupons were redeemed for each offer – a redemption rate of 10% each. Jenny observes that either wording will get the same number of people to respond. But she wonders which offer generated the largest table check. So she looks at the guest checks to which the coupons were stapled. She notices the following:

Guest Check Amounts

 

Offer

 
 

$1.00 off

10% Off

 
 

$38.85

$50.16

 
 

$36.97

$54.44

 
 

$35.94

$32.20

 
 

$54.17

$32.69

 
 

$68.18

$51.09

 
 

$49.47

$46.18

 
 

$51.39

$57.72

 
 

$32.72

$44.30

 
 

$22.59

$59.29

 
 

$24.13

$22.94

 

 

Jenny quickly computes the average for each offer. The “$1.00 off” coupon generated an average table check of $41.44; the “10% off” coupon generated an average of $45.10. At first glance, it appears that the 10% off promotion generated a higher guest check. But is that difference meaningful, or is it due to chance? Jenny needs to do further analysis.

Hypothesis Testing

How does Jenny determine if the 10% off coupon really did better than the $1.00 off coupon? She can use statistical hypothesis testing, which is a structured analytical method for comparing the difference between two groups – in this case, two promotions. Jenny starts her analysis by formulating two hypotheses: a null hypothesis, which states that there is no difference in the average check amount for either offer; and an alternative hypothesis, which states that there is, in fact, a difference in the average check amount between the two offers. The null hypothesis is often denoted as H0, and the alternative hypothesis is denoted as HA. Jenny also refers to the $1.00 off offer as Offer #1, and the 10% off offer as Offer #2. She wants to compare the means of the two offers, the means of which are denoted as μ1 and μ2, respectively. Jenny writes down her two hypotheses:

H0: The average guest check amount for the two offers is equal.

HA: The average guest check amount for the two offers is not equal.

Or, more succinctly:

H0: μ12

HA: μ1≠μ2

 

Now, Jenny is ready to go to work. Note that the symbol μ denotes the population she wants to measure. Because Jenny did her test on a portion – a sample – of her database, the averages she computed were the sample average, which is denoted as . As we stated earlier, the average table checks for the “$1.00 off” and “10% off” offers were 1=$41.44 and 2=$45.10, respectively. Jenny needs to approximate μ using . She must also compute the sample standard deviation, or s for each offer.

Computing the Sample Standard Deviation

To compute the sample standard deviation, Jenny must subtract the mean of a particular offer from each of its check amounts in the sample; square the difference; sum them up; divide by the total observations minus 1(9) and then take the square root:

$1.00 Off

Actual Table Check

Average Table Check

Difference

Difference Squared

$38.85

$41.44

-$2.59

$6.71

$36.97

$41.44

-$4.47

$19.99

$35.94

$41.44

-$5.50

$30.26

$54.17

$41.44

$12.73

$162.03

$68.18

$41.44

$26.74

$714.97

$49.47

$41.44

$8.03

$64.46

$51.39

$41.44

$9.95

$98.98

$32.72

$41.44

-$8.72

$76.06

$22.59

$41.44

-$18.85

$355.36

$24.13

$41.44

-$17.31

$299.67

   

Total

$1,828.50

   

S21=

$203.17

   

S1=

$14.25

 

10% Off

Actual Table Check

Average Table Check

Difference

Difference Squared

$50.16

$45.10

$5.06

$25.59

$54.44

$45.10

$9.34

$87.22

$32.20

$45.10

-$12.90

$166.44

$32.69

$45.10

-$12.41

$154.03

$51.09

$45.10

$5.99

$35.87

$46.18

$45.10

$1.08

$1.16

$57.72

$45.10

$12.62

$159.24

$44.30

$45.10

-$0.80

$0.64

$59.29

$45.10

$14.19

$201.33

$22.94

$45.10

-$22.16

$491.11

   

Total

$1,322.63

   

S22=

$146.96

   

S2=

$12.12

 

Notice the denotation of S2. That is known as the variance. The variance and the standard deviation are used to measure the average distance between each data point and the mean. When data are normally distributed, about 95% of all observations fall within two standard deviations from the mean (actually 1.96 standard deviations). Hence, approximately 95% of the guest checks for the $1.00 off offer should fall between $41.44 ± 1.96*($14.25) or between $13.51 and $69.37. All ten fall within this range. For the 10% off offer, about 95% will fall between $45.10 ± 1.96*($12.12), or between $21.34 and $68.86. All 10 observations also fall within this range.

Degrees of Freedom and Pooled Standard Deviation

Jenny noticed two things immediately: first, that the 10% off coupon has the higher sample average, and second each individual table check is closer to it mean than it is for the $1.00 off coupon. Also notice that when we were computing the sample standard deviation for each offer, Jenny divided by 9, and not 10. Why? Because she was making estimates of the population standard deviation. Since samples are subject to error, we must account for that. Each observation gives us information into the population’s actual values. However, Jenny had to make an estimate based on that sample, so she gives up one observation to account for the sampling error – that is, she lost a degree of freedom. In this example, Jenny has 20 total observations; since she estimated the population standard deviation for both offers, she lost two degrees of freedom, leaving her with 18 (10 + 10 – 2).

Knowing the remaining degrees of freedom, Jenny must pool the standard deviations, weighting them by their degrees of freedom. This would be especially evident if the sample sizes of the two offers were not equal. The pooled standard deviation is given by:

FYI – n is simply the sample size. Jenny then computes the pooled standard deviation:

S2p = ((9 * $203.17) + (9 * $146.96))) / (10 + 10 – 2)

= ($1,828.53 + $1,322.64)/18

= $3,151.17/18

= $175.07

Now take the square root: $13.23

Hence, the pooled standard deviation is $13.23

Computing the t-Test Statistic

Now the fun begins. Jenny knows the sample mean of the two offers; she knows the hypothesized difference between the two population means (which we would expect to be zero, if the null hypothesis said they were equal); she knows the pooled standard deviation; she knows the sample size; and she knows the degrees of freedom. Jenny must now calculate the t-Test statistic. The t-Test Statistic, or the t-value, represents the number of estimated standard errors the sample average is from that of the population. The t-value is computed as follows:

 

So Jenny sets to work computing her t-Test Statistic:

t = (($41.44 – $45.10) – (0)) / ($13.23) * SQRT(1/10 + 1/10)

= -$3.66 / ($13.23 * SQRT(1/5))

=-$3.66 / ($13.23 * .45)

=-$3.66/$5.92

= -0.62

This t-statistic gives Jenny a basis for testing her hypothesis. Jenny’s t-statistic indicates that the difference in sample table checks between the two offers is 0.62 standard errors below the hypothesized difference of zero. We now need to determine the critical t – the value that we get from a t-distribution table that is available in most statistics textbooks and online. Since we are estimating with a 95% confidence interval, and since we must account for a small sample, our critical t-value is adjusted slightly from the 1.96 standard deviations from the mean. For 18 degrees of freedom, our critical t is 2.10. The larger the sample size, the closer to 1.96 the critical t would be.

So, does Jenny Accept or Reject her Null Hypothesis (Translation: Is the “10% Off” Offer Better than the “$1.00 Off” Offer)?

Jenny now has all the information she needs to determine whether one offer worked better than the other. What does the critical t of 2.10 mean? If Jenny’s t-statistic is greater than 2.10, or (since one offer can be lower than the other), less than -2.10, then she would reject her null hypothesis, as there is sufficient evidence to suggest that the two means are not equal. Is that the case?

Jenny’s t-statistic is -0.62, which is between -2.10 and 2.10. Hence, it is within the parameters. Jenny should not reject H0, since there is not enough evidence to suggest that one offer was better than the other at generating higher table checks. In fact, there’s nothing to say that the difference between the two offers is due to anything other than chance.

What Does Jenny Do Now?

Basically, Jenny can conclude that there’s not enough evidence that the “$1.00 off” coupon was worse/better than the “10% off” coupon in generating higher table check amounts, and vice-versa. This does not mean that our hypotheses were true or false, just that there was not enough statistical evidence to say so. In this case, we did not accept the null hypothesis, but rather, failed to reject it. Jenny can do a few things:

  1. She can run another test, and see if the same phenomenon holds.
  2. Jenny can accept the fact that both offers work equally well, and compare their overall average table checks to those of who ordered jambalaya without the coupons during the time the offer ran; if the coupons generated average table checks that were higher (using the hypothesis testing procedures outlined above) than those who paid full price, then she may choose to rollout a complete promotion using either or both of the offers described above.
  3. Jenny may decide that neither coupon offer raised average check amounts and choose not to do a full rollout after all.

So Why am I Telling You This?

The purpose of this blog post was to take you step-by-step into how you can use a simple concept like t-tests to judge the performance of two promotion concepts. Although a spreadsheet like Excel can run this test in seconds, I wanted to walk you through the theory in laymen’s terms, so that you can grasp the theory, and then apply it to your business. Analysights is in the business of helping companies – large and small – succeed at marketing, and this blog post is one ingredient in the recipe for your marketing success. If you would like some assistance in setting up a promotion test or in evaluating the effectiveness of a campaign, feel free to contact us at www.analysights.com.

 

Three Metrics for E-mail Marketing Excellence

April 24, 2009

The principles of direct marketing apply just as much online as they do offline.  The process for tracking the performance of an e-mail campaign is essentially the same as for that of a direct mail campaign.

How do you know if your e-mail campaigns are working?  Start with three basic statistics: your bounce rate, your open rate, and your click-through rate.

Bounce Rate

The bounce rate tells you the percentage of your e-mails that were returned because they were undeliverable.  If you sent 10,000 e-mails and 1,000 were undeliverable, your bounce rate is 10%.  The 9,000 e-mails that were delivered are known as your non-bounce total.

Use the bounce rate to assess the quality and recency of your e-mail list.  Eyeball the list of e-mail addresses that bounced back.  You may find that some are simply invalid addresses (“,com” instead of “.com”) which can easily be rectified.  Others may be incomplete and thus useless.  Still other addresses might be old, which suggests you should have a continuous process in place for your customers and prospects to update their e-mail addresses.

Reducing bounce rate should be an ongoing objective of your e-mail marketing strategy.

Open Rate

The open rate is the number of recipients who opened an  HTML version of your e-mail, expressed as a percentage of your non-bounce total.  The open rate can give you an idea of how compelling and attention-getting your e-mail is.  Continuing with the example above, if 1,800 recipients opened your e-mail, then you have an open rate of 20% (1,800/9,000). 

The “HTML version” and the non-bounce total are very important components of this definition.  E-mail Service Providers (ESPs) can only track HTML e-mail messages, not text.  And the use of the non-bounce total has its own share of problems, because the non-bounce total isn’t synonymous with the total e-mails delivered.

E-mails may not be considered bounced because some e-mail servers inadvertently send them to a junk folder on the recipient’s computer, which he/she cannot access.  Furthermore, if the e-mail isn’t bounced by the server, but by a portable device or software on the recipient’s computer, it will not show up in your e-mail tracking report.  Hence, you are basing your open rate on the number of e-mails sent, as opposed to delivered.

An additional problem with the open rate lies in the definition of “opened.”  Your e-mail is considered “opened” if the recipient either 1) opens it in full view or lets its images display in the preview pane, or 2) clicks a link in the e-mail.   The preview pane is a double-edged sword: If the recipient let the images of your e-mail display in the pane, your open rate may be overstated.  On the other hand, if the recipient didn’t allow images to show in the pane, but scanned the e-mail, you open rate will be understated.

You might want to use some qualitative methods to estimate the degree to which these flaws exist.  For example, a survey may give you an idea of the percentage of your customers who use the preview pane and allow images to display; a test of 100 pre-recruited members of your list to receive your e-mail (who report whether or not they received it) might give you clues into how many non-bounced e-mails weren’t sent.  This will help you place a margin of error around your open rate.

If you find your open rates declining over several campaigns, that is a sign to make your messages more compelling.

 Click-Through Rate

Your click-through rate tells you the percentage of unique individuals who click at least one link in your opened e-mail.  If, from the 1,800 e-mails were opened, 180 recipients clicked at least one link, then your click-through rate is 10% (of your opened e-mails).  It’s important that you subtract multiple clicks by a recipient (whether he/she clicked more than one link, or one link several times), in order to prevent double counting.  Most ESPs do this for you seemlessly.

Your click-through rate is a measure of how well your e-mail calls your prospects to action.  Low or declining click-through rates suggest your e-mail message isn’t generating interest or desire.

Always remember to:

  1. Track every e-mail campaign you do;
  2. Look at non-click activity (increased store traffic, phone inquiries, etc.) that occur immediately following an e-mail campaign;
  3. Look at activity to your Web site immediately following an e-mail campaign; and
  4. Track your metrics over time.  Look for trends in these metrics to refine and improve your e-mail marketing results.