Archive for May, 2010

Forecast Friday Topic: Simple Regression Analysis

May 27, 2010

(Sixth in a series)

Today, we begin our discussion of regression analysis as a time series forecasting tool. This discussion will take the next few weeks, as there is much behind it. As always, I will make sure everything is simplified and easy for you to digest. Regression is a powerful tool that can be very helpful for mid- and long-range forecasting. Quite often, the business decisions we make require us to consider relationships between two or more variables. Rarely can we make changes to our promotion, pricing, and/or product development strategies without them having an impact of some kind on our sales. Just how big an impact would that be? How do we measure the relationship between two or more variables? And does a real relationship even exist between those variables? Regression analysis helps us find out.

One thing I must point out: Remember the "deviations" we discussed in the posts on moving average and exponential smoothing techniques – the differences between the forecasted and actual values for each observation, of which we took the absolute value? Good. In regression analysis, we refer to those deviations as "error terms" or "residuals." The residuals – which we will square, rather than take the absolute value of – become very important in gauging the regression model's accuracy, validity, efficiency, and "goodness of fit."

Simple Linear Regression Analysis

Sue Stone, owner of Stone & Associates, looked at her CPA practice’s monthly receipts from January to December 2009. The sales were as follows:

| Month | Sales |
| --- | --- |
| January | $10,000 |
| February | $11,000 |
| March | $10,500 |
| April | $11,500 |
| May | $12,500 |
| June | $12,000 |
| July | $14,000 |
| August | $13,000 |
| September | $13,500 |
| October | $15,000 |
| November | $14,500 |
| December | $15,500 |

Sue is trying to predict what sales will be for each month in the first quarter of 2010, but is unsure of how to go about it. Moving average and exponential smoothing techniques rarely go more than one period ahead. So, what is Sue to do?

When we are presented with a set of numbers, one of the ways we try to make sense of it is by taking its average. Perhaps Sue can average all 12 months' sales – $12,750 – and use that as her forecast for each of the next three months. But how accurately does that average represent each month of 2009? How spread out are each month's sales from the average? Sue subtracts the average from each month's sales and examines the differences:

| Month | Sales | Sales Less Average Sales |
| --- | --- | --- |
| January | $10,000 | -$2,750 |
| February | $11,000 | -$1,750 |
| March | $10,500 | -$2,250 |
| April | $11,500 | -$1,250 |
| May | $12,500 | -$250 |
| June | $12,000 | -$750 |
| July | $14,000 | $1,250 |
| August | $13,000 | $250 |
| September | $13,500 | $750 |
| October | $15,000 | $2,250 |
| November | $14,500 | $1,750 |
| December | $15,500 | $2,750 |

Sue notices that the error between actual and average is quite high in both the first four months of 2009 and in the last three months of 2009. She wants to understand the overall error in using the average as a forecast of sales. However, when she sums up all the errors from month to month, Sue finds they sum to zero. That tells her nothing. So she squares each month’s error value and sums them:

| Month | Sales | Error | Error Squared |
| --- | --- | --- | --- |
| January | $10,000 | -$2,750 | $7,562,500 |
| February | $11,000 | -$1,750 | $3,062,500 |
| March | $10,500 | -$2,250 | $5,062,500 |
| April | $11,500 | -$1,250 | $1,562,500 |
| May | $12,500 | -$250 | $62,500 |
| June | $12,000 | -$750 | $562,500 |
| July | $14,000 | $1,250 | $1,562,500 |
| August | $13,000 | $250 | $62,500 |
| September | $13,500 | $750 | $562,500 |
| October | $15,000 | $2,250 | $5,062,500 |
| November | $14,500 | $1,750 | $3,062,500 |
| December | $15,500 | $2,750 | $7,562,500 |
| Total Error |  |  | $35,750,000 |

In totaling these squared errors, Sue derives the total sum of squares, or TSS: 35,750,000. Is there any way she can improve upon that? Sue thinks for a while. She doesn't know much more about her 2009 sales except the month in which they were generated. She plots the sales on a chart:

Sue notices that sales by month appear to be on an upward trend. Sue thinks for a moment. "All I know is the sales and the month," she says to herself. "How can I develop a model to forecast accurately?" Sue reads about a statistical procedure called regression analysis and, seeing that each month's sales are in sequential order, she wonders whether the mere passage of time simply causes sales to go higher. Sue numbers each month, assigning January a 1 and December a 12.

She also realizes that she is trying to predict sales with each passing month, so she hypothesizes that the change in sales depends on the change in the month. Sales, then, is Sue's dependent variable. Because the month number is used to estimate the change in sales, it is her independent variable. In regression analysis, the relationship between an independent and a dependent variable is expressed as:

Y = α + βX + ε

    Where: Y is the value of the dependent variable

    X is the value of the independent variable

    α is a population parameter, called the intercept, which would be the value of Y when X=0

    β is also a population parameter – the slope of the regression line – representing the change in Y associated with each one-unit change in X.

    ε is the error term.

Sue further reads that the goal of regression analysis is to minimize the error sum of squares, which is why it is referred to as ordinary least squares (OLS) regression. She also notices that she is building her regression on a sample, so there is a sample regression equation used to estimate the true regression for the population:

Yi = a + bXi + ei

Essentially, the equation is the same as the one above, except that its terms refer to the sample. Ŷi (called "Y hat") is the sample forecasted value of the dependent variable (sales) at period i, given by Ŷi = a + bXi; a is the sample estimate of α; b is the sample estimate of β; Xi is the value of the independent variable at period i; and ei is the error, or the difference between the forecasted value Ŷi and the actual Yi for period i. Sue needs to find the values for a and b – the estimates of the population parameters – that minimize the error sum of squares.

Sue reads that the equations for estimating a and b are derived from calculus, but can be expressed algebraically as:

b = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)²

a = Ȳ – bX̄

Sue learns that the X and Y terms with lines above them, known as "X bar" and "Y bar," are the averages of all the X and Y values, respectively. She also reads that the Σ notation – the Greek letter sigma – represents a sum. Hence, Sue realizes a few things:

  1. She must estimate b before she can estimate a;
  2. To estimate b, she must take care of the numerator:
    1. first subtract the average month number from each observation's month number (X minus X-bar),
    2. then subtract average sales from each observation's sales (Y minus Y-bar),
    3. multiply those two differences together, and
    4. add up those products for all observations.
  3. To get the denominator for calculating b, she must:
    1. again subtract X-bar from X, but this time square the difference, for each observation, and
    2. sum those squares up.
  4. Calculating b is easy: she needs only to divide the result from (2) by the result from (3).
  5. Calculating a is also easy: she multiplies her b value by the average month (X-bar), and subtracts that product from average sales (Y-bar).

Sue now goes to work to compute her regression equation. She goes into Excel and enters her monthly sales data in a table, and computes the averages for sales and month number:

 

|  | Month (X) | Sales (Y) |
| --- | --- | --- |
|  | 1 | $10,000 |
|  | 2 | $11,000 |
|  | 3 | $10,500 |
|  | 4 | $11,500 |
|  | 5 | $12,500 |
|  | 6 | $12,000 |
|  | 7 | $14,000 |
|  | 8 | $13,000 |
|  | 9 | $13,500 |
|  | 10 | $15,000 |
|  | 11 | $14,500 |
|  | 12 | $15,500 |
| Average | 6.5 | $12,750 |

Sue goes ahead and subtracts the respective averages from the X and Y values, and computes the components she needs (the "Product" column is the result of multiplying the values in the first two columns together):

| X minus X-bar | Y minus Y-bar | Product | (X minus X-bar) Squared |
| --- | --- | --- | --- |
| -5.5 | -$2,750 | $15,125 | 30.25 |
| -4.5 | -$1,750 | $7,875 | 20.25 |
| -3.5 | -$2,250 | $7,875 | 12.25 |
| -2.5 | -$1,250 | $3,125 | 6.25 |
| -1.5 | -$250 | $375 | 2.25 |
| -0.5 | -$750 | $375 | 0.25 |
| 0.5 | $1,250 | $625 | 0.25 |
| 1.5 | $250 | $375 | 2.25 |
| 2.5 | $750 | $1,875 | 6.25 |
| 3.5 | $2,250 | $7,875 | 12.25 |
| 4.5 | $1,750 | $7,875 | 20.25 |
| 5.5 | $2,750 | $15,125 | 30.25 |
| Total |  | $68,500 | 143 |

Sue computes b:

b = $68,500/143

= $479.02

Now that Sue knows b, she calculates a:

a = $12,750 – $479.02(6.5)

= $12,750 – $3,113.64

= $9,636.36
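For readers who would rather script the arithmetic than build it in a spreadsheet, here is a minimal Python sketch of the same least-squares calculation (the variable names are mine, not part of Sue's workbook). It reproduces b ≈ 479.02 and a ≈ 9,636.36:

```python
# Least-squares estimates of b (slope) and a (intercept) for Sue's 2009 data
months = list(range(1, 13))                         # X: month numbers 1 through 12
sales = [10000, 11000, 10500, 11500, 12500, 12000,
         14000, 13000, 13500, 15000, 14500, 15500]  # Y: monthly receipts

x_bar = sum(months) / len(months)   # 6.5
y_bar = sum(sales) / len(sales)     # 12,750

# b = sum((X - X-bar)(Y - Y-bar)) / sum((X - X-bar)^2)
numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales))  # 68,500
denominator = sum((x - x_bar) ** 2 for x in months)                        # 143
b = numerator / denominator         # ≈ 479.02
a = y_bar - b * x_bar               # ≈ 9,636.36

print(round(b, 2), round(a, 2))
```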

Hence, setting the error term aside, Sue's least-squares regression equation is:

Ŷ = $9,636.36 + $479.02X

Or, in business terminology:

Forecasted Sales = $9,636.36 + $479.02 * Month number.

This means that each passing month is associated with an average increase in sales of $479.02 for Sue’s CPA firm. How accurately does this regression model predict sales? Sue estimates the error by plugging each month’s number into the equation and then comparing her forecast for that month with the actual sales:

| Month (X) | Sales (Y) | Forecasted Sales | Error |
| --- | --- | --- | --- |
| 1 | $10,000 | $10,115.38 | -$115.38 |
| 2 | $11,000 | $10,594.41 | $405.59 |
| 3 | $10,500 | $11,073.43 | -$573.43 |
| 4 | $11,500 | $11,552.45 | -$52.45 |
| 5 | $12,500 | $12,031.47 | $468.53 |
| 6 | $12,000 | $12,510.49 | -$510.49 |
| 7 | $14,000 | $12,989.51 | $1,010.49 |
| 8 | $13,000 | $13,468.53 | -$468.53 |
| 9 | $13,500 | $13,947.55 | -$447.55 |
| 10 | $15,000 | $14,426.57 | $573.43 |
| 11 | $14,500 | $14,905.59 | -$405.59 |
| 12 | $15,500 | $15,384.62 | $115.38 |

Sue’s actual and forecasted sales appear to be pretty close, except for her July estimate, which is off by a little over $1,000. But does her model predict better than if she simply used average sales as her forecast for each month? To do that, she must compute the error sum of squares, ESS, error. Sue must square the error terms for each observation and sum them up to obtain ESS:

ESS = Σe2

| Error | Squared Error |
| --- | --- |
| -$115.38 | $13,313.61 |
| $405.59 | $164,506.82 |
| -$573.43 | $328,818.04 |
| -$52.45 | $2,750.75 |
| $468.53 | $219,521.74 |
| -$510.49 | $260,599.54 |
| $1,010.49 | $1,021,089.05 |
| -$468.53 | $219,521.74 |
| -$447.55 | $200,303.19 |
| $573.43 | $328,818.04 |
| -$405.59 | $164,506.82 |
| $115.38 | $13,313.61 |
| ESS = | $2,937,062.94 |

Notice Sue’s error sum of squares. This is the error, or unexplained, sum of squared deviations between the forecasted and actual sales. The difference between the total sum of squares (TSS) and the Error Sum of Squares (ESS) is the regression sum of squares, RSS, and that is the sum of squared deviations that are explained by the regression. RSS is also calculated as each forecasted value of sales less the average of sales:

| Forecasted Sales | Average Sales | Regression Error | Reg. Error Squared |
| --- | --- | --- | --- |
| $10,115.38 | $12,750 | -$2,634.62 | $6,941,198.22 |
| $10,594.41 | $12,750 | -$2,155.59 | $4,646,587.24 |
| $11,073.43 | $12,750 | -$1,676.57 | $2,810,898.45 |
| $11,552.45 | $12,750 | -$1,197.55 | $1,434,131.86 |
| $12,031.47 | $12,750 | -$718.53 | $516,287.47 |
| $12,510.49 | $12,750 | -$239.51 | $57,365.27 |
| $12,989.51 | $12,750 | $239.51 | $57,365.27 |
| $13,468.53 | $12,750 | $718.53 | $516,287.47 |
| $13,947.55 | $12,750 | $1,197.55 | $1,434,131.86 |
| $14,426.57 | $12,750 | $1,676.57 | $2,810,898.45 |
| $14,905.59 | $12,750 | $2,155.59 | $4,646,587.24 |
| $15,384.62 | $12,750 | $2,634.62 | $6,941,198.22 |
| RSS = |  |  | $32,812,937.06 |

Sue immediately adds the RSS and the ESS and sees they match the TSS: $35,750,000. She also knows that nearly 33 million of that TSS is explained by her regression model, so she divides her RSS by the TSS:

32,812,937.06 / 35,750,000

=.917 or 91.7%

This quotient, known as the coefficient of determination and denoted R², tells Sue that the month number explains 91.7% of the variation in her monthly sales. Put another way, Sue reduced her total squared forecast error by 91.7% by using this simple model instead of the simple average. As you will find out in subsequent blog posts, maximizing R² isn't the be-all and end-all. In fact, there is still much to do with this model, which will be discussed in next week's Forecast Friday post. But for now, Sue's model seems to have reduced a great deal of error.
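If you want to verify the decomposition yourself, here is a small Python sketch with Sue's rounded coefficients plugged in (an assumption on my part – recomputing a and b exactly would remove the small rounding gap) that computes TSS, ESS, RSS, and R²:

```python
# Verify TSS = ESS + RSS and R² for Sue's regression (coefficients rounded as above)
months = list(range(1, 13))
sales = [10000, 11000, 10500, 11500, 12500, 12000,
         14000, 13000, 13500, 15000, 14500, 15500]
a, b = 9636.36, 479.02                      # rounded estimates from the example
y_bar = sum(sales) / len(sales)             # 12,750

forecasts = [a + b * x for x in months]
tss = sum((y - y_bar) ** 2 for y in sales)                 # total sum of squares
ess = sum((y - f) ** 2 for y, f in zip(sales, forecasts))  # unexplained (error) SS
rss = sum((f - y_bar) ** 2 for f in forecasts)             # explained (regression) SS

print(round(tss), round(ess), round(rss))   # ≈ 35,750,000; ≈ 2,937,000; ≈ 32,813,000
print(round(rss / tss, 3))                  # R² ≈ 0.917
```

Because the coefficients are rounded to two decimals, ESS and RSS will differ from Sue's figures by a few dollars, but the R² of roughly 0.917 is unchanged.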

It is important to note that while each month does seem to be related to sales, the passing months do not cause the increase in sales. Correlation does not mean causation. There could be something behind the scenes (e.g., Sue’s advertising, or the types of projects she works on, etc.) that is driving the upward trend in her sales.

Using the Regression Equation to Forecast Sales

Now Sue can use the same model to forecast sales for January 2010, February 2010, and so on. She has her equation, so since January 2010 is period 13, she plugs 13 into the equation for X and gets a forecast of $15,863.64; for February (period 14), she gets $16,342.66.

Recap and Plan for Next Week

You have now learned the basics of simple regression analysis. You have learned how to estimate the parameters for the regression equation, how to measure the improvement in accuracy from the regression model, and how to generate forecasts. Next week, we will be checking the validity of Sue’s equation, and discussing the important assumptions underlying regression analysis. Until then, you have a basic overview of what regression analysis is.


Charities are Spying on You – But That’s Not Necessarily a Bad Thing!

May 26, 2010

The June 2010 issue of SmartMoney magazine contained an interesting article, “Are Charities Spying On You?,” which discussed the different ways nonprofit organizations are trying to find out information – available from public sources – on current and prospective donors. As one who has worked in the field of data mining and predictive analytics, I found the article interesting in large part because of how well the nonprofit sector has made use of these very techniques in designing their campaigns, solicitations, and programming.

At first glance, it can seem frightening what charities can learn about you. For instance, the article mentions how some charities’ prospect-research departments look at LinkedIn profiles, survey your salary history, and even use satellite images to get information on the home in which you live. And there is a wealth of information out there about us: Zillow.com gives info about the value of our homes and those around it; if you write articles or letters to the editor of your newspaper, online versions can often be found on Google; buy or sell any real estate? That too gets published in the online version of the newspaper; and online bridal and baby shower registries, graduation and wedding announcements, and any other news are fair game. And your shopping history! If you buy online or through a catalog, your name ends up on mailing lists that charities buy. Face it, there’s a lot of information about us that is widely and publicly available.

But is this so terrible? For the most part, I don’t think so. Surely, it’s bad if that information is being used against you. But think of the ways this data mining proves beneficial:

Customization

Let’s assume that you and I are both donors to the Republican National Committee. That suggests we’re both politically active and politically conservative. But are we engaged with the RNC in the same way? Most likely not. You might have donated to the RNC because you’re a wealthy individual who values low taxes and opposes a national health care plan; I might have donated because I am a social conservative who wants prayer in public schools, favors school choice, and opposes abortion. By seeking out information on us, the RNC can tailor its communications in a manner that speaks to each of us individually, sending you information about how it’s fighting proposed tax hikes in various states, and sending me information about school choice initiatives. In this way, the RNC maintains its relevance to each of us.

In addition, it’s very likely, in this example, that you’re donating a lot more money to the RNC than I am. Hence, that would likely lead the RNC to offer you special perks, such as free passes for you and a guest to meet various candidates or attend special luncheons or events. For me, I might at best be given an autographed photo of the event – in exchange for a donation of course – or an invite to the same events, but with a donation of a lot of money requested. I might get information about when the next Tea Party rally in my area will be held. Or even a brief newsletter. One can argue that the treatment you’re getting vs. that of what I’m getting is unfair. However, think of it like this: at a casino, people who gamble regularly and heavily are given all sorts of complimentary perks: drinks, food, a host to attend to their needs, and even special reduced rate stays. That’s because these gamblers are making so much money for the casino, that the cost of these “comps” is small in comparison. In addition, the casino wants to make it more fun for these gamblers to lose money, so that they’ll keep on playing. In short, the special treatment you’re getting is something you’re paying for, if indirectly. I’m getting less because I’m giving less; you’re getting more because you’re giving more. And the charity will give you more to keep you giving more!

Reduced Waste

Before direct marketing got so sophisticated, mass marketing was the only tactic. If you had a product to sell, you sent the same solicitation to thousands, if not millions, of people and hoped for a 1-2% response rate. Most people simply threw your solicitation in the garbage when it came in the mail. Many recipients had no need for the item you were selling or the appeal for which you were soliciting, and disregarded your piece. As a result, lots of paper was wasted, and the phrase "junk mail" came into existence. And if you used follow-up methods, such as phone calls after the mailing, qualifying the leads got costly simply because of the labor involved.

Now, with targeted marketing and with list rental, sales, and sharing, charities can build predictive models that estimate each current and prospective donor's likelihood of responding to a promotion. As a result, the charity doesn't need to send out nearly as large a mailing; it can mail solely to those with the best chance of responding, reducing the paper, print, and postage involved, not to mention the labor costs of producing the piece and staffing the outbound call center. In short, the charity's data mining is helping the environment, reducing overhead, and increasing the top and bottom lines.

Better Programming

By knowing more about you, the charity can know what makes you “tick,” so that it can come up with programs that fit your needs. Even if you’re not a large donor, if you and other donors feel strongly about certain issues, or value certain programs, the charity can develop programs that are suitable to its members at large. And while many larger donors may be granted special privileges, their large donations can help fund the programs of those who donate less. Everybody wins.

Not bad at all

The data mining tactics charities use aren't bad. People don't want to be bombarded with solicitations in which they see no value for themselves. Data mining makes it possible to give you an offer that is relevant to your situation, to do so cost-effectively and resource-efficiently, and to design programs from which you're likely to benefit. It is important to note that while major donors get several great perks, charities must not ignore those whose donations are smaller, for two reasons: first, they have the potential to become major donors; and second, because of their smaller donations, it's very likely their frequency of giving is greater. This can mean a great stream of gifts to the charity over time. Hence, charities should do things that show these donors they're appreciated – and, quite often, this too is accomplished through data mining.

We welcome replies to our blog post!

Using Statistics to Evaluate a Promotion

May 25, 2010

Marketing – as much as cash flow – is the lifeblood of any business. No matter how good your product or service may be, it's worthless if you can't get it in front of your customers and get them to buy it. So all businesses, large and small, must engage in marketing. And we see countless types of marketing promotions or tactics being tried: radio and TV commercials, magazine and newspaper advertisements, public relations, coupons, email blasts, and so forth. But are our promotions working? The merchant John Wanamaker, often dubbed the father of modern advertising, is said to have remarked, "Half the money I spend on advertising is wasted; the trouble is I don't know which half."

Some basic statistics can help you evaluate the effectiveness of your marketing and take away much of the mystique Wanamaker complained about. When deciding whether to run a promotion, managers and business owners have no way of knowing whether it will succeed; and in today's economy, budgets are still tight. The cost of rolling out a full promotion can wipe out an entire marketing budget if it proves to be a fiasco. This is why many businesses run a test before doing a complete rollout. Testing helps reduce the amount of uncertainty involved in an all-out campaign.

Quite often, large companies need to choose between two or more competing campaigns for rollout. But how do they know which will be effective? Consider the example of Jenny Kaplan, owner of K-Jen, a New Orleans-style restaurant. K-Jen serves up a tasty jambalaya entrée, which is priced at $10.00. Jenny believes that the jambalaya is a draw to the restaurant and believes that by offering a discount, she can increase the average amount of the table check. Jenny decides to issue coupons via email to patrons who have opted-in to receive such promotions. She wants to knock a dollar off the price of the jambalaya as the offer, but doesn’t know whether customers would respond better to an offer worded as “$1.00 off” or as “10% off.” So, Jenny decides to test the two concepts.

Jenny goes to her database of nearly 1,000 patrons and randomly selects 200 patrons. She decides to send half of those a coupon for $1.00 off for jambalaya, and the other half a coupon for 10% off. When the coupon offer expires 10 days later, Jenny finds that 10 coupons were redeemed for each offer – a redemption rate of 10% each. Jenny observes that either wording will get the same number of people to respond. But she wonders which offer generated the largest table check. So she looks at the guest checks to which the coupons were stapled. She notices the following:

Guest Check Amounts

| $1.00 off | 10% Off |
| --- | --- |
| $38.85 | $50.16 |
| $36.97 | $54.44 |
| $35.94 | $32.20 |
| $54.17 | $32.69 |
| $68.18 | $51.09 |
| $49.47 | $46.18 |
| $51.39 | $57.72 |
| $32.72 | $44.30 |
| $22.59 | $59.29 |
| $24.13 | $22.94 |

Jenny quickly computes the average for each offer. The “$1.00 off” coupon generated an average table check of $41.44; the “10% off” coupon generated an average of $45.10. At first glance, it appears that the 10% off promotion generated a higher guest check. But is that difference meaningful, or is it due to chance? Jenny needs to do further analysis.

Hypothesis Testing

How does Jenny determine whether the 10% off coupon really did better than the $1.00 off coupon? She can use statistical hypothesis testing, a structured analytical method for comparing the difference between two groups – in this case, two promotions. Jenny starts her analysis by formulating two hypotheses: a null hypothesis, which states that there is no difference in the average check amount between the two offers, and an alternative hypothesis, which states that there is, in fact, a difference in the average check amount between the two offers. The null hypothesis is often denoted H0, and the alternative hypothesis HA. Jenny also refers to the $1.00 off offer as Offer #1 and the 10% off offer as Offer #2. She wants to compare the means of the two offers, which are denoted μ1 and μ2, respectively. Jenny writes down her two hypotheses:

H0: The average guest check amount for the two offers is equal.

HA: The average guest check amount for the two offers is not equal.

Or, more succinctly:

H0: μ1 = μ2

HA: μ1 ≠ μ2

 

Now, Jenny is ready to go to work. Note that the symbol μ denotes the population mean she wants to estimate. Because Jenny ran her test on a portion – a sample – of her database, the averages she computed are sample averages, denoted x̄. As we stated earlier, the average table checks for the "$1.00 off" and "10% off" offers were x̄1 = $41.44 and x̄2 = $45.10, respectively. Jenny needs to approximate μ using x̄. She must also compute the sample standard deviation, s, for each offer.

Computing the Sample Standard Deviation

To compute the sample standard deviation, Jenny must subtract the mean of a particular offer from each of its check amounts in the sample; square each difference; sum them up; divide by the number of observations minus 1 (that is, 9); and then take the square root:

$1.00 Off

| Actual Table Check | Average Table Check | Difference | Difference Squared |
| --- | --- | --- | --- |
| $38.85 | $41.44 | -$2.59 | $6.71 |
| $36.97 | $41.44 | -$4.47 | $19.99 |
| $35.94 | $41.44 | -$5.50 | $30.26 |
| $54.17 | $41.44 | $12.73 | $162.03 |
| $68.18 | $41.44 | $26.74 | $714.97 |
| $49.47 | $41.44 | $8.03 | $64.46 |
| $51.39 | $41.44 | $9.95 | $98.98 |
| $32.72 | $41.44 | -$8.72 | $76.06 |
| $22.59 | $41.44 | -$18.85 | $355.36 |
| $24.13 | $41.44 | -$17.31 | $299.67 |
|  |  | Total | $1,828.50 |
|  |  | S²1 = | $203.17 |
|  |  | S1 = | $14.25 |

 

10% Off

| Actual Table Check | Average Table Check | Difference | Difference Squared |
| --- | --- | --- | --- |
| $50.16 | $45.10 | $5.06 | $25.59 |
| $54.44 | $45.10 | $9.34 | $87.22 |
| $32.20 | $45.10 | -$12.90 | $166.44 |
| $32.69 | $45.10 | -$12.41 | $154.03 |
| $51.09 | $45.10 | $5.99 | $35.87 |
| $46.18 | $45.10 | $1.08 | $1.16 |
| $57.72 | $45.10 | $12.62 | $159.24 |
| $44.30 | $45.10 | -$0.80 | $0.64 |
| $59.29 | $45.10 | $14.19 | $201.33 |
| $22.94 | $45.10 | -$22.16 | $491.11 |
|  |  | Total | $1,322.63 |
|  |  | S²2 = | $146.96 |
|  |  | S2 = | $12.12 |

 

Notice the notation S². That quantity is the variance. The variance and the standard deviation are used to measure the average distance between each data point and the mean. When data are normally distributed, about 95% of all observations fall within two standard deviations of the mean (more precisely, 1.96 standard deviations). Hence, approximately 95% of the guest checks for the $1.00 off offer should fall between $41.44 ± 1.96*($14.25), or between $13.51 and $69.37; all ten fall within this range. For the 10% off offer, about 95% should fall between $45.10 ± 1.96*($12.12), or between $21.34 and $68.86; all 10 observations also fall within this range.
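As a quick check on the arithmetic, here is a minimal Python sketch of the sample standard deviation calculation for the "$1.00 off" checks (note the division by n – 1):

```python
# Sample mean, variance, and standard deviation for the "$1.00 off" guest checks
from math import sqrt

checks = [38.85, 36.97, 35.94, 54.17, 68.18,
          49.47, 51.39, 32.72, 22.59, 24.13]

mean = sum(checks) / len(checks)                                     # ≈ 41.44
variance = sum((x - mean) ** 2 for x in checks) / (len(checks) - 1)  # ≈ 203.17
std_dev = sqrt(variance)                                             # ≈ 14.25

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

Swapping in the "10% off" checks gives the $45.10 mean, $146.96 variance, and $12.12 standard deviation shown above.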

Degrees of Freedom and Pooled Standard Deviation

Jenny notices two things immediately: first, that the 10% off coupon has the higher sample average, and second, that each individual table check is closer to its mean than is the case for the $1.00 off coupon. Also notice that when computing the sample standard deviation for each offer, Jenny divided by 9, not 10. Why? Because she was estimating the population standard deviation. Since samples are subject to error, we must account for that. Each observation gives us information about the population's actual values; however, Jenny had to make an estimate based on that sample, so she gives up one observation to account for the sampling error – that is, she loses a degree of freedom. In this example, Jenny has 20 total observations; since she estimated the population standard deviation for both offers, she loses two degrees of freedom, leaving her with 18 (10 + 10 – 2).

Knowing the remaining degrees of freedom, Jenny must pool the standard deviations, weighting each by its degrees of freedom. (The weighting would matter even more if the sample sizes of the two offers were not equal.) The pooled variance is given by:

S²p = ((n1 – 1)S²1 + (n2 – 1)S²2) / (n1 + n2 – 2)

FYI – n is simply the sample size. Jenny then computes the pooled variance and standard deviation:

S²p = ((9 * $203.17) + (9 * $146.96)) / (10 + 10 – 2)

= ($1,828.53 + $1,322.64)/18

= $3,151.17/18

= $175.07

Now take the square root: $13.23

Hence, the pooled standard deviation is $13.23

Computing the t-Test Statistic

Now the fun begins. Jenny knows the sample means of the two offers; she knows the hypothesized difference between the two population means (zero, since the null hypothesis says they are equal); she knows the pooled standard deviation; she knows the sample sizes; and she knows the degrees of freedom. Jenny must now calculate the t-test statistic. The t-test statistic, or t-value, represents the number of estimated standard errors by which the observed difference in sample means departs from the hypothesized difference. The t-value is computed as follows:

t = ((x̄1 – x̄2) – (μ1 – μ2)) / (Sp * SQRT(1/n1 + 1/n2))

So Jenny sets to work computing her t-Test Statistic:

t = (($41.44 – $45.10) – 0) / ($13.23 * SQRT(1/10 + 1/10))

= -$3.66 / ($13.23 * SQRT(1/5))

= -$3.66 / ($13.23 * 0.447)

= -$3.66 / $5.92

= -0.62

This t-statistic gives Jenny a basis for testing her hypothesis. Jenny’s t-statistic indicates that the difference in sample table checks between the two offers is 0.62 standard errors below the hypothesized difference of zero. We now need to determine the critical t – the value that we get from a t-distribution table that is available in most statistics textbooks and online. Since we are estimating with a 95% confidence interval, and since we must account for a small sample, our critical t-value is adjusted slightly from the 1.96 standard deviations from the mean. For 18 degrees of freedom, our critical t is 2.10. The larger the sample size, the closer to 1.96 the critical t would be.
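To see the whole test end to end, here is a minimal Python sketch of the pooled two-sample t-test on Jenny's data (in practice, a call such as scipy.stats.ttest_ind(offer1, offer2) would return the same t-statistic along with its p-value):

```python
# Pooled two-sample t-test for the "$1.00 off" vs. "10% off" guest checks
from math import sqrt

offer1 = [38.85, 36.97, 35.94, 54.17, 68.18, 49.47, 51.39, 32.72, 22.59, 24.13]
offer2 = [50.16, 54.44, 32.20, 32.69, 51.09, 46.18, 57.72, 44.30, 59.29, 22.94]

def mean_and_variance(data):
    m = sum(data) / len(data)
    v = sum((x - m) ** 2 for x in data) / (len(data) - 1)   # divide by n - 1
    return m, v

m1, v1 = mean_and_variance(offer1)   # ≈ 41.44, 203.17
m2, v2 = mean_and_variance(offer2)   # ≈ 45.10, 146.96
n1, n2 = len(offer1), len(offer2)

pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # ≈ 175.07
t = (m1 - m2) / (sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2))     # ≈ -0.62

print(round(t, 2))   # compare with the critical t of ±2.10 at 18 degrees of freedom
```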

So, does Jenny Accept or Reject her Null Hypothesis (Translation: Is the “10% Off” Offer Better than the “$1.00 Off” Offer)?

Jenny now has all the information she needs to determine whether one offer worked better than the other. What does the critical t of 2.10 mean? If Jenny’s t-statistic is greater than 2.10, or (since one offer can be lower than the other), less than -2.10, then she would reject her null hypothesis, as there is sufficient evidence to suggest that the two means are not equal. Is that the case?

Jenny’s t-statistic is -0.62, which is between -2.10 and 2.10. Hence, it is within the parameters. Jenny should not reject H0, since there is not enough evidence to suggest that one offer was better than the other at generating higher table checks. In fact, there’s nothing to say that the difference between the two offers is due to anything other than chance.

What Does Jenny Do Now?

Basically, Jenny can conclude that there's not enough evidence that the "$1.00 off" coupon was better or worse than the "10% off" coupon at generating higher table check amounts. This does not mean that our hypotheses were true or false, just that there was not enough statistical evidence to say so. In this case, we did not accept the null hypothesis, but rather, failed to reject it. Jenny can do a few things:

  1. She can run another test, and see if the same phenomenon holds.
  2. Jenny can accept the fact that both offers work equally well, and compare their overall average table checks to those of customers who ordered jambalaya without a coupon during the time the offer ran; if the coupons generated average table checks that were higher (using the hypothesis testing procedure outlined above) than those of full-price customers, then she may choose to roll out a complete promotion using either or both of the offers described above.
  3. Jenny may decide that neither coupon offer raised average check amounts and choose not to do a full rollout after all.

So Why am I Telling You This?

The purpose of this blog post was to walk you step by step through how you can use a simple concept like the t-test to judge the performance of two promotion concepts. Although a spreadsheet like Excel can run this test in seconds, I wanted to walk you through the theory in layman's terms, so that you can grasp it and then apply it to your business. Analysights is in the business of helping companies – large and small – succeed at marketing, and this blog post is one ingredient in the recipe for your marketing success. If you would like some assistance in setting up a promotion test or in evaluating the effectiveness of a campaign, feel free to contact us at www.analysights.com.

 

Forecast Friday Topic: Double Exponential Smoothing

May 20, 2010

(Fifth in a series)

We pick up on our discussion of exponential smoothing methods, focusing today on double exponential smoothing. Single exponential smoothing, which we discussed in detail last week, is ideal when your time series is free of seasonal or trend components, which create patterns that your smoothing equation would miss due to lags. Single exponential smoothing produces forecasts that exceed actual results when the time series exhibits a decreasing linear trend, and forecasts that trail actual results when the time series exhibits an increasing trend. Double exponential smoothing takes care of this problem.

Two Smoothing Constants, Three Equations

Recall the equation for single exponential smoothing:

Ŷt+1 = αYt + (1-α) Ŷt

Where: Ŷt+1 represents the forecast value for period t + 1

Yt is the actual value of the current period, t

Ŷt is the forecast value for the current period, t

and α is the smoothing constant, or alpha, 0≤ α≤ 1

To account for a trend component in the time series, double exponential smoothing incorporates a second smoothing constant, beta, or β. Now, three equations must be used to create a forecast: one to smooth the time series, one to smooth the trend, and one to combine the two equations to arrive at the forecast:

Ct = αYt + (1-α)(Ct-1 + T t-1)

Tt = β(Ct – Ct-1) + (1 – β)T t-1

Ŷt+1 = Ct + Tt

All symbols appearing in the single exponential smoothing equation represent the same in the double exponential smoothing equation, but now β is the trend-smoothing constant (whereas α is the smoothing constant for a stationary – constant – process) also between 0 and 1; Ct is the smoothed constant process value for period t; and Tt is the smoothed trend value for period t.

As with single exponential smoothing, you must select starting values for Ct and Tt, as well as values for α and β. Recall that these processes are judgmental, and constants closer to a value of 1.0 are chosen when less smoothing is desired (and more weight placed on recent values) and constants closer to 0.0 when more smoothing is desired (and less weight placed on recent values).

An Example

Let’s assume you’ve got 12 months of sales data, shown in the table below:

| Month t | Sales Yt |
| --- | --- |
| 1 | 152 |
| 2 | 176 |
| 3 | 160 |
| 4 | 192 |
| 5 | 220 |
| 6 | 272 |
| 7 | 256 |
| 8 | 280 |
| 9 | 300 |
| 10 | 280 |
| 11 | 312 |
| 12 | 328 |

You want to see if there is any discernable trend, so you plot your sales on the chart below:

The time series exhibits an increasing trend. Hence, you must use double exponential smoothing. You must first select your initial values for C and T. One way to do that is to again assume that the first value is equal to its forecast. Using that as the starting point, you set C2 = Y1, or 152. Then you subtract Y1 from Y2 to get T2: T2 = Y2 – Y1 = 24. Hence, at the end of period 2, your forecast for period 3 is 176 (Ŷ3 = 152 + 24).

Now you need to choose α and β. For the purposes of this example, we will choose an α of 0.20 and a β of 0.30. Actual sales in period 3 were 160, and our constant-smoothing equation is:

C3 = 0.20(160) + (1 – 0.20)(152 + 24)

= 32 + 0.80(176)

= 32 + 140.8

= 172.8

Next, we compute the trend value with our trend-smoothing equation:

T3 = 0.30(172.8 – 152) + (1 – 0.30)(24)

= 0.30(20.8) + 0.70(24)

= 6.24 + 16.8

=23.04

Hence, our forecast for period 4 is:

Ŷ4 = 172.8 + 23.04

= 195.84

Then, carrying out your forecasts for the 12-month period, you get the following table:

     

Alpha = 0.2, Beta = 0.3

| Month t | Sales Yt | Ct | Tt | Ŷt | Absolute Deviation |
| --- | --- | --- | --- | --- | --- |
| 1 | 152 |  |  |  |  |
| 2 | 176 | 152.00 | 24.00 | 152.00 |  |
| 3 | 160 | 172.80 | 23.04 | 176.00 | 16.00 |
| 4 | 192 | 195.07 | 22.81 | 195.84 | 3.84 |
| 5 | 220 | 218.31 | 22.94 | 217.88 | 2.12 |
| 6 | 272 | 247.39 | 24.78 | 241.24 | 30.76 |
| 7 | 256 | 268.94 | 23.81 | 272.18 | 16.18 |
| 8 | 280 | 290.20 | 23.05 | 292.75 | 12.75 |
| 9 | 300 | 310.60 | 22.25 | 313.25 | 13.25 |
| 10 | 280 | 322.28 | 19.08 | 332.85 | 52.85 |
| 11 | 312 | 335.49 | 17.32 | 341.36 | 29.36 |
| 12 | 328 | 347.85 | 15.83 | 352.81 | 24.81 |
|  |  |  |  | MAD = | 20.19 |

 

Notice a couple of things: the absolute deviation is the absolute value of the difference between Yt and Ŷt. Note also that, beginning with period 3, Ŷt is really the sum of the C and T computed in the prior period – period 3's constant and trend forecasts were generated at the end of period 2, and so on through period 12. The mean absolute deviation (MAD) has been computed for you. As with our explanation of single exponential smoothing, you need to experiment with the smoothing constants to find the balance that yields the most accurate forecast – the lowest possible MAD.

Now, we need to forecast for period 13. That’s easy. Add C12 and T12:

Ŷ13 = 347.85 + 15.83

= 363.68
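If you would like to reproduce the table and the period-13 forecast programmatically, here is a minimal Python sketch of the double exponential smoothing recursion, using the same starting values (C2 = Y1, T2 = Y2 – Y1) and constants (α = 0.2, β = 0.3) as the example; the variable names are illustrative:

```python
# Double exponential smoothing with the example's starting values and constants
sales = [152, 176, 160, 192, 220, 272, 256, 280, 300, 280, 312, 328]
alpha, beta = 0.20, 0.30

C = sales[0]              # C2 = Y1 = 152
T = sales[1] - sales[0]   # T2 = Y2 - Y1 = 24

abs_devs = []
for t in range(2, len(sales)):            # periods 3 through 12
    forecast = C + T                      # forecast made at the end of the prior period
    abs_devs.append(abs(sales[t] - forecast))

    C_new = alpha * sales[t] + (1 - alpha) * (C + T)   # smooth the constant process
    T = beta * (C_new - C) + (1 - beta) * T            # smooth the trend
    C = C_new

print(round(C + T, 2))                            # period-13 forecast, ≈ 363.68
print(round(sum(abs_devs) / len(abs_devs), 2))    # MAD, ≈ 20.19
```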

A chart comparing actual vs. forecasted sales shows that, as with single exponential smoothing, your forecasted curve is smoother than your actual curve. Notice also how small the gaps are between the actual and forecasted curves. The fit's not bad.

Exponential Smoothing Recap

Now let’s recap our discussion on exponential smoothing:

  1. Exponential smoothing methods are recursive, that is, they rely on all observations in the time series. The weight on each observation diminishes exponentially the more distant in the past it is.
  2. Smoothing constants are used to assign weights – between 0 and 1 – to the most recent observations. The closer the constant is to 0, the more smoothing that occurs and the lighter the weight assigned to the most recent observation; the closer the constant is to 1, the less smoothing that occurs and the heavier the weight assigned to the most recent observation.
  3. When no discernable trend is exhibited in the data, single exponential smoothing is appropriate; when a trend is present in the time series, double exponential smoothing is necessary.
  4. Exponential smoothing methods require you to generate starting forecasts for the first period in the time series. Deciding on those initial forecasts, as well as on the values of your smoothing constants – alpha and beta – is somewhat arbitrary. You need to base your judgments on your experience in the business, as well as some experimentation.
  5. Exponential smoothing models do not forecast well when the time series pattern (e.g., level of sales) is suddenly, drastically, and permanently altered by some event or change of course or action. In these instances, a new model will be necessary.
  6. Exponential smoothing methods are best used for short-term forecasting.

Next Week’s Forecast Friday Topic: Regression Analysis (Our Series within the Series!)

Next week, we begin a multi-week discussion of regression analysis. We will be setting up the next few weeks with a discussion of the principles of ordinary least squares regression (OLS), and then discussions of its use as a time-series forecasting approach, and later as a causal/econometric approach. During the course of the next few Forecast Fridays, we will discuss the issues that occur with regression: specification bias, autocorrelation, heteroscedasticity, and multicollinearity, to name a few. There will be some discussions on how to detect – and correct – these violations. Once the regression analysis miniseries is complete, we will be set up to discuss ARMA and ARIMA models, which will be written by guest bloggers who are well-experienced in those approaches. We know you’ll be very pleased with the weeks ahead!

Still don’t know why our Forecast Friday posts appear on Thursday? Find out at: http://tinyurl.com/26cm6ma

New York Life: How Traditional Approach Made for Great Marketing

May 19, 2010

This week, I got the May 24 issue of Fortune Magazine and skipped to this issue’s profile of one of the “World’s Most Admired Companies.” This time it was New York Life, the nation’s largest mutual life insurer. As I read the article, I was pretty intrigued by the company’s operation: very conservative. While New York Life is owned by policyholders, it didn’t follow the lead of other major insurers to invest aggressively for the sake of paying generous dividends. And the insurer chose to remain neutral in a price war on some lines of insurance, even though that meant losing some business in 2008. New York Life also invests in its own captive sales force – 12,000 agents strong – a practice so cost prohibitive to many publicly-traded insurers that they’re forced to rely on a network of banks, independent agents, and broker-dealers to push their insurance.

Fewer and fewer of us want to be viewed as traditional or passé, so one would think that New York Life's conservative approach would have cost it a great deal of business. And in the go-go years, that seemed to be the case. But now, two years after a near meltdown in financial services, New York Life appears to have been vindicated: it had a record $15 billion surplus of cash in 2009; it continued paying policyholder dividends for the 156th year; and it recorded an increase of 40,000 policies sold in 2009. Even better, it didn't have to raise premium rates like many of its price-war competitors.

Just look at the effective marketing system New York Life has built for itself. Recall the components of the marketing mix: product, price, positioning, promotion, and distribution. It's easy to discern from the article that New York Life got all of these components right. While New York Life also sells mutual funds, long-term care insurance, and annuities, it has neither forgotten nor abandoned its core product: life insurance. In fact, the company still emphasizes it as an important part of a family's protection. Because of its traditional investment style, New York Life's pricing is competitive. In terms of promotion, New York Life turned its traditional operation into a distinct advantage, boosting ad spending by 24% and trumpeting how its conservative style was appropriate for these economic times. Distribution is handled by New York Life's own captive agent force – the only agents for New York Life, all New York Life, and nothing but New York Life. Every New York Life agent I've met knows its products backward and forward, and knows quickly which ones are most suitable for prospective and existing customers. As for positioning, New York Life can market itself as the kind of insurance company that gives its policyholders great peace of mind: policyholders can sleep at night knowing that dividends will be paid consistently, that premiums will remain stable, that they have the right insurance, and that the company will be around to pay out when they need to make a claim.

I am not a New York Life policyholder. I came very close a couple of years ago, but another company had a policy that was better suited to my needs – and I found it hard to turn down the New York Life agent who had been working with me to find the right policy. But when my insurance needs change, New York Life will be on my short list, a further testament to its marketing success: make a great impression on a prospective customer so that, even if he or she doesn't buy now, there's a good chance he or she will in the future.