## Archive for June, 2010

### Forecast Friday Topic: Multiple Regression Analysis

June 17, 2010

(Ninth in a series)

Quite often, when we try to forecast sales, more than one variable is involved. Sales depend on how much advertising we do, the price of our products, the prices of competitors’ products, the time of year (if our product is seasonal), and the demographics of our buyers. And there can be many more factors. Hence, we need to measure the impact of all the relevant variables that we know drive our sales or other dependent variable. That brings us to multiple regression analysis. Because of its complexity, we will be spending the next several weeks discussing multiple regression analysis in easily digestible parts. Multiple regression is a highly useful technique, but it is quite easy to forget if not used often.

Another thing to note: regression analysis is used for both time series and cross-sectional analysis. Time series is what we have focused on all along. Cross-sectional analysis involves using regression on static data (such as predicting how much money a person will spend on a car based on income, race, age, etc.). We will use examples of both in our discussions of multiple regression.

Determining Parameter Estimates for Multiple Regression

When it comes to deriving the parameter estimates in a multiple regression, the process gets both complicated and tedious, even if you have just two independent variables. We strongly advise you to use the regression features of MS-Excel, or a statistical analysis tool like SAS, SPSS, or MINITAB. In fact, we will not work out the derivation of the parameters with the data sets, but will provide you the results. You are free to run the data we provide on your own to replicate the results we display. I do, however, want to show you the equations for computing the parameter estimates for a three-variable model (two independent variables and one dependent variable), and point out something very important.

Let’s assume that sales is your dependent variable, Y, and advertising expenditures and price are your independent variables, X1 and X2, respectively. Also, the coefficients – your parameter estimates – will have subscripts corresponding to their respective independent variables. Hence, your model will take on the form:

Yi = α + β1X1i + β2X2i + εi

Now, how do you go about computing α, β1, and β2? The process is similar to that of a two-variable model, but a little more involved. Take a look:

β1 = (Σx2i² · Σx1iyi – Σx1ix2i · Σx2iyi) / (Σx1i² · Σx2i² – (Σx1ix2i)²)

β2 = (Σx1i² · Σx2iyi – Σx1ix2i · Σx1iyi) / (Σx1i² · Σx2i² – (Σx1ix2i)²)

α = Ȳ – β1X̄1 – β2X̄2

Here, the lowercase x1i, x2i, and yi denote each observation’s deviation from its variable’s mean.

The subscript “i” represents the individual observation. In time series, the subscript can also be represented with a “t”.

What do you notice about the formulas for computing β1 and β2? First, you notice that both independent variables, X1 and X2, are included in the calculation for each coefficient. Why is this? Because when two or more independent variables are used to estimate the dependent variable, the independent variables themselves are likely to be related to one another to some degree – and in practice, they usually are (though, as we’ll see below, the relationship must not be exact). If either β1 or β2 turned out to be zero, then simple regression would be appropriate. However, if we omit one or more independent variables from the model that are related to those variables in the model, we run into serious problems, namely:
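Those closed-form estimates for a two-predictor model can be sketched in code. This is a minimal illustration – not the post’s own example – using a small made-up data set whose true relationship is Y = 1 + 2·X1 – 0.5·X2:

```python
def two_variable_ols(X1, X2, Y):
    """Closed-form OLS estimates for a model with two independent variables,
    computed from deviations about the means."""
    n = len(Y)
    m1, m2, my = sum(X1) / n, sum(X2) / n, sum(Y) / n
    # Deviations from the means (the lowercase x1, x2, y in the formulas)
    x1 = [v - m1 for v in X1]
    x2 = [v - m2 for v in X2]
    y = [v - my for v in Y]
    # Sums of squares and cross-products
    S11 = sum(v * v for v in x1)
    S22 = sum(v * v for v in x2)
    S12 = sum(p * q for p, q in zip(x1, x2))
    S1y = sum(p * q for p, q in zip(x1, y))
    S2y = sum(p * q for p, q in zip(x2, y))
    denom = S11 * S22 - S12 * S12  # zero only if X1 and X2 are exactly collinear
    b1 = (S22 * S1y - S12 * S2y) / denom
    b2 = (S11 * S2y - S12 * S1y) / denom
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

a, b1, b2 = two_variable_ols([1, 2, 3, 4, 5], [2, 1, 4, 3, 6],
                             [2.0, 4.5, 5.0, 7.5, 8.0])
print(round(a, 4), round(b1, 4), round(b2, 4))  # 1.0 2.0 -0.5
```

Note that both X1 and X2 appear in the numerator of each slope estimate, which is the point made above.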

Specification Bias (Regression Assumptions Revisited)

Recall from last week’s Forecast Friday discussion on regression assumptions that 1) our equation must correctly specify the true regression model, namely that all relevant variables and no irrelevant variables are included in the model, and 2) the independent variables must not be correlated with the error term. If either of these assumptions is violated, the parameter estimates you get will be biased. Looking at the above equations for β1 and β2, we can see that if we excluded one of the independent variables, say X2, from the model, the value derived for β1 would be incorrect, because X1 has some relationship with X2. Moreover, X2’s values are likely to be absorbed into the error terms, and because of its relationship with X1, X1 will be correlated with the error term, violating the second assumption above. Hence, you will end up with an incorrect, biased estimator for your regression coefficient, β1.

Omitted Variables are Bad, but Excessive Variables Aren’t Much Better

Since omitting relevant variables can lead to biased parameter estimates, many analysts have a tendency to include any variable that might have any chance of affecting the dependent variable, Y. This is also bad. Additional variables mean that you need to estimate more parameters, which reduces your model’s degrees of freedom and the efficiency (trustworthiness) of your parameter estimates. Generally, for each variable – both dependent and independent – you are considering, you should have at least five data points. So, for a model with three independent variables (four variables in all), your data set should have at least 20 observations.
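The arithmetic of that rule of thumb is simple enough to express directly (a sketch of the rule only – it is a guideline, not a statistical law):

```python
def min_observations(num_independent_vars):
    """Rule of thumb: at least five data points per variable,
    counting the dependent variable along with the independents."""
    return 5 * (num_independent_vars + 1)

# Three independent variables plus one dependent variable -> 20 observations
print(min_observations(3))  # 20
```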

Another Important Regression Assumption

One last thing about multiple regression analysis – another assumption, which I deliberately left out of last week’s discussion, since it applies exclusively to multiple regression:

No independent variable, or combination of independent variables, should have an exact linear relationship with any other independent variable in the model.

OK, so what does this mean? Let’s assume you’re doing a model to forecast the effect of temperature on the speed at which ice melts. You use two independent variables: Celsius temperature and Fahrenheit temperature. What’s the problem here? There is a perfect linear relationship between these two variables. Every time you use a particular value of Fahrenheit temperature, you will get the same value of Celsius temperature. In this case, you will end up with multicollinearity, an assumption violation that results in inefficient parameter estimates. A relationship between independent variables need not be perfectly linear for multicollinearity to exist. Highly correlated variables can do the same thing. For example, independent variables such as “Husband Age” and “Wife Age,” or “Home Value” and “Home Square Footage” are examples of independent variables that are highly correlated.
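The Fahrenheit/Celsius example can be checked directly: one scale is an exact linear transformation of the other, so their correlation is exactly 1, and regression cannot separate their effects. A quick sketch:

```python
import math

fahrenheit = [32.0, 50.0, 68.0, 86.0, 104.0]
# Celsius is an exact linear function of Fahrenheit: C = (F - 32) * 5/9
celsius = [(f - 32.0) * 5.0 / 9.0 for f in fahrenheit]

def correlation(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = correlation(fahrenheit, celsius)
print(round(r, 10))  # 1.0 -- perfect collinearity
```

With highly (but not perfectly) correlated pairs like “Husband Age” and “Wife Age,” r would fall just short of 1, and the estimates would be inefficient rather than impossible to compute.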

You want to be sure that you do not put variables in the model that need not be there, because doing so could lead to multicollinearity.

Now Can We Get Into Multiple Regression????

Wasn’t that an ordeal? Well, now the fun can begin! I’m going to use an example from one of my old graduate school textbooks, because it’s good for several lessons in multiple regression. The data set contains 25 annual observations used to predict the percentage profit margin (Y) for U.S. savings and loan associations, based on changes in net revenues per deposit dollar (X1) and the number of offices (X2). The data are as follows:

| Year | Percentage Profit Margin (Yt) | Net Revenues Per Deposit Dollar (X1t) | Number of Offices (X2t) |
|------|------|------|------|
| 1 | 0.75 | 3.92 | 7,298 |
| 2 | 0.71 | 3.61 | 6,855 |
| 3 | 0.66 | 3.32 | 6,636 |
| 4 | 0.61 | 3.07 | 6,506 |
| 5 | 0.70 | 3.06 | 6,450 |
| 6 | 0.72 | 3.11 | 6,402 |
| 7 | 0.77 | 3.21 | 6,368 |
| 8 | 0.74 | 3.26 | 6,340 |
| 9 | 0.90 | 3.42 | 6,349 |
| 10 | 0.82 | 3.42 | 6,352 |
| 11 | 0.75 | 3.45 | 6,361 |
| 12 | 0.77 | 3.58 | 6,369 |
| 13 | 0.78 | 3.66 | 6,546 |
| 14 | 0.84 | 3.78 | 6,672 |
| 15 | 0.79 | 3.82 | 6,890 |
| 16 | 0.70 | 3.97 | 7,115 |
| 17 | 0.68 | 4.07 | 7,327 |
| 18 | 0.72 | 4.25 | 7,546 |
| 19 | 0.55 | 4.41 | 7,931 |
| 20 | 0.63 | 4.49 | 8,097 |
| 21 | 0.56 | 4.70 | 8,468 |
| 22 | 0.41 | 4.58 | 8,717 |
| 23 | 0.51 | 4.69 | 8,991 |
| 24 | 0.47 | 4.71 | 9,179 |
| 25 | 0.32 | 4.78 | 9,318 |

Data taken from Spellman, L.J., “Entry and profitability in a rate-free savings and loan market,” Quarterly Review of Economics and Business, 18, no. 2 (1978): 87-95. Reprinted in Newbold, P. and Bos, T., Introductory Business & Economic Forecasting, 2nd Edition, Cincinnati (1994): 136-137.
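As mentioned earlier, you are free to run this data yourself to replicate the results reported below. One way to do it, using the same closed-form estimates for a two-predictor model (small differences in the last decimal place are possible due to rounding):

```python
# The 25 annual observations above: profit margin (Y), net revenues per
# deposit dollar (X1), and number of offices (X2).
Y = [0.75, 0.71, 0.66, 0.61, 0.70, 0.72, 0.77, 0.74, 0.90, 0.82,
     0.75, 0.77, 0.78, 0.84, 0.79, 0.70, 0.68, 0.72, 0.55, 0.63,
     0.56, 0.41, 0.51, 0.47, 0.32]
X1 = [3.92, 3.61, 3.32, 3.07, 3.06, 3.11, 3.21, 3.26, 3.42, 3.42,
      3.45, 3.58, 3.66, 3.78, 3.82, 3.97, 4.07, 4.25, 4.41, 4.49,
      4.70, 4.58, 4.69, 4.71, 4.78]
X2 = [7298, 6855, 6636, 6506, 6450, 6402, 6368, 6340, 6349, 6352,
      6361, 6369, 6546, 6672, 6890, 7115, 7327, 7546, 7931, 8097,
      8468, 8717, 8991, 9179, 9318]

n = len(Y)
m1, m2, my = sum(X1) / n, sum(X2) / n, sum(Y) / n
x1 = [v - m1 for v in X1]
x2 = [v - m2 for v in X2]
y = [v - my for v in Y]

S11 = sum(v * v for v in x1); S22 = sum(v * v for v in x2)
S12 = sum(p * q for p, q in zip(x1, x2))
S1y = sum(p * q for p, q in zip(x1, y)); S2y = sum(p * q for p, q in zip(x2, y))

denom = S11 * S22 - S12 * S12
b1 = (S22 * S1y - S12 * S2y) / denom
b2 = (S11 * S2y - S12 * S1y) / denom
a = my - b1 * m1 - b2 * m2

# R-squared = 1 - SSE/SST: share of variation in Y explained by the model
fitted = [a + b1 * u + b2 * v for u, v in zip(X1, X2)]
sse = sum((yi - fi) ** 2 for yi, fi in zip(Y, fitted))
sst = sum(v * v for v in y)
r_squared = 1 - sse / sst

print(round(a, 5), round(b1, 5), round(b2, 6), round(r_squared, 3))
```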

What is the relationship between the S&Ls’ profit margin percentage and the number of S&L offices? How about between the margin percentage and the net revenues per deposit dollar? Is the relationship positive (that is, profit margin percentage moves in the same direction as its independent variable(s))? Or negative (the dependent and independent variables move in opposite directions)? Let’s look at each independent variable’s individual relationship with the dependent variable.

Net Revenue Per Deposit Dollar (X1) and Percentage Profit Margin (Y)

Generally, if revenue per deposit dollar goes up, would we not expect the percentage profit margin to also go up? After all, if the S&L is making more revenue on the same deposit dollar, that suggests greater efficiency. Hence, we expect a positive relationship. So, in the resulting regression equation, we would expect the coefficient, β1, for net revenue per deposit dollar to have a “+” sign.

Number of S&L Offices (X2) and Percentage Profit Margin (Y)

Generally, if there are more S&L offices, would that not suggest either higher overhead, increased competition, or some combination of the two? Those would cut into profit margins. Hence, we expect a negative relationship. So, in the resulting regression equation, we would expect the coefficient, β2, for number of S&L offices to have a “-” sign.

Are our Expectations Correct?

Do our relationship expectations hold up?  They certainly do. The estimated multiple regression model is:

Yt = 1.56450 + 0.23720X1t – 0.000249X2t

What do the Parameter Estimates Mean?

Essentially, the model says that if net revenues per deposit dollar (X1t) increase by one unit, then percentage profit margin (Yt) will – on average – increase by 0.23720 percentage points, when the number of S&L offices is fixed. If the number of offices (X2t) increases by one, then percentage profit margin (Yt) will decrease by an average of 0.000249 percentage points, when net revenues are fixed.
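You can see these interpretations at work by plugging values into the fitted equation. The inputs below (4.0 revenue units and 7,000 offices) are made-up values chosen only to illustrate:

```python
def predict_margin(net_rev_per_dollar, num_offices):
    """The fitted equation from above: Yt = 1.56450 + 0.23720*X1t - 0.000249*X2t."""
    return 1.56450 + 0.23720 * net_rev_per_dollar - 0.000249 * num_offices

base = predict_margin(4.0, 7000)
print(round(base, 4))  # 0.7703
# One more unit of net revenue per deposit dollar, offices held fixed:
print(round(predict_margin(5.0, 7000) - base, 5))  # 0.2372
# One more office, net revenues held fixed:
print(round(predict_margin(4.0, 7001) - base, 6))  # -0.000249
```

The differences match the coefficients exactly, which is what “holding the other variable fixed” means.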

Do Changes in the Independent Variables Explain Changes in The Dependent Variable?

We compute the coefficient of determination, R2, and get 0.865, indicating that changes in the number of S&L offices and in the net revenue per deposit dollar explain 86.5% of the variation in S&L percentage profit margin.

Are the Parameter Estimates Statistically Significant?

We have 25 observations and three parameters – two coefficients for the independent variables and one intercept – hence we have 22 degrees of freedom (25-3). If we choose a 95% confidence level, we are saying that if we resampled and replicated this analysis 100 times, the confidence intervals we construct around our parameter estimates would contain the true parameter approximately 95 times. To assess significance, we need to look at the t-values for each parameter estimate. For a two-tailed test at the 5% significance level with 22 degrees of freedom, our critical t-value is 2.074. That means that if the t-statistic for a parameter estimate is greater than 2.074, there is a statistically significant positive relationship between the independent variable and the dependent variable; if the t-statistic is less than -2.074, there is a statistically significant negative relationship. This is what we get:

| Parameter | Value | t-Statistic | Significant? |
|-----------|-----------|-------|-----|
| Intercept | 1.5645000 | 19.70 | Yes |
| β1 | 0.2372000 | 4.27 | Yes |
| β2 | -0.0002490 | -7.77 | Yes |

So, yes, all our parameter estimates are significant.
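Here is a sketch of where the slope t-statistics come from, using the standard-error formulas for a two-predictor model (the data is re-entered so the block stands alone; the intercept’s standard error is omitted for brevity):

```python
import math

Y = [0.75, 0.71, 0.66, 0.61, 0.70, 0.72, 0.77, 0.74, 0.90, 0.82,
     0.75, 0.77, 0.78, 0.84, 0.79, 0.70, 0.68, 0.72, 0.55, 0.63,
     0.56, 0.41, 0.51, 0.47, 0.32]
X1 = [3.92, 3.61, 3.32, 3.07, 3.06, 3.11, 3.21, 3.26, 3.42, 3.42,
      3.45, 3.58, 3.66, 3.78, 3.82, 3.97, 4.07, 4.25, 4.41, 4.49,
      4.70, 4.58, 4.69, 4.71, 4.78]
X2 = [7298, 6855, 6636, 6506, 6450, 6402, 6368, 6340, 6349, 6352,
      6361, 6369, 6546, 6672, 6890, 7115, 7327, 7546, 7931, 8097,
      8468, 8717, 8991, 9179, 9318]

n = len(Y)
m1, m2, my = sum(X1) / n, sum(X2) / n, sum(Y) / n
x1 = [v - m1 for v in X1]; x2 = [v - m2 for v in X2]; y = [v - my for v in Y]
S11 = sum(v * v for v in x1); S22 = sum(v * v for v in x2)
S12 = sum(p * q for p, q in zip(x1, x2))
S1y = sum(p * q for p, q in zip(x1, y)); S2y = sum(p * q for p, q in zip(x2, y))
denom = S11 * S22 - S12 * S12
b1 = (S22 * S1y - S12 * S2y) / denom
b2 = (S11 * S2y - S12 * S1y) / denom
a = my - b1 * m1 - b2 * m2

# Residual variance: sum of squared errors over degrees of freedom (25 - 3 = 22)
sse = sum((yi - (a + b1 * u + b2 * v)) ** 2 for yi, u, v in zip(Y, X1, X2))
s2 = sse / (n - 3)

# Standard errors of the slopes, then t = estimate / standard error
se_b1 = math.sqrt(s2 * S22 / denom)
se_b2 = math.sqrt(s2 * S11 / denom)
t1, t2 = b1 / se_b1, b2 / se_b2

critical_t = 2.074  # two-tailed, 5% level, 22 degrees of freedom
print(round(t1, 2), round(t2, 2), abs(t1) > critical_t, abs(t2) > critical_t)
```

Both t-statistics comfortably exceed the critical value in absolute terms, matching the table above.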

Next Forecast Friday: Building on What You Learned

I think you’ve had enough for this week! But we are still not finished. We’re going to stop here and continue with further analysis of this example next week. Next week, we will discuss computing the 95% confidence interval for the parameter estimates; determining whether the model is valid; and checking for autocorrelation. The following Forecast Friday (July 1) blog post will discuss specification bias in greater detail, demonstrating the impact of omitting a key independent variable from the model.

### Are We Cramming Surveys Down Peoples’ Throats?

June 16, 2010

Yesterday, I wrote about the problems free online survey tools can cause if a survey is not well constructed and its purpose isn’t properly thought out. Today, I want to spend time talking about another issue, one that is directly influenced by the availability of several online survey tools, both free and full-price: the overabundance of surveys. Back in the day, I used to look forward to those mail surveys that came once in a while, so that I could open them up, take the dollar that came with them, and throw the survey in the trash (I was in high school then)! Surveys were so few and far between that they were practically a welcome interruption in our lives.

Fast-forward twenty years. Email, social media, and the Web have transformed everything. Communication is a lot faster. Competition for customers in almost every industry is cutthroat. Angry customers will not hesitate to tweet or blog their dissatisfaction to anyone who will read, retweet, or forward their rants. People have so many choices for entertainment, where and on what to spend their money, and who to buy from.

Businesses need to stay relevant to their customers and need constant feedback. Quite often, the best way to do it is the survey. When you buy a new car, the dealer sends you a survey. Have lunch at Panera? Your receipt will have a website you can visit to take a survey. Stay at a hotel? Survey. Attend a seminar? There’s an evaluation form at the end of the session. Welcome to survey Hell.

We are bombarded with surveys everywhere. I used to be on a consumer panel to take surveys and earn reward points. After two months, I stopped answering because I was getting three of them a week! When you have a business to run, a life to live, and other responsibilities, you just can’t take every survey. After a while, these surveys get complicated and involved. At 9:00 pm, as I struggle to stay awake, I don’t want to take a survey that makes me think!

Let’s consider the various problems involved with survey overuse. Among them:

Reduced response rates

Too many surveys reduce their value. If you get one survey every two weeks, you might complete most, if not all, of them. If you get a survey every two days, you’re probably not going to complete even half. There’s just not enough time. Also, because free online tools have enabled many amateurs to launch surveys, many of these amateur survey “professionals” construct questionnaires with vague, misleading, loaded, or double-barreled questions. Some questions have too many choices. These surveys tend to frustrate respondents, who may choose not to participate. Furthermore, amateurs may pay no attention to the relevant population and send the survey to anybody; then only those with an interest will respond, and their responses will be biased.

Biased or bogus responses

Imagine getting a survey that isn’t relevant to you. You might not respond to it, or you may jokingly fill it out and send it back. Either way, the result is useless to the one conducting the survey. Or imagine getting a survey with flawed questions like those described above. If you do respond, your responses won’t be truthful. Or, if the survey is complicated – requiring you to rank several items or choose from a long list – you may be tempted to answer just the top choices, or pick your choices randomly or haphazardly. You might be compelled to do the same if you simply get tons of surveys, or surveys that pay you for taking them.

Another way bias rears its head is in customer satisfaction. When I bought my car four years ago, the dealer told me I would be receiving a survey. He asked me to give him 100%, because his performance evaluation depended on the number of buyers who gave him 100% satisfaction. Another time, I was eating in a Corner Bakery Café in downtown Chicago, when an employee came up to me and said that if I could fill out a customer satisfaction form favorable to the store, there would be a free pastry in it for me. Seeing any problems here?

Reduced Brand Image

Survey abuse can even hurt your company’s brand image. Imagine if different departments send out their own surveys. What if marketing sends out a customer satisfaction survey, while the product development department sends out a survey of its own? Without coordination between the two departments, they could be surveying many of the same people with many of the same questions. As the respondent, you see only the company sending you the surveys, not the individual departments. Hence, you view the company as inefficient and “clumsy,” so you begin to question its brand, service, and quality.

What to do?

There are several ways we can remedy the abuse of surveys. The most immediate thing to remember is that surveys are not the be-all and end-all. There are many different opportunities to collect feedback from customers. Businesses need to be nimble, but not hasty. Remember, haste makes waste. Here are some suggestions:

Save surveys for major projects and initiatives; use other immediate forms of feedback

It’s OK to have a very brief survey to give to customers at the point of service to understand their satisfaction. But nine times out of ten, you should save your surveys for obtaining really important information: identifying the optimal price to charge, determining the size of a market, understanding public opinion, identifying which marketing messages work best, or conducting surveys if and only if there is no other good way to get key information. Instead, try to generate feedback from less formal channels. A hotel might train its service desk employees and concierges to ask guests at various touchpoints about how their stay is going; ask what services or amenities they could use; and ask what can be done to make the remainder of the stay even more enjoyable. The employees can note the responses privately, and feed them into a client database, enhancing marketing messages and service level treatment for future stays.

Don’t tie customer satisfaction to employee incentives

Customer satisfaction is important, but if you tie employee compensation to increasing satisfaction, you’re likely to get scenarios like those I faced with the car dealer and the café. Customers can say in their survey that they were satisfied and that they would return, but then never do so. Instead, base employee compensation on other customer service factors that will truly increase satisfaction.

Try other ways to engage customers

Instead of having seminar attendees fill out an evaluation form, a moderator could take the last 10 minutes to solicit open, honest feedback from the audience to see what they liked, what they didn’t like, and what could be done better. It’s one thing for people to write things down privately, but another to give their thoughts publicly. There is strength in numbers, and people may be inclined to give more honest feedback, for better or worse, collectively. Other businesses might encourage their customers to talk about their experiences on the company blog, Twitter, or Facebook.

In summary, surveys are not the only means of obtaining insights. A good combination of open customer communication, social media, secondary research, and customer service delivery, along with carefully thought-out study objectives, can prove invaluable. If you see response rates dropping steadily from survey to survey, it probably means you are surveying too much!

### Free Online Survey Tools Can Yield Costly Useless Results if not Used Carefully

June 15, 2010

Thanks to online survey tools like Zoomerang, Surveymonkey, and SurveyPirate, the ability to conduct surveys has been greatly democratized. Small businesses, non-profits, and departments within larger firms can now conduct surveys that they would never have been able to do before because of cost and lack of resources. Unfortunately, the greatest drawback of these free survey tools is the same as their greatest benefit: anyone can launch a survey. Launching an effective survey requires a clear definition of the business problem at hand; a carefully thought-out discussion of the information needed to address the business problem, the audience of the survey, and how to reach it; determination of the sample size and how to select the sample; designing, testing, and implementing the questionnaire; and analyzing the results. Free online survey tools do not change this process.

Recently, a business owner from one of my networking groups sent me an online survey that he designed with one of these free tools. It was a questionnaire about children’s toys – which was the business he was in. He wasn’t sending me the survey to look at and give advice; he sent it to me as if I were a prospective customer. Unfortunately, I’m not married and don’t have kids; and all my nieces and nephews are past the age of toys. The survey was irrelevant to me. The toy purveyor needed to think about who his likely buyers were – and he should have good knowledge, based on his past sales, of who his typical buyers are. Then he could have purchased a list of people to whom he could send the survey. Even if that meant using a mail or phone survey, which could be costly, the owner could get more meaningful results. Imagine how many other irrelevant or uninterested recipients received the business owner’s survey. Most probably didn’t respond; but others might have responded untruthfully, giving the owner bogus results.

Also, the “toy-preneur’s” survey questions were poorly designed. One was a double-barreled question: “Does your child like educational or action toys?” What if a respondent’s child liked both educational and action toys? The owner should have asked two separate questions: “Does your child like educational toys?” and “Does your child like action toys?” Or he could have asked a multi-part question like, “Check the box next to each of the types of toys your child likes to play with,” followed by a list of the different types of toys.

The survey got worse, with questions like: “How much does your child’s happiness mean to you?” How many people are going to answer that question negatively? Hello? Another question asked the respondent to rank-order various features of a toy for which no prototype was pictured – and, if that wasn’t bad enough, there were at least nine items to rank. Most people can’t rank more than five items, especially for an object they cannot visualize.

We also don’t know how the toy manufacturer selected his sample. My guess is that he sent the survey to everyone whose business card he had collected. Hence, most of the people he was surveying were the wrong people. In addition to producing unacceptable results, another danger of these online survey tools is that people are so frequently bombarded with surveys that they stop participating in surveys altogether. Imagine if you were to receive five or more of these surveys in less than two weeks. How much time are you willing to give to answering them? Then, when a truly legitimate survey comes along, how likely are you to participate?

I think it’s great that most companies now have the ability to conduct surveys on the cheap. However, the savings can be greatly offset by the uselessness of the results if the survey is designed poorly or sent to the wrong sample. There is nothing wrong with reading up on how to do a survey and then executing it, as long as the problem is well-defined, the relevant population is identified, and the sampling, execution, and analysis plans are in place. “Free” surveying isn’t good if it costs you money and time in rework and/or in faulty actions taken based on your findings.

Do you have trouble deciding whether you need to do a survey? Do you spend a lot of time trying to find out what you’re trying to learn from a survey? Or how many people to survey? Or the questions you need to ask? Or which people to survey? Let Analysights help. We have nearly 20 years of survey research experience and a strong background in data analysis. We can help you determine whether a survey is the best approach for your research needs, the best questions to ask to get the information you need, and help you understand what the findings mean. Feel free to call us at (847) 895-2565.

### Forecast Friday Topic: Prelude to Multiple Regression Analysis – Regression Assumptions

June 10, 2010

(Eighth in a series)

In last week’s Forecast Friday post, we continued our discussion of simple linear regression analysis, discussing how to check both the slope and intercept coefficients for significance. We then discussed how to create a prediction interval for our forecasts. I had intended this week’s Forecast Friday post to delve straight into multiple regression analysis, but have decided instead to spend some time talking about the assumptions that go into building a regression model.  These assumptions apply to both simple and multiple regression analysis, but their importance is especially noticeable with multiple regression, and I feel it is best to make you aware of them, so that when we discuss multiple regression both as a time series and as a causal/econometric forecasting tool, you’ll know how to detect and correct regression models that violate these assumptions. We will formally begin our discussion of multiple regression methods next week.

Five Key Assumptions for Ordinary Least Squares (OLS) Regression

When we develop our parameter estimates for our regression model, we want to make sure that all of our estimators have the smallest variance. Recall that when you were computing the value of your estimate, b, for the parameter β, you used the equation below:

b = Σ(Xi – X̄)(Yi – Ȳ) / Σ(Xi – X̄)²

You were subtracting your independent variable’s average from each of its actual values, and doing likewise for the dependent variable. You then multiplied those two quantities together (for each observation) and summed them up to get the numerator of that calculation. To get the denominator, you again subtracted the independent variable’s mean from each of its actual values and then squared them. Then you summed those up. The calculation of the denominator is the focal point here: the value you get for your estimate of β is the estimate that minimizes the squared error for your model. Hence, the term, least squares. If you were to take the denominator of the equation above and divide it by your sample size (less one: n-1), you would get the variance of your independent variable, X. This variance is something you also want to minimize, so that your estimate of β is efficient. When your parameter estimates are efficient, you can make stronger statistical statements about them.
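In code, that calculation looks like this – a generic sketch with made-up numbers, showing both the least-squares estimates and the variance you get by dividing the denominator by n-1:

```python
def simple_ols(X, Y):
    """Two-variable OLS: slope b, intercept a, and the variance of X."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    # Numerator: sum of cross-products of deviations from the means
    num = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    # Denominator: sum of squared deviations of X from its mean
    den = sum((x - mx) ** 2 for x in X)
    b = num / den          # the slope estimate that minimizes squared error
    a = my - b * mx        # intercept estimate
    var_x = den / (n - 1)  # denominator divided by n-1: the variance of X
    return a, b, var_x

# Made-up data where Y = 1 + 2X exactly
a, b, var_x = simple_ols([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```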

We also want to be sure that our estimators are free of bias. That is, we want to be sure that our sample estimate, b, is on average, equal to our true population parameter, β. That is, if we calculated several estimates of β, the average of our b’s should equal β.

Essentially, there are five assumptions that must be made to ensure our estimators are unbiased and efficient:

Assumption #1: The regression equation correctly specifies the true model.

In order to correctly specify the true model, the relationship between the dependent and independent variable must be linear. Also, we must neither exclude relevant independent variables from nor include irrelevant independent variables in our regression equation. If any of these conditions are not met – that is, Assumption #1 is violated – then our parameter estimates will exhibit bias, particularly specification bias.

In addition, our independent and dependent variables must be measured accurately. For example, if we are trying to estimate salary based on years of schooling, we want to make sure our model is measuring years of schooling as actual years of schooling, and not desired years of schooling.

Assumption #2: The independent variables are fixed numbers and not correlated with error terms.

I warned you at the start of our discussion of linear regression that the error terms were going to be important. Let’s start with the notion of fixed numbers. When you are running a regression analysis, the values of each independent variable should not change every time you test the equation. That is, the values of your independent variables are known and controlled by you. In addition, the independent variables should not be correlated with the error term. If an independent variable is correlated with the error term, then it is very possible a relevant independent variable was excluded from the equation. If Assumption #2 is violated, then your parameter estimates will be biased.

Assumption #3: The error terms ε, have a mean, or expected value, of zero.

As you saw in a past blog post, when we developed our regression equation for Sue Stone’s monthly sales, we went back and plugged each observation’s independent variable into our model to generate an estimate of sales for that month. We then subtracted the estimated sales from the actual. Some of our estimates were higher than the actual values, some were lower. Summing up all these errors, they should equal zero. If they don’t, they will result in a biased estimate of the intercept, a (which we use to estimate α). This assumption is not of serious concern, however, since the intercept is often of secondary importance to the slope estimate. We also assume that the error terms are normally distributed.

Assumption #4: The error terms have a constant variance.

The variance of the error term for all values of Xi should be constant, that is, the error terms should be homoscedastic. Visually, if you were to plot the line generated by your regression equation, and then plot the error terms for each observation as points above or below the regression line, the points should cluster around the line in a band of equal width above and below the regression line. If, instead, the points began to move further and further away from the regression line as the value of X increased, then the error terms are heteroscedastic, and the constant variance assumption is violated. Heteroscedasticity does not bias parameter estimates, but makes them inefficient, or untrustworthy.

Why does heteroscedasticity occur? Sometimes, a data set has some observations whose values for the independent variable are vastly different from those of the other observations. These cases are known as outliers. For example, if you have five observations, and their X values were as follows:

{ 5, 6, 6, 7, 20}

The fifth observation would be the outlier, since its X value of 20 is so different from those of the four previous observations. Regression equations place excessive weight on extreme values. Let’s assume that you are trying to construct a model to predict new car purchases based on income. You choose “household income” as your independent variable and “new car spending” as the dependent variable. You survey 10 people who bought a new car, and you record both their income and the amount they paid for the car. You sort the respondents by income and look at their spending, as depicted in the table below:

| Respondent | Annual Income | New Car Purchase Price |
|------------|---------------|------------------------|
| 1 | $30,000 | $25,900 |
| 2 | $32,500 | $27,500 |
| 3 | $35,000 | $26,000 |
| 4 | $37,500 | $29,000 |
| 5 | $40,000 | $32,000 |
| 6 | $42,500 | $30,500 |
| 7 | $45,000 | $34,000 |
| 8 | $47,500 | $26,500 |
| 9 | $50,000 | $38,000 |
| 10 | $52,500 | $40,000 |

Do you notice the pattern that as income increases, the new car purchase price tends to move upward? For the most part, it does. But does it go up consistently? No. Notice how respondent #3 spent less for a car than respondent #2, despite having a higher income; respondent #8 spent much less for a car than lower-income respondents 4-7. Respondent #8 is an outlier. This happens because lower-income households are limited in their options for new cars, while higher-income households have more options. A low-income respondent may be limited to buying a Ford Focus or a Honda Civic; a higher-income respondent may be able to buy a Lexus or BMW, yet still choose to buy the Civic or the Focus. Heteroscedasticity is very likely to occur with this data set. In case you haven’t guessed, heteroscedasticity is more likely to occur with cross-sectional data than with time series data.
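One simple diagnostic, in the spirit of the Goldfeld-Quandt test, is to sort the observations by X, split them into a low-X half and a high-X half, and compare the spread of the residuals in each half. The residuals below are made up so that the spread grows with X, purely for illustration:

```python
# Hypothetical residuals, already sorted by the value of X.
# Notice the spread grows markedly in the second half.
residuals = [0.1, -0.1, 0.2, -0.2, 0.3, -0.9, 1.1, -1.2, 1.3, -1.4]

half = len(residuals) // 2
low, high = residuals[:half], residuals[half:]

# Mean squared residual in each half (errors are assumed to average zero)
ms_low = sum(e * e for e in low) / len(low)
ms_high = sum(e * e for e in high) / len(high)

ratio = ms_high / ms_low
print(round(ratio, 2), ratio > 3)  # a large ratio suggests heteroscedasticity
```

If the error variance were constant, the ratio would hover near 1; here it is far larger, which is the visual “fanning out” described above, in numeric form.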

Assumption #5: The error terms are not correlated with each other.

Knowing the error term for any of our observations should not allow us to predict the error term of any other observation; the errors must be truly random. If they aren’t, autocorrelation results and the parameter estimates are inefficient, though unbiased. Autocorrelation is much more common with time series data than with cross-sectional data, and occurs because past occurrences can influence future ones. A good example of this is when I was building a regression model to help a college forecast enrollment. I started by building a simple time series regression model, then examined the errors and detected autocorrelation. How did it happen? Because most students who are enrolled in the Fall term are also likely to be enrolled in the following Spring term. Hence, I needed to correct for that autocorrelation. Similarly, while a company’s advertising expenditures in April may impact its sales in April, they are also likely to have some impact on its sales in May. This too can cause autocorrelation.
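A standard way to detect this is the Durbin–Watson statistic, which is near 2 when residuals are uncorrelated and falls well below 2 when positive autocorrelation is present. The sketch below simulates a hypothetical advertising-and-sales series in which each month's error carries partly into the next (the data, coefficients, and AR(1) strength of 0.8 are all made-up assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate 60 months of sales driven by advertising spend, with an AR(1)
# error term (rho = 0.8) so each month's shock carries into the next month.
n = 60
advertising = rng.uniform(10, 50, n)
shocks = rng.normal(0, 5, n)
errors = np.zeros(n)
for t in range(1, n):
    errors[t] = 0.8 * errors[t - 1] + shocks[t]
sales = 100 + 3 * advertising + errors

# Fit OLS and compute the Durbin-Watson statistic from the residuals.
b1, b0 = np.polyfit(advertising, sales, 1)
resid = sales - (b0 + b1 * advertising)
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(f"Durbin-Watson statistic: {dw:.2f}")  # values well below 2 suggest positive autocorrelation
```

Because the simulated errors carry over from month to month, the statistic comes out far below 2, flagging the autocorrelation that a correction (such as Cochrane–Orcutt) would then address.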

When these assumptions hold, your regression equation is likely to yield parameter estimates that are the “best linear unbiased estimators,” or BLUE. Keep these assumptions in mind as we go through our upcoming discussions on multiple regression.

Next Forecast Friday Topic: Regression with Two or More Independent Variables

Next week, we will plunge into our discussion of multiple regression. I will give you an example of how multiple variables are used to forecast a single dependent variable, and how to check for validity. As we go through the next couple of discussions, I will show you how to analyze the error terms to find violations of the regression assumptions. I will also show you how to determine the validity of the model, and to identify whether all independent variables within your model are relevant.

### The Man Who Feared Analytics

June 9, 2010

My colleague and I spoke with the businessman about his dilemma. We talked through his business; we looked at his most recent mailer, learned how he obtained his mailing lists, and discussed his promotion schedule. We found that the photographer would buy a list of names, mail them once, and then use a different list, not giving people enough opportunity to develop awareness of his business. We also found that he didn’t have much information about the people he was mailing.

We recommended that analytics could help the photographer maximize his margin by improving both the top and bottom line. Analytics would first help him understand which customers were responding to his mailings. Then he could purchase lists of people with characteristics similar to those past respondents. His response rate would go up, since he would be sending to a list of people most receptive to his photography. He would also be able to mail fewer people, cutting out those with little likelihood of response. He could then use the savings to remail the members of his target segments who hadn’t responded to his earlier mailing, and thus increase their awareness. It all sounded good to the photographer.

And then, he decided he was going to wait to see if things got better!

Why the Fear of Analytics?

The photographer’s decision is a common refrain of marketers. Marketers and business owners who are introduced to analytics are like riders on a roller coaster: thrilled and nervous at the same time. While marketers are excited about the benefits of analytics, they are also concerned about its cost; they’re afraid of change; and they’re intimidated by the perceived complexity of analytics. We’ll tackle each of these fears here.

FEAR #1: Analytics could be expensive.

REALITY: Analytics is an investment that pays for itself.

The cost of analytics can appear staggering, especially in lean times. Some of the most sophisticated analytics techniques can run into tens, if not hundreds, of thousands of dollars for a large corporation. For many smaller companies, analytics may run only a few thousand dollars, though that is still a lot of money. But analytics is not an expense; you are getting something great in return: the insights you need to make better-informed marketing decisions and identify the areas of your marketing you can improve or enhance; the ability to target customers and prospects more effectively, resulting in increased sales and reduced costs; and the chance to establish long-term, continuous-improvement systems.

Had the photographer gone through with the analytics for his upcoming mailing, the entire analysis would have cost him somewhere between \$1,300 and \$1,800. But that fee would have enabled him to identify where his mailings were getting the greatest bang for the buck, and he likely would have recouped it through reduced mailing costs and increased revenues. Once the analytics had saved or made the photographer at least \$1,800, it would have paid for itself.

FEAR #2: Analytics means a change in the way we do things.