## Posts Tagged ‘demand forecasting’

### Forecast Friday Topic: Simple Regression Analysis

May 27, 2010

(Sixth in a series)

Today, we begin our discussion of regression analysis as a time series forecasting tool. This discussion will take the next few weeks, as there is much behind it. As always, I will make sure everything is simplified and easy for you to digest. Regression is a powerful tool that can be very helpful for mid- and long-range forecasting. Quite often, the business decisions we make require us to consider relationships between two or more variables. Rarely can we make changes to our promotion, pricing, and/or product development strategies without them having an impact of some kind on our sales. Just how big an impact would that be? How do we measure the relationship between two or more variables? And does a real relationship even exist between those variables? Regression analysis helps us find out.

One thing I must point out: Remember the “deviations” we discussed in the posts on moving average and exponential smoothing techniques: The difference between the forecasted and actual values for each observation, of which we took the absolute value? Good. In regression analysis, we refer to the deviations as the “error terms” or “residuals.” In regression analysis, the residuals – which we will square, rather than take the absolute value – become very important in gauging the regression model’s accuracy, validity, efficiency, and “goodness of fit.”

Simple Linear Regression Analysis

Sue Stone, owner of Stone & Associates, looked at her CPA practice’s monthly receipts from January to December 2009. The sales were as follows:

 Month Sales January \$10,000 February \$11,000 March \$10,500 April \$11,500 May \$12,500 June \$12,000 July \$14,000 August \$13,000 September \$13,500 October \$15,000 November \$14,500 December \$15,500

Sue is trying to predict what sales will be for each month in the first quarter of 2010, but is unsure of how to go about it. Moving average and exponential smoothing techniques rarely go more than one period ahead. So, what is Sue to do?

When we are presented with a set of numbers, one of the ways we try to make sense of it is by taking its average. Perhaps Sue can average all 12 months’ sales – \$12,750 – and use that her forecast for each of next three months. But how accurately would that measure each month of 2009? How spread out are each month’s sales from the average? Sue subtracts the average from each month’s sales and examines the difference:

 Month Sales Sales Less Average Sales January \$10,000 -\$2,750 February \$11,000 -\$1,750 March \$10,500 -\$2,250 April \$11,500 -\$1,250 May \$12,500 -\$250 June \$12,000 -\$750 July \$14,000 \$1,250 August \$13,000 \$250 September \$13,500 \$750 October \$15,000 \$2,250 November \$14,500 \$1,750 December \$15,500 \$2,750

Sue notices that the error between actual and average is quite high in both the first four months of 2009 and in the last three months of 2009. She wants to understand the overall error in using the average as a forecast of sales. However, when she sums up all the errors from month to month, Sue finds they sum to zero. That tells her nothing. So she squares each month’s error value and sums them:

 Month Sales Error Error Squared January \$10,000 -\$2,750 \$7,562,500 February \$11,000 -\$1,750 \$3,062,500 March \$10,500 -\$2,250 \$5,062,500 April \$11,500 -\$1,250 \$1,562,500 May \$12,500 -\$250 \$62,500 June \$12,000 -\$750 \$562,500 July \$14,000 \$1,250 \$1,562,500 August \$13,000 \$250 \$62,500 September \$13,500 \$750 \$562,500 October \$15,000 \$2,250 \$5,062,500 November \$14,500 \$1,750 \$3,062,500 December \$15,500 \$2,750 \$7,562,500 Total Error: \$35,750,000

In totaling these squared errors, Sue derives the total sum of squares, or TSS error: 35,750,000. Is there any way she can improve upon that? Sue thinks for a while. She doesn’t know too much more about her 2009 sales except for the month in which they were generated. She plots the sales on a chart:

Sue notices that sales by month appear to be on an upward trend. Sue thinks for a moment. “All I know is the sales and the month,” she says to herself, “How can I develop a model to forecast accurately?” Sue reads about a statistical procedure called regression analysis and, seeing that each month’s sales is in sequential order, she wonders whether the mere passage of time simply causes sales to go higher. Sue numbers each month, with January assigned a 1 and December, a 12.

She also realizes that she is trying to predict sales with each passing month. Hence, she hypothesizes that the change in sales depends on the change in the month. Hence, sales is Sue’s dependent variable. Because the month number is used to estimate change in sales, it is her independent variable. In regression analysis, the relationship between an independent and a dependent value is expressed:

Y = α + βX + ε

Where: Y is the value of the dependent variable

X is the value of the independent variable

α is a population parameter, called the intercept, which would be the value of Y when X=0

β is also a population parameter – the slope of the regression line – representing the change in Y associated with each one-unit change in X.

ε is the error term.

Sue further reads that the goal of regression analysis is to minimize the error sum of squares, which is why it is referred to as ordinary least squares (OLS) regression. She also notices that she is building her regression on a sample, so there is a sample regression equation used to estimate what the true regression is for the population:

Essentially, the equation is the same as the one above, however the terms indicate the sample. The Y-term (called “Y hat”) is the sample forecasted value of the dependent variable (sales) at period i; a is the sample estimate of α; b is the sample estimate of β; Xi is the value of the independent variable at period i; and ei is the error, or difference between Y hat (the forecasted value) and actual Y for period i. Sue needs to find the values for a and b – the estimates of the population parameters – that minimize the error sum of squares.

Sue reads that the equations for estimating a and b are derived from calculus, but expressed algebraically as:

Sue learns that the X and Y terms with lines above them, known as “X bar” and “Y bar,” respectively are the averages of all the X and Y values, respectively. She also reads that the Σ notation – the Greek letter sigma – represents a sum. Hence, Sue realizes a few things:

1. She must estimate b before she can estimate a;
2. To estimate b,she must take care of the numerator:
1. first subtract each observation’s month number from the average month’s number (X minus X-bar),
2. subtract each observation’s sales from the average sales (Y minus Y-bar),
3. multiply those two together, and
4. Add up (2c) for all observations.
3. To get the denominator for calculating b, she must:
1. Again subtract X-bar from X, but then square the difference, for each observation.
2. Sum them up
4. Calculating b is easy: She needs only to divide the result from (2) by the result from (3).
5. Calculating a is also easy: She multiplies her b value by the average month (X-bar), and subtracts it from average sales (Y-bar).

Sue now goes to work to compute her regression equation. She goes into Excel and enters her monthly sales data in a table, and computes the averages for sales and month number:

 Month (X) Sales (Y) 1 \$10,000 2 \$11,000 3 \$10,500 4 \$11,500 5 \$12,500 6 \$12,000 7 \$14,000 8 \$13,000 9 \$13,500 10 \$15,000 11 \$14,500 12 \$15,500 Average 6.5 \$12,750

Sue goes ahead and subtracts the X and Y values from their respective averages, and computes the components she needs (the “Product” is the result of multiplying the values in the first two columns together):

 X minus X-bar Y minus Y-bar Product (X minus X-bar) Squared -5.5 -\$2,750 \$15,125 30.25 -4.5 -\$1,750 \$7,875 20.25 -3.5 -\$2,250 \$7,875 12.25 -2.5 -\$1,250 \$3,125 6.25 -1.5 -\$250 \$375 2.25 -0.5 -\$750 \$375 0.25 0.5 \$1,250 \$625 0.25 1.5 \$250 \$375 2.25 2.5 \$750 \$1,875 6.25 3.5 \$2,250 \$7,875 12.25 4.5 \$1,750 \$7,875 20.25 5.5 \$2,750 \$15,125 30.25 Total \$68,500 143

Sue computes b:

b = \$68,500/143

= \$479.02

Now that Sue knows b, she calculates a:

a = \$12,750 – \$479.02(6.5)

= \$12,750 – \$3,113.64

= \$9,636.36

Hence, assuming errors are zero, Sue’s least-squares regression equation is:

Y(hat) =\$9,636.36 + \$479.02X

Forecasted Sales = \$9,636.36 + \$479.02 * Month number.

This means that each passing month is associated with an average increase in sales of \$479.02 for Sue’s CPA firm. How accurately does this regression model predict sales? Sue estimates the error by plugging each month’s number into the equation and then comparing her forecast for that month with the actual sales:

 Month (X) Sales (Y) Forecasted Sales Error 1 \$10,000 \$10,115.38 -\$115.38 2 \$11,000 \$10,594.41 \$405.59 3 \$10,500 \$11,073.43 -\$573.43 4 \$11,500 \$11,552.45 -\$52.45 5 \$12,500 \$12,031.47 \$468.53 6 \$12,000 \$12,510.49 -\$510.49 7 \$14,000 \$12,989.51 \$1,010.49 8 \$13,000 \$13,468.53 -\$468.53 9 \$13,500 \$13,947.55 -\$447.55 10 \$15,000 \$14,426.57 \$573.43 11 \$14,500 \$14,905.59 -\$405.59 12 \$15,500 \$15,384.62 \$115.38

Sue’s actual and forecasted sales appear to be pretty close, except for her July estimate, which is off by a little over \$1,000. But does her model predict better than if she simply used average sales as her forecast for each month? To do that, she must compute the error sum of squares, ESS, error. Sue must square the error terms for each observation and sum them up to obtain ESS:

ESS = Σe2

 Error Squared Error -\$115.38 \$13,313.61 \$405.59 \$164,506.82 -\$573.43 \$328,818.04 -\$52.45 \$2,750.75 \$468.53 \$219,521.74 -\$510.49 \$260,599.54 \$1,010.49 \$1,021,089.05 -\$468.53 \$219,521.74 -\$447.55 \$200,303.19 \$573.43 \$328,818.04 -\$405.59 \$164,506.82 \$115.38 \$13,313.61 ESS= \$2,937,062.94

Notice Sue’s error sum of squares. This is the error, or unexplained, sum of squared deviations between the forecasted and actual sales. The difference between the total sum of squares (TSS) and the Error Sum of Squares (ESS) is the regression sum of squares, RSS, and that is the sum of squared deviations that are explained by the regression. RSS is also calculated as each forecasted value of sales less the average of sales:

 Forecasted Sales Average Sales Regression Error Reg. Error Squared \$10,115.38 \$12,750 -\$2,634.62 \$6,941,198.22 \$10,594.41 \$12,750 -\$2,155.59 \$4,646,587.24 \$11,073.43 \$12,750 -\$1,676.57 \$2,810,898.45 \$11,552.45 \$12,750 -\$1,197.55 \$1,434,131.86 \$12,031.47 \$12,750 -\$718.53 \$516,287.47 \$12,510.49 \$12,750 -\$239.51 \$57,365.27 \$12,989.51 \$12,750 \$239.51 \$57,365.27 \$13,468.53 \$12,750 \$718.53 \$516,287.47 \$13,947.55 \$12,750 \$1,197.55 \$1,434,131.86 \$14,426.57 \$12,750 \$1,676.57 \$2,810,898.45 \$14,905.59 \$12,750 \$2,155.59 \$4,646,587.24 \$15,384.62 \$12,750 \$2,634.62 \$6,941,198.22 RSS= \$32,812,937.06

Sue immediately adds the RSS and the ESS and sees they match the TSS: \$35,750,000. She also knows that nearly 33 million of that TSS is explained by her regression model, so she divides her RSS by the TSS:

32,812,937.06 / 35,750,000

=.917 or 91.7%

This quotient, known as the coefficient of determination, and denoted as R2, tells Sue that each passing month explains 91.7% of the change in monthly sales that she experiences. What R2 means is that Sue improved her forecast accuracy by 91.7% by using this simple model instead of the simple average. As you will find out in subsequent blog posts, maximizing R2 isn’t the “be all and end all”. In fact, there is still much to do with this model, which will be discussed in next week’s Forecast Friday post. But for now, Sue’s model seems to have reduced a great deal of error.

It is important to note that while each month does seem to be related to sales, the passing months do not cause the increase in sales. Correlation does not mean causation. There could be something behind the scenes (e.g., Sue’s advertising, or the types of projects she works on, etc.) that is driving the upward trend in her sales.

Using the Regression Equation to Forecast Sales

Now Sue can use the same model to forecast sales for January 2010 and February 2010, etc. She has her equation, so since January 2010 is period 13, she plugs in 13 for X, and gets a forecast of \$15,863.64; for February (period 14), she gets \$16,342.66.

Recap and Plan for Next Week

You have now learned the basics of simple regression analysis. You have learned how to estimate the parameters for the regression equation, how to measure the improvement in accuracy from the regression model, and how to generate forecasts. Next week, we will be checking the validity of Sue’s equation, and discussing the important assumptions underlying regression analysis. Until then, you have a basic overview of what regression analysis is.

### 8 Steps to Determining Market Size

May 1, 2009

Whether you’re an entrepreneur writing a business plan or an established firm looking to introduce a new product or service, you will encounter the need to estimate the size of the market/s that you plan to serve.  Market-sizing is an interesting and exciting branch of marketing research, but it can be almost as much an art as it is a science.  Today, I will walk you through the process of estimating market size, using the example of a financial planner looking to develop a practice in his community.

Step 1: Define your target market

This can never be stressed enough.  If you don’t know the type of customer you want to serve, you will waste a lot of time and money trying to get any customers.  Market-sizing is easier when you know the exact group you’re searching for.  Our financial planner has decided that his target market will be married couples with young children.

Step 2: Determine the needs of your target market and how they create demand for your product/service

Here you formulate a hypothesis.  Ask yourself the benefits your product or service offers your target customers.  What problem does your product help them solve?  Begin with a statement about why your target customers need your product.  Our financial planner’s statement might be: “Married couples with young children need my services because they must be prepared for college, as well as for unexpected emergencies such as disability and early death.”  This statement assumes, of course, that the financial planner sells financial products that address these needs; if the planner sells only financial plans, his statement will be different.

Step 3: Identify the information you need to estimate the size of your market

Now that you have identified your target market and hypothesized about its demand for your product, what information do you need to develop your estimates?  Among other things, our financial planner would need to know:

• The age distribution within the geographic area he serves;
• The number of households with children in that area;
• The distribution of family income in that area;
• Home market values in the area;
• Educational attainment and college enrollment rates for graduating high school students;
• How many competitors, direct (other financial planners and insurance agents) and indirect (stock brokers, banks with financial planning services, etc.) are serving the market; and
• What financial planning services people buy and how much they pay.

There are others, but this list is pretty comprehensive.

Step 4: Identify the sources you need to obtain that information

So where do you find information about your market?  These days, there is such a wealth of published statistics about almost every industry and market segment, that a combination of library and online research can fulfill most of your information needs.  In some cases, if you are looking for very specialized information, you may need to conduct your own primary research (surveys, focus groups, etc.) to get what you need.

The U.S. Bureau of the Census provides comprehensive demographic statistics by metropolitan area, county, ZIP code, census tract, and state.  Information about population, age, income, educational attainment, presence of children, and home market value can easily be obtained at any of these levels, so the financial planner would be able to answer many of his questions.  In addition, the Census Bureau also produces County Business Patterns, which provides information about the activity of each industry by each of the same geographic levels listed earlier.  Hence, our financial planner can also obtain the number of financial planning establishments,  insurance agencies, and brokerage firms serving the area in which he hopes to establish his practice.

In addition, our financial planner may consult online data sources such as Dun & Bradstreet’s Million Dollar Database and ABI’s ReferenceUSA to identify specific financial planning firms and insurance agencies in his area and get estimates of their employment size and revenues.

The financial planner can also get lots of relevant information from trade associations, local chambers of commerce, Web sites of his existing competitors, and through primary research, such as surveys and interviews with experts.

Step 5: Collect the data

Now that you have identified your data sources, you need to extract the data.  The financial planner will scour all the sources he identified to pull out data that meets his information needs.  He will determine whether his data sources provide sufficient and useful data, or whether they provide insufficient or suspect data, at which point he may seek out additional sources to answer his questions.

Step 6: Analyze the data

Now that you have all the data, what does it mean?  What is it telling you?  Let’s say that the area our financial planner wants to serve has 200,000 households, of which 15% – or 30,000 – are two-parent households, with a median family income of \$60,000 per year, a median age of 32, and an average household size of 4.  Immediately, the financial planner knows he is serving a young upscale market, and it’s very likely – without looking at the number of competition – that there will already be an above average number of financial planners trying to serve them.

The financial planner may also find from financial planning industry statistics that 60% of families in that age group carry life insurance, and that the average policy face value is \$100,000.  Given the affluence of this area, the planner may reason that households in his target market have much greater assets and income to protect, so he may adjust his estimates of life insurance coverage for that area upward – to policies of maybe \$250,000 or \$500,000.  He’ll make similar estimates for any other financial products and services he offers.

Step 7: Derive your market estimate

Now that you’ve compiled and analyzed your data, you need to come up with an estimate of market size.  Our financial planner may – through all his data sources – come up with an average and standard deviation of the policy amounts of life, disability, and other policies aimed at his target market in that area.  He will then project that amount out by the number of households within that market to come up with an aggregate size of the financial planning market in that area.  From there, he will build in a margin of error, perhaps using a 95% confidence interval, to come up with a low estimate, a middle estimate (which would be the aggregate size he determined earlier), and a high estimate.