Archive for the ‘Demand’ Category

Forecast Friday Topic: The Identification Problem

November 11, 2010

(Twenty-ninth in a series)

When we work with regression analysis, it is assumed that outside factors determine each of the independent variables in the model; these factors are said to be exogenous to the system. This is especially of interest to economists, who have long used econometric models to forecast demand and supply for various goods. The price the market will bear for a good or service, for example, is not determined by a single equation, but by the interaction of the equations for both supply and demand. If price were what we were trying to forecast, then a single equation would do us little good. In fact, since price is part of a multi-equation system, performing regression analysis for just demand without supply, or vice-versa, will result in biased parameter estimates.

This post begins our three-part “series within a series” on “Simultaneous Equations and Two-Stage Least Squares Regression”. Although this topic sounds intimidating, I will not be covering it in much technical detail. My purpose in discussing it is to make you aware of these concepts, so that you can determine when to look beyond a simple regression analysis.

Hence, we start with the most basic concept of simultaneous equations: the identification problem. Let’s assume that you are the supply chain manager for a beer company. You need to forecast the price of barley, so your company can budget how much money it needs to spend in order to have enough barley to produce its beer; determine whether the price is on an upward trend, so that it can purchase derivatives to hedge its risk; and determine the final price for its beer.

You have statistics for the price and traded quantity of barley for the last several years. You also remember three concepts from your college economics class:

  1. The price and quantity supplied of a good have a direct relationship – producers supply more as the price goes up and less as the price goes down;
  2. The price and quantity demanded of a good have an inverse relationship – consumers purchase less as the price goes up and vice-versa; and
  3. The market price is determined by the interaction of the supply and demand equations.

Since price and quantity are positively related for supply and negatively related for demand, and every price-quantity pair you observe sits on both curves at once, you cannot determine – that is, identify – the supply and demand equations using regression analysis with only the two variables of quantity and price; the information is insufficient. However, if you can find variables that are in one equation and not the other, you will be able to identify the individual relations.
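To see why price and quantity alone are not enough, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the supply and demand coefficients, the shock sizes, and the helper names `ols_slope` and `simulate_market`. It generates market-clearing prices and quantities from a known supply curve (slope +1) and a known demand curve (slope -1), then fits a single regression of quantity on price.

```python
import random

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x

def simulate_market(n=5000, seed=42):
    """Generate equilibrium (price, quantity) pairs from a made-up market.

    Supply:  Q = 10 + 1.0 * P + u   (hypothetical, slope +1)
    Demand:  Q = 50 - 1.0 * P + v   (hypothetical, slope -1)
    """
    rng = random.Random(seed)
    prices, quantities = [], []
    for _ in range(n):
        u = rng.gauss(0, 1)  # random supply shock
        v = rng.gauss(0, 1)  # random demand shock
        # Setting supply equal to demand and solving for price
        price = (40 + v - u) / 2.0
        quantity = 10 + 1.0 * price + u
        prices.append(price)
        quantities.append(quantity)
    return prices, quantities

prices, quantities = simulate_market()
slope = ols_slope(prices, quantities)
print(f"Single-equation slope of quantity on price: {slope:.3f}")
```

In this particular setup the fitted slope should land close to zero, which is neither the supply slope nor the demand slope. With different shock sizes it would land somewhere else between the two, but it would still describe neither curve.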

In agriculture, the supply of a crop is greatly affected by weather. If you can obtain information on the amount of rainfall in barley producing regions during the years for which you have data, you might be able to identify the different equations. Moreover, production costs impact supply. So if you can obtain information on the costs of planting and harvesting the barley, that too would help. On the demand side, barley’s quantity can be influenced by changes in tastes. If beer demand goes up, so too will the demand for barley; if farm animal raising increases, farmers may need to purchase more barley for animal fodder; and various health fads may emerge, increasing the demands for barley breads and soups. If you can obtain these kinds of information, you are on your way to identifying the supply and demand curves.
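As a rough illustration of how a supply-only variable such as rainfall helps, the sketch below simulates a made-up barley market in which rainfall shifts supply but not demand. The coefficients, the shock sizes, and the names `covariance` and `simulate_with_rainfall` are all assumptions for the example. The estimator used here is a simple ratio of covariances (a basic instrumental-variable estimate), chosen for brevity; it is not necessarily the procedure the later posts in this miniseries walk through.

```python
import random

def covariance(x, y):
    """Sample covariance of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def simulate_with_rainfall(n=5000, seed=7):
    """Equilibrium data from a hypothetical market with a rainfall shifter.

    Supply:  Q = 10 + 1.0 * P + 2.0 * rain + u   (made-up coefficients)
    Demand:  Q = 50 - 1.0 * P + v                (true demand slope is -1)
    """
    rng = random.Random(seed)
    rain, prices, quantities = [], [], []
    for _ in range(n):
        r = rng.gauss(0, 1)  # rainfall, set outside the barley market
        u = rng.gauss(0, 1)  # other supply shocks
        v = rng.gauss(0, 1)  # demand shocks
        # Setting supply equal to demand and solving for price
        price = (40 - 2.0 * r + v - u) / 2.0
        quantity = 50 - 1.0 * price + v
        rain.append(r)
        prices.append(price)
        quantities.append(quantity)
    return rain, prices, quantities

rain, prices, quantities = simulate_with_rainfall()

# Naive single-equation regression of quantity on price
naive_slope = covariance(prices, quantities) / covariance(prices, prices)

# Use only the price movements that rainfall causes: since rainfall
# shifts supply alone, those movements trace out the demand curve
rain_based_slope = covariance(rain, quantities) / covariance(rain, prices)

print(f"Naive slope:      {naive_slope:.3f}")
print(f"Rain-based slope: {rain_based_slope:.3f}  (true demand slope: -1)")
```

The rain-based estimate should come out close to the true demand slope of -1, while the naive regression is pulled toward zero because it mixes supply and demand movements.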

Exogenous and Endogenous Variables

Since rainfall affects the supply of barley, but the barley market does not influence the amount of rainfall, rainfall is said to be an exogenous variable, because its value is determined by factors outside of the equation system. Since the demand for beer helps derive the demand for barley, but not the other way around, beer demand is an exogenous variable.

Because price and quantity of barley are part of a demand and supply system, they are determined by the interaction of the two equations – that is by the equation system – so they are said to be endogenous variables.

Identifying an Equation

If you are trying to identify an equation that is part of a multi-equation system, the number of variables excluded from that equation, but included elsewhere in the system, must be at least one less than the number of equations. Hence, if you have a two-equation system, at least one variable that appears in the other equation must be excluded from the equation you’re trying to identify; if your system has three equations, at least two such variables must be excluded from the equation you want to identify, and so on.

When the number of exogenous variables excluded from your equation, but present in the other equation(s), exactly meets this minimum, your equation is just identified. You can use several econometric techniques to estimate just identified systems; however, such systems are quite rare in practice. When you have no exogenous variables that are unique to one equation in the system, your equations are under identified and cannot be estimated with any econometric technique. Most often, equations are over identified, because more exogenous variables are excluded from one equation than the number of equations in the system requires. When over identification is the case, two-stage least squares (the topic of the third post of this miniseries) is required in order to tell which of the variables is causing your supply (or demand) curve to shift along the fixed demand (or supply) curve.
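The counting rule above can be sketched as a small helper. The function name `classify_identification` and its arguments are hypothetical, and the rule it encodes is only the counting check described in this post; passing the count is necessary for identification but does not by itself guarantee it.

```python
def classify_identification(num_equations, num_excluded_exogenous):
    """Classify one equation of a system using the counting rule.

    num_equations: number of equations in the system.
    num_excluded_exogenous: exogenous variables that appear elsewhere
        in the system but are left out of the equation being examined.
    """
    required = num_equations - 1
    if num_excluded_exogenous < required:
        return "under identified"
    if num_excluded_exogenous == required:
        return "just identified"
    return "over identified"

# Barley demand equation in a two-equation (supply and demand) system.
# Suppose rainfall and production costs appear only in the supply equation.
print(classify_identification(num_equations=2, num_excluded_exogenous=2))

# With only price and quantity available, nothing is excluded.
print(classify_identification(num_equations=2, num_excluded_exogenous=0))
```

In the barley example, rainfall and production costs are both left out of the demand equation, so the demand equation would count as over identified; with no such variables at all, it is under identified.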

Next Forecast Friday Topic: Structural and Reduced Forms

Next week’s Forecast Friday topic builds on today’s topic with a discussion of structural and reduced forms of equations. These are the first steps in Two-Stage Least Squares Regression analysis, and are part of the effort to solve the identification problem.


Be Sure to Follow us on Facebook and Twitter !

Thanks to all of you, Analysights now has nearly 200 fans on Facebook … and we’d love more!  If you like Forecast Friday – or any of our other posts – then we want you to “Like” us on Facebook! And if you like us that much, please also pass these posts on to your friends who like forecasting and invite them to “Like” Analysights! By “Like-ing” us on Facebook, you and they will be informed every time a new blog post has been published, or when new information comes out. Check out our Facebook page! You can also follow us on Twitter. Thanks for your help!

Analyzing Subgroups of Data

July 21, 2010

The data available to us has never been more voluminous. Thanks to technology, data about us and our environment are collected almost continuously. When we use a cell phone to call someone else’s cell phone, several pieces of information are collected: the two phone numbers involved in the call; the time the call started and ended; the cell phone towers closest to the two parties; the cell phone carriers; the distance of the call; the date; and many more. Cell phone companies use this information to determine where to increase capacity; refine, price, and promote their plans more effectively; and identify regions with inadequate coverage.

Multiply these different pieces of data by the number of calls in a year, a month, a day – even an hour – and you can easily see that we are dealing with enormous amounts of records and observations. While it’s good for decision makers to see what sales, school enrollment, cell phone usage, or any other pattern looks like in total, quite often they are even more interested in breaking down data into groups to see if certain groups behave differently. Quite often we hear decision makers asking questions like these:

  • How do depositors under age 35 compare with those between 35-54 and 55 & over in their choice of banking products?
  • How will voter support for Candidate A differ by race or ethnicity?
  • How does cell phone usage differ between men and women?
  • Does the length or severity of a prison sentence differ by race?

When we break data down into subgroups, we are trying to see whether knowing about these groups adds any additional meaningful information. This helps us customize marketing messages, product packages, pricing structures, and sales channels for different segments of our customers. There are many different ways we can break data down: by region, age, race, gender, income, spending levels; the list is limitless.

To give you an example of how data can be analyzed by groups, let’s revisit Jenny Kaplan, owner of K-Jen, the New Orleans-style restaurant. If you recall from the May 25 post, Jenny tested two coupon offers for her $10 jambalaya entrée: one offering 10% off and another offering $1 off. Even though the savings was the same, Jenny thought customers would respond differently. As Jenny found, neither offer was better than the other at increasing the average size of the table check. Now, Jenny wants to see if there is a preference for one offer over the other, based on customer age.

Jenny knows that of her 1,000-patron database, about 50% are between the ages of 18 and 35; the rest are older than 35. So Jenny decides to send out 1,000 coupons via email as follows:

                  $1 off   10% off   Total Coupons
18-35                250       250             500
Over 35              250       250             500
Total Coupons        500       500           1,000




Half of Jenny’s customers received one coupon offer and half received the other. Looking carefully at the table above, half the people in each age group got one offer and the other half got the other offer. At the end of the promotion period, Jenny received back 200 coupons. She tracks the coupon codes back to her database and finds the following pattern:

Coupons Redeemed (Actual)


                    $1 off   10% off   Coupons Redeemed
18-35                   35        65                100
Over 35                 55        45                100
Coupons Redeemed        90       110                200





Exactly 200 coupons were redeemed, 100 from each age group. But notice something else: of the 200 people redeeming the coupon, 110 redeemed the coupon offering 10% off; just 90 redeemed the $1 off coupon. Does this mean the 10% off coupon was the better offer? Not so fast!

What Else is the Table Telling Us?

Look at each age group. Of the 100 customers aged 18-35 who redeemed a coupon, 65 redeemed the 10% off coupon; but of the 100 customers over 35 who redeemed one, just 45 did. Is that a meaningful difference or just a fluke? Do persons over 35 prefer an offer of $1 off to one of 10% off? There’s one way to tell: a chi-squared test for statistical significance.

The Chi-Squared Test

Generally, a chi-squared test is useful in determining associations between categories and observed results. The chi-squared – χ2 – statistic is the value needed to determine statistical significance. In order to compute χ2, Jenny needs to know two things: the actual frequency distribution of the coupons redeemed (which is shown in the last table above), and the expected frequencies.

Expected frequencies are the counts you would expect to see in each cell if the two categories were unrelated – in this case, if age group had nothing to do with which coupon was redeemed. Here, we have two equal sized groups: customers age 18-35 and customers over 35. Knowing nothing else besides the fact that the same number of people in these groups redeemed coupons, and that 110 of them redeemed the 10% off coupon, and 90 redeemed the $1 off coupon, we would expect that 55 customers in each group would redeem the 10% off coupon and 45 in each group would redeem the $1 off coupon. Hence, in our expected frequencies, we still expect 55% of the total customers to redeem the 10% off offer. Jenny’s expected frequencies are:

Coupons Redeemed (Expected)


                    $1 off   10% off   Coupons Redeemed
18-35                   45        55                100
Over 35                 45        55                100
Coupons Redeemed        90       110                200


As you can see, the totals for each row and column match those in the actual frequency table above. The mathematical way to compute the expected frequencies for each cell would be to multiply its corresponding column total by its corresponding row total and then divide it by the total number of observations. So, we would compute as follows:

Frequency of:

18-35 redeeming $1 off:      =(100*90)/200  = 45
18-35 redeeming 10% off:     =(100*110)/200 = 55
Over 35 redeeming $1 off:    =(100*90)/200  = 45
Over 35 redeeming 10% off:   =(100*110)/200 = 55
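The row-total-times-column-total rule can also be sketched in a few lines of code. The dictionary layout and the function name `expected_frequencies` are illustrative choices, not anything Jenny would need; the counts are the redeemed-coupon figures from this post.

```python
# Actual coupons redeemed, by age group and offer
observed = {
    "18-35":   {"$1 off": 35, "10% off": 65},
    "Over 35": {"$1 off": 55, "10% off": 45},
}

def expected_frequencies(table):
    """Expected count per cell: (row total * column total) / grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return {
        row: {col: row_totals[row] * col_totals[col] / grand_total
              for col in cells}
        for row, cells in table.items()
    }

expected = expected_frequencies(observed)
for row, cells in expected.items():
    print(row, cells)
```

Each age group should come out with 45 expected redemptions of the $1 off coupon and 55 of the 10% off coupon, matching the expected-frequency table.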



Now that Jenny knows the expected frequencies, she must determine the critical χ2 statistic for significance, and then compute the χ2 statistic for her data. If the latter χ2 is greater than the critical χ2 statistic, then Jenny knows that a customer’s age group is associated with the coupon offer redeemed.

Determining the Critical χ2 Statistic

To find out what her critical χ2 statistic is, Jenny must first determine the degrees of freedom in her data. For cross-tabulation tables, the number of degrees of freedom is a straightforward calculation:

Degrees of freedom = (# of rows – 1) * (# of columns -1)

Jenny has two rows of data and two columns, so she has (2-1)*(2-1) = 1 degree of freedom. With this information, Jenny grabs her old college statistics book and looks at the χ2 distribution table in the appendix. At the 95% confidence level (a 5% significance level) with one degree of freedom, her critical χ2 statistic is 3.84. When Jenny calculates the χ2 statistic from her frequencies, she will compare it with the critical χ2 statistic. If Jenny’s χ2 statistic is greater than the critical value, she will conclude that the difference is statistically significant and that age does relate to which coupon offer is redeemed.
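If no statistics book is handy, the two numbers in this step can be reproduced with a short sketch. The function names are hypothetical. The critical-value shortcut relies on the fact that a χ2 variable with one degree of freedom is the square of a standard normal variable, so it applies only to the one-degree-of-freedom case; larger tables need a χ2 table or a statistics library.

```python
from statistics import NormalDist

def degrees_of_freedom(num_rows, num_cols):
    """Degrees of freedom for a cross-tabulation table."""
    return (num_rows - 1) * (num_cols - 1)

def critical_chi2_one_df(confidence=0.95):
    """Critical chi-squared value for ONE degree of freedom only.

    With one degree of freedom, chi-squared is a squared standard normal,
    so the critical value is the square of the two-sided normal cutoff.
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z ** 2

df = degrees_of_freedom(num_rows=2, num_cols=2)
critical = critical_chi2_one_df(0.95)
print(f"Degrees of freedom: {df}")
print(f"Critical chi-squared value: {critical:.2f}")
```

For Jenny's two-by-two table this gives one degree of freedom and a critical value that rounds to 3.84, in line with the textbook table.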

Calculating the χ2 Value From Observed Frequencies

Now, Jenny needs to compare the actual number of coupons redeemed for each group to their expected number. Essentially, to compute her χ2 value, Jenny follows a particular formula. For each cell, she subtracts the expected frequency of that cell from the actual frequency, squares the difference, and then divides it by the expected frequency. She does this for each cell. Then she sums up her results to get her χ2 value:


            $1 off                   10% off
18-35       =(35-45)^2/45 = 2.22     =(65-55)^2/55 = 1.82
Over 35     =(55-45)^2/45 = 2.22     =(45-55)^2/55 = 1.82






Jenny’s χ2 value is 8.08, much higher than the critical 3.84, indicating that there is indeed an association between age and coupon redemption.
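The same cell-by-cell calculation can be written as a minimal sketch, using the actual and expected counts from the tables in this post. The function name `chi_squared` and the list layout are illustrative choices; the critical value of 3.84 is the one Jenny looked up for one degree of freedom.

```python
# Rows: 18-35, Over 35. Columns: $1 off, 10% off.
observed = [[35, 65],
            [55, 45]]
expected = [[45, 55],
            [45, 55]]

def chi_squared(observed_table, expected_table):
    """Sum of (actual - expected)^2 / expected over every cell."""
    total = 0.0
    for obs_row, exp_row in zip(observed_table, expected_table):
        for actual, expect in zip(obs_row, exp_row):
            total += (actual - expect) ** 2 / expect
    return total

CRITICAL_VALUE = 3.84  # 95% confidence level, one degree of freedom

chi2 = chi_squared(observed, expected)
print(f"Chi-squared value: {chi2:.2f}")
print("Significant" if chi2 > CRITICAL_VALUE else "Not significant")
```

Summing the four cells (2.22 + 1.82 + 2.22 + 1.82) gives the 8.08 reported above, which exceeds the critical value.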

Interpreting the Results

Jenny concludes that patrons over the age of 35 are more inclined than patrons age 18-35 to take advantage of a coupon stating $1 off; patrons age 18-35 are more inclined to prefer the 10% off coupon. The way Jenny uses this information depends on the objectives of her business. If Jenny feels that K-Jen needs to attract more middle-aged and senior citizens, she should use the $1 off coupon when targeting them. If Jenny feels K-Jen isn’t selling enough jambalaya, then she might try to stimulate demand by couponing, sending the $1 off coupon to patrons over the age of 35 and the 10% off coupon to those 18-35.

Jenny might even have a counterintuitive use for the information. If most of K-Jen’s regular patrons are over age 35, they may already be loyal customers. Jenny might still send them coupons, but give the 10% off coupon instead. Why? These customers are likely to buy the jambalaya anyway, so why not give them the coupon they are not as likely to redeem? After all, why give someone a discount if they’re going to buy anyway! Giving the 10% off coupon to these customers does two things: first, it shows them that K-Jen still cares about their business and keeps them aware of K-Jen as a dining option. Second, by using the lower redeeming coupon, Jenny can reduce her exposure to subsidizing loyal customers. In this instance, Jenny uses the coupons for advertising and promoting awareness, rather than moving orders of jambalaya.

There are several more ways to analyze data by subgroup, some of which will be discussed in future posts. It is important to remember that your research objectives dictate the information you collect, which in turn dictates the appropriate analysis to conduct.



A Series on Forecasting Sales for Your Business’ Success

April 22, 2010

Trying to predict what sales will be like next year, next month, next week, or even tomorrow is as much an art as it is a science.  Because the forecasting process is often difficult and tedious, and because there’s no guarantee that a forecast will be precise, some businesses don’t even bother to do it, resigning themselves to what Hamlet would call “the slings and arrows of outrageous fortune” that the business world can certainly dish out.

Yet forecasting, if done properly, can be a great tool to reduce uncertainty about the future, as well as risk: it can make it easier to manage staffing and inventory levels; decide when to tap credit lines, step up marketing and sales efforts, or  acquire new equipment; and/or determine whether a new product will be worth launching.  In short, forecasting is intended to facilitate better planning.

These next few posts will introduce you to the different kinds of forecasting methods available, how to determine which method is most appropriate for your project, what information you need to generate the forecast, how to generate it, and how to test it for effectiveness.

This post serves as a roadmap for what is to come on Insight Central.  For the next few posts, you can expect to see (in this order):

  1. A high-level discussion of the different categories of forecasting methods: time series, causal/econometric, judgmental, artificial intelligence, and other.
  2. A deeper discussion of the more popular forecasting methods: regression analysis, exponential smoothing, moving average, ARIMA (Box-Jenkins), as well as some of the judgmental methods like surveys, composite forecasts, scenario building, and simulation.  Each discussion will also include a section on how to know if it’s one you should use.
  3. A discussion about the data you use to forecast; and how to prepare it for forecasting.
  4. How to determine if your forecast model is valid, reliable, and good for predicting.
  5. Some (nontechnical) case studies in which Analysights applied forecasting methods, and the results.

The next several posts will give you the pointers you need in order to forecast your business’ sales more effectively, and I believe you’ll find them to be informative, interesting, and exciting, not to mention beneficial to your top and bottom lines.  Let the voyage begin!

8 Steps to Determining Market Size

May 1, 2009

Whether you’re an entrepreneur writing a business plan or an established firm looking to introduce a new product or service, you will encounter the need to estimate the size of the market(s) that you plan to serve.  Market-sizing is an interesting and exciting branch of marketing research, but it can be almost as much an art as it is a science.  Today, I will walk you through the process of estimating market size, using the example of a financial planner looking to develop a practice in his community.

Step 1: Define your target market

This can never be stressed enough.  If you don’t know the type of customer you want to serve, you will waste a lot of time and money trying to get any customers.  Market-sizing is easier when you know the exact group you’re searching for.  Our financial planner has decided that his target market will be married couples with young children.

Step 2: Determine the needs of your target market and how they create demand for your product/service

Here you formulate a hypothesis.  Ask yourself the benefits your product or service offers your target customers.  What problem does your product help them solve?  Begin with a statement about why your target customers need your product.  Our financial planner’s statement might be: “Married couples with young children need my services because they must be prepared for college, as well as for unexpected emergencies such as disability and early death.”  This statement assumes, of course, that the financial planner sells financial products that address these needs; if the planner sells only financial plans, his statement will be different.  

Step 3: Identify the information you need to estimate the size of your market

Now that you have identified your target market and hypothesized about its demand for your product, what information do you need to develop your estimates?  Among other things, our financial planner would need to know:

  • The age distribution within the geographic area he serves;
  • The number of households with children in that area;
  • The distribution of family income in that area;
  • Home market values in the area;
  • Educational attainment and college enrollment rates for graduating high school students;
  • How many competitors, direct (other financial planners and insurance agents) and indirect (stock brokers, banks with financial planning services, etc.) are serving the market; and
  • What financial planning services people buy and how much they pay.

There are others, but this list is pretty comprehensive.

Step 4: Identify the sources you need to obtain that information

So where do you find information about your market?  These days, there is such a wealth of published statistics about almost every industry and market segment, that a combination of library and online research can fulfill most of your information needs.  In some cases, if you are looking for very specialized information, you may need to conduct your own primary research (surveys, focus groups, etc.) to get what you need. 

The U.S. Bureau of the Census provides comprehensive demographic statistics by metropolitan area, county, ZIP code, census tract, and state.  Information about population, age, income, educational attainment, presence of children, and home market value can easily be obtained at any of these levels, so the financial planner would be able to answer many of his questions.  In addition, the Census Bureau also produces County Business Patterns, which provides information about the activity of each industry by each of the same geographic levels listed earlier.  Hence, our financial planner can also obtain the number of financial planning establishments,  insurance agencies, and brokerage firms serving the area in which he hopes to establish his practice.

In addition, our financial planner may consult online data sources such as Dun & Bradstreet’s Million Dollar Database and ABI’s ReferenceUSA to identify specific financial planning firms and insurance agencies in his area and get estimates of their employment size and revenues.

The financial planner can also get lots of relevant information from trade associations, local chambers of commerce, Web sites of his existing competitors, and through primary research, such as surveys and interviews with experts.

Step 5: Collect the data

Now that you have identified your data sources, you need to extract the data.  The financial planner will scour all the sources he identified to pull out data that meets his information needs.  He will determine whether his data sources provide sufficient and useful data, or whether they provide insufficient or suspect data, at which point he may seek out additional sources to answer his questions.

Step 6: Analyze the data

Now that you have all the data, what does it mean?  What is it telling you?  Let’s say that the area our financial planner wants to serve has 200,000 households, of which 15% – or 30,000 – are two-parent households, with a median family income of $60,000 per year, a median age of 32, and an average household size of 4.  Immediately, the financial planner knows he is serving a young upscale market, and it’s very likely – even before looking at the number of competitors – that there will already be an above average number of financial planners trying to serve them.

The financial planner may also find from financial planning industry statistics that 60% of families in that age group carry life insurance, and that the average policy face value is $100,000.  Given the affluence of this area, the planner may reason that households in his target market have much greater assets and income to protect, so he may adjust his estimates of life insurance coverage for that area upward – to policies of maybe $250,000 or $500,000.  He’ll make similar estimates for any other financial products and services he offers.

Step 7: Derive your market estimate

Now that you’ve compiled and analyzed your data, you need to come up with an estimate of market size.  Our financial planner may – through all his data sources – come up with an average and standard deviation of the policy amounts of life, disability, and other policies aimed at his target market in that area.  He will then project that amount out by the number of households within that market to come up with an aggregate size of the financial planning market in that area.  From there, he will build in a margin of error, perhaps using a 95% confidence interval, to come up with a low estimate, a middle estimate (which would be the aggregate size he determined earlier), and a high estimate.
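A minimal sketch of this projection follows. The sample of policy amounts is entirely hypothetical, the function name `market_size_range` is made up for the example, and the 30,000 households come from the Step 6 illustration. It uses a normal approximation (1.96 standard errors) for the 95% confidence interval; with a sample this small, a planner might reasonably prefer a wider interval.

```python
from math import sqrt
from statistics import mean, stdev

def market_size_range(sample_amounts, households, z=1.96):
    """Project an average amount per household to a market-wide range.

    Returns low, middle, and high estimates, where low and high are the
    ends of an approximate 95% confidence interval around the average.
    """
    average = mean(sample_amounts)
    standard_error = stdev(sample_amounts) / sqrt(len(sample_amounts))
    margin = z * standard_error
    return {
        "low": (average - margin) * households,
        "middle": average * households,
        "high": (average + margin) * households,
    }

# Hypothetical policy face values gathered from the planner's data sources
sample_policies = [180_000, 250_000, 300_000, 220_000,
                   500_000, 275_000, 350_000, 240_000]

estimate = market_size_range(sample_policies, households=30_000)
for label, value in estimate.items():
    print(f"{label:>6}: ${value:,.0f}")
```

The middle figure is simply the average amount multiplied by the number of target households; the low and high figures show how far that total could move given the spread in the sample.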

Step 8: Apply your estimate

Your market size estimate is useless if you do not apply it.  Once our financial planner derives his aggregate estimate, he will estimate how much of that market he can reasonably get based on his competition and the amount of money he can earn based on his commission structure.  This will feed his business plan projections.

In addition, the size and characteristics of this market will help our financial planner determine how best to market his services, whether by direct mail, giving presentations, networking, or other means.

Market-sizing can be a daunting, tedious task, but the value it adds to your planning and marketing efforts can make the time, money and effort invested in it more than worthwhile.

Your Marketing Should Stress Time, Not Money

March 24, 2009

In her blog today, Kelly Spors, entrepreneurship columnist for the Wall Street Journal, discussed a study by Stanford Business School which suggests that marketers who emphasize time as opposed to money when promoting their products and services tend to have better sales.

In one experiment, the researchers had two six-year-old girls set up a lemonade stand and tested three signs: “Spend a little time and enjoy C&D’s lemonade.”; “Spend a little money and enjoy C&D’s lemonade.”; and simply “Enjoy C&D’s lemonade.”  According to Spors’ post, customers were asked to pay between $1 and $3 for a glass of lemonade and were then asked questions about their impressions of the lemonade.

The experiment found that the “Spend a little time…” sign attracted twice as many passersby as the sign emphasizing money, and those who were attracted by the “time” sign paid almost twice as much for the lemonade and said they were more likely to enjoy it than those who saw the “money” sign.

Despite the recession, time still seems to be more precious than money to most people.  And it also seems that people are more judicious with their time and will pay more for something that either makes the best use of their time, or – as suggested in the blog post – helps them achieve a great experience with a product or brand.

Marketers might be better advised to try building engagement with their brands rather than promoting them as money savers.

Kelly Spors’ blog post can be seen here.