Archive for the ‘Sampling’ Category

Marketing Research in Practice

October 12, 2010

Most of the topics I have written about discuss the concepts of marketing research in theory. Today, I want to give you an overview of how marketing research works in practice. The practical side deserves periodic discussion because the realities of business are constantly changing, and the ideal approach to research and the feasible one can be very far apart.

Recently, I submitted a bid to a prospective client who was looking to conduct a survey of a hard-to-reach population. My bid came in higher than expected. The department that was to act on the survey’s findings was on a tight budget, yet I had to explain that the largest cost driver was hiring a marketing research firm to provide the sample. One faction within the company wanted to move ahead at the price I quoted; another wanted to look for ways to reduce the scope of the study, and hence the cost. The tradeoff between cost and scope is often the first issue that emerges in the practice of marketing research.

Much of the practice of marketing research parallels what economists have long called “the basic economic problem”: limited resources against unlimited wants. Thanks to the push for company departments to work cross-functionally, there have never been more stakeholders in the outcome of marketing research, each function having its own agenda for the research’s findings. The scope of a study can expand greatly because of the many stakeholders involved; yet the time and money available for the study are finite.

Another issue that comes up is the selection of the marketing research vendor. Ideally, a company should retain a vendor who is strong in the type of research methodology that needs to be done. In reality, however, this isn’t always possible. Many marketers don’t deal with marketing research vendors often enough to know their areas of expertise; many believe that every vendor is the same. That’s hardly the case. Before I started Analysights, I worked for a membership association. The association undertook an employee satisfaction survey and retained a firm that had conducted several such surveys. As part of the project, the firm would compare the association’s ratings to those of other companies’ employees who took a similar survey. However, most of the employers who called on this firm were financial institutions – banks in particular – and their ratings were not comparable to those of the association. As a result, the peer comparison was useless.

Moreover, retaining a vendor who is well-versed in a particular methodology may not be possible precisely because the vendor does it so well that it charges a premium for the service. Hence, clients must often settle for second-best solutions.

There are many other political issues that come up in the practice of marketing research, too numerous to list here. The key thing to remember is that marketing research provides information, and information confers power. The department that controls the information wields great power in the organization, and the jockeying for that power often leads to less-than-ideal marketing research outcomes.

To ensure that your marketing research outcomes come as close to the ideal as possible, you need to take a series of proactive steps. First, get all the stakeholders together. Setting aside money and time for the moment, the stakeholders as a group should determine the objectives of the study. Once the objectives are set, the group needs to think through the information required to meet those objectives. Collectively, they should distinguish between the “need to know” and the “nice to know” information, and pursue the former first. Generally, about 20% of the findings you generate will provide nearly 80% of the actionable information you need. It’s always best to start with a study design whose results provide the greatest amount of relevant, actionable information at the smallest scope possible.

Once the stakeholders agree on the objectives and the information needed to meet them, they should also agree on the tradeoffs among the cost of executing the research, the sophistication of the approach, and the data to be collected. Then timeframe and money should be considered. Once the tradeoffs have been settled, the study’s scope can be adjusted to fit the time allotted and the budget.

Marketing research, in theory, focuses on the approaches and tools for doing marketing research. In practice, however, marketing research encompasses much more: office politics and culture; time and budget constraints; dealing with organizational power and conflict; and finding the appropriate political and resource balance for conducting the study.

Results from Nonprobability Samples Can Be Quite Useful, Despite Limits

August 18, 2010

In marketing research, samples are often taken of the population because surveying everyone is neither feasible nor cost-effective. Researchers rely on two types of sampling methods: probability and nonprobability. Probability sampling methods are those in which members of the population have a known chance – or probability – of being selected for the sample. Nonprobability sampling methods are more subjective: the probability of someone being selected for a nonprobability sample is not known or cannot be determined. In fact, the sampling process for nonprobability samples is much less formal.

Wherever possible, researchers prefer probability samples because they give an idea of how well the sample represents the population, and their results can easily be generalized to the entire population. With nonprobability samples, however, there’s no way to know how well the sample represents the population, and statistical analysis of the sample cannot be extrapolated to the population. Yet there are times when probability samples are not possible: funds may be unavailable, time may be short, or the population may be hard to find or otherwise hidden. In those cases, marketing researchers must rely on nonprobability sampling approaches, such as mall intercept surveys, convenience sampling, or referral sampling.
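
To make the distinction concrete, here is a minimal sketch in Python, using a made-up customer list, that contrasts a simple random sample (every member’s chance of selection is known) with a convenience sample (no selection probability can be stated):

```python
import random

random.seed(42)
population = [f"customer_{i:03d}" for i in range(1, 501)]

# Probability sample: a simple random sample in which every member
# has a known 50/500 = 10% chance of selection.
probability_sample = random.sample(population, 50)

# Nonprobability (convenience) sample: whoever is easiest to reach,
# mimicked here by taking the first 50 names on the list. No one's
# chance of selection can be stated in advance.
convenience_sample = population[:50]

print(probability_sample[:3])
print(convenience_sample[:3])
```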

Despite their inability to generalize to the population, nonprobability samples can still provide valuable information. While the findings from nonprobability samples cannot be used for inferential purposes, they can be used for exploratory purposes. For instance, the owner of a sports bar may invite some of his patrons to sample new beer brands he’s considering putting on tap. He might ask the patrons what they like and dislike about the new brands. If the owner notices that middle-aged male customers seem to like the Imperial Stout being sampled, he may identify the times of evening and days of the week when middle-aged men patronize the bar in large numbers, and then offer the Imperial Stout to see if it has any takers. Similarly, if customers are gravitating toward lighter or sweeter beers, the owner may look to see whether his selections on tap are lacking in those qualities. He may even consider testing more beers of those varieties. In this case, the nonprobability sample produced directional insights.

Sometimes, a population is hidden. If, for instance, you were trying to market a home-remedy product to alleviate an embarrassing condition (such as hemorrhoids, erectile dysfunction, jock itch, or nail fungus) and you wanted to see how receptive people would be to using it, you might rely on a self-selection or referral sampling approach. You might place a special coupon in or on the package encouraging the user to participate in a brief study. You might also invite people to the study through newspaper or billboard ads.

While the people you get to participate in your study may not be representative of everyone who suffers from the malady your product claims to remedy, they are nonetheless from your relevant population. If several people complain that your remedy burns, leaves a foul odor, or doesn’t last long, it doesn’t matter whether the results can be generalized: if you want your product to be well received by its intended customers, you want to minimize those complaints before you go to market. So you might look for a less abrasive ingredient you can substitute without diluting the effectiveness of your remedy. Another nice thing about these “hidden population” studies is that participants can point you toward others suffering from the condition, enabling you to build a database for future studies and marketing. They may even provide insights about the population you’re interested in, so that you can do a full-scale study using a probability sample later.

Although results drawn from nonprobability samples have severe limitations, they shouldn’t be discounted. Results from nonprobability sampling can provide directional information, lead you to your relevant customer base, let you see a problem from a different angle, and uncover new ways of approaching a business problem. Such results can even lay the foundation for future probability-sample studies. In addition, you can still get a lot of useful information in a short time with limited funds. As long as you critically evaluate your findings – knowing which groups were systematically excluded and which were overrepresented, and verifying that other researchers, using different samples and approaches, have replicated your findings – you should be OK.

*************************

Help us Reach 200 Fans on Facebook!

Thanks to all of you, Analysights now has more than 150 Facebook fans! We had hoped to get up to 200 fans by this past Friday, but weren’t so lucky. Can you help us out? If you like Forecast Friday – and our other posts – then we want you to “Like” us on Facebook! And if you like us that much, please also pass these posts on to your friends who like Insight Central and invite them to “Like” Analysights! By “Like-ing” us on Facebook, you’ll be informed every time a new blog post has been published, or when new information comes out. Check out our Facebook page! You can also follow us on Twitter. Thanks for your help!

Considerations for Selecting a Representative Sample

July 27, 2010

When trying to understand and make inferences about a population, it is neither possible nor cost-effective to survey everyone who comprises that population. Therefore, analysts survey a reasonably sized sample of the population, whose results they can generalize to the entire population. Since such sampling is subject to error, it is vitally important that an analyst select a sample that adequately represents the population at large. Ensuring that a sample represents the population as accurately as possible requires that the sample be drawn using well-established, specific principles. In today’s post, we will discuss the considerations for selecting a representative sample.

What is the Unit of Analysis?

What is the population you are interested in measuring? Let’s assume you are a market research analyst for a life insurance company and you are trying to understand the degree of existing life insurance coverage among households in the greater Chicago area. Already, this is a challenging prospect. What constitutes “life insurance coverage”? “A household”? “The greater Chicago area”? As the analyst, you must define these before you can move forward. Does “coverage” mean having any life insurance policy, regardless of amount? Or does it mean having life insurance that covers the oft-recommended eight to ten times the principal breadwinner’s salary? Does it mean having individual or group life insurance, or either one?

Does “household” mean a unit with at least one adult and the presence of children? Can a household consist of one person for your analysis?

Does the “greater Chicago area” mean every household within the Chicago metropolitan statistical area (MSA), as defined by the U.S. Census Bureau, or does it mean the city of Chicago and its suburban collar counties (e.g., Cook, DuPage, Lake, Will, McHenry, Kane, Kendall)?

All of these are considerations you must decide on.

You talk through these issues with some of the relevant stakeholders: your company’s actuarial department, the marketing department, and the product development department, and you learn some new information. You find out that your company wants to sell a highly-specialized life insurance product to young (under 40), high-salaried (at least $200,000) male heads-of-household that provides up to ten times the income coverage. You find that “male head-of-household” is construed to mean any man who has children under 18 present in his household and has either no spouse or a spouse earning less than $20,000 per year.

You also learn that this life insurance product is being pilot tested in the Chicago area, and that the insurance company’s captive agent force has offices only within the City and its seven collar counties, although agents may write policies for any qualifying person in Illinois. You can do one of two things here. Since all your company’s agents are in the City and collar counties, you might simply restrict your definition of “greater Chicago area” to this region. Or, you might select this area, and add to it nearby counties without agencies, where agents write a large number of policies. Whether you do the former or latter depends on the timeframe available to you. If you can easily and quickly obtain the information for determining the additional counties, you might select the latter definition. If not, you’ll likely go with the former. Let’s assume you choose only those in the City and its collar counties.

Another thing you find out through communicating with stakeholders is that the intent of this insurance product is to close gaps in, not replace, existing life insurance coverage. Hence, you now know your relevant population:

Men under the age of 40, living in the city of Chicago or its seven collar counties, with a salary income of at least $200,000 per year, heading a household with at least one child under 18 present, with either no spouse or a spouse earning less than $20,000 per year, and who have life insurance coverage that is less than ten times their annual salary income.

You can see that this is a very specific unit of analysis. For this type of insurance product, you do not want to survey the general population, as this product will be irrelevant for most. Hence, the above italicized definition is your working population. It is from this group that you want to draw your sample.

How Do You Reach This Working Population?

Now that you have identified your working population, you must find a master list of people from which to draw your sample. Such a list is known as the sample frame. As you’ve probably guessed, no single list will contain your working population precisely. Hence, you will spend some time searching for a list, or combination of lists, that covers your working population as completely as possible. The degree to which your sample frame fails to account for all of your working population is known as its bias, or sample frame error, and such error cannot be totally eradicated.

Sample frame error exists because the frame is always in flux: some of these upscale households move out while others move in; some people die; some have unlisted phone numbers or don’t give out their email addresses; some lose their jobs while others move into these high-paying jobs; and some hit age 40, or their wives take higher-paying jobs. There’s nothing you can do about this churn except be aware of it.

To obtain your sample frame, you might start by asking yourself several questions about your working population: What ZIP codes are they likely to live in? What hobbies do they engage in? What magazines and newspapers do they subscribe to? Where do they take vacations? What clubs and civic organizations do they join? Do they use financial planners or CPAs?

Armed with this information, you might purchase mailing lists of such men from magazine subscriptions; you might search phone listings in upscale Chicago-area communities like Winnetka, Kenilworth, and Lake Forest. You might network with travel agents, real estate brokers, financial advisors, and charitable organizations. You may also purchase membership lists from clubs. You will then combine these lists to come up with your sample frame. The degree to which you can do this depends on your time and budget constraints, as well as any regulatory and ethical practices (e.g., privacy, Do Not Call lists, etc.) governing collection of such lists.

Many market research firms have made identifying the sample frame much easier in recent years, thanks to survey panels. Panels are groups of respondents who have agreed in advance to participate in surveys. The existence of survey panels has greatly reduced the time and cost involved in compiling one’s own sample frame. The drawback, however, is that panel respondents self-select to join, and they can be very different from members of the working population who are not on a panel.

Weeding Out the Irrelevant Population

Your sample frame will never include everyone who fits your working population, nor will it exclude everyone who does not. As a result, you will need to eliminate extraneous members of your sample frame. Unfortunately, there’s no way to do this ahead of time. Typically, you must ask screening questions at the beginning of your survey to determine whether a respondent qualifies, and then terminate the survey if the respondent fails to meet the criteria.
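
As a rough illustration, the screening criteria from the working-population definition above might be encoded as follows. This is a hypothetical sketch: the field names and the candidate record are invented.

```python
# Hypothetical screener for the life insurance study described above.
COLLAR_COUNTIES = {"Cook", "DuPage", "Lake", "Will", "McHenry", "Kane", "Kendall"}

def qualifies(r: dict) -> bool:
    """Return True if the respondent fits the working population."""
    return (
        r["sex"] == "male"
        and r["age"] < 40
        and r["county"] in COLLAR_COUNTIES   # the city of Chicago sits in Cook
        and r["salary"] >= 200_000
        and r["children_under_18"] >= 1
        and r["spouse_income"] < 20_000      # 0 if there is no spouse
        and r["life_coverage"] < 10 * r["salary"]
    )

candidate = {"sex": "male", "age": 35, "county": "DuPage", "salary": 250_000,
             "children_under_18": 2, "spouse_income": 0, "life_coverage": 500_000}
print("proceed" if qualifies(candidate) else "terminate survey")
```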

Summary

Selecting a representative sample is an intricate process that requires serious thought and communication among stakeholders about the objectives of the survey, the definition of the relevant working population, the approach to finding and reaching members of that population, and the time, budget, and regulatory constraints involved. No sample will ever be completely representative of the population, but samples can and should be reasonably representative.

Forecast Friday Topic: Multicollinearity – Correcting and Accepting it

July 22, 2010

(Fourteenth in a series)

In last week’s Forecast Friday post, we discussed how to detect multicollinearity in a regression model and how dropping a suspect variable or variables from the model can be one approach to reducing or eliminating multicollinearity. However, removing variables can cause other problems – particularly specification bias – if the suspect variable is indeed an important predictor. Today we will discuss two additional approaches to correcting multicollinearity – obtaining more data and transforming variables – and will discuss when it’s best to just accept the multicollinearity.

Obtaining More Data

Multicollinearity is really an issue with the sample, not the population. Sometimes, sampling produces a data set that might be too homogeneous. One way to remedy this would be to add more observations to the data set. Enlarging the sample will introduce more variation in the data series, which reduces the effect of sampling error and helps increase precision when estimating various properties of the data. Increased sample sizes can reduce either the presence or the impact of multicollinearity, or both. Obtaining more data is often the best way to remedy multicollinearity.
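
As a rough illustration of the point, the simulation below (entirely made-up data) fits the same collinear model on a small sample and a larger one; the coefficient standard errors, which multicollinearity inflates, shrink noticeably as the sample grows:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

def fit(n):
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + 0.3 * rng.normal(size=n)  # x2 strongly collinear with x1
    y = 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    X = sm.add_constant(np.column_stack([x1, x2]))
    return sm.OLS(y, X).fit()

small, large = fit(30), fit(300)
print("n=30  coefficient std. errors:", small.bse[1:])  # inflated
print("n=300 coefficient std. errors:", large.bse[1:])  # markedly smaller
```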

Obtaining more data does have problems, however. Sometimes additional data just isn’t available; this is especially the case with time series data, which can be limited or otherwise finite. If you must obtain the additional information through great effort, it can be costly and time-consuming. Also, the additional data you add to your sample could be quite similar to your original data set, in which case there would be no benefit to enlarging it. The new data could even make problems worse!

Transforming Variables

Another way statisticians and modelers go about eliminating multicollinearity is through data transformation. This can be done in a number of ways.

Combine Some Variables

The most obvious way is to combine some of the variables. After all, multicollinearity suggests that two or more independent variables are strongly correlated. Perhaps you can multiply two such variables together and use their product in place of both.

So, in our donor history example, we had the two variables “Average Contribution in Last 12 Months” and “Times Donated in Last 12 Months.” We can multiply them to create a composite variable, “Total Contributions in Last 12 Months,” and then use that new variable, along with “Months Since Last Donation,” to perform the regression. In fact, if we do that with our model, we end up with a model (not shown here) that has an R² of 0.895, and this time the coefficient for “Months Since Last Donation” is significant, as is our “Total Contributions” variable. Our F-statistic is a little over 72. Essentially, the R² and F-statistic are only slightly lower than in our original model, suggesting that the transformation was useful. However, looking at the correlation matrix, we still see a strong negative correlation between our two independent variables, suggesting that we still haven’t eliminated multicollinearity.
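
Here is a hedged sketch of the mechanics. The donor file itself isn’t reproduced here, so the data below are simulated stand-ins, and the response variable is fabricated purely to make the example runnable:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "avg_contribution_12m": rng.gamma(2.0, 25.0, size=n),
    "times_donated_12m": rng.integers(1, 13, size=n),
    "months_since_last": rng.integers(0, 12, size=n),
})
# The composite variable replaces the two collinear predictors.
df["total_contributions_12m"] = (df["avg_contribution_12m"]
                                 * df["times_donated_12m"])
# Fabricated response, for illustration only.
df["next_gift"] = (0.4 * df["total_contributions_12m"]
                   - 5.0 * df["months_since_last"]
                   + rng.normal(0, 20, size=n))

X = sm.add_constant(df[["total_contributions_12m", "months_since_last"]])
print(sm.OLS(df["next_gift"], X).fit().summary())
```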

Centered Interaction Terms

Sometimes we can reduce multicollinearity by creating an interaction term between the variables in question. In a model trying to predict performance on a test from hours spent studying and hours of sleep, you might find that hours spent studying appears to be related to hours of sleep. So you create a third independent variable, Sleep_Study_Interaction. To do this, compute the mean of the hours-of-sleep variable and the mean of the hours-studying variable. For each observation, subtract each independent variable’s mean from its value for that observation, then multiply the two differences together. That product is your interaction term, Sleep_Study_Interaction. Now run the regression with the original two variables and the interaction term. Subtracting the means from the variables in question centers the interaction term, which means you’re taking the central tendency of your data into account.
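
A minimal sketch of that procedure, with made-up study and sleep data (all column names are hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 150
df = pd.DataFrame({"hours_study": rng.uniform(0, 10, n),
                   "hours_sleep": rng.uniform(4, 9, n)})
df["test_score"] = (5 * df["hours_study"] + 3 * df["hours_sleep"]
                    + 0.5 * df["hours_study"] * df["hours_sleep"]
                    + rng.normal(0, 5, n))

# Center each predictor at its mean, then multiply the centered values.
study_c = df["hours_study"] - df["hours_study"].mean()
sleep_c = df["hours_sleep"] - df["hours_sleep"].mean()
df["sleep_study_interaction"] = study_c * sleep_c

model = smf.ols("test_score ~ hours_study + hours_sleep + sleep_study_interaction",
                data=df).fit()
print(model.params)
```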

Differencing Data

If you’re working with time series data, one way to reduce multicollinearity is to run your regression on differences. To do this, you take every variable – dependent and independent – and, beginning with the second observation, subtract the immediately prior observation’s value from the current one. Now, instead of working with the original data, you are working with the change in the data from one period to the next. Differencing reduces multicollinearity by removing the trend component of the time series: if all independent variables follow more or less the same trend, they can end up highly correlated. Sometimes, however, trends build on themselves for several periods, so multiple rounds of differencing may be required. Subtracting the immediately prior period gives a “first difference”; differencing those first differences gives a “second difference,” and so on. Note also that with differencing we lose the first observation (or observations, depending on how many times we difference), so if you have a small data set, differencing reduces your degrees of freedom and increases your risk of making a Type II error: concluding that an independent variable is not statistically significant when, in truth, it is.
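
Here’s a brief sketch of first differencing with pandas, using a fabricated two-year monthly series in which both independent variables share the same trend:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
months = pd.date_range("2009-01-01", periods=24, freq="MS")
trend = np.arange(24)
ts = pd.DataFrame({
    "sales": 100 + 5 * trend + rng.normal(0, 3, 24),    # dependent
    "ad_spend": 50 + 2 * trend + rng.normal(0, 2, 24),  # shares the trend
    "price": 10 + 0.5 * trend + rng.normal(0, 1, 24),   # shares the trend
}, index=months)

diffed = ts.diff().dropna()  # first differences; the first row is lost

print(ts[["ad_spend", "price"]].corr().iloc[0, 1])      # near 1: shared trend
print(diffed[["ad_spend", "price"]].corr().iloc[0, 1])  # much closer to 0
```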

Other Transformations

Sometimes, it makes sense to look at a scatter plot of each independent variable’s values against those of the dependent variable to see whether the relationship is fairly linear. If it is not, that’s a cue to transform the independent variable. If an independent variable appears to have a logarithmic relationship, you might substitute its natural log. Depending on the relationship, you can also use other transformations: square root, square, negative reciprocal, etc.

Another consideration: if you’re predicting the impact of violent crime on a city’s median family income, instead of using the raw number of violent crimes committed in the city, you might divide it by the city’s population to get a per-capita figure. That will give more useful insight into the incidence of crime in the city.
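
A short sketch of both ideas, with hypothetical city-level figures:

```python
import numpy as np
import pandas as pd

cities = pd.DataFrame({
    "median_income": [52_000, 61_000, 48_000, 75_000],
    "violent_crimes": [4_200, 1_800, 5_100, 900],
    "population": [800_000, 450_000, 950_000, 300_000],
    "ad_spend": [120.0, 340.0, 90.0, 510.0],
})

# A per-capita rate in place of a raw count.
cities["crimes_per_capita"] = cities["violent_crimes"] / cities["population"]
# A natural log in place of a variable with a logarithmic relationship.
cities["log_ad_spend"] = np.log(cities["ad_spend"])

print(cities[["crimes_per_capita", "log_ad_spend"]])
```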

Transforming data in these ways helps reduce multicollinearity by representing independent variables differently, so that they are less correlated with other independent variables.

Limits of Data Transformation

Transforming data has its own pitfalls. First, transforming data also transforms the model: a model that uses a per-capita crime figure as an independent variable has a very different interpretation than one using an aggregate crime figure. Interpretations of models and their results also get more complicated as data is transformed. Ideally, models are supposed to be parsimonious – that is, they explain a great deal about the relationship as simply as possible. Typically, parsimony means as few independent variables as possible, but it also means as few transformations as possible. Transformations also mean more work: when you plug new data into your model for forecasting, you must remember to transform those values accordingly.

Living With Multicollinearity

Multicollinearity is par for the course when a model consists of two or more independent variables, so often the question isn’t whether multicollinearity exists, but rather how severe it is. Multicollinearity doesn’t bias your parameter estimates, but it inflates their variance, making them inefficient and untrustworthy. As you have seen from the remedies offered in this post, the cures can be worse than the disease, and correcting multicollinearity can be an iterative process whose benefit may not justify the time and resources required. Sometimes any effort to reduce multicollinearity is futile. Generally, for the purposes of forecasting, it might be perfectly OK to disregard the multicollinearity; if, however, you’re using regression analysis to explain relationships, then you must try to reduce it.

A good approach is to run a few different models, some using variations of the remedies we’ve discussed here, and to compare their degree of multicollinearity, and their forecast accuracy, with those of the original model. After all, if all you’re trying to do is forecast, then a model with slightly less multicollinearity but a higher degree of forecast error is probably not preferable to a more accurate forecasting model with more multicollinearity.

The Takeaways:

  1. Where you have multiple regression, you almost always have multicollinearity, especially in time series data.
  2. A correlation matrix is a good way to detect multicollinearity. Multicollinearity can be very serious if the correlation matrix shows that some of the independent variables are more highly correlated with each other than they are with the dependent variable.
  3. You should suspect multicollinearity if:
    1. You have a high R² but low t-statistics;
    2. The sign of a coefficient is the opposite of what is normally expected (a relationship that should be positive is negative, and vice versa).
  4. Multicollinearity doesn’t bias parameter estimates, but makes them untrustworthy by enlarging their variance.
  5. There are several ways of remedying multicollinearity, with obtaining more data often being the best approach. Each remedy contributes a new set of problems and limitations, so you must weigh the benefit of reduced multicollinearity against the time and resources needed to achieve it, and against the resulting impact on your forecast accuracy.

Next Forecast Friday Topic: Autocorrelation

These past two weeks, we discussed the problem of multicollinearity. Next week, we will discuss the problem of autocorrelation – the phenomenon that occurs when we violate the assumption that the error terms are not correlated with each other. We will discuss how to detect autocorrelation, discuss in greater depth the Durbin-Watson statistic’s use as a measure of the presence of autocorrelation, and how to correct for autocorrelation.

*************************

If you Like Our Posts, Then “Like” Us on Facebook and Twitter!

Analysights is now doing the social media thing! If you like Forecast Friday – or any of our other posts – then we want you to “Like” us on Facebook! By “Like-ing” us on Facebook, you’ll be informed every time a new blog post has been published, or when other information comes out. Check out our Facebook page! You can also follow us on Twitter.

Consider Respondents When Using Rating Scale Questions in Surveys

July 13, 2010

The art of questionnaire design is full of minute details, especially when it comes to rating scales. The considerations for rating questions are as normative as they are numerous: How many points should the scale have? An even or odd number of points? A balanced or unbalanced scale? Forced or unforced choice? There are many options, and many researchers default to a five- or 10-point rating scale out of rote or past experience. A rating scale chosen poorly – or agonized over excessively – can lead to biased responses, respondent fatigue and abandonment, and useless results. When deciding which rating scales to use, the most important first step is to consider who your respondents are.

How Many Points?

The number of points to use in a rating scale can be a challenging choice. Use too few points, and you may not get very precise data; use too many, and you may confuse or tire your respondents. Just how many points are appropriate depends on your audience. If your respondents are likely to skew heavily positive or heavily negative, you might opt for more points, such as a seven- to 10-point scale. This is because people who are generally positive (or negative) toward your company or product can hold those attitudes with different intensities.

Let’s assume a professional association conducts a survey of its members and asks, “Overall, how satisfied are you with your membership in our organization?” Consider a typical five-point scale, running from “very dissatisfied” to “very satisfied.”

Generally, if 80% of the association’s members come back either “satisfied” or “very satisfied,” the result is of little value to the association: there’s no way to gauge the intensity of their satisfaction. But if the association were to use a nine-point scale, with ratings running from 1 to 9, those 80% of satisfied members would be more spread out in terms of their satisfaction. For example, if 80% of respondents give a score greater than 5, but only 10% give a score of 9, then the association has an approximation of its hardest-core supporters and a better idea of how fluid member satisfaction is. It can then focus on developing programs that graduate members from ratings of six through eight toward a nine.

The lengthier scale is also useful if you’re using this question’s responses as a dependent variable in regression analysis, with the responses to other questions as predictors. These options are not available with the five-point scale. Of course, a seven-point scale might be used instead of a nine-point one, depending on the degree of skewness in responses.

How Do You Determine the Degree of Respondent Skewness Before Administering the Survey?

It can be hard to know in advance how respondents will rate and whether the ratings will be normally distributed or skewed. There are two ways to find out: past surveys and pilot surveys.

Past Surveys

If the association has conducted this membership satisfaction survey in the past, it can see how responses have traditionally fallen. If responses have generally been normally distributed, and the association has been using a five-point scale, then the association might want to stay the course.

On the other hand, if the association finds that past responses fall lopsidedly on one side of the five-point scale, it might consider lengthening the scale. Or, if the association previously used a seven- or nine-point scale and found sparse responses at both ends (because of the scale’s width), it may choose to collapse the scale down to five points.

Making changes to survey scales based on past survey responses can be problematic, however, if the past surveys are used for benchmarking. Care must be exercised to ensure that the results of the modified scale are easily translatable or imputable to the results of the past survey scales, so that comparability is maintained.

Pilot Surveys

The association can also use a pilot survey as a litmus test of the spread in respondent opinion. If the association is unsure how members will score on certain rating questions, it might send two or three versions of the same questions to a very small sample of its membership: one testing a five-point scale, another a seven-point, and the other a nine-point. If results come back normally distributed on the five-point version, and more sparse and spread out on the seven- and nine-point versions, then the association knows that a five-point scale is appropriate.

If, on the other hand, the association notices concentration at one end of the scale for all three versions, it can compare the seven- and nine-point tests. If it sees more sparseness in the nine-point scale, it may opt for the seven-point scale; otherwise, it may go with the nine-point scale.

Of course, for the pilot survey to work, each member of the association must have an equal chance of selection, and each member who receives the pilot must also have an equal chance of getting any one of the three versions. This ensures a random probability sample whose results can be generalized to the association’s full membership base.
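
A small sketch of that equal-chance selection and assignment, using a hypothetical membership list:

```python
import random
from collections import Counter

random.seed(7)
members = [f"member_{i:04d}" for i in range(1, 5001)]  # full membership list

pilot = random.sample(members, 300)           # every member: equal chance
versions = ("5-point", "7-point", "9-point")
assignment = {m: random.choice(versions) for m in pilot}  # equal-chance version

print(Counter(assignment.values()))  # roughly 100 recipients per version
```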

As you can see, there are many considerations involved in constructing a rating scale question. In tomorrow’s blog post, we’ll discuss whether it’s best to use an even or odd number of points, and hence forced versus unforced choices.

*************************************

Let Analysights Take the Pain out of Survey Design!

Rating scales are but one of the important things you need to consider when designing an effective survey.  If you need to design a survey that gets to the heart of what you need to know in order for your company to achieve marketing success, call on Analysights.  We will take the drudgery out of designing your survey, so you can concentrate on running your business.  Check out our Web site or call (847) 895-2565.