Posts Tagged ‘survey sampling’

Considerations for Selecting a Representative Sample

July 27, 2010

When trying to understand and make inferences about a population, it is usually neither possible nor cost-effective to survey everyone in that population. Therefore, analysts choose to survey a reasonably sized sample of the population, whose results they can generalize to the entire population. Since such sampling is subject to error, it is vitally important that an analyst select a sample that is adequately representative of the population at large. Ensuring that a sample represents the population as accurately as possible requires that the sample be drawn using well-established, specific principles. In today’s post, we will be discussing the considerations for selecting a representative sample.

What is the Unit of Analysis?

What is the population you are interested in measuring? Let’s assume you are a market research analyst for a life insurance company and you are trying to understand the degree of existing life insurance coverage of households in the greater Chicago area. Already, this is a challenging prospect. What constitutes “life insurance coverage,” “a household,” or “the greater Chicago area”? As the analyst, you must define these before you can move forward. Does “coverage” mean having any life insurance policy, regardless of amount? Or does it mean having life insurance that covers the oft-recommended eight to ten times the principal breadwinner’s salary? Does it mean having individual life insurance, group life insurance, or either one?

Does “household” mean a unit with at least one adult and the presence of children? Can a household consist of one person for your analysis?

Does the “greater Chicago area” mean every household within the Chicago metropolitan statistical area (MSA), as defined by the U.S. Census Bureau, or does it mean the city of Chicago and its suburban collar counties (e.g., Cook, DuPage, Lake, Will, McHenry, Kane, Kendall)?

All of these are considerations you must decide on.

You talk through these issues with some of the relevant stakeholders: your company’s actuarial department, the marketing department, and the product development department, and you learn some new information. You find out that your company wants to sell a highly specialized life insurance product, providing coverage of up to ten times income, to young (under 40), high-salaried (at least $200,000) male heads-of-household. You find that “male head-of-household” is construed to mean any man who has children under 18 present in his household and has either no spouse or a spouse earning less than $20,000 per year.

You also learn that this life insurance product is being pilot tested in the Chicago area, and that the insurance company’s captive agent force has offices only within the City and its seven collar counties, although agents may write policies for any qualifying person in Illinois. You can do one of two things here. Since all your company’s agents are in the City and collar counties, you might simply restrict your definition of “greater Chicago area” to this region. Or, you might select this area, and add to it nearby counties without agencies, where agents write a large number of policies. Whether you do the former or latter depends on the timeframe available to you. If you can easily and quickly obtain the information for determining the additional counties, you might select the latter definition. If not, you’ll likely go with the former. Let’s assume you choose only those in the City and its collar counties.

Another thing you find out through communicating with stakeholders is that the intent of this insurance product is to close gaps in, not replace, existing life insurance coverage. Hence, you now know your relevant population:

Men under the age of 40, living in the city of Chicago or its seven collar counties, with a salary income of at least $200,000 per year, heading a household with at least one child under 18 present, with either no spouse or a spouse earning less than $20,000 per year, and who have life insurance coverage that is less than ten times their annual salary income.

You can see that this is a very specific unit of analysis. For this type of insurance product, you do not want to survey the general population, as this product will be irrelevant for most. Hence, the definition above is your working population. It is from this group that you want to draw your sample.

How Do You Reach This Working Population?

Now that you have identified your working population, you must find a master list of people from which to draw your sample. Such a list is known as the sample frame. As you’ve probably guessed, no single list will match your working population precisely. Hence, you will spend some time searching for a list, or some combination of lists, that covers as much of your working population as possible. The degree to which your sample frame fails to account for all of your working population is known as its bias or sample frame error, and such error cannot be totally eradicated.

Sample frame error exists because some of these upscale households move out while others move in; some die; some have unlisted phone numbers or don’t give out their email addresses; some will lose their jobs, while others move into these high paying jobs; and some will hit age 40, or their wives will get higher paying jobs. And these changes are dynamic. There’s nothing you can do, except be aware of them.

To obtain your sample frame, you might start by asking yourself several questions about your working population: What ZIP codes are they likely to live in? What types of hobbies do they engage in? What magazines and newspapers do they subscribe to? Where do they take vacations? What clubs and civic organizations do they join? Do they use financial planners or CPAs?

Armed with this information, you might purchase mailing lists of such men from magazine subscriptions; you might search phone listings in upscale Chicago area communities like Winnetka, Kenilworth, and Lake Forest. You might network with travel agents, real estate brokers, financial advisors, and charitable organizations. You may also purchase membership lists from clubs. You will then combine these lists to come up with your sample frame. The degree to which you can do this depends on your time and budget constraints, as well as any regulatory and ethical practices (e.g., privacy, Do Not Call lists, etc.) governing collection of such lists.
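Combining several purchased lists usually means the same person shows up more than once. The sketch below shows one possible way to merge lists into a single sample frame while dropping duplicates. The record layout (a "name" and a "zip" field), the matching rule (case-insensitive name plus five-digit ZIP), and the sample records are all hypothetical; real lists would likely need fuzzier matching.

```python
def normalize_key(record):
    """Build a match key from name and ZIP code, ignoring case and extra spaces."""
    name = " ".join(record["name"].lower().split())
    zip5 = record["zip"].strip()[:5]  # treat ZIP+4 and 5-digit ZIP as the same
    return (name, zip5)


def build_sample_frame(*sources):
    """Merge any number of lists, keeping the first record seen for each key."""
    frame = {}
    for source in sources:
        for record in source:
            frame.setdefault(normalize_key(record), record)
    return list(frame.values())


# Hypothetical records for illustration only.
magazine_list = [
    {"name": "John  Smith", "zip": "60093"},
    {"name": "Raj Patel", "zip": "60045"},
]
club_list = [
    {"name": "john smith", "zip": "60093-1234"},  # same person as above
    {"name": "Luis Ortiz", "zip": "60043"},
]

frame = build_sample_frame(magazine_list, club_list)
print(len(frame))  # 3 unique people from 4 records
```

Keeping the first record seen is an arbitrary choice; you might instead prefer the record from whichever source you trust most.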

Many market research firms have made identifying the sample frame much easier in recent years, thanks to survey panels. Panels are groups of respondents who have agreed in advance to participate in surveys. The existence of survey panels has greatly reduced the amount of time and cost involved in compiling one’s own sample frame. The drawback, however, is that respondents from a panel self-select to join the panel. And panel respondents can be very different from other members of the working population who are not on a panel.

Weeding Out the Irrelevant Population

Your sample frame will never include all those who fit your working population, nor will it exclude all those who do not fit your working population. As a result, you will need to eliminate extraneous members of your sample frame. Unfortunately, there’s no proactive way to do this. Typically, you must ask screening questions at the beginning of your survey to determine whether a respondent qualifies to take the survey, and then terminate the survey if a respondent fails to meet the criteria.
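As a rough illustration, the screening step could be sketched as an ordered list of checks built from the working population defined earlier, with the survey terminating at the first check a respondent fails. The field names, the answer format, and the use of the seven counties named earlier in this post are assumptions made for the example, not part of any particular survey tool.

```python
# Assumption: the survey area is the seven counties named earlier in the post.
SURVEY_COUNTIES = {"Cook", "DuPage", "Lake", "Will", "McHenry", "Kane", "Kendall"}

# Ordered (question id, qualifies?) pairs; each check receives the answers dict.
SCREENER = [
    ("sex", lambda a: a["sex"] == "male"),
    ("age", lambda a: a["age"] < 40),
    ("county", lambda a: a["county"] in SURVEY_COUNTIES),
    ("salary", lambda a: a["salary"] >= 200_000),
    ("children_under_18", lambda a: a["children_under_18"] >= 1),
    # spouse_income is None when the respondent has no spouse
    ("spouse_income", lambda a: a["spouse_income"] is None or a["spouse_income"] < 20_000),
    ("coverage", lambda a: a["coverage"] < 10 * a["salary"]),
]


def screen(answers):
    """Return None if the respondent qualifies, else the id of the first failed question."""
    for question_id, qualifies in SCREENER:
        if not qualifies(answers):
            return question_id  # terminate the survey here
    return None


respondent = {
    "sex": "male", "age": 36, "county": "Lake", "salary": 250_000,
    "children_under_18": 2, "spouse_income": None, "coverage": 500_000,
}
print(screen(respondent))  # None, so the survey continues
```

Recording which question ended the survey is optional, but it gives you a sense of where your sample frame most often misses the working population.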

Summary

Selecting a representative sample is an intricate process that requires serious thought and communication among stakeholders about the objectives of the survey, the definition of the relevant working population, the approach to finding and reaching members of the working population, and the time, budget, and regulatory constraints involved. No sample will ever be completely representative of the population, but samples can and should be reasonably representative.

Beware of “Professional” Survey Respondents!

April 3, 2009

Thanks to the Internet, conducting surveys has never been easier.  Being able to use the Web to conduct marketing research has greatly reduced the cost and time involved and has democratized the process for many companies.

While online surveys have increased simplicity and cost-savings, they have also given rise to a dangerous breed of respondents – “Professional” survey-takers.

A “professional” respondent is one who actively seeks out online surveys offering paid incentives – cash, rewards, or some other benefit – for completing the survey.  In fact, many blogs and online articles tell of different sites people can go to find paid online surveys.

If your company conducts online surveys, “professionals” can render your findings useless.  In order for your survey to provide accurate and useful results, the people surveyed must be representative of the population you are measuring and selected randomly (that is, everyone from the population has an equal chance of selection).
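To make "an equal chance of selection" concrete, here is a minimal sketch of a simple random sample drawn from a list you control, using Python's standard `random` module. The invitation list, the sample size, and the seed are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from the frame, each with equal probability."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable for audits
    return rng.sample(list(frame), n)


# Hypothetical invitation list of 500 people; we invite 50 of them.
invite_list = [f"respondent_{i:03d}" for i in range(500)]
invited = simple_random_sample(invite_list, 50, seed=2009)
print(len(invited))  # 50
```

The point of the sketch is that you choose who is invited. A respondent who finds the survey on his or her own never passes through this step, which is why self-selection breaks the randomness principle.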

“Professionals” subvert the sampling principles of representativeness and randomness simply because they self-select to take the survey.  The survey tool cannot tell whether they belong to the population being measured, nor what their probability of selection was.  What’s more, online surveys exclude members of the population who lack Internet access.  This results in a survey bias double-whammy.

In addition, “professionals” may simply go through a survey for the sake of the incentive.  Hence they may speed through it, paying little or no attention to the questions, or they may give untruthful answers.  Now your survey results are both biased and wrong.

Minimizing the Impact of “Professionals”

There are some steps you can take to protect your survey from “professionals,” including:

  • Maintain complete control of your survey distribution.  If possible, use a professional online survey panel company, such as e-Rewards, Greenfield Online, or Harris Interactive.  There are lots of others, and all maintain tight screening processes for their survey participants and tight controls for distribution of your survey;
  • If an online survey panel is out of your budget, perhaps you can build your own controlled e-mail list (following CAN-SPAM laws, of course).  E-mailing your survey is less prone to bias than keeping it on a Web site for anyone to join;
  • Have adequate screening criteria in your survey.  If you require respondents to sign in with a passcode, and/or ask questions at the beginning that terminate the survey for people whose responses indicate they are not part of the population, you can reduce the number of “professionals”;
  • Put “speed bumps” into your survey.  An example would be to have a dummy question inside that simply says: “Select the 3rd radio button from the top.”  Put two or three bumps in your survey.  A respondent who answers two or more of those bump questions incorrectly is likely to be a speeder and the survey can be instructed to terminate;
  • Ask validation questions.  That is, ask a question one way and then later in the survey ask it in another form, and see if the responses are consistent.  If they’re not, then the respondent may be a “professional” or a speeder.
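The speed-bump and validation checks in the last two bullets can be sketched as a small quality filter. This is only an illustration: the question ids, the answer coding, and the assumption that both versions of a validation question are recorded on the same scale are hypothetical, and the two-miss threshold follows the rule of thumb above.

```python
def quality_flags(answers, bump_key, validation_pairs, max_bump_misses=1):
    """Return a list of reasons to distrust a completed response (empty if none)."""
    flags = []

    # Speed bumps: count dummy questions answered incorrectly.
    misses = sum(1 for q, expected in bump_key.items() if answers.get(q) != expected)
    if misses > max_bump_misses:  # two or more misses suggests a speeder
        flags.append("speeder")

    # Validation questions: the same item asked twice should get the same answer.
    for first, second in validation_pairs:
        if answers.get(first) != answers.get(second):
            flags.append(f"inconsistent:{first}/{second}")

    return flags


# Hypothetical survey setup: three bumps and one validation pair.
bump_key = {"bump1": 3, "bump2": 1, "bump3": 4}
validation_pairs = [("q4_has_policy", "q17_has_policy")]

careful = {"bump1": 3, "bump2": 1, "bump3": 4,
           "q4_has_policy": "yes", "q17_has_policy": "yes"}
rushed = {"bump1": 1, "bump2": 1, "bump3": 1,
          "q4_has_policy": "yes", "q17_has_policy": "no"}

print(quality_flags(careful, bump_key, validation_pairs))  # []
print(quality_flags(rushed, bump_key, validation_pairs))
```

Flagging rather than deleting responses lets you review borderline cases before deciding whether to exclude them.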

The Internet may have made marketing research easier, but it has also made it more susceptible to bias.  The tools to conduct marketing research have become much easier and more user-friendly, but that doesn’t change the principles of statistics and marketing research.  Online surveys, no matter how easily, quickly, or cheaply they can be implemented, will waste time and money if those principles are violated.