
Considerations for Selecting a Representative Sample

July 27, 2010

When trying to understand and make inferences about a population, it is rarely feasible or cost-effective to survey everyone who comprises that population. Therefore, analysts survey a reasonably sized sample of the population, whose results they can generalize to the entire population. Since such sampling is subject to error, it is vitally important that an analyst select a sample that adequately represents the population at large. Ensuring that a sample represents the population as accurately as possible requires that the sample be drawn using well-established, specific principles. In today’s post, we will be discussing the considerations for selecting a representative sample.

What is the Unit of Analysis?

What is the population you are interested in measuring? Let’s assume you are a market research analyst for a life insurance company and you are trying to understand the degree of existing life insurance coverage among households in the greater Chicago area. Already, this is a challenging prospect. What constitutes “life insurance coverage,” “a household,” or “the greater Chicago area”? As the analyst, you must define these before you can move forward. Does “coverage” mean having any life insurance policy, regardless of amount? Or does it mean having life insurance that covers the oft-recommended eight to ten times the principal breadwinner’s salary? Does it mean having individual life insurance, group life insurance, or either one?

Does “household” mean a unit with at least one adult and the presence of children? Can a household consist of one person for your analysis?

Does the “greater Chicago area” mean every household within the Chicago metropolitan statistical area (MSA), as defined by the U.S. Census Bureau, or does it mean the city of Chicago and its suburban collar counties (i.e., Cook, DuPage, Lake, Will, McHenry, Kane, and Kendall)?

All of these are considerations you must decide on.

You talk through these issues with some of the relevant stakeholders – your company’s actuarial, marketing, and product development departments – and you learn some new information. You find out that your company wants to sell a highly specialized life insurance product, providing coverage of up to ten times income, to young (under 40), high-salaried (at least $200,000) male heads-of-household. You find that “male head-of-household” is construed to mean any man who has children under 18 present in his household and has either no spouse or a spouse earning less than $20,000 per year.

You also learn that this life insurance product is being pilot tested in the Chicago area, and that the insurance company’s captive agent force has offices only within the City and its seven collar counties, although agents may write policies for any qualifying person in Illinois. You can do one of two things here. Since all your company’s agents are in the City and collar counties, you might simply restrict your definition of “greater Chicago area” to this region. Or, you might select this area, and add to it nearby counties without agencies, where agents write a large number of policies. Whether you do the former or latter depends on the timeframe available to you. If you can easily and quickly obtain the information for determining the additional counties, you might select the latter definition. If not, you’ll likely go with the former. Let’s assume you choose only those in the City and its collar counties.

Another thing you find out through communicating with stakeholders is that the intent of this insurance product is to close gaps in, not replace, existing life insurance coverage. Hence, you now know your relevant population:

Men under the age of 40, living in the city of Chicago or its seven collar counties, with a salary income of at least $200,000 per year, heading a household with at least one child under 18 present, with either no spouse or a spouse earning less than $20,000 per year, and who have life insurance coverage that is less than ten times their annual salary income.

You can see that this is a very specific unit of analysis. For this type of insurance product, you do not want to survey the general population, as this product will be irrelevant for most. Hence, the definition above is your working population. It is from this group that you want to draw your sample.

How Do You Reach This Working Population?

Now that you have identified your working population, you must find a master list of people from which to draw your sample. Such a list is known as the sample frame. As you’ve probably guessed, no single list will contain your working population precisely. Hence, you will spend some time searching for a list, or some combination of lists, that covers your working population as completely as possible. The degree to which your sample frame fails to account for all of your working population is known as its bias, or sample frame error, and such error cannot be totally eradicated.

Sample frame error exists because some of these upscale households move out while others move in; some die; some have unlisted phone numbers or don’t give out their email addresses; some will lose their jobs, while others move into these high-paying jobs; and some will hit age 40, or their wives will get higher-paying jobs. These changes are constant, and there’s nothing you can do except be aware of them.

To obtain your sample frame, you might start by asking yourself several questions about your working population: What ZIP codes are they likely to live in? What types of hobbies do they engage in? What magazines and newspapers do they subscribe to? Where do they take vacations? What clubs and civic organizations do they join? Do they use financial planners or CPAs?

Armed with this information, you might purchase mailing lists of such men from magazine subscriptions; you might search phone listings in upscale Chicago-area communities like Winnetka, Kenilworth, and Lake Forest. You might network with travel agents, real estate brokers, financial advisors, and charitable organizations. You may also purchase membership lists from clubs. You will then combine these lists to come up with your sample frame. The degree to which you can do this depends on your time and budget constraints, as well as any regulatory and ethical practices (e.g., privacy, Do Not Call lists, etc.) governing collection of such lists.

Many market research firms have made identifying the sample frame much easier in recent years, thanks to survey panels. Panels are groups of respondents who have agreed in advance to participate in surveys. The existence of survey panels has greatly reduced the time and cost involved in compiling one’s own sample frame. The drawback, however, is that panel respondents self-select to join the panel, and they can be very different from members of the working population who are not on a panel.

Weeding Out the Irrelevant Population

Your sample frame will never include all those who fit your working population, nor will it exclude all those who do not fit your working population. As a result, you will need to eliminate extraneous members of your sample frame. Unfortunately, there’s no proactive way to do this. Typically, you must ask screening questions at the beginning of your survey to identify if a respondent qualifies to take the survey, and then terminate the survey if a respondent fails to meet the criteria.

Summary

Selecting a representative sample is an intricate process that requires serious thought and communication among stakeholders about the objectives of the survey, the definition of the relevant working population, the approach to finding and reaching members of that population, and the time, budget, and regulatory constraints involved. No sample will ever be completely representative of the population, but samples can and should be reasonably representative.


Forecast Friday Topic: Simple Regression Analysis (Continued)

June 3, 2010

(Seventh in a series)

Last week I introduced the concept of simple linear regression and how it could be used in forecasting. I introduced the fictional businesswoman, Sue Stone, who runs her own CPA firm. Using the last 12 months of her firm’s sales, I walked you through the regression modeling process: determining the independent and dependent variables, estimating the parameters, α and β, deriving the regression equation, calculating the residuals for each observation, and using those residuals to estimate the coefficient of determination – R² – which indicates how much of the change in the dependent variable is explained by changes in the independent variable. Then I deliberately skipped a couple of steps to get straight to using the regression equation for forecasting. Today, I am going to fill in that gap, and then cover a couple of other things so that we can move on to next week’s topic on multiple regression.

Revisiting Sue Stone

Last week, we helped Sue Stone develop a model using simple regression analysis, so that she could forecast sales. She had 12 months of sales data, which was her dependent variable, or Y, and each month (numbered from 1 to 12) was her independent variable, or X. Sue’s regression equation was as follows:

Ŷi = $9,636.36 + $479.02Xi

where i is the period number corresponding to the month. So, in June 2009, i would be equal to 6; in January 2010, i would be equal to 13. Of course, since X is the month number, X = i in this example. Recall that Sue’s equation states that each passing month is associated with an average sales increase of $479.02, suggesting her sales are on an upward trend. Also note that Sue’s R² = .917, which says 91.7% of the change in Sue’s monthly sales is explained by the passing months.
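
If you want to verify these figures yourself, here is a minimal Python sketch of Sue’s fitted equation. The function name is my own, and because the published coefficients are rounded to the cent, results may differ from the post’s figures by a penny or two:

    # Sue's fitted simple regression: sales = a + b * month
    a = 9636.36   # intercept estimate
    b = 479.02    # slope estimate (average monthly sales increase)

    def forecast_sales(month_number):
        """Point forecast of monthly sales from Sue's fitted equation."""
        return a + b * month_number

    print(forecast_sales(6))    # June 2009 (i = 6): ~$12,510.48
    print(forecast_sales(13))   # January 2010 (i = 13): ~$15,863.62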

Are these claims valid? We need to do some further work here.

Are the Parameter Estimates Statistically Significant?

Measuring an entire population is often impossible. Quite often, we must measure a sample of the population and generalize our findings to the population. When we take an average or standard deviation of a data set that is a subset of the population, our values are estimates of the population’s true average and standard deviation. These estimates are subject to sampling error. Likewise, when we perform regression analysis on a sample of the population, our coefficients (a and b) are also subject to sampling error. Whenever we estimate population parameters (the population’s true α and β), we are frequently concerned that they might actually have values of zero. Even though we have derived values a = $9,636.36 and b = $479.02, we want to perform a statistical significance test to make sure their distance from zero is meaningful and not due to sampling error.

Recall from the May 25 blog post, Using Statistics to Evaluate a Promotion, that in order to do significance testing, we must set up a hypothesis test. In this case, our null hypothesis is that the true population coefficient for month – β – is equal to zero. Our alternative hypothesis is that β is not equal to zero:

H0: β = 0

HA: β ≠ 0

Our first step here is to compute the standard error of the estimate – that is, how spread out each value of the dependent variable (sales) is from the average value of sales. Since we are sampling from a population, we are looking for the estimator of the standard error of the estimate. That equation is:

sε = √(ESS / (n – k – 1))

where ESS is the error sum of squares – or $2,937,062.94 – from Sue’s equation; n is the sample size, or 12; and k is the number of independent variables in the model – in this case, just 1. When we plug those numbers into the above equation, we divide the ESS by 10 and take the square root, so Sue’s estimator is:

sε = $541.95

Now that we know the estimator for the standard error of the estimate, we need to use it to find the estimator of the standard deviation of the regression slope (b). That equation is given by:

sb = sε / √(Σ(xi – x̄)²)

Remember from last week’s blog post that the sum of all the squared (x – x̄) values was 143. Since we have the estimator for the standard error of the estimate, we divide $541.95 by the square root of 143 to get sb = $45.32. Next we need to compute the t-statistic. If Sue’s t-statistic is greater in magnitude than her critical t-value, then she’ll know the parameter estimate of $479.02 is significant. In Sue’s regression, she has 12 observations, and thus 10 degrees of freedom: (n – k – 1) = (12 – 1 – 1) = 10. At a 95% confidence level, her critical t is 2.228. Since parameter estimates can be positive or negative, if her t-value is less than -2.228 or greater than 2.228, Sue can reject her null hypothesis and conclude that her parameter estimate is meaningfully different from zero.

To compute the t-statistic, all Sue needs to do is divide her b1 coefficient ($479.02) by her sb ($45.32). She ends up with a t-statistic of 10.57, which is significant.

Next, Sue must do the same for her intercept value, a. To do this, Sue must compute the estimator of the standard deviation of the intercept. The equation for this estimator is:

sa = sε × √(1/n + x̄²/Σ(xi – x̄)²)

All she needs to do is plug in her numbers from earlier: her sε = $541.95; n = 12; her average x̄ of 6.5, squared, is 42.25; and the denominator is the same 143. Working that all in, Sue gets a standard error of 333.545. She divides her intercept value of $9,636.36 by 333.545 and gets a t-statistic of 28.891, which exceeds the 2.228 critical t, so her intercept is also significant.
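
If you’d rather let a computer grind through that arithmetic, here is a short Python sketch of the whole significance-testing sequence. Every input (ESS, n, k, x̄, and the sum of squared deviations of X) comes straight from Sue’s example; the variable names are my own:

    import math

    # Inputs from Sue's regression
    ESS = 2937062.94     # error sum of squares
    n, k = 12, 1         # observations; independent variables
    a, b = 9636.36, 479.02
    x_bar = 6.5          # mean of the month numbers 1 through 12
    ssx = 143.0          # sum of (x - x_bar)**2

    # Estimator of the standard error of the estimate
    s_e = math.sqrt(ESS / (n - k - 1))           # ~541.95

    # Estimated standard deviations of the slope and intercept
    s_b = s_e / math.sqrt(ssx)                   # ~45.32
    s_a = s_e * math.sqrt(1/n + x_bar**2 / ssx)  # ~333.55

    # t-statistics; compare to critical t = 2.228 (10 df, 95% confidence)
    print(b / s_b)   # ~10.57 -> significant
    print(a / s_a)   # ~28.89 -> significant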

Prediction Intervals in Forecasting

Whew! Aren’t you glad those t-statistics calculations are over? If you run regressions in Excel, these values will be calculated for you automatically, but it’s very important that you understand how they were derived and the theory behind them. Now, we move back to forecasting. In last week’s post, we predicted just a single point with the regression equation. For January 2010, we substituted the number 13 for X, and got a point forecast for sales in that month: $15,863.64. But Sue needs a range, because she knows forecasts are not precise. Sue wants to develop a prediction interval. A prediction interval is simply the point forecast plus or minus the critical t value (2.228) for a desired level of confidence (95%, in this example) times the estimator of the standard error of the estimate ($541.95). So, Sue’s prediction interval is:

$15,863.64 ± 2.228($541.95)

= $15,863.64 ± $1,207.46

$14,656.18 to $17,071.10

So, since Sue had chosen a 95% level of confidence, she can be 95% confident that January 2010 sales will fall somewhere between $14,656.18 and $17,071.10.
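
Expressed in code, the interval is just the point forecast plus or minus that margin – a quick sketch using the numbers above:

    t_critical = 2.228        # critical t at 10 degrees of freedom, 95% confidence
    s_e = 541.95              # estimator of the standard error of the estimate
    point_forecast = 15863.64 # January 2010 point forecast

    margin = t_critical * s_e  # ~$1,207.46
    print(f"${point_forecast - margin:,.2f} to ${point_forecast + margin:,.2f}")
    # ~$14,656.18 to $17,071.10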

Recap and Plan for Next Week’s Post

Today, you learned how to test the parameter estimates for significance to determine the validity of your regression model. You also learned how to compute the estimator of the standard error of the estimate, as well as the estimators of the standard deviations of the slope and intercept. You then learned how to derive the t-statistics you need to determine whether those parameter estimates were indeed significant. And finally, you learned how to derive a prediction interval. Next week, we begin our discussion of multiple regression. We will begin by talking about the assumptions behind a regression model; then we will talk about adding a second independent variable into the model. From there, we will test the model for validity, assess the model against those assumptions, and generate projections.

Using Statistics to Evaluate a Promotion

May 25, 2010

Marketing – as much as cash flow – is the lifeblood of any business. No matter how good your product or service may be, it’s worthless if you can’t get it in front of your customers and get them to buy it. So all businesses, large and small, must engage in marketing. And we see countless types of marketing promotions or tactics being tried: radio and TV commercials, magazine and newspaper advertisements, public relations, coupons, email blasts, and so forth. But are our promotions working? The merchant John Wanamaker, often dubbed the father of modern advertising, is said to have remarked, “Half the money I spend on advertising is wasted; the trouble is I don’t know which half.”

Some basic statistics can help you evaluate the effectiveness of your marketing and take away much of the mystique Wanamaker complained about. When deciding whether to do a promotion, managers and business owners have no way of knowing whether it will succeed, and in today’s economy, budgets are tight. The cost to roll out a full promotion can wipe out an entire marketing budget if it proves to be a fiasco. This is why many businesses run a test before doing a complete rollout. The testing helps to reduce the amount of uncertainty involved in an all-out campaign.

Quite often, large companies need to choose between two or more competing campaigns for rollout. But how do they know which will be effective? Consider the example of Jenny Kaplan, owner of K-Jen, a New Orleans-style restaurant. K-Jen serves up a tasty jambalaya entrée, which is priced at $10.00. Jenny believes that the jambalaya is a draw to the restaurant and that by offering a discount, she can increase the average amount of the table check. Jenny decides to issue coupons via email to patrons who have opted in to receive such promotions. She wants to knock a dollar off the price of the jambalaya as the offer, but doesn’t know whether customers would respond better to an offer worded as “$1.00 off” or as “10% off.” So, Jenny decides to test the two concepts.

Jenny goes to her database of nearly 1,000 patrons and randomly selects 200 patrons. She decides to send half of those a coupon for $1.00 off for jambalaya, and the other half a coupon for 10% off. When the coupon offer expires 10 days later, Jenny finds that 10 coupons were redeemed for each offer – a redemption rate of 10% each. Jenny observes that either wording will get the same number of people to respond. But she wonders which offer generated the largest table check. So she looks at the guest checks to which the coupons were stapled. She notices the following:

Guest Check Amounts

$1.00 Off      10% Off
$38.85         $50.16
$36.97         $54.44
$35.94         $32.20
$54.17         $32.69
$68.18         $51.09
$49.47         $46.18
$51.39         $57.72
$32.72         $44.30
$22.59         $59.29
$24.13         $22.94

Jenny quickly computes the average for each offer. The “$1.00 off” coupon generated an average table check of $41.44; the “10% off” coupon generated an average of $45.10. At first glance, it appears that the 10% off promotion generated a higher guest check. But is that difference meaningful, or is it due to chance? Jenny needs to do further analysis.

Hypothesis Testing

How does Jenny determine if the 10% off coupon really did better than the $1.00 off coupon? She can use statistical hypothesis testing, which is a structured analytical method for comparing the difference between two groups – in this case, two promotions. Jenny starts her analysis by formulating two hypotheses: a null hypothesis, which states that there is no difference in the average check amount between the two offers; and an alternative hypothesis, which states that there is, in fact, a difference in the average check amount between the two offers. The null hypothesis is often denoted as H0, and the alternative hypothesis is denoted as HA. Jenny also refers to the $1.00 off offer as Offer #1, and the 10% off offer as Offer #2, with means denoted as μ1 and μ2, respectively. Jenny writes down her two hypotheses:

H0: The average guest check amount for the two offers is equal.

HA: The average guest check amount for the two offers is not equal.

Or, more succinctly:

H0: μ1 = μ2

HA: μ1 ≠ μ2

 

Now, Jenny is ready to go to work. Note that the symbol μ denotes the population mean she wants to measure. Because Jenny ran her test on a portion – a sample – of her database, the averages she computed were sample averages, denoted as x̄. As we stated earlier, the average table checks for the “$1.00 off” and “10% off” offers were x̄1 = $41.44 and x̄2 = $45.10, respectively. Jenny needs to approximate μ using x̄. She must also compute the sample standard deviation, or s, for each offer.

Computing the Sample Standard Deviation

To compute the sample standard deviation, Jenny must subtract the mean of a particular offer from each of its check amounts in the sample; square each difference; sum them up; divide by the total observations minus 1 (that is, 9); and then take the square root:

$1.00 Off

Actual Table Check   Average Table Check   Difference   Difference Squared
$38.85               $41.44                -$2.59       $6.71
$36.97               $41.44                -$4.47       $19.99
$35.94               $41.44                -$5.50       $30.26
$54.17               $41.44                $12.73       $162.03
$68.18               $41.44                $26.74       $714.97
$49.47               $41.44                $8.03        $64.46
$51.39               $41.44                $9.95        $98.98
$32.72               $41.44                -$8.72       $76.06
$22.59               $41.44                -$18.85      $355.36
$24.13               $41.44                -$17.31      $299.67

Total = $1,828.50
S²1 = $203.17
S1 = $14.25

 

10% Off

Actual Table Check   Average Table Check   Difference   Difference Squared
$50.16               $45.10                $5.06        $25.59
$54.44               $45.10                $9.34        $87.22
$32.20               $45.10                -$12.90      $166.44
$32.69               $45.10                -$12.41      $154.03
$51.09               $45.10                $5.99        $35.87
$46.18               $45.10                $1.08        $1.16
$57.72               $45.10                $12.62       $159.24
$44.30               $45.10                -$0.80       $0.64
$59.29               $45.10                $14.19       $201.33
$22.94               $45.10                -$22.16      $491.11

Total = $1,322.63
S²2 = $146.96
S2 = $12.12

 

Notice the notation S². That quantity is known as the variance. The variance and the standard deviation are used to measure the average distance between each data point and the mean. When data are normally distributed, about 95% of all observations fall within two standard deviations of the mean (more precisely, 1.96 standard deviations). Hence, approximately 95% of the guest checks for the $1.00 off offer should fall between $41.44 ± 1.96 × $14.25, or between $13.51 and $69.37. All ten fall within this range. For the 10% off offer, about 95% should fall between $45.10 ± 1.96 × $12.12, or between $21.34 and $68.86. All ten of those observations also fall within this range.
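
If you’d rather not build these worksheets by hand, the same computations take just a few lines of Python. The check amounts are the ones from Jenny’s tables; mean and stdev come from the standard library, and stdev already divides by n – 1:

    from statistics import mean, stdev

    dollar_off  = [38.85, 36.97, 35.94, 54.17, 68.18,
                   49.47, 51.39, 32.72, 22.59, 24.13]
    percent_off = [50.16, 54.44, 32.20, 32.69, 51.09,
                   46.18, 57.72, 44.30, 59.29, 22.94]

    # Sample means: ~$41.44 and ~$45.10
    print(mean(dollar_off), mean(percent_off))

    # Sample standard deviations (n - 1 in the denominator): ~$14.25 and ~$12.12
    print(stdev(dollar_off), stdev(percent_off))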

Degrees of Freedom and Pooled Standard Deviation

Jenny noticed two things immediately: first, that the 10% off coupon has the higher sample average; and second, that each individual table check is closer to its mean than in the $1.00 off group. Also notice that when computing the sample standard deviation for each offer, Jenny divided by 9, not 10. Why? Because she was estimating the population standard deviation, and since samples are subject to error, we must account for that. Each observation gives us information about the population’s actual values, but because Jenny’s estimate was based on a sample, she gives up one observation to account for the sampling error – that is, she loses a degree of freedom. In this example, Jenny has 20 total observations; since she estimated the population standard deviation for both offers, she lost two degrees of freedom, leaving her with 18 (10 + 10 – 2).

Knowing the remaining degrees of freedom, Jenny must pool the standard deviations, weighting them by their degrees of freedom. (This weighting matters most when the sample sizes of the two offers are not equal.) The pooled variance is given by:

S²p = ((n1 – 1)S²1 + (n2 – 1)S²2) / (n1 + n2 – 2)

FYI – n is simply the sample size. Jenny then computes the pooled variance:

S²p = ((9 * $203.17) + (9 * $146.96)) / (10 + 10 – 2)

= ($1,828.53 + $1,322.64)/18

= $3,151.17/18

= $175.07

Now take the square root: $13.23

Hence, the pooled standard deviation is $13.23.
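
Here is that pooling step in code – a quick sketch using the sample variances from the tables above:

    import math

    n1, n2 = 10, 10
    var1, var2 = 203.17, 146.96   # sample variances from the tables above

    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    pooled_sd = math.sqrt(pooled_var)
    print(pooled_var, pooled_sd)  # ~175.07 and ~13.23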

Computing the t-Test Statistic

Now the fun begins. Jenny knows the sample means of the two offers; she knows the hypothesized difference between the two population means (zero, under the null hypothesis); she knows the pooled standard deviation; she knows the sample sizes; and she knows the degrees of freedom. Jenny must now calculate the t-test statistic. The t-test statistic, or t-value, represents the number of estimated standard errors the observed difference in sample means is from the hypothesized difference. The t-value is computed as follows:

t = ((x̄1 – x̄2) – (μ1 – μ2)) / (Sp × √(1/n1 + 1/n2))

So Jenny sets to work computing her t-Test Statistic:

t = (($41.44 – $45.10) – 0) / ($13.23 * SQRT(1/10 + 1/10))

= -$3.66 / ($13.23 * SQRT(1/5))

= -$3.66 / ($13.23 * 0.45)

= -$3.66 / $5.92

= -0.62

This t-statistic gives Jenny a basis for testing her hypothesis. Jenny’s t-statistic indicates that the difference in sample table checks between the two offers is 0.62 standard errors below the hypothesized difference of zero. We now need to determine the critical t – the value we get from a t-distribution table, available in most statistics textbooks and online. Since we are testing at a 95% confidence level, and since we must account for a small sample, our critical t-value is adjusted slightly upward from the 1.96 standard deviations used for large samples. For 18 degrees of freedom, our critical t is 2.10. The larger the sample size, the closer to 1.96 the critical t would be.

So, does Jenny Accept or Reject her Null Hypothesis (Translation: Is the “10% Off” Offer Better than the “$1.00 Off” Offer)?

Jenny now has all the information she needs to determine whether one offer worked better than the other. What does the critical t of 2.10 mean? If Jenny’s t-statistic is greater than 2.10 – or, since one offer can be lower than the other, less than -2.10 – then she would reject her null hypothesis, as there would be sufficient evidence to suggest that the two means are not equal. Is that the case?

Jenny’s t-statistic is -0.62, which is between -2.10 and 2.10, so it falls inside the acceptance region. Jenny should not reject H0, since there is not enough evidence to suggest that one offer was better than the other at generating higher table checks. In fact, there’s nothing to say that the difference between the two offers is due to anything other than chance.
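
Jenny’s entire test can be reproduced, and cross-checked, in a few lines. The sketch below assumes SciPy is installed; scipy.stats.ttest_ind with equal_var=True performs exactly this pooled two-sample t-test:

    from scipy import stats

    dollar_off  = [38.85, 36.97, 35.94, 54.17, 68.18,
                   49.47, 51.39, 32.72, 22.59, 24.13]
    percent_off = [50.16, 54.44, 32.20, 32.69, 51.09,
                   46.18, 57.72, 44.30, 59.29, 22.94]

    t_stat, p_value = stats.ttest_ind(dollar_off, percent_off, equal_var=True)
    print(round(t_stat, 2))   # ~ -0.62, well inside (-2.10, 2.10)
    print(round(p_value, 2))  # well above 0.05, so do not reject H0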

What Does Jenny Do Now?

Basically, Jenny can conclude that there’s not enough evidence that the “$1.00 off” coupon was any better or worse than the “10% off” coupon at generating higher table check amounts. This does not mean that her hypotheses were true or false, just that there was not enough statistical evidence to say so. In this case, we did not accept the null hypothesis, but rather failed to reject it. Jenny can do a few things:

  1. She can run another test, and see if the same phenomenon holds.
  2. Jenny can accept the fact that both offers work equally well, and compare their overall average table checks to those of patrons who ordered jambalaya without a coupon during the time the offer ran; if the coupons generated average table checks that were higher (using the hypothesis testing procedures outlined above) than those of patrons who paid full price, then she may choose to roll out a complete promotion using either or both of the offers described above.
  3. Jenny may decide that neither coupon offer raised average check amounts and choose not to do a full rollout after all.

So Why am I Telling You This?

The purpose of this blog post was to take you step-by-step through how you can use a simple concept like the t-test to judge the performance of two promotion concepts. Although a spreadsheet like Excel can run this test in seconds, I wanted to walk you through the theory in layman’s terms, so that you can grasp it and then apply it to your business. Analysights is in the business of helping companies – large and small – succeed at marketing, and this blog post is one ingredient in the recipe for your marketing success. If you would like some assistance in setting up a promotion test or in evaluating the effectiveness of a campaign, feel free to contact us at www.analysights.com.

 

Beware of “Professional” Survey Respondents!

April 3, 2009

Thanks to the Internet, conducting surveys has never been easier.  Being able to use the Web to conduct marketing research has greatly reduced the cost and time involved and has democratized the process for many companies.

While online surveys have increased simplicity and cost-savings, they have also given rise to a dangerous breed of respondents – “Professional” survey-takers.   

A “professional” respondent is one who actively seeks out online surveys offering paid incentives – cash, rewards, or some other benefit – for completing the survey.  In fact, many blogs and online articles tell of different sites people can go to find paid online surveys.

If your company conducts online surveys, “professionals” can render your findings useless.  In order for your survey to provide accurate and useful results, the people surveyed must be representative of the population you are measuring and selected randomly (that is, everyone from the population has an equal chance of selection).

“Professionals” subvert the sampling principles of representativeness and randomness simply because they self-select to take the survey.  The survey tool does not know that they are not part of the population to be measured, nor their probability of selection.  What’s more, online surveys exclude members of the population who lack Internet access.  This results in a survey bias double-whammy.

In addition, “professionals” may simply go through a survey for the sake of the incentive.  Hence they may speed through it, paying little or no attention to the questions, or they may give untruthful answers.  Now your survey results are both biased and wrong.

Minimizing the Impact of “Professionals”

There are some steps you can take to protect your survey from “professionals,” including:

  • Maintain complete control of your survey distribution.  If possible, use a professional online survey panel company, such as e-Rewards, Greenfield Online, or Harris Interactive.  There are lots of others, and all maintain tight screening processes for their survey participants and tight controls for distribution of your survey;
  • If an online survey panel is out of your budget, perhaps you can build your own controlled e-mail list (following CAN-SPAM laws, of course).  E-mailing your survey is less prone to bias than keeping it on a Web site for anyone to join.
  • Have adequate screening criteria in your survey.  If you can get respondents to sign in using a passcode and/or ask questions at the beginning, which terminate the survey for people whose responses indicate they are not representative of the population, you can reduce the number of “professionals”;
  • Put “speed bumps” into your survey.  An example would be a dummy question that simply says: “Select the 3rd radio button from the top.”  Put two or three bumps in your survey.  A respondent who answers two or more of those bump questions incorrectly is likely a speeder, and the survey can be instructed to terminate (a sketch of this flagging logic appears after this list);
  • Ask validation questions.  That is, ask a question one way and then later in the survey ask it in another form, and see if the responses are consistent.  If they’re not, then the respondent may be a “professional” or a speeder.
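
To make the “speed bump” flagging concrete, here is a hypothetical Python sketch. The question IDs, the correct answers, and the two-failure threshold are my own illustrative assumptions, not features of any survey platform:

    # Each response maps question IDs to the answer the respondent gave.
    # "bump_1" and "bump_2" are trap questions with one known correct answer.
    BUMP_ANSWERS = {"bump_1": "third_option", "bump_2": "second_option"}
    MAX_FAILURES = 1   # two or more failed bumps -> likely speeder

    def is_likely_speeder(response):
        """Flag a respondent who misses too many trap questions."""
        failures = sum(
            1 for question, correct in BUMP_ANSWERS.items()
            if response.get(question) != correct
        )
        return failures > MAX_FAILURES

    respondent = {"bump_1": "third_option", "bump_2": "first_option"}
    print(is_likely_speeder(respondent))   # False: only one miss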

The Internet may have made marketing research easier, but it has also made it more susceptible to bias.  The tools to conduct marketing research have become much easier to use and more user-friendly, but that doesn’t change the principles of statistics and marketing research.  Online surveys, no matter how easily, quickly, or cheaply they can be implemented, will waste time and money if those principles are violated.