Archive for June, 2010

Does the Order of Survey Questions Matter? You Bet!

June 29, 2010

Thanks to online survey tools, the cost of executing a survey has never been lower. Online surveys allow companies to ask respondents more questions, ask questions on multiple (though related) topics, and get their results faster and less expensively than was once possible with telephone surveys. But the ability to ask more questions on more topics means that the sequence of survey questions must be carefully considered. While the order of questions in a survey has always mattered, it is even more crucial now.

Order Bias

When question order is not considered, several problems occur, most notably order bias. Imagine that a restaurant owner was conducting a customer satisfaction survey. With no prior survey background, he creates a short survey, with questions ordered like this:

  1. Please rate the temperature of your entrée.
  2. Please rate the taste of your food.
  3. Please rate the menu selection here.
  4. Please rate the courtesy of your server.
  5. Please rate the service you received.
  6. Please rate your overall experience at this restaurant.

What’s wrong with this line of questioning? Assuming the questions all have the same answer choices, ranging from “poor” to “excellent,” plenty! First, when several questions appear in sequence with the same rating scales, there’s a great chance a respondent will speed through the survey, providing truthful answers near the beginning and less truthful answers further down. By placing the overall satisfaction question at the end, the restaurateur is biasing the response to it. If the respondent had a positive experience with the temperature of his/her food, that might cause a halo effect, making him/her think the taste was also good, as well as the menu selection, and so on. Halo effects can also be negative. That first question ends up setting the context in which the respondent views his/her satisfaction.

On the other hand, if the restaurateur shifted the order of the questionnaire as shown below, he would get more reliable answers:

  1. Please rate your overall experience at this restaurant.
  2. Please rate the menu selection here.
  3. Please rate the temperature of your entrée.
  4. Please rate the taste of your food.
  5. Please rate the service you received.
  6. Please rate the courtesy of your server.

Notice the difference? The restaurateur starts with the overall satisfaction question, followed by satisfaction with the menu selection. Within the menu selection, he asks specifically about the temperature and taste of the food. Then he asks about the service, and specifically about the courtesy of the server. Once the respondent offers an overall rating, he/she is then asked about each component (the menu selection and the service) of overall satisfaction, so that the researcher can determine whether a low overall satisfaction rating is brought on by a low satisfaction rating with the menu, the service, or both. This general-to-specific sequence leads the respondent to speak truthfully about how each component contributed to his/her satisfaction.

Respondent Confusion/No Coherent Organization

Imagine you have developed a new product and want to gauge purchase intent for it. There’s a ton of stuff you want to know: the best price to charge, the best way to promote the product, where respondents will go to buy it, etc. Many survey neophytes commingle the pricing, promotion, and distribution questions. This is a mistake! The respondent will become confused and fatigued if there’s no clear organization to your survey. If you are asking questions about those three components, your questionnaire should have three sections. At the start of each section, indicate “this section asks you some questions about what you feel the ideal price for this product would be…” or “this section asks you about what features you would like and dislike in this product.” In this fashion, the respondent knows what the line of questioning is and doesn’t feel confused.

Tips for Ordering Survey Questions Effectively

These are just two examples. Essentially, if you want to order your questionnaire for maximum reliability, response, and clarity, remember to:

  1. Start with broad, general questions and move to narrow specific ones. If respondents haven’t formed a general opinion or point of view of your topic, you can start your questionnaire going from specific to general.
  2. As I mentioned in last week’s posts, sensitive questions should be asked late in the survey, after your previous questions have established rapport with the respondent.
  3. Unless the topic is highly sensitive, never start a questionnaire with an open-ended question.
  4. Save demographic and classification questions for the end of the questionnaire, unless you need to ask them in order to screen respondents for taking the survey.
  5. Use chronological sequences in questions when obtaining historical information from a respondent.
  6. Make sure all questions on a topic are complete before moving on to another topic and use transitory statements between the topics, like I described in the prior paragraph.

Much like designing survey questions, the order of the questioning is as much an art as it is a science. Taking time to organize your questions will reward you with results that are reliable and actionable.

A Typical-Length Survey or A Few Shorter Ones?

June 28, 2010

Most online surveys today take between 10 and 15 minutes, with a few going as long as 25 to 30 minutes. As marketing researchers, we have long pontificated that surveys should be a reasonable length, as longer ones tend to cause respondents to disengage in many ways: speeding through, skipping questions, even abandoning the survey. Most marketers realize this, and the 10-15 minute survey seems to be the norm. But I wonder how many marketing researchers – on both the client and supplier side – have ever considered the length of a survey from a strategic, rather than a tactical, point of view.

Sure, a typical-length survey is not overly long, and is often cost effective for a client. After all, the client can survey several people about several topics in a relatively short time, for a set price, and can get results quickly. But sometimes I believe that instead of one 15-minute survey, some clients might benefit more from conducting two 7- or 8-minute surveys, or three 5-minute surveys, stretched out over time. Marketing researchers on both sides will likely disagree with me here. After all, multiple shorter surveys can cost more to administer. However, I believe that – in the long run – clients will derive value from the more frequent, shorter surveys that would offset their cost. Multiple, shorter surveys will benefit clients in the following ways:


A More Focused Survey

As marketing research suppliers, it is our job to make sure we understand the client’s key business problem. Many times, clients have several problems that must be addressed. We need to help clients look at all of their business problems and prioritize them by the benefit their resolution would bring. If we can get the client’s survey focused on the one or two problems whose resolution would make the most positive difference, we can keep the survey short, with more targeted questions. As a result, the client doesn’t get bombarded with tons of data tables or reports full of recommendations, ending up immobilized and wondering which ones to implement first. On the contrary, the client will receive a few very direct insights about how to respond to these key problems.

Reduced Incentive Costs

Since surveys are shorter, respondents may be willing to do them for little or no incentive. This can save the client money.

Higher Response Rates

Surveys that are 10-15 minutes long generally get decent response rates. However, a survey that’s 3, 5, or 7 minutes long will likely get excellent response rates. Why? Because it’s more convenient, straight to the point, and can be knocked off quickly. As a result, respondents are less likely to put it off, terminate the survey, speed through it, or skip questions.

Increased Trust by Respondents

Because you didn’t waste their time with the first survey, respondents may be more inclined to participate in your subsequent surveys. If they took your 5-minute survey today, then you send them another 5-minute survey four to six weeks from now, they are likely to trust that this survey won’t take long either, and will likely respond to it. Of course, the key here is to space the surveys out. You don’t want to send all three at once!

More Reliable Data

As mentioned above, respondents are less likely to speed through, terminate, or skip questions on a short survey than on a longer one. As a result, there will be less non-response error and more truthful responses in the data, and hence more trustworthy findings.

Ability to Act on Results Faster

Because the survey is short and to-the-point, and response rates are higher, the client can achieve the desired number of completed surveys sooner than if the survey were longer, so the survey doesn’t have to be in the field as long. And because the survey is short, the time the marketing research firm needs to tabulate and analyze the data is much shorter. Hence the client can start acting on the insights and implementing the recommendations much sooner.


Budget Left Over for Follow-Up Research

What would happen if a client conducted a typical-length survey and found a theme emerging in open-ended questions, or a trend in responses among a certain demographic group? The client may want to study that further. But custom research is expensive. If the client did a typical-length survey, the budget may not be there to do another survey to investigate that newly discovered theme or trend. With a shorter survey, the cost may be somewhat lower, so funds might be left in the budget for another survey. In addition, if the client is scheduling subsequent shorter surveys, the learnings from the first survey can be used to shape questions for further investigation in those upcoming surveys.

The Shorter Survey May Be Enough

Often, problems are interconnected, or generated by other problems. If research suppliers help clients isolate their one or two biggest problems, and focus on those, the client can act on the insights and eliminate those problems. Resolving them may also provide solutions to, or help extinguish, the lesser-priority problems. As a result, future surveys may not be needed. In that case, the research supplier has done its job – solving the client’s problem in the shortest, most economical, and most effective manner possible.

Granted, many clients probably can’t do things this way. There are economies of scale in doing one longer survey as opposed to two or three shorter ones. Moreover, the client probably has several stakeholders, each of whom has a different opinion of which problem is most important. And each problem may have a different urgency to those stakeholders. This is why it is so important for the research supplier to get the client’s stakeholders and top management on board. As research suppliers, it is our job to inform and educate the client and its stakeholders on the research approach that best serves the interest of the client as a whole; and if that is not possible, to work with those stakeholders to identify second-best solutions. But once the key issues – problems, budget, politics, and urgency – are on the table, research suppliers can work with the client to develop the shortest, most focused, most cost-effective survey possible.

Forecast Friday Topic: Multiple Regression Analysis (continued)

June 24, 2010

(Tenth in a series)

Today we resume our discussion of multiple regression analysis. Last week, we built a model to determine the extent of any relationship between U.S. savings & loan associations’ percent profit margin and two independent variables, net revenues per deposit dollar and number of S&L offices. Today, we will compute the 95% confidence interval for each parameter estimate; determine whether the model is valid; check for autocorrelation; and use the model to forecast. Recall that our resulting model was:

Yt = 1.56450 + 0.23720X1t – 0.000249X2t

Where Yt is the percent profit margin for the S&L in Year t; X1t is the net revenues per deposit dollar in Year t; and X2t is the number of S&L offices in the U.S. in Year t. Recall that the R2 is .865, indicating that 86.5% of the variation in percent profit margin is explained by changes in net revenues per deposit dollar and the number of S&L offices.

Determining the 95% Confidence Interval for the Partial Slope Coefficients

In multiple regression analysis, since there are multiple independent variables, each parameter estimate contributes only part of the overall slope; hence the coefficients β1 and β2 are referred to as partial slope coefficients. As with simple linear regression, we need to determine the 95% confidence interval for each parameter estimate, so that we can get an idea of where the true population parameter lies. Recall from our June 3 post that we did this by determining the equation for the standard error of the estimate, sε, and then the standard error of the regression slope, sb. That worked well for simple regression, but multiple regression is more complicated: deriving the standard errors of the partial regression coefficients requires linear algebra, and would be too complicated to discuss here. Fortunately, most statistical programs, and Excel, compute these values for us. So, we will simply state the values of sb1 and sb2 and go from there.

For our model, sb1 = 0.05556 and sb2 = 0.00003.
Also, we need our critical-t value for 22 degrees of freedom, which is 2.074.

Hence, our 95% confidence interval for β1 is denoted as:

0.23720 ± 2.074 × 0.05556

=0.12197 to 0.35243

Hence, we are saying that we can be 95% confident that the true parameter β1 lies somewhere between the values of 0.12197 and 0.35243.

For β2, the procedure is similar:

-0.000249 ± 2.074 × 0.00003

=-0.00032 to -0.00018

Hence, we can be 95% confident that the true parameter β2 lies somewhere between the values of -0.00032 and -0.00018. Also, the confidence interval for the intercept, α, ranges from 1.40 to 1.73.

Note that in all of these cases, the confidence interval does not contain a value of zero within its range. The confidence intervals for α and β1 are positive; that for β2 is negative. If any parameter’s confidence interval ranges crossed zero, then the parameter estimate would not be significant.
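The interval arithmetic above is easy to sketch in code. Here is a minimal Python illustration using the coefficient, standard error, and critical-t values from the text (note that with the rounded sb2 shown here, the second interval comes out a hair narrower than the figures quoted above):

```python
# 95% confidence interval for a regression coefficient: b ± t_crit * s_b
def conf_interval(b, s_b, t_crit):
    margin = t_crit * s_b
    return (b - margin, b + margin)

t_crit = 2.074  # critical t for 22 degrees of freedom, 95% confidence

ci_b1 = conf_interval(0.23720, 0.05556, t_crit)    # net revenues per deposit dollar
ci_b2 = conf_interval(-0.000249, 0.00003, t_crit)  # number of S&L offices

# Neither interval crosses zero, so both parameter estimates are significant
print(ci_b1[0] > 0, ci_b2[1] < 0)
```

If either interval did cross zero, the corresponding coefficient would not be statistically significant at the 5% level.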

Is Our Model Valid?

The next thing we want to do is determine whether our model is valid. In validating the model, we are testing whether our independent variables explain the variation in the dependent variable. So we start with a hypothesis test:

H0: β1 = β2 = 0

HA: at least one β ≠ 0

Our null hypothesis says that our independent variables, net revenue per deposit dollar and number of S&L offices, explain none of the variation in an S&L’s percentage profit margin, and hence, that our model is not valid. Our alternative hypothesis says that at least one of our independent variables explains some of the variation in an S&L’s percentage profit margin, and thus that the model is valid.

So how do we do it? Enter the F-test. Like the t-test, the F-test is a means for hypothesis testing. Let’s start by calculating the F-statistic for our model, using the following equation:

F = (RSS/k) ÷ (ESS/(n – k – 1))

Remember that RSS is the regression sum of squares and ESS is the error sum of squares. The May 27th Forecast Friday post showed you how to calculate RSS and ESS. For this model, our RSS = 0.4015 and our ESS = 0.0625; k is the number of independent variables (2), and n is the sample size (25). Our equation reduces to:

F = (0.4015/2) ÷ (0.0625/22)

= 70.66

If our Fcalc is greater than the critical F value for the distribution, then we can reject our null hypothesis and conclude that there is strong evidence that at least one of our independent variables explains some of the variation in an S&L’s percentage profit margin. How do we determine our critical F? There is yet another table in any statistics book or statistics Web site called the “F Distribution” table. In it, you look up two sets of degrees of freedom – one for the numerator and one for the denominator of your Fcalc equation. In the numerator, we have two degrees of freedom; in the denominator, 22. So we look at the F Distribution table, where the columns represent numerator degrees of freedom and the rows represent denominator degrees of freedom. When we find column (2), row (22), we end up with a critical F-value of 5.72 (at the 1% significance level).

Our Fcalc is greater than that, so we can conclude that our model is valid.
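The F-test above takes only a few lines of code. A sketch using the RSS, ESS, k, and n values from the text (the critical value 5.72 is read from an F table rather than computed):

```python
# Overall model validity: F = (RSS/k) / (ESS/(n - k - 1))
def f_statistic(rss, ess, k, n):
    return (rss / k) / (ess / (n - k - 1))

f_calc = f_statistic(rss=0.4015, ess=0.0625, k=2, n=25)
f_crit = 5.72  # from an F Distribution table, df = (2, 22)

if f_calc > f_crit:
    print(f"F = {f_calc:.2f} exceeds {f_crit}: reject H0; the model is valid")
```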

Is Our Model Free of Autocorrelation?

Recall from our assumptions that none of our error terms should be correlated with one another. If they are, autocorrelation results, rendering our parameter estimates inefficient. To check for autocorrelation, we need to look at our error terms, comparing our predicted percentage profit margin, Ŷ, with the actual, Y:


[Table: Percentage Profit Margin for each year – Actual (Yt) vs. Predicted by Model (Ŷt), with the resulting error terms]
The next thing we need to do is subtract the previous period’s error from the current period’s error. After that, we square our result. Note that we will only have 24 observations (we can’t subtract anything from the first observation):



[Table: Difference in Errors (et – et-1) and Squared Difference in Errors for each year]
If we sum up the last column, we get 0.1218. If we then divide that by our ESS of 0.0625, we get a value of 1.95. What does this mean?

We have just computed what is known as the Durbin-Watson Statistic, which is used to detect the presence of autocorrelation. The Durbin-Watson statistic, d, can be anywhere from zero to 4. Generally, when d is close to zero, it suggests the presence of positive autocorrelation; a value close to 2 indicates no autocorrelation; while a value close to 4 indicates negative autocorrelation. In any case, you want your Durbin-Watson statistic to be as close to two as possible, and ours is.

Hence, our model seems to be free of autocorrelation.
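The Durbin-Watson computation itself is straightforward: sum the squared differences between successive errors, then divide by the sum of squared errors. A sketch (the general function would take the error terms from the table above; here we just reuse the two sums computed in the text):

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2)
# d near 2 suggests no autocorrelation; near 0, positive; near 4, negative
def durbin_watson(residuals):
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Using the sums from the text: 0.1218 (squared differences) and 0.0625 (ESS)
d = 0.1218 / 0.0625
print(round(d, 2))  # 1.95
```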

Now, Let’s Go Forecast!

Now that we have validated our model and seen that it is free of autocorrelation, we can be comfortable forecasting. Let’s say that for years 26 and 27, we have the following forecasts for net revenues per deposit dollar, X1t, and number of S&L offices, X2t:

X1,26 = 4.70 and X2,26 = 9,350

X1,27 = 4.80 and X2,27 = 9,400

Plugging each of these into our equation, we generate the following forecasts:

Ŷ26 = 1.56450 + 0.23720 * 4.70 – 0.000249 * 9,350 = 0.351

Ŷ27 = 1.56450 + 0.23720 * 4.80 – 0.000249 * 9,400 = 0.362


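Generating the forecasts is just a matter of plugging the X values into the fitted equation; a quick sketch using the model’s coefficients:

```python
# Fitted model: Y = 1.56450 + 0.23720*X1 - 0.000249*X2
def predict_profit_margin(x1, x2):
    """x1: net revenues per deposit dollar; x2: number of S&L offices"""
    return 1.56450 + 0.23720 * x1 - 0.000249 * x2

y26 = predict_profit_margin(4.70, 9350)  # Year 26
y27 = predict_profit_margin(4.80, 9400)  # Year 27
print(round(y26, 3), round(y27, 3))  # 0.351 0.362
```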
Next Week’s Forecast Friday Topic: The Effect of Omitting an Important Variable

Now that we’ve walked you through this process, you know how to forecast and run multiple regression. Next week, we will discuss what happens when a key independent variable is omitted from a regression model and all the problems it causes when we violate the regression assumption that “all relevant and no irrelevant independent variables are included in the model.” Next week’s post will show a complete demonstration of such an impact. Stay tuned!

Randomized Responses: More Indirect Techniques to Asking Sensitive Survey Questions

June 23, 2010

Yesterday’s post discussed approaches for asking survey questions of a sensitive nature in a way that would make individual respondents more inclined to answer them truthfully. Sometimes, however, you don’t care about an individual respondent’s answer to the sensitive question, but would rather get an idea of the incidence of that sensitive issue among all respondents. Knowing the incidence of such a topic may be what we need in order to conduct further research, gauge the market potential for a new product, or decide how to prioritize the allocation of resources. The most effective ways to do this are through randomized response techniques, which are useful for assessing group behavior, as opposed to individual behavior.

Let’s assume that you are marketing a new over-the-counter ointment for athlete’s foot to college males, and you want to understand how large a market you have for your ointment. You decide to survey 100 randomly selected college males. Asking them whether they’ve had athlete’s foot might be something they don’t want to answer; yet you’re not concerned with whether a particular respondent has athlete’s foot, but rather with estimating how many college-age men suffer from it.

Try a Coin Toss

One indirect way of finding out the incidence of athlete’s foot among college men might be to ask a question like this:

“Flip a coin (in private) and answer ‘yes’ if either the coin was a head or you’ve suffered from athlete’s foot in the last three months.”

If the respondent answers “yes” to the question, you will not know whether he did so because of the athlete’s foot or because of the coin toss. However, once you’ve compiled all the responses to this question, you can get a good estimate of the incidence of athlete’s foot among college males. You would figure it out as follows:

Total Respondents: 100

Number answering “yes”: 65

Expected number of heads on flip: 50

Excess “yes” over expected: 15

Percent with athlete’s foot (15/50): 30%
Generally, when you flip a coin, you expect the results of the toss to come up “heads” about 50% of the time. If 65% of the respondents answer “yes” to the heads/athlete’s foot question, then you are 15 points over the expected value. Dividing that difference by the expected value (50) gives you an estimate that 30% of respondents have athlete’s foot.
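The arithmetic generalizes: subtract the share of “yes” answers you expect from the coin alone, then rescale by the share of respondents whose answers actually reflect the sensitive question. A sketch:

```python
# Coin-toss randomized response: a respondent answers "yes" if the coin
# lands heads OR he has the sensitive trait, so
# P(yes) = P(heads) + P(tails) * P(trait)
# => P(trait) = (P(yes) - P(heads)) / (1 - P(heads))
def estimate_incidence(yes_rate, p_heads=0.5):
    return (yes_rate - p_heads) / (1 - p_heads)

print(round(estimate_incidence(0.65), 2))  # 65 "yes" out of 100 -> 0.3
```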

Roll the Dice

Another approach would be asking respondents to roll a die and answer one question if the roll comes up anywhere from 1 to 4 and answer another if the roll comes up 5 or 6. If the die comes up as 1-4, the respondent answers the question, “I have had athlete’s foot” with either a “Yes” or a “No.” Respondents whose die roll came up 5 or 6 will need to answer the yes/no question, “I have never had athlete’s foot.”

What is the probability that a respondent has had athlete’s foot? The probability of a “Yes” is determined as follows:

P(YES) = P(Directed to first question)*P(Answering Yes to first question) + P(Directed to second question)*P(Answering Yes to second question)

Remember that respondents have a 100% probability of being assigned to one of the two questions; hence the probability of being directed to the second question is 1 minus the probability of being directed to the first. Likewise, because the second question is the negation of the first, the probability of answering “yes” to it is 1 minus the probability of answering “yes” to the first. Expressing the probabilities in decimal form, we rewrite the probability equation as follows:

P(YES)= P(Directed to first question)*P(Answering Yes to first question) + (1-P(Directed to first question))*(1-P(Answering Yes to first question))

In the above example, the probability of being assigned the first question (for rolling a 1-4) is .67 (four chances out of six, or two-thirds, rounded). Now, if 35 of the 100 respondents answer “yes,” we get the following equation, denoting the probability of having had athlete’s foot as “P”:

0.35 = 0.67P + 0.33(1-P)

0.35 = 0.67P + 0.33 – 0.33P

0.35-0.33 = 0.67P – 0.33P

0.02 = 0.34P

P = 0.0588

Hence, we estimate that 5.88% of respondents have had athlete’s foot.
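Solving that equation for P works for any assignment probability, not just the two-thirds/one-third split. A sketch using the rounded .67/.33 probabilities from the text:

```python
# Dice randomized response: P(yes) = p1*P + (1 - p1)*(1 - P), where p1 is
# the probability of being directed to the direct question.
# Solving for P: P = (P(yes) - (1 - p1)) / (2*p1 - 1)
def estimate_trait(yes_rate, p_first=0.67):
    return (yes_rate - (1 - p_first)) / (2 * p_first - 1)

print(round(estimate_trait(0.35), 4))  # 35 "yes" out of 100 -> 0.0588
```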


There are several other randomized response techniques, but these two are examples you might want to try. Note that the dice approach may not be a very reliable estimator: if 36 respondents instead indicated “yes,” the estimate increases to 8.82%; it’s as if a 1-point increase in “yes” responses increases the estimated incidence by almost 3 points. Randomized response techniques are good when you don’t care about individual responses to sensitive information, but want to know the incidence of such behavior within the respondent group. By wording questions in this fashion, you can put respondents at ease, giving them the feeling their responses are obscured, all the while gaining estimates of the percentage of the group engaging in said behavior.

Asking Sensitive Survey Questions

June 22, 2010

As marketers, sometimes we need to get information from respondents that they may not be willing to volunteer freely. When confronted with such inquiries, people may ignore the question, provide untrue or incomplete responses, or even terminate the survey. Yet often, a survey provides the only feasible means of obtaining information about a respondent’s religious affiliation, race, income, or other sensitive information. What’s a marketer to do? There are several ways around it:

Build Rapport with Respondent

Quite often, it is best to start a survey with neutral questions and let the respondent work his or her way through, with each question leading up to the information you need to ask about. Placing sensitive questions late in the questionnaire has two benefits. First, if the respondent chooses to stop the survey once he or she reaches the sensitive questions, you still have the respondent’s answers to all of the prior questions, which you can use for other analyses. Second, as the respondent works through the easy, unthreatening questions, he or she may feel that trust is being established, and will be more likely to answer the questions asking for sensitive information.

Be Casual About it!

Let’s assume you are trying to measure the incidence of tax cheating. Getting truthful responses can be very difficult. Try reducing the perceived importance of the topic by asking the question in a nonchalant manner: “Did you happen to have ever cheated on your taxes?” Worded this way, the question leads the respondent to believe the survey’s authors do not think that tax cheating is a big deal, so the respondent may be coaxed to answer truthfully.

Make it Sound Like “Everybody’s Doing It!”

Instead of directly asking respondents if they cheat on their taxes, ask if they know of anyone who does: “Do you know any people who have cheated on their taxes?” Then the next question could be, “How about you?” When respondents feel they aren’t alone, they may be more inclined to be honest. Another way is to combine the casual approach with this one: “As you know, many people have been cheating on their taxes these days. Do you happen to have cheated on yours?”

Choose Longer Questions Instead of Shorter Ones

Longer questions can “soften the blow” with the excess verbiage, and reduce the threat. Consider these examples:

  1. “Even the most liberal people don’t pay their fair share of taxes to the government. Have you, yourself, not reported all your income to the government in the past two years?”
  2. “The Investors Business Daily recently reported on the widespread practice of middle class Americans to not report all their income for tax purposes. Have you happened to report less than all your income at tax time?”
  3. “Did things come up that kept you from reporting all your income to the IRS, or did you happen to report all your income?”

Note the patterns here. In the first question, we again make it sound like everyone is cheating on taxes. In the second, we appeal to an authority. In the third, we make it sound like circumstances beyond the respondent’s control kept him or her from reporting all of his or her income.

Try Some Projective Techniques

Make it sound like the respondent is just giving an estimate about someone else. Ask, “As your best guess, approximately what percentage of people in your community fail to report all their income at tax time?” When asked this way, a respondent might base the response on his or her own personal experience.

Try a Hierarchy of Sensitive Issues

Have a question that shows a list of answers ordered from least sensitive to most sensitive, like this:

“In the past 12 months or so, which of the following have you done? (Select all that apply):

    “Wear your shirt inside out”
    “Forget to hand in homework”
    “Lock your keys in the car while it was still running”
    “Discipline your child by spanking”
    “Take money out of your spouse’s wallet”
    “Meet an ex-girl/boyfriend behind your spouse’s back”
    “Withhold some information about your income at tax time”
    “Falsely accuse your neighbor of tax dodging”

Notice how this question moves the respondent from less threatening to very threatening answer choices. And by keeping the taxes item embedded – not the very first or the very last – the respondent sees that there are much worse behaviors than tax cheating that he/she could admit to. Hence, he/she is more likely to be truthful.


Questionnaire design is as much an art as it is a science, and wording sensitive questions is almost entirely an art. By building trust with your respondent, making him/her feel that it’s purely human to have the issue or behavior you’re asking about, and finding soft, indirect ways to broach the issue, you can get him or her to respond more truthfully and calmly. As they say, “You attract more flies with honey than you do with vinegar!”