Posts Tagged ‘predictive modeling’

Big Data Success Starts With Well-Defined Business Problem

April 18, 2014

(This post also appears on our successor blog, The Analysights Data Mine).

Lots of companies are jumping on the “Big Data” bandwagon; few of them, however, have given real thought to how they will use their data or what they want to achieve with the knowledge the data will give them.  Before reaping the benefits of data mining, companies need to decide what is really important to them.  In order to mine data for actionable insights, technical and business people within the organization need to discuss the business’ needs.

Data mining efforts and processes will vary, depending on a company’s priorities.  A company will use data very differently if its aim is to acquire new customers than if it wants to sell new products to existing customers, or find ways to reduce the cost of servicing customers.  Problem definition puts those priorities in focus.

Problem definition isn’t just about identifying the company’s priorities, however.  In order to help the business achieve its goals, analysts must understand the constraints (e.g., internal privacy policies, regulations, etc.) under which the company operates, whether the necessary data is available, whether data mining is even necessary to solve the problem, the audience at whom data mining is directed, and the experience and intuition of the business and technical sides.

What Does The Company Want to Solve?

Banks, cell phone companies, cable companies, and casinos collect lots of information on their customers.  But their data is of little value if they don’t know what they want to do with it.  In the banking industry, where acquiring new customers often means luring them away from another bank, a bank’s objective might be to cross-sell – that is, to get its current depositors and borrowers to acquire more of its products – so that they will be less inclined to leave the bank.  If that’s the case, then the bank’s data mining effort will involve looking at the products its current customers have and the order and manner in which they acquired those products.

On the other hand, if the bank’s objective is to identify which customers are at risk of leaving, its data mining effort will examine the activity of departing households in the months leading up to their defection, and compare it to those households it retained.

If a casino’s goal is to decide on what new slot machines to install, its data mining effort will look at the slot machine themes its top patrons play most and use that in its choice of new slot machines.

Who is the Audience the Company is Targeting?

OK, so the bank wants to prevent customers from leaving.  But does it want to prevent all customers from leaving?  Usually, only a small percentage of households account for all of a bank’s profit; many banking customers are actually unprofitable.  If the bank wants to retain its most profitable customers, it needs only analyze that subgroup of its customer base.  Predictions of its premier customers’ likelihood to leave, based on a model developed on all its customers, would be highly inaccurate.  In this case, the bank would need to build a model only on its most profitable customers.

Does the Problem Require Data Mining?

Data mining isn’t always needed.  Years ago, when I was working for a catalog company, I developed regression models to predict which customers were likely to order from a particular catalog.  When a model was requested for the company’s holiday catalog, I was told that it would go to 85 percent of the customer list.  When such a large proportion of the customer base – or the entire customer base for that matter – is to receive communication, then a model is not necessary.  More intuitive methods would have sufficed.

Is Data Available?

Before a data mining effort can be undertaken, the data necessary to solve the business problem must be available or obtainable.  If a bank wants to know the next best product to recommend to its existing customers, it needs to know the first product those customers acquired, how they acquired it, the length of time between their first and second products, then between their second and third, and so forth.  The bank also needs to understand which products its customers acquired simultaneously (such as a checking account and a credit card), current activity with those products, and the sequence of product acquisition (e.g., checking account first, savings account second, certificate of deposit third, etc.).
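To make the data requirement concrete, here is a minimal sketch (in Python, with pandas) of deriving acquisition sequences, inter-acquisition gaps, and simultaneous openings from account-opening records. The table and column names here are hypothetical, not from any actual bank’s warehouse:

```python
# Sketch: deriving each customer's product-acquisition history from
# account-opening records. All column names are hypothetical.
import pandas as pd

accounts = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "product":     ["checking", "savings", "CD", "checking", "credit card"],
    "opened":      pd.to_datetime(["2010-01-05", "2010-06-12", "2012-03-01",
                                   "2011-02-10", "2011-02-10"]),
})

accounts = accounts.sort_values(["customer_id", "opened"])

# Sequence of product acquisition per customer
sequences = accounts.groupby("customer_id")["product"].agg(list)

# Length of time between successive acquisitions
accounts["days_since_prior"] = (
    accounts.groupby("customer_id")["opened"].diff().dt.days
)

# Products acquired simultaneously (same opening date, same customer)
simultaneous = accounts.duplicated(subset=["customer_id", "opened"], keep=False)
```

In practice, whether fields like the opening date and acquisition channel are even captured is exactly what the business and IT experts can tell you.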

It is extremely important that analysts consult both those on the business side and the IT department about the availability of data.  These internal experts often know what data is collected on customers, where it resides, and how it is stored.  In many cases, these experts may have access to data that doesn’t make it into the enterprise’s data warehouse.  And they may know what certain esoteric values for fields in the data warehouse mean.  Consulting these experts can save analysts a lot of time in understanding the data.

Under What Constraints Does the Business Operate?

Companies have internal policies regulating how they operate; they are subject to regulations and laws governing the industries and localities in which they do business; and they are bound by the ethical standards of those industries and locations.

Often, a company has access to data that, if used in making business decisions, can be illegal or viewed as unethical.  The company doesn’t acquire this data illegally; the data just cannot be used for certain business practices.

For example, I was building customer acquisition models for a bank a few years ago.  The bank’s data warehouse contained summarized credit score statistics by block group, as defined by the U.S. Bureau of the Census.  However, banks are subject to the Community Reinvestment Act (CRA), a 1977 law that was passed to prevent banks from excluding low- to moderate-income neighborhoods in their communities from lending decisions.  Obviously, credit scores are going to be lower in lower-income areas.  Hence, under CRA guidelines, I could not use the summarized credit statistics to build a model for lending products.  I could, however, use those statistics for a model for deposit products; for post-campaign analysis, to see which types of customers responded to the campaign; and also to demonstrate compliance with the CRA.

In addition, the bank’s internal policies did not allow the use of marital status in promoting products.  Hence, when using demographic data that the bank purchased, I had to ignore the “married” field when building my model.  In cases like these, less direct approaches can be used.  The purchased data also contained a field called “number of adults (in the household).”  This was totally appropriate to use, since a household with two adults is not necessarily a married-couple household.

Again, the analyst must consult the company’s business experts to understand these operational constraints.

Are the Business Experts’ Opinions and Intuition Spot-On?

It’s often said that novices make mistakes out of ignorance and veterans make mistakes out of arrogance.  The business experts have a lot of experience in the company and a great deal of intuition, which can be very insightful.  However, they can be wrong too.  With every data mining effort, the data must be allowed to tell the story.  Does the data validate what the experts say?  For example, most checking accounts are automatically bundled with a debit card; a bank’s business experts know this; and the analysis will often bear this out.

However, if the business experts say that a typical progression in a customer’s banking relationship starts with demand deposit accounts (e.g., checking accounts) then consumer lending products (e.g., auto and personal loans), followed by time deposits (e.g., savings accounts and certificates of deposit), does the analysis confirm that?


Problem definition is the hardest, trickiest, yet most important, prerequisite to getting the most out of “Big Data.”  Beyond knowing what the business needs to solve, analysts must also consider the audience the data mining effort is targeting; whether data mining is necessary; the availability of data and the conditions under which it may be used; and the experience of the business experts.  Effective problem definition begets data mining efforts that produce insights a company can act upon.

Don’t Ignore Business Rules When Building Predictive Models

January 4, 2011

The development of predictive models does not occur in a vacuum. The model-building process requires input from several key stakeholders, many of whom may not directly use the models that result. In several cases, an often overlooked stakeholder is the organization’s compliance officer.

Yes, you read that correctly. Laws, regulations, and internal policies restrict the use and application of data in marketing promotions, planning, and other organizational decision-making. These policies, known as “business rules,” have different degrees and levels across organizations and industries, but their importance is the same: ignoring them when developing your model can get you in a lot of hot water, as one of my past clients found out.

A financial services firm had once retained me to develop a series of prospect propensity models. The client had several types of data available about prospects: demographic overlay data, census data, and summarized data on credit and affluence. The client had obtained all these databases from third-party vendors in order to understand the customers and prospects in the areas where it did business. The client also had hoped to use this data to make smart marketing promotions to non-customers.

After being sure we were in compliance with financial services regulations and internal policies, I went ahead and built the propensity models, a two-month process. The marketing campaign team couldn’t wait to start deploying them. The strategic planning group was eagerly awaiting them to get estimates on future business. We were all excited. UNTIL….

A few months after the modeling engagement ended, the financial services firm renewed its contract with the vendor of the summarized affluence data. The terms of that contract included something the client had overlooked at the start of the engagement: the data was to be used for customer profiling and development, not prospecting!!!

Had the client retained me to build “best next offer” models for its existing customers, there would have been no problem. However, the wealth data had been used to construct prospect propensity models, so the client could not use the models that were built, lest it invite a lawsuit by the vendor. As a result, the client had to re-retain me to rework each model that contained at least one variable from the wealth data – and it turned out every model did. And since the omission was on the client’s part, it had to pay for the rework. And, as if to rub salt into the wound, the marketing campaign team couldn’t use the models until they were redone and thus missed great opportunities in the interim.

The moral of the story: you could save your company thousands of dollars – in model-building costs, time, and opportunity costs – if you heed the business rules that govern the use of your data. Before undertaking a modeling project, make sure you understand the legalities of how you will use the information available to you. Talk to your company’s domain experts about these rules and make sure those constraints are always top of mind when you build your models. Otherwise you can end up like my client, or worse, on the wrong side of a lawsuit.


Start the New Year on the Right Foot: Follow us on Facebook and Twitter!

For the latest insights on marketing research, predictive modeling, and forecasting, be sure to check out Analysights on Facebook and Twitter!  “Liking” us on Facebook and following us on Twitter will allow you to stay informed of each new Insight Central post published, new information about analytics, discussions Analysights will be hosting, and other opportunities for feedback.  So get this New Year off right and check us out on Facebook and Twitter!

Read All About It: Why Newspapers Need Marketing Analytics

October 26, 2010

After nearly 20 years, I decided to let my subscription to the Wall Street Journal lapse. A few months ago, I did likewise with my longtime subscription to the Chicago Tribune. I didn’t want to end my subscriptions, but as a customer, I felt my voice wasn’t being heard.

Some marketing research and predictive modeling might have enabled the Journal and the Tribune to keep me from defecting. From these efforts, both publications could have spotted my increasing frustration and dissatisfaction and intervened before I chose to vote with my feet.

Long story short, I let both subscriptions lapse for the same reason: chronic unreliable delivery, which was allowed to fester for many years despite numerous calls by me to their customer service numbers about missing and late deliveries.

Marketing Research

Both newspapers could have used marketing research to alert them to the likelihood that I would not renew my subscriptions. They each had lots of primary research readily available to them, without needing to do any surveys: my frequent calls to their customer service department, with the same complaint.

Imagine the wealth of insights both papers could have reaped from this data: they could determine the most common breaches of customer service; by looking at the number of times customers complained about the same issue, they could determine where problems were left unresolved; by breaking down the most frequent complaints by geography, they could determine whether additional delivery persons needed to be hired, or if more training was necessary; and most of all, both newspapers could have also found their most frequent complainers, and reached out to them to see what could be improved.
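As a sketch of the kind of tallies involved (using a hypothetical call log with made-up field names, not the newspapers’ actual data), a few lines of pandas cover all four analyses:

```python
# Sketch: mining customer-service call logs for delivery insights.
# The DataFrame and its column names are hypothetical.
import pandas as pd

calls = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 103, 103],
    "region":      ["North", "North", "West", "North", "North", "North"],
    "issue":       ["missed delivery", "missed delivery", "wet paper",
                    "late delivery", "missed delivery", "missed delivery"],
})

# Most common breaches of customer service
top_issues = calls["issue"].value_counts()

# Repeat complaints from the same customer flag unresolved problems
repeats = calls.groupby(["customer_id", "issue"]).size()
unresolved = repeats[repeats > 1]

# Complaints by geography point to staffing or training gaps
by_region = calls.groupby(["region", "issue"]).size().unstack(fill_value=0)

# The most frequent complainers are candidates for proactive outreach
frequent = calls["customer_id"].value_counts().head(3)
```

None of this requires surveys or modeling; the data is a byproduct of calls the papers were already taking.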

Both newspapers could have also conducted regular customer satisfaction surveys of their subscribers, asking about overall satisfaction and likelihood of renewing, followed by questions about subscribers’ perceptions about delivery service, quality of reporting, etc. The surveys could have helped the Journal and the Tribune grab the low-hanging fruit by identifying the key elements of service delivery that have the strongest impact on subscriber satisfaction and likelihood of renewal, and then coming up with a strategy to secure satisfaction with those elements.

Predictive Modeling

Another way both newspapers might have been able to intervene and retain my business would have been to predict my likelihood of lapse. This so-called attrition or “churn” modeling is common in industries whose customers are continuity-focused: newspapers and magazines, credit cards, membership associations, health clubs, banks, wireless communications, and broadband cable to name a few.

Attrition modeling (which, incidentally, will be discussed in the next two upcoming Forecast Friday posts) involves developing statistical models comparing attributes and characteristics of current customers with those of former, or churned, customers. The dependent variable being measured is whether a customer churned, so it would be a 1 if “yes” and a 0 if “no.”

Essentially, in building the model, the newspapers would look at several independent, or predictor, variables: customer demographics (e.g., age, income, gender, etc.), frequency of complaints, geography, to name a few. The model would then identify the variables that are the strongest predictors of whether a subscriber will not renew. The model will generate a score between 0 and 1, indicating each subscriber’s probability of not renewing. For example, a probability score of .72 indicates that there is a 72% chance a subscriber will let his/her subscription lapse, and that the newspaper may want to intervene.

In my case, both newspapers might have run such an attrition model to see if number of complaints in the last 12 months was a strong predictor of whether a subscriber would lapse. If that were the case, I would have a high probability of churn, and they could then call me; or, if they found that subscribers who churned were clustered in a particular area, they might be able to look for systemic breakdowns in customer service in that area. Either way, both papers could have found a way to salvage the subscriber relationship.
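Here is a hedged sketch of such an attrition model, using logistic regression on synthetic subscriber data; the features (complaints in the last 12 months, tenure) and the simulated relationship are assumptions for illustration, not figures from any newspaper:

```python
# Sketch of a churn model: logistic regression on synthetic subscribers.
# The simulated coefficients below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
complaints = rng.poisson(1.0, n)          # complaints in last 12 months
tenure = rng.uniform(1, 20, n)            # years as a subscriber

# Simulate churn: more complaints raise the probability of lapsing
logit = -2.0 + 1.1 * complaints - 0.05 * tenure
churned = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Fit the model on "current vs. churned" subscribers
X = np.column_stack([complaints, tenure])
model = LogisticRegression().fit(X, churned)

# Score a long-tenured subscriber with four recent complaints
p = model.predict_proba([[4, 18]])[0, 1]
print(f"churn probability: {p:.2f}")      # a high score triggers outreach
```

A subscriber scoring high on this model is exactly the one a retention team should call before the renewal date arrives.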

Why Surveys Go Well With Predictive Models

October 13, 2010

Thanks to advancements in technology, companies now have the capability to analyze millions – if not billions – of transactional, demographic, and psychographic records in a short time and develop sophisticated models that can assess several scenarios: how likely a customer is to purchase again; when he/she will purchase again; how much he/she will spend in the next year; how likely he/she is to defect; and many more. Yet, by themselves, predictive models don’t provide a complete picture or profile of the customer. While models can provide information on a prospect or customer’s willingness and ability to purchase based on similar characteristics of current customers, they don’t provide much information about the customer or prospect’s readiness to buy. Hence, a survey can be a highly useful supplement.

Using a survey before a promotion – assuming no effort is made trying to sell to the customer under the guise of the survey – can provide valuable information. With a simple attitudinal and behavioral survey, a marketer can gain a read on the market’s readiness and willingness to buy at that moment. Moreover, the marketer can gauge the purchase readiness of certain customer groups and segments, so that he/she can structure marketing promotions in a manner that makes the best use of marketing dollars. In addition, if certain groups are wary of or unwilling to buy a product, the marketer can look for ways to reach out to these groups for the future.

Another benefit of surveys is to help classify customers and prospects into market segments based on their answers to carefully designed questions. Often, surveys can capture data about prospects and customers that transactional and third-party overlay data sources cannot.

Surprisingly, many companies either do marketing research or predictive modeling, but not both. This is squandering a great marketing opportunity. These two approaches together can provide the missing pieces to the puzzle that will help marketers improve their planning, increase their marketing ROI, and maximize their profits and market share.

Forecast Friday Topic: Slope Dummy Variables

October 7, 2010

(Twenty-fourth in a series)

In the last two posts, we discussed the use of dummy variables for factoring the impact of categorical or seasonal phenomena into regression models. Those dummy variables affected the y-intercept of the regression equation. However, many datasets – especially time series – are subject to structural changes that affect the slope – the coefficient – in the regression equation. For example, if you were doing long-range forecasting for the airline industry based on several years of data, you would need to adjust for the fact that airline business practices changed dramatically after September 11, 2001.

Structural changes can also occur in cross-sectional data. If you are an operations manager at a factory trying to develop a model for worker productivity based on years of experience and education, you might discover that education requirements for factory jobs changed some time ago. Of course, not all current factory workers were affected by the change; some older workers were grandfathered in, or union contracts may have shielded them from the changes. If, for example, the newer factory workers were required to obtain a certain amount of college-level training for their work, and you don’t account for the changed requirement, your parameter estimates will be biased.

How do we account for these structural shifts? Slope dummy variables – or slope dummies, for short.

Since the specification of a slope dummy is only slightly more complex than an intercept dummy, I will not be using a full-scale regression example here as I have in past posts. Rather, I will show what a regression model with a slope dummy looks like.

Let’s assume you run a business and your sales are greatly affected by city ordinances – the more ordinances there are, the lower your sales. Your city has two political parties – the Regulation party and the Deregulation party. For the most part, the Regulation party tends to impose more ordinances when they occupy city hall and the Deregulation party tends to impose fewer, or rescind, ordinances when they’re in office.

Of course, sometimes a Regulation administration may not impose new ordinances; and a Deregulation administration may impose them, depending on the policy and economic issues the city is facing at the time. But, for the most part, ordinances tend to increase under Regulation administrations. So how do we account for this?

Let’s start with a simple regression equation:

Ŷ = α – β1Ot + εt

In this equation, Ŷ represents forecasted sales; α is the y-intercept; β1 is the parameter estimate for variable O, which is the number of pages of ordinances on the city’s books in that year; ε is the error term; and t is the time period in the regression. Notice that O enters the equation with a negative sign, which we would expect: as the number of pages of ordinances increases, sales go down.

But now you want to account for whether the party in office is a Regulation administration. So you create a dummy variable called Dt. Dt=1 in years when city hall is run by a Regulation mayor and Dt=0 when the city is run by a Deregulation mayor. So your new equation looks like this:

Ŷ = α – β1Ot – β2OtDt + εt

Notice the slope dummy in this last equation? It’s multiplied against the pages of ordinances and then entered as an independent variable in the model. Hence, we forecast our sales as follows:

When a Deregulation administration is in city hall (Dt = 0), Ŷ = α – β1Ot;

When a Regulation administration is in city hall (Dt = 1), Ŷ = α – (β1 + β2)Ot

Hence, you see the slope (parameter estimate) is different if the Regulation party is in office.
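The specification above can be sketched on synthetic data; the ordinance counts and coefficients below are made up for illustration, but the design matrix shows exactly how the slope dummy enters the model:

```python
# Sketch of a slope-dummy regression on synthetic data.
# The "true" coefficients below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(42)
n = 40
O = rng.uniform(50, 200, n)               # pages of ordinances on the books
D = rng.integers(0, 2, n)                 # 1 = Regulation party in city hall

# True process: sales fall with ordinances, and fall faster under Regulation
y = 500 - 0.8 * O - 0.5 * O * D + rng.normal(0, 5, n)

# Design matrix: intercept, O, and the slope dummy O*D
X = np.column_stack([np.ones(n), O, O * D])
alpha, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Slope under Deregulation: b1; under Regulation: b1 + b2
print(f"Deregulation slope: {b1:.2f}, Regulation slope: {b1 + b2:.2f}")
```

The fitted coefficient on O*D recovers the extra (steeper) slope that applies only in Regulation years, which is the whole point of the slope dummy.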

Next Forecast Friday Topic: Selecting Variables for a Regression Model

Sometimes you have a lot of variables to choose from when building a regression model. How do you know which ones to include in your model? We will discuss some approaches to determine which variables to enter into your model next week.


Check out Analysights’ profile at the Janlong Communications Blog!

Marketing communications specialist Janice Long of Janlong Communications profiles many small and up-and-coming businesses on her blog, asking owners their ideal client profile and how they got started. This week, Janlong Communications profiled Analysights! See the brief post about how we got started and the market niche we serve.