(Twenty-eighth in a series)
Sometimes analysts need to forecast the likelihood of discrete outcomes: the probability that one outcome or another will occur. Last week, we discussed the linear probability model (LPM) as one solution. Essentially, the LPM looked at the two discrete outcomes: a 1 if the outcome occurred and a 0 if it did not. We treated that binary variable as the dependent variable and estimated the model with ordinary least squares. In our example, we got a pretty good result. However, the LPM came up short in many ways: predicted values that fell outside the 0-to-1 range, dubious R2 values, heteroscedasticity, and non-normal error terms.
One of the more popular improvements on the LPM is the logistic regression model, sometimes referred to as the “logit” model. Logit models are very useful in consumer choice modeling. While the underlying theory is quite complex, today we will introduce you to the basic concepts of the logit model, using a simple one-variable regression model.
Probabilities, Odds Ratios, and Logits, Oh My!
The first thing to understand about logistic regression is the mathematics of the model. There are three quantities you need to know: the probability, the odds ratio, and the logit. While these are different values, they are three ways of expressing the very same thing: the likelihood of an outcome occurring.
Let’s start with the easiest of the three: probability. Probability is the likelihood that a particular outcome will happen. It is a number between 0 and 1. A probability of 0.75 means that an outcome has a 75% chance of occurring. In a logit model, the probability of an observation having a “success” outcome (Y=1) is denoted Pi. Since Pi is a number between 0 and 1, the probability of a “failure” outcome (Y=0) is 1 - Pi. If Pi = 0.40, then 1 - Pi = 1.00 - 0.40 = 0.60.
The odds ratio, then, is the ratio of the probability of success to the probability of failure:

Odds ratio = Pi / (1 - Pi)
Hence, in the above example, the odds ratio is (0.40/0.60) = 0.667.
Logits, denoted Li, are the natural log of the odds ratio:

Li = ln[Pi / (1 - Pi)]

Hence, the logit in our example is ln(0.667) = -0.405.
A logistic regression model’s equation generates Ŷ values in the form of logits for each observation. The logits are modeled as a linear function of the independent variable:

Li = β0 + β1Xi + εi
Once an observation’s logit is generated, you take its antilog to derive its odds ratio, and then use simple algebra to recover the probability: Pi = (odds ratio) / (1 + odds ratio).
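These conversions are easy to sketch in code. Here is a minimal example (the function name is mine, not from the post) that carries a probability to an odds ratio and a logit, then walks the logit back to a probability:

```python
import math

def logit_to_probability(logit):
    """Walk a logit back to a probability via the odds ratio."""
    odds = math.exp(logit)       # antilog of the logit = odds ratio
    return odds / (1.0 + odds)   # simple algebra recovers the probability

p = 0.40
odds = p / (1 - p)               # 0.667
logit = math.log(odds)           # -0.405
print(round(logit_to_probability(logit), 2))  # 0.4 -- back where we started
```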
Estimate the Logit Model Using Weighted Least Squares Regression
Because data with a logistic distribution are not linear, ordinary linear regression is generally not appropriate for modeling them; however, if the data can be grouped, least squares regression can be used to estimate the logits. This example will use a simple one-variable model to approximate the logits. Multiple logistic regression is beyond the scope of this post and should only be attempted with a statistical package.
We use weighted least squares (WLS) techniques for approximating the logits. You will recall that we discussed WLS in our Forecast Friday post on heteroscedasticity. In a logistic distribution, error terms are heteroscedastic, making WLS an appropriate tool to employ. The steps in this process are:
- Group the independent variable; each group is its own Xi.
- Note the sample size, Ni, of each group, and count the number of successes, ni, in each.
- Compute the relative probability for each Xi: Pi = ni / Ni.
- Compute the logit for each group: Li = ln[Pi / (1 - Pi)].
- Compute the weights: wi = sqrt[Ni * Pi * (1 - Pi)].
- Transform the model using the weights: L*i is computed by multiplying Li by its weight wi; likewise, X*i is the original Xi multiplied by wi, and similarly for the error term.
- Perform OLS on the weighted, or transformed, model: L*i = β0wi + β1X*i + ε*i. (Note that the transformed model has no intercept; β0 becomes the coefficient on wi.)
- Take the antilog of the estimated logits to derive the odds ratios, and hence the probabilities, for each group.
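The steps above can be sketched end to end. This is a minimal illustration under my own naming, not code from the original post; it estimates the two coefficients of the transformed no-intercept model by solving the normal equations directly:

```python
import math

def grouped_logit_wls(x, n_success, n_total):
    """Estimate a one-variable grouped logit model by weighted least squares.

    x         -- group values of the independent variable
    n_success -- number of successes (Y=1) in each group
    n_total   -- size of each group
    Returns (b0, b1): coefficients on the weight wi and the weighted X*i.
    """
    p = [s / t for s, t in zip(n_success, n_total)]                   # Pi = ni/Ni
    logit = [math.log(pi / (1 - pi)) for pi in p]                     # Li
    w = [math.sqrt(t * pi * (1 - pi)) for t, pi in zip(n_total, p)]   # wi
    x_star = [wi * xi for wi, xi in zip(w, x)]                        # X*i = wi*Xi
    l_star = [wi * li for wi, li in zip(w, logit)]                    # L*i = wi*Li
    # No-intercept OLS on the two regressors (w, X*): solve the 2x2 normal equations
    a11 = sum(wi * wi for wi in w)
    a12 = sum(wi * xi for wi, xi in zip(w, x_star))
    a22 = sum(xi * xi for xi in x_star)
    c1 = sum(wi * li for wi, li in zip(w, l_star))
    c2 = sum(xi * li for xi, li in zip(x_star, l_star))
    det = a11 * a22 - a12 * a12
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)
```

When the group logits lie exactly on a line b0 + b1*Xi, the routine recovers b0 and b1 exactly, since the weighting merely rescales each observation.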
Predicting Churn in Wireless Telephony
Marissa Martinelli is director of customer retention for Cheapo Wireless, a low-end cell phone provider. Cheapo’s target market is subprime households, whose incomes are generally below $50,000 per year and who often don’t have bank accounts. As its customers’ incomes rise, churn rates for low-end cell phone plans increase greatly. Cheapo has developed a new cell phone plan that caters to higher-income customers, so that it can migrate its existing customers to the new plan as their incomes rise. To promote the new plan, Marissa must first identify the customers most at risk of churning.
Marissa takes a random sample of 1,365 current and former Cheapo cell phone customers and looks at their churn rates. She has their incomes from the applications and credit checks run when they first applied for wireless service. She breaks them into 19 groups, with incomes from $0 to $50,000, in $2,500 increments. For simplicity, Marissa divides the income amounts by $10,000. The lowest income group, 0.50, contains all customers whose incomes are $5,000 or less; the next group, 0.75, contains those with incomes between $5,000 and $7,500; and so on. Marissa notes the number of churned customers (ni) at each income level and the number of customers at each income level (Ni):
| # Churned (ni) | Group Size (Ni) | Income Level, $10Ks (Xi) |
| --- | --- | --- |
| 1 | 20 | 0.50 |
| 2 | 30 | 0.75 |
| 3 | 30 | 1.00 |
| 5 | 40 | 1.25 |
| 6 | 40 | 1.50 |
| 8 | 50 | 1.75 |
| 9 | 50 | 2.00 |
| 12 | 60 | 2.25 |
| 17 | 80 | 2.50 |
| 22 | 80 | 2.75 |
| 35 | 100 | 3.00 |
| 40 | 100 | 3.25 |
| 75 | 150 | 3.50 |
| 70 | 125 | 3.75 |
| 62 | 100 | 4.00 |
| 62 | 90 | 4.25 |
| 64 | 90 | 4.50 |
| 51 | 70 | 4.75 |
| 50 | 60 | 5.00 |
As the table shows, of the 60 customers whose income is between $47,500 and $50,000, 50 of them have churned. Knowing this information, Marissa can now compute the conditional probabilities of churn (Y=1) for each income group:
| # Churned (ni) | Group Size (Ni) | Income Level, $10Ks (Xi) | Prob. of Churn (Pi) | Prob. of Retention (1-Pi) |
| --- | --- | --- | --- | --- |
| 1 | 20 | 0.50 | 0.050 | 0.950 |
| 2 | 30 | 0.75 | 0.067 | 0.933 |
| 3 | 30 | 1.00 | 0.100 | 0.900 |
| 5 | 40 | 1.25 | 0.125 | 0.875 |
| 6 | 40 | 1.50 | 0.150 | 0.850 |
| 8 | 50 | 1.75 | 0.160 | 0.840 |
| 9 | 50 | 2.00 | 0.180 | 0.820 |
| 12 | 60 | 2.25 | 0.200 | 0.800 |
| 17 | 80 | 2.50 | 0.213 | 0.788 |
| 22 | 80 | 2.75 | 0.275 | 0.725 |
| 35 | 100 | 3.00 | 0.350 | 0.650 |
| 40 | 100 | 3.25 | 0.400 | 0.600 |
| 75 | 150 | 3.50 | 0.500 | 0.500 |
| 70 | 125 | 3.75 | 0.560 | 0.440 |
| 62 | 100 | 4.00 | 0.620 | 0.380 |
| 62 | 90 | 4.25 | 0.689 | 0.311 |
| 64 | 90 | 4.50 | 0.711 | 0.289 |
| 51 | 70 | 4.75 | 0.729 | 0.271 |
| 50 | 60 | 5.00 | 0.833 | 0.167 |
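As a quick check, the conditional probabilities in the table are just ni/Ni. A short snippet (with the data keyed in from the table above) reproduces them:

```python
n_churned  = [1, 2, 3, 5, 6, 8, 9, 12, 17, 22, 35, 40, 75, 70, 62, 62, 64, 51, 50]
group_size = [20, 30, 30, 40, 40, 50, 50, 60, 80, 80, 100, 100, 150, 125, 100, 90, 90, 70, 60]

p_churn = [n / N for n, N in zip(n_churned, group_size)]   # Pi = ni / Ni
print(round(p_churn[0], 3), round(p_churn[-1], 3))  # 0.05 0.833
print(sum(group_size))                              # 1365 customers sampled
```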
Marissa then goes on to derive the weights for each income level:
| Xi | Pi(1-Pi) | Odds Ratio, Pi/(1-Pi) | Logit (Li) | NiPi(1-Pi) | Weight (Wi) |
| --- | --- | --- | --- | --- | --- |
| 0.50 | 0.048 | 0.053 | -2.944 | 0.950 | 0.975 |
| 0.75 | 0.062 | 0.071 | -2.639 | 1.867 | 1.366 |
| 1.00 | 0.090 | 0.111 | -2.197 | 2.700 | 1.643 |
| 1.25 | 0.109 | 0.143 | -1.946 | 4.375 | 2.092 |
| 1.50 | 0.128 | 0.176 | -1.735 | 5.100 | 2.258 |
| 1.75 | 0.134 | 0.190 | -1.658 | 6.720 | 2.592 |
| 2.00 | 0.148 | 0.220 | -1.516 | 7.380 | 2.717 |
| 2.25 | 0.160 | 0.250 | -1.386 | 9.600 | 3.098 |
| 2.50 | 0.167 | 0.270 | -1.310 | 13.388 | 3.659 |
| 2.75 | 0.199 | 0.379 | -0.969 | 15.950 | 3.994 |
| 3.00 | 0.228 | 0.538 | -0.619 | 22.750 | 4.770 |
| 3.25 | 0.240 | 0.667 | -0.405 | 24.000 | 4.899 |
| 3.50 | 0.250 | 1.000 | 0.000 | 37.500 | 6.124 |
| 3.75 | 0.246 | 1.273 | 0.241 | 30.800 | 5.550 |
| 4.00 | 0.236 | 1.632 | 0.490 | 23.560 | 4.854 |
| 4.25 | 0.214 | 2.214 | 0.795 | 19.289 | 4.392 |
| 4.50 | 0.205 | 2.462 | 0.901 | 18.489 | 4.300 |
| 4.75 | 0.198 | 2.684 | 0.987 | 13.843 | 3.721 |
| 5.00 | 0.139 | 5.000 | 1.609 | 8.333 | 2.887 |
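The logit and weight columns follow directly from the formulas Li = ln[Pi/(1-Pi)] and Wi = sqrt(Ni * Pi * (1-Pi)). A spot check on two rows (values keyed in from the tables above):

```python
import math

# (Ni, Pi) for the 1.00 and 3.50 income groups, from the tables
for N, p in [(30, 0.100), (150, 0.500)]:
    logit = math.log(p / (1 - p))          # Li
    weight = math.sqrt(N * p * (1 - p))    # Wi = sqrt(Ni * Pi * (1 - Pi))
    print(round(logit, 3), round(weight, 3))  # -2.197 1.643, then 0.0 6.124
```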
Now, Marissa must transform the logits and the independent variable (Income level) by multiplying them by their respective weights:
| Income Level, $10Ks (Xi) | Logit (Li) | Weight (Wi) | Weighted Income (Xi*) | Weighted Logit (Li*) |
| --- | --- | --- | --- | --- |
| 0.50 | -2.944 | 0.975 | 0.487 | -2.870 |
| 0.75 | -2.639 | 1.366 | 1.025 | -3.606 |
| 1.00 | -2.197 | 1.643 | 1.643 | -3.610 |
| 1.25 | -1.946 | 2.092 | 2.615 | -4.070 |
| 1.50 | -1.735 | 2.258 | 3.387 | -3.917 |
| 1.75 | -1.658 | 2.592 | 4.537 | -4.299 |
| 2.00 | -1.516 | 2.717 | 5.433 | -4.119 |
| 2.25 | -1.386 | 3.098 | 6.971 | -4.295 |
| 2.50 | -1.310 | 3.659 | 9.147 | -4.793 |
| 2.75 | -0.969 | 3.994 | 10.983 | -3.872 |
| 3.00 | -0.619 | 4.770 | 14.309 | -2.953 |
| 3.25 | -0.405 | 4.899 | 15.922 | -1.986 |
| 3.50 | 0.000 | 6.124 | 21.433 | 0.000 |
| 3.75 | 0.241 | 5.550 | 20.812 | 1.338 |
| 4.00 | 0.490 | 4.854 | 19.415 | 2.376 |
| 4.25 | 0.795 | 4.392 | 18.666 | 3.491 |
| 4.50 | 0.901 | 4.300 | 19.349 | 3.873 |
| 4.75 | 0.987 | 3.721 | 17.673 | 3.674 |
| 5.00 | 1.609 | 2.887 | 14.434 | 4.646 |
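Each transformed value is just the original value times its weight; for example, the 1.00 income group's row can be checked as:

```python
w, x, logit = 1.643, 1.00, -2.197   # weight, income level, and logit for the 1.00 group
x_star = w * x                      # weighted income X*
l_star = w * logit                  # weighted logit L*
print(round(x_star, 3), round(l_star, 3))  # 1.643 -3.61
```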
Now, Marissa can run OLS on the transformed model, using Weights (wi) and Weighted Income (X*) as independent variables and the Weighted Logits (L*) as the dependent variable.
Marissa then derives her estimated regression equation, which takes the form L̂*i = b1wi + b2X*i, with b1 and b2 the fitted coefficients.
Interpreting the Model
As expected, weighted income has a positive relationship with the likelihood of churn. However, her sample contains just 19 grouped observations, so Marissa must be very careful about drawing too strong an inference from these results. While the R2 of 0.981 looks strong, it too must not be relied upon; in fact, R2 is pretty meaningless in a logit model. Also, notice that there is no intercept term in this model. You will recall that when using WLS to correct for heteroscedasticity, the intercept was lost in the transformed model and became a slope coefficient (here, the coefficient on the weight, wi). That coefficient is equivalent to the intercept in the unadjusted regression model, since heteroscedasticity doesn’t bias parameter estimates.
Calculating Probabilities
Now Marissa needs to use this model to assess current customers’ likelihood of churning. Say she sees a customer who makes $9,500 a year; that customer falls in income group 1.00. What is that customer’s probability of churning? Marissa takes the weight, 1.643, for her wi and the weighted X* (also 1.643, since 1.643 × 1.00 = 1.643), plugs them into her equation, and obtains a logit of -4.121.
To get the probability, Marissa must take the antilog of this logit, which gives her the odds ratio: exp(-4.121) ≈ 0.016.
Now Marissa calculates this customer’s probability of churning: Pi = 0.016 / (1 + 0.016) ≈ 0.016.
So, a customer earning $9,500 per year has less than a two percent chance of churning. Had the customer been earning $46,000, he or she would have had a whopping 98.7% chance of churning!
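The last two steps, logit to odds ratio to probability, can be verified numerically (-4.121 is the logit Marissa computed above):

```python
import math

logit = -4.121                  # logit from Marissa's equation
odds = math.exp(logit)          # antilog -> odds ratio, about 0.016
p = odds / (1 + odds)           # probability of churning
print(round(p, 3))              # 0.016 -- under a 2% chance
```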
There are equivalents of R2 for logistic regression, but that discussion is beyond the scope of this post. Today’s post was intended to give you a primer on the theory of logistic regression.
Next Forecast Friday Topic: The Identification Problem
We have just concluded our discussion of qualitative choice models. Next week, we begin our three-part miniseries on simultaneous equations and two-stage least squares regression. The first post will discuss the identification problem.
*************************
We’re Almost at Our Goal of 200 Facebook Fans!
Thanks to all of you, Analysights now has 190 fans on Facebook! Can you help us get up to 200 fans by tomorrow? If you like Forecast Friday – or any of our other posts – then we want you to “Like” us on Facebook! And if you like us that much, please also pass these posts on to your friends who like forecasting and invite them to “Like” Analysights! By “Like-ing” us on Facebook, you’ll be informed every time a new blog post has been published, or when new information comes out. Check out our Facebook page! You can also follow us on Twitter. Thanks for your help!