What are odds, log odds and logit? (feat. Slam Dunk story)
Odds and logit are basic concepts for understanding logistic regression. Today I’ll explain them as simply as I can. Do you know the comic book ‘Slam Dunk’? I’ll explain odds using its story.
1) Odds
Now, Shohoku high school is playing against other high schools in a tournament. In the first round, Shohoku high school won 4 games and lost 6 out of 10 games, so its winning odds are 4/6 ≈ 0.67.
In the 2nd round, Shohoku high school won 8 games and lost 2 out of 10 games, so its winning odds are 8/2 = 4.0.
However, winning odds of 0.67 or 4.0 are not very intuitive (what do they even mean!!!). So, let’s talk about them in terms of probability. In the first round, the winning probability of Shohoku high school is 4/10 = 40.0%
and in the 2nd round, the winning probability is 8/10 = 80.0%
Now it seems much clearer!! We can also see the difference between odds and probability.
In the first round, the winning odds and probability of Shohoku high school are 4/6 and 4/10, respectively.
In the 2nd round, the winning odds and probability of Shohoku high school are 8/2 and 8/10, respectively.
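Both quantities come straight from the win/loss counts: odds compare wins with losses, while probability compares wins with all games played. A minimal Python sketch; the function name `odds_and_probability` is hypothetical, chosen only for this example:

```python
def odds_and_probability(wins: int, losses: int) -> tuple[float, float]:
    """Return (winning odds, winning probability) from win/loss counts."""
    odds = wins / losses                  # wins compared with losses
    probability = wins / (wins + losses)  # wins compared with all games
    return odds, probability


# Records from the two rounds in the story
for round_name, wins, losses in [("1st round", 4, 6), ("2nd round", 8, 2)]:
    odds, probability = odds_and_probability(wins, losses)
    print(f"{round_name}: odds = {odds:.2f}, probability = {probability:.0%}")
```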
Now, I’m interested in the ratio of the probability of winning to the probability of losing.
Since winning and losing are the only two outcomes, the probability of losing is 1 - (probability of winning): if you have an 80% probability of winning, you have a 20% (= 1 - 80%) probability of losing.
So the ratio can be written as probability of winning / (1 - probability of winning).
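Then, the ratio in the 1st and 2nd round is calculated as below.
$$\text{1st round: } \frac{0.4}{1-0.4} = \frac{0.4}{0.6} \approx 0.67, \qquad \text{2nd round: } \frac{0.8}{1-0.8} = \frac{0.8}{0.2} = 4.0$$
Do these numbers, 0.67 and 4.0, look familiar? They are the winning odds of Shohoku high school in the 1st and 2nd round. That is, the winning odds of Shohoku high school can be calculated as
$$\text{winning odds} = \frac{P(\text{win})}{1 - P(\text{win})}$$
Simply put, writing p for the probability of winning:
$$\text{odds} = \frac{p}{1-p}$$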
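As a quick check that p/(1 - p) gives the same numbers as dividing wins by losses, here is a Python sketch using exact fractions; `odds_from_probability` is a hypothetical helper name, not a library function:

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds = p / (1 - p), defined for 0 <= p < 1."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return p / (1 - p)


# For each round, p/(1-p) should equal wins/losses exactly.
for wins, losses in [(4, 6), (8, 2)]:
    p = Fraction(wins, wins + losses)
    odds = odds_from_probability(p)
    assert odds == Fraction(wins, losses)
    print(f"p = {p}, odds = {odds} ≈ {float(odds):.2f}")
```

Exact `Fraction` arithmetic is used here so the equality check is not disturbed by floating-point rounding; the function also accepts ordinary floats.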
In short, odds are the ratio of p to (1 - p) for a categorical outcome with two values (e.g., win/lose, pass/fail, male/female).
2) log (odds)
Here is another story about Shohoku high school. The main players, Takenori Akagi, Hisashi Mitsui, Ryota Miyagi, Kaede Rukawa and Hanamichi Sakuragi, graduated and left Shohoku high school. Without them the team became much weaker, and now it is playing in another tournament.
A) In the 1st round, they won 1 game and lost 4 games out of 5 games.
B) In the 2nd round, they won 1 game and lost 8 games out of 9 games.
C) In the 3rd round, they won 1 game and lost 16 games out of 17 games.
D) In the 4th round, they won 1 game and lost 32 games out of 33 games.
The winning odds of Shohoku high school are
A) 1/4 = 0.25
B) 1/8 = 0.125
C) 1/16 ≈ 0.062
D) 1/32 ≈ 0.031
The weaker the team gets, the closer its winning odds get to 0. In other words, whenever the team loses more games than it wins, the odds are squeezed into the narrow range between 0 and 1.
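The squeeze toward 0 is easy to check from the records themselves. A small Python sketch; `winning_odds` is a hypothetical helper written for this example:

```python
from fractions import Fraction


def winning_odds(records):
    """Exact winning odds wins/losses for each (wins, losses) pair."""
    return [Fraction(wins, losses) for wins, losses in records]


# (wins, losses) per round of the weakened tournament, copied from the story
weakened_records = [(1, 4), (1, 8), (1, 16), (1, 32)]
weak_odds = winning_odds(weakened_records)

# More losses than wins in every round, so every odds value lies in (0, 1)
assert all(0 < odds < 1 for odds in weak_odds)
print([float(odds) for odds in weak_odds])  # [0.25, 0.125, 0.0625, 0.03125]
```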
After this tournament, Shohoku high school rebuilt its strategy and developed top players again.
In the next tournament,
A) In the 1st round, they won 4 games and lost 1 game out of 5 games.
B) In the 2nd round, they won 7 games and lost 2 games out of 9 games.
C) In the 3rd round, they won 15 games and lost 2 games out of 17 games.
D) In the 4th round, they won 30 games and lost 3 games out of 33 games.
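In this case, the winning odds of Shohoku high school are
A) 4/1 = 4
B) 7/2 = 3.5
C) 15/2 = 7.5
D) 30/3 = 10.0
Let’s arrange the winning odds on a number line.
The odds of the weakened team are all squeezed between 0 and 1, while the odds of the strengthened team spread out from 1 toward infinity. This asymmetry makes the winning odds hard to compare: a record of 1 win to 4 losses and a record of 4 wins to 1 loss are equally lopsided, yet their odds, 0.25 and 4, sit 0.75 below 1 and 3 above 1, respectively. How can we remove this asymmetry?
Here is a joke about logs!! “If your data seem weird, take logarithms, and you’ve solved the problem.”
So, let’s take the logarithm (base 10 here).
log (odds)
When the team strategy was weakened,
log (0.25) = -0.60
log (0.125) = -0.90
log (0.062) = -1.21
log (0.031) = -1.51
When the team strategy was strengthened,
log (4) = 0.60
log (3.5) = 0.54
log (7.5) = 0.88
log (10.0) = 1.00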
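Python’s `math.log10` reproduces these base-10 values. The sketch below computes the odds directly from the win/loss records, so 1/16 is used exactly (0.0625) and gives -1.20 instead of the -1.21 obtained from the rounded 0.062; `log10_odds` is a hypothetical helper name:

```python
import math

# (wins, losses) per round, copied from the two tournaments in the story
weakened = [(1, 4), (1, 8), (1, 16), (1, 32)]
strengthened = [(4, 1), (7, 2), (15, 2), (30, 3)]


def log10_odds(records):
    """Base-10 log of the winning odds (wins / losses), rounded to 2 decimals."""
    return [round(math.log10(wins / losses), 2) for wins, losses in records]


print("weakened:    ", log10_odds(weakened))
print("strengthened:", log10_odds(strengthened))

# log(1/x) = -log(x): odds of 1/4 and 4 land at the same distance from 0
assert math.isclose(math.log10(1 / 4), -math.log10(4))
```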
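After taking logs, the values become symmetric around 0: odds below 1 turn into negative numbers and odds above 1 into positive numbers. Because log(1/x) = -log(x), odds of 1/4 and 4 land at -0.60 and 0.60, the same distance from 0.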
This process can be expressed as a formula:
$$\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right)$$
We call log(p/(1-p)) the ‘logit function’, and it is the basic concept behind logistic regression.
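A minimal sketch of the logit function in Python. It uses the natural logarithm (`math.log`), the usual convention in logistic regression; switching from base 10 only multiplies every value by ln 10 ≈ 2.303, so the symmetry around 0 is unchanged. The name `logit` here is our own definition, not an import from a library:

```python
import math


def logit(p: float) -> float:
    """Log-odds of probability p: log(p / (1 - p)), using the natural log."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1 - p))


print(logit(0.5))  # even odds (1:1) map to 0.0
print(logit(0.8))  # 80% chance of winning -> positive log-odds
print(logit(0.2))  # 20% chance of winning -> same size, negative sign
```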
Why is log(odds) important?
Kiminobu Kogure majored in data science in college, and now he works at Shohoku high school as a strategy analyst. He analyzed Shohoku high school’s win/loss records over several years and obtained 100 log(odds) values. Plotted as a histogram, these values look roughly like a normal distribution.
If the data follow a normal distribution, many more statistical analyses become available, which is particularly useful for categorical outcomes such as win/lose.
Wrap up!!
1) Odds are the ratio of the probability that something happens to the probability that it does not happen.
2) The formula is p/(1-p).
3) Taking log(odds) turns the asymmetric odds scale into one that is symmetric around 0.
Keypoint!!
Although odds are a ratio, they are not the same thing as an ‘odds ratio’.
In the next post, I’ll explain what odds ratio is.