【ML】Understanding "Logit" from Its Mechanism

Published on 2024/05/24

Machine Learning

logit

tech

1. Introduction

I came across the word "logit" (or "logits") somewhere in the context of machine learning, and I couldn't understand it. So I'm writing this article for people like me.

2. What is the logit?

2.1 The value before the softmax layer

The usage I have seen most often is for the values just before the softmax layer is applied. In other words, the raw values before they are turned into probabilities are called logits.
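To make "the value before softmax" concrete, here is a minimal sketch in plain Python. The three logit values are made-up numbers standing in for the raw outputs of a hypothetical 3-class classifier, and the `softmax` helper is written by hand for illustration, not taken from any particular library.

```python
import math

def softmax(logits):
    """Convert a list of raw scores (logits) into probabilities."""
    # Subtracting the max keeps exp() from overflowing; it does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a 3-class classifier (made-up numbers).
logits = [2.0, 1.0, -1.0]
probs = softmax(logits)

print(logits)      # raw values: any real number, no need to sum to 1
print(probs)       # each value lies between 0 and 1
print(sum(probs))  # approximately 1.0
```

The logits can be any real numbers, while the outputs of softmax always lie between 0 and 1 and sum to 1. The largest logit gets the largest probability.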

That is the conclusion, but let's look a little deeper.

2.2 Log Odds

There is a quantity called the odds, which is calculated from a probability $p$. It is the ratio of the probability of the event occurring to the probability of it not occurring.
Odds $= \dfrac{p}{1-p}$

Applying a log transformation to the odds gives the log odds (logit).
Log odds $= \log\left(\dfrac{p}{1-p}\right)$
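The two formulas above can be tried out with a small sketch. The function names `odds` and `logit` are just illustrative names chosen here, and the probabilities in the loop are arbitrary example values.

```python
import math

def odds(p):
    """Odds of an event with probability p (requires 0 < p < 1)."""
    return p / (1 - p)

def logit(p):
    """Log odds of an event with probability p (natural logarithm)."""
    return math.log(odds(p))

# Arbitrary example probabilities.
for p in (0.1, 0.5, 0.8, 0.99):
    print(f"p={p:.2f}  odds={odds(p):.4f}  log odds={logit(p):.4f}")
```

A probability of 0.5 gives odds of 1 and log odds of 0. Probabilities below 0.5 give negative log odds, and probabilities above 0.5 give positive log odds.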

In logistic regression, the model predicts this value by estimating the coefficients $\beta_n$ for the explanatory (predictor) variables $X_n$.

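By training the model so that $\text{Logit} = \log\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$ holds, the probability can be recovered as $p=\dfrac{e^{\text{Logit}}}{1+e^{\text{Logit}}}$. This calculation is the sigmoid (logistic) function, which is exactly softmax in the two-class case: softmax over the two values $\text{Logit}$ and $0$ gives the same $p$.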
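Here is a minimal sketch of that round trip. The coefficients `beta` and the inputs `x` are hypothetical numbers picked for illustration, not fitted values. It also checks that $\dfrac{e^{z}}{1+e^{z}}$ (the sigmoid function) matches a two-class softmax over the values $z$ and $0$, which is the precise sense in which this formula relates to softmax.

```python
import math

def sigmoid(z):
    """Map a logit z to a probability: e^z / (1 + e^z)."""
    return math.exp(z) / (1 + math.exp(z))

def logit(p):
    """Map a probability p back to its log odds."""
    return math.log(p / (1 - p))

# Hypothetical coefficients and inputs (assumed for illustration only).
beta = [-1.0, 0.8, 0.5]  # beta_0, beta_1, beta_2
x = [2.0, 1.0]           # X_1, X_2

# Linear predictor: beta_0 + beta_1 * X_1 + beta_2 * X_2 (about 1.1 here).
z = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# Probability recovered from the logit.
p = sigmoid(z)

# Two-class softmax over the pair (z, 0) gives the same probability.
p_softmax = math.exp(z) / (math.exp(z) + math.exp(0.0))

print(z, p, p_softmax)
print(logit(p))  # recovers z up to floating-point error
```

Because `logit` and `sigmoid` are inverses of each other, a model that outputs the logit loses no information compared to one that outputs the probability.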

Yes, this is why the value before softmax is called the logit.

3. Why doesn't the model predict the probability directly?

It's a natural question: why doesn't the model predict the probability directly? There are a few reasons.

Linear Relationship
Logistic regression models aim to find a linear relationship between the predictors (independent variables) and the outcome. But probabilities, which range from 0 to 1, do not have a linear relationship with the predictors. Log odds, on the other hand, range from $-\infty$ to $+\infty$ and can be modeled linearly, so they are used instead.
Avoiding Non-linearity
Directly modeling probabilities would require a non-linear approach, because the relationship between the predictors and the probability is inherently non-linear.
By transforming probabilities into log odds with the logit function, we can use a linear model. (Only at the end is the non-linear sigmoid used to convert the raw value into a probability.)
The reasons for preferring a linear method are simplicity, computational efficiency, lower overfitting risk, and so on.
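The range problem can be seen with a few numbers. In this sketch, `beta0` and `beta1` are hypothetical coefficients chosen only for illustration. The same linear expression is read once as if it were a probability and once as a logit that is passed through the sigmoid.

```python
import math

def sigmoid(z):
    """Convert a logit into a probability."""
    return 1 / (1 + math.exp(-z))

# Hypothetical coefficients (assumed for illustration only).
beta0, beta1 = 0.5, 0.3
xs = [-10, -2, 0, 2, 10]

rows = []
for x in xs:
    linear = beta0 + beta1 * x            # unbounded linear predictor
    rows.append((x, linear, sigmoid(linear)))

for x, linear, p in rows:
    print(f"x={x:>3}  linear={linear:>5.2f}  sigmoid(linear)={p:.4f}")
```

Read directly as a probability, the linear value leaves the valid range for large or small `x` (for example, it is negative at `x = -10`). Read as a logit, every value is valid, and the sigmoid always returns a number strictly between 0 and 1.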

4. Summary

A logit is the value that exists just before the softmax layer, and the name comes from log odds.