【ML】Understanding "Logit" from its mechanism


1. Introduction

I came across the word "logit" (or "logits") in a machine learning context and could not understand it. So I am writing this article for people like me.

2. What is the logit?

2.1 Value before softmax layer

The usage I have seen most often is for the value just before it is passed to the softmax layer. It seems that the value before it becomes a probability is called a logit.

This is the conclusion, but let's look a little deeper.

2.2 Log Odds

There is a quantity called odds, which is calculated from a probability p. It is the ratio of the probability of the event occurring to the probability of it not occurring.
Odds = \dfrac{p}{1-p}

Applying a log transform to the odds gives the log odds (logit).
Log odds = \log\left(\dfrac{p}{1-p}\right)
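
As a quick check of the two formulas above, here is a minimal sketch in plain Python. The function names `odds` and `log_odds` are only illustrative names chosen for this article, not part of any library.

```python
import math

def odds(p):
    # Ratio of the probability of the event occurring to it not occurring.
    return p / (1.0 - p)

def log_odds(p):
    # Log transform of the odds: this is the logit of p.
    return math.log(odds(p))

for p in [0.2, 0.5, 0.8]:
    print(p, odds(p), log_odds(p))
```

At p = 0.5 the odds are 1 and the log odds are 0. Probabilities below 0.5 give negative log odds, and probabilities above 0.5 give positive ones, symmetrically around 0.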

In a logistic regression task, the model predicts this value by estimating the coefficients \beta_n for the explanatory (predictor) variables X_n.

By training the model so that Logit = \log\left(\dfrac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n holds, the probability can be recovered using p=\dfrac{e^{Logit}}{1+e^{Logit}}. This calculation is the sigmoid function, which is exactly softmax in the two-class case.

Yes, this is the reason the value before softmax is called logit.
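
The relationship can be checked numerically. The following is a minimal sketch using only the standard library; `sigmoid`, `logit`, and `softmax` are helper names written for this article. It assumes a two-class setup where the second class has its raw value fixed at 0, which is a common way to see that p=\dfrac{e^{Logit}}{1+e^{Logit}} matches a two-class softmax.

```python
import math

def sigmoid(z):
    # p = e^z / (1 + e^z), the formula from the text.
    return math.exp(z) / (1.0 + math.exp(z))

def logit(p):
    # Log odds of p, the inverse of sigmoid.
    return math.log(p / (1.0 - p))

def softmax(zs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

z = 1.5                      # an arbitrary raw value (logit)
p = sigmoid(z)               # raw value -> probability
print(p, logit(p))           # logit(p) recovers z up to rounding
print(softmax([z, 0.0])[0])  # same probability as sigmoid(z)
```

So converting a logit to a probability and taking the log odds of that probability are inverse operations, and the softmax over [z, 0] assigns the first class the same probability as the sigmoid of z.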

3. Why doesn't the model predict probability directly?

It's a natural question: why doesn't the model predict the probability directly? There are a few reasons.

  1. Linear Relationship
    Logistic regression models aim to find a linear relationship between the predictors (independent variables) and the outcome. But probabilities (which range from 0 to 1) do not have a linear relationship with the predictors. Log odds, however, can range from -\infty to +\infty and can be modeled linearly, so they are used instead.

  2. Avoiding Non-linearity
    Directly modeling probabilities would require a non-linear approach because the probability function is inherently non-linear.
    By transforming probabilities into log odds using the logit function, we can use a linear model (only at the end, the non-linear sigmoid is used to convert the raw value into a probability).
    The reasons for preferring a linear method are simplicity, computational efficiency, low overfitting risk, etc.
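
The two points above can be illustrated with a small sketch. The coefficients `beta_0` and `beta_1` below are hypothetical values picked only for the demonstration, not fitted to any data. The linear part is free to take any real value, so it cannot be read as a probability itself, but passing it through the sigmoid always lands strictly between 0 and 1.

```python
import math

def sigmoid(z):
    # Converts a raw value (logit) into a probability.
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical coefficients for a single predictor x.
beta_0, beta_1 = -1.0, 0.5

def linear_predictor(x):
    # The linear part: this is the logit, not the probability.
    return beta_0 + beta_1 * x

for x in [-20, -2, 0, 2, 20]:
    z = linear_predictor(x)
    print(f"x={x:>4}  logit={z:>6.1f}  p={sigmoid(z):.5f}")
```

Equal steps in x move the logit by equal amounts, while the probability changes quickly near 0.5 and flattens out near 0 and 1.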

4. Summary

A logit is the value just before the softmax layer, and the name comes from log odds.