【ML】Understanding "Logit" through Its Mechanism
1. Introduction
I kept hearing the word "logit" (or "logits") in the context of machine learning without understanding it, so I am writing this article for people like me.
2. What is the logit?
2.1 Value before softmax layer
The usage I see most often refers to the value that is fed into the softmax layer: the raw value just before it becomes a probability is called the logit (or logits).
That is the conclusion, but let's dig a little deeper.
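As a rough illustration, the Python sketch below turns a list of raw scores into probabilities with a softmax. The scores are hypothetical outputs of a 3-class classifier, and `softmax` is a small hand-written helper, not a library function.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtracting the max avoids overflow in exp() and does not change the result.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of a 3-class model: these are the logits.
logits = [2.0, 1.0, 0.1]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The logits themselves can be any real numbers; only after the softmax do they become non-negative values that sum to 1.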
2.2 Log Odds
There is a quantity called the odds, calculated from a probability $p$:
$$\text{odds} = \frac{p}{1-p}$$
Applying a log transform gives the log odds, which is the logit:
$$\text{logit}(p) = \log\frac{p}{1-p}$$
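To make the two definitions concrete, here is a small Python sketch computing the odds $p/(1-p)$ and the log odds $\log(p/(1-p))$ for a few probabilities. The helper names `odds` and `logit` are chosen for illustration only.

```python
import math

def odds(p):
    """Odds of an event with probability p, for 0 < p < 1."""
    return p / (1 - p)

def logit(p):
    """Log odds: the natural logarithm of the odds."""
    return math.log(odds(p))

for p in [0.1, 0.5, 0.8]:
    print(p, round(odds(p), 3), round(logit(p), 3))
# 0.1 0.111 -2.197
# 0.5 1.0 0.0
# 0.8 4.0 1.386
```

Note that $p = 0.5$ maps to log odds 0, probabilities below 0.5 give negative log odds, and probabilities above 0.5 give positive ones.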
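In logistic regression, the model predicts this value as a linear combination of the inputs by estimating the coefficients $\beta$:
$$\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n$$
The coefficients are learned so that this linear output, call it $z$, matches the log odds. Inverting the logit then turns $z$ back into a probability with the sigmoid function:
$$p = \sigma(z) = \frac{1}{1 + e^{-z}}$$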
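The sketch below assumes a one-feature model with hypothetical coefficients `beta0 = -1.0` and `beta1 = 0.5`. Its linear output `z = beta0 + beta1 * x` is read as log odds, the sigmoid $1/(1+e^{-z})$ converts it to a probability, and applying the logit to that probability gives `z` back, showing that the two functions are inverses.

```python
import math

def sigmoid(z):
    """Inverse of the logit: maps any real z to a probability in (0, 1)."""
    # Branch on the sign so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p):
    """Log odds of probability p."""
    return math.log(p / (1 - p))

# Hypothetical fitted coefficients for a model with one feature x.
beta0, beta1 = -1.0, 0.5

for x in [0.0, 2.0, 6.0]:
    z = beta0 + beta1 * x   # linear output, interpreted as log odds
    p = sigmoid(z)          # converted into a probability
    print(x, z, round(p, 3), round(logit(p), 6))
```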
This is why the value before the softmax is called the logit.
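The name carries over to softmax because softmax is the multi-class generalization of the sigmoid. A quick numerical check, using hand-written `softmax` and `sigmoid` helpers and hypothetical logit values: with two classes whose logits are $z$ and $0$, softmax returns exactly $\sigma(z)$, and the log of the ratio of two softmax probabilities equals the difference of their logits, i.e. the log odds between those classes.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Two classes with logits [z, 0]: softmax reduces to the sigmoid.
z = 1.5
p_pos, p_neg = softmax([z, 0.0])
print(p_pos, sigmoid(z))         # the two values agree
print(math.log(p_pos / p_neg))   # recovers z: the log odds of class 0 vs class 1

# Three classes: log(p_i / p_j) equals logit_i - logit_j.
multi = softmax([2.0, 0.5, -1.0])
print(math.log(multi[0] / multi[1]))  # approximately 2.0 - 0.5 = 1.5
```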
3. Why doesn't the model predict probabilities directly?
A natural question is why the model does not predict probabilities directly. There are several reasons.

Linear Relationship
Logistic regression aims to find a linear relationship between the predictors (independent variables) and the outcome. Probabilities, however, are confined to the range 0 to 1, so they cannot depend linearly on predictors that can take any value. Log odds, on the other hand, range from $-\infty$ to $+\infty$ and can be modeled linearly, which is why they are used.
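A small numerical check of this point, with hypothetical coefficients `beta0 = 0.2` and `beta1 = 0.3`: a linear function of one predictor easily leaves the interval [0, 1], so it cannot be a probability. If the same linear output is instead read as log odds and passed through the sigmoid, every value becomes a valid probability.

```python
import math

# Hypothetical coefficients of a linear model in one predictor x.
beta0, beta1 = 0.2, 0.3

def linear(x):
    return beta0 + beta1 * x

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

xs = [-10, 0, 10]
raw = [linear(x) for x in xs]        # unbounded: some values fall outside [0, 1]
as_prob = [sigmoid(z) for z in raw]  # raw treated as log odds: always in (0, 1)
print(raw)
print(as_prob)
```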
Avoiding Nonlinearity
Directly modeling probabilities would require a nonlinear approach because the probability function is inherently nonlinear.
By transforming probabilities into log odds with the logit function, we can use linear regression techniques. Only at the very end is the nonlinear sigmoid used to convert the raw value into a probability.
Keeping the core of the model linear brings advantages such as simplicity, computational efficiency, and a lower risk of overfitting.
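Here is a minimal sketch of that pipeline on hypothetical toy data. Only the linear coefficients of the log odds (`b0`, `b1`) are learned, and the sigmoid is applied just to turn the linear output into a probability. For readability it uses plain gradient descent on the mean log loss; this is a swapped-in choice, since real libraries usually rely on more sophisticated solvers, and the learning rate and step count are arbitrary assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Fit log odds = b0 + b1 * x by gradient descent on the mean log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # The gradient of the mean log loss is the mean of (p - y) times each input.
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Hypothetical toy data: one feature and a binary label, not perfectly separable.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(xs, ys)

for x in xs:
    z = b0 + b1 * x  # linear part: the predicted log odds (the logit)
    print(x, round(z, 3), round(sigmoid(z), 3))
```

The data is deliberately not separable; on perfectly separable data the coefficients would keep growing instead of converging.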
4. Summary
The logit is the value that goes into the softmax layer, and its name comes from log odds.