
【ML】What is multiple regression analysis

Published 2024/10/07

1. What is multiple regression analysis

Multiple regression approximates an objective (target) variable by superimposing linear terms of the explanatory variables.
The prediction can be written as the product of the matrix of explanatory variables and the weight vector.
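
In matrix notation, with X denoting the matrix of explanatory variables (typically including a bias column of ones) and \beta the weight vector, the prediction is

\hat{y} = X\beta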

In neural-network terms, it is a single fully connected layer with no activation function.
Because there is no activation function it cannot capture nonlinearity, but because there are so few parameters the optimal weights can be found analytically.

The loss function is usually the sum of squared errors (least squares), and with the performance of modern computers the optimal solution (the weights that minimize the loss) can be obtained directly by matrix computation.
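
As a minimal sketch of that matrix computation (assuming NumPy and a synthetic dataset made up purely for illustration), the least-squares weights can be solved for directly:

```python
import numpy as np

# Synthetic data: y = 2*x1 - 3*x2 + 1 + noise (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1 + rng.normal(scale=0.1, size=100)

# Append a column of ones so the intercept is learned as one more weight
X_b = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve the least-squares problem; equivalent to the normal equation
# w = (X^T X)^{-1} X^T y, but numerically more stable
w, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(w)  # roughly [2, -3, 1]
```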

Because the weights are fit optimally to the training data, overfitting is a risk, but since the output is linear to begin with it does not seem to be much of a problem.
There are also ridge regression and lasso regression, which add regularization terms to the loss.

2. Ridge Regression (L2 Regularization)

Ridge regression adds a penalty proportional to the sum of the squared coefficients to the loss function. This shrinks the coefficients toward zero but does not eliminate any of them, so all features remain in the model with reduced impact.

Mathematically, the loss function for Ridge regression is:

\text{Loss} = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \beta_j^2

Where:

  • y_i is the actual value,
  • \hat{y}_i is the predicted value,
  • \beta_j are the coefficients of the regression,
  • \lambda is the regularization strength (a hyperparameter you choose).
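
Ridge also has a closed-form solution, \beta = (X^\top X + \lambda I)^{-1} X^\top y. A rough sketch (assuming NumPy, data with the intercept handled separately, and `ridge_fit` as a made-up helper name):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    # Closed-form ridge solution: beta = (X^T X + lam * I)^{-1} X^T y
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Same kind of synthetic data as before (illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=100)

print(ridge_fit(X, y, lam=0.0))   # ~ ordinary least squares
print(ridge_fit(X, y, lam=50.0))  # coefficients shrink toward zero
```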

3. Lasso Regression (L1 Regularization)

Lasso regression, on the other hand, adds a penalty proportional to the absolute value of the coefficients. This regularization method can result in some coefficients being exactly zero, effectively performing feature selection by removing irrelevant features.

Mathematically, the loss function for Lasso regression is:

\text{Loss} = \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |\beta_j|

Because Lasso can shrink some coefficients exactly to zero, it is useful when you suspect that many features are irrelevant or redundant.
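
Unlike ridge, lasso has no closed-form solution and is usually fit iteratively. A small sketch (assuming scikit-learn is installed; its `alpha` parameter plays the role of \lambda up to a scaling convention, and the data is synthetic) showing how lasso zeroes out irrelevant coefficients while ridge only shrinks them:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data with 10 features, only the first two of which matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# Lasso drives the irrelevant coefficients exactly to zero;
# ridge merely shrinks them toward zero.
print("lasso:", np.round(lasso.coef_, 2))
print("ridge:", np.round(ridge.coef_, 2))
```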
