
The Relationship Between Linear Transformations of Explanatory Variables and Regression Coefficients in Regression Analysis (2)

Published on 2024/07/21

Introduction

This article is a continuation of the previous one.

For the regression coefficient vector of multiple linear regression,

\hat{\bm{\beta}} = (X^\top X)^{-1} X^\top \bm{y} = \frac{1}{\det{X^\top X}} \widetilde{X^\top X} \ X^\top \bm{y}

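the previous article confirmed the following two facts about how $\det{X^\top X}$ transforms:

Multiplying the data of one explanatory variable by a constant $c$ multiplies $\det{X^\top X}$ by $c^2$.
Adding a constant to the data of an explanatory variable leaves $\det{X^\top X}$ unchanged.

(Here $\displaystyle X = \begin{pmatrix} \bm{1} & \bm{x}_1 & \cdots & \bm{x}_p \end{pmatrix}$ is the design matrix formed from the $p$ explanatory-variable data vectors $\bm{x}_1, \ldots, \bm{x}_p \in \mathbb{R}^n$ and the constant vector $\bm{1} \in \mathbb{R}^n$ whose entries are all $1$. As before, this article takes $p = 2$ for simplicity.)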
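The two determinant facts recalled above can be checked numerically. The following is a minimal sketch on synthetic data, not taken from the previous article; the helper name `gram_det` and the constants `c` and `b` are introduced here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; synthetic data for illustration only
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
ones = np.ones(n)


def gram_det(*columns):
    """Return det(X^T X) for the design matrix with the given columns."""
    X = np.column_stack(columns)
    return np.linalg.det(X.T @ X)


c, b = 3.0, 7.0
base = gram_det(ones, x1, x2)
scaled = gram_det(ones, c * x1, x2)    # x1 -> c * x1
shifted = gram_det(ones, x1 + b, x2)   # x1 -> x1 + b * 1

print(np.isclose(scaled, c**2 * base))  # determinant picks up a factor c^2
print(np.isclose(shifted, base))        # determinant is unchanged by the shift
```

Both comparisons use `np.isclose` rather than exact equality because the determinants are computed in floating point.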

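This article examines how $\displaystyle \widetilde{X^\top X} \ X^\top \bm{y}$ transforms, and thereby how the regression coefficients change under a (linear) transformation of the explanatory-variable data. It also looks at how the predicted values transform, and checks the results numerically in Python.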

Conclusion first

Suppose a linear regression model

f_X(t_1, t_2) = \hat{\beta}_0 + \hat{\beta}_1 t_1 + \hat{\beta}_2 t_2

has been fitted from the design matrix $\displaystyle X = \begin{pmatrix} \bm{1} & \bm{x}_1 & \bm{x}_2 \end{pmatrix}$.

Now transform the data by $\bm{x}_1 \to \bm{x}_1^{new} = c_1 \bm{x}_1 + d_1 \bm{1}, \ \bm{x}_2 \to \bm{x}_2^{new} = c_2 \bm{x}_2 + d_2 \bm{1}$ (with $c_1, c_2 \neq 0$) and form the design matrix $\displaystyle X^{new} = \begin{pmatrix} \bm{1} & \bm{x}_1^{new} & \bm{x}_2^{new} \end{pmatrix}$. The linear regression model fitted from $X^{new}$ is the model above with $t_1$ replaced by $\displaystyle \frac{t_1^{new} - d_1}{c_1}$ and $t_2$ replaced by $\displaystyle \frac{t_2^{new} - d_2}{c_2}$:

f_{X^{new}}(t_1^{new}, t_2^{new}) = \hat{\beta}_0 + \hat{\beta}_1 \left( \frac{t_1^{new} - d_1}{c_1} \right) + \hat{\beta}_2 \left( \frac{t_2^{new} - d_2}{c_2} \right)

Feeding the "training data" into the transformed model $f_{X^{new}}$, that is, the values $x_{i1}^{new}, x_{i2}^{new}$ in the $i$-th row of the design matrix $X^{new}$, gives

\begin{align*} f_{X^{new}}(x_{i1}^{new}, x_{i2}^{new}) &= \hat{\beta}_0 + \hat{\beta}_1 \left( \frac{x_{i1}^{new} - d_1}{c_1} \right) + \hat{\beta}_2 \left( \frac{x_{i2}^{new} - d_2}{c_2} \right) \\ &= \hat{\beta}_0 + \hat{\beta}_1 \left( \frac{(c_1 x_{i1} + d_1) - d_1}{c_1} \right) + \hat{\beta}_2 \left( \frac{(c_2 x_{i2} + d_2) - d_2}{c_2} \right) \\ &= \hat{\beta}_0 + \hat{\beta}_1 x_{i1} + \hat{\beta}_2 x_{i2} \\ &= f_{X}(x_{i1}, x_{i2}) \end{align*}

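Thus, as long as the inputs are the training data, the predicted values are invariant under a (linear) transformation of the training data. Since $\bm{y}$ itself is not transformed, the coefficient of determination $R^2$ is invariant as well.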
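The claim can be sanity-checked numerically before going through the derivation. The sketch below uses synthetic data and NumPy's least-squares solver; the helper name `ols` and the constants `c1, d1, c2, d2` are assumptions made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)  # synthetic data for illustration only
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.1, size=n)


def ols(x1, x2, y):
    """Return the design matrix (1, x1, x2) and its least-squares coefficients."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X, beta


c1, d1, c2, d2 = 2.0, 5.0, -0.5, 3.0
X, beta = ols(x1, x2, y)
X_new, beta_new = ols(c1 * x1 + d1, c2 * x2 + d2, y)

# Coefficients obtained by substituting t_k = (t_k_new - d_k) / c_k into f_X
expected = np.array([
    beta[0] - d1 / c1 * beta[1] - d2 / c2 * beta[2],
    beta[1] / c1,
    beta[2] / c2,
])

print(np.allclose(beta_new, expected))          # coefficients follow the stated rule
print(np.allclose(X_new @ beta_new, X @ beta))  # fitted values on training data agree
```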

The rest of this article verifies these claims.

Verification by computing determinants

Writing the entries of the adjugate matrix $\displaystyle \widetilde{X^\top X}$ as $*_{ij}$ (with indices starting from $0$), we have

\hat{\bm{\beta}} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = \frac{1}{\det{X^\top X}} \widetilde{X^\top X} \ X^\top \bm{y} = \frac{1}{\det{X^\top X}} \begin{pmatrix} *_{00} & *_{01} & *_{02} \\ *_{10} & *_{11} & *_{12} \\ *_{20} & *_{21} & *_{22} \end{pmatrix} \begin{pmatrix} \bm{1}^\top \bm{y} \\ \bm{x}_1^\top \bm{y} \\ \bm{x}_2^\top \bm{y} \end{pmatrix}

The transformation behavior of $\det{X^\top X}$ is already known, so it suffices to examine the remaining factors. Since

X^\top X = \begin{pmatrix} \bm{1}^\top \\ \bm{x}_1^\top \\ \bm{x}_2^\top \end{pmatrix} \begin{pmatrix} \bm{1} & \bm{x}_1 & \bm{x}_2 \\ \end{pmatrix} = \begin{pmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_1 & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{pmatrix}

the entries of the adjugate matrix are as follows (for the definition of the adjugate matrix see, for example, the previous article):

\begin{align*} & *_{00} = + \begin{vmatrix} \bm{x}_1^\top \bm{x}_1 & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix}, \quad *_{01} = - \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix}, \quad *_{02} = + \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{x}_1 & \bm{x}_1^\top \bm{x}_2 \end{vmatrix}, \\ & *_{10} = - \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix}, \quad *_{11} = + \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix}, \quad *_{12} = - \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix}, \\ & *_{20} = + \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_1 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_1 \end{vmatrix}, \quad *_{21} = - \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_1 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_1 \end{vmatrix}, \quad *_{22} = + \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_1 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_1 \end{vmatrix}. \end{align*}

Hence the components of the regression coefficient vector $\hat{\bm{\beta}} = (\hat{\beta}_0,\ \hat{\beta}_1 ,\ \hat{\beta}_2)^\top$ satisfy

\begin{align*} (\det{X^\top X}) \hat{\beta}_0 &= *_{00} \times \bm{1}^\top \bm{y} + *_{01} \times \bm{x}_1^\top \bm{y} + *_{02} \times \bm{x}_2^\top \bm{y} \\ &= \begin{vmatrix} \bm{x}_1^\top \bm{x}_1 & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} - \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} + \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{x}_1 & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y}, \\ (\det{X^\top X}) \hat{\beta}_1 &= *_{10} \times \bm{1}^\top \bm{y} + *_{11} \times \bm{x}_1^\top \bm{y} + *_{12} \times \bm{x}_2^\top \bm{y} \\ &= - \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} - \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y}, \\ (\det{X^\top X}) \hat{\beta}_2 &= *_{20} \times \bm{1}^\top \bm{y} + *_{21} \times \bm{x}_1^\top \bm{y} + *_{22} \times \bm{x}_2^\top \bm{y} \\ &= \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_1 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_1 \end{vmatrix} \bm{1}^\top \bm{y} - \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_1 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_1 \end{vmatrix} \bm{x}_1^\top \bm{y} + \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_1 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_1 \end{vmatrix} \bm{x}_2^\top \bm{y} \end{align*}

These are the expressions used in the rest of this section^[1].
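The adjugate form of the coefficient vector can be compared against a standard least-squares solver. The sketch below is only an illustration on synthetic data; the function name `adjugate` is hypothetical, and cofactors are computed naively from minors, which is fine for a 3×3 matrix but not something to use at scale.

```python
import numpy as np


def adjugate(A):
    """Adjugate of a square matrix A: the transpose of its cofactor matrix."""
    k = A.shape[0]
    cof = np.empty((k, k), dtype=float)
    for i in range(k):
        for j in range(k):
            # Minor obtained by deleting row i and column j
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return cof.T


rng = np.random.default_rng(2)  # synthetic data for illustration only
n = 30
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
G = X.T @ X

# beta = adj(X^T X) X^T y / det(X^T X)
beta_adj = adjugate(G) @ X.T @ y / np.linalg.det(G)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_adj, beta_lstsq))
```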

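Scaling by a constant

These expressions show that multiplying a variable by a constant $c \neq 0$ multiplies the coefficient of that variable by $1/c$, while multiplying any other variable by a constant has no effect on it.

For example, $\hat{\beta}_1$ satisfies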

(\det{X^\top X}) \hat{\beta}_1 = - \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} - \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y}

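When the corresponding explanatory-variable data $\bm{x}_1$ is multiplied by $c$ ( $\bm{x}_1 \to c \bm{x}_1$ ), the factor $\det{X^\top X}$ on the left-hand side is multiplied by $c^2$, and by the multilinearity of the determinant the right-hand side becomes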

\begin{align*} & - \begin{vmatrix} (c\bm{x}_1)^\top \bm{1} & (c\bm{x}_1)^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} (c\bm{x}_1)^\top \bm{y} - \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ (c\bm{x}_1)^\top \bm{1} & (c\bm{x}_1)^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} \\ = & - c \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + c \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} - c \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} \\ = & c \left( - \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} - \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} \right) \end{align*}

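that is, $c$ times the original. Consequently $\hat{\beta}_1$ is multiplied by $1/c$. If instead $\bm{x}_2$ is multiplied by $c$, both $\det{X^\top X}$ on the left-hand side and the right-hand side are multiplied by $c^2$, so $\hat{\beta}_1$ is unchanged.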
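The factors of $c$ and $c^2$ in this argument can be observed directly by tracking the numerator $(\det{X^\top X}) \hat{\beta}_1$. This is a rough numerical sketch on synthetic data; `det_and_beta` and the variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(3)  # synthetic data for illustration only
n = 40
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.1, size=n)


def det_and_beta(x1, x2, y):
    """Return det(X^T X) and the least-squares coefficients for X = (1, x1, x2)."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.linalg.det(X.T @ X), beta


c = 4.0
det_base, beta_base = det_and_beta(x1, x2, y)
det_s1, beta_s1 = det_and_beta(c * x1, x2, y)   # x1 -> c * x1
det_s2, beta_s2 = det_and_beta(x1, c * x2, y)   # x2 -> c * x2

# Numerator det(X^T X) * beta_1 in each case
num_base = det_base * beta_base[1]
num_s1 = det_s1 * beta_s1[1]
num_s2 = det_s2 * beta_s2[1]

print(np.isclose(num_s1, c * num_base), np.isclose(beta_s1[1], beta_base[1] / c))
print(np.isclose(num_s2, c**2 * num_base), np.isclose(beta_s2[1], beta_base[1]))
```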

Adding or subtracting a constant

The coefficient $\hat{\beta}_1$

Consider $\hat{\beta}_1$, which satisfies

(\det{X^\top X}) \hat{\beta}_1 = - \underbrace{ \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} }_{(i)} + \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} }_{(ii)} - \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} }_{(iii)}

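When the corresponding explanatory-variable data $\bm{x}_1$ is shifted uniformly by $b$ ( $\bm{x}_1 \to \bm{x}_1 + b\bm{1}$ ), the factor $\det{X^\top X}$ on the left-hand side is unchanged. Let us see how each term on the right-hand side transforms.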

(i) \to \begin{vmatrix} (\bm{x}_1 + b\bm{1})^\top \bm{1} & (\bm{x}_1 + b\bm{1})^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} = (i) + b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y},

\begin{align*} (ii) \to \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} (\bm{x}_1 + b \bm{1})^\top \bm{y} = (ii) + b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y}, \end{align*}

(iii) \to \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ (\bm{x}_1 + b \bm{1})^\top \bm{1} & (\bm{x}_1 + b \bm{1})^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} = (iii) + b \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \end{vmatrix} }_{=0} \bm{x}_2^\top \bm{y}

so the right-hand side after the transformation is

- \left\{ (i) + \cancel{ b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} } \right\} + \left\{ (ii) + \cancel{ b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} } \right\} - (iii)

which is still $- (i) + (ii) - (iii)$. Hence $\hat{\beta}_1$ is unchanged.

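When the other explanatory-variable data $\bm{x}_2$ is shifted uniformly by $b$ ( $\bm{x}_2 \to \bm{x}_2 + b\bm{1}$ ), the factor $\det{X^\top X}$ is again unchanged, and the terms transform as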

\begin{align*} (i) &\to \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top (\bm{x}_2 + b\bm{1}) \\ (\bm{x}_2 + b\bm{1})^\top \bm{1} & (\bm{x}_2 + b\bm{1})^\top (\bm{x}_2 + b\bm{1}) \end{vmatrix} \bm{1}^\top \bm{y} \\ &= \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top (\bm{x}_2 + b\bm{1}) \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top (\bm{x}_2 + b\bm{1}) \end{vmatrix} \bm{1}^\top \bm{y} + b \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top (\bm{x}_2 + b\bm{1}) \\ \bm{1}^\top \bm{1} & \bm{1}^\top (\bm{x}_2 + b\bm{1}) \end{vmatrix} \bm{1}^\top \bm{y} \\ &= (i) + b \underbrace{ \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{1} \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{1} \end{vmatrix} }_{=0} \bm{1}^\top \bm{y} + b \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + b^2 \underbrace{ \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{1} \\ \bm{1}^\top \bm{1} & \bm{1}^\top \bm{1} \end{vmatrix} }_{=0} \bm{1}^\top \bm{y}, \end{align*}

\begin{align*} (ii) &\to \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top (\bm{x}_2 + b \bm{1}) \\ (\bm{x}_2 + b \bm{1})^\top \bm{1} & (\bm{x}_2 + b \bm{1})^\top (\bm{x}_2 + b \bm{1}) \end{vmatrix} \bm{x}_1^\top \bm{y} \\ &= \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top (\bm{x}_2 + b \bm{1}) \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top (\bm{x}_2 + b \bm{1}) \end{vmatrix} \bm{x}_1^\top \bm{y} + b \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top (\bm{x}_2 + b \bm{1}) \\ \bm{1}^\top \bm{1} & \bm{1}^\top (\bm{x}_2 + b \bm{1}) \end{vmatrix} }_{=0} \bm{x}_1^\top \bm{y} \\ &= (ii) + b \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{1} \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{1} \end{vmatrix} }_{=0} \bm{x}_1^\top \bm{y}, \end{align*}

\begin{align*} (iii) &\to \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top (\bm{x}_2 + b \bm{1}) \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top (\bm{x}_2 + b \bm{1}) \end{vmatrix} (\bm{x}_2 + b \bm{1})^\top \bm{y} \\ &= \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top (\bm{x}_2 + b \bm{1}) \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top (\bm{x}_2 + b \bm{1}) \end{vmatrix} \bm{x}_2^\top \bm{y} + b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top (\bm{x}_2 + b \bm{1}) \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top (\bm{x}_2 + b \bm{1}) \end{vmatrix} \bm{1}^\top \bm{y} \\ &= (iii) + b \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{1} \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{1} \end{vmatrix} }_{=0} \bm{x}_2^\top \bm{y} + b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + b^2 \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{1} \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{1} \end{vmatrix} }_{=0} \bm{1}^\top \bm{y} \end{align*}

Hence the right-hand side after the transformation is

- \left\{ (i) + \cancel{ b \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} } \right\} + (ii) - \left\{ (iii) + \cancel{ b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} } \right\}

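The two cancelled determinants differ only by a row swap, so they are negatives of each other and the extra terms sum to zero. The right-hand side, and hence $\hat{\beta}_1$, is again unchanged.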
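As a quick numerical counterpart to the two cancellations above, the sketch below refits the model after shifting each variable in turn and compares only the slope entries. The data are synthetic and the helper `fit` is a name chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(4)  # synthetic data for illustration only
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.5 * x1 - 2.0 * x2 + rng.normal(scale=0.2, size=n)


def fit(x1, x2, y):
    """Least-squares coefficients (beta_0, beta_1, beta_2) for X = (1, x1, x2)."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


b = 10.0
beta = fit(x1, x2, y)
beta_shift_x1 = fit(x1 + b, x2, y)   # x1 -> x1 + b * 1
beta_shift_x2 = fit(x1, x2 + b, y)   # x2 -> x2 + b * 1

# Only the slopes beta_1, beta_2 are compared here; the intercept does change.
print(np.allclose(beta_shift_x1[1:], beta[1:]))
print(np.allclose(beta_shift_x2[1:], beta[1:]))
```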

The intercept term

Consider the intercept $\hat{\beta}_0$, which satisfies

(\det{X^\top X}) \hat{\beta}_0 = \underbrace{ \begin{vmatrix} \bm{x}_1^\top \bm{x}_1 & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} }_{(i)'} - \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} }_{(ii)'} + \underbrace{ \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{x}_1 & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} }_{(iii)'}

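When the explanatory-variable data $\bm{x}_1$ is shifted uniformly by $b$ ( $\bm{x}_1 \to \bm{x}_1 + b\bm{1}$ ), the factor $\det{X^\top X}$ on the left-hand side is unchanged. Let us see how each term on the right-hand side transforms.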

\begin{align*} (i)' &\to \begin{vmatrix} (\bm{x}_1 + b \bm{1})^\top (\bm{x}_1 + b \bm{1}) & (\bm{x}_1 + b \bm{1})^\top \bm{x}_2 \\ \bm{x}_2^\top (\bm{x}_1 + b \bm{1}) & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} \\ &= \begin{vmatrix} (\bm{x}_1 + b \bm{1})^\top \bm{x}_1 & (\bm{x}_1 + b \bm{1})^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + b \begin{vmatrix} (\bm{x}_1 + b \bm{1})^\top \bm{1} & (\bm{x}_1 + b \bm{1})^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} \\ &= (i)' + b \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + b \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + b^2 \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y}, \end{align*}

\begin{align*} (ii)' &\to \begin{vmatrix} \bm{1}^\top (\bm{x}_1 + b \bm{1}) & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top (\bm{x}_1 + b \bm{1}) & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} (\bm{x}_1 + b \bm{1})^\top \bm{y} \\ &= \begin{vmatrix} \bm{1}^\top (\bm{x}_1 + b \bm{1}) & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top (\bm{x}_1 + b \bm{1}) & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} + b \begin{vmatrix} \bm{1}^\top (\bm{x}_1 + b \bm{1}) & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top (\bm{x}_1 + b \bm{1}) & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} \\ &= (ii)' + b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} + b \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + b^2 \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} \end{align*}

\begin{align*} (iii)' &\to \begin{vmatrix} \bm{1}^\top (\bm{x}_1 + b \bm{1}) & \bm{1}^\top \bm{x}_2 \\ (\bm{x}_1 + b \bm{1})^\top (\bm{x}_1 + b \bm{1}) & (\bm{x}_1 + b \bm{1})^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} \\ &= \begin{vmatrix} \bm{1}^\top (\bm{x}_1 + b \bm{1}) & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top (\bm{x}_1 + b \bm{1}) & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} + b \underbrace{ \begin{vmatrix} \bm{1}^\top (\bm{x}_1 + b \bm{1}) & \bm{1}^\top \bm{x}_2 \\ \bm{1}^\top (\bm{x}_1 + b \bm{1}) & \bm{1}^\top \bm{x}_2 \end{vmatrix} }_{=0} \bm{x}_2^\top \bm{y} \\ &= (iii)' + b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} \end{align*}

Hence the right-hand side becomes

\begin{align*} & \left\{ (i)' + \cancel{ b \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} } + b \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} + \bcancel{ b^2 \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} } \right\} \\ - & \left\{ (ii)' + b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} + \cancel{ b \begin{vmatrix} \bm{1}^\top \bm{x}_1 & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{x}_1 & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} } + \bcancel{ b^2 \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} } \right\} \\ + & \left\{ (iii)' + b \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} \right\} \\ = & (i)' - (ii)' + (iii)' + b \left\{ \begin{vmatrix} \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{1}^\top \bm{y} - \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_2^\top \bm{1} & \bm{x}_2^\top \bm{x}_2 \end{vmatrix} \bm{x}_1^\top \bm{y} + \begin{vmatrix} \bm{1}^\top \bm{1} & \bm{1}^\top \bm{x}_2 \\ \bm{x}_1^\top \bm{1} & \bm{x}_1^\top \bm{x}_2 \end{vmatrix} \bm{x}_2^\top \bm{y} \right\} \\ =& (i)' - (ii)' + (iii)' - b (\det{X^\top X}) \hat{\beta}_1 \end{align*}

Since $\det{X^\top X}$ on the left-hand side is unchanged, dividing both sides by it gives

\hat{\beta}_0 \to \hat{\beta}_0 - b \hat{\beta}_1
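The same rule can be cross-checked without determinants by a change of variables. The short calculation below is a supplementary sketch rather than part of the determinant argument: it assumes $X^\top X$ is invertible, so that the least-squares fit is unique, and it relies on the shifted design matrix having the same column space as the original one.

```latex
\begin{align*}
\hat{\beta}_0 + \hat{\beta}_1 t_1 + \hat{\beta}_2 t_2
&= \hat{\beta}_0 + \hat{\beta}_1 (t_1^{new} - b) + \hat{\beta}_2 t_2
\qquad (t_1^{new} = t_1 + b) \\
&= (\hat{\beta}_0 - b \hat{\beta}_1) + \hat{\beta}_1 t_1^{new} + \hat{\beta}_2 t_2
\end{align*}
```

Reading off the coefficients gives the intercept $\hat{\beta}_0 - b \hat{\beta}_1$ with both slopes unchanged, in agreement with the determinant computation.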

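Summary of the determinant-based verification

The results so far are summarized in the tables below.

Scaling by a constant

Regression coefficient	$\bm{1} \to c\bm{1}$	$\bm{x}_1 \to c\bm{x}_1$	$\bm{x}_2 \to c\bm{x}_2$
$\hat{\beta}_0$	multiplied by $1/c$	unchanged	unchanged
$\hat{\beta}_1$	unchanged	multiplied by $1/c$	unchanged
$\hat{\beta}_2$	unchanged	unchanged	multiplied by $1/c$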

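Adding or subtracting a constant

Regression coefficient	$\bm{1} \to \bm{1} + b\bm{1}$	$\bm{x}_1 \to \bm{x}_1 + b\bm{1}$	$\bm{x}_2 \to \bm{x}_2 + b\bm{1}$
$\hat{\beta}_0$	multiplied by $1/(1+b)$	$\hat{\beta}_0 - b \hat{\beta}_1$	$\hat{\beta}_0 - b \hat{\beta}_2$
$\hat{\beta}_1$	unchanged	unchanged	unchanged
$\hat{\beta}_2$	unchanged	unchanged	unchanged

Shifting several explanatory variables at once, $\bm{x}_1 \to \bm{x}_1 + b_1\bm{1}, \bm{x}_2 \to \bm{x}_2 + b_2\bm{1}$, amounts to shifting them one at a time, so the coefficients transform as in the table above; in particular, $\hat{\beta}_0 \to \hat{\beta}_0 - b_1 \hat{\beta}_1 - b_2 \hat{\beta}_2$.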
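The simultaneous-shift case can be illustrated with a small numerical experiment. The sketch below assumes synthetic data and arbitrary shift amounts `b1` and `b2`; none of these values come from the article.

```python
import numpy as np

rng = np.random.default_rng(5)  # synthetic data for illustration only
n = 80
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = -1.0 + 0.7 * x1 + 1.2 * x2 + rng.normal(scale=0.3, size=n)


def fit(x1, x2, y):
    """Least-squares coefficients (beta_0, beta_1, beta_2) for X = (1, x1, x2)."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


b1, b2 = 3.0, -4.0
beta = fit(x1, x2, y)
beta_new = fit(x1 + b1, x2 + b2, y)   # both variables shifted at once

# Applying the single-variable rule twice: beta_0 - b1 * beta_1 - b2 * beta_2
expected_intercept = beta[0] - b1 * beta[1] - b2 * beta[2]

print(np.isclose(beta_new[0], expected_intercept))
print(np.allclose(beta_new[1:], beta[1:]))   # slopes stay the same
```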

Composite transformations

Writing $\displaystyle c\bm{x}_1 + d\bm{1} = c \left( \bm{x}_1 + \frac{d}{c}\bm{1} \right)$ shows that the transformation $\bm{x}_1 \to c \bm{x}_1 + d\bm{1}$ (with $c \neq 0$) is a shift by $d/c$ followed by scaling by $c$. The regression coefficients therefore transform as

\hat{\beta}_0 \to \hat{\beta}_0 - \frac{d}{c} \hat{\beta}_1, \quad \hat{\beta}_1 \to \frac{\hat{\beta}_1}{c}, \quad \hat{\beta}_2 \to \hat{\beta}_2

In other words, if the linear regression model before the transformation is

f_X(t_1, t_2) = \hat{\beta}_0 + \hat{\beta}_1 t_1 + \hat{\beta}_2 t_2

then the model after the transformation is

\begin{align*} f_{X^{new}}(t_1^{new}, t_2^{new}) &= \left( \hat{\beta}_0 - \frac{d}{c} \hat{\beta}_1 \right) + \frac{\hat{\beta}_1}{c} t_1^{new} + \hat{\beta}_2 t_2^{new} \\ &= \hat{\beta}_0 + \hat{\beta}_1 \left( \frac{t_1^{new} - d}{c} \right) + \hat{\beta}_2 t_2^{new} \end{align*}
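The rule can be packaged as a small helper that converts already-fitted coefficients instead of refitting. The sketch below is hypothetical: the function name `transform_coefficients` is made up, and it applies the rule to a general number $p$ of variables, which the article derives only for $p = 2$. The refit on synthetic data with $p = 3$ serves as a check under that assumption.

```python
import numpy as np


def transform_coefficients(beta, c, d):
    """Coefficients for the data c_k * x_k + d_k, given those fitted on x_k.

    beta: (beta_0, beta_1, ..., beta_p); c, d: length-p arrays with c_k != 0.
    """
    beta = np.asarray(beta, dtype=float)
    c = np.asarray(c, dtype=float)
    d = np.asarray(d, dtype=float)
    slopes = beta[1:] / c                      # beta_k -> beta_k / c_k
    intercept = beta[0] - np.sum(d * slopes)   # beta_0 - sum_k (d_k / c_k) beta_k
    return np.concatenate([[intercept], slopes])


def fit(x, y):
    """Least-squares coefficients for the design matrix (1, x)."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(6)  # synthetic data for illustration only
n, p = 120, 3
x = rng.normal(size=(n, p))
y = 2.0 + x @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
c = np.array([5.0, -2.0, 0.1])
d = np.array([20.0, 1.0, -3.0])

beta = fit(x, y)
beta_refit = fit(c * x + d, y)   # broadcasting applies c_k and d_k column-wise

print(np.allclose(beta_refit, transform_coefficients(beta, c, d)))
```

Converting coefficients this way avoids a second fit, which may be convenient when only the units of a feature change.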

Verification in Python

Let us check the above with the Python library statsmodels.

import pandas as pd
import statsmodels.api as sm

# Red wine data from the "Wine Quality Data Set" in the UCI Machine Learning Repository
df = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", sep=";")

### Multiple regression
### Predict quality from all the other columns

# Data used in the regression
x = df[['fixed acidity','volatile acidity','citric acid','residual sugar','chlorides','free sulfur dioxide','total sulfur dioxide','density','pH','sulphates','alcohol']]  # explanatory variables
y = df[['quality']]  # response variable

# Prepend a column of 1.0 to the explanatory variables for the intercept
X = sm.add_constant(x)

# Set up the model (OLS = ordinary least squares)
model = sm.OLS(y, X)

# Run the regression (stored as `result`, the name used in the rest of the article)
result = model.fit()

Now apply the (linear) transformation 5 * alcohol + 20 to alcohol.

X_transformed = X.copy()

c = 5
d = 20

X_transformed["alcohol"] = c * X_transformed["alcohol"] + d

# Set up the model (OLS = ordinary least squares)
model_transformed = sm.OLS(y, X_transformed)

# Run the regression
result_transformed = model_transformed.fit()

The regression coefficients of the two models are as follows.

result.params

const                   21.965208
fixed acidity            0.024991
volatile acidity        -1.083590
citric acid             -0.182564
residual sugar           0.016331
chlorides               -1.874225
free sulfur dioxide      0.004361
total sulfur dioxide    -0.003265
density                -17.881164
pH                      -0.413653
sulphates                0.916334
alcohol                  0.276198
dtype: float64

result_transformed.params

const                   20.860418
fixed acidity            0.024991
volatile acidity        -1.083590
citric acid             -0.182564
residual sugar           0.016331
chlorides               -1.874225
free sulfur dioxide      0.004361
total sulfur dioxide    -0.003265
density                -17.881164
pH                      -0.413653
sulphates                0.916334
alcohol                  0.055240
dtype: float64

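The only differences are in the intercept const and in the regression coefficient of the transformed variable alcohol.

Comparing the regression coefficients before and after the transformation

Comparing the regression coefficient of alcohol before and after the transformation,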

print(result_transformed.params.alcohol)
print(result.params.alcohol / c)

0.055239539845377286
0.05523953984537707

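so "alcohol coefficient after the transformation ≈ $1/c$ times the alcohol coefficient before the transformation" holds, up to floating-point rounding.

Comparing the intercepts before and after the transformation

Comparing the intercept const before and after the transformation,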

print(result_transformed.params.const)
print(result.params.const - (d / c) * result.params.alcohol)

20.860417652540093
20.86041765253972

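so "intercept const after the transformation ≈ intercept const before the transformation minus $d/c$ times the alcohol coefficient before the transformation" holds, up to floating-point rounding.

Comparing the predicted values and the coefficient of determination before and after the transformation

The predicted values before and after the transformation are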

result.predict(X)

0       5.032850
1       5.137880
2       5.209895
3       5.693858
4       5.032850
          ...   
1594    5.529771
1595    5.961613
1596    5.943043
1597    5.470756
1598    6.008196
Length: 1599, dtype: float64

result_transformed.predict(X_transformed)

0       5.032850
1       5.137880
2       5.209895
3       5.693858
4       5.032850
          ...   
1594    5.529771
1595    5.961613
1596    5.943043
1597    5.470756
1598    6.008196
Length: 1599, dtype: float64

which agree to the displayed precision.

The coefficients of determination $R^2$ before and after the transformation,

print(result.rsquared)
print(result_transformed.rsquared)

0.36055170303868855
0.36055170303868855

are also equal.
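The invariance of $R^2$ can also be reproduced without statsmodels by computing it from the residuals. The sketch below uses synthetic data rather than the wine data set and assumes the usual centered definition $R^2 = 1 - SS_{res}/SS_{tot}$ for a model with an intercept; the helper name `r_squared` is introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)  # synthetic data, not the wine data set
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.3 * x1 + 0.8 * x2 + rng.normal(size=n)


def r_squared(x1, x2, y):
    """Centered R^2 = 1 - SS_res / SS_tot for an OLS fit with intercept."""
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = np.sum(residuals**2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


c, d = 5.0, 20.0   # same constants as in the alcohol example
r2 = r_squared(x1, x2, y)
r2_new = r_squared(x1, c * x2 + d, y)   # x2 -> c * x2 + d * 1

print(np.isclose(r2, r2_new))
```

Because the fitted values on the training data are unchanged and $\bm{y}$ is not transformed, both $SS_{res}$ and $SS_{tot}$ stay the same, which is what the comparison reflects.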

Summary

To summarize, as stated at the beginning of the article: suppose a linear regression model

f_X(t_1, t_2) = \hat{\beta}_0 + \hat{\beta}_1 t_1 + \hat{\beta}_2 t_2

has been fitted from the design matrix $\displaystyle X = \begin{pmatrix} \bm{1} & \bm{x}_1 & \bm{x}_2 \end{pmatrix}$. Then the linear regression model fitted from the design matrix $\displaystyle X^{new} = \begin{pmatrix} \bm{1} & \bm{x}_1^{new} & \bm{x}_2^{new} \end{pmatrix}$, built from the transformed data $\bm{x}_1 \to \bm{x}_1^{new} = c_1 \bm{x}_1 + d_1 \bm{1}, \ \bm{x}_2 \to \bm{x}_2^{new} = c_2 \bm{x}_2 + d_2 \bm{1}$ (with $c_1, c_2 \neq 0$), is

f_{X^{new}}(t_1^{new}, t_2^{new}) = \hat{\beta}_0 + \hat{\beta}_1 \left( \frac{t_1^{new} - d_1}{c_1} \right) + \hat{\beta}_2 \left( \frac{t_2^{new} - d_2}{c_2} \right)

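Moreover, as long as the inputs are the training data, the predicted values are invariant under a (linear) transformation of the training data: $f_{X^{new}}(x_{i1}^{new}, x_{i2}^{new}) = f_{X}(x_{i1}, x_{i2})$ . In particular, the coefficient of determination $R^2$ is also invariant.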

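Footnotes

[1] This expression suggests that the transformation behavior for a general number $p$ of explanatory variables could be derived in the same way.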
