
Principal component analysis

Published on 2024/02/21

When a dataset X = \set{x_n}_{n=1}^N ,\, x_n \in \R^D is given, we aim to obtain a low-dimensional representation Z \in \R^{N \times Q} of X, where Q \ll D.


Let's consider the first principal component z_1, whose score for the n-th sample is defined as

\begin{align*} z_{1n} &= x_n^\top w_1, \quad w_1 \in \R^D \end{align*}

assuming the data are centered, i.e. \frac{1}{N} \sum_{n=1}^N x_n = 0, so that E[z_1] = 0.
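As a minimal sketch of this projection step (the toy dataset X, the direction w1, and the scores z1 below are all illustrative, generated with NumPy), centering the data makes the scores average to zero:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 5
X = rng.normal(size=(N, D))        # toy dataset, rows are x_n^T
X = X - X.mean(axis=0)             # center so that E[z_1] = 0 holds

w1 = rng.normal(size=D)
w1 /= np.linalg.norm(w1)           # unit-norm direction, ||w_1|| = 1

z1 = X @ w1                        # scores z_{1n} = x_n^T w_1
print(np.isclose(z1.mean(), 0.0))  # True: projections of centered data have zero mean
```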


The variance of z_1 is

\begin{align*} \mathrm{Var}[z_1] &= \frac{1}{N} \sum_{n=1}^N (z_{1n} - E[z_1])^2 \\ &= \frac{1}{N} \sum_{n=1}^N (x_n^\top w_1)^2 \\ &= \frac{1}{N} \sum_{n=1}^N (x_n^\top w_1)(x_n^\top w_1) \\ &= \frac{1}{N} \sum_{n=1}^N w_1^\top x_n x_n^\top w_1 \\ &= w_1^\top \left( \frac{1}{N} \sum_{n=1}^N x_n x_n^\top \right) w_1 \\ &= w_1^\top \left( \frac{1}{N} X^\top X \right) w_1 ,\quad V \in \R^{D \times D} \\ &= w_1^\top V w_1 \end{align*}

Here, I define V = \frac{1}{N} X^\top X as the covariance matrix, where X \in \R^{N \times D} stacks the x_n^\top as rows.
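As a quick numerical check (a sketch reusing an illustrative centered toy X), the matrix form X^\top X / N agrees with NumPy's biased sample covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 5
X = rng.normal(size=(N, D))
X = X - X.mean(axis=0)             # centered data, rows are x_n^T

V = (X.T @ X) / N                  # V = (1/N) X^T X, shape (D, D)
print(V.shape)                     # (5, 5)
print(np.allclose(V, np.cov(X, rowvar=False, bias=True)))  # True
```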


To maximize \mathrm{Var}[z_1] subject to the constraint \| w_1 \|^2 = 1, we introduce a Lagrange multiplier \lambda_1,

\begin{align*} J_1 &= w_1^\top V w_1 - \lambda_1 (w_1^\top w_1 - 1) \\ \frac{\partial}{\partial w_1} J_1 &= (V + V^\top) w_1 - 2 \lambda_1 w_1 \\ &= 2 V w_1 - 2 \lambda_1 w_1 \quad (\text{since } V^\top = V) \\ &= 2 (V - \lambda_1 I) w_1 \\ &= 0 \end{align*}
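The gradient formula can be sanity-checked numerically (a sketch with an illustrative random symmetric V and multiplier value, tested by central finite differences):

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
A = rng.normal(size=(D, D))
V = (A + A.T) / 2                  # random symmetric matrix, V^T = V
w = rng.normal(size=D)
lam = 0.7                          # illustrative multiplier value

def J(w):
    # Lagrangian J_1 = w^T V w - lam (w^T w - 1)
    return w @ V @ w - lam * (w @ w - 1.0)

grad = 2 * V @ w - 2 * lam * w     # claimed gradient 2 (V - lam I) w

eps = 1e-6
num = np.array([(J(w + eps * np.eye(D)[i]) - J(w - eps * np.eye(D)[i])) / (2 * eps)
                for i in range(D)])  # central-difference approximation
print(np.allclose(grad, num, atol=1e-5))  # True
```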

therefore, the \lambda_1 and w_1 that maximize J_1 satisfy

\begin{align*} (V - \lambda_1 I) w_1 &= 0 \\ &\iff V w_1 = \lambda_1 w_1 \end{align*}

indicating that \lambda_1 and w_1 are an eigenvalue and an eigenvector of V. Moreover, \mathrm{Var}[z_1] = w_1^\top V w_1 = \lambda_1 w_1^\top w_1 = \lambda_1, so the variance is maximized by choosing the largest eigenvalue of V and its eigenvector.
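A minimal sketch of solving this eigenvalue problem with numpy.linalg.eigh (which returns the eigenvalues of a symmetric matrix in ascending order, so the last entry is the largest; the toy data is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 5
X = rng.normal(size=(N, D))
X = X - X.mean(axis=0)

V = (X.T @ X) / N
eigvals, eigvecs = np.linalg.eigh(V)    # ascending eigenvalues of symmetric V
lam1, w1 = eigvals[-1], eigvecs[:, -1]  # largest eigenvalue and its eigenvector

z1 = X @ w1
print(np.isclose(z1.var(), lam1))       # True: Var[z_1] equals lambda_1
```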


For the i-th principal component, we maximize \mathrm{Var}[z_i] = w_i^\top V w_i under the orthonormality constraints

\begin{align*} w_i^\top w_j &= \begin{cases} 1 \quad (i = j) \\ 0 \quad (i \neq j) \end{cases} \end{align*}

which gives the Lagrangian

\begin{align*} J_i &= w_i^\top V w_i - \lambda_i (w_i^\top w_i - 1) - \sum_{j=1}^{i-1} \mu_j (w_i^\top w_j - 0) \\ &= w_i^\top V w_i - \lambda_i (w_i^\top w_i - 1) - \sum_{j=1}^{i-1} \mu_j w_i^\top w_j \end{align*}

where \lambda_i, \set{\mu_j}_{j=1}^{i-1} are Lagrange multipliers.


\begin{align*} \frac{\partial}{\partial w_i} J_i &= 2 V w_i - 2 \lambda_i w_i - \sum_{j=1}^{i-1} \mu_j w_j = 0 \\ \end{align*}

Multiplying from the left by w_k^\top with 1 \le k \le i-1,

\begin{align*} 2 w_k^\top V w_i - 2 \lambda_i w_k^\top w_i - \sum_{j=1}^{i-1} \mu_j w_k^\top w_j &= 0 \\ 2 w_k^\top V w_i - \mu_k &= 0 \end{align*}

using w_k^\top w_i = 0 and w_k^\top w_j = \delta_{kj}. Since V is symmetric and each w_k with k < i is already an eigenvector of V with eigenvalue \lambda_k,

\begin{align*} \mu_k &= 2 w_k^\top V w_i \\ &= 2 (V w_k)^\top w_i \\ &= 2 \lambda_k w_k^\top w_i \\ &= 0 \end{align*}

therefore,

\begin{align*} \frac{\partial}{\partial w_i} J_i &= 2 V w_i - 2 \lambda_i w_i = 0 \\ &\iff V w_i = \lambda_i w_i \end{align*}

Thus each w_i is also an eigenvector of V. Since \mathrm{Var}[z_i] = w_i^\top V w_i = \lambda_i, taking the eigenvalues in decreasing order \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_Q gives the Q directions that successively maximize the variance.
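Putting the whole derivation together (a sketch on illustrative toy data; the column order from numpy.linalg.eigh is reversed to get descending eigenvalues), the low-dimensional representation Z is obtained by projecting X onto the top Q eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, Q = 500, 5, 2
X = rng.normal(size=(N, D))
X = X - X.mean(axis=0)                # center the data

V = (X.T @ X) / N                     # covariance matrix, (D, D)
eigvals, eigvecs = np.linalg.eigh(V)  # ascending order for symmetric V
order = np.argsort(eigvals)[::-1]     # indices for descending eigenvalues
W = eigvecs[:, order[:Q]]             # top-Q eigenvectors, (D, Q)

Z = X @ W                             # low-dimensional representation, (N, Q)
print(Z.shape)                                          # (500, 2)
print(np.allclose(W.T @ W, np.eye(Q)))                  # True: orthonormal w_i
print(np.allclose(Z.var(axis=0), eigvals[order[:Q]]))   # variances equal eigenvalues
```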
