Given a dataset X = \set{x_n}_{n=1}^N ,\, x_n \in \R^D, we aim to obtain Z \in \R^{N \times Q} with Q \ll D, a low-dimensional representation of X.
Let's consider the first principal component score z_1 \in \R, whose value for the n-th data point is defined as
\begin{align*}
z_{1n} &= x_n^\top w_1, \quad w_1 \in \R^D \\
\end{align*}
assuming E[z_1] = 0, which holds when the data is centered so that \frac{1}{N} \sum_{n=1}^N x_n = 0.
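As a minimal numerical sketch (assuming NumPy and a synthetic correlated Gaussian dataset; the names `X_centered`, `w1`, `z1` are mine, not from the derivation), centering the data makes the projected scores have zero mean:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 5
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))  # synthetic correlated data

X_centered = X - X.mean(axis=0)          # enforce E[x] = 0, hence E[z_1] = 0

w1 = rng.normal(size=D)
w1 /= np.linalg.norm(w1)                 # candidate unit direction, ||w_1|| = 1

z1 = X_centered @ w1                     # scores z_{1n} = x_n^T w_1
print(z1.mean())                         # ~0 because the data is centered
```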
The variance of z_1 is
\begin{align*}
\mathrm{Var}[z_1] &= \frac{1}{N} \sum_{n=1}^N (z_{1n} - E[z_1])^2 \\
&= \frac{1}{N} \sum_{n=1}^N (x_n^\top w_1)^2 \\
&= \frac{1}{N} \sum_{n=1}^N (x_n^\top w_1)(x_n^\top w_1) \\
&= \frac{1}{N} \sum_{n=1}^N w_1^\top x_n x_n^\top w_1\\
&= w_1^\top \left( \frac{1}{N} \sum_{n=1}^N x_n x_n^\top \right) w_1\\
&= w_1^\top \left( \frac{1}{N} X^\top X \right) w_1 ,\quad V \in \R^{D \times D}\\
&= w_1^\top V w_1\\
\end{align*}
Here, X \in \R^{N \times D} stacks the x_n^\top as rows, and I define V = \frac{1}{N} X^\top X as the (sample) covariance matrix.
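Continuing the sketch above (reusing `X_centered`, `w1`, `z1`), the identity \mathrm{Var}[z_1] = w_1^\top V w_1 can be checked numerically:

```python
V = X_centered.T @ X_centered / N        # V = (1/N) X^T X, shape (D, D)

var_direct = np.mean(z1 ** 2)            # Var[z_1] computed from the scores
var_quadratic = w1 @ V @ w1              # w_1^T V w_1
print(np.allclose(var_direct, var_quadratic))  # True
```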
To maximize \mathrm{Var}[z_1], we introduce the Lagrange multiplier \lambda_1 with the constraint \| w_1 \|^2 = 1,
\begin{align*}
J_1 &= w_1^\top V w_1 - \lambda_1 (w_1^\top w_1 - 1) \\
\frac{\partial}{\partial w_1} J_1 &= (V + V^\top) w_1 - \lambda_1 (I + I^\top) w_1 \\
&= 2 V w_1 - 2 \lambda_1 w_1 \quad (\text{since } V^\top = V) \\
&= 2 (V - \lambda_1 I) w_1 \\
&= 0 \\
\end{align*}
therefore, the \lambda_1 and w_1 that maximize J_1 satisfy
\begin{align*}
(V - \lambda_1 I) w_1 &= 0 \\
&\iff V w_1 = \lambda_1 w_1
\end{align*}
, indicating that \lambda_1 and w_1 are an eigenvalue and an eigenvector of V. Since \mathrm{Var}[z_1] = w_1^\top V w_1 = \lambda_1 w_1^\top w_1 = \lambda_1, the variance is maximized by taking \lambda_1 to be the largest eigenvalue of V and w_1 its eigenvector.
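This can also be checked numerically, continuing the same sketch: `np.linalg.eigh` diagonalizes the symmetric matrix V, and no random unit direction achieves a higher variance than the eigenvector of the largest eigenvalue:

```python
eigvals, eigvecs = np.linalg.eigh(V)     # eigh: V is symmetric, eigenvalues ascending
w1_opt = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue

# variance along the optimal direction equals the largest eigenvalue
print(np.isclose(w1_opt @ V @ w1_opt, eigvals[-1]))   # True

# no random unit direction does better
dirs = rng.normal(size=(1000, D))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print(np.all(np.einsum('kd,de,ke->k', dirs, V, dirs) <= eigvals[-1] + 1e-12))  # True
```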
For the i-th principal component, we show that w_i again satisfies
\begin{align*}
(V - \lambda_i I) w_i &= 0
\end{align*}
under the orthonormality constraints
\begin{align*}
w_i^\top w_j &= \begin{cases}
1 \quad (i = j) \\
0 \quad (i \neq j)
\end{cases}
\end{align*}
The Lagrangian is
\begin{align*}
J_i &= w_i^\top V w_i - \lambda_i (w_i^\top w_i - 1) - \sum_{j=1}^{i-1} \mu_j (w_i^\top w_j - 0) \\
&= w_i^\top V w_i - \lambda_i (w_i^\top w_i - 1) - \sum_{j=1}^{i-1} \mu_j w_i^\top w_j \\
\end{align*}
where \lambda_i, \set{\mu_j}_{j=1}^{i-1} are Lagrange multipliers.
\begin{align*}
\frac{\partial}{\partial w_i} J_i &= 2 V w_i - 2 \lambda_i w_i - \sum_{j=1}^{i-1} \mu_j w_j = 0 \\
\end{align*}
Multiplying from the left by w_k^\top for some k \in \{1, \dots, i-1\},
\begin{align*}
2 w_k^\top V w_i - 2 \lambda_i w_k^\top w_i - \sum_{j=1}^{i-1} \mu_j w_k^\top w_j &= 0 \\
2 w_k^\top V w_i - \mu_k &= 0
\end{align*}
using the constraints w_k^\top w_i = 0 and w_k^\top w_j = \delta_{kj}. Since V is symmetric and each w_k with k < i already satisfies V w_k = \lambda_k w_k,
\begin{align*}
\mu_k &= 2 w_k^\top V w_i \\
&= 2 (V w_k)^\top w_i \\
&= 2 \lambda_k w_k^\top w_i \\
&= 0
\end{align*}
, and therefore
\begin{align*}
\frac{\partial}{\partial w_i} J_i &= 2 V w_i - 2 \lambda_i w_i = 0 \\
&\iff V w_i = \lambda_i w_i
\end{align*}
Thus each w_i is again an eigenvector of V with eigenvalue \lambda_i. Since \mathrm{Var}[z_i] = w_i^\top V w_i = \lambda_i, ordering the eigenvalues as \lambda_1 \geq \lambda_2 \geq \dots and stacking the eigenvectors of the Q largest ones as W = (w_1, \dots, w_Q) \in \R^{D \times Q} gives the low-dimensional representation Z = X W \in \R^{N \times Q}.
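Putting the pieces together, here is a minimal PCA sketch under the same assumptions as above (Q = 2 is an arbitrary choice): the top-Q eigenvectors form W, and Z = X W is the low-dimensional representation whose per-component variances are the corresponding eigenvalues.

```python
Q = 2
order = np.argsort(eigvals)[::-1]        # eigenvalues in decreasing order
W = eigvecs[:, order[:Q]]                # top-Q eigenvectors, shape (D, Q)
Z = X_centered @ W                       # low-dimensional representation, shape (N, Q)

# each column of Z has variance lambda_i, and the columns are uncorrelated
print(np.allclose(Z.T @ Z / N, np.diag(eigvals[order[:Q]])))  # True
```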
Discussion