Matrix multiplication

Published on 2024/02/24

Let $A \in \R^{N \times M}$ be an $N \times M$ matrix, let $A_m$ be its $m$-th column vector, and let $a_n$ be its $n$-th row, stored as a column vector so that $a_n^\top$ is the row itself.

\begin{align*} A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1M} \\ a_{21} & a_{22} & \dots & a_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N1} & a_{N2} & \dots & a_{NM} \\ \end{pmatrix} ,\quad A_m = \begin{pmatrix} a_{1m} \\ a_{2m} \\ \vdots \\ a_{Nm} \end{pmatrix} \in \R^N ,\quad a_n = \begin{pmatrix} a_{n1} \\ a_{n2} \\ \vdots \\ a_{nM} \end{pmatrix} \in \R^M \end{align*}

Using the column vectors and the row vectors, $A$ can also be written as:

\begin{align*} A = \begin{pmatrix} A_1 & A_2 & \dots & A_M \end{pmatrix} = \begin{pmatrix} a_1^\top \\ a_2^\top \\ \vdots \\ a_N^\top \end{pmatrix} \end{align*}

Let's consider the matrix-vector product $A e_m$, where $e_m \in \R^M$ is the unit vector whose $m$-th element is $1$ and whose other elements are $0$.

\begin{align*} A e_m &= A_m ,\quad e_m = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \in \R^M \end{align*}

We can regard the matrix $A$ as the list of destinations of the unit vectors $\set{e_m}_{m=1}^M$: the $m$-th column $A_m$ is where $e_m$ is sent.
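To make this concrete, here is a minimal sketch in plain Python, assuming a matrix is stored as a list of rows. The helper names `matvec`, `unit_vector`, and `column` are hypothetical and introduced only for this illustration. It checks that multiplying by $e_m$ picks out the $m$-th column.

```python
# Minimal sketch: a matrix is a list of rows (plain Python lists).
# All helper names here are hypothetical, chosen for illustration.

def matvec(A, x):
    """Return the product A x, where A is a list of rows."""
    return [sum(a_nm * x_m for a_nm, x_m in zip(row, x)) for row in A]

def unit_vector(m, M):
    """Return e_m in R^M, using the article's 1-based index m."""
    return [1 if i == m - 1 else 0 for i in range(M)]

def column(A, m):
    """Return the m-th column A_m of A (1-based index)."""
    return [row[m - 1] for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]  # N = 2 rows, M = 3 columns

# A e_m equals the m-th column A_m for every m.
for m in range(1, 4):
    assert matvec(A, unit_vector(m, 3)) == column(A, m)
```

Note that `unit_vector` builds a vector of length $M$ (the number of columns), since $e_m$ has to be multiplied from the right by $A$.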

Next, let's consider $A x$ .

\begin{align*} A x &= \sum_{m=1}^M x_m A_m ,\quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_M \end{pmatrix} \in \R^M ,\quad x_m \in \R ,\quad A_m \in \R^N \end{align*}

We can read this equation as follows: $A x$ is a sum of the directions $A_m$, each weighted by $x_m$.
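The column view can be sketched directly in plain Python, again assuming a list-of-rows matrix. The function name `column_combination` is hypothetical. It accumulates $x_m A_m$ one column at a time instead of computing row-by-row products.

```python
# Minimal sketch of the column view: A x = sum_m x_m * A_m.
# `column_combination` is a hypothetical name for this illustration.

def column_combination(A, x):
    """Accumulate the weighted columns x_m * A_m into a vector in R^N."""
    N, M = len(A), len(A[0])
    result = [0] * N
    for m in range(M):          # loop over columns A_m
        for n in range(N):      # add x_m * A_m element by element
            result[n] += x[m] * A[n][m]
    return result

A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, -1]

# 1 * (1, 4) + 0 * (2, 5) + (-1) * (3, 6) = (-2, -2)
y = column_combination(A, x)
assert y == [-2, -2]
```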

From another point of view, $A x$ is the list of inner products $a_n^\top x$, one for each row of $A$.

\begin{align*} A x &= \begin{pmatrix} a_1^\top x \\ a_2^\top x \\ \vdots \\ a_N^\top x \\ \end{pmatrix} \in \R^N ,\quad a_n = \begin{pmatrix}a_{n1} \\ \vdots \\ a_{nM}\end{pmatrix} \in \R^M \end{align*}

where $a_n$ is the $n$ -th row vector of $A$ .
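As a quick check that the two viewpoints agree, the sketch below computes $A x$ once as a list of row inner products and once as a weighted sum of columns. It assumes a list-of-rows matrix, and the names `row_view` and `column_view` are hypothetical.

```python
# Minimal sketch comparing the row view and the column view of A x.
# Function names are hypothetical, chosen for this illustration.

def row_view(A, x):
    """Compute A x as the list of inner products a_n^T x, one per row."""
    return [sum(a * b for a, b in zip(a_n, x)) for a_n in A]

def column_view(A, x):
    """Compute A x as the weighted sum of columns, sum_m x_m * A_m."""
    N, M = len(A), len(A[0])
    result = [0] * N
    for m in range(M):
        for n in range(N):
            result[n] += x[m] * A[n][m]
    return result

A = [[1, 2, 3],
     [4, 5, 6]]  # N = 2, M = 3
x = [1, 0, -1]

y_rows = row_view(A, x)
y_cols = column_view(A, x)

# Both views produce the same vector in R^N (here N = 2).
assert y_rows == y_cols == [-2, -2]
```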

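Now let $A \in \R^{N \times N}$ be a square matrix whose inverse $A^{-1}$ exists. Since $A e_m = A_m$, applying $A^{-1}$ to each column of $A$ gives back the corresponding unit vector:

\begin{align*} A^{-1} A_m &= e_m \\ A^{-1} \begin{pmatrix} A_1 & A_2 & \dots & A_N \end{pmatrix} &= \begin{pmatrix} e_1 & e_2 & \dots & e_N \end{pmatrix} \end{align*}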

More generally, $A^{-1}$ recovers the weights of any linear combination of the columns of $A$:

\begin{align*} A^{-1} \left( \sum_{n=1}^N x_n A_n \right) = x ,\quad x \in \R^{N} \end{align*}

because

\begin{align*} A x &= \sum_{n=1}^N x_n A_n \\ A^{-1} A x &= A^{-1} \left( \sum_{n=1}^N x_n A_n \right) \\ x &= A^{-1} \left( \sum_{n=1}^N x_n A_n \right) \end{align*}

If $A^{-1}$ exists, every $y \in \R^N$ can be represented as a linear combination of the columns of $A$: $y = x_1 A_1 + \dots + x_N A_N = A x$.

The weights $x = (x_1, \dots, x_N)^\top$ are obtained as $A^{-1} y$, because

\begin{align*} y &= A x ,\\ A^{-1} y &= A^{-1} A x ,\\ A^{-1} y &= x . \end{align*}
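Here is a small sketch of recovering the weights for a $2 \times 2$ example. It assumes the closed-form inverse of a $2 \times 2$ matrix and uses `fractions.Fraction` from the standard library to keep the arithmetic exact. The helper names `inverse_2x2` and `matvec` are hypothetical, and the closed form is used only for illustration; it is not a general method for $N \times N$ matrices.

```python
from fractions import Fraction

# Minimal sketch: recover the weights x = A^{-1} y for a 2x2 matrix.
# Helper names are hypothetical, chosen for this illustration.

def inverse_2x2(A):
    """Closed-form inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("A is singular, so A^{-1} does not exist")
    det = Fraction(det)  # exact rational arithmetic
    return [[d / det, -b / det],
            [-c / det, a / det]]

def matvec(A, x):
    """Return the product A x, where A is a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[2, 1],
     [1, 3]]   # columns: A_1 = (2, 1), A_2 = (1, 3)
y = [5, 5]

x = matvec(inverse_2x2(A), y)   # weights x = A^{-1} y

# The weights reproduce y as a combination of the columns:
# y = 2 * A_1 + 1 * A_2 = 2 * (2, 1) + 1 * (1, 3) = (5, 5)
assert x == [2, 1]
assert matvec(A, x) == y
```

The final assertion checks the round trip $A (A^{-1} y) = y$, which is the statement that the weights $x$ rebuild $y$ from the columns of $A$.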