Given a dataset \mathcal{D} = \{(x_n, y_n)\}_{n=1}^N, kernel regression can be described as follows:
\begin{align*}
f(x) = \sum_{n=1}^N \alpha_n k(x_n, x)
\end{align*}
where k(\cdot, \cdot) is called a kernel function.
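For concreteness, here is a minimal sketch of evaluating f at a query point, assuming a Gaussian (RBF) kernel and given weights; the kernel choice and all names are illustrative, not part of the model above.

import numpy as np

def rbf_kernel(x_i, x_j, sigma=1.0):
    # Assumed kernel for illustration: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
    return np.exp(-np.sum((np.asarray(x_i) - np.asarray(x_j)) ** 2) / (2.0 * sigma ** 2))

def f(x, X_train, alpha):
    # f(x) = sum_n alpha_n * k(x_n, x)
    return sum(a_n * rbf_kernel(x_n, x) for x_n, a_n in zip(X_train, alpha))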
In matrix notation, it can be written as:
\begin{align*}
\underbrace{
\begin{pmatrix}
\hat{y}_1 \\
\hat{y}_2 \\
\vdots \\
\hat{y}_N
\end{pmatrix}
}_{\hat{y} \in \R^N} =
\underbrace{
\begin{pmatrix}
k(x_1, x_1) & k(x_1, x_2) & \dots & k(x_1, x_N) \\
\vdots & & & \vdots \\
k(x_N, x_1) & k(x_N, x_2) & \dots & k(x_N, x_N) \\
\end{pmatrix}
}_{K \in \R^{N \times N}}
\underbrace{
\begin{pmatrix}
\alpha_1 \\
\alpha_2 \\
\vdots \\
\alpha_N \\
\end{pmatrix}
}_{\alpha \in \R^N}
\end{align*}
The matrix K is called the Gram matrix or the kernel matrix.
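A minimal sketch of assembling the Gram matrix from a pairwise kernel function (the loop-based construction is for clarity; vectorized forms are common in practice):

import numpy as np

def gram_matrix(X, kernel):
    # K[i, j] = k(x_i, x_j); requires O(N^2) kernel evaluations.
    N = len(X)
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = kernel(X[i], X[j])
    return K

For a valid kernel, K is symmetric positive semi-definite, which the derivation below relies on.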
This is very similar to the linear regression model. By introducing a regularization constant \lambda > 0, we can obtain the optimal weights \alpha_*:
\begin{align*}
\mathcal{L}(\alpha) &= (y - K \alpha)^\top (y - K \alpha) + \lambda \alpha^\top K \alpha \\
&= y^\top y - 2 y^\top K \alpha + \alpha^\top K^\top K \alpha + \lambda \alpha^\top K \alpha \\
\frac{\partial \mathcal{L}(\alpha)}{\partial \alpha} &= - 2 K^\top y + 2 K^\top K \alpha + 2 \lambda K \alpha \\
&= 2 K \left( (K + \lambda I) \alpha - y \right) \quad (\because K^\top = K) \\
\alpha_* &= \underset{\alpha}{\arg\min} \, \mathcal{L}(\alpha) = (K + \lambda I)^{-1} y \quad (\text{setting } \partial \mathcal{L}(\alpha) / \partial \alpha = 0 \text{ and assuming } K \text{ is invertible})
\end{align*}
This is known as kernel ridge regression.
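As a quick numerical sanity check (not part of the original derivation), at \alpha_* the gradient 2 K ((K + \lambda I) \alpha - y) should vanish. A minimal sketch, assuming a random symmetric positive-definite K:

import numpy as np

rng = np.random.default_rng(0)
N, lmd = 20, 0.1
A = rng.normal(size=(N, N))
K = A @ A.T + 1.0e-6 * np.eye(N)  # random symmetric positive-definite "Gram" matrix
y = rng.normal(size=N)

alpha_star = np.linalg.solve(K + lmd * np.eye(N), y)
grad = 2.0 * K @ ((K + lmd * np.eye(N)) @ alpha_star - y)
print(np.abs(grad).max())  # ~0, up to floating-point error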
Here is a Python implementation.
import numpy as np

class KernelRegressor:
    def __init__(self, kernel, lmd=1.0e-3):
        self.kernel = kernel
        self.lmd = lmd
        self.alpha = None

    def fit(self, X, y):
        self.X = X
        self.y = y
        # Solve (K + lambda * I) alpha = y for the weights alpha.
        K = self.kernel(X, X) + self.lmd * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)
        return self

    def predict(self, X):
        # y_hat_i = sum_n alpha_n * k(x_n, x_i)
        K = self.kernel(X, self.X)
        y_hat = K @ self.alpha
        return y_hat
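X and y are not defined in the snippet above; the following constructs synthetic 1-D training data for the demo below (the sine target and noise level are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3.0, 3.0, size=50))   # 50 random 1-D inputs (assumed)
y = np.sin(X) + 0.1 * rng.normal(size=len(X))  # noisy sine targets (assumed)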
import tinygp
kernel = tinygp.kernels.ExpSquared(scale=1.0)
kr = KernelRegressor(kernel=kernel, lmd=1.0e-1).fit(X, y)
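Predictions on a dense test grid can then be obtained with (X_test is illustrative):

X_test = np.linspace(-3.0, 3.0, 200)
y_hat = kr.predict(X_test)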
The kernel function k(x_i, x_j) = \exp(-\| x_i - x_j \|^2 / (2\sigma^2)) is used with parameters \sigma = 1.0 and \lambda = 0.1.
Discussion