Given a dataset \mathcal{D} = \{(x_n, y_n)\}_{n=1}^N, kernel regression can be described as follows:
\begin{align*}
f(x) = \sum_{n=1}^N \alpha_n k(x_n, x)
\end{align*}
where k(\cdot, \cdot) is called a kernel function.
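For concreteness, here is a minimal sketch of evaluating f at a query point, assuming a Gaussian (RBF) kernel and given weights; the kernel choice and all names are illustrative, not part of the model above.

import numpy as np

def rbf_kernel(x_i, x_j, sigma=1.0):
    # Assumed kernel for illustration: k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))
    return np.exp(-np.sum((np.asarray(x_i) - np.asarray(x_j)) ** 2) / (2.0 * sigma ** 2))

def f(x, X_train, alpha):
    # f(x) = sum_n alpha_n * k(x_n, x)
    return sum(a_n * rbf_kernel(x_n, x) for x_n, a_n in zip(X_train, alpha))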
In matrix notation, it can be written as:
\begin{align*}
\underbrace{
\begin{pmatrix}
\hat{y}_1 \\
\hat{y}_2 \\
\vdots \\
\hat{y}_N
\end{pmatrix}
}_{\hat{y} \in \R^N} =
\underbrace{
\begin{pmatrix}
k(x_1, x_1) & k(x_1, x_2) & \dots & k(x_1, x_N) \\
\vdots & & & \vdots \\
k(x_N, x_1) & k(x_N, x_2) & \dots & k(x_N, x_N) \\
\end{pmatrix}
}_{K \in \R^{N \times N}}
\underbrace{
\begin{pmatrix}
\alpha_1 \\
\alpha_2 \\
\vdots \\
\alpha_N \\
\end{pmatrix}
}_{\alpha \in \R^N}
\end{align*}
The matrix K is called the Gram matrix or the kernel matrix.
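A minimal sketch of assembling the Gram matrix from a pairwise kernel function (the loop-based construction is for clarity; vectorized forms are common in practice):

import numpy as np

def gram_matrix(X, kernel):
    # K[i, j] = k(x_i, x_j); requires O(N^2) kernel evaluations.
    N = len(X)
    K = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            K[i, j] = kernel(X[i], X[j])
    return K

For a valid kernel, K is symmetric positive semi-definite, which the derivation below relies on.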
This is very similar to the linear regression model. By introducing a regularization constant \lambda > 0, we can obtain the optimal weights \alpha_*:
\begin{align*}
\mathcal{L}(\alpha) &= (y - K \alpha)^\top (y - K \alpha) + \lambda \alpha^\top K \alpha \\
&= y^\top y - 2 y^\top K \alpha + \alpha^\top K^\top K \alpha + \lambda \alpha^\top K \alpha \\
\frac{\partial \mathcal{L}(\alpha)}{\partial \alpha} &= - 2 K^\top y + 2 K^\top K \alpha + 2 \lambda K \alpha \\
&= 2 K \left( (K + \lambda I) \alpha - y \right) \quad (\because K^\top = K) \\
\alpha_* &= \underset{\alpha}{\arg\min} \, \mathcal{L}(\alpha) = (K + \lambda I)^{-1} y \quad (\text{setting } \partial \mathcal{L}(\alpha) / \partial \alpha = 0 \text{ and assuming } K \text{ is invertible})
\end{align*}
This is known as kernel ridge regression.
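As a quick numerical sanity check (not part of the original derivation), at \alpha_* the gradient 2 K ((K + \lambda I) \alpha - y) should vanish. A minimal sketch, assuming a random symmetric positive-definite K:

import numpy as np

rng = np.random.default_rng(0)
N, lmd = 20, 0.1
A = rng.normal(size=(N, N))
K = A @ A.T + 1.0e-6 * np.eye(N)  # random symmetric positive-definite "Gram" matrix
y = rng.normal(size=N)

alpha_star = np.linalg.solve(K + lmd * np.eye(N), y)
grad = 2.0 * K @ ((K + lmd * np.eye(N)) @ alpha_star - y)
print(np.abs(grad).max())  # ~0, up to floating-point error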
Here is a Python implementation.
import numpy as np

class KernelRegressor:
    def __init__(self, kernel, lmd=1.0e-3):
        self.kernel = kernel
        self.lmd = lmd
        self.alpha = None

    def fit(self, X, y):
        self.X = X
        self.y = y
        # Solve (K + lambda * I) alpha = y for the weights alpha.
        K = self.kernel(X, X) + self.lmd * np.eye(len(X))
        self.alpha = np.linalg.solve(K, y)
        return self

    def predict(self, X):
        # y_hat_i = sum_n alpha_n * k(x_n, x_i)
        K = self.kernel(X, self.X)
        y_hat = K @ self.alpha
        return y_hat
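X and y are not defined in the snippet above; the following constructs synthetic 1-D training data for the demo below (the sine target and noise level are assumptions for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3.0, 3.0, size=50))   # 50 random 1-D inputs (assumed)
y = np.sin(X) + 0.1 * rng.normal(size=len(X))  # noisy sine targets (assumed)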
import tinygp
kernel = tinygp.kernels.ExpSquared(scale=1.0)
kr = KernelRegressor(kernel=kernel, lmd=1.0e-1).fit(X, y)
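Predictions on a dense test grid can then be obtained with (X_test is illustrative):

X_test = np.linspace(-3.0, 3.0, 200)
y_hat = kr.predict(X_test)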
The kernel function k(x_i, x_j) = \exp(-\| x_i - x_j \|^2 / (2\sigma^2)) is used with parameters \sigma = 1.0 and \lambda = 0.1.
Discussion