
Understanding Neural Networks in Julia


Introduction

The Julia language is well suited to both the natural sciences and machine learning, and it is especially powerful where the two meet. In this article, we will combine Julia's machine learning packages with a symbolic computation system to explicitly write down, and thereby understand, the functional forms of neural networks, which are often treated as black boxes.

Packages

Install the following packages.

Installing packages
import Pkg
Pkg.add("Symbolics")
Pkg.add("CairoMakie")
Pkg.add("Flux")
Pkg.add("Lux")

Moving forward, we will build machine learning models (specifically neural networks here) using the Julia machine learning packages Flux.jl and Lux.jl, and check their functional forms by substituting objects created with Symbolics.jl. For Flux.jl and Lux.jl, please refer to this article, and for Symbolics.jl, refer to this article.

How to Use Symbolics.jl

Symbolics.jl renders the results of computations on symbolic objects created with @variables x as nicely formatted LaTeX. By substituting such an x into a neural network, you can display the network's functional form in LaTeX.

How to use Symbolics.jl
using Symbolics
@variables x
simplify(exp(x) * exp(2x))
\begin{equation} e^{3 x} \end{equation}
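Symbolics.jl can also differentiate expressions symbolically via Symbolics.derivative, which takes an expression and a variable. This is a small aside, but it is convenient if you later want to inspect the derivative of a network's functional form as well:

```julia
using Symbolics
@variables x

# Symbolic differentiation of an expression with respect to x
Symbolics.derivative(sin(2x), x)  # 2cos(2x)
```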

Building a Neural Network with Flux.jl and Confirming Its Functional Form

Referring to the code in "5-5. Function Approximation by Neural Networks" in Introduction to Numerical Computation with Julia by Yuki Nagai (Gijutsu-Hyoron-sha, 2024), let's construct a fully-connected neural network with a single hidden layer as follows.

In this example, we have used the sine function sin as the activation function to keep the functional form simple. Normally, functions like Flux.relu or Flux.sigmoid are used instead of sin.

Building a neural network with Flux.jl and confirming its functional form
# Loading Flux.jl
using Flux

# Building a model with Flux.jl
model = Flux.Chain(
    x -> [x],              # Wrap the real number in a length-1 vector, since Dense expects array input
    Flux.Dense(1, 2, sin), # Input layer -> Hidden layer (2 units)
    Flux.Dense(2, 1),      # Hidden layer -> Output layer
    x -> sum(x)            # The output is a length-1 vector, so sum converts it back to a real number
)

# Confirming the functional form
model(x)
\begin{equation} - 0.28759 \sin\left( - 0.86971 x \right) - 0.35904 \sin\left( - 1.1822 x \right) \end{equation}

While the functional form of the neural network has been confirmed, note that the initial weights are determined by random numbers and the initial biases are 0.

How to check biases
model[2].bias
# 2-element Vector{Float32}:
#  0.0
#  0.0
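The randomly initialized weights can be inspected in the same way. The exact values differ on every run; the ones shown below are just illustrative, taken from the functional form displayed earlier:

```julia
model[2].weight
# 2×1 Matrix{Float32}:
#  -0.86971
#  -1.1822
```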

Specifying weights and biases as follows makes it easier to verify the functional form. For example, since a weight is a matrix, we can represent w^{(3)}_{23} as the three-digit number 323 and w^{(2)}_{12} as 212, where the digits correspond to the layer, row, and column respectively. Since a bias is a vector, we can represent b^{(3)}_{1} as the two-digit number 31.
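This digit encoding can also be written as small helper functions. These are hypothetical helpers for illustration only and are not used in the code below:

```julia
# Hypothetical helpers for the digit encoding described above
encode_w(l, i, j) = 100l + 10i + j  # weight w^(l)_ij, e.g. encode_w(3, 2, 3) == 323
encode_b(l, i)    = 10l + i         # bias   b^(l)_i,  e.g. encode_b(3, 1)    == 31
```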

Specifying weights and biases
model[2].weight[1,1] = 211
model[2].weight[2,1] = 221
model[2].bias[1]     = 21
model[2].bias[2]     = 22
model[3].weight[1,1] = 311
model[3].weight[1,2] = 312
model[3].bias[1]     = 31
model(x)
\begin{equation} 31 + 311 \sin\left( 21 + 211 x \right) + 312 \sin\left( 22 + 221 x \right) \end{equation}

Comparing this to the notation in Introduction to Physics of Learning by Akio Tomiya, Koji Hashimoto, Takumi Kaneko, Masato Taki, Yuji Hirono, Ryo Karakida, and Akiyoshi Sannai, edited by Koji Hashimoto (Asakura Shoten, 2024) A2. Neural Networks (NN), we have confirmed that the functional form is as follows:

f_\theta(x) = w^{(3)}_{11} \sigma(w^{(2)}_{11} z^{(1)} + b^{(2)}_1) + w^{(3)}_{12} \sigma(w^{(2)}_{21} z^{(1)} + b^{(2)}_2) + b^{(3)}_1

where \sigma(x) = \sin(x), z^{(1)} = x, and \theta = \{ w^{(2)}_{11}, w^{(2)}_{21}, b^{(2)}_1, b^{(2)}_2, w^{(3)}_{12}, w^{(3)}_{11}, b^{(3)}_1 \}. The specific values of this function can be calculated as follows:

Specific values of the neural network
@show model(0.0)
@show model(0.5)
@show model(1.0)
# model(0.0) = 288.4383f0
# model(0.5) = 425.71756f0
# model(1.0) = -389.70456f0

Let's double-check the results with a calculator.

\begin{aligned} f_\theta(0) &= 311 \sin(21) + 312 \sin(22) + 31 \\ &= 311 \times 0.8366556385360561 + 312 \times -0.008851309290403876 + 31 \\ &= 288.4382950861074 \end{aligned}

It certainly matches. With Julia, you can verify it as follows:

Verification
x = 0
311 * sin(211*x+21) + 312 * sin(221*x+22) + 31
# 288.4382950861074

Naturally, you can also plot it as a graph. Note that the parameters here are different from those used above. In this case, the parameters are specified so that f_\theta(x) = \sin(x).

Plotting the neural network
# Building the neural network
using Flux
model = Flux.Chain(
    x -> [x],              # Wrap the real number in a length-1 vector, since Dense expects array input
    Flux.Dense(1, 2, sin), # Input layer -> Hidden layer (2 units)
    Flux.Dense(2, 1),      # Hidden layer -> Output layer
    x -> sum(x)            # The output is a length-1 vector, so sum converts it back to a real number
)

# Specifying weights and biases
model[2].weight[1,1] = 1.0
model[2].weight[2,1] = 0.0
model[2].bias[1]     = 0.0
model[2].bias[2]     = 0.0
model[3].weight[1,1] = 1.0
model[3].weight[1,2] = 0.0
model[3].bias[1]     = 0.0

# Plot
using CairoMakie
fig = Figure(size=(420,300), fontsize=11.5, backgroundcolor=:transparent)
axis = Axis(fig[1,1], xlabel=L"$x$", ylabel=L"$y$", xlabelsize=16.5, ylabelsize=16.5)
lines!(axis, 0..5, x -> model(x), label="model")
fig
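Since the parameters were chosen so that f_\theta(x) = \sin(x), overlaying the true sine function on the same axis is a quick visual check. This continues the CairoMakie session above:

```julia
# Overlay the true sine function for comparison
lines!(axis, 0..5, sin, linestyle=:dash, label="sin")
axislegend(axis)
fig
```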

You can also use vectors for inputs and outputs or increase the number of units in each layer. While it is easier to treat it as a black box, it is important to realize that it can also be written down as an explicit mathematical formula.

A more complex example
# Creating objects with Symbolics.jl
using Symbolics
@variables x₁, x₂

# Building a neural network with Flux.jl
using Flux
model = Flux.Chain(
    Flux.Dense(2, 4, sin), # Input layer -> Hidden layer (4 units)
    Flux.Dense(4, 4, sin), # Hidden layer -> Hidden layer (4 units)
    Flux.Dense(4, 2),      # Hidden layer -> Output layer
)

# Specifying weights and biases
for i in 1:length(model)
    for k in keys(model[i].weight)
        model[i].weight[k] = 100i + 10k[1] + k[2]
    end
    for j in keys(model[i].bias)
        model[i].bias[j] = 10i + j
    end
end

# Confirming the functional form
model([x₁, x₂])
\begin{equation} \left[ \begin{array}{c} 31 + 314 \sin\left( 24 + 241 \sin\left( 11 + 111 \mathtt{x{_1}} + 112 \mathtt{x{_2}} \right) + 244 \sin\left( 14 + 141 \mathtt{x{_1}} + 142 \mathtt{x{_2}} \right) + 243 \sin\left( 13 + 131 \mathtt{x{_1}} + 132 \mathtt{x{_2}} \right) + 242 \sin\left( 12 + 121 \mathtt{x{_1}} + 122 \mathtt{x{_2}} \right) \right) + 311 \sin\left( 21 + 211 \sin\left( 11 + 111 \mathtt{x{_1}} + 112 \mathtt{x{_2}} \right) + 214 \sin\left( 14 + 141 \mathtt{x{_1}} + 142 \mathtt{x{_2}} \right) + 213 \sin\left( 13 + 131 \mathtt{x{_1}} + 132 \mathtt{x{_2}} \right) + 212 \sin\left( 12 + 121 \mathtt{x{_1}} + 122 \mathtt{x{_2}} \right) \right) + 312 \sin\left( 22 + 221 \sin\left( 11 + 111 \mathtt{x{_1}} + 112 \mathtt{x{_2}} \right) + 224 \sin\left( 14 + 141 \mathtt{x{_1}} + 142 \mathtt{x{_2}} \right) + 223 \sin\left( 13 + 131 \mathtt{x{_1}} + 132 \mathtt{x{_2}} \right) + 222 \sin\left( 12 + 121 \mathtt{x{_1}} + 122 \mathtt{x{_2}} \right) \right) + 313 \sin\left( 23 + 231 \sin\left( 11 + 111 \mathtt{x{_1}} + 112 \mathtt{x{_2}} \right) + 234 \sin\left( 14 + 141 \mathtt{x{_1}} + 142 \mathtt{x{_2}} \right) + 233 \sin\left( 13 + 131 \mathtt{x{_1}} + 132 \mathtt{x{_2}} \right) + 232 \sin\left( 12 + 121 \mathtt{x{_1}} + 122 \mathtt{x{_2}} \right) \right) \\ 32 + 324 \sin\left( 24 + 241 \sin\left( 11 + 111 \mathtt{x{_1}} + 112 \mathtt{x{_2}} \right) + 244 \sin\left( 14 + 141 \mathtt{x{_1}} + 142 \mathtt{x{_2}} \right) + 243 \sin\left( 13 + 131 \mathtt{x{_1}} + 132 \mathtt{x{_2}} \right) + 242 \sin\left( 12 + 121 \mathtt{x{_1}} + 122 \mathtt{x{_2}} \right) \right) + 321 \sin\left( 21 + 211 \sin\left( 11 + 111 \mathtt{x{_1}} + 112 \mathtt{x{_2}} \right) + 214 \sin\left( 14 + 141 \mathtt{x{_1}} + 142 \mathtt{x{_2}} \right) + 213 \sin\left( 13 + 131 \mathtt{x{_1}} + 132 \mathtt{x{_2}} \right) + 212 \sin\left( 12 + 121 \mathtt{x{_1}} + 122 \mathtt{x{_2}} \right) \right) + 322 \sin\left( 22 + 221 \sin\left( 11 + 111 \mathtt{x{_1}} + 112 \mathtt{x{_2}} \right) + 224 \sin\left( 14 + 141 \mathtt{x{_1}} + 142 \mathtt{x{_2}} \right) + 223 \sin\left( 13 + 131 \mathtt{x{_1}} + 132 \mathtt{x{_2}} \right) + 222 \sin\left( 12 + 121 \mathtt{x{_1}} + 122 \mathtt{x{_2}} \right) \right) + 323 \sin\left( 23 + 231 \sin\left( 11 + 111 \mathtt{x{_1}} + 112 \mathtt{x{_2}} \right) + 234 \sin\left( 14 + 141 \mathtt{x{_1}} + 142 \mathtt{x{_2}} \right) + 233 \sin\left( 13 + 131 \mathtt{x{_1}} + 132 \mathtt{x{_2}} \right) + 232 \sin\left( 12 + 121 \mathtt{x{_1}} + 122 \mathtt{x{_2}} \right) \right) \end{array} \right] \end{equation}

Building a Neural Network with Lux.jl and Confirming Its Functional Form

For information on how to migrate from Flux.jl to Lux.jl, please refer to this article. You will obtain the same results.

Building a neural network with Lux.jl and confirming its functional form
# Loading Lux.jl
using Lux
using Random

# Building a neural network with Lux.jl
model = Lux.Chain(
    x -> [x],
    Lux.Dense(1 => 2, sin),
    Lux.Dense(2 => 1),
    x -> sum(x),
)

# Specifying weights and biases
rng = Random.MersenneTwister(123)
ps, st = Lux.setup(rng, model)
ps = (
    layer_1 = NamedTuple(),
    layer_2 = (weight = [211; 221;;], bias = [21, 22]),
    layer_3 = (weight = [311 312], bias = [31]),
    layer_4 = NamedTuple()
)

# Confirming the functional form
using Symbolics
@variables x
first(model(x, ps, st))
\begin{equation} 31 + 311 \sin\left( 21 + 211 x \right) + 312 \sin\left( 22 + 221 x \right) \end{equation}
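As with Flux.jl, the Lux.jl model can also be evaluated at concrete inputs. Note that a Lux model returns an (output, state) tuple, hence the first:

```julia
# Evaluate at x = 0; first extracts the output from the (output, state) tuple
first(model(0.0, ps, st))
```

With the parameters above, this should match the value computed earlier with Flux.jl.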

Summary

We constructed neural networks using both Flux.jl and Lux.jl, and confirmed their functional forms by substituting objects created with Symbolics.jl. We hope that explicitly writing out the functional forms has deepened your understanding of neural networks.

Version Information

Version Information
Julia v1.11.2
Symbolics v6.22.1
CairoMakie v0.12.18
Flux v0.16.0
Lux v1.4.4

References

https://gist.github.com/ohno/62e9c3ff71f1836bd300a41a368b00e1
