I Turned a Theoretical Paper I Don't Understand Into a Browser-Based Tool: Active Inference × Claude Code

In a corner of the AI research report that arrives at 5:00 AM every morning, I saw the term "Active Inference." It is the idea that the brain holds an internal model of the world and chooses actions to minimize the discrepancy between prediction and reality, a framework that explains perception, action, and decision-making in biological organisms in a unified way.

For some reason, it caught my attention the moment I read it.

That same day, I cloned the original paper's code. From planning to completion, it took about two hours. I stripped out all the PyTorch, rewrote it in pure NumPy, and used Streamlit to build an interactive, browser-based visualization UI that did not exist in the original code. 1,550 lines of code. 98% test coverage.

At the mathematical level, I do not understand the content of the theory.


I don't even know why I was drawn to this

While studying Generative AI, I encountered active inference—a theory derived from Karl Friston's Free Energy Principle, as mentioned at the beginning.

To be honest, even if I follow the equations, I get thrown off by the third line. I understand concepts like Variational Inference and Bayesian estimation, but I have no intuitive feel for them.

Still, the impulse to "make this run" would not go away. One of the development ideas listed in my daily-research (automated research report) was an "Active Inference visualization tool," and the moment I saw that, my hands started moving.

If you ask me why, I cannot answer. I built it while thinking to myself that I might be going a little crazy.

The era where this is possible

There is something I want to stop and think about here.

I turned a computational neuroscience paper into a browser-based visualization tool without understanding the theory. I ported PyTorch code to NumPy, derived the Jacobian analytically, and wrote tests. I even added a UI for observing behavior in real time while tweaking parameters. It took about two hours.

This is a testament to the power of Claude Code, and at the same time, it is terrifying.

I often hear the claim that "the only thing that will remain in the AI era is one's obsession (偏愛)." Skills and knowledge can be replaced by AI, but irrational attachments like "I don't know why, but I'm drawn to this" cannot. The more I use Claude Code, the more I realize that this is not mere posturing.

With Claude Code, countless people can now build tools for active inference. But there probably aren't many people who "get hooked on a single word in a morning research report and start building it without even knowing the theory." In a world where implementation power is democratized, the only remaining differentiator is the obsession that decides what to build.

This article is a record of that obsession.


The original paper and code—according to Claude Code

The paper I based this on is Priorelli et al. (2025) "Embodied decisions as active inference." It is a paper published in PLOS Computational Biology that models "embodied decision-making" using active inference.

According to Claude Code, the model coordinates four processes:

  1. Discrete Inference — Probabilistically choosing "what to grasp" (POMDP)
  2. Continuous Inference — Optimizing "how to move the arm" using predictive coding
  3. Kinematics — Forward kinematics of a 3-joint arm (angles → end-effector position)
  4. Body — A physical model that actually moves the arm according to the brain's beliefs
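
The kinematics step in the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's actual code: the function name `forward_kinematics` matches the test shown later in this article, but the body and the convention that each joint angle is measured relative to the previous link are assumptions.

```python
import numpy as np

def forward_kinematics(angles, lengths):
    """End-effector (x, y) of a planar serial arm.

    angles:  joint angles in radians, each relative to the previous link (assumed convention)
    lengths: link lengths
    """
    absolute = np.cumsum(angles)              # orientation of each link in the world frame
    x = np.sum(lengths * np.cos(absolute))    # sum of the links' x components
    y = np.sum(lengths * np.sin(absolute))    # sum of the links' y components
    return np.array([x, y])

lengths = np.array([0.4, 0.3, 0.2])
print(forward_kinematics(np.zeros(3), lengths))                   # arm stretched along the x-axis
print(forward_kinematics(np.array([np.pi / 2, 0, 0]), lengths))   # arm pointing straight up
```

With all angles at zero the hand sits at a distance equal to the summed link lengths along the x-axis, which is a quick sanity check for any implementation of this step.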

The original code is written in PyTorch + Pymunk + Pyglet. It requires a GPU environment and cannot run from a browser. It is code for researchers to run experiments locally, and there was no UI that allowed anyone to try it in a browser.

Design decisions: why rewrite it and put it in the browser, according to Claude Code

Usually, I don't just take Claude Code's suggestions at face value. "Why that design?", "What other alternatives were considered?", "What are the trade-offs?"—I ask these questions every time. I dig deep into Claude Code's reasoning and don't start implementing until I'm convinced. Those who have read my past articles probably know this stance of mine.

This time was different.

I don't understand the underlying theory at all, so I have no way to verify why Claude Code said, "You should rewrite PyTorch to pure NumPy." Even when told, "The Jacobian can be derived in closed form," I don't understand what a Jacobian is at the mathematical level. "If it's three joints, it becomes a 2x3 matrix"—I see.
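
For readers in the same position, here is a rough sketch of what "a 2x3 Jacobian in closed form" can look like for a 3-joint planar arm. It is an illustration under assumptions (relative joint angles, a planar arm), not the code Claude Code wrote; the function names are borrowed from the test shown later, and `numerical_jacobian` is a hypothetical helper added for the comparison.

```python
import numpy as np

def forward_kinematics(angles, lengths):
    absolute = np.cumsum(angles)  # world-frame orientation of each link
    return np.array([np.sum(lengths * np.cos(absolute)),
                     np.sum(lengths * np.sin(absolute))])

def analytical_jacobian(angles, lengths):
    """2 x n matrix: how the hand's (x, y) changes per unit change of each joint angle."""
    absolute = np.cumsum(angles)
    n = len(angles)
    J = np.zeros((2, n))
    for j in range(n):
        # Joint j rotates every link from j onward, so only those links contribute.
        J[0, j] = -np.sum(lengths[j:] * np.sin(absolute[j:]))
        J[1, j] = np.sum(lengths[j:] * np.cos(absolute[j:]))
    return J

def numerical_jacobian(angles, lengths, h=1e-6):
    """Central finite differences, one column per joint (hypothetical helper)."""
    n = len(angles)
    J = np.zeros((2, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        J[:, j] = (forward_kinematics(angles + step, lengths)
                   - forward_kinematics(angles - step, lengths)) / (2 * h)
    return J

angles = np.array([0.3, -0.5, 0.1])
lengths = np.array([0.4, 0.3, 0.2])
J = analytical_jacobian(angles, lengths)
print(J.shape)
print(np.allclose(J, numerical_jacobian(angles, lengths), atol=1e-5))
```

Two output coordinates and three joints give two rows and three columns, which is all that "2x3" means here.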

The following design decision table is a record of Claude Code's judgments. All I really understood was the motivation to "want to run it on Streamlit Cloud" and that "uv is fast."

| Decision | Claude Code's reason | Rejected alternative |
| --- | --- | --- |
| PyTorch → pure NumPy | To run on Streamlit Cloud. For 3 joints, the Jacobian can be written in closed form | Keep PyTorch → cannot deploy |
| Pymunk → spring tracking | C bindings don't run on Streamlit Cloud. Can be replaced in 8 lines | MuJoCo → overkill |
| frozen dataclass | Hashable → @st.cache_data can be used directly. Prevents accidental changes to precision parameters | dict → not hashable |
| Plotly > Matplotlib | Compatibility between interactive operations (hover, zoom) and Streamlit | Matplotlib → static |
| uv | Fast. Complete with pyproject.toml. Automatic venv management with uv run | poetry → slow |
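
The frozen dataclass row can be illustrated without Streamlit. The sketch below only shows the two properties the table mentions, hashability and protection against accidental changes. The class name `SimConfig` and the field `pi_eta_x` appear in this article, but the other fields and all default values are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class SimConfig:
    # Field set and defaults are illustrative, not the project's real configuration.
    n_joints: int = 3
    pi_eta_x: float = 1.0
    dt: float = 0.01

a = SimConfig()
b = SimConfig()

# frozen=True (with the default eq=True) generates __hash__, so equal configs hash alike
print(a == b, hash(a) == hash(b))

# ...which means a config can serve as a dictionary or cache key
cache = {a: "cached result"}
print(cache[b])

# Assigning to a field raises instead of silently changing a precision parameter
try:
    a.pi_eta_x = 5.0
except FrozenInstanceError:
    print("assignment rejected")
```

A plain `dict` of parameters would fail at the `cache = {a: ...}` step with `TypeError: unhashable type`, which is the trade-off the table's last column points at.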

There was one motivation I added of my own volition: the hope that "if I watch the rewriting process, I might understand it a little." The idea was that if I watched Claude Code rewrite from PyTorch to NumPy, I would at least be able to see "what is being calculated." In the end, this expectation was half-right and half-wrong. I became able to see "what is being calculated." However, "why that is being calculated" remains a mystery.

The biggest sticking point: The VJP sign seems to be wrong

80% of the rewrite work went smoothly. I gave Claude Code a mapping table of paper notation to code variables and a structure map of the original code as documentation, and we rewrote it file by file. I was just watching from the side. Thinking, "Huh, so that's the structure."

Suddenly, the simulation values exploded.

In Claude Code's words, "the external unit (the belief of the hand position) diverged exponentially." What I saw was a screen where values swelled to 1e+15 and turned into NaN within a few steps.

When I asked Claude Code, it kept making off-target suggestions like "let's try adjusting the parameters." I asked three times and got three different answers. Normally I would grill it with "No, what's the basis for that?", but this time I had no yardstick of my own for what the "correct direction" was. I couldn't even say with certainty whether Claude Code's proposals were off-target. I only guess that they were, because none of them fixed the problem.

In the end, I diffed the original paper's code against our implementation, and Claude Code identified the cause. According to Claude Code, the manual implementation was missing the -precision factor (the negative sign on the precision matrix) that is implicitly contained in PyTorch's tensor.backward(eps).

—I could only respond to this explanation with "I see." The following code is the corrected version.

# Unit class in continuous.py — Correct NumPy port of PyTorch backward(eps)
# self.x[0]: belief of the unit (hand position), self.pi_eta_x: precision parameter
#
# WRONG:   parent.grad += J.T @ eps              ← Diverges
# CORRECT: parent.grad += -precision * (J.T @ eps)  ← Stable

eps_eta_x = (self.x[0] - fk_pred) * self.pi_eta_x
parent_grad = -self.pi_eta_x * (J.T @ eps_eta_x)  # VJP with -π factor

I can see the difference between WRONG and CORRECT if I look. It's whether there is a minus sign or not. But I cannot understand why the minus sign is necessary.

Claude Code explained, "In predictive coding, the negative sign of the precision matrix is included in the VJP." Since I don't understand predictive coding, I cannot judge the correctness of this explanation.
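
One general fact can be checked independently of predictive coding: when you differentiate a precision-weighted squared error with respect to the parameters of the prediction, a minus sign appears, because the prediction enters the error as `x - g(theta)`. The toy below is a hedged illustration of that fact only. It uses a made-up linear model `g(theta) = A @ theta` and does not reproduce the project's `Unit` class or the exact way the corrected code applies precision; `A`, `x`, `lr`, and `run` are all hypothetical.

```python
import numpy as np

# Toy model: prediction g(theta) = A @ theta, so the Jacobian is the constant matrix A.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x = np.array([1.0, 0.5])   # observed position
precision = 1.0
lr = 0.1                   # step size

def run(sign, steps=50):
    """Gradient descent on F = 0.5 * precision * ||x - A @ theta||^2."""
    theta = np.zeros(3)
    for _ in range(steps):
        eps = x - A @ theta                      # prediction error
        grad = sign * precision * (A.T @ eps)    # sign = -1 gives the true dF/dtheta
        theta = theta - lr * grad                # descent step
    return np.linalg.norm(x - A @ theta)         # remaining error

print(run(sign=-1.0))   # error shrinks toward zero
print(run(sign=+1.0))   # error grows at every step
```

With the minus sign each step pulls the prediction toward the observation. Without it, the same update pushes the prediction away and the error compounds, which matches the kind of blow-up described above.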

The only thing I could verify was numerical. Claude Code wrote a test comparing the analytical Jacobian against SciPy's numerical differentiation and confirmed that they match within atol=1e-5.

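# Analytical Jacobian test: comparison with numerical differentiation
import numpy as np
from scipy.optimize import approx_fprime  # SciPy >= 1.9 estimates the full Jacobian for vector-valued functions

# analytical_jacobian and forward_kinematics are the project's own functions

def test_jacobian_vs_numerical():
    angles = np.array([0.3, -0.5, 0.1])
    lengths = np.array([0.4, 0.3, 0.2])
    J_analytical = analytical_jacobian(angles, lengths)
    J_numerical = approx_fprime(angles, lambda a: forward_kinematics(a, lengths))
    np.testing.assert_allclose(J_analytical, J_numerical, atol=1e-5)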

Whether the numbers match and whether it is theoretically correct are different stories, but for someone who doesn't understand, this was the only anchor.

I discarded the physics engine—though it was Claude Code that discarded it

The original code simulates arm movement using Pymunk (a 2D physics engine). Claude Code's judgment was that its C bindings would not run on Streamlit Cloud, so an alternative was needed.

Claude Code explained it like this:

The point of active inference is that "the brain's beliefs drive action." The body merely follows the beliefs, and that lag creates prediction error. Therefore, a physics engine is unnecessary and can be replaced with 8 lines of spring tracking.

# simulation.py — spring tracking without a physics engine (8 lines)
BODY_TRACKING_GAIN = 8.0
actual_angles_norm += (believed_angles_norm - actual_angles_norm) * gain * dt
# → The lag between the brain's beliefs and the body creates prediction error, driving the inference loop
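
To see what this kind of tracking does over time, here is a small self-contained sketch. The gain of 8.0 is the value quoted above; the time step `DT`, the helper `track`, and the example angles are assumptions made for illustration, not values from the project.

```python
import numpy as np

BODY_TRACKING_GAIN = 8.0   # value quoted in the article
DT = 0.01                  # assumed time step, not taken from the project

def track(believed, actual, gain=BODY_TRACKING_GAIN, dt=DT):
    """One explicit Euler step that moves the body toward the believed angles."""
    return actual + (believed - actual) * gain * dt

believed = np.array([0.5, -0.2, 0.1])   # where the "brain" thinks the joints are
actual = np.zeros(3)                    # where the body really is

gaps = []
for _ in range(100):
    actual = track(believed, actual)
    gaps.append(np.abs(believed - actual).max())

print(gaps[0], gaps[-1])   # the gap shrinks by a factor (1 - gain * dt) per step
```

Under these assumptions the gap shrinks by 8% per step, so the body always lags slightly behind the belief and never jumps to it. The update is only well behaved while `gain * dt` stays small. Once it exceeds 2, this explicit update overshoots and the gap grows instead.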

250MB of PyTorch + Pymunk + Pyglet was replaced by 50MB of NumPy + SciPy + Streamlit. What was previously an experiment script that had to be run locally has become a visualization tool that allows you to observe behavior in real-time while changing parameters just by opening a browser. I don't know if this decision is theoretically sound. However, I can see that the value named "belief" changes and the arm moves toward the target. Whether that is the correct behavior for active inference is something I cannot judge.

Project Structure

I will list the structure of the 1,550 lines that Claude Code wrote in 2 hours. In addition to rewriting the calculation core, viz/ and app.py were built from scratch, as they did not exist in the original code.

src/active_inference_viz/        # 1550 lines
├── model/                        # 1113 lines — math core
│   ├── config.py     (139 lines)     — SimConfig (frozen), SimResult
│   ├── math_utils.py (230 lines)     — Jacobian, FK, softmax, BMC
│   ├── discrete.py   (176 lines)     — Discrete inference (POMDP)
│   ├── continuous.py  (258 lines)     — Predictive coding (Unit/Obs)
│   ├── brain.py      (162 lines)     — Integration of discrete+continuous
│   └── simulation.py (148 lines)     — trial loop
├── viz/                          # 265 lines — visualization
│   ├── theme.py       (58 lines)
│   ├── arm_view.py   (122 lines)     — 2D arm display
│   └── belief_panel.py (85 lines)    — belief time series
└── app.py            (176 lines)     — Streamlit UI

tests/                            # 519 lines, 50 tests, 98% coverage

The comments on the right are exactly as Claude Code noted them. I don't even know what "BMC" stands for (apparently, it's Bayesian Model Comparison).

I don't know if I've done it correctly

The tests pass. The comparison with numerical differentiation also matches. The arm moves on Streamlit. When a cue is presented, the belief changes, and the hand moves toward the target.

But I don't know if this is "correct as Active Inference."

Are the update equations for predictive coding theoretically correct? Is the timing for integrating discrete and continuous inference appropriate? Is the calculation of EFE (Expected Free Energy) correct? I checked the correspondence between the paper's formulas and the code with Claude Code—or to be precise, Claude Code checked it, said "there are no issues," and I believed it.

I should be honest about this. With Claude Code, you can build something that "works" even if you don't understand the theory. If the tests pass and it is numerically stable, it looks correct on the surface. But since I lack the ability to judge its quality myself, it remains an unverified implementation.

I have published it on GitHub, so I am waiting for feedback from those who are knowledgeable about Active Inference.

https://github.com/shimo4228/active-inference-viz

Why I couldn't write this article for a week

I write articles about everything I do with Claude Code. I usually write them on the day of implementation or the next day. Writing an article serves as the "Check" phase of the PDCA cycle and helps me with my own learning.

This project alone I left unwritten for a week.

The reason is simple: I didn't feel I had learned anything from it. Usually I can write, "I made this design decision for this reason," or "I got stuck here, and here is the lesson." This time there is nothing like that. Claude Code did everything. I was just watching from the sidelines. I know what happened, but I don't know what I learned.

After a week, I realized that "there is value in writing about the very fact that there is no learning." That is why I am writing this article.

Technical learning — none

I'll be honest. I gained no technical learning through this development.

I do not understand the technical content in this article so far: VJP, Jacobian, predictive coding, precision matrix, chain rule. I am simply reproducing Claude Code's explanations as given, and there was never a moment where it clicked and I thought, "I see, that's what it means."

If there is one thing I learned, it is the fact itself that you can build things without understanding them. The tests pass, the values are stable, and the arm moves on the screen. Looking at that, I can think, "I'm done." But if I were asked to explain in my own words "what I completed," I couldn't say anything beyond what is written in this article.

I don't know if this can be called learning.

An era where obsession (偏愛) remains

As I write this article, I am thinking about it again.

Claude Code is a terrifying tool. It lets an amateur turn a computational neuroscience paper into a browser-based visualization tool in two hours. In addition to rewriting the calculation engine, it built a new interactive UI that didn't exist in the original code. 98% test coverage, full type checking. Would a professional researcher point out fatal errors? Or would they say, "It's formally well-made"? I don't have the power to judge either way.

But Claude Code cannot answer "why Active Inference?" Out of the dozens of themes lined up in my daily research report, why did I reach out for this one in particular? There is no rational explanation. If I had to force one, I could only say that the picture of "the brain having a model of the world and minimizing the discrepancy between prediction and reality" felt like it suggested something about the relationship between AI and humans.

What I realized after finishing the Active Inference tool is that in a world where implementation power is democratized, "why I want to build this" is far rarer than "what I can build." Irrational attachment, or obsession, is the only unique value left to humans.

I don't know if the theory is correctly implemented. But the fact that "I wanted to build it even though I didn't understand it" is undoubtedly real.

