Jupyter Notebook ファイルの中身

Jupyter Notebook は便利だが、たまにブラウザではなくターミナルから中身を見たり編集したいことがある。そのためには、Jupyter Notebook ファイルの仕様を知らないといけない。
備忘録として、*.ipynb
ファイルの中身の構造がどうなっているのかを簡単にまとめる。

例えば、JAX Quickstart をダウンロードしてきて、テキストファイルとして開くとこんな感じ:
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "xtWX4x9DCF5_"
},
"source": [
"# JAX Quickstart\n",
"\n",
"[](https://colab.research.google.com/github/google/jax/blob/main/docs/notebooks/quickstart.ipynb) [](https://kaggle.com/kernels/welcome?src=https://github.com/google/jax/blob/main/docs/notebooks/quickstart.ipynb)\n",
"\n",
"**JAX is NumPy on the CPU, GPU, and TPU, with great automatic differentiation for high-performance machine learning research.**\n",
"\n",
"With its updated version of [Autograd](https://github.com/hips/autograd), JAX\n",
"can automatically differentiate native Python and NumPy code. It can\n",
"differentiate through a large subset of Python’s features, including loops, ifs,\n",
"recursion, and closures, and it can even take derivatives of derivatives of\n",
"derivatives. It supports reverse-mode as well as forward-mode differentiation, and the two can be composed arbitrarily\n",
"to any order.\n",
"\n",
"What’s new is that JAX uses\n",
"[XLA](https://www.tensorflow.org/xla)\n",
"to compile and run your NumPy code on accelerators, like GPUs and TPUs.\n",
"Compilation happens under the hood by default, with library calls getting\n",
"just-in-time compiled and executed. But JAX even lets you just-in-time compile\n",
"your own Python functions into XLA-optimized kernels using a one-function API.\n",
"Compilation and automatic differentiation can be composed arbitrarily, so you\n",
"can express sophisticated algorithms and get maximal performance without having\n",
"to leave Python."
]
},
...
どうやら *.ipynb
の中身は JSON のようだ。

実は Jupyter Notebook 公式ドキュメント に色々と書いてあったりする。
Notebook documents contains the inputs and outputs of a interactive session as well as additional text that accompanies the code but is not meant for execution. In this way, notebook files can serve as a complete computational record of a session, interleaving executable code with explanatory text, mathematics, and rich representations of resulting objects. These documents are internally JSON files and are saved with the .ipynb extension. Since JSON is a plain text format, they can be version-controlled and shared with colleagues.
なるほどね。
テキスト形式であることによって、バージョン管理できるという利点が生じるみたい。
Notebooks may be exported to a range of static formats, including HTML (for example, for blog posts), reStructuredText, LaTeX, PDF, and slide shows, via the
nbconvert
command.
nbconvert
という便利コマンドがあるみたい。

中身の JSON をじっと見つめていると、なんとなく構造が見えてくる。
以下のような簡単なコードで、*.ipynb
からソースコード部分のみを取り出せる。
import json
path = "/path/to/notebook.ipynb"
data = json.load(open(path))
for cell in data["cells"]:
if cell["cell_type"] == "code":
source = "".join(cell["source"])
print(f"Source:\n{source}\n")
出力例:
Source:
import jax.numpy as jnp
from jax import grad, jit, vmap
from jax import random
Source:
key = random.PRNGKey(0)
x = random.normal(key, (10,))
print(x)
Source:
size = 3000
x = random.normal(key, (size, size), dtype=jnp.float32)
%timeit jnp.dot(x, x.T).block_until_ready() # runs on the GPU
...