What's inside a Jupyter Notebook file

uint256_t

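Jupyter Notebook is handy, but now and then I want to view or edit a notebook's contents from the terminal instead of the browser. To do that, I need to know the format of Jupyter Notebook files.
As a memo to myself, here is a brief summary of how the contents of an *.ipynb file are structured.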

For example, if you download the JAX Quickstart notebook and open it as a text file, it looks like this:

{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xtWX4x9DCF5_"
      },
      "source": [
        "# JAX Quickstart\n",
        "\n",
        "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/jax/blob/main/docs/notebooks/quickstart.ipynb) [![Open in Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/google/jax/blob/main/docs/notebooks/quickstart.ipynb)\n",
        "\n",
        "**JAX is NumPy on the CPU, GPU, and TPU, with great automatic differentiation for high-performance machine learning research.**\n",
        "\n",
        "With its updated version of [Autograd](https://github.com/hips/autograd), JAX\n",
        "can automatically differentiate native Python and NumPy code. It can\n",
        "differentiate through a large subset of Python’s features, including loops, ifs,\n",
        "recursion, and closures, and it can even take derivatives of derivatives of\n",
        "derivatives. It supports reverse-mode as well as forward-mode differentiation, and the two can be composed arbitrarily\n",
        "to any order.\n",
        "\n",
        "What’s new is that JAX uses\n",
        "[XLA](https://www.tensorflow.org/xla)\n",
        "to compile and run your NumPy code on accelerators, like GPUs and TPUs.\n",
        "Compilation happens under the hood by default, with library calls getting\n",
        "just-in-time compiled and executed. But JAX even lets you just-in-time compile\n",
        "your own Python functions into XLA-optimized kernels using a one-function API.\n",
        "Compilation and automatic differentiation can be composed arbitrarily, so you\n",
        "can express sophisticated algorithms and get maximal performance without having\n",
        "to leave Python."
      ]
    },
...

So the contents of an *.ipynb file are apparently just JSON.
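
Because it is plain JSON, the standard json module is enough to look around inside. Here is a minimal sketch. It assumes the notebook follows nbformat version 4, which uses the top-level keys cells, metadata, nbformat and nbformat_minor. The helper summarize_notebook and the tiny in-memory notebook are hypothetical and exist only for illustration.

```python
import json
from collections import Counter

# Hypothetical miniature notebook in the nbformat v4 layout
NOTEBOOK_TEXT = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title\\n", "Some text"]},
    {"cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [],
     "source": ["x = 1\\n", "print(x)"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
"""

def summarize_notebook(nb: dict) -> dict:
    """Report the format version and how many cells of each type the notebook has."""
    return {
        "version": f'{nb["nbformat"]}.{nb["nbformat_minor"]}',
        "cell_types": dict(Counter(cell["cell_type"] for cell in nb["cells"])),
    }

# For a real file: with open(path, encoding="utf-8") as f: nb = json.load(f)
nb = json.loads(NOTEBOOK_TEXT)
print(sorted(nb))              # -> ['cells', 'metadata', 'nbformat', 'nbformat_minor']
print(summarize_notebook(nb))  # -> {'version': '4.5', 'cell_types': {'markdown': 1, 'code': 1}}
```

As in the JAX excerpt, "source" is stored as a list of lines. Every line except the last ends in "\n", so "".join(...) rebuilds the cell text.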

As it turns out, the official Jupyter Notebook documentation covers this in some detail:

Notebook documents contains the inputs and outputs of a interactive session as well as additional text that accompanies the code but is not meant for execution. In this way, notebook files can serve as a complete computational record of a session, interleaving executable code with explanatory text, mathematics, and rich representations of resulting objects. These documents are internally JSON files and are saved with the .ipynb extension. Since JSON is a plain text format, they can be version-controlled and shared with colleagues.

I see.
So because the format is plain text, notebooks gain the advantage of being version-controllable.
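
Plain JSON also means a script can edit a notebook and write it back. Here is a minimal sketch that clears the outputs and execution counts of code cells, which people often do before committing a notebook so the diffs stay small. The documentation quoted above does not prescribe this step. The function names are hypothetical. The json.dumps settings (indent=1, sorted keys, ensure_ascii=False) are an assumption intended to roughly match how Jupyter serializes notebooks, not a guarantee of byte-identical output.

```python
import copy
import json

def strip_outputs(nb: dict) -> dict:
    """Return a copy of the notebook with code-cell outputs and execution counts cleared."""
    cleaned = copy.deepcopy(nb)  # leave the caller's dict untouched
    for cell in cleaned["cells"]:
        if cell["cell_type"] == "code":
            cell["outputs"] = []            # drop stored results
            cell["execution_count"] = None  # serialized as null
    return cleaned

def dump_notebook(nb: dict) -> str:
    # Assumed formatting: close to Jupyter's own, not guaranteed identical
    return json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False) + "\n"

# Hypothetical one-cell notebook with a saved output
notebook = {
    "cells": [
        {"cell_type": "code", "execution_count": 3, "metadata": {},
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}],
         "source": ["print(1)"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

print(dump_notebook(strip_outputs(notebook)))
```

To use it on a real file, load the file with json.load, pass the result through strip_outputs, and write dump_notebook's result back to disk.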

Notebooks may be exported to a range of static formats, including HTML (for example, for blog posts), reStructuredText, LaTeX, PDF, and slide shows, via the nbconvert command.

So there is a handy command called nbconvert.
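
On the command line it is used as jupyter nbconvert --to <format> <notebook>. Format names such as html, latex, pdf, rst, slides and script exist, and script writes a .py file for a Python notebook. It is worth checking jupyter nbconvert --help for your installed version. Below is a rough sketch of calling it from Python. The helper names are hypothetical. Actually running the conversion requires nbconvert to be installed, and PDF output also needs a LaTeX toolchain.

```python
import shutil
import subprocess
from typing import List

def nbconvert_command(notebook_path: str, to: str = "html") -> List[str]:
    """Build the argument list for `jupyter nbconvert --to <to> <notebook_path>`."""
    return ["jupyter", "nbconvert", "--to", to, notebook_path]

def convert(notebook_path: str, to: str = "html") -> None:
    # Fail early with a clear message if the `jupyter` executable is not on PATH
    if shutil.which("jupyter") is None:
        raise RuntimeError("jupyter is not installed or not on PATH")
    subprocess.run(nbconvert_command(notebook_path, to), check=True)

# Show the command that would export the code cells as a script
print(nbconvert_command("notebook.ipynb", to="script"))
```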

If you stare at the JSON for a while, the structure starts to become clear.
The following short script extracts only the source code from an *.ipynb file:

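import json

path = "/path/to/notebook.ipynb"
with open(path, encoding="utf-8") as f:  # closes the file after loading
    data = json.load(f)

# Each entry in "cells" is a dict; keep only the code cells
for cell in data["cells"]:
    if cell["cell_type"] == "code":
        # "source" is usually a list of lines (each ending in "\n"), so join them
        source = "".join(cell["source"])
        print(f"Source:\n{source}\n")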

Example output:

Source:
import jax.numpy as jnp
from jax import grad, jit, vmap
from jax import random

Source:
key = random.PRNGKey(0)
x = random.normal(key, (10,))
print(x)

Source:
size = 3000
x = random.normal(key, (size, size), dtype=jnp.float32)
%timeit jnp.dot(x, x.T).block_until_ready()  # runs on the GPU

...
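
The same loop can be extended to pull out the outputs saved with each code cell. This sketch assumes the nbformat v4 output types. Stream outputs keep their text under "text". execute_result and display_data keep a MIME bundle under "data", and only "text/plain" is read here. Error outputs carry "ename" and "evalue". The function names and the sample cell are hypothetical.

```python
from typing import List

def _as_text(value) -> str:
    # Multiline fields may be stored as one string or as a list of lines
    return value if isinstance(value, str) else "".join(value)

def output_texts(cell: dict) -> List[str]:
    """Collect a plain-text rendering of each output stored on a code cell."""
    texts = []
    for out in cell.get("outputs", []):
        kind = out["output_type"]
        if kind == "stream":
            texts.append(_as_text(out["text"]))
        elif kind in ("execute_result", "display_data"):
            data = out.get("data", {})
            if "text/plain" in data:  # skip images, HTML, etc.
                texts.append(_as_text(data["text/plain"]))
        elif kind == "error":
            texts.append(f'{out["ename"]}: {out["evalue"]}')
    return texts

# Hypothetical code cell with a printed line and an expression result
sample_cell = {
    "cell_type": "code",
    "execution_count": 2,
    "metadata": {},
    "source": ["print(1 + 1)\n", "3 * 3"],
    "outputs": [
        {"output_type": "stream", "name": "stdout", "text": ["2\n"]},
        {"output_type": "execute_result", "execution_count": 2,
         "metadata": {}, "data": {"text/plain": ["9"]}},
    ],
}

for text in output_texts(sample_cell):
    print(f"Output:\n{text}")
```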