🐾

Rascalでコンパイラを作る

2023/11/18に公開

メタプログラミング

rascal

tech

はじめに

前回、Rascalというメタコンパイラの紹介して、簡単な計算機を作りました。

今回はRascal公式サイトを参考に、オリジナル言語Funcのコンパイラを作っていこうと思います。

Funcは変数の束縛、式、関数などがある関数型言語です。使えるデータ型は自然数のみです。

例えば、Funcで階乗を求めるコードは以下のように書けます。

fact.func

fact(n) =
    if n == 0
        then 1
        else n * fact(n - 1)
    end

この記事で、Rascalでコンパイラを作る雰囲気を味わってほしいです。

文法

Syntax.rscに文法の定義を書きます。

Func言語は以下の構成になっています。

Prog: プログラム全体、Funcの集まり
Func: 関数の定義、関数名、引数、Expを持つ
Exp : 式の定義、四則演算やlet in式やif then else式をサポート

また%% で1行コメント、%で囲うと複数行コメント扱いになります。

Syntax.rsc

module Syntax

lexical Ident = [a-z][a-z0-9]* !>> [a-z0-9];

lexical Natural = [0-9]+ ;

lexical String = "\"" ![\"]*  "\"";

layout Layout = WhitespaceAndComment* !>> [\ \t\n\r];

lexical WhitespaceAndComment
   = [\ \t\n\r]
   | @category="Comment" ws2: "%" ![%]+ "%"
   | @category="Comment" ws3: "%%" ![\n]* $
   ;

syntax Binding = binding: Ident name "=" Exp exp;

syntax Exp
    = natCon: Natural
    | bracket "(" Exp ")"
    > left (mul: Exp e1 "*" Exp e2 | div: Exp e1 "/" Exp e2)
    > left (sub: Exp e1 "-" Exp e2 | add: Exp e1 "+" Exp e2)
    > left ( gt: Exp e1 "\>" Exp e2
           | lt: Exp e1 "\<" Exp e2
           | ge: Exp e1 "\>=" Exp e2
           | le: Exp e1 "\<=" Exp e2
           )
    > left (eq: Exp e1 "==" Exp e2 | neq: Exp e1 "/=" Exp e2)
    | id: Ident name
    | let: "let" {Binding ","}* binds "in" Exp exp "end"
    | cond: "if" Exp condExp "then" Exp thenExp "else" Exp elseExp "end"
    | call: Ident name "(" {Exp ","}* args ")"
    ;

syntax Func = func: Ident name "(" {Ident ","}* args ")" "=" Exp;

start syntax Prog = prog: Func*;

SyntaxからRascalのデータ構造に変換

実は前回の計算機はこの工程を省いて、直接syntaxの定義から評価しています。より複雑な処理をする場合はRascalのデータ構造に変換してからのほうが良いです。

Abstract.rscにFunc言語で書かれたプログラムを表すデータ構造を定義します。

Abstract.rsc

module Abstract

data BINDING
    = binding(str name, EXP exp);

data EXP
    = natCon(int iVal)
    | mul(EXP e1, EXP e2)
    | div(EXP e1, EXP e2)
    | sub(EXP e1, EXP e2)
    | add(EXP e1, EXP e2)
    | gt(EXP e1, EXP e2)
    | lt(EXP e1, EXP e2)
    | ge(EXP e1, EXP e2)
    | le(EXP e1, EXP e2)
    | eq(EXP e1, EXP e2)
    | neq(EXP e1, EXP e2)
    | id(str name)
    | let(list[BINDING] binds, EXP exp)
    | cond(EXP condExp, EXP thenExp, EXP elseExp)
    | call(str name, list[EXP] args)
    ;

data FUNC
    = func(str name, list[str] args, EXP exp);

data PROG
    = prog(list[FUNC] funcs);

よーく見るとSyntax.rscのProg、Exp、BindingとAbstract.rscのPROG、EXP、BINDINGは対応関係があることがわかります。

例えばSyntax.rscのExpの定義の1つである

cond: "if" Exp condExp "then" Exp thenExp "else" Exp elseExp "end"

はAbstract.rscのEXPの定義の1つである

cond(EXP condExp, EXP thenExp, EXP elseExp)

と対応しています。

この対応を漏れなく書かないとエラーになるので注意して下さい。

Load.rscにFuncソースコードからPROGデータ構造を生成するコードを書きます。

Load.rsc

module Load

import Prelude;
import Syntax;
import Abstract;

PROG load(str s) = implode(#PROG, parse(#start[Prog], s).top);
PROG load(loc l) = implode(#PROG, parse(#start[Prog], l).top);

ここで仕事をしてるのはimplodeという関数です。parse関数でソースコードをProgに変換後、implode関数でPROGデータ構造に変換しています。

load関数はソースコードを直接渡すこともできますが、Locationを指定することでファイルを入力にできます。

LocationはRascalに出てくる概念で、ファイルパスを一般化したようなものです。詳しくは以下を参考にして下さい。

VM（Virtual Machine）コード

Func言語からスタックベースのVMマシンで動くことを想定したコードを生成します。

Assembly.rscにそのVMマシンで動く命令を定義します。

Assembly.rsc

module Assembly

data Instr
    // 自然数をスタックに積む
    = pushNat(int intCon)
    // 識別子をスタックに積む
    | pushId(str id)
    // 識別子をスタックから取り出す。その識別子の変数をスタックに積む
    | getVal()
    // Assign value on top, to variable at identifier top-1
    | setVal()
    // ラベルの定義
    | label(str label)
    // 無条件でラベルに飛ぶ
    | go(str label)
    // スタックから値を取り出す、取り出した値を0以外ならばラベルに飛ぶ
    | goIf(str label)
    // 四則演算
    | mul() | div() | add() | sub()
    // 比較演算子
    | gt() | lt() | ge() | le() | eq() | neq()
    // スタックから識別子を取り出してその識別子の関数を呼び出す
    | call()
    ;

alias Instrs = list[Instr];

alias Code = map[str, Instrs];

コンパイラを作ることが目的なのでVMコードはテキトーに設計しました。関数の引数や計算結果を保存するスタックと変数を保存するヒープが一応あります。「でもこれだと変数が上書きされね？」「なんのために関数の引数スタックに積んでるんだよ。」などのツッコミどころ満載なのはご容赦下さい。

コード生成

コンパイラは普通、変数が宣言されているかなどをチェックしますが、今回は省きます。面倒なんで。

Compile.rscにPROGからCodeを生成するcompileProgを定義します。

Compile.rsc

module Compile

import Assembly;
import Abstract;
import IO;

Instrs compileBinding(binding(str name, EXP exp))
    = [*compileExp(exp), pushId(name), setVal()];

Instrs compileExp(natCon (int N)) = [pushNat(N)];

Instrs compileExp(id (str name)) = [pushId(name), getVal()];

// 変数を束縛した後、expのコード生成
Instrs compileExp(let (list[BINDING] binds, EXP exp))
    = [*([*compileBinding(bind) | bind <- binds]), *compileExp(exp)];

// ラベル生成用
private int nLabel = 0;

// ラベル生成
private str nextLabel() {
  nLabel += 1;
  return "L<nLabel>";
}

// 分岐するためのラベルを生成する。goIfはスタックの先頭が0以外なら分岐。
Instrs compileExp(cond(EXP condExp, EXP thenExp, EXP elseExp)) {
    thenLabel = nextLabel();
    endLabel = nextLabel();
    return [
        *compileExp(condExp),
        goIf(thenLabel),
        *compileExp(elseExp),
        go(endLabel),
        label(thenLabel),
        *compileExp(thenExp),
        label(endLabel)
    ];
}

// e1 e2を計算するとそれぞれの計算結果がスタックに残るはず。最後にmul()やdiv()命令を呼び、
// スタックから値を取り出し計算を行い、計算結果を再びスタックにプッシュする。
Instrs compileExp(mul(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), mul()];
Instrs compileExp(div(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), div()];
Instrs compileExp(sub(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), sub()];
Instrs compileExp(add(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), add()];
Instrs compileExp(gt(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), gt()];
Instrs compileExp(lt(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), lt()];
Instrs compileExp(ge(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), ge()];
Instrs compileExp(le(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), le()];
Instrs compileExp(eq(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), eq()];
Instrs compileExp(neq(EXP e1, EXP e2)) = [*compileExp(e1), *compileExp(e2), neq()];

// 引数のコード生成をしてスタックにプッシュする。その後関数名をスタックにプッシュしcall()を呼ぶ。
Instrs compileExp(call(str name, list[EXP] args))
    = [*[*compileExp(arg) | arg <- args], pushId(name), call()];

// スタックの値を変数に束縛後、expのコード生成。
Instrs compileFunc(list[str] args, EXP exp)
    = [*[*[pushId(arg), setVal()] | arg <- args], *compileExp(exp)];

Code compileProg(prog (list[FUNC] funcs))
    = (name : compileFunc(args, exp)
        | func(str name, list[str] args, EXP exp) <- funcs);

所々に[*..., *...]みたいに*が付くのが気になると思います。これは配列を平坦化する文法です。例えば[[1, 2], [3, 4]]はlist[list[int]]型になるのですが、[*[1, 2], *[3, 4]]と書くと、これは[1, 2, 3, 4]に展開されて型はlist[int]になります。この文法を駆使すると、ほとんどがリストの定義と内包表現で書けます。素晴らしい！

最後にMain.rscにこれらの関数を呼び出すための関数を定義します。

Main.rsc

module Main

import Load;
import Compile;
import Assembly;

Code compile(str s) = compileProg(load(s));
Code compile(loc l) = compileProg(load(l));

今回実装したソースコード

https://github.com/whtsht/rascal-func

examplesディレクトリ下にfib.funcとfact.funcがあります。どのようなコードが生成されるか見たい場合はrascalをREPLを起動して確認できます。

$ rascal
rascal>import Main;
rascal>compile(|cwd:///examples/fib.func|);
map[str, list[Instr]]: ("fib":[
    pushId("n"),
    setVal(),
    pushId("n"),
    getVal(),
    pushNat(2),
    lt(),
    goIf("L1"),
    pushId("n"),
    getVal(),
    pushNat(1),
    sub(),
    pushId("fib"),
    call(),
    pushId("n"),
    getVal(),
    pushNat(2),
    sub(),
    pushId("fib"),
    call(),
    add(),
    go("L2"),
    label("L1"),
    pushId("n"),
    getVal(),
    label("L2")
  ])

おわりに

Rascalでコンパイラを作りました。

配列の平坦化や柔軟なパターンマッチなどが書きやすいだけで、コンパイラ書くときの快適さが他言語とかなり違うと私は思いました。実際に比較してみたいです。

今回は変数が宣言されているかどうかなどのソースコードの検証を全くしていません。Rascal公式サイトの型チェックの例などを参考にすればできると思うので、興味のある方はぜひ。

参考文献

GitHubで編集を提案