🖊️

SKK実装入門 (2) ひらがな入力

2023/05/22に公開

第一回ローマ字 -> ひらがな変換
前回同上
次回まだ

0. はじめに

この記事は第二回です。前回までの内容を前提とします。

さて前回は、ローマ字の入力を 1 文字ずつ受け取りひらがなに変換する仕組みを作りました。
今回は Neovim 上で実際にひらがなを入力できるようにします。

この記事時点での進捗コミットはこちら。

1. 必要な機能

今回実装する機能は大きく以下の 2 つに分けられます。

ユーザーの入力を奪い前回の kanaInput() 関数に送る
現在の入力状態をバッファに反映する

1.1. 入力の奪取

Neovim においてユーザーの入力を監視する方法はいくつかあります。
(Thanks @kuu!)

全部マッピングする (imap/lmap)
InsertCharPre イベントで v:char を使う
vim.on_key() を使う

今回は 1. を採用します。

1.1.1. マッピング

一番シンプルです。
ただし、imap を使うと既存のマッピングを破壊するリスクがあります。

buffer-local は global なマッピングを破壊しませんが、既に buffer-local なマッピングがあれば上書きしてしまいます
- 退避させることも出来ますが、lmap を使う方が素直でしょう
少し複雑ですので実際の使い方は後述します

:h language-mapping

						*language-mapping*
":lmap" defines a mapping that applies to:
- Insert mode
- Command-line mode
- when entering a search pattern
- the argument of the commands that accept a text character, such as "r" and
  "f"
- for the input() line
Generally: Whenever a character is to be typed that is part of the text in the
buffer, not a Vim command character.  "Lang-Arg" isn't really another mode,
it's just used here for this situation.
   The simplest way to load a set of related language mappings is by using the
'keymap' option.  See |45.5|.
   In Insert mode and in Command-line mode the mappings can be disabled with
the CTRL-^ command |i_CTRL-^| |c_CTRL-^|. These commands change the value of
the 'iminsert' option.  When starting to enter a normal command line (not a
search pattern) the mappings are disabled until a CTRL-^ is typed.  The state
last used is remembered for Insert mode and Search patterns separately.  The
state for Insert mode is also used when typing a character as an argument to
command like "f" or "t".
   Language mappings will never be applied to already mapped characters.  They
are only used for typed characters.  This assumes that the language mapping
was already done when typing the mapping. Correspondingly, language mappings
are applied when recording macros, rather than when applying them.

1.1.2. InsertCharPre

Insert モードで文字が入力されたとき、その文字が実際に挿入される前に発火するイベントです。
入力された文字は v:char に格納されており、このイベント中に変更することができます。
今回の用途には幾つかの問題があります。

<BS> で発火しない
v:char に特殊文字が指定できない（literally に扱われる。<BS> が <, B, S, > の4文字になってしまう）

When |v:char| is set to more than one character this text is inserted literally.
Insert モード限定のため、CommandLine モードへの拡張ができない

:h InsertCharPre

							*InsertCharPre*
InsertCharPre			When a character is typed in Insert mode,
				before inserting the char.
				The |v:char| variable indicates the char typed
				and can be changed during the event to insert
				a different character.  When |v:char| is set
				to more than one character this text is
				inserted literally.

				Cannot change the text. |textlock|

1.1.3. vim.on_key()

namespace に紐付けあらゆるキー入力を傍受します。
制御が困難です。

nvim_buf_clear_namespace() でクリアできない
mapping を評価したあとのキーが送信される

:h vim.on_key()

on_key({fn}, {ns_id})                                           *vim.on_key()*
    Adds Lua function {fn} with namespace id {ns_id} as a listener to every,
    yes every, input key.

    The Nvim command-line option |-w| is related but does not support
    callbacks and cannot be toggled dynamically.

    Note:
        {fn} will not be cleared by |nvim_buf_clear_namespace()|

    Note:
        {fn} will receive the keys after mappings have been evaluated

    Parameters: ~
      • {fn}     (function) Callback function. It should take one string
                 argument. On each key press, Nvim passes the key char to
                 fn(). |i_CTRL-V| If {fn} is nil, it removes the callback for
                 the associated {ns_id}
      • {ns_id}  integer? Namespace ID. If nil or 0, generates and returns a
                 new |nvim_create_namespace()| id.

    Return: ~
        (integer) Namespace id associated with {fn}. Or count of all callbacks
        if on_key() is called without arguments.

    Note:
        {fn} will be removed if an error occurs while calling.

1.2. 入力状態をバッファへ反映する

新しい変換結果を入力するだけでは足りず、前回までの入力との差分を削除するなどの操作が必要です。
具体例を挙げます。zenn と入力するとき、バッファは

z
ぜ
ぜn
ぜん

と変わっていくはずです。

e を入力したとき、Neovim には \bぜ が送信されてほしいのです（\b は <BS> です）。

新しい入力による変更に応じて、一致しない部分を削除し、新しい入力を追加する命令を作成するために PreEdit クラスを実装します。

2. 実装

2.1. PreEdit

順番が前後しますが、まずはマッピングする対象の機能を作ろうということで PreEdit クラスからにしましょう。

preedit.lua
local utf8 = require("skk.utf8")

---バッファの変化から差分を生成する
---@class PreEdit
---@field current string
---@field kakutei string
local PreEdit = {}

---@return PreEdit
function PreEdit.new()
  return setmetatable({
    current = "",
    kakutei = "",
  }, { __index = PreEdit })
end

---@param str string
function PreEdit:doKakutei(str)
  self.kakutei = self.kakutei .. str
end

---@param next string
---@return string
function PreEdit:output(next)
  local ret
  if self.kakutei == "" and vim.startswith(next, self.current) then
    ret = next:sub(#self.current)
  else
    local current_len = utf8.len(self.current) --[[@as integer]]
    ret = string.rep("\b", current_len) .. self.kakutei .. next
  end
  self.current = next
  self.kakutei = ""
  return ret
end

return PreEdit

utf8 は Lua 5.3 以降に存在する組み込みライブラリを模倣して私が作ったものです。
ファイル 1 つなので組み込みやすく便利です。
https://github.com/uga-rosa/utf8.nvim
Lua の # で求まる文字列長はバイト長なので、必要な <BS> 数と一致しません。
なので文字数を求めるために utf8.len() を使います。

こんな感じに動きます。
SKK におけるひらがなの直接入力をエミュっています。
output() には Context から現在の入力状態が渡されます。
変換が確定したときは doKakutei() で kakutei フィールドに積みます（output()が呼ばれるまで保持されます）。

preedit_spec.lua
local PreEdit = require("skk.preedit")

describe("preedit test", function()
  it("normal", function()
    local preEdit = PreEdit.new()
    -- input 'ひら'
    assert.equals("h", preEdit:output("h"))
    preEdit:doKakutei("ひ")
    assert.equals("\bひ", preEdit:output(""))
    assert.equals("r", preEdit:output("r"))
    preEdit:doKakutei("ら")
    assert.equals("\bら", preEdit:output(""))
  end)

  it("emoji", function()
    local preEdit = PreEdit.new()
    assert.equals("💩", preEdit:output("💩"))
    assert.equals("\b🚽", preEdit:output("🚽"))
    assert.equals("\b🍦", preEdit:output("🍦"))
  end)
end)

PreEdit は Context で管理しますので、フィールドに追加します。

2.2. 実際の入力状態を管理する InputState

今はまだいいですが、今後変換を実装する上で全てを Context に任せておくのは複雑になるでしょう。
そこで実際の入力状態を管理する InputState を実装します。

実際の と付けたのは、差分だけを管理する PreEdit との対比です。

Context の feed フィールドを分離します。

fixed フィールドは削除します。
- SKK では事前に変換開始を指定しなければ、即座に確定されます。
- 前回用意したのはテストのための仮バッファとして使うためです。

state.lua
---@alias State InputState

---@alias InputMode "direct"

---@class InputState
---@field type "input"
---@field mode InputMode
---@field feed string
local InputState = {}

function InputState.new()
  return setmetatable({
    type = "input",
    mode = "direct",
    feed = "",
  }, { __index = InputState })
end

---@return string
function InputState:toString()
  return self.feed
end

return {
  InputState = InputState,
}

2.3. Context を修正

ここまでの変更を Context に入れるとこのようになります。
toString() メソッドは state が持っている状態を整形して出力するもので、結果は preEdit:output() などに渡されます。

context.lua
local KanaTable = require("skk.kana.kana_table")
local PreEdit = require("skk.preedit")
local InputState = require("skk.state").InputState

---@class Context
---@field kanaTable KanaTable 全ての変換ルール
---@field preEdit PreEdit
---@field state State
---@field tmpResult? KanaRule feedに完全一致する変換ルール
local Context = {}

function Context.new()
  local self = setmetatable({}, { __index = Context })
  self.kanaTable = KanaTable.new()
  self.preEdit = PreEdit.new()
  self.state = InputState.new()
  return self
end

---@param candidates? KanaRule[]
function Context:updateTmpResult(candidates)
  candidates = candidates or self.kanaTable:filter(self.state.feed)
  self.tmpResult = nil
  for _, candidate in ipairs(candidates) do
    if candidate.input == self.state.feed then
      self.tmpResult = candidate
      break
    end
  end
end

---@return string
function Context:toString()
  return self.state:toString()
end

return Context

2.4. kanaInput() を修正

ここまでの変更を input.lua の kanaInput() に適応します。

input.lua
local Input = {}

---@param context Context
---@param result KanaRule
local function acceptResult(context, result)
  local preEdit = context.preEdit
  local state = context.state
  local kana, feed = result.output, result.next

  preEdit:doKakutei(kana)
  state.feed = feed
end

---@param context Context
---@param char string
function Input.kanaInput(context, char)
  local state = context.state
  local input = state.feed .. char
  local candidates = context.kanaTable:filter(input)
  if #candidates == 1 and candidates[1].input == input then
    -- 候補が一つかつ完全一致。確定
    acceptResult(context, candidates[1])
    context:updateTmpResult()
  elseif #candidates > 0 then
    -- 未確定
    state.feed = input
    context:updateTmpResult(candidates)
  elseif context.tmpResult then
    -- 新しい入力によりtmpResultで確定
    acceptResult(context, context.tmpResult)
    context:updateTmpResult()
    Input.kanaInput(context, char)
  else
    -- 入力ミス。context.tmpResultは既にnil
    state.feed = ""
    Input.kanaInput(context, char)
  end
end

return Input

テストはこのように修正されます。
dispatch() は入力を模倣するためのものです。今後を見越して分離しました。
実際は一文字ごとに preEdit:output(context:toString()) が発行され、Neovim のバッファに変更が反映されます。

input_spec.lua
local Context = require("skk.context")
local dispatch = require("skk.testutil").dispatch

---@type Context
local context

---@param input string
---@param expect string
local function test(input, expect)
  dispatch(context, input)
  assert.are.equals(expect, context.preEdit:output(""))
end

describe("Tests for input.lua", function()
  before_each(function()
    context = Context.new()
  end)

  it("single char", function()
    test("ka", "か")
  end)

  it("multiple chars (don't use tmpResult)", function()
    test("ohayou", "おはよう")
  end)

  it("multiple chars (use tmpResult)", function()
    test("amenbo", "あめんぼ")
  end)

  it("multiple chars (use tmpResult and its next)", function()
    test("uwwwa", "うwっわ")
  end)

  it("mistaken input", function()
    test("rkakyra", "から")
  end)
end)

testutil.lua
local input = require("skk.input")

local M = {}

---@param context Context
---@param keys string
function M.dispatch(context, keys)
  for key in vim.gsplit(keys, "") do
    input.kanaInput(context, key)
  end
end

return M

2.5. キー入力全般を管理する handleKey()

context の保持や今後 SKK の各種機能を実装していくことを考えると、kanaInput() を直接マッピングするのはあまり良い選択とは言えないでしょう。
そこで handle() と handleKey() を実装します。
handle() は init.lua から公開するための関数で、実際の分岐処理は keymap.lua の handleKey() に移譲します。

init.lua
local Context = require("skk.context")
local Keymap = require("skk.keymap")

local context = Context.new()

local M = {}

---@param key string
---@return string
function M.handle(key)
  Keymap.handleKey(context, key)
  local output = context.preEdit:output(context:toString())
  return output
end

return M

keymap.lua
local Input = require("skk.input")

local Keymap = {}

local keyMaps = {
  input = setmetatable({}, {
    __index = function()
      return Input.kanaInput
    end,
  }),
}

---@param context Context
---@param key string
function Keymap.handleKey(context, key)
  keyMaps[context.state.type][key](context, key)
end

return Keymap

KeyMaps が少し冗長に見えるかもしれませんが、henkanState もそのうち実装するのでこうしておきます。

あとは handle() を expr でマッピングすれば動くはずです。
ということで lmap の話をしましょう。

2.6. language-mapping

language-mapping は、ただ :lmap でマッピングを追加するだけでは機能しません。
一つ以上のマッピングを定義した状態で、<C-^> を送信する必要があります。

						*i_CTRL-^*
CTRL-^		Toggle the use of typing language characters.
		When language |:lmap| mappings are defined:
		- If 'iminsert' is 1 (langmap mappings used) it becomes 0 (no
		  langmap mappings used).
		- If 'iminsert' has another value it becomes 1, thus langmap
		  mappings are enabled.
		When no language mappings are defined:
		- If 'iminsert' is 2 (Input Method used) it becomes 0 (no
		  Input Method used).
		- If 'iminsert' has another value it becomes 2, thus the Input
		  Method is enabled.
		When set to 1, the value of the "b:keymap_name" variable, the
		'keymap' option or "<lang>" appears in the status line.
		The language mappings are normally used to type characters
		that are different from what the keyboard produces.  The
		'keymap' option can be used to install a whole number of them.

という訳で、SKK を有効化するための enable() (toggle()) と、それを呼び出す Neovim 側のマッピングはこのようになります。
とりあえずひらがなの入力が出来ればいいので、小文字の a-z を奪うようにしてみました。

init.lua
local Context = require("skk.context")
local Keymap = require("skk.keymap")

local context = Context.new()

local M = {}

---@param key string
---@return string
function M.handle(key)
  Keymap.handleKey(context, key)
  local output = context.preEdit:output(context:toString())
  return output
end

local keys = vim.split("abcdefghijklmnopqrstuvwxyz", "")

---@return string
function M.enable()
  if vim.bo.iminsert ~= 1 then
    for _, lhs in ipairs(keys) do
      vim.keymap.set("l", lhs, function()
        return M.handle(lhs)
      end, { buffer = true, silent = true, expr = true })
    end
    return "<C-^>"
  else
    return ""
  end
end

---@return string
function M.disable()
  if vim.bo.iminsert == 1 then
    for _, lhs in ipairs(keys) do
      vim.keymap.del("l", lhs, { buffer = true })
    end
    return "<C-^>"
  else
    return ""
  end
end

---@return string
function M.toggle()
  if vim.bo.iminsert == 1 then
    return M.disable()
  else
    return M.enable()
  end
end

return M

plugin/skk-learning.lua
if vim.g.loaded_skk_learning then
  return
end
vim.g.loaded_skk_learning = true

vim.keymap.set("i", "<C-j>", require("skk").toggle, { expr = true })

これで、インサートモードに入り <C-j> をタイプすれば有効化されます。toggle() なので、無効化したいときも同様です。
hiragana と入力したら ひらがな とちゃんと出力されました！成功です！

3. おわり

今回はここまでで終了とします。
無事ひらがなの入力が出来るようになりました！

次回は SKK 辞書の文法について調べ、読み込んでみます。変換の前準備ですね。

GitHubで編集を提案

Discussion

ログインするとコメントできます