
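Using MLM Scoring to compute the "naturalness" of a sentence with BERT (+ trying it on the Center Test English exam)

Published 2021/05/08

This article is a set of notes from reading Masked Language Model Scoring (ACL 2020) and running its implementation.

What is MLM Scoring?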

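Anyone who has studied even a little English can look at these two sentences

⭕️ I have a dog.
❌ I have a dogs.

and tell that the first is correct and the second is wrong.
One of the things a machine needs in order to understand natural language is this ability to recognize that the first sentence is the correct one.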

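Research has long been done on language models $P_{LM}(W)$: probability distributions that assign higher probability to sentences that humans consider grammatically or semantically "natural". A language model is usually formulated as

$$\log P_{LM}(W) = \sum_{t=1}^{|W|} \log P_{LM} (w_t | W_{<t})$$

that is, as the sum of the log conditional probabilities of predicting the next word from the words produced so far (models of this kind are called autoregressive models). This still holds for recent models: a language model such as GPT-2 is pretrained to maximize this probability on well-formed text.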
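The left-to-right sum above can be sketched in a few lines of Python. This is only an illustration: the probability table is made up, and it conditions on the previous word alone, whereas a real autoregressive model such as GPT-2 conditions on the whole prefix $W_{<t}$.

```python
import math

# Hypothetical toy probabilities P(w_t | previous word); not taken from any real model.
TOY_BIGRAM = {
    ("<s>", "I"): 0.5,
    ("I", "have"): 0.4,
    ("have", "a"): 0.5,
    ("a", "dog"): 0.2,
    ("a", "dogs"): 0.001,
}


def autoregressive_log_prob(words):
    """Sum log P(w_t | w_{t-1}) from left to right, starting at the <s> marker."""
    total = 0.0
    prev = "<s>"
    for word in words:
        total += math.log(TOY_BIGRAM[(prev, word)])
        prev = word
    return total


good = autoregressive_log_prob(["I", "have", "a", "dog"])
bad = autoregressive_log_prob(["I", "have", "a", "dogs"])
print(good, bad)  # the first value is closer to 0
```

Because each term is the log of a probability, every term is at most 0, and one very unlikely word is enough to pull the whole sum down.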

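However, this formulation does not fit masked language models (MLMs) such as BERT and RoBERTa, which are as widely used as GPT-2, because these models are pretrained by predicting words bidirectionally.

This work therefore proposes and evaluates the PLL (pseudo-log-likelihood score) as a way to compute the naturalness of a sentence with MLMs such as BERT and RoBERTa. Unlike the usual language-model score, which considers probabilities from left to right, the PLL is the sum of the log conditional probabilities obtained by hiding each word in turn with a mask and predicting it from all the other words.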

$$\mathrm{PLL}(W) = \sum_{t=1}^{|W|} \log P_{MLM} (w_t | W_{\backslash t})$$
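As a rough sketch of this sum (not the code of the official implementation), the loop below masks one position at a time and adds up the log-probability of the original token. The `masked_log_prob` callable and the toy scorer are hypothetical stand-ins for one forward pass of a masked language model.

```python
import math

MASK = "[MASK]"


def pseudo_log_likelihood(tokens, masked_log_prob):
    """Mask each position in turn and sum the log-probability of the hidden token.

    masked_log_prob(masked_tokens, position, target) is a hypothetical callable
    standing in for one forward pass of a masked language model.
    """
    total = 0.0
    for t, target in enumerate(tokens):
        masked = tokens[:t] + [MASK] + tokens[t + 1:]
        total += masked_log_prob(masked, t, target)
    return total


def toy_masked_log_prob(masked_tokens, position, target):
    # Toy stand-in: penalise "dogs" when the singular article "a" is directly to its left.
    left = masked_tokens[position - 1] if position > 0 else None
    if target == "dogs" and left == "a":
        return math.log(0.001)
    return math.log(0.5)


good = pseudo_log_likelihood(["I", "have", "a", "dog", "."], toy_masked_log_prob)
bad = pseudo_log_likelihood(["I", "have", "a", "dogs", "."], toy_masked_log_prob)
print(good, bad)
```

Written this way, a sentence of $|W|$ tokens needs $|W|$ masked copies, one per position, which is why scoring many long sentences benefits from a GPU.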

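The figure in the paper illustrates this clearly.

The paper shows experimentally that MLM Scoring with PLL picks out the linguistically acceptable sentence with accuracy comparable to, or higher than, the language-model scores of autoregressive models such as GPT-2.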

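Trying MLM Scoring on Google Colab

An implementation of MLM Scoring is publicly available (the authors are from Amazon, and my impression is that the code is tidier than the typical NLP paper implementation).

https://github.com/awslabs/mlm-scoring

Let's actually run it on Google Colaboratory. Turn on the GPU in the runtime settings beforehand.

Setup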

!git clone https://github.com/awslabs/mlm-scoring
%cd mlm-scoring
!pip install -e .
!pip install mxnet-mkl mxnet-cu110
%cd src

Loading the model

This time we use BERT as the masked language model.

from mlm.scorers import MLMScorer, MLMScorerPT, LMScorer
from mlm.models import get_pretrained
import mxnet as mx
ctxs = [mx.gpu(0)]

model, vocab, tokenizer = get_pretrained(ctxs, 'bert-base-en-cased')
scorer = MLMScorer(model, vocab, tokenizer, ctxs)

Now let's compute the scores for the pair from the introduction:

⭕️ I have a dog.
❌ I have a dogs.

scorer.score_sentences([
                        "I have a dog.",
                        "I have a dogs."
                        ])

Output

[-8.910481944680214, -19.832987383008003]

The scores are computed: "I have a dog." gets about -8.91, while "I have a dogs." gets about -19.83.

"I have a dog." is indeed closer to 0, i.e. it has the higher score (each score is a sum of log-probabilities, so it can never be positive).

You can also inspect the score of each token.

scorer.score_sentences([
                        "I have a dog.",
                        "I have a dogs."
                        ], per_token=True)

Output

[[None,
  -0.6229148507118225,
  -2.0005035400390625,
  -0.1885448843240738,
  -5.971728801727295,
  -0.1267898678779602,
  None],
 [None,
  -0.749841034412384,
  -1.6459006071090698,
  -5.528548240661621,
  -11.774364471435547,
  -0.1343330293893814,
  None]]

Here we can see that in "I have a dogs." the token corresponding to "dogs" gets a low value of -11.77. In other words, the model is able to flag "dogs" as "unnatural".
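As a quick check, the per-token values can be added up and compared with the sentence-level scores. The snippet below uses only the numbers printed above; treating the `None` entries as special tokens that receive no score is my assumption.

```python
# Per-token scores copied from the output above.
# Assumption: the None entries are special tokens that are not scored.
per_token = [
    [None, -0.6229148507118225, -2.0005035400390625, -0.1885448843240738,
     -5.971728801727295, -0.1267898678779602, None],
    [None, -0.749841034412384, -1.6459006071090698, -5.528548240661621,
     -11.774364471435547, -0.1343330293893814, None],
]


def sentence_score(token_scores):
    """Sum the per-token scores, skipping None entries."""
    return sum(s for s in token_scores if s is not None)


def lowest_position(token_scores):
    """Index of the lowest-scoring token, ignoring None entries."""
    scored = [(s, i) for i, s in enumerate(token_scores) if s is not None]
    return min(scored)[1]


totals = [sentence_score(row) for row in per_token]
worst = [lowest_position(row) for row in per_token]
print(totals)  # approximately [-8.91, -19.83]
print(worst)   # [4, 4]
```

The sums agree with the sentence-level scores shown earlier, and in both sentences the lowest-scoring position is index 4, the one holding "dog" / "dogs".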

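Solving Center Test English questions with MLM Scoring

To probe the power of MLM Scoring further, let's have it solve English grammar questions.
Plain BERT mask filling can easily handle a blank of about one word, but MLM Scoring should also cope flexibly with questions that have more blanks.

For this experiment I used Section 2 of the English exam from the 2020 Center Test (センター試験, Japan's standardized university entrance exam).

A

Part A consists of simple fill-in-the-blank questions: pick one of four choices.

Question 1

Problem statement (omitted from here on)

Implementation (omitted from here on)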

import numpy as np

# Build one candidate sentence per choice and score them all in one call.
choices = ["apart", "different", "far", "free"]
sentences = [f"Due to the rain, our performance in the game was {choice} from perfect." for choice in choices]
result = np.array(scorer.score_sentences(sentences))
# Indices sorted by score, highest (most natural) first.
result_index = result.argsort()[::-1]

print("| Sentence | Score |")
print("| ---- | ---- |")

for index in result_index:
    print(f"| {sentences[index]} | {round(result[index], 2)} |")

Output

The sentence that is actually correct is shown in bold.

Sentence	Score
Due to the rain, our performance in the game was far from perfect.	-20.44
Due to the rain, our performance in the game was apart from perfect.	-36.79
Due to the rain, our performance in the game was different from perfect.	-40.5
Due to the rain, our performance in the game was free from perfect.	-59.62

As shown, the correct choice gets the highest score.
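Since every question in Part A follows the same pattern, the ranking step can be wrapped in a small helper. This is a sketch of my own, not part of the mlm-scoring package: `rank_choices` and `reported_scores` are hypothetical names, and the stand-in scorer simply replays the scores from the table above so that the example runs without a model.

```python
def rank_choices(template, choices, score_sentences):
    """Fill {choice} in the template with each choice and sort by score, best first.

    score_sentences is any callable that maps a list of sentences to a list of
    scores, for example scorer.score_sentences from the setup above.
    """
    sentences = [template.format(choice=c) for c in choices]
    scores = score_sentences(sentences)
    rows = zip(choices, sentences, scores)
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Stand-in scorer: replays the scores reported above instead of running BERT.
REPORTED = {"apart": -36.79, "different": -40.5, "far": -20.44, "free": -59.62}


def reported_scores(sentences):
    return [next(v for k, v in REPORTED.items() if f" {k} from" in s) for s in sentences]


ranking = rank_choices(
    "Due to the rain, our performance in the game was {choice} from perfect.",
    ["apart", "different", "far", "free"],
    reported_scores,
)
print(ranking[0][0])  # far
```

With the real scorer, passing `scorer.score_sentences` as the third argument should reproduce the ranking in the table.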

Question 2

Sentence	Score
Emergency doors can be found at both ends of this hallway.	-18.27
Emergency doors can be found at either ends of this hallway.	-29.01
Emergency doors can be found at neither ends of this hallway.	-33.29
Emergency doors can be found at each ends of this hallway.	-36.98

Question 3

Sentence	Score
My plans for studying abroad depend on whether I can get a scholarship.	-13.27
My plans for studying abroad depend on that I can get a scholarship.	-18.08
My plans for studying abroad depend on what I can get a scholarship.	-23.65
My plans for studying abroad depend on which I can get a scholarship.	-24.33

Question 4

Sentence	Score
Noriko can speak Swahili and so can Marco.	-22.79
Noriko can speak Swahili and also can Marco.	-38.67
Noriko can speak Swahili and as can Marco.	-39.53
Noriko can speak Swahili and that can Marco.	-45.88

Question 5

Sentence	Score
To say you will go jogging every day is one thing, but to do it is another.	-15.93
To say you will go jogging every day is one thing, but to do it is the other.	-17.65
To say you will go jogging every day is one thing, but to do it is the others.	-29.35
To say you will go jogging every day is one thing, but to do it is one another.	-34.53

Question 6

Sentence	Score
Our boss is a hard worker, but can be difficult to get along with.	-15.84
Our boss is a hard worker, but can be difficult to get away with.	-19.56
Our boss is a hard worker, but can be difficult to get down to.	-23.51
Our boss is a hard worker, but can be difficult to get around to.	-23.78

Question 7

Sentence	Score
When Ayano came to my house, it happened that nobody was at home.	-27.16
When Ayano came to my house, something happened that nobody was at home.	-31.25
When Ayano came to my house, there happened that nobody was at home.	-39.11
When Ayano came to my house, what happened that nobody was at home.	-42.14

Question 8

Sentence	Score
We'll be able to get home on time as long as the roads are clear.	-12.88
We'll be able to get home on time as long as the roads are blocked.	-17.88
We'll be able to get home on time as far as the roads are clear.	-24.77
We'll be able to get home on time as far as the roads are blocked.	-27.76

Question 9

Sentence	Score
I know you said you weren't going to the sports festival, but it is an important event, so please give it a second thought.	-17.8
I know you said you weren't going to the sports festival, but it is an important event, so please give it a first thought.	-29.08
I know you said you weren't going to the sports festival, but it is an important event, so please take it a second thought.	-35.12
I know you said you weren't going to the sports festival, but it is an important event, so please take it a first thought.	-48.5

Question 10

Sentence	Score
I didn't recognize any of the guests except for the two sitting in the back row.	-18.41
I didn't recognize either of the guests except for the two sitting in the back row.	-24.82
I didn't recognize any of the guests rather than the two sitting in the back row.	-29.43
I didn't recognize either of the guests rather than the two sitting in the back row.	-30.31

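Summary of A

All 10 questions were answered correctly.

B

Part B consists of questions where the given words must be put in the right order to fill the blanks.

Question 1

Problem statement (omitted from here on)

Implementation (omitted from here on)

Let's solve it by brute force.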

import itertools

import numpy as np

# Try every ordering of the six chunks (6! = 720 candidate sentences).
choices = ["been", "by", "completed", "have", "the time", "would not"]
sentences = [f"Yes, thank you so much. Without your help, the preparations {' '.join(order)} all the guests arrive this afternoon." for order in itertools.permutations(choices)]
result = np.array(scorer.score_sentences(sentences))
# Indices sorted by score, highest (most natural) first.
result_index = result.argsort()[::-1]

print("Top 5 by score")
print("")
print("| Sentence | Score |")
print("| ---- | ---- |")

for index in result_index[:5]:
    print(f"| {sentences[index]} | {round(result[index], 2)} |")

There are as many as 6! = 720 ways to fill the blanks, but with a GPU the computation finished in about 20 seconds in my environment.
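The candidate count can be checked without running the model. The short sketch below only enumerates the orderings of the six chunks and prints how quickly the count grows with more chunks; it makes no claim about run time.

```python
import itertools
import math

choices = ["been", "by", "completed", "have", "the time", "would not"]

# One candidate phrase per ordering of the six chunks.
candidates = [" ".join(order) for order in itertools.permutations(choices)]
print(len(candidates))  # 720

# Brute force grows factorially with the number of chunks.
for n in range(6, 10):
    print(n, math.factorial(n))
```

Brute force is comfortable at six chunks, but the factorial growth means a few more chunks would multiply the number of sentences to score many times over.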

Output

Top 5 by score

Sentence	Score
Yes, thank you so much. Without your help, the preparations would not have been completed by the time all the guests arrive this afternoon.	-24.3
Yes, thank you so much. Without your help, the preparations completed would not have been by the time all the guests arrive this afternoon.	-37.89
Yes, thank you so much. Without your help, the preparations been would not have completed by the time all the guests arrive this afternoon.	-38.09
Yes, thank you so much. Without your help, the preparations have would not been completed by the time all the guests arrive this afternoon.	-44.11
Yes, thank you so much. Without your help, the preparations been completed would not have by the time all the guests arrive this afternoon.	-44.91

It got the right answer without difficulty.

Question 2

Top 5 by score

Sentence	Score
Actually, he has three, the youngest of whom is studying music in London. I don't think you've met her yet.	-27.9
Actually, he has three, the youngest of whom music is studying in London. I don't think you've met her yet.	-44.22
Actually, he has three, the youngest of whom is studying in music London. I don't think you've met her yet.	-49.91
Actually, he has three, the of whom youngest is studying music in London. I don't think you've met her yet.	-50.68
Actually, he has three, the music youngest of whom is studying in London. I don't think you've met her yet.	-50.86

Question 3

Top 5 by score

Sentence	Score
Yeah, we have to decide now whether to hold it as planned or put it off until some day next week. We should have thought about the chance of rain.	-43.0
Yeah, we have to decide now whether to hold it as planned or put off it until some day next week. We should have thought about the chance of rain.	-45.56
Yeah, we have to decide now whether to hold it or put it off as planned until some day next week. We should have thought about the chance of rain.	-47.98
Yeah, we have to decide now whether to hold it or put off it as planned until some day next week. We should have thought about the chance of rain.	-51.52
Yeah, we have to decide now whether to hold it off or put it as planned until some day next week. We should have thought about the chance of rain.	-53.17

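Summary of B

All 3 questions were answered correctly.

C

Part C asks you to choose phrase patterns so that the conversation reads naturally; a human solving it has to take the conversational context into account.
Here, however, I ignore the context and test whether the answer can be predicted from the naturalness of the sentence alone.

Question 1

Problem statement (omitted from here on)

Implementation (omitted from here on)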

choices1 = ["according to the experts,", "thanks to the neighbors,"]
choices2 = ["it will create less noise", "it will create more jobs"]
choices3 = ["for", "in"]
sentences = []
for choice1 in choices1:
    for choice2 in choices2:
        for choice3 in choices3:
            sentences.append(f"But {choice1} {choice2} {choice3} young people. It will definitely have a positive economic effect on our city.")

result = np.array(scorer.score_sentences(sentences))
result_index = result.argsort()[::-1]

print("| Sentence | Score |")
print("| ---- | ---- |")

for index in result_index:
    print(f"| {sentences[index]} | {round(result[index], 2)} |")

Output

Sentence	Score
But according to the experts, it will create more jobs for young people. It will definitely have a positive economic effect on our city.	-31.86
But thanks to the neighbors, it will create more jobs for young people. It will definitely have a positive economic effect on our city.	-36.14
But according to the experts, it will create more jobs in young people. It will definitely have a positive economic effect on our city.	-37.62
But thanks to the neighbors, it will create more jobs in young people. It will definitely have a positive economic effect on our city.	-42.0
But according to the experts, it will create less noise for young people. It will definitely have a positive economic effect on our city.	-43.38
But thanks to the neighbors, it will create less noise for young people. It will definitely have a positive economic effect on our city.	-46.85
But according to the experts, it will create less noise in young people. It will definitely have a positive economic effect on our city.	-50.53
But thanks to the neighbors, it will create less noise in young people. It will definitely have a positive economic effect on our city.	-54.44

It solved this one without any problem.
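The three nested loops in the implementation can also be written with `itertools.product`, which yields the combinations in the same order. The following is an equivalent sketch of the sentence-building step only; the `template` variable is introduced here for illustration.

```python
import itertools

choices1 = ["according to the experts,", "thanks to the neighbors,"]
choices2 = ["it will create less noise", "it will create more jobs"]
choices3 = ["for", "in"]

template = "But {} {} {} young people. It will definitely have a positive economic effect on our city."

# product() iterates with the last list varying fastest, like the innermost loop.
sentences = [
    template.format(*combo)
    for combo in itertools.product(choices1, choices2, choices3)
]
print(len(sentences))  # 8
```

This form scales to questions with more slots without adding another level of indentation.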

Question 2

Sentence	Score
Very much so. But although he is quite upset, he doesn't object to Emma's plan. They always support each other in the end.	-39.16
Very much so. But because he isn't so upset, he doesn't object to Emma's plan. They always support each other in the end.	-40.82
Very much so. But although he is quite upset, he objects to Emma's plan. They always support each other in the end.	-41.4
Very much so. But because he is quite upset, he objects to Emma's plan. They always support each other in the end.	-41.59
Very much so. But because he is quite upset, he doesn't object to Emma's plan. They always support each other in the end.	-42.53
Very much so. But because he isn't so upset, he objects to Emma's plan. They always support each other in the end.	-43.41
Very much so. But although he isn't so upset, he doesn't object to Emma's plan. They always support each other in the end.	-44.07
Very much so. But although he isn't so upset, he objects to Emma's plan. They always support each other in the end.	-47.11

Question 3

Sentence	Score
Even if you think you do, the drill is meaningless so that we can help each other in case of a disaster. We should think it seriously.	-53.92
Even if you think you do, the drill is essential so that we can help each other in case of a disaster. We should think it seriously.	-59.65
Even if you think you do, the drill is meaningless so that we cannot help each other in case of a disaster. We should think it seriously.	-60.08
Even if you think you do, the drill is meaningless even so we can help each other in case of a disaster. We should think it seriously.	-65.66
Even if you think you do, the drill is essential so that we cannot help each other in case of a disaster. We should think it seriously.	-68.24
Even if you think you do, the drill is meaningless even so we cannot help each other in case of a disaster. We should think it seriously.	-73.07
Even if you think you do, the drill is essential even so we can help each other in case of a disaster. We should think it seriously.	-73.12
Even if you think you do, the drill is essential even so we cannot help each other in case of a disaster. We should think it seriously.	-82.96

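This is the first question it got wrong.
For this question in particular, it may be that the answer cannot be found without the context of the conversation.
So I tried again with the whole conversation as input.

Sentence	Score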
Kenjiro: Why are there fire trucks in front of the school? Ms. Sakamoto: It's because there is a fire drill scheduled for this morning. Kenjiro: Again? We just had one last semester. I already know what to do. Ms.sakamoto: Even if you think you do, the drill is meaningless so that we can help each other in case of a disaster. We should think it seriously.	-105.77
Kenjiro: Why are there fire trucks in front of the school? Ms. Sakamoto: It's because there is a fire drill scheduled for this morning. Kenjiro: Again? We just had one last semester. I already know what to do. Ms.sakamoto: Even if you think you do, the drill is essential so that we can help each other in case of a disaster. We should think it seriously.	-109.39
Kenjiro: Why are there fire trucks in front of the school? Ms. Sakamoto: It's because there is a fire drill scheduled for this morning. Kenjiro: Again? We just had one last semester. I already know what to do. Ms.sakamoto: Even if you think you do, the drill is meaningless so that we cannot help each other in case of a disaster. We should think it seriously.	-113.0
Kenjiro: Why are there fire trucks in front of the school? Ms. Sakamoto: It's because there is a fire drill scheduled for this morning. Kenjiro: Again? We just had one last semester. I already know what to do. Ms.sakamoto: Even if you think you do, the drill is meaningless even so we can help each other in case of a disaster. We should think it seriously.	-119.23
Kenjiro: Why are there fire trucks in front of the school? Ms. Sakamoto: It's because there is a fire drill scheduled for this morning. Kenjiro: Again? We just had one last semester. I already know what to do. Ms.sakamoto: Even if you think you do, the drill is essential so that we cannot help each other in case of a disaster. We should think it seriously.	-120.78
Kenjiro: Why are there fire trucks in front of the school? Ms. Sakamoto: It's because there is a fire drill scheduled for this morning. Kenjiro: Again? We just had one last semester. I already know what to do. Ms.sakamoto: Even if you think you do, the drill is essential even so we can help each other in case of a disaster. We should think it seriously.	-123.09
Kenjiro: Why are there fire trucks in front of the school? Ms. Sakamoto: It's because there is a fire drill scheduled for this morning. Kenjiro: Again? We just had one last semester. I already know what to do. Ms.sakamoto: Even if you think you do, the drill is meaningless even so we cannot help each other in case of a disaster. We should think it seriously.	-127.62
Kenjiro: Why are there fire trucks in front of the school? Ms. Sakamoto: It's because there is a fire drill scheduled for this morning. Kenjiro: Again? We just had one last semester. I already know what to do. Ms.sakamoto: Even if you think you do, the drill is essential even so we cannot help each other in case of a disaster. We should think it seriously.	-135.45

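It still got the wrong answer.

Summary of C

2 of the 3 questions were answered correctly.

Overall summary

MLM Scoring appears to be a good measure of how natural a sentence is in itself. Judging whether a sentence is appropriate in its context, however, seems to be beyond it.
This is not specific to MLM Scoring; it ties directly into what makes dialogue hard as a natural language processing task.

The code I wrote for these experiments is available at the following Google Colab link, so feel free to try things out if you are interested.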
