[LLM] Coursera Generative AI with Large Language Models - Week 1

Published on 2023/08/06

Introduction to LLMs and the generative AI project lifecycle

Course Introduction

Instructor: Andrew Ng
Three AWS experts give the lectures
Covers everything from prompt engineering to fine-tuning
The hands-on labs can be completed within the AWS free tier

Introduction Week 1

A conversation about how impressive the transformer is

Generative AI & Models

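Terminology as defined in academia

The text passed to the model is called the prompt; the space available for the prompt is called the context window
The model's output is called the completion
This course also covers how to fine-tune models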

LLM use cases and tasks

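LLMs can handle not only chat but also traditional language tasks
- translation
- text generation
- coding
- named entity recognition
Week 3 covers connecting LLMs to external APIs
As the number of parameters grows, the model's subjective understanding of language (something like its foundational ability) also improves
- this subjective understanding of language is what processes, reasons about, and solves the tasks
Week 2 covers how to fine-tune smaller models
Architecture has been the key to LLM progress over the last few years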

Text generation before transformers

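Before transformers, RNNs were used for text generation
The weakness of RNNs was scalability: they needed a lot of compute and memory
Transformers solved this scalability problem, making it possible to train large models on large datasets
- they scale efficiently on multi-core GPUs
- they process input data in parallel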

Transformers architecture

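Self-attention improved the model's understanding of language
- Self-attention: a mechanism that learns the strength of the relationship between every pair of words in the input sentence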
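To make the "strength of the relationship between every pair of words" concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. The toy vectors and the choice to reuse the same vectors as queries, keys, and values are assumptions for illustration; a real transformer derives them from learned projection matrices.

```python
import math

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over one sequence of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this token against every token in the sentence (itself included).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Toy 3-token sentence with 2-dimensional vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is one token's attention distribution over all tokens in the sentence, which is the "relationship strength" the note refers to.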
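The transformer architecture splits into two parts: the Encoder and the Decoder
The model's inputs must be numbers, so text has to be tokenized first
- choosing a tokenizer is part of the model design
The tokenized input is turned into embeddings
- 512 dimensions in the original paper
- an embedding is a vector that represents the meaning of an individual token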
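The tokenize-then-embed step can be sketched as below. The vocabulary, the whitespace tokenizer, and the random embedding table are all hypothetical stand-ins: real tokenizers usually work on subwords, and real embedding vectors are learned during training.

```python
import random

# Hypothetical toy vocabulary; real tokenizers usually split text into subwords.
vocab = {"<unk>": 0, "the": 1, "teacher": 2, "taught": 3, "student": 4}

def tokenize(text):
    """Map each whitespace-separated word to its token ID (unknown words -> <unk>)."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

d_model = 512  # embedding size used in the original Transformer paper
rng = random.Random(0)
# One vector per vocabulary entry; random here, learned in a real model.
embedding_table = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in vocab]

token_ids = tokenize("The teacher taught the student")
embeddings = [embedding_table[i] for i in token_ids]
print(token_ids)
print(len(embeddings), len(embeddings[0]))
```

Note that both occurrences of "the" map to the same vector, which is why word order has to be supplied separately.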
positional encoding
- a mechanism that lets the model understand word order
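These notes do not say which positional encoding scheme is meant. As one possibility, the sketch below assumes the fixed sinusoidal encoding from the original Transformer paper, where even dimensions use a sine and odd dimensions a cosine of the position at different frequencies.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (variant from the original Transformer paper)."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # i is the even dimension index, so i / d_model equals 2k / d_model.
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
for row in pe:
    print([round(x, 3) for x in row])
```

Each position gets a distinct vector, which is added element-wise to the token embedding so that identical words at different positions no longer look the same to the model.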
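multi-head attention
- a single self-attention head can learn only one kind of relationship between words
- multi-head attention trains several heads in parallel, so the model can learn several kinds of relationships
- examples
  - relationships between the people in a sentence
  - whether words rhyme
  - the activity described in the sentence
softmax
- outputs a vector with one dimension per token in the tokenizer's vocabulary
- the dimension with the largest value is the token the model predicts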
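The softmax step can be sketched as follows. The four-token vocabulary and the raw scores (logits) are made up for illustration; a real vocabulary has tens of thousands of entries.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution over the vocabulary."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["cake", "donut", "banana", "apple"]  # hypothetical 4-token vocabulary
logits = [2.0, 1.0, 0.1, -1.0]                # made-up scores from the final layer
probs = softmax(logits)

# The predicted token is the vocabulary entry with the highest probability.
best_index = max(range(len(probs)), key=probs.__getitem__)
predicted = vocab[best_index]
print(predicted)
```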

Generating text with transformers

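The Encoder processes the tokenized input; the Decoder takes the Encoder's output plus the tokens generated so far and predicts the next token, one token at a time
- e.g. translation into French
  - tokenized input: the English sentence
  - encoder's output: the meaning and structure of the whole English sentence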
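The one-token-at-a-time loop can be sketched like this. `NEXT_TOKEN_SCORES`, the `<sos>`/`<eos>` markers, and `generate` are hypothetical names: the lookup table stands in for a trained decoder, which would really condition on the encoder output and on all previously generated tokens.

```python
# Hypothetical stand-in for a trained decoder: maps the last generated token
# to scores for the next token.
NEXT_TOKEN_SCORES = {
    "<sos>": {"j'aime": 0.9, "le": 0.1},
    "j'aime": {"l'apprentissage": 0.8, "<eos>": 0.2},
    "l'apprentissage": {"automatique": 0.7, "<eos>": 0.3},
    "automatique": {"<eos>": 1.0},
}

def generate(encoder_output, max_new_tokens=10):
    """Greedy autoregressive decoding: feed each predicted token back in."""
    # encoder_output is unused in this toy; a real decoder attends to it.
    generated = ["<sos>"]  # start-of-sequence token kicks off generation
    for _ in range(max_new_tokens):
        scores = NEXT_TOKEN_SCORES[generated[-1]]
        next_token = max(scores, key=scores.get)
        if next_token == "<eos>":  # end-of-sequence token stops generation
            break
        generated.append(next_token)
    return generated[1:]

translation = generate("encoded meaning of: I love machine learning")
print(" ".join(translation))
```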
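Encoder-only models were more common in the past; a classification layer is added on top of the encoder's output
- sentiment analysis
- BERT
Encoder-Decoder models work better for translation
- the input and output sequences can have different lengths
- they are also suited to text generation
- BART
Decoder-only models are the most common today
- GPT
- LLaMA
- BLOOM
- Jurassic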

Prompting and prompt engineering

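in-context learning (ICL): a technique that improves inference by putting not only the question but also question-and-answer pairs in the same format into the context window (prompt)
zero shot inference
- question + supporting text + answer format
- Note added later:
  - the answer format is, for example, "Summary: "
  - even if the question already says "Summarize the following sentences", appending "Summary: " can improve the result
one shot inference
- (question + supporting text + answer) * 1 + (question + supporting text + answer format)
few shot inference
- (question + supporting text + answer) * N + (question + supporting text + answer format) (N>=2)
Smaller models tend to gain more from one-shot and few-shot inference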
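The three prompt shapes above differ only in how many worked examples come before the final question, which a small helper makes explicit. `build_prompt`, its parameters, and the sample texts are hypothetical, meant only to mirror the structure in these notes.

```python
def build_prompt(instruction, examples, query, answer_prefix="Summary:"):
    """Build a zero-, one-, or few-shot prompt.

    examples: list of (text, answer) pairs; 0 pairs = zero-shot,
    1 pair = one-shot, 2 or more = few-shot.
    """
    parts = []
    for text, answer in examples:
        # A complete worked example: question + supporting text + answer.
        parts.append(f"{instruction}\n{text}\n{answer_prefix} {answer}")
    # The real question ends with the answer format and leaves the answer open.
    parts.append(f"{instruction}\n{query}\n{answer_prefix}")
    return "\n\n".join(parts)

instruction = "Summarize the following conversation."
zero_shot = build_prompt(instruction, [], "A: Lunch? B: Sure, at noon.")
one_shot = build_prompt(
    instruction,
    [("A: Is the report done? B: Almost.", "The report is nearly finished.")],
    "A: Lunch? B: Sure, at noon.",
)
print(one_shot)
```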

Generative configuration

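Inference parameters: parameters that take effect at inference time
- max new tokens: the maximum number of tokens to generate
  - generation can stop before the maximum if a stop token is produced
- greedy decoding: always pick the token with the highest predicted probability
  - the most commonly used setting, but it is prone to repeating words
  - suited to generating short text
- random sampling: pick a token at random according to the softmax probabilities
  - top k sampling: sample only from the k tokens with the highest softmax probability
  - top p sampling: sample only from the top-ranked tokens whose cumulative softmax probability adds up to p (p is a probability mass, not a percentage of tokens)
  - temperature: reshapes the softmax probability distribution
    - a higher value flattens the peak; a lower value sharpens it
    - set it higher for more randomness (creativity)
- Note: the items in bold are the parameters that can generally be configured
- Question: when k and p are used together, are they combined as an AND condition?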
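A rough sketch of how temperature, top k, and top p could fit together in one sampling step. `sample_next_token` is a hypothetical helper, not any library's API. It assumes top k is applied first and top p is then applied to what remains, so a token has to pass both filters (an AND-like combination). That is one common ordering; whether a given library does the same is worth checking in its documentation.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Sample a token index from raw scores (temperature must be > 0)."""
    # Temperature rescales the logits before softmax: >1 flattens, <1 sharpens.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate token indices, most probable first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]  # keep only the k most probable tokens
    if top_p is not None:
        # Keep top-ranked tokens until their cumulative probability reaches p.
        # Assumption: the original probabilities are used, not renormalized ones.
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    # Random sampling among the surviving tokens, weighted by probability.
    weights = [probs[i] for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]

rng = random.Random(0)
logits = [2.0, 1.0, 0.1, -1.0]  # made-up scores for a 4-token vocabulary
print(sample_next_token(logits, temperature=0.7, top_k=3, top_p=0.9, rng=rng))
```

Setting `top_k=1` reduces this to greedy decoding, since only the most probable token survives the filter.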
Example of temperature

Generative AI project lifecycle

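Overview of the lifecycle
Scope
- define what is required of the model
  - which task?
  - given that, what model size?
- this also helps reduce compute cost
Select
- decide whether to build a model from scratch or use an existing one
- the decision criteria are explained later in week 1
Adapt and align model
- a training-like step
- Prompt engineering (one-shot or few-shot inference)
  - try this first
- fine-tuning
  - do this if prompt engineering is not enough
  - the method is covered in week 2
- align with human feedback
  - a step to ensure the model behaves in line with human preferences
  - done with reinforcement learning from human feedback (RLHF)
  - RLHF is covered in week 3
- evaluation
  - evaluate the quality of the model's inference and behavior
  - metrics are covered in week 2
- this step is iterative
  - e.g. after fine-tuning, revisit prompt engineering, then evaluate again
Application integration
- optimize the model for inference
- build additional infrastructure to cope with LLM limitations:
  - the tendency to invent information when the model does not know the answer
  - limited ability in complex reasoning
  - limited ability in complex mathematics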

Introduction to AWS labs

A tour of the features used to do the exercises on AWS

Lab 1 walkthrough

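Prompt engineering for a summarization task in Jupyter
The model used is FLAN-T5
Running zero-shot inference
Running zero-shot inference with a different answer format
Running one-shot inference
Running few-shot inference
- in the presenter's experience, going beyond 5 or 6 shots rarely helps much
- in this notebook's example, one-shot is already enough (2-shot brings little further improvement)
Running inference with different inference parameters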

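Impressions

Who you learn from, and what material you learn with, may matter
- learning from standard material and a well-known instructor feels like learning the shared vocabulary of the field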

LLM pre-training and scaling laws

To be continued from here next time
