Large Language Models [Introduction]

Transformer
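Large Language Models (LLM)
- Matsuo Lab, University of Tokyo Summer School: Large Language Models (LLM) course (松尾研究室東京大学サマースクール LLM 大規模言語モデル講座)
- Introduction to Large Language Models (大規模言語モデル入門)
LLM Leaderboards
- W&B - Nejumi LLM Leaderboard Neo
- HF🤗 - Open LLM Leaderboard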

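Improving training efficiency by improving the attention module
- FlashAttention : memory use grows linearly, not quadratically, with sequence length (the amount of computation is still quadratic)
- FlashAttention-2 : about 2x faster than FlashAttention, and up to 9x faster training compared with a standard attention implementation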
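
The memory claim in the list above can be illustrated with a small sketch. It is not the FlashAttention kernel itself (which is a fused GPU kernel that also tiles the queries); it only shows the underlying numerical idea, assuming plain Python lists: keys and values are processed block by block with a running ("online") softmax, so the full N x N score matrix is never stored. The function names and the `block` parameter are hypothetical and chosen for illustration only.

```python
import math
import random

def naive_attention(Q, K, V):
    """Standard attention: materializes every row of the N x N score matrix."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(q[i] * k[i] for i in range(d)) / math.sqrt(d) for k in K]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out.append([sum(w[j] * V[j][c] for j in range(len(V))) / z
                    for c in range(len(V[0]))])
    return out

def tiled_attention(Q, K, V, block=4):
    """Block-wise attention with an online softmax (illustrative sketch).

    For each query, only `block` scores exist at a time; earlier partial
    results are rescaled whenever a larger score is seen.
    """
    d = len(Q[0])
    dv = len(V[0])
    out = []
    for q in Q:
        m = -math.inf      # running maximum of the scores seen so far
        z = 0.0            # running softmax denominator
        acc = [0.0] * dv   # running weighted sum of value rows
        for start in range(0, len(K), block):
            ks = K[start:start + block]
            vs = V[start:start + block]
            s = [sum(q[i] * k[i] for i in range(d)) / math.sqrt(d) for k in ks]
            m_new = max(m, max(s))
            scale = math.exp(m - m_new)  # rescales the previous partial sums
            w = [math.exp(x - m_new) for x in s]
            z = z * scale + sum(w)
            acc = [a * scale + sum(w[j] * vs[j][c] for j in range(len(vs)))
                   for c, a in enumerate(acc)]
            m = m_new
        out.append([a / z for a in acc])
    return out

# Small random example: both functions should agree up to rounding error.
random.seed(0)
N, D = 10, 4
Q = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
K = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
V = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
max_diff = max(abs(a - b)
               for ra, rb in zip(naive_attention(Q, K, V), tiled_attention(Q, K, V))
               for a, b in zip(ra, rb))
```

Both functions do the same amount of arithmetic; the difference is only in how much intermediate state is kept, which is why the gain is in memory traffic rather than in the operation count.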

Additional resources

Dao-AILab/flash-attention

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

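Speeding up training with distributed training
- DeepSpeed: a framework that speeds up deep learning training and inference
  - Megatron-DeepSpeed: DeepSpeed combined with NVIDIA's Megatron-LM; DeepSpeed features are also integrated into the HF🤗 Transformers Trainer^[1]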
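
As a rough illustration of how DeepSpeed is usually configured, the sketch below builds a minimal ZeRO stage 1 configuration as a Python dict and serializes it to JSON. The keys used (`train_micro_batch_size_per_gpu`, `gradient_accumulation_steps`, `zero_optimization.stage`, `bf16.enabled`) are standard DeepSpeed config keys, but the helper name `build_zero1_config` and all numeric values are assumptions made for this example, not recommendations.

```python
import json

def build_zero1_config(micro_batch_size=4, grad_accum_steps=8, use_bf16=True):
    """Return a minimal DeepSpeed config dict using ZeRO stage 1.

    Hypothetical helper for illustration; the values are placeholders.
    ZeRO stage 1 partitions optimizer states across data-parallel workers.
    """
    return {
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        "zero_optimization": {"stage": 1},
        "bf16": {"enabled": use_bf16},
    }

config = build_zero1_config()
config_json = json.dumps(config, indent=2)

# With the HF Transformers Trainer, a config like this is typically passed
# through the `deepspeed` argument of TrainingArguments, either as a path to
# a JSON file or as a dict (not executed here, to keep the sketch stdlib-only):
#   TrainingArguments(output_dir="out", deepspeed="ds_config.json")
```

Keeping the configuration in a plain dict makes it easy to vary the ZeRO stage or the batch settings per experiment before writing it out.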

Additional resources

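Distributed training techniques that support large language model development (大規模言語モデル開発を支える分散学習技術)

Fully understanding how DeepSpeed (ZeRO-1) works through animation (アニメーションでDeepSpeed (ZeRO1)の仕組みを完全に理解する)

GitHub repo - microsoft/DeepSpeed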

DeepSpeed - Getting Started

Footnotes

Architectural alternatives to the attention module

And more...

Footnotes

Prompt Engineering^[1]^[2]

Footnotes

RAG : Retrieval-Augmented Generation^[1]^[2]^[3]
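
The retrieve-then-generate flow behind RAG can be sketched in a few lines. This is a toy under stated assumptions, not a production pipeline: a bag-of-words cosine similarity stands in for a real embedding model, the document list and the prompt wording are invented for illustration, and the final call to an LLM is left out. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag-of-words term counts (toy stand-in for an embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_prompt(query, docs, k=1):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

# Illustrative documents (assumed for this example only).
DOCS = [
    "DeepSpeed is a framework for faster deep learning training and inference.",
    "FlashAttention reduces the memory used by attention.",
]
prompt = build_prompt("What is DeepSpeed?", DOCS)
# `prompt` would then be sent to an LLM of your choice to generate the answer.
```

The point of the design is that the model's knowledge is supplemented at query time, so the document store can be updated without retraining the model.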

Footnotes

Evaluation methods :

Footnotes