Notes on LLM benchmarks and evaluation
These are rough notes for now, mainly covering Japanese-language resources.
I'll add more over time.
JGLUE
https://github.com/yahoojapan/JGLUE
Benchmark software
https://github.com/microsoft/promptbench
https://github.com/llm-jp/llm-jp-eval
https://github.com/explodinggradients/ragas
https://github.com/VILA-Lab/ATLAS
https://github.com/gkamradt/LLMTest_NeedleInAHaystack
https://github.com/elith-co-jp/langdechat
https://tech.algomatic.jp/entry/2024/04/10/183001
https://github.com/openai/simple-evals
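At their core, harnesses like the ones above run each benchmark question through a model and score the outputs against reference answers. A minimal sketch of that loop, using a hypothetical stand-in model (`dummy_model`, `exact_match_eval`, and the tiny dataset are illustrative names, not part of any of the listed tools):

```python
def dummy_model(prompt: str) -> str:
    # Hypothetical model: returns canned answers for illustration only.
    # A real harness would call an actual LLM API here.
    answers = {"1+1=?": "2", "capital of Japan?": "Tokyo"}
    return answers.get(prompt, "")

def exact_match_eval(dataset, model) -> float:
    """Return accuracy as the fraction of exact-match answers."""
    correct = sum(1 for q, gold in dataset if model(q).strip() == gold)
    return correct / len(dataset)

dataset = [("1+1=?", "2"), ("capital of Japan?", "Tokyo")]
print(exact_match_eval(dataset, dummy_model))  # → 1.0
```

Real harnesses add prompt templating, few-shot examples, and per-task metrics (log-likelihood, F1, LLM-as-judge, etc.) on top of this basic loop.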
Benchmarking in practice
https://qiita.com/wayama_ryousuke/items/a58791cdc2a05847824d
https://qiita.com/wayama_ryousuke/items/105a164e5c80c150caf1
Reference links
https://note.com/wandb_jp/n/n2464e3d85c1a
https://github.com/yuzu-ai/japanese-llm-ranking
https://note.com/npaka/n/n0530f6f9123f
https://note.com/shi3zblog/n/n03bdb67370aa
https://wandb.connpass.com/event/300670/presentation/
https://note.com/shi3zblog/n/n6b2ac5874021
https://nikkie-ftnext.hatenablog.com/entry/lm-evaluation-harness-open-calm-7b-jcommonsenseqa
https://drive.google.com/file/d/1nQlHckrkCag-_hHrMc_5jGsnY9-keBJc/view
https://note.com/npaka/n/n44252e28e70a
https://www.bioerrorlog.work/entry/langcheck-llm-evaluation
https://www.bioerrorlog.work/entry/llm-model-based-eval-openai-practice
https://acro-engineer.hatenablog.com/entry/2023/11/29/000000
https://github.com/llm-jp/awesome-japanese-llm
https://docs.google.com/presentation/d/1MaIQi-AANQCh3TgACtx10eBwViamth-Y/
https://docs.google.com/presentation/d/1EMd6qcJg1yDdyopbvSIp-TkMeDfLQy1T/
https://www.docswell.com/s/DeepLearning2023/538DRY-2023-12-22-105000
Roundup articles
https://note.com/npaka/n/ndec10f78fe2f
https://zenn.dev/pakas/articles/80f797b0c3ae1e
https://qiita.com/s-nagase/items/2baced05d9db8efcf073
https://www.brainpad.co.jp/doors/contents/01_apply_generative_ai_to_business/