ローカルLLMをしたい

純粋にchatGPTみたいなものをローカルで動かそうとすると、言語モデル自体に加えて下記のいずれかを使う？

llama + text generation webui
ollama + Open WebUI

llama-cpp-pythonを動かしてみる

yuuri@hotcocoa:~/llama-test$ source .venv/bin/activate
(.venv) yuuri@hotcocoa:~/llama-test$ pip3 install llama-cpp-python

yuuri

llama.cppって何？とreadmeを見ると、Inference of Meta's LLaMA model (and others) in pure C/C++とのこと
https://github.com/ggerganov/llama.cpp

llama.cppがLlamaを始めとするLLMモデルのインターフェース的ななにか？

yuuri

llama.cppから扱えるモデルは、guff形式？というものでHuggingFace(学習済みモデルが登録されているリポジトリみたいな場所)が主に扱っている形式とは異なる

なので、guff形式に変換してからでないとllama.cppからhuggingFaceのモデルを使うことは基本的にできないらしい(変換済みのものも登録はされているようですが・・・)
https://zenn.dev/saldra/articles/2598836233f555
llama.cppとセットでよく出てくる量子化について(llama.cppの大きな目的として、性能面含め様々なタイプのHWでLLMを動かせるようにしたいというのがあるようなので、主たる機能の一つなのですが)、下記の記事でなんとなく理解しました

https://qiita.com/xxyc/items/e8deb8dd2366a5e52345
もとからちっちゃいモデルと、大規模なモデルを量子化して重み付けの桁を丸めたモデルだと後者のほうが性能良いの・・・？

yuuri

量子化について

yuuri

ChatGPTみたいな使い方でなくほかプログラムと連携してLLMを動かしたい・・・のでLangChainというフレームワークを使ってLLMを使おうとしていたのですが、どうもLangChainからとくに変換することなくHuggingFaceのモデルを動かせるらしい

yuuri

AIというか深層学習そのものについて
https://thinkit.co.jp/article/22084

yuuri

langchain-huggingface を使うとサクッとhugging faceで公開されているモデルをローカル/GPUで動かすことができますね
https://huggingface.co/blog/langchain

pip install langchain-huggingface

from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    device=0,
    pipeline_kwargs={
        "max_new_tokens": 200,
        "top_k": 50,
    },
)
print(llm.invoke("にゃーん"))
print(llm.invoke("人生辛い"))

yuuri

コード実行時にモデルのダウンロード処理が走るのですが、huggingface-cliをつかって先にモデルをダウンロード(キャッシュ？)しておくことも可能

 huggingface-cli download cyberagent/open-calm-3b

ダウンロードしたモデル

yuuri

VScodeでWSL上のディレクトリを開けるプラグイン大変便利

yuuri

chat_templateについて

chat_templateが設定されていないLLMをLangChainのChatModelで動かそうとすると怒られるらしい

ValueError: Cannot use chat template functions because tokenizer.chat_template is not set and no template argument was passed! For information about writing templates and setting the tokenizer.chat_template attribute, please see the documentation at https://huggingface.co/docs/transformers/main/en/chat_templating

(tokenizerの役割を正直理解していないですが・・・)