🚀

Phi-3 miniをローカルで動かしてみる

2024/04/30に公開

Python

GPU

LLM

tech

はじめに

小さいけど高性能と言われているMicrosoftのPhi-3-miniをローカルPCで動かしてみました。

試した環境は以下のとおりです。

Core i9-13900
64GB RAM
GeForce RTX 4090
Windows 11 Pro

やってみる

例によってnpaka先生の記事を写経します。

from transformers import AutoModelForCausalLM, AutoTokenizer

# トークナイザーの準備
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct"
)

# モデルの準備
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-128k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
)

import torch

# プロンプトの準備
chat = [
    {"role": "user", "content": "Who is the cutest in Madoka Magica?"},
]
prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)

# 推論の実行
token_ids = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(
        token_ids.to(model.device),
        do_sample=True,
        temperature=0.6,
        max_new_tokens=256,
    )
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=True)
print(output)

実行結果

まずは日本語で質問してみます。

chat = [
    {"role": "user", "content": "カレーライスとは何ですか？"},
]

生成完了までの時間は約8.3秒でした。

カレーライスとは、カレーとご飯を組み合わせた日本の料理です。カレーは通常、肉や魚、野菜、そして特別なカレールー（カレーの味付け成分）を使用して作られます。これらを温めたご飯に混ぜ合わせた後、塩、酒、味噌、またはその他の具材を加えて味付けします。カレーライスは、日本の伝統的な食事であり、様々な地域で異なる味付け方法がありますが、基本的な構成は一貫しています。

英語で質問してみます。

chat = [
    {"role": "user", "content": "What's the curry rice?"},
]

生成完了までの時間は約4.5秒でした。

Curry rice is a popular dish in many parts of the world, especially in Asian cuisines. It typically consists of rice mixed with a flavorful curry sauce. The sauce can vary in ingredients based on the region or specific recipe, but it often includes a blend of spices, onions, garlic, ginger, and sometimes coconut milk. Vegetables, chicken, or other proteins can also be added. It's known for its aromatic, spicy, and comforting flavor.

英語のほうが生成時間が短く、やや回答精度も高い傾向がみられました。回答精度は学習しているデータに依存していると思います。

Phi-3-mini-4k-instructのほうでやってみた結果は以下です。

chat = [
    {"role": "user", "content": "カレーライスとは何ですか？"},
]

生成完了までの時間は約5.9秒でした。

カレーライスは、日本の伝統的な料理であり、カレーとライスを組み合わせた料理です。基本的なレシピでは、味噌汁を使用してカレー汁を作り、その上にご飯を盛り付けます。しかし、カレーライスにはさまざまな具材やスパイスが使われることが多いです。例えば、牛肉、鶏肉、豚肉などの魚介類、野菜、およびシンプルなものから豊富な具材まで、多様なカレーライスのバリエーションが存在します。また、塩、胡椒、醤油、キャベツオイ

生成完了までの時間は約5.9秒でした。

chat = [
    {"role": "user", "content": "What's the curry rice?"},
]

生成完了までの時間は約7秒でした。

Curry rice is a popular dish found in many countries, particularly in Asia. It consists of rice cooked with a curry sauce, which is made from a blend of spices, herbs, and often includes ingredients like meat, vegetables, or seafood. The specific ingredients and flavors can vary widely depending on the region and personal preferences. Here's a basic recipe for making curry rice:

Ingredients:
- 1 cup of basmati rice
- 2 cups of water or vegetable broth
- 1 tablespoon of vegetable oil
- 1 small onion, finely chopped
- 2 cloves of garlic, minced
- 1 tablespoon of ginger, grated
- 2-3 tablespoons of curry powder (adjust to taste)
- 1 can of coconut milk (14 oz)
- 1 cup of vegetables or protein (e.g., mixed vegetables, chicken, tofu, or shrimp), chopped
- Salt and pepper, to taste
- Fresh cilant

128kに比べてやや生成速度が速いように見えます。

おわりに

しばらく使ってみた感覚だと、この小ささでかなりの精度の日本語が生成できるのは驚異的です。

GeForce RTX 4090のようなハイエンドゲーミング向けのグラボだとかなり実用的な速度で実行できるため、うまくチューニングすれば業務特化のLLMをローカル運用するというコスパの高い使い方が期待できます。

ちょっと用途を考えてみたいですね。

はじめに

やってみる

実行結果

おわりに

Discussion