🐥

Trying Open Interpreter with a Local Model on WSL

Published on 2023/09/16

Python

LLM

Llama

tech

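Introduction

Open Interpreter has a reputation for being seriously impressive. Using it with GPT-4 is probably the standard choice, but you can also load a model on your local PC and use that instead.

This post shows how to install Open Interpreter on WSL under Windows 11 and use it with a local Code Llama model.

The environment I tested on is as follows: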

Core i9-13900
64GB RAM
GeForce RTX 4090
Windows 11 Pro
WSL (Ubuntu 22.04 LTS)

Setup

The prerequisite is that CUDA is already usable on WSL.
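A quick way to sanity-check this prerequisite is to confirm that the usual CUDA command-line tools are visible inside WSL. The sketch below assumes that `nvidia-smi` (shipped with the driver) and `nvcc` (shipped with the CUDA toolkit) are the tools worth looking for; `find_cuda_tools` is a hypothetical helper name and not part of Open Interpreter or llama-cpp-python.

```python
import shutil


def find_cuda_tools(which=shutil.which):
    """Return a mapping of tool name -> resolved path (None if not on PATH)."""
    # Assumption: nvidia-smi indicates a working driver, nvcc an installed toolkit.
    return {tool: which(tool) for tool in ("nvidia-smi", "nvcc")}


if __name__ == "__main__":
    for tool, path in find_cuda_tools().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

Note that `nvcc` is only found once the toolkit's `bin` directory is on `PATH`, so a missing `nvcc` may simply mean the environment variables have not been exported yet.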

To use Code Llama locally with Open Interpreter, start it with the --local flag:

interpreter --local

If llama-cpp-python is not installed, this fails with an error, so set up a CUDA-enabled llama-cpp-python beforehand.

Installing CUDA-enabled llama-cpp-python on WSL does not appear to be well tested, but drawing on a few online references, I was able to install it with the following steps.

# Point the build at the CUDA 12.2 toolkit
export CUDA_HOME=/usr/local/cuda-12.2
export PATH=${CUDA_HOME}/bin:${PATH}
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:$LD_LIBRARY_PATH
# Force a source build with cuBLAS enabled
export FORCE_CMAKE=1
export CMAKE_ARGS=-DLLAMA_CUBLAS=on
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir -vv

python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)"
True

pip install open-interpreter

If python -c "from llama_cpp import GGML_USE_CUBLAS; print(GGML_USE_CUBLAS)" returns False, CUDA support is not enabled. If you run interpreter --local in that state, it will not use the GPU and will max out the CPU instead.
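The same check can be wrapped in a small script that reports the result in words and handles a missing install. This is only a sketch: it assumes the installed llama-cpp-python still exposes the `GGML_USE_CUBLAS` flag used in the one-liner, and `describe_cuda_support` is a hypothetical helper name.

```python
import importlib


def describe_cuda_support(module):
    """Summarize whether a llama_cpp-like module reports a cuBLAS build."""
    # Assumption: the module exposes GGML_USE_CUBLAS, as in the one-liner.
    flag = getattr(module, "GGML_USE_CUBLAS", None)
    if flag is None:
        return "unknown: GGML_USE_CUBLAS is not exposed by this build"
    if flag:
        return "CUDA (cuBLAS) enabled"
    return "CPU only: rebuild with CMAKE_ARGS=-DLLAMA_CUBLAS=on"


if __name__ == "__main__":
    try:
        llama_cpp = importlib.import_module("llama_cpp")
    except ImportError:
        print("llama-cpp-python is not installed")
    else:
        print(describe_cuda_support(llama_cpp))
```

Running this before `interpreter --local` makes it obvious whether the reinstall step actually produced a GPU build.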

Launching it

interpreter --local

Open Interpreter will use Code Llama for local execution. Use your arrow keys to set up the model.

[?] Parameter count (smaller is faster, larger is more capable): 13B
   7B
 > 13B
   34B

[?] Quality (smaller is faster, larger is more capable): Large | Size: 12.9 GB, Estimated RAM usage: 15.4 GB
   Small | Size: 5.1 GB, Estimated RAM usage: 7.6 GB
   Medium | Size: 7.3 GB, Estimated RAM usage: 9.8 GB
 > Large | Size: 12.9 GB, Estimated RAM usage: 15.4 GB
   See More

[?] Use GPU? (Large models might crash on GPU, but will run more quickly) (Y/n): Y

This language model was not found on your system.

Download to `/home/user/.local/share/Open Interpreter/models`?
[?]  (Y/n): Y

Downloading (…)b-instruct.Q8_0.gguf: 100%|██████████████████████████████████████████| 13.8G/13.8G [02:01<00:00, 114MB/s]
Model found at /home/user/.local/share/Open Interpreter/models/codellama-13b-instruct.Q8_0.gguf

▌ Model set to TheBloke/CodeLlama-13B-Instruct-GGUF

Open Interpreter will require approval before running code. Use interpreter -y to bypass this.

Press CTRL-C to exit.
>

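When you run interpreter --local, you choose the model, whether to use the GPU, and so on, as shown above. Here it downloads CodeLlama-13B-Instruct-GGUF. I also tried 34B, but it was painfully slow, so I think 13B is the reasonable choice for a GeForce RTX 4090-class GPU.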
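The size figures printed by the setup prompt can be turned into a rough back-of-the-envelope check. The sketch below copies the 13B estimates from the log above and assumes a 24 GB budget (the VRAM of an RTX 4090) with 2 GB of headroom. Both the `pick_quality` helper and the headroom value are illustrative assumptions, not how Open Interpreter itself selects a model, and treating the tool's "Estimated RAM usage" as a stand-in for GPU memory is also an assumption.

```python
# Estimated RAM usage (GB) for the 13B options, copied from the setup prompt.
ESTIMATED_RAM_GB = {"Small": 7.6, "Medium": 9.8, "Large": 15.4}


def pick_quality(budget_gb, headroom_gb=2.0, options=ESTIMATED_RAM_GB):
    """Return the most memory-hungry option that still fits, or None."""
    usable = budget_gb - headroom_gb
    fitting = {name: ram for name, ram in options.items() if ram <= usable}
    if not fitting:
        return None
    # Among the options that fit, prefer the one with the largest footprint.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    print(pick_quality(24.0))  # assumed budget: 24 GB of VRAM
```

A check like this only says whether a model is likely to fit in memory. It says nothing about speed, which is what ruled out 34B in practice.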

Trying it out

> I want to look up a specific document by _id in the Users collection of a MongoDB database called example and retrieve a single document. Please write TypeScript code for this that runs on Deno. The function name is the string "getUserById", with the following spec. The file name for this function is "getUserById.ts". getUserById takes a string "_id" as its first argument. The database URI is "mongodb://localhost:27017". The Mongo driver to import is https://deno.land/x/mongo@v0.31.1/mod.ts. The type interface for the users collection is as follows. interface UserSchema {_id: ObjectId; no: number; name: string; role: "admin" | "staff"; email: string; createdAt: number; updatedAt: number;}

  import { MongoClient } from "https://deno.land/x/mongo@v0.31.1/mod.ts";

  interface UserSchema {
      _id: ObjectId;
      no: number;
      name: string;
      role: "admin" | "staff";
      email: string;
      createdAt: number;
      updatedAt: number;
  }

  async function getUserById(userId: string): Promise<UserSchema> {
      const client = new MongoClient();
      await client.connect("mongodb://localhost:27017");
      const db = client.database("example");
      const usersCollection = db.collection<UserSchema>("users");
      const user = await usersCollection.findOne({ _id: ObjectId(userId) });
      return user;
  }


  Would you like to run this code? (y/n)

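Conclusion

For a task of this size I can't tell the difference between GPT-4 and Code Llama, but the local model runs considerably faster, roughly 3x by feel. That is only to be expected, but the speed is a pleasure.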
