


🚀 What is GPTCache?

ChatGPT and various large language models (LLMs) boast incredible versatility, enabling the development of a wide range of applications. However, as your application grows in popularity and encounters higher traffic levels, the expenses related to LLM API calls can become substantial. Additionally, LLM services might exhibit slow response times, especially when dealing with a significant number of requests.

To tackle this challenge, we have created GPTCache, a project dedicated to building a semantic cache for storing LLM responses.


The documentation has a Quick Start, so let's start there.


!pip install -q gptcache

Load the OpenAI API key into an environment variable. Colaboratory now lets you set environment variables, so I tried using that.

import os
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

First, the case of using the OpenAI API the normal way

import time
import openai

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

question = 'what‘s github?'

# OpenAI API original usage
start_time = time.time()
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',  # assumed model name; use whichever chat model you have access to
    messages=[
        {
            'role': 'user',
            'content': question
        }
    ],
)
print(f'Question: {question}')
print("Time consuming: {:.2f}s".format(time.time() - start_time))
print(f'Answer: {response_text(response)}\n')


Question: what‘s github?
Time consuming: 3.11s
Answer: GitHub is a web-based platform that allows developers to collaborate on and share code. It provides a version control system called Git, which allows multiple people to work on a project simultaneously and track changes made to the code. GitHub also offers features like issue tracking, project management, and code review, making it a popular platform for software development. It is widely used in the open-source community and by many companies for collaborative software development.


import time

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

# To use GPTCache, that's all you need
# -------------------------------------------------
from gptcache import cache
from gptcache.adapter import openai

cache.init()            # default settings: exact-match cache
cache.set_openai_key()  # reads OPENAI_API_KEY from the environment
# -------------------------------------------------

question = "what's github"
for _ in range(2):
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',  # assumed model name
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')
Cache loading.....
Question: what's github
Time consuming: 2.72s
Answer: GitHub is a web-based platform for version control and collaboration that allows developers to work together on projects, record changes made to code, and share their work with others. It uses the Git version control system, which allows multiple developers to make changes to a project simultaneously, while keeping track of those changes and allowing for easy collaboration and merging of code. GitHub provides a graphical interface and various features for managing repositories, tracking issues, and facilitating project collaboration. It is widely used in the software development community for managing and hosting source code.

Question: what's github
Time consuming: 7.84s
Answer: GitHub is a web-based platform for version control and collaboration that allows developers to work together on projects, record changes made to code, and share their work with others. It uses the Git version control system, which allows multiple developers to make changes to a project simultaneously, while keeping track of those changes and allowing for easy collaboration and merging of code. GitHub provides a graphical interface and various features for managing repositories, tracking issues, and facilitating project collaboration. It is widely used in the software development community for managing and hosting source code.



question = "what's Google Colaboratory"


Cache loading.....
Question: what's Google Colaboratory
Time consuming: 2.45s
Answer: Google Colaboratory, also known as Colab, is a free cloud-based Jupyter notebook environment offered by Google. It allows users to write and execute Python code in a web browser without the need for any setup or installation. Colab provides access to powerful computing resources, including GPUs and TPUs, and allows collaborative editing, sharing, and commenting on notebooks. Users can import data, install packages, and run code cells to perform data analysis, deep learning, machine learning, and other data-related tasks conveniently.

Question: what's Google Colaboratory
Time consuming: 0.00s
Answer: Google Colaboratory, also known as Colab, is a free cloud-based Jupyter notebook environment offered by Google. It allows users to write and execute Python code in a web browser without the need for any setup or installation. Colab provides access to powerful computing resources, including GPUs and TPUs, and allows collaborative editing, sharing, and commenting on notebooks. Users can import data, install packages, and run code cells to perform data analysis, deep learning, machine learning, and other data-related tasks conveniently.


from gptcache import cache
from gptcache.adapter import openai


gptcache.adapter overrides openai. By default it apparently uses "exact match", meaning the cache is used only when the input exactly matches a cached input. What determines this cache behavior seems to be the data_manager.
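To make the exact-match behavior concrete, here is a minimal sketch in plain Python. It does not use GPTCache at all: ExactMatchCache and fake_llm are hypothetical names made up for illustration, and the real adapter may build its cache key differently.

```python
from typing import Callable, Dict


class ExactMatchCache:
    """Toy cache keyed on the exact prompt string (hypothetical, not GPTCache code)."""

    def __init__(self, llm: Callable[[str], str]) -> None:
        self._llm = llm
        self._store: Dict[str, str] = {}
        self.misses = 0

    def ask(self, prompt: str) -> str:
        # Only a character-for-character identical prompt counts as a hit.
        if prompt in self._store:
            return self._store[prompt]
        self.misses += 1
        answer = self._llm(prompt)
        self._store[prompt] = answer
        return answer


def fake_llm(prompt: str) -> str:
    # Stand-in for a real (slow, paid) API call.
    return f"answer to: {prompt}"


exact_cache = ExactMatchCache(fake_llm)
exact_cache.ask("what's github")                   # miss: first time
exact_cache.ask("what's github")                   # hit: same string
exact_cache.ask("can you explain what GitHub is")  # miss: different string
print(exact_cache.misses)
```

The third call misses even though it asks the same thing, which is the limitation that similarity search is meant to address.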


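  • Get the embeddings of the input and cache them
  • For each input, run a similarity search against the cached embeddings and use the cache if the similarity is above a certain level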
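The two steps above can be sketched without any external library. This is only an illustration under stated assumptions: toy_embed is a bag-of-words stand-in for a real embedding model, the linear scan stands in for a vector store, and SimilarityCache and the 0.5 threshold are hypothetical choices, not GPTCache's.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple


def toy_embed(text: str) -> Dict[str, int]:
    """Bag-of-words vector; a stand-in for a real embedding model."""
    words = text.lower().replace("'", " ").split()
    return dict(Counter(words))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SimilarityCache:
    """Stores (embedding, answer) pairs and returns the closest answer above a threshold."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self._entries: List[Tuple[Dict[str, int], str]] = []

    def put(self, question: str, answer: str) -> None:
        self._entries.append((toy_embed(question), answer))

    def get(self, question: str) -> Optional[str]:
        query = toy_embed(question)
        best_score, best_answer = 0.0, None
        for vector, answer in self._entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


sim_cache = SimilarityCache(threshold=0.5)
sim_cache.put("what is github", "GitHub is a code hosting platform.")
hit = sim_cache.get("can you explain what github is")
miss = sim_cache.get("how do I bake bread")
print(hit, miss)
```

A differently worded question reuses the cached answer, while an unrelated one falls through to the LLM.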


import time

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

from gptcache import cache
from gptcache.adapter import openai
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation

print("Cache loading.....")

onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
# Wire the embedding function, storage, and similarity evaluation into the cache
cache.init(
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()

questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

for question in questions:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',  # assumed model name
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')


Cache loading.....
Question: what's github
Time consuming: 4.07s
Answer: GitHub is a development platform that allows developers to store and manage their code repositories in a centralized and distributed version control system called Git. It provides a web-based interface for collaborating with other developers, managing projects, tracking issues, and organizing code contributions. GitHub also allows users to showcase their work by creating and hosting websites directly from their repositories. It is widely used in the software development industry for open-source projects, as well as for private code repositories and team collaborations.

Question: can you explain what GitHub is
Time consuming: 1.33s
Answer: GitHub is a development platform that allows developers to store and manage their code repositories in a centralized and distributed version control system called Git. It provides a web-based interface for collaborating with other developers, managing projects, tracking issues, and organizing code contributions. GitHub also allows users to showcase their work by creating and hosting websites directly from their repositories. It is widely used in the software development industry for open-source projects, as well as for private code repositories and team collaborations.

Question: can you tell me more about GitHub
Time consuming: 1.07s
Answer: GitHub is a development platform that allows developers to store and manage their code repositories in a centralized and distributed version control system called Git. It provides a web-based interface for collaborating with other developers, managing projects, tracking issues, and organizing code contributions. GitHub also allows users to showcase their work by creating and hosting websites directly from their repositories. It is widely used in the software development industry for open-source projects, as well as for private code repositories and team collaborations.

Question: what is the purpose of GitHub
Time consuming: 0.80s
Answer: GitHub is a development platform that allows developers to store and manage their code repositories in a centralized and distributed version control system called Git. It provides a web-based interface for collaborating with other developers, managing projects, tracking issues, and organizing code contributions. GitHub also allows users to showcase their work by creating and hosting websites directly from their repositories. It is widely used in the software development industry for open-source projects, as well as for private code repositories and team collaborations.



  1. Building the cache
  2. Choosing the LLM


1. Building the cache




  • OpenAI
  • Cohere
  • HuggingFace
  • ONNX
  • data2vec
  • SentenceTransformers
  • FastText
  • PaddleNLP
  • RWKV
  • Timm
  • UForm
  • ViT


For example, using OpenAI's Embedding API would look something like this?

from gptcache.core import cache, Config
from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.embedding import OpenAI

openai_embedding = OpenAI()

data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=openai_embedding.dimension))





  • The original input
  • The prompt
  • The answer
  • The access time


  • SQLite
  • MySQL
  • PostgreSQL
  • MariaDB
  • SQL Server
  • Oracle
  • DuckDB
  • DynamoDB
  • MongoDB
  • Redis



  • Milvus/Zilliz Cloud
  • ChromaDB
  • docarray
  • Redis Vectorstore
  • weaviate
  • hnswlib
  • Qdrant
  • PGVector
  • usearch



  • Local disk
  • S3



from gptcache.manager import get_data_manager, CacheBase, VectorBase
from gptcache.embedding import OpenAI

openai_embedding = OpenAI()

data_manager = get_data_manager(CacheBase("mysql"), VectorBase("milvus", dimension=openai_embedding.dimension), max_size=100, eviction='LRU') 



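  • LRU (Least Recently Used): the entry that has gone unused the longest is evicted first
  • LFU (Least Frequently Used): the entry with the fewest uses is evicted first
  • FIFO (First In, First Out): the oldest entry is evicted first
  • RR (Random Replacement): entries are evicted at random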
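The difference between LRU and FIFO is easiest to see on a tiny example. The sketch below is plain Python built on collections.OrderedDict; TinyCache is a hypothetical class for illustration only and says nothing about how GPTCache implements eviction internally.

```python
from collections import OrderedDict
from typing import Any, List, Optional


class TinyCache:
    """Bounded cache supporting 'LRU' or 'FIFO' eviction (illustrative only)."""

    def __init__(self, max_size: int, eviction: str = "LRU") -> None:
        if eviction not in ("LRU", "FIFO"):
            raise ValueError(f"unsupported eviction policy: {eviction}")
        self.max_size = max_size
        self.eviction = eviction
        self._data: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        if self.eviction == "LRU":
            # A read counts as a use, so the entry moves to the "newest" end.
            self._data.move_to_end(key)
        return self._data[key]

    def put(self, key: str, value: Any) -> None:
        if key in self._data:
            self._data[key] = value
            if self.eviction == "LRU":
                self._data.move_to_end(key)
            return
        if len(self._data) >= self.max_size:
            # Drop the entry at the "oldest" end.
            self._data.popitem(last=False)
        self._data[key] = value

    def keys(self) -> List[str]:
        return list(self._data)


def run(eviction: str) -> List[str]:
    tiny = TinyCache(max_size=2, eviction=eviction)
    tiny.put("a", 1)
    tiny.put("b", 2)
    tiny.get("a")      # "a" is now the most recently used entry
    tiny.put("c", 3)   # cache is full, so one entry must go
    return tiny.keys()


lru_keys = run("LRU")
fifo_keys = run("FIFO")
print(lru_keys, fifo_keys)
```

With LRU the read of "a" saves it and "b" is evicted; with FIFO the read is ignored and "a" goes first because it was inserted first.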



  • user request data
  • cached data
  • user-defined parameters


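  • exact match evaluation: exact match
  • embedding distance evaluation: the distance between embeddings, which is roughly the similarity
  • ONNX model evaluation: uses ONNX to let a model make the judgment. PyTorch and TensorRT are supported.
  • Numpy norm: uses the vector magnitude computed with NumPy, which is roughly a distance?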
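As a rough picture of what an evaluator does with the request data and the cached data, here is a plain-Python sketch of the first two strategies. The function names and the max_distance parameter are assumptions made for this example, and the linear distance-to-score mapping is just one simple choice, not necessarily what GPTCache does.

```python
import math
from typing import Sequence


def exact_match_score(request_text: str, cached_text: str) -> float:
    """1.0 when the strings are identical, 0.0 otherwise."""
    return 1.0 if request_text == cached_text else 0.0


def distance_score(
    request_vec: Sequence[float],
    cached_vec: Sequence[float],
    max_distance: float = 4.0,  # assumed cutoff, chosen for illustration
) -> float:
    """Map Euclidean distance to a score in [0, 1]; 1.0 means identical vectors."""
    distance = math.dist(request_vec, cached_vec)
    clipped = min(distance, max_distance)
    return 1.0 - clipped / max_distance


same = distance_score([1.0, 0.0], [1.0, 0.0])
half = distance_score([0.0, 0.0], [2.0, 0.0])
far = distance_score([0.0, 0.0], [3.0, 4.0])
print(same, half, far)
```

Either function returns a number that can then be compared against a threshold to decide whether the cached answer is good enough.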



  • log_time_func: logs the time taken by steps such as embedding and search.
  • similarity_threshold: sets the threshold for the similarity score.
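What these two options do can be pictured with a small plain-Python sketch. Everything here is hypothetical: log_time, timed, and is_cache_hit are names invented for the example, and the 0.8 default is an assumed value rather than one taken from the GPTCache docs.

```python
import time
from typing import Any, Callable, List, Tuple

# Collected (step name, elapsed seconds) pairs.
timings: List[Tuple[str, float]] = []


def log_time(step_name: str, elapsed: float) -> None:
    """Hypothetical logging callback: just remember what was measured."""
    timings.append((step_name, elapsed))


def timed(step_name: str, func: Callable[..., Any], *args: Any) -> Any:
    """Run func(*args) and report its duration to log_time."""
    start = time.time()
    result = func(*args)
    log_time(step_name, time.time() - start)
    return result


def is_cache_hit(score: float, similarity_threshold: float = 0.8) -> bool:
    """A cached candidate is used only if its score reaches the threshold."""
    return score >= similarity_threshold


# str.split stands in for a real embedding step.
tokens = timed("embedding", str.split, "what's github")
print(timings[0][0], is_cache_hit(0.9), is_cache_hit(0.5))
```

Raising the threshold makes the cache stricter (fewer hits, fewer wrong answers); lowering it does the opposite.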

2. Choosing the LLM


  • OpenAI
  • LangChain
  • Diffuser
  • Dolly
  • llama.cpp
  • MiniGPT-4
  • Replicate
  • Stability SDK

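Besides these, there is something called the API adapter, which apparently lets you handle LLMs that aren't otherwise supported? You probably end up defining it yourself? Not sure, though.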




!pip install -q gptcache redis redis-om openai faiss-cpu tiktoken
!apt update && apt install sqlite3
import os
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
import time
from gptcache import cache
from gptcache.adapter import openai
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation
from gptcache.embedding import OpenAI

def response_text(openai_resp):
    return openai_resp['choices'][0]['message']['content']

print("Cache loading.....")

openai_embedding = OpenAI()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("chromadb", dimension=openai_embedding.dimension))
# Wire the embedding function, storage, and similarity evaluation into the cache
cache.init(
    embedding_func=openai_embedding.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()

questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

for question in questions:
    start_time = time.time()
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',  # assumed model name
        messages=[
            {
                'role': 'user',
                'content': question
            }
        ],
    )
    print(f'Question: {question}')
    print("Time consuming: {:.2f}s".format(time.time() - start_time))
    print(f'Answer: {response_text(response)}\n')


Question: what's github
Time consuming: 2.62s
Answer: GitHub is a web-based platform used for version control and collaboration in software development projects. It provides features like code repository hosting, issue tracking, pull requests, and code review. GitHub allows developers to work together on projects, share and contribute to open-source projects, and keep track of changes made to the codebase. It is widely used in the software development community and has a vast collection of public repositories covering various programming languages and frameworks.

Question: can you explain what GitHub is
Time consuming: 1.44s
Answer: GitHub is a web-based platform used for version control and collaboration in software development projects. It provides features like code repository hosting, issue tracking, pull requests, and code review. GitHub allows developers to work together on projects, share and contribute to open-source projects, and keep track of changes made to the codebase. It is widely used in the software development community and has a vast collection of public repositories covering various programming languages and frameworks.

Question: can you tell me more about GitHub
Time consuming: 0.15s
Answer: GitHub is a web-based platform used for version control and collaboration in software development projects. It provides features like code repository hosting, issue tracking, pull requests, and code review. GitHub allows developers to work together on projects, share and contribute to open-source projects, and keep track of changes made to the codebase. It is widely used in the software development community and has a vast collection of public repositories covering various programming languages and frameworks.

Question: what is the purpose of GitHub
Time consuming: 0.14s
Answer: GitHub is a web-based platform used for version control and collaboration in software development projects. It provides features like code repository hosting, issue tracking, pull requests, and code review. GitHub allows developers to work together on projects, share and contribute to open-source projects, and keep track of changes made to the codebase. It is widely used in the software development community and has a vast collection of public repositories covering various programming languages and frameworks.


Question: what's github
Time consuming: 0.14s
Answer: GitHub is a web-based platform used for version control and collaboration in software development projects. It provides features like code repository hosting, issue tracking, pull requests, and code review. GitHub allows developers to work together on projects, share and contribute to open-source projects, and keep track of changes made to the codebase. It is widely used in the software development community and has a vast collection of public repositories covering various programming languages and frameworks.

Question: can you explain what GitHub is
Time consuming: 0.13s
Answer: GitHub is a web-based platform used for version control and collaboration in software development projects. It provides features like code repository hosting, issue tracking, pull requests, and code review. GitHub allows developers to work together on projects, share and contribute to open-source projects, and keep track of changes made to the codebase. It is widely used in the software development community and has a vast collection of public repositories covering various programming languages and frameworks.

Question: can you tell me more about GitHub
Time consuming: 0.14s
Answer: GitHub is a web-based platform used for version control and collaboration in software development projects. It provides features like code repository hosting, issue tracking, pull requests, and code review. GitHub allows developers to work together on projects, share and contribute to open-source projects, and keep track of changes made to the codebase. It is widely used in the software development community and has a vast collection of public repositories covering various programming languages and frameworks.

Question: what is the purpose of GitHub
Time consuming: 0.14s
Answer: GitHub is a web-based platform used for version control and collaboration in software development projects. It provides features like code repository hosting, issue tracking, pull requests, and code review. GitHub allows developers to work together on projects, share and contribute to open-source projects, and keep track of changes made to the codebase. It is widely used in the software development community and has a vast collection of public repositories covering various programming languages and frameworks.


$ sqlite3 sqlite.db 
SQLite version 3.37.2 2022-01-06 13:25:41
Enter ".help" for usage hints.
sqlite> .tables
gptcache_answer        gptcache_question_dep  gptcache_session     
gptcache_question      gptcache_report   
sqlite> .schema gptcache_question
CREATE TABLE gptcache_question (
        id INTEGER NOT NULL, 
        question VARCHAR(3000) NOT NULL, 
        create_on DATETIME, 
        last_access DATETIME, 
        embedding_data BLOB, 
        deleted INTEGER, 
        PRIMARY KEY (id)
);
sqlite> .schema gptcache_answer 
CREATE TABLE gptcache_answer (
        id INTEGER NOT NULL, 
        question_id INTEGER NOT NULL, 
        answer VARCHAR(3000) NOT NULL, 
        answer_type INTEGER NOT NULL, 
        PRIMARY KEY (id)
);
sqlite> .schema gptcache_question_dep 
CREATE TABLE gptcache_question_dep (
        id INTEGER NOT NULL, 
        question_id INTEGER NOT NULL, 
        dep_name VARCHAR(1000) NOT NULL, 
        dep_data VARCHAR(3000) NOT NULL, 
        dep_type INTEGER NOT NULL, 
        PRIMARY KEY (id)
);
sqlite> .schema gptcache_session
CREATE TABLE gptcache_session (
        id INTEGER NOT NULL, 
        question_id INTEGER NOT NULL, 
        session_id VARCHAR(1000) NOT NULL, 
        session_question VARCHAR(3000) NOT NULL, 
        PRIMARY KEY (id)
);
sqlite> .schema gptcache_report
CREATE TABLE gptcache_report (
        id INTEGER NOT NULL, 
        user_question VARCHAR(3000) NOT NULL, 
        cache_question_id INTEGER NOT NULL, 
        cache_question VARCHAR(3000) NOT NULL, 
        cache_answer VARCHAR(3000) NOT NULL, 
        similarity FLOAT NOT NULL, 
        cache_delta_time FLOAT NOT NULL, 
        cache_time DATETIME, 
        extra VARCHAR(3000), 
        PRIMARY KEY (id)
);



import chromadb

chroma_client = chromadb.Client()
collection = chroma_client.get_collection("gptcache")
collection.get()
{'ids': [], 'embeddings': None, 'metadatas': [], 'documents': []}



questions = [
    "what's github",
    "can you explain what GitHub is",
    "can you tell me more about GitHub",
    "what is the purpose of GitHub"
]

question = questions[0]

start_time = time.time()
response = openai.ChatCompletion.create(
    model='gpt-3.5-turbo',  # assumed model name
    messages=[
        {
            'role': 'user',
            'content': question
        }
    ],
)
print(f'Question: {question}')
print("Time consuming: {:.2f}s".format(time.time() - start_time))
print(f'Answer: {response_text(response)}\n')
Question: what's github
Time consuming: 2.84s
Answer: GitHub is a web-based platform used for version control and collaboration. It allows software developers to track changes to their code, manage and share their code repositories, and collaborate on projects with other developers. GitHub provides features such as code hosting, pull requests, issue tracking, and project management tools. It is a widely used platform for open-source projects and a popular choice among developers for sharing and contributing to code repositories.


sqlite> .mode line
sqlite> select * from gptcache_question;
            id = 1
      question = what's github
     create_on = 2023-11-05 11:57:46.577177
   last_access = 2023-11-05 11:57:46.577183
embedding_data = (binary BLOB; unprintable bytes omitted)
       deleted = 0
sqlite> select * from gptcache_answer;
         id = 1
question_id = 1
     answer = GitHub is a web-based platform used for version control and collaboration. It allows software developers to track changes to their code, manage and share their code repositories, and collaborate on projects with other developers. GitHub provides features such as code hosting, pull requests, issue tracking, and project management tools. It is a widely used platform for open-source projects and a popular choice among developers for sharing and contributing to code repositories.
answer_type = 0


sqlite> select * from gptcache_question_dep;;
sqlite> select * from gptcache_session;
sqlite> select * from gptcache_report;
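The question and answer tables are linked through question_id, so a cached pair can be read back with a join. The sketch below recreates just those two tables in an in-memory SQLite database using Python's standard sqlite3 module; the inserted row is sample data written for this example (with a shortened answer), not a dump of the real sqlite.db.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same column definitions as the .schema output above.
conn.executescript(
    """
    CREATE TABLE gptcache_question (
        id INTEGER NOT NULL,
        question VARCHAR(3000) NOT NULL,
        create_on DATETIME,
        last_access DATETIME,
        embedding_data BLOB,
        deleted INTEGER,
        PRIMARY KEY (id)
    );
    CREATE TABLE gptcache_answer (
        id INTEGER NOT NULL,
        question_id INTEGER NOT NULL,
        answer VARCHAR(3000) NOT NULL,
        answer_type INTEGER NOT NULL,
        PRIMARY KEY (id)
    );
    """
)

# Sample row for illustration only.
conn.execute(
    "INSERT INTO gptcache_question (id, question, deleted) VALUES (1, ?, 0)",
    ("what's github",),
)
conn.execute(
    "INSERT INTO gptcache_answer (id, question_id, answer, answer_type) "
    "VALUES (1, 1, ?, 0)",
    ("GitHub is a web-based platform used for version control and collaboration.",),
)

# Join each non-deleted question to its cached answer.
rows = conn.execute(
    """
    SELECT q.question, a.answer
    FROM gptcache_question AS q
    JOIN gptcache_answer AS a ON a.question_id = q.id
    WHERE q.deleted = 0
    """
).fetchall()
print(rows)
```

The same SELECT can be pasted into the sqlite3 shell against sqlite.db to see which answer belongs to which cached question.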


from pprint import pprint

collection = chroma_client.get_collection("gptcache")
pprint(collection.get(include=["embeddings", "metadatas", "documents"]))
{'documents': [None],
 'embeddings': [[-0.012110523879528046, ...]],
 'ids': ['1'],
 'metadatas': [None]}






Also, when using LangChain, there is a choice between using GPTCache through LangChain's own cache module and using GPTCache's LangChain adapter


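  • LangChain seems to change much more rapidly, so I have a feeling that using GPTCache's LangChain adapter would get painful
  • That said, from skimming the LangChain docs, the cache module doesn't look like it allows very fine-grained configuration, so if you want fine control, maybe GPTCache is the way to go
  • Personally, I hardly write anything with LangChain these days and just use the plain OpenAI API, so GPTCache does have merit for me, since I don't want to write this myself. If I were writing with LangChain, I feel it would be better to use something like Momento Cache directly rather than adding yet another cache abstraction layer with GPTCache.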

