Trying out "Phoenix", an LLM tracing, evaluation, and analysis tool
It was introduced in the following article.
Phoenix provides MLOps and LLMOps insights at lightning speed with zero-config observability. Phoenix provides a notebook-first experience for monitoring your models and LLM Applications by providing:
- LLM Traces - Trace through the execution of your LLM Application to understand the internals of your LLM Application and to troubleshoot problems related to things like retrieval and tool execution.
- LLM Evals - Leverage the power of large language models to evaluate your generative model or application's relevance, toxicity, and more.
- Embedding Analysis - Explore embedding point-clouds and identify clusters of high drift and performance degradation.
- RAG Analysis - Visualize your generative application's search and retrieval process to improve your retrieval-augmented generation.
- Structured Data Analysis - Statistically analyze your structured data by performing A/B analysis, temporal drift analysis, and more.
Environment
pyenv + pyenv-virtualenv, with Python 3.10.13.
$ pyenv virtualenv 3.10.13 phoenix
$ mkdir phoenix && cd phoenix
$ pyenv local phoenix
$ pip install jupyterlab ipywidgets
$ jupyter-lab --ip='0.0.0.0' --NotebookApp.token=''
From here on, everything is done in JupyterLab.
Scope
I'll try the following:
- Embedding Analysis
- RAG Analysis
If I have time, I may also try LLM Relevance and Traces.
Also, wherever possible, I'd like to test with Japanese datasets.