
Trying out "Phoenix", an LLM tracing, evaluation, and analysis tool

kun432

I came across it in the following article.

https://towardsdatascience.com/llm-evals-setup-and-the-metrics-that-matter-2cc27e8e35f3

https://github.com/Arize-ai/phoenix

Phoenix provides MLOps and LLMOps insights at lightning speed with zero-config observability. Phoenix provides a notebook-first experience for monitoring your models and LLM Applications by providing:

  • LLM Traces - Trace through the execution of your LLM Application to understand the internals of your LLM Application and to troubleshoot problems related to things like retrieval and tool execution.
  • LLM Evals - Leverage the power of large language models to evaluate your generative model or application's relevance, toxicity, and more.
  • Embedding Analysis - Explore embedding point-clouds and identify clusters of high drift and performance degradation.
  • RAG Analysis - Visualize your generative application's search and retrieval process to improve your retrieval-augmented generation.
  • Structured Data Analysis - Statistically analyze your structured data by performing A/B analysis, temporal drift analysis, and more.

Environment

Set up with pyenv + pyenv-virtualenv, using Python 3.10.13.

$ pyenv virtualenv 3.10.13 phoenix
$ mkdir phoenix && cd phoenix
$ pyenv local phoenix
$ pip install jupyterlab ipywidgets
$ jupyter-lab --ip='0.0.0.0' --NotebookApp.token=''

Everything from here on is done in JupyterLab.
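
As a quick sanity check before going further, a first notebook cell along these lines can confirm which interpreter the kernel is actually running. This is only a sketch and not part of the original notes; `kernel_info` is a hypothetical helper name introduced here for illustration, and it uses only the standard library.

```python
import sys


def kernel_info():
    """Return (version string, interpreter path) for the running kernel.

    Hypothetical helper for illustration; not part of Phoenix or Jupyter.
    """
    version = ".".join(str(part) for part in sys.version_info[:3])
    return version, sys.executable


version, executable = kernel_info()
print(f"Python {version} at {executable}")
```

If the kernel was started from the `phoenix` virtualenv created above, the version should read 3.10.13 and the path should point into that virtualenv. If it shows something else, the notebook is probably using a different kernel.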

Scope

I plan to try the following:

  • Embedding Analysis
  • RAG Analysis

I may also try LLM Relevance and Traces if I have time.

Also, I'd like to use Japanese datasets wherever possible.