Weaviate reports "leader not found" on Codespaces + Dev Container

Published 2025/07/14

The short version: getting at the logs was a hassle, but deleting the volume fixed it.

Problem

I had a dev container set up roughly as follows and tried running it in a Codespace.

compose.yaml

name: qux

services:
  app:
    image: qux
    build: .
    depends_on:
      - weaviate
  weaviate:
    command: ["--host", "0.0.0.0", "--port", "8080", "--scheme", "http"]
    image: cr.weaviate.io/semitechnologies/weaviate:1.31.3
    ports:
    - 8080:8080
    - 50051:50051
    volumes:
    - weaviate_data:/var/lib/weaviate
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
      PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
      ENABLE_API_BASED_MODULES: 'true'
    restart: always

volumes:
  weaviate_data:

.devcontainer/compose.yaml

services:
  app:
    image: qux-dev
    build:
      context: .devcontainer/
      dockerfile: Dockerfile
    volumes:
      - .:/workspace:cached
      - devcontainer-data:/home/vscode:cached
    network_mode: host

volumes:
  devcontainer-data:

After startup, curl http://localhost:8080/ worked fine, but http://localhost:8080/v1/schema returned query: failed to find leader after retries: leader not found, which left me stuck.
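To tell this state apart from a healthy one in a script, the response body can be checked for the error text. The following is only a minimal sketch: the function name classify_body is made up for illustration, and it simply looks for the substring quoted above without assuming anything else about the response format.

```shell
#!/bin/sh
# Hypothetical helper: prints "leader_not_found" when the given response body
# contains the Raft error quoted in this article, otherwise prints "ok".
classify_body() {
  case "$1" in
    *"leader not found"*) echo "leader_not_found" ;;
    *) echo "ok" ;;
  esac
}

# Usage against a running instance (needs curl and a reachable Weaviate):
#   classify_body "$(curl -s http://localhost:8080/v1/schema)"
```

A substring match is deliberately loose here, so the check keeps working whether the error arrives as plain text or wrapped in JSON.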

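Checking the logs

The awkward part was checking the logs. VS Code has a command called Codespaces: Export Logs, but its output does not seem to include the containers' standard output or standard error.

The docker command did not appear to be usable either, so I fell back on a primitive approach: write the log to a volume and share that volume with the service I log in to.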

.devcontainer/compose.yaml

services:
  app:
    # same as above
  weaviate:
    entrypoint: ["sh", "-c"]
    command: ["/bin/weaviate --host 0.0.0.0 --port 8080 --scheme http > /var/shared/weaviate.log 2>&1"]
    volumes:
      - devcontainer-data:/var/shared
    environment:
      LOG_LEVEL: 'debug'

volumes:
  devcontainer-data:
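With this setup, devcontainer-data is mounted at /home/vscode in app and at /var/shared in weaviate, so the log should appear as /home/vscode/weaviate.log inside the dev container. A rough sketch for pulling out the relevant lines follows; the function name and the keywords (leader, join, raft) are my own guesses at what is worth filtering on, not something taken from Weaviate's documentation.

```shell
#!/bin/sh
# Path of the shared log as seen from the app container (derived from the
# volume mounts in the compose files above); override LOG_FILE if it differs.
LOG_FILE="${LOG_FILE:-/home/vscode/weaviate.log}"

# Hypothetical helper: print only the lines of the given file that mention
# leader election or cluster joins (case-insensitive). The keyword list is an
# assumption, adjust it to whatever the log actually contains.
filter_cluster_lines() {
  grep -i -E 'leader|join|raft' "$1"
}

# Usage inside the dev container:
#   filter_cluster_lines "$LOG_FILE" | tail -n 50
#   tail -f "$LOG_FILE"    # follow the log live
```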

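Investigation by Claude Code

Searching the Weaviate code for leader not found suggests that it means no leader has been elected by the Raft algorithm. This is a single node, so I would rather it did not care.

One suggested fix is to set CLUSTER_HOSTNAME: 'node1' as an environment variable, but that did not solve the problem; according to the logs, Weaviate attempts leader election either way.

I would like to show those logs here, but I lost the record while leaving things to Claude Code. What I do have is the summary I asked it to write, shown below. In short: Weaviate keeps trying to join exactly the same cluster, so the information that it is in cluster mode probably lives in the persistent data on the volume rather than in the environment variables.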

weaviate_investigation_summary.md

# Weaviate Investigation - July 13, 2025

## Issue
Weaviate logs showing "leader not found" errors and repeated cluster join failures.

## Investigation Results
- Error: `query: failed to find leader after retries: leader not found`
- Pattern: Continuous attempts to join cluster at IP 172.18.0.2:8300 (status 8 failures)
- Frequency: Every ~1 second attempts with consistent failures
- Location: Latest logs show persistent join failures from 08:24:00Z to 08:24:49Z

## Root Cause
Weaviate configured for clustering but running as single node in Docker Compose:
- `CLUSTER_HOSTNAME: 'node1'` was set in compose.yaml
- No other cluster members available
- Weaviate keeps trying to form/join cluster

## Solution Applied
Removed `CLUSTER_HOSTNAME: 'node1'` from compose.yaml environment variables to run Weaviate in standalone mode.

## Status
- ✅ compose.yaml modified (CLUSTER_HOSTNAME removed)
- ✅ Environment restarted 
- ❌ **Problem persists**: Cluster join failures continue after restart (08:27:32Z to 08:27:56Z)

## Additional Investigation Required
- **Root Cause**: Weaviate has persistent cluster state in its data volume
- **Evidence**: Continues attempting to join node `33a833d37ee7` at `172.18.0.2:8300` 
- **Solution**: Clear Weaviate data volume to reset cluster configuration

## Next Steps
1. ⚠️ Stop services and clear Weaviate data volume
2. Restart services to initialize fresh single-node instance
3. Verify successful standalone startup without clustering attempts

Solution

So how do you clean out the volume?

Apparently Full Rebuild Container does it. I tried it, and Weaviate came back healthy.

$ curl http://localhost:8080/v1/schema
{"classes":[]}
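Right after a rebuild, Weaviate may still be starting up, so a single curl can be misleading. Below is a small polling sketch under that assumption; wait_for_schema and its defaults (10 attempts, 1 second apart) are hypothetical choices for illustration, not anything Weaviate prescribes.

```shell
#!/bin/sh
# Hypothetical helper: poll the schema endpoint until it answers with
# something other than the Raft error or an empty body.
# $1: base URL, $2: max attempts (default 10), $3: seconds between attempts (default 1)
wait_for_schema() {
  base_url="$1"
  attempts="${2:-10}"
  delay="${3:-1}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s keeps curl quiet; the body stays empty if the connection fails
    body="$(curl -s "$base_url/v1/schema" || true)"
    case "$body" in
      *"leader not found"*|"") ;;        # not ready yet, try again
      *) echo "$body"; return 0 ;;       # got a usable answer, print it
    esac
    i=$((i + 1))
    sleep "$delay"
  done
  echo "schema endpoint still failing after $attempts attempts" >&2
  return 1
}

# Usage:
#   wait_for_schema http://localhost:8080
```

If the function returns non-zero even after a full rebuild, stale state in the volume is probably not the only cause and the log is worth another look.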

Not having to do anything as tedious as reading the logs myself is wonderful. And that is the end of the story.