Automating Cross-Repository README Updates with GitHub Actions

Introduction

When you manage multiple repositories on GitHub, you sometimes want data from one repository to appear in the README of another. For example:

  • You want to display statistics for each repository in your profile README.
  • You want to sync a summary of a monorepo to a separate repository.
  • You want to display package versions across multiple repositories.

Manual updates are easy to forget, and the numbers can quickly drift. In this article, I will explain how to implement automated cross-repo README synchronization using GitHub Actions, along with common pitfalls.

Design Pattern: push vs. pull

There are two main approaches to cross-repo synchronization.

Push method (write from the data source to the target repo)

Data Source Repo → (PAT) → Update target repo's README
  • Requires a Personal Access Token (PAT), because GITHUB_TOKEN can only write to the repository the workflow runs in.
  • Managing PAT permissions can be cumbersome.
  • Fine-grained PATs may result in 403 errors due to insufficient permissions.

Pull method (the target repo reads from the data source)

Target Repo → (GITHUB_TOKEN) → Read source repo's README via API
            → (GITHUB_TOKEN) → Update own README
  • No PAT required (everything can be handled by GITHUB_TOKEN).
  • Reading needs no extra permissions as long as the data source is a public repository (a private source would still need a PAT or similar credential).
  • You only need to place the workflow in the target repository.

Conclusion: The pull method is significantly easier. You can avoid all the headaches of issuing, managing, and troubleshooting PAT permissions.

Implementation

1. Prepare HTML comment markers

Wrap the sections you want to auto-update in the target README with markers.

## Stats

<!-- STATS_START -->(10 PRs / 5 Merged)<!-- STATS_END --> across 3 repositories.

Because the script rewrites only the content between the markers, the rest of the README is left untouched.
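As a quick illustration of what "between the markers" means, the sketch below splits a line into the part before the markers, the part inside them, and the part after. The helper name split_on_markers is hypothetical and is not part of the script in the next section.

```python
def split_on_markers(text: str, marker: str) -> tuple[str, str, str]:
    """Return (head including START marker, inner text, END marker plus tail)."""
    start = f"<!-- {marker}_START -->"
    end = f"<!-- {marker}_END -->"
    head, found_start, rest = text.partition(start)
    inside, found_end, tail = rest.partition(end)
    if not found_start or not found_end:
        raise ValueError(f"markers for {marker!r} not found")
    return head + start, inside, end + tail


line = "<!-- STATS_START -->(10 PRs / 5 Merged)<!-- STATS_END --> across 3 repositories."
head, inside, tail = split_on_markers(line, "STATS")

# Only the inner part is swapped; head and tail are reused unchanged.
updated = head + "(12 PRs / 6 Merged)" + tail
print(inside)
print(updated)
```

Everything in head and tail, including the markers themselves, survives the update, so the next run can find the same span again.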

2. A script to parse the source README

import base64
import re
import subprocess
import sys
from pathlib import Path

README = Path(__file__).resolve().parent.parent / "README.md"


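def run(cmd: list[str]) -> str:
    """Run a command and return its stdout ("" if the command fails)"""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
    if result.returncode != 0:
        # Surface the error instead of trying to decode an error response
        print(result.stderr.strip(), file=sys.stderr)
        return ""
    return result.stdout.strip()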


def fetch_source_readme(owner: str, repo: str) -> str | None:
    """Fetch README via GitHub API (no token needed for public repos)"""
    output = run([
        "gh", "api",
        f"repos/{owner}/{repo}/contents/README.md",
        "--jq", ".content",
    ])
    if not output:
        return None
    return base64.b64decode(output).decode("utf-8")


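def replace_marker(text: str, marker: str, replacement: str) -> str:
    """Replace text between HTML comment markers"""
    name = re.escape(marker)
    pattern = rf"(<!-- {name}_START -->).*?(<!-- {name}_END -->)"
    # A function replacement keeps digits and backslashes in `replacement` literal
    return re.sub(pattern, lambda m: m.group(1) + replacement + m.group(2), text, flags=re.DOTALL)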


def parse_stats(source_text: str) -> dict:
    """Extract stats from source README (assuming table format)"""
    # Example: | **Total** | | **43** | **21** | ... |
    m = re.search(
        r"\| \*\*Total\*\* \|.*?\| \*\*(\d+)\*\* \| \*\*(\d+)\*\*",
        source_text,
    )
    if not m:
        return {}
    return {"total": int(m.group(1)), "merged": int(m.group(2))}


def main():
    source = fetch_source_readme("your-org", "your-source-repo")
    if not source:
        print("Failed to fetch source README", file=sys.stderr)
        sys.exit(1)

    stats = parse_stats(source)
    if not stats:
        print("Failed to parse stats", file=sys.stderr)
        sys.exit(1)

    readme = README.read_text(encoding="utf-8")
    readme = replace_marker(
        readme, "STATS",
        f"({stats['total']} PRs / {stats['merged']} Merged)",
    )
    README.write_text(readme, encoding="utf-8")
    print(f"Updated: {stats['total']} PRs / {stats['merged']} Merged")


if __name__ == "__main__":
    main()
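To see what the regular expression in parse_stats actually accepts, here is a small worked example. The sample table is made up for illustration (only the Total row follows the format shown in the script's comment), and parse_stats is repeated so that the snippet runs on its own.

```python
import re

# Hypothetical source README table; only the Total row matters to the parser.
SAMPLE = """\
| Repo | Lang | PRs | Merged |
|------|------|-----|--------|
| repo-a | Go | 30 | 15 |
| repo-b | Rust | 13 | 6 |
| **Total** | | **43** | **21** |
"""

TOTAL_ROW = re.compile(r"\| \*\*Total\*\* \|.*?\| \*\*(\d+)\*\* \| \*\*(\d+)\*\*")


def parse_stats(source_text: str) -> dict:
    """Return the two bold numbers of the Total row, or {} if it is absent."""
    m = TOTAL_ROW.search(source_text)
    if not m:
        return {}
    return {"total": int(m.group(1)), "merged": int(m.group(2))}


print(parse_stats(SAMPLE))                       # {'total': 43, 'merged': 21}
print(parse_stats("|**Total**||**43**|**21**|"))  # {} because the spacing differs
```

The pattern depends on the exact spacing around the pipes, so a reformatted source table makes the parser return an empty dict. That is why main() exits with an error in that case rather than writing anything to the README.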

3. Workflow

name: Sync README Stats

on:
  schedule:
    # Run after the data source's scheduled update (cron times are in UTC)
    - cron: '30 9 * * 1'
  workflow_dispatch:

permissions:
  contents: write

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Sync stats from source repo
        env:
          GH_TOKEN: ${{ github.token }}
        run: python scripts/sync_stats.py

      - name: Commit and push if changed
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add README.md
          if ! git diff --cached --quiet; then
            git commit -m "docs: sync stats $(date -u +%Y-%m-%d)"
            git push
          fi

Common Pitfalls

403 errors with the PAT method

If you implement the push method, you may run into 403 errors even after granting "All repositories" access and the "Contents: Read and write" permission on a fine-grained PAT.

remote: Permission to user/repo.git denied to user.
fatal: unable to access '...': The requested URL returned error: 403

The same 403 error often occurs when writing through the GitHub Contents API (a PUT request). Because the exact cause is hard to pin down, switching to the pull method is the most reliable fix.

cron timing

If the data source repo is updated on Mondays at 09:00, schedule the sync workflow for 09:30 or later so that the update has time to finish.

# ❌ Same time as data source → High risk of getting stale data
- cron: '0 9 * * 1'

# ✅ Execute after data source updates
- cron: '30 9 * * 1'
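If you want to sanity-check the gap between the two schedules, a rough helper like the following can do it. This is only a sketch: the function names are hypothetical, and it handles just the simple weekly form used above (numeric minute, hour, and day-of-week fields), not ranges, lists, or steps.

```python
WEEK_MINUTES = 7 * 24 * 60


def cron_minute_of_week(expr: str) -> int:
    """Convert 'M H * * DOW' into minutes since Sunday 00:00."""
    minute, hour, _day_of_month, _month, day_of_week = expr.split()
    return int(day_of_week) * 24 * 60 + int(hour) * 60 + int(minute)


def gap_minutes(source_cron: str, sync_cron: str) -> int:
    """Minutes from the source update until the next sync run."""
    return (cron_minute_of_week(sync_cron) - cron_minute_of_week(source_cron)) % WEEK_MINUTES


print(gap_minutes("0 9 * * 1", "0 9 * * 1"))   # 0: both fire at the same time
print(gap_minutes("0 9 * * 1", "30 9 * * 1"))  # 30: sync runs half an hour later
```

Scheduled workflows on GitHub Actions can start later than the requested time when the service is busy, so treat the computed gap as a lower bound and leave some slack.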

Marker design

Make marker names unique for each purpose to avoid collisions.

<!-- PROJECT_STATS_START -->...<!-- PROJECT_STATS_END -->
<!-- BADGE_COUNT_START -->...<!-- BADGE_COUNT_END -->

The replace_marker function replaces only the text between the markers it is given, so the rest of the README structure stays intact.
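With several marker pairs in one README, it can help to fail loudly when a marker has gone missing instead of silently changing nothing. The following is a sketch of that idea; replace_marker_strict and apply_updates are hypothetical names, not functions from the script above.

```python
import re


def replace_marker_strict(text: str, marker: str, replacement: str) -> str:
    """Replace the span for one marker and raise if the marker pair is absent."""
    name = re.escape(marker)
    pattern = re.compile(rf"(<!-- {name}_START -->).*?(<!-- {name}_END -->)", re.DOTALL)
    new_text, count = pattern.subn(
        lambda m: m.group(1) + replacement + m.group(2), text
    )
    if count == 0:
        raise ValueError(f"marker pair {marker!r} not found")
    return new_text


def apply_updates(text: str, updates: dict[str, str]) -> str:
    """Apply one replacement per marker name, leaving other markers alone."""
    for marker, value in updates.items():
        text = replace_marker_strict(text, marker, value)
    return text


readme = (
    "<!-- PROJECT_STATS_START -->old<!-- PROJECT_STATS_END -->\n"
    "<!-- BADGE_COUNT_START -->0<!-- BADGE_COUNT_END -->\n"
)
result = apply_updates(readme, {"PROJECT_STATS": "(43 PRs / 21 Merged)"})
print(result)
```

Here only the PROJECT_STATS span changes and BADGE_COUNT keeps its old value, while a typo in a marker name raises an error in the workflow run instead of producing a no-op commit.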

Summary

| Point | Description |
|---|---|
| Use pull method | Place the workflow in the target repo and handle everything with GITHUB_TOKEN |
| HTML comment markers | Clearly mark update sections to prevent impact on other content |
| Stagger cron jobs | Run synchronization after the data source update completes |
| Single Source of Truth | Centralize the data source and pull from it in other repos |

If you are displaying the same numbers in multiple READMEs, definitely give this a try.
