Automatically Updating GitHub Profile README Statistics (GitHub Actions + Python)

Introduction

Many people list statistics such as their OSS pull request counts and Kaggle activity in their GitHub Profile README (the README.md in the username/username repository).

However, updating these numbers by hand every time a PR is opened or merged is tedious.

In this article, I introduce a setup that updates them automatically every week using GitHub Actions and a Python script.

End Result

Every Monday, GitHub Actions runs automatically and refreshes the following numbers:

  • Total OSS PR count / Merged count
  • PR statistics for specific Organizations (e.g., team-mirai)
  • Kaggle Dataset count / Notebook count
  • Number of notebooks in the repository

How it Works

1. Embedding HTML markers in README.md

Wrap each number you want to update in a pair of HTML comments.

<!-- OSS_STATS_START -->(35 PRs / 14 Merged)<!-- OSS_STATS_END -->

Since the markers are HTML comments, they are not rendered on GitHub. A Python script replaces the text between each pair of markers using a regular expression.
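To make the marker idea concrete, here is a minimal sketch that reads the text currently enclosed by a marker pair. The helper name read_marker is hypothetical and is not part of the original script; it only illustrates how the comment pair delimits a region.

```python
from __future__ import annotations

import re


def read_marker(text: str, marker: str) -> str | None:
    """Return the text currently enclosed by a marker pair, or None if absent."""
    # read_marker is an illustrative helper, not a function from the article's script.
    name = re.escape(marker)
    pattern = rf"<!-- {name}_START -->(.*?)<!-- {name}_END -->"
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1) if match else None


readme = "OSS <!-- OSS_STATS_START -->(35 PRs / 14 Merged)<!-- OSS_STATS_END --> contributions"
print(read_marker(readme, "OSS_STATS"))  # (35 PRs / 14 Merged)
```

Everything outside the two comments is untouched, which is why the markers can sit anywhere in a sentence.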

2. Python Script (scripts/update_readme.py)

Retrieving PR Statistics

Use the gh search prs command to find your PRs in each repository, then count them by state (merged/open/closed).

def search_prs(repos: list[str]) -> dict[str, int]:
    """Get PR counts using gh search prs, one repo at a time."""
    all_prs = []
    for repo in repos:
        output = run([
            "gh", "search", "prs",
            "--author", AUTHOR,
            "--repo", repo,
            "--limit", "200",
            "--json", "state",
        ])
        if not output:
            continue
        all_prs.extend(json.loads(output))

    merged = sum(1 for p in all_prs if p["state"].upper() == "MERGED")
    open_count = sum(1 for p in all_prs if p["state"].upper() == "OPEN")
    closed = sum(1 for p in all_prs if p["state"].upper() == "CLOSED")
    return {
        "total": len(all_prs),
        "merged": merged,
        "open": open_count,
        "closed": closed,
    }

Key points:

  • Since gh search prs uses the GitHub Search API, it can search other users' public repositories with the default GITHUB_TOKEN.
  • With gh pr list, other repositories may be inaccessible because of GITHUB_TOKEN scope restrictions.
  • gh search prs returns state values in lowercase (e.g., "merged"), so normalize them with .upper().
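The search_prs function calls a run helper that is not shown in this article. A minimal sketch of what such a helper could look like follows; the signature, the timeout value, and the error handling are assumptions, chosen so that a failed command yields None and the caller's "if not output" check works.

```python
from __future__ import annotations

import subprocess
import sys


def run(cmd: list[str], timeout: int = 60) -> str | None:
    """Run a command and return its stripped stdout, or None on any failure."""
    # Hypothetical helper: the article's real implementation is not shown.
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (OSError, subprocess.SubprocessError) as exc:
        # Missing binary, non-zero exit status, or timeout.
        print(f"command failed: {cmd[0]}: {exc}", file=sys.stderr)
        return None
    # Treat empty output like a failure so callers can test "if not output".
    return result.stdout.strip() or None


print(run([sys.executable, "-c", "print('hello')"]))  # hello
```

Returning None instead of raising keeps one failing repository or CLI from aborting the whole weekly update.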

Retrieving Kaggle Statistics

def get_kaggle_dataset_count() -> int | None:
    output = run([*kaggle_cmd(), "datasets", "list", "--user", "yasunorim", "--csv"])
    if not output:
        return None
    lines = output.strip().split("\n")
    count = len(lines) - 1  # Exclude CSV header
    return count if count > 0 else None

Key points: The --csv option formats the output as CSV, so the dataset count is the number of lines minus the header. If the API call fails, the function returns None so that the existing value in the README is kept.
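Counting lines works as long as no field contains a line break. As a hedged alternative, the sketch below counts rows with the standard csv module, which also handles quoted fields that span lines. The function name count_csv_rows and the sample column names are assumptions for illustration, not actual Kaggle CLI output.

```python
from __future__ import annotations

import csv
import io


def count_csv_rows(output: str | None) -> int | None:
    """Count data rows in CSV text, excluding the header; None if unusable."""
    if not output:
        return None
    rows = [row for row in csv.reader(io.StringIO(output.strip())) if row]
    count = len(rows) - 1  # the first row is the header
    return count if count > 0 else None


# Made-up sample data: the second record has a title containing a newline.
sample = 'ref,title\nuser/a,"First, dataset"\nuser/b,"Multi\nline title"\n'
print(count_csv_rows(sample))  # 2
```

A plain split("\n") would report 3 for this sample, because the quoted title occupies two physical lines.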

Marker Replacement

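def replace_marker(text: str, marker: str, replacement: str) -> str:
    pattern = rf"(<!-- {re.escape(marker)}_START -->).*?(<!-- {re.escape(marker)}_END -->)"
    # Use a function so a replacement starting with a digit is not read as part of "\1".
    return re.sub(pattern, lambda m: m.group(1) + replacement + m.group(2), text, flags=re.DOTALL)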
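One detail is worth checking when the replacement text is a bare number. In a string template, "\1" followed by a digit is parsed as a different group reference, whereas a function replacement inserts the value literally. The sketch below illustrates this; the marker name COUNT and the sample text are hypothetical.

```python
import re

# COUNT is a hypothetical marker name used only for this demonstration.
PATTERN = r"(<!-- COUNT_START -->).*?(<!-- COUNT_END -->)"
text = "Datasets: <!-- COUNT_START -->3<!-- COUNT_END -->"
new_value = "5"  # replacement text that begins with a digit


def naive(text: str, value: str) -> str:
    # "\1" + "5" forms the template "\15", which re reads as group 15.
    return re.sub(PATTERN, rf"\1{value}\2", text, flags=re.DOTALL)


def literal(text: str, value: str) -> str:
    # A function replacement inserts the value as-is, with no template parsing.
    return re.sub(
        PATTERN, lambda m: m.group(1) + value + m.group(2), text, flags=re.DOTALL
    )


try:
    naive(text, new_value)
except re.error as exc:
    print(f"naive template failed: {exc}")

print(literal(text, new_value))
# Datasets: <!-- COUNT_START -->5<!-- COUNT_END -->
```

A value such as "(35 PRs / 14 Merged)" is safe in either form because it starts with a parenthesis, so the problem only surfaces with purely numeric stats.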

3. Managing Manual Numbers in config.json

Numbers that cannot be retrieved automatically via an API, such as Kaggle medal counts and titles, are kept in a configuration file.

{
  "kaggle_bronze_medals": 8,
  "kaggle_title": "Notebooks Expert"
}
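A minimal sketch of how the script might read this file follows. The function names and the display format are assumptions; only the two keys come from the config shown above.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_config(path: Path) -> dict:
    """Load manually managed values; return an empty dict if the file is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def format_kaggle_line(config: dict) -> str | None:
    """Build display text from config values, or None if a key is missing."""
    # The output format here is hypothetical; adapt it to your README.
    title = config.get("kaggle_title")
    bronze = config.get("kaggle_bronze_medals")
    if title is None or bronze is None:
        return None
    return f"{title} / {bronze} bronze medals"


example = {"kaggle_bronze_medals": 8, "kaggle_title": "Notebooks Expert"}
print(format_kaggle_line(example))  # Notebooks Expert / 8 bronze medals
```

Returning None for a missing key lets the caller skip that marker instead of writing an incomplete line.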

4. GitHub Actions Workflow

name: Update Profile README

on:
  schedule:
    - cron: '0 0 * * 1'  # Every Monday at 00:00 UTC
  workflow_dispatch:       # Allows manual execution

permissions:
  contents: write

jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Configure Kaggle credentials
        env:
          KAGGLE_USERNAME: ${{ secrets.KAGGLE_USERNAME }}
          KAGGLE_KEY: ${{ secrets.KAGGLE_KEY }}
        run: |
          mkdir -p ~/.kaggle
          echo '{"username":"'"$KAGGLE_USERNAME"'","key":"'"$KAGGLE_KEY"'"}' > ~/.kaggle/kaggle.json
          chmod 600 ~/.kaggle/kaggle.json

      - name: Update README stats
        env:
          GH_TOKEN: ${{ github.token }}
        run: python scripts/update_readme.py

      - name: Commit and push if changed
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add README.md
          if git diff --cached --quiet; then
            echo "No changes to commit"
          else
            git commit -m "docs: update profile stats $(date -u +%Y-%m-%d)"
            git push
          fi

Setup Steps

1. Repository Structure

username/username/
├── README.md              # Contains HTML markers
├── config.json            # Manually managed numbers
├── requirements.txt       # kaggle
├── scripts/
│   └── update_readme.py   # Update script
└── .github/workflows/
    └── update-profile.yml # Weekly workflow

2. GitHub Secrets Configuration

When using the Kaggle API, add the following as repository secrets (Settings → Secrets and variables → Actions):

  • KAGGLE_USERNAME: Your Kaggle username
  • KAGGLE_KEY: The key value from ~/.kaggle/kaggle.json

3. Inserting Markers into README.md

Enclose the numbers you want to update with markers:

<!-- OSS_STATS_START -->(35 PRs / 14 Merged)<!-- OSS_STATS_END -->

4. Manual Testing

# Run locally (if logged in to gh CLI)
python scripts/update_readme.py

# Run manually via GitHub Actions
gh workflow run update-profile.yml

Pitfalls

gh pr list vs gh search prs

Initially, I used gh pr list --repo <repo>, but with the GITHUB_TOKEN in GitHub Actions, access to repositories other than the workflow's own was restricted, and every repository returned 0 items.

Because gh search prs uses the GitHub Search API, it can search public repositories with the default token.

Capitalization of state

  • gh pr list --json state → Uppercase ("MERGED", "OPEN", "CLOSED")
  • gh search prs --json state → Lowercase ("merged", "open", "closed")

Normalizing with .upper() makes the comparison work with either command.
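The effect of the normalization can be checked on sample data. The sketch below tallies states case-insensitively; the function name count_states and the sample JSON are made up for illustration.

```python
from __future__ import annotations

import json
from collections import Counter


def count_states(raw_json: str) -> dict[str, int]:
    """Tally PR states case-insensitively from `--json state` style output."""
    prs = json.loads(raw_json)
    counts = Counter(pr["state"].upper() for pr in prs)
    return {
        "total": len(prs),
        "merged": counts["MERGED"],  # Counter returns 0 for absent keys
        "open": counts["OPEN"],
        "closed": counts["CLOSED"],
    }


# Mixed casing, as if results from both commands were combined.
sample = '[{"state": "merged"}, {"state": "OPEN"}, {"state": "merged"}]'
print(count_states(sample))
# {'total': 3, 'merged': 2, 'open': 1, 'closed': 0}
```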

Safety Design for API Failures

Since the Kaggle CLI may not work in certain environments, the script returns None when an API call fails and leaves the existing values in place. Overwriting them with 0 would make the README temporarily show wrong numbers.
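This rule can be sketched as a single update pass that skips any statistic whose value is None. The function name apply_stats and the KAGGLE_DATASETS marker are hypothetical; the real script may structure this differently.

```python
from __future__ import annotations

import re


def apply_stats(text: str, stats: dict[str, str | None]) -> str:
    """Replace each marker's content, skipping stats whose value is None."""
    for marker, value in stats.items():
        if value is None:
            continue  # retrieval failed: leave the existing README text alone
        name = re.escape(marker)
        pattern = rf"(<!-- {name}_START -->).*?(<!-- {name}_END -->)"
        text = re.sub(
            pattern,
            lambda m, v=value: m.group(1) + v + m.group(2),
            text,
            flags=re.DOTALL,
        )
    return text


readme = (
    "PRs <!-- OSS_STATS_START -->(35 PRs / 14 Merged)<!-- OSS_STATS_END -->\n"
    "Datasets <!-- KAGGLE_DATASETS_START -->3<!-- KAGGLE_DATASETS_END -->\n"
)
# Simulate a run where the PR search succeeded but the Kaggle CLI failed.
updated = apply_stats(
    readme, {"OSS_STATS": "(36 PRs / 15 Merged)", "KAGGLE_DATASETS": None}
)
print(updated)
```

In this run the PR line changes while the dataset count keeps its previous value of 3.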

Summary

  • Update numbers anywhere in the README using HTML markers and regular expression replacement.
  • Retrieve PR statistics for public repositories using gh search prs without additional tokens.
  • Keep the existing values when an API call fails.
  • Manage numbers that cannot be retrieved via API in config.json.
  • Run on a weekly schedule, with manual execution available when needed.

I recommend this setup to anyone who includes statistics in their Profile README, as it removes the hassle of manual updates.
