Automatically Updating GitHub Profile README Statistics (GitHub Actions + Python)
Introduction
Many people include OSS contribution PR counts and Kaggle statistics in their GitHub Profile README (README.md in the username/username repository).
However, it is tedious to manually update these numbers every time a PR is added.
In this article, I will introduce a mechanism for automated weekly updates using GitHub Actions + Python scripts.
Final Image
Every Monday, GitHub Actions runs automatically and refreshes the following numbers:
- Total OSS PR count / Merged count
- PR statistics for specific Organizations (e.g., team-mirai)
- Kaggle Dataset count / Notebook count
- Number of notebooks in the repository
How it Works
1. Embedding HTML markers in README.md
Enclose the numbers you want to update with HTML comments.
<!-- OSS_STATS_START -->(35 PRs / 14 Merged)<!-- OSS_STATS_END -->
Since the markers are HTML comments, they won't be displayed on GitHub. A Python script replaces the text between these markers using regular expressions.
2. Python Script (scripts/update_readme.py)
Retrieving PR Statistics
Use the gh search prs command to search for PRs by repository and aggregate their status (merged/open/closed).
def search_prs(repos: list[str]) -> dict[str, int]:
    """Get PR counts using gh search prs, one repo at a time."""
    all_prs = []
    for repo in repos:
        output = run([
            "gh", "search", "prs",
            "--author", AUTHOR,
            "--repo", repo,
            "--limit", "200",
            "--json", "state",
        ])
        if not output:
            continue
        all_prs.extend(json.loads(output))
    merged = sum(1 for p in all_prs if p["state"].upper() == "MERGED")
    open_count = sum(1 for p in all_prs if p["state"].upper() == "OPEN")
    closed = sum(1 for p in all_prs if p["state"].upper() == "CLOSED")
    return {
        "total": len(all_prs),
        "merged": merged,
        "open": open_count,
        "closed": closed,
    }
Points:
- Since gh search prs uses the GitHub Search API, other public repositories can also be searched using the default GITHUB_TOKEN.
- With gh pr list, you might not be able to access other repositories due to GITHUB_TOKEN scope restrictions.
- Since state values are returned in lowercase (e.g., "merged"), normalize them using .upper().
Retrieving Kaggle Statistics
def get_kaggle_dataset_count() -> int | None:
    output = run([*kaggle_cmd(), "datasets", "list", "--user", "yasunorim", "--csv"])
    if not output:
        return None
    lines = output.strip().split("\n")
    count = len(lines) - 1  # Exclude CSV header
    return count if count > 0 else None
Points: Use the --csv option to format the output as CSV and count the number of lines. If the API fails, return None to maintain existing values.
Marker Replacement
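def replace_marker(text: str, marker: str, replacement: str) -> str:
    name = re.escape(marker)
    pattern = rf"(<!-- {name}_START -->).*?(<!-- {name}_END -->)"
    # A function replacement inserts the value literally; a "\1{replacement}" template
    # misreads a value that starts with a digit (e.g. "\135") as a different group or escape.
    return re.sub(pattern, lambda m: m.group(1) + replacement + m.group(2), text, flags=re.DOTALL)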
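To see the replacement in action, here is a small self-contained sketch. It is a variant of replace_marker that passes a function as the replacement, so that a value starting with a digit or containing a backslash is inserted literally. README_SAMPLE is illustrative data, not the author's README.

```python
import re


def replace_marker(text: str, marker: str, replacement: str) -> str:
    """Replace whatever sits between <!-- X_START --> and <!-- X_END -->."""
    name = re.escape(marker)
    pattern = rf"(<!-- {name}_START -->).*?(<!-- {name}_END -->)"
    # A function replacement inserts the value literally, with no
    # backslash or group-reference processing.
    return re.sub(
        pattern,
        lambda m: m.group(1) + replacement + m.group(2),
        text,
        flags=re.DOTALL,
    )


README_SAMPLE = "OSS: <!-- OSS_STATS_START -->(35 PRs / 14 Merged)<!-- OSS_STATS_END -->"

updated = replace_marker(README_SAMPLE, "OSS_STATS", "36 PRs / 15 Merged")
print(updated)
```

If the marker is absent from the text, re.sub finds no match and the text comes back unchanged, so a missing marker never corrupts the README.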
3. Manage Manual Numbers in config.json
Numbers that cannot be automatically retrieved via API, such as Kaggle medal counts or titles, are managed in a configuration file.
{
  "kaggle_bronze_medals": 8,
  "kaggle_title": "Notebooks Expert"
}
4. GitHub Actions Workflow
name: Update Profile README
on:
  schedule:
    - cron: '0 0 * * 1'  # Every Monday at 00:00 UTC
  workflow_dispatch:  # Allows manual execution
permissions:
  contents: write
jobs:
  update:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Configure Kaggle credentials
        env:
          KAGGLE_USERNAME: ${{ secrets.KAGGLE_USERNAME }}
          KAGGLE_KEY: ${{ secrets.KAGGLE_KEY }}
        run: |
          mkdir -p ~/.kaggle
          echo '{"username":"'"$KAGGLE_USERNAME"'","key":"'"$KAGGLE_KEY"'"}' > ~/.kaggle/kaggle.json
          chmod 600 ~/.kaggle/kaggle.json
      - name: Update README stats
        env:
          GH_TOKEN: ${{ github.token }}
        run: python scripts/update_readme.py
      - name: Commit and push if changed
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add README.md
          if git diff --cached --quiet; then
            echo "No changes to commit"
          else
            git commit -m "docs: update profile stats $(date -u +%Y-%m-%d)"
            git push
          fi
Setup Procedures
1. Repository Structure
username/username/
├── README.md                  # Contains HTML markers
├── config.json                # Manually managed numbers
├── requirements.txt           # kaggle
├── scripts/
│   └── update_readme.py       # Update script
└── .github/workflows/
    └── update-profile.yml     # Weekly workflow
2. GitHub Secrets Configuration
When using the Kaggle API, add the following under Settings → Secrets and variables → Actions in the repository:
- KAGGLE_USERNAME: Your Kaggle username
- KAGGLE_KEY: The key value from ~/.kaggle/kaggle.json
3. Inserting Markers into README.md
Enclose the numbers you want to update with markers:
<!-- OSS_STATS_START -->(35 PRs / 14 Merged)<!-- OSS_STATS_END -->
4. Manual Testing
# Run locally (if logged in to gh CLI)
python scripts/update_readme.py
# Run manually via GitHub Actions
gh workflow run update-profile.yml
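For local testing it can help to preview the change before writing it to disk. A sketch using the standard difflib module is shown below. The preview_diff name is hypothetical and not part of the script described here.

```python
import difflib


def preview_diff(old: str, new: str, name: str = "README.md") -> str:
    """Return a unified diff between the old and new README text."""
    diff = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"{name} (current)",
        tofile=f"{name} (updated)",
    )
    return "".join(diff)


old = "PRs: <!-- OSS_STATS_START -->35<!-- OSS_STATS_END -->\n"
new = "PRs: <!-- OSS_STATS_START -->36<!-- OSS_STATS_END -->\n"
print(preview_diff(old, new) or "No changes")
```

An empty diff means the run would produce no commit, which mirrors the `git diff --cached --quiet` check in the workflow.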
Pitfalls
gh pr list vs gh search prs
Initially, I used gh pr list --repo <repo>, but with the default GITHUB_TOKEN in GitHub Actions, access to repositories other than the one running the workflow is restricted, so every repository returned 0 items.
Since gh search prs uses the GitHub Search API, you can search public repositories using the default token.
Capitalization of state
- gh pr list --json state → Uppercase ("MERGED", "OPEN", "CLOSED")
- gh search prs --json state → Lowercase ("merged", "open", "closed")
It is safer to normalize them using .upper().
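As an illustration of the normalization, the three separate sum(...) passes can also be written as a single tally. This is a sketch with hand-written sample data, and count_states is a hypothetical name.

```python
from collections import Counter


def count_states(prs: list[dict]) -> dict[str, int]:
    """Tally PR states case-insensitively."""
    tally = Counter(p["state"].upper() for p in prs)
    return {
        "total": len(prs),
        "merged": tally["MERGED"],  # Counter returns 0 for missing keys
        "open": tally["OPEN"],
        "closed": tally["CLOSED"],
    }


# Mixed casing, as if results from both commands were combined.
sample = [
    {"state": "merged"},
    {"state": "MERGED"},
    {"state": "open"},
    {"state": "closed"},
]
print(count_states(sample))
```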
Safety Design for API Failures
Since the Kaggle CLI may not work in certain environments, the script is designed to return None upon API failure to maintain the existing values. If you overwrite them with 0, the numbers in your README will temporarily display incorrectly.
Summary
- Update numbers at any desired position in the README using HTML markers + regular expression replacement.
- Retrieve PR statistics for public repositories using gh search prs without additional tokens.
- Safety design that maintains existing values upon API failure.
- Manage numbers that cannot be retrieved via API in config.json.
- Hybrid operation with weekly schedules and manual execution.
This is highly recommended for anyone who includes statistics in their Profile README, as it completely eliminates the hassle of manual updates.