Translated by AI
GitHub Copilot Transitions to Token-Based Billing: Analyzing Real Costs for GPT-5.4 & Copilot CLI
This information is current as of April 2026, based on the official GitHub blog and documentation (dated 2026-04-27).
I received a notification from GitHub stating that "Copilot will transition to usage-based billing starting June 1st."
I have been working within Copilot Pro's monthly limit of 300 premium requests (PRU), so I wanted to know whether this change works in my favor. I took the GPT-5.4 unit prices from the official pricing table and ran an estimate based on my actual workflow.
What changes, and what stays the same
| Item | Old (until May) | New (June onwards) |
|---|---|---|
| Billing unit | Premium Request Unit (PRU) | GitHub AI Credits |
| Calculation method | Number of requests (flat) | Number of tokens × model unit price |
| Pro monthly limit | 300 PRU | Equivalent to $10 in AI Credits |
| Code completion | No credits required | Stays free |
| Copilot Code Review | PRU only | AI Credits + Actions execution time |
Code completion (inline completions and next edit suggestions) does not consume credits; this is unchanged. Credits are consumed by chat, the CLI, agent features, and similar functions.
Monthly budget outlook
1 AI Credit equals $0.01 USD.
- Copilot Pro ($10/month) → 1,000 Credits/month
- Copilot Pro+ ($39/month) → 3,900 Credits/month
Compared with the old "300 PRU" system, the numbers look larger, but how many calls you can actually make depends on the model and the token volume. Since GitHub has published official unit prices per model, you can estimate this in advance.
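Written as a formula, the estimate used in the rest of this article is roughly the following. This is a sketch with my own notation: $T$ are token counts for one call ($T_{\text{in}}$ counts only input tokens not served from cache), $r$ are the per-model rates in credits per 1K tokens, and $B$ is the monthly credit allowance (1,000 for Pro, 3,900 for Pro+).

$$
C_{\text{call}} = \frac{T_{\text{in}}\, r_{\text{in}} + T_{\text{cache}}\, r_{\text{cache}} + T_{\text{out}}\, r_{\text{out}}}{1000},
\qquad
N_{\text{month}} = \frac{B}{C_{\text{call}}}
$$

Any per-call rounding GitHub may apply is not modeled here.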
Cost estimate per GPT-5.4 call
These are the rates that apply when you specify --model gpt-5.4 in the Copilot CLI:
| Token type | Unit price ($/1M) | Credit equivalent (/1K tokens) |
|---|---|---|
| Input | $2.50 | 0.25 credits |
| Cached input | $0.25 | 0.025 credits |
| Output | $15.00 | 1.50 credits |
Output tokens cost six times as much as input tokens ($15.00 vs. $2.50 per 1M), so tasks whose output can be kept short are the most cost-effective.
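The credit column follows directly from the dollar prices and the 1 credit = $0.01 rate. A minimal Python sketch of that conversion (the dictionary and function names are my own, for illustration only):

```python
# Convert GPT-5.4 rates ($ per 1M tokens) into AI Credits per 1K tokens.
# Assumes 1 AI Credit = $0.01, as stated in GitHub's announcement.
USD_PER_CREDIT = 0.01

# Rates from the pricing table above ($ per 1M tokens).
GPT54_USD_PER_1M = {
    "input": 2.50,
    "cached_input": 0.25,
    "output": 15.00,  # 6x the input rate
}


def credits_per_1k(usd_per_1m: float) -> float:
    """$ per 1M tokens -> $ per 1K tokens -> credits per 1K tokens."""
    usd_per_1k = usd_per_1m / 1000
    return usd_per_1k / USD_PER_CREDIT


for kind, usd in GPT54_USD_PER_1M.items():
    print(f"{kind:>12}: {credits_per_1k(usd):.3f} credits / 1K tokens")
```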
Case 1: VRM expression review skill (with images)
I'll base the estimate on the skill from a previous article, which passes screenshots to GPT-5.4 and has it return numeric values for facial blendshapes. The actual call looks like this:
copilot -p "Look at the attached image and return the VRM blendshape values for relaxed/happy/surprised only as numbers between 0.00 and 1.00" --model gpt-5.4
An example output is as follows:
relaxed 0.22, happy 0.06, surprised 0.00
The token estimate is as follows:
- Input: Cropped image (380x420 px) + prompt ≈ 600 tokens (*How images are converted to tokens depends on the implementation, so this is an estimate. I plan to check the measured values with the billing preview due in early May.)
- Output: A numeric response such as "relaxed 0.22, happy 0.06" ≈ 150 tokens
Input: 600 × 0.25 / 1000 = 0.15 credits
Output: 150 × 1.50 / 1000 = 0.225 credits
Total ≈ 0.38 credits / call (approx. 0.004 dollars)
With Pro (1,000 credits/month), that comes to roughly 2,600 calls per month, 8–9 times the headroom of the old 300-PRU limit.
If the system prompt is cached, that portion is billed at the cached input rate (0.025 credits/1K), making the call even cheaper.
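The Case 1 arithmetic can be reproduced with a small Python helper. The token counts are the estimates above, not measurements, and the 400-token cached portion in the second scenario is a purely hypothetical split: how much of this call would actually be cached is unknown.

```python
# Per-call estimate for Case 1 (VRM expression review) with GPT-5.4.
# Token counts are the article's estimates, not measured values.
RATES = {"input": 0.25, "cached_input": 0.025, "output": 1.50}  # credits per 1K tokens
PRO_MONTHLY_CREDITS = 1_000  # $10 / $0.01 per credit


def call_credits(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Credits for one call; cached_tokens is the part of input_tokens billed at the cached rate."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * RATES["input"]
        + cached_tokens * RATES["cached_input"]
        + output_tokens * RATES["output"]
    ) / 1000


no_cache = call_credits(600, 150)
# Hypothetical split: assume 400 of the 600 input tokens are served from cache.
with_cache = call_credits(600, 150, cached_tokens=400)

for label, credits in [("no cache", no_cache), ("400 tokens cached", with_cache)]:
    calls = int(PRO_MONTHLY_CREDITS // credits)
    print(f"{label}: {credits:.3f} credits/call, ~{calls} calls/month on Pro")
```

Even with that generous cache split, the 150 output tokens still account for 0.225 credits of each call, so keeping the response short matters more than caching here.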
Case 2: Text-only CLI call
This case covers calling GPT-5.4 without images, for example for code reviews or summaries.
- Input: Code + instructions ≈ 2,000 tokens
- Output: Comments/suggestions ≈ 800 tokens
Input: 2,000 × 0.25 / 1000 = 0.50 credits
Output: 800 × 1.50 / 1000 = 1.20 credits
Total ≈ 1.70 credits / call
With Pro (1,000 credits), that comes to approx. 590 calls/month, about double the 300 calls of the old system.
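The same arithmetic for Case 2, plus a hypothetical variant in which the review output is trimmed to 400 tokens (that variant is my illustration, not a measured case):

```python
# Per-call estimate for Case 2 (text-only review/summary) with GPT-5.4.
INPUT_RATE = 0.25   # credits per 1K input tokens
OUTPUT_RATE = 1.50  # credits per 1K output tokens
PRO_MONTHLY_CREDITS = 1_000
OLD_PRU_LIMIT = 300  # old Pro monthly limit; 1 GPT-5.4 call = 1 PRU


def call_credits(input_tokens: int, output_tokens: int) -> float:
    """Credits for one call with no cached input."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1000


for label, out_tokens in [("as estimated", 800), ("hypothetical: output trimmed", 400)]:
    per_call = call_credits(2_000, out_tokens)
    calls = PRO_MONTHLY_CREDITS / per_call
    print(f"{label}: {per_call:.2f} credits/call -> ~{calls:.0f} calls/month "
          f"({calls / OLD_PRU_LIMIT:.1f}x the old {OLD_PRU_LIMIT} PRU)")
```

Halving the output cuts the per-call cost from 1.70 to 1.10 credits, while the 2,000 input tokens stay at 0.50 credits either way.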
Division of labor cost structure between Claude Code and Copilot CLI
My workflow follows this division of labor:
I leave overall implementation and file editing to Claude Code, and use the Copilot CLI for spot tasks such as "short numeric evaluations," "checking information on GitHub," and "local reviews." Sending GPT-5.4 only tasks whose output can be kept short makes it easy to balance accuracy and cost. I chose GPT-5.4 because it is highly precise at "returning numbers rather than adjectives" (for details, see "Fixing 'a bit stiff' in VRM with numbers — Copilot CLI + GPT-5.4 Delegation Skill").
From June onwards, Copilot CLI call costs become visible on a per-token basis. A clear advantage of usage-based billing is that it becomes easier to decide which tasks go to which model, weighing cost against accuracy.
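One way to do that planning is to list the tasks you expect to delegate in a month and add up their credits. A minimal sketch: the per-call figures reuse the Case 1 and Case 2 estimates, while the task names and monthly counts are invented for illustration and are not my real usage.

```python
# Hypothetical monthly mix of Copilot CLI tasks, checked against the Pro budget.
PRO_MONTHLY_CREDITS = 1_000

tasks = {
    # name: (credits per call, calls per month) -- counts are made-up examples
    "vrm_numeric_review": (0.375, 1_000),  # Case 1 estimate
    "text_review": (1.70, 300),            # Case 2 estimate
}


def monthly_total(mix: dict[str, tuple[float, int]]) -> float:
    """Sum of credits per call x calls per month over all tasks."""
    return sum(per_call * count for per_call, count in mix.values())


total = monthly_total(tasks)
for name, (per_call, count) in tasks.items():
    print(f"{name:<20} {per_call * count:7.1f} credits")
status = "within" if total <= PRO_MONTHLY_CREDITS else "over"
print(f"{'total':<20} {total:7.1f} / {PRO_MONTHLY_CREDITS} credits ({status} budget)")
```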
Cautions
Copilot Code Review also consumes Actions minutes separately
From June, Copilot Code Review (the automatic PR review feature) will consume GitHub Actions execution time in addition to AI credits. In private repositories, this will be deducted from your monthly free Actions allowance.
This is a separate axis from CLI and Chat usage billing, so do not confuse the two.
Preview page to be released in early May
GitHub states, "We will release a billing preview page in early May, allowing you to check how your April usage would have been calculated with the new model." I plan to use it to get a handle on my consumption before the migration.
[Update 2026-05-01] As of May 1, I checked github.com/settings/billing on my own account, but the billing preview UI had not yet been deployed for personal accounts (internal flag show_preview_bill: false), and /settings/billing/premium_request_analytics returns 404 for personal accounts. I'll keep waiting for the rollout.
Summary
| Item | Old system | New system (June onwards) |
|---|---|---|
| Pro Monthly Limit | 300 requests | $10 = 1,000 credits |
| GPT-5.4 lightweight call (≈ 750 tokens) | 1 call = 1 PRU | 1 call ≈ 0.38 credits → approx. 2,600 calls |
| GPT-5.4 heavy text call (≈ 2,800 tokens) | 1 call = 1 PRU | 1 call ≈ 1.7 credits → approx. 590 calls |
| Cost transparency | None (flat count) | Calculable by model-specific token unit price |
For users whose work is mostly lightweight tasks, this effectively raises the limit. Even for heavy text tasks, there is about twice the headroom of the old system.
Under the old PRU system you could only see "how many times you called it," but under the new system, with model-specific unit prices public, you can plan costs in advance. I see this as a tailwind for a setup that uses Claude Code for orchestration and delegates spot tasks to the Copilot CLI.
From June onwards, using Copilot well won't mean "throwing everything at the high-performance model," but rather designing your workflow to "entrust the high-performance model with local tasks whose output can be kept short."
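How far does that headroom go? Treating 1 old PRU as 1 call (as in the GPT-5.4 rows above), the new system only comes out behind once a single call costs more than 1,000 / 300 ≈ 3.33 credits. A rough Python sketch of where that line falls for GPT-5.4, assuming no cached input (the constant and function names are my own):

```python
# Break-even between the old 300-PRU limit and the new 1,000-credit Pro budget.
# Assumes 1 old PRU = 1 GPT-5.4 call and no cached input.
PRO_MONTHLY_CREDITS = 1_000
OLD_PRU_LIMIT = 300
INPUT_RATE = 0.25   # GPT-5.4, credits per 1K input tokens
OUTPUT_RATE = 1.50  # GPT-5.4, credits per 1K output tokens

# Above this many credits per call, the new budget buys fewer than 300 calls.
BREAK_EVEN = PRO_MONTHLY_CREDITS / OLD_PRU_LIMIT


def output_tokens_at_break_even(input_tokens: int) -> float:
    """Output tokens at which one call costs exactly BREAK_EVEN credits."""
    return (BREAK_EVEN * 1000 - input_tokens * INPUT_RATE) / OUTPUT_RATE


print(f"break-even: {BREAK_EVEN:.2f} credits/call")
for case, per_call in [("Case 1", 0.375), ("Case 2", 1.70)]:
    print(f"{case}: {per_call:.3f} credits/call, {per_call / BREAK_EVEN:.0%} of break-even")
print(f"2,000 input tokens -> break-even at ~{output_tokens_at_break_even(2_000):.0f} output tokens")
```

With 2,000 input tokens, a response would have to run to roughly 1,900 output tokens before the old 300-request limit allowed more calls per month.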
Reference
- GitHub Copilot is moving to usage-based billing
- Models and pricing for GitHub Copilot
- Copilot code review will start consuming GitHub Actions minutes
Discussion
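I had been wondering how this would play out, so an analysis that makes it easy to picture was really helpful. VS Code has also started shipping updates to improve token efficiency, so I had a feeling that the experience wouldn't necessarily get worse, especially for collaborative agents; seeing it estimated in numbers is interesting. Handoff-style agents do seem likely to be hit hardest...
Thank you!
As you point out, collaborative agents have room to be rescued by cache reuse within the same session. The Claude Code + Copilot CLI setup, for example, goes exactly in that direction. Handoff-style agents, on the other hand, are billed directly for the size of the handed-off context, so it looks like we'll need to think about handoff granularity from the design stage.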