GitHub Copilot Transitions to Token-Based Billing: Analyzing Real Costs for GPT-5.4 & Copilot CLI

This information is current as of April 2026, based on the official GitHub blog and documentation (dated 2026-04-27).

I received a notification from GitHub stating that "Copilot will transition to usage-based billing starting June 1st."

I have been getting by within Copilot Pro's monthly limit of 300 premium requests (PRU), so I wanted to know whether this change works in my favor. I took the GPT-5.4 unit prices from the official pricing table and estimated the cost of my actual workflow.

What changes, and what stays the same

| Item | Old (until May) | New (June onwards) |
|---|---|---|
| Billing unit | Premium Request Unit (PRU) | GitHub AI Credits |
| Calculation method | Number of requests (flat) | Number of tokens × model unit price |
| Pro monthly limit | 300 PRU | Equivalent to $10 in AI Credits |
| Code completion | No credits required | Stays free |
| Copilot Code Review | PRU only | AI Credits + Actions execution time |

Code completion (inline completion and next edit suggestions) still consumes no credits; that part is unchanged. Credits are consumed by chat, the CLI, agent features, and similar functions.

Monthly budget outlook

1 AI Credit is calculated as $0.01 USD.

  • Copilot Pro ($10/month) → 1,000 Credits/month
  • Copilot Pro+ ($39/month) → 3,900 Credits/month

Compared to the old "300 PRU" limit the numbers look larger, but how many calls you can actually make depends on the model and the token volume. Because GitHub has published the unit prices per model, you can estimate this yourself.
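The plan-to-credit conversion can be sketched in a few lines. This is only an illustration of the arithmetic: the constant and function names are my own, and the only inputs are the $0.01-per-credit rate and the plan prices quoted in this article.

```python
# Assumption from the article: 1 AI Credit = $0.01, i.e. 100 credits per US dollar.
CREDITS_PER_USD = 100


def monthly_credits(plan_price_usd: int) -> int:
    """Return the monthly AI Credit allowance for a plan priced in whole dollars."""
    return plan_price_usd * CREDITS_PER_USD


# Plan prices as quoted in the article.
for plan, price in {"Copilot Pro": 10, "Copilot Pro+": 39}.items():
    print(f"{plan}: {monthly_credits(price):,} credits/month")
```

Working in whole dollars and whole credits keeps the conversion in integer arithmetic, so no rounding is involved.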

Cost estimate per GPT-5.4 call

These are the rates that apply when you pass --model gpt-5.4 to the Copilot CLI.

| Token type | Unit price ($/1M tokens) | Credit equivalent (per 1K tokens) |
|---|---|---|
| Input | $2.50 | 0.25 credits |
| Cached input | $0.25 | 0.025 credits |
| Output | $15.00 | 1.50 credits |

Output tokens cost six times as much as input tokens, so tasks whose output can be kept short are the most cost-effective.
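The credit column is derived from the dollar column, and the derivation is easy to check. The sketch below assumes the $0.01-per-credit rate from this article; the dictionary and function names are illustrative only.

```python
CREDITS_PER_USD = 100  # assumption from the article: 1 AI Credit = $0.01

# GPT-5.4 unit prices in USD per 1M tokens, as quoted in the pricing table.
GPT_5_4_USD_PER_MILLION = {"input": 2.50, "cached_input": 0.25, "output": 15.00}


def credits_per_1k_tokens(usd_per_million: float) -> float:
    """Convert a USD-per-1M-token price into AI Credits per 1K tokens."""
    return usd_per_million * CREDITS_PER_USD * 1_000 / 1_000_000


rates = {kind: credits_per_1k_tokens(price)
         for kind, price in GPT_5_4_USD_PER_MILLION.items()}
output_to_input_ratio = rates["output"] / rates["input"]

for kind, rate in rates.items():
    print(f"{kind}: {rate:g} credits per 1K tokens")
print(f"output costs {output_to_input_ratio:g}x as much as input")
```

The ratio between the output and input rates is what makes output length the main cost lever.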

Case 1: VRM expression review skill (with images)

The first estimate uses the skill I described in a previous article, which passes screenshots to GPT-5.4 and has it return numeric values for facial blendshapes. The actual call looks like this:

copilot -p "Look at the attached image and return the VRM blendshape values for relaxed/happy/surprised only as numbers between 0.00 and 1.00" --model gpt-5.4

An example output is as follows:

relaxed 0.22, happy 0.06, surprised 0.00

The token estimate is as follows:

  • Input: Cropped image (380x420px) + prompt ≈ 600 tokens (*How images are converted to tokens depends on the implementation, so this is an estimate. I want to check it against measured values once the billing preview is released in early May.)
  • Output: Numeric response like relaxed 0.22, happy 0.06 ≈ 150 tokens

Input:  600 × 0.25 / 1000 = 0.15 credits
Output: 150 × 1.50 / 1000 = 0.225 credits
Total   ≈ 0.38 credits / call (approx. $0.004)

On Pro (1,000 credits/month), that works out to roughly 2,600 calls per month, about 8 to 9 times the 300 calls that the old PRU limit allowed.
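The same estimate as a small script, for anyone who wants to plug in their own token counts. The function names are illustrative, the rates come from the pricing table in this article, and the token counts are the estimates above rather than measured values.

```python
# Credits per 1K tokens for GPT-5.4, taken from the pricing table in this article.
INPUT_CREDITS_PER_1K = 0.25
OUTPUT_CREDITS_PER_1K = 1.50


def credits_per_call(input_tokens: int, output_tokens: int) -> float:
    """Estimated AI Credits consumed by one call (no cached input assumed)."""
    return (input_tokens * INPUT_CREDITS_PER_1K
            + output_tokens * OUTPUT_CREDITS_PER_1K) / 1_000


def calls_per_month(monthly_credits: float, per_call: float) -> int:
    """Whole number of calls that fit into a monthly credit allowance."""
    return int(monthly_credits // per_call)


cost = credits_per_call(input_tokens=600, output_tokens=150)  # Case 1 estimate
calls = calls_per_month(1_000, cost)                          # Copilot Pro allowance
print(f"{cost:.3f} credits/call, {calls} calls/month, "
      f"{calls / 300:.1f}x the old 300 PRU")
```

With the unrounded 0.375 credits per call the script gives 2,666 calls; rounding the per-call cost up to 0.38 first gives 2,631, which is where the "roughly 2,600" figure comes from.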

If the system prompt is cached, the cached input rate (0.025 credits per 1K tokens) applies to that part of the input, which makes each call cheaper still.
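To get a feel for how much caching could help, here is a rough sketch. How many input tokens actually hit the cache is not something I have measured; the 400-of-600 split below is a purely hypothetical number chosen for illustration, and the function name is my own.

```python
# Credits per 1K tokens for GPT-5.4, from the pricing table in this article.
INPUT = 0.25
CACHED_INPUT = 0.025
OUTPUT = 1.50


def credits_per_call(input_tokens: int, output_tokens: int,
                     cached_tokens: int = 0) -> float:
    """Estimate credits for one call when `cached_tokens` of the input hit the cache."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    return (fresh_tokens * INPUT
            + cached_tokens * CACHED_INPUT
            + output_tokens * OUTPUT) / 1_000


# HYPOTHETICAL split: assume 400 of the 600 input tokens are a cached system prompt.
uncached = credits_per_call(600, 150)
cached = credits_per_call(600, 150, cached_tokens=400)
print(f"uncached: {uncached:.3f}, with cache: {cached:.3f} credits/call")
```

Under that assumed split the per-call cost drops from 0.375 to about 0.285 credits. The output portion (0.225 credits) is untouched by caching, so it sets the floor.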

Case 2: Text-only CLI call

This case covers calling GPT-5.4 without images, for example for code reviews or summaries.

  • Input: Code + instructions ≈ 2,000 tokens
  • Output: Comments/suggestions ≈ 800 tokens

Input:  2,000 × 0.25 / 1000 = 0.50 credits
Output:   800 × 1.50 / 1000 = 1.20 credits
Total   ≈ 1.70 credits / call

On Pro (1,000 credits), that comes to roughly 590 calls per month, about double the 300 calls available under the old system.
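A minimal sketch of the comparison against the old limit, using the Case 2 token estimates. The names are illustrative, and it assumes (as this article does) that one GPT-5.4 call counted as 1 PRU under the old system and that no input is cached.

```python
OLD_MONTHLY_CALLS = 300        # old Copilot Pro limit (PRU), 1 call = 1 PRU
PRO_MONTHLY_CREDITS = 1_000    # new Copilot Pro allowance ($10 at $0.01 per credit)
INPUT_CREDITS_PER_1K = 0.25    # GPT-5.4 input, from the pricing table
OUTPUT_CREDITS_PER_1K = 1.50   # GPT-5.4 output, from the pricing table


def estimate(input_tokens: int, output_tokens: int) -> tuple[float, int, float]:
    """Return (credits per call, calls per month on Pro, ratio to the old limit)."""
    per_call = (input_tokens * INPUT_CREDITS_PER_1K
                + output_tokens * OUTPUT_CREDITS_PER_1K) / 1_000
    calls = int(PRO_MONTHLY_CREDITS // per_call)
    return per_call, calls, calls / OLD_MONTHLY_CALLS


per_call, calls, ratio = estimate(input_tokens=2_000, output_tokens=800)  # Case 2
print(f"{per_call:.2f} credits/call -> {calls} calls/month "
      f"({ratio:.2f}x the old limit)")
```

Of the 1.70 credits per call, 1.20 come from the 800 output tokens, so trimming the output is where most of the savings would be.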

Cost structure of splitting work between Claude Code and Copilot CLI

My workflow splits the work as follows:

I leave overall implementation and file editing to Claude Code, and use the Copilot CLI for spot tasks such as short numeric evaluations, looking up information on GitHub, and localized reviews. Sending GPT-5.4 only the tasks whose output can be kept short makes it easy to balance accuracy and cost. I chose GPT-5.4 because it is very precise at "returning numbers rather than adjectives" (for details, see "Fixing 'a bit stiff' in VRM with numbers: Copilot CLI + GPT-5.4 Delegation Skill").

From June onwards, the cost of each Copilot CLI call will be visible on a per-token basis. A clear advantage of usage-based billing is that it becomes easier to decide which tasks to send to which model, weighing cost against accuracy.

Cautions

Copilot Code Review also consumes Actions minutes separately

From June, Copilot Code Review (the automatic PR review feature) will consume GitHub Actions execution time in addition to AI credits. In private repositories, this will be deducted from your monthly free Actions allowance.

This is billed separately from CLI and Chat usage, so do not conflate the two.

Preview page to be released in early May

GitHub states, "We will release a billing preview page in early May, allowing you to check how your April usage would have been calculated with the new model." I intend to use it, since it is a chance to understand my consumption before the migration.

[Update 2026-05-01] As of May 1, I checked github.com/settings/billing directly, but the billing preview UI had not yet been rolled out to personal accounts (internal flag show_preview_bill: false). /settings/billing/premium_request_analytics returns 404 for personal accounts. I will keep waiting for the rollout.

Summary

| Item | Old System | New System (June onwards) |
|---|---|---|
| Pro monthly limit | 300 requests | $10 = 1,000 credits |
| GPT-5.4 lightweight call (≈ 750 tokens) | 1 call = 1 PRU | 1 call ≈ 0.4 credits → approx. 2,600 calls/month |
| GPT-5.4 heavy text (≈ 2,800 tokens) | 1 call = 1 PRU | 1 call ≈ 1.7 credits → approx. 590 calls/month |
| Cost transparency | None (flat count) | Calculable from model-specific token unit prices |

For users whose usage is mostly lightweight tasks, this is effectively a higher limit. Even for heavier text tasks, there is roughly twice the headroom of the old system.
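One way to see where that headroom runs out is to compute the break-even point: the per-call cost at which 1,000 credits buy exactly the old 300 calls. This is my own back-of-the-envelope sketch, not an official figure; it assumes GPT-5.4 at the rates in this article, no cached input, and illustrative names throughout.

```python
OLD_MONTHLY_CALLS = 300       # old Copilot Pro limit in PRU
PRO_MONTHLY_CREDITS = 1_000   # new Copilot Pro allowance
INPUT_CREDITS_PER_1K = 0.25   # GPT-5.4 input, from the article's pricing table
OUTPUT_CREDITS_PER_1K = 1.50  # GPT-5.4 output, from the article's pricing table

# Per-call cost at which the new allowance buys exactly the old 300 calls.
BREAK_EVEN_CREDITS = PRO_MONTHLY_CREDITS / OLD_MONTHLY_CALLS


def max_output_tokens_at_break_even(input_tokens: int) -> int:
    """Largest output size (in tokens) that still allows at least 300 calls/month."""
    remaining = BREAK_EVEN_CREDITS - input_tokens * INPUT_CREDITS_PER_1K / 1_000
    return max(0, int(remaining * 1_000 / OUTPUT_CREDITS_PER_1K))


print(f"break-even: {BREAK_EVEN_CREDITS:.2f} credits/call")
for input_tokens in (600, 2_000):
    limit = max_output_tokens_at_break_even(input_tokens)
    print(f"{input_tokens} input tokens -> up to {limit} output tokens")
```

Calls that stay under about 3.33 credits each come out ahead of the old limit; calls with long outputs or large contexts that exceed it would get fewer than 300 calls per month.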

Under the old PRU system you could only see how many times you had called a model. Under the new system the model-specific unit prices are public, so you can plan costs in advance. I see this as a tailwind for a setup that uses Claude Code for orchestration and delegates spot tasks to the Copilot CLI.

From June onwards, using Copilot well will not mean throwing everything at the high-performance model. It will mean designing the workflow so that the high-performance model handles localized tasks whose output can be kept short.

Discussion

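Hideki Saito

I had been wondering how this would play out, so an analysis that makes it easy to picture is a real help. VS Code has also started receiving updates aimed at improving token efficiency, so I had the impression that the experience would not necessarily get worse, especially for collaborative agents, which makes these numeric estimates interesting. Handoff-style agents do look like they will be affected the most...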

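toki_mwc

Thank you!
As you point out, collaborative agents have room to be saved by cache reuse within the same session. The Claude Code + Copilot CLI setup, for example, goes in exactly this direction. With handoff-style agents, on the other hand, the size of the handed-over context translates directly into billing, so the granularity of handoffs will probably need to be considered from the design stage.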