Translated by AI
Major Generative AI Model Availability in Japan Regions: Azure, AWS, and Google Cloud [As of Sept 10, 2024] 🚀
Update Information
- September 10, 2024: Updated information regarding retirement dates for gpt-3.5-turbo/gpt-4 models
- September 5, 2024: Added information on how to check PTU capacity for the Japan East region
- September 4, 2024: Added information for gpt-4o 2024-08-06
- September 2, 2024: Minor corrections (no significant updates since August 29, 2024)
- August 29, 2024: First draft
Introduction
In recent years, as the adoption of generative AI in enterprises accelerates, the choice of generative AI models provided by various cloud vendors has become a critical decision directly linked to business success.
However, the availability and provision types of generative AI models (such as Azure OpenAI's GPT models, AWS's Claude, and Google Cloud's Gemini/Claude) in Japanese regions like Tokyo are subject to frequent updates.
Since the beginning of August 2024, articles like the following have surfaced, causing quite a stir in the community.
The reason this caused such a stir, especially for enterprise use, is that from viewpoints such as:
- Legal regulations (Act on the Protection of Personal Information, etc.)
- Compliance (particularly strict in healthcare, finance, and government sectors)
there are many cases where the use of generative AI model APIs is required to keep all communication and data storage within data centers located in Japan.
In this article, I hope to help you understand the current situation and assist in planning internal and external service offerings by aggregating an overview of the availability and provision types of major generative AI models from major cloud vendors.
I plan to update this article periodically with corrections and additions.
Information as of September 10, 2024
Azure OpenAI GPT Model Availability @ Japan East Region
I have summarized the availability and provision types of OpenAI's GPT models in the Azure Japan East region in the table below.
Official documentation can sometimes be difficult to parse as it includes information for all regions. Therefore, the following is an excerpted summary specifically for the Japan East region.
| Model Name | Model Version | Standard Provision | PTU Provision | Global-Standard Provision | Global-Batch Provision | Model Retirement Date |
|---|---|---|---|---|---|---|
| gpt-3.5-turbo | 0301 | Not provided | Not provided | Not provided | Not provided | Retirement planned for January 27, 2025. For deployments configured for automatic updates, automatic upgrade to gpt-3.5-turbo (0125) is scheduled to begin on November 15, 2024. |
| gpt-3.5-turbo (incl. 16K) | 0613 | Available⭕ | Not provided | Not provided | Not provided | Retirement planned for January 27, 2025. For deployments configured for automatic updates, automatic upgrade to gpt-3.5-turbo (0125) is scheduled to begin on November 15, 2024. |
| gpt-35-turbo-instruct | 0914 | Not provided | Not provided | Not provided | Not provided | After September 14, 2025 |
| gpt-3.5-turbo | 1106 | Not provided | Not provided | Not provided | Not provided | After November 17, 2024. For deployments configured for automatic updates, automatic upgrade to gpt-3.5-turbo (0125) is scheduled to begin on November 15, 2024. |
| gpt-3.5-turbo | 0125 | Not provided | Available⭕ | Not provided | Not provided | After February 22, 2025 |
| gpt-4 (incl. 32K) | 0314 | Not provided | Not provided | Not provided | Not provided | Deprecation starts November 1, 2024; retirement June 6, 2025 |
| gpt-4 (incl. 32K) | 0613 | Partially Available🔺 (*1) | Available⭕ | Not provided | Not provided | Deprecation starts November 1, 2024; retirement June 6, 2025 |
| gpt-4 | 1106-preview | Not provided | Available⭕ | Not provided | Not provided | Will be upgraded to "gpt-4-turbo-2024-04-09" after January 27, 2025. |
| gpt-4 | 0125-preview | Not provided | Available⭕ | Not provided | Not provided | Will be upgraded to "gpt-4-turbo-2024-04-09" after January 27, 2025. |
| gpt-4 | vision-preview | Available⭕ | Available⭕ | Not provided | Not provided | Will be upgraded to "gpt-4-turbo-2024-04-09" after January 27, 2025. (*3) |
| gpt-4-turbo | 2024-04-09 | Not provided | Available🔺 (*2) | Not provided | Not provided | TBD |
| gpt-4o | 2024-05-13 | Not provided | Available🔺 (*2) | Available⭕ | Not provided | TBD |
| gpt-4o | 2024-08-06 | Not provided | Not provided | Not provided | Not provided | TBD |
| gpt-4o-mini | 2024-07-18 | Not provided | Not provided | Not provided | Not provided | TBD |
| text-embedding-ada-002 | 2 | Available⭕ | Not provided | Not provided | Not provided | After April 3, 2025 |
| text-embedding-ada-002 | 1 | Not provided | Not provided | Not provided | Not provided | After April 3, 2025 |
| text-embedding-3-small | 1 | Not provided | Not provided | Not provided | Not provided | After February 2, 2025 |
| text-embedding-3-large | 1 | Available⭕ | Not provided | Not provided | Not provided | After February 2, 2025 |
*1: Some existing customers are granted access to the GPT-4 0613 version.
Interpretation of *1: Users who applied for and were granted access to GPT-4 (0613) in the Japan East region while it was being offered can continue to use it until the model retirement date. As of this writing (August 29, 2024), new deployments are not possible from subscriptions without prior approval, i.e., subscriptions that have no quota.
*2: Even with PTU, provision may not be possible depending on Microsoft's data center resource status and utilization. It is recommended to check with a Microsoft sales representative before signing a PTU contract. Alternatively, the model deployment screen displays a message when there is insufficient capacity in the Japan East region (screenshot omitted), so deployment feasibility can also be checked there.
*3: (screenshot omitted)
For details, please refer to the official documentation linked in the references below.
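To make the matrix above easier to consume programmatically, here is a minimal Python sketch that transcribes a few rows of the table into a lookup dictionary. The keys and field names are my own illustrative choices, and the data is a snapshot of the table as of September 10, 2024, so it will go stale; always confirm against the official model matrix.

```python
# A few rows of the Japan East availability table above, as a lookup dict.
# "std" = Standard provision, "ptu" = PTU provision. Snapshot data only.
JAPAN_EAST = {
    ("gpt-3.5-turbo", "0613"):       {"std": True,  "ptu": False},
    ("gpt-3.5-turbo", "0125"):       {"std": False, "ptu": True},
    ("gpt-4", "vision-preview"):     {"std": True,  "ptu": True},
    ("gpt-4o", "2024-05-13"):        {"std": False, "ptu": True},
    ("text-embedding-3-large", "1"): {"std": True,  "ptu": False},
}

def standard_available(model: str, version: str) -> bool:
    """True if the model/version is deployable via Standard provision in Japan East."""
    entry = JAPAN_EAST.get((model, version))
    return bool(entry and entry["std"])
```

For example, `standard_available("gpt-4o", "2024-05-13")` returns `False`, matching the table row above.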
Supplementary Information
The recommended migration targets for the GPT-3.5 series (excluding the instruct model) and the GPT-4 series are gpt-3.5-turbo (0125)/gpt-4o-mini and gpt-4o (*), respectively. However, as of this writing (September 10, 2024), none of these are available via Standard provision in the Japan East region.
Note: Please note that these are recommended as migration target models, not necessarily as direct successor models.
Current Challenges
The following are the current challenges. In short, for enterprise use it remains difficult to plan service offerings for end users.
- There are no successor models/versions available via Standard provision in the Japan East region to serve as migration targets for gpt-3.5-turbo (0613), which is scheduled for retirement on ~~November 1, 2024~~ January 27, 2025.
- For GPT-4, new deployments are not possible except for the vision-preview model. As the name suggests, vision-preview is not specialized for text-only use cases, and since preview versions are not recommended for production use, it is difficult to adopt in production environments.
- PTU usage is often more expensive compared to Standard usage, which may not fit the budget depending on the use case. Furthermore, provision for some models is not guaranteed (please contact Microsoft for details on pricing and available models).
Proposed Solutions
The following are five major countermeasures for the issues above. As noted, however, if overseas regions are not permitted, PTU does not fit the budget, and clouds other than Azure are not an option, further countermeasures will need to be considered. (I won't elaborate here, as no better ideas come to mind immediately; perhaps time will solve this.)
- Use of overseas regions (may require consent or persuasion of end-users)
- Use of PTU
- Use of models from other clouds
- Use of OSS models / Model Catalog
- Use of models provided by IT vendors that allow on-premises use even for closed models
Reflections
I suspect several complex factors are intertwined behind this situation. For instance, I speculate the following:
- Infrastructure conditions (including data center expansion, GPU supply/demand, and global optimization)
- The fact that hardware business remains more profitable than API business (likely influenced by sales strategies and the intentions of the US headquarters)
- A desire to promote the use of the Model Catalog/MaaS (in other words, avoiding total dependence on AOAI)
- Roadmap considerations for GPT models looking several years into the future
References: Azure Official Documentation
Blog post regarding the availability of GPT-4o 2024-08-06 on Azure:
References: Others
Article related to *1 above:
For PTU, the following is highly recommended as a very well-organized summary!
Also, articles like these:
AWS Claude Availability @ Tokyo Region
Below is a table summarizing the availability of Claude 3 and Claude 3.5 series models and related features in the Asia Pacific (Tokyo) region on Bedrock.
| Feature Category | Name | Availability in Tokyo Region |
|---|---|---|
| Model | Claude 3.5 Sonnet | Available⭕️ |
| Model | Claude 3 Opus | Not available |
| Model | Claude 3 Sonnet | Not available |
| Model | Claude 3 Haiku | Available⭕ |
| Related Feature | Guardrails | Available⭕ |
| Related Feature | Model Evaluation | Available⭕ |
| Related Feature | Knowledge Bases | Available⭕ |
| Related Feature | Agents | Available⭕ |
| Related Feature | Fine-tuning (Custom models) | Not available |
| Related Feature | Continued Pre-training (Custom models) | Not available |
| Related Feature | Provisioned Throughput | Not available |
For details, please refer to the official documentation below.
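Availability on Bedrock can also be checked programmatically with the `ListFoundationModels` API. The sketch below (the helper names are my own) pins the query to the Tokyo region; the live call requires AWS credentials, so the pure filtering step is split out into its own function.

```python
def anthropic_model_ids(model_summaries):
    """Pick Anthropic model IDs out of a ListFoundationModels response body."""
    return [m["modelId"] for m in model_summaries
            if m.get("providerName") == "Anthropic"]

def list_tokyo_anthropic_models():
    """Query Bedrock in ap-northeast-1 for Anthropic models (needs AWS credentials)."""
    import boto3  # AWS SDK for Python
    client = boto3.client("bedrock", region_name="ap-northeast-1")
    resp = client.list_foundation_models(byProvider="Anthropic")
    return anthropic_model_ids(resp["modelSummaries"])
```

Only models actually served in the queried region come back, so running this against ap-northeast-1 gives the same picture as the table above.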
References: AWS Official Documentation
While it doesn't seem to be available in the Tokyo region yet, AWS also has a Provisioned Throughput service, similar to Azure's PTU. (I learned about this while writing this article.)
References: Others
Google Cloud Gemini/Claude Model Availability @ Tokyo Region
About Gemini
| Model Name | Version | Release Date | Retirement Date |
|---|---|---|---|
| Gemini 1.5 Flash model | gemini-1.5-flash-001 | May 24, 2024 | May 24, 2025 |
| Gemini 1.5 Pro model | gemini-1.5-pro-001 | May 24, 2024 | May 24, 2025 |
| Gemini 1.0 Pro Vision model | gemini-1.0-pro-vision-001 | February 15, 2024 | February 15, 2025 |
| Gemini 1.0 Pro model | gemini-1.0-pro-001 | February 15, 2024 | February 15, 2025 |
| Gemini 1.0 Pro model | gemini-1.0-pro-002 | April 9, 2024 | April 9, 2025 |
In summary, all Gemini models are available in the Tokyo region (asia-northeast1).
However, to use the Gemini 1.0 Ultra series, an application is required. Please contact your Google Cloud sales representative for information on the application process.
About Claude
As of August 29, 2024, it appears that Claude is not available in Japanese regions on Google Cloud. Additionally, I could not find information regarding model retirement dates in the official documentation. (If you find any, please let me know in the comments section.)
| Model Name | Available Regions |
|---|---|
| Claude 3.5 Sonnet | us-east5 (Ohio), europe-west1 (Belgium) |
| Claude 3 Opus | us-east5 (Ohio) |
| Claude 3 Haiku | us-central1 (Iowa), us-east5 (Ohio), europe-west1 (Belgium), europe-west4 (Netherlands) |
| Claude 3 Sonnet | us-central1 (Iowa), us-east5 (Ohio), asia-southeast1 (Singapore) |
For more details, please refer to the official documentation below.
References: Google Cloud Official Documentation
Note that the following link, which is easy to confuse with the above, lists the regions where models are hosted for Gemini for Google Cloud, a separate offering from the Vertex AI Gemini models:
Similar to Azure and AWS PTU, it is possible to use Provisioned Throughput on Google Cloud.
To deepen my own understanding, I have summarized the overview of Provisioned Throughput in Google Cloud below.
Overview of Provisioned Throughput in Google Cloud
Provisioned Throughput is a fixed-fee monthly subscription service provided within Google Cloud's Vertex AI that reserves throughput for specific generative AI models. Users secure throughput by specifying the model and the location where it is executed.
Cases Where It Should Be Used
Consider using Provisioned Throughput if the following requirements apply:
- When you have critical workloads that require high throughput
- When building real-time generative AI applications (e.g., chatbots or agents)
- When you need throughput exceeding 20,000 characters per second
- When you want to provide users with a consistent and predictable experience
- When you want to keep costs down with a fixed monthly fee
How Provisioned Throughput is Measured
Provisioned Throughput is measured in units called Generative AI Scale Units (GSU). All inputs and outputs are converted into input characters per second using model-specific ratios (burn-down rates). The required number of GSUs is then calculated based on this converted number of input characters.
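As a minimal sketch of the burn-down conversion described above (the rate used here is a hypothetical placeholder; actual burn-down rates are model-specific and published in Google Cloud's documentation):

```python
# Hypothetical burn-down rate, for illustration only.
OUTPUT_BURN_DOWN = 4.0  # assumption: 1 output char counts like 4 input chars

def to_input_equivalent_cps(input_cps: float, output_cps: float,
                            burn_down: float = OUTPUT_BURN_DOWN) -> float:
    """Convert mixed input/output traffic into input-equivalent chars/sec,
    the quantity that the required GSU count is computed from."""
    return input_cps + output_cps * burn_down
```

Under this assumed rate, a workload of 3,000 input chars/sec plus 500 output chars/sec converts to 5,000 input-equivalent chars/sec, and GSUs are then sized against that figure.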
Supported Models
The following Google and partner models support Provisioned Throughput:
- Google models:
  - gemini-1.5-flash: 54,000 characters per second of throughput, with a context window of up to 128,000 characters
  - gemini-1.5-pro: 800 characters per second of throughput
  - gemini-1.0-pro: 8,000 characters per second of throughput
  - MedLM-medium: 2,000 characters per second of throughput
  - MedLM-large: 200 characters per second of throughput
- Partner models:
  - Anthropic Claude 3.5 Sonnet: 350 tokens per second of throughput
  - Anthropic Claude 3 Opus: 70 tokens per second of throughput
  - Anthropic Claude 3 Haiku: 4,200 tokens per second of throughput
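Assuming the figures above are throughput per single GSU (my reading of the list, so confirm the actual per-GSU numbers with Google before sizing a real order), a rough sizing calculation looks like this:

```python
import math

# Throughput figures from the list above, assumed to be per single GSU
# (chars/sec for Google models, tokens/sec for partner models).
THROUGHPUT_PER_GSU = {
    "gemini-1.5-flash": 54_000,
    "gemini-1.5-pro": 800,
    "gemini-1.0-pro": 8_000,
    "claude-3.5-sonnet": 350,
}

def gsus_for_target(model: str, target_rate: float) -> int:
    """GSUs needed to sustain target_rate (same unit as the model's entry)."""
    return math.ceil(target_rate / THROUGHPUT_PER_GSU[model])
```

For the 20,000 chars/sec threshold mentioned earlier, gemini-1.5-pro would need 25 GSUs under this assumption, while gemini-1.5-flash would need just 1.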
Subscription Notes
- Non-cancellable orders: The purchase of Provisioned Throughput is a contract and cannot be cancelled, though additional GSUs can be purchased.
- Auto-renewal: You can choose either auto-renewal or non-renewal at expiration when placing the order.
- Changing model versions or regions: You can change model versions within the same publisher or region.
- No throughput rollover: Unused throughput does not carry over to the next month.
- Priority processing: Provisioned Throughput requests are processed with priority.
Purchase Procedure
Create a Provisioned Throughput order in the Google Cloud console, enter the required number of GSUs, and confirm. After purchase, the status will be "Under review," "Active," or "Expired."
Usage and Monitoring
Provisioned Throughput is used with priority for each request; however, if the throughput limit is exceeded, on-demand charges will apply. You can also track usage using monitoring metrics.
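A rough sketch of how that cost split might be estimated. Every number here is a hypothetical placeholder, not actual Google Cloud pricing; the point is only the shape of the calculation (flat fee for the provisioned share, on-demand billing for the overflow):

```python
def monthly_cost(demand_cps: float, provisioned_cps: float,
                 n_gsus: int, gsu_monthly_fee: float,
                 on_demand_per_char: float, active_seconds: float) -> float:
    """Flat fee for the provisioned share plus on-demand billing for overflow.

    demand_cps: sustained traffic in chars/sec; provisioned_cps: the capacity
    the purchased GSUs cover; overflow beyond that is billed on demand.
    """
    overflow_cps = max(0.0, demand_cps - provisioned_cps)
    on_demand = overflow_cps * active_seconds * on_demand_per_char
    return n_gsus * gsu_monthly_fee + on_demand
```

When demand stays at or under the provisioned capacity, the cost is just the fixed monthly fee; any sustained excess shows up as a variable on-demand line item.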
Important Notes
To place a Provisioned Throughput order or increase the number of GSUs in an existing order, please contact your Google Cloud account representative.
Summary
The availability of generative AI models not only varies by cloud vendor but also significantly depends on the region and the provision type. In this article, I summarized the availability and constraints of major generative AI models on Azure, AWS, and Google Cloud. When considering implementation in an enterprise environment, many factors such as data handling, legal regulations, and budget must be taken into account. Stay informed with the latest updates and understand the characteristics of each cloud vendor to choose the optimal AI model.
Generative AI technology is evolving daily, with frequent model version upgrades and new feature additions. Therefore, it is important to regularly check official documentation for the latest information. I plan to continue providing updates through this article, so please stay tuned.
Thank you for reading this far!
If this was even slightly helpful, I would be very happy if you could give it a Like and Follow 🙏!
[Disclaimer]
The information in this article is current as of the time of writing (September 10, 2024). This article was created based on publicly available information, but it may contain errors. Please judge the accuracy of the content at your own risk. AI technology is advancing rapidly, and product specifications, pricing, and availability are subject to change without notice. For the most current and accurate information, please always check the official documentation and the latest information from the relevant service providers. Furthermore, the content of this article is intended for general informational purposes and is not intended as professional advice. For specific implementation or usage, please consult with an appropriate professional.