
Major Generative AI Model Availability in Japan Regions: Azure, AWS, and Google Cloud [As of Sept 10, 2024] 🚀


Update Information

  • September 10, 2024: Updated information regarding retirement dates for gpt-3.5-turbo/gpt-4 models
  • September 5, 2024: Added information on how to check PTU capacity for the Japan East region
  • September 4, 2024: Added information for gpt-4o 2024-08-06
  • September 2, 2024: Minor corrections (no significant updates since August 29, 2024)
  • August 29, 2024: First draft

Introduction

In recent years, as the adoption of generative AI in enterprises accelerates, the choice of generative AI models provided by various cloud vendors has become a critical decision directly linked to business success.
However, the availability and provision types of generative AI models (such as Azure OpenAI's GPT models, AWS's Claude, and Google Cloud's Gemini/Claude) in Japanese regions like Tokyo are subject to frequent updates.

Since the beginning of August 2024, articles like the following have surfaced and caused quite a stir in the community.

https://xtech.nikkei.com/atcl/nxt/column/18/00001/09626/

The reason this caused such a buzz, particularly for enterprise use, is that considerations such as:

  • Legal regulations (Act on the Protection of Personal Information, etc.)
  • Compliance (particularly strict in healthcare, finance, and government sectors)

often require that both communication with generative AI model APIs and the associated data storage stay entirely within data centers located in Japan.
In this article, I aggregate an overview of the availability and provision types of major generative AI models across the major cloud vendors, in the hope that it helps you understand the current situation and plan internal and external service offerings.

I plan to update this article periodically with corrections and additions.

Information as of September 10, 2024

Azure OpenAI GPT Model Availability @ Japan East Region

I have summarized the availability and provision types of OpenAI's GPT models in the Azure Japan East region in the table below.

Official documentation can sometimes be difficult to parse as it includes information for all regions. Therefore, the following is an excerpted summary specifically for the Japan East region.

| Model Name | Model Version | Standard Provision | PTU Provision | Global-Standard Provision | Global-Batch Provision | Model Retirement Date |
|---|---|---|---|---|---|---|
| gpt-3.5-turbo | 0301 | Not provided | Not provided | Not provided | Not provided | Retirement planned for January 27, 2025. Automatic updates to gpt-3.5-turbo (0125) begin November 15, 2024 for deployments configured for auto-update |
| gpt-3.5-turbo (incl. 16K) | 0613 | Available⭕ | Not provided | Not provided | Not provided | Retirement planned for January 27, 2025. Automatic updates to gpt-3.5-turbo (0125) begin November 15, 2024 for deployments configured for auto-update |
| gpt-35-turbo-instruct | 0914 | Not provided | Not provided | Not provided | Not provided | After September 14, 2025 |
| gpt-3.5-turbo | 1106 | Not provided | Not provided | Not provided | Not provided | After November 17, 2024. Automatic updates to gpt-3.5-turbo (0125) begin November 15, 2024 for deployments configured for auto-update |
| gpt-3.5-turbo | 0125 | Not provided | Available⭕ | Not provided | Not provided | After February 22, 2025 |
| gpt-4 (incl. 32K) | 0314 | Not provided | Not provided | Not provided | Not provided | Deprecation start: November 1, 2024. Retirement: June 6, 2025 |
| gpt-4 (incl. 32K) | 0613 | Partially available🔺 (*1) | Available⭕ | Not provided | Not provided | Deprecation start: November 1, 2024. Retirement: June 6, 2025 |
| gpt-4 | 1106-preview | Not provided | Available⭕ | Not provided | Not provided | Will be upgraded to gpt-4-turbo-2024-04-09 after January 27, 2025 |
| gpt-4 | 0125-preview | Not provided | Available⭕ | Not provided | Not provided | Will be upgraded to gpt-4-turbo-2024-04-09 after January 27, 2025 |
| gpt-4 | vision-preview | Available⭕ | Available⭕ | Not provided | Not provided | Will be upgraded to gpt-4-turbo-2024-04-09 after January 27, 2025 (*3) |
| gpt-4-turbo | 2024-04-09 | Not provided | Available🔺 (*2) | Not provided | Not provided | TBD |
| gpt-4o | 2024-05-13 | Not provided | Available🔺 (*2) | Available⭕ | Not provided | TBD |
| gpt-4o | 2024-08-06 | Not provided | Not provided | Not provided | Not provided | TBD |
| gpt-4o-mini | 2024-07-18 | Not provided | Not provided | Not provided | Not provided | TBD |
| text-embedding-ada-002 | 2 | Available⭕ | Not provided | Not provided | Not provided | After April 3, 2025 |
| text-embedding-ada-002 | 1 | Not provided | Not provided | Not provided | Not provided | After April 3, 2025 |
| text-embedding-3-small | 1 | Not provided | Not provided | Not provided | Not provided | After February 2, 2025 |
| text-embedding-3-large | 1 | Available⭕ | Not provided | Not provided | Not provided | After February 2, 2025 |

*1: Some existing customers are granted access to the GPT-4 0613 version.
Interpretation of *1: Users who previously applied for and received permission to use GPT-4 (0613) in the Japan East region while it was offered can continue using it until the model retirement date. As of now (August 29, 2024), new deployments are not possible from subscriptions without prior approval, i.e., those without an existing quota.

*2: Even for PTU usage, provision may not be possible depending on Microsoft's data center resource status and usage. It is recommended to check with a Microsoft sales representative before signing a PTU contract. Alternatively, as shown below, the model deployment screen will display a message if there is insufficient capacity in the Japan East region, so it seems you can check deployment feasibility there.

*3: For details, please refer to the official documentation linked in the references below.
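To make the matrix above easier to query, the availability flags can be encoded as plain data. The dictionary below is hand-copied from the table (with GPT-4 0613's pre-approval-only Standard access listed under PTU only), so re-verify it against the official documentation before relying on it:

```python
# Provision types deployable per (model, version) in Japan East,
# hand-copied from the availability table above (as of September 10, 2024).
JAPAN_EAST = {
    ("gpt-3.5-turbo (incl. 16K)", "0613"): {"standard"},
    ("gpt-3.5-turbo", "0125"): {"ptu"},
    ("gpt-4 (incl. 32K)", "0613"): {"ptu"},  # Standard is pre-approved subscriptions only
    ("gpt-4", "1106-preview"): {"ptu"},
    ("gpt-4", "0125-preview"): {"ptu"},
    ("gpt-4", "vision-preview"): {"standard", "ptu"},
    ("gpt-4-turbo", "2024-04-09"): {"ptu"},
    ("gpt-4o", "2024-05-13"): {"ptu", "global-standard"},
    ("text-embedding-ada-002", "2"): {"standard"},
    ("text-embedding-3-large", "1"): {"standard"},
}

def deployable(provision: str):
    """Return the (model, version) pairs deployable via the given provision type."""
    return sorted(k for k, v in JAPAN_EAST.items() if provision in v)

print(deployable("standard"))
```

Running `deployable("standard")` makes the first "Current Challenges" bullet concrete: no GPT-4-class chat model other than vision-preview is Standard-deployable in Japan East.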

Supplementary Information

The recommended migration models for the GPT-3.5 series (excluding instruct models) and the GPT-4 series are gpt-35-turbo (0125)/GPT-4o-mini and GPT-4o (*), respectively. However, as of now (September 10, 2024), these are not available via Standard provision in the Japan East region.
Note: Please note that these are recommended as migration target models, not necessarily as direct successor models.

Current Challenges

The following are the current challenges. In short, for enterprise use, it remains a difficult situation for planning service offerings to end-users.

  • There are no successor models/versions available via Standard provision in the Japan East region to serve as migration targets for the gpt-3.5-turbo (0613) model, whose planned retirement date has been moved from November 1, 2024 to January 27, 2025.
  • For GPT-4, new deployments are not possible except for the vision-preview model. As the name suggests, the vision-preview model is not specialized for text-only use cases, and since preview versions are not recommended for production use, it is difficult to adopt for production environments.
  • PTU usage is often more expensive compared to Standard usage, which may not fit the budget depending on the use case. Furthermore, provision for some models is not guaranteed (please contact Microsoft for details on pricing and available models).

Proposed Solutions

The following are five major countermeasures for the issues mentioned above. However, as noted, if using overseas regions is not permitted, PTU is budget-prohibitive, and using clouds other than Azure is not an option, further countermeasures need to be considered. (I won't elaborate here as no better ideas come to mind immediately; perhaps time will solve this.)

  • Use of overseas regions (may require consent or persuasion of end-users)
  • Use of PTU
  • Use of models from other clouds
  • Use of OSS models / Model Catalog
  • Use of models provided by IT vendors that allow on-premises use even for closed models
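As one illustration of the first two countermeasures, a request router can prefer a Japan East deployment and fall back to an overseas region only when the residency policy allows it. Everything below (region order, deployment names) is a hypothetical sketch of the selection logic, not real endpoints or API calls:

```python
# Hypothetical fallback order: prefer in-country, then overseas Standard regions.
# Deployment names are placeholders, not real Azure resources.
FALLBACK_ORDER = [
    ("japaneast", "gpt-4o-ptu"),        # in-country, if PTU capacity was secured
    ("eastus", "gpt-4o-standard"),      # overseas, requires end-user consent
    ("swedencentral", "gpt-4o-standard"),
]

def pick_deployment(japan_capacity: bool, allow_overseas: bool):
    """Pick the first usable (region, deployment) given capacity and residency policy."""
    for region, deployment in FALLBACK_ORDER:
        if region == "japaneast":
            if japan_capacity:
                return region, deployment
        elif allow_overseas:
            return region, deployment
    raise RuntimeError("no deployment satisfies the residency constraint")
```

If the policy forbids overseas regions and no Japan East capacity exists, the router fails fast, which mirrors the "further countermeasures need to be considered" situation described above.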

Reflections

I suspect several complex factors are intertwined behind this situation. For instance, I speculate the following:

  • Infrastructure conditions (including data center expansion, GPU supply/demand, and global optimization)
  • The fact that hardware business remains more profitable than API business (likely influenced by sales strategies and the intentions of the US headquarters)
  • A desire to promote the use of the Model Catalog/MaaS (in other words, avoiding total dependence on AOAI)
  • Roadmap considerations for GPT models looking several years into the future

References: Azure Official Documentation

https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/model-retirements
https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new

Blog post regarding the availability of GPT-4o 2024-08-06 on Azure:
https://techcommunity.microsoft.com/t5/ai-azure-ai-services-blog/introducing-gpt-4o-2024-08-06-api-with-structured-outputs-on/ba-p/4232684

References: Others

Article related to *1 above:
https://zenn.dev/chips0711/articles/7fe1e588b7753a

For PTU, the following is highly recommended as a very well-organized summary!
https://zenn.dev/microsoft/articles/azure_perfectly_understand_ptu

Also, articles like these:
https://zenn.dev/umi_mori/articles/aoai-warning-ptu

https://x.com/daiki15036604/status/1831340941430730755

AWS Claude Availability @ Tokyo Region

Below is a table summarizing the availability of Claude 3 and Claude 3.5 series models and related features in the Asia Pacific (Tokyo) region on Bedrock.

| Feature Category | Name | Availability in Tokyo Region |
|---|---|---|
| Model | Claude 3.5 Sonnet | Available⭕ |
| Model | Claude 3 Opus | Not available |
| Model | Claude 3 Sonnet | Not available |
| Model | Claude 3 Haiku | Available⭕ |
| Related Feature | Guardrails | Available⭕ |
| Related Feature | Model Evaluation | Available⭕ |
| Related Feature | Knowledge Bases | Available⭕ |
| Related Feature | Agents | Available⭕ |
| Related Feature | Fine-tuning (Custom models) | Not available |
| Related Feature | Continued Pre-training (Custom models) | Not available |
| Related Feature | Provisioned Throughput | Not available |
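For the Claude models that are available in Tokyo, Bedrock's `InvokeModel` takes an Anthropic-Messages-style JSON body. The sketch below only builds that body; the actual `boto3` call is left commented out because it needs AWS credentials, and the model ID is the Claude 3 Haiku ID as I understand it at the time of writing, so re-check it against the docs:

```python
import json

# Claude 3 Haiku model ID on Bedrock (verify against current AWS documentation).
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_claude_body(prompt: str, max_tokens: int = 512) -> str:
    """Build the Anthropic Messages request body that Bedrock expects."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]},
        ],
    })

# Actual invocation (requires credentials; ap-northeast-1 is the Tokyo region):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="ap-northeast-1")
# response = client.invoke_model(modelId=MODEL_ID, body=build_claude_body("こんにちは"))
```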

For details, please refer to the official documentation below.

References: AWS Official Documentation

https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html

https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-regions.html

While it doesn't seem to be available in the Tokyo region yet, AWS also has a Provisioned Throughput service, similar to Azure's PTU. (I learned about this while writing this article.)

https://docs.aws.amazon.com/en_us/bedrock/latest/userguide/prov-throughput.html

References: Others

https://qiita.com/minorun365/items/e2202774ea357f311243

Google Cloud Gemini/Claude Model Availability @ Tokyo Region

About Gemini

| Model Name | Version | Release Date | Retirement Date |
|---|---|---|---|
| Gemini 1.5 Flash | gemini-1.5-flash-001 | May 24, 2024 | May 24, 2025 |
| Gemini 1.5 Pro | gemini-1.5-pro-001 | May 24, 2024 | May 24, 2025 |
| Gemini 1.0 Pro Vision | gemini-1.0-pro-vision-001 | February 15, 2024 | February 15, 2025 |
| Gemini 1.0 Pro | gemini-1.0-pro-001 | February 15, 2024 | February 15, 2025 |
| Gemini 1.0 Pro | gemini-1.0-pro-002 | April 9, 2024 | April 9, 2025 |

In summary, all Gemini models are available in the Tokyo region (asia-northeast1).

However, to use the Gemini 1.0 Ultra series, an application is required. Please contact your Google Cloud sales representative for information on the application process.

About Claude

As of August 29, 2024, it appears that Claude is not available in Japanese regions on Google Cloud. Additionally, I could not find information regarding model retirement dates in the official documentation. (If you find any, please let me know in the comments section.)

| Model Name | Available Regions |
|---|---|
| Claude 3.5 Sonnet | us-east5 (Ohio), europe-west1 (Belgium) |
| Claude 3 Opus | us-east5 (Ohio) |
| Claude 3 Haiku | us-central1 (Iowa), us-east5 (Ohio), europe-west1 (Belgium), europe-west4 (Netherlands) |
| Claude 3 Sonnet | us-central1 (Iowa), us-east5 (Ohio), asia-southeast1 (Singapore) |
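Since Claude on Vertex AI is only served outside Japan, the practical question becomes which foreign region to call. A minimal sketch, with the region lists hand-copied from the table above (re-verify against the docs before use):

```python
# Vertex AI regions serving each Claude model, hand-copied from the table above.
CLAUDE_REGIONS = {
    "claude-3-5-sonnet": ["us-east5", "europe-west1"],
    "claude-3-opus": ["us-east5"],
    "claude-3-haiku": ["us-central1", "us-east5", "europe-west1", "europe-west4"],
    "claude-3-sonnet": ["us-central1", "us-east5", "asia-southeast1"],
}

def regions_in(model: str, prefix: str):
    """Regions serving `model` whose name starts with `prefix` (e.g. 'asia', 'europe')."""
    return [r for r in CLAUDE_REGIONS.get(model, []) if r.startswith(prefix)]

# The only Claude deployment in Asia-Pacific is Claude 3 Sonnet in Singapore:
print(regions_in("claude-3-sonnet", "asia"))
```

A latency-sensitive workload in Japan that also needs Claude would, per this table, end up in Singapore on Claude 3 Sonnet, with everything else crossing to the US or Europe.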

For more details, please refer to the official documentation below.

References: Google Cloud Official Documentation

https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations?hl=en#asia-pacific

https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/use-claude?hl=en#anthropic_claude_region_availability

https://cloud.google.com/vertex-ai/generative-ai/docs/learn/locations

Note that the following link, which is easy to confuse with the ones above, lists the regions where Gemini models are hosted for Gemini for Google Cloud (a separate service from Vertex AI):
https://cloud.google.com/gemini/docs/locations#asia-pacific

Similar to Azure's PTU and AWS's Provisioned Throughput, Google Cloud also offers a Provisioned Throughput service.

https://cloud.google.com/vertex-ai/generative-ai/docs/provisioned-throughput

To deepen my own understanding, I have summarized the overview of Provisioned Throughput in Google Cloud below.

Overview of Provisioned Throughput in Google Cloud

Provisioned Throughput is a fixed-fee monthly subscription service provided within Google Cloud's Vertex AI that reserves throughput for specific generative AI models. Users secure throughput by specifying the model and the location where it is executed.

Cases Where It Should Be Used

Consider using Provisioned Throughput if the following requirements apply:

  • When you have critical workloads that require high throughput
  • When building real-time generative AI applications (e.g., chatbots or agents)
  • When you need throughput exceeding 20,000 characters per second
  • When you want to provide users with a consistent and predictable experience
  • When you want to keep costs down with a fixed monthly fee

How Provisioned Throughput is Measured

Provisioned Throughput is measured in units called Generative AI Scale Units (GSU). All inputs and outputs are converted into input characters per second using model-specific ratios (burn-down rates). The required number of GSUs is then calculated based on this converted number of input characters.

Supported Models

The following Google and partner models support Provisioned Throughput:

  • Google Models:

    • gemini-1.5-flash: Provides 54,000 characters per second of throughput, with a context window of up to 128,000 characters
    • gemini-1.5-pro: Provides 800 characters per second throughput
    • gemini-1.0-pro: Provides 8,000 characters per second throughput
    • MedLM-medium: Provides 2,000 characters per second throughput
    • MedLM-large: Provides 200 characters per second throughput
  • Partner Models:

    • Anthropic Claude 3.5 Sonnet: Provides 350 tokens per second throughput
    • Anthropic Claude 3 Opus: Provides 70 tokens per second throughput
    • Anthropic Claude 3 Haiku: Provides 4,200 tokens per second throughput
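The GSU sizing described above reduces to a ceiling division: convert the expected load into per-second input characters (or tokens, for partner models) via the burn-down rate, then divide by the per-GSU throughput. A minimal sketch, assuming the figures in the list above are per-GSU values (treat this as a shape illustration, not official pricing math):

```python
import math

def gsus_needed(load_per_sec: float, per_gsu_throughput: float) -> int:
    """GSUs required to cover a burn-down-converted load, rounded up."""
    if load_per_sec <= 0:
        return 0
    return math.ceil(load_per_sec / per_gsu_throughput)

# Example: 120,000 converted input characters/sec on gemini-1.5-flash,
# assuming 1 GSU provides the listed 54,000 characters/sec:
print(gsus_needed(120_000, 54_000))  # 3 GSUs (120,000 / 54,000 = 2.22..., rounded up)
```

The same shape applies to the partner models, just with tokens per second instead of characters per second as the unit.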

Subscription Notes

  • Non-cancellable orders: The purchase of Provisioned Throughput is a contract and cannot be cancelled, though additional GSUs can be purchased.
  • Auto-renewal: You can choose either auto-renewal or non-renewal at expiration when placing the order.
  • Changing model versions or regions: You can change model versions within the same publisher or region.
  • No throughput rollover: Unused throughput does not carry over to the next month.
  • Priority processing: Provisioned Throughput requests are processed with priority.

Purchase Procedure

Create a Provisioned Throughput order in the Google Cloud console, enter the required number of GSUs, and confirm. After purchase, the status will be "Under review," "Active," or "Expired."

Usage and Monitoring

Provisioned Throughput is used with priority for each request; however, if the throughput limit is exceeded, on-demand charges will apply. You can also track usage using monitoring metrics.

Important Notes

To place a Provisioned Throughput order or increase the number of GSUs in an existing order, please contact your Google Cloud account representative.

Summary

The availability of generative AI models not only varies by cloud vendor but also significantly depends on the region and the provision type. In this article, I summarized the availability and constraints of major generative AI models on Azure, AWS, and Google Cloud. When considering implementation in an enterprise environment, many factors such as data handling, legal regulations, and budget must be taken into account. Stay informed with the latest updates and understand the characteristics of each cloud vendor to choose the optimal AI model.

Generative AI technology is evolving daily, with frequent model version upgrades and new feature additions. Therefore, it is important to regularly check official documentation for the latest information. I plan to continue providing updates through this article, so please stay tuned.

Thank you for reading all the way to the end!
If this was even slightly helpful, I would be very happy if you could give it a Like and Follow 🙏!

[Disclaimer]

The information in this article is current as of the time of writing (September 10, 2024). This article was created based on publicly available information, but it may contain errors. Please judge the accuracy of the content at your own risk. AI technology is advancing rapidly, and product specifications, pricing, and availability are subject to change without notice. For the most current and accurate information, please always check the official documentation and the latest information from the relevant service providers. Furthermore, the content of this article is intended for general informational purposes and is not intended as professional advice. For specific implementation or usage, please consult with an appropriate professional.
