iTranslated by AI
Highlights and Impressions of the New Claude Models
Introduction
Claude, well-known for its generative AI, recently released new models.
In this article, I've excerpted parts of the article about the new models posted by Claude and added some of my personal thoughts.
The original post is also short, so if you find the content interesting, please check out the link below.
https://www.anthropic.com/news/claude-3-family
Increased Intelligence
There was an announcement that the three models provided by Claude, a type of generative AI, have improved across many metrics.
Specifically, the benchmarks are as follows:

Quoted from Introducing the next generation of Claude
Since this is Claude's own announcement, their most intelligent model takes the top spot in every category.
It seems to show great strength particularly in reasoning, code evaluation, and multilingual support.
However, in other areas, it is only slightly better compared to GPT-4 or Gemini 1.0 Pro.
Fast Response
This section mentions that the speed from question to answer has significantly improved.
In particular, the Claude 3 Haiku model appears to be the fastest and can be executed at a low cost.
As a concrete example, it states the following:
It can read an information and data dense research paper on arXiv (~10k tokens) with charts and graphs in less than three seconds.
It seems that information from arXiv, a website where various papers are archived and published, can be read in less than 3 seconds, even including graphs and charts.
That's impressive.
The Sonnet model is also said to be about twice as fast as Claude 2 or Claude 2.1.
The Opus model is reportedly about the same speed as Claude 2 or Claude 2.1.
However, both have become smarter, and it's mentioned that the Opus model has become significantly more intelligent.
Improved Vision Processing
As shown in the table below, improvements in reading capabilities for photos, graphs, charts, and diagrams have also been confirmed.

Quoted from Introducing the next generation of Claude
Personally, I have the impression that most of these are flat.
However, I feel that Claude is slightly stronger when it comes to reading diagrams.
Reduced Refusals
With the improvement in model performance, it is now possible to more accurately judge prompt nuances and whether answering is inappropriate.
As a result, as shown in the graph below, the rate of refusing to answer harmless questions has been reduced.

Quoted from Introducing the next generation of Claude
Improved Accuracy of the Opus Model
As a result of asking complex, fact-based questions that previous models struggled with, it now outputs correct answers twice as often, as shown below.

On the other hand, the proportion of incorrect answers or "I don't know" responses has decreased, indicating that the Opus model's answers are more accurate than before.
Personally, I'd like to see the results for the Sonnet and Haiku models as well.
Long context and near-perfect recall
Due to my lack of knowledge, I didn't quite understand what exactly has improved here. My apologies.
Therefore, if you're interested, please check the original text.
Built with Safety in Mind
They emphasized that the models are built with consideration for safety, privacy, and more, in addition to performance improvements.
In this section, I learned about the term Constitutional AI for the first time.
Easier to use
I will skip this part.
Model details
I will skip this part as well, as it mostly summarizes what has already been covered.
When Will They Be Available?
As of March 4, 2024, when this article was posted, the Opus and Sonnet models are reportedly available.
However, using the Opus model requires a subscription to the Pro plan.
Additionally, it seems the Sonnet model can also be used via Amazon Bedrock or Google Cloud’s Vertex AI Model Garden.
Conclusion
This was a summary of the release article for the new Claude models.
Just when we thought GPT-4 was amazing, something that surpasses it appears immediately.
It makes me feel once again that this is a hot field, and the speed is such that I feel a bit anxious about not being able to keep up.
That said, it's exciting to see so much buzz around these new developments.
Thank you for reading this far.
Discussion