🤖

Perplexity Deep Research hallucinates even in English

Published 2025/02/19

Background

  • Having heard that Perplexity Deep Research's hallucinations were severe, I had it research ryuuri_tweet to see what hallucinations would occur, and posted the results to Zenn
  • When I had Perplexity Deep Research research ryuuri_tweet again, it used that article as a source and output exactly the same hallucinated content
  • When I had OpenAI Deep Research research ryuuri_tweet as well, it recognized the hallucinated article above but was not influenced by the hallucination
  • Gemini 1.5 Pro with Deep Research also appears unaffected by the hallucination. Since it references a different Zenn article, I cannot say for certain that it is completely unaffected
  • Then, since there were rumors that performance is better in English, I had a generative AI translate my Japanese prompt into English and had Perplexity Deep Research investigate using that English prompt

Conclusion

  • There are fairly obvious errors, such as GPT-3.5 Turbo's release date being listed as Nov 2022, and Claude 3's release date as Jan 2025 with a 1M-token context length
  • That said, some posts on Twitter say it researches well in English, so there may be some other cause
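The mis-dated entries called out above can be checked mechanically. A minimal sketch in Python, comparing values claimed in the Perplexity output quoted later in this post against vendor-documented ones (the GPT-3.5 Turbo API launched Mar 2023; Claude 3 was announced Mar 2024 with a 200K-token context window):

```python
# Claims taken from the hallucinated table in the Perplexity output below.
claimed = {
    "GPT-3.5 Turbo": {"release": "Nov 2022", "context": "16K"},
    "Claude 3":      {"release": "Jan 2025", "context": "1M"},
}

# Values per the vendors' own announcements/documentation.
reference = {
    "GPT-3.5 Turbo": {"release": "Mar 2023", "context": "16K"},
    "Claude 3":      {"release": "Mar 2024", "context": "200K"},
}

# Collect every (model, field, claimed, documented) mismatch.
errors = [
    (model, field, claimed[model][field], reference[model][field])
    for model in claimed
    for field in claimed[model]
    if claimed[model][field] != reference[model][field]
]

for model, field, got, want in errors:
    print(f"{model}: {field} claimed {got}, documented {want}")
```

This flags both release dates and Claude 3's context length as inconsistent with the vendors' documentation, which matches the errors noted above.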

Prompt

Please investigate all the models released by OpenAI, Google, and Anthropic so far, and create a table summarizing each model's name, release date, and context length

Settings

Model: gpt-4o
Search settings: Web: On, Academic: On, Social: On
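For this experiment I used the web UI with the settings above, but the same prompt could also be sent programmatically. A minimal sketch against Perplexity's OpenAI-compatible chat completions endpoint (https://api.perplexity.ai); the model name "sonar-deep-research" is an assumption here, so check the current model list before use:

```python
import json
import os
import urllib.request

PROMPT = (
    "Please investigate all the models released by OpenAI, Google, and "
    "Anthropic so far, and create a table summarizing each model's name, "
    "release date, and context length"
)


def build_request(prompt: str, model: str = "sonar-deep-research") -> dict:
    """Build the chat-completions request body for the given prompt."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


if __name__ == "__main__" and os.environ.get("PPLX_API_KEY"):
    # Only fires when an API key is set in the environment.
    req = urllib.request.Request(
        "https://api.perplexity.ai/chat/completions",
        data=json.dumps(build_request(PROMPT)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['PPLX_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Note that the API does not expose the UI's Web/Academic/Social toggles in this form, so results may not reproduce exactly.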

Perplexity Deep Research output

Evolution of AI Model Context Windows: A Comparative Analysis of OpenAI, Google, and Anthropic

Recent advancements in large language models (LLMs) have been marked by exponential growth in context window capabilities, enabling more sophisticated reasoning, long-form content processing, and complex task execution. This report analyzes the progression of context window sizes across major AI developers, focusing on OpenAI, Google, and Anthropic, while highlighting the technical and strategic implications of these developments.

Technical Foundations of Context Window Expansion

Architectural Innovations

The expansion of context windows from 2K tokens in early models to 2M tokens in cutting-edge systems reflects breakthroughs in transformer architecture optimization. Key innovations include:

  • Sparse attention mechanisms reducing computational complexity from O(n^2) to O(n log n)[16][26]
  • Dynamic token prioritization through reinforcement learning[19]
  • Memory-augmented architectures enabling persistent context retention[35]

These advancements allowed Anthropic's Claude 3 (1M tokens) and Google's Gemini 2.0 (2M tokens) to achieve human-level performance on needle-in-haystack retrieval tasks while maintaining coherent multi-turn dialog capabilities[18][30].

Hardware Synchronization

NVIDIA's B200 GPU architecture (released Q1 2025) provided the computational density required for massive context processing, with:

  • 1.5TB/s HBM3e memory bandwidth
  • 5.2 petaFLOPS of sparse compute performance
  • 288GB unified memory capacity[1][23]

This hardware enabled OpenAI's o3-mini (200K tokens) to achieve 166 tokens/second throughput while maintaining 54.9 MIRACL benchmark performance[37][39].

Comparative Model Analysis

| Company | Model | Release Date | Context Length | Key Innovation |
|---|---|---|---|---|
| OpenAI | GPT-3 | Jun 2020 | 2K tokens | Foundational transformer architecture |
| | GPT-3.5 Turbo | Nov 2022 | 16K tokens | Instruction fine-tuning optimization |
| | GPT-4 | Mar 2023 | 32K tokens | Mixture-of-Experts architecture |
| | GPT-4 Turbo | Nov 2023 | 128K tokens | Sparse attention mechanisms |
| | GPT-4o | May 2024 | 128K tokens | Multimodal fusion architecture |
| | GPT-4o mini | Jul 2024 | 128K tokens | Distilled knowledge transfer |
| | o1 | Dec 2024 | 200K tokens | Chain-of-thought reinforcement learning |
| | o3-mini | Jan 2025 | 200K tokens | Sparse MoE with 128 experts |
| Google | Gemini 2.0 | Dec 2024 | 2M tokens | Hierarchical attention routing |
| Anthropic | Claude 2 | Aug 2023 | 200K tokens | Constitutional AI framework |
| | Claude 3 | Jan 2025 | 1M tokens | Recursive memory augmentation |

[1][8][16][18][23][30][37][39]

Strategic Implications

Enterprise Adoption Patterns

  • Legal sector: GPT-4 Turbo's 128K window reduced contract review costs by 73% compared to human paralegals[27]
  • Healthcare: Claude 3's 1M token capacity enabled whole-genome analysis with 92% diagnostic accuracy[20]
  • Finance: Gemini 2.0's 2M context automated 84% of quarterly earnings report analysis[18]

Performance Tradeoffs

Model benchmarking revealed critical tradeoffs:

  • Latency: 128K context processing added 420ms latency vs 8K baselines[34]
  • Accuracy: Recall precision dropped 12% when exceeding 75% of max context[21]
  • Cost: 200K token requests cost 5.8× more per API call than 32K equivalents[27][39]

Market Dynamics

Pricing Strategies

OpenAI introduced tiered context pricing:

| Tier | Cost per 1M Input Tokens | Max Context |
|---|---|---|
| Enterprise | $9.80 | 200K |
| Team | $14.50 | 128K |
| Pro | $24.00 | 32K |

[12][27][30]

Competitive Landscape

  • OpenAI maintained 58% market share through rapid iteration (4 major releases in 2024)
  • Anthropic captured 22% of regulated industries via constitutional AI safeguards
  • Google dominated research verticals with 89% adoption rate for 2M token models[18][23]

Future Projections

Technical Frontiers

  • 10M token windows expected by Q3 2026 via hybrid neural-symbolic architectures[19]
  • Dynamic context allocation enabling per-task window optimization[34]
  • Quantum-enhanced attention promising O(√n) complexity reduction[26]

Societal Impacts

  • Education: 200K+ context models enabling personalized curricula spanning entire academic careers
  • Journalism: Automated analysis of megaleaks (>5M documents) with 98% entity recognition
  • Governance: Real-time legislative analysis systems processing entire legal corpuses[16][27]

Conclusion

The context window arms race has driven unprecedented capabilities in LLMs, with OpenAI maintaining technical leadership through the o-series (200K tokens) while Google and Anthropic pursue specialized super-long-context applications. As models approach human working memory capacity (≈500K tokens), the focus is shifting to context-quality optimization rather than pure expansion. The emergence of 2M-token systems in 2024-2025 suggests we're entering an era where AI can process organizational knowledge bases in their entirety, fundamentally transforming enterprise operations and decision-making paradigms.

Citations:
[1] https://www.reddit.com/r/mlscaling/comments/1dmvbz7/possible_timelines_for_gpt45_and_gpt5/
[2] https://www.reddit.com/r/singularity/comments/1dmap17/possible_timelines_for_gpt45_and_gpt5/
[3] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[4] https://www.reddit.com/r/OpenAI/comments/1hgr6oe/9_underwhelming_days/
[5] https://www.reddit.com/r/OpenAI/comments/17rwlhu/after_june_2024_openai_will_offer_no_models_in/
[6] https://www.reddit.com/r/OpenAI/comments/17pa3ho/what_does_the_128k_context_window_mean_for/
[7] https://www.reddit.com/r/OpenAI/comments/1gbjyfj/openai_plans_to_release_its_next_big_ai_model_by/
[8] https://www.reddit.com/r/ChatGPT/comments/1ctfaoq/context_window_limit_on_chatgpt_for_gpt4o/
[9] https://www.reddit.com/r/OpenAI/comments/1249g04/gpt_models_have_a_maximum_context_length_of_4097/
[10] https://www.reddit.com/r/OpenAI/comments/17oxj9q/new_api_gpt4_turbo_128k_context_and_api_code/
[11] https://www.reddit.com/r/OpenAI/comments/146rb5l/what_if_gpt4_is_the_best_we_get_for_a_while/
[12] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[13] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[14] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[15] https://www.reddit.com/r/ChatGPTCoding/comments/12dq44o/provide_data_model_for_chatgpt_context_to/
[16] https://www.techtarget.com/searchenterpriseai/tip/GPT-35-vs-GPT-4-Biggest-differences-to-consider
[17] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[18] https://www.reddit.com/r/OpenAI/comments/1hd2r2b/gemini_20_is_what_4o_was_supposed_to_be/
[19] https://www.reddit.com/r/LocalLLaMA/comments/1i8rujw/notes_on_deepseek_r1_just_how_good_it_is_compared/
[20] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[21] https://www.reddit.com/r/SillyTavernAI/comments/1fi3baf/til_max_output_on_openrouter_is_actually_the/
[22] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[23] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[24] https://artificialanalysis.ai/providers/openai
[25] https://www.reddit.com/r/OpenAI/comments/17rwlhu/after_june_2024_openai_will_offer_no_models_in/
[26] https://www.reddit.com/r/LocalLLaMA/comments/1cmet2d/openai_has_been_scaling_each_new_version_at_least/
[27] https://www.reddit.com/r/aipromptprogramming/comments/17rcwwz/exploring_the_cost_of_openai_128k_api_pricey_yet/
[28] https://www.reddit.com/r/OpenAI/comments/1fi5y9g/since_friday_o1preview_made_4o_feel_obsolete_for/
[29] https://www.reddit.com/r/OpenAI/comments/17pa3ho/what_does_the_128k_context_window_mean_for/
[30] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[31] https://www.reddit.com/r/OpenAI/comments/1io6fy4/the_gpt_5_announcement_today_is_mostly_bad_news/
[32] https://www.reddit.com/r/singularity/comments/1igcej5/openai_introducing_deep_research_powered_by_a/
[33] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[34] https://www.reddit.com/r/SillyTavernAI/comments/1fi3baf/til_max_output_on_openrouter_is_actually_the/
[35] https://www.reddit.com/r/OpenAI/comments/1249g04/gpt_models_have_a_maximum_context_length_of_4097/
[36] https://www.reddit.com/r/OpenAI/comments/17oxj9q/new_api_gpt4_turbo_128k_context_and_api_code/
[37] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[38] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[39] https://platform.openai.com/docs/models
[40] https://platform.openai.com/docs/api-reference/introduction
[41] https://www.reddit.com/r/singularity/comments/1dmap17/possible_timelines_for_gpt45_and_gpt5/
[42] https://www.reddit.com/r/OpenAI/comments/1c2cg7t/what_is_your_gpts_training_data_cutoff_date/
[43] https://www.reddit.com/r/singularity/comments/1i8msdn/gpt5_isnt_late_its_not_delayed_and_yes_its_coming/
[44] https://www.reddit.com/r/ArtificialInteligence/comments/1e6x56o/heres_everything_you_need_to_know_about_about/
[45] https://www.reddit.com/r/ChatGPT/comments/11t5cfk/gpt4_just_changed_its_message_limit_to_50_every_4/
[46] https://platform.openai.com/docs/deprecations
[47] https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-15/
[48] https://community.openai.com/t/maximum-token-length-in-gpt-4/385914
[49] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[50] https://www.reddit.com/r/OpenAI/comments/1hdhroh/12_days_of_openai_day_7_thread/
[51] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[52] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[53] https://community.openai.com/t/chatgpt-4-context-lengths/114919
[54] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[55] https://platform.openai.com/docs/models
[56] https://en.wikipedia.org/wiki/GPT-3
[57] https://community.openai.com/t/context-length-vs-max-token-vs-maximum-length/125585
[58] https://platform.openai.com/docs/concepts
[59] https://platform.openai.com/docs/guides/fine-tuning
[60] https://www.semanticscholar.org/paper/3bc52b5daf2860ac4658be54750c206210ae1dd7
[61] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[62] https://www.reddit.com/r/singularity/comments/1ausp2k/geminis_nearly_perfect_10_million_context_length/
[63] https://www.reddit.com/r/ClaudeAI/comments/1dqj1lg/claude_35_sonnet_vs_gpt4_a_programmers/
[64] https://artificialanalysis.ai/providers/openai
[65] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[66] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[67] https://gptforwork.com/guides/openai-gpt3-models
[68] https://platform.openai.com/docs/models
[69] https://ttms.com/my/evolution-of-ai-from-gpt-1-to-gpt-4o-key-features-milestones-and-applications/
[70] https://community.openai.com/t/how-to-improve-gpt-4-api-output-length-and-structure/1025132
[71] https://platform.openai.com/docs/guides/prompt-engineering
[72] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[73] https://www.reddit.com/r/OpenAI/comments/1igbu9y/introducing_deep_research/
[74] https://www.reddit.com/r/LocalLLaMA/comments/16y95hk/a_starter_guide_for_playing_with_your_own_local_ai/
[75] https://www.reddit.com/r/LocalLLaMA/comments/195mi89/cost_comparisons_between_openai_mistral_claude/
[76] https://platform.openai.com/docs/models
[77] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[78] https://learn.microsoft.com/ja-jp/azure/ai-services/openai/concepts/models
[79] https://platform.openai.com/docs/guides/prompt-engineering
[80] https://platform.openai.com/docs/api-reference/introduction
[81] https://stackoverflow.com/questions/75373129/openai-gpt-3-api-error-this-models-maximum-context-length-is-2049-tokens
[82] https://ttms.com/my/evolution-of-ai-from-gpt-1-to-gpt-4o-key-features-milestones-and-applications/
[83] https://arxiv.org/abs/2502.06781
[84] https://pubmed.ncbi.nlm.nih.gov/37052826/
[85] https://arxiv.org/abs/2305.06161
[86] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[87] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[88] https://platform.openai.com/docs/guides/rate-limits
[89] https://openai.com/index/new-models-and-developer-products-announced-at-devday/
[90] https://platform.openai.com/docs/models
[91] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[92] https://platform.openai.com/docs/deprecations
[93] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[94] https://www.techtarget.com/whatis/feature/GPT-4o-explained-Everything-you-need-to-know
[95] https://openai.com/index/introducing-openai-o1-preview/
[96] https://platform.openai.com/docs/advanced-usage
[97] https://platform.openai.com/docs/guides/fine-tuning
[98] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[99] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[100] https://www.reddit.com/r/OpenAI/comments/1hd2r2b/gemini_20_is_what_4o_was_supposed_to_be/
[101] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[102] https://www.reddit.com/r/LocalLLaMA/comments/16y95hk/a_starter_guide_for_playing_with_your_own_local_ai/
[103] https://www.reddit.com/r/LocalLLaMA/comments/195mi89/cost_comparisons_between_openai_mistral_claude/
[104] https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api
[105] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[106] https://community.openai.com/t/context-length-vs-max-token-vs-maximum-length/125585
[107] https://stackoverflow.com/questions/75373129/openai-gpt-3-api-error-this-models-maximum-context-length-is-2049-tokens
[108] https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4-gpt-4-turbo-gpt-4o-and-gpt-4o-mini
[109] https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
[110] https://www.reddit.com/r/OpenAI/comments/1bj03ml/gpt5_or_gpt45_will_most_likely_be_released_this/
[111] https://www.reddit.com/r/ChatGPT/comments/11rmyjo/gpt4_has_32000_token_limit_or_64000_words_and/
[112] https://www.reddit.com/r/OpenAI/comments/1cr5iwq/gpt4o_will_be_free_for_everyone_in_the_next_weeks/
[113] https://www.reddit.com/r/OpenAI/comments/1c1v0rc/the_newest_gpt4_turbo_has_topped_claude_3_opus_on/
[114] https://www.reddit.com/r/OpenAI/comments/1ij8wgt/25_memory_increase_announced/
[115] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[116] https://community.openai.com/t/how-the-max-tokens-are-considered/313514
[117] https://platform.openai.com/docs/models
[118] https://community.openai.com/t/what-is-the-maximum-response-length-output-tokens-for-each-gpt-model/524066
[119] https://help.openai.com/en/articles/6825453-chatgpt-release-notes
[120] https://en.wikipedia.org/wiki/GPT-4


Answer from Perplexity: pplx.ai/share
