Perplexity Deep Research hallucinates even in English
Background
- Having heard that Perplexity Deep Research's hallucinations are severe, I had it actually research ryuuri_tweet to see what kinds of hallucinations occur, and posted the results to Zenn
  - Investigating Perplexity Deep Research's hallucinations ("Please look up ryuuri_tweet")
  - https://zenn.dev/ryuuri/articles/b6d5461d4ba2b1
- When I had Perplexity Deep Research research ryuuri_tweet once more, it cited that article as a source and output exactly the same hallucinated content
  - Writing an article about Perplexity Deep Research's hallucinations got that misinformation treated as evidence
  - https://zenn.dev/ryuuri/articles/8b1efdc6c3a2b3
- When I also had OpenAI Deep Research research ryuuri_tweet, it recognized the hallucination article above but was not influenced by it
- Gemini 1.5 Pro with Deep Research also appears unaffected by the hallucination, though since it referenced a different Zenn article I cannot be sure it is completely immune
- Then, since rumor has it that these tools perform better in English, I had a generative AI translate my Japanese prompt into English and asked Perplexity Deep Research to investigate using that English prompt
Conclusion
- There are fairly obvious errors, such as GPT-3.5 Turbo dated Nov 2022, or Claude 3 dated Jan 2025 with a 1M-token context length (checked against the announced values in the sketch below)
- That said, some people on Twitter report that it does research well in English, so the cause may lie somewhere else
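For reference, here is a minimal fact-check sketch (not part of the original experiment) comparing the two claims called out above against the officially announced values: the gpt-3.5-turbo API model shipped in Mar 2023 with a 4K context (a 16K variant followed in Jun 2023), and Claude 3 was announced in Mar 2024 with a 200K context.

```python
# Minimal fact-check sketch: compare the report's claims against
# officially announced release dates and context lengths.
claimed = {
    "GPT-3.5 Turbo": {"release": "Nov 2022", "context": "16K"},
    "Claude 3": {"release": "Jan 2025", "context": "1M"},
}
announced = {
    "GPT-3.5 Turbo": {"release": "Mar 2023", "context": "4K (16K variant from Jun 2023)"},
    "Claude 3": {"release": "Mar 2024", "context": "200K"},
}

for model, claims in claimed.items():
    for field, value in claims.items():
        truth = announced[model][field]
        verdict = "OK" if value == truth else f"WRONG (announced: {truth})"
        print(f"{model} {field}: claimed {value} -> {verdict}")
```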
Prompt
Please investigate all the models released by OpenAI, Google, and Anthropic so far, and create a table summarizing each model's name, release date, and context length
Settings
Model: gpt-4o
Search settings: Web: On, Academic: On, Social: On
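The run above used the Perplexity web UI with these settings. As an aside, here is a hypothetical sketch of re-running the same prompt through Perplexity's OpenAI-compatible chat completions API; the endpoint is documented, but the `sonar-deep-research` model name and the exact response shape are assumptions, so check them against the current API docs before use.

```python
# Hypothetical reproduction sketch: the experiment in this article used the
# Perplexity web UI, not the API. The model name below is an assumption.
import os

import requests

PROMPT = (
    "Please investigate all the models released by OpenAI, Google, and "
    "Anthropic so far, and create a table summarizing each model's name, "
    "release date, and context length"
)

resp = requests.post(
    "https://api.perplexity.ai/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['PPLX_API_KEY']}"},
    json={
        "model": "sonar-deep-research",  # assumed Deep Research model name
        "messages": [{"role": "user", "content": PROMPT}],
    },
    timeout=600,  # deep research runs can take several minutes
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```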
Perplexity Deep Research output
Evolution of AI Model Context Windows: A Comparative Analysis of OpenAI, Google, and Anthropic
Recent advancements in large language models (LLMs) have been marked by exponential growth in context window capabilities, enabling more sophisticated reasoning, long-form content processing, and complex task execution. This report analyzes the progression of context window sizes across major AI developers, focusing on OpenAI, Google, and Anthropic, while highlighting the technical and strategic implications of these developments.
Technical Foundations of Context Window Expansion
Architectural Innovations
The expansion of context windows from 2K tokens in early models to 2M tokens in cutting-edge systems reflects breakthroughs in transformer architecture optimization. Key innovations include:
- Sparse attention mechanisms reducing computational complexity from $O(n^2)$ to $O(n \log n)$[16][26]
- Dynamic token prioritization through reinforcement learning[19]
- Memory-augmented architectures enabling persistent context retention[35]
These advancements allowed Anthropic's Claude 3 (1M tokens) and Google's Gemini 2.0 (2M tokens) to achieve human-level performance on needle-in-haystack retrieval tasks while maintaining coherent multi-turn dialog capabilities[18][30].
Hardware Synchronization
NVIDIA's B200 GPU architecture (released Q1 2025) provided the computational density required for massive context processing, with:
- 1.5TB/s HBM3e memory bandwidth
- 5.2 petaFLOPS of sparse compute performance
- 288GB unified memory capacity[1][23]
This hardware enabled OpenAI's o3-mini (200K tokens) to achieve 166 tokens/second throughput while maintaining 54.9 MIRACL benchmark performance[37][39].
Comparative Model Analysis
| Company | Model | Release Date | Context Length | Key Innovation |
|---|---|---|---|---|
| OpenAI | GPT-3 | Jun 2020 | 2K tokens | Foundational transformer architecture |
| | GPT-3.5 Turbo | Nov 2022 | 16K tokens | Instruction fine-tuning optimization |
| | GPT-4 | Mar 2023 | 32K tokens | Mixture-of-Experts architecture |
| | GPT-4 Turbo | Nov 2023 | 128K tokens | Sparse attention mechanisms |
| | GPT-4o | May 2024 | 128K tokens | Multimodal fusion architecture |
| | GPT-4o mini | Jul 2024 | 128K tokens | Distilled knowledge transfer |
| | o1 | Dec 2024 | 200K tokens | Chain-of-thought reinforcement learning |
| | o3-mini | Jan 2025 | 200K tokens | Sparse MoE with 128 experts |
| Google | Gemini 2.0 | Dec 2024 | 2M tokens | Hierarchical attention routing |
| Anthropic | Claude 2 | Aug 2023 | 200K tokens | Constitutional AI framework |
| | Claude 3 | Jan 2025 | 1M tokens | Recursive memory augmentation |

[1][8][16][18][23][30][37][39]
Strategic Implications
Enterprise Adoption Patterns
- Legal sector: GPT-4 Turbo's 128K window reduced contract review costs by 73% compared to human paralegals[27]
- Healthcare: Claude 3's 1M token capacity enabled whole-genome analysis with 92% diagnostic accuracy[20]
- Finance: Gemini 2.0's 2M context automated 84% of quarterly earnings report analysis[18]
Performance Tradeoffs
Model benchmarking revealed critical tradeoffs:
- Latency: 128K context processing added 420ms latency vs 8K baselines[34]
- Accuracy: Recall precision dropped 12% when exceeding 75% of max context[21]
- Cost: 200K token requests cost 5.8× more per API call than 32K equivalents[27][39]
Market Dynamics
Pricing Strategies
OpenAI introduced tiered context pricing:
| Tier | Cost per 1M Input Tokens | Max Context |
|---|---|---|
| Enterprise | $9.80 | 200K |
| Team | $14.50 | 128K |
| Pro | $24.00 | 32K |

[12][27][30]
Competitive Landscape
- OpenAI maintained 58% market share through rapid iteration (4 major releases in 2024)
- Anthropic captured 22% of regulated industries via constitutional AI safeguards
- Google dominated research verticals with 89% adoption rate for 2M token models[18][23]
Future Projections
Technical Frontiers
- 10M token windows expected by Q3 2026 via hybrid neural-symbolic architectures[19]
- Dynamic context allocation enabling per-task window optimization[34]
- Quantum-enhanced attention promising $O(\sqrt{n})$ complexity reduction[26]
Societal Impacts
- Education: 200K+ context models enabling personalized curricula spanning entire academic careers
- Journalism: Automated analysis of megaleaks (>5M documents) with 98% entity recognition
- Governance: Real-time legislative analysis systems processing entire legal corpuses[16][27]
Conclusion
The context window arms race has driven unprecedented capabilities in LLMs, with OpenAI maintaining technical leadership through the o-series (200K tokens) while Google and Anthropic pursue specialized super-long-context applications. As models approach human working memory capacity (≈500K tokens), the focus is shifting to context-quality optimization rather than pure expansion. The emergence of 2M-token systems in 2024-2025 suggests we're entering an era where AI can process organizational knowledge bases in their entirety, fundamentally transforming enterprise operations and decision-making paradigms.
Citations:
[1] https://www.reddit.com/r/mlscaling/comments/1dmvbz7/possible_timelines_for_gpt45_and_gpt5/
[2] https://www.reddit.com/r/singularity/comments/1dmap17/possible_timelines_for_gpt45_and_gpt5/
[3] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[4] https://www.reddit.com/r/OpenAI/comments/1hgr6oe/9_underwhelming_days/
[5] https://www.reddit.com/r/OpenAI/comments/17rwlhu/after_june_2024_openai_will_offer_no_models_in/
[6] https://www.reddit.com/r/OpenAI/comments/17pa3ho/what_does_the_128k_context_window_mean_for/
[7] https://www.reddit.com/r/OpenAI/comments/1gbjyfj/openai_plans_to_release_its_next_big_ai_model_by/
[8] https://www.reddit.com/r/ChatGPT/comments/1ctfaoq/context_window_limit_on_chatgpt_for_gpt4o/
[9] https://www.reddit.com/r/OpenAI/comments/1249g04/gpt_models_have_a_maximum_context_length_of_4097/
[10] https://www.reddit.com/r/OpenAI/comments/17oxj9q/new_api_gpt4_turbo_128k_context_and_api_code/
[11] https://www.reddit.com/r/OpenAI/comments/146rb5l/what_if_gpt4_is_the_best_we_get_for_a_while/
[12] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[13] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[14] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[15] https://www.reddit.com/r/ChatGPTCoding/comments/12dq44o/provide_data_model_for_chatgpt_context_to/
[16] https://www.techtarget.com/searchenterpriseai/tip/GPT-35-vs-GPT-4-Biggest-differences-to-consider
[17] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[18] https://www.reddit.com/r/OpenAI/comments/1hd2r2b/gemini_20_is_what_4o_was_supposed_to_be/
[19] https://www.reddit.com/r/LocalLLaMA/comments/1i8rujw/notes_on_deepseek_r1_just_how_good_it_is_compared/
[20] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[21] https://www.reddit.com/r/SillyTavernAI/comments/1fi3baf/til_max_output_on_openrouter_is_actually_the/
[22] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[23] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[24] https://artificialanalysis.ai/providers/openai
[25] https://www.reddit.com/r/OpenAI/comments/17rwlhu/after_june_2024_openai_will_offer_no_models_in/
[26] https://www.reddit.com/r/LocalLLaMA/comments/1cmet2d/openai_has_been_scaling_each_new_version_at_least/
[27] https://www.reddit.com/r/aipromptprogramming/comments/17rcwwz/exploring_the_cost_of_openai_128k_api_pricey_yet/
[28] https://www.reddit.com/r/OpenAI/comments/1fi5y9g/since_friday_o1preview_made_4o_feel_obsolete_for/
[29] https://www.reddit.com/r/OpenAI/comments/17pa3ho/what_does_the_128k_context_window_mean_for/
[30] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[31] https://www.reddit.com/r/OpenAI/comments/1io6fy4/the_gpt_5_announcement_today_is_mostly_bad_news/
[32] https://www.reddit.com/r/singularity/comments/1igcej5/openai_introducing_deep_research_powered_by_a/
[33] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[34] https://www.reddit.com/r/SillyTavernAI/comments/1fi3baf/til_max_output_on_openrouter_is_actually_the/
[35] https://www.reddit.com/r/OpenAI/comments/1249g04/gpt_models_have_a_maximum_context_length_of_4097/
[36] https://www.reddit.com/r/OpenAI/comments/17oxj9q/new_api_gpt4_turbo_128k_context_and_api_code/
[37] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[38] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[39] https://platform.openai.com/docs/models
[40] https://platform.openai.com/docs/api-reference/introduction
[41] https://www.reddit.com/r/singularity/comments/1dmap17/possible_timelines_for_gpt45_and_gpt5/
[42] https://www.reddit.com/r/OpenAI/comments/1c2cg7t/what_is_your_gpts_training_data_cutoff_date/
[43] https://www.reddit.com/r/singularity/comments/1i8msdn/gpt5_isnt_late_its_not_delayed_and_yes_its_coming/
[44] https://www.reddit.com/r/ArtificialInteligence/comments/1e6x56o/heres_everything_you_need_to_know_about_about/
[45] https://www.reddit.com/r/ChatGPT/comments/11t5cfk/gpt4_just_changed_its_message_limit_to_50_every_4/
[46] https://platform.openai.com/docs/deprecations
[47] https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-15/
[48] https://community.openai.com/t/maximum-token-length-in-gpt-4/385914
[49] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[50] https://www.reddit.com/r/OpenAI/comments/1hdhroh/12_days_of_openai_day_7_thread/
[51] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[52] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[53] https://community.openai.com/t/chatgpt-4-context-lengths/114919
[54] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[55] https://platform.openai.com/docs/models
[56] https://en.wikipedia.org/wiki/GPT-3
[57] https://community.openai.com/t/context-length-vs-max-token-vs-maximum-length/125585
[58] https://platform.openai.com/docs/concepts
[59] https://platform.openai.com/docs/guides/fine-tuning
[60] https://www.semanticscholar.org/paper/3bc52b5daf2860ac4658be54750c206210ae1dd7
[61] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[62] https://www.reddit.com/r/singularity/comments/1ausp2k/geminis_nearly_perfect_10_million_context_length/
[63] https://www.reddit.com/r/ClaudeAI/comments/1dqj1lg/claude_35_sonnet_vs_gpt4_a_programmers/
[64] https://artificialanalysis.ai/providers/openai
[65] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[66] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[67] https://gptforwork.com/guides/openai-gpt3-models
[68] https://platform.openai.com/docs/models
[69] https://ttms.com/my/evolution-of-ai-from-gpt-1-to-gpt-4o-key-features-milestones-and-applications/
[70] https://community.openai.com/t/how-to-improve-gpt-4-api-output-length-and-structure/1025132
[71] https://platform.openai.com/docs/guides/prompt-engineering
[72] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[73] https://www.reddit.com/r/OpenAI/comments/1igbu9y/introducing_deep_research/
[74] https://www.reddit.com/r/LocalLLaMA/comments/16y95hk/a_starter_guide_for_playing_with_your_own_local_ai/
[75] https://www.reddit.com/r/LocalLLaMA/comments/195mi89/cost_comparisons_between_openai_mistral_claude/
[76] https://platform.openai.com/docs/models
[77] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[78] https://learn.microsoft.com/ja-jp/azure/ai-services/openai/concepts/models
[79] https://platform.openai.com/docs/guides/prompt-engineering
[80] https://platform.openai.com/docs/api-reference/introduction
[81] https://stackoverflow.com/questions/75373129/openai-gpt-3-api-error-this-models-maximum-context-length-is-2049-tokens
[82] https://ttms.com/my/evolution-of-ai-from-gpt-1-to-gpt-4o-key-features-milestones-and-applications/
[83] https://arxiv.org/abs/2502.06781
[84] https://pubmed.ncbi.nlm.nih.gov/37052826/
[85] https://arxiv.org/abs/2305.06161
[86] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[87] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[88] https://platform.openai.com/docs/guides/rate-limits
[89] https://openai.com/index/new-models-and-developer-products-announced-at-devday/
[90] https://platform.openai.com/docs/models
[91] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[92] https://platform.openai.com/docs/deprecations
[93] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[94] https://www.techtarget.com/whatis/feature/GPT-4o-explained-Everything-you-need-to-know
[95] https://openai.com/index/introducing-openai-o1-preview/
[96] https://platform.openai.com/docs/advanced-usage
[97] https://platform.openai.com/docs/guides/fine-tuning
[98] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[99] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[100] https://www.reddit.com/r/OpenAI/comments/1hd2r2b/gemini_20_is_what_4o_was_supposed_to_be/
[101] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[102] https://www.reddit.com/r/LocalLLaMA/comments/16y95hk/a_starter_guide_for_playing_with_your_own_local_ai/
[103] https://www.reddit.com/r/LocalLLaMA/comments/195mi89/cost_comparisons_between_openai_mistral_claude/
[104] https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api
[105] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[106] https://community.openai.com/t/context-length-vs-max-token-vs-maximum-length/125585
[107] https://stackoverflow.com/questions/75373129/openai-gpt-3-api-error-this-models-maximum-context-length-is-2049-tokens
[108] https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4-gpt-4-turbo-gpt-4o-and-gpt-4o-mini
[109] https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
[110] https://www.reddit.com/r/OpenAI/comments/1bj03ml/gpt5_or_gpt45_will_most_likely_be_released_this/
[111] https://www.reddit.com/r/ChatGPT/comments/11rmyjo/gpt4_has_32000_token_limit_or_64000_words_and/
[112] https://www.reddit.com/r/OpenAI/comments/1cr5iwq/gpt4o_will_be_free_for_everyone_in_the_next_weeks/
[113] https://www.reddit.com/r/OpenAI/comments/1c1v0rc/the_newest_gpt4_turbo_has_topped_claude_3_opus_on/
[114] https://www.reddit.com/r/OpenAI/comments/1ij8wgt/25_memory_increase_announced/
[115] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[116] https://community.openai.com/t/how-the-max-tokens-are-considered/313514
[117] https://platform.openai.com/docs/models
[118] https://community.openai.com/t/what-is-the-maximum-response-length-output-tokens-for-each-gpt-model/524066
[119] https://help.openai.com/en/articles/6825453-chatgpt-release-notes
[120] https://en.wikipedia.org/wiki/GPT-4
Answer from Perplexity: pplx.ai/share