🤖

Perplexity Deep Research hallucinates even in English

Published 2025/02/19

Background

  • Having heard that Perplexity Deep Research's hallucinations were severe, I had it research ryuuri_tweet to see what hallucinations would occur, and posted the results to Zenn
  • When I had Perplexity Deep Research research ryuuri_tweet again, it used that article as a source and output exactly the same hallucinated content
  • When I had OpenAI Deep Research research ryuuri_tweet as well, it recognized the hallucinated article above but was not influenced by the hallucination
  • Gemini 1.5 Pro with Deep Research also appears unaffected by the hallucination. Since it references a different Zenn article, I cannot say for certain that it is completely unaffected
  • Then, since there were rumors that performance is better in English, I had a generative AI translate my Japanese prompt into English and had Perplexity Deep Research investigate using that English prompt

Conclusion

  • There are fairly obvious errors, such as GPT-3.5 Turbo's release date being listed as Nov 2022, and Claude 3's release date as Jan 2025 with a 1M-token context length
  • That said, some posts on Twitter say it researches well in English, so there may be some other cause
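The mis-dated entries called out above can be checked mechanically. A minimal sketch in Python, comparing values claimed in the Perplexity output quoted later in this post against vendor-documented ones (the GPT-3.5 Turbo API launched Mar 2023; Claude 3 was announced Mar 2024 with a 200K-token context window):

```python
# Claims taken from the hallucinated table in the Perplexity output below.
claimed = {
    "GPT-3.5 Turbo": {"release": "Nov 2022", "context": "16K"},
    "Claude 3":      {"release": "Jan 2025", "context": "1M"},
}

# Values per the vendors' own announcements/documentation.
reference = {
    "GPT-3.5 Turbo": {"release": "Mar 2023", "context": "16K"},
    "Claude 3":      {"release": "Mar 2024", "context": "200K"},
}

# Collect every (model, field, claimed, documented) mismatch.
errors = [
    (model, field, claimed[model][field], reference[model][field])
    for model in claimed
    for field in claimed[model]
    if claimed[model][field] != reference[model][field]
]

for model, field, got, want in errors:
    print(f"{model}: {field} claimed {got}, documented {want}")
```

This flags both release dates and Claude 3's context length as inconsistent with the vendors' documentation, which matches the errors noted above.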

Prompt

Please investigate all the models released by OpenAI, Google, and Anthropic so far, and create a table summarizing each model's name, release date, and context length

Settings

Model: gpt-4o
Search settings: Web: On, Academic: On, Social: On
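For this experiment I used the web UI with the settings above, but the same prompt could also be sent programmatically. A minimal sketch against Perplexity's OpenAI-compatible chat completions endpoint (https://api.perplexity.ai); the model name "sonar-deep-research" is an assumption here, so check the current model list before use:

```python
import json
import os
import urllib.request

PROMPT = (
    "Please investigate all the models released by OpenAI, Google, and "
    "Anthropic so far, and create a table summarizing each model's name, "
    "release date, and context length"
)


def build_request(prompt: str, model: str = "sonar-deep-research") -> dict:
    """Build the chat-completions request body for the given prompt."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


if __name__ == "__main__" and os.environ.get("PPLX_API_KEY"):
    # Only fires when an API key is set in the environment.
    req = urllib.request.Request(
        "https://api.perplexity.ai/chat/completions",
        data=json.dumps(build_request(PROMPT)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['PPLX_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Note that the API does not expose the UI's Web/Academic/Social toggles in this form, so results may not reproduce exactly.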

Perplexity Deep Research output

Evolution of AI Model Context Windows: A Comparative Analysis of OpenAI, Google, and Anthropic

Recent advancements in large language models (LLMs) have been marked by exponential growth in context window capabilities, enabling more sophisticated reasoning, long-form content processing, and complex task execution. This report analyzes the progression of context window sizes across major AI developers, focusing on OpenAI, Google, and Anthropic, while highlighting the technical and strategic implications of these developments.

Technical Foundations of Context Window Expansion

Architectural Innovations

The expansion of context windows from 2K tokens in early models to 2M tokens in cutting-edge systems reflects breakthroughs in transformer architecture optimization. Key innovations include:

  • Sparse attention mechanisms reducing computational complexity from O(n^2) to O(n log n)[16][26]
  • Dynamic token prioritization through reinforcement learning[19]
  • Memory-augmented architectures enabling persistent context retention[35]

These advancements allowed Anthropic's Claude 3 (1M tokens) and Google's Gemini 2.0 (2M tokens) to achieve human-level performance on needle-in-haystack retrieval tasks while maintaining coherent multi-turn dialog capabilities[18][30].

Hardware Synchronization

NVIDIA's B200 GPU architecture (released Q1 2025) provided the computational density required for massive context processing, with:

  • 1.5TB/s HBM3e memory bandwidth
  • 5.2 petaFLOPS of sparse compute performance
  • 288GB unified memory capacity[1][23]

This hardware enabled OpenAI's o3-mini (200K tokens) to achieve 166 tokens/second throughput while maintaining 54.9 MIRACL benchmark performance[37][39].

Comparative Model Analysis

| Company | Model | Release Date | Context Length | Key Innovation |
|---|---|---|---|---|
| OpenAI | GPT-3 | Jun 2020 | 2K tokens | Foundational transformer architecture |
| | GPT-3.5 Turbo | Nov 2022 | 16K tokens | Instruction fine-tuning optimization |
| | GPT-4 | Mar 2023 | 32K tokens | Mixture-of-Experts architecture |
| | GPT-4 Turbo | Nov 2023 | 128K tokens | Sparse attention mechanisms |
| | GPT-4o | May 2024 | 128K tokens | Multimodal fusion architecture |
| | GPT-4o mini | Jul 2024 | 128K tokens | Distilled knowledge transfer |
| | o1 | Dec 2024 | 200K tokens | Chain-of-thought reinforcement learning |
| | o3-mini | Jan 2025 | 200K tokens | Sparse MoE with 128 experts |
| Google | Gemini 2.0 | Dec 2024 | 2M tokens | Hierarchical attention routing |
| Anthropic | Claude 2 | Aug 2023 | 200K tokens | Constitutional AI framework |
| | Claude 3 | Jan 2025 | 1M tokens | Recursive memory augmentation |

[1][8][16][18][23][30][37][39]

Strategic Implications

Enterprise Adoption Patterns

  • Legal sector: GPT-4 Turbo's 128K window reduced contract review costs by 73% compared to human paralegals[27]
  • Healthcare: Claude 3's 1M token capacity enabled whole-genome analysis with 92% diagnostic accuracy[20]
  • Finance: Gemini 2.0's 2M context automated 84% of quarterly earnings report analysis[18]

Performance Tradeoffs

Model benchmarking revealed critical tradeoffs:

  • Latency: 128K context processing added 420ms latency vs 8K baselines[34]
  • Accuracy: Recall precision dropped 12% when exceeding 75% of max context[21]
  • Cost: 200K token requests cost 5.8× more per API call than 32K equivalents[27][39]

Market Dynamics

Pricing Strategies

OpenAI introduced tiered context pricing:

| Tier | Cost per 1M Input Tokens | Max Context |
|---|---|---|
| Enterprise | $9.80 | 200K |
| Team | $14.50 | 128K |
| Pro | $24.00 | 32K |

[12][27][30]

Competitive Landscape

  • OpenAI maintained 58% market share through rapid iteration (4 major releases in 2024)
  • Anthropic captured 22% of regulated industries via constitutional AI safeguards
  • Google dominated research verticals with 89% adoption rate for 2M token models[18][23]

Future Projections

Technical Frontiers

  • 10M token windows expected by Q3 2026 via hybrid neural-symbolic architectures[19]
  • Dynamic context allocation enabling per-task window optimization[34]
  • Quantum-enhanced attention promising O(√n) complexity reduction[26]

Societal Impacts

  • Education: 200K+ context models enabling personalized curricula spanning entire academic careers
  • Journalism: Automated analysis of megaleaks (>5M documents) with 98% entity recognition
  • Governance: Real-time legislative analysis systems processing entire legal corpuses[16][27]

Conclusion

The context window arms race has driven unprecedented capabilities in LLMs, with OpenAI maintaining technical leadership through the o-series (200K tokens) while Google and Anthropic pursue specialized super-long-context applications. As models approach human working memory capacity (≈500K tokens), the focus is shifting to context-quality optimization rather than pure expansion. The emergence of 2M-token systems in 2024-2025 suggests we're entering an era where AI can process organizational knowledge bases in their entirety, fundamentally transforming enterprise operations and decision-making paradigms.

Citations:
[1] https://www.reddit.com/r/mlscaling/comments/1dmvbz7/possible_timelines_for_gpt45_and_gpt5/
[2] https://www.reddit.com/r/singularity/comments/1dmap17/possible_timelines_for_gpt45_and_gpt5/
[3] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[4] https://www.reddit.com/r/OpenAI/comments/1hgr6oe/9_underwhelming_days/
[5] https://www.reddit.com/r/OpenAI/comments/17rwlhu/after_june_2024_openai_will_offer_no_models_in/
[6] https://www.reddit.com/r/OpenAI/comments/17pa3ho/what_does_the_128k_context_window_mean_for/
[7] https://www.reddit.com/r/OpenAI/comments/1gbjyfj/openai_plans_to_release_its_next_big_ai_model_by/
[8] https://www.reddit.com/r/ChatGPT/comments/1ctfaoq/context_window_limit_on_chatgpt_for_gpt4o/
[9] https://www.reddit.com/r/OpenAI/comments/1249g04/gpt_models_have_a_maximum_context_length_of_4097/
[10] https://www.reddit.com/r/OpenAI/comments/17oxj9q/new_api_gpt4_turbo_128k_context_and_api_code/
[11] https://www.reddit.com/r/OpenAI/comments/146rb5l/what_if_gpt4_is_the_best_we_get_for_a_while/
[12] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[13] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[14] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[15] https://www.reddit.com/r/ChatGPTCoding/comments/12dq44o/provide_data_model_for_chatgpt_context_to/
[16] https://www.techtarget.com/searchenterpriseai/tip/GPT-35-vs-GPT-4-Biggest-differences-to-consider
[17] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[18] https://www.reddit.com/r/OpenAI/comments/1hd2r2b/gemini_20_is_what_4o_was_supposed_to_be/
[19] https://www.reddit.com/r/LocalLLaMA/comments/1i8rujw/notes_on_deepseek_r1_just_how_good_it_is_compared/
[20] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[21] https://www.reddit.com/r/SillyTavernAI/comments/1fi3baf/til_max_output_on_openrouter_is_actually_the/
[22] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[23] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[24] https://artificialanalysis.ai/providers/openai
[25] https://www.reddit.com/r/OpenAI/comments/17rwlhu/after_june_2024_openai_will_offer_no_models_in/
[26] https://www.reddit.com/r/LocalLLaMA/comments/1cmet2d/openai_has_been_scaling_each_new_version_at_least/
[27] https://www.reddit.com/r/aipromptprogramming/comments/17rcwwz/exploring_the_cost_of_openai_128k_api_pricey_yet/
[28] https://www.reddit.com/r/OpenAI/comments/1fi5y9g/since_friday_o1preview_made_4o_feel_obsolete_for/
[29] https://www.reddit.com/r/OpenAI/comments/17pa3ho/what_does_the_128k_context_window_mean_for/
[30] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[31] https://www.reddit.com/r/OpenAI/comments/1io6fy4/the_gpt_5_announcement_today_is_mostly_bad_news/
[32] https://www.reddit.com/r/singularity/comments/1igcej5/openai_introducing_deep_research_powered_by_a/
[33] https://www.reddit.com/r/OpenAI/comments/1h7jm52/one_thing_that_openai_shouldve_highlighted_more/
[34] https://www.reddit.com/r/SillyTavernAI/comments/1fi3baf/til_max_output_on_openrouter_is_actually_the/
[35] https://www.reddit.com/r/OpenAI/comments/1249g04/gpt_models_have_a_maximum_context_length_of_4097/
[36] https://www.reddit.com/r/OpenAI/comments/17oxj9q/new_api_gpt4_turbo_128k_context_and_api_code/
[37] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[38] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[39] https://platform.openai.com/docs/models
[40] https://platform.openai.com/docs/api-reference/introduction
[41] https://www.reddit.com/r/singularity/comments/1dmap17/possible_timelines_for_gpt45_and_gpt5/
[42] https://www.reddit.com/r/OpenAI/comments/1c2cg7t/what_is_your_gpts_training_data_cutoff_date/
[43] https://www.reddit.com/r/singularity/comments/1i8msdn/gpt5_isnt_late_its_not_delayed_and_yes_its_coming/
[44] https://www.reddit.com/r/ArtificialInteligence/comments/1e6x56o/heres_everything_you_need_to_know_about_about/
[45] https://www.reddit.com/r/ChatGPT/comments/11t5cfk/gpt4_just_changed_its_message_limit_to_50_every_4/
[46] https://platform.openai.com/docs/deprecations
[47] https://devblogs.microsoft.com/surface-duo/android-openai-chatgpt-15/
[48] https://community.openai.com/t/maximum-token-length-in-gpt-4/385914
[49] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[50] https://www.reddit.com/r/OpenAI/comments/1hdhroh/12_days_of_openai_day_7_thread/
[51] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[52] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[53] https://community.openai.com/t/chatgpt-4-context-lengths/114919
[54] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[55] https://platform.openai.com/docs/models
[56] https://en.wikipedia.org/wiki/GPT-3
[57] https://community.openai.com/t/context-length-vs-max-token-vs-maximum-length/125585
[58] https://platform.openai.com/docs/concepts
[59] https://platform.openai.com/docs/guides/fine-tuning
[60] https://www.semanticscholar.org/paper/3bc52b5daf2860ac4658be54750c206210ae1dd7
[61] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[62] https://www.reddit.com/r/singularity/comments/1ausp2k/geminis_nearly_perfect_10_million_context_length/
[63] https://www.reddit.com/r/ClaudeAI/comments/1dqj1lg/claude_35_sonnet_vs_gpt4_a_programmers/
[64] https://artificialanalysis.ai/providers/openai
[65] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[66] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[67] https://gptforwork.com/guides/openai-gpt3-models
[68] https://platform.openai.com/docs/models
[69] https://ttms.com/my/evolution-of-ai-from-gpt-1-to-gpt-4o-key-features-milestones-and-applications/
[70] https://community.openai.com/t/how-to-improve-gpt-4-api-output-length-and-structure/1025132
[71] https://platform.openai.com/docs/guides/prompt-engineering
[72] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[73] https://www.reddit.com/r/OpenAI/comments/1igbu9y/introducing_deep_research/
[74] https://www.reddit.com/r/LocalLLaMA/comments/16y95hk/a_starter_guide_for_playing_with_your_own_local_ai/
[75] https://www.reddit.com/r/LocalLLaMA/comments/195mi89/cost_comparisons_between_openai_mistral_claude/
[76] https://platform.openai.com/docs/models
[77] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[78] https://learn.microsoft.com/ja-jp/azure/ai-services/openai/concepts/models
[79] https://platform.openai.com/docs/guides/prompt-engineering
[80] https://platform.openai.com/docs/api-reference/introduction
[81] https://stackoverflow.com/questions/75373129/openai-gpt-3-api-error-this-models-maximum-context-length-is-2049-tokens
[82] https://ttms.com/my/evolution-of-ai-from-gpt-1-to-gpt-4o-key-features-milestones-and-applications/
[83] https://arxiv.org/abs/2502.06781
[84] https://pubmed.ncbi.nlm.nih.gov/37052826/
[85] https://arxiv.org/abs/2305.06161
[86] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[87] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[88] https://platform.openai.com/docs/guides/rate-limits
[89] https://openai.com/index/new-models-and-developer-products-announced-at-devday/
[90] https://platform.openai.com/docs/models
[91] https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models
[92] https://platform.openai.com/docs/deprecations
[93] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[94] https://www.techtarget.com/whatis/feature/GPT-4o-explained-Everything-you-need-to-know
[95] https://openai.com/index/introducing-openai-o1-preview/
[96] https://platform.openai.com/docs/advanced-usage
[97] https://platform.openai.com/docs/guides/fine-tuning
[98] https://www.reddit.com/r/OpenAI/comments/1irynqt/plus_plan_has_a_context_window_of_only_32k_is_it/
[99] https://www.reddit.com/r/OpenAI/comments/1fdj5rr/new_details_emerge_on_openais_strawberry/
[100] https://www.reddit.com/r/OpenAI/comments/1hd2r2b/gemini_20_is_what_4o_was_supposed_to_be/
[101] https://www.reddit.com/r/singularity/comments/1e6d4p5/openai_debuts_mini_version_of_its_most_powerful/
[102] https://www.reddit.com/r/LocalLLaMA/comments/16y95hk/a_starter_guide_for_playing_with_your_own_local_ai/
[103] https://www.reddit.com/r/LocalLLaMA/comments/195mi89/cost_comparisons_between_openai_mistral_claude/
[104] https://help.openai.com/en/articles/6654000-best-practices-for-prompt-engineering-with-the-openai-api
[105] https://help.openai.com/en/articles/9855712-openai-o1-models-faq-chatgpt-enterprise-and-edu
[106] https://community.openai.com/t/context-length-vs-max-token-vs-maximum-length/125585
[107] https://stackoverflow.com/questions/75373129/openai-gpt-3-api-error-this-models-maximum-context-length-is-2049-tokens
[108] https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4-gpt-4-turbo-gpt-4o-and-gpt-4o-mini
[109] https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
[110] https://www.reddit.com/r/OpenAI/comments/1bj03ml/gpt5_or_gpt45_will_most_likely_be_released_this/
[111] https://www.reddit.com/r/ChatGPT/comments/11rmyjo/gpt4_has_32000_token_limit_or_64000_words_and/
[112] https://www.reddit.com/r/OpenAI/comments/1cr5iwq/gpt4o_will_be_free_for_everyone_in_the_next_weeks/
[113] https://www.reddit.com/r/OpenAI/comments/1c1v0rc/the_newest_gpt4_turbo_has_topped_claude_3_opus_on/
[114] https://www.reddit.com/r/OpenAI/comments/1ij8wgt/25_memory_increase_announced/
[115] https://learn.microsoft.com/en-us/azure/ai-services/openai/whats-new
[116] https://community.openai.com/t/how-the-max-tokens-are-considered/313514
[117] https://platform.openai.com/docs/models
[118] https://community.openai.com/t/what-is-the-maximum-response-length-output-tokens-for-each-gpt-model/524066
[119] https://help.openai.com/en/articles/6825453-chatgpt-release-notes
[120] https://en.wikipedia.org/wiki/GPT-4


Answer from Perplexity: pplx.ai/share
