Jeremy Berman’s AI Breakthrough with Natural Language in ARC-AGI-2

Jeremy Berman, a research scientist at Reflection AI, has emerged as a leading figure in AI reasoning research with his approach to the ARC-AGI v2 challenge, a rigorous benchmark often compared to an IQ exam for AI systems. By scoring 29.4%, Berman shifted the paradigm from traditional code generation to natural language-based reasoning, offering a glimpse into a future where AI might mirror human-like understanding. The achievement highlights both the potential for machines to tackle abstract reasoning tasks and the limitations of current models. As the industry grapples with the complexities of general intelligence, Berman’s work challenges conventional methods and sets a new standard for what AI can achieve.

Revolutionizing AI Reasoning

Shifting from Code to Language

Jeremy Berman’s journey through the ARC-AGI challenges reveals a transformative pivot that could reshape AI development. He initially achieved a striking 53.6% accuracy on ARC-AGI v1 with an “Evolutionary Test-time Compute” method, in which large language models generate candidate Python functions and iteratively refine them. The more intricate compositional problems of ARC-AGI v2, which demand applying multiple rules simultaneously, exposed the shortcomings of that rigid coding framework. Recognizing that current language models often mimic understanding without truly grasping concepts, Berman transitioned to natural language descriptions of algorithms, a more expressive and flexible way to articulate complex ideas that lets AI approach problems within a broader conceptual framework. The departure from deterministic code opened up new possibilities for how machines interpret and solve abstract challenges.
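To make the v1 mechanics concrete, here is a minimal sketch of what an evolutionary test-time compute loop could look like. Everything here is a hypothetical stand-in, since the article describes the method only at a high level: the names (`Task`, `call_llm`, `propose_solver`), the prompts, and the scoring and revision details are assumptions, not Berman’s actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Task:
    train: list          # demonstration pairs: [(input_grid, output_grid), ...]
    test_input: object   # grid the final solver must transform

def call_llm(prompt: str) -> str:
    """Placeholder for a large language model call."""
    raise NotImplementedError

def propose_solver(task: Task, feedback: str = "") -> str:
    # Ask the model for Python source defining transform(grid).
    prompt = f"Write transform(grid) solving these examples: {task.train}\n{feedback}"
    return call_llm(prompt)

def score(candidate_src: str, task: Task) -> float:
    # Fraction of demonstration pairs the candidate reproduces exactly.
    # (A real system would sandbox this exec; running untrusted code is dangerous.)
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)
        transform = namespace["transform"]
        return sum(transform(x) == y for x, y in task.train) / len(task.train)
    except Exception:
        return 0.0

def evolve(task: Task, generations: int = 3, pool_size: int = 8) -> str:
    # Generate a pool of candidate programs, then alternate scoring and revision.
    pool = [propose_solver(task) for _ in range(pool_size)]
    for _ in range(generations):
        ranked = sorted(pool, key=lambda c: score(c, task), reverse=True)
        if score(ranked[0], task) == 1.0:
            return ranked[0]  # solves every demonstration pair
        survivors = ranked[: pool_size // 2]
        revised = [propose_solver(task, feedback=f"This attempt failed some pairs, fix it:\n{c}")
                   for c in survivors]
        pool = survivors + revised
    return max(pool, key=lambda c: score(c, task))
```

Roughly speaking, the v2 pivot the article describes changes what the candidates are: natural-language descriptions of an algorithm rather than Python source, with the model itself interpreting the description instead of a Python interpreter executing code.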

Enhancing Problem-Solving with Thinking Models

Building on this approach, Berman employed advanced “thinking models” such as Grok-4 in the ARC-AGI v2 challenge; these models incorporate internal revision processes that let them refine solutions autonomously. Unlike the external revision loops that wrapped the Python pipeline in earlier attempts, they self-correct and adapt without constant external input, mimicking a more intuitive problem-solving process. A key insight from this work was that broad exploration beats deep iteration on the multifaceted problems of ARC-AGI v2, a strategic shift that secured Berman’s top position on the leaderboard and highlighted the untapped potential of natural language as a programming medium. By prioritizing a wide-ranging search for solutions, the AI could weigh diverse hypotheses instead of getting stuck in narrow, repetitive cycles, as the sketch below illustrates. The result suggests that future AI systems may need to balance depth with breadth to stay versatile on increasingly complex tasks.
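The breadth-versus-depth tradeoff can be stated in a few lines. In this hypothetical sketch, the two strategies spend the same budget of model calls differently; `propose`, `revise`, and `evaluate` are assumed placeholders, not an interface from Berman’s system, and the article reports only that breadth worked better on v2’s compositional tasks, not these exact mechanics.

```python
def propose() -> str:
    """Placeholder: ask the model for a fresh natural-language solution description."""
    raise NotImplementedError

def revise(description: str) -> str:
    """Placeholder: ask the model to refine an existing description."""
    raise NotImplementedError

def evaluate(description: str) -> float:
    """Placeholder: score a description against the task's demonstration pairs."""
    raise NotImplementedError

def solve_deep(budget: int) -> str:
    # Depth-first use of the budget: one hypothesis, repeatedly refined.
    # Risk: the search gets stuck polishing a wrong initial idea.
    best = propose()
    for _ in range(budget - 1):
        best = revise(best)
    return best

def solve_broad(budget: int) -> str:
    # Breadth-first use of the budget: many independent hypotheses, keep the best.
    # On multi-rule tasks, diverse starting points cover more of the solution space.
    candidates = [propose() for _ in range(budget)]
    return max(candidates, key=evaluate)
```

The design question is simply where a fixed compute budget buys the most: `solve_deep` bets on one idea being nearly right, while `solve_broad` bets on coverage, which matches the article’s claim that wide search avoided narrow, repetitive cycles.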

Envisioning the Future of Artificial General Intelligence

Overcoming Current AI Limitations

Looking ahead, Jeremy Berman’s vision addresses critical barriers to true general intelligence, such as catastrophic forgetting, in which models lose prior knowledge when learning new information. His advocacy for systems with “built-in revision loops” offers a promising direction, allowing AI to explore solution spaces adaptively and retain what it has learned over time. Weighing in on the neuro-symbolic debate, Berman posits that neural networks, given sufficient computational power and optimized structures, can emulate the capabilities of biological neural systems. That perspective fuels optimism that challenges like catastrophic forgetting during fine-tuning could be resolved within the next decade, paving the way for AI with reasoning comparable to human cognition and transforming how the technology integrates into scientific discovery and societal applications. Berman’s approach underscores the urgency of evolving beyond existing paradigms to unlock AI’s full potential.
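As a rough illustration of what a “built-in revision loop” might look like as a control pattern, the following sketch keeps critiquing and amending its own candidate until its self-check passes or a budget runs out. `draft`, `critique`, and `amend` are assumed names for illustration only, not a published interface from Berman’s work.

```python
from typing import Optional

def draft() -> str:
    """Placeholder: produce an initial candidate solution."""
    raise NotImplementedError

def critique(solution: str) -> Optional[str]:
    """Placeholder: return a described flaw, or None if the self-check passes."""
    raise NotImplementedError

def amend(solution: str, flaw: str) -> str:
    """Placeholder: rewrite the candidate to address the flaw."""
    raise NotImplementedError

def solve_with_revision(max_rounds: int = 10) -> str:
    # The loop is internal: no external judge decides when to stop.
    solution = draft()
    for _ in range(max_rounds):
        flaw = critique(solution)
        if flaw is None:
            break  # the model's own check found no remaining problem
        solution = amend(solution, flaw)
    return solution
```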

Bridging the Human-Machine Performance Gap

Another facet of Berman’s contribution lies in confronting the stark performance gap between humans and AI on abstract reasoning tasks. While humans achieve around 75% accuracy on ARC-AGI v1, even advanced models like GPT-4 lag far behind at a mere 5%, illustrating the immense difficulty of these challenges. Berman’s work over the past eight months, inspired by concepts from Jeff Hawkins’s “A Thousand Brains,” seeks to close this gap by reimagining how AI processes abstract patterns. His transition to natural language-based reasoning offers a dynamic alternative to static coding methods, letting machines approach problems with greater conceptual depth. The shift addresses the immediate shortcomings of language models while laying a foundation for longer-term progress in AI’s ability to synthesize new knowledge. By focusing on meta-skills, the ability to learn how to create new skills, Berman’s research points toward a future where AI might rival human ingenuity in deciphering complex, rule-based scenarios.

Paving the Way for Transformative Progress

Jeremy Berman’s adoption of natural language as a core reasoning tool in the ARC-AGI v2 challenge marked a pivotal moment in AI research. His recognition that current models often fail to exhibit genuine understanding spurred a rethinking of traditional approaches, and his emphasis on broad exploration over repetitive iteration offered a fresh lens on machine problem-solving. His vision for adaptive, self-revising systems tackles fundamental issues and sets a precedent for how the field could evolve. As discussions around artificial general intelligence gain momentum, his contributions offer a roadmap for overcoming architectural and computational barriers, inspiring a renewed focus on developing AI that could one day match human cognitive abilities.
