Open-Source AI Nearly Wins Top US Math Competition

In a stunning display of computational reasoning that has sent ripples through the technology community, an open-source artificial intelligence system has achieved a near-victory in one of the world’s most intellectually demanding academic contests. The system, known as Nomos 1 and developed by the San Francisco-based startup Nous Research, earned a score on the 2024 William Lowell Putnam Mathematical Competition that would have secured a second-place finish among its nearly 4,000 human counterparts. This achievement is far more than a simple performance benchmark; it represents a potential inflection point in the evolution of AI, signaling a paradigm shift where smaller, more efficient, and openly accessible models are beginning to challenge the dominance of massive, proprietary systems built by technology behemoths. The release of Nomos 1 challenges the long-held belief that progress in AI is solely a function of scale and computational power, suggesting instead that sophisticated architecture and clever design can unlock new frontiers of artificial intelligence.

The Ultimate Mathematical Gauntlet

The William Lowell Putnam Mathematical Competition is not an ordinary exam; it is widely regarded as the most prestigious and challenging undergraduate mathematics contest in North America, a veritable intellectual crucible for the brightest young minds. Its six-hour duration is split into two sessions, each presenting six notoriously difficult problems that require not just deep mathematical knowledge but also exceptional creativity, abstract reasoning, and elegant problem-solving skills. The competition’s alumni roster is a testament to its rigor, featuring numerous Fields Medalists and Nobel laureates, including the famed physicist Richard Feynman. Success on the Putnam is a mark of true mathematical genius, making it the ultimate proving ground for any entity, human or artificial, claiming advanced reasoning capabilities. The extreme difficulty is starkly illustrated by the scoring statistics: in a typical year, the median score hovers in the single digits, and a significant portion of participants fail to score any points at all. For instance, in a recent contest, over 60% of students scored three points or fewer out of a possible 120, underscoring the formidable nature of the challenge that Nomos 1 confronted.

Against this backdrop of extreme difficulty, the performance of Nomos 1 was nothing short of extraordinary. The AI system achieved a score of 87 out of a possible 120 points on the 2024 competition problems. This result would have placed it second among the 3,988 human participants, just short of the top human score of 90. To put the accomplishment in perspective, the median human score was just 2, a figure that highlights the vast gulf between average performance and the elite level demonstrated by the AI. Nomos 1 earned perfect scores on eight of the twelve problems, indicating a strong grasp of complex mathematical concepts and the ability to formulate rigorous proofs. Performance at the level of the very best human competitors, in a domain long considered a bastion of human intellect, marks a significant milestone for machine reasoning. To ensure the integrity of the results, the AI’s solutions were blind-graded by a human expert who was a former top-200 Putnam finisher, and all associated materials were made public on GitHub to facilitate transparency and independent verification by the broader research community.

Smarter Not Bigger

A central and compelling aspect of Nomos 1’s success is that it was achieved through sophisticated technique rather than sheer computational scale, a direct challenge to the “bigger is better” philosophy that has dominated AI development. The system is built upon Alibaba’s Qwen3-30B-A3B-Thinking-2507 model, which contains roughly 30 billion parameters. However, it uses a mixture-of-experts (MoE) architecture, in which a router sends each token through only a small subset of expert sub-networks, so only about 3 billion parameters are active for any given token. This makes the model computationally frugal, especially compared with the behemoths of the industry. The most striking evidence for the primacy of its design lies in the dramatic performance gap between Nomos 1 and the model it was built from. When the unmodified Qwen3 model was given the same set of Putnam problems, it scored only 24 out of 120. The leap to 87 points, a more than threefold improvement, is credited to the specialized training, high-quality data, and the “reasoning harness” developed by Nous Research, showing that careful post-training and a well-designed software framework can dramatically amplify a model’s capabilities.
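To make the idea that “only a fraction of the parameters are active” concrete, here is a minimal toy sketch of top-k mixture-of-experts routing in plain Python. It illustrates the general technique only and is not Qwen3’s actual implementation: the dimensions, the number of experts, k = 2, the random (rather than learned) weights, and the single-matrix “experts” are all hypothetical toy choices. A router scores every expert for a token, only the k best-scoring experts are evaluated, and their outputs are mixed with softmax weights.

```python
import math
import random


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def moe_forward(x, router, experts, k):
    """Route one token vector x through its top-k experts and mix their outputs."""
    # Router: one score per expert (learned in a real model; random in this toy).
    logits = matvec(router, x)
    # Keep only the k highest-scoring experts; the rest are never evaluated.
    top = sorted(range(len(experts)), key=lambda e: logits[e], reverse=True)[:k]
    weights = softmax([logits[e] for e in top])
    out = [0.0] * len(x)
    for w, e in zip(weights, top):
        y = matvec(experts[e], x)  # each toy expert is a single square matrix
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


DIM, N_EXPERTS, TOP_K = 4, 8, 2  # toy sizes, not Qwen3's real configuration
rng = random.Random(0)
router = random_matrix(N_EXPERTS, DIM, rng)
experts = [random_matrix(DIM, DIM, rng) for _ in range(N_EXPERTS)]

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_forward(token, router, experts, TOP_K)
print(f"experts evaluated: {sorted(used)} ({TOP_K} of {N_EXPERTS})")
```

Because the unselected experts are never evaluated, per-token compute scales with k rather than with the total number of experts, which is how a model with roughly 30 billion parameters can run with only about 3 billion of them active per token.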

The engine driving Nomos 1’s mathematical performance is a two-phase reasoning harness designed to organize and optimize the problem-solving process within a strict three-hour time limit. The process begins with the “Solving Phase,” in which parallel “workers” simultaneously tackle the twelve problems. The system uses priority-based scheduling, directing computational resources toward the problems that have the fewest self-judged “perfect” solutions, so that effort goes to the hardest remaining challenges. As each worker generates a candidate solution, it also critiques its own work, assigning it a score from one to seven. This cycle of generation and assessment continues until a predetermined number of high-confidence solutions has been produced for each problem or until time begins to run out. Fifteen minutes before the deadline, the system switches to the “Finalization Phase,” which selects the single best submission for each problem. This involves a consolidation step that groups the generated solutions by their final conclusion, followed by a pairwise, single-elimination tournament that compares the top submissions and selects the final answer.
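The two phases can be sketched in Python as follows. This is a simplified, hypothetical illustration, not Nous Research’s published harness: a sequential loop stands in for the parallel workers, a call budget stands in for the clock, `attempt` and `better` are placeholder functions for the model’s solve-and-self-grade and pairwise-judging calls, and both the target of two perfect solutions per problem and the rule of sending each answer group’s highest-scored submission into the bracket are assumptions.

```python
import random
from collections import defaultdict

PERFECT = 7         # top of the 1-7 self-assessment scale
TARGET_PERFECT = 2  # hypothetical: stop working on a problem after this many perfect self-scores


def solve(problems, attempt, budget):
    """Solving phase (sequential stand-in for parallel workers).

    attempt(problem) -> (final_answer, proof_text, self_score) is a hypothetical model call.
    budget caps the number of calls, standing in for the wall-clock deadline.
    """
    pool = {p: [] for p in problems}
    for _ in range(budget):
        perfect = {p: sum(1 for sub in pool[p] if sub[2] == PERFECT) for p in problems}
        open_problems = [p for p in problems if perfect[p] < TARGET_PERFECT]
        if not open_problems:
            break  # every problem already has enough perfect solutions
        # Priority rule: work on the problem with the fewest perfect solutions so far.
        target = min(open_problems, key=lambda p: perfect[p])
        pool[target].append(attempt(target))
    return pool


def finalize(candidates, better):
    """Finalization phase: group by final answer, then run a single-elimination bracket.

    better(a, b) is a hypothetical pairwise judge that returns the preferred submission.
    """
    if not candidates:
        return None
    groups = defaultdict(list)
    for sub in candidates:
        groups[sub[0]].append(sub)
    # Assumption: each answer group sends its highest self-scored submission.
    bracket = [max(group, key=lambda sub: sub[2]) for group in groups.values()]
    while len(bracket) > 1:
        winners = [better(a, b) for a, b in zip(bracket[0::2], bracket[1::2])]
        if len(bracket) % 2:
            winners.append(bracket[-1])  # odd entrant out receives a bye
        bracket = winners
    return bracket[0]


# Toy demo with random stand-ins for the model calls.
rng = random.Random(42)


def toy_attempt(problem):
    answer = rng.choice(["A", "B", "C"])
    return (answer, f"proof of {problem} concluding {answer}", rng.randint(1, PERFECT))


def toy_better(a, b):
    return a if a[2] >= b[2] else b  # prefer the higher self-score


problems = [f"{half}{i}" for half in "AB" for i in range(1, 7)]  # A1-A6, B1-B6
pool = solve(problems, toy_attempt, budget=200)
final = {p: finalize(pool[p], toy_better) for p in problems}
for p in problems:
    print(p, len(pool[p]), final[p] and final[p][0])
```

Choosing the problem with the fewest perfect self-scores steers effort away from problems that already look solved; in this sketch, grouping by final answer before the tournament means each distinct conclusion enters the bracket only once.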

A New Era of Accessible AI

While other proprietary models from industry giants have reportedly achieved even higher raw scores on mathematical benchmarks, Nomos 1’s defining characteristic is not its absolute performance but its combination of power, efficiency, and accessibility. Its roughly 3 billion active parameters, and even its full 30 billion, represent a small fraction of the footprint attributed to its closed-source competitors. For context, unofficial estimates (neither company has disclosed these figures) place OpenAI’s o1-pro at around 1.8 trillion parameters and Google’s Gemini 2.5 Pro at over 400 billion. This difference in scale has profound practical implications: Nomos 1 can be run on consumer-grade hardware, a stark departure from the massive, energy-intensive data centers required to operate frontier models from major tech corporations. This achievement undercuts the notion that state-of-the-art AI is the exclusive domain of entities with billion-dollar research budgets, paving the way for a more democratized technological future. The ability to run a top-tier AI mathematician on a local machine lowers the barrier to entry for innovation across countless fields.
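One nuance behind the consumer-hardware claim: in a standard inference setup, a mixture-of-experts model must keep all of its experts’ weights in memory even though only about 3 billion parameters are active per token, so memory requirements track the full ~30 billion parameters. The rough arithmetic below is a sketch that assumes round parameter counts and common bytes-per-parameter figures, and it counts weights only (no KV cache, activations, or runtime overhead).

```python
# Rough weights-only memory estimate; ignores KV cache, activations, and runtime overhead.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

TOTAL_PARAMS = 30e9   # every expert must be resident in memory
ACTIVE_PARAMS = 3e9   # parameters actually used for each token


def weight_memory_gib(total_params, precision):
    """Approximate memory needed to hold the weights, in GiB."""
    return total_params * BYTES_PER_PARAM[precision] / 2**30


for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{weight_memory_gib(TOTAL_PARAMS, precision):.1f} GiB of weights")
# bf16: ~55.9 GiB, int8: ~27.9 GiB, 4-bit: ~14.0 GiB

print(f"active per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%} of all parameters")
```

On these rough numbers, a 4-bit quantized copy (about 14 GiB of weights) can fit on a single 24 GB consumer GPU before KV-cache overhead, while 16-bit weights (about 56 GiB) cannot.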

By releasing Nomos 1 and its reasoning harness under a permissive Apache 2.0 open-source license, Nous Research has taken a deliberate step to empower the global community with elite-level AI capabilities. This move allows organizations, academic institutions, and even individual researchers to deploy advanced mathematical reasoning tools on their own infrastructure, freeing them from a dependency on costly API calls and the ecosystems of large cloud providers. The availability of such a powerful, open-source system has clear potential applications in industries that rely on rigorous logical deduction and formal verification, including software and hardware engineering, advanced scientific modeling, cryptographic analysis, and automated theorem proving. For the research community, this development is particularly significant. It provides mathematicians and computer scientists with a powerful collaborator for verifying proofs, exploring complex theoretical systems, and potentially accelerating the pace of new discoveries, thereby democratizing not just the technology but the very process of scientific and mathematical inquiry itself.

Nous Research’s Vision for a Decentralized Future

The release of Nomos 1 is not an isolated event but part of a broader vision pursued by Nous Research, one that champions a future where AI development is more distributed, efficient, and open. This philosophy is further evidenced by the company’s recent launch of Hermes 4.3, a general-purpose language model trained on its own “Psyche network.” The network takes a decentralized approach to model training, coordinating nodes over the open internet, with its integrity secured by the Solana blockchain. According to the company, the version of Hermes 4.3 trained on this distributed network outperformed its centrally trained counterpart, a proof of concept for decentralized training as a way to produce high-quality, production-ready models. Together, Nomos 1 and Hermes 4.3 signal Nous Research’s strategic bet that AI competitiveness will not be won through a monolithic race for ever-larger parameter counts.

Ultimately, the advent of Nomos 1 highlights a rapidly accelerating trend: the narrowing capability gap between colossal, closed-source models and smaller, highly optimized, open-source alternatives. The achievement has significant implications for both industry and research, offering compelling evidence that advanced AI capability is a product of sophisticated architecture and clever design, not just raw computational scale. By combining a moderately sized base model with a carefully engineered reasoning framework, Nous Research not only created an elite-level AI mathematician but also placed its power directly into the hands of the global community. The release marks a landmark moment that challenges the status quo and champions a more accessible, efficient, and democratic path forward for the entire field of artificial intelligence.
