OpenAI has made headlines after its new AI model achieved gold medal–level performance on the 2025 International Math Olympiad (IMO). The model, which has not yet been released to the public, was evaluated under strict test conditions that prohibited internet access and coding tools. It scored 35 out of 42, just enough to meet the gold-medal threshold. The IMO is widely considered the most challenging mathematics competition for high school students globally.
Unlike task-specific systems such as Google DeepMind’s AlphaGeometry 2, OpenAI’s model is built as a general-purpose reasoning model. Alexander Wei, a member of OpenAI’s technical staff, emphasized on X (formerly Twitter) that the result stems from advances in general-purpose reinforcement learning and from scaling up test-time compute. This approach lets the model tackle a broad range of problems rather than focusing solely on geometry or Olympiad-style questions.
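To give a rough sense of what "test-time compute scaling" can mean in practice, the sketch below shows one well-known variant of the idea: self-consistency, i.e., sampling a model many times and taking a majority vote over the final answers. It is purely illustrative; the stub function `sample_candidate_answers`, the toy arithmetic problem, and all names are assumptions made for this demo, and nothing here is claimed to reflect OpenAI's actual method.

```python
import random
from collections import Counter

def sample_candidate_answers(problem: str, n: int) -> list[str]:
    """Stand-in for a language model sampled n times on the same problem.

    A real system would query a model with nonzero temperature; here we
    simulate noisy answers to a toy arithmetic question.
    """
    true_answer = str(eval(problem))  # toy "ground truth" for the demo only
    return [true_answer if random.random() < 0.6 else str(random.randint(0, 999))
            for _ in range(n)]

def majority_vote(answers: list[str]) -> str:
    """Pick the most frequent final answer across samples (self-consistency)."""
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    problem = "17 * 24"
    for n in (1, 5, 50):
        answers = sample_candidate_answers(problem, n)
        print(f"n={n:>2}: voted answer = {majority_vote(answers)}")
```

The only point of the sketch is the general intuition behind test-time scaling: spending more compute at inference (a larger n) tends to make the aggregated answer more reliable, even when any single sample is unreliable.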
The AI-generated proofs were reviewed by three former IMO medalists, whose grading was unanimous. OpenAI reported that the model produced comprehensive natural-language solutions, complete with lemmas, mirroring the multi-page arguments written by human mathematicians.
Despite the success, some experts have raised concerns about transparency. Gary Marcus, professor emeritus at New York University, voiced skepticism on X, stating, “OpenAI has told us the result, but not how it was achieved. That leaves me with many questions.” As of now, the official IMO coordinators have not independently verified the AI’s performance.
OpenAI’s announcement came shortly after the Defense Advanced Research Projects Agency (DARPA) launched an initiative aimed at enabling AI to coauthor advanced mathematical research. Sam Altman, CEO of OpenAI, noted that “this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.” This framing echoes DARPA’s ambition to catalyze major technological advances, in the spirit of earlier DARPA-backed projects such as ARPANET.
The achievement underscores the potential of large language models to address complex, high-stakes problems beyond mere text generation. If validated, this success could represent a turning point in how we understand machine reasoning and the collaboration between humans and AI in pioneering new discoveries. While the implications are profound, the ongoing dialogue about transparency and reproducibility in AI remains critical as the field continues to evolve.
