OpenAI's o3 model falls short of the company's benchmark claims
OpenAI's latest reasoning model, o3, is facing scrutiny after independent tests found it solves far fewer difficult math problems than the company first claimed.
When OpenAI unveiled o3 in December, executives said the model could answer slightly more than a quarter of the problems in FrontierMath, a notoriously difficult set of graduate-level math problems.
They added that the best competing model was stuck near 2%. "Today, all offerings out there have less than 2%," chief research officer Mark Chen said during the o3 and o3-mini livestream. "We're seeing, with o3 in aggressive test-time compute settings, we're able to get over 25%."
As TechCrunch reported, that result was achieved by a version of o3 that used far more computing power than the model OpenAI shipped last week.
On Friday, Epoch AI, the research institute that created FrontierMath, published its own evaluation.
OpenAI has released o3, its highly anticipated reasoning model, along with o4-mini, a smaller and cheaper successor to o3-mini.

We evaluated the new models on our suite of math and science benchmarks. Results in thread! pic.twitter.com/5gbtzkey1b

— Epoch AI (@EpochAIResearch) April 18, 2025
Using an updated version of the benchmark with 290 questions, Epoch found that o3 scored approximately 10%.
That result is roughly in line with a lower figure in OpenAI's December technical paper, and Epoch cautioned that the discrepancy could have several explanations.
"The difference between our results and OpenAI's might be explained by OpenAI evaluating with a more powerful internal scaffold, using more test-time compute, or because those results were run on a different subset of FrontierMath," Epoch wrote.
FrontierMath is designed to measure progress toward advanced mathematical reasoning. The December 2024 set contained 180 problems, while a February 2025 update expanded the suite to 290.

Changes to the question set and to the amount of compute allowed can cause significant swings in reported scores.
OpenAI says the public o3 model uses less compute than the experimental version
Evidence that the commercial o3 differs also comes from the ARC Prize Foundation, which tested a larger pre-release build. The public version "is a different model … tuned for chat/product use," the ARC Prize Foundation wrote in a post on X, adding that "all released o3 compute tiers are smaller than the version we evaluated."
OpenAI staff member Wenda Zhou offered a similar explanation during a livestream last week. He said the production model is "more optimized for real-world use cases." "We've done [optimizations] to make the model more cost efficient [and] more useful in general," Zhou said, while acknowledging the potential for benchmark "disparities."
The company's smaller models, o3-mini-high and the newly announced o4-mini, already outperform o3 on FrontierMath, and OpenAI says an o3-pro variant will arrive in the coming weeks.
Still, the episode illustrates how benchmark headlines can mislead. In January, Epoch was criticized for delaying disclosure of its OpenAI funding until after o3 was first announced. More recently, Elon Musk's startup xAI was accused of publishing charts that exaggerated the capabilities of its Grok 3 model.
Industry observers say such benchmark discrepancies have become commonplace in the AI industry, as companies race to capture headlines with new models.