
OpenAI Unveils o3 and o4-mini AI Models Capable of Image-Based Thinking and Autonomous Tool Use

  • Writer: Mary
  • Apr 17
  • 3 min read

In a significant leap forward for artificial intelligence, OpenAI has unveiled two new models that transform how AI systems process visual information and solve complex problems. The company's latest additions to its "o-series" - o3 and o4-mini - represent what OpenAI describes as a fundamental shift in AI capabilities.


[Image: OpenAI livestream graphic reading "OpenAI" and "Starting Soon" on a soft yellow-and-blue gradient]

Visual Reasoning: Beyond Simply Seeing Images


The most revolutionary aspect of these new models lies in their ability to "think with images" rather than merely process them. While previous AI systems could recognize objects in images, these new models can manipulate visual information as part of their reasoning process, much like humans do when solving complex problems.


"They don't just see an image — they think with it," OpenAI explained in their announcement materials.

During the press conference demonstration, this capability was showcased when a researcher had o3 analyze a physics poster from a decade-old internship. The AI not only navigated through complex diagrams independently but also identified that certain results weren't actually present in the poster, suggesting a level of visual comprehension previously unseen in AI systems.


"It must have just read, you know, at least like 10 different papers in a few seconds for me," noted Brandon McKenzie, an OpenAI researcher working on multimodal reasoning. He estimated the task would have taken him "many days just to onboard myself back to my project, and then a few days more to actually search through the literature."



Autonomous Tool Use: Complex Problem-Solving Without Human Guidance


Perhaps equally impressive is these models' ability to independently use multiple tools when tackling complex problems - creating multi-step workflows without continuous human direction.


Greg Brockman, OpenAI's president, highlighted this capability during the announcement: "They actually use these tools in their chain of thought as they're trying to solve a hard problem. For example, we've seen o3 use like 600 tool calls in a row trying to solve a really hard task."


To illustrate this capability, the company described how the AI could handle a query about future energy usage patterns in California. Without step-by-step human guidance, the model could search the web for utility data, write Python code to analyze it, generate visualizations, and produce a comprehensive report - all as a single fluid process.
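A developer can approximate this kind of loop with function calling. The sketch below is illustrative rather than OpenAI's implementation: the search_web tool is a hypothetical stub, and the built-in tool use described in the announcement does not require this manual wiring.

```python
# Conceptual sketch of a tool-call loop using the Chat Completions
# function-calling interface. The model decides when to call the tool;
# the loop feeds results back until it produces a final answer.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_web",
        "description": "Search the web and return result snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def search_web(query: str) -> str:
    # Placeholder: wire this to a real search backend.
    return f"(stub results for: {query})"

messages = [{"role": "user",
             "content": "Forecast California's summer electricity demand."}]

# Keep satisfying tool calls until the model stops requesting them.
while True:
    reply = client.chat.completions.create(
        model="o4-mini", messages=messages, tools=tools
    ).choices[0].message
    messages.append(reply)
    if not reply.tool_calls:
        break
    for call in reply.tool_calls:
        args = json.loads(call.function.arguments)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": search_web(**args),
        })

print(reply.content)
```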



Breaking Performance Records


According to OpenAI, o3 sets a new state of the art on key benchmarks of AI capability, including Codeforces, SWE-bench, and MMMU. External expert evaluations show o3 making 20% fewer major errors than its predecessor on difficult real-world tasks.


Meanwhile, the smaller o4-mini model, designed for speed and cost efficiency, achieved an impressive 99.5% score on the 2025 AIME mathematics competition when given access to a Python interpreter.
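That figure hints at why interpreter access matters: many competition problems collapse into a short exhaustive computation once a model can write and run code. A hypothetical, AIME-flavored illustration (not an actual exam item):

```python
# Hypothetical competition-style question: for how many integers n with
# 1 <= n <= 1000 is n^2 + n + 41 prime? With an interpreter, a model can
# settle this by brute force instead of a symbolic argument.
def is_prime(k: int) -> bool:
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True

count = sum(is_prime(n * n + n + 41) for n in range(1, 1001))
print(count)
```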


In the realm of software engineering, these models excel particularly well. Brockman noted during the announcement that o3 is "actually better than I am at navigating through our OpenAI code base, which is really useful."


New Developer Tools and Initiatives


Alongside these new models, OpenAI introduced Codex CLI, a lightweight coding agent that runs directly in a user's terminal. This open-source tool allows developers to leverage the models' reasoning capabilities for coding tasks and supports screenshots and sketches.
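Assuming the install instructions from the project's open-source repository, getting started looks roughly like this; the prompt is illustrative:

```bash
# Install the open-source Codex CLI via npm, then invoke it with a task.
npm install -g @openai/codex
codex "explain this codebase to me"
```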


To encourage adoption, the company is launching a $1 million initiative to support projects using Codex CLI and OpenAI models, with grants available in increments of $25,000 in API credits.



Availability and Access


The new models are already available to ChatGPT Plus, Pro, and Team users, with Enterprise and Education customers gaining access next week. Free users can sample o4-mini by selecting "Think" in the composer before submitting queries.


Developers can access both models via OpenAI's Chat Completions API and Responses API, though some organizations will need verification before gaining access.



The Future of AI


This release represents an important convergence in AI capabilities, with models increasingly combining specialized reasoning with natural conversation abilities and tool use.


"Today's updates reflect the direction our models are heading in: we're converging the specialized reasoning capabilities of the o-series with more of the natural conversational abilities and tool use of the GPT-series," OpenAI noted in its release.


With o3 and o4-mini, OpenAI appears to have crossed a significant threshold where machines begin to perceive and manipulate images as an integral part of their thinking process, much like humans do. This shift from passive recognition to active visual reasoning may ultimately prove more significant than any benchmark score, representing a moment when AI began to truly "see" the world through thinking eyes.


As competition in the AI space continues to intensify, OpenAI's dual focus on both reasoning capabilities and practical tool use suggests a strategy aimed at maintaining its leadership position by delivering both intelligence and utility in its systems.
