Gemini 2.5 Pro Scores 130 IQ on Mensa Norway Test: A Breakthrough in AI Intelligence
- Mary
- Mar 31
Introduction to Gemini 2.5 Pro’s Mensa Norway Achievement
In a groundbreaking development for artificial intelligence, Google’s Gemini 2.5 Pro has achieved an impressive IQ score of 130 on the Mensa Norway test, as highlighted in a recent X post by Mark Kretschmann on March 29, 2025. This score places Gemini 2.5 Pro at the top of a bell curve comparing various AI models, outranking competitors like OpenAI’s o1 (120 IQ) and xAI’s Grok 3 (112 IQ). The Mensa Norway test, known for its rigorous assessment of abstract reasoning, offers a fascinating glimpse into the evolving capabilities of AI.
This article explores the significance of this achievement, the context of the test, and what it means for the future of AI intelligence.

Understanding the Mensa Norway IQ Test
The Mensa Norway IQ test is a standardized assessment designed to measure abstract reasoning and problem-solving skills, mainly through pattern recognition and logical deduction. Unlike traditional IQ tests such as the Wechsler or Stanford-Binet, which include verbal and auditory components, the Mensa Norway test focuses on non-verbal reasoning, making it a suitable benchmark for AI models that process data differently from humans. According to iqtest-free.org, IQ scores follow a normal distribution with a mean of 100 and a standard deviation of 15. A score of 130 sits two standard deviations above the mean, which places Gemini 2.5 Pro in roughly the top 2% of a typical human population.
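The "top 2%" figure can be checked with a few lines of Python. This is a minimal sketch that assumes only the mean of 100 and standard deviation of 15 quoted above; the names `IQ_SCALE`, `share_scoring_above`, and `score_at_percentile` are illustrative and do not come from any test provider.

```python
from statistics import NormalDist

# Assumption: IQ scores are modeled as Normal(mean=100, sd=15), as quoted above.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def share_scoring_above(iq: float) -> float:
    """Fraction of the modeled human population scoring above `iq`."""
    return 1.0 - IQ_SCALE.cdf(iq)


def score_at_percentile(p: float) -> float:
    """IQ score below which a fraction `p` of the modeled population falls."""
    return IQ_SCALE.inv_cdf(p)


# 130 is two standard deviations above the mean.
print(f"Share scoring above 130: {share_scoring_above(130):.2%}")
print(f"Score at the 98th percentile: {score_at_percentile(0.98):.1f}")
```

Under this model the share above 130 comes out slightly over 2%, so "top 2%" is a rounded figure rather than an exact cutoff.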
What makes the test particularly relevant for AI evaluation is the availability of an offline version. As noted in a Reddit thread on r/singularity, questions that have never been published online are unlikely to appear in a model’s training data, so an offline test gives a more accurate measure of raw reasoning ability. This approach addresses concerns about AI models potentially “cheating” by recalling memorized patterns from their training datasets.
Gemini 2.5 Pro’s Performance in Context
The X post by Mark Kretschmann includes a bell curve graph from MaximumTruth.org, titled “IQ Test Results (Average of Last 7 Tests),” which visually represents the performance of various AI models on the Mensa Norway quiz.
Gemini 2.5 Pro’s score of 130 is a standout, but the graph also includes other notable models:
- OpenAI’s o1: 120 IQ
- xAI’s Grok 3: 112 IQ
- Claude 3 Opus: 110 IQ
- GPT-4o: 108 IQ
- Llama-3.2: 90 IQ
Interestingly, a comment in the X thread by user @cain151714 points out that Gemini 2.5 Pro scored 118 on an offline version of the test, suggesting some variability depending on testing conditions. However, the 130 score from the averaged results still marks a significant leap forward. OpenAI’s o1 scored 120 on the same test, as reported in a Medium article by AI Tools Korner on November 16, 2024, so Gemini 2.5 Pro now leads that earlier result by 10 points.
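To put these gaps on a common scale, the scores can be expressed in standard deviations from the human mean. The sketch below uses only the figures quoted in this article and the conventional mean of 100 and standard deviation of 15; the names `REPORTED_IQ`, `z_score`, `ranked`, and `gap_in_sd` are illustrative, not part of any published methodology.

```python
# IQ figures as quoted in this article (from the X post and its thread).
REPORTED_IQ = {
    "Gemini 2.5 Pro": 130,
    "OpenAI o1": 120,
    "Grok 3": 112,
    "Claude 3 Opus": 110,
    "GPT-4o": 108,
    "Llama-3.2": 90,
}
OFFLINE_GEMINI_IQ = 118  # offline result mentioned in the X thread

# Assumption: the conventional IQ scale (mean 100, standard deviation 15).
MEAN, SD = 100, 15


def z_score(iq: float) -> float:
    """Distance from the human mean, in standard deviations."""
    return (iq - MEAN) / SD


def ranked(scores: dict[str, int]) -> list[tuple[str, int]]:
    """Models sorted from highest to lowest reported score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def gap_in_sd(a: float, b: float) -> float:
    """Difference between two scores, in standard deviations."""
    return (a - b) / SD


for name, iq in ranked(REPORTED_IQ):
    print(f"{name:<15} {iq:>4}  z = {z_score(iq):+.2f}")

gap = gap_in_sd(REPORTED_IQ["Gemini 2.5 Pro"], OFFLINE_GEMINI_IQ)
print(f"Online vs. offline gap for Gemini 2.5 Pro: {gap:.1f} SD")
```

Seen this way, the 12-point difference between the averaged and offline results is close to a full standard deviation, which is larger than the gap between most neighboring models in the list.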
Why This Matters: AI and IQ Testing
The application of IQ tests to AI models is a topic of debate, as highlighted in the X thread. User @CaptainSude1 notes that IQ tests are traditionally designed to predict real-life human outcomes, raising questions about their validity for AI. However, the Mensa Norway test’s focus on abstract reasoning aligns surprisingly well with the capabilities of large language models (LLMs), which excel at pattern recognition and logical deduction. Plotting AI scores on the same bell curve used for human populations gives a familiar frame of reference, suggesting that IQ tests may offer a useful, if imperfect, benchmark for comparing AI models.
Gemini 2.5 Pro’s performance is particularly impressive given its recent debut as a top performer on the LMArena leaderboard, as reported by RD World Online on March 25, 2025. LMArena measures human preferences for AI outputs, and Gemini 2.5 Pro’s top spot—outscoring competitors by nearly 40 points—demonstrates its ability to produce high-quality, preferred responses. The model also showed strong results on other benchmarks, such as a 63.8% score on SWE-bench (agentic coding) and 86.7% on the AIME 2025 math benchmark, further underscoring its advanced reasoning and problem-solving skills.
The Technology Behind Gemini 2.5 Pro
Gemini 2.5 Pro, an experimental update to Google’s Gemini 2.0 series, was announced on March 25, 2025, and is described as Google’s most intelligent AI yet. According to the RD World Online article, the model leverages advancements in reinforcement learning and chain-of-thought prompting, techniques that allow it to analyze information more thoroughly, draw logical conclusions, and incorporate context and nuance. Its native multimodality—capable of processing text, audio, images, video, and code repositories—further enhances its versatility, building on the strengths of previous Gemini models.
These reasoning techniques likely contributed to Gemini 2.5 Pro’s high IQ score. The Mensa Norway test requires identifying complex patterns and making logical deductions, tasks that benefit from the model’s enhanced reasoning capabilities. As AI continues to evolve, such advancements suggest that models like Gemini 2.5 Pro are moving closer to human-like intelligence, at least in specific domains.
Implications for the Future of AI
Gemini 2.5 Pro’s achievement has far-reaching implications for the AI industry. First, it highlights the rapid pace of progress in AI reasoning abilities. As noted in the Medium article, OpenAI’s o1 model was already a significant leap forward, with its 120 IQ score aligning with projections that AI could reach this level within 4–10 years. Gemini 2.5 Pro’s 130 IQ, achieved just months later, suggests that this timeline may be accelerating.
Second, the success of Gemini 2.5 Pro on a human-designed IQ test raises questions about how we define and measure intelligence in AI. While IQ tests provide a useful point of comparison, they are not a perfect fit for AI systems, which lack the emotional and social intelligence that humans possess. Future benchmarks may need to evolve to better capture the unique strengths and limitations of AI.
Finally, Gemini 2.5 Pro’s performance could drive advancements in related industries, such as semiconductors, which power these increasingly complex models. As AI models continue to improve, their applications in fields like education, healthcare, and scientific research are likely to expand, potentially transforming how we solve some of the world’s most pressing challenges.
Conclusion: A Milestone in AI Development
Gemini 2.5 Pro’s 130 IQ score on the Mensa Norway test is a remarkable milestone in the development of artificial intelligence. By outperforming other leading models and demonstrating advanced reasoning capabilities, Gemini 2.5 Pro has set a new standard for what AI can achieve. While the use of IQ tests for AI remains a topic of discussion, this achievement underscores the rapid progress being made in the field. As Google and other companies continue to push the boundaries of AI, the future promises even more exciting breakthroughs, bringing us closer to a world where AI can rival, and perhaps surpass, human intelligence in specific domains.