Gemini 3.1 Pro vs Grok 4

Gemini 3.1 Pro (Google DeepMind) and Grok 4 (xAI) compared on benchmarks, pricing, context window, and use-case rankings.

Gemini 3.1 Pro (Google DeepMind) · Grok 4 (xAI)
Scores
Intelligence (ECI): 156 · 147
Coding: 71 · 15
Math: 76 · 22
Reasoning & Knowledge: 96 · 48
Agentic & Tools: 88 · 16
Specifications
Developer: Google DeepMind · xAI
Family: Gemini · Grok
Released: Feb 19, 2026 · Jul 9, 2025
Parameters: 3T
Availability: API access · API access
Context window
Price — $/M input
Price — $/M output
Inputs
Outputs
Benchmarks
AIME 2024/2025: 96% · 84%
APEX: 34% · 15%
ARC-AGI: 98% · 67%
ARC-AGI-2: 77% · 16%
FrontierMath: 37% · 20%
FrontierMath Tier 4: 17% · 2%
GPQA Diamond: 94% · 87%
GSO (code optimization): 23%
Humanity's Last Exam: 46%
METR task horizon: 6.4 h · 1.8 h
SimpleBench: 80% · 61%
SimpleQA Verified: 77% · 48%
SWE-bench Verified: 76%
Terminal-Bench: 80% · 27%
WebDev Arena: 1461
WeirdML: 72% · 46%
GDPval (win/tie rate): 24%

Use-case scores are 0–100 percentile composites across each area’s benchmarks, ranked against every model from the past year.
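The page does not spell out how the percentile composites are built, so the following is only a minimal sketch of one plausible reading: rank a model against every tracked model on each benchmark in an area, then average those percentile ranks. The function names, the tie handling, and the model names and scores in the sample data are all hypothetical and are not taken from Model Beat or Epoch AI.

```python
from __future__ import annotations

from statistics import mean


def percentile_rank(value: float, population: list[float]) -> float:
    """Percentile (0-100) of `value` within `population`.

    Assumption: ties count as half, i.e. (below + 0.5 * ties) / n.
    """
    below = sum(1 for score in population if score < value)
    ties = sum(1 for score in population if score == value)
    return 100.0 * (below + 0.5 * ties) / len(population)


def composite(model: str, area: dict[str, dict[str, float]]) -> float | None:
    """Average percentile rank of `model` over the benchmarks it has a score for.

    `area` maps benchmark name -> {model name -> score}.
    Returns None when the model has no score in this area.
    """
    ranks = [
        percentile_rank(scores[model], list(scores.values()))
        for scores in area.values()
        if model in scores
    ]
    return mean(ranks) if ranks else None


# Made-up scores for four hypothetical models on two hypothetical benchmarks.
coding = {
    "bench_x": {"model_a": 80, "model_b": 27, "model_c": 50, "model_d": 10},
    "bench_y": {"model_a": 76, "model_c": 40, "model_d": 60},
}

for name in ("model_a", "model_b"):
    print(name, composite(name, coding))
```

Under this reading, a benchmark with no score for a model is skipped rather than counted as zero, which is one way a model could still receive a composite when some benchmark cells are empty.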

Frequently asked questions

Is Gemini 3.1 Pro better than Grok 4?

On Epoch AI's Capabilities Index, Gemini 3.1 Pro scores higher (156) than Grok 4 (147). The right pick depends on your task — compare their coding, math, and reasoning scores in the table above.

Which is better for coding, Gemini 3.1 Pro or Grok 4?

Across coding benchmarks like SWE-bench Verified and Terminal-Bench, Gemini 3.1 Pro ranks higher: 71st vs 15th percentile among the models tracked on Model Beat.

Benchmarks & model data from Epoch AI (CC BY); pricing & specs from OpenRouter. ECI = Epoch Capabilities Index.