Gemini 3.1 Pro vs Step 3.7 Flash

Gemini 3.1 Pro (Google DeepMind) and Step 3.7 Flash (StepFun) compared on benchmarks, pricing, context window, and use-case rankings.

Gemini 3.1 Pro ranks higher for coding (68th vs 32nd percentile).

Scores

Area                     Gemini 3.1 Pro    Step 3.7 Flash
Intelligence (ECI)       155               n/a
Coding                   68                32
Math                     77                n/a
Reasoning & Knowledge    96                35
Agentic & Tools          88                97

Specifications

Spec                     Gemini 3.1 Pro     Step 3.7 Flash
Developer                Google DeepMind    StepFun
Family                   Gemini             n/a
Released                 Feb 19, 2026       May 28, 2026
Parameters               n/a                n/a
Availability             API access         n/a
Context window           n/a                256K
Price ($/M input)        n/a                $0.20
Price ($/M output)       n/a                $1.15
Inputs                   n/a                text, image, video
Outputs                  n/a                text
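
As a rough illustration of what the listed Step 3.7 Flash prices ($0.20 per million input tokens, $1.15 per million output tokens) mean per request, the arithmetic can be sketched as below. The function name and the example token counts are hypothetical, and actual billing may differ from this simple per-token model.

```python
# Listed Step 3.7 Flash prices, in US dollars per one million tokens.
STEP_INPUT_PRICE_PER_M = 0.20
STEP_OUTPUT_PRICE_PER_M = 1.15


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request (hypothetical helper)."""
    total = (input_tokens * STEP_INPUT_PRICE_PER_M
             + output_tokens * STEP_OUTPUT_PRICE_PER_M)
    return total / 1_000_000


# Example with made-up token counts: 100K tokens in, 5K tokens out.
cost = request_cost(100_000, 5_000)
print(f"${cost:.5f}")  # about $0.02575
```

No equivalent estimate is possible for Gemini 3.1 Pro from this page, since its prices are not listed.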

Benchmarks

Benchmark                  Gemini 3.1 Pro    Step 3.7 Flash
AIME 2024/2025             96%               n/a
APEX                       34%               n/a
ARC-AGI                    98%               n/a
ARC-AGI-2                  77%               n/a
FrontierMath               37%               n/a
FrontierMath Tier 4        17%               n/a
GPQA Diamond               94%               81%
GSO (code optimization)    23%               n/a
Humanity's Last Exam       46%               20%
METR task horizon          6.4 h             n/a
SimpleBench                80%               n/a
SimpleQA Verified          77%               n/a
SWE-bench Verified         76%               n/a
Terminal-Bench             80%               n/a
WebDev Arena               1461              n/a
WeirdML                    72%               n/a
SciCode                    n/a               40%
τ²-bench                   n/a               99%

Use-case scores are 0–100 percentile composites across each area’s benchmarks, ranked against every model from the past year.
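
The page does not give the exact formula behind these composites. One plausible reading, sketched here purely as an assumption, is to compute a percentile rank per benchmark and average the ranks over the benchmarks a model has scores for. The function names, benchmark keys, and numbers below are all made up for illustration.

```python
from statistics import mean


def percentile_rank(score, all_scores):
    """Percent of tracked scores strictly below `score`.

    This is one of several common percentile definitions; the site's
    actual choice is not stated.
    """
    below = sum(1 for s in all_scores if s < score)
    return 100.0 * below / len(all_scores)


def composite(model_scores, field_scores):
    """Average percentile over benchmarks where the model has a score."""
    ranks = [
        percentile_rank(score, field_scores[bench])
        for bench, score in model_scores.items()
        if bench in field_scores
    ]
    return mean(ranks) if ranks else None


# Hypothetical scores for every tracked model on two benchmarks.
field = {
    "bench_a": [10, 20, 30, 40, 50],
    "bench_b": [5, 15, 25, 35, 45],
}
# Hypothetical scores for the model being ranked.
model = {"bench_a": 40, "bench_b": 25}

print(composite(model, field))  # 50.0 (mean of 60.0 and 40.0)
```

Under this reading, a model with scores on only a few benchmarks in an area is ranked on those alone, which may be why two models can have a composite score in an area without sharing any listed benchmark.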

Frequently asked questions

Which is better for coding, Gemini 3.1 Pro or Step 3.7 Flash?

Across coding benchmarks like SWE-bench Verified and Terminal-Bench, Gemini 3.1 Pro ranks higher: 68th vs 32nd percentile among the models tracked on Model Beat.

Benchmarks & model data from Epoch AI (CC BY); pricing & specs from OpenRouter. ECI = Epoch Capabilities Index.