GPT-5.5 vs Step 3.7 Flash

GPT-5.5 (OpenAI) and Step 3.7 Flash (StepFun) compared on benchmarks, pricing, context window, and use-case rankings.

Step 3.7 Flash is cheaper on input tokens: $0.20 vs $5.00 per 1M.
GPT-5.5 has a larger context window (1.1M vs 256K tokens).
GPT-5.5 ranks higher for coding (91st vs 32nd percentile).

Scores

| Area | GPT-5.5 (OpenAI) | Step 3.7 Flash (StepFun) |
| --- | --- | --- |
| Intelligence (ECI) | 159 | n/a |
| Coding | 91 | 32 |
| Math | 97 | n/a |
| Reasoning & Knowledge | 92 | 35 |
| Agentic & Tools | 87 | 97 |

Specifications

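| Specification | GPT-5.5 | Step 3.7 Flash |
| --- | --- | --- |
| Developer | OpenAI | StepFun |
| Family | GPT | n/a |
| Released | Apr 23, 2026 | May 28, 2026 |
| Parameters | n/a | n/a |
| Availability | API access | n/a |
| Context window (tokens) | 1.1M | 256K |
| Price, input ($ per 1M tokens) | $5.00 | $0.20 |
| Price, output ($ per 1M tokens) | $30.00 | $1.15 |
| Inputs | file, image, text | text, image, video |
| Outputs | text | text |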

Benchmarks

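| Benchmark | GPT-5.5 | Step 3.7 Flash |
| --- | --- | --- |
| AIME 2024/2025 | 100% | n/a |
| APEX | 38% | n/a |
| ARC-AGI | 95% | n/a |
| ARC-AGI-2 | 85% | n/a |
| FrontierMath | 52% | n/a |
| FrontierMath Tier 4 | 35% | n/a |
| GPQA Diamond | 94% | 81% |
| GSO (code optimization) | 40% | n/a |
| Humanity's Last Exam | 44% | 20% |
| SciCode | 56% | 40% |
| SimpleBench | 69% | n/a |
| SimpleQA Verified | 63% | n/a |
| SWE-bench Verified | 81% | n/a |
| Terminal-Bench | 85% | n/a |
| WebDev Arena (rating) | 1505 | n/a |
| WeirdML | 85% | n/a |
| τ²-bench | 94% | 99% |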

Use-case scores are 0–100 percentile composites across each area’s benchmarks, ranked against every model from the past year.
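The page does not publish the exact formula behind these composites. As a rough illustration only, the sketch below assumes a composite is the mean of a model's per-benchmark percentile ranks, skipping benchmarks where the model has no score. The function names, the averaging rule, and the toy scores are all hypothetical and are not taken from Model Beat or Epoch AI.

```python
from statistics import mean

def percentile_rank(score, all_scores):
    """Percent of tracked scores that are at or below `score` (0-100)."""
    return 100 * sum(s <= score for s in all_scores) / len(all_scores)

def composite(model, area_scores):
    """Mean percentile rank over the benchmarks where `model` has a score.

    ASSUMPTION: simple unweighted mean; the real site may weight or
    normalise benchmarks differently.
    """
    ranks = [
        percentile_rank(scores[model], list(scores.values()))
        for scores in area_scores.values()
        if model in scores
    ]
    return mean(ranks) if ranks else None

# Toy, made-up scores for four placeholder models on two placeholder benchmarks.
toy_area = {
    "bench_a": {"m1": 80, "m2": 60, "m3": 40, "m4": 20},
    "bench_b": {"m1": 50, "m2": 90, "m3": 70},
}

for name in ("m1", "m4", "m5"):
    print(name, composite(name, toy_area))
```

Under this assumption a model with no scores in an area gets no composite at all, which would match the empty cells in the Scores table.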

Frequently asked questions

Which is cheaper, GPT-5.5 or Step 3.7 Flash?

Step 3.7 Flash is cheaper on both input and output tokens: $0.20 vs $5.00 per million input tokens, and $1.15 vs $30.00 per million output tokens (representative OpenRouter pricing).
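To see what the per-million prices mean for a single request, here is a minimal sketch that applies the input and output prices listed on this page to a token count. The `request_cost` helper and the example workload (10,000 input tokens, 2,000 output tokens) are hypothetical; actual billing may differ, for example with caching or tiered pricing.

```python
# USD per 1M tokens, as listed in the Specifications section of this page.
PRICES = {
    "GPT-5.5": {"input": 5.00, "output": 30.00},
    "Step 3.7 Flash": {"input": 0.20, "output": 1.15},
}

def request_cost(model, input_tokens, output_tokens):
    """Estimated USD cost of one request at flat per-token pricing."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000

# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
```

For this assumed workload the estimate is about $0.11 for GPT-5.5 and about $0.0043 for Step 3.7 Flash, roughly a 25x difference.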

Which has a larger context window, GPT-5.5 or Step 3.7 Flash?

GPT-5.5 supports up to 1.1M tokens, compared with 256K for Step 3.7 Flash.

Which is better for coding, GPT-5.5 or Step 3.7 Flash?

Across coding benchmarks like SWE-bench Verified and Terminal-Bench, GPT-5.5 ranks higher: 91st vs 32nd percentile among the models tracked on Model Beat.

Benchmarks & model data from Epoch AI (CC BY); pricing & specs from OpenRouter. ECI = Epoch Capabilities Index.