GLM-5.1 vs Grok 4

GLM-5.1 (Z.ai (Zhipu AI)) and Grok 4 (xAI) compared on benchmarks, pricing, context window, and use-case rankings.

GLM-5.1 (Z.ai (Zhipu AI)) vs Grok 4 (xAI)
Paired values are GLM-5.1 vs Grok 4; rows with a single value list data for only one of the two models.

Scores
Intelligence (ECI): 150 vs 147
Coding: 69 vs 15
Math: 63 vs 22
Reasoning & Knowledge: 48 vs 48
Agentic & Tools: 16

Specifications
Developer: Z.ai (Zhipu AI) vs xAI
Family: Grok
Released: Apr 7, 2026 vs Jul 9, 2025
Parameters: 754B vs 3T
Availability: API access
Context window: 203K
Price ($/M input): $0.98
Price ($/M output): $3.08
Inputs: text
Outputs: text

Benchmarks
AIME 2024/2025: 92% vs 84%
FrontierMath: 33% vs 20%
FrontierMath Tier 4: 13% vs 2%
GPQA Diamond: 85% vs 87%
SimpleBench: 59% vs 61%
SimpleQA Verified: 37% vs 48%
SWE-bench Verified: 74%
WebDev Arena: 1534
WeirdML: 57% vs 46%
APEX: 15%
ARC-AGI: 67%
ARC-AGI-2: 16%
GDPval (win/tie rate): 24%
METR task horizon: 1.8 h
Terminal-Bench: 27%
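
The per-million-token prices in the Specifications section scale to a per-request cost by simple arithmetic. Below is a minimal Python sketch, assuming the listed $0.98 input and $3.08 output rates belong to the same model and that billing is linear in token count. The function name and the example token counts are illustrative and do not come from the source.

```python
def request_cost_usd(input_tokens, output_tokens,
                     input_price_per_m=0.98, output_price_per_m=3.08):
    """Estimate the cost of one request in US dollars.

    Prices are quoted per one million tokens; the defaults are the
    figures listed on this page (assumed to apply to the same model).
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


# Illustrative request: 10,000 prompt tokens and 2,000 completion tokens.
cost = request_cost_usd(10_000, 2_000)
print(f"Estimated cost: ${cost:.4f}")
```

Real provider pricing can include extras such as caching discounts or tiered rates, which this sketch ignores.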

Use-case scores are 0–100 percentile composites across each area’s benchmarks, ranked against every model from the past year. Highlighted cells lead each row.
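
The page does not spell out how the composite is calculated. One plausible reading, sketched below in Python, ranks a model's score on each benchmark against the pool of tracked models and then averages those percentile ranks across the area's benchmarks. The averaging rule, the tie handling, the function names, and the sample numbers are all assumptions for illustration. The pool is assumed to be already filtered to models from the past year.

```python
from statistics import mean


def percentile_rank(score, pool):
    """Percentile (0-100) of `score` within `pool`, counting ties as half."""
    below = sum(s < score for s in pool)
    ties = sum(s == score for s in pool)
    return 100.0 * (below + 0.5 * ties) / len(pool)


def composite(model_scores, pools):
    """Average percentile rank over the benchmarks the model has results for.

    model_scores: {benchmark name: this model's score}
    pools:        {benchmark name: scores of all tracked models, this one included}
    Returns None when the model has no results in this area.
    """
    ranks = [percentile_rank(score, pools[name])
             for name, score in model_scores.items() if name in pools]
    return mean(ranks) if ranks else None


# Hypothetical data: two benchmarks in one area.
pools = {"bench_a": [10, 20, 30, 40, 50], "bench_b": [0.2, 0.4, 0.6, 0.8]}
model = {"bench_a": 40, "bench_b": 0.4}
print(composite(model, pools))
```

Under this reading, a model with results on only a few of an area's benchmarks is scored on those alone, which is one reason a composite can differ sharply from any single benchmark figure.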

Frequently asked questions

Is GLM-5.1 better than Grok 4?

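On Epoch AI's Capabilities Index, GLM-5.1 scores higher (150) than Grok 4 (147). The right pick depends on your task: GLM-5.1 leads the coding (69 vs 15) and math (63 vs 22) percentiles, the two tie on reasoning and knowledge (48 each), and Grok 4 scores higher on GPQA Diamond, SimpleBench, and SimpleQA Verified.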

Which is better for coding, GLM-5.1 or Grok 4?

Across coding benchmarks like SWE-bench Verified and Terminal-Bench, GLM-5.1 ranks higher: 69th percentile versus 15th for Grok 4 among the models tracked on Model Beat.

Benchmarks & model data from Epoch AI (CC BY); pricing & specs from OpenRouter. ECI = Epoch Capabilities Index.