How do GPT-5.5 xHigh and Gemini 3.1 Pro compare on coding benchmarks?

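The comparison table shows SWE-bench, SWE-Pro, Terminal-Bench, and Coding Agent Index scores for both models where publicly disclosed. GPT-5.5 xHigh leads on SWE-Pro (58.6 vs 54.2), Terminal-Bench (82.7 vs 68.5), and the Coding Agent Index (76.4 vs 42.7). Only Gemini 3.1 Pro has a published SWE-bench score (80.6).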


GPT-5.5 xHigh vs Gemini 3.1 Pro: benchmark scores, pricing & comparison.

Q: Which is better: GPT-5.5 xHigh or Gemini 3.1 Pro?

GPT-5.5 xHigh ranks higher on AskClash (#3 with an overall score of 86.0, versus #14 and 59.8 for Gemini 3.1 Pro) and wins 7 of the 10 benchmarks both models report. Gemini 3.1 Pro wins on GPQA, MMMU-Pro, and Tau2, and it is cheaper at $2.00/$12.00 per 1M input/output tokens versus $5.00/$30.00. Both have a 1M-token context window.

This page compares the two models side by side across SWE-bench, GPQA, HLE, Terminal-Bench, coding agent scores, token pricing, context window, and AskClash RWT.


Rank: #3 vs #14 (AskClash overall scores 86.0 vs 59.8).

Pricing: $5.00/$30.00 vs $2.00/$12.00 (input/output token prices per 1M tokens, where published).

Proprietary vs Proprietary (OpenAI vs Google).

GPT-5.5 xHigh vs Gemini 3.1 Pro benchmark comparison

Scores are cached from a snapshot of the AskClash LLM leaderboard. On the live page, green cells highlight the winning model for each metric; a dash means the score has not been publicly disclosed.

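Metric	GPT-5.5 xHigh	Gemini 3.1 Pro	Winner
Overall Score	86.0	59.8	GPT-5.5 xHigh
Leaderboard Rank	#3	#14	GPT-5.5 xHigh
RWT	9.0	—	n/a
Coding Agent Index	76.4	42.7	GPT-5.5 xHigh
HLE	52.2	51.4	GPT-5.5 xHigh
GPQA	93.6	94.3	Gemini 3.1 Pro
SWE-bench	—	80.6	n/a
SWE-Pro	58.6	54.2	GPT-5.5 xHigh
Terminal-Bench	82.7	68.5	GPT-5.5 xHigh
OSWorld	78.7	—	n/a
MCP Atlas	75.3	69.2	GPT-5.5 xHigh
Finance Agent	51.8	43.0	GPT-5.5 xHigh
CharXiv	—	80.2	n/a
MMMU-Pro	81.2	83.9	Gemini 3.1 Pro
ARC-AGI 2	85.0	77.1	GPT-5.5 xHigh
Tau2	98.0	99.3	Gemini 3.1 Pro
MRCR	—	84.9	n/a
Input Price (per 1M tokens)	$5.00	$2.00	Gemini 3.1 Pro
Output Price (per 1M tokens)	$30.00	$12.00	Gemini 3.1 Pro
Context Window	1M tokens	1M tokens	Tie
Benchmark Cells	12	15	n/a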

More GPT-5.5 xHigh and Gemini 3.1 Pro comparisons

Explore how GPT-5.5 xHigh and Gemini 3.1 Pro stack up against other top-ranked LLMs.

Claude Mythos/Fable 5 vs GPT-5.5 xHigh
Claude Mythos/Fable 5 vs Gemini 3.1 Pro
Claude Opus 4.8 (Adaptive) vs GPT-5.5 xHigh
Claude Opus 4.8 (Adaptive) vs Gemini 3.1 Pro
GPT-5.5 xHigh vs GLM-5.2
GLM-5.2 vs Gemini 3.1 Pro
GPT-5.5 xHigh vs Claude Opus 4.7 (Adaptive)
Claude Opus 4.7 (Adaptive) vs Gemini 3.1 Pro
GPT-5.5 xHigh vs Qwen3.7 Max
Qwen3.7 Max vs Gemini 3.1 Pro

How to read this comparison

Benchmark scores

Higher is better for all benchmark scores (SWE-bench, GPQA, HLE, Terminal-Bench, etc.). Green marks the model with the higher score.
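A minimal sketch of the "higher is better" rule, assuming the benchmark scores are copied by hand from the comparison table into a Python dict. The names `SCORES`, `winner`, and `tally` are hypothetical and not part of any AskClash API. Undisclosed scores are stored as `None` and produce no winner, and the RWT, overall score, and rank rows are left out.

```python
from __future__ import annotations

from typing import Optional

# Benchmark scores copied from the comparison table: (GPT-5.5 xHigh, Gemini 3.1 Pro).
# None marks a score that was not disclosed (a dash in the table).
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "Coding Agent Index": (76.4, 42.7),
    "HLE": (52.2, 51.4),
    "GPQA": (93.6, 94.3),
    "SWE-bench": (None, 80.6),
    "SWE-Pro": (58.6, 54.2),
    "Terminal-Bench": (82.7, 68.5),
    "OSWorld": (78.7, None),
    "MCP Atlas": (75.3, 69.2),
    "Finance Agent": (51.8, 43.0),
    "CharXiv": (None, 80.2),
    "MMMU-Pro": (81.2, 83.9),
    "ARC-AGI 2": (85.0, 77.1),
    "Tau2": (98.0, 99.3),
    "MRCR": (None, 84.9),
}

MODELS = ("GPT-5.5 xHigh", "Gemini 3.1 Pro")


def winner(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return the model with the higher score, "Tie", or None if either score is missing."""
    if a is None or b is None:
        return None  # no head-to-head comparison is possible
    if a == b:
        return "Tie"
    return MODELS[0] if a > b else MODELS[1]


def tally(scores: dict[str, tuple[Optional[float], Optional[float]]]) -> dict[str, int]:
    """Count wins per model over the benchmarks both models report."""
    counts = {MODELS[0]: 0, MODELS[1]: 0, "Tie": 0}
    for a, b in scores.values():
        w = winner(a, b)
        if w is not None:
            counts[w] += 1
    return counts


if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        print(f"{name}: {winner(a, b) or 'no comparison'}")
    print(tally(SCORES))
```

With the scores above, GPT-5.5 xHigh wins 7 of the 10 shared benchmarks and Gemini 3.1 Pro wins 3 (GPQA, MMMU-Pro, and Tau2).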

Token pricing

Lower is better for input and output prices. Green marks the cheaper model per 1M tokens.
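Workload cost follows directly from the per-1M-token prices. The sketch below uses the prices listed in the table and assumes a hypothetical workload of 10M input tokens and 2M output tokens; `PRICES` and `cost_usd` are illustrative names.

```python
# Published prices in USD per 1M tokens: (input, output).
PRICES = {
    "GPT-5.5 xHigh": (5.00, 30.00),
    "Gemini 3.1 Pro": (2.00, 12.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a workload from per-1M-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens / 1_000_000) * input_price + (output_tokens / 1_000_000) * output_price


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens, 2M output tokens.
    for model in PRICES:
        print(f"{model}: ${cost_usd(model, 10_000_000, 2_000_000):.2f}")
```

For this workload the totals are $110.00 and $44.00. Because both GPT-5.5 xHigh prices are exactly 2.5 times Gemini 3.1 Pro's ($5.00/$2.00 and $30.00/$12.00), the cost ratio is 2.5x for any mix of input and output tokens.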

Coverage matters

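Models that disclose fewer benchmarks may receive inflated percentile scores, because they are ranked on a smaller and possibly more favorable subset of benchmarks. Check the Benchmark Cells row (12 for GPT-5.5 xHigh, 15 for Gemini 3.1 Pro) before relying on the overall score.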
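One way to account for coverage is to split the benchmarks into those both models report and those only one model reports, then compare head to head only on the shared set. A minimal sketch, assuming the benchmark names listed in the table are copied by hand (metrics shown as a dash are left out); `reported` and `coverage` are hypothetical names.

```python
from __future__ import annotations

# Metrics with a disclosed score for each model, copied from the table.
reported: dict[str, set[str]] = {
    "GPT-5.5 xHigh": {
        "RWT", "Coding Agent Index", "HLE", "GPQA", "SWE-Pro", "Terminal-Bench",
        "OSWorld", "MCP Atlas", "Finance Agent", "MMMU-Pro", "ARC-AGI 2", "Tau2",
    },
    "Gemini 3.1 Pro": {
        "Coding Agent Index", "HLE", "GPQA", "SWE-bench", "SWE-Pro", "Terminal-Bench",
        "MCP Atlas", "Finance Agent", "CharXiv", "MMMU-Pro", "ARC-AGI 2", "Tau2", "MRCR",
    },
}


def coverage(a: set[str], b: set[str]) -> tuple[set[str], set[str], set[str]]:
    """Return (metrics in both sets, metrics only in a, metrics only in b)."""
    return a & b, a - b, b - a


if __name__ == "__main__":
    shared, only_gpt, only_gemini = coverage(
        reported["GPT-5.5 xHigh"], reported["Gemini 3.1 Pro"]
    )
    print("Head-to-head:", sorted(shared))
    print("Only GPT-5.5 xHigh:", sorted(only_gpt))
    print("Only Gemini 3.1 Pro:", sorted(only_gemini))
```

With the table's data, 10 metrics are shared. RWT and OSWorld appear only for GPT-5.5 xHigh, and SWE-bench, CharXiv, and MRCR only for Gemini 3.1 Pro, so none of those five supports a head-to-head comparison.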

This comparison page is generated from the AskClash LLM leaderboard cache. Open the live leaderboard for real-time scores and interactive filtering.