How do GPT-5.4 xHigh and Qwen3.5 397B compare on coding benchmarks?

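The comparison table lists SWE-bench, SWE-Pro, Terminal-Bench, LiveCodeBench, and Coding Agent Index scores for GPT-5.4 xHigh and Qwen3.5 397B where they have been publicly disclosed. The two models report different coding benchmarks. GPT-5.4 xHigh reports SWE-Pro (57.7), Terminal-Bench (75.1), and a Coding Agent Index of 71.1. Qwen3.5 397B reports SWE-bench (76.2) and LiveCodeBench (83.6). No coding benchmark has a score for both models, so the table does not support a direct head-to-head comparison on coding.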


GPT-5.4 xHigh vs Qwen3.5 397B: benchmark scores, pricing & comparison.

Q: Which is better: GPT-5.4 xHigh or Qwen3.5 397B?

AskClash compares GPT-5.4 xHigh and Qwen3.5 397B side by side across SWE-bench, GPQA, HLE, Terminal-Bench, coding agent scores, token pricing, and context window so you can see which model wins on each benchmark.



Rank: #7 vs #19. AskClash overall scores: 73.5 vs 49.5.

Pricing: $2.50/$15.00 vs $0.60/$3.60. Input and output token prices per 1M tokens, where published.

Licensing: proprietary (OpenAI) vs open weight (Alibaba).
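To turn the per-1M-token prices into a cost for a concrete workload, multiply each token count by its price and divide by one million. The sketch below does that arithmetic. The workload of 1M input tokens and 200K output tokens is a hypothetical example, and `request_cost` is an illustrative helper that assumes plain list prices with no caching or batch discounts.

```python
# List prices in USD per 1M tokens (input, output), as shown on this page.
PRICES = {
    "GPT-5.4 xHigh": (2.50, 15.00),
    "Qwen3.5 397B": (0.60, 3.60),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost at list prices (hypothetical helper; no discounts assumed)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


if __name__ == "__main__":
    for name in PRICES:
        # Hypothetical workload: 1M input tokens and 200K output tokens.
        print(f"{name}: ${request_cost(name, 1_000_000, 200_000):.2f}")
```

For this workload the script prints $5.50 for GPT-5.4 xHigh and $1.32 for Qwen3.5 397B. Both the input and the output price ratios equal 25/6 (about 4.17x), so GPT-5.4 xHigh costs about 4.17 times as much at list prices whatever the mix of input and output tokens.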

GPT-5.4 xHigh vs Qwen3.5 397B benchmark comparison

Green cells highlight the winning model for each metric. Scores are cached from the AskClash LLM leaderboard snapshot.

Metric	GPT-5.4 xHigh	Qwen3.5 397B
Overall Score	73.5	49.5
Leaderboard Rank	#7	#19
RWT	8.0	—
Coding Agent Index	71.1	—
HLE	52.1	28.7
GPQA	92.8	88.4
IFEval	—	92.6
SWE-bench	—	76.2
SWE-Pro	57.7	—
Terminal-Bench	75.1	—
LiveCodeBench	—	83.6
OSWorld	75.0	—
MCP Atlas	70.6	46.1
CharXiv	82.8	80.8
MMMU-Pro	81.2	79.0
ARC-AGI 2	73.3	—
Tau2	98.9	83.9
MRCR	97.3	—
Input Price (per 1M tokens)	$2.50	$0.60
Output Price (per 1M tokens)	$15.00	$3.60
Context Window (tokens)	1.05M	128K
Benchmark Cells	14	9

More GPT-5.4 xHigh and Qwen3.5 397B comparisons

Explore how GPT-5.4 xHigh and Qwen3.5 397B stack up against other top-ranked LLMs.

Claude Mythos/Fable 5 vs GPT-5.4 xHigh
Claude Mythos/Fable 5 vs Qwen3.5 397B
Claude Opus 4.8 (Adaptive) vs GPT-5.4 xHigh
Claude Opus 4.8 (Adaptive) vs Qwen3.5 397B
GPT-5.5 xHigh vs GPT-5.4 xHigh
GPT-5.5 xHigh vs Qwen3.5 397B
GLM-5.2 vs GPT-5.4 xHigh
GLM-5.2 vs Qwen3.5 397B
Claude Opus 4.7 (Adaptive) vs GPT-5.4 xHigh
Claude Opus 4.7 (Adaptive) vs Qwen3.5 397B

How to read this comparison

Benchmark scores

Higher is better for all benchmark scores (SWE-bench, GPQA, HLE, Terminal-Bench, etc.). Green marks the model with the higher score.

Token pricing

Lower is better for input and output prices. Green marks the cheaper model per 1M tokens.
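The two coloring rules for benchmark scores and token prices can be written as one small comparison function. `winner` is a hypothetical helper, not AskClash code. The page does not say how it handles ties or undisclosed cells, so returning "tie" and `None` in those cases is an assumption.

```python
from typing import Optional


def winner(gpt: Optional[float], qwen: Optional[float], higher_is_better: bool = True) -> Optional[str]:
    """Return the model that would be marked green for one table row.

    Hypothetical helper: None means at least one value is undisclosed,
    so there is nothing to compare (assumed behavior).
    """
    if gpt is None or qwen is None:
        return None
    if gpt == qwen:
        return "tie"  # assumed tie handling; not stated on the page
    # Benchmark scores: the larger value wins. Prices: the smaller value wins.
    gpt_better = gpt > qwen if higher_is_better else gpt < qwen
    return "GPT-5.4 xHigh" if gpt_better else "Qwen3.5 397B"


if __name__ == "__main__":
    print(winner(92.8, 88.4))                           # GPQA row
    print(winner(15.00, 3.60, higher_is_better=False))  # output price row
    print(winner(None, 76.2))                           # SWE-bench row
```

The three calls print `GPT-5.4 xHigh`, `Qwen3.5 397B`, and `None`. The same function covers both rules because the only difference between them is the direction of the comparison.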

Coverage matters

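Models with fewer disclosed benchmark cells may have inflated percentile scores, because they are ranked on a smaller and possibly more favorable set of benchmarks. Check the Benchmark Cells row for context: GPT-5.4 xHigh has 14 cells and Qwen3.5 397B has 9.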
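One way to check coverage is to list which metrics both models disclose, because only those rows allow a true head-to-head. The sketch copies the disclosed values from the table and leaves out undisclosed cells. It counts only the metric rows shown here, which gives 13 for GPT-5.4 xHigh. The page does not say how its Benchmark Cells figure is computed, so that count may differ from the 14 in the table. Grouping these five metrics as "coding" is an assumption based on the question at the top of the page.

```python
# Disclosed metric values copied from the comparison table; undisclosed cells are omitted.
GPT_54_XHIGH = {
    "RWT": 8.0, "Coding Agent Index": 71.1, "HLE": 52.1, "GPQA": 92.8,
    "SWE-Pro": 57.7, "Terminal-Bench": 75.1, "OSWorld": 75.0, "MCP Atlas": 70.6,
    "CharXiv": 82.8, "MMMU-Pro": 81.2, "ARC-AGI 2": 73.3, "Tau2": 98.9, "MRCR": 97.3,
}
QWEN35_397B = {
    "HLE": 28.7, "GPQA": 88.4, "IFEval": 92.6, "SWE-bench": 76.2, "LiveCodeBench": 83.6,
    "MCP Atlas": 46.1, "CharXiv": 80.8, "MMMU-Pro": 79.0, "Tau2": 83.9,
}
# Assumed grouping of coding metrics (the ones named in the question at the top).
CODING = {"SWE-bench", "SWE-Pro", "Terminal-Bench", "LiveCodeBench", "Coding Agent Index"}


def shared_metrics(a: dict, b: dict) -> list:
    """Metrics disclosed by both models, sorted by name."""
    return sorted(a.keys() & b.keys())


if __name__ == "__main__":
    shared = shared_metrics(GPT_54_XHIGH, QWEN35_397B)
    print("disclosed:", len(GPT_54_XHIGH), "vs", len(QWEN35_397B))
    print("shared:", shared)
    print("shared coding metrics:", [m for m in shared if m in CODING])
    gpt_higher = sum(GPT_54_XHIGH[m] > QWEN35_397B[m] for m in shared)
    print(f"GPT-5.4 xHigh higher on {gpt_higher} of {len(shared)} shared metrics")
```

With the values above, the two models share six metrics (CharXiv, GPQA, HLE, MCP Atlas, MMMU-Pro, and Tau2), and none of them is a coding metric. GPT-5.4 xHigh scores higher on all six.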

This comparison page is generated from the AskClash LLM leaderboard cache. Open the live leaderboard for real-time scores and interactive filtering.