How do Qwen3.7 Max and Composer 2.5 compare on coding benchmarks?

The comparison table shows SWE-bench, SWE-Pro, SWE-Atlas, Terminal-Bench, LiveCodeBench, and coding agent index scores for both Qwen3.7 Max and Composer 2.5 when publicly disclosed.

Qwen3.7 Max vs Composer 2.5: benchmark scores, pricing & comparison.

Q: Which is better: Qwen3.7 Max or Composer 2.5?

AskClash compares Qwen3.7 Max and Composer 2.5 side by side across SWE-bench, GPQA, HLE, Terminal-Bench, coding agent scores, AskClash RWT, token pricing, and context window, so you can see which model leads on each metric. Green marks the winner on each benchmark.

Rank: #6 vs #10. AskClash overall scores: 78.0 vs 67.0.

Pricing: $2.50/$7.50 vs $0.50/$2.50. Input and output token prices per 1M tokens when published.

Proprietary vs Proprietary. Alibaba vs Cursor.

Qwen3.7 Max vs Composer 2.5 benchmark comparison

Green cells highlight the winning model for each metric. Scores are cached from the AskClash LLM leaderboard snapshot.

Metric	Qwen3.7 Max	Composer 2.5
Overall Score	78.0	67.0
Leaderboard Rank	#6	#10
RWT	8.0	8.5
Coding Agent Index	—	51.8
HLE	41.4	—
GPQA	92.4	—
IFEval	94.3	—
SWE-bench	80.4	—
SWE-Pro	—	47.0
SWE-Atlas	—	72.0
Terminal-Bench	69.7	69.3
LiveCodeBench	91.6	—
MCP Atlas	76.4	—
Finance Agent	48.4	—
Tau2	94.7	—
MRCR	90.4	—
Input Price (per 1M tokens)	$2.50	$0.50
Output Price (per 1M tokens)	$7.50	$2.50
Context Window	1M	200K
Benchmark Cells	12	4
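
To make the price gap in the table concrete, the following is a minimal sketch that estimates the cost of one workload from the listed per-1M-token prices. The workload size (2M input tokens, 500K output tokens) and the helper name `workload_cost` are illustrative assumptions, not AskClash figures, and the estimate ignores any caching discounts or tiered pricing a provider may apply.

```python
# Per-1M-token prices in USD, copied from the table above.
PRICES = {
    "Qwen3.7 Max": {"input": 2.50, "output": 7.50},
    "Composer 2.5": {"input": 0.50, "output": 2.50},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Assumed workload: 2M input tokens and 500K output tokens.
for model in PRICES:
    cost = workload_cost(model, 2_000_000, 500_000)
    print(f"{model}: ${cost:.2f}")
```

Under these assumed volumes the estimate comes to $8.75 for Qwen3.7 Max and $2.25 for Composer 2.5.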

More Qwen3.7 Max and Composer 2.5 comparisons

Explore how Qwen3.7 Max and Composer 2.5 stack up against other top-ranked LLMs.

Claude Mythos/Fable 5 vs Qwen3.7 Max
Claude Mythos/Fable 5 vs Composer 2.5
Claude Opus 4.8 (Adaptive) vs Qwen3.7 Max
Claude Opus 4.8 (Adaptive) vs Composer 2.5
GPT-5.5 xHigh vs Qwen3.7 Max
GPT-5.5 xHigh vs Composer 2.5
GLM-5.2 vs Qwen3.7 Max
GLM-5.2 vs Composer 2.5
Claude Opus 4.7 (Adaptive) vs Qwen3.7 Max
Claude Opus 4.7 (Adaptive) vs Composer 2.5

How to read this comparison

Benchmark scores

Higher is better for all benchmark scores (SWE-bench, GPQA, HLE, Terminal-Bench, etc.). Green marks the model with the higher score.

Token pricing

Lower is better for input and output prices. Green marks the cheaper model per 1M tokens.
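
The two rules above (higher is better for benchmarks, lower is better for prices) can be sketched as a single comparison function. This is an illustration, not AskClash's implementation: the function name `winner` is hypothetical, and the page does not say how a metric disclosed by only one model is treated, so the sketch assumes no winner is marked in that case.

```python
from typing import Optional


def winner(qwen: Optional[float], composer: Optional[float],
           higher_is_better: bool = True) -> Optional[str]:
    """Return the leading model, "tie", or None if either value is undisclosed."""
    if qwen is None or composer is None:
        return None  # assumption: no winner without both values
    if qwen == composer:
        return "tie"
    qwen_wins = qwen > composer if higher_is_better else qwen < composer
    return "Qwen3.7 Max" if qwen_wins else "Composer 2.5"


# (metric, Qwen3.7 Max, Composer 2.5, higher_is_better), values from the table.
rows = [
    ("Terminal-Bench", 69.7, 69.3, True),
    ("SWE-bench", 80.4, None, True),
    ("Input Price (per 1M tokens)", 2.50, 0.50, False),
    ("Output Price (per 1M tokens)", 7.50, 2.50, False),
]

for metric, qwen, composer, higher in rows:
    print(f"{metric}: {winner(qwen, composer, higher)}")
```

With the table's values, Qwen3.7 Max leads on Terminal-Bench, Composer 2.5 is cheaper on both prices, and SWE-bench has no head-to-head result because only one score is disclosed.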

Coverage matters

Models with fewer disclosed benchmark cells may have inflated percentile scores. Check the benchmark cell count for context.
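
As a rough illustration of the coverage check, the sketch below counts how many benchmark rows each model discloses and which rows both disclose. It uses only the benchmark rows shown in the table on this page (Coding Agent Index through MRCR); the page's own Benchmark Cells figure (12 vs 4) may be computed over a different set of rows, so the counts here are not expected to match it exactly. The variable names are hypothetical.

```python
# Benchmark rows from the table: (Qwen3.7 Max, Composer 2.5); None = not disclosed.
BENCHMARKS = {
    "Coding Agent Index": (None, 51.8),
    "HLE": (41.4, None),
    "GPQA": (92.4, None),
    "IFEval": (94.3, None),
    "SWE-bench": (80.4, None),
    "SWE-Pro": (None, 47.0),
    "SWE-Atlas": (None, 72.0),
    "Terminal-Bench": (69.7, 69.3),
    "LiveCodeBench": (91.6, None),
    "MCP Atlas": (76.4, None),
    "Finance Agent": (48.4, None),
    "Tau2": (94.7, None),
    "MRCR": (90.4, None),
}

# Count disclosed cells per model and collect rows that both models disclose.
qwen_cells = sum(1 for q, _ in BENCHMARKS.values() if q is not None)
composer_cells = sum(1 for _, c in BENCHMARKS.values() if c is not None)
shared = [name for name, (q, c) in BENCHMARKS.items()
          if q is not None and c is not None]

print(f"Qwen3.7 Max disclosed: {qwen_cells}")
print(f"Composer 2.5 disclosed: {composer_cells}")
print(f"Directly comparable: {shared}")
```

On the rows shown here, only Terminal-Bench is disclosed by both models, so it is the only benchmark that supports a direct head-to-head comparison.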

This comparison page is generated from the AskClash LLM leaderboard cache. Open the live leaderboard for real-time scores and interactive filtering.