Benchmark scores
Higher is better for all benchmark scores (SWE-bench, GPQA, HLE, Terminal-Bench, etc.). Green marks the model with the higher score.
Side-by-side comparison of Kimi K2.7 Code and Qwen3.6-35B-A3B across SWE-bench, GPQA, HLE, Terminal-Bench, coding agent scores, token pricing, context window, and AskClash RWT. Scores are cached from the AskClash LLM leaderboard snapshot.
| Metric | Kimi K2.7 Code | Qwen3.6-35B-A3B |
|---|---|---|
| Overall Score | 68.3 | 38.6 |
| Leaderboard Rank | #8 | #31 |
| RWT | 7.5 | — |
| HLE | 54.0 | 21.4 |
| GPQA | 90.5 | 86.0 |
| SWE-bench | 80.2 | 73.4 |
| SWE-Pro | 58.6 | — |
| Terminal-Bench | 66.7 | 51.5 |
| LiveCodeBench | 89.6 | 80.4 |
| OSWorld | 73.1 | — |
| MCP Atlas | 76.0 | 62.8 |
| Finance Agent | 44.9 | — |
| CharXiv | 80.4 | 78.0 |
| MMMU-Pro | 79.4 | 75.3 |
| Tau2 | 90.1 | 95.3 |
| Input Price (per 1M tokens) | $0.95 | $0 |
| Output Price (per 1M tokens) | $4.00 | $0 |
| Context Window (tokens) | 256K | 262K |
| Benchmark Cells | 13 | 9 |
Lower is better for input and output prices. Green marks the cheaper model per 1M tokens.
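To make the per-1M-token prices concrete, here is a minimal sketch of the arithmetic for a single request. The prices are the Kimi K2.7 Code values from the table. The function name `request_cost` and the example token counts are assumptions made for illustration; they are not part of AskClash or any provider API.

```python
# Prices per 1M tokens, copied from the Kimi K2.7 Code column of the table.
INPUT_PRICE_PER_M = 0.95
OUTPUT_PRICE_PER_M = 4.00


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float = INPUT_PRICE_PER_M,
    output_price: float = OUTPUT_PRICE_PER_M,
) -> float:
    """Return the dollar cost of one request at per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
    print(f"${request_cost(200_000, 20_000):.2f}")
```

For that hypothetical workload the input side costs $0.19 and the output side costs $0.08, about $0.27 in total.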
A model that discloses fewer benchmark cells may have an inflated percentile score, so read the scores alongside the Benchmark Cells row (13 for Kimi K2.7 Code, 9 for Qwen3.6-35B-A3B).
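One way to account for the difference in disclosed cells is to compare the two models only on benchmarks that both report. The sketch below does that with the benchmark rows of the table, treating "—" as missing and a higher score as the win. The dictionary layout and the function name `head_to_head` are assumptions made for illustration, and this is not how AskClash computes its Overall Score.

```python
from typing import Optional

# (Kimi K2.7 Code, Qwen3.6-35B-A3B) scores copied from the table.
# None stands for an undisclosed cell, shown as "—" in the table.
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "HLE": (54.0, 21.4),
    "GPQA": (90.5, 86.0),
    "SWE-bench": (80.2, 73.4),
    "SWE-Pro": (58.6, None),
    "Terminal-Bench": (66.7, 51.5),
    "LiveCodeBench": (89.6, 80.4),
    "OSWorld": (73.1, None),
    "MCP Atlas": (76.0, 62.8),
    "Finance Agent": (44.9, None),
    "CharXiv": (80.4, 78.0),
    "MMMU-Pro": (79.4, 75.3),
    "Tau2": (90.1, 95.3),
}


def head_to_head(scores: dict[str, tuple[Optional[float], Optional[float]]]) -> dict:
    """Count wins on benchmarks that both models disclose (higher is better)."""
    result = {"kimi_wins": 0, "qwen_wins": 0, "ties": 0, "skipped": []}
    for name, (kimi, qwen) in scores.items():
        if kimi is None or qwen is None:
            # One side is undisclosed, so the row cannot be compared.
            result["skipped"].append(name)
        elif kimi > qwen:
            result["kimi_wins"] += 1
        elif qwen > kimi:
            result["qwen_wins"] += 1
        else:
            result["ties"] += 1
    return result


if __name__ == "__main__":
    print(head_to_head(SCORES))
```

On the nine benchmarks that both models report, Kimi K2.7 Code scores higher on eight and Qwen3.6-35B-A3B scores higher on one (Tau2). SWE-Pro, OSWorld, and Finance Agent are skipped because only one model reports them.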
This comparison page is generated from the AskClash LLM leaderboard cache. Open the live leaderboard for real-time scores and interactive filtering.