GPQA
87.6 score
Compare Kimi K2.5 (Reasoning) vs GPT, Claude, Gemini, DeepSeek, open-weight, and frontier AI models using public benchmark scores, token pricing, context window, and access details.
AskClash combines public LLM benchmark cells into a weighted percentile score and penalizes missing coverage so narrow rows do not dominate better-measured models.
Cached benchmark values can include HLE, GPQA, SWE-bench, SWE-Pro, SWE-Atlas, Terminal-Bench, MCP Atlas, MMMU-Pro, ARC-AGI-2, Tau2, and model-specific coding or agent scores.
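The page does not publish the exact formula, so the following is only a minimal sketch of one way a weighted percentile score with a missing-coverage penalty could be computed. The function names, the mid-rank percentile, the linear coverage penalty, the benchmark labels, and all numbers are hypothetical illustrations, not AskClash's actual method or data.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, population):
    """Mid-rank percentile (0-100) of `value` within `population`."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)            # scores strictly below
    ties = bisect_right(ordered, value) - below    # scores equal to value
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def composite_score(model_cells, all_cells, weights):
    """Weighted mean percentile, scaled down by the share of weight covered.

    model_cells: {benchmark: score} for one model (missing cells omitted)
    all_cells:   {benchmark: [scores from every model reporting it]}
    weights:     {benchmark: weight}  (hypothetical weights)
    """
    total_weight = sum(weights.values())
    covered_weight = 0.0
    weighted_sum = 0.0
    for bench, weight in weights.items():
        if bench not in model_cells:
            continue  # a missing cell adds no score and no coverage
        pct = percentile_rank(model_cells[bench], all_cells[bench])
        weighted_sum += weight * pct
        covered_weight += weight
    if covered_weight == 0:
        return 0.0
    mean_percentile = weighted_sum / covered_weight
    coverage = covered_weight / total_weight  # assumed linear penalty
    return mean_percentile * coverage


# Hypothetical data: two benchmarks, four reporting models each.
all_cells = {"bench_a": [50, 60, 70, 80], "bench_b": [10, 20, 30, 40]}
weights = {"bench_a": 1.0, "bench_b": 1.0}

narrow = composite_score({"bench_a": 80}, all_cells, weights)
broad = composite_score({"bench_a": 70, "bench_b": 30}, all_cells, weights)
print(narrow, broad)
```

In this sketch, the narrow row tops its single benchmark but covers only half of the total weight, so it ends up below the broader row that places lower on each benchmark but reports both.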
87.6 score
76.8 score
78.5 score
95.9 score
Use these comparison links to evaluate Kimi K2.5 (Reasoning) against nearby LLMs by benchmark score, price, context window, and provider.
Cached AskClash article matches that can provide release, provider, benchmark, pricing, or market context around this model.

Techmeme
Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex Attribute… (#45305) by ArthurZucker
Fix Qwen2.5-VL temporal RoPE scaling applied to still images (#45330) by Kash6, zucchini-nlp
Latent Space
Pydantic data models defining AgentFlow's type system: AgentKind (codex/claude/kimi/python/shell/sync), NodeSpec, PipelineSpec, ProviderConfig, target types (local/SSH/EC2/ECS/container), fanout expansion, MCP server specs.
Last cached leaderboard date: June 9, 2026. This model page is generated from the AskClash LLM Leaderboard cache and linked from the live leaderboard.