HLE
30.1 score
Compare Kimi K2.5 vs GPT, Claude, Gemini, DeepSeek, open-weight, and frontier AI models using public benchmark scores, token pricing, context window, and access details.
AskClash combines public LLM benchmark cells into a weighted percentile score and penalizes missing coverage so narrow rows do not dominate better-measured models.
Cached benchmark values can include HLE, GPQA, SWE-bench, SWE-Pro, SWE-Atlas, Terminal-Bench, MCP Atlas, MMMU-Pro, ARC-AGI-2, Tau2, and model-specific coding or agent scores.
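The scoring idea described above (per-benchmark percentiles, combined with weights, then penalized for missing coverage) can be sketched roughly as follows. This is only an illustration under stated assumptions: the source does not give AskClash's actual formula, so the function names, the equal weights, and the choice to multiply by the covered share of weight are all hypothetical, and the numbers in the usage note are made up.

```python
from typing import Dict, List, Optional


def percentile_rank(value: float, population: List[float]) -> float:
    """Return the percentage (0-100) of population values at or below value."""
    if not population:
        raise ValueError("population must not be empty")
    at_or_below = sum(1 for v in population if v <= value)
    return 100.0 * at_or_below / len(population)


def composite_score(
    model_cells: Dict[str, Optional[float]],
    leaderboard: Dict[str, List[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted mean of per-benchmark percentiles, scaled by coverage.

    Assumption: the coverage penalty is the share of total weight for
    which the model has a reported value. The real penalty may differ.
    """
    total_weight = sum(weights.values())
    covered_weight = 0.0
    weighted_sum = 0.0
    for bench, weight in weights.items():
        value = model_cells.get(bench)
        if value is None:
            continue  # missing cell: contributes nothing, lowers coverage
        covered_weight += weight
        weighted_sum += weight * percentile_rank(value, leaderboard[bench])
    if covered_weight == 0:
        return 0.0
    mean_percentile = weighted_sum / covered_weight
    coverage = covered_weight / total_weight
    return mean_percentile * coverage


# Hypothetical data: two benchmarks, four models' scores on each.
leaderboard = {"HLE": [10.0, 20.0, 30.0, 40.0], "GPQA": [50.0, 60.0, 70.0, 80.0]}
weights = {"HLE": 1.0, "GPQA": 1.0}

well_measured = composite_score({"HLE": 40.0, "GPQA": 80.0}, leaderboard, weights)
narrow_row = composite_score({"HLE": 40.0, "GPQA": None}, leaderboard, weights)
```

With this penalty, a model that tops the one benchmark it reports still scores lower than a model that tops both, which is the behavior the description attributes to the leaderboard: narrow rows do not dominate better-measured models.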
30.1 score
87.6 score
82.0 score
93.9 score
76.8 score
50.7 score
50.8 score
85.0 score
63.3 score
29.5 score
77.5 score
78.5 score
95.9 score
Use these comparison links to evaluate Kimi K2.5 against nearby LLMs by benchmark score, price, context window, and provider.
Cached AskClash article matches that can provide release, provider, benchmark, pricing, or market context around this model.
Fix Kimi-K2.5 tokenizer regression and _patch_mistral_regex Attribute… (#45305) by ArthurZucker
Fix Qwen2.5-VL temporal RoPE scaling applied to still images (#45330) by Kash6, zucchini-nlp

Latent Space
TLDR AI, 2026-04-21: Kimi K2.6 🚀, Codex Chronicle 🤖, Bezos’ $10B AI fundraise 💰
Pydantic data models defining AgentFlow's type system: AgentKind (codex/claude/kimi/python/shell/sync), NodeSpec, PipelineSpec, ProviderConfig, target types (local/SSH/EC2/ECS/container), fanout expansion, MCP server specs.
Last cached leaderboard date: May 22, 2026. This model page is generated from the AskClash LLM Leaderboard cache and linked from the live leaderboard.