HLE
40.0 score
Compare Claude Opus 4.6 (Adaptive) vs GPT, Claude, Gemini, DeepSeek, open-weight, and frontier AI models using public benchmark scores, token pricing, context window, and access details.
AskClash combines public LLM benchmark cells into a weighted percentile score and penalizes missing coverage so narrow rows do not dominate better-measured models.
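The page does not publish the exact formula, so the following is only a minimal sketch of how a weighted percentile score with a coverage penalty could work. The function names, the equal weights, the sample populations, and the choice to scale by the fraction of covered weight are all assumptions made for illustration, not AskClash's actual method.

```python
from bisect import bisect_right


def percentile(value, population):
    """Percent of population scores at or below `value` (0 to 100)."""
    ranked = sorted(population)
    return 100.0 * bisect_right(ranked, value) / len(ranked)


def composite_score(model_cells, populations, weights):
    """Weighted mean percentile over covered benchmarks, scaled by coverage.

    Hypothetical penalty: the mean is multiplied by the share of total
    benchmark weight for which the model actually has a score.
    """
    total_weight = sum(weights.values())
    covered_weight = 0.0
    weighted_sum = 0.0
    for bench, weight in weights.items():
        value = model_cells.get(bench)
        if value is None:
            continue  # missing cell: contributes nothing and lowers coverage
        weighted_sum += weight * percentile(value, populations[bench])
        covered_weight += weight
    if covered_weight == 0:
        return 0.0
    coverage = covered_weight / total_weight
    return (weighted_sum / covered_weight) * coverage


# Illustrative data only; these are not real leaderboard values.
populations = {"HLE": [10, 20, 30, 40], "GPQA": [60, 70, 80, 90]}
weights = {"HLE": 1.0, "GPQA": 1.0}

broad_model = {"HLE": 30, "GPQA": 80}   # measured on both benchmarks
narrow_model = {"HLE": 40}              # top HLE score, no GPQA cell

broad_score = composite_score(broad_model, populations, weights)
narrow_score = composite_score(narrow_model, populations, weights)
```

Under these assumptions the narrow row, despite holding the best HLE value, ends up below the model measured on both benchmarks, which is the behavior the description above attributes to the coverage penalty.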
Cached benchmark values can include HLE, GPQA, SWE-bench, SWE-Pro, SWE-Atlas, Terminal-Bench, MCP Atlas, MMMU-Pro, ARC-AGI-2, Tau2, and model-specific coding or agent scores.
40.0 score
84.0 score
89.2 score
80.8 score
65.4 score
72.7 score
68.8 score
92.1 score
Use these comparison links to evaluate Claude Opus 4.6 (Adaptive) against nearby LLMs by benchmark score, price, context window, and provider.
Cached AskClash article matches that can provide release, provider, benchmark, pricing, or market context around this model.
Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP. Source: arXiv (Logic / Formal Methods). URL: https://arxiv.org/abs/2603.20405

Nathan Lambert - Interconnects

Latent Space
Claude Token Counter, now with model comparisons. Source: Simon Willison’s Weblog (Link Blog), 20th April 2026. I upgraded my Claude Token Counter tool to add the ability to run the same count against different models in order to compare them. As far as I can tell Claude Opus 4.7 is the first model to change the tokeni
Last cached leaderboard date: May 22, 2026. This model page is generated from the AskClash LLM Leaderboard cache and linked from the live leaderboard.