HLE
40.0 score
AskClash combines public benchmark cells into a weighted percentile score and penalizes missing coverage so narrow rows do not dominate better-measured models.
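The page does not publish the exact formula, so the following is only a minimal sketch of one way a coverage-penalized weighted percentile score could work. Every name here (`percentile_rank`, `composite_score`, `coverage_exponent`), the mid-rank tie handling, and the multiplicative penalty are assumptions for illustration, not AskClash's actual implementation.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, population):
    """Percentile of `value` within `population` (mid-rank for ties), 0..100."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


def composite_score(model_cells, population_cells, weights, coverage_exponent=1.0):
    """Weighted mean percentile over covered benchmarks, scaled by coverage.

    model_cells:      {benchmark: score} for one model; missing keys are blank cells.
    population_cells: {benchmark: [scores of all models with a public value]}.
    weights:          {benchmark: weight} (hypothetical weights).
    """
    weighted_sum = 0.0
    covered_weight = 0.0
    for bench, weight in weights.items():
        value = model_cells.get(bench)
        if value is None:
            continue  # blank cell: contributes nothing and reduces coverage
        weighted_sum += weight * percentile_rank(value, population_cells[bench])
        covered_weight += weight

    if covered_weight == 0:
        return 0.0

    mean_percentile = weighted_sum / covered_weight
    coverage = covered_weight / sum(weights.values())
    # Assumed penalty: scale by the fraction of total weight that was measured.
    return mean_percentile * coverage ** coverage_exponent


# Toy data, not real leaderboard values.
weights = {"a": 1.0, "b": 1.0}
population = {"a": [10, 20, 30, 40], "b": [1, 2, 3, 4]}
full_row = composite_score({"a": 40, "b": 4}, population, weights)
narrow_row = composite_score({"a": 40}, population, weights)
```

With this toy data, the row that tops both benchmarks outscores the row that tops only one, which is the behavior the description above is after: a narrow row cannot outrank a better-measured model on the strength of a single cell.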
Only benchmark columns with cached public values are shown here. Missing cells remain blank in the live table.
40.0 score
84.0 score
89.2 score
80.8 score
65.4 score
72.7 score
68.8 score
92.1 score
Cached AskClash article matches that can provide release, provider, or market context around this model.
Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP
Source: arXiv Logic / Formal Methods
URL: https://arxiv.org/abs/2603.20405

Nathan Lambert - Interconnects

Latent Space
Claude Token Counter, now with model comparisons
Source: Simon Willison’s Weblog (Link Blog), 20th April 2026
I upgraded my Claude Token Counter tool to add the ability to run the same count against different models in order to compare them. As far as I can tell Claude Opus 4.7 is the first model to change the tokeni
Last cached leaderboard date: May 22, 2026. This model page is generated from the AskClash AI Leaderboard cache and linked from the live leaderboard.