HLE
41.5 score
Compare GPT-5.4 mini vs GPT, Claude, Gemini, DeepSeek, open-weight, and frontier AI models using public benchmark scores, token pricing, context window, and access details.
AskClash combines public LLM benchmark cells into a weighted percentile score and penalizes missing coverage so narrow rows do not dominate better-measured models.
Cached benchmark values can include HLE, GPQA, SWE-bench, SWE-Pro, SWE-Atlas, Terminal-Bench, MCP Atlas, MMMU-Pro, ARC-AGI-2, Tau2, and model-specific coding or agent scores.
41.5 score
88.0 score
97.4 score
54.4 score
60.0 score
72.1 score
57.7 score
45.4 score
78.0 score
93.4 score
Use these comparison links to evaluate GPT-5.4 mini against nearby LLMs by benchmark score, price, context window, and provider.
Cached AskClash article matches that can provide release, provider, benchmark, pricing, or market context around this model.

On evaluating and understanding the frontier of agents, and why I still turn to Claude.
Release: datasette-llm 0.1a4 Simon Willison’s Weblog Subscribe Sponsored by: WorkOS — Ready to sell to Enterprise clients? Build and ship securely with WorkOS. 31st March 2026 Release datasette-llm 0.1a4 — LLM integration plugin for other plugins to depend on Ability to configure different API keys for models based on their purpose - for example, set it up so enrichments always use gpt-5.4-mini with an API key dedicated to that purpose. #4 I released llm-echo 0.3 to provide an API key testing ut
Trusted access for the next era of cyber defense Simon Willison’s Weblog Subscribe Sponsored by: Teleport — Connect agents to your infra in seconds with Teleport Beams. Built-in identity. Zero secrets. Get early access 14th April 2026 - Link Blog Trusted access for the next era of cyber defense ( via ) OpenAI's answer to Claude Mythos appears to be a new model called GPT-5.4-Cyber: In preparation for increasingly more capable models from OpenAI over the next few months, we are fine-tuning our mo
A quote from Romain Huet Simon Willison’s Weblog Subscribe Sponsored by: Sonar — Now with SAST + SCA for secure, dependency-aware Agentic Engineering. SonarQube Advanced Security 25th April 2026 Since GPT-5.4, we’ve unified Codex and the main model into a single system, so there’s no separate coding line anymore. GPT-5.5 takes this further, with strong gains in agentic coding, computer use, and any task on a computer. — Romain Huet , confirming OpenAI won't release a GPT-5.5-Codex model Posted 2
Last cached leaderboard date: May 22, 2026. This model page is generated from the AskClash LLM Leaderboard cache and linked from the live leaderboard.