HLE
33.5 score
Compare GPT-5.2-Codex vs GPT, Claude, Gemini, DeepSeek, open-weight, and frontier AI models using public benchmark scores, token pricing, context window, and access details.
AskClash combines public LLM benchmark cells into a weighted percentile score and penalizes missing coverage so narrow rows do not dominate better-measured models.
Cached benchmark values can include HLE, GPQA, SWE-bench, SWE-Pro, SWE-Atlas, Terminal-Bench, MCP Atlas, MMMU-Pro, ARC-AGI-2, Tau2, and model-specific coding or agent scores.
33.5 score
89.9 score
94.0 score
77.6 score
56.4 score
64.0 score
76.3 score
92.1 score
Use these comparison links to evaluate GPT-5.2-Codex against nearby LLMs by benchmark score, price, context window, and provider.
Cached AskClash article matches that can provide release, provider, benchmark, pricing, or market context around this model.

Latent Space

Towards AI

On evaluating and understanding the frontier of agents, and why I still turn to Claude.
Letta Code supports [skills](https://docs.letta.com/letta-code/skills) and [subagents](https://docs.letta.com/letta-code/subagents), and bundles pre-built skills/subagents for advanced memory and continual learning. Letta is fully model-agnostic, though we recommend Opus 4.5 and GPT-5.2 for best performance (see our [model leaderboard](https://leaderboard.letta.com/) for our rankings). * **Follow our socials**: [Twitter/X](https://twitter.com/Letta_AI), [LinkedIn](https://www.linkedin.com/in/let
Last cached leaderboard date: May 25, 2026. This model page is generated from the AskClash LLM Leaderboard cache and linked from the live leaderboard.