GPQA: 93.2 score
Compare GPT-5.2 Pro vs GPT, Claude, Gemini, DeepSeek, open-weight, and frontier AI models using public benchmark scores, token pricing, context window, and access details.
AskClash combines public LLM benchmark cells into a weighted percentile score and penalizes missing coverage so narrow rows do not dominate better-measured models.
Cached benchmark values can include HLE, GPQA, SWE-bench, SWE-Pro, SWE-Atlas, Terminal-Bench, MCP Atlas, MMMU-Pro, ARC-AGI-2, Tau2, and model-specific coding or agent scores.
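The scoring idea described above (weighted percentiles with a penalty for missing coverage) can be sketched roughly as follows. This is only an illustration under stated assumptions, not AskClash's actual formula: the function names, the mid-rank percentile definition, the coverage penalty (multiplying by the share of benchmark weight a model covers), and all numbers in the example are hypothetical.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(value, population):
    """Mid-rank percentile (0-100) of `value` within `population`."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)          # scores strictly below
    at_or_below = bisect_right(ordered, value)   # scores below or equal
    return 100.0 * (below + at_or_below) / (2 * len(ordered))


def composite_score(model_scores, leaderboard, weights):
    """Weighted mean percentile, scaled by covered weight (assumed penalty).

    model_scores: {benchmark: score or None} for one model
    leaderboard:  {benchmark: [scores of all models on that benchmark]}
    weights:      {benchmark: relative weight}
    """
    total_weight = sum(weights.values())
    covered_weight = 0.0
    weighted_sum = 0.0
    for bench, weight in weights.items():
        score = model_scores.get(bench)
        if score is None:
            continue  # missing cell: contributes nothing, lowers coverage
        weighted_sum += weight * percentile_rank(score, leaderboard[bench])
        covered_weight += weight
    if covered_weight == 0:
        return 0.0
    mean_percentile = weighted_sum / covered_weight
    coverage = covered_weight / total_weight
    return mean_percentile * coverage


# Toy data, invented purely for illustration.
leaderboard = {"GPQA": [50.0, 70.0, 90.0], "HLE": [10.0, 20.0, 30.0]}
weights = {"GPQA": 1.0, "HLE": 1.0}

narrow_model = {"GPQA": 90.0}               # top score, but one benchmark only
broad_model = {"GPQA": 70.0, "HLE": 20.0}   # mid-pack, fully covered

narrow = composite_score(narrow_model, leaderboard, weights)
broad = composite_score(broad_model, leaderboard, weights)
```

With these toy numbers the fully covered mid-pack model ends up ahead of the model that tops a single benchmark, which is the behavior the description attributes to the coverage penalty. Other penalty shapes would fit the same description equally well.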
93.2 score
99.0 score
80.0 score
88.7 score
86.5 score
52.9 score
Use these comparison links to evaluate GPT-5.2 Pro against nearby LLMs by benchmark score, price, context window, and provider.
Cached AskClash article matches that can provide release, provider, benchmark, pricing, or market context around this model.
Letta Code supports [skills](https://docs.letta.com/letta-code/skills) and [subagents](https://docs.letta.com/letta-code/subagents), and bundles pre-built skills/subagents for advanced memory and continual learning. Letta is fully model-agnostic, though we recommend Opus 4.5 and GPT-5.2 for best performance (see our [model leaderboard](https://leaderboard.letta.com/) for our rankings).
GPT-5.5 prompting guide. Simon Willison's Weblog, Link Blog, 25th April 2026. Now that GPT-5.5 is available in the API, OpenAI have released a wealth of useful tips on how best to prompt the new model. Here's a neat trick they recommend for applications that might spend considerable time thinking before returning a user-visible response: Befor
YC's OpenAI stake, Gemini API Webhooks, AI PE partnerships. TLDR AI, 2026-05-05. The change you just shipped broke prod. Why? (Sponsor) AI fails differently than normal software. To make sense of it, Notion, Ramp, and Stripe use Braintrust to run thousands of evals a day and ship updates within 24 hours. Braintrust sits between your app and your models to bring evals and observability together
fix(core,model-profiles): add missing `ModelProfile` fields, warn on schema drift (#36129) URL: https://github.com/langchain-ai/langchain/releases/tag/langchain-openai%3D%3D1.1.12
Last cached leaderboard date: May 25, 2026. This model page is generated from the AskClash LLM Leaderboard cache and linked from the live leaderboard.