LLM Leaderboard · Proprietary

Gemini 3 Pro benchmarks, pricing, and LLM comparison.

Compare Gemini 3 Pro with GPT, Claude, other Gemini models, DeepSeek, open-weight, and frontier AI models using public benchmark scores, token pricing, context window, and access details.

Rank #22. AskClash overall score: 46.3
$2.00 / $12.0: input and output token price, when published. Context: 2M.
API/OAuth: billing and access path cached for this model row.
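
The listed prices can be turned into a rough per-request cost. The sketch below assumes the prices are quoted in US dollars per one million tokens, which is a common convention but is not stated on this page; the function name and the token counts in the usage line are hypothetical.

```python
# Prices as listed on this page for Gemini 3 Pro.
INPUT_PRICE = 2.00    # dollars per pricing unit of input tokens
OUTPUT_PRICE = 12.0   # dollars per pricing unit of output tokens

# ASSUMPTION: the pricing unit is 1,000,000 tokens. The page does not say.
TOKENS_PER_PRICING_UNIT = 1_000_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE
    return total / TOKENS_PER_PRICING_UNIT


# Hypothetical request: 100,000 input tokens and 10,000 output tokens.
print(f"${estimate_cost(100_000, 10_000):.2f}")  # prints "$0.32"
```

Under that assumption, output tokens cost six times as much as input tokens, so long generations dominate the bill even when prompts are large.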

Gemini 3 Pro benchmark snapshot

AskClash combines public LLM benchmark cells into a weighted percentile score and penalizes missing coverage so narrow rows do not dominate better-measured models.
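
One way such a score could be computed is sketched below. This is not AskClash's published formula: the percentile definition, the per-benchmark weights, and the linear coverage penalty are all assumptions chosen for illustration, and the benchmark names and numbers in the usage example are made up.

```python
from bisect import bisect_right


def percentile_rank(value, population):
    """Percent of scores in `population` that are <= `value` (0 to 100)."""
    ordered = sorted(population)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def overall_score(cells, populations, weights, total_benchmarks):
    """Weighted mean percentile over measured cells, scaled by coverage.

    ASSUMPTION: the missing-coverage penalty is a linear factor equal to
    the share of tracked benchmarks that the model has a score for.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for name, value in cells.items():
        w = weights.get(name, 1.0)  # unlisted benchmarks default to weight 1
        weighted_sum += w * percentile_rank(value, populations[name])
        weight_total += w
    if weight_total == 0:
        return 0.0
    coverage = len(cells) / total_benchmarks
    return (weighted_sum / weight_total) * coverage


# Hypothetical data: two measured benchmarks out of four tracked ones.
populations = {"bench_a": [50, 60, 70, 80], "bench_b": [10, 20, 30, 40]}
cells = {"bench_a": 70, "bench_b": 40}
weights = {"bench_a": 2.0, "bench_b": 1.0}
print(round(overall_score(cells, populations, weights, total_benchmarks=4), 1))  # prints 41.7
```

In this sketch a model with strong results on only half of the tracked benchmarks has its score halved, which is the kind of effect that keeps narrow rows from outranking better-measured models.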

Overall: 46.3
Benchmark cells: 11
Context: 2M
Creator: Google

Gemini 3 Pro public benchmark scores

Cached benchmark values can include HLE, GPQA, SWE-bench, SWE-Pro, SWE-Atlas, Terminal-Bench, MCP Atlas, MMMU-Pro, ARC-AGI-2, Tau2, and model-specific coding or agent scores.

GPQA

90.8 score

MATH-500

91.0 score

SWE-bench

76.2 score

Terminal-Bench

54.2 score

LiveCodeBench

91.7 score

MCP Atlas

54.1 score

CharXiv

81.4 score

MMMU-Pro

81.0 score

ARC-AGI-2

31.1 score

Tau2

87.1 score

Gemini 3 Pro vs other AI models

Use these comparison links to evaluate Gemini 3 Pro against nearby LLMs by benchmark score, price, context window, and provider.

Related AI and tech coverage

Cached AskClash article matches that can provide release, provider, benchmark, pricing, or market context around this model.

Gemini Extended Thinking ✨, ChatGPT finance 📱, Claude Code at scale 👨‍💻

TLDR AI, 2026-05-18. Your agent needs a harness, not a framework. 69% of engineers building in prod agree (Sponsor). Inngest asked 130 engineers about running AI in production; only 19% were very confident their stack could scale, with gaps in tracing being a key issue. 1 in 5 now spend up to half their time on reliabilit

Auto-grading decade-old Hacker News discussions with hindsight

Yesterday I stumbled on this HN thread, "Show HN: Gemini Pro 3 hallucinates the HN front page 10 years from now", where Gemini 3 was hallucinating the front page of 10 years from now. One of the comments struck me a bit more though: Bjartr linked to the HN front page from exactly 10 years ago, i.e. December 2015. I was reading through the discussions of 10 years ago and mentally grading them for prescience when I realized that an LLM might actually be a lot better at this task. I copy pasted one of

Building the agentic future: Developer highlights from I/O 2026

At Google I/O 2026, we’re accelerating the shift from prompts to action with the launch of Gemini 3.5 Flash. Combining frontier intelligence with incredible speed, 3.5 Flash outperforms Gemini 3.1 Pro across almost all benchmarks while running four times faster than other frontier models, providing the high-speed engine needed for real-world agentic workflows. We’re putting these capabilities in your hands with the launch of a new Google Antigravity 2.0 desktop application, Managed Agents in the

Last cached leaderboard date: May 22, 2026. This model page is generated from the AskClash LLM Leaderboard cache and linked from the live leaderboard.