AI Models Ranking

My personal, honest ranking of the major AI models. I use most of these daily - these opinions are earned, not borrowed.

Last updated: March 2026

Claude

Anthropic · S Tier

Opus 4.6 · Sonnet 4.6 · Haiku 4.5

My go-to for writing and coding. Claude Code is unmatched for software engineering - this entire site was built with it. Memory now works across all conversations for free. Hit #1 on the App Store in March 2026 with 11.3M daily users. Thoughtful, precise, and reliable.

claude.ai

ChatGPT

OpenAI · S Tier

GPT-5.4 · GPT-5.4 Thinking · GPT-5.4 Pro

A close second. GPT-5.4 launched March 5 - it makes 33% fewer false claims than GPT-5.2, adds native computer use, and is 47% more concise. GPT-5.4 mini and nano followed for high-volume workloads. 900M+ weekly users and the largest AI ecosystem. The Pentagon deal and ads for free users dented trust, but the model is excellent.

chatgpt.com

Gemini

Google DeepMind · A Tier

Gemini 3.1 Pro · Gemini 3 Flash · Gemini 3 Deep Think

My pick for brainstorming and image generation. Gemini 3.1 Pro now leads 13 of 16 major benchmarks - 77.1% on ARC-AGI-2, 94.3% on GPQA Diamond - all at $2/M input tokens. Apple Siri integration is delayed to May at the earliest, but the deal is signed. The benchmark king.

gemini.google.com
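
To put the $2/M input price in concrete terms, here is a minimal Python sketch of the arithmetic. It only covers input tokens, since that is the only price quoted above; the function name and the 50,000-token example are my own illustration, not anything from Google's pricing page.

```python
from decimal import Decimal

# Price quoted above: $2 per million input tokens.
INPUT_PRICE_PER_MILLION = Decimal("2")


def input_cost_usd(input_tokens: int) -> Decimal:
    """Return the input-side cost in dollars for a given token count."""
    return INPUT_PRICE_PER_MILLION * Decimal(input_tokens) / Decimal(1_000_000)


# Hypothetical workload: a 50,000-token prompt.
print(input_cost_usd(50_000))
```

At that rate a 50,000-token prompt costs ten cents on the input side. Output tokens are billed separately and are not modeled here.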

Llama

Meta · A Tier

Llama 4 Maverick · Llama 4 Scout · Behemoth (training)

The open-source champion. Llama 4 went mixture-of-experts and natively multimodal - Scout fits on a single H100, Maverick beats GPT-4o on most benchmarks. Behemoth (2T params) is still training. Essential for the ecosystem.

llama.meta.com

Mistral

Mistral AI · B Tier

Mistral Small 4 · Mistral Large 3 · Devstral 2 · Codestral

Europe's answer to the AI race. Small 4 just dropped - a 119B MoE that unifies reasoning, multimodal, and coding under Apache 2.0 and runs 40% faster than Small 3. Forge lets enterprises build custom models on their own data. ASML, Ericsson, and ESA are already on board. Lean and efficient.

mistral.ai

Perplexity

Perplexity AI · B Tier

Perplexity Computer · Sonar Pro · Deep Research · Comet Browser

Not a traditional model, but the best AI-powered search experience. Perplexity Computer runs 19 models with subagents for complex workflows. Comet browser launched free on iOS. Deep Research now runs on Opus 4.6. Dropped all ads. My go-to for research - and the gap is growing.

perplexity.ai

Grok

xAI · Nope

Grok 4.20 Beta · Grok 4.1 · Grok 3

xAI raised $20B in Series E and SpaceX completed the acquisition. Grok 4.1 is now available to all. But in March, three plaintiffs - two minors - sued xAI over CSAM deepfakes created using Grok's image generation. 3M+ sexualized images in 10 days, ~23K involving minors. Investigations open in the US, EU, UK, France, Ireland, and Australia.

x.ai

DeepSeek

DeepSeek (China) · Nope

DeepSeek-V3.2 · DeepSeek-R1 · V3.2-Speciale

V4 - the 1T-parameter model that was supposed to reshape the landscape - has missed every announced window. A mystery model on a developer platform turned out to be Xiaomi's, not DeepSeek's. CCP censorship remains baked in at the architecture level. Both OpenAI's and Anthropic's distillation accusations stand. The promise keeps growing but the delivery keeps slipping.

chat.deepseek.com

Who Powers What

The AI models above don't just live in chatbots. They're quietly powering the products you already use every day. Here's who's running what under the hood - and one very notable absence.

Microsoft Copilot

Microsoft 365, Windows, Bing, Edge

OpenAI GPT-5.4 · GPT-5.4 Thinking · GPT-5.4 Pro

The deepest OpenAI integration. GPT-5.4 powers M365 Copilot with native computer-use capabilities. GPT-5.4 mini and nano handle high-volume workloads. Copilot routes between models per task.

Amazon Alexa+

Echo, Fire TV, Ring, Smart Home

Anthropic Claude · Amazon Nova

Claude handles the heavy thinking; Amazon's Nova models take the simpler tasks. Routed via Amazon Bedrock - "we pick the model that's right for the job." Amazon's $8B investment in Anthropic at work.
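
The "right model for the job" idea can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not how Alexa+ or Bedrock actually decides: the model labels, the keyword list, and the word limit are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration; these are not real Bedrock model IDs.
HEAVY_MODEL = "claude"
LIGHT_MODEL = "nova"

# Hypothetical keywords that hint at multi-step reasoning.
REASONING_HINTS = ("plan", "compare", "explain", "why", "summarize")


@dataclass
class Route:
    model: str
    reason: str


def pick_model(request: str, max_light_words: int = 12) -> Route:
    """Send short, simple requests to the light model and the rest to the heavy one."""
    # Lowercase and strip common punctuation so "Why?" still matches "why".
    words = [w.strip(".,!?") for w in request.lower().split()]
    if any(hint in words for hint in REASONING_HINTS):
        return Route(HEAVY_MODEL, "reasoning keyword")
    if len(words) > max_light_words:
        return Route(HEAVY_MODEL, "long request")
    return Route(LIGHT_MODEL, "short, simple request")


print(pick_model("Turn off the kitchen lights"))
print(pick_model("Plan a three day trip to Lisbon"))
```

A real router would more likely use a classifier or the models themselves to judge difficulty; the keyword check just makes the split between "heavy thinking" and "simpler tasks" visible.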

Samsung Galaxy AI

Galaxy S series, Fold, Flip, Tablets

Google Gemini · Samsung Gauss

Gemini powers voice commands and cross-app actions. Samsung's in-house Gauss models handle on-device processing. Their TVs add Copilot and Perplexity into the mix too.

Meta AI

WhatsApp, Instagram, Messenger, Facebook

Meta Llama

Meta eating their own cooking. Llama powers the AI assistant across all Meta platforms - 3+ billion potential users. The largest real-world deployment of an open-source model.

Google Assistant

Android, Pixel, Nest, Search

Google Gemini

The old Google Assistant is being phased out in favor of Gemini across Android and Pixel devices. Gemini Live handles real-time voice conversations natively.

Apple Siri

iPhone, iPad, Mac, HomePod, Apple Watch

Google Gemini (Delayed) · On-Device Models

Still the elephant in the room. The Gemini-powered Siri was supposed to ship in March with iOS 26.4 but has been pushed to iOS 26.5 (May) at the earliest, with full conversational AI in iOS 27 (September). Apple is paying Google ~$1B/year while developing its own "Ferret-3" models. We'll see - again.

Political Bias in AI Models

One thing most people don't realize: every major AI model has a measurable political lean. Multiple peer-reviewed studies have mapped these models on political compass-style charts. Here's what the research says.

Model	Political Lean	What the Research Found
ChatGPT	Left-Leaning	Consistently the furthest left across multiple studies. OpenAI's own evaluation found emotionally charged liberal prompts exert the largest pull on objectivity. GPT-5 shows improvement over GPT-4o.
Claude	Most Centrist	Earlier studies found liberal-leaning; by 2025, Promptfoo measured it as the most centrist model at 0.646 (0.5 = true center). Anthropic actively publishes their political even-handedness methodology.
Gemini	Moderate Left	Stanford study found users perceived it as the least slanted overall. Measured further left than Claude but more moderate than ChatGPT. Generally centrist on social issues.
Llama	Right-Leaning (Relative)	The 2023 ACL award-winning paper found it was the most right-wing authoritarian of the 14 models tested. An outlier in the open-source space.
Perplexity	Libertarian-Right	The IEEE study found it exhibited a "libertarian capitalistic stance" - more conservative than its peers. An interesting position for a search-focused product.
Grok	Chaotic	Despite xAI's "less woke" marketing, studies found the highest extremism rate at 67.9% - wild swings between far-left and far-right. Promptfoo called it "designed to be contrarian rather than ideological." Even Pew's quiz placed it as an "establishment liberal."
DeepSeek	CCP-Aligned	Not left or right on a Western spectrum — state-aligned. 1,156 documented censored topics including Taiwan, Tiananmen, and Xi Jinping. Responses shift by language: Chinese queries get Party-line answers, English queries get more nuanced takes. Censorship is embedded at the model level, not just the app layer.

All major AI models lean left on economics (wealth taxes, minimum wage). No study has found a consistently conservative AI among industry leaders.

Sources & Further Reading

TrackingAI.org - AI Political Compass Tracker. Interactive scatter plots, regularly updated.
Promptfoo - AI Political Bias Evaluation (2025). Comparative study of Claude, GPT, Gemini, Grok.
Choudhary et al. - Political Bias in AI-Language Models. IEEE, 2024.
Stanford - Perceived Political Bias in Popular AI Models. Stanford Report, 2025.
Anthropic - Measuring Political Bias in Claude. Anthropic Research, 2025.
OpenAI - Defining and Evaluating Political Bias in LLMs. OpenAI Research.
Manhattan Institute - Measuring Political Preferences in AI Systems. 2025.

These rankings are entirely my own opinion based on daily use. Your mileage may vary. I have no financial relationship with any of these companies.

Well, except that Claude literally built this page. Make of that what you will.