The State of the AI Models — My Rankings as of March 2026

Last month's ranking post became one of the most-read things on this site. People want to know what someone who uses these tools daily — not someone running benchmarks in a lab — actually thinks about them.

March 2026 has been the most eventful month in AI since I started tracking. A Super Bowl ad war. A Pentagon controversy. A new GPT. A new Gemini that dominates benchmarks. Claude hitting number one on the App Store. And a lawsuit against Grok that makes the "Nope" tier feel generous.

Same format. Eight models. Four tiers. Here is where things stand.

The S Tier: Claude and ChatGPT

Claude holds the number one spot and the gap widened this month. Anthropic shipped memory to all users, including the free tier, and launched an import tool that lets you bring your ChatGPT or Gemini context into Claude with a copy-paste. That is a bold move. It is also the kind of move you make when you are confident people will stay once they switch.

And switch they did. After OpenAI announced a $200 million Pentagon deal and started showing ads to free users, Claude shot to number one on the App Store. Anthropic had run Super Bowl ads promising to never put ads in Claude — and then OpenAI proved them right the same week. Claude hit 11.3 million daily active users and saw one million signups in a single day. ChatGPT later reclaimed the top spot, but the damage was done. For the first time, Claude is not just the better model — it is the better brand.

This entire site is still built with Claude Code. Opus 4.6 handles the hard thinking. Sonnet 4.6 handles everything else. The memory feature means Claude now remembers my projects, my preferences, and my writing style across conversations. That is the kind of compound advantage that gets harder to leave the longer you use it.

ChatGPT remains a strong number two. GPT-5.4 dropped on March 5 and it is OpenAI's best model yet — 33% fewer false claims than GPT-5.2, native computer-use capabilities, and 47% more concise on complex tasks. GPT-5.4 mini and nano followed on March 17 for high-volume workloads. The model is genuinely excellent. OpenAI still has 900 million weekly users and the largest ecosystem in AI.

But the Pentagon deal and the ads created a trust problem that no benchmark can fix. OpenAI's share of daily U.S. users dropped from 69% to 45% in one year. That is not a collapse — it is a market normalizing. And it is normalizing in Claude's direction.

The A Tier: Gemini and Llama

Gemini had the single most impressive technical achievement since the last rankings. Gemini 3.1 Pro, released February 19, now leads on 13 of 16 major benchmarks. It scored 77.1% on ARC-AGI-2, more than double Gemini 3 Pro's score on the same pure-logic test. It hit 94.3% on GPQA Diamond for scientific knowledge and 80.6% on SWE-Bench Verified for coding. All at $2 per million input tokens. That pricing is almost unfair.

But here is the thing — Gemini is still my brainstorming tool, not my building tool. The benchmarks say it should be better at coding than it feels in practice. I will keep testing. If Gemini 3.1 Pro's coding actually matches the benchmarks in real-world use, the ranking conversation changes significantly.

The Apple Siri situation got more complicated. The Gemini-powered Siri overhaul was supposed to ship with iOS 26.4 in March. It has now been pushed to iOS 26.5 in May at the earliest, with full conversational capabilities delayed to iOS 27 in September. Apple is paying Google roughly $1 billion a year for Gemini integration while also developing its own "Ferret-3" models as a bridge strategy. The deal is real, but the execution keeps slipping.

Llama holds its position. Behemoth, the two-trillion-parameter model, is still not publicly released. Meta reportedly pushed it back multiple times through 2025 and there is no confirmed date for broad availability. Scout and Maverick continue to do solid work in the open-source ecosystem. Llama powers the AI features across Meta's platforms, which reach three billion users. But without Behemoth shipping, the Llama story has not changed since last month.

The B Tier: Mistral and Perplexity

Mistral earned its spot this month. Mistral Small 4 dropped — a 119-billion-parameter mixture-of-experts model that unifies reasoning, multimodal, and coding into a single model under Apache 2.0. That is Magistral's reasoning, Pixtral's vision, and Devstral's coding in one package, with configurable reasoning effort per request. It is 40% faster and handles 3x more requests per second than Small 3. At GTC, Mistral launched Forge, letting enterprises build fully custom models on their own data. Early adopters include ASML, Ericsson, and the European Space Agency.

Mistral keeps punching above its weight. If you are building products and care about cost, open-source licensing, and European data sovereignty, Mistral is the answer.

Perplexity had a massive month. They launched Perplexity Computer — an agentic tool that runs 19 different AI models and creates subagents to handle complex workflows. It is only on the $200-per-month Max tier, but it is the most ambitious multi-model product anyone has shipped. They also dropped the Comet browser for iOS — a free AI-native browser that replaces traditional browsing with a built-in assistant. Deep Research now runs on Claude Opus 4.6. And they eliminated advertising entirely, saying it undermined trust in their answers.

That last move is the interesting one. In a month where OpenAI started putting ads in ChatGPT, Perplexity went the other direction. Perplexity remains my number one for research, and the gap between it and everything else for that use case is growing.

The Nope Tier: Grok and DeepSeek

Grok's month made the "Nope" tier feel generous.

On March 16, three plaintiffs — two of them minors — filed a federal lawsuit accusing xAI of producing, distributing, and possessing child sexual abuse material. The lawsuit alleges that a perpetrator used xAI's image generation technology through a third-party app to create sexualized deepfake images of high school girls, including from homecoming photos. Criminal investigators found the images being traded on Telegram as currency for other CSAM. The Center for Countering Digital Hate estimated that Grok's tools created over three million sexualized images in a ten-day period in early January, with roughly 23,000 involving minors.

xAI did not add content watermarks to AI-generated images, unlike Google and OpenAI, and only stopped allowing Grok to "undress" people in images on January 15, after the damage was done. The lawsuit seeks class-action status on behalf of thousands of minors and arrives alongside investigations in the U.S., EU, UK, France, Ireland, and Australia.

Meanwhile, xAI raised $20 billion in a Series E round. SpaceX completed its acquisition of xAI. And Musk is hiring Wall Street bankers to teach Grok about finance. The technical product keeps improving — Grok 4.1 is now available to all users. But when your platform is generating CSAM at industrial scale and your response is to raise more money, the question is not about the technology anymore.

DeepSeek remains in the Nope tier. V4 — the one-trillion-parameter model that was supposed to reshape the landscape — has missed every announced window. Mid-February, Lunar New Year, late February, early March — all passed without a release. A mystery model that appeared on a developer platform last week, which many suspected was DeepSeek V4, turned out to be from Xiaomi. The CCP censorship, the distillation accusations from both OpenAI and Anthropic, and the security vulnerability injection documented by CrowdStrike all remain unaddressed. The technical promise is real but the delivery keeps slipping and the trust issues keep compounding.

What Changed This Month

The biggest shift in March was not a model launch. It was the consumer market.

For the first time, Claude outranked ChatGPT on both app stores. The catalyst was OpenAI's Pentagon deal and the backlash that followed (ChatGPT uninstalls jumped 295% in a single day), but the underlying trend is structural. Anthropic positioned itself as the ethical alternative, backed it up with a no-ads promise and free memory for all users, and captured a moment. Whether that moment becomes a movement depends on whether Claude can retain those eleven million daily users.

GPT-5.4 is a legitimate frontier model. Gemini 3.1 Pro is a benchmark monster. Mistral Small 4 is the best open-source value proposition in the market. Perplexity Computer is the most ambitious multi-model product anyone has shipped. The field has never been stronger.

But the conversation is no longer just about capability. It is about trust, positioning, and what you are willing to tolerate from the company behind the model. OpenAI's ads and Pentagon deal. Grok's CSAM lawsuit. DeepSeek's censorship and IP theft accusations. These are not technical issues — they are values issues. And in March 2026, values moved the market more than benchmarks did.

The Updated Rankings

The full rankings — including the "Who Powers What" ecosystem map, the political bias table, and links to every model — live at johncderrick.com/ai-models. I update it monthly.

If you want to stop reading about these models and start building real systems with them, the Prompt Library has ten copy-paste prompts for AI assistants, email triage, knowledge bases, CRMs, and more. The prompts work across Claude, ChatGPT, and Gemini.

The Protocol: I use these models every day, all of them. The rankings are not theoretical — they are operational. Claude builds my software. ChatGPT handles my overflow. Gemini brainstorms with me. Perplexity does my research. The full rankings live at johncderrick.com/ai-models and the prompts to put them to work live at johncderrick.com/prompts. Check the dates. If they are current, so are the opinions.
