The State of the AI Models — My Rankings as of February 2026

Log Entry: 2026-02-20 | Subject: AI, Models, Rankings, Claude, ChatGPT, Gemini, Strategy

I published my AI Models Ranking page a few weeks ago and it has become the most visited page on this site after the homepage. People want to know what someone who actually uses these tools every day thinks about them. Not a benchmark chart. Not a press release. An opinion earned through daily use.

So here is where things stand in February 2026. Seven models. Four tiers. One clear winner — and one that earned itself a "Nope."


The S Tier: Claude and ChatGPT

Claude is my clear number one. Opus 4.6 handles the hard thinking. Sonnet 4.6 now beats last generation's Opus for the majority of daily coding work at a third of the price. This entire site (every page, every post, every deployment) was built with Claude Code. When I say "daily use," I mean I cannot do my job without it.

ChatGPT is a close second. GPT-5.2 is the default now and it is genuinely good: 80% fewer hallucinations than o3, plus automatic reasoning routing. The cross-conversation memory is still something nobody else has matched. GPT-5.3-Codex is excellent for code. The ecosystem around OpenAI is massive and that matters. If Claude disappeared tomorrow, I would land on ChatGPT and lose maybe 15% of my output. That is how close they are.

The gap between S Tier and everything else is real. These two models handle reasoning and coding at a level that the rest of the field has not reached. The writing quality follows from that. If you are only going to pay for one AI subscription, it should be one of these two.


The A Tier: Gemini and Llama

Gemini is my brainstorming partner. When I need to riff on ideas or generate images, that is where I go. Gemini 3 Flash is now the default and it offers serious reasoning at speed. The Google ecosystem integration is a genuine advantage: it can pull context from your entire Google Workspace (email, calendar, Drive) without you having to copy-paste anything. The weak spot is coding. Gemini is not where I go to build software.

Llama is the open-source champion and the reason the entire industry stays honest. Llama 4 went mixture-of-experts and natively multimodal. Scout fits on a single H100. Maverick beats GPT-4o on most benchmarks. Behemoth — two trillion parameters — is still training. I do not use Llama directly most days, but I benefit from it constantly. Every time a startup builds something useful on top of an open model, that is Llama's legacy.


The B Tier: Mistral and Perplexity

Mistral is Europe's answer and they punch above their weight. Large 3 is a 675-billion-parameter mixture-of-experts model at $0.50 per million tokens. That pricing is borderline irrational. Devstral 2 and Codestral are solid coding models. Magistral added chain-of-thought reasoning. If I were building a product and needed to keep API costs low, Mistral would be on my shortlist.
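
To put that price in perspective, here is a minimal Python sketch of the arithmetic. The $0.50 figure is the one quoted above. The 40-million-token monthly workload and the single flat rate for all tokens are my assumptions for illustration, not a description of Mistral's actual billing.

```python
def api_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Return the USD cost of processing `tokens` at a flat per-million-token rate."""
    return tokens / 1_000_000 * price_per_million_usd


# Price quoted in this post for Mistral Large 3.
# Treating it as one flat rate for all tokens is an assumption.
MISTRAL_LARGE_3_PRICE = 0.50

# Hypothetical workload: 40 million tokens per month.
monthly_tokens = 40_000_000

monthly_cost = api_cost_usd(monthly_tokens, MISTRAL_LARGE_3_PRICE)
print(f"${monthly_cost:.2f} per month")  # prints "$20.00 per month"
```

Swap in your own token volume and any other provider's per-million rate to compare. At this price point, API spend stays small even at volumes that would be a real line item elsewhere.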

Perplexity is not a traditional model, but it is the best AI-powered search experience available. Sonar runs at 1,200 tokens per second on Cerebras hardware. Reasoning Pro and Deep Research modes are genuinely useful. When I need to research a topic quickly and want sourced answers instead of confident fabrications, Perplexity is where I go. It is my number one for research.


The Nope Tier: Grok

Grok earned its own tier and not in a good way.

xAI ships fast — four major versions in a year. Grok 4.20 just dropped with multi-agent collaboration. But SpaceX had to acquire xAI to keep the lights on. Musk is claiming 10% odds that Grok 5 achieves AGI, which is exactly the kind of claim you make when you need investor attention more than you need credibility. The $300-per-month Heavy tier is aggressive pricing for a model that has not demonstrated $300 worth of capability over the free alternatives.

And then there is the differentiation strategy. "Less woke" is a marketing angle, not a technical advantage. The research backs this up — Grok has the highest extremism rate at 67.9%, swinging wildly between far-left and far-right responses. Promptfoo called it "contrarian rather than ideological." It is not less biased. It is less predictable. Those are different things.


What Changed This Month

The biggest shift in February was not a new model launch. It was the ecosystem map.

Apple officially partnered with Google to put Gemini under the hood of next-generation Siri, targeting spring 2026. That means Gemini will power Android, Pixel, Samsung Galaxy AI, Google Assistant — and now Apple's entire device lineup on top of that. That is market penetration that no benchmark can compete with.

Meanwhile, Claude powers Amazon Alexa+ through Bedrock. OpenAI powers Microsoft Copilot across the entire M365 stack. Meta runs Llama across every platform it owns — WhatsApp, Instagram, Messenger, Facebook — three billion potential users on an open-source model.

The AI models are no longer just chatbots you visit in a browser tab. They are the invisible engines behind the products you already use. The ranking matters less for the chatbot experience and more for which model is making decisions on your behalf inside systems you did not even know were AI-powered.


The Bias Question

I added a political bias section to the ranking page because I think political bias matters and most people do not realize it exists. Every major AI model has a measurable political lean. Multiple peer-reviewed studies have mapped them.

The short version: ChatGPT leans furthest left. Claude is measured as the most centrist. Gemini is moderate left. Llama leans right relative to the field. Perplexity trends libertarian. Grok is chaotic — high extremism rate with no consistent ideology. All of them lean left on economics. No study has found a consistently conservative AI among industry leaders.

This is not a political argument. It is a calibration tool. If you know the bias of your instrument, you can correct for it. If you do not, you are trusting a system you do not fully understand. That should bother more people than it does.


Where I Think This Goes

The two-horse race at the top — Claude and ChatGPT — is going to define 2026. Both are improving fast. Both have massive enterprise adoption. Both are embedded into ecosystems that create lock-in.

The open-source tier — Llama and Mistral — will keep the closed-source leaders honest on pricing and capability. Behemoth, when it ships, could reshape the landscape entirely.

Perplexity will keep owning the search-adjacent space until Google figures out how to make Gemini's search integration feel less like a search engine and more like a research partner.

And Grok will keep shipping fast and making bold claims while charging premium prices for a product that has not earned premium trust. That might change. But February 2026 is not the month it changes.

If you want to stop treating these models like search engines and start building real systems with them, I also published a Prompt Library — ten copy-paste prompts for building AI assistants, email triage, knowledge bases, CRMs, and more. The prompts work across Claude, ChatGPT, and Gemini. That is where the rankings stop being theoretical and start being operational.

The Protocol: I use most of these models every day. The rankings are not theoretical; they are operational. Claude builds my software. ChatGPT handles my overflow. Gemini brainstorms with me. Perplexity does my research. The full rankings live at johncderrick.com/ai-models and the prompts to put them to work live at johncderrick.com/prompts. Check the dates. If they are current, so are the opinions.
End Log.