ChatGPT / OpenAI
GPT-5.4, o4-mini/high · 200K ctx
~68%
traffic share
CTX: 200K · Cost: $$$ · General Enterprise · Software Dev
Strengths
Most consistent all-purpose model across coding, reasoning, and long-context tasks
Massive ecosystem: 900M weekly users, Copilot integration, API
Best tool-chaining and autonomous agent workflows
Codex agent for software development tasks
Strongest brand recognition and developer adoption
Weaknesses / Gaps
Highest cost at scale — premium pricing across all tiers
Cautious refusals on sensitive/abstract queries
Market share declining (87%→68%) as competitors close gap
Proprietary lock-in: can't self-host or fine-tune
Occasional hallucination issues in fast-moving domains
Strongest Verticals
General Enterprise · Software Dev · Marketing Automation · Customer-Facing Chatbots
Gemini / Google
Gemini 3 Pro, Flash · 1M ctx
~18%
traffic share
CTX: 1M · Cost: $$ · Multimodal · Google Workspace
Strengths
Best multimodal processing — text, image, audio, video in one pipeline
1M token context window — processes entire codebases/books
Unmatched distribution: Search, Workspace, Android, Chrome
Flash model leads real-time workloads at lowest latency/cost
Deepest integration with enterprise productivity tools
Weaknesses / Gaps
Growth driven by distribution more than model quality alone
Creative writing and nuanced instruction-following lag behind
Enterprise trust still building vs. OpenAI's head start
Complex pricing tiers across Google Cloud ecosystem
Less developer loyalty outside Google-native shops
Strongest Verticals
Multimodal Content · Google Workspace Orgs · Real-Time Dashboards · Education
Claude / Anthropic
Opus 4.6, Sonnet 4.6, Haiku 4.5 · 200K ctx
~3%
traffic share
CTX: 200K · Cost: $$ · Regulated Enterprise · Software Eng
Strengths
Best-in-class deep analysis, long-doc reasoning, multi-step logic
Constitutional AI: strongest safety/alignment for regulated industries
Superior coding performance (Opus 4.5 benchmark breakthrough)
Extended thinking with tool-use for complex agentic workflows
MCP protocol — Google adopted it for AI agent interoperability
Weaknesses / Gaps
Low consumer market share — stuck in single digits
Smaller context window (200K) vs. Gemini/Llama
Less multimodal capability (no native video/audio generation)
Brand awareness significantly trails OpenAI and Google
Limited distribution — no OS/browser/productivity suite tie-in
Strongest Verticals
Regulated Enterprise (Finance, HC, Gov) · Legal Analysis · Software Engineering · Research & Analysis
Grok / xAI
Grok 4.1 · 1M ctx
~3%
traffic share
CTX: 1M · Cost: $$ · Social Analytics · News/Sentiment
Strengths
Real-time access to X/Twitter social data and trending conversations
1M token context window matching Gemini
Strong for breaking news, sentiment analysis, cultural pulse
Less content filtering — more permissive for edgy/creative use
Growing steadily through X platform integration
Weaknesses / Gaps
Ecosystem entirely dependent on X/Twitter — narrow distribution
Smaller developer community and API adoption
Enterprise credibility concerns tied to Musk's brand volatility
Weaker on structured enterprise tasks vs. competitors
Limited third-party integrations and tooling
Strongest Verticals
Social Media Analytics · News/Journalism · Marketing & Brand Monitoring · Cultural Trend Analysis
The open-source gap has closed. Open-weight models now trail proprietary frontier models by only ~3 months on average. DeepSeek, Llama, Qwen, and Mistral all compete with or beat GPT-5 and Claude on specific benchmarks.
DeepSeek
V3.2, R1 · China (Hangzhou) · MIT License
Reasoning powerhouse
671B MoE model (37B active). R1 scored 97.3% on MATH-500 — highest of any open model. Gold medal at IMO 2025, IOI 2025, ICPC World Finals. Trained at dramatically lower cost than industry norms. V3.2 is the first model to integrate thinking directly into tool-use workflows.
Best For
Complex reasoning, math, competitive coding, legal analysis
Risks / Gaps
Geopolitical concerns for US enterprise; verbose outputs; China-based data handling questions
Meta Llama
Llama 4 Scout/Maverick · USA · Community License
Open-weight ecosystem king
Scout: 10M token context window (industry-leading — 7,500+ pages). Maverick: 400B MoE for quality. Largest open-source community. Runs on consumer to enterprise hardware with multiple size options.
Best For
Self-hosted enterprise, long-document analysis, privacy-first deployments, fine-tuning
Risks / Gaps
700M MAU license cap and EU restrictions; requires infrastructure expertise to self-host
Alibaba Qwen
Qwen 3.5, Qwen3-Next · China · Apache 2.0
Multilingual & cost champion
1T+ parameters via MoE. Supports 119–201 languages. 92.3% on AIME25. Integrates vision and language for unified multimodal reasoning. Scores 74.1% on LiveCodeBench v6. 8.6×–19× higher decoding throughput vs previous gen.
Best For
Multilingual deployments, RAG pipelines, coding, global enterprise summarization
Risks / Gaps
Smaller Western dev community; compliance concerns for US-regulated industries; API stability
Mistral AI
Large 3, Small 4 · France · Apache 2.0
EU data sovereignty & speed
80+ languages. Small delivers 90% of Large's performance at 1/8 the cost. Building "Mistral Sovereign" for air-gapped government networks. Fastest inference at 7B scale. Strong French, German, Spanish NLP.
Best For
EU-regulated industries, real-time chatbots, GDPR-compliant deployments, European NLP
Risks / Gaps
Larger compute than Llama at comparable sizes; smaller global community; niche positioning
Perplexity
Multi-model AI search platform · USA
Citation-first AI search
Not a single LLM — an AI-native search platform combining multiple models with retrieval systems. Every answer includes citations and source links. Strong adoption among knowledge workers.
Best For
Competitive research, academic analysis, journalism, market intelligence, fact-checking
Risks / Gaps
Can't fine-tune or self-host; dependent on underlying model providers; not a standalone LLM
Xiaomi MiMo
MiMo-V2-Flash · China
Agentic coding at exceptional efficiency
Outperforms DeepSeek-V3.2 and Kimi-K2 on SWE benchmarks at 1/2–1/3 the parameters. 150 tok/sec. $0.10/M input tokens. Trained for agentic tool-calling via Multi-Teacher Online Policy Distillation (MOPD).
Best For
High-throughput code agents, tool-use workflows, cost-sensitive production inference
Risks / Gaps
Very new entrant; limited enterprise track record; China-origin concerns for regulated buyers
Vertical alignment matters more than benchmarks. Domain-specific APIs in banking and healthcare are already displacing generic models by reducing hallucination risk and easing compliance.
Healthcare
Claude · GPT-5
Regulated environment demands safety alignment + reasoning depth. Healthcare LLM market growing at 25.95% CAGR. Constitutional AI and enterprise compliance are table stakes.
Financial Services
Claude · DeepSeek
Multi-step reasoning for risk analysis, compliance docs. Domain-specific APIs reducing hallucination risk. DeepSeek's math prowess relevant for quantitative analysis.
Legal
Claude · Llama 4
Long-context document analysis (10M tokens for Llama Scout). Claude's reasoning for contract review and case law. Privacy-first self-hosting critical for client data.
Software Engineering
GPT-5 · Claude · DeepSeek
Codex agent, Claude Code, DeepSeek's competitive coding dominance. Agentic workflows are the differentiator. MiMo emerging as efficiency leader.
Marketing & Content
GPT-5 · Gemini
GPT's brand + Gemini's multimodal power. Integration with Workspace/Copilot for content at scale. Gemini's video understanding opens new creative workflows.
EU / Gov (Data Sovereignty)
Mistral · Llama 4
Apache 2.0/self-hosted. Mistral Sovereign designed for air-gapped government networks. EU AI Act compliance requires regionally trained or on-premise models.
Multilingual / Global
Qwen · Mistral
Qwen covers 119–201 languages. Mistral strong in European languages. Both open-weight with permissive licensing for global deployment.
Social / Media Intelligence
Grok · Perplexity
Grok's real-time X data for sentiment and trends. Perplexity's citation-first research for journalism and competitive intelligence.
Platform Engineering / DevPortals
Claude · GPT-5 · Llama 4
Agentic AI governance, MCP protocol for agent interop, self-hosted options for IDPs. The diversity of models is exactly why governance platforms matter.
Different roles need different models. When you're in a deal, you're not selling to "an industry" — you're selling to a VP of Engineering, a CISO, a CMO, and they each evaluate AI through a completely different lens.
CTO / VP Engineering
Claude · GPT-5 · Llama 4
Cares about agentic workflows, code quality, and developer productivity. Claude Code and Codex are the key differentiators. Llama 4 for self-hosted control and fine-tuning. Evaluates on: benchmark performance, context window, tool-use reliability, and MCP support.
CISO / Security
Claude · Mistral · Llama 4
Constitutional AI and safety alignment are table stakes. Mistral Sovereign for air-gapped gov networks. Llama for on-prem, no-data-leaves-the-building deployments. Evaluates on: data residency, PII handling, audit logging, and compliance certifications.
CMO / Marketing
GPT-5 · Gemini
Content at scale: copywriting, campaign generation, multimodal assets. Gemini's video/image understanding opens new creative workflows. GPT-5 has the strongest brand and broadest integration with marketing tools via Copilot. Evaluates on: creative quality, multimodal support, speed, and integration with existing stack.
Data Science / ML
DeepSeek · Qwen · Llama 4
Fine-tuning and self-hosting are non-negotiable for serious ML teams. DeepSeek's reasoning dominates math/science benchmarks. Qwen's Apache 2.0 license means zero restrictions. Evaluates on: fine-tuning flexibility, benchmark performance on domain tasks, cost per experiment, and open weights.
Developer / Platform Eng
Claude · GPT-5 · DeepSeek
Coding benchmarks matter here: Claude Code, Codex, and DeepSeek all lead. MCP protocol support is becoming a differentiator for agent interop. The IDP/AEP layer governs which models developers can access and how they're routed. Evaluates on: code generation quality, IDE integration, agentic tool-use, and API reliability.
Legal / GRC
Claude · Llama 4
Long-context document analysis is critical — Llama 4 Scout's 10M tokens can process entire case files. Claude's multi-step reasoning excels at contract review and regulatory analysis. Evaluates on: reasoning accuracy, hallucination rate, context window, and privacy controls.
Exec / Strategy
Perplexity · GPT-5 · Claude
Executives want synthesized intelligence, not raw output. Perplexity's citation-first research is ideal for board prep and competitive intel. GPT-5 and Claude for strategic synthesis across long documents. Evaluates on: accuracy of synthesis, source attribution, speed, and polish of output.
How token pricing works: LLMs charge per million tokens (roughly 750K words of English text). Every price shown is input / output — what you send vs. what the model generates. Output costs more because each token is generated sequentially, one forward pass at a time, while input is processed in parallel. The cost driver usually isn't the model's answer — it's the context you send with every request.
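The input/output split makes the arithmetic worth spelling out. A minimal sketch of the per-request math (prices below are illustrative, not tied to any one provider):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Cost in dollars for one request, given per-million-token prices."""
    return input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out

# A typical RAG query: 50K tokens of context in, 1K tokens of answer out.
# At $3/M input and $15/M output, the context dominates the bill.
cost = request_cost(50_000, 1_000, price_in=3.0, price_out=15.0)
print(f"${cost:.3f}")  # → $0.165 (context: $0.150, answer: $0.015)
```

Even though output is priced 5x higher here, the 50:1 token ratio means input is ~90% of the spend — which is why context trimming and caching matter more than model choice for many workloads.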
Frontier tier
$5–$180/M output
Claude Opus 4.6: $5 in / $25 out · GPT-5.4 Pro: $30 in / $180 out · GPT-5.2: $1.75 in / $14 out

Use sparingly — complex reasoning, mission-critical analysis, high-stakes legal/medical review. A single Opus conversation with full 200K context costs ~$4.50. Run that 100x/day = $13,500/month on one workflow.
Mid tier
$1.25–$15/M output
Claude Sonnet 4.6: $3 in / $15 out · GPT-5.4: $2.50 in / $10 out · GPT-4o: $2.50 in / $10 out · Gemini 3.1 Pro: $2 in / $12 out · Grok 3: $3 in / $15 out

The production workhorse tier. Handles 80%+ of real enterprise tasks — coding, analysis, content generation, customer support.
Budget tier
$0.50–$5/M output
Claude Haiku 4.5: $1 in / $5 out · GPT-5.4 Mini: $0.75 in / $3 out · Gemini 3 Flash: $0.50 in / $3 out · Mistral Small: $0.20 in / $0.60 out

High-volume workloads: chatbots, classification, content moderation, bulk processing. Flash and Haiku deliver 90% of frontier quality at 10% of the cost.
Cheapest capable
$0.075–$0.42/M output
DeepSeek V3.2: $0.14 in / $0.28 out · Gemini Flash-Lite: $0.075 in / $0.30 out · Gemini 2.0 Flash: $0.10 in / $0.40 out

100x cheaper than frontier. Genuine alternatives for RAG, summarization, and routine queries. DeepSeek is MIT licensed; Gemini has a generous free tier.
Self-hosted open models
$0 per token
Llama 4, DeepSeek, Qwen, Mistral — free to download, but GPU infrastructure costs $0.50–$5/hour depending on model size. Running Llama 70B requires 2x A100 GPUs (~$2,160/month). Eliminates per-token fees at scale; 5–25x cheaper than API equivalents for high-volume use cases.
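Whether self-hosting pays off comes down to a break-even volume: fixed GPU cost divided by the API price you would otherwise pay. A sketch using the figures above as assumptions (the $4/M blended API price is illustrative):

```python
def breakeven_tokens(gpu_monthly_usd: float, api_price_per_m: float) -> float:
    """Monthly token volume at which fixed GPU cost equals API spend."""
    return gpu_monthly_usd / api_price_per_m * 1e6

# 2x A100 at ~$2,160/month vs an assumed blended API price of $4/M tokens.
tokens = breakeven_tokens(2_160, 4.0)
print(f"{tokens / 1e6:.0f}M tokens/month")  # → 540M tokens/month
```

Below that volume the API is cheaper; above it, the fixed GPU cost amortizes and self-hosting wins — before counting the engineering time the card above warns about.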
💡
The 70/20/10 rule
Best practice at scale: route 70% of queries to budget models, 20% to mid-tier, and 10% to frontier. This tiered approach reduces average per-query cost by 60–80%. Stack with prompt caching (~90% savings on repeated inputs) and batch APIs (50% off async workloads) for maximum efficiency. Realistic enterprise budgets should be ~1.7x base token cost to account for growth, infrastructure overhead, and experimentation.
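The 70/20/10 savings claim can be checked with rough tier prices. The numbers below are assumptions, loosely midpoints of the pricing tables above:

```python
# Blended per-million-token output cost under a 70/20/10 routing split.
# Tier prices are illustrative midpoints, not quotes from any provider.
TIERS = {            # name: (traffic share, $/M output tokens)
    "budget":   (0.70, 3.0),
    "mid":      (0.20, 12.0),
    "frontier": (0.10, 60.0),
}

blended = sum(share * price for share, price in TIERS.values())
frontier_only = 60.0
print(f"blended ${blended:.2f}/M vs frontier-only ${frontier_only:.2f}/M")
# → blended $10.50/M vs frontier-only $60.00/M
```

That is an ~82% reduction versus routing everything to a frontier model — consistent with the 60–80% range cited above, before caching and batch discounts are stacked on top.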
LLM gateways are the new enterprise infrastructure layer. They sit between your applications and LLM providers, handling intelligent routing, automatic failover, cost governance, and security — so teams don't call APIs directly.
What LLM gateways do
Routing
Classify task complexity, send simple queries to cheap models, complex ones to frontier — automatically
Failover
If OpenAI goes down, traffic switches to Anthropic or Google instantly — 99.99% uptime
Cost controls
Per-team budgets, dollar-based quotas, semantic caching to eliminate redundant calls
Governance
RBAC, PII sanitization, prompt templates, audit logs, compliance automation
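The routing-plus-failover loop above can be sketched in a few lines. Everything here is hypothetical: the keyword classifier, the provider names, and the call stub stand in for real SDK calls, health checks, and learned complexity routers.

```python
import random

def call(provider: str, model: str, prompt: str) -> str:
    """Hypothetical provider call; a real gateway wraps actual SDKs."""
    if provider == "openai" and random.random() < 0.01:
        raise ConnectionError("provider outage")   # simulated downtime
    return f"[{provider}/{model}] answer"

def route(prompt: str) -> list[tuple[str, str]]:
    """Crude complexity classifier; returns providers in preference order."""
    complex_task = len(prompt) > 500 or "analyze" in prompt.lower()
    if complex_task:     # frontier model first, with a failover peer
        return [("openai", "frontier"), ("anthropic", "frontier")]
    return [("google", "flash"), ("anthropic", "haiku")]

def gateway(prompt: str) -> str:
    for provider, model in route(prompt):  # try in order; fail over on error
        try:
            return call(provider, model, prompt)
        except ConnectionError:
            continue
    raise RuntimeError("all providers down")

print(gateway("What time is it in Tokyo?"))  # → [google/flash] answer
```

The point of the pattern: applications call `gateway()` once, and routing policy, failover order, and budget checks change behind that interface without touching application code.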
Gateway landscape
Portkey AI
Enterprise gateway + orchestration
Multi-modal LLM support with fine-tuning
Enterprise-grade AI gateway connecting developers to multiple LLMs. Intelligent routing, failover, cost optimization. Supports text, image, audio, and vision models. Active cost management across providers.
TrueFoundry
Orchestration + governance + GPU management
Full-stack AI platform for enterprises
Combines orchestration, governance, and scalability. MCP and Agents Registry for centralized tool management. GPU orchestration with up to 80% higher utilization. Prompt lifecycle management with versioning and monitoring.
Kong AI Gateway
API management extended for AI
Semantic routing + PII sanitization
Extends Kong's API management platform to LLM routing. Semantic caching saves on redundant prompts. Built-in PII stripping and prompt guards. Pre-built dashboards for AI-specific analytics. Best for teams already on Kong.
Bifrost
Open-source, high-performance, Go-based
11μs overhead per request
Open-source AI gateway built in Go. 20+ providers through a single OpenAI-compatible API. MCP integration, Prometheus metrics, distributed tracing. Drop-in replacement — switch from direct SDK calls with one line of code.
Cloudflare AI Gateway
Managed, edge-deployed, zero infrastructure
Global edge + unified billing
Leverages Cloudflare's global edge network. Request caching, rate limiting, usage analytics. Unified billing across OpenAI, Anthropic, and Google. No infrastructure setup required — dashboard-managed.
OpenRouter + LiteLLM
Lightweight multi-model access
Fastest path to multi-model
OpenRouter: single API key for 100+ models, automatic fallback, unified billing. Best for quick multi-model access. LiteLLM: open-source Python proxy translating all providers to OpenAI-compatible format. Virtual keys, spend tracking per team. Best for Python-centric prototyping.
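The "OpenAI-compatible" pattern that LiteLLM, Bifrost, and OpenRouter all share boils down to one request shape. A sketch of that shape — the gateway URL and model names below are placeholders, not real endpoints:

```python
import json

# Any OpenAI-compatible gateway accepts the same chat-completions body;
# only the model string selects the underlying provider.
GATEWAY_URL = "http://localhost:4000/v1/chat/completions"  # hypothetical deployment

def chat_request(model: str, prompt: str) -> str:
    """Build the JSON body an OpenAI-compatible endpoint expects."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })

# Swapping providers is a one-string change, with no SDK or code changes:
for model in ("gpt-5", "claude-sonnet-4.6", "gemini-3-flash"):
    body = chat_request(model, "Summarize this ticket.")
```

This single request shape is what makes "drop-in replacement" claims possible: point an existing OpenAI client at the gateway's base URL and every provider behind it becomes reachable.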
🏛
The IDP layer: governance on top
The LLM gateway is infrastructure. The Internal Developer Portal is the governance and developer experience layer on top. It's what turns a multi-model mess into a managed platform: model catalogs, RBAC policies, self-service workflows, compliance audit trails, budget controls by team, and MCP/agent governance. Teams can switch models, introduce fine-tuned versions, or enforce new governance rules — without modifying application code. The diversity of models is exactly why the governance platform matters.