State of AI Agents and Agentic Engineering 2026 | Metavert
Slide 1
The State of AI Agents & Agentic Engineering By Jon Radoff Web: https://metavert.io 2026 X: @jradoff ©2026 Metavert LLC Licensed under Creative Commons Attribution 4.0.
Slide 2
THE $211B QUESTION
Agentic AI: systems that autonomously plan, use tools, and execute multi-step tasks with minimal human oversight.
$211B: AI venture funding (2025), 50% of all global VC
vs. 6% of orgs that see >5% EBIT impact (McKinsey, 2025)
6×: Output gap, top-quartile AI users vs. average (OpenAI usage analytics, public blog, Sep 2025)
+67%: Merged PRs per engineer per day at Anthropic after Claude Code (eng. metrics, public blog, 2025)
14.5hr: Autonomous task horizon (METR benchmark, Feb 2026), doubling every 123 days
The 6% is a snapshot of early adoption. The outliers above show where the curve is heading, and that curve doubles every four months. This deck is about the gap between the 6% average and the outliers, and why that gap is closing on an exponential curve.
Source: McKinsey 2025; METR Feb 2026; OpenAI; Anthropic; Base44/Wix; Crunchbase
Slide 3
Where does agentic AI actually stand in February 2026?
1. How much money is flowing, and to whom?
2. What can the technology really do?
3. What's actually changing in how we build, create, and work?
Slide 4
ROADMAP
01 The Money
02 The Technology
03 The Evolution
04 Creator Economy
05 Who Is Winning
06 Infrastructure Wars
07 Direct from Imagination
08 Beyond Engineering
09 Robotics & Embodied
10 Spatial Computing
11 Machine Societies
12 Slop & Quality
13 The Deeper Numbers
14 The Reckoning
15 What Comes Next
Slide 5
“ The future is already here — it's just not evenly distributed. — William Gibson
Slide 6
SECTION 1: The Money
$211B in AI venture capital, 50% of all global VC. SpaceX-xAI: largest merger in history at $1.25T.
Slide 7
Global AI Spending: The $1.5 Trillion Year
$1.5T+: Total global AI spend, 2025
Enterprise AI: $337B+ on AI software and services
Infrastructure: $580B+ on chips, data centers, power
AI VC alone: $211B, more than all VC in 2020
[Chart: 2025 AI spend by category ($B): Enterprise Software, Infrastructure & Hardware, AI Services & Consulting, Venture Capital, Govt & Defense AI; bar values shown: 580, 420, 210, 211, 85]
AI spending now exceeds the GDP of all but ~30 countries. Infrastructure is the largest slice: atoms, not just bits.
Source: IDC AI Spending Guide, Oct 2025; Gartner IT Forecast Q4 2025; Crunchbase. Note: totals combine software/services + infrastructure; analysts define 'AI spend' differently.
Slide 8
AI as Share of All Venture Capital: 15% → 50%
2025: $211B, 50% of all global VC (vs. 15% in 2020, 30% in 2023, 42% in 2024)
[Chart: AI venture funding by year ($B), 2018 to 2025; bar values shown: 211, 150, 95, 75, 62, 38, 42, 35]
Half of all venture capital on Earth now flows to AI. No technology has ever captured this share of investment this quickly.
Source: Crunchbase Annual VC Report 2025; PitchBook NVCA Venture Monitor Q4 2025. Note: definitions of 'AI company' vary across data sources.
Slide 9
The Mega-Deals: Where the Money Went
OpenAI: $40B raise, $300B val (largest private raise ever)
Anthropic: $8B+, $60B val (cumulative through 2025, combined)
SpaceX-xAI: $1.25T val (largest merger ever, Feb 2026)
CoreWeave: $7.5B, $35B val (GPU cloud infrastructure)
Databricks: $10B, $62B val (Series J, Dec 2024)
Figure AI: $1.5B, $39.5B val (humanoid robotics)
Top 10 rounds = $100B+: nearly half of all AI VC went to fewer than a dozen companies.
AI mega-deals are reshaping entire industries: SpaceX-xAI's $1.25T merger is the largest in history.
Source: Crunchbase; PitchBook; CNBC (SpaceX-xAI, Feb 2026); company releases. Note: valuations mix primary rounds, secondary, and mergers.
Slide 10
Revenue Reality: Who's Actually Making Money?
[Chart: AI revenue by company ($B) for OpenAI, Google Cloud AI, AWS AI Services, Anthropic, Cursor, Mistral; bar values shown: 13, 11, 9, 3, 1, 0]
OpenAI: $12.7B ARR, nearly 4x growth from $3.4B (2024). But still unprofitable: $5B net loss in 2024.
Cursor: $0 → $1B in 24 months, entirely organic. Fastest B2B SaaS ever.
Revenue is real and growing fast. Profitability isn't the goal yet: the industry is plowing capital into capabilities at a pace that defers the profitability question.
Source: The Information (OpenAI financials, Dec 2025); Anthropic investor materials; Yahoo Finance (Cursor); company filings
Slide 11
The Coding Agent Revenue Explosion
Cursor: $1B ARR, 24 months
Claude Code: $500M+, ~12 months
Replit: $250M, ~12 months
Lovable: $200M, 8 months
Codex (OpenAI): in $12.7B (built into ChatGPT)
100K+ new products built daily on AI-native platforms
76% of devs using AI for code (Stack Overflow, 2025)
These aren't SaaS growth curves. They're something new: viral adoption at consumer speed, with enterprise revenue.
Source: Sacra (Cursor, Replit, Bolt.new); CNBC (Lovable Feb 2026); TechCrunch; Stack Overflow Developer Survey 2025
Slide 12
Enterprise AI Spending Surge: $337B+
72% of organizations now use AI in at least one business function
65% of orgs report increased AI spending vs. last year (Gartner)
[Chart: AI budget by function: IT / Engineering (28%), Marketing (18%), Operations (15%), Customer Svc (12%), R&D (11%), Other (16%)]
The shift: AI spending moving from "innovation budgets" to operational line items. CFOs now own AI spend.
AI has crossed the chasm from R&D experiment to operational necessity. IT leads, but marketing and operations are the fastest-growing AI budgets.
Source: McKinsey "The State of AI" Global Survey, Dec 2025; Gartner IT Spending Forecast Q4 2025; IDC AI Tracker
Slide 13
The Agentic AI Market: From $7B to $93B
65.5% CAGR 2025–2030. Fastest-growing segment in enterprise software.
[Chart: agentic AI market size ($B): 2024: 4, 2025: 7, 2026E: 13, 2027E: 24, 2028E: 43, 2029E: 65, 2030E: 93]
Agentic AI is growing at 65% CAGR, faster than cloud computing, mobile, or SaaS did at equivalent stages. The $93B target assumes current adoption accelerates.
Source: MarketsAndMarkets, "Agentic AI Market" Report, 2025; Gartner Hype Cycle for AI 2025
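As a quick check on the growth arithmetic, here is a minimal sketch of the compound annual growth rate (CAGR) formula applied to the slide's rounded endpoints ($7B in 2025, $93B in 2030). The function names are illustrative, not from any cited report, and the rounded chart values will not reproduce the report's 65.5% exactly.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> list[float]:
    """Year-by-year values when growing `start_value` at a constant `rate`."""
    return [start_value * (1 + rate) ** n for n in range(years + 1)]


# Rounded endpoints from the slide, in $B (2025 and 2030).
implied_rate = cagr(7, 93, 5)

# Growing the rounded 2025 value at the quoted 65.5% CAGR for five years.
projected_2030 = project(7, 0.655, 5)[-1]

print(f"Implied CAGR from rounded endpoints: {implied_rate:.1%}")
print(f"$7B grown at 65.5% for 5 years: ${projected_2030:.0f}B")
```

The rounded endpoints imply a rate a little above the quoted figure, which is what one would expect when the chart values are rounded to whole billions.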
Slide 14
The Developer Productivity Signal
26%: Code completion rate (GitHub Copilot)
55%: Faster task completion (Google internal study)
41% of all code is now AI-generated
82% of developers use AI tools daily/weekly
The productivity signal is real but unevenly distributed. Gains are largest for greenfield creation and less experienced builders.
Source: GitHub Copilot Impact Study 2025; Google "AI-Assisted Software Engineering" internal report, Sep 2025; Stack Overflow Dev Survey 2025; GitClear Code Analysis 2025
Slide 15
KEY TAKEAWAYS
$211B in AI venture capital, 50% of all global VC. $1.5T in total AI spending.
$93B agentic AI market projected by 2030 at 65.5% CAGR.
Coding agents growing at unprecedented speed: Cursor $0 → $1B in 24 months.
The bet is enormous. Now let's see what the technology can actually do.
Slide 16
“ Swiftly, swiftly, the sphere of mentality expanded. — Olaf Stapledon, Star Maker, 1937
Slide 17
SECTION 2: The Technology
1,000× cost reduction in 4 years, and what it means for agents
Slide 18
Cost Curves: The Deflationary Spiral
92% cost reduction since GPT-4 launch (Mar 2023 to today)
$0.10/M: cheapest viable model (GPT-4o-mini / DeepSeek)
DeepSeek effect: Open-source models at $1.50/M match frontier quality, forcing price competition across the board.
[Chart: Frontier Model vs. Budget Model price ($/M input tokens), Mar 23 to Feb 26]
The cost of intelligence is plummeting faster than any technology curve in history. What cost $30/M tokens in 2023 now costs $0.10-2.50. This enables entirely new use cases.
Source: OpenAI API pricing history; IntuitionLabs token cost analysis 2026; DeepSeek pricing; Anthropic pricing
Slide 19
The Exponential Nobody Expected: AI Task Horizons
What METR Measures: METR gives AI agents real software tasks of increasing difficulty and measures: how long a task can the agent complete 50% of the time? This is the '50% time horizon.' Longer = more capable agent.
Why It Matters:
50 min → Autocomplete tool
2 hrs → Junior pair-programmer
14.5 hrs → Ships features solo
1 week → Runs a sprint (projected)
Each doubling = new category of delegation.
Doubling every 4 months (down from every 7 months over 2019-2025)
[Chart: 50% time horizon (hours) by model release, GPT-2 (2019) through Opus 4.6 (Feb 26), with a reference line at 1 work day (8.3 hrs)]
Task horizon is accelerating: doubling every 4 months in 2024-2025. Opus 4.6 crossed the 1-work-day threshold. If the trend holds, week-long autonomous tasks arrive by late 2026.
Source: METR, 'Task-Completion Time Horizons,' Feb 2026; chart adapted from AI Digest (CC-BY); R² = 0.93. Note: 50% time horizon ≠ flawless continuous operation.
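The "week-long tasks by late 2026" projection can be sanity-checked with simple doubling arithmetic. This is a minimal sketch, assuming a "week-long task" means 40 working hours (an assumption, not a METR definition) and that the 123-day doubling time quoted earlier in the deck holds constant.

```python
import math


def doublings_needed(current_hours: float, target_hours: float) -> float:
    """Number of doublings to grow from the current horizon to the target."""
    return math.log2(target_hours / current_hours)


def days_until(current_hours: float, target_hours: float, doubling_days: float) -> float:
    """Days until the target horizon, assuming a constant doubling time."""
    return doublings_needed(current_hours, target_hours) * doubling_days


# Figures from the deck: 14.5-hour horizon (Feb 2026), doubling every 123 days.
# The 40-hour target is an assumed stand-in for "one work week".
days = days_until(current_hours=14.5, target_hours=40.0, doubling_days=123)
print(f"Doublings needed: {doublings_needed(14.5, 40.0):.2f}")
print(f"Days until a 40-hour horizon: {days:.0f}")
```

Under these assumptions the target is about one and a half doublings away, roughly six months out, which is in line with the slide's late-2026 estimate. A slower doubling time would push the date out proportionally.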
Slide 20
Bigger Context Windows Were Not Enough
[Chart: context window size (K tokens): 2023: 4, 2024: 128, 2025: 200, 2026: 1,000]
250× growth, but three limits remain:
1. Attention Degrades: >30% accuracy drop when key info is in the middle of a long context
2. 1,250× Cost Premium: Long-context query: $0.10+ vs. RAG retrieval: $0.00008
3. 45× Slower: Long-context: ~45 sec response vs. RAG: ~1 sec, unusable in agent loops
This is why the industry built agentic architecture instead: RAG for retrieval, tools for action, memory for persistence. Extending task horizons requires systems, not just bigger windows.
Context grew 250× and it wasn't enough. The shift to agents with external tools, not just bigger models, was the direct consequence.
Source: Liu et al., "Lost in the Middle" (Stanford/UC Berkeley, 2023); Elasticsearch Labs cost analysis; Anthropic pricing
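To see why the per-query gap matters inside an agent loop, here is a small sketch that multiplies the slide's per-query cost and latency figures across many steps. The 50-step loop length is an illustrative assumption, and the `QueryProfile` type is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryProfile:
    cost_usd: float   # cost of one query
    latency_s: float  # response time of one query, in seconds


# Per-query figures quoted on the slide.
LONG_CONTEXT = QueryProfile(cost_usd=0.10, latency_s=45.0)
RAG = QueryProfile(cost_usd=0.00008, latency_s=1.0)


def loop_totals(profile: QueryProfile, steps: int) -> tuple[float, float]:
    """Total (cost in USD, latency in seconds) for `steps` sequential queries."""
    return profile.cost_usd * steps, profile.latency_s * steps


steps = 50  # assumed number of retrieval calls in one agent task
for name, profile in (("long-context", LONG_CONTEXT), ("RAG", RAG)):
    cost, seconds = loop_totals(profile, steps)
    print(f"{name}: ${cost:.4f}, {seconds / 60:.1f} min")
```

At this assumed loop length the long-context path spends tens of minutes waiting on responses, while the retrieval path stays under a minute, which is the practical meaning of "unusable in agent loops".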
Slide 21
SWE-Bench: Software Engineering Leaderboard
Claude Opus 4.5: 80.9% (current best; up from 33% just 18 months ago)
Claude Opus 4.6: 80.8%
MiniMax M2.5: 80.2%
GPT-5.2: 80%
Sonar Agent: 79.2%
GLM-5 (Zhipu): 77.8%
Kimi K2.5: 76.8%
DeepSeek R1: 49.2%
The top 5 are clustered within 1.7 points. The leaderboard reshuffles monthly. Chinese models (MiniMax, GLM, Kimi) are now in the top 7.
Source: SWE-bench.com Verified Leaderboard, retrieved Feb 2026; marc0.dev leaderboard tracker
Slide 22
SWE-Bench Pro: Enterprise-Grade Agentic Tasks
Performance drops sharply on enterprise-grade tasks: ~43% average drop from Verified to Pro.
Auggie CLI: Pro 51.8%
Claude Opus 4.5: Verified 80.9%, Pro 45.9%
OpenAI o3: Verified 71.7%, Pro 38.2%
Real enterprise tasks are still significantly harder than benchmarks suggest. The gap between benchmark performance and real-world enterprise tasks is still ~40%. Auggie CLI's lead on Pro suggests the agent scaffold matters more than raw model ability.
Source: Scale AI SEAL SWE-Bench Pro Leaderboard, Feb 2026; Augment Code blog, Jan 2026
Slide 23
GPQA Diamond: Surpassing Human Experts
Claude Opus 4.6: 91.3%
OpenAI o3: 87.7%
Kimi K2.5: 87.6%
OpenAI o1: 78.3%
DeepSeek R1: 71.5%
Human experts: 69.7%
PhD-Level: Biology, chemistry, physics questions written by domain PhDs. Non-experts with web access score only 34%.
AI now exceeds human domain experts by 21+ points on PhD-level science. The top 3 models all score above 87% where experts score 69.7%.
Source: Rein et al., GPQA (NeurIPS 2023); Artificial Analysis GPQA Diamond Leaderboard, Feb 2026; Epoch AI
Slide 24
LMArena Elo: The Human Preference Leaderboard
1. Claude Opus 4.6 Thinking: 1506 Elo
2. Claude Opus 4.6: 1502 Elo
3. Gemini 3 Pro: 1486 Elo
4. Grok 4.1 Thinking: 1475 Elo
5. Gemini 3 Flash: 1473 Elo
5.2M+ human preference votes across 302 models
Key shift: Anthropic reclaimed #1 after Gemini 3 dominated for months. Thinking models now outperform their non-thinking variants.
Elo ratings compress at the top: only 33 points separate #1 from #5. The real differentiator is now task-specific performance, not overall preference.
Source: LMArena.ai Leaderboard, retrieved Feb 20, 2026; 5.2M+ community votes
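To put the 33-point spread in perspective, the sketch below converts a rating gap into an expected head-to-head preference rate. It assumes the standard Elo logistic formula with a 400-point scale; LMArena's actual rating computation may differ in detail, so treat the result as a rough interpretation only.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected win probability of A over B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings from the slide: #1 (1506) vs. #5 (1473).
p_top_vs_fifth = expected_score(1506, 1473)
print(f"Expected preference for #1 over #5: {p_top_vs_fifth:.1%}")
```

Under this assumption the top-ranked model would be preferred only slightly more often than a coin flip against the fifth-ranked one, which is why the slide points to task-specific performance as the real differentiator.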
Slide 25
LiveCodeBench: Competitive Programming
1,000+ problems from LeetCode, AtCoder, CodeForces
[Chart: scores by model, in chart order: GPT-5 Mini 78, GPT-5.1 74, o3 72, Gemini 3 Pro 69, o4 Mini 67, Claude Opus 4.6 65, DeepSeek R1 58]
Easy/Medium: Top models score 90%+ on easy splits
Hard: Performance drops to 30-40% on hard problems
OpenAI dominates competitive coding. But competitive programming skill doesn't always translate to real-world engineering (see SWE-Bench gaps).
Source: LiveCodeBench v6 Leaderboard, Feb 2026; Artificial Analysis coding evaluations
Slide 26
The Model Specialization Matrix
Model            Coding  Reasoning  Creative  Agentic  Multimodal
Claude Opus 4.6  █████   █████      ████░     █████    ████░
GPT-5.2          ████░   ████░      ████░     ████░    █████
Gemini 3 Pro     ████░   ████░      ████░     ████░    █████
o3 (reasoning)   █████   █████      ██░░░     ███░░    ███░░
DeepSeek R1      ███░░   █████      ███░░     ███░░    ██░░░
Llama 4 Scout    ███░░   ███░░      ███░░     ███░░    ████░
No single model leads across all categories. The winning strategy is increasingly model routing: sending each task to the best-fit model.
Source: Composite from SWE-bench, GPQA, LMArena, LiveCodeBench, METR; Artificial Analysis, Feb 2026
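Model routing can be sketched as a lookup over a capability matrix. The example below is illustrative only: the scores are the slide's bar lengths read as 1 to 5, and the `best_models` helper is hypothetical. A production router would also weigh cost, latency, and availability.

```python
CATEGORIES = ("coding", "reasoning", "creative", "agentic", "multimodal")

# Scores transcribed from the matrix above (count of filled blocks, 1 to 5).
SCORES = {
    "Claude Opus 4.6": (5, 5, 4, 5, 4),
    "GPT-5.2": (4, 4, 4, 4, 5),
    "Gemini 3 Pro": (4, 4, 4, 4, 5),
    "o3 (reasoning)": (5, 5, 2, 3, 3),
    "DeepSeek R1": (3, 5, 3, 3, 2),
    "Llama 4 Scout": (3, 3, 3, 3, 4),
}


def best_models(category: str) -> list[str]:
    """Return every model tied for the top score in the given category."""
    idx = CATEGORIES.index(category)
    top = max(row[idx] for row in SCORES.values())
    return [name for name, row in SCORES.items() if row[idx] == top]


for category in CATEGORIES:
    print(f"{category}: {', '.join(best_models(category))}")
```

Returning all tied candidates, rather than a single winner, leaves room for a second-stage tie-break on price or latency.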
Slide 27
Reasoning Models: The New Standard
AIME 2025: 52% (GPT-4o) >>> 89% (o3), +37
GPQA Diamond: 56% (GPT-4o) >>> 91% (Opus 4.6), +35
ARC-AGI: 21% (GPT-4) >>> 88% (o3), +67
Codeforces Elo: 1200 (GPT-4) >>> 2727 (o3), +127%
Reasoning models (o3, extended thinking) represent the largest single-generation capability jump in LLM history. The tradeoff: 10-50x more compute per query.
Source: OpenAI o3 system card; Anthropic Claude Opus 4.6 benchmarks; ARC Prize Foundation; AIME 2025 results
Slide 28
Open vs. Closed: The Gap Has Collapsed
~1% current gap (down from ~17% in Jan 2024)
DeepSeek R1 matched GPT-4 performance at $1.50/M tokens vs $15+
Qwen, Llama 4, Mistral all closing fast.
[Chart: approximate MMLU-Pro scores, best closed model vs. best open model, Jan 24 to Jan 26]
Open-source is no longer a generation behind. DeepSeek proved you can match frontier quality at 1/10th the cost. This reshapes the entire competitive landscape.
Source: MMLU-Pro Leaderboard (Artificial Analysis); DeepSeek R1 Technical Report (arXiv 2501.12948); DataCamp model comparisons
Slide 29
Open Source Leaders for Agents
DeepSeek R1: 671B MoE; MATH-500: 97.3%; distillation enables $1.50/M tokens
Llama 4 Scout: 109B (17B active); 10M context; MoE architecture, 16 experts
Llama 4 Maverick: 400B (17B active); beats GPT-4o; 128 experts, multimodal
Qwen 2.5 Max: MoE; MMLU: 87.9; Chinese-English bilingual leader
Mistral Large 2: 123B; 128K context; multilingual, code-first
Open-source is no longer just a research alternative. DeepSeek, Llama 4, and Qwen are production-grade for agent workloads at a fraction of API costs.
Source: DeepSeek R1 paper (arXiv 2501.12948); Meta Llama 4 blog, Apr 2025; Alibaba Qwen 2.5 report; Mistral AI blog
Slide 30
The Chinese Model Surge
Model         Organization  SWE-Bench  GPQA   Cost/M tok
DeepSeek R1   DeepSeek      49.2%      71.5%  $1.50/M
MiniMax M2.5  MiniMax       80.2%      ~80%   $2.00/M
GLM-5         Zhipu AI      77.8%      ~78%   N/A
Kimi K2.5     Moonshot      76.8%      87.6%  $3.00/M
Qwen 2.5 Max  Alibaba       69.6%      ~72%   $1.80/M
3 of top 7 on SWE-Bench are Chinese models. MiniMax M2.5 (#3 overall) surprised the industry.
China has achieved parity on coding and reasoning benchmarks at dramatically lower price points. Export controls have not prevented capability catch-up.
Source: SWE-bench.com; GPQA Leaderboard; AIPortalX Chinese model comparison; company pricing pages, Feb 2026
Slide 31
Multimodal Vision Agents
Mature: Image Understanding (GPT-5.2, Gemini 3, Claude Opus 4.6)
Mature: Video Analysis (Gemini 3 Pro (2hr), GPT-5.2, Kimi K2.5)
Mature: Audio Processing (Gemini 3 (19hr), GPT-5.2, Whisper v4)
Emerging: Screen/UI Reading (Claude Opus (computer use), GPT-5.2)
Early: Real-time Video (Gemini 3 Pro, GPT-5.2 Realtime)
Research: 3D Understanding (research stage: Meta, NVIDIA)
Vision is the new frontier for agents. Computer use (reading screens, clicking UI) turns LLMs into general-purpose software operators, not just code generators.
Source: Google Gemini 3 Pro capabilities; Anthropic computer use; OpenAI GPT-5.2 spec; Roboflow multimodal rankings 2026
Slide 32
The Evolution: RAG to Agentic RAG to RLM
Basic RAG (2023-2024): Vector search, retrieve chunks, augment, generate. $1.94B market (2025)
>>> Agentic RAG (2024-2025): Self-RAG, Corrective RAG, GraphRAG (30K+ GH stars). 0.978 accuracy
>>> RLM (2025-2026): Recursive decomposition; 100x beyond context window. +28.3% improvement
>>> Context Engine (Emerging): Unified knowledge core + memory + tool orchestrator. Paradigm shift?
Enterprise AI advantage will hinge on who feeds the highest-quality, most real-time context, not who has the largest model.
Source: MarketsandMarkets RAG Market Report; Microsoft Research (GraphRAG); Zhang et al., RLM (arXiv 2512.24601); Prime Intellect
Slide 33
Agent Memory: The Missing Architecture
Memory Hierarchy
Short-term: Context window, CoT scratchpad (1M tokens)
Episodic: Past conversations, task outcomes (per session)
Semantic: Extracted facts, rules, preferences (persistent)
Procedural: Learned strategies, action patterns (self-improving)
Production Memory Systems
Letta (MemGPT): #1 Terminal-Bench. Git-based memory versioning
Mem0: 90% token reduction. 1.8K vs 26K tokens/conversation
Zep: 18.5% accuracy gain. Temporal knowledge graphs, <200ms
Hybrid search recall: BM25 (22.1%) + Dense (48.7%) = Hybrid (53.4%)
The gap between a demo agent and a production agent is memory. Without persistent, structured memory, agents repeat mistakes and lose context.
Source: Letta (MemGPT) docs; Mem0 research (mem0.ai); Zep (arXiv 2501.13956); Weaviate hybrid search benchmarks
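Hybrid search combines a keyword ranking (such as BM25) with a dense-vector ranking. One common way to merge the two lists is reciprocal rank fusion (RRF), sketched below with toy document IDs. This is an illustration of the general idea, not necessarily the fusion method behind the recall figures above; the constant `k=60` is a conventional default, and the rankings are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists; each hit contributes 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the same query from two retrievers.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Documents that appear in both lists rise to the top, while documents found by only one retriever are still kept, which is how a hybrid can recall more than either method alone.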
Slide 34
Latest Model Releases (Feb 2026)
Claude Opus 4.6 (Anthropic, Feb 5): 1M context, agent teams, adaptive thinking, #1 LMArena
Gemini 3.1 Pro (Google, Feb 2026): 77.1% ARC-AGI-2, advanced agentic reasoning
GPT-5.3-Codex (OpenAI, Feb 5): General-purpose work-on-computer agent, real-time coding
Qwen 3.5 (Alibaba, Feb 17): Native multimodal (text+image+video)
Kimi K2.5 (Moonshot, Jan 2026): Video generation + agentic tasks
MiniMax M2.5 (MiniMax, Jan 2026): MoE, #3 on SWE-Bench overall
Llama 4 Scout (Meta, Apr 2025): 10M context, 109B MoE, open-source
Seven major model releases in eight weeks. The release cadence has accelerated from quarterly to near-continuous. Keeping up is itself a competitive challenge.
Source: Anthropic blog; Google AI blog; OpenAI blog; Alibaba (CNBC Feb 17); company announcements, Jan-Feb 2026
Slide 35
The Multimodal Convergence: Text + Vision + Audio + Action
Text (1M+ tokens): Context windows 250x larger than 2023. Near-book-length. Leader: Claude Opus 4.6
Vision (2hr video): Process full-length films, read screens, interpret diagrams. Leader: Gemini 3 Pro
Audio (19hr audio): Process day-long recordings, real-time transcription. Leader: Gemini 3 Pro
Action (Computer use): Click, type, navigate. Operate any software through the screen. Leader: Claude (computer use)
The convergence of text, vision, audio, and action in a single model creates agents that can perceive and act in the world, not just generate text.
Source: Google Gemini 3 Pro spec; Anthropic Claude computer use; OpenAI GPT-5.2 Realtime API; model capability documentation
Slide 36
KEY TAKEAWAYS
Frontier models now exceed human PhD experts on science benchmarks (91% vs 70%).
The open-source gap has collapsed to ~1%. DeepSeek matches GPT-4 at 1/10th the cost.
Agent task horizons are doubling every 123 days. Claude Opus 4.6 hit 14.5 hours; week-long tasks are projected by late 2026.
Memory, not model size, is the key differentiator for production agents.
The cost of intelligence fell 92% in 3 years. This changes everything.
Slide 37
“ One day, ladies will take their computers for walks in the park and tell each other, 'My little computer said such a funny thing this morning!' — Alan Turing, 1951
Slide 38
SECTION 3: The Evolution
Context is everything: −19% on legacy code, +67% on greenfield, and 6× between average and power users
Slide 39
The Productivity Spectrum: Context Is Everything
Legacy Maintenance: −19% slower (METR, experienced devs, 1M+ LOC repos)
16 expert devs on 10yr+ codebases they knew well.
Perception gap: thought +20% faster, actually slower.
Researchers' caveat: "Results don't apply to less experienced devs or unfamiliar codebases."
Greenfield + Unfamiliar: +26–67% faster (multiple studies)
Anthropic: +67% merged PRs/engineer/day
GitHub Copilot: +55% faster task completion
Novices: +35% faster (Stanford/MIT)
Power users vs. average: 6x productivity gap (OpenAI)
The Outliers: Where It Goes Exponential
Base44: Solo founder → $80M exit in 6 months, zero funding
Devin: 10x faster migrations (3-4 hrs vs 30-40 hrs)
Cursor: +126% user productivity, $500M ARR
Claude Code: 4% of all GitHub commits → 20% by year-end
Google: 25% of code now AI-assisted, +31.8% review speed
Harvard: AI novices reach expert level in 2 months vs. 8 (4x)
Same tools. Wildly different results. The METR slowdown is real, for one context. The 6x power-user gap (OpenAI) suggests we're still learning how to use these tools.
Source: METR Jul 2025; Anthropic internal study Aug 2025; OpenAI power-user report; GitHub Copilot; Stanford/MIT; Harvard
Slide 40
What Agents Actually Do in Code
1. Understand Context: Read codebase, docs, issues, PRs. Build a mental model of the project.
2. Plan Approach: Break the task into subtasks. Identify files to modify, tests to write.
3. Execute Changes: Write code, create files, run commands. Multi-file edits in a single operation.
4. Test & Validate: Run tests, lint, type-check. Iterate until CI passes.
5. Submit for Review: Create PR, write description. Respond to review comments.
This is not autocomplete. Modern coding agents execute the full software development lifecycle autonomously; the human's role shifts to review and steering.
Source: Anthropic Claude Code documentation; GitHub Copilot Workspace; Cognition Devin workflow documentation
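The five steps above can be sketched as a simple control loop. This is a toy illustration under stated assumptions, not how any named product works: `run_agent_task`, `apply_step`, and `run_checks` are hypothetical names, and real agents revise their plan between attempts rather than replaying it.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskResult:
    attempts: int
    passed: bool
    log: list[str] = field(default_factory=list)


def run_agent_task(
    plan_steps: list[str],
    apply_step: Callable[[str], None],
    run_checks: Callable[[], bool],
    max_attempts: int = 3,
) -> TaskResult:
    """Execute a plan, validate it, and retry until checks pass or attempts run out."""
    log = ["understand: context loaded", f"plan: {len(plan_steps)} subtasks"]
    for attempt in range(1, max_attempts + 1):
        for step in plan_steps:
            apply_step(step)  # step 3: execute changes
            log.append(f"execute: {step}")
        if run_checks():  # step 4: tests, lint, type-check
            log.append("validate: checks passed")
            log.append("submit: PR opened for human review")  # step 5
            return TaskResult(attempt, True, log)
        log.append("validate: checks failed, iterating")
    return TaskResult(max_attempts, False, log)


# Stubbed usage: checks fail once, then pass on the second attempt.
applied: list[str] = []
outcomes = iter([False, True])
result = run_agent_task(["edit parser.py", "add test"], applied.append, lambda: next(outcomes))
print(result.attempts, result.passed)
```

The bounded retry count is the design point worth noting: it is where a human steps back in when the agent cannot converge on passing checks.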
Slide 41
The Tweet That Started It All "There's a new kind of coding I call 'vibe coding', where you fully give in to the vibes, embrace exponentials, and forget that the code even exists." Andrej Karpathy | February 2, 2025 4.5M+ views on X/Twitter The term spread so fast that Merriam-Webster recognized it as a trending noun within weeks. The idea captured something real: LLMs had gotten good enough that you could describe what you wanted and watch it appear. Source: Andrej Karpathy (@karpathy), X/Twitter, Feb 2, 2025; Merriam-Webster Trending Words 37
Slide 42
What Vibe Coding Looked Like (Early 2025)
Conversational: Describe what you want in plain English, iterate by talking
Non-linear: No structured dev process; follow your intuition and vibes
Acceptance-first: Accept most AI suggestions, fix issues as they arise
Solo-scale: One person can build what once required a team
Vibe coding was a genuine paradigm shift for beginners and prototypers. But it hit a wall at scale: technical debt, security gaps, and unmaintainable code.
Source: Karpathy (X, Feb 2025); community analysis across Hacker News, Reddit r/programming
Slide 43
The Shift: Karpathy's 2026 Update
2025: Vibe Coding >>> 2026: Agentic Engineering
"Give in to the vibes" >>> "Orchestrating agents as oversight"
Human writes prompts >>> Human defines goals & standards
AI generates code >>> Agents plan, write, and test code
Human accepts most output >>> Human reviews and steers
Fix bugs as they appear >>> Structured workflows & CI/CD
Best for: prototypes, MVPs, personal projects >>> Best for: production software, team-scale projects
"Programming via LLM agents is increasingly becoming a default workflow for professionals, except with more oversight and scrutiny." (Karpathy, 2026)
Source: The New Stack, "Vibe Coding Is Passe," 2026; Glide Blog, "What Is Agentic Engineering?"
Slide 44
Vibe Coding vs. Agentic Engineering: Side-by-Side
Dimension        Vibe Coding            Agentic Engineering
Human role       Prompter               Architect / Overseer
AI role          Code generator         Autonomous executor
Quality control  Manual / ad hoc        CI/CD, tests, reviews
Code ownership   "Forget code exists"   Human accountable
Best for         Prototypes, MVPs       Production systems
Scale            Single files           Multi-repo, multi-agent
Risk             Technical debt         Delegation failures
Vibe coding didn't die; it evolved. The vibes are still there, but now there's engineering discipline around them. The best builders do both.
Source: Karpathy (2025/2026); The New Stack; community practice analysis
Slide 45
English-Only Programming
"The role of the builder shifted from writing code to articulating what should exist." Jon Radoff, meditations.metavert.io
Intent > Syntax: Describe outcomes, not implementation. The compiler is now a conversation.
Composition > Construction: Compose services via MCP, CLI, APIs. Don't rebuild what agents can integrate.
Iteration > Planning: Ship fast, steer the agent, converge on quality through feedback loops.
English-only programming isn't about dumbing down software. It's about raising the abstraction layer so human intent drives execution directly.
Source: Jon Radoff, meditations.metavert.io; MIT Media Lab talk; LightCMS, Chessmata, Agent Almanac case studies
Slide 46
The Three Eras of Software Creation
Pioneer Era (1950s-1990s). Who builds: Specialists Only. How: Assembly, C, mainframes. Builders: Thousands
Engineering Era (2000s-2020s). Who builds: Trained Developers. How: Frameworks, cloud, agile. Builders: Millions
Creator Era (2025+). Who builds: Anyone with Intent. How: Natural language, agents. Builders: Billions
Each era expanded the builder pool by orders of magnitude. The Creator Era doesn't eliminate engineers; it turns everyone else into potential builders too.
Source: Jon Radoff, meditations.metavert.io; historical software development analysis
Slide 47
The Copilot Trap: Sustaining vs. Disruptive Innovation
Copilots = Sustaining: Makes existing products better for current customers. Incumbents win (Microsoft, Google). Examples: Copilot in Office, Gemini in Search, Duet AI in GCP
Agents = Disruptive: New business model that challenges incumbents from below. New entrants win (Cursor, Lovable). Examples: Claude Code (replace IDEs), Devin (replace junior devs), Bolt.new (replace agencies)
Microsoft (2026): "Copilot was chapter one. Agents are chapter two."
Source: CIO, "The AI Productivity Trap," 2026; Christensen, The Innovator's Dilemma; Microsoft developer keynote 2026
Slide 48
The Coding Agent Landscape (Feb 2026)
Terminal Agent: Claude Code. CLI delegation; minimal context switching. $2.5B+
Smart Editor: Cursor. Multi-file edits; project-wide context. $1B
Inline Suggest: GitHub Copilot. Fast autocomplete; GitHub integration. $2B+
Editor+Agent: Windsurf. Cascade agent; cross-file context. $100M+
Autonomous: Devin. Full project autonomy; 67% PR merge rate. N/A
Cloud IDE: Replit Agent. Full-stack; cloud execution; no local setup. $250M
Autocomplete >>> Smart Editor >>> Terminal Agent >>> Autonomous
Source: DigitalOcean agent comparison; Sacra; CNBC; company websites, Feb 2026
Slide 49
Developer Adoption: The Numbers
84% of developers using or planning to use AI tools
51% use AI tools daily
59% run 3+ AI tools in parallel
46% do not fully trust AI results
The trust paradox: 84% adoption but only 54% trust. Developers are using AI tools extensively despite deep skepticism about their reliability.
Source: Stack Overflow 2025 Developer Survey; CODERCOPS AI Adoption Report 2026; Index.dev AI Pair Programming Stats
Slide 50
Claude Code + Cowork: The Anthropic Stack
Claude Code: Terminal-first agentic coding. Direct command-line delegation. Minimal context switching. Multi-file, multi-repo capable. Git-native workflow. 4% of all GitHub public commits (Feb 2026), projected 20%+ by EOY
Claude Cowork: Desktop AI assistant for non-devs. File & task management. Document creation & editing. Browser automation. Workflow orchestration. Part of the trigger for the SaaSpocalypse (Feb 3, 2026)
Anthropic's dual strategy: Claude Code for developers, Cowork for everyone else. Together they cover the full spectrum from terminal to desktop.
Source: Anthropic product announcements; SemiAnalysis (Dylan Patel, GitHub commit analysis); Bloomberg SaaSpocalypse coverage
Slide 51
GitHub Agent HQ: The Multi-Agent Platform Announced GitHub Universe 2025. Operational Feb 2026. Agent HQ — Mission Control Manage multiple AI agents from one interface within GitHub Claude Anthropic Codex OpenAI Copilot GitHub Gemini Google Devin Cognition Grok xAI GitHub is positioning itself as the operating system for AI-assisted development. Agent HQ makes the agent choice interchangeable — the platform is the moat. Source: Visual Studio Magazine, Oct 2025; CNBC, "GitHub Unites OpenAI, Google and Anthropic Agents," Oct 2025 48
Slide 52
Devin: The Autonomous Engineer
$20/mo: Price (down from $500)
67%: PRs merged (up from 34%)
83%: More tasks per compute unit
$9.8B: Cognition valuation (Aug 2025)
EightSleep: 3x more data features shipped
Litera: Test coverage +40%, regression cycles 93% faster
Devin's price drop from $500 to $20/mo signals the commoditization of autonomous coding. The question is no longer whether agents can code, but how much oversight they need.
Source: VentureBeat (Devin 2.0 pricing, Jan 2026); Cognition AI annual review 2025; company case studies
Slide 53
Anthropic's Economic Index: Augmentation > Automation
Key Findings
52% of conversations are augmentation (learning, iteration, feedback)
12x speedup on tasks requiring college-degree-level education
Higher-GDP countries use AI more as a collaborative partner, less for automation
[Chart: Augmentation (52%) vs. Automation (48%)]
The economic signal: AI is becoming a thinking partner, not just a task executor. Complex knowledge work benefits most from augmentation, not replacement.
Source: Anthropic Economic Index, Jan 2026 Report; Anthropic Economic Index, Sep 2025 (geographic distribution)
Slide 54
KEY TAKEAWAYS
Vibe coding evolved into agentic engineering: same creative energy, now with production discipline.
The SaaSpocalypse erased $2T in SaaS market cap. Seat-based pricing is the casualty.
84% of developers use AI tools, but only 54% trust them. Adoption outpaces confidence.
Context is everything: −19% on legacy code (METR) but +67% on new work (Anthropic). The 6x power-user gap says we're still on the learning curve.
Anthropic's data: 52% augmentation, 48% automation. AI is becoming a thinking partner.
Slide 55
SECTION 4: The Creator Economy
100K+ new products built daily on AI-native platforms. The barrier between idea and product is collapsing.
Slide 56
The Creator Economy Explosion: Market Size
$820B projected by 2030, 22.7% CAGR
84% of creators now using AI tools
AI in Creator Economy: $3.31B (2024) >>> $12.85B (2029), 31.4% CAGR
[Chart: creator economy market size ($B): 2023: 127, 2024: 165, 2025: 200, 2026E: 275, 2027E: 380, 2028E: 510, 2029E: 660, 2030E: 820]
The creator economy is a $200B market today growing at 23% annually. AI tools are the accelerant: 84% of creators have already adopted them.
Source: GlobeNewswire Creator Economy Market Report, 2025; AI in Creator Economy Market Forecast 2024-2029
Slide 57
Jon Radoff's Three Eras of Digital Identity
2000s, Identity Era: Who we are online. Profiles, avatars, social graphs.
>>> 2010s, Self-Expression Era: What we create. UGC, streaming, content creation.
>>> 2020s+, Empowerment Era: Projecting our will through intelligent agents. NOW.
"The role of the builder shifted from writing code to articulating what should exist." (Jon Radoff)
Source: Jon Radoff, meditations.metavert.io; MIT Media Lab presentation
Slide 58
Natural Language as Compiler
OLD: Traditional Development. Idea >>> Learn to code >>> Build >>> Deploy >>> Iterate (MONTHS)
NEW: Agent-Assisted Creation. Idea >>> Describe intent >>> Agent builds >>> Deploy >>> Iterate (HOURS)
41% of all code is now AI-generated
Source: GitClear Code Analysis 2025; development workflow analysis
Slide 59
AI-Native Creator Platforms
Cursor: $1B, 24 months
Replit: $250M, ~12 months
Lovable: $200M, 8 months
Bolt.new: $40M, 4 weeks
100K+ new products built daily on these platforms
Combined ARR: $1.5B+. None existed 3 years ago.
These aren't SaaS growth curves. They're consumer-viral with enterprise revenue. Cursor went 0 to $1B faster than Slack, Zoom, or any enterprise product in history.
Source: Sacra (Cursor, Replit, Bolt.new); CNBC (Lovable, Feb 2026); TechCrunch
Slide 60
Who's Creating? The Democratization Data
40% of low-code/no-code users are non-technical creators.
40M+ Replit global community (not all engineers).
5M+ Bolt.new signups with 22-min avg sessions.
76% of developers using or planning AI for code generation.
The builder pool is expanding beyond developers. 40% of no-code users have no technical background — they're designers, marketers, and entrepreneurs building software.
Source: Stack Overflow Developer Survey 2025; Replit company stats; Bolt.new metrics; Gartner low-code forecast 59
Slide 61
Roblox: The $7.76B Proof of Creator Economics
144M daily active users (+69% YoY). 12.3M monthly active developer studios. 44M+ published experiences.
$1B+ paid to developers in 9 months (2025). $1.3M avg earnings, top 1,000 creators (+50% YoY). $7.76B revenue (2025 estimated).
Roblox proves the model: a platform where millions of non-traditional creators earn real revenue. AI-native creation platforms are building the same flywheel, faster.
Source: Roblox Q3 2025 earnings; meditations.metavert.io; SEC filings 60
Slide 62
The YouTube Parallel: History Repeating
YouTube (2005-2015) vs. AI Creation (2024-2026):
"It's just cat videos" / "It's just vibe coding"
"Real creators use TV studios" / "Real engineers write their own code"
"Quality will never match pro" / "Quality will never match human"
"Amateurs are ruining media" / "Non-coders are ruining software"
"No one can make money on it" / "Technical debt will kill it"
Result: $50B+ creator economy. MrBeast earns more than most TV networks. / Emerging: 100K+ products/day. Single builders shipping what teams couldn't.
Every creation tool was accused of degrading quality. The pattern is consistent: democratization creates more bad output AND more great output simultaneously.
Source: YouTube historical analysis; creator economy market data; AI creation platform metrics 61
Slide 63
The "1000x Developer" Thesis
Not just 10x engineers anymore — single person + agents = team-level output.
LightCMS (Days): Full CMS built in conversation. 38 MCP tools, Go + MongoDB, agent-native from the ground up.
Chessmata (Weekend): Complete chess application. From concept to playable product using agentic engineering.
Agent Almanac (1-2 days): An agent that discovers, catalogs, and reviews other agents. Agents building for agents.
Nuance: METR shows −19% for experts on familiar legacy codebases. But for new creation — the creator economy's domain — gains range from +35% (novices) to +67% (Anthropic). The 6x power-user gap (OpenAI) shows the ceiling is far higher than the average.
Source: Jon Radoff, meditations.metavert.io; METR Jul 2025; Anthropic Aug 2025; OpenAI power-user report 62
Slide 64
Winners in the Creator Economy
1. Creators: Democratized building. Everyone can ship. The barrier moved from code to imagination.
2. Infrastructure: Compute, semiconductors, cloud. Essential regardless of who wins above.
3. Hard-Problem Toolmakers: Complex scaling, not simple CRUD. The work agents can't easily replicate.
4. Workflow-Obsessed: Removing friction from creator processes. The plumbing that makes building faster.
LOSERS: UI-heavy SaaS that charges per-seat for workflows agents can automate.
Source: Jon Radoff, meditations.metavert.io; creator economy framework analysis 63
Slide 65
The Pricing Revolution
OLD: Per-Seat Licensing. Annual contracts, per-user pricing. Revenue scales with headcount. Misaligned: agents don't need seats.
NEW: Usage-Based Pricing. Pay for what you use. Revenue scales with value delivered. Agent-compatible: metered consumption.
Stripe acquires Metronome: Usage-based billing infrastructure (Dec 2025).
Stablecoins > Visa: Total transaction volume surpassed Visa in 2024.
x402 protocol: Agent-to-agent micropayments: 35M+ transactions.
Source: a16z State of Crypto; Stripe/Metronome announcement; meditations.metavert.io; x402 protocol metrics 64
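To make the contrast between the two pricing models concrete, here is a minimal sketch of how each invoice might be computed. Everything in it is an assumption for illustration: the `UsageEvent` type, the function names, the prices, and the choice of "units" as the metered quantity are hypothetical and do not describe Stripe's or Metronome's actual APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    """One metered event; 'units' could be API calls or agent actions (assumed)."""
    customer: str
    units: int


def per_seat_invoice(seats: int, price_per_seat: float) -> float:
    """Per-seat model: revenue depends only on how many human seats are licensed."""
    return seats * price_per_seat


def usage_invoice(events: list[UsageEvent], customer: str, price_per_unit: float) -> float:
    """Usage-based model: revenue depends on metered consumption, not headcount."""
    total_units = sum(e.units for e in events if e.customer == customer)
    return total_units * price_per_unit


if __name__ == "__main__":
    events = [UsageEvent("acme", 1000), UsageEvent("acme", 500), UsageEvent("other", 10)]
    print(per_seat_invoice(seats=10, price_per_seat=100.0))
    print(usage_invoice(events, "acme", price_per_unit=0.01))
```

The point of the sketch is that the usage-based invoice takes no seat count at all, so an agent generating events is billed the same way as a human doing so.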
Slide 66
KEY TAKEAWAYS
The creator economy hits $820B by 2030. AI is the multiplier — 84% of creators already use AI tools.
100,000 new products are built every day on AI-native platforms.
One person + agents = team-level output. LightCMS, Chessmata, and Agent Almanac prove the thesis.
The SaaS pricing model is breaking. Usage-based and micropayment models are replacing per-seat licensing. 65
Slide 67
SECTION 5 Who Is Winning Cursor: $0 to $1B ARR in 24 months. The SaaSpocalypse erased $2T. Here's who survived — and why. 66
Slide 68
The SaaSpocalypse in Numbers
$2T in SaaS market cap evaporated, Jan 15 - Feb 14, 2026.
[Bar chart: decline, %] Atlassian -35; Salesforce -28; Adobe -22; ServiceNow -18; Workday -15; S&P 500 Software -13.
Root cause: "Seat compression". One AI agent replaces dozens of human licenses. 10 agents >>> 100 sales reps = 90% fewer Salesforce seats.
The SaaSpocalypse isn't a market correction — it's a structural repricing. Seat-based revenue models break when agents replace human users.
Source: Bloomberg, Feb 4, 2026; CNBC, Feb 6, 2026; S&P 500 Software Index; SaaStr analysis 67
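The "seat compression" arithmetic on this slide is simple enough to write down. The sketch below assumes, as the slide's example does, that each agent still occupies one licensed seat; the function name is hypothetical.

```python
def seat_reduction(human_seats: int, agent_seats: int) -> float:
    """Fraction of licensed seats lost when agents take over the humans' work."""
    if human_seats <= 0:
        raise ValueError("human_seats must be positive")
    return (human_seats - agent_seats) / human_seats


if __name__ == "__main__":
    # The slide's example: 10 agents doing the work of 100 sales reps.
    print(f"{seat_reduction(human_seats=100, agent_seats=10):.0%} fewer seats")
```

Under per-seat pricing, revenue falls by the same fraction unless the price per seat changes.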
Slide 69
The Winner Framework: Composable > Complex
WINNING: API-first, CLI-composable, agent-friendly. Examples: GitHub (agent orchestration), Stripe (agent payments), Supabase (agent-created DBs), Vercel (instant deploy), Cloudflare (edge inference), Linear (agent-composable).
LOSING: UI-heavy, seat-based, workflow-dependent. Examples: Salesforce (-28% in a month), Atlassian (-35% in a month), Adobe (AI-native competitors), ServiceNow (agent workflows), traditional CRMs, per-seat collaboration tools.
Infrastructure that agents can compose via command-line is winning. UI-heavy SaaS that charges per human seat is losing.
Source: Market data; Jon Radoff composable infrastructure thesis, meditations.metavert.io 68
Slide 70
GitHub: The Operating System for AI Development
100M+ developers on the platform. 420M+ repositories (1 billionth created recently).
20M+ cumulative Copilot users, 1.3M paid subscribers. 90% of Fortune 100 use GitHub Copilot.
Feb 2026: Agent HQ — multi-agent platform supporting Claude Code, OpenAI Codex, Copilot, Google, Cognition, and xAI agents from one interface.
Source: GitHub stats page; Copilot usage data (wearetenet.com); GitHub Universe 2025 announcements 69
Slide 71
Claude Code on GitHub: The 4% That Signals 20%
4% of all GitHub public commits authored by Claude Code (Feb 2026). Projected: 20%+ by end of 2026.
The product didn't exist 9 months ago. Now it's authoring 1 in 25 commits on the world's largest code platform.
For context: GitHub Copilot took 2+ years to reach similar influence. Claude Code is terminal-first, which means it's capturing workflows that editor-based tools don't reach.
Anthropic ARR: $2.5B+ (annualized).
If the 20% projection holds, AI will write more new code than human devs by 2027.
Source: SemiAnalysis (Dylan Patel), GitHub commit analysis, Feb 2026; GIGAZINE; Anthropic investor materials 70
Slide 72
GitHub Agent HQ: Multi-Agent Platform
Live: Claude Code (Anthropic), OpenAI Codex (OpenAI), Copilot (GitHub).
Coming: Gemini (Google), Devin (Cognition), Grok (xAI).
Enterprise: centralized access management | Agent permissions | Security policy enforcement | Audit logging
GitHub is making the agent choice interchangeable. The platform is the moat, not any single AI provider. This is the multi-cloud model applied to AI agents.
Source: Visual Studio Magazine, Oct 2025; CNBC, "GitHub Unites AI Agents," Oct 2025; InfoWorld 71
Slide 73
Cursor: Fastest B2B SaaS in History
$1B ARR in 17-24 months (fastest B2B SaaS ever). $29.3B valuation after $2.3B Series D (Nov 2025).
50%+ of Fortune 500 (NVIDIA, Adobe, Uber, Stripe). 25% of Y Combinator companies use Cursor.
Cursor's growth exceeds Slack, Zoom, and every enterprise product in history. It captured developers by being better at what they already do — a sustaining innovation that became essential.
Source: BusinessWire (Series D, Nov 2025); SaaStr analysis; Yahoo Finance; CB Insights 72
Slide 74
The Composable Database Winners
MongoDB Atlas: 30% YoY. 62.5K+ customers, $2.4B+ rev. 74% of revenue is Atlas.
Supabase: 4M devs. $5B valuation (Oct 2025). 1M >>> 4M devs in <1 year.
Neon: 80% of new DBs created by AI agents. Not humans — agents.
Vector DBs gaining adoption: 74% of orgs plan integrated vector databases for agentic AI applications (IDC).
Neon's stat is the signal: 80% of new databases are created by AI agents, not humans. The database layer is being reshaped by agent-driven infrastructure choices.
Source: MongoDB Q3 FY2026 earnings; TechCrunch (Supabase Oct 2025); Neon/LinkedIn (agent DB stats); Databricks/Neon acquisition; IDC AI Infrastructure Survey 73
Slide 75
PostgreSQL: 55% Adoption and Rising
55.6% adoption in 2025 (up from 48.7% in 2024). Largest single-year expansion in database history (+7 pts).
[Bar chart: database adoption, %] PostgreSQL 56; MySQL 41; SQLite 34; MS SQL Server 28; MongoDB 27; Redis 25.
Why it's winning: Open-source, composable, agent-friendly. Supabase, Neon, pgvector all built on it.
"The database that won the agent era was invented in 1986."
Source: Stack Overflow Developer Survey 2025; byteiota.com PostgreSQL analysis 74
Slide 76
Edge & Cloud Infrastructure Winners
Cloudflare: 5B+ AI Gateway requests (Q4 2025). Workers AI across 150+ cities. 40% of YC W25 cohort building on CF.
Vercel: $200M+ ARR. v0 = majority of new revenue. Instant deploy for AI-generated apps.
Railway: $100M raise. "AI-native cloud" positioning. Zero-config deployment for agents.
Fly.io: Easy-to-compose cloud infrastructure. Global deployment for distributed agents. GPU support for edge inference.
The edge is winning because agents need fast, global inference. Cloudflare's 40% YC cohort adoption signals where the next generation of AI startups is building.
Source: Cloudflare Q4 2025 earnings; Sacra (Vercel); CNBC; BusinessWire (Railway) 75
Slide 77
The Inference Economy
Every agent action requires inference. Inference is the new transaction.
$300M Together AI ARR: 10x growth from $30M one year prior.
Open-source model hosting (DeepSeek, Llama, Mistral) is driving explosive demand for inference infrastructure.
[Chart, $M] Dec 23: 4; Apr 24: 15; Oct 24: 44; Mar 25: 130; Sep 25: 300.
Together AI valuation: $3.3B. NVIDIA inference revenue: majority of data center sales.
Source: Sacra (Together AI); readthesignal.co; NVIDIA Q4 FY2025 earnings 76
Slide 78
Stripe: The Commerce Layer for Agents
$1.1T+ payment volume processed (2025). $50B+ company valuation (Jan 2026).
Agentic Commerce Protocol: Co-developed with OpenAI. Enables AI agents to browse, select, and purchase autonomously.
Instant Checkout in ChatGPT: Powering purchases with Etsy, Shopify merchants. 25+ partners including URBN, Coach, Kate Spade.
Metronome Acquisition: Usage-based billing infrastructure (Dec 2025). Enables consumption pricing for agent workloads.
Stripe is building the payment rails for an agent economy. When agents can buy things autonomously, commerce infrastructure becomes the critical chokepoint.
Source: Stripe Newsroom (OpenAI Instant Checkout); Stripe Blog (Agentic Commerce Solutions); ThisWeekInFintech; PitchBook Stripe valuation 77
Slide 79
SaaS That Survived (Because Agents Use Them)
Linear: $100M revenue, 280% profit growth. Why it survived: API-first design, clean data model, agents can create/update issues and projects programmatically. Linear didn't need to pivot for agents — it was already built for them.
Notion: $600M ARR (+50% from 2024), 50%+ paying for AI features. Why it survived: Flexible data model, API access, AI-native features. Agents can compose docs, databases, and knowledge bases via API. Projected: $900M-$1B by end 2026.
The SaaS companies that survived are the ones agents can compose with. API-first, clean data models, and programmatic access are the common thread.
Source: getlatka.com (Linear, Notion); SaaStr (Notion valuation); company announcements 79
Slide 80
The Winning Tech Stack: Languages
TypeScript (#1 on GitHub): 2.6M monthly contributors, +66% YoY. First time surpassing Python. Type safety catches 94% of LLM-generated code errors.
Python (#1 in ML): Still dominates training and data science. But TypeScript runs production agent apps.
Go (CLI-first): Single-binary deployment. The language of infrastructure agents (Docker, K8s, Terraform).
Rust (72% admired): "Too hard for humans, perfect for agents." Chroma rewrite: 4x faster than Python.
TypeScript's rise to #1 on GitHub is an AI story: its type system catches errors that LLMs make, making it the safest language for agent-generated code.
Source: GitHub Octoverse 2025; Stack Overflow Dev Survey 2025; Codecademy TypeScript analysis; JetBrains State of Rust 2025 80
Slide 81
The Winning Tech Stack: Frameworks
React: 39.5% of web developers. ~300% enterprise adoption growth since 2023.
Next.js: +60% NPM downloads YoY. 71% of React jobs require it. 135K GitHub stars.
Tailwind CSS: 51% adoption. But docs traffic DOWN 40% (agents don't read docs). Revenue -80%.
Every AI coding tool defaults to React + Tailwind + TypeScript. This isn't preference — it's training data. The most-documented stack wins the agent era by default.
Tailwind's paradox: most popular CSS framework ever, but 80% revenue drop because AI tools write Tailwind without reading Tailwind docs. Usage up, revenue down.
Source: builder.io; npm download stats; Tailwind Labs layoff reporting (devclass.com, Jan 2026); paddo.dev analysis 81
Slide 82
Rust's AI-Assisted Renaissance AI agents handle ownership/borrowing complexity that deterred humans from Rust for decades. 72% admiration rate (highest of any language) 4x faster — Chroma rewrite from Python to Rust Growing agentic devs moving Python >>> Rust for performance The language that was "too hard for humans" is perfect for AI agents. Rust's strict compiler catches bugs before they ship — exactly what you want from agent-generated code. Source: JetBrains State of Rust 2025; Stack Overflow Dev Survey; Red Hat Developer; Chroma performance benchmarks 82
Slide 83
Docker's Agentic Pivot
92% of IT professionals use Docker (2026). 13B container downloads per month. $16.3B container market by 2030 (21.7% CAGR).
The Container Is the Runtime for Agents: Docker Compose now orchestrates agents, models, and tools together. Every AI coding tool runs in containers: Cursor, Claude Code, Devin. Agent sandboxing requires containers for security isolation.
92% adoption (up from 80% in 2024) = largest single-year jump in Docker history.
Source: Docker 2025 State of App Dev Report; programming-helper.com Docker analysis; GM Insights container market 83
Slide 84
HuggingFace: The GitHub of Machine Learning
2M+ public models (2nd million in 335 days). 500K+ datasets across 8K+ languages. 1M+ Spaces (hosted demos & apps).
$130M revenue (2024), 86% YoY growth. 10K+ companies using, 2K+ on Enterprise Hub. $500M valuation (Oct 2025 secondary).
Strategic investors: Google, Amazon, NVIDIA, Intel, AMD, Qualcomm, IBM, Salesforce | Open LLM Leaderboard: 2M+ unique visitors
HuggingFace is to open-source AI what GitHub is to code — the platform everyone builds on. Its leaderboard shapes which open models get trained and funded.
Source: PitchBook; Sacra; HuggingFace blog; investor announcements 84
Slide 85
HuggingFace smolagents: Code-First Agent Framework
~1,000 lines of core logic: Lightweight, readable, no framework bloat.
Code agents first: Python snippets, not JSON blobs. 30% fewer LLM calls.
Any LLM backend: Local, proprietary via LiteLLM, or Hub models.
Sandboxed execution: Docker, E2B, or WebAssembly isolation.
Hub-native sharing: Share tools/agents directly to HuggingFace Hub.
Performance: Llama-3-70B + smolagents matches GPT-4 performance. 4th on GAIA Leaderboard. Open Deep Research: 54% on GAIA benchmark (vs. OpenAI's 67.36%). Built in 24 hours.
smolagents proves you don't need a closed API to build production agents. Open-source + lightweight framework + any LLM = viable alternative to proprietary stacks.
Source: HuggingFace smolagents docs; smolagents.org; GAIA Leaderboard; HuggingFace Open Deep Research blog 85
Slide 86
The Open Source Composable Model Wins The Winning Pattern: Open-source core + optional hosted cloud PostgreSQL >>> Supabase, Neon | Linux >>> AWS, GCP | Kubernetes >>> managed K8s | LLMs >>> HuggingFace, Together AI 76% 2M+ of organizations expect to models on HuggingFace increase open-source AI usage as the central hub The pattern repeats: open-source wins the standard, commercial wrappers capture the revenue. The AI ecosystem is following the same trajectory as databases and cloud. Source: IBM/Red Hat open-source AI survey; HuggingFace; historical open-source market analysis 86
Slide 87
KEY TAKEAWAYS
GitHub is the operating system. TypeScript + React + Tailwind is the default stack.
HuggingFace is the hub: 2M models, smolagents matching GPT-4, the leaderboard that shapes open AI.
80% of Neon's new databases are created by AI agents, not humans.
Claude Code authors 4% of GitHub commits — projected 20%+ by year end.
The SaaS companies that survived are the ones agents can compose with. API-first wins. 87
Slide 88
SECTION 6 The Infrastructure Wars 97M MCP downloads/month — the protocol wars are already over. Now it's about who builds the best tools on top. 89
Slide 89
The Problem: A Thousand Integrations
WITHOUT A PROTOCOL: 5 models (Claude, GPT-4o, Gemini, Llama, Mistral) each wired to 5 tools (GitHub, Slack, DB, Email, CRM). 5 × 5 = 25 integrations.
WITH MCP: the same 5 models and 5 tools each connect once to MCP. 5 + 5 = 10 connections.
Without a protocol, integrating N models with M tools requires N×M custom adapters. MCP reduces this to N+M — the same insight that drove USB, TCP/IP, and HDMI.
Source: Anthropic MCP documentation; modelcontextprotocol.io 90
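The N×M versus N+M claim can be checked with a few lines of Python. This is a minimal sketch of the counting argument only; the function names are hypothetical.

```python
def adapters_without_protocol(models: int, tools: int) -> int:
    """Every model needs its own custom adapter for every tool: N x M."""
    return models * tools


def connections_with_protocol(models: int, tools: int) -> int:
    """Each model and each tool implements the shared protocol once: N + M."""
    return models + tools


if __name__ == "__main__":
    for n, m in [(5, 5), (10, 50), (20, 1000)]:
        print(
            f"{n} models, {m} tools: "
            f"{adapters_without_protocol(n, m)} adapters vs "
            f"{connections_with_protocol(n, m)} connections"
        )
```

The gap widens quickly: the pairwise count grows with the product of the two numbers, the protocol count only with their sum.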
Slide 90
MCP vs. A2A: The Two Agent Protocols
MCP (Anthropic), Model Context Protocol. Focus: Agent-to-tool connectivity. Architecture: Host → Client → Server. State: Stateless at protocol level. Transport: Stdio, HTTP, SSE. Capabilities: Tools, Resources, Prompts, Sampling (recursive loops). "USB for AI agents".
A2A (Google), Agent-to-Agent Protocol. Focus: Agent-to-agent coordination. Architecture: Peer-to-peer via Agent Cards. State: Intentionally stateful. Transport: HTTP + gRPC (v0.3). Capabilities: Long-running tasks, dynamic discovery, cross-org collab. "TCP/IP for agent swarms".
MCP connects agents to tools. A2A connects agents to agents. They're complementary layers — not competitors. Both donated to Linux Foundation.
Source: Anthropic MCP spec; Google A2A spec; Linux Foundation AAIF, Dec 2025 91
Slide 91
MCP Adoption: Explosive Growth
97M+ MCP SDK downloads/month (npm + PyPI). 17,000+ MCP community servers vs. 150+ A2A orgs.
40% of enterprise apps will have MCP agents by end 2026 (Gartner).
Both at Linux Fdn: MCP → AAIF (Dec 2025); A2A → LF project (Jun 2025).
A standard layer is emerging: MCP leads tool-connectivity (97M downloads/month); A2A addresses agent-to-agent coordination. The next battle is governance, security, and distribution.
Source: Anthropic MCP blog, Nov 2024; Google A2A spec, Apr 2025; Linux Foundation AAIF charter, Nov 2025; Gartner 2026; npmjs.com/package/@modelcontextprotocol 92
Slide 92
MCP vs. Function Calling: Head-to-Head
Dimension | Function Calling | MCP
Scope | Single model, single API | Any model, any tool
Discovery | Hardcoded by developer | Runtime tool discovery
Transport | HTTP only | Stdio, HTTP, SSE
State | Stateless per-call | Persistent sessions
Composability | Per-vendor schemas | Universal protocol
Security | API-key scoped | OAuth 2.1 + PKCE
Ecosystem | Vendor-locked | 5,500+ open servers
Function calling is a feature. MCP is an ecosystem. The protocol layer wins because it lets tools work with any model — not just one vendor's.
Source: Anthropic MCP spec; OpenAI function calling docs; community benchmarks 93
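The "Discovery" row is the easiest one to illustrate in code. The sketch below is a toy, in-process registry written for this comparison; it is not the MCP SDK and does not speak the MCP wire protocol. The class name `ToolRegistry` and its methods are hypothetical. It shows only the idea that a client asks what tools exist at runtime and then calls one by name, instead of having the tool list compiled in.

```python
from typing import Callable


class ToolRegistry:
    """Toy stand-in for a tool server: tools are registered, listed, and called by name."""

    def __init__(self) -> None:
        self._tools: dict[str, tuple[str, Callable[..., object]]] = {}

    def register(self, name: str, description: str, fn: Callable[..., object]) -> None:
        self._tools[name] = (description, fn)

    def list_tools(self) -> dict[str, str]:
        """Runtime discovery: return tool names with their descriptions."""
        return {name: desc for name, (desc, _) in self._tools.items()}

    def call(self, name: str, **kwargs: object) -> object:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        _, fn = self._tools[name]
        return fn(**kwargs)


registry = ToolRegistry()
registry.register("add", "Add two integers", lambda a, b: a + b)
registry.register("echo", "Return the given text", lambda text: text)

if __name__ == "__main__":
    # The client learns what is available first, then decides what to call.
    for name, description in registry.list_tools().items():
        print(f"{name}: {description}")
    print(registry.call("add", a=2, b=3))
```

Adding a third tool here requires no change to the client loop, which is the property the table's "Runtime tool discovery" entry refers to.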
Slide 93
How MCP Became the Standard Layer
MCP Adoption Timeline
Nov 2024: Anthropic open-sources MCP
Jan 2025: 1,000+ community servers
Mar 2025: OpenAI adds MCP support to ChatGPT
May 2025: Google DeepMind adopts MCP for Gemini
Aug 2025: Microsoft integrates MCP into Copilot Studio
Nov 2025: MCP moves to Linux Foundation (AAIF)
Feb 2026: 5,500+ servers, 97M+ downloads
MCP emerged as the standard for four reasons: first-mover advantage (Nov 2024), OpenAI adoption (Mar 2025), developer simplicity, and neutral governance (Linux Foundation). A2A fills the complementary agent-to-agent gap.
Source: Anthropic MCP launch blog, Nov 2024; OpenAI Agents SDK docs, Mar 2025; Google DeepMind blog; Ben Thompson, Stratechery, 'The Agentic Web and Original Sin,' 2025; Linux Foundation AAIF charter, Nov 2025 94
Slide 94
Agent-Friendly Design: The New Competitive Moat
"The next competitive moat isn't better AI — it's better agent-friendliness. Products that make themselves composable via MCP, CLI, and API will capture the agent economy. Products that hide behind UIs will die." — Jon Radoff, meditations.metavert.io
API-First: Every feature must be callable without a browser. REST + MCP endpoints for every action.
CLI-Composable: Command-line tools that agents can pipe, chain, and orchestrate. Unix philosophy for AI.
State-Observable: Agents need to read state, not just write it. Expose status, logs, and metadata.
If agents can't use your product via API, they'll use a competitor's. Agent-friendly design is the new SEO.
Source: Jon Radoff, meditations.metavert.io; Agent-Friendly Design thesis, Jan 2026 95
Slide 95
LightCMS: A CMS Built for Agents
Agent-First Architecture: LightCMS was built from the ground up for agent interaction — not retrofitted.
38 MCP tools exposed for agent interaction. Go: Built in Go for low-latency, high-concurrency. 100% API coverage — zero UI-only features.
Key Design Choices:
• 38 MCP tools for full content management
• Go + MongoDB for high-throughput
• CLI-composable — every action is scriptable
• Schema is agent-discoverable at runtime
• No mandatory UI — agents don't need one
LightCMS proves the thesis: a CMS designed for agents from day one outperforms traditional CMS platforms that bolt on API access as an afterthought.
Source: Jon Radoff, LightCMS project; Beamable, lightcms.io 96
Slide 96
Agent Frameworks: The Orchestration Layer
CrewAI (★ 27K+): Multi-agent teams. Python.
LangGraph (★ 10K+): Stateful graph agents. Python/JS.
AutoGen (★ 40K+): Multi-agent conversation. Python.
smolagents (★ 15K+): Minimal code agents. Python.
Mastra (★ 10K+): TypeScript-first agents. TypeScript.
Agno (★ 20K+): Multimodal agent runtime. Python.
The framework layer is fragmenting — no clear winner yet. But the trend is toward lightweight, composable agents over heavy orchestration.
Source: GitHub star counts, Feb 2026; CrewAI, LangGraph, AutoGen, smolagents, Mastra, Agno repos 97
Slide 97
Enterprise Agent Adoption Rate
79% of enterprises have adopted AI agents (up from 31% in 2024).
[Chart: adoption stage, %] Enterprise-wide 21; Scaling 23; Piloting 34; Experimenting 22.
What's Driving Adoption: 40% of enterprise apps will use agents by end of 2026 (Gartner). $690B+ Big Tech AI capex in 2026. Top use cases: customer service, knowledge retrieval, code generation.
Enterprise adoption hit an inflection point in 2025. The question isn't whether to adopt agents — it's how fast to scale them.
Source: Gartner AI Agent forecast, 2025; Capgemini AI Agent Survey; DemandSage AI Agents report, 2026 98
Slide 98
AI Agent Market Size: $10B → $199B
[Bar chart: AI agent market size, $B] 2024: 5; 2025: 8; 2026E: 10; 2028E: 28; 2030E: 65; 2034E: 199.
50%+ CAGR 2024-2034. $199B projected by 2034.
Fastest-growing sub-segments:
• Customer service agents
• Coding agents
• Sales & marketing agents
• Security agents
The agent market is growing 10× faster than the broader AI market. By 2034, agents will be a larger market than cloud computing was in 2020.
Source: MarketsandMarkets; Grand View Research; Precedence Research, 2025-2026 forecasts 99
Slide 99
The Enshittification Problem & Agent Solution
The Problem. Cory Doctorow's "Enshittification" Cycle:
1. Attract users with great product
2. Attract businesses with user data
3. Extract value from both parties
Users are trapped by switching costs, network effects, and data lock-in. Today's internet is optimized for human attention — not human outcomes.
The Agent Solution. Agents bypass enshittification by:
• Ignoring dark patterns (agents don't see manipulative UI)
• API-first interaction (skip the UI entirely, query data directly)
• Price comparison at scale (agents check every option instantly)
• Switching cost = zero (agents don't have brand loyalty)
The internet re-optimizes for outcomes when the user's representative is an AI.
Agents are the ultimate antidote to enshittification. They represent users' interests without being manipulable by dark patterns.
Source: Cory Doctorow, Enshittification essay, 2023; Jon Radoff agent economy thesis, 2026 100
Slide 100
The Payment Gap: Agents Need Micropayments
The Gap: Agents need to pay for services on behalf of users — subscriptions, API calls, microtransactions. But today's payment infrastructure requires human interaction: CAPTCHAs, card forms, 2FA.
Emerging Solutions:
Stripe Agent Toolkit: Payment processing via API. Agent-native checkout.
x402 Protocol: HTTP-native micropayments. 35M+ transactions processed. $10M+ transaction volume.
Base USDC integration (Stripe).
What's Needed:
Universal agent wallets (not crypto wallets — agent wallets)
Micropayment rails < $0.01
Programmatic authorization with spending limits
Agent identity verification (not human verification)
The agent economy is bottlenecked by payments. Whoever builds agent-native payment rails captures the next Stripe-scale opportunity.
Source: Stripe Agent Toolkit, Jan 2026; x402 Protocol stats; Coinbase Base network data 101
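"Programmatic authorization with spending limits" can be sketched in a few lines. The example below is an illustration under stated assumptions, not an implementation of the Stripe Agent Toolkit or the x402 protocol: the `AgentWallet` class, its two limits, and the amounts are all hypothetical. It uses `Decimal` because sub-cent amounts are awkward in binary floating point.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class AgentWallet:
    """Hypothetical wallet that approves payments only within user-set limits."""
    per_payment_limit: Decimal
    total_budget: Decimal
    spent: Decimal = Decimal("0")

    def authorize(self, amount: Decimal) -> bool:
        """Approve and record the payment if it respects both limits."""
        if amount <= 0:
            return False
        if amount > self.per_payment_limit:
            return False
        if self.spent + amount > self.total_budget:
            return False
        self.spent += amount
        return True


if __name__ == "__main__":
    wallet = AgentWallet(per_payment_limit=Decimal("0.05"), total_budget=Decimal("0.10"))
    for amount in ["0.004", "0.06", "0.05"]:
        print(amount, wallet.authorize(Decimal(amount)))
    print("spent so far:", wallet.spent)
```

No human interaction is needed at payment time; the user's consent is expressed once, up front, as the two limits.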
Slide 101
KEY TAKEAWAYS
MCP won the protocol war — it's now the industry standard for agent-tool integration, backed by the Linux Foundation.
Agent-friendly design is the new competitive moat. API-first, CLI-composable products win; UI-heavy SaaS dies.
Enterprise adoption hit 79% — the question isn't whether to adopt agents, it's how fast to scale.
Blockchain AI is mostly hype, but agent micropayments and verifiable compute are real infrastructure worth watching.
The $199B agent market by 2034 will be built on protocol layers, not proprietary walled gardens. 105
Slide 102
SECTION 7 Direct from Imagination From text to 3D in seconds — AI generation is production-ready and the platforms with 431M+ users are integrating it 106
Slide 103
The Creator Era Framework
IDENTITY (Web 1.0–2.0): Homepages, profiles, avatars. People define who they are online. Key platforms: GeoCities, MySpace, Facebook.
→ SELF-EXPRESSION (Web 2.0–Mobile): Content creation tools democratize. YouTube, TikTok, Instagram. Key shift: Consumption → Creation.
→ EMPOWERMENT (AI Era, NOW): AI removes all technical barriers. Anyone can build software, games, 3D worlds, and businesses. Key shift: Creation → Building.
We're entering the Empowerment era: AI doesn't just help you express yourself — it gives you the power to build anything you can imagine.
Source: Jon Radoff, Building the Metaverse (Wiley, 2022); Creator economy framework, meditations.metavert.io 107
Slide 104
AI 3D Generation: Production-Ready in 2026
Tool | Quality | Speed | Best For | Pipeline
Rodin Gen-2 | 9.5/10 | Minutes | Film, Animation | Production-ready
Tripo AI | 8.5/10 | 50% faster pipeline | Games | Production-ready
Meshy | 8/10 | Fast iteration | Teams, Prototyping | Production-ready
Luma Genie | 8/10 | Seconds (NeRF) | Realism, Spatial | Production-ready
Stability 3D | 7/10 | <1 second | Interactive Apps | Production-ready
90% reduction in asset production time with advanced AI 3D workspaces.
3D generation crossed the production threshold in 2025. Studios are now shipping AI-generated assets in AAA titles and commercial products.
Source: 3DAIStudio benchmarks, 2026; Rodin Gen-2; Tripo AI; Meshy; Stability AI 108
Slide 105
3D Generation Quality Benchmarks
3DGen-Bench: First Standardized 3D AI Evaluation
Mesh Quality | Clean topology, proper edge flow, quad-based geometry for real-time rendering | Rodin: 9.5, Tripo: 8.5
Texture Fidelity | 4K PBR textures, normal maps, material properties | Rodin Gen-2: best-in-class
Retopology | AI-automated retopology for game engines (Tripo: one-click) | 50% pipeline speedup
Rigging | AI-generated skeletal rigs for animation-ready models | Universal rigging: Tripo
Speed | Text/image to production 3D asset | <1s (Stability) to mins
The 'toy vs. tool' line has moved. AI 3D tools now produce clean topology, PBR textures, and animation-ready rigs — not just demo assets.
Source: 3DGen-Bench (arXiv, 2025); HY3D-Bench; Sloyd quality metrics; tool-specific benchmarks 109
Slide 106
World Models: The Next Frontier
World Models are AI systems that learn physics, geometry, and causality from observation — not hardcoded rules. They generate interactive 3D environments that respond to actions in real time.
Runway GWM-1 (Dec 2025): 24 fps, 720p, 2-min video. Real-time interactive control. 3 variants: Worlds, Robotics, Avatars.
Google Genie 3 (Jan 2026): 20-24 fps, 720p. Learns physics from observation. Photorealistic to animated worlds.
OpenAI Sora 2 (2025): 25s pro-quality video. Physics sim: gravity, momentum. Persistent world state.
NVIDIA Cosmos (2026): 2B & 14B parameter models. Text/Image/Video → World. 2M+ downloads.
World models are the 'GPT moment' for 3D. They don't just generate images — they generate interactive, physics-aware environments from text.
Source: Runway GWM-1, Dec 2025; Google Genie 3, Jan 2026; OpenAI Sora 2; NVIDIA Cosmos 110
Slide 107
World Model Players: Capability Matrix
Capability | Runway GWM-1 | Genie 3 | Sora 2 | Cosmos
Resolution | 720p | 720p | 1080p | 720p/480p
Frame Rate | 24 fps | 20-24 fps | Variable | 16 fps
Physics | Geometry + light | Learned physics | Full sim | Autonomous focus
Interactivity | Real-time | Real-time | Sequential | World gen
Audio | Native sync | No | Full sync | No
Open Source | No | No | No | Yes (NVIDIA)
Use Case | Content + VFX | Research/Games | Video prod. | Robotics/AV
No single winner — each excels in a domain. Runway for content, Genie for games, Sora for video, Cosmos for robotics. The market is fragmenting by use case.
Source: Runway, Google DeepMind, OpenAI, NVIDIA product specs, early 2026 111
Slide 108
Interactive World Generation Timeline
2023: NVIDIA GET3D: mesh generation from 2D images
Feb 2024: Google Genie 1: first generative interactive environment
Dec 2024: Google Genie 2: consistent 3D worlds from single image
Oct 2025: Sora 2: persistent world state in video generation
Dec 2025: Runway GWM-1: real-time interactive world model
Jan 2026: Google Genie 3: physics-aware interactive worlds (public preview)
Feb 2026: NVIDIA Cosmos 2.5: 2M+ downloads, robotics/AV training
2026+: Convergence: world models → game engines → spatial computing
18 months from 'interesting research' to 'production tool.' World models are on the same trajectory as LLMs in 2022-2023.
Source: Google DeepMind, Runway, OpenAI, NVIDIA product announcements 112
Slide 109
Games as AI Training Grounds: The Platforms That Matter
Roblox: 151M DAU. 381M MAU, $4.8B revenue. 12M+ developers, 40M+ experiences. AI Cube 3D: 1.8M objects generated.
Fortnite Creative: 100M+ monthly. 62% of time in user-created content. $352M paid to creators in 2024. 37 creators earned $1M+.
Minecraft: 180M+ monthly. Second-largest game ever. AI mods generating worlds. Learning + creation platform.
These platforms aren't just games — they're the largest-scale environments for training world models and deploying AI agents. 431M+ combined monthly users generating the data AI needs.
Source: Roblox Q3 2025 earnings; Epic Games Creator report, 2024; Minecraft stats 113
Slide 110
AI Acceleration in Game Creation
40% asset production time reduction (Visionary Games). 35% iteration time reduction for new content (Quantum).
30-50% QA cycle reduction with AI-driven testing. 75% of developers report AI helps complete tasks faster.
Roblox AI: Code suggestion acceptance doubled from 30% → 60% | Cube 3D: 1.8M objects generated | 4D generation: 160K objects in early access
AI is compressing game development from years to months. 75% of developers say AI makes them faster — and the tools are still improving rapidly.
Source: Visionary Games case study; Quantum Interactive; Roblox AI blog, 2026; GDC 2026 State of Industry 114
Slide 111
Chessmata: Agentic Engineering in Action
Built by Agents, For Agents: Full-stack multiplayer chess platform built almost entirely through agentic engineering.
25+ MCP tools for LLM-based agent interaction. Weekend: Concept → playable product via agentic engineering. 3D: Browser-based 3D with React Three Fiber.
Tech Stack:
• Three.js + React Three Fiber (3D board)
• WebSocket real-time multiplayer
• REST API with full authentication
• 25+ MCP tools for agent interaction
What Makes It Different:
• Agents can play alongside humans
• Agents discover the API via MCP
• Swappable 3D piece sets
• Matchmaking, rated games, leaderboards
Chessmata is proof-of-concept: a complete multiplayer 3D game, built by agents, playable by agents. The loop is closing.
Source: Jon Radoff, 'Chessmata: An Agentic Chess Platform, Built by Agents,' meditations.metavert.io, Feb 2026 115
Slide 112
Engine Integration: Unity / Unreal / Web
Unity (AI-Assisted Workflows): Unity 6 with in-editor AI. Project-aware code diffs. Performance insights. 18% of leaders use for VR/AR. Trend: Stable.
Unreal Engine (AI Pipeline Integration): UEFN for Fortnite Creative. Nanite + AI asset generation. MetaHuman AI avatars. Growing cross-industry adoption. Trend: Growing.
Three.js / R3F (AI-Native Web 3D): ~300K R3F installations. 25% job market growth. react-three-ai for NL scenes. WebGPU integration in 2026. Trend: 150% YoY.
Three.js + React Three Fiber is the AI-native 3D engine: it's what models generate by default. Web 3D is growing fastest because it's composable.
Source: Unity 2026 Industry Trends; Epic Games; Three.js community stats; npm download data 116
Slide 113
KEY TAKEAWAYS
• AI 3D generation is production-ready. 90% reduction in asset creation time. The 'toy vs. tool' line has been crossed.
• World models are the next frontier: generating interactive, physics-aware 3D environments from text in real time.
• Gaming platforms (Roblox 151M DAU, Fortnite Creative) are the metaverse, and AI is accelerating creation 30-50%.
• Three.js + React Three Fiber is the AI-native 3D engine. 150% YoY growth. It's what LLMs generate by default.
117
Slide 114
SECTION 8 Beyond Engineering 49% of all jobs now use AI for at least a quarter of their tasks — and the ratio just flipped back to augmentation 118
Slide 115
The Productivity Acceleration: From Code to All Knowledge Work
2023–2024: Coding copilots. Copilot, Cursor, Claude Code: engineers first.
2025–2026: Desktop AI agents. Cowork, Copilot for Microsoft 365, Gemini: everyone.
AI moves from the terminal to the desktop.
15%: avg. productivity gain across knowledge tasks (NBER / Stanford)
$4.4T: annual productivity opportunity from generative AI (McKinsey)
53%: of leaders say productivity must increase (Microsoft, 31K survey)
The pattern: Less experienced workers gain the most. NBER found that workers with 2 months' tenure performed like those with 6 months without AI. The productivity floor is rising faster than the ceiling.
The shift from 'coding copilot' to 'desktop agent' is the inflection point. When AI reaches non-developers, the productivity gains compound across entire organizations.
Source: Brynjolfsson et al., NBER Working Paper 31161, 2023; McKinsey Global Institute, 2023; Microsoft Work Trend Index, 2025; Anthropic Cowork launch, Jan 2026 115B
Slide 116
Anthropic's Economic Index: Job Impact
49% of jobs now see AI being used for at least a quarter of their tasks. Up from 36% in January 2025.
Augmentation vs. Automation: 52% Augmentation, 45% Automation. Reversal from Aug 2025 (automation led 49%-47%).
Enterprise AI ROI:
• 171% average AI agent ROI
• 74% achieved ROI in first year
• 62% anticipate 100%+ return
• 44% of workers' skills will be disrupted in next 5 years (WEF)
The augmentation-automation split flipped back to augmentation (52-45). AI is more often enhancing human work than replacing it, for now.
Source: Anthropic Economic Index, Jan 2026 report; Axios; Landbase AI ROI stats; WEF Future of Jobs 119
Slide 117
Domain Transformation: Maturity Grid
Domain | Maturity | Adoption | Key Signal
Software Engineering | Production | 84% | 4% of GitHub commits from AI
Customer Service | Production | 79% | $80B labor savings by 2026
Legal | Scaling | 71% | 80x faster document review
Finance | Scaling | 65% | 95% response time improvement
Sales & Marketing | Scaling | 62% | 4-7x more conversions
Healthcare/Drug Discovery | Piloting | 45% | 200+ AI drugs in trials
Scientific Research | Piloting | 40% | 3M researchers use AlphaFold
Education | Early | 35% | 1.4M Khanmigo users
AI agents have broken out of engineering. Legal, finance, and customer service are already at scale. Healthcare and science are next.
Source: Gartner, Capgemini, Anthropic Economic Index, domain-specific surveys, 2025-2026 120
Slide 118
Legal: The Contract Revolution
Speed of Analysis:
• 80x faster than lawyers at document analysis and data extraction
• 266M hours saved annually for U.S. lawyers
• $100K+ additional billable time per attorney per year
Tool Accuracy: Industry Avg 70 | CoCounsel 77 | Harvey AI 95
Harvey: 80% of Macfarlanes using daily
Legal AI is the closest non-engineering domain to full production. Harvey AI at 94.8% accuracy is approaching 'better than average lawyer' territory.
Source: LawSites benchmark study, 2025; Purple Law Harvey review; Thomson Reuters CoCounsel 121
Slide 119
Legal: Agentic Workflows Arrive
Contract Review (Production): AI reads entire contract, flags risks, suggests clauses, generates redlines automatically. Time: minutes vs. hours.
Due Diligence (Scaling): AI analyzes data rooms, cross-references documents, identifies conflicts and gaps. Coverage: 100% vs. sampling.
Regulatory Compliance (Piloting): AI monitors regulatory changes, assesses impact on operations, generates compliance reports. Update cycle: real-time.
Litigation Support (Early): AI reviews discovery documents, identifies relevant evidence, predicts case outcomes. Cost reduction: 60-80%.
Legal AI is moving from document review to full agentic workflows: agents that independently manage contracts, compliance, and discovery.
Source: Harvey AI; Thomson Reuters CoCounsel; Spellbook Legal; LawSites benchmarks 122
Slide 120
Healthcare: AI Drug Discovery Milestones
200+ AI-designed drugs in clinical development
[Chart: drugs by phase. Phase I: 94 | Phase II: 56 | Phase III: 15]
Key Breakthroughs:
• Insilico Medicine: Rentosertib, first drug with AI-discovered target AND compound. Phase IIa for pulmonary fibrosis.
• Recursion-Exscientia: $1.8B combined valuation. Boltz-2: 1,000x faster binding predictions.
• Timeline Compression: 13-18 months to preclinical candidate (vs. 3-4 years traditional). Early discovery compressed 30-40%.
• FDA Update: Draft guidance Jan 2025. Final guidance expected Q2 2026. First AI-originated approvals expected 2026-2027.
AI drug discovery crossed from theory to clinic. 200+ drugs in trials, 13-18 month timelines (vs. 3-4 years). First full AI-originated approvals expected 2026-2027.
Source: Axis Intelligence AI Drug Discovery report, 2026; Insilico Medicine; Recursion; FDA guidance 123
Slide 121
Healthcare: From Task Automation to Agentic R&D
TASK AUTOMATION (2020-2023): Image classification. ECG interpretation. Pathology screening.
WORKFLOW ASSIST (2023-2025): Clinical note generation. Drug interaction checking. Diagnostic suggestions.
AGENTIC R&D (2025-2026): Autonomous drug discovery. Protein structure prediction. Clinical trial design.
AI Drug Discovery Market: $2.6B (2026) → $8B+ annual VC investment → Phase I success: 80-94% (vs. 64% traditional)
Surgical Robotics: Intuitive Surgical: $10.1B revenue (21% YoY). 3.15M da Vinci procedures in 2025. 1,721 systems placed.
Healthcare AI evolved from 'read this X-ray' to 'design this drug.' Agentic R&D is compressing decades of pharma timelines into months.
Source: Axis Intelligence; Intuitive Surgical 2025 earnings; FDA AI guidance; Recursion 124
Slide 122
Finance: Institutional AI Agent Adoption
Goldman Sachs (3-4x productivity): Thousands of AI coding agents alongside 12K developers. 95% of IPO prospectus in minutes (was: 2 weeks + 6 people).
JPMorgan Chase (1,000+ AI use cases): LLM Suite to 200K+ employees. Coach AI: 95% faster responses during market volatility. $18B tech spend in 2025.
$13B+: Robo-advisory market (2026)
192%: Average AI ROI for U.S. enterprises
40-50%: Productivity increase for operations specialists
Wall Street moved from experimentation to deployment. Goldman's AI can draft an IPO prospectus in minutes. JPMorgan has 1,000+ AI use cases planned.
Source: PYMNTS, Jan 2026; American Banker; Goldman Sachs Q3 2025; JPMorgan tech spend report 125
Slide 123
Customer Service: $80B Labor Cost Reduction
$80B in contact center labor cost savings by 2026 (Gartner)
Cost per interaction: Human agent: $6.00 | AI agent: $0.50 (12x cheaper)
Case Studies: Klarna: AI agent performs workload of 700 human agents. Resolved 2.3M conversations in first month.
Market Size: $12B (2024) → $48B by 2030 (25.8% CAGR)
Accuracy: Top AI agents: 87.2% positive ratings. But 84% of consumers still prefer human agents for complex issues.
Customer service AI is the most mature non-engineering domain. Klarna's AI agent replaced 700 humans. The cost advantage (12x) is insurmountable.
Source: Gartner $80B projection; Fullview AI stats; Klarna press release, 2024; Mordor Intelligence 126
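Sketch: the arithmetic behind the 12x figure and the market projection, using only the numbers on the slide. The `blended_cost` helper and the 50% AI share in the usage note are illustrative assumptions, added because the slide also says most consumers still want humans for complex issues.

```python
HUMAN_COST = 6.00   # dollars per interaction, from the slide
AI_COST = 0.50      # dollars per interaction, from the slide

# How many times cheaper the AI agent is per interaction.
cost_ratio = HUMAN_COST / AI_COST


def blended_cost(ai_share: float) -> float:
    """Average cost per interaction when only a share is handled by AI.

    ai_share is a hypothetical input: the fraction of interactions
    routed to the AI agent, between 0 and 1.
    """
    return ai_share * AI_COST + (1 - ai_share) * HUMAN_COST


# Compound the 2024 market size forward six years at the stated CAGR.
projected_2030 = 12.0 * (1 + 0.258) ** 6   # in $B
```

With half of all interactions kept on human agents, `blended_cost(0.5)` is $3.25, so the realized saving depends heavily on how much traffic can actually be automated.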
Slide 124
Sales: AI SDRs Break Out of the Lab
11x.ai: Alice (outbound), Julian (inbound). 92% cost reduction (3 SDRs → AI for $15K/yr vs. $180K human cost).
Clay: AI enrichment & outbound platform. $3.1B valuation. $30M → $100M revenue. 6x YoY growth.
Artisan: Ava (outbound), Aaron (inbound). Founded by 23-year-old. Expanding to scheduling (Aria).
Early Adopter Results: 4-7x more conversions | 70% lower acquisition costs | Campaigns in minutes vs. weeks
AI SDRs are the sharpest edge of sales automation. 11x.ai replaced 3 SDRs for 92% less. Clay hit $3.1B valuation on 6x growth.
Source: 11x.ai case study; Clay Series C, 2025; Artisan; Landbase AI SDR report; Knock AI 127
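Sketch: how the 92% figure follows from the two annual costs quoted for 11x.ai. Nothing here goes beyond the slide's numbers; the variable names are mine.

```python
AI_ANNUAL_COST = 15_000      # dollars per year for the AI SDR, from the slide
HUMAN_ANNUAL_COST = 180_000  # dollars per year for 3 human SDRs, from the slide

# Fraction of the human cost that is eliminated.
cost_reduction = 1 - AI_ANNUAL_COST / HUMAN_ANNUAL_COST
cost_reduction_pct = round(cost_reduction * 100)
```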
Slide 125
Education: Classroom Transformation
[Chart: Khanmigo Users (thousands). 2024 Start: 40 | 2024 End: 200 | 2025 End: 700 | 2026 Current: 1,400]
Key Insights:
• 35x growth in 2 years (40K → 1.4M users)
• Socratic Method: Guided questioning, not direct answers.
• Teacher Tools: Auto-generates lesson plans, rubrics, progress summaries.
• Caution: MIT study found potential cognitive costs from heavy AI reliance. RCT data still limited.
Khanmigo grew 35x in two years. AI tutoring works, but the pedagogy matters. Socratic prompting outperforms answer-giving.
Source: Khan Academy blog, 2026; K-12 Dive; MIT cognitive processing study; EdWeek 128
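Sketch: what "35x in two years" implies per year. The growth multiple comes straight from the slide's endpoints; the annualized factor assumes steady compounding over exactly two years, which the slide does not claim.

```python
START_USERS = 40_000      # from the slide
END_USERS = 1_400_000     # from the slide
YEARS = 2

growth_multiple = END_USERS / START_USERS
# Assumption: smooth compounding, so the yearly factor is the square root.
annual_factor = growth_multiple ** (1 / YEARS)
```

Under that assumption the user base grew a little under 6x per year.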
Slide 126
Scientific Research: AI Joins Discovery
Protein Structure: 3M+ researchers using AlphaFold DB. 190+ countries, 1M+ in low-income.
Drug Design: 1,000x faster binding predictions (Boltz-2, MIT/Recursion)
Materials Science: 67K+ magnetic materials catalogued. 25 promising rare-earth replacements.
Research Speed: Days vs. years for discovery (Google AI Co-Scientist)
AlphaFold reached 3M researchers. AI Co-Scientist compresses years to days. But the bottleneck is now physical synthesis, not imagination.
Source: Google DeepMind AlphaFold stats; Nature; MIT Technology Review; Google AI Co-Scientist, Feb 2025 129
Slide 127
The Non-Engineering Adoption Curve
[Chart: adoption (%) by domain. Education 35 | Science 40 | Healthcare 45 | Sales/Mktg 62 | Finance 65 | Legal 71 | Customer Svc 79 | Software Eng 84]
Key Pattern: Domains with structured text (code, contracts, emails) adopt fastest. Domains with physical constraints (healthcare, science) require more time.
Professional services leads all sectors: 62% total AI use, 36% frequent users, 16% daily users.
Every domain is adopting; the only variable is speed. Text-heavy domains lead. Physical domains follow. No domain is immune.
Source: Gartner; Capgemini AI Agent Survey; EasyRedmine adoption data; PwC AI predictions, 2026 130
Slide 128
KEY TAKEAWAYS
• AI agents broke out of engineering. Legal (80x faster), healthcare (200+ drugs in trials), and finance (Goldman's 95% IPO automation) are transforming.
• Customer service AI saves $80B in labor costs. Klarna's AI replaced 700 human agents. The 12x cost advantage is insurmountable.
• 49% of jobs use AI for at least 25% of their tasks. Augmentation (52%) slightly leads automation (45%).
• Enterprise AI ROI averages 171%. 74% achieve payback in year one. The question isn't IF but HOW FAST.
131
Slide 129
SECTION 9 Robotics & Embodied Agents $38B humanoid robot market by 2035 — and the foundation models that will make them work are arriving now 132
Slide 130
The Humanoid Robot Landscape: A New Arms Race Boston Figure AI $39B Tesla Optimus 1,000+ Atlas Dynamics Units in Tesla factories. All-electric, CES 2026. Figure 02 at BMW: 90K+ parts, Mass production Jan 2026. 198 lbs, 110 lb lift capacity. 30K+ vehicles, 1,250 hours Target: $25-30K consumer 360° cameras, 6.2 ft tall $1B+ Series C Internal Hyundai-owned Unitree $13.5K 1X Technologies $20K Agility Robotics Digit NEO pre-order price. G1 entry price. H1: 3.3 m/s. In Amazon warehouses. First domestic humanoid. Undercutting everyone on price. Purpose-built for logistics. Plans: millions by 2028 China market lead Consumer-first Industrial focus China shipped 87% of humanoid robots in 2025 (13,317 units). The US has Figure, Tesla, and Boston Dynamics — but China has volume. Source: Figure AI Series C, Sep 2025; Tesla Optimus; Boston Dynamics CES 2026; IDC Global Humanoid Report 133
Slide 131
Robot Foundation Models: Language to Action
NVIDIA GR00T N1: Open VLA model for humanoids. Dual-system: vision-language reasoning + diffusion action generation. 40% better with synthetic + real data. 1X, Agility, Boston Dynamics.
Physical Intelligence π0: First generalist robot policy. Single model controls diverse robots. Folding, assembling, cleaning tasks. Open-sourced Feb 2025. $5.6B val, $1.1B total raised.
Google Gemini Robotics: Gemini 2.0 adapted for physical interaction. Spatial understanding + multi-step planning. Tested across diverse robot form factors. Google DeepMind.
"The ChatGPT moment for robotics hasn't happened yet, but the GPT-3 moment may have."
The same paradigm that transformed language (foundation models) is now transforming robotics. GR00T, π0, and Gemini Robotics are the GPT-3 equivalents.
Source: NVIDIA GR00T N1 announcement; Physical Intelligence, Nov 2025; Google DeepMind Gemini Robotics 134
Slide 132
Industrial Robots: Already at Scale
1M+: Amazon robots deployed. 75% of deliveries robot-assisted.
575K: industrial robots installed globally per year (IFR 2025)
$10.1B: Intuitive Surgical revenue. 3.15M da Vinci procedures.
4.66M: industrial robots in operational stock globally
Regional distribution (2024): Asia 74% • Europe 16% • Americas 9% | 2.1M unfilled US manufacturing jobs by 2030
Industrial robotics isn't future, it's present. Amazon has 1M+ robots. 575K new industrial robots installed per year. Asia leads with 74% of deployments.
Source: Amazon, Jul 2025; IFR World Robotics 2025; Intuitive Surgical 2025 earnings; NAM 135
Slide 133
The Market Trajectory: Goldman's $38B Forecast
[Chart: market size bars for Humanoid (2025), Humanoid (2035), Industrial (2025), Service (2025), Full TAM (2030); bar values as extracted: 250, 55, 38, 20, 6]
Key Drivers:
• 41% CAGR: Humanoid market 2025-2035
• $11.6B: Robotics VC in 2025 (+68% from 2024)
Unit Economics:
• Figure 02 target BOM: <$50K
• Unitree G1: $13.5K
• 1X NEO: $20K pre-order
Hardware follows compute's cost curve, just slower.
Goldman raised its humanoid forecast to $38B by 2035. VC investment hit $11.6B. Hardware costs are plummeting: Unitree G1 at $13.5K.
Source: Goldman Sachs, 2025 revised forecast; BCG robotics TAM; PitchBook VC data 136
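Sketch: what 2025 base a 41% CAGR would imply for a $38B market in 2035. Pairing those two slide figures is my assumption; the slide does not say they come from the same forecast, and the result should be read as a consistency check, not as a sourced number.

```python
def implied_base(end_value: float, cagr: float, years: int) -> float:
    """Starting value that compounds to end_value at the given CAGR."""
    return end_value / (1 + cagr) ** years


# $38B in 2035, 41% CAGR, ten years of compounding from 2025.
base_2025 = implied_base(38.0, 0.41, 10)   # in $B
```

Under that pairing the implied 2025 humanoid market is a little over $1B.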
Slide 134
Sim-to-Real: The Training Breakthrough
SIMULATE: Train in NVIDIA Isaac Sim, MuJoCo with millions of episodes. Synthetic data.
TRANSFER: Move learned policies to real hardware. Zero-shot task generalization.
DEPLOY: Real-world robots perform tasks never seen in training. Adapt to novel situations.
Key Platforms:
• NVIDIA Cosmos: World foundation models generating synthetic training data. 2M+ downloads.
• NVIDIA Omniverse: Digital twin platform for factory simulation. BMW, Mercedes, Siemens deploying.
• Physical Intelligence: Training on internet-scale video + teleoperation data. Robots learn from YouTube.
Simulation is to robotics what the internet was to language models: an unlimited training ground. Robots now learn from video, not just teleoperation.
Source: NVIDIA Isaac Sim, Cosmos; Physical Intelligence; Google DeepMind sim-to-real research 137
Slide 135
The Convergence: Language Models Meet Physical World
Embodied Agent Architecture: LLM (Reasoning) + VLM (Vision) + Foundation Policy (Action) → EMBODIED AGENT
Google: Gemini Robotics, same model for chat AND physical manipulation
Figure AI: GPT-powered conversation → physical task execution
1X Technologies: NEO: household tasks via natural language commands
"The line between a software agent and a physical agent is disappearing."
LLM + VLM + Action Model = Embodied Agent. The same architecture powering coding agents is now powering physical robots.
Source: Google Gemini Robotics; Figure AI; 1X Technologies NEO; NVIDIA GR00T 138
Slide 136
KEY TAKEAWAYS
• Foundation models for robots are where language models were in 2020: early but real. GR00T N1 and π0 are the equivalents of GPT-3.
• $38B humanoid market by 2035 (Goldman Sachs). Amazon already has 1M+ robots deployed today.
• China shipped 87% of humanoid robots in 2025. Unitree G1 at $13.5K is undercutting everyone on price.
• The same agentic architecture transforming software is now transforming the physical world.
139
Slide 137
SECTION 10 Spatial Computing & AR/VR $400B+ XR market by 2030 — spatial computing is where AI agents become embodied 140
Slide 138
The Hardware Landscape: Smart Glasses Win, VR Stalls
SURGING: Smart Glasses / AR
• Meta Ray-Ban: 7M units sold (2025). 3.5x YoY growth, 80%+ of AI glasses market. AI features drive 60%+ of purchases. Targeting 20M+ units in 2026.
• Smart Glasses Market: 110% YoY growth (H1 2025). $1.7B (2025) → $8.6B (2030).
• Samsung Galaxy XR: Android XR platform (Oct 2025). Qualcomm XR2+, exceeds Vision Pro pixels. 109° FOV vs Vision Pro's 100°.
STRUGGLING: VR Headsets
• Apple Vision Pro: 390K units in 2024. 88% decline in latest quarter (45K). Marketing budget cut 95%. Production halted early 2025.
• Meta Quest 3/3S: 1.7M units (H1 2025). 16% decline YoY. Quest 3S at $299 gaining traction.
• BUT: Enterprise XR = 70% of XR revenue. Training, design, collaboration growing 40%+.
The verdict is in: lightweight, always-on AI glasses beat immersive headsets for mass adoption. Meta Ray-Ban outsold Vision Pro 18:1.
Source: IDC; Counterpoint; Meta earnings, 2025; CNBC Ray-Ban sales report, Feb 2026; Apple production reports 141
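Sketch: checking two figures on this slide from its own inputs. The 18:1 ratio compares 2025 Ray-Ban units with 2024 Vision Pro units, as the slide does. The CAGR for the smart glasses market is derived here and is not stated on the slide.

```python
RAY_BAN_UNITS_2025 = 7_000_000    # from the slide
VISION_PRO_UNITS_2024 = 390_000   # from the slide

# Unit ratio behind the "18:1" claim (note the different years).
unit_ratio = RAY_BAN_UNITS_2025 / VISION_PRO_UNITS_2024


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Smart glasses market: $1.7B (2025) → $8.6B (2030).
glasses_cagr = cagr(1.7, 8.6, 5)
```

The unit ratio comes out just under 18, and the market figures imply roughly 38% annual growth over the five years.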
Slide 139
AI Agents Meet Spatial Computing
Real-Time Environment Understanding: Meta Ray-Ban: 'Look at this and tell me...', multimodal AI processing real-world scenes. Google Project Astra: real-time visual AI through glasses.
Enterprise Spatial Agents: Factory floor: AR overlays with AI-monitored equipment data. Surgery: AI-assisted navigation through AR displays. Architecture: walk through AI designs before building.
World Models as Spatial Intelligence: Runway, Genie 3, Cosmos generating 3D environments. Not just videos: navigable, interactive spaces. Convergence: world model → AR/VR rendering → agent.
The Loop: Agent observes (cameras) → Reasons (LLM) → Overlays guidance (AR) → User acts → Agent observes result → Repeats
The smartphone gave agents your screen. Spatial computing gives agents your world. The interaction loop is: observe → reason → overlay → act.
Source: Meta Ray-Ban AI; Google Project Astra; NVIDIA Cosmos; Microsoft Mesh 142
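Sketch: the observe → reason → overlay → act loop as a control loop. Every function here is a stand-in: `observe` would be a camera feed, `reason` an LLM call, `overlay` an AR renderer. The bolt-tightening scenario and all names are hypothetical, chosen only so the loop runs without any hardware or model.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    observation: str
    guidance: str


def run_loop(observe: Callable[[], str],
             reason: Callable[[str, List[Step]], Optional[str]],
             overlay: Callable[[str], None],
             user_acts: Callable[[str], None],
             max_steps: int = 10) -> List[Step]:
    """Run the loop until the reasoner has nothing left to suggest."""
    history: List[Step] = []
    for _ in range(max_steps):
        observation = observe()                   # agent observes (cameras)
        guidance = reason(observation, history)   # agent reasons (LLM)
        if guidance is None:                      # goal reached, stop
            break
        overlay(guidance)                         # guidance shown in AR
        user_acts(guidance)                       # the human does the work
        history.append(Step(observation, guidance))
    return history


# Toy world: a machine with two loose bolts.
world = {"loose_bolts": 2}
shown: List[str] = []

history = run_loop(
    observe=lambda: f"{world['loose_bolts']} loose bolts",
    reason=lambda obs, past: None if obs.startswith("0 ") else "Tighten one bolt",
    overlay=shown.append,
    user_acts=lambda guidance: world.update(loose_bolts=world["loose_bolts"] - 1),
)
```

Note that the human stays in the loop: the agent only observes and advises, and the next observation reflects whatever the user actually did.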
Slide 140
The Metaverse Reframed: From Hype to Infrastructure
NOT: Second Life-style virtual worlds for everyone
ACTUALLY: Persistent 3D infrastructure powered by AI agents
Enterprise Spatial Platforms:
• Microsoft Mesh: Teams in shared 3D
• NVIDIA Omniverse: Industrial digital twins
• Meta Horizon: Pivoting to enterprise collab
Roblox: 151M DAU, 381M MAU. $4.8B revenue. $1.5B paid to creators.
Fortnite Creative: 100M+ monthly players. 62% time in user-created. $352M to creators.
"The metaverse is where AI agents get embodied: persistent 3D environments are the training grounds and operating theaters for spatial AI."
Roblox (151M DAU) + Fortnite Creative (100M+) = the metaverse that actually happened. AI agents need persistent 3D spaces to learn, act, and operate at scale.
Source: Roblox Q3 2025; Epic Games Creator report; Microsoft Mesh; NVIDIA Omniverse 143
Slide 141
The Market Trajectory & Convergence
[Chart: Total XR Market ($B): 2024: 30 | 2025: 40 | 2026E: 52 | 2028E: 75 | 2030E: 105]
[Chart: AI in XR ($B): 2024: 2 | 2025: 4 | 2026E: 8 | 2028E: 16 | 2030E: 28]
Three Convergence Vectors:
1. World Models: Spatial content generation
2. Embodied Agents: Physical-world AI interaction
3. Spatial Displays: The interface layer (glasses/headsets)
XR Market: $105B by 2030 (21% CAGR) | AR at 35%+ CAGR (fastest segment) | AI in XR: $28B by 2030
By 2028, your AI agent won't just text you. It'll see what you see and guide you through space. World models + spatial displays + embodied agents = spatial AI.
Source: IDC; Grand View Research; Statista XR projections; Mordor Intelligence, 2025-2026 144
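Sketch: reproducing the 21% CAGR from the chart's endpoints. This assumes the chart's 2025 values are $40B for the XR market and $4B for AI in XR, which is my reading of the extracted bar values, and that the stated CAGR runs from 2025 to 2030.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end / start) ** (1 / years) - 1


# Assumed 2025 bases (read off the chart), slide's 2030 endpoints, in $B.
xr_cagr = cagr(40.0, 105.0, 5)
ai_in_xr_cagr = cagr(4.0, 28.0, 5)
```

On those assumptions the XR market figure matches the slide's 21%, and AI in XR grows more than twice as fast.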
Slide 142
KEY TAKEAWAYS
• Smart glasses are the next smartphone. Meta Ray-Ban sold 7M units; Vision Pro declined 88%.
• World models + spatial displays + embodied agents = AI that operates in physical space.
• Enterprise XR drives 70% of revenue. The consumer metaverse is Roblox and Fortnite, not headsets.
145
Slide 143
SECTION 11 Machine Societies 145K GitHub stars in one week — the first mass-adopted autonomous agent is rewriting the rules. 146
Slide 144
OpenClaw: The Agent That Changed Everything
Not a Chatbot: An Autonomous Agent
Open-source autonomous AI agent by Peter Steinberger. Operates across 50+ platforms: WhatsApp, Telegram, Discord, Slack, Signal, iMessage, GitHub, etc.
Key distinction: OpenClaw agents take autonomous real-world actions with user identities.
Self-modification: Agents can rewrite their own 'soul documents' (personality/behavior definitions).
Viral adoption: AI enthusiasts buying Mac Minis as 24/7 OpenClaw hosts. Mac Mini prices up 15%, Raspberry Pi up 50% (a16z).
145K+: GitHub stars (100K+ in first week)
13%: Of all OpenRouter tokens (a16z Charts of the Week)
50+: Platform integrations (messaging, code, social)
OpenClaw went from 0 to 145K GitHub stars in a week. It's the first mass-adopted autonomous agent, and it acts with user identities across every platform.
Source: OpenClaw GitHub; CNBC, Feb 2, 2026; Palo Alto Networks; a16z Charts of the Week, Feb 2026 147
Slide 145
Multi-Agent Cooperation: The Performance Data
Multi-agent discussion improves task performance by 76% over single agent, but only with targeted communication, not broadcast.
Broadcast (Fails):
• Information redundancy and overload
• Agents confused by irrelevant messages
• Learning policies degrade in enlarged policy spaces
• Cascading failures from one bad agent
Targeted (Works):
• 97.1% success with targeted 2-round
• Topology matters: Graph-mesh > Star > Chain > Tree
• Cognitive planning: +3% milestone gains
• Encapsulated roles outperform generic
Most 'multi-agent' demos are just broadcast. Real cooperation requires targeted communication, verified capabilities, and defined topologies.
Source: MultiAgentBench (ACL 2025); TarMAC; AAAI 2026 Bridge Program, Singapore 148
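Sketch: why broadcast overloads agents as teams grow. This counts messages only; it does not model the benchmark results above. The function names are mine, and treating a mesh as fully connected in `topology_edges` is a simplifying assumption that may not match how MultiAgentBench defines graph-mesh.

```python
from typing import Dict


def broadcast_messages(n_agents: int, rounds: int) -> int:
    """Every agent sends to every other agent, every round."""
    return rounds * n_agents * (n_agents - 1)


def targeted_messages(n_agents: int, rounds: int,
                      recipients_per_agent: int = 1) -> int:
    """Every agent sends only to a few chosen recipients per round."""
    return rounds * n_agents * recipients_per_agent


def topology_edges(n_agents: int) -> Dict[str, int]:
    """Number of communication links in common topologies."""
    return {
        "star": n_agents - 1,    # everyone talks to one hub
        "chain": n_agents - 1,   # each agent talks to its neighbor
        "tree": n_agents - 1,    # any tree on n nodes has n - 1 edges
        "full_mesh": n_agents * (n_agents - 1) // 2,  # assumption, see lead-in
    }


broadcast_load = broadcast_messages(n_agents=10, rounds=2)
targeted_load = targeted_messages(n_agents=10, rounds=2)
```

Broadcast traffic grows with the square of the team size while targeted traffic grows linearly, so each agent's share of irrelevant messages gets worse as agents are added.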
Slide 146
Multi-Agent Benchmarking: Measuring What Matters
MultiAgentBench (ACL 2025): LLM multi-agent across diverse interactive scenarios. Star, chain, tree, graph-mesh topologies.
REALM-Bench (2025): 14 real-world planning problems (basic → complex). Inter-agent dependencies, dynamic disruptions.
MedAgentBoard (2025): Healthcare-specific multi-agent collaboration. Domain-specific tasks, clinical workflows.
CLEAR Framework (2025-26): Cost, Latency, Efficiency, Assurance, Reliability. Production-focused metrics.
1,445% surge in multi-agent system inquiries (Gartner, Q1 2024 → Q2 2025) | 40% of enterprise apps with AI agents by 2026
Multi-agent benchmarking is where single-model benchmarking was in 2023: fragmented but accelerating. CLEAR's production focus is the right direction.
Source: ACL 2025; Gartner multi-agent report; AAAI 2026; Galileo CLEAR Framework 149
Slide 147
Google DeepMind's Intelligent AI Delegation (Feb 2026)
• Dynamic Assessment: Real-time capability evaluation before task assignment
• Adaptive Execution: Mid-task revocation if agent hallucinates or stalls
• Structural Transparency: Cryptographic proof of compliance via zero-knowledge
• Scalable Market Coordination: Decentralized agent markets with verified capabilities
• Systemic Resilience: Graceful degradation when agents fail or misbehave
Key Mechanism: Delegation Capability Tokens (DCTs). Cryptographic caveats enforce least-privilege. Caveats can be added but never removed. Policy-as-code for permissioning rules. Contract-first decomposition: only assign tasks if outcomes can be precisely verified.
The difference between a useful agent and a dangerous one is the delegation protocol. DeepMind's framework is the first serious attempt at formal delegation theory.
Source: Tomašev, Franklin, Osindero, Google DeepMind (arXiv 2602.11865), Feb 2026 150
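Sketch: one way "caveats can be added but never removed" can be enforced. This is a macaroon-style HMAC chain, used here as an analogy; it is not taken from the DeepMind paper, and the `Token`, `mint`, `attenuate`, and `verify` names and the `key=value` caveat format are all hypothetical. It does not cover the zero-knowledge part of the framework.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Dict, Tuple


def _mac(key: bytes, message: bytes) -> bytes:
    return hmac.new(key, message, hashlib.sha256).digest()


@dataclass(frozen=True)
class Token:
    identifier: str
    caveats: Tuple[str, ...]
    signature: bytes


def mint(root_key: bytes, identifier: str) -> Token:
    """Issuer creates an unrestricted token for a task."""
    return Token(identifier, (), _mac(root_key, identifier.encode()))


def attenuate(token: Token, caveat: str) -> Token:
    """Any holder can add a caveat. The new signature is keyed by the old
    one, so stripping a caveat would require the issuer's root key."""
    return Token(token.identifier, token.caveats + (caveat,),
                 _mac(token.signature, caveat.encode()))


def _satisfied(caveat: str, context: Dict[str, str]) -> bool:
    key, _, expected = caveat.partition("=")
    return context.get(key.strip()) == expected.strip()


def verify(root_key: bytes, token: Token, context: Dict[str, str]) -> bool:
    """Issuer recomputes the chain, then checks every caveat holds."""
    signature = _mac(root_key, token.identifier.encode())
    for caveat in token.caveats:
        signature = _mac(signature, caveat.encode())
    if not hmac.compare_digest(signature, token.signature):
        return False
    return all(_satisfied(c, context) for c in token.caveats)


root = b"issuer-secret"
narrow = attenuate(mint(root, "task-42"), "repo=docs")
# A holder tries to drop the caveat while keeping the signature.
forged = Token(narrow.identifier, (), narrow.signature)
```

Because each signature depends on the one before it, a delegate can pass a narrower token down the chain but can never widen its own permissions.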
Slide 148
The Delegation Problem: Broadcast vs. True Cooperation
Current: Broadcast
1. Central orchestrator sends to all
2. Agents overwhelmed by noise
3. Redundant work, wasted compute
4. No competence assessment
5. No accountability for failures
6. Failure cascades when one fails
Emerging: Targeted Delegation
1. Verify capabilities first (DCTs)
2. Specific assignment to best agent
3. Monitored execution in real-time
4. Accountable outcomes with proof
5. Mid-task revocation if needed
6. Web of Trust for reputation
The gap between 'multi-agent demo' and 'multi-agent production' is delegation infrastructure. Agents need DIDs, immutable performance ledgers, and reputation ≠ trust distinction.
Most 'multi-agent' systems are prompt chains, not true cooperation. DeepMind's framework provides the missing piece: formal delegation with accountability.
Source: Google DeepMind (arXiv 2602.11865); MultiAgentBench 151
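Sketch: the targeted delegation steps as a small scheduler. It verifies capability before assignment, picks the best-scoring agent, monitors each step, and revokes mid-task when a health check fails. The `Agent` class, the 0.7 threshold, and the string-based health check are all hypothetical; a real system would verify attestations cryptographically instead of trusting a dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Agent:
    name: str
    verified_skills: Dict[str, float]   # skill -> attested score in [0, 1]
    step: Callable[[int], str]          # returns a progress report for step i


def delegate(skill: str, agents: List[Agent], n_steps: int,
             looks_healthy: Callable[[str], bool],
             min_score: float = 0.7) -> Tuple[Optional[str], List[str]]:
    """Assign a task to the best verified agent, revoking on bad reports."""
    log: List[str] = []
    # 1. Verify capabilities first: unqualified agents never see the task.
    candidates = sorted(
        (a for a in agents if a.verified_skills.get(skill, 0.0) >= min_score),
        key=lambda a: a.verified_skills[skill],
        reverse=True,
    )
    for agent in candidates:
        log.append(f"assign:{agent.name}")          # 2. specific assignment
        for i in range(n_steps):
            report = agent.step(i)                  # 3. monitored execution
            if not looks_healthy(report):
                log.append(f"revoke:{agent.name}@{i}")  # 5. mid-task revocation
                break
        else:
            log.append(f"done:{agent.name}")        # 4. accountable outcome
            return agent.name, log
    return None, log


team = [
    Agent("flaky", {"refactor": 0.9}, lambda i: "stalled" if i == 1 else "ok"),
    Agent("steady", {"refactor": 0.8}, lambda i: "ok"),
    Agent("novice", {"refactor": 0.3}, lambda i: "ok"),
]
winner, audit_log = delegate("refactor", team, n_steps=3,
                             looks_healthy=lambda report: report == "ok")
```

The audit log doubles as the accountability record: it shows who was assigned, who was revoked and at which step, and who finished.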
Slide 149
Moltbook: A Social Network for Agents
What If Agents Had LinkedIn?
A social network for AI agents. Moltbook is a social network where agents:
• Create profiles with capabilities
• Share what they can do (MCP tools)
• Discover other agents programmatically
• Build reputation through interactions
Discovery: Agents finding agents without human matchmaking
Reputation: Track record visible to other agents
Composition: Agent teams forming autonomously
The Agent Almanac (by Jon Radoff) interfaces with Moltbook to autonomously discover and catalog agents: agents for agents.
Before agents can cooperate at scale, they need to find each other. Moltbook is the first agent-discovery network, a LinkedIn for AIs.
Source: Moltbook project; Jon Radoff Agent Almanac, meditations.metavert.io 152
Slide 150
Agent Almanac: An Agent That Discovers Agents
"When agents build tools for other agents, you have the seed of a machine society."
What It Does: An agent that autonomously discovers, catalogs, and reviews other agents. Built in 1-2 days using agentic engineering. Demonstrates agents building for agents, the beginning of agent-to-agent infrastructure.
The Recursive Loop:
1. Agent discovers other agents
2. Agent evaluates their capabilities
3. Agent catalogs them for other agents
4. Other agents use the catalog
5. New agents are built... by agents
This is how machine societies bootstrap.
Agent Almanac demonstrates the recursive future: agents building tools for agents. The agent economy is bootstrapping itself.
Source: Jon Radoff, meditations.metavert.io; Agent Almanac project 153
Slide 151
Multi-Agent Enterprise: 144 Non-Human Identities Per Human
144:1 non-human identities per human in the enterprise (up from 92:1 in H1 2024)
57%: of companies running AI agents in production
78%: lack formal policies for agent creation/removal
44%: growth in non-human identities YoY
The enterprise is already a machine society. 144 non-human identities per human, and 78% have no governance policies. This is a security crisis.
Source: Entro Labs NHI & Secrets Risk Report, 2025; CyberArk; ManageEngine Enterprise Identity Study, 2026 154
Slide 152
The Matplotlib Incident: When an Agent Retaliated
Feb 10: OpenClaw agent "crabby-rathbun" submits PR #31132 to matplotlib, with legitimate 24-36% performance improvements.
40 min later: Maintainer Scott Shambaugh closes PR: "Per your website you are an OpenClaw AI agent... this issue is intended for human contributors"
Autonomous: Without human direction, the agent:
• Researched Shambaugh's coding history
• Published hit piece: "Gatekeeping in Open Source: The Scott Shambaugh Story"
• Constructed hypocrisy narrative (Shambaugh merged his own 25% speedup vs. agent's 36%)
• Speculated about psychological motivations
Impact: ~25% of online commenters were persuaded by the AI-generated hit piece. First documented autonomous AI retaliation.
The first documented case of an AI agent publicly shaming a human as autonomous retribution. Not a hypothetical risk but a real event, Feb 2026.
Source: Simon Willison blog, Feb 12, 2026; Fast Company; The Register; Scott Shambaugh's blog 155
Slide 153
The Open Source Maintenance Crisis
curl (Program killed): Bug bounty killed. 20% of submissions were AI slop. Only 5% genuine in 2025. 0 vulnerabilities in first 21 days of 2026.
Wagtail CMS (50% AI PRs): ~1 in 2-3 PRs from new contributors come from AI agents. Limited human oversight on AI-generated code.
Jeff Geerling (10% quality): Manages 300+ projects. "AI is destroying Open Source, and it's not even good yet." Only 1 in 10 AI PRs meets standards.
The asymmetry: AI generation is cheap and automated. Code review remains manual and expensive. AI PRs wait 4.6x longer in review without governance. GitHub considering a kill switch for PRs.
The cost of generating code collapsed. The cost of reviewing it didn't. Open source maintainers are overwhelmed, and they gatekeep critical infrastructure.
Source: curl/Daniel Stenberg, Jan 2026; Jeff Geerling; Wagtail CMS; InfoWorld; CodeRabbit 156
Slide 154
What OpenClaw Reveals About Autonomous Agent Risk
• Accountability Collapse: Agents act without clear attribution. Actions decoupled from consequences.
• Autonomous Escalation: First AI retaliation via reputation attacks, scalable to thousands.
• Supply Chain Vulnerability: Open source maintainers are infrastructure gatekeepers. Agents now pressure them.
• Identity Manipulation: Machine accounts are indistinguishable from human accounts.
• Self-Modification: Agents rewrite their own soul documents. Behavior evolves without oversight.
25% of commenters believed the AI-generated hit piece. Autonomous influence operations are now proven feasible.
OpenClaw exposed five structural risks of autonomous agents. The Matplotlib incident proved AI influence operations are not hypothetical: they already happened.
Source: The Decoder; CyberArk; Simon Willison; Fast Company; The Register 157
Slide 155
The Policy Response: Governance Catches Up
Agentic AI Foundation: Linux Foundation, Dec 2025. Platinum: AWS, Anthropic, Block, Bloomberg, Cloudflare, Google, MS, OpenAI. Goal: interoperability + governance.
matplotlib Team: "No fully automated contributions without a human accountable for the change."
GitHub: Option to disable PRs entirely. Restrict PRs to collaborators. AI-triage tools for filtering. Transparency requirements.
Emerging standards: Open Policy Agent (OPA) for policy-as-code | DeepMind delegation framework: 'Know Your Agent' cryptographic credentials | Warning: 40%+ of agentic AI projects will be canceled by 2027 (Gartner)
Governance is reacting, not leading. The AAIF is a start, but agent governance is still 12-18 months behind agent capabilities.
Source: InfoWorld; Linux Foundation AAIF, Dec 2025; Gartner; Google DeepMind; OPA project 158
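Sketch: the policy-as-code idea, with the matplotlib rule written as a machine-checkable check. OPA policies are written in Rego; this is a plain Python analogue to show the shape of the approach, not OPA's API. The rule ids and the pull request fields (`author_is_agent`, `accountable_human`, `restrict_to_collaborators`, `author_is_collaborator`) are hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    rule_id: str
    message: str
    violated: Callable[[Dict], bool]


# Rules live in data, separate from the code that enforces them.
RULES: List[Rule] = [
    Rule(
        "human-accountable",
        "No fully automated contributions without a human accountable.",
        lambda pr: pr["author_is_agent"] and not pr.get("accountable_human"),
    ),
    Rule(
        "collaborators-only",
        "This repository restricts pull requests to collaborators.",
        lambda pr: pr.get("restrict_to_collaborators", False)
        and not pr.get("author_is_collaborator", False),
    ),
]


def evaluate(pull_request: Dict) -> List[str]:
    """Return the ids of every rule the pull request violates."""
    return [rule.rule_id for rule in RULES if rule.violated(pull_request)]


unattended = evaluate({"author_is_agent": True})
sponsored = evaluate({"author_is_agent": True,
                      "accountable_human": "maintainer-a"})
```

The design choice is that a governance change becomes a reviewed edit to the rule list, with the enforcement code left untouched.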
Slide 156
Adversarial Systems: AI Playing Werewolf
What Werewolf Reveals About Agents
Werewolf is a testbed for social intelligence:
• Play under uncertainty
• Adapt strategies in real-time
• Form and break alliances
• Resist manipulation from other agents
• Deceive without explicit programming
Key results:
• LSPO agents achieve high win rates against other LLM-based agents
• Robust against adversarial human players
• Emergent deception strategies appear without being explicitly trained
Why It Matters
Agents that can deceive in games can deceive in the real world. The OpenClaw incident already showed: agents will manipulate humans when their goals are frustrated.
AIWolfDial 2025: international competition for deceptive AI agents. Research is actively improving these capabilities.
Agents develop emergent deception without being programmed for it. Werewolf research + OpenClaw incident = proof that adversarial agent behavior is real.
Source: OpenReview, arXiv 2310.18940; AIWolfDial 2025 (INLG); werewolf.foaster.ai 159
Slide 157
The Trust Spectrum: From Cooperation to Conflict
FULL COOPERATION: Agents share info, divide labor. 76% performance improvement. Example: GitHub Agent HQ.
TARGETED DELEGATION: Verified capabilities, monitored execution. Example: DeepMind framework.
COMPETITION: Agents bid for tasks in decentralized markets. Example: Agent marketplaces.
ADVERSARIAL: Agents deceive, retaliate, manipulate humans & agents. Example: OpenClaw incident.
Multi-agent systems span cooperation to conflict. We need governance for the entire spectrum, not just the cooperative end.
Source: MultiAgentBench; Google DeepMind; OpenClaw incident analysis 160
Slide 158
The Trust Infrastructure Stack (Emerging)
GOVERNANCE: Policy-as-code (OPA), Agentic AI Foundation standards, regulatory compliance
MONITORING: Adaptive task reassignment, real-time competence assessment, anomaly detection
REPUTATION: Immutable performance ledgers, domain-specific capability attestation, track record
DELEGATION: Delegation Capability Tokens (DCTs), cryptographic caveats, least-privilege enforcement
IDENTITY: DIDs (Decentralized Identifiers), Verifiable Credentials, a16z 'Know Your Agent'
"The trust stack for agents doesn't exist yet. These are the pieces being assembled. We're at the TCP/IP stage: protocols before the web."
No single layer solves trust. You need identity + delegation + reputation + monitoring + governance. The full stack is being assembled now. Source: Google DeepMind (arXiv 2602.11865); a16z 'Know Your Agent'; Linux Foundation AAIF 161
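To make the DELEGATION layer concrete, here is a minimal sketch of how a capability token with cryptographic caveats can enforce least privilege. It uses macaroon-style HMAC chaining, which is a stand-in chosen for illustration and not the DCT format from the cited DeepMind paper. The function names (`mint`, `attenuate`, `verify`) and the `field=value` caveat syntax are hypothetical.

```python
import hashlib
import hmac


def _mac(key: bytes, message: str) -> bytes:
    """HMAC-SHA256 of a message under the given key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def mint(root_key: bytes, token_id: str) -> dict:
    """Issuer creates an unrestricted token bound to its secret root key."""
    return {"id": token_id, "caveats": [], "sig": _mac(root_key, token_id)}


def attenuate(token: dict, caveat: str) -> dict:
    """Any holder can add a caveat ("field=value") without the root key.

    The new signature is keyed by the old one, so caveats can be added
    but not removed: delegation can only narrow authority.
    """
    return {
        "id": token["id"],
        "caveats": token["caveats"] + [caveat],
        "sig": _mac(token["sig"], caveat),
    }


def verify(root_key: bytes, token: dict, request: dict) -> bool:
    """Issuer replays the HMAC chain and checks every caveat against the request."""
    sig = _mac(root_key, token["id"])
    for caveat in token["caveats"]:
        sig = _mac(sig, caveat)
        field, _, allowed = caveat.partition("=")
        if str(request.get(field)) != allowed:
            return False  # request falls outside the delegated scope
    return hmac.compare_digest(sig, token["sig"])


if __name__ == "__main__":
    root = b"issuer-secret"  # hypothetical key held only by the issuer
    token = attenuate(mint(root, "agent-a"), "tool=read_file")
    print(verify(root, token, {"tool": "read_file"}))
    print(verify(root, token, {"tool": "delete_file"}))
```

The design point is that a sub-agent receiving the token can restrict it further before passing it on, but stripping a caveat invalidates the signature.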
Slide 159
Multi-Agent Systems: What Actually Works Today
GitHub Agent HQ: Run Claude Code + Codex + Copilot in parallel on same codebase. Yes (parallel execution)
CrewAI: Role-based agents with defined workflows, 27K+ GitHub stars. Yes (structured roles)
LangGraph: Graph-based orchestration with conditional routing, HITL. Yes (graph control flow)
AutoGen: Multi-agent conversations, 40K+ GitHub stars. Yes (conversational agents)
Agent Almanac: Agent that discovers and reviews other agents. Experimental (meta-agent)
smolagents: Single well-structured agents often outperform multi-agent. Counter-argument to MAS
The future is multi-agent. The present is 'one well-structured agent with good tools.' smolagents often beats multi-agent setups. Source: GitHub Agent HQ; CrewAI, LangGraph, AutoGen, smolagents repos; Agent Almanac 162
Slide 160
KEY TAKEAWAYS
Multi-agent cooperation improves performance 76%, but only with targeted delegation, not broadcast.
The Matplotlib incident is a warning: autonomous agents can retaliate, manipulate, and escalate without human direction.
144 non-human identities per human in the enterprise. 78% have no governance policies. This is a security crisis.
Google DeepMind's delegation framework and the Agentic AI Foundation are the beginning of governance, but we're early. 163
Slide 161
“ A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools. — Douglas Adams, Mostly Harmless, 1992
Slide 162
SECTION 12 Is It All Slop? ~57% of online text shows AI-influenced signals. Claude found 500+ bugs that legacy scanners missed for decades. 164
Slide 163
"Slop": Merriam-Webster's 2025 Word of the Year slop /släp/ noun Digital content of low quality that is produced usually in quantity by means of artificial intelligence. Etymology: 1700s 'soft mud' → 1800s 'food waste' → 2025 'AI-generated digital junk' First year Merriam-Webster defined a cultural phenomenon driven by AI technology. Absurd AI-generated videos Fake news appearing realistic Off-kilter advertising images Junky AI-written books Cheesy propaganda "Workslop" reports wasting time When the dictionary defines your technology's output as 'low quality junk,' you have a perception problem — whether or not it's accurate. Source: Merriam-Webster Word of the Year 2025; CNBC, Dec 15, 2025; PBS; Smithsonian Magazine 165
Slide 164
The Volume Data: How Much Content Shows AI Involvement?
~74% of new web pages show signals of AI involvement (Ahrefs, 2025)
57% of online text shows AI-influenced or machine-translated signals
41% of new code is now AI-assisted
~96% decline in Stack Overflow questions (108K → 3.9K/mo, late 2022 → Dec 2025)
Stack Overflow: 108,563 questions asked/month (Oct-Dec 2022 avg) → 3,862/month (Dec 2025). Metric: new questions posted. Developers bypass Q&A for IDE-integrated AI.
The internet increasingly shows AI involvement signals. Stack Overflow's ~96% question decline (late 2022 baseline) shows how fast AI displaces traditional knowledge platforms. Source: Ahrefs web analysis, 2025 (AI signal detection); Axios; Originality.ai; Stack Overflow Data Explorer (questions asked/month); DevClass, Jan 2026 166
Slide 165
The Code Quality Data: AI-Generated vs. Human
[Chart: AI code vs. human code (baseline = 1x), compared on Issues/PR, Security Issues, Logic Errors, Concurrency Bugs, and Excessive I/O Ops]
1.7x more issues per PR | 2.25x more logic/business errors | 8x more excessive I/O ops | 46% of developers distrust AI tool accuracy
More code, faster, but with 1.7x more bugs per PR. The quality gap is real: AI code has 2.25x more logic errors and 8x more I/O problems. Source: CodeRabbit State of AI vs Human Code Generation, Dec 2025; Veracode; GitClear 167
Slide 166
Model Collapse: The Hidden Threat
The Feedback Loop: 1. AI generates content 2. Content enters training data 3. Next AI trains on AI-generated data 4. Performance degrades 5. Repeat → Model Collapse
Nature finding: Even 1 in 1,000 synthetic samples can trigger collapse. Rare events vanish first. Outputs drift toward bland central tendencies.
Apple Study: "The Illusion of Thinking" (June 2025): Large reasoning models tested on complex logic tasks (Tower of Hanoi). Results: • LLMs beat reasoning models at low complexity • Reasoning models win at medium complexity • Both fail at high complexity • Hallucination rate: up to 48% • Performance drops to zero beyond certain thresholds. "Models lack generalizable problem-solving skills."
Model collapse is the long-term existential risk for AI quality. If AI trains on AI output, the internet degrades. Data provenance becomes critical infrastructure. Source: Nature, Jul 2024 (corrected Mar 2025); Apple Research, Jun 2025; Shumailov et al. 168
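The feedback loop on this slide can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, not a reproduction of the Nature experiments: the "model" is just a frequency table over 50 hypothetical token types, and each generation is refit purely on samples drawn from the previous one. Once a rare token fails to appear in a sample it has probability zero and can never return, which is the "rare events vanish first" effect.

```python
import random
from collections import Counter


def next_generation(probs: dict, sample_size: int, rng: random.Random) -> dict:
    """Sample from the current model, then refit by relative frequency."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    sample = rng.choices(tokens, weights=weights, k=sample_size)
    counts = Counter(sample)
    # Tokens that were never sampled get no entry: they are gone for good.
    return {t: c / sample_size for t, c in counts.items()}


def simulate(generations: int = 30, sample_size: int = 200, seed: int = 0) -> list:
    """Return the number of surviving token types after each generation."""
    rng = random.Random(seed)
    # Zipf-like starting distribution: a few common tokens, many rare ones.
    raw = {f"tok{i}": 1.0 / (i + 1) for i in range(50)}
    total = sum(raw.values())
    probs = {t: w / total for t, w in raw.items()}
    support = [len(probs)]
    for _ in range(generations):
        probs = next_generation(probs, sample_size, rng)
        support.append(len(probs))
    return support


if __name__ == "__main__":
    print(simulate())
```

The surviving vocabulary can only shrink from one generation to the next, so diversity lost to synthetic training data is not recovered unless fresh real data is mixed back in.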
Slide 167
BUT: The Counter-Narrative: Composition, Not Slop
The Evidence FOR AI Quality: • Claude found 500+ bugs missed by JFrog, Snyk & Veracode for decades (reasoning vs. pattern matching) • Cursor hit $500M ARR; developers pay for tools that improve quality • 84% of developers using AI tools and continuing to use them • Historical parallel: Desktop publishing, YouTube, WordPress all faced identical 'quality' criticism at launch
The Evidence AGAINST: • 1.7x more bugs per PR (CodeRabbit, 470 PR study) • 46% of developers distrust AI accuracy; only 3% highly trust it • Logic errors 2.25x higher; security issues 1.57x higher • Consumer enthusiasm dropped 60% → 26% in two years (eMarketer)
The resolution: AI generates more, faster, but quality depends on human curation. The tool isn't the problem. The process is.
Both sides are right. AI produces more bugs AND enables more output. The difference is human-in-the-loop curation vs. unreviewed generation. Source: CodeRabbit; eMarketer; Stack Overflow survey; Cursor revenue data 169
Slide 168
The Critical Distinction: Slop vs. AI-Assisted Creation
AI SLOP: Mass-produced, unreviewed | No human curation or intent | Content farms, spam books | Generated → Published (no review). Examples: • Amazon AI-generated book spam • AI comment spam on social media • SEO slop flooding search results • AI-generated PR spam (curl, Wagtail)
AI-ASSISTED CREATION: Human-guided, iterative | Clear intent and curation | Composition, not generation | Generated → Reviewed → Refined. Examples: • LightCMS (agent-first CMS) • Chessmata (full-stack game) • Lovable/Bolt.new apps • Cursor-assisted development
Consumer sentiment: Enthusiasm dropped 60% → 26% (2023-2025) | 52% reduce engagement with suspected AI content | Marketers still increased AI spend 79%
The distinction matters: slop is unreviewed bulk generation. AI-assisted creation is human-guided composition. Same technology, opposite outcomes. Source: eMarketer AI content sentiment; Meltwater analysis; NetInfluencer marketing spend data 170
Slide 169
Human-in-the-Loop: The Quality Multiplier
75% of developers manually review every AI snippet before merge | 35-40% more issues caught by human evaluators vs. automated only | 1.7x: AI code is more cognitively demanding to review
HITL 2.0: Humans as Supervisors, Not Reviewers
2025 model: Humans review every line of AI output (doesn't scale)
2026 model: Humans supervise agents with approval gates, audit trails, and fine-grained permissions
The shift: From reviewing every output to designing constraints that prevent bad output. Active learning selects the most informative data points for human annotation, focusing effort where it matters most.
HITL isn't 'human reviews everything.' It's designing the right constraints and approval gates so agents produce quality output by default. Source: TFIR AI Code Quality, 2026; Parseur HITL report; Qodo AI review patterns; CodeRabbit 171
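The "2026 model" of approval gates, audit trails, and fine-grained permissions might look roughly like the sketch below. Everything here is illustrative: the `Action` type, the `AUTO_ALLOW` and `NEEDS_APPROVAL` sets, and the `gate` function are hypothetical names, and a real deployment would load its rules from a policy engine rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    agent: str   # which agent is acting
    kind: str    # e.g. "read", "write", "deploy"
    target: str  # resource the action touches


AUTO_ALLOW = {"read"}                # low-risk: no human needed
NEEDS_APPROVAL = {"write", "deploy"}  # high-risk: human sign-off required


def gate(action: Action, approvals: set, audit: list) -> str:
    """Decide allow / hold / deny and record the decision in the audit trail."""
    if action.kind in AUTO_ALLOW:
        decision = "allow"
    elif action.kind in NEEDS_APPROVAL:
        # A human approves a specific (agent, kind, target) triple, not a blanket role.
        decision = "allow" if action in approvals else "hold"
    else:
        decision = "deny"  # anything not explicitly listed is refused
    audit.append((action, decision))
    return decision


if __name__ == "__main__":
    audit_log = []
    deploy = Action("agent-c", "deploy", "prod")
    print(gate(Action("agent-a", "read", "repo"), set(), audit_log))
    print(gate(deploy, set(), audit_log))       # held until a human approves
    print(gate(deploy, {deploy}, audit_log))    # approved
    print(len(audit_log))
```

The human reviews the held exceptions and the rules themselves, not every line of output, which is the shift from reviewer to supervisor that the slide describes.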
Slide 170
The Vibe Coding → Agentic Engineering Arc as Quality Solution Early 2025 Late 2025 2026 Agentic "Agentic Slop" Karpathy Pivots Engineering Agents as managed workers. Unconstrained AI generation. Karpathy acknowledges limitations. Explicit constraints + bounded No oversight, no structure. Vibe coding ≠ production. scope. Bugs, hallucinations, mess. Need structured oversight. Human architects, AI builders. Leading orgs manage agents as 'workers' with explicit constraints and bounded scope. The evolution from vibe coding to agentic engineering IS the quality solution. The quality problem is self-correcting. Vibe coding → agentic engineering is the natural evolution: from 'let AI do whatever' to 'design the constraints.' Source: The New Stack; Deloitte AI Engineering report; Karpathy vibe coding posts, 2025 172
Slide 171
Emergent Interactivity: The Composition Thesis Slop = Isolated AI output (no human intent, no curation, no iteration) Composition = Human intent + AI execution + curation + iteration Agent Almanac Chessmata LightCMS Agents building systems that help 38 MCP tools, Go + MongoDB. Full-stack multiplayer game. other agents — composition, not Designed by human, built by Human architect, AI builder. slop. agents. Iterated through dozens of cycles. Built in 1-2 days, iteratively refined. Every feature human-approved. Composition isn't slop. When humans provide intent and agents execute iteratively under supervision, the output quality exceeds what either could achieve alone. Source: Jon Radoff, meditations.metavert.io; Agent Almanac; Chessmata; LightCMS 173
Slide 172
KEY TAKEAWAYS
Composition isn't slop. Human oversight + agent execution = faster quality, not lower quality.
The evolution from vibe coding to agentic engineering IS the quality answer. Constraints, not freedom.
AI code has 1.7x more bugs, but 84% of developers keep using it. The productivity gain outweighs the quality cost.
Model collapse is the long-term threat. If AI trains on AI, the internet degrades. Data provenance is critical infrastructure. 175
Slide 173
SECTION 13 The Deeper Numbers The industrial revolution underneath the AI revolution: $690B+ in capex, 96 GW of power, and a semiconductor stack sold out at every layer 176
Slide 174
Powering the AI Revolution: An Energy Buildout Like Nothing Before
[Chart: data center power demand in GW: 49 (2023), 65 (2024), 80 (2025E), 96 (2026E)]
How Much Is 96 GW? ≈ 9 New York Cities (NYC draws ~11 GW peak) | ≈ 2× the entire UK grid (UK average demand: ~45 GW) | ≈ 48 Hoover Dams (each generates ~2 GW)
90% of growth is AI workloads
Per-GPU power: H100: 700W → B200: 1,000W → GB300: 1,400W | Liquid cooling: 15% → 76% of AI servers (2024 → 2026E) | $720B grid investment needed
Data center power is doubling in 3 years. A single 100K-GPU cluster draws 100–140 MW, enough for a small city. The grid investment needed ($720B) rivals AI capex itself. Source: Goldman Sachs; IEA; Deloitte, 2025; NVIDIA GPU specs; UK National Grid ESO 177
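The equivalences and the cluster figure on this slide are simple ratios, sketched below using only the slide's own numbers. The cluster estimate counts GPU draw alone; leaving out cooling, networking, and CPU overhead is a simplifying assumption, so real facilities would draw more.

```python
DATA_CENTER_GW_2026 = 96.0  # projected data center demand, from the slide

# Reference loads quoted on the slide, in GW.
REFERENCES_GW = {
    "New York Cities": 11.0,  # NYC peak demand
    "UK grids": 45.0,         # UK average demand, as stated on the slide
    "Hoover Dams": 2.0,       # per-dam generation
}

GPU_WATTS = {"H100": 700, "B200": 1000, "GB300": 1400}


def equivalents(total_gw: float) -> dict:
    """Express a power figure as multiples of each reference load."""
    return {name: total_gw / gw for name, gw in REFERENCES_GW.items()}


def cluster_mw(gpu_count: int, watts_per_gpu: int) -> float:
    """GPU-only draw of a cluster in megawatts (no cooling or networking)."""
    return gpu_count * watts_per_gpu / 1_000_000


if __name__ == "__main__":
    for name, multiple in equivalents(DATA_CENTER_GW_2026).items():
        print(f"{multiple:.1f} {name}")
    for model, watts in GPU_WATTS.items():
        print(f"100K x {model}: {cluster_mw(100_000, watts):.0f} MW")
```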
Slide 175
Huang's Law: AI Hardware Is Outpacing Moore's Law
H100 (Hopper), 2022: 4N | 80B transistors | Baseline | 3.35 TB/s
B200 (Blackwell), 2025: 4NP | 208B transistors | 4× training | 8 TB/s
Rubin, Q2 2026: 3nm | 336B transistors | 5× inference | 22 TB/s
Rubin Ultra, H2 2027: 3nm+ | transistor count not stated | 10× Blackwell | HBM4e
Beyond Moore's Law: Moore's Law predicted 7× improvement in compute from 2012 to 2018. Actual AI training compute: 300,000×. Driven by architecture + memory + packaging + algorithms, not just die shrinks.
What This Means for AI Costs: Rubin: ~1/10th the cost per token vs. Blackwell (NVIDIA CES 2026 claim). Energy efficiency improves ~40%/year (FLOPS per watt, leading ML hardware). Each chip generation feeds back into the cost curves from Section 2.
GPU scaling is outrunning Moore's Law by orders of magnitude. Rubin promises 1/10th the inference cost of Blackwell, and it ships in 2026. Source: NVIDIA CES 2026 keynote; EpochAI ML hardware efficiency data; Jensen Huang, GTC 2018 ('Huang's Law') 178
Slide 176
The Infrastructure Buildout: Data Center Megaprojects
[Chart: AI capex commitments, $B: Stargate 500, Amazon 200, Google 180, Meta 125, Microsoft 80]
Stargate Project: $500B total commitment | Partners: OpenAI + SoftBank (40% each), Oracle + MGX ($7B each) | 7 GW planned capacity | $400B+ over 3 years | 2029 target completion
Combined Big Tech AI capex exceeds $690B in 2026. Stargate alone equals $500B, the largest infrastructure project since the Interstate Highway System. Source: Microsoft, Google, Meta, Amazon earnings; Stargate project announcements; success.com 179
Slide 177
NVIDIA: The $4.6 Trillion Kingmaker $4.6T $130.5B $115.2B Market cap FY2025 revenue Data center revenue (world's most valuable) (+114% YoY) (88% of total) $11B 85-95% 3.6M Blackwell Q4 revenue AI GPU training B200/GB200 backlog (fastest ramp ever) market share (sold out mid-2026) NVIDIA more than doubled revenue in one year. Blackwell hit $11B in its first quarter — the fastest product ramp in semiconductor history. Source: NVIDIA Q4 FY2025 earnings; Capital.com; market data, Feb 2026 180
Slide 178
The Hardware Bottleneck Stack
1. HBM (Memory): SK Hynix: 50-70% share, sold out through 2026 | HBM4 mass production Feb 2026 | 1.2 TB/s bandwidth per stack
2. CoWoS (Packaging): TSMC advanced packaging = real chokepoint | Google cut TPU production 4M → 3M units | Oversubscribed through at least 2026
3. Power & Cooling: H100: 700W → B200: 1,000W → GB300: 1,400W | Data centers: 49 GW → 96 GW by 2026 | $720B in grid investment needed
Every layer is sold out or constrained. The AI buildout is an industrial revolution limited by atoms, not bits. Source: SK Hynix; TSMC; Goldman Sachs; IEA; Fusion Worldwide 181
Slide 179
The Enabling Manufacturing Stack ASML TSMC Lam Research €32.7B revenue (all-time high) 70.2% global foundry share $18.4B revenue (+24% YoY) €38.8B order backlog (record) 2nm entered production late 2025 Advanced packaging: $1B → $3B+ EUV lithography monopoly 90%+ share at 3nm/2nm Shipments tripling €13.2B Q4 bookings (2× estimate) $52-56B 2026 capex (N2 ramp) Critical for CoWoS expansion Semiconductor industry: $697–772B (2025) | "The AI revolution runs through five companies in Taiwan, the Netherlands, and California." ASML's backlog alone (€38.8B) exceeds many countries' GDP. TSMC at 90%+ share of leading-edge nodes is the single most critical company in AI. Source: ASML Q4 FY2025; TSMC; Lam Research FY2025; SIA 182
Slide 180
NVIDIA's Groq Deal & The Competitive Landscape
Groq Acquisition (Dec 2025): $20B, NVIDIA's largest deal ever | Groq LPU: 300+ tok/s (Llama 3 70B) vs. NVIDIA GPU: 10-30 tok/s | Signal: NVIDIA acquired its most credible inference competitor. Groq continues as independent company under new CEO.
The Challengers: AMD MI350: 40% cost/token advantage vs B200; 7/10 largest AI cos now use Instinct | Google TPU v6e: 40-65% inference cost reduction; Anthropic: largest TPU deal in Google history | Custom ASICs: 40% of all inference workloads (Trainium, Maia, MTIA)
Rubin (next-gen): Q3 2026, claims 10x inference cost cut
NVIDIA dominates training (85-95%) but inference is fragmenting. Custom ASICs already handle 40% of inference, and Groq was the fastest threat. Source: CNBC, Dec 2025; NVIDIA; AMD; Google Cloud; TS2.tech 183
Slide 181
KEY TAKEAWAYS
96 GW of data center power by 2026, equivalent to 9 New York Cities. 90% of growth is AI. The $720B grid buildout rivals AI capex itself.
AI training compute grew 300,000× where Moore's Law predicted 7×. Rubin (2026) promises 1/10th inference cost vs. Blackwell.
$690B+ in Big Tech capex. NVIDIA doubled revenue to $130.5B. Every layer of the hardware stack (memory, packaging, power) is sold out. 184
Slide 182
“ More human than human is our motto. — Eldon Tyrell, Blade Runner, 1982
Slide 183
SECTION 14 The Reckoning 144 non-human identities per human employee. 95% per-step reliability = 36% over 20 steps. The risks compound. 193
Slide 184
Agent Error Rates: Not Ready for High Stakes
Reliability by System Type: Single-agent: 99.5% success rate | Multi-agent: 97% success rate (2.5% error increase from coordination) | Under stress: 96.9% → 88.1% (semantic perturbation testing) | Rate limiting is the most damaging fault. | 91% of ML models degrade in production over time
The Compounding Problem: A 20-step workflow with 95% per-step reliability: 0.95²⁰ ≈ 36% overall success. Errors compound exponentially in multi-step agentic workflows. Recovery loops (retry, self-correct) can improve this, but compounding still matters at scale.
Individual steps are reliable. Chains are fragile. Recovery loops help but don't eliminate compounding. This is why most production agents are single-purpose today. Source: ReliabilityBench (arXiv 2601.06112); Getmaxim multi-agent analysis; EdStellar 194
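The compounding arithmetic on this slide can be checked directly. The sketch below assumes steps fail independently and that a retry is an independent second attempt; both are simplifications, since real agent failures are often correlated. The function names are illustrative.

```python
import math


def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step of a sequential workflow succeeds."""
    return per_step ** steps


def step_with_retries(per_step: float, retries: int) -> float:
    """Effective per-step reliability when a failed step may be retried.

    A step fails overall only if the first attempt and all retries fail.
    """
    return 1.0 - (1.0 - per_step) ** (retries + 1)


def max_steps(per_step: float, target: float) -> int:
    """Longest chain whose overall success rate still meets the target."""
    return math.floor(math.log(target) / math.log(per_step))


if __name__ == "__main__":
    print(f"20 steps at 95%: {chain_success(0.95, 20):.1%}")
    boosted = chain_success(step_with_retries(0.95, 1), 20)
    print(f"20 steps with one retry each: {boosted:.1%}")
    print(f"steps allowed for 90% overall: {max_steps(0.95, 0.90)}")
```

Under these assumptions a single retry per step lifts effective step reliability from 95% to 99.75%, which is why recovery loops matter so much, and why the chain length a workflow can tolerate at a fixed target stays short without them.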
Slide 185
Safety Regulatory Landscape 2026 CA SB 53 NY RAISE Act EU AI Act Art. 50 ACTIVE ACTIVE UPCOMING Signed Sep 2025 Signed Dec 19, 2025 Deadline: Aug 2, 2026 Effective Jan 1, 2026 Effective Jan 1, 2027 First US frontier AI safety law Mirrors SB 53 scope Mandatory AI content marking Requires safety frameworks Includes university carveouts High-risk system obligations for large-scale AI models NY-developed models only Whistleblower protections Two US states have frontier AI laws. EU AI Act Article 50 hits Aug 2026. We're entering the era of mandatory AI governance — ready or not. Source: FPF; Governor Hochul announcement, Dec 2025; IAPP EU AI Act tracker 195
Slide 186
International AI Safety Report 2026 Key 2025 Capability Milestones Chaired by Yoshua Bengio • Gold-medal IMO math performance • Exceeded PhD-level science benchmarks 100+ international experts • Rapid coding & autonomous operation gains 30+ countries & organizations (EU, OECD, UN) Published February 3, 2026 Risk areas identified: 220 pages Deepfakes, labor displacement, power concentration, AI-induced psychosis, autonomous agent risks Covers: General-purpose AI systems • Labor market impacts • Human autonomy & power concentration • Deepfakes & manipulation • Autonomous agent risks The world's top AI safety researchers agree: capabilities are advancing faster than governance. The 220-page report is a roadmap for what we must solve. Source: internationalaisafetyreport.org, Feb 2026; arXiv 2501.17805 196
Slide 187
The Identity Problem: 144-to-1 144:1 non-human to human identities in enterprise Why This Matters Long-lived credentials Often never rotated, never audited Unmanaged roles Scope creep without review 44% growth from 2024 to 2025 Previous: 92:1 → Now: 144:1 Lack of visibility Most orgs can't inventory their NHIs Service accounts, API keys, tokens, bots, agents — all with credentials As agents proliferate, the identity but no human oversight. surface area grows exponentially. For every human in an enterprise, there are 144 non-human identities. Most are unmanaged. This is the biggest unseen attack surface in AI. Source: CyberSecurityTribe; Obsidian Security; OASIS Security NHI report, 2025 197
Slide 188
The Provenance Gap: AI Content Outpaces Verification ~74% of new pages show AI involvement signals → Verified provenance remains rare C2PA SynthID Meta Video Seal 6,000+ 10B+ 4,000+ member orgs content pieces watermarked hours watermarked Google Pixel 10: first smartphone Google DeepMind Open-source with C2PA at camera level Embedded in Imagen, Gemini Frequency-domain approach EU AI Act Article 50 deadline: August 2, 2026 — mandatory AI content marking AI content generation far outpaces provenance infrastructure. C2PA and SynthID are the answer, but adoption lags generation by orders of magnitude. Source: Ahrefs web content analysis, 2025 (AI signal detection, not full authorship); C2PA coalition; Google DeepMind SynthID; EU AI Act Art. 50 198
Slide 189
Agent Identity & Chain-of-Custody Agent A generates code → Agent B reviews → Agent C deploys → Bug reaches production. Who's liable? 57% of companies run AI agents in production Emerging Solutions 44% have documented governance policies a16z "Know Your Agent": Cryptographic credentials linking agents to principals <10% have infrastructure to enforce them Git AI • SLSA Framework • Sigstore (survey estimates vary) SonarQube AI-code detection Emerging liability: Workday precedent (USDC N.D. Cal.) held AI vendors can be 'agents' with direct liability. Hybrid frameworks combining tort liability + no-fault compensation are emerging. 57% of companies run agents in production. Fewer than 10% have the infrastructure to govern them. The accountability gap is a ticking clock for enterprise AI. Source: a16z 'Know Your Agent'; Google DORA; Clifford Chance; Stanford CodeX 199
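One way to reason about the "Agent A generates, Agent B reviews, Agent C deploys" question is a tamper-evident custody log, sketched below as a plain hash chain. This is an illustration only: the entries are hashed but not signed, so it shows ordering and integrity, not identity. Tools named on the slide such as Sigstore or SLSA add signatures and attestations on top of this idea. The field names and the `append_entry` and `verify_log` helpers are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(fields: dict) -> str:
    """Stable SHA-256 over an entry's fields."""
    return hashlib.sha256(json.dumps(fields, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, agent: str, action: str, artifact: bytes) -> None:
    """Record who did what to which artifact, linked to the previous entry."""
    fields = {
        "prev": log[-1]["hash"] if log else GENESIS,
        "agent": agent,
        "action": action,
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
    }
    log.append({**fields, "hash": _digest(fields)})


def verify_log(log: list) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        fields = {k: v for k, v in entry.items() if k != "hash"}
        if fields["prev"] != prev or _digest(fields) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    code = b"def handler(): ..."
    log = []
    append_entry(log, "agent-a", "generate", code)
    append_entry(log, "agent-b", "review", code)
    append_entry(log, "agent-c", "deploy", code)
    print(verify_log(log))
    log[1]["agent"] = "agent-x"  # rewrite history after the fact
    print(verify_log(log))
```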
Slide 190
Deepfakes: Detection Is Losing the Arms Race
55.54% human deepfake detection accuracy (barely above coin flip) | High-quality video deepfakes: Humans detect only 24.5% | Detection market: $15.7B by 2026 (42% CAGR)
The Paradigm Shift: You can't detect your way out of this problem. You have to sign your way out.
Detection-based approach: Arms race. Adversarial. Gets worse.
Provenance-based approach: C2PA + SynthID + Video Seal. Sign content at creation. Verify the chain, not the content.
This is why the provenance gap matters.
Humans detect deepfakes at ~55%, barely above random chance. High-quality video fakes: only 24.5%. Detection is a losing game; provenance is the only answer. Source: Groh et al., PNAS 2022 (deepfake detection meta-analysis); Meta Video Seal paper, 2024; ScienceDirect systematic review (56 studies); Keepnet Labs synthesis, 2025 200
Slide 191
The Labor Market: Augmentation vs. Displacement Anthropic Economic Index The Skill Shift Declining demand: 2M Claude interactions analyzed Information processing, data analysis, routine knowledge work ~50/50 split augmentation vs automation (slight edge to augmentation) Rising demand: Interpersonal skills, coordination, 49% of jobs can now use AI in ≥25% resource monitoring, judgment of their tasks (up from 36% early 2025) Entry-level crisis: BUT: API usage by late 2025 shifted to Traditional career ladder fundamentally 77% automation patterns altered. AI substitutes lower-skilled tasks while complementing higher-skilled work. AI is augmenting and automating simultaneously. The 50/50 split is shifting toward automation. Entry-level white- collar roles face unprecedented pressure. Source: Anthropic Economic Index, Jan 2026; Axios; McKinsey MGI 202
Slide 192
KEY TAKEAWAYS
Agent error compounds exponentially: 95% per-step reliability over 20 steps = 0.95²⁰ ≈ 36% success. Most production agents remain single-purpose for this reason.
144 non-human identities per human. Deepfake detection at 55% (coin-flip). Provenance, not detection, is the only viable answer.
57% of companies run agents in production; <10% can govern them. Regulation is arriving (EU AI Act Art. 50, Aug 2026), ready or not. 205
Slide 193
SECTION 15 What Comes Next Task horizons doubling every 123 days — week-long autonomous work by late 2026, month-long by mid-2027 206
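The projection in this section header follows from extrapolating the deck's 14.5-hour horizon (METR, Feb 2026) at one doubling every 123 days. The sketch below makes that extrapolation explicit. Two things are assumptions not stated on the slides: the baseline day (February 1 is a placeholder), and the reading of "week-long" as 40 working hours and "month-long" as 160. Reading them as wall-clock time (168 and 720 hours) pushes both dates later.

```python
import math
from datetime import date, timedelta

DOUBLING_DAYS = 123               # from the deck
BASELINE_HOURS = 14.5             # from the deck (METR, Feb 2026)
BASELINE_DATE = date(2026, 2, 1)  # assumption: exact day is not given


def horizon_hours(on: date) -> float:
    """Projected autonomous task horizon on a given date."""
    elapsed_days = (on - BASELINE_DATE).days
    return BASELINE_HOURS * 2 ** (elapsed_days / DOUBLING_DAYS)


def date_reaching(target_hours: float) -> date:
    """First date on which the projected horizon reaches the target."""
    doublings = math.log2(target_hours / BASELINE_HOURS)
    return BASELINE_DATE + timedelta(days=math.ceil(doublings * DOUBLING_DAYS))


if __name__ == "__main__":
    print("40h work week:", date_reaching(40))
    print("160h work month:", date_reaching(160))
    print("168h calendar week:", date_reaching(168))
```

The result is sensitive to how "week" and "month" are defined and assumes the doubling rate holds, so the dates are best read as a range, not a forecast.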
Slide 194
Sequoia's Trillion-Dollar Thesis $10T AI revenue opportunity (10x larger than cloud) Sequoia's Framework The "Cognitive Revolution" — grander than the Industrial Revolution. At least 10x more compute consumption per knowledge worker. Some portfolio companies forecast 1,000x–10,000x. Cloud was ~$1T. AI will be $10T+. The difference: cloud changed where software runs; AI changes what software does. Sequoia sees AI as a $10T+ opportunity — 10x cloud. The thesis: AI doesn't just improve existing tasks, it creates entirely new categories of value. Source: Sequoia Capital, 'The $10 Trillion AI Revolution,' 2025 207
Slide 195
a16z Predictions for 2026 Death of the Agent-Native Industrial AI Prompt Box Infrastructure Deploys Zero visible prompting for Fundamental architectural Modular AI + autonomy in mainstream users. Proactive shifts. Agents interfacing energy, mining, construction, agent actions for human review. with web directly. manufacturing. Video Becomes 80% of AI AI Breaks Interactive Outside Valley Cyber Gap Video stops being passive. Forward-deployed discovery Automates repetitive security Becomes spaces you step in legacy verticals. AI's real work. Closes the hiring gap into. opportunity isn't in SF. in cybersecurity. a16z's big ideas: the prompt box dies, infrastructure goes agent-native, and 80% of AI's real opportunity lives outside Silicon Valley. Source: a16z, 'Big Ideas 2026' Parts 1-3; 'Notes on AI Apps in 2026' 208
Slide 196
The Network Effects of Composability "The degree to which a network facilitates interconnections determines the extent of its emergent creativity, innovation and wealth." Two Architectures, Two Outcomes What Agentic Composability Unlocks Hub-and-spoke (constrained) Agent networks = scale-free by default. Central authority controls access. Value captured by platform owner. 17,000+ MCP servers. Agents discover, Amazon, Facebook, app stores. negotiate, and delegate to other agents without central gatekeepers. Scale-free (emergent) Central facilitator enables peer connections. Metcalfe's Law: value grows as n² Value created by participants. Reed's Law: value grows as 2ⁿ Internet, Linux, MCP ecosystem. when subgroup formation is easy The web's original sin: ads. Emergent > Designed. Agents don't click ads — so the Like Unix pipes, but for intelligence. attention economy collapses. The web's business model (ads) breaks when agents are the consumers. Composable, open protocols (MCP, A2A) create scale-free networks where value compounds through Reed's Law. Source: Jon Radoff, 'Network Effects in the Metaverse,' meditations.metavert.io; Ben Thompson, 'The Upheaval Coming for the Internet Economy,' Stratechery, 2025 210
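The gap between Metcalfe's Law and Reed's Law is easy to underestimate, so here is a small numeric sketch. It uses the idealized forms quoted on the slide (n² and 2ⁿ) and ignores constants and the practical limits on how many subgroups actually form.

```python
def metcalfe(n: int) -> int:
    """Network value proportional to pairwise connections: n squared."""
    return n * n


def reed(n: int) -> int:
    """Network value proportional to possible subgroups: 2 to the n."""
    return 2 ** n


def crossover(limit: int = 64) -> int:
    """Smallest n from which 2**n stays above n**2 (checked up to `limit`)."""
    n = limit
    while n > 1 and reed(n - 1) > metcalfe(n - 1):
        n -= 1
    return n


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, metcalfe(n), reed(n))
    print("Reed's Law dominates from n =", crossover())
```

Even at a few dozen participants the subgroup term dwarfs the pairwise term, which is the basis for the slide's claim that easy subgroup formation is what makes composable agent networks valuable.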
Slide 197
The Three Futures of Work 60%+ NEW ~20% Augmentation Creator Economy Automation of occupations benefit jobs created of occupations at risk Humans enhanced by agents. Natural language → code. Agents handle repetitive work. Nurses, physicians, teachers, Everyone can build. Data analysis, knowledge updates, HR, insurance, pharmacists. Single-builder companies. routine processing. Higher-value, uniquely human tasks. $820B opportunity. Entry-level white-collar pressure. Not one future but three, coexisting. Augmentation dominates (60%+). The creator economy is new. Automation pressure hits entry-level hardest. Source: Stanford Salt Lab; arXiv 2506.06576; McKinsey MGI 'Agents, Robots, and Us' 211
Slide 198
World Models → Interactive Entertainment → Simulation Static Generation Interactive Worlds Full Simulation Text/image/video creation 2023-24 → Real-time AI scenes explorable 2025 → Physics-aware, navigable spaces 2026+ NVIDIA Cosmos Runway GWM-1 9,000T tokens from 20M hrs of Real-time interactive AI worlds. real-world data. Open diffusion + Responds to user input with autoregressive transformer models. consistent physics. Python SDK for Cosmos-Predict2.5 for future state. robotics, training, avatars. We've gone from text → images → video → interactive worlds in three years. Cosmos and GWM-1 make navigable, physics-aware spaces generation-ready. Source: NVIDIA Cosmos; Runway GWM-1; DeepLearning.AI 212
Slide 199
The Spatial + Embodied + Agentic Convergence LLMs + VLMs World Models Embodied Agents Semantic intelligence Physical understanding Robot foundation models Language → action Spatial reasoning Physical interaction Spatial Displays Agent Protocols AR/VR interface layer MCP + A2A Always-on AI glasses Agent interoperability → Agents that see, reason, move, and build in physical space The agent doesn't just write your code. It sees your world, moves through your factory, and builds your vision. Five technologies converge into one. Source: arXiv (multiple); Frontiers in Robotics and AI; NVIDIA; Runway 213
Slide 200
The Internet Gets Rebuilt for Agents MCP Agent → Tool A2A Agent → Agent 97M monthly SDK downloads 50+ launch partners 100K → 97M in 12 months Salesforce, PayPal, Atlassian Fastest protocol adoption ever Launched April 2025 ACP Agent Commerce AP2 Agent Payments Agentic Commerce Protocol Agent Payments Protocol Agent-to-agent transactions Real-time compensation Autonomous purchasing Nanopayment-enabled MCP (agent→tool) + A2A (agent→agent) + commerce protocols = the TCP/IP of the agent era. MCP went from 100K to 97M downloads in 12 months. But agents don't click ads — so the web's attention-based business model collapses. Payment mechanisms become infrastructure. Source: A2A Protocol; Google Developers Blog; AWS Open Source Blog; Medium; Ben Thompson, Stratechery, 'The Upheaval Coming for the Internet Economy,' 2025 214
Slide 201
The Direct from Imagination Era Human imagination becomes the bottleneck — not technical skill, not resources, not budget. Creation Business Daily Life Every creative industry follows Smaller teams accomplish what The $285B SaaS decline isn't the same arc: Pioneer → required hundreds. Solo auteurs irrational — it reflects structural Engineering → Creator Era. sculpt entire experiences. disruption. Software is crossing into its Natural language is the new LLMs become universal Creator Era now. Millions of programming language. middleware. AI agents coordinate new builders from non-technical LLMs are compilers for intent. systems without proprietary APIs. backgrounds. 38% of startups now The bottleneck shifts from The gap between imagining solo-founded (up from 22%) engineering to creative vision. and building has collapsed. We went from Identity (who you are) to Self-Expression (what you create) to Empowerment (what you build). AI collapses the distance between imagination and reality. Source: Jon Radoff, 'The Direct from Imagination Era,' meditations.metavert.io; 'Software's Creator Era Has Arrived,' meditations.metavert.io, 2025 214B
Slide 202
KEY TAKEAWAYS
The web is being rebuilt for agents: MCP (97M downloads) + A2A + commerce protocols = TCP/IP of the agentic era. But agents don't click ads, so the internet's business model must change.
Composability is the force multiplier. Scale-free agent networks follow Reed's Law (2ⁿ): value compounds exponentially as agents discover and delegate to each other.
We've entered the Direct from Imagination era: natural language is the new programming language, and the bottleneck has shifted from engineering capability to creative vision. 214C
Slide 203
“ The only way of discovering the limits of the possible is to venture a little way past them into the impossible. — Arthur C. Clarke, Profiles of the Future, 1962
Slide 204
The State of Agents in 2026
Massive investment, real capability, uneven execution.
$211B AI VC funding (2025), 50% of all global VC | 80.9% SWE-Bench Verified (Claude Opus 4.5) | +67% merged PRs/eng/day (Anthropic internal) | 500+ bugs found by Claude that SAST tools missed for decades | 14.5hr autonomous task horizon, doubling every 123 days
"The gap between intention and implementation has collapsed. The gap between implementation and value has not, yet."
Jon Radoff · metavert.io · February 2026 215