In February 2026, Andrej Karpathy quietly retired his own term. "Vibe coding" — the practice of letting AI generate code while you steer by feel — had become mainstream. But Karpathy saw the ceiling. He introduced a replacement: agentic engineering. The shift in terminology signals something deeper than a rebrand. It reflects a fundamental change in what it means to build software with AI.

At tmk Inc., we've been living this transition. Our studio runs 70 specialized AI agents — not as experiments, but as our production engineering team. We've shipped three products this way: a fixed asset management SaaS for ASEAN markets, a 3x3 basketball analytics platform, and our own corporate site. Along the way, we discovered that the hard part isn't generating code. It's keeping AI-built systems running.

The Cognitive Debt Problem

Vibe coding creates what The New Stack calls "cognitive debt." AI generates code faster than any human, but when something breaks, nobody fully understands why. The code works — until it doesn't. And when it doesn't, you're debugging a system that no single person designed.

We hit this wall early. A voice translation app passed every initial test. It worked perfectly the first time. The second time, the microphone died. The root cause: each audio playback created a new AudioContext, and browsers silently cap these at approximately six to eight instances. The AI agent that wrote the code didn't know about this browser constraint. Neither did the first test. Only a multi-round test — start, stop, restart — exposed the leak.
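The leak pattern is easy to reproduce outside a browser. The sketch below uses a stand-in class (an assumption for illustration; a real app would use the browser's global AudioContext) to contrast the per-playback construction that caused the bug with the shared-instance fix:

```typescript
// Stand-in for the browser's AudioContext so the pattern is runnable anywhere.
// Real browsers cap live contexts per page (historically around six).
class FakeAudioContext {
  static liveCount = 0;
  private closed = false;
  constructor() {
    FakeAudioContext.liveCount++;
    if (FakeAudioContext.liveCount > 6) {
      throw new Error("too many live AudioContexts");
    }
  }
  close(): void {
    if (!this.closed) {
      this.closed = true;
      FakeAudioContext.liveCount--;
    }
  }
}

// Leaky pattern: a fresh context per playback, never closed.
// Each call permanently consumes one of the browser's slots.
function playLeaky(): FakeAudioContext {
  return new FakeAudioContext();
}

// Fixed pattern: lazily create one shared context and reuse it for every playback.
let shared: FakeAudioContext | null = null;
function playShared(): FakeAudioContext {
  if (shared === null) shared = new FakeAudioContext();
  return shared;
}
```

The fix is a one-line change in spirit, but only a test that cycles playback repeatedly would ever reach the cap and expose it.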

This is the core lesson: AI can write code that passes a single test but fails in production cycles. Vibe coding optimizes for the first run. Agentic engineering optimizes for the hundredth.

What Agentic Engineering Actually Requires

The difference between vibe coding and agentic engineering isn't the number of agents. It's the presence of structure. Here's what we found necessary:

Specialized roles with clear boundaries. Our 70 agents aren't 70 copies of the same generalist. They're organized into departments: engineering, design, marketing, quality assurance, and an executive board (the "orchestra") that coordinates decisions. A code-reviewer agent never writes code. A frontend-developer agent never makes architectural decisions. A tech-standards-architect sets the rules that code-reviewer enforces. Separation of concerns applies to AI teams just as it does to code.

Phase gates that cannot be skipped. Every code change follows an eight-phase workflow: requirements, technology selection, design, design review, approval, coding, code review, and testing. Phases seven and eight — review and test — are never optional, even for a one-line CSS fix. We learned this the hard way when a "trivial" CSS change shipped without review, targeted the wrong file, and required four deployment attempts to fix.
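One way to make a phase gate non-negotiable is to encode it so skipping is impossible rather than discouraged. This is an illustrative sketch (the phase names come from the workflow above; the enforcement logic is our own assumption about how such a gate might be written):

```typescript
// The eight phases, in mandatory order.
const PHASES = [
  "requirements", "technology-selection", "design", "design-review",
  "approval", "coding", "code-review", "testing",
] as const;
type Phase = typeof PHASES[number];

class Workflow {
  private done = new Set<Phase>();

  // A phase can only be completed if every earlier phase already is.
  // There is deliberately no "skip" method and no exemption for small changes.
  complete(phase: Phase): void {
    const idx = PHASES.indexOf(phase);
    for (const earlier of PHASES.slice(0, idx)) {
      if (!this.done.has(earlier)) {
        throw new Error(`cannot enter "${phase}": "${earlier}" not complete`);
      }
    }
    this.done.add(phase);
  }

  // Shippable only when all eight phases, including review and test, are done.
  get shippable(): boolean {
    return PHASES.every((p) => this.done.has(p));
  }
}
```

The point of the design is that the "trivial CSS fix" path and the "major feature" path are the same path; the gate holds because there is no second code path to take.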

Rules written in blood. Our .claude/rules/ directory contains over twenty rule files, each born from a real failure. The no-shortcuts.md rule exists because an agent skipped testing for a "minor" change and broke production. The browser-resource-management.md rule exists because of the AudioContext leak described above. Every rule traces back to an incident. They aren't best practices; they're scar tissue.

The Protocol Layer: MCP and Beyond

Agentic engineering at scale requires a communication protocol. The Model Context Protocol (MCP), which recently surpassed 97 million monthly SDK downloads, provides a standardized way for AI agents to interact with tools and services. Google's Agent-to-Agent (A2A) protocol adds inter-agent communication on top.

In our system, agent coordination happens through structured definitions: each agent definition carries YAML frontmatter specifying its name, tools, model, and permission mode. The orchestrator agent reads these definitions and dispatches work based on the task at hand. When a security audit is needed, the CTO agent, security auditor, and legal officer are convened together. When a campaign launches, the CMO, CLO, CVO, and COO all participate from the start.
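In code, that dispatch pattern is small. The sketch below mirrors the frontmatter fields named above; the specific field values and the roster table are hypothetical placeholders, not our actual configuration:

```typescript
// Illustrative shape of an agent definition, mirroring YAML frontmatter
// fields (name, tools, model, permission mode). Values are hypothetical.
interface AgentDef {
  name: string;
  tools: string[];
  model: string;
  permissionMode: "read-only" | "standard" | "elevated";
}

const agents: AgentDef[] = [
  { name: "cto", tools: ["review"], model: "large", permissionMode: "elevated" },
  { name: "security-auditor", tools: ["scan"], model: "large", permissionMode: "read-only" },
  { name: "legal-officer", tools: ["review"], model: "large", permissionMode: "read-only" },
  { name: "frontend-developer", tools: ["edit"], model: "standard", permissionMode: "standard" },
];

// The orchestrator convenes a fixed roster per task type, so the same
// task always summons the same set of roles.
const roster: Record<string, string[]> = {
  "security-audit": ["cto", "security-auditor", "legal-officer"],
};

function convene(task: string): AgentDef[] {
  const names = roster[task] ?? [];
  return agents.filter((a) => names.includes(a.name));
}
```

Making the roster a static table rather than an LLM judgment call is the point: who shows up for a security audit is a governance decision, not an inference.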

The key insight is that the protocol layer isn't about technology — it's about governance. Who can access credentials? (Only the service-account-manager.) Who approves spending above $500/month? (CTO, CFO, and CEO together.) These aren't technical constraints; they're organizational design decisions encoded as agent rules.
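Those two rules translate directly into policy-as-code. A minimal sketch, assuming the role names quoted above (everything else is illustrative):

```typescript
// Governance rule 1: only the service-account-manager touches credentials.
function mayAccessCredentials(agent: string): boolean {
  return agent === "service-account-manager";
}

// Governance rule 2: spending above $500/month requires joint sign-off
// from the CTO, CFO, and CEO; below the threshold, no sign-off is needed.
function spendApproved(monthlyUsd: number, approvers: Set<string>): boolean {
  if (monthlyUsd <= 500) return true;
  return ["cto", "cfo", "ceo"].every((role) => approvers.has(role));
}
```

Encoding the rules as pure functions means they can be unit-tested like any other invariant, instead of living only in a prompt.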

Three Failure Patterns to Watch For

After several months of running this system, we've seen three failure patterns recur:

  1. The shortcut trap. Agents optimize for speed. Given the chance, they'll skip "unnecessary" steps. Our most expensive bugs came from agents deciding that a change was too simple to need testing. The fix: make workflows non-negotiable. No judgment calls on which steps to skip.
  2. The single-test illusion. AI-generated code that works once is not tested code. Any feature involving state — connections, sessions, audio contexts, timers — must be tested across at least three full cycles. Round one catches nothing. Round two catches resource leaks. Round three catches accumulation bugs.
  3. The panic cascade. When bugs chain together, the natural response is rapid-fire fixes. Each fix introduces a new bug. The protocol: stop. List every resource involved. Redesign the cleanup sequence on paper. Then implement once. Panic-driven debugging makes things worse, whether the debugger is human or AI.
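The three-cycle rule from the single-test illusion can be mechanized. Here is a minimal harness sketch; the `Cyclable` interface is our own assumption about what a stateful feature exposes, not an existing API:

```typescript
// Anything with lifecycle state: connections, sessions, audio contexts, timers.
interface Cyclable {
  start(): void;
  stop(): void;
  liveResources(): number; // count of currently held resources
}

// Run full start/stop cycles and verify resources return to baseline
// after every round. Round one rarely fails; leaks surface on later rounds.
function assertSurvivesCycles(feature: Cyclable, rounds = 3): void {
  const baseline = feature.liveResources();
  for (let round = 1; round <= rounds; round++) {
    feature.start();
    feature.stop();
    const now = feature.liveResources();
    if (now !== baseline) {
      throw new Error(`leak after round ${round}: ${now} live (baseline ${baseline})`);
    }
  }
}
```

A feature that allocates on start and never releases on stop passes a single run but fails this harness on round one's teardown check, exactly the class of bug the AudioContext incident exposed.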

The Real Shift

Karpathy's terminology change points to a genuine inflection point. Vibe coding treated AI as a faster typist. Agentic engineering treats AI as a team that needs management. And management — roles, workflows, governance, accountability — turns out to be the hard part.

The companies that succeed with AI agents won't be the ones with the most agents or the most powerful models. They'll be the ones that build the best organizational structures around those agents. The engineering challenge of 2026 isn't writing code. It's designing the system that writes, reviews, tests, and deploys code — and doing it in a way that survives the hundredth run, not just the first.

We're still learning. Our rule files grow after every incident. Our agent definitions get refined after every retrospective. But the direction is clear: the future of software engineering isn't about coding. It's about orchestrating the agents that code.