What Actually Makes Agents Work

Spoiler: it's not a smarter model. It's building a real domain memory factory and having the discipline to actually use it. The LLM is the horse. The harness is what you're missing.

════════════════════════════════════════════════════════════════

Everyone's chasing the next model release like it's going to solve their agent problems. It won't. The fix isn't a smarter model. It's building a real domain memory factory and enforcing discipline around it.

The core pattern is surprisingly simple: persistent domain memory + a strict two-agent loop. That's the whole trick. Everything else is commentary.

1. Domain Memory (Not a Vector DB)

This is where most people get it spectacularly wrong.

Domain memory is state, not embeddings. It's a durable, structured representation of the work that survives context resets and prevents the agent from re-learning the same lessons every single run. Your agent isn't suffering from amnesia because the model is stupid—it's amnesiac by design. Context windows are finite. Real projects don't fit inside one window.

Domain memory holds:

- the goal, externalized into a backlog with explicit pass/fail criteria
- what's done, what's in progress, and what comes next
- the latest test results and the notes the next run needs to start cleanly

This can be as boring as:

- a Markdown backlog file
- a JSON progress log
- a test suite and its most recent results

Core insight: Boring is good. The point is memory ownership, not cleverness. If your domain memory requires a PhD to understand, you've already lost.
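
Here's a minimal sketch of what "boring" can look like, assuming plain dataclasses and a single JSON file; every name here is illustrative rather than taken from any SDK.

```python
# A "boring" domain memory: structured state in one JSON file on disk.
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

@dataclass
class BacklogItem:
    id: str
    goal: str                # what "done" means, in plain language
    pass_criteria: str       # how a run proves it's done (e.g. a named test)
    status: str = "open"     # open | in_progress | done
    notes: str = ""          # breadcrumbs for the next run

@dataclass
class DomainMemory:
    project: str
    backlog: list[BacklogItem] = field(default_factory=list)
    last_test_run: str = "never"   # tests define reality: record the latest result

    def save(self, path: Path) -> None:
        path.write_text(json.dumps(asdict(self), indent=2))

    @classmethod
    def load(cls, path: Path) -> "DomainMemory":
        raw = json.loads(path.read_text())
        raw["backlog"] = [BacklogItem(**item) for item in raw["backlog"]]
        return cls(**raw)
```

A file like this survives every context reset, and a human can read it in ten seconds.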

2. The Two-Agent Pattern

You split responsibilities cleanly around memory. Memory becomes the boundary between two fundamentally different jobs.

Initializer Agent

- runs once, in the very first context window
- turns "build X" into a concrete backlog with pass/fail criteria
- writes the initial domain memory that every later run will boot from

Coding / Worker Agent

- boots from memory every run: read state, run checks, then act
- works exactly one backlog item at a time
- updates the shared state before the session ends

This agent never "remembers"; it reconstructs state from artifacts every time.

That's the whole trick. Two agents, one memory boundary, zero magic.
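
To make the boundary concrete, here's a sketch of the dispatch logic, reusing the DomainMemory shape above; run_agent is a stand-in for whatever model call you actually use, not a real API.

```python
# Hypothetical dispatcher: the memory file is the boundary between the two jobs.
from pathlib import Path

MEMORY_PATH = Path("memory.json")

def run_agent(prompt: str) -> str:
    """Stand-in for your actual LLM call (Claude, a local model, whatever)."""
    raise NotImplementedError

def initialize(goal: str) -> None:
    """First context window only: turn 'build X' into durable, structured state."""
    plan = run_agent(f"Break this goal into backlog items with pass/fail criteria: {goal}")
    backlog = [BacklogItem(id=str(n), goal=line, pass_criteria="its test passes")
               for n, line in enumerate(plan.splitlines()) if line.strip()]
    DomainMemory(project=goal, backlog=backlog).save(MEMORY_PATH)

def work_one_item() -> None:
    """Every later run: reconstruct state from artifacts, do one item, write state back."""
    memory = DomainMemory.load(MEMORY_PATH)
    item = next((i for i in memory.backlog if i.status != "done"), None)
    if item is None:
        return                 # backlog complete; nothing to do
    run_agent(f"Complete exactly this item and nothing else: {item.goal}")
    item.status = "done"
    memory.save(MEMORY_PATH)   # if the artifacts didn't change, the run didn't happen

def main(goal: str) -> None:
    if MEMORY_PATH.exists():
        work_one_item()        # all later windows: one item, then stop
    else:
        initialize(goal)       # the first run is special: it sets the world
```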

────────────────────────────────────────

Design Rules for Serious Agents

If you want agents that ship instead of vibe:

Externalize the goal. Turn "build X" into a concrete backlog with pass/fail criteria. If the goal lives only in the prompt, it dies with the context window.

Make progress atomic and observable. One item at a time. Every run must update shared state. If you can't tell what changed by looking at the artifacts, the run didn't happen.

Leave a clean campsite. End every session with passing tests and readable notes. Your future self (and future agent runs) will thank you.

Standardize the boot ritual. Every run starts the same way: read memory, run checks, then act. No snowflake sessions. No "let me get oriented first." The ritual is the reliability.

Tests define reality. Test results are the domain state. Not vibes. Not intent. Not "I think I fixed it." The tests pass or they don't.
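
Here's a sketch of that boot ritual, again assuming the memory file above; pytest is just one example of a check whose exit code, rather than the agent's self-report, defines the state.

```python
# Hypothetical boot ritual: every run starts the same way. No snowflake sessions.
import subprocess
from datetime import datetime, timezone
from pathlib import Path

MEMORY_PATH = Path("memory.json")

def boot() -> "DomainMemory":
    """Read memory, run checks, record reality. Only then does the agent act."""
    memory = DomainMemory.load(MEMORY_PATH)

    # Tests define reality: record the exit code, not "I think I fixed it".
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    memory.last_test_run = f"{stamp} {'PASS' if result.returncode == 0 else 'FAIL'}"

    memory.save(MEMORY_PATH)   # observable: the artifact changed, so the run happened
    return memory
```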

Strategic insight: The moat isn't the LLM. The moat is the domain memory and harness you design. That part is bespoke, domain-specific, and not commoditized. OpenAI can't ship it. Anthropic can't ship it. Only you can.

What "Harness" Actually Means

When Anthropic talks about a harness, they mean the scaffolding wrapped around the model that makes agentic work possible.

Claude is the horse.
Claude Code is the harness.
You're still holding the reins.

Early AI coding failed because either:

- the model was strong but had no harness around it, or
- the harness was elaborate but wrapped around a weak model.

You need both. A thoroughbred with a rope around its neck is still just a panicked horse. A world-class harness on a donkey is still a donkey. The magic happens at the intersection.

The Long-Running Agent Problem

Agents don't actually "run forever."

They work in discrete sessions, each starting with zero memory. Context windows are finite. Real projects don't fit inside one window. This isn't a bug—it's thermodynamics.

So without a harness, agents:

- forget everything they did the session before
- re-learn the same lessons every single run
- lose the goal the moment the context window resets

Sound familiar? That's not the model being dumb. That's you not giving it persistent state.
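
Giving it persistent state can be as small as an outer loop that starts a fresh session each time and lets the files carry the project instead of the context window; this sketch reuses the hypothetical initialize, boot, and work_one_item pieces from above.

```python
# Hypothetical outer loop: every iteration is a brand-new context window with zero memory.
# The only thing that survives between sessions is what was written to disk.

def run_project(goal: str, max_sessions: int = 50) -> None:
    for _ in range(max_sessions):
        if not MEMORY_PATH.exists():
            initialize(goal)            # session 1: set the world
            continue
        memory = boot()                 # later sessions: read memory, run checks
        if all(item.status == "done" for item in memory.backlog):
            break                       # the backlog says done, not the vibe
        work_one_item()                 # act on exactly one item, then the window ends
```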

The Two-Agent Harness (Why It Works)

Anthropic's updated guidance introduces a key idea:

Use a different prompt for the very first context window.

That first run is special. It sets the world.

The Initializer agent:

- gets the first-window prompt
- externalizes the goal into a backlog with pass/fail criteria
- writes the initial domain memory to disk and stops

The Coding agent:

- gets the standard prompt on every later run
- reads memory, runs the checks, completes exactly one item
- updates the memory and leaves a clean campsite

Once you do this, context resets stop being a problem. They become a feature—a clean slate with perfect state reconstruction.
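
In code, "a different prompt for the very first context window" can be this small; the prompt text below is illustrative, not Anthropic's wording.

```python
# Hypothetical prompt selection: the first window sets the world, every later window rebuilds it.
from pathlib import Path

MEMORY_PATH = Path("memory.json")

INITIALIZER_PROMPT = """You are the initializer. There is no memory yet.
Turn the goal into a backlog with pass/fail criteria and write it to memory.json.
Do not start building in this session."""

WORKER_PROMPT = """You are the worker. You remember nothing from previous sessions.
Read memory.json, run the checks, pick the single next open item, complete it,
update memory.json, and leave the campsite clean."""

def system_prompt() -> str:
    # The only question that matters at boot: does the world exist yet?
    return WORKER_PROMPT if MEMORY_PATH.exists() else INITIALIZER_PROMPT
```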

Why This Matters for Claude Code (and Your Work)

The Claude Agent SDK formalizes this pattern.

Understanding the harness explains why your results depend more on the scaffolding you build than on the model you call.

════════════════════════════════════════════════════════════════

If you want agents that actually deliver, you don't train them harder.

You give them:

- durable domain memory that survives context resets
- a strict two-agent harness with one memory boundary
- a boot ritual that starts every run the same way
- tests that define what "done" means

That's the game.