Everyone's chasing the next model release like it's going to solve their agent problems. It won't. The fix isn't a smarter model. It's building real domain memory and enforcing discipline around it.
The core pattern is surprisingly simple: persistent domain memory + a strict two-agent loop. That's the whole trick. Everything else is commentary.
1. Domain Memory (Not a Vector DB)
This is where most people get it spectacularly wrong.
Domain memory is state, not embeddings. It's a durable, structured representation of the work that survives context resets and prevents the agent from re-learning the same lessons every single run. Your agent isn't suffering from amnesia because the model is stupid—it's amnesiac by design. Context windows are finite. Real projects don't fit inside one window.
Domain memory holds:
- A persistent set of goals, requirements, and constraints
- Explicit state tracking—what's passing, failing, flaky, or broken
- Scaffolding: how to run, test, debug, and extend the system
- A canonical progress log the agent must read before acting
This can be as boring as:
- A structured JSON feature list
- A progress.md or status.txt file
Boring is good. The point is memory ownership, not cleverness. If your domain memory requires a PhD to understand, you've already lost.
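To make that concrete, here is a minimal sketch of what such a file can hold, written as Python that dumps a features.json. Every name and value is illustrative, not a prescribed schema:

```python
# Hypothetical shape of a domain memory file. Nothing clever: the goal,
# per-item status, how to run the checks, and a log the agent must read first.
import json

domain_memory = {
    "goal": "Build a CSV import pipeline with validation",
    "constraints": ["Python 3.11", "no new runtime dependencies"],
    "scaffolding": {"test_cmd": "pytest -q", "run_cmd": "python -m importer"},
    "features": [
        {"id": "F1", "desc": "Parse CSV headers", "status": "passing"},
        {"id": "F2", "desc": "Reject malformed rows", "status": "failing"},
        {"id": "F3", "desc": "Emit an import summary", "status": "todo"},
    ],
    "progress_log": [
        "run 1: F1 implemented, tests green",
    ],
}

with open("features.json", "w") as f:
    json.dump(domain_memory, f, indent=2)
```

The worker agent never needs anything fancier than this: a backlog it can query and a log it can append to.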
2. The Two-Agent Pattern
You split responsibilities cleanly around memory. Memory becomes the boundary between two fundamentally different jobs.
Initializer Agent
- Runs once (or rarely)
- Expands a vague human goal into a machine-readable backlog
- Writes the feature list, constraints, and rules of engagement
- Bootstraps the domain memory
- Has no need for memory itself—it's a stage manager, not a worker
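A sketch of that bootstrap step, assuming the features.json shape above; `expand_goal()` is a stand-in for the model call that writes the backlog, not a real API:

```python
# Initializer: runs once, turns a vague human goal into durable domain memory.
import json
from pathlib import Path

MEMORY = Path("features.json")

def expand_goal(goal: str) -> list[dict]:
    # Stand-in for the model call that expands the goal into a backlog
    # with explicit pass/fail criteria. Not a real SDK function.
    return [{"id": "F1", "desc": f"First concrete step toward: {goal}", "status": "todo"}]

def initialize(goal: str) -> None:
    if MEMORY.exists():
        return  # already bootstrapped; the initializer never runs twice
    memory = {
        "goal": goal,
        "constraints": [],
        "scaffolding": {"test_cmd": "pytest -q"},
        "features": expand_goal(goal),
        "progress_log": [],
    }
    MEMORY.write_text(json.dumps(memory, indent=2))
```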
Coding / Worker Agent
- Starts every run amnesiac
- Reads the domain memory as its only source of truth
- Picks exactly one failing or incomplete item
- Implements it, tests it, updates status
- Writes a progress note and commits
- Leaves the campsite clean
This agent never "remembers"—it reconstructs state from artifacts every time.
That's the whole trick. Two agents, one memory boundary, zero magic.
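Put together, one worker session might look like the sketch below. `implement()` is a placeholder for the model doing the actual coding; the rest is the harness, and the file layout follows the earlier hypothetical features.json:

```python
# One worker session: reconstruct state from artifacts, do one item, record it.
import json
import subprocess
from pathlib import Path

MEMORY = Path("features.json")

def implement(item: dict, memory: dict) -> None:
    # Stand-in for the model actually writing code for this one item.
    ...

def run_session() -> None:
    memory = json.loads(MEMORY.read_text())              # 1. read memory first
    todo = [f for f in memory["features"] if f["status"] != "passing"]
    if not todo:
        return                                           # campsite already clean
    item = todo[0]                                       # 2. exactly one item

    implement(item, memory)                              # 3. do the work

    checks = subprocess.run(memory["scaffolding"]["test_cmd"].split())
    item["status"] = "passing" if checks.returncode == 0 else "failing"
    memory["progress_log"].append(f"{item['id']}: {item['status']} after this run")

    MEMORY.write_text(json.dumps(memory, indent=2))      # 4. update shared state
    subprocess.run(["git", "add", "-A"])                 # 5. commit the artifacts
    subprocess.run(["git", "commit", "-m", f"{item['id']}: {item['status']}"])
```

Nothing here persists in the agent itself. Kill the process at any point and the next session rebuilds the same picture from features.json and the git history.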
Design Rules for Serious Agents
If you want agents that ship instead of vibing:
Externalize the goal. Turn "build X" into a concrete backlog with pass/fail criteria. If the goal lives only in the prompt, it dies with the context window.
Make progress atomic and observable. One item at a time. Every run must update shared state. If you can't tell what changed by looking at the artifacts, the run didn't happen.
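One low-tech way to keep the artifact observable is to write state updates atomically: a crashed run then leaves either the old state or the new state on disk, never a torn file. A sketch, assuming the JSON memory file from above:

```python
# Atomic, observable progress: write to a temp file, then rename over the old one.
import json
import os
import tempfile

def save_memory(memory: dict, path: str = "features.json") -> None:
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(memory, f, indent=2)
    os.replace(tmp, path)  # rename swaps the file in one step; always readable
```

On a single filesystem the rename is one step, which is all a single-repo harness needs.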
Leave a clean campsite. End every session with passing tests and readable notes. Your future self (and future agent runs) will thank you.
Standardize the boot ritual. Every run starts the same way: read memory, run checks, then act. No snowflake sessions. No "let me get oriented first." The ritual is the reliability.
Tests define reality. Test results are the domain state. Not vibes. Not intent. Not "I think I fixed it." The tests pass or they don't.
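A sketch of what that means in practice, assuming each backlog item carries its own pytest selector under a hypothetical "tests" key: the exit code writes the status, nothing else does.

```python
# Tests define the domain state: exit codes write the status, the agent does not.
import subprocess

def refresh_statuses(memory: dict) -> None:
    for feature in memory["features"]:
        # feature["tests"] is an assumed per-item selector, e.g. "tests/test_f2.py"
        result = subprocess.run(["pytest", "-q", feature["tests"]])
        feature["status"] = "passing" if result.returncode == 0 else "failing"
        # No "I think I fixed it": whatever pytest returned is what gets recorded.
```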
The moat isn't the LLM. The moat is the domain memory and harness you design. That part is bespoke, domain-specific, and not commoditized. OpenAI can't ship it. Anthropic can't ship it. Only you can.
What "Harness" Actually Means
When Anthropic talks about a harness, they mean the scaffolding wrapped around the model that makes agentic work possible.
Claude is the horse.
Claude Code is the harness.
You're still holding the reins.
Early AI coding failed because either:
- The horse wasn't strong enough, or
- The harness was garbage
You need both. A thoroughbred with a rope around its neck is still just a panicked horse. A world-class harness on a donkey is still a donkey. The magic happens at the intersection.
The Long-Running Agent Problem
Agents don't actually "run forever."
They work in discrete sessions, each starting with zero memory. Context windows are finite. Real projects don't fit inside one window. This isn't a bug—it's thermodynamics.
So without a harness, agents:
- Forget what they were doing
- Regress to the mean
- Re-introduce bugs they already fixed
- Loop endlessly on solved problems
Sound familiar? That's not the model being dumb. That's you not giving it persistent state.
The Two-Agent Harness (Why It Works)
Anthropic's updated guidance introduces a key idea:
Use a different prompt for the very first context window.
That first run is special. It sets the world.
The Initializer agent:
- Writes comprehensive feature requirements
- Defines constraints and structure
- Creates the artifacts future agents will rely on
The Coding agent:
- Makes incremental progress
- Operates purely off artifacts
- Leaves breadcrumbs for the next session
Once you do this, context resets stop being a problem. They become a feature—a clean slate with perfect state reconstruction.
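One way to wire that up is embarrassingly small: pick the prompt based on whether the memory artifact exists yet. `run_agent()` and the prompt strings below are placeholders, not any SDK's API:

```python
# First context window vs. every later one: only the prompt and the artifacts differ.
from pathlib import Path

# Placeholder prompts; the real ones are long documents, not one-liners.
INITIALIZER_PROMPT = "Expand the goal into features.json with pass/fail criteria."
WORKER_PROMPT = "Read features.json, pick one failing item, fix it, update its status."

def start_session(run_agent) -> None:
    # run_agent() is however you invoke the model; it is not a specific SDK call.
    if not Path("features.json").exists():
        run_agent(INITIALIZER_PROMPT)  # first run: set the world
    else:
        run_agent(WORKER_PROMPT)       # later runs: incremental progress off artifacts
```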
Why This Matters for Claude Code (and Your Work)
The Claude Agent SDK formalizes this pattern:
- Tool use
- Structured planning
- Context compaction
- Artifact-driven continuity
Understanding the harness explains:
- Why some prompts work and others don't
- Why long-running tasks fail without structure
- Why "just make the model smarter" is the wrong lever
If you want agents that actually deliver, you don't train them harder.
You give them:
- Memory they can trust
- Rules they must follow
- A harness that doesn't let them drift
That's the game.