Everyone's chasing the next model release like it's going to solve their agent problems. It won't. The fix isn't a smarter model. It's building real domain memory and enforcing discipline around it.
The core pattern is surprisingly simple: persistent domain memory + a strict two-agent loop. That's the whole trick. Everything else is commentary.
1. Domain Memory (Not a Vector DB)
This is where most people get it spectacularly wrong.
Domain memory is state, not embeddings. It's a durable, structured representation of the work that survives context resets and prevents the agent from re-learning the same lessons every single run. Your agent isn't suffering from amnesia because the model is stupid—it's amnesiac by design. Context windows are finite. Real projects don't fit inside one window.
Domain memory holds:
- A persistent set of goals, requirements, and constraints
- Explicit state tracking—what's passing, failing, flaky, or broken
- Scaffolding: how to run, test, debug, and extend the system
- A canonical progress log the agent must read before acting
This can be as boring as:
- A structured JSON feature list
- A progress.md or status.txt file
Boring is good. The point is memory ownership, not cleverness. If your domain memory requires a PhD to understand, you've already lost.
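To make that concrete, here is a minimal sketch of what such a file can hold, written as Python that dumps a features.json. Every name and value is illustrative, not a prescribed schema:

```python
# Hypothetical shape of a domain memory file. Nothing clever: the goal,
# per-item status, how to run the checks, and a log the agent must read first.
import json

domain_memory = {
    "goal": "Build a CSV import pipeline with validation",
    "constraints": ["Python 3.11", "no new runtime dependencies"],
    "scaffolding": {"test_cmd": "pytest -q", "run_cmd": "python -m importer"},
    "features": [
        {"id": "F1", "desc": "Parse CSV headers", "status": "passing"},
        {"id": "F2", "desc": "Reject malformed rows", "status": "failing"},
        {"id": "F3", "desc": "Emit an import summary", "status": "todo"},
    ],
    "progress_log": [
        "run 1: F1 implemented, tests green",
    ],
}

with open("features.json", "w") as f:
    json.dump(domain_memory, f, indent=2)
```

The worker agent never needs anything fancier than this: a backlog it can query and a log it can append to.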
2. The Two-Agent Pattern
You split responsibilities cleanly around memory. Memory becomes the boundary between two fundamentally different jobs.
Initializer Agent
- Runs once (or rarely)
- Expands a vague human goal into a machine-readable backlog
- Writes the feature list, constraints, and rules of engagement
- Bootstraps the domain memory
- Has no need for memory itself—it's a stage manager, not a worker
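A sketch of that bootstrap step, assuming the features.json shape above; `expand_goal()` is a stand-in for the model call that writes the backlog, not a real API:

```python
# Initializer: runs once, turns a vague human goal into durable domain memory.
import json
from pathlib import Path

MEMORY = Path("features.json")

def expand_goal(goal: str) -> list[dict]:
    # Stand-in for the model call that expands the goal into a backlog
    # with explicit pass/fail criteria. Not a real SDK function.
    return [{"id": "F1", "desc": f"First concrete step toward: {goal}", "status": "todo"}]

def initialize(goal: str) -> None:
    if MEMORY.exists():
        return  # already bootstrapped; the initializer never runs twice
    memory = {
        "goal": goal,
        "constraints": [],
        "scaffolding": {"test_cmd": "pytest -q"},
        "features": expand_goal(goal),
        "progress_log": [],
    }
    MEMORY.write_text(json.dumps(memory, indent=2))
```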
Coding / Worker Agent
- Starts every run amnesiac
- Reads the domain memory as its only source of truth
- Picks exactly one failing or incomplete item
- Implements it, tests it, updates status
- Writes a progress note and commits
- Leaves the campsite clean
This agent never "remembers"—it reconstructs state from artifacts every time.
That's the whole trick. Two agents, one memory boundary, zero magic.
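Put together, one worker session might look like the sketch below. `implement()` is a placeholder for the model doing the actual coding; the rest is the harness, and the file layout follows the earlier hypothetical features.json:

```python
# One worker session: reconstruct state from artifacts, do one item, record it.
import json
import subprocess
from pathlib import Path

MEMORY = Path("features.json")

def implement(item: dict, memory: dict) -> None:
    # Stand-in for the model actually writing code for this one item.
    ...

def run_session() -> None:
    memory = json.loads(MEMORY.read_text())              # 1. read memory first
    todo = [f for f in memory["features"] if f["status"] != "passing"]
    if not todo:
        return                                           # campsite already clean
    item = todo[0]                                       # 2. exactly one item

    implement(item, memory)                              # 3. do the work

    checks = subprocess.run(memory["scaffolding"]["test_cmd"].split())
    item["status"] = "passing" if checks.returncode == 0 else "failing"
    memory["progress_log"].append(f"{item['id']}: {item['status']} after this run")

    MEMORY.write_text(json.dumps(memory, indent=2))      # 4. update shared state
    subprocess.run(["git", "add", "-A"])                 # 5. commit the artifacts
    subprocess.run(["git", "commit", "-m", f"{item['id']}: {item['status']}"])
```

Nothing here persists in the agent itself. Kill the process at any point and the next session rebuilds the same picture from features.json and the git history.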
Design Rules for Serious Agents
If you want agents that ship instead of vibing:
Externalize the goal. Turn "build X" into a concrete backlog with pass/fail criteria. If the goal lives only in the prompt, it dies with the context window.
Make progress atomic and observable. One item at a time. Every run must update shared state. If you can't tell what changed by looking at the artifacts, the run didn't happen.
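One low-tech way to keep the artifact observable is to write state updates atomically: a crashed run then leaves either the old state or the new state on disk, never a torn file. A sketch, assuming the JSON memory file from above:

```python
# Atomic, observable progress: write to a temp file, then rename over the old one.
import json
import os
import tempfile

def save_memory(memory: dict, path: str = "features.json") -> None:
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(memory, f, indent=2)
    os.replace(tmp, path)  # rename swaps the file in one step; always readable
```

On a single filesystem the rename is one step, which is all a single-repo harness needs.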
Leave a clean campsite. End every session with passing tests and readable notes. Your future self (and future agent runs) will thank you.
Standardize the boot ritual. Every run starts the same way: read memory, run checks, then act. No snowflake sessions. No "let me get oriented first." The ritual is the reliability.
Tests define reality. Test results are the domain state. Not vibes. Not intent. Not "I think I fixed it." The tests pass or they don't.
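A sketch of what that means in practice, assuming each backlog item carries its own pytest selector under a hypothetical "tests" key: the exit code writes the status, nothing else does.

```python
# Tests define the domain state: exit codes write the status, the agent does not.
import subprocess

def refresh_statuses(memory: dict) -> None:
    for feature in memory["features"]:
        # feature["tests"] is an assumed per-item selector, e.g. "tests/test_f2.py"
        result = subprocess.run(["pytest", "-q", feature["tests"]])
        feature["status"] = "passing" if result.returncode == 0 else "failing"
        # No "I think I fixed it": whatever pytest returned is what gets recorded.
```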
The moat isn't the LLM. The moat is the domain memory and harness you design. That part is bespoke, domain-specific, and not commoditized. OpenAI can't ship it. Anthropic can't ship it. Only you can.
What "Harness" Actually Means
When Anthropic talks about a harness, they mean the scaffolding wrapped around the model that makes agentic work possible.
Claude is the horse.
Claude Code is the harness.
You're still holding the reins.
Early AI coding failed because either:
- The horse wasn't strong enough, or
- The harness was garbage
You need both. A thoroughbred with a rope around its neck is still just a panicked horse. A world-class harness on a donkey is still a donkey. The magic happens at the intersection.
The Long-Running Agent Problem
Agents don't actually "run forever."
They work in discrete sessions, each starting with zero memory. Context windows are finite. Real projects don't fit inside one window. This isn't a bug—it's thermodynamics.
So without a harness, agents:
- Forget what they were doing
- Regress to the mean
- Re-introduce bugs they already fixed
- Loop endlessly on solved problems
Sound familiar? That's not the model being dumb. That's you not giving it persistent state.
The Two-Agent Harness (Why It Works)
Anthropic's updated guidance introduces a key idea:
Use a different prompt for the very first context window.
That first run is special. It sets the world.
The Initializer agent:
- Writes comprehensive feature requirements
- Defines constraints and structure
- Creates the artifacts future agents will rely on
The Coding agent:
- Makes incremental progress
- Operates purely off artifacts
- Leaves breadcrumbs for the next session
Once you do this, context resets stop being a problem. They become a feature—a clean slate with perfect state reconstruction.
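One way to wire that up is embarrassingly small: pick the prompt based on whether the memory artifact exists yet. `run_agent()` and the prompt strings below are placeholders, not any SDK's API:

```python
# First context window vs. every later one: only the prompt and the artifacts differ.
from pathlib import Path

# Placeholder prompts; the real ones are long documents, not one-liners.
INITIALIZER_PROMPT = "Expand the goal into features.json with pass/fail criteria."
WORKER_PROMPT = "Read features.json, pick one failing item, fix it, update its status."

def start_session(run_agent) -> None:
    # run_agent() is however you invoke the model; it is not a specific SDK call.
    if not Path("features.json").exists():
        run_agent(INITIALIZER_PROMPT)  # first run: set the world
    else:
        run_agent(WORKER_PROMPT)       # later runs: incremental progress off artifacts
```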
Why This Matters for Claude Code (and Your Work)
The Claude Agent SDK formalizes this pattern:
- Tool use
- Structured planning
- Context compaction
- Artifact-driven continuity
Understanding the harness explains:
- Why some prompts work and others don't
- Why long-running tasks fail without structure
- Why "just make the model smarter" is the wrong lever
If you want agents that actually deliver, you don't train them harder.
You give them:
- Memory they can trust
- Rules they must follow
- A harness that doesn't let them drift
That's the game.