What Is Loop Engineering? The Control System Behind Reliable AI Agents
The visible half of loop engineering is orchestration. The decisive half is the control system that keeps the loop bounded.
TL;DR
Loop engineering is the practice of building a system that prompts an agent for you.
Instead of doing this by hand:
prompt → inspect → correct → prompt again
you design a loop that runs this on its own:
trigger → act → check → retry → continue or stop
Loop engineering is usually described through its visible machinery: automations, worktrees, sub-agents, connectors, skills, memory.
All of it is real, and all of it describes how the loop is wired together. None of it describes what keeps the loop from doing damage.
That job belongs to the control structure: four decisions the loop has to make on every pass.
Continue: should it take another step at all?
Verify: did the last step actually work?
Retry: how does it try again without making things worse?
Escalate: when does it stop and pull in a human?
That is the part to design first. A loop without it is not an autonomous agent. It is a process that spends tokens, calls tools, and can do damage with no one watching.
The simple definition
Harness engineering makes a single agent run reliable.
Loop engineering decides how that reliable run gets repeated, checked, retried, and stopped without a human prompting every step.
The term is new. It went from a viral post to a named practice within a single week of June 2026. One engineer argued that the real skill is no longer prompting agents but designing the loops that prompt them. An essay gave the pattern its name. The lead of a major coding tool put it plainly: he no longer prompts the model, he writes loops that prompt it. The name is newer than the practice, though. The loop was always there, run by hand.
For the last few years, getting value from an AI coding tool meant working one turn at a time. You gave the model a task. It produced something. You inspected it, corrected it, and prompted again. The human was the loop.
Loop engineering moves the human out of that turn-by-turn role. Instead of driving every step, you design the system that drives the steps. It might run on a schedule, pick up issues from a tracker, open a worktree, ask an agent to make a change, run the tests, open a pull request, and report back. That is what turns an agent from a chat box into a recurring worker.
Why people are excited about it
The excitement is real, because manual prompting does not scale.
A person can drive one agent carefully, maybe a few. The moment you want agents running across codebases, tickets, docs, tests, and overnight jobs, the human becomes the bottleneck. Loop engineering offers a different shape: one person designs and supervises many loops instead of hand-prompting one agent at a time. Agents find work, attempt it, check it, and report back. Teams move from AI as assistant to AI as operating layer.
The same move that creates the value creates the risk. When you stop watching every step, mistakes move faster too.
A bad prompt gives you a bad answer you read and discard.
A bad loop takes a bad action, and then another, while no one is looking.
That is why loop engineering is a reliability discipline, not just a productivity technique.
The popular version: orchestration
Most explanations of loop engineering focus on orchestration:
Automations that start the loop
Worktrees that isolate parallel work
Sub-agents that split up tasks
Skills that package reusable capabilities
Connectors that reach external systems
Memory that persists across runs
These are real, and you need them to build real systems. But they are mostly plumbing. They tell you how work moves through the system. They do not tell you whether the system should be trusted to keep running.
That is the missing half. The question is not “can the agent run again?” It is “should the agent be allowed to run again?” That second question is the control structure.
The missing half: control structure
A loop needs four control primitives. They are simple, and they decide whether the loop is safe.
Orchestration answers how the loop runs. The control structure answers whether it should keep running. Production reliability lives in the second question.
1. Continue
Continue decides whether the loop takes another step.
It sounds basic. It is one of the most important parts of the system. A model has no built-in sense of when to stop. It will keep refining, searching, editing, and explaining. Without limits, the loop runs until it hits a budget, hits a timeout, or causes damage.
A bounded loop has hard limits: a step cap, a time cap, a token and cost ceiling, repeated-action detection, and no-progress detection. None of them should live only in the prompt.
Bad: “Stop after five attempts.” Better: the runtime stops after five attempts, no matter what the model says.
A prompt is advice. A runtime limit is control.
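The limits listed above can be sketched as a small budget object that the runtime checks before every step. This is a minimal illustration, not a reference implementation: the class name LoopBudget, its field names, and the default values are all assumptions chosen for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopBudget:
    """Runtime-enforced limits (hypothetical names); the model never sees or edits them."""
    max_steps: int = 5
    max_seconds: float = 600.0
    max_cost_usd: float = 2.00
    max_repeats: int = 2         # identical actions tolerated before stopping
    max_stalled_steps: int = 3   # consecutive steps allowed without progress

    steps: int = 0
    cost_usd: float = 0.0
    stalled_steps: int = 0
    started_at: float = field(default_factory=time.monotonic)
    seen_actions: dict = field(default_factory=dict)

    def record(self, action: str, cost_usd: float, made_progress: bool) -> None:
        """Account for one completed step."""
        self.steps += 1
        self.cost_usd += cost_usd
        self.seen_actions[action] = self.seen_actions.get(action, 0) + 1
        self.stalled_steps = 0 if made_progress else self.stalled_steps + 1

    def stop_reason(self) -> Optional[str]:
        """Return why the loop must stop, or None if it may take another step."""
        if self.steps >= self.max_steps:
            return "step cap reached"
        if time.monotonic() - self.started_at >= self.max_seconds:
            return "time cap reached"
        if self.cost_usd >= self.max_cost_usd:
            return "cost ceiling reached"
        if any(count > self.max_repeats for count in self.seen_actions.values()):
            return "repeated action detected"
        if self.stalled_steps >= self.max_stalled_steps:
            return "no progress detected"
        return None


# The runtime owns the loop condition; the agent's output cannot change it.
budget = LoopBudget(max_steps=5)
while budget.stop_reason() is None:
    budget.record(action=f"edit #{budget.steps}", cost_usd=0.01, made_progress=True)
print(budget.stop_reason())
```

The design choice is that stop_reason() is evaluated by the code that drives the loop, so the model cannot talk its way past a cap.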
2. Verify
Verify decides whether the last step was actually correct.
This is the control teams skip most, because the loop looks like it is working. The agent writes confident summaries, the task seems done, the loop moves on. That is the dangerous part. A loop should continue because something outside the model confirmed the step, not because the model said so.
What “outside the model” means depends on the work:
Coding: tests, type checks, build, lint, a diff that stays in scope
Data: schema validation, row counts, freshness, reconciliation
Business actions: approval state, policy checks, duplicate detection, human review for high-risk steps
The constraint is that the check is external. A model grading its own work is not a verifier; the reasoning that produced the error will also explain why the error is fine. This is not a hunch. In a controlled study, models without external feedback struggled to self-correct, and sometimes got worse after trying. In production, “looks good” is not a verifier.
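For the coding case, an external verifier can be as plain as running commands and trusting only their exit codes, plus a check that the diff stayed in scope. The sketch below assumes hypothetical helper names (run_check, diff_in_scope, verify_step). The command used in the example is a stand-in; a real loop would pass its own test, type-check, and lint commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def run_check(name: str, command: list[str], timeout_s: float = 300.0) -> CheckResult:
    """Run one external check; only the exit code decides pass or fail."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return CheckResult(name, False, "timed out")
    except OSError as exc:  # e.g. the command is not installed
        return CheckResult(name, False, f"could not run: {exc}")
    detail = (proc.stdout + proc.stderr).strip()[-500:]
    return CheckResult(name, proc.returncode == 0, detail)


def diff_in_scope(changed_files: list[str], allowed_prefixes: tuple[str, ...]) -> CheckResult:
    """Fail if the step touched any file outside the paths it was allowed to edit."""
    out_of_scope = [f for f in changed_files if not f.startswith(allowed_prefixes)]
    return CheckResult("diff scope", not out_of_scope, ", ".join(out_of_scope))


def verify_step(checks: dict[str, list[str]], changed_files: list[str],
                allowed_prefixes: tuple[str, ...]) -> list[CheckResult]:
    """Collect every external signal; the agent's own summary is not an input."""
    results = [run_check(name, cmd) for name, cmd in checks.items()]
    results.append(diff_in_scope(changed_files, allowed_prefixes))
    return results


# Stand-in command for illustration; swap in the project's real test runner.
results = verify_step(
    checks={"tests": [sys.executable, "-c", "assert 1 + 1 == 2"]},
    changed_files=["src/app.py", "deploy/prod.yaml"],
    allowed_prefixes=("src/", "tests/"),
)
step_verified = all(r.passed for r in results)
for r in results:
    print(r.name, "PASS" if r.passed else "FAIL", r.detail)
```

In this example the tests pass but the step is still rejected, because one changed file falls outside the allowed paths.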
3. Retry
Retry decides how the loop tries again after a failure.
This is where many systems become dangerous, because engineers import HTTP-client habits that do not transfer. Retrying a timed-out read is usually harmless. Retrying a timed-out tool call that writes is not: the call may have half-succeeded, and the retry creates a duplicate. Two orders, two emails, two payments, two pull requests.
A production loop needs retry rules. It should know what kind of error happened, whether the action is safe to repeat, whether the call had side effects, whether an idempotency key exists, and when to stop. Different failures need different responses. A rate limit needs backoff. A malformed argument needs correction. A permission error needs escalation. A destructive operation needs approval before any retry at all.
And the safety net is thinner than most assume. A survey of twelve major agent frameworks found that none enforce exactly-once execution at the tool boundary, so duplicate-write protection is something you build, not something you inherit. Retry is not “try again.” It is controlled re-entry into the loop.
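The retry rules above can be written as one decision function that looks at the failure type and at what the call does before allowing a repeat. This is a sketch under stated assumptions: Failure, Decision, ToolCall, and decide_retry are hypothetical names. It also assumes that a rate-limit error means the call was rejected before running, and that the downstream tool actually honors the idempotency key it is given.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Failure(Enum):
    RATE_LIMIT = "rate_limit"
    BAD_ARGUMENT = "bad_argument"
    PERMISSION = "permission"
    TIMEOUT = "timeout"


class Decision(Enum):
    BACKOFF = "retry after backoff"
    CORRECT = "fix arguments, then retry"
    ESCALATE = "hand off to a human"
    STOP = "retry budget exhausted"


@dataclass(frozen=True)
class ToolCall:
    name: str
    has_side_effects: bool
    destructive: bool = False
    idempotency_key: Optional[str] = None  # created once per logical action, reused on retry


def decide_retry(call: ToolCall, failure: Failure, attempt: int, max_attempts: int = 3) -> Decision:
    """Pick a response to a failed call; never a blind 'try again'."""
    if call.destructive or failure is Failure.PERMISSION:
        return Decision.ESCALATE  # approval comes before any retry
    if attempt >= max_attempts:
        return Decision.STOP
    if failure is Failure.BAD_ARGUMENT:
        return Decision.CORRECT
    if failure is Failure.TIMEOUT and call.has_side_effects and call.idempotency_key is None:
        # The call may have half-succeeded; repeating it could duplicate the write.
        return Decision.ESCALATE
    return Decision.BACKOFF


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with a ceiling."""
    return min(cap, base * 2 ** attempt)


unsafe = ToolCall("send_email", has_side_effects=True)
safe = ToolCall("send_email", has_side_effects=True, idempotency_key=str(uuid.uuid4()))
print(decide_retry(unsafe, Failure.TIMEOUT, attempt=1).value)
print(decide_retry(safe, Failure.TIMEOUT, attempt=1).value)
```

The same timeout produces two different decisions: without a key the loop hands off, and with a key it may back off and repeat the call under the same key.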
4. Escalate
Escalate decides when the loop stops and hands off to a human.
It is the primitive that keeps a small problem from becoming a large one. A loop should escalate when it reaches a state it was not designed for: the same step failing repeatedly, verification failing after a retry, a destructive or irreversible action, anything touching money, users, production data, or security, an action outside its scope, a missing permission, or an output it cannot verify automatically.
The point is that escalation is enforced by the runtime, not requested in the prompt.
Bad: “Ask me before deleting anything important.” Better: the delete cannot execute unless the runtime receives an approval token.
A stop button in the prompt is not a stop button. A real one lives in the system.
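One way to picture a stop button that lives in the system is a gate that refuses destructive actions unless it receives a token the agent cannot mint. The sketch below is illustrative only. The action names, the execute function, and the signing scheme are assumptions, and a real gate would also need token expiry and an audit log.

```python
import hashlib
import hmac
from typing import Callable, Optional


class ApprovalRequired(Exception):
    """Raised when a gated action is attempted without a valid approval token."""


# Hypothetical action names; a real system would define its own list.
DESTRUCTIVE_ACTIONS = {"delete_table", "drop_database", "force_push"}


def issue_approval(secret: bytes, action: str, target: str) -> str:
    """Called only from the human approval path; the agent never holds the secret."""
    message = f"{action}:{target}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def execute(action: str, target: str, secret: bytes,
            approval_token: Optional[str] = None,
            run: Callable[[str, str], str] = lambda a, t: f"{a} {t}: done") -> str:
    """Run an action, refusing destructive ones that lack a matching token."""
    if action in DESTRUCTIVE_ACTIONS:
        expected = issue_approval(secret, action, target)
        if approval_token is None or not hmac.compare_digest(
            expected.encode(), approval_token.encode()
        ):
            raise ApprovalRequired(f"{action} on {target} needs human approval")
    return run(action, target)


secret = b"held-by-the-approval-service"  # placeholder value for the example
try:
    execute("delete_table", "users", secret)
except ApprovalRequired as exc:
    print("escalated:", exc)

token = issue_approval(secret, "delete_table", "users")
print(execute("delete_table", "users", secret, approval_token=token))
```

Because the token is bound to both the action and the target, an approval for one delete cannot be replayed against a different table.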
Why this matters
Loop engineering changes the failure mode. With prompt engineering, the failure is a bad answer. With loop engineering, the failure is a bad action repeated over time, because a loop can spend money, change code, call APIs, update records, message customers, and delete data with no person in the path. That cost is not hypothetical: one catalog of production incidents traces runaway loops that spent thousands of dollars before anyone noticed, on the operator’s own account. The most public cases are not subtle: agents that deleted production databases, ran destructive commands during a freeze, or wiped backups in seconds, each one a capable model inside a loop with no gate.
So the safety question is not whether the model is smart. It is whether the loop is bounded.
A capable model in a weak loop is more dangerous than a limited model in a strong loop.
What good looks like
A production loop reads less like a chat and more like a state machine, with a clear gate at every pass.
The loop should know what state it is in, what it is allowed to do, what verification is required, how much retry budget remains, what forces escalation, what gets logged, what gets committed, and what cannot happen without approval. This does not need a heavy framework. Most of it is a few clear gates around the agent. The mistake is treating those gates as optional and accepting the framework defaults, which are tuned for the demo, not for the unattended overnight run.
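The state-machine framing can be sketched in a few dozen lines. This is a simplified illustration, not a prescribed design. The state names and the run_loop driver are assumptions, and the act, verify, and is_complete callables stand in for the agent call, the external check, and the task's completion test.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    ACT = auto()
    VERIFY = auto()
    RETRY = auto()
    DONE = auto()
    STOPPED = auto()     # a runtime limit ended the loop
    ESCALATED = auto()   # a human has to take over


TERMINAL = {State.DONE, State.STOPPED, State.ESCALATED}


def run_loop(act: Callable[[], None], verify: Callable[[], bool],
             is_complete: Callable[[], bool],
             max_steps: int = 10, max_retries: int = 2) -> tuple[State, list[str]]:
    """Drive the loop through explicit states and log every one visited."""
    state, steps, retries_left, log = State.ACT, 0, max_retries, []
    while state not in TERMINAL:
        log.append(state.name)
        if state is State.ACT:
            act()
            steps += 1
            state = State.VERIFY
        elif state is State.VERIFY:
            if verify():
                retries_left = max_retries  # a verified step resets the retry budget
                if is_complete():
                    state = State.DONE
                else:
                    state = State.ACT if steps < max_steps else State.STOPPED
            else:
                state = State.RETRY if retries_left > 0 else State.ESCALATED
        elif state is State.RETRY:
            retries_left -= 1
            state = State.ACT if steps < max_steps else State.STOPPED
    log.append(state.name)
    return state, log


# Stand-in agent: each step appends to a list; the task is done after two steps.
work: list[int] = []
final, trace = run_loop(
    act=lambda: work.append(1),
    verify=lambda: True,
    is_complete=lambda: len(work) >= 2,
)
print(final.name, trace)
```

Every exit from the loop is a named terminal state, so the log always records whether the run finished, hit a limit, or was handed to a human.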
The main rule
Do not start with sub-agents. Do not start with automations. Do not start with parallel worktrees. Start with the loop contract.
For any loop you intend to run unattended, answer four questions:
Continue: what stops this loop?
Verify: what proves the last step worked?
Retry: what is safe to attempt again?
Escalate: what forces a human handoff?
If any answer is “the prompt tells the model to behave,” that part is not engineered yet.
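The contract can also be written down as data and checked before a loop is allowed to run unattended. The sketch below is one possible shape, with hypothetical names (Gate, LoopContract, unengineered) and an assumed set of acceptable enforcers. It flags any answer whose only enforcement is the prompt.

```python
from dataclasses import dataclass, fields

# Assumed set of acceptable enforcers; "prompt" is deliberately absent.
ENFORCERS = {"runtime", "external_check", "human"}


@dataclass(frozen=True)
class Gate:
    answer: str       # plain-language answer to the contract question
    enforced_by: str  # what actually enforces that answer


@dataclass(frozen=True)
class LoopContract:
    continue_: Gate   # what stops this loop?
    verify: Gate      # what proves the last step worked?
    retry: Gate       # what is safe to attempt again?
    escalate: Gate    # what forces a human handoff?

    def unengineered(self) -> list[str]:
        """Return the questions whose answer is not enforced outside the prompt."""
        return [
            f.name.rstrip("_")
            for f in fields(self)
            if getattr(self, f.name).enforced_by not in ENFORCERS
        ]


contract = LoopContract(
    continue_=Gate("step cap of 20 and a cost ceiling", "runtime"),
    verify=Gate("test suite and type check pass", "external_check"),
    retry=Gate("only calls that carry an idempotency key", "runtime"),
    escalate=Gate("model is told to ask before deleting", "prompt"),
)
print(contract.unengineered())
```

A launcher could refuse to schedule any loop for which unengineered() returns a non-empty list.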
Common questions
How is loop engineering different from prompt engineering?
Prompt engineering optimizes a single instruction you write by hand. Loop engineering designs the system that writes those instructions for you, over and over, with no person in each turn. The failure mode changes with the layer. A bad prompt is a bad answer you can read and discard. A bad loop is a bad action repeated until something external stops it.
Is this just multi-agent orchestration?
No. Orchestration, meaning sub-agents, worktrees, and parallel work, is how you scale a loop. The control structure is what makes a single loop safe to run unattended in the first place. A well-instrumented single loop with a real verifier usually beats a multi-agent system you cannot debug. Add agents when the work genuinely parallelizes, not before.
Do you need a framework for this?
No. The four primitives are control flow, not a product. A step cap, an external check, an idempotency key, and a runtime-enforced stop are a few clear gates around the agent, not a platform. Frameworks help, but their defaults are tuned for the demo, so the gates are still yours to set.
The takeaway
Loop engineering is the next layer above prompting, the step that turns one-off assistants into recurring systems. But the safe version is not defined by how many agents you spawn or how many tools you connect. It is defined by control.
Continue. Verify. Retry. Escalate. Those four decide whether a loop is reliable enough to run unattended, or risky enough to become someone’s failure story. Orchestration is the easy part to reach for, and even Anthropic’s own guidance is to start simple and add agentic complexity only when it earns its place. The control structure is what decides whether you can walk away from the loop. Build it first.





