04 — Agent Loops
What Was Researched
The core execution loop patterns used by AI agents — how they process user input, make API calls, handle tool calls, manage state between iterations, and determine when to stop. This is the central control flow of any agent.
Which Sources Were Used
| Source | Type | URL | Relevance |
|---|---|---|---|
Hermes Agent (hermes-agent/AGENTS.md, run_agent.py) |
Local codebase | https://github.com/NousResearch/hermes-agent | CRITICAL |
LangGraph (langgraph/) |
Local codebase | https://github.com/langchain-ai/langgraph | CRITICAL |
Codex (codex/AGENTS.md) |
Local codebase | https://github.com/openai/codex | HIGH |
Pi Agent Core (pi-mono/packages/agent/) |
Local codebase | https://github.com/badlogic/pi-mono | HIGH |
OpenRouter SDK (openrouter-sdk/) |
Local codebase | https://github.com/OpenRouterTeam/typescript-sdk | MEDIUM |
assistant-ui (assistant-ui/) |
Local codebase | https://github.com/assistant-ui/assistant-ui | HIGH |
OpenClaw (openclaw/) |
Local codebase | https://github.com/openclaw/openclaw | HIGH |
Key Findings
Loop Pattern 1: Synchronous While-Loop (Hermes)
The most common pattern. Hermes's run_conversation() in run_agent.py:
while (api_call_count < max_iterations and iteration_budget.remaining > 0) \
or budget_grace_call:
if interrupt_requested: break
response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
if response.tool_calls:
for tool_call in response.tool_calls:
result = handle_function_call(tool_call.name, tool_call.args, task_id)
messages.append(tool_result_message(result))
api_call_count += 1
else:
return response.content # No tool calls = final response
Key features:
- Budget tracking — Dual limits:
max_iterations(hard cap, default 90) +iteration_budget(configurable per-session) - Grace call — One extra turn allowed after budget exhaustion to let the agent wrap up
- Interrupt support —
_interrupt_requestedflag checked every iteration - Message role alternation — Strict enforcement: never two same-role messages in a row
- Synchronous execution — Entire loop is synchronous with async I/O handled internally
Loop Pattern 2: Graph-Based State Machine (LangGraph)
LangGraph represents agent execution as a directed graph:
[START] → [call_model] → [should_continue?] → [tool_node] → [call_model] → ...
↘ [END]
Key features:
- Explicit state transitions — Each node transforms state, edges define flow
- Checkpointing at every node — Automatic persistence for durable execution
- Interrupt points — Any edge can be an interrupt for human-in-the-loop
- Branching — Conditional edges enable dynamic routing
- Subgraphs — Nested graphs for composable agent architectures
- Streaming — Token-level and node-level streaming
Loop Pattern 3: Stop-Condition Driven (OpenRouter SDK)
The OpenRouter SDK introduces a declarative stop-condition model:
maxModelResponses— Cap total model callsmaxToolCalls— Cap total tool invocationsstopConditions— Custom predicates for loop terminationModelResultfor stream consumption — Unified result type for both streaming and non-streaming
Loop Termination Strategies
| Strategy | Used By | Mechanism |
|---|---|---|
| No tool calls = stop | Hermes, Pi, Codex | Model returns text without tool calls |
| Iteration cap | Hermes (90), Codex | Hard limit on API call count |
| Budget tracking | Hermes | Token/cost budget with grace call |
| Graph end node | LangGraph | Explicit END node in graph |
| Stop conditions | OpenRouter SDK | Declarative predicates |
| Interrupt signal | Hermes, Pi | User sends Ctrl+C or new message |
Reasoning Effort Control
A new dimension in agent loops (June 2026):
- Grok 4.3:
reasoning_effort= none / low / medium / high (default: low) - GLM 5.2:
reasoning_effort= high / xhigh - Claude: Built-in reasoning (not configurable)
The agent loop should support setting reasoning effort per-turn, potentially adapting it based on task complexity.
What Is Confirmed
- While-loop with tool-call detection is the dominant pattern (Hermes, Pi, Codex all use it)
- Budget/iteration limits are essential — unbounded loops cause cost explosions
- Grace calls (one extra turn after budget) improve completion quality
- Graph-based execution (LangGraph) provides superior debuggability and interruptibility
- Message role alternation must be strictly enforced (Hermes documents this as a hard requirement)
- Interrupt support is required for production agents
- Human-in-the-loop and steering (investigated in human_in_the_loop_steering.md) is standard for production agents to govern dangerous tools CLAIM-192, support runtime context injection CLAIM-189, and allow configurable bypass modes CLAIM-194.
What Is Uncertain
- Whether graph-based or while-loop patterns are better for a new harness
- Optimal default iteration cap (Hermes uses 90, Codex appears lower)
- How to dynamically adjust reasoning effort based on task complexity
- The UX balance between micro-approvals (every tool) and macro-approvals (session bypasses) CLAIM-193
How This Applies to Building a Modern Model-Agnostic Agent Harness
- Start with a while-loop pattern — simpler, proven, and sufficient for most use cases
- Implement budget tracking from day one — token budget + iteration cap + grace call
- Support interrupt signals — both user-initiated and programmatic with request-local cancellation safety to avoid HTTP retries CLAIM-190, CLAIM-191
- Enforce message role alternation — critical for model compatibility
- Expose reasoning effort as a per-turn configuration option
- Consider graph-based execution for complex multi-step workflows (can be layered on later)
- Implement OpenRouter-style stop conditions for declarative loop control
- Provide granular bypass policies (e.g., Ask, Session Bypass, and Sensitive Path Gating) to balance developer velocity and security CLAIM-194, CLAIM-195