Agent Self-Improvement & Curation Loops
LLM-based coding agents must be capable of moving beyond static system prompts and rigid tool definitions. In long-running development workflows, agents encounter unique APIs, recurring workspace bugs, and specific developer preferences. To maintain peak performance, the harness must support procedural learning loops — enabling the agent to learn from execution traces, write custom instructions for itself (skills), and automatically consolidate these instructions over time to prevent token bloat and cache invalidation.
1. Curation Telemetry and Lifecycle States
To manage learning dynamically, the harness must track procedural knowledge usage. Hardcoding telemetry directly in the body of instructions or files causes version history pollution and merge conflicts.
A. The Telemetry Sidecar (.usage.json)
The system isolates execution metadata to an asynchronous, process-serialized sidecar file (~/.hermes/skills/.usage.json) CLAIM-209. The sidecar records view count, active use count, patch count, and ISO-8601 timestamps of last activity CLAIM-209:
view_count: Incremented when the model reads/inspects a skill CLAIM-209.use_count: Incremented when the skill's instructions are loaded into the LLM context CLAIM-209.patch_count: Incremented when the model edits or corrects the skill CLAIM-209.
B. State Transitions
Using the sidecar's activity markers, the system runs deterministic state transitions to prune inactive procedural rules:
active: The default state on creation.stale: Automatically marked when a skill goes unused forstale_after_days(default: 30) CLAIM-211.archived: Moved from the active path to a hidden archive directory (~/.hermes/skills/.archive/) when unused forarchive_after_days(default: 90) CLAIM-211.
These transitions keep the active directory clean, preserving token budgets and prefix cache stability.
2. Nous Hermes Curation Engine
The Nous Hermes Curation system represents the state-of-the-art in autonomous skill pruning and umbrella consolidation.
A. Curation Invocation
Curation is not run on a cron daemon, which would fail on offline or closed environments. Instead, the harness implements an inactivity-triggered background review CLAIM-208:
- On CLI session boot or periodically within gateway ticks, the system evaluates the curator state CLAIM-208.
- If the time since the last run exceeds
interval_hours(default: 7 days) and the primary user interface has been idle formin_idle_hours(default: 2 hours), a run is triggered CLAIM-208. - The runner spawns a background fork of the
AIAgentexecuting on a cheaper auxiliary model slot (auxiliary.curator), running on a separate prompt cache to avoid disturbing live user chats CLAIM-208.
B. Governance Boundaries: Pinned & Protected Built-ins
- Pin Safeguards: Users or agents can declare a skill "pinned" (
"pinned": truein.usage.json). Pins exempt the skill from auto-transitions and prevent the agent from executing deletion tool calls (skill_manage(action="delete")) CLAIM-212. Importantly, the agent is still permitted to patch pinned skills, letting it refine rule details without unpinning CLAIM-212. - Protected Built-ins: Core UI skills (such as
/planexecution schemas) are hardcoded as protected built-ins, preventing them from ever being consolidated or archived CLAIM-210. - Agent-Created Constraint: Curation is restricted strictly to skills flagged
"created_by": "agent"or"agent_created": truein.usage.json(meaning they were authored by the background review fork) CLAIM-210. User-authored or hand-copied skills are left untouched to prevent breaking manual configurations CLAIM-210.
C. LLM Consolidation & Umbrella Building
When curator.consolidate: true is active, the background reviews agent evaluates candidate skills to combat catalog bloat:
- Prefix Clustering: The agent groups narrow skills sharing keyword domains (e.g.,
git-rebase-bug,git-commit-hook) CLAIM-213. - Umbrella Merger: Overlapping micro-skills are absorbed into broad class-level instructions (e.g.
git-workflow) CLAIM-213. - Demotion to Support Files: Narrow scripts, templates, or references are detached from standalone status and placed in the umbrella skill's directory (
references/,templates/, orscripts/) while the source skill is archived CLAIM-213. - Package Integrity Enforcement: If a source skill contains relative links or assets, the curator must migrate these files and rewrite links inside the umbrella instructions to prevent broken references CLAIM-214.
D. Security Audits & Backups
- Vulnerability Scanning: Agent-created skills undergo AST audits and static security checks using a validation guard (
tools/skills_guard.py) before registration to block shell injection or credential theft CLAIM-215. - Tarball Snapshots: Before modifying the skills directory, the curator takes a compressed tarball snapshot (
skills.tar.gz) stored under~/.hermes/skills/.curator_backups/along with amanifest.jsonrecord CLAIM-216. This allows full multi-tier rollback in case of an LLM consolidation error CLAIM-216.
3. User Preference Caching (Claude Code Auto-Memory)
Autonomous improvement also extends to aligning with developer style preferences.
- Preference Extraction: During turns, the model observes user feedback (e.g. "I prefer pnpm over npm", "use HSL color formats").
- Auto-Memory Logging: The system updates a central
.claude/memory.mdfile in the user's workspace containing learned preferences CLAIM-204. - Developer Audits: Users can view and edit this file using
/memorycommands, keeping the learning loop highly transparent and auditable CLAIM-204.
4. Advanced Academic & Research Paradigms
Academic frameworks expand self-improvement to model weights and inference paths:
- Test-Time Self-Improvement (TT-SI): The agent monitors its execution traces, identifies failure cases (e.g. compile errors), synthesizes correction pairs, and runs local fine-tuning loops to adapt its model parameters for the workspace CLAIM-217.
- Recursive Introspection (RISE): An inference-time approach where the agent evaluates its intermediate reasoning steps against self-reward metrics, iteratively rewriting its execution plans mid-turn to prevent errors before calling terminal commands CLAIM-217.
5. Traceability Map & Sources
The curation and self-improvement specifications are grounded in real-world implementations:
- Nous Hermes Curator Orchestrator: Detailed in curator.py [SRC-002].
- Nous Hermes Telemetry Sidecar: Detailed in skill_usage.py [SRC-002].
- Nous Hermes Skills Creation Guard: Detailed in skill_manager_tool.py [SRC-002].
- VS Code Copilot & Claude Code Memory: Documented in GitHub Copilot Agent Mode [SRC-020].
- RISE and TT-SI Research: Documented in academic papers detailing recursive self-training loops [SRC-014].