09 — Skills (SKILL.md) & Self-Improvement Loops
What Was Researched
This research covers the SKILL.md file format, agent skill systems, and the background curation loops that let agents learn, store, refine, and reuse procedural knowledge. We analyzed how harnesses track skill usage, move unused skills through lifecycle states to prevent token bloat, consolidate overlapping skills into broad umbrella skills, and log user style preferences across sessions.
Research Documents
- self_improving_agents_and_learning_loops.md — Detailed analysis of background curation loops, telemetry sidecars, and preference learning.
Which Sources Were Used
| Source | Type | URL | Relevance |
|---|---|---|---|
| Hermes Agent (hermes-agent/skills/, tools/skills_tool.py, tools/skill_manager_tool.py, agent/curator.py, tools/skill_usage.py) | Local codebase | https://github.com/NousResearch/hermes-agent | CRITICAL |
| OpenClaw skills system | Local codebase | https://github.com/openclaw/openclaw | HIGH |
| Codex AGENTS.md | Local codebase | https://github.com/openai/codex | MEDIUM |
| agentskills.io standard | External reference | https://agentskills.io | HIGH |
| RISE & TT-SI Research Papers | Academic Literature | https://arxiv.org/abs/2407.18210 | MEDIUM |
Key Findings
SKILL.md Format
SKILL.md is the emerging standard for agent skills. A SKILL.md file contains:
- YAML frontmatter — Name, description, metadata
- Markdown body — Detailed instructions the agent follows when the skill is activated
Both Hermes and OpenClaw use this format. The agentskills.io open standard formalizes it.
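The two-part layout above can be illustrated with a small loader. This is a minimal sketch, not the Hermes or OpenClaw parser: `Skill`, `parse_skill_md`, and the `version` field are hypothetical names, and the frontmatter reader only handles flat `key: value` pairs (a real loader would use a full YAML parser).

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    metadata: dict = field(default_factory=dict)  # any other frontmatter keys
    body: str = ""  # Markdown instructions the agent follows


def parse_skill_md(text: str) -> Skill:
    """Split a SKILL.md document into frontmatter fields and a Markdown body.

    Assumption: frontmatter is a flat list of `key: value` lines between
    two '---' fences. Nested YAML is out of scope for this sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter fence")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("frontmatter fence is never closed") from None

    fields = {}
    for line in lines[1:end]:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and YAML comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'key: value' line: {line!r}")
        fields[key.strip()] = value.strip().strip("\"'")

    name = fields.pop("name", "")
    description = fields.pop("description", "")
    if not name or not description:
        raise ValueError("frontmatter needs both 'name' and 'description'")
    body = "\n".join(lines[end + 1:]).strip()
    return Skill(name=name, description=description, metadata=fields, body=body)


# Hypothetical skill file used only for illustration.
EXAMPLE = """---
name: git-workflow
description: Branching, committing, and rebasing conventions for this repo
version: 1
---
# Git workflow

1. Create a feature branch before editing.
"""

skill = parse_skill_md(EXAMPLE)
```

Keeping `name` and `description` as required fields reflects how a harness can match skills by description without loading the full body into context.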
Hermes Curation & Self-Improvement (Most Advanced)
Hermes implements an autonomous background self-improvement cycle:
- Curator Invocation: Triggered during inactivity (default: 7 days since last run, 2 hours user idle) CLAIM-208. Spawns a background AIAgent fork on a cheaper auxiliary model (auxiliary.curator), running in its own prompt cache to inspect candidate skills CLAIM-208.
- Telemetry Sidecar: Tracks views, uses, and patches in ~/.hermes/skills/.usage.json rather than raw file frontmatter CLAIM-209.
- Deterministic Transitions: Moves unused agent-created skills from active -> stale (30 days) -> archived (90 days, moved to .archive/ directory) CLAIM-211.
- Consolidation (Umbrella Building): Merges narrow task-specific skills into broad class-level instruction files (e.g. git-workflow) CLAIM-213. Consolidates by patching existing umbrellas, creating new ones, or demoting micro-skills to subfiles (references/, templates/, scripts/) under the umbrella folder CLAIM-213, maintaining relative link and package path integrity CLAIM-214.
- Security & Protection: Scans agent-created skills via AST analysis and static checks CLAIM-215. Pinned skills ("pinned": true) are exempt from archiving and deletion but can still be patched CLAIM-212.
- Backups: Takes pre-run tarball snapshots (skills.tar.gz) under .curator_backups/ for multi-tier rollbacks CLAIM-216.
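The sidecar and lifecycle rules in this list can be sketched as a pure function over usage records. This is an illustration under stated assumptions, not the Hermes implementation: the record fields (`last_used`, `state`, `agent_created`) and the function names are hypothetical, both thresholds are assumed to be measured from the last use, and a stale skill that is used again is assumed to return to active.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

STALE_AFTER = timedelta(days=30)    # active -> stale
ARCHIVE_AFTER = timedelta(days=90)  # stale -> archived


def next_state(record: dict, now: datetime) -> str:
    """Return the lifecycle state a skill should be in, given its sidecar record."""
    current = record.get("state", "active")
    # Pinned skills and skills not created by the agent are never demoted.
    if record.get("pinned") or not record.get("agent_created", False):
        return current
    idle = now - datetime.fromisoformat(record["last_used"])
    if idle >= ARCHIVE_AFTER:
        return "archived"
    if idle >= STALE_AFTER:
        return "stale"
    return "active"


def curate(sidecar_path: Path, now: datetime) -> dict:
    """Apply transitions to every record in the sidecar and write it back.

    Returns a mapping of skill name -> new state for the skills that changed.
    """
    usage = json.loads(sidecar_path.read_text())
    changes = {}
    for name, record in usage.items():
        new = next_state(record, now)
        if new != record.get("state", "active"):
            changes[name] = new
            record["state"] = new
    sidecar_path.write_text(json.dumps(usage, indent=2))
    return changes
```

Because state lives in the JSON sidecar, the SKILL.md files themselves stay byte-identical between curation runs. The sketch only updates the recorded state; moving archived skills into an archive directory would be a separate step.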
User Preference Learning (Claude Code)
- Preference Extraction: Extracts user styling/tool preferences dynamically and logs them to a local .claude/memory.md file in the workspace CLAIM-204.
- Auditability: Developers audit or edit preferences using /memory commands or direct file edits CLAIM-204.
Academic Paradigms (RISE & TT-SI)
- Test-Time Self-Improvement (TT-SI): Fine-tunes model parameters dynamically using synthetic instruction pairs generated from compilation or runtime traces CLAIM-217.
- Recursive Introspection (RISE): Rewrites reasoning paths mid-turn based on internal evaluation rewards to bypass terminal errors CLAIM-217.
How This Applies to Building a Model-Agnostic Agent Harness
- Adopt the SKILL.md standard: Use YAML frontmatter for description-based matching.
- Isolate Telemetry: Keep usage metrics in a JSON sidecar to preserve cache stability and avoid VCS pollution CLAIM-209.
- Implement an Idle Curation Daemon: Run background reviews during inactivity to merge duplicate/narrow guidelines into class-level umbrellas CLAIM-208, CLAIM-213.
- Enforce Safe Sandbox Scanning: Scan skills with AST tools before execution CLAIM-215.
- Enable Tarball Backups: Auto-snapshot the skills directory before curation rollouts CLAIM-216.
- Expose Pinned Exemptions: Allow developers to pin load-bearing guidelines CLAIM-212.
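As one way to approach the AST scanning recommendation, the sketch below walks a skill script with Python's standard `ast` module and reports suspicious imports and calls. The deny-lists and the `scan_script` name are illustrative assumptions, not the checks Hermes runs, and a static scan like this is easy to evade, so it complements a sandbox and does not replace one.

```python
import ast

# Hypothetical deny-lists chosen for illustration only.
BLOCKED_CALLS = {"eval", "exec", "compile", "__import__"}
BLOCKED_MODULES = {"subprocess", "socket", "ctypes"}


def scan_script(source: str) -> list[str]:
    """Return findings ordered by line; an empty list means nothing was flagged."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"line {exc.lineno}: does not parse ({exc.msg})"]

    findings = []  # (line number, message) pairs
    for node in ast.walk(tree):
        # Collect module names from both `import x` and `from x import y`.
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            modules = []
        for module in modules:
            if module.split(".")[0] in BLOCKED_MODULES:
                findings.append((node.lineno, f"imports {module}"))

        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in BLOCKED_CALLS:
                findings.append((node.lineno, f"calls {func.id}()"))
            elif (isinstance(func, ast.Attribute) and func.attr == "system"
                  and isinstance(func.value, ast.Name) and func.value.id == "os"):
                findings.append((node.lineno, "calls os.system()"))

    return [f"line {lineno}: {message}" for lineno, message in sorted(findings)]
```

A harness could run a check of this kind when the agent writes or patches a script under a skill folder and refuse to activate the skill while the findings list is non-empty.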