13 — Model-Agnostic Harness Architecture
What Was Researched
Architectural patterns for building AI agent harnesses that are not tied to any specific LLM provider — abstraction layers, provider adapters, unified message formats, capability negotiation, fallback strategies, and runtime model switching.
Additionally, a comprehensive model landscape survey covering 25 models across 11 providers was conducted to inform the architecture with real-world pricing, context windows, modalities, and capability data.
Which Sources Were Used
- 25 OpenRouter model pages (fetched live 2026-06-23) — see model_landscape_june_2026.md for full source table
- Providers: xAI, Z.ai, Anthropic, OpenAI, NVIDIA, Google, MoonshotAI, Qwen, MiniMax, StepFun, Cohere, DeepSeek
Key Findings
Model Landscape (June 2026)
See full document: model_landscape_june_2026.md
Key takeaways:
- 20x pricing spread between cheapest ($0.50/M Nemotron) and most expensive ($10/M Claude) frontier models — intelligent model routing is critical
- 1M context windows are now standard for top-tier models (5 of 8 frontier models)
- Open-weight frontier models (GLM 5.2, Nemotron 3, Kimi K2.7 Code) are now viable alternatives to closed models
- 5 model tiers recommended for the harness: Nano → Mini/Flash → Mid-Range → Frontier → Ultra-Premium
- Specialized model slots required beyond text generation: Coding Agent, Voice I/O, Embedder, Reranker, Router/Classifier
- Configurable reasoning effort (Grok: none/low/medium/high; GLM: high/xhigh) is a new API parameter dimension
What Is Confirmed
- Research topic scoped and defined
- Model landscape research complete with verified pricing data from OpenRouter
- 5-tier routing architecture is supported by the pricing data
- Open-weight models provide a viable self-hosting path
What Is Uncertain
- How to handle provider-specific features (e.g., caching, extended thinking) in a model-agnostic way
- Optimal abstraction granularity — too abstract loses features, too concrete loses portability
- How to handle heterogeneous tool calling formats across providers
- Exact benchmark scores (rendered client-side on OpenRouter; would need Artificial Analysis API)
- Exact pricing for some models where meta descriptions didn't include explicit dollar amounts
How This Applies to Building a Modern Model-Agnostic Agent Harness
Model routing is not LiteLLM-specific. Use an OpenAI-compatible client pointed at Ollama (local), OpenRouter (hosted), a LiteLLM proxy (optional), or direct provider APIs.
This is the central architectural question. The harness must define a clean abstraction boundary that allows swapping LLM providers without changing agent logic, while still exposing provider-specific capabilities through an extension mechanism.
The model landscape research directly informs:
- Model registry design — track specs for 25+ models (pricing, context, modalities, reasoning config)
- Cost-optimized routing — 20x pricing spread enables 90%+ cost savings through intelligent tiering
- Multi-modal support — harness needs audio I/O, file I/O, embedding, and reranking endpoints, not just text
- Reasoning effort control — new
reasoning_effortparameter must be surfaced in configuration - Provider fallback — models range from 4 to 20 providers; reliability profiles differ significantly
- Self-hosting path — open-weight models (GLM 5.2, Nemotron 3, Kimi K2.7) enable zero-API-cost local deployment