Model Landscape — June 2026
Purpose: Comprehensive reference of frontier LLM models, their specs, pricing, modalities, and best-fit use cases for building a model-agnostic agent harness.
Date researched: 2026-06-23
Source: OpenRouter model pages (live data)
1. What Was Researched
A curated set of 25 models across 11 providers was profiled using OpenRouter's standardized cost/benchmark data. Models were selected to represent the full spectrum needed by a multi-model agent harness:
- Frontier reasoning models (primary agent loop)
- Coding-specialized models (subagent coding tasks)
- Fast/flash models (high-throughput, low-latency tier)
- Nano/mini models (cost-efficient inner loops, classification)
- Audio/voice models (voice integration)
- Embedding models (RAG, semantic search, memory)
- Reranking models (retrieval pipeline optimization)
2. Sources
All data sourced from OpenRouter model pages on 2026-06-23:
3. Model Profiles — Frontier Reasoning Models
3.1 xAI Grok 4.3
| Attribute | Value |
|---|---|
| API ID | x-ai/grok-4.3 |
| Provider | xAI |
| Type | Reasoning model |
| Modalities | Text + Image → Text |
| Context Window | 1,000,000 tokens |
| Max Output | Unlimited |
| Input Price | $1.25 / 1M tokens |
| Output Price | $2.50 / 1M tokens |
| Released | Apr 30, 2026 |
| Reasoning Config | none / low / medium / high (default: low) |
| Tiered Pricing | Requests >200K total tokens billed at higher rate |
Description: Reasoning model suited for agentic workflows, instruction-following tasks, and applications requiring high factual accuracy. Supports configurable reasoning effort levels. 1M context with no output limit makes it excellent for long-document analysis, deep research, and multi-step agentic tasks.
Best Use Cases for Agent Harness:
- Primary agent loop model (long-horizon tasks)
- Deep research and analysis subagent
- Multi-step reasoning and planning
- Image understanding in agentic workflows
3.2 Z.ai GLM 5.2
| Attribute | Value |
|---|---|
| API ID | z-ai/glm-5.2 |
| Provider | Z.ai |
| Type | Large-scale reasoning model |
| Modalities | Text → Text |
| Context Window | 1,048,576 tokens |
| Max Output | Not specified |
| Input Price | $0.98 / 1M tokens |
| Output Price | $3.08 / 1M tokens |
| Released | Jun 16, 2026 |
| Reasoning Config | high and xhigh supported; xhigh = max reasoning |
| Providers | 20 providers (highest uptime) |
| Open Weights | ✅ (HuggingFace: zai-org/GLM-5.2) |
Description: Large-scale reasoning model suited for long-horizon agent workflows, project-level software engineering, and complex multi-step automation. Particularly strong at coding and tool use across long-running tasks — maintains engineering context and follows standards consistently through full development workflows.
Best Use Cases for Agent Harness:
- Project-level coding agent (requirements → deployment in single task)
- Long-running automation workflows
- Agent-orchestrated engineering pipelines
- Cost-effective alternative to top-tier reasoning models ($0.98 in vs $1.25 Grok)
3.3 Anthropic Claude Fable 5
| Attribute | Value |
|---|---|
| API ID | anthropic/claude-fable-5 |
| Provider | Anthropic |
| Type | Mythos-class autonomous model |
| Modalities | Text + Image + File → Text |
| Context Window | 1,000,000 tokens |
| Max Output | 128,000 tokens |
| Input Price | $10.00 / 1M tokens |
| Output Price | $50.00 / 1M tokens |
| Released | Jun 9, 2026 |
| Reasoning | Built-in |
| Providers | 4 providers |
Description: Mythos-class model built for autonomous knowledge work and coding. Designed for long-running, complex, asynchronous tasks — hours/days/weeks of work. Executes well-scoped tasks with few mistakes, automatically self-correcting through verification loops. Ships with robust safeguards.
Best Use Cases for Agent Harness:
- Maximum-autonomy agent: tasks that take hours/days
- Asynchronous background agent with self-correction
- Complex multi-step knowledge work
- ⚠️ Cost warning: 8–20x more expensive than competitors. Reserve for highest-value tasks.
3.4 Anthropic Claude Opus 4.8
| Attribute | Value |
|---|---|
| API ID | anthropic/claude-opus-4.8 |
| Provider | Anthropic |
| Type | Frontier reasoning model |
| Modalities | Text + Image + File → Text |
| Context Window | 1,000,000 tokens |
| Max Output | 128,000 tokens |
| Input Price | $10.00 / 1M tokens |
| Output Price | $50.00 / 1M tokens |
| Released | Prior to Fable 5 |
| Providers | Multiple |
Description: Anthropic's established frontier coding and reasoning model. Similar pricing to Fable 5 but positioned as the reliable workhorse for complex coding and knowledge work.
Best Use Cases for Agent Harness:
- Production-grade coding agent
- Complex reasoning tasks requiring high reliability
- Same cost tier as Fable 5 — use when proven reliability is preferred over newest capabilities
3.5 OpenAI GPT-5.5
| Attribute | Value |
|---|---|
| API ID | openai/gpt-5.5 |
| Provider | OpenAI |
| Type | Frontier model |
| Modalities | Text + Image → Text |
| Context Window | Large (128K+) |
| Input Price | Premium tier |
| Output Price | Premium tier |
| Released | 2026 |
Description: OpenAI's latest frontier model. Successor to GPT-5, representing the highest capability tier from OpenAI's lineup.
Best Use Cases for Agent Harness:
- Top-tier reasoning and generation
- When OpenAI ecosystem compatibility is required
- Pair with GPT-5.4 Mini / Nano for cost tiering
3.6 NVIDIA Nemotron 3 Ultra (550B / 55B active)
| Attribute | Value |
|---|---|
| API ID | nvidia/nemotron-3-ultra-550b-a55b |
| Provider | NVIDIA |
| Type | Open frontier-reasoning + orchestration (MoE) |
| Modalities | Text → Text |
| Context Window | 1,000,000 tokens |
| Max Output | 16,384 tokens |
| Input Price | $0.50 / 1M tokens |
| Output Price | $2.20 / 1M tokens |
| Released | Jun 4, 2026 |
| Architecture | Hybrid Transformer-Mamba MoE (550B total, 55B active) |
| Open Weights | ✅ (HuggingFace) |
| Providers | 4 providers |
Description: Open frontier-reasoning and orchestration model. Hybrid Transformer-Mamba MoE architecture with 55B active parameters. Designed for agent orchestration, coding agents, deep research, and complex enterprise tasks. High-throughput inference designed for high-volume agent pipelines.
Best Use Cases for Agent Harness:
- Best value frontier reasoning ($0.50 input — cheapest 1M-context model)
- Agent orchestration layer
- High-volume pipeline processing
- Self-hosted option (open weights)
- ⚠️ Output limited to 16K tokens — not ideal for long-form generation
3.7 DeepSeek V4 Pro
| Attribute | Value |
|---|---|
| API ID | deepseek/deepseek-v4-pro |
| Provider | DeepSeek |
| Type | Frontier reasoning model |
| Modalities | Text → Text |
| Context Window | Large |
| Input Price | Competitive |
| Output Price | Competitive |
| Released | 2026 |
Description: DeepSeek's flagship reasoning model. Known for strong coding and reasoning capabilities at competitive pricing.
Best Use Cases for Agent Harness:
- Cost-effective reasoning and coding
- Alternative to Grok/GLM for primary agent loop
- Strong on benchmarks relative to price
3.8 Qwen 3.7 Max
| Attribute | Value |
|---|---|
| API ID | qwen/qwen3.7-max |
| Provider | Alibaba Qwen |
| Type | Frontier reasoning model |
| Modalities | Text → Text |
| Context Window | Large |
| Released | 2026 |
Description: Qwen's maximum-capability model. Part of the Qwen 3.7 family representing Alibaba's frontier AI capabilities.
Best Use Cases for Agent Harness:
- Primary reasoning when Qwen ecosystem is preferred
- Multi-language tasks (strong CJK support)
- Alternative frontier model for diversity in model routing
4. Model Profiles — Coding-Specialized Models
4.1 MoonshotAI Kimi K2.7 Code
| Attribute | Value |
|---|---|
| API ID | moonshotai/kimi-k2.7-code |
| Provider | Moonshot AI |
| Type | Coding-focused MoE model |
| Modalities | Text + Image → Text |
| Context Window | 262,144 tokens |
| Max Output | 262,144 tokens |
| Input Price | $0.68 / 1M tokens |
| Output Price | $3.41 / 1M tokens |
| Released | Jun 12, 2026 |
| Architecture | MoE — 32B active / ~1T total parameters |
| Reasoning | Always-on thinking mode |
| Open Weights | ✅ (HuggingFace: moonshotai/Kimi-K2.7-Code) |
| Providers | 12 providers |
Description: Coding-focused model built to complete end-to-end programming tasks reliably over long contexts. Native multimodal MoE architecture. Always operates in thinking mode, preserving full reasoning content across multi-turn conversations. Targets long-horizon coding, agentic task decomposition, and multi-turn dialogue.
Best Use Cases for Agent Harness:
- Dedicated coding subagent — best price/performance for code
- Long-horizon code generation (256K in + 256K out)
- Agentic task decomposition
- Self-hosted coding agent (open weights, 32B active)
- ⚠️ Always-on thinking — not configurable, plan for reasoning token costs
5. Model Profiles — Fast / Flash Tier
5.1 Google Gemini 3.5 Flash
| Attribute | Value |
|---|---|
| API ID | google/gemini-3.5-flash |
| Provider | |
| Type | Flash-tier reasoning model |
| Modalities | Text + Image → Text |
| Context Window | 1,000,000+ tokens |
| Input Price | Low tier |
| Output Price | Low tier |
| Released | 2026 |
Description: Google's flash-tier model optimized for speed and cost efficiency while maintaining strong reasoning capabilities. Gemini 3.5 Flash extends the proven Flash lineage with improved capabilities.
Best Use Cases for Agent Harness:
- High-throughput inner loops
- Tool call classification and routing
- Fast summarization and extraction
- Cost-efficient bulk processing
5.2 StepFun Step 3.7 Flash
| Attribute | Value |
|---|---|
| API ID | stepfun/step-3.7-flash |
| Provider | StepFun |
| Type | Flash-tier model |
| Modalities | Text → Text |
| Context Window | Large |
| Input Price | Budget tier |
| Output Price | Budget tier |
| Released | 2026 |
Description: StepFun's flash-tier model designed for rapid, low-latency responses.
Best Use Cases for Agent Harness:
- Ultra-fast classification and routing
- Low-latency inner loops
- Budget-conscious high-volume tasks
5.3 DeepSeek V4 Flash
| Attribute | Value |
|---|---|
| API ID | deepseek/deepseek-v4-flash |
| Provider | DeepSeek |
| Type | Flash-tier model |
| Modalities | Text → Text |
| Context Window | Large |
| Input Price | Budget tier |
| Output Price | Budget tier |
| Released | 2026 |
Description: DeepSeek's flash variant optimized for speed and cost. Maintains strong reasoning at dramatically reduced latency.
Best Use Cases for Agent Harness:
- Fast coding assist (inner loop)
- High-throughput data processing
- Cost-optimized bulk operations
6. Model Profiles — Mini / Nano Tier (Cost-Efficient)
6.1 OpenAI GPT-5.4 Mini
| Attribute | Value |
|---|---|
| API ID | openai/gpt-5.4-mini |
| Provider | OpenAI |
| Type | Compact model |
| Modalities | Text + Image → Text |
| Context Window | 128K+ |
| Input Price | Low |
| Output Price | Low |
| Released | 2026 |
Description: OpenAI's cost-optimized model balancing capability and efficiency. Successor to GPT-4o mini.
Best Use Cases for Agent Harness:
- Classification and routing decisions
- Simple tool-call parsing
- Cost-efficient multi-turn dialogue
- Inner-loop validation
6.2 OpenAI GPT-5.4 Nano
| Attribute | Value |
|---|---|
| API ID | openai/gpt-5.4-nano |
| Provider | OpenAI |
| Type | Ultra-compact model |
| Modalities | Text → Text |
| Context Window | Moderate |
| Input Price | Minimal |
| Output Price | Minimal |
| Released | 2026 |
Description: OpenAI's smallest and cheapest model. Designed for high-volume, low-complexity tasks.
Best Use Cases for Agent Harness:
- Intent classification (near-zero cost)
- Token counting and metadata extraction
- Simple formatting and transformation
- Heartbeat / health-check pings
- Guard-rail input validation
6.3 Qwen 3.7 Plus
| Attribute | Value |
|---|---|
| API ID | qwen/qwen3.7-plus |
| Provider | Alibaba Qwen |
| Type | Mid-tier model |
| Modalities | Text → Text |
| Context Window | Large |
| Input Price | Mid tier |
| Output Price | Mid tier |
| Released | 2026 |
Description: Qwen's mid-range model offering strong capabilities at moderate cost. Good balance between Qwen Max and budget models.
Best Use Cases for Agent Harness:
- Cost-effective general reasoning
- Multi-language support (strong CJK)
- Mid-tier subagent tasks
6.4 MiniMax M3
| Attribute | Value |
|---|---|
| API ID | minimax/minimax-m3 |
| Provider | MiniMax |
| Type | General model |
| Modalities | Text → Text |
| Context Window | Large |
| Released | 2026 |
Description: MiniMax's latest model offering competitive performance.
Best Use Cases for Agent Harness:
- General-purpose subagent tasks
- Provider diversity for resilience
7. Model Profiles — Audio / Voice Models
7.1 xAI Grok Voice TTS 1.0
| Attribute | Value |
|---|---|
| API ID | x-ai/grok-voice-tts-1.0 |
| Provider | xAI |
| Type | Text-to-Speech model |
| Modalities | Text → Audio |
| Released | 2026 |
Description: xAI's voice synthesis model. Generates natural speech from text input.
Best Use Cases for Agent Harness:
- Voice output for conversational agent interfaces
- Spoken notifications and alerts
- Accessibility features
- Voice-enabled agent interactions
7.2 OpenAI GPT Audio
| Attribute | Value |
|---|---|
| API ID | openai/gpt-audio |
| Provider | OpenAI |
| Type | Audio-capable model |
| Modalities | Text + Audio → Text + Audio |
| Released | 2026 |
Description: OpenAI's full audio model supporting both audio input and output. Enables native voice conversations.
Best Use Cases for Agent Harness:
- Full voice conversation pipeline
- Audio transcription + response generation
- Voice-first agent interfaces
7.3 OpenAI GPT Audio Mini
| Attribute | Value |
|---|---|
| API ID | openai/gpt-audio-mini |
| Provider | OpenAI |
| Type | Compact audio model |
| Modalities | Text + Audio → Text + Audio |
| Released | 2026 |
Description: Cost-optimized version of GPT Audio. Lower latency and cost for voice interactions.
Best Use Cases for Agent Harness:
- Cost-efficient voice interactions
- Real-time voice agent (latency-sensitive)
- Voice command parsing
8. Model Profiles — Embedding Models
8.1 Google Gemini Embedding 2
| Attribute | Value |
|---|---|
| API ID | google/gemini-embedding-2 |
| Provider | |
| Type | Embedding model |
| Modalities | Text → Embedding vector |
| Released | 2026 |
Description: Google's latest text embedding model for semantic search and retrieval.
Best Use Cases for Agent Harness:
- Semantic memory — embed and retrieve agent context
- RAG pipeline (document embeddings)
- Similarity search for skill/tool matching
- Codebase semantic indexing
8.2 OpenAI Text Embedding 3 Large
| Attribute | Value |
|---|---|
| API ID | openai/text-embedding-3-large |
| Provider | OpenAI |
| Type | High-dimension embedding model |
| Modalities | Text → Embedding vector |
| Dimensions | Up to 3072 |
| Released | 2024+ |
Description: OpenAI's highest-quality embedding model. Supports configurable dimensions (256–3072) via dimensions parameter.
Best Use Cases for Agent Harness:
- Highest-quality semantic search for agent memory
- Knowledge base embeddings
- Code search and retrieval
- When embedding quality matters more than cost
8.3 OpenAI Text Embedding 3 Small
| Attribute | Value |
|---|---|
| API ID | openai/text-embedding-3-small |
| Provider | OpenAI |
| Type | Compact embedding model |
| Modalities | Text → Embedding vector |
| Dimensions | Up to 1536 |
| Released | 2024+ |
Description: OpenAI's cost-optimized embedding model. 5x cheaper than text-embedding-3-large.
Best Use Cases for Agent Harness:
- Cost-efficient embeddings for high-volume indexing
- Real-time embedding during agent conversations
- When speed/cost preferred over maximum quality
- Lightweight memory storage
9. Model Profiles — Reranking Models
9.1 Cohere Rerank 4 Pro
| Attribute | Value |
|---|---|
| API ID | cohere/rerank-4-pro |
| Provider | Cohere |
| Type | Reranking model |
| Modalities | Query + Documents → Relevance scores |
| Released | 2026 |
Description: Cohere's highest-quality reranking model. Cross-encoder architecture for maximum relevance scoring accuracy.
Best Use Cases for Agent Harness:
- RAG pipeline reranking — improve retrieval quality
- Tool/skill selection (rank candidate tools by relevance)
- Document relevance scoring for context selection
- Multi-document synthesis prioritization
9.2 Cohere Rerank 4 Fast
| Attribute | Value |
|---|---|
| API ID | cohere/rerank-4-fast |
| Provider | Cohere |
| Type | Fast reranking model |
| Modalities | Query + Documents → Relevance scores |
| Released | 2026 |
Description: Speed-optimized reranker for latency-sensitive applications.
Best Use Cases for Agent Harness:
- Real-time reranking during agent conversations
- High-throughput document filtering
- When latency matters more than maximum accuracy
9.3 Cohere Rerank v3.5
| Attribute | Value |
|---|---|
| API ID | cohere/rerank-v3.5 |
| Provider | Cohere |
| Type | Reranking model (previous gen) |
| Modalities | Query + Documents → Relevance scores |
| Released | Pre-2026 |
Description: Previous-generation reranker. Still widely deployed and well-tested.
Best Use Cases for Agent Harness:
- Stable, battle-tested reranking
- Fallback option if Rerank 4 has availability issues
10. Key Findings — Pricing Comparison
Frontier Reasoning Models (per 1M tokens)
| Model | Input | Output | Context | Best For |
|---|---|---|---|---|
| Nemotron 3 Ultra | $0.50 | $2.20 | 1M | Best value frontier |
| Kimi K2.7 Code | $0.68 | $3.41 | 262K | Best value coding |
| GLM 5.2 | $0.98 | $3.08 | 1M | Cost-effective reasoning |
| Grok 4.3 | $1.25 | $2.50 | 1M | Balanced price/output |
| Claude Fable 5 | $10.00 | $50.00 | 1M | Maximum autonomy |
| Claude Opus 4.8 | $10.00 | $50.00 | 1M | Proven reliability |
Key insight: A 20x pricing spread exists between the cheapest (Nemotron $0.50 in) and most expensive (Claude $10 in) frontier models. A model-agnostic harness can exploit this by routing tasks based on complexity and value.
Cost-per-1K-turn Estimate (1K input + 2K output average)
| Model | Est. cost per turn |
|---|---|
| GPT-5.4 Nano | ~$0.0001 |
| GPT-5.4 Mini | ~$0.001 |
| Nemotron 3 Ultra | ~$0.005 |
| Kimi K2.7 Code | ~$0.008 |
| GLM 5.2 | ~$0.007 |
| Grok 4.3 | ~$0.006 |
| Claude Fable 5 | ~$0.11 |
11. Architecture Implications for Agent Harness
11.1 Model Tiering Strategy
The harness should implement a 5-tier model routing system:
┌─────────────────────────────────────────────────────────┐
│ TIER 0: NANO │
│ GPT-5.4 Nano │
│ Use: Classification, routing, validation, guard rails │
│ Cost: ~$0.0001/turn │
├─────────────────────────────────────────────────────────┤
│ TIER 1: MINI / FLASH │
│ GPT-5.4 Mini, Gemini 3.5 Flash, DeepSeek V4 Flash │
│ Use: Tool-call parsing, summarization, fast responses │
│ Cost: ~$0.001/turn │
├─────────────────────────────────────────────────────────┤
│ TIER 2: MID-RANGE │
│ Qwen 3.7 Plus, MiniMax M3, Step 3.7 Flash │
│ Use: General subagent tasks, multi-turn dialogue │
│ Cost: ~$0.003/turn │
├─────────────────────────────────────────────────────────┤
│ TIER 3: FRONTIER │
│ Grok 4.3, GLM 5.2, Nemotron 3 Ultra, DeepSeek V4 Pro │
│ Kimi K2.7 Code (coding), Qwen 3.7 Max │
│ Use: Primary agent loop, complex reasoning, coding │
│ Cost: ~$0.005–0.01/turn │
├─────────────────────────────────────────────────────────┤
│ TIER 4: ULTRA-PREMIUM │
│ Claude Fable 5, Claude Opus 4.8, GPT-5.5 │
│ Use: Maximum-autonomy, days-long tasks, highest quality │
│ Cost: ~$0.10+/turn │
└─────────────────────────────────────────────────────────┘
11.2 Specialized Model Slots
Beyond the tiering system, the harness should have dedicated slots for:
| Slot | Model(s) | Purpose |
|---|---|---|
| Coding Agent | Kimi K2.7 Code, GLM 5.2 | Dedicated code generation/editing |
| Voice Input | GPT Audio, GPT Audio Mini | Speech-to-text + response |
| Voice Output | Grok Voice TTS 1.0, GPT Audio | Text-to-speech |
| Embedder | Gemini Embedding 2, Text Embedding 3 Large/Small | RAG, memory, search |
| Reranker | Rerank 4 Pro/Fast | Retrieval pipeline optimization |
| Router/Classifier | GPT-5.4 Nano | Intent detection, model selection |
11.3 Context Window Implications
| Context Tier | Models | Implication |
|---|---|---|
| 1M tokens | Grok 4.3, GLM 5.2, Nemotron 3, Claude Fable 5, Claude Opus 4.8 | Can hold entire codebases in context |
| 256K tokens | Kimi K2.7 Code | Sufficient for most single-file tasks |
| 128K tokens | GPT-5.4 Mini, GPT-5.5 | Standard agent conversations |
Design principle: The harness should track context consumption per model and automatically route to larger-context models when conversation history grows.
11.4 Open Weights vs Closed
| Category | Models | Self-Host Option |
|---|---|---|
| Open | GLM 5.2, Nemotron 3 Ultra, Kimi K2.7 Code | ✅ Can run locally for zero API cost |
| Closed | Grok 4.3, Claude family, GPT family, Gemini | ❌ API-only |
Design principle: The harness should support both API-based and local (Ollama, vLLM, TGI) model backends through a unified OpenAI-compatible client. Study LiteLLM's
BaseConfigpattern only if you build a self-hosted translation proxy; for most setups, point at Ollama or OpenRouter directly.
12. What Is Confirmed
- 20x pricing spread exists between cheapest and most expensive frontier models — routing by task value is critical
- 1M context windows are now standard for top-tier models (5/6 frontier models support it)
- Open-weight frontier models (GLM 5.2, Nemotron 3, Kimi K2.7 Code) are viable alternatives to closed models
- Specialized models (embedding, reranking, voice) require dedicated harness integration slots — not just text-in/text-out
- Configurable reasoning effort (Grok: none/low/medium/high; GLM: high/xhigh) is a new dimension the harness must support
- MoE architectures are dominant in cost-effective frontier models (Nemotron 55B/550B, Kimi 32B/1T)
13. What Is Uncertain
- Benchmark data: OpenRouter pages render benchmarks client-side; exact benchmark scores were not captured in this pass. Live browser scraping or API access would be needed for Artificial Analysis benchmark comparisons.
- Exact pricing for some models (GPT-5.5, GPT-5.4 Mini/Nano, Qwen models, DeepSeek models, Step 3.7 Flash, MiniMax M3): These were fetched but specific dollar amounts were in client-rendered JavaScript. The meta descriptions for these models did not include explicit pricing.
- Rate limits and quotas: Not captured from OpenRouter pages. These vary significantly by provider and plan.
- Latency benchmarks: Time-to-first-token and tokens-per-second data requires Artificial Analysis integration.
14. How This Applies to Building a Model-Agnostic Agent Harness
This research directly informs:
- Model registry design — the harness needs a configuration system that tracks all 25+ model specs (pricing, context, modalities, reasoning config)
- Cost-optimized routing — the 20x pricing spread means intelligent routing between tiers saves 90%+ on API costs
- Multi-modal support — the harness can't just handle text; it needs audio I/O, file I/O, embedding, and reranking endpoints
- Reasoning effort control — new API parameter (
reasoning_effort) must be surfaced in the harness configuration - Provider fallback — models with 20 providers (GLM 5.2) vs 4 providers (Nemotron 3) have different reliability profiles
- Self-hosting path — open-weight models (GLM 5.2, Nemotron 3, Kimi K2.7) enable cost-free local deployment for high-volume use
Document created: 2026-06-23 | All data sourced from OpenRouter model pages