Seven contenders. One Operator. A field map of the 2026 agent stack.

The Agent OS Stack.

A Field Comparison
Tools vs. Tooling vs. Operating Systems

Every six months the open-source agent landscape rearranges itself. OpenClaw goes viral. Hermes ships self-evolution. Antigravity arrives from Mountain View. Codex renames itself again. Claude Code keeps absorbing protocols. This is the map — and where our own ecosystem fits inside it.

/01 Why this exists

None of these tools are the operating system. They are processes. The operating system is the one you build around them — the doctrine, the gates, the seven files, the four layers.

OpenClaw 270K★ / thClaws Rust · sovereign / Hermes 3-tier memory / Letta memory-first / Codex AGENTS.md / Antigravity browser-MCP / Claude Code 1M ctx / Aider git-native / Cursor IDE-first / Cline VS Code / Gemini CLI terminal / Windsurf Cascade / Devin autonomous SWE / Open Interpreter computer-use / Manus web agent / Operator OS 4-layer

/02 ─ The Contenders

Seven processes, one Operator.

Below: seven full-row contenders, an eight-cell supporting cast of editor-grade tools, and the bespoke ecosystem they all run inside. The last entry gets a layer diagram because it is the only one that owns the others.

OpenClaw.

Community fork · 430,000 LOC TypeScript · Node.js 18+ · MIT

The viral one. 270,000+ GitHub stars in 84 days. A personal AI agent in your messenger — Telegram, WhatsApp, Discord, twenty more channels — backed by 700+ skills and a full computer-use surface. Palo Alto Networks flagged it as a “security nightmare”: documented CVEs and malicious skills found in ClawHub. Always run with Docker isolation. The original — and the trigger that forked half the open-source agent landscape onto every class of hardware imaginable (see Vol. 01).

Edge

Largest plugin marketplace anywhere
Messaging-native (22+ channels)
Most complete feature set
MIT — fork-friendly by design

Trade

Documented CVEs · audit-hostile
1.5 GB RAM minimum
TypeScript monolith — slow to start
ClawHub skills are not curated

thClaws.

thClaws org · github.com/thClaws/thClaws · Rust · Apache 2.0 · 1,043 stars / 143 forks

Native-Rust open-source agent harness. Multi-provider, runs entirely on your own machine, sovereignty-by-design — the manifesto on the homepage at thclaws.ai. Created April 20, 2026; pushed daily through May. Where OpenClaw is a 430,000-line TypeScript monolith, thClaws is the opposite bet: a single Rust binary, allowlist-first command execution, a much smaller attack surface to audit. Operator forked it on April 25 as a study mirror — interesting enough to track, not yet enough to build on.

Edge

Native Rust · single binary · audit-friendly
Multi-provider routing baked in
Sovereignty-by-design positioning
Apache 2.0 · fork without friction

Trade

~5 weeks old · spec moving fast
Rust expertise needed to extend
Smaller plugin surface than OpenClaw
Closer-to-runtime than to OS — overlaps Aider's niche
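
What "allowlist-first command execution" means in practice can be sketched in a few lines. This is an illustration of the general technique, not thClaws' Rust implementation: the ALLOWLIST table, the is_allowed function, and the set of refused shell characters are all assumptions made for the example.

```python
import shlex

# Hypothetical allowlist: program name -> permitted subcommands.
# None means any arguments are accepted for that program.
ALLOWLIST = {
    "git": {"status", "diff", "log"},
    "ls": None,
    "cargo": {"build", "test"},
}

# Characters that let one command chain, redirect, or substitute another.
SHELL_METACHARACTERS = ";|&`$><"


def is_allowed(command: str) -> bool:
    """Return True only if the command matches the allowlist; deny everything else."""
    # Refuse shell metacharacters outright so an approved program cannot chain another.
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: refuse rather than guess
    if not argv:
        return False
    program, args = argv[0], argv[1:]
    if program not in ALLOWLIST:
        return False
    allowed_subcommands = ALLOWLIST[program]
    if allowed_subcommands is None:
        return True
    return bool(args) and args[0] in allowed_subcommands
```

The default answer is "no": a command runs only when it matches an entry someone wrote down on purpose, which is what keeps the audit surface small.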

Hermes Agent.

NousResearch · MIT · ICLR 2026 Oral · Python · Reflexion pattern

A research-grade agent designed to think about itself. The 3-tier memory architecture — Procedural (skills), Semantic (MEMORY.md + USER.md), Episodic (transcripts + FTS5) — is the cleanest formal model anyone has shipped. The self-evolution variant adds GEPA: five constitutional constraint gates the agent must pass before committing. We adopted the pattern and the terminology in our §16 Self-Improvement Protocol — without the AGPL-transitive DSPy/GEPA Python dependencies.

Edge

3-tier memory as first-class design
Self-evolving constitutional gates (GEPA)
MIT-licensed core · ICLR-backed research
Reflexion loop is the right primitive

Trade

AGPL transitive deps (DSPy / GEPA)
RL training pipeline is heavy
Research-grade · not turnkey
Runtime install needs sandboxing
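
The three tiers map onto a small data structure. The sketch below is ours, not Hermes code: the ThreeTierMemory class and its method names are hypothetical, and the episodic search is a naive substring match swapped in where the real design uses FTS5 full-text search.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeTierMemory:
    # Procedural: named skills the agent can invoke (skill name -> description).
    procedural: dict[str, str] = field(default_factory=dict)
    # Semantic: durable facts, standing in for the MEMORY.md and USER.md files.
    semantic: dict[str, list[str]] = field(
        default_factory=lambda: {"MEMORY.md": [], "USER.md": []}
    )
    # Episodic: raw transcripts, appended in order.
    episodic: list[str] = field(default_factory=list)

    def learn_skill(self, name: str, description: str) -> None:
        self.procedural[name] = description

    def remember_fact(self, file: str, fact: str) -> None:
        # Raises KeyError for any file other than MEMORY.md or USER.md.
        self.semantic[file].append(fact)

    def log_turn(self, transcript: str) -> None:
        self.episodic.append(transcript)

    def search_episodes(self, query: str) -> list[str]:
        # Naive case-insensitive substring match; a stand-in for full-text search.
        needle = query.lower()
        return [t for t in self.episodic if needle in t.lower()]
```

Keeping the tiers as separate fields is the point of the model: skills, facts, and transcripts age and get queried differently, so they should not share one bucket.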

Letta / MemGPT.

Letta Labs · UC Berkeley research lineage · Apache 2.0 · Python · memory-first

Memory as a first-class citizen. Letta (formerly MemGPT) builds the agent around persistent memory, not the other way around — agent-managed core memory, archival memory, and recall buffer with explicit eviction policy. Parallels Hermes' 3-tier model from a different angle: where Hermes is research-pure (Procedural / Semantic / Episodic), Letta is production-pragmatic (Core / Archival / Recall). Apache-licensed, runnable locally, hostable as a service. If our §16 Reflexion pattern is the algorithm, Letta is one of the cleanest implementations of the underlying storage primitives.

Edge

Memory architecture as the product
Apache 2.0 · self-hostable
Berkeley research lineage · benchmarked
Clean storage primitives we can borrow

Trade

Narrower than Hermes — just memory, not full agent
Python service · ops overhead
Smaller community than OpenClaw / Hermes
Still iterating on tool-calling spec
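
The Core / Archival / Recall split with an explicit eviction policy can be sketched as follows. This is not Letta's API: the TieredMemory class, its limits, and the oldest-first eviction rule are assumptions chosen to show the storage primitive.

```python
from collections import deque


class TieredMemory:
    """Core memory with a fixed budget; overflow is evicted to archival storage."""

    def __init__(self, core_limit: int = 3, recall_limit: int = 5):
        self.core_limit = core_limit
        self.core: list[str] = []       # always present in the prompt
        self.archival: list[str] = []   # long-term store, searched on demand
        # Recent messages; the deque silently drops the oldest when full.
        self.recall: deque[str] = deque(maxlen=recall_limit)

    def write_core(self, item: str) -> None:
        self.core.append(item)
        # Assumed eviction policy: oldest-first, moved to archival rather than deleted.
        while len(self.core) > self.core_limit:
            self.archival.append(self.core.pop(0))

    def record_message(self, message: str) -> None:
        self.recall.append(message)

    def search_archival(self, query: str) -> list[str]:
        needle = query.lower()
        return [item for item in self.archival if needle in item.lower()]
```

Eviction moves data down a tier instead of discarding it, so the prompt stays within budget while nothing the agent wrote is lost.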

Codex.

OpenAI · proprietary · CLI · GPT-5 / o-series

OpenAI's terminal coding agent. Reads AGENTS.md, the same stateful-file pattern Anthropic uses for CLAUDE.md and Google uses for GEMINI.md, just under a different name. Strong on code reasoning, multimodal, runs entirely against OpenAI's hosted models. A smaller skill ecosystem than Claude Code; no MCP yet. Best when you're already paying for ChatGPT Pro and want CLI access to the same brain.

Edge

OpenAI ecosystem leverage
Strong code reasoning · multimodal native
Simple AGENTS.md spec
Fast iteration on CLI

Trade

Vendor lock-in to OpenAI
No MCP integration
Smaller plugin surface
ChatGPT Pro effectively required

Antigravity.

Google · proprietary · agentic IDE · Gemini 3 · browser-MCP native

Google's bet on the agentic IDE. Gemini-powered, browser-native MCP integration, multi-agent orchestration baked into the editor itself. Excellent for workflows where you want the agent to read live DOM, click through pages, and ship the code in one motion. Vendor-locked to Gemini — but if you're already inside the Google ecosystem the integration is hard to beat. New, fast-moving, still re-shipping monthly.

Edge

Browser-MCP native
Gemini long-context windows
Multi-agent orchestration built-in
Google-scale infra behind it

Trade

Google ecosystem lock-in
Proprietary editor
Gemini-only LLM (no Claude / GPT)
Spec still in motion · 2026 product

Claude Code.

Anthropic · proprietary · CLI + IDE + Desktop + Web · 1M context · MCP-native

Anthropic's official engineering CLI. 1M token context window. MCP protocol for tool integration. A real skill marketplace. Sub-agents for parallel work. Hooks. Worktrees. Available across CLI, VS Code, JetBrains, desktop apps (Mac and Windows), and claude.ai/code. Stateful via CLAUDE.md + MEMORY.md + skills + hooks. The most polished multi-tool surface we evaluate against — and the one Vol. 01 was authored on.

Edge

1M token context window
MCP protocol leadership · large skill marketplace
CLI + IDE + Desktop + Web — every surface
Stateful by design (CLAUDE.md + hooks)

Trade

Anthropic vendor concentration risk
Paid plan effectively required for serious use
Less browser-native than Antigravity
Billing model still moving

The supporting cast.

Aider · Cursor · Cline · Gemini CLI · Windsurf · Devin · Open Interpreter · Manus

Each one solves one slice well. None of them stack into a 4-layer OS by themselves. Listed here because the right Operator OS uses them — it doesn't replace them.

Aider.

Python · Apache · terminal

Git-native pair programmer. Opens a chat in your terminal, edits files, commits the diff. The cleanest LLM coding-agent primitive on the market. Best for solo developers who live in git.

Cursor.

Cursor Inc. · proprietary · IDE

VS Code fork with Composer. Tight IDE experience, the Composer multi-file agent, fast autocomplete. Best for developers who want the agent baked into the editor surface and will pay for a plan to get it.

Cline.

Apache · VS Code extension · MCP-friendly

Open agent inside VS Code. Reads files, runs commands, supports MCP servers. Lighter than Cursor, more transparent. Best when you want Cursor-style capability without the proprietary stack.

Gemini CLI.

Google · proprietary · terminal

Google's terminal answer. Reads GEMINI.md (the same stateful-file pattern as CLAUDE.md). A free tier exists. Useful as a parallel runtime in our framework-agnostic doctrine: same constitution, different brain.

Windsurf.

Codeium · proprietary · agentic IDE

Codeium's answer to Cursor. Cascade agent built into the editor. Multi-model, not Gemini-locked the way Antigravity is. Best when you want Cursor-class capability without committing to Anthropic or Google.

Devin.

Cognition Labs · proprietary · autonomous SWE

The autonomous benchmark. Plans, codes, browses, ships — without an operator in the loop. $500/mo seat. The thing people cite when arguing whether agents replace engineers. Useful as a yardstick, not a workflow.

Open Interpreter.

MIT · Python · terminal computer-use

Local code-execution agent. Runs Python, Shell, JS on your machine in a sandbox. MIT, scriptable, no vendor. The simplest rent-an-agent primitive you can host yourself — and the cleanest one to wrap in an audit gate.

Manus.

Monica · proprietary · browser-first

Autonomous web agent that went viral in early 2025. Plans tasks, drives a real browser, files reports. Spec opaque, performance variable. Watch for the architecture pattern; do not depend on the platform.

The Operator OS.

Phatchara Wipataphan · 2025– · doctrine-driven · framework-agnostic

Not a tool — an operating system for using all of them. A 4-layer doctrine that loads identically into Claude Code (CLAUDE.md), Codex (AGENTS.md), and Gemini CLI (GEMINI.md). Same constitution, three runtimes. Layer 1 holds the rules and the audit gates. Layer 2 routes every request through Strategy E (privacy → task → escalate) across Ollama, LiteLLM, and OpenRouter. Layer 3 runs the 24/7 intelligence pipeline. Layer 4 holds the specialized engines. Reflexion (§16) and Multi-Tool Verification (§16.7) keep the doctrine alive across sessions.
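
A minimal sketch of the Strategy E order (privacy, then task, then escalate). The tier order, the LOCAL_TASKS labels, and the failed_attempts counter are assumptions made for illustration; the live l2-hub router may decide differently.

```python
from dataclasses import dataclass

# Assumed escalation order: local model first, then the gateway, then hosted routing.
TIERS = ("ollama", "litellm", "openrouter")

# Hypothetical task labels assumed to be within reach of a local model.
LOCAL_TASKS = {"chat", "summarize"}


@dataclass
class Request:
    prompt: str
    contains_private_data: bool
    task: str
    failed_attempts: int = 0  # how many tiers have already failed on this request


def route(req: Request) -> str:
    # 1. Privacy: private data never leaves the machine, whatever else is true.
    if req.contains_private_data:
        return "ollama"
    # 2. Task: simple work starts local, everything else starts at the gateway.
    start = 0 if req.task in LOCAL_TASKS else 1
    # 3. Escalate: each failure moves one tier up, capped at the last tier.
    return TIERS[min(start + req.failed_attempts, len(TIERS) - 1)]
```

The privacy check comes first and returns early, so no amount of escalation can push sensitive input off the local machine.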

The 4-Layer Stack — schematic

L1 Sandbox CLAUDE / AGENTS / GEMINI · audits · MRIC scope · install gates

L2 Company Hub l2-hub :8100 · LiteLLM :4000 · 14 role agents · Strategy E routing · Redis :6380

L3 SBOS Solopreneur Foundation :8200 · 24/7 intel pipeline · NotebookLM grounded RAG

L4 Engines TMAO · HESS · AKASHA · Synapse · PlatformIO · VISTEC — each 7-file stateful
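
Because L1 depends on the same doctrine loading into all three runtimes, a cheap audit is to confirm the three files are byte-identical. The sketch below assumes they sit side by side in one directory; the function names are ours, not part of any runtime.

```python
import hashlib
from pathlib import Path

# The three doctrine files named in the L1 row.
RUNTIME_FILES = ("CLAUDE.md", "AGENTS.md", "GEMINI.md")


def constitution_digests(root: Path) -> dict[str, str]:
    """Return the SHA-256 digest of each runtime's doctrine file under root."""
    return {
        name: hashlib.sha256((root / name).read_bytes()).hexdigest()
        for name in RUNTIME_FILES
    }


def mirrors_match(root: Path) -> bool:
    """True when all three doctrine files are byte-identical."""
    return len(set(constitution_digests(root).values())) == 1
```

Run as a pre-commit or session-start check, this catches the case where one runtime's copy was edited and the other two were not.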

Edge

Framework-agnostic — one constitution, three runtimes
4-layer doctrine + MRIC scope boundaries
7-file stateful protocol per engine
Self-improving via §16 GEPA loop
LOCAL-first routing · sovereign data
Install gates + audit log + Multi-Tool Verification

Trade

Bespoke — no external community
Requires operator discipline
Doctrine onboarding is non-trivial
Maintenance overhead lives on you
Multi-PC operational complexity

3 runtimes loading the same constitution

4 architectural layers · L1 · L2 · L3 · L4

7 stateful files per engine

540 lines of doctrine at v2.6

/03 ─ Decision

Pick your process.

Three honest paths, indexed by what you actually optimize for. The third, the Operator OS, is what happens when you stop optimizing for any single tool.

If you want leverage.

→ Claude Code or Antigravity. Vendor-grade integration, MCP, multi-surface. Pay the plan, ship faster, accept the lock-in tradeoff.

If you want openness.

→ OpenClaw, Hermes, or Aider. MIT or Apache, fork-friendly, community-driven. Trade polish and security posture for sovereignty over the source.

If you want the operating system.

→ Build the Operator OS around them. The constitution is yours. The runtimes are interchangeable. Every tool in this table becomes a process inside Layer 2.

/04 ─ Side-by-Side

Cross-tool matrix.

Sixteen rows: fifteen tools plus the Operator OS. Seven dimensions that actually move a buying decision. The protagonist sits at the bottom because it is the only row that owns the others.

Project	Vendor	License	Memory	Tools	Cost	Best For
OpenClaw	Community	MIT	Session + skills	700+ skills	$0 + API	Messaging-native personal agents
thClaws	thClaws org	Apache 2.0	Per-binary session	Rust harness · allowlist	$0 + provider	Sovereign Rust agent harness
Hermes Agent	NousResearch	MIT	3-tier P/S/E	Skills + Honcho	$/tok + GPU	Self-evolving research
Letta / MemGPT	Letta Labs	Apache 2.0	Core / Archival / Recall	Tool-calling spec	$/tok · self-host free	Memory-first persistent agent
Codex	OpenAI	Proprietary	AGENTS.md + session	Built-in	Plan + $/tok	OpenAI-ecosystem coding
Antigravity	Google	Proprietary	Project + GEMINI.md	Browser-MCP + IDE	$/tok	Gemini browser-driven coding
Claude Code	Anthropic	Proprietary	CLAUDE.md + MCP + hooks	MCP + skills + sub-agents	Plan + $/tok	Multi-tool engineering execution
Aider	Community	Apache 2.0	Git history	Git-native	$/tok	Terminal pair coding
Cursor	Cursor Inc.	Proprietary	Project	Composer	Plan	IDE-first agentic coding
Cline	Cline	Apache 2.0	Project	MCP	$/tok	Open VS Code agent
Gemini CLI	Google	Proprietary	GEMINI.md + session	Built-in	$/tok · free tier	Gemini terminal access
Windsurf	Codeium	Proprietary	Project	Cascade agent	Plan	Multi-model agentic IDE
Devin	Cognition Labs	Proprietary	Session + planning	Browser + IDE + plan	$500/mo seat	Autonomous SWE benchmark
Open Interpreter	Community	MIT	Session	Code execution	$/tok · self-host	Terminal computer-use · audit-friendly
Manus	Monica	Proprietary	Opaque	Browser-native	$/run	Autonomous web agent · pattern study
Operator OS ★	Phatchara	Own doctrine	7-file + Reflexion	MCP + skills + SBOS + L2	$5–$30/mo	Sovereign multi-engine operator

/05 ─ Reality check

Build it, or settle.

Every operator who builds a stack like this eventually hits the same fork: keep building, or admit it's a design that lives on paper forever and adopt the off-the-shelf alternatives below. This is the honest scorecard.

Layer	Component	Status	What's actually true
L1	Sandbox doctrine	Shipped	v2.6 · 540 lines · CLAUDE.md + AGENTS.md + GEMINI.md mirror verbatim across 3 runtimes.
L2	Company Hub	Shipped	l2-hub :8100 · LiteLLM :4000 · Strategy E routing live · Redis :6380 · C.1 Phase 0 just landed.
L3	SBOS intel pipeline	Shipped	Port 8200 · 24/7 harvest → score → store · NotebookLM grounded RAG behind it.
L4	PlatformIO firmware fleet	Shipped	HybridPack · SpotWelder · JumpStarter · Zircon UPS · all PoC firmware running on real hardware.
L4	Synapse: lab knowledge	Deploying	Phase 6 · Cloudflare Tunnel api.synapse.inlife.dev → :8500 · LINE OA handoff pending.
L4	AKASHA: knowledge ops	Partial	Shadow on 2060 · 7-file scaffold complete · production cutover not yet decided.
L4	HESS: energy storage	Partial	Design docs + LiC pack PoC + firmware on bench · CEST pilot deployment outstanding.
L4	TMAO: trading	Blocked	Paper-trading only · WR 22% < 70% gate · live capital frozen until D21 root cause cleared.
L4	VISTEC: research	Academic	Manuscript revision flow active · journal-output engine · revenue not the goal here.

/05.2 — The honest fork

Should we keep building, or settle for off-the-shelf?

Build it — reasons that survive scrutiny

L1, L2, L3 + 4 of 6 L4 engines already ship. We are past napkin.
No alternative on this page is framework-agnostic across Claude Code, Codex, Gemini CLI at the same time.
Operator's domains (trading · battery · firmware · research · marketplace) span vendor silos no single tool covers.
The doctrine pays for itself every session — onboarding cost on new tools dropped from days to a /clear.
Anthropic concentration risk (50% vendor cap) requires our own routing layer regardless.
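
The 50% vendor cap is easy to check mechanically. The sketch below assumes the cap is measured as a share of monthly spend; the doctrine may measure it differently (by tokens or by requests), and the function names are ours.

```python
def vendor_shares(spend_by_vendor: dict[str, float]) -> dict[str, float]:
    """Return each vendor's fraction of total spend (all zeros if nothing was spent)."""
    total = sum(spend_by_vendor.values())
    if total == 0:
        return {vendor: 0.0 for vendor in spend_by_vendor}
    return {vendor: amount / total for vendor, amount in spend_by_vendor.items()}


def over_cap(spend_by_vendor: dict[str, float], cap: float = 0.50) -> list[str]:
    """List the vendors whose share of spend strictly exceeds the cap."""
    shares = vendor_shares(spend_by_vendor)
    return sorted(vendor for vendor, share in shares.items() if share > cap)
```

A non-empty result from over_cap is the signal that the routing layer should shift load to another provider.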

Settle — admit the failure modes

Zero community. Bus factor = 1. Every bug, every upgrade, every rewrite lives on the operator.
OpenClaw has 270K GitHub stars today. Operator OS has one user. That gap is not a feature.
4 of the last 5 sessions before #64 shipped pure spec: design drift outpaced ship velocity.
TMAO blocked on a quality gate the doctrine didn't prevent. The doctrine isn't load-bearing yet.
Some L4 engines (HESS, AKASHA) could honestly run on Notion + Aider with 10% of the doctrine.

The fork is not "build or quit" — it's "ship or rewrite." Half the L4 layer is real software with users (the operator counts). The other half is doctrine looking for a workload. Settle on the engines where the off-the-shelf delta is <20%. Build only where vendor-locked tools would actively damage the work — TMAO (sovereign data), HESS (industrial IP), L2 routing (compliance), L1 doctrine (the moat).

/05.3 — Kicks it from design-forever to shippable

One outside user. Even paid, even small. The single biggest forcing function on the table.
TMAO gate cleared (D21 root cause + WR ≥ 70% on paper for 30 sessions) — that's the engine that validates the doctrine, not the doctrine itself.
C.1 cost ceiling holds for 3 months ($30/mo hard cap). Operating cost predictability is what makes the OS thesis defensible.
Synapse Phase 6 live with a real customer querying api.synapse.inlife.dev — proves the cross-layer architecture, not just the boxes.
Doctrine drift score < 10% over a quarter (§16.5 metric) — proves the self-improvement loop closes.
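
The five criteria above can be written down as one checklist function, which makes "shippable" a computed answer rather than a mood. The Status fields are hypothetical names, and two readings are assumed: the win-rate gate must hold in each of the last 30 paper sessions, and the cost cap must hold in each of the last 3 months.

```python
from dataclasses import dataclass


@dataclass
class Status:
    outside_users: int
    d21_root_cause_cleared: bool
    paper_win_rates: list[float]    # one win rate per paper session, 0.0 to 1.0
    monthly_costs_usd: list[float]  # most recent month last
    synapse_has_real_customer: bool
    drift_score: float              # fraction, measured over the last quarter


def exit_criteria(s: Status) -> dict[str, bool]:
    """Evaluate each design-forever-to-shippable criterion independently."""
    last_30 = s.paper_win_rates[-30:]
    last_3 = s.monthly_costs_usd[-3:]
    return {
        "outside_user": s.outside_users >= 1,
        # Assumed reading: every one of the last 30 sessions meets the 70% gate.
        "tmao_gate": s.d21_root_cause_cleared
        and len(last_30) == 30
        and all(rate >= 0.70 for rate in last_30),
        # Assumed reading: $30/mo hard cap held in each of the last 3 months.
        "cost_ceiling": len(last_3) == 3 and all(cost <= 30.0 for cost in last_3),
        "synapse_live": s.synapse_has_real_customer,
        "drift": s.drift_score < 0.10,
    }
```

Returning one flag per criterion rather than a single boolean keeps the scorecard honest: it shows exactly which gate is still closed.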