◤ THE CLAW INDEX
VOL. 02 · MAY 2026 · FIELD COMPARISON ED. · AGENT OS
Seven contenders. One Operator. A field map of the 2026 agent stack.

The Agent OS Stack.

A Field Comparison
Tools vs. Tooling vs. Operating Systems

Every six months the open-source agent landscape rearranges itself. OpenClaw goes viral. Hermes ships self-evolution. Antigravity arrives from Mountain View. Codex renames itself again. Claude Code keeps absorbing protocols. This is the map — and where our own ecosystem fits inside it.

/01 Why this exists
None of these tools are the operating system. They are processes. The operating system is the one you build around them — the doctrine, the gates, the seven files, the four layers.
/02 ─ The Contenders

Seven processes, one Operator.

Below: seven full-row contenders, an eight-cell supporting cast of editor-grade tools, and the bespoke ecosystem they all run inside. The last entry gets a layer diagram because it is the only one that owns the others.

01

OpenClaw.

Community fork · 430,000 LOC TypeScript · Node.js 18+ · MIT

The viral one. 270,000+ GitHub stars in 84 days. A personal AI agent in your messenger — Telegram, WhatsApp, Discord, twenty more channels — backed by 700+ skills and a full computer-use surface. Palo Alto Networks flagged it as a “security nightmare”: documented CVEs and malicious skills found in ClawHub. Always run with Docker isolation. The original — and the trigger that forked half the open-source agent landscape onto every class of hardware imaginable (see Vol. 01).

Edge

  • Largest plugin marketplace anywhere
  • Messaging-native (22+ channels)
  • Most complete feature set
  • MIT — fork-friendly by design

Trade

  • Documented CVEs · audit-hostile
  • 1.5 GB RAM minimum
  • TypeScript monolith — slow to start
  • ClawHub skills are not curated
02

thClaws.

thClaws org · github.com/thClaws/thClaws · Rust · Apache 2.0 · 1,043 stars / 143 forks

Native-Rust open-source agent harness. Multi-provider, runs entirely on your own machine, sovereignty-by-design — the manifesto on the homepage at thclaws.ai. Created April 20, 2026; pushed daily through May. Where OpenClaw is a 430,000-line TypeScript monolith, thClaws is the opposite bet: a single Rust binary, allowlist-first command execution, a much smaller attack surface to audit. Operator forked it on April 25 as a study mirror — interesting enough to track, not yet enough to build on.
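
What allowlist-first execution means in practice can be sketched in a few lines. This is an illustration in Python, not thClaws code (thClaws is Rust). The `ALLOWLIST` table and the `is_allowed` helper are hypothetical names, and the policy shown (deny by default, reject shell metacharacters) is an assumption about what such a gate would typically enforce.

```python
import shlex

# Hypothetical allowlist: program -> permitted subcommands (None = any arguments).
ALLOWLIST = {
    "git": {"status", "diff", "log"},
    "ls": None,
    "cat": None,
}


def is_allowed(command: str) -> bool:
    """Deny by default: return True only for commands the allowlist names."""
    # Reject shell metacharacters outright, even inside quotes (conservative).
    if any(ch in command for ch in ";|&$`><"):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: reject rather than guess
    if not argv:
        return False
    program, args = argv[0], argv[1:]
    if program not in ALLOWLIST:
        return False
    subcommands = ALLOWLIST[program]
    if subcommands is None:
        return True
    # Programs with a subcommand set must name one of them first.
    return bool(args) and args[0] in subcommands
```

With this table, `is_allowed("git status")` passes while `is_allowed("git push origin main")` and `is_allowed("rm -rf /")` do not. The audit argument is that the whole policy is one small table a reviewer can read in a minute.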

Edge

  • Native Rust · single binary · audit-friendly
  • Multi-provider routing baked in
  • Sovereignty-by-design positioning
  • Apache 2.0 · fork without friction

Trade

  • ~5 weeks old · spec moving fast
  • Rust expertise needed to extend
  • Smaller plugin surface than OpenClaw
  • Closer to a runtime than to an OS · overlaps Aider's niche
03

Hermes Agent.

NousResearch · MIT · ICLR 2026 Oral · Python · Reflexion pattern

A research-grade agent designed to think about itself. The 3-tier memory architecture — Procedural (skills), Semantic (MEMORY.md + USER.md), Episodic (transcripts + FTS5) — is the cleanest formal model anyone has shipped. The self-evolution variant adds GEPA: five constitutional constraint gates the agent must pass before committing. We adopted the pattern and the terminology in our §16 Self-Improvement Protocol — without the AGPL-transitive DSPy/GEPA Python dependencies.
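
The three tiers are easier to reason about as a data structure. The sketch below is a minimal model of the idea, not Hermes code: `ThreeTierMemory` and its methods are hypothetical names, and a plain substring match stands in for the FTS5 index so the example stays dependency-free.

```python
from dataclasses import dataclass, field


@dataclass
class ThreeTierMemory:
    """Hypothetical container for the three tiers named above."""
    procedural: dict = field(default_factory=dict)  # skill name -> instructions
    semantic: dict = field(default_factory=dict)    # file name -> text, e.g. "MEMORY.md"
    episodic: list = field(default_factory=list)    # transcript lines, oldest first

    def log(self, line: str) -> None:
        self.episodic.append(line)

    def search_episodes(self, query: str, limit: int = 3) -> list:
        # Naive case-insensitive substring match, standing in for FTS5.
        hits = [line for line in self.episodic if query.lower() in line.lower()]
        return hits[-limit:]  # keep the most recent matches

    def build_context(self, skill: str, query: str) -> str:
        """Assemble a prompt context: skill first, then facts, then episodes."""
        parts = []
        if skill in self.procedural:
            parts.append(f"[skill:{skill}] {self.procedural[skill]}")
        for name, text in self.semantic.items():
            parts.append(f"[{name}] {text}")
        for line in self.search_episodes(query):
            parts.append(f"[episode] {line}")
        return "\n".join(parts)
```

The design point is the separation: skills change rarely, semantic files change per user, and transcripts grow without bound, so only the episodic tier needs search.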

Edge

  • 3-tier memory as first-class design
  • Self-evolving constitutional gates (GEPA)
  • MIT-licensed core · ICLR-backed research
  • Reflexion loop is the right primitive

Trade

  • AGPL transitive deps (DSPy / GEPA)
  • RL training pipeline is heavy
  • Research-grade · not turnkey
  • Runtime install needs sandboxing
04

Letta / MemGPT.

Letta Labs · UC Berkeley research lineage · Apache 2.0 · Python · memory-first

Memory as a first-class citizen. Letta (formerly MemGPT) builds the agent around persistent memory, not the other way around — agent-managed core memory, archival memory, and recall buffer with explicit eviction policy. Parallels Hermes' 3-tier model from a different angle: where Hermes is research-pure (Procedural / Semantic / Episodic), Letta is production-pragmatic (Core / Archival / Recall). Apache-licensed, runnable locally, hostable as a service. If our §16 Reflexion pattern is the algorithm, Letta is one of the cleanest implementations of the underlying storage primitives.
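
A toy version of the Core / Archival / Recall split shows what an explicit eviction policy looks like. This is a sketch under stated assumptions, not Letta's API: the `TieredMemory` class is hypothetical, and oldest-first eviction is chosen here only because it is the simplest policy to read. In Letta the agent manages its own core memory.

```python
from collections import deque


class TieredMemory:
    """Hypothetical three-store memory with oldest-first eviction."""

    def __init__(self, core_limit: int = 3, recall_limit: int = 5):
        self.core_limit = core_limit
        self.core = []                            # always included in the prompt
        self.archival = []                        # evicted facts, still searchable
        self.recall = deque(maxlen=recall_limit)  # recent messages only

    def add_fact(self, fact: str) -> None:
        self.core.append(fact)
        # Eviction policy (an assumption): the oldest core fact moves to archival.
        while len(self.core) > self.core_limit:
            self.archival.append(self.core.pop(0))

    def log_message(self, message: str) -> None:
        self.recall.append(message)  # deque drops the oldest entry when full

    def search_archival(self, query: str) -> list:
        return [f for f in self.archival if query.lower() in f.lower()]
```

Nothing is deleted when core memory overflows; it only becomes more expensive to reach, which is the property a memory-first design depends on.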

Edge

  • Memory architecture as the product
  • Apache 2.0 · self-hostable
  • Berkeley research lineage · benchmarked
  • Clean storage primitives we can borrow

Trade

  • Narrower than Hermes — just memory, not full agent
  • Python service · ops overhead
  • Smaller community than OpenClaw / Hermes
  • Still iterating on tool-calling spec
05

Codex.

OpenAI · proprietary · CLI · GPT-5 / o-series

OpenAI's terminal coding agent. Reads AGENTS.md, the same stateful-file pattern Anthropic uses for CLAUDE.md and Google uses for GEMINI.md under a different name. Strong on code reasoning, multimodal, runs entirely against OpenAI's hosted models. A smaller skill ecosystem than Claude Code; no MCP yet. Best when you're already paying for ChatGPT Pro and want CLI access to the same brain.

Edge

  • OpenAI ecosystem leverage
  • Strong code reasoning · multimodal native
  • Simple AGENTS.md spec
  • Fast iteration on CLI

Trade

  • Vendor lock-in to OpenAI
  • No MCP integration
  • Smaller plugin surface
  • ChatGPT Pro effectively required
06

Antigravity.

Google · proprietary · agentic IDE · Gemini 3 · browser-MCP native

Google's bet on the agentic IDE. Gemini-powered, browser-native MCP integration, multi-agent orchestration baked into the editor itself. Excellent for workflows where you want the agent to read live DOM, click through pages, and ship the code in one motion. Vendor-locked to Gemini — but if you're already inside the Google ecosystem the integration is hard to beat. New, fast-moving, still re-shipping monthly.

Edge

  • Browser-MCP native
  • Gemini long-context windows
  • Multi-agent orchestration built-in
  • Google-scale infra behind it

Trade

  • Google ecosystem lock-in
  • Proprietary editor
  • Gemini-only LLM (no Claude / GPT)
  • Spec still in motion · 2026 product
07

Claude Code.

Anthropic · proprietary · CLI + IDE + Desktop + Web · 1M context · MCP-native

Anthropic's official engineering CLI. 1M token context window. MCP protocol for tool integration. A real skill marketplace. Sub-agents for parallel work. Hooks. Worktrees. Available across CLI, VS Code, JetBrains, desktop apps (Mac and Windows), and claude.ai/code. Stateful via CLAUDE.md + MEMORY.md + skills + hooks. The most polished multi-tool surface we evaluate against — and the one Vol. 01 was authored on.

Edge

  • 1M token context window
  • MCP protocol leadership · large skill marketplace
  • CLI + IDE + Desktop + Web — every surface
  • Stateful by design (CLAUDE.md + hooks)

Trade

  • Anthropic vendor concentration risk
  • Paid plan effectively required for serious use
  • Less browser-native than Antigravity
  • Billing model still moving
08

The supporting cast.

Aider · Cursor · Cline · Gemini CLI · Windsurf · Devin · Open Interpreter · Manus

Each one solves one slice well. None of them stack into a 4-layer OS by themselves. Listed here because the right Operator OS uses them — it doesn't replace them.

Aider.
Python · Apache · terminal

git-native pair programmer. Opens a chat in your terminal, edits files, commits the diff. Cleanest LLM-coding-agent primitive on the market. Best for solo developers who live in git.

Cursor.
Cursor Inc. · proprietary · IDE

VS Code fork with Composer. Tight IDE experience, the Composer multi-file agent, fast autocomplete. Best for developers who want the agent baked into the editor surface and will pay for a plan to get it.

Cline.
Apache · VS Code extension · MCP-friendly

Open agent inside VS Code. Reads files, runs commands, supports MCP servers. Lighter than Cursor, more transparent. Best when you want Cursor-style capability without the proprietary stack.

Gemini CLI.
Google · proprietary · terminal

Google's terminal answer. Reads GEMINI.md (Gemini's counterpart to CLAUDE.md). Free tier exists. Useful as a parallel runtime in our framework-agnostic doctrine: same constitution, different brain.

Windsurf.
Codeium · proprietary · agentic IDE

Codeium's answer to Cursor. Cascade agent built into the editor. Multi-model, not Gemini-locked the way Antigravity is. Best when you want Cursor-class capability without committing to Anthropic or Google.

Devin.
Cognition Labs · proprietary · autonomous SWE

The autonomous benchmark. Plans, codes, browses, ships — without an operator in the loop. $500/mo seat. The thing people cite when arguing whether agents replace engineers. Useful as a yardstick, not a workflow.

Open Interpreter.
MIT · Python · terminal computer-use

Local code-execution agent. Runs Python, Shell, JS on your machine in a sandbox. MIT, scriptable, no vendor. The simplest rent-an-agent primitive you can host yourself — and the cleanest one to wrap in an audit gate.

Manus.
Monica · proprietary · browser-first

Autonomous web agent that went viral in early 2025. Plans tasks, drives a real browser, files reports. Spec opaque, performance variable. Watch for the architecture pattern; do not depend on the platform.

09

The Operator OS.

Phatchara Wipataphan · 2025– · doctrine-driven · framework-agnostic

Not a tool — an operating system for using all of them. A 4-layer doctrine that loads identically into Claude Code (CLAUDE.md), Codex (AGENTS.md), and Gemini CLI (GEMINI.md). Same constitution, three runtimes. Layer 1 holds the rules and the audit gates. Layer 2 routes every request through Strategy E (privacy → task → escalate) across Ollama, LiteLLM, and OpenRouter. Layer 3 runs the 24/7 intelligence pipeline. Layer 4 holds the specialized engines. Reflexion (§16) and Multi-Tool Verification (§16.7) keep the doctrine alive across sessions.
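
The Strategy E order (privacy → task → escalate) can be sketched as a small pure function. Everything beyond that order is an assumption made for illustration: the `Request` fields, the `LOCAL_TASKS` set, and the mapping of each stage to Ollama, LiteLLM, or OpenRouter are hypothetical, not the production routing table.

```python
from dataclasses import dataclass

# Hypothetical set of tasks considered cheap enough to stay on a local model.
LOCAL_TASKS = {"summarize", "classify", "extract"}


@dataclass
class Request:
    prompt: str
    contains_private_data: bool
    task: str
    previous_attempt_failed: bool = False


def route(req: Request) -> str:
    """Return a backend name following privacy -> task -> escalate."""
    # 1. Privacy: private data never leaves the machine, whatever the task.
    if req.contains_private_data:
        return "ollama"
    # 2. Task: pick the cheapest tier assumed able to handle the task.
    tier = "ollama" if req.task in LOCAL_TASKS else "litellm"
    # 3. Escalate: a failed attempt moves the request up exactly one tier.
    if req.previous_attempt_failed:
        tier = "litellm" if tier == "ollama" else "openrouter"
    return tier
```

Putting privacy first makes it a hard constraint rather than a preference: a failed local attempt on private data is retried locally, never escalated.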

The 4-Layer Stack — schematic
L1 | Sandbox | CLAUDE / AGENTS / GEMINI · audits · MRIC scope · install gates
L2 | Company Hub | l2-hub :8100 · LiteLLM :4000 · 14 role agents · Strategy E routing · Redis :6380
L3 | SBOS | Solopreneur Foundation :8200 · 24/7 intel pipeline · NotebookLM grounded RAG
L4 | Engines | TMAO · HESS · AKASHA · Synapse · PlatformIO · VISTEC (each 7-file stateful)

Edge

  • Framework-agnostic — one constitution, three runtimes
  • 4-layer doctrine + MRIC scope boundaries
  • 7-file stateful protocol per engine
  • Self-improving via §16 GEPA loop
  • LOCAL-first routing · sovereign data
  • Install gates + audit log + Multi-Tool Verification

Trade

  • Bespoke — no external community
  • Requires operator discipline
  • Doctrine onboarding is non-trivial
  • Maintenance overhead lives on you
  • Multi-PC operational complexity
3 · Runtimes loading the same constitution
4 · Architectural layers (L1 · L2 · L3 · L4)
7 · Stateful files per engine
540 · Lines of doctrine at v2.6
/03 ─ Decision

Pick your process.

Two honest paths, indexed by what you actually optimize for. The third, the Operator OS, is what happens when you stop optimizing for any single tool.

If you want leverage.
→ Claude Code or Antigravity. Vendor-grade integration, MCP, multi-surface. Pay the plan, ship faster, accept the lock-in tradeoff.

If you want openness.
→ OpenClaw, Hermes, or Aider. MIT or Apache, fork-friendly, community-driven. Trade polish and security posture for sovereignty over the source.

If you want the operating system.
→ Build the Operator OS around them. The constitution is yours. The runtimes are interchangeable. Every tool in this table becomes a process inside Layer 2.
/04 ─ Side-by-Side

Cross-tool matrix.

Sixteen rows. Seven dimensions that actually move a buying decision. The protagonist sits at the bottom: it's the only row that owns the others.

Project | Vendor | License | Memory | Tools | Cost | Best For
OpenClaw | Community | MIT | Session + skills | 700+ skills | $0 + API | Messaging-native personal agents
thClaws | thClaws org | Apache 2.0 | Per-binary session | Rust harness · allowlist | $0 + provider | Sovereign Rust agent harness
Hermes Agent | NousResearch | MIT | 3-tier P/S/E | Skills + Honcho | $/tok + GPU | Self-evolving research
Letta / MemGPT | Letta Labs | Apache 2.0 | Core / Archival / Recall | Tool-calling spec | $/tok · self-host free | Memory-first persistent agent
Codex | OpenAI | Proprietary | AGENTS.md + session | Built-in | Plan + $/tok | OpenAI-ecosystem coding
Antigravity | Google | Proprietary | Project + GEMINI.md | Browser-MCP + IDE | $/tok | Gemini browser-driven coding
Claude Code | Anthropic | Proprietary | CLAUDE.md + MCP + hooks | MCP + skills + sub-agents | Plan + $/tok | Multi-tool engineering execution
Aider | Community | Apache | Git history | Git-native | $/tok | Terminal pair coding
Cursor | Cursor Inc. | Proprietary | Project | Composer | Plan | IDE-first agentic coding
Cline | Cline | Apache | Project | MCP | $/tok | Open VS Code agent
Gemini CLI | Google | Proprietary | GEMINI.md + session | Built-in | $/tok · free tier | Gemini terminal access
Windsurf | Codeium | Proprietary | Project | Cascade agent | Plan | Multi-model agentic IDE
Devin | Cognition Labs | Proprietary | Session + planning | Browser + IDE + plan | $500/mo seat | Autonomous SWE benchmark
Open Interpreter | Community | MIT | Session | Code execution | $/tok · self-host | Terminal computer-use · audit-friendly
Manus | Monica | Proprietary | Opaque | Browser-native | $/run | Autonomous web agent · pattern study
Operator OS ★ | Phatchara | Own doctrine | 7-file + Reflexion | MCP + skills + SBOS + L2 | $5–$30/mo | Sovereign multi-engine operator
/05 ─ Reality check

Build it, or settle.

Every operator who builds a stack like this eventually hits the same fork: keep building, or admit it's a design that lives on paper forever and adopt the off-the-shelf alternatives above. This is the honest scorecard.

Layer | Component | Status | What's actually true
L1 | Sandbox doctrine | Shipped | v2.6 · 540 lines · CLAUDE.md + AGENTS.md + GEMINI.md mirror verbatim across 3 runtimes.
L2 | Company Hub | Shipped | l2-hub :8100 · LiteLLM :4000 · Strategy E routing live · Redis :6380 · C.1 Phase 0 just landed.
L3 | SBOS intel pipeline | Shipped | Port 8200 · 24/7 harvest → score → store · NotebookLM grounded RAG behind it.
L4 | PlatformIO firmware fleet | Shipped | HybridPack · SpotWelder · JumpStarter · Zircon UPS · all PoC firmware running on real hardware.
L4 | Synapse (lab knowledge) | Deploying | Phase 6 · Cloudflare Tunnel api.synapse.inlife.dev → :8500 · LINE OA handoff pending.
L4 | AKASHA (knowledge ops) | Partial | Shadow on 2060 · 7-file scaffold complete · production cutover not yet decided.
L4 | HESS (energy storage) | Partial | Design docs + LiC pack PoC + firmware on bench · CEST pilot deployment outstanding.
L4 | TMAO (trading) | Blocked | Paper-trading only · WR 22% < 70% gate · live capital frozen until D21 root cause cleared.
L4 | VISTEC (research) | Academic | Manuscript revision flow active · journal-output engine · revenue not the goal here.
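
The L1 claim that the three doctrine files mirror verbatim is cheap to verify mechanically. The sketch below compares SHA-256 digests. The helper names `drifted` and `load`, the single-directory layout, and the choice of CLAUDE.md as the reference copy are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

RUNTIME_FILES = ("CLAUDE.md", "AGENTS.md", "GEMINI.md")


def drifted(contents: dict) -> list:
    """Return the file names whose bytes differ from the first file listed."""
    digests = {
        name: hashlib.sha256(data).hexdigest() for name, data in contents.items()
    }
    reference = digests[RUNTIME_FILES[0]]  # CLAUDE.md is the reference (assumption)
    return [name for name in RUNTIME_FILES if digests[name] != reference]


def load(root: Path) -> dict:
    """Read the three runtime files from one directory (layout is an assumption)."""
    return {name: (root / name).read_bytes() for name in RUNTIME_FILES}
```

Calling `drifted(load(Path(".")))` returns an empty list when the mirror holds, which makes it a natural pre-commit check. If CLAUDE.md is the copy that changed, the other two are the ones reported.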
/05.2 — The honest fork

Should we keep building, or settle for off-the-shelf?

Build it — reasons that survive scrutiny

  • L1, L2, and L3 ship, and 4 of 6 L4 engines run in some form (one shipped, one deploying, two partial). We are past napkin.
  • No alternative on this page is framework-agnostic across Claude Code, Codex, Gemini CLI at the same time.
  • Operator's domains (trading · battery · firmware · research · marketplace) span vendor silos no single tool covers.
  • The doctrine pays for itself every session — onboarding cost on new tools dropped from days to a /clear.
  • Anthropic concentration risk (50% vendor cap) requires our own routing layer regardless.
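
The 50% vendor cap in the last bullet is the kind of rule a routing layer can check on its own. A minimal sketch, assuming the cap is measured as each vendor's share of calls over some window; the source does not say whether the real metric is calls, tokens, or spend, and the function names are hypothetical.

```python
from collections import Counter

VENDOR_CAP = 0.50  # the 50% cap named above


def vendor_shares(calls: list) -> dict:
    """Map each vendor to its fraction of the logged calls."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {vendor: n / total for vendor, n in counts.items()} if total else {}


def over_cap(calls: list, cap: float = VENDOR_CAP) -> list:
    """Return vendors whose share strictly exceeds the cap, sorted by name."""
    return sorted(v for v, share in vendor_shares(calls).items() if share > cap)
```

For a log of six Anthropic calls, three OpenAI calls, and one local call, `over_cap` returns `["anthropic"]`; an even five-to-five split sits exactly at the cap and returns an empty list.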

Settle — admit the failure modes

  • Zero community. Bus factor = 1. Every bug, every upgrade, every rewrite lives on operator.
  • OpenClaw has 270K GitHub stars today. Operator OS has one user. That gap is not a feature.
  • Four of the last five sessions before #64 shipped pure spec; design drift outpaced ship velocity.
  • TMAO blocked on a quality gate the doctrine didn't prevent. The doctrine isn't load-bearing yet.
  • Some L4 engines (HESS, AKASHA) could honestly run on Notion + Aider with 10% of the doctrine.
The fork is not "build or quit" — it's "ship or rewrite." Half the L4 layer is real software with users (the operator counts). The other half is doctrine looking for a workload. Settle on the engines where the off-the-shelf delta is <20%. Build only where vendor-locked tools would actively damage the work — TMAO (sovereign data), HESS (industrial IP), L2 routing (compliance), L1 doctrine (the moat).
/05.3 — Kicks it from design-forever to shippable
  1. One outside user. Even paid, even small. The single biggest forcing function on the table.
  2. TMAO gate cleared (D21 root cause + WR ≥ 70% on paper for 30 sessions) — that's the engine that validates the doctrine, not the doctrine itself.
  3. C.1 cost ceiling holds for 3 months ($30/mo hard cap). Operating cost predictability is what makes the OS thesis defensible.
  4. Synapse Phase 6 live with a real customer querying api.synapse.inlife.dev — proves the cross-layer architecture, not just the boxes.
  5. Doctrine drift score < 10% over a quarter (§16.5 metric) — proves the self-improvement loop closes.
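
The five criteria above are all measurable, so they can be written down as one check. This is a sketch under stated assumptions: the `Snapshot` fields are hypothetical, the win-rate gate is read as "each of the last 30 paper sessions at or above 70%" (the list could also mean an aggregate over 30 sessions), and the cost ceiling is read as "each of the last 3 months at or below $30".

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical status record; field names are illustrative only."""
    outside_users: int
    d21_root_cause_cleared: bool
    paper_win_rates: list       # one value per paper session, most recent last
    monthly_costs_usd: list     # one value per month, most recent last
    synapse_customer_live: bool
    doctrine_drift: float       # fraction, e.g. 0.08 for 8%


def criteria(s: Snapshot) -> dict:
    """Evaluate each of the five exit criteria independently."""
    last30 = s.paper_win_rates[-30:]
    last3 = s.monthly_costs_usd[-3:]
    return {
        "outside_user": s.outside_users >= 1,
        "tmao_gate": s.d21_root_cause_cleared
        and len(last30) == 30
        and all(wr >= 0.70 for wr in last30),
        "cost_ceiling": len(last3) == 3 and all(c <= 30 for c in last3),
        "synapse_live": s.synapse_customer_live,
        "drift": s.doctrine_drift < 0.10,
    }


def shippable(s: Snapshot) -> bool:
    return all(criteria(s).values())
```

Returning the per-criterion dictionary rather than a single boolean keeps the scorecard honest: it shows which gate is still closed, not just that one is.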