three agents, one shape

Agent loops, compared.

Claude Code, carlos, and OpenAI Codex all run the same inner primitive: stream a model, parse text plus tool calls, dispatch tools under a safety gate, feed results back, repeat. Where they differ is in the machinery around that loop: how they remember, how they ask for permission, who watches them when you're not. Hover a step in any column to see the equivalent step light up in the other two.

phases:

Claude Code

interactive harness

USER PROMPT

terminal input, file refs via @, piped data

load context

CLAUDE.md + auto-memory + skill descriptions + system prompt

model turn

three phases blend: gather, act, verify

tool_use

read · edit · bash · grep · glob · fetch · spawn subagent

final text

stop_reason ≠ tool_use → return to user

permission gate

default / auto-accept / plan / auto with classifier

hooks fire

PreToolUse / PostToolUse / Stop · scripts can block or rewrite

checkpoint

file snapshot before each edit · esc-esc to rewind

result → context

appended as user-role tool_result message

next iteration

context fills

auto-compaction · summarize history · keep recent code + decisions

carlos

event-sourced agent

USER PROMPT

chat TUI, /research, carlos please headless, scheduled cron

load context

AGENTS.md + CLAUDE.md walk + .claude/skills + .agents/skills

EvtUserMessage → eventlog

SQLite WAL, source of truth · projection cache derives

drain steering

supervisor-injected nudges at iteration boundary

budget gate

tokens / cents / wall-clock · ErrBudgetExceeded → clean halt

provider.Stream

canonical event shape across anthropic / openai / openrouter / gemini / ollama

tool_use

read · write · edit · bash (PTY) · grep · git · web_fetch · plan

end_turn

EvtAssistantMessage → eventlog · return

approver.Approve

AutoApprover · stdin · in-chat overlay (y/N/Always)

reg.Execute

tools resolve paths against worktree BaseDir if --worktree

EvtToolCall + EvtToolResult

manage TUI roster updates in real time via Subscribe

next iteration (max 25)

/compact (manual)

summarizer rebuilds history · EvtSessionReset + synthetic recap

Codex

sandbox-first ReAct

USER PROMPT

codex TUI, codex exec for CI, ChatGPT handoff

load context

AGENTS.md (global → project → subtree) + profile config

session.jsonl

~/.codex/sessions/ · /resume restores a saved conversation

/plan mode?

user-toggled · proposes strategy before any edits

model turn

GPT-5 · o-series · configurable via profile

tool_use

shell · apply_patch · web_search · image_input · MCP

final text

model emits no tool call → return

approval × sandbox

policy: untrusted / on-failure / on-request / never

os sandbox exec

seatbelt (mac) · landlock+seccomp (linux) · wsl (win)

result → context

jsonl append · child process exit code surfaced

next iteration

/compact (manual)

summarize the visible conversation to free tokens

Where they converge.

The inner loop is the primitive all three projects land on: stream the model, parse text plus tool calls, dispatch tools, feed results back, terminate when the model stops asking for tools. All three expose the same handful of file / shell / search tools as the coding-agent floor. All three keep an instructions file (CLAUDE.md, AGENTS.md) as conversation-start context. All three ship a gate between "model proposes a tool call" and "tool actually runs." All three support subagents that get a fresh context window.

That shared spine is not a coincidence; it's the shape the task forces. An LLM with tools, iterating against environmental feedback, until either the model is satisfied or a safety rail trips. The interesting question is what each one builds around the spine.

Where they diverge.

Categories with cross-cutting machinery: what the loop touches outside the model call. Hover a row to scan it.

concern	claude code	carlos	codex
source of truth	JSONL transcripts under `~/.claude/projects/`; projection rebuilt from log on resume	SQLite WAL eventlog under `~/.carlos/state.db`; every action is an event; projection is derived	JSONL sessions under `~/.codex/sessions/`; `/resume` restores a named conversation
runtime	Node.js + TypeScript	pure Go, no CGO; single static binary under 30 MB	Rust (`codex-rs`, 96% Rust); fast cold start, single static binary
provider surface	Anthropic only (Claude models)	anthropic, openai, openrouter, ollama, plus a Go CPU backend; one canonical event shape	OpenAI models (GPT, o-series) via profile config; `config.toml` selects model + reasoning level
safety gate	permission modes (default / auto-accept / plan / auto with classifier); `/permissions` allowlist	pluggable `Approver` interface (AutoApprover / stdin / in-chat y/N/Always overlay); session-level Always cache per tool	orthogonal matrix: approval policy × sandbox mode. Approval: untrusted / on-failure / on-request / never. Sandbox: read-only / workspace-write / danger-full
OS sandbox layer	none surfaced; relies on the underlying OS + user permissions	opt-in `--worktree` git-worktree isolation; tools resolve paths against the worktree base dir	first-class: macOS Seatbelt via `sandbox-exec`, Linux `bwrap` + seccomp, Windows WSL2; child processes spawned inside the box
plan / preview / apply	plan mode is a read-only inspection step; user approves plan, then implementation runs in default mode	`PlanTool` queues a worktree diff as an approval-queue artifact; `apply_handler` ff-merges on accept, discards on reject	`/plan` switches to plan mode; proposes a strategy before any edit. No diff-as-artifact concept; the conversation carries the plan
context compaction	auto-triggered when context fills; `/compact <focus>` for targeted; per-message `/rewind`	manual `/compact` only; uses `memory.Summarizer` (Naive or LLM); emits `EvtSessionReset` + a synthetic recap	manual `/compact` only; summarizes the visible conversation to free tokens. No auto-trigger
budget enforcement	no per-loop budget gate; user observes spend via the status bar	`Budget` + `Tracker` per scope (run + subtree, rolls up); gate fires before each provider call; clean halt classified as graceful end	no per-loop budget gate; user observes spend via the status line and provider dashboard
steering mid-flight	type a correction during a tool call; running tool finishes, then the correction is read before next step	supervisor-owned steering channel; messages drain at the iteration boundary (after collectAssistant, before next provider call)	type into the input field; the in-flight turn is interruptible, next step reads the new input
sub-agents	`Task` tool spawns a subagent with its own context + restricted tools; returns a summary; configured via `.claude/agents/` markdown	`Supervisor.Spawn → SubAgent` with 11-state lifecycle; restart breaker (3-in-60s); per-agent heartbeat ticker; manage TUI roster surfaces every live sub-agent	subagent support framed as parallelization; spawn-and-summarize pattern; less elaborate lifecycle than carlos
liveness detection	none surfaced; a crashed Claude Code is gone with its session	5s heartbeat emit + 10s sweep; stale agents (heartbeat > 2× interval) transition to `orphaned` at next startup via `Recover`	none in the local CLI; the Cloud agent surface monitors its own runs
scheduled / autonomous runs	non-interactive `claude -p` for CI/cron, but no daemon	`carlos daemon` with UDS IPC + launchd / systemd unit; `/schedule add "every weekday at 9am ..."`; daemon-fired runs spawn sub-agents	`codex exec` for non-interactive CI; Codex Cloud hosts long-running web-agent tasks separate from the local CLI
cloud handoff	none	none	distinguishing feature: kick off a Cloud task from the local editor, monitor progress, apply resulting diffs back to your workspace
skill induction	user-authored only (`.claude/skills/SKILL.md`); model surfaces them on relevance match	propose-don't-publish: inducer watches usage, judge scores cross-provider, replay-eval scores with-vs-without on historical transcripts; skill lands in the approval queue, never auto-publishes	`SKILL.md` packaged workflows invoked via `/skills` or implicit description match; MCP dependencies declared via `agents/openai.yaml`. No auto-induction
research mode	no first-class research orchestrator; user chains tools manually or installs MCP servers	decompose → web_search → web_fetch → read → synthesize → verify pipeline with citation tracking (stable sN/pN IDs); `/research` spawns as sub-agent	web_search is a built-in tool; no orchestrated pipeline. ChatGPT integration covers deep-research at a different layer
extensibility	PreToolUse / PostToolUse / Stop hooks; MCP server protocol for arbitrary external tools; plugin marketplace	in-process via `tools.Registry`, `Approver`, `memory.Summarizer`, verifier `Dispatcher`; MCP client (stdio servers, per-frame gating) ships built-in via `internal/mcp`	MCP for third-party tools; Skills bundle scripts + prompts + MCP deps as a unit. No shell-hook system in the Claude Code sense

The headline difference.

Claude Code is an interactive harness. The loop is tight, the user is in the tight feedback loop, and the architecture is shaped around getting out of the way of one person at a keyboard. Hooks, plan mode, checkpoints: extension points that assume you're there to type.

carlos is an event-sourced agent. The inner loop is the same shape, but every step writes to a durable log, every sub-agent has a lifecycle the supervisor manages, and the surface widens: scheduled runs via the daemon, autonomous research with citation grounding, propose-don't-publish skill induction. More machinery, paid for by the property that carlos still works when you've gone to dinner.

Codex is a sandbox-first ReAct loop. The architectural commitment is OS-level isolation as a peer of approval, not a fallback. Approval policy and sandbox mode are independent axes; the default Auto preset means workspace-write + on-request. Pair that with the Cloud handoff and Codex is the only one of the three that treats "this might run somewhere I'm not" as a first-class case at the protocol level.

None of them is universally better. The shared spine is the floor; the rest is product positioning, and the right one for any given week is the one whose machinery matches what the work needs.

Sources: Claude Code "How Claude Code works" and "Best practices" from Anthropic's docs; Codex from OpenAI Codex CLI docs + the openai/codex repo (read June 2026); carlos's loop is internal/agent/loop.go in this repo.