carlos
three agents, one shape

Agent loops, compared.

Claude Code, carlos, and OpenAI Codex all run the same inner primitive: stream a model, parse text plus tool calls, dispatch tools under a safety gate, feed results back, repeat. They differ in the machinery around that loop: how they remember, how they ask for permission, and who watches them when you're not there.


Claude Code

interactive harness
USER PROMPT
  terminal input, file refs via @, piped data
load context
  CLAUDE.md + auto-memory + skill descriptions + system prompt
model turn
  three phases blend: gather, act, verify
tool_use
  read · edit · bash · grep · glob · fetch · spawn subagent
or
final text
  stop_reason ≠ tool_use → return to user
permission gate
  default / auto-accept / plan / auto with classifier
hooks fire
  PreToolUse / PostToolUse / Stop · scripts can block or rewrite
checkpoint
  file snapshot before each edit · esc-esc to rewind
result → context
  appended as user-role tool_result message
next iteration
context fills
  auto-compaction · summarize history · keep recent code + decisions
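
The "result → context" and "final text" steps can be sketched as plain data. This is a minimal sketch, not Claude Code's internals: the JSON field names follow the tool_result content block of Anthropic's Messages API, while the Go types and the helpers toolResultMessage and shouldContinue are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ContentBlock is one element of a message's content array.
// Field names mirror the Messages API tool_result block.
type ContentBlock struct {
	Type      string `json:"type"`
	ToolUseID string `json:"tool_use_id,omitempty"`
	Content   string `json:"content,omitempty"`
}

// Message is a single conversation turn.
type Message struct {
	Role    string         `json:"role"`
	Content []ContentBlock `json:"content"`
}

// toolResultMessage (hypothetical helper) wraps a tool's output as a
// user-role message that points back at the tool_use it answers.
func toolResultMessage(toolUseID, output string) Message {
	return Message{
		Role: "user",
		Content: []ContentBlock{
			{Type: "tool_result", ToolUseID: toolUseID, Content: output},
		},
	}
}

// shouldContinue (hypothetical helper) reports whether the loop runs
// another iteration: only when the model stopped to ask for a tool.
func shouldContinue(stopReason string) bool {
	return stopReason == "tool_use"
}

func main() {
	msg := toolResultMessage("toolu_123", "3 files changed")
	encoded, err := json.Marshal(msg)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(encoded))
	fmt.Println("continue after end_turn?", shouldContinue("end_turn"))
}
```

The point of the shape is that the tool result re-enters the conversation as ordinary user-role input, so the next model turn needs no special channel to see it.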

carlos

event-sourced agent
USER PROMPT
  chat TUI, /research, carlos please headless, scheduled cron
load context
  AGENTS.md + CLAUDE.md walk + .claude/skills + .agents/skills
EvtUserMessage → eventlog
  SQLite WAL, source of truth · projection cache derives
drain steering
  supervisor-injected nudges at iteration boundary
budget gate
  tokens / cents / wall-clock · ErrBudgetExceeded → clean halt
provider.Stream
  canonical event shape across anthropic / openai / openrouter / gemini / ollama
tool_use
  read · write · edit · bash (PTY) · grep · git · web_fetch · plan
or
end_turn
  EvtAssistantMessage → eventlog · return
approver.Approve
  AutoApprover · stdin · in-chat overlay (y/N/Always)
reg.Execute
  tools resolve paths against worktree BaseDir if --worktree
EvtToolCall + EvtToolResult
  manage TUI roster updates in real time via Subscribe
next iteration (max 25)
/compact (manual)
  summarizer rebuilds history · EvtSessionReset + synthetic recap
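
The budget gate and the 25-iteration cap can be sketched together. This is an illustration under stated assumptions, not the code in internal/agent/loop.go: only the names ErrBudgetExceeded, Budget, and Tracker come from this page, and their fields, the "zero means no limit" rule, and the run helper are invented here.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrBudgetExceeded is named in the diagram; this definition is assumed.
var ErrBudgetExceeded = errors.New("budget exceeded")

// maxIterations mirrors the "max 25" cap in the diagram.
const maxIterations = 25

// Budget holds hard limits. A zero field means "no limit" (an assumption).
type Budget struct {
	MaxTokens int
	MaxCents  int
	MaxWall   time.Duration
}

// Tracker accumulates spend for one run.
type Tracker struct {
	Tokens  int
	Cents   int
	Elapsed time.Duration
}

// Check returns an error wrapping ErrBudgetExceeded once any limit is reached.
func (b Budget) Check(t Tracker) error {
	switch {
	case b.MaxTokens > 0 && t.Tokens >= b.MaxTokens:
		return fmt.Errorf("tokens %d of %d: %w", t.Tokens, b.MaxTokens, ErrBudgetExceeded)
	case b.MaxCents > 0 && t.Cents >= b.MaxCents:
		return fmt.Errorf("cents %d of %d: %w", t.Cents, b.MaxCents, ErrBudgetExceeded)
	case b.MaxWall > 0 && t.Elapsed >= b.MaxWall:
		return fmt.Errorf("wall-clock %s of %s: %w", t.Elapsed, b.MaxWall, ErrBudgetExceeded)
	}
	return nil
}

// run gates every provider call on the budget. It returns how many provider
// calls were made and whether the run ended in a clean budget halt.
func run(b Budget, t *Tracker, providerCall func(*Tracker) bool) (calls int, halted bool) {
	for calls < maxIterations {
		if err := b.Check(*t); errors.Is(err, ErrBudgetExceeded) {
			return calls, true // clean halt, not a crash
		}
		calls++
		if providerCall(t) {
			break // model finished on its own
		}
	}
	return calls, false
}

func main() {
	tracker := &Tracker{}
	// Each fake provider call spends 40 tokens and never finishes.
	calls, halted := run(Budget{MaxTokens: 100}, tracker, func(t *Tracker) bool {
		t.Tokens += 40
		return false
	})
	fmt.Printf("calls=%d halted=%v tokens=%d\n", calls, halted, tracker.Tokens)
}
```

Checking before the call rather than after means a run can overshoot by at most one provider call's worth of spend, which is the trade the "gate fires before each provider call" placement implies.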

Codex

sandbox-first ReAct
USER PROMPT
  codex TUI, codex exec for CI, ChatGPT handoff
load context
  AGENTS.md (global → project → subtree) + profile config
session.jsonl
  ~/.codex/sessions/ · /resume restores a saved conversation
/plan mode?
  user-toggled · proposes strategy before any edits
model turn
  GPT-5 · o-series · configurable via profile
tool_use
  shell · apply_patch · web_search · image_input · MCP
or
final text
  model emits no tool call → return
approval × sandbox
  policy: untrusted / on-failure / on-request / never
os sandbox exec
  seatbelt (mac) · landlock+seccomp (linux) · wsl (win)
result → context
  jsonl append · child process exit code surfaced
next iteration
/compact (manual)
  summarize the visible conversation to free tokens

Where they converge.

The inner loop is the primitive all three projects land on: stream the model, parse text plus tool calls, dispatch tools, feed results back, terminate when the model stops asking for tools. All three expose the same handful of file / shell / search tools as the coding-agent floor. All three keep an instructions file (CLAUDE.md, AGENTS.md) as conversation-start context. All three ship a gate between "model proposes a tool call" and "tool actually runs." All three support subagents that get a fresh context window.
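
The shared loop described above fits in a few dozen lines. The sketch below is a toy under stated assumptions and matches none of the three codebases: Model, Gate, Tool, and runLoop are hypothetical names, the "provider" is a scripted function rather than a streaming API, and history is a slice of strings instead of structured messages.

```go
package main

import "fmt"

// ToolCall is a request from the model to run a named tool.
type ToolCall struct {
	Name  string
	Input string
}

// Turn is one parsed model response: text plus zero or more tool calls.
type Turn struct {
	Text  string
	Calls []ToolCall
}

// Hypothetical stand-ins for the provider, the safety gate, and a tool.
type (
	Model func(history []string) Turn
	Gate  func(call ToolCall) bool
	Tool  func(input string) string
)

// runLoop asks the model for a turn, dispatches gated tool calls, feeds the
// results back, and stops when the model requests no tools or maxIter is hit.
func runLoop(model Model, gate Gate, tools map[string]Tool, prompt string, maxIter int) (string, []string) {
	history := []string{"user: " + prompt}
	for i := 0; i < maxIter; i++ {
		turn := model(history)
		history = append(history, "assistant: "+turn.Text)
		if len(turn.Calls) == 0 {
			return turn.Text, history // the model is done
		}
		for _, call := range turn.Calls {
			result := "denied by gate"
			if tool, ok := tools[call.Name]; !ok {
				result = "unknown tool"
			} else if gate(call) {
				result = tool(call.Input)
			}
			history = append(history, "tool_result: "+result)
		}
	}
	return "", history // safety rail: iteration cap reached
}

func main() {
	// Scripted model: request one tool call, then finish.
	model := func(history []string) Turn {
		if len(history) == 1 {
			return Turn{Text: "reading", Calls: []ToolCall{{Name: "read", Input: "go.mod"}}}
		}
		return Turn{Text: "done"}
	}
	allowAll := func(ToolCall) bool { return true }
	tools := map[string]Tool{
		"read": func(path string) string { return "contents of " + path },
	}

	final, history := runLoop(model, allowAll, tools, "what module is this?", 25)
	fmt.Println("final:", final)
	for _, line := range history {
		fmt.Println(line)
	}
}
```

Note that a denied call still produces a tool_result entry. Feeding the refusal back lets the model see why nothing happened and choose a different step, instead of stalling the loop.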

That shared spine is not a coincidence; it's the shape the task forces: an LLM with tools, iterating against environmental feedback, until either the model is satisfied or a safety rail trips. The interesting question is what each one builds around the spine.

Where they diverge.

The cross-cutting machinery, concern by concern: what the loop touches outside the model call.

| concern | claude code | carlos | codex |
| --- | --- | --- | --- |
| source of truth | JSONL transcripts under ~/.claude/projects/; projection rebuilt from log on resume | SQLite WAL eventlog under ~/.carlos/state.db; every action is an event; projection is derived | JSONL sessions under ~/.codex/sessions/; /resume restores a named conversation |
| runtime | Node.js + TypeScript | pure Go, no CGO; single static binary under 30 MB | Rust (codex-rs, 96% Rust); fast cold start, single static binary |
| provider surface | Anthropic only (Claude models) | anthropic, openai, openrouter, ollama, plus a Go CPU backend; one canonical event shape | OpenAI models (GPT, o-series) via profile config; config.toml selects model + reasoning level |
| safety gate | permission modes (default / auto-accept / plan / auto with classifier); /permissions allowlist | pluggable Approver interface (AutoApprover / stdin / in-chat y/N/Always overlay); session-level Always cache per tool | orthogonal matrix: approval policy × sandbox mode. Approval: untrusted / on-failure / on-request / never. Sandbox: read-only / workspace-write / danger-full-access |
| OS sandbox layer | none surfaced; relies on the underlying OS + user permissions | opt-in --worktree git-worktree isolation; tools resolve paths against the worktree base dir | first-class: macOS Seatbelt via sandbox-exec, Linux bwrap + seccomp, Windows WSL2; child processes spawned inside the box |
| plan / preview / apply | plan mode is a read-only inspection step; user approves plan, then implementation runs in default mode | PlanTool queues a worktree diff as an approval-queue artifact; apply_handler ff-merges on accept, discards on reject | /plan switches to plan mode; proposes a strategy before any edit. No diff-as-artifact concept; the conversation carries the plan |
| context compaction | auto-triggered when context fills; /compact <focus> for targeted; per-message /rewind | manual /compact only; uses memory.Summarizer (Naive or LLM); emits EvtSessionReset + a synthetic recap | manual /compact only; summarizes the visible conversation to free tokens. No auto-trigger |
| budget enforcement | no per-loop budget gate; user observes spend via the status bar | Budget + Tracker per scope (run + subtree, rolls up); gate fires before each provider call; clean halt classified as graceful end | no per-loop budget gate; user observes spend via the status line and provider dashboard |
| steering mid-flight | type a correction during a tool call; running tool finishes, then the correction is read before next step | supervisor-owned steering channel; messages drain at the iteration boundary (after collectAssistant, before next provider call) | type into the input field; the in-flight turn is interruptible, next step reads the new input |
| sub-agents | Task tool spawns a subagent with its own context + restricted tools; returns a summary; configured via .claude/agents/ markdown | Supervisor.Spawn → SubAgent with 11-state lifecycle; restart breaker (3-in-60s); per-agent heartbeat ticker; manage TUI roster surfaces every live sub-agent | subagent support framed as parallelization; spawn-and-summarize pattern; less elaborate lifecycle than carlos |
| liveness detection | none surfaced; a crashed Claude Code is gone with its session | 5s heartbeat emit + 10s sweep; stale agents (heartbeat > 2× interval) transition to orphaned at next startup via Recover | none in the local CLI; the Cloud agent surface monitors its own runs |
| scheduled / autonomous runs | non-interactive claude -p for CI/cron, but no daemon | carlos daemon with UDS IPC + launchd / systemd unit; /schedule add "every weekday at 9am ..."; daemon-fired runs spawn sub-agents | codex exec for non-interactive CI; Codex Cloud hosts long-running web-agent tasks separate from the local CLI |
| cloud handoff | none | none | distinguishing feature: kick off a Cloud task from the local editor, monitor progress, apply resulting diffs back to your workspace |
| skill induction | user-authored only (.claude/skills/SKILL.md); model surfaces them on relevance match | propose-don't-publish: inducer watches usage, judge scores cross-provider, replay-eval scores with-vs-without on historical transcripts; skill lands in the approval queue, never auto-publishes | SKILL.md packaged workflows invoked via /skills or implicit description match; MCP dependencies declared via agents/openai.yaml. No auto-induction |
| research mode | no first-class research orchestrator; user chains tools manually or installs MCP servers | decompose → web_search → web_fetch → read → synthesize → verify pipeline with citation tracking (stable sN/pN IDs); /research spawns as sub-agent | web_search is a built-in tool; no orchestrated pipeline. ChatGPT integration covers deep-research at a different layer |
| extensibility | PreToolUse / PostToolUse / Stop hooks; MCP server protocol for arbitrary external tools; plugin marketplace | in-process via tools.Registry, Approver, memory.Summarizer, verifier Dispatcher; MCP client (stdio servers, per-frame gating) ships built-in via internal/mcp | MCP for third-party tools; Skills bundle scripts + prompts + MCP deps as a unit. No shell-hook system in the Claude Code sense |
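
The liveness row for carlos reduces to a small piece of arithmetic: with a 5s heartbeat, an agent is stale once its last beat is more than 2 × 5s = 10s old. A minimal sketch of that check follows. The intervals come from the table; the Roster type, the epoch variable, and findStale are hypothetical names, and the real Recover path (which marks stale agents as orphaned at startup) is not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

const (
	heartbeatInterval = 5 * time.Second       // from the table
	sweepInterval     = 10 * time.Second      // from the table
	staleAfter        = 2 * heartbeatInterval // "heartbeat > 2× interval"
)

// Roster maps a sub-agent ID to the time of its last heartbeat.
// The type is hypothetical; carlos stores this in its event log.
type Roster map[string]time.Time

// epoch is a fixed reference time so the example is deterministic.
var epoch = time.Unix(0, 0)

// findStale returns, in sorted order, the IDs whose last heartbeat is
// older than staleAfter at the given time.
func findStale(r Roster, now time.Time) []string {
	var stale []string
	for id, last := range r {
		if now.Sub(last) > staleAfter {
			stale = append(stale, id)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	now := epoch.Add(60 * time.Second)
	roster := Roster{
		"researcher": now.Add(-3 * time.Second),  // fresh
		"builder":    now.Add(-12 * time.Second), // missed two beats
	}
	fmt.Println("sweep every", sweepInterval)
	fmt.Println("stale:", findStale(roster, now))
}
```

Using a strict greater-than means an agent exactly two intervals late is still considered alive, which tolerates one missed beat plus scheduling jitter before the agent is flagged.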

The headline difference.

Claude Code is an interactive harness. The loop is tight, the user sits inside it, and the architecture is shaped around getting out of the way of one person at a keyboard. Hooks, plan mode, checkpoints: extension points that assume you're there to type.

carlos is an event-sourced agent. The inner loop is the same shape, but every step writes to a durable log, every sub-agent has a lifecycle the supervisor manages, and the surface widens: scheduled runs via the daemon, autonomous research with citation grounding, propose-don't-publish skill induction. That is more machinery, and what it buys is that carlos still works when you've gone to dinner.
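
"Every step writes to a durable log" means the conversation state is a fold over events rather than a mutable record. The sketch below shows that fold with an in-memory slice standing in for the SQLite log. The event kind names come from this page; the Event and Projection types, the project function, and the rule that EvtSessionReset replaces history with its recap text are assumptions made for illustration.

```go
package main

import "fmt"

// Event is one append-only log entry. Kind uses the names from this page;
// the struct layout is hypothetical.
type Event struct {
	Seq  int
	Kind string
	Body string
}

// Projection is the derived view a UI or the next model call would read.
type Projection struct {
	Messages  []string
	ToolCalls int
}

// project rebuilds the projection by replaying the log from the start.
// The log stays the source of truth; this result can be discarded and
// recomputed at any time.
func project(log []Event) Projection {
	var p Projection
	for _, e := range log {
		switch e.Kind {
		case "EvtUserMessage":
			p.Messages = append(p.Messages, "user: "+e.Body)
		case "EvtAssistantMessage":
			p.Messages = append(p.Messages, "assistant: "+e.Body)
		case "EvtToolCall":
			p.ToolCalls++
		case "EvtToolResult":
			p.Messages = append(p.Messages, "tool: "+e.Body)
		case "EvtSessionReset":
			// Assumption: compaction replaces history with a synthetic recap.
			p.Messages = []string{"recap: " + e.Body}
		}
	}
	return p
}

func main() {
	log := []Event{
		{1, "EvtUserMessage", "fix the failing test"},
		{2, "EvtToolCall", "bash: go test ./..."},
		{3, "EvtToolResult", "FAIL"},
		{4, "EvtAssistantMessage", "patched the off-by-one"},
		{5, "EvtSessionReset", "user asked for a fix; patch applied"},
		{6, "EvtUserMessage", "thanks"},
	}
	p := project(log)
	fmt.Println("tool calls:", p.ToolCalls)
	for _, m := range p.Messages {
		fmt.Println(m)
	}
}
```

Because the projection is derived, a crash between events loses nothing that was already appended: replaying the log on the next start yields the same state.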

Codex is a sandbox-first ReAct loop. The architectural commitment is OS-level isolation as a peer of approval, not a fallback. Approval policy and sandbox mode are independent axes; the default Auto preset means workspace-write + on-request. Pair that with the Cloud handoff and Codex is the only one of the three that treats "this might run somewhere I'm not" as a first-class case at the protocol level.
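
"Independent axes" has a concrete meaning: any approval policy can pair with any sandbox mode, giving 4 × 3 = 12 configurations, of which the Auto preset is one. The sketch below models only that structure. The value strings follow Codex's documented settings (the full-access sandbox mode is spelled danger-full-access there); the Go types, autoPreset, and allConfigs are hypothetical, and no enforcement behavior is modeled.

```go
package main

import "fmt"

// ApprovalPolicy says when the user is asked; SandboxMode says what the
// process may touch. Neither type constrains the other.
type (
	ApprovalPolicy string
	SandboxMode    string
)

var approvalPolicies = []ApprovalPolicy{"untrusted", "on-failure", "on-request", "never"}

var sandboxModes = []SandboxMode{"read-only", "workspace-write", "danger-full-access"}

// Config is one point in the approval × sandbox matrix.
type Config struct {
	Approval ApprovalPolicy
	Sandbox  SandboxMode
}

// autoPreset is the default pairing described in the text above.
var autoPreset = Config{Approval: "on-request", Sandbox: "workspace-write"}

// allConfigs enumerates every pairing of the two axes.
func allConfigs() []Config {
	var out []Config
	for _, a := range approvalPolicies {
		for _, s := range sandboxModes {
			out = append(out, Config{Approval: a, Sandbox: s})
		}
	}
	return out
}

func main() {
	configs := allConfigs()
	fmt.Println(len(configs), "combinations")
	for _, c := range configs {
		marker := ""
		if c == autoPreset {
			marker = "  <- Auto preset"
		}
		fmt.Printf("approval=%-11s sandbox=%s%s\n", c.Approval, c.Sandbox, marker)
	}
}
```

Keeping the axes as separate types makes the peer relationship visible in the signature: a function that takes a Config cannot receive an approval policy without also being told what the sandbox permits.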

None of them is universally better. The shared spine is the floor; the rest is product positioning, and the right one for any given week is the one whose machinery matches what the work needs.

Sources: Claude Code "How Claude Code works" and "Best practices" from Anthropic's docs; Codex from OpenAI Codex CLI docs + the openai/codex repo (read June 2026); carlos's loop is internal/agent/loop.go in this repo.