Agent loops, compared.
Claude Code, carlos, and OpenAI Codex all run the same inner primitive: stream a model, parse text plus tool calls, dispatch tools under a safety gate, feed results back, repeat. Where they differ is in the machinery around that loop: how they remember, how they ask for permission, who watches them when you're not. Hover a step in any column to see the equivalent step light up in the other two.
Claude Code
carlos
Codex
Where they converge.
The inner loop is the primitive all three projects land on: stream the model, parse text plus tool calls, dispatch tools, feed results back, terminate when the model stops asking for tools. All three expose the same handful of file / shell / search tools as the coding-agent floor. All three keep an instructions file (CLAUDE.md, AGENTS.md) as conversation-start context. All three ship a gate between "model proposes a tool call" and "tool actually runs." All three support subagents that get a fresh context window.
That shared spine is not a coincidence; it's the shape the task forces. An LLM with tools, iterating against environmental feedback, until either the model is satisfied or a safety rail trips. The interesting question is what each one builds around the spine.
Where they diverge.
Categories with cross-cutting machinery: what the loop touches outside the model call. Hover a row to scan it.
| concern | claude code | carlos | codex |
|---|---|---|---|
| source of truth | JSONL transcripts under ~/.claude/projects/; projection rebuilt from log on resume |
SQLite WAL eventlog under ~/.carlos/state.db; every action is an event; projection is derived |
JSONL sessions under ~/.codex/sessions/; /resume restores a named conversation |
| runtime | Node.js + TypeScript | pure Go, no CGO; single static binary under 30 MB | Rust (codex-rs, 96% Rust); fast cold start, single static binary |
| provider surface | Anthropic only (Claude models) | anthropic, openai, openrouter, ollama, plus a Go CPU backend; one canonical event shape | OpenAI models (GPT, o-series) via profile config; config.toml selects model + reasoning level |
| safety gate | permission modes (default / auto-accept / plan / auto with classifier); /permissions allowlist |
pluggable Approver interface (AutoApprover / stdin / in-chat y/N/Always overlay); session-level Always cache per tool |
orthogonal matrix: approval policy × sandbox mode. Approval: untrusted / on-failure / on-request / never. Sandbox: read-only / workspace-write / danger-full |
| OS sandbox layer | none surfaced; relies on the underlying OS + user permissions | opt-in --worktree git-worktree isolation; tools resolve paths against the worktree base dir |
first-class: macOS Seatbelt via sandbox-exec, Linux bwrap + seccomp, Windows WSL2; child processes spawned inside the box |
| plan / preview / apply | plan mode is a read-only inspection step; user approves plan, then implementation runs in default mode | PlanTool queues a worktree diff as an approval-queue artifact; apply_handler ff-merges on accept, discards on reject |
/plan switches to plan mode; proposes a strategy before any edit. No diff-as-artifact concept; the conversation carries the plan |
| context compaction | auto-triggered when context fills; /compact <focus> for targeted; per-message /rewind |
manual /compact only; uses memory.Summarizer (Naive or LLM); emits EvtSessionReset + a synthetic recap |
manual /compact only; summarizes the visible conversation to free tokens. No auto-trigger |
| budget enforcement | no per-loop budget gate; user observes spend via the status bar | Budget + Tracker per scope (run + subtree, rolls up); gate fires before each provider call; clean halt classified as graceful end |
no per-loop budget gate; user observes spend via the status line and provider dashboard |
| steering mid-flight | type a correction during a tool call; running tool finishes, then the correction is read before next step | supervisor-owned steering channel; messages drain at the iteration boundary (after collectAssistant, before next provider call) | type into the input field; the in-flight turn is interruptible, next step reads the new input |
| sub-agents | Task tool spawns a subagent with its own context + restricted tools; returns a summary; configured via .claude/agents/ markdown |
Supervisor.Spawn → SubAgent with 11-state lifecycle; restart breaker (3-in-60s); per-agent heartbeat ticker; manage TUI roster surfaces every live sub-agent |
subagent support framed as parallelization; spawn-and-summarize pattern; less elaborate lifecycle than carlos |
| liveness detection | none surfaced; a crashed Claude Code is gone with its session | 5s heartbeat emit + 10s sweep; stale agents (heartbeat > 2× interval) transition to orphaned at next startup via Recover |
none in the local CLI; the Cloud agent surface monitors its own runs |
| scheduled / autonomous runs | non-interactive claude -p for CI/cron, but no daemon |
carlos daemon with UDS IPC + launchd / systemd unit; /schedule add "every weekday at 9am ..."; daemon-fired runs spawn sub-agents |
codex exec for non-interactive CI; Codex Cloud hosts long-running web-agent tasks separate from the local CLI |
| cloud handoff | none | none | distinguishing feature: kick off a Cloud task from the local editor, monitor progress, apply resulting diffs back to your workspace |
| skill induction | user-authored only (.claude/skills/SKILL.md); model surfaces them on relevance match |
propose-don't-publish: inducer watches usage, judge scores cross-provider, replay-eval scores with-vs-without on historical transcripts; skill lands in the approval queue, never auto-publishes | SKILL.md packaged workflows invoked via /skills or implicit description match; MCP dependencies declared via agents/openai.yaml. No auto-induction |
| research mode | no first-class research orchestrator; user chains tools manually or installs MCP servers | decompose → web_search → web_fetch → read → synthesize → verify pipeline with citation tracking (stable sN/pN IDs); /research spawns as sub-agent |
web_search is a built-in tool; no orchestrated pipeline. ChatGPT integration covers deep-research at a different layer |
| extensibility | PreToolUse / PostToolUse / Stop hooks; MCP server protocol for arbitrary external tools; plugin marketplace | in-process via tools.Registry, Approver, memory.Summarizer, verifier Dispatcher; MCP client (stdio servers, per-frame gating) ships built-in via internal/mcp |
MCP for third-party tools; Skills bundle scripts + prompts + MCP deps as a unit. No shell-hook system in the Claude Code sense |
The headline difference.
Claude Code is an interactive harness. The loop is tight, the user is in the tight feedback loop, and the architecture is shaped around getting out of the way of one person at a keyboard. Hooks, plan mode, checkpoints: extension points that assume you're there to type.
carlos is an event-sourced agent. The inner loop is the same shape, but every step writes to a durable log, every sub-agent has a lifecycle the supervisor manages, and the surface widens: scheduled runs via the daemon, autonomous research with citation grounding, propose-don't-publish skill induction. More machinery, paid for by the property that carlos still works when you've gone to dinner.
Codex is a sandbox-first ReAct loop. The architectural commitment is OS-level isolation as a peer of approval, not a fallback. Approval policy and sandbox mode are independent axes; the default Auto preset means workspace-write + on-request. Pair that with the Cloud handoff and Codex is the only one of the three that treats "this might run somewhere I'm not" as a first-class case at the protocol level.
None of them is universally better. The shared spine is the floor; the rest is product positioning, and the right one for any given week is the one whose machinery matches what the work needs.
internal/agent/loop.go in this repo.