Codex and MCP servers, a year in: a different bet than Claude Code

Q: How do I add an MCP server to Codex?

Run codex mcp add -- for a local stdio server, or codex mcp add --url for a Streamable HTTP one. You can also edit ~/.codex/config.toml directly, which the docs recommend when you need fine-grained control.

Q: Where is the Codex MCP config file?

At ~/.codex/config.toml for global servers, or .codex/config.toml in a project, which loads only in trusted directories. It's TOML, not JSON, and the CLI and IDE extension share the same file.

Claude Code and Codex hit the same MCP wall and answered it two different ways. A year in, here's the bet each one made — and what it costs you.

George Ciobanu · May 25, 2026 · 7 min read · Updated June 5, 2026

You add a couple of MCP servers to ~/.codex/config.toml, start Codex, type your first prompt, and nothing happens. The status line reads starting MCP servers and sits there. One of the six servers you've collected is unauthenticated, or slow, or pointed at a host that's down, and Codex won't take the first turn until it has heard back. You watch a spinner because of a tool you weren't going to use.

It's a reported behavior on Codex's own tracker, and it's the visible edge of a choice that separates Codex from Claude Code more than anything in their marketing does.

Two labs, two bets

By mid-2026 every coding agent had hit the same wall. Connect enough MCP servers and their tool definitions crowd out the context the model needs for your actual code, and tool-selection accuracy drops once the list runs long. The wall was identical. The two answers were not.

Claude Code moved the fix into the client. It shipped lazy tool loading as an opt-in in late 2025 and turned it on by default in early 2026: definitions stay out of context until a task needs them, fetched on demand, which cut startup token cost on heavy setups by around 85%. The agent triages its own tools.

Codex hasn't. It discovers your servers' tools before the first turn and loads them as a single eager inventory, with no on-demand search. The open request to add one, deferring tools once their descriptions cross 10% of the window, is itself the proof it isn't there yet. What OpenAI leaned on instead is the model: its own framing for the Codex line is token efficiency and context compaction, models trained to stay coherent across multiple windows on long tasks. The wager is that a good enough model makes the bloat survivable.

It mostly holds. GPT-5.5 will plow through a toolset that would have wrecked a weaker model. But the rule wasn't repealed, only hidden behind a better model, and when it surfaces (Codex turning vague, or reaching for the wrong tool once the list is long) there's no client quietly cleaning up after you. In Codex, you are the tool-search algorithm.

Which makes the pruning your job

So you prune by hand, and the knob that matters is the dull one: enabled_tools, the per-server allow-list that stops a server from dumping its whole surface into context. Trim each server to the handful you actually call.

[mcp_servers.github]
url = "https://api.githubcopilot.com/mcp/"
bearer_token_env_var = "GITHUB_PAT_TOKEN"
enabled_tools = ["get_issue", "get_pull_request", "get_file_contents"]

The servers worth keeping are the same ones that survive in any agent, docs and repo and database and browser, and I made that case in the Claude Code piece; the packages are identical; what changes is the client-side config, TOML in Codex's config.toml rather than Claude Code's JSON. Two Codex-specific habits are worth adding on top.

Put instructions in AGENTS.md, the file Codex already reads, instead of standing up a server for them. A server that returns the same text on every call was never a server. And when an edit is structural (a rename across forty call sites, or finding every caller of a method), grep-and-patch is where agents quietly break things, and AST-aware tools hold up better. That's the category pandō covers, for TypeScript, JavaScript, Clojure, and ClojureScript: codex mcp add pando-ai -- npx -y pando-ai. We make it, so weigh that accordingly; it's worth reaching for only when the change is genuinely structural.

Servers you can't fully trust

The year that taught everyone to add servers also taught them to be wary of them. Tool poisoning is the shape it takes: a server's descriptions are shown and approved once, at connect time, but its responses flow into context at runtime unchecked, and the model trusts them. Researchers used that gap to walk an agent into leaking private-repo contents through a poisoned issue, and a separate demonstration exfiltrated a WhatsApp history through calls that looked routine. On the supply-chain side, a widely installed connection helper shipped a critical RCE (CVE-2025-6514, CVSS 9.6) across hundreds of thousands of installs.

Codex adds two edges of its own. Project-scoped .codex/config.toml loads only in a trusted directory, a real guardrail until you reflexively trust a cloned repo that ships its own server config. And if two servers expose a tool with the same name, check what Codex actually surfaces in /mcp rather than assuming the one you meant wins. Treat adding a server the way you'd treat a dependency that can run shell commands and read your context, because that is what it is.

The gap is temporary; the standard isn't

None of this decides much in the long run. The architectural split, client-side deferral versus a model that copes, is a difference in timing, not direction. If Codex adds lazy tool discovery, the startup blocking and the context tax both fade, and the two agents converge on the same shape: fewer tools in the model's face, loaded only when needed.

What isn't temporary is the protocol underneath. MCP started at Anthropic, and OpenAI adopted it anyway (across Codex, the IDE extension, and the Responses API) and then went further: Codex can run as an MCP server itself, so another agent calls Codex the way Codex calls Context7. When a lab builds its flagship coding agent to speak a rival's protocol and to answer through it, the question of whether MCP was ever just an Anthropic experiment is closed. The remaining contest is over who manages the bloat more gracefully. The protocol underneath isn't the interesting question anymore.

Codex leaned on the model so you could stop thinking about all this. For now, the thinking is still the job.

FAQ

How do I add an MCP server to Codex? Run codex mcp add <name> -- <command> for a local stdio server, or codex mcp add <name> --url <url> for a Streamable HTTP one. You can also edit ~/.codex/config.toml directly, which the docs recommend when you need fine-grained control.

Where is the Codex MCP config file? At ~/.codex/config.toml for global servers, or .codex/config.toml in a project, which loads only in trusted directories. It's TOML, not JSON, and the CLI and IDE extension share the same file.

Does Codex support tool search like Claude Code? Not yet. Codex discovers MCP tools before the first turn, while Claude Code defers them by default. An open Codex issue requests the feature. Until it ships, trim each server with its enabled_tools allow-list to keep context lean.

Why does Codex hang when starting MCP servers? Codex waits on MCP startup before your first turn, so one slow, unauthenticated, or unreachable server can block the session. Disable or fix that server (set enabled = false, or required = false so it can't block startup). Only raise startup_timeout_sec if the server is healthy but slow to initialize.

Can Codex use the same MCP servers as Claude Code? Yes. MCP is an open standard, so the server packages and endpoints are the same. Only the configuration differs: Codex uses TOML in ~/.codex/config.toml, while Claude Code uses JSON.

Are MCP servers worth using in Codex in 2026? Yes, kept lean. A few servers for real external reach, like docs, repo, database, and browser, pay off. Piling on servers you rarely call wastes context and slows startup, because Codex won't triage tools for you.

Two labs, two bets #

Which makes the pruning your job #

Servers you can't fully trust #

The gap is temporary; the standard isn't #

FAQ #

Two labs, two bets

Which makes the pruning your job

Servers you can't fully trust

The gap is temporary; the standard isn't

FAQ