MCP Is Where AI Agents Stop Being Toys
I did not start caring about MCP because of the protocol spec. I read the spec, it is fine, it is JSON-RPC with some sensible primitives, and on its own it would not have earned a post. I started caring about MCP the day it changed what an agent is. Before tool access, an agent generates text. You read the text, you decide, you act. After tool access, the agent acts. It opens the pull request, it queries the database, it restarts the container, it pages someone. The text was a suggestion. The tool call is an operation.
That is the whole shift, and it is the reason the Model Context Protocol matters more than the people demoing it seem to realize. MCP is the thing that took “AI assistant” — a chat box that produces words — and turned it into an operator that produces side effects in your systems. The moment an agent can do something real, it stops being a toy and starts being a service account with a language model attached. We know how to reason about service accounts. We have spent twenty years learning how badly they go wrong. The mistake I keep seeing is people treating MCP like a cute plugin mechanism instead of what it is: a new operational boundary in production.
What MCP actually standardizes
Strip away the marketing and MCP does four boring, useful things. It standardizes tool discovery: an agent can ask a server “what can you do?” and get a machine-readable answer. It standardizes schemas: every tool declares its inputs (and, optionally, its outputs) as JSON Schema, so the model is not guessing at argument formats. It standardizes invocation: there is one consistent way to call a tool and get a result back, over stdio or HTTP. And it standardizes the boundary between the model and the external system, which is the part that actually matters.
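To make discovery, schemas, and invocation concrete, here is roughly what the messages look like on the wire, written as Python dicts. The method names `tools/list` and `tools/call` and the `inputSchema` field come from the MCP spec; the `query_metrics` tool, its arguments, and the `missing_required` helper are made up for illustration, and real servers return more fields than this.

```python
import json

# Discovery: the client asks a server what tools it offers.
list_request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}

# A server might answer like this. The tool itself is hypothetical.
list_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "query_metrics",
                "description": "Run a read-only metrics query",
                "inputSchema": {
                    "type": "object",
                    "properties": {
                        "query": {"type": "string"},
                        "minutes": {"type": "integer"},
                    },
                    "required": ["query"],
                },
            }
        ]
    },
}

# Invocation: the same call shape, whatever the tool does.
call_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "query_metrics",
        "arguments": {"query": "up", "minutes": 15},
    },
}


def missing_required(tool: dict, arguments: dict) -> list:
    """Return required argument names that a call leaves out (illustrative check)."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


tool = list_response["result"]["tools"][0]
print(json.dumps(call_request))
print(missing_required(tool, call_request["params"]["arguments"]))
```

The point is not the helper. The point is that the declared schema is something a proxy or a reviewer can check a call against before it reaches the system behind the server.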
Before MCP, every integration was bespoke. You wired GitHub into your agent one way, your internal API another way, your Postgres a third way, and each connector was a snowflake nobody else could review. MCP makes the connector a thing: a server, with a defined surface, that sits between the model and the system it touches. That is genuinely good engineering. A standard boundary is a reviewable boundary, an auditable boundary, a boundary you can put policy on. The protocol’s real contribution is not that it connects models to tools. Plenty of things did that. It is that it makes the connection a named component you can govern.
Why this lands hard for DevOps and SRE
For most of the AI conversation, infrastructure people have been spectators. Chatbots are not our problem. But MCP is, because the tools worth exposing to an agent are our tools.
Think about what an MCP server can put in front of a model on a platform team: GitHub, or whatever git host you use. The filesystem. Your observability stack: metrics, logs, traces. Runbooks. CI/CD pipelines. Cloud provider APIs. Databases. Internal service APIs that were never built with an autonomous caller in mind. Every one of those is a real lever on production, and MCP is the mechanism that hands the lever to something that runs in a loop without a human between each pull.
This is exactly where it gets interesting, because the dangerous part and the useful part are the same part. An agent that can read your metrics, correlate them with the last deploy, check the runbook, and draft the rollback PR is genuinely valuable — that is real toil gone. But “read your metrics, check the deploy, draft the rollback” is one MCP config away from “apply the rollback,” and one sloppy permission away from “apply a rollback, to the wrong service, at 2 a.m., because a log line told it to.” There is no version of this where you get the capability without the exposure. The capability is the exposure.
The failure modes are old friends
None of the risks here are exotic. They are the same integration failures we already know, wearing a new coat. That should be reassuring and it should also worry you, because we keep losing to these even without a language model in the loop.
- Overbroad permissions. The server gets a token with admin scope because that was the fastest way to make the demo work. Now the agent has admin scope.
- Secrets leaking into subprocesses. MCP servers launched over stdio inherit an environment. Drop a cloud key in there and every child process — and every tool that shells out — can see it.
- Untrusted servers. “Just `npx` this server” and “just `uvx` that one” are the new curl-pipe-bash. You are running someone else’s code, with your tokens, against your systems, on the strength of a README.
- No audit trail. The agent made forty tool calls during an incident and you have no structured record of which ones touched what. Good luck with the postmortem.
- Prompt injection through tool output. This is the one people underrate. A tool returns data — an issue body, a log line, a row from a table — and that data contains instructions. The model reads it as part of the conversation and does what it says. Your tool output is now an attack surface.
- Destructive tools with no guardrails. A `delete`, a `drop`, a `terminate` exposed with the same ceremony as a `list`. No confirmation, no rate limit, no timeout. One bad inference and it is done.
- Unclear ownership. Who owns this server? Who reviews its upgrades? Who is paged when it misbehaves? If the answer is “the person who added it to their config last month,” you do not have an integration, you have a liability.
Operate it like infrastructure
The fix is not new and it is not clever. It is the discipline we already apply to anything privileged, applied here on purpose instead of by accident.
Start from least privilege and read-only by default. An agent that can only read solves most of the valuable problems and causes almost none of the catastrophic ones. Make write access a deliberate, narrowly scoped decision, never the starting point. Put allowlists on what a tool can touch — repos, namespaces, tables, paths — so “query the database” cannot become “query any database.”
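A minimal sketch of what read-only by default plus an allowlist can look like in front of a tool. Everything here is hypothetical: the `ToolPolicy` shape, the tool names, and the repo names are mine, not anything MCP defines. The protocol does not enforce this for you; it is a check you would put in the server or in a proxy ahead of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical per-tool policy: read-only unless explicitly writable."""
    allowed_targets: frozenset
    writable: bool = False


# Hypothetical tool and repository names, for illustration only.
POLICIES = {
    "repo_read_file": ToolPolicy(frozenset({"org/platform", "org/runbooks"})),
    "repo_open_pr": ToolPolicy(frozenset({"org/runbooks"}), writable=True),
}


def authorize(tool: str, target: str, is_write: bool) -> bool:
    """Allow a call only if the tool is known, the target is listed,
    and any write was granted on purpose."""
    policy = POLICIES.get(tool)
    if policy is None:
        return False  # unknown tools are denied, not guessed at
    if is_write and not policy.writable:
        return False  # write access is opt-in per tool
    return target in policy.allowed_targets


print(authorize("repo_read_file", "org/platform", is_write=False))
print(authorize("repo_open_pr", "org/platform", is_write=True))
```

The default matters more than the mechanism: a tool nobody wrote a policy for gets nothing, and a tool with a policy gets reads only until someone decides otherwise.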
Isolate the workspace. The agent gets its own credentials, its own blast radius, its own sandbox. It does not borrow your shell, your keys, or your laptop’s ambient access. And keep personal tools and production tools apart. The MCP servers I let an agent use to tinker on my own machine are not the servers that should ever see prod. Mixing those two contexts is how a convenience becomes an incident.
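One concrete piece of that isolation is refusing to let a stdio server inherit your whole environment. A sketch, assuming a small set of base variables is enough for the server to start (that set is a guess and will differ per server) and that you pass in only the one scoped credential the server needs. `build_server_env` and `launch_server` are hypothetical helpers; `subprocess.Popen` with an explicit `env` is standard library behavior.

```python
import os
import subprocess
from typing import Dict, List, Optional

# Assumed minimal set of variables a server process needs to start.
BASE_ENV_KEYS = ("PATH", "HOME", "LANG")


def build_server_env(
    extra: Dict[str, str], source: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Build an explicit environment instead of inheriting the parent's."""
    source = dict(os.environ) if source is None else source
    env = {key: source[key] for key in BASE_ENV_KEYS if key in source}
    env.update(extra)  # only the credentials this one server should see
    return env


def launch_server(command: List[str], extra_env: Dict[str, str]) -> subprocess.Popen:
    """Start a stdio server with the scrubbed environment."""
    return subprocess.Popen(
        command,
        env=build_server_env(extra_env),
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )


parent = {"PATH": "/usr/bin", "HOME": "/home/me", "AWS_SECRET_ACCESS_KEY": "x"}
print(build_server_env({"GIT_HOST_TOKEN": "scoped-read-only"}, source=parent))
```

The cloud key in the parent environment never reaches the child, so neither does it reach anything the child shells out to.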
Log every tool call, structured. Tool name, arguments, caller, result, timestamp. This is non-negotiable. If you cannot reconstruct what the agent did, you are not running it, you are hoping. Put timeouts and rate limits on every tool, because an agent in a bad loop will hammer a thing far faster and far more patiently than a human ever would.
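Here is a sketch of a wrapper that does all three: one structured record per call, a timeout, and a sliding-window rate limit. `GuardedTool` is a hypothetical name, the limits are placeholder numbers, and `print` stands in for whatever log pipeline you actually have. It is not thread-safe, and the timeout only stops the caller from waiting; it does not kill the work already running in the worker thread.

```python
import json
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class GuardedTool:
    """Hypothetical wrapper: rate limit, timeout, one log record per call."""

    def __init__(self, name, func, timeout_s=5.0, max_calls=10,
                 per_seconds=60.0, sink=print):
        self.name = name
        self.func = func
        self.timeout_s = timeout_s
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.sink = sink          # where log lines go; print is a stand-in
        self._recent = deque()    # monotonic timestamps of recent calls
        self._pool = ThreadPoolExecutor(max_workers=4)

    def call(self, caller, arguments):
        now = time.monotonic()
        # Drop timestamps that have aged out of the window.
        while self._recent and now - self._recent[0] > self.per_seconds:
            self._recent.popleft()

        if len(self._recent) >= self.max_calls:
            status, result = "rate_limited", None
        else:
            self._recent.append(now)
            future = self._pool.submit(self.func, **arguments)
            try:
                status, result = "ok", future.result(timeout=self.timeout_s)
            except FutureTimeout:
                status, result = "timeout", None
            except Exception as exc:
                status, result = "error", repr(exc)

        # Every outcome is logged, including refusals.
        self.sink(json.dumps({
            "timestamp": time.time(),
            "tool": self.name,
            "caller": caller,
            "arguments": arguments,
            "status": status,
            "result": result,
        }, default=str))
        return status, result


echo = GuardedTool("echo", lambda text: text.upper(), max_calls=2)
print(echo.call("agent-1", {"text": "hi"}))
```

Note that the rate-limited and timed-out calls get a log line too. The calls that did not happen are often the most useful part of the record.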
Treat prompt injection as a given, not an edge case. Anything a tool returns is untrusted input the moment it enters the context. Privileged actions need a boundary that the content of a log line cannot cross on its own — a human confirmation, a separate approval step, a tool that simply does not exist in the high-trust context.
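A sketch of that boundary as a dispatcher. The tool names, `PRIVILEGED_TOOLS`, and the `approve` callback are all hypothetical; the assumption doing the work is that `approve` is answered out of band, by a human or a separate system, and never by anything the model or a tool result can write to.

```python
from typing import Any, Callable, Dict

# Hypothetical names for tools that change production.
PRIVILEGED_TOOLS = frozenset({"apply_rollback", "restart_container"})


class ApprovalRequired(Exception):
    """Raised when a privileged call was not approved out of band."""


def dispatch(
    tool: str,
    arguments: Dict[str, Any],
    registry: Dict[str, Callable[..., Any]],
    approve: Callable[[str, Dict[str, Any]], bool],
) -> Any:
    """Run a tool, but require approval before any privileged one.

    `approve` must be decided outside the model's context, so text in a
    tool result cannot grant it.
    """
    if tool not in registry:
        raise KeyError(f"tool not available in this context: {tool}")
    if tool in PRIVILEGED_TOOLS and not approve(tool, arguments):
        raise ApprovalRequired(tool)
    return registry[tool](**arguments)


registry = {
    "read_logs": lambda service: f"logs for {service}",
    "apply_rollback": lambda service: f"rolled back {service}",
}
deny_all = lambda tool, arguments: False

print(dispatch("read_logs", {"service": "api"}, registry, deny_all))
```

Leaving a tool out of `registry` entirely is the stronger version of the same idea: a log line cannot talk the agent into calling something that is not there.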
And review every MCP server like the small privileged integration it is. Read the code. Pin the version. Know what it talks to and with whose credentials. “It is just an MCP server” is the sentence that will be in the postmortem.
Above all, observe the boundary itself. The interesting telemetry in an agentic system is not the model’s tokens, it is the line where the model meets your systems. Watch the tool calls the way you watch API traffic at a trust boundary, because that is precisely what they are.
Where this goes
MCP is going to stick. It solves a real problem, the standard is good enough, and the ecosystem has already decided. I am not betting against it and you should not either. The open question was never whether agents would get tools. They have them. The open question is whether teams will operate that tool access like infrastructure — scoped, logged, owned, observed — or keep pretending it is just another integration somebody added to a config file.
That choice is the difference between an agent that is a genuine operator on your team and an agent that is an unaudited admin account with a confident voice. The protocol gives you the boundary. What you build on it is the actual work, and it is the same work it has always been: least privilege, real logs, clear ownership, and a healthy suspicion of anything that can change production while you are asleep.