Summer Yue, Meta’s director of alignment, lost her emails. She gave OpenClaw access to her inbox to help organize it, and watched it “speedrun deleting” everything older than a week. She typed “STOP OPENCLAW” from her phone. Nothing. She had to run to her Mac mini to kill it manually.

Summer Yue watching OpenClaw ignore her.

Yue called it a “rookie mistake” afterward. That’s probably fair, but it’s also not the right frame. Aviation stopped treating “pilot error” as a complete explanation decades ago — pilots still make mistakes, so the industry designed around that fact. Cockpit layouts, checklists, automated warnings: the goal isn’t error-free operators, it’s systems that survive operator errors. The same logic applies here.

Instead of terminating the explanation at the user or the agent, let's treat this as an infrastructure problem. A simple proxy in front of the email server could have prevented this disaster, whether the agent was misguided or malicious, with infrastructure-layer constraints like:

  1. Rewrite “delete” as tag-and-archive, and hide the tagged messages from the agent’s view so it thinks the job is done.
  2. Cap deletions at 20 per hour, so a runaway loop can’t clean out an inbox overnight.
  3. Refuse to touch anything more than an hour old.
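The three constraints above can be sketched as a small guard object sitting in the proxy. This is illustrative only: the class name, message representation, and thresholds are my inventions, not code from any real proxy.

```python
import time
from datetime import datetime, timedelta

class EmailGuard:
    """Enforce deletion constraints regardless of what the agent asks for."""

    def __init__(self, max_deletions_per_hour=20):
        self.max_deletions = max_deletions_per_hour
        self.deletion_times = []  # timestamps of recent tag-and-archive ops

    def handle_delete(self, message):
        """Rewrite a delete request into tag-and-archive, or refuse it."""
        # Constraint 3: never touch anything more than an hour old.
        if message["received"] < datetime.now() - timedelta(hours=1):
            raise PermissionError("refusing to touch messages older than one hour")

        # Constraint 2: cap deletions at 20 per hour.
        cutoff = time.time() - 3600
        self.deletion_times = [t for t in self.deletion_times if t > cutoff]
        if len(self.deletion_times) >= self.max_deletions:
            raise PermissionError("hourly deletion cap reached")
        self.deletion_times.append(time.time())

        # Constraint 1: tag and archive instead of deleting. The tagged
        # message is hidden from the agent's view, so the agent sees success.
        message["tags"].append("agent-deleted")
        message["archived"] = True
        return {"status": "deleted"}  # what the agent is told
```

The point is that none of this logic lives in the agent: the guard runs in the proxy, so no prompt can route around it.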

These constraints are outside the agent's control and enforced regardless of the prompt. They are also easy to implement.

The proxy

mcp_proxy_maker is an MCP proxy with a plugin pipeline. Point it at an upstream MCP server and it sits between the client and the server, logging and optionally filtering or rewriting every call before it goes through. It's a thin pass-through layer, unremarkable on its own. The useful part is two Claude Code skills that ship with it:

  • /probe-mcp probes the upstream server for security issues. It runs safe tests first, then asks for explicit approval before trying anything dangerous.
  • /propose-filters takes the probe results and generates mitigations: YAML filter rules, argument rewrites, or custom Python plugins that inspect request and response content.

To use it, you stand up the proxy, run /probe-mcp against the upstream server, then run /propose-filters. Claude writes the fix; you review it.

A simple example

mcp-server-fetch is a small MCP server that takes a URL and returns the page as Markdown. Real tool, in real use. I pointed the proxy at it and ran /probe-mcp. Claude ran benign tests first: HTTP, HTTPS, invalid URLs, missing parameters, extra parameters, error responses. Then it asked whether to proceed with the dangerous tests. I said yes.

Claude's safety classification before running the dangerous probes.

The findings were not good:

  • No SSRF protection. Loopback addresses, cloud metadata endpoints (169.254.169.254), and private-network IPs all go through as-is.
  • No scheme whitelisting. file://, ftp://, and data: URIs pass validation and reach the upstream server.
  • No response size or timeout limits. A 100 KB binary response and a 10-second stall both go through unchecked.
  • No content filtering. Prompt-injection payloads come back verbatim to whatever called the tool.

See the full probe report for details.

I told Claude to fix only the critical SSRF and scheme issues. It generated a small Python plugin that rejects non-http/https schemes immediately, resolves the target hostname, and checks each returned IP against the loopback, private, and link-local ranges before the request goes out. Small, readable, does exactly what I need it to.
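A sketch of what such a check can look like. This is my reconstruction of the approach described above, not the generated plugin verbatim; the function name and error handling are illustrative.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}

def check_url(url):
    """Reject URLs that could reach internal infrastructure (SSRF guard)."""
    parsed = urlparse(url)

    # Reject anything that isn't plain http/https: file://, ftp://, data:, ...
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"blocked scheme: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")

    # Resolve the hostname and check every returned address, so a DNS
    # name that points at 127.0.0.1 or 169.254.169.254 is caught too.
    for info in socket.getaddrinfo(parsed.hostname, None):
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_loopback or ip.is_private or ip.is_link_local:
            raise ValueError(f"blocked address: {ip}")
    return url
```

Checking the resolved addresses rather than the hostname string is the important design choice: string-level blocklists miss DNS names that resolve to internal ranges.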

What I’m actually building this for

I have a personal agent, Benchclaw, that I’m setting up for my startup in medical technology. Good access controls matter at any company — you don’t want an agent wandering into your contracts or term sheets — but in medtech the stakes are higher. Patient data, regulatory submissions, agreements with external partners: these aren’t just sensitive, they’re subject to audit. An agent that wanders into the wrong document doesn’t just create a mess; it creates liability.

The obvious solution is “just don’t give it access to that workspace.” The problem is that agents are bad at boundaries. If something is technically reachable, it tends to get reached, usually because some other tool or context window pulls it in sideways.

So I built a Notion access control plugin for the proxy. Permissions are declared inside the Notion pages themselves — the first line of each page lists which bots can access it, using a pair of emoji markers: 🖊 for read-write and 👀 for read-only. A page with no marker for the agent is effectively invisible; any attempt to access it returns an access-denied error. Users control access by editing pages directly in Notion, so non-technical staff can use it effortlessly.
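The marker check itself is simple. Here is a sketch of the core logic with an invented first-line format (marker, then bot name, space-separated); the real plugin operates on Notion API responses, so treat the details as assumptions.

```python
READ_WRITE = "🖊"
READ_ONLY = "👀"

def permission_for(first_line, bot_name):
    """Return 'rw', 'ro', or None from a page's first-line markers.

    Assumed format: each marker is followed by the bot names it covers,
    e.g. "🖊 benchclaw 👀 reporterbot". No marker means no access.
    """
    current = None
    for token in first_line.split():
        if token == READ_WRITE:
            current = "rw"
        elif token == READ_ONLY:
            current = "ro"
        elif token.lower() == bot_name.lower():
            return current
    return None

def authorize(first_line, bot_name, wants_write):
    """Raise unless the bot may perform the requested operation."""
    perm = permission_for(first_line, bot_name)
    if perm is None:
        raise PermissionError("access denied")  # page is invisible to this bot
    if wants_write and perm != "rw":
        raise PermissionError("page is read-only for this bot")
```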

The permission markers themselves are tamper-resistant from within the agent's toolchain. Operations that would alter them are rewritten to preserve the marker, or rejected outright. New child pages inherit the parent's markers automatically, so access permissions propagate as the workspace grows.

This kind of structural guarantee matters in a regulated environment. You can tell an agent “don’t touch the regulatory submissions folder” in a system prompt. But an instruction is not a control. Proxy-level enforcement doesn’t care what the agent was told or what it inferred from context — the fence is there regardless, and it’s the only kind of fence worth relying on.

Other use-cases

Writes to the internal Kanban board go through, but deletions get rewritten as soft-deletes with a tag. A human can see what the agent wanted to remove and decide whether that was right. The external Kanban board is walled off entirely.
  • Calendar invites can be created, but only with internal email addresses. Anything external requires a human to send.
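Both of these fit the same request-rewriting shape as the email guard. A sketch, with entirely invented tool names, argument schemas, and company domain:

```python
INTERNAL_DOMAIN = "example.com"  # assumption: the company's email domain

def rewrite_request(tool_name, args):
    """Rewrite destructive Kanban calls into reviewable soft-deletes.

    tool_name/args mirror an MCP tool call; the names here are invented.
    """
    if tool_name == "delete_card":
        # Turn the hard delete into a tag plus archive. The agent can't
        # tell the difference; a human reviews tagged cards later.
        return "update_card", {
            "card_id": args["card_id"],
            "add_tags": ["agent-deleted"],
            "archived": True,
        }
    return tool_name, args

def check_invite(attendees):
    """Allow calendar invites only when every attendee is internal."""
    for email in attendees:
        if not email.lower().endswith("@" + INTERNAL_DOMAIN):
            raise PermissionError(f"external attendee needs a human: {email}")
```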

I’m also planning to use this for project management tools and a few others I haven’t fully thought through yet.

The pattern is the same each time: figure out what the blast radius looks like if the agent makes a bad call, and then make the bad outcomes structurally unavailable rather than just instructionally discouraged. Nobody expects pilots to be infallible. We just haven’t built the equivalent infrastructure for agents yet. That’s what this is.