May 3, 2026

Finding Where Your Tokens Actually Go

A static analyzer for MCP and tool calls, built to cut Claude costs.

I built an MCP proxy for security — a layer in front of a server that can filter or rewrite each call before it goes through. But watching what actually flowed across it, I kept noticing something it wasn’t built for: waste. Tools returning tens of thousands of tokens where a few hundred would do; blank fields and base64 blobs nobody asked for. So I built jqi to give an LLM the leverage to inspect those oversized MCP responses directly — to read the shape of a giant JSON blob without paying to read all of it, and judge how the tools and their return shapes could be simplified.

Both of those steps were manual: I had to notice a bloated tool, then point the model at it. claudenlos closes the loop agentically. It’s an automated discovery system that reads the JSONL conversation logs Claude already keeps under ~/.claude/projects and tells you, tool by tool and MCP by MCP, where the tokens are actually going — no noticing required. No instrumentation, no gateway, no change to how you work; the data is already on disk, and this just reads it. (The name puns on kostenlos, German for “free of charge”; the goal is to get a little closer to it.)

What it has already turned up

I pointed it at my own logs, and three installed servers paid for themselves on the first pass:

  • Hive, a project-management MCP, returns a 22kB JSON object per task — about 8.5k tokens — where 3k would carry everything the model actually uses. The other two-thirds is blank fields and default values repeated on every call: every null you already suspected, every false, every empty array. Omitting them is the entire fix. (This is the server I took apart with jqi.)
  • Notion’s newer MCP attaches a signed S3 URL to every image it returns — 500-plus characters of opaque, expiring query string apiece. The model can’t dereference them and won’t try; it just pays to carry them on every page it reads. An image-heavy page is thousands of characters of links to nowhere.
  • Zoho Mail’s MCP hands back email attachments as base64 blobs inline, so the agent copies the whole file through the context window — tens of thousands of tokens for something a file path would have conveyed in twenty. That’s not a tuning opportunity; it’s the wrong representation. (Remember: this is a live endpoint shipped by someone who’s ostensibly paid to write it.)

None of these three took any prior suspicion on my part. claudenlos read the logs and ranked them at the top, each with a file:line pointer to a call I could go read. The discovery is mechanical enough that I could rerun it across any set of installed servers and expect it to surface the next one.
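
To make the Hive fix concrete, here is a minimal sketch of the kind of filter a proxy could apply to a response before the model sees it. The field names are made up for illustration and are not Hive's actual schema, and dropping false is only safe if whatever reads the result treats a missing flag as false.

```python
import json

def _is_blank(v):
    # Blank means: null, false, empty string, empty list, empty object.
    # Zero is deliberately NOT blank; it is usually a real value.
    return v is None or v is False or v == "" or v == [] or v == {}

def prune_blank(value):
    """Recursively drop blank fields from dicts; list items keep their positions."""
    if isinstance(value, dict):
        pruned = {k: prune_blank(v) for k, v in value.items()}
        return {k: v for k, v in pruned.items() if not _is_blank(v)}
    if isinstance(value, list):
        return [prune_blank(v) for v in value]
    return value

# Hypothetical task payload, not taken from a real server.
task = {
    "id": 42,
    "title": "Ship it",
    "assignee": None,
    "archived": False,
    "labels": [],
    "custom": {"color": None},
    "estimate": 0,
}
print(json.dumps(prune_blank(task)))
# → {"id": 42, "title": "Ship it", "estimate": 0}
```

Nested objects that end up empty after pruning are dropped too, which is why "custom" disappears in the example.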

What it measures

It computes three things from your logs:

  • Call statistics. For every tool and MCP, the distribution of response sizes — p25/p50/p75/p95/max — turned into an estimated token cost, next to how often each one is called. A tool that returns 300 tokens on the median call but 40k at p95 is a different problem than one that’s uniformly large, and the percentiles make that visible.
  • Sequence analysis. It looks at the runs of tool calls between your messages — the stretches where the agent is working on its own — and builds a transition matrix: which tool tends to follow which, and how often a call is a retry after a failure.
  • Automated findings. On top of the raw stats, it surfaces the high-leverage cases: high-volume tools, high-variance results, near-deterministic call chains (one tool follows another ≥80% of the time), high retry rates, and outright broken tools. Each finding carries an example file:line pointer, so you can go read the call that triggered it.
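
The call statistics can be sketched in a few lines. This is not claudenlos's actual code: the function names, the nearest-rank percentile method, and the input shape (pairs of tool name and response size in characters) are assumptions made for the example. The 3.5 characters per token matches the value passed on the command line further down.

```python
from collections import defaultdict

CHARS_PER_TOKEN = 3.5  # rough heuristic, not a real tokenizer

def percentile(sorted_sizes, p):
    """Nearest-rank percentile of an ascending, non-empty list (p in 1..100)."""
    rank = max(1, (p * len(sorted_sizes) + 99) // 100)  # integer ceiling
    return sorted_sizes[rank - 1]

def summarize(calls):
    """calls: iterable of (tool_name, response_chars) pairs."""
    by_tool = defaultdict(list)
    for tool, chars in calls:
        by_tool[tool].append(chars)

    report = {}
    for tool, sizes in by_tool.items():
        sizes.sort()
        row = {"calls": len(sizes)}
        for p in (25, 50, 75, 95):
            row[f"p{p}_tokens"] = round(percentile(sizes, p) / CHARS_PER_TOKEN)
        row["max_tokens"] = round(sizes[-1] / CHARS_PER_TOKEN)
        report[tool] = row
    return report

# Hypothetical tool: 18 small responses and 2 huge ones.
calls = [("mcp__mail__get_message", 1050)] * 18 + [("mcp__mail__get_message", 140000)] * 2
stats = summarize(calls)["mcp__mail__get_message"]
print(stats["p50_tokens"], stats["p95_tokens"])  # → 300 40000
```

A mean would blur those two populations into one unremarkable number; the gap between p50 and p95 is what flags the tool.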

What you do with the findings

The categories map onto fixes:

  • A high-volume, high-cost tool is a candidate for simplification — Hive, above. claudenlos is the thing that tells you that tool is worth an afternoon.
  • An MCP that returns the wrong representation, like Zoho’s base64 attachments, can burn tens of thousands of tokens a call. The percentile spread is the tell; the fix is to hand back a file path instead of the bytes.
  • A bulky-but-opaque field — Notion’s signed image URLs — is a rewrite target, not a deletion. The proxy can swap each 500-character URL for a short virtual one and expose a synthetic tool that resolves a virtual URL to a file on disk when the model actually needs the image. The context carries a stub; the bytes land on disk only on demand.
  • A near-deterministic chain is an automation opportunity in disguise. If tool A is followed by tool B 95% of the time, B probably shouldn’t be a separate round-trip through the model at all; fold it into A, or into a skill.
  • A high retry rate usually means a tool is broken, or its interface is confusing the model. It’s cheaper to fix the tool than to keep paying for the retries.
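
The near-deterministic chain check is simple enough to sketch. The version below is an illustration under assumptions, not the tool's implementation: it takes each uninterrupted run of tool calls as a list of names, counts which tool follows which, and reports pairs at or above the 80% threshold. The min_count guard and the tool names are mine.

```python
from collections import Counter, defaultdict

def transition_counts(runs):
    """runs: list of tool-name lists, one per uninterrupted agent stretch."""
    counts = defaultdict(Counter)
    for run in runs:
        # Pair each call with the call that immediately follows it.
        for current, following in zip(run, run[1:]):
            counts[current][following] += 1
    return counts

def deterministic_chains(runs, threshold=0.8, min_count=5):
    """Return (a, b, share) where b is a's successor in >= threshold of cases.

    The share is computed over occurrences of `a` that have a successor;
    a call that ends a run is not counted in the denominator.
    """
    chains = []
    for a, followers in transition_counts(runs).items():
        total = sum(followers.values())
        b, hits = followers.most_common(1)[0]
        if total >= min_count and hits / total >= threshold:
            chains.append((a, b, hits / total))
    return chains

# Hypothetical runs: "search" is almost always followed by "fetch".
runs = [["search", "fetch", "summarize"]] * 9 + [["search", "summarize"]]
for a, b, share in deterministic_chains(runs):
    print(f"{a} -> {b}: {share:.0%}")
```

The min_count guard keeps a pair seen once or twice from showing up as a 100% chain.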

It runs in one line over whatever logs it can find:

uv run claudenlos --alias gh,github --chars-per-token 3.5 -v

It auto-discovers Claude profiles across ~/.claude*/projects, deduplicates aliases that point at the same MCP, and writes tab-separated output you can paste straight into a spreadsheet and sort.
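
For a sense of what reading the logs involves, here is a rough sketch of discovery plus size extraction. The log schema is an assumption on my part: the code expects each JSONL line to hold a "message" whose "content" is a list of blocks, with "tool_use" blocks carrying "id" and "name" and "tool_result" blocks carrying "tool_use_id" and "content". Lines that don't fit that shape are skipped, and the real tool may parse things differently.

```python
import glob
import json
import os

def find_logs(home):
    """All *.jsonl files under any .claude*/projects directory in `home`."""
    pattern = os.path.join(home, ".claude*", "projects", "**", "*.jsonl")
    return sorted(glob.glob(pattern, recursive=True))

def tool_result_sizes(path):
    """Yield (tool_name, result_chars, 'file:line') for each tool result."""
    names = {}  # tool_use id -> tool name, filled as calls are seen
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # blank or truncated line
            message = entry.get("message") if isinstance(entry, dict) else None
            content = message.get("content") if isinstance(message, dict) else None
            if not isinstance(content, list):
                continue
            for block in content:
                if not isinstance(block, dict):
                    continue
                if block.get("type") == "tool_use":
                    names[block.get("id")] = block.get("name")
                elif block.get("type") == "tool_result":
                    name = names.get(block.get("tool_use_id"), "?")
                    # Size of the serialized result, in characters.
                    chars = len(json.dumps(block.get("content")))
                    yield name, chars, f"{path}:{lineno}"

if __name__ == "__main__":
    for log in find_logs(os.path.expanduser("~")):
        for name, chars, where in tool_result_sizes(log):
            print(f"{name}\t{chars}\t{where}")
```

The file:line string is what makes a finding checkable: it points at the exact log line holding the response that was counted.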

One caveat: claudenlos is a simple tool, a few hundred lines that read logs and count characters. It’s a starting point, not an oracle, and the plain TSV is deliberate: feed it into an LLM and let that do the investigating. The model can rank the findings, follow each file:line to the call that triggered it, and propose the fix. The tool identifies; you have to diagnose.

Why bother

Token cost is easy to wave away one call at a time — what’s another few thousand tokens? But tool and MCP overhead compounds: the same wasteful response, on every call, across every conversation, for as long as that server stays installed. The leverage is large and nearly invisible, because almost nobody reads their own logs.

claudenlos doesn’t fix anything itself. It points, with numbers, at the handful of tools where an hour of work pays for itself many times over — and then jqi, or a proxy filter, or a deleted round-trip does the rest. Measure first.