Lower costs
Up to 98% fewer tokens sent to the LLM, with no signal loss on the content that actually matters.
Every token counts. Distill compresses LLM context upstream, before it ever enters the model's context, to cut costs, speed up responses, and sharpen output quality.
When you work with an AI coding assistant, you constantly send large context blocks: build outputs, logs, code files, stacktraces. Most of it is redundant or useless.
A typical failed build produces thousands of tokens of noise around 5–10 genuinely useful lines. You pay for that dead weight, and you bury the LLM in context that keeps it from focusing on the actual problem.
Distill fixes this by compressing your context intelligently before it reaches the model. You keep only the signal.
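To make "keep only the signal" concrete, here is a minimal sketch that filters a build log down to its error and warning lines and estimates the savings. It is not Distill's algorithm: the `compressBuildOutput` and `estimateTokens` helpers, the keyword regex, and the rough 4-characters-per-token estimate are all assumptions for illustration.

```typescript
// Hypothetical sketch, not Distill's implementation: keep only build-log lines
// that look like errors or warnings, plus a small window of context around them.
const SIGNAL = /\b(error|warning|failed)\b/i;

function compressBuildOutput(log: string, context: number = 1): string[] {
  const lines = log.split("\n");
  const keep = lines.map(() => false);
  lines.forEach((line, i) => {
    if (!SIGNAL.test(line)) return;
    // Mark the matching line and `context` neighbours on each side.
    const from = Math.max(0, i - context);
    const to = Math.min(lines.length - 1, i + context);
    for (let j = from; j <= to; j++) keep[j] = true;
  });
  return lines.filter((_, i) => keep[i]);
}

// Rough estimate of ~4 characters per token; real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Usage: 200 lines of progress noise followed by the one line that matters.
const noise: string[] = [];
for (let i = 0; i < 200; i++) noise.push(`[info] compiled module_${i}.ts`);
const buildLog = noise
  .concat(["src/app.ts(12,5): error TS2322: Type 'string' is not assignable to type 'number'."])
  .join("\n");

const kept = compressBuildOutput(buildLog, 0);
const before = estimateTokens(buildLog);
const after = estimateTokens(kept.join("\n"));
const reduction = Math.round((1 - after / before) * 100);
console.log(kept, `${before} -> ${after} tokens (${reduction}% fewer)`);
```

Real content-aware compression needs more than one regex, but the shape is the same: decide which lines carry signal, drop the rest, and measure what was saved.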
Distill is an open-source MCP (Model Context Protocol) server that exposes three always-loaded tools inside Claude Code.
The first tool detects the content type (build output, logs, diffs, code, stack traces) and applies content-aware compression to it.
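A first step in content-aware compression is deciding what kind of text you are looking at. The sketch below is a hypothetical heuristic classifier: `detectContentType` and its patterns are illustrative guesses, not taken from Distill. It checks the most specific signatures first and falls back to treating the input as code.

```typescript
// Hypothetical heuristic classifier; the categories mirror the content types
// Distill lists, but every pattern here is an illustrative assumption.
type ContentType = "diff" | "stacktrace" | "build" | "logs" | "code";

function detectContentType(text: string): ContentType {
  const lines = text.split("\n");
  // Fraction of lines matching a per-line pattern.
  const share = (re: RegExp): number =>
    lines.filter((l) => re.test(l)).length / Math.max(1, lines.length);

  // Most specific signatures first.
  if (/^diff --git |^@@ .* @@/m.test(text)) return "diff";
  // Node-style "    at fn (file:line:col)" or Python-style '  File "x.py", line N'.
  if (share(/^\s+at \S+.*\(.*:\d+(:\d+)?\)$|^\s+File ".*", line \d+/) > 0.2) return "stacktrace";
  // Compiler error codes, build-tool banners, or tool-prefixed lines.
  if (/\berror (TS|CS)?\d+|\bBUILD (FAILED|SUCCESS)|^(npm|cargo|make)\b/m.test(text)) return "build";
  // Mostly timestamped or level-prefixed lines.
  if (share(/^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}|^\[?(INFO|WARN|ERROR|DEBUG)\b/) > 0.5) return "logs";
  return "code";
}

console.log(detectContentType("Error: boom\n    at foo (/app/x.js:10:5)"));
```

Once the type is known, each category can get its own strategy, for example keeping only failing hunks of a diff or only the top frames of a stack trace.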
The second reads a file's AST instead of its raw text. It supports 7 languages and 5 modes: auto, full, skeleton, extract, and search.
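A skeleton view shows declarations and signatures while eliding bodies. Distill builds this from the AST; the sketch below deliberately swaps in a much cruder technique, tracking brace depth line by line, only to show what a skeleton of a file looks like. The `skeleton` function is hypothetical, and a real implementation would use a proper parser (for example the TypeScript compiler API or tree-sitter).

```typescript
// Crude approximation of a "skeleton" view: keep top-level lines, elide block
// bodies. This tracks brace depth only; it is NOT an AST parser and is fooled
// by braces inside strings, comments, or template literals.
function skeleton(source: string): string {
  const out: string[] = [];
  let depth = 0;
  for (const line of source.split("\n")) {
    const startDepth = depth;
    for (const ch of line) {
      if (ch === "{") depth++;
      else if (ch === "}") depth--;
    }
    if (startDepth === 0) {
      // A top-level line that opens a block keeps its signature; the body is elided.
      out.push(depth > 0 ? line.replace(/\{[^{}]*$/, "{ ... }") : line);
    }
  }
  return out.join("\n");
}

console.log(skeleton("function add(a: number, b: number) {\n  return a + b;\n}"));
// prints "function add(a: number, b: number) { ... }"
```

The trade-off is the point of the mode: the model sees every top-level signature for a fraction of the tokens, and can ask for a full or extracted read when it needs a specific body.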
The third runs TypeScript in a QuickJS sandbox, so 5–10 operations can be batched into a single MCP call.
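The payoff of batching is fewer round trips: every separate tool call is a turn in which the model emits a request and reads a result back into its context. This hypothetical model counts round trips for five file reads done one by one versus in a single batched call. `Op`, `Server`, `call`, and `callBatch` are illustrative names, not Distill's or MCP's API, and the sandboxed script execution is reduced to a plain loop.

```typescript
// Hypothetical model of tool-call round trips; not Distill's or MCP's API.
type Op = { name: string; arg: string };

class Server {
  roundTrips = 0;

  // Stand-in for real work such as reading or compressing a file.
  private run(op: Op): string {
    return `${op.name}(${op.arg})`;
  }

  // One tool call per operation: one round trip each.
  call(op: Op): string {
    this.roundTrips++;
    return this.run(op);
  }

  // One tool call carrying a whole batch of operations: one round trip total.
  callBatch(ops: Op[]): string[] {
    this.roundTrips++;
    return ops.map((op) => this.run(op));
  }
}

const ops: Op[] = ["a.ts", "b.ts", "c.ts", "d.ts", "e.ts"].map((f) => ({ name: "read", arg: f }));

const separate = new Server();
const r1 = ops.map((op) => separate.call(op));

const batched = new Server();
const r2 = batched.callBatch(ops);

console.log(separate.roundTrips, batched.roundTrips); // prints "5 1"
```

Both paths return the same results; only the number of model turns changes.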
No API keys. No cloud services. No auth. Install and start using it immediately.
Less context means fewer tokens to process, and fewer tokens mean a shorter time-to-first-token.
Less noise, more signal: the LLM focuses on what matters, so output quality goes up.
Claude Code integration includes a [DISTILL:COMPRESSED] marker, a PreCompact hook, a distill-compressor subagent, and slash commands. No configuration is needed on the API side.
One command to set up Claude Code with Distill.