about · 01

Why Distill?

Every token counts. Distill compresses LLM context upstream, before it ever reaches the model's context window, to cut costs, speed up responses, and sharpen output quality.

problem · 02

Noise kills signal.

When you work with an AI coding assistant, you constantly send large context blocks: build outputs, logs, code files, stacktraces. Most of it is redundant or useless.

A typical build error output runs to thousands of tokens, of which only 5–10 lines are actually useful. You're paying for dead weight and drowning the LLM in context that keeps it from focusing.

Distill fixes this by compressing your context intelligently before it reaches the model. You keep only the signal.
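To make the idea concrete, here is a minimal sketch of signal-only filtering on a build output. It is not Distill's actual algorithm: the `SIGNAL_PATTERN` regex, the `keepSignal` function, and the omission marker format are all hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch, NOT Distill's implementation: keep lines that look
// like errors plus a little surrounding context, and collapse the rest.
const SIGNAL_PATTERN = /\b(error|failed|exception|cannot find)\b/i;

function keepSignal(output: string, contextLines = 1): string {
  const lines = output.split("\n");
  const isSignal = lines.map((line) => SIGNAL_PATTERN.test(line));
  const kept: string[] = [];
  let omitted = 0;

  // Emit one marker for each run of dropped lines so the reader knows
  // something was removed at that position.
  const flush = (): void => {
    if (omitted > 0) {
      kept.push(`[... ${omitted} lines omitted ...]`);
      omitted = 0;
    }
  };

  lines.forEach((line, i) => {
    const from = Math.max(0, i - contextLines);
    const to = Math.min(lines.length, i + contextLines + 1);
    // A line survives if it, or a neighbour within contextLines, is signal.
    if (isSignal.slice(from, to).some(Boolean)) {
      flush();
      kept.push(line);
    } else {
      omitted++;
    }
  });
  flush();
  return kept.join("\n");
}
```

Run on a seven-line log with one compiler error in the middle, this keeps the error and its two neighbours and replaces each run of other lines with a single marker.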

product · 03

One MCP server. Three tools.

Distill is an open-source MCP (Model Context Protocol) server that exposes three always-loaded tools inside Claude Code.

  • auto_optimize

    Detects content type (build, logs, diffs, code, stacktraces) and applies content-aware compression.

  • smart_file_read

    Reads AST structure instead of raw file content. 7 languages, 5 modes (auto, full, skeleton, extract, search).

  • code_execute

    Runs TypeScript in a QuickJS sandbox to batch 5–10 operations in a single MCP call.

No API keys. No cloud services. No auth. Install and start using it immediately.
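The content-type detection that auto_optimize performs can be pictured as a small classifier that runs before any compression. The sketch below is an assumption about how such a step might look, not Distill's source: the `ContentType` union reuses the five categories named above, while the `RULES` table, its regular expressions, and `detectContentType` are hypothetical.

```typescript
// Hypothetical sketch, NOT Distill's implementation: classify a context
// block with ordered regex heuristics before choosing a compression strategy.
type ContentType = "build" | "logs" | "diffs" | "code" | "stacktraces" | "unknown";

// Rules are checked in order, most specific first. All patterns are assumptions.
const RULES: Array<[ContentType, RegExp]> = [
  ["diffs", /^(diff --git |@@ -\d+(,\d+)? \+\d+(,\d+)? @@)/m],
  ["stacktraces", /^\s+at .+\(.+:\d+:\d+\)$|^Traceback \(most recent call last\):/m],
  ["build", /\berror TS\d+\b|\bnpm ERR!|\bBUILD (FAILED|SUCCESSFUL)\b/],
  ["logs", /^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}/m],
  ["code", /^\s*(import|export|function|class|const|def)\b/m],
];

function detectContentType(text: string): ContentType {
  for (const [kind, pattern] of RULES) {
    if (pattern.test(text)) return kind;
  }
  return "unknown";
}
```

Ordering matters in a design like this: a diff usually contains code, so the diff rule has to run before the code rule or every patch would be classified as plain source.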

results · 04

What you get.

40–98%

Lower costs

Up to 98% fewer tokens sent to the LLM, with no signal loss on the content that actually matters.

lower latency

Faster responses

Less context = fewer tokens to process = shorter time-to-first-token.

higher signal

Sharper results

Less noise, more signal. The LLM focuses on what matters, and output quality goes up.

MCP stdio

Native Claude Code integration

[DISTILL:COMPRESSED] marker, PreCompact hook, distill-compressor subagent, slash commands. Zero config on the API side.

get started

Ready to optimize?

One command to set up Claude Code with Distill.

$ bunx distill-mcp setup