
MCP server

Bleep ships a Model Context Protocol server. Point Claude Code (or any MCP-aware client) at it and an agent can compile, test, run, and inspect your build through 18 structured tool calls — without parsing CLI output, without keeping a long-running interactive shell open, without reading pages of context for a one-line answer.

The design is built for the world where multiple agents run against the same build at the same time:

  • Compact by default. Every tool returns a small JSON summary (error counts, failing suites, the diff against the previous run). Full output is one extra call away: bleep.status for the cached last run, or verbose=true on the original tool.
  • Errors stream. Per-project compile errors land as MCP notifications the instant that project finishes, not at the end of the whole build. The latency floor for a real failure is milliseconds.
  • In-process. The MCP server runs against bleep’s BSP server inside the same JVM. Calls are sub-second after startup and drop below 100 ms once the JVM is warm. Four agents running in parallel do not stall one another, because no call has to fork a process or wait on an external daemon.

Setup

Run from your build root:

bleep setup-mcp-server

That writes .mcp.json — the config file that Claude Code, Cursor, and any other MCP client reads to discover servers. Restart the client (or trigger a re-scan) and the bleep tools appear.
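To confirm what the command wrote, a small script can list the servers the file declares. This is a sketch that assumes the top-level `mcpServers` key used by Claude Code's project-scoped .mcp.json; the exact entry bleep writes (server name, command, arguments) is not documented here, so the helper names below are illustrative only.

```python
import json
from pathlib import Path


def list_mcp_servers(config: dict) -> dict:
    """Map each declared server name to its launch command line.

    Assumes the common layout: mcpServers -> name -> {"command": ..., "args": [...]}.
    """
    servers = config.get("mcpServers", {})
    return {
        name: " ".join([entry.get("command", ""), *entry.get("args", [])]).strip()
        for name, entry in servers.items()
    }


def load_mcp_config(build_root: str = ".") -> dict:
    """Read .mcp.json from the build root; return {} if it does not exist yet."""
    path = Path(build_root) / ".mcp.json"
    return json.loads(path.read_text()) if path.exists() else {}


if __name__ == "__main__":
    for name, command in list_mcp_servers(load_mcp_config()).items():
        print(f"{name}: {command}")
```

An empty result after running bleep setup-mcp-server would suggest the script was run from somewhere other than the build root.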

The flag --force-jvm runs the MCP server through the JVM rather than the native binary — useful when iterating on bleep itself.

The tool surface

| Tool | Effect | What it does |
| --- | --- | --- |
| bleep.compile | read-only | Compile selected projects. Returns error counts and a diff vs the previous run. |
| bleep.test | read-only | Run tests. Returns pass/fail counts, failure summaries, and a diff. |
| bleep.test.suites | read-only | List test suite class names without running them. Requires projects to be compiled. |
| bleep.sourcegen | additive | Run sourcegen scripts for selected projects. |
| bleep.fmt | additive | Format Scala and Java sources via scalafmt and google-java-format. |
| bleep.clean | destructive | Delete compile output for selected projects. |
| bleep.watch | additive | Start a background watch job. Returns a jobId; results stream as notifications. |
| bleep.sync | read-only | Pull the latest results from active watch jobs (or do a fresh compile if none are running). |
| bleep.watch.stop | destructive | Stop a watch job, or all of them if no jobId is given. |
| bleep.build.effective | read-only | The project config after templates apply (what bleep sees). |
| bleep.build.resolved | read-only | Fully resolved classpath, source dirs, compiler JARs. Requires prior compile. |
| bleep.projects | read-only | List projects with their dependencies and test-project flag. |
| bleep.programs | read-only | List projects with a mainClass (runnable programs). |
| bleep.scripts | read-only | List the named scripts under scripts: in bleep.yaml. |
| bleep.run | additive | Compile and run a project or script. Returns stdout/stderr and exit code. |
| bleep.status | read-only | The cached results from the last build/test, with full diagnostics. Paginated. |
| bleep.restart | destructive | Exit the MCP server process. The client will relaunch it. |

Effect mirrors the MCP spec's tool semantics — clients use it to decide whether a tool can run unattended.
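As an illustration of how a client might act on the Effect column, the sketch below encodes a few rows from the tool table and gates unattended execution on them. The policy itself (read-only always allowed, additive opt-in, destructive never) is an assumption made for the example, not something bleep or the MCP spec prescribes, and `can_run_unattended` is a hypothetical helper.

```python
# Effect labels copied from the tool table (subset).
TOOL_EFFECTS = {
    "bleep.compile": "read-only",
    "bleep.test": "read-only",
    "bleep.status": "read-only",
    "bleep.fmt": "additive",
    "bleep.run": "additive",
    "bleep.clean": "destructive",
    "bleep.restart": "destructive",
}


def can_run_unattended(tool: str, allow_additive: bool = False) -> bool:
    """Hypothetical client policy: decide whether to skip the confirmation prompt."""
    effect = TOOL_EFFECTS.get(tool)
    if effect == "read-only":
        return True
    if effect == "additive":
        return allow_additive
    # Destructive or unknown tools always require confirmation.
    return False
```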

Output shape

Every tool returns a JSON document. By default that document is a summary, not a transcript:

{
  "compiled": 12,
  "errors": 0,
  "warnings": 3,
  "diff": {
    "newErrors": [],
    "newWarnings": [{"project": "myapp", "file": "Main.scala", "line": 42, "message": "..."}]
  }
}

The diff is computed against the previous run that touched the same projects (a two-slot ring buffer is kept in memory). An agent that runs bleep.compile twice in a row sees an empty diff on the second call — the response is small and obviously a no-op.
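One way to picture the diff is to keep the last two error sets per project selection and report only what is new in the latest one. The sketch below is an illustrative model, not bleep's implementation: the `RunHistory` class, the choice of key (the exact set of project names), and the tuple layout of an error are all assumptions.

```python
from collections import deque


class RunHistory:
    """Keeps the two most recent runs per project selection (illustrative model)."""

    def __init__(self):
        self._runs = {}  # frozenset of project names -> deque of error lists

    def record(self, projects, errors):
        """Store a run's errors and return the diff against the previous run."""
        key = frozenset(projects)
        slots = self._runs.setdefault(key, deque(maxlen=2))
        slots.append(list(errors))
        # With only one run stored there is nothing to compare against.
        previous = set(slots[0]) if len(slots) == 2 else set()
        # Keep the order of the current run while filtering out known errors.
        return {"newErrors": [e for e in slots[-1] if e not in previous]}
```

Recording the same errors twice for the same projects yields an empty `newErrors` list on the second call, which is the no-op case described above.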

Two ways to get full output when you actually need it:

  • bleep.status returns the full diagnostics from the last run, with project, limit, offset parameters for pagination. This is the right call when an agent has already seen the summary and decided to drill in.
  • verbose=true on bleep.compile / bleep.test returns the full output inline. Use sparingly — it's much larger.
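For the drill-in case, paging through bleep.status might look like the sketch below. The `project`, `limit`, and `offset` parameters come from the description above; `call_tool` stands in for whatever function your MCP client exposes for invoking a tool, and the `diagnostics` field name in the response is an assumption.

```python
from typing import Callable, Iterator


def iter_diagnostics(
    call_tool: Callable[[str, dict], dict],
    project: str,
    page_size: int = 50,
) -> Iterator[dict]:
    """Yield diagnostics page by page until a short page signals the end."""
    offset = 0
    while True:
        page = call_tool(
            "bleep.status",
            {"project": project, "limit": page_size, "offset": offset},
        )
        items = page.get("diagnostics", [])  # assumed field name
        yield from items
        if len(items) < page_size:
            return
        offset += page_size
```

Because the function is a generator, an agent can stop consuming as soon as it has seen enough, and no further pages are requested.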

Watch and sync

Long-running compile/test loops use a different shape. bleep.watch starts a background fiber and returns a jobId. Results stream to the client as MCP notifications. To pull the latest snapshot synchronously, the agent calls bleep.sync, which reads from every active watch job and returns the same compact summary shape.

agent: bleep.watch { mode: "test", projects: ["myapp"] }
→ { jobId: "w1" }
... time passes; notifications stream as projects compile and tests run ...
agent: bleep.sync
→ { watchResults: [{ jobId: "w1", mode: "test", result: {...} }] }
agent: bleep.watch.stop { jobId: "w1" }

Without an active watch job, bleep.sync falls back to a fresh compile of every project — useful as a "where am I?" probe.
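On the client side, the watch/sync/stop sequence can be wrapped so that the job is always stopped, even if the agent's work in between fails. This is a sketch only: `call_tool` is a hypothetical stand-in for your MCP client's tool-invocation function, while the `jobId`, `watchResults`, and `result` field names are taken from the exchange shown above.

```python
from contextlib import contextmanager
from typing import Callable


@contextmanager
def watch_job(call_tool: Callable[[str, dict], dict], mode: str, projects: list):
    """Start a watch job and guarantee bleep.watch.stop runs afterwards."""
    job_id = call_tool("bleep.watch", {"mode": mode, "projects": projects})["jobId"]
    try:
        yield job_id
    finally:
        call_tool("bleep.watch.stop", {"jobId": job_id})


def latest_result(call_tool: Callable[[str, dict], dict], job_id: str):
    """Pull the current snapshot for one job from bleep.sync, or None if absent."""
    response = call_tool("bleep.sync", {})
    for entry in response.get("watchResults", []):
        if entry.get("jobId") == job_id:
            return entry.get("result")
    return None
```

Passing the jobId explicitly to bleep.watch.stop matters when several agents share the server, since omitting it stops every watch job.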

Why these design choices

A few words on the why, since each choice paid for itself within the first session of using bleep with multiple parallel agents.

Compactness over completeness

A full compile transcript can be tens of thousands of tokens. An agent making decisions doesn't need the transcript; it needs to know whether anything went wrong and what changed since last time. Returning a summary first, with bleep.status available for drill-in, turns a 30,000-token tool response into a 200-token one plus an optional follow-up.

Diff against the previous run

Coding agents call build tools dozens of times per session, and most of those calls are no-ops. Highlighting what changed between calls lets the agent reason about progress without re-reading every error.

Errors stream, results summarise

When something is wrong, latency matters — the agent should know about a compile error the instant the project finishes, not 30 seconds later when the rest of the build wraps. So per-project errors stream as notifications during the call, while the summary that wraps up the call is the diff and the counts.
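The split between streamed errors and a final summary can be modelled with concurrent tasks that report as each one completes. The sketch below is a toy model under stated assumptions: `compile_project` sleeps instead of compiling, and `notify` is a hypothetical callback standing in for an MCP notification.

```python
import asyncio


async def compile_project(name: str, seconds: float, errors: list) -> tuple:
    """Stand-in for compiling one project; sleeps instead of doing real work."""
    await asyncio.sleep(seconds)
    return name, errors


async def build(projects: dict, notify) -> dict:
    """Compile all projects concurrently, notifying per project as each finishes."""
    tasks = [
        asyncio.create_task(compile_project(name, secs, errs))
        for name, (secs, errs) in projects.items()
    ]
    total = 0
    for finished in asyncio.as_completed(tasks):
        name, errors = await finished
        if errors:
            notify(name, errors)  # sent right away, before slower projects end
        total += len(errors)
    # The summary is only available once every project has finished.
    return {"compiled": len(tasks), "errors": total}
```

With a fast failing project and a slow clean one, `notify` fires for the fast project while the slow one is still running; the summary arrives only at the end.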

In-process BSP

The MCP server runs against bleep's existing BSP server inside the same JVM. Tool calls don't fork processes, don't reload bleep, and don't pay the daemon-handshake tax that process-per-call build tools incur. This is why "four agents in parallel" doesn't degrade.
