Mesh Services: MCP Servers & Skills Platform
Consolidated spec for deploying, managing, and executing MCP servers and multi-file skills within a claudemesh mesh. Covers source modes, execution engine, credential vaults, access control, native Claude Code integration, and dynamic tool discovery.
Problem
Today:
- Skills are a single `instructions` text field in Postgres. No multi-file support.
- MCP servers are live-proxied through the registering peer. When that peer disconnects, the server dies. The `persistent` flag is cosmetic.
- Neither supports bundled artifacts (templates, configs, schemas, example code).
- Claude Code has no way to discover mesh tools natively; peers must use the generic `mesh_tool_call` proxy.
Design goals
- Three source modes — inline, zip bundle, git repo — for both skills and MCP servers
- MCP servers run on the VPS, not on peers — true 24/7 persistence
- Sandboxed execution with resource limits
- Native Claude Code tool integration — deployed MCPs appear as regular MCP server entries
- Per-peer credential vault for secrets (OAuth tokens, API keys)
- Visibility scopes on services — peer, group, role, or mesh-wide — deployer controls who can call, not who sees secrets
- Dynamic mid-session discovery via `notifications/tools/list_changed`
- All existing behavior preserved: inline skills and live-proxy MCPs unchanged
Architecture overview
┌──────────────────────────────────────────────────────────────────┐
│ claudemesh launch --name Mou --mesh dev │
│ │
│ 1. Connect to broker, authenticate │
│ 2. Fetch service catalog (scope-filtered for this peer) │
│ 3. Write native MCP entries to ~/.claude.json: │
│ mesh:gmail, mesh:context7, mesh:whatsapp │
│ 4. Spawn claude │
│ 5. On exit: remove mesh:* entries │
└──────────┬───────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Claude Code session │
│ │
│ MCP: claudemesh (stdio) │
│ ├── send_message, list_peers, set_summary, ... (peer comms) │
│ ├── mesh_mcp_deploy, mesh_mcp_scope, ... (service mgmt) │
│ ├── vault_set, vault_list, ... (credentials) │
│ └── mesh_mcp_schema (introspection)│
│ │
│ MCP: mesh:gmail (stdio proxy) → mcp__mesh_gmail__* │
│ MCP: mesh:context7 (stdio proxy) → mcp__mesh_context7__* │
│ MCP: mesh:whatsapp (stdio proxy) → mcp__mesh_whatsapp__* │
│ │
│ MCP: playwriter (stdio, local) → local MCPs as usual │
│ MCP: figma (stdio, local) │
└──────────┬───────────────────────────────────────────────────────┘
│ Each mesh:* proxy connects via WebSocket
▼
┌──────────────────────────────────────────────────────────────────┐
│ Broker (VPS — wss://ic.claudemesh.com/ws) │
│ │
│ Existing: message routing, presence, state, memory, files, ... │
│ │
│ New: Service Catalog │
│ ├── Scope enforcement (peer/group/role/mesh visibility) │
│ ├── Tool schema registry (from runner) │
│ ├── Deploy/undeploy/update commands │
│ └── System events: mcp_deployed, mcp_undeployed │
│ │
│ New: Vault │
│ └── Per-peer encrypted credential storage │
│ │
│ Tool call routing: │
│ ├── Managed service? → forward to runner │
│ └── Live proxy? → forward to hosting peer (existing) │
└──────────┬───────────────────────────────────────────────────────┘
│ stdio (child process)
▼
┌──────────────────────────────────────────────────────────────────┐
│ Runner (one Docker container per mesh) │
│ │
│ Supervisor (Node main thread) │
│ ├── stdin/stdout ↔ broker (JSON-RPC multiplexed) │
│ ├── Routes tool calls by service name │
│ ├── Lifecycle: load / unload / restart │
│ ├── Health: MCP ping per child, restart on 3 failures │
│ ├── Logs: 1000-line ring buffer per service │
│ └── Vault: decrypts credentials at spawn time │
│ │
│ Child processes (one per MCP server): │
│ ├── child_process.spawn("node", [...]) ← Node MCP servers │
│ ├── child_process.spawn("uvx", [...]) ← Python MCP servers │
│ ├── child_process.spawn("npx", [...]) ← npm MCP packages │
│ │ │
│ │ Each child: │
│ │ ├── Own stdio pipe (MCP protocol) │
│ │ ├── Own env vars (including vault-resolved secrets) │
│ │ ├── Own /secrets/<name>/ dir (vault files) │
│ │ └── Killed individually on undeploy │
│ │ │
│ Base image: node:22 + python3.12 + uv + npx │
│ Limits: --memory=512m --cpus=1 --network=mesh-restricted │
└──────────────────────────────────────────────────────────────────┘
Source modes
1. Inline (existing, unchanged)
share_skill(name, description, instructions, tags) ← text-only skill
mesh_mcp_register(server_name, description, tools) ← live peer proxy
2. Zip bundle
Upload a zip, then deploy:
1. share_file(path="./my-server.zip", tags=["mcp-bundle"]) → fileId
2. mesh_mcp_deploy(file_id=fileId, server_name="my-server", config={...})
MCP server zip structure:
my-mcp-server/
├── package.json # or pyproject.toml / requirements.txt
├── src/index.ts # MCP server entry (stdio transport)
├── .env.example # declares required env vars
└── README.md
Skill bundle zip structure:
my-skill/
├── SKILL.md # instructions (replaces inline text)
├── skill.json # { name, description, tags }
├── templates/ # prompt templates, examples
└── schemas/ # JSON schemas, configs
3. Git repository
mesh_mcp_deploy(
git_url="https://github.com/user/my-mcp-server.git",
branch="main",
server_name="my-server",
config={ env: { API_KEY: "$vault:my-api-key" } }
)
- Shallow clone (`--depth 1`)
- Commit SHA pinned in DB for auditability
- `mesh_mcp_update(server_name)` → git pull + rebuild + restart
- Auth via `config.git_auth` (stored encrypted, never logged)
Execution engine
Why child processes, not worker threads
MCP servers use stdio transport — each server owns its stdin/stdout via
StdioServerTransport. Two servers can't share one process. Worker threads
don't help because:
- MCP SDK `StdioServerTransport` takes over process stdin/stdout
- `npx @package/mcp-server` spawns its own process anyway
- Python MCPs need a Python runtime, not a Node thread
The runner spawns each MCP server as a child process with its own stdio pipe, exactly how every MCP server is designed to work.
Container design: one per mesh
┌─ Docker container (mesh: "dev") ─────────────────┐
│ │
│ Supervisor (Node main thread) │
│ ├─ stdio ↔ broker │
│ ├─ routes calls by service name │
│ │ │
│ ├─ spawn("npx", ["@upstash/context7-mcp"]) │
│ │ └─ stdio pipe ↔ MCP protocol │
│ ├─ spawn("node", ["dist/index.js"]) │
│ │ └─ stdio pipe ↔ MCP protocol │
│ ├─ spawn("uvx", ["mcp-outline"]) │
│ │ └─ stdio pipe ↔ MCP protocol │
│ └─ spawn("python", ["-m", "server"]) │
│ └─ stdio pipe ↔ MCP protocol │
│ │
│ Base: node:22 + python3.12 + uv + npx │
│ Limits: --memory=512m --cpus=1 │
│ Network: mesh-restricted bridge (allowlist) │
└───────────────────────────────────────────────────┘
Why one container, not N:
- One Docker process to manage, one cgroup for the whole mesh
- One network namespace — single firewall config
- Shared node_modules / pip cache across services
- VPS resources: 8 vCores / 24GB — N containers exhausts memory fast
Why not zero containers (bare child processes on the broker):
- Broker stays routing-only — runner crashes don't take it down
- Security boundary — runner can't access broker's DB or filesystem
- Runner can be on a different machine later (NUC, second VPS)
Supervisor protocol
Broker ↔ runner communicate over the container's stdin/stdout as JSON lines:
// Broker → runner
{ action: "load", name: "gmail", path: "/services/gmail", env: {...} }
{ action: "call", name: "gmail", tool: "search_emails", args: {...}, callId: "abc" }
{ action: "unload", name: "gmail" }
{ action: "health", name: "gmail" }
{ action: "list_tools", name: "gmail" }
// Runner → broker
{ callId: "abc", result: {...} }
{ callId: "abc", error: "connection refused" }
{ type: "loaded", name: "gmail", tools: [{name, description, inputSchema}] }
{ type: "unloaded", name: "gmail" }
{ type: "crashed", name: "gmail", restarts: 3, error: "OOM" }
{ type: "health", name: "gmail", ok: true, rssKb: 45000 }
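The JSON-lines framing above could be decoded on the broker side roughly as follows. This is a minimal sketch: the type names, `JsonLineDecoder`, and `PendingCalls` are hypothetical, and it assumes each message is strict JSON on a single newline-terminated line (the examples above use object-literal shorthand).

```typescript
// Hypothetical message shapes, inferred from the examples above.
type CallReply = { callId: string; result?: unknown; error?: string };
type RunnerEvent = { type: "loaded" | "unloaded" | "crashed" | "health"; name: string };
type RunnerMessage = CallReply | RunnerEvent;

function isCallReply(msg: RunnerMessage): msg is CallReply {
  return "callId" in msg;
}

// Buffers raw stdout chunks and returns one parsed message per complete line.
class JsonLineDecoder {
  private buffer = "";

  feed(chunk: string): RunnerMessage[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an unfinished line ("" when the chunk ended on a newline).
    this.buffer = lines.pop() ?? "";
    const messages: RunnerMessage[] = [];
    for (const line of lines) {
      if (line.trim() === "") continue;
      try {
        messages.push(JSON.parse(line) as RunnerMessage);
      } catch {
        // Assumption: malformed lines are dropped rather than crashing the broker.
      }
    }
    return messages;
  }
}

// Matches { callId, ... } replies to the callbacks of in-flight calls.
class PendingCalls {
  private waiting = new Map<string, (reply: CallReply) => void>();

  register(callId: string, onReply: (reply: CallReply) => void): void {
    this.waiting.set(callId, onReply);
  }

  // Returns true when the message settled a registered call.
  dispatch(msg: RunnerMessage): boolean {
    if (!isCallReply(msg)) return false;
    const onReply = this.waiting.get(msg.callId);
    if (onReply === undefined) return false;
    this.waiting.delete(msg.callId);
    onReply(msg);
    return true;
  }
}
```

Because the `callId` is the only correlation key, many tool calls can be in flight over the single stdio pipe at once, which is what "multiplexed" in the architecture diagram implies.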
Runtime auto-detection
| File found | Runtime | Spawn command |
|---|---|---|
| `package.json` | node | npm install && node <main> |
| `package.json` with npx hint | node | npx <package> |
| `pyproject.toml` | python | pip install . && python -m <module> |
| `requirements.txt` | python | pip install -r requirements.txt && python <entry> |
| `Bunfile` or `bun.lockb` | bun | bun install && bun <entry> |
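A possible shape for the detection step, given the list of file names at the root of the extracted source. The function name `detectRuntime` is hypothetical, and the precedence between markers is an assumption, since the table does not say which one wins when several are present.

```typescript
type Runtime = "node" | "python" | "bun";

// Assumed precedence: Bun markers are checked first because Bun projects
// usually ship a package.json as well, then Node, then Python.
function detectRuntime(files: readonly string[]): Runtime | null {
  const present = new Set(files);
  if (present.has("Bunfile") || present.has("bun.lockb")) return "bun";
  if (present.has("package.json")) return "node";
  if (present.has("pyproject.toml") || present.has("requirements.txt")) return "python";
  return null; // nothing recognized: the deploy would need an explicit runtime
}
```

A `null` result pairs naturally with the optional `runtime` field in the deploy config, which lets the deployer override detection.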
Health & restart
- Supervisor sends MCP `ping` to each child every 30s
- No response within 5s → mark unhealthy
- 3 consecutive failures → restart (kill + re-spawn)
- Max 5 restarts → status=`crashed`, notify deployer via mesh system event
- On crash:
{ type: "push", event: "mcp_crashed", eventData: { name, error, restarts } }
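The restart policy above is a small state machine. The sketch below models only the counting logic; `HealthTracker` and `HealthAction` are hypothetical names, the ping itself is left to the caller, and it assumes the restart budget is never reset by a later healthy ping (the spec does not say).

```typescript
type HealthAction = "none" | "restart" | "mark_crashed";

class HealthTracker {
  private consecutiveFailures = 0;
  private restarts = 0;
  private readonly failuresBeforeRestart: number;
  private readonly maxRestarts: number;

  constructor(failuresBeforeRestart = 3, maxRestarts = 5) {
    this.failuresBeforeRestart = failuresBeforeRestart;
    this.maxRestarts = maxRestarts;
  }

  // Call once per ping interval; pingOk is false when the child did not
  // answer within the timeout.
  record(pingOk: boolean): HealthAction {
    if (pingOk) {
      this.consecutiveFailures = 0;
      return "none";
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures < this.failuresBeforeRestart) return "none";
    // Threshold reached: start counting afresh for the next incarnation.
    this.consecutiveFailures = 0;
    if (this.restarts >= this.maxRestarts) return "mark_crashed";
    this.restarts += 1;
    return "restart";
  }

  restartCount(): number {
    return this.restarts;
  }
}
```

With the defaults, a child that never answers is restarted five times and marked crashed on the sixth run of three failed pings.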
Logs
Per-service ring buffer (1000 lines). Captures child's stderr + stdout
(excluding MCP protocol JSON). Accessible via mesh_mcp_logs(name, lines?).
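A per-service ring buffer along these lines would satisfy the description. `LogRing` and `isMcpProtocolLine` are hypothetical names; the filter assumes protocol traffic can be recognized as JSON-RPC 2.0 objects, which is how MCP frames its messages.

```typescript
// Keeps only the most recent `capacity` lines.
class LogRing {
  private readonly lines: string[] = [];
  private readonly capacity: number;

  constructor(capacity = 1000) {
    this.capacity = capacity;
  }

  push(line: string): void {
    this.lines.push(line);
    // shift() is O(n), which is acceptable for a 1000-line buffer.
    if (this.lines.length > this.capacity) this.lines.shift();
  }

  // Returns the most recent `count` lines, oldest first.
  tail(count = 50): string[] {
    const n = Math.max(0, Math.min(count, this.lines.length));
    // slice(-0) would return the whole array, so handle zero explicitly.
    return n === 0 ? [] : this.lines.slice(-n);
  }
}

// True for lines that look like MCP protocol frames and should not be logged.
function isMcpProtocolLine(line: string): boolean {
  try {
    const parsed: unknown = JSON.parse(line);
    return (
      typeof parsed === "object" &&
      parsed !== null &&
      (parsed as { jsonrpc?: unknown }).jsonrpc === "2.0"
    );
  } catch {
    return false; // not JSON at all, so it is ordinary log output
  }
}
```

The default of 50 lines for `tail` mirrors the default stated for the `mcp_logs` wire message.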
Storage layout
/var/claudemesh/services/
├── <meshId>/
│ ├── <serviceName>/
│ │ ├── source/ # extracted zip or git clone
│ │ ├── secrets/ # vault-resolved credential files
│ │ ├── node_modules/ # or .venv/ for Python
│ │ └── .meta.json # { pid, startedAt, sha, runtime }
Network policy
Default: --network=mesh-restricted (Docker bridge with outbound deny-all).
Per-service allowlist in deploy config:
{
"network_allow": [
"gmail.googleapis.com:443",
"oauth2.googleapis.com:443",
"100.113.153.45:*"
]
}
Implemented via iptables rules on the bridge, or per-container --add-host
entries combined with a proxy. For Tailscale-accessible services (NUC, etc.),
allow the Tailscale IP.
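Whatever enforces the policy (iptables or a proxy), the allowlist entries first have to be parsed and matched. The sketch below shows one way to do that for the `host:port` form used above, with `*` as a port wildcard; `AllowRule`, `parseAllowRule`, and `isAllowed` are hypothetical names, and matching is by exact host string only (no DNS resolution or host wildcards).

```typescript
interface AllowRule {
  host: string;
  port: number | "*";
}

function parseAllowRule(entry: string): AllowRule {
  const idx = entry.lastIndexOf(":");
  if (idx <= 0) throw new Error(`invalid network_allow entry: ${entry}`);
  const host = entry.slice(0, idx).toLowerCase();
  const portText = entry.slice(idx + 1);
  if (portText === "*") return { host, port: "*" };
  if (!/^\d+$/.test(portText)) throw new Error(`invalid port in: ${entry}`);
  const port = Number(portText);
  if (port < 1 || port > 65535) throw new Error(`port out of range in: ${entry}`);
  return { host, port };
}

// Default deny: an empty rule list allows nothing.
function isAllowed(rules: readonly AllowRule[], host: string, port: number): boolean {
  const target = host.toLowerCase();
  return rules.some((r) => r.host === target && (r.port === "*" || r.port === port));
}
```

Rejecting malformed entries at deploy time keeps a typo from silently turning into either a blocked service or a wider rule than intended.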
Credential vault
Design
Per-peer encrypted storage on the broker. Credentials never leave the vault in plaintext — decrypted only inside the runner container at spawn time.
Peers don't share credentials. They share access to the running MCP server via scopes. The MCP server runs with the deployer's credentials; other peers call it without ever seeing the secrets.
Encryption model
Same crypto as E2E file sharing (crypto/file-crypto.ts):
- Peer generates random symmetric key
- Encrypts the credential with `crypto_secretbox` (symmetric)
- Seals the symmetric key with their own pubkey (`crypto_box`)
- Stores sealed key + ciphertext on broker; broker sees only ciphertext
- At spawn time: runner requests decryption from the deployer's sealed key (the runner holds a mesh-scoped keypair granted by the deployer at deploy time)
Vault reference syntax
In mesh_mcp_deploy env config, $vault: prefix triggers vault resolution:
$vault:api-key → inject as env var
$vault:gmail-creds:file:/secrets/creds.json → decrypt, write to file, set env var to path
Examples:
mesh_mcp_deploy({
server_name: "gmail",
git_url: "https://github.com/gongrzhe/server-gmail-autoauth-mcp",
env: {
GMAIL_CREDENTIALS_PATH: "$vault:gmail-creds:file:/secrets/credentials.json",
GMAIL_OAUTH_PATH: "$vault:gmail-oauth:file:/secrets/gcp-oauth.keys.json",
},
network_allow: ["gmail.googleapis.com:443", "oauth2.googleapis.com:443"],
})
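The two reference forms can be told apart with a small parser. This is a sketch under assumptions: `VaultRef` and `parseVaultRef` are hypothetical names, and vault keys are assumed never to contain the substring `:file:`.

```typescript
type VaultRef =
  | { kind: "env"; key: string }
  | { kind: "file"; key: string; mountPath: string };

// Returns null for ordinary env values, a VaultRef for $vault: references,
// and throws on references that are syntactically incomplete.
function parseVaultRef(value: string): VaultRef | null {
  const prefix = "$vault:";
  if (!value.startsWith(prefix)) return null;
  const rest = value.slice(prefix.length);
  const marker = ":file:";
  const at = rest.indexOf(marker);
  if (at === -1) {
    if (rest === "") throw new Error("vault reference has no key");
    return { kind: "env", key: rest };
  }
  const key = rest.slice(0, at);
  const mountPath = rest.slice(at + marker.length);
  if (key === "" || mountPath === "") {
    throw new Error(`incomplete vault file reference: ${value}`);
  }
  return { kind: "file", key, mountPath };
}
```

For a `file` reference the runner would decrypt the entry, write it to `mountPath`, and set the env var to that path; for an `env` reference the decrypted value itself becomes the env var.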
MCP tools
vault_set(key, value, type?, mount_path?) — encrypt + store
value: string (env var) or local file path (reads + encrypts the file)
type: "env" (default) or "file"
mount_path: for files, where to write inside the service dir
vault_list() — list keys (no values, metadata only)
vault_delete(key) — remove entry
DB schema
CREATE TABLE mesh.vault_entry (
id TEXT PRIMARY KEY,
mesh_id TEXT NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
member_id TEXT NOT NULL REFERENCES mesh.member(id),
key TEXT NOT NULL,
-- E2E encrypted content
ciphertext BYTEA NOT NULL,
nonce BYTEA NOT NULL,
sealed_key BYTEA NOT NULL, -- symmetric key sealed with peer's pubkey
-- Metadata (plaintext)
entry_type TEXT DEFAULT 'env' CHECK (entry_type IN ('env', 'file')),
mount_path TEXT,
description TEXT,
created_at TIMESTAMP DEFAULT now(),
updated_at TIMESTAMP DEFAULT now(),
UNIQUE (mesh_id, member_id, key)
);
Visibility scopes
Model
Scopes control who can see and call a service. Credentials are invisible to callers — they interact with the running service, not the secrets behind it. The deployer controls visibility; the vault handles secrets separately.
Scope levels
| Scope | Who sees it | Use case |
|---|---|---|
| `peer` | Only the deployer (default) | Personal tools, staging before publish |
| `{ peers: [...] }` | Named peers | Shared between specific people |
| `{ group: "eng" }` | All @eng members | Team-specific tools |
| `{ groups: ["eng", "ops"] }` | Multiple groups | Cross-team tools |
| `{ role: "lead" }` | Any peer with that role | Role-gated admin tools |
| `mesh` | Everyone in the mesh | Shared utilities |
Examples
┌─────────────────────────────────────────────────┐
│ Mesh: "dev-team" │
│ │
│ mesh scope ─── everyone │
│ ├── context7 (utility) │
│ ├── youtube-transcript │
│ └── mesh-db (shared database) │
│ │
│ group scope ─── @group members only │
│ ├── @eng │
│ │ ├── github-mcp (eng team's GitHub) │
│ │ └── ssh-manager (eng infra access) │
│ ├── @sales │
│ │ ├── apollo-io (sales CRM) │
│ │ └── gmail (sales@ inbox) │
│ └── @ops │
│ ├── stalwart-mail (mail server admin) │
│ └── namecheap (DNS management) │
│ │
│ role scope ─── by role tag │
│ ├── lead → mesh-admin-tools (deploy, vault) │
│ └── observer → (read-only MCPs only) │
│ │
│ peer scope ─── only specific peers │
│ ├── Alejandro │
│ │ ├── gmail-personal (my inbox) │
│ │ └── gworkspace (my workspace) │
│ └── Mou │
│ └── cursor-composer (Mou's Cursor) │
│ │
└─────────────────────────────────────────────────┘
Deploy with scope
// Mesh scope — everyone
mesh_mcp_deploy({
server_name: "context7",
source: { type: "git", url: "..." },
scope: "mesh",
})
// Group scope — only @eng
mesh_mcp_deploy({
server_name: "github-mcp",
source: { type: "git", url: "..." },
scope: { group: "eng" },
})
// Multi-group
mesh_mcp_deploy({
server_name: "ssh-manager",
scope: { groups: ["eng", "ops"] },
})
// Role scope — only leads
mesh_mcp_deploy({
server_name: "mesh-admin",
scope: { role: "lead" },
})
// Peer scope — just me (default)
mesh_mcp_deploy({
server_name: "gmail-personal",
scope: "peer",
})
// Specific peers
mesh_mcp_deploy({
server_name: "shared-workspace",
scope: { peers: ["Mou", "Alejandro"] },
})
Enforcement
- At catalog time: broker filters the service catalog by scope before sending to peers in `hello_ack`. The peer's groups and role (from `hello`) are matched against each service's scope. A tool you can't access never appears in Claude's tool list.
- At call time: broker re-checks scope before routing. Double-check in case catalog is stale or the peer's groups changed.
Scope resolution logic
function peerCanAccess(service: Service, peer: PeerConn): boolean {
  const scope = service.scope;
  if (typeof scope === "string") {
    if (scope === "peer") return service.deployed_by === peer.memberId;
    if (scope === "mesh") return true;
    // Unknown string scope: deny. Falling through would throw, because
    // the `in` operator is not defined on string primitives.
    return false;
  }
  if ("peers" in scope) {
    return scope.peers.some(p =>
      p === peer.memberId || p === peer.displayName);
  }
  if ("group" in scope) {
    return peer.groups.some(g => g.name === scope.group);
  }
  if ("groups" in scope) {
    return peer.groups.some(g => scope.groups.includes(g.name));
  }
  if ("role" in scope) {
    return peer.groups.some(g => g.role === scope.role);
  }
  return false;
}
MCP tools
mesh_mcp_scope(server_name, scope?)
scope set: mesh_mcp_scope("gmail", { group: "sales" })
scope read: mesh_mcp_scope("gmail") → { scope, deployed_by }
Scope change events
When a scope changes, the broker:
- Computes which peers gained/lost access
- Sends `mcp_scope_changed` system event to affected peers
- Peers who gained access get `svc__*` dynamic tools via `list_changed`
- Peers who lost access get tools removed via `list_changed`
- Full native access requires session restart
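The first step, working out who gained and who lost access, amounts to evaluating the access predicate under the old and the new scope for every connected peer. A minimal sketch follows; `scopeChangeAudience` and `PeerRef` are hypothetical names, and the predicate is passed in so that the same check used for enforcement can be reused.

```typescript
interface PeerRef {
  memberId: string;
}

// Compares access under the old and new scope for each connected peer.
function scopeChangeAudience<P extends PeerRef, S>(
  peers: readonly P[],
  oldScope: S,
  newScope: S,
  canAccess: (scope: S, peer: P) => boolean,
): { gained: string[]; lost: string[] } {
  const gained: string[] = [];
  const lost: string[] = [];
  for (const peer of peers) {
    const before = canAccess(oldScope, peer);
    const after = canAccess(newScope, peer);
    if (!before && after) gained.push(peer.memberId);
    if (before && !after) lost.push(peer.memberId);
  }
  return { gained, lost };
}
```

Peers whose access is unchanged appear in neither list, so they receive no event and no `list_changed` notification.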
DB
Single column on mesh.service:
scope JSONB DEFAULT '{"type": "peer"}'
-- {"type": "peer"}
-- {"type": "mesh"}
-- {"type": "peers", "allow": ["member_id_1", "member_id_2"]}
-- {"type": "group", "group": "eng"}
-- {"type": "groups", "groups": ["eng", "ops"]}
-- {"type": "role", "role": "lead"}
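The wire protocol accepts a shorthand scope (`"peer"`, `{ group: "eng" }`, ...) while the column stores a tagged object. A conversion between the two might look like the sketch below. `WireScope`, `DbScope`, and `toDbScope` are hypothetical names, and resolving peer display names to member IDs for the `allow` list is assumed to happen before this step.

```typescript
type WireScope =
  | "peer"
  | "mesh"
  | { peers: string[] }
  | { group: string }
  | { groups: string[] }
  | { role: string };

type DbScope =
  | { type: "peer" }
  | { type: "mesh" }
  | { type: "peers"; allow: string[] }
  | { type: "group"; group: string }
  | { type: "groups"; groups: string[] }
  | { type: "role"; role: string };

// Omitted scope falls back to "peer", matching the column default.
function toDbScope(scope: WireScope = "peer"): DbScope {
  if (scope === "peer") return { type: "peer" };
  if (scope === "mesh") return { type: "mesh" };
  if ("peers" in scope) return { type: "peers", allow: [...scope.peers] };
  if ("group" in scope) return { type: "group", group: scope.group };
  if ("groups" in scope) return { type: "groups", groups: [...scope.groups] };
  return { type: "role", role: scope.role };
}
```

Keeping a `type` tag on every stored value means a future variant such as `cross_mesh` can be added without ambiguity.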
Future: cross-mesh scope
Not for v1. Each mesh is isolated. The schema supports it later:
{"type": "cross_mesh", "meshes": ["dev", "staging"]}
A service deployed in dev visible in staging. Requires the runner to be
accessible from both meshes (possible since it's on the VPS).
Native Claude Code integration
Goal
Deployed mesh MCPs feel indistinguishable from locally installed MCP servers.
Claude sees mcp__mesh_gmail__search_emails — not mesh_tool_call("gmail", ...).
At session start: native MCP entries
claudemesh launch queries the broker for the scope-filtered service catalog
and installs each service as a native MCP entry before spawning Claude:
// commands/launch.ts — extended flow
// Step 3 (new): fetch service catalog from broker
const catalog = await fetchServiceCatalog(mesh);
// Step 4 (new): write mesh MCP entries to ~/.claude.json
for (const service of catalog) {
addMcpEntry(`mesh:${service.name}`, {
command: "claudemesh",
args: ["mcp", "--service", service.name],
});
}
// Step 5: spawn claude with mesh-aware env
const child = spawn("claude", claudeArgs, {
env: {
...process.env,
CLAUDEMESH_CONFIG_DIR: tmpDir,
CLAUDEMESH_DISPLAY_NAME: displayName,
// Mesh calls traverse: proxy → WS → broker → runner → child.
// Default MCP timeout is too short for this chain.
MCP_TIMEOUT: process.env.MCP_TIMEOUT ?? "30000",
// Mesh MCPs may return large results (DB queries, file contents).
MAX_MCP_OUTPUT_TOKENS: process.env.MAX_MCP_OUTPUT_TOKENS ?? "50000",
},
});
// Step 6 (extended): cleanup mesh:* entries on exit
child.on("exit", () => {
removeMcpEntries("mesh:*");
cleanup(); // existing tmpdir cleanup
});
Each claudemesh mcp --service <name> is a thin stdio proxy:
// Thin proxy: connects to broker, serves ONE service's tools
const client = new BrokerClient(mesh);
await client.connect();
const tools = await client.getServiceTools(serviceName);
server.setRequestHandler(ListToolsRequestSchema, () => ({ tools }));
server.setRequestHandler(CallToolRequestSchema, async (req) => {
// Wait for broker reconnection if WS is down (up to 10s)
if (client.status !== "open") {
const connected = await client.waitForConnection(10_000);
if (!connected) {
return text("Service temporarily unavailable — broker reconnecting. Retry in a few seconds.", true);
}
}
return await client.mcpCall(serviceName, req.params.name, req.params.arguments);
});
Resilience notes:
- The `BrokerClient` handles WS reconnection with exponential backoff (1s→30s)
- Claude Code does NOT auto-restart crashed MCP servers: if the proxy process itself dies, those tools vanish until session restart
- The proxy should catch all exceptions and return MCP errors, never crash
- `claudemesh doctor` diagnoses dead proxy processes mid-session
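The 1s→30s backoff could be computed as below. Only the two bounds come from the notes above; the doubling factor and the absence of jitter are assumptions, and `backoffDelayMs` is a hypothetical helper rather than a known `BrokerClient` method.

```typescript
// Delay before reconnect attempt number `attempt` (0-based), in milliseconds.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  const exponent = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}
```

With these defaults the delays run 1s, 2s, 4s, 8s, 16s and then stay at 30s, so the 10s wait in the proxy's call handler covers roughly the first three reconnect attempts.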
Result: Claude Code starts and sees:
mcp__mesh_gmail__search_emails ← proper namespace, full schema
mcp__mesh_gmail__send_email ← deferred by ToolSearch automatically
mcp__mesh_context7__query_docs ← native MCP, no indirection
Session management
Safe ~/.claude.json modification:
- `~/.claude.json` stores MCP entries AND other Claude Code config (permissions, env vars, etc.). Never overwrite the whole file.
- Read-modify-write: load full JSON → add/remove only `mesh:*` keys in `mcpServers` → write back. Preserve all other keys.
- Use `flock` on writes to prevent concurrent session corruption.
Stale entry cleanup:
- Each `mesh:*` entry includes `_meshSession` metadata with PID and timestamp
- `claudemesh launch` sweeps stale entries on startup (dead PID check)
- `claudemesh doctor` reports orphaned entries
Concurrent sessions:
- Entries are session-scoped: `mesh:gmail:w1t0p0` (includes session ID)
- Each session manages only its own entries
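The "modify" half of the read-modify-write cycle can be kept as pure functions over the parsed JSON, which makes the preserve-everything-else rule easy to check. In this sketch the file I/O and `flock` locking are deliberately left out, the `_meshSession` field names (`pid`, `startedAt`) are assumptions, and the liveness check is injected so the logic stays testable.

```typescript
interface McpEntry {
  command: string;
  args: string[];
  _meshSession?: { pid: number; startedAt: number };
}

interface ClaudeConfig {
  mcpServers?: Record<string, McpEntry>;
  [key: string]: unknown; // permissions, env vars, and anything else
}

// Adds session-scoped mesh entries without touching any other key.
function addSessionEntries(
  config: ClaudeConfig,
  sessionId: string,
  pid: number,
  now: number,
  services: readonly string[],
): ClaudeConfig {
  const mcpServers: Record<string, McpEntry> = { ...(config.mcpServers ?? {}) };
  for (const name of services) {
    mcpServers[`mesh:${name}:${sessionId}`] = {
      command: "claudemesh",
      args: ["mcp", "--service", name],
      _meshSession: { pid, startedAt: now },
    };
  }
  return { ...config, mcpServers };
}

// Drops mesh entries whose owning process is gone; local MCPs are untouched.
function sweepStaleEntries(
  config: ClaudeConfig,
  isAlive: (pid: number) => boolean,
): ClaudeConfig {
  const source = config.mcpServers ?? {};
  const mcpServers: Record<string, McpEntry> = {};
  for (const key of Object.keys(source)) {
    const entry = source[key];
    const session = entry._meshSession;
    const stale = key.startsWith("mesh:") && session !== undefined && !isAlive(session.pid);
    if (!stale) mcpServers[key] = entry;
  }
  return { ...config, mcpServers };
}
```

Both functions return a new object, so the caller can hold the lock only for the short window between reading the file and writing the result back.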
Mid-session deploys: dynamic tools
When a service is deployed after the Claude session started, native MCP entries can't be added (Claude Code doesn't support adding new MCP servers mid-session).
Two-tier fallback:
- Claudemesh MCP fires `notifications/tools/list_changed` (stdio, proven to work)
  - Adds `svc__<name>__<tool>` tools to its own `tools/list`
  - Claude sees them as `mcp__claudemesh__svc__gmail__search_emails`
  - Works, but namespacing is less clean than native
- System notification tells the peer:
  [mesh] Service deployed: "namecheap" by Alejandro (3 tools). Available now via mesh_tool_call("namecheap", "domains_list", {...}). Restart session for native mcp__mesh_namecheap__* access.
- `mesh_tool_call` remains the universal fallback: works for any service at any time, native or not.
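The `svc__<name>__<tool>` convention needs a way to build names for `tools/list` and to split them again when a call arrives. A sketch, assuming service names never contain a double underscore (tool names may); both function names are hypothetical.

```typescript
const SVC_PREFIX = "svc__";

function toDynamicToolName(service: string, tool: string): string {
  return `${SVC_PREFIX}${service}__${tool}`;
}

// Returns null for tools that are not dynamic service tools.
function parseDynamicToolName(name: string): { service: string; tool: string } | null {
  if (!name.startsWith(SVC_PREFIX)) return null;
  const rest = name.slice(SVC_PREFIX.length);
  // Split on the first "__": everything after it is the tool name.
  const sep = rest.indexOf("__");
  if (sep <= 0 || sep + 2 >= rest.length) return null;
  return { service: rest.slice(0, sep), tool: rest.slice(sep + 2) };
}
```

In the claudemesh MCP's call handler, a non-null parse result would be routed through the same broker path as `mesh_tool_call`, while a null result falls through to the built-in tools.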
Mid-session undeploys
When a service is undeployed, the native proxy process detects the broker
event and exits gracefully. Claude Code sees the MCP server disconnect and
stops offering those tools. No list_changed needed — MCP server death
is already handled.
Schema introspection
For programmatic access to tool schemas (building workflows, debugging):
mesh_mcp_schema(server_name) → all tools with full inputSchema
mesh_mcp_schema(server_name, tool_name) → one specific tool's schema
mesh_mcp_catalog() → all services with tool counts, scope, status
Database changes
New table: mesh.service
CREATE TABLE mesh.service (
id TEXT PRIMARY KEY,
mesh_id TEXT NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
name TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN ('mcp', 'skill')),
-- Source
source_type TEXT NOT NULL CHECK (source_type IN ('inline', 'zip', 'git')),
source_file_id TEXT REFERENCES mesh.file(id),
source_git_url TEXT,
source_git_branch TEXT DEFAULT 'main',
source_git_sha TEXT,
prev_git_sha TEXT, -- for rollback
-- Content
description TEXT NOT NULL,
instructions TEXT, -- skills only
tools_schema JSONB, -- MCPs: [{ name, description, inputSchema }]
-- Bundle
manifest JSONB, -- { files: [...], entry: "src/index.ts" }
-- Execution (MCPs only)
runtime TEXT CHECK (runtime IN ('node', 'python', 'bun')), -- NULL still passes; listing NULL inside IN (...) would make the CHECK accept any value
status TEXT DEFAULT 'stopped'
CHECK (status IN ('building', 'installing', 'running',
'stopped', 'failed', 'crashed', 'restarting')),
config JSONB DEFAULT '{}', -- resource limits, network policy
last_health TIMESTAMP,
restart_count INT DEFAULT 0,
version INT DEFAULT 1,
-- Visibility scope
scope JSONB DEFAULT '{"type": "peer"}',
-- Metadata
deployed_by TEXT REFERENCES mesh.member(id),
deployed_by_name TEXT,
created_at TIMESTAMP DEFAULT now() NOT NULL,
updated_at TIMESTAMP DEFAULT now() NOT NULL,
UNIQUE (mesh_id, name)
);
New table: mesh.vault_entry
CREATE TABLE mesh.vault_entry (
id TEXT PRIMARY KEY,
mesh_id TEXT NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
member_id TEXT NOT NULL REFERENCES mesh.member(id),
key TEXT NOT NULL,
ciphertext BYTEA NOT NULL,
nonce BYTEA NOT NULL,
sealed_key BYTEA NOT NULL,
entry_type TEXT DEFAULT 'env' CHECK (entry_type IN ('env', 'file')),
mount_path TEXT,
description TEXT,
created_at TIMESTAMP DEFAULT now(),
updated_at TIMESTAMP DEFAULT now(),
UNIQUE (mesh_id, member_id, key)
);
Extend mesh.skill (backward compat)
ALTER TABLE mesh.skill
ADD COLUMN source_type TEXT DEFAULT 'inline'
CHECK (source_type IN ('inline', 'zip', 'git')),
ADD COLUMN bundle_file_id TEXT REFERENCES mesh.file(id),
ADD COLUMN git_url TEXT,
ADD COLUMN git_branch TEXT DEFAULT 'main',
ADD COLUMN git_sha TEXT,
ADD COLUMN manifest JSONB;
Wire protocol additions
Client → broker
// --- Service deployment ---
interface WSMcpDeployMessage {
type: "mcp_deploy";
server_name: string;
source:
| { type: "zip"; file_id: string }
| { type: "git"; url: string; branch?: string; auth?: string };
config?: {
env?: Record<string, string>; // supports $vault: refs
memory_mb?: number; // default 256
cpus?: number; // default 0.5
network_allow?: string[]; // default: none
runtime?: "node" | "python" | "bun";
};
scope?:
| "peer" // private (default)
| "mesh" // everyone
| { peers: string[] } // named peers
| { group: string } // single group
| { groups: string[] } // multiple groups
| { role: string }; // by role tag
_reqId?: string;
}
interface WSMcpUndeployMessage {
type: "mcp_undeploy";
server_name: string;
_reqId?: string;
}
interface WSMcpUpdateMessage {
type: "mcp_update";
server_name: string;
_reqId?: string;
}
interface WSMcpLogsMessage {
type: "mcp_logs";
server_name: string;
lines?: number; // default 50, max 1000
_reqId?: string;
}
interface WSMcpScopeMessage {
type: "mcp_scope";
server_name: string;
scope?: // set — omit to read current
| "peer"
| "mesh"
| { peers: string[] }
| { group: string }
| { groups: string[] }
| { role: string };
_reqId?: string;
}
interface WSMcpSchemaMessage {
type: "mcp_schema";
server_name: string;
tool_name?: string; // omit for all tools
_reqId?: string;
}
interface WSMcpCatalogMessage {
type: "mcp_catalog";
_reqId?: string;
}
// --- Skill deployment ---
interface WSSkillDeployMessage {
type: "skill_deploy";
source:
| { type: "zip"; file_id: string }
| { type: "git"; url: string; branch?: string; auth?: string };
_reqId?: string;
}
// --- Vault ---
interface WSVaultSetMessage {
type: "vault_set";
key: string;
ciphertext: string; // base64
nonce: string; // base64
sealed_key: string; // base64
entry_type: "env" | "file";
mount_path?: string;
description?: string;
_reqId?: string;
}
interface WSVaultListMessage {
type: "vault_list";
_reqId?: string;
}
interface WSVaultDeleteMessage {
type: "vault_delete";
key: string;
_reqId?: string;
}
Broker → client
// --- Service responses ---
interface WSMcpDeployStatusMessage {
type: "mcp_deploy_status";
server_name: string;
status: "building" | "installing" | "running" | "failed";
tools?: Array<{ name: string; description: string; inputSchema: object }>;
error?: string;
_reqId?: string;
}
interface WSMcpLogsResultMessage {
type: "mcp_logs_result";
server_name: string;
lines: string[];
_reqId?: string;
}
interface WSMcpSchemaResultMessage {
type: "mcp_schema_result";
server_name: string;
tools: Array<{ name: string; description: string; inputSchema: object }>;
_reqId?: string;
}
interface WSMcpCatalogResultMessage {
type: "mcp_catalog_result";
services: Array<{
name: string;
type: "mcp" | "skill";
description: string;
status: string;
tool_count: number;
deployed_by: string;
scope: { type: string; [key: string]: unknown };
source_type: string;
runtime?: string;
created_at: string;
}>;
_reqId?: string;
}
interface WSMcpScopeResultMessage {
type: "mcp_scope_result";
server_name: string;
scope: { type: string; [key: string]: unknown };
deployed_by: string;
_reqId?: string;
}
// --- Skill responses ---
interface WSSkillDeployAckMessage {
type: "skill_deploy_ack";
name: string;
files: string[];
_reqId?: string;
}
// --- Vault responses ---
interface WSVaultAckMessage {
type: "vault_ack";
key: string;
action: "stored" | "deleted" | "not_found";
_reqId?: string;
}
interface WSVaultListResultMessage {
type: "vault_list_result";
entries: Array<{
key: string;
entry_type: "env" | "file";
mount_path?: string;
description?: string;
updated_at: string;
}>;
_reqId?: string;
}
// --- System events (broadcast to mesh) ---
// Sent as WSPushMessage with subtype: "system"
// event: "mcp_deployed"
// eventData: { name, description, tool_count, deployed_by, scope, tools: [...] }
// event: "mcp_undeployed"
// eventData: { name, by }
// event: "mcp_crashed"
// eventData: { name, error, restarts }
// event: "mcp_updated"
// eventData: { name, prev_sha, new_sha, tools: [...] }
Extended hello_ack
interface WSHelloAckMessage {
// ... existing fields ...
/** Scope-filtered service catalog for this peer. */
services?: Array<{
name: string;
description: string;
status: string;
tools: Array<{ name: string; description: string; inputSchema: object }>;
deployed_by: string;
}>;
}
MCP tool additions (CLI)
Service management tools
mesh_mcp_deploy(server_name, file_id?, git_url?, git_branch?, env?, runtime?,
memory_mb?, network_allow?, scope?)
mesh_mcp_undeploy(server_name)
mesh_mcp_update(server_name) // git-only: pull + rebuild + restart
mesh_mcp_logs(server_name, lines?)
mesh_mcp_scope(server_name, scope?) // set or read visibility scope
mesh_mcp_schema(server_name, tool?) // introspect tool schemas
mesh_mcp_catalog() // list all services with status
mesh_skill_deploy(file_id?, git_url?, git_branch?)
Vault tools
vault_set(key, value, type?, mount_path?, description?)
vault_list()
vault_delete(key)
Existing tools (unchanged)
share_skill(name, description, instructions, tags) // inline skills
mesh_mcp_register(server_name, description, tools) // live peer proxy
mesh_tool_call(server_name, tool_name, args) // universal fallback
mesh_mcp_list() // shows both proxy + managed
Broker-side service manager
New file: apps/broker/src/service-manager.ts
Interface
interface ServiceManager {
deploy(opts: {
meshId: string;
name: string;
source: { type: "zip"; fileId: string }
| { type: "git"; url: string; branch: string; auth?: string };
config: ServiceConfig;
vaultEntries: Array<{ key: string; ciphertext: Buffer; nonce: Buffer; sealedKey: Buffer;
entryType: "env" | "file"; mountPath?: string }>;
}): Promise<{ tools: ToolDef[]; status: string }>;
undeploy(meshId: string, name: string): Promise<void>;
update(meshId: string, name: string): Promise<{ tools: ToolDef[]; newSha?: string }>;
callTool(meshId: string, serverName: string, toolName: string,
args: Record<string, unknown>): Promise<{ result?: unknown; error?: string }>;
logs(meshId: string, name: string, lines?: number): string[];
status(meshId: string, name: string): ServiceStatus;
restoreAll(): Promise<void>; // on broker boot
}
Boot restore
On broker startup:
- Query `mesh.service WHERE status IN ('running', 'crashed', 'restarting')`
- Set all to `status='restarting'`
- Re-spawn runner container per mesh
- Load each service's source and spawn child process
- Set `status='running'` only after successful MCP `initialize` response
- Services that fail to start → `status='failed'`, system event broadcast
Security model
| Concern | Mitigation |
|---|---|
| Arbitrary code execution | Docker container, one per mesh |
| Resource exhaustion | --memory=512m --cpus=1 per container |
| Filesystem escape | No host volume mounts |
| Secret leakage | Vault E2E encrypted, decrypted only inside container |
| Network exfiltration | --network=mesh-restricted, per-service allowlist |
| Malicious zip (path traversal) | Validate all paths within target dir, reject .. |
| Git auth tokens | Stored encrypted in vault, passed via GIT_ASKPASS |
| Denial of service | Max 20 services per mesh, max 50MB zip, max 500MB image |
| Scope bypass | Double-check: filter catalog + check on call |
| OAuth token expiry | Store refresh tokens, notify deployer on persistent failure |
| Tool name collision | svc__ prefix for mid-session dynamic tools |
| Stale MCP entries | PID check + age sweep on launch |
| Tool call timeout | MCP_TIMEOUT=30000 set by launch (default too short for mesh chain) |
| Large tool output | MAX_MCP_OUTPUT_TOKENS=50000 set by launch; proxy truncates if needed |
| Proxy crash | Claude Code won't auto-restart; claudemesh doctor diagnoses dead proxies |
| Broker restart | Proxies reconnect via BrokerClient backoff; calls return "reconnecting" during window |
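The zip path-traversal mitigation in the table can be illustrated with a check run on every archive entry name before extraction. This is a sketch only: `isSafeEntryPath` is a hypothetical helper, it works on entry names rather than the filesystem, and it does not cover symlink entries, which would need a separate check.

```typescript
// True when the entry name is relative and cannot escape the target directory.
function isSafeEntryPath(entryPath: string): boolean {
  if (entryPath === "" || entryPath.includes("\0")) return false;
  // Treat backslashes as separators so Windows-style names cannot slip through.
  const normalized = entryPath.replace(/\\/g, "/");
  // Reject absolute paths and drive-letter prefixes.
  if (normalized.startsWith("/") || /^[A-Za-z]:/.test(normalized)) return false;
  let segments = 0;
  for (const segment of normalized.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") return false; // reject outright, as the table specifies
    segments += 1;
  }
  return segments > 0;
}
```

Rejecting any `..` segment is stricter than resolving the path and checking the prefix, but it is simpler to reason about and legitimate bundles have no need for parent references.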
CLI commands
# Deploy from zip
claudemesh deploy ./my-server.zip --name my-server
# Deploy from git
claudemesh deploy --git https://github.com/user/repo.git --name my-server
# Deploy with vault refs
claudemesh vault set gmail-creds ~/.gmail-mcp/credentials.json --type file
claudemesh deploy --git https://github.com/user/gmail-mcp.git --name gmail \
--env 'GMAIL_CREDENTIALS_PATH=$vault:gmail-creds:file:/secrets/creds.json' \
--network-allow 'gmail.googleapis.com:443'
# Set access
claudemesh scope gmail --mesh # everyone
claudemesh scope gmail --group eng # @eng only
claudemesh scope gmail --groups 'eng,ops' # @eng + @ops
claudemesh scope gmail --role lead # leads only
claudemesh scope gmail --peers 'Mou,Alejandro' # specific peers
claudemesh scope gmail --peer # private (deployer only)
# Manage
claudemesh logs gmail
claudemesh update gmail # git-only: pull + rebuild
claudemesh undeploy gmail
claudemesh catalog # list all services
# Skills
claudemesh skill deploy ./my-skill.zip
claudemesh skill deploy --git https://github.com/user/skill.git
# Vault
claudemesh vault set api-key "sk-abc123"
claudemesh vault set oauth-creds ~/path/to/creds.json --type file
claudemesh vault list
claudemesh vault delete api-key
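The `--env` example above uses a `$vault:` reference. A parser for such values might look like the sketch below. The grammar is an assumption inferred from the single example in the commands: `$vault:<key>:file:<mountPath>` for file secrets, plus a guessed short form `$vault:<key>` for plain values. The `VaultRef` type and `parseVaultRef` name are hypothetical.

```typescript
type VaultRef =
  | { kind: "env"; key: string }
  | { kind: "file"; key: string; mountPath: string };

// Returns null for ordinary literals; throws for values that start with
// "$vault:" but do not match the assumed grammar.
function parseVaultRef(value: string): VaultRef | null {
  const prefix = "$vault:";
  if (!value.startsWith(prefix)) return null;

  const rest = value.slice(prefix.length);
  const marker = ":file:";
  const at = rest.indexOf(marker);

  if (at === -1) {
    if (rest.length === 0 || rest.includes(":")) {
      throw new Error(`Malformed vault reference: ${value}`);
    }
    return { kind: "env", key: rest };
  }

  const key = rest.slice(0, at);
  const mountPath = rest.slice(at + marker.length);
  // Assumed rule: file secrets are mounted at an absolute path in the container.
  if (key.length === 0 || !mountPath.startsWith("/")) {
    throw new Error(`Malformed vault reference: ${value}`);
  }
  return { kind: "file", key, mountPath };
}
```

Failing loudly on malformed references keeps a typo from being passed to the service as a literal environment value.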
Migration path
| What | Before | After |
|---|---|---|
| share_skill() inline | works | unchanged |
| mesh_mcp_register() live proxy | works | unchanged, labeled "proxy" in catalog |
| Zip MCP server | not possible | share_file + mesh_mcp_deploy |
| Git MCP server | not possible | mesh_mcp_deploy(git_url=...) |
| Zip skill bundle | not possible | mesh_skill_deploy(file_id=...) |
| Git skill | not possible | mesh_skill_deploy(git_url=...) |
| mesh_tool_call | forwards to peer | routes to runner OR forwards to peer |
| mesh_mcp_list | proxy only | shows proxy + managed, with status |
| Tool discovery | manual mesh_mcp_list | native MCP entries at launch + mid-session events |
| Credentials | plaintext env vars | E2E encrypted vault with $vault: refs |
| Access control | none (anyone can call) | Scopes: peer/group/role/mesh per service |
All existing behavior preserved. New capabilities are additive.
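To make the new mesh_tool_call routing and the per-service scopes concrete, here is a minimal sketch of the double check described in the security model: filter the catalog, then check again on call. The `Scope`, `Peer`, and `Service` shapes are hypothetical and only mirror the CLI scope flags; the real wire types may differ.

```typescript
type Scope =
  | { type: "mesh" } // everyone
  | { type: "peer" } // private: deployer only
  | { type: "peers"; names: string[] }
  | { type: "groups"; groups: string[] }
  | { type: "role"; role: string };

interface Peer { name: string; groups: string[]; role: string; }

interface Service {
  name: string;
  kind: "managed" | "proxy"; // managed = runs on the runner, proxy = lives on a peer
  deployer: string;
  scope: Scope;
}

function canAccess(peer: Peer, svc: Service): boolean {
  const scope = svc.scope;
  switch (scope.type) {
    case "mesh": return true;
    case "peer": return peer.name === svc.deployer;
    case "peers": return scope.names.includes(peer.name);
    case "groups": return scope.groups.some((g) => peer.groups.includes(g));
    case "role": return peer.role === scope.role;
  }
}

// First check: only advertise services the peer may call.
function visibleCatalog(peer: Peer, services: Service[]): Service[] {
  return services.filter((svc) => canAccess(peer, svc));
}

// Second check: enforce again at call time, then pick the route.
function routeToolCall(peer: Peer, svc: Service): "runner" | "peer" {
  if (!canAccess(peer, svc)) {
    throw new Error(`Peer ${peer.name} may not call service ${svc.name}`);
  }
  return svc.kind === "managed" ? "runner" : "peer";
}
```

Checking at call time as well matters because a peer could name a service it learned about out of band, even though the filtered catalog never listed it.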
Implementation order
Phase 1: Foundation
- DB migration: mesh.service table, mesh.vault_entry table, extend mesh.skill
- Wire protocol: add all new message types to types.ts
- Vault: broker-side storage + CLI tools (vault_set, vault_list, vault_delete)
- Service catalog: mcp_catalog, mcp_schema, scope filtering in hello_ack
Phase 2: Execution engine
- Runner supervisor: service-manager.ts, child process spawn/kill/restart/health
- Docker container: base image, build + run lifecycle
- Deploy flow: zip extraction, git clone, runtime detection, npm install / pip install
- Tool call routing: broker routes managed service calls to runner
Phase 3: Native integration
- Launch integration: claudemesh launch writes mesh:* MCP entries to ~/.claude.json
- Stdio proxy: claudemesh mcp --service <name> thin proxy command
- Mid-session fallback: svc__* dynamic tools + list_changed on claudemesh MCP
- Session cleanup: stale entry sweep, PID checks, flock on config writes
Phase 4: Skill bundles
- Skill deploy: zip/git extraction, SKILL.md + skill.json parsing, manifest storage
- get_skill extension: returns structured file contents from bundle
Phase 5: Polish
- mesh_mcp_update: git pull + rebuild + restart flow
- Boot restore: re-spawn services on broker restart
- CLI commands: claudemesh deploy, claudemesh vault, claudemesh scope, claudemesh catalog
- Docs + example bundles: sample MCP server zip, sample skill bundle
Appendix: Claude Code MCP behavior (verified)
Key findings from Claude Code MCP architecture research that informed this spec. These are behaviors of Claude Code itself, not the MCP protocol.
Lifecycle
- MCP servers start when a session begins, stop when it ends
- No auto-restart on crash — next tool invocation fails. Our proxy must handle reconnection to the broker independently
- No health checks from Claude Code — failures discovered on tool use
- MCP_TIMEOUT env var controls tool call timeout
Dynamic tools
- notifications/tools/list_changed is supported and triggers an immediate re-fetch of tools/list; this works mid-conversation over stdio
- SSE/HTTP transport support for list_changed may be unreliable (known bug in some versions). This is why we use stdio proxies, not HTTP transport.
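For reference, the notification itself is a plain JSON-RPC 2.0 message with no `id`, written as one line on stdout under the MCP stdio transport. The sketch below builds that line and a dynamic tool name. The `svc__<service>__<tool>` layout is an assumption (the spec only fixes the `svc__` prefix), and both helper names are hypothetical.

```typescript
const SVC_PREFIX = "svc__";

// Assumed layout: svc__<service>__<tool>. Only the prefix comes from the spec.
function dynamicToolName(service: string, tool: string): string {
  return `${SVC_PREFIX}${service}__${tool}`;
}

// JSON-RPC notification: no `id` field, so the client sends no response.
// Stdio framing is one JSON message per line, hence the trailing newline.
function listChangedNotification(): string {
  const message = { jsonrpc: "2.0", method: "notifications/tools/list_changed" };
  return JSON.stringify(message) + "\n";
}
```

A server that sends this notification is also expected to have declared the `listChanged` tools capability during initialize; after a mid-session deploy, the proxy would write the line to stdout and then answer the client's follow-up tools/list with the new `svc__*` entries included.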
ToolSearch / deferred tools
- Enabled by default (ENABLE_TOOL_SEARCH=true)
- Only tool names are loaded at startup; full schemas are fetched on demand
- Requires Sonnet 4+ or Opus 4+ (Haiku does not support tool references)
- Adding 100+ MCP tools has near-zero context cost at startup
- Configurable: ENABLE_TOOL_SEARCH=auto:5 loads upfront if <5% of context
Tool output limits
- Warning at 10,000 tokens, hard limit at 25,000 tokens (default)
- Configurable via MAX_MCP_OUTPUT_TOKENS env var
- Per-tool override: _meta["anthropic/maxResultSizeChars"] (up to 500K chars)
Namespacing
- Tools namespaced as mcp__servername__toolname
- Two servers with same tool name → no conflict (different namespace)
- Server names normalized: spaces → underscores
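The namespacing rule can be illustrated with a few lines. This sketch applies only the normalization stated above (spaces to underscores); Claude Code may normalize other characters as well, and `qualifiedToolName` is a hypothetical helper name.

```typescript
// Builds the mcp__<server>__<tool> name under which a tool is exposed.
function qualifiedToolName(server: string, tool: string): string {
  const normalized = server.replace(/ /g, "_");
  return `mcp__${normalized}__${tool}`;
}
```

Because the server name is part of the qualified name, two mesh services that both expose a `search` tool still end up with distinct identifiers.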
Registration
- File-based only — no runtime API to add MCP servers
- Scopes: local (~/.claude.json), project (.mcp.json), user (~/.claude.json global)
- Precedence: local > project > user
- claude mcp add --scope user for global, --scope project for team-shared
- Cannot add new MCP server entries mid-session. This is why claudemesh launch pre-writes entries before spawning, and mid-session deploys fall back to dynamic svc__* tools on the claudemesh MCP server
Environment variables
- Passed via --env KEY=VALUE on claude mcp add
- .mcp.json supports ${VAR} and ${VAR:-default} expansion
- Special: ${CLAUDE_PLUGIN_ROOT}, ${CLAUDE_PLUGIN_DATA}
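The two expansion forms can be modeled with a single regular expression. This is an illustrative re-implementation, not Claude Code's own code. It assumes the default applies only when the variable is unset, and it throws when a variable is missing and has no default; both behaviors are assumptions of this sketch.

```typescript
// Expands ${VAR} and ${VAR:-default} against the given environment map.
function expandEnv(input: string, env: Record<string, string | undefined>): string {
  const pattern = /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}/g;
  return input.replace(pattern, (_match: string, name: string, fallback?: string) => {
    const value = env[name];
    if (value !== undefined) return value;
    if (fallback !== undefined) return fallback;
    throw new Error(`Missing environment variable: ${name}`);
  });
}
```

For example, `expandEnv("${API_BASE:-https://example.invalid}/v1", {})` falls back to the default, while the same call with `API_BASE` set uses the provided value.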
Implications for this spec
- Native MCP entries MUST be written before claude spawns → claudemesh launch flow
- Stdio transport is the only reliable path for list_changed → thin proxy model
- ToolSearch means 100+ mesh tools have negligible context cost
- No server dependencies → each mesh proxy is independent
- No auto-restart → proxies must reconnect to broker on their own