Add the foundation for deploying and managing MCP servers on the VPS broker, with per-peer credential vaults and visibility scopes. Architecture: - One Docker container per mesh with a Node supervisor - Each MCP server runs as a child process with its own stdio pipe - claudemesh launch installs native MCP entries in ~/.claude.json - Mid-session deploys fall back to svc__* dynamic tools + list_changed New components: - DB: mesh.service + mesh.vault_entry tables, mesh.skill extensions - Broker: 19 wire protocol types, 11 message handlers, service catalog in hello_ack with scope filtering, service-manager.ts (775 lines) - CLI: 13 tool definitions, 12 WS client methods, tool call handlers, startServiceProxy() for native MCP proxy mode - Launch: catalog fetch, native MCP entry install, stale sweep, cleanup, MCP_TIMEOUT=30s, MAX_MCP_OUTPUT_TOKENS=50k Security: path sanitization on service names, column whitelist on upsertService, returning()-based delete checks, vault E2E encryption. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1259 lines
45 KiB
Markdown
1259 lines
45 KiB
Markdown
# Mesh Services: MCP Servers & Skills Platform
|
|
|
|
> Consolidated spec for deploying, managing, and executing MCP servers
|
|
> and multi-file skills within a claudemesh mesh. Covers source modes,
|
|
> execution engine, credential vaults, access control, native Claude Code
|
|
> integration, and dynamic tool discovery.
|
|
|
|
---
|
|
|
|
## Problem
|
|
|
|
Today:
|
|
- **Skills** are a single `instructions` text field in Postgres. No multi-file support.
|
|
- **MCP servers** are live-proxied through the registering peer. When that peer disconnects, the server dies. The `persistent` flag is cosmetic.
|
|
- Neither supports bundled artifacts (templates, configs, schemas, example code).
|
|
- Claude Code has no way to discover mesh tools natively — peers must use the generic `mesh_tool_call` proxy.
|
|
|
|
## Design goals
|
|
|
|
1. Three source modes — inline, zip bundle, git repo — for both skills and MCP servers
|
|
2. MCP servers run on the VPS, not on peers — true 24/7 persistence
|
|
3. Sandboxed execution with resource limits
|
|
4. Native Claude Code tool integration — deployed MCPs appear as regular MCP server entries
|
|
5. Per-peer credential vault for secrets (OAuth tokens, API keys)
|
|
6. Visibility scopes on services — peer, group, role, or mesh-wide — deployer controls who can call, not who sees secrets
|
|
7. Dynamic mid-session discovery via `notifications/tools/list_changed`
|
|
8. All existing behavior preserved — inline skills and live-proxy MCPs unchanged
|
|
|
|
---
|
|
|
|
## Architecture overview
|
|
|
|
```
|
|
┌──────────────────────────────────────────────────────────────────┐
|
|
│ claudemesh launch --name Mou --mesh dev │
|
|
│ │
|
|
│ 1. Connect to broker, authenticate │
|
|
│ 2. Fetch service catalog (scope-filtered for this peer) │
|
|
│ 3. Write native MCP entries to ~/.claude.json: │
|
|
│ mesh:gmail, mesh:context7, mesh:whatsapp │
|
|
│ 4. Spawn claude │
|
|
│ 5. On exit: remove mesh:* entries │
|
|
└──────────┬───────────────────────────────────────────────────────┘
|
|
│
|
|
▼
|
|
┌──────────────────────────────────────────────────────────────────┐
|
|
│ Claude Code session │
|
|
│ │
|
|
│ MCP: claudemesh (stdio) │
|
|
│ ├── send_message, list_peers, set_summary, ... (peer comms) │
|
|
│ ├── mesh_mcp_deploy, mesh_mcp_scope, ... (service mgmt) │
|
|
│ ├── vault_set, vault_list, ... (credentials) │
|
|
│ └── mesh_mcp_schema (introspection)│
|
|
│ │
|
|
│ MCP: mesh:gmail (stdio proxy) → mcp__mesh_gmail__* │
|
|
│ MCP: mesh:context7 (stdio proxy) → mcp__mesh_context7__* │
|
|
│ MCP: mesh:whatsapp (stdio proxy) → mcp__mesh_whatsapp__* │
|
|
│ │
|
|
│ MCP: playwriter (stdio, local) → local MCPs as usual │
|
|
│ MCP: figma (stdio, local) │
|
|
└──────────┬───────────────────────────────────────────────────────┘
|
|
│ Each mesh:* proxy connects via WebSocket
|
|
▼
|
|
┌──────────────────────────────────────────────────────────────────┐
|
|
│ Broker (VPS — wss://ic.claudemesh.com/ws) │
|
|
│ │
|
|
│ Existing: message routing, presence, state, memory, files, ... │
|
|
│ │
|
|
│ New: Service Catalog │
|
|
│ ├── Scope enforcement (peer/group/role/mesh visibility) │
|
|
│ ├── Tool schema registry (from runner) │
|
|
│ ├── Deploy/undeploy/update commands │
|
|
│ └── System events: mcp_deployed, mcp_undeployed │
|
|
│ │
|
|
│ New: Vault │
|
|
│ └── Per-peer encrypted credential storage │
|
|
│ │
|
|
│ Tool call routing: │
|
|
│ ├── Managed service? → forward to runner │
|
|
│ └── Live proxy? → forward to hosting peer (existing) │
|
|
└──────────┬───────────────────────────────────────────────────────┘
|
|
│ stdio (child process)
|
|
▼
|
|
┌──────────────────────────────────────────────────────────────────┐
|
|
│ Runner (one Docker container per mesh) │
|
|
│ │
|
|
│ Supervisor (Node main thread) │
|
|
│ ├── stdin/stdout ↔ broker (JSON-RPC multiplexed) │
|
|
│ ├── Routes tool calls by service name │
|
|
│ ├── Lifecycle: load / unload / restart │
|
|
│ ├── Health: MCP ping per child, restart on 3 failures │
|
|
│ ├── Logs: 1000-line ring buffer per service │
|
|
│ └── Vault: decrypts credentials at spawn time │
|
|
│ │
|
|
│ Child processes (one per MCP server): │
|
|
│ ├── child_process.spawn("node", [...]) ← Node MCP servers │
|
|
│ ├── child_process.spawn("uvx", [...]) ← Python MCP servers │
|
|
│ ├── child_process.spawn("npx", [...]) ← npm MCP packages │
|
|
│ │ │
|
|
│ │ Each child: │
|
|
│ │ ├── Own stdio pipe (MCP protocol) │
|
|
│ │ ├── Own env vars (including vault-resolved secrets) │
|
|
│ │ ├── Own /secrets/<name>/ dir (vault files) │
|
|
│ │ └── Killed individually on undeploy │
|
|
│ │ │
|
|
│ Base image: node:22 + python3.12 + uv + npx │
|
|
│ Limits: --memory=512m --cpus=1 --network=mesh-restricted │
|
|
└──────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
---
|
|
|
|
## Source modes
|
|
|
|
### 1. Inline (existing, unchanged)
|
|
|
|
```
|
|
share_skill(name, description, instructions, tags) ← text-only skill
|
|
mesh_mcp_register(server_name, description, tools) ← live peer proxy
|
|
```
|
|
|
|
### 2. Zip bundle
|
|
|
|
Upload a zip, then deploy:
|
|
|
|
```
|
|
1. share_file(path="./my-server.zip", tags=["mcp-bundle"]) → fileId
|
|
2. mesh_mcp_deploy(file_id=fileId, server_name="my-server", config={...})
|
|
```
|
|
|
|
**MCP server zip structure:**
|
|
```
|
|
my-mcp-server/
|
|
├── package.json # or pyproject.toml / requirements.txt
|
|
├── src/index.ts # MCP server entry (stdio transport)
|
|
├── .env.example # declares required env vars
|
|
└── README.md
|
|
```
|
|
|
|
**Skill bundle zip structure:**
|
|
```
|
|
my-skill/
|
|
├── SKILL.md # instructions (replaces inline text)
|
|
├── skill.json # { name, description, tags }
|
|
├── templates/ # prompt templates, examples
|
|
└── schemas/ # JSON schemas, configs
|
|
```
|
|
|
|
### 3. Git repository
|
|
|
|
```
|
|
mesh_mcp_deploy(
|
|
git_url="https://github.com/user/my-mcp-server.git",
|
|
branch="main",
|
|
server_name="my-server",
|
|
config={ env: { API_KEY: "$vault:my-api-key" } }
|
|
)
|
|
```
|
|
|
|
- Shallow clone (`--depth 1`)
|
|
- Commit SHA pinned in DB for auditability
|
|
- `mesh_mcp_update(server_name)` → git pull + rebuild + restart
|
|
- Auth via `config.git_auth` (stored encrypted, never logged)
|
|
|
|
---
|
|
|
|
## Execution engine
|
|
|
|
### Why child processes, not worker threads
|
|
|
|
MCP servers use **stdio transport** — each server owns its stdin/stdout via
|
|
`StdioServerTransport`. Two servers can't share one process. Worker threads
|
|
don't help because:
|
|
- MCP SDK `StdioServerTransport` takes over process stdin/stdout
|
|
- `npx @package/mcp-server` spawns its own process anyway
|
|
- Python MCPs need a Python runtime, not a Node thread
|
|
|
|
The runner spawns each MCP server as a **child process** with its own stdio
|
|
pipe, exactly how every MCP server is designed to work.
|
|
|
|
### Container design: one per mesh
|
|
|
|
```
|
|
┌─ Docker container (mesh: "dev") ─────────────────┐
|
|
│ │
|
|
│ Supervisor (Node main thread) │
|
|
│ ├─ stdio ↔ broker │
|
|
│ ├─ routes calls by service name │
|
|
│ │ │
|
|
│ ├─ spawn("npx", ["@upstash/context7-mcp"]) │
|
|
│ │ └─ stdio pipe ↔ MCP protocol │
|
|
│ ├─ spawn("node", ["dist/index.js"]) │
|
|
│ │ └─ stdio pipe ↔ MCP protocol │
|
|
│ ├─ spawn("uvx", ["mcp-outline"]) │
|
|
│ │ └─ stdio pipe ↔ MCP protocol │
|
|
│ └─ spawn("python", ["-m", "server"]) │
|
|
│ └─ stdio pipe ↔ MCP protocol │
|
|
│ │
|
|
│ Base: node:22 + python3.12 + uv + npx │
|
|
│ Limits: --memory=512m --cpus=1 │
|
|
│ Network: mesh-restricted bridge (allowlist) │
|
|
└───────────────────────────────────────────────────┘
|
|
```
|
|
|
|
**Why one container, not N:**
|
|
- One Docker process to manage, one cgroup for the whole mesh
|
|
- One network namespace — single firewall config
|
|
- Shared node_modules / pip cache across services
|
|
- VPS resources: 8 vCores / 24GB — N containers exhausts memory fast
|
|
|
|
**Why not zero containers (bare child processes on the broker):**
|
|
- Broker stays routing-only — runner crashes don't take it down
|
|
- Security boundary — runner can't access broker's DB or filesystem
|
|
- Runner can be on a different machine later (NUC, second VPS)
|
|
|
|
### Supervisor protocol
|
|
|
|
Broker ↔ runner communicate over the container's stdin/stdout as JSON lines:
|
|
|
|
```typescript
|
|
// Broker → runner
|
|
{ action: "load", name: "gmail", path: "/services/gmail", env: {...} }
|
|
{ action: "call", name: "gmail", tool: "search_emails", args: {...}, callId: "abc" }
|
|
{ action: "unload", name: "gmail" }
|
|
{ action: "health", name: "gmail" }
|
|
{ action: "list_tools", name: "gmail" }
|
|
|
|
// Runner → broker
|
|
{ callId: "abc", result: {...} }
|
|
{ callId: "abc", error: "connection refused" }
|
|
{ type: "loaded", name: "gmail", tools: [{name, description, inputSchema}] }
|
|
{ type: "unloaded", name: "gmail" }
|
|
{ type: "crashed", name: "gmail", restarts: 3, error: "OOM" }
|
|
{ type: "health", name: "gmail", ok: true, rssKb: 45000 }
|
|
```
|
|
|
|
### Runtime auto-detection
|
|
|
|
| File found | Runtime | Spawn command |
|
|
|---|---|---|
|
|
| `package.json` | node | `npm install && node <main>` |
|
|
| `package.json` with npx hint | node | `npx <package>` |
|
|
| `pyproject.toml` | python | `pip install . && python -m <module>` |
|
|
| `requirements.txt` | python | `pip install -r requirements.txt && python <entry>` |
|
|
| `Bunfile` or `bun.lockb` | bun | `bun install && bun <entry>` |
|
|
|
|
### Health & restart
|
|
|
|
- Supervisor sends MCP `ping` to each child every 30s
|
|
- No response within 5s → mark unhealthy
|
|
- 3 consecutive failures → restart (kill + re-spawn)
|
|
- Max 5 restarts → status=`crashed`, notify deployer via mesh system event
|
|
- On crash: `{ type: "push", event: "mcp_crashed", eventData: { name, error, restarts } }`
|
|
|
|
### Logs
|
|
|
|
Per-service ring buffer (1000 lines). Captures child's stderr + stdout
|
|
(excluding MCP protocol JSON). Accessible via `mesh_mcp_logs(name, lines?)`.
|
|
|
|
### Storage layout
|
|
|
|
```
|
|
/var/claudemesh/services/
|
|
├── <meshId>/
|
|
│ ├── <serviceName>/
|
|
│ │ ├── source/ # extracted zip or git clone
|
|
│ │ ├── secrets/ # vault-resolved credential files
|
|
│ │ ├── node_modules/ # or .venv/ for Python
|
|
│ │ └── .meta.json # { pid, startedAt, sha, runtime }
|
|
```
|
|
|
|
### Network policy
|
|
|
|
Default: `--network=mesh-restricted` (Docker bridge with outbound deny-all).
|
|
|
|
Per-service allowlist in deploy config:
|
|
```json
|
|
{
|
|
"network_allow": [
|
|
"gmail.googleapis.com:443",
|
|
"oauth2.googleapis.com:443",
|
|
"100.113.153.45:*"
|
|
]
|
|
}
|
|
```
|
|
|
|
Implemented via iptables rules on the bridge, or per-container `--add-host`
|
|
entries combined with a proxy. For Tailscale-accessible services (NUC, etc.),
|
|
allow the Tailscale IP.
|
|
|
|
---
|
|
|
|
## Credential vault
|
|
|
|
### Design
|
|
|
|
Per-peer encrypted storage on the broker. Credentials never leave the vault
|
|
in plaintext — decrypted only inside the runner container at spawn time.
|
|
|
|
Peers don't share credentials. They share **access to the running MCP
|
|
server** via scopes. The MCP server runs with the deployer's credentials;
|
|
other peers call it without ever seeing the secrets.
|
|
|
|
### Encryption model
|
|
|
|
Same crypto as E2E file sharing (`crypto/file-crypto.ts`):
|
|
|
|
1. Peer generates random symmetric key
|
|
2. Encrypts the credential with `crypto_secretbox` (symmetric)
|
|
3. Seals the symmetric key with their own pubkey (`crypto_box`)
|
|
4. Stores sealed key + ciphertext on broker — broker sees only ciphertext
|
|
5. At spawn time: runner requests decryption from the deployer's sealed key
|
|
(the runner holds a mesh-scoped keypair granted by the deployer at deploy time)
|
|
|
|
### Vault reference syntax
|
|
|
|
In `mesh_mcp_deploy` env config, `$vault:` prefix triggers vault resolution:
|
|
|
|
```
|
|
$vault:api-key → inject as env var
|
|
$vault:gmail-creds:file:/secrets/creds.json → decrypt, write to file, set env var to path
|
|
```
|
|
|
|
Examples:
|
|
```typescript
|
|
mesh_mcp_deploy({
|
|
server_name: "gmail",
|
|
git_url: "https://github.com/gongrzhe/server-gmail-autoauth-mcp",
|
|
env: {
|
|
GMAIL_CREDENTIALS_PATH: "$vault:gmail-creds:file:/secrets/credentials.json",
|
|
GMAIL_OAUTH_PATH: "$vault:gmail-oauth:file:/secrets/gcp-oauth.keys.json",
|
|
},
|
|
network_allow: ["gmail.googleapis.com:443", "oauth2.googleapis.com:443"],
|
|
})
|
|
```
|
|
|
|
### MCP tools
|
|
|
|
```
|
|
vault_set(key, value, type?, mount_path?) — encrypt + store
|
|
value: string (env var) or local file path (reads + encrypts the file)
|
|
type: "env" (default) or "file"
|
|
mount_path: for files, where to write inside the service dir
|
|
|
|
vault_list() — list keys (no values, metadata only)
|
|
vault_delete(key) — remove entry
|
|
```
|
|
|
|
### DB schema
|
|
|
|
```sql
|
|
CREATE TABLE mesh.vault_entry (
|
|
id TEXT PRIMARY KEY,
|
|
mesh_id TEXT NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
|
|
member_id TEXT NOT NULL REFERENCES mesh.member(id),
|
|
key TEXT NOT NULL,
|
|
|
|
-- E2E encrypted content
|
|
ciphertext BYTEA NOT NULL,
|
|
nonce BYTEA NOT NULL,
|
|
sealed_key BYTEA NOT NULL, -- symmetric key sealed with peer's pubkey
|
|
|
|
-- Metadata (plaintext)
|
|
entry_type TEXT DEFAULT 'env' CHECK (entry_type IN ('env', 'file')),
|
|
mount_path TEXT,
|
|
description TEXT,
|
|
|
|
created_at TIMESTAMP DEFAULT now(),
|
|
updated_at TIMESTAMP DEFAULT now(),
|
|
|
|
UNIQUE (mesh_id, member_id, key)
|
|
);
|
|
```
|
|
|
|
---
|
|
|
|
## Visibility scopes
|
|
|
|
### Model
|
|
|
|
Scopes control who can see and call a service. Credentials are invisible to
|
|
callers — they interact with the running service, not the secrets behind it.
|
|
The deployer controls visibility; the vault handles secrets separately.
|
|
|
|
### Scope levels
|
|
|
|
| Scope | Who sees it | Use case |
|
|
|---|---|---|
|
|
| `peer` | Only the deployer (default) | Personal tools, staging before publish |
|
|
| `{ peers: [...] }` | Named peers | Shared between specific people |
|
|
| `{ group: "eng" }` | All @eng members | Team-specific tools |
|
|
| `{ groups: ["eng", "ops"] }` | Multiple groups | Cross-team tools |
|
|
| `{ role: "lead" }` | Any peer with that role | Role-gated admin tools |
|
|
| `mesh` | Everyone in the mesh | Shared utilities |
|
|
|
|
### Examples
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────┐
|
|
│ Mesh: "dev-team" │
|
|
│ │
|
|
│ mesh scope ─── everyone │
|
|
│ ├── context7 (utility) │
|
|
│ ├── youtube-transcript │
|
|
│ └── mesh-db (shared database) │
|
|
│ │
|
|
│ group scope ─── @group members only │
|
|
│ ├── @eng │
|
|
│ │ ├── github-mcp (eng team's GitHub) │
|
|
│ │ └── ssh-manager (eng infra access) │
|
|
│ ├── @sales │
|
|
│ │ ├── apollo-io (sales CRM) │
|
|
│ │ └── gmail (sales@ inbox) │
|
|
│ └── @ops │
|
|
│ ├── stalwart-mail (mail server admin) │
|
|
│ └── namecheap (DNS management) │
|
|
│ │
|
|
│ role scope ─── by role tag │
|
|
│ ├── lead → mesh-admin-tools (deploy, vault) │
|
|
│ └── observer → (read-only MCPs only) │
|
|
│ │
|
|
│ peer scope ─── only specific peers │
|
|
│ ├── Alejandro │
|
|
│ │ ├── gmail-personal (my inbox) │
|
|
│ │ └── gworkspace (my workspace) │
|
|
│ └── Mou │
|
|
│ └── cursor-composer (Mou's Cursor) │
|
|
│ │
|
|
└─────────────────────────────────────────────────┘
|
|
```
|
|
|
|
### Deploy with scope
|
|
|
|
```typescript
|
|
// Mesh scope — everyone
|
|
mesh_mcp_deploy({
|
|
server_name: "context7",
|
|
source: { type: "git", url: "..." },
|
|
scope: "mesh",
|
|
})
|
|
|
|
// Group scope — only @eng
|
|
mesh_mcp_deploy({
|
|
server_name: "github-mcp",
|
|
source: { type: "git", url: "..." },
|
|
scope: { group: "eng" },
|
|
})
|
|
|
|
// Multi-group
|
|
mesh_mcp_deploy({
|
|
server_name: "ssh-manager",
|
|
scope: { groups: ["eng", "ops"] },
|
|
})
|
|
|
|
// Role scope — only leads
|
|
mesh_mcp_deploy({
|
|
server_name: "mesh-admin",
|
|
scope: { role: "lead" },
|
|
})
|
|
|
|
// Peer scope — just me (default)
|
|
mesh_mcp_deploy({
|
|
server_name: "gmail-personal",
|
|
scope: "peer",
|
|
})
|
|
|
|
// Specific peers
|
|
mesh_mcp_deploy({
|
|
server_name: "shared-workspace",
|
|
scope: { peers: ["Mou", "Alejandro"] },
|
|
})
|
|
```
|
|
|
|
### Enforcement
|
|
|
|
- **At catalog time:** broker filters the service catalog by scope before
|
|
sending to peers in `hello_ack`. The peer's groups and role (from `hello`)
|
|
are matched against each service's scope. A tool you can't access never
|
|
appears in Claude's tool list.
|
|
- **At call time:** broker re-checks scope before routing. Double-check
|
|
in case catalog is stale or the peer's groups changed.
|
|
|
|
### Scope resolution logic
|
|
|
|
```typescript
|
|
function peerCanAccess(service: Service, peer: PeerConn): boolean {
|
|
const scope = service.scope;
|
|
if (typeof scope === "string") {
|
|
if (scope === "peer") return service.deployed_by === peer.memberId;
|
|
if (scope === "mesh") return true;
|
|
}
|
|
if ("peers" in scope) {
|
|
return scope.peers.some(p =>
|
|
p === peer.memberId || p === peer.displayName);
|
|
}
|
|
if ("group" in scope) {
|
|
return peer.groups.some(g => g.name === scope.group);
|
|
}
|
|
if ("groups" in scope) {
|
|
return peer.groups.some(g => scope.groups.includes(g.name));
|
|
}
|
|
if ("role" in scope) {
|
|
return peer.groups.some(g => g.role === scope.role);
|
|
}
|
|
return false;
|
|
}
|
|
```
|
|
|
|
### MCP tools
|
|
|
|
```
|
|
mesh_mcp_scope(server_name, scope?)
|
|
scope set: mesh_mcp_scope("gmail", { group: "sales" })
|
|
scope read: mesh_mcp_scope("gmail") → { scope, deployed_by }
|
|
```
|
|
|
|
### Scope change events
|
|
|
|
When a scope changes, the broker:
|
|
1. Computes which peers gained/lost access
|
|
2. Sends `mcp_scope_changed` system event to affected peers
|
|
3. Peers who gained access get `svc__*` dynamic tools via `list_changed`
|
|
4. Peers who lost access get tools removed via `list_changed`
|
|
5. Full native access requires session restart
|
|
|
|
### DB
|
|
|
|
Single column on `mesh.service`:
|
|
|
|
```sql
|
|
scope JSONB DEFAULT '{"type": "peer"}'
|
|
-- {"type": "peer"}
|
|
-- {"type": "mesh"}
|
|
-- {"type": "peers", "allow": ["member_id_1", "member_id_2"]}
|
|
-- {"type": "group", "group": "eng"}
|
|
-- {"type": "groups", "groups": ["eng", "ops"]}
|
|
-- {"type": "role", "role": "lead"}
|
|
```
|
|
|
|
### Future: cross-mesh scope
|
|
|
|
Not for v1. Each mesh is isolated. The schema supports it later:
|
|
|
|
```json
|
|
{"type": "cross_mesh", "meshes": ["dev", "staging"]}
|
|
```
|
|
|
|
A service deployed in `dev` visible in `staging`. Requires the runner to be
|
|
accessible from both meshes (possible since it's on the VPS).
|
|
|
|
---
|
|
|
|
## Native Claude Code integration
|
|
|
|
### Goal
|
|
|
|
Deployed mesh MCPs feel indistinguishable from locally installed MCP servers.
|
|
Claude sees `mcp__mesh_gmail__search_emails` — not `mesh_tool_call("gmail", ...)`.
|
|
|
|
### At session start: native MCP entries
|
|
|
|
`claudemesh launch` queries the broker for the scope-filtered service catalog
|
|
and installs each service as a native MCP entry before spawning Claude:
|
|
|
|
```typescript
|
|
// commands/launch.ts — extended flow
|
|
|
|
// Step 3 (new): fetch service catalog from broker
|
|
const catalog = await fetchServiceCatalog(mesh);
|
|
|
|
// Step 4 (new): write mesh MCP entries to ~/.claude.json
|
|
for (const service of catalog) {
|
|
addMcpEntry(`mesh:${service.name}`, {
|
|
command: "claudemesh",
|
|
args: ["mcp", "--service", service.name],
|
|
});
|
|
}
|
|
|
|
// Step 5: spawn claude with mesh-aware env
|
|
const child = spawn("claude", claudeArgs, {
|
|
env: {
|
|
...process.env,
|
|
CLAUDEMESH_CONFIG_DIR: tmpDir,
|
|
CLAUDEMESH_DISPLAY_NAME: displayName,
|
|
// Mesh calls traverse: proxy → WS → broker → runner → child.
|
|
// Default MCP timeout is too short for this chain.
|
|
MCP_TIMEOUT: process.env.MCP_TIMEOUT ?? "30000",
|
|
// Mesh MCPs may return large results (DB queries, file contents).
|
|
MAX_MCP_OUTPUT_TOKENS: process.env.MAX_MCP_OUTPUT_TOKENS ?? "50000",
|
|
},
|
|
});
|
|
|
|
// Step 6 (extended): cleanup mesh:* entries on exit
|
|
child.on("exit", () => {
|
|
removeMcpEntries("mesh:*");
|
|
cleanup(); // existing tmpdir cleanup
|
|
});
|
|
```
|
|
|
|
Each `claudemesh mcp --service <name>` is a thin stdio proxy:
|
|
|
|
```typescript
|
|
// Thin proxy: connects to broker, serves ONE service's tools
|
|
const client = new BrokerClient(mesh);
|
|
await client.connect();
|
|
const tools = await client.getServiceTools(serviceName);
|
|
|
|
server.setRequestHandler(ListToolsRequestSchema, () => ({ tools }));
|
|
server.setRequestHandler(CallToolRequestSchema, async (req) => {
|
|
// Wait for broker reconnection if WS is down (up to 10s)
|
|
if (client.status !== "open") {
|
|
const connected = await client.waitForConnection(10_000);
|
|
if (!connected) {
|
|
return text("Service temporarily unavailable — broker reconnecting. Retry in a few seconds.", true);
|
|
}
|
|
}
|
|
return await client.mcpCall(serviceName, req.params.name, req.params.arguments);
|
|
});
|
|
```
|
|
|
|
**Resilience notes:**
|
|
- The `BrokerClient` handles WS reconnection with exponential backoff (1s→30s)
|
|
- Claude Code does NOT auto-restart crashed MCP servers — if the proxy
|
|
process itself dies, those tools vanish until session restart
|
|
- The proxy should catch all exceptions and return MCP errors, never crash
|
|
- `claudemesh doctor` diagnoses dead proxy processes mid-session
|
|
|
|
**Result:** Claude Code starts and sees:
|
|
```
|
|
mcp__mesh_gmail__search_emails ← proper namespace, full schema
|
|
mcp__mesh_gmail__send_email ← deferred by ToolSearch automatically
|
|
mcp__mesh_context7__query_docs ← native MCP, no indirection
|
|
```
|
|
|
|
### Session management
|
|
|
|
**Safe `~/.claude.json` modification:**
|
|
- `~/.claude.json` stores MCP entries AND other Claude Code config (permissions,
|
|
env vars, etc.). Never overwrite the whole file.
|
|
- Read-modify-write: load full JSON → add/remove only `mesh:*` keys in
|
|
`mcpServers` → write back. Preserve all other keys.
|
|
- Use `flock` on writes to prevent concurrent session corruption.
|
|
|
|
**Stale entry cleanup:**
|
|
- Each `mesh:*` entry includes `_meshSession` metadata with PID and timestamp
|
|
- `claudemesh launch` sweeps stale entries on startup (dead PID check)
|
|
- `claudemesh doctor` reports orphaned entries
|
|
|
|
**Concurrent sessions:**
|
|
- Entries are session-scoped: `mesh:gmail:w1t0p0` (includes session ID)
|
|
- Each session manages only its own entries
|
|
|
|
### Mid-session deploys: dynamic tools
|
|
|
|
When a service is deployed after the Claude session started, native MCP entries
|
|
can't be added (Claude Code doesn't support adding new MCP servers mid-session).
|
|
|
|
**Two-tier fallback:**
|
|
|
|
1. **Claudemesh MCP fires `notifications/tools/list_changed`** (stdio, proven to work)
|
|
- Adds `svc__<name>__<tool>` tools to its own `tools/list`
|
|
- Claude sees them as `mcp__claudemesh__svc__gmail__search_emails`
|
|
- Works, but namespacing is less clean than native
|
|
|
|
2. **System notification tells the peer:**
|
|
```
|
|
[mesh] Service deployed: "namecheap" by Alejandro (3 tools).
|
|
Available now via mesh_tool_call("namecheap", "domains_list", {...}).
|
|
Restart session for native mcp__mesh_namecheap__* access.
|
|
```
|
|
|
|
3. **`mesh_tool_call` remains the universal fallback** — works for any
|
|
service at any time, native or not.
|
|
|
|
### Mid-session undeploys
|
|
|
|
When a service is undeployed, the native proxy process detects the broker
|
|
event and exits gracefully. Claude Code sees the MCP server disconnect and
|
|
stops offering those tools. No `list_changed` needed — MCP server death
|
|
is already handled.
|
|
|
|
### Schema introspection
|
|
|
|
For programmatic access to tool schemas (building workflows, debugging):
|
|
|
|
```
|
|
mesh_mcp_schema(server_name) → all tools with full inputSchema
|
|
mesh_mcp_schema(server_name, tool_name) → one specific tool's schema
|
|
mesh_mcp_catalog() → all services with tool counts, scope, status
|
|
```
|
|
|
|
---
|
|
|
|
## Database changes
|
|
|
|
### New table: `mesh.service`
|
|
|
|
```sql
|
|
CREATE TABLE mesh.service (
|
|
id TEXT PRIMARY KEY,
|
|
mesh_id TEXT NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
|
|
name TEXT NOT NULL,
|
|
type TEXT NOT NULL CHECK (type IN ('mcp', 'skill')),
|
|
|
|
-- Source
|
|
source_type TEXT NOT NULL CHECK (source_type IN ('inline', 'zip', 'git')),
|
|
source_file_id TEXT REFERENCES mesh.file(id),
|
|
source_git_url TEXT,
|
|
source_git_branch TEXT DEFAULT 'main',
|
|
source_git_sha TEXT,
|
|
prev_git_sha TEXT, -- for rollback
|
|
|
|
-- Content
|
|
description TEXT NOT NULL,
|
|
instructions TEXT, -- skills only
|
|
tools_schema JSONB, -- MCPs: [{ name, description, inputSchema }]
|
|
|
|
-- Bundle
|
|
manifest JSONB, -- { files: [...], entry: "src/index.ts" }
|
|
|
|
-- Execution (MCPs only)
|
|
runtime TEXT CHECK (runtime IN ('node', 'python', 'bun', NULL)),
|
|
status TEXT DEFAULT 'stopped'
|
|
CHECK (status IN ('building', 'installing', 'running',
|
|
'stopped', 'failed', 'crashed', 'restarting')),
|
|
config JSONB DEFAULT '{}', -- resource limits, network policy
|
|
last_health TIMESTAMP,
|
|
restart_count INT DEFAULT 0,
|
|
version INT DEFAULT 1,
|
|
|
|
-- Visibility scope
|
|
scope JSONB DEFAULT '{"type": "peer"}',
|
|
|
|
-- Metadata
|
|
deployed_by TEXT REFERENCES mesh.member(id),
|
|
deployed_by_name TEXT,
|
|
created_at TIMESTAMP DEFAULT now() NOT NULL,
|
|
updated_at TIMESTAMP DEFAULT now() NOT NULL,
|
|
|
|
UNIQUE (mesh_id, name)
|
|
);
|
|
```
|
|
|
|
### New table: `mesh.vault_entry`
|
|
|
|
```sql
|
|
CREATE TABLE mesh.vault_entry (
|
|
id TEXT PRIMARY KEY,
|
|
mesh_id TEXT NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
|
|
member_id TEXT NOT NULL REFERENCES mesh.member(id),
|
|
key TEXT NOT NULL,
|
|
ciphertext BYTEA NOT NULL,
|
|
nonce BYTEA NOT NULL,
|
|
sealed_key BYTEA NOT NULL,
|
|
entry_type TEXT DEFAULT 'env' CHECK (entry_type IN ('env', 'file')),
|
|
mount_path TEXT,
|
|
description TEXT,
|
|
created_at TIMESTAMP DEFAULT now(),
|
|
updated_at TIMESTAMP DEFAULT now(),
|
|
UNIQUE (mesh_id, member_id, key)
|
|
);
|
|
```
|
|
|
|
### Extend `mesh.skill` (backward compat)
|
|
|
|
```sql
|
|
ALTER TABLE mesh.skill
|
|
ADD COLUMN source_type TEXT DEFAULT 'inline'
|
|
CHECK (source_type IN ('inline', 'zip', 'git')),
|
|
ADD COLUMN bundle_file_id TEXT REFERENCES mesh.file(id),
|
|
ADD COLUMN git_url TEXT,
|
|
ADD COLUMN git_branch TEXT DEFAULT 'main',
|
|
ADD COLUMN git_sha TEXT,
|
|
ADD COLUMN manifest JSONB;
|
|
```
|
|
|
|
---
|
|
|
|
## Wire protocol additions
|
|
|
|
### Client → broker
|
|
|
|
```typescript
|
|
// --- Service deployment ---
|
|
|
|
interface WSMcpDeployMessage {
|
|
type: "mcp_deploy";
|
|
server_name: string;
|
|
source:
|
|
| { type: "zip"; file_id: string }
|
|
| { type: "git"; url: string; branch?: string; auth?: string };
|
|
config?: {
|
|
env?: Record<string, string>; // supports $vault: refs
|
|
memory_mb?: number; // default 256
|
|
cpus?: number; // default 0.5
|
|
network_allow?: string[]; // default: none
|
|
runtime?: "node" | "python" | "bun";
|
|
};
|
|
scope?:
|
|
| "peer" // private (default)
|
|
| "mesh" // everyone
|
|
| { peers: string[] } // named peers
|
|
| { group: string } // single group
|
|
| { groups: string[] } // multiple groups
|
|
| { role: string }; // by role tag
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpUndeployMessage {
|
|
type: "mcp_undeploy";
|
|
server_name: string;
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpUpdateMessage {
|
|
type: "mcp_update";
|
|
server_name: string;
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpLogsMessage {
|
|
type: "mcp_logs";
|
|
server_name: string;
|
|
lines?: number; // default 50, max 1000
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpScopeMessage {
|
|
type: "mcp_scope";
|
|
server_name: string;
|
|
scope?: // set — omit to read current
|
|
| "peer"
|
|
| "mesh"
|
|
| { peers: string[] }
|
|
| { group: string }
|
|
| { groups: string[] }
|
|
| { role: string };
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpSchemaMessage {
|
|
type: "mcp_schema";
|
|
server_name: string;
|
|
tool_name?: string; // omit for all tools
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpCatalogMessage {
|
|
type: "mcp_catalog";
|
|
_reqId?: string;
|
|
}
|
|
|
|
// --- Skill deployment ---
|
|
|
|
interface WSSkillDeployMessage {
|
|
type: "skill_deploy";
|
|
source:
|
|
| { type: "zip"; file_id: string }
|
|
| { type: "git"; url: string; branch?: string; auth?: string };
|
|
_reqId?: string;
|
|
}
|
|
|
|
// --- Vault ---
|
|
|
|
interface WSVaultSetMessage {
|
|
type: "vault_set";
|
|
key: string;
|
|
ciphertext: string; // base64
|
|
nonce: string; // base64
|
|
sealed_key: string; // base64
|
|
entry_type: "env" | "file";
|
|
mount_path?: string;
|
|
description?: string;
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSVaultListMessage {
|
|
type: "vault_list";
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSVaultDeleteMessage {
|
|
type: "vault_delete";
|
|
key: string;
|
|
_reqId?: string;
|
|
}
|
|
```
|
|
|
|
### Broker → client
|
|
|
|
```typescript
|
|
// --- Service responses ---
|
|
|
|
interface WSMcpDeployStatusMessage {
|
|
type: "mcp_deploy_status";
|
|
server_name: string;
|
|
status: "building" | "installing" | "running" | "failed";
|
|
tools?: Array<{ name: string; description: string; inputSchema: object }>;
|
|
error?: string;
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpLogsResultMessage {
|
|
type: "mcp_logs_result";
|
|
server_name: string;
|
|
lines: string[];
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpSchemaResultMessage {
|
|
type: "mcp_schema_result";
|
|
server_name: string;
|
|
tools: Array<{ name: string; description: string; inputSchema: object }>;
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpCatalogResultMessage {
|
|
type: "mcp_catalog_result";
|
|
services: Array<{
|
|
name: string;
|
|
type: "mcp" | "skill";
|
|
description: string;
|
|
status: string;
|
|
tool_count: number;
|
|
deployed_by: string;
|
|
scope: { type: string; [key: string]: unknown };
|
|
source_type: string;
|
|
runtime?: string;
|
|
created_at: string;
|
|
}>;
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSMcpScopeResultMessage {
|
|
type: "mcp_scope_result";
|
|
server_name: string;
|
|
scope: { type: string; [key: string]: unknown };
|
|
deployed_by: string;
|
|
_reqId?: string;
|
|
}
|
|
|
|
// --- Skill responses ---
|
|
|
|
interface WSSkillDeployAckMessage {
|
|
type: "skill_deploy_ack";
|
|
name: string;
|
|
files: string[];
|
|
_reqId?: string;
|
|
}
|
|
|
|
// --- Vault responses ---
|
|
|
|
interface WSVaultAckMessage {
|
|
type: "vault_ack";
|
|
key: string;
|
|
action: "stored" | "deleted" | "not_found";
|
|
_reqId?: string;
|
|
}
|
|
|
|
interface WSVaultListResultMessage {
|
|
type: "vault_list_result";
|
|
entries: Array<{
|
|
key: string;
|
|
entry_type: "env" | "file";
|
|
mount_path?: string;
|
|
description?: string;
|
|
updated_at: string;
|
|
}>;
|
|
_reqId?: string;
|
|
}
|
|
|
|
// --- System events (broadcast to mesh) ---
|
|
|
|
// Sent as WSPushMessage with subtype: "system"
|
|
// event: "mcp_deployed"
|
|
// eventData: { name, description, tool_count, deployed_by, scope, tools: [...] }
|
|
|
|
// event: "mcp_undeployed"
|
|
// eventData: { name, by }
|
|
|
|
// event: "mcp_crashed"
|
|
// eventData: { name, error, restarts }
|
|
|
|
// event: "mcp_updated"
|
|
// eventData: { name, prev_sha, new_sha, tools: [...] }
|
|
```
|
|
|
|
### Extended `hello_ack`
|
|
|
|
```typescript
|
|
interface WSHelloAckMessage {
|
|
// ... existing fields ...
|
|
|
|
/** Scope-filtered service catalog for this peer. */
|
|
services?: Array<{
|
|
name: string;
|
|
description: string;
|
|
status: string;
|
|
tools: Array<{ name: string; description: string; inputSchema: object }>;
|
|
deployed_by: string;
|
|
}>;
|
|
}
|
|
```
|
|
|
|
---
|
|
|
|
## MCP tool additions (CLI)
|
|
|
|
### Service management tools
|
|
|
|
```typescript
|
|
mesh_mcp_deploy(server_name, file_id?, git_url?, git_branch?, env?, runtime?,
|
|
memory_mb?, network_allow?, scope?)
|
|
mesh_mcp_undeploy(server_name)
|
|
mesh_mcp_update(server_name) // git-only: pull + rebuild + restart
|
|
mesh_mcp_logs(server_name, lines?)
|
|
mesh_mcp_scope(server_name, scope?) // set or read visibility scope
|
|
mesh_mcp_schema(server_name, tool?) // introspect tool schemas
|
|
mesh_mcp_catalog() // list all services with status
|
|
mesh_skill_deploy(file_id?, git_url?, git_branch?)
|
|
```
|
|
|
|
### Vault tools
|
|
|
|
```typescript
|
|
vault_set(key, value, type?, mount_path?, description?)
|
|
vault_list()
|
|
vault_delete(key)
|
|
```
|
|
|
|
### Existing tools (unchanged)
|
|
|
|
```typescript
|
|
share_skill(name, description, instructions, tags) // inline skills
|
|
mesh_mcp_register(server_name, description, tools) // live peer proxy
|
|
mesh_tool_call(server_name, tool_name, args) // universal fallback
|
|
mesh_mcp_list() // shows both proxy + managed
|
|
```
|
|
|
|
---
|
|
|
|
## Broker-side service manager
|
|
|
|
New file: `apps/broker/src/service-manager.ts`
|
|
|
|
### Interface
|
|
|
|
```typescript
|
|
interface ServiceManager {
|
|
deploy(opts: {
|
|
meshId: string;
|
|
name: string;
|
|
source: { type: "zip"; fileId: string }
|
|
| { type: "git"; url: string; branch: string; auth?: string };
|
|
config: ServiceConfig;
|
|
vaultEntries: Array<{ key: string; ciphertext: Buffer; nonce: Buffer; sealedKey: Buffer;
|
|
entryType: "env" | "file"; mountPath?: string }>;
|
|
}): Promise<{ tools: ToolDef[]; status: string }>;
|
|
|
|
undeploy(meshId: string, name: string): Promise<void>;
|
|
|
|
update(meshId: string, name: string): Promise<{ tools: ToolDef[]; newSha?: string }>;
|
|
|
|
callTool(meshId: string, serverName: string, toolName: string,
|
|
args: Record<string, unknown>): Promise<{ result?: unknown; error?: string }>;
|
|
|
|
logs(meshId: string, name: string, lines?: number): string[];
|
|
|
|
status(meshId: string, name: string): ServiceStatus;
|
|
|
|
restoreAll(): Promise<void>; // on broker boot
|
|
}
|
|
```
|
|
|
|
### Boot restore
|
|
|
|
On broker startup:
|
|
1. Query `mesh.service WHERE status IN ('running', 'crashed', 'restarting')`
|
|
2. Set all to `status='restarting'`
|
|
3. Re-spawn runner container per mesh
|
|
4. Load each service's source and spawn child process
|
|
5. Set `status='running'` only after successful MCP `initialize` response
|
|
6. Services that fail to start → `status='failed'`, system event broadcast
|
|
|
|
---
|
|
|
|
## Security model
|
|
|
|
| Concern | Mitigation |
|
|
|---|---|
|
|
| Arbitrary code execution | Docker container, one per mesh |
|
|
| Resource exhaustion | `--memory=512m --cpus=1` per container |
|
|
| Filesystem escape | No host volume mounts |
|
|
| Secret leakage | Vault E2E encrypted, decrypted only inside container |
|
|
| Network exfiltration | `--network=mesh-restricted`, per-service allowlist |
|
|
| Malicious zip (path traversal) | Validate all paths within target dir, reject `..` |
|
|
| Git auth tokens | Stored encrypted in vault, passed via `GIT_ASKPASS` |
|
|
| Denial of service | Max 20 services per mesh, max 50MB zip, max 500MB image |
|
|
| Scope bypass | Double-check: filter catalog + check on call |
|
|
| OAuth token expiry | Store refresh tokens, notify deployer on persistent failure |
|
|
| Tool name collision | `svc__` prefix for mid-session dynamic tools |
|
|
| Stale MCP entries | PID check + age sweep on launch |
|
|
| Tool call timeout | `MCP_TIMEOUT=30000` set by launch (default too short for mesh chain) |
|
|
| Large tool output | `MAX_MCP_OUTPUT_TOKENS=50000` set by launch; proxy truncates if needed |
|
|
| Proxy crash | Claude Code won't auto-restart; `claudemesh doctor` diagnoses dead proxies |
|
|
| Broker restart | Proxies reconnect via BrokerClient backoff; calls return "reconnecting" during window |
|
|
|
|
---
|
|
|
|
## CLI commands
|
|
|
|
```bash
|
|
# Deploy from zip
|
|
claudemesh deploy ./my-server.zip --name my-server
|
|
|
|
# Deploy from git
|
|
claudemesh deploy --git https://github.com/user/repo.git --name my-server
|
|
|
|
# Deploy with vault refs
|
|
claudemesh vault set gmail-creds ~/.gmail-mcp/credentials.json --type file
|
|
claudemesh deploy --git https://github.com/user/gmail-mcp.git --name gmail \
|
|
--env 'GMAIL_CREDENTIALS_PATH=$vault:gmail-creds:file:/secrets/creds.json' \
|
|
--network-allow 'gmail.googleapis.com:443'
|
|
|
|
# Set access
|
|
claudemesh scope gmail --mesh # everyone
|
|
claudemesh scope gmail --group eng # @eng only
|
|
claudemesh scope gmail --groups 'eng,ops' # @eng + @ops
|
|
claudemesh scope gmail --role lead # leads only
|
|
claudemesh scope gmail --peers 'Mou,Alejandro' # specific peers
|
|
claudemesh scope gmail --peer # private (deployer only)
|
|
|
|
# Manage
|
|
claudemesh logs gmail
|
|
claudemesh update gmail # git-only: pull + rebuild
|
|
claudemesh undeploy gmail
|
|
claudemesh catalog # list all services
|
|
|
|
# Skills
|
|
claudemesh skill deploy ./my-skill.zip
|
|
claudemesh skill deploy --git https://github.com/user/skill.git
|
|
|
|
# Vault
|
|
claudemesh vault set api-key "sk-abc123"
|
|
claudemesh vault set oauth-creds ~/path/to/creds.json --type file
|
|
claudemesh vault list
|
|
claudemesh vault delete api-key
|
|
```
|
|
|
|
---
|
|
|
|
## Migration path
|
|
|
|
| What | Before | After |
|
|
|---|---|---|
|
|
| `share_skill()` inline | works | unchanged |
|
|
| `mesh_mcp_register()` live proxy | works | unchanged, labeled "proxy" in catalog |
|
|
| Zip MCP server | not possible | `share_file` + `mesh_mcp_deploy` |
|
|
| Git MCP server | not possible | `mesh_mcp_deploy(git_url=...)` |
|
|
| Zip skill bundle | not possible | `mesh_skill_deploy(file_id=...)` |
|
|
| Git skill | not possible | `mesh_skill_deploy(git_url=...)` |
|
|
| `mesh_tool_call` | forwards to peer | routes to runner OR forwards to peer |
|
|
| `mesh_mcp_list` | proxy only | shows proxy + managed, with status |
|
|
| Tool discovery | manual `mesh_mcp_list` | native MCP entries at launch + mid-session events |
|
|
| Credentials | plaintext env vars | E2E encrypted vault with `$vault:` refs |
|
|
| Access control | none (anyone can call) | Scopes: peer/group/role/mesh per service |
|
|
|
|
All existing behavior preserved. New capabilities are additive.
|
|
|
|
---
|
|
|
|
## Implementation order
|
|
|
|
### Phase 1: Foundation
|
|
1. DB migration — `mesh.service` table, `mesh.vault_entry` table, extend `mesh.skill`
|
|
2. Wire protocol — add all new message types to `types.ts`
|
|
3. Vault — broker-side storage + CLI tools (`vault_set`, `vault_list`, `vault_delete`)
|
|
4. Service catalog — `mcp_catalog`, `mcp_schema`, scope filtering in `hello_ack`
|
|
|
|
### Phase 2: Execution engine
|
|
5. Runner supervisor — `service-manager.ts`, child process spawn/kill/restart/health
|
|
6. Docker container — base image, build + run lifecycle
|
|
7. Deploy flow — zip extraction, git clone, runtime detection, `npm install` / `pip install`
|
|
8. Tool call routing — broker routes managed service calls to runner
|
|
|
|
### Phase 3: Native integration
|
|
9. Launch integration — `claudemesh launch` writes `mesh:*` MCP entries to `~/.claude.json`
|
|
10. Stdio proxy — `claudemesh mcp --service <name>` thin proxy command
|
|
11. Mid-session fallback — `svc__*` dynamic tools + `list_changed` on claudemesh MCP
|
|
12. Session cleanup — stale entry sweep, PID checks, `flock` on config writes
|
|
|
|
### Phase 4: Skill bundles
|
|
13. Skill deploy — zip/git extraction, `SKILL.md` + `skill.json` parsing, manifest storage
|
|
14. `get_skill` extension — returns structured file contents from bundle
|
|
|
|
### Phase 5: Polish
|
|
15. `mesh_mcp_update` — git pull + rebuild + restart flow
|
|
16. Boot restore — re-spawn services on broker restart
|
|
17. CLI commands — `claudemesh deploy`, `claudemesh vault`, `claudemesh scope`, `claudemesh catalog`
|
|
18. Docs + example bundles — sample MCP server zip, sample skill bundle
|
|
|
|
---
|
|
|
|
## Appendix: Claude Code MCP behavior (verified)
|
|
|
|
Key findings from Claude Code MCP architecture research that informed this
|
|
spec. These are behaviors of Claude Code itself, not the MCP protocol.
|
|
|
|
### Lifecycle
|
|
- MCP servers start when a session begins, stop when it ends
|
|
- **No auto-restart on crash** — next tool invocation fails. Our proxy must
|
|
handle reconnection to the broker independently
|
|
- No health checks from Claude Code — failures discovered on tool use
|
|
- `MCP_TIMEOUT` env var controls tool call timeout
|
|
|
|
### Dynamic tools
|
|
- `notifications/tools/list_changed` is supported and triggers immediate
|
|
re-fetch of `tools/list` — works mid-conversation over stdio
|
|
- **SSE/HTTP transport support for `list_changed` may be unreliable** — known
|
|
bug in some versions. This is why we use stdio proxies, not HTTP transport.
|
|
|
|
### ToolSearch / deferred tools
|
|
- Enabled by default (`ENABLE_TOOL_SEARCH=true`)
|
|
- Only tool **names** are loaded at startup — full schemas fetched on demand
|
|
- Requires Sonnet 4+ or Opus 4+ (Haiku does not support tool references)
|
|
- Adding 100+ MCP tools has near-zero context cost at startup
|
|
- Configurable: `ENABLE_TOOL_SEARCH=auto:5` loads upfront if <5% of context
|
|
|
|
### Tool output limits
|
|
- Warning at 10,000 tokens, hard limit at 25,000 tokens (default)
|
|
- Configurable via `MAX_MCP_OUTPUT_TOKENS` env var
|
|
- Per-tool override: `_meta["anthropic/maxResultSizeChars"]` (up to 500K chars)
|
|
|
|
### Namespacing
|
|
- Tools namespaced as `mcp__servername__toolname`
|
|
- Two servers with same tool name → no conflict (different namespace)
|
|
- Server names normalized: spaces → underscores
|
|
|
|
### Registration
|
|
- **File-based only** — no runtime API to add MCP servers
|
|
- Scopes: `local` (~/.claude.json), `project` (.mcp.json), `user` (~/.claude.json global)
|
|
- Precedence: local > project > user
|
|
- `claude mcp add --scope user` for global, `--scope project` for team-shared
|
|
- **Cannot add new MCP server entries mid-session** — this is why `claudemesh
|
|
launch` pre-writes entries before spawning, and mid-session deploys fall
|
|
back to dynamic `svc__*` tools on the claudemesh MCP server
|
|
|
|
### Environment variables
|
|
- Passed via `--env KEY=VALUE` on `claude mcp add`
|
|
- `.mcp.json` supports `${VAR}` and `${VAR:-default}` expansion
|
|
- Special: `${CLAUDE_PLUGIN_ROOT}`, `${CLAUDE_PLUGIN_DATA}`
|
|
|
|
### Implications for this spec
|
|
- Native MCP entries MUST be written before `claude` spawns → `claudemesh launch` flow
|
|
- Stdio transport is the only reliable path for `list_changed` → thin proxy model
|
|
- ToolSearch means 100+ mesh tools have negligible context cost
|
|
- No server dependencies → each mesh proxy is independent
|
|
- No auto-restart → proxies must reconnect to broker on their own
|