Backwards compat shim (task 27)
- requireCliAuth() falls back to body.user_id when BROKER_LEGACY_AUTH=1
and no bearer present. Sets Deprecation + Warning headers + bumps a
broker_legacy_auth_hits_total metric so operators can watch the
legacy traffic drain to 0 before removing the shim.
- All handlers parse body BEFORE requireCliAuth so the fallback can
read user_id out of it.
HA readiness (task 29)
- .artifacts/specs/2026-04-15-broker-ha-statelessness-audit.md
documents every in-memory symbol and rollout plan (phase 0-4).
- packaging/docker-compose.ha-local.yml spins up 2 broker replicas
behind Traefik sticky sessions for local smoke testing.
- apps/broker/src/audit.ts now wraps writes in a transaction that
takes pg_advisory_xact_lock(meshId) and re-reads the tail hash
inside the txn. Concurrent broker replicas can no longer fork the
audit chain.
Deploy gate (task 30)
- /health stays permissive (200 even on transient DB blips) so
Docker doesn't kill the container on a glitch.
- New /health/ready checks DB + optional EXPECTED_MIGRATION pin,
returns 503 if either fails. External deploy gate can poll this
and refuse to promote a broken deploy.
Metrics dashboard (task 32)
- packaging/grafana/claudemesh-broker.json: ready-to-import Grafana
dashboard covering active conns, queue depth, routed/rejected
rates, grant drops, legacy-auth hits, conn rejects.
Tests (task 28)
- audit-canonical.test.ts (4 tests) pins canonical JSON semantics.
- grants-enforcement.test.ts (6 tests) covers the member-then-
session-pubkey lookup with default/explicit/blocked branches.
Docs (task 34)
- docs/env-vars.md catalogues every env var the broker + CLI read.
Crypto review prep (task 35)
- .artifacts/specs/2026-04-15-crypto-review-packet.md: reviewer
brief, threat model, scope, test coverage list, deliverables.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Broker (all need redeploy):
- sweepOrphanMessages: DELETE undelivered message_queue rows older
than 7 days; hourly sweep. Stops unbounded growth when a sender
typos a name (queued forever, never claimed).
- Per-member send rate limit: TokenBucket(60/min, burst 10) keyed on
memberId so reconnecting can't bypass. Surfaces as queued=false,
error='rate_limit: ...'.
- Pre-flight size cap: reject at handleSend if nonce+ciphertext+
targetSpec exceeds env.MAX_MESSAGE_BYTES with a clear error
instead of silent WSS frame-level kill.
- No-recipient reject: for direct sends, check any matching peer
is connected BEFORE queueing. Kills the self-send silent drop
(sending to your own pubkey when you only have one session
connected) and typo-to-offline-peer silent drops.
- WSAckMessage.error field added for structured failure reasons.
CLI:
- ws-client ack handler reads msg.queued and msg.error; surfaces
rate_limit / too_large / no_recipient to callers instead of
returning ok:true with a dummy messageId.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a user opens multiple Claude Code instances on one laptop they
all share the same memberPubkey (one identity, one config.json). The
broker was broadcasting each Claude Code start/stop to every OTHER
session of the same user — showing as 'peer agutierrez left / joined'
spam in every active claude terminal.
Now: skip broadcast to presences whose memberPubkey equals the joining
or leaving presence's memberPubkey. Other actual peers on the mesh
still see the event.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Older CLIs sometimes called POST /cli/mesh/create without a pubkey,
and the broker stored the string 'pending' as peer_pubkey on the
owner's mesh.member row. Every subsequent hello from the real CLI
failed the membership lookup silently, leaving the connection in
'reconnecting' forever with no useful log line.
Now: validate pubkey is 64 hex chars before creating the owner
member row. Existing 'pending' rows on prod were patched manually.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- apps/cli/ is now the canonical CLI (was apps/cli-v2/).
- apps/cli/ legacy v0 archived as branch 'legacy-cli-archive' and tag
'cli-v0-legacy-final' before deletion; git history preserves it too.
- .github/workflows/release-cli.yml paths updated.
- pnpm-lock.yaml regenerated.
Broker-side peer-grant enforcement (spec: 2026-04-15-per-peer-capabilities):
- 0020_peer-grants.sql adds peer_grants jsonb + GIN index on mesh.member.
- handleSend in broker fetches recipient grant maps once per send, drops
messages silently when sender lacks the required capability.
- POST /cli/mesh/:slug/grants to update from CLI; broker_messages_dropped_by_grant_total metric.
- CLI grant/revoke/block now mirror to broker via syncToBroker.
Auto-migrate on broker startup:
- apps/broker/src/migrate.ts runs drizzle migrate with pg_advisory_lock
before the HTTP server binds. Exits non-zero on failure so Coolify
healthcheck fails closed.
- Dockerfile copies packages/db/migrations into /app/migrations.
- postgres 3.4.5 added as direct broker dep.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Email (broker):
- Rebrand mesh-invitation.tsx to match site (clay accent #d97757,
cream fg, Anthropic Serif/Mono, dark bg). Mesh glyph in header.
- Hero CTA links to the /i/short URL landing page.
- Single one-liner 'npm i -g claudemesh-cli && claudemesh launch --join URL'
so new users copy once, paste once, done.
Web InstallToggle:
- Replace two-step numbered list with single one-liner in the first-time
panel. Reduces copy/paste ops from 2 to 1 and stops prescribing
'YourName' as a literal (CLI now defaults to $USER).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the plain-text invite email with a standalone react-email
template (apps/broker/src/emails/mesh-invitation.tsx) using
@react-email/components + Tailwind. Rendered on demand in
handleCliMeshInvite and sent as both HtmlBody and TextBody via
Postmark (or html+text via Resend).
Self-contained — no dependency on @turbostarter/email, i18n, or ui
packages. Adds react, react-dom, @react-email/components, @react-email/render
to broker deps. Enables tsconfig jsx: react-jsx and .tsx includes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Broker now sends the invite email when body.email is provided and
POSTMARK_API_KEY (or RESEND_API_KEY) is configured. Returns
`emailed: boolean` so the CLI can honestly report whether the email
was sent instead of falsely claiming success on link generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
handleCliMeshCreate now generates ownerPubkey/ownerSecretKey/rootKey so
CLI-created meshes can issue invites. handleCliMeshInvite builds the
full signed v1 payload + v2 capability (matching createMyInvite in
packages/api) and self-heals meshes created by older broker versions
that are missing keys.
Fixes 500 on claudemesh share after CLI mesh create.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- session_id (clm_sess_...) in browser URL — identifies login attempt
- user_code (ABCD-EFGH) visual confirmation — shown in both terminal and browser
- device_code (secret) — CLI polls with this, never displayed
- CLI accepts stdin paste of JWT token while polling (race)
- Web page handles both ?session= and ?code= params
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drizzle schema: device_code + cli_session tables in mesh pgSchema
- Broker endpoints: POST /cli/device-code, GET /cli/device-code/:code,
POST /cli/device-code/:code/approve, GET /cli/sessions
- Web app API routes now proxy to broker (no in-memory state)
- Tracks devices per user: hostname, platform, arch, last_seen, token_hash
- JWT signed with CLI_SYNC_SECRET, 30-day expiry
- Session revocation support via revokedAt column
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- /meshes and /status now show mesh slug names instead of truncated IDs
- meshSlug cached on connect and loaded from DB join on boot
- Remove dangerous fallback that connected to ALL meshes in email flow
- BridgeRow now includes optional meshSlug field
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dynamic import returns module wrapper, need .default.ready then .default
for the actual sodium functions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Members created by CLI don't have dashboardUserId set. Now searches
by both userId and dashboardUserId columns. Falls back to all meshes
if no member link found (bootstrap case for mesh owners).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The member table has no secretKey column (by design - keys are local).
Email verification now generates a fresh ed25519 keypair and creates
a new bridge-specific member entry for each mesh the user belongs to.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users can now type /connect in the bot → enter email → receive 6-digit
code → enter code → auto-connect to all meshes linked to that email.
Supports Resend and Postmark email providers via env vars.
Rate-limited to 5 code attempts, 10-min expiry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claudemesh MCP server now declares prompts:{} and resources:{} capabilities.
Mesh skills auto-appear as /claudemesh:skill-name slash commands in Claude Code
via prompts/list+get, and as skill://claudemesh/{name} resources for the
upcoming MCP_SKILLS protocol. share_skill accepts optional metadata (when_to_use,
allowed_tools, model, context, agent) stored in the manifest jsonb column.
Change notifications sent on share/remove so Claude Code refreshes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Boot restore now checks runner /health to see what's already running,
then updates DB status to match. Fixes the bug where broker restart
marked running services as 'failed' because it tried to re-deploy
without shared source volume.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runner /load now accepts gitUrl, npxPackage, or sourcePath. It handles
git clone and npm install internally. Broker no longer needs shared
volume for source extraction — just tells the runner what to fetch.
CLI mesh_mcp_deploy now supports npx_package as a third source type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- apps/runner/: Dockerfile (node22 + python3 + uv + bun) + supervisor.mjs
(HTTP API for load/call/unload/health)
- docker-compose: runner service with shared services-data volume
- Broker mcp_deploy: git clone or zip extract → runner /load → MCP spawn
- Broker mcp_call: routes managed services to runner via HTTP, falls back
to live-proxy for peer-hosted servers
- RUNNER_URL env var for broker → runner communication
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- broker-crypto.ts: AES-256-GCM encrypt/decrypt with BROKER_ENCRYPTION_KEY
- mcp_deploy stores env as _encryptedEnv in mesh.service.config (no plaintext in DB)
- boot restore: decrypts _encryptedEnv and re-spawns services via service-manager
- auto-generates ephemeral key if BROKER_ENCRYPTION_KEY not set (logs warning)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the foundation for deploying and managing MCP servers on the VPS
broker, with per-peer credential vaults and visibility scopes.
Architecture:
- One Docker container per mesh with a Node supervisor
- Each MCP server runs as a child process with its own stdio pipe
- claudemesh launch installs native MCP entries in ~/.claude.json
- Mid-session deploys fall back to svc__* dynamic tools + list_changed
New components:
- DB: mesh.service + mesh.vault_entry tables, mesh.skill extensions
- Broker: 19 wire protocol types, 11 message handlers, service catalog
in hello_ack with scope filtering, service-manager.ts (775 lines)
- CLI: 13 tool definitions, 12 WS client methods, tool call handlers,
startServiceProxy() for native MCP proxy mode
- Launch: catalog fetch, native MCP entry install, stale sweep, cleanup,
MCP_TIMEOUT=30s, MAX_MCP_OUTPUT_TOKENS=50k
Security: path sanitization on service names, column whitelist on
upsertService, returning()-based delete checks, vault E2E encryption.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Persistent MCP servers (opt-in via `persistent: true`) survive host
disconnects — they appear as offline in mcp_list and auto-restore when
the host reconnects. Ephemeral servers (default) still clean up on
disconnect. Offline servers return a clear error on mcp_call with
time-since-disconnect info.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Save groups, profile, visibility, summary, display name, and cumulative
stats to a new mesh.peer_state table on disconnect. On reconnect (same
meshId + memberId), restore them automatically — hello groups take
precedence over stored groups if provided. Broadcast peer_returned
system event with last-seen time and summary to other peers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peers now receive [system] notifications when MCP servers join or
leave the mesh, with tool names and hosting peer info.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peers report os.hostname() in the hello handshake. list_peers shows
[local] or [remote] tag per peer. MCP instructions teach AI to read
local peers' files directly via filesystem instead of relay.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tamper-evident audit logging where each entry includes a SHA-256
hash of the previous entry, forming a verifiable chain per mesh.
Events tracked: peer_joined, peer_left, state_set, message_sent
(never logs message content). New WS handlers: audit_query for
paginated retrieval, audit_verify for chain integrity verification.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peers self-report resource usage via set_stats; stats visible in
list_peers responses and the new mesh_stats MCP tool. CLI auto-reports
every 60s and tracks messagesIn/Out, toolCalls, uptime, and errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peers can register MCP servers with the mesh and other peers can invoke
those tools through the existing claudemesh connection without restarting.
Broker: in-memory MCP registry with mcp_register/unregister/list/call
handlers, call forwarding to hosting peer with 30s timeout, and automatic
cleanup on peer disconnect.
CLI: mcpRegister/mcpUnregister/mcpList/mcpCall client methods, inbound
mcp_call_forward handler, and 4 new MCP tools (mesh_mcp_register,
mesh_mcp_list, mesh_tool_call, mesh_mcp_remove).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace in-memory-only setTimeout scheduling with a DB-backed system
that survives broker restarts. Adds:
- `scheduled_message` table in mesh schema (Drizzle + raw CREATE TABLE
for zero-downtime deploys)
- Minimal 5-field cron parser (no dependencies) with next-fire-time
calculation for recurring entries
- On broker boot, all non-cancelled entries are loaded from PostgreSQL
and timers re-armed automatically
- CLI `schedule_reminder` MCP tool accepts optional `cron` expression
- CLI `remind` command accepts `--cron` flag
- One-shot reminders remain backward compatible — no cron field = same
behavior as before
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extend the WS hello handshake with optional peerType, channel, and model
fields so peers can advertise what kind of client they are. The broker
stores these in-memory on PeerConn and returns them (along with cwd) in
the peers_list response. CLI peers command and MCP list_peers tool now
display the new metadata.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>