Additive plumbing for v0.9.0 daemon spec §4.2/§4.4. Adds two nullable
columns to mesh.message_queue — client_message_id (caller-supplied) and
request_fingerprint (canonical sha256 of the send shape) — and threads
them through the broker:
- handleSend reads them off the wire envelope when present
- queueMessage persists them on the row
- drainForMember projects them onto the push so receiving daemons
can dedupe their local inbox by client_message_id
Columns stay nullable so legacy traffic (launch CLI, dashboard chat)
continues to flow uninterrupted. Sprint 7 (broker hardening) will add
the partial unique index and the client_message_dedupe atomic-accept
table once we're ready to enforce dedupe broker-side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds apps/cli/src/cli/validators.ts — a small module of shape
validators (pubkey, pubkey prefix, message id, mesh slug) that return
discriminated results so callers can distinguish "shape is wrong"
(INVALID_ARGS exit) from "value is well-shaped, lookup failed"
(NOT_FOUND exit). Includes renderValidationError() for a consistent
three-tier error contract: what's wrong, what would be valid, closest
valid alternative.
First adopter is `claudemesh msg-status`:
- Validates id locally before opening WS — typos return immediately.
- Accepts 8-32 char prefixes (full ids are 32). Pastes that get
copy-truncated by the terminal still work.
- Distinct error messages for malformed input vs not-in-queue vs
ambiguous prefix; --json emits the structured shape.
Broker side: WS message_status handler validates idStr is 8-32
base62 before querying. Prefix lookups use LIKE 'prefix%' scoped to
the caller's mesh (no cross-mesh leak). Returns ambiguous_prefix
when more than one match.
Establishes the canonical pattern; rolling out to send / grant /
revoke / topic post --reply-to in subsequent patches.
Two related bugs surfaced in multi-session production use of 1.8.0:
1. Replies via `claudemesh send <from_id>` rejected with "no connected
peer for target" when the original sender's session had rotated
(Claude Code restart, /resume). Root cause: from_id carried the
ephemeral session pubkey, which disappears the moment the session
ends. Fix: handleSend pre-flight now also resolves the target
pubkey against the persistent meshMember table and routes to the
owning member's live session(s); MCP push channel now sets from_id
to the stable member pubkey and exposes the ephemeral one under
from_session_pubkey.
2. Broadcast/* and @group sends loopback'd to the sender's *sibling*
sessions (same member, different session keypair), surfacing a
spurious "tampered or wrong keypair" decrypt warning on the
sender's own inboxes. Fix: broadcast/group fan-out now skips by
memberPubkey, not just by presence_id, so the entire sender member
is excluded — direct sends keep per-presence skip so a member can
still DM their own sibling session intentionally.
Push envelope now also carries senderMemberPubkey alongside
senderPubkey so any other client of the WS channel can choose the
right one.
Adds a reply_to_id column (self-FK on topic_message) plus end-to-end
plumbing so a message can mark itself as a reply to a previous one in
the same topic.
- Schema: 0027_topic_message_reply_to.sql adds reply_to_id with
ON DELETE SET NULL + index for backlink lookup.
- Broker: appendTopicMessage validates parent shares the topic, writes
reply_to_id; topicHistory + topic_history_response surface it; WS
push envelope now carries senderMemberId, senderName, topic name,
reply_to_id, and message_id so recipients have everything they need
to reply without a follow-up query.
- REST: POST /v1/messages accepts replyToId (validated server-side);
GET /messages and SSE /stream emit it per row.
- CLI: \`topic post --reply-to <id|prefix>\` resolves prefixes against
recent history; \`topic tail\` renders an "↳ in reply to <name>:
<snippet>" line above replies and shows a copyable #shortid tag on
every row.
- MCP push pipe: channel attributes now include from_pubkey,
from_member_id, message_id, topic, reply_to_id — the recipient can
thread a reply directly from the inbound notification.
- Skill + identity prompt updated to teach Claude how to use the new
attributes for replies.
Bumped CLI to 1.9.0.
WSSendMessage gains an optional mentions field; the broker forwards
it into appendTopicMessage so WS-driven topic sends get the same
write-time fan-out path as REST POST /v1/messages. v1 messages
(today's plaintext-base64) still fall back to a body regex when the
field is omitted, so existing CLIs aren't broken; v2 ciphertext
clients in phase 3 will populate it.
Also drops the duplicate meshMember import (kept the meshMember-as-
memberTable alias which the rest of the file uses).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claudemesh apikey revoke <id> reported success even when the input
didn't match any row in mesh.api_key. The CLI's `apikey list` shows
truncated 8-char prefixes; users naturally paste those; broker did
exact-id match against meshApiKey.id; UPDATE affected 0 rows; old
revokeApiKey returned void so the CLI couldn't tell. Discovered via
end-to-end CLI smoke test against prod (roadmap validation pass).
Three-part fix:
- broker.revokeApiKey now returns
{ status: "revoked"|"not_found"|"not_unique"; id?, matches? } and
accepts either the full id or a unique prefix (>=6 chars). Prefix
matching is bounded to the caller's mesh and only succeeds if
exactly one row matches; ambiguous prefixes return not_unique so
we never silently revoke the wrong key.
- New WSApiKeyRevokeResponseMessage carries the structured status
back to the CLI. Old apikey_revoke_ok type removed before being
released — never shipped to users. The error path is no longer
used for not_found/not_unique cases; the unified response carries
both outcomes.
- CLI's apiKeyRevoke now resolves with { ok, id } | { ok: false,
code, message }. runApiKeyRevoke surfaces the code/message and
exits non-zero on failure (NOT_FOUND for missing, INVALID_ARGS
for ambiguous prefix).
Net effect: pasting `claudemesh apikey revoke vq0fwjdX` now actually
revokes the key whose id starts with vq0fwjdX (or fails loud if 0
or >1 keys match). Verified against prod via the new branch's CLI
binary before commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that the filename-tracked runner is in place and prod is bootstrapped,
BROKER_SKIP_MIGRATE=1 is no longer needed. Removed from Coolify env;
the comment is updated to reflect that the flag is a break-glass for
ops, not the steady-state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The web chat surface needed a guaranteed landing room — a topic that
exists for every mesh from creation onward so the dashboard always has
somewhere to drop the user. #general is the convention; ephemeral DMs
remain ephemeral (mesh.message_queue) so agentic privacy is unchanged.
Three hooks plus a backfill:
- packages/api/src/modules/mesh/mutations.ts — createMyMesh now calls
ensureGeneralTopic() right after the mesh insert. New helper is
idempotent via the unique (mesh_id, name) index.
- apps/broker/src/index.ts — handleMeshCreate (CLI claudemesh new)
inserts #general + subscribes the owner member as 'lead' in the
same handler.
- apps/broker/src/crypto.ts — invite-claim flow auto-subscribes the
newly minted member to #general as 'member', defensively ensuring
the topic exists if predates this change.
- packages/db/migrations/0024_general_topic_backfill.sql — one-shot
backfill: creates #general for every active mesh that doesn't have
one, subscribes every active member, and marks the mesh owner as
'lead' based on owner_user_id == member.user_id. Idempotent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issuance flow over WS for now (REST endpoints come next slice).
Plaintext secret returned ONCE on create — never recoverable.
- broker: 3 WS handlers (apikey_create/list/revoke), wire types in
union, audit log on issuance + revoke
- ws-client: apiKeyCreate/List/Revoke with resolver maps, response
dispatch
- CLI: claudemesh apikey create <label> [--cap a,b] [--topic c,d]
[--expires ISO]; list shows status, scope, last-used; revoke by id
- policy: apikey create + revoke prompt by default (issuing or
disabling a credential is meaningful)
Default capability set is "send,read" — least privilege for unscoped
keys (admin must explicitly opt-in).
Spec: .artifacts/specs/2026-05-02-v0.2.0-scope.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
handleSend's pre-flight check rejected #<topicId> sends because the
target wasn't matched by @group / * / pubkey, so it fell into the
"direct" branch and looked for a peer with that pubkey. Topic targets
need their own class — delivery happens via topic_member, not by
matching connected peers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Broker (apps/broker/src/index.ts)
- Unified disconnect/kick handler uses close code 1000 for disconnect
(CLI auto-reconnects) vs 4001 for kick (CLI exits, no reconnect).
- Ban now closes with code 4002.
- Hello handler: revoked members get a specific 'revoked' error with a
'Contact the mesh owner to rejoin' message, then ws.close(4002).
Previously banned users saw the generic 'unauthorized' error.
- list_bans handler returns { name, pubkey, revokedAt } for each
revoked member.
CLI (apps/cli)
- ws-client: close codes 4001 and 4002 set .closed = true and stash
.terminalClose so callers can surface a friendly message instead of
the low-level 'ws terminal close' error. Revoked error in hello is
also captured as a terminal close.
- withMesh catches terminalClose and prints:
4001 → 'Kicked from this mesh. Run claudemesh to rejoin.'
4002 → the broker's 'Contact the mesh owner to rejoin.' message
- kick.ts now exports runDisconnect + runKick with clear hints:
'disconnect' → 'They will auto-reconnect within seconds.'
'kick' → 'They can rejoin anytime by running claudemesh.'
- cli.ts adds 'disconnect' dispatch; HELP updated.
Semantics:
disconnect: session reset, no DB state, auto-reconnects
kick : session ends, no DB state, user must manually rejoin
ban : session ends + revokedAt set, cannot rejoin until unban
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Broker WS handlers:
- kick: disconnect peer(s) by name, --stale duration, or --all.
Authz: owner or admin only. Closes WS + marks presence disconnected.
- ban: kick + set revokedAt on mesh.member. Hello already rejects
revoked members, so ban is instant and permanent until unban.
- unban: clear revokedAt. Peer can rejoin with their existing keypair.
- list_bans: return all revoked members for a mesh.
Session-id dedup (previous commit): handleHello disconnects ghost
presences with matching (meshId, sessionId) before inserting the new
one. Eliminates duplicate entries after broker restarts.
CLI (alpha.37):
- claudemesh kick <peer|--stale 30m|--all>
- claudemesh ban/unban <peer>
- claudemesh bans [--json]
- Uses new sendAndWait() on ws-client for request-response pattern
over WS (generic _reqId resolver).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a client reconnects with the same session_id before the 90s
stale sweeper runs, the old ghost presence stays in the connections
map. Result: duplicate entries in list_peers for the same Claude
Code instance.
Now: handleHello iterates connections for matching (meshId, sessionId),
closes the old WS, deletes from map, marks disconnected in DB.
One session_id = one presence, always.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backwards compat shim (task 27)
- requireCliAuth() falls back to body.user_id when BROKER_LEGACY_AUTH=1
and no bearer present. Sets Deprecation + Warning headers + bumps a
broker_legacy_auth_hits_total metric so operators can watch the
legacy traffic drain to 0 before removing the shim.
- All handlers parse body BEFORE requireCliAuth so the fallback can
read user_id out of it.
HA readiness (task 29)
- .artifacts/specs/2026-04-15-broker-ha-statelessness-audit.md
documents every in-memory symbol and rollout plan (phase 0-4).
- packaging/docker-compose.ha-local.yml spins up 2 broker replicas
behind Traefik sticky sessions for local smoke testing.
- apps/broker/src/audit.ts now wraps writes in a transaction that
takes pg_advisory_xact_lock(meshId) and re-reads the tail hash
inside the txn. Concurrent broker replicas can no longer fork the
audit chain.
Deploy gate (task 30)
- /health stays permissive (200 even on transient DB blips) so
Docker doesn't kill the container on a glitch.
- New /health/ready checks DB + optional EXPECTED_MIGRATION pin,
returns 503 if either fails. External deploy gate can poll this
and refuse to promote a broken deploy.
Metrics dashboard (task 32)
- packaging/grafana/claudemesh-broker.json: ready-to-import Grafana
dashboard covering active conns, queue depth, routed/rejected
rates, grant drops, legacy-auth hits, conn rejects.
Tests (task 28)
- audit-canonical.test.ts (4 tests) pins canonical JSON semantics.
- grants-enforcement.test.ts (6 tests) covers the member-then-
session-pubkey lookup with default/explicit/blocked branches.
Docs (task 34)
- docs/env-vars.md catalogues every env var the broker + CLI read.
Crypto review prep (task 35)
- .artifacts/specs/2026-04-15-crypto-review-packet.md: reviewer
brief, threat model, scope, test coverage list, deliverables.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Broker (all need redeploy):
- sweepOrphanMessages: DELETE undelivered message_queue rows older
than 7 days; hourly sweep. Stops unbounded growth when a sender
typos a name (queued forever, never claimed).
- Per-member send rate limit: TokenBucket(60/min, burst 10) keyed on
memberId so reconnecting can't bypass. Surfaces as queued=false,
error='rate_limit: ...'.
- Pre-flight size cap: reject at handleSend if nonce+ciphertext+
targetSpec exceeds env.MAX_MESSAGE_BYTES with a clear error
instead of silent WSS frame-level kill.
- No-recipient reject: for direct sends, check any matching peer
is connected BEFORE queueing. Kills the self-send silent drop
(sending to your own pubkey when you only have one session
connected) and typo-to-offline-peer silent drops.
- WSAckMessage.error field added for structured failure reasons.
CLI:
- ws-client ack handler reads msg.queued and msg.error; surfaces
rate_limit / too_large / no_recipient to callers instead of
returning ok:true with a dummy messageId.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a user opens multiple Claude Code instances on one laptop they
all share the same memberPubkey (one identity, one config.json). The
broker was broadcasting each Claude Code start/stop to every OTHER
session of the same user — showing as 'peer agutierrez left / joined'
spam in every active claude terminal.
Now: skip broadcast to presences whose memberPubkey equals the joining
or leaving presence's memberPubkey. Other actual peers on the mesh
still see the event.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Older CLIs sometimes called POST /cli/mesh/create without a pubkey,
and the broker stored the string 'pending' as peer_pubkey on the
owner's mesh.member row. Every subsequent hello from the real CLI
failed the membership lookup silently, leaving the connection in
'reconnecting' forever with no useful log line.
Now: validate pubkey is 64 hex chars before creating the owner
member row. Existing 'pending' rows on prod were patched manually.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- apps/cli/ is now the canonical CLI (was apps/cli-v2/).
- apps/cli/ legacy v0 archived as branch 'legacy-cli-archive' and tag
'cli-v0-legacy-final' before deletion; git history preserves it too.
- .github/workflows/release-cli.yml paths updated.
- pnpm-lock.yaml regenerated.
Broker-side peer-grant enforcement (spec: 2026-04-15-per-peer-capabilities):
- 0020_peer-grants.sql adds peer_grants jsonb + GIN index on mesh.member.
- handleSend in broker fetches recipient grant maps once per send, drops
messages silently when sender lacks the required capability.
- POST /cli/mesh/:slug/grants to update from CLI; broker_messages_dropped_by_grant_total metric.
- CLI grant/revoke/block now mirror to broker via syncToBroker.
Auto-migrate on broker startup:
- apps/broker/src/migrate.ts runs drizzle migrate with pg_advisory_lock
before the HTTP server binds. Exits non-zero on failure so Coolify
healthcheck fails closed.
- Dockerfile copies packages/db/migrations into /app/migrations.
- postgres 3.4.5 added as direct broker dep.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Email (broker):
- Rebrand mesh-invitation.tsx to match site (clay accent #d97757,
cream fg, Anthropic Serif/Mono, dark bg). Mesh glyph in header.
- Hero CTA links to the /i/short URL landing page.
- Single one-liner 'npm i -g claudemesh-cli && claudemesh launch --join URL'
so new users copy once, paste once, done.
Web InstallToggle:
- Replace two-step numbered list with single one-liner in the first-time
panel. Reduces copy/paste ops from 2 to 1 and stops prescribing
'YourName' as a literal (CLI now defaults to $USER).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the plain-text invite email with a standalone react-email
template (apps/broker/src/emails/mesh-invitation.tsx) using
@react-email/components + Tailwind. Rendered on demand in
handleCliMeshInvite and sent as both HtmlBody and TextBody via
Postmark (or html+text via Resend).
Self-contained — no dependency on @turbostarter/email, i18n, or ui
packages. Adds react, react-dom, @react-email/components, @react-email/render
to broker deps. Enables tsconfig jsx: react-jsx and .tsx includes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Broker now sends the invite email when body.email is provided and
POSTMARK_API_KEY (or RESEND_API_KEY) is configured. Returns
`emailed: boolean` so the CLI can honestly report whether the email
was sent instead of falsely claiming success on link generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
handleCliMeshCreate now generates ownerPubkey/ownerSecretKey/rootKey so
CLI-created meshes can issue invites. handleCliMeshInvite builds the
full signed v1 payload + v2 capability (matching createMyInvite in
packages/api) and self-heals meshes created by older broker versions
that are missing keys.
Fixes 500 on claudemesh share after CLI mesh create.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- session_id (clm_sess_...) in browser URL — identifies login attempt
- user_code (ABCD-EFGH) visual confirmation — shown in both terminal and browser
- device_code (secret) — CLI polls with this, never displayed
- CLI accepts stdin paste of JWT token while polling (race)
- Web page handles both ?session= and ?code= params
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drizzle schema: device_code + cli_session tables in mesh pgSchema
- Broker endpoints: POST /cli/device-code, GET /cli/device-code/:code,
POST /cli/device-code/:code/approve, GET /cli/sessions
- Web app API routes now proxy to broker (no in-memory state)
- Tracks devices per user: hostname, platform, arch, last_seen, token_hash
- JWT signed with CLI_SYNC_SECRET, 30-day expiry
- Session revocation support via revokedAt column
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- /meshes and /status now show mesh slug names instead of truncated IDs
- meshSlug cached on connect and loaded from DB join on boot
- Remove dangerous fallback that connected to ALL meshes in email flow
- BridgeRow now includes optional meshSlug field
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Dynamic import returns module wrapper, need .default.ready then .default
for the actual sodium functions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Members created by CLI don't have dashboardUserId set. Now searches
by both userId and dashboardUserId columns. Falls back to all meshes
if no member link found (bootstrap case for mesh owners).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The member table has no secretKey column (by design - keys are local).
Email verification now generates a fresh ed25519 keypair and creates
a new bridge-specific member entry for each mesh the user belongs to.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users can now type /connect in the bot → enter email → receive 6-digit
code → enter code → auto-connect to all meshes linked to that email.
Supports Resend and Postmark email providers via env vars.
Rate-limited to 5 code attempts, 10-min expiry.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claudemesh MCP server now declares prompts:{} and resources:{} capabilities.
Mesh skills auto-appear as /claudemesh:skill-name slash commands in Claude Code
via prompts/list+get, and as skill://claudemesh/{name} resources for the
upcoming MCP_SKILLS protocol. share_skill accepts optional metadata (when_to_use,
allowed_tools, model, context, agent) stored in the manifest jsonb column.
Change notifications sent on share/remove so Claude Code refreshes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>