Watchdogs (75s stale detect) and lease model (90s grace window for
silent reconnects) both shipped 2026-05-05.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Continuous presence: peers no longer see peer_left/peer_joined for
transient WS reconnects. After a WS close, the connection enters a
90s grace window in offline-leased state. If the same session
reconnects (matched by sessionPubkey, or sessionId+memberPubkey for
member-WS) within grace, it silently swaps the WS reference, restores
online state, drains queued DMs, and resets the DB row. No peer ever
sees the session leave.
Mechanics:
- PeerConn gains leaseState ("online"|"offline"), leaseUntil, evictionTimer
- ws.on("close") starts grace instead of immediate cleanup; old
socket close after a reattach is detected (conn.ws !== ws) and
ignored, since the lease is already healthy on the new socket
- handleHello / handleSessionHello check for offline-leased entry
matching the stable identity BEFORE running session-id dedup;
reattach swaps ws, resets state, returns silent: true
- The hello dispatcher skips peer_joined broadcast when result.silent
- evictPresenceFully extracted from the close handler — runs the
peer_left broadcast + cleanup (URL watches, streams, MCP registry,
clock auto-pause). Called by evictionTimer after 90s, or directly
if lease wasn't online (defensive)
- Stale-pong watchdog skips offline-leased entries (their WS is
intentionally dead during grace)
- broker.ts exports restorePresence(presenceId) — clears
disconnectedAt + bumps lastPingAt, called on reattach to undo any
damage the DB-level stale-presence sweeper may have done during
grace
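The lease mechanics above can be sketched as a small in-memory model.
This is an illustration under stated assumptions, not the broker's code:
Lease, onClose, onHello, and sweep are hypothetical names, socketId
stands in for the WS reference, and a polling sweep replaces the
per-connection evictionTimer.

```typescript
// Hypothetical model of the 90s grace lease; names are illustrative only.
type LeaseState = "online" | "offline";

interface Lease {
  socketId: number;       // stands in for the WS reference
  leaseState: LeaseState;
  leaseUntil: number;     // epoch ms; only meaningful while offline
}

const GRACE_MS = 90_000;
const leases = new Map<string, Lease>(); // keyed by stable identity, e.g. sessionPubkey

// Close handler: start grace, unless the close comes from a socket that
// was already replaced by a reattach (the conn.ws !== ws check).
function onClose(identity: string, socketId: number, now: number): void {
  const lease = leases.get(identity);
  if (!lease || lease.socketId !== socketId) return;
  lease.leaseState = "offline";
  lease.leaseUntil = now + GRACE_MS;
}

// Hello handler: reattach silently when an offline lease is still in grace.
function onHello(identity: string, socketId: number, now: number): { silent: boolean } {
  const lease = leases.get(identity);
  if (lease && lease.leaseState === "offline" && now <= lease.leaseUntil) {
    lease.socketId = socketId;
    lease.leaseState = "online";
    return { silent: true }; // caller skips the peer_joined broadcast
  }
  leases.set(identity, { socketId, leaseState: "online", leaseUntil: 0 });
  return { silent: false };
}

// Eviction: returns identities whose grace expired; the caller would run
// the peer_left broadcast and cleanup for each.
function sweep(now: number): string[] {
  const evicted: string[] = [];
  leases.forEach((lease, identity) => {
    if (lease.leaseState === "offline" && now > lease.leaseUntil) {
      leases.delete(identity);
      evicted.push(identity);
    }
  });
  return evicted;
}
```

In this model a reconnect at any point inside the 90s window returns
silent: true, and a late close event from the replaced socket is a no-op.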
DMs sent to a session in grace fall through to today's existing
queueing path (sendToPeer no-ops on dead WS, the message_queue row
sits with deliveredAt=NULL, drained on reattach via the existing
maybePushQueuedMessages call). No protocol change. No DB schema
change. Backward compatible — old daemons against this broker get
silent reconnects within 90s, full peer_joined cycle beyond.
Layer 2 of the continuous-presence work; spec at
.artifacts/specs/2026-05-05-continuous-presence.md. Layer 3
(daemon-side resume token storage + send) is optional polish, not
needed for the user-visible behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Both sides now actively detect half-dead WS connections instead of
waiting for kernel TCP keepalive (~2hrs default on Linux). A user
reported the bug: "claudemesh peer list" shows zero peers despite running
sessions, because NAT/CGNAT silently dropped the WS flow but neither
side noticed.
Broker (apps/broker/src/index.ts):
- Add lastPongAt to PeerConn, populate at connections.set sites,
bump in ws.on("pong").
- 30s ping loop now also terminates conns whose pong is >75s stale.
ws.terminate() fires the close handler → existing peer_left path.
Daemon (apps/cli/src/daemon/ws-lifecycle.ts):
- Add idle watchdog at 30s cadence, started after hello-ack.
- Bumps lastActivity on incoming message, ping, and pong frames.
- Sends sock.ping() while activity is recent; terminates once idle >75s.
- Watchdog cleared on close handler + explicit close().
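A minimal sketch of the per-tick decision the daemon watchdog makes,
using the 30s cadence and 75s limit stated above. watchdogTick and
msUntilTerminate are hypothetical names; the real helper also sends the
ping and tears down the socket, which is omitted here.

```typescript
// Constants taken from the commit text; function names are illustrative.
const WATCHDOG_INTERVAL_MS = 30_000;
const IDLE_LIMIT_MS = 75_000;

type WatchdogAction = "ping" | "terminate";

// Decide what one watchdog tick should do given the last observed activity.
function watchdogTick(lastActivity: number, now: number): WatchdogAction {
  return now - lastActivity > IDLE_LIMIT_MS ? "terminate" : "ping";
}

// Simulate a connection that goes silent at t=0 with ticks every 30s,
// and return the time of the tick that terminates it.
function msUntilTerminate(): number {
  let now = 0;
  do {
    now += WATCHDOG_INTERVAL_MS;
  } while (watchdogTick(0, now) === "ping");
  return now;
}
```

Because the check only runs on ticks, a dead connection is terminated on
the first tick after the 75s limit passes, so somewhere between 75s and
105s after the last activity depending on tick alignment.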
CLI 1.34.15 → 1.34.16. Broker stays 0.1.0 (deploys from main).
Spec: .artifacts/specs/2026-05-05-continuous-presence.md (full lease
model + resume token, this commit ships only the watchdogs — first
of four progressive layers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace four April-vintage entries (claudemesh launch v0.1.4, Mesh
Dashboard placeholder, MCP bridge placeholder, "SQLite-backed"
self-host) with the four most recent shipped milestones: kick refuses
control-plane (v1.34.15), 1.34.x multi-session correctness train,
per-session presence (v1.30.0), multi-mesh daemon (v1.26.0). All
entries link to /changelog instead of dead "#" hrefs or the old
github.com/alezmad/claudemesh-cli repo.
Copy passes Strunk: active voice, concrete versions, no puffery.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comprehensive review of all home-page marketing components against
the post-correction positioning. Five surgical fixes, zero hand-waving.
CTA copy. The previous "Anthropic built Claude Code per developer.
The next unlock is between developers." was a strong line in 2025
but Anthropic Agent Teams (Feb 2026) IS now between-developers
within one machine. Replaced with the accurate distinction:
"Anthropic Agent Teams stops at the edge of one laptop. claudemesh
starts there — across machines, users, and organizations."
WhereMeshFits — new "vs. Agent Teams" comparison card. The single
most important card the page can have right now. Most readers
arriving in May 2026 know about Agent Teams; the comparison they
want to read is exactly this one. Also tightened the "What
claudemesh is" claim card to lean into "across machines, users,
orgs" instead of the narrower "peer network for Claude Code"
framing.
FAQ (four updates):
1. "How is this different from MCP?" was claiming "43 tools that
let peers message, share files…" which contradicted v1.5.0's
ship of tool-less MCP (tools/list returns []). Replaced with
the actual current architecture: thin push-pipe + resource-
noun-verb CLI bundled as a skill.
2. New entry "How is this different from Anthropic's Agent
Teams?" — the biggest gap in the FAQ given the new ecosystem.
Same shape as the WhereMeshFits card so the messaging stays
consistent across surfaces.
3. "Can a peer be in multiple meshes?" updated to reflect
v1.26.0's universal multi-mesh daemon (was speaking about it
as roadmap; it's been shipped for ~2 days). Bridge peers
promoted from "v0.2 roadmap" to "shipped in v0.2.0 (v1.6.0)".
4. "Free during public beta" no longer claims paid tiers launch
"when the dashboard ships" — dashboard already shipped (v1.5+
web chat, v1.7 demo cut). Replaced with team-scale features
(SSO, audit retention, dedicated brokers) as the pricing
trigger.
Pricing card — same "dashboard ships" → "team-scale features"
language fix as the FAQ pricing entry. Single source of truth
maintained between FAQ + Pricing card.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The site had drifted ~6 months behind the product. Three problems
addressed in one push:
1. Timeline ("Shipped, not promised") topped out at v0.6–0.8 and
claimed "66 npm releases" — both stale. Adds a v0.9 → 1.34 tier
covering daemon, multi-mesh, multi-session correctness train,
refuse-to-kick on control-plane, env-var fallback. Updates count
to "120+ npm releases through v1.34.15." Rewrites the "next"
block from the now-shipped "Daemon redesign · per-topic
encryption" to the actually-pending "HKDF cross-machine identity
· session capabilities · A2A interop · self-host packaging ·
federation."
2. Hero subhead leaned into the original "Claude Code peer mesh"
framing, which is undercut by Anthropic Agent Teams (Feb 2026,
single-machine native mailbox). Now reframes claudemesh as the
encrypted backbone where Claude Code sessions, autonomous
agents, and humans coordinate "across machines, across users,
across organizations" — the four words that distinguish the
product from anything Anthropic structurally can ship from
inside Claude Code.
3. /changelog had three entries from April 2026 (v0.1.2 → v0.1.4)
and was 70+ versions out of date. Replaced with a curated
16-entry timeline from v0.1.0 → v1.34.15, hand-picked to tell
the story (load-bearing ships, not every patch). Adds links
back to docs/roadmap.md, .artifacts/specs/, and GitHub Releases.
New module: apps/web/src/modules/marketing/home/changelog-data.ts
holds the curated entries as a single source of truth. Imported by
both the /changelog page and a new home-page component
LatestReleases (compact 5-entry strip, slotted between Timeline
and Pricing) so they never disagree.
Misc fixes pulled in:
- timeline.tsx had glyph="layers" which isn't in SectionIcon's
valid set; switched to "grid" (changelog-data.ts uses same).
- changelog data extracted to a non-route module so Next.js's
route-export validator stops complaining about exporting
CHANGELOG_ENTRIES from app/.../changelog/page.tsx.
Pre-existing typecheck noise in packages/ui/web/sidebar.tsx
(csstype version mismatch) and the billing modules is unrelated to
this change. My files all typecheck clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates the "Known gaps tracked for follow-ups" subsection of the
v1.34.x section to reflect the 2026-05-04 follow-up sprint:
- Gap 1 (stale CLAUDEMESH_CONFIG_DIR) shipped in 1.34.14.
- Gap 2 (peer list --mesh scope) shipped in 1.34.15. Notes the
diagnosis correction — bug was CLI-side, not broker.
- Gap 3 (kick no-op on control-plane) shipped in 1.34.15 as
refuse-with-hint. Richer presence-pause verb deferred.
- Gap 4 (session capabilities) has a written spec at
.artifacts/specs/2026-05-04-session-capabilities.md;
implementation queued behind v0.3.0 topic-encryption.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Spec for the gap #4 follow-up from the 1.34.x triage. Builds on
2026-04-15-per-peer-capabilities.md (member-keyed recipient grants)
by adding a sender-side cap subset on session attestations: parent
member signs {session_pubkey, allowed_caps[], expires_at}, broker
enforces intersection of recipient grants × session caps on every
protected operation.
v2 attestation alongside v1 (different canonical prefix
"claudemesh-session-attest-v2|..." → no collision). Default when
no caps subset is declared = full member caps (today's behavior;
opt-in restriction, not breaking).
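The enforcement rule can be sketched as a set intersection. This is a
simplified illustration: effectiveCaps is a hypothetical helper, caps
are modeled as plain strings, and signature and expires_at checks are
left out.

```typescript
type Cap = string;

// Effective caps = recipient grants intersected with the session's subset.
// An undefined subset means no restriction was declared, which keeps the
// full member caps (the default behavior described in the spec).
function effectiveCaps(recipientGrants: Cap[], sessionCaps?: Cap[]): Cap[] {
  if (sessionCaps === undefined) return recipientGrants.slice();
  const allowed = new Set(sessionCaps);
  return recipientGrants.filter((cap) => allowed.has(cap));
}

// A protected operation passes only if its cap survives the intersection.
function isAllowed(op: Cap, recipientGrants: Cap[], sessionCaps?: Cap[]): boolean {
  return effectiveCaps(recipientGrants, sessionCaps).indexOf(op) !== -1;
}
```

Note the difference between an undeclared subset (full member caps) and
an explicitly empty one (nothing allowed).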
CLI surface: claudemesh launch --caps dm,read. Bonus: set_state
gate (state-write cap) ships in the same release — closes the
"any session can clobber shared keys like current-pr" footgun.
Migration: dry-run mode for one release before flipping
enforcement. Mirrors the original per-peer-capabilities rollout.
Estimate: ~1 sprint + 1 week dry-run window.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-ups from the 1.34.x multi-session correctness train,
all backwards-compatible.
1.34.14 — stale CLAUDEMESH_CONFIG_DIR falls back. The launch flow
exposes CLAUDEMESH_CONFIG_DIR=<tmpdir> to its spawned claude; if a
later claudemesh invocation inherited that env (Bash tool inside
Claude Code, tmux update-environment, exported var), the inherited
path pointed at a tmpdir that no longer existed and readConfig()
silently returned empty. paths.ts now memoizes resolution: env unset
→ default; env points at a real dir → trust it; env set but dir gone
→ TTY-only stderr warning with shell-specific unset hint, fall back
to ~/.claudemesh.
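The three-way resolution can be sketched as a pure function. This is an
assumption-laden illustration rather than the code in paths.ts:
resolveConfigDir and ResolveDeps are hypothetical, filesystem and stderr
access are injected, and the memoization and TTY-only gating are omitted.

```typescript
interface ResolveDeps {
  env: string | undefined;               // value of CLAUDEMESH_CONFIG_DIR
  defaultDir: string;                    // ~/.claudemesh, already expanded
  dirExists: (path: string) => boolean;  // injected filesystem check
  warn: (message: string) => void;       // injected stderr writer
}

function resolveConfigDir(deps: ResolveDeps): string {
  // Env unset: use the default.
  if (!deps.env) return deps.defaultDir;
  // Env points at a real directory: trust it.
  if (deps.dirExists(deps.env)) return deps.env;
  // Env set but directory gone: warn and fall back.
  deps.warn(
    `CLAUDEMESH_CONFIG_DIR points at a missing directory (${deps.env}); ` +
      `falling back to ${deps.defaultDir}`
  );
  return deps.defaultDir;
}
```

Injecting dirExists and warn keeps the branch logic testable without
touching the real filesystem.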
1.34.15 — peer list --mesh actually scopes. peers.ts and launch.ts
were calling tryListPeersViaDaemon() with no argument; the daemon's
?mesh= filter (server-side, since 1.26.0) was already correct, the
CLI just wasn't passing the slug. Forwarding fixed in both sites;
send.ts cross-mesh hex-prefix resolution intentionally untouched.
1.34.15 — kick refuses no-op kicks on control-plane. Pre-1.34.15
kicking a daemon's member-WS just closed the socket and triggered
auto-reconnect — a no-op with a misleading "session ended" message.
Broker now skips peers where peerRole === "control-plane" and
surfaces them in a new additive ack field skipped_control_plane;
the CLI reads it and prints a clearer hint pointing at ban / daemon
down. Soft disconnect verb keeps old behavior. PeerConn gains a
peerRole slot populated at both connections.set sites.
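A sketch of the skip logic, assuming the broker partitions kick targets
by peerRole. planKick and the kicked field are hypothetical;
skipped_control_plane is the ack field named above.

```typescript
type PeerRole = "control-plane" | "session" | "service";

interface Peer {
  pubkey: string;
  peerRole: PeerRole;
}

interface KickAck {
  kicked: string[];                 // hypothetical field name
  skipped_control_plane: string[];  // additive ack field from this release
}

// Partition targets: control-plane peers are skipped because closing
// their socket only triggers an auto-reconnect.
function planKick(targets: Peer[]): KickAck {
  const ack: KickAck = { kicked: [], skipped_control_plane: [] };
  targets.forEach((peer) => {
    if (peer.peerRole === "control-plane") {
      ack.skipped_control_plane.push(peer.pubkey);
    } else {
      ack.kicked.push(peer.pubkey);
    }
  });
  return ack;
}
```

A client that does not know the new field can ignore it, which is what
makes the change additive.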
Tests: 4 new for paths-stale-env, 5 for kick-control-plane-skip.
CLI 87/87 green; broker 55/55 unit green (integration tests fail on
this machine because of a pre-existing infra issue).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Seven-ship sequence that took the daemon from "works for one session"
to "internally consistent for N sessions on one daemon." Architecture
invariant after 1.34.13: every shared store / channel scopes by
recipient (SSE demux at bind layer + token forwarding, inbox per-
recipient columns, outbox sender-session routing).
- 1.34.7 inbox flush + delete commands
- 1.34.8 seen_at column + TTL prune + first echo guard
- 1.34.9 broader echo guard + system-event polish + staleness warning
- 1.34.10 per-session SSE demux (SseFilterOptions) + universal daemon
(--mesh / --name deprecated) + daemon_started version stamp
- 1.34.11 inbox per-recipient column (storage half of 1.34.10)
- 1.34.12 daemon up detaches by default (logs to ~/.claudemesh/daemon/
daemon.log; service units explicitly pass --foreground)
- 1.34.13 MCP forwards session token on /v1/events — the actual fix
that activates 1.34.10's demux. Without this header the
daemon's session resolved to null, the filter was empty, and every
MCP received the unfiltered global stream.
Roadmap entry at docs/roadmap.md captures the timeline + the four
known gaps tracked for follow-ups (launch env-var leak, broker
listPeers mesh-filter, kick on control-plane no-op, session caps as
first-class concept).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five info-level log points across the WS lifecycle helper:
ws_open_attempt / ws_open_ok / ws_hello_sent / ws_hello_acked /
ws_closed (with status + close code/reason).
Surfaced during M1 smoke testing — without these the only visible
signal was "presence row missing on broker," which made it hard to
distinguish "WS never opened" / "opened but hello rejected" /
"acked then closed by broker."
Both clients prefix the helper-emitted msg ("session_broker_*",
"broker_*") so log greps stay clean per role.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the merge of m1-broker-drain-race-and-presence-role and
m1-cli-lifecycle-and-role-peer-list into main:
* Rename wire-level role classification field `role` → `peerRole`
to avoid collision with 1.31.5's top-level `role` lift of
`profile.role` (user-supplied string consumed by the agent-vibes
claudemesh skill). `peerRole` is the broker presence taxonomy
(control-plane/session/service); top-level `role` keeps its 1.31.5
semantics.
- apps/broker/src/broker.ts (listPeersInMesh return)
- apps/broker/src/index.ts (peers_list response)
- apps/broker/src/types.ts (WSPeersListMessage)
- apps/cli/src/commands/peers.ts (PeerRecord + filter + lift)
* Wire CLI client_ack emission: handleBrokerPush gains
ackClientMessage callback; daemon-WS and session-WS each got a
sendClientAck() method that frames {type:"client_ack",
clientMessageId, brokerMessageId?} and forwards via the lifecycle
helper. run.ts wires the callback into both onPush paths.
Receiver dedupes against existing inbox row first then acks
unconditionally — broker needs the ack regardless of dedupe to
release its claim lease.
- apps/cli/src/daemon/inbound.ts (ackClientMessage in InboundContext)
- apps/cli/src/daemon/broker.ts + session-broker.ts (sendClientAck)
- apps/cli/src/daemon/run.ts (wire-up)
* Version bump 1.32.1 → 1.33.0; CHANGELOG entry replaces "Unreleased"
with full m1 description.
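The dedupe-then-ack ordering from the client_ack item above can be
sketched as follows. handlePush, PushFrame, and the Map-backed inbox are
hypothetical stand-ins; only the client_ack frame shape comes from this
commit.

```typescript
interface PushFrame {
  clientMessageId: string;
  brokerMessageId?: string;
  body: string;
}

interface ClientAck {
  type: "client_ack";
  clientMessageId: string;
  brokerMessageId?: string;
}

// Store the message only if it is new, but always build an ack: the
// broker needs it to release its claim lease even for duplicates.
function handlePush(
  inbox: Map<string, string>,
  push: PushFrame
): { stored: boolean; ack: ClientAck } {
  const stored = !inbox.has(push.clientMessageId);
  if (stored) inbox.set(push.clientMessageId, push.body);
  const ack: ClientAck = { type: "client_ack", clientMessageId: push.clientMessageId };
  if (push.brokerMessageId !== undefined) ack.brokerMessageId = push.brokerMessageId;
  return { stored, ack };
}
```

If the ack were skipped for duplicates, the broker's lease would expire
and the same message would be redelivered again.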
Verification: tsc clean across cli + broker; CLI 83/83 unit tests
pass; broker 50 unit tests pass (5 integration test files require a
live Postgres and were skipped — pre-existing infra gap, not a
regression). CLI bundle rebuilt; version 1.33.0 baked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Milestone 1 CLI side:
- New apps/cli/src/daemon/ws-lifecycle.ts: connectWsWithBackoff helper
- DaemonBrokerClient + SessionBrokerClient refactored to use the helper
- DaemonBrokerClient: stray sessionPubkey + getSessionKeys() removed
- daemon-WS onPush no longer carries session secret (member-only decrypt)
- IPC send paths now sign with mesh member secret
- peers.ts: filters role==='control-plane' by default; --all opts in;
JSON output exposes role field
NOTE: a follow-up commit on main renames the wire-level field 'role'
to 'peerRole' to avoid collision with 1.31.5's profile.role lift.
Milestone 1 broker side:
- Schema: claimedAt + claimId + claimExpiresAt on message_queue,
role on presence (default 'session')
- Migration 0029_drain_lease_and_presence_role.sql
- drainForMember rewritten for two-phase claim/deliver with 30s lease
- New markDelivered() called on receipt of client_ack
- New sweepExpiredClaims() running every 15s
- handleHello sets role='control-plane', handleSessionHello sets 'session'
- listPeersInMesh returns role
- WSClientAckMessage type added; broker accepts and dispatches client_ack
v1: initial 3-layer architecture proposal, reviewed by Codex GPT-5.2 (high)
v2: full end-state with hybrid P2P data plane, broker as coordination
plane only, 6 layers, 8 architectural milestones, Codex-2 corrections
(at-least-once requires client_ack, service_pubkey explicit, meta
required in v2 envelope, streamId required for stream channel,
explicit revocation flow). v2 is frozen for implementation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three correctness fixes on top of the m1 schema migration:
1) Fix the drainForMember claim-then-push race
----------------------------------------------------------------
Previously the claim CTE set delivered_at = NOW() *before* the WS
send. If readyState !== OPEN at push time, the row was marked
delivered and the message dropped silently — at-most-once with no
retry hook.
The new flow:
- claim sets (claimed_at, claim_id, claim_expires_at = NOW()+30s)
- delivered_at stays NULL until the recipient acks
- re-eligibility predicate now also accepts rows whose lease
expired, so dropped pushes redeliver (at-least-once)
Adds two helpers:
- markDelivered() — scoped to (mesh_id, recipient pubkey) so a
peer can only ack its own messages
- sweepExpiredClaims() — clears expired (claimed_at, claim_id,
claim_expires_at) every 15s, wired into startSweepers
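The two-phase flow can be modeled in memory as a sketch. The real
implementation is SQL against mesh.message_queue; QueueRow and the
function bodies below are illustrative assumptions, and the mesh_id
scoping of markDelivered is omitted.

```typescript
interface QueueRow {
  id: string;
  recipient: string;
  deliveredAt: number | null;
  claimId: string | null;
  claimExpiresAt: number | null;
}

const LEASE_MS = 30_000;

// Phase 1: claim undelivered rows that are unclaimed or whose lease expired.
function claim(rows: QueueRow[], recipient: string, claimId: string, now: number): QueueRow[] {
  const eligible = rows.filter(
    (r) =>
      r.recipient === recipient &&
      r.deliveredAt === null &&
      (r.claimExpiresAt === null || r.claimExpiresAt <= now)
  );
  eligible.forEach((r) => {
    r.claimId = claimId;
    r.claimExpiresAt = now + LEASE_MS;
  });
  return eligible;
}

// Phase 2: the recipient's ack marks the row delivered. Scoping by
// recipient means a peer can only ack its own messages.
function markDelivered(rows: QueueRow[], id: string, recipient: string, now: number): boolean {
  const row = rows.find((r) => r.id === id && r.recipient === recipient && r.deliveredAt === null);
  if (!row) return false;
  row.deliveredAt = now;
  return true;
}

// Sweeper: clear expired claims so the rows look unclaimed again.
function sweepExpiredClaims(rows: QueueRow[], now: number): number {
  let cleared = 0;
  rows.forEach((r) => {
    if (r.deliveredAt === null && r.claimExpiresAt !== null && r.claimExpiresAt <= now) {
      r.claimId = null;
      r.claimExpiresAt = null;
      cleared++;
    }
  });
  return cleared;
}
```

A push that is dropped after the claim simply leaves deliveredAt null,
so the row becomes eligible again once its 30s lease runs out.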
2) Accept `client_ack` from recipients
----------------------------------------------------------------
New WS message type handled in the dispatcher right after `send`.
Lookup is by clientMessageId or brokerMessageId; either is fine. Until
the daemon (apps/cli, separate worktree) starts emitting acks, leases
will simply expire and re-deliver — which is the desired retry
behaviour.
3) Tag presence rows with `role`
----------------------------------------------------------------
handleHello (member-keyed, used by the long-lived daemon WS) →
role: 'control-plane'
handleSessionHello (per-Claude-Code session WS) →
role: 'session'
listPeersInMesh exposes the new field; the peers_list response
surfaces it. WSPeersListMessage type adds an optional `role` plus the
long-undocumented `memberPubkey`. CLI-side filter swap from peerType
to role lands in a follow-up worktree — that's why the CLI is
untouched here per the M1 spec.
Typechecks clean (apps/broker tsc --noEmit, packages/db tsc --noEmit).
Test suite needs a real DB so wasn't run in this worktree; existing
dup-delivery and broker tests use drainForMember positionally and the
new claimerPresenceId arg is optional, so they should continue to pass.
Schema groundwork for v2 agentic-comms milestone 1.
mesh.message_queue gets three nullable columns (claimed_at, claim_id,
claim_expires_at) so drainForMember can move from "claim-and-deliver in
one UPDATE" to a two-phase claim/lease + recipient-ack model. This is
the at-least-once retry hook the broker has been missing.
mesh.presence gets a typed `role` column ('control-plane' | 'session'
| 'service') with default 'session' so legacy hellos keep working. The
CLI's hidden-daemon hack (peerType === 'claudemesh-daemon') will swap
to a role-based filter in a follow-up worktree.
Migration is hand-authored as 0029_*.sql to match the existing pattern
(drizzle-kit's _journal.json drifted long ago — the runtime migrator
in apps/broker/src/migrate.ts tracks files lexicographically via
mesh.__cmh_migrations, not the journal).
Foundational cleanups before agentic-comms architecture work
(.artifacts/specs/2026-05-04-agentic-comms-architecture-v2.md).
All behavior-preserving.
1. Extract `connectWsWithBackoff` into apps/cli/src/daemon/ws-lifecycle.ts.
Both DaemonBrokerClient and SessionBrokerClient now share one
lifecycle implementation (connect, hello-handshake, ack-timeout,
close + backoff reconnect). Each client provides its own buildHello
/ isHelloAck / onMessage hooks and keeps its own RPC bookkeeping
(pendingAcks, peerListResolvers, onPush). Composition over
inheritance per Codex's review; no protocol shape changes.
2. Drop daemon-WS ephemeral session pubkey. DaemonBrokerClient no
longer mints + sends a per-reconnect ephemeral keypair in its
hello. Session-targeted DMs land on SessionBrokerClient since
1.32.1, not the member-keyed daemon-WS, so the field was
vestigial. Send-encrypt path now signs DMs with the stable mesh
member secret. handleBrokerPush invocations from daemon-WS only
pass the member secret — session decryption is the session-WS's
job.
3. Role-aware peer list. `peer list` now hides peers whose
broker-emitted `role` is `'control-plane'`. `--all` opts back in.
JSON output emits `role` at top level. When an older broker omits
role, the CLI defaults it to 'session', so legacy peer rows stay
visible even before the broker-side change ships. Replaces
the prior `peerType === 'claudemesh-daemon'` channel-name hack.
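A sketch of the role-aware filter from item 3, assuming a missing role
is treated as 'session'. PeerRow and visiblePeers are hypothetical
names, not the code in peers.ts.

```typescript
interface PeerRow {
  displayName: string;
  role?: string; // absent when an older broker does not emit it
}

// Hide control-plane rows unless the caller passed --all.
function visiblePeers(peers: PeerRow[], opts: { all?: boolean } = {}): PeerRow[] {
  if (opts.all) return peers;
  return peers.filter((p) => (p.role ?? "session") !== "control-plane");
}
```

Defaulting to 'session' is what keeps rows from older brokers visible.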
Typecheck + tests + build all green.
SessionBrokerClient (daemon-side, since 1.30.0) was constructed
without a push handler and silently dropped every inbound `push` /
`inbound` frame. Header docstring claimed it handled "inbound DM
delivery for messages targeted at the session pubkey" but the
callback was never wired.
Net effect: any DM sent to a peer's session pubkey (everything
`peer list` returns now) was queued, broker-acked, marked
delivered_at on the broker, and thrown away by the recipient
daemon. inbox.db stayed at zero rows; `claudemesh inbox` reported
"no messages" no matter what arrived.
Two-session smoke surfaced this — sender outbox status=done with
broker_message_id, recipient inbox empty.
Fix: wire SessionBrokerClient to forward push/inbound frames to
the same handleBrokerPush the member-keyed broker already uses.
Pass the per-session secret key as sessionSecretKeyHex so
decryptOrFallback tries it first; member key remains the fallback
for legacy member-targeted traffic.
Verified end-to-end with two registered sessions sending in both
directions — inbox.db row count went 0 → 2.
Files: apps/cli/src/daemon/session-broker.ts,
apps/cli/src/daemon/run.ts. No broker change required.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nine UX bugs surfaced from a real two-session interconnect smoke
test, shipped together.
Self-identity is visible
- peer list now shows the caller as (this session), sorted to top.
Daemon path resolves session pubkey via /v1/sessions/me so
isThisSession is set correctly on the warm path.
- whoami shows session pubkey, session id, mesh, role, groups, cwd,
pid when run inside a launched session.
Sibling-session disambiguation
- peer list rows carry sid:<short> tag so visually-identical rows
can be told apart at a glance.
Daemon hidden by default
- claudemesh-daemon presence rows hidden from peer list by default.
--all opts back in. Header shows "N daemon hidden" when applicable.
--self flag works end-to-end
- Argv parser was greedy: --self ate the next arg as its value.
BOOLEAN_FLAGS set in cli/argv.ts now lists known no-value switches.
- message send subcommand now passes self through (only legacy send
was wired before).
- Help text lists --self.
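The greedy-parser fix can be illustrated with a small parser that
consults a set of no-value switches. This is a sketch, not the code in
cli/argv.ts; the contents of BOOLEAN_FLAGS and the Parsed shape are
assumptions.

```typescript
// Assumed contents; the real set lives in cli/argv.ts.
const BOOLEAN_FLAGS = new Set(["--self", "--all", "--json"]);

interface Parsed {
  flags: Record<string, string | true>;
  positionals: string[];
}

function parseArgv(argv: string[]): Parsed {
  const flags: Record<string, string | true> = {};
  const positionals: string[] = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === undefined) break;
    if (!arg.startsWith("--")) {
      positionals.push(arg);
      continue;
    }
    const next = argv[i + 1];
    // Known switches never consume the next token, which is the fix:
    // previously --self swallowed the following argument as its value.
    if (BOOLEAN_FLAGS.has(arg) || next === undefined || next.startsWith("--")) {
      flags[arg] = true;
    } else {
      flags[arg] = next;
      i++; // skip the consumed value
    }
  }
  return { flags, positionals };
}
```

With --self registered as a switch, the token after it stays a
positional instead of becoming the flag's value.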
Member-pubkey fan-out
- Sending to your own member pubkey with --self now resolves to every
connected sibling session and sends one message per recipient.
Required because the broker drain matches target_spec only against
full session pubkeys; member-pubkey sends queued but never drained.
Broker welcome at launch
- After the launch banner, one line confirms WS state, peer count,
and unread inbox count. Best-effort — falls back gracefully.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claudemesh send <16-hex-prefix> would ack with "sent to <prefix> (daemon)"
but the recipient never received the message. Broker pre-flight and
the drain query both exact-match on full 64-char pubkey, so a prefix
queued successfully but no recipient drain ever fetched the row.
Sender saw sent, recipient saw nothing — silent drop.
Fix: CLI resolves any hex prefix (4-63 chars, not full 64) to the
full pubkey via the daemon peer list before submitting. Outcomes:
- unique match: canonicalize and continue
- no match: clear error + list of online peer display names
- multiple: clear error + candidate list + hint to lengthen prefix
The 16-hex prefix shown in peer list rows is now safe to paste
straight into claudemesh send.
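The three outcomes can be sketched as a pure resolver over the peer
list. resolvePrefix and the Resolution type are hypothetical names; the
4-63 character range comes from the fix described above, and anything
outside it is passed through untouched.

```typescript
type Resolution =
  | { kind: "unique"; pubkey: string }
  | { kind: "none" }
  | { kind: "ambiguous"; candidates: string[] }
  | { kind: "not-a-prefix" }; // full 64-char keys and non-hex targets

function resolvePrefix(input: string, peerPubkeys: string[]): Resolution {
  // Only 4-63 hex characters count as a prefix.
  if (!/^[0-9a-f]{4,63}$/i.test(input)) return { kind: "not-a-prefix" };
  const needle = input.toLowerCase();
  const matches = peerPubkeys.filter((k) => k.toLowerCase().startsWith(needle));
  if (matches.length === 0) return { kind: "none" };
  const only = matches[0];
  if (matches.length === 1 && only !== undefined) return { kind: "unique", pubkey: only };
  return { kind: "ambiguous", candidates: matches };
}
```

The caller would canonicalize on unique, and print an error with either
the online peer names (none) or the candidate list (ambiguous).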
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After 1.31.4 the human renderer surfaced role and groups, but launched-
session LLMs still dropped them when they called peer list --json and
built their own tables.
- Top-level role field. The broker returns role nested under
profile.role; the CLI now lifts it to a top-level role field at
parse time so it is the second-most-visible JSON field after
displayName. profile.role is preserved.
- Updated claudemesh skill SKILL.md peer-list section with the full
JSON shape (memberPubkey, sessionId, role, profile, isSelf,
isThisSession) plus explicit guidance to render role + groups in
any peer table inside a launched session.
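A sketch of the parse-time lift, assuming the broker payload nests role
under profile. liftRole and both interfaces are hypothetical and show
only the fields relevant here.

```typescript
interface BrokerPeer {
  displayName: string;
  profile?: { role?: string };
}

interface CliPeer extends BrokerPeer {
  role?: string; // lifted copy of profile.role
}

// Copy profile.role to the top level while leaving profile untouched.
function liftRole(peer: BrokerPeer): CliPeer {
  const role = peer.profile?.role;
  return role === undefined ? { ...peer } : { ...peer, role };
}
```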
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claudemesh peer list now surfaces each peer's profile-level role
(set via claudemesh profile) and any joined groups inline next to
the display name, e.g.
● mou [role:lead, @flexicar:reviewer, @oncall] (ai) · 0d215762…
When both are empty, an explicit footer is added so absence is
unambiguous:
● peer [...]
role: (none) groups: (none)
JSON output is unchanged — the broker has been returning profile
and groups all along, only the human renderer was missing the role.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1.31.2 published with the right code change (DAEMON_PATHS no longer
follow CLAUDEMESH_CONFIG_DIR) but a stale baked-in VERSION constant
because the build ran before the version bump. Same fix, rebuilt
cleanly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real production bug observed in 1.31.0 / 1.31.1: every CLI verb from
inside a claudemesh launch-spawned session printed
[claudemesh] warn service-managed daemon not responding within 8000ms
even when the launchd-managed daemon was healthy and answering
direct UDS probes in 10ms.
Root cause: claudemesh launch exports CLAUDEMESH_CONFIG_DIR to a
per-session tmpdir so joined-mesh state and the IPC session token
stay isolated. DAEMON_PATHS read from the same env, so inside a
launched session the CLI looked for daemon.sock at
/var/folders/.../claudemesh-XXXX/daemon/daemon.sock — which never
exists. The CLI declared the daemon down, fell into the service-
managed wait branch, and timed out.
The daemon is a per-machine singleton serving every session; its
files live at ~/.claudemesh/daemon/ regardless of overlays. Pin
DAEMON_PATHS.DAEMON_DIR to that location. New CLAUDEMESH_DAEMON_DIR
override is preserved for tests and multi-daemon dev setups.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1.31.0 introduced a session reaper that called execFileSync(ps) once
per registered session every 5s. With many sessions registered, the
daemon's event loop stalled for hundreds of ms — long enough that
incoming /v1/version probes from the CLI timed out against a healthy
daemon and the new service-managed warning fired.
Fix:
- getProcessStartTime is now async (execFile + promisify); never
blocks the event loop
- New getProcessStartTimes(pids) issues one batched ps for all
survivors instead of N separate forks. Sweep cost is fixed
regardless of session count.
- registerSession stays sync; start-time capture is fire-and-forget
- reapDead is now async; the setInterval wrapper voids it so a
rejected sweep cannot crash the daemon
Behavior is otherwise unchanged from 1.31.0: same 5s cadence, same
PID-reuse guard semantics, same broker-WS teardown via the registry
hook. 83/83 tests still green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The default body-max-length=100 was firing a warning on every
substantive commit because 100 chars total can't fit a real changelog
message. Disabled (level 0). body-max-line-length bumped to 200 so
long URLs / paths / pasted errors don't trip a warning that adds
nothing.
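The resulting rules might look like the sketch below. commitlint rule
tuples are [level, applicability, value], where level 0 disables a rule
and level 1 warns. The warning level on body-max-line-length is an
assumption; the commit only states the new limit of 200.

```typescript
// Sketch of the relevant rules; a real commitlint config file would
// export this object as its default export.
const config = {
  rules: {
    // Disabled: a cap on total body length cannot fit a real changelog message.
    "body-max-length": [0] as const,
    // Assumed warning level; 200 leaves room for long URLs, paths, and errors.
    "body-max-line-length": [1, "always", 200] as const,
  },
};
```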
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three operability fixes for users running the daemon under launchd or
systemd.
PID-watcher autoclean
=====================
The session reaper already dropped registry entries with dead pids on
a 30s loop, but had two real-world gaps:
- 30s sweep let stale presence linger on the broker for half a minute
- bare process.kill(pid, 0) trusts a recycled pid; a registry entry
could survive its real owner's death whenever the OS rolled the
pid number forward to a new program
Process-exit IPC from claude-code is best-effort and skipped on
SIGKILL / OOM / segfault / panic, so it cannot replace the sweep.
Fix:
- New process-info.ts captures opaque per-process start-times via
ps -o lstart= (works on macOS and Linux, ~1 ms per call)
- registerSession stores the start-time alongside the pid
- reapDead drops entries when pid is dead OR start-time changed
since register
- Sweep cadence 30s -> 5s
- Best-effort fallback to bare liveness when start-time capture
fails at register time
Registry hooks already close the per-session broker WS on
deregister, so peer list rebuilds within one sweep of any session
exit.
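The reap decision can be sketched as a pure predicate. shouldReap and
SessionEntry are hypothetical names; the start-time is treated as an
opaque string, as it would be when captured from ps -o lstart=.

```typescript
interface SessionEntry {
  pid: number;
  startTime?: string; // opaque; absent if capture failed at register time
}

// Drop the entry when the pid is dead, or when it is alive but now
// belongs to a different process (start-time changed since register).
function shouldReap(entry: SessionEntry, alive: boolean, currentStartTime?: string): boolean {
  if (!alive) return true;
  // Fallback to bare liveness when either start-time is unavailable.
  if (entry.startTime === undefined || currentStartTime === undefined) return false;
  return entry.startTime !== currentStartTime;
}
```

The four branches correspond to dead pid, live pid with matching
start-time, live pid with mismatched start-time, and the no-start-time
fallback.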
Service-managed daemon: no more "spawn failed" false alarms
===========================================================
After claudemesh install (which writes a launchd plist or systemd
unit with KeepAlive=true), users routinely saw
[claudemesh] warn daemon spawn failed: socket did not appear
within 3000ms
even when the daemon was running fine. Two contributing causes:
1. Probe timeout was 800ms — the first IPC after a launchd-driven
restart can take longer (SQLite migration + broker WS opens) and
tripped it. Bumped to 2500ms.
2. On a failed probe the CLI tried its own detached spawn, which
collided with launchd's KeepAlive restart cycle (singleton lock
fails, child exits) and we'd then time out polling for a socket
that was actually about to come up.
Now: when the launchd plist or systemd unit exists, the CLI does not
attempt a spawn. It waits up to 8s for the OS-managed unit to bring
the socket up. New service-not-ready state distinguishes "OS hasn't
restarted it yet" from "we tried to spawn and it failed".
Install verifies broker connectivity, not just process start
============================================================
Previously install ended once launchctl reported the unit loaded —
a daemon that boots but cannot reach the broker (blocked :443,
expired TLS, DNS, broker outage) only surfaced on the user's first
peer list or send.
/v1/health now includes per-mesh broker WS state. install polls it
for up to 15s after service boot and prints either "broker
connected (mesh=...)" or a warning naming the meshes still in
connecting state, with a hint at common causes.
The verification is best-effort and does not fail the install — it
just surfaces the issue early.
Tests
=====
4 new vitest cases cover the reaper paths: dead pid, live pid plus
matching start-time, live pid plus mismatched start-time (PID
reuse), and the no-start-time fallback. 83 of 83 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claudemesh install was baking --mesh <primary> into the launchd plist /
systemd unit, locking the daemon to a single mesh and contradicting
1.26.0's multi-mesh design. users with >1 joined mesh fell off the
daemon path on every non-primary verb (cold-WS fallback, peer list
returning all meshes because the server-side filter ran against zero
attached state, "daemon spawn failed: socket did not appear" from
launched sessions in sibling meshes).
now: meshSlug is optional in InstallArgs; claudemesh install omits it
so the unit runs `claudemesh daemon up` with no flag, which attaches
to every joined mesh. `claudemesh daemon install-service --mesh <slug>`
is preserved as opt-in for single-mesh hosts and CI.
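
As a rough sketch of the argument handling (the function name is hypothetical; `meshSlug` mirrors the optional InstallArgs field):

```typescript
// Builds the daemon arguments written into the service unit.
// Omitting meshSlug yields a bare `daemon up`, which attaches to every joined mesh.
function buildDaemonArgs(args: { meshSlug?: string }): string[] {
  const argv = ["daemon", "up"];
  if (args.meshSlug) argv.push("--mesh", args.meshSlug);
  return argv;
}
```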
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
two install-path fixes that bit on first 1.30.0 upgrade:
- pin node by absolute path in launchd plist / systemd unit. shebang's
/usr/bin/env node resolved against the service environment PATH and
picked up system Node 22.x, which lacks node:sqlite (experimental)
→ daemon died with ERR_UNKNOWN_BUILTIN_MODULE. process.execPath now
goes first, so the daemon always runs under the same Node that ran
claudemesh install.
- tear down the old daemon before bootstrapping. claudemesh install on
a machine with an already-running daemon hit Bootstrap failed: 5:
Input/output error (launchctl refuses to re-bootstrap a loaded unit
+ old daemon held the singleton lock). Now we run launchctl bootout
(systemd: systemctl --user stop) first, plus SIGTERM to any orphan
pid in daemon.pid, so subsequent installs replace cleanly.
both fixes apply to darwin and linux paths. windows path is unchanged
— it doesn't have a service-install today (daemon-install-service
errors with "unsupported platform" on win32).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- broker-actions: msg-status section header used out-of-scope `id`
variable; was a real bug (renders "message undefined…" on the JSON
path). Fixed to use the in-scope lookupId.
- exit-codes: add IO_ERROR (10) — referenced in three places by
platform-actions but never declared.
- types/text-import.d.ts: declare wildcard `*.md` module so Bun's
text-import attribute used by skill.ts typechecks.
- ipc/server: cast PeerSummary/SkillSummary through unknown before
spreading into Record<string, unknown>.
- mcp/server: typed JSON.parse for SSE events.
- bridge/daemon-route: import path with .ts → .js (esm).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
paid down the broker's accumulated type debt. zero behavioral changes,
purely type-system tightening:
- broker.ts: row extraction helper for postgres-js result vs pg shape;
findMemberByPubkey defaultGroups null-coalescing.
- env.ts: zod default ordered before transform (zod v4 ordering).
- index.ts: typed JSON.parse for the tg/token, upload-auth, file-upload,
member patch and mesh-settings handlers; export SelfEditablePolicy
from member-api; added bodyVersion to WSSendMessage; added the
disconnect/kick/ban/unban/list_bans message types to WSClientMessage;
String(key) cast for neo4j record symbol-typed keys.
- jwt.ts, paths.ts, telegram-token.ts: typed JSON.parse results.
- service-manager.ts: typed package.json + MCP JSON-RPC reader.
- telegram-bridge.ts: typed WS message handler; missing log import;
null-tolerant BridgeRow + skip rows missing memberId/displayName;
typed e in catch.
- types.ts: bodyVersion on WSSendMessage, manifest on WSSkillData,
five new admin message types (kick/disconnect/ban/unban/list_bans).
- packages/db/server.ts: drizzle constructor positional args + scoped
ts-expect-error for the namespace-bag schema generic mismatch.
apps/broker/src/types.ts will eventually want a real audit pass to
catch every WS verb and surface the orphans, but this clears the path
for 1.30.0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
per-session presence is small and uncomplicated enough that a rollback
flag isn't load-bearing. backwards compat is already covered at the
protocol layer — older brokers reply unknown_message_type to
session_hello and the SessionBrokerClient marks itself closed for that
mesh, which is the same outcome the flag would have given. removing
the flag, the helper, and the conditional from the registry hook.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
flips CLAUDEMESH_SESSION_PRESENCE default to ON. With the broker side
already shipped (the session_hello handler from earlier in this sprint
A wave), every claudemesh launch now gets its own long-lived broker
presence row owned by the daemon and identified by a per-launch
ephemeral keypair vouched by the member's stable key. Two sessions in
the same cwd finally see each other in peer list — the symptom users
have been hitting since 1.28.0 dropped the bridge tier.
Bumps roadmap: 1.30.0 = presence (was queued for 1.30/wizard); the
launch-wizard refactor moves to 1.31.0, setup wizard to 1.32.0, the
mesh→workspace rename to 1.33.0. Verification smoke documented in the
1.30.0 changelog entry.
Rollback: CLAUDEMESH_SESSION_PRESENCE=0 (also accepts "false"/"off").
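
A sketch of how the flag might be read. The function name is hypothetical, and trimming plus case-insensitive matching are assumptions beyond the three values listed above.

```typescript
// Default ON; "0", "false" and "off" disable. Trimming and lowercasing
// are assumptions, not confirmed behavior.
function sessionPresenceEnabled(raw: string | undefined): boolean {
  if (raw === undefined) return true;
  const v = raw.trim().toLowerCase();
  return !(v === "0" || v === "false" || v === "off");
}
```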
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claudemesh launch now also generates a per-launch ed25519 keypair and a
parent-vouched attestation (12h TTL), included in the body of POST
/v1/sessions/register under body.presence. The daemon stores it on
SessionInfo and, with CLAUDEMESH_SESSION_PRESENCE=1, opens a long-lived
broker WS so the session has its own presence row.
Also fixes a latent 1.29.0 bug: claudeSessionId was referenced before
its const declaration, hitting the TDZ → ReferenceError silently
swallowed by the surrounding catch. Net: the IPC session-token
registration has been failing every launch since 1.29.0, falling back
to user-level scope for every session. Hoisted the declaration up so
the registration actually runs.
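
The failure mode is easy to reproduce in isolation. The following is an illustrative reduction with hypothetical function names, not the actual launch code: a closure reads a `const` before its declaration has run, and the surrounding catch turns the error into a silent fallback.

```typescript
// Illustrative only; function and variable names are hypothetical.
function brokenRegistration(): "session" | "user" {
  let scope: "session" | "user" = "session";
  try {
    // The closure runs before the `const` line below has executed.
    const tokenLabel = () => claudeSessionId.toUpperCase();
    tokenLabel(); // throws: the binding is still in its temporal dead zone
  } catch {
    scope = "user"; // swallowed error becomes a silent fallback
  }
  const claudeSessionId = "example-id"; // declared too late
  return scope;
}

function fixedRegistration(): "session" | "user" {
  const claudeSessionId = "example-id"; // hoisted above first use
  let scope: "session" | "user" = "session";
  try {
    const tokenLabel = () => claudeSessionId.toUpperCase();
    tokenLabel();
  } catch {
    scope = "user";
  }
  return scope;
}
```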
The presence payload is forward-compat: older daemons ignore unknown
body fields, so 1.30.0 CLIs work fine against unupgraded daemons.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
whoami --json exits with EXIT.AUTH_FAILED (=2) when not signed in.
The JSON output is the contract under test, valid regardless of exit
code — execSync was throwing on exit 2 so the assertion never ran.
Switch to spawnSync, accept {0,2}, parse stdout independently.
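
The assertion shape can be sketched as below. `RunResult` mirrors the `status` and `stdout` fields of a spawnSync result; the helper name and the sample JSON field in the usage note are hypothetical.

```typescript
// Mirrors the relevant fields of a spawnSync result (stdout read as a string).
type RunResult = { status: number | null; stdout: string };

function parseWhoamiJson(result: RunResult): unknown {
  // 0 = signed in, 2 = EXIT.AUTH_FAILED; both must still emit valid JSON.
  const accepted = new Set<number>([0, 2]);
  if (result.status === null || !accepted.has(result.status)) {
    throw new Error(`unexpected exit code ${result.status}`);
  }
  return JSON.parse(result.stdout);
}
```

For example, `parseWhoamiJson({ status: 2, stdout: '{"signed_in":false}' })` parses the payload instead of throwing on the non-zero exit.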
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
daemon-side half of 1.30.0 per-session broker presence. behind
CLAUDEMESH_SESSION_PRESENCE=1 (default OFF this cycle so the broker
side bakes before the flag flips).
- SessionBrokerClient (apps/cli/src/daemon/session-broker.ts) — slim
WS that opens with session_hello, presence-only, no outbox drain.
- session-hello-sig.ts — signParentAttestation (12h TTL, ≤24h cap) and
signSessionHello, mirroring the broker canonical formats.
- session-registry: optional presence field on SessionInfo;
setRegistryHooks for onRegister/onDeregister callbacks. Hook errors
are caught so they can never block registry mutations.
- IPC POST /v1/sessions/register accepts the presence material under
body.presence (session_pubkey, session_secret_key, parent_attestation).
Older callers without it stay scoped + supported.
- run.ts wires the registry hooks: on register, opens a SessionBrokerClient
for the matching mesh; on deregister (explicit or reaper), closes it.
Shutdown closes any remaining session WSes before the IPC server.
8 new unit tests cover registry lifecycle (replace/throw/presence
roundtrip) and signature canonical-bytes verification against libsodium.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 1.30.0 daemon-multiplexed presence flow needs a way for the daemon
to open a WS keyed on a per-launch ephemeral pubkey. This commit adds:
- WSSessionHelloMessage in types.ts (additive — older clients still use
WSHelloMessage; older brokers reply with unknown_message_type so newer
clients can fall back).
- handleSessionHello in index.ts: validates parentAttestation (TTL ≤24h,
ed25519 by parent), session signature (skew + ed25519 by session),
parent membership in mesh.member, and parentMemberId/pubkey coherence.
- Inserts a presence row keyed on sessionPubkey but member_id from the
parent — member-targeted operations (revocation, send-by-member-pubkey)
keep working unchanged.
- Broadcasts peer_joined to ALL siblings in the mesh, including the
same-member ones (the regular hello path skips those to avoid self-
spam, but session_hello explicitly wants sibling visibility).
Behavior parity tests will land alongside the daemon SessionBrokerClient.
The unit tests added in the previous commit cover the crypto layer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the crypto primitives the 1.30.0 per-session broker presence flow
needs: canonicalSessionAttestation/canonicalSessionHello bytes, and
verifySessionAttestation/verifySessionHelloSignature with TTL bounds
(≤24h) plus standard ed25519 + skew checks.
10 unit tests cover the hostile cases — expired attestation, over-TTL,
wrong-key signing, tampered fields, and the "attacker captured the
attestation but doesn't hold the session secret key" scenario.
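
A sketch of the TTL-bound part of the check only. Signature verification and the skew check are deliberately omitted, and the field names (`issuedAt`, `expiresAt`, epoch milliseconds) are assumptions rather than the canonical format.

```typescript
const MAX_ATTESTATION_TTL_MS = 24 * 60 * 60 * 1000; // the 24h cap from the text

// Assumed field names, epoch milliseconds.
type AttestationWindow = { issuedAt: number; expiresAt: number };

function checkAttestationWindow(
  w: AttestationWindow,
  now: number,
): "ok" | "expired" | "over_ttl" | "invalid" {
  if (w.expiresAt <= w.issuedAt) return "invalid";
  if (w.expiresAt - w.issuedAt > MAX_ATTESTATION_TTL_MS) return "over_ttl";
  if (now >= w.expiresAt) return "expired";
  return "ok";
}
```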
No wire changes yet — types and dispatch land in the next two commits.
Spec: .artifacts/specs/2026-05-04-per-session-presence.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
records the design for daemon-multiplexed broker presence — every
launched claude session gets its own long-lived presence row owned
by the daemon, identified by a per-launch ephemeral keypair vouched
by the member's stable keypair.
resolves the "two sibling sessions can't see each other in peer list"
gap that surfaced when the bridge tier was deleted in 1.28.0. covers
state machine, broker session_hello handler, parent-attestation
signing, ipc route extension, sequencing (broker first, daemon
flagged, cli third), compat with older builds, and verification
smoke.
~440 loc estimate across cli + daemon + broker. queued for 1.30.0
alongside the launch-wizard refactor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
extend the v0.9.x section with a new "v1.26.0 → v1.29.0 — sprint A
toward v2" block listing what each release delivered. trim the
v2.0.0 section to just the remaining HKDF identity work; everything
else from the original v2 spec is now shipped.
queue 1.30.0 (launch wizard), 1.31.0 (setup wizard), 1.32.0 (full
workspace rename) as the explicit remaining items before HKDF
ships as 2.0.0 in its own sprint.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
every claudemesh launch-spawned session now mints a 32-byte random
token, writes it under tmpdir (mode 0600), and registers it with the
daemon. cli invocations from inside that session inherit
CLAUDEMESH_IPC_TOKEN_FILE in env, attach the token via Authorization:
ClaudeMesh-Session <hex>, and the daemon resolves it to a SessionInfo.
server-side: every read route that filters by mesh now uses meshFromCtx —
explicit query/body wins, session default fills in when missing. write
routes follow the same pattern.
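
The precedence rule is small enough to sketch directly. The `RouteCtx` shape is hypothetical, and the relative order of query versus body is an assumption; the text only states that an explicit value beats the session default.

```typescript
// Hypothetical context shape; only "explicit wins, session fills in" is from the text.
type RouteCtx = { queryMesh?: string; bodyMesh?: string; sessionMesh?: string };

function meshFromCtx(ctx: RouteCtx): string | undefined {
  return ctx.queryMesh ?? ctx.bodyMesh ?? ctx.sessionMesh;
}
```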
cli-side: peers.ts (and other multi-mesh-iterating verbs in future)
prefers session-token mesh over all joined meshes when the user didn't
pass --mesh explicitly.
backward-compatible in both directions — tokenless callers behave
exactly as before. registry is in-memory; daemon restart loses it but
the 30s reaper handles dead pids and most callers re-register on next
launch.
verified end-to-end: peer list with token returns 4 prueba1 peers,
without token returns 3 meshes' peers (aggregate).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drop the orphaned bridge tier (~600 LoC). client/server/protocol
files deleted; tryBridge had returned null in production for seven
releases since the 1.24.0 mcp shim rewrite stopped opening the
sockets. each verb now has two paths: daemon (with 1.27.3's
auto-spawn) → cold ws.
add per-process daemon policy: --strict (error instead of cold
fallback) and --no-daemon (skip daemon entirely). enforcement at
withMesh so a single chokepoint covers every verb. env equivalents
CLAUDEMESH_STRICT_DAEMON / CLAUDEMESH_NO_DAEMON. flag wins.
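
A sketch of the policy resolution. Only "flag wins" and the two env names come from the text; the type names, the `"1"` truthy value, and the tie-break when both options are set are assumptions.

```typescript
type DaemonPolicy = "auto" | "strict" | "no-daemon";
type PolicyFlags = { strict?: boolean; noDaemon?: boolean };
type PolicyEnv = Record<string, string | undefined>;

function resolveDaemonPolicy(flags: PolicyFlags, env: PolicyEnv): DaemonPolicy {
  // Flags are consulted first, so a flag always beats an env var.
  if (flags.noDaemon) return "no-daemon";
  if (flags.strict) return "strict";
  // Assumed: "1" is the enabling value for both env vars.
  if (env.CLAUDEMESH_NO_DAEMON === "1") return "no-daemon";
  if (env.CLAUDEMESH_STRICT_DAEMON === "1") return "strict";
  return "auto";
}
```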
net -394 loc; the daemon-up case ships ~600 loc lighter and the
fallback story is one tier simpler. first sprint A drop; per-session
ipc tokens and the wizard refactors follow in 1.29.0+.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
every daemon-routed verb now probes the ipc socket via /v1/version
(instead of trusting existsSync), cleans up stale sock/pid files left
by a crashed daemon, and auto-spawns a detached `claudemesh daemon up`
under a file-lock when the daemon is down. polls for liveness up to a
budget (3s for ad-hoc verbs, 10s for launch) before falling through to
cold path.
includes a per-process result cache (script doing 50 sends pays spawn
cost at most once), a 30s recently-failed marker (no thundering-herd
retries on crash-loop), a spawn-lock (concurrent invocations share one
attempt), and a recursion guard env var (nested cli calls inside the
daemon process skip auto-spawn).
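
Two of these guards, the per-process result cache and the 30s recently-failed marker, can be sketched together. `SpawnGate` is a hypothetical name, the marker is held in memory here purely for illustration, and `spawn` stands in for the real detached spawn plus liveness poll.

```typescript
const RECENTLY_FAILED_MS = 30000; // the 30s marker from the text

class SpawnGate {
  private cached: boolean | undefined; // per-process result cache
  private lastFailedAt: number | undefined;

  constructor(lastFailedAt?: number) {
    this.lastFailedAt = lastFailedAt;
  }

  // `spawn` is a stand-in for the detached spawn + liveness poll.
  ensure(now: number, spawn: () => boolean): boolean {
    if (this.cached !== undefined) return this.cached;
    if (this.lastFailedAt !== undefined && now - this.lastFailedAt < RECENTLY_FAILED_MS) {
      this.cached = false; // skip the retry while the daemon is crash-looping
      return false;
    }
    const ok = spawn();
    if (!ok) this.lastFailedAt = now;
    this.cached = ok;
    return ok;
  }
}
```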
fixes the stale-socket bug where launch's ensureDaemonRunning returned
early on a left-over socket file from a crashed daemon, silently
breaking the spawned claude session's mcp shim.
deferred to 1.28.0: --strict / --no-daemon flags, lazy-loading of
cold-path code, per-session ipc tokens.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
adds a kitchen-sink "every flag set explicitly" recipe under
wizard-free spawn templates, with a per-position annotation table.
agents copy this verbatim instead of stitching flags from the table
when spawning unattended sessions.
corrects two stale items: --system-prompt forwards to claude
--system-prompt (not --append-system-prompt), and -q is currently a
no-op (only --quiet is wired).
flags the 1.27.1 cutoff: all twelve launch flags are only end-to-end
wired from that version on; older builds silently dropped half of them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
six flags declared on `LaunchFlags` were silently dropped at the CLI
layer — `--role`, `--groups`, `--message-mode`, `--system-prompt`,
`--continue`, and `--quiet`. each was honored inside `runLaunch` if it
arrived, but the four call sites in the entrypoint forwarded a hardcoded
5-key subset.
now forwarded at every entry: bare command, bare invite URL, the
launch/connect verb, and the new workspace launch alias. pure plumbing;
no behaviour change for users who weren't passing these flags.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
extend the daemon thin-client surface to two more verb families: state
get/set/list now routes through `/v1/state`, and remember/recall/forget
through `/v1/memory`. same warm-path pattern as 1.25.0 — try the unix
socket first, fall back to the cold ws path when the daemon is absent.
multi-mesh aware (aggregates on read, requires `--mesh` for writes
when ambiguous).
also ships an early `claudemesh workspace <verb>` alias surface — bare
teaser for the 1.28.0 mesh→workspace public rename. no-arg falls
through to launch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 1.26.0 step that finally delivers ambient mode for multi-mesh
users. Daemon holds Map<slug, DaemonBrokerClient>; one process, one
PID per user, all your meshes online concurrently.
run.ts: claudemesh daemon up with no --mesh attaches to every joined
mesh from config. --mesh <slug> still scopes to one (legacy mode).
The daemon_started log line reports meshes: [...] instead of mesh.
drain.ts: dispatches each outbox row to the broker keyed by row.mesh
(column added in 1.25.0). Legacy rows with mesh=NULL fall back to the
only broker if there's exactly one, otherwise mark dead with a clear
error.
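
The dispatch rule can be sketched as follows. The type and function names are hypothetical, and marking a row dead when its mesh is not attached is an assumption; the text only describes the legacy mesh=NULL case.

```typescript
type OutboxRow = { id: string; mesh: string | null };
type Dispatch<B> = { broker: B } | { dead: string };

function pickBroker<B>(row: OutboxRow, brokers: Map<string, B>): Dispatch<B> {
  if (row.mesh !== null) {
    const b = brokers.get(row.mesh);
    // Assumption: a row for an unattached mesh is treated as dead.
    return b !== undefined ? { broker: b } : { dead: `mesh "${row.mesh}" is not attached` };
  }
  // Legacy row (mesh=NULL): unambiguous only when exactly one broker is attached.
  if (brokers.size === 1) {
    return { broker: Array.from(brokers.values())[0] };
  }
  return { dead: `legacy row ${row.id} has no mesh and ${brokers.size} brokers are attached` };
}
```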
ipc/server.ts:
- GET /v1/peers aggregates across all attached meshes; each peer
record gains a mesh field. ?mesh=<slug> narrows server-side.
- GET /v1/skills aggregates similarly; /v1/skills/:name walks meshes
and returns first match.
- POST /v1/send requires mesh field on multi-mesh daemons; auto-picks
on single-mesh; returns 400 with attached list if ambiguous.
- POST /v1/profile accepts optional mesh; without it, fans out to all
attached meshes (consistent presence).
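
The POST /v1/send mesh rule from the list above, as a sketch. The response shape and error strings are hypothetical; the behavior (explicit mesh, auto-pick on single-mesh, 400 with the attached list otherwise) follows the text.

```typescript
type SendResolution =
  | { status: 200; mesh: string }
  | { status: 400; error: string; attached: string[] };

function resolveSendMesh(requested: string | undefined, attached: string[]): SendResolution {
  if (requested !== undefined) {
    return attached.indexOf(requested) !== -1
      ? { status: 200, mesh: requested }
      : { status: 400, error: `mesh "${requested}" is not attached`, attached };
  }
  // Single-mesh daemon: the only attached mesh is picked automatically.
  if (attached.length === 1) return { status: 200, mesh: attached[0] };
  return { status: 400, error: "mesh is required on a multi-mesh daemon", attached };
}
```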
CLI: trySendViaDaemon now forwards expectedMesh as the body's mesh
field (was informational, now authoritative). claudemesh send
--mesh A and --mesh B from the same shell both route to the right
broker via the same daemon process.
Verified: aggregated peer list across 3 attached meshes; cross-mesh
sends from CLI reach status=done with correct broker_message_ids.
Released as 1.26.0 on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Daemon outbox now stores resolved target_spec + crypto_box ciphertext
+ nonce per row. Drain worker is a forwarder; no per-row resolution at
drain time. Outbound routing is no longer a placeholder.
Schema additions (additive, NULL allowed for legacy rows): outbox.mesh,
target_spec, nonce, ciphertext, priority. v0.9.0 rows keep draining via
the broadcast fallback so existing in-flight rows finish cleanly.
IPC /v1/send resolves the user-friendly `to` value (display name, hex prefix,
full pubkey, @group, *, #topicId) into a broker-format target_spec at
accept time. DMs encrypt via crypto_box; broadcast/topic/group base64
the plaintext. Hex prefixes (16+ chars) match against connected peers.
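
A sketch of the accept-time resolution of `to`. The `TargetSpec` union is a hypothetical stand-in for the broker format, which is not shown here; the 64-hex-character full pubkey and the "ambiguous match resolves to nothing" rule are assumptions. Encryption is out of scope for this sketch.

```typescript
// Hypothetical stand-in for the broker-format target_spec.
type TargetSpec =
  | { kind: "broadcast" }
  | { kind: "group"; name: string }
  | { kind: "topic"; id: string }
  | { kind: "peer"; pubkey: string };

type ConnectedPeer = { displayName: string; pubkey: string };

function resolveTarget(to: string, peers: ConnectedPeer[]): TargetSpec | undefined {
  if (to === "*") return { kind: "broadcast" };
  if (to.startsWith("@")) return { kind: "group", name: to.slice(1) };
  if (to.startsWith("#")) return { kind: "topic", id: to.slice(1) };
  // Assumed: a full pubkey is 64 hex characters.
  if (/^[0-9a-f]{64}$/i.test(to)) return { kind: "peer", pubkey: to.toLowerCase() };
  // Hex prefixes of 16+ characters match against connected peers.
  if (/^[0-9a-f]{16,63}$/i.test(to)) {
    const hits = peers.filter((p) => p.pubkey.startsWith(to.toLowerCase()));
    return hits.length === 1 ? { kind: "peer", pubkey: hits[0].pubkey } : undefined;
  }
  const byName = peers.filter((p) => p.displayName === to);
  return byName.length === 1 ? { kind: "peer", pubkey: byName[0].pubkey } : undefined;
}
```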
CLI thin-client routing extends trySendViaDaemon pattern to peer list
and skill list/get. Three new helpers in services/bridge/daemon-route.ts.
SKILL.md gains ambient mode section: after claudemesh install, raw
claude works for the daemon's attached mesh. Launch stays as the
override path.
Spec at .artifacts/specs/2026-05-04-v2-roadmap-completion.md orders
the remaining v2.0.0 work: multi-mesh daemon (1.26), CLI-to-thin-client
(1.27), mesh-to-workspace rename (1.28), HKDF identity (2.0).
Released as 1.25.0 on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The architectural convergence v0.9.0 was building toward. CLI keeps
working without a daemon (claudemesh send/peer/inbox/...), but the MCP
push-pipe — which Claude Code uses for mid-turn channel emits, slash
commands, and resources — now requires the daemon. There is no fallback.
Daemon (additive):
- /v1/skills (list) and /v1/skills/:name (get) IPC endpoints, so the
MCP shim can surface mesh skills without holding its own broker WS.
- listSkills() / getSkill() on DaemonBrokerClient.
- SSE 'message' event now carries plaintext body, sender_member_pubkey,
priority, and subtype — full payload the MCP shim needs to render a
channel notification.
MCP server: 979 → 469 LoC (most of the remaining 469 is the unrelated
mesh-service proxy mode; the push-pipe path is ~200 LoC including
boilerplate).
- Probes ~/.claudemesh/daemon/daemon.sock at boot. Bails loudly with
actionable instructions if missing.
- Subscribes to /v1/events SSE and translates each event into a
notifications/claude/channel emit.
- Fetches mesh skills from the daemon for ListPrompts/GetPrompt and
ListResources/ReadResource. ListTools returns []; the CLI is the API.
- No broker WS, no decryption, no reconnect logic. Daemon owns all of it.
claudemesh install: auto-installs and starts the daemon service for the
user's primary mesh (launchd / systemd-user). Pass --no-service to skip.
claudemesh launch: probes the daemon socket; if absent, spawns
'claudemesh daemon up --mesh <slug>' detached and waits up to 10s for
the socket. Surfaces a clear warning on timeout but doesn't block —
Claude Code's MCP shim will print the same error if the daemon really
isn't there.
Bundle: dist/entrypoints/mcp.js drops from 154KB → 104KB (gzipped 34KB
→ 19KB). Test: MCP boots cleanly via stdio, declares correct
capabilities, talks JSON-RPC; daemon /v1/skills returns the empty list
as expected on a mesh with no skills.
Released as 1.24.0 on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the last functional gaps where the MCP tool registry exposed
write verbs the CLI didn't:
- vault set <k> <v> [--type env|file --mount <path> --description ...]
Client-side crypto_secretbox_easy with a fresh symmetric key sealed
to the member's own pubkey via crypto_box_seal — same pattern used
for file shares. Pairs with the existing vault list/delete.
- watch add <url> [--label --interval --mode --extract --notify-on]
Pairs with watch list/remove.
- webhook create <name> — pairs with webhook list/delete.
Cleanup: deletes 22 stub files under apps/cli/src/mcp/tools/* plus
router.ts, middleware/, handlers/ (~120 LoC). These were FAMILY/TOOLS
metadata-only re-exports left over from before the 1.5.0 tool-less
push-pipe flip; nothing imports them. The legitimate MCP surfaces
stay: the inbound <channel> push pipe, mesh skills as prompts and
skill:// resources, and the mesh-service proxy mode.
Released as 1.23.0 on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Root --help now lists the daemon subcommand suite (was missing).
- claudemesh daemon (no subcommand) prints a usage block instead of
silently launching the foreground daemon. Adds help|--help|-h aliases.
- SKILL.md gains a "Daemon path (v0.9.0, opt-in, fastest)" section
explaining the runtime, lifecycle, and that it's independent from
claudemesh install.
Released as 1.22.1 on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Bump apps/cli/package.json to 1.22.0 (additive feature: claudemesh
daemon long-lived runtime).
- CHANGELOG entry for 1.22.0 covering subcommands, idempotency wiring,
crash recovery, and the deferred Sprint 7 broker hardening.
- Roadmap entry for v0.9.0 daemon foundation right above the v2.0.0
daemon redesign section, so the bridge release is documented as the
shipped step toward the larger architectural shift.
- Move shipped daemon specs (v1..v10 iteration trail + locked v0.9.0
spec + broker-hardening followups) from .artifacts/specs/ to
.artifacts/shipped/ per the project artifact-pipeline convention.
Not in this commit: npm publish and the cli-v1.22.0 GitHub release tag
— both are public-distribution actions and require explicit user
approval.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Additive plumbing for v0.9.0 daemon spec §4.2/§4.4. Adds two nullable
columns to mesh.message_queue — client_message_id (caller-supplied) and
request_fingerprint (canonical sha256 of the send shape) — and threads
them through the broker:
- handleSend reads them off the wire envelope when present
- queueMessage persists them on the row
- drainForMember projects them onto the push so receiving daemons
can dedupe their local inbox by client_message_id
Columns stay nullable so legacy traffic (launch CLI, dashboard chat)
continues to flow uninterrupted. Sprint 7 (broker hardening) will add
the partial unique index and the client_message_dedupe atomic-accept
table once we're ready to enforce dedupe broker-side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mesh.slug actually carries a UNIQUE constraint (mesh_slug_unique)
even though the schema comment claimed otherwise. Trying to rename
to a slug another mesh already owns blew up as a generic 500.
Now: caught at the route, surfaced as 409 with body
{"error":"slug \"<x>\" is already taken"}; CLI maps it to
EXIT.ALREADY_EXISTS and prints the message.
Schema comment corrected to match DB reality.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-launch fix: every visible surface already keyed on slug, so
"name" was a parallel string that only existed to confuse users
on rename ("I renamed but nothing visible changed").
Now slug IS the identifier. claudemesh rename <old> <new> is the
whole rename surface. PATCH /api/cli/meshes/:slug body becomes
{ slug } and the route writes both columns to keep them in sync.
Mesh create derives slug from input.name and stores name = slug.
Pickers drop the (parens). The claudemesh slug verb shipped 30
min ago is removed — merged into rename.
The mesh.name DB column stays for now to avoid touching ~25
reader sites; a follow-up migration drops it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The launch welcome flow's menuSelect was rendering opts.meshes.map(
m => m.slug) — so even after rename writes the new name to local
config, the picker still only showed the slug. Renders as
"name (slug)" when they differ; falls back to slug alone when
they match (default for never-renamed meshes).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Slugs are not globally unique (mesh.id is canonical) so the route
only validates the regex and updates the row. CLI refuses a local
collision (two joined meshes sharing a slug would make the picker
ambiguous) and rewrites ~/.claudemesh/config.json on success.
Other peers pick up the new slug on next claudemesh sync.
Server: PATCH /api/cli/meshes/:slug body now accepts { name?, slug? }
— same route, just optional both fields.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After renaming the mesh display name on the server, the launch
picker still showed the slug ("flexicar-2") because (a) local
config.json was not updated and (b) the picker only printed
mesh.slug. Now: rename writes the new name back into config.json
on success, and the picker prints "name (slug)" when they differ.
Also surfaces a hint that slugs are immutable (today).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The rename route was collapsing "mesh doesn't exist" and "exists
but you don't own it" into a single 404 with body
{"error":"mesh not found or you are not the owner"}, and the CLI
was throwing that body away — the user only saw "API error 404:
Not Found", which is actively misleading when they have multiple
accounts and signed in to the wrong one.
Server: separate lookup-then-update. 404 only when the slug is
missing; 403 with an actionable message when the caller is not
the owner.
CLI: parse the {error} body off ApiError and print it instead of
the bare statusText. Map status codes to specific exit codes
(401 -> AUTH_FAILED, 403 -> PERMISSION_DENIED, 404 -> NOT_FOUND).
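
A sketch of the CLI side. Exit codes are represented by name because their numeric values are not given here; `GENERIC_ERROR` and both function names are hypothetical.

```typescript
type ExitName = "AUTH_FAILED" | "PERMISSION_DENIED" | "NOT_FOUND" | "GENERIC_ERROR";

function exitNameForStatus(status: number): ExitName {
  switch (status) {
    case 401: return "AUTH_FAILED";
    case 403: return "PERMISSION_DENIED";
    case 404: return "NOT_FOUND";
    default: return "GENERIC_ERROR"; // hypothetical catch-all, not named in the text
  }
}

// Prefer the server's {error} body over the bare status text.
function messageFromApiError(body: string, statusText: string): string {
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed === "object" && parsed !== null && "error" in parsed) {
      const err = (parsed as { error: unknown }).error;
      if (typeof err === "string") return err;
    }
  } catch {
    // Body was not JSON; fall through to the status text.
  }
  return statusText;
}
```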
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coolify's last deploy reused the cached image — the new
/api/cli/meshes/[slug] route never made it into .next/server.
Adding force=true to the deploy API call so Coolify rebuilds
from the current commit instead of replaying the cache.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The /api/my/meshes/:slug PATCH route was never implemented and
better-auth's enforceAuth middleware can't validate the CLI's
device-code JWT (signed with CLI_SYNC_SECRET, not a better-auth
session). Adds /api/cli/meshes/:slug on the web app — verifies
the HS256 JWT inline, scopes the rename to (slug, ownerUserId).
CLI now calls the new path. Mirrors the cli-sync-token pattern.
Closes the "API error 401: Unauthorized" hit after a successful
claudemesh login.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two new CLI verbs for the file-sharing surface that already existed
on the broker (HTTP /upload + WS get_file/list_files) but was only
reachable through MCP-style docstrings referencing tools that do
not in fact exist:
claudemesh file share <path> [--to peer] [--message "..."]
claudemesh file get <id> [--out path]
Same-host fast path: when --to resolves to a session on the same
hostname, skip MinIO and DM the absolute filepath. The receiver
reads it off disk directly. No bucket roundtrip, no 50 MB cap.
Falls back to encrypted upload when the peer is remote or --upload
is set.
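
The delivery choice can be sketched as below. The option names and the `Delivery` union are hypothetical; the rule itself (same hostname and no --upload means a path DM, otherwise encrypted upload) follows the text.

```typescript
type Delivery = { via: "path-dm"; absolutePath: string } | { via: "encrypted-upload" };

function chooseDelivery(opts: {
  absolutePath: string;
  senderHost: string;
  peerHost?: string; // undefined when no --to peer was resolved
  forceUpload?: boolean; // corresponds to --upload
}): Delivery {
  const sameHost = opts.peerHost !== undefined && opts.peerHost === opts.senderHost;
  if (sameHost && !opts.forceUpload) {
    return { via: "path-dm", absolutePath: opts.absolutePath };
  }
  return { via: "encrypted-upload" };
}
```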
Routes the same-host DM by session pubkey, not displayName, so
sibling sessions of the same member do not trip the v0.5.1
self-DM guard.
Updates the bundled SKILL.md and the MCP server instructions to
reference the real CLI verbs instead of the fictional share_file()
/ get_file() tool calls.
Also: rename.ts now distinguishes mesh-membership from web-account
auth and points users at claudemesh login + the dashboard rather
than emitting a bare "Not signed in".
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Zero-install access to the protocol reference: a fresh `npm i -g
claudemesh-cli` user (or someone running the prebuilt binary) can
now `claudemesh skill | claude --skill-add -` without copying any
files into ~/.claude/skills. The skill markdown is embedded into
the CLI bundle at build time via Bun's text-import attribute.
Also replaces two `<> ALL(...)` raw SQL fragments in the dashboard
unread-count queries with drizzle's notInArray() helper — matches
the same fix already applied to /v1/me/topics in the API package.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
closes the "DM looped back to my own inbox" footgun.
what was happening: peer list returns one row per presence,
including the caller's own session AND its sibling sessions.
the cli filtered out the exact-session row but left siblings
unlabeled — copying their pubkey from peer list silently
targeted your own sibling, and the message arrived in "your
own inbox" because the sender was you.
fix is two-part.
(1) peer list — tag rows whose memberPubkey matches the
caller's stable JoinedMesh.pubkey:
● displayName (this session) — the exact session running
the cli call
● displayName (your other session) — sibling session of
your own member
visually identical otherwise; just the marker.
(2) claudemesh send — refuse a target that exactly matches the
caller's own member pubkey on the mesh, with a hint pointing
at --self for the rare intentional sibling-DM case.
both changes additive — existing scripts that pass display
names or other peers' pubkeys behave identically.
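
Both parts of the fix, sketched with hypothetical types. Identifying the caller's exact session by a session pubkey is an assumption; the labels and the --self hint come from the text.

```typescript
type PeerRow = { displayName: string; memberPubkey: string; sessionPubkey: string };
type Caller = { memberPubkey: string; sessionPubkey: string };

// Part 1: tag rows that belong to the caller's own member.
function peerLabel(row: PeerRow, me: Caller): string {
  if (row.memberPubkey !== me.memberPubkey) return row.displayName;
  return row.sessionPubkey === me.sessionPubkey
    ? `${row.displayName} (this session)`
    : `${row.displayName} (your other session)`;
}

// Part 2: refuse a send to the caller's own member pubkey unless --self is given.
// Returns an error message, or undefined when the send may proceed.
function checkSendTarget(targetPubkey: string, me: Caller, allowSelf: boolean): string | undefined {
  if (targetPubkey === me.memberPubkey && !allowSelf) {
    return "target is your own member pubkey; pass --self to DM a sibling session";
  }
  return undefined;
}
```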
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
the previous form had drizzle render the date param as a js
toString() value which postgres rejected (Fri Apr 03 2026
GMT+0000 doesn't parse as timestamp without help). fix:
serialize to iso then cast ::timestamp inside the sql tag.
simplified the where clause too — the prior conditional dance
emitted "status != completed" three times redundantly. one
"completed_at IS NULL OR > window" covers active + recent-done
in one clause; status filtering happens client-side via the
existing statusSet pass.
also cleans up the debug probe scaffolding.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ships v0.5.0 phase 2.
api: three new aggregator endpoints for the per-mesh subsystems
that didn't have one yet.
- GET /v1/me/tasks — open + claimed by default; ?status=all
surfaces completed (30d window). sorted open > claimed > done.
- GET /v1/me/state — every (key, value) row across the user's
meshes, sorted by recency. ?key=foo filters to one key.
- GET /v1/me/memory?q=... — ilike on content + tags, no q
returns the last 30 days. excludes forgotten rows.
cli (1.16.0): task list, state list, recall now route through
the matching aggregator when --mesh is omitted. --mesh foo
still scopes to one mesh (existing behavior preserved).
with this, every per-mesh read verb in the cli either has a
cross-mesh aggregator or doesn't need one. v0.5.0 substrate is
complete.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ships v0.5.0 phase 1.
omitting --mesh on these read verbs now routes through
/v1/me/topics and /v1/me/notifications instead of prompting
the user to pick a mesh. behavior preserved for explicit
--mesh foo.
implementation: resolveMeshForMint helper in commands/me.ts
silently picks the first joined mesh for apikey-mint when
flags.mesh is null. /v1/me/* endpoints resolve the user from
the apikey issuer regardless of which mesh issued the key, so
mint location is irrelevant — only the user identity matters.
help text updated to reflect the new default.
phase 2 (task list, state list, memory recall) needs /v1/me/*
aggregator endpoints first; deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ships v0.4.0 phase 5 — final aggregating verb. v0.4.0 substrate
is complete after this.
api: GET /v1/me/search?q=... matches against topic names +
sender display names + v1 message snippets (base64 decode then
ilike). v2 ciphertext matches only on topic/sender — server has
no topic keys. 30-day window on messages, capped at 50 hits per
category.
cli (1.14.0): claudemesh me search <query> renders topic + msg
sections with inline yellow highlighting. min 2 chars; --json
returns the raw response.
web: /dashboard/search adds an autofocused input + mark
highlighting on every match site (topic name, sender, snippet).
sidebar gets a search entry between activity and invites.
roadmap: phase 5 marked shipped, v0.5.0 default-aggregation
behavior added as the natural next track.
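one way the match highlighting could be computed on the client, as a sketch: split a string into matched and unmatched segments with a case-insensitive substring scan (the in-memory analogue of an ilike hit). highlightSegments and Segment are illustrative names, not the cli or web code.

```typescript
interface Segment { text: string; match: boolean }

// Case-insensitive substring split. Assumes lowercasing does not change
// string length, which holds for ASCII but not for every Unicode character.
function highlightSegments(text: string, query: string): Segment[] {
  if (query.length < 2) return [{ text, match: false }]; // min 2 chars, as in the cli
  const segments: Segment[] = [];
  const haystack = text.toLowerCase();
  const needle = query.toLowerCase();
  let cursor = 0;
  while (cursor < text.length) {
    const hit = haystack.indexOf(needle, cursor);
    if (hit === -1) break;
    if (hit > cursor) segments.push({ text: text.slice(cursor, hit), match: false });
    segments.push({ text: text.slice(hit, hit + needle.length), match: true });
    cursor = hit + needle.length;
  }
  if (cursor < text.length) segments.push({ text: text.slice(cursor), match: false });
  return segments;
}
```

a renderer can then wrap the match: true segments in yellow (cli) or a mark element (web).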
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ships v0.4.0 phase 4. final aggregating verb after this is
me search (phase 5).
api: GET /v1/me/activity returns topic messages across every
mesh the user belongs to in a 24h default window (?since=iso
override), excluding messages the caller authored themselves.
"what is happening that i missed", capped at 200.
cli (1.13.0): claudemesh me activity prints a condensed feed
with mesh + topic + sender + relative timestamp + snippet (or
[encrypted] for v2 ciphertext).
web: /dashboard/activity clusters consecutive messages from the
same topic into thread blocks for readability. sidebar gains an
activity entry between notifications and invites.
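the clustering step can be sketched as a single pass that starts a new block whenever the topic changes. the ActivityMessage and ThreadBlock shapes are assumptions for illustration.

```typescript
interface ActivityMessage { topicId: string; sender: string; snippet: string }
interface ThreadBlock { topicId: string; messages: ActivityMessage[] }

// Groups only *consecutive* messages from the same topic; a topic that
// reappears later in the feed gets a second block.
function clusterByTopic(feed: ActivityMessage[]): ThreadBlock[] {
  const blocks: ThreadBlock[] = [];
  for (const msg of feed) {
    const last = blocks[blocks.length - 1];
    if (last && last.topicId === msg.topicId) last.messages.push(msg);
    else blocks.push({ topicId: msg.topicId, messages: [msg] });
  }
  return blocks;
}
```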
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ships v0.4.0 phase 3.
api: GET /v1/me/notifications aggregates the mesh.notification
table across every joined mesh in a 7-day window (?since=iso
overrides, ?include=all surfaces already-read). returns sender +
topic + mesh context plus a 240-char snippet for v1 plaintext
messages or raw ciphertext for v2 (the dashboard topic-key cache
decrypts client-side).
cli (1.12.0): claudemesh me notifications — terse unread feed
with @ dot, --all to include read, --since for custom window.
web: /dashboard/notifications mirrors the cli view in card form,
adds a notifications entry to the dashboard sidebar between
topics and invites. each card links straight to the topic chat.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
new workflow joins the tailnet via tailscale oauth then triggers
the coolify deploy endpoint. path filter scoped to web app + every
package transpiled into it, so broker/cli/docs changes skip it.
concurrency group coalesces rapid pushes.
requires three repo secrets: COOLIFY_TOKEN, TS_OAUTH_CLIENT_ID,
TS_OAUTH_SECRET (the OAuth client needs the devices:write scope and
the tag:ci tag in tailnet ACL tagOwners).
inline coolify token removed from CLAUDE.md — it now references
the repo secret. broker deploy is unchanged: it runs through the
gitea-vps webhook.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
roadmap entry for the me-topics + dashboard-topics ship.
claude.md gets the long-overdue note that apps/web is on coolify
on the ovh vps, not vercel — it does not auto-deploy on push to
gitea-vps the way the broker does, and that mismatch cost a
session of debugging. records the manual deploy command so the
next time we ship a web change we don't rediscover the issue.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
restores api + cli to clean state after isolating the v0.4.0
phase 2 deploy issue (web app needed an explicit coolify deploy
trigger — it does not auto-deploy from gitea-vps push the way the
broker does).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
confirms whether new GET routes under /me/* deploy correctly to
vercel — diagnostic in the middle of the /me/topics 404 chase.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
the sql.join() form of NOT IN crashed the route handler before
it could respond — vercel surfaced the crash as a plaintext 404
instead of going through hono's exception handler. switching to
drizzle's notInArray() / inArray() emits stable parameter
bindings and resolves both /v1/me/topics (fresh endpoint) and
/v1/topics (older endpoint with the same ANY() pattern bug).
also cleans up debug instrumentation that was added while
chasing the 404.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ships v0.4.0 phase 2: a cross-mesh topic feed.
api: GET /v1/me/topics aggregates topics across every mesh the
caller belongs to with per-topic unread counts (vs the user's
member-row last_read_at) and last-message timestamps. Sorted by
last activity.
cli (1.11.0): claudemesh me topics renders the feed; --unread
filters to topics with pending reads; --json returns raw.
web: /dashboard/topics renders the same view server-side (direct
db queries, no apikey-mint roundtrip) and adds a Topics entry
to the dashboard sidebar between Meshes and Invites.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
bun build --compile in the cli release workflow couldn't resolve
@claudemesh/sdk because dist/ never gets built (--ignore-scripts).
adding exports.bun -> ./src/index.ts lets bun consume the typescript
sources directly while npm consumers keep using dist/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
split the v0.4.0 entry into phase 1 (the me/workspace endpoint
+ verb that just shipped in CLI 1.10.0) and phase 2+ (remaining
me topics/notifications/activity/search verbs).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
count distinct members where disconnectedAt is null instead of
all presence rows — a member can have many sessions, plus stale
rows from prior runs.
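the counting rule, sketched in memory (the real fix is a sql count; PresenceRow here is a hypothetical shape):

```typescript
interface PresenceRow { memberId: string; disconnectedAt: Date | null }

// A member with several live sessions counts once; rows with a
// disconnectedAt timestamp (stale or closed sessions) are ignored.
function countOnlineMembers(rows: PresenceRow[]): number {
  const online = new Set<string>();
  for (const row of rows) {
    if (row.disconnectedAt === null) online.add(row.memberId);
  }
  return online.size;
}
```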
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drizzle's sql template literal interpolated meshIds as a tuple
(($1, $2, $3, ...)) instead of an array, breaking ANY() and
returning HTTP 500. inArray() emits the right binding shape.
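for illustration only, a hand-rolled helper showing the binding shape an IN list needs: one placeholder per element, with the values passed as separate parameters. this is not drizzle's implementation; inClause and Query are made-up names.

```typescript
interface Query { text: string; params: string[] }

// `column` must be a trusted identifier; only the values are parameterized.
function inClause(column: string, values: string[], startIndex = 1): Query {
  if (values.length === 0) return { text: "FALSE", params: [] }; // nothing can match
  const placeholders = values.map((_, i) => `$${startIndex + i}`).join(", ");
  return { text: `${column} IN (${placeholders})`, params: [...values] };
}
```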
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes the last gap from phase 3.5: web-created topics start as v1
plaintext (mutations.ts ensureGeneralTopic doesn't generate a key,
because the dashboard owner has a throwaway pubkey with no secret).
Once the browser identity is registered via /v1/me/peer-pubkey, the
chat panel can lazily upgrade the topic to v2.
API (POST /v1/topics/:name/claim-key)
- Atomic claim: only succeeds when topic.encrypted_key_pubkey IS
NULL. Body carries the new senderPubkey + the caller's sealed copy
of the freshly-generated topic key. Race losers get 409 with the
winning senderPubkey so they fall through to the regular fetch
path. Idempotent at topic_member_key level.
Web
- claimTopicKey() in services/crypto/topic-key.ts: generates a fresh
32-byte symmetric key, seals for self, POSTs the claim. Returns
the in-memory key so the caller can encrypt immediately without a
follow-up GET /key round-trip.
- sealTopicKeyFor(): mirrors the CLI helper so a browser holder can
re-seal for newcomers (CLI peers, other browsers) instead of the
topic going dark when only a browser has the key.
- TopicChatPanel: when keyState === "topic_unencrypted", composer
now shows a "🔓 plaintext (v1) — encryption not yet enabled" line
with an "enable encryption" button. Click → claimTopicKey → state
flips to "ready" → 🔒 v0.3.0 banner appears. On race-lost, falls
through to fetch.
- New 30s re-seal loop fires while holding the key: polls
/pending-seals, seals via sealTopicKeyFor for each pending target,
POSTs to /seal. Same cadence + soft-fail discipline as the CLI.
Net effect: any dashboard user can convert legacy v1 topics to v2
with a single click, and CLI peers joining later will receive a
sealed copy from the browser's re-seal loop without manual action.
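The atomic claim described above can be sketched as a compare-and-set on the topic row. This is an in-memory stand-in for what would be a conditional UPDATE (WHERE encrypted_key_pubkey IS NULL) in the database; TopicRow, ClaimResult, and claimTopicKeyRow are hypothetical names, and the success shape is an assumption since only the 409 path is specified here.

```typescript
interface TopicRow { name: string; encryptedKeyPubkey: string | null }

type ClaimResult =
  | { claimed: true; senderPubkey: string }
  | { claimed: false; status: 409; winnerPubkey: string };

// Succeeds only while no key pubkey is set; losers learn who won so they
// can fall through to the regular fetch path.
function claimTopicKeyRow(topic: TopicRow, senderPubkey: string): ClaimResult {
  if (topic.encryptedKeyPubkey !== null) {
    return { claimed: false, status: 409, winnerPubkey: topic.encryptedKeyPubkey };
  }
  topic.encryptedKeyPubkey = senderPubkey;
  return { claimed: true, senderPubkey };
}
```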
Adds a 409 not_web_member guard to POST /v1/me/peer-pubkey: the
endpoint will only rewrite peer_pubkey on members that have
dashboard_user_id set. CLI members own their on-disk keypair —
overwriting their stored peer_pubkey would break the next WS hello
because the signature verification would fail against the new
pubkey.
In practice this restriction is invisible to the legitimate browser
flow: the dashboard always mints its apikey against the web member
(dashboard_user_id is non-null by construction in mutations.ts).
Guard ensures a misuse (e.g. a CLI-minted apikey being used to call
peer-pubkey) gets a clear 409 instead of silently breaking the CLI's
auth.
Discovered during phase 3.5 smoke when a CLI-minted apikey clobbered
the only openclaw member (CLI-owned) and the user's CLI signature
would have stopped verifying on the next launch.
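A minimal sketch of the guard, assuming a simplified member row. MemberRow, GuardResult, and updatePeerPubkey are illustrative names; the 409 not_web_member outcome is the part taken from this commit.

```typescript
interface MemberRow { id: string; dashboardUserId: string | null; peerPubkey: string }

type GuardResult = { ok: true } | { ok: false; status: 409; error: "not_web_member" };

// Only web members (dashboardUserId set) may have peer_pubkey rewritten;
// CLI members keep the pubkey that matches their on-disk keypair.
function updatePeerPubkey(member: MemberRow, newPubkey: string): GuardResult {
  if (member.dashboardUserId === null) {
    return { ok: false, status: 409, error: "not_web_member" };
  }
  member.peerPubkey = newPubkey;
  return { ok: true };
}
```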
Closes the v1-vs-v2 split between CLI and dashboard. The web chat
panel now reads and writes the same crypto_secretbox-under-topic-key
ciphertext that CLI 1.8.0+ writes — every encrypted topic finally
renders correctly from the browser.
API
- POST /v1/me/peer-pubkey replaces the throwaway pubkey that
mutations.ts mints at mesh-create time with one whose secret the
browser actually holds. Idempotent; auth via the dashboard apikey
whose issuedByMemberId is the row to update.
Web
- apps/web/src/services/crypto/identity.ts — IndexedDB-backed
ed25519 identity, lazy-init on first use. Generates once per
browser-profile; survives reload. ed25519 → x25519 derivation for
crypto_box decrypt. Module-cached after first call.
- apps/web/src/services/crypto/topic-key.ts — mirrors the CLI
topic-key service. Fetches GET /v1/topics/:name/key, decrypts the
sealed copy with our x25519 secret, caches the 32-byte symmetric
key in-memory keyed by (apikey-prefix, topic). encryptMessage /
decryptMessage map directly onto crypto_secretbox{,_open}.
- apps/web/src/modules/mesh/topic-chat-panel.tsx — on mount:
registers our pubkey, fetches the topic key, polls /key every 5s
while not_sealed (CLI holders re-seal on a 30s cadence). Render
branches on bodyVersion: v2 -> decrypted-cache, v1 -> legacy
base64. Send branches: encrypts under the topic key when key is
ready, falls back to v1 plaintext on legacy or not-yet-sealed
topics. Composer shows a 🔒 v0.3.0 / "waiting for re-seal" badge.
Adds libsodium-wrappers + @types to apps/web. Browser bundle picks
up its own copy; the existing CLI/broker/API copies are untouched.
Threat model: IndexedDB is per-origin and not exfiltratable from
other sites; XSS or a malicious extension still wins, same as for
any browser-stored secret. Documented divergence from the CLI's
~/.claudemesh-stored keypair in the identity module's preamble.
Adds Windows Terminal (wt.exe new-tab + split-pane), PowerShell
Start-Process, cmd.exe start, and WSL routing examples to the
"Spawning new sessions" section. Plus the platform's gotchas:
single-quote nesting in cmd.exe, -NoExit semantics, WSL ~/.claudemesh
path-vs-host divergence, and pwsh / --profile selectors for Windows
Terminal. Bumps CLI to 1.9.5.
Adds a "Spawning new sessions (no wizard)" section to the bundled
claudemesh skill. Documents every flag of `claudemesh launch`
(--name, --mesh, --join, --groups, --role, --message-mode,
--system-prompt, --resume, --continue, -y, -q, plus -- pass-through),
shows wizard-free spawn templates from minimal to cold-start-with-
join, and the canonical pane-creation primitives (tmux send-keys,
iTerm2 osascript, Terminal.app, gnome-terminal, screen) that wrap
the verb when spawning into a fresh terminal pane or window.
Closes the gap where Claude knew the verb existed but had no
playbook for "how do I start another peer in a new pane without an
interactive prompt firing." Bumps CLI to 1.9.4 so the skill ships
on `claudemesh install`.
Adds apps/cli/src/cli/validators.ts — a small module of shape
validators (pubkey, pubkey prefix, message id, mesh slug) that return
discriminated results so callers can distinguish "shape is wrong"
(INVALID_ARGS exit) from "value is well-shaped, lookup failed"
(NOT_FOUND exit). Includes renderValidationError() for a consistent
three-tier error contract: what's wrong, what would be valid, closest
valid alternative.
First adopter is `claudemesh msg-status`:
- Validates id locally before opening WS — typos return immediately.
- Accepts 8-32 char prefixes (full ids are 32). Pastes that get
copy-truncated by the terminal still work.
- Distinct error messages for malformed input vs not-in-queue vs
ambiguous prefix; --json emits the structured shape.
Broker side: WS message_status handler validates idStr is 8-32
base62 before querying. Prefix lookups use LIKE 'prefix%' scoped to
the caller's mesh (no cross-mesh leak). Returns ambiguous_prefix
when more than one match.
Establishes the canonical pattern; rolling out to send / grant /
revoke / topic post --reply-to in subsequent patches.
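A sketch of the two-stage pattern: a shape check that returns a discriminated result, followed by a prefix lookup that distinguishes not-found from ambiguous. The type and function names below are assumptions and do not mirror validators.ts; the 8-32 base62 rule and the ambiguous_prefix status come from this commit.

```typescript
type Validation =
  | { ok: true; value: string }
  | { ok: false; reason: string; expected: string };

const MESSAGE_ID_SHAPE = /^[0-9A-Za-z]{8,32}$/; // 8-32 base62 characters

// Stage 1: shape only. A failure here maps to INVALID_ARGS.
function validateMessageId(input: string): Validation {
  if (MESSAGE_ID_SHAPE.test(input)) return { ok: true, value: input };
  return { ok: false, reason: `"${input}" is not a message id`, expected: "8-32 base62 characters" };
}

type Lookup =
  | { status: "found"; id: string }
  | { status: "not_found" }
  | { status: "ambiguous_prefix"; matches: string[] };

// Stage 2: lookup among ids already scoped to the caller's mesh.
function resolvePrefix(prefix: string, idsInMesh: string[]): Lookup {
  const matches = idsInMesh.filter((id) => id.startsWith(prefix));
  const [first, ...rest] = matches;
  if (first === undefined) return { status: "not_found" };
  if (rest.length > 0) return { status: "ambiguous_prefix", matches };
  return { status: "found", id: first };
}
```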
Help text was a wall of monochrome ASCII. Now section headers print
bold-clay, the program title is brand-orange, each verb's syntax is
tinted cyan, and `(alias: ...)` parentheticals are dimmed so they
read as secondary metadata. The styles helper already gates on TTY +
NO_COLOR, so non-interactive output stays unchanged.
Adds .artifacts/specs/2026-05-02-workspace-view.md — the v0.4.0
spec for a per-user virtual workspace that aggregates reads across
all joined meshes while keeping writes mesh-scoped. Roadmap entry
added under v0.3.0.
Previously POST /v1/messages returned the message_queue row id as
`messageId`. Topic posts ARE durable (in topic_message); the queue
entry drains on delivery. Pasting that id into `--reply-to` failed
because the broker validates parents against topic_message, not the
queue. Now `messageId` aliases `historyId` for topic posts; both
`historyId` and `queueId` remain available as explicit fields.
Roadmap and CLI README updated with v0.3.1 reply-to + v0.3.2
multi-session entries.
Two related bugs surfaced in multi-session production use of 1.8.0:
1. Replies via `claudemesh send <from_id>` rejected with "no connected
peer for target" when the original sender's session had rotated
(Claude Code restart, /resume). Root cause: from_id carried the
ephemeral session pubkey, which disappears the moment the session
ends. Fix: handleSend pre-flight now also resolves the target
pubkey against the persistent meshMember table and routes to the
owning member's live session(s); MCP push channel now sets from_id
to the stable member pubkey and exposes the ephemeral one under
from_session_pubkey.
2. Broadcast/* and @group sends looped back to the sender's *sibling*
sessions (same member, different session keypair), surfacing a
spurious "tampered or wrong keypair" decrypt warning on the
sender's own inboxes. Fix: broadcast/group fan-out now skips by
memberPubkey, not just by presence_id, so the entire sender member
is excluded — direct sends keep per-presence skip so a member can
still DM their own sibling session intentionally.
Push envelope now also carries senderMemberPubkey alongside
senderPubkey so any other client of the WS channel can choose the
right one.
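The two skip rules from fix 2 can be sketched as filters over live sessions. Session, broadcastTargets, and directTargets are hypothetical names used only for this illustration.

```typescript
interface Session { presenceId: string; memberPubkey: string }

// Broadcast and @group fan-out: exclude every session of the sending member,
// so sibling sessions never receive their own member's broadcast.
function broadcastTargets(sessions: Session[], sender: Session): Session[] {
  return sessions.filter((s) => s.memberPubkey !== sender.memberPubkey);
}

// Direct sends: exclude only the sending session itself, so a member can
// still DM a sibling session on purpose.
function directTargets(sessions: Session[], sender: Session, targetMemberPubkey: string): Session[] {
  return sessions.filter(
    (s) => s.memberPubkey === targetMemberPubkey && s.presenceId !== sender.presenceId,
  );
}
```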
Adds a reply_to_id column (self-FK on topic_message) plus end-to-end
plumbing so a message can mark itself as a reply to a previous one in
the same topic.
- Schema: 0027_topic_message_reply_to.sql adds reply_to_id with
ON DELETE SET NULL + index for backlink lookup.
- Broker: appendTopicMessage validates parent shares the topic, writes
reply_to_id; topicHistory + topic_history_response surface it; WS
push envelope now carries senderMemberId, senderName, topic name,
reply_to_id, and message_id so recipients have everything they need
to reply without a follow-up query.
- REST: POST /v1/messages accepts replyToId (validated server-side);
GET /messages and SSE /stream emit it per row.
- CLI: `topic post --reply-to <id|prefix>` resolves prefixes against
recent history; `topic tail` renders an "↳ in reply to <name>:
<snippet>" line above replies and shows a copyable #shortid tag on
every row.
- MCP push pipe: channel attributes now include from_pubkey,
from_member_id, message_id, topic, reply_to_id — the recipient can
thread a reply directly from the inbound notification.
- Skill + identity prompt updated to teach Claude how to use the new
attributes for replies.
Bumped CLI to 1.9.0.
await import('libsodium-wrappers') returns the namespace object in
bun, not the sodium API. randombytes_buf et al. live on .default.
Without this, every topic_create on the deployed broker errored
with 'sodium.randombytes_buf is not a function' and the WS handler
silently dropped the frame, so the CLI saw a 5s timeout.
Confirmed via broker docker logs:
warn ws message error: sodium.randombytes_buf is not a function
Same destructure pattern as crypto.ts (which uses the synchronous
default import).
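The unwrap can be sketched without pulling in libsodium itself. unwrapDefault and SodiumLike are illustrative names, and the fake object in the usage lines stands in for the real module.

```typescript
interface SodiumLike { randombytes_buf(length: number): Uint8Array }

// A dynamic import may hand back the module namespace with the actual API
// under `.default`; this returns the API object in either case.
function unwrapDefault<T extends object>(ns: T | { default: T }): T {
  return "default" in ns ? (ns as { default: T }).default : (ns as T);
}

// Usage with a stand-in object shaped like the namespace the broker saw.
const fakeSodium: SodiumLike = { randombytes_buf: (n) => new Uint8Array(n) };
const sodium = unwrapDefault<SodiumLike>({ default: fakeSodium });
const topicKeyBytes = sodium.randombytes_buf(32);
```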
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds v1.7.0 (terminal parity) and v1.8.0 (per-topic encryption)
verbs to the bundled claudemesh skill so Claude Code sessions
discover them via the auto-installed SKILL.md instead of the
README-only path.
Sections added:
- topic tail / topic post under the topic block
- member resource (distinct from peer)
- notification resource
- per-topic encryption block — explains v2 ciphertext marker,
re-seal flow, and 404 behaviour
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A bad ed25519 pubkey on the creator member (legacy data) made
sealTopicKeyForMember throw, which propagated up through createTopic
and made the WS topic_create handler never send a topic_created
frame. CLI saw a 5s timeout and printed 'topic create failed'.
Wraps the seal call in try/catch — topic creation succeeds even if
no copy gets sealed for the creator. They'll see GET /v1/topics/:name/key
return 404 until they re-seal (or a holder does it for them via
the phase-3 background loop).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLI v1.8.0 on npm. Web stays on v1 plaintext pending the IndexedDB
identity work tracked as phase 3.5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wire format:
topic_member_key.encrypted_key = base64(
<32-byte sender x25519 pubkey> || crypto_box(topic_key)
)
Embedding sender pubkey inline lets re-sealed copies (carrying a
different sender than the original creator-seal) decode the same
way as creator copies, without an extra schema column or join.
topic.encrypted_key_pubkey stays for backwards-compat metadata
but the wire truth is the inline prefix.
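A sketch of packing and unpacking the inline-sender layout, operating on the bytes after base64 decoding. joinSealedCopy and splitSealedCopy are hypothetical helper names; the 32-byte prefix is the part taken from the wire format above.

```typescript
const SENDER_PUBKEY_BYTES = 32;

interface SealedCopy { senderPubkey: Uint8Array; boxed: Uint8Array }

// Producer side: <32-byte sender x25519 pubkey> || crypto_box output.
function joinSealedCopy(senderPubkey: Uint8Array, boxed: Uint8Array): Uint8Array {
  if (senderPubkey.length !== SENDER_PUBKEY_BYTES) throw new Error("sender pubkey must be 32 bytes");
  const out = new Uint8Array(SENDER_PUBKEY_BYTES + boxed.length);
  out.set(senderPubkey, 0);
  out.set(boxed, SENDER_PUBKEY_BYTES);
  return out;
}

// Consumer side: `decoded` is the base64-decoded encrypted_key value.
function splitSealedCopy(decoded: Uint8Array): SealedCopy {
  if (decoded.length <= SENDER_PUBKEY_BYTES) throw new Error("sealed copy too short");
  return {
    senderPubkey: decoded.subarray(0, SENDER_PUBKEY_BYTES),
    boxed: decoded.subarray(SENDER_PUBKEY_BYTES),
  };
}
```

Because the sender travels with the ciphertext, a creator-sealed copy and a re-sealed copy decode through the same split.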
API (phase 3):
GET /v1/topics/:name/pending-seals list members without keys
POST /v1/topics/:name/seal submit a re-sealed copy
POST /v1/messages now accepts bodyVersion (1|2); v2 skips the
regex mention extraction (server can't read v2 ciphertext).
GET /messages + /stream now return bodyVersion per row.
Broker + web mutations updated to use the inline-sender format
when sealing. ensureGeneralTopic (web) also generates topic keys
per the bugfix that landed earlier today; both producers now
share one wire format.
CLI (claudemesh-cli@1.8.0):
+ apps/cli/src/services/crypto/topic-key.ts — fetch/decrypt/encrypt/seal
+ claudemesh topic post <name> <msg> — encrypted REST send (v2)
* claudemesh topic tail <name> — decrypts v2 on render, runs a
30s background re-seal loop for pending joiners
Web client stays on v1 plaintext until phase 3.5 (browser-side
persistent identity in IndexedDB). Mention fan-out from phase 1
already works for both versions, so /v1/notifications keeps
working through the cutover.
Spec at .artifacts/specs/2026-05-02-topic-key-onboarding.md
updated with the implemented inline-sender format and the
phase 3.5 web plan.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WSSendMessage gains an optional mentions field; the broker forwards
it into appendTopicMessage so WS-driven topic sends get the same
write-time fan-out path as REST POST /v1/messages. v1 messages
(today's plaintext-base64) still fall back to a body regex when the
field is omitted, so existing CLIs aren't broken; v2 ciphertext
clients in phase 3 will populate it.
Also drops the duplicate meshMember import (kept the meshMember-as-
memberTable alias which the rest of the file uses).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The web mesh-creation path went straight through db.insert(meshTopic)
and bypassed the broker's createTopic, so the v0.3.0 phase-2 key
generation never ran for #general topics created via the dashboard.
Result: GET /v1/topics/general/key returned 409 topic_unencrypted
on every web-created mesh.
Mirrors the broker's createTopic flow inline: generate a 32-byte
topic key + ephemeral x25519 sender keypair, persist the public
half on topic.encrypted_key_pubkey, seal a copy for the oldest
non-revoked member (the owner — owner-as-member rows are minted
at mesh creation per a prior fix), and let the topicKey leave
memory.
Existing meshes with already-created (and unencrypted) #general
topics aren't backfilled; they stay v0.2.0 plaintext until the
phase 3 client encrypt path lands. New meshes get encrypted
topics from this commit forward.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 (notification table) and phase 2 (schema + creator seal)
shipped today. Phase 3 (member-driven re-seal + client-side
encrypt/decrypt) is the cut that actually flips the broker to
ciphertext-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 (infra layer) of v0.3.0. Topics now generate a 32-byte
XSalsa20-Poly1305 key on creation; the broker seals one copy via
crypto_box for the topic creator using an ephemeral x25519
sender keypair (whose public half lives on
topic.encrypted_key_pubkey). Topic key plaintext leaves memory
immediately after the creator's seal — the broker can't read it.
Schema 0026:
+ topic.encrypted_key_pubkey (text, nullable for legacy v0.2.0)
+ topic_message.body_version (integer, 1=plaintext / 2=v2 cipher)
+ topic_member_key (id, topic_id, member_id,
encrypted_key, nonce, rotated_at)
API:
+ GET /v1/topics/:name/key — return the calling member's sealed
copy. 404 if no copy exists yet (joined post-creation, no peer
has re-sealed). 409 if the topic is legacy unencrypted.
Open question parked: how new joiners get their sealed copy
without ceding plaintext to the broker. Spec at
.artifacts/specs/2026-05-02-topic-key-onboarding.md picks
member-driven re-seal (Option B). Pending-seals endpoint, seal
POST, and the actual on-the-wire encryption ship in phase 3.
Mention fan-out from phase 1 (notification table) is decoupled
from ciphertext, so /v1/notifications + MentionsSection keep
working unchanged through both phases.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 of v0.3.0 — replaces the regex-on-decoded-ciphertext scan
in /v1/notifications and the dashboard MentionsSection with reads
from a new mesh.notification table populated at write time.
Schema 0025: mesh.notification (id, mesh_id, topic_id, message_id,
recipient_member_id, sender_member_id, kind, created_at, read_at)
with a unique (message_id, recipient) so a re-fanned message yields
one row per recipient. Backfills existing v0.2.0 messages by
regex-matching the (still-base64-plaintext) bodies — guarded with
a base64 + length check so binary ciphertext doesn't crash the
migration.
Writers (POST /v1/messages + broker appendTopicMessage) now
extract @-mentions from either an explicit `mentions: string[]`
on the request OR a regex over the base64 plaintext (transitional
fallback). Targets are intersected with the mesh roster + capped
at 32 per message. Web chat panel sends the explicit array now so
it keeps working after phase 2 lands.
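A minimal sketch of the writer-side resolution, working on already-decoded plaintext. The MENTION_RE token shape and the resolveMentions name are assumptions; the explicit-array-or-regex choice, the roster intersection, and the cap of 32 come from the text above.

```typescript
const MAX_MENTIONS = 32;
const MENTION_RE = /@([A-Za-z0-9_-]+)/g; // assumed token shape

// Prefers the explicit `mentions` array; falls back to a regex over the
// plaintext. Targets outside the mesh roster are dropped, duplicates removed.
function resolveMentions(plaintext: string, explicit: string[] | undefined, roster: string[]): string[] {
  const candidates = explicit ?? Array.from(plaintext.matchAll(MENTION_RE), (m) => m[1] ?? "");
  const known = new Set(roster);
  const unique = Array.from(new Set(candidates)).filter((name) => known.has(name));
  return unique.slice(0, MAX_MENTIONS);
}
```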
Readers switch to JOIN-on-notification:
/v1/notifications — table-backed, supports ?unread=1
POST /v1/notifications/read — new, mark by ids or all-up-to
MentionsSection (RSC) — same JOIN, returns readAt for each row
GET /v1/notifications also gains a read_at field per row so a
future bell UI can show unread vs read.
Once per-topic encryption (phase 2) lands, the regex fallback
becomes a no-op for v2 messages — clients MUST send `mentions`,
which they already do.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the terminal verbs (topic tail / member list / notification
list) explicitly to v1.7.0 so the demo cut summary matches what's
on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three new verbs that wrap the v1.6.x REST surface:
claudemesh topic tail <name> → live SSE consumer with N-message backfill
claudemesh member list → mesh roster decorated with online state
claudemesh notification list → recent @-mentions of you across topics
Each command auto-mints a 5-minute read-only apikey via the WS
broker and revokes on exit, so users don't manage tokens. SSE
client uses fetch + ReadableStream so the bearer stays in the
Authorization header.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Companion to the morning handoff. Captures the 12 commits shipped
this evening, live deployment status, the CLI/UI surface gap, three
known risks (chiefly: mentions query depends on plaintext-base64
ciphertext + crashes on non-UTF8 bytes), and three branches for
the next session ranked by leverage: record the demo, wire CLI
verbs to the new endpoints, then v0.3.0 per-topic encryption.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Old next-block listed dashboard (shipped), slack bridge (still
v0.3.0), self-host (v0.3.0), SSO (out of scope). Replaces with
the actual roadmap horizon: daemon redesign, per-topic crypto,
self-host packaging, federation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Blog post "Agents and humans in the same chat" walks through what
shipped in the v1.7.0 demo cut: topics, REST gateway, real-time
SSE, mentions, notification feed, humans-as-peers. Linked from
the blog index above the original protocol post.
Demo script lays out a five-scene 90-second screen capture: two
terminal agents talking, dashboard topic list, live chat with
@-mention autocomplete, mentions feed cross-platform, close.
Production notes + distribution checklist included.
Marketing screenshots and the actual recording are still TODO.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Recently-active apikey holders (used in the last 5 minutes) appear
in the peer list alongside WS-connected sessions. The dashboard
chat user now becomes visible to CLI peers calling list_peers,
closing the v1.6.0 humans-as-peers loop.
Presence rows take precedence when both exist; rest-only rows
get a via:"rest" flag and idle status (no presence channel to
infer working/dnd from).
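A sketch of the merge, assuming both sources can be keyed by member id (that keying, and the type names below, are assumptions rather than the broker's actual code).

```typescript
type PeerStatus = "idle" | "working" | "dnd";
interface PresencePeer { memberId: string; status: PeerStatus }
interface ApiKeyUse { memberId: string; lastUsedAt: Date }
interface ListedPeer { memberId: string; status: PeerStatus; via?: "rest" }

const REST_ACTIVE_MS = 5 * 60 * 1000; // "used in the last 5 minutes"

function listPeers(presence: PresencePeer[], keys: ApiKeyUse[], now: Date): ListedPeer[] {
  const peers: ListedPeer[] = presence.map((p) => ({ memberId: p.memberId, status: p.status }));
  const seen = new Set(presence.map((p) => p.memberId));
  for (const key of keys) {
    const recent = now.getTime() - key.lastUsedAt.getTime() <= REST_ACTIVE_MS;
    // Presence rows take precedence; stale keys are not listed at all.
    if (!recent || seen.has(key.memberId)) continue;
    seen.add(key.memberId);
    peers.push({ memberId: key.memberId, status: "idle", via: "rest" });
  }
  return peers;
}
```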
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates v1.6.x and v1.7.0 sections with concrete endpoints + client
behaviour for what landed this session. Bridge smoke test and
/v1/peers humans remain open under v1.6.x.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Universe dashboard gets a "Recent mentions" section listing every
topic_message from the last 7 days that references the viewer via
`@<displayName>` (per-mesh — a user can carry different display
names in different meshes). One union'd OR query, capped at 20.
Each mention card links straight into the topic chat at the right
mesh. Snippet is the first 240 chars of the decoded ciphertext with
@-tokens highlighted in clay, matching the in-chat renderer.
GET /v1/notifications mirrors the same scan for api-key-authed
clients (CLI, bots) — accepts ?since=<ISO> for incremental polling.
Both paths use Postgres regex on the decoded base64 plaintext;
when per-topic encryption lands in v0.3.0 they'll move to a
notification table populated at write time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A "search" toggle in the chat header opens a small input that
client-filters loaded messages by plaintext match on body or
sender name. Live tail auto-scroll suspends while a query is
active so matches stay visible when new messages arrive.
Server-side fulltext search lands when ciphertext moves to
per-topic symmetric keys in v0.3.0 — until then there's no
server index to query, and the loaded window (last 100 plus
forward stream) covers most "find that thing from earlier"
needs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Typing `@` in the compose box opens a dropdown of matching mesh
members fed by /v1/members. Filters live by displayName prefix
(case-insensitive); online members rank above offline; shorter
names rank higher; capped at 8 entries.
Keyboard: ArrowUp/Down to navigate, Enter or Tab to insert,
Escape to dismiss. Mouse hover updates the selection; mousedown
inserts (mousedown so the textarea doesn't lose focus first).
Rendered messages now highlight @mentions in clay so they're
visually distinct from plain text — same regex the autocomplete
uses, so the round trip is consistent.
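The ranking rules can be sketched as a filter plus a two-key sort. Member and suggestMembers are illustrative names, not the component's code.

```typescript
interface Member { displayName: string; online: boolean }

const MAX_SUGGESTIONS = 8;

// Case-insensitive prefix filter; online before offline, then shorter
// names first; capped at 8 entries.
function suggestMembers(members: Member[], typed: string): Member[] {
  const prefix = typed.toLowerCase();
  return members
    .filter((m) => m.displayName.toLowerCase().startsWith(prefix))
    .sort(
      (a, b) =>
        Number(b.online) - Number(a.online) ||
        a.displayName.length - b.displayName.length,
    )
    .slice(0, MAX_SUGGESTIONS);
}
```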
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A 4xx from GET /v1/.../stream (revoked api key, missing topic)
used to throw inside the catch and bounce through the backoff loop
forever. Now any 4xx response terminates the loop and surfaces the
status + body in the panel error so the user sees the real cause.
5xx and network errors still reconnect.
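The retry decision can be sketched as a small pure function. StreamOutcome, shouldReconnect, and backoffDelayMs are hypothetical names, and the 1s base and 30s cap are assumed values, not the panel's actual settings.

```typescript
type StreamOutcome = { kind: "http"; status: number } | { kind: "network" };

// 4xx is a permanent condition (revoked key, missing topic): stop and show it.
// 5xx and network failures are treated as transient and retried.
function shouldReconnect(outcome: StreamOutcome): boolean {
  if (outcome.kind === "network") return true;
  return !(outcome.status >= 400 && outcome.status < 500);
}

// Exponential backoff with a ceiling; attempt 0 is the first retry.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```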
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GET /v1/members lists every non-revoked member of the api key's
mesh, decorated with online state from presence rows. Distinct from
/v1/peers (active sessions) — sidebars want roster + live dot, not
just whoever is currently connected.
Chat panel splits into a 2-column layout (>=lg) with a 180px
sidebar that polls the roster every 20s. Online members go up top
with status-coloured dots (idle=green, working=clay, dnd=fig);
offline members fade below at 50% opacity. Bots get a "bot" tag.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Universe page aggregates unread topic_message rows per mesh for the
viewing user. Counts messages newer than topic_member.last_read_at
(or all messages if the viewer never opened the topic) and excludes
anything the viewer authored. One JOIN-grouped query, not N+1.
Mesh card surfaces the count as a clay-rounded badge to the left of
the role chip — matches the per-topic badge style on the mesh detail
page so unread is the same visual idiom across the dashboard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PATCH /v1/topics/:name/read upserts topic_member.last_read_at for the
api key's issuing member. The chat panel calls it on mount and on
every inbound SSE message (5s debounce so we don't hammer it).
GET /v1/topics now returns unread per topic — counts messages newer
than last_read_at and not authored by the viewer. Mesh detail page
shows a clay-rounded badge next to each topic name with the count
(99+ ceiling).
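The unread rule and the badge ceiling, sketched in memory (the endpoint computes the count in sql; TopicMessage, countUnread, and badgeLabel are illustrative names):

```typescript
interface TopicMessage { senderMemberId: string; createdAt: Date }

// Counts messages newer than lastReadAt that the viewer did not author.
// A null lastReadAt means the topic was never opened, so everything counts.
function countUnread(messages: TopicMessage[], viewerId: string, lastReadAt: Date | null): number {
  return messages.filter(
    (m) =>
      m.senderMemberId !== viewerId &&
      (lastReadAt === null || m.createdAt.getTime() > lastReadAt.getTime()),
  ).length;
}

// Badge text with the 99+ ceiling.
function badgeLabel(unread: number): string {
  return unread > 99 ? "99+" : String(unread);
}
```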
AuthedApiKey gains issuedByMemberId so endpoints can attribute
side-effects to the minting member. Required because external api
keys aren't tied to a specific peer member; only dashboard- and
CLI-minted keys carry one.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GET /v1/topics/:name/stream opens an SSE firehose, polled server-side
every 2s and streamed as `message` events. Forward-only — clients
hit /messages once for backfill, then live from connect-time onward.
Heartbeats every 30s keep the connection through proxies.
Web chat panel reads the stream via fetch + ReadableStream so the
bearer token stays in the Authorization header (EventSource can't
set custom headers, which would force token-in-URL leaks). Auto-
reconnect with exponential backoff. setInterval polling removed.
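Reading SSE over fetch + ReadableStream means the client splits frames itself. A simplified parser sketch follows; it assumes LF line endings and treats comment lines as heartbeats, which is one common heartbeat form and not confirmed for this endpoint. parseSse and SseEvent are made-up names.

```typescript
interface SseEvent { event: string; data: string }

// Splits buffered text on blank lines. Returns the complete events plus the
// unconsumed tail, which the caller prepends to the next decoded chunk.
function parseSse(buffer: string): { events: SseEvent[]; rest: string } {
  const frames = buffer.split("\n\n");
  const rest = frames.pop() ?? "";
  const events: SseEvent[] = [];
  for (const frame of frames) {
    let event = "message"; // default event name when no event: field is present
    const data: string[] = [];
    for (const line of frame.split("\n")) {
      if (line.startsWith(":")) continue; // comment line, skipped
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trimStart());
    }
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return { events, rest };
}
```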
Vercel maxDuration bumped to 300s on the catch-all API route so
streams aren't cut at the 10s default.
drizzle migrations/meta/ deleted — superseded by the filename-
tracked custom runner in apps/broker/src/migrate.ts (c2cd67a).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Patch release on top of 1.6.0:
- Revoke-by-id-prefix bug fix (broker.revokeApiKey now returns
structured status; CLI surfaces not_found / not_unique). Pasting
the 8-char prefix from `apikey list` output now works as users
expect, instead of silently no-op'ing with a misleading "✔
revoked" message. Already deployed to broker.
- whoami falls back to local mesh-config view when no web session
is signed in. Users who joined via invite (and never ran
`claudemesh login`) now see their member ids and pubkey prefixes
per mesh, instead of a "Not signed in" dead end.
- README updated: REST surface lives at claudemesh.com/api/v1/*
(web app), NOT ic.claudemesh.com/api/v1/* (broker). Surfaced
during CLI-only smoke test against prod when curl on the broker
host returned 404.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claudemesh apikey revoke <id> reported success even when the input
didn't match any row in mesh.api_key. The CLI's `apikey list` shows
truncated 8-char prefixes; users naturally paste those; broker did
exact-id match against meshApiKey.id; UPDATE affected 0 rows; old
revokeApiKey returned void so the CLI couldn't tell. Discovered via
end-to-end CLI smoke test against prod (roadmap validation pass).
Three-part fix:
- broker.revokeApiKey now returns
{ status: "revoked"|"not_found"|"not_unique"; id?, matches? } and
accepts either the full id or a unique prefix (>=6 chars). Prefix
matching is bounded to the caller's mesh and only succeeds if
exactly one row matches; ambiguous prefixes return not_unique so
we never silently revoke the wrong key.
- New WSApiKeyRevokeResponseMessage carries the structured status
back to the CLI. Old apikey_revoke_ok type removed before being
released — never shipped to users. The error path is no longer
used for not_found/not_unique cases; the unified response carries
both outcomes.
- CLI's apiKeyRevoke now resolves with { ok, id } | { ok: false,
code, message }. runApiKeyRevoke surfaces the code/message and
exits non-zero on failure (NOT_FOUND for missing, INVALID_ARGS
for ambiguous prefix).
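The resolution rule can be sketched as a pure function over the ids already scoped to the caller's mesh. This is an illustration, not the broker code: the function name is hypothetical, `matches` is assumed to be a count, and returning not_found for a too-short prefix is an assumption.

```typescript
type RevokeResult =
  | { status: "revoked"; id: string }
  | { status: "not_found" }
  | { status: "not_unique"; matches: number };

const MIN_PREFIX_LEN = 6;

// meshKeyIds must already be limited to the caller's mesh.
function resolveRevokeTarget(meshKeyIds: string[], input: string): RevokeResult {
  // A full id always wins, regardless of length.
  const exact = meshKeyIds.find((id) => id === input);
  if (exact !== undefined) return { status: "revoked", id: exact };
  // Assumption: prefixes shorter than the minimum are treated as not found.
  if (input.length < MIN_PREFIX_LEN) return { status: "not_found" };
  const matches = meshKeyIds.filter((id) => id.startsWith(input));
  const [first, ...others] = matches;
  if (first === undefined) return { status: "not_found" };
  // Never guess between several keys.
  if (others.length > 0) return { status: "not_unique", matches: matches.length };
  return { status: "revoked", id: first };
}
```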
Net effect: pasting `claudemesh apikey revoke vq0fwjdX` now actually
revokes the key whose id starts with vq0fwjdX (or fails loudly if 0
or >1 keys match). Verified against prod via the new branch's CLI
binary before commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Earlier wording claimed --dangerously-load-development-channels "goes
away" at v3.0.0. That overstated what we know. Some opt-in mechanism
is always required for Claude Code to accept external runtime events
from a third-party process — that's a security invariant, not a quirk
of today's flag.
What changes at v3.0.0 is the FORM of the opt-in (stable settings
entry, native transport subscription, etc.), not its existence. The
"dangerously" / "experimental" / "development" framing is what
disappears, because the underlying API graduates from experimental
to stable. The flag itself, or its successor, lives on as a normal
config entry that claudemesh install writes once.
Public roadmap and internal spec both updated to reflect this.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Public docs/roadmap.md gets the v1.6.0 cut moved to shipped, drops the
v0.2.0-as-next section in favor of a v1.6.x patch line + v1.7.0 demo
cut + v2.0.0 daemon redesign + v3.0.0 native-channels migration target.
Items that were in v0.2.0-next migrate down: gateways and tag routing
land in v0.3.0 alongside per-topic encryption and self-hosted broker.
The detailed strategic version lives at
.artifacts/specs/2026-05-02-roadmap.md — schedule, cost estimates,
migration paths, deliberate exclusions, the load-bearing principle for
the daemon shift ("the user is the unit, not the Claude session").
The public file stays marketing-tone; the artifact captures internal
planning.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Web-first owners had no mesh.member row because the broker only ever
created one on first WS hello (CLI flow). The topic chat page server
component requires that row to issue a dashboard apikey
(issuedByMemberId is a FK to mesh.member), so visiting the chat for a
web-only mesh hit notFound() on the owner's own room.
Forward fix: createMyMesh now generates a fresh ed25519 peer keypair,
inserts a mesh.member row with role=admin and dashboardUserId=userId,
and subscribes the owner to the auto-created #general topic as 'lead'.
The peer secret key is intentionally discarded — web users don't sign
anything in v0.2.0 (no DMs, base64 plaintext on topics). If the same
user later runs the CLI, the broker mints a separate member row from
its own keypair; both work for their respective surfaces.
Backfill: apps/broker/scripts/backfill-owner-members.ts walks every
non-archived mesh whose owner has no member row, generates real
ed25519 keypairs via libsodium, inserts the rows in a transaction,
and subscribes each as 'lead' on #general. Already run against prod
— 13 owner rows minted, ddtest verified end-to-end via playwriter
(send → poll → render round-trip ok).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that the filename-tracked runner is in place and prod is bootstrapped,
BROKER_SKIP_MIGRATE=1 is no longer needed. Removed from Coolify env;
the comment is updated to reflect that the flag is a break-glass for
ops, not the steady-state.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
drizzle's _journal.json drifted to idx=11 while the file system had 25
.sql files; the prod drizzle.__drizzle_migrations table was further
behind with 3 rows. The runtime migrator silently skipped anything
outside the journal, so every new schema change required psql -f by
hand.
The new runner tracks applied files in mesh.__cmh_migrations
(filename PK + sha256 + applied_at). On startup it bootstraps the
tracking table inline, lists migrations/*.sql lexicographically,
filters out already-applied files, and runs the rest in transaction
order under the existing pg_advisory_lock. SHA mismatches on
already-applied files emit a warning but don't fail (cosmetic edits
are common); production drift detection lives elsewhere.
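The selection step of the runner can be sketched as follows. This covers only the planning logic (which files to run, which to warn about); the types and function name are hypothetical, and hashing, locking, and SQL execution are left out.

```typescript
interface MigrationRecord {
  filename: string;
  sha256: string;
}

// Given the files on disk and the rows in the tracking table, returns the
// filenames still to run (lexicographic order) and non-fatal warnings.
function planMigrations(
  files: MigrationRecord[],
  applied: MigrationRecord[],
): { pending: string[]; warnings: string[] } {
  const appliedSha = new Map<string, string>();
  for (const a of applied) appliedSha.set(a.filename, a.sha256);

  const sorted = [...files].sort((a, b) =>
    a.filename < b.filename ? -1 : a.filename > b.filename ? 1 : 0,
  );
  const pending: string[] = [];
  const warnings: string[] = [];
  for (const f of sorted) {
    const recorded = appliedSha.get(f.filename);
    if (recorded === undefined) pending.push(f.filename);
    // An edited, already-applied file only warns; it is never re-run.
    else if (recorded !== f.sha256) warnings.push(`sha256 mismatch: ${f.filename}`);
  }
  return { pending, warnings };
}
```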
Bootstrap script at apps/broker/scripts/bootstrap-cmh-migrations.ts
computes file hashes and seeds the tracking table — already run
against prod with all 25 current files registered as applied. Future
deploys pick up only truly new migrations.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- mesh.topic.id has no PG-side default (drizzle $defaultFn is ORM-only)
- mesh.topic_member.role needs an explicit cast to the enum type
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The web chat surface needed a guaranteed landing room — a topic that
exists for every mesh from creation onward so the dashboard always has
somewhere to drop the user. #general is the convention; ephemeral DMs
remain ephemeral (mesh.message_queue) so agentic privacy is unchanged.
Three hooks plus a backfill:
- packages/api/src/modules/mesh/mutations.ts — createMyMesh now calls
ensureGeneralTopic() right after the mesh insert. New helper is
idempotent via the unique (mesh_id, name) index.
- apps/broker/src/index.ts — handleMeshCreate (CLI claudemesh new)
inserts #general + subscribes the owner member as 'lead' in the
same handler.
- apps/broker/src/crypto.ts — invite-claim flow auto-subscribes the
newly minted member to #general as 'member', defensively ensuring
the topic exists if the mesh predates this change.
- packages/db/migrations/0024_general_topic_backfill.sql — one-shot
backfill: creates #general for every active mesh that doesn't have
one, subscribes every active member, and marks the mesh owner as
'lead' based on owner_user_id == member.user_id. Idempotent.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two UX wins for the v0.2.0 chat surface:
- Mesh cards on /dashboard now show topic count alongside members and
tier ("3 MEMBERS · 2 TOPICS · FREE"). Active topics render in clay,
zero in tertiary. One aggregate query, not N+1.
- Mesh detail page replaces the CLI-hint empty state with an inline
CreateTopicForm. Non-empty topic lists get a compact "+ new topic"
pill in the section header. Server action validates name format
(lowercase letters/digits/dashes, 1-50 chars), inserts via the
unique (meshId, name) index, auto-subscribes the creator as topic
lead, then redirects into the chat.
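The name rule from the server action, written as a regular expression. The constant and function names are hypothetical; whether a leading or trailing dash is allowed is not stated above, so this sketch permits it.

```typescript
// Lowercase letters, digits, and dashes; 1 to 50 characters.
const TOPIC_NAME_PATTERN = /^[a-z0-9-]{1,50}$/;

function isValidTopicName(name: string): boolean {
  return TOPIC_NAME_PATTERN.test(name);
}
```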
Sidebar audit — kept platform/manage/dev structure as is. Topics are
mesh-scoped so a top-level "topics" entry would have nothing to land
on without a mesh chosen first. Discoverability lives on the mesh
cards instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Audit against peer-graph-panel, live-stream-panel, state-timeline-panel,
and resource-panel showed the chat used generic shadcn Card chrome
instead of the established panel pattern. Refactor swaps the wrapper
to the canonical idiom:
- rounded-[var(--cm-radius-lg)] + border-[var(--cm-border)] + bg-[var(--cm-bg)]
- mono header strip with clay-pulse fetch dot, 11px label, 10px metadata
- mono 9px footer status bar (mesh slug · poll cadence · key expiry)
- Anthropic Mono via var(--cm-font-mono) on chrome, sans on message body
- compose textarea uses cm-bg-elevated + cm-border-hover focus state
- error line in cm-fig (#c46686) instead of generic destructive
No behavior change — only chrome. Polling, send path, decode logic
unchanged.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New dashboard route at /dashboard/meshes/[id]/topics/[name] gives signed-in
users a thin chat client over the v0.2.0 REST surface. The mesh detail page
now lists topics with one-click links into the chat. Backend layout:
- packages/api/src/modules/mesh/api-key-auth.ts — exports
createDashboardApiKey() that mints a 24h read+send key scoped to a single
topic for the caller's member id. The page server component calls this on
every render and embeds the secret in the props of the client component;
the secret never touches sessionStorage so a tab close = key effectively
abandoned (the row remains until expiresAt).
- apps/web/.../topics/[name]/page.tsx — server component, NextAuth gate,
resolves the user's meshMember.id, mints the key, renders the shell.
- apps/web/src/modules/mesh/topic-chat-panel.tsx — client component, polls
GET /v1/topics/:name/messages every 5s, sends via POST /v1/messages.
Encoding wraps base64(plaintext) into the ciphertext field — matches the
current broker contract until per-topic HKDF lands in v0.3.0.
The mesh detail page gains a Topics section with empty-state copy that
points users at the CLI verb (claudemesh topic create) for now; topic
creation from the web UI is a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v0.2.0 backend cut. Topics, API keys, REST /api/v1/*, and bridge
peers — all in one CLI release. Adds three new verb namespaces:
topic (channel pub/sub), apikey (REST client auth), bridge (cross-mesh
forwarding).
Also pins @claudemesh/sdk as a workspace devDependency so the bridge
implementation is bundled by Bun at build time and doesn't leak into
the npm tarball's runtime deps.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A bridge holds memberships in two meshes and relays messages on a
single topic between them. Federation-lite without a broker-to-broker
protocol.
SDK additions:
- Bridge class (start, stop, EventEmitter for forwarded/dropped/error)
- MeshClient.joinTopic / leaveTopic / createTopic methods
- Loop prevention: plaintext hop counter prefix __cmh<n>: with maxHops
default 2; echo guard via senderPubkey == own session pubkey
CLI additions:
- claudemesh bridge run <config.yaml> long-lived process
- claudemesh bridge init prints config template
- Zero-dep YAML parser for the flat bridge config shape
The hop prefix is visible in message bodies — minor wart, fixed in
v0.3.0 by moving loop tracking into broker primitives.
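A sketch of how the hop prefix and echo guard could combine on the forwarding path. Function names are hypothetical, and dropping once the counter has reached maxHops (rather than exceeded it) is an assumption about the boundary.

```typescript
const HOP_PREFIX = /^__cmh(\d+):/;

// Reads the hop counter from a message body; bodies without a prefix count as 0 hops.
function readHops(body: string): { hops: number; text: string } {
  const m = HOP_PREFIX.exec(body);
  if (m === null) return { hops: 0, text: body };
  // The prefix matched at position 0, so the first ":" ends it.
  return { hops: Number(m[1]), text: body.slice(body.indexOf(":") + 1) };
}

// Returns the body to forward, or null when the message must be dropped.
function nextHopBody(
  body: string,
  senderPubkey: string,
  ownPubkey: string,
  maxHops = 2,
): string | null {
  if (senderPubkey === ownPubkey) return null; // echo guard: our own relay
  const { hops, text } = readHops(body);
  if (hops >= maxHops) return null; // assumed boundary: stop at maxHops
  return `__cmh${hops + 1}:${text}`;
}
```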
SDK kept as devDependency since Bun bundles it into dist; no impact
on npm publish or runtime resolution.
Spec: .artifacts/specs/2026-05-02-v0.2.0-scope.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bearer-auth REST endpoints for humans, scripts, bots — anyone without
browser-side ed25519. Same key model as broker WS, scoped by capability
and optional topic whitelist.
Endpoints (v0.2.0 minimum):
- POST /v1/messages
- GET /v1/topics
- GET /v1/topics/:name/messages (limit, before cursor)
- GET /v1/peers
Auth: Authorization: Bearer cm_<secret>. Middleware verifies prefix +
SHA-256 hash with constant-time compare; capability + topic-scope
asserted per route. Cross-mesh isolation: every endpoint scopes to
apiKey.meshId.
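An illustrative take on the header check and the constant-time comparison, kept dependency-free. In Node, crypto.timingSafeEqual is the standard tool for the comparison; the hand-rolled loop below only shows the idea. The allowed secret alphabet and both function names are assumptions.

```typescript
// Extracts the token from "Authorization: Bearer cm_<secret>".
// Assumption: secrets are URL-safe base64 characters.
function parseBearer(header: string | undefined): string | null {
  if (header === undefined) return null;
  const m = /^Bearer (cm_[A-Za-z0-9_-]+)$/.exec(header);
  return m === null ? null : (m[1] ?? null);
}

// Compares two hex digests without returning early on the first mismatch,
// so timing does not reveal how many leading characters matched.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false; // digest length is not secret
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}
```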
Live delivery: writes to messageQueue + topic_message; broker's
existing pendingTimer drains and pushes to live peers. Real-time
push from REST writes is a follow-up.
Spec: .artifacts/specs/2026-05-02-v0.2.0-scope.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issuance flow over WS for now (REST endpoints come next slice).
Plaintext secret returned ONCE on create — never recoverable.
- broker: 3 WS handlers (apikey_create/list/revoke), wire types in
union, audit log on issuance + revoke
- ws-client: apiKeyCreate/List/Revoke with resolver maps, response
dispatch
- CLI: claudemesh apikey create <label> [--cap a,b] [--topic c,d]
[--expires ISO]; list shows status, scope, last-used; revoke by id
- policy: apikey create + revoke prompt by default (issuing or
disabling a credential is meaningful)
Default capability set is "send,read" — least privilege for unscoped
keys (admin must explicitly opt in).
Spec: .artifacts/specs/2026-05-02-v0.2.0-scope.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundation for v0.2.0 REST + external WS auth.
Bearer tokens stored as SHA-256 hashes; secrets are 256-bit CSPRNG so
Argon2 would waste cost without security gain.
Adds mesh.api_key table, migration 0023 applied manually to prod, and
helpers: createApiKey, listApiKeys, revokeApiKey, verifyApiKey.
Next slices: CLI apikey verbs and REST endpoints in apps/web router.
Spec: .artifacts/specs/2026-05-02-v0.2.0-scope.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Broker already plumbs peer_type. Real blocker is browser-side ed25519
hello-sig — sidestepped by exposing REST API for humans (and external
scripts/bots), with web chat UI as a thin REST client using dashboard
session auth. Collapses #2 (humans) and #3 (REST) into one deliverable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
handleSend's pre-flight check rejected #<topicId> sends because the
target wasn't matched by @group / * / pubkey, so it fell into the
"direct" branch and looked for a peer with that pubkey. Topic targets
need their own class — delivery happens via topic_member, not by
matching connected peers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Broker (apps/broker/src/index.ts)
- Unified disconnect/kick handler uses close code 1000 for disconnect
(CLI auto-reconnects) vs 4001 for kick (CLI exits, no reconnect).
- Ban now closes with code 4002.
- Hello handler: revoked members get a specific 'revoked' error with a
'Contact the mesh owner to rejoin' message, then ws.close(4002).
Previously banned users saw the generic 'unauthorized' error.
- list_bans handler returns { name, pubkey, revokedAt } for each
revoked member.
CLI (apps/cli)
- ws-client: close codes 4001 and 4002 set .closed = true and stash
.terminalClose so callers can surface a friendly message instead of
the low-level 'ws terminal close' error. Revoked error in hello is
also captured as a terminal close.
- withMesh catches terminalClose and prints:
4001 → 'Kicked from this mesh. Run claudemesh to rejoin.'
4002 → the broker's 'Contact the mesh owner to rejoin.' message
- kick.ts now exports runDisconnect + runKick with clear hints:
'disconnect' → 'They will auto-reconnect within seconds.'
'kick' → 'They can rejoin anytime by running claudemesh.'
- cli.ts adds 'disconnect' dispatch; HELP updated.
Semantics:
disconnect: session reset, no DB state, auto-reconnects
kick : session ends, no DB state, user must manually rejoin
ban : session ends + revokedAt set, cannot rejoin until unban
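The client-side handling of these close codes might look like the sketch below. The type and function names are hypothetical, and treating every code other than 4001 and 4002 as reconnectable is an assumption; the commit only specifies 1000.

```typescript
type CloseAction =
  | { reconnect: true }
  | { reconnect: false; message: string };

// Maps a WebSocket close code to what the CLI should do next.
function onClose(code: number, brokerMessage?: string): CloseAction {
  switch (code) {
    case 4001: // kick: exit, user rejoins manually
      return { reconnect: false, message: "Kicked from this mesh. Run claudemesh to rejoin." };
    case 4002: // ban / revoked: exit, surface the broker's wording when present
      return { reconnect: false, message: brokerMessage ?? "Contact the mesh owner to rejoin." };
    default: // 1000 (disconnect) and, by assumption, anything else
      return { reconnect: true };
  }
}
```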
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Promote CLI from 1.0.0-alpha.42 to stable 1.0.0 so
`npm i -g claudemesh-cli` installs the current release without
needing the @alpha dist-tag.
Both dist-tags now point at 1.0.0 — `@alpha` kept as an alias for
continuity so existing docs, install scripts, and scheduled upgrade
commands keep working.
upgrade + doctor commands updated to prefer the `latest` dist-tag
(falling back to `alpha`) and to suggest `npm i -g claudemesh-cli`
without the @alpha suffix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three bugs compounding when multiple peers share a display name:
1. list_peers (MCP + CLI) truncated pubkey to 12 hex chars with an
ellipsis. A truncated pubkey cannot be used as a routing key, so
the caller had no way to disambiguate visually.
2. send_message required the full 64-hex pubkey and refused prefix
input, forcing callers to rely on --json output to get a full key.
3. Name-based resolution returned the first exact match without
filtering the caller's own session — so "send to <my-own-name>"
would bounce against the broker's self-send guard when another
session of the same user was the intended target.
Fixes:
- list_peers now prints 16-char pubkey prefix labelled "pubkey: …"
(MCP) and appends it to CLI output
- send_message accepts any 8–64 hex-char prefix and resolves against
live peer lists across joined meshes; unique match routes, multi-
match returns a disambiguation error listing each candidate's
displayName + pubkey + cwd
- Name matches now skip the caller's own session pubkey; multiple
same-named matches fail loudly with a copy-pasteable pubkey
disambiguation hint instead of silently picking one
- Full 64-char pubkeys without a live match still queue at the
broker (preserves offline-delivery semantics)
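The prefix branch of the new send_message resolution, sketched as a pure function. Names and the result shape are hypothetical, and lowercasing the input before matching is an assumption. Name-based matching and the own-session skip are not shown.

```typescript
interface Peer {
  displayName: string;
  pubkey: string; // 64 lowercase hex chars
}

type Resolution =
  | { kind: "route"; pubkey: string }      // unique live match
  | { kind: "queue"; pubkey: string }      // full key, nobody live: queue at broker
  | { kind: "ambiguous"; candidates: Peer[] }
  | { kind: "unknown" };

function resolvePubkeyPrefix(input: string, livePeers: Peer[]): Resolution {
  const needle = input.toLowerCase();
  if (!/^[0-9a-f]{8,64}$/.test(needle)) return { kind: "unknown" };
  const matches = livePeers.filter((p) => p.pubkey.startsWith(needle));
  const [first, ...others] = matches;
  if (first !== undefined && others.length === 0) {
    return { kind: "route", pubkey: first.pubkey };
  }
  if (first !== undefined) return { kind: "ambiguous", candidates: matches };
  // No live match: only a full key is safe to queue for offline delivery.
  return needle.length === 64 ? { kind: "queue", pubkey: needle } : { kind: "unknown" };
}
```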
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New /dashboard landing that surfaces meshes and invitations-to-you
in one view. Replaces the simple mesh grid at /dashboard (preserved
at /dashboard/legacy).
Backend additions:
- GET /api/my/invites/incoming — pending_invite rows addressed to
the authed user's email, joined with invite for role + expiry and
user/mesh for display. Unaccepted + unrevoked + unexpired only.
- DELETE /api/my/invites/incoming/:id — dismiss a pending invite
(revokes the pending_invite row only; underlying invite code stays
valid so the inviter can re-send).
Web additions (all under apps/web/src/modules/dashboard/universe/):
- welcome.tsx — editorial serif header with mesh + invite counts
- invitations.tsx — client card with Accept (→ /i/:code claim flow)
and optimistic Decline
- meshes-grid.tsx — hero card + compact grid, linked to mesh detail
- reveal.tsx — fade-up motion matching marketing _reveal.tsx
Styling uses the existing claudemesh design tokens (--cm-clay,
--cm-bg-elevated, Anthropic Sans/Serif/Mono) — nothing redefined.
Onboarding redirect (0 meshes → /meshes/new?onboarding=1) preserved,
now gated on 0 invitations too so users with pending invites still
land on the dashboard.
Sidebar icon switched to Atom for the "universe" concept.
Standalone prototype saved at prototypes/live-dashboard.html for
reference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If the arg isn't a URL and matches a mesh already in local config,
print a hint pointing at `launch --mesh <slug>` instead of treating
the slug as an invite code. Avoids the 501 invite_v2_disabled confusion
when users try to "enter" a mesh they already own.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The peers command opens its own WS to each mesh, which briefly appears
as a hostname-PID peer. Filter it out by session pubkey.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Broker WS handlers:
- kick: disconnect peer(s) by name, --stale duration, or --all.
Authz: owner or admin only. Closes WS + marks presence disconnected.
- ban: kick + set revokedAt on mesh.member. Hello already rejects
revoked members, so ban is instant and permanent until unban.
- unban: clear revokedAt. Peer can rejoin with their existing keypair.
- list_bans: return all revoked members for a mesh.
Session-id dedup (previous commit): handleHello disconnects ghost
presences with matching (meshId, sessionId) before inserting the new
one. Eliminates duplicate entries after broker restarts.
CLI (alpha.37):
- claudemesh kick <peer|--stale 30m|--all>
- claudemesh ban/unban <peer>
- claudemesh bans [--json]
- Uses new sendAndWait() on ws-client for request-response pattern
over WS (generic _reqId resolver).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a client reconnects with the same session_id before the 90s
stale sweeper runs, the old ghost presence stays in the connections
map. Result: duplicate entries in list_peers for the same Claude
Code instance.
Now: handleHello iterates connections for matching (meshId, sessionId),
closes the old WS, deletes from map, marks disconnected in DB.
One session_id = one presence, always.
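A simplified model of the dedup pass. The Presence shape and function name are hypothetical, and the DB update is omitted; only the in-memory part is shown.

```typescript
interface Presence {
  meshId: string;
  sessionId: string;
  close: () => void; // stands in for closing the old WebSocket
}

// Evicts every existing presence with the same (meshId, sessionId) before the
// new hello is registered. Returns the evicted presence ids.
function dedupSession(
  connections: Map<string, Presence>,
  meshId: string,
  sessionId: string,
): string[] {
  const stale: string[] = [];
  connections.forEach((conn, presenceId) => {
    if (conn.meshId === meshId && conn.sessionId === sessionId) {
      stale.push(presenceId);
    }
  });
  for (const presenceId of stale) {
    connections.get(presenceId)?.close();
    connections.delete(presenceId);
  }
  return stale;
}
```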
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backwards compat shim (task 27)
- requireCliAuth() falls back to body.user_id when BROKER_LEGACY_AUTH=1
and no bearer present. Sets Deprecation + Warning headers + bumps a
broker_legacy_auth_hits_total metric so operators can watch the
legacy traffic drain to 0 before removing the shim.
- All handlers parse body BEFORE requireCliAuth so the fallback can
read user_id out of it.
HA readiness (task 29)
- .artifacts/specs/2026-04-15-broker-ha-statelessness-audit.md
documents every in-memory symbol and rollout plan (phase 0-4).
- packaging/docker-compose.ha-local.yml spins up 2 broker replicas
behind Traefik sticky sessions for local smoke testing.
- apps/broker/src/audit.ts now wraps writes in a transaction that
takes pg_advisory_xact_lock(meshId) and re-reads the tail hash
inside the txn. Concurrent broker replicas can no longer fork the
audit chain.
Deploy gate (task 30)
- /health stays permissive (200 even on transient DB blips) so
Docker doesn't kill the container on a glitch.
- New /health/ready checks DB + optional EXPECTED_MIGRATION pin,
returns 503 if either fails. External deploy gate can poll this
and refuse to promote a broken deploy.
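The readiness decision reduces to a small predicate. This sketch assumes the caller has already probed the DB and read the latest applied migration name; the function and parameter names are hypothetical.

```typescript
// Returns the HTTP status for /health/ready.
// expectedMigration mirrors the optional EXPECTED_MIGRATION pin.
function readinessStatus(
  dbOk: boolean,
  latestMigration: string | null,
  expectedMigration?: string,
): 200 | 503 {
  if (!dbOk) return 503;
  if (expectedMigration !== undefined && latestMigration !== expectedMigration) {
    return 503;
  }
  return 200;
}
```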
Metrics dashboard (task 32)
- packaging/grafana/claudemesh-broker.json: ready-to-import Grafana
dashboard covering active conns, queue depth, routed/rejected
rates, grant drops, legacy-auth hits, conn rejects.
Tests (task 28)
- audit-canonical.test.ts (4 tests) pins canonical JSON semantics.
- grants-enforcement.test.ts (6 tests) covers the member-then-
session-pubkey lookup with default/explicit/blocked branches.
Docs (task 34)
- docs/env-vars.md catalogues every env var the broker + CLI read.
Crypto review prep (task 35)
- .artifacts/specs/2026-04-15-crypto-review-packet.md: reviewer
brief, threat model, scope, test coverage list, deliverables.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Broker (all need redeploy):
- sweepOrphanMessages: DELETE undelivered message_queue rows older
than 7 days; hourly sweep. Stops unbounded growth when a sender
typos a name (queued forever, never claimed).
- Per-member send rate limit: TokenBucket(60/min, burst 10) keyed on
memberId so reconnecting can't bypass. Surfaces as queued=false,
error='rate_limit: ...'.
- Pre-flight size cap: reject at handleSend if nonce+ciphertext+
targetSpec exceeds env.MAX_MESSAGE_BYTES with a clear error
instead of silent WSS frame-level kill.
- No-recipient reject: for direct sends, check any matching peer
is connected BEFORE queueing. Kills the self-send silent drop
(sending to your own pubkey when you only have one session
connected) and typo-to-offline-peer silent drops.
- WSAckMessage.error field added for structured failure reasons.
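For reference, a token bucket with the 60/min, burst 10 parameters can be sketched as below. This is a generic implementation with an injected clock, not the broker's TokenBucket class; the constructor signature and allowSend helper are assumptions. Keying the map on memberId is what keeps a reconnect from resetting the budget.

```typescript
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly ratePerMin: number;
  private readonly burst: number;

  constructor(ratePerMin: number, burst: number, nowMs: number) {
    this.ratePerMin = ratePerMin;
    this.burst = burst;
    this.tokens = burst; // start full
    this.last = nowMs;
  }

  // Refills based on elapsed time, then tries to spend one token.
  take(nowMs: number): boolean {
    const refill = ((nowMs - this.last) * this.ratePerMin) / 60000;
    this.tokens = Math.min(this.burst, this.tokens + refill);
    this.last = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const buckets = new Map<string, TokenBucket>();

// One bucket per member, so a new connection inherits the same budget.
function allowSend(memberId: string, nowMs: number): boolean {
  let bucket = buckets.get(memberId);
  if (bucket === undefined) {
    bucket = new TokenBucket(60, 10, nowMs);
    buckets.set(memberId, bucket);
  }
  return bucket.take(nowMs);
}
```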
CLI:
- ws-client ack handler reads msg.queued and msg.error; surfaces
rate_limit / too_large / no_recipient to callers instead of
returning ok:true with a dummy messageId.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When Alice's session-A encrypts a direct message to Bob (target = Bob's
stable member pubkey) and Bob's session-B receives it, Bob has BOTH an
ephemeral session secret key and the member secret key. The old code
only tried session_sk, then silently failed with '⚠ message from
<sender> failed to decrypt' even though the message was valid —
just encrypted to the member key.
Now: try session first, fall back to member on null. Matches the
sender side's choice freedom (encrypt using either key).
Repros when: user opens multiple Claude Code sessions (all use the
same member key but each generates its own session key), and one
session sends to another by display-name resolution which returns
the member pubkey.
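The fallback order, reduced to its shape. The decrypt function is injected here so the sketch stays independent of the real crypto layer; its signature and the function name are assumptions.

```typescript
// Returns plaintext, or null when the key does not open the ciphertext.
type Decrypt = (ciphertext: string, secretKey: string) => string | null;

// Tries the ephemeral session key first, then the stable member key.
function decryptDirect(
  ciphertext: string,
  sessionSecretKey: string,
  memberSecretKey: string,
  decrypt: Decrypt,
): string | null {
  return decrypt(ciphertext, sessionSecretKey) ?? decrypt(ciphertext, memberSecretKey);
}
```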
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When a user opens multiple Claude Code instances on one laptop they
all share the same memberPubkey (one identity, one config.json). The
broker was broadcasting each Claude Code start/stop to every OTHER
session of the same user — showing as 'peer agutierrez left / joined'
spam in every active claude terminal.
Now: skip broadcast to presences whose memberPubkey equals the joining
or leaving presence's memberPubkey. Other actual peers on the mesh
still see the event.
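In filter form, assuming a hypothetical presence record that carries memberPubkey:

```typescript
interface PresenceInfo {
  presenceId: string;
  memberPubkey: string;
}

// Recipients of a peer_joined / peer_left event: everyone except presences
// that belong to the same member as the joining or leaving one.
function broadcastTargets(all: PresenceInfo[], subject: PresenceInfo): PresenceInfo[] {
  return all.filter((p) => p.memberPubkey !== subject.memberPubkey);
}
```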
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The broker code moved to an append-only hash-chained audit log
(actor_member_id / actor_display_name / payload / prev_hash / hash
with integer GENERATED ALWAYS AS IDENTITY id) but prod still had
the original 0000-migration shape (actor_peer_id / metadata /
text id). Every peer_joined / peer_left event logged 'audit log
insert failed' — no audit trail captured at all.
Applied manually on prod already; committing the migration so
future environments converge.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two bugs that combined to make Claude's peer-send look successful even
when the recipient didn't exist:
1. resolveClient fell through to 'let the broker try' when a single
mesh was joined and the name didn't match any peer. The broker
queued the message against the literal unknown string, matched no
peer in fan-out, but returned a messageId — so the CLI reported
'✓ lezg → msgId' for a peer that was never there.
Now: refuse to send, list the known peer names.
2. list_peers showed the same pubkey multiple times with different
display_names (one per live session) without hinting that they
were the same member — so Claude treated them as distinct people.
Now: annotate with '[shares key with N other session(s)]' so the
caller understands one pubkey = one identity.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Older CLIs sometimes called POST /cli/mesh/create without a pubkey,
and the broker stored the string 'pending' as peer_pubkey on the
owner's mesh.member row. Every subsequent hello from the real CLI
failed the membership lookup silently, leaving the connection in
'reconnecting' forever with no useful log line.
Now: validate pubkey is 64 hex chars before creating the owner
member row. Existing 'pending' rows on prod were patched manually.
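The check itself is a one-liner. The function name is hypothetical, and accepting uppercase hex is an assumption.

```typescript
// True only for exactly 64 hex characters, so placeholder strings such as
// "pending" can never be stored as a peer pubkey.
function isValidPubkey(value: unknown): value is string {
  return typeof value === "string" && /^[0-9a-fA-F]{64}$/.test(value);
}
```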
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- info/inbox commands → unified render.ts
- install route: drop in-memory counter, rely on PostHog + structured logs
- docs: roadmap, CLAUDE.md reflect alpha.31 state
- tests workflow now also builds + smoke-tests the CLI bundle
- homebrew tap bootstrap kit in packaging/homebrew-tap-bootstrap/
(README + copy of the formula template for dropping into the tap repo)
- upstream Claude Code issue draft for rich <channel> UI
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- apps/cli/ is now the canonical CLI (was apps/cli-v2/).
- apps/cli/ legacy v0 archived as branch 'legacy-cli-archive' and tag
'cli-v0-legacy-final' before deletion; git history preserves it too.
- .github/workflows/release-cli.yml paths updated.
- pnpm-lock.yaml regenerated.
Broker-side peer-grant enforcement (spec: 2026-04-15-per-peer-capabilities):
- 0020_peer-grants.sql adds peer_grants jsonb + GIN index on mesh.member.
- handleSend in broker fetches recipient grant maps once per send, drops
messages silently when sender lacks the required capability.
- POST /cli/mesh/:slug/grants to update from CLI; broker_messages_dropped_by_grant_total metric.
- CLI grant/revoke/block now mirror to broker via syncToBroker.
Auto-migrate on broker startup:
- apps/broker/src/migrate.ts runs drizzle migrate with pg_advisory_lock
before the HTTP server binds. Exits non-zero on failure so Coolify
healthcheck fails closed.
- Dockerfile copies packages/db/migrations into /app/migrations.
- postgres 3.4.5 added as direct broker dep.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The compile step bypasses build.ts, so the define had to be added
to the workflow's bun build command directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
alpha.28-30 binaries all reported 'v1.0.0-alpha.27' from a hardcoded
constant in src/constants/urls.ts — my bump sed only matched
package.json's 'version' key, not the TypeScript literal.
build.ts now reads package.json version and injects it via Bun's
`define` (source-text replacement, equivalent to esbuild --define).
urls.ts reads the injected symbol with a runtime fallback for `bun
src/...` dev mode. Version drift can't recur.
Also: peers and status migrated to the render.ts unified renderer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final scoreboard against the Claude Code-grade CLI bar. Captures
every file shipped, every gotcha hit, and the one remaining item
(rich channel UI) that needs upstream Claude Code work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Channel messages now render as '<sender>: <body>' with priority
+ broadcast badges in Claude Code's <channel> reminders, so the inbox
reads as a chat thread rather than bare lines.
[URGENT] alice: deploy is blocking release
bob (broadcast): team sync 15min
charlie: pr #42 lgtm
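A formatter that reproduces the three sample lines above. The message shape, field names, and function name are hypothetical.

```typescript
interface ChannelMessage {
  sender: string;
  body: string;
  urgent?: boolean;
  broadcast?: boolean;
}

// Renders "<sender>: <body>" with optional priority and broadcast badges.
function renderChannelLine(m: ChannelMessage): string {
  const badge = m.urgent === true ? "[URGENT] " : "";
  const who = m.broadcast === true ? `${m.sender} (broadcast)` : m.sender;
  return `${badge}${who}: ${m.body}`;
}
```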
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
macos-latest = ARM64, ubuntu-latest = x64. Only darwin-arm64 and
linux-x64 binaries can execute on their build host; the others are
cross-compiled and will fail with 'Exec format error' if run there.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CLI source (242 files, ~14k lines) was gitignored during the
earlier cli→cli-v2 reorg so only the published npm package carried it.
That blocks the GitHub Actions release workflow (release-cli.yml),
which clones the repo fresh on each runner and needs the source to
compile binaries via `bun build --compile`.
Moves the gitignore from root-level to `apps/cli-v2/.gitignore` with
only the usual build artefacts excluded (node_modules, dist, .turbo,
.cache). Source is now in git at apps/cli-v2/src/.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .github/workflows/release-cli.yml: build self-contained binaries via
`bun build --compile` for darwin/linux/windows × x64/arm64 on every
cli-v* tag, attach to GitHub Release with SHA256SUMS, auto-bump the
homebrew tap on non-prerelease versions.
- packaging/homebrew/claudemesh.rb.template: formula template for the
homebrew-claudemesh tap.
- packaging/winget/claudemesh.yaml.template: winget manifest template.
- /install script now detects absence of Node and downloads the
platform-appropriate binary from the GitHub Release, installs to
~/.claudemesh/bin, and shims into ~/.local/bin — zero Node required.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Capture the design for the two tier-2 items that weren't shipped inline
in alpha.28 — both require CI/infrastructure work (GitHub Actions,
Homebrew tap, winget manifest) or broker schema migration that's safer
to do as a separate PR with feature flag rollout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- /install shell script now points users at `claudemesh <invite-url>`
(one step) instead of the split join+launch
- InstallToggle first-time panel shows single copy-block with
install+launch on the same line
- Also advertises url-handler install and shell completions
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Email (broker):
- Rebrand mesh-invitation.tsx to match site (clay accent #d97757,
cream fg, Anthropic Serif/Mono, dark bg). Mesh glyph in header.
- Hero CTA links to the /i/short URL landing page.
- Single one-liner 'npm i -g claudemesh-cli && claudemesh launch --join URL'
so new users copy once, paste once, done.
Web InstallToggle:
- Replace two-step numbered list with single one-liner in the first-time
panel. Reduces copy/paste ops from 2 to 1 and stops prescribing
'YourName' as a literal (CLI now defaults to $USER).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the plain-text invite email with a standalone react-email
template (apps/broker/src/emails/mesh-invitation.tsx) using
@react-email/components + Tailwind. Rendered on demand in
handleCliMeshInvite and sent as both HtmlBody and TextBody via
Postmark (or html+text via Resend).
Self-contained — no dependency on @turbostarter/email, i18n, or ui
packages. Adds react, react-dom, @react-email/components, @react-email/render
to broker deps. Enables tsconfig jsx: react-jsx and .tsx includes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Broker now sends the invite email when body.email is provided and
POSTMARK_API_KEY (or RESEND_API_KEY) is configured. Returns
`emailed: boolean` so the CLI can honestly report whether the email
was sent instead of falsely claiming success on link generation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
handleCliMeshCreate now generates ownerPubkey/ownerSecretKey/rootKey so
CLI-created meshes can issue invites. handleCliMeshInvite builds the
full signed v1 payload + v2 capability (matching createMyInvite in
packages/api) and self-heals meshes created by older broker versions
that are missing keys.
Fixes 500 on claudemesh share after CLI mesh create.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- session_id (clm_sess_...) in browser URL — identifies login attempt
- user_code (ABCD-EFGH) visual confirmation — shown in both terminal and browser
- device_code (secret) — CLI polls with this, never displayed
- CLI also accepts a JWT pasted on stdin while polling; whichever arrives first wins
- Web page handles both ?session= and ?code= params
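
The three identifiers above could be minted along these lines. This is a minimal sketch, not the broker's code: `newDeviceLogin`, the random lengths, and the letter alphabet (which drops I and O) are assumptions made for illustration; only the `clm_sess_` prefix and the `ABCD-EFGH` shape come from the list above.

```typescript
import { randomBytes, randomInt } from "node:crypto";

interface DeviceLogin {
  sessionId: string;  // goes in the browser URL, identifies the login attempt
  userCode: string;   // shown in both terminal and browser for visual confirmation
  deviceCode: string; // secret, the CLI polls with it and never displays it
}

// Assumption: uppercase letters without I and O, to avoid look-alikes.
const LETTERS = "ABCDEFGHJKLMNPQRSTUVWXYZ";

function randomLetters(count: number): string {
  let out = "";
  for (let i = 0; i < count; i++) {
    out += LETTERS.charAt(randomInt(LETTERS.length));
  }
  return out;
}

// Hypothetical helper: mints one login attempt.
function newDeviceLogin(): DeviceLogin {
  return {
    sessionId: `clm_sess_${randomBytes(16).toString("hex")}`,
    userCode: `${randomLetters(4)}-${randomLetters(4)}`,
    deviceCode: randomBytes(32).toString("hex"),
  };
}
```

Keeping the secret `deviceCode` separate from the displayed `userCode` means someone who only sees the screen cannot poll for the token.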
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Payload CMS imports .css/.scss/.svg files that Node.js ESM can't handle
during page data collection. Added a custom ESM loader that stubs these
asset imports, fixing the build that has been broken since the upgrade.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove dependency on SocialProviders/RegisterForm which need
React Query providers. Self-contained with authClient directly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Header now checks session and shows avatar + name + Dashboard link
when logged in, instead of always showing Sign in / Start free.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No more redirect to generic /auth/login. The /cli-auth?code=XXXX page
now shows auth forms inline (Google, GitHub, email) with device code
context — like Anthropic's "Build with Claude" page.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Drizzle schema: device_code + cli_session tables in mesh pgSchema
- Broker endpoints: POST /cli/device-code, GET /cli/device-code/:code,
POST /cli/device-code/:code/approve, GET /cli/sessions
- Web app API routes now proxy to broker (no in-memory state)
- Tracks devices per user: hostname, platform, arch, last_seen, token_hash
- JWT signed with CLI_SYNC_SECRET, 30-day expiry
- Session revocation support via revokedAt column
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New API endpoints:
- POST /api/auth/cli/device-code/new — issue device code + user code
- GET /api/auth/cli/device-code/[code] — poll device code status
- POST /api/auth/cli/device-code/[code]/approve — approve by device code
- POST /api/auth/cli/device-code/approve-by-user-code — approve by user code
Updated cli-auth page to auto-approve on page load after authentication
(no manual "Approve" button click needed).
Enables `claudemesh login` and `claudemesh register` CLI commands.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrites pricing section from single "public beta" card to side-by-side
hosted vs self-hosted comparison reflecting the cleaner product
architecture. Enterprise sell is now concrete: "Run our Docker image,
point your CLI at it, done — your mesh never leaves your VPC."
Updates hero subtitle, CTA, FAQ, and where-mesh-fits claim card to
reinforce the two deployment modes consistently across the landing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Terminals spawned by `claudemesh launch` were dropping keystrokes at
claude's prompt and showing the launch wizard re-rendering on top of
claude's TUI. Two compounding causes:
1. spawn() + child.on('exit') kept the parent node event loop alive
during claude's lifetime. Any stray readline 'data' listener or
late render from the wizard could fire on the inherited stdin/
stdout, stealing keystrokes or painting over claude's Ink TUI.
2. Raw mode / alt-screen / hidden cursor set by the wizard helpers
was not reliably restored before the handoff.
Fix:
- Swap spawn for spawnSync so the parent event loop is fully blocked
while claude runs. No listener or setImmediate can fire during
claude's lifetime.
- Hard TTY reset right before the spawn: setRawMode(false),
removeAllListeners on stdin, show cursor (ESC[?25h), exit alt
screen (ESC[?1049l). Defensive — survives partial wizard cleanup.
- Move cleanup() registration to process.on('exit') so it runs
synchronously on every exit path (normal, signal, throw).
- Preserve signal forwarding: if claude dies from a signal, re-raise
the same signal on the parent so exit codes propagate correctly.
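
A rough sketch of the handoff described in the fix list, assuming Node's `child_process.spawnSync`. The names `resetTty` and `handOff` and the two small interfaces are hypothetical; the real launch code may be structured differently.

```typescript
import { spawnSync } from "node:child_process";

// Structural types so the reset can be exercised without a real terminal.
interface TtyIn {
  isTTY?: boolean;
  setRawMode?: (mode: boolean) => unknown;
  removeAllListeners: () => unknown;
}
interface TtyOut {
  write: (chunk: string) => unknown;
}

// Hard TTY reset: leave raw mode, drop stdin listeners, show the cursor,
// and exit the alternate screen.
function resetTty(stdin: TtyIn, stdout: TtyOut): void {
  if (stdin.isTTY && stdin.setRawMode) stdin.setRawMode(false);
  stdin.removeAllListeners();
  stdout.write("\x1b[?25h");   // show cursor
  stdout.write("\x1b[?1049l"); // exit alt screen
}

// Blocks the parent event loop until the child exits, then returns the
// child's exit code. If the child died from a signal, the same signal is
// re-raised on the parent.
function handOff(command: string, args: string[]): number {
  resetTty(process.stdin, process.stdout);
  const result = spawnSync(command, args, { stdio: "inherit" });
  if (result.signal) {
    process.kill(process.pid, result.signal);
  }
  return result.status ?? 1;
}
```

Because `spawnSync` does not return until the child exits, no timer or listener in the parent can run in the meantime.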
Bumps to v0.10.6.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New hero section with a live animated mesh background: three equal
Claude Code peers in a triangle layout + six desaturated background
peers, all rendered pixel-perfect from pure React/CSS using the exact
Unicode characters and colors from Claude Code's own source.
- User prompts type into the bottom prompt-input box and "submit" to
scrollback (matching real Claude Code behavior). Mesh sends fly as
envelope icons with fading trails between peers; receivers pulse on
arrival. Dynamic routing by peer displayName.
- Radial vignette overlay keeps the hero title crisp while letting the
corner peers pulse visibly around the edges. Top/bottom linear fades
bleed into adjacent sections.
- Responsive scaling via ResizeObserver: cover-fit in hero bg context,
contain-fit for standalone use.
- Features section: added Skills, MCPs, and Commands as the first
three tabs — the mesh's real differentiators. Updated subtitle copy.
- New "Where claudemesh fits" section positioned between Features and
WhatIsClaudemesh: four-card comparison (vs MCP, vs subagents,
vs OpenClaw, and the positive claim) framing claudemesh as a wire
between Claude Code sessions, not a replacement for any of them.
All work is additive: 10 new files in apps/web/src/modules/marketing/
home/fake-claude-code/ plus hero-mesh-animation.tsx, hero-with-mesh.tsx,
and where-mesh-fits.tsx. Single edit each to features.tsx and
(marketing)/page.tsx to swap in the new hero and mount the new section.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes the v2 invite user experience. The generator now ships two
delivery modes behind a simple Link | Email toggle, and the vestigial
ic:// scheme is gone from every user-visible surface.
Modes
- Link (default, existing flow): mints a v2 invite, displays short URL
+ QR + CLI command. No behavioral change vs wave 2.
- Email (new): admin types a recipient email, submit dispatches through
the POST /api/my/meshes/:id/invites/email endpoint (wave 2), which
mints a normal v2 invite, records a pending_invite row, and stubs the
Postmark send with a TODO. Result card shows a "✓ Invite sent to X"
banner plus the same QR card so the admin can also share manually.
Honest UX copy on the stub:
"Email delivery is stubbed in v0.1.x — the invite is valid. Share the
link directly if needed." Avoids pretending something shipped that
hasn't.
ic:// cleanup
- inviteLink field no longer rendered or stored (still returned by the
API for backward compatibility; just not surfaced)
- CLI command now copies `claudemesh join <code>` (falls back to
shortUrl when code is null), matching the new v2 entry point
- Zero remaining `ic://` references in the UI
Implementation notes
- Two separate useForm instances (linkForm, emailForm) with dedicated
resolvers and submit handlers — clearer state boundaries than
conditional validation on a merged schema
- Mode toggle uses role="group" + aria-pressed, focus-visible ring,
keyboard-navigable
- Email result banner is role="status" for screen readers
- RPC client has one `as any` on `(api.my.meshes[":id"].invites as any)
.email.$post` — the endpoint IS registered server-side (wave 2) but
the monorepo's Hono type regen is out-of-band; TODO comment marks the
cast for removal when the RPC types catch up
- No new deps
- Component export signature unchanged
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires the v2 invite protocol end-to-end from a CLI user's perspective.
Broker foundation landed in c1fa3bc; this commit is the glue between
it and the human.
API (packages/api)
- createMyInvite now mints BOTH v1 token (legacy) AND v2 capability.
Two-phase insert: row first (to get invite.id), then UPDATE with
signed canonical bytes stored as JSON {canonical, signature} in the
capabilityV2 column. Broker's claim handler parses the same shape.
- canonicalInviteV2 locked to `v=2|mesh_id|invite_id|expires_at|role|
owner_pubkey_hex` — byte-identical to apps/broker/src/crypto.ts.
- brokerHttpBase() helper rewrites wss://host/ws → https://host for
server-to-server calls.
- POST /api/public/invites/:code/claim — thin proxy to broker;
passes status + body through, 502 broker_unreachable on fetch fail,
cache-control: no-store.
- POST /api/my/meshes/:id/invites/email — mints a normal v2 invite
via createMyInvite, records a pending_invite row, calls stubbed
sendEmailInvite (logs TODO for Postmark wiring in a later PR).
- New schemas: claimInviteInput/ResponseSchema,
createEmailInviteInput/ResponseSchema, v2 fields on
createMyInviteResponseSchema.
- v1 paths untouched — legacy /join/[token] and /api/public/invite/:token
continue to work throughout v0.1.x.
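
To make the canonical layout and the URL rewrite concrete, here is a small sketch. `buildCanonicalInvite` and `toHttpBase` are illustrative stand-ins, not the functions in packages/api. The sketch assumes a literal `v=2` prefix followed by the raw field values joined with `|`, `expires_at` as unix seconds, and `ws:` mapping to `http:`; none of those details are confirmed above.

```typescript
interface InviteV2Fields {
  meshId: string;
  inviteId: string;
  expiresAt: number; // assumption: unix seconds
  role: string;
  ownerPubkeyHex: string;
}

// v=2|mesh_id|invite_id|expires_at|role|owner_pubkey_hex
function buildCanonicalInvite(f: InviteV2Fields): string {
  return [
    "v=2",
    f.meshId,
    f.inviteId,
    String(f.expiresAt),
    f.role,
    f.ownerPubkeyHex,
  ].join("|");
}

// wss://host/ws -> https://host (path dropped, scheme swapped).
function toHttpBase(wsUrl: string): string {
  const url = new URL(wsUrl);
  const scheme = url.protocol === "wss:" ? "https:" : "http:";
  return `${scheme}//${url.host}`;
}
```

Both signer and verifier must produce identical bytes, so any field that could contain `|` would need escaping or rejection before it is joined.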
CLI (apps/cli)
- New `claudemesh join <code-or-url>` subcommand.
- Accepts bare code (abc12345), short URL (claudemesh.com/i/abc12345),
or legacy ic://join/<token>. Detects v2 vs v1 and dispatches.
- v2 path: generates fresh ephemeral x25519 keypair (separate from
the ed25519 identity) → POST /api/public/invites/:code/claim →
unseals sealed_root_key via crypto_box_seal_open → persists mesh
with inviteVersion: 2 and base64url rootKey to local config.
- Signature verification skipped with TODO: v0.1.x trusts the broker.
seal-open checks ciphertext integrity, but sealed boxes are anonymous and
do not authenticate the sender.
- apps/cli/src/lib/invite-v2.ts: generateX25519Keypair, claimInviteV2,
parseV2InviteInput.
- state/config.ts: additive rootKey?/inviteVersion? fields.
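
The three accepted input shapes could be told apart roughly as follows. This is a sketch only: `parseInviteInput` is a hypothetical stand-in for `parseV2InviteInput`, and the alphanumeric code alphabet is an assumption based on the `abc12345` example.

```typescript
type InviteInput =
  | { version: 2; code: string }
  | { version: 1; token: string };

function parseInviteInput(raw: string): InviteInput | null {
  const input = raw.trim();

  // Legacy v1: ic://join/<token>
  const token = /^ic:\/\/join\/(.+)$/.exec(input)?.[1];
  if (token) return { version: 1, token };

  // v2 short URL, with or without a scheme: claudemesh.com/i/<code>
  const fromUrl = /^(?:https?:\/\/)?claudemesh\.com\/i\/([A-Za-z0-9]+)\/?$/.exec(input)?.[1];
  if (fromUrl) return { version: 2, code: fromUrl };

  // v2 bare code
  if (/^[A-Za-z0-9]+$/.test(input)) return { version: 2, code: input };

  return null;
}
```

Checking the legacy scheme first keeps a v1 link from being mistaken for a malformed v2 code.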
CLI friction reducer
- apps/cli/src/index.ts: flag-first invocations
(`claudemesh --resume xxx`, `claudemesh -c`, `claudemesh -- --model
opus`) now route through `launch` automatically. Bare `claudemesh`
still shows welcome; known subcommands dispatch normally.
- Removes one word of cognitive load: users never type `launch`.
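
The routing rule can be pictured as a small pure function. This is a sketch under assumptions: `routeArgs` and the `welcome` command name are hypothetical, and handling of unknown non-flag words is left out.

```typescript
interface Route {
  command: string;
  rest: string[];
}

// args is process.argv without the node binary and script path.
function routeArgs(args: string[]): Route {
  const first = args[0];
  // Bare `claudemesh` still shows the welcome screen.
  if (first === undefined) return { command: "welcome", rest: [] };
  // Flag-first invocations (including a leading `--`) go through launch,
  // with every argument passed along untouched.
  if (first.startsWith("-")) return { command: "launch", rest: args };
  // Otherwise the first word is treated as the subcommand.
  return { command: first, rest: args.slice(1) };
}
```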
No schema changes. No new deps. v1 fully backward compatible.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- /meshes and /status now show mesh slug names instead of truncated IDs
- meshSlug cached on connect and loaded from DB join on boot
- Remove dangerous fallback that connected to ALL meshes in email flow
- BridgeRow now includes optional meshSlug field
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restores the curl installer script at /install. Adds:
- In-memory fetch counter (visible in container logs)
- Server-side PostHog event 'install_script_fetched' with
IP, user-agent, and referer (fire-and-forget)
- Console log per fetch for monitoring
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
libsodium-wrappers was missing from the standalone output, so invite
creation crashed in production with 'Cannot find module libsodium-wrappers'.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Masked email with asterisks broke Telegram Markdown bold syntax.
Use plain text for the code prompt message.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A dynamic import returns the module wrapper, so await .default.ready
first and then use .default for the actual sodium functions.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Members created by CLI don't have dashboardUserId set. Now searches
by both userId and dashboardUserId columns. Falls back to all meshes
if no member link found (bootstrap case for mesh owners).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The member table has no secretKey column (by design - keys are local).
Email verification now generates a fresh ed25519 keypair and creates
a new bridge-specific member entry for each mesh the user belongs to.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Users can now type /connect in the bot → enter email → receive 6-digit
code → enter code → auto-connect to all meshes linked to that email.
Supports Resend and Postmark email providers via env vars.
Rate-limited to 5 code attempts, 10-min expiry.
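
The limits above (6 digits, 5 attempts, 10 minutes) could be enforced along these lines. This is a minimal in-memory sketch; `issueCode`, `verifyCode`, and the result labels are hypothetical names, and the bot's actual storage is not shown in this commit.

```typescript
import { randomInt } from "node:crypto";

const MAX_ATTEMPTS = 5;
const TTL_MS = 10 * 60 * 1000;

interface PendingCode {
  code: string;
  issuedAt: number; // epoch milliseconds
  attempts: number;
}

type VerifyResult = "ok" | "wrong" | "expired" | "locked";

function issueCode(now: number): PendingCode {
  // randomInt's upper bound is exclusive; pad so "000042" stays 6 digits.
  const code = String(randomInt(0, 1000000)).padStart(6, "0");
  return { code, issuedAt: now, attempts: 0 };
}

function verifyCode(pending: PendingCode, guess: string, now: number): VerifyResult {
  if (now - pending.issuedAt > TTL_MS) return "expired";
  if (pending.attempts >= MAX_ATTEMPTS) return "locked";
  pending.attempts += 1;
  return guess === pending.code ? "ok" : "wrong";
}
```

Counting the attempt before comparing means a correct guess on the sixth try is still rejected.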
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Instead of broadcasting files to all peers, the bot now uploads first
then shows an inline keyboard: individual peers, Everyone, or Keep private.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
turbopack.rules only applies to turbopack. When building with
TURBOPACK=0 (required for Payload CMS), webpack has no SVG rule.
Icons.UnitedKingdom returns an object → React #130. Adding a
webpack config rule for @svgr/webpack fixes both bundler paths.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Editorial timeline with vertical track, colored phase markers,
2-column feature grids per milestone. Shows v0.1→v0.8 evolution:
Foundation → Groups → Shared Intelligence → Files → Data Platform
→ Platform. Anchored by '66 npm releases. Every feature below is
in production today.' Dashed 'next' card at bottom for roadmap.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Claude Code's MCP prompts path doesn't support the context field
natively. When a skill has context:"fork", prepend an instruction
telling the model to use the Agent tool with the specified agent
type and model.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New broker endpoints for CLI auth sync flow (POST /cli-sync),
member profile management, and mesh settings. Includes JWT
verification for dashboard-issued sync tokens. DB schema adds
member profile fields and mesh policy columns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Welcome was silently dropped when sent before Claude Code's
notifications/initialized. Add 2s delay after WS connects to
ensure the MCP handshake is complete.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move startClients() to run after server.connect(), not before.
MCP server is available to Claude Code in <0.5s instead of ~30s.
Tool handlers gracefully return errors until WS is ready.
Push event wiring happens in background callback.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Node ESM can't handle .css imports during Next.js route collection.
This loader intercepts .css resolutions and returns empty modules,
fixing the build for all Payload deps (richtext-lexical, react-image-crop, etc.)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Root cause: Next.js 16 defaults to Turbopack for builds, but Payload CMS's
richtext-lexical imports .css files that fail during route collection in
Node ESM context.
Fix: add @payloadcms/richtext-lexical and @payloadcms/next back to
serverExternalPackages so Next.js skips their internal imports during
route collection. Use --webpack explicitly since Turbopack production
builds are incompatible with Payload (payloadcms/payload#14786).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claudemesh MCP server now declares prompts:{} and resources:{} capabilities.
Mesh skills auto-appear as /claudemesh:skill-name slash commands in Claude Code
via prompts/list+get, and as skill://claudemesh/{name} resources for the
upcoming MCP_SKILLS protocol. share_skill accepts optional metadata (when_to_use,
allowed_tools, model, context, agent) stored in the manifest jsonb column.
Change notifications sent on share/remove so Claude Code refreshes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
System messages (watch_triggered, mcp_deployed, peer_joined, etc.)
have senderPubkey='system' with empty ciphertext. The push handler
now formats them as readable plaintext instead of failing to decrypt.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Boot restore now checks runner /health to see what's already running,
then updates DB status to match. Fixes the bug where broker restart
marked running services as 'failed' because it tried to re-deploy
without shared source volume.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Runner /load now accepts gitUrl, npxPackage, or sourcePath. It handles
git clone and npm install internally. Broker no longer needs shared
volume for source extraction — just tells the runner what to fetch.
CLI mesh_mcp_deploy now supports npx_package as a third source type.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- apps/runner/: Dockerfile (node22 + python3 + uv + bun) + supervisor.mjs
(HTTP API for load/call/unload/health)
- docker-compose: runner service with shared services-data volume
- Broker mcp_deploy: git clone or zip extract → runner /load → MCP spawn
- Broker mcp_call: routes managed services to runner via HTTP, falls back
to live-proxy for peer-hosted servers
- RUNNER_URL env var for broker → runner communication
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- broker-crypto.ts: AES-256-GCM encrypt/decrypt with BROKER_ENCRYPTION_KEY
- mcp_deploy stores env as _encryptedEnv in mesh.service.config (no plaintext in DB)
- boot restore: decrypts _encryptedEnv and re-spawns services via service-manager
- auto-generates ephemeral key if BROKER_ENCRYPTION_KEY not set (logs warning)
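
For orientation, AES-256-GCM with Node's built-in crypto module looks roughly like this. The blob layout (12-byte IV, 16-byte tag, then ciphertext, base64-encoded), the hex key format, and the names `encryptEnv`, `decryptEnv`, and `loadKey` are assumptions for the sketch, not the contents of broker-crypto.ts.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Assumption: the key is supplied as 64 hex characters (32 bytes).
function loadKey(hex: string | undefined): Buffer {
  if (hex && /^[0-9a-f]{64}$/i.test(hex)) return Buffer.from(hex, "hex");
  console.warn("BROKER_ENCRYPTION_KEY not set; generating an ephemeral key");
  return randomBytes(32);
}

function encryptEnv(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh nonce per message
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, body]).toString("base64");
}

function decryptEnv(blob: string, key: Buffer): string {
  const raw = Buffer.from(blob, "base64");
  const iv = raw.subarray(0, 12);
  const tag = raw.subarray(12, 28);
  const body = raw.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  // final() throws if the tag does not match (wrong key or tampered data).
  return Buffer.concat([decipher.update(body), decipher.final()]).toString("utf8");
}
```

One consequence of an ephemeral key in a scheme like this: values encrypted before a restart cannot be decrypted afterwards, so boot restore would only work when the key is set explicitly.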
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claudemesh launch now supports:
--resume <id> / -r — resume a previous Claude Code session
--continue / -c — continue the most recent conversation
When resuming, skips generating a new session ID so the mesh peer
identity persists. The detectClaudeSessionId() fallback in ws/client.ts
picks up the existing session UUID from the .jsonl file.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claudemesh launch now generates a UUID and passes it to claude via
--session-id flag + CLAUDEMESH_SESSION_ID env var. The MCP server
reads this and sends it in the hello handshake.
Fallback: when launched without claudemesh launch (e.g., claude --resume),
detectClaudeSessionId() scans ~/.claude/projects/ for the most recent
.jsonl file and extracts the session UUID from the filename.
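
The fallback can be sketched as "newest UUID-named .jsonl wins". The sketch below is not the code in ws/client.ts: `pickSessionId` and `detectSessionIdIn` are hypothetical names, and it assumes the caller already knows which directory under ~/.claude/projects/ to scan.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

interface FileInfo {
  name: string;
  mtimeMs: number;
}

const UUID_JSONL =
  /^([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\.jsonl$/i;

// Pure part: choose the most recently modified file whose name is <uuid>.jsonl.
function pickSessionId(files: FileInfo[]): string | null {
  let best: { id: string; mtimeMs: number } | null = null;
  for (const file of files) {
    const id = UUID_JSONL.exec(file.name)?.[1];
    if (!id) continue;
    if (best === null || file.mtimeMs > best.mtimeMs) {
      best = { id, mtimeMs: file.mtimeMs };
    }
  }
  return best ? best.id : null;
}

// Filesystem part: gather names and modification times for one directory.
function detectSessionIdIn(projectDir: string): string | null {
  const files = readdirSync(projectDir).map((name) => ({
    name,
    mtimeMs: statSync(join(projectDir, name)).mtimeMs,
  }));
  return pickSessionId(files);
}
```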
Benefits:
- Broker detects reconnections (same session = restore state)
- Multiple peers in same project dir get unique identities
- Session identity persists across --resume
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Wave 3I handlers (vault_set, vault_list, vault_delete, mesh_mcp_deploy,
mesh_mcp_undeploy, mesh_mcp_update, mesh_mcp_logs, mesh_mcp_scope,
mesh_mcp_schema, mesh_mcp_catalog, mesh_skill_deploy) were lost during
the re-apply phase. Tools were registered in tools/list but returned
"Unknown tool" because the switch cases in server.ts were missing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Simplify getting-started to 2 steps: npm install + launch --join.
Remove "claudemesh install" section, update join page to show
launch --join as the primary flow, update invite format examples.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop /install route (curl|bash script). Install is just `npm i -g
claudemesh-cli`. Update hero, FAQ, getting-started, and join flow to
reflect the simplified 3-step onboarding: install → join → launch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add the foundation for deploying and managing MCP servers on the VPS
broker, with per-peer credential vaults and visibility scopes.
Architecture:
- One Docker container per mesh with a Node supervisor
- Each MCP server runs as a child process with its own stdio pipe
- claudemesh launch installs native MCP entries in ~/.claude.json
- Mid-session deploys fall back to svc__* dynamic tools + list_changed
New components:
- DB: mesh.service + mesh.vault_entry tables, mesh.skill extensions
- Broker: 19 wire protocol types, 11 message handlers, service catalog
in hello_ack with scope filtering, service-manager.ts (775 lines)
- CLI: 13 tool definitions, 12 WS client methods, tool call handlers,
startServiceProxy() for native MCP proxy mode
- Launch: catalog fetch, native MCP entry install, stale sweep, cleanup,
MCP_TIMEOUT=30s, MAX_MCP_OUTPUT_TOKENS=50k
Security: path sanitization on service names, column whitelist on
upsertService, returning()-based delete checks, vault E2E encryption.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shows the full hierarchy: Organization → Mesh → @groups → Peers
with live state + memory. Six coordination patterns below with
code snippets: lead-gather, delegation, voting, chain review,
broadcast, targeted views. Footer: 'All patterns are conventions
in system prompts. The broker routes; Claude coordinates.'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Shows the 12 capability categories that flow through the mesh:
messages, groups, state, memory, files, SQL, vectors, graph,
tasks, context, streams, scheduled. Each with a mono icon tag
and one-line description. Anchored by '43 MCP tools, 5
persistence backends' footer.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The CSS import error was caused by richtext-lexical being in
serverExternalPackages — Node can't require .css files. Removing
it lets Turbopack bundle it (handling CSS natively). Other payload
packages stay external (they don't import CSS).
Restores turbopack as the default production bundler.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
NPCs are mesh data (skills, memory, state), not peers. One API call
per interaction, 3 coordinator peers per faction. Game connector
assembles context from mesh and calls any LLM on demand.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The mesh is the communication fabric, not the simulation engine.
SimController pattern: external controller drives tick loop, computes
visibility, sends observations to peers, collects actions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- generateMetadata instead of metadata (getMetadata returns a function)
- Use TURBOPACK=0 env prefix instead of --no-turbopack flag (not recognized in Docker)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The --no-turbopack flag isn't recognized when Next.js runs inside the
Docker builder stage. The Dockerfile already sets ENV TURBOPACK=0 which
achieves the same effect.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Persistent MCP servers (opt-in via `persistent: true`) survive host
disconnects — they appear as offline in mcp_list and auto-restore when
the host reconnects. Ephemeral servers (default) still clean up on
disconnect. Offline servers return a clear error on mcp_call with
time-since-disconnect info.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
changelog-20260407.md: full implementation details for 21 features
vision-20260407.md: slimmed to shipped summary + remaining items
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Save groups, profile, visibility, summary, display name, and cumulative
stats to a new mesh.peer_state table on disconnect. On reconnect (same
meshId + memberId), restore them automatically — hello groups take
precedence over stored groups if provided. Broadcast peer_returned
system event with last-seen time and summary to other peers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peers now receive [system] notifications when MCP servers join or
leave the mesh, with tool names and hosting peer info.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sherif enforces consistent dependency versions across the monorepo.
The connectors used ^8.0.0 for ws and @types/ws while the rest used
exact 8.20.0 / 8.5.13. Also sorted dependencies alphabetically.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
17 of 22 items done, 2 partial. Updated all section headers and
added implementation notes with commits and timestamps.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The lockfile was stale — connector-slack/package.json added 7 deps that
weren't reflected in pnpm-lock.yaml, causing frozen-lockfile builds to fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Teaches AI when to use filesystem (local), read_peer_file (remote
<1MB), or share_file (persistent, no size limit).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When read_peer_file targets a local peer (same hostname), prepend a
hint with the direct filesystem path. Still executes the relay as
fallback — AI learns the shortcut without being blocked.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peers report os.hostname() in the hello handshake. list_peers shows
[local] or [remote] tag per peer. MCP instructions teach AI to read
local peers' files directly via filesystem instead of relay.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add the webhook handler module (webhooks.ts) that verifies secrets
against the mesh.webhook table and broadcasts incoming HTTP POST
payloads to all connected mesh peers. This completes the webhook
feature whose schema, types, WS CRUD handlers, and CLI tools were
added in the previous commits.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wire up MCP tool handlers for the peer file sharing relay. Peers can
now read files and list directories from other peers' local filesystems
through the mesh broker. Includes name-to-pubkey resolution, base64
decode, and instructions table update.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Broker-driven clock that broadcasts periodic heartbeat ticks to all
peers in a mesh. Speed is configurable from x1 (real-time, 60s ticks)
to x100 (600ms ticks) for load testing simulations. Auto-pauses when
the last peer disconnects.
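
The two quoted speeds are consistent with dividing a 60s base tick by the speed factor (60000 / 1 = 60000 ms, 60000 / 100 = 600 ms). A sketch under that assumption follows; `tickIntervalMs` and `MeshClock` are hypothetical names, and starting the timer when the first peer joins is an assumption, since only the auto-pause is stated above.

```typescript
const BASE_TICK_MS = 60000;

function tickIntervalMs(speed: number): number {
  if (!Number.isFinite(speed) || speed < 1 || speed > 100) {
    throw new RangeError("speed must be between 1 and 100");
  }
  return BASE_TICK_MS / speed;
}

class MeshClock {
  private timer: ReturnType<typeof setInterval> | null = null;
  private peers = 0;
  private readonly intervalMs: number;
  private readonly onTick: () => void;

  constructor(speed: number, onTick: () => void) {
    this.intervalMs = tickIntervalMs(speed);
    this.onTick = onTick;
  }

  peerJoined(): void {
    this.peers += 1;
    if (this.timer === null) {
      this.timer = setInterval(this.onTick, this.intervalMs);
    }
  }

  // Auto-pause: stop ticking once the last peer has gone.
  peerLeft(): void {
    this.peers = Math.max(0, this.peers - 1);
    if (this.peers === 0 && this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }

  isRunning(): boolean {
    return this.timer !== null;
  }
}
```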
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds share_skill, get_skill, list_skills, and remove_skill across the full
stack (Drizzle schema, broker CRUD + WS handlers, CLI client methods, MCP
tools). Skills are mesh-scoped, unique by name, and searchable via ILIKE
on name/description/tags.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tamper-evident audit logging where each entry includes a SHA-256
hash of the previous entry, forming a verifiable chain per mesh.
Events tracked: peer_joined, peer_left, state_set, message_sent
(never logs message content). New WS handlers: audit_query for
paginated retrieval, audit_verify for chain integrity verification.
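
A hash chain of this kind can be sketched in a few lines. The entry fields, the JSON serialization, the all-zero genesis value, and the names `appendEntry` and `verifyChain` are assumptions for illustration; the broker's actual column layout is not shown here.

```typescript
import { createHash } from "node:crypto";

interface AuditEntry {
  event: string;    // e.g. peer_joined, state_set
  actor: string;
  at: string;       // ISO timestamp
  prevHash: string; // SHA-256 of the previous entry, hex
}

const GENESIS = "0".repeat(64);

function hashOf(entry: AuditEntry): string {
  const serialized = JSON.stringify([entry.prevHash, entry.event, entry.actor, entry.at]);
  return createHash("sha256").update(serialized).digest("hex");
}

function appendEntry(chain: AuditEntry[], event: string, actor: string, at: string): AuditEntry {
  const last = chain[chain.length - 1];
  const entry: AuditEntry = { event, actor, at, prevHash: last ? hashOf(last) : GENESIS };
  chain.push(entry);
  return entry;
}

// Returns the index of the first entry whose prevHash does not match,
// or -1 when every link checks out.
function verifyChain(chain: AuditEntry[]): number {
  let expected = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const entry = chain[i];
    if (!entry || entry.prevHash !== expected) return i;
    expected = hashOf(entry);
  }
  return -1;
}
```

In this particular layout an edit to the newest entry only becomes detectable once a later entry links to it; storing each entry's own hash as well would close that gap.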
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Standalone TypeScript SDK that any process can use to join a mesh and
send/receive messages. Implements the same WS protocol and libsodium
crypto_box encryption as the CLI, with an EventEmitter-based API.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peers self-report resource usage via set_stats; stats visible in
list_peers responses and the new mesh_stats MCP tool. CLI auto-reports
every 60s and tracks messagesIn/Out, toolCalls, uptime, and errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Introduces @claudemesh/connector-telegram — a standalone bridge process
that joins a mesh as peerType: "connector" and relays messages
bidirectionally between a Telegram chat and mesh peers via long polling.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Peers can register MCP servers with the mesh and other peers can invoke
those tools through the existing claudemesh connection without restarting.
Broker: in-memory MCP registry with mcp_register/unregister/list/call
handlers, call forwarding to hosting peer with 30s timeout, and automatic
cleanup on peer disconnect.
CLI: mcpRegister/mcpUnregister/mcpList/mcpCall client methods, inbound
mcp_call_forward handler, and 4 new MCP tools (mesh_mcp_register,
mesh_mcp_list, mesh_tool_call, mesh_mcp_remove).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two new panels below the existing peer graph + live stream grid:
- StateTimelinePanel: vertical timeline of audit events and presence
status changes, auto-scrolling, sorted newest-first
- ResourcePanel: 2x2 card grid showing live peers, envelopes by
priority, audit event breakdown, and session status
Both share the same TanStack Query cache key as the existing panels
(no extra API calls). Matches the --cm-* dark terminal aesthetic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
After MCP registration and hooks setup, `claudemesh install` now checks
the config for joined meshes. If empty, it prints actionable guidance
(join command + dashboard URL) instead of the generic "Next:" line.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace in-memory-only setTimeout scheduling with a DB-backed system
that survives broker restarts. Adds:
- `scheduled_message` table in mesh schema (Drizzle + raw CREATE TABLE
for zero-downtime deploys)
- Minimal 5-field cron parser (no dependencies) with next-fire-time
calculation for recurring entries
- On broker boot, all non-cancelled entries are loaded from PostgreSQL
and timers re-armed automatically
- CLI `schedule_reminder` MCP tool accepts optional `cron` expression
- CLI `remind` command accepts `--cron` flag
- One-shot reminders remain backward compatible — no cron field = same
behavior as before
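
As a rough picture of what a dependency-free 5-field parser with next-fire-time calculation involves, here is a sketch. It is not the broker's parser: it supports only `*`, `a`, `a-b`, comma lists, and `/n` steps, evaluates in UTC, treats day-of-week as 0-6 with Sunday as 0, and scans minute by minute for up to a year. All names are hypothetical.

```typescript
function toInt(text: string | undefined, field: string): number {
  if (text === undefined || !/^\d+$/.test(text)) throw new Error(`bad cron field: ${field}`);
  return Number(text);
}

// Expands one field such as "*/15", "1-5", or "0,30" into the set of allowed values.
function parseField(field: string, min: number, max: number): Set<number> {
  const out = new Set<number>();
  for (const part of field.split(",")) {
    const [range, stepText] = part.split("/");
    const step = stepText === undefined ? 1 : toInt(stepText, field);
    if (range === undefined || step < 1) throw new Error(`bad cron field: ${field}`);
    let lo = min;
    let hi = max;
    if (range !== "*") {
      const [a, b] = range.split("-");
      lo = toInt(a, field);
      hi = b === undefined ? lo : toInt(b, field);
    }
    if (lo < min || hi > max || lo > hi) throw new Error(`bad cron field: ${field}`);
    for (let v = lo; v <= hi; v += step) out.add(v);
  }
  return out;
}

// Returns the first matching minute strictly after `after`, or null if none within a year.
function nextFire(expr: string, after: Date): Date | null {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error("expected 5 cron fields");
  const [mi = "", ho = "", dm = "", mo = "", dw = ""] = fields;
  const minutes = parseField(mi, 0, 59);
  const hours = parseField(ho, 0, 23);
  const days = parseField(dm, 1, 31);
  const months = parseField(mo, 1, 12);
  const weekdays = parseField(dw, 0, 6);
  // Classic cron rule: if both day fields are restricted, either may match.
  const eitherDay = dm !== "*" && dw !== "*";

  const t = new Date(after.getTime());
  t.setUTCSeconds(0, 0);
  t.setUTCMinutes(t.getUTCMinutes() + 1);
  for (let i = 0; i < 366 * 24 * 60; i++) {
    const domOk = days.has(t.getUTCDate());
    const dowOk = weekdays.has(t.getUTCDay());
    const dayOk = eitherDay ? domOk || dowOk : domOk && dowOk;
    if (dayOk && months.has(t.getUTCMonth() + 1) && hours.has(t.getUTCHours()) && minutes.has(t.getUTCMinutes())) {
      return new Date(t.getTime());
    }
    t.setUTCMinutes(t.getUTCMinutes() + 1);
  }
  return null;
}
```

A recurring entry would call something like `nextFire` again after each firing to re-arm its timer, which also covers re-arming after a broker restart.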
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add 4 missing tools (cancel_scheduled, grant_file_access, list_scheduled,
schedule_reminder) and sort the array alphabetically for maintainability.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Renders peers as SVG nodes in a radial layout with animated edges
showing real-time message traffic. Shares the same TanStack Query
cache as LiveStreamPanel (same queryKey). Side-by-side on desktop,
stacked on mobile.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrite all command and argument descriptions in index.ts to follow
imperative mood, omit filler, use backtick-formatted values, and
surface key behaviors (e.g. launch spawns Claude Code with MCP,
remind supports list/cancel subactions, send accepts @group and *).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The "0 */2 * * *" cron example inside a /** comment caused TSC to
parse */ as end-of-comment, producing syntax errors.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Predefined mesh configurations (dev-team, research, ops-incident,
simulation, personal) let users bootstrap meshes with groups, roles,
state keys, and system prompt hints. Templates are bundled at build
time via Bun's JSON import support.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extend the WS hello handshake with optional peerType, channel, and model
fields so peers can advertise what kind of client they are. The broker
stores these in-memory on PeerConn and returns them (along with cwd) in
the peers_list response. CLI peers command and MCP list_peers tool now
display the new metadata.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a peer connects or disconnects, the broker now broadcasts a
system push (subtype: "system") to all other peers in the same mesh.
The CLI formats these as [system] channel notifications so AI sessions
can react to topology changes without polling.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Landing page copy was stuck at the v0.1 feature set (messaging + state + memory + groups).
The CLI now ships 43 MCP tools across 5 persistence backends. This commit brings the site
copy in sync with what's actually built.
Changes:
- Hero, features, pricing, FAQ, CTA, footer: reflect 43 tools, files, SQL, vectors, graphs
- Features section: expanded from 4 tabs to 7 (added Files, Database, Vectors)
- New /getting-started page: full install guide with correct 4-step flow
- New Mesh vs MCP section: side-by-side diagrams + 8-row comparison table
- Fix: install-toggle on /join page had `npx claudemesh@latest init` (init doesn't exist)
→ replaced with `curl -fsSL https://claudemesh.com/install | bash`
- Navigation: added Getting Started to header, footer, hero link
- COPY.md synced with all 6 capability areas
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract _reqId from incoming WS messages and include it in every direct
response sendToPeer call and sendError call. Clients can now match
responses to requests by ID instead of relying on FIFO ordering.
Old clients without _reqId are unaffected (field simply omitted).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace all 22 resolver Array<fn> patterns with Map<reqId, {resolve, timer}>.
Outgoing messages now include _reqId; on response the broker's echoed _reqId
is used for exact matching, with FIFO fallback for brokers that don't echo it.
Add makeReqId() helper and resolveFromMap() utility. Error propagation block
updated to iterate Maps and pop the oldest entry across all queues.
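A rough sketch of the Map-based matching described above. `makeReqId` and `resolveFromMap` are named in this commit, but the signatures, the `Pending` type, and the `track` helper below are assumptions made for illustration.

```typescript
// Sketch only: signatures are assumed, not copied from the client.
type Pending<T> = {
  resolve: (value: T) => void;
  timer: ReturnType<typeof setTimeout>;
};

let reqCounter = 0;
function makeReqId(): string {
  reqCounter += 1;
  return `${Date.now().toString(36)}-${reqCounter.toString(36)}`;
}

// Register a pending request; resolves with `onTimeout` if no response arrives.
function track<T>(
  map: Map<string, Pending<T>>,
  timeoutMs: number,
  onTimeout: T,
): { reqId: string; promise: Promise<T> } {
  const reqId = makeReqId();
  const promise = new Promise<T>((resolve) => {
    const timer = setTimeout(() => {
      map.delete(reqId);
      resolve(onTimeout);
    }, timeoutMs);
    map.set(reqId, { resolve, timer });
  });
  return { reqId, promise };
}

// Exact match when the broker echoed _reqId; FIFO fallback when it did not.
function resolveFromMap<T>(
  map: Map<string, Pending<T>>,
  reqId: string | undefined,
  value: T,
): boolean {
  let key = reqId;
  if (key === undefined) {
    // Map iterates in insertion order, so the first key is the oldest request.
    const first = map.keys().next();
    if (first.done) return false;
    key = first.value;
  }
  const entry = map.get(key);
  if (!entry) return false; // unknown or already timed-out request
  clearTimeout(entry.timer);
  map.delete(key);
  entry.resolve(value);
  return true;
}
```

In this sketch an echoed `_reqId` that no longer exists in the map is dropped rather than falling back to FIFO, so a late response cannot resolve an unrelated request.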
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- shareContext: adds optional memberId param; when provided, upserts on
(meshId, memberId) instead of (meshId, presenceId) — prevents stale
context rows accumulating on every reconnect. Falls back to presenceId
for legacy/anonymous connections. Also refreshes presenceId on update
so it stays current.
- schema: adds member_id column + unique index context_mesh_member_idx
on mesh.context table; new migration 0013_context-stable-member-key.sql.
- index.ts call site updated to pass conn.memberId as the stable key.
- createStream: replaces SELECT-then-INSERT TOCTOU race with atomic
INSERT ... ON CONFLICT DO NOTHING RETURNING, followed by SELECT on miss.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- claim_task/complete_task: send taskId not id
- graph_result: read msg.records not msg.rows
- message_status: try all mesh clients, not only first
- broker: omit state_result for set_state (fixes get_state cross-contamination)
- error handler: unblock first pending resolver on unmatched broker errors
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
broker: owner also fetches sealedKey from mesh.file_key (not skipped),
only non-owners are blocked when key is missing
cli: explicit error when encrypted file has no sealedKey (no silent raw download)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- parse x-encrypted/x-owner-pubkey/x-file-keys headers in handleUploadPost
- pass encrypted and ownerPubkey to uploadFile, call insertFileKeys after
- get_file: fetch sealedKey for non-owners, block if missing, include in response
- list_files: include encrypted field per file
- add grant_file_access WS handler so owners can seal keys for peers
- update types.ts with new message interfaces and union members
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace manual switch + HELP string with citty defineCommand/runMain.
Flag definitions in index.ts are now the single source of truth for
--help output. Remove parseArgs() from launch.ts; accept citty-parsed
flags + rawArgs (-- passthrough to claude preserved).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
broker: expand member groups to ancestor paths at drain time (pull model)
- @flexicar message reaches peers in @flexicar/core, @flexicar/output, etc.
- Resolved at drainForMember — no DB changes, fully backward-compatible
- Any depth: flexicar/team/backend also matches @flexicar and @flexicar/team
cli: wire --role all the way through to session config + env
- Config.role field added
- launch.ts stores role in sessionConfig, passes CLAUDEMESH_ROLE env var
- mcp/server.ts includes role in identity string
- manager.ts auto-joins groups from config on WS connect (--groups flag now works)
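The ancestor-path expansion on the broker side can be pictured roughly as follows. Both function names are hypothetical, and the real drainForMember does its matching against the DB rather than in memory.

```typescript
// Hypothetical helpers illustrating the pull-model expansion.
// "flexicar/team/backend" -> ["@flexicar", "@flexicar/team", "@flexicar/team/backend"]
function expandGroupAncestors(group: string): string[] {
  const segments = group
    .replace(/^@/, "")
    .split("/")
    .filter((s) => s.length > 0);
  return segments.map((_, i) => "@" + segments.slice(0, i + 1).join("/"));
}

// A message addressed to `targetSpec` reaches a member if any of the member's
// groups, or any ancestor of those groups, equals the target.
function groupTargetMatches(targetSpec: string, memberGroups: string[]): boolean {
  return memberGroups.some((g) => expandGroupAncestors(g).includes(targetSpec));
}
```

Because expansion happens when messages are drained, nothing extra is stored: a peer in `flexicar/core` matches `@flexicar` without ever joining it explicitly.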
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Normalise tags to Array before Drizzle insert (PgArray mapper calls
.map() and throws if value is not a standard JS Array)
- Use uploadedByName instead of uploadedByMember FK — the X-Member-Id
header carries the mesh slug, not a mesh.member primary key
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
z.coerce.boolean() treats any non-empty string as true, so MINIO_USE_SSL="false" → true.
Switch to explicit enum+transform.
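The pitfall comes from JavaScript truthiness: `Boolean("false")` is `true`. A dependency-free sketch of the explicit approach follows; `parseBoolEnv` and its accepted values are assumptions, not the schema used in env.ts.

```typescript
// Hypothetical helper: accept only an explicit set of spellings.
function parseBoolEnv(
  name: string,
  raw: string | undefined,
  fallback: boolean,
): boolean {
  if (raw === undefined || raw.trim() === "") return fallback;
  const v = raw.trim().toLowerCase();
  if (v === "true" || v === "1") return true;
  if (v === "false" || v === "0") return false;
  throw new Error(`${name} must be "true" or "false", got "${raw}"`);
}

// Coercion treats every non-empty string as true, including "false".
const coerced = Boolean("false");
const parsed = parseBoolEnv("MINIO_USE_SSL", "false", false);
```

Rejecting unknown spellings, instead of silently defaulting, surfaces a misconfigured variable at startup.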
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The sender exclusion filter (excludeSenderSessionPubkey) was blocking
delivery of ALL messages from the sender, including direct messages
to other peers. Now only excludes on broadcast (target_spec = '*').
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Node.js stdout to a pipe is buffered. Without periodic event loop
activity, WS callback → server.notification() → stdout.write() may
not flush until the next I/O event. A 1s setInterval (NOT unref'd)
keeps the event loop ticking so notifications flush immediately.
This is why claude-intercom worked: its 1s HTTP poll kept the event
loop active as a side effect. Claudemesh's passive WS listener let
the event loop settle, causing stdout to buffer indefinitely.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sends test messages to self through the full pipeline per priority
and measures round-trip timing. Reports send→ack and send→receive
latency. Detects broker priority gating (status=working holds next/low).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert poll-based drain (v0.5.2 overcorrection). Claude Code source
confirms notifications are processed event-driven via React
useEffect, not polled. The WS onPush → server.notification() path
is correct.
Added section 13 to SPEC.md documenting the full Claude Code
notification pipeline, feature gates, priority gating, and common
push delivery issues.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace WS onPush→notification with timer-based buffer drain.
The old claude-intercom used 1s polling and worked reliably.
WS async callbacks may not flush stdio properly for MCP
notifications. Polling on a timer ensures consistent delivery.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--inbox: count-only notifications, no content in context
--no-messages: tools only, zero prompt injection risk
Default: push (real-time, current behavior)
Wizard shows mode picker when no flag provided.
MCP instructions tell Claude its current mode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Message modes: push/inbox/off for controlling prompt injection risk.
Shared MCPs: mesh-level MCP servers proxied through the broker —
install once, every peer has access. Full architecture, DB schema,
WS protocol, credential isolation, resource limits.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hero: sessions form a team with groups, state, memory — not just
messaging. Features: 4 tabs with real CLI code (groups, state,
memory, coordination patterns). Use cases: team sprint with 5
agents, new-hire knowledge transfer via recall(), deploy-frozen
via shared state. All match the shipped spec (v0.3.0).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full vision: claudemesh provisions shared infrastructure per mesh.
Peers share messages, state, memory, files, vector embeddings,
entity graphs, session context, tasks, structured databases, and
real-time streams. All through MCP tools, zero configuration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Files: MinIO-backed file sharing built into the broker.
share_file for persistent mesh files, send_message(file:) for
ephemeral attachments. Presigned URLs for download, access
tracking per peer.
Broker infra: MinIO in docker-compose, internal network.
HTTP POST /upload endpoint. WS handlers for get_file,
list_files, file_status, delete_file.
Multi-target: send_message(to:) accepts string or array.
Targets deduplicated before delivery.
Targeted views: MCP instructions teach Claude to send
tailored messages per audience instead of generic broadcasts.
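One way the multi-target normalization could look. `normalizeTargets` is a hypothetical name; only the string-or-array input and the deduplication come from this commit.

```typescript
// Hypothetical sketch: accept one target or many, return a deduplicated list.
function normalizeTargets(to: string | string[]): string[] {
  const list = Array.isArray(to) ? to : [to];
  const cleaned = list.map((t) => t.trim()).filter((t) => t.length > 0);
  // Set preserves first-seen order, so delivery order stays predictable.
  return Array.from(new Set(cleaned));
}
```

Deduplicating before delivery means a peer listed twice receives one copy.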
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase B + C + message delivery status.
State: shared key-value store per mesh. set_state pushes changes
to all peers. get_state/list_state for reads. Peers coordinate
through shared facts instead of messages.
Memory: persistent knowledge with full-text search (tsvector).
remember/recall/forget. New peers recall context from past sessions.
message_status: check delivery status with per-recipient detail
(delivered/held/disconnected).
Multicast fix: broadcast and @group messages now push directly to
all connected peers instead of racing through queue drain.
MCP instructions: dynamic identity injection (name, groups, role),
comprehensive tool reference, group coordination guide.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase A of the claudemesh spec. Peers can now join named groups
with roles, and messages route to @group targets.
Broker:
- @group routing in fan-out (matches peer group membership)
- @all alias for broadcast
- join_group/leave_group WS messages + DB persistence
- list_peers returns group metadata
- drainForMember matches @group targetSpecs in SQL
CLI:
- join_group/leave_group MCP tools
- send_message supports @group targets
- list_peers shows group membership
- PeerInfo includes groups array
- Peer name cache for push notifications
Launch:
- --role flag (optional peer role)
- --groups flag (comma-separated, e.g. "frontend:lead,reviewers")
- Interactive wizard for role + groups when flags omitted
- Groups written to session config for broker hello
Spec: SPEC.md added with full v0.2 vision (groups, state, memory)
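For illustration, the `--groups` value could be parsed along these lines. The `GroupSpec` shape and `parseGroupsFlag` name are assumptions; the only grounded detail is the comma-separated `group:role` format shown in the example above.

```typescript
// Hypothetical shape for one parsed entry.
interface GroupSpec {
  name: string;
  role?: string;
}

// "frontend:lead,reviewers" -> [{ name: "frontend", role: "lead" }, { name: "reviewers" }]
function parseGroupsFlag(value: string): GroupSpec[] {
  const specs: GroupSpec[] = [];
  for (const entry of value.split(",")) {
    const trimmed = entry.trim();
    if (trimmed === "") continue;
    const idx = trimmed.indexOf(":");
    if (idx === -1) {
      specs.push({ name: trimmed });
      continue;
    }
    const name = trimmed.slice(0, idx).trim();
    const role = trimmed.slice(idx + 1).trim();
    if (name === "") throw new Error(`missing group name in "${entry}"`);
    specs.push(role === "" ? { name } : { name, role });
  }
  return specs;
}
```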
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
excludeSenderMemberId blocked delivery to ALL peers sharing the
same member_id (all sessions from one join). Replaced with
excludeSenderSessionPubkey which only excludes the sender's own
session — peers with different session pubkeys receive correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Write displayName into tmpdir config.json so the MCP server reads
it directly. Env vars from claudemesh launch may not propagate to
MCP child processes spawned by Claude Code. Config file is reliable.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claudemesh launch now passes --dangerously-skip-permissions to
claude so peers can chat without per-tool-call approval prompts.
Shows a clear explanation before launch; user confirms with Enter.
Skip with -y/--yes for CI or repeat launches.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
onPush now queries list_peers to resolve the sender's pubkey to their
display name. Instructions updated to tell Claude to reply by name
instead of raw pubkey. Fixes two-way messaging between named peers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Store sender's sessionPubkey on message_queue at send time.
drainForMember returns COALESCE(sender_session_pubkey, peer_pubkey)
so the recipient gets the correct sender key for decryption.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each WS connection generates its own ed25519 keypair (sessionPubkey)
sent in the hello handshake. The broker stores it on the presence
row and uses it for message routing + list_peers. This gives every
`claudemesh launch` a unique crypto identity without burning invite
uses — member auth stays permanent, session identity is ephemeral.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Zod schemas with plain TypeScript validation in env.ts,
config.ts, and invite/parse.ts. Zod 4 classes break under bun
build --target=node (Class2 is not a constructor).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
resolveClient() now resolves display names via list_peers WS query.
Supports exact match, partial match (unique substring), and falls
back to pubkey/channel/broadcast pass-through.
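The resolution order can be sketched as below. `resolveTarget` and `PeerInfo` are hypothetical stand-ins for what resolveClient() does after its list_peers query, and the case-insensitive comparison is an assumption.

```typescript
// Hypothetical peer shape; the real list_peers response carries more fields.
interface PeerInfo {
  displayName: string;
  pubkey: string;
}

// Exact name match first, then a unique substring match, else pass the input through.
function resolveTarget(input: string, peers: PeerInfo[]): string {
  const needle = input.trim().toLowerCase();
  if (needle === "") return input;
  const exact = peers.find((p) => p.displayName.toLowerCase() === needle);
  if (exact) return exact.pubkey;
  const [first, ...rest] = peers.filter((p) =>
    p.displayName.toLowerCase().includes(needle),
  );
  // Only accept a partial match when it is unambiguous.
  if (first && rest.length === 0) return first.pubkey;
  return input; // pubkey, channel, or broadcast target handled downstream
}
```

An ambiguous substring falls through unchanged rather than picking a peer arbitrarily.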
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The turbopack.rules config for @svgr/webpack was removed during
the Payload integration attempts. Without it, SVG imports return
raw module objects instead of React components. This crashes
LocaleCustomizer → Icons.UnitedKingdom → object → React #130.
Next.js 16.2.2 supports turbopack in production builds, so this
config is safe now.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The hydration crash exists on both 16.0.10 and 16.2.2 — it's a
pre-existing component bug, not a Next.js regression. Stay on
latest for security + Payload compat when we re-add it.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wire list_peers and set_summary MCP tools to the broker's WS
protocol instead of returning stubs. Peers can now discover each
other, see status/summary, and route messages by display name.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Next.js 16.2.2 causes React #130 on client hydration in
production standalone output. Server renders fine but client
JS crashes. Downgrade to 16.0.10 which was the last working
version. Payload CMS is fully removed from prod so the
turbopack restriction is no longer relevant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove ALL Payload imports, withPayload wrapper, and (payload)
routes. Blog index + changelog are now static data arrays.
Blog post at /blog/peer-messaging-claude-code is static TSX.
Payload CMS stays as a dev dependency for future local admin
but has zero presence in the production build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final working pattern: withPayload via require() for build
compatibility, admin page replaced with redirect (no RootPage
import = no React #130), payload packages externalized from
turbopack bundle. Blog/changelog use server-side getPayload().
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Turbopack tries to parse esbuild's native binary as JS, causing
build failure. Externalize all Payload-related packages so they
resolve at runtime, not bundled.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
withPayload crashes ALL routes with React #130 in standalone
output — even with admin page replaced by redirect. The wrapper
injects a client-side ConfigProvider that fails hydration.
Removed: withPayload wrapper, entire (payload) route group.
Kept: payload.config.ts, migrations, blog/changelog server-side
queries with graceful DB fallback.
Payload admin runs on local dev only (add withPayload back in
next.config when running pnpm dev). Production content via
static TSX pages or future API-based publishing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep withPayload (needed for build compilation) but replace the
admin RootPage with a redirect. The RootPage's ConfigProvider
causes React #130 in standalone output. Blog/changelog use
server-side getPayload() which works fine.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The withPayload wrapper injects a client-side ConfigProvider that
crashes hydration on every route when the Payload admin can't
initialize in standalone output. Blog/changelog pages use server-
side getPayload() which works without the wrapper.
Payload admin at /payload is disabled until standalone server
init is implemented. All user-facing content works.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replicate working cuidecar Payload setup:
- require() instead of ESM import for withPayload
- routes.admin = "/payload" to avoid /admin conflicts
- (payload)/payload/ route group with own layout + importMap
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Payload CMS integration crashes the entire production app — the
withPayload wrapper + admin routes break when DB tables don't
exist and the layout conflicts with i18n routing.
Keeping: payload.config.ts, blog/changelog pages with graceful
DB fallback, static blog post page. Payload admin will be added
back once properly integrated with a dedicated route group that
doesn't inherit the main app layout.
The blog post at /blog/peer-messaging-claude-code is static TSX
and works without Payload runtime.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Static TSX page at /blog/peer-messaging-claude-code while Payload
admin is not yet configured in production. Full 1100-word post on
protocol, dev-channels, prompt-injection, and next steps.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Production containers get DATABASE_URL (postgres) — Payload
creates tables in a 'payload' schema. Local dev falls back to
SQLite file for zero-config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Production has no SQLite — Payload pages now catch connection
errors and render empty state instead of crashing with React #130.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove Payload's /api/[...slug] route that conflicts with existing
/api/[...route]. Blog/changelog pages use Payload's local API.
Includes cli install.ts backup + assertNoMcpLoss guards (from
worktree agent).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Payload CMS v3.81 withPayload() requires Next.js >=16.1.0 for
production turbopack builds. Upgrade resolves the build failure.
Reverts the dev-only withPayload workaround — now loads normally.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Payload CMS v3.81 withPayload() injects a turbopack config key
that Next.js 16.0.10 rejects in production builds (needs >=16.1).
Load withPayload only in dev; production gets a pass-through.
Payload admin works locally; production serves blog/changelog
as regular Next.js pages querying the Payload API.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Next.js 16.0.10 fails production builds with turbopack config
present (needs >=16.1.0). Gate it behind NODE_ENV !== production.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cover encryptDirect/decryptDirect with three scenarios (happy path,
wrong recipient, tampered ciphertext) and invite link parsing with
round-trip, expiry rejection, and malformed input handling.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The private setStatus(ConnStatus) conflicted with the public
setStatus("idle"|"working"|"dnd") method, causing TS2393 under strict
typecheck. Rename the private one to setConnStatus and update all
internal call sites.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three surgical edits for credibility:
Hero subheadline: remove WhatsApp/Slack/phone promises (roadmap,
not shipped), replace "reachable from anywhere you are" (vague)
with concrete value prop: E2E encrypted, delivered mid-turn as
<channel> reminders, broker never sees plaintext. Change "Free
and open-source. Forever." → "Open-source CLI. Free during
public beta." to match the pricing section.
Logo bar: remove Vercel/Linear/Stripe/Supabase/Shopify/Figma
(not actual customers). Replace with tech stack labels: Claude
Code, MCP, libsodium, Bun, TypeScript, MIT.
FAQ: fix "Is claudemesh free?" to match beta pricing. Fix "How
do I get started?" to reference the real curl installer instead
of nonexistent npx claudemesh init. Fix "Which Claude Code
versions?" to name actual install + launch flow.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Landing page showed `curl -fsSL claudemesh.sh/install | bash` but
the domain didn't resolve, so anyone copy-pasting got a DNS error.
Ship:
- apps/web/src/app/install/route.ts: GET returns an auditable bash
installer (Node preflight, npm install -g claudemesh-cli, runs
claudemesh install, prints next steps, colored output). No Node
auto-install — fails clean if missing with a pointer.
- apps/web/src/proxy.ts: exclude /install from the i18n matcher so
Next.js returns the shell script unmangled.
- hero.tsx + features.tsx: swap claudemesh.sh → claudemesh.com.
Test: curl http://localhost:3000/install | bash -n → OK.
Content-Type: text/x-shellscript; charset=utf-8.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The 6-tier grid was selling features that don't exist yet:
- Pro $12/mo: dashboard, peer registry, message history (not built)
- Plus $24/mo: Tailscale mesh (already default), MCP bridge (free),
audit log (not built)
- Team $99/mo: "self-hosted broker" AND "25 peers" AND
"unlimited peers" — three contradictions in one tier
- Business $499/mo: multi-region, retention, Slack/Linear (roadmap)
- Enterprise: claimed "SOC 2 pack" without certification
Replaced with a single Public-Beta card:
- Free, no card required
- Two columns: Shipping today (verified against source) + Roadmap
v0.2–v0.3 (clearly labeled)
- Promise: "Beta users keep the free plan for life"
Non-additive rewrite of a shipped section. Authorized by user
explicitly; required because the prior pricing created refund +
legal risk.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Running `claudemesh` with no args now detects install state and
prints context-appropriate guidance: suggests `install` if MCP
not registered, `join` if no meshes, `launch` if ready.
Replaces the static HELP dump with a first-run wizard that meets
users where they are.
Static HELP still available via --help.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three Tier-2 polish commands for debugging + discoverability:
- claudemesh --version / -v: print CLI version (baked from
package.json at build time via Bun JSON import).
- claudemesh status: WS-probe each joined mesh's broker, report
reachability per mesh. Exit 1 if any broker unreachable.
- claudemesh doctor: run 6 preconditions — Node>=20, claude on PATH,
MCP registered, hooks registered, config file parses + chmod 0600,
mesh keypairs validate. Each check has a pass/fail + fix hint.
Exit 0 if all pass.
Help text now leads with version ("claudemesh v0.1.3 —").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Additive NEWS entry pointing to the new public repo
github.com/alezmad/claudemesh-cli and the launch command.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `claudemesh launch [args]` that spawns Claude Code with
--dangerously-load-development-channels server:claudemesh so peer
messages arrive as <channel> system reminders mid-turn instead of
pull-only via check_messages. Windows uses shell:true to resolve
claude.cmd from PATHEXT.
Prints an info banner before spawning that explains the channel's
scope (peer text injection only), the trust model (treat as
untrusted input), and that existing tool-approval prompts remain
the safety net. --quiet skips the banner.
Install output now mentions `claudemesh launch` as the recommended
launch path; plain `claude` still works for pull-only mode.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The push handler previously fell through to base64-decoding the
raw ciphertext whenever decryptDirect() returned null. For direct
(crypto_box) messages that produces garbage binary which surfaces
as garbled bytes in Claude's <channel> reminder. Limit the base64
fallback to legacy broadcast/channel messages (no senderPubkey),
and emit a clearer "⚠ message from <pubkey> failed to decrypt"
warning when direct decryption fails.
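A condensed sketch of the corrected branch logic. `renderPush`, `PushMessage`, and the injected `decryptDirect` signature are assumptions; the real handler works on the broker's push envelope.

```typescript
// Hypothetical message shape: direct messages carry the sender's pubkey.
interface PushMessage {
  senderPubkey?: string;
  ciphertext: string; // base64
}

type DecryptDirect = (ciphertext: string, senderPubkey: string) => string | null;

function renderPush(msg: PushMessage, decryptDirect: DecryptDirect): string {
  if (msg.senderPubkey) {
    // Direct (crypto_box) message: never fall back to raw base64 decoding.
    const plain = decryptDirect(msg.ciphertext, msg.senderPubkey);
    return plain ?? `⚠ message from ${msg.senderPubkey} failed to decrypt`;
  }
  // Legacy broadcast/channel message: payload is plain base64 text.
  return Buffer.from(msg.ciphertext, "base64").toString("utf8");
}
```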
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Same bug as cuidecar/cccb43a: `overscroll-behavior: none` applied to
the universal selector killed wheel events on every overflow:hidden
container on the page — hero, demo-dashboard, cta, surfaces, anything
with rounded cards. Combined with the mesh-stream overflow-y-auto
(fixed in 701516b) this was double-trapping the wheel.
Move the rule from `*` to `html`, change to `overscroll-behavior-y`.
Still prevents rubber-band chaining at the document level, but lets
wheel events propagate naturally through nested overflow containers.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The demo-dashboard embedded MeshStream with a fixed min-h-[480px] grid
+ overflow-y-auto on the message <ol>. Browsers capture every wheel
event that fires over a scrollable container — so hovering the demo
section froze page scroll until the user moved the cursor off.
Landing demo has only 6 messages, never needs internal scroll. The
fixed viewport only makes sense in the live dashboard where envelope
count can exceed the box.
Added `scrollable?: boolean` prop to MeshStream (default false).
- demo-dashboard (landing): no prop → intrinsic height, no overflow,
wheel events propagate to the page
- live-stream-panel (/dashboard/meshes/[id]/live): scrollable → keeps
the chat-style fixed viewport with scroll
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full parity with claude-peers:
1. Push-injection (the "tap on shoulder" UX)
- MCP server now declares experimental.claude/channel capability
- BrokerClient onPush handlers emit server.notification({
method: "notifications/claude/channel",
params: { content, meta: {from_id, from_name, mesh_slug,
mesh_id, priority, sent_at, delivered_at, kind}}
})
- Claude Code injects each push as <channel source="claudemesh">
system reminder, so the receiver session sees inbound messages
WITHOUT calling check_messages manually
- Updated MCP instructions with the "RESPOND IMMEDIATELY" framing
(adapted from claude-peers)
2. Status hooks in install (default-on, --no-hooks to opt out)
- new apps/cli/src/commands/hook.ts: reads stdin JSON (Claude Code
hook payload), extracts cwd+session_id, POSTs /hook/set-status
to every joined mesh's broker in parallel with process.ppid +
1s timeout per POST. Silent fail, fire-and-forget.
- install.ts: writes to ~/.claude/settings.json registering
`claudemesh hook idle` on Stop + `claudemesh hook working` on
UserPromptSubmit. Idempotent, preserves other hook entries.
- uninstall.ts: removes both hook entries + MCP entry; leaves
unrelated hook/MCP entries alone.
- dedupes by brokerUrl (multiple meshes on same broker → one POST)
3. CLI surface
- new subcommand: `claudemesh hook <status>` (internal, but
exposed so Claude Code can invoke it via the hook shell command)
- `install --no-hooks` for users who want bare MCP registration
- --help updated
Coexistence with claude-peers: both tools register Stop and
UserPromptSubmit hooks, each POSTs to its own broker. Claude Code
fires multiple hooks per event without conflict.
npm version 0.1.0 → 0.1.1 (patch).
Verified:
- install with hooks → 2 entries added to settings.json ✓
- install --no-hooks → "Hooks skipped" ✓
- uninstall → both MCP entry + 2 hook entries removed ✓
- `echo '{...}' | claudemesh hook idle` with no joined meshes →
silent no-op ("no joined meshes, nothing to do") ✓
- MCP initialize response includes experimental.claude/channel ✓
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three bugs caught via devtools on live site:
**1. CSP 'font-src 'self' data:' violation × 3 per landing load.**
BaseLayout was loading Geist + Geist_Mono via next/font/google. In
prod builds Next.js self-hosts those under /_next/static, but the
generated CSS still references `--font-sans: "Geist", …` which some
browsers resolve by re-requesting fonts.gstatic.com. Since we ship
Anthropic Sans/Serif/Mono self-hosted already (/fonts/*.woff2 via
@font-face in globals.css), the Geist dependency was pure overhead.
Removed `next/font/google` imports entirely. Added a `.cm-root`
class on <html> that remaps the Tailwind `--font-sans/--font-mono`
tokens to our `--cm-font-sans/--cm-font-mono` vars — so every
Tailwind `font-sans` / `font-mono` utility now resolves to Anthropic
families. No Google Fonts fetch, no CSP violation.
**2. /pricing 401 on public visit.**
`<Plan>` calls `useCustomer()` → `GET /api/billing/customer` which
needs auth. Unauthed visitor on /pricing → 401 in devtools + wasted
round trip. Gated `useCustomer` on `authClient.useSession()` —
query `enabled: !!session?.user`. Public visitors now skip the fetch
entirely; signed-in users still get their customer record.
**3. Residual "Welcome back! 👋" on /auth/login (both locales).**
Emoji sweep (e91fc80) missed the i18n translation files. Removed 👋
from en/auth.json + es/auth.json login header titles.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clickable HTTPS invite URLs replace the raw ic://join/<token> as the
primary share format. Someone receiving a link in Slack now lands on
a friendly page with install instructions, not a dead-end.
Backend:
- createMyInvite returns a new joinUrl field
(https://claudemesh.com/join/<token>) alongside the existing
ic://join/<token> inviteLink and raw token. Schema + Hono route
updated. ic:// scheme stays — CLI parses both.
- New GET /api/public/invite/:token in packages/api/src/modules/public/
(unauthed). Decodes the base64url payload, verifies ed25519
signature against owner_pubkey using the same canonicalInvite()
contract the broker enforces on join, then joins mesh/invite/user
to return the shape needed by the landing page. Does NOT mutate
usedCount — this is a read-only preview.
- Error taxonomy: malformed | bad_signature | expired | revoked |
exhausted | mesh_archived | not_found. Each returned with any
metadata we CAN surface (meshName, inviterName, expiresAt) so the
error page can be specific ("ask Jordan for a new one").
- cache-control: public max-age=30 on valid invites, no-store on
errors (reasons flip as state changes).
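The error taxonomy and caching rule above might be typed roughly like this. `InvitePreview` and `cacheControlFor` are hypothetical names; only the reason strings, the optional metadata fields, and the two cache policies come from this commit.

```typescript
type InviteErrorReason =
  | "malformed"
  | "bad_signature"
  | "expired"
  | "revoked"
  | "exhausted"
  | "mesh_archived"
  | "not_found";

// Assumed response shape: errors still carry whatever metadata is known.
type InvitePreview =
  | { valid: true; meshName: string; inviterName: string }
  | {
      valid: false;
      reason: InviteErrorReason;
      meshName?: string;
      inviterName?: string;
      expiresAt?: string;
    };

// Valid previews may be cached briefly; error reasons can flip as state
// changes, so they are never cached.
function cacheControlFor(preview: InvitePreview): string {
  return preview.valid ? "public, max-age=30" : "no-store";
}
```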
Frontend:
- New public route /[locale]/join/[token] (no auth). Server
Component fetches the preview endpoint, branches on valid/invalid,
renders a minimal landing-design-language shell (wordmark header,
clay accents, serif headlines, mono commands).
- Valid-invite view: "You're invited to {meshName}", inviter +
role + member-count lede, install-toggle component.
- Invalid-invite view: per-reason error copy + inviter name when
available + link back to /.
- InstallToggle client component: three-way state
(unknown/yes/no). Asks "first time / already set up?", then shows
either the 3-step install+init+join path with per-step copy
buttons, or the single claudemesh join <token> command for users
who have the CLI. Every code block has copy-to-clipboard.
- Security footer: "ed25519 keypair generated locally, you keep
your keys, broker sees ciphertext only, leave anytime with
claudemesh leave <mesh-slug>".
Invite generator (/dashboard/meshes/[id]/invite):
- QR code now encodes the HTTPS joinUrl instead of ic:// (phone
cameras land on the web page → friendly path).
- Primary CTA copies the HTTPS URL. Secondary "Copy CLI command"
for fast-path users. Footer explanation updated.
CLI coordination note: dispatched to broker/db lane — claudemesh CLI
needs to accept BOTH ic://join/<token> AND
https://claudemesh.com/join/<token> (extract <token> from pathname).
Server side already returns both.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pairs with claudemesh-2's new /join/[token] landing page. Users can
now paste a clickable HTTPS URL instead of the dev-only ic:// scheme.
apps/cli/src/invite/parse.ts — new extractInviteToken() handles
four input formats before handing the raw base64url token to the
existing parseInviteLink pipeline:
- https://claudemesh.com/join/<token> (primary, clickable)
- https://claudemesh.com/<locale>/join/<token> (i18n prefix)
- ic://join/<token> (still supported, dev)
- <raw-token> (last resort: bare base64url)
User-facing strings updated to the HTTPS form:
- cli help: "join <url>"
- install success message
- list (no-meshes) hint
- MCP server "no meshes" error
- README.md primary example
- docs/QUICKSTART.md Path A + Path B
Verified extractInviteToken() on all 4 formats — each returns the
same base64url token → same broker /join lookup.
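For illustration, the four formats could be handled roughly like this. This is a sketch, not the code in apps/cli/src/invite/parse.ts; the regexes and the locale pattern are assumptions:

```typescript
// Illustrative re-implementation; the real extractInviteToken() may differ.
const BASE64URL = /^[A-Za-z0-9_-]+$/;

function extractInviteToken(input: string): string | null {
  const s = input.trim();
  const patterns = [
    // https://claudemesh.com/join/<token> and /<locale>/join/<token>
    /^https:\/\/claudemesh\.com\/(?:[A-Za-z-]+\/)?join\/([^\/?#]+)/,
    // ic://join/<token>
    /^ic:\/\/join\/([^\/?#]+)/,
  ];
  for (const p of patterns) {
    const token = p.exec(s)?.[1];
    if (token !== undefined) return BASE64URL.test(token) ? token : null;
  }
  // Last resort: the input itself is a bare base64url token.
  return BASE64URL.test(s) ? s : null;
}
```

Returning `null` for anything unrecognised lets the caller print a single "not a valid invite" error instead of failing later in the parse pipeline.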
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ScrollContainer — the wrapper under every dashboard/admin SidebarInset
— had zero horizontal padding on its scroll child, so pages rendered
edge-to-edge against the viewport. On wide screens content also
stretched to whatever width the sidebar left over (no max-width).
Single-point fix: wrap the scroll child in
<div class="mx-auto w-full max-w-[var(--cm-max-w)] px-4 py-6 md:px-8 md:py-8">
Hits every route under SidebarInset in one change:
- /dashboard
- /dashboard/meshes + /new + /[id] + /[id]/invite + /[id]/live
- /dashboard/invites
- /dashboard/settings (+ billing, security)
- /admin + /admin/users, /organizations, /customers, /meshes,
/sessions, /invites, /audit
px-4 → md:px-8 matches the marketing sections' gutter rhythm.
max-w-[var(--cm-max-w)] (90rem) caps content on ultra-wide.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
apps/broker/scripts/load-test.ts — configurable harness (N peers ×
M msgs). Each peer gets a real ed25519 keypair, signs its own hello,
encrypts every send via crypto_box. Measures send→ack latency
(broker queue write) and send→push latency (full e2e round-trip).
Samples broker RSS + FD count via ps/lsof if BROKER_PID is set.
docs/LOAD-TEST-v0.1.0.md — honest baseline results:
- ≤ 10 peers × 100 msgs: sub-second p99, 100% delivery
- 25-100 peers × 100 msgs: 5-10s p99, 100% delivery, no FD leaks
- 100 peers × 1000 msgs (100k total): 23s p99, 88.8% delivery at
15min drain cap. Peak RSS 1156MB, max FDs 122.
Broker is DB-bound — bottleneck is fanout amplification (every send
triggers N drain queries across connected peers). Document this
honestly as where v0.1.0 tops out. Real production traffic is
orders of magnitude lighter than this burst test (human/AI cadence,
not synthetic burst) — launch-ready as-is.
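Back-of-the-envelope view of the amplification, assuming exactly one drain query per connected peer per send (the per-peer cost is an assumption; the harness numbers are from the report):

```typescript
// Estimated drain queries for a burst: each send fans out to every
// connected peer, so query volume grows with peers squared.
function estimatedDrainQueries(peers: number, msgsPerPeer: number): number {
  const totalSends = peers * msgsPerPeer;
  return totalSends * peers;
}

// 100 peers x 1000 msgs = 100,000 sends, each fanning out to 100 peers.
const largestBurst = estimatedDrainQueries(100, 1000); // 10,000,000 queries
```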
v0.2 optimization targets documented in the report:
- fanout decoupling (batch drains on timer)
- drop refreshStatusFromJsonl from delivery hot path
- pipelined acks
- horizontal sharding by meshId
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Live social-proof counter on the landing page, tied to the E2E
narrative. Formatted as understated mono footer, not hero brag.
Backend — new GET /api/public/stats (unauthed, 60s in-memory cache):
{
messagesRouted: SELECT COUNT(*) FROM mesh.message_queue,
meshesCreated: SELECT COUNT(*) FROM mesh.mesh WHERE archivedAt IS NULL,
peersActive: SELECT COUNT(*) FROM mesh.presence WHERE disconnectedAt IS NULL,
lastUpdated: ISO timestamp,
}
Aggregate counts only — no ids, no names, no ciphertext, no routing
metadata. Safe for public consumption. The cache-control header is set to
`public, s-maxage=60` for edge caching; `x-cache: HIT|MISS` is exposed for debugging.
Frontend — new MeshStats Server Component at
modules/marketing/home/mesh-stats.tsx. Reads the endpoint server-side
via the ~/lib/api/server client, renders monospace footer:
ciphertext routed → 4,217 messages · 12 meshes · 8 peers online
broker sees none of it
Graceful zero state: when messagesRouted === 0 shows
"ciphertext → ready to route" instead of embarrassing zeros. Tabular-
nums for the numeric spans so they don't jitter across renders.
Mounted between <CallToAction /> and <LatestNewsToaster />. Page-level
`export const revalidate = 60` so Next.js ISR refreshes the counter
every minute without a DB hit on every request (combined with the
API cache = two-layer 60s TTL, DB sees ~1 query/minute).
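Sketch of the API-side layer, assuming a simple single-entry TTL cache. `createTtlCache` is a hypothetical helper, and the loader is synchronous here for brevity (the real one would query the DB asynchronously):

```typescript
// Minimal single-entry TTL cache. The clock is injectable so expiry can be
// exercised without waiting for real time to pass.
function createTtlCache<T>(ttlMs: number, load: () => T, now: () => number = Date.now) {
  let entry: { value: T; expiresAt: number } | null = null;
  return {
    get(): { value: T; cache: "HIT" | "MISS" } {
      const t = now();
      const current = entry;
      if (current !== null && t < current.expiresAt) {
        return { value: current.value, cache: "HIT" };
      }
      const value = load(); // the only path that would touch the DB
      entry = { value, expiresAt: t + ttlMs };
      return { value, cache: "MISS" };
    },
  };
}
```

The `cache` field maps directly onto the `x-cache` debug header.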
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strategic positioning split for v0.1.0:
- Local/single-machine self-host → redirect to claude-intercom (MIT,
simpler, purpose-built for that case)
- Cross-machine / team → hosted claudemesh.com (E2E encrypted, zero-ops)
- Building the broker from source is an audit/fork path, not the
primary self-host flow. Enterprise self-host packaging deferred to
v0.2+.
Previous "Run your own broker" section pushed users toward a docker
pull + self-host flow we're not publishing images for this launch
(ghcr.io/alezmad/claudemesh-broker stays as future enterprise path).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
.gitea/workflows/release.yml runs on any v-prefixed tag push (and on
workflow_dispatch with a manual tag input). Strips the v prefix, logs
in to ghcr.io via the GHCR_TOKEN repo secret, then runs the existing
publish-images.sh → all 3 multi-arch images land with :<tag> + :latest
tags.
Workflow for future releases:
git tag v0.1.1
git push --tags gitea-vps v0.1.1
→ 10 min later: ghcr.io/alezmad/claudemesh-*:0.1.1 + :latest live.
Inert until act_runner is installed on gitea-vps (post-launch decision
per ovhcloud-agutmou). Also serves as executable documentation for
forkers on Gitea/GitHub.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GHCR_TOKEN=ghp_xxx scripts/publish-images.sh 0.1.0 — logs into ghcr.io
as alezmad and pushes all 3 claudemesh-* images (broker + web + migrate,
multi-arch) via the existing build-multiarch.sh. Supports --dry-run
that prints what would publish without logging in or pushing.
Once the user supplies their GHCR PAT, shipping the 0.1.0 image tag is
one command.
Also documents post-trim image sizes in DEPLOY.md Step 2 (broker 341MB,
migrate 653MB, web 250MB).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use pnpm deploy to flatten each package's runtime subset into /deploy,
then copy ONLY that into the runtime stage. Catalog + workspace:*
specifiers previously forced full-workspace resolution into every
image's node_modules — unnecessary for either runtime.
Results (arm64, same smoke tests pass):
- broker: 3.26GB → 341MB (-90%, drops all devDeps incl. drizzle-kit)
- migrate: 3.27GB → 653MB (-80%, keeps drizzle-kit which IS runtime)
Broker /health confirms GIT_SHA build-arg still propagates (gitSha:
"30bc24f" in smoke test). Migrate still reads drizzle.config.ts and
attempts the connection correctly.
--legacy flag needed because pnpm 10 defaults to inject-workspace-
packages mode which the monorepo doesn't opt into; legacy is safe here.
--ignore-scripts on deploy skips the root postinstall (sherif lint:ws)
which has nothing to do with runtime.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
User owns the alezmad github scope, not a claudemesh org — point README
+ build script + DEPLOY.md at the real namespace so the docker pull
snippets actually work on launch day. Image names are now
claudemesh-broker / claudemesh-web / claudemesh-migrate (prefixed since
they live under a personal scope).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
40-line block with docker run + curl /health verify + env var reference
+ build-from-source fallback pointing at scripts/build-multiarch.sh.
Sits between the architecture diagram and honest-limits section so OSS
adopters find it immediately after understanding the broker's role.
Links through to DEPLOY_SPEC.md for the full runtime contract.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
claudemesh requires an account — mesh membership is tied to user.id.
e8ad7a5 flipped the config default but the env var override at
env.config.ts:43 still defaulted to true, keeping the button visible.
Fixed at env var level + example files. Needs Coolify rebuild since
NEXT_PUBLIC_* is build-time in Next standalone.
Four parallel jobs on push to main and on PRs:
- lint — pnpm lint (turbo across workspace)
- typecheck — pnpm typecheck (turbo across workspace)
- test-broker — pgvector/pg17 service container, drizzle-kit migrate,
then vitest on apps/broker (64 tests per DEPLOY_SPEC.md)
- build-amd64 — docker buildx build of broker + migrate + web images
for linux/amd64 (catches Linux-only Dockerfile bugs that Mac local
buildx can't hit reliably, closes the documented multi-arch followup)
All jobs use frozen-lockfile install + pnpm-store cache via setup-node.
Regenerates pnpm-lock.yaml to resolve apps/cli zod catalog drift that
was silently blocking any frozen-lockfile install (shipped under same
commit since CI cannot pass without it).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
scripts/build-multiarch.sh produces linux/amd64 + linux/arm64 image
manifests for all three deployable images. Mac devs (Apple Silicon)
pulling claudemesh images get arm64 native — no QEMU, no 2-4x startup
penalty, no warnings. VPS (amd64) gets the native variant from the
same manifest.
- 3 images in one script: broker, web, migrate
- Tags both <SHA> and :latest per image
- GIT_SHA build-arg wired in for broker /health provenance
Replaces scripts/build-and-push.sh which was hardcoded to a dead
registry (192.168.1.3:3030) and wrong org (alezmad/turbostarter).
DEPLOY.md Step 2 is rewritten to use the new script and now documents
the Mac Docker Desktop Rosetta-emulation gotcha.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- switch email provider from resend (unused) to postmark (creds available)
- re-enable requireEmailVerification now that email path works
- env vars POSTMARK_API_KEY + EMAIL_FROM must be set in Coolify
Documents the v0.1.0 scope limit for the web dashboard and the v0.2
plan for turning the browser into a full mesh peer.
Context: quick-send composer was scoped into the mobile-responsive
pass but requires a client-side crypto decision. The correct design
is a WebCrypto-generated ed25519 keypair + IndexedDB storage so the
browser joins the mesh with the same security posture as the CLI,
not a second-class shortcut that breaks E2E. That's a 1-2 day build
(keypair gen, IndexedDB wrapper, crypto_box, signed hello, invite
redemption, key export UX) — out of scope for v0.1.0 launch.
v0.1.0 honest limit: dashboard = read-only situational awareness.
Messaging = CLI/MCP tools only.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Anthropic design language is icon-only, no emoji. User flagged that
claude-intercom components (and copy I wrote) were leaning on emoji
decoration. Swept all user-visible emojis in apps/web + packages/ui.
Changes:
- meshes/new onboarding banner: "Welcome to claudemesh 👋" → drop the
wave, text stands alone
- meshes/[id]/invite banner: "🎉 Mesh created" → "Mesh created"
- demo-dashboard script message: "thanks 🙏" → "thanks." (inline prose)
- MeshStream message-type chips: replaced the ⟐ / ← / → unicode
glyphs with proper inline SVG icons (10×10 stroke paths). Each chip
now carries: plus-sign for broadcast, up-arrow for hand-raise,
right-arrow for direct. Same claude-orange / emerald / neutral
coloring, same typography — just geometry instead of text symbols.
Nothing swapped to Lucide React imports yet — Icons barrel in
packages/ui/web only exports a subset (Circle, Check, MessageCircle,
Sparkles, Megaphone), and the four glyphs we needed were simpler as
inline SVG than adding barrel exports + per-component import plumbing.
If emoji→Lucide fully lands, we'll add the rest to the Icons barrel
in one pass.
Skipped per PM spec: TTS announcements, commit messages, code
comments, logs — not user-visible.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
RESEND_API_KEY / SMTP credentials not yet configured in production.
Users sign up + land in dashboard immediately, no verification email.
Re-enable requireEmailVerification when email provider is live:
packages/auth/src/server.ts:93
README + FAQ claim MIT, but LICENSE.md was still the TurboStarter EULA
from the scaffold — mismatch is an HN/launch blocker. Replace with MIT
for claudemesh-authored code + Attribution section preserving scaffold
obligations.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mesh detail page at /dashboard/meshes/[id]:
- Header: flex-col → flex-row at sm breakpoint. Live/Invite buttons
stretch full-width stacked on mobile (flex-1), auto-width side-by-
side from sm up.
- "Generate invite link" truncates to "Invite" on mobile (viewport
constrained) so the button fits next to Live.
- Members + active-invites rows: stack metadata vertically on mobile
(flex-col → sm:flex-row), wrap badges inside with flex-wrap so the
member display-name + role + revoked badges don't horizontal-scroll.
Invites list at /dashboard/invites:
- Wrap the table in overflow-x-auto with min-w-[560px] on the table
itself. 5-column data-table that genuinely needs horizontal space
— don't fake it with card stacking, let the user scroll naturally.
Quick-send composer deferred to a follow-up — writes a message to the
mesh, which requires a client-side encryption decision (ed25519
keypair in the browser? key derivation from session? plaintext-to-
broker and break E2E?). Parked as its own spec.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Footer "Built with ✦ TurboStarter" link → "Built on ⎇ claude-intercom · MIT".
Credits the MIT OSS foundation claudemesh sits on and aligns with the
GitHub icon in the header already pointing at alezmad/claude-intercom.
Dropped the 512-byte TurboStarter wordmark SVG + the large brand icon.
Kept a lean GitHub glyph + text so it reads as attribution, not ad.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- PRODUCT_NAME default: TurboStarter → claudemesh (.env.example, .env.local)
- SEED_EMAIL default: me@turbostarter.dev → dev@example.com
- README dev accounts table: reflect new seed email format
- DEPLOY.md: fix stale SEED_EMAIL reference
Keeps DB user as turbostarter per docker-compose.yml default; retains
TurboStarter attribution link in README Contributing section (legit
credit for the template this repo is built on).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
One-shot migrate container runs drizzle-kit migrate against DATABASE_URL
and exits 0 before web boots. web service depends_on with condition
service_completed_successfully, so failed migrations block web startup
instead of serving 500s against a stale schema. Broker deliberately does
NOT depend on migrate - it tolerates DB-down gracefully per DEPLOY_SPEC
and should keep serving WS peers even during migration failures.
Also excludes apps/cli from docker build context (CLI ships to npm, not
containers) to sidestep zod spec drift in its package.json vs lockfile.
Known followup: migrate image is 3.27GB due to pnpm catalog: specifiers
forcing full-workspace resolution. pnpm deploy bundle trim is a P2.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three launch-visible friction fixes:
#3: "Continuar como invitado" (anonymous sign-in) removed. claudemesh
requires an account — mesh membership, invite issuance, and audit
trails are all tied to a user.id. Flipping the toggle is enough:
the AnonymousLogin component is gated by
`authConfig.providers.anonymous` in login.tsx, so disabling the
flag makes the button disappear from both /login and /register.
#4: OAuth buttons now show proper brand labels. Was rendering lowercase
"github" / "google" / "apple" via capitalize CSS (which users read
as "is this broken?"). Now renders "Continue with GitHub" /
"Continue with Google" / "Continue with Apple" next to the existing
brand icons. Also swapped layout: was `grow basis-28` (side-by-side
chips), now `w-full justify-center` (stacked full-width buttons) —
matches claude.com login styling more closely.
#6: Session hydration race on /dashboard — NON-ISSUE verified. The
0-mesh redirect runs in a Server Component AFTER
/dashboard/layout.tsx's getSession() gate. Server api.ts forwards
cookies to the Hono backend, so no client-side auth state is in
play. No fix needed.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires the Discord-style demo UI to real user data. Users with 1+ meshes
now get situational awareness: who's online, what's in the queue, what
the broker saw recently — polling every 4s, all E2E encrypted.
Extraction pass:
- New `<MeshStream peers messages channelLabel footer>` renderer at
modules/marketing/home/mesh-stream.tsx — pure presentation, no
playback engine, no data fetching. Handles peer filter, hover-for-
ciphertext tooltip, animated message list.
- demo-dashboard.tsx refactored to use it: keeps the playback loop,
traffic-light chrome, and script-driven messages; passes everything
to MeshStream via props. ~120 LOC shorter.
Backend:
- new GET /api/my/meshes/:id/stream in packages/api (same authz gate
as /my/meshes/:id — owner OR non-revoked member). Returns:
- up to 20 live presences (disconnectedAt IS NULL), joined to
meshMember for displayName
- up to 50 most-recent message_queue envelopes with metadata only:
sender + displayName, targetSpec, priority, createdAt, deliveredAt,
byte size, and a 24-char ciphertext preview (this IS what the
broker sees — no plaintext anywhere in the response)
- up to 20 recent audit events
- getMyMeshStreamResponseSchema in schema/mesh-user.ts matches exactly.
Frontend:
- new LiveStreamPanel client component at modules/mesh/live-stream-panel.tsx
— react-query with refetchInterval: 4000ms, refetchIntervalInBackground
false. Maps presences + envelopes to MeshStream's Peer/Message shape,
classifies targetSpec into message type ("tag:*" → ask_mesh, "*" →
broadcast, else direct). Passes through the ciphertextPreview as the
hover content — no fake ciphertext in live view.
- new route /dashboard/meshes/[id]/live with server-side authz preflight
via /my/meshes/:id. Mounts LiveStreamPanel inside a dashboard page
shell with breadcrumb back to mesh detail.
- Mesh detail page gets a new "Live" pill button (clay-pulsing dot)
next to "Generate invite link" in the header.
- paths config gets dashboard.user.meshes.live(id).
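The targetSpec classification can be sketched as below, assuming "tag:*" means any spec that starts with "tag:". `classifyTargetSpec` is an illustrative name:

```typescript
type MessageType = "ask_mesh" | "broadcast" | "direct";

// Map a broker targetSpec onto the chip type shown in the stream.
function classifyTargetSpec(targetSpec: string): MessageType {
  if (targetSpec === "*") return "broadcast";            // everyone in the mesh
  if (targetSpec.startsWith("tag:")) return "ask_mesh";  // tag-routed ask
  return "direct";                                       // anything else is a peer target
}
```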
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claudemesh/cli was already taken on npm by an unrelated project
(claudemesh "domain packages", v1.0.7). PM picked option A: publish
unscoped as claudemesh-cli. Binary name stays "claudemesh" — users
type the natural thing on install:
npm install -g claudemesh-cli
claudemesh install
claudemesh join ic://join/...
renamed references everywhere:
- apps/cli/package.json: name
- apps/cli/README.md: title + install command
- apps/cli/src/{index.ts, mcp/server.ts, commands/install.ts} headers
- docs/QUICKSTART.md: install command, version banner, npx hint
- docs/roadmap.md: package name
also (PM journey-friction #5): surface the "restart Claude Code" step
LOUDLY in install output. Added a yellow-bold warning line after the
✓ success lines so new users don't miss the restart step (MCP tools
only load on Claude Code restart).
⚠ RESTART CLAUDE CODE for MCP tools to appear.
ANSI colors gated on isTTY + NO_COLOR/TERM=dumb guards.
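Roughly how the color guard could look. This is a sketch with hypothetical helper names; the real check reads process.stdout.isTTY and process.env directly:

```typescript
// Decide whether ANSI colors are safe to emit.
function shouldUseColor(isTTY: boolean, env: Record<string, string | undefined>): boolean {
  if (!isTTY) return false;                      // piped or redirected output
  const noColor = env["NO_COLOR"];
  if (noColor !== undefined && noColor !== "") return false;
  if (env["TERM"] === "dumb") return false;
  return true;
}

// Wrap text in bold yellow (SGR 1;33) when colors are enabled.
function warnLine(text: string, color: boolean): string {
  return color ? `\x1b[1;33m${text}\x1b[0m` : text;
}
```

Passing `isTTY` and `env` as arguments keeps the guard testable without touching the real terminal.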
bundle rebuilt. ready for npm publish pending user's `npm adduser`.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Makes @claudemesh/cli installable globally via npm without requiring
bun on user machines. (Bun stays the dev runtime; bundled output is
node-compatible.)
- bun build --target=node --outfile dist/index.js produces a 2.69MB
standalone bundle with node-shebang banner
- package.json: add description/keywords/author/license/homepage/
repository, set bin to ./dist/index.js, files=[dist, README, LICENSE],
publishConfig.access=public, engines.node >=20
- prepublishOnly auto-runs the build
- pin zod from catalog: to 4.1.13 (npm rejects catalog: refs)
- swap Bun.spawnSync → node:child_process.spawnSync in install.ts
(the only Bun-global usage in the package)
- strip shebang from src/index.ts (banner supplies it post-bundle)
install command now runs in two modes:
- BUNDLED (npm i -g): detects dist/index.js path, writes MCP entry
with command "claudemesh" (relies on the global bin shim on PATH)
- SOURCE (bun src/index.ts, dev): preflights bun, writes MCP entry
with command "bun <absolute-path> mcp"
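The mode detection can be pictured like this. It is a sketch only: `resolveMcpCommand` is a hypothetical name, and the `["mcp"]` argument in bundled mode is an assumption (the commit only states the command name):

```typescript
type McpCommand = { command: string; args: string[] };

// Pick the MCP entry to write based on where the running script lives.
function resolveMcpCommand(entryPath: string): McpCommand {
  // Normalise Windows separators before checking the suffix.
  const bundled = entryPath.replace(/\\/g, "/").endsWith("/dist/index.js");
  return bundled
    ? { command: "claudemesh", args: ["mcp"] }     // relies on the global bin shim
    : { command: "bun", args: [entryPath, "mcp"] }; // dev: run the source directly
}
```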
verified end-to-end:
- node dist/index.js --help prints usage ✓
- node dist/index.js install writes correct ~/.claude.json ✓
- node dist/index.js mcp | tools/list returns all 5 tools ✓
- bun src/index.ts install (dev mode) still works ✓
NOT PUBLISHED YET: @claudemesh/cli is owned by an unrelated project
on npm. Awaiting user decision on an alternative name (claudemesh-cli,
@alezmad/claudemesh-cli, or a new org scope). The bundle is name-agnostic
and can be reused as-is whichever name is chosen.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Every shadcn/ui component (Button, Card, Input, Dialog, Table, Sidebar,
Form, …) was still rendering with the TurboStarter-inherited oklch
palette from @turbostarter/ui-web — white backgrounds, neutral greys,
turbostarter-orange primary — because we only used --cm-* tokens via
inline styles in the marketing pages and auth layout, never remapped the
shadcn tokens the components actually read.
User flagged this on the live site — BetterAuth forms, dashboard cards,
admin data-tables all off-brand.
Shortest fix: override the shadcn tokens at the :root, [data-theme="orange"],
and .dark selectors in globals.css so they resolve to --cm-* values.
Every shadcn component auto-themes without a single component rewrite.
Mappings:
- --background → --cm-bg (#141413)
- --foreground → --cm-fg (#faf9f5)
- --card/popover → --cm-bg-elevated (#1f1e1d)
- --primary → --cm-clay (#d97757)
- --muted → --cm-bg-elevated
- --muted-foreground → --cm-fg-tertiary
- --border/--input → --cm-border (clay @ 20%)
- --ring → --cm-clay (clay focus ring)
- --radius → --cm-radius-md (0.5rem)
- sidebar tokens → cm-bg-elevated + cm-clay
- color-scheme → dark (kills white flash)
--destructive / --success left as standard red/green hexes — they
don't need to match claudemesh palette, they need to signal.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Visitors read the page and still don't grok that claudemesh is a *mesh* of
agents, not chatbot integrations. Fix: drop them straight into a live
Discord-style view of 4 peers talking. No auth, no WS, no backend —
a pre-recorded 10-second conversation that loops, encrypted over a
fake broker.
The conversation script (demo-dashboard-script.ts) hits every mental
model the landing needs to plant:
bob-desktop → #payments: "stripe sig verification broken?"
alice-laptop (self-nominates): "hit this 2wks ago, pulling fix"
alice → bob (direct): "<actual fix with file+line>"
bob → alice: "saved me. thanks 🙏"
carol-ios → #infra: "CI red on main?"
bob → carol: "reverting 7af3d, ~2min"
Covers: tag-routed broadcast (ask_mesh), self-election (hand-raise),
direct-peer DM, cross-surface (phone peer in the mix), multi-thread
concurrency.
Component (demo-dashboard.tsx, ~420 LOC):
┌─────────────────────────────────────────────────┐
│ meshes | peers | live message stream │
│ side | list | (motion fade+rise on each msg) │
│ bar | | │
└─────────────────────────────────────────────────┘
- requestAnimationFrame playback loop against SCRIPT[].t offsets
- Auto-loops after SCRIPT_DURATION_MS, 4s pause baked in
- Per-peer filter: click a peer in the sidebar, only their messages
show in the stream (from OR to), shows "filtered: <peer>" in header
- Play / pause / restart buttons
- Hover any message → dashed clay box shows the fake ciphertext:
"broker sees only this: AUp3+n7z1bY=.kQfM9vL4jR8..." — drives the
E2E point without a paragraph of crypto copy
- Status dots: green idle, clay pulse working, grey offline
- Surface glyphs inline (terminal / phone / slack) next to peer names
- Message type chips: ⟐ broadcast, ← hand-raise, → direct
- Progress bar at bottom ties the loop to a visible timeline
- Window chrome with traffic-light dots + "mesh.claudemesh.com ·
flexicar-ops · 4 peers online" header
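The playback and filter logic reduces to two pure functions. This sketch assumes SCRIPT_DURATION_MS already includes the 4s pause; the `ScriptEntry` shape is illustrative:

```typescript
// Hypothetical shape of one scripted message; `t` is the offset in ms.
type ScriptEntry = { t: number; from: string; to: string; text: string };

// Given total elapsed time, return the messages visible in the current loop.
// The modulo restarts the script once durationMs has passed.
function visibleMessages(script: ScriptEntry[], durationMs: number, elapsedMs: number): ScriptEntry[] {
  const position = elapsedMs % durationMs;
  return script.filter((m) => m.t <= position);
}

// Per-peer filter: keep messages sent from OR to the selected peer.
function filterByPeer(messages: ScriptEntry[], peer: string | null): ScriptEntry[] {
  return peer === null ? messages : messages.filter((m) => m.from === peer || m.to === peer);
}
```

A requestAnimationFrame callback would call `visibleMessages` with the time since playback started and re-render when the result grows.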
Mounted between WhatIsClaudemesh and BeyondTerminal — explainer
first, then show-don't-tell.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous flow printed a `claude mcp add ...` command and asked
users to paste it. That's 2 steps, a typo surface, and a point of
user dropoff. Replace with direct read-modify-write of ~/.claude.json.
install:
- preflights bun on PATH (clear error + Bun.com link if missing)
- verifies the MCP entry file exists on disk
- reads ~/.claude.json (empty object if absent)
- adds/updates mcpServers.claudemesh with resolved absolute path
- writes back with 0600 perms, creates parent dir if needed
- read-back verification (bails loudly if post-write state is wrong)
- idempotent: re-running returns "unchanged" if entry already matches
- preserves existing mcpServers entries + other top-level config keys
uninstall:
- removes the claudemesh entry if present
- no-ops cleanly when entry or config file doesn't exist
- doesn't touch anything else
Both print a clear post-action hint: "Restart Claude Code to load
the MCP server. Then join a mesh with claudemesh join <invite-link>".
verified locally with HOME=/tmp/fake-home:
- fresh install → ✓ added, config emitted correctly
- re-install → ✓ unchanged (idempotent)
- install alongside existing "other-mcp" entry → both preserved,
plus unrelated top-level keys kept verbatim
- uninstall → ✓ removed, claudemesh gone, other entries intact
- uninstall again → · not present (no error)
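The read-modify-write core can be sketched as pure functions over the parsed config. File I/O, the 0600 write, and read-back verification are left out, and the type and function names are illustrative:

```typescript
type McpEntry = { command: string; args: string[] };
type ClaudeConfig = { mcpServers?: Record<string, McpEntry>; [key: string]: unknown };

// Add or update mcpServers.claudemesh without touching anything else.
function upsertMcpEntry(config: ClaudeConfig, entry: McpEntry) {
  const existing = config.mcpServers?.["claudemesh"];
  const same = existing !== undefined &&
    existing.command === entry.command &&
    JSON.stringify(existing.args) === JSON.stringify(entry.args);
  if (same) return { config, status: "unchanged" as const };
  const next: ClaudeConfig = {
    ...config, // other top-level keys are kept verbatim
    mcpServers: { ...(config.mcpServers ?? {}), claudemesh: entry },
  };
  return { config: next, status: existing === undefined ? ("added" as const) : ("updated" as const) };
}

// Remove only the claudemesh entry; no-op when it is absent.
function removeMcpEntry(config: ClaudeConfig) {
  const servers = config.mcpServers;
  if (servers === undefined || servers["claudemesh"] === undefined) {
    return { config, status: "not present" as const };
  }
  const rest: Record<string, McpEntry> = {};
  for (const [name, value] of Object.entries(servers)) {
    if (name !== "claudemesh") rest[name] = value;
  }
  return { config: { ...config, mcpServers: rest }, status: "removed" as const };
}
```

Returning a status alongside the new config is what makes the idempotent "unchanged" and "not present" messages cheap to produce.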
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the vanilla TurboStarter template with a claudemesh-first
README aligned to the landing page positioning.
- Lead with "mesh of Claudes, not one you talk to" mental shift
- Concrete use case (Alice/Bob Stripe bug) before any install steps
- Install + join flow with @claudemesh/cli
- ASCII architecture diagram: broker at center, peers orbiting
- Honest limits section (what it is NOT, what's roadmap)
- Repo layout section
- TurboStarter dev setup moved under Contributing
All conversion CTAs were pointing to the dead
github.com/claudemesh/claudemesh repo or # hash fragments. Landing is the
primary funnel for v0.1.0, so every "Start" button is a conversion-critical surface.
Fixes:
- Header "Start free" → /auth/register
- Header GitHub nav item → REMOVED (kept the icon button, repointed)
- Hero "Start free" → /auth/register
- Pricing 6× CTAs: Solo/Pro/Plus/Team/Business → /auth/register,
Enterprise → /contact
- CTA footer "Star on GitHub" → /auth/register ("Start free")
- BeyondTerminal "Read the protocol spec" → /auth/register
("Get on the mesh")
GitHub reinstated as a dedicated icon button in the header right side,
pointing to https://github.com/alezmad/claude-intercom — the MIT OSS
foundation claudemesh is built on. Honest provenance: claude-intercom
is the local peer-mesh gift to the community, claudemesh is the hosted
cross-machine extension.
Tooltip: "Built on claude-intercom · MIT open source".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fixes the "chatbot integration" misread of the landing page by framing
claudemesh as a mesh-not-a-bridge above the gateways section.
- Mental shift (before/after): one Claude per project → mesh of Claudes,
mesh-as-substrate with surfaces tapping in
- Three concrete use cases with honest limits: solo multi-machine,
cross-repo team (Alice's Stripe fix / Bob rediscovers), mobile 3am
oversight via WhatsApp gateway
- Inline SVG architecture diagram: broker at center ("routes only · never
decrypts"), six peers hexagon-orbiting with ciphertext edges
- Anti-framing "what claudemesh is NOT" list to kill misreads
- Italic pull-quote closer with the honest one-liner
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Strategic positioning upgrade. claudemesh was framed as terminal-to-
terminal — which is only half the story. The broker is protocol-
agnostic: any peer with an ed25519 keypair joins the mesh, so the mesh
can reach WhatsApp bots, Telegram, iOS apps, Slack, email gateways,
browser extensions. Terminal is ONE client, not THE client.
New section at /#beyond: "Your mesh. Any surface." — 6 gateway cards
(Terminal / WhatsApp / Telegram / iOS·Android / Slack / Email) with
honest status badges:
- shipping → Terminal only (what we have today)
- on the roadmap → WhatsApp, Telegram, iOS/Android (we will build)
- build it yourself → Slack, Email (open protocol, community territory)
No overclaiming: we don't pretend WhatsApp is live. The honest framing
is exactly the aspirational hook — the architecture is there, the hooks
exist, someone could build a gateway peer today.
Each card has a custom 28px inline SVG glyph in clay, short serif
description, and a status chip. Grid staggers in with Motion.
Footer CTA: "the protocol is open · ed25519 + libsodium · build a gateway
for anything" + link to /#protocol on GitHub.
Hero subhead reworked to hint at cross-surface: "Peer mesh for Claude —
reachable from anywhere you are. … Terminal is one client, not THE
client."
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auth routes (/login, /register, /forgot-password, /update-password, /join)
were rendering with the default Geist fonts + shadcn neutral palette +
turbostarter SVG logos — completely off-brand against the marketing
landing. User reported from production.
Rewire auth/layout.tsx to:
- use --cm-bg / --cm-fg / --cm-clay tokens (dark #141413)
- Anthropic Sans for UI, Anthropic Serif for the right-aside tagline
- claudemesh wordmark (mesh glyph + serif) in place of Icons.Logo /
Icons.LogoText
- right aside: mesh glyph + serif tagline "Every Claude Code session,
woven into one mesh." + description paragraph, matching the CTA
copy from the landing
- subtle orange radial glow on the aside for depth
Inner form components (BetterAuth password/social buttons) pick up the
tokens from globals.css, so the forms look native on the dark layout
without per-component rewrites.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New user signs in → /dashboard (user) → hits server-side getMyMeshes → 0
results → redirects to /dashboard/meshes/new?onboarding=1. Create-mesh
page renders a welcome banner explaining what a mesh is. After submit,
if ?onboarding=1 was set, the form bounces to
/dashboard/meshes/[id]/invite?onboarding=1 instead of the mesh detail
page. Invite page renders a "🎉 Mesh created" banner with the
`claudemesh join <link>` CLI snippet.
The onboarding flag is URL-driven — no persistence needed, dismissal
happens naturally when the user navigates away.
Also rewrites the /dashboard (user) home page from the placeholder
"Welcome to your Dashboard" TurboStarter card grid to a claudemesh-
native view: top 6 meshes with badges, All meshes / New mesh CTAs.
Removes the unused Card/Icons imports.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Production /join on the broker (from feat 18c) rejects every invite
with invite_bad_signature because the web UI was emitting unsigned
payloads. This fixes that.
createMyMesh now generates ed25519 owner keypair + 32-byte root key
and stores all three on the mesh row. createMyInvite loads them,
signs the canonical invite bytes via crypto_sign_detached, and
emits a fully-signed payload matching what the broker expects:
payload = {v, mesh_id, mesh_slug, broker_url, expires_at,
mesh_root_key, role, owner_pubkey, signature}
canonical = same fields minus signature, "|"-delimited
signature = ed25519_sign(canonical, mesh.owner_secret_key)
token = base64url(JSON(payload)) ← stored as invite.token
The base64url(JSON) token IS the DB lookup key — broker's /join
does `WHERE invite.token = <that string>`, then re-verifies the
signature it extracts from the decoded payload.
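
The canonical-bytes and token encoding described above can be sketched as follows. This is a minimal illustration, not the shipped code: the field order of the canonical string is assumed to follow the payload order listed above, `expires_at` is assumed to be a numeric timestamp, and the helper names are hypothetical. The actual ed25519 signing step (crypto_sign_detached) is left out, so the signature here is a placeholder.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical shape; field types are assumptions.
interface UnsignedInvite {
  v: number;
  mesh_id: string;
  mesh_slug: string;
  broker_url: string;
  expires_at: number;
  mesh_root_key: string;
  role: "admin" | "member";
  owner_pubkey: string;
}

// Canonical string: every payload field except the signature, "|"-delimited.
// The ordering used here is an assumption.
function canonicalInvite(p: UnsignedInvite): string {
  return [
    p.v, p.mesh_id, p.mesh_slug, p.broker_url,
    p.expires_at, p.mesh_root_key, p.role, p.owner_pubkey,
  ].join("|");
}

// token = base64url(JSON(payload)), where payload includes the signature.
function encodeInviteToken(p: UnsignedInvite, signatureHex: string): string {
  const payload = { ...p, signature: signatureHex };
  return Buffer.from(JSON.stringify(payload), "utf8").toString("base64url");
}

// Inverse of encodeInviteToken; the verifier would rebuild the canonical
// string from the decoded fields and check the signature against it.
function decodeInviteToken(token: string): UnsignedInvite & { signature: string } {
  return JSON.parse(Buffer.from(token, "base64url").toString("utf8"));
}
```

Because the token doubles as the DB lookup key, the encoding has to be byte-for-byte stable between creation and lookup, which is why the sketch serializes exactly once and stores the resulting string.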
Also drops the sha256 derivePlaceholderRootKey() helper and the
encodeInviteLink helper, both replaced by inline logic.
backfill extended: the one-off script now populates owner_pubkey
AND owner_secret_key AND root_key together in a single pass. Query
condition is `WHERE any of the three IS NULL`, so running it
post-migration catches every row regardless of partial prior fills.
requires packages/api to depend on libsodium-wrappers + types
(added). 64/64 broker tests still green.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Completes the server-side invite-signing story. The web UI's
create-invite flow needs the mesh owner's ed25519 SECRET key to sign
each invite payload; these columns let the backend hold + use them
per mesh.
- mesh.mesh.owner_secret_key (text, nullable): ed25519 secret key
(hex, 64 bytes) paired with owner_pubkey. Stored PLAINTEXT AT REST
for v0.1.0. Acceptable trade-off for a managed-broker SaaS launch —
the operator controls the key anyway. v0.2.0 will either encrypt
with a column-level KEK or migrate to client-held keys.
- mesh.mesh.root_key (text, nullable): 32-byte shared key
(base64url, no padding) used by channel/broadcast encryption in
later steps. Embedded in every invite so joiners receive it at
join time.
migrations/0002_vengeful_enchantress.sql — two ALTER TABLE ADD
COLUMN. Nullable so existing rows don't need backfill to migrate;
the backfill script populates them idempotently.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three visible launch-polish issues:
1. BuyCtaDialog popup was firing on an exponential backoff schedule
(15s, 30s, 60s, …) pushing users toward turbostarter.dev/#pricing +
Discord. Wrong product, wrong audience. Fully removed: the mount point
in [locale]/layout.tsx and the component file. The localStorage keys will
self-prune on next visit.
2. WhatsApp/Slack/Twitter link previews were pulling the TurboStarter
boilerplate opengraph-image.png (from Jan 8). Replaced with a 1200×630
claudemesh OG: "CLAUDEMESH" pixel wordmark left side, hero mesh
composition (6 Claude Code terminals + pixel-crab hub + orange
energy lattice + vaporwave grid floor) right side, "peer mesh for
Claude Code sessions" tagline in mono beneath wordmark.
3. Default metadata description swapped from the dangling
`common:product.description` i18n key (which rendered as the key
itself because the key doesn't exist in our trimmed translations)
to a hardcoded claudemesh description.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Populates mesh.mesh.owner_pubkey for pre-18c rows by generating a
fresh ed25519 keypair per mesh + emitting the secret key to stdout
for out-of-band hand-off.
Idempotent: only patches rows WHERE owner_pubkey IS NULL. Machine-
readable output (tab-separated: mesh_id, slug, pubkey, secret_key)
so operators can pipe into a secure store.
Usage:
DATABASE_URL=... bun apps/broker/scripts/backfill-owner-pubkey.ts > owners.tsv
# then securely distribute secrets to mesh owners
Verified locally: nulled smoke-test mesh's owner_pubkey → ran backfill
→ fresh keypair written, secret emitted.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 16 (account / profile) landed smaller than scoped because
turbostarter already ships the full /dashboard/settings flow (avatar, name,
email, language, delete-account) and BetterAuth handles security +
sessions out of the box. Reuses that surface; adds the claudemesh-
specific bits only.
- GET /api/my/export — returns a JSON bundle of the user's profile,
meshes they own, meshes they belong to, invites they've issued, and
audit events from their OWNED meshes (privacy: don't leak events
from meshes merely joined). Limited to 5k audit rows.
- ExportData component on /dashboard/settings — button downloads the
bundle as claudemesh-export-<userId>-<YYYY-MM-DD>.json client-side.
- Sidebar (user group) "settings" label swapped to "account" to match
the Step 16 naming. Same /dashboard/settings route, same existing
i18n key ("account" was already in common.json).
No schema changes: user.name (BetterAuth) IS the mesh display name.
meshMember.displayName is the per-join override that lands from the
CLI at registration time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BetterAuth social providers (GitHub, Google, Apple) were already wired
on the server side at packages/auth/src/server.ts. Env vars
GITHUB_CLIENT_ID/SECRET + GOOGLE_CLIENT_ID/SECRET already present in
.env.example + .env.production.template. The SocialProviders component
at apps/web/src/modules/auth/form/social-providers.tsx already renders
the buttons.
The only missing piece was trimming the provider list — we had Apple in
config/auth.ts but no plan to ship Apple for v0.1.0. Drop it.
Add docs/oauth-setup.md with step-by-step wiring for:
- GitHub OAuth app (Homepage + callback URLs)
- Google OAuth client (authorized origins + redirect URIs)
- Production env propagation
- Troubleshooting (redirect_uri_mismatch, invalid_client, etc)
User action required: create the GitHub OAuth app + add claudemesh.com
redirect to the existing Google OAuth client in GCP project
surfquant-490521, then populate the 4 env vars in production.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Five new routes under /dashboard/(user)/*:
- /dashboard/meshes — card grid of user's meshes with myRole badge,
memberCount, tier, archived state. Empty state with "Create first mesh"
CTA.
- /dashboard/meshes/[id] — mesh detail (members list + active invites)
with "Generate invite link" CTA in header.
- /dashboard/meshes/new — placeholder route for create form (form lands
in next commit).
- /dashboard/meshes/[id]/invite — placeholder route for invite generator
(generator lands in next commit).
- /dashboard/invites — table of invites the user has issued across all
meshes, with derived status (active/revoked/expired/exhausted).
Sidebar nav (user group) extended with Meshes + Invites entries. paths
config extended with dashboard.user.meshes.{index,new,mesh,invite} and
dashboard.user.invites.
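
The derived invite status (active/revoked/expired/exhausted) could be computed along these lines. This is a sketch under assumptions: the row fields (`revokedAt`, `expiresAt`, `maxUses`, `usedCount`) and the precedence between states are hypothetical and may not match the real schema.

```typescript
type InviteStatus = "active" | "revoked" | "expired" | "exhausted";

// Hypothetical row shape; the real invite columns may differ.
interface InviteRow {
  revokedAt: Date | null;
  expiresAt: Date;
  maxUses: number;
  usedCount: number;
}

// Assumed precedence: revoked wins over expired, expired over exhausted.
function deriveInviteStatus(invite: InviteRow, now: Date = new Date()): InviteStatus {
  if (invite.revokedAt !== null) return "revoked";
  if (invite.expiresAt.getTime() <= now.getTime()) return "expired";
  if (invite.usedCount >= invite.maxUses) return "exhausted";
  return "active";
}
```

Deriving the status at read time keeps it out of the database, so there is no stored status column that can drift from the underlying timestamps and counters.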
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New /my/* Hono router scoped by session.user.id. User can only see meshes
they own OR have a non-revoked meshMember row for. All 7 endpoints guard
authz at the query level (ownerUserId = userId OR EXISTS membership).
- GET /my/meshes — paginated list with myRole, isOwner, memberCount
- POST /my/meshes — create mesh (slug collision check, returns id + slug)
- GET /my/meshes/:id — detail (mesh + members + invites)
- POST /my/meshes/:id/invites — generate ic://join/<base64url(JSON)> link.
Matches apps/cli/src/invite/parse.ts format exactly. mesh_root_key is a
deterministic sha256(mesh.id:slug) placeholder until Step 18 ed25519
signing lands.
- POST /my/meshes/:id/archive — owner-only
- POST /my/meshes/:id/leave — member self-removal (sets revokedAt)
- GET /my/invites — list invites this user has issued
Schemas live in packages/api/src/schema/mesh-user.ts. All enums mirror
the DB enums from packages/db/src/schema/mesh.ts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WS handshake is now authenticated end-to-end. The broker proves that
every connected peer actually holds the secret key for the pubkey
they claim as identity — not just that they know the pubkey.
wire format change:
{type:"hello", meshId, memberId, pubkey, sessionId, pid, cwd,
timestamp, signature}
where signature = ed25519_sign(canonical, secretKey)
and canonical = `${meshId}|${memberId}|${pubkey}|${timestamp}`
broker verifies on every hello:
1. timestamp within ±60s of broker clock → else close(1008, timestamp_skew)
2. pubkey is 64 hex chars, signature is 128 hex chars → else malformed
3. crypto_sign_verify_detached(signature, canonical, pubkey) → else bad_signature
4. (existing) mesh.member row exists for (meshId, pubkey) → else unauthorized
All rejection paths close the WS with code 1008 + structured error
message + metrics counter increment (connections_rejected_total by
reason).
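
As an illustration of checks 1 and 2 above, here is a minimal sketch of the pre-signature validation. `canonicalHello` and `HELLO_SKEW_MS` are names taken from this commit, but their bodies here are assumptions, as is the millisecond unit for `timestamp` and the `precheckHello` helper itself. Step 3 (crypto_sign_verify_detached) and step 4 (member lookup) are not shown.

```typescript
const HELLO_SKEW_MS = 60_000;

interface Hello {
  meshId: string;
  memberId: string;
  pubkey: string;     // hex, 32 bytes
  timestamp: number;  // assumed: milliseconds since epoch
  signature: string;  // hex, 64 bytes
}

type HelloRejection = "timestamp_skew" | "malformed";

// The exact string the client signs and the broker verifies.
function canonicalHello(h: Hello): string {
  return `${h.meshId}|${h.memberId}|${h.pubkey}|${h.timestamp}`;
}

// Returns a rejection reason, or null if the hello may proceed to
// signature verification.
function precheckHello(h: Hello, nowMs: number): HelloRejection | null {
  if (Math.abs(nowMs - h.timestamp) > HELLO_SKEW_MS) return "timestamp_skew";
  if (!/^[0-9a-f]{64}$/i.test(h.pubkey)) return "malformed";
  if (!/^[0-9a-f]{128}$/i.test(h.signature)) return "malformed";
  return null;
}
```

Running the cheap shape checks before the signature check means malformed input never reaches the crypto library.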
new modules:
- apps/broker/src/crypto.ts: canonicalHello, verifyHelloSignature,
HELLO_SKEW_MS constant
- apps/cli/src/crypto/hello-sig.ts: matching signHello helper
clients updated:
- apps/cli/src/ws/client.ts: signs hello before send
- apps/broker/scripts/{peer-a,peer-b}.ts (smoke-test): sign hellos
with seed-provided secret keys
new regression tests — tests/hello-signature.test.ts (7):
- valid signature accepted
- bad signature (signed with wrong key) rejected
- timestamp too old rejected (>60s)
- timestamp too far in future rejected (>60s)
- tampered canonical field (different meshId at verify time) rejected
- malformed hex pubkey rejected
- malformed signature length rejected
verified live:
- apps/broker/scripts/smoke-test.sh: full hello+ack+send+push flow
- apps/cli/scripts/roundtrip.ts: signed hello + encrypted message
- 55/55 tests pass
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
hono rpc + tanstack query type inference is whack-a-mole across
new admin backoffice + dashboard. runtime compiles fine; only
type-checker yells. ship now, fix types post-launch.
tracked as ts-debt post v0.1.0
The fixed full-viewport overlay had overflow-auto AND pointer-events-none,
creating a scroll container that intercepted wheel events on hover in some
browsers — even though it was supposed to be click-through. Any viewport
< lg (1024px) broke page scroll when hovering anywhere above the fold.
Move overflow-y-auto + max-h-full to the inner panel (where it actually
needs to scroll for long nav lists) and keep the outer container purely
as a pointer-events-none positioning wrapper.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Direct messages between peers are now end-to-end encrypted. The
broker only ever sees {nonce, ciphertext} — plaintext lives on the
two endpoints.
apps/cli/src/crypto/envelope.ts:
- encryptDirect(message, recipientPubkeyHex, senderSecretKeyHex)
→ {nonce, ciphertext} via crypto_box_easy, 24-byte fresh nonce
- decryptDirect(envelope, senderPubkeyHex, recipientSecretKeyHex)
→ plaintext or null (null on MAC failure / malformed input)
- ed25519 keys (from Step 17) are converted to X25519 on the fly via
crypto_sign_ed25519_{pk,sk}_to_curve25519 — one signing keypair
covers both signing + encryption roles.
BrokerClient.send():
- if targetSpec is a 64-hex pubkey → encrypt via crypto_box
- else (broadcast "*" or channel "#foo") → base64-wrapped plaintext
(shared-key encryption for channels lands in a later step)
InboundPush now carries:
- plaintext: string | null (decrypted body, null if decryption failed
OR it's a non-direct message)
- kind: "direct" | "broadcast" | "channel" | "unknown"
MCP check_messages formatter reads plaintext directly.
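
The targetSpec routing described above might look roughly like this. The `classifyTarget` helper is hypothetical; only the rules (64-hex pubkey means direct, "*" means broadcast, "#foo" means channel) and the `kind` values come from this commit. Mapping everything else to "unknown" is an assumption.

```typescript
type MessageKind = "direct" | "broadcast" | "channel" | "unknown";

// Decide how a message would be wrapped based on its target spec.
function classifyTarget(targetSpec: string): MessageKind {
  if (/^[0-9a-f]{64}$/i.test(targetSpec)) return "direct";     // encrypt via crypto_box
  if (targetSpec === "*") return "broadcast";                  // base64-wrapped plaintext
  if (targetSpec.startsWith("#") && targetSpec.length > 1) return "channel";
  return "unknown";
}
```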
side-fixes pulled in during 18a:
- apps/broker/scripts/seed-test-mesh.ts now generates real ed25519
keypairs (the previous "aaaa…" / "bbbb…" fillers weren't valid
curve points, so crypto_sign_ed25519_pk_to_curve25519 rejected
them). Seed output now includes secretKey for each peer.
- apps/broker/src/broker.ts drainForMember wraps the atomic claim in
a CTE + outer ORDER BY so FIFO ordering is SQL-sourced, not
JS-sorted (Postgres microsecond timestamps collapse to the same
Date.getTime() milliseconds otherwise).
- vitest.config.ts fileParallelism: false — test files share
DB state via cleanupAllTestMeshes afterAll, so running them in
parallel caused one file's cleanup to race another's inserts.
- integration/health.test.ts "returns 200" now uses waitFullyHealthy
(a 200-only waiter) instead of waitHealthyOrAny — prevents a race
with the startup DB ping.
verified live:
- apps/cli/scripts/roundtrip.ts (direct A→B): ciphertext in DB is
opaque bytes (not base64-plaintext), decrypted correctly on arrival
- apps/cli/scripts/join-roundtrip.ts (full join → encrypted send):
PASSED
- 48/48 broker tests green
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Five new admin routes backed by the mesh API modules:
- /admin/meshes — paginated data-table (name, owner, tier, transport,
members, created). Tier + transport multiSelect filters.
- /admin/meshes/[id] — detail page: owner row + 4 live sections
(members, presences, invites, last 50 audit events).
- /admin/sessions — live Claude Code WS presences. Status filter,
pulse dot for working sessions, disconnected badge.
- /admin/invites — invite tokens w/ status derived client-side
(active/revoked/expired/exhausted).
- /admin/audit — metadata-only event log, event-type + mesh + date
filters.
Overview page at /admin rewritten to 6 summary cards (users, orgs,
customers, meshes, sessions, messages 24h) joining the base
/admin/summary and /admin/summary/mesh endpoints.
Sidebar navigation gains a second "mesh" group with the four new entries.
paths.ts extended with admin.meshes / sessions / invites / audit.
All UI reuses @turbostarter/ui-web/data-table — columns.tsx + thin
*-data-table.tsx wrapper per the existing users pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tiny tsx script that flips user.role to "admin" via BetterAuth's admin
plugin convention (role column on the existing user table, not a custom
isAdmin boolean).
Wired through packages/auth → root package.json with the same env-sourcing
pattern as auth:seed.
Usage:
pnpm admin:grant me@example.com
Extends the Hono adminRouter with four new read-only mesh admin modules:
meshes, sessions, invites, audit. Each ships {queries,router}.ts following
the existing users/organizations/customers pattern (paginated Drizzle
transactions, getOrderByFromSort sorting, ilike search, enum filters).
- GET /admin/meshes — paginated list with owner join + member count subquery
- GET /admin/meshes/:id — detail: members, presences, invites, last 50 audit
events (returns {mesh: null,...} shell on not-found to stay single-shape
for Hono RPC inference)
- GET /admin/sessions — live WS presences across every mesh, joined to
member/mesh for display, status + active/disconnected filters
- GET /admin/invites — invite tokens w/ mesh + createdBy user joins,
revoked/expired filters
- GET /admin/audit — mesh audit log with eventType/meshId/date filters
Summary endpoint extended: new GET /admin/summary/mesh returns
{meshes, activeMeshes, totalPresences, activePresences, messages24h}.
Messages24h derived from audit_log where event_type='message_sent'
in the past 24h.
Schemas live in packages/api/src/schema/mesh-admin.ts, re-exported from
the schema barrel. All mesh/role/transport enums mirror the DB enums
from packages/db/src/schema/mesh.ts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Hono RPC client loses the response shape on /organizations/:id because
the route has no zod response validator on c.json(). Tactical cast at the
callsite unblocks the web Docker build. Proper fix is to add a
getOrganizationResponseSchema in packages/api and wire it into the route.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
drainForMember previously ran SELECT undelivered rows, THEN UPDATE
delivered_at. Two concurrent callers (e.g. WS fan-out on send +
handleHello's own drain for the target) could both SELECT the same
row before either UPDATEd, pushing the same envelope twice.
now: single atomic UPDATE ... FROM member ... WHERE id IN (
SELECT id ... FOR UPDATE SKIP LOCKED
) RETURNING mq.*, m.peer_pubkey AS sender_pubkey.
FOR UPDATE SKIP LOCKED is the key primitive — concurrent callers
each claim DISJOINT sets, so a message can never be drained twice.
Union of all concurrent drains still covers every eligible row.
re-sorts RETURNING rows by created_at client-side (Postgres makes no
FIFO guarantee on the RETURNING clause's output order), and normalizes
created_at to Date since raw-sql results can come back as ISO strings.
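
A minimal sketch of the client-side normalize-and-sort step, assuming a simplified row shape (`id`, `created_at`); the real rows carry more columns and the helper name is hypothetical.

```typescript
// Raw-SQL results may return created_at as either a Date or an ISO string.
interface RawDrainedRow {
  id: string;
  created_at: Date | string;
}

interface DrainedRow {
  id: string;
  created_at: Date;
}

// Normalize created_at to Date, then sort oldest-first (FIFO).
// Array.prototype.sort is stable, so rows with equal timestamps keep
// their incoming relative order.
function normalizeAndSortFifo(rows: RawDrainedRow[]): DrainedRow[] {
  return rows
    .map((r) => ({
      id: r.id,
      created_at: r.created_at instanceof Date ? r.created_at : new Date(r.created_at),
    }))
    .sort((a, b) => a.created_at.getTime() - b.created_at.getTime());
}
```

Note that a JavaScript Date only has millisecond resolution, so two rows whose Postgres timestamps differ by microseconds compare as equal here.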
regression: tests/dup-delivery.test.ts (4 tests)
- two concurrent drains produce disjoint result sets
- six concurrent drains partition cleanly (20 messages, each drained once)
- subsequent drain after success returns empty
- FIFO ordering preserved within a single drain
48/48 tests pass. Live round-trip no longer logs the double-push.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
End-to-end join: user runs `claudemesh join ic://join/<base64>` and
walks away with a signed member record + persistent keypair.
new modules:
- src/crypto/keypair.ts: libsodium ed25519 keypair generation. Format
is crypto_sign_keypair raw bytes, hex-encoded (32-byte pub, 64-byte
secret = seed || pub). Same format libsodium will need in Step 18
for sign/verify.
- src/invite/parse.ts: ic://join/<base64url(JSON)> parser with Zod
shape validation + expiry check. encodeInviteLink helper for tests.
- src/invite/enroll.ts: POST /join to broker, converts ws:// to http://
transparently.
rewritten join command wires them together:
1. parse invite → 2. generate keypair → 3. POST /join → 4. persist
config → 5. print success.
state/config.ts: saveConfig now chmods the file to 0600 after write,
since it holds ed25519 secret keys. No-op on Windows.
signature verification (step 18) + invite-token one-time-use tracking
are deferred. For now the invite link is a plain bearer token; any
client with the link can join.
verified end-to-end via apps/cli/scripts/join-roundtrip.ts:
build invite → run join subprocess → load new config → connect as
new member → send A→B → receive push. Flow passes.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single HTTP POST /join the CLI calls after parsing an invite link +
generating an ed25519 keypair client-side. Broker validates the mesh
exists + is not archived, inserts a mesh.member row (or returns the
existing id for idempotency), returns {ok, memberId, alreadyMember?}.
body: {mesh_id, peer_pubkey, display_name, role}
- peer_pubkey must be 64 hex chars (32 bytes)
- role is "admin" | "member"
v0.1.0 trusts the request — no invite-token validation, no ed25519
signature check. Both land in Step 18 alongside libsodium wrapping.
size cap enforced via MAX_MESSAGE_BYTES (shared with hook endpoint).
structured log line per enrollment with truncated pubkey + whether
it was a new member or re-enrolled existing one.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
broker-client: full WS client with hello handshake + ack, auto-reconnect
with exponential backoff (1s → 30s capped), in-memory outbound queue
(max 100) during reconnect, 500-entry push buffer for check_messages.
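
The reconnect schedule and the bounded outbound queue can be sketched as below. Assumptions are labeled in the comments: the doubling factor, and the policy of rejecting new items when the queue is full, are guesses; the commit only states the 1s to 30s range and the 100-entry cap.

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

// Exponential backoff capped at 30s. The factor of 2 is an assumption.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Bounded FIFO queue for messages sent while the socket is down.
// Overflow policy (reject the new item) is an assumption.
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  push(item: T): boolean {
    if (this.items.length >= this.capacity) return false;
    this.items.push(item);
    return true;
  }

  // Hand back everything queued, oldest first, and reset the queue.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }

  get size(): number {
    return this.items.length;
  }
}
```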
MCP tool integration:
- send_message: "slug:target" prefix or single-mesh fast path
- check_messages: drains push buffers across all clients
- set_status: fans manual override across all connected meshes
- set_summary: stubbed (broker protocol extension needed)
- list_peers: stubbed — lists connected mesh slugs + statuses
manager module holds Map<meshId, BrokerClient>, starts on MCP server
boot for every joined mesh in ~/.claudemesh/config.json.
new CLI command: seed-test-mesh injects a mesh row for dev testing.
also fixes a broker-side hello race: handleHello sent hello_ack before
the caller closure assigned presenceId, so clients sending right after
the ack hit the no_hello check. Fix: return presenceId, caller sets
closure var, THEN sends hello_ack. Queue drain is fire-and-forget now.
round-trip verified: two clients, A→B, push received with correct
senderPubkey + ciphertext. 44/44 broker tests still pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds 23 tests across 4 files, taking total broker coverage from
21 → 44 passing in ~2.5s.
Unit tests (no I/O):
- tests/rate-limit.test.ts (6): TokenBucket capacity, refill rate,
no-overflow cap, independent buckets per key, sweep GC.
- tests/metrics.test.ts (5): all 10 series present in /metrics,
counter increment semantics, labelled series produce distinct lines,
gauge set overwrites, Prometheus format well-formed.
- tests/logging.test.ts (5): JSON per line, required fields (ts, level,
component, msg), context merging, level preservation, no plain-text
escape hatches.
Integration tests (spawn real broker subprocesses on random ports):
- tests/integration/health.test.ts (7):
* GET /health 200 + {status, db, version, gitSha, uptime} (healthy DB)
* GET /health 503 + {status:degraded, db:down} (unreachable DB)
* GET /metrics 200 text/plain with all expected series
* GET /nope → 404
* POST /hook/set-status oversized body → 413
* POST /hook/set-status 6th req/min → 429
* Rate limit isolation by (pid, cwd) key
Integration tests use node:child_process (vitest runs under Node, not
Bun — Bun.spawn isn't available). Each suite spawns its own broker
subprocess with a random port + tailored env vars.
Not yet covered (flagged for follow-up):
- WebSocket connection caps (needs seeded mesh + WS client setup)
- WebSocket message-size rejection (ws.maxPayload behavior)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Authoritative runtime contract for the broker. Documents:
- HTTP + WS routes (single-port architecture)
- Required + optional env vars (DATABASE_URL, caps, TTLs, limits)
- /health and /metrics semantics, including 503 behavior on DB drop
- SIGTERM/SIGINT graceful shutdown sequence
- Recommended multi-stage Docker build (node:slim for pnpm, oven/bun
for runtime) with GIT_SHA build-arg convention
- Signal/grace-period guidance for orchestrators
- Prometheus metric names + suggested alert thresholds
- CI pattern for the test suite (needs a live Postgres)
- Deployment target hand-off to the deploy lane
Complements the existing Dockerfile (claudemesh-3's work) with the
runtime contract the Dockerfile implements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the minimum ops surface area for a production broker without
over-engineering. All new config knobs are env-var driven with sane
defaults.
New modules:
- logger.ts: structured JSON logs (one line, stderr, ready for
Loki/Datadog ingestion without preprocessing)
- metrics.ts: in-process Prometheus counters + gauges, exposed at
GET /metrics. Tracks connections, messages, queue depth, TTL
sweeps, hook requests, DB health.
- rate-limit.ts: token-bucket rate limiter keyed by (pid, cwd).
Applied to POST /hook/set-status at 30/min default.
- db-health.ts: Postgres ping loop with exponential-backoff retry.
GET /health returns 503 while DB is down.
- build-info.ts: version + gitSha (from GIT_SHA env or `git rev-parse`
fallback) + uptime, surfaced on /health.
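
For readers unfamiliar with the pattern, here is a minimal token-bucket sketch keyed by (pid, cwd). It is not the code in rate-limit.ts: the class shape, the `take` method, the key format, and the injected clock are all assumptions made for illustration.

```typescript
interface BucketState {
  tokens: number;
  lastMs: number;
}

class TokenBucketSketch {
  private buckets = new Map<string, BucketState>();
  private readonly capacity: number;
  private readonly perMinute: number;

  constructor(capacity: number, perMinute: number) {
    this.capacity = capacity;
    this.perMinute = perMinute;
  }

  // Returns true if the request identified by `key` is allowed at `nowMs`.
  take(key: string, nowMs: number): boolean {
    const b = this.buckets.get(key) ?? { tokens: this.capacity, lastMs: nowMs };
    const elapsed = Math.max(0, nowMs - b.lastMs);
    // Refill proportionally to elapsed time, never above capacity.
    b.tokens = Math.min(this.capacity, b.tokens + (elapsed * this.perMinute) / 60_000);
    b.lastMs = nowMs;
    const allowed = b.tokens >= 1;
    if (allowed) b.tokens -= 1;
    this.buckets.set(key, b);
    return allowed;
  }
}

// Hypothetical key format combining the two identifying fields.
function bucketKey(pid: number, cwd: string): string {
  return `${pid}:${cwd}`;
}
```

Passing the clock in as an argument keeps the limiter deterministic under test; a production version would also need periodic cleanup of idle buckets.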
Behavior changes:
- Connection caps: MAX_CONNECTIONS_PER_MESH (default 100). Exceed →
close(1008, "capacity") + metric increment.
- Message size: MAX_MESSAGE_BYTES (default 65536). WS applies it via
`ws.maxPayload`. Hook POST bodies cap out with 413.
- Structured logs everywhere replacing the old `log()` helper.
- Env validation stricter: DATABASE_URL required + regex-checked for
postgres:// prefix.
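
A sketch of env-driven config loading with the defaults named above. The function and field names are hypothetical. The regex accepts both `postgres://` and `postgresql://`, which is an assumption based on the test instructions elsewhere in this log using a `postgresql://` URL.

```typescript
interface BrokerConfig {
  databaseUrl: string;
  maxConnectionsPerMesh: number;
  maxMessageBytes: number;
}

// Parse a positive integer env var, falling back to a default when unset.
function intFromEnv(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function loadBrokerConfig(env: Record<string, string | undefined>): BrokerConfig {
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl || !/^postgres(ql)?:\/\//.test(databaseUrl)) {
    throw new Error("DATABASE_URL is required and must start with postgres:// or postgresql://");
  }
  return {
    databaseUrl,
    maxConnectionsPerMesh: intFromEnv("MAX_CONNECTIONS_PER_MESH", env.MAX_CONNECTIONS_PER_MESH, 100),
    maxMessageBytes: intFromEnv("MAX_MESSAGE_BYTES", env.MAX_MESSAGE_BYTES, 65_536),
  };
}
```

In a real process this would be called once at boot as `loadBrokerConfig(process.env)` so that a bad value fails startup rather than surfacing later.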
New endpoints:
- GET /health → {status, db, version, gitSha, uptime}. 503 if DB down.
- GET /metrics → Prometheus text format.
Verified: 21/21 tests still pass. Hit /health + /metrics live —
gitSha resolves correctly via `git rev-parse --short HEAD` in dev.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
21 integration tests (14 broker behavior + 7 path encoding), all
passing in ~1s against a real Postgres (claudemesh_test database on
the dev container).
Test infrastructure:
- apps/broker/vitest.config.ts extends @turbostarter/vitest-config/base
- tests/helpers.ts: setupTestMesh() creates a fresh mesh + 2 members
per test with a unique slug, returns cleanup function that cascades
the delete. cleanupAllTestMeshes() as an afterAll safety net.
- Mesh isolation in broker logic means tests don't interfere even when
they share a database — no per-test TRUNCATE needed.
Ported behavior tests (broker.test.ts, 14 tests):
- hook flips status + queued "next" messages unblock
- "now"-priority bypasses the working gate
- DND is sacred (hooks cannot unset it)
- hook source stays fresh through jsonl refresh
- source decays to jsonl when hook signal goes stale
- isHookFresh freshness window + source-type rules
- TTL sweep flips stuck "working" → idle
- TTL sweep leaves DND alone
- first-turn race: hook fired pre-connect stashed in pending_status
- applyPendingHookStatus picks newest matching entry
- expired pending entries are ignored on connect
- broadcast targetSpec (*) reaches all members
- pubkey mismatch → message not drained
- mesh isolation: peer in mesh X doesn't drain from mesh Y
Ported encoding tests (encoding.test.ts, 7 tests):
- macOS, Linux, Windows path encoding first-candidate correctness
- Roberto's H:\Claude → H--Claude regression test (2026-04-04)
- Candidate dedup, drive-stripped fallback, leading-dash fallback
How to run: from apps/broker,
DATABASE_URL="postgresql://.../claudemesh_test" pnpm test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 3 pruned packages/{ai,cms,cognitive-context} but left whole
route groups + feature modules that depended on them. Those files
were unbuildable since that prune. Removes them now so the workspace
can be validated:
Route groups:
- apps/web/src/app/[locale]/(apps)/{chat,image,pdf,tts}/
- apps/web/src/app/[locale]/(marketing)/blog/
Feature modules:
- apps/web/src/modules/{chat,image,pdf,tts,common/ai,marketing/blog}/
- packages/api/src/modules/ai/ (chat, image, pdf, stt, tts, router)
3 stragglers remain (separate handoff to claudemesh-2):
- apps/web/src/app/[locale]/(marketing)/legal/[slug]/page.tsx (cms)
- apps/web/src/app/sitemap.ts (cms)
- apps/web/src/modules/common/layout/credits/index.tsx (ai)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
chat/image/mesh modules all exported a generic `const schema`
binding. When packages/db/src/schema/index.ts did `export * from
"./chat"` + `export * from "./image"` + `export * from "./mesh"`,
TypeScript's ambiguous-re-export rule silently dropped the colliding
bindings — drizzle-kit's introspection could not find the pgSchema
instances, so CREATE SCHEMA statements were never emitted. The
migration worked on the prior dev DB only because chat/image already
existed from an earlier turbostarter run; a fresh clone would fail.
pdf.ts already used `pdfSchema` (unique name). Applied the same
pattern everywhere:
- chat.ts: `export const chatSchema = pgSchema("chat")`
- image.ts: `export const imageSchema = pgSchema("image")`
- mesh.ts: `export const meshSchema = pgSchema("mesh")`
Also added `CREATE EXTENSION IF NOT EXISTS vector` at the top of the
migration (pgvector is used by pdf.embedding — the generated
migration assumed it was pre-enabled).
Verified end-to-end against a fresh pgvector/pgvector:pg17 container:
`pnpm drizzle-kit migrate` applies cleanly from scratch, all 7 mesh.*
tables + chat/image/pdf/mesh schemas created correctly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
**Source:** Reverse-engineered from `@posthog/wizard` (npm cache), applied to `apps/cli/src/commands/launch.ts`
## Why
Launch wizard has three compounding problems:
1. **Imperative branching**: `launch.ts` checks account → mesh → name → role → exec in hardcoded order. Adding a screen requires touching existing code. Hard to reason about `--resume`, `--non-interactive`, and skip conditions.
2. **Terminal bleed-through on handoff**: wizard→`claude` exec corrupts Ink's TUI state (garbled word wraps, tool labels overwritten, spinner fragments fused to paths). Root cause is spread across multiple exit paths instead of one choke point.
3. **Inconsistent visual design**: ad-hoc colors per file, no central palette, no shared icon set, no shared layout primitives. Every screen reinvents status rows, centering, and spacing.
PostHog's wizard solves all three with one architectural pattern: **declarative flow pipelines + session-as-store + shared visual primitives**. This artifact captures the plan to port that pattern.
The router walks the array, skips entries where `show(s) === false` or `isComplete(s) === true`, and returns the first remaining entry. Zero switch statements. Zero hardcoded transitions. Adding a screen = appending an object.
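
A minimal sketch of that resolution rule, assuming a flow entry shape with optional `show` and `isComplete` predicates. The session fields and screen ids below are illustrative, not the real ones.

```typescript
interface FlowEntry<S> {
  id: string;
  show?: (s: S) => boolean;
  isComplete?: (s: S) => boolean;
}

// Return the first entry that is visible and not yet complete, or null
// when the flow has nothing left to ask.
function resolveNextScreen<S>(flow: FlowEntry<S>[], session: S): FlowEntry<S> | null {
  for (const entry of flow) {
    if (entry.show && entry.show(session) === false) continue;
    if (entry.isComplete && entry.isComplete(session) === true) continue;
    return entry;
  }
  return null;
}

// Illustrative session and flow; field names are assumptions.
interface LaunchSession {
  loggedIn: boolean;
  mesh?: string;
  name?: string;
}

const launchFlow: FlowEntry<LaunchSession>[] = [
  { id: "account", isComplete: (s) => s.loggedIn },
  { id: "mesh", isComplete: (s) => s.mesh !== undefined },
  { id: "name", isComplete: (s) => s.name !== undefined },
];
```

Adding a screen under this scheme means appending one object to `launchFlow`; the router itself stays untouched.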
### Overlay stack
Separate from the linear flow cursor. Interrupts (port conflict, auth expired, managed settings) are pushed onto `overlays[]` from anywhere and popped when dismissed. Active screen = top of overlay stack OR flow cursor. Flows never need to know about interrupts.
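
As a rough sketch of that rule, with hypothetical class and method names and overlay ids taken from the examples above:

```typescript
class OverlayStack {
  private overlays: string[] = [];

  // Interrupts can be pushed from anywhere.
  push(id: string): void {
    this.overlays.push(id);
  }

  // Dismissing pops the topmost overlay, if any.
  dismiss(): string | undefined {
    return this.overlays.pop();
  }

  // Active screen = top of the overlay stack, else the flow cursor.
  activeScreen(flowCursor: string | null): string | null {
    const top = this.overlays[this.overlays.length - 1];
    return top !== undefined ? top : flowCursor;
  }
}
```

Because the flow cursor is passed in rather than stored, the flow definition never has to reference an overlay.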
### Session as single source of truth
One `WizardStore` holds all session state. Screens subscribe via React 18 `useSyncExternalStore`. Completion predicates read session; imperative code writes session; the router re-resolves on every change.
`--resume <id>` populates the session from saved state; every satisfied predicate auto-skips. The wizard renders only the screens that still need input. No special `--resume` branches in screen code.
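
A framework-free sketch of such a store follows. The `subscribe` / `getSnapshot` pair matches the two arguments React's `useSyncExternalStore(subscribe, getSnapshot)` expects; everything else (the class body, the `update` method, the session fields in the usage note) is an assumption about how the port might look.

```typescript
type StoreListener = () => void;

class WizardStore<S extends object> {
  private listeners = new Set<StoreListener>();
  private state: S;

  // `initial` can come from defaults or from saved state (--resume).
  constructor(initial: S) {
    this.state = initial;
  }

  // Arrow properties keep `this` bound when passed to a hook.
  subscribe = (listener: StoreListener): (() => void) => {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  };

  // Must return the same reference until the state actually changes.
  getSnapshot = (): S => this.state;

  // Replace the state object (never mutate it) and notify subscribers.
  update(patch: Partial<S>): void {
    this.state = { ...this.state, ...patch };
    this.listeners.forEach((listener) => listener());
  }
}
```

A screen would read it with `useSyncExternalStore(store.subscribe, store.getSnapshot)`, and imperative code would call `store.update({ mesh: "demo" })`; the router can subscribe the same way to re-resolve on every change.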
### `--non-interactive` works for free
Non-interactive mode: walk the flow, for each incomplete entry check if its required session fields can be sourced from CLI flags. If yes, populate and continue. If no, **fail fast with a clear message** naming the missing flag. Never silently guess defaults.
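
One way this walk could be written, as a sketch: each step names the flag that can satisfy it, and the runner throws on the first step that is neither complete nor flag-satisfiable. The step shape and helper name are hypothetical; `--mesh` and `--name` are used only because they appear elsewhere in this plan.

```typescript
interface NiSession {
  mesh?: string;
  name?: string;
}

interface NiStep {
  id: string;
  flag: string; // CLI flag that can supply this step's value
  isComplete(s: NiSession): boolean;
  apply(s: NiSession, value: string): NiSession;
}

function runNonInteractive(
  flow: NiStep[],
  session: NiSession,
  flags: Record<string, string | undefined>,
): NiSession {
  let current = session;
  for (const step of flow) {
    if (step.isComplete(current)) continue;
    const value = flags[step.flag];
    if (value === undefined) {
      // Fail fast and name the missing flag; never guess a default.
      throw new Error(`non-interactive: step "${step.id}" requires --${step.flag}`);
    }
    current = step.apply(current, value);
  }
  return current;
}

const niFlow: NiStep[] = [
  { id: "mesh", flag: "mesh", isComplete: (s) => s.mesh !== undefined, apply: (s, v) => ({ ...s, mesh: v }) },
  { id: "name", flag: "name", isComplete: (s) => s.name !== undefined, apply: (s, v) => ({ ...s, name: v }) },
];
```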
PostHog only does SGR reset + clear + home on unmount — they don't hand off to another full-screen app, so that's enough for them. Claudemesh needs the full mode-reset because Claude Code takes over the TTY.
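
To make the difference concrete, the two reset levels might be expressed as escape-sequence strings like these. The SGR reset, clear, and home sequences are the three named above. Which additional modes the full reset must undo is an assumption; the ones listed are standard xterm private modes that a full-screen TUI commonly enables.

```typescript
const ESC = "\x1b";

// The minimal reset described above: SGR reset + clear screen + cursor home.
function basicReset(): string {
  return `${ESC}[0m${ESC}[2J${ESC}[H`;
}

// A fuller reset before handing the TTY to another full-screen app.
// The exact set of modes is an assumption, not the project's final list.
function fullModeReset(): string {
  return [
    `${ESC}[0m`,      // reset colors and text attributes
    `${ESC}[?1049l`,  // leave the alternate screen buffer
    `${ESC}[?25h`,    // show the cursor
    `${ESC}[?1000l`,  // disable mouse click tracking
    `${ESC}[?1002l`,  // disable mouse drag tracking
    `${ESC}[?1003l`,  // disable all-motion mouse tracking
    `${ESC}[?1006l`,  // disable SGR mouse encoding
    `${ESC}[?2004l`,  // disable bracketed paste
    `${ESC}[2J`,      // clear the screen
    `${ESC}[H`,       // move the cursor home
  ].join("");
}
```

Writing the result once, from a single exit path, is what gives the "one choke point" the plan asks for.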
Steps 1–5 are the atomic unit of value: they fix the bleed-through bug, establish the visual system, and unblock everything else. Should ship as one PR.
Steps 6–9 can each ship independently.
Step 10 is polish — defer until after v0.2.
## Open questions
- **Ink version**: does the current CLI use Ink 4.x? PostHog is on Ink 5 with `useSyncExternalStore`. Check `apps/cli/package.json` before porting the store pattern; Ink 4 may need a different subscription approach.
- **React version**: `useSyncExternalStore` is React 18+. Confirm.
- **Flow granularity**: should `Join` (paste invite) be a separate flow from `Launch`, or an overlay inside `Launch`? PostHog-style: separate flow triggered from the welcome screen. Simpler.
- **Resume semantics**: does `--resume <id>` resume the *Claude* session only, or also restore the wizard's last mesh/name/role choice? If the latter, need a `~/.claudemesh/sessions/<id>.json` alongside Claude's own session file.
**Purpose:** Exhaustive audit of what v1 ships today. **Every row in this document must still work after v2 lands.** v2 is a refactor + CLI user flows, NOT a functional rewrite; this inventory is the regression checklist.
| Command | Source file | Description | Flags |
|---|---|---|---|
| `sync` | `commands/sync.ts` | Sync meshes from the user's claudemesh.com dashboard account | `--force` |
| `profile` | `commands/profile.ts` | View or edit member profile (self or another member if admin) | `--mesh`, `--role-tag`, `--groups`, `--message-mode`, `--name`, `--member`, `--json` |
| `status` | `commands/status.ts` | Check broker connectivity for each joined mesh | (none) |
`apps/cli/src/index.ts` lines 339–355 implement a **friction reducer**: if the user types `claudemesh --resume xxx` or any flag-first invocation, the argv is rewritten to `claudemesh launch --resume xxx` before citty parses it. This lets users skip typing `launch` for common flag-only forms.
**Must preserve in v2.** Users may depend on this. Applies to `--resume`, `--continue`, `-y`, `--mesh`, `--name`, etc.
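A minimal sketch of the rewrite, assuming `argv` holds only the user-supplied arguments (everything after the binary name). Whether the real code exempts flags such as `--help` or `--version` is not stated here, so the sketch does not special-case them.

```typescript
// Rewrites a flag-first invocation to the `launch` subcommand before parsing.
function rewriteArgv(argv: string[]): string[] {
  const first = argv[0];
  // Flag-first: route to `launch` and keep every flag exactly as typed.
  if (first !== undefined && first.startsWith("-")) {
    return ["launch", ...argv];
  }
  // Bare command or explicit subcommand: leave untouched.
  return argv;
}
```

For example, `rewriteArgv(["--resume", "xxx"])` yields `["launch", "--resume", "xxx"]`, while `["join", "<url>"]` and an empty argv pass through unchanged.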
---
## 2. MCP tools (79 total)
Defined in `apps/cli/src/mcp/tools.ts` with schemas, implemented in `apps/cli/src/mcp/server.ts` with per-tool case handlers. Each MCP tool is an RPC that the CLI's MCP server handles locally or forwards to the broker via WS.
Grouped by domain family. Every tool listed here has a working handler in v1.
### 2.1 Messaging (4)
| Tool | v1 behavior |
|---|---|
| `send_message` | Send encrypted message to peer, group, or broadcast. Supports priorities: `now` (immediate), `next` (default), `low`. Broker queues if recipient offline. |
| `list_peers` | List connected peers in the mesh with `presenceId`, `displayName`, `status`, `summary`, `groups`, `roleTag`. |
| `message_status` | Query delivery state of a sent message by `messageId`. |
| `share_file` | Upload a file to MinIO. Supports `to: <peer>` for E2E encryption (symmetric key wrapped with peer pubkey), or mesh-wide sharing. Supports `persistent` vs `ephemeral` storage. |
| `get_file` | Download a file by `fileId`. Returns a presigned MinIO URL. |
| `list_files` | List files in the mesh by `scope`, `tags`, author. |
| `file_status` | Query status of a file: who downloaded, when. |
| `delete_file` | Delete a file (owner only). |
| `grant_file_access` | Add another peer as a recipient of an already-encrypted file (re-wraps symmetric key). |
| `read_peer_file` | Read a file from another peer's working directory (requires peer online + sharing). |
| `list_peer_files` | List files in a peer's shared directory (tree of names, not contents). |
### 2.7 Vectors (Qdrant) (4)
| Tool | v1 behavior |
|---|---|
| `vector_store` | Store embedding with metadata in a named collection. |
| `vector_search` | Nearest-neighbor search in a collection with `limit`. |
| `vector_delete` | Delete a vector by ID. |
| `list_collections` | List collections in the mesh's Qdrant namespace. |
### 2.8 Graph (Neo4j) (2)
| Tool | v1 behavior |
|---|---|
| `graph_query` | Read-only Cypher MATCH query on the per-mesh Neo4j database. |
| `ping_mesh` | Test messages through the full pipeline, measure round-trip per priority. Diagnoses push delivery issues. |
### 2.15 Mesh clock — write (3)
| Tool | v1 behavior |
|---|---|
| `mesh_set_clock` | Set simulation clock speed (1–100x). Peers receive heartbeat ticks at the simulated rate. |
| `mesh_pause_clock` | Pause simulation clock. |
| `mesh_resume_clock` | Resume paused clock. |
### 2.16 Skills (5)
| Tool | v1 behavior |
|---|---|
| `share_skill` | Publish a reusable skill (name + description + instructions + tags + when_to_use + allowed_tools + model + context + agent + user_invocable + argument_hint). Exposed as MCP prompts and `skill://` resources. |
| `get_skill` | Load a skill's full instructions by name. |
| `list_skills` | Browse available skills, optionally filter by keyword. |
| `remove_skill` | Remove a shared skill. |
| `mesh_skill_deploy` | Deploy a multi-file skill bundle from zip or git repo. |
### 2.17 MCP registry tier 1 — peer-hosted (4)
| Tool | v1 behavior |
|---|---|
| `mesh_mcp_register` | Register a peer's local MCP server with the mesh (server_name, description, tools schema, persistent flag). Other peers can invoke via `mesh_tool_call`. |
| `mesh_mcp_list` | List MCP servers in the mesh with their tools + hosting peer. |
| `mesh_tool_call` | Call a tool on a mesh-registered MCP server. Routes: caller → broker → hosting peer → execute → result back. 30s timeout. |
| `mesh_mcp_remove` | Unregister a peer-hosted MCP server. |
| `mesh_mcp_deploy` | Deploy an MCP server from zip (via `file_id`), git URL, or npx package. Runs on broker VPS in Docker sandbox. Scope: `peer` (default), `mesh`, or `{group/groups/role/peers}`. Runtime: node / python / bun. Memory, network_allow, env with `$vault:` references. |
| `mesh_mcp_undeploy` | Stop and remove a managed MCP server. |
| `mesh_mcp_logs` | Tail recent logs from a managed server. |
| `mesh_mcp_scope` | Get or set visibility scope. |
| `mesh_mcp_schema` | Inspect tool schemas for a deployed server. |
| `mesh_mcp_catalog` | List all deployed services with status, scope, tool count. |
### 2.19 Vault (3)
| Tool | v1 behavior |
|---|---|
| `vault_set` | Store encrypted credential. `type: env` (string, injected as env var via `$vault:<key>`) or `type: file` (file written to `mount_path` in container). |
| `vault_list` | List vault entries (keys + metadata only, no values). |
| `create_webhook` | Create an inbound webhook. Returns a URL external services (GitHub, CI/CD, monitoring) can POST to. Payload becomes a mesh message to all peers. |
| `list_webhooks` | List active webhooks. |
| `delete_webhook` | Deactivate by name. |
---
## 3. Broker WS protocol
`apps/broker/src/index.ts` dispatches 85 message types over a single WebSocket endpoint (`WS_PATH`). Each WS message is a client-initiated RPC; most of the 79 MCP tools above map 1:1 to a WS message. Some additional WS messages exist for connection lifecycle + internal routing.
### 3.1 Connection lifecycle (3)
- `hello` — client authentication. Ed25519 signature over `{meshId, memberId, pubkey, timestamp}`. Broker verifies, creates presence row, replies with `hello_ack`.
| `mesh.telegram_bridge` | Telegram bot registration per mesh. |
---
## 6. Broker backend services
Five external services the broker manages at runtime. All currently work in v1 and ship in the default Docker Compose deployment.
| Service | Purpose | File | Per-mesh model |
|---|---|---|---|
| **Postgres** (Drizzle) | Primary data store for mesh schema. Also used for `mesh_execute` / `mesh_query` / `mesh_schema` shared-SQL tools via per-mesh schemas. | `db.ts` | Schema-per-mesh for shared SQL tools |
- Bridge row persistence in `mesh.telegram_bridge`
**This is ~18% of the broker's total source**. v2 must either:
1. Port the logic into a standalone MCP connector (`apps/mcp-telegram/`), or
2. Keep this file in the broker and wire it into the v2 architecture unchanged (my recommendation per the previous conversation — bundled into the broker image)
Either way, **every behavior documented here must still work after v2 lands**.
- **Hello signatures**: Ed25519 signed tuple of `(meshId, memberId, pubkey, timestamp)`. Verified on every WS connection. Replay protection via timestamp window.
- **Invite verification**: canonical invite payload (`canonicalInvite`) signed by mesh owner, Ed25519 verified on claim
- **JWT**: for `/cli-sync` endpoint — the CLI obtains a JWT from `claudemesh.com` via browser flow, passes it to the broker, broker verifies and returns the user's mesh list
- Enforces limits on `send`, `vector_store`, `mesh_execute`, `mesh_mcp_deploy`, etc.
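The replay protection in the hello-signature bullet above can be sketched as a timestamp-window check plus a deterministic payload to sign. The window size, the millisecond unit, and the JSON-array encoding are all assumptions; the Ed25519 verification itself is omitted.

```typescript
// Fields named in the hello description; the wire encoding is an assumption.
type HelloPayload = { meshId: string; memberId: string; pubkey: string; timestamp: number };

// Hypothetical tolerance; the broker's actual window is not stated in this document.
const MAX_SKEW_MS = 60_000;

// Replay protection: reject hellos whose timestamp is too far from the broker clock.
function isFreshHello(hello: HelloPayload, nowMs: number, maxSkewMs: number = MAX_SKEW_MS): boolean {
  return Math.abs(nowMs - hello.timestamp) <= maxSkewMs;
}

// Deterministic string to sign and verify: fixed field order, no whitespace.
function canonicalHello(hello: HelloPayload): string {
  return JSON.stringify([hello.meshId, hello.memberId, hello.pubkey, hello.timestamp]);
}
```

A fixed field order matters because signer and verifier must produce byte-identical input; an object literal serialized with differing key order would break verification.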
### 7.8 Metrics (`metrics.ts`)
Prometheus metrics exposed at `/metrics`:
- Request counts by op type
- Latencies p50/p99
- Connection counts per mesh
- Message delivery counts by priority
- Error rates
### 7.9 Audit log (`audit.ts`)
- Every mutation is audited to `mesh.audit_log`
- Tamper-evidence via hash chaining
- Accessible via `audit_query` and `audit_verify` WS ops
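Hash chaining as used for tamper-evidence can be illustrated as follows. The row shape is hypothetical (the real `mesh.audit_log` columns are not listed here), and the hash function is injected: the toy FNV-1a below only keeps the example dependency-free, where a real chain would use a cryptographic hash such as SHA-256.

```typescript
// Hypothetical row shape for illustration only.
type AuditEntry = { seq: number; payload: string; prevHash: string; hash: string };
type HashFn = (input: string) => string;

// Toy 32-bit FNV-1a. NOT cryptographic; stand-in so the sketch runs without imports.
const toyHash: HashFn = (input) => {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
};

const GENESIS = "00000000";

// Each entry commits to its predecessor's hash, so editing one row breaks every later link.
function appendEntry(log: AuditEntry[], payload: string, hash: HashFn): AuditEntry[] {
  const last = log[log.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const seq = log.length;
  return [...log, { seq, payload, prevHash, hash: hash(`${seq}|${prevHash}|${payload}`) }];
}

// Returns the index of the first broken link, or -1 if the chain verifies.
function verifyChain(log: AuditEntry[], hash: HashFn): number {
  let prevHash = GENESIS;
  let broken = -1;
  log.forEach((e, i) => {
    if (broken !== -1) return;
    if (e.prevHash !== prevHash || e.hash !== hash(`${e.seq}|${e.prevHash}|${e.payload}`)) {
      broken = i;
      return;
    }
    prevHash = e.hash;
  });
  return broken;
}
```

An `audit_verify`-style operation would run something like `verifyChain` over the stored rows and report the first index that fails.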
### 7.10 Member API (`member-api.ts`, 284 lines)
Exports:
- `updateMemberProfile()` — used by `PATCH /mesh/:id/member/:memberId`
- `listMeshMembers()` — used by `GET /mesh/:id/members`
- `updateMeshSettings()` — used by `PATCH /mesh/:id/settings`
### 7.11 CLI sync (`cli-sync.ts`, 133 lines)
Exports `handleCliSync()` for `POST /cli-sync`. This is **already the "CLI sync meshes from dashboard" feature** — v2 will reuse this endpoint for its mesh-list refresh logic.
**v1 test coverage is minimal for the CLI side.** 2 unit test files for 12k LOC.
Broker has ~10 test files. They cover crypto primitives, invite flow, hello signatures, rate limiting, metrics — but **not** the 85 WS message handlers comprehensively.
---
## 12. The "must preserve" list (high-priority regression checks)
If v2 breaks any of these, it's a user-facing regression:
### 12.1 First-run experience
- [ ] `claudemesh` bare command → welcome wizard
- [ ] `claudemesh install` registers MCP server + status hooks in Claude Code config
- [ ] `claudemesh join <url>` enrolls into a mesh from a v1 OR v2 invite URL
- [ ] `claudemesh launch` starts Claude Code with mesh connectivity
### 12.2 Session lifecycle
- [ ] Status hooks fire correctly on Claude Code session start/stop/pause
These stay in `.artifacts/specs/` as reference documents. They describe a good destination. They are NOT v1.0.0 requirements.
---
## 14. Known v1 technical debt / gaps (worth noting)
These aren't features — they're places where v1 is weaker than it could be. Document here so v2 doesn't blindly port the weaknesses.
- **CLI auth is missing** — v1 has no `login` / `logout` command. All account-level operations require the web dashboard. This is what v2 is adding.
- **Imperative command branching** — `commands/launch.ts` is 775 lines with nested flag handling. Cleaner in v2's flow pipeline.
- **Minimal CLI test coverage** — 2 test files for 12k LOC. v2 should have colocated tests per service.
- **Rate limiting is in-memory only** — doesn't survive broker restart; not Redis-backed.
- **No CLI-side caching** — every `list_peers` / `mesh_info` call hits the broker. v2's local-first layer (Pass 2) addresses this.
- **Telegram bridge is a large monolithic file** (1711 lines) — legitimate complexity, but v2 may want to modularize if it touches it.
- **v1 wizard bleed-through** — `launch` → `claude` handoff leaves ANSI state dirty. v2's `resetTerminal()` choke point fixes this.
None of these are regressions if v2 keeps them as-is. v2 should **not** prioritize fixing them — fix them when they become a problem, not speculatively.
---
## 15. Reading this inventory
**If you're implementing v2 Phase 1** (foundation layers): every tool in §2, every WS op in §3, every HTTP endpoint in §4, every DB table in §5 must have a place in the v2 folder structure. No new semantics, no improved algorithms — just move the working code.
**If you're reviewing a v2 PR**: check it against §12 ("must preserve" list). If the PR changes the behavior of anything in that list, it's a regression and needs explicit sign-off.
**If you're writing v2 docs**: reference this document. Every feature here is user-visible and documented in v1's README / slash-command help / tool descriptions. v2 docs should mention every feature from §2 as preserved.
- Mesh templates for predefined roles, groups…
- Mesh blockchain: could it be a good addition? For what?
- Mesh webhooks, external WebSockets, and RESTful APIs connected to the mesh (MCP)
- Mesh skills available to every AI? Like a mesh catalog of skills that sessions can fetch and use?
- Initial private mesh by default for every new user
- Mesh dashboard for situational awareness: shows the connected peers, their activity, status, and the mesh structure
- Mesh of meshes? Bridge?
- Mesh connectors (Slack, Telegram): can they appear as peers, or as something different?
- Connect humans to the mesh? Peer info to tell whether a peer is human, the channel type (Telegram or whatever), or the LLM model if AI?
- How to connect clients other than Claude Code? The problem will be the push system, I suppose

- Add the path (pwd) where each session is running so peers know how to reference files on the same computer? Maybe visible only to peers on the same computer?
- What if a peer, on connection, can expose all its project files, folders, and subfolders? Direct access, so other AIs can read files from connected projects when needed?
- Can we have peer stats, for example about context consumption?
- Mesh notifications about new peers, new connectors, new resources? Broadcast?
- Allow group or role changes dynamically, not only on mesh connection?
- Dynamic MCP that can be connected or disconnected in real time without resetting the Claude Code sessions?
- Mesh templates on creation, with a predefined structure that the mesh admin role can change afterwards? Or anyone? Or some other idea?
- What if reminders are just cron, so the AI knows exactly how to configure crons for the mesh and the broker handles cron creation? What about mesh heartbeats to keep the AI alive?
- Sandbox for code execution (Python, Node, Chromium, etc.) so any peer can connect to resources, with resources scaling in real time when a new peer needs a sandbox?
| `dead` | match | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"`. Same id never auto-retried |
| **`aborted`** (NEW v8) | **match** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_match"`. The id was retired by operator action; never reusable |
| **C. Outbox stored, broker permanent rejection** | Broker returns `4xx` after IPC accept | `dead` | Yes | No — rotate via `requeue` |
| **D. Operator retirement** | Operator runs `requeue` on `dead` or `pending` row | `aborted` (audit) + new row with fresh id | Yes (still consumed) | Old id NEVER reusable; new id is fresh |
The "daemon-consumed?" column is the daemon-layer authority. It does
not depend on whether the broker ever saw the request — phase C above
shows the broker has not committed a dedupe row, but the daemon still
The broker validates in **four phases** relative to dedupe-row
insertion. Phase B0 (NEW v10 — codex r9) makes idempotent retries
free of rate-limit budget so a daemon retry of an already-committed
message can never get rate-limit-rejected:
| Phase | Validation | Side effects | Result for direct broker callers |
|---|---|---|---|
| **B0. Dedupe fast-path** (NEW v10) | Read `mesh.client_message_dedupe` for `(mesh_id, client_message_id)`. **Does not touch rate-limit budget.** | None | If row exists & fingerprint matches → `200 duplicate` with original `broker_message_id`. If row exists & fingerprint mismatches → `409 idempotency_key_reused`. If row absent → continue to B1 |
| **B1. Pre-dedupe-claim** (atomic, external) | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ `max_payload.inline_bytes`, **rate limit not exceeded** (idempotent external limiter — see §4.6.4) | None | `4xx` returned. No dedupe row, no broker-consumed id. Caller may retry with same id once condition clears |
| **B2. Post-dedupe-claim** (in-tx) | Conditions that require the accept transaction to be in progress: destination_ref existence (topic exists, member subscribed, etc.) | INSERT into dedupe rolled back | `4xx` returned, transaction rolled back, no dedupe row remains. Caller may retry with same id |
| **B3. Accepted** | All side effects commit atomically | Dedupe row, message row, history row, delivery_queue rows, mention_index rows | `201` returned with `broker_message_id`. Id is broker-consumed |
**Why B0 is correct (codex r9)**: idempotent retries should never be
distinguishable from "the call worked" from the caller's perspective.
A retry that the broker can resolve to the original accept must do so
before any operation that could fail (rate limit, capacity check,
auth-quota, etc.). B0 is a pure read (non-mutating, no transaction), so
on the strictly-new-id path it falls through at negligible cost (one
indexed PK lookup against the dedupe table).
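The three B0 outcomes from the phase table can be sketched as a pure decision function. The in-memory `Map` stands in for `mesh.client_message_dedupe`, and the type and function names are illustrative, not the broker's actual code.

```typescript
type DedupeRow = { brokerMessageId: string; fingerprint: string };

type B0Result =
  | { kind: "duplicate"; status: 200; brokerMessageId: string }
  | { kind: "idempotency_key_reused"; status: 409 }
  | { kind: "continue_to_b1" };

// Composite key for (mesh_id, client_message_id); JSON avoids separator collisions.
function dedupeKey(meshId: string, clientMessageId: string): string {
  return JSON.stringify([meshId, clientMessageId]);
}

// Read-only lookup: no writes, no rate-limit budget touched.
function dedupeFastPath(
  table: ReadonlyMap<string, DedupeRow>,
  meshId: string,
  clientMessageId: string,
  fingerprint: string,
): B0Result {
  const row = table.get(dedupeKey(meshId, clientMessageId));
  if (row === undefined) return { kind: "continue_to_b1" };
  if (row.fingerprint === fingerprint) {
    return { kind: "duplicate", status: 200, brokerMessageId: row.brokerMessageId };
  }
  return { kind: "idempotency_key_reused", status: 409 };
}
```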
**Race semantics for new ids (v10 — codex r9)**: B0 is a non-locking
read; two same-id requests can both miss B0 simultaneously. Without
care, both would consume rate-limit budget. v10 requires the limiter
to be **idempotent over `(mesh_id, client_message_id, window)`**:
budget is consumed at most once per id-window pair regardless of
concurrent retries (§4.6.4). The "second" retry that misses B0 still
sees its `INCR` short-circuited by the limiter and proceeds to B2/B3
without budget impact. Whichever request wins the dedupe `INSERT`
commits; the loser sees fingerprint match (rollback to `200
duplicate`) or mismatch (`409`).
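An in-memory sketch of a limiter that is idempotent over `(mesh_id, client_message_id, window)`. The real limiter is described as an external Redis-style `INCR`; the class below, its fixed-window scheme, and its names are assumptions that only illustrate the "charge at most once per id-window pair" property.

```typescript
class IdempotentLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counts = new Map<string, number>(); // (mesh, window) -> budget used
  private readonly charged = new Set<string>(); // (mesh, id, window) already charged

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request may proceed. A repeat of an already-charged
  // id within the same window short-circuits without consuming budget.
  tryConsume(meshId: string, clientMessageId: string, nowMs: number): boolean {
    const window = Math.floor(nowMs / this.windowMs);
    const idKey = JSON.stringify([meshId, clientMessageId, window]);
    if (this.charged.has(idKey)) return true;

    const windowKey = JSON.stringify([meshId, window]);
    const used = this.counts.get(windowKey) ?? 0;
    if (used >= this.limit) return false;

    this.counts.set(windowKey, used + 1);
    this.charged.add(idKey);
    return true;
  }
}
```

With this shape, two concurrent same-id requests that both miss B0 cost one unit of budget between them, which is the property the race-semantics paragraph relies on.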
**Daemon-mediated callers**: in v0.9.0 the daemon is the only B-phase
caller. Daemon-mediated callers see only the daemon-layer rules
(§4.6.1). The broker's "may retry with same id" wording in the table
above applies to direct broker callers only (none in v0.9.0; reserved
for future SDK paths).
**Critical guarantee (v9 — tightened from v8)**: a dedupe row exists
**iff the broker accept transaction committed (B3)**. There is no
broker code path where a permanent 4xx leaves a dedupe row behind.
If the broker decides post-commit that an accepted message is invalid
(async content-policy job, async moderation, etc.), that's NOT a
permanent rejection — it's a follow-up event that operates on the
| Rate-limit **counters** (telemetry only) | Async, eventually consistent | Authoritative limiter is the external Redis-style INCR in B1 (§4.6.4); the DB counter is rebuilt for dashboards, not consulted for accept |
| Audit log entries | Async append-only stream | Audit log can be rebuilt from message history; in-tx writes hurt p99 |
| Search/FTS index updates | Async via outbox-pattern worker | Index can be rebuilt from authoritative tables |
| Rate-limit authority muddled (B2 vs async counters) | Rate limit moved to B1 via external atomic limiter (Redis-style INCR with TTL); DB rate-limit counters demoted to telemetry-only | §4.6.2, §4.6.4, §4.7.1 |
User-editable. `claudemesh daemon reload` re-reads it without dropping the WS.
## 14. Lifecycle — the operational flows v1 was missing
### 14.1 Key rotation
```
claudemesh daemon rotate-keypair
```
Mints fresh ed25519 + x25519. Registers new pubkey with broker as a `member_keypair_rotated` operation (broker associates new pubkey with same member id). Old pubkey is held server-side for 24h grace (decrypts in-flight messages encrypted to old pubkey), then revoked.
### 14.2 Local token rotation
```
claudemesh daemon rotate-token
```
Atomically writes a new `local_token`, returns the old one alongside the new
one for 60s grace. SDKs that already have the old token finish in-flight
requests; new requests use the new token. After 60s, old token is rejected.
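The grace behaviour can be sketched as a small state machine. The `TokenState` shape and function names are hypothetical; only the 60s window comes from the text.

```typescript
type TokenState = {
  current: string;
  previous?: { token: string; validUntilMs: number };
};

const GRACE_MS = 60_000; // 60s grace, per the description above

// Rotation keeps the outgoing token around with an expiry instead of dropping it.
function rotateToken(state: TokenState, newToken: string, nowMs: number): TokenState {
  return {
    current: newToken,
    previous: { token: state.current, validUntilMs: nowMs + GRACE_MS },
  };
}

// The current token is always accepted; the previous one only inside the grace window.
function isTokenAccepted(state: TokenState, presented: string, nowMs: number): boolean {
  if (presented === state.current) return true;
  const prev = state.previous;
  return prev !== undefined && presented === prev.token && nowMs < prev.validUntilMs;
}
```

A production check would likely compare tokens in constant time; plain `===` is used here only to keep the sketch short.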
### 14.3 Compromised host revocation
From the dashboard or another mesh-owner session:
```
claudemesh member revoke <pubkey>
```
Broker marks member as revoked. Connected daemon receives `member_revoked`
push, self-disables (refuses new IPC, closes WS), exits with non-zero status,
logs forensic event.
### 14.4 Image-clone lifecycle
Covered in §2.2. Three policies (`refuse`, `warn`, `allow` — settable per-host
- Restore from local journal-only inbox (read-only mode; sends disabled).
- Wipe + rebuild from broker (fetches last N days of message history if
available; topics need re-subscribe; outbox is irrecoverable, queued sends are
lost).
- Wipe + start fresh.
## 15. Version compatibility
### 15.1 Negotiation handshake
On daemon connect to broker AND on every IPC request:
```
GET /v1/version
{
"daemon_version": "0.9.0",
"ipc_api": "v1",
"ipc_minor": 3, # additive minor
"schema_version": 7,
"broker_protocol_min": "0.7",
"broker_protocol_max": "0.9"
}
```
### 15.2 Compat policy
| Across | Policy |
|---|---|
| Daemon ↔ Broker | Daemon refuses to connect if broker version < daemon's `broker_protocol_min`. Broker logs warning. Pre-1.0 we may break this with notice; post-1.0 we maintain backward compat for ≥6 months. |
| CLI ↔ Daemon | CLI checks daemon's `ipc_api`. Same major = OK. Different major = CLI falls back to cold-path with warning. |
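
The two policy rows above can be sketched as follows, assuming protocol versions are `major.minor` strings (as in `broker_protocol_min`) and that `ipc_api` values such as `"v1"` identify the IPC major. Function names are illustrative.

```typescript
// Parse "major.minor" (e.g. "0.7") into numbers; missing parts default to 0.
function parseVersion(v: string): [number, number] {
  const parts = v.split(".");
  return [Number(parts[0] ?? 0), Number(parts[1] ?? 0)];
}

// Numeric comparison, so "0.10" is newer than "0.9".
function isOlder(a: string, b: string): boolean {
  const [aMajor, aMinor] = parseVersion(a);
  const [bMajor, bMinor] = parseVersion(b);
  return aMajor < bMajor || (aMajor === bMajor && aMinor < bMinor);
}

// Daemon <-> Broker: refuse to connect if the broker is older than broker_protocol_min.
function daemonMayConnect(brokerVersion: string, brokerProtocolMin: string): boolean {
  return !isOlder(brokerVersion, brokerProtocolMin);
}

// CLI <-> Daemon: same ipc_api major is OK; a different major falls back to the cold path.
function cliPath(cliIpcApi: string, daemonIpcApi: string): "daemon" | "cold-path" {
  return cliIpcApi === daemonIpcApi ? "daemon" : "cold-path";
}
```
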
| Container side-channel | Same loopback namespace | Read another container's daemon | Containers share host loopback only if explicitly net=host. `local_token` defends. Recommended: bind UDS only inside containers |
| Compromised hook | Capability token in env | Use that scope | Capability tokens are scoped + short-lived; cannot escalate |
| Compromised broker | Full mesh visibility on its side | Deliver malicious messages, identity-impersonate | E2E encryption (crypto_box DMs, per-topic keys) — broker can't read content. Out-of-scope for daemon |
| Cloned VM image | Same keypair on two hosts | Identity collision | Host fingerprint detection + dashboard audit + `--remint` flow |
| Stolen laptop | Disk access | Mesh impersonation forever | `member revoke` from dashboard. Without disk encryption, this is the user's laptop security; documented in security guide |
| Untrusted hook author | Hook script content | Exfil mesh data | Hook is on disk YOU control. If you ran `git pull` on a malicious hooks/ repo, that's a code-supply-chain attack out of scope for the daemon |
### 16.2 Out of scope
- Defending against an attacker with root on the daemon host. They can read
`keypair.json` directly.
- Defending against malicious peers in the same mesh sending malformed
payloads. Daemon validates structure but trusts mesh members.
- Defending against compromised broker. Out-of-scope for daemon; mesh-level
E2E protects content but not metadata.
## 17. Migration — what changes for existing users
Same as v1. Additive. No DB migration on broker. Existing
### 16.1 Attacker classes — same matrix as v2 §16, plus:
| Attacker | Has | Wants | Mitigations |
|---|---|---|---|
| **Shared CI runner** (NEW) | Same Unix UID as other untrusted jobs | Read this user's persistent keypair across job boundaries | Auto-detect CI envs (§2.1) → ephemeral default + UDS-only + isolated `$HOME`. If operator overrides with `--persistent`, log warning `persistent_keypair_in_ci_environment`. |
| **Malicious mesh peer** (PROMOTED from out-of-scope to in-scope) | Mesh membership | Send malformed payload to crash daemon | Every inbound shape validated against schema before any processing. Daemon refuses unknown fields (defense-in-depth) and emits `cm_daemon_invalid_inbound_total`. Crashes from inbound payloads are bugs. |
### 16.2 Stated explicitly out of scope
- Root attacker on daemon host (can read keypair directly).
- Compromised broker (E2E content protection still holds; metadata is not
protected by daemon — that's mesh-level).
- Sophisticated attacker who copies BOTH `keypair.json` and
`host_fingerprint.json` (§2.2 calls this out).
- Receivers other than daemon-hosted peers deduping inbound traffic
(post-v0.9.0).
### 16.3 Container & CI defaults table (NEW)
| Environment | Identity | IPC | Hooks |
|---|---|---|---|
| Bare metal / VM (default) | Persistent (clone-detected) | UDS + TCP loopback | Enabled |
This catches **image clones, restored backups, copy-pasted homedirs** —
accidents made by humans. It does not defend against an attacker who copies
both `keypair.json` and `host_fingerprint.json`. The threat model (§16) says
this explicitly.
#### 2.2.1 Fingerprint source precedence (NEW — codex r3)
`host_fingerprint.json` stores `sha256(host_id || stable_mac)` where the
inputs are computed from the OS-specific table below, in order:
| OS | `host_id` (try in order) | `stable_mac` |
|---|---|---|
| Linux | `/etc/machine-id` → `/var/lib/dbus/machine-id` → first stable MAC | First non-loopback non-virtual interface, lex-sorted by name (`en…`/`eth…` before `wl…`); `docker0/veth*/br-*/lo` excluded |
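
A sketch of the Linux `stable_mac` selection rule from the table: drop the excluded interfaces, sort the rest by name, take the first. The `Iface` shape is an assumption; reading the interfaces from the OS and hashing the result are left out.

```typescript
type Iface = { name: string; mac: string };

// Exclusions listed in the table: docker0, veth*, br-*, lo.
function isExcluded(name: string): boolean {
  return name === "lo" || name === "docker0" || name.startsWith("veth") || name.startsWith("br-");
}

// Lexicographic order by name puts en…/eth… ahead of wl…; the first survivor wins.
function pickStableMac(ifaces: Iface[]): string | undefined {
  const candidates = ifaces
    .filter((i) => !isExcluded(i.name)) // filter copies, so the input is not reordered
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const first = candidates[0];
  return first ? first.mac : undefined;
}
```

Sorting by name rather than taking enumeration order keeps the result stable across reboots where the kernel may list interfaces differently.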
`local_token` NOT included; regenerated on restore.
### 14.3 Local token rotation, compromised host revocation, image-clone, uninstall, recovery — unchanged from v2 §14.3
---
## 15. Version compat — feature-bit negotiation with **parameters** (codex r3)
v3's feature bits were boolean. Codex r3: dedupe-window, max-payload, key
epochs all need parameters. v4 makes feature bits string-keyed entries that
optionally carry a value.
### 15.1 Feature bits with parameters
| Bit | Type | Parameters | Notes |
|---|---|---|---|
| `client_message_id_dedupe` | object | `{ mode: "permanent"\|"windowed", window_hours?: int, tombstone_retention_days: int }` | Daemon reads `mode` to decide whether to enforce its own outbox max-age cap. `tombstone_retention_days` (broker-controlled) tells daemon how long it can expect "already-accepted" replies after the source row is GC'd |
| Kubernetes (`KUBERNETES_SERVICE_HOST`) | Persistent | UDS-only | Enabled | Single pod = single tenant |
| CI (`CI=true`, `GITHUB_ACTIONS`, etc.) | Ephemeral | UDS-only | Disabled by default (`[hooks] enabled = false`) | Multi-tenant runner; arbitrary code; ephemeral identity = no cross-job leak; hooks disabled because CI workloads are arbitrary user code |
| RunPod (`RUNPOD_POD_ID`) | Persistent | UDS-only | Enabled | Long-lived single-tenant sandbox; user owns the pod for its lifetime; identical trust model to a Docker container, NOT to a CI runner |
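
The defaults table can be read as an environment-detection function. The sketch below covers only the rows visible in this section and the env vars they name; the check order when several markers are present is an assumption, and the "etc." CI variables are deliberately not guessed.

```typescript
type Defaults = {
  environment: "ci" | "runpod" | "kubernetes" | "bare-metal";
  identity: "persistent" | "ephemeral";
  ipc: "uds-only" | "uds+tcp-loopback";
  hooks: boolean;
};
type Env = Record<string, string | undefined>;

// Precedence (CI first) is assumed; the document does not state one.
function detectDefaults(env: Env): Defaults {
  if (env["CI"] === "true" || env["GITHUB_ACTIONS"] !== undefined) {
    // Multi-tenant runner: ephemeral identity, hooks off by default.
    return { environment: "ci", identity: "ephemeral", ipc: "uds-only", hooks: false };
  }
  if (env["RUNPOD_POD_ID"] !== undefined) {
    // Treated as a single-tenant container, not as CI.
    return { environment: "runpod", identity: "persistent", ipc: "uds-only", hooks: true };
  }
  if (env["KUBERNETES_SERVICE_HOST"] !== undefined) {
    return { environment: "kubernetes", identity: "persistent", ipc: "uds-only", hooks: true };
  }
  // Bare metal / VM default.
  return { environment: "bare-metal", identity: "persistent", ipc: "uds+tcp-loopback", hooks: true };
}
```
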
**RunPod resolution (codex r3)**: v3 listed RunPod under both "ephemeral
identity" and "hooks enabled" which was contradictory. v4 treats RunPod as
a **single-tenant container** (Docker-like): persistent identity, UDS-only,
hooks enabled. RunPod is removed from the CI auto-detect list (§2.1).
Operators who run RunPod as multi-tenant sandbox-as-CI can opt in with
deployed before daemon. Daemon refuses to start if `client_message_id_dedupe`
feature bit is missing from broker's negotiation response.
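
The startup check might look like the sketch below. The `Negotiation` envelope (`features` keyed by bit name) is a hypothetical shape; the parameter fields follow the `client_message_id_dedupe` row in §15.1, and requiring `window_hours` in `windowed` mode is an assumption.

```typescript
type DedupeParams = {
  mode: "permanent" | "windowed";
  window_hours?: number;
  tombstone_retention_days: number;
};
// Hypothetical negotiation response envelope.
type Negotiation = { features: Record<string, unknown> };

function isInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v);
}
function isMode(v: unknown): v is "permanent" | "windowed" {
  return v === "permanent" || v === "windowed";
}

// Throws (daemon refuses to start) unless the bit is present with valid parameters.
function requireDedupeFeature(negotiation: Negotiation): DedupeParams {
  const raw = negotiation.features["client_message_id_dedupe"];
  if (typeof raw !== "object" || raw === null) {
    throw new Error("broker does not advertise client_message_id_dedupe; refusing to start");
  }
  const params = raw as Record<string, unknown>;
  const mode = params["mode"];
  const windowHours = params["window_hours"];
  const retention = params["tombstone_retention_days"];
  if (!isMode(mode)) throw new Error("client_message_id_dedupe: invalid mode");
  if (!isInt(retention)) throw new Error("client_message_id_dedupe: invalid tombstone_retention_days");
  if (mode === "windowed") {
    if (!isInt(windowHours)) throw new Error("client_message_id_dedupe: windowed mode needs window_hours");
    return { mode, window_hours: windowHours, tombstone_retention_days: retention };
  }
  return { mode, tombstone_retention_days: retention };
}
```
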
---
## What changed v3 → v4 (codex round-3 actionable items)
| Codex r3 item | v4 fix | Section |
|---|---|---|
| Broker dedupe window: permanent vs windowed? | **Picked permanent**; schema clarified; outbox `max_age_hours` raised back to 168h | §4 |
| Feature bits should be parameterized | All feature bits are string-keyed with optional value object | §15.1, §15.2 |
| Key archive record format unspecified | Full schema with `key_id`, timestamps, `max_archived_keys`, force-expiry rule, write-failure semantics | §14.1.1 |
| Document fingerprint source precedence per OS | Per-OS table for `host_id` and stable MAC; cloud-image false-positive note | §2.2.1 |
3. Broker code: every `INSERT` into `topic_message` / `message_queue` is
preceded by an `INSERT ... ON CONFLICT DO UPDATE RETURNING` into
`client_message_dedupe`. The conflict path returns the existing
`broker_message_id` instead of creating a new row.
4. Broker code: nightly job to delete `client_message_dedupe` rows where
`expires_at < NOW()`.
5. Broker code: hook into the existing message-retention sweep to set
`history_available = FALSE` on dedupe rows whose message row has been
pruned.
6. Broker advertises `client_message_id_dedupe` feature bit in negotiation
response.
7. Daemon refuses to start unless that feature bit is advertised with valid
params.
---
## What changed v4 → v5 (codex round-4 actionable items)
| Codex r4 item | v5 fix | Section |
|---|---|---|
| Dedupe must be retention-scoped, not "permanent" with row-deletion gap | Real `mesh.client_message_dedupe` table; retention independent of message rows; `permanent` becomes opt-in mode meaning "no expires_at" | §4.1, §4.3 |
| Rename misleading mode | `retention_scoped` is the default; `permanent` reserved for explicit opt-in | §4.3, §15.1 |
| Deterministic duplicate response | New shape with `duplicate`, `broker_message_id`, `history_available`; removed `client_id_unknown` | §4.4 |
insert in **one transaction** (§4.7). Pre-generated
`broker_message_id` (ulid in code) passed in.
5. Broker code: nightly job to delete dedupe rows where `expires_at <
NOW()` (skip in `permanent` mode).
6. Broker code: hook into the message-retention sweep — when a
`topic_message` or `message_queue` row is hard-deleted, find the
matching dedupe row by `client_message_id` and set `history_available
= FALSE`. (Note: `client_message_id` is nullable on those tables for
legacy traffic; nullable rows have no dedupe row to update.)
7. Broker code: nightly orphan-check job (§4.7); alerts on non-zero.
8. Broker advertises `client_message_id_dedupe` feature with
`params.version = 2` and `request_fingerprint: true`.
9. Daemon refuses to start unless that feature bit is advertised with
valid v2 params.
Rollback plan: feature flag disables fingerprint enforcement broker-side
(falls back to existing pre-v6 behavior — no dedupe). Daemons that
require fingerprint refuse to start. Operator switches off the feature
flag, reverts the daemon, restarts. No data loss; pending dedupe rows
remain in place for the next forward roll.
---
## What changed v5 → v6 (codex round-5 actionable items)
| Codex r5 item | v6 fix | Section |
|---|---|---|
| Idempotency key reuse with different payload silently collapses | `request_fingerprint` BYTEA in dedupe table; canonical form per §4.4; 409 on mismatch | §4.3, §4.4, §4.5 |
| `dead` | match | Return `409 idempotency_key_reused` with `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"`. Caller must rotate the id (see §4.6.3) — daemon refuses to re-attempt a dead row's exact bytes. |
9. Daemon refuses to start unless above is advertised.
Daemon side:
- Outbox table gains `aborted` status (§4.6.3); migration ALTER on the
CHECK constraint at startup if SQLite version <DDL works without
a recreate; else table recreate via `INSERT INTO new SELECT * FROM
old`. v0.9.0 daemons are fresh installs by definition; existing
outboxes don't exist.
- IPC accept path implements §4.5.1 lookup table.
- IPC error envelope adds `conflict` and `daemon_fingerprint_prefix`
fields for 409 responses.
- New CLI verb `claudemesh daemon outbox requeue --id <id>
--new-client-id [auto|<id>]` (§4.6.3).
---
## What changed v6 → v7 (codex round-6 actionable items)
| Codex r6 item | v7 fix | Section |
|---|---|---|
| Daemon-local duplicate POST semantics undefined | Full lookup table for pending/inflight/done/dead × match/mismatch; `409 idempotency_key_reused` at IPC layer with `conflict` field | §4.5 |
| §4.6 rejected-request contradiction | Single rule: id consumed iff outbox row written; pre-outbox failures leave id untouched; broker-rejected outbox row goes to `dead`, requires `requeue --new-client-id` | §4.6 |
| §4.7 pseudocode wrong | Corrected: `INSERT ON CONFLICT DO NOTHING`, then `SELECT FOR SHARE`, then branch on returned `broker_message_id` and `fingerprint` | §4.7.2 |
| Max-age math equals window at min | Min `dedupe_retention_days` raised to 7; safety margin always >= 24h; derived max-age strictly < window | §4.8, §15.1 |
| `dead` | match | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"`. Same id never auto-retried |
| **`aborted`** (NEW v8) | **match** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_match"`. The id was retired by operator action; never reusable |
| **C. Outbox stored, broker permanent rejection** | Broker returns `4xx` after IPC accept | `dead` | No — rotate via `requeue --new-client-id` |
| **D. Operator retirement** | Operator runs `requeue --new-client-id` on `dead` or `pending` row | `aborted` (audit) + new row with fresh id | Old id NEVER reusable; new id is fresh |
#### 4.6.2 Broker-side rejection phasing (NEW v8 — codex r7)
The broker validates in two phases relative to dedupe-row insertion:
| Phase | Validation | Result |
|---|---|---|
| **B1. Pre-dedupe-claim** (NEW — explicit) | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ `max_payload.inline_bytes` | `4xx` returned. **No dedupe row inserted.** Caller may retry with same id and corrected payload. |
| **B2. Post-dedupe-claim** | Anything that requires the dedupe-claim transaction to be in progress: destination_ref existence (topic exists, member subscribed, etc.), per-mesh rate limit not exceeded | `4xx` returned, transaction rolled back, **no dedupe row remains**. Caller may retry with same id. |
| **B3. Accepted** | All side effects (dedupe row, message row, history row, delivery_queue rows) commit atomically | `201` returned with `broker_message_id` |
**Critical guarantee (v8)**: there is no broker code path where a
permanent rejection (4xx) leaves a dedupe row behind. Either the
request committed and a dedupe row exists (B3), or it didn't and no
dedupe row exists (B1, B2). This makes "dedupe row exists" the single
unambiguous signal of "id consumed at the broker layer."
If broker decides post-commit that an accepted message is invalid
(e.g. an async content-policy job runs on accepted messages), that's
NOT a permanent rejection — that's a follow-up moderation event that
operates on the broker_message_id, not on the dedupe key.
#### 4.6.3 Operator recovery via `requeue` (corrected v8)
To unstick a `dead` or `pending`-but-stuck row, operator runs:
columns and the `aborted` enum value (§4.5.2). Migration applies via
`INSERT INTO new SELECT * FROM old` recreation if needed; v0.9.0 is
greenfield.
- IPC accept switches to `BEGIN IMMEDIATE` for SQLite serialization
(§4.5.1 step 3).
- IPC accept handles `aborted` rows per §4.5.1 (always 409).
- `claudemesh daemon outbox requeue` always mints a fresh
`client_message_id`; never frees the old id. `--new-client-id <id>`
and `--auto` are the only modes; the old `client_message_id`
argument is removed.
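
The requeue rule above (always mint a fresh `client_message_id`, never free the old one) can be sketched against an in-memory SQLite database. This is a minimal sketch under stated assumptions, not the daemon's implementation: the `status` and `payload` columns, the `open_outbox` and `requeue` helpers, and the use of a random UUID as the auto-minted id are all hypothetical. Only `BEGIN IMMEDIATE`, the `aborted` state, and the `aborted_at` / `aborted_by` / `superseded_by` audit columns come from the spec text.

```python
import sqlite3
import time
import uuid
from typing import Optional


def open_outbox() -> sqlite3.Connection:
    # isolation_level=None turns off implicit transactions, so the explicit
    # BEGIN IMMEDIATE below is the only transaction boundary.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        """
        CREATE TABLE outbox (
            client_message_id TEXT PRIMARY KEY,
            status            TEXT NOT NULL,  -- hypothetical: pending / dead / aborted
            payload           TEXT NOT NULL,  -- hypothetical column
            aborted_at        REAL,
            aborted_by        TEXT,
            superseded_by     TEXT
        )
        """
    )
    return db


def requeue(db: sqlite3.Connection, old_id: str, operator: str,
            new_id: Optional[str] = None) -> str:
    """Retire old_id as 'aborted' and enqueue the same payload under a fresh id."""
    fresh_id = new_id or str(uuid.uuid4())  # explicit id, or an auto-minted one
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = db.execute(
            "SELECT status, payload FROM outbox WHERE client_message_id = ?",
            (old_id,),
        ).fetchone()
        if row is None or row[0] not in ("dead", "pending"):
            raise ValueError(f"{old_id!r} is not a dead or pending row")
        # The old row is kept for audit; its id stays consumed forever.
        db.execute(
            "UPDATE outbox SET status = 'aborted', aborted_at = ?, aborted_by = ?, "
            "superseded_by = ? WHERE client_message_id = ?",
            (time.time(), operator, fresh_id, old_id),
        )
        db.execute(
            "INSERT INTO outbox (client_message_id, status, payload) "
            "VALUES (?, 'pending', ?)",
            (fresh_id, row[1]),
        )
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return fresh_id
```

Because the old row is updated rather than deleted, the primary key on `client_message_id` keeps the retired id unusable, and a second `requeue` of the same id fails since the row is already `aborted`.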
---
## What changed v7 → v8 (codex round-7 actionable items)
| Codex r7 item | v8 fix | Section |
|---|---|---|
| `aborted` not in §4.5.1; `UNIQUE` contradiction | Added two `aborted` rows (match/mismatch) to lookup table; old id never reusable; new audit columns `aborted_at`/`aborted_by`/`superseded_by` | §4.5.1, §4.5.2, §4.6.3 |
| SQLite `SELECT FOR UPDATE` invalid | Replaced with `BEGIN IMMEDIATE` for daemon-local serialization | §4.5.1 |
| Side-effect inventory ambiguous on rate-limit/audit/search | Explicit in-tx vs outside-tx table with rationale per item | §4.7.1 |
| Operator id reuse semantics | Old id permanently retired in `aborted`; requeue always mints fresh id; no daemon-side path to release used ids | §4.6.3 |
---
## What needs review (round 8)
1. **`aborted` permanence (§4.5.1, §4.6.3)** — is "old id permanently
dead" correct, or is there a real operational case where releasing
an id (e.g. caller mistyped a uuid) is worth the audit-trail loss?
2. **Phase B1/B2/B3 split (§4.6.2)** — clean enough? Is rate-limiting
in B2 (in-tx) the right call, or should it be B1 (cheaper to enforce
pre-tx)?
3. **In-tx mention_index (§4.7.1)** — agree it should be in-tx, or
should mention indexing be async like search?
4. **`BEGIN IMMEDIATE` (§4.5.1)** — correct SQLite primitive, or should
it be `BEGIN EXCLUSIVE` to also block readers? (Probably not — readers
should see committed-pending rows, but worth confirming.)
5. **Anything else still wrong?** Read it as if you were going to
operate this for a year.
Three options:
- **(a) v8 is shippable**: lock the spec, start coding the frozen core.
- **(b) v9 needed**: list the must-fix items.
- **(c) the architecture itself is wrong**: what would you do differently?
| `dead` | match | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"`. Same id never auto-retried |
| **`aborted`** (NEW v8) | **match** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_match"`. The id was retired by operator action; never reusable |
| **C. Outbox stored, broker permanent rejection** | Broker returns `4xx` after IPC accept | `dead` | Yes | No — rotate via `requeue` |
| **D. Operator retirement** | Operator runs `requeue` on `dead` or `pending` row | `aborted` (audit) + new row with fresh id | Yes (still consumed) | Old id NEVER reusable; new id is fresh |
The "daemon-consumed?" column is the daemon-layer authority. It does
not depend on whether the broker ever saw the request — phase C above
shows the broker has not committed a dedupe row, but the daemon still
holds the id in `dead` state.
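
The two `409` lookup rows above can be sketched as a pure function. Only the `dead` and `aborted` fingerprint-match cases visible in this excerpt are covered; the `OutboxRow` shape, the function name, and the layout of the returned dict are illustrative assumptions rather than the daemon's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutboxRow:
    status: str                      # e.g. "dead" or "aborted"
    fingerprint: str                 # fingerprint stored with the outbox row
    last_error: Optional[str] = None


def conflict_response(row: OutboxRow, request_fingerprint: str) -> Optional[dict]:
    """Return the 409 body for a dead/aborted fingerprint match, else None."""
    if row.fingerprint != request_fingerprint:
        return None  # mismatch rows are not part of this excerpt
    if row.status == "dead":
        return {
            "status": 409,
            "error": "idempotency_key_reused",
            "conflict": "outbox_dead_fingerprint_match",
            "reason": row.last_error,
        }
    if row.status == "aborted":
        return {
            "status": 409,
            "error": "idempotency_key_reused",
            "conflict": "outbox_aborted_fingerprint_match",
        }
    return None  # other statuses are handled by rows not shown here
```

In both cases the id stays consumed at the daemon layer, whether or not the broker ever committed a dedupe row for it.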
#### 4.6.2 Broker-side rejection phasing (v9 — rate limit moved to B1)
The broker validates in three phases relative to dedupe-row insertion:
| Phase | Validation | Side effects | Result for direct broker callers |
|---|---|---|---|
| **B1. Pre-dedupe-claim** (atomic, external) | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ `max_payload.inline_bytes`, **rate limit not exceeded** (atomic external limiter — see §4.6.4) | None | `4xx` returned. No dedupe row, no broker-consumed id. Caller may retry with same id once condition clears |
| **B2. Post-dedupe-claim** (in-tx) | Conditions that require the accept transaction to be in progress: destination_ref existence (topic exists, member subscribed, etc.) | INSERT into dedupe rolled back | `4xx` returned, transaction rolled back, no dedupe row remains. Caller may retry with same id |
| **B3. Accepted** | All side effects commit atomically | Dedupe row, message row, history row, delivery_queue rows, mention_index rows | `201` returned with `broker_message_id`. Id is broker-consumed |
**Daemon-mediated callers**: in v0.9.0 the daemon is the only B-phase
caller. Daemon-mediated callers see only the daemon-layer rules
(§4.6.1). The broker's "may retry with same id" wording in the table
above applies to direct broker callers only (none in v0.9.0; reserved
for future SDK paths).
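
The B1 rate-limit check in the table above relies on an atomic counter with a TTL. The sketch below shows the fixed-window logic only, using an in-process dict as a stand-in for the external store; the class name, the per-key layout, and the injectable clock are assumptions. A real deployment would need the increment and expiry to be atomic in the external limiter itself, which a plain dict does not provide across processes.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory stand-in for a Redis-style INCR + TTL counter."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._counters: Dict[str, Tuple[int, float]] = {}  # key -> (count, expires_at)

    def allow(self, key: str) -> bool:
        now = self.clock()
        count, expires_at = self._counters.get(key, (0, 0.0))
        if now >= expires_at:
            # TTL elapsed (or first use): start a new window.
            count, expires_at = 0, now + self.window_s
        count += 1  # the INCR step
        self._counters[key] = (count, expires_at)
        return count <= self.limit
```

Since the check runs before the dedupe claim, a rejected request leaves no broker-side state behind apart from the counter itself.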
**Critical guarantee (v9 — tightened from v8)**: a dedupe row exists
**iff the broker accept transaction committed (B3)**. There is no
broker code path where a permanent 4xx leaves a dedupe row behind.
If the broker decides post-commit that an accepted message is invalid
(async content-policy job, async moderation, etc.), that's NOT a
permanent rejection — it's a follow-up event that operates on the
| Rate-limit **counters** (telemetry only) | Async, eventually consistent | Authoritative limiter is the external Redis-style INCR in B1 (§4.6.4); the DB counter is rebuilt for dashboards, not consulted for accept |
| Audit log entries | Async append-only stream | Audit log can be rebuilt from message history; in-tx writes hurt p99 |
| Search/FTS index updates | Async via outbox-pattern worker | Index can be rebuilt from authoritative tables |
| Rate-limit authority muddled (B2 vs async counters) | Rate limit moved to B1 via external atomic limiter (Redis-style INCR with TTL); DB rate-limit counters demoted to telemetry-only | §4.6.2, §4.6.4, §4.7.1 |
Single binary. No external runtime beyond the existing CLI dependencies. The daemon *is* the CLI in long-running mode — `claudemesh daemon up` is a flag on the same binary.
## 2. Identity — persistent member, not ephemeral session
The daemon mints a stable ed25519 + x25519 keypair on first startup, stores it in `keypair.json`, and registers with the broker as a **persistent member**: the same identity across restarts, reconnects, and host migrations. `runpod-worker-3` is `runpod-worker-3` forever, until you `claudemesh daemon reset` or revoke the keypair.
`--name` is taken at first `daemon up`; subsequent runs read the keypair file and ignore `--name` unless `--rename` is passed (which produces a `member_renamed` event the broker propagates to peers).
This is the default. It's the right thing for servers. There is no `--ephemeral` mode.
## 3. IPC surface — single versioned API, three transports
**Transports**, all serving identical JSON:
- **UDS** at `~/.claudemesh/daemon/<slug>/sock` (primary, default)
- **TCP loopback** on auto-allocated port written to `http.port` (Docker / Windows clients)
- **Server-Sent Events** stream at `GET /v1/events` for push (real-time inbound)
**No auth on local IPC.** Trust boundary is the OS — UDS is mode 0600, TCP listens on 127.0.0.1 only. If you can reach the socket, you're already running as the right user; the daemon's `keypair.json` is also reachable, so adding a token would be theatre.
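
As an illustration of the "UDS is mode 0600" boundary, the following sketch binds a Unix domain socket so that the socket file is never visible with wider permissions. The helper names are hypothetical, and the approach (tightening the umask around `bind`, then an explicit `chmod`) is one reasonable way to do it, not necessarily what the daemon does. It assumes a Unix-like host.

```python
import os
import socket
import stat


def bind_private_uds(path: str) -> socket.socket:
    """Bind a listening Unix socket whose file mode is 0600."""
    if os.path.exists(path):
        os.unlink(path)  # remove a stale socket left by a previous run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old_umask = os.umask(0o177)  # socket file is created as 0600, not 0777 & ~umask
    try:
        sock.bind(path)
    finally:
        os.umask(old_umask)  # always restore the process umask
    os.chmod(path, 0o600)  # explicit, in case the platform ignores the umask here
    sock.listen(16)
    return sock


def socket_mode(path: str) -> int:
    """Return the permission bits of the socket file."""
    return stat.S_IMODE(os.stat(path).st_mode)
```

Setting the umask before `bind` avoids the short window in which a socket created with default permissions could be opened by another local user.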
**Endpoint surface — exactly mirrors CLI verbs:**
```
# messaging
POST /v1/send {to, message, priority?, meta?, replyToId?}
POST /v1/topic/post {topic, message, priority?, mentions?}
POST /v1/topic/subscribe {topic}
GET /v1/topic/list
GET /v1/inbox ?since=<iso>&topic=<n>&from=<peer>&limit=<n>
POST /v1/broadcast {message, scope: "*"|"@group"|...}
# peers + presence
GET /v1/peers ?mesh=<slug>
POST /v1/profile {summary?, status?, visible?, avatar?, ...}
POST /v1/groups/join {name, role?}
POST /v1/groups/leave {name}
# state, memory, vector, graph — full mesh-services platform
POST /v1/state/set {key, value, scope?: "mesh"|"member"}
GET /v1/state/get ?key=...
GET /v1/state/list
POST /v1/memory/remember {content, tags?}
GET /v1/memory/recall ?q=<query>
POST /v1/vector/store {collection, text, metadata?}
GET /v1/vector/search ?collection=<c>&q=<query>&limit=<n>
POST /v1/graph/query {cypher, params?}
# files
POST /v1/file/share {path, to?, message?, persistent?}
GET /v1/file/get ?id=<fileId>&out=<path>
GET /v1/file/list
# tasks + scheduling
POST /v1/task/create {title, assignee?, priority?, tags?}
POST /v1/task/claim {id}
POST /v1/task/complete {id, result?}
POST /v1/scheduling/remind {at|in|cron, message, to?}
# skills + MCP services (full peer participation)
POST /v1/skill/deploy {path}
POST /v1/skill/share {name, manifest}
POST /v1/mcp/register {server_name, description, tools, transport}
GET /v1/health {connected, lag_ms, queue_depth, mesh, member_pubkey, uptime_s}
GET /v1/metrics Prometheus exposition
POST /v1/heartbeat {} (caller asserts it's alive — daemon may set status="working")
```
Every CLI verb the platform offers has a daemon endpoint. No second-class features. Apps written against the daemon get the same surface as Claude Code itself.
## 4. Outbound — exactly-once via SQLite + idempotency keys
Sends route through `outbox.db` first, then to the broker. Schema:
```sql
CREATE TABLE outbox (
  id              TEXT PRIMARY KEY,  -- ulid
  idempotency_key TEXT UNIQUE,       -- caller-provided or autogen
```
- `claudemesh daemon search "OOM"` queries the FTS index (instant, offline-capable).
- Apps that connect mid-stream replay history via `?since=<iso>`.
- Exposed in metrics: `cm_daemon_inbox_rows`, `cm_daemon_inbox_bytes`.
## 6. Hooks — first-class scripted reactions
Hooks turn the daemon from a passive relay into an autonomous peer. Files in `hooks/`:
```
hooks/
on-message.sh every inbound message (DM + topic)
on-dm.sh DMs only
on-mention.sh when @<my-name> appears anywhere
on-topic-<name>.sh a specific topic (e.g. on-topic-alerts.sh)
on-file-share.sh file shared with me
on-task-assigned.sh task assigned to me
on-disconnect.sh WS dropped (informational)
on-reconnect.sh reconnected (informational)
on-startup.sh daemon up
pre-send.sh filter / mutate outbound (last gate)
```
**Contract:**
- Stdin: full event JSON.
- Stdout (if non-empty, JSON object): used as a structured response. For inbound messages, `{reply: "..."}` posts a reply automatically.
- Exit 0 = success; non-zero logs + counts but does not retry.
- Timeout: 30s default; override with a `# claudemesh:timeout=120s` comment at the top of the script.
- Env: `PATH=/usr/bin:/bin`, `CLAUDEMESH_MESH=<slug>`, `CLAUDEMESH_MEMBER=<pubkey>`, `CLAUDEMESH_HOME=<config-dir>`, plus the daemon's own broker session token in `CLAUDEMESH_TOKEN` so the script can call `claudemesh send` without re-authenticating.
- Concurrent execution: bounded pool (default 8) — overflow queues, never blocks the WS reader.
This makes a server a real participant: it auto-replies to "@worker-3 status?", auto-acks file shares, auto-claims tasks, escalates errors to oncall — all configured by dropping shell scripts in a directory.
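
The hook contract above (event JSON on stdin, optional JSON object on stdout, 30s default timeout, restricted `PATH`) can be sketched as a small runner. This is a hedged illustration of the daemon side, not its actual code: `hook_timeout`, `run_hook`, and the choice to return `None` on any failure are assumptions, and the other `CLAUDEMESH_*` environment variables are left to the caller via `env_extra`.

```python
import json
import os
import re
import subprocess
from typing import Optional

DEFAULT_TIMEOUT_S = 30.0
_TIMEOUT_RE = re.compile(r"^#\s*claudemesh:timeout=(\d+)s\s*$", re.MULTILINE)


def hook_timeout(script_text: str) -> float:
    """Read an optional '# claudemesh:timeout=<n>s' override from the script."""
    match = _TIMEOUT_RE.search(script_text)
    return float(match.group(1)) if match else DEFAULT_TIMEOUT_S


def run_hook(path: str, event: dict,
             env_extra: Optional[dict] = None) -> Optional[dict]:
    """Run one hook; return its JSON object response, or None."""
    with open(path, "r", encoding="utf-8") as f:
        timeout = hook_timeout(f.read())
    env = {"PATH": "/usr/bin:/bin", **(env_extra or {})}
    try:
        proc = subprocess.run(
            [path],
            input=json.dumps(event).encode("utf-8"),  # full event JSON on stdin
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=env,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # logged and counted by the caller, never retried
    if proc.returncode != 0 or not proc.stdout.strip():
        return None
    try:
        out = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return None
    return out if isinstance(out, dict) else None
```

A hook that prints `{"reply": "pong"}` would make `run_hook` return that dict, which the daemon could then turn into an automatic reply.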
## 7. Multi-mesh — one daemon per mesh, coordinated by a supervisor
Multi-mesh is handled by **one daemon per mesh** (no shared state, no cross-mesh leakage), coordinated by:
```
claudemesh daemon up --all # spawns one daemon per joined mesh
claudemesh daemon down --all
claudemesh daemon status --all # JSON table of every daemon
claudemesh daemon ps # alias of status
```
CLI verbs without `--mesh` keep their existing aggregator routing (`/v1/me/...`); in addition, each daemon contributes its inbound state to the aggregator.
## 8. Auto-routing — every CLI verb prefers the daemon
The CLI's `withMesh` helper is replaced by `viaDaemonOrMesh`:
1. Read `~/.claudemesh/daemon/<slug>/pid`.
2. If alive → call the daemon's UDS endpoint.
3. Else → cold path (existing `withMesh` flow, opens its own short-lived WS).
Transparent to the user. `claudemesh send X "msg"` from a script becomes a sub-millisecond local UDS call when a daemon is up, instead of a 1-second broker handshake.
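
The three routing steps above can be sketched as follows. The function names are Python renderings of the `viaDaemonOrMesh` idea, and the two callables stand in for the UDS call and the cold-path WS flow; both are assumptions. Liveness is checked with signal 0, which tests for process existence without delivering a signal.

```python
import os
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def daemon_pid(pidfile: str) -> Optional[int]:
    """Return the pid from the pidfile if that process exists, else None."""
    try:
        with open(pidfile, "r", encoding="utf-8") as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None  # missing, unreadable, or malformed pidfile
    if pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0: existence check only
    except ProcessLookupError:
        return None  # stale pidfile
    except PermissionError:
        return pid  # process exists but belongs to another user
    return pid


def via_daemon_or_mesh(pidfile: str,
                       daemon_call: Callable[[], T],
                       cold_call: Callable[[], T]) -> T:
    """Prefer the running daemon; fall back to the cold path."""
    if daemon_pid(pidfile) is not None:
        return daemon_call()
    return cold_call()
```

A live pid alone does not prove the process is the daemon (pids get reused), so a production check would likely also confirm that the socket answers before committing to the daemon path.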
## 9. Service installation
```bash
claudemesh daemon install-service # writes systemd unit / launchd plist / Windows SC
claudemesh daemon uninstall-service
```
Generated unit:
- `Restart=on-failure`, `RestartSec=5s`
- `MemoryMax=512M` (a ceiling the daemon is expected to stay well under)
- `StandardOutput=journal`, `StandardError=journal`
- For systemd, runs as the invoking user (no root needed).
`claudemesh install` (the existing setup verb) gains an opt-in prompt: *"Install as a background service that always runs?"* For interactive users this is opt-in; for `--yes` it defaults to yes on Linux servers (detected by absence of TTY + presence of systemd).
## 10. Observability
```
claudemesh daemon status human-readable: connected, lag, queue, hooks fired
```
Each is ~300 lines. All three are versioned in lockstep with the daemon's `/v1` surface. A `/v2` surface (when it eventually exists) keeps `/v1` alive indefinitely — old SDKs never break.
## 12. Security model — explicit boundaries
| Boundary | Trust | Mechanism |
|---|---|---|
| App ↔ Daemon (local) | OS user | UDS 0600, TCP loopback only |
| Hook ↔ Daemon (env) | OS user + filesystem | `hooks/` dir mode 0700; only files there execute; no remote install |
| Daemon ↔ Disk | OS user | All daemon files mode 0600/0644 under `~/.claudemesh/daemon/` |
**No new attack surface introduced by the daemon** — apps that previously could read `~/.claudemesh/config.json` directly already had full mesh access; the daemon just adds an IPC layer on top.
**Hook RCE consideration**: a peer cannot install a hook on your daemon. Hooks are files YOU put on disk. Inbound messages can only trigger hooks that already exist with content you wrote. The broker has no path to your hook directory.
## 13. Configuration — `config.toml`
```toml
[daemon]
mesh = "prod"                    # set on `daemon up --mesh`; immutable thereafter
display_name = "runpod-worker-3"
log_level = "info"
[ipc]
http_port = 0                    # 0 = auto-allocate
http_bind = "127.0.0.1"          # never 0.0.0.0; explicit if you know what you're doing
```
User-editable. `claudemesh daemon reload` re-reads it without dropping the WS.
## 14. Migration — what changes for existing users
- `claudemesh launch` (Claude Code mode) is unchanged. It can optionally `--via-daemon` to share the WS with a running daemon, but defaults to its own session (preserves "ephemeral session" semantics that Claude Code expects).
- `claudemesh send X "msg"` and every other cold-path verb gets a transparent speedup when a daemon is up. No flag, no opt-in, no behavior difference visible to the user.
- Existing `~/.claudemesh/config.json` is consumed unchanged by the daemon.
- No DB migration. No broker changes. The daemon talks to the existing `/v1` HTTPS + WSS surfaces — broker doesn't even know whether a connection is `claudemesh launch` or `claudemesh daemon`.
---
## What needs review
Please critically review this spec for the v0.9.0 anchor. Specifically I want
your hardest pushback on:
1. **Identity model** — persistent member by default vs ephemeral session. Have I
missed a case where ephemeral is the right answer for a daemon? Should
`--ephemeral` exist?
2. **No-auth local IPC** — UDS 0600 + TCP loopback. Is "OS-trust is enough"
actually safe in shared-tenant Linux (multi-user host, container
side-channel)? Should there be a per-daemon token even locally?
3. **SQLite outbox/inbox** — single writer, WAL, batched fsync. Is the
exactly-once-via-idempotency-key claim defensible? What's the failure mode
I'm glossing over?
4. **Hooks fork-execing scripts** — RCE/data-exfil concerns I'm dismissing too
easily? Should hooks be sandboxed (seccomp, no network, …)?
5. **Auto-routing CLI verbs through daemon** — does this break composability
with existing `claudemesh launch`? Race conditions when both are running?
What about pidfile-stale detection?
6. **One daemon per mesh** — why not one daemon serving all meshes, with mesh
selection per-request? What does single-daemon actually buy beyond "fewer
processes"?
7. **The IPC surface duplicates the broker REST surface** — am I solving a
problem the broker REST + per-mesh apikey already solves, with extra
| **C. Outbox stored, broker permanent rejection** | Broker returns `4xx` after IPC accept | `dead` | No — rotate via `requeue` |
| **D. Operator retirement** | Operator runs `requeue` on `dead` or `pending` row | `aborted` (audit) + new row with fresh id | Old id NEVER reusable; new id is fresh |
The broker validates in three phases relative to dedupe-row insertion:
| Phase | Validation | Side effects | Result for direct broker callers (none in v0.9.0) |
|---|---|---|---|
| **B1. Pre-dedupe-claim** | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ `max_payload.inline_bytes`, rate limit not exceeded | None | `4xx`. No dedupe row. Direct broker caller may retry with same id |
| **B2. Post-dedupe-claim** (in-tx) | destination_ref existence (topic exists, member subscribed, etc.) | INSERT into dedupe rolled back | `4xx`, transaction rolled back, no dedupe row remains. Direct broker caller may retry with same id |
| **B3. Accepted** | All side effects commit atomically | Dedupe row, message row, history row, delivery_queue rows | `201` with `broker_message_id` |
**Daemon-mediated callers (the only path in v0.9.0)** see only the
daemon-layer rules of §4.6.1: any broker `4xx` after IPC accept lands
the outbox row in `dead`. Daemon-mediated callers MUST rotate via
`requeue` (§4.5.3); the daemon-consumed id is never reusable
regardless of whether the broker layer sees a dedupe row. The "may
retry with same id" wording above describes broker-bypass callers
only, which v0.9.0 does not have.
**Critical guarantee**: there is no broker code path where a permanent
4xx leaves a dedupe row behind. Either the request committed and a
dedupe row exists (B3), or it didn't and no dedupe row exists (B1, B2).
"Dedupe row exists" is the unambiguous signal of "id consumed at the
broker layer."
If the broker decides post-commit that an accepted message is invalid
(async content-policy job), that's NOT a permanent rejection — it's a
follow-up moderation event that operates on the `broker_message_id`,
not on the dedupe key.
Net result: `client_message_dedupe` rows only exist when the broker
**successfully** accepted a message and committed it. The single source
of truth for "was this idempotency key consumed?" is the existence of
the dedupe row. No status enum, no ambiguous states.
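
The B1/B2/B3 phasing and the "dedupe row exists only after commit" guarantee can be illustrated with an in-memory SQLite database. This is a sketch under stated assumptions: SQLite stands in for the broker's database, the tables are simplified versions of `client_message_dedupe` and `topic_message`, the `topic` table and the `accept` helper are hypothetical, and `400` / `404` are placeholder `4xx` codes. Replay of an already-committed id is out of scope here.

```python
import sqlite3
import uuid
from typing import Optional, Tuple


def open_broker_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:", isolation_level=None)  # explicit transactions only
    db.executescript(
        """
        CREATE TABLE topic (name TEXT PRIMARY KEY);
        CREATE TABLE client_message_dedupe (
            client_message_id TEXT PRIMARY KEY,
            broker_message_id TEXT NOT NULL
        );
        CREATE TABLE topic_message (
            broker_message_id TEXT PRIMARY KEY,
            topic             TEXT NOT NULL,
            body              TEXT NOT NULL
        );
        """
    )
    return db


def accept(db: sqlite3.Connection, client_message_id: str, topic: str,
           body: str, max_bytes: int = 1024) -> Tuple[int, Optional[str]]:
    # B1: pre-dedupe-claim checks. No transaction is open, nothing is written.
    if not body or len(body.encode("utf-8")) > max_bytes:
        return 400, None
    broker_message_id = str(uuid.uuid4())
    db.execute("BEGIN IMMEDIATE")
    try:
        # Claim the id inside the accept transaction.
        db.execute(
            "INSERT INTO client_message_dedupe VALUES (?, ?)",
            (client_message_id, broker_message_id),
        )
        # B2: checks that need the transaction, e.g. destination existence.
        found = db.execute("SELECT 1 FROM topic WHERE name = ?", (topic,)).fetchone()
        if found is None:
            db.execute("ROLLBACK")  # the dedupe claim disappears with the rollback
            return 404, None
        # B3: remaining side effects commit together with the dedupe row.
        db.execute(
            "INSERT INTO topic_message VALUES (?, ?, ?)",
            (broker_message_id, topic, body),
        )
        db.execute("COMMIT")
        return 201, broker_message_id
    except Exception:
        db.execute("ROLLBACK")
        raise
```

After a B1 or B2 rejection the dedupe table is empty, so the presence of a row is enough to conclude that the broker accepted and committed the message.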
### 4.7 Broker atomicity contract
#### 4.7.1 Side-effect inventory
Every successful broker accept atomically commits these durable state
changes in **one transaction**:
| Effect | Table | Why in-tx |
|---|---|---|
| Dedupe record | `mesh.client_message_dedupe` | Idempotency authority |
| Message body | `mesh.topic_message` / `mesh.message_queue` | Authoritative store |
1. **Identity is opaque, display is free-form.** Humans pick any name; the system uses random IDs.
2. **Secrets never appear in URLs.** Links are capabilities, not credentials.
3. **Defaults are obvious; advanced options are discoverable but hidden.**
4. **Self-service wherever possible; admins don't become gatekeepers.**
5. **Every visible action is also an auditable event.**
These mirror how Anthropic builds its own org/workspace/project model.
---
## Part 1 — Meshes
### Problem
Global uniqueness on `mesh.slug` creates name collisions at scale. Two users picking "platform" or "test" fight for the slug. At 50k users this is the default state.
### Decision
**Drop the slug as an identity concept.** `mesh.id` (opaque, already random) is the canonical identifier everywhere (URLs, invites, broker lookups). `mesh.name` is a free-form display label, non-unique. `mesh.slug` is kept as a non-unique cosmetic string derived from the name at creation time, embedded in invite payloads for debugging.
### What this enables
- Two users can both name their mesh "platform-team" with zero friction
- URLs stay stable (`/meshes/{id}`) even if the user renames the mesh
- No "slug taken" error state exists in the product anymore
### Tradeoff explicitly accepted
Users lose the ability to type `claudemesh join platform-team` — but they never did, because the CLI takes signed invite tokens, not slugs. This capability was phantom.
### Implementation — DONE in this spec
- [x] Drop `UNIQUE` constraint on `mesh.slug` (migration `0017_mesh-slug-non-unique.sql`)
- [x] Remove `slug` field from `createMyMeshInputSchema`
- [x] Remove slug field from `CreateMeshForm`
- [x] Server-side `toSlug(name)` derives slug from name automatically
- [x] Schema comment documents the non-canonical role of `slug`
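
For illustration, a server-side slug derivation along the lines of `toSlug(name)` might look like the sketch below. The exact rules (ASCII folding, hyphen separators, a length cap, and the `"mesh"` fallback) are assumptions; the spec only states that the slug is derived from the name and is cosmetic, so collisions between meshes are acceptable by design.

```python
import re
import unicodedata


def to_slug(name: str, max_len: int = 48) -> str:
    """Derive a cosmetic, non-unique slug from a free-form mesh name."""
    # Fold accented characters to their ASCII base letters where possible.
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumerics into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    # Fall back to a fixed label if nothing usable remains.
    return slug[:max_len].rstrip("-") or "mesh"
```

Two meshes named "Platform Team" both get the slug `platform-team`, which is fine because `mesh.id` remains the only canonical identifier.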
### Future (optional, not in v0.1.x)
- **Vanity slugs as a Pro feature:** one globally-unique handle per *account* (not per mesh), exposed as `claudemesh.com/@acme/...`. Sold as part of an org tier. This is where slug uniqueness actually pays for itself — against usernames, not against meshes.
---
## Part 2 — Invitations
### Problems with the current invite system
| # | Problem | Severity |
|---|---|---|
| 1 | `mesh_root_key` is embedded in the invite URL as base64url JSON | 🔴 **Security** |
| 2 | Invite URLs are ~400 chars of opaque base64url | 🟡 UX |
| 3 | No invite-by-email; only shareable link | 🟡 UX |
| 4 | Required form fields (role, maxUses, expiresInDays) for every invite | 🟡 UX |
| 5 | Landing page does not clearly preview role/consent | 🟡 UX |
| 6 | No audit trail for invites received-but-never-clicked | 🟢 Polish |
| 7 | `ic://` link scheme is vestigial, nothing registers the handler | 🟢 Polish |
### Severity 🔴 — the root key leak
Current canonical invite bytes:
```
v | mesh_id | mesh_slug | broker_url | expires_at | mesh_root_key | role | owner_pubkey
```
`mesh_root_key` is a 32-byte shared secret used by all channel and broadcast encryption in the mesh. Once it lives in a URL:
- Slack/Telegram/Discord link previews fetch and cache the URL → root key is in those caches
```
any future claim fails at broker with invite_revoked
root_key is NOT rotated — past members keep access
(for "kick a member" semantics, use a separate member revocation, which DOES rotate the key)
```
**Properties**
- URL contains only `{code}` (8 chars base62)
- `signed_capability` lives server-side; leaks of the URL never expose the root key
- Screenshot of invite URL is harmless
- Link preview bots see nothing sensitive
- Broker DB is the source of truth for revocation
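
A minimal sketch of generating the 8-character base62 `{code}` described above, using a cryptographically secure random source. The helper names are hypothetical, and the URL shape follows the `claudemesh.com/i/{code}` form used elsewhere in this spec. Uniqueness is not guaranteed by generation alone; the sketch assumes the database's unique constraint on the code column catches collisions and the caller retries.

```python
import secrets
import string

BASE62 = string.digits + string.ascii_letters  # 0-9, a-z, A-Z


def new_invite_code(length: int = 8) -> str:
    """Return a random base62 code; contains no mesh data or key material."""
    return "".join(secrets.choice(BASE62) for _ in range(length))


def invite_url(code: str, base: str = "https://claudemesh.com") -> str:
    """Build the short invite URL for a code."""
    return f"{base}/i/{code}"
```

With 62 symbols and 8 positions there are roughly 2.2 × 10^14 possible codes, so the code is an opaque lookup handle and the security of the invite rests on the server-side capability and revocation state, not on the code being unguessable forever.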
**Migration strategy (v1 → v2)**
- Add `invite.code`, `invite.v2_capability` columns (nullable for existing rows)
- `createMyInvite` generates BOTH v1 token (legacy) and v2 code
- Web invite UI displays the short URL by default, long URL as "Legacy format" disclosure
- Broker accepts both formats until v0.2.0
- Announce deprecation window; at v0.2.0 the long-format endpoints 410 Gone
**Status update 2026-04-10 — v2 is now being implemented in parallel**
The scope that was deferred at the top of the session is actively landing in a coordinated multi-agent push:
- Broker: new `/api/public/invites/:code/claim` endpoint, `crypto_box_seal` against recipient x25519 pubkey, signed capability verification, single-use accounting.
- DB: `mesh.invite.version` int, `mesh.invite.capability_v2` text nullable, `mesh.invite.claimed_by_pubkey` text nullable. New table `mesh.pending_invite` for email invites.
- CLI / web claim client: generates a fresh x25519 keypair (separate from the ed25519 identity), POSTs the pubkey, unseals the returned `sealed_root_key`, then verifies `canonical_v2` against `owner_pubkey`.
- Email invites (parallel track): Postmark delivery wired on top of `pending_invite`; the email body carries the same `claudemesh.com/i/{code}` short URL.
v1 invites continue to work throughout v0.1.x. v1 endpoints return `410 Gone` at v0.2.0.
Docs updated in the same session: `SPEC.md` §14b, `docs/protocol.md` (v2 invites subsection), `docs/roadmap.md` (in progress).
---
### Severity 🟡 — implemented this session
#### Short invite codes (URL shortening, backward-compatible)
Additive: invites now get both a long token AND a short opaque code. The web app prefers the short URL.
**DB:** new nullable `invite.code` column, unique. New migration `0018_invite-short-code.sql`.
**Web:** new server route `/i/[code]/page.tsx` that resolves the code server-side and redirects to the canonical `/join/[token]` page. Invite generator UI shows the short URL as the primary "Copy link" target.
**Backward compat:** existing invites without a `code` keep working via their long token. No broker/CLI changes.
**This is NOT the v2 protocol.** It only fixes the URL-length problem. The root key is still embedded in the long token that the short code resolves to. The short code is a URL shortener, not a capability boundary. Document this clearly so nobody confuses the two.
---
#### Collapsed advanced fields
The invite form asks for `role`, `max uses`, `expires in days` upfront. 90% of users only ever create `{ role: member, max_uses: 1, expires_in_days: 7 }`.
Change: defaults are pre-filled; the three fields are hidden behind an "Advanced" disclosure.
---
### Severity 🟡 — deferred
#### Invite by email
- Requires an `invitation_email` table or equivalent pending-invites state
- Requires wire-up to email delivery (already have Postmark via turbostarter)
- Out of scope this session; fits naturally on top of v2 invite protocol
#### Consent landing redesign
- The `/join/[token]` page should show: mesh name, inviter, role being granted, member count, expiry, explicit "Join as Member of ACME" button
- Needs a design pass
- Deferred
---
### Severity 🟢 — deferred
- Remove `ic://` scheme — it's dead, nothing handles it, safe to delete in v0.1.x cleanup
- Received-but-not-clicked audit — falls out of email invites for free
---
## Summary table
| Change | Status | File(s) |
|---|---|---|
| Drop global slug uniqueness | ✅ done | `packages/db/src/schema/mesh.ts`, migration `0017` |
| Remove slug from create-mesh form | ✅ done | `apps/web/src/modules/mesh/create-mesh-form.tsx` |
| Server-derived slug from name | ✅ done | `packages/api/src/modules/mesh/mutations.ts` |
| Short invite codes (URL shortener) | ✅ done | `packages/db` migration `0018`, api, web `/i/[code]` |
```
claudemesh launch --mesh platform-team -y # spawns Claude Code in the mesh
```
For CI / scripting / non-interactive contexts, PAT works too:
```
claudemesh login --token cm_pat_abc123
claudemesh create "CI test mesh" --json | jq .id
```
This is the auth substrate that unblocks the "Anthropic vision" — every other dashboard-only feature (meshes, invites, members, billing) becomes CLI-accessible after this lands.
- Refresh tokens with rotation (long-lived API keys are sufficient for v1)
- Multi-account switching (one logged-in identity per `~/.claudemesh/auth.json`)
- Device fleet management UI (single "revoke" button per token is enough for v1)
## Auth model overview
Two coexisting credential types, both backed by **Better Auth's `apiKey` plugin**:
| Type | Created via | Lifetime | Use case | Storage |
|---|---|---|---|---|
| **Device-code session token** | `claudemesh login` (OAuth-style browser handshake) | 90 days, auto-renew on use | Interactive humans on their workstation | `~/.claudemesh/auth.json` |
| **Personal access token (PAT)** | Dashboard → Settings → CLI tokens → Generate | User-chosen (30d / 90d / 1y / never), explicit revocation | CI, scripts, automation, server-side cron | Anywhere the user puts it; CLI reads from `--token` flag, env var, or `auth.json` |
Both flow through the same `Authorization: Bearer cm_<type>_<random>` header. The API doesn't care which one it gets — it just validates against the `api_key` table.
- `cm_pat_<32-byte base32>` — personal access tokens
The `cm_` prefix lets us scan for leaked tokens with regex (e.g. GitHub secret scanning, internal scripts). The middle segment (`session` / `pat`) is for human readability in token lists, not for security.
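
To make the token format and the leak-scanning idea concrete, here is a hedged sketch that mints a token in the `cm_<type>_<32-byte base32>` shape and scans text for it. In the design above, token generation is delegated to Better Auth's `apiKey` plugin, so `mint_token` is purely illustrative; lowercase, unpadded base32 is an assumption chosen so the result matches the `cm_(session|pat)_[a-z0-9]{32,}` pattern given in the Security section.

```python
import base64
import re
import secrets
from typing import List

TOKEN_RE = re.compile(r"cm_(session|pat)_[a-z0-9]{32,}")


def mint_token(kind: str) -> str:
    """Illustrative only: build a token with the documented prefix and length."""
    if kind not in ("session", "pat"):
        raise ValueError("kind must be 'session' or 'pat'")
    raw = secrets.token_bytes(32)
    # 32 bytes encode to 52 base32 characters once the '=' padding is removed.
    body = base64.b32encode(raw).decode("ascii").rstrip("=").lower()
    return f"cm_{kind}_{body}"


def find_leaked_tokens(text: str) -> List[str]:
    """Return every substring that looks like a claudemesh token."""
    return [m.group(0) for m in TOKEN_RE.finditer(text)]
```

Running `find_leaked_tokens` over a log file or a diff returns candidate tokens, which can then be revoked; the type segment is only a label and carries no security weight.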
## User flows
### 1. First-time login (interactive happy path)
```
$ claudemesh login
██ claudemesh login
Opening browser for authentication…
If your browser didn't open, visit:
https://claudemesh.com/cli-auth?code=ABCD-EFGH
Enter this code:
ABCD-EFGH
Waiting for confirmation… ⠋
```
In the browser:
1. User lands on `/cli-auth?code=ABCD-EFGH`
2. If not signed in, Better Auth login screen appears, then redirects back
3. User sees a confirmation card:
```
Link this CLI session?
Code: ABCD-EFGH
Device: Alejandro's MacBook Pro · darwin · arm64
Expires in 9:47
[Approve] [Deny]
```
4. User clicks Approve
The CLI polls every 1.5s; when it sees `approved` it receives the token, writes `~/.claudemesh/auth.json` with mode `0600`, and prints:
```
✔ Authenticated as Alejandro Gutiérrez
✔ Token saved to ~/.claudemesh/auth.json
✔ Synced 3 meshes: alexis-mou, dev, claudefarm
Run claudemesh --help to get started.
```
### 2. First-time login (PAT, non-interactive)
```
$ claudemesh login --token cm_pat_abc123def456...
✔ Authenticated as Alejandro Gutiérrez (via PAT "ci-deploy")
```
No auth prompt — token in `auth.json` is used silently.
### 4. Token expired or revoked
```
$ claudemesh peers
✘ Authentication failed (token expired or revoked)
Run claudemesh login to re-authenticate.
```
Exit code `2`. The `auth.json` is **not** auto-deleted (user might be debugging) but the next `claudemesh login` overwrites it cleanly.
### 5. Wizard launch flow with auth integration
When `claudemesh` (bare, no auth) is run:
```
██ claudemesh
▸ Sign in (opens browser)
Paste a personal access token
Join a mesh via invite URL
Exit
```
After auth completes, the wizard transitions naturally into the launch flow (mesh picker → name → role → confirm → handoff). One uninterrupted experience from "fresh install" to "Claude Code in a mesh."
```
claudemesh create "CI run $GITHUB_RUN_ID" --json > mesh.json
```
Or zero-state:
```
- env:
CLAUDEMESH_TOKEN: ${{ secrets.CLAUDEMESH_PAT }}
run: claudemesh create "CI run $GITHUB_RUN_ID" --json
```
Token resolution order: `--token` flag > `CLAUDEMESH_TOKEN` env var > `~/.claudemesh/auth.json`.
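
The resolution order above can be sketched as a single function. The `"token"` key inside `auth.json` is an assumption about the file layout, and the function name is hypothetical.

```python
import json
import os
from typing import Mapping, Optional


def resolve_token(flag_token: Optional[str],
                  env: Optional[Mapping[str, str]] = None,
                  auth_path: str = "~/.claudemesh/auth.json") -> Optional[str]:
    """Resolve the token: --token flag, then CLAUDEMESH_TOKEN, then auth.json."""
    if flag_token:
        return flag_token
    env = os.environ if env is None else env
    env_token = env.get("CLAUDEMESH_TOKEN")
    if env_token:
        return env_token
    try:
        with open(os.path.expanduser(auth_path), "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        return None  # no file, unreadable file, or invalid JSON
    return data.get("token") if isinstance(data, dict) else None
```

Putting the flag first lets a single command override both the environment and the stored login, which is convenient when testing a freshly generated PAT.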
### 7. Logout
```
$ claudemesh logout
✔ Token revoked on server
✔ Removed ~/.claudemesh/auth.json
```
`logout` calls `DELETE /api/my/cli/sessions/current` to revoke server-side, then unlinks the local file. Best-effort: if the server call fails, still delete locally and warn.
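
The best-effort behaviour described above can be sketched as follows. The `revoke` callable stands in for the `DELETE /api/my/cli/sessions/current` request, and the function signature and boolean return value are assumptions made so the sketch stays independent of any HTTP client.

```python
import os
import sys
from typing import Callable


def logout(revoke: Callable[[], None], auth_path: str) -> bool:
    """Revoke server-side if possible, then always remove the local file.

    Returns True when the server-side revocation succeeded.
    """
    revoked = True
    try:
        revoke()
    except Exception as exc:  # best-effort: any failure downgrades to a warning
        revoked = False
        print(
            f"warning: could not revoke token on server ({exc}); "
            "removing local credentials anyway",
            file=sys.stderr,
        )
    try:
        os.unlink(auth_path)
    except FileNotFoundError:
        pass  # already logged out locally
    return revoked
```

When revocation fails, the token may remain valid on the server until it expires or is revoked from the dashboard, which is why the warning matters.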
## Architecture
### Backend — Better Auth `apiKey` plugin
Better Auth ships an `apiKey` plugin that handles:
- Token generation (cryptographically random)
- Hashed storage (only the hash hits the DB; raw token never persisted)
Type definitions live in `packages/api/src/contracts/cli.ts` (new file) — generated from the existing tRPC routers as plain types so the CLI doesn't need to import the whole tRPC client.
Base URL from `CLAUDEMESH_API_URL` env var, defaults to `https://claudemesh.com`. Allows local dev against `http://localhost:3000`.
Polls every 1.5s. Server returns `{ slow_down: true }` if polled too fast (rate limit at 1/sec).
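
The polling behaviour can be sketched with the HTTP call injected as a function, so the loop logic is visible on its own. The response shape (`status` of `pending` / `approved` / `denied`, plus `token`), the one-second back-off step on `slow_down`, and the function name are assumptions; the 1.5s interval and the 10-minute device-code lifetime come from this spec.

```python
import time
from typing import Callable, Optional


def poll_for_token(fetch_status: Callable[[], dict],
                   interval_s: float = 1.5,
                   expires_in_s: float = 600.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> Optional[str]:
    """Poll until the device code is approved, denied, or expired."""
    deadline = clock() + expires_in_s
    while clock() < deadline:
        resp = fetch_status()
        if resp.get("slow_down"):
            interval_s += 1.0  # assumed back-off step after a rate-limit hint
        elif resp.get("status") == "approved":
            return resp.get("token")
        elif resp.get("status") == "denied":
            return None
        sleep(interval_s)
    return None  # device code expired before approval
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a caller would pass a `fetch_status` that performs the actual request against the polling endpoint.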
## Security
1. **Tokens are hashed at rest** (Better Auth `apiKey` plugin handles this with bcrypt or argon2).
2. **Raw tokens shown to user once.** PATs in dashboard, device-code tokens via `claudemesh login` output. Never logged, never re-displayable.
3. **`auth.json` is `0600`.** CLI refuses to write if parent dir can't be made `0700`. Warns on read if mode is wider.
4. **Token prefix `cm_` enables secret scanning.** Document the regex `cm_(session|pat)_[a-z0-9]{32,}` in security docs so GitHub secret scanning, GitGuardian, etc. can detect leaks.
5. **`/api/auth/cli/device-code/:device_code` polling is rate-limited** to 1 req/sec per IP per device_code. Returns `429` with `slow_down: true` body.
6. **Device codes expire in 10 minutes.** Approved-but-unclaimed tokens stay valid (the polling endpoint still returns the token for 60 seconds after approval, then the device_code row is GC'd).
7. **Audit logging.** Every device-code approval, PAT creation, and PAT revocation emits an audit event (`auth.cli.session.created`, `auth.cli.pat.created`, etc.). Stored in existing audit log if there is one, otherwise new `audit_log` table.
8. **Session invalidation on password change.** When a user changes their password via Better Auth, all `cli_session` `api_key` rows for that user are revoked. PATs are NOT auto-revoked (they're explicitly user-managed).
9. **Token revocation is immediate.** `auth.api.verifyApiKey` checks DB on every request — no in-memory cache.
10. **No CSRF concern** for device-code endpoints — the unauthed ones don't act on user state, the authed ones use Better Auth's existing CSRF protection.
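The prefix rule in item 4 also gives the CLI a cheap client-side format check before any network call. A minimal sketch, assuming the documented regex is applied to the whole string (the `^`/`$` anchors and the helper name `classifyToken` are additions for illustration):

```typescript
// Regex from the spec, anchored so the whole input must be a token.
const TOKEN_PATTERN = /^cm_(session|pat)_[a-z0-9]{32,}$/;

type TokenKind = "session" | "pat";

// Returns the token kind, or null when the input does not look like a claudemesh token.
function classifyToken(raw: string): TokenKind | null {
  const match = TOKEN_PATTERN.exec(raw.trim());
  return match ? (match[1] as TokenKind) : null;
}
```

A `null` result maps naturally onto the "Invalid token format" error path; a well-formed but unknown token still has to be rejected by the server.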
## Wizard UX integration
The current welcome wizard already has:
```
▸ Create account (new to claudemesh)
Sign in (existing account)
Paste an invite URL
Exit
```
After this spec lands, the welcome screen becomes:
```
██ claudemesh
▸ Sign in ← device-code OAuth
Paste an access token ← PAT path
Join via invite URL ← unchanged
Create account ← opens /register, then back to login
Exit
```
"Sign in" becomes the headline option. The current "Create account" still opens browser to `/register` but flows back through the device-code handshake instead of a custom callback.
Once authenticated, the wizard transitions to:
```
██ claudemesh launch
Account ✔ Alejandro Gutiérrez
Mesh ▸ (pick one — 3 available)
Name ✔ Alexis (from --name)
Role ▸ (pick one)
▸ Continue
Cancel
```
Status rows show what's filled and what's left. Mesh picker fetches from `GET /api/my/meshes` via the freshly minted token.
This integrates cleanly with the wizard architecture refactor in `2026-04-10-cli-wizard-architecture-refactor.md`: auth becomes one screen in the launch flow with `isComplete: s => s.user !== null`. On a fresh machine the auth screen runs; on a returning machine it's auto-skipped.
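The auto-skip behaviour follows directly from the `isComplete` predicate. The sketch below is a guess at the shape, not the refactor's actual types: `LaunchState`, `Screen`, the screen ids, and `nextScreen` are all hypothetical.

```typescript
// Hypothetical state and screen shapes, for illustration only.
interface LaunchState {
  user: { name: string } | null;
  mesh: string | null;
  name: string | null;
  role: string | null;
}

interface Screen {
  id: string;
  isComplete: (s: LaunchState) => boolean;
}

const screens: Screen[] = [
  { id: "auth", isComplete: (s) => s.user !== null }, // predicate taken from the spec
  { id: "mesh", isComplete: (s) => s.mesh !== null },
  { id: "name", isComplete: (s) => s.name !== null },
  { id: "role", isComplete: (s) => s.role !== null },
];

// Returns the first screen that still needs input, or null when the wizard can continue.
function nextScreen(state: LaunchState): string | null {
  const pending = screens.find((screen) => !screen.isComplete(state));
  return pending ? pending.id : null;
}
```

On a returning machine `user` is already populated from `auth.json`, so the first pending screen is the mesh picker rather than auth.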
| PAT pasted to `--token` is malformed | Print "Invalid token format (expected `cm_pat_…`)", exit 1 |
| PAT pasted to `--token` is valid format but unknown | API returns 401, print "Token rejected", exit 2 |
| Two CLI instances poll simultaneously | Both get the same approved status; first to read gets the token, second gets `{ status: 'approved', token: null }` (already_claimed). Document this. |
| User clicks Approve in browser, then closes tab | CLI's poll catches it, login succeeds. The browser tab closure is irrelevant. |
| User completes login on machine A, then runs `claudemesh login` on machine B with same account | Both sessions coexist as separate `api_key` rows. `claudemesh whoami --sessions` shows both. |
## Implementation phases
Each phase ships independently and is independently testable.
### Phase 1 — Backend foundation (4–6 hours)
- [ ] Wire Better Auth `apiKey` plugin in `apps/web/src/lib/auth/`
- [ ] Migration `0020_cli-device-code.sql`
- [ ] Drizzle schema for `cli_device_code` in `packages/db/src/schema/auth.ts`
- [ ] Telemetry events for `auth.cli.login.{start,success,fail}`
- [ ] Bump `apps/cli/package.json` to `0.11.0`
- [ ] Publish to npm
- [ ] Deploy broker / web (no broker changes, web for new routes)
**Total estimate:** 19–26 hours of focused work. Realistic: 3–4 days with testing and review.
## Dependencies between phases
```
Phase 1 (backend) ──┬─→ Phase 2 (web routes)
                    └─→ Phase 3 (CLI auth core)
                              │
                              └─→ Phase 4 (commands)
                                        │
                                        └─→ Phase 5 (wizard)
                                                  │
                                                  └─→ Phase 6 (ship)
```
Phase 1 and 2 can be parallelized after the schema lands. Phase 3 needs Phase 1 endpoints live (even if on staging). Phase 4 onwards is strictly serial.
## Telemetry
Emit these events (PostHog or whatever the existing analytics are):
Telemetry is **opt-out**. First run shows a one-line notice: "claudemesh collects anonymized usage telemetry. Disable with `claudemesh telemetry off`."
## Open questions
1. **Better Auth `apiKey` plugin version** — confirm it's installed and at a version that supports `enableMetadata`. Check `pnpm why better-auth` in `apps/web`.
2. **Audit log table** — does one already exist? If not, this spec adds three rows of log; not worth a new table for that. Use `console.log` with structured JSON to stderr and let the platform's log collector handle it.
3. **Email sending** — `claudemesh invite --email` requires a transactional email path. Does the web app already have one (Resend, Postmark)? If yes, reuse. If no, defer the email send to a follow-up; the invite command can still create the invite and print the URL.
4. **Token scopes** — v1 ships with no scopes; every token has full account access. Should we add `mesh:read`, `mesh:write`, `invite:create` scopes from day one, or wait? **Recommendation:** wait. YAGNI. Add when a user actually wants a read-only CI token.
5. **PAT expiry default** — 90 days? 1 year? Never? Better Auth supports all three. **Recommendation:** 1 year default, user can pick "never" with explicit warning.
6. **Mesh slug uniqueness in `claudemesh create`** — what happens if two users try to create meshes with the same slug? Existing API behavior should be tested. If it errors, the CLI should suggest `--slug platform-team-2`.
7. **`claudemesh login` when already logged in** — re-authenticate (overwrite) or error ("already logged in, run logout first")? **Recommendation:** re-authenticate silently with a one-line notice ("Replacing existing session for Alejandro").
## Acceptance criteria
For v0.11.0 to ship, all of these must be true:
- [ ] `claudemesh login` on a fresh machine (no `auth.json`) opens browser, completes device-code flow, writes `auth.json`, runs in <30 seconds end-to-end
- [ ] `claudemesh login --token cm_pat_…` works without browser
- [ ] `claudemesh logout` revokes server-side and deletes local file
- [ ] `claudemesh whoami` prints user identity and token source
- [ ] `claudemesh create "Test mesh"` creates a real mesh on the server, joins it locally, and the user can see it on the dashboard
- [ ] `claudemesh invite --email a@b.c --mesh test` creates an invite and prints the URL
- [ ] `claudemesh launch` (bare) on a fresh machine walks login → mesh picker → name/role → Claude Code, all in one wizard
- [ ] Dashboard `/dashboard/settings/cli-tokens` lists, creates, and revokes PATs
- [ ] All flows work in `en` and `es`
- [ ] Existing `claudemesh launch` invocations (with token already in `auth.json`) still work without prompting
- [ ] Token in `auth.json` survives an hour of idle and continues to work (no aggressive expiry)
- [ ] Revoking a token in the dashboard makes the next CLI call fail with a clear error
- [ ] Documentation updated in `README.md`, `apps/cli/README.md`, `docs/quickstart.md`
- [ ] CHANGELOG entry written
- [ ] Published to npm as `claudemesh-cli@0.11.0`
## What this unlocks
Once this lands, every dashboard-only feature becomes one CLI command away. Future specs that depend on this:
- `claudemesh members list` / `claudemesh members add`
| `connections` | 147 | yes (WS state) | ✅ naturally per-node | WS connections are pinned to a node by L7 routing. Each node holds only its own connections. **OK as long as the LB uses sticky sessions or cross-node fan-out.** |
| `connectionsPerMesh` | 148 | yes | 🟡 per-node count, not global | Used for capacity cap. Global cap requires Redis. |
| `urlWatches` | 173 | yes | 🔴 stuck on one node | If peer disconnects from node A and reconnects on B, the watch stays orphaned on A. **Needs DB/Redis, or "pin to owning node". Acceptable risk if watches are per-session ephemeral.** |
| `streamSubscriptions` | 259 | yes | 🔴 multi-node broken | Sub on A, publish on B → message never reaches A's subscribers. **Needs Redis pub/sub for HA.** |
| `meshClocks` | 270 | yes | 🔴 multi-node broken | Simulated clocks must be single-authority. Solve by pinning one node as clock leader (simple leader election) or by moving clock state to DB. |
| `mcpRegistry` | 327 | yes | 🔴 multi-node broken | MCP server catalog cached in memory. If deployed on A but called on B, B doesn't know it exists. **Must be DB-backed** (partly is already — see `mesh_service` table). Audit the cache/DB sync path. |
| `mcpCallResolvers` | 338 | yes | ✅ per-call ephemeral | In-flight callback resolvers; WS sticks to owning node so this is fine. |
| `scheduledMessages` | 359 | yes | 🔴 multi-node broken | Scheduled delivery timers live in-process. Restart loses them. Persistence exists (`scheduled_message` table) + recovery on startup, but two nodes could both fire the same timer. **Needs a leader lock or per-schedule pg_advisory_lock on fire.** |
| `sendRateLimit` | index.ts:494 | yes | 🟡 per-node | Each node enforces its own quota; a client spread across nodes could 2x the limit. Tolerable if sticky sessions hold. |
| `hookRateLimit` | index.ts:482 | yes | 🟡 per-node | Same as sendRateLimit. |
| `lastHash` (audit.ts:22) | — | yes | 🔴 broken on write | Two nodes writing audit rows concurrently will BOTH read the same last hash, BOTH compute a new hash, and both INSERT — the chain forks. **Needs `SELECT FOR UPDATE` or a single audit writer.** |
## Conclusion
**Current broker is NOT HA-safe.** Five symbols break under multi-instance:
| `apps/cli/src/services/crypto/box.ts` | 60 | `crypto_box_easy` / `crypto_box_open_easy` wrappers that accept ed25519 keys and convert to curve25519 via `crypto_sign_*_to_curve25519`. |
| `apps/cli/src/commands/backup.ts` | ~180 | Config backup via Argon2id + XChaCha20-Poly1305 (`crypto_aead_xchacha20poly1305_ietf_*`) from a user passphrase. |
- Secrets storage on the user's machine (we rely on OS file mode 0600)
## Threat model
### Adversary profile
- **Network attacker** on the wire between CLI and broker. Controls
DNS, can inject packets, can replay. TLS terminates at Traefik;
assume TLS is trusted.
- **Malicious broker** operator. Can read any row in Postgres.
- **Mesh peer** with a valid member record. Can try to escalate
privileges, impersonate other members, replay, DoS, exfiltrate
other members' messages.
- **Laptop thief** who has the user's `~/.claudemesh/` directory but
not the login password. (Keys on disk at mode 0600.)
### Must hold
- E2E: broker cannot read plaintext of direct messages.
- Signature: no member can forge messages signed as another member.
- Invite integrity: modifying an invite URL invalidates the signature.
- Backup secrecy: an attacker with the backup file but not the
passphrase learns nothing.
- Audit integrity: tampering with an audit row breaks chain
verification.
### Known weaknesses (deliberate)
- **root_key in v1 invite URL**: current long URL form carries the
mesh root key in base64(JSON). Short-URL mode (`/i/<code>`) resolves
to the same token server-side, so this does NOT reduce the exposure.
v2 protocol moves root_key out of the URL but CLI migration is not
yet shipped.
- **Session-key routing identity**: a peer can claim arbitrary
`sessionPubkey` in hello (validated as 64-hex in alpha.36 but not
proven-own). Proof-of-secret-key for session key is not enforced.
Impact: a peer can route messages as any session pubkey it chooses
but cannot decrypt replies without the matching secret, so the
impact is DoS/confusion, not impersonation.
- **mesh.owner_secret_key stored plaintext** in the DB. A malicious
broker can issue arbitrary invites. Mitigated only by DB access
control.
## Review checklist for the reviewer
1. **libsodium usage**
- Are nonces generated with `randombytes_buf` and never reused?
- `crypto_box_easy` / `crypto_box_open_easy` order and parameters correct?
- Are ed25519 keys converted to curve25519 on BOTH sides consistently?
- Is `crypto_sign_detached` / `crypto_sign_verify_detached` used with the right message bytes?
2. **Invite protocol**
- Canonical bytes v1 + v2 format strings stable across CLI and broker?
- Replay protection: is a v1 URL reusable? (short URL + usedCount)
- Is the `maxUses` counter race-safe? (atomic UPDATE with `lt`)
- v2 root_key sealing: does `crypto_box_seal` fit the trust model?
- Is recipient_x25519_pubkey validated on both shape and length?
3. **Audit chain**
- Is the canonical JSON serialization reviewable and stable?
- Does `pg_advisory_xact_lock` actually serialize writes on the same mesh under HA?
- Can a malicious broker rewrite history by dropping the `lastHash` cache + DROPping rows + replaying with a new chain? (Yes — documented. Mitigation is append-only at the DB level.)
4. **At-rest encryption (broker-crypto.ts)**
- AES-256-GCM with 12-byte IV + 16-byte tag — correct, but is the IV generation guaranteed random and unique per encryption?
- Any concern about auth tag truncation or nonce collision under high volume?
5. **Backup (cli/commands/backup.ts)**
- Argon2id params reasonable? (INTERACTIVE — should possibly be SENSITIVE.)
- XChaCha20-Poly1305 parameter order?
- Does the passphrase-minimum (12 chars) match the Argon2id parameters?
- Is the salt stored alongside the ciphertext and read back correctly?
6. **Session vs member key**
- When is which key used? Is there any path where one is trusted for the other's purpose?
7. **Hello signature**
- Timestamp skew window (`±60s`) — does the broker reject out-of-window replays?
- Is the canonical hello string covered by the signature exactly?
8. **Grants**
- Can a peer bypass server-side grant enforcement by lying about their
own sender key in hello? (Signature pins memberPubkey to a real
2. Dogfood: new invites generated by `claudemesh share` use `/api/public/invite-code/:code` with v2-shape response that omits token; CLI resolves via claim.
3. Verify with `claudemesh verify` safety numbers cross-check.
4. After 2 weeks uneventful, flip default to v2.
5. After 60 days, stop embedding root_key in long URLs entirely.
6. v3 (future): short URL becomes the only form.
## Effort
~1 day of focused crypto + testing. Broker work is done; API work is
done; CLI work is a new parse path + a new enroll path + a few tests.
Two issues with the current `claudemesh mcp` server:
1. **80+ tools registered.** Every Claude session that has claudemesh installed pays the deferred-tool-list cost (~80 entries surfacing in `ToolSearch`). Most of those tools are CLI-verb-wrappers that already have a perfect Bash equivalent, so exposing them as MCP tools gains no structured I/O.
2. **Single-mesh push only.** A session launched with `claudemesh launch` opens its WS to one mesh. Peer messages from any other joined mesh arrive only if the user manually runs `claudemesh inbox`. The MCP push pipeline doesn't fan out across meshes.
The cleanest framing: **MCP earns its keep when a tool returns structured data Claude reads. CLI is better for fire-and-forget verbs.** Today's tool surface ignores that distinction.
## Non-goals
- **Don't redesign the architecture as "CLI-only with a daemon."** That trades warm-WS sends (~5ms in-process) for cold Bash spawns (~300-500ms) and forces a Unix-socket bridge to recover state coherence. See discussion 2026-05-01 — the platform vision (vectors, graph, files, mesh-services) genuinely benefits from typed tool I/O.
- **Don't break MCP backward compat in 1.x.** Existing scripts calling `mcp__claudemesh__send_message` keep working until 2.0; in 1.1 they're soft-deprecated with a stderr warning.
## Proposal
Three patches, ship together as 1.1.0:
### Patch 1: `--mesh <slug>` flag on `claudemesh mcp`
Today `claudemesh mcp` calls `readConfig()` and `startClients(config)` — connects to every mesh in `~/.claudemesh/config.json`. The `claudemesh launch` flow writes a per-session tmpdir config with one mesh, so practically the MCP server binds to one mesh per session.
Add an explicit flag for non-launch contexts (manual `~/.claude.json` editing):
Each instance opens one WS, holds it for the session, decrypts and forwards `claude/channel` notifications independently. Channel events already carry `[meshSlug]` in `formatPush()` (server.ts:240), so Claude knows which mesh a message came from.
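For manual `~/.claude.json` editing, the one-instance-per-mesh layout could be generated rather than typed by hand. This is a sketch under assumptions: the `claudemesh-<slug>` entry key and the helper `buildMcpEntries` are hypothetical; only the `command`/`args` shape and the `--mesh <slug>` flag come from the text.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
}

// Builds one MCP server entry per mesh slug.
// Assumption: entries are keyed as "claudemesh-<slug>"; the spec does not fix a naming scheme.
function buildMcpEntries(meshSlugs: string[]): Record<string, McpServerEntry> {
  const entries: Record<string, McpServerEntry> = {};
  for (const slug of meshSlugs) {
    entries[`claudemesh-${slug}`] = {
      command: "claudemesh",
      args: ["mcp", "--mesh", slug],
    };
  }
  return entries;
}
```

Calling `buildMcpEntries(["platform", "ops"])` yields two independent entries, each of which would hold its own WS for the session.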
1. **1.1.0**: ship all three patches. Existing users see deprecation warnings; nothing breaks.
2. **1.1.x**: collect feedback. If anyone has scripts hard-wired to the deprecated tools, surface in CHANGELOG.
3. **1.2.0** (~6 weeks later): flip deprecation warnings to "removal in 2.0" messaging.
4. **2.0.0**: delete the 25 tool registrations. ToolSearch surface drops to ~50 entries.
## Open questions
- **Do we need a Unix-socket bridge between CLI sends and the MCP push-pipe** so they share one WS connection per mesh per session? Probably yes for `claudemesh send` warm-path performance, but it's a separate spec — file under `socket-bridge` after this lands.
- **Should `claudemesh launch` keep writing one MCP server entry** (current behavior, default for new users) or switch to the per-mesh-N-entries pattern from Patch 1? Recommend keeping single-entry default — Patch 1 is for advanced users who manually edit `~/.claude.json`.
- **Do `mesh_mcp_*` tools really belong in the keep list?** They're MCP-on-mesh management — their bias is RPC-shaped, not stream-shaped. Provisional yes; revisit if 1.1 reduces their use.
## Effort
- Patch 1: ~10 LoC + 1 test. ~30 min.
- Patch 2: ~25 tool-handler refactors (registration removed, CLI verb confirmed/added). Some new verbs (`status set`, `summary`, `visible`, `group join/leave`, `forget`, `stats`, `clock`, `ping`, `task claim/complete`, `msg-status`) need wiring through to existing broker-client methods. ~150 LoC, half a day.
---
title: claudemesh North Star — CLI-first with claude/channel push-pipe
status: canonical
target: 2.0.0
author: Alejandro
date: 2026-05-02
supersedes: none
references:
- 2026-05-01-mcp-tool-surface-trim.md (first cut at the trim)
- SPEC.md
- docs/protocol.md
---
# claudemesh North Star
## The commitment, in one sentence
> **CLI is the canonical surface for every claudemesh operation. MCP exists for one thing: to deliver `claude/channel` push notifications mid-turn. That's the killer feature, and it's the only reason an MCP server runs at all.**
Everything else — sending messages, listing peers, sharing files, deploying mesh-MCPs, running graph queries, scheduling jobs, publishing skills — is invoked from the CLI, by humans, scripts, cron, hooks, or by Claude itself via Bash.
## Why this shape
1. **Mid-turn interrupt is the differentiator.** When peer A sends to peer B, B's Claude session pauses what it's doing and reads the message immediately. That requires `claude/channel` notifications routed through an MCP transport, because Claude Code only watches MCP server connections for those events. **Lose that, and claudemesh becomes another inbox-polling pattern.** Every other primitive can degrade to "delivered at next tool boundary"; this one cannot.
2. **CLI is universal.** Bash works in scripts, hooks, cron, CI, terminals, automation, and Claude itself (via Bash tool calls). A primitive that exists as both an MCP tool and a CLI verb is double-maintenance with one calling convention nobody actually wants.
3. **JSON-on-stdout is enough structure.** Claude reads `claudemesh peers --json` exactly as well as it reads a typed MCP tool return. The CLI man page is the schema. The "MCP gives structured I/O" advantage was real when we were paying for nothing else, but warm-WS via socket bridge (below) closes the cost gap.
4. **Surface shrinks where it matters.** ToolSearch deferred-tool list drops from ~80 entries to ~0 entries (push-pipe registers no tools). Massive context-budget win for every Claude session.
## Prior art (this is not novel architecture)
The "live-state daemon + thin scriptable CLI talking via Unix socket" pattern is the canonical shape for CLIs in this category. Reviewers should not treat this as bespoke design:
- **Docker** — `dockerd` daemon, CLI talks via `/var/run/docker.sock`. `DOCKER_HOST` env override. `docker context` for multi-daemon switching.
- **Tailscale** — `tailscaled` daemon, `tailscale` CLI via socket. Per-key ACL identity model. Same peer-mesh-with-keypairs shape as claudemesh.
- **Stripe `listen`** — long-running CLI daemon receives webhook push, forwards to local consumer. Same push-pipe-as-CLI-subcommand shape.
- **Obsidian CLI** — talks to a running Obsidian instance via REST. **Notable: ships a Claude skill (`~/.claude/skills/obsidian-cli/SKILL.md`) that documents every verb and flag for Claude consumption — replacing MCP tool introspection entirely.**
Claudemesh's CLI-first + push-pipe + socket-bridge architecture is exactly this pattern. We are following the well-trodden path, not inventing a new one.
## The seven architectural commitments
### 1. **MCP server is a push-pipe, full stop.**
The MCP entrypoint (`claudemesh mcp [--mesh <slug>]`) does exactly three things:
- Holds a WS connection to the broker for the meshes it's bound to.
- Decrypts inbound peer messages.
- Emits them as `claude/channel` notifications to the parent Claude Code session.
It registers **zero tools**. It advertises only `experimental: { "claude/channel": {} }`. Its `tools/list` returns an empty array. There is no surface to discover, search, or call.
One push-pipe per joined mesh, registered in `~/.claude.json` via `claudemesh install` (or auto-injected by `claudemesh launch`). The `--mesh` flag (shipped 1.0.3) makes this trivial.
### 2. **CLI is the canonical surface for every primitive.**
Every resource has uniform CLI verbs:
| Resource | Verbs |
|---|---|
| peer | `claudemesh peers [--json] [--mesh X]` |
| group | `claudemesh group join/leave @<n> [--role X]` |
**Every verb supports `--json`** for structured consumption. **Every verb supports `--mesh <slug>`** for targeting (default: pick first or interactive picker). Verbs share one broker-call implementation — no duplication between CLI and MCP.
### 3. **Warm path via Unix socket bridge** (load-bearing for 2.0).
A push-pipe holds a live WS connection. CLI invocations should reuse that connection rather than opening their own (which costs ~300-500ms cold-start).
- CLI verbs that need broker round-trip first try to dial that socket.
- If alive: forward request, get response back over socket (~5ms).
- If absent / stale: open ephemeral WS, do the op, close (~300ms — fine for cron/scripts where there's no parent push-pipe).
Push-pipe owns one WS, all ops through that WS, broker sees ONE session per mesh per host (no duplicate hellos). On crash, socket file is unlinked by `unlink` on exit handler; stale-socket detection by `connect()` ECONNREFUSED.
This is **mandatory for 2.0** — without it, every CLI op pays cold-start, and CLI-first becomes unusably slow for tight loops.
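The dial-then-fall-back logic can be sketched with injected transports so it stays independent of any socket library. Everything here is illustrative: `Transport`, `sendWithWarmPath`, and `shouldFallBackToCold` are hypothetical names, and treating `ENOENT` as the "socket file absent" signal is an assumption (the text only names `ECONNREFUSED` for stale sockets).

```typescript
type DialOutcome = "warm" | "cold";

interface Transport {
  send(request: string): Promise<string>;
}

// Error codes that mean "no live push-pipe behind this socket path".
// ECONNREFUSED comes from the spec; ENOENT (no socket file) is an assumption.
function shouldFallBackToCold(code: string | undefined): boolean {
  return code === "ECONNREFUSED" || code === "ENOENT";
}

// Tries the push-pipe's Unix socket first, then an ephemeral WS.
// dialSocket and openEphemeralWs are injected; their implementations are out of scope here.
async function sendWithWarmPath(
  dialSocket: () => Promise<Transport>,
  openEphemeralWs: () => Promise<Transport>,
  request: string,
): Promise<{ path: DialOutcome; response: string }> {
  try {
    const bridge = await dialSocket();
    return { path: "warm", response: await bridge.send(request) };
  } catch (err) {
    const code = (err as { code?: string }).code;
    // Unknown failures are surfaced instead of being hidden behind a cold retry.
    if (!shouldFallBackToCold(code)) throw err;
    const ws = await openEphemeralWs();
    return { path: "cold", response: await ws.send(request) };
  }
}
```

Keeping the fallback decision in one small predicate makes it straightforward to add further codes later, for example for the NFS or Windows edge cases mentioned in the effort estimate.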
### 4. **JSON output is the schema, with field selection and streaming.**
Every CLI verb has a deterministic `--json` output shape, documented in `docs/cli-schemas.md`, validated by zod parsers in tests. Claude reads `claudemesh vector search "x" --json` and gets a typed-array shape it can reason over identically to a tool return.
**Three output modes, mandatory across every read-shaped verb** (modeled on `gh` and `gemini`):
- `--output-format stream-json`: incremental JSONL for long-running ops (mesh-MCP calls fanning across peers, `vector search` against large indexes, `schedule list` with many entries). One object per line, Claude consumes incrementally.
Plus convenience output:
- `--jq <expr>`: native jq filter pipeline
- `--template '{{.field}}'`: Go template formatting
`schema_version: "1.0"` field on every JSON output — mandatory. Bumps when shape changes. Old code paths can pin with `--schema-version=1.0`.
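Field selection plus the mandatory `schema_version` could look roughly like the following. The envelope shape (`{ schema_version, data }`) is an assumption made for this sketch; the text only requires that the field appear on every JSON output. `projectFields` is a hypothetical helper name.

```typescript
type Row = Record<string, unknown>;

// Assumed envelope; the spec does not say where schema_version sits.
interface JsonEnvelope {
  schema_version: string;
  data: Row[];
}

// Keeps only the requested fields, in the order they were requested.
// With no field list (plain --json), rows pass through untouched.
function projectFields(rows: Row[], fieldList: string | undefined): JsonEnvelope {
  const fields = fieldList
    ? fieldList.split(",").map((f) => f.trim()).filter((f) => f.length > 0)
    : null;
  const data = rows.map((row) => {
    if (!fields) return row;
    const out: Row = {};
    for (const f of fields) {
      if (f in row) out[f] = row[f];
    }
    return out;
  });
  return { schema_version: "1.0", data };
}
```

With this shape, a `name,status` selection on a peer list drops every other field from each row before serialization.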
### 5. **All features stay. Nothing is removed.**
This is **not a feature trim**. Every primitive in the current 80-tool surface gets a CLI verb. Vectors, graphs, mesh-MCP, files, vault, SQL — all of it. The user-facing pitch is unchanged: "claudemesh gives your Claude session a name, a network, shared memory, shared compute, shared skills, scheduled actions." The change is *how you call it*.
### 6. **The Claude skill IS the schema.** *(load-bearing for CLI-first to work)*
Stripping MCP tool introspection (`tools/list`) costs Claude its discoverability. The replacement: a packaged `claudemesh` skill at `~/.claude/skills/claudemesh/SKILL.md` written by `claudemesh install`, documenting every verb, flag, JSON shape, and gotcha. Claude reads it on demand via the Skill tool — **not on every session, not pre-loaded into deferred-tool-list**. This is exactly how `obsidian-cli` works today and it works perfectly.
The skill replaces three things at once:
- **Tool discovery** — Claude knows the verb-set after one Skill invocation. No `tools/list` needed.
- **Output schemas** — every JSON shape is documented in the skill, so Claude knows what to expect from `--json` without parsing TypeScript types at runtime.
- **Behavioral conventions** — the skill teaches "preview before delete," "confirm peer match before kick," "use `--mesh` for cross-mesh ops" — soft guardrails that complement the policy engine's hard rules.
Topic-shards for size: `claudemesh` (core), `claudemesh-platform` (vault/vectors/graph/sql/mesh-mcp), `claudemesh-schedule` (cron/webhooks/watches), `claudemesh-admin` (kick/ban/grants/install). Each shard is independently loadable.
**This is the answer to the "JSON-on-stdout is a worse schema" caveat.** It's not — when Claude has a documented skill to load, the CLI surface is *more* discoverable than 80 deferred MCP tools that bloat ToolSearch silently.
### 7. **Pluggable policy engine, not binary `--yes`.** *(answers the Bash-blast-radius caveat)*
Modeled on `gemini --policy / --admin-policy` and `codex --sandbox`. Replace the current binary `-y/--yes` with:
- **`--approval-mode plan|read-only|write|yolo`** — four levels (read-only blocks all writes, plan blocks all side effects, write prompts on dangerous verbs, yolo skips all confirmation).
Policy decisions log to a tamper-evident audit file. Org admin can ship `--admin-policy` that overrides user config. **This is the real answer to "Bash carries unrestricted blast-radius once allowed" — claudemesh's own policy engine kicks in before the broker call, regardless of what shell permissions are.**
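The four approval levels reduce to a small decision table. The sketch below is one reading of the parenthetical definitions above; `VerbTraits`, its three flags, and `decide` are hypothetical, and a real engine would also consult `--policy` and `--admin-policy` files.

```typescript
type ApprovalMode = "plan" | "read-only" | "write" | "yolo";
type Decision = "allow" | "prompt" | "deny";

// Hypothetical per-verb metadata.
interface VerbTraits {
  writes: boolean;      // mutates stored state (e.g. a file delete)
  sideEffects: boolean; // observable by others, e.g. sending a message
  dangerous: boolean;   // flagged as needing confirmation (e.g. kick, delete)
}

function decide(mode: ApprovalMode, verb: VerbTraits): Decision {
  switch (mode) {
    case "plan":
      // plan blocks all side effects, which includes writes.
      return verb.sideEffects || verb.writes ? "deny" : "allow";
    case "read-only":
      // read-only blocks writes only.
      return verb.writes ? "deny" : "allow";
    case "write":
      // write prompts on dangerous verbs and allows the rest.
      return verb.dangerous ? "prompt" : "allow";
    case "yolo":
      // yolo skips all confirmation.
      return "allow";
  }
}
```

Because the decision is made before any broker call, a `deny` can be reported on stderr without ever touching the network.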
## What this means for `claude/channel`
When peer A's CLI runs `claudemesh send peer-B "hello"`:
1. CLI dials `~/.claudemesh/sockets/<mesh>.sock` (warm path) or opens its own WS (cold).
2. Encrypts message with peer-B's pubkey via crypto_box.
4. Peer-B's push-pipe decrypts and emits a `claude/channel` notification.
5. Claude Code mid-turn-injects the message as a `<channel source="claudemesh" ...>` reminder.
6. Claude responds immediately per the system prompt convention.
Step 5 is the **only step that requires MCP**. Steps 1-4 are pure CLI + broker. The architecture is "CLI for everything, MCP for the one thing it's irreplaceable for."
## Migration path from 1.1.0
| Version | Ships | Behavior |
|---|---|---|
| **1.2.0** | Unix socket bridge. CLI verbs auto-detect push-pipe and use warm path. **Field-selectable JSON (`--json a,b,c`)** + `--jq` + `--template` adopted. | All existing MCP tools still work. Nothing breaks. |
| **1.2.1** | Ships `~/.claude/skills/claudemesh/SKILL.md` written by `claudemesh install`. Includes full verb reference + output schemas + gotchas. Topic-shards (`-platform`, `-schedule`, `-admin`). | Skill auto-installs on `claudemesh install`. |
| **1.3.0** | Schedule unification (`schedule msg/webhook/tool`). All remaining missing CLI verbs (file, vector, graph, mesh-mcp, vault, sql, stream, context, skill, watch). **`--output-format stream-json`** for long-running ops. | All existing MCP tools still work. New verbs additive. |
| **1.4.0** | Resource-model rename pass — every CLI verb is `<resource> <verb>`. Old verbs become aliases. | All existing MCP tools still work. Old CLI verbs aliased forever. |
| **1.5.0** | **Pluggable policy engine** (`--approval-mode`, `--policy`, `--admin-policy`). MCP `tools/list` shrinks to configurable allowlist (default: empty). `CLAUDEMESH_MCP_FAT=1` for users who need typed tool surface. | Default 1.5 install: MCP exposes zero tools. Push-pipe-only. Policy engine gates all writes. |
| **2.0.0** | MCP server hardcoded to push-pipe-only. Strip all tool registrations + handlers. | **Old MCP tool calls return tool-not-found.** Users must update scripts to CLI verbs. Old CLI verbs (1.4 aliases) still work. |
## What stays exactly the same
- Crypto: ed25519 sign + x25519 sealing + crypto_box for DMs. No change.
- Broker protocol: WS frame format, hello flow, audit log. No change.
- Membership / mesh-scope / capability grants. No change.
- Web app, dashboard, Telegram bridge, OAuth. No change.
- The platform vision (vault, vectors, graph, files, skills, mesh-MCPs, scheduled jobs). All shipped, all stay.
## What changes for users
- `~/.claude.json` simplifies: `"claudemesh": { "command": "claudemesh", "args": ["mcp"] }` becomes one entry per joined mesh after `claudemesh install`. Multi-mesh push works out of the box.
- ToolSearch loses ~80 deferred entries. Sessions are lighter.
- Scripts that called `mcp__claudemesh__*` get a deprecation warning in 1.x, break in 2.0 — replaced by `claudemesh <verb> --json` + `jq`.
- Claude Code system prompt for the MCP server gets shorter (no tool catalog), focused only on "RESPOND IMMEDIATELY to channel events."
## Open questions parked for future specs
- **Federation** — broker-to-broker encrypted relay so peers on different brokers can talk. Not in 2.0 scope.
- **Offline-with-TTL inbox** — persist `now` priority messages on broker if recipient is offline, with explicit TTL. Reasonable for 2.x.
- **Compute attribution** — when peer X invokes a mesh-MCP that peer Y deployed, who pays for broker compute / outbound calls? Pre-empts the eventual billing question. 2.x.
- **Universal hash-chained audit** — every state mutation per mesh is hash-chained, replayable, verifiable. Today only some events are; making it universal is its own spec.
- **ACP (Agent Communication Protocol) interop with Gemini CLI.** Gemini CLI exposes `--acp` for agent-to-agent comms — the same problem domain claudemesh occupies. Research question: is ACP a documented standard claudemesh can speak (making claudemesh peers and Gemini peers cross-talk in the same mesh), or is it Google-proprietary? If standard, implementing it is a major platform expansion. File as separate research spec before 2.x.
## What this spec is NOT
- Not a redesign of the broker. The broker stays as-is.
- Not a redesign of crypto. Crypto stays as-is.
- Not a feature deprecation. Every feature stays.
- Not optional. This is the canonical 2.0 architecture; intermediate versions migrate toward it.
## Effort estimate to 2.0
Sequential, single dev (revised after caveats survey — original estimate was rosy):
- **1.2.0** (socket bridge + field-JSON): 1-2 weeks. Socket bridge is real distributed-systems work (stale-cleanup, version negotiation, NFS/Windows edge cases) — not 2-3 days.
- **1.2.1** (claudemesh skill + topic shards): 2-3 days. Mostly content writing once schemas are documented.
- **1.3.0** (schedule unification + remaining verbs + stream-json): 1 week. Each of the ~10 missing verbs is small but adds up.
- **1.4.0** (resource-model rename + alias compat): 2-3 days.
- **1.5.0** (policy engine + MCP allowlist): 4-5 days. Policy engine is its own subsystem — parser, evaluator, audit log, admin override.
Total: **~5-6 weeks of focused work** spread over 3-4 months calendar. Each release is independently shippable; the policy engine specifically can land later than 1.5 if needed.
## Acceptance signals — how we know it worked
- **ToolSearch** in a freshly-installed claudemesh session shows zero `mcp__claudemesh__*` entries by default (vs ~80 today).
- **`claudemesh peers --json name,status`** projects exactly two fields, no extra noise.
- **`claudemesh send <peer> "hi"`** from a Bash call inside a Claude session round-trips in <50ms (warm path via socket bridge) on localhost-broker, <250ms on EU-from-US.
- **`Skill: claudemesh`** loaded once teaches Claude the entire mesh surface; subsequent CLI calls require no further introspection.
- **A policy file with `decision: deny` for `file delete`** blocks the call before it hits the broker, with a clear stderr explanation.
- **`claudemesh status set working` from cron** opens its own WS (no daemon), succeeds in <1s, no orphan connections on broker.
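The field-projection signal above can be checked against a small reference behaviour. A minimal sketch, assuming `--json name,status` takes a comma-separated field list and drops every other key; `projectFields` and `PeerRecord` are hypothetical names, not the CLI's actual internals.

```typescript
// Hypothetical helper: project each record down to a comma-separated field
// list, mirroring how `--json name,status` is described as behaving.
type PeerRecord = Record<string, unknown>;

function projectFields(rows: PeerRecord[], fieldSpec: string): PeerRecord[] {
  const fields = fieldSpec
    .split(",")
    .map((f) => f.trim())
    .filter((f) => f.length > 0);

  return rows.map((row) => {
    const out: PeerRecord = {};
    for (const field of fields) {
      // Copy only requested fields that exist, so the output never gains keys.
      if (Object.prototype.hasOwnProperty.call(row, field)) {
        out[field] = row[field];
      }
    }
    return out;
  });
}
```

A row such as `{ name, status, pubkey }` projected with `"name,status"` keeps only the first two keys, which is the "no extra noise" property the acceptance signal asks for.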
State of the world after a long session that shipped 1.5.0 and the v0.2.0 backend. Read this before the next session — it captures what's done, what's deployed where, what's not, and the architectural decisions worth knowing.
---
## Where things stand
### Released to npm
- **`claudemesh-cli@1.5.0`** (latest tag, published earlier today). CLI-first architecture lock-in: zero-tool MCP, policy engine, bundled `claudemesh` skill. Verified install + smoke-tested via clean `npm i -g`.
### In `main` but NOT released yet
Everything below is committed, deployed to the broker (`wss://ic.claudemesh.com/ws`) and the web app (Vercel `claudemesh.com`), but **`claudemesh-cli@1.5.0` on npm doesn't have any of it**. Users won't see it until v1.6.0 publishes.
| Feature | Code path | Verified live? |
|---|---|---|
| Topics (schema, broker routing, CLI verbs, skill) | `packages/db/src/schema/mesh.ts`, `apps/broker/src/broker.ts`, `apps/cli/src/commands/topic.ts` | ✅ created `#deploys-test`, sent + persisted |
| REST `/api/v1/*` (messages, topics, peers, history) | `packages/api/src/modules/mesh/v1-router.ts` + `api-key-auth.ts` | ✅ posted via curl, history round-trips |
| Bridge peer (SDK + CLI) | `packages/sdk/src/bridge.ts`, `apps/cli/src/commands/bridge.ts` | ⚠️ code only — never run end-to-end |
### Architectural commitments locked this session
- **CLI-first, MCP push-pipe** (1.5.0): MCP `tools/list = []`. Inbound peer messages still arrive as `experimental.claude/channel` notifications. The bundled skill is the sole CLI-discoverability surface for Claude.
- **Humans use REST + apikey, not browser WS** (v0.2.0): the broker already plumbs `peer_type: "human"`. The real blocker was browser-side ed25519, which we sidestep by exposing REST. Web chat UI = thin client over `/v1/*` using dashboard session auth.
**Why second**: every schema change today requires manual `psql -f migration.sql` against prod. The drizzle `_journal.json` stops at idx 11, and the runtime migrator silently skips anything not in the journal. Today's `0022_topics.sql` and `0023_api_keys.sql` were applied by hand. **Future migrations will keep needing this until fixed.**
Recommended approach:
1. Replace `drizzle-orm/postgres-js/migrator` in `apps/broker/src/migrate.ts` with a custom runner.
2. Scan `migrations/*.sql` lexicographically (already named `NNNN_*.sql`).
3. Track applied filenames in a new `mesh.__cmh_migrations` table (filename + sha256 + applied_at).
4. On startup: filter unapplied files, run them in transaction order under `pg_try_advisory_lock`. Fail loud on hash mismatch (catches edits after deploy).
5. Backfill the table with all 0000-0023 entries one-time so prod is consistent.
6. Drop the drizzle journal usage entirely (`migrations/meta/_journal.json` becomes dead state).
This unblocks every future feature touching DB.
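The planning half of the recommended runner (steps 2 to 4) can be sketched as a pure function. This is an illustration under assumptions, not the runner itself: `planMigrations`, `MigrationFile`, and `AppliedRow` are hypothetical names, and the database work (reading `mesh.__cmh_migrations`, taking the advisory lock, executing SQL in a transaction) is deliberately left out.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shapes: one entry per file on disk, one per tracking-table row.
interface MigrationFile {
  filename: string;
  sql: string;
}
interface AppliedRow {
  filename: string;
  sha256: string;
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Returns the files still to run, in lexicographic filename order.
// Throws if an already-applied file was edited after it was recorded.
function planMigrations(files: MigrationFile[], applied: AppliedRow[]): MigrationFile[] {
  const recorded = new Map<string, string>();
  for (const row of applied) recorded.set(row.filename, row.sha256);

  const sorted = [...files].sort((a, b) =>
    a.filename < b.filename ? -1 : a.filename > b.filename ? 1 : 0,
  );

  const pending: MigrationFile[] = [];
  for (const file of sorted) {
    const knownHash = recorded.get(file.filename);
    if (knownHash === undefined) {
      pending.push(file);
    } else if (knownHash !== sha256Hex(file.sql)) {
      throw new Error(`hash mismatch for applied migration ${file.filename}`);
    }
  }
  return pending;
}
```

Keeping the plan separate from execution means the hash-mismatch rule can be unit-tested without a Postgres instance.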
### Session C — Web chat UI (~2-3 days, highest visibility)
**Why third**: the demo. Backend is ready; this is pure React + REST.
- Message stream — `GET /api/v1/topics/:name/messages?limit=50`. Poll every 5s for new (no WS yet — REST polling is fine for v0.2.0).
- Compose box — `POST /api/v1/messages` with `{topic, ciphertext, nonce}`.
- Members sidebar — `GET /api/v1/peers`.
- Apikey lifecycle: on first load, server-side issue an apikey for the dashboard user (using their existing NextAuth session) scoped to `read,send` on this topic. Stash in browser session storage.
Server-side helper for apikey issuance lives in `packages/api/src/modules/mesh/api-key-auth.ts` — refactor `verifyBearer` to also expose a `createApiKeyForUser(userId, meshId, scope)` helper for the dashboard handler.
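Because the message stream polls every 5s rather than subscribing, consecutive responses will overlap. A minimal sketch of a client-side merge, assuming each message carries a unique `id` and an ISO-8601 `createdAt` (both field names are assumptions; the REST response shape is not specified here):

```typescript
// Assumed message shape; only `ciphertext` is named in the compose payload above.
interface TopicMessage {
  id: string;
  createdAt: string; // ISO-8601, so lexicographic order equals time order
  ciphertext: string;
}

// Merge a fresh poll result into the messages already on screen.
// Duplicates (same id) collapse to one entry; output is oldest-first.
function mergePolled(existing: TopicMessage[], polled: TopicMessage[]): TopicMessage[] {
  const byId = new Map<string, TopicMessage>();
  for (const m of existing) byId.set(m.id, m);
  for (const m of polled) byId.set(m.id, m);

  return Array.from(byId.values()).sort((a, b) => {
    if (a.createdAt !== b.createdAt) return a.createdAt < b.createdAt ? -1 : 1;
    return a.id < b.id ? -1 : a.id > b.id ? 1 : 0; // stable tie-break
  });
}
```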
---
## Three less-urgent followups (don't block sessions A-C)
1. **Bridge end-to-end smoke test**: never actually run between two meshes. Needs second test mesh + bridge member onboarding ritual. Worth doing before any blog post / external demo.
2. **`/v1/peers` includes only WS-connected agents**, not humans (since humans are REST-only and never appear in `presence`). Decide: synthetic presence rows for active apikey sessions? Or document that `/v1/peers` is "agents online"?
3. **Topic ciphertext is plaintext base64** in the current implementation — no actual encryption. The schema names it `ciphertext` for forward-compat, but the code base64-encodes UTF-8. Real per-topic symmetric key derivation (HKDF from mesh root_key + topic_id) is a v0.3.0 item.
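For the third followup, the key-derivation step could look roughly like the sketch below. It uses Node's built-in `hkdfSync`; the SHA-256 digest, the empty salt, the 32-byte output, and the `claudemesh/topic/` info label are all assumptions chosen for illustration, since the note above only says "HKDF from mesh root_key + topic_id".

```typescript
import { hkdfSync } from "node:crypto";

// Derive a 32-byte symmetric key for one topic from the mesh root key.
// Assumptions (not from the spec): SHA-256, empty salt, and this info label.
function deriveTopicKey(rootKey: Uint8Array, topicId: string): Uint8Array {
  const salt = new Uint8Array(0);
  const info = Buffer.from(`claudemesh/topic/${topicId}`, "utf8");
  const derived = hkdfSync("sha256", rootKey, salt, info, 32);
  return new Uint8Array(derived);
}
```

The derivation is deterministic, so any holder of the root key can recompute a topic's key from its id without a key-exchange step, and two topics never share a key.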
---
## Production state worth knowing
- **Broker**: `wss://ic.claudemesh.com/ws`, deployed via Coolify on OVHcloud VPS. Auto-redeploys on push to `gitea-vps main`. Deploy ETA ~3 min.
- **Web**: `claudemesh.com`, Vercel auto-deploy on push to `github main`. Deploy ETA ~2 min.
- **Postgres**: container `eo1f5gydsgrg19b57e9s4zw7` on the VPS. SSH via `ssh ovh`, then `docker exec eo1f5gydsgrg19b57e9s4zw7 psql -U claudemesh -d claudemesh`.
- **Test mesh**: `openclaw` on the same broker has 5 active peers and one topic (`#deploys-test`).
- **Active apikey** (from earlier today's smoke): `cm_OC12dRti…` was revoked. None active right now.
---
## Files most worth reading first in next session
1. `.artifacts/specs/2026-05-02-architecture-north-star.md` — the 7 architectural commitments.
2. `.artifacts/specs/2026-05-02-v0.2.0-scope.md` — design sketches for topics, REST, bridge.
3. `apps/cli/skills/claudemesh/SKILL.md` — the canonical CLI surface; ships in npm tarball.
4. This file.
---
## Memory not yet captured
Worth adding to `~/.claude/projects/-Users-agutierrez-Desktop-claudemesh/memory/MEMORY.md` next session:
- **Drizzle journal drift is a recurring trap** — manual psql until session B lands. Save the exact apply ritual: `scp migrations/NNNN.sql ovh:/tmp/ && ssh ovh "docker cp /tmp/NNNN.sql <pg-container>:/tmp/ && docker exec <pg-container> psql -U claudemesh -d claudemesh -f /tmp/NNNN.sql"`.
- **`workspace:*` deps break `npm publish`** — keep SDK as devDependency in `apps/cli/package.json`; Bun bundles it into dist so runtime doesn't need it. Same trick for any other workspace-only build deps.
- **Commitlint hard-caps body lines at 100 chars** — use `git commit -F /tmp/cm-commit.txt` rather than `-m` heredocs. Heredocs that exceed the limit fail the husky hook silently.
Strategic counterpart to `docs/roadmap.md` (which is the public, marketing-tone roadmap). This file captures the *why*, the dependencies, the costs, and the things we deliberately won't do.
Anchored in the v0.2.0 backend cut + `#general` auto-creation + filename-tracked migrator + owner-member backfill that all shipped 2026-05-02.
---
## Forcing function
> **Ship v1.6.x in 2 weeks. Ship v1.7.0 in a month. Make the demo. Then commit the daemon.**
Each release stands on its own — usable and shippable even if the next slips. That's the property to optimize for, not "fastest path to v3.0.0."
- Notification feed at `/dashboard` — "you have N unread in #deploys, 2 mentions in #incident." Purely aggregate; no new schema.
- One-line marketing site refresh — capture screenshots from the now-real-time UI, drop the v0.2.0 stamp from the chat footer, update README/landing.
- First public blog post + recorded demo — "claudemesh in 90 seconds" video. Triggers the first proper user-acquisition push.
**Not in scope:** any architectural change. v1.7.0 is pure UX polish on top of the v1.6.x foundation. Architecture work waits for v2.0.0.
**Why this comes before v2.0.0:** without users, the daemon is a solution for nobody. v1.7.0 produces the first real user signal so v2.0.0 has data to optimize against.
---
## v2.0.0 — 3-4 weeks, the daemon redesign
The single largest architectural shift on the roadmap. Background and rationale captured at length elsewhere this session; summary here.
### Single load-bearing principle
> **The user is the unit of mesh participation, not the Claude session.**
Every weird edge case from this session — the launch tax, the orphan owner, the per-session keypair churn, the MCP install/uninstall ritual, multi-Claude config corruption — comes from getting this one thing wrong today. Fix it once, structurally, and 70% of accumulated complexity vanishes.
- **`claudemesh-daemon`** — long-lived per-user process. One WS per workspace, kept alive across Claude session lifetimes. Listens on `~/.claudemesh/sockets/<workspace>.sock`. Started by `claudemesh login`, persists across reboots.
- **HKDF-derived peer keypairs from JWT** — same identity across machines, no key copy ritual. Web sign-up = CLI sign-up = same row in `mesh_member`.
- **Stateless CLI verbs** — each existing command (`send`, `peers`, `topic`, `apikey`, `bridge`, `state`, `remember`, etc.) retargeted to dial the daemon socket. ~3000 LoC of plumbing deleted, ~500 LoC of glue added.
- **50-line MCP server** — dial daemon, forward inbound peer messages as `experimental.claude/channel` notifications. The push-pipe shrinks from ~150 LoC to ~50.
- **`claudemesh launch` deprecated** — replaced by ambient mode: `claude` with no flags. Launch becomes a one-line alias that prints "ambient mode now, just run `claude`" and exits.
- **"Mesh" → "workspace"** in the public surface. DB tables keep `mesh_*` names for migration sanity.
### What v2.0.0 kills
- `claudemesh launch` command — the 8-thing bootstrap was paying for state the daemon now owns persistently.
- `--dangerously-skip-permissions` — set once at install in `settings.json` allowedTools, never seen by the user again.
- `--dangerously-load-development-channels` — written into `~/.claude.json` once at install, never seen again.
- ~80% reduction in support load for "launch flags," "config corruption," "peer keypair lost," "owner has no member row"
- ~0 cost to broker, web app, schema, protocol — none of the deep parts change
### Migration path (backwards-compatible at every step)
1. **Week 1** — daemon binary + unix socket protocol + retarget two CLI verbs (`send`, `peers`) as the smoke test. Ship to alpha testers.
2. **Week 2** — retarget remaining verbs. HKDF-keypair migration with a one-shot `claudemesh migrate-identity` command for existing users.
3. **Week 3** — `claudemesh launch` becomes a deprecated alias. MCP server retargeted to daemon socket. Backfill: every existing user's daemon spins up on first `claudemesh` invocation.
4. **Cut v2.0.0**: remove deprecated launch alias one minor release later (v2.1.0) once metrics show no one's hitting it.
---
## v0.3.0 — 4-6 weeks, the operator chapter
For teams that want to run their own broker, encrypt at the topic level, or wire claudemesh to messaging surfaces beyond Claude Code.
- **Per-topic HKDF encryption** — kills the "broker can read your messages" wart. Symmetric key derived from `mesh.root_key + topic.id`. Web client gets the topic key from the sealed root_key it already holds.
- **Self-hosted broker packaging** — single `docker-compose.yml`, postgres included. CLI accepts `--broker wss://...` to point anywhere. Federation primer.
- **WhatsApp gateway** — peer bot that forwards a topic to a WhatsApp group.
- **Telegram gateway** — same pattern.
- **Tag routing** — `claudemesh send tag:repo:billing "deployed"` lands at every peer working on that repo. Already protocol-supported, needs CLI ergonomics + dashboard surface.
v0.3.0 is when teams that want to run their own broker can do so without paying us. Counterintuitively important: it's also when we can charge for hosted with a clean conscience.
---
## v3.0.0 — Anthropic-blessed cut (conditional)
Conditional on Anthropic shipping first-class agent-to-agent channels in Claude Code. We don't control the timing.
### What's load-bearing about today's flag
`--dangerously-load-development-channels server:claudemesh` does two things:
1. Loads the claudemesh MCP server.
2. Tells Claude Code to treat its `experimental.claude/channel` notifications as runtime channel events.
The flag is named `dangerously-load-development-channels` *specifically because* the channel API is experimental and unstable. Some opt-in mechanism will always be required for Claude Code to receive external events from a third-party process — that's a security-model invariant, not a quirk of today's flag. What changes at v3.0.0 is the *form* of the opt-in, not its existence.
### Two scenarios depending on Anthropic's choice
**Scenario A — MCP-channel API graduates.** The same MCP-based push primitive becomes stable.
- The `--dangerously-load-development-channels` flag is replaced by a stable settings.json entry — e.g. `mcpServers.claudemesh.acceptChannelNotifications = true`.
- The `experimental.` prefix on the notification namespace goes away.
- Net user-visible change: nothing, because we already write the flag once at install and the user never sees it. The migration is internal: swap the install logic to write the new settings entry instead of the old flag.
**Scenario B — non-MCP transport ships.** Anthropic introduces a sidecar IPC, a native WebSocket subscription declared in settings, or some other primitive.
- The 50-line MCP wrapper from v2.0.0 disappears.
- The daemon plugs into the new transport directly.
- Some opt-in config is still required (settings.json entry, environment variable, etc.) — Claude Code must know to subscribe to the daemon's channel.
- Net user-visible change: still nothing if our `claudemesh install` adapts to write the new opt-in form.
### What disappears regardless
- The `experimental.` prefix on the channel API (it stabilizes).
- The `dangerously-` framing of the flag (the API is no longer experimental).
- The "you have to pass a launch flag to load development channels" mental model.
### What stays regardless
- An opt-in mechanism somewhere (security model invariant).
- The daemon as the lifecycle owner.
- The protocol, schema, broker, topics, web chat — all unchanged.
### Marketing pivot
claudemesh becomes a "hosted backend for Claude's native multi-agent feature" rather than a "Claude Code extension." The product story simplifies regardless of which shape ships, because the user no longer has to think about MCP servers, dangerous flags, or experimental APIs — claudemesh is just there.
Until v3.0.0 lands, v2.x ships with the MCP bridge under the existing flag. v3.0.0 is the migration target, not a planned feature.
---
## Cross-cutting tracks (always-on, not version-gated)
| Track | What it covers | Target version |
|---|---|---|
| Mobile | iOS peer app (thin: push + reply, same JWT identity) | v2.x |
| Browser peer (proper) | IndexedDB ed25519 + WebCrypto crypto_box for the dashboard. Today's web is REST-only; this makes it a true peer. | v2.x |
| Peer transcript queries | "Hey Claude2, what have you touched in the last hour?" cross-session memory primitive | v0.3.0+ |
| Custom bot framework / plugin marketplace | Premature — claudemesh barely has organic users. Build the user base first, then platform. |
| Voice channels | Out of scope. Different product. |
| Video chat | Same. |
| Email-as-peer (incoming SMTP → mesh) | Has demand from one user; ship if 3+ ask. |
| AI summarization of channels | LLM cost + scope creep. Users can wire their own with the existing message API. |
| Mobile push notifications via APNs/FCM | Wait for the iOS peer app, then revisit. |
| Reactions / threading | Not yet — would muddle the protocol surface for marginal value. Reconsider after v0.3.0 user feedback. |
---
## Single-sentence summary
**Polish v1.6.x → ship v1.7.0 demo → commit v2.0.0 daemon → open the operator chapter at v0.3.0 → plug into native channels at v3.0.0 when Anthropic ships them.** Each release stands on its own. The protocol, the schema, the broker, and the topics are all already correct — what changes is the lifecycle owner around them.
**Theme: from agent-only mesh to mesh of agents, humans, and external systems — with conversation context.**
| # | Feature | Effort | Spine |
|---|---------|--------|-------|
| 1 | **Topics** (channels/rooms within a mesh) | 2-3 d | yes |
| 2 | **Humans in the mesh** (web chat panel) | 2-3 d | depends on #1 |
| 3 | **REST API + external WS** (API keys per mesh) | 2-3 d | depends on #1 |
| 4 | **Bridge peer** (forwards one topic between meshes) | 1 d | depends on #1 |
Optional pickup if all four ship early:
- **Local peer aliases** (~0.5 d) — IRC-style local labels for hard-to-remember displayNames.
- **Semantic peer search** (~0.5 d) — already in vision doc; useful once topics exist.
Total: 7-9 days plus 1-2 days slack. Targeting **release window: 2026-05-12 to 2026-05-16**.
---
## Why this cut
The 1.5.0 architecture (CLI-first, tool-less MCP, policy engine) is finished. The next bottleneck is **product surface**, not engineering.
Current taxonomy `mesh + group + role` is the right *organizational* structure but missing a *conversational* primitive. Every message is DM or `@group` broadcast — there's no continuity for "the deploys conversation," no scoped state/memory/files, no way for a human to join a topic without joining the whole mesh, no way for a bridge to forward a single thread of work.
**Topics fix this.** They are the spine of v0.2.0:
- Without topics, "humans in mesh" floods every human with every peer's chatter.
- Without topics, "bridge" forwards everything (loop risk, signal-to-noise problem).
- Without topics, REST API endpoints have no natural sub-mesh scope.
Once topics exist, humans + REST + bridge each become 50% smaller because they slot into a clean primitive instead of inventing one.
---
## Deferred
| Item | Why later |
|---|---|
| **Federation** (broker-to-broker) | Bridges prototype it. Learn from real use first. |
```bash
claudemesh topic delete deploys # creator/admin only
claudemesh send "#deploys" "rolling out 1.5.1"
```
**MCP `claude/channel` notification gains `topic`** as an attribute so peers know which conversation an inbound message belongs to.
**Effort breakdown:** schema + drizzle migration + CLI verbs + broker routing changes (filter by topic membership) + skill update. ~250 LoC across CLI + ~200 LoC broker.
---
### 2. Humans in the mesh
**Mental model:** a human is a peer with `peer_type: "human"` whose presence is durable (no session pubkey rotation; identity tied to an account). They join *topics*, not the whole mesh — so they only see relevant traffic.
> **Implementation update (2026-05-02):** `peer_type: "ai" | "human" | "connector"` is already plumbed end-to-end in the broker (hello envelope, ConnectedPeer, list_peers). What was missing wasn't broker support — it's the **interface** for humans, who don't have browser-side ed25519 to do hello-sig. Realistic path: **REST API is the human interface** (rolled into #3 below). The web chat panel becomes a thin client that posts/reads via REST using the dashboard user's session auth — not its own keypair. This collapses #2 and #3 into a single deliverable: REST → UI on top.
**Wire:**
```jsonc
// hello envelope gains:
{
  "peer_type": "human",
  "session_pubkey": <ephemeral, per browser tab>,
  "member_pubkey": <durable, account-tied>,
  "display_name": "Alejandro"
}
```
**Web panel (`apps/web`):**
```
/dashboard/mesh/<slug>/topic/<topic-name>
├── topic header (members, settings)
├── message stream (WS-driven, infinite scroll on history)
├── compose box (typing indicator broadcast on focus)
└── members sidebar (presence, profile, last_read_at)
```
**Backend changes:**
- Persistent message history per topic (drizzle table `topic_messages`; existing direct messages stay ephemeral by design).
- Typing indicator: short-lived broadcast on the topic channel (`{type: "typing", peer: "..."}`).
**Privacy invariant:** a human in `#deploys` sees only `#deploys` traffic + DMs sent to them. Never the whole mesh. This is the *whole reason* topics come first.
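The invariant reduces to a single visibility predicate that any delivery path to a human has to pass. A minimal sketch with hypothetical types (`Envelope`, `HumanPeer`); the real envelope has more fields than shown here.

```typescript
// Hypothetical, trimmed-down envelope: either topic traffic or a DM.
type Envelope =
  | { kind: "topic"; topic: string; body: string }
  | { kind: "dm"; to: string; body: string };

interface HumanPeer {
  memberPubkey: string;
  joinedTopics: Set<string>;
}

// A human sees topic traffic only for topics they joined, plus DMs addressed
// to them. Everything else in the mesh is invisible.
function visibleTo(human: HumanPeer, msg: Envelope): boolean {
  if (msg.kind === "dm") return msg.to === human.memberPubkey;
  return human.joinedTopics.has(msg.topic);
}
```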
**Effort:** WS endpoint already exists (broker side). Add: topic_messages table, history endpoint, web UI components (compose, stream, members). ~3 days.
---
### 3. REST API + external WS
**Auth:** API keys per mesh, scoped by capability + topic.
```yaml
api_key:
  id: <ulid>
  mesh_slug: openclaw
  label: "ci-bot"
  hash: <argon2id>
  capabilities: ["send", "read"]
  topic_scopes: ["#deploys"]  # null = all topics; explicit = whitelist
GET /v1/topics/:name/messages History (with pagination cursor).
GET /v1/peers List online peers (filtered by key scope).
GET /v1/state Read mesh state.
POST /v1/state Write mesh state.
```
**External WS:** `wss://ic.claudemesh.com/ws?api_key=...&topic=deploys` — connects with `peer_type: "external"`. Push-pipe parity with internal sessions; can subscribe to topic streams.
**Why REST keys not session keypairs:** external clients (Zapier, GitHub Actions, mobile apps, Slack workspace bots) need long-lived bearer-like creds, not ephemeral keypairs. Different threat model — scope tightly via topic + capability.
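The capability + topic scoping described above amounts to a two-step check per request. A minimal sketch, assuming the key record mirrors the `capabilities` and `topic_scopes` fields shown earlier; `ApiKeyScope` and `isAllowed` are hypothetical names, and hash verification of the bearer token is out of scope here.

```typescript
type Capability = "send" | "read";

// Mirrors the api_key fields above; camelCase names are an assumption.
interface ApiKeyScope {
  capabilities: Capability[];
  topicScopes: string[] | null; // null = all topics; array = explicit whitelist
}

function isAllowed(key: ApiKeyScope, capability: Capability, topic: string): boolean {
  // Step 1: the key must hold the capability at all.
  if (!key.capabilities.includes(capability)) return false;
  // Step 2: a null scope means every topic; otherwise the topic must be listed.
  if (key.topicScopes === null) return true;
  return key.topicScopes.includes(topic);
}
```

With the `ci-bot` key above, `isAllowed(key, "send", "#deploys")` passes while the same call for `#secrets` is rejected, which is the behaviour the third acceptance signal expects.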
**Mental model:** a bridge is a peer that holds memberships in two meshes and forwards traffic on a single topic between them. SDK-only (no broker changes).
**Loop prevention:** every forwarded message gets a `bridge_hop_<n>` tag; bridges drop messages that already carry their own tag (prevents echo) and any message with `max_hops` exceeded.
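A minimal sketch of that drop rule follows. It makes one loud assumption: the `<n>` in `bridge_hop_<n>` is read as an identifier of the forwarding bridge, so "their own tag" is `bridge_hop_<bridgeId>` and the hop count is the number of such tags. If `<n>` is meant to be a numeric counter instead, the check changes accordingly. `BridgedMessage` and `forwardThroughBridge` are hypothetical names.

```typescript
interface BridgedMessage {
  body: string;
  tags: string[];
}

const HOP_PREFIX = "bridge_hop_";

// Returns the message to forward (with this bridge's hop tag appended),
// or null when the message must be dropped.
function forwardThroughBridge(
  msg: BridgedMessage,
  bridgeId: string,
  maxHops: number,
): BridgedMessage | null {
  const hopTags = msg.tags.filter((t) => t.startsWith(HOP_PREFIX));
  const ownTag = `${HOP_PREFIX}${bridgeId}`;

  if (hopTags.includes(ownTag)) return null; // echo: we already forwarded this
  if (hopTags.length >= maxHops) return null; // hop budget exhausted

  return { ...msg, tags: [...msg.tags, ownTag] };
}
```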
**CLI:** `claudemesh bridge run <config.yaml>` — runs an SDK bridge as a long-lived process. Useful for "run a bridge inside a docker container or systemd unit."
**What it deliberately doesn't do:**
- Cross-broker federation (that's a separate broker-to-broker protocol).
- Bidirectional state/memory sync (only messages on a single topic).
- Identity unification (a peer in mesh A is *not* the same peer in mesh B; the bridge appears as the messenger).
**Effort:** ~1 day on top of the existing SDK.
---
## Acceptance signals
v0.2.0 ships when all four are demonstrable end-to-end:
1. A peer creates `#deploys`, two other peers join it, traffic is topic-scoped, mesh-wide chat doesn't see it.
2. A human signs in at `claudemesh.com`, joins `#deploys`, sends a message, a Claude session in the mesh receives it as a `<channel>` interrupt with `topic="deploys"`.
3. A `curl` POST against `/v1/messages` with an API key delivers a message into `#deploys`; the same API key is rejected on `#secrets`.
4. A bridge peer running locally forwards `#incidents` between two test meshes; loop is prevented; one-shot demo recorded.
---
## Out of scope (explicitly)
- Topic hierarchy / nesting (flat namespace per mesh; revisit at scale).
- Topic-scoped capability grants (`grant <peer> read:#topic`) — solvable later via capability extension.
- Threads-within-topics (Slack-style). Defer.
- Voice / video / file-upload UX for humans — text only in v0.2.0.
- **Topics retrofit risk** — existing 1.5.0 message envelope assumes "to" is peer/group/star. Adding `topic` is additive on the wire but changes routing logic. Test path: backfill existing meshes with a default `#general` topic; opt-in to topic-only routing.
- **Web chat session lifecycle** — humans expect "I closed the tab and came back, my place is preserved." Ephemeral session pubkeys break that. Workaround: tie human peer identity to `member_pubkey` + last_read_at on the topic; session pubkey rotates per tab but membership is durable.
- **API key abuse** — leaked keys = anyone can post. Mitigations: capability + topic scoping; rate limits per key; `last_used_at` + audit trail; revoke verb is fast.
---
## Open questions
1. Do existing `@group` semantics survive intact, or do we collapse `@group` and `#topic` into one primitive? (Answer favored: keep both — different axes.)
2. Should topics persist messages by default, or be opt-in? (Default: yes for `peer_type: "human"`-touched topics; configurable per topic for agent-only ones.)
3. Where does mesh-MCP discovery live in the topic model — per topic or per mesh? (Likely per mesh; mesh-MCP is infrastructure, not conversation.)
# claudemesh — agentic peer communication, full end-state
## What this document is
The end-state architecture for claudemesh as a transport-agnostic agentic peer-comms platform. Not a release plan, not a sprint roadmap — the **shape** the system needs to converge on. Implementation order at the end is a *suggestion*, not a contract; time estimates are deliberately omitted because the surface is too cross-cutting to phase by weeks.
v1 of this spec (same date, no `-v2` suffix) treated the broker as the sole data plane. v2 corrects that: **the broker is a coordination plane (signaling, discovery, offline queue, fan-out, registry, revocation); the data plane is hybrid P2P** with broker fallback for the cases P2P can't cover. Closer to how Tailscale, libp2p, LiveKit, and modern WebRTC stacks work in production.
## TL;DR
- **Identity** — three keypair types (member, session, service) all rooted in a member's secret key. Member is durable, session is per-launch, service is a member-scoped delegate for non-Claude integrations. Every service has its own pubkey and explicit revocation.
- **P2P first** (WebRTC data channels, future: QUIC) when both peers online + NAT-traversable.
- **Broker-relayed** when peers are NAT-blocked, when one peer is offline, or for group/topic/broadcast where fan-out at the broker is structurally cheaper than N-way sender-side fan-out.
- **Pure broker** for service identities that can't run a P2P stack (HTTP webhook senders, OpenAI Assistants, browser SDKs without WebRTC).
- **Channels** — typed envelope (dm, group, topic, rpc, system, stream). Channel type drives crypto, routing, and transport selection. `meta` is required in v2 envelope.
- **Transports** — pluggable adapters under one interface: WS-to-broker (today), WebRTC P2P, HTTP webhook, future LiveKit/QUIC/etc. Broker negotiates which adapter a peer pair uses.
- **Crypto** — every direct message is E2E encrypted to recipient's pubkey regardless of transport. Broker never sees plaintext. P2P doesn't get any extra trust just because it's direct.
- **Delivery** — at-least-once **requires receiver ack** before broker marks `delivered_at`. The retry path before that is best-effort with idempotent dedupe at the receiver.
The CLI-first commitment from the North Star spec stays intact. Every channel type and every transport is invocable from `claudemesh <verb>`. MCP serves only `claude/channel` mid-turn push.
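The three data-plane rules in the TL;DR can be read as a small decision function. The sketch below is one possible reading, not the negotiated algorithm: `PairState` and `selectDataPlane` are hypothetical, and treating only `group` and `topic` as broker fan-out channels is an assumption.

```typescript
type ChannelType = "dm" | "group" | "topic" | "rpc" | "system" | "stream";
type DataPlane = "p2p" | "broker-relayed" | "pure-broker";

// Hypothetical summary of what the broker knows about a sender/recipient pair.
interface PairState {
  bothSupportP2P: boolean; // false for webhook senders, hosted assistants, etc.
  recipientOnline: boolean;
  natTraversable: boolean;
}

function selectDataPlane(channel: ChannelType, pair: PairState): DataPlane {
  // Identities that cannot run a P2P stack always go through the broker.
  if (!pair.bothSupportP2P) return "pure-broker";
  // Fan-out channels are cheaper to deliver from the broker than N-way P2P.
  if (channel === "group" || channel === "topic") return "broker-relayed";
  // Offline recipients need the queue; NAT-blocked pairs need the relay.
  if (!pair.recipientOnline || !pair.natTraversable) return "broker-relayed";
  return "p2p";
}
```

Whichever plane is chosen, the payload stays E2E encrypted to the recipient's pubkey, so the function only affects routing and cost, not trust.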
---
## The forcing functions (why this shape, not a smaller one)
1. **Multi-session interconnect already broke** (1.30.0 → 1.32.1) because the per-session WS subsystem shipped without a push handler. Symptom of "broker is the data plane and we keep bolting on" thinking. Need to formalize roles and transport adapters before the next bolt-on.
2. **Codex review surfaced a correctness bug** in `drainForMember` — claims `delivered_at = NOW()` *before* WS push succeeds; if `ws.readyState !== OPEN` the row is marked delivered and the message is lost. At-most-once with no retry. Inherited by every channel/transport added unless fixed at the foundation.
3. **The agentic-comms domain has standardized on hybrid P2P + central coordinator.** Tailscale (control plane + WireGuard P2P), LiveKit (signaling + SFU + P2P data channels), libp2p (DHT discovery + multi-transport), Iroh (gossip + QUIC P2P). Pure-broker is a 2010s pattern; pure-P2P is academic. Hybrid is the norm.
4. **claudemesh's pricing/economics demand P2P.** Every byte through the broker is your cost. Voice transcripts, file transfers, real-time tool I/O — bandwidth-heavy. P2P data plane lets the broker scale linearly with peer count, not message volume.
5. **Privacy/sovereignty matters as the agent ecosystem grows.** "Your agents talk to my agents" should default to peer-to-peer paths when possible. Broker as relay is fine; broker as forced middleman is not.
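The fix for the second forcing function is to mark a row delivered only when the receiver acks, and to let the receiver dedupe retries. The sketch below models that with an in-memory stand-in for the queue table; `AckingQueue` and `DedupingReceiver` are hypothetical names and none of this is the existing `drainForMember` code.

```typescript
// Sketch only: an in-memory stand-in for the broker's message queue rows.
interface QueuedMessage {
  id: string;
  ciphertext: string;
  deliveredAt: number | null;
}

class AckingQueue {
  private readonly rows = new Map<string, QueuedMessage>();

  enqueue(id: string, ciphertext: string): void {
    this.rows.set(id, { id, ciphertext, deliveredAt: null });
  }

  // Rows still awaiting an ack. Reading them does NOT mark them delivered,
  // so a failed push leaves the row here for the next retry.
  pending(): QueuedMessage[] {
    return Array.from(this.rows.values()).filter((r) => r.deliveredAt === null);
  }

  // Called only when the receiver's ack arrives.
  ack(id: string, nowMs: number): boolean {
    const row = this.rows.get(id);
    if (row === undefined || row.deliveredAt !== null) return false;
    row.deliveredAt = nowMs;
    return true;
  }
}

class DedupingReceiver {
  private readonly seen = new Set<string>();

  // Returns true for a first delivery, false for a retry duplicate.
  accept(msg: QueuedMessage): boolean {
    if (this.seen.has(msg.id)) return false;
    this.seen.add(msg.id);
    return true;
  }
}
```

At-least-once on the broker side plus idempotent accept on the receiver side gives effectively-once processing without requiring the push itself to be reliable.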
---
## Audience for this architecture
| Peer type | Identity | Online presence | Data plane preference | Notes |
|---|---|---|---|---|
| **Daemon, no launch** (idle Mac with daemon running) | Member pubkey | WS to broker | Broker only (no P2P partner unless launched) | Receives broadcasts + member-targeted DMs |
| **Voice agent** (LiveKit, Pipecat) | Service identity, member-signed | LiveKit room + bridge | LiveKit room data channels intra-room; bridge over broker for cross-mesh | Side-car bridges room ↔ broker |
| **OpenAI Assistant / Anthropic Skill** | Service identity, scoped token | HTTP outbound, webhook inbound | Broker only (can't run P2P) | Daemon does delegated re-encryption |
| **Browser-based peer** (web dashboard, SDK) | Member or service identity | WS to broker, WebRTC for P2P | P2P-where-possible (browsers ARE WebRTC-native) | Full feature parity once on-mesh |
| **Webhook consumer** (Stripe-style passive) | Service identity | HTTP webhook inbound only | Broker only | Topic subscriptions; no inbound channel |
| **Bridge** (Slack, WhatsApp, IRC, Matrix) | Service identity per bridge + per-end-user delegated | WS to broker | Broker only for bridge ↔ broker; native protocol for bridge ↔ external | Trust delegated to bridge operator |
| **Cron / scheduled actor** | Member pubkey or service identity | Ephemeral; HTTP send only | Broker only | No long-lived connection |
| **CLI-only user** (no Claude Code) | Member pubkey | Ephemeral on each `claudemesh send` | Broker only | Command-line agent, queues via outbox |
Every row in this table works without changing the broker's coordination plane.
---
## Layer 1: Identity
Three keypair types, one auth model.
### Member identity (durable)
- Ed25519 keypair, generated at `claudemesh join <invite>`. Held in `~/.claudemesh/config.json` per mesh.
- The auth boundary — grants, kicks, bans operate on members.
- Used for hello signature on the daemon's control-plane WS.
- Used as cryptographic root of trust for sibling sessions and service identities.
### Session identity (ephemeral, per-launch)
- Ed25519 keypair generated by each `claudemesh launch`. Held in process memory only.
- Parent-signed attestation vouches for it (TTL 12h, broker cap 24h). Rotation = new launch.
- Used for hello signature on the per-session WS, and as routing key for DMs targeted at *this specific launched session*.
- Session secret never touches disk; lives only in the daemon's `sessionBrokers` map keyed by IPC token.
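The TTL and broker cap on the attestation can be sketched as a freshness check. This is an assumption-laden reading: it treats "broker cap 24h" as the maximum lifetime the broker will accept regardless of what the parent signed, and it omits signature verification entirely. `SessionAttestation` and its field names are hypothetical.

```typescript
const HOUR_MS = 60 * 60 * 1000;
const BROKER_CAP_MS = 24 * HOUR_MS; // assumed meaning of "broker cap 24h"

// Hypothetical attestation shape; timestamps are Unix epoch milliseconds.
interface SessionAttestation {
  sessionPubkey: string;
  issuedAt: number;
  expiresAt: number;
}

// Freshness only. Verifying the parent member's signature is a separate step
// that is not shown here.
function attestationAcceptable(att: SessionAttestation, nowMs: number): boolean {
  const lifetime = att.expiresAt - att.issuedAt;
  if (lifetime <= 0 || lifetime > BROKER_CAP_MS) return false;
  return nowMs >= att.issuedAt && nowMs < att.expiresAt;
}
```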
### Service identity (third type, additive)
For non-Claude integrations that can't or shouldn't use a per-launch session.
```
ServiceIdentity {
service_id // Stable string id ("openai-assistant-foo", "livekit-room-bar")
transport_hint // "ws" | "http-webhook" | "sse" | "livekit" — informs how the broker reaches it
delegate_daemon_pubkey? // Optional. Set when the daemon holds the service's secret on its behalf.
}
```
Two flavors:
- **Holds-secret service** — has its own keypair (`service_pubkey` + service-secret kept by the service itself). Runs E2E crypto end-to-end. Voice agent side-cars, browser SDK, MQTT bridges.
- **Delegated service** — daemon holds the service-secret on the service's behalf. Senders still encrypt to `service_pubkey`; daemon decrypts on receipt and forwards plaintext (or re-signs) to the service via its `transport_hint`. Used by HTTP webhook consumers, OpenAI Assistants. Trust is in the daemon owner. `delegate_daemon_pubkey` records who's holding.
All three identity types resolve to a `member_id` for authorization. They differ in liveness (member = always; session = per-launch; service = scoped) and transport hint (member/session = WS-resident; service = polymorphic).
### Identity revocation (explicit)
Existing v1 left this implicit. v2 makes it concrete:
- **CLI verb:** `claudemesh service revoke <service_id>` (also `claudemesh peer revoke <pubkey>` for member revocation).
- **Broker effect:** add row to `revocation` table with `(mesh_id, revoked_pubkey, revoked_at, revoked_by, reason?)`. Drop any active WS for that pubkey (close 4002 "revoked"). Reject future helloes.
- **Drain effect:** `drainForMember` checks revocation list at drain time; ciphertext-in-flight from the revoked sender is dropped (sender already broker-acked, but recipient never sees it).
- **Gossip:** revocation events publish on the `system` channel (highest priority). Online peers cache; offline peers see on reconnect. Required so P2P sessions also honor revoke (otherwise a revoked peer's stored attestations could keep working over direct paths).
- **Latency target:** <30s for online peers to receive and apply.
- **Expiry vs revoke distinction:** `expires_at` is graceful (predictable, scheduled rotation); revoke is emergency (leaked secret, fired employee, compromised host). Both use the same revocation table; `expires_at` enforces silently when reached, revoke is logged as an audit event.
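A minimal sketch of how a peer might cache revocation events from the `system` channel and consult them before accepting traffic over a direct path. The field names mirror the `revocation` table columns listed above; `RevocationCache`, `apply`, and `isRejected` are hypothetical names for illustration, not existing claudemesh code, and signature verification of the event is omitted.

```typescript
// Hypothetical event shape; fields mirror the `revocation` table columns above.
interface RevocationEvent {
  meshId: string;
  revokedPubkey: string;
  revokedAt: number; // epoch ms
  revokedBy: string;
  reason?: string;
}

// Illustrative peer-side cache (assumption, not the actual implementation).
class RevocationCache {
  private revoked = new Map<string, RevocationEvent>();

  private key(meshId: string, pubkey: string): string {
    return `${meshId}:${pubkey}`;
  }

  // Apply an event received on the `system` channel.
  // Idempotent: re-applying keeps the earliest revocation timestamp.
  apply(event: RevocationEvent): void {
    const k = this.key(event.meshId, event.revokedPubkey);
    const existing = this.revoked.get(k);
    if (!existing || event.revokedAt < existing.revokedAt) {
      this.revoked.set(k, event);
    }
  }

  // True if the identity must be rejected: explicitly revoked,
  // or past its scheduled `expiresAt` (the graceful path).
  isRejected(
    meshId: string,
    pubkey: string,
    expiresAt: number | undefined,
    now: number,
  ): boolean {
    if (this.revoked.has(this.key(meshId, pubkey))) return true;
    return expiresAt !== undefined && now >= expiresAt;
  }
}
```

Both paths end in the same rejection, which matches the shared-table design; only the revoke path would additionally be written to the audit log.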
---
## Layer 2: Coordination plane (the broker, properly scoped)
The broker is **not** the data plane. Its real responsibilities:
1. **Mesh state authority** — member roster, group memberships, topic registry, service registrations, revocation list. Source of truth for who's in a mesh and what they can do.
2. **Peer discovery** — `list_peers` returns currently-online presences. Broker is the only system that knows which peers are reachable now and over which transports.
3. **Signaling for P2P upgrades** — when peer A wants to open a P2P connection to peer B, A sends an SDP offer through the broker; B responds with an SDP answer through the broker. Once the data channel is up, broker is out of the path. Same as WebRTC signaling.
4. **Offline message queue** — when recipient is offline, broker stores the (encrypted) message until they reconnect. P2P can't do this without an "always-on peer" model, which is awkward to bootstrap.
5. **Group / topic / broadcast fan-out** — broker is the cheap fan-out point. Sender publishes once; broker delivers to N recipients. P2P fan-out (gossipsub) is possible but adds significant complexity for a feature most meshes won't need at scale.
6. **TURN-style relay for NAT-blocked pairs** — when P2P negotiation fails (symmetric NAT, restrictive corporate firewall), broker carries the data. Functionally equivalent to TURN.
7. **Revocation gossip publisher** — broker pushes revocation events to all online peers via the `system` channel; peers cache them.
8. **Audit log + persistence layer** — encrypted message metadata for compliance. Bodies are E2E-encrypted, so audit is over (sender, recipient, channel, timestamp, size), not content.
The broker is **NOT**:
- The default path for online-online direct messages (P2P should win).
- The decryptor for any direct message (E2E means broker sees ciphertext only).
- A bottleneck on bulk data (file transfer, voice, screen share — these go P2P or fail).
- The sole identity authority for active sessions (P2P sessions verify attestations locally via cached mesh state).
### Two roles per mesh on the WS layer (Codex-1 correction, kept)
Within the broker's WS surface, the daemon holds two roles per mesh, not one connection per launch:
- **Session connections** — N per mesh, session-keyed. Carries: presence row keyed on session pubkey + signaling for P2P upgrades involving this session + inbound for session-targeted DMs that arrive via broker fallback.
A peer who's purely on the broker (no P2P) functions exactly as today. A peer who upgrades to P2P with another peer keeps its broker WS for the other roles.
---
## Layer 3: Data plane (hybrid P2P + broker fallback)
The data plane is what carries actual message bodies. Three modes, selected per (sender, recipient, channel) tuple:
### Mode 1: Direct P2P (preferred when possible)
Two peers run a WebRTC data channel (or QUIC stream — pluggable, see Layer 4) between their daemons. Established via signaling through the broker; once up, broker is out of the path.
**When P2P is selected:**
- Both peers are online (have an active broker WS).
- Both peers' transports advertise P2P capability (WebRTC available; not a webhook-only service identity; not a browser without `RTCPeerConnection`).
- ICE negotiation succeeds (at least one candidate pair works — direct, server-reflexive, or peer-reflexive).
- Channel type is `dm`, `rpc`, or `stream` (the 1:1 cases).
**P2P session lifecycle:**
- Established lazily on first message (warm-up cost ~200ms; dominated by ICE + DTLS handshake). Subsequent messages reuse the channel.
- Idle timeout: 5min of no traffic → tear down. Re-established on next message.
- Hard timeout: 1h max regardless of activity, then re-handshake. Limits damage of compromised session keys.
- Either side can demote to broker-relay at any time; broker is the fallback always.
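The idle and hard timeouts above can be sketched as a pure decision function. The 5 min and 1 h values come from the list; `P2pSessionClock`, `P2pAction`, and `nextP2pAction` are hypothetical names, and giving the hard timeout precedence over the idle check is an assumption.

```typescript
const IDLE_TIMEOUT_MS = 5 * 60 * 1000; // 5 min without traffic
const HARD_TIMEOUT_MS = 60 * 60 * 1000; // 1 h regardless of activity

// Hypothetical bookkeeping for one P2P channel.
interface P2pSessionClock {
  establishedAt: number; // epoch ms when the handshake completed
  lastTrafficAt: number; // epoch ms of the most recent send or receive
}

type P2pAction = "keep" | "teardown-idle" | "rehandshake";

// Assumption: the hard timeout is checked first, so a busy channel
// still rotates its session keys after one hour.
function nextP2pAction(s: P2pSessionClock, now: number): P2pAction {
  if (now - s.establishedAt >= HARD_TIMEOUT_MS) return "rehandshake";
  if (now - s.lastTrafficAt >= IDLE_TIMEOUT_MS) return "teardown-idle";
  return "keep";
}
```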
**Crypto on P2P:**
- DTLS handshake provides transport encryption (forward secrecy; recipient pubkey verified via cached attestation chain).
- Application-layer crypto_box ALSO runs on top — same as broker-relayed messages — so the wire format and decryption path are identical on the receiver side. Defense in depth, no special-case code.
### Mode 2: Broker-relayed (fallback)
The current path. Sender encrypts to recipient pubkey (member or session or service), pushes to broker via WS, broker queues, recipient pulls (or broker pushes to recipient's WS).
**When broker-relay is selected:**
- One peer offline → broker queues, delivers on reconnect.
- ICE negotiation fails → broker becomes the relay.
- Channel type is `group`, `topic`, or `broadcast` → broker fan-out is structurally cheaper than P2P fan-out for any group >2.
- Service identity at either end can't run P2P → broker is the only path.
**Crypto:** unchanged from today — E2E crypto_box, broker sees ciphertext only.
### Mode 3: Direct webhook (broker as broker, not as relay)
For service identities advertising `transport_hint: "http-webhook"`. Sender encrypts to service's `service_pubkey` (or to delegate-daemon's pubkey for delegated services), broker POSTs the ciphertext to the service's registered URL with HMAC signature + retry. No long-lived connection on the service side.
This is functionally a "broker queue, custom delivery transport" — broker still mediates, but delivery is HTTP not WS.
### Selection logic (deterministic, sender-side)
```
function pickTransport(sender, recipient, channel) -> Transport:
else: return broker.relay # fall through, log degraded
```
Policy lives in the daemon's send path. Broker doesn't know or care — it sees only the messages that actually go through it.
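As an illustration only, the three modes and their selection conditions could be expressed as below. This is not a reconstruction of the `pickTransport` body; `PeerView`, `pickTransportSketch`, and the `iceFailed` flag are hypothetical, and the order of the checks is an assumption.

```typescript
type Channel = "dm" | "group" | "topic" | "rpc" | "stream" | "broadcast" | "system";
type TransportChoice = "p2p" | "broker-relay" | "webhook";

// Hypothetical view of a peer, as a sender might assemble it from `list_peers`.
interface PeerView {
  online: boolean;
  p2pCapable: boolean;
  // Present for service identities only.
  transportHint?: "ws" | "http-webhook" | "sse" | "livekit";
}

function pickTransportSketch(
  sender: PeerView,
  recipient: PeerView,
  channel: Channel,
  iceFailed: boolean, // true once negotiation with this recipient has failed
): TransportChoice {
  // Fan-out and system channels always go through the broker (Mode 2).
  if (channel !== "dm" && channel !== "rpc" && channel !== "stream") {
    return "broker-relay";
  }
  // Webhook-only services are reached by broker POST (Mode 3).
  if (recipient.transportHint === "http-webhook") return "webhook";
  // Direct P2P needs both sides online, both capable, and working ICE (Mode 1).
  if (
    sender.online && recipient.online &&
    sender.p2pCapable && recipient.p2pCapable &&
    !iceFailed
  ) {
    return "p2p";
  }
  // Everything else falls back to broker relay.
  return "broker-relay";
}
```

Because the function depends only on its arguments, the same inputs always give the same choice, which is the deterministic property the heading asks for.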
---
## Layer 4: Transport adapters (pluggable)
A transport adapter is an implementation of how *one peer pair* moves bytes. Defined by an interface; new adapters added without touching upper layers.
1. **`WsBrokerTransport`** — current code. WebSocket to `wss://ic.claudemesh.com/ws`. Underpins both broker-relay (Mode 2) and signaling for P2P upgrades.
2. **`WebRtcP2pTransport`** — RTCPeerConnection + RTCDataChannel. Browser, Node (`node-datachannel` or similar), CLI all supported. Chunking handled at envelope layer for `stream` channel.
3. **`HttpWebhookTransport`** — outbound HTTP POST to broker `/v1/send`; inbound HTTP POST to a registered webhook URL. Unidirectional from peer's perspective. Mid-turn push: no.
4. **`LiveKitRoomTransport`** — for voice agents. Side-car bridges a LiveKit room to claudemesh. Maps a LiveKit participant → claudemesh service identity.
Future adapters TBD as concrete needs surface — no commitments here. (v1 listed MQTT/gRPC/SSE as future named adapters; v2 drops the named list per Codex-2 should-cut feedback.)
The peer's daemon advertises transport capabilities at hello time; broker stores them in the presence row; senders consult them via `list_peers` (capability fields added to the response).
---
## Layer 5: Channels (typed envelope)
Channels define **semantics**: what the message means, what crypto to apply, what delivery guarantees, what fan-out, what backpressure.
```typescript
type ChannelType =
| "dm" // 1:1 direct, encrypted to recipient pubkey, at-least-once with ack
| "group" // post to named group, per-recipient encrypt or symmetric, at-least-once with ack
streamTerminator?: boolean; // stream only; signals end
rpcCorrelationId?: string; // rpc only; back-edge for response
rpcResponse?: boolean; // rpc only; this is a response, not request
replyToId?: string; // dm/topic threading
mentions?: string[]; // dm/topic; @-callouts
expiresAt?: number; // any; broker drops past this; default 7d for queued
};
/** Sender Ed25519 signature over canonical bytes. Verified by recipient
* (and by broker for system-message origin). */
signature: string;
}
```
### Stream concurrency
For `channel: "stream"`, **`meta.streamId` is required**. Two concurrent streams to the same recipient pubkey use distinct streamIds; receiver demuxes by `(from, streamId)`. Without this, multi-stream voice transcripts or file transfers from the same peer would collide.
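A sketch of receiver-side demultiplexing keyed on `(from, streamId)`, under the assumption that chunks arrive in order and have already been decrypted. `StreamChunk` and `StreamDemux` are hypothetical names; only `streamId` and `streamTerminator` come from the envelope `meta`.

```typescript
// Hypothetical, already-decrypted view of one stream envelope.
interface StreamChunk {
  from: string; // sender pubkey
  streamId: string; // meta.streamId
  body: string; // payload chunk
  terminator?: boolean; // meta.streamTerminator
}

class StreamDemux {
  private open = new Map<string, string[]>();

  // Returns all chunks of a stream once its terminator arrives,
  // otherwise undefined while the stream is still open.
  push(chunk: StreamChunk): string[] | undefined {
    const key = `${chunk.from}:${chunk.streamId}`;
    const parts = this.open.get(key) ?? [];
    parts.push(chunk.body);
    if (chunk.terminator) {
      this.open.delete(key);
      return parts;
    }
    this.open.set(key, parts);
    return undefined;
  }
}
```

Two interleaved streams from the same sender land in separate buffers because the key includes `streamId`, which is the collision the requirement is meant to prevent.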
### Crypto by channel
- `dm`, `rpc`, `stream` → crypto_box(plaintext, recipient_pubkey, sender_secretkey). Receiver verifies attestation chain to ensure recipient_pubkey is a valid identity rooted in a current member.
- `group` → for now: per-recipient crypto_box (sender encrypts N times, broker fans out). Future: hybrid Curve25519 → AES-GCM with sender key wrap, like Signal Sender Keys.
- `topic` → per-topic symmetric key (already in v0.2.0 spec). Key rotation = new topic + members re-subscribe. Keys distributed via DM at join time, encrypted to each member's pubkey.
- `system` → broker is the signer; receivers verify against the broker's published Ed25519 pubkey. Plaintext bodies allowed since these are operational events.
**At-least-once requires receiver ack.** Today's broker sets `delivered_at = NOW()` inside the claim CTE before WS push succeeds — that's at-most-once with no retry. The end-state behavior:
1. Sender's daemon writes to outbox (durable).
2. Drain worker sends to broker; broker acks with `client_message_id` echo (this is sender → broker delivery ack, NOT end-to-end).
3. Broker queues with `claimed_at` NULL, `delivered_at` NULL.
4. On recipient hello / push opportunity: broker claims by setting `claimed_at = NOW(), claim_id = <presenceId>` (lease 30s).
5. Broker `sendToPeer` writes to WS / P2P / webhook.
6. Receiver processes envelope and emits `client_ack { clientMessageId }` back to broker.
7. Broker sets `delivered_at = NOW()` ON ACK RECEIPT.
8. If lease expires without ack → broker re-eligible to claim and re-deliver.
9. Receiver dedupes by `clientMessageId` (idempotent insert into inbox).
Until ack is wired (transitional state), the transitional label is **best-effort retry with idempotent dedupe**, not at-least-once. The outbox + claim/lease + dedupe combination upgrades to at-least-once when the ack path is in place.
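The claim, lease, ack, and dedupe steps can be sketched in memory as follows. This is an illustrative model, not the broker's SQL: `QueuedMessage`, `claim`, `ack`, and `Inbox` are hypothetical names, and the 30s lease is the value from step 4.

```typescript
const LEASE_MS = 30 * 1000; // lease from step 4

// Hypothetical in-memory stand-in for a message_queue row.
interface QueuedMessage {
  clientMessageId: string;
  claimedAt: number | null;
  claimId: string | null;
  deliveredAt: number | null;
}

// Steps 4 and 8: a row is claimable if undelivered and either
// never claimed or its lease has expired without an ack.
function isClaimable(m: QueuedMessage, now: number): boolean {
  if (m.deliveredAt !== null) return false;
  return m.claimedAt === null || now - m.claimedAt >= LEASE_MS;
}

function claim(m: QueuedMessage, presenceId: string, now: number): boolean {
  if (!isClaimable(m, now)) return false;
  m.claimedAt = now;
  m.claimId = presenceId;
  return true;
}

// Step 7: delivered_at is set only when the receiver's ack arrives.
function ack(m: QueuedMessage, now: number): void {
  if (m.deliveredAt === null) m.deliveredAt = now;
}

// Step 9: receiver-side idempotent insert keyed on clientMessageId.
class Inbox {
  private seen = new Set<string>();

  // Returns true for a first delivery, false for a duplicate.
  insert(clientMessageId: string): boolean {
    if (this.seen.has(clientMessageId)) return false;
    this.seen.add(clientMessageId);
    return true;
  }
}
```

A redelivery after lease expiry reaches `Inbox.insert` a second time and is dropped there, which is why retry plus dedupe does not surface duplicates to the handler.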
`rpc` exactly-once is the same path with the addition that the response carries the `rpcCorrelationId`; sender retries the request until response received OR `timeoutMs` elapses; receiver-side dedupe ensures the handler runs at most once.
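A minimal sketch of the receiver-side dedupe for `rpc`, assuming a synchronous handler and assuming the receiver caches the first result so that a retried request gets the same response. `RpcDedupe` is a hypothetical name; cache eviction is left out.

```typescript
// Runs a handler at most once per rpcCorrelationId and replays
// the cached result for retried requests (illustrative only).
class RpcDedupe<Res> {
  private results = new Map<string, Res>();

  handle(correlationId: string, run: () => Res): Res {
    if (this.results.has(correlationId)) {
      return this.results.get(correlationId) as Res;
    }
    const res = run();
    this.results.set(correlationId, res);
    return res;
  }
}
```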
### Mid-turn push
`channel: "dm"` with `meta.priority: "now"` and recipient is a launched Claude Code session → recipient's daemon emits `claude/channel` MCP push; the session's Claude Code reads it mid-turn. Other priorities deliver via `claudemesh inbox` poll or at next tool boundary.
### Reply threading + mentions
Uniform across `dm` and `topic`: `meta.replyToId` references the original message's `clientMessageId`. `meta.mentions` is an array of pubkeys (or `@<group>`) — UI/CLI surfaces them; broker doesn't enforce.
---
## Layer 6: Mesh state — broker authority + signed gossip
The mesh state (members, groups, topics, services, revocations, policies) needs both:
- **Authority** — single source of truth. The broker DB. Mutations (add member, revoke, change policy) go through broker, signed by mesh owner / admin.
- **Replication** — every peer needs a current-enough copy to authorize incoming P2P messages locally (otherwise revoke can't be enforced when peers chat directly).
End-state: broker publishes signed mesh-state-update events on the `system` channel; peers cache and apply. Conflict resolution is trivial because broker is authority — peers merge updates by version vector. Eventually consistent in seconds, not the open-ended convergence of CRDT-only systems.
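A sketch of how a peer might apply broker-published updates, under a simplifying assumption: with one authority, each update carries a single monotonically increasing `version` and is a delta. `MeshStateUpdate` and `MeshStateCache` are hypothetical names, the payload is reduced to revoked pubkeys, and signature verification is omitted.

```typescript
// Hypothetical delta published by the broker on the `system` channel.
interface MeshStateUpdate {
  version: number;
  revokedPubkeys: string[];
}

type ApplyResult = "applied" | "stale" | "gap";

class MeshStateCache {
  private version = 0;
  private revoked = new Set<string>();

  apply(u: MeshStateUpdate): ApplyResult {
    // Already seen: duplicates and replays are ignored.
    if (u.version <= this.version) return "stale";
    // Missed an update: the caller should refetch a snapshot from the broker.
    if (u.version !== this.version + 1) return "gap";
    for (const pk of u.revokedPubkeys) this.revoked.add(pk);
    this.version = u.version;
    return "applied";
  }

  isRevoked(pubkey: string): boolean {
    return this.revoked.has(pubkey);
  }
}
```

Reporting a gap rather than applying out of order keeps a delta from being skipped, which matters most for revocations.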
For peer revocation specifically: revocation gossip is highest priority and must propagate within 30s to all online peers. Offline peers see it on reconnect.
- Per-topic symmetric keys (v0.2.0 baseline; v2 makes it a hard requirement for topics).
- Broker signing key for `system` channel events (single Ed25519 keypair the broker holds; pubkey published in mesh state).
- Service identity attestations carry `service_pubkey` + `scopes`.
- Forward-secrecy for long-lived P2P sessions: post-handshake, derive a fresh symmetric key per session epoch (1h max); rotate.
---
## Migration order (architectural milestones, NO time estimates)
The end-state above doesn't ship in one PR. The following ordering minimizes regression risk and lets each milestone be useful on its own. **No weeks/sprints attached** — work proceeds when the prior milestone is stable.
### Milestone 1 — Foundational correctness
*Required before anything else. Without this, every later milestone inherits the bugs.*
- Extract `connectWsWithBackoff` helper. Refactor `DaemonBrokerClient` and `SessionBrokerClient` to use it. Eliminates the drift bug class.
- Drop daemon's stray `sessionPubkey` field (or rename + document).
- Tighten daemon-WS inbound filter — `*` broadcasts and member-targeted DMs only; session-targeted DMs land on session WS exclusively.
- Add `presence.role` column at broker (`control-plane | session | service`); list_peers + fan-out + reconnect honor it.
- **Fix broker drain race** — schema migration adds `claimed_at`, `claim_id`, `claim_expires_at` columns. Rewrite `drainForMember` for two-phase claim/deliver. Re-claim if `claimed_at` older than lease (30s).
- Receiver-side `client_ack` for at-least-once with ack (Codex-2 correction). Without ack wiring this stays at "best-effort retry with idempotent dedupe."
- Receiver-side dedupe: idempotent insert on `clientMessageId`; finished + made required for v2 envelopes.
### Milestone 2 — Capability advertisement + transport abstraction
*Sets up the interface. No new transport yet.*
- Define `PeerTransport` interface; refactor existing WS code to be the first implementation. No behavioral change.
- Add capabilities field to hello payload + presence row + `list_peers` response.
- Define `Envelope v2` schema with `meta` required + `streamId` requirement on `stream` channel. Broker accepts both v1 and v2 (v1 auto-upgraded server-side by inferring `channel` from `targetSpec` shape). Senders start emitting v2.
### Milestone 3 — Service identity + HTTP webhook transport
*First non-WS transport. Validates abstraction. Includes revocation.*
- Service identity registration: `claudemesh service register --type webhook --pubkey <hex> --scopes ...` mints attestation, stores broker-side. Service pubkey explicit in attestation.
- Service revocation: `claudemesh service revoke <service_id>` writes broker denylist + closes any active connections + publishes `system` revocation event.
- Add `HttpWebhookTransport` (broker-side outbound: POST with HMAC + retry; daemon-side inbound: HTTP server receives webhook callbacks → handleBrokerPush).
- Add `/v1/send` HTTP POST endpoint on broker (today broker is WS-only for sends).
- Demo: cron job using only `curl` posts to mesh; webhook subscriber receives.
- (`SseTransport` deferred — Codex-2 should-cut feedback. Pull in when concrete browser need arises.)
### Milestone 4 — Typed channels: rpc, stream, system
*Channel layer becomes real.*
- `channel: "rpc"` end-to-end: correlation id routing through any transport, response timeout, `claudemesh rpc <peer> <method> <args>` CLI verb.
- Onion routing option for adversarial environments.
---
## Non-goals (explicit)
- **Replacing Slack / Discord / Matrix as a human chat product.** claudemesh is for agent coordination; humans participate via bridges or direct DMs but UX is CLI-first.
- **Pure-P2P with no central coordinator.** The broker stays — for offline queue, group fan-out, mesh authority, revocation. "P2P-first hybrid" is the commitment, not "P2P-only."
- **Replacing the MCP `claude/channel` push-pipe.** Mid-turn interrupt stays MCP. The data-plane changes don't touch the daemon-to-Claude-Code path.
- **Real-time media (audio/video) directly in claudemesh data channels.** Bandwidth-heavy media goes through dedicated stacks (LiveKit, WebRTC SFU). claudemesh metadata + signaling glues them.
---
## Open questions
1. **Mid-turn push when sender is on P2P session.** P2P delivery to recipient's daemon → daemon emits MCP push. Same shape as broker-delivered. Confirm the MCP push respects per-session targeting (different session pubkey siblings of the same member).
2. **Browser peers and NAT traversal.** Browser ↔ browser via WebRTC works. Browser ↔ daemon (Node WebRTC binding) — needs testing under symmetric NAT. May require running a STUN server (Google's for now; eventually self-hosted). TURN fallback uses the broker WS.
3. **Backpressure on stream channel.** WebRTC data channels have built-in flow control. Broker-relayed streams need per-stream backpressure signaling to avoid OOM at the broker. Proposal: receiver advertises `stream_window_bytes` periodically; sender pauses when the advertised window is used up.
4. **Multi-region brokers.** Today single broker. If we add a second broker (or federation), how do peers in mesh A on broker 1 talk to peers in mesh A on broker 2? Out of scope here; separate spec when forced.
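For the backpressure proposal in question 3, a sender-side window could look like the sketch below. It assumes each `stream_window_bytes` advertisement replaces the remaining window rather than adding to it; that choice, and the `StreamSendWindow` name, are assumptions since the proposal leaves them open.

```typescript
// Illustrative sender-side credit window for one broker-relayed stream.
class StreamSendWindow {
  private remaining = 0;

  // Receiver advertised a fresh window (stream_window_bytes).
  onWindowAdvertised(bytes: number): void {
    this.remaining = bytes;
  }

  // Returns true if the chunk may be sent now. False means the sender
  // should pause until the next advertisement arrives.
  trySend(chunkBytes: number): boolean {
    if (chunkBytes > this.remaining) return false;
    this.remaining -= chunkBytes;
    return true;
  }
}
```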
---
## Acknowledgements
**Codex-1 (initial architecture review of existing code) caught:**
- "Remove daemon-WS inbound entirely" idea silently loses broadcasts + member-targeted DMs whenever zero launches exist. Corrected → retained.
- Inheritance for the dup'd lifecycle would become a god class. Composition via helper kept.
- Drain race needs `claimed_at` + delivered-on-success; "check OPEN before claim" still drops on crash. Kept.
- Token-keyed registry is correct (token = auth boundary), not a smell. Kept.
**Codex-2 (single-pass review of v1 of this spec) caught:**
- At-least-once requires receiver ack, not just "set delivered_at on success." → Layer 5 delivery semantics rewritten to require client_ack.
- Service identity needs explicit `service_pubkey` field, included in attestation. → Added to ServiceIdentity definition.
- v2 envelope `meta` should be non-optional with `clientMessageId` always present. → meta is now required.
- Service identity needed explicit revocation/disable story. → New CLI verb `claudemesh service revoke`, broker denylist, system-channel gossip propagation.
- `streamId` location ambiguous; concurrent streams to same peer would collide. → `meta.streamId` made REQUIRED for `channel: "stream"`.
- Defer `SseTransport` from Milestone 3. → Done.
- Drop named future-adapter list (MQTT/gRPC) to avoid false commitments. → Done.
The hybrid P2P data plane, transport adapter abstraction, typed channel envelope, mesh state replication, and milestone reordering are mine. Codex's reviews were targeted at correctness/scope-gap/should-cut, not redesign.
**This spec is now frozen for implementation.** No further architectural drift; deviations during implementation surface as new spec-deltas with explicit rationale, not silent edits to this document.
Today claudemesh is a **peer mesh for Claude Code sessions** — broker + CLI + per-session WS, encrypted DMs, peer list, mid-turn push via MCP. Tomorrow it has to be a **transport-agnostic agentic communication platform** that:
- treats Claude Code as **one channel type** among many (with first-class support for mid-turn interrupts via `claude/channel`)
- accepts **non-Claude agents** as peers — voice agents (LiveKit/Pipecat), OpenAI Assistants, raw HTTP webhook consumers, scheduled cron actors, human IM bridges
- exposes **typed channels** (DM, group, topic, RPC, system event, stream) so message semantics aren't shoved through one `targetSpec` string
- has a **pluggable transport layer** so a peer can join the mesh over WS, HTTP webhook, SSE, MQTT, or gRPC without changing the broker's data plane
- preserves **end-to-end encryption** as a non-negotiable for direct messages
This document specifies the architecture in three layers (identity, transport, channel), the foundational cleanup needed before adding any of it (Codex caught a few sharp issues), and the migration path that gets us there without a "v2 rewrite" event.
The CLI-first commitment from the North Star spec stays intact — every channel type and transport adapter must be invocable from `claudemesh <verb>` first, with MCP serving only `claude/channel` push.
---
## Why now
Three forcing functions:
1. **Multi-session interconnect already broke** (1.30.0 → 1.32.1). The per-session WS subsystem shipped without a push handler because the architecture assumed "one daemon WS per mesh handles everything" and then we bolted session WSes on top without finishing the inbound side. The shape is right; the wiring was incomplete. We need to formalize the role split before adding more transports.
2. **Codex review surfaced a correctness bug in the broker's drain.** `drainForMember` claims rows by setting `delivered_at = NOW()` *before* the WS push succeeds. If `ws.readyState !== OPEN` at push time, the row is marked delivered and the message is gone. This is at-most-once with no retry. Any future channel type or transport adapter inherits this bug if we don't fix it at the foundation.
3. **The agentic-comms market is becoming a thing.** Voice agents (LiveKit, Pipecat, ElevenLabs Conversational), OpenAI Assistants threads, MCP servers acting as autonomous workers, scheduled cron actors — they all need a "mesh" to coordinate. claudemesh has the right primitives (E2E crypto, peer presence, typed routing); it just needs the architecture to admit non-Claude peers without forking the codebase.
---
## Audience for this architecture
| Peer type | Identity | Transport | Channels they speak |
|---|---|---|---|
| **Claude Code session** (today) | Per-launch session pubkey, parent-attested by member key | WS to broker | DM, group, topic, system events; receives mid-turn push via MCP `claude/channel` |
| **Headless agent** (e.g. cron job, Hermes/OpenClaw worker) | Member pubkey (no per-launch session) | WS to broker, OR HTTP webhook outbound | DM, group, topic; no mid-turn push (polls inbox) |
| **Voice agent** (LiveKit/Pipecat call) | Service identity (signed by mesh owner) | WS to broker, possibly via TURN relay | DM (transcript stream), group (call participants), system events (call lifecycle) |
| **Human via Slack/WhatsApp bridge** | Service identity for the bridge, end-user mapped via membership | WS (bridge to broker) | DM, topic |
| **Webhook consumer** (Stripe-style passive listener) | Service identity, scoped to one channel | HTTP webhook outbound only | Topic (subscribe to events) |
Every row in this table needs to work without changing the broker's data plane.
---
## Layer 1: Identity
### Today
Two identity types coexist:
- **Member identity** — stable Ed25519 keypair held in `~/.claudemesh/config.json`. One per joined mesh. Used for hello signature on the daemon's main WS; used as the cryptographic root of trust for sibling sessions.
- **Session identity** — ephemeral Ed25519 keypair generated per `claudemesh launch`. Parent-signed attestation vouches for it (TTL 12h, broker cap 24h). Used for hello signature on the per-session WS; used as the routing key for DMs targeted at *this specific launched session*.
This is enough for Claude Code peers. It's not enough for the audience table above.
### Proposed: third identity type — **service identity**
A service identity is what a non-Claude integration uses to authenticate:
```
ServiceIdentity {
member_id // The mesh member that owns this service (auth boundary)
service_id // Stable id for the service ("openai-assistant-foo", "livekit-room-bar")
transport_hint // "ws" | "http-webhook" | "sse" — informs how the broker reaches it
}
```
**Three identity types, one auth model:**
- All identities resolve to a `member_id` (the auth boundary — grants, kicks, bans operate on members).
- Identities differ in *liveness* (member = always; session = per-launch; service = scoped/scheduled) and in *transport hint* (member/session = WS-resident; service = polymorphic).
**Backward compatibility:** existing member + session identities are unchanged. Service identity is additive.
### Cryptographic implications
- E2E encryption (`crypto_box`) targets a public key. Member pubkey, session pubkey, service pubkey all work the same way.
- A service that can't hold a long-lived secret (e.g. OpenAI Assistant calling out via HTTPS) gets a **delegated identity** the daemon holds — sender encrypts to the daemon's per-member key, daemon re-encrypts and forwards over the service's webhook. This adds trust in the daemon, but it's the only way to bridge to non-crypto-native peers without giving them raw secrets.
---
## Layer 2: Transport
### Today
One transport: **WebSocket to broker** (`wss://ic.claudemesh.com/ws`). Everything goes through it — hello, send, push, RPC. The CLI's daemon holds two WS instances per mesh (member-keyed `DaemonBrokerClient` + per-launch `SessionBrokerClient`).
### Proposed: transport adapter interface
```typescript
interface BrokerTransport {
/** One-time hello + auth handshake. Identity is opaque to the transport. */
/** Send a typed envelope. Returns a delivery promise (ack or terminal failure). */
send(envelope: Envelope): Promise<SendResult>;
/** Stream of inbound envelopes. Pull-model so a transport can be a webhook,
* not just a long-lived socket. */
inbound(): AsyncIterable<Envelope>;
/** Close cleanly. */
close(reason?: string): Promise<void>;
/** Capabilities surfaced to the daemon — broker uses this to decide
* whether mid-turn push is possible, whether RPC blocks are
* supported, etc. */
capabilities: TransportCapabilities;
}
```
**Concrete adapters at v2.1.0:**
1. **`WsBrokerTransport`** — current WS implementation. The `DaemonBrokerClient` and `SessionBrokerClient` are recast as two roles using this transport with different hello payloads.
2. **`HttpWebhookTransport`** — for service identities that can't hold a WS open. Outbound: HTTP POST to the broker's `/v1/send`. Inbound: broker calls back to a registered webhook URL with retry + signature. Mid-turn push is not possible (degrades gracefully).
3. **`SseTransport`** — for browsers / restricted environments. Outbound: HTTP POST. Inbound: SSE stream from broker to client.
**Future adapters (v2.3+):**
4. **`LiveKitTransport`** — for voice agents. The "broker" is a LiveKit room; messages are LiveKit data-channel packets. Bridges to the central broker via a daemon side-car.
5. **`MqttTransport`** — for IoT / fleet scenarios.
6. **`GrpcTransport`** — for low-latency intra-cluster.
Any new adapter implements the same interface; broker logic is transport-agnostic at the API boundary.
### The two-role model (Codex's correction)
Even within one transport, the daemon holds **two roles per mesh**, not one connection per launch:
- **Control-plane connection** — one per mesh, member-keyed. Carries: outbox drain (one queue, can't race), `list_peers`/state/memory/skill RPCs, inbound for `*` broadcasts and member-targeted DMs (legacy traffic + zero-launch state).
- **Session connections** — N per mesh, session-keyed. Carries: presence row keyed on session pubkey, inbound for session-targeted DMs.
This is what we have today; the spec just makes the role split explicit. The mistake in 1.30.0–1.32.0 was treating session connections as "presence-only" instead of "second-class peers." 1.32.1 corrects that.
### Foundational cleanup (ship first, before any new transport)
1. **Extract `connectWsWithBackoff` helper** — current `DaemonBrokerClient` and `SessionBrokerClient` duplicate the WS lifecycle (open, hello, ack-timeout, close, backoff, reconnect). Codex's recommendation: composition, not inheritance. A single helper takes `{ url, buildHello, onMessage, onStatusChange }` and both clients call it. Eliminates the drift bug class that produced session_replaced thrashing.
2. **Drop the daemon's stray `sessionPubkey`** (`apps/cli/src/daemon/broker.ts:113`). It's a leftover from the era when the daemon WS was the only WS. The session role now owns session pubkeys. If we want the daemon itself to be addressable by a stable pubkey, rename it `daemonPubkey` and document it; today it's dead ballast.
3. **Tighten daemon-WS inbound filter, don't remove it** (Codex's correction to my prior take). Daemon WS should still receive `*` broadcasts and member-targeted DMs (legacy senders, zero-launch state). It should NOT decrypt session-targeted DMs (that's the session WS's job, and decryption requires the session secret which the daemon WS doesn't have anyway).
4. **Fix the broker drain race** (`apps/broker/src/broker.ts:2399-2402`). Add `claimed_at` + `claim_id` columns; claim sets `claimed_at = NOW()` (NOT `delivered_at`); push runs; `delivered_at = NOW()` is set ONLY after `ws.send` succeeds. Re-eligible if `claimed_at` is older than the lease timeout (e.g. 30s). Combined with `client_message_id` dedupe on the receiver side, this gives at-least-once semantics, which is what an agentic comms platform needs.
5. **Decouple presence-WS-role from session-WS-role at the broker.** Today `connectPresence` is called from both `handleHello` and `handleSessionHello`. The two paths diverge in identity (member vs session pubkey) and dedup key (sessionId in both cases). Make the role explicit on the presence row (`role: "control-plane" | "session" | "service"`) so list_peers, fan-out, and reconnect can reason about it. Hidden `claudemesh-daemon` rows in 1.32.0's `peer list` are a hack covering for missing typing.
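For cleanup item 1, the part of the lifecycle that both clients would share can be sketched as a small policy object used by composition. The socket wiring is omitted, and the base delay, cap, and the names `backoffDelayMs` and `ReconnectPolicy` are assumptions; the spec does not state backoff parameters.

```typescript
// Exponential backoff with a cap. Base and cap are illustrative values.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// One instance per connection. Both the daemon client and the session
// client would hold one, rather than each re-implementing the counters.
class ReconnectPolicy {
  private attempt = 0;

  // Call when the hello is acknowledged: the next failure starts from the base delay.
  onOpen(): void {
    this.attempt = 0;
  }

  // Call on close: returns how long to wait before the next attempt.
  onClose(): number {
    const delay = backoffDelayMs(this.attempt);
    this.attempt += 1;
    return delay;
  }
}
```

Keeping the counters in one place is what removes the drift: neither client can reset or advance the backoff differently from the other.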
---
## Layer 3: Channels
### Today
One channel type: **direct messages with target-spec routing**. `targetSpec` is a string that the broker pattern-matches:
- `<64-hex-pubkey>` → DM to that member or session
- `*` → broadcast to mesh
- `@<groupname>` → group post
- `#<topicId>` → topic post
This works but it's overloaded — the same `send` verb covers DMs, broadcasts, groups, topics, and (since v0.9) tagged messages. As we add agentic peers, the semantics matter and the routing key string can't carry them.
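The pattern match on `targetSpec` can be sketched as a small classifier. This is an illustration of the four shapes listed above, not the broker's actual matching code; `inferChannel` and the `InferredChannel` labels are hypothetical, and returning `undefined` for unrecognised input is an assumption.

```typescript
type InferredChannel = "dm" | "group" | "topic" | "broadcast";

// Classify a v1 targetSpec string by shape.
function inferChannel(targetSpec: string): InferredChannel | undefined {
  if (targetSpec === "*") return "broadcast";
  if (targetSpec.startsWith("@") && targetSpec.length > 1) return "group";
  if (targetSpec.startsWith("#") && targetSpec.length > 1) return "topic";
  // 64 hex characters: a member or session pubkey.
  if (/^[0-9a-f]{64}$/i.test(targetSpec)) return "dm";
  return undefined;
}
```

The classifier shows the limitation the paragraph describes: the string carries a destination, but nothing about delivery guarantees, correlation, or streaming.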
### Proposed: typed channel envelope
```typescript
type ChannelType =
| "dm" // 1:1 message, encrypted to recipient pubkey
| "group" // post to named group, encrypted per-recipient (today: base64 plaintext)
| "topic" // pub/sub topic, persisted, history available, per-topic symmetric key
| "rpc" // request/response, correlation id, timeout, structured result
```
- A voice agent sending a partial transcript wants `channel: "stream"` semantics — high-frequency, small chunks, idempotent, no per-message ack required.
- An OpenAI Assistant calling a tool wants `channel: "rpc"` — request-response with timeout, correlation back-edge so the response routes.
- A scheduled cron actor reporting completion wants `channel: "topic"` — fire-and-forget, persisted history.
- Today all of these get bolted onto `dm` with conventions; v2 envelope makes them first-class.
### Claude Code channels — first-class support
Three specific channel features for Claude Code:
1. **Mid-turn interrupt** (`claude/channel` push). Already implemented via the MCP push-pipe. The new envelope makes it explicit: `channel: "dm"` with `meta.priority: "now"` triggers MCP push to a launched session. Other priorities deliver at next inbox poll.
2. **Reply threading** (`meta.replyToId`). Already partially supported on topics; v2 makes it work uniformly across `dm` and `topic`. The receiver Claude Code session sees a structured reply thread instead of flat history.
3. **Mentions** (`meta.mentions`). Already supported on topics; v2 surfaces them on `dm` too — useful for `@<peer>` callouts in groups even when the message body is encrypted.
### Backward compatibility
Envelope v1 (today's shape) stays accepted by the broker until v3.x. v1 envelopes are auto-upgraded server-side: `channel` inferred from `targetSpec` shape (`*` → group/broadcast, `#` → topic, hex → dm). Existing CLIs keep working.
---
## Future integrations (concrete)
These are not part of v2.0 — they're the test cases the architecture must support:
### LiveKit voice agent
- Service identity: `livekit-room-<id>`, signed by mesh owner.
- Transport: dedicated daemon side-car hosts a LiveKit participant; data-channel packets bridge to the central broker via WS.
- Channels: `stream` for transcript chunks, `system` for call lifecycle (joined/left/muted), `dm` for sidebar text.
- E2E: per-call ephemeral keypair held by the side-car; participants' member keys are discovered via mesh peer list.
### OpenAI Assistant integration
- Service identity: `openai-assistant-<id>`, scoped to one or more topics + RPC.
- Transport: HTTP webhook out (broker → assistant API), HTTP POST in (assistant → broker `/v1/send`).
- Channels: `rpc` for tool-style invocations from claudemesh peers, `topic` for assistant-published events.
- Crypto: delegated to daemon (assistant can't hold a libsodium secret; daemon re-encrypts on its behalf).
### Generic webhook consumer (Stripe-style)
- Service identity: `webhook-<consumer-id>`, scoped to subscribed topics.
- Transport: HTTP webhook out only. No inbound — it's a passive sink.
- Channels: `topic` only.
- Crypto: not E2E; webhook bodies are signed (HMAC-SHA256, sender = mesh) but plaintext.
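A minimal sketch of the signed-but-plaintext webhook body, using Node's built-in `crypto` module. The function names and the hex encoding of the signature are assumptions; the spec only fixes the algorithm (HMAC-SHA256).

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical helper: the broker signs the raw body with a per-consumer secret.
function signWebhookBody(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

// Hypothetical helper: the consumer recomputes the HMAC and compares in constant time.
function verifyWebhookBody(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signWebhookBody(secret, body), "hex");
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Verification must run over the exact bytes received, before any JSON parsing or re-serialisation, or the signature will not match.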
### Human-via-WhatsApp bridge
- Service identity: `whatsapp-bridge`, with member-mapping for each end-user.
- Transport: WS (bridge holds long connection to broker), bridges to WhatsApp Business API.
- [ ] **Fix drain race**: schema migration adds `claimed_at`, `claim_id`, `claim_expires_at` columns; rewrite `drainForMember` for two-phase claim/deliver; add re-claim path for stale leases.
- [ ] Receiver-side: harden `client_message_id` dedupe (already partial in 1.32.x; finish for at-least-once). Add idempotent insert that returns existing row on conflict.
**Success criteria:**
- Two-session smoke test still passes (1.32.1 baseline).
- Crash-mid-push test: kill broker between claim and send; verify message redelivers on broker restart + recipient reconnect.
- Reconnect storm test: 100 reconnect cycles per session over 60s; zero message loss.
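The two-phase claim/deliver idea behind the drain-race fix can be sketched in memory. This is not the real `drainForMember`: the in-memory array stands in for the queue table, and `CLAIM_TTL_MS` is an assumed lease length.

```typescript
interface QueuedMessage {
  id: string;
  claimId: string | null;
  claimedAt: number | null;
  claimExpiresAt: number | null;
  deliveredAt: number | null;
}

const CLAIM_TTL_MS = 30_000; // assumed lease length, not from the spec

// Phase 1: claim undelivered rows that are unclaimed or whose lease has expired.
function claimBatch(queue: QueuedMessage[], claimId: string, now: number): QueuedMessage[] {
  const claimed: QueuedMessage[] = [];
  for (const m of queue) {
    if (m.deliveredAt !== null) continue;
    const leaseActive = m.claimExpiresAt !== null && m.claimExpiresAt > now;
    if (leaseActive) continue; // another drain still owns this row
    m.claimId = claimId;
    m.claimedAt = now;
    m.claimExpiresAt = now + CLAIM_TTL_MS;
    claimed.push(m);
  }
  return claimed;
}

// Phase 2: mark delivered only after a successful send, and only by the current claimant.
function markDelivered(m: QueuedMessage, claimId: string, now: number): boolean {
  if (m.claimId !== claimId || m.deliveredAt !== null) return false;
  m.deliveredAt = now;
  return true;
}
```

If the broker dies between the two phases, the row keeps `deliveredAt = null` and becomes claimable again once the lease expires, which is the behaviour the crash-mid-push test checks.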
### v2.1.0 — Transport adapter interface
**Target: 2–3 weeks after v2.0.0**
- [ ] Define `BrokerTransport` interface; refactor existing WS code to be the first implementation.
- [ ] Add `HttpWebhookTransport` adapter (broker side: outbound HTTP POST with retry + HMAC signature; daemon side: HTTP server that receives webhook callbacks and inserts into inbox).
- [ ] Add `/v1/send` HTTP endpoint on the broker (today the broker is WS-only for sends).
- [ ] Service identity registration flow: `claudemesh service register --type webhook --scopes dm:read,topic:write` mints attestation, stores it locally + on broker.
- [ ] Basic `SseTransport` for browser/CI use cases.
**Success criteria:**
- A scheduled cron job using only `curl` can send to the mesh (no daemon required).
- A webhook consumer subscribed to a topic receives messages within 5s of post.
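One possible shape for the adapter seam, as a hedged sketch. Only the name `BrokerTransport` comes from this roadmap; the method set, `OutboundMessage`, `InMemoryTransport`, and `pickTransport` are hypothetical.

```typescript
interface OutboundMessage {
  targetSpec: string;
  body: string;
}

// Assumed minimal contract; the real interface may carry auth, backpressure, etc.
interface BrokerTransport {
  readonly kind: string;
  send(msg: OutboundMessage): Promise<void>;
  close(): Promise<void>;
}

// Test double standing in for the WS or HTTP webhook implementations.
class InMemoryTransport implements BrokerTransport {
  readonly kind = "memory";
  readonly sent: OutboundMessage[] = [];
  private closed = false;

  async send(msg: OutboundMessage): Promise<void> {
    if (this.closed) throw new Error("transport closed");
    this.sent.push(msg);
  }

  async close(): Promise<void> {
    this.closed = true;
  }
}

// Hypothetical registry lookup: the broker picks an adapter by the recipient's transport kind.
function pickTransport(registry: Map<string, BrokerTransport>, kind: string): BrokerTransport {
  const transport = registry.get(kind);
  if (!transport) throw new Error(`no transport registered for kind: ${kind}`);
  return transport;
}
```

Keeping the interface this small means the existing WS code can become the first implementation without changing its callers.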
### v2.2.0 — Typed channels (envelope v2)
**Target: 2–3 weeks after v2.1.0**
- [ ] Define `Envelope v2` schema; broker accepts both v1 and v2; sender-side code emits v2.
- WhatsApp bridge (validate human-bridge service identity).
These are not on the critical path for the architecture; they prove it.
---
## Non-goals (explicit)
- **Replacing Slack / Discord.** claudemesh is for agent coordination. Human chat is a side-effect, not the headline.
- **Federation across multiple brokers.** v2.0 stays single-broker per mesh. Multi-broker (gossip / federation) is a separate spec, post-v3.
- **Sync-only / no-broker P2P.** Direct peer-to-peer (without the central broker) is a different architecture (libp2p, Iroh). Not in scope.
- **Replacing the MCP push-pipe.** Mid-turn interrupt stays MCP-based. The transport-adapter layer is broker-side; MCP is daemon-to-Claude-Code, untouched.
---
## Open questions
1. **How does a service identity prove liveness?** WS gives us implicit liveness via the connection. HTTP webhook services need an explicit heartbeat / health-check. Proposal: broker periodically POSTs to `<webhook>/health`; service is marked offline after 3 consecutive failures.
2. **RPC routing through offline peers: what's the failure mode?** If `claudemesh rpc <peer> ...` and the peer is offline, do we (a) queue and wait (DM semantics) or (b) fail fast (REST semantics)? Proposal: RPC fails fast with `peer_offline` after a 5s probe; explicit `--wait` flag opts into DM-style queue.
3. **Per-topic symmetric key rotation.** Existing v0.2.0 spec mentions per-topic keys. Rotation policy (when, who triggers, how members re-sync) is unsolved. Defer to a separate spec; v2.2.0 ships with one-shot keys (rotate by re-creating topic).
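The liveness proposal in question 1 reduces to a consecutive-failure counter. A minimal sketch, assuming a single successful probe is enough to bring a service back online (the proposal does not say):

```typescript
const MAX_CONSECUTIVE_FAILURES = 3; // from the proposal above

type Liveness = "online" | "offline";

// Hypothetical tracker: one instance per webhook service identity.
class HealthTracker {
  private failures = 0;

  // Record one health-check result and return the resulting state.
  record(ok: boolean): Liveness {
    // Assumption: any success resets the counter.
    this.failures = ok ? 0 : this.failures + 1;
    return this.state;
  }

  get state(): Liveness {
    return this.failures >= MAX_CONSECUTIVE_FAILURES ? "offline" : "online";
  }
}
```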
---
## Acknowledgements
Cross-checked with Codex (GPT-5.2, high reasoning) on the foundational cleanup section. Codex caught:
- The "remove daemon-WS inbound entirely" idea would silently lose broadcasts + member-targeted DMs whenever zero launches exist. Corrected.
- Inheritance for the dup'd lifecycle would become a god class. Composition via helper is the right call.
- The drain race needs a `claimed_at` + delivered-on-success fix; "check OPEN before claim" still drops on crash.
- Token-keyed registry is correct (token = auth boundary), not a smell.
The agentic-comms / typed-channels / transport-adapter layers are mine — Codex didn't touch those because the question I asked was about the existing architecture's smells, not the future roadmap.
| Member private key never leaves the user's machine, but the **attestation** (signed token) can be replayed within its TTL. | TTL bound 24h; refresh on launch; revocation path = drop the parent member's mesh enrollment (nuclear, but works). |
| Cascading WS connections — N launches = N+1 broker WSes per user. | Acceptable up to 10-20 concurrent sessions; if it ever becomes a problem, multiplex per-session at the protocol level (one WS, multiple presence rows). Out of scope for v1. |
| Daemon restart kills all session WSes — `peer list` from inside a launched session sees the remaining 5 peers but not its own siblings until they re-register. | Same as 1.29.0 registry. The registry could persist to sqlite later; for v1, accepted. |
| Broker schema cost: every new presence row has a different `session_pubkey`, growing the table faster. | Already accepted — broker prunes disconnected rows on a 30-day window. Per-session keys triple the row count at peak but stay within the prune budget. |
## Compatibility
- **Older brokers** can't validate `session_hello`. Sessions will
attempt the new hello, get back `unknown_message_type`, and fall
back to the existing member-keyed hello (no per-session presence,
but everything still works as 1.28.0). Add the broker change first,
let it deploy, then ship the CLI side.
- **Older CLIs** continue to work unchanged — they don't open
per-session WSes. They appear as ephemeral cold-path rows just like
today, and lose the symmetric-visibility property between siblings.
- **Backward visible:** users on 1.30.0+ on the same mesh as users on
≤1.29.x will see the older users as one row (their daemon) instead
of one row per session. Acceptable — opt-in to the new visibility
| Daemon outbound routing (Sprint 4: real targets + crypto) | 1.25.0 | ✅ Done — outbox stores `mesh`, `target_spec`, `nonce`, `ciphertext`, `priority`; resolution + `crypto_box` happens at IPC accept time; drain is a forwarder |
| CLI thin-client routing for read verbs | 1.25.0 | ✅ Partial — `peer list`, `skill list/get` route through daemon when present; same `trySendViaDaemon` fallback shape |
| Ambient mode (raw `claude` Just Works) | 1.25.0 | ✅ Documented + functional for the daemon's attached mesh |
## What remains (in dependency order)
### A. Daemon multi-mesh (the prerequisite for "ambient mode for everything")
**Why it's the critical path:** ambient mode today only works for the single mesh the daemon is attached to. Users with N meshes either run N daemons (different sock paths) or restart the daemon to switch. Neither is acceptable for the v2.0.0 promise.
**What it takes:**
- Daemon holds `Map<slug, DaemonBrokerClient>` instead of one broker.
- Outbox row's `mesh` column (1.25.0 added) is the dispatch key.
- IPC `/v1/send` requires `mesh` field (or infers from target prefix `<slug>:<target>`).
- SSE event payloads already include `mesh` slug; no change needed.
- Drain worker selects broker by row's `mesh` column.
- `daemon up` with no `--mesh` attaches to all joined meshes; with `--mesh X` restricts to X (legacy mode for explicit single-mesh).
- Inbox dedupe keeps using `client_message_id` UNIQUE; mesh column for filtering only.
**Estimated effort:** 1 week. ~600 LoC across `run.ts`, `drain.ts`, `ipc/server.ts`, plus tests for per-mesh dispatch.
**Risk:** medium. The single-mesh assumption is baked into a few places (peer-list response shape, skill-list response shape). Need to choose: per-mesh tagged responses (breaking) or array-of-meshes wrapped responses (additive). Recommend the latter for back-compat.
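Per-mesh dispatch can be sketched as a map lookup plus the `<slug>:<target>` prefix rule. The `DaemonBrokerClient` shape below is a stand-in with one assumed method, and `resolveMesh` / `dispatch` are hypothetical names.

```typescript
// Stand-in for the real client; only the method used here is modelled (assumed signature).
interface DaemonBrokerClient {
  send(target: string, payload: string): void;
}

interface SendRequest {
  mesh?: string;
  target: string;
}

// An explicit `mesh` field wins; otherwise fall back to the `<slug>:<target>` prefix.
function resolveMesh(req: SendRequest): { mesh: string; target: string } {
  if (req.mesh) return { mesh: req.mesh, target: req.target };
  const idx = req.target.indexOf(":");
  if (idx <= 0) throw new Error("mesh is required: pass `mesh` or use <slug>:<target>");
  return { mesh: req.target.slice(0, idx), target: req.target.slice(idx + 1) };
}

// The mesh slug is the dispatch key into Map<slug, DaemonBrokerClient>.
function dispatch(brokers: Map<string, DaemonBrokerClient>, req: SendRequest, payload: string): void {
  const { mesh, target } = resolveMesh(req);
  const broker = brokers.get(mesh);
  if (!broker) throw new Error(`daemon is not attached to mesh: ${mesh}`);
  broker.send(target, payload);
}
```

The drain worker would follow the same lookup, keyed on the outbox row's `mesh` column instead of the request.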
### B. HKDF-derived peer keypairs (cross-machine identity)
**Why it matters:** today each install per machine = fresh keypair = different mesh member identity. User signs in on laptop and desktop and shows up as two different members. v2.0.0 promised "same identity across machines."
**What it takes:**
- `HKDF(account_secret, info: "claudemesh/mesh/<mesh_id>/peer", salt: <user_id>)` derives a deterministic ed25519 keypair per mesh.
- `account_secret` derives from the user's authenticated session; needs broker-side endpoint to vend it on first install.
- Enrollment flow changes: instead of generating a fresh keypair, derive it. Subsequent installs find the same pubkey already in `mesh.member` and skip enrollment.
- Migration: existing members keep their old keypairs (they're stored in config). Only new joins use HKDF. Optional: opt-in re-enrollment for users who want cross-machine sync.
**Risk:** high. Crypto change with security implications. Needs design review (account_secret distribution security, HKDF salt choice, key compromise recovery story).
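For the design review, the derivation could look roughly like the sketch below, using Node's built-in `crypto`. SHA-256 as the HKDF hash, the 32-byte seed, and the function name are assumptions; the spec fixes only the `info` and `salt` inputs. This is an illustration, not a reviewed construction.

```typescript
import { createPrivateKey, createPublicKey, hkdfSync } from "node:crypto";

// DER prefix that wraps a raw 32-byte Ed25519 seed as PKCS#8 (RFC 8410).
const ED25519_PKCS8_PREFIX = Buffer.from("302e020100300506032b657004220420", "hex");

// Hypothetical helper: same (accountSecret, meshId, userId) always yields the same keypair.
function derivePeerKeypair(accountSecret: Buffer, meshId: string, userId: string) {
  const info = `claudemesh/mesh/${meshId}/peer`;
  // Assumption: SHA-256 as the HKDF hash; the spec does not name one.
  const seed = Buffer.from(hkdfSync("sha256", accountSecret, userId, info, 32));
  const privateKey = createPrivateKey({
    key: Buffer.concat([ED25519_PKCS8_PREFIX, seed]),
    format: "der",
    type: "pkcs8",
  });
  const spki = createPublicKey(privateKey).export({ type: "spki", format: "der" });
  // The last 32 bytes of an Ed25519 SPKI structure are the raw public key.
  return { privateKey, publicKeyHex: spki.subarray(-32).toString("hex") };
}
```

Because the output is deterministic, a second install derives a pubkey that already exists in `mesh.member`, which is what lets enrollment be skipped. It also means anyone holding `account_secret` can derive every per-mesh key, which is the distribution risk flagged above.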
### C. Mesh → workspace public surface rename
**Why it matters:** "mesh" is internal jargon for what users experience as "a workspace." v2.0.0 calls for the rename to align UX language.
**What it takes:**
- All CLI verbs gain `workspace` aliases (`claudemesh workspace list` ≡ `claudemesh list`).
- Help text, docs, README, marketing site updated.
- DB tables stay `mesh_*` (migration cost prohibitive; not user-visible).
- Wire protocol stays `mesh_*` (broker change too disruptive).
- Eventually deprecate the `mesh` aliases (~2 minor versions later).
**Estimated effort:** 3-4 days. Mostly rote search/replace + new aliases.
**Risk:** low. Cosmetic.
### D. Full CLI-to-thin-client conversion
**Why it matters:** today the CLI has bridge + cold-path code that duplicates ~3000 LoC of broker WS / crypto / decode logic that the daemon also has. Once daemon is multi-mesh, every verb can become "open IPC, send request, render response."
**What it takes:**
- Each verb: replace `withMesh(...)` (which opens its own broker WS) with `daemonOnly(...)` (calls IPC, errors if daemon down).
- Drop `bridge/server.ts`, `bridge/client.ts`, `bridge/socket-broker.ts` entirely.
- Drop most of `services/broker/ws-client.ts` from the CLI build (kept only for daemon's internal use).
- CLI binary shrinks ~30-40%.
- Daemon becomes the only broker WS holder per user.
**Risk:** medium. Breaks workflows where CLI is used without daemon (CI environments, headless scripts). Need to keep a `--no-daemon` escape hatch or document the constraint.
```
1.26.0 (next): A. Daemon multi-mesh — "ambient mode for everything"
1.27.0: D. CLI-to-thin-client conversion — drops ~3000 LoC
1.28.0: C. Mesh → workspace rename (aliases shipped, no removal yet)
2.0.0: B. HKDF identity (separate security-reviewed arc)
```
A → D → C → B is the right order:
- A unblocks ambient mode for multi-mesh users (highest UX value).
- D unblocks the LoC reduction the v2.0.0 promise mentioned ("3000 LoC removed").
- C is cosmetic; do it once D has stabilized.
- B is the most security-sensitive; do it last, with proper review.
## Out of scope for the v2.0.0 endpoint
- **Topic crypto (Sprint 5+).** Topics still ship as base64 plaintext. Real per-topic encryption is a v0.3.0 operator-layer item, parallel track.
- **Broker hardening for daemon idempotency (Sprint 7).** Partial unique index on `(mesh_id, client_message_id) WHERE NOT NULL` and the `mesh.client_message_dedupe` table. Documented in `2026-05-03-daemon-spec-broker-hardening-followups.md`.
- **`launch` deprecation.** 1.25.0 docs now recommend ambient mode for default cases; `launch` stays as the override path. Full deprecation is a 2.x decision.
- **Match the example**: Your implementation should follow the example project's patterns as closely as possible.
## Framework guidelines
- For Next.js 15.3+, initialize PostHog in instrumentation-client.ts for the simplest setup
- For feature flags, use useFeatureFlagEnabled() or useFeatureFlagPayload() hooks - they handle loading states and external sync automatically
- Add analytics capture in event handlers where user actions occur, NOT in useEffect reacting to state changes
- Do NOT use useEffect for data transformation - calculate derived values during render instead
- Do NOT use useEffect to respond to user events - put that logic in the event handler itself
- Do NOT use useEffect to chain state updates - calculate all related updates together in the event handler
- Do NOT use useEffect to notify parent components - call the parent callback alongside setState in the event handler
- To reset component state when a prop changes, pass the prop as the component's key instead of using useEffect
- useEffect is ONLY for synchronizing with external systems (non-React widgets, browser APIs, network subscriptions)
## Identifying users
Identify users during login and signup events. Refer to the example code and documentation for the correct identify pattern for this framework. If both frontend and backend code exist, pass the client-side session and distinct ID using `X-POSTHOG-DISTINCT-ID` and `X-POSTHOG-SESSION-ID` headers to maintain correlation.
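The header hand-off can be sketched as two small helpers, one for each side. The header names come from the paragraph above; the function names and the structural `get` type are assumptions, and the IDs are passed in as plain strings so the sketch does not depend on any SDK call.

```typescript
const DISTINCT_ID_HEADER = "X-POSTHOG-DISTINCT-ID";
const SESSION_ID_HEADER = "X-POSTHOG-SESSION-ID";

// Client side (hypothetical helper): attach the current IDs to an outgoing request.
function buildCorrelationHeaders(distinctId: string, sessionId?: string): Record<string, string> {
  const headers: Record<string, string> = { [DISTINCT_ID_HEADER]: distinctId };
  if (sessionId) headers[SESSION_ID_HEADER] = sessionId;
  return headers;
}

// Server side (hypothetical helper): accepts anything with a Headers-like `get`.
function readCorrelationHeaders(headers: { get(name: string): string | null }) {
  return {
    distinctId: headers.get(DISTINCT_ID_HEADER),
    sessionId: headers.get(SESSION_ID_HEADER),
  };
}
```

On the server, the recovered `distinctId` is what gets passed as the distinct ID of the server-side capture call, so both sides attribute events to the same person.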
## Error tracking
Add PostHog error tracking to relevant files, particularly around critical user flows and API boundaries.
This is a [Next.js](https://nextjs.org) App Router example demonstrating PostHog integration with product analytics, session replay, feature flags, and error tracking.
## Features
- **Product analytics**: Track user events and behaviors
- **Session replay**: Record and replay user sessions
- **Error tracking**: Capture and track errors
- **User authentication**: Demo login system with PostHog user identification
- **Server-side & Client-side tracking**: Examples of both tracking methods
- **Reverse proxy**: PostHog ingestion through Next.js rewrites
  // Include the defaults option as required by PostHog
  defaults: '2026-01-30',
  // Enables capturing unhandled exceptions via Error Tracking
  capture_exceptions: true,
  // Turn on debug in development mode
  debug: process.env.NODE_ENV === "development",
});
// IMPORTANT: Never combine this approach with other client-side PostHog initialization approaches, especially components like a PostHogProvider. instrumentation-client.ts is the correct solution for initializing client-side PostHog in Next.js 15.3+ apps.
description: Start the event tracking setup process by analyzing the project and creating an event tracking plan
---
We're making an event tracking plan for this project.
Before proceeding, find any existing `posthog.capture()` code. Make note of event name formatting.
From the project's file list, select between 10 and 15 files that might have interesting business value for event tracking, especially conversion and churn events. Also look for additional files related to login that could be used for identifying users, along with error handling. Read the files. If a file is already well-covered by PostHog events, replace it with another option. Do not spawn subagents.
Look for opportunities to track client-side events.
**IMPORTANT: Server-side events are REQUIRED** if the project includes any instrumentable server-side code. If the project has API routes (e.g., `app/api/**/route.ts`) or Server Actions, you MUST include server-side events for critical business operations like:
- Payment/checkout completion
- Webhook handlers
- Authentication endpoints
Do not skip server-side events - they capture actions that cannot be tracked client-side.
Create a new file with a JSON array at the root of the project: .posthog-events.json. It should include one object for each event we want to add: event name, event description, and the file path we want to place the event in. If events already exist, don't duplicate them; supplement them.
Track actions only, not pageviews. These can be captured automatically. Exceptions can be made for "viewed"-type events that correspond to the top of a conversion funnel.
As you review files, make an internal note of opportunities to identify users and catch errors. We'll need them for the next step.
## Status
Before beginning a phase of the setup, you will send a status message with the exact prefix '[STATUS]', as in:
description: Implement PostHog event tracking in the identified files, following best practices and the example project
---
For each of the files and events noted in .posthog-events.json, make edits to capture events using PostHog. Make sure to set up any helper files needed. Carefully examine the included example project code: your implementation should match it as closely as possible. Do not spawn subagents.
Use environment variables for PostHog keys. Do not hardcode PostHog keys.
If a file already has existing integration code for other tools or services, don't overwrite or remove that code. Place PostHog code below it.
For each event, add useful properties, and use your access to the PostHog source code to ensure correctness. You also have access to documentation about creating new events with PostHog. Consider this documentation carefully and follow it closely before adding events. Your integration should be based on documented best practices. Carefully consider how the user project's framework version may impact the correct PostHog integration approach.
Remember that you can find the source code for any dependency in the node_modules directory. This may be necessary to properly populate property names. There are also example project code files available via the PostHog MCP; use these for reference.
Where possible, add calls for PostHog's identify() function on the client side upon events like logins and signups. Use the contents of login and signup forms to identify users on submit. If there is server-side code, pass the client-side session and distinct ID to the server-side code to identify the user. On the server side, make sure events have a matching distinct ID where relevant.
It's essential to do this in both client code and server code, so that user behavior from both domains is easy to correlate.
You should also add PostHog exception capture error tracking to these files where relevant.
Remember: Do not alter the fundamental architecture of existing files. Make your additions minimal and targeted.
Remember the documentation and example project resources you were provided at the beginning. Read them now.
## Status
Status to report in this phase:
- Inserting PostHog capture code
- A status message for each file whose edits you are planning, including a high level summary of changes
description: Review and fix any errors in the PostHog integration implementation
---
Check the project for errors. Read the package.json file for any type checking or build scripts that may provide input about what to fix. Remember that you can find the source code for any dependency in the node_modules directory. Do not spawn subagents.
Ensure that any components created were actually used.
Once all other tasks are complete, run any linter or prettier-like scripts found in the package.json, but ONLY on the files you have edited or created during this session. Do not run formatting or linting across the entire project's codebase.
description: Review and fix any errors in the PostHog integration implementation
---
Use the PostHog MCP to create a new dashboard named "Analytics basics" based on the events created here. Make sure to use the exact same event names as implemented in the code. Populate it with up to five insights, with special emphasis on things like conversion funnels, churn events, and other business critical insights.
Search for a file called `.posthog-events.json` and read it for available events. Do not spawn subagents.
Create the file posthog-setup-report.md. It should include a summary of the integration edits, a table with the event names, event descriptions, and files where events were added, along with a list of links for the dashboard and insights created. Follow this format:
<wizard-report>
# PostHog post-wizard report
The wizard has completed a deep integration of your project. [Detailed summary of changes]
[table of events/descriptions/files]
## Next steps
We've built some insights and a dashboard for you to keep an eye on user behavior, based on the events we just instrumented:
[links]
### Agent skill
We've left an agent skill folder in your project. You can use this context for further agent development when using Claude Code. This will help ensure the model provides the most up-to-date approaches for integrating PostHog.
Linking events to specific users enables you to build a full picture of how they're using your product across different sessions, devices, and platforms.
This is straightforward to do when [capturing backend events](/docs/product-analytics/capture-events?tab=Node.js.md), as you associate events to a specific user using a `distinct_id`, which is a required argument.
However, in the frontend of a [web](/docs/libraries/js/features.md#capturing-events) or [mobile app](/docs/libraries/ios.md#capturing-events), a `distinct_id` is not a required argument — PostHog's SDKs will generate an anonymous `distinct_id` for you automatically and you can capture events anonymously, provided you use the appropriate [configuration](/docs/libraries/js/features.md#capturing-anonymous-events).
To link events to specific users, call `identify`:
### Web
```javascript
posthog.identify(
  'distinct_id', // Replace 'distinct_id' with your user's unique identifier
  { email: 'max@hedgehogmail.com', name: 'Max Hedgehog' } // optional: set additional person properties
);
```
### Android
```kotlin
PostHog.identify(
distinctId=distinctID,// Replace 'distinctID' with your user's unique identifier
posthog.identify('distinct_id', { // Replace "distinct_id" with your user's unique identifier
  email: 'max@hedgehogmail.com', // optional: set additional person properties
  name: 'Max Hedgehog'
})
```
### Dart
```dart
await Posthog().identify(
  userId: 'distinct_id', // Replace "distinct_id" with your user's unique identifier
  userProperties: {
    "email": "max@hedgehogmail.com", // optional: set additional person properties
    "name": "Max Hedgehog"
  });
```
Events captured after calling `identify` are identified events and this creates a person profile if one doesn't exist already.
Because identified events cost more to process, anonymous events can be up to 4x cheaper than identified events, so it's recommended you only capture identified events when needed.
## How identify works
When a user starts browsing your website or app, PostHog automatically assigns them an **anonymous ID**, which is stored locally.
Provided you've [configured persistence](/docs/libraries/js/persistence.md) to use cookies or `localStorage`, this enables us to track anonymous users – even across different sessions.
By calling `identify` with a `distinct_id` of your choice (usually the user's ID in your database, or their email), you link the anonymous ID and distinct ID together.
Thus, all past and future events made with that anonymous ID are now associated with the distinct ID.
This enables you to do things like associate events with a user from before they log in for the first time, or associate their events across different devices or platforms.
Using identify in the backend
Although you can call `identify` using our backend SDKs, it is used most in frontends. This is because there is no concept of anonymous sessions in the backend SDKs, so calling `identify` only updates person profiles.
## Best practices when using `identify`
### 1\. Call `identify` as soon as you're able to
In your frontend, you should call `identify` as soon as you're able to.
Typically, this is every time your **app loads** for the first time, and directly after your **users log in**.
This ensures that events sent during your users' sessions are correctly associated with them.
You only need to call `identify` once per session, and you should avoid calling it multiple times unnecessarily.
If you call `identify` multiple times with the same data without reloading the page in between, PostHog will ignore the subsequent calls.
### 2\. Use unique strings for distinct IDs
If two users have the same distinct ID, their data is merged and they are considered one user in PostHog. Two common ways this can happen are:
- Your logic for generating IDs does not generate sufficiently strong IDs and you can end up with a clash where 2 users have the same ID.
- There's a bug, typo, or mistake in your code leading to most or all users being identified with generic IDs like `null`, `true`, or `distinctId`.
PostHog also has built-in protections to stop the most common distinct ID mistakes.
### 3\. Reset after logout
If a user logs out on your frontend, you should call `reset()` to unlink any future events made on that device with that user.
This is important if your users are sharing a computer, as otherwise all of those users are grouped together into a single user due to shared cookies between sessions.
**We strongly recommend you call `reset` on logout even if you don't expect users to share a computer.**
You can do that like so:
### Web
```javascript
posthog.reset()
```
### iOS
```swift
PostHogSDK.shared.reset()
```
### Android
```kotlin
PostHog.reset()
```
### React Native
```jsx
posthog.reset()
```
### Dart
```dart
Posthog().reset()
```
If you *also* want to reset the `device_id` so that the device will be considered a new device in future events, you can pass `true` as an argument:
Web
```javascript
posthog.reset(true)
```
### 4\. Person profiles and properties
You'll notice that one of the parameters in the `identify` method is a `properties` object.
This enables you to set [person properties](/docs/product-analytics/person-properties.md).
Whenever possible, we recommend passing in all person properties you have available each time you call identify, as this ensures their person profile on PostHog is up to date.
Person properties can also be set by adding a `$set` property to an event `capture` call.
See our [person properties docs](/docs/product-analytics/person-properties.md) for more details on how to work with them and best practices.
### 5\. Use deep links between platforms
We recommend you call `identify` [as soon as you're able](#1-call-identify-as-soon-as-youre-able), typically when a user signs up or logs in.
This doesn't work if one or both platforms are unauthenticated. Some examples of such cases are:
- Onboarding and signup flows before authentication.
- Unauthenticated web pages redirecting to authenticated mobile apps.
- Authenticated web apps prompting an app download.
In these cases, you can use a [deep link](https://developer.android.com/training/app-links/deep-linking) on Android and [universal links](https://developer.apple.com/documentation/xcode/supporting-universal-links-in-your-app) on iOS to identify users.
1. Use `posthog.get_distinct_id()` to get the current distinct ID. Even if you cannot call identify because the user is unauthenticated, this will return an anonymous distinct ID generated by PostHog.
2. Add the distinct ID to the deep link as query parameters, along with other properties like UTM parameters.
3. When the user is redirected to the app, parse the deep link and handle the following cases:
- The user is already authenticated on the mobile app. In this case, call [`posthog.alias()`](/docs/libraries/js/features.md#alias) with the distinct ID from the web. This associates the two distinct IDs as a single person.
- The user is unauthenticated. In this case, call [`posthog.identify()`](/docs/libraries/js/features.md#identifying-users) with the distinct ID from the web. Events will be associated with this distinct ID.
As long as you associate the distinct IDs with `posthog.identify()` or `posthog.alias()`, you can track events generated across platforms.
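Steps 2 and 3 can be sketched with the standard `URL` API. The query parameter name `ph_distinct_id` and both function names are hypothetical; pick whatever name your app's link handler expects.

```typescript
const DISTINCT_ID_PARAM = "ph_distinct_id"; // hypothetical parameter name

// Web side: append the current distinct ID (and any UTM parameters) to the deep link.
function buildDeepLink(base: string, distinctId: string, extra: Record<string, string> = {}): string {
  const url = new URL(base);
  url.searchParams.set(DISTINCT_ID_PARAM, distinctId);
  for (const [key, value] of Object.entries(extra)) url.searchParams.set(key, value);
  return url.toString();
}

// App side: recover the distinct ID, or null if the link did not carry one.
function parseDeepLink(link: string): string | null {
  return new URL(link).searchParams.get(DISTINCT_ID_PARAM);
}
```

The value returned by `parseDeepLink` is what the app then passes to `posthog.alias()` or `posthog.identify()`, depending on whether the user is already authenticated.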
PostHog makes it easy to get data about traffic and usage of your [Next.js](https://nextjs.org/) app. Integrating PostHog into your site enables analytics about user behavior, custom events capture, session recordings, feature flags, and more.
This guide walks you through integrating PostHog into your Next.js app using the [React](/docs/libraries/react.md) and the [Node.js](/docs/libraries/node.md) SDKs.
> You can see a working example of this integration in our [Next.js demo app](https://github.com/PostHog/posthog-js/tree/main/playground/nextjs).
Next.js has both client and server-side rendering, as well as pages and app routers. We'll cover all of these options in this guide.
> **Try `@posthog/next` (pre-release):** A simplified Next.js integration with synchronized client/server identity, server-side flag bootstrapping, and a built-in API proxy. [Read the setup guide →](/docs/libraries/next-js/posthog-next.md)
## Prerequisites
To follow along with this guide, you need:
1. A PostHog instance (either [Cloud](https://app.posthog.com/signup) or [self-hosted](/docs/self-host.md))
2. A Next.js application
## Beta: integration via LLM
Install PostHog for Next.js in seconds with our wizard by running this prompt with [LLM coding agents](/blog/envoy-wizard-llm-agent.md) like Cursor and Bolt, or by running it in your terminal.
`npx @posthog/wizard@latest`
[Learn more](/wizard.md)
Or, to integrate manually, continue with the rest of this guide.
## Client-side setup
Install `posthog-js` using your package manager:
### npm
```bash
npm install --save posthog-js
```
### Yarn
```bash
yarn add posthog-js
```
### pnpm
```bash
pnpm add posthog-js
```
### Bun
```bash
bun add posthog-js
```
Add your environment variables to your `.env.local` file and to your hosting provider (e.g. Vercel, Netlify, AWS). You can find your project token in your [project settings](https://app.posthog.com/project/settings).
.env.local
```shell
NEXT_PUBLIC_POSTHOG_TOKEN=<ph_project_token>
NEXT_PUBLIC_POSTHOG_HOST=https://us.i.posthog.com
```
These values need to start with `NEXT_PUBLIC_` to be accessible on the client-side.
## Integration
Next.js provides the [`instrumentation-client.ts|js`](https://nextjs.org/docs/app/api-reference/file-conventions/instrumentation-client) file for client-side setup. Add it to the root of your Next.js app (for both app and pages router) and initialize PostHog in it like this:
When using `instrumentation-client`, the values you pass to `posthog.init` remain fixed for the entire session. This means bootstrapping only works if you evaluate flags **before your app renders** (for example, on the server).
If you need flag values after the app has rendered, you’ll want to:
- Evaluate the flag on the server and pass the value into your app, or
- Evaluate the flag in an earlier page/state, then store and re-use it when needed.
Both approaches avoid flicker and give you the same outcome as bootstrapping, as long as you use the same `distinct_id` across client and server.
See the [bootstrapping guide](/docs/feature-flags/bootstrapping.md) for more information.
## Identifying users
> **Identifying users is required.** Call `posthog.identify('your-user-id')` after login to link events to a known user. This is what connects frontend event captures, [session replays](/docs/session-replay.md), [LLM traces](/docs/ai-engineering.md), and [error tracking](/docs/error-tracking.md) to the same person — and lets backend events link back too.
>
> See our guide on [identifying users](/docs/getting-started/identify-users.md) for how to set this up.
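A small sketch of where these calls might sit in an app. `onLogin`, `onLogout`, and the shape of `user` are assumptions for illustration; `client` stands in for the posthog-js singleton, whose `identify` and `reset` methods are the only SDK calls used.

```javascript
// Hypothetical handlers; `client` stands in for the posthog-js singleton.
function onLogin(client, user) {
  // identify(distinctId, properties) links earlier anonymous events to this user.
  client.identify(user.id, { email: user.email })
}

function onLogout(client) {
  // reset() clears the stored identity so the next visitor starts anonymous.
  client.reset()
}
```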
### Set up a reverse proxy (recommended)
We recommend [setting up a reverse proxy](/docs/advanced/proxy.md), so that events are less likely to be intercepted by tracking blockers.
We have our [own managed reverse proxy service](/docs/advanced/proxy/managed-reverse-proxy.md), which is free for all PostHog Cloud users, routes through our infrastructure, and makes setting up your proxy easy.
If you don't want to use our managed service then there are several other options for creating a reverse proxy, including using [Cloudflare](/docs/advanced/proxy/cloudflare.md), [AWS Cloudfront](/docs/advanced/proxy/cloudfront.md), and [Vercel](/docs/advanced/proxy/vercel.md).
### Grouping products in one project (recommended)
If you have multiple customer-facing products (e.g. a marketing website + mobile app + web app), it's best to install PostHog on them all and [group them in one project](/docs/settings/projects.md).
This makes it possible to track users across their entire journey (e.g. from visiting your marketing website to signing up for your product), or how they use your product across multiple platforms.
### Add IPs to Firewall/WAF allowlists (recommended)
For certain features like [heatmaps](/docs/toolbar/heatmaps.md), your Web Application Firewall (WAF) may be blocking PostHog’s requests to your site. Add these IP addresses to your WAF allowlist or rules to let PostHog access your site.
The [React feature flag hooks](/docs/libraries/react.md#feature-flags) work automatically when PostHog is initialized via `instrumentation-client.ts`. The hooks use the initialized posthog-js singleton.
See the [React SDK docs](/docs/libraries/react.md) for examples of how to use:
- [`posthog-js` functions like custom event capture, user identification, and more.](/docs/libraries/react.md#using-posthog-js-functions)
- [Feature flags including variants and payloads.](/docs/libraries/react.md#feature-flags)
You can also read [the full `posthog-js` documentation](/docs/libraries/js/features.md) for all the usable functions.
## Server-side analytics
Next.js enables you to both server-side render pages and add server-side functionality. To integrate PostHog into your Next.js app on the server-side, you can use the [Node SDK](/docs/libraries/node.md).
First, install the `posthog-node` library:
### npm
```bash
npm install posthog-node --save
```
### Yarn
```bash
yarn add posthog-node
```
### pnpm
```bash
pnpm add posthog-node
```
### Bun
```bash
bun add posthog-node
```
### Router-specific instructions
## App router
For the app router, we can initialize the `posthog-node` SDK once with a `PostHogClient` function, and import it into files.
This enables us to send events and fetch data from PostHog on the server – without making client-side requests.
> **Note:** Because server-side functions in Next.js can be short-lived, we set `flushAt` to `1` and `flushInterval` to `0`.
>
> - `flushAt` sets how many capture calls are queued before the queue is flushed in one batch.
> - `flushInterval` sets how many milliseconds to wait before flushing the queue.
>
> Setting both to their lowest values ensures events are sent immediately and not batched. We also need to call `await posthog.shutdown()` once done.
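The client module described above could look roughly like this. `createPostHogClient` is a hypothetical factory, and the `PostHog` class is passed in only so the snippet runs standalone; a real `posthog.js` would import `PostHog` from `posthog-node` and export a zero-argument `PostHogClient` function that wraps it.

```javascript
// posthog.js (sketch)
// In a real app: import { PostHog } from 'posthog-node'

// Hypothetical factory; the SDK class is injected so the snippet runs standalone.
function createPostHogClient(PostHog, token, host) {
  return new PostHog(token, {
    host, // e.g. https://us.i.posthog.com
    flushAt: 1, // flush after every captured event
    flushInterval: 0, // do not wait before flushing
  })
}

// Real usage (sketch):
// export default function PostHogClient() {
//   return createPostHogClient(
//     PostHog,
//     process.env.NEXT_PUBLIC_POSTHOG_TOKEN,
//     process.env.NEXT_PUBLIC_POSTHOG_HOST
//   )
// }
```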
To use this client, we import it into our pages and call it with the `PostHogClient` function:
```javascript
import Link from 'next/link'
import PostHogClient from '../posthog'

export default async function About() {
  const posthog = PostHogClient()
  const flags = await posthog.getAllFlags(
    'user_distinct_id' // replace with a user's distinct ID
  )
  await posthog.shutdown() // flush queued events before the function returns
  // Placeholder markup; use `flags` to decide what to render.
  return <Link href="/">Go home</Link>
}
```
## Pages router
For the pages router, we can use the `getServerSideProps` function to access PostHog on the server-side, send events, evaluate feature flags, and more.
> **Note**: Make sure to *always* call `await client.shutdown()` after sending events from the server-side. PostHog queues events into larger batches, and this call forces all batched events to be flushed immediately.
### Server-side configuration
Next.js overrides the default `fetch` behavior on the server to introduce its own cache. PostHog ignores that cache by default, as this is Next.js's default behavior for any fetch call.
You can override that configuration when initializing PostHog, but make sure you understand the pros and cons of using Next.js's cache: you might get cached results rather than the actual result our server would return. This is important for feature flags, for example.
```javascript
// Opening lines reconstructed; `fetch_options` and `cache` are assumed from the posthog-js config.
posthog.init(process.env.NEXT_PUBLIC_POSTHOG_TOKEN, {
  fetch_options: {
    cache: 'force-cache', // Use the Next.js cache
    next_options: { // Passed to the `next` option for `fetch`
      revalidate: 60, // Cache for 60 seconds
      tags: ['posthog'], // Can be used with Next.js `revalidateTag` function
    },
  }
})
```
## Configuring a reverse proxy to PostHog
To improve the reliability of client-side tracking and make requests less likely to be intercepted by tracking blockers, you can set up a reverse proxy in Next.js. Read more about deploying a reverse proxy using [Next.js rewrites](/docs/advanced/proxy/nextjs.md), [Next.js middleware](/docs/advanced/proxy/nextjs-middleware.md), and [Vercel rewrites](/docs/advanced/proxy/vercel.md).
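For orientation, a rewrites-based proxy might be sketched as follows. The `/ingest` prefix and the `posthogRewrites` helper are assumptions for illustration, and the destinations assume the US cloud hosts; the linked guides are the authoritative source for the exact rules.

```javascript
// next.config.js (sketch)

// Hypothetical helper returning rewrite rules for the US cloud hosts.
function posthogRewrites(prefix = '/ingest') {
  return [
    // Static assets are served from a separate host; list this rule first
    // so the more general rule below does not capture it.
    {
      source: `${prefix}/static/:path*`,
      destination: 'https://us-assets.i.posthog.com/static/:path*',
    },
    {
      source: `${prefix}/:path*`,
      destination: 'https://us.i.posthog.com/:path*',
    },
  ]
}

const nextConfig = {
  rewrites: async () => posthogRewrites(),
  // PostHog API paths use trailing slashes, so keep Next.js from redirecting them.
  skipTrailingSlashRedirect: true,
}

// In a real config file you would export `nextConfig`.
```

With rules like these in place, the client's `api_host` would point at the prefix (here `/ingest`) instead of the PostHog host.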
## Further reading
- [How to set up Next.js analytics, feature flags, and more](/tutorials/nextjs-analytics.md)
- [How to set up Next.js pages router analytics, feature flags, and more](/tutorials/nextjs-pages-analytics.md)
- [How to set up Next.js A/B tests](/tutorials/nextjs-ab-tests.md)
# [OPTIONAL] Signup credits (default: 100 in production)
FREE_TIER_CREDITS=100
Some files were not shown because too many files have changed in this diff