Seven-ship sequence that took the daemon from "works for one session"
to "internally consistent for N sessions on one daemon." Architecture
invariant after 1.34.13: every shared store / channel scopes by
recipient (SSE demux at bind layer + token forwarding, inbox per-
recipient columns, outbox sender-session routing).
- 1.34.7 inbox flush + delete commands
- 1.34.8 seen_at column + TTL prune + first echo guard
- 1.34.9 broader echo guard + system-event polish + staleness warning
- 1.34.10 per-session SSE demux (SseFilterOptions) + universal daemon
(--mesh / --name deprecated) + daemon_started version stamp
- 1.34.11 inbox per-recipient column (storage half of 1.34.10)
- 1.34.12 daemon up detaches by default (logs to ~/.claudemesh/daemon/
daemon.log; service units explicitly pass --foreground)
- 1.34.13 MCP forwards session token on /v1/events — the actual fix
that activates 1.34.10's demux. Without this header the
daemon's session resolved null, filter was empty, every MCP
received the unfiltered global stream.
Roadmap entry at docs/roadmap.md captures the timeline + the four
known gaps tracked for follow-ups (launch env-var leak, broker
listPeers mesh-filter, kick on control-plane no-op, session caps as
first-class concept).
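
A minimal sketch of the per-recipient demux rule, for illustration only. Only the name SseFilterOptions comes from this log; the event shape, the field names, and shouldDeliver are hypothetical. It shows why a null session (no forwarded token) degenerates into the unfiltered global stream.

```typescript
// Hypothetical event shape: recipientSession === null means "for every session".
interface MeshEvent {
  recipientSession: string | null;
  kind: string;
}

// Hypothetical fields; only the type name appears in the commit text.
interface SseFilterOptions {
  sessionId: string | null; // resolved from the forwarded session token
}

// Return true when the event belongs on this subscriber's stream.
function shouldDeliver(ev: MeshEvent, filter: SseFilterOptions): boolean {
  // No resolved session means an empty filter: the subscriber sees everything.
  // This models the failure mode before the token header was forwarded.
  if (filter.sessionId === null) return true;
  return ev.recipientSession === null || ev.recipientSession === filter.sessionId;
}

const sampleEvents: MeshEvent[] = [
  { recipientSession: "a", kind: "message" },
  { recipientSession: "b", kind: "message" },
  { recipientSession: null, kind: "system" },
];

const forSessionA = sampleEvents.filter((e) => shouldDeliver(e, { sessionId: "a" }));
const withoutToken = sampleEvents.filter((e) => shouldDeliver(e, { sessionId: null }));
```

With a resolved session, only that session's events plus the global ones pass; without one, all three events pass.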
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five info-level log points across the WS lifecycle helper:
ws_open_attempt / ws_open_ok / ws_hello_sent / ws_hello_acked /
ws_closed (with status + close code/reason).
Surfaced during M1 smoke testing — without these the only visible
signal was "presence row missing on broker," which made it hard to
distinguish "WS never opened" / "opened but hello rejected" /
"acked then closed by broker."
Both clients prefix the helper-emitted msg ("session_broker_*",
"broker_*") so log greps stay clean per role.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves the merge of m1-broker-drain-race-and-presence-role and
m1-cli-lifecycle-and-role-peer-list into main:
* Rename wire-level role classification field `role` → `peerRole`
to avoid collision with 1.31.5's top-level `role` lift of
`profile.role` (user-supplied string consumed by the agent-vibes
claudemesh skill). `peerRole` is the broker presence taxonomy
(control-plane/session/service); top-level `role` keeps its 1.31.5
semantics.
- apps/broker/src/broker.ts (listPeersInMesh return)
- apps/broker/src/index.ts (peers_list response)
- apps/broker/src/types.ts (WSPeersListMessage)
- apps/cli/src/commands/peers.ts (PeerRecord + filter + lift)
* Wire CLI client_ack emission: handleBrokerPush gains
ackClientMessage callback; daemon-WS and session-WS each got a
sendClientAck() method that frames {type:"client_ack",
clientMessageId, brokerMessageId?} and forwards via the lifecycle
helper. Run.ts wires the callback into both onPush paths.
Receiver dedupes against existing inbox row first then acks
unconditionally — broker needs the ack regardless of dedupe to
release its claim lease.
- apps/cli/src/daemon/inbound.ts (ackClientMessage in InboundContext)
- apps/cli/src/daemon/broker.ts + session-broker.ts (sendClientAck)
- apps/cli/src/daemon/run.ts (wire-up)
* Version bump 1.32.1 → 1.33.0; CHANGELOG entry replaces "Unreleased"
with full m1 description.
Verification: tsc clean across cli + broker; CLI 83/83 unit tests
pass; broker 50 unit tests pass (5 integration test files require a
live Postgres and were skipped — pre-existing infra gap, not a
regression). CLI bundle rebuilt; version 1.33.0 baked.
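
A sketch of the dedupe-then-ack ordering described above, under stated assumptions: the frame fields match the client_ack shape in this message, but PushFrame, handlePushSketch, and the in-memory inbox object are hypothetical stand-ins for handleBrokerPush and inbox.db.

```typescript
// Hypothetical inbound frame; only the ack fields are taken from the commit text.
interface PushFrame {
  clientMessageId: string;
  brokerMessageId?: string;
  body: string;
}

interface ClientAck {
  type: "client_ack";
  clientMessageId: string;
  brokerMessageId?: string;
}

// Returns true when the frame produced a new inbox row.
function handlePushSketch(
  frame: PushFrame,
  inbox: Record<string, string>, // clientMessageId -> body, stands in for inbox.db
  ackClientMessage: (ack: ClientAck) => void,
): boolean {
  const isNew = !Object.prototype.hasOwnProperty.call(inbox, frame.clientMessageId);
  if (isNew) inbox[frame.clientMessageId] = frame.body;

  // Ack even when the row was a duplicate: the broker needs the ack
  // to release its claim lease regardless of local dedupe.
  const ack: ClientAck = { type: "client_ack", clientMessageId: frame.clientMessageId };
  if (frame.brokerMessageId !== undefined) ack.brokerMessageId = frame.brokerMessageId;
  ackClientMessage(ack);
  return isNew;
}

const inboxRows: Record<string, string> = {};
const sentAcks: ClientAck[] = [];
const firstDelivery = handlePushSketch({ clientMessageId: "m1", body: "hi" }, inboxRows, (a) => sentAcks.push(a));
const redelivery = handlePushSketch({ clientMessageId: "m1", body: "hi" }, inboxRows, (a) => sentAcks.push(a));
```

The redelivered frame adds no row but still produces a second ack.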
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundational cleanups before agentic-comms architecture work
(.artifacts/specs/2026-05-04-agentic-comms-architecture-v2.md).
All behavior-preserving.
1. Extract `connectWsWithBackoff` into apps/cli/src/daemon/ws-lifecycle.ts.
Both DaemonBrokerClient and SessionBrokerClient now share one
lifecycle implementation (connect, hello-handshake, ack-timeout,
close + backoff reconnect). Each client provides its own buildHello
/ isHelloAck / onMessage hooks and keeps its own RPC bookkeeping
(pendingAcks, peerListResolvers, onPush). Composition over
inheritance per Codex's review; no protocol shape changes.
2. Drop daemon-WS ephemeral session pubkey. DaemonBrokerClient no
longer mints + sends a per-reconnect ephemeral keypair in its
hello. Session-targeted DMs land on SessionBrokerClient since
1.32.1, not the member-keyed daemon-WS, so the field was
vestigial. Send-encrypt path now signs DMs with the stable mesh
member secret. handleBrokerPush invocations from daemon-WS only
pass the member secret — session decryption is the session-WS's
job.
3. Role-aware peer list. `peer list` now hides peers whose
broker-emitted `role` is `'control-plane'`. `--all` opts back in.
JSON output emits `role` at top level. Older brokers that don't
emit role yet default to 'session', so legacy peer rows stay
visible without the broker-side change shipped first. Replaces
the prior `peerType === 'claudemesh-daemon'` channel-name hack.
Typecheck + tests + build all green.
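
Item 3 can be sketched as a small filter. This is illustrative only: the PeerRecord shape and visiblePeers are hypothetical, and the role values are the presence taxonomy named elsewhere in this log (control-plane/session/service).

```typescript
type PeerRole = "control-plane" | "session" | "service";

// Hypothetical record; older brokers omit role entirely.
interface PeerRecord {
  displayName: string;
  role?: PeerRole;
}

// Default a missing role to "session", then hide control-plane peers unless --all.
function visiblePeers(peers: PeerRecord[], showAll: boolean): { displayName: string; role: PeerRole }[] {
  const withRole = peers.map((p) => {
    const role: PeerRole = p.role ?? "session";
    return { displayName: p.displayName, role };
  });
  return withRole.filter((p) => showAll || p.role !== "control-plane");
}

const samplePeers: PeerRecord[] = [
  { displayName: "daemon", role: "control-plane" },
  { displayName: "legacy-peer" }, // emitted by a broker that predates role
  { displayName: "editor", role: "session" },
];

const defaultView = visiblePeers(samplePeers, false);
const allView = visiblePeers(samplePeers, true);
```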
SessionBrokerClient (daemon-side, since 1.30.0) was constructed
without a push handler and silently dropped every inbound `push` /
`inbound` frame. Header docstring claimed it handled "inbound DM
delivery for messages targeted at the session pubkey" but the
callback was never wired.
Net effect: any DM sent to a peer's session pubkey (everything
`peer list` returns now) was queued, broker-acked, marked
delivered_at on the broker, and thrown away by the recipient
daemon. inbox.db stayed at zero rows; `claudemesh inbox` reported
"no messages" no matter what arrived.
Two-session smoke surfaced this — sender outbox status=done with
broker_message_id, recipient inbox empty.
Fix: wire SessionBrokerClient to forward push/inbound frames to
the same handleBrokerPush the member-keyed broker already uses.
Pass the per-session secret key as sessionSecretKeyHex so
decryptOrFallback tries it first; member key remains the fallback
for legacy member-targeted traffic.
Verified end-to-end with two registered sessions sending in both
directions — inbox.db row count went 0 → 2.
Files: apps/cli/src/daemon/session-broker.ts,
apps/cli/src/daemon/run.ts. No broker change required.
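
A sketch of the key-ordering idea behind the fix, not the actual decryptOrFallback. The Decrypt callback, its null-on-failure contract, and the toy cipher are assumptions made so the example runs without a crypto library.

```typescript
// Hypothetical primitive: returns plaintext, or null when the key does not open the message.
type Decrypt = (ciphertext: string, secretKeyHex: string) => string | null;

function decryptWithFallback(
  ciphertext: string,
  memberSecretKeyHex: string,
  decrypt: Decrypt,
  sessionSecretKeyHex?: string,
): string | null {
  // Session key first: DMs addressed to a session pubkey only open with it.
  if (sessionSecretKeyHex !== undefined) {
    const viaSession = decrypt(ciphertext, sessionSecretKeyHex);
    if (viaSession !== null) return viaSession;
  }
  // Member key stays as the fallback for legacy member-targeted traffic.
  return decrypt(ciphertext, memberSecretKeyHex);
}

// Toy stand-in for real decryption: a "ciphertext" is "<key>:<plaintext>".
const toyDecrypt: Decrypt = (ct, key) =>
  ct.indexOf(key + ":") === 0 ? ct.slice(key.length + 1) : null;

const sessionDm = decryptWithFallback("sess:hello", "mem", toyDecrypt, "sess");
const legacyDm = decryptWithFallback("mem:legacy", "mem", toyDecrypt, "sess");
const memberOnly = decryptWithFallback("sess:hello", "mem", toyDecrypt);
```

The last call models a client that holds only the member key: a session-targeted message stays unreadable.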
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real production bug observed in 1.31.0 / 1.31.1: every CLI verb from
inside a claudemesh launch-spawned session printed
[claudemesh] warn service-managed daemon not responding within 8000ms
even when the launchd-managed daemon was healthy and answering
direct UDS probes in 10ms.
Root cause: claudemesh launch exports CLAUDEMESH_CONFIG_DIR to a
per-session tmpdir so joined-mesh state and the IPC session token
stay isolated. DAEMON_PATHS read from the same env, so inside a
launched session the CLI looked for daemon.sock at
/var/folders/.../claudemesh-XXXX/daemon/daemon.sock — which never
exists. The CLI declared the daemon down, fell into the service-
managed wait branch, and timed out.
The daemon is a per-machine singleton serving every session; its
files live at ~/.claudemesh/daemon/ regardless of overlays. Pin
DAEMON_PATHS.DAEMON_DIR to that location. A new CLAUDEMESH_DAEMON_DIR
override is available for tests and multi-daemon dev setups.
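
A sketch of the resolution order described above. The env var names and the ~/.claudemesh/daemon/ location come from this message; resolveDaemonDir, resolveSocketPath, and the example paths are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// The daemon dir ignores CLAUDEMESH_CONFIG_DIR on purpose: the daemon is a
// per-machine singleton, while the config dir is a per-session overlay.
function resolveDaemonDir(env: Env, homeDir: string): string {
  const override = env["CLAUDEMESH_DAEMON_DIR"];
  if (override !== undefined && override !== "") return override;
  return homeDir + "/.claudemesh/daemon";
}

function resolveSocketPath(env: Env, homeDir: string): string {
  return resolveDaemonDir(env, homeDir) + "/daemon.sock";
}

// Inside a launched session the config dir points at a tmpdir (example path is made up).
const launchedEnv: Env = { CLAUDEMESH_CONFIG_DIR: "/tmp/claudemesh-abcd" };
const pinnedSocket = resolveSocketPath(launchedEnv, "/home/u");

// Tests and multi-daemon dev setups opt out explicitly.
const overriddenSocket = resolveSocketPath({ CLAUDEMESH_DAEMON_DIR: "/tmp/dev-daemon" }, "/home/u");
```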
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1.31.0 introduced a session reaper that called execFileSync(ps) once
per registered session every 5s. With many sessions registered, the
daemon's event loop stalled for hundreds of ms — long enough that
incoming /v1/version probes from the CLI timed out against a healthy
daemon and the new service-managed warning fired.
Fix:
- getProcessStartTime is now async (execFile + promisify); never
blocks the event loop
- New getProcessStartTimes(pids) issues one batched ps for all
survivors instead of N separate forks. Sweep cost is fixed
regardless of session count.
- registerSession stays sync; start-time capture is fire-and-forget
- reapDead is now async; the setInterval wrapper voids it so a
rejected sweep cannot crash the daemon
Behavior is otherwise unchanged from 1.31.0: same 5s cadence, same
PID-reuse guard semantics, same broker-WS teardown via the registry
hook. 83/83 tests still green.
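
A sketch of the parsing half of a batched lookup, assuming the batched ps call prints one "<pid> <start-time text>" line per surviving process. That line format and the parseStartTimes name are assumptions; the real getProcessStartTimes may differ.

```typescript
// Parse batched ps output into pid -> opaque start-time string.
// Pids missing from the output are simply absent from the result.
function parseStartTimes(psOutput: string): Record<number, string> {
  const result: Record<number, string> = {};
  for (const line of psOutput.split("\n")) {
    // Leading pid, then the rest of the line as the start-time text.
    const match = /^\s*(\d+)\s+(.+?)\s*$/.exec(line);
    if (match === null) continue; // blank or malformed line
    result[Number(match[1])] = match[2];
  }
  return result;
}

// Sample text in the assumed format; the values are made up.
const samplePsOutput =
  "  101 Mon May  4 10:12:33 2026\n" +
  "  202 Mon May  4 11:00:01 2026\n";

const startTimes = parseStartTimes(samplePsOutput);
```

The start-time string is treated as opaque: the reaper only compares it for equality, so no date parsing is needed.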
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three operability fixes for users running the daemon under launchd or
systemd.
PID-watcher autoclean
=====================
The session reaper already dropped registry entries with dead pids on
a 30s loop, but had two real-world gaps:
- 30s sweep let stale presence linger on the broker for half a minute
- bare process.kill(pid, 0) trusts a recycled pid; a registry entry
could survive its real owner's death whenever the OS rolled the
pid number forward to a new program
Process-exit IPC from claude-code is best-effort and skipped on
SIGKILL / OOM / segfault / panic, so it cannot replace the sweep.
Fix:
- New process-info.ts captures opaque per-process start-times via
ps -o lstart= (works on macOS and Linux, ~1 ms per call)
- registerSession stores the start-time alongside the pid
- reapDead drops entries when pid is dead OR start-time changed
since register
- Sweep cadence 30s -> 5s
- Best-effort fallback to bare liveness when start-time capture
fails at register time
Registry hooks already close the per-session broker WS on
deregister, so peer list rebuilds within one sweep of any session
exit.
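
The reap rule above, as a small decision function. This is a sketch: SessionEntry and shouldReap are hypothetical names, and treating a failed current-time lookup the same as a failed capture at register time is an assumption.

```typescript
interface SessionEntry {
  pid: number;
  startTime: string | null; // null when capture failed at register time
}

// Reap when the pid is dead, or when it is alive but belongs to a different process.
function shouldReap(entry: SessionEntry, isAlive: boolean, currentStartTime: string | null): boolean {
  if (!isAlive) return true;
  // Best-effort fallback: without both start-times, trust bare liveness.
  if (entry.startTime === null || currentStartTime === null) return false;
  // Same pid, different start-time: the OS recycled the pid for a new program.
  return entry.startTime !== currentStartTime;
}

const registered: SessionEntry = { pid: 4242, startTime: "Mon May  4 10:12:33 2026" };

const deadPid = shouldReap(registered, false, null);
const sameProcess = shouldReap(registered, true, "Mon May  4 10:12:33 2026");
const recycledPid = shouldReap(registered, true, "Mon May  4 12:00:00 2026");
const noStartTime = shouldReap({ pid: 4242, startTime: null }, true, "Mon May  4 12:00:00 2026");
```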
Service-managed daemon: no more "spawn failed" false alarms
===========================================================
After claudemesh install (which writes a launchd plist or systemd
unit with KeepAlive=true), users routinely saw
[claudemesh] warn daemon spawn failed: socket did not appear
within 3000ms
even when the daemon was running fine. Two contributing causes:
1. Probe timeout was 800ms — the first IPC after a launchd-driven
restart can take longer (SQLite migration + broker WS opens) and
tripped it. Bumped to 2500ms.
2. On a failed probe the CLI tried its own detached spawn, which
collided with launchd's KeepAlive restart cycle (singleton lock
fails, child exits) and we'd then time out polling for a socket
that was actually about to come up.
Now: when the launchd plist or systemd unit exists, the CLI does not
attempt a spawn. It waits up to 8s for the OS-managed unit to bring
the socket up. New service-not-ready state distinguishes "OS hasn't
restarted it yet" from "we tried to spawn and it failed".
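
A sketch of the new branch, assuming the decision can be reduced to two inputs. planDaemonStartup and the DaemonAction shape are hypothetical; the 8000ms and 3000ms values are the ones quoted in this log.

```typescript
type DaemonAction =
  | { kind: "use-daemon" }
  | { kind: "wait-for-service"; timeoutMs: number }
  | { kind: "spawn-detached"; timeoutMs: number };

function planDaemonStartup(probeOk: boolean, serviceUnitExists: boolean): DaemonAction {
  if (probeOk) return { kind: "use-daemon" };
  // A launchd plist or systemd unit exists: never race the OS restart cycle,
  // just wait for the managed unit to bring the socket up.
  if (serviceUnitExists) return { kind: "wait-for-service", timeoutMs: 8000 };
  // Unmanaged host: the CLI spawns its own detached daemon and polls for the socket.
  return { kind: "spawn-detached", timeoutMs: 3000 };
}

const managedHost = planDaemonStartup(false, true);
const unmanagedHost = planDaemonStartup(false, false);
const healthyHost = planDaemonStartup(true, true);
```

A timeout in the wait-for-service branch maps to the service-not-ready state, which keeps it distinct from a failed spawn.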
Install verifies broker connectivity, not just process start
============================================================
Previously install ended once launchctl reported the unit loaded —
a daemon that boots but cannot reach the broker (blocked :443,
expired TLS, DNS, broker outage) only surfaced on the user's first
peer list or send.
/v1/health now includes per-mesh broker WS state. install polls it
for up to 15s after service boot and prints either "broker
connected (mesh=...)" or a warning naming the meshes still in
connecting state, with a hint at common causes.
The verification is best-effort and does not fail the install — it
just surfaces the issue early.
Tests
=====
4 new vitest cases cover the reaper paths: dead pid, live pid plus
matching start-time, live pid plus mismatched start-time (PID
reuse), and the no-start-time fallback. 83 of 83 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claudemesh install was baking --mesh <primary> into the launchd plist /
systemd unit, locking the daemon to a single mesh and contradicting
1.26.0's multi-mesh design. users with >1 joined mesh fell off the
daemon path on every non-primary verb (cold-WS fallback, peer list
returning all meshes because the server-side filter ran against zero
attached state, "daemon spawn failed: socket did not appear" from
launched sessions in sibling meshes).
now: meshSlug is optional in InstallArgs; claudemesh install omits it
so the unit runs `claudemesh daemon up` with no flag, which attaches
to every joined mesh. `claudemesh daemon install-service --mesh <slug>`
is preserved as opt-in for single-mesh hosts and CI.
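
A sketch of how an optional meshSlug changes the command the unit runs. InstallArgs and meshSlug are named in this message; daemonUpArgs is a hypothetical helper, not the real unit generator.

```typescript
interface InstallArgs {
  meshSlug?: string; // omitted by claudemesh install, set by install-service --mesh
}

// Build the argv that follows the claudemesh binary in the service unit.
function daemonUpArgs(args: InstallArgs): string[] {
  const argv = ["daemon", "up"];
  // No flag means the daemon attaches to every joined mesh.
  if (args.meshSlug !== undefined) argv.push("--mesh", args.meshSlug);
  return argv;
}

const multiMeshUnit = daemonUpArgs({}).join(" ");
const singleMeshUnit = daemonUpArgs({ meshSlug: "ci-mesh" }).join(" ");
```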
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fixes for two install-path bugs that bit on first 1.30.0 upgrade:
- pin node by absolute path in launchd plist / systemd unit. shebang's
/usr/bin/env node resolved against the service environment PATH and
picked up system Node 22.x, which lacks node:sqlite (experimental)
→ daemon died with ERR_UNKNOWN_BUILTIN_MODULE. process.execPath now
goes first, so the daemon always runs under the same Node that ran
claudemesh install.
- tear down the old daemon before bootstrapping. claudemesh install on
a machine with an already-running daemon hit Bootstrap failed: 5:
Input/output error (launchctl refuses to re-bootstrap a loaded unit
+ old daemon held the singleton lock). Now we run launchctl bootout
(systemd: systemctl --user stop) first, plus SIGTERM to any orphan
pid in daemon.pid, so subsequent installs replace cleanly.
both fixes apply to darwin and linux paths. windows path is unchanged
— it doesn't have a service-install today (daemon-install-service
errors with "unsupported platform" on win32).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- broker-actions: msg-status section header used out-of-scope `id`
variable; was a real bug (renders "message undefined…" on the JSON
path). Fixed to use the in-scope lookupId.
- exit-codes: add IO_ERROR (10) — referenced in three places by
platform-actions but never declared.
- types/text-import.d.ts: declare wildcard `*.md` module so Bun's
text-import attribute used by skill.ts typechecks.
- ipc/server: cast PeerSummary/SkillSummary through unknown before
spreading into Record<string, unknown>.
- mcp/server: typed JSON.parse for SSE events.
- bridge/daemon-route: import path with .ts → .js (esm).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
per-session presence is small and uncomplicated enough that a rollback
flag isn't load-bearing. backwards compat is already covered at the
protocol layer — older brokers reply unknown_message_type to
session_hello and the SessionBrokerClient marks itself closed for that
mesh, which is the same outcome the flag would have given. removing
the flag, the helper, and the conditional from the registry hook.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
flips CLAUDEMESH_SESSION_PRESENCE default to ON. With the broker side
already shipped (the session_hello handler from earlier in this sprint
A wave), every claudemesh launch now gets its own long-lived broker
presence row owned by the daemon and identified by a per-launch
ephemeral keypair vouched by the member's stable key. Two sessions in
the same cwd finally see each other in peer list — the symptom users
have been hitting since 1.28.0 dropped the bridge tier.
Bumps roadmap: 1.30.0 = presence (was queued for 1.30/wizard); the
launch-wizard refactor moves to 1.31.0, setup wizard to 1.32.0, the
mesh→workspace rename to 1.33.0. Verification smoke documented in the
1.30.0 changelog entry.
Rollback: CLAUDEMESH_SESSION_PRESENCE=0 (also accepts "false"/"off").
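
A sketch of a default-ON flag parser matching the rollback values listed above. sessionPresenceEnabled is a hypothetical name, and the trimming and case-insensitive matching are assumptions.

```typescript
// Default ON: only an explicit opt-out value disables per-session presence.
function sessionPresenceEnabled(raw: string | undefined): boolean {
  if (raw === undefined) return true;
  const value = raw.trim().toLowerCase();
  return !(value === "0" || value === "false" || value === "off");
}

const unsetFlag = sessionPresenceEnabled(undefined);
const rollbackFlag = sessionPresenceEnabled("0");
const offFlag = sessionPresenceEnabled("off");
const explicitOn = sessionPresenceEnabled("1");
```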
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
daemon-side half of 1.30.0 per-session broker presence. behind
CLAUDEMESH_SESSION_PRESENCE=1 (default OFF this cycle so the broker
side bakes before the flag flips).
- SessionBrokerClient (apps/cli/src/daemon/session-broker.ts) — slim
WS that opens with session_hello, presence-only, no outbox drain.
- session-hello-sig.ts — signParentAttestation (12h TTL, ≤24h cap) and
signSessionHello, mirroring the broker canonical formats.
- session-registry: optional presence field on SessionInfo;
setRegistryHooks for onRegister/onDeregister callbacks. Hook errors
are caught so they can never abort registry mutations.
- IPC POST /v1/sessions/register accepts the presence material under
body.presence (session_pubkey, session_secret_key, parent_attestation).
Older callers without it stay scoped + supported.
- run.ts wires the registry hooks: on register, opens a SessionBrokerClient
for the matching mesh; on deregister (explicit or reaper), closes it.
Shutdown closes any remaining session WSes before the IPC server.
8 new unit tests cover registry lifecycle (replace/throw/presence
roundtrip) and signature canonical-bytes verification against libsodium.
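
A sketch of the hook isolation described for session-registry. Only setRegistryHooks, onRegister, onDeregister, and SessionInfo are names from this message; the class, its fields, and the in-memory store are hypothetical.

```typescript
interface SessionInfo {
  token: string;
  mesh: string;
}

type Hook = (session: SessionInfo) => void;

class SessionRegistrySketch {
  private sessions: Record<string, SessionInfo> = {};
  private onRegister: Hook | undefined;
  private onDeregister: Hook | undefined;

  setRegistryHooks(hooks: { onRegister?: Hook; onDeregister?: Hook }): void {
    this.onRegister = hooks.onRegister;
    this.onDeregister = hooks.onDeregister;
  }

  register(session: SessionInfo): void {
    this.sessions[session.token] = session; // mutate first, then notify
    this.runHook(this.onRegister, session);
  }

  deregister(token: string): boolean {
    const session = this.sessions[token];
    if (session === undefined) return false;
    delete this.sessions[token];
    this.runHook(this.onDeregister, session);
    return true;
  }

  count(): number {
    return Object.keys(this.sessions).length;
  }

  // A throwing hook is swallowed so the registry mutation always stands.
  private runHook(hook: Hook | undefined, session: SessionInfo): void {
    if (hook === undefined) return;
    try {
      hook(session);
    } catch (err) {
      // A real daemon would log err here.
    }
  }
}

const registry = new SessionRegistrySketch();
const closedMeshes: string[] = [];
registry.setRegistryHooks({
  onRegister: () => { throw new Error("broker unreachable"); },
  onDeregister: (s) => { closedMeshes.push(s.mesh); },
});
registry.register({ token: "t1", mesh: "alpha" });
const countAfterRegister = registry.count();
const wasRemoved = registry.deregister("t1");
```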
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
every claudemesh launch-spawned session now mints a 32-byte random
token, writes it under tmpdir (mode 0600), and registers it with the
daemon. cli invocations from inside that session inherit
CLAUDEMESH_IPC_TOKEN_FILE in env, attach the token via Authorization:
ClaudeMesh-Session <hex>, and the daemon resolves it to a SessionInfo.
server-side: every read route that filters by mesh now uses meshFromCtx —
explicit query/body wins, session default fills in when missing. write
routes follow the same pattern.
cli-side: peers.ts (and other multi-mesh-iterating verbs in future)
prefers session-token mesh over all joined meshes when the user didn't
pass --mesh explicitly.
backward-compatible in both directions — tokenless callers behave
exactly as before. registry is in-memory; daemon restart loses it but
the 30s reaper handles dead pids and most callers re-register on next
launch.
verified end-to-end: peer list with token returns 4 prueba1 peers,
without token returns 3 meshes' peers (aggregate).
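
A sketch of the two pieces above: reading the session token from the Authorization header, and the "explicit wins, session default fills in" rule. The header scheme and the meshFromCtx name come from this message; the function bodies, the 64-hex-character check (32 bytes, hex-encoded), and the case-insensitive match are assumptions.

```typescript
// Extract the hex token from "Authorization: ClaudeMesh-Session <hex>".
function parseSessionToken(authorization: string | undefined): string | null {
  if (authorization === undefined) return null;
  const match = /^ClaudeMesh-Session ([0-9a-f]{64})$/i.exec(authorization.trim());
  return match === null ? null : match[1].toLowerCase();
}

// Explicit query/body mesh wins; the session's mesh fills in when it is missing.
// undefined means "no scope", i.e. the tokenless aggregate behavior.
function meshFromCtxSketch(explicitMesh: string | undefined, sessionMesh: string | undefined): string | undefined {
  return explicitMesh !== undefined ? explicitMesh : sessionMesh;
}

const hex16 = "0123456789abcdef";
const sampleToken = hex16 + hex16 + hex16 + hex16; // 64 hex chars
const parsedToken = parseSessionToken("ClaudeMesh-Session " + sampleToken);
const tokenless = parseSessionToken(undefined);

const explicitWins = meshFromCtxSketch("beta", "prueba1");
const sessionDefault = meshFromCtxSketch(undefined, "prueba1");
const aggregate = meshFromCtxSketch(undefined, undefined);
```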
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
extend the daemon thin-client surface to two more verb families: state
get/set/list now routes through `/v1/state`, and remember/recall/forget
through `/v1/memory`. same warm-path pattern as 1.25.0 — try the unix
socket first, fall back to the cold ws path when the daemon is absent.
multi-mesh aware (aggregates on read, requires `--mesh` for writes
when ambiguous).
also ships an early `claudemesh workspace <verb>` alias surface — bare
teaser for the 1.28.0 mesh→workspace public rename. no-arg falls
through to launch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 1.26.0 step that finally delivers ambient mode for multi-mesh
users. Daemon holds Map<slug, DaemonBrokerClient>; one process, one
PID per user, all your meshes online concurrently.
run.ts: claudemesh daemon up with no --mesh attaches to every joined
mesh from config. --mesh <slug> still scopes to one (legacy mode).
The daemon_started log line reports meshes: [...] instead of mesh.
drain.ts: dispatches each outbox row to the broker keyed by row.mesh
(column added in 1.25.0). Legacy rows with mesh=NULL fall back to the
only broker if there's exactly one, otherwise mark dead with a clear
error.
ipc/server.ts:
- GET /v1/peers aggregates across all attached meshes; each peer
record gains a mesh field. ?mesh=<slug> narrows server-side.
- GET /v1/skills aggregates similarly; /v1/skills/:name walks meshes
and returns first match.
- POST /v1/send requires mesh field on multi-mesh daemons; auto-picks
on single-mesh; returns 400 with attached list if ambiguous.
- POST /v1/profile accepts optional mesh; without it, fans out to all
attached meshes (consistent presence).
CLI: trySendViaDaemon now forwards expectedMesh as the body's mesh
field (was informational, now authoritative). claudemesh send
--mesh A and --mesh B from the same shell both route to the right
broker via the same daemon process.
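
The POST /v1/send mesh rule listed under ipc/server.ts can be sketched as follows. resolveSendMesh and the SendRoute shape are hypothetical, and rejecting a mesh the daemon is not attached to with the same 400 is an assumption.

```typescript
type SendRoute =
  | { ok: true; mesh: string }
  | { ok: false; status: 400; attached: string[] };

function resolveSendMesh(attached: string[], requested?: string): SendRoute {
  if (requested !== undefined) {
    // The body's mesh field is authoritative when present.
    return attached.indexOf(requested) !== -1
      ? { ok: true, mesh: requested }
      : { ok: false, status: 400, attached };
  }
  // Single-mesh daemons auto-pick their only mesh.
  if (attached.length === 1) return { ok: true, mesh: attached[0] };
  // Ambiguous: report the attached list so the caller can choose.
  return { ok: false, status: 400, attached };
}

const routedToB = resolveSendMesh(["A", "B", "C"], "B");
const autoPicked = resolveSendMesh(["A"]);
const ambiguous = resolveSendMesh(["A", "B"]);
```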
Verified: aggregated peer list across 3 attached meshes; cross-mesh
sends from CLI reach status=done with correct broker_message_ids.
Released as 1.26.0 on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Daemon outbox now stores resolved target_spec + crypto_box ciphertext
+ nonce per row. Drain worker is a forwarder; no per-row resolution at
drain time. Outbound routing is no longer a placeholder.
Schema additions (additive, NULL allowed for legacy rows): outbox.mesh,
target_spec, nonce, ciphertext, priority. v0.9.0 rows keep draining via
the broadcast fallback so existing in-flight rows finish cleanly.
IPC /v1/send resolves the user-friendly `to` value (display name, hex
prefix, full pubkey, @group, *, #topicId) into a broker-format target_spec at
accept time. DMs encrypt via crypto_box; broadcast/topic/group base64
the plaintext. Hex prefixes (16+ chars) match against connected peers.
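
A sketch of classifying the accepted target forms and choosing the payload encoding. The broker's real target_spec format is not shown in this log, so TargetKind and both functions are hypothetical, and the 64-hex-character length for a full pubkey is an assumption.

```typescript
type TargetKind = "broadcast" | "group" | "topic" | "pubkey" | "hex-prefix" | "display-name";

function classifyTarget(to: string): TargetKind {
  if (to === "*") return "broadcast";
  if (to.charAt(0) === "@") return "group";
  if (to.charAt(0) === "#") return "topic";
  if (/^[0-9a-f]{64}$/i.test(to)) return "pubkey";
  // Prefixes need 16+ hex chars and are matched against connected peers.
  if (/^[0-9a-f]{16,63}$/i.test(to)) return "hex-prefix";
  return "display-name";
}

// DMs are encrypted with crypto_box; broadcast, topic, and group sends
// carry the plaintext as base64.
function usesCryptoBox(kind: TargetKind): boolean {
  return kind === "pubkey" || kind === "hex-prefix" || kind === "display-name";
}

const kinds = ["*", "@ops", "#42", "0123456789abcdef00", "alice"].map(classifyTarget);
```

A short hex-looking string such as "abcd" falls through to display-name, since it is below the 16-character prefix minimum.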
CLI thin-client routing extends trySendViaDaemon pattern to peer list
and skill list/get. Three new helpers in services/bridge/daemon-route.ts.
SKILL.md gains ambient mode section: after claudemesh install, raw
claude works for the daemon's attached mesh. Launch stays as the
override path.
Spec at .artifacts/specs/2026-05-04-v2-roadmap-completion.md orders
the remaining v2.0.0 work: multi-mesh daemon (1.26), CLI-to-thin-client
(1.27), mesh-to-workspace rename (1.28), HKDF identity (2.0).
Released as 1.25.0 on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The architectural convergence v0.9.0 was building toward. CLI keeps
working without a daemon (claudemesh send/peer/inbox/...), but the MCP
push-pipe — which Claude Code uses for mid-turn channel emits, slash
commands, and resources — now requires the daemon. There is no fallback.
Daemon (additive):
- /v1/skills (list) and /v1/skills/:name (get) IPC endpoints, so the
MCP shim can surface mesh skills without holding its own broker WS.
- listSkills() / getSkill() on DaemonBrokerClient.
- SSE 'message' event now carries plaintext body, sender_member_pubkey,
priority, and subtype — full payload the MCP shim needs to render a
channel notification.
MCP server: 979 → 469 LoC (roughly 270 of the remaining 469 is the unrelated
mesh-service proxy mode; the push-pipe path is ~200 LoC including
boilerplate).
- Probes ~/.claudemesh/daemon/daemon.sock at boot. Bails loudly with
actionable instructions if missing.
- Subscribes to /v1/events SSE and translates each event into a
notifications/claude/channel emit.
- Fetches mesh skills from the daemon for ListPrompts/GetPrompt and
ListResources/ReadResource. ListTools returns []; the CLI is the API.
- No broker WS, no decryption, no reconnect logic. Daemon owns all of it.
claudemesh install: auto-installs and starts the daemon service for the
user's primary mesh (launchd / systemd-user). Pass --no-service to skip.
claudemesh launch: probes the daemon socket; if absent, spawns
'claudemesh daemon up --mesh <slug>' detached and waits up to 10s for
the socket. Surfaces a clear warning on timeout but doesn't block —
Claude Code's MCP shim will print the same error if the daemon really
isn't there.
Bundle: dist/entrypoints/mcp.js drops from 154KB → 104KB (gzipped 34KB
→ 19KB). Test: MCP boots cleanly via stdio, declares correct
capabilities, talks JSON-RPC; daemon /v1/skills returns the empty list
as expected on a mesh with no skills.
Released as 1.24.0 on npm.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>