# Per-session broker presence: daemon-multiplexed

**Status:** spec, queued for 1.30.0 (alongside launch-wizard refactor).

**Owner:** alezmad

**Author:** Claude (Sprint A planning, 2026-05-04)

**Related:** `2026-05-04-v2-roadmap-completion.md` (Sprint A overview), 1.29.0 session-registry CHANGELOG entry.

## Problem

After 1.28.0 dropped the bridge tier, **launched `claude` sessions have no persistent broker presence**. Only the daemon does.

Concretely: two `claudemesh launch` sessions in the same cwd, querying `peer list` 2 s apart, **never see each other**. Each `claudemesh peer list` opens a short-lived cold-path WS that creates a `presence` row for the duration of the query and then tears it down. The "this session" row that each session sees in its own snapshot is created by the snapshot itself; sibling sessions' queries miss it because their WS lifetimes don't overlap.

Confirmed empirically (2026-05-04, same-cwd ECIJA-Intranet test):

| Snapshot | Timestamp | Self pubkey | Self `connectedAt` |
|---|---|---|---|
| Session A | 11:42:37Z | `61d96106cb499208` | 11:42:38Z (= query time) |
| Session B | 11:42:39Z | `ce77188aba02827d` | 11:42:38Z (= query time) |

Each saw 5 long-lived peers (the daemon and unrelated other sessions) plus its own ephemeral row. Neither saw the other.

## Goal

Every launched `claude` session has a long-lived broker presence row **owned by the daemon**, identified by the session's per-launch keypair. Siblings see each other in `peer list` immediately and continuously, not as snapshot artifacts.

## Non-goals

- Cross-machine session sync (waiting on 2.0.0 HKDF identity).
- Replacing the daemon's own presence row. The daemon stays as a separate row for "the user on this machine, no specific session."
- Persistence of the session-presence link across daemon restarts. A daemon restart may require launched sessions to re-register (same compromise as the in-memory session registry from 1.29.0).
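To make the visibility gap from the Problem section concrete, here is a minimal sketch that models a presence row as a time window and asks whether a sibling's query falls inside it. The `PresenceWindow` type, the `visibleAt` helper, and the 1 s cold-path lifetime are illustrative assumptions, not part of the codebase; only the 2 s gap comes from the test above.

```ts
// Hypothetical model: a presence row is visible only while its WS is open.
interface PresenceWindow {
  connectedAt: number;    // ms since some reference point
  disconnectedAt: number; // ms; Infinity while the WS is still open
}

// True if a `peer list` query issued at `queryAt` would see this row.
function visibleAt(row: PresenceWindow, queryAt: number): boolean {
  return row.connectedAt <= queryAt && queryAt < row.disconnectedAt;
}

// Today: Session A's row lives only for its own cold-path query
// (assumed here to last 1 s). Session B queries 2 s later.
const coldPathRowA: PresenceWindow = { connectedAt: 0, disconnectedAt: 1_000 };
const sessionBQueryAt = 2_000;
const seenToday = visibleAt(coldPathRowA, sessionBQueryAt);

// Goal: the daemon holds A's row open for the lifetime of the session.
const daemonOwnedRowA: PresenceWindow = { connectedAt: 0, disconnectedAt: Infinity };
const seenWithDaemonRow = visibleAt(daemonOwnedRowA, sessionBQueryAt);

console.log({ seenToday, seenWithDaemonRow });
```

Under these assumptions the cold-path row is already gone when the sibling queries, while the daemon-owned row is still visible.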
## Design

### State machine

The 1.29.0 session registry already tracks a token-keyed `Map` of session info inside the daemon. Extend it to own a per-session broker connection.

```
session lifecycle:
  POST /v1/sessions/register
    → registry.set(token, info)
    → daemon.openSessionWs(info)        ← NEW
    → broker creates presence row owned by session.pubkey

  DELETE /v1/sessions/:token
    → daemon.closeSessionWs(token)      ← NEW
    → broker marks presence.disconnectedAt = now()
    → registry.delete(token)

  reaper (30 s tick):
    pid dead?
      → daemon.closeSessionWs(token)
      → registry.delete(token)
```

### Daemon-side: per-session `BrokerClient`

Today the daemon holds a `Map` of `DaemonBrokerClient`s (one WS per attached mesh). Add a parallel `Map` of `SessionBrokerClient`s, keyed by session token, for the per-launch ephemeral connections.

`SessionBrokerClient` is the existing `BrokerClient` reused, configured with the session's per-launch keypair instead of the member's stable keypair. It registers presence (`presence_join`) and stays connected until `closeSessionWs(token)` fires. It does **not** drain the outbox; that is the member-keypair `DaemonBrokerClient`'s job. It only carries presence and receives DMs targeted at the session pubkey.

### Broker-side: parent-vouched presence auth

Today's broker accepts hello-sig auth where:

- Caller signs the broker's nonce with their `mesh_member` keypair.
- Broker looks up `mesh_member.peer_pubkey == sig.pubkey`.

For per-session keypairs, the session pubkey is **not** in `mesh_member`: it is freshly generated by `claudemesh launch`. We need a new attestation flow:

```
hello {
  type: "session_hello",
  session_pubkey: hex,
  parent_member_pubkey: hex,
  display_name, cwd, role, groups,
  parent_signature: ed25519_sign(member_priv,
    "claudemesh-session/" || session_pubkey || "/" || nonce),
  nonce_challenge,
}
```

Broker validates:

1. `parent_member_pubkey` exists in `mesh.member` for the target mesh.
2. `parent_signature` validates against `parent_member_pubkey` over the canonical message above.
3. On success, the broker inserts a presence row keyed on `session_pubkey`, with `member_id` pointing at the parent member's `mesh.member.id`.

This is the OAuth-style refresh-vs-access pattern: the parent member key vouches "this ephemeral session pubkey belongs to me." The broker binds the row to the parent member but uses the session pubkey for routing (so DMs targeted at the session pubkey land at this WS).

### CLI-side: launch.ts produces the parent signature

`claudemesh launch` already mints the session keypair and writes the session-token file. Extend it to also produce a `parent_signature` that the daemon can present when opening the session WS:

```ts
const sessionPubkey = sessionKeypair.publicKey;
const parentSig = ed25519_sign(
  mesh.secretKey,
  Buffer.concat([
    Buffer.from("claudemesh-session/"),
    sessionPubkey,
    Buffer.from("/"),
    /* nonce comes from the broker; handled at WS-connect time */
  ]),
);
```

The nonce, however, is broker-issued at hello time, so this signature would have to be produced fresh on every WS connect. The simple approach is for the `POST /v1/sessions/register` body to carry the *member secret key* (or a derived signing capability) so the daemon can sign nonces on behalf of the session, but that is a key-leak risk. Better: register carries a **pre-signed attestation** good for a TTL window:

```
register body adds:
  parent_attestation: {
    session_pubkey: hex,
    parent_member_pubkey: hex,
    expires_at: ISO,
    signature: ed25519_sign(member_priv,
      "claudemesh-session-attest/" || session_pubkey || "/" || expires_at),
  }
```

The daemon presents this attestation in `session_hello`, in place of the per-connect `parent_signature`. The broker validates expiry and signature, then issues a nonce challenge that the daemon can satisfy with the session keypair (which IS held by the daemon for the lifetime of the registration). Two-stage: the parent vouches for the session; the session signs the nonce.

### Registry persistence

For now, in-memory only (matching 1.29.0). A daemon restart drops all session WSes; launched `claude` processes are responsible for re-registering on the next CLI invocation. Acceptable v1 behaviour; revisit when sqlite persistence lands for the registry.

## Wire changes

### Broker

- New `session_hello` message type (additive; existing `hello` for member auth unchanged).
- `presence` row schema unchanged: `member_id` is still required, but `session_pubkey` differs from the member's stable pubkey.
- Reject attestations that have expired, and require `parent_attestation.expires_at <= now() + 24h` to bound attestation reuse.

### Daemon

- New `SessionBrokerClient` factory, which wraps `BrokerClient` with a session-mode hello.
- A `Map` of `SessionBrokerClient`s alongside the existing `Map` of `DaemonBrokerClient`s.
- IPC routes:
  - `POST /v1/sessions/register`: extend body schema with `parent_attestation`.
  - `DELETE /v1/sessions/:token`: close the session WS first, then drop the registry entry.

### CLI (`claudemesh launch`)

- Mint session keypair (today only writes the session token; need to add ed25519 keypair generation per launch and write the privkey alongside the token).
- Sign `parent_attestation` with the member key from the joined-mesh config.
- POST register with both the new keypair and the attestation.

## LoC estimate

- Daemon `SessionBrokerClient` + registry hook: ~120 LoC.
- IPC route schema extension + validation: ~40 LoC.
- Broker `session_hello` handler + tests: ~140 LoC.
- CLI `claudemesh launch` keypair + attestation: ~60 LoC.
- Tests + smoke: ~80 LoC.

Total: **~440 LoC** across CLI + daemon + broker.

## Risks

| Risk | Mitigation |
|---|---|
| Member private key never leaves the user's machine, but the **attestation** (signed token) can be replayed within its TTL. | TTL bound 24h; refresh on launch; revocation path = drop the parent member's mesh enrollment (nuclear, but works). |
| Cascading WS connections: N launches = N+1 broker WSes per user. | Acceptable up to 10-20 concurrent sessions; if it ever becomes a problem, multiplex per-session at the protocol level (one WS, multiple presence rows). Out of scope for v1. |
| Daemon restart kills all session WSes: `peer list` from inside a launched session sees the other long-lived peers but not its own siblings until they re-register. | Same as 1.29.0 registry. The registry could persist to sqlite later; for v1, accepted. |
| Broker schema cost: every new presence row has a different `session_pubkey`, growing the table faster. | Already accepted: broker prunes disconnected rows on a 30-day window. Per-session keys triple the row count at peak but stay within the prune budget. |

## Compatibility

- **Older brokers** can't validate `session_hello`. Sessions will attempt the new hello, get back `unknown_message_type`, and fall back to the existing member-keyed hello (no per-session presence, but everything still works as in 1.28.0). Add the broker change first, let it deploy, then ship the CLI side.
- **Older CLIs** continue to work unchanged: they don't open per-session WSes. They appear as ephemeral cold-path rows just like today, and lose the symmetric-visibility property between siblings.
- **Backward visible:** users on 1.30.0+ on the same mesh as users on ≤1.29.x will see the older users as one row (their daemon) instead of one row per session. Acceptable; users opt in to the new visibility by upgrading.

## Sequencing

1. **Broker change ships first.** Add `session_hello` handler, deploy, bake for ~24h. No CLI behaviour change yet.
2. **Daemon `SessionBrokerClient` ships next** behind a feature flag (`CLAUDEMESH_SESSION_PRESENCE=1`). Manually test with two launched sessions in the same cwd; verify both see each other.
3. **CLI keypair-mint + attestation in `launch.ts` ships last**, behind the same flag.
4. Flip the flag default in the 1.30.0 release; document rollback via env.
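The older-broker fallback from Compatibility and the flag gate from Sequencing could be combined roughly as follows. This is a sketch under stated assumptions: `HelloTransport`, `HelloReply`, and `negotiateHello` are hypothetical names, and the reply shape is invented for illustration. Only the `session_hello` and `hello` message types, the `unknown_message_type` error, and the `CLAUDEMESH_SESSION_PRESENCE=1` flag come from this spec.

```ts
type HelloReply = { ok: true } | { ok: false; error: string };
type HelloMode = "session" | "member";

// Hypothetical transport: sends one hello frame and resolves with the broker's reply.
interface HelloTransport {
  sendHello(type: "session_hello" | "hello"): Promise<HelloReply>;
}

// Pre-1.30.0 semantics: per-session presence is opt-in via the env flag.
function sessionPresenceEnabled(env: Record<string, string | undefined>): boolean {
  return env.CLAUDEMESH_SESSION_PRESENCE === "1";
}

async function negotiateHello(
  transport: HelloTransport,
  env: Record<string, string | undefined>,
): Promise<HelloMode> {
  if (!sessionPresenceEnabled(env)) return "member";

  const reply = await transport.sendHello("session_hello");
  if (reply.ok) return "session";

  // Older broker: fall back to the member-keyed hello (no per-session presence).
  if (reply.error === "unknown_message_type") {
    const fallback = await transport.sendHello("hello");
    if (fallback.ok) return "member";
    throw new Error(`member hello failed: ${fallback.error}`);
  }
  // Any other rejection (e.g. a bad attestation) is surfaced, not masked by fallback.
  throw new Error(`session_hello rejected: ${reply.error}`);
}

// Stand-ins for a pre-1.30.0 broker and an upgraded one.
const oldBroker: HelloTransport = {
  sendHello: async (type): Promise<HelloReply> =>
    type === "session_hello" ? { ok: false, error: "unknown_message_type" } : { ok: true },
};
const newBroker: HelloTransport = {
  sendHello: async (): Promise<HelloReply> => ({ ok: true }),
};

const flagOn = { CLAUDEMESH_SESSION_PRESENCE: "1" };
negotiateHello(oldBroker, flagOn).then((mode) => console.log("old broker:", mode));
negotiateHello(newBroker, flagOn).then((mode) => console.log("new broker:", mode));
```

Falling back only on `unknown_message_type` keeps a genuine auth failure visible instead of silently degrading to member-keyed presence.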
## Verification

End-to-end smoke (paste into 1.30.0's CHANGELOG):

```
$ # In two different shells, both cd ~/Desktop/foo:
$ claudemesh launch --name SessionA -y   # shell 1
$ claudemesh launch --name SessionB -y   # shell 2
$
$ # In a third shell:
$ claudemesh peer list --json --mesh foo | jq '.[] | {n: .displayName, c: .cwd}'
{ "n": "SessionA", "c": "/.../foo" }   ← persistent, not query-induced
{ "n": "SessionB", "c": "/.../foo" }
$
$ # In SessionA's shell:
$ claudemesh peer list --mesh foo
  should include SessionB.
$
$ # Kill SessionB (Ctrl-C in shell 2). Wait up to 30 s.
$ claudemesh peer list --mesh foo
  should NOT include SessionB (reaper closed its WS).
```

## Open questions

- Should the per-session WS also drain *its own* outbox subset, or stay presence-only? Recommend presence-only for v1: it keeps the state machines simple, and the daemon's member-keyed WS handles all sends. Can be revisited when the per-session policy DSL ships.
- Should the parent attestation be revocable mid-session? Could add an IPC route on the daemon. Out of scope for v1; revoke = drop the whole member enrollment.
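For reference, the two-stage auth from the Design section (parent vouches for the session; session signs the nonce) can be sketched with Node's built-in `node:crypto`. The helper names (`signAttestation`, `verifyAttestation`, `pubkeyHex`, `pubkeyFromHex`), the camelCase field names, and the choice to hex-encode the session pubkey inside the canonical message are assumptions made for illustration; the spec does not pin down the byte encoding.

```ts
import {
  createPublicKey, generateKeyPairSync, randomBytes, sign, verify, KeyObject,
} from "node:crypto";

// Mirrors the spec's `parent_attestation`; field names here are this sketch's own.
interface ParentAttestation {
  sessionPubkey: string;      // hex, raw 32-byte Ed25519 key
  parentMemberPubkey: string; // hex
  expiresAt: string;          // ISO 8601
  signature: string;          // hex
}

const MAX_TTL_MS = 24 * 60 * 60 * 1000; // 24 h bound from "Wire changes"
// Fixed 12-byte DER prefix of an Ed25519 SubjectPublicKeyInfo (RFC 8410).
const SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");

function pubkeyHex(key: KeyObject): string {
  const der = key.export({ type: "spki", format: "der" });
  return der.subarray(SPKI_PREFIX.length).toString("hex");
}

function pubkeyFromHex(hex: string): KeyObject {
  const der = Buffer.concat([SPKI_PREFIX, Buffer.from(hex, "hex")]);
  return createPublicKey({ key: der, format: "der", type: "spki" });
}

// Canonical message; assumes the pubkey is hex-encoded inside the string.
function attestationMessage(sessionPubkey: string, expiresAt: string): Buffer {
  return Buffer.from(`claudemesh-session-attest/${sessionPubkey}/${expiresAt}`);
}

// CLI side (`claudemesh launch`): the member key vouches for the session key.
function signAttestation(
  memberPriv: KeyObject, memberPub: KeyObject, sessionPub: KeyObject,
  ttlMs: number, now: number = Date.now(),
): ParentAttestation {
  const sessionPubkey = pubkeyHex(sessionPub);
  const expiresAt = new Date(now + ttlMs).toISOString();
  const signature = sign(null, attestationMessage(sessionPubkey, expiresAt), memberPriv);
  return {
    sessionPubkey,
    parentMemberPubkey: pubkeyHex(memberPub),
    expiresAt,
    signature: signature.toString("hex"),
  };
}

// Broker side, stage 1: membership, expiry window, parent signature.
function verifyAttestation(
  att: ParentAttestation, isMember: (pubkey: string) => boolean,
  now: number = Date.now(),
): boolean {
  if (!isMember(att.parentMemberPubkey)) return false;
  const expires = Date.parse(att.expiresAt);
  if (Number.isNaN(expires) || expires <= now || expires > now + MAX_TTL_MS) return false;
  return verify(
    null,
    attestationMessage(att.sessionPubkey, att.expiresAt),
    pubkeyFromHex(att.parentMemberPubkey),
    Buffer.from(att.signature, "hex"),
  );
}

// Demo: one member, one launched session.
const member = generateKeyPairSync("ed25519");
const session = generateKeyPairSync("ed25519");
const members = new Set([pubkeyHex(member.publicKey)]);
const isMember = (pubkey: string) => members.has(pubkey);

const att = signAttestation(member.privateKey, member.publicKey, session.publicKey, 60 * 60 * 1000);
const stage1 = verifyAttestation(att, isMember);

// Stage 2: the daemon signs the broker's nonce with the session key.
const nonce = randomBytes(32);
const nonceSig = sign(null, nonce, session.privateKey);
const stage2 = verify(null, nonce, pubkeyFromHex(att.sessionPubkey), nonceSig);

console.log({ stage1, stage2 });
```

Because stage 2 requires the session private key, a captured attestation is not enough on its own to open a session WS in this sketch; the attacker would also need the key the daemon holds.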