Per-session broker presence — daemon-multiplexed
Status: spec, queued for 1.30.0 (alongside launch-wizard refactor).
Owner: alezmad
Author: Claude (Sprint A planning, 2026-05-04)
Related: 2026-05-04-v2-roadmap-completion.md (Sprint A overview),
1.29.0 session-registry CHANGELOG entry.
Problem
After 1.28.0 dropped the bridge tier, launched claude sessions have
no persistent broker presence. Only the daemon does.
Concretely: two claudemesh launch sessions in the same cwd, querying
peer list 2 s apart, never see each other. Each claudemesh peer list opens a short-lived cold-path WS that creates a presence row
for the duration of the query and tears it down. The "this session"
row everyone sees in their own snapshot is created by the snapshot
itself; sibling sessions' queries miss it because their WS-lifetimes
don't overlap.
Confirmed empirically (2026-05-04, same-cwd ECIJA-Intranet test):
| Snapshot | timestamp | self pubkey | self connectedAt |
|---|---|---|---|
| Session A | 11:42:37Z | 61d96106cb499208 | 11:42:38Z (= query time) |
| Session B | 11:42:39Z | ce77188aba02827d | 11:42:38Z (= query time) |
Each saw 5 long-lived peers (the daemon and unrelated other sessions) plus its own ephemeral row. Neither saw the other.
Goal
Every launched claude session has a long-lived broker presence row
owned by the daemon, identified by the session's per-launch
keypair. Siblings see each other in peer list immediately and
continuously, not as snapshot artifacts.
Non-goals
- Cross-machine session sync (waiting on 2.0.0 HKDF identity).
- Replacing the daemon's own presence row — the daemon stays as a separate row for "the user on this machine, no specific session."
- Persistence of the session-presence link across daemon restarts: a daemon restart may require launched sessions to re-register (the same compromise as the in-memory session registry from 1.29.0).
Design
State machine
The 1.29.0 session registry already tracks Map<token, SessionInfo>
inside the daemon. Extend it to own a per-session broker connection.
session lifecycle:
  POST /v1/sessions/register
    → registry.set(token, info)
    → daemon.openSessionWs(info)        ← NEW
    → broker creates presence row owned by session.pubkey
  DELETE /v1/sessions/:token
    → registry.delete(token)
    → daemon.closeSessionWs(token)      ← NEW
    → broker marks presence.disconnectedAt = now()
  reaper (30 s tick): pid dead?
    → registry.delete(token)
    → daemon.closeSessionWs(token)
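The lifecycle above can be sketched as a small registry class. This is a minimal illustration, not the daemon's actual code: the SessionInfo fields, the SessionWsHooks interface, and the injectable isAlive check are assumptions made for the example. Only the openSessionWs/closeSessionWs names and the reaper behaviour come from the spec.

```typescript
// Hypothetical shapes for illustration; the real SessionInfo may differ.
interface SessionInfo {
  token: string;
  pid: number;
  sessionPubkey: string;
}

// Hooks the daemon would implement to open/close the per-session broker WS.
interface SessionWsHooks {
  openSessionWs(info: SessionInfo): void;
  closeSessionWs(token: string): void;
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionInfo>();
  private readonly hooks: SessionWsHooks;
  private readonly isAlive: (pid: number) => boolean;

  // isAlive is injectable so the liveness check can be faked in tests.
  constructor(hooks: SessionWsHooks, isAlive: (pid: number) => boolean = defaultIsAlive) {
    this.hooks = hooks;
    this.isAlive = isAlive;
  }

  // POST /v1/sessions/register
  register(info: SessionInfo): void {
    this.sessions.set(info.token, info);
    this.hooks.openSessionWs(info);
  }

  // DELETE /v1/sessions/:token. Returns false if the token was unknown.
  unregister(token: string): boolean {
    if (!this.sessions.delete(token)) return false;
    this.hooks.closeSessionWs(token);
    return true;
  }

  // One reaper tick: drop every session whose process has exited.
  reap(): string[] {
    const dead: string[] = [];
    this.sessions.forEach((info, token) => {
      if (!this.isAlive(info.pid)) dead.push(token);
    });
    dead.forEach((token) => this.unregister(token));
    return dead;
  }

  has(token: string): boolean {
    return this.sessions.has(token);
  }
}

function defaultIsAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only checks that the process exists
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}
```

In the daemon, the reaper would be driven by something like `setInterval(() => registry.reap(), 30_000)`. Routing the reaper through unregister keeps the WS teardown in a single code path.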
Daemon-side: per-session BrokerClient
Today the daemon holds Map<meshSlug, DaemonBrokerClient> (one WS per
attached mesh). Add a parallel Map<token, SessionBrokerClient> for
the per-launch ephemeral connections.
SessionBrokerClient is the existing BrokerClient reused, configured
with the session's per-launch keypair instead of the member's stable
keypair. It registers presence (presence_join) and stays connected
until closeSessionWs(token) fires. It does not drain the outbox
— that's the member-keypair DaemonBrokerClient's job. It only carries
presence + receives DMs targeted at the session pubkey.
Broker-side: parent-vouched presence auth
Today's broker accepts hello-sig auth where:
- Caller signs the broker's nonce with their mesh_member keypair.
- Broker looks up mesh_member.peer_pubkey == sig.pubkey.
For per-session keypairs, the session pubkey is not in mesh_member
— it's freshly generated by claudemesh launch. We need a new
attestation flow:
hello {
  type: "session_hello",
  session_pubkey: <fresh keypair>,
  parent_member_pubkey: <member keypair from config>,
  display_name, cwd, role, groups,
  parent_signature: ed25519_sign(member_priv,
    "claudemesh-session/" || session_pubkey || "/" || nonce),
  nonce_challenge: <broker nonce>,
}
Broker validates:
- parent_member_pubkey exists in mesh.member for the target mesh.
- parent_signature validates against parent_member_pubkey over the canonical message above.
- Broker inserts a presence row keyed on session_pubkey but with member_id pointing at the parent member's mesh.member.id.
This is the OAuth-style refresh-vs-access pattern: the parent member key vouches "this ephemeral session pubkey belongs to me." The broker binds the row to the parent member but uses the session pubkey for routing (so DMs targeted at the session pubkey land at this WS).
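The three broker checks can be sketched with Node's built-in crypto module. This is an illustration under stated assumptions: the Member and PresenceRow shapes, the in-memory maps standing in for the mesh.member and presence tables, and the choice to sign over the raw 32-byte session pubkey are all hypothetical. The spec fixes only the canonical message layout.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical stand-ins for the broker's mesh.member and presence tables.
interface Member { id: string; publicKey: KeyObject }
interface PresenceRow { sessionPubkey: string; memberId: string }
interface SessionHello {
  sessionPubkey: Buffer;      // raw 32-byte Ed25519 key (assumed encoding)
  parentMemberPubkey: string; // hex
  parentSignature: Buffer;
}

// An Ed25519 SPKI DER export ends with the 32 raw key bytes.
function rawPublicKey(key: KeyObject): Buffer {
  const der = key.export({ format: "der", type: "spki" });
  return der.subarray(der.length - 32);
}

// "claudemesh-session/" || session_pubkey || "/" || nonce
function canonicalMessage(sessionPubkey: Buffer, nonce: Buffer): Buffer {
  return Buffer.concat([
    Buffer.from("claudemesh-session/"),
    sessionPubkey,
    Buffer.from("/"),
    nonce,
  ]);
}

function acceptSessionHello(
  hello: SessionHello,
  nonce: Buffer,
  members: Map<string, Member>,
  presence: Map<string, PresenceRow>,
): PresenceRow | null {
  // 1. The parent must be an enrolled member of the target mesh.
  const parent = members.get(hello.parentMemberPubkey);
  if (!parent) return null;
  // 2. The parent's signature must cover this session key and this nonce.
  const message = canonicalMessage(hello.sessionPubkey, nonce);
  if (!verify(null, message, parent.publicKey, hello.parentSignature)) return null;
  // 3. Key the row on the session pubkey, bind it to the parent member.
  const row: PresenceRow = {
    sessionPubkey: hello.sessionPubkey.toString("hex"),
    memberId: parent.id,
  };
  presence.set(row.sessionPubkey, row);
  return row;
}

// Demo with freshly generated keys.
const memberKeys = generateKeyPairSync("ed25519");
const sessionKeys = generateKeyPairSync("ed25519");
const memberHex = rawPublicKey(memberKeys.publicKey).toString("hex");
const sessionRaw = rawPublicKey(sessionKeys.publicKey);
const demoNonce = Buffer.from("broker-nonce-123");
const demoMembers = new Map<string, Member>([
  [memberHex, { id: "member-1", publicKey: memberKeys.publicKey }],
]);
const demoPresence = new Map<string, PresenceRow>();
const demoHello: SessionHello = {
  sessionPubkey: sessionRaw,
  parentMemberPubkey: memberHex,
  parentSignature: sign(null, canonicalMessage(sessionRaw, demoNonce), memberKeys.privateKey),
};
const demoRow = acceptSessionHello(demoHello, demoNonce, demoMembers, demoPresence);
console.log(demoRow);
```

Because the row is keyed on the session pubkey, a DM addressed to that key resolves to this connection, while memberId still ties the row back to the vouching member.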
CLI-side: launch.ts produces the parent signature
claudemesh launch already mints the session keypair and writes the
session-token file. Extend it to also produce a parent_signature
that the daemon can present when opening the session WS:
const sessionPubkey = sessionKeypair.publicKey;
const parentSig = ed25519_sign(
  mesh.secretKey,
  Buffer.concat([
    Buffer.from("claudemesh-session/"),
    sessionPubkey,
    Buffer.from("/"),
    /* nonce comes from broker; handled at WS-connect time */
  ]),
);
The nonce is broker-issued at hello time, however, so launch.ts cannot
sign it in advance: the signature has to be produced fresh on every
WS connect. One option is for the POST /v1/sessions/register body to
carry the member secret key (or a derived signing capability) so the
daemon can sign nonces on behalf of the session, but that is a
key-leak risk. Better: register carries a pre-signed attestation that is valid for a TTL window:
register body adds:
  parent_attestation: {
    session_pubkey: hex,
    parent_member_pubkey: hex,
    expires_at: ISO,
    signature: ed25519_sign(member_priv,
      "claudemesh-session-attest/" ||
      session_pubkey || "/" ||
      expires_at),
  }
Daemon presents this attestation in session_hello; broker validates
expiry and signature, then issues a nonce challenge that the daemon
can satisfy with the session keypair (which IS held by the daemon
for the lifetime of the registration). Two-stage: parent vouches the
session; session signs the nonce.
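The two-stage flow can be sketched end to end with Node's crypto module. Several details here are assumptions rather than part of the spec: the signed message uses the hex form of session_pubkey, keys travel as hex of the raw 32 bytes, and the function names are hypothetical. The 24h ceiling matches the bound given under "Wire changes". A real broker would also confirm that parent_member_pubkey is enrolled in the target mesh, which this sketch omits.

```typescript
import {
  createPublicKey, generateKeyPairSync, randomBytes, sign, verify, KeyObject,
} from "node:crypto";

const MAX_TTL_MS = 24 * 60 * 60 * 1000;
// DER SPKI header for an Ed25519 key; the 32 raw key bytes follow it.
const ED25519_SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");

interface ParentAttestation {
  session_pubkey: string;       // hex
  parent_member_pubkey: string; // hex
  expires_at: string;           // ISO 8601
  signature: string;            // hex
}

function toHex(key: KeyObject): string {
  const der = key.export({ format: "der", type: "spki" });
  return der.subarray(ED25519_SPKI_PREFIX.length).toString("hex");
}

function fromHex(hex: string): KeyObject {
  const der = Buffer.concat([ED25519_SPKI_PREFIX, Buffer.from(hex, "hex")]);
  return createPublicKey({ key: der, format: "der", type: "spki" });
}

// "claudemesh-session-attest/" || session_pubkey || "/" || expires_at
function attestationMessage(sessionPubkeyHex: string, expiresAt: string): Buffer {
  return Buffer.from(`claudemesh-session-attest/${sessionPubkeyHex}/${expiresAt}`);
}

// CLI side: the member key vouches for the session key until expiresAt.
function signAttestation(
  memberPriv: KeyObject, memberPub: KeyObject, sessionPub: KeyObject, expiresAt: Date,
): ParentAttestation {
  const session_pubkey = toHex(sessionPub);
  const expires_at = expiresAt.toISOString();
  const message = attestationMessage(session_pubkey, expires_at);
  return {
    session_pubkey,
    parent_member_pubkey: toHex(memberPub),
    expires_at,
    signature: sign(null, message, memberPriv).toString("hex"),
  };
}

// Broker stage 1: check the expiry window, then the parent's signature.
function validateAttestation(att: ParentAttestation, now: Date): boolean {
  const expiry = Date.parse(att.expires_at);
  if (Number.isNaN(expiry)) return false;
  if (expiry <= now.getTime()) return false;             // already expired
  if (expiry > now.getTime() + MAX_TTL_MS) return false; // TTL longer than 24h
  return verify(
    null,
    attestationMessage(att.session_pubkey, att.expires_at),
    fromHex(att.parent_member_pubkey),
    Buffer.from(att.signature, "hex"),
  );
}

// Broker stage 2: the daemon answers the nonce with the session key.
function verifyNonceResponse(att: ParentAttestation, nonce: Buffer, response: Buffer): boolean {
  return verify(null, nonce, fromHex(att.session_pubkey), response);
}

// Demo: the CLI signs, the broker validates, the session key answers a nonce.
const member = generateKeyPairSync("ed25519");
const session = generateKeyPairSync("ed25519");
const now = new Date();
const attestation = signAttestation(
  member.privateKey, member.publicKey, session.publicKey,
  new Date(now.getTime() + 60 * 60 * 1000),
);
const nonce = randomBytes(32);
const nonceResponse = sign(null, nonce, session.privateKey);
console.log(validateAttestation(attestation, now), verifyNonceResponse(attestation, nonce, nonceResponse));
```

The member private key is used only in signAttestation, which runs in the CLI. The daemon needs just the attestation and the session private key, which is the point of the two-stage split.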
Registry persistence
For now, in-memory only (matching 1.29.0). Daemon restart drops all
session WSes; launched claude processes are responsible for
re-registering on next CLI invocation. Acceptable v1 behaviour;
revisit when sqlite persistence lands for the registry.
Wire changes
Broker
- New session_hello message type (additive; existing hello for member auth unchanged).
- presence row schema unchanged: member_id still required, but session_pubkey differs from the member's stable pubkey.
- Validate parent_attestation.expires_at <= now() + 24h to bound attestation reuse.
Daemon
- New SessionBrokerClient factory: wraps BrokerClient with session-mode hello.
- Map<token, SessionBrokerClient> alongside the existing Map<slug, DaemonBrokerClient>.
- IPC routes:
  - POST /v1/sessions/register: extend body schema with parent_attestation.
  - DELETE /v1/sessions/:token: close the session WS first, then drop registry entry.
CLI (claudemesh launch)
- Mint session keypair (today only writes the session token; need to add ed25519 keypair generation per launch and write the privkey alongside the token).
- Sign parent_attestation with the member key from the joined-mesh config.
- POST register with both the new keypair and the attestation.
LoC estimate
- Daemon SessionBrokerClient + registry hook: ~120 LoC.
- IPC route schema extension + validation: ~40 LoC.
- Broker session_hello handler + tests: ~140 LoC.
- CLI claudemesh launch keypair + attestation: ~60 LoC.
- Tests + smoke: ~80 LoC.
Total: ~440 LoC across CLI + daemon + broker.
Risks
| Risk | Mitigation |
|---|---|
| Member private key never leaves the user's machine, but the attestation (signed token) can be replayed within its TTL. | TTL bound 24h; refresh on launch; revocation path = drop the parent member's mesh enrollment (nuclear, but works). |
| Cascading WS connections — N launches = N+1 broker WSes per user. | Acceptable up to 10-20 concurrent sessions; if it ever becomes a problem, multiplex per-session at the protocol level (one WS, multiple presence rows). Out of scope for v1. |
| Daemon restart kills all session WSes: peer list from inside a launched session sees the remaining 5 peers but not its own siblings until they re-register. | Same as 1.29.0 registry. The registry could persist to sqlite later; for v1, accepted. |
| Broker schema cost: every new presence row has a different session_pubkey, growing the table faster. | Already accepted: broker prunes disconnected rows on a 30-day window. Per-session keys triple the row count at peak but stay within the prune budget. |
Compatibility
- Older brokers can't validate session_hello. Sessions will attempt the new hello, get back unknown_message_type, and fall back to the existing member-keyed hello (no per-session presence, but everything still works as 1.28.0). Add the broker change first, let it deploy, then ship the CLI side.
- Older CLIs continue to work unchanged; they don't open per-session WSes. They appear as ephemeral cold-path rows just like today, and lose the symmetric-visibility property between siblings.
- Backward visible: users on 1.30.0+ on the same mesh as users on ≤1.29.x will see the older users as one row (their daemon) instead of one row per session. Acceptable — opt-in to the new visibility by upgrading.
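The fallback against older brokers can be sketched as a small negotiation helper. The HelloResult shape and the SendHello callback are hypothetical stand-ins for the real WS exchange. Only the unknown_message_type error string and the fall-back-to-member-hello behaviour come from the spec.

```typescript
type HelloMode = "session" | "member";

// Hypothetical summary of the broker's reply to a hello frame.
interface HelloResult {
  ok: boolean;
  error?: string;
}

// Sends one hello of the given kind and resolves with the broker's answer.
type SendHello = (mode: HelloMode) => Promise<HelloResult>;

// Try session_hello first; fall back only when the broker does not know
// the message type. Any other rejection is a real error and is surfaced.
async function negotiateHello(send: SendHello): Promise<HelloMode> {
  const first = await send("session");
  if (first.ok) return "session";
  if (first.error !== "unknown_message_type") {
    throw new Error(`session_hello rejected: ${first.error}`);
  }
  const fallback = await send("member");
  if (!fallback.ok) {
    throw new Error(`member hello rejected: ${fallback.error}`);
  }
  return "member";
}
```

Falling back only on unknown_message_type keeps a bad attestation from being silently masked by the member-keyed path.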
Sequencing
- Broker change ships first. Add session_hello handler, deploy, bake for ~24h. No CLI behaviour change yet.
- Daemon SessionBrokerClient ships next behind a feature flag (CLAUDEMESH_SESSION_PRESENCE=1). Manually test with two launched sessions in the same cwd; verify both see each other.
- CLI keypair-mint + attestation in launch.ts ships last, behind the same flag.
- Flip the flag default in 1.30.0 release; document rollback via env.
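The flag check and the later default flip can be sketched as one helper. The function name and the use of "0" as the rollback value are assumptions. The spec names only CLAUDEMESH_SESSION_PRESENCE=1 and "rollback via env".

```typescript
// Hypothetical helper: resolves the feature flag from the environment.
// defaultOn is false while the feature bakes and true once 1.30.0 flips it.
function sessionPresenceEnabled(
  env: Record<string, string | undefined>,
  defaultOn: boolean,
): boolean {
  const raw = env.CLAUDEMESH_SESSION_PRESENCE;
  if (raw === "1") return true;  // explicit opt-in (pre-flip)
  if (raw === "0") return false; // explicit rollback (assumed value)
  return defaultOn;              // unset or unrecognised: use the build default
}
```

The daemon and the CLI would both call this with `process.env`, so the default flip is a change to a single constant.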
Verification
End-to-end smoke (paste into 1.30.0's CHANGELOG):
$ # In two different shells, both cd ~/Desktop/foo:
$ claudemesh launch --name SessionA -y # shell 1
$ claudemesh launch --name SessionB -y # shell 2
$
$ # In a third shell:
$ claudemesh peer list --json --mesh foo | jq '.[] | {n: .displayName, c: .cwd}'
{ "n": "SessionA", "c": "/.../foo" } ← persistent, not query-induced
{ "n": "SessionB", "c": "/.../foo" }
$
$ # In SessionA's shell:
$ claudemesh peer list --mesh foo
should include SessionB.
$
$ # Kill SessionB (Ctrl-C in shell 2). Wait up to 30 s for the next reaper tick.
$ claudemesh peer list --mesh foo
should NOT include SessionB (reaper closed its WS).
Open questions
- Should the per-session WS also drain its own outbox subset, or stay presence-only? Recommend presence-only for v1 — keeps state machines simple, daemon's member-keyed WS handles all sends. Can be revisited when per-session policy DSL ships.
- Should the parent attestation be revocable mid-session? Could add an IPC route on the daemon. Out of scope for v1; revoke = drop the whole member enrollment.