claudemesh/.artifacts/specs/2026-05-04-per-session-presence.md
Alejandro Gutiérrez 364178d95b docs(spec): per-session broker presence (queued for 1.30.0)
records the design for daemon-multiplexed broker presence — every
launched claude session gets its own long-lived presence row owned
by the daemon, identified by a per-launch ephemeral keypair vouched
by the member's stable keypair.

resolves the "two sibling sessions can't see each other in peer list"
gap that surfaced when the bridge tier was deleted in 1.28.0. covers
state machine, broker session_hello handler, parent-attestation
signing, ipc route extension, sequencing (broker first, daemon
flagged, cli third), compat with older builds, and verification
smoke.

~440 loc estimate across cli + daemon + broker. queued for 1.30.0
alongside the launch-wizard refactor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:47:31 +01:00

Per-session broker presence — daemon-multiplexed

Status: spec, queued for 1.30.0 (alongside launch-wizard refactor).
Owner: alezmad
Author: Claude (Sprint A planning, 2026-05-04)
Related: 2026-05-04-v2-roadmap-completion.md (Sprint A overview), 1.29.0 session-registry CHANGELOG entry.

Problem

After 1.28.0 dropped the bridge tier, launched claude sessions have no persistent broker presence. Only the daemon does.

Concretely: two claudemesh launch sessions in the same cwd, querying peer list 2 s apart, never see each other. Each claudemesh peer list opens a short-lived cold-path WS that creates a presence row for the duration of the query and tears it down. The "this session" row everyone sees in their own snapshot is created by the snapshot itself; sibling sessions' queries miss it because their WS-lifetimes don't overlap.

Confirmed empirically (2026-05-04, same-cwd ECIJA-Intranet test):

Session   | Snapshot timestamp | self pubkey      | self connectedAt
Session A | 11:42:37Z          | 61d96106cb499208 | 11:42:38Z (= query time)
Session B | 11:42:39Z          | ce77188aba02827d | 11:42:38Z (= query time)

Each saw 5 long-lived peers (the daemon and unrelated other sessions) plus its own ephemeral row. Neither saw the other.

Goal

Every launched claude session has a long-lived broker presence row owned by the daemon, identified by the session's per-launch keypair. Siblings see each other in peer list immediately and continuously, not as snapshot artifacts.

Non-goals

  • Cross-machine session sync (waiting on 2.0.0 HKDF identity).
  • Replacing the daemon's own presence row — the daemon stays as a separate row for "the user on this machine, no specific session."
  • Persistence of the session-presence link across daemon restarts. A daemon restart may require launched sessions to re-register (the same compromise as the in-memory session registry from 1.29.0).

Design

State machine

The 1.29.0 session registry already tracks Map<token, SessionInfo> inside the daemon. Extend it to own a per-session broker connection.

session lifecycle:
  POST /v1/sessions/register
       → registry.set(token, info)
       → daemon.openSessionWs(info)         ← NEW
       → broker creates presence row owned by session.pubkey

  DELETE /v1/sessions/:token
       → registry.delete(token)
       → daemon.closeSessionWs(token)       ← NEW
       → broker marks presence.disconnectedAt = now()

  reaper (30 s tick): pid dead?
       → registry.delete(token)
       → daemon.closeSessionWs(token)
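
The lifecycle above can be sketched as a small registry that fires WS hooks on register, delete, and reap. This is a minimal sketch, not the daemon's actual code: the SessionInfo fields, the SessionWsHooks interface, and the injected isPidAlive probe are hypothetical names introduced for illustration. Only Map<token, SessionInfo>, openSessionWs, and closeSessionWs come from the spec.

```typescript
// Hypothetical shapes: the spec names only Map<token, SessionInfo>,
// openSessionWs and closeSessionWs. Everything else is illustrative.
interface SessionInfo {
  token: string;
  pid: number;
  sessionPubkey: string;
}

interface SessionWsHooks {
  openSessionWs(info: SessionInfo): void;
  closeSessionWs(token: string): void;
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionInfo>();
  private readonly hooks: SessionWsHooks;
  // Injected so the pid liveness probe can be replaced in tests.
  private readonly isPidAlive: (pid: number) => boolean;

  constructor(hooks: SessionWsHooks, isPidAlive: (pid: number) => boolean) {
    this.hooks = hooks;
    this.isPidAlive = isPidAlive;
  }

  // POST /v1/sessions/register
  register(info: SessionInfo): void {
    this.sessions.set(info.token, info);
    this.hooks.openSessionWs(info);
  }

  // DELETE /v1/sessions/:token. Returns false for an unknown token.
  // The WS is closed before the entry is dropped, as the IPC route
  // note in the Wire changes section asks.
  unregister(token: string): boolean {
    if (!this.sessions.has(token)) return false;
    this.hooks.closeSessionWs(token);
    this.sessions.delete(token);
    return true;
  }

  // One reaper tick: drop every session whose pid is gone.
  reap(): string[] {
    const dead: string[] = [];
    this.sessions.forEach((info, token) => {
      if (!this.isPidAlive(info.pid)) dead.push(token);
    });
    dead.forEach((token) => this.unregister(token));
    return dead;
  }

  has(token: string): boolean {
    return this.sessions.has(token);
  }
}
```

A daemon built this way would call reap() from a 30 s timer. Keeping the pid probe injectable lets the reaper path be exercised without spawning real processes.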

Daemon-side: per-session BrokerClient

Today the daemon holds Map<meshSlug, DaemonBrokerClient> (one WS per attached mesh). Add a parallel Map<token, SessionBrokerClient> for the per-launch ephemeral connections.

SessionBrokerClient is the existing BrokerClient reused, configured with the session's per-launch keypair instead of the member's stable keypair. It registers presence (presence_join) and stays connected until closeSessionWs(token) fires. It does not drain the outbox — that's the member-keypair DaemonBrokerClient's job. It only carries presence + receives DMs targeted at the session pubkey.

Broker-side: parent-vouched presence auth

Today's broker accepts hello-sig auth where:

  • Caller signs the broker's nonce with their mesh_member keypair.
  • Broker looks up mesh_member.peer_pubkey == sig.pubkey.

For per-session keypairs, the session pubkey is not in mesh_member — it's freshly generated by claudemesh launch. We need a new attestation flow:

hello {
  type: "session_hello",
  session_pubkey: <fresh keypair>,
  parent_member_pubkey: <member keypair from config>,
  display_name, cwd, role, groups,
  parent_signature: ed25519_sign(member_priv,
                                 "claudemesh-session/" || session_pubkey || "/" || nonce),
  nonce_challenge: <broker nonce>,
}

Broker validates:

  1. parent_member_pubkey exists in mesh.member for the target mesh.
  2. parent_signature validates against parent_member_pubkey over the canonical message above.
  3. Broker inserts a presence row keyed on session_pubkey but member_id pointing at the parent member's mesh.member.id.

This is the OAuth-style refresh-vs-access pattern: the parent member key vouches "this ephemeral session pubkey belongs to me." The broker binds the row to the parent member but uses the session pubkey for routing (so DMs targeted at the session pubkey land at this WS).
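
To make the binding concrete, here is a minimal in-memory sketch of a presence table keyed on the session pubkey while still recording the parent's member id. PresenceTable and its method names are hypothetical stand-ins for the broker's real storage, which this spec does not describe.

```typescript
// Hypothetical in-memory stand-in for the broker's presence table.
interface PresenceRow {
  sessionPubkey: string;        // routing key (per-launch, ephemeral)
  memberId: string;             // parent member's mesh.member.id
  connectedAt: Date;
  disconnectedAt: Date | null;  // set when the session WS closes
}

class PresenceTable {
  private readonly rows = new Map<string, PresenceRow>();

  // Called once session_hello has been validated.
  join(sessionPubkey: string, memberId: string, now: Date): PresenceRow {
    const row: PresenceRow = {
      sessionPubkey,
      memberId,
      connectedAt: now,
      disconnectedAt: null,
    };
    this.rows.set(sessionPubkey, row);
    return row;
  }

  // Called when the session WS closes.
  leave(sessionPubkey: string, now: Date): void {
    const row = this.rows.get(sessionPubkey);
    if (row && row.disconnectedAt === null) row.disconnectedAt = now;
  }

  // DM routing uses the session pubkey; only a connected row can receive.
  routeDm(targetPubkey: string): PresenceRow | undefined {
    const row = this.rows.get(targetPubkey);
    return row && row.disconnectedAt === null ? row : undefined;
  }

  // Ownership view: every live session that belongs to one member.
  liveSessionsOf(memberId: string): string[] {
    const out: string[] = [];
    this.rows.forEach((row) => {
      if (row.memberId === memberId && row.disconnectedAt === null) {
        out.push(row.sessionPubkey);
      }
    });
    return out;
  }
}
```

Two sibling sessions of the same member end up as two rows with distinct routing keys and a shared memberId, which is what lets peer list show both.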

CLI-side: launch.ts produces the parent signature

claudemesh launch already mints the session keypair and writes the session-token file. Extend it to also produce a parent_signature that the daemon can present when opening the session WS:

const sessionPubkey = sessionKeypair.publicKey;
const parentSig = ed25519_sign(
  mesh.secretKey,
  Buffer.concat([
    Buffer.from("claudemesh-session/"),
    sessionPubkey,
    Buffer.from("/"),
    /* nonce comes from broker — handled at WS-connect time */
  ]),
);

This does not work as written: the nonce is issued by the broker at hello time, so a nonce-bound parent signature has to be produced fresh on every WS connect, and launch.ts cannot pre-compute it. One option is for the POST /v1/sessions/register body to carry the member secret key (or a derived signing capability) so the daemon can sign nonces on behalf of the session.

That is a key-leak risk. Better: register carries a pre-signed attestation that is valid for a bounded TTL window:

register body adds:
  parent_attestation: {
    session_pubkey: hex,
    parent_member_pubkey: hex,
    expires_at: ISO,
    signature: ed25519_sign(member_priv,
                             "claudemesh-session-attest/" ||
                             session_pubkey || "/" ||
                             expires_at),
  }

Daemon presents this attestation in session_hello; broker validates expiry and signature, then issues a nonce challenge that the daemon can satisfy with the session keypair (which IS held by the daemon for the lifetime of the registration). Two-stage: parent vouches the session; session signs the nonce.
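
The two-stage handshake can be sketched with Node's built-in crypto module. This is a sketch under stated assumptions, not the broker's implementation: all function names are hypothetical, pubkeys are assumed to be raw 32-byte Ed25519 keys in hex, the canonical message is assumed to embed the hex form of the session pubkey, and the broker's mesh.member lookup is stood in for by a plain list of known member pubkeys.

```typescript
import {
  KeyObject, createPublicKey, generateKeyPairSync, randomBytes, sign, verify,
} from "node:crypto";

interface ParentAttestation {
  session_pubkey: string;        // hex (assumed: raw 32-byte Ed25519 key)
  parent_member_pubkey: string;  // hex (assumed: raw 32-byte Ed25519 key)
  expires_at: string;            // ISO 8601
  signature: string;             // hex
}

const MAX_TTL_MS = 24 * 60 * 60 * 1000;
// DER header that precedes a raw Ed25519 public key in SPKI encoding.
const ED25519_SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");

function rawPubHex(key: KeyObject): string {
  const der = key.export({ format: "der", type: "spki" });
  return der.subarray(der.length - 32).toString("hex");
}

function pubFromHex(hex: string): KeyObject {
  const der = Buffer.concat([ED25519_SPKI_PREFIX, Buffer.from(hex, "hex")]);
  return createPublicKey({ key: der, format: "der", type: "spki" });
}

// Assumption: the pubkey enters the canonical message in hex form.
function attestMessage(sessionPubHex: string, expiresAt: string): Buffer {
  return Buffer.from(`claudemesh-session-attest/${sessionPubHex}/${expiresAt}`, "utf8");
}

// CLI side: the member key vouches for the session key until expiresAt.
function signAttestation(
  memberPriv: KeyObject, memberPub: KeyObject, sessionPub: KeyObject, expiresAt: Date,
): ParentAttestation {
  const session_pubkey = rawPubHex(sessionPub);
  const expires_at = expiresAt.toISOString();
  const sig = sign(null, attestMessage(session_pubkey, expires_at), memberPriv);
  return {
    session_pubkey,
    parent_member_pubkey: rawPubHex(memberPub),
    expires_at,
    signature: sig.toString("hex"),
  };
}

// Broker stage 1: known parent, unexpired, TTL within 24h, valid signature.
function verifyAttestation(att: ParentAttestation, memberPubkeys: string[], now: Date): boolean {
  if (memberPubkeys.indexOf(att.parent_member_pubkey) === -1) return false;
  const expires = Date.parse(att.expires_at);
  if (isNaN(expires)) return false;
  if (expires <= now.getTime() || expires > now.getTime() + MAX_TTL_MS) return false;
  return verify(
    null,
    attestMessage(att.session_pubkey, att.expires_at),
    pubFromHex(att.parent_member_pubkey),
    Buffer.from(att.signature, "hex"),
  );
}

// Broker stage 2: the session key itself must sign the broker's nonce.
function verifyNonceSignature(att: ParentAttestation, nonce: Buffer, nonceSig: Buffer): boolean {
  return verify(null, nonce, pubFromHex(att.session_pubkey), nonceSig);
}

// End-to-end run with freshly generated keys.
function demoHandshake(now: Date): boolean {
  const member = generateKeyPairSync("ed25519");
  const session = generateKeyPairSync("ed25519");
  const expiresAt = new Date(now.getTime() + 60 * 60 * 1000);
  const att = signAttestation(member.privateKey, member.publicKey, session.publicKey, expiresAt);
  if (!verifyAttestation(att, [att.parent_member_pubkey], now)) return false;
  const nonce = randomBytes(32);
  return verifyNonceSignature(att, nonce, sign(null, nonce, session.privateKey));
}
```

The member private key is used exactly once, inside signAttestation on the CLI side. Everything the daemon does afterwards needs only the attestation and the session private key.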

Registry persistence

For now, in-memory only (matching 1.29.0). Daemon restart drops all session WSes; launched claude processes are responsible for re-registering on next CLI invocation. Acceptable v1 behaviour; revisit when sqlite persistence lands for the registry.

Wire changes

Broker

  • New session_hello message type (additive; existing hello for member auth unchanged).
  • presence row schema unchanged — member_id still required, but session_pubkey differs from member's stable pubkey.
  • Validate now() < parent_attestation.expires_at <= now() + 24h: reject expired attestations and bound attestation reuse to a 24h window.

Daemon

  • New SessionBrokerClient factory — wraps BrokerClient with session-mode hello.
  • Map<token, SessionBrokerClient> alongside the existing Map<slug, DaemonBrokerClient>.
  • IPC routes:
    • POST /v1/sessions/register — extend body schema with parent_attestation.
    • DELETE /v1/sessions/:token — close the session WS first, then drop registry entry.
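
A minimal sketch of the schema check the extended register route might run on the new parent_attestation field. The function name and the length rules are assumptions: the sketch expects full 32-byte Ed25519 pubkeys (64 hex characters) and 64-byte signatures (128 hex characters) in lowercase hex. It checks shape only and leaves signature and expiry validation to the broker.

```typescript
interface ParentAttestation {
  session_pubkey: string;
  parent_member_pubkey: string;
  expires_at: string;
  signature: string;
}

// Assumed encodings: lowercase hex, full-length Ed25519 keys and signatures.
const HEX_PUBKEY = /^[0-9a-f]{64}$/;
const HEX_SIGNATURE = /^[0-9a-f]{128}$/;

// Returns the typed attestation, or null when the body field is malformed.
function parseParentAttestation(value: unknown): ParentAttestation | null {
  if (typeof value !== "object" || value === null) return null;
  const v = value as Record<string, unknown>;
  const { session_pubkey, parent_member_pubkey, expires_at, signature } = v;
  if (typeof session_pubkey !== "string" || !HEX_PUBKEY.test(session_pubkey)) return null;
  if (typeof parent_member_pubkey !== "string" || !HEX_PUBKEY.test(parent_member_pubkey)) return null;
  if (typeof signature !== "string" || !HEX_SIGNATURE.test(signature)) return null;
  if (typeof expires_at !== "string" || isNaN(Date.parse(expires_at))) return null;
  return { session_pubkey, parent_member_pubkey, expires_at, signature };
}
```

Rejecting malformed input at the IPC boundary keeps the daemon from opening a session WS that the broker would refuse anyway.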

CLI (claudemesh launch)

  • Mint session keypair (today only writes the session token; need to add ed25519 keypair generation per launch and write the privkey alongside the token).
  • Sign parent_attestation with the member key from the joined-mesh config.
  • POST register with both the new keypair and the attestation.

LoC estimate

  • Daemon SessionBrokerClient + registry hook: ~120 LoC.
  • IPC route schema extension + validation: ~40 LoC.
  • Broker session_hello handler + tests: ~140 LoC.
  • CLI claudemesh launch keypair + attestation: ~60 LoC.
  • Tests + smoke: ~80 LoC.

Total: ~440 LoC across CLI + daemon + broker.

Risks

Risk | Mitigation
Member private key never leaves the user's machine, but the attestation (signed token) can be replayed within its TTL. | TTL bound 24h; refresh on launch; revocation path = drop the parent member's mesh enrollment (nuclear, but works).
Cascading WS connections: N launches = N+1 broker WSes per user. | Acceptable up to 10-20 concurrent sessions; if it ever becomes a problem, multiplex per-session at the protocol level (one WS, multiple presence rows). Out of scope for v1.
Daemon restart kills all session WSes: peer list from inside a launched session sees the remaining 5 peers but not its own siblings until they re-register. | Same as 1.29.0 registry. The registry could persist to sqlite later; for v1, accepted.
Broker schema cost: every new presence row has a different session_pubkey, growing the table faster. | Already accepted: broker prunes disconnected rows on a 30-day window. Per-session keys triple the row count at peak but stay within the prune budget.

Compatibility

  • Older brokers can't validate session_hello. Sessions will attempt the new hello, get back unknown_message_type, and fall back to the existing member-keyed hello (no per-session presence, but everything still works as 1.28.0). Add the broker change first, let it deploy, then ship the CLI side.
  • Older CLIs continue to work unchanged — they don't open per-session WSes. They appear as ephemeral cold-path rows just like today, and lose the symmetric-visibility property between siblings.
  • Mixed-version meshes: users on 1.30.0+ on the same mesh as users on ≤1.29.x will see each older user as one row (their daemon) instead of one row per session. Acceptable: older users opt in to the new visibility by upgrading.
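
The fallback against an older broker can be sketched as a small negotiation function. The HelloReply shape, the PresenceMode labels, and the two callbacks are hypothetical; only the unknown_message_type error string comes from this spec. Real WS I/O would be asynchronous, and the callbacks are synchronous here to keep the control flow visible.

```typescript
// Hypothetical reply shape; `error` is only meaningful when ok is false.
interface HelloReply {
  ok: boolean;
  error?: string;
}

type PresenceMode = "per_session" | "member_only";

function negotiateHello(
  sendSessionHello: () => HelloReply,
  sendMemberHello: () => HelloReply,
): PresenceMode {
  const first = sendSessionHello();
  if (first.ok) return "per_session";

  // Only an older broker that does not know the message type triggers
  // the fallback; any other rejection is a real failure.
  if (first.error !== "unknown_message_type") {
    throw new Error("session_hello rejected: " + String(first.error));
  }

  const second = sendMemberHello();
  if (!second.ok) {
    throw new Error("member hello rejected: " + String(second.error));
  }
  return "member_only";
}
```

Falling back only on unknown_message_type keeps a bad attestation from being silently downgraded to member-keyed presence.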

Sequencing

  1. Broker change ships first. Add session_hello handler, deploy, bake for ~24h. No CLI behaviour change yet.
  2. Daemon SessionBrokerClient ships next behind a feature flag (CLAUDEMESH_SESSION_PRESENCE=1). Manually test with two launched sessions in the same cwd; verify both see each other.
  3. CLI keypair-mint + attestation in launch.ts ships last, behind the same flag.
  4. Flip the flag default in 1.30.0 release; document rollback via env.
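
A possible shape for the flag check, as a sketch: the function name is hypothetical, and treating any explicit value other than "1" as "off" is an assumption, since the spec only says the flag is enabled with CLAUDEMESH_SESSION_PRESENCE=1 and that rollback goes through the environment.

```typescript
// defaultOn is false while the feature is flagged (steps 2 and 3) and
// true once the default is flipped in 1.30.0 (step 4).
function sessionPresenceEnabled(
  env: Record<string, string | undefined>,
  defaultOn: boolean,
): boolean {
  const raw = env["CLAUDEMESH_SESSION_PRESENCE"];
  // Unset or empty: fall back to the release default.
  if (raw === undefined || raw === "") return defaultOn;
  // Assumption: "1" enables, any other explicit value disables.
  return raw === "1";
}
```

Passing the environment in as a parameter lets the same function serve both the daemon and the CLI and keeps it trivially testable.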

Verification

End-to-end smoke (paste into 1.30.0's CHANGELOG):

$ # In two different shells, both cd ~/Desktop/foo:
$ claudemesh launch --name SessionA -y    # shell 1
$ claudemesh launch --name SessionB -y    # shell 2
$
$ # In a third shell:
$ claudemesh peer list --json --mesh foo | jq '.[] | {n: .displayName, c: .cwd}'
{ "n": "SessionA", "c": "/.../foo" }    ← persistent, not query-induced
{ "n": "SessionB", "c": "/.../foo" }
$
$ # In SessionA's shell:
$ claudemesh peer list --mesh foo
should include SessionB.
$
$ # Kill SessionB (Ctrl-C in shell 2). Wait at least 30 s (one full reaper tick).
$ claudemesh peer list --mesh foo
should NOT include SessionB (reaper closed its WS).

Open questions

  • Should the per-session WS also drain its own outbox subset, or stay presence-only? Recommend presence-only for v1 — keeps state machines simple, daemon's member-keyed WS handles all sends. Can be revisited when per-session policy DSL ships.
  • Should the parent attestation be revocable mid-session? Could add an IPC route on the daemon. Out of scope for v1; revoke = drop the whole member enrollment.