feat(cli,broker): stable session identity — fix ghost peers + lost DMs (1.35.0)

Session identity is now anchored on Claude Code's session UUID instead of a
fresh random keypair per launch. The ed25519 session keypair is generated
once per (mesh, session UUID) and persisted under
~/.claudemesh/sessions/<mesh>/<uuid>.json, so relaunching the same session,
or resuming it with --resume, reuses the same sessionPubkey.
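
A minimal sketch of the load-or-create pattern described above, not the
contents of keypair-store.ts. The function name, the JSON field names, the
hex/DER encoding, and the file permissions are all assumptions made for
illustration; only the <mesh>/<uuid>.json layout comes from this commit.

```typescript
import { generateKeyPairSync } from "node:crypto";
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical on-disk shape; the real store may use a different format.
interface StoredKeypair {
  publicKey: string; // hex-encoded SPKI DER
  secretKey: string; // hex-encoded PKCS#8 DER
}

// Hypothetical helper: returns the persisted keypair for (mesh, uuid),
// generating and saving one only on first use.
function loadOrCreateKeypairSketch(baseDir: string, mesh: string, uuid: string): StoredKeypair {
  const file = join(baseDir, mesh, `${uuid}.json`);
  if (existsSync(file)) {
    return JSON.parse(readFileSync(file, "utf8")) as StoredKeypair;
  }
  const { publicKey, privateKey } = generateKeyPairSync("ed25519");
  const keypair: StoredKeypair = {
    publicKey: publicKey.export({ type: "spki", format: "der" }).toString("hex"),
    secretKey: privateKey.export({ type: "pkcs8", format: "der" }).toString("hex"),
  };
  // Assumption: restrict access, since the secret key now lives on disk.
  mkdirSync(dirname(file), { recursive: true, mode: 0o700 });
  writeFileSync(file, JSON.stringify(keypair), { mode: 0o600 });
  return keypair;
}

// Two "launches" of the same session see the same public key.
const demoDir = mkdtempSync(join(tmpdir(), "keystore-"));
const firstLaunch = loadOrCreateKeypairSketch(demoDir, "mesh-a", "uuid-1");
const relaunch = loadOrCreateKeypairSketch(demoDir, "mesh-a", "uuid-1");
console.log(firstLaunch.publicKey === relaunch.publicKey); // true
```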

Why: a DM is sealed (crypto_box) to the recipient's sessionPubkey. With
ephemeral per-launch keys, the pubkey rotated on every relaunch, so queued
messages became undecryptable AND the old presence lingered as a same-name
ghost that won queued-DM claim races. Reconnecting could not recover the
peer because it minted yet another key. On --resume the CLI also registered
a throwaway random id unrelated to the resumed session, so the broker never
recognized the returning peer.
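
The failure mode can be illustrated with plain X25519 key agreement from
node:crypto, used here as a stand-in for crypto_box (which is built on the
same key agreement). This is a sketch of the reasoning only, not the CLI's
sealing code; all variable names are hypothetical.

```typescript
import { diffieHellman, generateKeyPairSync } from "node:crypto";

const sender = generateKeyPairSync("x25519");
const recipientLaunch1 = generateKeyPairSync("x25519");
// Ephemeral model: a relaunch mints an unrelated key.
const recipientLaunch2 = generateKeyPairSync("x25519");

// The sender seals to the pubkey it saw at send time (launch 1).
const sealedWith = diffieHellman({
  privateKey: sender.privateKey,
  publicKey: recipientLaunch1.publicKey,
});

// Stable identity: the relaunch still holds the launch-1 secret key.
const stableOpen = diffieHellman({
  privateKey: recipientLaunch1.privateKey,
  publicKey: sender.publicKey,
});

// Rotated identity: only the launch-2 secret key is available.
const rotatedOpen = diffieHellman({
  privateKey: recipientLaunch2.privateKey,
  publicKey: sender.publicKey,
});

console.log(sealedWith.equals(stableOpen)); // true: queued DM can be opened
console.log(sealedWith.equals(rotatedOpen)); // false: queued DM is undecryptable
```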

CLI (launch.ts):
- resolve the stable UUID on every launch path: a fresh launch mints a UUID
  and forces it via --session-id; --resume V registers V; --continue resolves
  the most-recent session UUID from ~/.claude/projects/<cwd>.
- use loadOrCreateSessionKeypair(mesh, uuid) instead of generateKeypair().
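
The three launch paths above could be sketched as follows. This is an
illustration, not the code in launch.ts: the LaunchMode type, the function
name, and the injected findMostRecent lookup are hypothetical, and throwing
when --continue finds no prior session is an assumption.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical model of how the CLI was invoked.
type LaunchMode =
  | { kind: "fresh" }
  | { kind: "resume"; sessionId: string }
  | { kind: "continue" };

interface ResolvedSession {
  uuid: string;
  // True when the CLI must pass --session-id so Claude Code adopts the UUID.
  forceSessionId: boolean;
}

// findMostRecent stands in for the lookup under ~/.claude/projects/<cwd>;
// its real implementation is not shown in this commit.
function resolveSessionUuidSketch(
  mode: LaunchMode,
  findMostRecent: () => string | undefined,
): ResolvedSession {
  switch (mode.kind) {
    case "fresh":
      return { uuid: randomUUID(), forceSessionId: true };
    case "resume":
      return { uuid: mode.sessionId, forceSessionId: false };
    case "continue": {
      const uuid = findMostRecent();
      if (!uuid) throw new Error("no prior session found for this cwd");
      return { uuid, forceSessionId: false };
    }
  }
}

console.log(resolveSessionUuidSketch({ kind: "resume", sessionId: "V" }, () => undefined).uuid); // V
```

Whatever UUID comes out of this step is what keys the persisted keypair, so
every path ends up with the same sessionPubkey for the same session.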

CLI (daemon/run.ts):
- onRegister closes any prior SessionBrokerClient holding the same pubkey
  under a different token (the leaked-WS ghost).
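
A sketch of the daemon-side rule, assuming clients are tracked in a map keyed
by token. ClientLike and registerClientSketch are hypothetical names; the
real onRegister and SessionBrokerClient may be structured differently.

```typescript
// Hypothetical minimal view of a broker client held by the daemon.
interface ClientLike {
  sessionPubkey: string;
  close(): void;
}

// Registers `next` under `token`, first closing any client that holds the
// same session pubkey under a different token. Returns the closed tokens.
function registerClientSketch(
  clients: Map<string, ClientLike>,
  token: string,
  next: ClientLike,
): string[] {
  const closedTokens: string[] = [];
  for (const [otherToken, client] of clients) {
    if (otherToken !== token && client.sessionPubkey === next.sessionPubkey) {
      client.close(); // drop the leaked WebSocket
      clients.delete(otherToken); // deleting during Map iteration is safe
      closedTokens.push(otherToken);
    }
  }
  clients.set(token, next);
  return closedTokens;
}

const registry = new Map<string, ClientLike>();
registerClientSketch(registry, "token-old", { sessionPubkey: "pkA", close: () => {} });
console.log(registerClientSketch(registry, "token-new", { sessionPubkey: "pkA", close: () => {} }));
// [ 'token-old' ]
```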

Broker (handleSessionHello):
- reattach by sessionPubkey regardless of lease state (online or grace),
  closing the stale socket — enforces one live presence per session pubkey,
  killing the duplicate and draining queued DMs on return.

Trade-off: session secret keys now persist on disk (the member key already
does); SPEC.md updated to reflect the stable-identity model. Older CLIs
remain compatible (they keep using ephemeral keys).

New: keypair-store.ts + 7 unit tests. Full CLI suite: 114/114 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Alejandro Gutiérrez
2026-06-02 12:59:36 +01:00
parent 589d050f81
commit 2b88784005
7 changed files with 337 additions and 8 deletions

@@ -2191,23 +2191,38 @@ async function handleSessionHello(
   // session leave.
   for (const [pid, oldConn] of connections) {
     if (oldConn.meshId !== hello.meshId) continue;
-    if (oldConn.leaseState !== "offline") continue;
     if (oldConn.sessionPubkey !== hello.sessionPubkey) continue;
+    // Same sessionPubkey = same logical session. The CLI now anchors the
+    // session keypair on Claude Code's session UUID and persists it, so a
+    // matching pubkey is always the same peer relaunching/resuming — never
+    // a coincidental collision. Reattach whether the old lease is in its
+    // 90s grace window OR still nominally "online" (a duplicate/relaunch
+    // that raced ahead of the old socket's close). The new WS is
+    // authoritative: cancel any eviction timer, close the stale socket if
+    // it differs, swap in the new WS, restore online. This is the "one
+    // presence per session pubkey" invariant — it kills the same-name
+    // ghost that used to win queued-DM claim races.
+    const wasState = oldConn.leaseState;
     if (oldConn.evictionTimer) {
       clearTimeout(oldConn.evictionTimer);
       oldConn.evictionTimer = null;
     }
+    if (oldConn.ws !== ws) {
+      try { oldConn.ws.close(1000, "session_replaced"); } catch { /* already dead */ }
+    }
     oldConn.ws = ws;
     oldConn.leaseState = "online";
     oldConn.leaseUntil = 0;
     oldConn.lastPongAt = Date.now();
     // Refresh mutable fields from the new hello.
     oldConn.sessionId = hello.sessionId;
     oldConn.cwd = hello.cwd;
     if (hello.displayName) oldConn.displayName = hello.displayName;
-    log.info("session_hello reattach (lease)", {
+    log.info("session_hello reattach", {
       presence_id: pid,
       session_pubkey: hello.sessionPubkey.slice(0, 12),
+      was: wasState,
     });
     void restorePresence(pid);
     void maybePushQueuedMessages(pid);