Both sides now actively detect half-dead WS connections instead of
waiting for kernel TCP keepalive (~2hrs default on Linux). Bug user
reported: "claudemesh peer list" shows zero peers despite running
sessions, because NAT/CGNAT silently dropped the WS flow but neither
side noticed.
Broker (apps/broker/src/index.ts):
- Add lastPongAt to PeerConn, populate at connections.set sites,
bump in ws.on("pong").
- 30s ping loop now also terminates conns whose pong is >75s stale.
ws.terminate() fires the close handler → existing peer_left path.
Daemon (apps/cli/src/daemon/ws-lifecycle.ts):
- Add idle watchdog at 30s cadence, started after hello-ack.
- Bumps lastActivity on incoming message, ping, and pong frames.
- Sends sock.ping() if recent activity, terminates if idle >75s.
- Watchdog cleared on close handler + explicit close().
CLI 1.34.15 → 1.34.16. Broker stays 0.1.0 (deploys from main).
Spec: .artifacts/specs/2026-05-05-continuous-presence.md (full lease
model + resume token, this commit ships only the watchdogs — first
of four progressive layers).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Five info-level log points across the WS lifecycle helper:
ws_open_attempt / ws_open_ok / ws_hello_sent / ws_hello_acked /
ws_closed (with status + close code/reason).
Surfaced during M1 smoke testing — without these the only visible
signal was "presence row missing on broker," which made it hard to
distinguish "WS never opened" / "opened but hello rejected" /
"acked then closed by broker."
Both clients prefix the helper-emitted msg ("session_broker_*",
"broker_*") so log greps stay clean per role.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Foundational cleanups before agentic-comms architecture work
(.artifacts/specs/2026-05-04-agentic-comms-architecture-v2.md).
All behavior-preserving.
1. Extract `connectWsWithBackoff` into apps/cli/src/daemon/ws-lifecycle.ts.
Both DaemonBrokerClient and SessionBrokerClient now share one
lifecycle implementation (connect, hello-handshake, ack-timeout,
close + backoff reconnect). Each client provides its own buildHello
/ isHelloAck / onMessage hooks and keeps its own RPC bookkeeping
(pendingAcks, peerListResolvers, onPush). Composition over
inheritance per Codex's review; no protocol shape changes.
2. Drop daemon-WS ephemeral session pubkey. DaemonBrokerClient no
longer mints + sends a per-reconnect ephemeral keypair in its
hello. Session-targeted DMs land on SessionBrokerClient since
1.32.1, not the member-keyed daemon-WS, so the field was
vestigial. Send-encrypt path now signs DMs with the stable mesh
member secret. handleBrokerPush invocations from daemon-WS only
pass the member secret — session decryption is the session-WS's
job.
3. Role-aware peer list. `peer list` now hides peers whose
broker-emitted `role` is `'control-plane'`. `--all` opts back in.
JSON output emits `role` at top level. Older brokers that don't
emit role yet default to 'session', so legacy peer rows stay
visible without the broker-side change shipped first. Replaces
the prior `peerType === 'claudemesh-daemon'` channel-name hack.
Typecheck + tests + build all green.