Three operability fixes for users running the daemon under launchd or
systemd.
PID-watcher autoclean
=====================
The session reaper already dropped registry entries with dead pids on
a 30s loop, but had two real-world gaps:
- The 30s sweep let stale presence linger on the broker for up to 30s
- bare process.kill(pid, 0) trusts a recycled pid; a registry entry
could survive its real owner's death whenever the OS rolled the
pid number forward to a new program
Process-exit IPC from claude-code is best-effort and skipped on
SIGKILL / OOM / segfault / panic, so it cannot replace the sweep.
Fix:
- New process-info.ts captures opaque per-process start-times via
ps -o lstart= (works on macOS and Linux, ~1 ms per call)
- registerSession stores the start-time alongside the pid
- reapDead drops entries when pid is dead OR start-time changed
since register
- Sweep cadence 30s -> 5s
- Best-effort fallback to bare liveness when start-time capture
fails at register time
Registry hooks already close the per-session broker WS on
deregister, so the peer list rebuilds within one sweep (5s) of any
session exit.
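
The reaper rule above can be sketched roughly as follows. This is an
illustration, not the code in process-info.ts: the SessionEntry and
ProcessProbe shapes are hypothetical, and the choice to keep a live
pid whose start-time cannot be re-read at sweep time is an assumption.

```typescript
// Hypothetical shapes for illustration; the real registry types may differ.
interface SessionEntry {
  pid: number;
  // Opaque start-time captured at register time (e.g. from `ps -o lstart=`);
  // undefined when capture failed at register time.
  startTime?: string;
}

// Injected so the rule can be exercised without real processes.
interface ProcessProbe {
  isAlive(pid: number): boolean; // e.g. process.kill(pid, 0) in try/catch
  startTimeOf(pid: number): string | undefined; // e.g. ps -o lstart= -p <pid>
}

function shouldReap(entry: SessionEntry, probe: ProcessProbe): boolean {
  // Dead pid: always reap.
  if (!probe.isAlive(entry.pid)) return true;
  // No start-time recorded: fall back to bare liveness.
  if (entry.startTime === undefined) return false;
  const current = probe.startTimeOf(entry.pid);
  // Assumption: if the start-time cannot be read now, keep the entry
  // rather than reap a live pid on missing evidence.
  if (current === undefined) return false;
  // Live pid with a different start-time means the pid was recycled.
  return current !== entry.startTime;
}

function reapDead(entries: SessionEntry[], probe: ProcessProbe): SessionEntry[] {
  // Returns the surviving entries.
  return entries.filter((e) => !shouldReap(e, probe));
}
```

Comparing the start-time as an opaque string avoids parsing the
locale-dependent ps output; only equality matters.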
Service-managed daemon: no more "spawn failed" false alarms
===========================================================
After claudemesh install (which writes a launchd plist or systemd
unit set to restart automatically, e.g. KeepAlive=true on launchd),
users routinely saw
    [claudemesh] warn daemon spawn failed: socket did not appear within 3000ms
even when the daemon was running fine. Two contributing causes:
1. Probe timeout was 800ms — the first IPC after a launchd-driven
restart can take longer (SQLite migration + broker WS opens) and
tripped it. Bumped to 2500ms.
2. On a failed probe the CLI tried its own detached spawn, which
collided with launchd's KeepAlive restart cycle (singleton lock
fails, child exits) and we'd then time out polling for a socket
that was actually about to come up.
Now: when the launchd plist or systemd unit exists, the CLI does not
attempt a spawn. It waits up to 8s for the OS-managed unit to bring
the socket up. New service-not-ready state distinguishes "OS hasn't
restarted it yet" from "we tried to spawn and it failed".
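
As a rough sketch of the decision described above (not the actual CLI
code): DaemonEnv and ensureDaemon are hypothetical names, the helpers
are synchronous only to keep the example short, and the 3000ms spawn
wait is taken from the warning text quoted earlier.

```typescript
type DaemonState = "ready" | "service-not-ready" | "spawn-failed";

// Hypothetical environment, injected so the flow is testable.
interface DaemonEnv {
  serviceManaged: boolean; // a launchd plist or systemd unit exists
  probe(timeoutMs: number): boolean; // true when the daemon answers IPC
  waitForSocket(timeoutMs: number): boolean; // true once the socket appears
  spawnDetached(): void; // the CLI's own detached spawn
}

const PROBE_TIMEOUT_MS = 2500; // was 800
const SERVICE_WAIT_MS = 8000; // wait for the OS-managed unit
const SPAWN_WAIT_MS = 3000; // wait after our own spawn

function ensureDaemon(env: DaemonEnv): DaemonState {
  if (env.probe(PROBE_TIMEOUT_MS)) return "ready";
  if (env.serviceManaged) {
    // Never spawn here: it would collide with the service manager's
    // restart cycle and lose the singleton lock.
    return env.waitForSocket(SERVICE_WAIT_MS) ? "ready" : "service-not-ready";
  }
  env.spawnDetached();
  return env.waitForSocket(SPAWN_WAIT_MS) ? "ready" : "spawn-failed";
}
```

Keeping the two failure states separate lets the caller print a
different hint for each instead of one generic spawn warning.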
Install verifies broker connectivity, not just process start
============================================================
Previously install ended once launchctl reported the unit loaded —
a daemon that boots but cannot reach the broker (blocked :443,
expired TLS, DNS, broker outage) only surfaced on the user's first
peer list or send.
/v1/health now includes per-mesh broker WS state. install polls it
for up to 15s after service boot and prints either "broker
connected (mesh=...)" or a warning naming the meshes still in
connecting state, with a hint at common causes.
The verification is best-effort and does not fail the install — it
just surfaces the issue early.
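
A minimal sketch of how install might turn a health response into one
of those two messages. The HealthResponse shape is an assumption (the
real /v1/health fields are not shown here), and summarizeBrokerHealth
is a hypothetical helper name.

```typescript
// Assumed payload shape: mesh name -> broker WS state.
interface HealthResponse {
  meshes: Record<string, "connected" | "connecting">;
}

interface BrokerSummary {
  ok: boolean;
  message: string;
}

function summarizeBrokerHealth(health: HealthResponse): BrokerSummary {
  const names = Object.keys(health.meshes);
  if (names.length === 0) {
    return { ok: false, message: "warning: no meshes reported yet" };
  }
  const pending = names.filter((m) => health.meshes[m] !== "connected");
  if (pending.length === 0) {
    return { ok: true, message: `broker connected (mesh=${names.join(",")})` };
  }
  return {
    ok: false,
    message:
      `warning: still connecting: ${pending.join(", ")} ` +
      "(common causes: blocked :443, expired TLS, DNS, broker outage)",
  };
}
```

The caller would poll until ok is true or 15s elapse, print the last
message, and exit 0 either way since the check is best-effort.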
Tests
=====
4 new vitest cases cover the reaper paths: dead pid, live pid plus
matching start-time, live pid plus mismatched start-time (PID
reuse), and the no-start-time fallback. 83 of 83 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
whoami --json exits with EXIT.AUTH_FAILED (=2) when not signed in.
The JSON output is the contract under test, valid regardless of exit
code — execSync was throwing on exit 2 so the assertion never ran.
Switch to spawnSync, accept {0,2}, parse stdout independently.
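
The pattern looks roughly like this. runJson is a hypothetical helper,
not the actual test code; spawnSync reports a non-zero exit in
res.status instead of throwing, which is what lets stdout be parsed
regardless of the exit code.

```typescript
import { spawnSync } from "node:child_process";

interface JsonRun {
  status: number;
  json: unknown;
}

// Runs a command, accepts any exit code in okCodes, and parses stdout
// as JSON independently of which accepted code was returned.
function runJson(cmd: string, args: string[], okCodes: number[]): JsonRun {
  const res = spawnSync(cmd, args, { encoding: "utf8" });
  if (res.error) throw res.error; // e.g. command not found
  if (res.status === null || !okCodes.includes(res.status)) {
    throw new Error(`unexpected exit code ${String(res.status)}`);
  }
  return { status: res.status, json: JSON.parse(res.stdout) };
}
```

In the whoami case the call would pass [0, 2] as okCodes and then
assert on the parsed object.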
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Daemon-side half of 1.30.0 per-session broker presence, behind
CLAUDEMESH_SESSION_PRESENCE=1 (default OFF this cycle so the broker
side bakes before the flag flips).
- SessionBrokerClient (apps/cli/src/daemon/session-broker.ts) — slim
WS that opens with session_hello, presence-only, no outbox drain.
- session-hello-sig.ts — signParentAttestation (12h TTL, ≤24h cap) and
signSessionHello, mirroring the broker canonical formats.
- session-registry: optional presence field on SessionInfo;
setRegistryHooks for onRegister/onDeregister callbacks. Hook errors
are caught so they can never block registry mutations.
- IPC POST /v1/sessions/register accepts the presence material under
body.presence (session_pubkey, session_secret_key, parent_attestation).
Older callers that omit it stay scoped and supported.
- run.ts wires the registry hooks: on register, opens a SessionBrokerClient
for the matching mesh; on deregister (explicit or reaper), closes it.
Shutdown closes any remaining session WSes before the IPC server.
8 new unit tests cover registry lifecycle (replace/throw/presence
roundtrip) and signature canonical-bytes verification against libsodium.
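
A minimal sketch of the hook isolation described above. Keying
sessions by a token field, and firing onDeregister for the replaced
entry, are assumptions made for illustration; only setRegistryHooks,
registerSession and SessionInfo are names taken from this change.

```typescript
interface SessionInfo {
  token: string; // assumed registry key
  pid: number;
  presence?: { sessionPubkey: string }; // optional, simplified here
}

interface RegistryHooks {
  onRegister?: (info: SessionInfo) => void;
  onDeregister?: (info: SessionInfo) => void;
}

let hooks: RegistryHooks = {};
const sessions = new Map<string, SessionInfo>();

function setRegistryHooks(next: RegistryHooks): void {
  hooks = next;
}

// Hook failures are swallowed so the registry mutation always completes.
function safeCall(fn: ((info: SessionInfo) => void) | undefined, info: SessionInfo): void {
  if (!fn) return;
  try {
    fn(info);
  } catch {
    // A real implementation would likely log this.
  }
}

function registerSession(info: SessionInfo): void {
  const previous = sessions.get(info.token);
  if (previous) safeCall(hooks.onDeregister, previous); // replace path
  sessions.set(info.token, info);
  safeCall(hooks.onRegister, info);
}

function deregisterSession(token: string): boolean {
  const existing = sessions.get(token);
  if (!existing) return false;
  sessions.delete(token);
  safeCall(hooks.onDeregister, existing);
  return true;
}

function getSession(token: string): SessionInfo | undefined {
  return sessions.get(token);
}
```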
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- apps/cli/ is now the canonical CLI (was apps/cli-v2/).
- The legacy v0 CLI that previously lived in apps/cli/ is archived as
branch 'legacy-cli-archive' and tag 'cli-v0-legacy-final' before
deletion; git history preserves it too.
- .github/workflows/release-cli.yml paths updated.
- pnpm-lock.yaml regenerated.
Broker-side peer-grant enforcement (spec: 2026-04-15-per-peer-capabilities):
- 0020_peer-grants.sql adds peer_grants jsonb + GIN index on mesh.member.
- handleSend in broker fetches recipient grant maps once per send, drops
messages silently when sender lacks the required capability.
- POST /cli/mesh/:slug/grants to update from CLI; broker_messages_dropped_by_grant_total metric.
- CLI grant/revoke/block now mirror to broker via syncToBroker.
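
The send-time check might look something like the sketch below. The
grant-map shape (sender id mapped to a list of capabilities) and the
treatment of a sender with no entry as lacking the capability are both
assumptions; the real peer_grants layout and default policy are
defined by the spec, not by this example.

```typescript
// Assumed shape: per recipient, sender id -> capabilities granted.
type GrantMap = Record<string, string[]>;

interface Recipient {
  id: string;
  grants: GrantMap;
}

interface GrantFilterResult {
  deliver: Recipient[];
  dropped: number; // would feed a counter such as the dropped-by-grant metric
}

function filterByGrants(
  senderId: string,
  required: string,
  recipients: Recipient[],
): GrantFilterResult {
  const deliver: Recipient[] = [];
  let dropped = 0;
  for (const recipient of recipients) {
    // Assumption: a missing entry means no capabilities were granted.
    const caps = recipient.grants[senderId] ?? [];
    if (caps.includes(required)) {
      deliver.push(recipient);
    } else {
      dropped += 1; // silent drop: the sender gets no error
    }
  }
  return { deliver, dropped };
}
```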
Auto-migrate on broker startup:
- apps/broker/src/migrate.ts runs drizzle migrate with pg_advisory_lock
before the HTTP server binds. Exits non-zero on failure so Coolify
healthcheck fails closed.
- Dockerfile copies packages/db/migrations into /app/migrations.
- postgres 3.4.5 added as direct broker dep.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>