feat(cli): self-healing daemon lifecycle

every daemon-routed verb now probes the ipc socket via /v1/version
(instead of trusting existsSync), cleans up stale sock/pid files left
by a crashed daemon, and auto-spawns a detached `claudemesh daemon up`
under a file-lock when the daemon is down. polls for liveness up to a
budget (3s for ad-hoc verbs, 10s for launch) before falling through to
cold path.

includes a per-process result cache (script doing 50 sends pays spawn
cost at most once), a 30s recently-failed marker (no thundering-herd
retries on crash-loop), a spawn-lock (concurrent invocations share one
attempt), and a recursion guard env var (nested cli calls inside the
daemon process skip auto-spawn).

fixes the stale-socket bug where launch's ensureDaemonRunning returned
early on a left-over socket file from a crashed daemon, silently
breaking the spawned claude session's mcp shim.

deferred to 1.28.0: --strict / --no-daemon flags, lazy-loading of
cold-path code, per-session ipc tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Alejandro Gutiérrez
2026-05-04 11:17:32 +01:00
parent 8a5469a5df
commit 2b6cf2c14b
6 changed files with 396 additions and 49 deletions

View File

@@ -1,5 +1,54 @@
# Changelog
## 1.27.3 (2026-05-04) — self-healing daemon lifecycle
The CLI now auto-recovers from a dead daemon on every invocation
instead of silently mis-routing through a stale socket.
### What changed
- New `services/daemon/lifecycle.ts` — single helper that probes the
IPC socket via `/v1/version` (instead of trusting `existsSync`),
cleans up stale `daemon.sock` / `daemon.pid` files, and auto-spawns
a detached `claudemesh daemon up` under a file-lock when the daemon
is missing.
- Polls for socket liveness up to a budget (3 s for ad-hoc verbs,
10 s for `claudemesh launch`) before falling through.
- Recently-failed marker (`~/.claudemesh/daemon/.spawn-failure`,
30 s TTL) prevents thundering-herd retries when the daemon
crash-loops at startup.
- Spawn-lock (`~/.claudemesh/daemon/.spawn.lock`) ensures concurrent
CLI invocations share one spawn attempt instead of racing.
- Per-process result cache — a script doing 50 sends pays the spawn
cost at most once, not 50 times.
- Recursion guard via `CLAUDEMESH_INTERNAL_NO_AUTOSPAWN=1` env (set
on the spawned daemon's env) so nested CLI calls inside the daemon
process don't re-trigger spawn.
### User-visible behavior
- `peer list`, `send`, `state get`, etc. now restart the daemon
automatically when invoked while the daemon is down.
- One-line stderr info on auto-restart:
`[claudemesh] info daemon restarted automatically (took 615ms)`.
- Cold-path fallback fires only when auto-spawn fails or is
suppressed by the recently-failed marker; in those cases a `warn`
line points at the daemon log.
### Bug fixed
`claudemesh launch`'s `ensureDaemonRunning` previously checked only
`existsSync(SOCK_FILE)` and returned early on a stale socket left by
a crashed daemon — silently breaking new sessions. Now delegates to
the lifecycle helper which probes the socket and recovers.
### What's not in this patch
- `--strict` and `--no-daemon` flags (deferred to D in 1.28.0).
- Lazy-loading of cold-path code (deferred to 1.28.0).
- Per-session IPC tokens (deferred to 1.28.0 alongside D's
thin-client conversion).
## 1.27.2 (2026-05-04) — skill: full-flag launch templates
Documentation-only ship. `skills/claudemesh/SKILL.md` gains a canonical