claudemesh

Author	SHA1	Message	Date
Alejandro Gutiérrez	1a14cef1e0	feat(cli): 1.31.0 — session autoclean + broker verification + service path	2026-05-04 14:05:44 +01:00

Three operability fixes for users running the daemon under launchd or systemd.

PID-watcher autoclean
=====================

The session reaper already dropped registry entries with dead pids on a 30s loop, but it had two real-world gaps:

- the 30s sweep let stale presence linger on the broker for up to half a minute
- a bare process.kill(pid, 0) trusts a recycled pid, so a registry entry could outlive its real owner whenever the OS reassigned that pid number to a new program

Process-exit IPC from claude-code is best-effort and is skipped on SIGKILL, OOM kills, segfaults and panics, so it cannot replace the sweep.

Fix:

- New process-info.ts captures an opaque per-process start-time via ps -o lstart= (works on macOS and Linux, ~1 ms per call)
- registerSession stores the start-time alongside the pid
- reapDead drops an entry when the pid is dead OR its start-time has changed since registration
- Sweep cadence 30s -> 5s
- Best-effort fallback to bare liveness when start-time capture fails at register time

Registry hooks already close the per-session broker WS on deregister, so the peer list rebuilds within one sweep of any session exit.

Service-managed daemon: no more "spawn failed" false alarms
===========================================================

After claudemesh install (which writes a launchd plist or systemd unit with KeepAlive=true), users routinely saw

    [claudemesh] warn daemon spawn failed: socket did not appear within 3000ms

even when the daemon was running fine. Two causes contributed:

1. The probe timeout was 800ms. The first IPC after a launchd-driven restart can take longer (SQLite migration plus opening the broker WS connections) and tripped it. Bumped to 2500ms.
2. On a failed probe the CLI tried its own detached spawn, which collided with launchd's KeepAlive restart cycle (the singleton lock fails and the child exits). We then timed out polling for a socket that was actually about to come up.

Now, when the launchd plist or systemd unit exists, the CLI does not attempt a spawn. It waits up to 8s for the OS-managed unit to bring the socket up. A new service-not-ready state distinguishes "the OS hasn't restarted it yet" from "we tried to spawn and it failed".

Install verifies broker connectivity, not just process start
============================================================

Previously, install finished as soon as launchctl reported the unit loaded, so a daemon that boots but cannot reach the broker (blocked :443, expired TLS, DNS failure, broker outage) only surfaced on the user's first peer list or send.

/v1/health now includes per-mesh broker WS state. install polls it for up to 15s after the service boots and prints either "broker connected (mesh=...)" or a warning naming the meshes still in the connecting state, with a hint at common causes. The verification is best-effort and does not fail the install; it only surfaces the issue early.

Tests
=====

4 new vitest cases cover the reaper paths: dead pid, live pid with matching start-time, live pid with mismatched start-time (PID reuse), and the no-start-time fallback. 83 of 83 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

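A minimal TypeScript sketch of the PID-reuse check described in the 1.31.0 commit. Only the `ps -o lstart=` technique, `process.kill(pid, 0)` and the reapDead rules (dead pid, changed start-time, no-start-time fallback) come from the commit. The `SessionEntry` shape and the injectable liveness and start-time functions are hypothetical, added so the logic can be exercised without real processes. The actual process-info.ts and registry may look different.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical registry entry shape; not taken from the claudemesh source.
interface SessionEntry {
  sessionId: string;
  pid: number;
  startTime: string | null; // null when capture failed at register time
}

// Opaque start-time via `ps -o lstart= -p <pid>`; null if ps fails or prints nothing.
function captureStartTime(pid: number): string | null {
  try {
    const out = execFileSync("ps", ["-o", "lstart=", "-p", String(pid)], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    }).trim();
    return out.length > 0 ? out : null;
  } catch {
    return null; // ps exits non-zero when no such pid exists
  }
}

// Bare liveness: signal 0 checks for existence without delivering a signal.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Keeps entries whose pid still belongs to the registering process; reaps the rest.
function reapDead(
  entries: SessionEntry[],
  alive: (pid: number) => boolean = isPidAlive,
  startTimeOf: (pid: number) => string | null = captureStartTime,
): { kept: SessionEntry[]; reaped: SessionEntry[] } {
  const kept: SessionEntry[] = [];
  const reaped: SessionEntry[] = [];
  for (const entry of entries) {
    if (!alive(entry.pid)) {
      reaped.push(entry); // pid is dead
    } else if (entry.startTime === null) {
      kept.push(entry); // fallback: no start-time recorded, trust bare liveness
    } else if (startTimeOf(entry.pid) !== entry.startTime) {
      reaped.push(entry); // same pid, different process: the pid was reused
    } else {
      kept.push(entry);
    }
  }
  return { kept, reaped };
}
```

In this sketch, registration would call `captureStartTime(pid)` once and store the result, and the 5s sweep would call `reapDead` on the whole registry. Comparing the start-time string as an opaque value avoids having to parse the locale-dependent `lstart` format.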
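The service-aware decision from the same 1.31.0 commit can be sketched as follows. The 2500ms probe timeout, the 8s service wait, the 3000ms spawn budget and the `service-not-ready` state come from the commit text. Every dependency (probe, unit detection, spawn, clock) is a hypothetical injected function, and the 100ms poll interval is an assumption. This is an illustration, not the CLI's actual code.

```typescript
// Only "service-not-ready" is named in the commit; the other labels are illustrative.
type EnsureResult = "ready" | "service-not-ready" | "spawn-failed";

interface EnsureDeps {
  probe: (timeoutMs: number) => Promise<boolean>; // IPC liveness check
  serviceUnitInstalled: () => boolean; // launchd plist or systemd unit present
  spawnDetached: () => void; // ad-hoc `claudemesh daemon up`
  sleep: (ms: number) => Promise<void>;
  now: () => number;
}

const PROBE_TIMEOUT_MS = 2500; // was 800ms before 1.31.0
const SERVICE_WAIT_MS = 8000; // wait for the OS-managed unit
const SPAWN_WAIT_MS = 3000; // budget from the old "within 3000ms" warning
const POLL_INTERVAL_MS = 100; // assumption

async function pollUntilReady(deps: EnsureDeps, budgetMs: number): Promise<boolean> {
  const deadline = deps.now() + budgetMs;
  while (deps.now() < deadline) {
    if (await deps.probe(PROBE_TIMEOUT_MS)) return true;
    await deps.sleep(POLL_INTERVAL_MS);
  }
  return false;
}

async function ensureDaemon(deps: EnsureDeps): Promise<EnsureResult> {
  if (await deps.probe(PROBE_TIMEOUT_MS)) return "ready";
  if (deps.serviceUnitInstalled()) {
    // Never spawn alongside launchd/systemd: a second daemon would only lose
    // the singleton lock. Wait for the unit's own restart instead.
    return (await pollUntilReady(deps, SERVICE_WAIT_MS)) ? "ready" : "service-not-ready";
  }
  deps.spawnDetached();
  return (await pollUntilReady(deps, SPAWN_WAIT_MS)) ? "ready" : "spawn-failed";
}
```

Injecting `now` and `sleep` lets the waits run against a fake clock, so the 8s path can be tested instantly. The key design point is that the service branch returns without ever calling `spawnDetached`.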
Alejandro Gutiérrez	81f0e4f7ac	feat(cli): 1.28.0 — bridge deletion + daemon-policy flags	2026-05-04 12:23:04 +01:00

drop the orphaned bridge tier (~600 LoC). the client/server/protocol files are deleted; tryBridge had returned null in production for seven releases, ever since the 1.24.0 mcp shim rewrite stopped opening the sockets. each verb now has two paths: the daemon (with 1.27.3's auto-spawn), falling back to cold ws.

add per-process daemon policy: --strict (error instead of cold fallback) and --no-daemon (skip the daemon entirely). enforcement lives in withMesh, so a single chokepoint covers every verb. env equivalents: CLAUDEMESH_STRICT_DAEMON / CLAUDEMESH_NO_DAEMON; the flag wins over the env var.

net -394 loc; the daemon-up case ships ~600 loc lighter and the fallback story is one tier simpler. first sprint A drop; per-session ipc tokens and the wizard refactors follow in 1.29.0+.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

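A sketch of the 1.28.0 policy resolution and the two-path routing. The flags, the env var names and "flag wins" come from the commit. Several details are assumptions: the env vars are enabled by "1" or "true", and --no-daemon beats --strict if both flags are given, which the commit does not address. `routeVerb` is a hypothetical stand-in for the chokepoint the commit places in withMesh, not its real signature.

```typescript
type DaemonPolicy = "default" | "strict" | "no-daemon";

interface PolicyFlags {
  strict?: boolean; // --strict
  noDaemon?: boolean; // --no-daemon
}

// Assumption: "1" or "true" (any case) turns an env switch on.
function envOn(value: string | undefined): boolean {
  return value === "1" || value?.toLowerCase() === "true";
}

function resolveDaemonPolicy(flags: PolicyFlags, env: Record<string, string | undefined>): DaemonPolicy {
  // A command-line flag wins over the environment.
  if (flags.noDaemon) return "no-daemon";
  if (flags.strict) return "strict";
  if (envOn(env.CLAUDEMESH_NO_DAEMON)) return "no-daemon";
  if (envOn(env.CLAUDEMESH_STRICT_DAEMON)) return "strict";
  return "default";
}

// Hypothetical chokepoint: daemon path first, then the cold websocket path.
async function routeVerb<T>(
  policy: DaemonPolicy,
  viaDaemon: () => Promise<T | null>, // null means the daemon was unreachable
  viaColdWs: () => Promise<T>,
): Promise<T> {
  if (policy !== "no-daemon") {
    const result = await viaDaemon();
    if (result !== null) return result;
    if (policy === "strict") throw new Error("daemon unavailable and --strict forbids cold fallback");
  }
  return viaColdWs();
}
```

Putting the policy check in one routing function, rather than in each verb, is what makes a single chokepoint sufficient: every verb inherits --strict and --no-daemon for free.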
Alejandro Gutiérrez	2b6cf2c14b	feat(cli): self-healing daemon lifecycle	2026-05-04 11:17:32 +01:00

every daemon-routed verb now probes the ipc socket via /v1/version (instead of trusting existsSync), cleans up stale sock/pid files left by a crashed daemon, and auto-spawns a detached `claudemesh daemon up` under a file-lock when the daemon is down. it then polls for liveness within a budget (3s for ad-hoc verbs, 10s for launch) before falling through to the cold path.

includes a per-process result cache (a script doing 50 sends pays the spawn cost at most once), a 30s recently-failed marker (no thundering-herd retries during a crash-loop), a spawn-lock (concurrent invocations share one attempt), and a recursion-guard env var (nested cli calls inside the daemon process skip auto-spawn).

fixes the stale-socket bug where launch's ensureDaemonRunning returned early on a leftover socket file from a crashed daemon, silently breaking the spawned claude session's mcp shim.

deferred to 1.28.0: --strict / --no-daemon flags, lazy-loading of cold-path code, per-session ipc tokens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

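A sketch of how this commit's per-process result cache, 30s recently-failed marker and recursion guard could fit together. The `FailedMarkerStore` interface and the `CLAUDEMESH_IN_DAEMON` variable name are hypothetical; the commit mentions a recursion-guard env var without naming it. The real CLI also holds a cross-process file-lock, which is left out here.

```typescript
// Hypothetical name; the commit mentions a recursion-guard env var without naming it.
const GUARD_ENV = "CLAUDEMESH_IN_DAEMON";
const FAILED_MARKER_TTL_MS = 30_000; // from the commit

// In the CLI this would presumably be file-backed so other processes see it.
interface FailedMarkerStore {
  readFailedAt(): number | null;
  writeFailedAt(ts: number): void;
}

// Per-process cache: concurrent and repeated callers share one attempt.
let cachedAttempt: Promise<boolean> | null = null;

function resetProcessCache(): void {
  cachedAttempt = null;
}

async function attemptSpawn(
  spawnAndWait: () => Promise<boolean>,
  markers: FailedMarkerStore,
  now: () => number,
  env: Record<string, string | undefined>,
): Promise<boolean> {
  if (env[GUARD_ENV]) return false; // nested call inside the daemon: never spawn
  const failedAt = markers.readFailedAt();
  if (failedAt !== null && now() - failedAt < FAILED_MARKER_TTL_MS) {
    return false; // failed recently: go straight to the cold path
  }
  const ok = await spawnAndWait();
  if (!ok) markers.writeFailedAt(now());
  return ok;
}

function ensureDaemonOnce(
  spawnAndWait: () => Promise<boolean>,
  markers: FailedMarkerStore,
  now: () => number = Date.now,
  env: Record<string, string | undefined> = process.env,
): Promise<boolean> {
  if (cachedAttempt === null) {
    cachedAttempt = attemptSpawn(spawnAndWait, markers, now, env);
  }
  return cachedAttempt;
}
```

Caching the promise rather than the boolean means two verbs that start in the same tick await one spawn instead of racing to start two.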
Alejandro Gutiérrez	5785454ac9	feat: collapse mesh.name and mesh.slug into one identifier (v1.21.0)	2026-05-03 15:23:04 +01:00

Pre-launch fix: every visible surface already keyed on slug, so "name" was a parallel string whose only effect was to confuse users on rename ("I renamed but nothing visible changed"). Now slug IS the identifier. claudemesh rename <old> <new> is the whole rename surface. The PATCH /api/cli/meshes/:slug body becomes { slug }, and the route writes both columns to keep them in sync. Mesh create derives slug from input.name and stores name = slug. Pickers drop the (parens). The claudemesh slug verb, shipped 30 min earlier, is removed and merged into rename. The mesh.name DB column stays for now to avoid touching ~25 reader sites; a follow-up migration drops it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Alejandro Gutiérrez	ee12510ef1	refactor: rename cli-v2 → cli, archive legacy cli, plus broker-side grants + auto-migrate	2026-04-15 08:44:52 +01:00

- apps/cli/ is now the canonical CLI (was apps/cli-v2/).
- The legacy v0 CLI formerly in apps/cli/ was archived as branch 'legacy-cli-archive' and tag 'cli-v0-legacy-final' before deletion; git history preserves it too.
- .github/workflows/release-cli.yml paths updated.
- pnpm-lock.yaml regenerated.

Broker-side peer-grant enforcement (spec: 2026-04-15-per-peer-capabilities):

- 0020_peer-grants.sql adds a peer_grants jsonb column and a GIN index on mesh.member.
- handleSend in the broker fetches recipient grant maps once per send and silently drops messages when the sender lacks the required capability.
- POST /cli/mesh/:slug/grants updates grants from the CLI; new broker_messages_dropped_by_grant_total metric.
- CLI grant/revoke/block now mirror to the broker via syncToBroker.

Auto-migrate on broker startup:

- apps/broker/src/migrate.ts runs the drizzle migration under pg_advisory_lock before the HTTP server binds. It exits non-zero on failure so the Coolify healthcheck fails closed.
- Dockerfile copies packages/db/migrations into /app/migrations.
- postgres 3.4.5 added as a direct broker dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
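A sketch of the silent-drop rule in handleSend. It assumes the grant maps were already loaded in a single query per send, as the commit describes. Three details are assumptions not stated in the commit: the grant-map shape (recipient, then sender, then a list of capabilities), default-allow when a recipient has no entry for the sender, and the capability name "send". The counter stands in for the broker_messages_dropped_by_grant_total metric.

```typescript
// Hypothetical shape of a recipient's peer_grants jsonb: senderId -> capabilities.
type GrantMap = Record<string, string[]>;

// Stand-in for the broker_messages_dropped_by_grant_total counter.
const metrics = { droppedByGrantTotal: 0 };

// Returns the recipients that may receive the message; the others are dropped silently.
function allowedRecipients(
  senderId: string,
  requiredCapability: string,
  recipientIds: string[],
  grantsByRecipient: Map<string, GrantMap>, // loaded once per send, not per recipient
): string[] {
  const allowed: string[] = [];
  for (const recipientId of recipientIds) {
    const caps = grantsByRecipient.get(recipientId)?.[senderId];
    // Assumption: no entry means default-allow; an explicit list must contain the
    // capability, so an empty list behaves like a block.
    if (caps === undefined || caps.includes(requiredCapability)) {
      allowed.push(recipientId);
    } else {
      metrics.droppedByGrantTotal += 1; // the sender receives no error
    }
  }
  return allowed;
}
```

Loading every recipient's grants up front keeps the per-send cost at one round-trip regardless of fan-out. Counting drops in a metric is the only way to observe them, because the sender is deliberately not told.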

5 Commits