claudemesh/packages/db/migrations/0029_drain_lease_and_presence_role.sql
Alejandro Gutiérrez 5a8db796a0 feat(db): m1 — message_queue claim lease + presence.role columns
Schema groundwork for v2 agentic-comms milestone 1.

mesh.message_queue gets three nullable columns (claimed_at, claim_id,
claim_expires_at) so drainForMember can move from "claim-and-deliver in
one UPDATE" to a two-phase claim/lease + recipient-ack model. This is
the at-least-once retry hook the broker has been missing.

mesh.presence gets a typed `role` column ('control-plane' | 'session'
| 'service') with default 'session' so legacy hellos keep working. The
CLI's hidden-daemon hack (peerType === 'claudemesh-daemon') will swap
to a role-based filter in a follow-up worktree.

Migration is hand-authored as 0029_*.sql to match the existing pattern
(drizzle-kit's _journal.json drifted long ago — the runtime migrator
in apps/broker/src/migrate.ts tracks files lexicographically via
mesh.__cmh_migrations, not the journal).
2026-05-04 18:10:04 +01:00

-- Milestone 1 (v2 agentic-comms architecture).
--
-- Two concerns rolled into one migration because both are tiny and both
-- ship together with the broker change in the same PR:
--
-- 1. message_queue claim/lease columns (drainForMember race fix)
-- --------------------------------------------------------------
-- Before this migration, drainForMember claimed rows by setting
-- `delivered_at = NOW()` inside the same UPDATE that selected them.
-- If the recipient WS was closed between claim-time and ws.send(),
-- the message was silently dropped — the row read as "delivered" so
-- the next reconnect's drain skipped it. At-most-once semantics with
-- no retry hook.
--
-- The fix moves to two-phase claim/deliver with a lease:
--   claimed_at:       set when drainForMember picks the row
--   claim_id:         presenceId of the claimer (for debugging)
--   claim_expires_at: claimed_at + 30s; if no `client_ack` lands by
--                     then, a sweeper clears the claim and the row
--                     becomes eligible for a new drain (at-least-once).
--
-- `delivered_at` is only set when the recipient WS replies with a
-- `client_ack` containing the original client_message_id. Until
-- daemons emit `client_ack`, claims simply expire and the rows are
-- re-delivered, which is the desired retry behaviour for unreliable
-- transports.
--
-- 2. presence.role column
-- --------------------------------------------------------------
-- The CLI currently hides daemon connections from `peer list` by
-- matching `peerType === 'claudemesh-daemon'`, which is fragile and
-- overloads a free-form field. M1 introduces a typed `role` column on
-- presence with three documented values:
--   'control-plane': long-lived daemon WS (one per host)
--   'session':       per-Claude-Code-session WS (default)
--   'service':       autonomous bots/services attached to a mesh
--
-- Existing rows are backfilled to 'session' by the column default, so
-- legacy presence rows keep their existing visibility. The two hello
-- paths in the broker pass 'control-plane' / 'session' explicitly. The
-- CLI-side filter swap (peerType -> role) is a follow-up worktree.

ALTER TABLE "mesh"."message_queue"
  ADD COLUMN "claimed_at" timestamp,
  ADD COLUMN "claim_id" text,
  ADD COLUMN "claim_expires_at" timestamp;

ALTER TABLE "mesh"."presence"
  ADD COLUMN "role" text NOT NULL DEFAULT 'session';
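
The statements below are not part of this migration. They are a minimal sketch of how the broker side might use the new columns: a claim, an ack, a sweep, and a role-based peer filter. The column names "id", "recipient_member_id" and "client_message_id" are assumptions made for illustration (the migration only confirms "delivered_at", "claimed_at", "claim_id", "claim_expires_at" and "role"), and the `$1` / `$2` placeholders stand for parameters bound by the database driver. The actual drainForMember queries may differ.

```sql
-- Phase 1 (claim): lease undelivered, unclaimed rows for one recipient.
-- ASSUMPTION: "id" and "recipient_member_id" are hypothetical column names.
-- $1 = recipient member id, $2 = presenceId of the claimer.
WITH candidates AS (
  SELECT "id"
  FROM "mesh"."message_queue"
  WHERE "recipient_member_id" = $1
    AND "delivered_at" IS NULL
    AND "claimed_at" IS NULL
  FOR UPDATE SKIP LOCKED          -- concurrent drains skip rows already locked
)
UPDATE "mesh"."message_queue" AS q
SET "claimed_at"       = NOW(),
    "claim_id"         = $2,
    "claim_expires_at" = NOW() + INTERVAL '30 seconds'
FROM candidates
WHERE q."id" = candidates."id"
RETURNING q.*;

-- Phase 2 (ack): mark a row delivered only once `client_ack` arrives.
-- ASSUMPTION: "client_message_id" is stored as a column on the queue row.
-- $1 = client_message_id from the ack, $2 = presenceId that holds the claim.
UPDATE "mesh"."message_queue"
SET "delivered_at" = NOW()
WHERE "client_message_id" = $1
  AND "claim_id" = $2
  AND "delivered_at" IS NULL;

-- Sweeper: clear expired leases so the rows become eligible for a new drain.
UPDATE "mesh"."message_queue"
SET "claimed_at"       = NULL,
    "claim_id"         = NULL,
    "claim_expires_at" = NULL
WHERE "delivered_at" IS NULL
  AND "claim_expires_at" <= NOW();

-- Role-based peer filter: hide daemon connections without matching peerType.
SELECT *
FROM "mesh"."presence"
WHERE "role" <> 'control-plane';
```

Because a row stays undelivered until the ack lands, a recipient may see the same message more than once after a lease expires, so consumers of this sketch would need to deduplicate on the client message id.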