claudemesh/.artifacts/specs/2026-05-03-daemon-spec-v0.9.0.md
Alejandro Gutiérrez abaa4bcf87 feat(cli): claudemesh daemon — peer mesh runtime (v0.9.0)
Long-lived process that holds a persistent WS to the broker and exposes
a local IPC surface (UDS + bearer-auth TCP loopback). Implements the
v0.9.0 spec under .artifacts/specs/.

Core:
- daemon up | status | version | down | accept-host
- daemon outbox list [--failed|--pending|--inflight|--done|--aborted]
- daemon outbox requeue <id> [--new-client-id <id>]
- daemon install-service / uninstall-service (macOS launchd, Linux systemd)

IPC routes:
- /v1/version, /v1/health
- /v1/send  (POST)  — full §4.5.1 idempotency lookup table
- /v1/inbox (GET)   — paged history
- /v1/events        — SSE stream of message/peer_join/peer_leave/broker_status
- /v1/peers         — broker passthrough
- /v1/profile       — summary/status/visible/avatar/title/bio/capabilities
- /v1/outbox + /v1/outbox/requeue — operator recovery

Storage (SQLite via node:sqlite / bun:sqlite):
- outbox.db: pending/inflight/done/dead/aborted with audit columns
- inbox.db: dedupe by client_message_id, decrypts DMs via existing crypto
- BEGIN IMMEDIATE serialization for daemon-local accept races

Identity:
- host_fingerprint.json (machine-id || first-stable-mac)
- refuse-on-mismatch policy with `daemon accept-host` recovery

CLI integration:
- claudemesh send detects the daemon and routes through /v1/send when
  present, falling back to bridge socket / cold path otherwise

Tests: 15-case coverage of the §4.5.1 IPC duplicate lookup table.

Spec arc preserved at .artifacts/specs/2026-05-03-daemon-{v1..v10}.md;
v0.9.0 implementation target locked at 2026-05-03-daemon-spec-v0.9.0.md;
deferred items at 2026-05-03-daemon-spec-broker-hardening-followups.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:03:05 +01:00


claudemesh daemon — Implementation spec v0.9.0

Implementation target. Locked from the v1 through v10 codex-reviewed spec series. This document is what we build for v0.9.0 of the daemon.

Base: v6 (the round where the architecture passed codex's structural review — request_fingerprint, dedupe table, atomicity contract, feature-bit negotiation, key archive format).

Pulled in from v7 through v9: six cheap, load-bearing fixes that close real v0.9.0-era bugs (not future-scale concerns):

  1. aborted outbox status + audit columns (operator recovery without destroying audit trail) — v7 §4.5.2
  2. BEGIN IMMEDIATE for daemon-local SQLite serialization (v6's SELECT FOR UPDATE is invalid SQLite anyway) — v7 §4.5.1
  3. Daemon-local IPC duplicate lookup table over outbox states × fingerprint match/mismatch — v8 §4.5.1
  4. Phase B1/B2/B3 broker validation split (the concept; we don't need the elaborate phase tables) — v7 §4.6.2
  5. Side-effect inventory (in-tx vs async) as an implementation comment block — v8 §4.7.1
  6. Two-layer ID model wording: daemon-consumed iff outbox row, broker-consumed iff dedupe row — v9 §4.1

Deferred to broker-hardening followups (see 2026-05-03-daemon-spec-broker-hardening-followups.md for the full list and rationale): B0 dedupe fast-path, Lua-scripted idempotent rate limiter, in-tx mention_index, 4011/4012 close-code split, per-OS fingerprint precedence table, request-fingerprint schema-v2 in feature negotiation. These are real improvements but not v0.9.0 blockers; they land as the broker matures.

Intent §0 unchanged from v2.


0. Intent — unchanged, see v2 §0


1. Process model — unchanged from v3 §1 / v2 §1


2. Identity — unchanged from v5 §2


3. IPC surface — unchanged from v4 §3


4. Delivery contract — at-least-once with request-fingerprinted dedupe

Codex r5: dedupe must compare the whole request shape, not just (mesh, client_message_id). Otherwise a caller who reuses an idempotency key with a different destination or body silently drops the new send and gets the old send's metadata back.

4.1 The contract (precise)

Two-layer ID rule (from v9): a client_message_id is daemon-consumed iff an outbox row exists for it; broker-consumed iff a dedupe row exists in mesh.client_message_dedupe. The two layers are independent: a daemon-consumed id may or may not be broker-consumed (depending on whether the send reached broker commit). In v0.9.0 there are no daemon-bypass clients, so for practical purposes "daemon-consumed" is the operative rule.

Local guarantee: each successful POST /v1/send returns a stable client_message_id. The send is durably persisted to outbox.db before the response returns. The daemon enforces request-fingerprint idempotency at the IPC layer (§4.5).

Local audit guarantee: a client_message_id once written to outbox.db is never released. Operator recovery via requeue always mints a fresh id; the old row stays in aborted for audit. There is no daemon-side path to free a used id.

Broker guarantee: the broker maintains a dedupe record per accepted (mesh_id, client_message_id) in mesh.client_message_dedupe. Each dedupe record carries a canonical request_fingerprint. Retries with the same id AND matching fingerprint collapse to the original broker_message_id. Retries with mismatched fingerprint return 409 idempotency_key_reused and do not create a new message.

Atomicity guarantee: dedupe row insertion, message row insertion, and history row insertion happen in one broker DB transaction. Either all land, or none do. No orphan dedupe rows.

End-to-end guarantee: at-least-once delivery, with client_message_id propagated to receivers' inboxes.

4.2 Daemon-supplied client_message_id — unchanged from v3 §4.2

4.3 Broker schema — request fingerprint added (v6)

CREATE TABLE mesh.client_message_dedupe (
  mesh_id              UUID    NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
  client_message_id    TEXT    NOT NULL,

  -- The original accepted message; FK NOT enforced because the message row
  -- may be GC'd by retention sweeps before the dedupe row expires.
  broker_message_id    UUID    NOT NULL,

  -- Canonical fingerprint of the original request. Recomputed on every
  -- duplicate retry; mismatch → 409 idempotency_key_reused. Schema in §4.4.
  request_fingerprint  BYTEA   NOT NULL,                    -- 32-byte sha256

  destination_kind     TEXT    NOT NULL CHECK(destination_kind IN ('topic','dm','queue')),
  destination_ref      TEXT    NOT NULL,
  first_seen_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at           TIMESTAMPTZ,                          -- NULL = `permanent` mode
  history_available    BOOLEAN NOT NULL DEFAULT TRUE,        -- flipped FALSE when message row GC'd

  PRIMARY KEY (mesh_id, client_message_id)
);

CREATE INDEX client_message_dedupe_expires_idx
  ON mesh.client_message_dedupe(expires_at)
  WHERE expires_at IS NOT NULL;

ALTER TABLE mesh.topic_message ADD COLUMN client_message_id TEXT;
ALTER TABLE mesh.message_queue ADD COLUMN client_message_id TEXT;

status column dropped (codex r5). Rejected requests do not consume idempotency keys. Rationale below in §4.6.

4.4 Request fingerprint — canonical form (NEW v6)

The fingerprint covers everything that makes a send semantically distinct. A retry must reproduce the same fingerprint bit-for-bit; anything else is a different send and must not be collapsed.

request_fingerprint = sha256(
  envelope_version || 0x00 ||
  destination_kind || 0x00 ||
  destination_ref  || 0x00 ||
  reply_to_id_or_empty || 0x00 ||
  priority         || 0x00 ||
  meta_canonical_json || 0x00 ||
  body_hash
)

Where:

  • envelope_version: integer string (e.g. "1"). Bumps when the envelope shape changes.
  • destination_kind: topic, dm, or queue.
  • destination_ref: topic name, recipient ed25519 pubkey hex, or queue id.
  • reply_to_id_or_empty: original broker_message_id or empty string.
  • priority: now, next, or low.
  • meta_canonical_json: the meta field, serialized with sorted keys, no whitespace, escape-canonical (RFC 8785 JCS). Empty meta = empty string.
  • body_hash: sha256(body bytes), hex.

The fingerprint is computed:

  1. Daemon-side before durable outbox persistence — stored as outbox.request_fingerprint (NEW column) so retries always produce the same fingerprint regardless of caller behavior.
  2. Broker-side on first receipt — stored in client_message_dedupe.request_fingerprint.
  3. Broker-side on every duplicate retry — recomputed and compared byte-equal to the stored value.

If the daemon and broker disagree on the canonical form (e.g. JCS implementation drift), the broker emits cm_broker_dedupe_fingerprint_mismatch_total{client_id, mesh_id} and returns 409 idempotency_key_reused with a body that includes the broker's fingerprint hex for debugging. Daemons that see this should log it loudly and stop retrying that outbox row (it goes to dead).
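A minimal Python sketch of the fingerprint computation described above, for illustration only. It assumes every field is UTF-8 encoded before concatenation, and it approximates RFC 8785 with `json.dumps` (sorted keys, no whitespace), which is not a full JCS implementation. The function names `canonical_meta` and `request_fingerprint` are hypothetical, not part of the daemon's code.

```python
import hashlib
import json


def canonical_meta(meta):
    """Approximate JCS: sorted keys, no whitespace, UTF-8 output.

    ASSUMPTION: json.dumps is a stand-in. Real RFC 8785 also fixes number
    formatting and sorts keys by UTF-16 code units, which can differ here.
    """
    if not meta:
        return ""  # the spec maps empty meta to the empty string
    return json.dumps(meta, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def request_fingerprint(envelope_version, destination_kind, destination_ref,
                        reply_to_id, priority, meta, body):
    """Return the 32-byte sha256 over the 0x00-separated canonical fields."""
    fields = [
        str(envelope_version),             # integer string, e.g. "1"
        destination_kind,                  # "topic", "dm", or "queue"
        destination_ref,
        reply_to_id or "",                 # empty string when not a reply
        priority,                          # "now", "next", or "low"
        canonical_meta(meta),
        hashlib.sha256(body).hexdigest(),  # body_hash, hex
    ]
    # ASSUMPTION: fields are UTF-8 encoded and joined by a single 0x00 byte.
    joined = b"\x00".join(field.encode("utf-8") for field in fields)
    return hashlib.sha256(joined).digest()
```

Key order in `meta` does not affect the result, while any change to destination, priority, or body does. Cross-language agreement would still need conformance tests against the broker's implementation.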

4.5 Daemon-local idempotency at the IPC layer (from v8)

The daemon enforces fingerprint idempotency before the request hits outbox.db so a caller bug never creates duplicate-key/mismatch-payload state at all.

4.5.1 IPC accept algorithm

On POST /v1/send:

  1. Validate request envelope (auth, schema, size limits, destination resolvable). Failures here return 4xx immediately. No outbox row is written; the client_message_id is not consumed.
  2. Compute request_fingerprint (§4.4).
  3. Open a SQLite transaction with BEGIN IMMEDIATE so a concurrent IPC accept on the same id serializes against this one. BEGIN IMMEDIATE acquires the RESERVED lock at transaction start; SQLite has no row-level lock and SELECT FOR UPDATE is not supported.
  4. SELECT id, request_fingerprint, status, broker_message_id, last_error FROM outbox WHERE client_message_id = ?.
  5. Apply the lookup table below. For the "(no row)" case, INSERT inside the same transaction.
  6. COMMIT.

| Existing row state | Fingerprint | Daemon response |
|---|---|---|
| (no row) | n/a | INSERT new outbox row pending; return 202 accepted, queued |
| pending | match | Return 202 accepted, queued. No mutation |
| pending | mismatch | Return 409, conflict: "outbox_pending_fingerprint_mismatch" |
| inflight | match | Return 202 accepted, inflight. No mutation |
| inflight | mismatch | Return 409, conflict: "outbox_inflight_fingerprint_mismatch" |
| done | match | Return 200 ok, duplicate: true, broker_message_id, history_id. No broker call |
| done | mismatch | Return 409, conflict: "outbox_done_fingerprint_mismatch", broker_message_id |
| dead | match | Return 409, conflict: "outbox_dead_fingerprint_match", reason: "<last_error>" |
| dead | mismatch | Return 409, conflict: "outbox_dead_fingerprint_mismatch" |
| aborted | match | Return 409, conflict: "outbox_aborted_fingerprint_match". Operator-retired id, never reusable |
| aborted | mismatch | Return 409, conflict: "outbox_aborted_fingerprint_mismatch" |

Every 409 carries the daemon's request_fingerprint (8-byte hex prefix) for client/server canonical-form-drift debugging. A client_message_id written to outbox.db is permanently bound to that row's lifecycle — the only "free" state is "no row exists".
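The accept algorithm and lookup table can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the daemon's implementation: the trimmed schema, the function names, and the response field names (`state`, `request_fingerprint_prefix`) are hypothetical, `history_id` is omitted because the outbox schema has no such column, and the 409 prefix is taken from the stored row's fingerprint.

```python
import sqlite3
import time
import uuid


def open_outbox(path=":memory:"):
    # isolation_level=None disables implicit transactions, so the explicit
    # BEGIN IMMEDIATE / COMMIT below are the only transaction control.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS outbox (
          id TEXT PRIMARY KEY,
          client_message_id TEXT NOT NULL UNIQUE,
          request_fingerprint BLOB NOT NULL,
          payload BLOB NOT NULL,
          enqueued_at INTEGER NOT NULL,
          next_attempt_at INTEGER NOT NULL,
          status TEXT,
          last_error TEXT,
          broker_message_id TEXT)""")
    return conn


def lookup(status, match, stored_fp, broker_message_id, last_error):
    """Apply the lookup-table rows that cover an existing outbox row."""
    if match and status == "pending":
        return 202, {"state": "queued"}
    if match and status == "inflight":
        return 202, {"state": "inflight"}
    if match and status == "done":
        return 200, {"duplicate": True, "broker_message_id": broker_message_id}
    suffix = "match" if match else "mismatch"
    body = {"conflict": f"outbox_{status}_fingerprint_{suffix}",
            "request_fingerprint_prefix": stored_fp[:8].hex()}
    if status == "done":
        body["broker_message_id"] = broker_message_id
    if status == "dead" and match:
        body["reason"] = last_error
    return 409, body


def accept_send(conn, client_message_id, fingerprint, payload):
    """Steps 3 to 6 of the accept algorithm; validation is assumed done."""
    conn.execute("BEGIN IMMEDIATE")  # serializes concurrent accepts
    try:
        row = conn.execute(
            "SELECT request_fingerprint, status, broker_message_id, last_error "
            "FROM outbox WHERE client_message_id = ?",
            (client_message_id,)).fetchone()
        if row is None:
            now = int(time.time() * 1000)
            conn.execute(
                "INSERT INTO outbox (id, client_message_id, request_fingerprint, "
                "payload, enqueued_at, next_attempt_at, status) "
                "VALUES (?, ?, ?, ?, ?, ?, 'pending')",
                (uuid.uuid4().hex, client_message_id, fingerprint, payload,
                 now, now))
            result = (202, {"state": "queued"})
        else:
            stored_fp = bytes(row[0])
            result = lookup(row[1], stored_fp == fingerprint, stored_fp,
                            row[2], row[3])
        conn.execute("COMMIT")
        return result
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Calling `accept_send` twice with the same id and fingerprint leaves exactly one outbox row. A second call with a different fingerprint returns 409 without mutating anything.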

4.5.2 Outbox table

CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,                          -- 32 bytes
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  attempts            INTEGER DEFAULT 0,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT CHECK(status IN
                        ('pending','inflight','done','dead','aborted')),
  last_error          TEXT,
  delivered_at        INTEGER,
  broker_message_id   TEXT,
  aborted_at          INTEGER,                                -- v7
  aborted_by          TEXT,                                   -- v7: operator/auto
  superseded_by       TEXT                                    -- v7: id of requeue successor
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
CREATE INDEX outbox_aborted ON outbox(status, aborted_at) WHERE status = 'aborted';

aborted_at / aborted_by / superseded_by give operators a clear audit trail. superseded_by lets outbox inspect show the chain when a row is requeued multiple times. request_fingerprint is computed once at IPC accept time and frozen for the row's lifecycle.

4.5.3 Operator recovery via requeue

claudemesh daemon outbox requeue --id <outbox_row_id>
                                  [--new-client-id <id> | --auto]
                                  [--patch-payload <path>]

Atomically (single SQLite transaction):

  1. Marks the existing row aborted, sets aborted_at = now, aborted_by = "operator". Row is never deleted — audit trail permanent.
  2. Mints a fresh client_message_id (caller-supplied or auto-ulid).
  3. Inserts a new outbox row pending with the fresh id and the same payload (or patched if --patch-payload).
  4. Sets superseded_by = <new_row_id> on the old row.

The old client_message_id is permanently dead. There is no path for an id to become free again.
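The requeue transaction above could look roughly like the following Python sketch. Assumptions are labeled in the comments: the schema is trimmed to the columns involved, `uuid4` stands in for the auto-ulid, only dead or pending rows are accepted (following §4.6.1 phase D), and the fingerprint is copied because the payload is unchanged. The `--patch-payload` path would need the fingerprint recomputed and is not shown.

```python
import sqlite3
import time
import uuid

OUTBOX_DDL = """
CREATE TABLE outbox (
  id TEXT PRIMARY KEY,
  client_message_id TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,
  payload BLOB NOT NULL,
  enqueued_at INTEGER NOT NULL,
  next_attempt_at INTEGER NOT NULL,
  status TEXT,
  aborted_at INTEGER,
  aborted_by TEXT,
  superseded_by TEXT
)
"""


def requeue(conn, row_id, new_client_id=None):
    """Abort the old row and insert its successor in one transaction.

    `conn` must be opened with isolation_level=None so that the explicit
    BEGIN IMMEDIATE below is the only transaction control.
    """
    new_row_id = uuid.uuid4().hex
    # ASSUMPTION: uuid4 stands in for the auto-ulid mentioned in the spec.
    new_client_id = new_client_id or uuid.uuid4().hex
    now = int(time.time() * 1000)
    conn.execute("BEGIN IMMEDIATE")
    try:
        old = conn.execute(
            "SELECT status, request_fingerprint, payload FROM outbox "
            "WHERE id = ?", (row_id,)).fetchone()
        # ASSUMPTION: only dead or pending rows may be requeued (§4.6.1, D).
        if old is None or old[0] not in ("dead", "pending"):
            raise ValueError("row is missing or not requeueable")
        # Steps 1 and 4: retire the old row but keep it for audit.
        conn.execute(
            "UPDATE outbox SET status = 'aborted', aborted_at = ?, "
            "aborted_by = 'operator', superseded_by = ? WHERE id = ?",
            (now, new_row_id, row_id))
        # Steps 2 and 3: new pending row, fresh id, same payload.
        conn.execute(
            "INSERT INTO outbox (id, client_message_id, request_fingerprint, "
            "payload, enqueued_at, next_attempt_at, status) "
            "VALUES (?, ?, ?, ?, ?, ?, 'pending')",
            (new_row_id, new_client_id, old[1], old[2], now, now))
        conn.execute("COMMIT")
        return new_row_id
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Because `client_message_id` is UNIQUE and aborted rows are never deleted, supplying a retired id as `new_client_id` fails the INSERT and rolls back the whole requeue, which matches the rule that a used id never becomes free.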

4.5b Broker duplicate response — three cases

| Case | HTTP/WS code | Body |
|---|---|---|
| First insert | 201 created | { broker_message_id, client_message_id, history_id, duplicate: false } |
| Duplicate, fingerprint match | 200 ok | { broker_message_id, client_message_id, history_id, duplicate: true, history_available, first_seen_at } |
| Duplicate, fingerprint mismatch | 409 idempotency_key_reused | { client_message_id, conflict: "request_fingerprint_mismatch", broker_fingerprint_prefix: "ab12cd34..." } (first 8 bytes hex) |

Daemon outcomes:

  • 201 → mark outbox row done, store broker_message_id.
  • 200 duplicate with history_available: true → mark done, log INFO.
  • 200 duplicate with history_available: false → mark done, log WARN.
  • 409 idempotency_key_reused → mark outbox row dead. Operator runs outbox requeue (§4.5.3); old id stays aborted, new id is fresh.
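As a rough illustration, the daemon outcomes above reduce to a small pure function from broker response to outbox transition. The function name and the `(status, log_level)` return shape are hypothetical. The ERROR level for 409 is an assumption based on the "log it loudly" wording in §4.4, and the handling of other 4xx and 5xx codes follows §4.6.1 phases B and C.

```python
def outbox_transition(code, body=None):
    """Map a broker response to (new outbox status, log level or None)."""
    body = body or {}
    if code == 201:
        # First insert: the caller also stores body["broker_message_id"].
        return "done", None
    if code == 200 and body.get("duplicate"):
        if body.get("history_available", True):
            return "done", "INFO"
        return "done", "WARN"  # accepted earlier, history already pruned
    if code == 409:
        # ASSUMPTION: "log it loudly" is modeled as ERROR.
        return "dead", "ERROR"
    if 400 <= code < 500:
        return "dead", "ERROR"    # permanent rejection (§4.6.1, phase C)
    return "pending", "WARN"      # 5xx or timeout: daemon retries (phase B)
```

Keeping this mapping free of I/O makes it straightforward to unit-test next to the IPC lookup table.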

4.6 Rejected-request semantics — id consumed iff outbox row written

Rule: a client_message_id is daemon-consumed iff the daemon writes an outbox row. Anything that fails before outbox insertion (auth, schema, size, destination not resolvable) leaves the id untouched and freely reusable.

4.6.1 Daemon-side rejection phasing

| Phase | When daemon rejects | Outbox row? | Caller may reuse id? |
|---|---|---|---|
| A. IPC validation (auth, schema, size, destination resolvable) | Before §4.5.1 step 3 | No | Yes; id never consumed |
| B. Outbox stored, broker network/transient failure | After IPC accept, broker 5xx or timeout | pending → retried | N/A; daemon owns retries |
| C. Outbox stored, broker permanent rejection | Broker returns 4xx after IPC accept | dead | No; rotate via requeue |
| D. Operator retirement | Operator runs requeue on dead or pending row | aborted (audit) + new row with fresh id | Old id NEVER reusable; new id is fresh |

4.6.2 Broker-side rejection phasing (B1 / B2 / B3)

The broker validates in three phases relative to dedupe-row insertion:

| Phase | Validation | Side effects | Result for direct broker callers (none in v0.9.0) |
|---|---|---|---|
| B1. Pre-dedupe-claim | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ max_payload.inline_bytes, rate limit not exceeded | None | 4xx. No dedupe row. Direct broker caller may retry with same id |
| B2. Post-dedupe-claim (in-tx) | destination_ref existence (topic exists, member subscribed, etc.) | INSERT into dedupe rolled back | 4xx, transaction rolled back, no dedupe row remains. Direct broker caller may retry with same id |
| B3. Accepted | All side effects commit atomically | Dedupe row, message row, history row, delivery_queue rows | 201 with broker_message_id |

Daemon-mediated callers (the only path in v0.9.0) see only the daemon-layer rules of §4.6.1: any broker 4xx after IPC accept lands the outbox row in dead. Daemon-mediated callers MUST rotate via requeue (§4.5.3); the daemon-consumed id is never reusable regardless of whether the broker layer sees a dedupe row. The "may retry with same id" wording above describes broker-bypass callers only, which v0.9.0 does not have.

Critical guarantee: there is no broker code path where a permanent 4xx leaves a dedupe row behind. Either the request committed and a dedupe row exists (B3), or it didn't and no dedupe row exists (B1, B2). "Dedupe row exists" is the unambiguous signal of "id consumed at the broker layer."

If the broker decides post-commit that an accepted message is invalid (async content-policy job), that's NOT a permanent rejection — it's a follow-up moderation event that operates on the broker_message_id, not on the dedupe key.

Net result: client_message_dedupe rows only exist when the broker successfully accepted a message and committed it. The single source of truth for "was this idempotency key consumed?" is the existence of the dedupe row. No status enum, no ambiguous states.

4.7 Broker atomicity contract

4.7.1 Side-effect inventory

Every successful broker accept atomically commits these durable state changes in one transaction:

| Effect | Table | Why in-tx |
|---|---|---|
| Dedupe record | mesh.client_message_dedupe | Idempotency authority |
| Message body | mesh.topic_message / mesh.message_queue | Authoritative store |
| History row | mesh.message_history | Replay log; lost-on-rollback breaks ordered replay |
| Fan-out work | mesh.delivery_queue | Each recipient must see exactly committed messages |

Outside the transaction (non-authoritative or rebuildable):

  • WS push to live subscribers — best-effort live notifications.
  • Webhook fan-out — async via delivery_queue workers.
  • Rate-limit counters — telemetry only; authority is the external limiter checked in B1.
  • Audit log entries — append-only stream; rebuildable from history.
  • Search/FTS index updates — async via outbox-pattern worker.
  • Mention index updates — async (deferred in-tx promotion to followups doc).
  • Metrics — Prometheus, pull-based.

If any in-transaction insert fails, the transaction rolls back completely. The broker returns 5xx to the daemon, and the daemon retries. No partial state.

4.7.2 Pseudocode

-- Pre-generate broker_message_id (ulid) in code, pass in.
BEGIN;

-- Step 1: try to claim the idempotency key.
INSERT INTO mesh.client_message_dedupe
  (mesh_id, client_message_id, broker_message_id, request_fingerprint,
   destination_kind, destination_ref, expires_at)
  VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
          $dest_kind, $dest_ref, $expires_at)
  ON CONFLICT (mesh_id, client_message_id) DO NOTHING;

-- Step 2: inspect what's actually there now (ours or someone else's).
SELECT broker_message_id, request_fingerprint, destination_kind,
       destination_ref, history_available, first_seen_at
  FROM mesh.client_message_dedupe
  WHERE mesh_id = $mesh_id AND client_message_id = $client_id
  FOR SHARE;

-- Branch:
--   row.broker_message_id == $msg_id  → first insert; continue.
--   row.broker_message_id != $msg_id  → duplicate. Compare fingerprints:
--     match    → ROLLBACK; return 200 duplicate.
--     mismatch → ROLLBACK; return 409 idempotency_key_reused.

-- Step 3: validate Phase B2 (destination_ref existence — topic exists,
-- member subscribed, etc.). If B2 fails → ROLLBACK; return 4xx (no
-- dedupe row remains).

-- Step 4: insert in-tx side effects (§4.7.1).
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
  VALUES ($msg_id, $mesh_id, $client_id, ...);

INSERT INTO mesh.message_history (broker_message_id, mesh_id, ...)
  VALUES ($msg_id, $mesh_id, ...);

INSERT INTO mesh.delivery_queue (broker_message_id, recipient_pubkey, ...)
  SELECT $msg_id, member_pubkey, ...
    FROM mesh.topic_subscription
    WHERE topic = $dest_ref AND mesh_id = $mesh_id;

COMMIT;

The branch logic determines the response shape (201 / 200 duplicate / 409 idempotency_key_reused) before COMMIT. The duplicate and 409 branches always ROLLBACK because nothing else needs to commit. SELECT … FOR SHARE blocks concurrent writers from upgrading the same dedupe row mid-transaction.
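The claim, inspect, and branch flow can be exercised end to end in a small Python simulation. This is a sketch under stated assumptions, not the broker's code: in-memory SQLite stands in for Postgres (so `FOR SHARE` is omitted and `BEGIN IMMEDIATE` serializes writers instead), the table names are shortened, the history and delivery_queue inserts are left out, and the 404 used for a Phase B2 failure is a hypothetical choice of 4xx.

```python
import sqlite3
import uuid


def open_broker():
    # isolation_level=None: only explicit BEGIN/COMMIT/ROLLBACK are used.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.executescript("""
        CREATE TABLE dedupe (
          mesh_id TEXT NOT NULL, client_message_id TEXT NOT NULL,
          broker_message_id TEXT NOT NULL, request_fingerprint BLOB NOT NULL,
          PRIMARY KEY (mesh_id, client_message_id));
        CREATE TABLE topic (mesh_id TEXT NOT NULL, name TEXT NOT NULL);
        CREATE TABLE topic_message (
          id TEXT PRIMARY KEY, mesh_id TEXT, client_message_id TEXT, body BLOB);
    """)
    return conn


def accept(conn, mesh_id, client_id, fingerprint, topic, body):
    msg_id = str(uuid.uuid4())  # pre-generated in code, as in the pseudocode
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Step 1: try to claim the idempotency key.
        conn.execute(
            "INSERT INTO dedupe VALUES (?, ?, ?, ?) "
            "ON CONFLICT (mesh_id, client_message_id) DO NOTHING",
            (mesh_id, client_id, msg_id, fingerprint))
        # Step 2: inspect what is there now (ours or an earlier request's).
        stored_id, stored_fp = conn.execute(
            "SELECT broker_message_id, request_fingerprint FROM dedupe "
            "WHERE mesh_id = ? AND client_message_id = ?",
            (mesh_id, client_id)).fetchone()
        if stored_id != msg_id:  # duplicate: nothing of ours needs to commit
            conn.execute("ROLLBACK")
            if bytes(stored_fp) == fingerprint:
                return 200, {"broker_message_id": stored_id, "duplicate": True}
            return 409, {"conflict": "request_fingerprint_mismatch",
                         "broker_fingerprint_prefix": bytes(stored_fp)[:8].hex()}
        # Step 3: Phase B2 validation; failure rolls back the dedupe claim.
        found = conn.execute(
            "SELECT 1 FROM topic WHERE mesh_id = ? AND name = ?",
            (mesh_id, topic)).fetchone()
        if found is None:
            conn.execute("ROLLBACK")
            return 404, {"error": "destination_not_found"}  # hypothetical 4xx
        # Step 4: in-tx side effects (message row only in this sketch).
        conn.execute("INSERT INTO topic_message VALUES (?, ?, ?, ?)",
                     (msg_id, mesh_id, client_id, body))
        conn.execute("COMMIT")
        return 201, {"broker_message_id": msg_id, "duplicate": False}
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

A B2 failure leaves no dedupe row behind, so the same id succeeds once the topic exists. That is the "dedupe row exists iff committed" property from §4.6.2.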

4.7.3 Failure modes

  • Crash before COMMIT: all rows roll back. Next daemon retry inserts cleanly.
  • Crash after COMMIT but before WS ACK: dedupe row exists. Daemon retries → fingerprint matches → 200 duplicate. Net: exactly one broker-accepted row, one daemon done transition.
  • Constraint violation on message row insert: rolls back the whole tx. 5xx to daemon. Same fingerprint reproduces; daemon eventually marks dead. No orphan dedupe row.

A nightly orphan-check job (counter cm_broker_dedupe_orphan_check_total) validates that every client_message_dedupe row has a matching topic_message / message_queue row, or that the matching row has been retention-pruned (history_available = FALSE). Inconsistencies are logged as cm_broker_dedupe_orphan_found{mesh_id} for human review.

4.8 Outbox schema

The authoritative outbox schema for v0.9.0 is in §4.5.2 (includes aborted status and audit columns from the v7 pull). request_fingerprint is computed at IPC accept time and frozen for the row's lifecycle — the daemon never recomputes from payload post-enqueue (would produce drift if envelope_version changes between daemon runs).

4.9 Outbox max-age math — bounded (v6)

Codex r5: the v5 formula (dedupe_retention_days * 24) - 24h_margin breaks for small windows: it yields zero at dedupe_retention_days = 1 and a negative max age below that.

v6 formula and bounds:

  • Minimum supported broker dedupe retention: 3 days. Daemon refuses to start if broker advertises dedupe_retention_days < 3 (treats it as feature_param_below_floor, close code 4010; see §15.5).

  • Daemon max_age_hours derivation:

    • permanent mode → daemon uses config default (168h = 7d), cap 720h (30d).
    • retention_scoped mode → daemon max_age_hours = max(72, (dedupe_retention_days * 24) - safety_margin_hours) where safety_margin_hours = max(24, ceil(dedupe_retention_days * 0.1 * 24)). For dedupe_retention_days=3 this gives max(72, 72-24) = 72h. For 30 days: max(72, 720-72) = 648h. For 365 days: max(72, 8760-876) = 7884h.
    • The 72h floor prevents the daemon outbox from being uselessly short — three days is enough margin for normal operator response to a paged outage.
  • Operator override allowed via [outbox] max_age_hours_override = N, but if N exceeds dedupe_retention_days * 24 - 1 daemon refuses to start with outbox_max_age_above_dedupe_window. The override exists for the rare case of a much-shorter-than-default outbox; it does not exist to circumvent the broker's dedupe window.
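The derivation above can be checked with a short Python sketch. The function and exception names are hypothetical. The ceiling is computed in integer arithmetic to avoid floating-point rounding, and the sketch applies the operator override only in retention_scoped mode, since the text defines the override limit relative to the dedupe window.

```python
MIN_RETENTION_DAYS = 3
FLOOR_HOURS = 72
PERMANENT_CAP_HOURS = 720


class StartupRefused(Exception):
    """Raised where the daemon would refuse to start."""


def max_age_hours(mode, dedupe_retention_days=None,
                  config_default_hours=168, override_hours=None):
    if mode == "permanent":
        # Config default (168h), capped at 30 days.
        return min(config_default_hours, PERMANENT_CAP_HOURS)
    if dedupe_retention_days < MIN_RETENTION_DAYS:
        raise StartupRefused("feature_param_below_floor")
    window = dedupe_retention_days * 24
    if override_hours is not None:
        if override_hours > window - 1:
            raise StartupRefused("outbox_max_age_above_dedupe_window")
        return override_hours
    # ceil(dedupe_retention_days * 0.1 * 24), done as integer ceil(window / 10).
    margin = max(24, -(-window // 10))
    return max(FLOOR_HOURS, window - margin)


print(max_age_hours("retention_scoped", 3))    # 72
print(max_age_hours("retention_scoped", 30))   # 648
print(max_age_hours("retention_scoped", 365))  # 7884
```

The three printed values match the worked examples in the text. At the 3-day minimum the 72h floor equals the full dedupe window, so the safety margin is effectively zero in that one configuration.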

4.10 Inbox schema — unchanged from v3 §4.5

4.11 Crash recovery — unchanged from v3 §4.6

4.12 Failure modes — corrected for fingerprint model (v6)

  • Fingerprint mismatch on retry (409 idempotency_key_reused): outbox row marked dead. Surfaced in --failed view. Operator command outbox requeue --new-client-id <id> (§4.5.3) rotates client_message_id and retries.
  • Daemon retry after dedupe row hard-deleted by retention sweep: in retention_scoped mode, daemon max_age_hours is bounded inside the retention window (§4.9), so this can only happen via operator override. In that case the retry creates a NEW dedupe row + new message — the caller chose this risk explicitly. Counter cm_daemon_retry_after_dedupe_expired_total.
  • Daemon retry after dedupe row hard-deleted in permanent mode: cannot happen by definition — permanent means no expires_at. Only mesh deletion removes dedupe rows.
  • Duplicate row, history pruned: as v5 §4.4. Mark done, log cm_daemon_dedupe_history_pruned_total.

5. Inbound — unchanged from v3 §5


6. Hooks — unchanged from v4 §6


7-13. Multi-mesh, auto-routing, service install, observability, SDKs, security model, configuration — unchanged from v4


14. Lifecycle — unchanged from v5 §14


15. Version compat — feature param updated for new dedupe semantics

15.1 Feature bits with parameters (v6 update)

| Bit | params.version | Required parameters | Optional parameters |
|---|---|---|---|
| client_message_id_dedupe | 1 | mode: "retention_scoped"\|"permanent", dedupe_retention_days: int (>= 3) (when mode=retention_scoped), request_fingerprint: bool == true | tombstone_history_pruned_window_days: int |
| concurrent_connection_policy | 1 | (no parameters) | default_policy: "prefer_newest"\|"prefer_oldest"\|"allow_concurrent" |
| member_keypair_rotated_event | 1 | (no parameters) | |
| key_epoch | 1 | max_concurrent_epochs: int (>= 1) | |
| max_payload | 1 | inline_bytes: int (>= 1024), blob_bytes: int (>= 1024) | |

client_message_id_dedupe ships at params.version = 1 with request_fingerprint: bool == true as a required parameter. A broker that doesn't advertise the feature, or advertises it without request_fingerprint: true, is treated as "feature missing" and the daemon refuses to start. That's intentional — v0.9.0 daemons require fingerprint enforcement for safe idempotency.

The schema-version-2 evolution (parameters that need versioning) is deferred (see followups doc).

dedupe_retention_days minimum is 3 (matches the §4.9 floor).

15.2 Negotiation handshake — unchanged shape from v5 §15.2

15.3 IPC negotiation — unchanged from v3 §15.3

15.4 Compatibility matrix — unchanged from v3 §15.4

15.5 Diagnostic close code (v0.9.0)

v0.9.0 ships a single WebSocket close code with a structured close_reason JSON payload that distinguishes the underlying cause:

| Code | Reason | close_reason.kind values |
|---|---|---|
| 4010 | feature_unavailable | feature_unavailable (feature missing from broker's supported) · feature_param_invalid (params fail validation: missing required, out of bounds, unknown version) · feature_param_below_floor (param below daemon's hard floor, e.g. dedupe_retention_days < 3) |

close_reason payload shape:

{
  "kind": "feature_unavailable" | "feature_param_invalid" | "feature_param_below_floor",
  "feature": "client_message_id_dedupe",
  "detail": "..."
}

Daemon logs the full negotiation payload at WARN before exiting; supervisor + alerting catches the restart loop. The split into 4011/4012 codes is deferred (see followups doc).
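To make the three close_reason.kind values concrete, here is a hedged Python sketch of how a daemon might validate the advertised client_message_id_dedupe parameters and build the close_reason payload. The shape of the `advertised` dict (feature name mapped to a `params` dict that includes `version`) is an assumption, since the handshake format lives in v5 §15.2 and is not reproduced here. The function name and detail strings are hypothetical.

```python
FEATURE = "client_message_id_dedupe"
RETENTION_FLOOR_DAYS = 3


def check_dedupe_feature(advertised):
    """Return None if the feature is usable, else a close_reason payload."""
    def reason(kind, detail):
        return {"kind": kind, "feature": FEATURE, "detail": detail}

    feature = advertised.get(FEATURE)
    if feature is None:
        return reason("feature_unavailable", "feature not advertised")
    params = feature.get("params", {})
    if params.get("version") != 1:
        return reason("feature_param_invalid", "unknown params.version")
    # Advertising without request_fingerprint: true counts as feature missing.
    if params.get("request_fingerprint") is not True:
        return reason("feature_unavailable", "request_fingerprint not enabled")
    mode = params.get("mode")
    if mode not in ("retention_scoped", "permanent"):
        return reason("feature_param_invalid", "mode missing or unknown")
    if mode == "retention_scoped":
        days = params.get("dedupe_retention_days")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(days, int) or isinstance(days, bool):
            return reason("feature_param_invalid",
                          "dedupe_retention_days missing or not an integer")
        if days < RETENTION_FLOOR_DAYS:
            return reason("feature_param_below_floor",
                          "dedupe_retention_days below 3")
    return None
```

A non-None result would be serialized as the close_reason for close code 4010 and logged at WARN before the daemon exits.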


16. Threat model — unchanged from v4 §16


17. Migration — broker dedupe table + atomicity (v6)

Broker side, deploy order:

  1. CREATE TABLE mesh.client_message_dedupe with v6 schema (additive, online-safe).
  2. ALTER TABLE mesh.topic_message ADD COLUMN client_message_id.
  3. ALTER TABLE mesh.message_queue ADD COLUMN client_message_id.
  4. Broker code refactor: every accept path wraps dedupe insert + message insert in one transaction (§4.7). Pre-generated broker_message_id (ulid in code) passed in.
  5. Broker code: nightly job to delete dedupe rows where expires_at < NOW() (skip in permanent mode).
  6. Broker code: hook into the message-retention sweep — when a topic_message or message_queue row is hard-deleted, find the matching dedupe row by client_message_id and set history_available = FALSE. (Note: client_message_id is nullable on those tables for legacy traffic; nullable rows have no dedupe row to update.)
  7. Broker code: nightly orphan-check job (§4.7); alerts on non-zero.
  8. Broker advertises client_message_id_dedupe feature with params.version = 1 and request_fingerprint: true.
  9. Daemon refuses to start unless that feature bit is advertised with valid v1 params.

Rollback plan: feature flag disables fingerprint enforcement broker-side (falls back to existing pre-v6 behavior — no dedupe). Daemons that require fingerprint refuse to start. Operator switches off the feature flag, reverts the daemon, restarts. No data loss; pending dedupe rows remain in place for the next forward roll.


v0.9.0 lock — what's in vs deferred

In (this document): everything codex r1 through r4 ratified plus the six sweet-spot pulls from v7 through v9 enumerated at the top: aborted outbox status, BEGIN IMMEDIATE, IPC duplicate lookup table, B1/B2/B3 phasing concept, side-effect inventory, two-layer ID model.

Deferred (see 2026-05-03-daemon-spec-broker-hardening-followups.md):

  • B0 dedupe fast-path before rate-limit (v10).
  • Lua-scripted idempotent rate limiter keyed by (mesh, client_id, window) (v10).
  • In-tx mesh.mention_index (v8).
  • 4011 / 4012 close-code split (v6 §15.5 — collapsed to 4010 with structured reason JSON for v0.9.0).
  • Per-OS fingerprint precedence elaborate table (v8 §2.2.1).
  • request_fingerprint schema-version-2 in feature negotiation (v6 §15.1 ships at version 1 with request_fingerprint: bool).
  • Force-expiry / quarantine semantics for keypair-archive.json (v8 §14.1.1).

These deferrals are real improvements but not v0.9.0 blockers. They land as the broker matures and we have actual scale-load to optimize against.


Cross-spec note: §15.5 close-code collapse

For v0.9.0 we ship a single 4010 feature_unavailable close code with a structured close_reason JSON payload that distinguishes the underlying cause:

{
  "close_reason": {
    "kind": "feature_unavailable" | "feature_param_invalid" | "feature_param_below_floor",
    "feature": "client_message_id_dedupe",
    "detail": "..."
  }
}

The 4011/4012 split is deferred to followups.


NON-NORMATIVE: round-6 review trailer (preserved for audit only)

Not part of the v0.9.0 contract. Preserved verbatim from the v6 source spec as a record of the open questions at the time of the codex round-6 review. Items below have either been resolved in this merged document, deferred to the followups doc, or superseded. Do NOT use this section as a checklist for implementation.

  1. Request fingerprint canonical form (§4.4) — does JCS work cross-language for meta_canonical_json (Python json.dumps, Go encoding/json, JS JSON.stringify all behave differently)? Should we ship a vetted JCS lib in each SDK or fall back to a simpler "sorted keys + no spaces + escape-as-stored" rule with conformance tests?
  2. Atomicity contract (§4.7) — is the orphan-check sufficient, or does a violation mean we need a "broker rebuild dedupe from messages" recovery tool? The latter is destructive but useful for ops emergencies.
  3. Max-age formula (§4.9) — is the 72h floor correct? Is the percentage-based safety margin (max(24, ceil(0.1 * dedupe_window))) the right shape? Or simpler to say "always 24h"?
  4. 409 idempotency_key_reused recovery flow (§4.5) — is sending the row to dead and surfacing it via outbox --failed enough? Should the daemon emit a high-priority event for the SSE stream so operators are paged immediately?
  5. Diagnostic close codes (§15.5) — is splitting 4010/4011/4012 useful, or does it just push complexity onto operators? Should we collapse to 4010 with structured close-reason JSON instead?
  6. Anything else still wrong? Read it as if you were going to operate this for a year. What falls down?

Three options:

  • (a) v6 is shippable: lock the spec, start coding the frozen core.
  • (b) v7 needed: list the must-fix items.
  • (c) the architecture itself is wrong: what would you do differently?

Be ruthless.