claudemesh daemon — Implementation spec v0.9.0
Implementation target. Locked from the v1–v10 codex-reviewed spec series. This document is what we build for v0.9.0 of the daemon.
Base: v6 (the round where the architecture passed codex's structural review — request_fingerprint, dedupe table, atomicity contract, feature-bit negotiation, key archive format).
Pulled in from v7–v9: six cheap, load-bearing fixes that close real v0.9.0-era bugs (not future-scale concerns):
- aborted outbox status + audit columns (operator recovery without destroying audit trail), from v7 §4.5.2
- BEGIN IMMEDIATE for daemon-local SQLite serialization (v6's SELECT FOR UPDATE is invalid SQLite anyway), from v7 §4.5.1
- Daemon-local IPC duplicate lookup table over outbox states × fingerprint match/mismatch, from v8 §4.5.1
- Phase B1/B2/B3 broker validation split (the concept; we don't need the elaborate phase tables), from v7 §4.6.2
- Side-effect inventory (in-tx vs async) as an implementation comment block, from v8 §4.7.1
- Two-layer ID model wording: daemon-consumed iff outbox row, broker-consumed iff dedupe row, from v9 §4.1
Deferred to broker-hardening followups (see 2026-05-03-daemon-spec-broker-hardening-followups.md for the full list and rationale): B0 dedupe fast-path, Lua-scripted idempotent rate limiter, in-tx mention_index, 4011/4012 close-code split, per-OS fingerprint precedence table, request-fingerprint schema-v2 in feature negotiation. These are real improvements but not v0.9.0 blockers; they land as the broker matures.
Intent §0 unchanged from v2.
0. Intent — unchanged, see v2 §0
1. Process model — unchanged from v3 §1 / v2 §1
2. Identity — unchanged from v5 §2
3. IPC surface — unchanged from v4 §3
4. Delivery contract — at-least-once with request-fingerprinted dedupe
Codex r5: dedupe must compare the whole request shape, not just
(mesh, client_message_id). Otherwise a caller who reuses an idempotency
key with a different destination or body silently drops the new send and
gets the old send's metadata back.
4.1 The contract (precise)
- Two-layer ID rule (from v9): a client_message_id is daemon-consumed iff an outbox row exists for it; broker-consumed iff a dedupe row exists in mesh.client_message_dedupe. The two layers are independent: a daemon-consumed id may or may not be broker-consumed (depending on whether the send reached broker commit). In v0.9.0 there are no daemon-bypass clients, so for practical purposes "daemon-consumed" is the operative rule.
- Local guarantee: each successful POST /v1/send returns a stable client_message_id. The send is durably persisted to outbox.db before the response returns. The daemon enforces request-fingerprint idempotency at the IPC layer (§4.5).
- Local audit guarantee: a client_message_id once written to outbox.db is never released. Operator recovery via requeue always mints a fresh id; the old row stays in aborted for audit. There is no daemon-side path to free a used id.
- Broker guarantee: the broker maintains a dedupe record per accepted (mesh_id, client_message_id) in mesh.client_message_dedupe. Each dedupe record carries a canonical request_fingerprint. Retries with the same id AND matching fingerprint collapse to the original broker_message_id. Retries with mismatched fingerprint return 409 idempotency_key_reused and do not create a new message.
- Atomicity guarantee: dedupe row insertion, message row insertion, and history row insertion happen in one broker DB transaction. Either all land, or none do. No orphan dedupe rows.
- End-to-end guarantee: at-least-once delivery, with client_message_id propagated to receivers' inboxes.
4.2 Daemon-supplied client_message_id — unchanged from v3 §4.2
4.3 Broker schema — request fingerprint added (v6)
CREATE TABLE mesh.client_message_dedupe (
mesh_id UUID NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
client_message_id TEXT NOT NULL,
-- The original accepted message; FK NOT enforced because the message row
-- may be GC'd by retention sweeps before the dedupe row expires.
broker_message_id UUID NOT NULL,
-- Canonical fingerprint of the original request. Recomputed on every
-- duplicate retry; mismatch → 409 idempotency_key_reused. Schema in §4.4.
request_fingerprint BYTEA NOT NULL, -- 32-byte sha256
destination_kind TEXT NOT NULL CHECK(destination_kind IN ('topic','dm','queue')),
destination_ref TEXT NOT NULL,
first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ, -- NULL = `permanent` mode
history_available BOOLEAN NOT NULL DEFAULT TRUE, -- flipped FALSE when message row GC'd
PRIMARY KEY (mesh_id, client_message_id)
);
CREATE INDEX client_message_dedupe_expires_idx
ON mesh.client_message_dedupe(expires_at)
WHERE expires_at IS NOT NULL;
ALTER TABLE mesh.topic_message ADD COLUMN client_message_id TEXT;
ALTER TABLE mesh.message_queue ADD COLUMN client_message_id TEXT;
The status column was dropped (codex r5): rejected requests do not
consume idempotency keys. Rationale is in §4.6.
4.4 Request fingerprint — canonical form (NEW v6)
The fingerprint covers everything that makes a send semantically distinct. A retry must reproduce the same fingerprint bit-for-bit; anything else is a different send and must not be collapsed.
request_fingerprint = sha256(
envelope_version || 0x00 ||
destination_kind || 0x00 ||
destination_ref || 0x00 ||
reply_to_id_or_empty || 0x00 ||
priority || 0x00 ||
meta_canonical_json || 0x00 ||
body_hash
)
Where:
- envelope_version: integer string (e.g. "1"). Bumps when the envelope shape changes.
- destination_kind: topic, dm, or queue.
- destination_ref: topic name, recipient ed25519 pubkey hex, or queue id.
- reply_to_id_or_empty: original broker_message_id or empty string.
- priority: now, next, or low.
- meta_canonical_json: the meta field, serialized with sorted keys, no whitespace, escape-canonical (RFC 8785 JCS). Empty meta = empty string.
- body_hash: sha256(body bytes), hex.
The fingerprint is computed:
- Daemon-side before durable outbox persistence: stored as outbox.request_fingerprint (NEW column) so retries always produce the same fingerprint regardless of caller behavior.
- Broker-side on first receipt: stored in client_message_dedupe.request_fingerprint.
- Broker-side on every duplicate retry: recomputed and compared byte-equal to the stored value.
If the daemon and broker disagree on the canonical form (e.g. JCS
implementation drift), the broker emits
cm_broker_dedupe_fingerprint_mismatch_total{client_id, mesh_id} and
returns 409 idempotency_key_reused with a body that includes the
broker's fingerprint hex for debugging. Daemons that see this should
log it loudly and stop retrying that outbox row (it goes to dead).
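Illustrative sketch (non-normative) of the canonical form defined in this section, in Python. The function names are hypothetical. The meta serialization uses json.dumps with sorted keys and no whitespace, which only approximates RFC 8785 for simple metas (strings, ints, bools, nesting); floats and some edge cases differ, so a real implementation would likely need a vetted JCS library.

```python
import hashlib
import json


def canonical_meta(meta):
    """Approximate RFC 8785 (JCS): sorted keys, no whitespace.

    ASSUMPTION: json.dumps matches JCS only for simple values.
    Number formatting and key ordering of non-BMP characters can differ.
    """
    if not meta:
        return ""  # spec: empty meta = empty string
    return json.dumps(meta, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def request_fingerprint(envelope_version, destination_kind, destination_ref,
                        reply_to_id, priority, meta, body):
    """Return the 32-byte sha256 over the 0x00-separated fields of §4.4."""
    body_hash = hashlib.sha256(body).hexdigest()  # sha256(body bytes), hex
    fields = [
        str(envelope_version),   # integer string, e.g. "1"
        destination_kind,        # topic | dm | queue
        destination_ref,
        reply_to_id or "",       # reply_to_id_or_empty
        priority,                # now | next | low
        canonical_meta(meta),
        body_hash,
    ]
    joined = b"\x00".join(f.encode("utf-8") for f in fields)
    return hashlib.sha256(joined).digest()
```

Key order in meta does not affect the result, while any change to the body or destination produces a different fingerprint, which is the property the dedupe comparison relies on.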
4.5 Daemon-local idempotency at the IPC layer (from v8)
The daemon enforces fingerprint idempotency before the request hits
outbox.db so a caller bug never creates duplicate-key/mismatch-payload
state at all.
4.5.1 IPC accept algorithm
On POST /v1/send:
1. Validate request envelope (auth, schema, size limits, destination resolvable). Failures here return 4xx immediately. No outbox row is written; the client_message_id is not consumed.
2. Compute request_fingerprint (§4.4).
3. Open a SQLite transaction with BEGIN IMMEDIATE so a concurrent IPC accept on the same id serializes against this one. BEGIN IMMEDIATE acquires the RESERVED lock at transaction start; SQLite has no row-level lock and SELECT FOR UPDATE is not supported.
4. SELECT id, request_fingerprint, status, broker_message_id, last_error FROM outbox WHERE client_message_id = ?.
5. Apply the lookup table below. For the "(no row)" case, INSERT inside the same transaction.
6. COMMIT.
| Existing row state | Fingerprint | Daemon response |
|---|---|---|
| (no row) | n/a | INSERT new outbox row pending; return 202 accepted, queued |
| pending | match | Return 202 accepted, queued. No mutation |
| pending | mismatch | Return 409, conflict: "outbox_pending_fingerprint_mismatch" |
| inflight | match | Return 202 accepted, inflight. No mutation |
| inflight | mismatch | Return 409, conflict: "outbox_inflight_fingerprint_mismatch" |
| done | match | Return 200 ok, duplicate: true, broker_message_id, history_id. No broker call |
| done | mismatch | Return 409, conflict: "outbox_done_fingerprint_mismatch", broker_message_id |
| dead | match | Return 409, conflict: "outbox_dead_fingerprint_match", reason: "<last_error>" |
| dead | mismatch | Return 409, conflict: "outbox_dead_fingerprint_mismatch" |
| aborted | match | Return 409, conflict: "outbox_aborted_fingerprint_match". Operator-retired id, never reusable |
| aborted | mismatch | Return 409, conflict: "outbox_aborted_fingerprint_mismatch" |
Every 409 carries the daemon's request_fingerprint (8-byte hex
prefix) for client/server canonical-form-drift debugging. A
client_message_id written to outbox.db is permanently bound to that
row's lifecycle — the only "free" state is "no row exists".
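Illustrative sketch (non-normative) of steps 3 to 6 of the accept algorithm and the lookup table, using Python's sqlite3 module. The function names, the response body keys (state, fingerprint_prefix) and the trimmed table are assumptions made for the example. The sketch also assumes the prefix in a 409 is taken from the stored row's fingerprint, and it omits history_id because the outbox schema carries no such column. The connection must be opened with isolation_level=None so that BEGIN IMMEDIATE is issued explicitly.

```python
import sqlite3

# Trimmed outbox table: only the columns this sketch touches.
SCHEMA = """
CREATE TABLE outbox (
  id TEXT PRIMARY KEY,
  client_message_id TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,
  payload BLOB NOT NULL,
  enqueued_at INTEGER NOT NULL,
  next_attempt_at INTEGER NOT NULL,
  status TEXT,
  last_error TEXT,
  broker_message_id TEXT
);
"""


def ipc_accept(conn, row_id, client_message_id, fingerprint, payload, now):
    """Run steps 3-6; returns (http_status, body)."""
    conn.execute("BEGIN IMMEDIATE")  # takes the RESERVED lock up front
    try:
        row = conn.execute(
            "SELECT request_fingerprint, status, broker_message_id, last_error "
            "FROM outbox WHERE client_message_id = ?",
            (client_message_id,),
        ).fetchone()
        if row is None:
            conn.execute(
                "INSERT INTO outbox (id, client_message_id, request_fingerprint,"
                " payload, enqueued_at, next_attempt_at, status)"
                " VALUES (?, ?, ?, ?, ?, ?, 'pending')",
                (row_id, client_message_id, fingerprint, payload, now, now),
            )
            result = (202, {"state": "queued"})
        else:
            result = lookup(row, fingerprint)
        conn.execute("COMMIT")
        return result
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise


def lookup(row, fingerprint):
    """Apply the state x fingerprint table to an existing row."""
    stored_fp, status, broker_message_id, last_error = row
    prefix = bytes(stored_fp)[:8].hex()  # ASSUMPTION: stored row's prefix
    if bytes(stored_fp) != fingerprint:
        body = {"conflict": f"outbox_{status}_fingerprint_mismatch",
                "fingerprint_prefix": prefix}
        if status == "done":
            body["broker_message_id"] = broker_message_id
        return 409, body
    if status == "pending":
        return 202, {"state": "queued"}
    if status == "inflight":
        return 202, {"state": "inflight"}
    if status == "done":
        return 200, {"duplicate": True, "broker_message_id": broker_message_id}
    # dead or aborted with a matching fingerprint: the id stays consumed.
    body = {"conflict": f"outbox_{status}_fingerprint_match",
            "fingerprint_prefix": prefix}
    if status == "dead":
        body["reason"] = last_error
    return 409, body
```

Only the "(no row)" branch mutates the table; every other branch reads and returns, so a caller bug can never produce a second row for the same client_message_id.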
4.5.2 Outbox table
CREATE TABLE outbox (
id TEXT PRIMARY KEY,
client_message_id TEXT NOT NULL UNIQUE,
request_fingerprint BLOB NOT NULL, -- 32 bytes
payload BLOB NOT NULL,
enqueued_at INTEGER NOT NULL,
attempts INTEGER DEFAULT 0,
next_attempt_at INTEGER NOT NULL,
status TEXT CHECK(status IN
('pending','inflight','done','dead','aborted')),
last_error TEXT,
delivered_at INTEGER,
broker_message_id TEXT,
aborted_at INTEGER, -- v7
aborted_by TEXT, -- v7: operator/auto
superseded_by TEXT -- v7: id of requeue successor
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
CREATE INDEX outbox_aborted ON outbox(status, aborted_at) WHERE status = 'aborted';
aborted_at / aborted_by / superseded_by give operators a clear
audit trail. superseded_by lets outbox inspect show the chain when
a row is requeued multiple times. request_fingerprint is computed
once at IPC accept time and frozen for the row's lifecycle.
4.5.3 Operator recovery via requeue
claudemesh daemon outbox requeue --id <outbox_row_id>
[--new-client-id <id> | --auto]
[--patch-payload <path>]
Atomically (single SQLite transaction):
- Marks the existing row aborted, sets aborted_at = now, aborted_by = "operator". The row is never deleted; the audit trail is permanent.
- Mints a fresh client_message_id (caller-supplied or auto-ulid).
- Inserts a new outbox row pending with the fresh id and the same payload (or patched if --patch-payload).
- Sets superseded_by = <new_row_id> on the old row.
The old client_message_id is permanently dead. There is no path for
an id to become free again.
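Illustrative sketch (non-normative) of the requeue transaction, again with Python's sqlite3 module and a connection opened with isolation_level=None. The function name and signature are hypothetical, uuid4 stands in for the ulid generator, and the sketch assumes that a patched payload arrives together with its recomputed fingerprint (the spec does not say who recomputes it).

```python
import sqlite3
import uuid

# Trimmed outbox table: only the columns this sketch touches.
SCHEMA = """
CREATE TABLE outbox (
  id TEXT PRIMARY KEY,
  client_message_id TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,
  payload BLOB NOT NULL,
  enqueued_at INTEGER NOT NULL,
  next_attempt_at INTEGER NOT NULL,
  status TEXT,
  aborted_at INTEGER,
  aborted_by TEXT,
  superseded_by TEXT
);
"""


def requeue(conn, old_row_id, now, new_client_message_id=None, patched=None):
    """Retire old_row_id and enqueue its successor in one transaction.

    `patched` is an optional (payload, fingerprint) pair standing in for
    --patch-payload. Returns the new outbox row id.
    """
    new_row_id = uuid.uuid4().hex  # stand-in for a ulid
    new_cmid = new_client_message_id or uuid.uuid4().hex
    conn.execute("BEGIN IMMEDIATE")
    try:
        old = conn.execute(
            "SELECT payload, request_fingerprint FROM outbox WHERE id = ?",
            (old_row_id,),
        ).fetchone()
        if old is None:
            raise KeyError(old_row_id)
        payload, fingerprint = patched if patched else old
        # New pending row under the fresh id; UNIQUE rejects a reused id.
        conn.execute(
            "INSERT INTO outbox (id, client_message_id, request_fingerprint,"
            " payload, enqueued_at, next_attempt_at, status)"
            " VALUES (?, ?, ?, ?, ?, ?, 'pending')",
            (new_row_id, new_cmid, fingerprint, payload, now, now),
        )
        # Old row is kept for audit and linked to its successor.
        conn.execute(
            "UPDATE outbox SET status = 'aborted', aborted_at = ?,"
            " aborted_by = 'operator', superseded_by = ? WHERE id = ?",
            (now, new_row_id, old_row_id),
        )
        conn.execute("COMMIT")
        return new_row_id
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Because the UNIQUE constraint on client_message_id also covers aborted rows, supplying an id that was ever used makes the insert fail and the whole transaction roll back, which matches the rule that a used id never becomes free.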
4.5b Broker duplicate response — three cases
| Case | HTTP/WS code | Body |
|---|---|---|
| First insert | 201 created | { broker_message_id, client_message_id, history_id, duplicate: false } |
| Duplicate, fingerprint match | 200 ok | { broker_message_id, client_message_id, history_id, duplicate: true, history_available, first_seen_at } |
| Duplicate, fingerprint mismatch | 409 idempotency_key_reused | { client_message_id, conflict: "request_fingerprint_mismatch", broker_fingerprint_prefix: "ab12cd34..." } (first 8 bytes hex) |
Daemon outcomes:
- 201 → mark outbox row done, store broker_message_id.
- 200 duplicate with history_available: true → mark done, log INFO.
- 200 duplicate with history_available: false → mark done, log WARN.
- 409 idempotency_key_reused → mark outbox row dead. Operator runs outbox requeue (§4.5.3); old id stays aborted, new id is fresh.
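Illustrative sketch (non-normative) of the daemon outcomes above as a pure function. The function name and the (status, log_level) return shape are hypothetical. The 5xx and generic 4xx branches are taken from the rejection phasing in §4.6.1 rather than from this list.

```python
from typing import Optional, Tuple


def outbox_outcome(status_code: int, body: dict) -> Tuple[str, Optional[str]]:
    """Map a broker send response to (new outbox status, log level)."""
    if status_code == 201:
        return "done", None  # caller also stores body["broker_message_id"]
    if status_code == 200 and body.get("duplicate"):
        # Pruned history is still a success, but worth a warning.
        return "done", "INFO" if body.get("history_available") else "WARN"
    if 500 <= status_code < 600:
        return "pending", None  # transient: daemon owns the retry
    if 400 <= status_code < 500:
        return "dead", None  # includes 409 idempotency_key_reused
    raise ValueError(f"unexpected broker status {status_code}")
```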
4.6 Rejected-request semantics — id consumed iff outbox row written
Rule: a client_message_id is daemon-consumed iff the daemon writes an outbox row. Anything that fails before outbox insertion (auth, schema, size, destination not resolvable) leaves the id untouched and freely reusable.
4.6.1 Daemon-side rejection phasing
| Phase | When daemon rejects | Outbox row? | Caller may reuse id? |
|---|---|---|---|
| A. IPC validation (auth, schema, size, destination resolvable) | Before §4.5.1 step 3 | No | Yes: id never consumed |
| B. Outbox stored, broker network/transient failure | After IPC accept, broker 5xx or timeout | pending → retried | N/A: daemon owns retries |
| C. Outbox stored, broker permanent rejection | Broker returns 4xx after IPC accept | dead | No: rotate via requeue |
| D. Operator retirement | Operator runs requeue on dead or pending row | aborted (audit) + new row with fresh id | Old id NEVER reusable; new id is fresh |
4.6.2 Broker-side rejection phasing (B1 / B2 / B3)
The broker validates in three phases relative to dedupe-row insertion:
| Phase | Validation | Side effects | Result for direct broker callers (none in v0.9.0) |
|---|---|---|---|
| B1. Pre-dedupe-claim | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ max_payload.inline_bytes, rate limit not exceeded | None | 4xx. No dedupe row. Direct broker caller may retry with same id |
| B2. Post-dedupe-claim (in-tx) | destination_ref existence (topic exists, member subscribed, etc.) | INSERT into dedupe rolled back | 4xx, transaction rolled back, no dedupe row remains. Direct broker caller may retry with same id |
| B3. Accepted | All side effects commit atomically | Dedupe row, message row, history row, delivery_queue rows | 201 with broker_message_id |
Daemon-mediated callers (the only path in v0.9.0) see only the
daemon-layer rules of §4.6.1: any broker 4xx after IPC accept lands
the outbox row in dead. Daemon-mediated callers MUST rotate via
requeue (§4.5.3); the daemon-consumed id is never reusable
regardless of whether the broker layer sees a dedupe row. The "may
retry with same id" wording above describes broker-bypass callers
only, which v0.9.0 does not have.
Critical guarantee: there is no broker code path where a permanent 4xx leaves a dedupe row behind. Either the request committed and a dedupe row exists (B3), or it didn't and no dedupe row exists (B1, B2). "Dedupe row exists" is the unambiguous signal of "id consumed at the broker layer."
If the broker decides post-commit that an accepted message is invalid
(async content-policy job), that's NOT a permanent rejection — it's a
follow-up moderation event that operates on the broker_message_id,
not on the dedupe key.
Net result: client_message_dedupe rows only exist when the broker
successfully accepted a message and committed it. The single source
of truth for "was this idempotency key consumed?" is the existence of
the dedupe row. No status enum, no ambiguous states.
4.7 Broker atomicity contract
4.7.1 Side-effect inventory
Every successful broker accept atomically commits these durable state changes in one transaction:
| Effect | Table | Why in-tx |
|---|---|---|
| Dedupe record | mesh.client_message_dedupe | Idempotency authority |
| Message body | mesh.topic_message / mesh.message_queue | Authoritative store |
| History row | mesh.message_history | Replay log; lost-on-rollback breaks ordered replay |
| Fan-out work | mesh.delivery_queue | Each recipient must see exactly committed messages |
Outside the transaction (non-authoritative or rebuildable):
- WS push to live subscribers: best-effort live notifications.
- Webhook fan-out: async via delivery_queue workers.
- Rate-limit counters: telemetry only; authority is the external limiter checked in B1.
- Audit log entries: append-only stream; rebuildable from history.
- Search/FTS index updates: async via outbox-pattern worker.
- Mention index updates: async (deferred in-tx promotion to followups doc).
- Metrics: Prometheus, pull-based.
If any in-transaction insert fails, the transaction rolls back
completely. The accept is 5xx to daemon; daemon retries. No partial
state.
4.7.2 Pseudocode
-- Pre-generate broker_message_id (ulid) in code, pass in.
BEGIN;
-- Step 1: try to claim the idempotency key.
INSERT INTO mesh.client_message_dedupe
(mesh_id, client_message_id, broker_message_id, request_fingerprint,
destination_kind, destination_ref, expires_at)
VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
$dest_kind, $dest_ref, $expires_at)
ON CONFLICT (mesh_id, client_message_id) DO NOTHING;
-- Step 2: inspect what's actually there now (ours or someone else's).
SELECT broker_message_id, request_fingerprint, destination_kind,
destination_ref, history_available, first_seen_at
FROM mesh.client_message_dedupe
WHERE mesh_id = $mesh_id AND client_message_id = $client_id
FOR SHARE;
-- Branch:
-- row.broker_message_id == $msg_id → first insert; continue.
-- row.broker_message_id != $msg_id → duplicate. Compare fingerprints:
-- match → ROLLBACK; return 200 duplicate.
-- mismatch → ROLLBACK; return 409 idempotency_key_reused.
-- Step 3: validate Phase B2 (destination_ref existence — topic exists,
-- member subscribed, etc.). If B2 fails → ROLLBACK; return 4xx (no
-- dedupe row remains).
-- Step 4: insert in-tx side effects (§4.7.1).
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
VALUES ($msg_id, $mesh_id, $client_id, ...);
INSERT INTO mesh.message_history (broker_message_id, mesh_id, ...)
VALUES ($msg_id, $mesh_id, ...);
INSERT INTO mesh.delivery_queue (broker_message_id, recipient_pubkey, ...)
SELECT $msg_id, member_pubkey, ...
FROM mesh.topic_subscription
WHERE topic = $dest_ref AND mesh_id = $mesh_id;
COMMIT;
The branch logic determines the response shape (201 / 200 duplicate
/ 409 idempotency_key_reused) before COMMIT. The duplicate and 409
branches always ROLLBACK because nothing else needs to commit.
SELECT … FOR SHARE blocks concurrent writers from upgrading the same
dedupe row mid-transaction.
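Illustrative sketch (non-normative) of the claim-then-branch flow, runnable against SQLite as a stand-in for the broker's Postgres. This is an assumption-heavy simplification: table and function names are hypothetical, INSERT OR IGNORE replaces ON CONFLICT DO NOTHING, BEGIN IMMEDIATE replaces SELECT … FOR SHARE (SQLite has no row locks), 404 is an arbitrary choice for the B2 4xx, and the history and delivery_queue inserts are omitted.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dedupe (
  mesh_id TEXT NOT NULL,
  client_message_id TEXT NOT NULL,
  broker_message_id TEXT NOT NULL,
  request_fingerprint BLOB NOT NULL,
  PRIMARY KEY (mesh_id, client_message_id)
);
CREATE TABLE topic (mesh_id TEXT NOT NULL, name TEXT NOT NULL);
CREATE TABLE topic_message (
  id TEXT PRIMARY KEY,
  mesh_id TEXT NOT NULL,
  client_message_id TEXT,
  body BLOB NOT NULL
);
"""


def broker_accept(conn, mesh_id, client_id, msg_id, fingerprint, topic, body):
    """Return (status, broker_message_id) following steps 1-4."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Step 1: try to claim the idempotency key.
        conn.execute("INSERT OR IGNORE INTO dedupe VALUES (?, ?, ?, ?)",
                     (mesh_id, client_id, msg_id, fingerprint))
        # Step 2: inspect the row that is there now (ours or an earlier one).
        stored_id, stored_fp = conn.execute(
            "SELECT broker_message_id, request_fingerprint FROM dedupe"
            " WHERE mesh_id = ? AND client_message_id = ?",
            (mesh_id, client_id),
        ).fetchone()
        if stored_id != msg_id:  # duplicate: nothing of ours needs to commit
            conn.execute("ROLLBACK")
            if bytes(stored_fp) == fingerprint:
                return 200, stored_id
            return 409, None
        # Step 3: Phase B2 validation; failure rolls the claim back.
        exists = conn.execute(
            "SELECT 1 FROM topic WHERE mesh_id = ? AND name = ?",
            (mesh_id, topic),
        ).fetchone()
        if exists is None:
            conn.execute("ROLLBACK")
            return 404, None
        # Step 4: in-tx side effects (message row only in this sketch).
        conn.execute("INSERT INTO topic_message VALUES (?, ?, ?, ?)",
                     (msg_id, mesh_id, client_id, body))
        conn.execute("COMMIT")
        return 201, msg_id
    except Exception:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

A B2 failure leaves no dedupe row behind, so the same client_message_id can still be claimed by a later, valid request.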
4.7.3 Failure modes
- Crash before COMMIT: all rows roll back. Next daemon retry inserts cleanly.
- Crash after COMMIT but before WS ACK: dedupe row exists. Daemon retries → fingerprint matches → 200 duplicate. Net: exactly one broker-accepted row, one daemon done transition.
- Constraint violation on message row insert: rolls back the whole tx. 5xx to daemon. Same fingerprint reproduces; daemon eventually marks dead. No orphan dedupe row.
A nightly orphan-check job (counter cm_broker_dedupe_orphan_check_total)
validates that every client_message_dedupe row has a matching
topic_message / message_queue row, or that the matching row has been
retention-pruned (history_available = FALSE). Inconsistencies are logged
as cm_broker_dedupe_orphan_found{mesh_id} for human review.
4.8 Outbox schema
The authoritative outbox schema for v0.9.0 is in §4.5.2 (includes
aborted status and audit columns from the v7 pull). request_fingerprint
is computed at IPC accept time and frozen for the row's lifecycle —
the daemon never recomputes from payload post-enqueue (would produce
drift if envelope_version changes between daemon runs).
4.9 Outbox max-age math — bounded (v6)
Codex r5: the v5 formula (dedupe_retention_days * 24) - 24h_margin
breaks at dedupe_retention_days = 1 (yields zero) and is undefined
behavior at <= 1.
v6 formula and bounds:
- Minimum supported broker dedupe retention: 3 days. Daemon refuses to start if broker advertises dedupe_retention_days < 3 (treats it as feature_param_invalid, exits 4010).
- Daemon max_age_hours derivation:
  - permanent mode → daemon uses config default (168h = 7d), cap 720h (30d).
  - retention_scoped mode → daemon max_age_hours = max(72, (dedupe_retention_days * 24) - safety_margin_hours) where safety_margin_hours = max(24, ceil(dedupe_retention_days * 0.1 * 24)). For dedupe_retention_days=3 this gives max(72, 72-24) = 72h. For 30 days: max(72, 720-72) = 648h. For 365 days: max(72, 8760-876) = 7884h.
  - The 72h floor prevents the daemon outbox from being uselessly short: three days is enough margin for normal operator response to a paged outage.
- Operator override allowed via [outbox] max_age_hours_override = N, but if N exceeds dedupe_retention_days * 24 - 1 daemon refuses to start with outbox_max_age_above_dedupe_window. The override exists for the rare case of a much-shorter-than-default outbox; it does not exist to circumvent the broker's dedupe window.
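Illustrative sketch (non-normative) of the v6 derivation in Python. Function and parameter names are hypothetical. The ceiling is computed in integer arithmetic to avoid float rounding surprises, and the sketch assumes the operator override applies only in retention_scoped mode, since the spec ties the override bound to dedupe_retention_days.

```python
from typing import Optional

MIN_RETENTION_DAYS = 3
FLOOR_HOURS = 72
PERMANENT_CAP_HOURS = 720


def safety_margin_hours(days: int) -> int:
    """max(24, ceil(days * 0.1 * 24)), using integers only."""
    return max(24, (days * 24 + 9) // 10)


def outbox_max_age_hours(mode: str,
                         dedupe_retention_days: Optional[int] = None,
                         config_default_hours: int = 168,
                         override_hours: Optional[int] = None) -> int:
    """Derive the daemon's outbox max age, or raise if it must not start."""
    if mode == "permanent":
        return min(config_default_hours, PERMANENT_CAP_HOURS)
    if mode != "retention_scoped":
        raise ValueError(f"unknown dedupe mode: {mode!r}")
    if dedupe_retention_days is None or dedupe_retention_days < MIN_RETENTION_DAYS:
        raise ValueError("dedupe_retention_days below the 3-day minimum")
    window_hours = dedupe_retention_days * 24
    if override_hours is not None:
        # The override may shorten the outbox, never outlive the window.
        if override_hours > window_hours - 1:
            raise ValueError("outbox_max_age_above_dedupe_window")
        return override_hours
    margin = safety_margin_hours(dedupe_retention_days)
    return max(FLOOR_HOURS, window_hours - margin)
```

With the worked values from the text, outbox_max_age_hours("retention_scoped", 3) gives 72, 30 days gives 648, and 365 days gives 7884.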
4.10 Inbox schema — unchanged from v3 §4.5
4.11 Crash recovery — unchanged from v3 §4.6
4.12 Failure modes — corrected for fingerprint model (v6)
- Fingerprint mismatch on retry (409 idempotency_key_reused): outbox row marked dead. Surfaced in --failed view. Operator command outbox requeue --new-client-id <id> rotates client_message_id and retries.
- Daemon retry after dedupe row hard-deleted by retention sweep: in retention_scoped mode, daemon max_age_hours is bounded inside the retention window (§4.9), so this can only happen via operator override. In that case the retry creates a NEW dedupe row + new message; the caller chose this risk explicitly. Counter cm_daemon_retry_after_dedupe_expired_total.
- Daemon retry after dedupe row hard-deleted in permanent mode: cannot happen by definition, since permanent means no expires_at. Only mesh deletion removes dedupe rows.
- Duplicate row, history pruned: as v5 §4.4. Mark done, log cm_daemon_dedupe_history_pruned_total.
5. Inbound — unchanged from v3 §5
6. Hooks — unchanged from v4 §6
7-13. Multi-mesh, auto-routing, service install, observability, SDKs, security model, configuration — unchanged from v4
14. Lifecycle — unchanged from v5 §14
15. Version compat — feature param updated for new dedupe semantics
15.1 Feature bits with parameters (v6 update)
| Bit | params.version | Required parameters | Optional parameters |
|---|---|---|---|
| client_message_id_dedupe | 1 | mode: "retention_scoped"\|"permanent", dedupe_retention_days: int (>= 3) (when mode=retention_scoped), request_fingerprint: bool == true | tombstone_history_pruned_window_days: int |
| concurrent_connection_policy | 1 | (no parameters) | default_policy: "prefer_newest"\|"prefer_oldest"\|"allow_concurrent" |
| member_keypair_rotated_event | 1 | (no parameters) | (none) |
| key_epoch | 1 | max_concurrent_epochs: int (>= 1) | (none) |
| max_payload | 1 | inline_bytes: int (>= 1024), blob_bytes: int (>= 1024) | (none) |
client_message_id_dedupe ships at params.version = 1 with
request_fingerprint: bool == true as a required parameter. A broker
that doesn't advertise the feature, or advertises it without
request_fingerprint: true, is treated as "feature missing" and the
daemon refuses to start. That's intentional — v0.9.0 daemons require
fingerprint enforcement for safe idempotency.
The schema-version-2 evolution (parameters that need versioning) is deferred (see followups doc).
dedupe_retention_days minimum is 3 (matches the §4.9 floor).
15.2 Negotiation handshake — unchanged shape from v5 §15.2
15.3 IPC negotiation — unchanged from v3 §15.3
15.4 Compatibility matrix — unchanged from v3 §15.4
15.5 Diagnostic close code (v0.9.0)
v0.9.0 ships a single WebSocket close code with a structured
close_reason JSON payload that distinguishes the underlying cause:
| Code | Reason | close_reason.kind values |
|---|---|---|
| 4010 | feature_unavailable | feature_unavailable (feature missing from broker's supported) · feature_param_invalid (params fail validation: missing required, out of bounds, unknown version) · feature_param_below_floor (param below daemon's hard floor, e.g. dedupe_retention_days < 3) |
close_reason payload shape:
{
"kind": "feature_unavailable" | "feature_param_invalid" | "feature_param_below_floor",
"feature": "client_message_id_dedupe",
"detail": "..."
}
Daemon logs the full negotiation payload at WARN before exiting; supervisor + alerting catches the restart loop. The split into 4011/4012 codes is deferred (see followups doc).
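Illustrative sketch (non-normative) of how a daemon might classify the broker's advertisement into the three close_reason kinds. The function name and the shape of the supported mapping (feature name to a flat params dict) are assumptions; the handshake format itself is defined in §15.2 and not reproduced here. A retention below 3 days is mapped to feature_param_below_floor, following the table in this section.

```python
from typing import Optional

FEATURE = "client_message_id_dedupe"
RETENTION_FLOOR_DAYS = 3


def _reason(kind: str, detail: str) -> dict:
    """Build the close_reason payload sent with close code 4010."""
    return {"kind": kind, "feature": FEATURE, "detail": detail}


def check_dedupe_feature(supported: dict) -> Optional[dict]:
    """Return None when the feature is usable, else a close_reason dict."""
    params = supported.get(FEATURE)
    if params is None:
        return _reason("feature_unavailable", "feature not advertised")
    if params.get("version") != 1:
        return _reason("feature_param_invalid", "unknown params.version")
    # Advertised without request_fingerprint: true counts as missing (§15.1).
    if params.get("request_fingerprint") is not True:
        return _reason("feature_unavailable", "request_fingerprint not true")
    mode = params.get("mode")
    if mode not in ("retention_scoped", "permanent"):
        return _reason("feature_param_invalid", "mode missing or unknown")
    if mode == "retention_scoped":
        days = params.get("dedupe_retention_days")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(days, bool) or not isinstance(days, int):
            return _reason("feature_param_invalid",
                           "dedupe_retention_days missing or not an int")
        if days < RETENTION_FLOOR_DAYS:
            return _reason("feature_param_below_floor",
                           "dedupe_retention_days < 3")
    return None
```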
16. Threat model — unchanged from v4 §16
17. Migration — broker dedupe table + atomicity (v6)
Broker side, deploy order:
1. CREATE TABLE mesh.client_message_dedupe with v6 schema (additive, online-safe).
2. ALTER TABLE mesh.topic_message ADD COLUMN client_message_id.
3. ALTER TABLE mesh.message_queue ADD COLUMN client_message_id.
4. Broker code refactor: every accept path wraps dedupe insert + message insert in one transaction (§4.7). Pre-generated broker_message_id (ulid in code) passed in.
5. Broker code: nightly job to delete dedupe rows where expires_at < NOW() (skip in permanent mode).
6. Broker code: hook into the message-retention sweep. When a topic_message or message_queue row is hard-deleted, find the matching dedupe row by client_message_id and set history_available = FALSE. (Note: client_message_id is nullable on those tables for legacy traffic; nullable rows have no dedupe row to update.)
7. Broker code: nightly orphan-check job (§4.7); alerts on non-zero.
8. Broker advertises client_message_id_dedupe feature with params.version = 1 and request_fingerprint: true.
9. Daemon refuses to start unless that feature bit is advertised with valid v1 params.
Rollback plan: feature flag disables fingerprint enforcement broker-side (falls back to existing pre-v6 behavior — no dedupe). Daemons that require fingerprint refuse to start. Operator switches off the feature flag, reverts the daemon, restarts. No data loss; pending dedupe rows remain in place for the next forward roll.
v0.9.0 lock — what's in vs deferred
In (this document): everything codex r1–r4 ratified plus the six
sweet-spot pulls from v7–v9 enumerated at the top — aborted outbox
status, BEGIN IMMEDIATE, IPC duplicate lookup table, B1/B2/B3 phasing
concept, side-effect inventory, two-layer ID model.
Deferred (see 2026-05-03-daemon-spec-broker-hardening-followups.md):
- B0 dedupe fast-path before rate-limit (v10).
- Lua-scripted idempotent rate limiter keyed by (mesh, client_id, window) (v10).
- In-tx mesh.mention_index (v8).
- 4011 / 4012 close-code split (v6 §15.5; collapsed to 4010 with structured reason JSON for v0.9.0).
- Per-OS fingerprint precedence elaborate table (v8 §2.2.1).
- request_fingerprint schema-version-2 in feature negotiation (v6 §15.1 ships at version 1 with request_fingerprint: bool).
- Force-expiry / quarantine semantics for keypair-archive.json (v8 §14.1.1).
These deferrals are real improvements but not v0.9.0 blockers. They land as the broker matures and we have actual scale-load to optimize against.
Cross-spec note: §15.5 close-code collapse
For v0.9.0 we ship a single 4010 feature_unavailable close code with
a structured close_reason JSON payload that distinguishes the
underlying cause:
{
"close_reason": {
"kind": "feature_unavailable" | "feature_param_invalid" | "feature_param_below_floor",
"feature": "client_message_id_dedupe",
"detail": "..."
}
}
The 4011/4012 split is deferred to followups.
NON-NORMATIVE: round-6 review trailer (preserved for audit only)
Not part of the v0.9.0 contract. Preserved verbatim from the v6 source spec as a record of the open questions at the time of the codex round-6 review. Items below have either been resolved in this merged document, deferred to the followups doc, or superseded. Do NOT use this section as a checklist for implementation.
- Request fingerprint canonical form (§4.4): does JCS work cross-language for meta_canonical_json (Python json.dumps, Go encoding/json, JS JSON.stringify all behave differently)? Should we ship a vetted JCS lib in each SDK or fall back to a simpler "sorted keys + no spaces + escape-as-stored" rule with conformance tests?
- Atomicity contract (§4.7): is the orphan-check sufficient, or does a violation mean we need a "broker rebuild dedupe from messages" recovery tool? The latter is destructive but useful for ops emergencies.
- Max-age formula (§4.9): is the 72h floor correct? Is the percentage-based safety margin (max(24, ceil(0.1 * dedupe_window))) the right shape? Or simpler to say "always 24h"?
- 409 idempotency_key_reused recovery flow (§4.5): is sending the row to dead and surfacing it via outbox --failed enough? Should the daemon emit a high-priority event for the SSE stream so operators are paged immediately?
- Diagnostic close codes (§15.5): is splitting 4010/4011/4012 useful, or does it just push complexity onto operators? Should we collapse to 4010 with structured close-reason JSON instead?
- Anything else still wrong? Read it as if you were going to operate this for a year. What falls down?
Three options:
- (a) v6 is shippable: lock the spec, start coding the frozen core.
- (b) v7 needed: list the must-fix items.
- (c) the architecture itself is wrong: what would you do differently?
Be ruthless.