Long-lived process that holds a persistent WS to the broker and exposes
a local IPC surface (UDS + bearer-auth TCP loopback). Implements the
v0.9.0 spec under .artifacts/specs/.
Core:
- daemon up | status | version | down | accept-host
- daemon outbox list [--failed|--pending|--inflight|--done|--aborted]
- daemon outbox requeue <id> [--new-client-id <id>]
- daemon install-service / uninstall-service (macOS launchd, Linux systemd)
IPC routes:
- /v1/version, /v1/health
- /v1/send (POST) — full §4.5.1 idempotency lookup table
- /v1/inbox (GET) — paged history
- /v1/events — SSE stream of message/peer_join/peer_leave/broker_status
- /v1/peers — broker passthrough
- /v1/profile — summary/status/visible/avatar/title/bio/capabilities
- /v1/outbox + /v1/outbox/requeue — operator recovery
Storage (SQLite via node:sqlite / bun:sqlite):
- outbox.db: pending/inflight/done/dead/aborted with audit columns
- inbox.db: dedupe by client_message_id, decrypts DMs via existing crypto
- BEGIN IMMEDIATE serialization for daemon-local accept races
Identity:
- host_fingerprint.json (machine-id || first-stable-mac)
- refuse-on-mismatch policy with `daemon accept-host` recovery
CLI integration:
- claudemesh send detects the daemon and routes through /v1/send when
present, falling back to bridge socket / cold path otherwise
Tests: 15-case coverage of the §4.5.1 IPC duplicate lookup table.
Spec arc preserved at .artifacts/specs/2026-05-03-daemon-{v1..v10}.md;
v0.9.0 implementation target locked at 2026-05-03-daemon-spec-v0.9.0.md;
deferred items at 2026-05-03-daemon-spec-broker-hardening-followups.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
claudemesh daemon — Final Spec v7
Round 7. v6 was reviewed by codex (round 6), which found the broker layer largely correct but caught five daemon-side and broker-transaction correctness gaps:
- Daemon-local duplicate POST semantics undefined: local fingerprint comparison was missing across pending/inflight/done/dead.
- §4.6 rejected-request contradiction: the section talked about both "fix and retry" and "fingerprint mismatch → 409". Only one of those can be true.
- §4.7 pseudocode bug: `ON CONFLICT DO NOTHING RETURNING` returns nothing on conflict, so the fingerprint comparison was in the wrong branch.
- Max-age math floor consumes the margin: at minimum retention (3 days), daemon max-age 72h equals the broker window of 72h. That is not inside the window.
- Broker transaction boundary incomplete: fan-out/queue/history side effects were not stated as in-transaction, and the "optional" wording was wrong.
v7 fixes all five. Intent (§0) is unchanged from v2. v7 revises only §4 (delivery contract), §15 (feature-parameter minimum), and §17 (migration).
0. Intent — unchanged, see v2 §0
1. Process model — unchanged
2. Identity — unchanged from v5 §2
3. IPC surface — unchanged from v4 §3
4. Delivery contract — at-least-once, fingerprinted at IPC and broker layers
4.1 The contract (precise — v7)
Local guarantee: each successful `POST /v1/send` returns a stable `client_message_id`. The send is durably persisted to `outbox.db` before the response returns. The daemon enforces request-fingerprint idempotency at the IPC layer. A duplicate `POST` with the same `client_message_id` and a matching `request_fingerprint` returns the stable prior result. With a mismatched fingerprint it returns a local `409 idempotency_key_reused`, and the new request is not persisted.
Broker guarantee: the broker maintains a dedupe record per accepted `(mesh_id, client_message_id)` in `mesh.client_message_dedupe`, together with its `request_fingerprint`. Retries with a matching fingerprint collapse. Retries with a mismatched fingerprint return `409 idempotency_key_reused` without creating a new message.
Atomicity guarantee: every durable side effect of a successful accept (dedupe row, message row, fan-out work, history row, queue insertion) lands in the same broker DB transaction. Either all commit or none do.
End-to-end guarantee: at-least-once delivery, with `client_message_id` propagated to receivers' inboxes.
4.2 Daemon-supplied client_message_id — unchanged from v3 §4.2
4.3 Broker schema — unchanged from v6 §4.3
(mesh.client_message_dedupe table with request_fingerprint BYTEA, no
status column.)
4.4 Request fingerprint canonical form — unchanged from v6 §4.4
4.5 Daemon-local idempotency at the IPC layer (NEW v7 — codex r6)
The daemon enforces fingerprint idempotency before the request reaches `outbox.db`. As a result, a caller bug can never leave two different payloads recorded under one `client_message_id`.
4.5.1 IPC accept algorithm
On `POST /v1/send`:
- Validate the request envelope (auth, schema, size limits). Failures here return `4xx` immediately, and no outbox row is written. The `client_message_id` (whether caller-supplied or daemon-minted) is not consumed, so the caller may reuse the same id for a subsequent valid send.
- Compute `request_fingerprint` (§4.4).
- Look up the existing outbox row by `client_message_id`:
| Existing row state | Fingerprint match? | Daemon response |
|---|---|---|
| (no row) | n/a | Insert new outbox row in `pending`; return `202 accepted, queued` with `client_message_id` |
| `pending` | match | Return `202 accepted, queued` with the existing `client_message_id`. No new row. Idempotent retry of an in-progress send. |
| `pending` | mismatch | Return `409 idempotency_key_reused` with `conflict: "outbox_pending_fingerprint_mismatch"`. No mutation of the existing row. |
| `inflight` | match | Return `202 accepted, inflight`. No new row. Caller is retrying mid-broker-roundtrip. |
| `inflight` | mismatch | Return `409 idempotency_key_reused` with `conflict: "outbox_inflight_fingerprint_mismatch"` |
| `done` | match | Return `200 ok, duplicate: true, broker_message_id, history_id`. No new row, no broker call. |
| `done` | mismatch | Return `409 idempotency_key_reused` with `conflict: "outbox_done_fingerprint_mismatch"`, `broker_message_id` |
| `dead` | match | Return `409 idempotency_key_reused` with `conflict: "outbox_dead_fingerprint_match"`, `reason: "<last_error>"`. Caller must rotate the id (see §4.6.3); the daemon refuses to re-attempt a dead row's exact bytes. |
| `dead` | mismatch | Return `409 idempotency_key_reused` with `conflict: "outbox_dead_fingerprint_mismatch"` |
Rule: any IPC 409 carries the daemon's request_fingerprint (8-byte
hex prefix) so callers can debug client/server canonical-form drift.
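The lookup table can be read as a pure decision function. The following TypeScript is a minimal sketch under these assumptions:
- The type and field names (`ExistingRow`, `decideIpcSend`, `insertRow`) are hypothetical.
- Fingerprints are 32-byte `Uint8Array`s computed per §4.4.
- `aborted` rows (§4.6.3) are left out because the table does not cover them.

Per the rule on 409s, each 409 body carries the first 8 bytes of the request's fingerprint as 16 hex characters.

```typescript
// Hypothetical types for illustration; not the daemon's real module layout.
type RowStatus = "pending" | "inflight" | "done" | "dead";

interface ExistingRow {
  status: RowStatus;
  requestFingerprint: Uint8Array; // 32 bytes, frozen at IPC accept time
  brokerMessageId?: string;
  historyId?: string; // assumption: the §4.5.2 schema has no history_id column
  lastError?: string;
}

type Conflict409 = {
  status: 409;
  body: {
    error: "idempotency_key_reused";
    conflict: string;
    daemon_fingerprint_prefix: string;
    broker_message_id?: string;
    reason?: string;
  };
};

type IpcResponse =
  | { status: 202; body: { accepted: "queued" | "inflight"; client_message_id: string }; insertRow: boolean }
  | { status: 200; body: { ok: true; duplicate: true; broker_message_id?: string; history_id?: string } }
  | Conflict409;

function sameBytes(a: Uint8Array, b: Uint8Array): boolean {
  return a.length === b.length && a.every((byte, i) => byte === b[i]);
}

// First 8 bytes of the fingerprint as 16 lowercase hex characters.
function fingerprintPrefix(fp: Uint8Array): string {
  return Array.from(fp.subarray(0, 8), (b) => b.toString(16).padStart(2, "0")).join("");
}

function decideIpcSend(clientMessageId: string, fp: Uint8Array, row: ExistingRow | undefined): IpcResponse {
  const conflict = (kind: string, extra: { broker_message_id?: string; reason?: string } = {}): Conflict409 => ({
    status: 409,
    body: { error: "idempotency_key_reused", conflict: kind, daemon_fingerprint_prefix: fingerprintPrefix(fp), ...extra },
  });

  // No row yet: the caller inserts a new `pending` row.
  if (row === undefined) {
    return { status: 202, body: { accepted: "queued", client_message_id: clientMessageId }, insertRow: true };
  }
  const match = sameBytes(fp, row.requestFingerprint);

  if (row.status === "pending" || row.status === "inflight") {
    if (!match) return conflict(`outbox_${row.status}_fingerprint_mismatch`);
    const accepted = row.status === "pending" ? "queued" : "inflight";
    return { status: 202, body: { accepted, client_message_id: clientMessageId }, insertRow: false };
  }
  if (row.status === "done") {
    return match
      ? { status: 200, body: { ok: true, duplicate: true, broker_message_id: row.brokerMessageId, history_id: row.historyId } }
      : conflict("outbox_done_fingerprint_mismatch", { broker_message_id: row.brokerMessageId });
  }
  // Remaining state is `dead`: even an exact retry is refused until the id is rotated.
  return match
    ? conflict("outbox_dead_fingerprint_match", { reason: row.lastError })
    : conflict("outbox_dead_fingerprint_mismatch");
}
```

Keeping the decision separate from storage lets each table row be unit-tested on its own. The commit message's 15-case table coverage suggests a similar split.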
4.5.2 Outbox table — fingerprint required, atomic UPSERT removed
CREATE TABLE outbox (
id TEXT PRIMARY KEY,
client_message_id TEXT NOT NULL UNIQUE,
request_fingerprint BLOB NOT NULL, -- 32 bytes
payload BLOB NOT NULL,
enqueued_at INTEGER NOT NULL,
attempts INTEGER DEFAULT 0,
next_attempt_at INTEGER NOT NULL,
status TEXT CHECK(status IN ('pending','inflight','done','dead')),
last_error TEXT,
delivered_at INTEGER,
broker_message_id TEXT
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
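Insertion is `BEGIN IMMEDIATE; SELECT …; if-no-row INSERT; COMMIT;`. That is an explicit lock + check + insert, not `INSERT OR IGNORE`. SQLite has no `SELECT … FOR UPDATE`; instead, `BEGIN IMMEDIATE` takes the database write lock before the check, which serializes concurrent accepts. The daemon never auto-mutates an existing row's `request_fingerprint` or `payload`. Mismatches are 409s, not silent overwrites.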
request_fingerprint is computed once at IPC accept time and frozen.
Retries to the broker re-send the same bytes from payload and the
same request_fingerprint. Daemon does not recompute post-enqueue.
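Below is a sketch of the check-then-insert transaction. It assumes a synchronous handle shaped like node:sqlite's `DatabaseSync`: `exec`, plus `prepare(...)` returning a statement with `get`/`run`. Because SQLite has no row-level `FOR UPDATE`, the sketch relies on `BEGIN IMMEDIATE`, the serialization named in the commit message. The lock is taken before the lookup, so two connections cannot both observe "no row". `insertIfAbsent` and the row types are hypothetical names, and `enqueued_at` is assumed to be epoch milliseconds.

```typescript
// Minimal synchronous handle, shaped like node:sqlite's DatabaseSync.
interface Statement {
  get(...params: unknown[]): unknown;
  run(...params: unknown[]): unknown;
}
interface SyncDb {
  exec(sql: string): void;
  prepare(sql: string): Statement;
}

interface NewOutboxRow {
  id: string;
  clientMessageId: string;
  requestFingerprint: Uint8Array; // computed once at IPC accept, never recomputed
  payload: Uint8Array;
  nowMs: number; // assumed unit for enqueued_at / next_attempt_at
}

interface ExistingOutboxRow {
  status: string;
  request_fingerprint: Uint8Array;
}

// Inserts a `pending` row if the id is free. Otherwise returns the existing
// row untouched so the caller can apply the §4.5.1 lookup table.
function insertIfAbsent(db: SyncDb, row: NewOutboxRow): ExistingOutboxRow | undefined {
  // IMMEDIATE acquires the write lock before the SELECT, so no other
  // connection can insert the same id between the check and the insert.
  db.exec("BEGIN IMMEDIATE");
  try {
    const existing = db
      .prepare("SELECT status, request_fingerprint FROM outbox WHERE client_message_id = ?")
      .get(row.clientMessageId) as ExistingOutboxRow | null | undefined;
    if (existing == null) {
      db.prepare(
        `INSERT INTO outbox (id, client_message_id, request_fingerprint, payload,
                             enqueued_at, next_attempt_at, status)
         VALUES (?, ?, ?, ?, ?, ?, 'pending')`,
      ).run(row.id, row.clientMessageId, row.requestFingerprint, row.payload, row.nowMs, row.nowMs);
    }
    db.exec("COMMIT");
    return existing ?? undefined; // `== null` also covers drivers that return null
  } catch (err) {
    db.exec("ROLLBACK");
    throw err;
  }
}
```

The function returns the existing row instead of mutating it. That leaves the 409 decision entirely to the lookup table.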
4.6 Rejected-request semantics — pick one rule (NEW v7 — codex r6)
Rule: the `client_message_id` is consumed iff the daemon writes an outbox row. Anything that fails before outbox insertion (validation, auth, size) leaves the id untouched and freely reusable.
This makes §4.6 internally consistent with §4.5:
4.6.1 IPC validation failure (no outbox row written)
- Schema/auth/size/destination-not-resolvable failures return `4xx` immediately. The `client_message_id` is not stored anywhere on the daemon. The caller may re-send with the same id and a fixed payload. Because no outbox row exists, the daemon treats it as a fresh request.
4.6.2 Outbox row exists, broker permanent rejection (4xx response)
- Daemon receives `4xx` from the broker (e.g. a payload-size delta between daemon and broker advertised limits, or a mesh-level reject). The outbox row transitions to `dead` with `last_error` populated.
- Caller retrying with the same `client_message_id` → daemon returns `409 idempotency_key_reused`, `conflict: "outbox_dead_*"` per §4.5.1.
- The id is consumed (the row is locked in `dead`) until operator action.
4.6.3 Operator recovery: rotating an idempotency key
To unstick a dead row whose payload needs to change, operator runs:
claudemesh daemon outbox requeue --id <outbox_id> --new-client-id [auto|<id>]
This atomically:
- Marks the existing `dead` row as `aborted` (terminal, never retried).
- Creates a new outbox row with a fresh `client_message_id` (caller-supplied or a daemon-generated ULID) and the SAME or a CALLER-PATCHED payload.
- Frees the old `client_message_id` at the daemon layer. It stays locked at the broker layer if the broker ever accepted it, because the broker's dedupe row remains. For a row that died before broker acceptance, the id is fully reusable end-to-end.
Operators see a clear distinction between dead (needs operator
attention) and aborted (intentionally retired). Add aborted to the
status CHECK constraint:
status TEXT CHECK(status IN ('pending','inflight','done','dead','aborted'))
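The requeue steps can be sketched against an in-memory map. This illustrates the state transitions only and is not the daemon's code:
- `requeue` and `mintId` are hypothetical names.
- `randomUUID` stands in for a ULID generator.
- The real operation would run both writes in one SQLite transaction and recompute `request_fingerprint` when the payload is patched. Both are omitted here.

```typescript
import { randomUUID } from "node:crypto";

type OutboxStatus = "pending" | "inflight" | "done" | "dead" | "aborted";

interface OutboxRow {
  id: string;
  clientMessageId: string;
  payload: Uint8Array;
  status: OutboxStatus;
  lastError?: string;
}

// Hypothetical stand-in for the daemon's ULID generator.
const mintId = (): string => randomUUID();

// Operator recovery for a dead row. `newClientId` is "auto" or an explicit id.
function requeue(
  rows: Map<string, OutboxRow>,
  outboxId: string,
  newClientId: string,
  patchedPayload?: Uint8Array,
): OutboxRow {
  const old = rows.get(outboxId);
  if (old === undefined || old.status !== "dead") {
    throw new Error(`requeue: ${outboxId} is not a dead row`);
  }
  const clientMessageId = newClientId === "auto" ? mintId() : newClientId;
  // Only aborted rows release their id at the daemon layer (§4.6.3).
  for (const r of rows.values()) {
    if (r.status !== "aborted" && r.clientMessageId === clientMessageId) {
      throw new Error(`requeue: client_message_id ${clientMessageId} is still held`);
    }
  }
  const fresh: OutboxRow = {
    id: mintId(),
    clientMessageId,
    payload: patchedPayload ?? old.payload,
    status: "pending",
  };
  rows.set(old.id, { ...old, status: "aborted" }); // terminal, never retried
  rows.set(fresh.id, fresh);
  return fresh;
}
```

Refusing anything but a `dead` source row keeps `aborted` terminal: requeueing the same outbox id twice fails instead of forking two live rows.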
4.7 Broker atomicity contract — corrected pseudocode + side-effect inventory (v7 — codex r6)
4.7.1 Side effects inside the transaction
Every successful broker accept atomically commits the following durable state in one transaction:
| Effect | Table | Notes |
|---|---|---|
| Dedupe record | `mesh.client_message_dedupe` | NEW row keyed by `(mesh_id, client_message_id)` |
| Message body | `mesh.topic_message` OR `mesh.message_queue` | NEW row keyed by `broker_message_id` (pre-generated ULID) |
| History row | `mesh.message_history` | NEW row pointing at `broker_message_id` for ordered replay |
| Fan-out work | `mesh.delivery_queue` | One row per intended recipient (member subscribed to topic, recipient of DM, etc.) |
Effects outside the transaction (run after the ACK to the daemon, not part of the commit):
- WebSocket pushes to currently connected subscribers. These are best-effort live notifications; on failure, subscribers fetch from history on next connect.
- Webhook fan-out (post-v0.9.0 feature). It runs asynchronously off the `delivery_queue` rows committed inside the transaction.
If any in-transaction insert fails (constraint violation, DB error),
the transaction rolls back: no dedupe row, no message row, no history,
no delivery queue rows. Broker returns 5xx to daemon; daemon retries.
4.7.2 Corrected pseudocode (codex r6)
The fingerprint comparison must happen on the conflict-select branch,
not the RETURNING branch:
BEGIN;
-- Pre-generate broker_message_id (ulid) outside the transaction, pass in.
-- Step 1: try to claim the idempotency key.
INSERT INTO mesh.client_message_dedupe
(mesh_id, client_message_id, broker_message_id, request_fingerprint,
destination_kind, destination_ref, expires_at)
VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
$dest_kind, $dest_ref, $expires_at)
ON CONFLICT (mesh_id, client_message_id) DO NOTHING;
-- Step 2: was it our insert?
SELECT broker_message_id, request_fingerprint, destination_kind,
destination_ref, history_available, first_seen_at
FROM mesh.client_message_dedupe
WHERE mesh_id = $mesh_id AND client_message_id = $client_id
FOR SHARE;
-- If returned.broker_message_id == $msg_id (our pre-generated id),
-- this was the first insert. Continue to step 3.
-- If returned.broker_message_id != $msg_id AND
-- returned.request_fingerprint == $fingerprint,
-- this is a duplicate retry. ROLLBACK; return 200 duplicate.
-- If returned.broker_message_id != $msg_id AND
-- returned.request_fingerprint != $fingerprint,
-- ROLLBACK; return 409 idempotency_key_reused.
-- Step 3: insert message row, history, fan-out queue.
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
VALUES ($msg_id, $mesh_id, $client_id, ...);
INSERT INTO mesh.message_history (broker_message_id, mesh_id, ...)
VALUES ($msg_id, $mesh_id, ...);
INSERT INTO mesh.delivery_queue (broker_message_id, recipient_pubkey, ...)
SELECT $msg_id, member_pubkey, ...
FROM mesh.topic_subscription
WHERE topic = $dest_ref AND mesh_id = $mesh_id;
COMMIT;
The branch logic determines the response shape (201 vs 200 duplicate vs 409 idempotency_key_reused) before COMMIT. The
duplicate and 409 branches always ROLLBACK because nothing else
needs to commit on those paths.
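`SELECT … FOR SHARE` keeps concurrent transactions, such as a retention sweep, from updating or deleting the same dedupe row until our transaction ends. A concurrent insert with the same key already blocks on the unique index until our transaction commits or rolls back. After that it either skips via `ON CONFLICT DO NOTHING` and reads our row, or inserts its own row if we rolled back.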
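The step-2 branch can be isolated as a pure function, called after the `SELECT … FOR SHARE` and before step 3. This is a sketch with hypothetical names (`classifyDedupe`, `AcceptOutcome`). It assumes the driver returns `BYTEA` as a `Uint8Array`; node-postgres returns a `Buffer`, which is a `Uint8Array` subclass. `commit: false` means the caller issues `ROLLBACK` before replying.

```typescript
// Columns read back by the step-2 SELECT ... FOR SHARE (subset).
interface DedupeRow {
  broker_message_id: string;
  request_fingerprint: Uint8Array; // BYTEA
}

type AcceptOutcome =
  | { kind: "first_insert"; commit: true; status: 201 }
  | { kind: "duplicate"; commit: false; status: 200; brokerMessageId: string }
  | { kind: "conflict"; commit: false; status: 409; error: "idempotency_key_reused" };

// Decides the branch after step 2, before any step-3 insert runs.
function classifyDedupe(row: DedupeRow, preGeneratedMsgId: string, fingerprint: Uint8Array): AcceptOutcome {
  // The pre-generated ULID is unique to this attempt, so seeing it back
  // means our INSERT claimed the key and step 3 should run.
  if (row.broker_message_id === preGeneratedMsgId) {
    return { kind: "first_insert", commit: true, status: 201 };
  }
  const fp = row.request_fingerprint;
  const sameFingerprint = fp.length === fingerprint.length && fp.every((b, i) => b === fingerprint[i]);
  return sameFingerprint
    ? { kind: "duplicate", commit: false, status: 200, brokerMessageId: row.broker_message_id }
    : { kind: "conflict", commit: false, status: 409, error: "idempotency_key_reused" };
}
```

Branching on the returned `broker_message_id` avoids depending on `RETURNING`. `RETURNING` yields no row on the conflict path, which was the bug in v6.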
4.7.3 Orphan check — covers full inventory now
The nightly cm_broker_dedupe_orphan_check_total job (v6 §4.7) is
extended to verify all four in-transaction effects. For each
client_message_dedupe row:
- Either the corresponding `topic_message`/`message_queue` row exists, OR `history_available = FALSE` AND a deleted-tombstone is recorded.
- AND a corresponding `message_history` row exists (or has been pruned per history retention).
- AND zero outstanding `delivery_queue` rows older than the fan-out timeout reference a `broker_message_id` whose dedupe row is missing.
Any inconsistency logged as cm_broker_atomicity_violation_found for
human review. Should be zero in steady state.
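The three orphan-check conditions can be sketched over an in-memory snapshot. The real job would most likely express them as SQL anti-joins. The snapshot shape, field names, and `fanOutTimeoutMs` are assumptions for illustration.

```typescript
interface DedupeRecord {
  brokerMessageId: string;
  historyAvailable: boolean;
}

interface DeliveryQueueRow {
  brokerMessageId: string;
  enqueuedAtMs: number;
}

// Hypothetical in-memory snapshot of the tables the job reads.
interface BrokerSnapshot {
  dedupe: DedupeRecord[];
  messageIds: Set<string>; // topic_message / message_queue ids
  tombstoneIds: Set<string>; // deleted-tombstones
  historyIds: Set<string>; // message_history rows still present
  prunedHistoryIds: Set<string>; // history pruned per retention
  deliveryQueue: DeliveryQueueRow[];
}

// Returns one line per violation; each would be logged as
// cm_broker_atomicity_violation_found for human review.
function findAtomicityViolations(s: BrokerSnapshot, nowMs: number, fanOutTimeoutMs: number): string[] {
  const violations: string[] = [];
  const dedupeIds = new Set(s.dedupe.map((d) => d.brokerMessageId));

  for (const d of s.dedupe) {
    const id = d.brokerMessageId;
    const bodyOk = s.messageIds.has(id) || (!d.historyAvailable && s.tombstoneIds.has(id));
    if (!bodyOk) violations.push(`${id}: no message row and no tombstone`);
    if (!s.historyIds.has(id) && !s.prunedHistoryIds.has(id)) violations.push(`${id}: no history row`);
  }
  // Stale fan-out work pointing at a message whose dedupe row is gone.
  for (const q of s.deliveryQueue) {
    if (nowMs - q.enqueuedAtMs > fanOutTimeoutMs && !dedupeIds.has(q.brokerMessageId)) {
      violations.push(`${q.brokerMessageId}: stale delivery_queue row without dedupe row`);
    }
  }
  return violations;
}
```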
4.8 Outbox max-age math — strictly inside broker window (v7 — codex r6)
Codex r6: at v6's 3-day minimum, daemon max_age (72h) equaled broker window (72h). That isn't "inside the window."
v7 raises the floor and tightens the formula:
- Minimum supported broker `dedupe_retention_days`: 7 (was 3 in v6). Below this, the daemon refuses to start with `4012 feature_param_below_floor`.
- Daemon `max_age_hours` derivation (`retention_scoped` mode):
  `safety_margin_hours = max(24, ceil(dedupe_retention_days * 0.1 * 24))`
  `max_age_hours = (dedupe_retention_days * 24) - safety_margin_hours`
  At the minimum (7 days): `safety_margin = max(24, 17) = 24h`; `max_age = 168 - 24 = 144h`. Daemon outbox ≤ 144h, broker window ≥ 168h, gap ≥ 24h.
- Daemon `max_age_hours` derivation (`permanent` mode): `max_age_hours = config.outbox.max_age_hours_default` (168h), capped at `config.outbox.max_age_hours_cap` (720h).
- Operator override: `[outbox] max_age_hours_override = N` is accepted iff `N <= dedupe_retention_days * 24 - 24`. Above that, the daemon refuses to start with `outbox_max_age_above_dedupe_window` and clear error text.
- The 72h floor from v6 is dropped. The new 7-day broker minimum already yields a 144h derived max-age, well above any realistic floor concern.
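The derivation rules can be sketched in TypeScript. The function and config names are hypothetical, and refusals are thrown with the codes named in the text. The margin is computed as `windowHours / 10`. That is the same 10% as `dedupe_retention_days * 0.1 * 24`, but it stays in integer arithmetic until the division. The spec does not say whether the override applies in `permanent` mode; this sketch assumes it does not.

```typescript
type DedupeMode = "retention_scoped" | "permanent";

interface OutboxMaxAgeConfig {
  maxAgeHoursDefault: number; // config.outbox.max_age_hours_default (168)
  maxAgeHoursCap: number; // config.outbox.max_age_hours_cap (720)
  maxAgeHoursOverride?: number; // [outbox] max_age_hours_override
}

const MIN_DEDUPE_RETENTION_DAYS = 7;

function deriveMaxAgeHours(mode: DedupeMode, retentionDays: number | undefined, cfg: OutboxMaxAgeConfig): number {
  if (mode === "permanent") {
    // No dedupe expiry, so only the configured default and cap apply.
    return Math.min(cfg.maxAgeHoursDefault, cfg.maxAgeHoursCap);
  }
  if (retentionDays === undefined || retentionDays < MIN_DEDUPE_RETENTION_DAYS) {
    throw new Error("4012 feature_param_below_floor");
  }
  const windowHours = retentionDays * 24;
  // Assumption: the override is only honored in retention_scoped mode.
  if (cfg.maxAgeHoursOverride !== undefined) {
    if (cfg.maxAgeHoursOverride > windowHours - 24) {
      throw new Error("outbox_max_age_above_dedupe_window");
    }
    return cfg.maxAgeHoursOverride;
  }
  // 10% of the window (windowHours / 10 == days * 0.1 * 24), never under 24h.
  const safetyMarginHours = Math.max(24, Math.ceil(windowHours / 10));
  return windowHours - safetyMarginHours;
}
```

With default config, this returns 144 at 7 days, 648 at 30 days, and 7884 at 365 days.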
4.9 Inbox schema — unchanged from v3 §4.5
4.10 Crash recovery — unchanged from v3 §4.6
4.11 Failure modes — unchanged from v6 §4.12, with §4.5/§4.6 added
- IPC accept fingerprint mismatch on a duplicate id: returns 409 with the `conflict` field per §4.5.1. Caller must rotate the id.
- Outbox row stuck in `dead`: operator runs `outbox requeue --new-client-id` per §4.6.3.
- Broker fingerprint mismatch on retry: as v6 §4.5. Daemon marks the row `dead` and surfaces it in `outbox --failed`.
- Daemon retry after the dedupe row was hard-deleted by the broker retention sweep: cannot happen unless the operator overrode `max_age_hours` beyond the safety margin. In `permanent` mode it cannot happen at all.
- Atomicity violation found by the orphan check: alerts ops, and the broker team investigates. Should be zero.
5. Inbound — unchanged from v3 §5
6. Hooks — unchanged from v4 §6
7-13. — unchanged from v4
14. Lifecycle — unchanged from v5 §14
15. Version compat — minimum dedupe_retention_days raised
15.1 Feature bits with parameters (v7 update)
Only one row changes from v6 §15.1:
| Bit | params.version | Required parameters | Optional parameters |
|---|---|---|---|
| `client_message_id_dedupe` | 2 | `mode: "retention_scoped"\|"permanent"`, `dedupe_retention_days: int (>= 7)` (when `mode=retention_scoped`), `request_fingerprint: bool == true` | `tombstone_history_pruned_window_days: int` |
dedupe_retention_days minimum raised from 3 to 7 to keep daemon
outbox max-age strictly inside the broker window with margin (§4.8).
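Below is a sketch of the daemon-side startup check for the `client_message_id_dedupe` row. The wire shape of advertised parameters is defined in v6 §15.2, which is not reproduced here, so `DedupeFeatureParams` is a hypothetical stand-in. Accepting only `params.version === 2` is also an assumption.

```typescript
// Hypothetical shape of the advertised params for client_message_id_dedupe.
interface DedupeFeatureParams {
  version: number;
  mode: "retention_scoped" | "permanent";
  dedupe_retention_days?: number;
  request_fingerprint: boolean;
  tombstone_history_pruned_window_days?: number; // optional, not checked here
}

// Returns null when the daemon may start, else the refusal reason.
function checkDedupeFeature(params: DedupeFeatureParams | undefined): string | null {
  if (params === undefined) return "client_message_id_dedupe not advertised";
  // Assumption: only params.version 2 is understood by this daemon.
  if (params.version !== 2) return `unsupported params.version ${params.version}`;
  if (params.request_fingerprint !== true) return "request_fingerprint must be true";
  if (params.mode === "retention_scoped") {
    const days = params.dedupe_retention_days;
    if (days === undefined || !Number.isInteger(days)) return "dedupe_retention_days missing or not an int";
    if (days < 7) return "4012 feature_param_below_floor";
  }
  return null;
}
```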
15.2 — 15.5 unchanged from v6 §15
(feature_negotiation_request/response, IPC negotiation, compat
matrix, diagnostic close codes 4010 / 4011 / 4012.)
16. Threat model — unchanged from v4 §16
17. Migration — broker dedupe + atomicity + corrected pseudocode (v7)
Broker side, deploy order:
- `CREATE TABLE mesh.client_message_dedupe` (v6 §4.3 schema, unchanged in v7).
- `ALTER TABLE mesh.topic_message ADD COLUMN client_message_id`.
- `ALTER TABLE mesh.message_queue ADD COLUMN client_message_id`.
- Broker code refactor: every accept path runs the v7 §4.7.2 corrected pseudocode in one transaction, covering the §4.7.1 side-effect inventory (dedupe row, message row, history row, and delivery_queue rows, all in-tx).
- Broker code: existing fan-out workers consume `delivery_queue` rows committed by the accept transaction.
- Broker code: nightly retention sweep + `history_available` flip on message-row pruning (unchanged from v6 §17 steps 5 and 6).
- Broker code: extended orphan-check job (v7 §4.7.3) that alerts on atomicity violations across the full inventory.
- Broker advertises the `client_message_id_dedupe` feature with `params.version = 2`, `request_fingerprint: true`, `dedupe_retention_days >= 7` (was 3).
- Daemon refuses to start unless the above is advertised.
Daemon side:
- Outbox table gains the `aborted` status (§4.6.3). SQLite's `ALTER TABLE` cannot modify a `CHECK` constraint, so the startup migration recreates the table via `INSERT INTO new SELECT * FROM old`. v0.9.0 daemons are fresh installs by definition, so existing outboxes don't exist.
- IPC accept path implements the §4.5.1 lookup table.
- IPC error envelope adds `conflict` and `daemon_fingerprint_prefix` fields for 409 responses.
- New CLI verb `claudemesh daemon outbox requeue --id <id> --new-client-id [auto|<id>]` (§4.6.3).
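SQLite's `ALTER TABLE` can rename tables and columns and add or drop columns, but it cannot change a `CHECK` constraint. Adding `aborted` to an existing outbox therefore goes through the table-recreate path (`INSERT INTO new SELECT * FROM old`). The sketch below assumes a handle with a synchronous `exec`, as node:sqlite's `DatabaseSync` has, and the §4.5.2 column order. `migrateOutboxAddAborted` is a hypothetical name.

```typescript
interface ExecDb {
  exec(sql: string): void;
}

// §4.5.2 columns, with 'aborted' added to the status CHECK.
const OUTBOX_V2_DDL = `
CREATE TABLE outbox_new (
  id TEXT PRIMARY KEY,
  client_message_id TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,
  payload BLOB NOT NULL,
  enqueued_at INTEGER NOT NULL,
  attempts INTEGER DEFAULT 0,
  next_attempt_at INTEGER NOT NULL,
  status TEXT CHECK(status IN ('pending','inflight','done','dead','aborted')),
  last_error TEXT,
  delivered_at INTEGER,
  broker_message_id TEXT
)`;

function migrateOutboxAddAborted(db: ExecDb): void {
  db.exec("BEGIN IMMEDIATE");
  try {
    db.exec(OUTBOX_V2_DDL);
    // Column order is identical, so SELECT * maps 1:1.
    db.exec("INSERT INTO outbox_new SELECT * FROM outbox");
    db.exec("DROP TABLE outbox"); // also drops the outbox_pending index
    db.exec("ALTER TABLE outbox_new RENAME TO outbox");
    db.exec("CREATE INDEX outbox_pending ON outbox(status, next_attempt_at)");
    db.exec("COMMIT");
  } catch (err) {
    db.exec("ROLLBACK");
    throw err;
  }
}
```

SQLite DDL is transactional, so a failure at any step leaves the original table in place.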
What changed v6 → v7 (codex round-6 actionable items)
| Codex r6 item | v7 fix | Section |
|---|---|---|
| Daemon-local duplicate POST semantics undefined | Full lookup table for pending/inflight/done/dead × match/mismatch; `409 idempotency_key_reused` at IPC layer with `conflict` field | §4.5 |
| §4.6 rejected-request contradiction | Single rule: id consumed iff outbox row written; pre-outbox failures leave id untouched; broker-rejected outbox row goes to `dead` and requires `requeue --new-client-id` | §4.6 |
| §4.7 pseudocode wrong | Corrected: `INSERT ON CONFLICT DO NOTHING`, then `SELECT FOR SHARE`, then branch on returned `broker_message_id` and fingerprint | §4.7.2 |
| Max-age math equals window at min | Min `dedupe_retention_days` raised to 7; safety margin always >= 24h; derived max-age strictly < window | §4.8, §15.1 |
| Broker atomicity scope incomplete | Side-effect inventory: dedupe + message + history + delivery_queue all in-tx; WS push and webhook fan-out explicitly outside-tx; orphan check extended | §4.7.1, §4.7.3 |
| New `aborted` outbox status | Distinguishes operator-retired rows from dead rows | §4.6.3 |
What needs review (round 7)
- IPC lookup table (§4.5.1): does it cover all the realistic client races? The "inflight + match" return is `202 accepted, inflight`. Should it be `200 ok` with the broker response if the broker has already responded? Or does the daemon prefer to always respond from local state?
- Aborted vs dead vs done (§4.6.3): is the three-state terminal distinction useful, or noisy? Would `dead` + an `aborted_at` timestamp suffice?
- §4.7.2 transaction shape: `SELECT FOR SHARE` after `INSERT ON CONFLICT DO NOTHING` is two round-trips. Could it be one with `INSERT ... ON CONFLICT DO UPDATE SET ... RETURNING xmax = 0` or a similar Postgres-specific trick? Is that worth optimizing here?
- Max-age formula at higher windows: at 365 days, `safety_margin = ceil(0.1 * 365 * 24) = 876h ≈ 36.5 days`, so daemon max-age = `8760 - 876 = 7884h ≈ 328 days`. Is that the right shape, or should the safety margin be capped (e.g. `min(72, ceil(0.1 * w))`)?
- Side-effect inventory (§4.7.1): is anything missing? E.g. broker-side rate-limit counters, audit-log entries, mention-fanout-search?
- Anything else still wrong? Read it as if you were going to operate this for a year. What falls down?
Three options:
- (a) v7 is shippable: lock the spec, start coding the frozen core.
- (b) v8 needed: list the must-fix items.
- (c) the architecture itself is wrong: what would you do differently?
Be ruthless.