claudemesh/.artifacts/shipped/2026-05-03-daemon-final-spec-v3.md
Alejandro Gutiérrez a2568ad9f4
chore(release): cli 1.22.0 — daemon v0.9.0 + housekeeping
- Bump apps/cli/package.json to 1.22.0 (additive feature: claudemesh
  daemon long-lived runtime).
- CHANGELOG entry for 1.22.0 covering subcommands, idempotency wiring,
  crash recovery, and the deferred Sprint 7 broker hardening.
- Roadmap entry for v0.9.0 daemon foundation right above the v2.0.0
  daemon redesign section, so the bridge release is documented as the
  shipped step toward the larger architectural shift.
- Move shipped daemon specs (v1..v10 iteration trail + locked v0.9.0
  spec + broker-hardening followups) from .artifacts/specs/ to
  .artifacts/shipped/ per the project artifact-pipeline convention.

Not in this commit: npm publish and the cli-v1.22.0 GitHub release tag
— both are public-distribution actions and require explicit user
approval.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:24:32 +01:00


claudemesh daemon — Final Spec v3

Round 3. v2 of this spec was reviewed by another model and pushed back on identity/clone semantics (boot-id false-positives), delivery contract (broker must dedupe on client-supplied id — protocol change), CI shared-runner threat model, version negotiation (need feature bits, not ranges), key rotation crypto, hook scope granularity, inbox schema correctness, and ~7 smaller polish items. v3 incorporates all of them.

The intent §0 from v2 is unchanged and still authoritative — read it there. v3 only revises what changed.


0. Intent — unchanged, see v2 §0

Pre-launch peer-mesh runtime. Servers/laptops become first-class peers. Stable identity, persistent WS, local IPC, hooks. Not a webhook gateway, not a generic broker. We can break anything.

One claim retracted from v1/v2: "exactly-once" delivery. Replaced with a precise contract in §4 below.


1. Process model — same as v2 §1

Resource caps, file layout, single-binary unchanged.


2. Identity — accidental-clone detection only, plus broker dedupe

Codex was right: v2's clone detection was both too weak (anyone copying host_fingerprint.json along with keypair.json defeats it) and too noisy (boot-id flips every reboot → false-positives on every legitimate restart).

2.1 Modes

claudemesh daemon up                       # default: persistent member
claudemesh daemon up --ephemeral           # in-memory keypair, never written
claudemesh daemon up --ephemeral --ttl 2h  # auto-shutdown after duration

CI auto-detection (NEW): if any of the following env vars are set (CI=true, GITHUB_ACTIONS, GITLAB_CI, BUILDKITE, CIRCLECI, JENKINS_URL, RUNPOD_POD_ID, KUBERNETES_SERVICE_HOST), AND --persistent is not explicitly passed, daemon defaults to --ephemeral. Rationale in §16.
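A minimal sketch of the default-mode decision described above, not the daemon's actual code. The function names and the flag parameters are hypothetical; the sketch also assumes that `CI` must literally equal `true` while the other variables only need to be non-empty.

```python
from typing import Mapping

# Env vars listed in §2.1; any one of them marks the environment as CI-like.
CI_ENV_VARS = (
    "GITHUB_ACTIONS", "GITLAB_CI", "BUILDKITE", "CIRCLECI",
    "JENKINS_URL", "RUNPOD_POD_ID", "KUBERNETES_SERVICE_HOST",
)


def is_ci_environment(env: Mapping[str, str]) -> bool:
    """Return True if any CI marker from §2.1 is present."""
    # Assumption: CI counts only when it equals "true" (case-insensitive).
    if env.get("CI", "").lower() == "true":
        return True
    # Assumption: the remaining markers count when set to any non-empty value.
    return any(env.get(name) for name in CI_ENV_VARS)


def resolve_identity_mode(
    env: Mapping[str, str],
    persistent_flag: bool = False,
    ephemeral_flag: bool = False,
) -> str:
    """Explicit flags win; otherwise CI-like environments default to ephemeral."""
    if ephemeral_flag:
        return "ephemeral"
    if persistent_flag:
        return "persistent"
    return "ephemeral" if is_ci_environment(env) else "persistent"
```

Called as `resolve_identity_mode(os.environ)`, this would pick the mode at startup; passing the environment in as a mapping keeps the rule easy to test.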

2.2 Accidental-clone detection (NOT attacker-grade)

Frame change: this catches image clones, restored backups, copy-pasted homedirs — accidents made by humans operating at human speed. It does not defend against an attacker who copies both keypair.json and host_fingerprint.json. The threat model (§16) says this explicitly.

Persisted fingerprint = sha256(machine-id || first-stable-mac). Notably:

  • No boot-id — that flips on every reboot and would false-positive every legitimate restart.
  • No hostname — laptops legitimately rename themselves.
  • first-stable-mac = MAC of the lexicographically first non-loopback, non-virtual interface present at first daemon boot. Frozen at first run; not recomputed.
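The fingerprint rule can be sketched as follows. This is illustrative only: the `Interface` type, the UTF-8 encoding, the lowercase MAC normalisation and plain string concatenation for `||` are assumptions, and how an interface is classified as virtual is left to the caller.

```python
import hashlib
from typing import Iterable, NamedTuple, Optional


class Interface(NamedTuple):
    """Hypothetical description of one network interface."""
    name: str
    mac: str
    is_loopback: bool
    is_virtual: bool


def first_stable_mac(interfaces: Iterable[Interface]) -> Optional[str]:
    """MAC of the lexicographically first non-loopback, non-virtual interface."""
    candidates = [i for i in interfaces if not i.is_loopback and not i.is_virtual]
    if not candidates:
        return None
    # Lexicographic order is taken over interface names (assumption).
    return min(candidates, key=lambda i: i.name).mac.lower()


def host_fingerprint(machine_id: str, mac: str) -> str:
    """sha256(machine-id || first-stable-mac), hex-encoded."""
    material = machine_id.strip() + mac.lower()
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

The value would be computed once at first boot and written to host_fingerprint.json; later boots recompute only the machine-id half and compare against the frozen MAC, which is why a NIC added later does not change the result.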

Behavior on mismatch:

  • Default policy: refuse to start. Print: "This keypair was created on a different host. If you legitimately moved hardware, run claudemesh daemon accept-host (writes a fresh fingerprint, keeps keypair). If this is a clone of an existing daemon, run claudemesh daemon remint (mints fresh keypair, registers as a new member)."
  • [clone] policy = "refuse" | "warn" | "allow" overrides per host.

2.3 Concurrent-duplicate-identity broker policy (NEW — protocol change)

When the broker receives two WS connections claiming the same member pubkey:

  • prefer_newest (default): older connection is closed with code 4003 replaced_by_newer_connection. New connection takes over presence/inbox delivery. Daemon-side: receives the close code, logs forensic event, exits with non-zero status (lets supervisor restart it; if the other host is the legitimate one, supervisor restart-loops are noisy enough to alert).
  • prefer_oldest: new connection is rejected with code 4004 member_already_connected. The new daemon refuses to start.
  • allow_concurrent (new mode, server-side feature flag): both connections accepted; broker tracks both as sibling sessions of the same member (same model as claudemesh launch siblings today). Useful when a user really does want one keypair on multiple hosts (e.g. failover pairs).

Configured per-mesh in mesh.cloneConcurrencyPolicy. Default: prefer_newest. Broker emits member_concurrent_connection audit event in all cases.
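The three policies reduce to a small decision table. A sketch of how a broker might encode it; the `Decision` type and `resolve_duplicate` are hypothetical names, while the close codes and the audit event name come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    accept_new: bool                 # does the new connection stay open?
    close_old_code: Optional[int]    # close code for the existing connection
    reject_new_code: Optional[int]   # close code for the new connection
    audit_event: str = "member_concurrent_connection"  # emitted in all cases


def resolve_duplicate(policy: str) -> Decision:
    """Decide what happens when a second connection claims the same pubkey."""
    if policy == "prefer_newest":
        # Older connection closed with 4003 replaced_by_newer_connection.
        return Decision(accept_new=True, close_old_code=4003, reject_new_code=None)
    if policy == "prefer_oldest":
        # New connection rejected with 4004 member_already_connected.
        return Decision(accept_new=False, close_old_code=None, reject_new_code=4004)
    if policy == "allow_concurrent":
        # Both connections kept as sibling sessions of the same member.
        return Decision(accept_new=True, close_old_code=None, reject_new_code=None)
    raise ValueError(f"unknown cloneConcurrencyPolicy: {policy!r}")
```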

2.4 Rename, key rotation — see §14


3. IPC surface — frozen core, hardened auth

3.1 Frozen core (v0.9.0) — slight cut from v2

Codex agreed v2's cut was mostly right, except: defer FTS-search to a capability gate, keep peer list in core, drop redundancies.

# Messaging
POST   /v1/send              {to, message, priority?, meta?, replyToId?,
                              client_message_id?}
POST   /v1/topic/post        {topic, message, priority?, mentions?,
                              client_message_id?}
POST   /v1/topic/subscribe   {topic}
POST   /v1/topic/unsubscribe {topic}
GET    /v1/topic/list
GET    /v1/inbox             ?since=<iso>&topic=<n>&from=<peer>&limit=<n>
                             # plain SQL paging; NO FTS in v0.9.0

# Peers + presence (kept in core — central to "first-class peer")
GET    /v1/peers             ?mesh=<slug>
POST   /v1/profile           {summary?, status?, visible?}

# Files (already production)
POST   /v1/file/share        {path, to?, message?, persistent?}
GET    /v1/file/get          ?id=<fileId>&out=<path>
GET    /v1/file/list

# Events — push
GET    /v1/events            text/event-stream
       core events: message, peer_join, peer_leave, file_shared,
                    daemon_disconnect, daemon_reconnect, hook_executed,
                    feature_negotiation_failed

# Control plane
GET    /v1/health            (auth required by default — see §3.3)
GET    /v1/metrics           (auth required by default)
GET    /v1/version           (auth required by default)
POST   /v1/heartbeat         {}

inbox/search with FTS deferred to v0.9.x capability gate inbox_fts.

3.2 Capability-gated future surface (v0.9.x)

Same as v2 §3.2 — state, memory, vector, graph, tasks, scheduling, mcp_host, skill_share, plus new inbox_fts. None enabled in v0.9.0.

3.3 Local IPC authentication — tightened

Same shape as v2 §3.3 but with codex's polish folded in:

| Transport | Auth | Notes |
|---|---|---|
| UDS | None (FS perms 0600) | Reaching socket = same UID |
| TCP loopback | Authorization: Bearer <local_token> REQUIRED | 127.0.0.1 only |
| SSE | Authorization: Bearer <local_token> REQUIRED | same |

Token plumbing rules (NEW):

  • local_token MUST be in the Authorization header. Never accepted in query string. Endpoint that sees a ?token=... query param logs a security event and returns 400.
  • local_token MUST be redacted from access logs (Authorization: Bearer *** in logs).
  • local_token rotation atomically writes a new token file; the daemon keeps accepting the OLD token for a 60s grace period so SDKs can pick up the new one, then rejects it.
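A sketch of the three token rules under stated assumptions: `TokenStore`, `authenticate` and `redact_authorization` are hypothetical names, the 401 status for a missing or wrong token is an assumption (the text only specifies 400 for query-string tokens), and the grace window is modelled as the daemon accepting the previous token for 60 seconds after rotation.

```python
import hmac
import time
from typing import Callable, Mapping, Optional, Tuple

GRACE_SECONDS = 60.0


class TokenStore:
    """Holds the current local_token plus the previous one during the grace window."""

    def __init__(self, token: str, now: Callable[[], float] = time.monotonic) -> None:
        self._now = now
        self._current = token
        self._previous: Optional[Tuple[str, float]] = None  # (token, expires_at)

    def rotate(self, new_token: str) -> None:
        self._previous = (self._current, self._now() + GRACE_SECONDS)
        self._current = new_token

    def is_valid(self, presented: str) -> bool:
        # Constant-time comparison avoids leaking token prefixes via timing.
        if hmac.compare_digest(presented.encode(), self._current.encode()):
            return True
        if self._previous is not None:
            old, expires_at = self._previous
            if self._now() < expires_at:
                return hmac.compare_digest(presented.encode(), old.encode())
        return False


def authenticate(headers: Mapping[str, str], query: Mapping[str, str],
                 store: TokenStore) -> int:
    """Return an HTTP status: 200 ok, 400 for a query-string token, 401 otherwise."""
    if "token" in query:
        return 400  # caller should also log a security event
    scheme, _, value = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not value:
        return 401
    return 200 if store.is_valid(value) else 401


def redact_authorization(header_value: str) -> str:
    """Form of the Authorization header that is safe to write to access logs."""
    scheme, _, _ = header_value.partition(" ")
    return f"{scheme} ***"
```

Injecting the clock (`now`) is a testing convenience only; it lets the 60s window be exercised without sleeping.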

Endpoint default auth (NEW — codex):

  • Every IPC endpoint requires the local token by default, including /v1/health, /v1/metrics, /v1/version. [ipc] public_health_check = true opts in to public /v1/health for k8s probes etc.

Container default (NEW — codex):

  • If KUBERNETES_SERVICE_HOST is set OR /.dockerenv exists OR /proc/1/cgroup indicates a container OR explicit --container flag, daemon defaults to UDS-only ([ipc] tcp_enabled = false). Containers share host loopback when network_mode: host; UDS-only avoids the side-channel.

Origin/Host policy:

  • Host header must be localhost, 127.0.0.1, [::1] or empty. Else 403.
  • Origin header: explicit allowlist (default: empty). SSRF-from-browser bounce-attack defense.
  • User-Agent requirement DROPPED (codex called it theatre — correct).
  • CORS: never echo Access-Control-Allow-Origin; preflight returns 403.
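The Host/Origin rules above can be sketched as a single check. Assumptions: the function name and signature are hypothetical, a port suffix on the Host header is tolerated and stripped before comparison, and the Origin allowlist matches on exact strings.

```python
from typing import Collection, Mapping

ALLOWED_HOSTS = {"localhost", "127.0.0.1", "[::1]", ""}


def strip_port(host: str) -> str:
    """'localhost:7777' -> 'localhost', '[::1]:7777' -> '[::1]'."""
    if host.startswith("["):
        end = host.find("]")
        return host[: end + 1] if end != -1 else host
    return host.split(":", 1)[0]


def check_origin_host(
    headers: Mapping[str, str],
    origin_allowlist: Collection[str] = (),
    method: str = "GET",
) -> int:
    """Return 200 if the request passes the Host/Origin policy, else 403."""
    if method.upper() == "OPTIONS":
        return 403  # CORS preflight is always refused
    host = strip_port(headers.get("Host", "").strip().lower())
    if host not in ALLOWED_HOSTS:
        return 403
    origin = headers.get("Origin")
    # Default allowlist is empty, so any request carrying an Origin is refused.
    if origin is not None and origin not in origin_allowlist:
        return 403
    return 200
```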

3.4 Request limits & backpressure — same as v2


4. Delivery contract — at-least-once, broker-dedupes-on-client-id

Codex caught the real protocol gap: idempotency only works if the broker dedupes on the caller's id, not its own. This requires a broker change.

4.1 The contract (precise)

Local guarantee: each successful POST /v1/send returns a stable client_message_id. The send is durably persisted to outbox.db before the response returns.

Broker guarantee: the broker dedupes on client_message_id for a 24h window. Multiple inflight retries from the daemon for the same client_message_id produce at most one broker-accepted row.

End-to-end guarantee: at-least-once delivery to subscribers, with client_message_id propagated in the inbound envelope so receivers can dedupe on their side. We do not guarantee duplicate-free delivery end-to-end: that requires receiver-side dedupe, which the daemon's inbox.db provides only for daemon-hosted peers.

4.2 Daemon-supplied client_message_id (NEW — broker protocol change)

Every send has a stable id minted on the daemon, not the broker:

  • Caller-supplied via Idempotency-Key header → wins.
  • Caller-supplied in body as client_message_id field → second.
  • Else daemon mints a ulid → last.
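The precedence order above, sketched with a small stand-in ULID generator. Both function names are hypothetical, and a real daemon would more likely use an existing ULID library; the generator here only follows the published ULID layout (48-bit millisecond timestamp, 80 random bits, 26 Crockford base32 characters).

```python
import os
import time
from typing import Any, Mapping, Optional

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def mint_ulid(now_ms: Optional[int] = None) -> str:
    """48-bit ms timestamp followed by 80 random bits, Crockford base32."""
    ts = int(time.time() * 1000) if now_ms is None else now_ms
    value = (ts << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 31])
        value >>= 5
    return "".join(reversed(chars))


def resolve_client_message_id(headers: Mapping[str, str],
                              body: Mapping[str, Any]) -> str:
    """Idempotency-Key header wins, then the body field, else a fresh ulid."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "idempotency-key" and value:
            return value
    from_body = body.get("client_message_id")
    if from_body:
        return str(from_body)
    return mint_ulid()
```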

The id is:

  • Returned in the IPC response.
  • Stored in outbox.db as a UNIQUE NOT NULL column (real dedupe, not INSERT OR IGNORE on nullable — codex caught this).
  • Propagated to the broker on every retry (client_message_id field in the WS send envelope and in POST /v1/messages).
  • Stored in the broker's meshTopicMessage.client_message_id column with a UNIQUE constraint scoped to (meshId, client_message_id).
  • Propagated in the inbound delivery to receivers' inboxes.

Broker behavior on duplicate client_message_id: returns the already-stored messageId and historyId from the prior insertion. No new row, no new fan-out, idempotent.

4.3 Broker schema delta (NEW)

ALTER TABLE mesh.topic_message
  ADD COLUMN client_message_id TEXT;
ALTER TABLE mesh.message_queue
  ADD COLUMN client_message_id TEXT;

CREATE UNIQUE INDEX topic_message_client_id_idx
  ON mesh.topic_message(mesh_id, client_message_id)
  WHERE client_message_id IS NOT NULL;
CREATE UNIQUE INDEX message_queue_client_id_idx
  ON mesh.message_queue(mesh_id, client_message_id)
  WHERE client_message_id IS NOT NULL;

Partial unique index — legacy traffic without client_message_id (from claudemesh launch, dashboard chat, web posts) is unaffected.

4.4 Outbox schema (corrected)

CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,                -- ulid (local row id)
  client_message_id   TEXT NOT NULL UNIQUE,            -- propagated to broker
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  attempts            INTEGER DEFAULT 0,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT NOT NULL DEFAULT 'pending' CHECK(status IN ('pending','inflight','done','dead')),
  last_error          TEXT,
  delivered_at        INTEGER,
  broker_message_id   TEXT                              -- set on ACK
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);

UNIQUE NOT NULL on client_message_id: caller retries with the same id collide locally and become a no-op.

4.5 Inbox schema (corrected — content table + FTS index)

Codex caught: FTS5 virtual tables are not where you put CREATE INDEX. Real shape:

-- Content table — the durable store
CREATE TABLE inbox (
  id                  TEXT PRIMARY KEY,                -- ulid (local row id)
  client_message_id   TEXT NOT NULL UNIQUE,            -- dedupe key
  broker_message_id   TEXT,
  mesh                TEXT NOT NULL,
  topic               TEXT,
  sender_pubkey       TEXT NOT NULL,
  sender_name         TEXT NOT NULL,
  body                TEXT,
  meta                TEXT,                            -- JSON
  received_at         INTEGER NOT NULL,
  reply_to_id         TEXT
);
CREATE INDEX inbox_received_at ON inbox(received_at);
CREATE INDEX inbox_topic       ON inbox(topic);
CREATE INDEX inbox_sender      ON inbox(sender_pubkey);

-- FTS5 index — gated behind capability `inbox_fts` (deferred to v0.9.x)
-- When enabled, populated via triggers; absent in v0.9.0.

Insert path: INSERT INTO inbox(...) ON CONFLICT(client_message_id) DO NOTHING RETURNING id. The RETURNING clause tells us whether a new row landed; only new rows trigger hooks.
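A minimal sketch of that insert path using Python's built-in sqlite3 module. It assumes the bundled SQLite is 3.35 or newer (required for RETURNING); `insert_inbound` is a hypothetical name, and the table definition mirrors the content table above without its indexes.

```python
import sqlite3

INBOX_DDL = """
CREATE TABLE inbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  broker_message_id   TEXT,
  mesh                TEXT NOT NULL,
  topic               TEXT,
  sender_pubkey       TEXT NOT NULL,
  sender_name         TEXT NOT NULL,
  body                TEXT,
  meta                TEXT,
  received_at         INTEGER NOT NULL,
  reply_to_id         TEXT
)
"""

INSERT_SQL = """
INSERT INTO inbox (id, client_message_id, mesh, sender_pubkey, sender_name,
                   body, received_at)
VALUES (:id, :client_message_id, :mesh, :sender_pubkey, :sender_name,
        :body, :received_at)
ON CONFLICT(client_message_id) DO NOTHING
RETURNING id
"""


def insert_inbound(db: sqlite3.Connection, row: dict) -> bool:
    """Insert one inbound message; return True only if a new row landed."""
    # fetchall() drains the RETURNING result before the commit.
    rows = db.execute(INSERT_SQL, row).fetchall()
    db.commit()
    return len(rows) == 1  # an empty result means the id was already seen
```

The caller would fire hooks only when `insert_inbound` returns True, so a redelivered message is stored once and triggers hooks once.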

4.6 Crash recovery — explicit semantics

On daemon startup:

  1. Rows in inflight reset to pending with attempts++, next_attempt_at = now + min_backoff. Note: these may double-deliver if the broker actually accepted before the local ACK persisted. The client_message_id propagation ensures the broker dedupes the retry — net result: exactly one broker-accepted row, possibly two daemon-side inflight → done transitions.
  2. outbox.db PRAGMA integrity_check; failure → daemon refuses to start, point at claudemesh daemon recover.
  3. inbox.db integrity check; failure → move to inbox.db.corrupt-<ts>, create fresh empty inbox, log inbox_corruption_recovered. Inbox is a cache; recoverable from broker history.
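Step 1 and the integrity checks can be sketched with sqlite3. Assumptions: timestamps are integer milliseconds, `MIN_BACKOFF_MS` is an illustrative value, the function names are hypothetical, and the table is a trimmed copy of the outbox schema in §4.4.

```python
import sqlite3

MIN_BACKOFF_MS = 1_000  # assumption: the spec does not give a value for min_backoff

OUTBOX_DDL = """
CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  attempts            INTEGER DEFAULT 0,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT CHECK(status IN ('pending','inflight','done','dead'))
)
"""


def recover_inflight(db: sqlite3.Connection, now_ms: int) -> int:
    """Reset rows left in 'inflight' by a crash; return how many were reset."""
    cur = db.execute(
        "UPDATE outbox SET status = 'pending', attempts = attempts + 1, "
        "next_attempt_at = ? WHERE status = 'inflight'",
        (now_ms + MIN_BACKOFF_MS,),
    )
    db.commit()
    return cur.rowcount


def integrity_ok(db: sqlite3.Connection) -> bool:
    """PRAGMA integrity_check yields a single row containing 'ok' when healthy."""
    return db.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
```

On a failed check the two databases diverge as described: the outbox path refuses to start, while the inbox path renames the file and starts empty.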

4.7 Failure modes the spec is honest about

  • Broker dedupe window expired: if the daemon retries a send older than 24h (say, 25h old), the broker accepts it again as if new, with no dedupe. With the previous outbox default of max_age_hours = 168h (7d), which is longer than the broker's 24h dedupe window, this could happen. The default max_age_hours is therefore REDUCED to 23h to stay inside the broker dedupe window. It can be raised only if the operator explicitly accepts the risk.
  • dead rows: surface in claudemesh daemon outbox --failed. User manually requeues (outbox requeue <id>) or drops (outbox drop <id>).
  • Receiver-side dedupe failure: only daemon-hosted receivers dedupe. claudemesh launch and dashboard chat clients DO NOT dedupe today — fixing them is post-v0.9.0.

5. Inbound — schema corrected (see §4.5), retention as v2

30-day rolling retention (configurable). Weekly VACUUM. claudemesh daemon search deferred to inbox_fts capability.


6. Hooks — scopes tightened, exfiltration acknowledged

Codex was right: capability tokens removed the broad-token footgun, not exfiltration. An untrusted hook payload combined with network_policy=deny is not reliable across platforms. The spec is now honest about that.

6.1 Hooks contract — same shape as v2 §6, with tighter defaults

6.2 Capability scopes — narrowed for v0.9.0

Codex pushed: scopes were too coarse. v0.9.0 scopes are exactly:

| Scope | Capability | Notes |
|---|---|---|
| reply:event | Reply to the specific event that triggered this hook | Bound to event_id; daemon validates target; expires on hook exit |
| dm:send:<sender_pubkey> | Send DM only to the specific sender | Bound to one pubkey from event; not a write to anyone |
| topic:<name>:post | Post to the specific topic that fired | Bound to topic from event; can't write elsewhere |

No read scopes in v0.9.0. A hook cannot read state, inbox, peers, etc. If a hook wants to consult mesh data to compose its reply, it does so via the event payload (already redacted by the daemon) or by shelling out to a fresh claudemesh <verb> call (which uses the user's existing config and is subject to its own auth). No daemon-mediated read tokens.
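A sketch of how the three scopes could be derived from the triggering event and checked on each hook request. The `HookEvent` type and the action names (`reply`, `dm`, `topic_post`) are hypothetical; only the scope strings come from the spec.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class HookEvent:
    event_id: str
    sender_pubkey: str
    topic: Optional[str] = None  # None for direct messages


def granted_scopes(event: HookEvent) -> FrozenSet[str]:
    """Scopes minted for one hook invocation, all bound to the triggering event."""
    scopes = {"reply:event", f"dm:send:{event.sender_pubkey}"}
    if event.topic is not None:
        scopes.add(f"topic:{event.topic}:post")
    return frozenset(scopes)


def is_allowed(event: HookEvent, action: str, target: str) -> bool:
    """Check a hook's requested action against the scopes for its event."""
    scopes = granted_scopes(event)
    if action == "reply":
        return "reply:event" in scopes and target == event.event_id
    if action == "dm":
        return f"dm:send:{target}" in scopes
    if action == "topic_post":
        return f"topic:{target}:post" in scopes
    return False  # no read scopes, nothing else is grantable in v0.9.0
```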

6.3 Sandboxing — supported, not promised

Codex caught: "network_policy=deny" sounds reliable but isn't cross-platform. Spec now says explicitly:

  • network_policy = "deny" is best-effort:
    • Linux: enforced via unshare --net if available; else firewall rule via iptables -m owner if available; else daemon logs warning that policy cannot be enforced and the hook STILL runs.
    • macOS: enforced via sandbox-exec profile if available; else warning + run.
    • Windows: not enforced; warning + run.
  • Operators on hostile networks should set enabled = false for hooks they don't trust.
  • Daemon cm_daemon_hook_unenforceable_total counter exposes the count of hooks that ran with weakened sandbox.

6.4 Payload size & truncation — NEW

Stdin payloads to hooks capped at 256 KB (configurable). Larger payloads truncated with _truncated: true flag in the JSON event. Hook stdout captured up to output_size_limit (default 64 KB).

6.5 Audit log + killpg — same as v2


7. Multi-mesh — same as v2 §7


8. Auto-routing — same as v2 §8 (codex agreed it was clarified correctly)


9. Service installation — same as v2 §9

Add: when claudemesh daemon install-service runs in a CI-detected environment, it prints "Refusing to install persistent service in CI; ephemeral mode only." and exits non-zero unless --allow-ci-persistent is passed.


10. Observability — same as v2 §10

Add metric: cm_daemon_hook_unenforceable_total{hook,reason} (§6.3).


11. SDKs — same shape as v2, bound to frozen core only


12. Security model — same boundaries, plus dedupe + feature negotiation

| Boundary | Trust | Mechanism |
|---|---|---|
| App ↔ Daemon (UDS) | OS user | UDS 0600 |
| App ↔ Daemon (TCP/SSE) | OS user + bearer token | 127.0.0.1 + local_token + Origin/Host |
| Hook ↔ Daemon | Capability scope | Short-lived token bound to event; no read scopes |
| Daemon ↔ Broker | Mesh keypair + feature bits | WSS + ed25519 + crypto_box + per-topic keys + feature negotiation (§15) |
| Daemon ↔ Disk | OS user | All files 0600/0644 |
| Cloned identity | First-mac fingerprint | Accidental-clone detection only; broker concurrent-policy on §2.3 |

13. Configuration — same shape as v2 §13, plus [features]

[features]
require = ["client_message_id_dedupe", "concurrent_connection_policy"]
optional = ["mesh_skill_share", "mcp_host"]
# Daemon refuses to start if broker doesn't advertise all `require` bits.

14. Lifecycle — key rotation crypto fixed

14.1 Key rotation (CORRECTED — codex)

v2 said: "old pubkey held server-side for 24h grace (decrypts in-flight messages encrypted to old pubkey)". Wrong — only the daemon has the private key. Broker can't decrypt.

Real semantics:

  • claudemesh daemon rotate-keypair mints fresh ed25519 + x25519, registers the new pubkey with the broker as member_keypair_rotated.
  • Broker associates the new pubkey with the same member id, marks the old pubkey as rotated_out (not revoked).
  • Daemon-side: the OLD x25519 private key is retained in keypair-archive.json (mode 0600, durable) for a key_grace_period (default 7 days). During the grace window, daemon will attempt to decrypt inbound messages with the new private key first, falling back to archived keys (one or more). Messages encrypted to the old pubkey by senders who haven't yet seen the rotation event continue to decrypt cleanly.
  • After the grace period, archived keys are zeroed and the file is deleted. Messages encrypted to a stale pubkey after the grace window fail to decrypt and are logged as cm_daemon_decrypt_stale_total.
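The new-key-first, archive-fallback order can be sketched independently of the crypto library. Here `try_decrypt` is a hypothetical stand-in for the real crypto_box open call, assumed to return None on failure; `ArchivedKey` and `decrypt_inbound` are likewise illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ArchivedKey:
    private_key: bytes
    expires_at: float  # end of key_grace_period for this key


def decrypt_inbound(
    ciphertext: bytes,
    current_key: bytes,
    archive: Sequence[ArchivedKey],
    now: float,
    try_decrypt: Callable[[bytes, bytes], Optional[bytes]],
) -> Optional[bytes]:
    """Try the current key first, then any archived key still inside its grace window."""
    candidates = [current_key]
    candidates += [k.private_key for k in archive if now < k.expires_at]
    for key in candidates:
        plaintext = try_decrypt(key, ciphertext)
        if plaintext is not None:
            return plaintext
    # Caller would increment cm_daemon_decrypt_stale_total here.
    return None
```

Filtering on `expires_at` at decrypt time means an expired key is never used even if the cleanup that zeroes and deletes the archive has not run yet.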

14.2 Backup includes topic state (CORRECTED)

claudemesh daemon backup now packages:

  • keypair.json (current)
  • keypair-archive.json (any in-grace-window archived keys)
  • host_fingerprint.json
  • config.toml
  • local_token: NOT included (a fresh token is generated on restore)
  • topic_subscriptions.json (which topics this daemon subscribes to)
  • topic_keys.json (per-topic symmetric keys this member holds)
  • key_epoch.json (current epoch number per topic; relevant when the mesh rotates topic keys)
  • schema_version

Backup file: encrypted with a passphrase (Argon2id KDF + crypto_secretbox). Restore writes everything except local_token (regenerated). On first run after restore, daemon performs accept-host if fingerprint mismatches (restore is by definition a host change).

14.3 Local token rotation, compromised host revocation, image-clone, uninstall, recovery — same as v2 §14


15. Version compat — feature-bit negotiation (REPLACES v2 §15)

Codex was right: version ranges aren't enough when daemon depends on specific broker capabilities (client-supplied IDs, concurrent-connection policy, key epochs).

15.1 Feature bits

Each protocol-relevant capability gets a stable string identifier:

client_message_id_dedupe       broker dedupes on client_message_id (§4.2)
concurrent_connection_policy   broker honours mesh.cloneConcurrencyPolicy (§2.3)
member_keypair_rotated_event   broker emits the event (§14.1)
key_epoch                      per-topic key epochs supported (§14.2)
mesh_skill_share               post-v0.9, future
mcp_host                       post-v0.9, future

15.2 Negotiation handshake

On WS connect (after hello, before normal traffic):

→ daemon:  feature_negotiation_request
           { require:  ["client_message_id_dedupe",
                        "concurrent_connection_policy"],
             optional: ["mesh_skill_share","mcp_host"] }

← broker:  feature_negotiation_response
           { supported: ["client_message_id_dedupe",
                         "concurrent_connection_policy",
                         "member_keypair_rotated_event"],
             missing_required: [] }

If missing_required is non-empty, daemon closes the connection with code 4010 feature_unavailable, logs forensic event, exits with non-zero status. Supervisor sees a restart-loop → operator alerted via configured mechanisms.
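Both sides of the handshake are small set operations. A sketch, with hypothetical function names, in which the broker reports everything it supports (as in the example response above) and the daemon maps a non-empty missing_required to close code 4010.

```python
from typing import Dict, Iterable, List, Optional

CLOSE_FEATURE_UNAVAILABLE = 4010


def build_response(request: Dict[str, List[str]],
                   broker_features: Iterable[str]) -> Dict[str, List[str]]:
    """Broker side: report supported features and any required ones it lacks."""
    supported = list(broker_features)
    have = set(supported)
    missing = [f for f in request.get("require", []) if f not in have]
    return {"supported": supported, "missing_required": missing}


def daemon_close_code(response: Dict[str, List[str]]) -> Optional[int]:
    """Daemon side: return 4010 if the connection must be closed, else None."""
    return CLOSE_FEATURE_UNAVAILABLE if response["missing_required"] else None


def enabled_optional(request: Dict[str, List[str]],
                     response: Dict[str, List[str]]) -> List[str]:
    """Optional features the daemon may use on this connection."""
    have = set(response["supported"])
    return [f for f in request.get("optional", []) if f in have]
```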

15.3 IPC negotiation (CLI/SDK ↔ daemon)

GET /v1/version returns:

{
  "daemon_version": "0.9.0",
  "ipc_api": "v1",
  "ipc_features": ["send","topic","peers","files","events","health"],
  "schema_version": 7,
  "broker_features_negotiated": ["client_message_id_dedupe", ...]
}

CLI/SDK matches ipc_features against the features it requires. If a required feature is missing, the CLI verb either falls back to the cold path with a warning or fails explicitly (the verb's choice).

15.4 Compatibility matrix — published

GET /v1/compat
{
  "daemon": "0.9.0",
  "compatible_brokers": ["0.7.x","0.8.x","0.9.x"],
  "required_broker_features": ["client_message_id_dedupe",
                               "concurrent_connection_policy"],
  "compatible_clis": ["0.9.x"],
  "compatible_sdks": {
    "python": ">=0.9.0,<1.0.0",
    "go":     ">=0.9.0,<1.0.0",
    "ts":     ">=0.9.0,<1.0.0"
  }
}

16. Threat model — shared-CI reality folded in

16.1 Attacker classes — same matrix as v2 §16, plus:

| Attacker | Has | Wants | Mitigations |
|---|---|---|---|
| Shared CI runner (NEW) | Same Unix UID as other untrusted jobs | Read this user's persistent keypair across job boundaries | Auto-detect CI envs (§2.1) → ephemeral default + UDS-only + isolated $HOME. If operator overrides with --persistent, log warning persistent_keypair_in_ci_environment. |
| Malicious mesh peer (PROMOTED from out-of-scope to in-scope) | Mesh membership | Send malformed payload to crash daemon | Every inbound shape validated against schema before any processing. Daemon refuses unknown fields (defense-in-depth) and emits cm_daemon_invalid_inbound_total. Crashes from inbound payloads are bugs. |

16.2 Stated explicitly out of scope

  • Root attacker on daemon host (can read keypair directly).
  • Compromised broker (E2E content protection still holds; metadata is not protected by daemon — that's mesh-level).
  • Sophisticated attacker who copies BOTH keypair.json and host_fingerprint.json (§2.2 calls this out).
  • Receivers other than daemon-hosted peers deduping inbound traffic (post-v0.9.0).

16.3 Container & CI defaults table (NEW)

| Environment | Identity | IPC | Hooks |
|---|---|---|---|
| Bare metal / VM (default) | Persistent (clone-detected) | UDS + TCP loopback | Enabled |
| Docker container (/.dockerenv) | Persistent | UDS-only by default | Enabled |
| Kubernetes (KUBERNETES_SERVICE_HOST) | Persistent | UDS-only | Enabled |
| CI (CI=true, GITHUB_ACTIONS, etc.) | Ephemeral | UDS-only | Disabled by default ([hooks] enabled = false until opted-in) |
| RunPod (RUNPOD_POD_ID) | Ephemeral | UDS-only | Enabled |

Operator overrides any default with explicit flags; warning logged for non-default-secure choices.


17. Migration — same as v2 §17, plus broker schema add

Broker needs the schema delta in §4.3 (additive, with partial unique indexes, so safe for online migration). Rollout is coordinated with the daemon: broker first, then daemon. The daemon refuses to start against a broker that lacks the client_message_id_dedupe feature bit (§15).


What needs review (round 3)

Round 1 → identity, IPC auth, exactly-once lie, hook tokens, surface bloat, missing rotation/recovery/migration/threat-model.

Round 2 → boot-id false-positive, broker must dedupe on client id (protocol change), CI shared-runner reality, feature-bit negotiation, key rotation crypto, hook scopes, FTS schema, ~7 polish items.

This v3 attempts to address all of those. Specifically critique:

  1. Accidental-clone framing (§2.2) — does the honest framing close the issue, or does removing boot-id make the detection so weak it's not worth shipping at all? Should we drop fingerprint detection entirely and rely on broker concurrent-connection policy?
  2. Broker schema delta (§4.3) — is this the smallest correct change? Partial unique indexes feel right; anything else needed (audit table, gc job)?
  3. max_age_hours reduced to 23h — codex's logic says daemon outbox TTL must be inside broker dedupe window. Is 23h vs 24h tight enough? Should the broker advertise its dedupe window as a feature parameter so the daemon configures itself?
  4. Hook scopes (§6.2) — too tight? reply:event + dm:send:<sender> + topic:<name>:post. Does this cover real use cases for v0.9.0 hooks (auto-reply, escalate-to-oncall, file-receipt-ack)?
  5. Feature-bit negotiation (§15) — is the scheme right? Should feature-bits be string identifiers (current) or numeric bit positions in a bitmask (denser, more brittle)?
  6. CI defaults (§16.3) — is the table accurate? Anything wrong about defaulting hooks-disabled in CI?
  7. Key rotation grace-key archive (§14.1) — is 7d the right default? Is storing archived private keys on disk (mode 0600) acceptable, or should they be encrypted at rest with a passphrase?
  8. Anything still wrong? Read it as if you were going to operate this daemon for a year — what falls down?

Three options after this review:

  • (a) v3 is shippable: lock the spec, start coding the frozen core.
  • (b) v4 needed: list the must-fix items.
  • (c) the architecture itself is wrong: what would you do differently?

Be ruthless. We can break anything.