claudemesh/.artifacts/shipped/2026-05-03-daemon-final-spec-v4.md
Alejandro Gutiérrez a2568ad9f4
chore(release): cli 1.22.0 — daemon v0.9.0 + housekeeping
- Bump apps/cli/package.json to 1.22.0 (additive feature: claudemesh
  daemon long-lived runtime).
- CHANGELOG entry for 1.22.0 covering subcommands, idempotency wiring,
  crash recovery, and the deferred Sprint 7 broker hardening.
- Roadmap entry for v0.9.0 daemon foundation right above the v2.0.0
  daemon redesign section, so the bridge release is documented as the
  shipped step toward the larger architectural shift.
- Move shipped daemon specs (v1..v10 iteration trail + locked v0.9.0
  spec + broker-hardening followups) from .artifacts/specs/ to
  .artifacts/shipped/ per the project artifact-pipeline convention.

Not in this commit: npm publish and the cli-v1.22.0 GitHub release tag
— both are public-distribution actions and require explicit user
approval.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:24:32 +01:00

# `claudemesh daemon` — Final Spec v4
> **Round 4.** Codex reviewed v3 (round 3) and gave the architecture an overall
> pass, but flagged three precision gaps: (1) broker dedupe window semantics:
> permanent or windowed? The schema as drawn was permanent, but the prose said
> 24h; (2) feature-bit negotiation should carry parameters, not just booleans
> (so the daemon can derive its outbox TTL from broker policy instead of
> hardcoding 23h); (3) the key-archive record format and retention behavior
> were unspecified. Plus minor polish: document machine-id/MAC source
> precedence per OS, explicitly defer arbitrary outbound hook sends, and
> resolve the RunPod identity-vs-hooks inconsistency.
>
> **The intent §0 is unchanged from v2 — read it there.** v4 only revises
> what changed from v3.
---
## 0. Intent — unchanged, see v2 §0
Pre-launch peer-mesh runtime. Servers/laptops become first-class peers.
Stable identity, persistent WS, local IPC, hooks. Not a webhook gateway, not
a generic broker. We can break anything.

**One claim retracted from v1/v2**: "exactly-once" delivery. Replaced with a
precise contract in §4 below.

---
## 1. Process model — unchanged from v3 §1 / v2 §1
Resource caps, file layout, single-binary unchanged.

---
## 2. Identity — accidental-clone detection only, plus broker dedupe
Codex round-2 fix retained: no boot-id (it false-positives on every reboot).
Codex round-3 polish: spell out fingerprint sources per OS so we don't ship
a brittle "machine-id || first-mac" with no precedence rules.
### 2.1 Modes
```sh
claudemesh daemon up # default: persistent member
claudemesh daemon up --ephemeral # in-memory keypair, never written
claudemesh daemon up --ephemeral --ttl 2h # auto-shutdown after duration
```
**CI auto-detection**: if any of these env vars are set (`CI=true`,
`GITHUB_ACTIONS`, `GITLAB_CI`, `BUILDKITE`, `CIRCLECI`, `JENKINS_URL`,
`KUBERNETES_SERVICE_HOST`) and `--persistent` is not explicitly passed, the
daemon defaults to `--ephemeral`. Rationale in §16.3.
`RUNPOD_POD_ID` removed from auto-CI list (was inconsistent — see §16.3).
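
A minimal TypeScript sketch of the default-mode rule above. `resolveIdentityMode`, `ModeFlags`, and `Env` are hypothetical names; the variable list mirrors §2.1 as written, and rejecting `--ephemeral` together with `--persistent` is an assumption the spec does not state.

```typescript
// Sketch of the §2.1 default identity-mode rule (hypothetical names).
type IdentityMode = "persistent" | "ephemeral";

interface ModeFlags {
  ephemeral?: boolean; // --ephemeral
  persistent?: boolean; // --persistent
}

type Env = Record<string, string | undefined>;

// Markers that count when present with any non-empty value (list as in §2.1).
const CI_PRESENCE_VARS = [
  "GITHUB_ACTIONS",
  "GITLAB_CI",
  "BUILDKITE",
  "CIRCLECI",
  "JENKINS_URL",
  "KUBERNETES_SERVICE_HOST",
];

function looksLikeCi(env: Env): boolean {
  // `CI` is value-checked (`CI=true`); the others are presence checks.
  if (env["CI"] === "true") return true;
  return CI_PRESENCE_VARS.some((name) => (env[name] ?? "") !== "");
}

function resolveIdentityMode(env: Env, flags: ModeFlags): IdentityMode {
  if (flags.ephemeral && flags.persistent) {
    // Assumption: conflicting flags are rejected rather than resolved.
    throw new Error("--ephemeral and --persistent are mutually exclusive");
  }
  if (flags.ephemeral) return "ephemeral";
  if (flags.persistent) return "persistent"; // explicit override of the CI default
  return looksLikeCi(env) ? "ephemeral" : "persistent";
}
```

Keeping the decision a pure function of the environment and flags makes the RunPod change testable: `RUNPOD_POD_ID` alone now resolves to `persistent`.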
### 2.2 Accidental-clone detection (NOT attacker-grade)
This catches human accidents: **image clones, restored backups, and
copy-pasted home directories**. It does not defend against an attacker who
copies both `keypair.json` and `host_fingerprint.json`. The threat model
(§16) says this explicitly.
#### 2.2.1 Fingerprint source precedence (NEW — codex r3)
`host_fingerprint.json` stores `sha256(host_id || stable_mac)` where the
inputs are computed from the OS-specific table below, in order:
| OS | `host_id` (try in order) | `stable_mac` |
|---|---|---|
| Linux | `/etc/machine-id` → `/var/lib/dbus/machine-id` → first stable MAC | First non-loopback non-virtual interface, lex-sorted by name (`en…`/`eth…` before `wl…`); `docker0`/`veth*`/`br-*`/`lo` excluded |
| macOS | `IOPlatformUUID` (`ioreg -rd1 -c IOPlatformExpertDevice`) | First non-loopback non-virtual interface (`en0` typical) |
| Windows | `HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid` | First physical adapter (`Get-NetAdapter -Physical`), MAC sorted lex by adapter name |
| BSD | `kern.hostuuid` (`sysctl -n kern.hostuuid`) | Same MAC rule as Linux |
**Excluded interfaces** (cross-platform): loopback, point-to-point tunnels
(`tailscale*`, `wg*`, `utun*`, `ppp*`), Docker (`docker0`, `br-*`, `veth*`),
VPN (`tap*`/`tun*`), VM bridges (`vboxnet*`, `vmnet*`), and Apple
`awdl*`/`llw*` interfaces.
**Cloud-image false-positive note**: bare AMIs/Azure images regenerate
`/etc/machine-id` on first boot via cloud-init; for those, the first-boot
fingerprint is what we keep. If an operator clones a *running* VM
post-cloud-init, both `host_id` AND first-MAC will collide → the daemon
correctly flags this as an accidental clone.
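
The stable-MAC column and the excluded-interface list can be sketched as a filter-then-sort over an interface snapshot. `NetInterface` and `pickStableMac` are hypothetical. Reading the snapshot per OS (sysfs, `ioreg`, `Get-NetAdapter`) is out of scope, and matching exclusions by name prefix and skipping all-zero MACs are both assumptions.

```typescript
// Hypothetical snapshot of one network interface.
interface NetInterface {
  name: string;
  mac: string; // "aa:bb:cc:dd:ee:ff"
  loopback: boolean;
}

// Name prefixes from the cross-platform exclusion list (prefix match is an assumption).
const EXCLUDED_PREFIXES = [
  "lo", "tailscale", "wg", "utun", "ppp", // loopback + point-to-point tunnels
  "docker", "br-", "veth", // Docker
  "tap", "tun", // VPN
  "vboxnet", "vmnet", // VM bridges
  "awdl", "llw", // Apple
];

function isExcluded(iface: NetInterface): boolean {
  return iface.loopback || EXCLUDED_PREFIXES.some((p) => iface.name.startsWith(p));
}

// First remaining interface in code-unit (lexicographic) name order, so
// en*/eth* sort before wl*. Returns null when nothing qualifies.
function pickStableMac(ifaces: NetInterface[]): string | null {
  const candidates = ifaces
    .filter((i) => !isExcluded(i))
    .filter((i) => i.mac !== "" && i.mac !== "00:00:00:00:00:00")
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return candidates.length > 0 ? candidates[0].mac.toLowerCase() : null;
}
```

Plain `<` comparison is used instead of `localeCompare` so the order does not depend on the host's locale settings.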
If `host_id` cannot be read on the host's OS, the daemon logs
`fingerprint_host_id_unavailable` and falls back to MAC-only. If the MAC is
also unavailable (e.g. a container with no usable NIC), the daemon logs
`fingerprint_unavailable`, persists a random UUID as `host_id`, and the
clone-detection feature is effectively disabled for this host (broker
concurrent-connection policy still works).
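
Putting the precedence and fallback rules together, fingerprint composition might look like the sketch below. It assumes `host_id` and the stable MAC were already resolved (null when unreadable). Where the spec writes `host_id || stable_mac`, it joins them with a newline; the separator is an assumption. It also hashes `host_id` alone when only the MAC is missing, a case the spec leaves open. `computeFingerprint` and its result shape are hypothetical.

```typescript
import { createHash, randomUUID } from "node:crypto";

type FingerprintEvent = "fingerprint_host_id_unavailable" | "fingerprint_unavailable";

interface FingerprintResult {
  fingerprint: string; // hex sha256, stored in host_fingerprint.json
  cloneDetectionEnabled: boolean;
  events: FingerprintEvent[]; // caller logs these
  randomHostId?: string; // caller persists this so later boots reuse it
}

function sha256Hex(input: string): string {
  return createHash("sha256").update(input, "utf8").digest("hex");
}

function computeFingerprint(
  hostId: string | null,
  stableMac: string | null,
  previousRandomHostId: string | null = null,
): FingerprintResult {
  if (hostId !== null && stableMac !== null) {
    // Normal case: host_id || stable_mac (newline separator is an assumption).
    return { fingerprint: sha256Hex(`${hostId}\n${stableMac}`), cloneDetectionEnabled: true, events: [] };
  }
  if (hostId !== null) {
    // MAC missing but host_id present: not covered by the spec; hash host_id alone.
    return { fingerprint: sha256Hex(hostId), cloneDetectionEnabled: true, events: [] };
  }
  if (stableMac !== null) {
    // host_id unreadable: MAC-only fallback.
    return {
      fingerprint: sha256Hex(stableMac),
      cloneDetectionEnabled: true,
      events: ["fingerprint_host_id_unavailable"],
    };
  }
  // Neither input: use (and persist) a random host_id; clone detection is effectively off.
  const randomHostId = previousRandomHostId ?? randomUUID();
  return {
    fingerprint: sha256Hex(randomHostId),
    cloneDetectionEnabled: false,
    events: ["fingerprint_host_id_unavailable", "fingerprint_unavailable"],
    randomHostId,
  };
}
```

Passing the previously persisted random ID back in keeps the fingerprint stable across restarts of a NIC-less container, so it does not flag itself as a clone on every boot.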
Behavior on mismatch (unchanged from v3): refuse / `accept-host` / `remint`.
`[clone] policy = "refuse" | "warn" | "allow"` overrides per host.
### 2.3 Concurrent-duplicate-identity broker policy — unchanged from v3 §2.3
`prefer_newest` (default), `prefer_oldest`, `allow_concurrent`. Configured
per-mesh in `mesh.cloneConcurrencyPolicy`.
### 2.4 Rename, key rotation — see §14
---
## 3. IPC surface — unchanged from v3 §3
Same frozen core, same auth model (UDS 0600 / TCP+SSE bearer / no token in
query / all endpoints auth by default / UDS-only in containers / Origin/Host
checks / no User-Agent theatre).

---
## 4. Delivery contract — at-least-once, **permanent** broker dedupe
Codex round 3 caught: v3's prose said "24h dedupe window" but the schema
(partial unique indexes with no `created_at`) gave **permanent** dedupe. We
have to pick. v4 chooses **permanent dedupe** because:
- It's the simplest correct choice. No GC job, no edge case where a
long-asleep daemon's retry slips past the window and double-sends.
- The dedupe storage cost is bounded: even pessimistically at 1 KB per row ×
100k messages/day × 365 days ≈ 36.5 GB/year of broker storage, it is well
within the broker's existing message-retention budget. Older message rows
can still be GC'd by the existing message-retention policy (currently 365d);
the `client_message_id` column on a retained row only has to live as long
as that row does, so retention also caps the total.
- It eliminates the daemon-side `max_age_hours = 23h` hack. Daemon outbox
TTL becomes "however long you want to keep retrying"; default 7d.
- It removes a class of "where exactly is the dedupe window edge?" bugs.
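
The storage estimate above, worked in decimal units (1 KB = 10³ B). Because rows are purged after the 365d retention period, the same figure is also roughly the steady-state ceiling at that traffic level:

$$
10^{3}\,\frac{\text{B}}{\text{row}} \times 10^{5}\,\frac{\text{rows}}{\text{day}} \times 365\,\text{days} = 3.65 \times 10^{10}\,\text{B} \approx 36.5\,\text{GB}
$$
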
If broker storage growth becomes a real concern post-v0.9.0, we can convert
to a windowed scheme via a feature-bit upgrade (§15), but we would then own
the migration semantics and have to get them right.
### 4.1 The contract (precise)
> **Local guarantee**: each successful `POST /v1/send` returns a stable
> `client_message_id`. The send is durably persisted to `outbox.db` before
> the response returns.
>
> **Broker guarantee**: the broker dedupes on `client_message_id`
> **permanently within the lifetime of the row**. Multiple inflight retries
> from the daemon for the same `client_message_id` produce **at most one**
> broker-accepted row, regardless of time elapsed (subject to message-row
> retention policy on the broker). This is advertised via the
> `client_message_id_dedupe` feature-bit with `{ mode: "permanent" }`
> parameter (§15).
>
> **End-to-end guarantee**: at-least-once delivery to subscribers, with
> `client_message_id` propagated in the inbound envelope so receivers can
> dedupe locally. We do **not** guarantee at-most-once end-to-end —
> receiver-side dedupe is the receiver's job. The daemon's `inbox.db`
> provides it for daemon-hosted peers.
### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2
Sources: `Idempotency-Key` header → body `client_message_id` → daemon-minted
ulid. Stored in outbox UNIQUE NOT NULL, propagated to broker, propagated to
receivers.
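
The source precedence above, as a minimal sketch. The request shape, the lower-cased header map, whitespace trimming, and the injected `mintUlid` function are all assumptions for illustration. The spec fixes only the order: header, then body, then minted ULID.

```typescript
type IdSource = "idempotency-key" | "body" | "minted";

interface SendRequest {
  headers: Record<string, string | undefined>; // assumed lower-cased header names
  body: { client_message_id?: string };
}

// Precedence from §4.2: Idempotency-Key header, then the body field, then a
// daemon-minted ULID. `mintUlid` is a hypothetical injected generator.
function resolveClientMessageId(
  req: SendRequest,
  mintUlid: () => string,
): { id: string; source: IdSource } {
  const fromHeader = req.headers["idempotency-key"]?.trim();
  if (fromHeader) return { id: fromHeader, source: "idempotency-key" };
  const fromBody = req.body.client_message_id?.trim();
  if (fromBody) return { id: fromBody, source: "body" };
  return { id: mintUlid(), source: "minted" };
}
```

Returning the source alongside the ID lets the daemon log which path supplied it, which helps when debugging duplicate sends from SDKs that set both.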
### 4.3 Broker schema delta — clarified as permanent dedupe
```sql
ALTER TABLE mesh.topic_message
  ADD COLUMN client_message_id TEXT;
ALTER TABLE mesh.message_queue
  ADD COLUMN client_message_id TEXT;
-- CONCURRENTLY builds the index without blocking writes, but it cannot run
-- inside a transaction block: run these outside any BEGIN/COMMIT.
CREATE UNIQUE INDEX CONCURRENTLY topic_message_client_id_idx
  ON mesh.topic_message (mesh_id, client_message_id)
  WHERE client_message_id IS NOT NULL;
CREATE UNIQUE INDEX CONCURRENTLY message_queue_client_id_idx
  ON mesh.message_queue (mesh_id, client_message_id)
  WHERE client_message_id IS NOT NULL;
-- No created_at column needed for dedupe; the existing message row's
-- created_at handles row-level retention. Dedupe is permanent for the row's
-- lifetime, then naturally GC'd when the row is purged.
```
Partial unique indexes — legacy traffic without `client_message_id` (from
`claudemesh launch`, dashboard chat, web posts) is unaffected.
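
To make the partial-index semantics concrete, here is an in-memory model (not broker code) of what the two indexes enforce. A repeated `(mesh_id, client_message_id)` maps back to the original row, `NULL` IDs never conflict, and dedupe ends when retention purges the row, which matches the comment in the schema above. `DedupedMessageTable` and its methods are hypothetical.

```typescript
interface StoredMessage {
  messageId: string;
  meshId: string;
  clientMessageId: string | null; // NULL for legacy traffic
}

type AcceptResult = { status: "accepted" | "duplicate"; messageId: string };

// Models UNIQUE (mesh_id, client_message_id) WHERE client_message_id IS NOT NULL.
class DedupedMessageTable {
  private rows = new Map<string, StoredMessage>();
  private clientIndex = new Map<string, string>(); // index key -> messageId
  private seq = 0;

  private static key(meshId: string, clientMessageId: string): string {
    return `${meshId}\u0000${clientMessageId}`;
  }

  insert(meshId: string, clientMessageId: string | null): AcceptResult {
    if (clientMessageId !== null) {
      const existing = this.clientIndex.get(DedupedMessageTable.key(meshId, clientMessageId));
      // A retry maps to the original row (a 23505 conflict in Postgres terms).
      if (existing !== undefined) return { status: "duplicate", messageId: existing };
    }
    const messageId = `msg_${++this.seq}`;
    this.rows.set(messageId, { messageId, meshId, clientMessageId });
    if (clientMessageId !== null) {
      this.clientIndex.set(DedupedMessageTable.key(meshId, clientMessageId), messageId);
    }
    return { status: "accepted", messageId };
  }

  // Retention purge: the index entry disappears together with its row.
  purge(messageId: string): void {
    const row = this.rows.get(messageId);
    if (row === undefined) return;
    this.rows.delete(messageId);
    if (row.clientMessageId !== null) {
      this.clientIndex.delete(DedupedMessageTable.key(row.meshId, row.clientMessageId));
    }
  }
}
```

The last behavior is the one to keep in mind for §4.7: once a row is hard-deleted, the index alone no longer remembers its `client_message_id`.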
**Migration**: additive-only. On Postgres, adding a nullable column with no
default is a metadata-only change: it briefly takes an `ACCESS EXCLUSIVE`
lock on the table but does not rewrite it. The unique indexes are built with
`CREATE UNIQUE INDEX CONCURRENTLY`, which does not block writes. Deploy
order: schema migration → broker code that reads/writes `client_message_id`
→ daemon code that sends it → daemon enforces the feature bit.
### 4.4 Outbox schema — unchanged from v3 §4.4
`UNIQUE NOT NULL` on `client_message_id`. Default `max_age_hours` raised
back to **168h (7d)** because broker dedupe is permanent — no need to stay
inside a 24h window.
### 4.5 Inbox schema — unchanged from v3 §4.5
Content table + indexes; FTS5 deferred.
### 4.6 Crash recovery — unchanged from v3 §4.6
### 4.7 Failure modes — windowed-broker case removed
The "broker dedupe window expired" failure mode in v3 §4.7 is **deleted**
because dedupe is permanent. Remaining cases:
- **`dead` rows**: surface in `claudemesh daemon outbox --failed`. The user
manually requeues (`outbox requeue <id>`) or drops (`outbox drop <id>`) them.
- **Receiver-side dedupe**: only daemon-hosted receivers dedupe.
`claudemesh launch` and dashboard chat don't dedupe today; adding it is
deferred to post-v0.9.0.
- **Broker row already GC'd, daemon retries**: daemon retry hits the
partial unique index → 23505 conflict. Broker treats as already-accepted,
returns the original `messageId` from a soft-delete tombstone OR (if the
row was hard-deleted by retention) returns `client_id_unknown`. Daemon
treats `client_id_unknown` as "delivered, history may have been pruned"
and marks `done`. Tombstone strategy is a broker implementation choice
(advertised via `client_message_id_dedupe.tombstone_retention_days` in
§15.1).
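
A sketch of how the daemon might map broker replies onto outbox state under the failure modes above. The reply union, the `createdAtMs` field, and using the outbox `max_age_hours` to decide when a row goes `dead` are assumptions. Only the `client_id_unknown` to `done` rule is stated by the spec.

```typescript
type OutboxStatus = "pending" | "inflight" | "done" | "dead";

// Hypothetical broker reply shapes for one send attempt.
type BrokerReply =
  | { kind: "accepted"; messageId: string }
  | { kind: "already_accepted"; messageId: string } // conflict resolved to the original row/tombstone
  | { kind: "client_id_unknown" } // row hard-deleted by retention
  | { kind: "transient_error" };

interface OutboxRow {
  clientMessageId: string;
  status: OutboxStatus;
  createdAtMs: number;
  attempts: number;
  brokerMessageId?: string;
  note?: string;
}

const HOUR_MS = 3_600_000;

function applyBrokerReply(row: OutboxRow, reply: BrokerReply, nowMs: number, maxAgeHours: number): OutboxRow {
  switch (reply.kind) {
    case "accepted":
    case "already_accepted":
      return { ...row, status: "done", brokerMessageId: reply.messageId };
    case "client_id_unknown":
      // Per §4.7: treat as delivered; history may have been pruned.
      return { ...row, status: "done", note: "delivered; history may have been pruned" };
    case "transient_error": {
      const attempts = row.attempts + 1;
      const expired = nowMs - row.createdAtMs > maxAgeHours * HOUR_MS;
      return { ...row, attempts, status: expired ? "dead" : "pending" };
    }
    default: {
      const unhandled: never = reply;
      throw new Error(`unhandled broker reply: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Keeping the transition pure, with the clock passed in, lets crash recovery replay it against rows reloaded from `outbox.db`.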

---
## 5. Inbound — unchanged from v3 §5
---
## 6. Hooks — scopes tightened (codex r2), explicit deferment of arbitrary sends (codex r3)
### 6.1 Hooks contract — unchanged from v2 §6 / v3 §6.1
### 6.2 Capability scopes — narrowed for v0.9.0
| Scope | Capability | Notes |
|---|---|---|
| `reply:event` | Reply to the specific event that triggered this hook | Bound to `event_id`; daemon validates target; expires on hook exit |
| `dm:send:<sender_pubkey>` | Send a DM only to the specific sender | Bound to the one pubkey from the event; cannot DM anyone else |
| `topic:<name>:post` | Post to the specific topic that fired | Bound to topic from event; can't write elsewhere |
**No read scopes in v0.9.0.** Hooks read via the event payload (which the
daemon redacts appropriately), not via daemon-mediated reads.
**Explicitly deferred to post-v0.9.0** (codex r3 — say it out loud so use
cases don't pile up against an undocumented limit):
- **Arbitrary outbound `dm:send` to anyone other than the event sender** —
no scope grant for this. "Escalate to oncall" hooks must shell out to
`claudemesh send <oncall>` with the user's normal config; the daemon
doesn't issue capability tokens for arbitrary recipients.
- **Cross-topic post** — a hook firing on `topic:alerts` cannot post to
`topic:incidents`. Same reason.
- **Cross-mesh post** — hooks see one mesh at a time.
- **Reading state/inbox/peers** — covered above.
If a real use case demands cross-topic or arbitrary-recipient hooks
post-v0.9.0, we add scopes like `dm:send:*` (wildcard) or
`topic:*:post` (wildcard) and gate them behind explicit operator opt-in in
config (`[hooks.<name>] dangerous_wildcards = true`). Not in v0.9.0.
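
One way to enforce the scope table and the deferrals above is a single check per hook action. Every grant is bound to one mesh and expires with the hook, and each scope matches exactly one target, so cross-topic, cross-mesh, and arbitrary-recipient sends fall through to `false`. `HookGrant`, `HookAction`, and `isAllowed` are hypothetical names, not the daemon's API.

```typescript
type Scope =
  | { kind: "reply:event"; eventId: string }
  | { kind: "dm:send"; senderPubkey: string }
  | { kind: "topic:post"; topic: string };

type HookAction = { meshId: string } & (
  | { kind: "reply"; eventId: string }
  | { kind: "dm"; recipientPubkey: string }
  | { kind: "post"; topic: string }
);

interface HookGrant {
  meshId: string; // the one mesh the triggering event came from
  scopes: Scope[]; // derived from the triggering event
  hookExited: boolean; // grants expire when the hook process exits
}

function scopeAllows(scope: Scope, action: HookAction): boolean {
  switch (scope.kind) {
    case "reply:event":
      return action.kind === "reply" && action.eventId === scope.eventId;
    case "dm:send":
      return action.kind === "dm" && action.recipientPubkey === scope.senderPubkey;
    case "topic:post":
      return action.kind === "post" && action.topic === scope.topic;
    default:
      return false;
  }
}

function isAllowed(grant: HookGrant, action: HookAction): boolean {
  if (grant.hookExited) return false;
  if (action.meshId !== grant.meshId) return false; // no cross-mesh sends
  return grant.scopes.some((scope) => scopeAllows(scope, action));
}
```

A future `dm:send:*` or `topic:*:post` wildcard would be a new `Scope` variant checked against the operator opt-in, leaving the existing cases untouched.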
### 6.3 Sandboxing — unchanged from v3 §6.3
Best-effort `network_policy = "deny"`; cross-platform unenforceability
acknowledged; counter `cm_daemon_hook_unenforceable_total` exposed.
### 6.4 Payload size & truncation — unchanged from v3 §6.4
### 6.5 Audit log + killpg — unchanged
---
## 7. Multi-mesh — unchanged
## 8. Auto-routing — unchanged
## 9. Service installation — unchanged
## 10. Observability — unchanged
## 11. SDKs — unchanged
## 12. Security model — unchanged
---
## 13. Configuration — unchanged shape, plus parameterized features
```toml
[features]
require = [
"client_message_id_dedupe", # broker provides §4.1 contract
"concurrent_connection_policy", # broker honours mesh.cloneConcurrencyPolicy
]
optional = ["mesh_skill_share", "mcp_host"]
# Daemon refuses to start if broker doesn't advertise all `require` bits.
# Broker advertises feature parameters in the negotiation response (§15.1).
# The daemon picks up `client_message_id_dedupe.mode` and
# `tombstone_retention_days` from there and writes them to its runtime view,
# not to config.
```
---
## 14. Lifecycle — key rotation crypto fixed (codex r2), archive format spec'd (codex r3)
### 14.1 Key rotation — crypto correct (codex r2)
`claudemesh daemon rotate-keypair`:
- Mints fresh ed25519 + x25519 keypairs.
- Registers new pubkeys with the broker as `member_keypair_rotated` event.
- Broker associates the new pubkey with the same member id, marks the old
pubkey as `rotated_out` (not revoked); senders who haven't received the
rotation event continue to encrypt to the old pubkey for a grace window.
- Daemon retains the old x25519 **private** key (only x25519 — ed25519 is
for signing, doesn't need a grace window) in `keypair-archive.json`.
- During grace, decrypt path: try current private key first; on
`crypto_box_open_easy` failure, walk archived keys in order. Successful
archived-key decrypts increment `cm_daemon_decrypt_archived_total`.
- After grace expiry, archived keys are zeroed and the file is rewritten
without them. Messages still encrypted to a fully-expired pubkey fail to
decrypt and increment `cm_daemon_decrypt_stale_total`.
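
The grace-period decrypt path, sketched with the box-open primitive injected so the control flow is visible. `open` stands in for libsodium's `crypto_box_open_easy`, which also needs the nonce and the sender's public key (elided here). Archived keys are walked in stored order. Counting every total failure as stale is a simplification; a real implementation may be able to tell a stale key apart from a corrupt message. All names are hypothetical.

```typescript
// Stand-in for crypto_box_open_easy: plaintext on success, null on failure.
type OpenFn = (ciphertext: Uint8Array, privateKey: Uint8Array) => Uint8Array | null;

interface ArchivedKey {
  keyId: string;
  x25519Privkey: Uint8Array;
  expiresAtMs: number;
}

interface DecryptCounters {
  archived: number; // cm_daemon_decrypt_archived_total
  stale: number; // cm_daemon_decrypt_stale_total
}

function decryptInbound(
  ciphertext: Uint8Array,
  currentPrivkey: Uint8Array,
  archive: ArchivedKey[],
  nowMs: number,
  open: OpenFn,
  counters: DecryptCounters,
): Uint8Array | null {
  const plain = open(ciphertext, currentPrivkey); // current key first
  if (plain !== null) return plain;
  for (const key of archive) {
    if (key.expiresAtMs < nowMs) continue; // past grace: treated as already removed
    const fromArchive = open(ciphertext, key.x25519Privkey);
    if (fromArchive !== null) {
      counters.archived += 1;
      return fromArchive;
    }
  }
  counters.stale += 1; // simplification: every total failure counts as stale
  return null;
}
```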
#### 14.1.1 Archive record format (NEW — codex r3)
`keypair-archive.json` (mode 0600, atomic-rename writes):
```jsonc
{
"schema_version": 1,
"max_archived_keys": 8,
"keys": [
{
"pubkey": "ed25519-base64...",
"x25519_pubkey": "base64...",
"x25519_privkey": "base64...", // sensitive; whole file is 0600
"key_id": "k_01HQX...", // ulid; matches broker's record
"created_at": "2026-04-12T11:00:00Z",
"rotated_out_at": "2026-05-03T16:00:00Z",
"expires_at": "2026-05-10T16:00:00Z" // rotated_out_at + grace
}
]
}
```
Rules:
- **`max_archived_keys`** (default 8): cap on archive size. If a rotation
would push the archive past the cap, the oldest entry is force-expired
(zeroed + removed) regardless of `expires_at`. Force-expiry increments
`cm_daemon_archive_force_expired_total{key_id}`. An operator who rotates
more than `max_archived_keys` keys within one grace period is knowingly
accepting decryption gaps for very late inbound messages encrypted to the
force-expired keys.
- **Grace period default**: 7 days. Configurable via
`[crypto] key_grace_period_days = 7`. Hard cap 30 days (codex review:
unbounded grace = unbounded archive on disk = bigger blast radius if
daemon host is compromised mid-life).
- **Cleanup**: scheduled daily at midnight local time + on-demand via
`claudemesh daemon archive-cleanup`. Walks `keys[]`, drops anything with
`expires_at < now`. If file is empty after cleanup, file is deleted.
- **Archive write failure**: rotation is aborted. Daemon refuses to commit
the new keypair if the archive can't be written durably. Logged as
`key_rotation_aborted_archive_write_failed`. New keypair is in memory
only; restart returns to old keypair. This is intentional: the archive
write is the durability point of rotation.
- **At-rest encryption**: archive file is mode 0600 plaintext, same threat
model as `keypair.json` (root-on-host can read both anyway). Operators
who want disk-level encryption can put `~/.claudemesh/` on an encrypted
volume; we don't reinvent that. Documented in the threat model (§16).
Future option `--archive-passphrase` deferred — adds passphrase prompt to
rotation/decrypt path, but breaks unattended daemon restart.
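
The cap and cleanup rules as two pure functions over the record format above, using only the fields they need. Treating "oldest" as the earliest `rotated_out_at` is an assumption, as is leaving key zeroing, the force-expiry counter, and the atomic-rename write to the caller. The function names are hypothetical.

```typescript
// Subset of the §14.1.1 record needed for the cap and cleanup rules.
interface ArchiveEntry {
  key_id: string;
  rotated_out_at: string; // ISO-8601
  expires_at: string; // rotated_out_at + grace
}

interface KeyArchive {
  schema_version: number;
  max_archived_keys: number;
  keys: ArchiveEntry[];
}

const DAY_MS = 86_400_000;
const MAX_GRACE_DAYS = 30; // hard cap on [crypto] key_grace_period_days

function archiveRotatedKey(
  archive: KeyArchive,
  entry: { key_id: string; rotated_out_at: string },
  graceDays: number,
): { archive: KeyArchive; forceExpired: string[] } {
  const grace = Math.min(graceDays, MAX_GRACE_DAYS);
  const expiresMs = Date.parse(entry.rotated_out_at) + grace * DAY_MS;
  const keys = [...archive.keys, { ...entry, expires_at: new Date(expiresMs).toISOString() }];
  // Oldest first (assumption: "oldest" = earliest rotated_out_at).
  keys.sort((a, b) => Date.parse(a.rotated_out_at) - Date.parse(b.rotated_out_at));
  const forceExpired: string[] = [];
  while (keys.length > archive.max_archived_keys) {
    const evicted = keys.shift();
    if (evicted) forceExpired.push(evicted.key_id); // caller zeroes key material + bumps counter
  }
  return { archive: { ...archive, keys }, forceExpired };
}

// Daily / on-demand cleanup: drop entries with expires_at < now.
// Returns null when nothing is left, meaning the file should be deleted.
function cleanupArchive(archive: KeyArchive, nowMs: number): KeyArchive | null {
  const keys = archive.keys.filter((k) => Date.parse(k.expires_at) >= nowMs);
  return keys.length === 0 ? null : { ...archive, keys };
}
```

Because both functions return a new value instead of mutating, the caller can write the result durably first and only then swap it into memory, which fits the "archive write is the durability point" rule.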
### 14.2 Backup includes topic state — unchanged from v3 §14.2
`keypair.json`, `keypair-archive.json` (with all archived keys),
`host_fingerprint.json`, `config.toml`, `topic_subscriptions.json`,
`topic_keys.json`, `key_epoch.json`, `schema_version`.
`local_token` NOT included; regenerated on restore.
### 14.3 Local token rotation, compromised host revocation, image-clone, uninstall, recovery — unchanged from v2 §14.3
---
## 15. Version compat — feature-bit negotiation with **parameters** (codex r3)
v3's feature bits were boolean. Codex r3 pointed out that the dedupe window,
max payload, and key epochs all need parameters. v4 makes feature bits
string-keyed entries that optionally carry a value.
### 15.1 Feature bits with parameters
| Bit | Type | Parameters | Notes |
|---|---|---|---|
| `client_message_id_dedupe` | object | `{ mode: "permanent"\|"windowed", window_hours?: int, tombstone_retention_days: int }` | Daemon reads `mode` to decide whether to enforce its own outbox max-age cap. `tombstone_retention_days` (broker-controlled) tells daemon how long it can expect "already-accepted" replies after the source row is GC'd |
| `concurrent_connection_policy` | bool | — | Broker honours `mesh.cloneConcurrencyPolicy` |
| `member_keypair_rotated_event` | bool | — | Broker emits the event |
| `key_epoch` | object | `{ max_concurrent_epochs: int }` | Per-topic key epochs supported |
| `max_payload` | object | `{ inline_bytes: int, blob_bytes: int }` | Hard limits broker enforces |
| `mesh_skill_share` | bool | — | Future |
| `mcp_host` | bool | — | Future |
### 15.2 Negotiation handshake (parameterized)
On WS connect, after hello, before normal traffic:
```
→ daemon: feature_negotiation_request
{
require: ["client_message_id_dedupe",
"concurrent_connection_policy"],
optional: ["mesh_skill_share","mcp_host","max_payload"]
}
← broker: feature_negotiation_response
{
supported: {
"client_message_id_dedupe": {
"mode": "permanent",
"tombstone_retention_days": 30
},
"concurrent_connection_policy": true,
"member_keypair_rotated_event": true,
"max_payload": {
"inline_bytes": 65536,
"blob_bytes": 524288000
}
},
missing_required: []
}
```
If `missing_required` is non-empty, daemon closes the connection with code
4010 `feature_unavailable`, logs forensic event, exits non-zero. Supervisor
sees a restart-loop → operator alert.
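
The handshake reduces to a set difference on the broker and a go/no-go check in the daemon. This sketch assumes the broker advertises every feature it supports, as the example response does (it includes `member_keypair_rotated_event` even though the daemon did not request it), and treats a `false` value as unsupported. `negotiate` and `daemonDecision` are hypothetical.

```typescript
type FeatureValue = boolean | Record<string, unknown>;
type FeatureMap = Record<string, FeatureValue>;

interface NegotiationRequest {
  require: string[];
  optional: string[];
}

interface NegotiationResponse {
  supported: FeatureMap;
  missing_required: string[];
}

const FEATURE_UNAVAILABLE = 4010; // WS close code from §15.2

function supports(features: FeatureMap, bit: string): boolean {
  return Object.prototype.hasOwnProperty.call(features, bit) && features[bit] !== false;
}

// Broker side: advertise everything and report the required bits it lacks.
function negotiate(brokerFeatures: FeatureMap, req: NegotiationRequest): NegotiationResponse {
  return {
    supported: { ...brokerFeatures },
    missing_required: req.require.filter((bit) => !supports(brokerFeatures, bit)),
  };
}

type Decision = { ok: true } | { ok: false; closeCode: number; reason: string };

// Daemon side: any missing required bit means close with 4010 and exit non-zero.
function daemonDecision(res: NegotiationResponse): Decision {
  if (res.missing_required.length > 0) {
    return {
      ok: false,
      closeCode: FEATURE_UNAVAILABLE,
      reason: `feature_unavailable: ${res.missing_required.join(", ")}`,
    };
  }
  return { ok: true };
}
```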
If `client_message_id_dedupe.mode == "windowed"`, daemon reads
`window_hours` and configures its outbox `max_age_hours` to
`window_hours - 1` (margin) instead of the 168h default. Permanent mode →
daemon uses the config default, no override.
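
Deriving the effective outbox TTL from the negotiated parameters is a small pure function. This is a sketch; rejecting a `window_hours` below 2, which would leave no positive retry margin, is an assumption.

```typescript
type DedupeParams =
  | { mode: "permanent"; tombstone_retention_days: number }
  | { mode: "windowed"; window_hours: number; tombstone_retention_days: number };

const DEFAULT_OUTBOX_MAX_AGE_HOURS = 168; // 7d config default (§4.4)

function effectiveOutboxMaxAgeHours(
  dedupe: DedupeParams,
  configDefaultHours: number = DEFAULT_OUTBOX_MAX_AGE_HOURS,
): number {
  if (dedupe.mode === "permanent") return configDefaultHours; // no override
  const withMargin = dedupe.window_hours - 1; // stay one hour inside the window
  if (withMargin <= 0) {
    throw new Error(`window_hours=${dedupe.window_hours} leaves no retry margin`);
  }
  return withMargin;
}
```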
### 15.3 IPC negotiation — unchanged from v3 §15.3
`GET /v1/version` returns daemon version, IPC features, schema version, and
the **parsed** broker feature parameters (so SDKs querying the daemon can
display them).
### 15.4 Compatibility matrix — unchanged from v3 §15.4
Published at `GET /v1/compat`.

---
## 16. Threat model — unchanged from v3 §16, plus RunPod fix
### 16.1 Attacker classes — unchanged
### 16.2 Out of scope — unchanged
### 16.3 Container & CI defaults table (RunPod inconsistency fixed)
| Environment | Identity | IPC | Hooks | Rationale |
|---|---|---|---|---|
| Bare metal / VM (default) | Persistent (clone-detected) | UDS + TCP loopback | Enabled | Trusted operator-owned host |
| Docker container (`/.dockerenv`) | Persistent | UDS-only by default | Enabled | Single-tenant container, host loopback shared |
| Kubernetes (`KUBERNETES_SERVICE_HOST`) | Persistent | UDS-only | Enabled | Single pod = single tenant |
| CI (`CI=true`, `GITHUB_ACTIONS`, etc.) | Ephemeral | UDS-only | Disabled by default (`[hooks] enabled = false`) | Multi-tenant runner; arbitrary code; ephemeral identity = no cross-job leak; hooks disabled because CI workloads are arbitrary user code |
| RunPod (`RUNPOD_POD_ID`) | Persistent | UDS-only | Enabled | Long-lived single-tenant sandbox; user owns the pod for its lifetime; identical trust model to a Docker container, NOT to a CI runner |
**RunPod resolution (codex r3)**: v3 listed RunPod under both "ephemeral
identity" and "hooks enabled" which was contradictory. v4 treats RunPod as
a **single-tenant container** (Docker-like): persistent identity, UDS-only,
hooks enabled. RunPod is removed from the CI auto-detect list (§2.1).
Operators who run RunPod as a multi-tenant, CI-style sandbox can opt in
with `--ephemeral` + `[hooks] enabled = false` explicitly.

Operators can override any default with explicit flags; a warning is logged
for choices that are less secure than the default.

---
## 17. Migration — unchanged from v3 §17
Broker schema delta (additive partial unique indexes, safe online),
deployed before daemon. Daemon refuses to start if `client_message_id_dedupe`
feature bit is missing from the broker's negotiation response.

---
## What changed v3 → v4 (codex round-3 actionable items)
| Codex r3 item | v4 fix | Section |
|---|---|---|
| Broker dedupe window: permanent vs windowed? | **Picked permanent**; schema clarified; outbox `max_age_hours` raised back to 168h | §4 |
| Feature bits should be parameterized | All feature bits are string-keyed with optional value object | §15.1, §15.2 |
| Key archive record format unspecified | Full schema with `key_id`, timestamps, `max_archived_keys`, force-expiry rule, write-failure semantics | §14.1.1 |
| Document fingerprint source precedence per OS | Per-OS table for `host_id` and stable MAC; cloud-image false-positive note | §2.2.1 |
| Explicit deferment of arbitrary outbound hook sends | Listed deferred capabilities + escape hatch path post-v0.9.0 | §6.2 |
| RunPod ephemeral-but-hooks-enabled inconsistency | RunPod treated as single-tenant container; removed from CI auto-detect | §2.1, §16.3 |
---
## What needs review (round 4)
Round 1 → identity, IPC auth, exactly-once lie, hook tokens, surface bloat,
missing rotation/recovery/migration/threat-model.
Round 2 → boot-id false-positive, broker must dedupe on client id, CI
shared-runner reality, feature-bit negotiation, key rotation crypto, hook
scopes, FTS schema, ~7 polish items.
Round 3 → dedupe window semantics, feature-bit parameters, key archive
record format, fingerprint source precedence, deferred hook scopes, RunPod
inconsistency.
This v4 attempts to address all of round 3. Specifically:
1. **Permanent dedupe choice (§4)** — does the storage-cost calculus hold?
Is the tombstone path (`client_id_unknown` after row GC) actually
workable, or does it need to be a real tombstone table?
2. **Feature parameter shape (§15.1)** — is the type system right (object
with optional value)? Should it be a flat key-value list instead?
Versioning of parameters within a feature?
3. **Archive record format (§14.1.1)** — anything missing? Is
`max_archived_keys=8` a sensible default, or should it be unbounded with
a force-expiry on storage size instead of count?
4. **Fingerprint per-OS table (§2.2.1)** — accurate? Is BSD worth listing
if we're not actively building for FreeBSD in v0.9.0?
5. **Hook deferment list (§6.2)** — does it cover all the realistic v0.9.0
ask? Is the "shell out to `claudemesh send`" workaround for escalation
ergonomically acceptable?
6. **RunPod resolution (§16.3)** — agree with treating RunPod as
single-tenant container? Or are there real multi-tenant RunPod
deployments we should default-guard against?
7. **Anything else still wrong?** Read it as if you were going to operate
this for a year. What falls down?
Three options after this review:
- **(a) v4 is shippable**: lock the spec, start coding the frozen core.
- **(b) v5 needed**: list the must-fix items.
- **(c) the architecture itself is wrong**: what would you do differently?
Be ruthless. We can break anything.