Compare commits
104 Commits
82ebd2b6be
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| | 1b28550f30 | | |
| | 9d1b4f3d4c | | |
| | ffd0621ccc | | |
| | b9ecbe79ad | | |
| | 33051b95bf | | |
| | 64d9f9f6f9 | | |
| | 7f61a711f1 | | |
| | 96520394ff | | |
| | a2a53ff355 | | |
| | 6780899185 | | |
| | cba4a938ec | | |
| | 706e681d6e | | |
| | c036f759c3 | | |
| | 54e00109ab | | |
| | 16c148a87f | | |
| | b57e47ed65 | | |
| | 5a8db796a0 | | |
| | dab80f475e | | |
| | a25102a79f | | |
| | 7460d34335 | | |
| | 25586d298f | | |
| | a852a9df18 | | |
| | 4cfb682eab | | |
| | 0958463998 | | |
| | 088a4efaa3 | | |
| | 15b7920b2a | | |
| | b0c1348a0a | | |
| | 1a14cef1e0 | | |
| | 71f7f81880 | | |
| | 052f65149d | | |
| | 0b3014e7eb | | |
| | cef246a34a | | |
| | f013436541 | | |
| | 6d981976c0 | | |
| | f7d7d391c9 | | |
| | ff2aa8bf7c | | |
| | 4d42185b0f | | |
| | d62b3f45d2 | | |
| | e688f66791 | | |
| | 033a2d37e1 | | |
| | 364178d95b | | |
| | f91871c71d | | |
| | 92cac16c91 | | |
| | 81f0e4f7ac | | |
| | 2b6cf2c14b | | |
| | 8a5469a5df | | |
| | e128a6ae5f | | |
| | 3753a6e137 | | |
| | cb90f1ca60 | | |
| | 0e3a5babd9 | | |
| | 6794aa8512 | | |
| | c56910bfcf | | |
| | 4eff4f5a20 | | |
| | a2568ad9f4 | | |
| | bf22afb0ed | | |
| | abaa4bcf87 | | |
| | 65e63b0b27 | | |
| | 5785454ac9 | | |
| | 03cff156e2 | | |
| | e84914b25b | | |
| | 5a1d5d6a49 | | |
| | f3649d761f | | |
| | 79485898cf | | |
| | b69df75f0c | | |
| | 3a3d2a6c4c | | |
| | f9ed3fa286 | | |
| | 50b2ae97c2 | | |
| | 4b459622e4 | | |
| | f679b49b6c | | |
| | 5ceb311d74 | | |
| | e60980cfd7 | | |
| | ff3d11d42d | | |
| | 43e429f204 | | |
| | 1c335e8daa | | |
| | 397ddb4c45 | | |
| | 354c47c3d6 | | |
| | 2262564680 | | |
| | c18891191e | | |
| | eb021a8a6f | | |
| | 3964de4962 | | |
| | c795df4fd4 | | |
| | aa6c7be4eb | | |
| | 3da06d357e | | |
| | 075df6db08 | | |
| | c7ce92f35b | | |
| | 7de13cbb71 | | |
| | ad70782171 | | |
| | 646d4fa3f1 | | |
| | 7f6af0137d | | |
| | 2e57173ed9 | | |
| | 95b16a23fc | | |
| | a3cf9b938e | | |
| | ce321c0a21 | | |
| | 9ecf2d65af | | |
| | 80755dbf9b | | |
| | 82ee89d0dc | | |
| | 8697c1c032 | | |
| | 716e674473 | | |
| | 038a5b5bf7 | | |
| | d871988084 | | |
| | 3c35932191 | | |
| | b08daadbdc | | |
| | cb5faca920 | | |
| | 77f4316f2d | | |

551
.artifacts/shipped/2026-05-03-daemon-final-spec-v10.md
Normal file
@@ -0,0 +1,551 @@
# `claudemesh daemon` — Final Spec v10

> **Round 10.** v9 was reviewed by codex (round 9). The two-layer ID
> model (5/5) and §4.1 wording (4/5) were closed cleanly, but rate-limit
> placement created a worse failure: putting B1 limiter before dedupe
> lookup means **idempotent retries burn rate-limit budget** and a
> daemon retry of an already-committed message during a saturated
> window can get rate-limit-rejected → daemon marks `dead` → split-brain
> (broker has the message, daemon believes failure).
>
> **v10 fixes**:
>
> 1. New **Phase B0 dedupe fast-path** — read dedupe table BEFORE rate
>    limit. Existing id (match or mismatch) returns immediately without
>    touching rate-limit budget.
> 2. **Idempotent rate-limiter** keyed by `(mesh_id, client_message_id,
>    window_bucket)` so even if two same-id requests race past B0, only
>    the first one consumes budget.
> 3. **§4.11 stale text** — rate-limit moved out of B2 failure mode.
> 4. **§4.7.2 pseudocode reordered** to show B0 → B1 → BEGIN → claim →
>    B2 → B3.
>
> **Intent §0 unchanged from v2.** v10 only revises §4.

---

## 0. Intent — unchanged, see v2 §0

## 1. Process model — unchanged

## 2. Identity — unchanged from v5 §2

## 3. IPC surface — unchanged from v4 §3

---

## 4. Delivery contract — `aborted` clarified, broker phasing, SQLite locking

### 4.1 The contract (precise — v9, two-layer ID model)

> **Two-layer ID rules** (NEW v9 — codex r8):
>
> - **Daemon-layer**: a `client_message_id` is **daemon-consumed** iff an
>   outbox row exists for it. Daemon-mediated callers can never reuse a
>   daemon-consumed id, regardless of whether the broker ever saw it.
>   The daemon's outbox is the single authority for "this id was issued
>   by my caller against this daemon."
> - **Broker-layer**: a `client_message_id` is **broker-consumed** iff a
>   dedupe row exists for `(mesh_id, client_message_id)` in
>   `mesh.client_message_dedupe`. Direct broker callers (none in
>   v0.9.0; reserved for future SDK paths that bypass the daemon) can
>   reuse a broker-non-consumed id freely.
> - In v0.9.0 there are no daemon-bypass clients, so for practical
>   purposes "daemon-consumed" is the operative rule.
>
> **Local guarantee**: each successful `POST /v1/send` returns a stable
> `client_message_id`. The send is durably persisted to `outbox.db`
> before the response returns. The daemon enforces request-fingerprint
> idempotency at the IPC layer (§4.5.1).
>
> **Local audit guarantee**: a `client_message_id` once written to
> `outbox.db` is **never released** (daemon-layer rule). Operator
> recovery via `requeue` always mints a fresh id; the old row stays in
> `aborted` for audit. There is no daemon-side path to free a used id.
>
> **Broker guarantee** (v9 — tightened): a dedupe row exists iff the
> broker accept transaction **committed** (Phase B3 reached). Phase B1
> rejections never insert dedupe rows. Phase B2 rejections roll the
> transaction back, so any partial dedupe row is unwound. Direct
> broker callers retrying after B1/B2 rejection see no dedupe row and
> may reuse the id.
>
> **Atomicity guarantee**: same as v8 §4.1.
>
> **End-to-end guarantee**: at-least-once.

### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2

### 4.3 Broker schema — unchanged from v6 §4.3

### 4.4 Request fingerprint canonical form — unchanged from v6 §4.4

### 4.5 Daemon-local idempotency at the IPC layer (v8 — `aborted` added, SQLite locking)

#### 4.5.1 IPC accept algorithm (v8)

On `POST /v1/send`:

1. Validate request envelope (auth, schema, size limits, destination
   resolvable). Failures here return `4xx` immediately. **No outbox row
   is written; the `client_message_id` is not consumed.**
2. Compute `request_fingerprint` (§4.4).
3. Open a SQLite transaction with `BEGIN IMMEDIATE` (v8 — codex r7) so
   a concurrent IPC accept on the same id serializes against this one.
   `BEGIN IMMEDIATE` acquires the RESERVED lock at transaction start,
   preventing any other writer from beginning a transaction on the same
   database; SQLite has no row-level lock and `SELECT FOR UPDATE` is not
   supported.
4. `SELECT id, request_fingerprint, status, broker_message_id,
   last_error FROM outbox WHERE client_message_id = ?`.
5. Apply the lookup table below. For the "(no row)" case, INSERT the
   new row inside the same transaction.
6. COMMIT.

| Existing row state | Fingerprint match? | Daemon response |
|---|---|---|
| (no row) | — | INSERT new outbox row in `pending`; return `202 accepted, queued` |
| `pending` | match | Return `202 accepted, queued`. No mutation |
| `pending` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_pending_fingerprint_mismatch"`. No mutation |
| `inflight` | match | Return `202 accepted, inflight`. No mutation |
| `inflight` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_inflight_fingerprint_mismatch"` |
| `done` | match | Return `200 ok, duplicate: true, broker_message_id, history_id`. No broker call |
| `done` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_done_fingerprint_mismatch", broker_message_id` |
| `dead` | match | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"`. Same id never auto-retried |
| `dead` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_mismatch"` |
| **`aborted`** (NEW v8) | **match** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_match"`. The id was retired by operator action; never reusable |
| **`aborted`** (NEW v8) | **mismatch** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_mismatch"` |

**Rule (v8 — codex r7)**: every IPC `409` carries the daemon's
`request_fingerprint` (8-byte hex prefix) so callers can debug
client/server canonical-form drift. **Every state in the table returns
something deterministic, including `aborted`.** A `client_message_id`
written to `outbox.db` is permanently bound to that row's lifecycle —
the only "free" state is "no row exists".

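Steps 3 to 6 and the lookup table can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under assumptions, not the daemon's implementation: the table is trimmed to the columns the lookup needs, the helper names `open_outbox` and `ipc_accept` are hypothetical, the response bodies are illustrative dictionaries, and the fingerprint echoed on `409` is assumed to be the one the daemon computed for the incoming request.

```python
import sqlite3

# Trimmed-down outbox table: only the columns the lookup needs (assumption).
SCHEMA = """
CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,
  payload             BLOB NOT NULL,
  status              TEXT NOT NULL,
  broker_message_id   TEXT,
  last_error          TEXT
);
"""


def open_outbox(path=":memory:"):
    # isolation_level=None puts sqlite3 in autocommit mode, so the explicit
    # BEGIN IMMEDIATE / COMMIT below are the only transaction boundaries.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def ipc_accept(conn, row_id, client_message_id, fingerprint, payload):
    """Steps 3-6 of the accept algorithm. Returns (http_status, body)."""
    conn.execute("BEGIN IMMEDIATE")  # serializes concurrent accepts
    try:
        row = conn.execute(
            "SELECT request_fingerprint, status, broker_message_id, last_error "
            "FROM outbox WHERE client_message_id = ?",
            (client_message_id,),
        ).fetchone()
        if row is None:
            # "(no row)" case: the id becomes daemon-consumed here.
            conn.execute(
                "INSERT INTO outbox (id, client_message_id, request_fingerprint,"
                " payload, status) VALUES (?, ?, ?, ?, 'pending')",
                (row_id, client_message_id, fingerprint, payload),
            )
            result = (202, {"status": "queued"})
        else:
            stored_fp, status, broker_id, last_error = row
            match = stored_fp == fingerprint
            if match and status == "pending":
                result = (202, {"status": "queued"})
            elif match and status == "inflight":
                result = (202, {"status": "inflight"})
            elif match and status == "done":
                result = (200, {"duplicate": True, "broker_message_id": broker_id})
            else:
                # Every remaining table cell is a 409 with a derived conflict code.
                suffix = "match" if match else "mismatch"
                body = {
                    "error": "idempotency_key_reused",
                    "conflict": f"outbox_{status}_fingerprint_{suffix}",
                    "request_fingerprint": fingerprint[:8].hex(),
                }
                if status == "dead" and match:
                    body["reason"] = last_error
                if status == "done":
                    body["broker_message_id"] = broker_id
                result = (409, body)
        conn.execute("COMMIT")
        return result
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

No branch other than "(no row)" mutates the table, which matches the "No mutation" entries above.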
#### 4.5.2 Outbox table — fingerprint required

```sql
CREATE TABLE outbox (
  id                   TEXT PRIMARY KEY,
  client_message_id    TEXT NOT NULL UNIQUE,
  request_fingerprint  BLOB NOT NULL,    -- 32 bytes
  payload              BLOB NOT NULL,
  enqueued_at          INTEGER NOT NULL,
  attempts             INTEGER DEFAULT 0,
  next_attempt_at      INTEGER NOT NULL,
  status               TEXT CHECK(status IN
                         ('pending','inflight','done','dead','aborted')),
  last_error           TEXT,
  delivered_at         INTEGER,
  broker_message_id    TEXT,
  aborted_at           INTEGER,          -- NEW v8
  aborted_by           TEXT,             -- NEW v8: operator/auto
  superseded_by        TEXT              -- NEW v8: id of the requeue successor row, if any
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
CREATE INDEX outbox_aborted ON outbox(status, aborted_at) WHERE status = 'aborted';
```

`aborted_at`, `aborted_by`, `superseded_by` give operators a clear
audit trail. `superseded_by` lets `outbox inspect` show the chain when
a row was requeued multiple times.

`request_fingerprint` is computed once at IPC accept time and frozen
forever for the row's lifecycle. Daemon never recomputes from
`payload`.

### 4.6 Rejected-request semantics — two-layer rules + rate-limit moved to B1 (v9 — codex r8)

> **Two-layer rule (v9)**: a `client_message_id` is **daemon-consumed**
> iff an outbox row exists for it; **broker-consumed** iff a dedupe row
> exists. Daemon-mediated callers see daemon-layer authority (the only
> path in v0.9.0). Pre-validation failures at any layer consume nothing
> at that layer. The two layers are independent: a daemon-consumed id
> may or may not be broker-consumed (depending on whether the send
> reached B3); a daemon-non-consumed id can never be broker-consumed
> (no outbox row ⇒ no broker call from the daemon).

#### 4.6.1 Daemon-side rejection phasing (v9)

| Phase | When daemon rejects | Outbox row? | Daemon-consumed? | Same daemon caller may reuse id? |
|---|---|---|---|---|
| **A. IPC validation** (auth, schema, size, destination resolvable) | Before §4.5.1 step 3 | No | No | Yes — id never written locally |
| **B. Outbox stored, broker network/transient failure** | After IPC accept, broker `5xx` or timeout | `pending` → retried | Yes | N/A — daemon owns retries |
| **C. Outbox stored, broker permanent rejection** | Broker returns `4xx` after IPC accept | `dead` | Yes | No — rotate via `requeue` |
| **D. Operator retirement** | Operator runs `requeue` on `dead` or `pending` row | `aborted` (audit) + new row with fresh id | Yes (still consumed) | Old id NEVER reusable; new id is fresh |

The "daemon-consumed?" column is the daemon-layer authority. It does
not depend on whether the broker ever saw the request — phase C above
shows the broker has not committed a dedupe row, but the daemon still
holds the id in `dead` state.

#### 4.6.2 Broker-side rejection phasing (v10 — B0 dedupe fast-path added)

The broker validates in **four phases** relative to dedupe-row
insertion. Phase B0 (NEW v10 — codex r9) makes idempotent retries
free of rate-limit budget so a daemon retry of an already-committed
message can never get rate-limit-rejected:

| Phase | Validation | Side effects | Result for direct broker callers |
|---|---|---|---|
| **B0. Dedupe fast-path** (NEW v10) | Read `mesh.client_message_dedupe` for `(mesh_id, client_message_id)`. **Does not touch rate-limit budget.** | None | If row exists & fingerprint matches → `200 duplicate` with original `broker_message_id`. If row exists & fingerprint mismatches → `409 idempotency_key_reused`. If row absent → continue to B1 |
| **B1. Pre-dedupe-claim** (atomic, external) | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ `max_payload.inline_bytes`, **rate limit not exceeded** (idempotent external limiter — see §4.6.4) | None | `4xx` returned. No dedupe row, no broker-consumed id. Caller may retry with same id once condition clears |
| **B2. Post-dedupe-claim** (in-tx) | Conditions that require the accept transaction to be in progress: destination_ref existence (topic exists, member subscribed, etc.) | INSERT into dedupe rolled back | `4xx` returned, transaction rolled back, no dedupe row remains. Caller may retry with same id |
| **B3. Accepted** | All side effects commit atomically | Dedupe row, message row, history row, delivery_queue rows, mention_index rows | `201` returned with `broker_message_id`. Id is broker-consumed |

**Why B0 is correct (codex r9)**: idempotent retries should never be
distinguishable from "the call worked" from the caller's perspective.
A retry that the broker can resolve to the original accept must do so
before any operation that could fail (rate limit, capacity check,
auth-quota, etc.). B0 is a non-mutating read outside any transaction,
so running it on the strictly-new-id path adds negligible cost (one
indexed PK lookup against the dedupe table).

**Race semantics for new ids (v10 — codex r9)**: B0 is a non-locking
read; two same-id requests can both miss B0 simultaneously. Without
care, both would consume rate-limit budget. v10 requires the limiter
to be **idempotent over `(mesh_id, client_message_id, window)`**:
budget is consumed at most once per id-window pair regardless of
concurrent retries (§4.6.4). The "second" retry that misses B0 still
sees its `INCR` short-circuited by the limiter and proceeds to B2/B3
without budget impact. Whichever request wins the dedupe `INSERT`
commits; the loser sees fingerprint match (rollback to `200
duplicate`) or mismatch (`409`).

**Daemon-mediated callers**: in v0.9.0 the daemon is the only B-phase
caller. Daemon-mediated callers see only the daemon-layer rules
(§4.6.1). The broker's "may retry with same id" wording in the table
above applies to direct broker callers only (none in v0.9.0; reserved
for future SDK paths).

**Critical guarantee (v9 — tightened from v8)**: a dedupe row exists
**iff the broker accept transaction committed (B3)**. There is no
broker code path where a permanent 4xx leaves a dedupe row behind.

If the broker decides post-commit that an accepted message is invalid
(async content-policy job, async moderation, etc.), that's NOT a
permanent rejection — it's a follow-up event that operates on the
`broker_message_id`, not on the dedupe key.

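The phase ordering can be illustrated with a small single-threaded, in-memory model. It is a sketch under stated assumptions rather than broker code: the `Broker` class and its fields are hypothetical, the dedupe table and limiter are plain Python containers covering a single rate-limit window, and the specific status codes `429` and `404` stand in for the unspecified `4xx` results of B1 and B2.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Toy single-window model of the B0 -> B1 -> B2 -> B3 ordering."""
    limit_per_window: int
    topics: set = field(default_factory=set)
    # (mesh_id, client_message_id) -> (broker_message_id, fingerprint)
    dedupe: dict = field(default_factory=dict)
    budget_used: int = 0
    counted: set = field(default_factory=set)  # ids already charged this window
    next_id: int = 0

    def accept(self, mesh_id, client_message_id, fingerprint, topic):
        key = (mesh_id, client_message_id)

        # B0: non-mutating dedupe lookup, before any rate-limit accounting.
        row = self.dedupe.get(key)
        if row is not None:
            broker_message_id, stored_fp = row
            if stored_fp == fingerprint:
                return 200, broker_message_id  # duplicate of a committed accept
            return 409, None                   # idempotency_key_reused

        # B1: idempotent limiter, charged at most once per id in the window.
        if key not in self.counted:
            if self.budget_used >= self.limit_per_window:
                return 429, None               # assumed code for "rate limited"
            self.budget_used += 1
            self.counted.add(key)

        # B2: in-transaction check; failing here leaves no dedupe row behind.
        if topic not in self.topics:
            return 404, None                   # assumed code for "topic missing"

        # B3: commit. Only now does the id become broker-consumed.
        self.next_id += 1
        broker_message_id = f"msg-{self.next_id}"
        self.dedupe[key] = (broker_message_id, fingerprint)
        return 201, broker_message_id
```

With `limit_per_window=1`, a first send commits with `201`, a second id is rejected by the limiter, and a retry of the first id still resolves to `200` at B0 without touching `budget_used`. The model does not reproduce the concurrent race between two first-time requests.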
#### 4.6.4 Rate limiter — idempotent over `(mesh_id, client_message_id, window_bucket)` (v10 — codex r9)

Codex r9 caught: v9's plain `INCR` limiter would let idempotent
retries burn budget. A daemon retry of an already-committed message
that gets rate-limit-rejected creates a split-brain (broker has it,
daemon marks dead). v10 makes the limiter idempotent over
`(mesh_id, client_message_id, window_bucket)` so retries are free.

- **Authority**: same external Redis-style limiter used elsewhere in
  claudemesh, but called via an idempotency-aware wrapper:

  ```lua
  -- consume_budget(mesh_id, client_message_id, window_bucket) → "ok" | "denied"
  -- Lua script on Redis (an equivalent WATCH/MULTI transaction also works).
  -- KEYS[1] = "rl:"  .. mesh_id .. ":" .. window_bucket
  -- KEYS[2] = "rli:" .. mesh_id .. ":" .. client_message_id .. ":" .. window_bucket
  -- ARGV[1] = limit_per_window, ARGV[2] = window_seconds
  if redis.call("EXISTS", KEYS[2]) == 1 then
    return "ok"                                  -- already counted
  end
  if redis.call("INCR", KEYS[1]) > tonumber(ARGV[1]) then
    redis.call("DECR", KEYS[1])                  -- refund this attempt
    return "denied"
  end
  redis.call("SET", KEYS[2], 1, "EX", 2 * tonumber(ARGV[2]))  -- short TTL for repeat-detection
  return "ok"
  ```

  The `idem` key TTL is small (2× window) to keep memory bounded;
  outside the window, retries that arrive late count as new traffic
  (which is correct — the original `INCR` row has rolled out of the
  window too).
- **Race semantics**: two same-id requests racing past B0 both arrive
  at `consume_budget`. Whichever Redis call lands first runs the
  conditional `INCR`+`SET idem`; the second sees `EXISTS idem` and
  returns `ok` without `INCR`. Each id-window pair consumes at most
  one budget unit. Implemented in Lua (single round-trip, atomic).
- **B2 rollback non-refund**: if the limiter accepts but the in-tx
  Phase B2 then rejects (e.g. topic not found), the consumed budget
  is **not** refunded. Counter
  `cm_broker_rate_limit_consumed_then_b2_rejected_total` exposes the
  delta. Refunding would require a coordinated rollback across the DB
  tx and the limiter, which we don't want to build.
- **Async counters**: `mesh.rate_limit_counter` (or any DB-resident
  view of "messages-per-mesh-per-window") is **non-authoritative** —
  metrics/telemetry only, rebuilt from the authoritative limiter and
  from message-history. Used for dashboards, not for accept decisions.

This split — idempotent atomic external limiter for enforcement,
async DB counters for telemetry — keeps idempotent retries free of
budget impact, prevents the v9 split-brain, and stays inside the
existing claudemesh rate-limit infrastructure.

**Why B0 still matters even with the idempotent limiter**: the
idempotent limiter prevents budget over-consumption, but it does NOT
make the limiter itself the dedupe authority. B0 is a non-mutating DB
read that resolves committed dedupe rows (the truth) without any
limiter or DB-write side effects at all. For the common retry case
(daemon timeout after broker B3 commit), B0 returns `200 duplicate`
without ever calling the limiter. B0 + idempotent limiter together
mean: idempotent retries are O(1 PK lookup), free, and never visible
to rate-limit accounting.

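The limiter's bookkeeping can be checked without a Redis instance using an in-memory model. This is a sketch, not the production wrapper: the `IdempotentLimiter` class is hypothetical, the clock is passed in explicitly as `now`, and deriving `window_bucket` as `floor(now / window_seconds)` is an assumption the spec does not state.

```python
class IdempotentLimiter:
    """In-memory model of consume_budget; dictionaries stand in for Redis keys."""

    def __init__(self, limit_per_window, window_seconds):
        self.limit = limit_per_window
        self.window = window_seconds
        self.counters = {}  # (mesh_id, window_bucket) -> count   ("rl:" keys)
        self.idem = {}      # (mesh_id, client_message_id, window_bucket) -> expiry ("rli:" keys)

    def consume_budget(self, mesh_id, client_message_id, now):
        bucket = int(now // self.window)  # assumed bucketing rule
        idem_key = (mesh_id, client_message_id, bucket)

        # EXISTS idem: this id was already counted in this window.
        expiry = self.idem.get(idem_key)
        if expiry is not None and expiry > now:
            return "ok"

        # INCR key, then refund if the limit is exceeded. The net effect of
        # INCR + DECR is "counter unchanged", so the model skips the write.
        counter_key = (mesh_id, bucket)
        count = self.counters.get(counter_key, 0) + 1
        if count > self.limit:
            return "denied"
        self.counters[counter_key] = count

        # SET idem 1 EX 2*window_seconds
        self.idem[idem_key] = now + 2 * self.window
        return "ok"
```

A retry of an id that was already counted returns `"ok"` even when the window is saturated, while the same id arriving in the next bucket is charged again as new traffic.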
#### 4.6.3 Operator recovery via `requeue` (corrected v8)

To unstick a `dead` or `pending`-but-stuck row, operator runs:

```
claudemesh daemon outbox requeue --id <outbox_row_id>
    [--new-client-id <id> | --auto]
    [--patch-payload <path>]
```

This atomically (single SQLite transaction):

1. Marks the existing row's status to `aborted`, sets `aborted_at = now`,
   `aborted_by = "operator"`. Row is **never deleted** — audit trail
   permanent.
2. Mints a fresh `client_message_id` (caller-supplied via `--new-client-id`
   or auto-ulid'd via `--auto`).
3. Inserts a new outbox row in `pending` with the fresh id and the same
   payload (or patched payload if `--patch-payload` was given).
4. Sets `superseded_by = <new_row_id>` on the old row so
   `outbox inspect <old_id>` displays the chain.

**The old `client_message_id` is permanently dead** — `outbox.db` still
holds it via the `aborted` row's `UNIQUE` constraint, and any caller
re-using it gets `409 outbox_aborted_*` per §4.5.1.

If broker had ever accepted the old id (it reached B3), the broker's
dedupe row is also permanent — duplicate sends to broker with the old
id would also `409` for fingerprint mismatch (or return the original
`broker_message_id` for matching fingerprint). Daemon-side
`aborted` and broker-side dedupe row are independent records of "this
id was used," neither releases the id.

This is the resolution to v7's contradiction: there is **no path** for
an id to "become free again." If the operator wants to retry the
payload, they get a new id. The old id stays buried.

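The four requeue steps fit in one `BEGIN IMMEDIATE` transaction. The sketch below uses Python's `sqlite3` against the §4.5.2 table and makes several labeled assumptions: `open_outbox` and `requeue` are hypothetical helper names, a random UUID hex string stands in for the ULID that `--auto` would mint, and the old row's `request_fingerprint` is copied to the new row, whereas the canonical form in §4.4 may require recomputing it.

```python
import sqlite3
import time
import uuid

SCHEMA = """
CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  attempts            INTEGER DEFAULT 0,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT CHECK(status IN
                        ('pending','inflight','done','dead','aborted')),
  last_error          TEXT,
  delivered_at        INTEGER,
  broker_message_id   TEXT,
  aborted_at          INTEGER,
  aborted_by          TEXT,
  superseded_by       TEXT
);
"""


def open_outbox(path=":memory:"):
    # Autocommit mode: the explicit BEGIN/COMMIT below delimit the transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.executescript(SCHEMA)
    return conn


def requeue(conn, outbox_row_id, new_client_id=None, patched_payload=None):
    """Abort the old row and insert its successor in one transaction."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT request_fingerprint, payload, status FROM outbox WHERE id = ?",
            (outbox_row_id,),
        ).fetchone()
        if row is None or row[2] not in ("dead", "pending"):
            raise ValueError("row missing or not requeueable")
        fingerprint, payload, _status = row  # fingerprint reuse is an assumption
        if patched_payload is not None:
            payload = patched_payload
        new_row_id = uuid.uuid4().hex
        new_client_id = new_client_id or uuid.uuid4().hex  # stand-in for a ULID
        now = int(time.time())
        # Steps 1 and 4: retire the old row and link it to its successor.
        conn.execute(
            "UPDATE outbox SET status = 'aborted', aborted_at = ?, "
            "aborted_by = 'operator', superseded_by = ? WHERE id = ?",
            (now, new_row_id, outbox_row_id),
        )
        # Steps 2 and 3: the fresh id gets its own pending row.
        conn.execute(
            "INSERT INTO outbox (id, client_message_id, request_fingerprint, "
            "payload, enqueued_at, next_attempt_at, status) "
            "VALUES (?, ?, ?, ?, ?, ?, 'pending')",
            (new_row_id, new_client_id, fingerprint, payload, now, now),
        )
        conn.execute("COMMIT")
        return new_row_id, new_client_id
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

Passing an already-used id as `new_client_id` violates the `UNIQUE` constraint, so the transaction rolls back and the old row keeps its previous status.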
### 4.7 Broker atomicity contract — side-effect classification (v9)

#### 4.7.1 Side effects (v9 — rate limit moved to B1 external)

Every successful broker accept atomically commits these durable
state changes in **one transaction**:

| Effect | Table | In-tx? | Why |
|---|---|---|---|
| Dedupe record | `mesh.client_message_dedupe` | **Yes** | Idempotency authority |
| Message body | `mesh.topic_message` / `mesh.message_queue` | **Yes** | Authoritative store |
| History row | `mesh.message_history` | **Yes** | Replay log; lost-on-rollback would break ordered replay |
| Fan-out work | `mesh.delivery_queue` | **Yes** | Each recipient must see exactly the messages that committed |
| Mention index entries | `mesh.mention_index` | **Yes** | Reads off mention queries must match committed messages |

**Outside the transaction** — non-authoritative or rebuildable, with
explicit rationale per item:

| Effect | Where | Why outside |
|---|---|---|
| WS push to live subscribers | Async after COMMIT | Live notifications are best-effort; receivers re-fetch from history on reconnect |
| Webhook fan-out | Async via `delivery_queue` workers | Off-band; consumes committed `delivery_queue` rows |
| Rate-limit **counters** (telemetry only) | Async, eventually consistent | Authoritative limiter is the external Redis-style INCR in B1 (§4.6.4); the DB counter is rebuilt for dashboards, not consulted for accept |
| Audit log entries | Async append-only stream | Audit log can be rebuilt from message history; in-tx writes hurt p99 |
| Search/FTS index updates | Async via outbox-pattern worker | Index can be rebuilt from authoritative tables |
| Metrics | Prometheus, pull-based | Always non-authoritative |

If any in-transaction insert fails, the transaction rolls back
completely. The accept is `5xx` to daemon; daemon retries. No partial
state.

The async side effects are driven off the in-transaction
`delivery_queue` and `message_history` rows, so they cannot get ahead
of committed state — only lag behind.

#### 4.7.2 Pseudocode — corrected and final (v10)

```sql
-- =========================================================================
-- Phase B0: dedupe fast-path (NEW v10 — codex r9). Non-mutating.
-- Resolves idempotent retries WITHOUT touching rate-limit budget.
-- =========================================================================
SELECT broker_message_id, request_fingerprint, history_available, first_seen_at
  FROM mesh.client_message_dedupe
 WHERE mesh_id = $mesh_id AND client_message_id = $client_id;

-- If row exists:
--   fingerprint match    → return 200 duplicate (broker_message_id, history_available). Done.
--   fingerprint mismatch → return 409 idempotency_key_reused. Done.
-- Otherwise: row absent → continue.

-- =========================================================================
-- Phase B1: schema/auth/size validation + idempotent rate-limit consume.
-- All before any DB transaction. Failures here return 4xx without opening a tx.
-- =========================================================================
-- consume_budget(mesh_id, client_id, window_bucket) — Lua/Redis (§4.6.4).
-- Idempotent over (mesh_id, client_id, window_bucket): retries within window
-- consume at most once.

-- =========================================================================
-- Phase B2 + B3: in-transaction claim and side effects.
-- =========================================================================
BEGIN;

INSERT INTO mesh.client_message_dedupe
    (mesh_id, client_message_id, broker_message_id, request_fingerprint,
     destination_kind, destination_ref, expires_at)
VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
        $dest_kind, $dest_ref, $expires_at)
ON CONFLICT (mesh_id, client_message_id) DO NOTHING;

-- Inspect the row that's actually there now (ours or a racer's).
SELECT broker_message_id, request_fingerprint, destination_kind,
       destination_ref, history_available, first_seen_at
  FROM mesh.client_message_dedupe
 WHERE mesh_id = $mesh_id AND client_message_id = $client_id
 FOR SHARE;

-- Branch:
--   row.broker_message_id == $msg_id → we won the race; continue to side effects.
--   row.broker_message_id != $msg_id → racer won. Compare fingerprints:
--       fingerprint match    → ROLLBACK; return 200 duplicate (the rare race-vs-B0 case
--                              where two concurrent first-time-but-same-id requests
--                              both missed B0 and one beat the other to the INSERT).
--       fingerprint mismatch → ROLLBACK; return 409 idempotency_key_reused.

-- Phase B2 validation: destination_ref existence (topic exists,
-- member subscribed, etc.). Rate limit is NOT here — it was checked
-- in B1 (§4.6.4) before this transaction opened.
-- If B2 fails → ROLLBACK; return 4xx (no dedupe row remains).

-- Phase B3: insert all in-tx side effects (§4.7.1).
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
VALUES ($msg_id, $mesh_id, $client_id, ...);

INSERT INTO mesh.message_history (broker_message_id, mesh_id, ...)
VALUES ($msg_id, $mesh_id, ...);

INSERT INTO mesh.delivery_queue (broker_message_id, recipient_pubkey, ...)
SELECT $msg_id, member_pubkey, ...
  FROM mesh.topic_subscription
 WHERE topic = $dest_ref AND mesh_id = $mesh_id;

INSERT INTO mesh.mention_index (broker_message_id, mentioned_pubkey, ...)
SELECT $msg_id, mention_pubkey, ...
  FROM unnest($mention_list);

COMMIT;

-- After COMMIT, async workers consume delivery_queue and update
-- search indexes, audit logs, rate-limit counters, etc.
```

#### 4.7.3 Orphan check — same as v7 §4.7.3

Extended over the side-effect inventory to verify that the in-tx items
are consistent with one another.

### 4.8 Outbox max-age math — unchanged from v7 §4.8

Minimum `dedupe_retention_days = 7`. The derived `max_age_hours =
window - safety_margin` is strictly less than the window, and
`safety_margin` has a floor of 24h.

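As a worked example of the max-age arithmetic, the short function below derives `max_age_hours` from the retention setting. It is a sketch that assumes the window equals `dedupe_retention_days` expressed in hours; v7 §4.8 holds the authoritative definition, and the function and parameter names are hypothetical.

```python
def max_age_hours(dedupe_retention_days: int, safety_margin_hours: int = 24) -> int:
    """Derive the outbox max age so it stays strictly inside the dedupe window."""
    if dedupe_retention_days < 7:
        raise ValueError("dedupe_retention_days must be at least 7")
    if safety_margin_hours < 24:
        raise ValueError("safety_margin has a floor of 24h")
    window_hours = dedupe_retention_days * 24  # assumed meaning of "window"
    return window_hours - safety_margin_hours
```

With the minimum settings, a 7-day window is 168 hours, so the outbox max age is 168 - 24 = 144 hours. A 30-day window with a 48-hour margin gives 720 - 48 = 672 hours.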
### 4.9 Inbox schema — unchanged from v3 §4.5

### 4.10 Crash recovery — unchanged from v3 §4.6

### 4.11 Failure modes — B0/B1/B2 distinction (v10)

- **IPC accept fingerprint-mismatch on duplicate id** (any state):
  returns 409 with `conflict` field per §4.5.1. Caller must use a new id.
- **IPC accept against `aborted` row, fingerprint match**: returns 409
  per §4.5.1. Caller must use a new id; the old id is permanently retired.
- **Outbox row stuck in `dead`**: operator runs `outbox requeue` per
  §4.6.3; old id stays in `aborted`, new id is fresh.
- **Broker fingerprint mismatch on retry**: at B0 → returns 409
  immediately (no rate-limit consumed). Daemon marks `dead`; operator
  requeue path.
- **Idempotent retry of an already-committed id during a saturated
  rate-limit window** (NEW v10): B0 fast-path returns `200 duplicate`
  with the original `broker_message_id`. Rate-limit budget is NOT
  consumed. Daemon transitions outbox row from `pending`/`inflight`
  to `done`. **No split-brain.** This is the key correctness fix
  from codex r9.
- **Daemon retry after dedupe row hard-deleted by broker retention
  sweep**: cannot happen unless operator overrode `max_age_hours`.
- **Broker phase B1 rejection (rate limit, schema, size, etc.)**: no
  dedupe row exists; daemon receives 4xx; idempotent limiter ensures
  retries within window don't re-consume budget. If the rejection is
  permanent (size, schema), daemon marks `dead`. If transient (rate
  limit), daemon retries with exponential backoff until window clears
  or `max_age_hours` exhausted.
- **Broker phase B2 rejection on retry**: same id reaches B2 and the
  in-tx condition fails (topic deleted, member unsubscribed). B2
  rolls back the dedupe insert; no dedupe row remains. Daemon
  receives 4xx → marks `dead`. Operator can `requeue` if condition
  clears (note: `requeue` mints a fresh id per §4.6.3, so the old id
  stays `aborted`).
- **Atomicity violation found by orphan check**: alerts ops.

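The daemon-side consequences of these failure modes reduce to a mapping from the broker's response to the next outbox status. The function below is an illustrative sketch only: the name `next_outbox_status` is hypothetical, and treating HTTP `429` as the transient rate-limit rejection is an assumption, since the spec only says `4xx` and does not state how the daemon tells a transient B1 rejection from a permanent one.

```python
def next_outbox_status(http_status=None, timed_out=False):
    """Map a broker response to the outbox status the daemon moves the row to."""
    if timed_out or (http_status is not None and http_status >= 500):
        return "pending"   # network/transient failure: daemon retries
    if http_status in (200, 201):
        return "done"      # 201 accepted at B3, or 200 duplicate from B0
    if http_status == 429:
        return "pending"   # assumed code for rate limit: retry with backoff
    if http_status is not None and 400 <= http_status < 500:
        return "dead"      # permanent rejection: operator requeue path
    raise ValueError(f"unexpected broker response: {http_status!r}")
```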

---

## 5-13. — unchanged from v4

## 14. Lifecycle — unchanged from v5 §14

## 15. Version compat — unchanged from v7 §15

## 16. Threat model — unchanged

---

## 17. Migration — v8 outbox columns + broker phase B2 (v8)

Broker side, deploy order: same as v7 §17, with one addition:

- Step 4.5: explicitly split broker accept into Phase B1 (pre-dedupe
  validation, returns 4xx without writing) and Phase B2/B3 (within the
  accept transaction). Implementation: refactor handler to validate
  Phase B1 conditions before opening the DB transaction.

Daemon side:

- Outbox schema gains `aborted_at`, `aborted_by`, `superseded_by`
  columns and the `aborted` enum value (§4.5.2). Migration applies via
  `INSERT INTO new SELECT * FROM old` recreation if needed; v0.9.0 is
  greenfield.
- IPC accept switches to `BEGIN IMMEDIATE` for SQLite serialization
  (§4.5.1 step 3).
- IPC accept handles `aborted` rows per §4.5.1 (always 409).
- `claudemesh daemon outbox requeue` always mints a fresh
  `client_message_id`; never frees the old id. `--new-client-id <id>`
  and `--auto` are the only modes; the old `client_message_id`
  argument is removed.

---

## What changed v8 → v9 (codex round-8 actionable items)
|
||||||
|
|
||||||
|
| Codex r8 item | v9 fix | Section |
|
||||||
|
|---|---|---|
|
||||||
|
| Cross-layer ID-consumed authority contradiction | Two-layer model: daemon-consumed iff outbox row; broker-consumed iff dedupe row committed; daemon-mediated callers see only daemon-layer authority | §4.1, §4.6.1, §4.6.2 |
|
||||||
|
| Rate-limit authority muddled (B2 vs async counters) | Rate limit moved to B1 via external atomic limiter (Redis-style INCR with TTL); DB rate-limit counters demoted to telemetry-only | §4.6.2, §4.6.4, §4.7.1 |
|
||||||
|
| §4.1 broker guarantee fuzzy | Tightened: "dedupe row exists iff broker accept transaction committed (B3)" | §4.1, §4.6.2 |
|
||||||
|
|
||||||
|
(Earlier rounds' fixes preserved unchanged.)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review (round 9)
|
||||||
|
|
||||||
|
1. **Two-layer ID model (§4.1, §4.6.1)** — is the daemon-vs-broker
|
||||||
|
authority split clear, or does it create more confusion for
|
||||||
|
operators reading "consumed" in different contexts? Should we use
|
||||||
|
different verbs (e.g. "claimed" at daemon, "committed" at broker)?
|
||||||
|
2. **Rate-limit external limiter (§4.6.4)** — is "atomic external
|
||||||
|
limiter" specified concretely enough? Is the over-counting on
|
||||||
|
limiter-accepted-then-B2-rejected acceptable?
|
||||||
|
3. **B2 contents after rate-limit move** — B2 now only has
|
||||||
|
`destination_ref existence`. Worth keeping a B2 phase at all, or
|
||||||
|
collapse into B1+B3?
|
||||||
|
4. **Anything else still wrong?** Read it as if you were going to
|
||||||
|
operate this for a year.
|
||||||
|
|
||||||
|
Three options:
|
||||||
|
- **(a) v9 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v10 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless.
|
||||||
853
.artifacts/shipped/2026-05-03-daemon-final-spec-v2.md
Normal file
@@ -0,0 +1,853 @@
|
|||||||
|
# `claudemesh daemon` — Final Spec v2
|
||||||
|
|
||||||
|
> **Round 2 after a critical first-pass review.** v1 of this spec was reviewed
|
||||||
|
> by another model and pushed back on identity model, no-auth IPC, "exactly-once"
|
||||||
|
> overclaim, hook credentials, surface bloat, and missing operational flows
|
||||||
|
> (rotation, image clones, schema migration, threat model). v2 incorporates all
|
||||||
|
> of those.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 0. Intent — what this is, what it isn't
|
||||||
|
|
||||||
|
### 0.1 The product reality
|
||||||
|
|
||||||
|
claudemesh today is a **peer mesh runtime for Claude Code sessions**. Each
|
||||||
|
session runs `claudemesh launch`, opens a WebSocket to a managed broker, gets
|
||||||
|
ephemeral identity, sends/receives DMs and topic messages with other Claude Code
|
||||||
|
sessions, posts to shared state, deploys MCP servers / skills / files,
|
||||||
|
participates in tasks, schedules reminders. Everything is E2E encrypted with
|
||||||
|
crypto_box envelopes for DMs and per-topic symmetric keys for topics. The broker
|
||||||
|
is a routing/persistence layer; peers do the actual work.
|
||||||
|
|
||||||
|
The CLI is the canonical surface — every operation is a `claudemesh <verb>`.
|
||||||
|
The MCP server is a "tool-less push pipe" that surfaces inbound messages to
|
||||||
|
Claude Code as channel notifications. There is also a web dashboard, an `/v1/*`
|
||||||
|
REST API, and an existing apikey auth model for external integrations.
|
||||||
|
|
||||||
|
### 0.2 The gap
|
||||||
|
|
||||||
|
Anything that **isn't a Claude Code session** is a second-class citizen:
|
||||||
|
|
||||||
|
- A RunPod handler that wants to alert a peer when an OOM happens has only
|
||||||
|
one option: curl an apikey-authed REST endpoint. One-way only. The handler
|
||||||
|
is not a peer — it can't be DM'd back, can't be `@-mentioned`, can't be in
|
||||||
|
`peer list`, can't claim a task assigned to it, can't host an MCP service or
|
||||||
|
share a skill. It's a webhook spoke, not a participant.
|
||||||
|
|
||||||
|
- A Temporal worker that wants to track its own progress in shared mesh state,
|
||||||
|
publish to a `#alerts` topic, and listen for "retry now" instructions has
|
||||||
|
no good shape. Either it shells out to `claudemesh send` cold-path
|
||||||
|
(a fresh WS handshake per message — ~1s latency, broker churn, no inbound
|
||||||
|
path) or it speaks the WS protocol manually (significant code, no SDK).
|
||||||
|
|
||||||
|
- A long-running CI runner, an IoT box, a phone app, a future Python or Go
|
||||||
|
service — none can be **first-class peers** without writing the same WS
|
||||||
|
reconnect / queue / encryption / presence code that the existing CLI already
|
||||||
|
has, plus an IPC surface so the host's apps can use it without re-implementing
|
||||||
|
any of that.
|
||||||
|
|
||||||
|
### 0.3 What this daemon is
|
||||||
|
|
||||||
|
A long-running process — the same `claudemesh-cli` binary in `daemon` mode —
|
||||||
|
that turns any host into a **first-class peer**:
|
||||||
|
|
||||||
|
- Stable identity across restarts (the host *is* a member of the mesh, not a
|
||||||
|
series of disconnected sessions).
|
||||||
|
- Persistent WS to the broker, with reconnect, queue, dedupe.
|
||||||
|
- Local IPC surface (UDS + loopback HTTP + SSE) that any local app can hit
|
||||||
|
to send, subscribe, query — without learning the broker protocol or carrying
|
||||||
|
long-lived secrets in app code.
|
||||||
|
- Hooks: shell scripts that fire on events. Server replies to DMs, auto-claims
|
||||||
|
tasks, escalates errors — without the app being involved.
|
||||||
|
- Same security primitives as `claudemesh launch` (mesh keypair, crypto_box,
|
||||||
|
per-topic keys). No new auth model toward the broker.
|
||||||
|
|
||||||
|
The daemon **is the runtime**. The CLI in cold-path mode is a fallback. The
|
||||||
|
Claude Code MCP integration is one client of the daemon (eventually).
|
||||||
|
|
||||||
|
### 0.4 What this daemon is NOT
|
||||||
|
|
||||||
|
- **Not a webhook gateway.** `/v1/notify` and apikeys remain the path for
|
||||||
|
systems that can't host the runtime (third-party SaaS, monitoring tools).
|
||||||
|
The daemon is for systems that *can* run a process — code you control.
|
||||||
|
|
||||||
|
- **Not a generic message broker.** It speaks claudemesh protocol to one
|
||||||
|
managed broker. It is not a substitute for NATS, Redis, Kafka, RabbitMQ.
|
||||||
|
|
||||||
|
- **Not a Slack replacement.** Topics, DMs, mentions exist because *AI
|
||||||
|
sessions* use them. Humans interact via the dashboard or a Claude Code
|
||||||
|
session, not by reading the daemon's inbox directly.
|
||||||
|
|
||||||
|
- **Not a fleet manager.** One daemon manages one mesh on one host. Multi-mesh
|
||||||
|
on one host is supported (one daemon per mesh, supervised). Cross-host
|
||||||
|
supervision is an external concern (systemd, k8s, etc.) — the daemon doesn't
|
||||||
|
reach across hosts.
|
||||||
|
|
||||||
|
### 0.5 Who deploys this
|
||||||
|
|
||||||
|
- A developer running `claudemesh daemon up` on their laptop so their open
|
||||||
|
Claude Code sessions all share one persistent connection (instead of each
|
||||||
|
opening its own ephemeral WS).
|
||||||
|
- The same developer running `claudemesh daemon install-service` on their VPS,
|
||||||
|
RunPod pod, Temporal worker, CI runner — turning each into an
|
||||||
|
addressable peer that scripts on that host can talk to via local IPC.
|
||||||
|
- Eventually: language SDKs (Python / Go / TypeScript) talking to the daemon
|
||||||
|
on `localhost`, exposing claudemesh as a first-class API for any app the
|
||||||
|
developer writes.
|
||||||
|
|
||||||
|
### 0.6 Pre-launch posture
|
||||||
|
|
||||||
|
No users yet. We can break protocol, schema, surface, anything. Optimize for
|
||||||
|
the architecture we want to live with for years, not for the smallest
|
||||||
|
shippable cut. Codex pushed back on v1 on this exact axis: do not ship
|
||||||
|
graph/vector/MCP/skills/tasks on day one — freeze a small, hardened core,
|
||||||
|
expand deliberately.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Process model
|
||||||
|
|
||||||
|
**One daemon per (user, mesh)**. Persistent. Survives reboots via OS
|
||||||
|
supervisor. Serves multiple local apps concurrently.
|
||||||
|
|
||||||
|
```
|
||||||
|
~/.claudemesh/daemon/<mesh-slug>/
|
||||||
|
pid 0600 pidfile, cleaned on shutdown
|
||||||
|
sock 0600 unix domain socket (primary IPC)
|
||||||
|
http.port 0644 auto-allocated loopback port
|
||||||
|
local_token 0600 per-daemon bearer for HTTP/TCP transports
|
||||||
|
keypair.json 0600 persistent ed25519 + x25519 — daemon identity
|
||||||
|
host_fingerprint.json 0600 digest of machine-id + boot-id + default-iface MAC + hostname
|
||||||
|
config.toml 0644 user-editable runtime tuning
|
||||||
|
outbox.db 0600 SQLite — durable outbound queue
|
||||||
|
inbox.db 0600 SQLite — N-day inbound history, FTS-indexed
|
||||||
|
schema_version 0644 integer; gates online migrations
|
||||||
|
daemon.log 0644 JSON-lines, rotating (100 MB / 14 d)
|
||||||
|
hooks/ 0700 user-managed event scripts
|
||||||
|
```
|
||||||
|
|
||||||
|
**Resource caps (defaults, configurable):**
|
||||||
|
|
||||||
|
| Resource | Default | Why |
|
||||||
|
|---|---|---|
|
||||||
|
| RSS | 256 MB | Most workloads stay under 50 MB; cap protects multi-mesh hosts |
|
||||||
|
| CPU | unlimited | Hook fan-out can spike briefly; rely on OS scheduler |
|
||||||
|
| Outbox DB | 5 GB | At 1 KB avg msg, that's ~5M queued. Disk-pressure thresholds in §4.4 |
|
||||||
|
| Inbox DB | 5 GB | Same |
|
||||||
|
| File descriptors | 1024 | UDS clients + SSE streams + DB handles + WS |
|
||||||
|
| SSE concurrent | 32 streams | DoS protection; configurable up |
|
||||||
|
| IPC concurrent | 64 in-flight | Backpressure beyond this returns `429 daemon_busy` |
|
||||||
|
| Hook concurrency | 8 | Bounded pool; overflow queues |
|
||||||
|
|
||||||
|
Single binary. Same `claudemesh-cli` package; `daemon` is one of its modes.
|
||||||
|
|
||||||
|
## 2. Identity — persistent member by default, ephemeral on opt-in, clone-aware
|
||||||
|
|
||||||
|
### 2.1 Modes
|
||||||
|
|
||||||
|
```
|
||||||
|
claudemesh daemon up # default: persistent member
|
||||||
|
claudemesh daemon up --ephemeral # session-shaped, no keypair persisted
|
||||||
|
claudemesh daemon up --ephemeral --ttl=2h # auto-shutdown after TTL
|
||||||
|
```
|
||||||
|
|
||||||
|
- **Persistent (default)**: ed25519 + x25519 keypair stored in `keypair.json`.
|
||||||
|
Same identity across restarts, reconnects, supervisor cycles. Right for
|
||||||
|
servers, workers, addressable peers.
|
||||||
|
- **Ephemeral**: keypair generated in memory, never written. Daemon exits =
|
||||||
|
identity gone. Right for CI jobs, preview environments, disposable RunPod
|
||||||
|
pods, test harnesses, build agents, anything that should not leave a peer
|
||||||
|
ghost in the broker after teardown.
|
||||||
|
- **`--ttl <duration>`** on ephemeral mode: auto-shutdown after the duration,
|
||||||
|
or after `claudemesh daemon down`, whichever first. Broker member record
|
||||||
|
cleaned up on shutdown.
|
||||||
|
|
||||||
|
### 2.2 Image-clone detection
|
||||||
|
|
||||||
|
Two daemons booting with the same `keypair.json` (VM image clone, container
|
||||||
|
copy, restored backup) is a serious failure mode — broker sees connection
|
||||||
|
collisions, presence flickers, encrypted messages route to the wrong host.
|
||||||
|
|
||||||
|
Handled in three places:
|
||||||
|
|
||||||
|
1. **Daemon side**: `host_fingerprint.json` is written on first startup —
|
||||||
|
`sha256(machine-id || boot-id || mac-of-default-iface || hostname)`. On every
|
||||||
|
subsequent startup, the fingerprint is recomputed and compared. If it
|
||||||
|
differs, the daemon **refuses to start** unless `--accept-cloned-identity`
|
||||||
|
is passed (writes a fresh fingerprint and continues with the same keypair —
|
||||||
|
for legitimate hardware migrations) or `--remint` is passed (mints fresh
|
||||||
|
keypair, registers as a new member, broker reaps the old member after
|
||||||
|
grace period).
|
||||||
|
2. **Broker side**: tracks `lastSeenHostFingerprint` per member. On
|
||||||
|
reconnection from a different fingerprint, broker emits a
|
||||||
|
`member_clone_suspected` security event to the mesh owner's dashboard.
|
||||||
|
Connection itself is allowed (legitimate hardware swaps happen) but visible
|
||||||
|
for audit.
|
||||||
|
3. **Mesh owner**: `claudemesh member revoke <pubkey>` revokes the keypair
|
||||||
|
server-side; daemon receives `keypair_revoked` push event on next
|
||||||
|
connection and self-disables.
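
The daemon-side check in step 1 can be sketched as follows. This is a minimal illustration in Python, not the daemon's implementation: the NUL field separator, the function names, and the string action labels are assumptions; only the four hashed inputs and the two flags come from the text above.

```python
import hashlib
from typing import Optional


def host_fingerprint(machine_id: str, boot_id: str, mac: str, hostname: str) -> str:
    """Hex sha256 over the four host attributes named in step 1."""
    # Assumption: fields are joined with a NUL byte so that ("ab", "c")
    # and ("a", "bc") cannot hash to the same value.
    material = "\x00".join([machine_id, boot_id, mac, hostname]).encode("utf-8")
    return hashlib.sha256(material).hexdigest()


def startup_action(stored: Optional[str], current: str,
                   accept_cloned: bool = False, remint: bool = False) -> str:
    """Map the stored vs. recomputed fingerprint to a startup decision."""
    if stored is None:
        return "write_fingerprint"    # first startup: nothing to compare yet
    if stored == current:
        return "start"
    if remint:
        return "remint_keypair"       # --remint: fresh keypair, new member
    if accept_cloned:
        return "rewrite_fingerprint"  # --accept-cloned-identity: keep keypair
    return "refuse"
```

Giving `--remint` precedence when both flags are passed is a choice made for this sketch; the spec does not say which flag wins.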
|
||||||
|
|
||||||
|
### 2.3 Rename
|
||||||
|
|
||||||
|
`--name` is taken at first `daemon up`; subsequent runs read the keypair file
|
||||||
|
and ignore `--name` unless `--rename` is passed (which produces a
|
||||||
|
`member_renamed` event the broker propagates to peers).
|
||||||
|
|
||||||
|
## 3. IPC surface — stable core only in v0.9.0
|
||||||
|
|
||||||
|
### 3.1 Frozen core surface (v0.9.0)
|
||||||
|
|
||||||
|
Codex's feedback: do not ship every CLI verb on day one. A small hardened core
|
||||||
|
first, expand under explicit capability gates.
|
||||||
|
|
||||||
|
```
|
||||||
|
# Messaging — durable, tested
|
||||||
|
POST /v1/send {to, message, priority?, meta?, replyToId?}
|
||||||
|
POST /v1/topic/post {topic, message, priority?, mentions?}
|
||||||
|
POST /v1/topic/subscribe {topic} (idempotent)
|
||||||
|
POST /v1/topic/unsubscribe {topic}
|
||||||
|
GET /v1/topic/list
|
||||||
|
GET /v1/inbox ?since=<iso>&topic=<n>&from=<peer>&limit=<n>
|
||||||
|
GET /v1/inbox/search ?q=<fts-query>&limit=<n> (FTS5)
|
||||||
|
|
||||||
|
# Peers + presence — read-only on day one
|
||||||
|
GET /v1/peers ?mesh=<slug>
|
||||||
|
POST /v1/profile {summary?, status?, visible?} (limited fields)
|
||||||
|
|
||||||
|
# Files — already production in CLI
|
||||||
|
POST /v1/file/share {path, to?, message?, persistent?}
|
||||||
|
GET /v1/file/get ?id=<fileId>&out=<path>
|
||||||
|
GET /v1/file/list
|
||||||
|
|
||||||
|
# Events — push
|
||||||
|
GET /v1/events text/event-stream
|
||||||
|
core events: message, peer_join, peer_leave, file_shared,
|
||||||
|
daemon_disconnect, daemon_reconnect, hook_executed
|
||||||
|
|
||||||
|
# Control plane
|
||||||
|
GET /v1/health {connected, lag_ms, queue_depth, inflight,
|
||||||
|
mesh, member_pubkey, uptime_s, schema_version,
|
||||||
|
daemon_version, broker_version}
|
||||||
|
GET /v1/metrics Prometheus exposition
|
||||||
|
GET /v1/version {daemon, schema, ipc_api} (negotiation)
|
||||||
|
POST /v1/heartbeat {} (caller-side liveness signal)
|
||||||
|
```
|
||||||
|
|
||||||
|
That's it: 17 endpoints. Battle-test these before adding more.
|
||||||
|
|
||||||
|
### 3.2 Capability-gated future surface (v0.9.x roadmap)
|
||||||
|
|
||||||
|
Behind explicit feature flags in `config.toml`, post-v0.9.0:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[capabilities]
|
||||||
|
state = false # /v1/state/{set,get,list}
|
||||||
|
memory = false # /v1/memory/{remember,recall}
|
||||||
|
vector = false # /v1/vector/{store,search,delete}
|
||||||
|
graph = false # /v1/graph/query
|
||||||
|
tasks = false # /v1/task/{create,claim,complete}
|
||||||
|
scheduling = false # /v1/scheduling/remind
|
||||||
|
mcp_host = false # /v1/mcp/{register,call} (LARGEST surface; treat as v1.0)
|
||||||
|
skill_share = false # /v1/skill/{deploy,share}
|
||||||
|
```
|
||||||
|
|
||||||
|
Each capability is its own ship: design review, security review, test
|
||||||
|
coverage, capability-token model, then enable. None enabled in v0.9.0.
|
||||||
|
|
||||||
|
### 3.3 Local IPC authentication
|
||||||
|
|
||||||
|
Codex was right: loopback TCP without auth is an attack surface (browser SSRF,
|
||||||
|
container side-channels, sandboxed apps with network but no FS access, WSL
|
||||||
|
host-shared loopback).
|
||||||
|
|
||||||
|
| Transport | Auth | Rationale |
|
||||||
|
|---|---|---|
|
||||||
|
| UDS | None (relies on FS perms 0600) | Reaching the socket = same UID = can read keypair anyway |
|
||||||
|
| TCP loopback | **Required**: `Authorization: Bearer <local_token>` | Browser/container/sandbox can reach loopback without FS access |
|
||||||
|
| SSE | Required: `Authorization: Bearer <local_token>` | Same |
|
||||||
|
|
||||||
|
`local_token` is 32 bytes from `crypto.randomBytes` (256 bits), base64url-encoded,
|
||||||
|
written to `local_token` mode 0600 at daemon init. Rotated on `claudemesh
|
||||||
|
daemon rotate-token`. SDKs auto-discover the token by reading the file (same
|
||||||
|
mechanism as discovering the socket path).
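
A rough Python equivalent of the token lifecycle, for illustration only. `secrets.token_urlsafe` stands in for `crypto.randomBytes` plus base64url encoding, and the helper names `write_local_token` and `is_authorized` are hypothetical. The constant-time comparison is an assumption about how the bearer check would reasonably be done; the spec does not state it.

```python
import hmac
import os
import secrets
from typing import Optional


def write_local_token(path: str) -> str:
    """Generate a 32-byte token (base64url, unpadded) and store it mode 0600."""
    token = secrets.token_urlsafe(32)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    os.fchmod(fd, 0o600)  # enforce the mode even if the file already existed
    with os.fdopen(fd, "w") as f:
        f.write(token)
    return token


def is_authorized(authorization: Optional[str], expected_token: str) -> bool:
    """Check an `Authorization: Bearer <local_token>` header value."""
    if not authorization or not authorization.startswith("Bearer "):
        return False
    presented = authorization[len("Bearer "):]
    # compare_digest avoids leaking the match position through timing.
    return hmac.compare_digest(presented.encode(), expected_token.encode())
```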
|
||||||
|
|
||||||
|
**Additional defenses:**
|
||||||
|
- HTTP listener binds **127.0.0.1 only**. Refuses to bind elsewhere unless
|
||||||
|
`[ipc] http_bind = "..."` is set explicitly **and** `[ipc] http_external_auth = "..."`
|
||||||
|
points to a separate token file (escape hatch for advanced users; never the default).
|
||||||
|
- `Origin` header check: rejects requests with `Origin` set unless it's
|
||||||
|
explicitly allowlisted in config (default: empty allowlist). Defends against
|
||||||
|
browser SSRF.
|
||||||
|
- `Host` header check: must be `localhost` or `127.0.0.1`. Defends against DNS
|
||||||
|
rebinding.
|
||||||
|
- CORS: `Access-Control-Allow-Origin` never echoed; preflight returns `403`.
|
||||||
|
- `User-Agent` required (rejects empty UA — mild signal against simple SSRF).
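
The header checks above could be combined into one pre-auth gate along these lines. This is a sketch under assumptions: the function name, the reason strings, and the specific status codes for each rejection are illustrative and not taken from the spec.

```python
from typing import Dict, FrozenSet, Tuple


def check_request_headers(headers: Dict[str, str],
                          origin_allowlist: FrozenSet[str] = frozenset()
                          ) -> Tuple[int, str]:
    """Return (status, reason); 200 means the request may proceed to auth."""
    h = {k.lower(): v for k, v in headers.items()}

    # Host must name the loopback interface (DNS-rebinding defense).
    host = h.get("host", "")
    hostname = host.rsplit(":", 1)[0] if ":" in host else host
    if hostname not in ("localhost", "127.0.0.1"):
        return 403, "bad_host"

    # Any Origin header is rejected unless explicitly allowlisted.
    origin = h.get("origin")
    if origin is not None and origin not in origin_allowlist:
        return 403, "origin_not_allowed"

    # An empty or missing User-Agent is rejected.
    if not h.get("user-agent", "").strip():
        return 400, "user_agent_required"
    return 200, "ok"
```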
|
||||||
|
|
||||||
|
### 3.4 Request limits + backpressure
|
||||||
|
|
||||||
|
- Max request body: **1 MB** (override per endpoint; file uploads use a separate
|
||||||
|
streaming endpoint).
|
||||||
|
- Max response body: **10 MB**; larger results are truncated and the next-page cursor is returned in a `Link` header with `rel="next"`.
|
||||||
|
- Max in-flight IPC requests: **64**. Beyond → `429 daemon_busy`.
|
||||||
|
- Max SSE concurrent streams: **32**. Beyond → `429 too_many_streams`.
|
||||||
|
- Per-token rate limit: **100 req/sec** sustained, 1000/sec burst (token
|
||||||
|
bucket). Tunable.
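
One way to read the per-token limit is a bucket that refills at 100 tokens per second and holds at most 1000. The sketch below assumes that reading of "burst" (bucket capacity), which the spec does not spell out; the class name and the explicit `now` parameter are illustrative.

```python
class TokenBucket:
    """Token bucket: `rate` tokens/sec refill, at most `burst` tokens stored."""

    def __init__(self, rate: float = 100.0, burst: float = 1000.0,
                 now: float = 0.0) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start full so an initial burst is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        """Consume one token if available; the caller passes the current time."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Passing `now` explicitly keeps the limiter deterministic under test; a real caller would pass a monotonic clock reading.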
|
||||||
|
|
||||||
|
## 4. Delivery contract — durable at-least-once with idempotent send
|
||||||
|
|
||||||
|
Codex was right: "exactly-once" is a lie. Replacing the claim with a precise
|
||||||
|
contract.
|
||||||
|
|
||||||
|
### 4.1 The contract
|
||||||
|
|
||||||
|
> **The daemon guarantees that each successful send call eventually results in
> exactly one message row at the broker, identified by a stable `messageId`. It
> does not guarantee that downstream peers process the message exactly once;
> that is the receiver's responsibility, aided by the propagated
> `idempotency_key`.**
|
||||||
|
|
||||||
|
Concretely:
|
||||||
|
|
||||||
|
- **Caller → daemon**: caller may supply `Idempotency-Key`; daemon dedupes
|
||||||
|
identical keys for 24h. Without one, the daemon mints a ULID and returns it as
|
||||||
|
`messageId`.
|
||||||
|
- **Daemon → broker**: each outbox row has at-most-one inflight transmit.
|
||||||
|
Daemon retries with exponential backoff until broker ACKs OR row hits TTL
|
||||||
|
(7d default → moves to `dead`).
|
||||||
|
- **Broker → peer**: existing claudemesh delivery semantics. Broker dedupes by
|
||||||
|
`messageId`. Peer receives ≥1 copy.
|
||||||
|
- **Peer hooks**: hooks see `idempotency_key` in the event JSON. Idempotent
|
||||||
|
hook implementations are the receiver's responsibility.
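
The caller-to-daemon leg of the contract can be illustrated with a small in-memory deduper. This is a sketch only: the real daemon would persist this in `outbox.db`, `uuid4` stands in for a ULID because the Python standard library has no ULID generator, and minting a fresh `messageId` even when the caller supplies a key is an assumption.

```python
import uuid
from typing import Dict, Optional, Tuple

DEDUPE_WINDOW_S = 24 * 3600  # identical keys are deduped for 24h


class SendDeduper:
    def __init__(self) -> None:
        # idempotency key -> (message_id, first_seen timestamp)
        self._seen: Dict[str, Tuple[str, float]] = {}

    def accept(self, idempotency_key: Optional[str],
               now: float) -> Tuple[str, bool]:
        """Return (message_id, is_duplicate) for one send call."""
        if idempotency_key is not None:
            hit = self._seen.get(idempotency_key)
            if hit is not None and now - hit[1] < DEDUPE_WINDOW_S:
                return hit[0], True  # same key inside the window: no new row
        message_id = uuid.uuid4().hex  # stand-in for a ULID
        if idempotency_key is not None:
            self._seen[idempotency_key] = (message_id, now)
        return message_id, False
```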
|
||||||
|
|
||||||
|
### 4.2 Outbox row state machine
|
||||||
|
|
||||||
|
```
|
||||||
|
┌────────────┐
|
||||||
|
send call → │ pending │
|
||||||
|
└─────┬──────┘
|
||||||
|
│ daemon picks up batch
|
||||||
|
▼
|
||||||
|
┌────────────┐
|
||||||
|
│ inflight │ ← attempts++, last_error written
|
||||||
|
└─┬────┬─────┘
|
||||||
|
│ │ broker NACK / network err
|
||||||
|
broker ACK │ └──────────► back to pending (with exp. backoff)
|
||||||
|
▼
|
||||||
|
┌────────────┐
|
||||||
|
│ done │ ← delivered_at set, broker_message_id stored
|
||||||
|
└────────────┘
|
||||||
|
|
||||||
|
age > max_age_hours:
|
||||||
|
┌────────────┐
|
||||||
|
│ dead │ ← surfaces in `daemon outbox --failed`
|
||||||
|
└────────────┘
|
||||||
|
```
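
The diagram can be restated as a transition table plus a backoff function. This is a sketch: the 1 s base and 300 s cap are assumed values (the spec only says "exponential backoff"), and allowing the move to `dead` from both `pending` and `inflight` is an interpretation of the unattached "age > max_age_hours" arrow.

```python
ALLOWED_TRANSITIONS = {
    ("pending", "inflight"),  # daemon picks up batch
    ("inflight", "done"),     # broker ACK
    ("inflight", "pending"),  # broker NACK / network error
    ("pending", "dead"),      # age > max_age_hours
    ("inflight", "dead"),     # age > max_age_hours
}


def transition(state: str, new_state: str) -> str:
    """Return new_state if the move is legal, else raise ValueError."""
    if (state, new_state) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"illegal outbox transition {state} -> {new_state}")
    return new_state


def backoff_s(attempts: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Delay before the next attempt: base * 2^(attempts-1), capped."""
    return min(cap, base * 2 ** max(0, attempts - 1))
```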
|
||||||
|
|
||||||
|
### 4.3 Crash recovery
|
||||||
|
|
||||||
|
On daemon startup:
|
||||||
|
|
||||||
|
1. Any rows in `inflight` are reset to `pending` with `attempts++` and
|
||||||
|
`next_attempt_at = now + min_backoff`. Note: this MAY cause double-delivery
|
||||||
|
of a message that the broker had already ACK'd but whose ACK was not
persisted locally before the crash. The `idempotency_key` propagates to the broker
|
||||||
|
(via message `meta`) so the broker dedupes by key.
|
||||||
|
2. `outbox.db` integrity check (`PRAGMA integrity_check`); if fails, daemon
|
||||||
|
refuses to start, points user at `claudemesh daemon recover`.
|
||||||
|
3. `inbox.db` integrity check; on failure, drops to `inbox.db.corrupt-<ts>`,
|
||||||
|
creates fresh empty inbox, logs `inbox_corruption_recovered` (does not
|
||||||
|
block startup — inbox is a cache).
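
Steps 1 and 2 for the outbox might look like the following with Python's `sqlite3` module. The table layout (`outbox` with a `state` column) is a hypothetical schema for this sketch; `attempts` and `next_attempt_at` are the names used in the text.

```python
import sqlite3


def recover_outbox(db: sqlite3.Connection, now: float,
                   min_backoff_s: float = 1.0) -> int:
    """Verify the DB, then reset interrupted rows. Returns rows reset."""
    result = db.execute("PRAGMA integrity_check").fetchone()[0]
    if result != "ok":
        raise RuntimeError(
            "outbox.db failed integrity check; run `claudemesh daemon recover`")
    with db:  # one transaction: commit on success, roll back on error
        cur = db.execute(
            "UPDATE outbox SET state = 'pending', attempts = attempts + 1, "
            "next_attempt_at = ? WHERE state = 'inflight'",
            (now + min_backoff_s,),
        )
    return cur.rowcount
```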
|
||||||
|
|
||||||
|
### 4.4 Disk-full
|
||||||
|
|
||||||
|
- At 80% of `outbox.max_queue_size` or 80% of `[disk] reserved_bytes`: daemon
|
||||||
|
emits `outbox_pressure_high` event + Prometheus gauge. Sends still accept.
|
||||||
|
- At 95%: new sends return `507 insufficient_storage`. Existing inflight
|
||||||
|
drains.
|
||||||
|
- At 100%: daemon enters degraded mode — refuses sends, refuses new SSE
|
||||||
|
streams, holds open WS for inbound only. `daemon status` shows degraded.
|
||||||
|
- Recovery: drain via broker reconnect (drains `done` rows older than
|
||||||
|
retention window) or `claudemesh daemon outbox prune --confirm`.
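
The three thresholds reduce to a small classifier. The function name and the returned labels are illustrative; a caller would evaluate it against both `outbox.max_queue_size` and `[disk] reserved_bytes` and act on the more severe result.

```python
def outbox_pressure(used: int, limit: int) -> str:
    """Classify usage against one limit using the 80 / 95 / 100 % thresholds."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Integer arithmetic avoids float rounding at the exact boundaries.
    if used >= limit:
        return "degraded"          # refuse sends and new SSE streams
    if used * 100 >= 95 * limit:
        return "reject_new_sends"  # 507 insufficient_storage
    if used * 100 >= 80 * limit:
        return "pressure_high"     # emit outbox_pressure_high, still accept
    return "ok"
```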
|
||||||
|
|
||||||
|
### 4.5 Schema migration
|
||||||
|
|
||||||
|
`schema_version` file holds an integer. On startup:
|
||||||
|
1. If `schema_version` matches binary's expected version → continue.
|
||||||
|
2. If version is older → run `apps/cli/src/daemon/migrations/<from>-<to>.sql`
|
||||||
|
in a transaction, write new version on success.
|
||||||
|
3. If version is newer (downgrade) → daemon refuses to start, error points at
|
||||||
|
re-installing matching version.
|
||||||
|
|
||||||
|
Migrations are forward-only. Each migration is ≤ 1 transaction. Test coverage
|
||||||
|
required: every migration has a snapshot test from prior schema.
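
The startup decision can be sketched as a planner that returns the migration files to apply in order. It assumes one file per single-version step, named `<from>-<to>.sql` as in the path above; the function name is hypothetical.

```python
from typing import List


def plan_migrations(on_disk: int, expected: int) -> List[str]:
    """List migration files to run, oldest first; empty if up to date."""
    if on_disk > expected:
        # Downgrade: forward-only migrations cannot undo a newer schema.
        raise RuntimeError(
            f"schema_version {on_disk} is newer than this binary ({expected}); "
            "install a matching daemon version")
    return [f"{v}-{v + 1}.sql" for v in range(on_disk, expected)]
```

Each returned file would then run in its own transaction, with `schema_version` rewritten only after a successful commit.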
|
||||||
|
|
||||||
|
## 5. Inbound — durable history with FTS
|
||||||
|
|
||||||
|
Every inbound message is written to `inbox.db` before any hook fires:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
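-- Base table: FTS5 virtual tables cannot carry UNIQUE constraints or
-- secondary indexes, so dedupe and time-range scans live on a regular table.
CREATE TABLE inbox (
  id INTEGER PRIMARY KEY,
  message_id TEXT, mesh TEXT, topic TEXT, sender_pubkey TEXT,
  sender_name TEXT, body TEXT, meta TEXT, idempotency_key TEXT UNIQUE,
  received_at TEXT, replied_to_id TEXT
);
CREATE INDEX inbox_received_at ON inbox(received_at);
-- External-content FTS index over the searchable columns; it must be kept
-- in sync with `inbox` by triggers or explicit inserts.
CREATE VIRTUAL TABLE inbox_fts USING fts5(
  topic, sender_name, body, meta, content='inbox', content_rowid='id'
);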
|
||||||
|
```
|
||||||
|
|
||||||
|
- **Receiver-side dedupe**: on insert, `INSERT OR IGNORE` on `idempotency_key`.
|
||||||
|
Duplicate broker delivery becomes a no-op locally + `cm_daemon_dedupe_total`
|
||||||
|
counter increments.
|
||||||
|
- 30-day rolling retention (configurable). `VACUUM` weekly during low-traffic
|
||||||
|
window.
|
||||||
|
- `claudemesh daemon search "OOM"` queries the FTS index.
|
||||||
|
- Apps connecting mid-stream replay history via `?since=<iso>`.
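
Receiver-side dedupe relies on a uniqueness constraint on `idempotency_key`. The sketch below uses a simplified plain table (fewer columns, no FTS index) to show the mechanism; the schema constant and function name are assumptions for illustration.

```python
import sqlite3

CREATE_INBOX = """
CREATE TABLE inbox (
  message_id TEXT, body TEXT,
  idempotency_key TEXT UNIQUE,   -- the constraint INSERT OR IGNORE keys on
  received_at TEXT
)
"""


def store_inbound(db: sqlite3.Connection, msg: dict) -> bool:
    """Insert one message; return False if it was a duplicate delivery."""
    cur = db.execute(
        "INSERT OR IGNORE INTO inbox "
        "(message_id, body, idempotency_key, received_at) VALUES (?, ?, ?, ?)",
        (msg["message_id"], msg["body"], msg["idempotency_key"],
         msg["received_at"]),
    )
    db.commit()
    # rowcount is 0 when the row was ignored; a caller would bump the
    # dedupe counter and skip hook dispatch in that case.
    return cur.rowcount == 1
```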
|
||||||
|
|
||||||
|
## 6. Hooks — first-class but tightly bounded
|
||||||
|
|
||||||
|
Codex was right: hooks were underspecified, and putting `CLAUDEMESH_TOKEN` in
|
||||||
|
every hook env was a serious exfil footgun.
|
||||||
|
|
||||||
|
### 6.1 Hook directory & contract
|
||||||
|
|
||||||
|
```
|
||||||
|
hooks/
|
||||||
|
on-message.sh every inbound message (DM + topic)
|
||||||
|
on-dm.sh DMs only
|
||||||
|
on-mention.sh when @<my-name> appears anywhere
|
||||||
|
on-topic-<name>.sh a specific topic
|
||||||
|
on-file-share.sh file shared with me
|
||||||
|
on-disconnect.sh WS dropped
|
||||||
|
on-reconnect.sh reconnected
|
||||||
|
on-startup.sh daemon up
|
||||||
|
pre-send.sh filter / mutate outbound (last gate)
|
||||||
|
hooks.toml per-hook policy (auth, redaction, env, timeout)
|
||||||
|
```
|
||||||
|
|
||||||
|
`hooks.toml` (mandatory; daemon refuses to invoke hooks without it):
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[on-mention]
|
||||||
|
enabled = true
|
||||||
|
timeout_s = 30
|
||||||
|
output_size_limit = 65536
|
||||||
|
redact_payload = ["body.password", "meta.api_key"] # JSONPath
|
||||||
|
allow_reply = true # if false, stdout reply ignored
|
||||||
|
capability_token_scope = ["topic:alerts:post"] # scoped, NOT broker session token
|
||||||
|
network_policy = "deny" # 'deny' | 'allow' | 'allowlist'
|
||||||
|
network_allowlist = [] # only if policy = 'allowlist'
|
||||||
|
fs_policy = "readonly" # 'readonly' | 'rw' | 'sandbox'
|
||||||
|
killpg_on_timeout = true # SIGTERM process group, not just child
|
||||||
|
audit = true # log every invocation
|
||||||
|
```
|
||||||
|
|
||||||
|
### 6.2 Credentials passed to hooks
|
||||||
|
|
||||||
|
**Default: nothing.** No `CLAUDEMESH_TOKEN`, no broker session, nothing that
|
||||||
|
lets the hook impersonate the daemon's identity broadly.
|
||||||
|
|
||||||
|
**Opt-in per hook**: `capability_token_scope = ["topic:alerts:post"]` mints a
|
||||||
|
**short-lived (5 min) capability token** scoped to exactly that capability.
|
||||||
|
The hook can use it to call back into the daemon's IPC ("post a reply to
|
||||||
|
#alerts") but cannot use it to read state, read inbox, deploy MCP, etc. Token
|
||||||
|
expires when hook process exits OR after 5 min, whichever first.
|
||||||
|
|
||||||
|
Capability tokens are local-only — they authorize against the daemon's IPC
|
||||||
|
surface, never the broker directly. Daemon translates capability calls into
|
||||||
|
broker calls.
|
||||||
|
|
||||||
|
Env variables the hook DOES get:
|
||||||
|
- `CLAUDEMESH_MESH=<slug>`
|
||||||
|
- `CLAUDEMESH_HOOK_NAME=on-mention`
|
||||||
|
- `CLAUDEMESH_EVENT_ID=<ulid>`
|
||||||
|
- `CLAUDEMESH_CAPABILITY_TOKEN=<token>` (only if scope was configured; else absent)
|
||||||
|
- `CLAUDEMESH_DAEMON_SOCK=<path>` (so SDKs can connect for capability calls)
|
||||||
|
- `PATH=/usr/bin:/bin` (locked down)
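
Building the hook environment from scratch, rather than inheriting the daemon's, is what keeps stray credentials out. A minimal sketch, with a hypothetical helper name:

```python
from typing import Dict, Optional


def build_hook_env(mesh: str, hook_name: str, event_id: str, sock_path: str,
                   capability_token: Optional[str] = None) -> Dict[str, str]:
    """Return the complete environment for a hook process (nothing inherited)."""
    env = {
        "CLAUDEMESH_MESH": mesh,
        "CLAUDEMESH_HOOK_NAME": hook_name,
        "CLAUDEMESH_EVENT_ID": event_id,
        "CLAUDEMESH_DAEMON_SOCK": sock_path,
        "PATH": "/usr/bin:/bin",  # locked down
    }
    # The token variable is absent, not empty, when no scope is configured.
    if capability_token is not None:
        env["CLAUDEMESH_CAPABILITY_TOKEN"] = capability_token
    return env
```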
|
||||||
|
|
||||||
|
### 6.3 Payload redaction
|
||||||
|
|
||||||
|
Hook stdin receives event JSON minus paths listed in `redact_payload`. Default
|
||||||
|
redaction: nothing. Mesh owner / daemon admin opts in.
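
Redaction can be sketched for the simple dotted paths used in the `hooks.toml` example. Full JSONPath (wildcards, array indexes) is out of scope for this illustration, and the function name is hypothetical.

```python
import copy
from typing import List


def redact(event: dict, paths: List[str]) -> dict:
    """Return a copy of `event` with each dotted path removed if present."""
    out = copy.deepcopy(event)  # never mutate the stored event
    for path in paths:
        *parents, leaf = path.split(".")
        node = out
        for key in parents:
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:
                break  # path does not exist in this event
        else:
            if isinstance(node, dict):
                node.pop(leaf, None)
    return out
```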
|
||||||
|
|
||||||
|
### 6.4 Timeout & cleanup
|
||||||
|
|
||||||
|
- Per-hook `timeout_s` (default 30s). On timeout, daemon sends SIGTERM to the
|
||||||
|
hook's process group (`killpg_on_timeout=true`), waits 5s, then SIGKILL.
|
||||||
|
Catches forked grandchildren that were trying to keep things alive.
|
||||||
|
- Hook stdout/stderr captured, truncated at `output_size_limit`. Larger
|
||||||
|
outputs log a warning and discard the overflow.
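
On POSIX systems the timeout behavior maps onto process groups. The sketch below starts the hook in its own session so that `killpg` reaches forked grandchildren; the function name and return shape are assumptions, and the overflow warning is omitted for brevity.

```python
import os
import signal
import subprocess
from typing import List, Tuple


def run_hook(argv: List[str], stdin_bytes: bytes, timeout_s: float = 30.0,
             grace_s: float = 5.0, output_size_limit: int = 65536
             ) -> Tuple[int, bytes, bytes, bool]:
    """Run a hook; return (exit_code, stdout, stderr, timed_out)."""
    proc = subprocess.Popen(
        argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        start_new_session=True,  # child leads a new process group
    )
    timed_out = False
    try:
        out, err = proc.communicate(stdin_bytes, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        timed_out = True
        os.killpg(proc.pid, signal.SIGTERM)       # whole group, not just child
        try:
            out, err = proc.communicate(timeout=grace_s)
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)   # grace period elapsed
            out, err = proc.communicate()
    # Truncate captured output at the configured limit.
    return (proc.returncode, out[:output_size_limit],
            err[:output_size_limit], timed_out)
```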
|
||||||
|
|
||||||
|
### 6.5 Audit log
|
||||||
|
|
||||||
|
Every hook invocation logs:
|
||||||
|
```json
|
||||||
|
{"hook":"on-mention","event_id":"01H8…","exit":0,"duration_ms":47,
|
||||||
|
"stdout_bytes":120,"stderr_bytes":0,"replied":true,"capability_calls":1,
|
||||||
|
"ts":"2026-05-03T14:00:00Z"}
|
||||||
|
```
|
||||||
|
|
||||||
|
Stored in `daemon.log`; metrics exposed via `cm_daemon_hook_*`.
|
||||||
|
|
||||||
|
### 6.6 Sandboxing — supported, not required
|
||||||
|
|
||||||
|
The contract supports sandboxing without mandating it (mandating breaks too
|
||||||
|
many real workflows):
|
||||||
|
|
||||||
|
- Linux: opt-in `sandbox = "bubblewrap"` in `hooks.toml` runs the hook under
|
||||||
|
`bwrap` with no network (unless `network_policy != "deny"`), readonly FS
|
||||||
|
except `/tmp/<hook-id>`, no DBus, no /proc.
|
||||||
|
- macOS: opt-in `sandbox = "sandbox-exec"` with similar profile.
|
||||||
|
- Default: no sandbox; rely on Unix permissions + `network_policy=deny` (which
|
||||||
|
is enforced via `unshare --net` on Linux when available, otherwise
|
||||||
|
best-effort firewall rule).
|
||||||
|
|
||||||
|
## 7. Multi-mesh — daemon-per-mesh, supervised by a thin shell
|
||||||
|
|
||||||
|
### 7.1 The decision
|
||||||
|
|
||||||
|
One daemon per mesh, coordinated by a supervisor script. Codex pushed back —
|
||||||
|
"why not one daemon serving all meshes?". Going daemon-per-mesh because:
|
||||||
|
|
||||||
|
- **Crash isolation**: a panic in `prod` mesh's WS reader can't corrupt
|
||||||
|
`dev` mesh's outbox.
|
||||||
|
- **Resource accounting**: per-mesh RSS, per-mesh metrics, per-mesh disk
|
||||||
|
budget — easy to attribute, easy to cap.
|
||||||
|
- **Independent identity**: each mesh has its own keypair, host fingerprint,
|
||||||
|
capability gates. Conflating into one process forces shared trust.
|
||||||
|
- **Independent upgrades**: rolling daemon restarts per mesh, no downtime
|
||||||
|
across all meshes.
|
||||||
|
- **Simpler code**: zero cross-mesh routing logic in the daemon body.
|
||||||
|
|
||||||
|
The cost (process count, log fan-out) is real but bounded: typical user has
|
||||||
|
1–3 meshes. Heavy users (10–20) get a `claudemesh daemon ps` + `--all` UX that
|
||||||
|
treats them as a fleet.
|
||||||
|
|
||||||
|
### 7.2 Resource caps for fleet hosts
|
||||||
|
|
||||||
|
`config.toml` has `[fleet]` section read by `daemon up --all`:
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[fleet]
|
||||||
|
max_daemons = 10
|
||||||
|
total_memory_budget = "2GB" # divided across daemons; each gets budget/N RSS cap
|
||||||
|
total_disk_budget = "20GB" # divided across outbox + inbox per daemon
|
||||||
|
```
|
||||||
|
|
||||||
|
If a user hits `max_daemons`, `daemon up <next>` errors with a clear message
|
||||||
|
pointing at the cap.
|
||||||
|
|
||||||
|
### 7.3 Commands

```
claudemesh daemon up --mesh <slug>           # one mesh
claudemesh daemon up --all                   # all joined meshes (respects fleet caps)
claudemesh daemon down --mesh <slug>
claudemesh daemon down --all
claudemesh daemon status                     # all daemons, table view
claudemesh daemon status --json              # machine-readable
claudemesh daemon ps                         # alias of status
claudemesh daemon logs --mesh <slug> [-f]
claudemesh daemon restart --mesh <slug>
```

## 8. Auto-routing: clarified, not transparent

Codex pushed back: "no behavior difference" was hand-waving. Persistent
identity, queueing, hooks, profile state: these legitimately change behavior.

### 8.1 What changes when a daemon is up

| Behavior | Cold-path CLI | Daemon-routed CLI |
|---|---|---|
| Sender attribution | Ephemeral session pubkey for that invocation | Daemon's persistent member pubkey |
| Latency | ~1s (fresh WS handshake) | <10ms (local UDS round-trip) |
| Send durability | None: if the broker is unreachable, the command fails | Outbox queue retries until TTL |
| Inbound visibility | Not available (cold path closes WS) | `claudemesh inbox` reads daemon's inbox.db |
| Hooks | Not invoked | Invoked on every event |
| Presence | Brief flicker as session connects+disconnects | Continuous; daemon's status reflected |
| `peer list` shows me as | A new ephemeral session each invocation | The daemon's persistent member |

### 8.2 Detection logic: connect, don't trust the pidfile

```
1. Check ~/.claudemesh/daemon/<slug>/sock exists.
2. Attempt UDS connect with 100ms timeout.
3. If connect succeeds: send GET /v1/version.
4. If response is well-formed AND mesh matches AND daemon_version is
   compatible → use this daemon.
5. Otherwise → cold path.
```

A PID liveness check is unreliable (PID reuse, orphaned processes). The socket
handshake is canonical.
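
A minimal Python sketch of the five steps above. It assumes the daemon speaks plain HTTP over the UDS and that the `/v1/version` reply carries a `mesh` field (the payload in §15.1 does not list one, so that field is hypothetical). `probe_daemon`, `should_use_daemon`, and `route` are illustrative names, and "compatible" is reduced here to an `ipc_api` match.

```python
import json
import socket
from pathlib import Path
from typing import Optional

SUPPORTED_IPC_API = "v1"  # assumption: what "compatible" means for this CLI build


def probe_daemon(sock_path: Path, timeout_s: float = 0.1) -> Optional[dict]:
    """Steps 1-3: connect to the UDS and fetch /v1/version; None on any failure."""
    if not sock_path.exists():
        return None
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout_s)  # 100ms budget for connect and for each read
            s.connect(str(sock_path))
            s.sendall(b"GET /v1/version HTTP/1.0\r\nHost: localhost\r\n\r\n")
            raw = b""
            while chunk := s.recv(4096):
                raw += chunk
        _headers, _, body = raw.partition(b"\r\n\r\n")
        return json.loads(body)
    except (OSError, ValueError):
        return None  # stale socket, timeout, or malformed reply


def should_use_daemon(info: Optional[dict], slug: str) -> bool:
    """Step 4: well-formed AND mesh matches AND version is compatible."""
    return (
        isinstance(info, dict)
        and info.get("mesh") == slug  # hypothetical field, see lead-in
        and info.get("ipc_api") == SUPPORTED_IPC_API
    )


def route(slug: str, home: Optional[Path] = None) -> str:
    """Step 5: anything short of a good handshake falls back to the cold path."""
    base = home if home is not None else Path.home()
    sock = base / ".claudemesh" / "daemon" / slug / "sock"
    return "daemon" if should_use_daemon(probe_daemon(sock), slug) else "cold"
```

Note that the pidfile never appears: a leftover socket file with no listener simply fails the connect and routes to the cold path.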
### 8.3 Coexistence with `claudemesh launch`

Both can be running for the same mesh:
- Daemon connected as persistent member `runpod-worker-3`.
- A separate `claudemesh launch` connects as an ephemeral session of the same
  member. Visible to peers as "another session of runpod-worker-3"
  (sibling-session relationship via `memberPubkey`).
- CLI verbs from inside `claudemesh launch` route through the launch session,
  NOT the daemon (preserves "this Claude Code session has its own ephemeral
  identity" semantics).
- CLI verbs from a separate shell route through the daemon (faster, durable).

This is consistent with the v0.5.1 self-DM guard and sibling-session
semantics already shipped.

## 9. Service installation

```bash
claudemesh daemon install-service          # writes systemd unit / launchd plist / Windows SC
claudemesh daemon uninstall-service
claudemesh daemon install-service --user   # user-scope unit (default; no root)
claudemesh daemon install-service --system # system-scope unit (root; multi-user host)
```

Unit defaults:
- `Restart=on-failure`, `RestartSec=5s`, `StartLimitBurst=5` with `StartLimitIntervalSec=5min`
- `MemoryMax=<resource cap>`, `TasksMax=128`, `LimitNOFILE=4096`
- `StandardOutput=journal`, `StandardError=journal`
- `NoNewPrivileges=yes`, `PrivateTmp=yes`, `ProtectSystem=strict`,
  `ProtectHome=read-only` with `ReadWritePaths=%h/.claudemesh` (systemd does
  not expand `~`; `%h` is the home-directory specifier)
- For systemd `--user`, runs as the invoking user (no root needed).

`claudemesh install` (the existing setup verb) gains an opt-in prompt:
*"Install as a background service that always runs?"* The default answer
depends on the detected environment (TTY vs no-TTY, presence of systemd, etc.).

## 10. Observability

Standard CLI surface unchanged from v1, with these new gauges/counters:

```
cm_daemon_connected{mesh}                              0/1
cm_daemon_reconnects_total{mesh,reason}
cm_daemon_lag_ms{mesh}                                 last broker round-trip
cm_daemon_outbox_depth{mesh,status}                    pending|inflight|dead
cm_daemon_outbox_age_seconds{mesh}                     oldest pending row
cm_daemon_dedupe_total{mesh,direction}                 out|in
cm_daemon_disk_pct{mesh,kind}                          outbox|inbox
cm_daemon_send_total{mesh,kind,status}
cm_daemon_recv_total{mesh,kind,from_type}
cm_daemon_hook_invocations_total{hook,exit}
cm_daemon_hook_duration_seconds{hook}                  histogram
cm_daemon_hook_capability_calls_total{hook,scope}
cm_daemon_ipc_request_total{endpoint,status,transport}
cm_daemon_ipc_duration_seconds{endpoint}               histogram
cm_daemon_local_token_rotations_total
cm_daemon_clone_suspected_total
```

Tracing: optional OpenTelemetry export.

## 11. SDKs: three, slim, core-API only

Same shape as v1 but only target the **frozen core surface** (§3.1). State /
memory / vector / graph / tasks / MCP / skills are NOT in v0.9.0 SDKs; they
ship per capability gate.

Each SDK auto-discovers the daemon: reads `sock` path, `http.port`,
`local_token`. SDKs are versioned in lockstep with the daemon's `/v1` surface.

## 12. Security model: explicit boundaries

| Boundary | Trust | Mechanism |
|---|---|---|
| App ↔ Daemon (UDS) | OS user, FS perms | UDS 0600 |
| App ↔ Daemon (TCP/SSE) | OS user + bearer token | 127.0.0.1 only + `local_token` + Origin/Host check |
| Hook ↔ Daemon | Capability scope | Short-lived capability token, never broker session |
| Daemon ↔ Broker | Mesh keypair | WSS + ed25519 hello + crypto_box DM + per-topic keys |
| Daemon ↔ Disk | OS user | All daemon files mode 0600/0644 under `~/.claudemesh/daemon/` |
| Cloned identity | Host fingerprint check | Daemon refuses to start; dashboard audit event |

## 13. Configuration

`config.toml`: same shape as v1 plus:
- `[capabilities]` (§3.2)
- `[fleet]` (§7.2)
- `[disk] reserved_bytes` (§4.4)
- `[clone] policy = "refuse" | "warn" | "allow"` (§2.2)

User-editable. `claudemesh daemon reload` re-reads it without dropping the WS.

## 14. Lifecycle: the operational flows v1 was missing

### 14.1 Key rotation

```
claudemesh daemon rotate-keypair
```

Mints a fresh ed25519 + x25519 keypair. Registers the new pubkey with the broker as a `member_keypair_rotated` operation (the broker associates the new pubkey with the same member id). The old pubkey is held server-side for a 24h grace period (so in-flight messages encrypted to the old pubkey can still be decrypted), then revoked.

### 14.2 Local token rotation

```
claudemesh daemon rotate-token
```

Atomically writes a new `local_token` and keeps the old one valid alongside
the new one for a 60s grace period. SDKs that already have the old token finish
in-flight requests; new requests use the new token. After 60s, the old token is rejected.
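
The grace window can be sketched in Python as follows. This is an in-memory illustration only: `LocalTokenStore` is a hypothetical name, the token format is an assumption, and the atomic file write the section describes (for example a temp file followed by `os.replace`) is left out.

```python
import hmac
import secrets
import time
from typing import Callable, Optional

GRACE_SECONDS = 60  # from the section above


class LocalTokenStore:
    """Holds the current token plus the previous one during its grace period."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._current = secrets.token_urlsafe(32)  # assumption: token format
        self._previous: Optional[str] = None
        self._previous_expires_at = 0.0

    @property
    def current(self) -> str:
        return self._current

    def rotate(self) -> str:
        """Mint a new token; the old one stays valid for GRACE_SECONDS."""
        self._previous = self._current
        self._previous_expires_at = self._clock() + GRACE_SECONDS
        self._current = secrets.token_urlsafe(32)
        return self._current

    def is_valid(self, presented: str) -> bool:
        """Constant-time comparison against current, then previous-in-grace."""
        candidate = presented.encode()
        if hmac.compare_digest(candidate, self._current.encode()):
            return True
        return (
            self._previous is not None
            and self._clock() < self._previous_expires_at
            and hmac.compare_digest(candidate, self._previous.encode())
        )
```

Injecting the clock keeps the 60s boundary testable without sleeping.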
### 14.3 Compromised host revocation

From the dashboard or another mesh-owner session:

```
claudemesh member revoke <pubkey>
```

Broker marks the member as revoked. A connected daemon receives a `member_revoked`
push, self-disables (refuses new IPC, closes WS), exits with non-zero status,
and logs a forensic event.

### 14.4 Image-clone lifecycle

Covered in §2.2. Three policies (`refuse`, `warn`, `allow`), settable per host
via `config.toml`.

### 14.5 Backup & restore

```
claudemesh daemon backup --out <path>      # dumps keypair, config, schema_version
claudemesh daemon restore --in <path>      # writes them; refuses if a daemon is running
```

Backup is encrypted with a passphrase (Argon2id KDF + crypto_secretbox). The
intent: "I'm reformatting my laptop, I want my mesh memberships back without
re-joining." NOT for "deploy this same identity on 10 servers" (that's the
clone problem above).

### 14.6 Uninstall / reset

```
claudemesh daemon uninstall                # full purge: stops, deregisters from broker, wipes ~/.claudemesh/daemon/<slug>
claudemesh daemon reset                    # wipes local state, keeps broker member registration (for restoring)
```

Uninstall calls the broker's `POST /v1/me/members/:pubkey/leave` so the member
doesn't linger as a ghost. Reset is local-only, with no broker contact.

### 14.7 Disk corruption recovery

```
claudemesh daemon recover                  # interactive: integrity check + offer rebuild paths
```

Detects corrupt `outbox.db` / `inbox.db`. Options:
- Restore from local journal-only inbox (read-only mode; sends disabled).
- Wipe + rebuild from broker (fetches last N days of message history if
  available; topics need re-subscribe; outbox is irrecoverable, queued sends are
  lost).
- Wipe + start fresh.

## 15. Version compatibility

### 15.1 Negotiation handshake

On daemon connect to broker AND on every IPC request:

```
GET /v1/version
{
  "daemon_version": "0.9.0",
  "ipc_api": "v1",
  "ipc_minor": 3,                    # additive minor
  "schema_version": 7,
  "broker_protocol_min": "0.7",
  "broker_protocol_max": "0.9"
}
```

### 15.2 Compat policy

| Across | Policy |
|---|---|
| Daemon ↔ Broker | Daemon refuses to connect if broker version < daemon's `broker_protocol_min`. Broker logs warning. Pre-1.0 we may break this with notice; post-1.0 we maintain backward compat for ≥6 months. |
| CLI ↔ Daemon | CLI checks daemon's `ipc_api`. Same major = OK. Different major = CLI falls back to cold-path with warning. |
| SDK ↔ Daemon | SDK negotiates `ipc_minor`; uses minimum of (SDK's, daemon's). |
| Daemon binary ↔ schema | Binary refuses to start on unknown schema; migrations run forward-only; no automatic downgrade. |
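
A rough Python sketch of the first three rows of the policy table, using the field names from the §15.1 payload. The function names, `CLI_IPC_API`, and the dotted-integer version parsing are assumptions made for illustration; the schema row is not covered.

```python
from typing import Tuple

CLI_IPC_API = "v1"  # assumption: the IPC major this CLI build was compiled against


def parse_version(v: str) -> Tuple[int, ...]:
    """Parse "0.10" into (0, 10) so comparisons are numeric, not lexicographic."""
    return tuple(int(part) for part in v.split("."))


def daemon_accepts_broker(broker_protocol: str, info: dict) -> bool:
    """Daemon <-> Broker: refuse when the broker is below broker_protocol_min."""
    return parse_version(broker_protocol) >= parse_version(info["broker_protocol_min"])


def cli_route(info: dict) -> str:
    """CLI <-> Daemon: same ipc_api major routes to the daemon, otherwise cold path."""
    return "daemon" if info.get("ipc_api") == CLI_IPC_API else "cold"


def negotiated_minor(sdk_minor: int, info: dict) -> int:
    """SDK <-> Daemon: both sides use the minimum of the two minors."""
    return min(sdk_minor, info["ipc_minor"])
```

Parsing into integer tuples matters once a protocol reaches `0.10`: a plain string comparison would rank it below `0.7`.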
### 15.3 Compatibility matrix (published in docs, machine-readable JSON at /v1/compat)

```json
{
  "daemon": "0.9.0",
  "compatible_brokers": ["0.7.x", "0.8.x", "0.9.x"],
  "compatible_clis": ["0.9.x"],
  "compatible_sdks": {
    "python": ">=0.9.0,<1.0.0",
    "go": ">=0.9.0,<1.0.0",
    "ts": ">=0.9.0,<1.0.0"
  }
}
```

## 16. Threat model

### 16.1 Attacker classes

| Attacker | Has | Wants | Mitigations |
|---|---|---|---|
| Local same-user shell | OS user creds | Send / read mesh messages | None needed: they already have FS access to the keypair; daemon is no worse |
| Local different-user shell | Different OS user | Read this user's daemon | UDS 0600 + TCP loopback + token. Requires OS exploit to escalate |
| Browser SSRF | Loopback HTTP | Send messages, read inbox | `local_token` + Origin/Host check + non-default port. SSRF without token cannot succeed |
| Container side-channel | Same loopback namespace | Read another container's daemon | Containers share host loopback only if explicitly net=host. `local_token` defends. Recommended: bind UDS only inside containers |
| Compromised hook | Capability token in env | Use that scope | Capability tokens are scoped + short-lived; cannot escalate |
| Compromised broker | Full mesh visibility on its side | Deliver malicious messages, identity-impersonate | E2E encryption (crypto_box DMs, per-topic keys); broker can't read content. Out-of-scope for daemon |
| Cloned VM image | Same keypair on two hosts | Identity collision | Host fingerprint detection + dashboard audit + `--remint` flow |
| Stolen laptop | Disk access | Mesh impersonation forever | `member revoke` from dashboard. Without disk encryption, this is the user's laptop security; documented in security guide |
| Untrusted hook author | Hook script content | Exfil mesh data | Hook is on disk YOU control. If you ran `git pull` on a malicious hooks/ repo, that's a code-supply-chain attack out of scope for the daemon |

### 16.2 Out of scope

- Defending against an attacker with root on the daemon host. They can read
  `keypair.json` directly.
- Defending against malicious peers in the same mesh sending malformed
  payloads. Daemon validates structure but trusts mesh members.
- Defending against a compromised broker. Out-of-scope for daemon; mesh-level
  E2E protects content but not metadata.

## 17. Migration: what changes for existing users

Same as v1. Additive. No DB migration on broker. Existing
`~/.claudemesh/config.json` consumed unchanged. `claudemesh launch` keeps
working; daemon is opt-in.

---

## What needs review (round 2)

Round 1 produced: identity model needs `--ephemeral` + clone-detect, IPC needs
local token, "exactly-once" was a lie, hooks needed scoped credentials, surface
needed shrinking, missing rotation/recovery/migration/threat-model.

This v2 attempts to address all of them. Specifically critique:

1. **Has the identity model fully closed the clone problem?** Refuses-on-fingerprint-mismatch
   plus broker audit plus mesh-owner revoke: does this catch a sophisticated
   attacker who copies `host_fingerprint.json` along with the keypair?
2. **Is the local-token model sufficient for browser-SSRF defense?**
   Token + Origin + Host checks + 127.0.0.1-only. Anything else needed?
3. **The delivery contract** (§4): is it now defensible? Do the inflight-recovery
   semantics + idempotency-key propagation produce the guarantees claimed?
4. **Hook capability tokens** (§6.2): short-lived, scoped, expire on hook exit.
   Does this fully eliminate the exfil footgun? What capability scopes are
   actually needed for v0.9.0 hooks?
5. **Frozen v0.9.0 surface** (§3.1): is the cut right? Should `peer list` be
   in core or capability-gated? Should `inbox/search` ship in v0.9.0?
6. **Threat model** (§16): anything missing? Specifically thinking about CI
   environments where the daemon's host is a fleet shared across many users'
   builds.
7. **Lifecycle flows** (§14): image clones, key rotation, host moves, disk
   corruption, uninstall semantics. Anything still missing?
8. **Version compat** (§15): is the negotiation handshake sufficient, or do
   we need stronger guarantees (e.g. semver-strict, or a feature-bit
   negotiation rather than version numbers)?

Score 1–5 each. Top 3 changes you'd insist on for v3, if any. If you think v2
is shippable, say so explicitly; over-engineering is a real risk.

648
.artifacts/shipped/2026-05-03-daemon-final-spec-v3.md
Normal file
@@ -0,0 +1,648 @@
# `claudemesh daemon`: Final Spec v3

> **Round 3.** v2 of this spec was reviewed by another model and pushed back on
> identity/clone semantics (boot-id false-positives), delivery contract (broker
> must dedupe on client-supplied id, a protocol change), CI shared-runner threat
> model, version negotiation (need feature bits, not ranges), key rotation
> crypto, hook scope granularity, inbox schema correctness, and ~7 smaller
> polish items. v3 incorporates all of them.
>
> **The intent §0 from v2 is unchanged and still authoritative; read it
> there.** v3 only revises what changed.

---

## 0. Intent: unchanged, see v2 §0

Pre-launch peer-mesh runtime. Servers/laptops become first-class peers.
Stable identity, persistent WS, local IPC, hooks. Not a webhook gateway, not
a generic broker. We can break anything.

**One claim retracted from v1/v2**: "exactly-once" delivery. Replaced with a
precise contract in §4 below.

---

## 1. Process model: same as v2 §1

Resource caps, file layout, single-binary unchanged.

---

## 2. Identity: accidental-clone detection only, plus broker dedupe

Codex was right: v2's clone detection was both too weak (anyone copying
`host_fingerprint.json` along with `keypair.json` defeats it) and too noisy
(boot-id flips every reboot → false-positives on every legitimate restart).

### 2.1 Modes

```
claudemesh daemon up                          # default: persistent member
claudemesh daemon up --ephemeral              # in-memory keypair, never written
claudemesh daemon up --ephemeral --ttl 2h     # auto-shutdown after duration
```

**CI auto-detection** (NEW): if any of the following env vars are set
(`CI=true`, `GITHUB_ACTIONS`, `GITLAB_CI`, `BUILDKITE`, `CIRCLECI`,
`JENKINS_URL`, `RUNPOD_POD_ID`, `KUBERNETES_SERVICE_HOST`), AND `--persistent`
is not explicitly passed, the daemon defaults to `--ephemeral`. Rationale in §16.
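
The CI auto-detection rule can be sketched in Python as below. The function names are hypothetical, and two details are assumptions rather than spec: `CI` counts only when its value is `true` (case-insensitive), and the other variables count when set to any non-empty value.

```python
from typing import Mapping

# Variables whose mere presence marks a CI / orchestrated environment (§2.1)
CI_PRESENCE_VARS = (
    "GITHUB_ACTIONS", "GITLAB_CI", "BUILDKITE", "CIRCLECI",
    "JENKINS_URL", "RUNPOD_POD_ID", "KUBERNETES_SERVICE_HOST",
)


def looks_like_ci(env: Mapping[str, str]) -> bool:
    """True if CI=true or any listed variable is set to a non-empty value."""
    if env.get("CI", "").lower() == "true":
        return True
    return any(env.get(name) for name in CI_PRESENCE_VARS)


def default_identity_mode(env: Mapping[str, str],
                          persistent_flag: bool = False,
                          ephemeral_flag: bool = False) -> str:
    """Explicit flags win; otherwise CI-like environments default to ephemeral."""
    if ephemeral_flag:
        return "ephemeral"
    if persistent_flag:
        return "persistent"
    return "ephemeral" if looks_like_ci(env) else "persistent"
```

In practice the caller would pass `os.environ`; taking the environment as a parameter keeps the rule easy to test.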
### 2.2 Accidental-clone detection (NOT attacker-grade)

Frame change: this catches **image clones, restored backups, copy-pasted
homedirs**, that is, accidents made by humans operating at human speed. It does
not defend against an attacker who copies both `keypair.json` and
`host_fingerprint.json`. The threat model (§16) says this explicitly.

Persisted fingerprint = `sha256(machine-id || first-stable-mac)`. Notably:
- **No boot-id**: it flips on every reboot and would false-positive
  every legitimate restart.
- **No hostname**: laptops legitimately rename themselves.
- **`first-stable-mac`** = MAC of the lexicographically first non-loopback,
  non-virtual interface present at first daemon boot. Frozen at first run;
  not recomputed.

Behavior on mismatch:
- Default policy: refuse to start. Print: *"This keypair was created on a
  different host. If you legitimately moved hardware, run
  `claudemesh daemon accept-host` (writes a fresh fingerprint, keeps keypair).
  If this is a clone of an existing daemon, run `claudemesh daemon remint`
  (mints fresh keypair, registers as a new member)."*
- `[clone] policy = "refuse" | "warn" | "allow"` overrides per host.
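
A small Python sketch of the fingerprint and the mismatch policy. The spec only gives `sha256(machine-id || first-stable-mac)`, so the byte encoding, the NUL separator, the whitespace/case normalisation, and the return labels are all assumptions; how the MAC is discovered and frozen is outside this sketch.

```python
import hashlib


def compute_fingerprint(machine_id: str, first_stable_mac: str) -> str:
    """sha256 over machine-id concatenated with the MAC (separator is an assumption)."""
    material = (
        machine_id.strip().encode()
        + b"\x00"  # assumption: unambiguous separator between the two inputs
        + first_stable_mac.strip().lower().encode()
    )
    return hashlib.sha256(material).hexdigest()


def startup_decision(persisted: str, machine_id: str, mac: str,
                     policy: str = "refuse") -> str:
    """Compare against the persisted fingerprint and apply the [clone] policy."""
    if compute_fingerprint(machine_id, mac) == persisted:
        return "start"
    if policy == "refuse":
        return "refuse"  # caller prints the accept-host / remint guidance
    if policy == "warn":
        return "start_with_warning"
    if policy == "allow":
        return "start"
    raise ValueError(f"unknown clone policy: {policy!r}")
```

As the section says, this only catches accidents: anyone who copies the persisted fingerprint along with the keypair passes the check.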
### 2.3 Concurrent-duplicate-identity broker policy (NEW, protocol change)

When the broker receives two WS connections claiming the same member pubkey:

- **`prefer_newest`** (default): older connection is closed with code 4003
  `replaced_by_newer_connection`. New connection takes over presence/inbox
  delivery. Daemon-side: receives the close code, logs forensic event, exits
  with non-zero status (lets supervisor restart it; if the *other* host is
  the legitimate one, supervisor restart-loops are noisy enough to alert).
- **`prefer_oldest`**: new connection is rejected with code 4004
  `member_already_connected`. The new daemon refuses to start.
- **`allow_concurrent`** (new mode, server-side feature flag): both
  connections accepted; broker tracks both as sibling sessions of the same
  member (same model as `claudemesh launch` siblings today). Useful when a
  user really does want one keypair on multiple hosts (e.g. failover pairs).

Configured per-mesh in `mesh.cloneConcurrencyPolicy`. Default:
`prefer_newest`. Broker emits `member_concurrent_connection` audit event in
all cases.
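
The three policies reduce to a small decision table. The Python sketch below is one way a broker might encode it; the `Decision` type and its field names are hypothetical, while the close codes, reasons, and audit event name come from the section above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    accept_new: bool              # does the incoming connection get in?
    close_target: Optional[str]   # "old", "new", or None (nothing is closed)
    code: Optional[int]           # WS close code sent to close_target
    reason: Optional[str]
    audit_event: str = "member_concurrent_connection"  # emitted in all cases


def on_duplicate_connection(policy: str = "prefer_newest") -> Decision:
    """Decide what happens when a second WS claims an already-connected pubkey."""
    if policy == "prefer_newest":
        return Decision(True, "old", 4003, "replaced_by_newer_connection")
    if policy == "prefer_oldest":
        return Decision(False, "new", 4004, "member_already_connected")
    if policy == "allow_concurrent":
        return Decision(True, None, None, None)  # tracked as sibling sessions
    raise ValueError(f"unknown cloneConcurrencyPolicy: {policy!r}")
```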
### 2.4 Rename, key rotation: see §14

---

## 3. IPC surface: frozen core, hardened auth

### 3.1 Frozen core (v0.9.0), slight cut from v2

Codex agreed v2's cut was mostly right, except: defer FTS-search to a
capability gate, keep `peer list` in core, drop redundancies.

```
# Messaging
POST   /v1/send                 {to, message, priority?, meta?, replyToId?,
                                 client_message_id?}
POST   /v1/topic/post           {topic, message, priority?, mentions?,
                                 client_message_id?}
POST   /v1/topic/subscribe      {topic}
POST   /v1/topic/unsubscribe    {topic}
GET    /v1/topic/list
GET    /v1/inbox                ?since=<iso>&topic=<n>&from=<peer>&limit=<n>
                                # plain SQL paging; NO FTS in v0.9.0

# Peers + presence (kept in core; central to "first-class peer")
GET    /v1/peers                ?mesh=<slug>
POST   /v1/profile              {summary?, status?, visible?}

# Files (already production)
POST   /v1/file/share           {path, to?, message?, persistent?}
GET    /v1/file/get             ?id=<fileId>&out=<path>
GET    /v1/file/list

# Events (push)
GET    /v1/events               text/event-stream
       core events: message, peer_join, peer_leave, file_shared,
                    daemon_disconnect, daemon_reconnect, hook_executed,
                    feature_negotiation_failed

# Control plane
GET    /v1/health               (auth required by default; see §3.3)
GET    /v1/metrics              (auth required by default)
GET    /v1/version              (auth required by default)
POST   /v1/heartbeat            {}
```

`inbox/search` with FTS is deferred to the v0.9.x capability gate `inbox_fts`.

### 3.2 Capability-gated future surface (v0.9.x)

Same as v2 §3.2: state, memory, vector, graph, tasks, scheduling,
mcp_host, skill_share, plus new `inbox_fts`. None enabled in v0.9.0.

### 3.3 Local IPC authentication: tightened

Same shape as v2 §3.3 but with codex's polish folded in:

| Transport | Auth | Notes |
|---|---|---|
| UDS | None (FS perms 0600) | Reaching socket = same UID |
| TCP loopback | `Authorization: Bearer <local_token>` REQUIRED | 127.0.0.1 only |
| SSE | `Authorization: Bearer <local_token>` REQUIRED | same |

**Token plumbing rules (NEW):**
- `local_token` MUST be in the `Authorization` header. **Never** accepted in
  the query string. An endpoint that sees a `?token=...` query param logs a
  security event and returns 400.
- `local_token` MUST be redacted from access logs
  (`Authorization: Bearer ***` in logs).
- `local_token` rotation atomically writes a new file; the daemon keeps the
  OLD token valid for a 60s grace period so SDKs still holding it can finish,
  then it's rejected.

**Endpoint default auth (NEW, codex):**
- Every IPC endpoint requires the local token by default, **including**
  `/v1/health`, `/v1/metrics`, `/v1/version`. `[ipc] public_health_check = true`
  opts in to public `/v1/health` for k8s probes etc.

**Container default (NEW, codex):**
- If `KUBERNETES_SERVICE_HOST` is set OR `/.dockerenv` exists OR
  `/proc/1/cgroup` indicates a container OR explicit `--container` flag,
  daemon defaults to **UDS-only** (`[ipc] tcp_enabled = false`). Containers
  share host loopback when `network_mode: host`; UDS-only avoids the
  side-channel.

**Origin/Host policy:**
- `Host` header must be `localhost`, `127.0.0.1`, `[::1]` or empty. Else 403.
- `Origin` header: explicit allowlist (default: empty). SSRF-from-browser
  bounce-attack defense.
- `User-Agent` requirement DROPPED (codex called it theatre, correctly).
- CORS: never echo `Access-Control-Allow-Origin`; preflight returns 403.
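
The TCP-loopback rules above can be sketched as a single gate in Python. `check_request` and its reason strings are hypothetical; the 400 and 403 statuses come from the rules above, while the 401 for a missing or wrong token, the handling of a `:port` suffix on `Host`, and the exact-case header lookup are assumptions.

```python
import hmac
from typing import FrozenSet, Mapping, Tuple
from urllib.parse import parse_qs, urlsplit

ALLOWED_HOSTS = {"localhost", "127.0.0.1", "[::1]", ""}


def _strip_port(host: str) -> str:
    """Drop an optional :port suffix (assumption: the port is not part of the check)."""
    if host.startswith("["):  # bracketed IPv6 literal such as [::1]:7777
        end = host.find("]")
        return host[: end + 1] if end != -1 else host
    return host.split(":", 1)[0]


def check_request(url: str, headers: Mapping[str, str], local_token: str,
                  origin_allowlist: FrozenSet[str] = frozenset()) -> Tuple[int, str]:
    """Return (status, reason); 200 means the request may proceed."""
    if "token" in parse_qs(urlsplit(url).query, keep_blank_values=True):
        return 400, "token_in_query"  # spec: also log a security event
    if _strip_port(headers.get("Host", "")).lower() not in ALLOWED_HOSTS:
        return 403, "bad_host"
    origin = headers.get("Origin")
    if origin is not None and origin not in origin_allowlist:
        return 403, "bad_origin"  # default allowlist is empty
    scheme, _, presented = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not hmac.compare_digest(
        presented.encode(), local_token.encode()
    ):
        return 401, "bad_token"  # assumption: the spec does not name this status
    return 200, "ok"
```

The token comparison runs last and in constant time, so a browser-originated request is rejected on `Host`/`Origin` before any token material is examined.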
### 3.4 Request limits & backpressure: same as v2

---

## 4. Delivery contract: at-least-once, broker-dedupes-on-client-id

Codex caught the real protocol gap: idempotency only works if the broker
dedupes on the **caller's** id, not its own. This requires a broker change.

### 4.1 The contract (precise)

> **Local guarantee**: each successful `POST /v1/send` returns a stable
> `client_message_id`. The send is durably persisted to `outbox.db` before
> the response returns.
>
> **Broker guarantee**: the broker dedupes on `client_message_id` for a
> 24h window. Multiple inflight retries from the daemon for the same
> `client_message_id` produce **at most one** broker-accepted row.
>
> **End-to-end guarantee**: at-least-once delivery to subscribers, with
> `client_message_id` propagated in the inbound envelope so receivers can
> dedupe locally on their side. We do **not** guarantee at-most-once
> end-to-end; that requires receiver-side dedupe, which the daemon's
> inbox.db provides for daemon-hosted peers.

### 4.2 Daemon-supplied `client_message_id` (NEW, broker protocol change)

Every send has a stable id fixed **on the daemon side**, never by the broker.
Precedence:
- Caller-supplied via `Idempotency-Key` header → wins.
- Caller-supplied in body as `client_message_id` field → second.
- Else daemon mints a `ulid` → last.

The id is:
- Returned in the IPC response.
- Stored in `outbox.db` as a UNIQUE NOT NULL column (real dedupe, not
  `INSERT OR IGNORE` on a nullable column; codex caught this).
- Propagated to the broker on every retry (`client_message_id` field in the
  WS send envelope and in `POST /v1/messages`).
- Stored in the broker's `meshTopicMessage.client_message_id` column with a
  `UNIQUE` constraint scoped to `(meshId, client_message_id)`.
- Propagated in the inbound delivery to receivers' inboxes.

**Broker behavior on duplicate `client_message_id`**: returns the
already-stored `messageId` and `historyId` from the prior insertion. No new
row, no new fan-out, idempotent.
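
The precedence for picking the id can be sketched in a few lines of Python. `resolve_client_message_id` is a hypothetical helper, and the default minting function uses a `uuid4` hex string purely as a stand-in because the standard library has no ULID generator; the spec calls for a `ulid`.

```python
import uuid
from typing import Callable, Mapping


def _mint_stand_in() -> str:
    # Stand-in only: the spec mints a ulid; uuid4 is used here to stay stdlib-only.
    return uuid.uuid4().hex


def resolve_client_message_id(headers: Mapping[str, str],
                              body: Mapping[str, object],
                              mint: Callable[[], str] = _mint_stand_in) -> str:
    """Idempotency-Key header wins, then the body field, else the daemon mints one."""
    from_header = headers.get("Idempotency-Key")
    if from_header:
        return from_header
    from_body = body.get("client_message_id")
    if isinstance(from_body, str) and from_body:
        return from_body
    return mint()
```

Whatever this returns is the value stored in `outbox.db` and sent on every retry, which is what lets the broker recognise the retry as a duplicate.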
### 4.3 Broker schema delta (NEW)

```sql
ALTER TABLE mesh.topic_message
  ADD COLUMN client_message_id TEXT;
ALTER TABLE mesh.message_queue
  ADD COLUMN client_message_id TEXT;

CREATE UNIQUE INDEX topic_message_client_id_idx
  ON mesh.topic_message(mesh_id, client_message_id)
  WHERE client_message_id IS NOT NULL;
CREATE UNIQUE INDEX message_queue_client_id_idx
  ON mesh.message_queue(mesh_id, client_message_id)
  WHERE client_message_id IS NOT NULL;
```

Partial unique index: legacy traffic without `client_message_id` (from
`claudemesh launch`, dashboard chat, web posts) is unaffected.

### 4.4 Outbox schema (corrected)

```sql
CREATE TABLE outbox (
  id                 TEXT PRIMARY KEY,           -- ulid (local row id)
  client_message_id  TEXT NOT NULL UNIQUE,       -- propagated to broker
  payload            BLOB NOT NULL,
  enqueued_at        INTEGER NOT NULL,
  attempts           INTEGER DEFAULT 0,
  next_attempt_at    INTEGER NOT NULL,
  status             TEXT CHECK(status IN ('pending','inflight','done','dead')),
  last_error         TEXT,
  delivered_at       INTEGER,
  broker_message_id  TEXT                        -- set on ACK
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
```

`UNIQUE NOT NULL` on `client_message_id`: caller retries with the same id
collide locally and become a no-op.
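
To see the local no-op in action, here is a small Python sketch against an in-memory SQLite database using the outbox table above. The `enqueue` helper is hypothetical, and the row id uses a `uuid4` hex string as a stand-in for the ulid the schema comment calls for.

```python
import sqlite3
import time
import uuid
from typing import Optional

SCHEMA = """
CREATE TABLE outbox (
  id                 TEXT PRIMARY KEY,
  client_message_id  TEXT NOT NULL UNIQUE,
  payload            BLOB NOT NULL,
  enqueued_at        INTEGER NOT NULL,
  attempts           INTEGER DEFAULT 0,
  next_attempt_at    INTEGER NOT NULL,
  status             TEXT CHECK(status IN ('pending','inflight','done','dead')),
  last_error         TEXT,
  delivered_at       INTEGER,
  broker_message_id  TEXT
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
"""


def enqueue(db: sqlite3.Connection, client_message_id: str, payload: bytes,
            now: Optional[int] = None) -> bool:
    """Insert a pending row; returns False when the id was already queued."""
    ts = int(time.time()) if now is None else now
    cur = db.execute(
        "INSERT INTO outbox "
        "(id, client_message_id, payload, enqueued_at, next_attempt_at, status) "
        "VALUES (?, ?, ?, ?, ?, 'pending') "
        "ON CONFLICT(client_message_id) DO NOTHING",
        (uuid.uuid4().hex, client_message_id, payload, ts, ts),  # uuid4: ulid stand-in
    )
    db.commit()
    return cur.rowcount == 1  # 0 rows changed means the UNIQUE constraint absorbed it
```

A caller that retries `enqueue` with the same `client_message_id` gets `False` back and the table still holds a single row.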
### 4.5 Inbox schema (corrected: content table + FTS index)

Codex caught: FTS5 virtual tables are not where you put `CREATE INDEX`.
Real shape:

```sql
-- Content table: the durable store
CREATE TABLE inbox (
  id                 TEXT PRIMARY KEY,           -- ulid (local row id)
  client_message_id  TEXT NOT NULL UNIQUE,       -- dedupe key
  broker_message_id  TEXT,
  mesh               TEXT NOT NULL,
  topic              TEXT,
  sender_pubkey      TEXT NOT NULL,
  sender_name        TEXT NOT NULL,
  body               TEXT,
  meta               TEXT,                       -- JSON
  received_at        INTEGER NOT NULL,
  reply_to_id        TEXT
);
CREATE INDEX inbox_received_at ON inbox(received_at);
CREATE INDEX inbox_topic       ON inbox(topic);
CREATE INDEX inbox_sender      ON inbox(sender_pubkey);

-- FTS5 index: gated behind capability `inbox_fts` (deferred to v0.9.x)
-- When enabled, populated via triggers; absent in v0.9.0.
```

Insert path: `INSERT INTO inbox(...) ON CONFLICT(client_message_id) DO
NOTHING RETURNING id`. The `RETURNING` clause tells us whether a new row
landed; only new rows trigger hooks.
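
The insert path can be exercised with Python's built-in `sqlite3` module, as sketched below. `store_inbound` is a hypothetical helper and the table is trimmed to the columns it touches. `RETURNING` needs SQLite 3.35 or newer, so the sketch falls back to `rowcount` on older builds; that fallback is an addition of this example, not part of the spec.

```python
import sqlite3
import time
import uuid

# Trimmed copy of the inbox table: only the columns this sketch writes.
SCHEMA = """
CREATE TABLE inbox (
  id                 TEXT PRIMARY KEY,
  client_message_id  TEXT NOT NULL UNIQUE,
  mesh               TEXT NOT NULL,
  sender_pubkey      TEXT NOT NULL,
  sender_name        TEXT NOT NULL,
  body               TEXT,
  received_at        INTEGER NOT NULL
);
"""

HAS_RETURNING = sqlite3.sqlite_version_info >= (3, 35, 0)


def store_inbound(db: sqlite3.Connection, msg: dict) -> bool:
    """Store one inbound message; True only if a new row landed (hooks should fire)."""
    sql = (
        "INSERT INTO inbox "
        "(id, client_message_id, mesh, sender_pubkey, sender_name, body, received_at) "
        "VALUES (?, ?, ?, ?, ?, ?, ?) "
        "ON CONFLICT(client_message_id) DO NOTHING"
    )
    params = (
        uuid.uuid4().hex,  # stand-in for the ulid row id
        msg["client_message_id"], msg["mesh"],
        msg["sender_pubkey"], msg["sender_name"],
        msg.get("body"), int(time.time()),
    )
    if HAS_RETURNING:
        # fetchall() drains the statement before commit; an empty result means a duplicate
        landed = bool(db.execute(sql + " RETURNING id", params).fetchall())
    else:
        landed = db.execute(sql, params).rowcount == 1
    db.commit()
    return landed
```

A redelivered message with the same `client_message_id` returns `False`, which is the signal to skip hook invocation.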
### 4.6 Crash recovery: explicit semantics

On daemon startup:
1. Rows in `inflight` reset to `pending` with `attempts++`,
   `next_attempt_at = now + min_backoff`. **Note:** these may double-deliver
   if the broker actually accepted before the local ACK persisted. The
   `client_message_id` propagation ensures the broker dedupes the retry.
   Net result: exactly one broker-accepted row, possibly two daemon-side
   `inflight → done` transitions.
2. Run `PRAGMA integrity_check` on `outbox.db`; on failure the daemon refuses
   to start and points at `claudemesh daemon recover`.
3. Run the integrity check on `inbox.db`; on failure move it to
   `inbox.db.corrupt-<ts>`, create a fresh empty inbox, and log
   `inbox_corruption_recovered`. Inbox is a
   cache; recoverable from broker history.
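
Steps 1 and 2 for the outbox can be sketched in Python as follows. `recover_outbox` and the `MIN_BACKOFF_S` value are hypothetical. The sketch runs the integrity check before touching any rows, which is an ordering choice of this example; the list above numbers the reset first.

```python
import sqlite3

MIN_BACKOFF_S = 5  # hypothetical value; the spec only names `min_backoff`


def recover_outbox(db: sqlite3.Connection, now: int) -> int:
    """Verify outbox integrity, then requeue rows stranded in 'inflight'.

    Returns the number of rows reset to 'pending'.
    """
    result = db.execute("PRAGMA integrity_check").fetchone()[0]
    if result != "ok":
        raise RuntimeError(
            "outbox.db failed integrity_check; run `claudemesh daemon recover`"
        )
    cur = db.execute(
        "UPDATE outbox "
        "SET status = 'pending', attempts = attempts + 1, next_attempt_at = ? "
        "WHERE status = 'inflight'",
        (now + MIN_BACKOFF_S,),
    )
    db.commit()
    return cur.rowcount
```

Requeued rows keep their `client_message_id`, so a send the broker already accepted is deduped there rather than delivered twice.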
|
||||||
|
|
||||||
|
### 4.7 Failure modes the spec is honest about
|
||||||
|
|
||||||
|
- **Broker dedupe window expired**: daemon retries a 25h-old send. Broker
accepts again as if new (no dedupe). With the earlier outbox default of
`max_age_hours` = 168h (7d), longer than the broker dedupe window (24h),
this was possible. The default daemon `max_age_hours` is therefore REDUCED
to **23h** to stay inside the broker dedupe window. It can be raised only
if the operator explicitly accepts the risk.
|
||||||
|
- **`dead` rows**: surface in `claudemesh daemon outbox --failed`. User
|
||||||
|
manually requeues (`outbox requeue <id>`) or drops (`outbox drop <id>`).
|
||||||
|
- **Receiver-side dedupe failure**: only daemon-hosted receivers dedupe.
|
||||||
|
`claudemesh launch` and dashboard chat clients DO NOT dedupe today —
|
||||||
|
fixing them is post-v0.9.0.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Inbound — schema corrected (see §4.5), retention as v2
|
||||||
|
|
||||||
|
30-day rolling retention (configurable). Weekly VACUUM.
|
||||||
|
`claudemesh daemon search` deferred to `inbox_fts` capability.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Hooks — scopes tightened, exfiltration acknowledged
|
||||||
|
|
||||||
|
Codex was right: capability tokens removed the broad-token footgun, not
|
||||||
|
exfiltration. Untrusted hook payload + `network_policy=deny` not reliable
|
||||||
|
across platforms. Spec is now honest about that.
|
||||||
|
|
||||||
|
### 6.1 Hooks contract — same shape as v2 §6, with tighter defaults
|
||||||
|
|
||||||
|
### 6.2 Capability scopes — narrowed for v0.9.0
|
||||||
|
|
||||||
|
Codex pushed: scopes were too coarse. v0.9.0 scopes are exactly:
|
||||||
|
|
||||||
|
| Scope | Capability | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| `reply:event` | Reply to the specific event that triggered this hook | Bound to `event_id`; daemon validates target; expires on hook exit |
|
||||||
|
| `dm:send:<sender_pubkey>` | Send DM only to the specific sender | Bound to one pubkey from the event; does not grant sending to anyone else |
|
||||||
|
| `topic:<name>:post` | Post to the specific topic that fired | Bound to topic from event; can't write elsewhere |
|
||||||
|
|
||||||
|
**No read scopes in v0.9.0.** A hook cannot read state, inbox, peers, etc.
|
||||||
|
If a hook wants to consult mesh data to compose its reply, it does so via
|
||||||
|
the *event payload* (which the daemon redacted appropriately) or via shell
|
||||||
|
out to a fresh `claudemesh <verb>` call (which uses the user's existing
|
||||||
|
config and is subject to its own auth). No daemon-mediated read tokens.
|
||||||
|
|
||||||
|
### 6.3 Sandboxing — supported, not promised
|
||||||
|
|
||||||
|
Codex caught: "network_policy=deny" sounds reliable but isn't cross-platform.
|
||||||
|
Spec now says explicitly:
|
||||||
|
|
||||||
|
- `network_policy = "deny"` is **best-effort**:
|
||||||
|
- Linux: enforced via `unshare --net` if available; else firewall rule via
|
||||||
|
`iptables -m owner` if available; else daemon logs warning that policy
|
||||||
|
cannot be enforced and the hook STILL runs.
|
||||||
|
- macOS: enforced via `sandbox-exec` profile if available; else warning + run.
|
||||||
|
- Windows: not enforced; warning + run.
|
||||||
|
- Operators on hostile networks should set `enabled = false` for hooks they
|
||||||
|
don't trust.
|
||||||
|
- Daemon `cm_daemon_hook_unenforceable_total` counter exposes the count of
|
||||||
|
hooks that ran with weakened sandbox.
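
The fallback order above can be sketched as a small selection function. This is an illustration only: the `sys.platform`-style strings, the `has_tool` probe, the `no_mechanism` reason label, and the in-process `Counter` standing in for the metric are all assumptions, and the sketch does not launch anything.

```python
import shutil
from collections import Counter

# Stand-in for cm_daemon_hook_unenforceable_total{hook,reason}.
unenforceable = Counter()


def tool_on_path(name):
    """Default probe: is the tool available on PATH?"""
    return shutil.which(name) is not None


def pick_mechanism(platform, has_tool=tool_on_path):
    """Return the first usable enforcement mechanism, or None."""
    if platform == "linux":
        if has_tool("unshare"):
            return "unshare --net"
        if has_tool("iptables"):
            return "iptables -m owner"
    elif platform == "darwin":
        if has_tool("sandbox-exec"):
            return "sandbox-exec"
    return None  # Windows and everything else: not enforced


def plan_hook(hook, platform, has_tool=tool_on_path):
    """Decide how a hook with network_policy = "deny" would be launched."""
    mechanism = pick_mechanism(platform, has_tool)
    if mechanism is None:
        # Best-effort: count it (and warn), but the hook STILL runs.
        unenforceable[(hook, "no_mechanism")] += 1
    return {"hook": hook, "run": True, "mechanism": mechanism}


linux_plan = plan_hook("notify", "linux", has_tool=lambda tool: tool == "iptables")
windows_plan = plan_hook("notify", "win32", has_tool=lambda tool: False)
```

Passing `has_tool` in keeps the decision testable without touching the host.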
|
||||||
|
|
||||||
|
### 6.4 Payload size & truncation — NEW
|
||||||
|
|
||||||
|
Stdin payloads to hooks capped at 256 KB (configurable). Larger payloads
|
||||||
|
truncated with `_truncated: true` flag in the JSON event. Hook stdout
|
||||||
|
captured up to `output_size_limit` (default 64 KB).
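
The spec does not say which part of the event is cut. The sketch below assumes the `body` field is the one shortened and that the cap applies to the serialized JSON bytes; `encode_hook_stdin` is a hypothetical helper name.

```python
import json

STDIN_LIMIT = 256 * 1024  # bytes, the default cap


def encode_hook_stdin(event, limit=STDIN_LIMIT):
    """Serialize an event for hook stdin, trimming `body` if it is too large."""
    raw = json.dumps(event).encode("utf-8")
    if len(raw) <= limit:
        return raw

    trimmed = dict(event, _truncated=True)
    body = trimmed.get("body") or ""
    trimmed["body"] = ""
    overhead = len(json.dumps(trimmed).encode("utf-8"))  # size with an empty body
    keep = min(len(body), max(0, limit - overhead))
    while True:
        trimmed["body"] = body[:keep]
        raw = json.dumps(trimmed).encode("utf-8")
        if len(raw) <= limit or keep == 0:
            # Still oversized only if the other fields alone exceed the cap.
            return raw
        keep = keep * 9 // 10  # JSON escapes inflated the body; shave and retry


small = encode_hook_stdin({"event_id": "e1", "body": "hi"})
big = encode_hook_stdin({"event_id": "e2", "body": "x" * 300_000})
```

The output stays valid JSON after truncation, so a hook can always parse its stdin and check `_truncated`.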
|
||||||
|
|
||||||
|
### 6.5 Audit log + killpg — same as v2
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Multi-mesh — same as v2 §7
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. Auto-routing — same as v2 §8 (codex agreed it was clarified correctly)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 9. Service installation — same as v2 §9
|
||||||
|
|
||||||
|
Add: when `claudemesh daemon install-service` runs in CI-detected
|
||||||
|
environment, prints `Refusing to install persistent service in CI; ephemeral
|
||||||
|
mode only.` and exits non-zero unless `--allow-ci-persistent` is passed.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 10. Observability — same as v2 §10
|
||||||
|
|
||||||
|
Add metric: `cm_daemon_hook_unenforceable_total{hook,reason}` (§6.3).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 11. SDKs — same shape as v2, bound to frozen core only
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 12. Security model — same boundaries, plus dedupe + feature negotiation
|
||||||
|
|
||||||
|
| Boundary | Trust | Mechanism |
|
||||||
|
|---|---|---|
|
||||||
|
| App ↔ Daemon (UDS) | OS user | UDS 0600 |
|
||||||
|
| App ↔ Daemon (TCP/SSE) | OS user + bearer token | 127.0.0.1 + `local_token` + Origin/Host |
|
||||||
|
| Hook ↔ Daemon | Capability scope | Short-lived token bound to event; no read scopes |
|
||||||
|
| Daemon ↔ Broker | Mesh keypair + feature bits | WSS + ed25519 + crypto_box + per-topic keys + feature negotiation (§15) |
|
||||||
|
| Daemon ↔ Disk | OS user | All files 0600/0644 |
|
||||||
|
| Cloned identity | First-mac fingerprint | Accidental-clone detection only; broker concurrent-connection policy per §2.3 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 13. Configuration — same shape as v2 §13, plus `[features]`
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[features]
|
||||||
|
require = ["client_message_id_dedupe", "concurrent_connection_policy"]
|
||||||
|
optional = ["mesh_skill_share", "mcp_host"]
|
||||||
|
# Daemon refuses to start if broker doesn't advertise all `require` bits.
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 14. Lifecycle — key rotation crypto fixed
|
||||||
|
|
||||||
|
### 14.1 Key rotation (CORRECTED — codex)
|
||||||
|
|
||||||
|
v2 said: *"old pubkey held server-side for 24h grace (decrypts in-flight
|
||||||
|
messages encrypted to old pubkey)"*. **Wrong** — only the daemon has the
|
||||||
|
private key. Broker can't decrypt.
|
||||||
|
|
||||||
|
Real semantics:
|
||||||
|
|
||||||
|
- `claudemesh daemon rotate-keypair` mints fresh ed25519 + x25519, registers
|
||||||
|
the new pubkey with the broker as `member_keypair_rotated`.
|
||||||
|
- Broker associates the new pubkey with the same member id, marks the old
|
||||||
|
pubkey as `rotated_out` (not revoked).
|
||||||
|
- **Daemon-side**: the OLD x25519 private key is retained in
|
||||||
|
`keypair-archive.json` (mode 0600, durable) for a `key_grace_period`
|
||||||
|
(default 7 days). During the grace window, daemon will attempt to decrypt
|
||||||
|
inbound messages with the new private key first, falling back to archived
|
||||||
|
keys (one or more). Messages encrypted to the old pubkey by senders who
|
||||||
|
haven't yet seen the rotation event continue to decrypt cleanly.
|
||||||
|
- After the grace period, archived keys are zeroed and the file is deleted.
|
||||||
|
Messages encrypted to a stale pubkey after the grace window fail to
|
||||||
|
decrypt and are logged as `cm_daemon_decrypt_stale_total`.
|
||||||
|
|
||||||
|
### 14.2 Backup includes topic state (CORRECTED)
|
||||||
|
|
||||||
|
`claudemesh daemon backup` now packages:
|
||||||
|
- `keypair.json` (current)
|
||||||
|
- `keypair-archive.json` (any in-grace-window archived keys)
|
||||||
|
- `host_fingerprint.json`
|
||||||
|
- `config.toml`
|
||||||
|
- `local_token` is NOT included (the token is regenerated on restore)
|
||||||
|
- `topic_subscriptions.json` (which topics this daemon subscribes to)
|
||||||
|
- `topic_keys.json` (per-topic symmetric keys this member holds)
|
||||||
|
- `key_epoch.json` (current epoch number per topic; relevant when the mesh
|
||||||
|
rotates topic keys)
|
||||||
|
- `schema_version`
|
||||||
|
|
||||||
|
Backup file: encrypted with a passphrase (Argon2id KDF + crypto_secretbox).
|
||||||
|
Restore writes everything except `local_token` (regenerated). On first run
|
||||||
|
after restore, daemon performs `accept-host` if fingerprint mismatches
|
||||||
|
(restore is by definition a host change).
|
||||||
|
|
||||||
|
### 14.3 Local token rotation, compromised host revocation, image-clone, uninstall, recovery — same as v2 §14
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 15. Version compat — feature-bit negotiation (REPLACES v2 §15)
|
||||||
|
|
||||||
|
Codex was right: version ranges aren't enough when daemon depends on
|
||||||
|
specific broker capabilities (client-supplied IDs, concurrent-connection
|
||||||
|
policy, key epochs).
|
||||||
|
|
||||||
|
### 15.1 Feature bits
|
||||||
|
|
||||||
|
Each protocol-relevant capability gets a stable string identifier:
|
||||||
|
|
||||||
|
```
|
||||||
|
client_message_id_dedupe broker dedupes on client_message_id (§4.2)
|
||||||
|
concurrent_connection_policy broker honours mesh.cloneConcurrencyPolicy (§2.3)
|
||||||
|
member_keypair_rotated_event broker emits the event (§14.1)
|
||||||
|
key_epoch per-topic key epochs supported (§14.2)
|
||||||
|
mesh_skill_share post-v0.9, future
|
||||||
|
mcp_host post-v0.9, future
|
||||||
|
```
|
||||||
|
|
||||||
|
### 15.2 Negotiation handshake
|
||||||
|
|
||||||
|
On WS connect (after hello, before normal traffic):
|
||||||
|
|
||||||
|
```
|
||||||
|
→ daemon: feature_negotiation_request
|
||||||
|
{ require: ["client_message_id_dedupe",
|
||||||
|
"concurrent_connection_policy"],
|
||||||
|
optional: ["mesh_skill_share","mcp_host"] }
|
||||||
|
|
||||||
|
← broker: feature_negotiation_response
|
||||||
|
{ supported: ["client_message_id_dedupe",
|
||||||
|
"concurrent_connection_policy",
|
||||||
|
"member_keypair_rotated_event"],
|
||||||
|
missing_required: [] }
|
||||||
|
```
|
||||||
|
|
||||||
|
If `missing_required` is non-empty, daemon closes the connection with code
|
||||||
|
4010 `feature_unavailable`, logs forensic event, exits with non-zero status.
|
||||||
|
Supervisor sees a restart-loop → operator alerted via configured
|
||||||
|
mechanisms.
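
A sketch of the daemon-side decision after the response arrives. The function name and return shape are hypothetical. Recomputing the missing set from `supported` instead of trusting the broker's `missing_required` is a choice made for this illustration, not something the spec mandates.

```python
CLOSE_FEATURE_UNAVAILABLE = 4010


def check_negotiation(require, optional, response):
    """Decide whether the daemon may proceed to normal traffic."""
    supported = set(response.get("supported", []))
    missing = sorted(set(require) - supported)
    return {
        # None means "carry on"; 4010 means close, log, exit non-zero.
        "close_code": CLOSE_FEATURE_UNAVAILABLE if missing else None,
        "missing_required": missing,
        "enabled_optional": sorted(set(optional) & supported),
    }


require = ["client_message_id_dedupe", "concurrent_connection_policy"]
optional = ["mesh_skill_share", "mcp_host"]

ok = check_negotiation(require, optional, {
    "supported": ["client_message_id_dedupe",
                  "concurrent_connection_policy",
                  "member_keypair_rotated_event"],
    "missing_required": [],
})
old_broker = check_negotiation(require, optional, {
    "supported": ["concurrent_connection_policy"],
    "missing_required": ["client_message_id_dedupe"],
})
```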
|
||||||
|
|
||||||
|
### 15.3 IPC negotiation (CLI/SDK ↔ daemon)
|
||||||
|
|
||||||
|
`GET /v1/version` returns:
|
||||||
|
```json
|
||||||
|
{
|
||||||
|
"daemon_version": "0.9.0",
|
||||||
|
"ipc_api": "v1",
|
||||||
|
"ipc_features": ["send","topic","peers","files","events","health"],
|
||||||
|
"schema_version": 7,
|
||||||
|
"broker_features_negotiated": ["client_message_id_dedupe", ...]
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
CLI/SDK matches `ipc_features` against the features it requires. If a required
feature is missing, it either falls back to the cold path with a warning or
fails explicitly (the CLI verb's choice).
|
||||||
|
|
||||||
|
### 15.4 Compatibility matrix — published
|
||||||
|
|
||||||
|
```json
|
||||||
|
GET /v1/compat
|
||||||
|
{
|
||||||
|
"daemon": "0.9.0",
|
||||||
|
"compatible_brokers": ["0.7.x","0.8.x","0.9.x"],
|
||||||
|
"required_broker_features": ["client_message_id_dedupe",
|
||||||
|
"concurrent_connection_policy"],
|
||||||
|
"compatible_clis": ["0.9.x"],
|
||||||
|
"compatible_sdks": {
|
||||||
|
"python": ">=0.9.0,<1.0.0",
|
||||||
|
"go": ">=0.9.0,<1.0.0",
|
||||||
|
"ts": ">=0.9.0,<1.0.0"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 16. Threat model — shared-CI reality folded in
|
||||||
|
|
||||||
|
### 16.1 Attacker classes — same matrix as v2 §16, plus:
|
||||||
|
|
||||||
|
| Attacker | Has | Wants | Mitigations |
|
||||||
|
|---|---|---|---|
|
||||||
|
| **Shared CI runner** (NEW) | Same Unix UID as other untrusted jobs | Read this user's persistent keypair across job boundaries | Auto-detect CI envs (§2.1) → ephemeral default + UDS-only + isolated `$HOME`. If operator overrides with `--persistent`, log warning `persistent_keypair_in_ci_environment`. |
|
||||||
|
| **Malicious mesh peer** (PROMOTED from out-of-scope to in-scope) | Mesh membership | Send malformed payload to crash daemon | Every inbound shape validated against schema before any processing. Daemon refuses unknown fields (defense-in-depth) and emits `cm_daemon_invalid_inbound_total`. Crashes from inbound payloads are bugs. |
|
||||||
|
|
||||||
|
### 16.2 Stated explicitly out of scope
|
||||||
|
|
||||||
|
- Root attacker on daemon host (can read keypair directly).
|
||||||
|
- Compromised broker (E2E content protection still holds; metadata is not
|
||||||
|
protected by daemon — that's mesh-level).
|
||||||
|
- Sophisticated attacker who copies BOTH `keypair.json` and
|
||||||
|
`host_fingerprint.json` (§2.2 calls this out).
|
||||||
|
- Receivers other than daemon-hosted peers deduping inbound traffic
|
||||||
|
(post-v0.9.0).
|
||||||
|
|
||||||
|
### 16.3 Container & CI defaults table (NEW)
|
||||||
|
|
||||||
|
| Environment | Identity | IPC | Hooks |
|
||||||
|
|---|---|---|---|
|
||||||
|
| Bare metal / VM (default) | Persistent (clone-detected) | UDS + TCP loopback | Enabled |
|
||||||
|
| Docker container (`/.dockerenv`) | Persistent | UDS-only by default | Enabled |
|
||||||
|
| Kubernetes (`KUBERNETES_SERVICE_HOST`) | Persistent | UDS-only | Enabled |
|
||||||
|
| CI (`CI=true`, `GITHUB_ACTIONS`, etc.) | Ephemeral | UDS-only | Disabled by default (`[hooks] enabled = false` until opted-in) |
|
||||||
|
| RunPod (`RUNPOD_POD_ID`) | Ephemeral | UDS-only | Enabled |
|
||||||
|
|
||||||
|
Operator overrides any default with explicit flags; warning logged for
|
||||||
|
non-default-secure choices.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 17. Migration — same as v2 §17, plus broker schema add
|
||||||
|
|
||||||
|
Broker needs the schema delta in §4.3 (additive, partial unique indexes —
|
||||||
|
safe for online migration). Coordinated with daemon rollout: broker first,
|
||||||
|
then daemon. Daemon refuses to start against a broker that lacks
|
||||||
|
`client_message_id_dedupe` feature bit (§15).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review (round 3)
|
||||||
|
|
||||||
|
Round 1 → identity, IPC auth, exactly-once lie, hook tokens, surface bloat,
|
||||||
|
missing rotation/recovery/migration/threat-model.
|
||||||
|
|
||||||
|
Round 2 → boot-id false-positive, broker must dedupe on client id (protocol
|
||||||
|
change), CI shared-runner reality, feature-bit negotiation, key rotation
|
||||||
|
crypto, hook scopes, FTS schema, ~7 polish items.
|
||||||
|
|
||||||
|
This v3 attempts to address all of those. Specifically critique:
|
||||||
|
|
||||||
|
1. **Accidental-clone framing (§2.2)** — does the honest framing close the
|
||||||
|
issue, or does removing boot-id make the detection so weak it's not worth
|
||||||
|
shipping at all? Should we drop fingerprint detection entirely and rely on
|
||||||
|
broker concurrent-connection policy?
|
||||||
|
2. **Broker schema delta (§4.3)** — is this the smallest correct change?
|
||||||
|
Partial unique indexes feel right; anything else needed (audit table,
|
||||||
|
gc job)?
|
||||||
|
3. **`max_age_hours` reduced to 23h** — codex's logic says daemon outbox TTL
|
||||||
|
must be inside broker dedupe window. Is 23h vs 24h tight enough? Should
|
||||||
|
the broker advertise its dedupe window as a feature parameter so the
|
||||||
|
daemon configures itself?
|
||||||
|
4. **Hook scopes (§6.2)** — too tight? `reply:event` + `dm:send:<sender>` +
|
||||||
|
`topic:<name>:post`. Does this cover real use cases for v0.9.0 hooks
|
||||||
|
(auto-reply, escalate-to-oncall, file-receipt-ack)?
|
||||||
|
5. **Feature-bit negotiation (§15)** — is the scheme right? Should
|
||||||
|
feature-bits be string identifiers (current) or numeric bit positions in
|
||||||
|
a bitmask (denser, more brittle)?
|
||||||
|
6. **CI defaults (§16.3)** — is the table accurate? Anything wrong about
|
||||||
|
defaulting hooks-disabled in CI?
|
||||||
|
7. **Key rotation grace-key archive (§14.1)** — is 7d the right default? Is
|
||||||
|
storing archived private keys on disk (mode 0600) acceptable, or should
|
||||||
|
they be encrypted at rest with a passphrase?
|
||||||
|
8. **Anything still wrong?** Read it as if you were going to operate this
|
||||||
|
daemon for a year — what falls down?
|
||||||
|
|
||||||
|
Three options after this review:
|
||||||
|
- **(a) v3 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v4 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless. We can break anything.
|
||||||
538
.artifacts/shipped/2026-05-03-daemon-final-spec-v4.md
Normal file
@@ -0,0 +1,538 @@
|
|||||||
|
# `claudemesh daemon` — Final Spec v4
|
||||||
|
|
||||||
|
> **Round 4.** v3 was reviewed by codex (round 3) and got an overall pass on
|
||||||
|
> architecture but flagged three precision gaps: (1) broker dedupe window
|
||||||
|
> semantics — permanent or windowed? schema as drawn was permanent but the
|
||||||
|
> prose said 24h; (2) feature-bit negotiation should carry parameters, not
|
||||||
|
> just booleans (so daemon can derive its outbox TTL from broker policy
|
||||||
|
> instead of hardcoding 23h); (3) key-archive record format and retention
|
||||||
|
> behavior were unspecified. Plus minor polish: document machine-id/MAC
|
||||||
|
> source precedence per OS, explicitly defer arbitrary outbound hook sends,
|
||||||
|
> resolve RunPod identity-vs-hooks inconsistency.
|
||||||
|
>
|
||||||
|
> **The intent §0 is unchanged from v2 — read it there.** v4 only revises
|
||||||
|
> what changed from v3.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 0. Intent — unchanged, see v2 §0
|
||||||
|
|
||||||
|
Pre-launch peer-mesh runtime. Servers/laptops become first-class peers.
|
||||||
|
Stable identity, persistent WS, local IPC, hooks. Not a webhook gateway, not
|
||||||
|
a generic broker. We can break anything.
|
||||||
|
|
||||||
|
**One claim retracted from v1/v2**: "exactly-once" delivery. Replaced with a
|
||||||
|
precise contract in §4 below.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Process model — unchanged from v3 §1 / v2 §1
|
||||||
|
|
||||||
|
Resource caps, file layout, single-binary unchanged.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Identity — accidental-clone detection only, plus broker dedupe
|
||||||
|
|
||||||
|
Codex round-2 fix retained: no boot-id (false-positives every reboot).
|
||||||
|
Codex round-3 polish: spell out fingerprint sources per OS so we don't ship
|
||||||
|
a brittle "machine-id || first-mac" with no precedence rules.
|
||||||
|
|
||||||
|
### 2.1 Modes
|
||||||
|
|
||||||
|
```
|
||||||
|
claudemesh daemon up # default: persistent member
|
||||||
|
claudemesh daemon up --ephemeral # in-memory keypair, never written
|
||||||
|
claudemesh daemon up --ephemeral --ttl 2h # auto-shutdown after duration
|
||||||
|
```
|
||||||
|
|
||||||
|
**CI auto-detection**: if any of these env vars are set (`CI=true`,
|
||||||
|
`GITHUB_ACTIONS`, `GITLAB_CI`, `BUILDKITE`, `CIRCLECI`, `JENKINS_URL`,
|
||||||
|
`KUBERNETES_SERVICE_HOST`), AND `--persistent` is not explicitly passed,
|
||||||
|
daemon defaults to `--ephemeral`. Rationale in §16.
|
||||||
|
|
||||||
|
`RUNPOD_POD_ID` removed from auto-CI list (was inconsistent — see §16.3).
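
A sketch of the auto-detection rule, under two assumptions: `CI` is matched against the literal string `true`, and the other variables count as set when non-empty. The function names and the handling of explicit flags are illustrative.

```python
# Variables whose mere presence signals a CI-like environment.
CI_PRESENCE_VARS = (
    "GITHUB_ACTIONS",
    "GITLAB_CI",
    "BUILDKITE",
    "CIRCLECI",
    "JENKINS_URL",
    "KUBERNETES_SERVICE_HOST",
)


def detect_ci(env):
    """True if the environment mapping looks like CI."""
    if env.get("CI") == "true":
        return True
    return any(env.get(name) for name in CI_PRESENCE_VARS)


def default_mode(env, persistent=False, ephemeral=False):
    """Resolve the identity mode; explicit flags beat auto-detection."""
    if ephemeral:
        return "ephemeral"
    if persistent:
        return "persistent"
    return "ephemeral" if detect_ci(env) else "persistent"
```

In real use `env` would be `os.environ`; taking a mapping keeps the rule easy to test. `RUNPOD_POD_ID` is deliberately absent from the list.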
|
||||||
|
|
||||||
|
### 2.2 Accidental-clone detection (NOT attacker-grade)
|
||||||
|
|
||||||
|
This catches **image clones, restored backups, copy-pasted homedirs** —
|
||||||
|
accidents made by humans. It does not defend against an attacker who copies
|
||||||
|
both `keypair.json` and `host_fingerprint.json`. The threat model (§16) says
|
||||||
|
this explicitly.
|
||||||
|
|
||||||
|
#### 2.2.1 Fingerprint source precedence (NEW — codex r3)
|
||||||
|
|
||||||
|
`host_fingerprint.json` stores `sha256(host_id || stable_mac)` where the
|
||||||
|
inputs are computed from the OS-specific table below, in order:
|
||||||
|
|
||||||
|
| OS | `host_id` (try in order) | `stable_mac` |
|
||||||
|
|---|---|---|
|
||||||
|
| Linux | `/etc/machine-id` → `/var/lib/dbus/machine-id` → first stable MAC | First non-loopback non-virtual interface, lex-sorted by name (`en…`/`eth…` before `wl…`); `docker0/veth*/br-*/lo` excluded |
|
||||||
|
| macOS | `IOPlatformUUID` (`ioreg -rd1 -c IOPlatformExpertDevice`) | First non-loopback non-virtual interface (`en0` typical) |
|
||||||
|
| Windows | `HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid` | First physical adapter (`Get-NetAdapter -Physical`), MAC sorted lex by adapter name |
|
||||||
|
| BSD | `kern.hostuuid` (`sysctl -n kern.hostuuid`) | Same MAC rule as Linux |
|
||||||
|
|
||||||
|
**Excluded interfaces** (cross-platform): loopback, point-to-point tunnels
|
||||||
|
(tailscale*, wg*, utun*, ppp*), docker (docker0, br-*, veth*), VPN
|
||||||
|
(`tap*`/`tun*`), VM bridges (vboxnet*, vmnet*), Apple awdl/llw bridges.
|
||||||
|
|
||||||
|
**Cloud-image false-positive note**: bare AMIs/Azure images regenerate
|
||||||
|
`/etc/machine-id` on first boot via cloud-init; for those, the first-boot
|
||||||
|
fingerprint is what we keep. If an operator clones a *running* VM
|
||||||
|
post-cloud-init, both `host_id` AND first-MAC will collide → the daemon
|
||||||
|
correctly flags this as an accidental clone.
|
||||||
|
|
||||||
|
If `host_id` cannot be read on the host's OS, daemon logs
|
||||||
|
`fingerprint_host_id_unavailable` and falls back to MAC-only. If MAC also
|
||||||
|
unavailable (truly headless container with no NIC), daemon logs
|
||||||
|
`fingerprint_unavailable`, persists a random UUID as `host_id`, and the
|
||||||
|
clone-detection feature is effectively disabled for this host (broker
|
||||||
|
concurrent-connection policy still works).
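
The interface filter and the hash can be sketched as follows. Assumptions, none of which the spec fixes: `||` is read as plain string concatenation, inputs are UTF-8 encoded, the digest is stored as hex, and interface names and MACs arrive as a dict gathered elsewhere. The pattern list mirrors the excluded-interface paragraph above.

```python
import fnmatch
import hashlib

EXCLUDED_PATTERNS = (
    "lo", "lo0",                            # loopback
    "tailscale*", "wg*", "utun*", "ppp*",   # point-to-point tunnels
    "docker0", "br-*", "veth*",             # docker
    "tap*", "tun*",                         # VPN
    "vboxnet*", "vmnet*",                   # VM bridges
    "awdl*", "llw*",                        # Apple awdl/llw
)


def stable_mac(interfaces):
    """Pick the MAC of the first eligible interface, lex-sorted by name."""
    eligible = sorted(
        name for name in interfaces
        if not any(fnmatch.fnmatchcase(name, pat) for pat in EXCLUDED_PATTERNS)
    )
    return interfaces[eligible[0]] if eligible else None


def host_fingerprint(host_id, mac):
    """sha256(host_id || stable_mac); a missing input contributes nothing."""
    material = (host_id or "") + (mac or "")
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


interfaces = {
    "lo": "00:00:00:00:00:00",
    "docker0": "02:42:ac:11:00:01",
    "wlan0": "aa:aa:aa:aa:aa:aa",
    "enp3s0": "bb:bb:bb:bb:bb:bb",
}
mac = stable_mac(interfaces)
fingerprint = host_fingerprint("machine-id-example", mac)
```

Plain lexicographic sorting already places `en…` and `eth…` names before `wl…`, which is why no explicit priority table appears here.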
|
||||||
|
|
||||||
|
Behavior on mismatch (unchanged from v3): refuse / `accept-host` / `remint`.
|
||||||
|
`[clone] policy = "refuse" | "warn" | "allow"` overrides per host.
|
||||||
|
|
||||||
|
### 2.3 Concurrent-duplicate-identity broker policy — unchanged from v3 §2.3
|
||||||
|
|
||||||
|
`prefer_newest` (default), `prefer_oldest`, `allow_concurrent`. Configured
|
||||||
|
per-mesh in `mesh.cloneConcurrencyPolicy`.
|
||||||
|
|
||||||
|
### 2.4 Rename, key rotation — see §14
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. IPC surface — unchanged from v3 §3
|
||||||
|
|
||||||
|
Same frozen core, same auth model (UDS 0600 / TCP+SSE bearer / no token in
|
||||||
|
query / all endpoints auth by default / UDS-only in containers / Origin/Host
|
||||||
|
checks / no User-Agent theatre).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Delivery contract — at-least-once, **permanent** broker dedupe
|
||||||
|
|
||||||
|
Codex round 3 caught: v3's prose said "24h dedupe window" but the schema
|
||||||
|
(partial unique indexes with no `created_at`) gave **permanent** dedupe. We
|
||||||
|
have to pick. v4 chooses **permanent dedupe** because:
|
||||||
|
|
||||||
|
- It's the simplest correct choice. No GC job, no edge case where a
|
||||||
|
long-asleep daemon's retry slips past the window and double-sends.
|
||||||
|
- The unique index storage cost is bounded: at 1 KB per row × 100k
|
||||||
|
messages/day × 365 = ~36 GB/year of broker storage, which is well within
|
||||||
|
the broker's existing message-retention budget. Older message rows
|
||||||
|
themselves can still be GC'd by the existing message retention policy
|
||||||
|
(currently 365d) — only the `client_message_id` column on retained rows
|
||||||
|
has to live as long as that row does.
|
||||||
|
- It eliminates the daemon-side `max_age_hours = 23h` hack. Daemon outbox
|
||||||
|
TTL becomes "however long you want to keep retrying"; default 7d.
|
||||||
|
- It removes a class of "where exactly is the dedupe window edge?" bugs.
|
||||||
|
|
||||||
|
If broker storage growth becomes a real concern post-v0.9.0, we can convert
|
||||||
|
to a windowed scheme via a feature-bit upgrade (§15) — but we'd own the
|
||||||
|
correct migration semantics then.
|
||||||
|
|
||||||
|
### 4.1 The contract (precise)
|
||||||
|
|
||||||
|
> **Local guarantee**: each successful `POST /v1/send` returns a stable
|
||||||
|
> `client_message_id`. The send is durably persisted to `outbox.db` before
|
||||||
|
> the response returns.
|
||||||
|
>
|
||||||
|
> **Broker guarantee**: the broker dedupes on `client_message_id`
|
||||||
|
> **permanently within the lifetime of the row**. Multiple inflight retries
|
||||||
|
> from the daemon for the same `client_message_id` produce **at most one**
|
||||||
|
> broker-accepted row, regardless of time elapsed (subject to message-row
|
||||||
|
> retention policy on the broker). This is advertised via the
|
||||||
|
> `client_message_id_dedupe` feature-bit with `{ mode: "permanent" }`
|
||||||
|
> parameter (§15).
|
||||||
|
>
|
||||||
|
> **End-to-end guarantee**: at-least-once delivery to subscribers, with
|
||||||
|
> `client_message_id` propagated in the inbound envelope so receivers can
|
||||||
|
> dedupe locally. We do **not** guarantee at-most-once end-to-end —
|
||||||
|
> receiver-side dedupe is the receiver's job. The daemon's `inbox.db`
|
||||||
|
> provides it for daemon-hosted peers.
|
||||||
|
|
||||||
|
### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2
|
||||||
|
|
||||||
|
Sources: `Idempotency-Key` header → body `client_message_id` → daemon-minted
|
||||||
|
ulid. Stored in outbox UNIQUE NOT NULL, propagated to broker, propagated to
|
||||||
|
receivers.
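
The source precedence can be sketched as below. The spec mints a ulid; this sketch uses a `uuid4` hex string as a stand-in because Python's standard library has no ULID generator, and the function name is hypothetical.

```python
import uuid


def resolve_client_message_id(headers, body, mint=lambda: uuid.uuid4().hex):
    """Header wins, then the body field, then a daemon-minted id."""
    # HTTP header names are case-insensitive.
    for name, value in headers.items():
        if name.lower() == "idempotency-key" and value:
            return value
    if body.get("client_message_id"):
        return body["client_message_id"]
    return mint()  # stand-in for a ulid


from_header = resolve_client_message_id(
    {"Idempotency-Key": "hdr-key"}, {"client_message_id": "body-key"}
)
from_body = resolve_client_message_id({}, {"client_message_id": "body-key"})
minted = resolve_client_message_id({}, {}, mint=lambda: "minted-1")
```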
|
||||||
|
|
||||||
|
### 4.3 Broker schema delta — clarified as permanent dedupe
|
||||||
|
|
||||||
|
```sql
|
||||||
|
ALTER TABLE mesh.topic_message
|
||||||
|
ADD COLUMN client_message_id TEXT;
|
||||||
|
ALTER TABLE mesh.message_queue
|
||||||
|
ADD COLUMN client_message_id TEXT;
|
||||||
|
|
||||||
|
CREATE UNIQUE INDEX topic_message_client_id_idx
|
||||||
|
ON mesh.topic_message(mesh_id, client_message_id)
|
||||||
|
WHERE client_message_id IS NOT NULL;
|
||||||
|
CREATE UNIQUE INDEX message_queue_client_id_idx
|
||||||
|
ON mesh.message_queue(mesh_id, client_message_id)
|
||||||
|
WHERE client_message_id IS NOT NULL;
|
||||||
|
|
||||||
|
-- No created_at column needed for dedupe; the existing message row's
|
||||||
|
-- created_at handles row-level retention. Dedupe is permanent for the row's
|
||||||
|
-- lifetime, then naturally GC'd when the row is purged.
|
||||||
|
```
|
||||||
|
|
||||||
|
Partial unique indexes — legacy traffic without `client_message_id` (from
|
||||||
|
`claudemesh launch`, dashboard chat, web posts) is unaffected.
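
To illustrate the partial-index behaviour, here is a sketch using SQLite as a stand-in for the broker's Postgres (both support partial unique indexes). The trimmed table, the `accept` helper, and the use of `INSERT OR IGNORE` are assumptions made for this illustration; the real broker schema has more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topic_message (
      id                INTEGER PRIMARY KEY,
      mesh_id           TEXT NOT NULL,
      body              TEXT,
      client_message_id TEXT
    );
    CREATE UNIQUE INDEX topic_message_client_id_idx
      ON topic_message(mesh_id, client_message_id)
      WHERE client_message_id IS NOT NULL;
    """
)


def accept(mesh_id, body, client_message_id=None):
    """Return True if the broker stored a new row, False on a duplicate."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO topic_message (mesh_id, body, client_message_id) "
        "VALUES (?, ?, ?)",
        (mesh_id, body, client_message_id),
    )
    conn.commit()
    return cur.rowcount == 1


first = accept("mesh-a", "hello", "cmid-1")
retry = accept("mesh-a", "hello", "cmid-1")       # same mesh, same id: deduped
other_mesh = accept("mesh-b", "hello", "cmid-1")  # id is scoped per mesh
legacy_1 = accept("mesh-a", "legacy post")        # no id: index does not apply
legacy_2 = accept("mesh-a", "legacy post")
total = conn.execute("SELECT COUNT(*) FROM topic_message").fetchone()[0]
```

Rows without a `client_message_id` fall outside the index, so identical legacy posts are both stored.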
|
||||||
|
|
||||||
|
**Migration**: additive-only. On Postgres, `ALTER TABLE ... ADD COLUMN` takes a
brief `ACCESS EXCLUSIVE` lock on the table (adding a nullable column with no
default is a metadata-only change). The index build does not need to block
writes if it uses `CREATE UNIQUE INDEX CONCURRENTLY`. Deploy order: schema
migration → broker code that reads/writes `client_message_id` → daemon code
that sends it → daemon enforces feature bit.
|
||||||
|
|
||||||
|
### 4.4 Outbox schema — unchanged from v3 §4.4
|
||||||
|
|
||||||
|
`UNIQUE NOT NULL` on `client_message_id`. Default `max_age_hours` raised
|
||||||
|
back to **168h (7d)** because broker dedupe is permanent — no need to stay
|
||||||
|
inside a 24h window.
|
||||||
|
|
||||||
|
### 4.5 Inbox schema — unchanged from v3 §4.5
|
||||||
|
|
||||||
|
Content table + indexes; FTS5 deferred.
|
||||||
|
|
||||||
|
### 4.6 Crash recovery — unchanged from v3 §4.6
|
||||||
|
|
||||||
|
### 4.7 Failure modes — windowed-broker case removed
|
||||||
|
|
||||||
|
The "broker dedupe window expired" failure mode in v3 §4.7 is **deleted**
|
||||||
|
because dedupe is permanent. Remaining cases:
|
||||||
|
|
||||||
|
- **`dead` rows**: surface in `claudemesh daemon outbox --failed`. User
|
||||||
|
manually requeues (`outbox requeue <id>`) or drops (`outbox drop <id>`).
|
||||||
|
- **Receiver-side dedupe**: only daemon-hosted receivers dedupe.
|
||||||
|
`claudemesh launch` and dashboard chat don't dedupe today; post-v0.9.0.
|
||||||
|
- **Broker row already GC'd, daemon retries**: if a soft-delete tombstone
still holds the `client_message_id`, the retry hits the partial unique index
(Postgres SQLSTATE 23505, `unique_violation`); the broker treats it as
already accepted and returns the original `messageId` from the tombstone. If
the row was hard-deleted by retention, the broker returns `client_id_unknown`
instead. Daemon treats `client_id_unknown` as "delivered, history may have
been pruned" and marks `done`. Tombstone strategy is a broker implementation
choice (advertised via `client_message_id_dedupe.tombstone_retention_days` in
§15.1).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Inbound — unchanged from v3 §5
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Hooks — scopes tightened (codex r2), explicit deferment of arbitrary sends (codex r3)
|
||||||
|
|
||||||
|
### 6.1 Hooks contract — unchanged from v2 §6 / v3 §6.1
|
||||||
|
|
||||||
|
### 6.2 Capability scopes — narrowed for v0.9.0
|
||||||
|
|
||||||
|
| Scope | Capability | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| `reply:event` | Reply to the specific event that triggered this hook | Bound to `event_id`; daemon validates target; expires on hook exit |
|
||||||
|
| `dm:send:<sender_pubkey>` | Send DM only to the specific sender | Bound to one pubkey from the event; does not grant sending to anyone else |
|
||||||
|
| `topic:<name>:post` | Post to the specific topic that fired | Bound to topic from event; can't write elsewhere |
|
||||||
|
|
||||||
|
**No read scopes in v0.9.0.** Hooks read via the event payload (which the
|
||||||
|
daemon redacts appropriately), not via daemon-mediated reads.
|
||||||
|
|
||||||
|
**Explicitly deferred to post-v0.9.0** (codex r3 — say it out loud so use
|
||||||
|
cases don't pile up against an undocumented limit):
|
||||||
|
|
||||||
|
- **Arbitrary outbound `dm:send` to anyone other than the event sender** —
|
||||||
|
no scope grant for this. "Escalate to oncall" hooks must shell out to
|
||||||
|
`claudemesh send <oncall>` with the user's normal config; the daemon
|
||||||
|
doesn't issue capability tokens for arbitrary recipients.
|
||||||
|
- **Cross-topic post** — a hook firing on `topic:alerts` cannot post to
|
||||||
|
`topic:incidents`. Same reason.
|
||||||
|
- **Cross-mesh post**: hooks see one mesh at a time.
|
||||||
|
- **Reading state/inbox/peers** — covered above.
|
||||||
|
|
||||||
|
If a real use case demands cross-topic or arbitrary-recipient hooks
|
||||||
|
post-v0.9.0, we add scopes like `dm:send:*` (wildcard) or
|
||||||
|
`topic:*:post` (wildcard) and gate them behind explicit operator opt-in in
|
||||||
|
config (`[hooks.<name>] dangerous_wildcards = true`). Not in v0.9.0.
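
A sketch of how the three v0.9.0 scopes might be minted and checked for a single hook invocation. The token shape and the function names are hypothetical; only the scope strings come from the table above.

```python
def mint_scopes(event):
    """Build the scope set for one hook run, bound to the triggering event."""
    scopes = {"reply:event", "dm:send:" + event["sender_pubkey"]}
    if event.get("topic"):
        scopes.add("topic:" + event["topic"] + ":post")
    return {"event_id": event["event_id"], "scopes": frozenset(scopes)}


def authorize(token, action, event_id):
    """Exact-match check: no wildcards and no read scopes in v0.9.0."""
    if event_id != token["event_id"]:
        return False  # token is bound to the event that fired the hook
    return action in token["scopes"]


event = {"event_id": "evt-1", "sender_pubkey": "pk-alice", "topic": "alerts"}
token = mint_scopes(event)

same_topic = authorize(token, "topic:alerts:post", "evt-1")
cross_topic = authorize(token, "topic:incidents:post", "evt-1")
other_recipient = authorize(token, "dm:send:pk-oncall", "evt-1")
wildcard = authorize(token, "dm:send:*", "evt-1")
wrong_event = authorize(token, "reply:event", "evt-2")
```

An "escalate to oncall" hook fails the `other_recipient` check here, which is why it has to shell out to `claudemesh send` instead.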
|
||||||
|
|
||||||
|
### 6.3 Sandboxing — unchanged from v3 §6.3
|
||||||
|
|
||||||
|
Best-effort `network_policy = "deny"`; cross-platform unenforceability
|
||||||
|
acknowledged; counter `cm_daemon_hook_unenforceable_total` exposed.
|
||||||
|
|
||||||
|
### 6.4 Payload size & truncation — unchanged from v3 §6.4
|
||||||
|
|
||||||
|
### 6.5 Audit log + killpg — unchanged
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Multi-mesh — unchanged
|
||||||
|
|
||||||
|
## 8. Auto-routing — unchanged
|
||||||
|
|
||||||
|
## 9. Service installation — unchanged
|
||||||
|
|
||||||
|
## 10. Observability — unchanged
|
||||||
|
|
||||||
|
## 11. SDKs — unchanged
|
||||||
|
|
||||||
|
## 12. Security model — unchanged
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 13. Configuration — unchanged shape, plus parameterized features
|
||||||
|
|
||||||
|
```toml
[features]
require = [
  "client_message_id_dedupe",      # broker provides §4.1 contract
  "concurrent_connection_policy",  # broker honours mesh.cloneConcurrencyPolicy
]
optional = ["mesh_skill_share", "mcp_host"]
# Daemon refuses to start if broker doesn't advertise all `require` bits.
# Broker advertises feature parameters in the negotiation response (§15.1)
# — daemon picks up `dedupe_mode` and `tombstone_retention_days` from there
# and writes them to its runtime view, not config.
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 14. Lifecycle — key rotation crypto fixed (codex r2), archive format spec'd (codex r3)
|
||||||
|
|
||||||
|
### 14.1 Key rotation — crypto correct (codex r2)
|
||||||
|
|
||||||
|
`claudemesh daemon rotate-keypair`:
|
||||||
|
|
||||||
|
- Mints fresh ed25519 + x25519 keypairs.
|
||||||
|
- Registers new pubkeys with the broker as `member_keypair_rotated` event.
|
||||||
|
- Broker associates the new pubkey with the same member id, marks the old
|
||||||
|
pubkey as `rotated_out` (not revoked); senders who haven't received the
|
||||||
|
rotation event continue to encrypt to the old pubkey for a grace window.
|
||||||
|
- Daemon retains the old x25519 **private** key (only x25519 — ed25519 is
|
||||||
|
for signing, doesn't need a grace window) in `keypair-archive.json`.
|
||||||
|
- During grace, decrypt path: try current private key first; on
|
||||||
|
`crypto_box_open_easy` failure, walk archived keys in order. Successful
|
||||||
|
archived-key decrypts increment `cm_daemon_decrypt_archived_total`.
|
||||||
|
- After grace expiry, archived keys are zeroed and the file is rewritten
|
||||||
|
without them. Messages still encrypted to a fully-expired pubkey fail to
|
||||||
|
decrypt and increment `cm_daemon_decrypt_stale_total`.
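
The grace-window decrypt path described above can be sketched as follows. This is a minimal illustration, not the daemon's implementation: `open_box` is a hypothetical stand-in for a `crypto_box_open_easy` binding that returns `None` on failure, and counting every total failure as "stale" is a simplifying assumption.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

# Hypothetical signature: (ciphertext, private_key) -> plaintext, or None on failure.
OpenBox = Callable[[bytes, bytes], Optional[bytes]]


def decrypt_with_grace(
    ciphertext: bytes,
    current_key: bytes,
    archived_keys: Sequence[bytes],
    open_box: OpenBox,
    metrics: Counter,
) -> Optional[bytes]:
    """Try the current private key first, then archived keys in order."""
    plaintext = open_box(ciphertext, current_key)
    if plaintext is not None:
        return plaintext
    for key in archived_keys:
        plaintext = open_box(ciphertext, key)
        if plaintext is not None:
            metrics["cm_daemon_decrypt_archived_total"] += 1
            return plaintext
    # Assumption: any message no retained key can open is counted as stale.
    metrics["cm_daemon_decrypt_stale_total"] += 1
    return None
```

Passing the decrypt primitive in as a callable keeps the sketch independent of any particular crypto binding.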
|
||||||
|
|
||||||
|
#### 14.1.1 Archive record format (NEW — codex r3)
|
||||||
|
|
||||||
|
`keypair-archive.json` (mode 0600, atomic-rename writes):
|
||||||
|
|
||||||
|
```json
{
  "schema_version": 1,
  "max_archived_keys": 8,
  "keys": [
    {
      "pubkey": "ed25519-base64...",
      "x25519_pubkey": "base64...",
      "x25519_privkey": "base64...",           // sensitive; whole file is 0600
      "key_id": "k_01HQX...",                  // ulid; matches broker's record
      "created_at": "2026-04-12T11:00:00Z",
      "rotated_out_at": "2026-05-03T16:00:00Z",
      "expires_at": "2026-05-10T16:00:00Z"     // rotated_out_at + grace
    }
  ]
}
```
|
||||||
|
|
||||||
|
Rules:
|
||||||
|
|
||||||
|
- **`max_archived_keys`** (default 8): cap on archive size. If a rotation
|
||||||
|
would push the archive past the cap, the oldest entry is force-expired
|
||||||
|
(zeroed + removed) regardless of `expires_at`. Force-expiry increments
|
||||||
|
`cm_daemon_archive_force_expired_total{key_id}`. Operator who rotates
|
||||||
|
faster than 8 keys per grace-window-duration is intentionally accepting
|
||||||
|
decryption gaps for very-late inbound messages encrypted to those keys.
|
||||||
|
- **Grace period default**: 7 days. Configurable via
|
||||||
|
`[crypto] key_grace_period_days = 7`. Hard cap 30 days (codex review:
|
||||||
|
unbounded grace = unbounded archive on disk = bigger blast radius if
|
||||||
|
daemon host is compromised mid-life).
|
||||||
|
- **Cleanup**: scheduled daily at midnight local time + on-demand via
|
||||||
|
`claudemesh daemon archive-cleanup`. Walks `keys[]`, drops anything with
|
||||||
|
`expires_at < now`. If `keys[]` is empty after cleanup, the file is deleted.
|
||||||
|
- **Archive write failure**: rotation is aborted. Daemon refuses to commit
|
||||||
|
the new keypair if the archive can't be written durably. Logged as
|
||||||
|
`key_rotation_aborted_archive_write_failed`. New keypair is in memory
|
||||||
|
only; restart returns to old keypair. This is intentional: the archive
|
||||||
|
write is the durability point of rotation.
|
||||||
|
- **At-rest encryption**: archive file is mode 0600 plaintext, same threat
|
||||||
|
model as `keypair.json` (root-on-host can read both anyway). Operators
|
||||||
|
who want disk-level encryption can put `~/.claudemesh/` on an encrypted
|
||||||
|
volume; we don't reinvent that. Documented in the threat model (§16).
|
||||||
|
Future option `--archive-passphrase` deferred — adds passphrase prompt to
|
||||||
|
rotation/decrypt path, but breaks unattended daemon restart.
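
The cleanup and force-expiry rules above could look roughly like this. It is a sketch under assumptions: "oldest" is taken to mean the earliest `rotated_out_at`, and the helper names are hypothetical. File I/O, zeroing, and counter updates are left to the caller.

```python
from datetime import datetime, timezone
from typing import List, Tuple


def parse_ts(value: str) -> datetime:
    # Normalise the trailing "Z" so older Python versions can parse it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def cleanup_expired(keys: List[dict], now: datetime) -> List[dict]:
    """Drop every entry whose expires_at is strictly before now."""
    return [k for k in keys if parse_ts(k["expires_at"]) >= now]


def add_rotated_key(
    keys: List[dict], new_entry: dict, max_archived_keys: int = 8
) -> Tuple[List[dict], List[str]]:
    """Append new_entry; force-expire the oldest entries beyond the cap.

    Returns (kept_entries, force_expired_key_ids). Ordering by
    rotated_out_at is an assumption about what "oldest" means.
    """
    ordered = sorted(keys + [new_entry], key=lambda k: parse_ts(k["rotated_out_at"]))
    overflow = max(0, len(ordered) - max_archived_keys)
    return ordered[overflow:], [k["key_id"] for k in ordered[:overflow]]
```

Each id in the second return value would correspond to one increment of `cm_daemon_archive_force_expired_total{key_id}`.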
|
||||||
|
|
||||||
|
### 14.2 Backup includes topic state — unchanged from v3 §14.2
|
||||||
|
|
||||||
|
`keypair.json`, `keypair-archive.json` (with all archived keys),
|
||||||
|
`host_fingerprint.json`, `config.toml`, `topic_subscriptions.json`,
|
||||||
|
`topic_keys.json`, `key_epoch.json`, `schema_version`.
|
||||||
|
|
||||||
|
`local_token` NOT included; regenerated on restore.
|
||||||
|
|
||||||
|
### 14.3 Local token rotation, compromised host revocation, image-clone, uninstall, recovery — unchanged from v2 §14.3
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 15. Version compat — feature-bit negotiation with **parameters** (codex r3)
|
||||||
|
|
||||||
|
v3's feature bits were boolean. Codex r3: dedupe-window, max-payload, key
|
||||||
|
epochs all need parameters. v4 makes feature bits string-keyed entries that
|
||||||
|
optionally carry a value.
|
||||||
|
|
||||||
|
### 15.1 Feature bits with parameters
|
||||||
|
|
||||||
|
| Bit | Type | Parameters | Notes |
|---|---|---|---|
| `client_message_id_dedupe` | object | `{ mode: "permanent"\|"windowed", window_hours?: int, tombstone_retention_days: int }` | Daemon reads `mode` to decide whether to enforce its own outbox max-age cap. `tombstone_retention_days` (broker-controlled) tells daemon how long it can expect "already-accepted" replies after the source row is GC'd |
| `concurrent_connection_policy` | bool | — | Broker honours `mesh.cloneConcurrencyPolicy` |
| `member_keypair_rotated_event` | bool | — | Broker emits the event |
| `key_epoch` | object | `{ max_concurrent_epochs: int }` | Per-topic key epochs supported |
| `max_payload` | object | `{ inline_bytes: int, blob_bytes: int }` | Hard limits broker enforces |
| `mesh_skill_share` | bool | — | Future |
| `mcp_host` | bool | — | Future |
|
||||||
|
|
||||||
|
### 15.2 Negotiation handshake (parameterized)
|
||||||
|
|
||||||
|
On WS connect, after hello, before normal traffic:
|
||||||
|
|
||||||
|
```
→ daemon: feature_negotiation_request
  {
    require: ["client_message_id_dedupe",
              "concurrent_connection_policy"],
    optional: ["mesh_skill_share","mcp_host","max_payload"]
  }

← broker: feature_negotiation_response
  {
    supported: {
      "client_message_id_dedupe": {
        "mode": "permanent",
        "tombstone_retention_days": 30
      },
      "concurrent_connection_policy": true,
      "member_keypair_rotated_event": true,
      "max_payload": {
        "inline_bytes": 65536,
        "blob_bytes": 524288000
      }
    },
    missing_required: []
  }
```
|
||||||
|
|
||||||
|
If `missing_required` is non-empty, daemon closes the connection with code
|
||||||
|
4010 `feature_unavailable`, logs forensic event, exits non-zero. Supervisor
|
||||||
|
sees a restart-loop → operator alert.
|
||||||
|
|
||||||
|
If `client_message_id_dedupe.mode == "windowed"`, daemon reads
|
||||||
|
`window_hours` and configures its outbox `max_age_hours` to
|
||||||
|
`window_hours - 1` (margin) instead of the 168h default. Permanent mode →
|
||||||
|
daemon uses the config default, no override.
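
As a rough sketch of how a daemon might act on the v4 negotiation response: the function and exception names here are hypothetical, and only the close code 4010, the `window_hours - 1` margin, and the 168h default come from the text above.

```python
class FeatureUnavailable(Exception):
    """Raised when the broker lacks a required feature (WS close code 4010)."""

    close_code = 4010


def apply_negotiation(response: dict, config_max_age_hours: int = 168) -> int:
    """Return the outbox max_age_hours to use, or raise FeatureUnavailable."""
    missing = response.get("missing_required", [])
    if missing:
        # The caller would close the connection, log, and exit non-zero.
        raise FeatureUnavailable("feature_unavailable: " + ", ".join(missing))
    dedupe = response["supported"]["client_message_id_dedupe"]
    if dedupe["mode"] == "windowed":
        return dedupe["window_hours"] - 1  # one hour of margin
    return config_max_age_hours  # permanent mode: no override
```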
|
||||||
|
|
||||||
|
### 15.3 IPC negotiation — unchanged from v3 §15.3
|
||||||
|
|
||||||
|
`GET /v1/version` returns daemon version, IPC features, schema version, and
|
||||||
|
the **parsed** broker feature parameters (so SDKs querying the daemon can
|
||||||
|
display them).
|
||||||
|
|
||||||
|
### 15.4 Compatibility matrix — unchanged from v3 §15.4
|
||||||
|
|
||||||
|
Published at `GET /v1/compat`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 16. Threat model — unchanged from v3 §16, plus RunPod fix
|
||||||
|
|
||||||
|
### 16.1 Attacker classes — unchanged
|
||||||
|
|
||||||
|
### 16.2 Out of scope — unchanged
|
||||||
|
|
||||||
|
### 16.3 Container & CI defaults table (RunPod inconsistency fixed)
|
||||||
|
|
||||||
|
| Environment | Identity | IPC | Hooks | Rationale |
|---|---|---|---|---|
| Bare metal / VM (default) | Persistent (clone-detected) | UDS + TCP loopback | Enabled | Trusted operator-owned host |
| Docker container (`/.dockerenv`) | Persistent | UDS-only by default | Enabled | Single-tenant container, host loopback shared |
| Kubernetes (`KUBERNETES_SERVICE_HOST`) | Persistent | UDS-only | Enabled | Single pod = single tenant |
| CI (`CI=true`, `GITHUB_ACTIONS`, etc.) | Ephemeral | UDS-only | Disabled by default (`[hooks] enabled = false`) | Multi-tenant runner; arbitrary code; ephemeral identity = no cross-job leak; hooks disabled because CI workloads are arbitrary user code |
| RunPod (`RUNPOD_POD_ID`) | Persistent | UDS-only | Enabled | Long-lived single-tenant sandbox; user owns the pod for its lifetime; identical trust model to a Docker container, NOT to a CI runner |
|
||||||
|
|
||||||
|
**RunPod resolution (codex r3)**: v3 listed RunPod under both "ephemeral
|
||||||
|
identity" and "hooks enabled" which was contradictory. v4 treats RunPod as
|
||||||
|
a **single-tenant container** (Docker-like): persistent identity, UDS-only,
|
||||||
|
hooks enabled. RunPod is removed from the CI auto-detect list (§2.1).
|
||||||
|
Operators who run RunPod as multi-tenant sandbox-as-CI can opt in with
|
||||||
|
`--ephemeral` + `[hooks] enabled = false` explicitly.
|
||||||
|
|
||||||
|
Operator overrides any default with explicit flags; warning logged for
|
||||||
|
non-default-secure choices.
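
One way to read the defaults table is as a small detection function. The sketch below is illustrative only: the `Defaults` type and function name are hypothetical, and the check order (CI markers first, so the most restrictive profile wins when several markers are present) is an assumption the spec does not state.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Defaults:
    identity: str       # "persistent" or "ephemeral"
    ipc: str            # "uds" or "uds+tcp-loopback"
    hooks_enabled: bool


def detect_defaults(env: Mapping[str, str], dockerenv_exists: bool) -> Defaults:
    """Map environment markers to the defaults in the table above."""
    # Assumed precedence: CI markers win over container markers.
    if env.get("CI") == "true" or "GITHUB_ACTIONS" in env:
        return Defaults("ephemeral", "uds", False)
    if "RUNPOD_POD_ID" in env:
        return Defaults("persistent", "uds", True)
    if "KUBERNETES_SERVICE_HOST" in env:
        return Defaults("persistent", "uds", True)
    if dockerenv_exists:  # caller checks for /.dockerenv
        return Defaults("persistent", "uds", True)
    return Defaults("persistent", "uds+tcp-loopback", True)
```

Explicit operator flags would be applied on top of whatever this returns.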
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 17. Migration — unchanged from v3 §17
|
||||||
|
|
||||||
|
Broker schema delta (additive partial unique indexes, safe online),
|
||||||
|
deployed before daemon. Daemon refuses to start if `client_message_id_dedupe`
|
||||||
|
feature bit is missing from broker's negotiation response.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What changed v3 → v4 (codex round-3 actionable items)
|
||||||
|
|
||||||
|
| Codex r3 item | v4 fix | Section |
|---|---|---|
| Broker dedupe window: permanent vs windowed? | **Picked permanent**; schema clarified; outbox `max_age_hours` raised back to 168h | §4 |
| Feature bits should be parameterized | All feature bits are string-keyed with optional value object | §15.1, §15.2 |
| Key archive record format unspecified | Full schema with `key_id`, timestamps, `max_archived_keys`, force-expiry rule, write-failure semantics | §14.1.1 |
| Document fingerprint source precedence per OS | Per-OS table for `host_id` and stable MAC; cloud-image false-positive note | §2.2.1 |
| Explicit deferment of arbitrary outbound hook sends | Listed deferred capabilities + escape hatch path post-v0.9.0 | §6.2 |
| RunPod ephemeral-but-hooks-enabled inconsistency | RunPod treated as single-tenant container; removed from CI auto-detect | §2.1, §16.3 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review (round 4)
|
||||||
|
|
||||||
|
Round 1 → identity, IPC auth, exactly-once lie, hook tokens, surface bloat,
|
||||||
|
missing rotation/recovery/migration/threat-model.
|
||||||
|
|
||||||
|
Round 2 → boot-id false-positive, broker must dedupe on client id, CI
|
||||||
|
shared-runner reality, feature-bit negotiation, key rotation crypto, hook
|
||||||
|
scopes, FTS schema, ~7 polish items.
|
||||||
|
|
||||||
|
Round 3 → dedupe window semantics, feature-bit parameters, key archive
|
||||||
|
record format, fingerprint source precedence, deferred hook scopes, RunPod
|
||||||
|
inconsistency.
|
||||||
|
|
||||||
|
This v4 attempts to address all of round 3. Specifically:
|
||||||
|
|
||||||
|
1. **Permanent dedupe choice (§4)** — does the storage-cost calculus hold?
|
||||||
|
Is the tombstone path (`client_id_unknown` after row GC) actually
|
||||||
|
workable, or does it need to be a real tombstone table?
|
||||||
|
2. **Feature parameter shape (§15.1)** — is the type system right (object
|
||||||
|
with optional value)? Should it be a flat key-value list instead?
|
||||||
|
Versioning of parameters within a feature?
|
||||||
|
3. **Archive record format (§14.1.1)** — anything missing? Is
|
||||||
|
`max_archived_keys=8` a sensible default, or should it be unbounded with
|
||||||
|
a force-expiry on storage size instead of count?
|
||||||
|
4. **Fingerprint per-OS table (§2.2.1)** — accurate? Is BSD worth listing
|
||||||
|
if we're not actively building for FreeBSD in v0.9.0?
|
||||||
|
5. **Hook deferment list (§6.2)** — does it cover all the realistic v0.9.0
|
||||||
|
ask? Is the "shell out to `claudemesh send`" workaround for escalation
|
||||||
|
ergonomically acceptable?
|
||||||
|
6. **RunPod resolution (§16.3)** — agree with treating RunPod as
|
||||||
|
single-tenant container? Or are there real multi-tenant RunPod
|
||||||
|
deployments we should default-guard against?
|
||||||
|
7. **Anything else still wrong?** Read it as if you were going to operate
|
||||||
|
this for a year. What falls down?
|
||||||
|
|
||||||
|
Three options after this review:
|
||||||
|
- **(a) v4 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v5 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless. We can break anything.
|
||||||
468
.artifacts/shipped/2026-05-03-daemon-final-spec-v5.md
Normal file
@@ -0,0 +1,468 @@
|
|||||||
|
# `claudemesh daemon` — Final Spec v5
|
||||||
|
|
||||||
|
> **Round 5.** v4 was reviewed by codex (round 4) and got an architectural
|
||||||
|
> pass but flagged one blocker plus four polish items.
|
||||||
|
>
|
||||||
|
> **Blocker**: §4 called dedupe "permanent" while also saying it disappears
|
||||||
|
> when retained rows are hard-deleted. Internally inconsistent. Fix: real
|
||||||
|
> broker-side dedupe/tombstone table independent of message retention.
|
||||||
|
>
|
||||||
|
> **Polish**: (a) rename `mode: "permanent"` to `retention_scoped`; (b)
|
||||||
|
> deterministic duplicate-response shape; (c) feature-parameter schema
|
||||||
|
> validation rules + per-feature parameter version; (d) drop
|
||||||
|
> "zeroed/secure-delete" promises in archive cleanup, define malformed-archive
|
||||||
|
> startup behavior; plus Linux MAC||MAC self-collision noted, RunPod warning
|
||||||
|
> log on persistent default.
|
||||||
|
>
|
||||||
|
> **Intent §0 unchanged from v2.** v5 only revises what changed from v4.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 0. Intent — unchanged, see v2 §0
|
||||||
|
|
||||||
|
Pre-launch peer-mesh runtime. Servers/laptops become first-class peers.
|
||||||
|
Stable identity, persistent WS, local IPC, hooks. Not a webhook gateway, not
|
||||||
|
a generic broker. We can break anything.
|
||||||
|
|
||||||
|
**One claim retracted from v1/v2**: "exactly-once" delivery. Replaced with a
|
||||||
|
precise contract in §4.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Process model — unchanged from v3 §1 / v2 §1
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Identity — accidental-clone detection only
|
||||||
|
|
||||||
|
### 2.1 Modes — unchanged from v4 §2.1, RunPod warning added
|
||||||
|
|
||||||
|
When `RUNPOD_POD_ID` is set and identity is persistent (the default for
|
||||||
|
RunPod under v4 §16.3), daemon logs `runpod_persistent_default_assumed` at
|
||||||
|
INFO. Operators running RunPod as multi-tenant CI surface set `--ephemeral`
|
||||||
|
explicitly; the log line makes the default visible in case the assumption
|
||||||
|
doesn't fit their deployment.
|
||||||
|
|
||||||
|
### 2.2 Accidental-clone detection — unchanged from v4 §2.2
|
||||||
|
|
||||||
|
#### 2.2.1 Fingerprint source precedence — unchanged from v4 §2.2.1, with self-collision note
|
||||||
|
|
||||||
|
**Linux MAC-only fallback (NEW note)**: when `/etc/machine-id` is unreadable
|
||||||
|
and we fall back to MAC-only as `host_id`, the resulting fingerprint is
|
||||||
|
effectively `sha256(mac || mac)`. This is acceptable for clone detection
|
||||||
|
(still uniquely identifies *this* host's first-NIC MAC) but reduces entropy
|
||||||
|
to at most 48 bits (the width of a MAC address). Operators who want stronger fingerprinting in degraded
|
||||||
|
environments can persist a generated UUID via `host_fingerprint.id_override`
|
||||||
|
in config; documented but not required.
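
To make the self-collision concrete, here is a small sketch of the fallback. The byte encoding (UTF-8 string concatenation) and the precedence of `id_override` over `/etc/machine-id` are assumptions made for illustration; only the `sha256(host_id || mac)` shape comes from the note above.

```python
import hashlib
from typing import Optional


def host_fingerprint(
    machine_id: Optional[str], mac: str, id_override: Optional[str] = None
) -> str:
    """Return sha256(host_id || mac) as hex.

    host_id falls back to the MAC itself when neither an override nor a
    machine id is available, which yields sha256(mac || mac).
    """
    host_id = id_override or machine_id or mac
    return hashlib.sha256((host_id + mac).encode("utf-8")).hexdigest()
```

In the degraded case the only input is the MAC, so the fingerprint carries no more entropy than the MAC does.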
|
||||||
|
|
||||||
|
### 2.3 Concurrent-duplicate-identity broker policy — unchanged from v3 §2.3
|
||||||
|
|
||||||
|
### 2.4 Rename, key rotation — see §14
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. IPC surface — unchanged from v4 §3
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Delivery contract — at-least-once, **dedupe table**, retention-scoped
|
||||||
|
|
||||||
|
Codex round 4 caught: v4 said "permanent" but also said dedupe disappears
|
||||||
|
when message rows are hard-deleted. That's `retention_scoped`, not
|
||||||
|
permanent — and worse, the partial-unique-index design fails when the row
|
||||||
|
itself is gone. v5 introduces a real broker-side dedupe table with its own
|
||||||
|
retention policy, independent of message retention.
|
||||||
|
|
||||||
|
### 4.1 The contract (precise)
|
||||||
|
|
||||||
|
> **Local guarantee**: each successful `POST /v1/send` returns a stable
|
||||||
|
> `client_message_id`. The send is durably persisted to `outbox.db` before
|
||||||
|
> the response returns.
|
||||||
|
>
|
||||||
|
> **Broker guarantee**: the broker maintains a dedupe record for every
|
||||||
|
> accepted `client_message_id` in a dedicated table
|
||||||
|
> (`mesh.client_message_dedupe`). The dedupe record outlives the message
|
||||||
|
> row when the dedupe-retention policy is longer than the
|
||||||
|
> message-retention policy. While the dedupe record exists, all retries
|
||||||
|
> with that `client_message_id` collapse to the original
|
||||||
|
> `broker_message_id` deterministically. After the dedupe record expires,
|
||||||
|
> a retry would create a new message — but daemon outbox `max_age_hours`
|
||||||
|
> is configured against the broker's advertised `dedupe_retention_days`
|
||||||
|
> with margin (§15.1), so this should not happen in practice.
|
||||||
|
>
|
||||||
|
> **End-to-end guarantee**: at-least-once delivery to subscribers, with
|
||||||
|
> `client_message_id` propagated in the inbound envelope. Receiver-side
|
||||||
|
> dedupe is the receiver's job; the daemon's `inbox.db` provides it for
|
||||||
|
> daemon-hosted peers.
|
||||||
|
|
||||||
|
### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2
|
||||||
|
|
||||||
|
Sources: `Idempotency-Key` header → body `client_message_id` → daemon ulid.
|
||||||
|
Stored in outbox UNIQUE NOT NULL, propagated to broker, propagated to
|
||||||
|
receivers in inbound envelope.
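
The source precedence can be sketched as a single resolver. This is a hedged illustration: the function name is hypothetical, header matching is assumed to be case-insensitive, and `uuid4` stands in for the daemon's ULID generator because the Python standard library has no ULID implementation.

```python
import uuid
from typing import Callable, Mapping


def resolve_client_message_id(
    headers: Mapping[str, str],
    body: Mapping[str, object],
    generate: Callable[[], str] = lambda: uuid.uuid4().hex,  # stand-in for a ULID
) -> str:
    """Idempotency-Key header, then body field, then a daemon-generated id."""
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "idempotency-key"), None
    )
    if header_value:
        return header_value
    body_value = body.get("client_message_id")
    if body_value:
        return str(body_value)
    return generate()
```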
|
||||||
|
|
||||||
|
### 4.3 Broker schema — dedupe table separate from message rows (v5)
|
||||||
|
|
||||||
|
```sql
-- The dedupe authority. One row per (mesh, client_message_id) accepted
-- by the broker. Outlives mesh.topic_message rows when dedupe retention >
-- message retention.
CREATE TABLE mesh.client_message_dedupe (
  mesh_id            UUID NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
  client_message_id  TEXT NOT NULL,
  broker_message_id  UUID NOT NULL,                      -- the original accepted message id
  destination_kind   TEXT NOT NULL CHECK(destination_kind IN ('topic','dm','queue')),
  destination_ref    TEXT NOT NULL,                      -- topic name, recipient pubkey, etc.
  first_seen_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at         TIMESTAMPTZ,                        -- NULL = never expires (operator opt-in)
  status             TEXT NOT NULL CHECK(status IN ('accepted','rejected')),
  history_available  BOOLEAN NOT NULL DEFAULT TRUE,      -- flipped FALSE when message row GC'd
  PRIMARY KEY (mesh_id, client_message_id)
);

CREATE INDEX client_message_dedupe_expires_idx
  ON mesh.client_message_dedupe(expires_at)
  WHERE expires_at IS NOT NULL;

-- Existing tables get the convenience back-pointer (for receiver
-- inclusion in delivered envelopes); UNIQUE NOT enforced here — the
-- dedupe table is the authority.
ALTER TABLE mesh.topic_message ADD COLUMN client_message_id TEXT;
ALTER TABLE mesh.message_queue ADD COLUMN client_message_id TEXT;
```
|
||||||
|
|
||||||
|
**Retention semantics**:
|
||||||
|
|
||||||
|
- `expires_at = NULL` → dedupe row never expires unless mesh is deleted.
|
||||||
|
Operator opts in via mesh setting `dedupeRetentionMode = "permanent"`.
|
||||||
|
- `expires_at = first_seen_at + dedupe_retention_days` → default
|
||||||
|
`retention_scoped` mode. Default value: 365 days. Configurable per-mesh.
|
||||||
|
- A nightly broker job deletes rows where `expires_at < NOW()`.
|
||||||
|
- A separate broker job, fired when the message-retention sweep hard-deletes
|
||||||
|
a `mesh.topic_message` or `mesh.message_queue` row, sets the corresponding
|
||||||
|
dedupe row's `history_available = FALSE`. The dedupe row stays — only the
|
||||||
|
payload is gone. Retries still collapse correctly; receiver requests for
|
||||||
|
history return "row pruned" deterministically (§4.4 below).
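
The two retention modes reduce to a small calculation of `expires_at`. The sketch below assumes the mode names from the list above and uses hypothetical function names; it is not the broker's code.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def dedupe_expires_at(
    first_seen_at: datetime, mode: str, dedupe_retention_days: int = 365
) -> Optional[datetime]:
    """Return expires_at for a new dedupe row; None means it never expires."""
    if mode == "permanent":
        return None
    if mode == "retention_scoped":
        return first_seen_at + timedelta(days=dedupe_retention_days)
    raise ValueError(f"unknown dedupe retention mode: {mode!r}")


def is_expired(expires_at: Optional[datetime], now: datetime) -> bool:
    """Mirror the nightly job's predicate: expires_at < NOW()."""
    return expires_at is not None and expires_at < now
```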
|
||||||
|
|
||||||
|
**Migration**: additive-only. Daemon refuses to start unless broker
|
||||||
|
advertises feature `client_message_id_dedupe` with `mode` of
|
||||||
|
`retention_scoped` or `permanent` (§15.1).
|
||||||
|
|
||||||
|
### 4.4 Duplicate response — deterministic shape (NEW v5 — codex r4)
|
||||||
|
|
||||||
|
When the broker sees a send with a `client_message_id` already in
|
||||||
|
`mesh.client_message_dedupe`, the response is deterministic:
|
||||||
|
|
||||||
|
```json
{
  "broker_message_id": "msg_01HQX...",
  "client_message_id": "cmid_01HQX...",
  "duplicate": true,
  "history_available": true,           // false if message row was GC'd
  "first_seen_at": "2026-05-03T11:42:00Z",
  "destination_kind": "topic",
  "destination_ref": "alerts"
}
```
|
||||||
|
|
||||||
|
Daemon outcomes:
|
||||||
|
|
||||||
|
- `duplicate: true, history_available: true` → mark outbox row `done`,
|
||||||
|
store `broker_message_id`. No re-fanout (broker did the work the first
|
||||||
|
time).
|
||||||
|
- `duplicate: true, history_available: false` → mark outbox row `done` but
|
||||||
|
increment `cm_daemon_dedupe_history_pruned_total`. The message *did* deliver
|
||||||
|
the first time; we just can't show it in history. Receivers who needed
|
||||||
|
it have it; receivers who didn't have already missed their window.
|
||||||
|
- No more `client_id_unknown` — that response code is removed.
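
A daemon-side handler for these outcomes might look like the sketch below. The row and response are modelled as plain dicts, the function name is hypothetical, and storing `broker_message_id` in the history-pruned case is an assumption (the text only says the row is marked `done`).

```python
from collections import Counter


def apply_send_response(row: dict, response: dict, metrics: Counter) -> dict:
    """Return the updated outbox row for a broker send response."""
    updated = dict(row)
    updated["status"] = "done"
    updated["broker_message_id"] = response["broker_message_id"]
    if response.get("duplicate") and not response.get("history_available", True):
        # Delivered the first time; the broker just no longer holds the payload.
        metrics["cm_daemon_dedupe_history_pruned_total"] += 1
    return updated
```

Both duplicate outcomes end in `done`, so a retry never triggers a second fan-out.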
|
||||||
|
|
||||||
|
### 4.5 Outbox schema — daemon-side max-age derived (v5)
|
||||||
|
|
||||||
|
```sql
CREATE TABLE outbox (
  id                 TEXT PRIMARY KEY,
  client_message_id  TEXT NOT NULL UNIQUE,
  payload            BLOB NOT NULL,
  enqueued_at        INTEGER NOT NULL,
  attempts           INTEGER DEFAULT 0,
  next_attempt_at    INTEGER NOT NULL,
  status             TEXT CHECK(status IN ('pending','inflight','done','dead')),
  last_error         TEXT,
  delivered_at       INTEGER,
  broker_message_id  TEXT
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
```
|
||||||
|
|
||||||
|
Daemon `max_age_hours` is **derived** from the broker-advertised
|
||||||
|
`mode` and `dedupe_retention_days` parameters:
|
||||||
|
- `permanent` → daemon default 168h (7d), capped at 30d. (Daemon doesn't
|
||||||
|
hold sends forever — that's an outbox bug surface.)
|
||||||
|
- `retention_scoped, dedupe_retention_days = N` → daemon
|
||||||
|
`max_age_hours = (N * 24) - safety_margin_hours`. Default
|
||||||
|
`safety_margin_hours = 24`.
|
||||||
|
- Operator override permitted but logged as
|
||||||
|
`outbox_max_age_above_broker_window` if it exceeds broker safe range.
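
The derivation above can be written out as a short function. This is a sketch under one stated assumption: in `permanent` mode, "capped at 30d" is read as an upper bound on an operator-configured value, with 168h as the default. The function name is hypothetical.

```python
DEFAULT_MAX_AGE_HOURS = 168          # 7 days
PERMANENT_CAP_HOURS = 30 * 24        # 30 days


def derive_max_age_hours(
    mode: str,
    dedupe_retention_days: int = 0,
    safety_margin_hours: int = 24,
    configured_hours: int = DEFAULT_MAX_AGE_HOURS,
) -> int:
    """Derive the outbox max_age_hours from broker-advertised parameters."""
    if mode == "permanent":
        return min(configured_hours, PERMANENT_CAP_HOURS)
    if mode == "retention_scoped":
        return dedupe_retention_days * 24 - safety_margin_hours
    raise ValueError(f"unknown dedupe mode: {mode!r}")
```

With the default `dedupe_retention_days = 365` this gives 8736 hours, one day short of the broker's dedupe window.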
|
||||||
|
|
||||||
|
### 4.6 Inbox schema — unchanged from v3 §4.5
|
||||||
|
|
||||||
|
### 4.7 Crash recovery — unchanged from v3 §4.6
|
||||||
|
|
||||||
|
### 4.8 Failure modes — corrected for dedupe-table model
|
||||||
|
|
||||||
|
- **`dead` rows**: surface in `claudemesh daemon outbox --failed`. Same as v4.
|
||||||
|
- **Receiver-side dedupe**: only daemon-hosted receivers dedupe. Same as v4.
|
||||||
|
- **Daemon retry after dedupe row expired AND message row GC'd**: in
|
||||||
|
`retention_scoped` mode this can only happen if the daemon outbox row
|
||||||
|
was older than `dedupe_retention_days - safety_margin`. Daemon will
|
||||||
|
refuse to send rows older than its computed `max_age_hours` (§4.5) —
|
||||||
|
they go to `dead` first, surfaced for human action. So this edge is
|
||||||
|
closed by daemon-side gating, not broker-side dedupe.
|
||||||
|
- **Daemon retry after dedupe row expired BUT message row still alive**:
|
||||||
|
doesn't happen by design — dedupe retention is always ≥ message
|
||||||
|
retention in operator-sane configs. If misconfigured, message row
|
||||||
|
persists with NULL `client_message_id` reference, retry creates a new
|
||||||
|
message, broker emits `cm_broker_dedupe_misconfig_total` with
|
||||||
|
`(mesh_id, retention_dedupe_days, retention_message_days)` labels.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Inbound — unchanged from v3 §5
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Hooks — unchanged from v4 §6
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7-13. Multi-mesh, auto-routing, service install, observability, SDKs, security model, configuration — unchanged from v4
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 14. Lifecycle — archive cleanup wording corrected (codex r4)
|
||||||
|
|
||||||
|
### 14.1 Key rotation — unchanged crypto from v4 §14.1
|
||||||
|
|
||||||
|
### 14.1.1 Archive record format — corrected wording (v5)
|
||||||
|
|
||||||
|
`keypair-archive.json` (mode 0600, atomic-rename writes):
|
||||||
|
|
||||||
|
```json
{
  "schema_version": 1,
  "max_archived_keys": 8,
  "keys": [
    {
      "ed25519_pubkey": "base64...",     // metadata only; matches the rotated-out signing key for that key_id
      "x25519_pubkey": "base64...",      // matches the retained private key
      "x25519_privkey": "base64...",     // sensitive; whole file is 0600
      "key_id": "k_01HQX...",
      "created_at": "2026-04-12T11:00:00Z",
      "rotated_out_at": "2026-05-03T16:00:00Z",
      "expires_at": "2026-05-10T16:00:00Z"
    }
  ]
}
```
|
||||||
|
|
||||||
|
**Field clarifications (codex r4)**:
|
||||||
|
- `ed25519_pubkey` is metadata — the daemon does not retain the old ed25519
|
||||||
|
*private* key. Stored to bind `key_id` ↔ old signing identity for audit
|
||||||
|
reconstruction (e.g. "this archived x25519 was the recipient half of a
|
||||||
|
member who at the time signed messages with the matching ed25519").
|
||||||
|
- `x25519_pubkey` MUST match the public half of `x25519_privkey`. Daemon
|
||||||
|
validates on archive load; mismatch → quarantine (see corruption rules).
|
||||||
|
|
||||||
|
**Cleanup wording (codex r4)**:
|
||||||
|
- On `expires_at < now`: entry is removed from the live archive file via
|
||||||
|
atomic-rename rewrite. **Secure deletion of the prior file's data is not
|
||||||
|
guaranteed** on modern filesystems (journals, COW snapshots, SSD wear
|
||||||
|
leveling, atomic-rename leaving stale inodes). Operators who need
|
||||||
|
cryptographic erasure must operate on encrypted volumes or reissue
|
||||||
|
hardware. Documented in threat model §16.
|
||||||
|
- "Force-expiry" when `max_archived_keys` is exceeded uses the same
|
||||||
|
removal mechanism; same caveat applies. Counter
|
||||||
|
`cm_daemon_archive_force_expired_total{key_id}` exposed.
|
||||||
|
|
||||||
|
**Duplicate `key_id` handling (NEW v5)**:
|
||||||
|
- Archive load rejects any file whose `keys[]` contains two records with
|
||||||
|
the same `key_id`. Quarantine to `keypair-archive.json.malformed-<ts>`,
|
||||||
|
start with empty archive, log `keypair_archive_duplicate_key_id`. Daemon
|
||||||
|
continues to start (we don't want archive corruption to be a permanent
|
||||||
|
outage). Old in-flight messages encrypted to the lost archived keys
|
||||||
|
fail to decrypt and are counted in `cm_daemon_decrypt_stale_total`.
|
||||||
|
|
||||||
|
**Malformed archive on startup (NEW v5)**:
|
||||||
|
- File present but JSON parse fails OR schema fails OR pubkey/privkey pair
|
||||||
|
fails validation: quarantine as above, start with empty archive, log
|
||||||
|
`keypair_archive_malformed`. Same continue-startup behavior.
|
||||||
|
- File missing entirely: treated as empty archive (normal first run /
|
||||||
|
post-cleanup state), no warning.
|
||||||
|
- File present but mode != 0600: log `keypair_archive_perms` warning,
|
||||||
|
read anyway. The warning surfaces the problem to operators; the daemon doesn't auto-chmod (they should
|
||||||
|
fix their pipeline).
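
The startup rules for the archive can be sketched as a pure classification step. The function name and return shape are hypothetical, and `pair_matches` is a stand-in for a real x25519 public/private consistency check. Renaming the file to `keypair-archive.json.malformed-<ts>` and the mode check are left to the caller.

```python
import json
from typing import Callable, List, Optional, Tuple


def load_archive_text(
    text: Optional[str], pair_matches: Callable[[str, str], bool]
) -> Tuple[List[dict], Optional[str]]:
    """Return (keys, log_event).

    text is None when the file is missing. A non-None log_event means the
    caller should quarantine the file and start with an empty archive.
    """
    if text is None:
        return [], None  # normal first run or post-cleanup state
    try:
        keys = json.loads(text)["keys"]
        ids = [k["key_id"] for k in keys]
        pairs_ok = all(
            pair_matches(k["x25519_pubkey"], k["x25519_privkey"]) for k in keys
        )
    except (ValueError, KeyError, TypeError):
        # Parse failure or missing/mistyped fields.
        return [], "keypair_archive_malformed"
    if len(ids) != len(set(ids)):
        return [], "keypair_archive_duplicate_key_id"
    if not pairs_ok:
        return [], "keypair_archive_malformed"
    return keys, None
```

In every failure branch the daemon still starts; only the archived keys are lost.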
|
||||||
|
|
||||||
|
### 14.2 Backup — unchanged from v4 §14.2
|
||||||
|
|
||||||
|
### 14.3 Local token rotation, compromised host revocation, image-clone, uninstall, recovery — unchanged
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 15. Version compat — feature-bit schema validation (v5)
|
||||||
|
|
||||||
|
Codex r4: feature parameters need explicit schema-validation rules and
|
||||||
|
per-feature versioning so we don't paint ourselves into a corner when a
|
||||||
|
parameter shape evolves.
|
||||||
|
|
||||||
|
### 15.1 Feature bits with parameters and versions
|
||||||
|
|
||||||
|
Each feature bit's parameters are versioned independently of broker version:
|
||||||
|
|
||||||
|
| Bit | `params.version` | Required parameters | Optional parameters |
|---|---|---|---|
| `client_message_id_dedupe` | `1` | `mode: "retention_scoped"\|"permanent"`, `dedupe_retention_days: int (>= 1)` (when mode=retention_scoped) | `tombstone_history_pruned_window_days: int` |
| `concurrent_connection_policy` | `1` | (no parameters) | `default_policy: "prefer_newest"\|"prefer_oldest"\|"allow_concurrent"` |
| `member_keypair_rotated_event` | `1` | (no parameters) | — |
| `key_epoch` | `1` | `max_concurrent_epochs: int (>= 1)` | — |
| `max_payload` | `1` | `inline_bytes: int (>= 1024)`, `blob_bytes: int (>= 1024)` | — |
| `mesh_skill_share` | future | — | — |
| `mcp_host` | future | — | — |
|
||||||
|
|
||||||
|
**Validation rules (NEW v5)**:
|
||||||
|
|
||||||
|
When the broker advertises feature parameters in
|
||||||
|
`feature_negotiation_response`, the daemon validates against the
|
||||||
|
parameter schema for that `params.version`. Validation failures:
|
||||||
|
|
||||||
|
- **Required parameter missing**: treated identically to "feature missing
|
||||||
|
from `supported`" — if the feature is in daemon's `require[]`, daemon
|
||||||
|
closes WS with code 4010 `feature_unavailable` and exits non-zero.
|
||||||
|
- **Required parameter out of bounds** (e.g. `dedupe_retention_days = -5`,
|
||||||
|
`inline_bytes = 0`): same — treated as "feature missing from
|
||||||
|
`supported`."
|
||||||
|
- **Unknown `params.version`**: if daemon doesn't recognize the version,
|
||||||
|
treated as "feature missing." Daemon does NOT silently degrade.
|
||||||
|
- **Optional parameter missing or invalid**: daemon uses its own default,
|
||||||
|
logs `feature_optional_param_invalid{feature, param, reason}`, continues.
|
||||||
|
- **Unknown `mode` for `client_message_id_dedupe`** (not "retention_scoped"
|
||||||
|
or "permanent"): treated as "feature missing." Future modes require a
|
||||||
|
`params.version` bump.
|
||||||
|
|
||||||
|
Validation is NOT silent: every `feature_negotiation_response` is logged
|
||||||
|
fully (with sensitive parameters redacted, though we don't currently have
|
||||||
|
any) at DEBUG, and a single line at INFO summarizes negotiated capabilities
|
||||||
|
on each successful negotiation.
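
The validation rules above can be sketched roughly as follows, in Python. This is a minimal illustration, not the daemon's actual code: the function names `validate_dedupe_v1` and `missing_required`, the reason strings, and the return shapes are assumptions made for the example. Only the feature name, parameter names, and bounds come from the table in this section. Optional-parameter defaults are left out because the spec does not state them.

```python
from typing import Any, Dict, List, Tuple


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python; reject it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_dedupe_v1(params: Dict[str, Any]) -> Tuple[bool, str]:
    """Return (usable, reason); usable=False means "treat the feature as missing".

    Hypothetical helper; reason strings are illustrative only.
    """
    if params.get("version") != 1:
        return False, "unknown_params_version"
    mode = params.get("mode")
    if mode is None:
        return False, "required_param_missing"
    if mode not in ("retention_scoped", "permanent"):
        return False, "unknown_mode"
    if mode == "retention_scoped":
        days = params.get("dedupe_retention_days")
        if not _is_int(days):
            return False, "required_param_missing_or_not_int"
        if days < 1:
            return False, "required_param_out_of_bounds"
    return True, "ok"


def missing_required(require: List[str], supported: Dict[str, Any]) -> List[str]:
    """Features from require[] that are absent or fail parameter validation."""
    # Assumption: only one validator is registered in this sketch.
    validators = {"client_message_id_dedupe": validate_dedupe_v1}
    missing = []
    for feature in require:
        entry = supported.get(feature)
        if entry is None:
            missing.append(feature)
            continue
        validator = validators.get(feature)
        if validator is not None:
            usable, _reason = validator(entry.get("params", {}))
            if not usable:
                missing.append(feature)
    return missing
```

A non-empty result from `missing_required` corresponds to the path where the daemon closes the WebSocket with 4010 and exits non-zero.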
|
||||||
|
|
||||||
|
### 15.2 Negotiation handshake — shape updated (v5)
|
||||||
|
|
||||||
|
```
|
||||||
|
→ daemon: feature_negotiation_request
|
||||||
|
{
|
||||||
|
require: ["client_message_id_dedupe",
|
||||||
|
"concurrent_connection_policy"],
|
||||||
|
optional: ["mesh_skill_share","mcp_host","max_payload"]
|
||||||
|
}
|
||||||
|
|
||||||
|
← broker: feature_negotiation_response
|
||||||
|
{
|
||||||
|
supported: {
|
||||||
|
"client_message_id_dedupe": {
|
||||||
|
"params": {
|
||||||
|
"version": 1,
|
||||||
|
"mode": "retention_scoped",
|
||||||
|
"dedupe_retention_days": 365,
|
||||||
|
"tombstone_history_pruned_window_days": 30
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"concurrent_connection_policy": {
|
||||||
|
"params": { "version": 1, "default_policy": "prefer_newest" }
|
||||||
|
},
|
||||||
|
"member_keypair_rotated_event": { "params": { "version": 1 } },
|
||||||
|
"max_payload": {
|
||||||
|
"params": { "version": 1, "inline_bytes": 65536, "blob_bytes": 524288000 }
|
||||||
|
}
|
||||||
|
},
|
||||||
|
missing_required: []
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
If `missing_required` is non-empty after broker's response OR after daemon
|
||||||
|
parameter validation, daemon closes with 4010 and exits non-zero.
|
||||||
|
|
||||||
|
### 15.3 IPC negotiation — unchanged from v3 §15.3
|
||||||
|
|
||||||
|
### 15.4 Compatibility matrix — unchanged from v3 §15.4
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 16. Threat model — unchanged from v4 §16
|
||||||
|
|
||||||
|
Plus archive-secure-delete clarification under §14.1.1.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 17. Migration — broker dedupe table is the new prereq
|
||||||
|
|
||||||
|
Broker side, deploy order:
|
||||||
|
1. `CREATE TABLE mesh.client_message_dedupe` + supporting indexes
|
||||||
|
(additive, online-safe).
|
||||||
|
2. `ALTER TABLE mesh.topic_message ADD COLUMN client_message_id` (already
|
||||||
|
in v3/v4 plan).
|
||||||
|
3. Broker code: every `INSERT` into `topic_message` / `message_queue` first
|
||||||
|
`INSERT ... ON CONFLICT DO UPDATE RETURNING` into
|
||||||
|
`client_message_dedupe`. The conflict path returns existing
|
||||||
|
`broker_message_id` instead of creating a new row.
|
||||||
|
4. Broker code: nightly job to delete `client_message_dedupe` rows where
|
||||||
|
`expires_at < NOW()`.
|
||||||
|
5. Broker code: hook into the existing message-retention sweep to set
|
||||||
|
`history_available = FALSE` on dedupe rows whose message row has been
|
||||||
|
pruned.
|
||||||
|
6. Broker advertises `client_message_id_dedupe` feature bit in negotiation
|
||||||
|
response.
|
||||||
|
7. Daemon refuses to start unless that feature bit is advertised with valid
|
||||||
|
params.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What changed v4 → v5 (codex round-4 actionable items)
|
||||||
|
|
||||||
|
| Codex r4 item | v5 fix | Section |
|
||||||
|
|---|---|---|
|
||||||
|
| Dedupe must be retention-scoped, not "permanent" with row-deletion gap | Real `mesh.client_message_dedupe` table; retention independent of message rows; `permanent` becomes opt-in mode meaning "no expires_at" | §4.1, §4.3 |
|
||||||
|
| Rename misleading mode | `retention_scoped` is the default; `permanent` reserved for explicit opt-in | §4.3, §15.1 |
|
||||||
|
| Deterministic duplicate response | New shape with `duplicate`, `broker_message_id`, `history_available`; removed `client_id_unknown` | §4.4 |
|
||||||
|
| Feature parameter validation rules | `params.version` per feature; required-param failure = treated as missing-required-feature; daemon closes WS 4010, exits non-zero | §15.1 |
|
||||||
|
| Drop "zeroed/secure-delete" promise | Replaced with "removed from live archive; secure deletion not guaranteed"; threat model documents this | §14.1.1 |
|
||||||
|
| Duplicate `key_id` handling | Archive load rejects, quarantine, start empty, continue | §14.1.1 |
|
||||||
|
| Malformed archive startup behavior | Quarantine, start empty, continue; mode-mismatch warns but reads | §14.1.1 |
|
||||||
|
| Linux MAC||MAC self-collision | Documented; `host_fingerprint.id_override` escape hatch | §2.2.1 |
|
||||||
|
| RunPod warning on persistent default | Logged at INFO so default is visible | §2.1 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review (round 5)
|
||||||
|
|
||||||
|
1. **Dedupe table design (§4.3)** — is `(mesh_id, client_message_id)`
|
||||||
|
PRIMARY KEY enough, or do we need versioning of the dedupe row itself
|
||||||
|
(e.g. when destination changes mid-retry)? Is `destination_kind` /
|
||||||
|
`destination_ref` needed at all, or just for audit?
|
||||||
|
2. **`history_available = FALSE` semantics (§4.4)** — does it actually fix
|
||||||
|
the case where receivers ask for history of a pruned message? Or does
|
||||||
|
the receiver need its own dedupe-with-history-pruned pathway?
|
||||||
|
3. **Daemon outbox max-age math (§4.5)** — is the `dedupe_retention_days * 24 - 24`
|
||||||
|
margin correct? Should the margin be a percentage instead of a
|
||||||
|
fixed 24h?
|
||||||
|
4. **Feature param validation (§15.1)** — does treating "invalid required
|
||||||
|
param" as "missing required feature" lose useful diagnostic detail?
|
||||||
|
Should we have a 4011 `feature_param_invalid` close code separately?
|
||||||
|
5. **Archive quarantine (§14.1.1)** — is "continue startup with empty
|
||||||
|
archive" the right call, or should it be opt-in / refuse-by-default?
|
||||||
|
6. **Anything else still wrong?** Read it as if you were going to operate
|
||||||
|
this for a year.
|
||||||
|
|
||||||
|
Three options:
|
||||||
|
- **(a) v5 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v6 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless.
|
||||||
447
.artifacts/shipped/2026-05-03-daemon-final-spec-v6.md
Normal file
@@ -0,0 +1,447 @@
|
|||||||
|
# `claudemesh daemon` — Final Spec v6
|
||||||
|
|
||||||
|
> **Round 6.** v5 was reviewed by codex (round 5) which found the dedupe
|
||||||
|
> table architecture sound but called out four idempotency-correctness
|
||||||
|
> issues that would silently corrupt sends in production:
|
||||||
|
>
|
||||||
|
> 1. **Idempotency key reuse with different payload/destination** — v5
|
||||||
|
> silently collapsed a different send onto the original. Need a request
|
||||||
|
> fingerprint.
|
||||||
|
> 2. **`status = 'rejected'` underspecified** — schema allowed it, semantics
|
||||||
|
> didn't. Either fully define or drop.
|
||||||
|
> 3. **Outbox max-age math edges** — `dedupe_retention_days = 1` minus 24h
|
||||||
|
> margin = 0 hours, which is undefined.
|
||||||
|
> 4. **Broker atomicity not stated** — dedupe insert and message insert
|
||||||
|
> must be one transaction or you produce orphan dedupe rows.
|
||||||
|
>
|
||||||
|
> v6 fixes all four. **Intent §0 unchanged from v2.** v6 only revises
|
||||||
|
> idempotency semantics in §4, feature parameters in §15, and migration in §17.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 0. Intent — unchanged, see v2 §0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Process model — unchanged from v3 §1 / v2 §1
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Identity — unchanged from v5 §2
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. IPC surface — unchanged from v4 §3
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Delivery contract — at-least-once with **request-fingerprinted** dedupe
|
||||||
|
|
||||||
|
Codex r5: dedupe must compare the *whole request shape*, not just
|
||||||
|
`(mesh, client_message_id)`. Otherwise a caller who reuses an idempotency
|
||||||
|
key with a different destination or body silently drops the new send and
|
||||||
|
gets the old send's metadata back.
|
||||||
|
|
||||||
|
### 4.1 The contract (precise — v6)
|
||||||
|
|
||||||
|
> **Local guarantee**: each successful `POST /v1/send` returns a stable
|
||||||
|
> `client_message_id`. The send is durably persisted to `outbox.db` before
|
||||||
|
> the response returns.
|
||||||
|
>
|
||||||
|
> **Broker guarantee**: the broker maintains a dedupe record per accepted
|
||||||
|
> `(mesh_id, client_message_id)` in `mesh.client_message_dedupe`. Each
|
||||||
|
> dedupe record carries a canonical `request_fingerprint`. Retries with
|
||||||
|
> the same `client_message_id` AND matching fingerprint collapse to the
|
||||||
|
> original `broker_message_id`. Retries with the same `client_message_id`
|
||||||
|
> but a different fingerprint return a deterministic conflict
|
||||||
|
> (`409 idempotency_key_reused`) and do **not** create a new message.
|
||||||
|
>
|
||||||
|
> **Atomicity guarantee**: dedupe row insertion and message row insertion
|
||||||
|
> happen in one broker DB transaction. Either both land, or neither. No
|
||||||
|
> orphan dedupe rows. If the broker crashes between dedupe insert and
|
||||||
|
> message insert, the rollback unwinds both.
|
||||||
|
>
|
||||||
|
> **End-to-end guarantee**: at-least-once delivery, with
|
||||||
|
> `client_message_id` propagated to receivers' inboxes.
|
||||||
|
|
||||||
|
### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2
|
||||||
|
|
||||||
|
### 4.3 Broker schema — request fingerprint added (v6)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE mesh.client_message_dedupe (
|
||||||
|
mesh_id UUID NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
|
||||||
|
client_message_id TEXT NOT NULL,
|
||||||
|
|
||||||
|
-- The original accepted message; FK NOT enforced because the message row
|
||||||
|
-- may be GC'd by retention sweeps before the dedupe row expires.
|
||||||
|
broker_message_id UUID NOT NULL,
|
||||||
|
|
||||||
|
-- Canonical fingerprint of the original request. Recomputed on every
|
||||||
|
-- duplicate retry; mismatch → 409 idempotency_key_reused. Schema in §4.4.
|
||||||
|
request_fingerprint BYTEA NOT NULL, -- 32-byte sha256
|
||||||
|
|
||||||
|
destination_kind TEXT NOT NULL CHECK(destination_kind IN ('topic','dm','queue')),
|
||||||
|
destination_ref TEXT NOT NULL,
|
||||||
|
first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||||
|
expires_at TIMESTAMPTZ, -- NULL = `permanent` mode
|
||||||
|
history_available BOOLEAN NOT NULL DEFAULT TRUE, -- flipped FALSE when message row GC'd
|
||||||
|
|
||||||
|
PRIMARY KEY (mesh_id, client_message_id)
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE INDEX client_message_dedupe_expires_idx
|
||||||
|
ON mesh.client_message_dedupe(expires_at)
|
||||||
|
WHERE expires_at IS NOT NULL;
|
||||||
|
|
||||||
|
ALTER TABLE mesh.topic_message ADD COLUMN client_message_id TEXT;
|
||||||
|
ALTER TABLE mesh.message_queue ADD COLUMN client_message_id TEXT;
|
||||||
|
```
|
||||||
|
|
||||||
|
**`status` column dropped (codex r5)**. Rejected requests do **not**
|
||||||
|
consume idempotency keys. Rationale below in §4.6.
|
||||||
|
|
||||||
|
### 4.4 Request fingerprint — canonical form (NEW v6)
|
||||||
|
|
||||||
|
The fingerprint covers everything that makes a send semantically distinct.
|
||||||
|
A retry must reproduce the same fingerprint bit-for-bit; anything else is
|
||||||
|
a different send and must not be collapsed.
|
||||||
|
|
||||||
|
```
|
||||||
|
request_fingerprint = sha256(
|
||||||
|
envelope_version || 0x00 ||
|
||||||
|
destination_kind || 0x00 ||
|
||||||
|
destination_ref || 0x00 ||
|
||||||
|
reply_to_id_or_empty || 0x00 ||
|
||||||
|
priority || 0x00 ||
|
||||||
|
meta_canonical_json || 0x00 ||
|
||||||
|
body_hash
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
Where:
|
||||||
|
- `envelope_version`: integer string (e.g. `"1"`). Bumps when the envelope
|
||||||
|
shape changes.
|
||||||
|
- `destination_kind`: `topic`, `dm`, or `queue`.
|
||||||
|
- `destination_ref`: topic name, recipient ed25519 pubkey hex, or queue id.
|
||||||
|
- `reply_to_id_or_empty`: original `broker_message_id` or empty string.
|
||||||
|
- `priority`: `now`, `next`, or `low`.
|
||||||
|
- `meta_canonical_json`: the `meta` field, serialized with sorted keys,
|
||||||
|
no whitespace, escape-canonical (RFC 8785 JCS). Empty meta = empty string.
|
||||||
|
- `body_hash`: sha256(body bytes), hex.
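
As a rough illustration of the field list above, the fingerprint could be computed like this in Python. Several things here are assumptions rather than part of the spec: the function names, UTF-8 encoding of the text fields, and the use of `json.dumps` as a stand-in for a real RFC 8785 (JCS) serializer. `json.dumps` with sorted keys and compact separators agrees with JCS for simple ASCII string and small-integer values, but differs in edge cases such as float formatting and key ordering for non-BMP characters, so a production implementation would need a vetted JCS library.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def canonical_meta(meta: Optional[Dict[str, Any]]) -> str:
    """Approximate JCS: sorted keys, no whitespace. Empty meta -> empty string.

    NOTE: json.dumps is NOT a complete RFC 8785 implementation.
    """
    if not meta:
        return ""
    return json.dumps(meta, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def request_fingerprint(
    envelope_version: int,
    destination_kind: str,
    destination_ref: str,
    reply_to_id: Optional[str],
    priority: str,
    meta: Optional[Dict[str, Any]],
    body: bytes,
) -> bytes:
    """Return the 32-byte sha256 over the 0x00-separated fields."""
    body_hash = hashlib.sha256(body).hexdigest()
    fields = [
        str(envelope_version),
        destination_kind,
        destination_ref,
        reply_to_id or "",
        priority,
        canonical_meta(meta),
        body_hash,
    ]
    # Assumption: text fields are UTF-8 encoded before hashing.
    return hashlib.sha256(b"\x00".join(f.encode("utf-8") for f in fields)).digest()


# Key order in meta does not affect the result; the destination does.
fp_a = request_fingerprint(1, "topic", "deploys", None, "next", {"b": 1, "a": 2}, b"hello")
fp_b = request_fingerprint(1, "topic", "deploys", None, "next", {"a": 2, "b": 1}, b"hello")
fp_c = request_fingerprint(1, "dm", "deploys", None, "next", {"a": 2, "b": 1}, b"hello")
```

Here `fp_a` and `fp_b` are equal and `fp_c` differs, which is the property the broker-side comparison relies on.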
|
||||||
|
|
||||||
|
The fingerprint is computed:
|
||||||
|
1. **Daemon-side** before durable outbox persistence — stored as
|
||||||
|
`outbox.request_fingerprint` (NEW column) so retries always produce
|
||||||
|
the same fingerprint regardless of caller behavior.
|
||||||
|
2. **Broker-side** on first receipt — stored in
|
||||||
|
`client_message_dedupe.request_fingerprint`.
|
||||||
|
3. **Broker-side** on every duplicate retry — recomputed and compared
|
||||||
|
byte-equal to the stored value.
|
||||||
|
|
||||||
|
If the daemon and broker disagree on the canonical form (e.g. JCS
|
||||||
|
implementation drift), the broker emits
|
||||||
|
`cm_broker_dedupe_fingerprint_mismatch_total{client_id, mesh_id}` and
|
||||||
|
returns `409 idempotency_key_reused` with a body that includes the
|
||||||
|
broker's fingerprint prefix in hex (§4.5) for debugging. Daemons that see this should
|
||||||
|
log it loudly and stop retrying that outbox row (it goes to `dead`).
|
||||||
|
|
||||||
|
### 4.5 Duplicate response — three cases (v6)
|
||||||
|
|
||||||
|
| Case | HTTP/WS code | Body |
|
||||||
|
|---|---|---|
|
||||||
|
| First insert | `201 created` | `{ broker_message_id, client_message_id, history_id, duplicate: false }` |
|
||||||
|
| Duplicate, fingerprint match | `200 ok` | `{ broker_message_id, client_message_id, history_id, duplicate: true, history_available, first_seen_at }` |
|
||||||
|
| Duplicate, fingerprint mismatch | `409 idempotency_key_reused` | `{ client_message_id, conflict: "request_fingerprint_mismatch", broker_fingerprint_prefix: "ab12cd34..." }` (first 8 bytes hex) |
|
||||||
|
|
||||||
|
Daemon outcomes:
|
||||||
|
- `201` → mark outbox row `done`, store `broker_message_id`. Normal path.
|
||||||
|
- `200 duplicate` with `history_available: true` → mark `done`, no
|
||||||
|
re-fanout, log at INFO.
|
||||||
|
- `200 duplicate` with `history_available: false` → mark `done`, log at
|
||||||
|
WARN. The original delivery succeeded; receivers got it.
|
||||||
|
- `409 idempotency_key_reused` → mark outbox row `dead`, surface in
|
||||||
|
`claudemesh daemon outbox --failed`. Operator must rotate the
|
||||||
|
idempotency key by hand and resubmit (`outbox requeue --new-id <id>`,
|
||||||
|
NEW v6 subcommand). Daemon does NOT auto-rotate to avoid masking caller
|
||||||
|
bugs.
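
The daemon outcomes listed above amount to a small mapping from broker response to outbox state. A minimal sketch in Python, assuming a hypothetical `outbox_outcome` helper: the `ERROR` level for `409` and the fall-through to `pending` for any other response are assumptions of this example, since the list above does not specify them.

```python
from typing import Any, Dict, Optional, Tuple


def outbox_outcome(status: int, body: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Map a broker response to (new outbox status, log level or None)."""
    if status == 201:
        # First insert: normal path, caller also stores broker_message_id.
        return "done", None
    if status == 200 and body.get("duplicate"):
        if body.get("history_available"):
            return "done", "INFO"
        # Original delivery succeeded, but the message row has been pruned.
        return "done", "WARN"
    if status == 409:
        # idempotency_key_reused: never auto-rotate, operator must requeue.
        return "dead", "ERROR"  # log level is an assumption
    # Assumption: anything else is treated as retryable.
    return "pending", "WARN"
```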
|
||||||
|
|
||||||
|
### 4.6 Why rejected requests don't consume idempotency keys (v6)
|
||||||
|
|
||||||
|
`status` was in v5's schema but underspecified. Two scenarios:
|
||||||
|
|
||||||
|
- **Transient broker error** (DB down, queue full, network blip): daemon
|
||||||
|
retries. If we'd persisted a `rejected` row on the first attempt, the
|
||||||
|
retry would fail forever. Bad.
|
||||||
|
- **Permanent validation error** (payload too large, destination not
|
||||||
|
found, auth missing): broker returns the appropriate `4xx` immediately
|
||||||
|
without inserting a dedupe row. Daemon either fixes the request and
|
||||||
|
retries (different fingerprint → fingerprint mismatch → `409` per §4.5)
|
||||||
|
or marks dead. Persisting a "rejected" row buys nothing — the daemon
|
||||||
|
isn't going to send the same broken request again with the same key.
|
||||||
|
|
||||||
|
Net result: `client_message_dedupe` rows only exist when the broker
|
||||||
|
**successfully** accepted a message and committed it. The single source
|
||||||
|
of truth for "was this idempotency key consumed?" is the existence of
|
||||||
|
the dedupe row. No status enum, no ambiguous states.
|
||||||
|
|
||||||
|
### 4.7 Broker atomicity contract (NEW v6)
|
||||||
|
|
||||||
|
Every accept path runs in one DB transaction with the following shape:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
BEGIN;
|
||||||
|
-- Pre-generate broker_message_id outside the transaction; pass in.
|
||||||
|
INSERT INTO mesh.client_message_dedupe
|
||||||
|
(mesh_id, client_message_id, broker_message_id, request_fingerprint,
|
||||||
|
destination_kind, destination_ref, expires_at)
|
||||||
|
VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
|
||||||
|
$dest_kind, $dest_ref, $expires_at)
|
||||||
|
ON CONFLICT (mesh_id, client_message_id) DO NOTHING
|
||||||
|
RETURNING broker_message_id, request_fingerprint, history_available, first_seen_at;
|
||||||
|
|
||||||
|
-- If RETURNING was empty (conflict), do a SELECT to fetch the original
|
||||||
|
-- and exit the transaction with a duplicate response.
|
||||||
|
-- If RETURNING produced a row AND $fingerprint != returned.fingerprint,
|
||||||
|
-- that's the §4.5 mismatch path — also exit with 409.
|
||||||
|
|
||||||
|
-- Otherwise, this is the first insert. Insert the message row.
|
||||||
|
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
|
||||||
|
VALUES ($msg_id, $mesh_id, $client_id, ...);
|
||||||
|
|
||||||
|
-- Optional: enqueue fan-out work, etc.
|
||||||
|
COMMIT;
|
||||||
|
```
|
||||||
|
|
||||||
|
Failure modes:
|
||||||
|
- Crash before `COMMIT`: both rows roll back. Next daemon retry inserts
|
||||||
|
cleanly.
|
||||||
|
- Crash after `COMMIT` but before WS ACK: dedupe row exists, message row
|
||||||
|
exists. Daemon retries → fingerprint matches → `200 duplicate`. Net:
|
||||||
|
exactly one broker-accepted row, one daemon `done` transition.
|
||||||
|
- Constraint violation on message row insert (e.g. unique violation on
|
||||||
|
some other column): rolls back the dedupe insert. Returns `5xx` to
|
||||||
|
daemon. Daemon retries; same fingerprint reproduces the same constraint
|
||||||
|
violation; daemon eventually marks `dead`. No orphan dedupe row.
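
The rollback behaviour in these failure modes can be exercised with a small self-contained sketch. It uses Python's `sqlite3` as a stand-in for the broker's database, with a reduced two-table schema, so it is an assumption-laden illustration rather than the broker's accept path. The function names `make_db` and `accept` are hypothetical. Because an insert with `DO NOTHING` yields no row on conflict, this sketch fetches the stored row with a `SELECT` and compares fingerprints on the conflict branch.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE client_message_dedupe (
            mesh_id TEXT NOT NULL,
            client_message_id TEXT NOT NULL,
            broker_message_id TEXT NOT NULL,
            request_fingerprint BLOB NOT NULL,
            PRIMARY KEY (mesh_id, client_message_id)
        );
        CREATE TABLE topic_message (
            id TEXT PRIMARY KEY,
            mesh_id TEXT NOT NULL,
            client_message_id TEXT,
            body BLOB NOT NULL
        );
    """)
    return conn


def accept(conn, mesh_id, client_id, msg_id, fingerprint, body):
    """Return (status, broker_message_id); both inserts share one transaction."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT INTO client_message_dedupe VALUES (?, ?, ?, ?) "
                "ON CONFLICT (mesh_id, client_message_id) DO NOTHING",
                (mesh_id, client_id, msg_id, fingerprint),
            )
            if cur.rowcount == 0:  # key already consumed: duplicate path
                orig_id, orig_fp = conn.execute(
                    "SELECT broker_message_id, request_fingerprint "
                    "FROM client_message_dedupe "
                    "WHERE mesh_id = ? AND client_message_id = ?",
                    (mesh_id, client_id),
                ).fetchone()
                if orig_fp != fingerprint:
                    return 409, None
                return 200, orig_id
            conn.execute(
                "INSERT INTO topic_message VALUES (?, ?, ?, ?)",
                (msg_id, mesh_id, client_id, body),
            )
            return 201, msg_id
    except sqlite3.IntegrityError:
        # Message insert failed: the dedupe insert was rolled back with it.
        return 500, None
```

Passing `body=None` violates the `NOT NULL` constraint on the message row, which makes it a convenient way to confirm that no orphan dedupe row survives the failed transaction.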
|
||||||
|
|
||||||
|
A nightly orphan-check job (counter `cm_broker_dedupe_orphan_check_total`) validates
|
||||||
|
that every `client_message_dedupe` row has a matching `topic_message` or
|
||||||
|
`message_queue` row OR the matching message row has been retention-pruned
|
||||||
|
(in which case `history_available = FALSE` was set). Any row failing both
|
||||||
|
conditions is logged as `cm_broker_dedupe_orphan_found{mesh_id}` for
|
||||||
|
human review. Should be zero in steady state.
|
||||||
|
|
||||||
|
### 4.8 Outbox schema — fingerprint stored alongside (v6)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE outbox (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
client_message_id TEXT NOT NULL UNIQUE,
|
||||||
|
request_fingerprint BLOB NOT NULL, -- 32 bytes
|
||||||
|
payload BLOB NOT NULL,
|
||||||
|
enqueued_at INTEGER NOT NULL,
|
||||||
|
attempts INTEGER DEFAULT 0,
|
||||||
|
next_attempt_at INTEGER NOT NULL,
|
||||||
|
status TEXT CHECK(status IN ('pending','inflight','done','dead')),
|
||||||
|
last_error TEXT,
|
||||||
|
delivered_at INTEGER,
|
||||||
|
broker_message_id TEXT
|
||||||
|
);
|
||||||
|
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
|
||||||
|
```
|
||||||
|
|
||||||
|
`request_fingerprint` is computed at IPC accept time and stored. Every
|
||||||
|
retry sends the same bytes. The daemon never recomputes from `payload`
|
||||||
|
post-enqueue (would produce drift if envelope_version changes between
|
||||||
|
daemon runs).
|
||||||
|
|
||||||
|
### 4.9 Outbox max-age math — bounded (v6)
|
||||||
|
|
||||||
|
Codex r5: the v5 formula `(dedupe_retention_days * 24) - 24h_margin`
|
||||||
|
breaks at `dedupe_retention_days = 1` (yields zero) and is undefined
|
||||||
|
for any retention of one day or less.
|
||||||
|
|
||||||
|
v6 formula and bounds:
|
||||||
|
|
||||||
|
- **Minimum supported broker dedupe retention**: 3 days. Daemon refuses
|
||||||
|
to start if broker advertises `dedupe_retention_days < 3` (treats it
|
||||||
|
as `feature_param_below_floor`, closes with 4012 per §15.5).
|
||||||
|
- **Daemon `max_age_hours` derivation**:
|
||||||
|
- `permanent` mode → daemon uses config default (168h = 7d), cap 720h
|
||||||
|
(30d).
|
||||||
|
- `retention_scoped` mode → daemon `max_age_hours = max(72,
|
||||||
|
(dedupe_retention_days * 24) - safety_margin_hours)` where
|
||||||
|
`safety_margin_hours = max(24, ceil(dedupe_retention_days * 0.1 *
|
||||||
|
24))`. For `dedupe_retention_days=3` this gives
|
||||||
|
`max(72, 72-24) = 72h`. For 30 days: `max(72, 720-72) = 648h`. For
|
||||||
|
365 days: `max(72, 8760-876) = 7884h`.
|
||||||
|
- The 72h floor prevents the daemon outbox from being uselessly short
|
||||||
|
— three days is enough margin for normal operator response to a
|
||||||
|
paged outage.
|
||||||
|
|
||||||
|
- Operator override allowed via `[outbox] max_age_hours_override = N`,
|
||||||
|
but if `N` exceeds `dedupe_retention_days * 24 - 1` daemon refuses to
|
||||||
|
start with `outbox_max_age_above_dedupe_window`. The override exists
|
||||||
|
for the rare case of a much-shorter-than-default outbox; it does not
|
||||||
|
exist to circumvent the broker's dedupe window.
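
The derivation above is easy to check mechanically. A minimal Python sketch follows, with hypothetical function names and a `ValueError` standing in for the daemon's refusal to start. It uses integer ceiling division instead of `ceil(days * 0.1 * 24)` because the floating-point product can land slightly above the exact value (for example at 30 days) and round up one hour too far.

```python
from typing import Optional

FLOOR_HOURS = 72           # 72h floor from this section
MIN_RETENTION_DAYS = 3     # minimum supported broker dedupe retention
PERMANENT_CAP_HOURS = 720  # 30d cap in permanent mode


def safety_margin_hours(dedupe_retention_days: int) -> int:
    # ceil(days * 24 / 10) with integers only, then the 24h minimum.
    ten_percent = (dedupe_retention_days * 24 + 9) // 10
    return max(24, ten_percent)


def max_age_hours(mode: str, dedupe_retention_days: Optional[int] = None,
                  config_default_hours: int = 168) -> int:
    if mode == "permanent":
        return min(config_default_hours, PERMANENT_CAP_HOURS)
    if dedupe_retention_days is None or dedupe_retention_days < MIN_RETENTION_DAYS:
        raise ValueError("dedupe_retention_days below supported minimum")
    window = dedupe_retention_days * 24
    return max(FLOOR_HOURS, window - safety_margin_hours(dedupe_retention_days))


def check_override(override_hours: int, dedupe_retention_days: int) -> int:
    # The override may shorten the outbox, never extend it past the window.
    if override_hours > dedupe_retention_days * 24 - 1:
        raise ValueError("outbox_max_age_above_dedupe_window")
    return override_hours
```

The three worked values in the text (3, 30, and 365 days) come out as 72h, 648h, and 7884h under this sketch.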
|
||||||
|
|
||||||
|
### 4.10 Inbox schema — unchanged from v3 §4.5
|
||||||
|
|
||||||
|
### 4.11 Crash recovery — unchanged from v3 §4.6
|
||||||
|
|
||||||
|
### 4.12 Failure modes — corrected for fingerprint model (v6)
|
||||||
|
|
||||||
|
- **Fingerprint mismatch on retry** (`409 idempotency_key_reused`): outbox
|
||||||
|
row marked `dead`. Surfaced in `--failed` view. Operator command
|
||||||
|
`outbox requeue --new-id <id>` rotates `client_message_id` and retries.
|
||||||
|
- **Daemon retry after dedupe row hard-deleted by retention sweep**: in
|
||||||
|
`retention_scoped` mode, daemon `max_age_hours` is bounded inside the
|
||||||
|
retention window (§4.9), so this can only happen via operator override.
|
||||||
|
In that case the retry creates a NEW dedupe row + new message — the
|
||||||
|
caller chose this risk explicitly. Counter
|
||||||
|
`cm_daemon_retry_after_dedupe_expired_total`.
|
||||||
|
- **Daemon retry after dedupe row hard-deleted in `permanent` mode**:
|
||||||
|
cannot happen by definition — `permanent` means no `expires_at`. Only
|
||||||
|
mesh deletion removes dedupe rows.
|
||||||
|
- **Duplicate row, history pruned**: as v5 §4.4. Mark `done`, log
|
||||||
|
`cm_daemon_dedupe_history_pruned_total`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Inbound — unchanged from v3 §5
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Hooks — unchanged from v4 §6
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7-13. Multi-mesh, auto-routing, service install, observability, SDKs, security model, configuration — unchanged from v4
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 14. Lifecycle — unchanged from v5 §14
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 15. Version compat — feature param updated for new dedupe semantics
|
||||||
|
|
||||||
|
### 15.1 Feature bits with parameters (v6 update)
|
||||||
|
|
||||||
|
| Bit | `params.version` | Required parameters | Optional parameters |
|
||||||
|
|---|---|---|---|
|
||||||
|
| `client_message_id_dedupe` | `2` | `mode: "retention_scoped"\|"permanent"`, `dedupe_retention_days: int (>= 3)` (when mode=retention_scoped), `request_fingerprint: bool == true` | `tombstone_history_pruned_window_days: int` |
|
||||||
|
| `concurrent_connection_policy` | `1` | (no parameters) | `default_policy: "prefer_newest"\|"prefer_oldest"\|"allow_concurrent"` |
|
||||||
|
| `member_keypair_rotated_event` | `1` | (no parameters) | — |
|
||||||
|
| `key_epoch` | `1` | `max_concurrent_epochs: int (>= 1)` | — |
|
||||||
|
| `max_payload` | `1` | `inline_bytes: int (>= 1024)`, `blob_bytes: int (>= 1024)` | — |
|
||||||
|
|
||||||
|
`client_message_id_dedupe` bumped to `params.version = 2` because it now
|
||||||
|
requires `request_fingerprint = true`. A broker still on version 1
|
||||||
|
(no fingerprint comparison) is treated as "feature missing" and the
|
||||||
|
daemon refuses to start. That's intentional — v0.9.0 daemons require
|
||||||
|
fingerprint enforcement for safe idempotency.
|
||||||
|
|
||||||
|
`dedupe_retention_days` minimum raised to 3 (matches the §4.9 floor).
|
||||||
|
|
||||||
|
### 15.2 Negotiation handshake — unchanged shape from v5 §15.2
|
||||||
|
|
||||||
|
### 15.3 IPC negotiation — unchanged from v3 §15.3
|
||||||
|
|
||||||
|
### 15.4 Compatibility matrix — unchanged from v3 §15.4
|
||||||
|
|
||||||
|
### 15.5 Diagnostic close codes (NEW v6 — codex r5)
|
||||||
|
|
||||||
|
WebSocket close codes are split for diagnostic clarity:
|
||||||
|
|
||||||
|
| Code | Reason | When |
|
||||||
|
|---|---|---|
|
||||||
|
| `4010` | `feature_unavailable` | Required feature missing from broker's `supported` |
|
||||||
|
| `4011` | `feature_param_invalid` | Required feature present but parameters fail validation (missing required, out of bounds, unknown version) |
|
||||||
|
| `4012` | `feature_param_below_floor` | Required feature parameter below daemon's hard floor (e.g. `dedupe_retention_days < 3`) |
|
||||||
|
|
||||||
|
Daemon logs the full negotiation payload at WARN before exiting; supervisor
|
||||||
|
and alerting catch the restart loop.
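
One way to read the close-code table is as a classifier over the broker's advertised entry for `client_message_id_dedupe`. The Python sketch below is an interpretation, not the daemon's implementation: the function name is hypothetical, a `params.version` other than 2 is mapped to 4011 following the "unknown version" wording in the table, and any integer retention below 3 is mapped to 4012.

```python
from typing import Any, Dict, Optional

FEATURE_UNAVAILABLE = 4010
FEATURE_PARAM_INVALID = 4011
FEATURE_PARAM_BELOW_FLOOR = 4012
DEDUPE_RETENTION_FLOOR_DAYS = 3


def dedupe_close_code(entry: Optional[Dict[str, Any]]) -> Optional[int]:
    """Return a WebSocket close code, or None if the feature is usable."""
    if entry is None:
        return FEATURE_UNAVAILABLE
    params = entry.get("params", {})
    if params.get("version") != 2:
        return FEATURE_PARAM_INVALID  # assumption: version mismatch is 4011
    if params.get("request_fingerprint") is not True:
        return FEATURE_PARAM_INVALID
    mode = params.get("mode")
    if mode not in ("retention_scoped", "permanent"):
        return FEATURE_PARAM_INVALID
    if mode == "retention_scoped":
        days = params.get("dedupe_retention_days")
        if not isinstance(days, int) or isinstance(days, bool):
            return FEATURE_PARAM_INVALID
        if days < DEDUPE_RETENTION_FLOOR_DAYS:
            return FEATURE_PARAM_BELOW_FLOOR
    return None
```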
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 16. Threat model — unchanged from v4 §16
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 17. Migration — broker dedupe table + atomicity (v6)
|
||||||
|
|
||||||
|
Broker side, deploy order:
|
||||||
|
|
||||||
|
1. `CREATE TABLE mesh.client_message_dedupe` with v6 schema (additive,
|
||||||
|
online-safe).
|
||||||
|
2. `ALTER TABLE mesh.topic_message ADD COLUMN client_message_id`.
|
||||||
|
3. `ALTER TABLE mesh.message_queue ADD COLUMN client_message_id`.
|
||||||
|
4. Broker code refactor: every accept path wraps dedupe insert + message
|
||||||
|
insert in **one transaction** (§4.7). Pre-generated
|
||||||
|
`broker_message_id` (ulid in code) passed in.
|
||||||
|
5. Broker code: nightly job to delete dedupe rows where `expires_at <
|
||||||
|
NOW()` (skip in `permanent` mode).
|
||||||
|
6. Broker code: hook into the message-retention sweep — when a
|
||||||
|
`topic_message` or `message_queue` row is hard-deleted, find the
|
||||||
|
matching dedupe row by `client_message_id` and set `history_available
|
||||||
|
= FALSE`. (Note: `client_message_id` is nullable on those tables for
|
||||||
|
legacy traffic; nullable rows have no dedupe row to update.)
|
||||||
|
7. Broker code: nightly orphan-check job (§4.7); alerts on non-zero.
|
||||||
|
8. Broker advertises `client_message_id_dedupe` feature with
|
||||||
|
`params.version = 2` and `request_fingerprint: true`.
|
||||||
|
9. Daemon refuses to start unless that feature bit is advertised with
|
||||||
|
valid v2 params.
|
||||||
|
|
||||||
|
Rollback plan: feature flag disables fingerprint enforcement broker-side
|
||||||
|
(falls back to existing pre-v6 behavior — no dedupe). Daemons that
|
||||||
|
require fingerprint refuse to start. Operator switches off the feature
|
||||||
|
flag, reverts the daemon, restarts. No data loss; pending dedupe rows
|
||||||
|
remain in place for the next forward roll.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What changed v5 → v6 (codex round-5 actionable items)
|
||||||
|
|
||||||
|
| Codex r5 item | v6 fix | Section |
|
||||||
|
|---|---|---|
|
||||||
|
| Idempotency key reuse with different payload silently collapses | `request_fingerprint` BYTEA in dedupe table; canonical form per §4.4; 409 on mismatch | §4.3, §4.4, §4.5 |
|
||||||
|
| `status='rejected'` underspecified | Dropped `status` column; rejected requests don't consume keys; existence of dedupe row = "key consumed" | §4.3, §4.6 |
|
||||||
|
| Outbox max-age math edges at low retention | 72h floor; min `dedupe_retention_days = 3`; percentage-based safety margin; explicit override gating | §4.9, §15.1 |
|
||||||
|
| Broker atomicity not stated | One transaction per accept path; orphan-check job; rollback semantics | §4.7 |
|
||||||
|
| Diagnostic detail on feature param failures | New close codes 4011 / 4012 separate from 4010 | §15.5 |
|
||||||
|
| Outbox stores fingerprint | NEW column `outbox.request_fingerprint` BLOB; computed once at IPC accept | §4.8 |
|
||||||
|
| Operator command for fingerprint-mismatch recovery | NEW `outbox requeue --new-id <id>` to rotate idempotency key | §4.5 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review (round 6)
|
||||||
|
|
||||||
|
1. **Request fingerprint canonical form (§4.4)** — does JCS work
|
||||||
|
cross-language for `meta_canonical_json` (Python json.dumps,
|
||||||
|
Go encoding/json, JS JSON.stringify all behave differently)? Should
|
||||||
|
we ship a vetted JCS lib in each SDK or fall back to a simpler
|
||||||
|
"sorted keys + no spaces + escape-as-stored" rule with conformance
|
||||||
|
tests?
|
||||||
|
2. **Atomicity contract (§4.7)** — is the orphan-check sufficient, or
|
||||||
|
does a violation mean we need a "broker rebuild dedupe from messages"
|
||||||
|
recovery tool? The latter is destructive but useful for ops emergencies.
|
||||||
|
3. **Max-age formula (§4.9)** — is the 72h floor correct? Is the
|
||||||
|
percentage-based safety margin (`max(24, ceil(0.1 * dedupe_window))`)
|
||||||
|
the right shape? Or simpler to say "always 24h"?
|
||||||
|
4. **`409 idempotency_key_reused` recovery flow (§4.5)** — is sending the
|
||||||
|
row to `dead` and surfacing it via `outbox --failed` enough? Should
|
||||||
|
the daemon emit a high-priority event for the SSE stream so operators
|
||||||
|
are paged immediately?
|
||||||
|
5. **Diagnostic close codes (§15.5)** — is splitting 4010/4011/4012
|
||||||
|
useful, or does it just push complexity onto operators? Should we
|
||||||
|
collapse to 4010 with structured close-reason JSON instead?
|
||||||
|
6. **Anything else still wrong?** Read it as if you were going to
|
||||||
|
operate this for a year. What falls down?
|
||||||
|
|
||||||
|
Three options:
|
||||||
|
- **(a) v6 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v7 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless.
|
||||||
439
.artifacts/shipped/2026-05-03-daemon-final-spec-v7.md
Normal file
@@ -0,0 +1,439 @@
|
|||||||
|
# `claudemesh daemon` — Final Spec v7
|
||||||
|
|
||||||
|
> **Round 7.** v6 was reviewed by codex (round 6) which found the broker
|
||||||
|
> layer largely correct but caught five daemon-side and broker-tx
|
||||||
|
> correctness gaps:
|
||||||
|
>
|
||||||
|
> 1. **Daemon-local duplicate POST semantics** undefined — local fingerprint
|
||||||
|
> comparison missing across `pending` / `inflight` / `done` / `dead`.
|
||||||
|
> 2. **§4.6 rejected-request contradiction** — talked about both "fix and
|
||||||
|
> retry" and "fingerprint mismatch → 409". Only one of those can be true.
|
||||||
|
> 3. **§4.7 pseudocode bug** — `ON CONFLICT DO NOTHING RETURNING` returns
|
||||||
|
> nothing on conflict; the fingerprint comparison was in the wrong branch.
|
||||||
|
> 4. **Max-age math floor consumes margin** — at min retention (3 days),
|
||||||
|
> daemon max-age 72h equals broker window 72h. Not inside the window.
|
||||||
|
> 5. **Broker transaction boundary incomplete** — fan-out/queue/history side
|
||||||
|
> effects not stated as in-transaction; "optional" wording was wrong.
|
||||||
|
>
|
||||||
|
> v7 fixes all five. **Intent §0 unchanged from v2.** v7 only revises §4
|
||||||
|
> (delivery contract) and §15 (feature param min) and §17 (migration).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 0. Intent — unchanged, see v2 §0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Process model — unchanged
|
||||||
|
|
||||||
|
## 2. Identity — unchanged from v5 §2
|
||||||
|
|
||||||
|
## 3. IPC surface — unchanged from v4 §3
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Delivery contract — at-least-once, fingerprinted at IPC and broker layers
|
||||||
|
|
||||||
|
### 4.1 The contract (precise — v7)
|
||||||
|
|
||||||
|
> **Local guarantee**: each successful `POST /v1/send` returns a stable
|
||||||
|
> `client_message_id`. The send is durably persisted to `outbox.db` before
|
||||||
|
> the response returns. The daemon enforces request-fingerprint
|
||||||
|
> idempotency at the IPC layer: a duplicate `POST` with the same
|
||||||
|
> `client_message_id` and matching `request_fingerprint` returns the
|
||||||
|
> stable prior result; with a mismatched fingerprint it returns local
|
||||||
|
> `409 idempotency_key_reused` and the new request is **not** persisted.
|
||||||
|
>
|
||||||
|
> **Broker guarantee**: the broker maintains a dedupe record per
|
||||||
|
> accepted `(mesh_id, client_message_id)` in `mesh.client_message_dedupe`
|
||||||
|
> with `request_fingerprint`. Retries with matching fingerprint collapse;
|
||||||
|
> retries with mismatched fingerprint return `409
|
||||||
|
> idempotency_key_reused` without creating a new message.
|
||||||
|
>
|
||||||
|
> **Atomicity guarantee**: every durable side effect of a successful
|
||||||
|
> accept (dedupe row, message row, fan-out work, history row, queue
|
||||||
|
> insertion) lands in the same broker DB transaction. Either all commit
|
||||||
|
> or none do.
|
||||||
|
>
|
||||||
|
> **End-to-end guarantee**: at-least-once delivery, with
|
||||||
|
> `client_message_id` propagated to receivers' inboxes.
|
||||||
|
|
||||||
|
### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2
|
||||||
|
|
||||||
|
### 4.3 Broker schema — unchanged from v6 §4.3
|
||||||
|
|
||||||
|
(`mesh.client_message_dedupe` table with `request_fingerprint BYTEA`, no
|
||||||
|
`status` column.)
|
||||||
|
|
||||||
|
### 4.4 Request fingerprint canonical form — unchanged from v6 §4.4
|
||||||
|
|
||||||
|
### 4.5 Daemon-local idempotency at the IPC layer (NEW v7 — codex r6)
|
||||||
|
|
||||||
|
The daemon enforces fingerprint idempotency **before** the request hits
|
||||||
|
`outbox.db` so a caller bug never creates duplicate-key/mismatch-payload
|
||||||
|
state at all.
|
||||||
|
|
||||||
|
#### 4.5.1 IPC accept algorithm
|
||||||
|
|
||||||
|
On `POST /v1/send`:
|
||||||
|
|
||||||
|
1. Validate request envelope (auth, schema, size limits). Failures
|
||||||
|
here return `4xx` immediately. **No outbox row is written.** The
|
||||||
|
`client_message_id` (whether caller-supplied or daemon-minted) is
|
||||||
|
**not consumed** — the same id may be reused by the caller for a
|
||||||
|
subsequent valid send.
|
||||||
|
2. Compute `request_fingerprint` (§4.4).
|
||||||
|
3. Look up existing outbox row by `client_message_id`:
|
||||||
|
|
||||||
|
| Existing row state | Fingerprint match? | Daemon response |
|---|---|---|
| (no row) | n/a | Insert new outbox row in `pending`; return `202 accepted, queued` with `client_message_id` |
| `pending` | match | Return `202 accepted, queued` with the existing `client_message_id`. No new row. Idempotent retry of an in-progress send |
| `pending` | mismatch | Return `409 idempotency_key_reused` with `conflict: "outbox_pending_fingerprint_mismatch"`. **No mutation of the existing row.** |
| `inflight` | match | Return `202 accepted, inflight`. No new row. Caller is retrying mid-broker-roundtrip |
| `inflight` | mismatch | Return `409 idempotency_key_reused` with `conflict: "outbox_inflight_fingerprint_mismatch"` |
| `done` | match | Return `200 ok, duplicate: true, broker_message_id, history_id`. No new row, no broker call |
| `done` | mismatch | Return `409 idempotency_key_reused` with `conflict: "outbox_done_fingerprint_mismatch", broker_message_id` |
| `dead` | match | Return `409 idempotency_key_reused` with `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"`. Caller must rotate the id (see §4.6.3); the daemon refuses to re-attempt a dead row's exact bytes. |
| `dead` | mismatch | Return `409 idempotency_key_reused` with `conflict: "outbox_dead_fingerprint_mismatch"` |
|
||||||
|
|
||||||
|
Rule: any IPC `409` carries the daemon's `request_fingerprint` (8-byte
|
||||||
|
hex prefix) so callers can debug client/server canonical-form drift.
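
The lookup table above can be read as a pure function of the existing row's state and the fingerprint comparison. The following is a minimal sketch, not the daemon's implementation: the function name `ipc_decision`, the response dictionary shape, and the reading of "8-byte hex prefix" as the first 8 bytes of the fingerprint rendered as hex are all assumptions made for illustration.

```python
from typing import Optional

# (row status, fingerprint matches?) -> (HTTP status, conflict code or None).
# Mirrors the §4.5.1 table; the tuple encoding is an illustrative assumption.
LOOKUP = {
    ("pending", True): (202, None),
    ("pending", False): (409, "outbox_pending_fingerprint_mismatch"),
    ("inflight", True): (202, None),
    ("inflight", False): (409, "outbox_inflight_fingerprint_mismatch"),
    ("done", True): (200, None),
    ("done", False): (409, "outbox_done_fingerprint_mismatch"),
    ("dead", True): (409, "outbox_dead_fingerprint_match"),
    ("dead", False): (409, "outbox_dead_fingerprint_mismatch"),
}


def ipc_decision(status: Optional[str], stored_fp: Optional[bytes],
                 request_fp: bytes) -> dict:
    """Decide the IPC response for one POST /v1/send (hypothetical helper)."""
    if status is None:
        # No outbox row yet: the caller of this helper inserts a pending row.
        return {"http": 202, "action": "insert_pending"}
    http, conflict = LOOKUP[(status, stored_fp == request_fp)]
    response = {"http": http, "action": "none"}  # existing row is never mutated
    if http == 409:
        response["error"] = "idempotency_key_reused"
        response["conflict"] = conflict
        # Assumption: the prefix is taken from the fingerprint the daemon
        # computed for the incoming request.
        response["daemon_fingerprint_prefix"] = request_fp[:8].hex()
    return response
```

Keeping the table as data rather than nested conditionals makes it easy to check that every state and match combination has exactly one deterministic response.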
|
||||||
|
|
||||||
|
#### 4.5.2 Outbox table — fingerprint required, atomic UPSERT removed
|
||||||
|
|
||||||
|
```sql
CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,      -- 32 bytes
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  attempts            INTEGER DEFAULT 0,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT CHECK(status IN ('pending','inflight','done','dead')),
  last_error          TEXT,
  delivered_at        INTEGER,
  broker_message_id   TEXT
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
```
|
||||||
|
|
||||||
|
Insertion is `BEGIN; SELECT FOR UPDATE; if-no-row INSERT; COMMIT;` —
|
||||||
|
explicit lock + check + insert, not `INSERT OR IGNORE`. The daemon
|
||||||
|
never auto-mutates an existing row's `request_fingerprint` or
|
||||||
|
`payload`; mismatches are 409s, not silent overwrites.
|
||||||
|
|
||||||
|
`request_fingerprint` is computed once at IPC accept time and frozen.
|
||||||
|
Retries to the broker re-send the same bytes from `payload` and the
|
||||||
|
same `request_fingerprint`. Daemon does not recompute post-enqueue.
|
||||||
|
|
||||||
|
### 4.6 Rejected-request semantics — pick one rule (NEW v7 — codex r6)
|
||||||
|
|
||||||
|
> **Rule: the `client_message_id` is consumed iff the daemon writes an
|
||||||
|
> outbox row. Anything that fails before outbox insertion (validation,
|
||||||
|
> auth, size) leaves the id untouched and freely reusable.**
|
||||||
|
|
||||||
|
This makes §4.6 internally consistent with §4.5:
|
||||||
|
|
||||||
|
#### 4.6.1 IPC validation failure (no outbox row written)
|
||||||
|
|
||||||
|
- Schema/auth/size/destination-not-resolvable failures return `4xx`
|
||||||
|
immediately. The `client_message_id` is **not** stored anywhere on
|
||||||
|
the daemon. Caller may re-send with the same id and a fixed payload;
|
||||||
|
it will be treated as a fresh request because no outbox row exists.
|
||||||
|
|
||||||
|
#### 4.6.2 Outbox row exists, broker permanent rejection (4xx response)
|
||||||
|
|
||||||
|
- Daemon receives `4xx` from broker (e.g. payload size delta between
|
||||||
|
daemon and broker advertised limits, mesh-level reject). Outbox row
|
||||||
|
transitions to `dead` with `last_error` populated.
|
||||||
|
- Caller retrying with same `client_message_id` → daemon returns
|
||||||
|
`409 idempotency_key_reused, conflict: "outbox_dead_*"` per §4.5.1.
|
||||||
|
- The id is consumed (row is locked in `dead`) until operator action.
|
||||||
|
|
||||||
|
#### 4.6.3 Operator recovery: rotating an idempotency key
|
||||||
|
|
||||||
|
To unstick a `dead` row whose payload needs to change, operator runs:
|
||||||
|
|
||||||
|
```
claudemesh daemon outbox requeue --id <outbox_id> --new-client-id [auto|<id>]
```
|
||||||
|
|
||||||
|
This atomically:
|
||||||
|
1. Marks the existing `dead` row as `aborted` (terminal, never retried).
|
||||||
|
2. Creates a new outbox row with a fresh `client_message_id` (caller-
|
||||||
|
supplied or daemon-ulid'd) and the SAME or a CALLER-PATCHED payload.
|
||||||
|
3. The old `client_message_id` becomes free again at the daemon layer
|
||||||
|
but is still locked at the broker layer if the broker had ever
|
||||||
|
accepted it (its dedupe row stays). For a row that died before
|
||||||
|
broker acceptance, the id is fully reusable end-to-end.
|
||||||
|
|
||||||
|
Operators see a clear distinction between `dead` (needs operator
|
||||||
|
attention) and `aborted` (intentionally retired). Add `aborted` to the
|
||||||
|
status CHECK constraint:
|
||||||
|
|
||||||
|
```sql
status TEXT CHECK(status IN ('pending','inflight','done','dead','aborted'))
```
|
||||||
|
|
||||||
|
### 4.7 Broker atomicity contract — corrected pseudocode + side-effect inventory (v7 — codex r6)
|
||||||
|
|
||||||
|
#### 4.7.1 Side effects inside the transaction
|
||||||
|
|
||||||
|
Every successful broker accept atomically commits the following durable
|
||||||
|
state in **one transaction**:
|
||||||
|
|
||||||
|
| Effect | Table | Notes |
|---|---|---|
| Dedupe record | `mesh.client_message_dedupe` | NEW row keyed by `(mesh_id, client_message_id)` |
| Message body | `mesh.topic_message` OR `mesh.message_queue` | NEW row keyed by `broker_message_id` (pre-generated ulid) |
| History row | `mesh.message_history` | NEW row pointing at `broker_message_id` for ordered replay |
| Fan-out work | `mesh.delivery_queue` | One row per intended recipient (member subscribed to topic, recipient of DM, etc.) |
|
||||||
|
|
||||||
|
Effects **outside** the transaction (committed after ACK to daemon):
|
||||||
|
- WebSocket pushes to currently-connected subscribers — these are best-
|
||||||
|
effort live notifications; on failure subscribers fetch from history
|
||||||
|
on next connect.
|
||||||
|
- Webhook fan-out (post-v0.9.0 feature) — runs asynchronously off the
|
||||||
|
`delivery_queue` rows committed inside the transaction.
|
||||||
|
|
||||||
|
If any in-transaction insert fails (constraint violation, DB error),
|
||||||
|
the transaction rolls back: no dedupe row, no message row, no history,
|
||||||
|
no delivery queue rows. Broker returns `5xx` to daemon; daemon retries.
|
||||||
|
|
||||||
|
#### 4.7.2 Corrected pseudocode (codex r6)
|
||||||
|
|
||||||
|
The fingerprint comparison must happen on the conflict-select branch,
|
||||||
|
not the `RETURNING` branch:
|
||||||
|
|
||||||
|
```sql
BEGIN;

-- Pre-generate broker_message_id (ulid) outside the transaction, pass in.

-- Step 1: try to claim the idempotency key.
INSERT INTO mesh.client_message_dedupe
  (mesh_id, client_message_id, broker_message_id, request_fingerprint,
   destination_kind, destination_ref, expires_at)
VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
        $dest_kind, $dest_ref, $expires_at)
ON CONFLICT (mesh_id, client_message_id) DO NOTHING;

-- Step 2: was it our insert?
SELECT broker_message_id, request_fingerprint, destination_kind,
       destination_ref, history_available, first_seen_at
FROM mesh.client_message_dedupe
WHERE mesh_id = $mesh_id AND client_message_id = $client_id
FOR SHARE;

-- If returned.broker_message_id == $msg_id (our pre-generated id),
--   this was the first insert. Continue to step 3.
-- If returned.broker_message_id != $msg_id AND
--    returned.request_fingerprint == $fingerprint,
--   this is a duplicate retry. ROLLBACK; return 200 duplicate.
-- If returned.broker_message_id != $msg_id AND
--    returned.request_fingerprint != $fingerprint,
--   ROLLBACK; return 409 idempotency_key_reused.

-- Step 3: insert message row, history, fan-out queue.
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
VALUES ($msg_id, $mesh_id, $client_id, ...);

INSERT INTO mesh.message_history (broker_message_id, mesh_id, ...)
VALUES ($msg_id, $mesh_id, ...);

INSERT INTO mesh.delivery_queue (broker_message_id, recipient_pubkey, ...)
SELECT $msg_id, member_pubkey, ...
FROM mesh.topic_subscription
WHERE topic = $dest_ref AND mesh_id = $mesh_id;

COMMIT;
```
|
||||||
|
|
||||||
|
The branch logic determines the response shape (`201` vs `200
|
||||||
|
duplicate` vs `409 idempotency_key_reused`) before COMMIT. The
|
||||||
|
duplicate and 409 branches always ROLLBACK because nothing else
|
||||||
|
needs to commit on those paths.
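
The three-way branch in step 2 can be sketched as a small classifier over the row returned by the `SELECT`. This is an illustration under stated assumptions, not broker code: `DedupeRow`, `classify_accept`, and the returned labels are hypothetical names; only the comparison rules come from the pseudocode above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DedupeRow:
    """The two columns of the dedupe row that drive the branch (hypothetical type)."""
    broker_message_id: str
    request_fingerprint: bytes


def classify_accept(row: DedupeRow, our_msg_id: str, our_fp: bytes) -> str:
    """Map the row seen after INSERT ... ON CONFLICT DO NOTHING to an outcome."""
    if row.broker_message_id == our_msg_id:
        # Our pre-generated id is in the row, so our INSERT won the claim.
        return "first_insert"   # continue to step 3, COMMIT, respond 201
    if row.request_fingerprint == our_fp:
        # Someone else's row, same request bytes: an idempotent retry.
        return "duplicate"      # ROLLBACK, respond 200 duplicate
    # Someone else's row with different request bytes.
    return "key_reused"         # ROLLBACK, respond 409 idempotency_key_reused
```

Comparing on the pre-generated `broker_message_id` first is what lets the broker tell "our insert" apart from "an earlier insert" without relying on `RETURNING`.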
|
||||||
|
|
||||||
|
`SELECT … FOR SHARE` prevents concurrent transactions from updating or
deleting the same dedupe row mid-transaction; a concurrent insert with
the same key blocks on the unique constraint until our transaction
completes.
|
||||||
|
|
||||||
|
#### 4.7.3 Orphan check — covers full inventory now
|
||||||
|
|
||||||
|
The nightly `cm_broker_dedupe_orphan_check_total` job (v6 §4.7) is
|
||||||
|
extended to verify all four in-transaction effects. For each
|
||||||
|
`client_message_dedupe` row:
|
||||||
|
- Either the corresponding `topic_message` / `message_queue` row exists,
|
||||||
|
OR `history_available = FALSE` AND a deleted-tombstone is recorded.
|
||||||
|
- AND a corresponding `message_history` row exists (or has been pruned
|
||||||
|
per history retention).
|
||||||
|
- AND zero outstanding `delivery_queue` rows older than fan-out timeout
|
||||||
|
reference a `broker_message_id` whose dedupe row is missing.
|
||||||
|
|
||||||
|
Any inconsistency is logged as `cm_broker_atomicity_violation_found` for
human review. The count should be zero in steady state.
|
||||||
|
|
||||||
|
### 4.8 Outbox max-age math — strictly inside broker window (v7 — codex r6)
|
||||||
|
|
||||||
|
Codex r6: at v6's 3-day minimum, daemon max_age (72h) **equaled** broker
|
||||||
|
window (72h). That isn't "inside the window."
|
||||||
|
|
||||||
|
v7 raises the floor and tightens the formula:
|
||||||
|
|
||||||
|
- **Minimum supported broker `dedupe_retention_days`**: **7** (was 3 in
|
||||||
|
v6). Below this, daemon refuses to start with `4012
|
||||||
|
feature_param_below_floor`.
|
||||||
|
- **Daemon `max_age_hours` derivation** (`retention_scoped` mode):
|
||||||
|
  ```
  safety_margin_hours = max(24, ceil(dedupe_retention_days * 0.1 * 24))
  max_age_hours       = (dedupe_retention_days * 24) - safety_margin_hours
  ```
|
||||||
|
At minimum (7 days): `safety_margin = max(24, 17) = 24h`; `max_age =
|
||||||
|
168 - 24 = 144h`. Daemon outbox ≤144h, broker window ≥168h, gap ≥24h.
|
||||||
|
- **Daemon `max_age_hours` derivation** (`permanent` mode):
|
||||||
|
  ```
  max_age_hours = config.outbox.max_age_hours_default     (168h)
                  capped at config.outbox.max_age_hours_cap (720h)
  ```
|
||||||
|
- **Operator override**: `[outbox] max_age_hours_override = N` is accepted
  iff `N <= dedupe_retention_days * 24 - 24`. Above that, the daemon
  refuses to start with a clear-text `outbox_max_age_above_dedupe_window` error.
|
||||||
|
- The 72h floor from v6 is **dropped** because the new 7-day broker
|
||||||
|
minimum already produces a 144h derived max-age — well above any
|
||||||
|
realistic floor concern.
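
As a worked check of the `retention_scoped` formula and the override rule, here is a minimal sketch. The function names and the use of `ValueError` are assumptions for illustration; the arithmetic follows the bullets above, with the 10% ceiling computed in integer math to avoid floating-point rounding.

```python
MIN_RETENTION_DAYS = 7  # v7 floor for dedupe_retention_days


def safety_margin_hours(retention_days: int) -> int:
    """max(24, ceil(retention_days * 0.1 * 24)), using integer ceiling."""
    ten_percent = (retention_days * 24 + 9) // 10
    return max(24, ten_percent)


def derived_max_age_hours(retention_days: int) -> int:
    """Outbox max-age in retention_scoped mode (hypothetical helper)."""
    if retention_days < MIN_RETENTION_DAYS:
        # The spec has the daemon refuse to start in this case.
        raise ValueError("feature_param_below_floor")
    return retention_days * 24 - safety_margin_hours(retention_days)


def override_allowed(override_hours: int, retention_days: int) -> bool:
    """Operator override must leave at least 24h before the window ends."""
    return override_hours <= retention_days * 24 - 24


if __name__ == "__main__":
    for days in (7, 30, 365):
        print(days, safety_margin_hours(days), derived_max_age_hours(days))
```

At 7 days this gives a 24h margin and a 144h max-age; at 365 days it gives an 876h margin and a 7884h max-age, so the derived value is always strictly below the broker window.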
|
||||||
|
|
||||||
|
### 4.9 Inbox schema — unchanged from v3 §4.5
|
||||||
|
|
||||||
|
### 4.10 Crash recovery — unchanged from v3 §4.6
|
||||||
|
|
||||||
|
### 4.11 Failure modes — unchanged from v6 §4.12, with §4.5/§4.6 added
|
||||||
|
|
||||||
|
- **IPC accept fingerprint-mismatch on duplicate id**: returns 409 with
|
||||||
|
`conflict` field per §4.5.1. Caller must rotate id.
|
||||||
|
- **Outbox row stuck in `dead`**: operator runs `outbox requeue
|
||||||
|
--new-client-id` per §4.6.3.
|
||||||
|
- **Broker fingerprint mismatch on retry**: as v6 §4.5. Daemon marks
|
||||||
|
`dead`, surfaces in `outbox --failed`.
|
||||||
|
- **Daemon retry after dedupe row hard-deleted by broker retention
|
||||||
|
sweep**: cannot happen unless operator overrode `max_age_hours`
|
||||||
|
beyond the safety margin. In `permanent` mode cannot happen at all.
|
||||||
|
- **Atomicity violation found by orphan check**: alerts ops; broker
|
||||||
|
team investigates. Should be zero.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Inbound — unchanged from v3 §5
|
||||||
|
|
||||||
|
## 6. Hooks — unchanged from v4 §6
|
||||||
|
|
||||||
|
## 7-13. — unchanged from v4
|
||||||
|
|
||||||
|
## 14. Lifecycle — unchanged from v5 §14
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 15. Version compat — minimum dedupe_retention_days raised
|
||||||
|
|
||||||
|
### 15.1 Feature bits with parameters (v7 update)
|
||||||
|
|
||||||
|
Only one row changes from v6 §15.1:
|
||||||
|
|
||||||
|
| Bit | `params.version` | Required parameters | Optional parameters |
|---|---|---|---|
| `client_message_id_dedupe` | `2` | `mode: "retention_scoped"\|"permanent"`, `dedupe_retention_days: int (>= 7)` (when mode=retention_scoped), `request_fingerprint: bool == true` | `tombstone_history_pruned_window_days: int` |
|
||||||
|
|
||||||
|
`dedupe_retention_days` minimum raised from 3 to 7 to keep daemon
|
||||||
|
outbox max-age strictly inside the broker window with margin (§4.8).
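
A daemon-side check of the advertised parameters might look like the sketch below. The flat dictionary shape (with a `version` key alongside the other parameters) and the function name are assumptions; the actual negotiation payload is defined in v6 §15 and is not reproduced here.

```python
from typing import List

VALID_MODES = ("retention_scoped", "permanent")
MIN_RETENTION_DAYS = 7


def validate_dedupe_params(params: dict) -> List[str]:
    """Return a list of problems with the advertised params (empty means OK)."""
    errors = []
    if params.get("version") != 2:
        errors.append("params.version must be 2")
    mode = params.get("mode")
    if mode not in VALID_MODES:
        errors.append("mode must be retention_scoped or permanent")
    if params.get("request_fingerprint") is not True:
        errors.append("request_fingerprint must be true")
    if mode == "retention_scoped":
        days = params.get("dedupe_retention_days")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(days, int) or isinstance(days, bool):
            errors.append("dedupe_retention_days must be an int")
        elif days < MIN_RETENTION_DAYS:
            errors.append("feature_param_below_floor")
    return errors
```

Returning every problem at once, rather than failing on the first, gives an operator the full picture when the daemon refuses to start.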
|
||||||
|
|
||||||
|
### 15.2 — 15.5 unchanged from v6 §15
|
||||||
|
|
||||||
|
(`feature_negotiation_request/response`, IPC negotiation, compat
|
||||||
|
matrix, diagnostic close codes 4010 / 4011 / 4012.)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 16. Threat model — unchanged from v4 §16
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 17. Migration — broker dedupe + atomicity + corrected pseudocode (v7)
|
||||||
|
|
||||||
|
Broker side, deploy order:
|
||||||
|
|
||||||
|
1. `CREATE TABLE mesh.client_message_dedupe` (v6 §4.3 schema, unchanged
|
||||||
|
in v7).
|
||||||
|
2. `ALTER TABLE mesh.topic_message ADD COLUMN client_message_id`.
|
||||||
|
3. `ALTER TABLE mesh.message_queue ADD COLUMN client_message_id`.
|
||||||
|
4. Broker code refactor: every accept path runs the v7 §4.7.2 corrected
|
||||||
|
pseudocode in **one transaction** with the side-effect inventory
|
||||||
|
from §4.7.1 — dedupe row, message row, history row, delivery_queue
|
||||||
|
rows all in-tx.
|
||||||
|
5. Broker code: existing fan-out workers consume `delivery_queue` rows
|
||||||
|
committed by the accept transaction.
|
||||||
|
6. Broker code: nightly retention sweep + `history_available` flip on
|
||||||
|
message-row pruning (unchanged from v6 §17 step 5+6).
|
||||||
|
7. Broker code: extended orphan-check job (v7 §4.7.3) — alerts on
|
||||||
|
atomicity violations across full inventory.
|
||||||
|
8. Broker advertises `client_message_id_dedupe` feature with
|
||||||
|
`params.version = 2`, `request_fingerprint: true`,
|
||||||
|
`dedupe_retention_days >= 7` (was 3).
|
||||||
|
9. Daemon refuses to start unless above is advertised.
|
||||||
|
|
||||||
|
Daemon side:
|
||||||
|
- Outbox table gains `aborted` status (§4.6.3). The migration alters the
  CHECK constraint at startup if the installed SQLite version's DDL allows
  it without a recreate; otherwise the table is recreated via
  `INSERT INTO new SELECT * FROM old`. v0.9.0 daemons are fresh installs by
  definition, so there are no existing outboxes to migrate.
|
||||||
|
- IPC accept path implements §4.5.1 lookup table.
|
||||||
|
- IPC error envelope adds `conflict` and `daemon_fingerprint_prefix`
|
||||||
|
fields for 409 responses.
|
||||||
|
- New CLI verb `claudemesh daemon outbox requeue --id <id>
|
||||||
|
--new-client-id [auto|<id>]` (§4.6.3).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What changed v6 → v7 (codex round-6 actionable items)
|
||||||
|
|
||||||
|
| Codex r6 item | v7 fix | Section |
|---|---|---|
| Daemon-local duplicate POST semantics undefined | Full lookup table for pending/inflight/done/dead × match/mismatch; `409 idempotency_key_reused` at IPC layer with `conflict` field | §4.5 |
| §4.6 rejected-request contradiction | Single rule: id consumed iff outbox row written; pre-outbox failures leave id untouched; broker-rejected outbox row goes to `dead`, requires `requeue --new-client-id` | §4.6 |
| §4.7 pseudocode wrong | Corrected: `INSERT ON CONFLICT DO NOTHING`, then `SELECT FOR SHARE`, then branch on returned `broker_message_id` and `fingerprint` | §4.7.2 |
| Max-age math equals window at min | Min `dedupe_retention_days` raised to 7; safety margin always >= 24h; derived max-age strictly < window | §4.8, §15.1 |
| Broker atomicity scope incomplete | Side-effect inventory: dedupe + message + history + delivery_queue all in-tx; WS push and webhook fan-out explicitly outside-tx; orphan check extended | §4.7.1, §4.7.3 |
| New `aborted` outbox status | Distinguishes operator-retired rows from dead rows | §4.6.3 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review (round 7)
|
||||||
|
|
||||||
|
1. **IPC lookup table (§4.5.1)** — does it cover all the realistic
|
||||||
|
client races? The "inflight + match" return is `202 accepted,
|
||||||
|
inflight` — should it be `200 ok` with the broker response if the
|
||||||
|
broker has already responded? Or does the daemon prefer to respond
|
||||||
|
from local state always?
|
||||||
|
2. **Aborted vs dead vs done (§4.6.3)** — is the three-state terminal
|
||||||
|
distinction useful, or noisy? Would `dead` + an `aborted_at`
|
||||||
|
timestamp suffice?
|
||||||
|
3. **§4.7.2 transaction shape** — `SELECT FOR SHARE` after `INSERT ON
|
||||||
|
CONFLICT DO NOTHING` is two round-trips. Could it be one with
|
||||||
|
`INSERT ... ON CONFLICT DO UPDATE SET ... RETURNING xmax = 0` or
|
||||||
|
similar Postgres-specific trick? Worth optimizing here?
|
||||||
|
4. **Max-age formula at higher windows** — at 365 days,
|
||||||
|
`safety_margin = ceil(0.1 * 365 * 24) = 876h ≈ 36.5 days`. Daemon
|
||||||
|
max-age = `8760 - 876 = 7884h ≈ 328 days`. Is that the right shape,
|
||||||
|
or should the safety margin be capped (e.g. `min(72, ceil(0.1 * w))`)?
|
||||||
|
5. **Side-effect inventory (§4.7.1)** — anything missing? E.g. broker-
|
||||||
|
side rate-limit counters, audit-log entries, mention-fanout-search?
|
||||||
|
6. **Anything else still wrong?** Read it as if you were going to
|
||||||
|
operate this for a year. What falls down?
|
||||||
|
|
||||||
|
Three options:
|
||||||
|
- **(a) v7 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v8 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless.
|
||||||
401
.artifacts/shipped/2026-05-03-daemon-final-spec-v8.md
Normal file
@@ -0,0 +1,401 @@
|
|||||||
|
# `claudemesh daemon` — Final Spec v8
|
||||||
|
|
||||||
|
> **Round 8.** v7 was reviewed by codex (round 7) which found four
|
||||||
|
> remaining correctness problems, one of them new in v7:
|
||||||
|
>
|
||||||
|
> 1. **`aborted` semantics not in §4.5.1** and contradiction with `UNIQUE`
|
||||||
|
> constraint — v7 said the old id "becomes free again at the daemon
|
||||||
|
> layer," but `client_message_id TEXT NOT NULL UNIQUE` makes that
|
||||||
|
> impossible without DELETE.
|
||||||
|
> 2. **Broker permanent-rejection ordering underspecified**: v7 didn't state
|
||||||
|
> when (relative to dedupe insertion) permanent 4xx fires.
|
||||||
|
> 3. **SQLite `SELECT FOR UPDATE`** — SQLite doesn't support it; needs
|
||||||
|
> `BEGIN IMMEDIATE` for daemon-local serialization.
|
||||||
|
> 4. **Side-effect inventory still ambiguous** — rate-limit counters,
|
||||||
|
> audit logs, mention/search indexes need explicit
|
||||||
|
> in-tx/non-authoritative classification.
|
||||||
|
>
|
||||||
|
> v8 fixes all four. **Intent §0 unchanged from v2.** v8 only revises §4
|
||||||
|
> (delivery contract).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 0. Intent — unchanged, see v2 §0
|
||||||
|
|
||||||
|
## 1. Process model — unchanged
|
||||||
|
|
||||||
|
## 2. Identity — unchanged from v5 §2
|
||||||
|
|
||||||
|
## 3. IPC surface — unchanged from v4 §3
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Delivery contract — `aborted` clarified, broker phasing, SQLite locking
|
||||||
|
|
||||||
|
### 4.1 The contract (precise — v8)
|
||||||
|
|
||||||
|
> **Local guarantee**: each successful `POST /v1/send` returns a stable
|
||||||
|
> `client_message_id`. The send is durably persisted to `outbox.db` before
|
||||||
|
> the response returns. The daemon enforces request-fingerprint
|
||||||
|
> idempotency at the IPC layer: a duplicate `POST` with the same
|
||||||
|
> `client_message_id` returns `409 idempotency_key_reused` if the
|
||||||
|
> fingerprint mismatches, regardless of outbox row state.
|
||||||
|
>
|
||||||
|
> **Local audit guarantee (NEW v8)**: a `client_message_id` once written
|
||||||
|
> to `outbox.db` is **never released**. Operator recovery via
|
||||||
|
> `requeue --new-client-id` always mints a fresh id; the old row stays
|
||||||
|
> in `aborted` for audit. There is no daemon-side path to free a used
|
||||||
|
> id.
|
||||||
|
>
|
||||||
|
> **Broker guarantee**: same as v7 §4.1. Dedupe row exists iff the
|
||||||
|
> broker reached the post-validation accept phase (§4.7.1).
|
||||||
|
>
|
||||||
|
> **Atomicity guarantee**: same as v7 §4.1.
|
||||||
|
>
|
||||||
|
> **End-to-end guarantee**: at-least-once.
|
||||||
|
|
||||||
|
### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2
|
||||||
|
|
||||||
|
### 4.3 Broker schema — unchanged from v6 §4.3
|
||||||
|
|
||||||
|
### 4.4 Request fingerprint canonical form — unchanged from v6 §4.4
|
||||||
|
|
||||||
|
### 4.5 Daemon-local idempotency at the IPC layer (v8 — `aborted` added, SQLite locking)
|
||||||
|
|
||||||
|
#### 4.5.1 IPC accept algorithm (v8)
|
||||||
|
|
||||||
|
On `POST /v1/send`:
|
||||||
|
|
||||||
|
1. Validate request envelope (auth, schema, size limits, destination
|
||||||
|
resolvable). Failures here return `4xx` immediately. **No outbox row
|
||||||
|
is written; the `client_message_id` is not consumed.**
|
||||||
|
2. Compute `request_fingerprint` (§4.4).
|
||||||
|
3. Open a SQLite transaction with `BEGIN IMMEDIATE` (v8, codex r7) so
   that any concurrent IPC accept, including one on the same id,
   serializes against this one. `BEGIN IMMEDIATE` acquires the RESERVED
   lock at transaction start, which prevents any other connection from
   starting a write transaction on the same database (readers are not
   blocked). SQLite has no row-level locks and does not support
   `SELECT FOR UPDATE`.
|
||||||
|
4. `SELECT id, request_fingerprint, status, broker_message_id,
|
||||||
|
last_error FROM outbox WHERE client_message_id = ?`.
|
||||||
|
5. Apply the lookup table below. For the "(no row)" case, INSERT the
|
||||||
|
new row inside the same transaction.
|
||||||
|
6. COMMIT.
|
||||||
|
|
||||||
|
| Existing row state | Fingerprint match? | Daemon response |
|---|---|---|
| (no row) | n/a | INSERT new outbox row in `pending`; return `202 accepted, queued` |
| `pending` | match | Return `202 accepted, queued`. No mutation |
| `pending` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_pending_fingerprint_mismatch"`. No mutation |
| `inflight` | match | Return `202 accepted, inflight`. No mutation |
| `inflight` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_inflight_fingerprint_mismatch"` |
| `done` | match | Return `200 ok, duplicate: true, broker_message_id, history_id`. No broker call |
| `done` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_done_fingerprint_mismatch", broker_message_id` |
| `dead` | match | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"`. Same id never auto-retried |
| `dead` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_mismatch"` |
| **`aborted`** (NEW v8) | **match** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_match"`. The id was retired by operator action; never reusable |
| **`aborted`** (NEW v8) | **mismatch** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_mismatch"` |
|
||||||
|
|
||||||
|
**Rule (v8 — codex r7)**: every IPC `409` carries the daemon's
|
||||||
|
`request_fingerprint` (8-byte hex prefix) so callers can debug
|
||||||
|
client/server canonical-form drift. **Every state in the table returns
|
||||||
|
something deterministic, including `aborted`.** A `client_message_id`
|
||||||
|
written to `outbox.db` is permanently bound to that row's lifecycle —
|
||||||
|
the only "free" state is "no row exists".
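
Steps 3 to 6 of the accept algorithm can be sketched with Python's standard `sqlite3` module. This is a minimal illustration under assumptions, not the daemon's code: the trimmed schema, the `uuid4` row id (standing in for whatever id scheme the daemon uses), and the `(outcome, status)` return shape are all hypothetical. The point it shows is the lock-then-check-then-insert sequence inside one `BEGIN IMMEDIATE` transaction.

```python
import sqlite3
import time
import uuid

SCHEMA = """
CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT CHECK(status IN
                        ('pending','inflight','done','dead','aborted'))
)
"""


def open_outbox(path=":memory:"):
    # isolation_level=None disables implicit transactions so that the
    # explicit BEGIN IMMEDIATE / COMMIT below are the only ones issued.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(SCHEMA)
    return conn


def ipc_accept(conn, client_message_id, fingerprint, payload):
    """Return (outcome, status); outcome is inserted, match, or mismatch."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT request_fingerprint, status FROM outbox "
            "WHERE client_message_id = ?",
            (client_message_id,),
        ).fetchone()
        if row is None:
            now = int(time.time())
            conn.execute(
                "INSERT INTO outbox (id, client_message_id, "
                "request_fingerprint, payload, enqueued_at, "
                "next_attempt_at, status) "
                "VALUES (?, ?, ?, ?, ?, ?, 'pending')",
                (uuid.uuid4().hex, client_message_id, fingerprint,
                 payload, now, now),
            )
            outcome = ("inserted", "pending")
        else:
            stored_fp, status = row
            same = bytes(stored_fp) == fingerprint
            outcome = ("match" if same else "mismatch", status)
        conn.execute("COMMIT")
        return outcome
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The existing row is never updated on either the match or mismatch path, which is what keeps the fingerprint frozen for the row's lifetime; mapping the outcome to `202`, `200`, or `409` then follows the table above.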
|
||||||
|
|
||||||
|
#### 4.5.2 Outbox table — fingerprint required
|
||||||
|
|
||||||
|
```sql
CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,      -- 32 bytes
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  attempts            INTEGER DEFAULT 0,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT CHECK(status IN
                        ('pending','inflight','done','dead','aborted')),
  last_error          TEXT,
  delivered_at        INTEGER,
  broker_message_id   TEXT,
  aborted_at          INTEGER,            -- NEW v8
  aborted_by          TEXT,               -- NEW v8: operator/auto
  superseded_by       TEXT                -- NEW v8: id of the requeue successor row, if any
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
CREATE INDEX outbox_aborted ON outbox(status, aborted_at) WHERE status = 'aborted';
```
|
||||||
|
|
||||||
|
`aborted_at`, `aborted_by`, `superseded_by` give operators a clear
|
||||||
|
audit trail. `superseded_by` lets `outbox inspect` show the chain when
|
||||||
|
a row was requeued multiple times.
|
||||||
|
|
||||||
|
`request_fingerprint` is computed once at IPC accept time and frozen
|
||||||
|
forever for the row's lifecycle. Daemon never recomputes from
|
||||||
|
`payload`.
|
||||||
|
|
||||||
|
### 4.6 Rejected-request semantics — phasing made explicit (v8 — codex r7)
|
||||||
|
|
||||||
|
> **Single rule, phased**: a `client_message_id` is consumed iff a
|
||||||
|
> dedupe row exists. The dedupe row is the durable evidence that a
|
||||||
|
> request reached the post-validation accept phase. Pre-validation
|
||||||
|
> failures consume nothing — caller may freely retry the same id with
|
||||||
|
> a fixed payload.
|
||||||
|
|
||||||
|
#### 4.6.1 Daemon-side rejection phasing
|
||||||
|
|
||||||
|
| Phase | When daemon rejects | Outbox row? | Caller may reuse id? |
|---|---|---|---|
| **A. IPC validation** (auth, schema, size, destination resolvable) | Before §4.5.1 step 3 | No | Yes; id never consumed |
| **B. Outbox stored, broker network/transient failure** | After IPC accept, broker `5xx` or timeout | `pending` → retried | N/A; daemon owns retries |
| **C. Outbox stored, broker permanent rejection** | Broker returns `4xx` after IPC accept | `dead` | No; rotate via `requeue --new-client-id` |
| **D. Operator retirement** | Operator runs `requeue --new-client-id` on `dead` or `pending` row | `aborted` (audit) + new row with fresh id | Old id NEVER reusable; new id is fresh |
|
||||||
|
|
||||||
|
#### 4.6.2 Broker-side rejection phasing (NEW v8 — codex r7)
|
||||||
|
|
||||||
|
The broker validates in two phases relative to dedupe-row insertion (B1 before the claim, B2 after it); B3 is the accepted outcome:
|
||||||
|
|
||||||
|
| Phase | Validation | Result |
|---|---|---|
| **B1. Pre-dedupe-claim** (NEW — explicit) | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ `max_payload.inline_bytes` | `4xx` returned. **No dedupe row inserted.** Caller may retry with same id and corrected payload. |
| **B2. Post-dedupe-claim** | Anything that requires the dedupe-claim transaction to be in progress: destination_ref existence (topic exists, member subscribed, etc.), per-mesh rate limit not exceeded | `4xx` returned, transaction rolled back, **no dedupe row remains**. Caller may retry with same id. |
| **B3. Accepted** | All side effects (dedupe row, message row, history row, delivery_queue rows) commit atomically | `201` returned with `broker_message_id` |
|
||||||
|
|
||||||
|
**Critical guarantee (v8)**: there is no broker code path where a
|
||||||
|
permanent rejection (4xx) leaves a dedupe row behind. Either the
|
||||||
|
request committed and a dedupe row exists (B3), or it didn't and no
|
||||||
|
dedupe row exists (B1, B2). This makes "dedupe row exists" the single
|
||||||
|
unambiguous signal of "id consumed at the broker layer."
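
A minimal sketch of the B2 half of this guarantee, assuming an in-memory SQLite database as a stand-in for the broker's real store. The two-table schema, the function names (`open_broker_db`, `accept_or_rollback`, `dedupe_row_count`) and the `404` status are all hypothetical; the spec only says `4xx`. Duplicate-id handling is left out on purpose so the rollback path stays visible.

```python
import sqlite3


def open_broker_db():
    # isolation_level=None puts the connection in autocommit mode, so
    # BEGIN / COMMIT / ROLLBACK are issued explicitly below.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute(
        "CREATE TABLE client_message_dedupe ("
        " mesh_id TEXT NOT NULL, client_message_id TEXT NOT NULL,"
        " broker_message_id TEXT NOT NULL,"
        " PRIMARY KEY (mesh_id, client_message_id))"
    )
    db.execute("CREATE TABLE topic (mesh_id TEXT NOT NULL, name TEXT NOT NULL)")
    return db


def accept_or_rollback(db, mesh_id, client_id, msg_id, topic):
    """Claim the dedupe key, run one B2 check, then commit or roll back."""
    db.execute("BEGIN")
    # Start of Phase B2: the dedupe claim is written inside the transaction.
    db.execute(
        "INSERT INTO client_message_dedupe VALUES (?, ?, ?)",
        (mesh_id, client_id, msg_id),
    )
    # B2 condition (illustrative): the destination topic must exist.
    exists = db.execute(
        "SELECT 1 FROM topic WHERE mesh_id = ? AND name = ?", (mesh_id, topic)
    ).fetchone()
    if exists is None:
        db.execute("ROLLBACK")  # unwinds the dedupe claim as well
        return 404  # assumed status; the spec only says 4xx
    db.execute("COMMIT")  # Phase B3
    return 201


def dedupe_row_count(db, mesh_id, client_id):
    return db.execute(
        "SELECT COUNT(*) FROM client_message_dedupe"
        " WHERE mesh_id = ? AND client_message_id = ?",
        (mesh_id, client_id),
    ).fetchone()[0]
```

Calling `accept_or_rollback` for a topic that does not exist leaves `dedupe_row_count` at zero, so the same id can be submitted again once the topic is created.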
|
||||||
|
|
||||||
|
If broker decides post-commit that an accepted message is invalid
|
||||||
|
(e.g. an async content-policy job runs on accepted messages), that's
|
||||||
|
NOT a permanent rejection — that's a follow-up moderation event that
|
||||||
|
operates on the broker_message_id, not on the dedupe key.
|
||||||
|
|
||||||
|
#### 4.6.3 Operator recovery via `requeue` (corrected v8)
|
||||||
|
|
||||||
|
To unstick a `dead` or `pending`-but-stuck row, operator runs:
|
||||||
|
|
||||||
|
```
claudemesh daemon outbox requeue --id <outbox_row_id>
    [--new-client-id <id> | --auto]
    [--patch-payload <path>]
```
|
||||||
|
|
||||||
|
This atomically (single SQLite transaction):
|
||||||
|
|
||||||
|
1. Marks the existing row as `aborted`, sets `aborted_at = now`,
|
||||||
|
`aborted_by = "operator"`. Row is **never deleted** — audit trail
|
||||||
|
permanent.
|
||||||
|
2. Mints a fresh `client_message_id` (caller-supplied via `--new-client-id`
|
||||||
|
or auto-ulid'd via `--auto`).
|
||||||
|
3. Inserts a new outbox row in `pending` with the fresh id and the same
|
||||||
|
payload (or patched payload if `--patch-payload` was given).
|
||||||
|
4. Sets `superseded_by = <new_row_id>` on the old row so
|
||||||
|
`outbox inspect <old_id>` displays the chain.
|
||||||
|
|
||||||
|
**The old `client_message_id` is permanently dead** — `outbox.db` still
|
||||||
|
holds it via the `aborted` row's `UNIQUE` constraint, and any caller
|
||||||
|
re-using it gets `409 outbox_aborted_*` per §4.5.1.
|
||||||
|
|
||||||
|
If broker had ever accepted the old id (it reached B3), the broker's
|
||||||
|
dedupe row is also permanent — duplicate sends to broker with the old
|
||||||
|
id would also `409` for fingerprint mismatch (or return the original
|
||||||
|
`broker_message_id` for matching fingerprint). Daemon-side
|
||||||
|
`aborted` and broker-side dedupe row are independent records of "this
|
||||||
|
id was used"; neither releases the id.
|
||||||
|
|
||||||
|
This is the resolution to v7's contradiction: there is **no path** for
|
||||||
|
an id to "become free again." If the operator wants to retry the
|
||||||
|
payload, they get a new id. The old id stays buried.
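
The four requeue steps can be sketched as one SQLite transaction. This is an illustration under stated assumptions, not the daemon's code: the table is a trimmed copy of §4.5.2, `requeue` is a hypothetical helper, row ids are assumed to be opaque strings, and `uuid4` stands in for the ULID that `--auto` would mint. The connection is assumed to be opened with `isolation_level=None` so the transaction is controlled explicitly.

```python
import sqlite3
import time
import uuid

# Trimmed copy of the §4.5.2 outbox table; only the columns requeue touches.
SCHEMA = """
CREATE TABLE outbox (
  id                TEXT PRIMARY KEY,
  client_message_id TEXT NOT NULL UNIQUE,
  payload           BLOB NOT NULL,
  status            TEXT CHECK(status IN
                      ('pending','inflight','done','dead','aborted')),
  aborted_at        INTEGER,
  aborted_by        TEXT,
  superseded_by     TEXT
);
"""


def requeue(db, old_row_id, new_client_id=None, patched_payload=None):
    """Retire old_row_id and enqueue its successor in one transaction."""
    new_row_id = uuid.uuid4().hex
    # Stand-in for --auto; the spec mints a ULID here.
    new_client_id = new_client_id or uuid.uuid4().hex
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT payload, status FROM outbox WHERE id = ?", (old_row_id,)
        ).fetchone()
        if row is None or row[1] not in ("dead", "pending"):
            raise ValueError("only dead or pending rows can be requeued")
        payload = patched_payload if patched_payload is not None else row[0]
        # Steps 1 and 4: retire the old row and point it at its successor.
        db.execute(
            "UPDATE outbox SET status = 'aborted', aborted_at = ?,"
            " aborted_by = 'operator', superseded_by = ? WHERE id = ?",
            (int(time.time()), new_row_id, old_row_id),
        )
        # Steps 2 and 3: the successor carries the fresh id. The UNIQUE
        # constraint rejects any attempt to reuse an id already in the table.
        db.execute(
            "INSERT INTO outbox (id, client_message_id, payload, status)"
            " VALUES (?, ?, ?, 'pending')",
            (new_row_id, new_client_id, payload),
        )
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return new_row_id
```

Passing a retired id as `new_client_id` raises `sqlite3.IntegrityError` and rolls the whole operation back, which is the `UNIQUE`-constraint behaviour the section relies on.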
|
||||||
|
|
||||||
|
### 4.7 Broker atomicity contract — side-effect classification (v8 — codex r7)
|
||||||
|
|
||||||
|
#### 4.7.1 Side effects (v8 — explicit classification)
|
||||||
|
|
||||||
|
Every successful broker accept atomically commits these durable
|
||||||
|
state changes in **one transaction**:
|
||||||
|
|
||||||
|
| Effect | Table | In-tx? | Why |
|---|---|---|---|
| Dedupe record | `mesh.client_message_dedupe` | **Yes** | Idempotency authority |
| Message body | `mesh.topic_message` / `mesh.message_queue` | **Yes** | Authoritative store |
| History row | `mesh.message_history` | **Yes** | Replay log; lost-on-rollback would break ordered replay |
| Fan-out work | `mesh.delivery_queue` | **Yes** | Each recipient must see exactly the messages that committed |
| Mention index entries | `mesh.mention_index` | **Yes** | Mention-query reads must match committed messages |
|
||||||
|
|
||||||
|
**Outside the transaction** — non-authoritative or rebuildable, with
|
||||||
|
explicit rationale per item:
|
||||||
|
|
||||||
|
| Effect | Where | Why outside |
|---|---|---|
| WS push to live subscribers | Async after COMMIT | Live notifications are best-effort; receivers re-fetch from history on reconnect |
| Webhook fan-out | Async via `delivery_queue` workers | Out-of-band; consumes committed `delivery_queue` rows |
| Rate-limit counters | Async, eventually consistent | Counters are an estimate; over-counting on retry > under-counting |
| Audit log entries | Async append-only stream | Audit log can be rebuilt from message history; in-tx writes hurt p99 |
| Search/FTS index updates | Async via outbox-pattern worker | Index can be rebuilt from authoritative tables |
| Metrics | Prometheus, pull-based | Always non-authoritative |
|
||||||
|
|
||||||
|
If any in-transaction insert fails, the transaction rolls back
|
||||||
|
completely. The broker returns `5xx` to the daemon; the daemon retries. No partial
|
||||||
|
state.
|
||||||
|
|
||||||
|
The async side effects are driven off the in-transaction
|
||||||
|
`delivery_queue` and `message_history` rows, so they cannot get ahead
|
||||||
|
of committed state — only lag behind.
|
||||||
|
|
||||||
|
#### 4.7.2 Pseudocode — corrected and final (v8)
|
||||||
|
|
||||||
|
```sql
BEGIN;

-- Phase B1 already passed (see §4.6.2).

-- Phase B2 + B3: try to claim the idempotency key.
INSERT INTO mesh.client_message_dedupe
  (mesh_id, client_message_id, broker_message_id, request_fingerprint,
   destination_kind, destination_ref, expires_at)
VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
        $dest_kind, $dest_ref, $expires_at)
ON CONFLICT (mesh_id, client_message_id) DO NOTHING;

-- Inspect the row that's actually there now (ours or someone else's).
SELECT broker_message_id, request_fingerprint, destination_kind,
       destination_ref, history_available, first_seen_at
FROM mesh.client_message_dedupe
WHERE mesh_id = $mesh_id AND client_message_id = $client_id
FOR SHARE;

-- Branch:
--   row.broker_message_id == $msg_id  → first insert; continue to step 3.
--   row.broker_message_id != $msg_id  → duplicate. Compare fingerprints:
--     fingerprint match    → ROLLBACK; return 200 duplicate.
--     fingerprint mismatch → ROLLBACK; return 409 idempotency_key_reused.

-- Step 3: validate Phase B2 (subscribers exist, rate limit not exceeded, etc.)
--   If B2 fails → ROLLBACK; return 4xx (no dedupe row remains).

-- Step 4: insert all in-tx side effects (§4.7.1).
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
VALUES ($msg_id, $mesh_id, $client_id, ...);

INSERT INTO mesh.message_history (broker_message_id, mesh_id, ...)
VALUES ($msg_id, $mesh_id, ...);

INSERT INTO mesh.delivery_queue (broker_message_id, recipient_pubkey, ...)
SELECT $msg_id, member_pubkey, ...
FROM mesh.topic_subscription
WHERE topic = $dest_ref AND mesh_id = $mesh_id;

INSERT INTO mesh.mention_index (broker_message_id, mentioned_pubkey, ...)
SELECT $msg_id, mention_pubkey, ...
FROM unnest($mention_list);

COMMIT;

-- After COMMIT, async workers consume delivery_queue and update
-- search indexes, audit logs, rate-limit counters, etc.
```
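
The branch described in the comments of the pseudocode can be sketched as a pure function over the row returned by the `SELECT`. `DedupeRow` and `classify_claim` are hypothetical names, and the returned labels are illustrative rather than broker API values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DedupeRow:
    broker_message_id: str
    request_fingerprint: bytes


def classify_claim(row: DedupeRow, our_msg_id: str, our_fingerprint: bytes) -> str:
    """Decide what the accept handler does after the ON CONFLICT insert."""
    if row.broker_message_id == our_msg_id:
        # Our INSERT won the claim: continue with B2 checks and side effects.
        return "first_insert"
    if row.request_fingerprint == our_fingerprint:
        # Someone already committed this exact request: ROLLBACK, 200 duplicate.
        return "duplicate_200"
    # Same id, different request: ROLLBACK, 409 idempotency_key_reused.
    return "conflict_409"
```

Keeping the decision separate from the SQL makes it straightforward to unit-test all three outcomes without a database.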
|
||||||
|
|
||||||
|
#### 4.7.3 Orphan check — same as v7 §4.7.3
|
||||||
|
|
||||||
|
Extended to cover the side-effect inventory: verifies that the in-tx items are consistent with one another.
|
||||||
|
|
||||||
|
### 4.8 Outbox max-age math — unchanged from v7 §4.8
|
||||||
|
|
||||||
|
Minimum `dedupe_retention_days` is 7. The derived `max_age_hours = window - safety_margin`
is strictly less than the dedupe window; `safety_margin` has a floor of 24h.
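
As a worked example of the derivation, here is a small sketch. It assumes `window` means the dedupe retention period expressed in hours and that the default safety margin equals the 24h floor; the function name and the error handling are illustrative only.

```python
def max_age_hours(dedupe_retention_days: int, safety_margin_hours: int = 24) -> int:
    """Derive the outbox max age from the broker's dedupe retention window."""
    if dedupe_retention_days < 7:
        raise ValueError("dedupe_retention_days must be at least 7")
    if safety_margin_hours < 24:
        raise ValueError("safety_margin has a floor of 24h")
    window_hours = dedupe_retention_days * 24
    if safety_margin_hours >= window_hours:
        raise ValueError("safety_margin must be smaller than the window")
    # Strictly less than the window because the margin is at least 24h.
    return window_hours - safety_margin_hours
```

With the minimum settings the window is 7 × 24 = 168h, so the derived max age is 168 − 24 = 144h.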
|
||||||
|
|
||||||
|
### 4.9 Inbox schema — unchanged from v3 §4.5
|
||||||
|
|
||||||
|
### 4.10 Crash recovery — unchanged from v3 §4.6
|
||||||
|
|
||||||
|
### 4.11 Failure modes — `aborted` semantics added (v8)
|
||||||
|
|
||||||
|
- **IPC accept fingerprint-mismatch on duplicate id** (any state):
|
||||||
|
returns 409 with `conflict` field per §4.5.1. Caller must use a new id.
|
||||||
|
- **IPC accept against `aborted` row, fingerprint match**: returns 409
|
||||||
|
per §4.5.1 (NEW v8). Caller must use a new id; the old id is
|
||||||
|
permanently retired.
|
||||||
|
- **Outbox row stuck in `dead`**: operator runs `outbox requeue` per
|
||||||
|
§4.6.3; old id stays in `aborted`, new id is fresh.
|
||||||
|
- **Broker fingerprint mismatch on retry**: as v6/v7. Daemon marks
|
||||||
|
`dead`; operator requeue path.
|
||||||
|
- **Daemon retry after dedupe row hard-deleted by broker retention
|
||||||
|
sweep**: cannot happen unless operator overrode `max_age_hours`.
|
||||||
|
- **Broker phase B2 rejection on retry**: same id, same fingerprint,
|
||||||
|
but B2 condition has changed (e.g. mesh rate-limit now exceeded).
|
||||||
|
Daemon receives 4xx → marks `dead`. Operator can `requeue` once
|
||||||
|
conditions clear.
|
||||||
|
- **Atomicity violation found by orphan check**: alerts ops.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5-13. — unchanged from v4
|
||||||
|
|
||||||
|
## 14. Lifecycle — unchanged from v5 §14
|
||||||
|
|
||||||
|
## 15. Version compat — unchanged from v7 §15
|
||||||
|
|
||||||
|
## 16. Threat model — unchanged
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 17. Migration — v8 outbox columns + broker phase B2 (v8)
|
||||||
|
|
||||||
|
Broker side, deploy order: same as v7 §17, with one addition:
|
||||||
|
- Step 4.5: explicitly split broker accept into Phase B1 (pre-dedupe
|
||||||
|
validation, returns 4xx without writing) and Phase B2/B3 (within the
|
||||||
|
accept transaction). Implementation: refactor handler to validate
|
||||||
|
Phase B1 conditions before opening the DB transaction.
|
||||||
|
|
||||||
|
Daemon side:
|
||||||
|
- Outbox schema gains `aborted_at`, `aborted_by`, `superseded_by`
|
||||||
|
columns and the `aborted` enum value (§4.5.2). Migration applies via
|
||||||
|
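table recreation with an explicit column list (`INSERT INTO new (...) SELECT ... FROM old`; a bare `SELECT *` fails once the column counts differ) if needed; v0.9.0 is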
|
||||||
|
greenfield.
|
||||||
|
- IPC accept switches to `BEGIN IMMEDIATE` for SQLite serialization
|
||||||
|
(§4.5.1 step 3).
|
||||||
|
- IPC accept handles `aborted` rows per §4.5.1 (always 409).
|
||||||
|
- `claudemesh daemon outbox requeue` always mints a fresh
|
||||||
|
`client_message_id`; never frees the old id. `--new-client-id <id>`
|
||||||
|
and `--auto` are the only modes; the old `client_message_id`
|
||||||
|
argument is removed.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What changed v7 → v8 (codex round-7 actionable items)
|
||||||
|
|
||||||
|
| Codex r7 item | v8 fix | Section |
|---|---|---|
| `aborted` not in §4.5.1; `UNIQUE` contradiction | Added two `aborted` rows (match/mismatch) to lookup table; old id never reusable; new audit columns `aborted_at`/`aborted_by`/`superseded_by` | §4.5.1, §4.5.2, §4.6.3 |
| Broker permanent-rejection ordering vague | Three-phase model B1 (pre-dedupe), B2 (post-claim, in-tx), B3 (accepted); permanent 4xx never leaves dedupe row | §4.6.2 |
| SQLite `SELECT FOR UPDATE` invalid | Replaced with `BEGIN IMMEDIATE` for daemon-local serialization | §4.5.1 |
| Side-effect inventory ambiguous on rate-limit/audit/search | Explicit in-tx vs outside-tx table with rationale per item | §4.7.1 |
| Operator id reuse semantics | Old id permanently retired in `aborted`; requeue always mints fresh id; no daemon-side path to release used ids | §4.6.3 |
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review (round 8)
|
||||||
|
|
||||||
|
1. **`aborted` permanence (§4.5.1, §4.6.3)** — is "old id permanently
|
||||||
|
dead" correct, or is there a real operational case where releasing
|
||||||
|
an id (e.g. caller mistyped a uuid) is worth the audit-trail loss?
|
||||||
|
2. **Phase B1/B2/B3 split (§4.6.2)** — clean enough? Is rate-limiting
|
||||||
|
in B2 (in-tx) the right call, or should it be B1 (cheaper to enforce
|
||||||
|
pre-tx)?
|
||||||
|
3. **In-tx mention_index (§4.7.1)** — agree it should be in-tx, or
|
||||||
|
should mention indexing be async like search?
|
||||||
|
4. **`BEGIN IMMEDIATE` (§4.5.1)** — correct SQLite primitive, or should
|
||||||
|
it be `BEGIN EXCLUSIVE` to also block readers? (Probably not — readers
|
||||||
|
should see committed-pending rows, but worth confirming.)
|
||||||
|
5. **Anything else still wrong?** Read it as if you were going to
|
||||||
|
operate this for a year.
|
||||||
|
|
||||||
|
Three options:
|
||||||
|
- **(a) v8 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v9 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless.
|
||||||
473
.artifacts/shipped/2026-05-03-daemon-final-spec-v9.md
Normal file
@@ -0,0 +1,473 @@
|
|||||||
|
# `claudemesh daemon` — Final Spec v9
|
||||||
|
|
||||||
|
> **Round 9.** v8 was reviewed by codex (round 8) which closed
|
||||||
|
> aborted/UNIQUE (5/5) and SQLite locking (5/5) cleanly, but flagged
|
||||||
|
> three spec-level correctness problems:
|
||||||
|
>
|
||||||
|
> 1. **Cross-layer ID-consumed authority contradiction** — v8 §4.1
|
||||||
|
> said "id consumed iff dedupe row exists" while §4.6.1 said a
|
||||||
|
> daemon-rejected id stays consumed locally with no broker dedupe
|
||||||
|
> row. Two incompatible authorities.
|
||||||
|
> 2. **Rate-limit authority muddled** — v8 listed rate limit in B2
|
||||||
|
> (in-tx authoritative) but classified rate-limit counters as
|
||||||
|
> async/non-authoritative in §4.7.1.
|
||||||
|
> 3. **§4.1 broker guarantee wording** — "post-validation accept
|
||||||
|
> phase" was fuzzy because B2 rolls back. Tighten to "accept
|
||||||
|
> committed."
|
||||||
|
>
|
||||||
|
> v9 fixes all three with **two-layer ID rules** (daemon vs broker),
|
||||||
|
> rate-limit moved to B1 via an external atomic limiter, and §4.1
|
||||||
|
> tightened. **Intent §0 unchanged from v2.** v9 only revises §4.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 0. Intent — unchanged, see v2 §0
|
||||||
|
|
||||||
|
## 1. Process model — unchanged
|
||||||
|
|
||||||
|
## 2. Identity — unchanged from v5 §2
|
||||||
|
|
||||||
|
## 3. IPC surface — unchanged from v4 §3
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Delivery contract — `aborted` clarified, broker phasing, SQLite locking
|
||||||
|
|
||||||
|
### 4.1 The contract (precise — v9, two-layer ID model)
|
||||||
|
|
||||||
|
> **Two-layer ID rules** (NEW v9 — codex r8):
|
||||||
|
>
|
||||||
|
> - **Daemon-layer**: a `client_message_id` is **daemon-consumed** iff an
|
||||||
|
> outbox row exists for it. Daemon-mediated callers can never reuse a
|
||||||
|
> daemon-consumed id, regardless of whether the broker ever saw it.
|
||||||
|
> The daemon's outbox is the single authority for "this id was issued
|
||||||
|
> by my caller against this daemon."
|
||||||
|
> - **Broker-layer**: a `client_message_id` is **broker-consumed** iff a
|
||||||
|
> dedupe row exists for `(mesh_id, client_message_id)` in
|
||||||
|
> `mesh.client_message_dedupe`. Direct broker callers (none in
|
||||||
|
> v0.9.0; reserved for future SDK paths that bypass the daemon) can
|
||||||
|
> reuse a broker-non-consumed id freely.
|
||||||
|
> - In v0.9.0 there are no daemon-bypass clients, so for practical
|
||||||
|
> purposes "daemon-consumed" is the operative rule.
|
||||||
|
>
|
||||||
|
> **Local guarantee**: each successful `POST /v1/send` returns a stable
|
||||||
|
> `client_message_id`. The send is durably persisted to `outbox.db`
|
||||||
|
> before the response returns. The daemon enforces request-fingerprint
|
||||||
|
> idempotency at the IPC layer (§4.5.1).
|
||||||
|
>
|
||||||
|
> **Local audit guarantee**: a `client_message_id` once written to
|
||||||
|
> `outbox.db` is **never released** (daemon-layer rule). Operator
|
||||||
|
> recovery via `requeue` always mints a fresh id; the old row stays in
|
||||||
|
> `aborted` for audit. There is no daemon-side path to free a used id.
|
||||||
|
>
|
||||||
|
> **Broker guarantee** (v9 — tightened): a dedupe row exists iff the
|
||||||
|
> broker accept transaction **committed** (Phase B3 reached). Phase B1
|
||||||
|
> rejections never insert dedupe rows. Phase B2 rejections roll the
|
||||||
|
> transaction back, so any partial dedupe row is unwound. Direct
|
||||||
|
> broker callers retrying after B1/B2 rejection see no dedupe row and
|
||||||
|
> may reuse the id.
|
||||||
|
>
|
||||||
|
> **Atomicity guarantee**: same as v8 §4.1.
|
||||||
|
>
|
||||||
|
> **End-to-end guarantee**: at-least-once.
|
||||||
|
|
||||||
|
### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2
|
||||||
|
|
||||||
|
### 4.3 Broker schema — unchanged from v6 §4.3
|
||||||
|
|
||||||
|
### 4.4 Request fingerprint canonical form — unchanged from v6 §4.4
|
||||||
|
|
||||||
|
### 4.5 Daemon-local idempotency at the IPC layer (v8 — `aborted` added, SQLite locking)
|
||||||
|
|
||||||
|
#### 4.5.1 IPC accept algorithm (v8)
|
||||||
|
|
||||||
|
On `POST /v1/send`:
|
||||||
|
|
||||||
|
1. Validate request envelope (auth, schema, size limits, destination
|
||||||
|
resolvable). Failures here return `4xx` immediately. **No outbox row
|
||||||
|
is written; the `client_message_id` is not consumed.**
|
||||||
|
2. Compute `request_fingerprint` (§4.4).
|
||||||
|
3. Open a SQLite transaction with `BEGIN IMMEDIATE` (v8 — codex r7) so
|
||||||
|
a concurrent IPC accept on the same id serializes against this one.
|
||||||
|
`BEGIN IMMEDIATE` acquires the RESERVED lock at transaction start,
|
||||||
|
preventing any other connection from starting a write transaction on the same
|
||||||
|
database; SQLite has no row-level lock and `SELECT FOR UPDATE` is not
|
||||||
|
supported.
|
||||||
|
4. `SELECT id, request_fingerprint, status, broker_message_id,
|
||||||
|
last_error FROM outbox WHERE client_message_id = ?`.
|
||||||
|
5. Apply the lookup table below. For the "(no row)" case, INSERT the
|
||||||
|
new row inside the same transaction.
|
||||||
|
6. COMMIT.
|
||||||
|
|
||||||
|
| Existing row state | Fingerprint match? | Daemon response |
|---|---|---|
| (no row) | — | INSERT new outbox row in `pending`; return `202 accepted, queued` |
| `pending` | match | Return `202 accepted, queued`. No mutation |
| `pending` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_pending_fingerprint_mismatch"`. No mutation |
| `inflight` | match | Return `202 accepted, inflight`. No mutation |
| `inflight` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_inflight_fingerprint_mismatch"` |
| `done` | match | Return `200 ok, duplicate: true, broker_message_id, history_id`. No broker call |
| `done` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_done_fingerprint_mismatch", broker_message_id` |
| `dead` | match | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"`. Same id never auto-retried |
| `dead` | mismatch | Return `409 idempotency_key_reused`, `conflict: "outbox_dead_fingerprint_mismatch"` |
| **`aborted`** (NEW v8) | **match** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_match"`. The id was retired by operator action; never reusable |
| **`aborted`** (NEW v8) | **mismatch** | Return `409 idempotency_key_reused`, `conflict: "outbox_aborted_fingerprint_mismatch"` |
|
||||||
|
|
||||||
|
**Rule (v8 — codex r7)**: every IPC `409` carries the daemon's
|
||||||
|
`request_fingerprint` (8-byte hex prefix) so callers can debug
|
||||||
|
client/server canonical-form drift. **Every state in the table returns
|
||||||
|
something deterministic, including `aborted`.** A `client_message_id`
|
||||||
|
written to `outbox.db` is permanently bound to that row's lifecycle —
|
||||||
|
the only "free" state is "no row exists".
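
Steps 3 to 6 and the lookup table can be sketched with Python's `sqlite3`. This is a sketch under assumptions, not the daemon's implementation: `ipc_accept` and `respond_existing` are hypothetical names, the response bodies are simplified dicts, `history_id` is omitted, the table is a trimmed copy of §4.5.2, and "8-byte hex prefix" is read as the first 8 bytes rendered as hex. The connection is assumed to be opened with `isolation_level=None`.

```python
import sqlite3
import time
import uuid

# Trimmed copy of the §4.5.2 outbox table (audit columns omitted for brevity).
OUTBOX_SCHEMA = """
CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT CHECK(status IN
                        ('pending','inflight','done','dead','aborted')),
  last_error          TEXT,
  broker_message_id   TEXT
);
"""


def respond_existing(stored_fp, status, broker_message_id, last_error, fingerprint):
    """Apply the lookup table to a row that already exists. Never mutates."""
    match = stored_fp == fingerprint
    if match and status == "pending":
        return 202, {"status": "queued"}
    if match and status == "inflight":
        return 202, {"status": "inflight"}
    if match and status == "done":
        return 200, {"duplicate": True, "broker_message_id": broker_message_id}
    # Every remaining combination is a 409, including dead and aborted matches.
    suffix = "match" if match else "mismatch"
    body = {
        "error": "idempotency_key_reused",
        "conflict": f"outbox_{status}_fingerprint_{suffix}",
        "request_fingerprint": stored_fp[:8].hex(),
    }
    if status == "dead" and match:
        body["reason"] = last_error
    if status == "done":
        body["broker_message_id"] = broker_message_id
    return 409, body


def ipc_accept(db, client_message_id, fingerprint, payload):
    """Steps 3-6 of the accept algorithm; returns (http_status, body)."""
    db.execute("BEGIN IMMEDIATE")  # serializes concurrent accepts
    try:
        row = db.execute(
            "SELECT request_fingerprint, status, broker_message_id, last_error"
            " FROM outbox WHERE client_message_id = ?",
            (client_message_id,),
        ).fetchone()
        if row is None:
            now = int(time.time())
            db.execute(
                "INSERT INTO outbox (id, client_message_id, request_fingerprint,"
                " payload, enqueued_at, next_attempt_at, status)"
                " VALUES (?, ?, ?, ?, ?, ?, 'pending')",
                (uuid.uuid4().hex, client_message_id, fingerprint, payload, now, now),
            )
            result = (202, {"status": "queued"})
        else:
            result = respond_existing(*row, fingerprint)
        db.execute("COMMIT")
        return result
    except Exception:
        db.execute("ROLLBACK")
        raise
```

Because the existing-row branch never writes, the only mutation in the whole function is the insert for the "(no row)" case, which matches the "No mutation" notes in the table.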
|
||||||
|
|
||||||
|
#### 4.5.2 Outbox table — fingerprint required
|
||||||
|
|
||||||
|
```sql
CREATE TABLE outbox (
  id                  TEXT PRIMARY KEY,
  client_message_id   TEXT NOT NULL UNIQUE,
  request_fingerprint BLOB NOT NULL,  -- 32 bytes
  payload             BLOB NOT NULL,
  enqueued_at         INTEGER NOT NULL,
  attempts            INTEGER DEFAULT 0,
  next_attempt_at     INTEGER NOT NULL,
  status              TEXT CHECK(status IN
                        ('pending','inflight','done','dead','aborted')),
  last_error          TEXT,
  delivered_at        INTEGER,
  broker_message_id   TEXT,
  aborted_at          INTEGER,        -- NEW v8
  aborted_by          TEXT,           -- NEW v8: operator/auto
  superseded_by       TEXT            -- NEW v8: id of the requeue successor row, if any
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
CREATE INDEX outbox_aborted ON outbox(status, aborted_at) WHERE status = 'aborted';
```
|
||||||
|
|
||||||
|
`aborted_at`, `aborted_by`, `superseded_by` give operators a clear
|
||||||
|
audit trail. `superseded_by` lets `outbox inspect` show the chain when
|
||||||
|
a row was requeued multiple times.
|
||||||
|
|
||||||
|
`request_fingerprint` is computed once at IPC accept time and frozen
|
||||||
|
forever for the row's lifecycle. Daemon never recomputes from
|
||||||
|
`payload`.
|
||||||
|
|
||||||
|
### 4.6 Rejected-request semantics — two-layer rules + rate-limit moved to B1 (v9 — codex r8)
|
||||||
|
|
||||||
|
> **Two-layer rule (v9)**: a `client_message_id` is **daemon-consumed**
|
||||||
|
> iff an outbox row exists for it; **broker-consumed** iff a dedupe row
|
||||||
|
> exists. Daemon-mediated callers see daemon-layer authority (the only
|
||||||
|
> path in v0.9.0). Pre-validation failures at any layer consume nothing
|
||||||
|
> at that layer. The two layers are independent: a daemon-consumed id
|
||||||
|
> may or may not be broker-consumed (depending on whether the send
|
||||||
|
> reached B3); a daemon-non-consumed id can never be broker-consumed
|
||||||
|
> (no outbox row ⇒ no broker call from the daemon).
|
||||||
|
|
||||||
|
#### 4.6.1 Daemon-side rejection phasing (v9)
|
||||||
|
|
||||||
|
| Phase | When daemon rejects | Outbox row? | Daemon-consumed? | Same daemon caller may reuse id? |
|---|---|---|---|---|
| **A. IPC validation** (auth, schema, size, destination resolvable) | Before §4.5.1 step 3 | No | No | Yes — id never written locally |
| **B. Outbox stored, broker network/transient failure** | After IPC accept, broker `5xx` or timeout | `pending` → retried | Yes | N/A — daemon owns retries |
| **C. Outbox stored, broker permanent rejection** | Broker returns `4xx` after IPC accept | `dead` | Yes | No — rotate via `requeue` |
| **D. Operator retirement** | Operator runs `requeue` on `dead` or `pending` row | `aborted` (audit) + new row with fresh id | Yes (still consumed) | Old id NEVER reusable; new id is fresh |
|
||||||
|
|
||||||
|
The "daemon-consumed?" column is the daemon-layer authority. It does
|
||||||
|
not depend on whether the broker ever saw the request — phase C above
|
||||||
|
shows the broker has not committed a dedupe row, but the daemon still
|
||||||
|
holds the id in `dead` state.
|
||||||
|
|
||||||
|
#### 4.6.2 Broker-side rejection phasing (v9 — rate limit moved to B1)
|
||||||
|
|
||||||
|
The broker validates in two phases relative to dedupe-row insertion (B1 before the claim, B2 after it); B3 is the accepted outcome:
|
||||||
|
|
||||||
|
| Phase | Validation | Side effects | Result for direct broker callers |
|---|---|---|---|
| **B1. Pre-dedupe-claim** (atomic, external) | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ `max_payload.inline_bytes`, **rate limit not exceeded** (atomic external limiter — see §4.6.4) | None | `4xx` returned. No dedupe row, no broker-consumed id. Caller may retry with same id once condition clears |
| **B2. Post-dedupe-claim** (in-tx) | Conditions that require the accept transaction to be in progress: destination_ref existence (topic exists, member subscribed, etc.) | INSERT into dedupe rolled back | `4xx` returned, transaction rolled back, no dedupe row remains. Caller may retry with same id |
| **B3. Accepted** | All side effects commit atomically | Dedupe row, message row, history row, delivery_queue rows, mention_index rows | `201` returned with `broker_message_id`. Id is broker-consumed |
|
||||||
|
|
||||||
|
**Daemon-mediated callers**: in v0.9.0 the daemon is the only B-phase
|
||||||
|
caller. Daemon-mediated callers see only the daemon-layer rules
|
||||||
|
(§4.6.1). The broker's "may retry with same id" wording in the table
|
||||||
|
above applies to direct broker callers only (none in v0.9.0; reserved
|
||||||
|
for future SDK paths).
|
||||||
|
|
||||||
|
**Critical guarantee (v9 — tightened from v8)**: a dedupe row exists
|
||||||
|
**iff the broker accept transaction committed (B3)**. There is no
|
||||||
|
broker code path where a permanent 4xx leaves a dedupe row behind.
|
||||||
|
|
||||||
|
If the broker decides post-commit that an accepted message is invalid
|
||||||
|
(async content-policy job, async moderation, etc.), that's NOT a
|
||||||
|
permanent rejection — it's a follow-up event that operates on the
|
||||||
|
`broker_message_id`, not on the dedupe key.
|
||||||
|
|
||||||
|
#### 4.6.4 Rate limiter — atomic, external, B1 (NEW v9 — codex r8)
|
||||||
|
|
||||||
|
Codex r8 caught: v8 listed rate-limit enforcement in B2 (in-tx) but
|
||||||
|
classified rate-limit *counters* as async/non-authoritative. Both
|
||||||
|
can't be true. v9 resolves it by moving rate-limit enforcement to B1
|
||||||
|
backed by an atomic external limiter:
|
||||||
|
|
||||||
|
- **Authority**: the broker's existing Redis (or equivalent
|
||||||
|
fixed-window limiter) used for `claudemesh launch` rate-limiting is
|
||||||
|
the authority for accept-time rate-limit enforcement. `INCR` with
|
||||||
|
TTL is atomic; the broker checks the result before opening the
|
||||||
|
Phase B2/B3 transaction.
|
||||||
|
- **Idempotency interaction**: rate-limit `INCR` happens **before** the
|
||||||
|
dedupe-claim INSERT. If the limiter rejects, no DB transaction is
|
||||||
|
opened, no dedupe row exists. If the limiter accepts but the in-tx
|
||||||
|
Phase B2 then rejects (e.g. topic not found), the limiter `INCR` is
|
||||||
|
not refunded. This is intentional: refunding would require a
|
||||||
|
reliable distributed counter, and the over-counting risk is
|
||||||
|
acceptable. Counter
|
||||||
|
`cm_broker_rate_limit_consumed_then_rejected_total` exposes the
|
||||||
|
delta for ops awareness.
|
||||||
|
- **Retries**: a daemon retry with the same `client_message_id` after a
|
||||||
|
B1 rate-limit rejection produces another `INCR`. To avoid burning
|
||||||
|
rate-limit budget on retries-of-rejected-ids, the broker can
|
||||||
|
optionally short-circuit `INCR` if the rate-limit subsystem can
|
||||||
|
cheaply detect "this exact `client_message_id` was rejected for
|
||||||
|
rate-limit in the last N seconds" — but this is an optimization,
|
||||||
|
not a correctness requirement.
|
||||||
|
- **Async counters**: `mesh.rate_limit_counter` (or any DB-resident
|
||||||
|
view of "messages-per-mesh-per-window") is **non-authoritative** —
|
||||||
|
it's metrics/telemetry rebuilt from the authoritative limiter and
|
||||||
|
from message-history. Used for dashboards, not for accept decisions.
|
||||||
|
|
||||||
|
This split — atomic external limiter for enforcement, async DB
|
||||||
|
counters for telemetry — matches how every other rate-limited
|
||||||
|
subsystem in claudemesh works (`claudemesh launch`, dashboard chat
|
||||||
|
posts, etc.). No new infrastructure required.
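
The enforcement order described in this section can be sketched with an in-memory fixed-window counter standing in for the external limiter. Everything named here is an assumption for illustration: `FixedWindowLimiter` and `accept_with_rate_limit` are hypothetical, the lock emulates the atomicity an external `INCR` would provide, and `429` is one plausible choice for the `4xx` the spec leaves unspecified.

```python
import threading
import time


class FixedWindowLimiter:
    """In-memory stand-in for an atomic INCR-with-TTL fixed-window limiter."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._lock = threading.Lock()
        self._counts = {}  # key -> (window_index, count)

    def incr_and_check(self, key):
        """Count this request, then report whether it is within the limit."""
        window_index = int(self.clock() // self.window_seconds)
        with self._lock:  # increment and read happen as one step
            stored_index, count = self._counts.get(key, (window_index, 0))
            if stored_index != window_index:
                count = 0  # the previous window has expired
            count += 1
            self._counts[key] = (window_index, count)
            return count <= self.limit


def accept_with_rate_limit(limiter, mesh_id, run_accept_tx):
    """Phase B1 limiter check first; only then open the B2/B3 transaction."""
    if not limiter.incr_and_check(mesh_id):
        return 429  # B1 rejection: no transaction opened, no dedupe row
    # The increment is intentionally not refunded if B2 rejects below.
    return run_accept_tx()
```

A request that passes the limiter but fails B2 still consumes budget, which is the over-counting trade-off the section accepts.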
|
||||||
|
|
||||||
|
#### 4.6.3 Operator recovery via `requeue` (corrected v8)
|
||||||
|
|
||||||
|
To unstick a `dead` or `pending`-but-stuck row, operator runs:
|
||||||
|
|
||||||
|
```
claudemesh daemon outbox requeue --id <outbox_row_id>
    [--new-client-id <id> | --auto]
    [--patch-payload <path>]
```
|
||||||
|
|
||||||
|
This atomically (single SQLite transaction):
|
||||||
|
|
||||||
|
1. Marks the existing row as `aborted`, sets `aborted_at = now`,
|
||||||
|
`aborted_by = "operator"`. Row is **never deleted** — audit trail
|
||||||
|
permanent.
|
||||||
|
2. Mints a fresh `client_message_id` (caller-supplied via `--new-client-id`
|
||||||
|
or auto-ulid'd via `--auto`).
|
||||||
|
3. Inserts a new outbox row in `pending` with the fresh id and the same
|
||||||
|
payload (or patched payload if `--patch-payload` was given).
|
||||||
|
4. Sets `superseded_by = <new_row_id>` on the old row so
|
||||||
|
`outbox inspect <old_id>` displays the chain.
|
||||||
|
|
||||||
|
**The old `client_message_id` is permanently dead** — `outbox.db` still
|
||||||
|
holds it via the `aborted` row's `UNIQUE` constraint, and any caller
|
||||||
|
re-using it gets `409 outbox_aborted_*` per §4.5.1.
|
||||||
|
|
||||||
|
If broker had ever accepted the old id (it reached B3), the broker's
|
||||||
|
dedupe row is also permanent — duplicate sends to broker with the old
|
||||||
|
id would also `409` for fingerprint mismatch (or return the original
|
||||||
|
`broker_message_id` for matching fingerprint). Daemon-side
|
||||||
|
`aborted` and broker-side dedupe row are independent records of "this
|
||||||
|
id was used"; neither releases the id.
|
||||||
|
|
||||||
|
This is the resolution to v7's contradiction: there is **no path** for
|
||||||
|
an id to "become free again." If the operator wants to retry the
|
||||||
|
payload, they get a new id. The old id stays buried.
|
||||||
|
|
||||||
|
### 4.7 Broker atomicity contract — side-effect classification (v9)
|
||||||
|
|
||||||
|
#### 4.7.1 Side effects (v9 — rate limit moved to B1 external)
|
||||||
|
|
||||||
|
Every successful broker accept atomically commits these durable
|
||||||
|
state changes in **one transaction**:
|
||||||
|
|
||||||
|
| Effect | Table | In-tx? | Why |
|---|---|---|---|
| Dedupe record | `mesh.client_message_dedupe` | **Yes** | Idempotency authority |
| Message body | `mesh.topic_message` / `mesh.message_queue` | **Yes** | Authoritative store |
| History row | `mesh.message_history` | **Yes** | Replay log; lost-on-rollback would break ordered replay |
| Fan-out work | `mesh.delivery_queue` | **Yes** | Each recipient must see exactly the messages that committed |
| Mention index entries | `mesh.mention_index` | **Yes** | Mention-query reads must match committed messages |
|
||||||
|
|
||||||
|
**Outside the transaction** — non-authoritative or rebuildable, with
|
||||||
|
explicit rationale per item:
|
||||||
|
|
||||||
|
| Effect | Where | Why outside |
|---|---|---|
| WS push to live subscribers | Async after COMMIT | Live notifications are best-effort; receivers re-fetch from history on reconnect |
| Webhook fan-out | Async via `delivery_queue` workers | Out-of-band; consumes committed `delivery_queue` rows |
| Rate-limit **counters** (telemetry only) | Async, eventually consistent | Authoritative limiter is the external Redis-style INCR in B1 (§4.6.4); the DB counter is rebuilt for dashboards, not consulted for accept |
| Audit log entries | Async append-only stream | Audit log can be rebuilt from message history; in-tx writes hurt p99 |
| Search/FTS index updates | Async via outbox-pattern worker | Index can be rebuilt from authoritative tables |
| Metrics | Prometheus, pull-based | Always non-authoritative |
|
||||||
|
|
||||||
|
If any in-transaction insert fails, the transaction rolls back
completely. The broker returns `5xx` to the daemon, and the daemon
retries. No partial state is left behind.
|
||||||
|
|
||||||
|
The async side effects are driven off the in-transaction
|
||||||
|
`delivery_queue` and `message_history` rows, so they cannot get ahead
|
||||||
|
of committed state — only lag behind.
|
||||||
|
|
||||||
|
#### 4.7.2 Pseudocode — corrected and final (v8)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Phase B1 already passed (see §4.6.2). This includes:
|
||||||
|
-- - schema/auth/size validation
|
||||||
|
-- - external atomic rate-limit INCR (§4.6.4)
|
||||||
|
-- Anything that fails B1 returns 4xx without ever opening this tx.
|
||||||
|
|
||||||
|
BEGIN;
|
||||||
|
|
||||||
|
-- Phase B2 + B3: try to claim the idempotency key.
|
||||||
|
INSERT INTO mesh.client_message_dedupe
|
||||||
|
(mesh_id, client_message_id, broker_message_id, request_fingerprint,
|
||||||
|
destination_kind, destination_ref, expires_at)
|
||||||
|
VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
|
||||||
|
$dest_kind, $dest_ref, $expires_at)
|
||||||
|
ON CONFLICT (mesh_id, client_message_id) DO NOTHING;
|
||||||
|
|
||||||
|
-- Inspect the row that's actually there now (ours or someone else's).
|
||||||
|
SELECT broker_message_id, request_fingerprint, destination_kind,
|
||||||
|
destination_ref, history_available, first_seen_at
|
||||||
|
FROM mesh.client_message_dedupe
|
||||||
|
WHERE mesh_id = $mesh_id AND client_message_id = $client_id
|
||||||
|
FOR SHARE;
|
||||||
|
|
||||||
|
-- Branch:
|
||||||
|
-- row.broker_message_id == $msg_id → first insert; continue to step 3.
|
||||||
|
-- row.broker_message_id != $msg_id → duplicate. Compare fingerprints:
|
||||||
|
-- fingerprint match → ROLLBACK; return 200 duplicate.
|
||||||
|
-- fingerprint mismatch → ROLLBACK; return 409 idempotency_key_reused.
|
||||||
|
|
||||||
|
-- Step 3: validate Phase B2 (destination_ref existence: topic exists,
|
||||||
|
-- member subscribed, etc.). Rate limit is NOT here — it was checked
|
||||||
|
-- atomically in B1 via the external limiter (§4.6.4) before this
|
||||||
|
-- transaction opened.
|
||||||
|
-- If B2 fails → ROLLBACK; return 4xx (no dedupe row remains).
|
||||||
|
|
||||||
|
-- Step 4: insert all in-tx side effects (§4.7.1).
|
||||||
|
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
|
||||||
|
VALUES ($msg_id, $mesh_id, $client_id, ...);
|
||||||
|
|
||||||
|
INSERT INTO mesh.message_history (broker_message_id, mesh_id, ...)
|
||||||
|
VALUES ($msg_id, $mesh_id, ...);
|
||||||
|
|
||||||
|
INSERT INTO mesh.delivery_queue (broker_message_id, recipient_pubkey, ...)
|
||||||
|
SELECT $msg_id, member_pubkey, ...
|
||||||
|
FROM mesh.topic_subscription
|
||||||
|
WHERE topic = $dest_ref AND mesh_id = $mesh_id;
|
||||||
|
|
||||||
|
INSERT INTO mesh.mention_index (broker_message_id, mentioned_pubkey, ...)
|
||||||
|
SELECT $msg_id, mention_pubkey, ...
|
||||||
|
FROM unnest($mention_list);
|
||||||
|
|
||||||
|
COMMIT;
|
||||||
|
|
||||||
|
-- After COMMIT, async workers consume delivery_queue and update
|
||||||
|
-- search indexes, audit logs, rate-limit counters, etc.
|
||||||
|
```
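
The branch comment in the pseudocode above (first insert vs. duplicate with matching or mismatching fingerprint) can be expressed as a small pure function. This is only a sketch: `DedupeRow` and `classify_accept` are hypothetical names, and the returned tuples are an illustrative encoding of the three outcomes, not the broker's actual handler types.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DedupeRow:
    """The columns of the dedupe row that the branch decision needs."""
    broker_message_id: str
    request_fingerprint: bytes


def classify_accept(
    row: DedupeRow, our_msg_id: str, our_fingerprint: bytes
) -> Tuple[str, Optional[int]]:
    """Decide what to do after INSERT ... ON CONFLICT DO NOTHING + SELECT.

    Returns (action, http_status):
      ("continue", None) -> the row is ours; go on to step 3
      ("rollback", 200)  -> duplicate with the same fingerprint
      ("rollback", 409)  -> same id reused with a different request
    """
    if row.broker_message_id == our_msg_id:
        return ("continue", None)
    if row.request_fingerprint == our_fingerprint:
        return ("rollback", 200)
    return ("rollback", 409)
```

Keeping the decision free of I/O makes the three branches easy to unit-test separately from the transaction code.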
|
||||||
|
|
||||||
|
#### 4.7.3 Orphan check — same as v7 §4.7.3
|
||||||
|
|
||||||
|
Extended to cover the side-effect inventory, verifying that the in-tx items are consistent.
|
||||||
|
|
||||||
|
### 4.8 Outbox max-age math — unchanged from v7 §4.8
|
||||||
|
|
||||||
|
Minimum `dedupe_retention_days = 7`. The derived `max_age_hours = window -
safety_margin` is strictly less than the dedupe window; `safety_margin`
has a floor of 24h.
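
As a worked example of the arithmetic above, here is a minimal sketch. It assumes `window` means the dedupe retention expressed in hours; the function and constant names are hypothetical.

```python
MIN_DEDUPE_RETENTION_DAYS = 7
SAFETY_MARGIN_FLOOR_HOURS = 24


def derive_max_age_hours(
    dedupe_retention_days: int,
    safety_margin_hours: int = SAFETY_MARGIN_FLOOR_HOURS,
) -> int:
    """Return the outbox max age: retention window minus the safety margin."""
    if dedupe_retention_days < MIN_DEDUPE_RETENTION_DAYS:
        raise ValueError("dedupe_retention_days must be at least 7")
    if safety_margin_hours < SAFETY_MARGIN_FLOOR_HOURS:
        raise ValueError("safety_margin_hours must be at least 24")
    window_hours = dedupe_retention_days * 24
    max_age_hours = window_hours - safety_margin_hours
    if max_age_hours <= 0:
        raise ValueError("safety margin leaves no time for retries")
    return max_age_hours


# Minimum configuration: 7 days * 24 h - 24 h margin.
print(derive_max_age_hours(7))  # 144
```

With the minimum settings the daemon stops retrying a full day before the broker could sweep the dedupe row.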
|
||||||
|
|
||||||
|
### 4.9 Inbox schema — unchanged from v3 §4.5
|
||||||
|
|
||||||
|
### 4.10 Crash recovery — unchanged from v3 §4.6
|
||||||
|
|
||||||
|
### 4.11 Failure modes — `aborted` semantics added (v8)
|
||||||
|
|
||||||
|
- **IPC accept fingerprint-mismatch on duplicate id** (any state):
|
||||||
|
returns 409 with `conflict` field per §4.5.1. Caller must use a new id.
|
||||||
|
- **IPC accept against `aborted` row, fingerprint match**: returns 409
|
||||||
|
per §4.5.1 (NEW v8). Caller must use a new id; the old id is
|
||||||
|
permanently retired.
|
||||||
|
- **Outbox row stuck in `dead`**: operator runs `outbox requeue` per
|
||||||
|
§4.6.3; old id stays in `aborted`, new id is fresh.
|
||||||
|
- **Broker fingerprint mismatch on retry**: as v6/v7. Daemon marks
|
||||||
|
`dead`; operator requeue path.
|
||||||
|
- **Daemon retry after dedupe row hard-deleted by broker retention
|
||||||
|
sweep**: cannot happen unless operator overrode `max_age_hours`.
|
||||||
|
- **Broker phase B2 rejection on retry**: same id, same fingerprint,
but a B2 condition has changed (e.g. the destination topic no longer
exists or the member is no longer subscribed). Daemon receives 4xx and
marks the row `dead`. Operator can `requeue` once conditions clear.
|
||||||
|
- **Atomicity violation found by orphan check**: alerts ops.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5-13. — unchanged from v4
|
||||||
|
|
||||||
|
## 14. Lifecycle — unchanged from v5 §14
|
||||||
|
|
||||||
|
## 15. Version compat — unchanged from v7 §15
|
||||||
|
|
||||||
|
## 16. Threat model — unchanged
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 17. Migration — v8 outbox columns + broker phase B2 (v8)
|
||||||
|
|
||||||
|
Broker side, deploy order: same as v7 §17, with one addition:
|
||||||
|
- Step 4.5: explicitly split broker accept into Phase B1 (pre-dedupe
|
||||||
|
validation, returns 4xx without writing) and Phase B2/B3 (within the
|
||||||
|
accept transaction). Implementation: refactor handler to validate
|
||||||
|
Phase B1 conditions before opening the DB transaction.
|
||||||
|
|
||||||
|
Daemon side:
|
||||||
|
- Outbox schema gains `aborted_at`, `aborted_by`, `superseded_by`
|
||||||
|
columns and the `aborted` enum value (§4.5.2). Migration applies via
|
||||||
|
`INSERT INTO new SELECT * FROM old` recreation if needed; v0.9.0 is
|
||||||
|
greenfield.
|
||||||
|
- IPC accept switches to `BEGIN IMMEDIATE` for SQLite serialization
|
||||||
|
(§4.5.1 step 3).
|
||||||
|
- IPC accept handles `aborted` rows per §4.5.1 (always 409).
|
||||||
|
- `claudemesh daemon outbox requeue` always mints a fresh
|
||||||
|
`client_message_id`; never frees the old id. `--new-client-id <id>`
|
||||||
|
and `--auto` are the only modes; the old `client_message_id`
|
||||||
|
argument is removed.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What changed v8 → v9 (codex round-8 actionable items)
|
||||||
|
|
||||||
|
| Codex r8 item | v9 fix | Section |
|---|---|---|
| Cross-layer ID-consumed authority contradiction | Two-layer model: daemon-consumed iff outbox row; broker-consumed iff dedupe row committed; daemon-mediated callers see only daemon-layer authority | §4.1, §4.6.1, §4.6.2 |
| Rate-limit authority muddled (B2 vs async counters) | Rate limit moved to B1 via external atomic limiter (Redis-style INCR with TTL); DB rate-limit counters demoted to telemetry-only | §4.6.2, §4.6.4, §4.7.1 |
| §4.1 broker guarantee fuzzy | Tightened: "dedupe row exists iff broker accept transaction committed (B3)" | §4.1, §4.6.2 |
|
||||||
|
|
||||||
|
(Earlier rounds' fixes preserved unchanged.)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review (round 9)
|
||||||
|
|
||||||
|
1. **Two-layer ID model (§4.1, §4.6.1)** — is the daemon-vs-broker
|
||||||
|
authority split clear, or does it create more confusion for
|
||||||
|
operators reading "consumed" in different contexts? Should we use
|
||||||
|
different verbs (e.g. "claimed" at daemon, "committed" at broker)?
|
||||||
|
2. **Rate-limit external limiter (§4.6.4)** — is "atomic external
|
||||||
|
limiter" specified concretely enough? Is the over-counting on
|
||||||
|
limiter-accepted-then-B2-rejected acceptable?
|
||||||
|
3. **B2 contents after rate-limit move** — B2 now only has
|
||||||
|
`destination_ref existence`. Worth keeping a B2 phase at all, or
|
||||||
|
collapse into B1+B3?
|
||||||
|
4. **Anything else still wrong?** Read it as if you were going to
|
||||||
|
operate this for a year.
|
||||||
|
|
||||||
|
Three options:
|
||||||
|
- **(a) v9 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v10 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless.
|
||||||
374
.artifacts/shipped/2026-05-03-daemon-final-spec.md
Normal file
@@ -0,0 +1,374 @@
|
|||||||
|
# `claudemesh daemon` — Final Spec
|
||||||
|
|
||||||
|
> Context for the reviewer: claudemesh is a peer mesh runtime for Claude Code
|
||||||
|
> sessions. Existing infrastructure: a managed broker (`wss://ic.claudemesh.com/ws`,
|
||||||
|
> Bun + Drizzle + Postgres) that handles routing, presence, topics, files,
|
||||||
|
> per-mesh apikeys, etc. There is also a CLI (`claudemesh-cli`, npm) and a web
|
||||||
|
> dashboard. Each session today is short-lived: `claudemesh launch` opens a WS,
|
||||||
|
> stays up while Claude Code is running, then closes. Server-side
|
||||||
|
> integrations (RunPod handlers, Temporal workers, CI jobs) currently have no
|
||||||
|
> first-class way to participate in a mesh — they'd either curl an apikey-auth
|
||||||
|
> REST endpoint (one-way) or shell out to the CLI cold-path (slow, no inbound).
|
||||||
|
>
|
||||||
|
> This spec proposes a `claudemesh daemon` mode that turns any host (laptop,
|
||||||
|
> server, RunPod pod) into a persistent, addressable peer with a local IPC
|
||||||
|
> surface that apps can talk to without dealing with the broker directly.
|
||||||
|
>
|
||||||
|
> The user has explicitly said: pre-launch, no users yet, optimize for the
|
||||||
|
> right architecture not the smallest first cut. They want the FINAL spec, not
|
||||||
|
> phased MVPs.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Process model
|
||||||
|
|
||||||
|
**One daemon per (user, mesh)**. Persistent. Survives reboots via OS supervisor (systemd / launchd / SCM). Serves multiple local apps concurrently.
|
||||||
|
|
||||||
|
```
|
||||||
|
~/.claudemesh/daemon/<mesh-slug>/
|
||||||
|
pid 0600 pidfile, cleaned on shutdown
|
||||||
|
sock 0600 unix domain socket (primary IPC)
|
||||||
|
http.port 0644 auto-allocated loopback port (Windows / Docker fallback)
|
||||||
|
keypair.json 0600 persistent ed25519 + x25519 — daemon identity
|
||||||
|
config.toml 0644 user-editable runtime tuning
|
||||||
|
outbox.db 0600 SQLite — durable outbound queue + dedupe ledger
|
||||||
|
inbox.db 0600 SQLite — 30-day inbound history, FTS-indexed
|
||||||
|
daemon.log 0644 JSON-lines, rotating (100 MB / 14 d)
|
||||||
|
hooks/ 0700 user-managed event scripts
|
||||||
|
```
|
||||||
|
|
||||||
|
Single binary. No external runtime beyond the existing CLI dependencies. The daemon *is* the CLI in long-running mode — `claudemesh daemon up` is a flag on the same binary.
|
||||||
|
|
||||||
|
## 2. Identity — persistent member, not ephemeral session
|
||||||
|
|
||||||
|
The daemon mints a stable ed25519 + x25519 keypair on first startup, stored in `keypair.json`. It registers with the broker as a **persistent member**, keeping the same identity across restarts, reconnects, and host migrations. `runpod-worker-3` is `runpod-worker-3` forever, until you `claudemesh daemon reset` or revoke the keypair.
|
||||||
|
|
||||||
|
`--name` is taken at first `daemon up`; subsequent runs read the keypair file and ignore `--name` unless `--rename` is passed (which produces a `member_renamed` event the broker propagates to peers).
|
||||||
|
|
||||||
|
This is the default. It's the right thing for servers. There is no `--ephemeral` mode.
|
||||||
|
|
||||||
|
## 3. IPC surface — single versioned API, three transports
|
||||||
|
|
||||||
|
**Transports**, all serving identical JSON:
|
||||||
|
- **UDS** at `~/.claudemesh/daemon/<slug>/sock` (primary, default)
|
||||||
|
- **TCP loopback** on auto-allocated port written to `http.port` (Docker / Windows clients)
|
||||||
|
- **Server-Sent Events** stream at `GET /v1/events` for push (real-time inbound)
|
||||||
|
|
||||||
|
**No auth on local IPC.** Trust boundary is the OS — UDS is mode 0600, TCP listens on 127.0.0.1 only. If you can reach the socket, you're already running as the right user; the daemon's `keypair.json` is also reachable, so adding a token would be theatre.
|
||||||
|
|
||||||
|
**Endpoint surface — exactly mirrors CLI verbs:**
|
||||||
|
|
||||||
|
```
|
||||||
|
# messaging
|
||||||
|
POST /v1/send {to, message, priority?, meta?, replyToId?}
|
||||||
|
POST /v1/topic/post {topic, message, priority?, mentions?}
|
||||||
|
POST /v1/topic/subscribe {topic}
|
||||||
|
GET /v1/topic/list
|
||||||
|
GET /v1/inbox ?since=<iso>&topic=<n>&from=<peer>&limit=<n>
|
||||||
|
POST /v1/broadcast {message, scope: "*"|"@group"|...}
|
||||||
|
|
||||||
|
# peers + presence
|
||||||
|
GET /v1/peers ?mesh=<slug>
|
||||||
|
POST /v1/profile {summary?, status?, visible?, avatar?, ...}
|
||||||
|
POST /v1/groups/join {name, role?}
|
||||||
|
POST /v1/groups/leave {name}
|
||||||
|
|
||||||
|
# state, memory, vector, graph — full mesh-services platform
|
||||||
|
POST /v1/state/set {key, value, scope?: "mesh"|"member"}
|
||||||
|
GET /v1/state/get ?key=...
|
||||||
|
GET /v1/state/list
|
||||||
|
POST /v1/memory/remember {content, tags?}
|
||||||
|
GET /v1/memory/recall ?q=<query>
|
||||||
|
POST /v1/vector/store {collection, text, metadata?}
|
||||||
|
GET /v1/vector/search ?collection=<c>&q=<query>&limit=<n>
|
||||||
|
POST /v1/graph/query {cypher, params?}
|
||||||
|
|
||||||
|
# files
|
||||||
|
POST /v1/file/share {path, to?, message?, persistent?}
|
||||||
|
GET /v1/file/get ?id=<fileId>&out=<path>
|
||||||
|
GET /v1/file/list
|
||||||
|
|
||||||
|
# tasks + scheduling
|
||||||
|
POST /v1/task/create {title, assignee?, priority?, tags?}
|
||||||
|
POST /v1/task/claim {id}
|
||||||
|
POST /v1/task/complete {id, result?}
|
||||||
|
POST /v1/scheduling/remind {at|in|cron, message, to?}
|
||||||
|
|
||||||
|
# skills + MCP services (full peer participation)
|
||||||
|
POST /v1/skill/deploy {path}
|
||||||
|
POST /v1/skill/share {name, manifest}
|
||||||
|
POST /v1/mcp/register {server_name, description, tools, transport}
|
||||||
|
POST /v1/mcp/call {server, tool, args}
|
||||||
|
|
||||||
|
# events (push)
|
||||||
|
GET /v1/events text/event-stream
|
||||||
|
events: message, peer_join, peer_leave, file_shared, task_assigned,
|
||||||
|
state_changed, mcp_deployed, skill_shared, hook_executed,
|
||||||
|
disconnect, reconnect
|
||||||
|
|
||||||
|
# control plane
|
||||||
|
GET /v1/health {connected, lag_ms, queue_depth, mesh, member_pubkey, uptime_s}
|
||||||
|
GET /v1/metrics Prometheus exposition
|
||||||
|
POST /v1/heartbeat {} (caller asserts it's alive — daemon may set status="working")
|
||||||
|
```
|
||||||
|
|
||||||
|
Every CLI verb the platform offers has a daemon endpoint. No second-class features. Apps written against the daemon get the same surface as Claude Code itself.
|
||||||
|
|
||||||
|
## 4. Outbound — exactly-once via SQLite + idempotency keys
|
||||||
|
|
||||||
|
Sends route through `outbox.db` first, then to the broker. Schema:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE outbox (
|
||||||
|
id TEXT PRIMARY KEY, -- ulid
|
||||||
|
idempotency_key TEXT UNIQUE, -- caller-provided or autogen
|
||||||
|
payload BLOB NOT NULL, -- serialized envelope
|
||||||
|
enqueued_at INTEGER NOT NULL,
|
||||||
|
attempts INTEGER DEFAULT 0,
|
||||||
|
next_attempt_at INTEGER NOT NULL,
|
||||||
|
status TEXT CHECK(status IN ('pending','inflight','done','dead')),
|
||||||
|
last_error TEXT,
|
||||||
|
delivered_at INTEGER
|
||||||
|
);
|
||||||
|
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
|
||||||
|
```
|
||||||
|
|
||||||
|
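- WAL mode, `synchronous=NORMAL`: commits survive application crashes and the database stays consistent, but the latest commits can roll back after power loss; throughput is roughly 10k inserts/sec.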
|
||||||
|
- Caller-supplied `Idempotency-Key` header dedupes retries (24h window).
|
||||||
|
- Exponential backoff with jitter; 7-day max retention; `dead` rows surface in `claudemesh daemon outbox --failed`.
|
||||||
|
- `delivered_at` set when broker ACKs the queue row, not when daemon sends — gives true at-least-once with explicit dedupe → effectively exactly-once.
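
The idempotency-key behavior described above can be illustrated with Python's standard `sqlite3` module and the schema from this section. This is a sketch under assumptions: `open_outbox` and `enqueue` are hypothetical helper names, `uuid4` stands in for a ULID, and the 24h dedupe window and retry scheduling are left out.

```python
import sqlite3
import time
import uuid

OUTBOX_SCHEMA = """
CREATE TABLE outbox (
  id              TEXT PRIMARY KEY,
  idempotency_key TEXT UNIQUE,
  payload         BLOB NOT NULL,
  enqueued_at     INTEGER NOT NULL,
  attempts        INTEGER DEFAULT 0,
  next_attempt_at INTEGER NOT NULL,
  status          TEXT CHECK(status IN ('pending','inflight','done','dead')),
  last_error      TEXT,
  delivered_at    INTEGER
);
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
"""


def open_outbox(path=":memory:"):
    """Open the outbox with the pragmas named in the bullets above."""
    db = sqlite3.connect(path)
    db.execute("PRAGMA journal_mode=WAL")    # has no effect on an in-memory db
    db.execute("PRAGMA synchronous=NORMAL")
    db.executescript(OUTBOX_SCHEMA)
    return db


def enqueue(db, payload, idempotency_key=None):
    """Insert a pending row, or return the existing row for a repeated key.

    Returns (row_id, created).
    """
    now = int(time.time())
    row_id = uuid.uuid4().hex                  # stand-in for a ULID
    key = idempotency_key or uuid.uuid4().hex  # autogenerate when absent
    with db:
        cur = db.execute(
            "INSERT OR IGNORE INTO outbox "
            "(id, idempotency_key, payload, enqueued_at, next_attempt_at, status) "
            "VALUES (?, ?, ?, ?, ?, 'pending')",
            (row_id, key, payload, now, now),
        )
        if cur.rowcount == 1:
            return row_id, True
        # The key was already present: hand back the original row instead.
        existing = db.execute(
            "SELECT id FROM outbox WHERE idempotency_key = ?", (key,)
        ).fetchone()
        return existing[0], False
```

A caller that retries with the same key gets the original row id back and no second row is queued, which is the local half of the at-least-once plus dedupe argument.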
|
||||||
|
|
||||||
|
## 5. Inbound — durable history with FTS
|
||||||
|
|
||||||
|
Every inbound message is written to `inbox.db` before any hook fires:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE VIRTUAL TABLE inbox USING fts5(
|
||||||
|
message_id UNINDEXED, mesh UNINDEXED, topic, sender_pubkey UNINDEXED,
|
||||||
|
sender_name, body, meta, received_at UNINDEXED, replied_to_id UNINDEXED
|
||||||
|
);
|
||||||
|
```
|
||||||
|
|
||||||
|
- 30-day rolling retention (configurable).
|
||||||
|
- `claudemesh daemon search "OOM"` queries the FTS index (instant, offline-capable).
|
||||||
|
- Apps that connect mid-stream replay history via `?since=<iso>`.
|
||||||
|
- Exposed in metrics: `cm_daemon_inbox_rows`, `cm_daemon_inbox_bytes`.
|
||||||
|
|
||||||
|
## 6. Hooks — first-class scripted reactions
|
||||||
|
|
||||||
|
Hooks turn the daemon from a passive relay into an autonomous peer. Files in `hooks/`:
|
||||||
|
|
||||||
|
```
|
||||||
|
hooks/
|
||||||
|
on-message.sh every inbound message (DM + topic)
|
||||||
|
on-dm.sh DMs only
|
||||||
|
on-mention.sh when @<my-name> appears anywhere
|
||||||
|
on-topic-<name>.sh a specific topic (e.g. on-topic-alerts.sh)
|
||||||
|
on-file-share.sh file shared with me
|
||||||
|
on-task-assigned.sh task assigned to me
|
||||||
|
on-disconnect.sh WS dropped (informational)
|
||||||
|
on-reconnect.sh reconnected (informational)
|
||||||
|
on-startup.sh daemon up
|
||||||
|
pre-send.sh filter / mutate outbound (last gate)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Contract:**
|
||||||
|
- Stdin: full event JSON.
|
||||||
|
- Stdout (if non-empty, JSON object): used as a structured response. For inbound messages, `{reply: "..."}` posts a reply automatically.
|
||||||
|
- Exit 0 = success; non-zero logs + counts but does not retry.
|
||||||
|
- Timeout: 30s default, override via a `# claudemesh:timeout=120s` comment line in the script.
|
||||||
|
- Env: `PATH=/usr/bin:/bin`, `CLAUDEMESH_MESH=<slug>`, `CLAUDEMESH_MEMBER=<pubkey>`, `CLAUDEMESH_HOME=<config-dir>`, plus the daemon's own broker session token in `CLAUDEMESH_TOKEN` so the script can call `claudemesh send` without re-authenticating.
|
||||||
|
- Concurrent execution: bounded pool (default 8) — overflow queues, never blocks the WS reader.
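
The hook contract above (event JSON on stdin, optional JSON object on stdout, timeout with a per-script override) could look roughly like this. It is a sketch only: `hook_timeout` and `run_hook` are hypothetical names, the exact placement rules for the timeout comment are assumed, and the bounded worker pool is omitted.

```python
import json
import re
import subprocess

DEFAULT_TIMEOUT_S = 30.0
# Matches a whole comment line such as "# claudemesh:timeout=120s".
_TIMEOUT_RE = re.compile(r"^#\s*claudemesh:timeout=(\d+)s\s*$", re.MULTILINE)


def hook_timeout(script_text):
    """Return the per-script timeout override, or the 30s default."""
    match = _TIMEOUT_RE.search(script_text)
    return float(match.group(1)) if match else DEFAULT_TIMEOUT_S


def run_hook(argv, event, env, timeout_s):
    """Run one hook and return (exit_code, response).

    exit_code is None on timeout. response is the parsed stdout when the
    hook exited 0 and printed a JSON object, otherwise None. A failing
    hook is reported to the caller but never retried here.
    """
    try:
        proc = subprocess.run(
            argv,
            input=json.dumps(event).encode("utf-8"),  # stdin: full event JSON
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            env=env,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None, None
    response = None
    out = proc.stdout.strip()
    if proc.returncode == 0 and out:
        try:
            parsed = json.loads(out)
        except ValueError:
            parsed = None
        if isinstance(parsed, dict):  # only JSON objects count as responses
            response = parsed
    return proc.returncode, response
```

A caller would read the script once to compute `hook_timeout`, then pass the restricted environment from the contract (for example `PATH=/usr/bin:/bin` plus the `CLAUDEMESH_*` variables) as `env`.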
|
||||||
|
|
||||||
|
This makes a server a real participant: it auto-replies to "@worker-3 status?", auto-acks file shares, auto-claims tasks, escalates errors to oncall — all configured by dropping shell scripts in a directory.
|
||||||
|
|
||||||
|
## 7. Multi-mesh — one daemon per mesh, coordinated by a supervisor
|
||||||
|
|
||||||
|
Multi-mesh is handled by **one daemon per mesh** (no shared state, no cross-mesh leakage), coordinated by:
|
||||||
|
|
||||||
|
```
|
||||||
|
claudemesh daemon up --all # spawns one daemon per joined mesh
|
||||||
|
claudemesh daemon down --all
|
||||||
|
claudemesh daemon status --all # JSON table of every daemon
|
||||||
|
claudemesh daemon ps # alias of status
|
||||||
|
```
|
||||||
|
|
||||||
|
CLI verbs without `--mesh` continue to use their existing aggregator routing (`/v1/me/...`); in addition, each daemon contributes inbound state to the aggregator.
|
||||||
|
|
||||||
|
## 8. Auto-routing — every CLI verb prefers the daemon
|
||||||
|
|
||||||
|
The CLI's `withMesh` helper is replaced by `viaDaemonOrMesh`:
|
||||||
|
|
||||||
|
1. Read `~/.claudemesh/daemon/<slug>/pid`.
|
||||||
|
2. If alive → call the daemon's UDS endpoint.
|
||||||
|
3. Else → cold path (existing `withMesh` flow, opens its own short-lived WS).
|
||||||
|
|
||||||
|
Transparent to the user. `claudemesh send X "msg"` from a script becomes a sub-millisecond local UDS call when a daemon is up, instead of a 1-second broker handshake.
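
The three routing steps above might be sketched as follows. This is an illustration in Python rather than the CLI's own code: `daemon_alive` and `choose_route` are hypothetical names, the return values are placeholders for the two code paths, and the liveness probe is POSIX-only.

```python
import os
from pathlib import Path


def daemon_alive(pidfile):
    """Best-effort liveness check based on the pidfile (POSIX only)."""
    try:
        pid = int(pidfile.read_text().strip())
    except (OSError, ValueError):
        return False          # missing, unreadable, or garbage pidfile
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)       # signal 0 checks existence without signalling
    except ProcessLookupError:
        return False          # stale pidfile: no such process
    except PermissionError:
        return True           # process exists but belongs to another user
    return True


def choose_route(mesh_slug, claudemesh_home=None):
    """Return 'uds:<socket path>' when a daemon looks alive, else 'cold'."""
    base = Path(claudemesh_home) if claudemesh_home is not None else Path.home() / ".claudemesh"
    daemon_dir = base / "daemon" / mesh_slug
    if daemon_alive(daemon_dir / "pid"):
        return f"uds:{daemon_dir / 'sock'}"
    return "cold"
```

A live PID does not prove the process is the daemon (PIDs get reused), so a fuller version would probably also try connecting to the socket and fall back to the cold path if that fails.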
|
||||||
|
|
||||||
|
## 9. Service installation
|
||||||
|
|
||||||
|
```bash
|
||||||
|
claudemesh daemon install-service # writes systemd unit / launchd plist / Windows service (SCM)
|
||||||
|
claudemesh daemon uninstall-service
|
||||||
|
```
|
||||||
|
|
||||||
|
Generated unit:
|
||||||
|
- `Restart=on-failure`, `RestartSec=5s`
- `MemoryMax=512M` (a ceiling the daemon rarely approaches)
- `StandardOutput=journal`, `StandardError=journal`
- For systemd, runs as the invoking user (no root needed).
|
||||||
|
|
||||||
|
`claudemesh install` (the existing setup verb) gains an opt-in prompt: *"Install as a background service that always runs?"* For interactive users this is opt-in; for `--yes` it defaults to yes on Linux servers (detected by absence of TTY + presence of systemd).
|
||||||
|
|
||||||
|
## 10. Observability
|
||||||
|
|
||||||
|
```
|
||||||
|
claudemesh daemon status human-readable: connected, lag, queue, hooks fired
|
||||||
|
claudemesh daemon status --json machine-readable
|
||||||
|
claudemesh daemon logs [-f] tail daemon.log
|
||||||
|
claudemesh daemon outbox pending sends + dead-letter queue
|
||||||
|
claudemesh daemon inbox recent received messages (FTS-searchable)
|
||||||
|
claudemesh daemon metrics prints /v1/metrics
|
||||||
|
|
||||||
|
# Prometheus counters/gauges:
|
||||||
|
cm_daemon_connected{mesh} 0/1
|
||||||
|
cm_daemon_reconnects_total{mesh,reason}
|
||||||
|
cm_daemon_lag_ms{mesh} last broker round-trip
|
||||||
|
cm_daemon_outbox_depth{mesh}
|
||||||
|
cm_daemon_outbox_dead_total{mesh}
|
||||||
|
cm_daemon_send_total{mesh,kind=topic|dm|broadcast,status}
|
||||||
|
cm_daemon_recv_total{mesh,kind=topic|dm,from_type=peer|apikey|webhook}
|
||||||
|
cm_daemon_hook_invocations_total{hook,exit}
|
||||||
|
cm_daemon_hook_duration_seconds{hook} histogram
|
||||||
|
cm_daemon_ipc_request_total{endpoint,status}
|
||||||
|
cm_daemon_ipc_duration_seconds{endpoint} histogram
|
||||||
|
```
|
||||||
|
|
||||||
|
Tracing: optional OpenTelemetry export (`config.toml: [otel] endpoint = ...`) — emits spans for every IPC request + downstream broker call.
|
||||||
|
|
||||||
|
## 11. SDKs — three, all thin
|
||||||
|
|
||||||
|
The daemon's HTTP+UDS surface is the API; SDKs are convenience wrappers, not new surfaces.
|
||||||
|
|
||||||
|
**Python** (single file, stdlib only — no `requests`, no `aiohttp`):
|
||||||
|
```python
|
||||||
|
from claudemesh import Daemon
|
||||||
|
cm = Daemon() # auto-discovers running daemon for current cwd's mesh
|
||||||
|
cm.send("@oncall", "OOM detected")
|
||||||
|
cm.topic.post("alerts", "build done", mentions=["alice"])
|
||||||
|
for evt in cm.events(): # SSE stream, blocking iterator
|
||||||
|
if evt.kind == "message" and "@me" in evt.body:
|
||||||
|
cm.send(evt.from_pubkey, "got it, on it")
|
||||||
|
```
|
||||||
|
|
||||||
|
**Go** (single file, stdlib only — no third-party deps):
|
||||||
|
```go
|
||||||
|
cm, _ := claudemesh.Connect()
|
||||||
|
cm.Send(ctx, "@oncall", "OOM detected")
|
||||||
|
for evt := range cm.Events(ctx) { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
**TypeScript / Node** (zero runtime deps, ESM only):
|
||||||
|
```ts
|
||||||
|
import { Daemon } from "@claudemesh/daemon-client";
|
||||||
|
const cm = await Daemon.connect();
|
||||||
|
await cm.send("@oncall", "OOM detected");
|
||||||
|
for await (const evt of cm.events()) { ... }
|
||||||
|
```
|
||||||
|
|
||||||
|
Each is ~300 lines. All three are versioned in lockstep with the daemon's `/v1` surface. A `/v2` surface (when it eventually exists) keeps `/v1` alive indefinitely — old SDKs never break.
|
||||||
|
|
||||||
|
## 12. Security model — explicit boundaries
|
||||||
|
|
||||||
|
| Boundary | Trust | Mechanism |
|---|---|---|
| App ↔ Daemon (local) | OS user | UDS 0600, TCP loopback only |
| Daemon ↔ Broker | Mesh keypair | WSS + ed25519 hello sig + crypto_box DM envelopes + per-topic keys (existing model) |
| Hook ↔ Daemon (env) | OS user + filesystem | `hooks/` dir mode 0700; only files there execute; no remote install |
| Daemon ↔ Disk | OS user | All daemon files mode 0600/0644 under `~/.claudemesh/daemon/` |
|
||||||
|
|
||||||
|
**No new attack surface introduced by the daemon** — apps that previously could read `~/.claudemesh/config.json` directly already had full mesh access; the daemon just adds an IPC layer on top.
|
||||||
|
|
||||||
|
**Hook RCE consideration**: a peer cannot install a hook on your daemon. Hooks are files YOU put on disk. Inbound messages can only trigger hooks that already exist with content you wrote. The broker has no path to your hook directory.
|
||||||
|
|
||||||
|
## 13. Configuration — `config.toml`
|
||||||
|
|
||||||
|
```toml
|
||||||
|
[daemon]
|
||||||
|
mesh = "prod" # set on `daemon up --mesh`; immutable thereafter
|
||||||
|
display_name = "runpod-worker-3"
|
||||||
|
log_level = "info"
|
||||||
|
|
||||||
|
[ipc]
|
||||||
|
http_port = 0 # 0 = auto-allocate
|
||||||
|
http_bind = "127.0.0.1" # never 0.0.0.0; explicit if you know what you're doing
|
||||||
|
uds_mode = "0600"
|
||||||
|
|
||||||
|
[outbox]
|
||||||
|
max_queue_size = 10000
|
||||||
|
max_age_hours = 168 # 7 days
|
||||||
|
fsync_mode = "batched_50ms" # 'strict' | 'batched_50ms' | 'off'
|
||||||
|
|
||||||
|
[inbox]
|
||||||
|
retention_days = 30
|
||||||
|
fts_enabled = true
|
||||||
|
|
||||||
|
[reconnect]
|
||||||
|
initial_backoff_ms = 500
|
||||||
|
max_backoff_ms = 30000
|
||||||
|
backoff_multiplier = 2.0
|
||||||
|
jitter_pct = 25
|
||||||
|
|
||||||
|
[hooks]
|
||||||
|
enabled = true
|
||||||
|
concurrency = 8
|
||||||
|
default_timeout_s = 30
|
||||||
|
|
||||||
|
[metrics]
|
||||||
|
prometheus_enabled = true
|
||||||
|
otel_endpoint = "" # empty = disabled
|
||||||
|
```
|
||||||
|
|
||||||
|
User-editable. `claudemesh daemon reload` re-reads it without dropping the WS.
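
To make the `[reconnect]` settings concrete, here is one plausible way the delay could be computed. This is a sketch, not the daemon's algorithm: the function name is hypothetical, and reading `jitter_pct = 25` as a symmetric plus-or-minus 25% spread around the capped delay is an assumption.

```python
import random


def reconnect_delay_ms(
    attempt,
    initial_backoff_ms=500,
    max_backoff_ms=30000,
    backoff_multiplier=2.0,
    jitter_pct=25,
    rng=random.random,
):
    """Delay before reconnect attempt number `attempt` (0-based), in ms."""
    exponent = min(attempt, 64)  # keep the power well inside float range
    base = min(max_backoff_ms, initial_backoff_ms * backoff_multiplier ** exponent)
    spread = base * jitter_pct / 100.0
    # rng() is in [0, 1); map it onto [-spread, +spread).
    return base + (2.0 * rng() - 1.0) * spread


# Without jitter (rng fixed at the midpoint) the defaults give
# 500, 1000, 2000, 4000, ... capped at 30000 ms.
print([reconnect_delay_ms(n, rng=lambda: 0.5) for n in range(4)])
```

Injecting `rng` keeps the function deterministic in tests while production code uses `random.random`.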
|
||||||
|
|
||||||
|
## 14. Migration — what changes for existing users
|
||||||
|
|
||||||
|
- `claudemesh launch` (Claude Code mode) is unchanged. It can optionally `--via-daemon` to share the WS with a running daemon, but defaults to its own session (preserves "ephemeral session" semantics that Claude Code expects).
|
||||||
|
- `claudemesh send X "msg"` and every other cold-path verb gets a transparent speedup when a daemon is up. No flag, no opt-in, no behavior difference visible to the user.
|
||||||
|
- Existing `~/.claudemesh/config.json` is consumed unchanged by the daemon.
|
||||||
|
- No DB migration. No broker changes. The daemon talks to the existing `/v1` HTTPS + WSS surfaces — broker doesn't even know whether a connection is `claudemesh launch` or `claudemesh daemon`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What needs review
|
||||||
|
|
||||||
|
Please critically review this spec for the v0.9.0 anchor. Specifically I want
|
||||||
|
your hardest pushback on:
|
||||||
|
|
||||||
|
1. **Identity model** — persistent member by default vs ephemeral session. Have I
|
||||||
|
missed a case where ephemeral is the right answer for a daemon? Should
|
||||||
|
`--ephemeral` exist?
|
||||||
|
2. **No-auth local IPC** — UDS 0600 + TCP loopback. Is "OS-trust is enough"
|
||||||
|
actually safe in shared-tenant Linux (multi-user host, container
|
||||||
|
side-channel)? Should there be a per-daemon token even locally?
|
||||||
|
3. **SQLite outbox/inbox** — single writer, WAL, batched fsync. Is the
|
||||||
|
exactly-once-via-idempotency-key claim defensible? What's the failure mode
|
||||||
|
I'm glossing over?
|
||||||
|
4. **Hooks fork-execing scripts** — RCE/data-exfil concerns I'm dismissing too
|
||||||
|
easily? Should hooks be sandboxed (seccomp, no network, …)?
|
||||||
|
5. **Auto-routing CLI verbs through daemon** — does this break composability
|
||||||
|
with existing `claudemesh launch`? Race conditions when both are running?
|
||||||
|
What about pidfile-stale detection?
|
||||||
|
6. **One daemon per mesh** — why not one daemon serving all meshes, with mesh
|
||||||
|
selection per-request? What does single-daemon actually buy beyond "fewer
|
||||||
|
processes"?
|
||||||
|
7. **The IPC surface duplicates the broker REST surface** — am I solving a
|
||||||
|
problem the broker REST + per-mesh apikey already solves, with extra
|
||||||
|
complexity for caching + queueing?
|
||||||
|
8. **What's missing entirely** — auth boundaries, recovery flows, on-disk
|
||||||
|
secret rotation, anything else a production daemon shipped with this spec
|
||||||
|
would lack?
|
||||||
|
|
||||||
|
Score the spec on each axis: 1 = serious flaw, 5 = sound. Then list the
|
||||||
|
top 3 changes you'd insist on before I write any code. Be ruthless — pre-launch
|
||||||
|
window means I can break anything.
|
||||||
@@ -0,0 +1,218 @@
|
|||||||
|
# `claudemesh daemon` — broker-hardening followups
|
||||||
|
|
||||||
|
> **Purpose**: refinements found during the v6 → v10 codex review series
|
||||||
|
> that are real improvements but **not** v0.9.0 blockers. The
|
||||||
|
> implementation target is `2026-05-03-daemon-spec-v0.9.0.md`. This
|
||||||
|
> document lists what was deferred, why, and the trigger that promotes
|
||||||
|
> each item to "must-do."
|
||||||
|
>
|
||||||
|
> **Background**: codex reviewed the daemon spec across 9 rounds (v1
|
||||||
|
> through v10). Rounds 1–4 found load-bearing architectural issues
|
||||||
|
> (identity, IPC auth, exactly-once lie, hook tokens, rotation, etc.).
|
||||||
|
> Rounds 5–9 found progressively finer correctness issues inside one
|
||||||
|
> subsystem (broker idempotency mechanics). v6 closed the architectural
|
||||||
|
> review; v7–v10 are increasingly fine-grained idempotency-correctness
|
||||||
|
> shavings on the same layer. Pre-launch (no users) doesn't need v7–v10
|
||||||
|
> level rigor. We pulled the cheap wins into v0.9.0; the rest waits.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. B0 dedupe fast-path before rate-limit (v10)
|
||||||
|
|
||||||
|
**What v10 said**: read `mesh.client_message_dedupe` BEFORE consulting
|
||||||
|
the rate limiter. Existing id (match or mismatch) returns immediately
|
||||||
|
without touching rate-limit budget.
|
||||||
|
|
||||||
|
**Why deferred**: v0.9.0 doesn't have meaningful rate-limit pressure on
|
||||||
|
the daemon path. The split-brain failure (broker accepted, daemon
|
||||||
|
believes failure due to rate-limit-rejection-on-retry) requires
|
||||||
|
sustained saturated rate-limit windows, which don't exist pre-launch.
|
||||||
|
|
||||||
|
**Promote when**: any single mesh sees rate-limit rejections AND has
|
||||||
|
daemon retries against committed ids. Telemetry to watch:
|
||||||
|
`cm_broker_rate_limit_rejection_total` per mesh > 0 sustained.
|
||||||
|
|
||||||
|
**Implementation cost**: small — one indexed PK lookup before the
|
||||||
|
existing limiter call. The work is mostly testing the race semantics.
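
The ordering described above can be sketched as follows. This is a minimal in-memory illustration, not the broker's code: `FixedBudgetLimiter`, `acceptWithB0`, the `Map`-backed dedupe store, and the `429` status for a rate-limit rejection are all assumptions made for the example.

```typescript
// Hypothetical stand-ins: the real broker would read mesh.client_message_dedupe
// and call its existing limiter. The 429 status is an assumption.
type B0DedupeRow = { brokerMessageId: string; fingerprint: string };

type B0Result =
  | { status: 200; duplicate: true; brokerMessageId: string }
  | { status: 409; conflict: "request_fingerprint_mismatch" }
  | { status: 429 }
  | { status: 201; brokerMessageId: string };

class FixedBudgetLimiter {
  private remaining: number;
  constructor(budget: number) {
    this.remaining = budget;
  }
  // Spends one unit of budget and returns true, or returns false if none is left.
  tryConsume(): boolean {
    if (this.remaining <= 0) return false;
    this.remaining -= 1;
    return true;
  }
}

function acceptWithB0(
  dedupe: Map<string, B0DedupeRow>,
  limiter: FixedBudgetLimiter,
  meshId: string,
  clientMessageId: string,
  fingerprint: string,
  newBrokerMessageId: string,
): B0Result {
  const key = JSON.stringify([meshId, clientMessageId]);
  const existing = dedupe.get(key);
  // B0: an existing id answers immediately (match or mismatch)
  // without touching the rate-limit budget.
  if (existing !== undefined) {
    return existing.fingerprint === fingerprint
      ? { status: 200, duplicate: true, brokerMessageId: existing.brokerMessageId }
      : { status: 409, conflict: "request_fingerprint_mismatch" };
  }
  // Only brand-new ids pay for rate-limit budget.
  if (!limiter.tryConsume()) return { status: 429 };
  dedupe.set(key, { brokerMessageId: newBrokerMessageId, fingerprint });
  return { status: 201, brokerMessageId: newBrokerMessageId };
}
```

With a budget of one, a retry of an already committed id still resolves to `200 duplicate` after the budget is spent, which is the split-brain case the fast-path is meant to close.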
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Lua-scripted idempotent rate limiter (v10)
|
||||||
|
|
||||||
|
**What v10 said**: limiter keyed by `(mesh_id, client_message_id,
|
||||||
|
window_bucket)` so retries-within-window consume budget at most once.
|
||||||
|
|
||||||
|
**Why deferred**: depends on (1) above. Without B0 fast-path this is
|
||||||
|
incremental complexity for marginal benefit. With B0 it becomes the
|
||||||
|
right belt-and-suspenders fix for the rare race where two same-id
|
||||||
|
requests both miss B0 simultaneously.
|
||||||
|
|
||||||
|
**Promote when**: B0 ships. Same trigger.
|
||||||
|
|
||||||
|
**Implementation cost**: medium — Lua script in Redis, careful TTL
|
||||||
|
tuning, integration with existing limiter call sites.
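
To make the keying concrete, here is a small in-memory sketch of a limiter keyed by `(mesh_id, client_message_id, window_bucket)`. It is not the Redis/Lua version the section describes: the class name, the per-mesh budget, and the lack of TTL eviction are simplifying assumptions.

```typescript
// In-memory illustration only. A production version would live in a Redis
// Lua script with TTLs; names and the per-mesh budget are assumptions.
class IdempotentWindowLimiter {
  private readonly paidIds = new Set<string>();
  private readonly usedPerWindow = new Map<string, number>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request may proceed. A retry of the same id inside
  // the same window bucket is admitted without spending budget again.
  tryConsume(meshId: string, clientMessageId: string, nowMs: number): boolean {
    const bucket = Math.floor(nowMs / this.windowMs);
    const windowKey = JSON.stringify([meshId, bucket]);
    const idKey = JSON.stringify([meshId, clientMessageId, bucket]);
    if (this.paidIds.has(idKey)) return true;
    const used = this.usedPerWindow.get(windowKey) ?? 0;
    if (used >= this.limit) return false;
    this.usedPerWindow.set(windowKey, used + 1);
    this.paidIds.add(idKey);
    return true;
  }
}
```

In Redis the check-and-increment would need to run as one atomic script, which is where the Lua requirement and the TTL tuning come from.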
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. In-tx `mesh.mention_index` (v8)
|
||||||
|
|
||||||
|
**What v8 said**: mention-fanout index updates should commit inside the
|
||||||
|
broker accept transaction so mention-search reads can never see a
|
||||||
|
mention pointing at an uncommitted message.
|
||||||
|
|
||||||
|
**Why deferred**: the lag between accept-commit and async
|
||||||
|
mention-indexer is small (single-digit milliseconds in expected
|
||||||
|
deployment). Stale-read window during mention search is acceptable for
|
||||||
|
v0.9.0; receivers learn of mentions via the `mention` event in their
|
||||||
|
inbox stream regardless.
|
||||||
|
|
||||||
|
**Promote when**: real users complain about "I was mentioned but the
|
||||||
|
mention search doesn't show it" with reproducible cases that don't
|
||||||
|
self-heal in seconds.
|
||||||
|
|
||||||
|
**Implementation cost**: small — add `INSERT INTO mesh.mention_index`
|
||||||
|
to the accept transaction. The async indexer becomes a backfill
|
||||||
|
fallback rather than the primary path.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. 4011 / 4012 close-code split (v6 §15.5)
|
||||||
|
|
||||||
|
**What v6 said**: split `4010 feature_unavailable` into three codes:
|
||||||
|
`4010` (missing), `4011` (params invalid), `4012` (params below floor).
|
||||||
|
|
||||||
|
**Why deferred**: v0.9.0 ships single `4010` with structured
|
||||||
|
`close_reason` JSON containing `kind`, `feature`, `detail`. Same
|
||||||
|
diagnostic information, simpler protocol surface.
|
||||||
|
|
||||||
|
**Promote when**: ops tooling or external monitoring needs distinct
|
||||||
|
status codes (e.g. PagerDuty rules that fire on 4012-only). Probably
|
||||||
|
never; structured JSON is parseable.
|
||||||
|
|
||||||
|
**Implementation cost**: trivial — three constants and a switch on
|
||||||
|
`close_reason.kind`.
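
A sketch of what "three constants and a switch" might look like. The field names `kind`, `feature`, and `detail` come from the text above; the concrete `kind` strings and the constant names are hypothetical.

```typescript
// The kind values below are illustrative assumptions; the spec only names
// the fields (kind, feature, detail), not their allowed values.
type CloseReason = { kind: string; feature: string; detail: string };

const CLOSE_FEATURE_MISSING = 4010;
const CLOSE_PARAMS_INVALID = 4011;
const CLOSE_PARAMS_BELOW_FLOOR = 4012;

// Maps the structured close_reason onto the split close codes.
function closeCodeFor(reason: CloseReason): number {
  switch (reason.kind) {
    case "params_invalid":
      return CLOSE_PARAMS_INVALID;
    case "params_below_floor":
      return CLOSE_PARAMS_BELOW_FLOOR;
    default:
      // Unknown kinds fall back to the single v0.9.0 code.
      return CLOSE_FEATURE_MISSING;
  }
}
```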
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Elaborate per-OS fingerprint precedence table (v8 §2.2.1)
|
||||||
|
|
||||||
|
**What v8 said**: comprehensive per-OS table covering Linux machine-id
|
||||||
|
sources, macOS `IOPlatformUUID`, Windows `MachineGuid`, BSD
|
||||||
|
`kern.hostuuid`, plus interface exclusion rules.
|
||||||
|
|
||||||
|
**Why deferred**: v0.9.0 ships with the simpler "machine-id ||
|
||||||
|
first-stable-mac" rule from v6. Edge cases (cloud images,
|
||||||
|
machine-id-not-readable, etc.) are documented when first hit.
|
||||||
|
|
||||||
|
**Promote when**: operators report fingerprint false-positives we can't
|
||||||
|
explain from the v6 rule. Each report adds one row to the per-OS
|
||||||
|
table.
|
||||||
|
|
||||||
|
**Implementation cost**: incremental — each OS-specific source is a
|
||||||
|
small probe function with a fallback chain.
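
A fallback chain of probe functions could be shaped roughly like this. The `Probe` type and `firstAvailable` are hypothetical names; real probes would read OS-specific sources, which are deliberately left out here.

```typescript
// Hypothetical shape for a probe chain. Each probe returns a stable id string,
// returns null, or throws when its source is unavailable.
type Probe = { name: string; read: () => string | null };

// Walks the probes in precedence order and returns the first usable value.
function firstAvailable(probes: Probe[]): { source: string; value: string } | null {
  for (const probe of probes) {
    let value: string | null = null;
    try {
      value = probe.read();
    } catch {
      value = null; // unreadable source: fall through to the next probe
    }
    if (value !== null && value.trim() !== "") {
      return { source: probe.name, value: value.trim() };
    }
  }
  return null;
}
```

Under this shape, the v6 "machine-id || first-stable-mac" rule is a two-element probe list, and each operator report would add one probe rather than change the walker.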
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. `request_fingerprint` schema-version-2 in feature negotiation (v6 §15.1)
|
||||||
|
|
||||||
|
**What v6 said**: `client_message_id_dedupe` feature parameters
|
||||||
|
versioned independently. v0.9.0 ships at version 1 with a single
|
||||||
|
`request_fingerprint: bool` flag.
|
||||||
|
|
||||||
|
**Why deferred**: we don't yet need parameterized fingerprint variants
|
||||||
|
(different canonical forms, different hash algos). Version-bump path
|
||||||
|
is documented; we'll use it when we add the second fingerprint mode.
|
||||||
|
|
||||||
|
**Promote when**: we want a fingerprint algo other than sha256/JCS
|
||||||
|
(e.g. a faster hash, or a normalized canonical form).
|
||||||
|
|
||||||
|
**Implementation cost**: small — single feature-bit version bump
|
||||||
|
following the documented pattern.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Force-expiry / quarantine semantics for `keypair-archive.json` (v8 §14.1.1)
|
||||||
|
|
||||||
|
**What v8 said**: `max_archived_keys` cap with force-expiry; explicit
|
||||||
|
quarantine of malformed archive (`keypair-archive.json.malformed-<ts>`);
|
||||||
|
duplicate `key_id` rejection; mode-mismatch warning behavior.
|
||||||
|
|
||||||
|
**Why deferred**: v0.9.0 ships the simpler v6 rule — drop expired
|
||||||
|
entries on cleanup pass; refuse to start on malformed archive (loud,
|
||||||
|
operator-actionable). The v8 elaboration makes archive corruption
|
||||||
|
non-blocking, which is operationally nicer but trades off audit
|
||||||
|
clarity.
|
||||||
|
|
||||||
|
**Promote when**: a real operator hits an archive corruption that
|
||||||
|
shouldn't have brought the daemon down (e.g. mid-rotation crash leaves
|
||||||
|
a partially-written archive).
|
||||||
|
|
||||||
|
**Implementation cost**: small — quarantine logic + one extra startup
|
||||||
|
check.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. Cross-language JCS conformance for `request_fingerprint` (v6 §4.4 round-6 question)
|
||||||
|
|
||||||
|
**What v6 asked**: does JCS work cross-language for
|
||||||
|
`meta_canonical_json`? Python `json.dumps`, Go `encoding/json`, and JS
|
||||||
|
JSON.stringify all behave differently. Should we ship a vetted JCS lib
|
||||||
|
in each SDK?
|
||||||
|
|
||||||
|
**Why deferred from v0.9.0**: the daemon ships in TypeScript only for
|
||||||
|
v0.9.0 (the `claudemesh-cli` package). Single-language JCS is trivial.
|
||||||
|
SDK ports come post-v0.9.0.
|
||||||
|
|
||||||
|
**Promote when**: we ship the Python or Go SDK. Each SDK port gets a
|
||||||
|
JCS conformance test against a corpus of envelopes.
|
||||||
|
|
||||||
|
**Implementation cost**: small per-language — a conformance fixture
|
||||||
|
file and a unit test.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Sprint 7 (this session) — what landed vs deferred
|
||||||
|
|
||||||
|
**Landed in code** (not yet deployed):
|
||||||
|
- `packages/db/migrations/0028_message_queue_idempotency_fields.sql` adds
|
||||||
|
nullable `client_message_id` and `request_fingerprint` columns to
|
||||||
|
`mesh.message_queue` (additive, online-safe).
|
||||||
|
- `apps/broker/src/broker.ts` — `queueMessage` and `drainForMember`
|
||||||
|
thread the new columns through.
|
||||||
|
- `apps/broker/src/index.ts` — `handleSend` picks them up from the
|
||||||
|
daemon's wire envelope; outbound push echoes them back so receiving
|
||||||
|
daemons can dedupe.
|
||||||
|
- `apps/broker/src/types.ts` — `WSPushMessage` declares the optional
|
||||||
|
fields.
|
||||||
|
|
||||||
|
**Deployment plan (not auto-applied)**:
|
||||||
|
1. Apply migration against prod DB (the broker's filename-tracked
|
||||||
|
migrator picks up `0028_*.sql` on next startup).
|
||||||
|
2. Deploy the broker with the code changes via Coolify.
|
||||||
|
3. Verify a daemon-originated send shows non-null `client_message_id`
|
||||||
|
in `mesh.message_queue` afterwards.
|
||||||
|
|
||||||
|
**Still deferred** (full broker hardening):
|
||||||
|
- `mesh.client_message_dedupe` table with `request_fingerprint BYTEA`
|
||||||
|
and atomic accept transaction (spec §4.7).
|
||||||
|
- Feature-bit advertisement on hello_ack of
|
||||||
|
`client_message_id_dedupe` v1, with daemon-side enforcement (spec §15).
|
||||||
|
- Partial unique index on `(mesh_id, client_message_id) WHERE client_message_id IS NOT NULL`.
|
||||||
|
|
||||||
|
These sit behind the same trigger as the followups above: do them when
real users hit operational corners that the landed changes don't cover.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## How to use this document
|
||||||
|
|
||||||
|
When picking up post-v0.9.0 work on the daemon:
|
||||||
|
|
||||||
|
1. Check whether any of the "promote when" triggers above have fired.
|
||||||
|
2. If yes, consult the corresponding versioned spec (v6/v7/v8/v9/v10)
|
||||||
|
for the full proposed change.
|
||||||
|
3. Implement the lift, update `daemon-spec-v0.9.0.md` to reflect the
|
||||||
|
merge, and remove the item from this followups list.
|
||||||
|
|
||||||
|
The versioned specs live in `.artifacts/specs/` indefinitely as a
|
||||||
|
review-trail audit.
|
||||||
680
.artifacts/shipped/2026-05-03-daemon-spec-v0.9.0.md
Normal file
@@ -0,0 +1,680 @@
|
|||||||
|
# `claudemesh daemon` — Implementation spec v0.9.0
|
||||||
|
|
||||||
|
> **Implementation target.** Locked from the v1–v10 codex-reviewed spec
|
||||||
|
> series. This document is what we build for v0.9.0 of the daemon.
|
||||||
|
>
|
||||||
|
> **Base**: v6 (the round where the architecture passed codex's
|
||||||
|
> structural review — request_fingerprint, dedupe table, atomicity
|
||||||
|
> contract, feature-bit negotiation, key archive format).
|
||||||
|
>
|
||||||
|
> **Pulled in from v7–v9**: six cheap, load-bearing fixes that close
|
||||||
|
> real v0.9.0-era bugs (not future-scale concerns):
|
||||||
|
>
|
||||||
|
> 1. `aborted` outbox status + audit columns (operator recovery without
|
||||||
|
> destroying audit trail) — v7 §4.5.2
|
||||||
|
> 2. `BEGIN IMMEDIATE` for daemon-local SQLite serialization (v6's
|
||||||
|
> `SELECT FOR UPDATE` is invalid SQLite anyway) — v7 §4.5.1
|
||||||
|
> 3. Daemon-local IPC duplicate lookup table over outbox states ×
|
||||||
|
> fingerprint match/mismatch — v8 §4.5.1
|
||||||
|
> 4. Phase B1/B2/B3 broker validation split (the concept; we don't need
|
||||||
|
> the elaborate phase tables) — v7 §4.6.2
|
||||||
|
> 5. Side-effect inventory (in-tx vs async) as an implementation comment
|
||||||
|
> block — v8 §4.7.1
|
||||||
|
> 6. Two-layer ID model wording: daemon-consumed iff outbox row,
|
||||||
|
> broker-consumed iff dedupe row — v9 §4.1
|
||||||
|
>
|
||||||
|
> **Deferred to broker-hardening followups** (see
|
||||||
|
> `2026-05-03-daemon-spec-broker-hardening-followups.md` for the full list and
|
||||||
|
> rationale): B0 dedupe fast-path, Lua-scripted idempotent rate
|
||||||
|
> limiter, in-tx mention_index, 4011/4012 close-code split, per-OS
|
||||||
|
> fingerprint precedence table, request-fingerprint schema-v2 in
|
||||||
|
> feature negotiation. These are real improvements but not v0.9.0
|
||||||
|
> blockers; they land as the broker matures.
|
||||||
|
>
|
||||||
|
> **Intent §0 unchanged from v2.**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 0. Intent — unchanged, see v2 §0
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Process model — unchanged from v3 §1 / v2 §1
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 2. Identity — unchanged from v5 §2
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 3. IPC surface — unchanged from v4 §3
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 4. Delivery contract — at-least-once with **request-fingerprinted** dedupe
|
||||||
|
|
||||||
|
Codex r5: dedupe must compare the *whole request shape*, not just
|
||||||
|
`(mesh, client_message_id)`. Otherwise a caller who reuses an idempotency
|
||||||
|
key with a different destination or body silently drops the new send and
|
||||||
|
gets the old send's metadata back.
|
||||||
|
|
||||||
|
### 4.1 The contract (precise)
|
||||||
|
|
||||||
|
> **Two-layer ID rule** (from v9): a `client_message_id` is
|
||||||
|
> **daemon-consumed** iff an outbox row exists for it; **broker-consumed**
|
||||||
|
> iff a dedupe row exists in `mesh.client_message_dedupe`. The two layers
|
||||||
|
> are independent: a daemon-consumed id may or may not be broker-consumed
|
||||||
|
> (depending on whether the send reached broker commit). In v0.9.0 there
|
||||||
|
> are no daemon-bypass clients, so for practical purposes "daemon-consumed"
|
||||||
|
> is the operative rule.
|
||||||
|
>
|
||||||
|
> **Local guarantee**: each successful `POST /v1/send` returns a stable
|
||||||
|
> `client_message_id`. The send is durably persisted to `outbox.db` before
|
||||||
|
> the response returns. The daemon enforces request-fingerprint
|
||||||
|
> idempotency at the IPC layer (§4.5).
|
||||||
|
>
|
||||||
|
> **Local audit guarantee**: a `client_message_id` once written to
|
||||||
|
> `outbox.db` is never released. Operator recovery via `requeue` always
|
||||||
|
> mints a fresh id; the old row stays in `aborted` for audit. There is
|
||||||
|
> no daemon-side path to free a used id.
|
||||||
|
>
|
||||||
|
> **Broker guarantee**: the broker maintains a dedupe record per accepted
|
||||||
|
> `(mesh_id, client_message_id)` in `mesh.client_message_dedupe`. Each
|
||||||
|
> dedupe record carries a canonical `request_fingerprint`. Retries with
|
||||||
|
> the same id AND matching fingerprint collapse to the original
|
||||||
|
> `broker_message_id`. Retries with mismatched fingerprint return
|
||||||
|
> `409 idempotency_key_reused` and do **not** create a new message.
|
||||||
|
>
|
||||||
|
> **Atomicity guarantee**: dedupe row insertion, message row insertion,
|
||||||
|
> and history row insertion happen in one broker DB transaction. Either
|
||||||
|
> all land, or none do. No orphan dedupe rows.
|
||||||
|
>
|
||||||
|
> **End-to-end guarantee**: at-least-once delivery, with
|
||||||
|
> `client_message_id` propagated to receivers' inboxes.
|
||||||
|
|
||||||
|
### 4.2 Daemon-supplied `client_message_id` — unchanged from v3 §4.2
|
||||||
|
|
||||||
|
### 4.3 Broker schema — request fingerprint added (v6)
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE mesh.client_message_dedupe (
|
||||||
|
mesh_id UUID NOT NULL REFERENCES mesh.mesh(id) ON DELETE CASCADE,
|
||||||
|
client_message_id TEXT NOT NULL,
|
||||||
|
|
||||||
|
-- The original accepted message; FK NOT enforced because the message row
|
||||||
|
-- may be GC'd by retention sweeps before the dedupe row expires.
|
||||||
|
broker_message_id UUID NOT NULL,
|
||||||
|
|
||||||
|
-- Canonical fingerprint of the original request. Recomputed on every
|
||||||
|
-- duplicate retry; mismatch → 409 idempotency_key_reused. Schema in §4.4.
|
||||||
|
request_fingerprint BYTEA NOT NULL, -- 32-byte sha256
|
||||||
|
|
||||||
|
destination_kind TEXT NOT NULL CHECK(destination_kind IN ('topic','dm','queue')),
|
||||||
|
destination_ref TEXT NOT NULL,
|
||||||
|
first_seen_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||||
|
expires_at TIMESTAMPTZ, -- NULL = `permanent` mode
|
||||||
|
history_available BOOLEAN NOT NULL DEFAULT TRUE, -- flipped FALSE when message row GC'd
|
||||||
|
|
||||||
|
PRIMARY KEY (mesh_id, client_message_id)
|
||||||
|
);
|
||||||
|
|
||||||
|
CREATE INDEX client_message_dedupe_expires_idx
|
||||||
|
ON mesh.client_message_dedupe(expires_at)
|
||||||
|
WHERE expires_at IS NOT NULL;
|
||||||
|
|
||||||
|
ALTER TABLE mesh.topic_message ADD COLUMN client_message_id TEXT;
|
||||||
|
ALTER TABLE mesh.message_queue ADD COLUMN client_message_id TEXT;
|
||||||
|
```
|
||||||
|
|
||||||
|
**`status` column dropped (codex r5)**. Rejected requests do **not**
|
||||||
|
consume idempotency keys. Rationale below in §4.6.
|
||||||
|
|
||||||
|
### 4.4 Request fingerprint — canonical form (NEW v6)
|
||||||
|
|
||||||
|
The fingerprint covers everything that makes a send semantically distinct.
|
||||||
|
A retry must reproduce the same fingerprint bit-for-bit; anything else is
|
||||||
|
a different send and must not be collapsed.
|
||||||
|
|
||||||
|
```
|
||||||
|
request_fingerprint = sha256(
|
||||||
|
envelope_version || 0x00 ||
|
||||||
|
destination_kind || 0x00 ||
|
||||||
|
destination_ref || 0x00 ||
|
||||||
|
reply_to_id_or_empty || 0x00 ||
|
||||||
|
priority || 0x00 ||
|
||||||
|
meta_canonical_json || 0x00 ||
|
||||||
|
body_hash
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
Where:
|
||||||
|
- `envelope_version`: integer string (e.g. `"1"`). Bumps when the envelope
|
||||||
|
shape changes.
|
||||||
|
- `destination_kind`: `topic`, `dm`, or `queue`.
|
||||||
|
- `destination_ref`: topic name, recipient ed25519 pubkey hex, or queue id.
|
||||||
|
- `reply_to_id_or_empty`: original `broker_message_id` or empty string.
|
||||||
|
- `priority`: `now`, `next`, or `low`.
|
||||||
|
- `meta_canonical_json`: the `meta` field, serialized with sorted keys,
|
||||||
|
no whitespace, escape-canonical (RFC 8785 JCS). Empty meta = empty string.
|
||||||
|
- `body_hash`: sha256(body bytes), hex.
|
||||||
|
|
||||||
|
The fingerprint is computed:
|
||||||
|
1. **Daemon-side** before durable outbox persistence — stored as
|
||||||
|
`outbox.request_fingerprint` (NEW column) so retries always produce
|
||||||
|
the same fingerprint regardless of caller behavior.
|
||||||
|
2. **Broker-side** on first receipt — stored in
|
||||||
|
`client_message_dedupe.request_fingerprint`.
|
||||||
|
3. **Broker-side** on every duplicate retry — recomputed and compared
|
||||||
|
byte-equal to the stored value.
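
The field layout above can be sketched in TypeScript as follows. Several things here are assumptions: `canonicalJson` is a simplified sorted-key serializer and not a vetted RFC 8785 implementation, the `SendRequest` field names are invented for the example, and both a missing `meta` and an empty object are treated as "empty meta".

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Simplified canonical JSON: sorted keys, no whitespace.
// An assumption-level stand-in for a real RFC 8785 (JCS) library.
function canonicalJson(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  const keys = Object.keys(value).sort();
  const members = keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k] as Json)}`);
  return `{${members.join(",")}}`;
}

// Hypothetical request shape; field names are illustrative.
type SendRequest = {
  envelopeVersion: number;
  destinationKind: "topic" | "dm" | "queue";
  destinationRef: string;
  replyToId: string | null;
  priority: "now" | "next" | "low";
  meta: { [key: string]: Json } | null;
  body: Uint8Array;
};

// sha256 over the fields in spec order, separated by 0x00 bytes.
function requestFingerprint(req: SendRequest): Buffer {
  const bodyHashHex = createHash("sha256").update(req.body).digest("hex");
  const metaEmpty = req.meta === null || Object.keys(req.meta).length === 0;
  const fields = [
    String(req.envelopeVersion),
    req.destinationKind,
    req.destinationRef,
    req.replyToId ?? "",
    req.priority,
    metaEmpty ? "" : canonicalJson(req.meta),
    bodyHashHex,
  ];
  return createHash("sha256").update(fields.join("\u0000"), "utf8").digest();
}
```

Because `meta` is canonicalized before hashing, two requests whose `meta` objects differ only in key order produce the same 32-byte fingerprint, while any change to destination, priority, or body produces a different one.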
|
||||||
|
|
||||||
|
If the daemon and broker disagree on the canonical form (e.g. JCS
|
||||||
|
implementation drift), the broker emits
|
||||||
|
`cm_broker_dedupe_fingerprint_mismatch_total{client_id, mesh_id}` and
|
||||||
|
returns `409 idempotency_key_reused` with a body that includes the
|
||||||
|
broker's fingerprint hex for debugging. Daemons that see this should
|
||||||
|
log it loudly and stop retrying that outbox row (it goes to `dead`).
|
||||||
|
|
||||||
|
### 4.5 Daemon-local idempotency at the IPC layer (from v8)
|
||||||
|
|
||||||
|
The daemon enforces fingerprint idempotency **before** the request hits
|
||||||
|
`outbox.db` so a caller bug never creates duplicate-key/mismatch-payload
|
||||||
|
state at all.
|
||||||
|
|
||||||
|
#### 4.5.1 IPC accept algorithm
|
||||||
|
|
||||||
|
On `POST /v1/send`:
|
||||||
|
|
||||||
|
1. Validate request envelope (auth, schema, size limits, destination
|
||||||
|
resolvable). Failures here return `4xx` immediately. **No outbox
|
||||||
|
row is written; the `client_message_id` is not consumed.**
|
||||||
|
2. Compute `request_fingerprint` (§4.4).
|
||||||
|
3. Open a SQLite transaction with `BEGIN IMMEDIATE` so a concurrent IPC
|
||||||
|
accept on the same id serializes against this one. `BEGIN IMMEDIATE`
|
||||||
|
acquires the RESERVED lock at transaction start; SQLite has no
|
||||||
|
row-level lock and `SELECT FOR UPDATE` is not supported.
|
||||||
|
4. `SELECT id, request_fingerprint, status, broker_message_id,
|
||||||
|
last_error FROM outbox WHERE client_message_id = ?`.
|
||||||
|
5. Apply the lookup table below. For the "(no row)" case, INSERT inside
|
||||||
|
the same transaction.
|
||||||
|
6. COMMIT.
|
||||||
|
|
||||||
|
| Existing row state | Fingerprint | Daemon response |
|
||||||
|
|---|---|---|
|
||||||
|
| (no row) | — | INSERT new outbox row `pending`; return `202 accepted, queued` |
|
||||||
|
| `pending` | match | Return `202 accepted, queued`. No mutation |
|
||||||
|
| `pending` | mismatch | Return `409`, `conflict: "outbox_pending_fingerprint_mismatch"` |
|
||||||
|
| `inflight` | match | Return `202 accepted, inflight`. No mutation |
|
||||||
|
| `inflight` | mismatch | Return `409`, `conflict: "outbox_inflight_fingerprint_mismatch"` |
|
||||||
|
| `done` | match | Return `200 ok, duplicate: true, broker_message_id, history_id`. No broker call |
|
||||||
|
| `done` | mismatch | Return `409`, `conflict: "outbox_done_fingerprint_mismatch", broker_message_id` |
|
||||||
|
| `dead` | match | Return `409`, `conflict: "outbox_dead_fingerprint_match", reason: "<last_error>"` |
|
||||||
|
| `dead` | mismatch | Return `409`, `conflict: "outbox_dead_fingerprint_mismatch"` |
|
||||||
|
| `aborted` | match | Return `409`, `conflict: "outbox_aborted_fingerprint_match"`. Operator-retired id, never reusable |
|
||||||
|
| `aborted` | mismatch | Return `409`, `conflict: "outbox_aborted_fingerprint_mismatch"` |
|
||||||
|
|
||||||
|
Every `409` carries the daemon's `request_fingerprint` (8-byte hex
|
||||||
|
prefix) for client/server canonical-form-drift debugging. A
|
||||||
|
`client_message_id` written to `outbox.db` is permanently bound to that
|
||||||
|
row's lifecycle — the only "free" state is "no row exists".
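
The lookup table can be expressed as one pure decision function, which makes it easy to unit-test separately from SQLite. This is a sketch: the type and function names are hypothetical, `history_id` is omitted for brevity, and the caller is assumed to run it inside the `BEGIN IMMEDIATE` transaction from step 3.

```typescript
type OutboxStatus = "pending" | "inflight" | "done" | "dead" | "aborted";

type ExistingRow = {
  status: OutboxStatus;
  brokerMessageId: string | null;
  lastError: string | null;
};

type Accepted = { http: 202; state: "queued" | "inflight"; insert: boolean };
type Duplicate = { http: 200; duplicate: true; brokerMessageId: string | null };
type Conflict = { http: 409; conflict: string; reason?: string; brokerMessageId?: string };
type IpcDecision = Accepted | Duplicate | Conflict;

// Maps (existing row state, fingerprint match) to the daemon response.
function decideIpcAccept(row: ExistingRow | null, fingerprintMatches: boolean): IpcDecision {
  // "(no row)": the only case that inserts a new pending outbox row.
  if (row === null) return { http: 202, state: "queued", insert: true };
  if (fingerprintMatches) {
    if (row.status === "pending") return { http: 202, state: "queued", insert: false };
    if (row.status === "inflight") return { http: 202, state: "inflight", insert: false };
    if (row.status === "done") {
      return { http: 200, duplicate: true, brokerMessageId: row.brokerMessageId };
    }
  }
  // Every remaining combination is a 409 with a state-specific conflict tag.
  const suffix = fingerprintMatches ? "match" : "mismatch";
  const decision: Conflict = {
    http: 409,
    conflict: `outbox_${row.status}_fingerprint_${suffix}`,
  };
  if (row.status === "dead" && fingerprintMatches && row.lastError !== null) {
    decision.reason = row.lastError;
  }
  if (row.status === "done" && row.brokerMessageId !== null) {
    decision.brokerMessageId = row.brokerMessageId;
  }
  return decision;
}
```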
|
||||||
|
|
||||||
|
#### 4.5.2 Outbox table
|
||||||
|
|
||||||
|
```sql
|
||||||
|
CREATE TABLE outbox (
|
||||||
|
id TEXT PRIMARY KEY,
|
||||||
|
client_message_id TEXT NOT NULL UNIQUE,
|
||||||
|
request_fingerprint BLOB NOT NULL, -- 32 bytes
|
||||||
|
payload BLOB NOT NULL,
|
||||||
|
enqueued_at INTEGER NOT NULL,
|
||||||
|
attempts INTEGER DEFAULT 0,
|
||||||
|
next_attempt_at INTEGER NOT NULL,
|
||||||
|
status TEXT CHECK(status IN
|
||||||
|
('pending','inflight','done','dead','aborted')),
|
||||||
|
last_error TEXT,
|
||||||
|
delivered_at INTEGER,
|
||||||
|
broker_message_id TEXT,
|
||||||
|
aborted_at INTEGER, -- v7
|
||||||
|
aborted_by TEXT, -- v7: operator/auto
|
||||||
|
superseded_by TEXT -- v7: id of requeue successor
|
||||||
|
);
|
||||||
|
CREATE INDEX outbox_pending ON outbox(status, next_attempt_at);
|
||||||
|
CREATE INDEX outbox_aborted ON outbox(status, aborted_at) WHERE status = 'aborted';
|
||||||
|
```
|
||||||
|
|
||||||
|
`aborted_at` / `aborted_by` / `superseded_by` give operators a clear
|
||||||
|
audit trail. `superseded_by` lets `outbox inspect` show the chain when
|
||||||
|
a row is requeued multiple times. `request_fingerprint` is computed
|
||||||
|
once at IPC accept time and frozen for the row's lifecycle.
|
||||||
|
|
||||||
|
#### 4.5.3 Operator recovery via `requeue`
|
||||||
|
|
||||||
|
```
|
||||||
|
claudemesh daemon outbox requeue --id <outbox_row_id>
|
||||||
|
[--new-client-id <id> | --auto]
|
||||||
|
[--patch-payload <path>]
|
||||||
|
```
|
||||||
|
|
||||||
|
Atomically (single SQLite transaction):
|
||||||
|
1. Marks the existing row `aborted`, sets `aborted_at = now`,
|
||||||
|
`aborted_by = "operator"`. Row is **never deleted** — audit trail
|
||||||
|
permanent.
|
||||||
|
2. Mints a fresh `client_message_id` (caller-supplied or auto-ulid).
|
||||||
|
3. Inserts a new outbox row `pending` with the fresh id and the same
|
||||||
|
payload (or patched if `--patch-payload`).
|
||||||
|
4. Sets `superseded_by = <new_row_id>` on the old row.
|
||||||
|
|
||||||
|
The old `client_message_id` is permanently dead. There is no path for
|
||||||
|
an id to become free again.
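
One way to keep the four steps atomic is to compute both row changes first and then write them in a single transaction. The sketch below covers only the planning step; `planRequeue` and the row field names are hypothetical, and restricting requeue to `dead` or `pending` rows follows the phase D wording in §4.6.1.

```typescript
type OutboxRow = {
  id: string;
  clientMessageId: string;
  payload: string;
  status: "pending" | "inflight" | "done" | "dead" | "aborted";
  abortedAt: number | null;
  abortedBy: string | null;
  supersededBy: string | null;
};

// Computes the retired row and its successor. The caller is expected to
// persist both inside one SQLite transaction.
function planRequeue(
  old: OutboxRow,
  newRowId: string,
  newClientMessageId: string,
  nowMs: number,
  patchedPayload?: string,
): { retired: OutboxRow; successor: OutboxRow } {
  if (old.status !== "dead" && old.status !== "pending") {
    throw new Error(`cannot requeue a row in status ${old.status}`);
  }
  if (newClientMessageId === old.clientMessageId) {
    throw new Error("requeue must mint a fresh client_message_id");
  }
  const retired: OutboxRow = {
    ...old,
    status: "aborted", // never deleted, kept for audit
    abortedAt: nowMs,
    abortedBy: "operator",
    supersededBy: newRowId,
  };
  const successor: OutboxRow = {
    id: newRowId,
    clientMessageId: newClientMessageId,
    payload: patchedPayload ?? old.payload,
    status: "pending",
    abortedAt: null,
    abortedBy: null,
    supersededBy: null,
  };
  return { retired, successor };
}
```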
|
||||||
|
|
||||||
|
### 4.5b Broker duplicate response — three cases
|
||||||
|
|
||||||
|
| Case | HTTP/WS code | Body |
|
||||||
|
|---|---|---|
|
||||||
|
| First insert | `201 created` | `{ broker_message_id, client_message_id, history_id, duplicate: false }` |
|
||||||
|
| Duplicate, fingerprint match | `200 ok` | `{ broker_message_id, client_message_id, history_id, duplicate: true, history_available, first_seen_at }` |
|
||||||
|
| Duplicate, fingerprint mismatch | `409 idempotency_key_reused` | `{ client_message_id, conflict: "request_fingerprint_mismatch", broker_fingerprint_prefix: "ab12cd34..." }` (first 8 bytes hex) |
|
||||||
|
|
||||||
|
Daemon outcomes:
|
||||||
|
- `201` → mark outbox row `done`, store `broker_message_id`.
|
||||||
|
- `200 duplicate` with `history_available: true` → mark `done`, log INFO.
|
||||||
|
- `200 duplicate` with `history_available: false` → mark `done`, log WARN.
|
||||||
|
- `409 idempotency_key_reused` → mark outbox row `dead`. Operator runs
|
||||||
|
`outbox requeue` (§4.5.3); old id stays `aborted`, new id is fresh.
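
The daemon outcomes listed above reduce to a small mapping from broker reply to outbox transition. In this sketch the type names are hypothetical, storing `broker_message_id` on a `200 duplicate` is an assumption, and the `error` log level for `409` reflects the "log it loudly" guidance in §4.4.

```typescript
type BrokerReply =
  | { code: 201; brokerMessageId: string }
  | { code: 200; brokerMessageId: string; historyAvailable: boolean }
  | { code: 409 };

type OutboxTransition = {
  status: "done" | "dead";
  brokerMessageId: string | null;
  log: "none" | "info" | "warn" | "error";
};

// Translates a broker reply into the outbox row update the daemon applies.
function applyBrokerReply(reply: BrokerReply): OutboxTransition {
  switch (reply.code) {
    case 201:
      return { status: "done", brokerMessageId: reply.brokerMessageId, log: "none" };
    case 200:
      // Duplicate: the message was already accepted earlier.
      return {
        status: "done",
        brokerMessageId: reply.brokerMessageId,
        log: reply.historyAvailable ? "info" : "warn",
      };
    case 409:
      // idempotency_key_reused: stop retrying; operator must requeue.
      return { status: "dead", brokerMessageId: null, log: "error" };
  }
}
```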
|
||||||
|
|
||||||
|
### 4.6 Rejected-request semantics — id consumed iff outbox row written
|
||||||
|
|
||||||
|
> **Rule**: a `client_message_id` is daemon-consumed iff the daemon
|
||||||
|
> writes an outbox row. Anything that fails before outbox insertion
|
||||||
|
> (auth, schema, size, destination not resolvable) leaves the id
|
||||||
|
> untouched and freely reusable.
|
||||||
|
|
||||||
|
#### 4.6.1 Daemon-side rejection phasing
|
||||||
|
|
||||||
|
| Phase | When daemon rejects | Outbox row? | Caller may reuse id? |
|
||||||
|
|---|---|---|---|
|
||||||
|
| **A. IPC validation** (auth, schema, size, destination resolvable) | Before §4.5.1 step 3 | No | Yes — id never consumed |
|
||||||
|
| **B. Outbox stored, broker network/transient failure** | After IPC accept, broker `5xx` or timeout | `pending` → retried | N/A — daemon owns retries |
|
||||||
|
| **C. Outbox stored, broker permanent rejection** | Broker returns `4xx` after IPC accept | `dead` | No — rotate via `requeue` |
|
||||||
|
| **D. Operator retirement** | Operator runs `requeue` on `dead` or `pending` row | `aborted` (audit) + new row with fresh id | Old id NEVER reusable; new id is fresh |
|
||||||
|
|
||||||
|
#### 4.6.2 Broker-side rejection phasing (B1 / B2 / B3)
|
||||||
|
|
||||||
|
The broker validates in three phases relative to dedupe-row insertion:
|
||||||
|
|
||||||
|
| Phase | Validation | Side effects | Result for direct broker callers (none in v0.9.0) |
|
||||||
|
|---|---|---|---|
|
||||||
|
| **B1. Pre-dedupe-claim** | Auth (mesh membership), schema, size, mesh exists, member exists, destination kind valid, payload bytes ≤ `max_payload.inline_bytes`, rate limit not exceeded | None | `4xx`. No dedupe row. Direct broker caller may retry with same id |
|
||||||
|
| **B2. Post-dedupe-claim** (in-tx) | destination_ref existence (topic exists, member subscribed, etc.) | INSERT into dedupe rolled back | `4xx`, transaction rolled back, no dedupe row remains. Direct broker caller may retry with same id |
|
||||||
|
| **B3. Accepted** | All side effects commit atomically | Dedupe row, message row, history row, delivery_queue rows | `201` with `broker_message_id` |
|
||||||
|
|
||||||
|
**Daemon-mediated callers (the only path in v0.9.0)** see only the
|
||||||
|
daemon-layer rules of §4.6.1: any broker `4xx` after IPC accept lands
|
||||||
|
the outbox row in `dead`. Daemon-mediated callers MUST rotate via
|
||||||
|
`requeue` (§4.5.3); the daemon-consumed id is never reusable
|
||||||
|
regardless of whether the broker layer sees a dedupe row. The "may
|
||||||
|
retry with same id" wording above describes daemon-bypass (direct broker) callers
|
||||||
|
only, which v0.9.0 does not have.
|
||||||
|
|
||||||
|
**Critical guarantee**: there is no broker code path where a permanent
|
||||||
|
4xx leaves a dedupe row behind. Either the request committed and a
|
||||||
|
dedupe row exists (B3), or it didn't and no dedupe row exists (B1, B2).
|
||||||
|
"Dedupe row exists" is the unambiguous signal of "id consumed at the
|
||||||
|
broker layer."
|
||||||
|
|
||||||
|
If the broker decides post-commit that an accepted message is invalid
|
||||||
|
(async content-policy job), that's NOT a permanent rejection — it's a
|
||||||
|
follow-up moderation event that operates on the `broker_message_id`,
|
||||||
|
not on the dedupe key.
|
||||||
|
|
||||||
|
Net result: `client_message_dedupe` rows only exist when the broker
|
||||||
|
**successfully** accepted a message and committed it. The single source
|
||||||
|
of truth for "was this idempotency key consumed?" is the existence of
|
||||||
|
the dedupe row. No status enum, no ambiguous states.
|
||||||
|
|
||||||
|
### 4.7 Broker atomicity contract
|
||||||
|
|
||||||
|
#### 4.7.1 Side-effect inventory
|
||||||
|
|
||||||
|
Every successful broker accept atomically commits these durable state
|
||||||
|
changes in **one transaction**:
|
||||||
|
|
||||||
|
| Effect | Table | Why in-tx |
|
||||||
|
|---|---|---|
|
||||||
|
| Dedupe record | `mesh.client_message_dedupe` | Idempotency authority |
|
||||||
|
| Message body | `mesh.topic_message` / `mesh.message_queue` | Authoritative store |
|
||||||
|
| History row | `mesh.message_history` | Replay log; lost-on-rollback breaks ordered replay |
|
||||||
|
| Fan-out work | `mesh.delivery_queue` | Each recipient must see exactly committed messages |
|
||||||
|
|
||||||
|
**Outside the transaction** (non-authoritative or rebuildable):
|
||||||
|
- WS push to live subscribers — best-effort live notifications.
|
||||||
|
- Webhook fan-out — async via `delivery_queue` workers.
|
||||||
|
- Rate-limit counters — telemetry only; authority is the external
|
||||||
|
limiter checked in B1.
|
||||||
|
- Audit log entries — append-only stream; rebuildable from history.
|
||||||
|
- Search/FTS index updates — async via outbox-pattern worker.
|
||||||
|
- Mention index updates — async (deferred in-tx promotion to followups
|
||||||
|
doc).
|
||||||
|
- Metrics — Prometheus, pull-based.
|
||||||
|
|
||||||
|
If any in-transaction insert fails, the transaction rolls back
|
||||||
|
completely. The accept is `5xx` to daemon; daemon retries. No partial
|
||||||
|
state.
|
||||||
|
|
||||||
|
#### 4.7.2 Pseudocode
|
||||||
|
|
||||||
|
```sql
|
||||||
|
-- Pre-generate broker_message_id (ulid) in code, pass in.
|
||||||
|
BEGIN;
|
||||||
|
|
||||||
|
-- Step 1: try to claim the idempotency key.
|
||||||
|
INSERT INTO mesh.client_message_dedupe
|
||||||
|
(mesh_id, client_message_id, broker_message_id, request_fingerprint,
|
||||||
|
destination_kind, destination_ref, expires_at)
|
||||||
|
VALUES ($mesh_id, $client_id, $msg_id, $fingerprint,
|
||||||
|
$dest_kind, $dest_ref, $expires_at)
|
||||||
|
ON CONFLICT (mesh_id, client_message_id) DO NOTHING;
|
||||||
|
|
||||||
|
-- Step 2: inspect what's actually there now (ours or someone else's).
|
||||||
|
SELECT broker_message_id, request_fingerprint, destination_kind,
|
||||||
|
destination_ref, history_available, first_seen_at
|
||||||
|
FROM mesh.client_message_dedupe
|
||||||
|
WHERE mesh_id = $mesh_id AND client_message_id = $client_id
|
||||||
|
FOR SHARE;
|
||||||
|
|
||||||
|
-- Branch:
|
||||||
|
-- row.broker_message_id == $msg_id → first insert; continue.
|
||||||
|
-- row.broker_message_id != $msg_id → duplicate. Compare fingerprints:
|
||||||
|
-- match → ROLLBACK; return 200 duplicate.
|
||||||
|
-- mismatch → ROLLBACK; return 409 idempotency_key_reused.
|
||||||
|
|
||||||
|
-- Step 3: validate Phase B2 (destination_ref existence — topic exists,
|
||||||
|
-- member subscribed, etc.). If B2 fails → ROLLBACK; return 4xx (no
|
||||||
|
-- dedupe row remains).
|
||||||
|
|
||||||
|
-- Step 4: insert in-tx side effects (§4.7.1).
|
||||||
|
INSERT INTO mesh.topic_message (id, mesh_id, client_message_id, body, ...)
|
||||||
|
VALUES ($msg_id, $mesh_id, $client_id, ...);
|
||||||
|
|
||||||
|
INSERT INTO mesh.message_history (broker_message_id, mesh_id, ...)
|
||||||
|
VALUES ($msg_id, $mesh_id, ...);
|
||||||
|
|
||||||
|
INSERT INTO mesh.delivery_queue (broker_message_id, recipient_pubkey, ...)
|
||||||
|
SELECT $msg_id, member_pubkey, ...
|
||||||
|
FROM mesh.topic_subscription
|
||||||
|
WHERE topic = $dest_ref AND mesh_id = $mesh_id;
|
||||||
|
|
||||||
|
COMMIT;
|
||||||
|
```
|
||||||
|
|
||||||
|
The branch logic determines the response shape (`201` / `200 duplicate`
|
||||||
|
/ `409 idempotency_key_reused`) before COMMIT. The duplicate and 409
|
||||||
|
branches always ROLLBACK because nothing else needs to commit.
|
||||||
|
`SELECT … FOR SHARE` blocks concurrent writers from updating or deleting the same
|
||||||
|
dedupe row mid-transaction.
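
The branch step above can be sketched in TypeScript as a pure function. This is only an illustration under stated assumptions: the names `DedupeRow`, `AcceptOutcome`, and `classifyAccept` are hypothetical, and the fingerprint is assumed to be comparable as a string (for example hex-encoded).

```typescript
// Hypothetical sketch of the §4.7.2 branch. Names are illustrative,
// not the broker's real code. The row is what Step 2 read back.
interface DedupeRow {
  brokerMessageId: string;
  requestFingerprint: string; // assumed string-comparable (e.g. hex)
}

type AcceptOutcome =
  | { kind: "first_insert" } // continue to Step 3 and Step 4, then COMMIT
  | { kind: "duplicate"; status: 200 } // ROLLBACK
  | { kind: "idempotency_key_reused"; status: 409 }; // ROLLBACK

function classifyAccept(
  row: DedupeRow,
  ourMessageId: string,
  ourFingerprint: string,
): AcceptOutcome {
  // Our pre-generated id is in the row: our INSERT won the race.
  if (row.brokerMessageId === ourMessageId) {
    return { kind: "first_insert" };
  }
  // Someone else's row: same request means a harmless retry.
  if (row.requestFingerprint === ourFingerprint) {
    return { kind: "duplicate", status: 200 };
  }
  // Same client_message_id, different request body.
  return { kind: "idempotency_key_reused", status: 409 };
}
```

Keeping the decision free of I/O makes the three response shapes easy to unit-test without a database.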
|
||||||
|
|
||||||
|
#### 4.7.3 Failure modes
|
||||||
|
|
||||||
|
- Crash before `COMMIT`: all rows roll back. Next daemon retry inserts
  cleanly.
- Crash after `COMMIT` but before WS ACK: dedupe row exists. Daemon
  retries → fingerprint matches → `200 duplicate`. Net: exactly one
  broker-accepted row, one daemon `done` transition.
- Constraint violation on message row insert: rolls back the whole tx.
  `5xx` to daemon. Same fingerprint reproduces; daemon eventually
  marks `dead`. No orphan dedupe row.
|
||||||
|
|
||||||
|
A nightly orphan-check job (counter `cm_broker_dedupe_orphan_check_total`)
validates that every `client_message_dedupe` row has a matching
`topic_message` / `message_queue` row OR that the matching row has been
retention-pruned (`history_available = FALSE`). Inconsistencies are
logged as `cm_broker_dedupe_orphan_found{mesh_id}` for human review.
|
||||||
|
|
||||||
|
### 4.8 Outbox schema
|
||||||
|
|
||||||
|
The authoritative outbox schema for v0.9.0 is in §4.5.2 (includes
|
||||||
|
`aborted` status and audit columns from the v7 pull). `request_fingerprint`
|
||||||
|
is computed at IPC accept time and frozen for the row's lifecycle —
|
||||||
|
the daemon never recomputes from `payload` post-enqueue (would produce
|
||||||
|
drift if envelope_version changes between daemon runs).
|
||||||
|
|
||||||
|
### 4.9 Outbox max-age math — bounded (v6)
|
||||||
|
|
||||||
|
Codex r5: the v5 formula `(dedupe_retention_days * 24) - 24h_margin`
breaks at `dedupe_retention_days = 1` (yields zero) and goes negative
for anything below that.
|
||||||
|
|
||||||
|
v6 formula and bounds:
|
||||||
|
|
||||||
|
- **Minimum supported broker dedupe retention**: 3 days. Daemon refuses
  to start if broker advertises `dedupe_retention_days < 3` (treats it
  as `feature_param_below_floor` per §15.5, exits 4010).
|
||||||
|
- **Daemon `max_age_hours` derivation**:
  - `permanent` mode → daemon uses config default (168h = 7d), cap 720h
    (30d).
  - `retention_scoped` mode → daemon
    `max_age_hours = max(72, (dedupe_retention_days * 24) - safety_margin_hours)`
    where
    `safety_margin_hours = max(24, ceil(dedupe_retention_days * 0.1 * 24))`.
    For `dedupe_retention_days=3` this gives `max(72, 72-24) = 72h`.
    For 30 days: `max(72, 720-72) = 648h`. For 365 days:
    `max(72, 8760-876) = 7884h`.
  - The 72h floor prevents the daemon outbox from being uselessly short:
    three days is enough margin for normal operator response to a
    paged outage.

- Operator override allowed via `[outbox] max_age_hours_override = N`,
  but if `N` exceeds `dedupe_retention_days * 24 - 1` daemon refuses to
  start with `outbox_max_age_above_dedupe_window`. The override exists
  for the rare case of a much-shorter-than-default outbox; it does not
  exist to circumvent the broker's dedupe window.
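
The v6 arithmetic above can be checked with a small TypeScript sketch. The function names are hypothetical and the error strings simply reuse the identifiers from this section; the 10% margin is computed with integer arithmetic, which is assumed equivalent to `ceil(dedupe_retention_days * 0.1 * 24)` while avoiding floating-point drift.

```typescript
// Hypothetical helpers mirroring the §4.9 formula; not the daemon's real code.
const FLOOR_HOURS = 72;
const MIN_RETENTION_DAYS = 3;

function safetyMarginHours(retentionDays: number): number {
  // 10% of the window in hours, rounded up, never less than 24h.
  return Math.max(24, Math.ceil((retentionDays * 24) / 10));
}

function retentionScopedMaxAgeHours(retentionDays: number): number {
  if (!Number.isInteger(retentionDays) || retentionDays < MIN_RETENTION_DAYS) {
    throw new RangeError("dedupe_retention_days must be an integer >= 3");
  }
  const windowHours = retentionDays * 24;
  return Math.max(FLOOR_HOURS, windowHours - safetyMarginHours(retentionDays));
}

function checkOverride(overrideHours: number, retentionDays: number): void {
  // The override may shorten the outbox but must stay inside the dedupe window.
  if (overrideHours > retentionDays * 24 - 1) {
    throw new Error("outbox_max_age_above_dedupe_window");
  }
}

console.log(retentionScopedMaxAgeHours(3));   // 72
console.log(retentionScopedMaxAgeHours(30));  // 648
console.log(retentionScopedMaxAgeHours(365)); // 7884
```

The three printed values match the worked examples in the list above.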
|
||||||
|
|
||||||
|
### 4.10 Inbox schema — unchanged from v3 §4.5
|
||||||
|
|
||||||
|
### 4.11 Crash recovery — unchanged from v3 §4.6
|
||||||
|
|
||||||
|
### 4.12 Failure modes — corrected for fingerprint model (v6)
|
||||||
|
|
||||||
|
- **Fingerprint mismatch on retry** (`409 idempotency_key_reused`): outbox
|
||||||
|
row marked `dead`. Surfaced in `--failed` view. Operator command
|
||||||
|
`outbox requeue --new-id <id>` rotates `client_message_id` and retries.
|
||||||
|
- **Daemon retry after dedupe row hard-deleted by retention sweep**: in
|
||||||
|
`retention_scoped` mode, daemon `max_age_hours` is bounded inside the
|
||||||
|
retention window (§4.9), so this can only happen via operator override.
|
||||||
|
In that case the retry creates a NEW dedupe row + new message — the
|
||||||
|
caller chose this risk explicitly. Counter
|
||||||
|
`cm_daemon_retry_after_dedupe_expired_total`.
|
||||||
|
- **Daemon retry after dedupe row hard-deleted in `permanent` mode**:
|
||||||
|
cannot happen by definition — `permanent` means no `expires_at`. Only
|
||||||
|
mesh deletion removes dedupe rows.
|
||||||
|
- **Duplicate row, history pruned**: as v5 §4.4. Mark `done`, log
|
||||||
|
`cm_daemon_dedupe_history_pruned_total`.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 5. Inbound — unchanged from v3 §5
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Hooks — unchanged from v4 §6
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7-13. Multi-mesh, auto-routing, service install, observability, SDKs, security model, configuration — unchanged from v4
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 14. Lifecycle — unchanged from v5 §14
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 15. Version compat — feature param updated for new dedupe semantics
|
||||||
|
|
||||||
|
### 15.1 Feature bits with parameters (v6 update)
|
||||||
|
|
||||||
|
| Bit | `params.version` | Required parameters | Optional parameters |
|---|---|---|---|
| `client_message_id_dedupe` | `1` | `mode: "retention_scoped"\|"permanent"`, `dedupe_retention_days: int (>= 3)` (when mode=retention_scoped), `request_fingerprint: bool == true` | `tombstone_history_pruned_window_days: int` |
| `concurrent_connection_policy` | `1` | (no parameters) | `default_policy: "prefer_newest"\|"prefer_oldest"\|"allow_concurrent"` |
| `member_keypair_rotated_event` | `1` | (no parameters) | (none) |
| `key_epoch` | `1` | `max_concurrent_epochs: int (>= 1)` | (none) |
| `max_payload` | `1` | `inline_bytes: int (>= 1024)`, `blob_bytes: int (>= 1024)` | (none) |
|
||||||
|
|
||||||
|
`client_message_id_dedupe` ships at `params.version = 1` with
|
||||||
|
`request_fingerprint: bool == true` as a required parameter. A broker
|
||||||
|
that doesn't advertise the feature, or advertises it without
|
||||||
|
`request_fingerprint: true`, is treated as "feature missing" and the
|
||||||
|
daemon refuses to start. That's intentional — v0.9.0 daemons require
|
||||||
|
fingerprint enforcement for safe idempotency.
|
||||||
|
|
||||||
|
The schema-version-2 evolution (parameters that need versioning) is
|
||||||
|
deferred (see followups doc).
|
||||||
|
|
||||||
|
`dedupe_retention_days` minimum is 3 (matches the §4.9 floor).
|
||||||
|
|
||||||
|
### 15.2 Negotiation handshake — unchanged shape from v5 §15.2
|
||||||
|
|
||||||
|
### 15.3 IPC negotiation — unchanged from v3 §15.3
|
||||||
|
|
||||||
|
### 15.4 Compatibility matrix — unchanged from v3 §15.4
|
||||||
|
|
||||||
|
### 15.5 Diagnostic close code (v0.9.0)
|
||||||
|
|
||||||
|
v0.9.0 ships a single WebSocket close code with a structured
|
||||||
|
`close_reason` JSON payload that distinguishes the underlying cause:
|
||||||
|
|
||||||
|
| Code | Reason | `close_reason.kind` values |
|---|---|---|
| `4010` | `feature_unavailable` | `feature_unavailable` (feature missing from broker's `supported`) · `feature_param_invalid` (params fail validation: missing required, out of bounds, unknown version) · `feature_param_below_floor` (param below daemon's hard floor, e.g. `dedupe_retention_days < 3`) |

`close_reason` payload shape:

```json
{
  "kind": "feature_unavailable" | "feature_param_invalid" | "feature_param_below_floor",
  "feature": "client_message_id_dedupe",
  "detail": "..."
}
```
|
||||||
|
|
||||||
|
Daemon logs the full negotiation payload at WARN before exiting;
|
||||||
|
supervisor + alerting catches the restart loop. The split into
|
||||||
|
4011/4012 codes is deferred (see followups doc).
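
As a rough illustration of how a daemon might map the §15.1 parameter rules onto the three `close_reason.kind` values, here is a TypeScript sketch. The shape of the broker's `supported` payload (`Record<string, DedupeParams>`) and the function name `checkDedupeFeature` are assumptions made for the example; only the feature name, parameter names, and kinds come from this spec.

```typescript
// Hypothetical validation sketch; the `supported` payload shape is assumed.
type CloseKind =
  | "feature_unavailable"
  | "feature_param_invalid"
  | "feature_param_below_floor";

interface CloseReason { kind: CloseKind; feature: string; detail: string; }

interface DedupeParams {
  version?: number;
  mode?: string;
  dedupe_retention_days?: number;
  request_fingerprint?: boolean;
}

const FEATURE = "client_message_id_dedupe";

// Returns null when the daemon may start, else the 4010 close_reason payload.
function checkDedupeFeature(
  supported: Record<string, DedupeParams | undefined>,
): CloseReason | null {
  const fail = (kind: CloseKind, detail: string): CloseReason =>
    ({ kind, feature: FEATURE, detail });

  const params = supported[FEATURE];
  // §15.1: missing feature, or no request_fingerprint: true, is "feature missing".
  if (params === undefined || params.request_fingerprint !== true) {
    return fail("feature_unavailable", "feature not advertised with request_fingerprint: true");
  }
  if (params.version !== 1) {
    return fail("feature_param_invalid", "unknown params.version");
  }
  if (params.mode === "permanent") return null;
  if (params.mode !== "retention_scoped") {
    return fail("feature_param_invalid", "mode must be retention_scoped or permanent");
  }
  const days = params.dedupe_retention_days;
  if (days === undefined || !Number.isInteger(days)) {
    return fail("feature_param_invalid", "dedupe_retention_days missing or not an integer");
  }
  if (days < 3) {
    return fail("feature_param_below_floor", "dedupe_retention_days < 3");
  }
  return null;
}
```

A caller would log the returned object at WARN and close the socket with code 4010 whenever the result is not `null`.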
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 16. Threat model — unchanged from v4 §16
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 17. Migration — broker dedupe table + atomicity (v6)
|
||||||
|
|
||||||
|
Broker side, deploy order:
|
||||||
|
|
||||||
|
1. `CREATE TABLE mesh.client_message_dedupe` with v6 schema (additive,
   online-safe).
2. `ALTER TABLE mesh.topic_message ADD COLUMN client_message_id`.
3. `ALTER TABLE mesh.message_queue ADD COLUMN client_message_id`.
4. Broker code refactor: every accept path wraps dedupe insert + message
   insert in **one transaction** (§4.7). Pre-generated
   `broker_message_id` (ulid in code) passed in.
5. Broker code: nightly job to delete dedupe rows where
   `expires_at < NOW()` (skip in `permanent` mode).
6. Broker code: hook into the message-retention sweep. When a
   `topic_message` or `message_queue` row is hard-deleted, find the
   matching dedupe row by `client_message_id` and set
   `history_available = FALSE`. (Note: `client_message_id` is nullable
   on those tables for legacy traffic; nullable rows have no dedupe row
   to update.)
7. Broker code: nightly orphan-check job (§4.7); alerts on non-zero.
8. Broker advertises `client_message_id_dedupe` feature with
   `params.version = 1` and `request_fingerprint: true`.
9. Daemon refuses to start unless that feature bit is advertised with
   valid v1 params.
|
||||||
|
|
||||||
|
Rollback plan: feature flag disables fingerprint enforcement broker-side
|
||||||
|
(falls back to existing pre-v6 behavior — no dedupe). Daemons that
|
||||||
|
require fingerprint refuse to start. Operator switches off the feature
|
||||||
|
flag, reverts the daemon, restarts. No data loss; pending dedupe rows
|
||||||
|
remain in place for the next forward roll.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## v0.9.0 lock — what's in vs deferred
|
||||||
|
|
||||||
|
**In** (this document): everything codex r1–r4 ratified plus the six
|
||||||
|
sweet-spot pulls from v7–v9 enumerated at the top — `aborted` outbox
|
||||||
|
status, `BEGIN IMMEDIATE`, IPC duplicate lookup table, B1/B2/B3 phasing
|
||||||
|
concept, side-effect inventory, two-layer ID model.
|
||||||
|
|
||||||
|
**Deferred** (see `2026-05-03-daemon-spec-broker-hardening-followups.md`):
|
||||||
|
- B0 dedupe fast-path before rate-limit (v10).
|
||||||
|
- Lua-scripted idempotent rate limiter keyed by
|
||||||
|
`(mesh, client_id, window)` (v10).
|
||||||
|
- In-tx `mesh.mention_index` (v8).
|
||||||
|
- 4011 / 4012 close-code split (v6 §15.5 — collapsed to 4010 with
|
||||||
|
structured reason JSON for v0.9.0).
|
||||||
|
- Per-OS fingerprint precedence elaborate table (v8 §2.2.1).
|
||||||
|
- `request_fingerprint` schema-version-2 in feature negotiation (v6
|
||||||
|
§15.1 ships at version 1 with `request_fingerprint: bool`).
|
||||||
|
- Force-expiry / quarantine semantics for `keypair-archive.json`
|
||||||
|
(v8 §14.1.1).
|
||||||
|
|
||||||
|
These deferrals are real improvements but not v0.9.0 blockers. They
|
||||||
|
land as the broker matures and we have actual scale-load to optimize
|
||||||
|
against.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Cross-spec note: §15.5 close-code collapse
|
||||||
|
|
||||||
|
For v0.9.0 we ship a single `4010 feature_unavailable` close code with
|
||||||
|
a structured `close_reason` JSON payload that distinguishes the
|
||||||
|
underlying cause:
|
||||||
|
|
||||||
|
```json
|
||||||
|
{
  "close_reason": {
    "kind": "feature_unavailable" | "feature_param_invalid" | "feature_param_below_floor",
    "feature": "client_message_id_dedupe",
    "detail": "..."
  }
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The 4011/4012 split is deferred to followups.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## NON-NORMATIVE: round-6 review trailer (preserved for audit only)
|
||||||
|
|
||||||
|
> **Not part of the v0.9.0 contract.** Preserved verbatim from the
|
||||||
|
> v6 source spec as a record of the open questions at the time of the
|
||||||
|
> codex round-6 review. Items below have either been resolved in this
|
||||||
|
> merged document, deferred to the followups doc, or superseded.
|
||||||
|
> Do NOT use this section as a checklist for implementation.
|
||||||
|
|
||||||
|
1. **Request fingerprint canonical form (§4.4)** — does JCS work
|
||||||
|
cross-language for `meta_canonical_json` (Python json.dumps,
|
||||||
|
Go encoding/json, JS JSON.stringify all behave differently)? Should
|
||||||
|
we ship a vetted JCS lib in each SDK or fall back to a simpler
|
||||||
|
"sorted keys + no spaces + escape-as-stored" rule with conformance
|
||||||
|
tests?
|
||||||
|
2. **Atomicity contract (§4.7)** — is the orphan-check sufficient, or
|
||||||
|
does a violation mean we need a "broker rebuild dedupe from messages"
|
||||||
|
recovery tool? The latter is destructive but useful for ops emergencies.
|
||||||
|
3. **Max-age formula (§4.9)** — is the 72h floor correct? Is the
|
||||||
|
percentage-based safety margin (`max(24, ceil(0.1 * dedupe_window))`)
|
||||||
|
the right shape? Or simpler to say "always 24h"?
|
||||||
|
4. **`409 idempotency_key_reused` recovery flow (§4.5)** — is sending the
|
||||||
|
row to `dead` and surfacing it via `outbox --failed` enough? Should
|
||||||
|
the daemon emit a high-priority event for the SSE stream so operators
|
||||||
|
are paged immediately?
|
||||||
|
5. **Diagnostic close codes (§15.5)** — is splitting 4010/4011/4012
|
||||||
|
useful, or does it just push complexity onto operators? Should we
|
||||||
|
collapse to 4010 with structured close-reason JSON instead?
|
||||||
|
6. **Anything else still wrong?** Read it as if you were going to
|
||||||
|
operate this for a year. What falls down?
|
||||||
|
|
||||||
|
Three options:
|
||||||
|
- **(a) v6 is shippable**: lock the spec, start coding the frozen core.
|
||||||
|
- **(b) v7 needed**: list the must-fix items.
|
||||||
|
- **(c) the architecture itself is wrong**: what would you do differently?
|
||||||
|
|
||||||
|
Be ruthless.
|
||||||
@@ -79,7 +79,43 @@ ephemeral pubkey. Either:
|
|||||||
Decoder pulls the prefix, uses it as the sender pubkey. No schema
|
Decoder pulls the prefix, uses it as the sender pubkey. No schema
|
||||||
change beyond what 0026 already ships.
|
change beyond what 0026 already ships.
|
||||||
|
|
||||||
(b) wins on simplicity. Phase 2 implementation uses it.
|
**(b) wins on simplicity. Phase 3 implementation ships it. Both the
|
||||||
|
broker creator-seal and the CLI re-seal write the
|
||||||
|
`<32-byte sender pubkey><cipher>` blob.** `topic.encrypted_key_pubkey`
|
||||||
|
becomes informational only — the wire-format truth is the inline prefix.
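
For illustration, the `<32-byte sender pubkey><cipher>` layout can be packed and unpacked with plain `Uint8Array` slicing. This is a minimal sketch: the helper names are hypothetical and are not claimed to match `services/crypto/topic-key.ts`; decryption itself is out of scope here.

```typescript
// Hypothetical helpers for the inline-sender-pubkey blob format.
const SENDER_PUBKEY_BYTES = 32;

interface SealedTopicKey {
  senderPubkey: Uint8Array; // first 32 bytes of the blob
  cipher: Uint8Array;       // everything after the prefix
}

function joinSealedKey(senderPubkey: Uint8Array, cipher: Uint8Array): Uint8Array {
  if (senderPubkey.length !== SENDER_PUBKEY_BYTES) {
    throw new RangeError("sender pubkey must be exactly 32 bytes");
  }
  const blob = new Uint8Array(SENDER_PUBKEY_BYTES + cipher.length);
  blob.set(senderPubkey, 0);
  blob.set(cipher, SENDER_PUBKEY_BYTES);
  return blob;
}

function splitSealedKey(blob: Uint8Array): SealedTopicKey {
  // A blob with no bytes after the prefix cannot carry a sealed key.
  if (blob.length <= SENDER_PUBKEY_BYTES) {
    throw new RangeError("sealed key blob too short");
  }
  return {
    senderPubkey: blob.subarray(0, SENDER_PUBKEY_BYTES),
    cipher: blob.subarray(SENDER_PUBKEY_BYTES),
  };
}
```

Because the decoder reads the sender pubkey from the prefix, it never needs to consult `topic.encrypted_key_pubkey`.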
|
||||||
|
|
||||||
|
## Web client gap (phase 3.5)
|
||||||
|
|
||||||
|
The CLI side of phase 3 ships in this cut. The web side does NOT —
|
||||||
|
because web member rows have `peerPubkey` registered server-side but
|
||||||
|
the corresponding ed25519 SECRET is discarded immediately after
|
||||||
|
generation (see `mutations.ts:createMyMesh`). Without the secret the
|
||||||
|
browser can't `crypto_box_open` its sealed topic key.
|
||||||
|
|
||||||
|
Three fixes, in increasing order of effort:
|
||||||
|
|
||||||
|
1. **Browser-side persistent identity (recommended)** — generate an
|
||||||
|
ed25519 keypair in the browser on first dashboard visit, store the
|
||||||
|
secret in IndexedDB, sync the public half to `mesh.member.peerPubkey`
|
||||||
|
via a new `POST /v1/me/peer-pubkey` endpoint. Topic keys then seal
|
||||||
|
to the new pubkey; web user decrypts locally. Existing #general
|
||||||
|
topics need a re-seal cycle (the v0.3.0 phase-3 re-seal loop in
|
||||||
|
the CLI already does this for any pending member, including web
|
||||||
|
ones). Spec lift: ~3 hours, mostly browser code + a sync endpoint.
|
||||||
|
|
||||||
|
2. **Server-held secret** — keep the member's ed25519 secret server-
|
||||||
|
side. Trivial to implement, but the broker can read everything,
|
||||||
|
defeating the security claim. **Rejected.**
|
||||||
|
|
||||||
|
3. **JWT-derived keys** — derive the member's keypair from a stable
|
||||||
|
user-secret (e.g. PBKDF2 over their session JWT). Means cross-
|
||||||
|
device same key, but needs the JWT to include ~32 bytes of stable
|
||||||
|
key material. Tied to v2.0.0 daemon redesign. **Deferred.**
|
||||||
|
|
||||||
|
Phase 3 defers option 1 to phase 3.5; web stays on v1 plaintext until
then. The CLI re-seal loop in `topic tail` already handles re-sealing
for web members ONCE they have a real pubkey, so no broker work is
needed when 3.5 lands.
|
||||||
|
|
||||||
## Option C — leaderless protocol (DEFERRED)
|
## Option C — leaderless protocol (DEFERRED)
|
||||||
|
|
||||||
@@ -95,44 +131,48 @@ asks for FS (forward secrecy) on group chat.
|
|||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Phase-2 implementation checklist
|
## Implementation checklist
|
||||||
|
|
||||||
Schema (0026 — done):
|
Schema (0026 — done):
|
||||||
- [x] `topic.encrypted_key_pubkey` (legacy field, will be unused in
|
- [x] `topic.encrypted_key_pubkey` (informational; wire truth is the
|
||||||
Option B's "embed in payload" mode, but keeping it for
|
inline 32-byte prefix on each `topic_member_key.encryptedKey`)
|
||||||
forward-compat if we ever switch to Option C)
|
|
||||||
- [x] `topic_member_key.(encrypted_key, nonce)`
|
- [x] `topic_member_key.(encrypted_key, nonce)`
|
||||||
- [x] `topic_message.body_version` (1 = v0.2.0 plaintext, 2 = v0.3.0 ciphertext)
|
- [x] `topic_message.body_version` (1 = plaintext, 2 = v2 ciphertext)
|
||||||
|
|
||||||
API (some done — see annotations):
|
API (phase 3 — done):
|
||||||
- [x] `GET /v1/topics/:name/key` — fetch the calling member's sealed copy
|
- [x] `GET /v1/topics/:name/key` — fetch the calling member's sealed copy
|
||||||
- [ ] `GET /v1/topics/:name/pending-seals` — list members without keys
|
- [x] `GET /v1/topics/:name/pending-seals` — list members without keys
|
||||||
- [ ] `POST /v1/topics/:name/seal` — submit a re-sealed copy
|
- [x] `POST /v1/topics/:name/seal` — submit a re-sealed copy
|
||||||
|
- [x] `GET /v1/topics/:name/messages` returns `bodyVersion`
|
||||||
|
- [x] `GET /v1/topics/:name/stream` emits `bodyVersion`
|
||||||
|
- [x] `POST /v1/messages` accepts `bodyVersion` (1|2) + skips regex
|
||||||
|
mention extraction on v2
|
||||||
|
|
||||||
Broker:
|
Broker / web mutation (phase 3 — done):
|
||||||
- [x] `createTopic` generates topic key + seals for creator
|
- [x] `createTopic` generates topic key + seals for creator with
|
||||||
- [ ] `joinTopic` becomes a "pending" insert — no key seal
|
inline-sender-pubkey blob format
|
||||||
- [ ] (optional) WS notification to online topic members when a new
|
- [x] `ensureGeneralTopic` (web) mirrors the same flow
|
||||||
joiner arrives, so re-seal latency is sub-second instead of
|
|
||||||
polling-bound
|
|
||||||
|
|
||||||
Client (CLI + web):
|
Client — CLI (phase 3 — done):
|
||||||
- [ ] On topic open, fetch sealed key, decrypt + cache in memory
|
- [x] `services/crypto/topic-key.ts` — fetch + decrypt + encrypt + reseal helpers
|
||||||
- [ ] On send, encrypt body with topic key, set `body_version: 2`
|
- [x] `topic tail` decrypts v2 messages on render
|
||||||
- [ ] On render, decrypt v2 messages with cached key; v1 stays
|
- [x] `topic post` encrypts v2 on send via REST POST /v1/messages
|
||||||
base64 plaintext (legacy)
|
- [x] Background re-seal loop in `topic tail` (30s cadence)
|
||||||
- [ ] Background re-seal loop — poll for pending joiners, seal,
|
|
||||||
POST
|
|
||||||
|
|
||||||
UX:
|
Client — web (phase 3.5 — DEFERRED):
|
||||||
- [ ] "waiting for a peer to share the topic key" state when GET key
|
- [ ] Browser-side persistent identity (IndexedDB)
|
||||||
returns 404
|
- [ ] `POST /v1/me/peer-pubkey` sync endpoint
|
||||||
- [ ] "you are the only online member — joiners can't read messages
|
- [ ] Web chat panel encrypt-on-send + decrypt-on-render (currently v1)
|
||||||
until someone else logs in" warning when sole online holder
|
|
||||||
goes offline
|
|
||||||
|
|
||||||
The phase-2 commit ships only the schema + creator-seal + GET /key.
|
UX surfaces (phase 3 — done in CLI):
|
||||||
The pending-seals endpoint, seal POST, and client encryption land in
|
- [x] "waiting for a peer to share the topic key" warning on tail
|
||||||
phase 3 once this spec gets a code review. Mention fan-out from
|
- [ ] (web) "your encryption keys are pending — pair this browser"
|
||||||
phase 1 already works for both v1 and v2 messages, so /v1/notifications
|
banner once 3.5 lands
|
||||||
keeps working through the cutover.
|
|
||||||
|
Mention fan-out from phase 1 already works for both v1 and v2
|
||||||
|
messages, so `/v1/notifications` keeps working through the cutover.
|
||||||
|
|
||||||
|
The phase-3 cut ships full CLI encryption + re-seal flow. Web remains
|
||||||
|
on v1 plaintext until 3.5 lands the browser identity layer. Mixed
|
||||||
|
CLI+web meshes in the meantime should keep using v1 sends OR accept
|
||||||
|
that web members can't read v2 messages.
|
||||||
|
|||||||
204
.artifacts/specs/2026-05-02-workspace-view.md
Normal file
@@ -0,0 +1,204 @@
|
|||||||
|
# Workspace view — per-user superset over joined meshes
|
||||||
|
|
||||||
|
**Status:** spec / not started
|
||||||
|
**Target:** v0.4.0
|
||||||
|
**Author:** Alejandro
|
||||||
|
**Date:** 2026-05-02
|
||||||
|
|
||||||
|
## Why
|
||||||
|
|
||||||
|
Users routinely belong to multiple meshes — work, personal, side
|
||||||
|
projects, ECIJA + flexicar + openclaw + prueba1 in our own dogfood.
|
||||||
|
Today's CLI is mesh-scoped: every read or write either auto-picks the
|
||||||
|
default mesh or forces an interactive picker. Common questions like
|
||||||
|
*"who's online across all my meshes?"* or *"any new @-mentions
|
||||||
|
anywhere?"* require N round-trips, one per mesh.
|
||||||
|
|
||||||
|
A few verbs already aggregate implicitly (`peer list`, `inbox`,
|
||||||
|
`list`), but the surface is patchy and inconsistent.
|
||||||
|
|
||||||
|
We want the equivalent of "all my Slacks in one sidebar" — without
|
||||||
|
breaking the per-mesh trust model that v0.3.0 was built around.
|
||||||
|
|
||||||
|
## What it is NOT
|
||||||
|
|
||||||
|
- **Not a literal universal mesh.** A single global mesh everyone
|
||||||
|
joins collapses the trust boundary, blows up broadcast fan-out
|
||||||
|
(O(users²)), and turns into spam. See the universal-mesh discussion
|
||||||
|
rejected in this same session.
|
||||||
|
- **Not federation.** Federation is the broker-side equivalent
|
||||||
|
(already roadmapped under v0.3.0). Workspace is purely client-side.
|
||||||
|
- **Not identity stitching for *other* peers.** `Mou@openclaw` and
|
||||||
|
`Mou@flexicar-2` may or may not be the same human. Don't auto-merge.
|
||||||
|
Stitching MY identities is fine — local config knows.
|
||||||
|
|
||||||
|
## What it IS
|
||||||
|
|
||||||
|
A virtual layer that aggregates reads across the meshes the user has
|
||||||
|
joined, while keeping writes mesh-scoped. Pure projection over
|
||||||
|
existing per-mesh tables. Zero broker changes. Zero protocol changes.
|
||||||
|
|
||||||
|
```
|
||||||
|
    ┌──────────────────────────────┐
    │          workspace           │
    │   (per-user view, client)    │
    └─┬────────┬────────┬─────────┬┘
      │        │        │         │
┌─────▼──┐ ┌───▼──┐ ┌───▼──┐ ┌────▼──┐
│ mesh A │ │  B   │ │  C   │ │  ...  │
└────────┘ └──────┘ └──────┘ └───────┘
(each remains its own crypto + trust domain)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Surface
|
||||||
|
|
||||||
|
### New verbs (all read-only, all aggregating)
|
||||||
|
|
||||||
|
```bash
|
||||||
|
claudemesh me # overview: meshes, online peers, unread, tasks
|
||||||
|
claudemesh me topics # all subscribed topics, namespaced
|
||||||
|
claudemesh me notifications # cross-mesh @-mentions feed
|
||||||
|
claudemesh me activity # cross-mesh recent send/recv/topic-post
|
||||||
|
claudemesh me search "<q>" # full-text across memory + topics + tasks
|
||||||
|
```
|
||||||
|
|
||||||
|
`claudemesh me` (no subcommand) prints a one-screen dashboard:
|
||||||
|
|
||||||
|
```
|
||||||
|
workspace — agutmou (4 meshes · 23 peers visible · 2 unread @you)
|
||||||
|
|
||||||
|
meshes
|
||||||
|
openclaw 7 peers · 3 topics · last activity 2m
|
||||||
|
flexicar-2 5 peers · 1 topic · last activity 18m
|
||||||
|
prueba1 4 peers · idle
|
||||||
|
ECIJA 7 peers · 2 topics · 1 @you · last activity 4h
|
||||||
|
|
||||||
|
unread @-mentions
|
||||||
|
ECIJA · #incident-2026-05-02 · 1 from coronel-abos
|
||||||
|
openclaw · #deploys · 1 from claudemesh-2
|
||||||
|
|
||||||
|
pending tasks (3)
|
||||||
|
ECIJA ship-F4-cliente high claimed by you
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
### Default-aggregation rule for existing verbs
|
||||||
|
|
||||||
|
When `--mesh` is omitted on a *read-only* verb, aggregate. When
|
||||||
|
`--mesh` is omitted on a *write* verb, fall back to current behavior
|
||||||
|
(default mesh or interactive picker). Already-aggregating verbs keep
|
||||||
|
working unchanged.
|
||||||
|
|
||||||
|
| Verb | Today | After workspace |
|---|---|---|
| `peer list` | aggregates ✅ | unchanged |
| `inbox` | aggregates ✅ | unchanged |
| `list` | aggregates ✅ (lists meshes) | unchanged |
| `notification list` | mesh-scoped | aggregates by default |
| `topic list` | mesh-scoped | aggregates with namespacing |
| `task list` | mesh-scoped | aggregates by default |
| `state list` | mesh-scoped | aggregates by default |
| `memory recall` | mesh-scoped | aggregates by default |
| `info` / `stats` / `ping` | mesh-scoped | unchanged (per-mesh diagnostics) |
| `send`, `topic post`, `state set`, `remember`, ... | mesh-scoped | unchanged (writes pick a mesh) |
|
||||||
|
|
||||||
|
### Rendering rules for aggregated views
|
||||||
|
|
||||||
|
1. **Topic namespacing.** `#deploys` exists in two meshes — they're
|
||||||
|
different rooms. Render as `openclaw/#deploys`. Inside a
|
||||||
|
mesh-scoped command, keep the bare `#deploys` shorthand.
|
||||||
|
2. **Peer name collisions.** `Mou@openclaw` notation when the same
|
||||||
|
display name resolves in more than one mesh. Single resolution =
|
||||||
|
bare name.
|
||||||
|
3. **Time-grouped activity.** `me activity` sorts globally by ts
|
||||||
|
descending; mesh tag is shown as a dim suffix.
|
||||||
|
4. **Unread roll-up.** `me notifications` is a per-row
|
||||||
|
`[mesh][topic][snippet]` list, newest first.
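
Rules 1 and 2 above can be sketched as two small TypeScript helpers. The types and function names are hypothetical; the sketch only assumes that each peer row carries a display name and the mesh it was seen in.

```typescript
// Hypothetical rendering helpers for aggregated views.
interface PeerRef { name: string; mesh: string; }

// Rule 1: namespace topics only in aggregated (cross-mesh) output.
function renderTopic(topic: string, mesh: string, aggregated: boolean): string {
  return aggregated ? `${mesh}/#${topic}` : `#${topic}`;
}

// Rule 2: add the @mesh suffix only when a display name appears in >1 mesh.
function renderPeers(peers: PeerRef[]): string[] {
  const meshesByName = new Map<string, Set<string>>();
  for (const peer of peers) {
    const meshes = meshesByName.get(peer.name) ?? new Set<string>();
    meshes.add(peer.mesh);
    meshesByName.set(peer.name, meshes);
  }
  return peers.map((peer) => {
    const collisions = meshesByName.get(peer.name)?.size ?? 0;
    return collisions > 1 ? `${peer.name}@${peer.mesh}` : peer.name;
  });
}
```

Note that the collision check is purely about display: it never merges `Mou@openclaw` and `Mou@flexicar-2` into one identity.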
|
||||||
|
|
||||||
|
## API surface (REST)
|
||||||
|
|
||||||
|
Mirror the read aggregations server-side so the dashboard + future
|
||||||
|
mobile/web UIs share the same endpoints.
|
||||||
|
|
||||||
|
```
|
||||||
|
GET /v1/me # workspace overview
|
||||||
|
GET /v1/me/meshes # joined meshes + summary stats
|
||||||
|
GET /v1/me/topics # all subscribed topics, all meshes
|
||||||
|
GET /v1/me/notifications # cross-mesh @-mentions
|
||||||
|
GET /v1/me/activity # unified activity feed
|
||||||
|
GET /v1/me/peers # already implicit; formalize
|
||||||
|
GET /v1/me/search?q=... # full-text across tables
|
||||||
|
```
|
||||||
|
|
||||||
|
Auth: needs a *user-scoped* api key (one issued per user, sees all
|
||||||
|
their meshes), which we don't have today — current keys are mesh-
|
||||||
|
scoped. Two options:
|
||||||
|
|
||||||
|
- **(a) Per-user key.** New token type `cm_u_...` issued by the
|
||||||
|
dashboard, scopes to all meshes the issuing user belongs to. Cheaper
|
||||||
|
to build; harder to reason about because the blast radius is
|
||||||
|
larger if leaked.
|
||||||
|
- **(b) Multi-mesh aggregation.** Accept N mesh-scoped keys
|
||||||
|
concurrently; CLI auto-mints them via the existing `withRestKey`
|
||||||
|
pattern, one per joined mesh. No new key type. More round-trips on
|
||||||
|
cold start, but rotation/revocation stays simple.
|
||||||
|
|
||||||
|
**Recommendation: (b).** Reuses today's auth model, doesn't widen the
|
||||||
|
blast radius, and the ephemeral keys we already mint per-command keep
|
||||||
|
the surface area minimal. The CLI orchestrates the fan-out client-
|
||||||
|
side.
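
A minimal sketch of what the client-side fan-out in option (b) might look like. Everything here is an assumption for illustration: `FetchForMesh` stands in for a per-mesh call made with that mesh's own key (it is not the real `withRestKey` signature), and treating a failed mesh as an empty result is a design guess, not a decision recorded in this spec.

```typescript
// Hypothetical fan-out sketch; FetchForMesh is a stand-in, not a real API.
interface MeshNotification {
  mesh: string;
  topic: string;
  snippet: string;
  ts: number; // epoch millis
}

type FetchForMesh = (mesh: string) => Promise<Omit<MeshNotification, "mesh">[]>;

// Pure merge step: interleave all meshes, newest first.
function mergeNewestFirst(perMesh: MeshNotification[][]): MeshNotification[] {
  const all = ([] as MeshNotification[]).concat(...perMesh);
  return all.sort((a, b) => b.ts - a.ts);
}

// One request per joined mesh, each using that mesh's own scoped key.
async function fetchAllMeshes(
  meshes: string[],
  fetchForMesh: FetchForMesh,
): Promise<MeshNotification[]> {
  const perMesh = await Promise.all(
    meshes.map(async (mesh): Promise<MeshNotification[]> => {
      try {
        const rows = await fetchForMesh(mesh);
        return rows.map((row) => ({ ...row, mesh }));
      } catch (_err) {
        // Assumption: one unreachable mesh should not blank the whole view.
        return [];
      }
    }),
  );
  return mergeNewestFirst(perMesh);
}
```

Because each request carries only a mesh-scoped key, a leaked key still exposes a single mesh, which is the blast-radius argument made for (b).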
|
||||||
|
|
||||||
|
## Storage
|
||||||
|
|
||||||
|
Pure projection at first. The cross-mesh queries are SELECT joins
|
||||||
|
over `mesh_member`, `mesh_topic`, `mesh_topic_member`,
|
||||||
|
`mesh_notification`, `mesh_topic_message`, `mesh_task`, `presence`.
|
||||||
|
|
||||||
|
If `me` queries become hot (likely once dashboards land), add a
|
||||||
|
materialized `user_workspace_view` refreshed on writes. Don't
|
||||||
|
optimize early.
|
||||||
|
|
||||||
|
## Effort
|
||||||
|
|
||||||
|
| Component | Effort |
|---|---|
| CLI verbs (`me`, `me topics`, etc.) | 1.5 days |
| Default-aggregation rule across existing verbs | 0.5 day |
| REST endpoints `/v1/me/*` | 1 day |
| Multi-mesh apikey orchestration in `withRestKey` | 0.5 day |
| Tests + docs | 0.5 day |
| **Total** | **~4 days** |
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
1. **`me` as namespace vs. flag.** Could be `claudemesh --workspace
|
||||||
|
topics` instead of `claudemesh me topics`. The verb form is
|
||||||
|
shorter and reads better; sticking with it.
|
||||||
|
2. **Notification ordering.** All notifications globally interleaved
|
||||||
|
by ts, or per-mesh sections? Default to **interleaved** with mesh
|
||||||
|
tag prefix; users can `--by-mesh` to group.
|
||||||
|
3. **Search relevance.** Cross-mesh full-text search is easy when each
|
||||||
|
mesh has its own pg full-text index. Cross-mesh ranking is the
|
||||||
|
harder problem (IDF varies). Punt to v0.4.1 — start with simple
|
||||||
|
tied-rank merge.
|
||||||
|
4. **Web dashboard.** Should the web dashboard's main view become a
|
||||||
|
workspace view by default? Yes, but that's downstream of this
|
||||||
|
spec — once `/v1/me/*` exists, the web rewrite is the obvious
|
||||||
|
next step.
|
||||||
|
|
||||||
|
## Out of scope (v0.4.0)
|
||||||
|
|
||||||
|
- Federation / cross-broker workspace.
|
||||||
|
- Identity stitching for non-self peers.
|
||||||
|
- Cross-mesh search ranking sophistication.
|
||||||
|
- Cross-mesh write fan-out (`me broadcast` is intentionally NOT a
|
||||||
|
verb — too easy to misuse).
|
||||||
|
- Mobile/web parity beyond the REST endpoints.
|
||||||
|
|
||||||
|
## Why we ship this
|
||||||
|
|
||||||
|
Because "I want one Slack-like sidebar for all my claudemesh meshes"
is the highest-frequency UX gap users hit, and the answer is about
four days of plumbing (see Effort) on top of what already exists.
Federation is the right answer for cross-organization reach; workspace
is the right answer for *one user, many meshes*. Both compose.
|
||||||
506
.artifacts/specs/2026-05-04-agentic-comms-architecture-v2.md
Normal file
@@ -0,0 +1,506 @@
|
|||||||
|
---
title: claudemesh — full end-state architecture for agentic peer communication
status: draft (v2 — supersedes v1: removes time-boxed phasing, adds P2P data plane, applies Codex-2 correctness/scope-gap edits)
target: end-state (architectural milestones, not version timelines)
author: Alejandro + Claude (Codex GPT-5.2 cross-checked twice)
date: 2026-05-04
supersedes: 2026-05-04-agentic-comms-architecture.md (v1)
references:
  - 2026-05-02-architecture-north-star.md (CLI-first commitment, push-pipe)
  - 2026-05-04-per-session-presence.md (per-launch session pubkey + attestation)
  - apps/cli/CHANGELOG.md (1.30.0–1.32.1 history)
---
|
||||||
|
|
||||||
|
# claudemesh — agentic peer communication, full end-state
|
||||||
|
|
||||||
|
## What this document is
|
||||||
|
|
||||||
|
The end-state architecture for claudemesh as a transport-agnostic agentic peer-comms platform. Not a release plan, not a sprint roadmap — the **shape** the system needs to converge on. Implementation order at the end is a *suggestion*, not a contract; time estimates are deliberately omitted because the surface is too cross-cutting to phase by weeks.
|
||||||
|
|
||||||
|
v1 of this spec (same date, no `-v2` suffix) treated the broker as the sole data plane. v2 corrects that: **the broker is a coordination plane (signaling, discovery, offline queue, fan-out, registry, revocation); the data plane is hybrid P2P** with broker fallback for the cases P2P can't cover. Closer to how Tailscale, libp2p, LiveKit, and modern WebRTC stacks work in production.
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
- **Identity** — three keypair types (member, session, service) all rooted in a member's secret key. Member is durable, session is per-launch, service is a member-scoped delegate for non-Claude integrations. Every service has its own pubkey and explicit revocation.
|
||||||
|
- **Coordination plane** — broker handles signaling, peer discovery, offline message queue, group/topic fan-out, mesh state authority, revocation gossip. Always reachable.
|
||||||
|
- **Data plane** — hybrid:
|
||||||
|
- **P2P first** (WebRTC data channels, future: QUIC) when both peers online + NAT-traversable.
|
||||||
|
- **Broker-relayed** when peers are NAT-blocked, when one peer is offline, or for group/topic/broadcast where fan-out at the broker is structurally cheaper than N-way sender-side fan-out.
|
||||||
|
- **Pure broker** for service identities that can't run a P2P stack (HTTP webhook senders, OpenAI Assistants, browser SDKs without WebRTC).
|
||||||
|
- **Channels** — typed envelope (dm, group, topic, rpc, system, stream). Channel type drives crypto, routing, and transport selection. `meta` is required in v2 envelope.
|
||||||
|
- **Transports** — pluggable adapters under one interface: WS-to-broker (today), WebRTC P2P, HTTP webhook, future LiveKit/QUIC/etc. Broker negotiates which adapter a peer pair uses.
|
||||||
|
- **Crypto** — every direct message is E2E encrypted to recipient's pubkey regardless of transport. Broker never sees plaintext. P2P doesn't get any extra trust just because it's direct.
|
||||||
|
- **Delivery** — at-least-once **requires receiver ack** before broker marks `delivered_at`. The retry path before that is best-effort with idempotent dedupe at the receiver.
|
||||||
|
|
||||||
|
The CLI-first commitment from the North Star spec stays intact. Every channel type and every transport is invocable from `claudemesh <verb>`. MCP serves only `claude/channel` mid-turn push.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## The forcing functions (why this shape, not a smaller one)
|
||||||
|
|
||||||
|
1. **Multi-session interconnect already broke** (1.30.0 → 1.32.1) because the per-session WS subsystem shipped without a push handler. This is a symptom of "broker is the data plane and we keep bolting on" thinking. We need to formalize roles and transport adapters before the next bolt-on.
|
||||||
|
|
||||||
|
2. **Codex review surfaced a correctness bug** in `drainForMember` — claims `delivered_at = NOW()` *before* WS push succeeds; if `ws.readyState !== OPEN` the row is marked delivered and message is lost. At-most-once with no retry. Inherited by every channel/transport added unless fixed at the foundation.
|
||||||
|
|
||||||
|
3. **The agentic-comms domain has standardized on hybrid P2P + central coordinator.** Tailscale (control plane + WireGuard P2P), LiveKit (signaling + SFU + P2P data channels), libp2p (DHT discovery + multi-transport), Iroh (gossip + QUIC P2P). Pure-broker is a 2010s pattern; pure-P2P is academic. Hybrid is the norm.
|
||||||
|
|
||||||
|
4. **claudemesh's pricing/economics demand P2P.** Every byte that passes through the broker is a cost the operator pays. Voice transcripts, file transfers, and real-time tool I/O are all bandwidth-heavy. A P2P data plane lets broker load scale linearly with peer count rather than with message volume.
|
||||||
|
|
||||||
|
5. **Privacy/sovereignty matters as the agent ecosystem grows.** "Your agents talk to my agents" should default to peer-to-peer paths when possible. Broker as relay is fine; broker as forced middleman is not.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Audience for this architecture
|
||||||
|
|
||||||
|
| Peer type | Identity | Online presence | Data plane preference | Notes |
|---|---|---|---|---|
| **Claude Code session** | Per-launch session pubkey, member-attested | WS to broker (control + signaling) | P2P first, broker fallback | Mid-turn push via MCP `claude/channel` |
| **Daemon, no launch** (idle Mac with daemon running) | Member pubkey | WS to broker | Broker only (no P2P partner unless launched) | Receives broadcasts + member-targeted DMs |
| **Voice agent** (LiveKit, Pipecat) | Service identity, member-signed | LiveKit room + bridge | LiveKit room data channels intra-room; bridge over broker for cross-mesh | Side-car bridges room ↔ broker |
| **OpenAI Assistant / Anthropic Skill** | Service identity, scoped token | HTTP outbound, webhook inbound | Broker only (can't run P2P) | Daemon does delegated re-encryption |
| **Browser-based peer** (web dashboard, SDK) | Member or service identity | WS to broker, WebRTC for P2P | P2P-where-possible (browsers ARE WebRTC-native) | Full feature parity once on-mesh |
| **Webhook consumer** (Stripe-style passive) | Service identity | HTTP webhook inbound only | Broker only | Topic subscriptions; no outbound channel |
| **Bridge** (Slack, WhatsApp, IRC, Matrix) | Service identity per bridge + per-end-user delegated | WS to broker | Broker only for bridge ↔ broker; native protocol for bridge ↔ external | Trust delegated to bridge operator |
| **Cron / scheduled actor** | Member pubkey or service identity | Ephemeral; HTTP send only | Broker only | No long-lived connection |
| **CLI-only user** (no Claude Code) | Member pubkey | Ephemeral on each `claudemesh send` | Broker only | Command-line agent, queues via outbox |
|
||||||
|
|
||||||
|
Every row in this table works without changing the broker's coordination plane.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 1: Identity
|
||||||
|
|
||||||
|
Three keypair types, one auth model.
|
||||||
|
|
||||||
|
### Member identity (durable)
|
||||||
|
- Ed25519 keypair, generated at `claudemesh join <invite>`. Held in `~/.claudemesh/config.json` per mesh.
|
||||||
|
- The auth boundary — grants, kicks, bans operate on members.
|
||||||
|
- Used for hello signature on the daemon's control-plane WS.
|
||||||
|
- Used as cryptographic root of trust for sibling sessions and service identities.
|
||||||
|
|
||||||
|
### Session identity (ephemeral, per-launch)
|
||||||
|
- Ed25519 keypair generated by each `claudemesh launch`. Held in process memory only.
|
||||||
|
- Parent-signed attestation vouches for it (TTL 12h, broker cap 24h). Rotation = new launch.
|
||||||
|
- Used for hello signature on the per-session WS, and as routing key for DMs targeted at *this specific launched session*.
|
||||||
|
- Session secret never touches disk; lives only in the daemon's `sessionBrokers` map keyed by IPC token.
|
||||||
|
|
||||||
|
### Service identity (third type, additive)
|
||||||
|
|
||||||
|
For non-Claude integrations that can't or shouldn't use a per-launch session.
|
||||||
|
|
||||||
|
```
ServiceIdentity {
  service_id              // Stable string id ("openai-assistant-foo", "livekit-room-bar")
  service_pubkey          // Ed25519 pubkey — the cryptographic identity. crypto_box targets this.
  member_id               // The mesh member that owns this service (auth boundary)
  service_type            // "openai-assistant" | "livekit-room" | "webhook" | "voice-agent" | ...
  scopes                  // ["dm:read", "topic:write", "rpc:invoke", ...]
  attestation             // member-signed: { service_id, service_pubkey, scopes, expires_at, signature }
  transport_hint          // "ws" | "http-webhook" | "sse" | "livekit" — informs how the broker reaches it
  delegate_daemon_pubkey? // Optional. Set when the daemon holds the service's secret on its behalf.
}
```
|
||||||
|
|
||||||
|
Two flavors:
|
||||||
|
- **Holds-secret service** — has its own keypair (`service_pubkey` + service-secret kept by the service itself) and runs the E2E crypto itself. Examples: voice agent side-cars, browser SDK, MQTT bridges.
|
||||||
|
- **Delegated service** — daemon holds the service-secret on the service's behalf. Senders still encrypt to `service_pubkey`; daemon decrypts on receipt and forwards plaintext (or re-signs) to the service via its `transport_hint`. Used by HTTP webhook consumers, OpenAI Assistants. Trust is in the daemon owner. `delegate_daemon_pubkey` records who's holding.
|
||||||
|
|
||||||
|
All three identity types resolve to a `member_id` for authorization. They differ in liveness (member = always; session = per-launch; service = scoped) and transport hint (member/session = WS-resident; service = polymorphic).
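The authorization check that a service attestation implies can be sketched as below. This is a minimal illustration, not claudemesh's implementation: the `Attestation` fields mirror the shape listed above, but `canonicalPayload`, the `VerifySig` callback (standing in for an Ed25519 verify call), the millisecond unit of `expires_at`, and the `authorize` function itself are all assumptions.

```typescript
// Hypothetical sketch: verify a member-signed service attestation before
// honoring a request. Signature verification is injected so the example
// stays independent of any particular crypto library.
interface Attestation {
  service_id: string;
  service_pubkey: string;
  scopes: string[];
  expires_at: number; // epoch milliseconds (assumed unit)
  signature: string;
}

type VerifySig = (memberPubkey: string, payload: string, signature: string) => boolean;

type AuthResult = { ok: true } | { ok: false; reason: string };

// Assumed canonical form: fixed field order, scopes sorted.
function canonicalPayload(a: Attestation): string {
  return JSON.stringify([a.service_id, a.service_pubkey, a.scopes.slice().sort(), a.expires_at]);
}

function authorize(
  a: Attestation,
  memberPubkey: string,
  requiredScope: string,
  nowMs: number,
  verifySig: VerifySig,
): AuthResult {
  // Expiry is the graceful path: reject once expires_at is reached.
  if (nowMs >= a.expires_at) return { ok: false, reason: "expired" };
  // The owning member's key is the root of trust for the service key.
  if (!verifySig(memberPubkey, canonicalPayload(a), a.signature)) {
    return { ok: false, reason: "bad-signature" };
  }
  // Scopes bound what the service may do on the member's behalf.
  if (a.scopes.indexOf(requiredScope) === -1) return { ok: false, reason: "scope" };
  return { ok: true };
}
```

A revocation lookup would sit in front of this check in practice; it is left out here to keep the sketch focused on the attestation itself.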
|
||||||
|
|
||||||
|
### Identity revocation (explicit)
|
||||||
|
|
||||||
|
Existing v1 left this implicit. v2 makes it concrete:
|
||||||
|
|
||||||
|
- **CLI verb:** `claudemesh service revoke <service_id>` (also `claudemesh peer revoke <pubkey>` for member revocation).
|
||||||
|
- **Broker effect:** add row to `revocation` table with `(mesh_id, revoked_pubkey, revoked_at, revoked_by, reason?)`. Drop any active WS for that pubkey (close 4002 "revoked"). Reject future hellos.
|
||||||
|
- **Drain effect:** `drainForMember` checks revocation list at drain time; ciphertext-in-flight from the revoked sender is dropped (sender already broker-acked, but recipient never sees it).
|
||||||
|
- **Gossip:** revocation events publish on the `system` channel (highest priority). Online peers cache; offline peers see on reconnect. Required so P2P sessions also honor revoke (otherwise a revoked peer's stored attestations could keep working over direct paths).
|
||||||
|
- **Latency target:** <30s for online peers to receive and apply.
|
||||||
|
- **Expiry vs revoke distinction:** `expires_at` is graceful (predictable, scheduled rotation); revoke is emergency (leaked secret, fired employee, compromised host). Both use the same revocation table; `expires_at` enforces silently when reached, revoke is logged as an audit event.
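The peer-side cache and the drain-time filter described in the list above might look roughly like this. It is a sketch under assumptions: the event fields follow the `revocation` row columns, while `RevocationCache`, `filterDrain`, and the `from` field on queued messages are hypothetical names chosen for illustration.

```typescript
// Hypothetical sketch of a peer-side revocation cache fed by `system` events.
interface RevocationEvent {
  mesh_id: string;
  revoked_pubkey: string;
  revoked_at: number; // epoch milliseconds (assumed unit)
  revoked_by: string;
  reason?: string;
}

class RevocationCache {
  private revoked = new Map<string, RevocationEvent>();

  private key(meshId: string, pubkey: string): string {
    return JSON.stringify([meshId, pubkey]);
  }

  // Idempotent: replayed gossip (e.g. on reconnect) keeps the first record.
  apply(ev: RevocationEvent): void {
    const k = this.key(ev.mesh_id, ev.revoked_pubkey);
    if (!this.revoked.has(k)) this.revoked.set(k, ev);
  }

  isRevoked(meshId: string, pubkey: string): boolean {
    return this.revoked.has(this.key(meshId, pubkey));
  }
}

// Drain-time check: messages from a revoked sender are dropped even though
// the sender already received a broker ack for them.
function filterDrain<T extends { from: string }>(
  meshId: string,
  queued: T[],
  cache: RevocationCache,
): T[] {
  return queued.filter((m) => !cache.isRevoked(meshId, m.from));
}
```

The same `isRevoked` lookup is what a P2P receiver would consult before accepting a message over a direct path.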
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 2: Coordination plane (the broker, properly scoped)
|
||||||
|
|
||||||
|
The broker is **not** the data plane. Its real responsibilities:
|
||||||
|
|
||||||
|
1. **Mesh state authority** — member roster, group memberships, topic registry, service registrations, revocation list. Source of truth for who's in a mesh and what they can do.
|
||||||
|
2. **Peer discovery** — `list_peers` returns currently-online presences. Broker is the only system that knows which peers are reachable now and over which transports.
|
||||||
|
3. **Signaling for P2P upgrades** — when peer A wants to open a P2P connection to peer B, A sends an SDP offer through the broker; B responds with an SDP answer through the broker. Once the data channel is up, broker is out of the path. Same as WebRTC signaling.
|
||||||
|
4. **Offline message queue** — when recipient is offline, broker stores the (encrypted) message until they reconnect. P2P can't do this without an "always-on peer" model, which is awkward to bootstrap.
|
||||||
|
5. **Group / topic / broadcast fan-out** — broker is the cheap fan-out point. Sender publishes once; broker delivers to N recipients. P2P fan-out (gossipsub) is possible but adds significant complexity for a feature most meshes won't need at scale.
|
||||||
|
6. **TURN-style relay for NAT-blocked pairs** — when P2P negotiation fails (symmetric NAT, restrictive corporate firewall), broker carries the data. Functionally equivalent to TURN.
|
||||||
|
7. **Revocation gossip publisher** — broker pushes revocation events to all online peers via the `system` channel; peers cache them.
|
||||||
|
8. **Audit log + persistence layer** — encrypted message metadata for compliance. Bodies are E2E-encrypted, so audit is over (sender, recipient, channel, timestamp, size), not content.
|
||||||
|
|
||||||
|
The broker is **NOT**:
|
||||||
|
- The default path for online-online direct messages (P2P should win).
|
||||||
|
- The decryptor for any direct message (E2E means broker sees ciphertext only).
|
||||||
|
- A bottleneck on bulk data (file transfer, voice, screen share — these go P2P or fail).
|
||||||
|
- The sole identity authority for active sessions (P2P sessions verify attestations locally via cached mesh state).
|
||||||
|
|
||||||
|
### Two roles per mesh on the WS layer (Codex-1 correction, kept)
|
||||||
|
|
||||||
|
Within the broker's WS surface, the daemon holds two roles per mesh, not one connection per launch:
|
||||||
|
|
||||||
|
- **Control-plane connection** — one per mesh, member-keyed. Carries: signaling + outbox drain + RPCs + broadcast/member-targeted inbound + revocation gossip subscription.
|
||||||
|
- **Session connections** — N per mesh, session-keyed. Carries: presence row keyed on session pubkey + signaling for P2P upgrades involving this session + inbound for session-targeted DMs that arrive via broker fallback.
|
||||||
|
|
||||||
|
A peer who's purely on the broker (no P2P) functions exactly as today. A peer who upgrades to P2P with another peer keeps its broker WS for the other roles.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 3: Data plane (hybrid P2P + broker fallback)
|
||||||
|
|
||||||
|
The data plane is what carries actual message bodies. Three modes, selected per (sender, recipient, channel) tuple:
|
||||||
|
|
||||||
|
### Mode 1: Direct P2P (preferred when possible)
|
||||||
|
|
||||||
|
Two peers run a WebRTC data channel (or QUIC stream — pluggable, see Layer 4) between their daemons. Established via signaling through the broker; once up, broker is out of the path.
|
||||||
|
|
||||||
|
**When P2P is selected:**
|
||||||
|
- Both peers are online (have an active broker WS).
|
||||||
|
- Both peers' transports advertise P2P capability (WebRTC available; not a webhook-only service identity; not a browser without `RTCPeerConnection`).
|
||||||
|
- ICE negotiation succeeds (at least one candidate pair works: host, server-reflexive, or peer-reflexive).
|
||||||
|
- Channel type is `dm`, `rpc`, or `stream` (the 1:1 cases).
|
||||||
|
|
||||||
|
**P2P session lifecycle:**
|
||||||
|
- Established lazily on first message (warm-up cost ~200ms; dominated by ICE + DTLS handshake). Subsequent messages reuse the channel.
|
||||||
|
- Idle timeout: 5min of no traffic → tear down. Re-established on next message.
|
||||||
|
- Hard timeout: 1h max regardless of activity, then re-handshake. Limits damage of compromised session keys.
|
||||||
|
- Either side can demote to broker-relay at any time; broker is the fallback always.
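The two timeouts in the lifecycle above reduce to a small decision function. A minimal sketch follows; the 5min and 1h values come from the list, while `P2pSession`, `SessionAction`, and `nextAction` are hypothetical names.

```typescript
// Hypothetical sketch of the P2P session timeout policy.
const IDLE_TIMEOUT_MS = 5 * 60 * 1000; // 5min of no traffic
const HARD_TIMEOUT_MS = 60 * 60 * 1000; // 1h max regardless of activity

interface P2pSession {
  establishedAt: number; // epoch milliseconds
  lastTrafficAt: number; // epoch milliseconds
}

type SessionAction = "reuse" | "teardown-idle" | "rehandshake";

function nextAction(s: P2pSession, nowMs: number): SessionAction {
  // The hard timeout wins: even a busy session is re-keyed after 1h.
  if (nowMs - s.establishedAt >= HARD_TIMEOUT_MS) return "rehandshake";
  // An idle session is torn down and re-established on the next message.
  if (nowMs - s.lastTrafficAt >= IDLE_TIMEOUT_MS) return "teardown-idle";
  return "reuse";
}
```

Checking the hard timeout first is a design choice in this sketch: it keeps a chatty session from outliving its key lifetime.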
|
||||||
|
|
||||||
|
**Crypto on P2P:**
|
||||||
|
- DTLS handshake provides transport encryption (forward secrecy; recipient pubkey verified via cached attestation chain).
|
||||||
|
- Application-layer crypto_box ALSO runs on top — same as broker-relayed messages — so the wire format and decryption path are identical on the receiver side. Defense in depth, no special-case code.
|
||||||
|
|
||||||
|
### Mode 2: Broker-relayed (fallback)
|
||||||
|
|
||||||
|
The current path. Sender encrypts to recipient pubkey (member or session or service), pushes to broker via WS, broker queues, recipient pulls (or broker pushes to recipient's WS).
|
||||||
|
|
||||||
|
**When broker-relay is selected:**
|
||||||
|
- One peer offline → broker queues, delivers on reconnect.
|
||||||
|
- ICE negotiation fails → broker becomes the relay.
|
||||||
|
- Channel type is `group`, `topic`, or `broadcast` → broker fan-out is structurally cheaper than P2P fan-out for any group >2.
|
||||||
|
- Service identity at either end can't run P2P → broker is the only path.
|
||||||
|
|
||||||
|
**Crypto:** unchanged from today — E2E crypto_box, broker sees ciphertext only.
|
||||||
|
|
||||||
|
### Mode 3: Direct webhook (broker as broker, not as relay)
|
||||||
|
|
||||||
|
For service identities advertising `transport_hint: "http-webhook"`. Sender encrypts to the service's `service_pubkey` (for delegated services, the delegate daemon holds the matching secret and decrypts on the service's behalf); broker POSTs the ciphertext to the service's registered URL with HMAC signature + retry. No long-lived connection on the service side.
|
||||||
|
|
||||||
|
This is functionally a "broker queue, custom delivery transport" — broker still mediates, but delivery is HTTP not WS.
|
||||||
|
|
||||||
|
### Selection logic (deterministic, sender-side)
|
||||||
|
|
||||||
|
```
function pickTransport(sender, recipient, channel) -> Transport:
  if channel in [group, topic, broadcast]:
    return broker.relay                       # fan-out semantics

  if recipient.transport_hint == "http-webhook":
    return broker.relay                       # broker calls webhook

  if recipient is offline:
    return broker.queue                       # store-and-forward

  if !recipient.capabilities.p2p:
    return broker.relay                       # one-end can't P2P

  if !sender.capabilities.p2p:
    return broker.relay                       # we can't P2P

  if has_active_p2p_session(sender, recipient):
    return p2p.session                        # warm path

  attempt_p2p_handshake(sender, recipient, timeout=2s) ->
    if ok:   return p2p.session
    else:    return broker.relay              # fall through, log degraded
```
|
||||||
|
|
||||||
|
Policy lives in the daemon's send path. Broker doesn't know or care — it sees only the messages that actually go through it.
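For readers who prefer typed code, the selection logic can be restated in TypeScript roughly as follows. This is a sketch, not the daemon's actual code: `PeerInfo`, `Decision`, `resolveTransport`, and the injected `hasActiveP2pSession` and `handshake` callbacks are hypothetical. `"broadcast"` is included as a channel value only because the pseudocode names it.

```typescript
// Hypothetical typed restatement of the sender-side selection policy.
type Channel = "dm" | "group" | "topic" | "rpc" | "system" | "stream" | "broadcast";
type Route = "broker.relay" | "broker.queue" | "p2p.session";
type Decision = Route | "attempt-p2p";

interface PeerInfo {
  pubkey: string;
  online: boolean;
  transportHint: string; // "ws" | "http-webhook" | ...
  capabilities: { p2p: boolean };
}

// Synchronous part: every branch that needs no network round trip.
function pickTransport(
  sender: PeerInfo,
  recipient: PeerInfo,
  channel: Channel,
  hasActiveP2pSession: (a: string, b: string) => boolean,
): Decision {
  if (channel === "group" || channel === "topic" || channel === "broadcast") {
    return "broker.relay"; // fan-out semantics
  }
  if (recipient.transportHint === "http-webhook") return "broker.relay";
  if (!recipient.online) return "broker.queue"; // store-and-forward
  if (!recipient.capabilities.p2p || !sender.capabilities.p2p) return "broker.relay";
  if (hasActiveP2pSession(sender.pubkey, recipient.pubkey)) return "p2p.session";
  return "attempt-p2p";
}

// Asynchronous part: try the handshake with the 2s budget, else fall back.
async function resolveTransport(
  decision: Decision,
  handshake: (timeoutMs: number) => Promise<boolean>,
): Promise<Route> {
  if (decision !== "attempt-p2p") return decision;
  try {
    return (await handshake(2000)) ? "p2p.session" : "broker.relay";
  } catch {
    return "broker.relay"; // degraded path; a real daemon would log this
  }
}
```

Splitting the decision into a pure function and an async resolver is a choice made for this sketch so the policy can be unit-tested without a network.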
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 4: Transport adapters (pluggable)
|
||||||
|
|
||||||
|
A transport adapter is an implementation of how *one peer pair* moves bytes. Defined by an interface; new adapters added without touching upper layers.
|
||||||
|
|
||||||
|
```typescript
interface PeerTransport {
  readonly kind: string;          // "ws-broker" | "webrtc-p2p" | "http-webhook" | ...

  readonly capabilities: {
    p2p: boolean;
    bidirectional: boolean;
    midTurnPush: boolean;
    maxMessageBytes: number;
    streamingChunks: boolean;
  };

  open(opts: TransportOpenOpts): Promise<TransportSession>;
  send(envelope: Envelope): Promise<TransportSendResult>;
  inbound(): AsyncIterable<Envelope>;
  heartbeat(): Promise<boolean>;
  close(reason?: string): Promise<void>;
}
```
|
||||||
|
|
||||||
|
### Concrete adapters at end-state
|
||||||
|
|
||||||
|
1. **`WsBrokerTransport`** — current code. WebSocket to `wss://ic.claudemesh.com/ws`. Underpins both broker-relay (Mode 2) and signaling for P2P upgrades.
|
||||||
|
2. **`WebRtcP2pTransport`** — RTCPeerConnection + RTCDataChannel. Browser, Node (`node-datachannel` or similar), CLI all supported. Chunking handled at envelope layer for `stream` channel.
|
||||||
|
3. **`HttpWebhookTransport`** — outbound HTTP POST to broker `/v1/send`; inbound HTTP POST to a registered webhook URL. Unidirectional from peer's perspective. Mid-turn push: no.
|
||||||
|
4. **`LiveKitRoomTransport`** — for voice agents. Side-car bridges a LiveKit room to claudemesh. Maps a LiveKit participant → claudemesh service identity.
|
||||||
|
|
||||||
|
Future adapters TBD as concrete needs surface — no commitments here. (v1 listed MQTT/gRPC/SSE as future named adapters; v2 drops the named list per Codex-2 should-cut feedback.)
|
||||||
|
|
||||||
|
The peer's daemon advertises transport capabilities at hello time; broker stores them in the presence row; senders consult them via `list_peers` (capability fields added to the response).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 5: Channels (typed envelope)
|
||||||
|
|
||||||
|
Channels define **semantics**: what the message means, what crypto to apply, what delivery guarantees, what fan-out, what backpressure.
|
||||||
|
|
||||||
|
```typescript
type ChannelType =
  | "dm"      // 1:1 direct, encrypted to recipient pubkey, at-least-once with ack
  | "group"   // post to named group, per-recipient encrypt or symmetric, at-least-once with ack
  | "topic"   // pub/sub topic, persisted history, per-topic symmetric key, at-least-once with ack
  | "rpc"     // request/response with correlation id + timeout, exactly-once via dedupe
  | "system"  // peer_joined / peer_left / topology / lifecycle / revocation (broker-originated)
  | "stream"; // long-lived ordered chunks, idempotent per (stream_id, chunk_id)

interface Envelope {
  v: 2;
  channel: ChannelType;
  /** Routing target — meaning depends on channel:
   *    dm:     recipient pubkey (member, session, or service)
   *    group:  group name (e.g. "@admins")
   *    topic:  topic id (e.g. "#abc123")
   *    rpc:    recipient pubkey
   *    system: ignored (sender-determined fan-out; broker fills in)
   *    stream: recipient pubkey (the stream_id is in meta.streamId — see below) */
  target: string;
  /** Sender identity pubkey (member, session, or service). */
  from: string;
  /** Encrypted payload. Channel + recipient determines crypto recipe:
   *    dm/rpc/stream: crypto_box to recipient pubkey
   *    group:         per-recipient seal (or symmetric in v3)
   *    topic:         per-topic symmetric key (v0.2.0 spec)
   *    system:        broker-signed, plaintext metadata (event has no body) */
  body: { nonce: string; ciphertext: string; bodyVersion: number };
  /** Required in v2 (was optional in v1). Even minimal envelopes must carry
   *  clientMessageId for idempotent dedupe. */
  meta: {
    clientMessageId: string;           // REQUIRED — idempotency id (spec §4.2)
    requestFingerprint?: string;
    priority?: "now" | "next" | "low"; // dm: gates mid-turn push; group/topic: fan-out priority
    timeoutMs?: number;                // rpc only
    streamId?: string;                 // REQUIRED for channel:"stream"; identifies the stream
    streamChunkId?: number;            // stream only; monotonic; receiver dedupes
    streamTerminator?: boolean;        // stream only; signals end
    rpcCorrelationId?: string;         // rpc only; back-edge for response
    rpcResponse?: boolean;             // rpc only; this is a response, not request
    replyToId?: string;                // dm/topic threading
    mentions?: string[];               // dm/topic; @-callouts
    expiresAt?: number;                // any; broker drops past this; default 7d for queued
  };
  /** Sender Ed25519 signature over canonical bytes. Verified by recipient
   *  (and by broker for system-message origin). */
  signature: string;
}
```
|
||||||
|
|
||||||
|
### Stream concurrency
|
||||||
|
|
||||||
|
For `channel: "stream"`, **`meta.streamId` is required**. Two concurrent streams to the same recipient pubkey use distinct streamIds; receiver demuxes by `(from, streamId)`. Without this, multi-stream voice transcripts or file transfers from the same peer would collide.
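Receiver-side demultiplexing by `(from, streamId)` with per-chunk dedupe could be sketched as below. The `StreamChunk` fields follow the `meta` names above; the `StreamDemux` class, its in-memory buffering, and the plain-string `payload` (standing in for an already-decrypted chunk) are assumptions for illustration.

```typescript
// Hypothetical sketch: demux concurrent streams and drop duplicate chunks.
interface StreamChunk {
  from: string;
  streamId: string;
  streamChunkId: number; // monotonic within one stream
  payload: string; // decrypted chunk body (assumed already decrypted)
}

class StreamDemux {
  // One chunk map per (from, streamId) pair.
  private streams = new Map<string, Map<number, string>>();

  private key(from: string, streamId: string): string {
    return JSON.stringify([from, streamId]);
  }

  /** Returns true for a new chunk, false for a redelivered duplicate. */
  accept(c: StreamChunk): boolean {
    const k = this.key(c.from, c.streamId);
    let chunks = this.streams.get(k);
    if (!chunks) {
      chunks = new Map<number, string>();
      this.streams.set(k, chunks);
    }
    if (chunks.has(c.streamChunkId)) return false; // idempotent per (stream, chunk)
    chunks.set(c.streamChunkId, c.payload);
    return true;
  }

  /** Chunks received so far for one stream, ordered by chunk id. */
  assemble(from: string, streamId: string): string[] {
    const chunks = this.streams.get(this.key(from, streamId));
    if (!chunks) return [];
    return Array.from(chunks.entries())
      .sort((a, b) => a[0] - b[0])
      .map((entry) => entry[1]);
  }
}
```

Keying on the pair rather than on `streamId` alone means two different senders can reuse the same stream id without colliding.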
|
||||||
|
|
||||||
|
### Crypto by channel
|
||||||
|
|
||||||
|
- `dm`, `rpc`, `stream` → crypto_box(plaintext, recipient_pubkey, sender_secretkey). Sender verifies the attestation chain before encrypting to ensure recipient_pubkey is a valid identity rooted in a current member.
|
||||||
|
- `group` → for now: per-recipient crypto_box (sender encrypts N times, broker fans out). Future: hybrid Curve25519 → AES-GCM with sender key wrap, like Signal Sender Keys.
|
||||||
|
- `topic` → per-topic symmetric key (already in v0.2.0 spec). Key rotation = new topic + members re-subscribe. Keys distributed via DM at join time, encrypted to each member's pubkey.
|
||||||
|
- `system` → broker is the signer; receivers verify against the broker's published Ed25519 pubkey. Plaintext bodies allowed since these are operational events.
|
||||||
|
|
||||||
|
### Delivery semantics (Codex-2 correction applied)
|
||||||
|
|
||||||
|
**At-least-once requires receiver ack.** Today's broker sets `delivered_at = NOW()` inside the claim CTE before WS push succeeds — that's at-most-once with no retry. The end-state behavior:
|
||||||
|
|
||||||
|
1. Sender's daemon writes to outbox (durable).
|
||||||
|
2. Drain worker sends to broker; broker acks with `client_message_id` echo (this is sender → broker delivery ack, NOT end-to-end).
|
||||||
|
3. Broker queues with `claimed_at` NULL, `delivered_at` NULL.
|
||||||
|
4. On recipient hello / push opportunity: broker claims by setting `claimed_at = NOW(), claim_id = <presenceId>` (lease 30s).
|
||||||
|
5. Broker `sendToPeer` writes to WS / P2P / webhook.
|
||||||
|
6. Receiver processes envelope and emits `client_ack { clientMessageId }` back to broker.
|
||||||
|
7. Broker sets `delivered_at = NOW()` ON ACK RECEIPT.
|
||||||
|
8. If the lease expires without an ack → the row becomes eligible again, and the broker re-claims and re-delivers it.
|
||||||
|
9. Receiver dedupes by `clientMessageId` (idempotent insert into inbox).
|
||||||
|
|
||||||
|
Until the ack path is wired, the accurate label for this transitional state is **best-effort retry with idempotent dedupe**, not at-least-once. The outbox + claim/lease + dedupe combination upgrades to at-least-once when the ack path is in place.
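The claim, lease, and ack steps can be modeled in memory to make the state transitions concrete. This is an illustrative sketch only: the real broker would do this in SQL against the queue table, and `BrokerQueue`, `Inbox`, and the camelCase field names are hypothetical. The 30s lease matches the value given in step 4.

```typescript
// Hypothetical in-memory model of two-phase claim/deliver with receiver ack.
const LEASE_MS = 30 * 1000;

interface QueuedRow {
  clientMessageId: string;
  claimedAt: number | null;
  claimId: string | null;
  deliveredAt: number | null;
}

class BrokerQueue {
  private rows = new Map<string, QueuedRow>();

  enqueue(clientMessageId: string): void {
    if (this.rows.has(clientMessageId)) return; // sender retries are idempotent
    this.rows.set(clientMessageId, {
      clientMessageId,
      claimedAt: null,
      claimId: null,
      deliveredAt: null,
    });
  }

  /** Claim every undelivered row that is unclaimed or whose lease expired. */
  claim(presenceId: string, nowMs: number): string[] {
    const claimed: string[] = [];
    this.rows.forEach((row) => {
      if (row.deliveredAt !== null) return;
      const leaseExpired = row.claimedAt !== null && nowMs - row.claimedAt >= LEASE_MS;
      if (row.claimedAt === null || leaseExpired) {
        row.claimedAt = nowMs;
        row.claimId = presenceId;
        claimed.push(row.clientMessageId);
      }
    });
    return claimed;
  }

  /** delivered_at is set only when the receiver's client_ack arrives. */
  ack(clientMessageId: string, nowMs: number): void {
    const row = this.rows.get(clientMessageId);
    if (row && row.deliveredAt === null) row.deliveredAt = nowMs;
  }
}

class Inbox {
  private seen = new Set<string>();

  /** Idempotent insert: true the first time an id is seen, false on redelivery. */
  insert(clientMessageId: string): boolean {
    if (this.seen.has(clientMessageId)) return false;
    this.seen.add(clientMessageId);
    return true;
  }
}
```

A lost push then costs one lease interval: the row is re-claimed after 30s, redelivered, and the inbox discards the copy if the first delivery had in fact arrived.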
|
||||||
|
|
||||||
|
`rpc` exactly-once is the same path with the addition that the response carries the `rpcCorrelationId`; sender retries the request until response received OR `timeoutMs` elapses; receiver-side dedupe ensures the handler runs at most once.
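The receiver-side half of that guarantee, running the handler at most once while still answering retries, might look like the following sketch. `RpcDedupe` is a hypothetical name, and keying the cache on `rpcCorrelationId` is an assumption; the spec only states that receiver-side dedupe is used.

```typescript
// Hypothetical sketch: run an RPC handler at most once per correlation id,
// replaying the cached response when the sender retries.
class RpcDedupe<Res> {
  private results = new Map<string, Res>();

  handle(rpcCorrelationId: string, handler: () => Res): Res {
    if (this.results.has(rpcCorrelationId)) {
      // Retry of a request already processed: reply without re-running.
      return this.results.get(rpcCorrelationId) as Res;
    }
    const res = handler();
    this.results.set(rpcCorrelationId, res);
    return res;
  }
}
```

A production version would also bound the cache, for example by evicting entries once `timeoutMs` has elapsed, since the sender stops retrying after that point.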
|
||||||
|
|
||||||
|
### Mid-turn push
|
||||||
|
|
||||||
|
`channel: "dm"` with `meta.priority: "now"` and recipient is a launched Claude Code session → recipient's daemon emits `claude/channel` MCP push; the session's Claude Code reads it mid-turn. Other priorities deliver via `claudemesh inbox` poll or at next tool boundary.
|
||||||
|
|
||||||
|
### Reply threading + mentions
|
||||||
|
|
||||||
|
Uniform across `dm` and `topic`: `meta.replyToId` references the original message's `clientMessageId`. `meta.mentions` is an array of pubkeys (or `@<group>`) — UI/CLI surfaces them; broker doesn't enforce.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 6: Mesh state — broker authority + signed gossip
|
||||||
|
|
||||||
|
The mesh state (members, groups, topics, services, revocations, policies) needs both:
|
||||||
|
|
||||||
|
- **Authority** — single source of truth. The broker DB. Mutations (add member, revoke, change policy) go through broker, signed by mesh owner / admin.
|
||||||
|
- **Replication** — every peer needs a current-enough copy to authorize incoming P2P messages locally (otherwise revoke can't be enforced when peers chat directly).
|
||||||
|
|
||||||
|
End-state: broker publishes signed mesh-state-update events on the `system` channel; peers cache and apply. Conflict resolution is trivial because broker is authority — peers merge updates by version vector. Eventually consistent in seconds, not the open-ended convergence of CRDT-only systems.
|
||||||
|
|
||||||
|
For peer revocation specifically: revocation gossip is highest priority and must propagate within 30s to all online peers. Offline peers see it on reconnect.
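Peer-side application of signed mesh-state updates can be sketched as below. Several simplifications are assumed and not taken from the spec: a single broker-assigned, strictly increasing `version` stands in for the version vector, the update carries only a revocation list, and signature verification is an injected callback. `MeshStateUpdate` and `MeshStateCache` are hypothetical names.

```typescript
// Hypothetical sketch: apply broker-signed mesh-state updates in version order.
interface MeshStateUpdate {
  meshId: string;
  version: number; // assumed: strictly increasing per mesh
  revoked: string[]; // pubkeys revoked by this update
  signature: string; // broker signature over the update
}

type ApplyResult = "applied" | "stale" | "rejected";

class MeshStateCache {
  private version = 0;
  private revoked = new Set<string>();
  private verifyBrokerSig: (u: MeshStateUpdate) => boolean;

  constructor(verifyBrokerSig: (u: MeshStateUpdate) => boolean) {
    this.verifyBrokerSig = verifyBrokerSig;
  }

  apply(u: MeshStateUpdate): ApplyResult {
    // Only the broker is an authority, so an unsigned update is never merged.
    if (!this.verifyBrokerSig(u)) return "rejected";
    // Replays and out-of-order deliveries are ignored rather than merged.
    if (u.version <= this.version) return "stale";
    u.revoked.forEach((pk) => this.revoked.add(pk));
    this.version = u.version;
    return "applied";
  }

  isRevoked(pubkey: string): boolean {
    return this.revoked.has(pubkey);
  }
}
```

This sketch does not detect gaps between versions; a fuller design would need some way to resynchronize after a missed update.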
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Crypto — what doesn't change vs what does
|
||||||
|
|
||||||
|
### Doesn't change
|
||||||
|
- Per-peer Ed25519 keypairs (member + session + service).
|
||||||
|
- crypto_box (Curve25519 + XSalsa20 + Poly1305) for DMs/RPC/stream.
|
||||||
|
- Parent-attestation flow for sessions and services.
|
||||||
|
|
||||||
|
### Does change (additive)
|
||||||
|
- DTLS layer underneath WebRTC P2P (transport-level encryption with fingerprint binding).
|
||||||
|
- Per-topic symmetric keys (v0.2.0 baseline; v2 makes it a hard requirement for topics).
|
||||||
|
- Broker signing key for `system` channel events (single Ed25519 keypair the broker holds; pubkey published in mesh state).
|
||||||
|
- Service identity attestations carry `service_pubkey` + `scopes`.
|
||||||
|
- Forward-secrecy for long-lived P2P sessions: post-handshake, derive a fresh symmetric key per session epoch (1h max); rotate.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migration order (architectural milestones, NO time estimates)
|
||||||
|
|
||||||
|
The end-state above doesn't ship in one PR. The following ordering minimizes regression risk and lets each milestone be useful on its own. **No weeks/sprints attached** — work proceeds when the prior milestone is stable.
|
||||||
|
|
||||||
|
### Milestone 1 — Foundational correctness
|
||||||
|
*Required before anything else. Without this, every later milestone inherits the bugs.*
|
||||||
|
|
||||||
|
- Extract `connectWsWithBackoff` helper. Refactor `DaemonBrokerClient` and `SessionBrokerClient` to use it. Eliminates the drift bug class.
|
||||||
|
- Drop daemon's stray `sessionPubkey` field (or rename + document).
|
||||||
|
- Tighten daemon-WS inbound filter — `*` broadcasts and member-targeted DMs only; session-targeted DMs land on session WS exclusively.
|
||||||
|
- Add `presence.role` column at broker (`control-plane | session | service`); list_peers + fan-out + reconnect honor it.
|
||||||
|
- **Fix broker drain race** — schema migration adds `claimed_at`, `claim_id`, `claim_expires_at` columns. Rewrite `drainForMember` for two-phase claim/deliver. Re-claim if `claimed_at` older than lease (30s).
|
||||||
|
- Receiver-side `client_ack` for at-least-once with ack (Codex-2 correction). Without ack wiring this stays at "best-effort retry with idempotent dedupe."
|
||||||
|
- Receiver-side dedupe: idempotent insert on `clientMessageId`; finish the implementation and make the field required for v2 envelopes.
|
||||||
|
|
||||||
|
### Milestone 2 — Capability advertisement + transport abstraction
|
||||||
|
*Sets up the interface. No new transport yet.*
|
||||||
|
|
||||||
|
- Define `PeerTransport` interface; refactor existing WS code to be the first implementation. No behavioral change.
|
||||||
|
- Add capabilities field to hello payload + presence row + `list_peers` response.
|
||||||
|
- Define `Envelope v2` schema with `meta` required + `streamId` requirement on `stream` channel. Broker accepts both v1 and v2 (v1 auto-upgraded server-side by inferring `channel` from `targetSpec` shape). Senders start emitting v2.
|
||||||
|
|
||||||
|
### Milestone 3 — Service identity + HTTP webhook transport
|
||||||
|
*First non-WS transport. Validates abstraction. Includes revocation.*
|
||||||
|
|
||||||
|
- Service identity registration: `claudemesh service register --type webhook --pubkey <hex> --scopes ...` mints attestation, stores broker-side. Service pubkey explicit in attestation.
|
||||||
|
- Service revocation: `claudemesh service revoke <service_id>` writes broker denylist + closes any active connections + publishes `system` revocation event.
|
||||||
|
- Add `HttpWebhookTransport` (broker-side outbound: POST with HMAC + retry; daemon-side inbound: HTTP server receives webhook callbacks → handleBrokerPush).
|
||||||
|
- Add `/v1/send` HTTP POST endpoint on broker (today broker is WS-only for sends).
|
||||||
|
- Demo: cron job using only `curl` posts to mesh; webhook subscriber receives.
|
||||||
|
- (`SseTransport` deferred — Codex-2 should-cut feedback. Pull in when concrete browser need arises.)
|
||||||
|
|
||||||
|
### Milestone 4 — Typed channels: rpc, stream, system
|
||||||
|
*Channel layer becomes real.*
|
||||||
|
|
||||||
|
- `channel: "rpc"` end-to-end: correlation id routing through any transport, response timeout, `claudemesh rpc <peer> <method> <args>` CLI verb.
|
||||||
|
- `channel: "stream"` end-to-end: chunked + ordered + idempotent, multi-stream demux via `meta.streamId`, `claudemesh stream <peer> <stream-id>` CLI verb.
|
||||||
|
- `channel: "system"` formalized (broker-signed events for peer_joined, peer_left, topology, revocation, mesh-state-updates).
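
To illustrate what "chunked + ordered + idempotent" with demux via `meta.streamId` could mean on the receiving side, here is a small sketch. It assumes each chunk carries a 0-based sequence number (`seq` is a hypothetical field name), and `StreamDemux` is an illustrative class, not part of the codebase.

```typescript
interface StreamChunk {
  streamId: string; // meta.streamId in the envelope
  seq: number; // assumed 0-based chunk sequence number
  data: string;
}

class StreamDemux {
  // Per stream: next sequence number to release, plus early arrivals.
  private readonly streams = new Map<string, { next: number; pending: Map<number, string> }>();

  /** Accepts a chunk and returns whatever can now be released in order. */
  accept(chunk: StreamChunk): string[] {
    let state = this.streams.get(chunk.streamId);
    if (state === undefined) {
      state = { next: 0, pending: new Map<number, string>() };
      this.streams.set(chunk.streamId, state);
    }
    // Idempotent: drop chunks already released or already buffered.
    if (chunk.seq < state.next || state.pending.has(chunk.seq)) {
      return [];
    }
    state.pending.set(chunk.seq, chunk.data);

    const released: string[] = [];
    let ready = state.pending.get(state.next);
    while (ready !== undefined) {
      released.push(ready);
      state.pending.delete(state.next);
      state.next += 1;
      ready = state.pending.get(state.next);
    }
    return released;
  }
}
```

Because state is keyed by `streamId`, two concurrent streams to the same peer keep independent sequence counters and cannot collide.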
|
||||||
|
|
||||||
|
### Milestone 5 — P2P data plane (WebRTC adapter)
|
||||||
|
*The big architectural shift. Broker becomes coordinator, not data path.*
|
||||||
|
|
||||||
|
- Add `WebRtcP2pTransport` adapter. Uses `node-datachannel` (or libdatachannel binding) on Node; native WebRTC in browser.
|
||||||
|
- Add signaling protocol over the existing broker WS:
|
||||||
|
- `p2p_offer` (sender → broker → recipient): SDP offer + ICE candidates.
|
||||||
|
- `p2p_answer` (recipient → broker → sender): SDP answer + ICE candidates.
|
||||||
|
- `p2p_candidate` (either way): trickle ICE candidates.
|
||||||
|
- All signaling messages are broker-attested (only valid sender/recipient pairs).
|
||||||
|
- Add `pickTransport()` policy in daemon send path.
|
||||||
|
- Add P2P session manager: warm-cache, idle timeout, hard timeout, demote-to-broker on failure.
|
||||||
|
- Tag broker-relayed messages that *could have* gone P2P with a metric, so degradation rate is observable.
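
One possible shape for the `pickTransport()` policy, including the "could have gone P2P" metric tag, is sketched below. `PeerLinkState`, its fields, and the 60s idle timeout are assumptions made for the example; the spec does not fix them.

```typescript
type TransportChoice = "p2p" | "broker";

// Hypothetical view of what the daemon knows about a recipient at send time.
interface PeerLinkState {
  peerSupportsP2p: boolean; // from the peer's advertised capabilities
  sessionState: "none" | "connecting" | "open" | "failed";
  lastActivityMs: number; // timestamp of last traffic on the P2P session
}

const IDLE_TIMEOUT_MS = 60000; // assumed value for illustration only

function pickTransport(
  link: PeerLinkState,
  nowMs: number,
): { choice: TransportChoice; couldHaveBeenP2p: boolean } {
  const idle = nowMs - link.lastActivityMs > IDLE_TIMEOUT_MS;
  if (link.peerSupportsP2p && link.sessionState === "open" && !idle) {
    return { choice: "p2p", couldHaveBeenP2p: true };
  }
  // Demote to the broker. If the peer advertises P2P, flag the miss so the
  // degradation rate stays observable.
  return { choice: "broker", couldHaveBeenP2p: link.peerSupportsP2p };
}
```

A send that returns `choice: "broker"` with `couldHaveBeenP2p: true` is the case the metric in the last bullet is meant to count.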
|
||||||
|
|
||||||
|
### Milestone 6 — Mesh state replication + revocation gossip
|
||||||
|
*Required before P2P is safe at scale.*
|
||||||
|
|
||||||
|
- Broker publishes signed `system` events for all mesh state mutations.
|
||||||
|
- Peers subscribe; cache and apply.
|
||||||
|
- Revocation propagation latency target: <30s for online peers.
|
||||||
|
- P2P sessions verify peer identity against cached state on every message (cheap, just a map lookup).
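
The "cache and apply" plus per-message lookup could be sketched as follows. The event shapes are hypothetical and assume the broker signature has already been verified before `apply` is called; `MeshStateCache` is an illustrative name.

```typescript
// Hypothetical shapes for broker-signed system events, after signature checks.
type MeshStateEvent =
  | { kind: "peer_joined"; pubkey: string; memberId: string }
  | { kind: "revocation"; pubkey: string };

class MeshStateCache {
  private readonly active = new Map<string, string>(); // pubkey -> memberId
  private readonly revoked = new Set<string>();

  apply(ev: MeshStateEvent): void {
    if (ev.kind === "revocation") {
      this.revoked.add(ev.pubkey);
      this.active.delete(ev.pubkey);
    } else if (!this.revoked.has(ev.pubkey)) {
      // A revoked key never becomes active again, even if events
      // arrive out of order.
      this.active.set(ev.pubkey, ev.memberId);
    }
  }

  /** Per-message check on a P2P session: a single map lookup. */
  isTrusted(pubkey: string): boolean {
    return this.active.has(pubkey);
  }
}
```

Keeping a separate revoked set is a design choice in this sketch: it makes revocation sticky when a late `peer_joined` event is replayed after the revocation.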
|
||||||
|
|
||||||
|
### Milestone 7 — External integrations (proof points, parallel)
|
||||||
|
*One PoC per category to validate the architecture, opportunistically.*
|
||||||
|
|
||||||
|
- LiveKit side-car (validates LiveKit room transport).
|
||||||
|
- OpenAI Assistant (validates delegated-key crypto + webhook transport).
|
||||||
|
- WhatsApp / Slack bridge (validates human-bridge service identity).
|
||||||
|
- Browser SDK (validates browser as a peer; uses WebRTC adapter natively).
|
||||||
|
|
||||||
|
### Milestone 8 — Group/topic crypto upgrade
|
||||||
|
*Group fan-out crypto efficiency.*
|
||||||
|
|
||||||
|
- Sender Keys protocol for group: sender derives group key, encrypts content once, encrypts group key per-recipient. Avoids N-way encryption per message.
|
||||||
|
- Per-topic key rotation policy (member join → optional re-key; member leave → forced re-key).
|
||||||
|
|
||||||
|
### Beyond Milestone 8
|
||||||
|
- Future transport adapters as concrete needs surface (no commitments).
|
||||||
|
- Multi-broker federation (mesh spans multiple brokers; gossip across).
|
||||||
|
- Onion routing option for adversarial environments.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Non-goals (explicit)
|
||||||
|
|
||||||
|
- **Replacing Slack / Discord / Matrix as a human chat product.** claudemesh is for agent coordination; humans participate via bridges or direct DMs but UX is CLI-first.
|
||||||
|
- **Pure-P2P with no central coordinator.** The broker stays — for offline queue, group fan-out, mesh authority, revocation. "P2P-first hybrid" is the commitment, not "P2P-only."
|
||||||
|
- **Replacing the MCP `claude/channel` push-pipe.** Mid-turn interrupt stays MCP. The data-plane changes don't touch the daemon-to-Claude-Code path.
|
||||||
|
- **Real-time media (audio/video) directly in claudemesh data channels.** Bandwidth-heavy media goes through dedicated stacks (LiveKit, WebRTC SFU). claudemesh metadata + signaling glues them.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
1. **Mid-turn push when sender is on P2P session.** P2P delivery to recipient's daemon → daemon emits MCP push. Same shape as broker-delivered. Confirm the MCP push respects per-session targeting (different session pubkey siblings of the same member).
|
||||||
|
|
||||||
|
2. **Browser peers and NAT traversal.** Browser ↔ browser via WebRTC works. Browser ↔ daemon (Node WebRTC binding) — needs testing under symmetric NAT. May require running a STUN server (Google's for now; eventually self-hosted). TURN fallback uses the broker WS.
|
||||||
|
|
||||||
|
3. **Backpressure on stream channel.** WebRTC data channels have built-in flow control. Broker-relayed streams need per-stream backpressure signaling to avoid OOM at the broker. Proposal: receiver advertises `stream_window_bytes` periodically; sender pauses once the advertised window is used up.
|
||||||
|
|
||||||
|
4. **Multi-region brokers.** Today single broker. If we add a second broker (or federation), how do peers in mesh A on broker 1 talk to peers in mesh A on broker 2? Out of scope here; separate spec when forced.
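
For the backpressure proposal in question 3, a sender-side sketch of the window accounting might look like this. It assumes each `stream_window_bytes` advertisement is treated as fresh credit that replaces whatever remained; that interpretation, and the `StreamCredit` class itself, are assumptions rather than settled design.

```typescript
class StreamCredit {
  private remainingBytes: number;

  constructor(initialWindowBytes: number) {
    this.remainingBytes = initialWindowBytes;
  }

  /** Receiver advertised stream_window_bytes; treat it as fresh credit. */
  onWindowAdvertised(streamWindowBytes: number): void {
    this.remainingBytes = streamWindowBytes;
  }

  /** True if the chunk fits the window; false means the sender should pause. */
  trySend(chunkBytes: number): boolean {
    if (chunkBytes > this.remainingBytes) {
      return false;
    }
    this.remainingBytes -= chunkBytes;
    return true;
  }
}
```

A sender would call `trySend` before each chunk and hold the chunk locally when it returns `false`, resuming after the next advertisement.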
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Acknowledgements
|
||||||
|
|
||||||
|
**Codex-1 (initial architecture review of existing code) caught:**
|
||||||
|
- "Remove daemon-WS inbound entirely" idea silently loses broadcasts + member-targeted DMs whenever zero launches exist. Corrected → retained.
|
||||||
|
- Inheritance for the dup'd lifecycle would become a god class. Composition via helper kept.
|
||||||
|
- Drain race needs `claimed_at` + delivered-on-success; "check OPEN before claim" still drops on crash. Kept.
|
||||||
|
- Token-keyed registry is correct (token = auth boundary), not a smell. Kept.
|
||||||
|
|
||||||
|
**Codex-2 (single-pass review of v1 of this spec) caught:**
|
||||||
|
- At-least-once requires receiver ack, not just "set delivered_at on success." → Layer 5 delivery semantics rewritten to require client_ack.
|
||||||
|
- Service identity needs explicit `service_pubkey` field, included in attestation. → Added to ServiceIdentity definition.
|
||||||
|
- v2 envelope `meta` should be non-optional with `clientMessageId` always present. → meta is now required.
|
||||||
|
- Service identity needed explicit revocation/disable story. → New CLI verb `claudemesh service revoke`, broker denylist, system-channel gossip propagation.
|
||||||
|
- `streamId` location ambiguous; concurrent streams to same peer would collide. → `meta.streamId` made REQUIRED for `channel: "stream"`.
|
||||||
|
- Defer `SseTransport` from Milestone 3. → Done.
|
||||||
|
- Drop named future-adapter list (MQTT/gRPC) to avoid false commitments. → Done.
|
||||||
|
|
||||||
|
The hybrid P2P data plane, transport adapter abstraction, typed channel envelope, mesh state replication, and milestone reordering are mine. Codex's reviews were targeted at correctness/scope-gap/should-cut, not redesign.
|
||||||
|
|
||||||
|
**This spec is now frozen for implementation.** No further architectural drift; deviations during implementation surface as new spec-deltas with explicit rationale, not silent edits to this document.
|
||||||
360
.artifacts/specs/2026-05-04-agentic-comms-architecture.md
Normal file
@@ -0,0 +1,360 @@
|
|||||||
|
---
|
||||||
|
title: claudemesh as agentic communication platform — architecture spec
|
||||||
|
status: draft
|
||||||
|
target: 2.0.0 (foundational cleanup) → 2.1.0 (transport adapters) → 2.2.0 (channel typing)
|
||||||
|
author: Alejandro + Claude (cross-checked with Codex GPT-5.2)
|
||||||
|
date: 2026-05-04
|
||||||
|
supersedes: none
|
||||||
|
references:
|
||||||
|
- 2026-05-02-architecture-north-star.md (CLI-first commitment, push-pipe)
|
||||||
|
- 2026-05-04-per-session-presence.md (per-launch session pubkey + attestation)
|
||||||
|
- apps/cli/CHANGELOG.md (1.30.0–1.32.1 history)
|
||||||
|
---
|
||||||
|
|
||||||
|
# claudemesh as agentic communication platform
|
||||||
|
|
||||||
|
## TL;DR
|
||||||
|
|
||||||
|
Today claudemesh is a **peer mesh for Claude Code sessions** — broker + CLI + per-session WS, encrypted DMs, peer list, mid-turn push via MCP. Tomorrow it has to be a **transport-agnostic agentic communication platform** that:
|
||||||
|
|
||||||
|
- treats Claude Code as **one channel type** among many (with first-class support for mid-turn interrupts via `claude/channel`)
|
||||||
|
- accepts **non-Claude agents** as peers — voice agents (LiveKit/Pipecat), OpenAI Assistants, raw HTTP webhook consumers, scheduled cron actors, human IM bridges
|
||||||
|
- exposes **typed channels** (DM, group, topic, RPC, system event, stream) so message semantics aren't shoved through one `targetSpec` string
|
||||||
|
- has a **pluggable transport layer** so a peer can join the mesh over WS, HTTP webhook, SSE, MQTT, or gRPC without changing the broker's data plane
|
||||||
|
- preserves **end-to-end encryption** as a non-negotiable for direct messages
|
||||||
|
|
||||||
|
This document specifies the architecture in three layers (identity, transport, channel), the foundational cleanup needed before adding any of it (Codex caught a few sharp issues), and the migration path that gets us there without a "v2 rewrite" event.
|
||||||
|
|
||||||
|
The CLI-first commitment from the North Star spec stays intact — every channel type and transport adapter must be invocable from `claudemesh <verb>` first, with MCP serving only `claude/channel` push.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Why now
|
||||||
|
|
||||||
|
Three forcing functions:
|
||||||
|
|
||||||
|
1. **Multi-session interconnect already broke** (1.30.0 → 1.32.1). The per-session WS subsystem shipped without a push handler because the architecture assumed "one daemon WS per mesh handles everything" and then we bolted session WSes on top without finishing the inbound side. The shape is right; the wiring was incomplete. We need to formalize the role split before adding more transports.
|
||||||
|
|
||||||
|
2. **Codex review surfaced a correctness bug in the broker's drain.** `drainForMember` claims rows by setting `delivered_at = NOW()` *before* the WS push succeeds. If `ws.readyState !== OPEN` at push time, the row is marked delivered and the message is gone. This is at-most-once with no retry. Any future channel type or transport adapter inherits this bug if we don't fix it at the foundation.
|
||||||
|
|
||||||
|
3. **The agentic-comms market is becoming a thing.** Voice agents (LiveKit, Pipecat, ElevenLabs Conversational), OpenAI Assistants threads, MCP servers acting as autonomous workers, scheduled cron actors — they all need a "mesh" to coordinate. claudemesh has the right primitives (E2E crypto, peer presence, typed routing); it just needs the architecture to admit non-Claude peers without forking the codebase.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Audience for this architecture
|
||||||
|
|
||||||
|
| Peer type | Identity | Transport | Channels they speak |
|
||||||
|
|---|---|---|---|
|
||||||
|
| **Claude Code session** (today) | Per-launch session pubkey, parent-attested by member key | WS to broker | DM, group, topic, system events; receives mid-turn push via MCP `claude/channel` |
|
||||||
|
| **Headless agent** (e.g. cron job, Hermes/OpenClaw worker) | Member pubkey (no per-launch session) | WS to broker, OR HTTP webhook outbound | DM, group, topic; no mid-turn push (polls inbox) |
|
||||||
|
| **Voice agent** (LiveKit/Pipecat call) | Service identity (signed by mesh owner) | WS to broker, possibly via TURN relay | DM (transcript stream), group (call participants), system events (call lifecycle) |
|
||||||
|
| **OpenAI Assistant / Anthropic Agent** (Skill SDK) | Service identity, OAuth-style scoped token | HTTP webhook (server-side push) OR WS | DM, RPC (tool-style request/response) |
|
||||||
|
| **Human via Slack/WhatsApp bridge** | Service identity for the bridge, end-user mapped via membership | WS (bridge to broker) | DM, topic |
|
||||||
|
| **Webhook consumer** (Stripe-style passive listener) | Service identity, scoped to one channel | HTTP webhook outbound only | Topic (subscribe to events) |
|
||||||
|
|
||||||
|
Every row in this table needs to work without changing the broker's data plane.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 1: Identity
|
||||||
|
|
||||||
|
### Today
|
||||||
|
|
||||||
|
Two identity types coexist:
|
||||||
|
|
||||||
|
- **Member identity** — stable Ed25519 keypair held in `~/.claudemesh/config.json`. One per joined mesh. Used for hello signature on the daemon's main WS; used as the cryptographic root of trust for sibling sessions.
|
||||||
|
- **Session identity** — ephemeral Ed25519 keypair generated per `claudemesh launch`. Parent-signed attestation vouches for it (TTL 12h, broker cap 24h). Used for hello signature on the per-session WS; used as the routing key for DMs targeted at *this specific launched session*.
|
||||||
|
|
||||||
|
This is enough for Claude Code peers. It's not enough for the audience table above.
|
||||||
|
|
||||||
|
### Proposed: third identity type — **service identity**
|
||||||
|
|
||||||
|
A service identity is what a non-Claude integration uses to authenticate:
|
||||||
|
|
||||||
|
```
|
||||||
|
ServiceIdentity {
|
||||||
|
member_id // The mesh member that owns this service (auth boundary)
|
||||||
|
service_id // Stable id for the service ("openai-assistant-foo", "livekit-room-bar")
|
||||||
|
service_type // "openai-assistant" | "livekit-room" | "webhook" | "voice-agent" | ...
|
||||||
|
scopes // ["dm:read", "topic:write", "rpc:invoke", ...]
|
||||||
|
attestation // member-signed: { service_id, scopes, expires_at, signature }
|
||||||
|
transport_hint // "ws" | "http-webhook" | "sse" — informs how the broker reaches it
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Three identity types, one auth model:**
|
||||||
|
- All identities resolve to a `member_id` (the auth boundary — grants, kicks, bans operate on members).
|
||||||
|
- Identities differ in *liveness* (member = always; session = per-launch; service = scoped/scheduled) and in *transport hint* (member/session = WS-resident; service = polymorphic).
|
||||||
|
|
||||||
|
**Backward compatibility:** existing member + session identities are unchanged. Service identity is additive.
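
As an illustration of how `scopes` and `expires_at` from the attestation might gate an operation, consider the sketch below. It deliberately omits signature verification, and the camelCase field names and `isAllowed` helper are hypothetical.

```typescript
interface ServiceAttestation {
  serviceId: string;
  scopes: string[];
  expiresAt: number; // epoch ms; the spec only names the field expires_at
}

/**
 * Scope and expiry check only. A real check would first verify the
 * member signature over the attestation, which this sketch skips.
 */
function isAllowed(att: ServiceAttestation, requiredScope: string, nowMs: number): boolean {
  if (nowMs >= att.expiresAt) {
    return false;
  }
  return att.scopes.indexOf(requiredScope) !== -1;
}
```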
|
||||||
|
|
||||||
|
### Cryptographic implications
|
||||||
|
|
||||||
|
- E2E encryption (`crypto_box`) targets a public key. Member pubkey, session pubkey, service pubkey all work the same way.
|
||||||
|
- A service that can't hold a long-lived secret (e.g. OpenAI Assistant calling out via HTTPS) gets a **delegated identity** the daemon holds — sender encrypts to the daemon's per-member key, daemon re-encrypts and forwards over the service's webhook. This adds trust in the daemon, but it's the only way to bridge to non-crypto-native peers without giving them raw secrets.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 2: Transport
|
||||||
|
|
||||||
|
### Today
|
||||||
|
|
||||||
|
One transport: **WebSocket to broker** (`wss://ic.claudemesh.com/ws`). Everything goes through it — hello, send, push, RPC. The CLI's daemon holds two WS instances per mesh (member-keyed `DaemonBrokerClient` + per-launch `SessionBrokerClient`).
|
||||||
|
|
||||||
|
### Proposed: transport adapter interface
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
interface BrokerTransport {
|
||||||
|
/** One-time hello + auth handshake. Identity is opaque to the transport. */
|
||||||
|
connect(opts: TransportConnectOpts): Promise<TransportSession>;
|
||||||
|
|
||||||
|
/** Send a typed envelope. Returns a delivery promise (ack or terminal failure). */
|
||||||
|
send(envelope: Envelope): Promise<SendResult>;
|
||||||
|
|
||||||
|
/** Stream of inbound envelopes. Pull-model so a transport can be a webhook,
|
||||||
|
* not just a long-lived socket. */
|
||||||
|
inbound(): AsyncIterable<Envelope>;
|
||||||
|
|
||||||
|
/** Close cleanly. */
|
||||||
|
close(reason?: string): Promise<void>;
|
||||||
|
|
||||||
|
/** Capabilities surfaced to the daemon — broker uses this to decide
|
||||||
|
* whether mid-turn push is possible, whether RPC blocks are
|
||||||
|
* supported, etc. */
|
||||||
|
capabilities: TransportCapabilities;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Concrete adapters at v2.1.0:**
|
||||||
|
|
||||||
|
1. **`WsBrokerTransport`** — current WS implementation. The `DaemonBrokerClient` and `SessionBrokerClient` are recast as two roles using this transport with different hello payloads.
|
||||||
|
2. **`HttpWebhookTransport`** — for service identities that can't hold a WS open. Outbound: HTTP POST to the broker's `/v1/send`. Inbound: broker calls back to a registered webhook URL with retry + signature. Mid-turn push is not possible (degrades gracefully).
|
||||||
|
3. **`SseTransport`** — for browsers / restricted environments. Outbound: HTTP POST. Inbound: SSE stream from broker to client.
|
||||||
|
|
||||||
|
**Future adapters (v2.3+):**
|
||||||
|
|
||||||
|
4. **`LiveKitTransport`** — for voice agents. The "broker" is a LiveKit room; messages are LiveKit data-channel packets. Bridges to the central broker via a daemon side-car.
|
||||||
|
5. **`MqttTransport`** — for IoT / fleet scenarios.
|
||||||
|
6. **`GrpcTransport`** — for low-latency intra-cluster.
|
||||||
|
|
||||||
|
Any new adapter implements the same interface; broker logic is transport-agnostic at the API boundary.
|
||||||
|
|
||||||
|
### The two-role model (Codex's correction)
|
||||||
|
|
||||||
|
Even within one transport, the daemon holds **two roles per mesh**, not one connection per launch:
|
||||||
|
|
||||||
|
- **Control-plane connection** — one per mesh, member-keyed. Carries: outbox drain (one queue, can't race), `list_peers`/state/memory/skill RPCs, inbound for `*` broadcasts and member-targeted DMs (legacy traffic + zero-launch state).
|
||||||
|
- **Session connections** — N per mesh, session-keyed. Carries: presence row keyed on session pubkey, inbound for session-targeted DMs.
|
||||||
|
|
||||||
|
This is what we have today; the spec just makes the role split explicit. The mistake in 1.30.0–1.32.0 was treating session connections as "presence-only" instead of "second-class peers." 1.32.1 corrects that.
|
||||||
|
|
||||||
|
### Foundational cleanup (ship first, before any new transport)
|
||||||
|
|
||||||
|
1. **Extract `connectWsWithBackoff` helper** — current `DaemonBrokerClient` and `SessionBrokerClient` duplicate the WS lifecycle (open, hello, ack-timeout, close, backoff, reconnect). Codex's recommendation: composition, not inheritance. A single helper takes `{ url, buildHello, onMessage, onStatusChange }` and both clients call it. Eliminates the drift bug class that produced session_replaced thrashing.
|
||||||
|
|
||||||
|
2. **Drop the daemon's stray `sessionPubkey`** (`apps/cli/src/daemon/broker.ts:113`). It's a leftover from the era when the daemon WS was the only WS. The session role now owns session pubkeys. If we want the daemon itself to be addressable by a stable pubkey, rename it `daemonPubkey` and document it; today it's dead ballast.
|
||||||
|
|
||||||
|
3. **Tighten daemon-WS inbound filter, don't remove it** (Codex's correction to my prior take). Daemon WS should still receive `*` broadcasts and member-targeted DMs (legacy senders, zero-launch state). It should NOT decrypt session-targeted DMs (that's the session WS's job, and decryption requires the session secret which the daemon WS doesn't have anyway).
|
||||||
|
|
||||||
|
4. **Fix the broker drain race** (`apps/broker/src/broker.ts:2399-2402`). Add `claimed_at` + `claim_id` columns; claim sets `claimed_at = NOW()` (NOT `delivered_at`); push runs; `delivered_at = NOW()` is set ONLY after `ws.send` succeeds. Re-eligible if `claimed_at` is older than the lease timeout (e.g. 30s). Combined with `client_message_id` dedupe on the receiver side, this gives at-least-once semantics, which is what an agentic comms platform needs.
|
||||||
|
|
||||||
|
5. **Decouple presence-WS-role from session-WS-role at the broker.** Today `connectPresence` is called from both `handleHello` and `handleSessionHello`. The two paths diverge in identity (member vs session pubkey) but share a dedup key (sessionId in both cases). Make the role explicit on the presence row (`role: "control-plane" | "session" | "service"`) so list_peers, fan-out, and reconnect can reason about it. Hidden `claudemesh-daemon` rows in 1.32.0's `peer list` are a hack covering for missing typing.
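
The two-phase claim/deliver flow from item 4 can be sketched against an in-memory queue. This is an illustration of the lease logic only, not the broker's SQL: `QueueRow`, `claimRows`, and `markDelivered` are hypothetical names, and the 30s lease is the example value from the text.

```typescript
interface QueueRow {
  id: string;
  claimedAt: number | null;
  claimId: string | null;
  deliveredAt: number | null;
}

const LEASE_MS = 30000; // lease timeout from the text (e.g. 30s)

/** Phase 1: claim undelivered rows that are unclaimed or whose lease expired. */
function claimRows(rows: QueueRow[], claimId: string, nowMs: number): QueueRow[] {
  const claimed: QueueRow[] = [];
  for (const row of rows) {
    if (row.deliveredAt !== null) continue;
    const leaseExpired = row.claimedAt !== null && nowMs - row.claimedAt > LEASE_MS;
    if (row.claimedAt === null || leaseExpired) {
      row.claimedAt = nowMs;
      row.claimId = claimId;
      claimed.push(row);
    }
  }
  return claimed;
}

/** Phase 2: mark delivered only if the push succeeded and we still hold the claim. */
function markDelivered(row: QueueRow, claimId: string, pushSucceeded: boolean, nowMs: number): boolean {
  if (!pushSucceeded || row.claimId !== claimId) {
    return false;
  }
  row.deliveredAt = nowMs;
  return true;
}
```

If the broker crashes between the two phases, the row keeps `deliveredAt = null` and becomes claimable again once the lease runs out, which is the redelivery path the crash-mid-push test is meant to exercise.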
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Layer 3: Channels
|
||||||
|
|
||||||
|
### Today
|
||||||
|
|
||||||
|
One channel type: **direct messages with target-spec routing**. `targetSpec` is a string that the broker pattern-matches:
|
||||||
|
- `<64-hex-pubkey>` → DM to that member or session
|
||||||
|
- `*` → broadcast to mesh
|
||||||
|
- `@<groupname>` → group post
|
||||||
|
- `#<topicId>` → topic post
|
||||||
|
|
||||||
|
This works but it's overloaded — the same `send` verb covers DMs, broadcasts, groups, topics, and (since v0.9) tagged messages. As we add agentic peers, the semantics matter and the routing key string can't carry them.
|
||||||
|
|
||||||
|
### Proposed: typed channel envelope
|
||||||
|
|
||||||
|
```typescript
|
||||||
|
type ChannelType =
|
||||||
|
| "dm" // 1:1 message, encrypted to recipient pubkey
|
||||||
|
| "group" // post to named group, encrypted per-recipient (today: base64 plaintext)
|
||||||
|
| "topic" // pub/sub topic, persisted, history available, per-topic symmetric key
|
||||||
|
| "rpc" // request/response, correlation id, timeout, structured result
|
||||||
|
| "system" // peer_joined / peer_left / topology / lifecycle events
|
||||||
|
| "stream"; // long-lived data stream (voice transcript, log tail, file transfer chunks)
|
||||||
|
|
||||||
|
interface Envelope {
|
||||||
|
/** Schema version. v1 = current opaque shape. v2 = this typed shape. */
|
||||||
|
v: 2;
|
||||||
|
/** What semantics the receiver should apply. */
|
||||||
|
channel: ChannelType;
|
||||||
|
/** Target — pubkey for dm, group name for group, topic id for topic, etc.
|
||||||
|
* Same wire format as today's targetSpec, but typed. */
|
||||||
|
target: string;
|
||||||
|
/** Sender identity (member, session, or service pubkey). */
|
||||||
|
from: string;
|
||||||
|
/** Encrypted payload + crypto envelope. Channel type drives crypto:
|
||||||
|
* - dm: crypto_box to recipient pubkey
|
||||||
|
* - group: per-recipient seal (today: plaintext)
|
||||||
|
* - topic: symmetric key (today: plaintext, v0.2.0+ adds per-topic key)
|
||||||
|
* - rpc / system / stream: same as DM (crypto_box) */
|
||||||
|
body: { nonce: string; ciphertext: string; bodyVersion: number };
|
||||||
|
/** Optional metadata, varies by channel type. */
|
||||||
|
meta?: {
|
||||||
|
/** Stable client-supplied id for dedupe (existing field, made required for v2). */
|
||||||
|
clientMessageId: string;
|
||||||
|
/** Sender's canonical fingerprint per spec §4.4 (existing field). */
|
||||||
|
requestFingerprint?: string;
|
||||||
|
/** dm/group: priority gate (now/next/low). rpc: timeout_ms. stream: chunk_id. */
|
||||||
|
priority?: "now" | "next" | "low";
|
||||||
|
timeoutMs?: number;
|
||||||
|
streamChunkId?: number;
|
||||||
|
/** dm/topic: replyTo for threading. */
|
||||||
|
replyToId?: string;
|
||||||
|
/** topic: mentions list (existing field). */
|
||||||
|
mentions?: string[];
|
||||||
|
/** rpc: correlation back-edge so the broker can route the response. */
|
||||||
|
rpcCorrelationId?: string;
|
||||||
|
};
|
||||||
|
/** Sender signature over (channel, target, from, nonce, ciphertext, meta). */
|
||||||
|
signature?: string;
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
**Why this matters for agentic peers:**
|
||||||
|
|
||||||
|
- A voice agent sending a partial transcript wants `channel: "stream"` semantics — high-frequency, small chunks, idempotent, no per-message ack required.
|
||||||
|
- An OpenAI Assistant calling a tool wants `channel: "rpc"` — request-response with timeout, correlation back-edge so the response routes.
|
||||||
|
- A scheduled cron actor reporting completion wants `channel: "topic"` — fire-and-forget, persisted history.
|
||||||
|
- Today all of these get bolted onto `dm` with conventions; v2 envelope makes them first-class.
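
To make the `rpc` semantics concrete (correlation back-edge plus timeout), here is a sketch of a caller-side pending table. It takes the clock as a parameter so the logic stays deterministic; `PendingRpcTable` and `RpcOutcome` are hypothetical names, and results are modeled as plain strings for brevity.

```typescript
type RpcOutcome = { state: "ok"; result: string } | { state: "timeout" };

class PendingRpcTable {
  private readonly pending = new Map<string, { deadlineMs: number; settle: (o: RpcOutcome) => void }>();

  /** Register an outgoing request under its correlation id. */
  track(correlationId: string, timeoutMs: number, nowMs: number, settle: (o: RpcOutcome) => void): void {
    this.pending.set(correlationId, { deadlineMs: nowMs + timeoutMs, settle });
  }

  /** Route a response back via its correlation id; unknown or late ids are ignored. */
  onResponse(correlationId: string, result: string): boolean {
    const entry = this.pending.get(correlationId);
    if (entry === undefined) {
      return false;
    }
    this.pending.delete(correlationId);
    entry.settle({ state: "ok", result });
    return true;
  }

  /** Call periodically; settles every request whose deadline has passed. */
  sweep(nowMs: number): number {
    let expired = 0;
    for (const [id, entry] of Array.from(this.pending.entries())) {
      if (nowMs >= entry.deadlineMs) {
        this.pending.delete(id);
        entry.settle({ state: "timeout" });
        expired += 1;
      }
    }
    return expired;
  }
}
```

Each request settles exactly once: either the response arrives first and removes the entry, or the sweep removes it and a late response is dropped.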
|
||||||
|
|
||||||
|
### Claude Code channels — first-class support
|
||||||
|
|
||||||
|
Three specific channel features for Claude Code:
|
||||||
|
|
||||||
|
1. **Mid-turn interrupt** (`claude/channel` push). Already implemented via the MCP push-pipe. The new envelope makes it explicit: `channel: "dm"` with `meta.priority: "now"` triggers MCP push to a launched session. Other priorities deliver at next inbox poll.
|
||||||
|
|
||||||
|
2. **Reply threading** (`meta.replyToId`). Already partially supported on topics; v2 makes it work uniformly across `dm` and `topic`. The receiver Claude Code session sees a structured reply thread instead of flat history.
|
||||||
|
|
||||||
|
3. **Mentions** (`meta.mentions`). Already supported on topics; v2 surfaces them on `dm` too — useful for `@<peer>` callouts in groups even when the message body is encrypted.
|
||||||
|
|
||||||
|
### Backward compatibility
|
||||||
|
|
||||||
|
Envelope v1 (today's shape) stays accepted by the broker until v3.x. v1 envelopes are auto-upgraded server-side: `channel` is inferred from the `targetSpec` shape (`*` → group/broadcast, `@` → group, `#` → topic, hex → dm). Existing CLIs keep working.
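
The server-side inference could be sketched as a small pure function. The `@` prefix mapping to `group` is an assumption drawn from the target-spec list in the "Today" section, and `inferChannel` is a hypothetical name.

```typescript
type InferredChannel = "dm" | "group" | "topic";

/** Infer a v2 channel from a v1 targetSpec string; null if the shape is unrecognized. */
function inferChannel(targetSpec: string): InferredChannel | null {
  if (targetSpec === "*") return "group"; // broadcast maps to group
  if (targetSpec.charAt(0) === "@") return "group"; // assumed: named groups map to group
  if (targetSpec.charAt(0) === "#") return "topic";
  if (/^[0-9a-f]{64}$/i.test(targetSpec)) return "dm"; // 64-hex pubkey
  return null;
}
```

Returning `null` for unknown shapes lets the broker reject a malformed v1 envelope instead of guessing a channel.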
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Future integrations (concrete)
|
||||||
|
|
||||||
|
These are not part of v2.0 — they're the test cases the architecture must support:
|
||||||
|
|
||||||
|
### LiveKit voice agent
|
||||||
|
- Service identity: `livekit-room-<id>`, signed by mesh owner.
|
||||||
|
- Transport: dedicated daemon side-car hosts a LiveKit participant; data-channel packets bridge to the central broker via WS.
|
||||||
|
- Channels: `stream` for transcript chunks, `system` for call lifecycle (joined/left/muted), `dm` for sidebar text.
|
||||||
|
- E2E: per-call ephemeral keypair held by the side-car; participants' member keys are discovered via mesh peer list.
|
||||||
|
|
||||||
|
### OpenAI Assistant integration
|
||||||
|
- Service identity: `openai-assistant-<id>`, scoped to one or more topics + RPC.
|
||||||
|
- Transport: HTTP webhook out (broker → assistant API), HTTP POST in (assistant → broker `/v1/send`).
|
||||||
|
- Channels: `rpc` for tool-style invocations from claudemesh peers, `topic` for assistant-published events.
|
||||||
|
- Crypto: delegated to daemon (assistant can't hold a libsodium secret; daemon re-encrypts on its behalf).
|
||||||
|
|
||||||
|
### Generic webhook consumer (Stripe-style)
|
||||||
|
- Service identity: `webhook-<consumer-id>`, scoped to subscribed topics.
|
||||||
|
- Transport: HTTP webhook out only. No inbound — it's a passive sink.
|
||||||
|
- Channels: `topic` only.
|
||||||
|
- Crypto: not E2E; webhook bodies are signed (HMAC-SHA256, sender = mesh) but plaintext.
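
A minimal sketch of HMAC-SHA256 signing and verification for webhook bodies, using Node's built-in `crypto` module. The shared secret, the hex encoding, and the helper names are assumptions; the spec does not define how the signature is transported or how secrets are provisioned.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

/** Sign the raw body with a shared secret; returns a hex digest. */
function signWebhookBody(secret: string, body: string): string {
  return createHmac("sha256", secret).update(body, "utf8").digest("hex");
}

/** Verify a hex signature against the raw body in constant time. */
function verifyWebhookBody(secret: string, body: string, signatureHex: string): boolean {
  const expected = Buffer.from(signWebhookBody(secret, body), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```

The consumer should verify against the exact bytes it received, before any JSON parsing, since re-serializing the body can change it and invalidate the signature.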
|
||||||
|
|
||||||
|
### Human-via-WhatsApp bridge
|
||||||
|
- Service identity: `whatsapp-bridge`, with member-mapping for each end-user.
|
||||||
|
- Transport: WS (bridge holds long connection to broker), bridges to WhatsApp Business API.
|
||||||
|
- Channels: `dm` (1:1 chat → WhatsApp DM), `topic` (claudemesh topic → WhatsApp group).
|
||||||
|
- E2E: bridge holds a per-end-user delegated key; not "true" E2E to the WhatsApp side, but signaled clearly in UX.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Migration plan
|
||||||
|
|
||||||
|
### v2.0.0 — Foundational cleanup (no new external surface)
|
||||||
|
**Target: 1–2 weeks**
|
||||||
|
|
||||||
|
- [ ] Extract `connectWsWithBackoff` helper, refactor `DaemonBrokerClient` + `SessionBrokerClient` to use it.
|
||||||
|
- [ ] Drop daemon's stray `sessionPubkey` (or rename + document).
|
||||||
|
- [ ] Tighten daemon-WS inbound filter (broadcast + member-targeted only).
|
||||||
|
- [ ] Add `presence.role` column (`control-plane | session | service`); broker fan-out + list_peers honor it.
|
||||||
|
- [ ] **Fix drain race**: schema migration adds `claimed_at`, `claim_id`, `claim_expires_at` columns; rewrite `drainForMember` for two-phase claim/deliver; add re-claim path for stale leases.
|
||||||
|
- [ ] Receiver-side: harden `client_message_id` dedupe (already partial in 1.32.x; finish for at-least-once). Add idempotent insert that returns existing row on conflict.
|
||||||
|
|
||||||
|
**Success criteria:**
|
||||||
|
- Two-session smoke test still passes (1.32.1 baseline).
|
||||||
|
- Crash-mid-push test: kill broker between claim and send; verify message redelivers on broker restart + recipient reconnect.
|
||||||
|
- Reconnect storm test: 100 reconnect cycles per session over 60s; zero message loss.
|
||||||
|
|
||||||
|
### v2.1.0 — Transport adapter interface
|
||||||
|
**Target: 2–3 weeks after v2.0.0**
|
||||||
|
|
||||||
|
- [ ] Define `BrokerTransport` interface; refactor existing WS code to be the first implementation.
|
||||||
|
- [ ] Add `HttpWebhookTransport` adapter (broker side: outbound HTTP POST with retry + HMAC signature; daemon side: HTTP server that receives webhook callbacks and inserts into inbox).
|
||||||
|
- [ ] Add `/v1/send` HTTP endpoint on the broker (today the broker is WS-only for sends).
|
||||||
|
- [ ] Service identity registration flow: `claudemesh service register --type webhook --scopes dm:read,topic:write` mints attestation, stores it locally + on broker.
|
||||||
|
- [ ] Basic `SseTransport` for browser/CI use cases.
|
||||||
|
|
||||||
|
**Success criteria:**
|
||||||
|
- A scheduled cron job using only `curl` can send to the mesh (no daemon required).
|
||||||
|
- A webhook consumer subscribed to a topic receives messages within 5s of post.
|
||||||
|
|
||||||
|
### v2.2.0 — Typed channels (envelope v2)
|
||||||
|
**Target: 2–3 weeks after v2.1.0**
|
||||||
|
|
||||||
|
- [ ] Define `Envelope v2` schema; broker accepts both v1 and v2; sender-side code emits v2.
|
||||||
|
- [ ] `channel: "rpc"` end-to-end: correlation id routing, response timeout, `claudemesh rpc <peer> <method> <args>` CLI verb.
|
||||||
|
- [ ] `channel: "stream"` end-to-end: chunked delivery, ordered, idempotent, `claudemesh stream <peer> <stream-id>` CLI verb.
|
||||||
|
- [ ] Mid-turn push (`claude/channel`) honors `channel: "dm"` with `meta.priority: "now"` only.
|
||||||
|
- [ ] Mentions + replyToId surface uniformly across dm and topic.
|
||||||
|
|
||||||
|
**Success criteria:**
|
||||||
|
- Demo: a Claude Code session sends an `rpc` to another Claude Code session, gets a structured response.
|
||||||
|
- Demo: a voice-agent prototype sends `stream` chunks; another peer receives them in order with no gaps.
|
||||||
|
|
||||||
|
### v2.3+ — Concrete external integrations
|
||||||
|
**Target: opportunistic**
|
||||||
|
|
||||||
|
- LiveKit side-car (one PoC integration to validate the architecture).
|
||||||
|
- OpenAI Assistant integration (validate delegated-key crypto path).
|
||||||
|
- WhatsApp bridge (validate human-bridge service identity).
|
||||||
|
|
||||||
|
These are not on the critical path for the architecture; they prove it.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Non-goals (explicit)
|
||||||
|
|
||||||
|
- **Replacing Slack / Discord.** claudemesh is for agent coordination. Human chat is a side-effect, not the headline.
|
||||||
|
- **Federation across multiple brokers.** v2.0 stays single-broker per mesh. Multi-broker (gossip / federation) is a separate spec, post-v3.
|
||||||
|
- **Sync-only / no-broker P2P.** Direct peer-to-peer (without the central broker) is a different architecture (libp2p, Iroh). Not in scope.
|
||||||
|
- **Replacing the MCP push-pipe.** Mid-turn interrupt stays MCP-based. The transport-adapter layer is broker-side; MCP is daemon-to-Claude-Code, untouched.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
1. **How does a service identity prove liveness?** WS gives us implicit liveness via the connection. HTTP webhook services need an explicit heartbeat / health-check. Proposal: broker periodically POSTs to `<webhook>/health`; service is marked offline after 3 consecutive failures.
|
||||||
|
|
||||||
|
2. **RPC routing through offline peers — what's the failure mode?** If `claudemesh rpc <peer> ...` and the peer is offline, do we (a) queue and wait (DM semantics) or (b) fail fast (REST semantics)? Proposal: RPC fails fast with `peer_offline` after a 5s probe; explicit `--wait` flag opts into DM-style queue.
|
||||||
|
|
||||||
|
3. **Per-topic symmetric key rotation.** Existing v0.2.0 spec mentions per-topic keys. Rotation policy (when, who triggers, how members re-sync) is unsolved. Defer to a separate spec; v2.2.0 ships with one-shot keys (rotate by re-creating topic).
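
The liveness proposal in question 1 (offline after 3 consecutive failed health checks) amounts to a small counter per service. The sketch below is illustrative only; `LivenessTracker` and its method names are hypothetical, and the HTTP probe itself is left out.

```ts
// Threshold taken from the proposal in question 1.
const OFFLINE_AFTER_FAILURES = 3;

// Hypothetical helper: counts consecutive failed health checks per service.
class LivenessTracker {
  private readonly failures = new Map<string, number>();

  // Record one health-check result; returns true while the service counts as online.
  record(serviceId: string, ok: boolean): boolean {
    // A single success resets the streak; a failure extends it.
    const count = ok ? 0 : (this.failures.get(serviceId) ?? 0) + 1;
    this.failures.set(serviceId, count);
    return count < OFFLINE_AFTER_FAILURES;
  }
}
```

Because the counter resets on any success, only an unbroken run of three failures marks the service offline.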
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Acknowledgements
|
||||||
|
|
||||||
|
Cross-checked with Codex (GPT-5.2, high reasoning) on the foundational cleanup section. Codex caught:
|
||||||
|
- The "remove daemon-WS inbound entirely" idea would silently lose broadcasts + member-targeted DMs whenever zero launches exist. Corrected.
|
||||||
|
- Inheritance for the duplicated lifecycle would become a god class. Composition via a helper is the right call.
|
||||||
|
- The drain race needs a `claimed_at` + delivered-on-success fix; "check OPEN before claim" still drops on crash.
|
||||||
|
- Token-keyed registry is correct (token = auth boundary), not a smell.
|
||||||
|
|
||||||
|
The agentic-comms / typed-channels / transport-adapter layers are mine — Codex didn't touch those because the question I asked was about the existing architecture's smells, not the future roadmap.
|
||||||
282
.artifacts/specs/2026-05-04-per-session-presence.md
Normal file
@@ -0,0 +1,282 @@
|
|||||||
|
# Per-session broker presence — daemon-multiplexed
|
||||||
|
|
||||||
|
**Status:** spec, queued for 1.30.0 (alongside launch-wizard refactor).
|
||||||
|
**Owner:** alezmad
|
||||||
|
**Author:** Claude (Sprint A planning, 2026-05-04)
|
||||||
|
**Related:** `2026-05-04-v2-roadmap-completion.md` (Sprint A overview),
|
||||||
|
1.29.0 session-registry CHANGELOG entry.
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
After 1.28.0 dropped the bridge tier, **launched `claude` sessions have
|
||||||
|
no persistent broker presence**. Only the daemon does.
|
||||||
|
|
||||||
|
Concretely: two `claudemesh launch` sessions in the same cwd, querying
|
||||||
|
`peer list` 2 s apart, **never see each other**. Each `claudemesh peer
|
||||||
|
list` opens a short-lived cold-path WS that creates a `presence` row
|
||||||
|
for the duration of the query and tears it down. The "this session"
|
||||||
|
row everyone sees in their own snapshot is created by the snapshot
|
||||||
|
itself; sibling sessions' queries miss it because their WS-lifetimes
|
||||||
|
don't overlap.
|
||||||
|
|
||||||
|
Confirmed empirically (2026-05-04, same-cwd ECIJA-Intranet test):
|
||||||
|
|
||||||
|
| Snapshot | Timestamp | Self pubkey | Self `connectedAt` |
|---|---|---|---|
| Session A | 11:42:37Z | `61d96106cb499208` | 11:42:38Z (= query time) |
| Session B | 11:42:39Z | `ce77188aba02827d` | 11:42:38Z (= query time) |
|
||||||
|
|
||||||
|
Each saw 5 long-lived peers (the daemon and unrelated other sessions)
|
||||||
|
plus its own ephemeral row. Neither saw the other.
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
|
||||||
|
Every launched `claude` session has a long-lived broker presence row
|
||||||
|
**owned by the daemon**, identified by the session's per-launch
|
||||||
|
keypair. Siblings see each other in `peer list` immediately and
|
||||||
|
continuously, not as snapshot artifacts.
|
||||||
|
|
||||||
|
## Non-goals
|
||||||
|
|
||||||
|
- Cross-machine session sync (waiting on 2.0.0 HKDF identity).
|
||||||
|
- Replacing the daemon's own presence row — the daemon stays as a
|
||||||
|
separate row for "the user on this machine, no specific session."
|
||||||
|
- Persistence of the session-presence link across daemon restarts:
  a daemon restart may require launched sessions to re-register
  (the same compromise as the in-memory session registry from 1.29.0).
|
||||||
|
|
||||||
|
## Design
|
||||||
|
|
||||||
|
### State machine
|
||||||
|
|
||||||
|
The 1.29.0 session registry already tracks `Map<token, SessionInfo>`
|
||||||
|
inside the daemon. Extend it to own a per-session broker connection.
|
||||||
|
|
||||||
|
```
|
||||||
|
session lifecycle:
|
||||||
|
POST /v1/sessions/register
|
||||||
|
→ registry.set(token, info)
|
||||||
|
→ daemon.openSessionWs(info) ← NEW
|
||||||
|
→ broker creates presence row owned by session.pubkey
|
||||||
|
|
||||||
|
DELETE /v1/sessions/:token
|
||||||
|
→ registry.delete(token)
|
||||||
|
→ daemon.closeSessionWs(token) ← NEW
|
||||||
|
→ broker marks presence.disconnectedAt = now()
|
||||||
|
|
||||||
|
reaper (30 s tick): pid dead?
|
||||||
|
→ registry.delete(token)
|
||||||
|
→ daemon.closeSessionWs(token)
|
||||||
|
```
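
The lifecycle above could be wired into the registry roughly like this. It is a sketch, not the daemon's actual code: the `SessionInfo` fields, the `SessionWsManager` interface, and the `isPidAlive` callback are assumptions made for illustration, and the delete-then-close order simply follows the lifecycle listing.

```ts
// Assumed shapes; the real SessionInfo lives in the 1.29.0 registry.
interface SessionInfo {
  token: string;
  pid: number;
  sessionPubkey: string;
}

interface SessionWsManager {
  openSessionWs(info: SessionInfo): void;
  closeSessionWs(token: string): void;
}

class SessionRegistry {
  private readonly sessions = new Map<string, SessionInfo>();
  private readonly ws: SessionWsManager;
  private readonly isPidAlive: (pid: number) => boolean;

  constructor(ws: SessionWsManager, isPidAlive: (pid: number) => boolean) {
    this.ws = ws;
    this.isPidAlive = isPidAlive;
  }

  // POST /v1/sessions/register
  register(info: SessionInfo): void {
    this.sessions.set(info.token, info);
    this.ws.openSessionWs(info);
  }

  // DELETE /v1/sessions/:token (no-op for unknown tokens)
  unregister(token: string): void {
    if (!this.sessions.delete(token)) return;
    this.ws.closeSessionWs(token);
  }

  // Reaper tick (every 30 s in the spec): drop sessions whose process died.
  reap(): string[] {
    const dead = Array.from(this.sessions.values()).filter((s) => !this.isPidAlive(s.pid));
    for (const s of dead) this.unregister(s.token);
    return dead.map((s) => s.token);
  }
}
```

Routing the reaper through `unregister` keeps a single code path for closing a session WS, whether the session exits cleanly or its process dies.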
|
||||||
|
|
||||||
|
### Daemon-side: per-session `BrokerClient`
|
||||||
|
|
||||||
|
Today the daemon holds `Map<meshSlug, DaemonBrokerClient>` (one WS per
|
||||||
|
attached mesh). Add a parallel `Map<token, SessionBrokerClient>` for
|
||||||
|
the per-launch ephemeral connections.
|
||||||
|
|
||||||
|
`SessionBrokerClient` is the existing `BrokerClient` reused, configured
|
||||||
|
with the session's per-launch keypair instead of the member's stable
|
||||||
|
keypair. It registers presence (`presence_join`) and stays connected
|
||||||
|
until `closeSessionWs(token)` fires. It does **not** drain the outbox
|
||||||
|
— that's the member-keypair `DaemonBrokerClient`'s job. It only carries
|
||||||
|
presence + receives DMs targeted at the session pubkey.
|
||||||
|
|
||||||
|
### Broker-side: parent-vouched presence auth
|
||||||
|
|
||||||
|
Today's broker accepts hello-sig auth where:
|
||||||
|
- Caller signs the broker's nonce with their `mesh_member` keypair.
|
||||||
|
- Broker looks up `mesh_member.peer_pubkey == sig.pubkey`.
|
||||||
|
|
||||||
|
For per-session keypairs, the session pubkey is **not** in `mesh_member`
|
||||||
|
— it's freshly generated by `claudemesh launch`. We need a new
|
||||||
|
attestation flow:
|
||||||
|
|
||||||
|
```
|
||||||
|
hello {
|
||||||
|
type: "session_hello",
|
||||||
|
session_pubkey: <fresh keypair>,
|
||||||
|
parent_member_pubkey: <member keypair from config>,
|
||||||
|
display_name, cwd, role, groups,
|
||||||
|
parent_signature: ed25519_sign(member_priv,
|
||||||
|
"claudemesh-session/" || session_pubkey || "/" || nonce),
|
||||||
|
nonce_challenge: <broker nonce>,
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Broker validates:
|
||||||
|
1. `parent_member_pubkey` exists in `mesh.member` for the target mesh.
|
||||||
|
2. `parent_signature` validates against `parent_member_pubkey` over the
|
||||||
|
canonical message above.
|
||||||
|
3. Broker inserts a presence row keyed on `session_pubkey` but
|
||||||
|
`member_id` pointing at the parent member's `mesh.member.id`.
|
||||||
|
|
||||||
|
This is the OAuth-style refresh-vs-access pattern: the parent member
|
||||||
|
key vouches "this ephemeral session pubkey belongs to me." The broker
|
||||||
|
binds the row to the parent member but uses the session pubkey for
|
||||||
|
routing (so DMs targeted at the session pubkey land at this WS).
|
||||||
|
|
||||||
|
### CLI-side: launch.ts produces the parent signature
|
||||||
|
|
||||||
|
`claudemesh launch` already mints the session keypair and writes the
|
||||||
|
session-token file. Extend it to also produce a `parent_signature`
|
||||||
|
that the daemon can present when opening the session WS:
|
||||||
|
|
||||||
|
```ts
const sessionPubkey = sessionKeypair.publicKey;
const parentSig = ed25519_sign(
  mesh.secretKey,
  Buffer.concat([
    Buffer.from("claudemesh-session/"),
    sessionPubkey,
    Buffer.from("/"),
    /* nonce comes from broker, handled at WS-connect time */
  ]),
);
```
|
||||||
|
|
||||||
|
This cannot be signed at launch time: the nonce is broker-issued at
hello time, so the signature must be produced fresh on every WS
connect. One option is for the `POST /v1/sessions/register` body to
carry the *member secret key* (or a derived signing capability) so the
daemon can sign nonces on behalf of the session.
|
||||||
|
|
||||||
|
That's a key-leak risk. Better: register carries a **pre-signed
|
||||||
|
attestation** good for a TTL window:
|
||||||
|
|
||||||
|
```
|
||||||
|
register body adds:
|
||||||
|
parent_attestation: {
|
||||||
|
session_pubkey: hex,
|
||||||
|
parent_member_pubkey: hex,
|
||||||
|
expires_at: ISO,
|
||||||
|
signature: ed25519_sign(member_priv,
|
||||||
|
"claudemesh-session-attest/" ||
|
||||||
|
session_pubkey || "/" ||
|
||||||
|
expires_at),
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
Daemon presents this attestation in `session_hello`; broker validates
|
||||||
|
expiry and signature, then issues a nonce challenge that the daemon
|
||||||
|
can satisfy with the session keypair (which IS held by the daemon
|
||||||
|
for the lifetime of the registration). Two-stage: parent vouches the
|
||||||
|
session; session signs the nonce.
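
The two-stage check could look roughly like the sketch below on the broker side. Several things here are assumptions rather than part of the spec: keys and signatures are treated as hex strings, the canonical message is built by plain string concatenation, the result codes are invented, and `VerifyFn` is an injected ed25519 verifier so that no particular crypto library is implied.

```ts
// Injected ed25519 verifier (hypothetical signature; plug in any library).
type VerifyFn = (publicKeyHex: string, message: string, signatureHex: string) => boolean;

interface ParentAttestation {
  session_pubkey: string;       // hex
  parent_member_pubkey: string; // hex
  expires_at: string;           // ISO timestamp
  signature: string;            // hex, made with the member key
}

const MAX_TTL_MS = 24 * 60 * 60 * 1000; // 24h bound on attestation lifetime

// Canonical message from the register body: prefix, session pubkey, expiry.
function attestationMessage(a: ParentAttestation): string {
  return `claudemesh-session-attest/${a.session_pubkey}/${a.expires_at}`;
}

type HelloResult =
  | "ok" | "unknown_member" | "expired" | "ttl_too_long"
  | "bad_attestation" | "bad_nonce_signature";

function validateSessionHello(
  att: ParentAttestation,
  nonce: string,
  nonceSignature: string,
  isMember: (pubkeyHex: string) => boolean,
  verify: VerifyFn,
  nowMs: number,
): HelloResult {
  if (!isMember(att.parent_member_pubkey)) return "unknown_member";
  const expiresMs = Date.parse(att.expires_at);
  if (Number.isNaN(expiresMs) || expiresMs <= nowMs) return "expired";
  if (expiresMs > nowMs + MAX_TTL_MS) return "ttl_too_long";
  // Stage 1: the parent member key vouches for the session key.
  if (!verify(att.parent_member_pubkey, attestationMessage(att), att.signature)) {
    return "bad_attestation";
  }
  // Stage 2: the session key answers the broker's nonce challenge.
  if (!verify(att.session_pubkey, nonce, nonceSignature)) return "bad_nonce_signature";
  return "ok";
}
```

In this shape the member secret key is only needed at launch time; the daemon holds just the attestation and the session key.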
|
||||||
|
|
||||||
|
### Registry persistence
|
||||||
|
|
||||||
|
For now, in-memory only (matching 1.29.0). Daemon restart drops all
|
||||||
|
session WSes; launched `claude` processes are responsible for
|
||||||
|
re-registering on next CLI invocation. Acceptable v1 behaviour;
|
||||||
|
revisit when sqlite persistence lands for the registry.
|
||||||
|
|
||||||
|
## Wire changes
|
||||||
|
|
||||||
|
### Broker
|
||||||
|
|
||||||
|
- New `session_hello` message type (additive; existing `hello` for
|
||||||
|
member auth unchanged).
|
||||||
|
- `presence` row schema unchanged — `member_id` still required, but
|
||||||
|
`session_pubkey` differs from member's stable pubkey.
|
||||||
|
- Validate `parent_attestation.expires_at <= now() + 24h` to bound
|
||||||
|
attestation reuse.
|
||||||
|
|
||||||
|
### Daemon
|
||||||
|
|
||||||
|
- New `SessionBrokerClient` factory — wraps `BrokerClient` with
|
||||||
|
session-mode hello.
|
||||||
|
- `Map<token, SessionBrokerClient>` alongside the existing
|
||||||
|
`Map<slug, DaemonBrokerClient>`.
|
||||||
|
- IPC routes:
|
||||||
|
- `POST /v1/sessions/register` — extend body schema with
|
||||||
|
`parent_attestation`.
|
||||||
|
- `DELETE /v1/sessions/:token` — close the session WS first, then
|
||||||
|
drop registry entry.
|
||||||
|
|
||||||
|
### CLI (`claudemesh launch`)
|
||||||
|
|
||||||
|
- Mint session keypair (today only writes the session token; need to
|
||||||
|
add ed25519 keypair generation per launch and write the privkey
|
||||||
|
alongside the token).
|
||||||
|
- Sign `parent_attestation` with the member key from the joined-mesh
|
||||||
|
config.
|
||||||
|
- POST register with both the new keypair and the attestation.
|
||||||
|
|
||||||
|
## LoC estimate
|
||||||
|
|
||||||
|
- Daemon `SessionBrokerClient` + registry hook: ~120 LoC.
|
||||||
|
- IPC route schema extension + validation: ~40 LoC.
|
||||||
|
- Broker `session_hello` handler + tests: ~140 LoC.
|
||||||
|
- CLI `claudemesh launch` keypair + attestation: ~60 LoC.
|
||||||
|
- Tests + smoke: ~80 LoC.
|
||||||
|
|
||||||
|
Total: **~440 LoC** across CLI + daemon + broker.
|
||||||
|
|
||||||
|
## Risks
|
||||||
|
|
||||||
|
| Risk | Mitigation |
|---|---|
| Member private key never leaves the user's machine, but the **attestation** (signed token) can be replayed within its TTL. | TTL bound 24h; refresh on launch; revocation path = drop the parent member's mesh enrollment (nuclear, but works). |
| Cascading WS connections: N launches = N+1 broker WSes per user. | Acceptable up to 10–20 concurrent sessions; if it ever becomes a problem, multiplex per-session at the protocol level (one WS, multiple presence rows). Out of scope for v1. |
| Daemon restart kills all session WSes: `peer list` from inside a launched session sees the remaining 5 peers but not its own siblings until they re-register. | Same as 1.29.0 registry. The registry could persist to sqlite later; for v1, accepted. |
| Broker schema cost: every new presence row has a different `session_pubkey`, growing the table faster. | Already accepted: broker prunes disconnected rows on a 30-day window. Per-session keys triple the row count at peak but stay within the prune budget. |
|
||||||
|
|
||||||
|
## Compatibility
|
||||||
|
|
||||||
|
- **Older brokers** can't validate `session_hello`. Sessions will
|
||||||
|
attempt the new hello, get back `unknown_message_type`, and fall
|
||||||
|
  back to the existing member-keyed hello (no per-session presence,
  but everything still works as in 1.28.0). Add the broker change first,
|
||||||
|
let it deploy, then ship the CLI side.
|
||||||
|
- **Older CLIs** continue to work unchanged — they don't open
|
||||||
|
per-session WSes. They appear as ephemeral cold-path rows just like
|
||||||
|
today, and lose the symmetric-visibility property between siblings.
|
||||||
|
- **Mixed-version visibility:** users on 1.30.0+ on the same mesh as users on
|
||||||
|
≤1.29.x will see the older users as one row (their daemon) instead
|
||||||
|
of one row per session. Acceptable — opt-in to the new visibility
|
||||||
|
by upgrading.
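
The older-broker fallback described in the first bullet reduces to one decision after the `session_hello` reply. The sketch below assumes a reply shape (`ok` plus an optional `error` string) and step names that are not defined in this spec; only the `unknown_message_type` error code comes from the text above.

```ts
// Assumed reply shape for a hello attempt.
interface HelloReply {
  ok: boolean;
  error?: string;
}

type NextStep = "connected" | "fallback_to_member_hello" | "fail";

// Decide what to do after the broker answers a session_hello.
function afterSessionHello(reply: HelloReply): NextStep {
  if (reply.ok) return "connected";
  // Only an older broker that does not know the message type triggers the
  // fallback; other errors (bad signature, expired attestation) must surface.
  return reply.error === "unknown_message_type" ? "fallback_to_member_hello" : "fail";
}
```

Falling back only on `unknown_message_type` avoids silently downgrading a session whose attestation was rejected for a real reason.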
|
||||||
|
|
||||||
|
## Sequencing
|
||||||
|
|
||||||
|
1. **Broker change ships first.** Add `session_hello` handler, deploy,
|
||||||
|
bake for ~24h. No CLI behaviour change yet.
|
||||||
|
2. **Daemon `SessionBrokerClient` ships next** behind a feature flag
|
||||||
|
(`CLAUDEMESH_SESSION_PRESENCE=1`). Manually test with two launched
|
||||||
|
sessions in the same cwd; verify both see each other.
|
||||||
|
3. **CLI keypair-mint + attestation in `launch.ts` ships last**, behind
|
||||||
|
the same flag.
|
||||||
|
4. Flip the flag default in 1.30.0 release; document rollback via env.
|
||||||
|
|
||||||
|
## Verification
|
||||||
|
|
||||||
|
End-to-end smoke (paste into 1.30.0's CHANGELOG):
|
||||||
|
|
||||||
|
```
|
||||||
|
$ # In two different shells, both cd ~/Desktop/foo:
|
||||||
|
$ claudemesh launch --name SessionA -y # shell 1
|
||||||
|
$ claudemesh launch --name SessionB -y # shell 2
|
||||||
|
$
|
||||||
|
$ # In a third shell:
|
||||||
|
$ claudemesh peer list --json --mesh foo | jq '.[] | {n: .displayName, c: .cwd}'
|
||||||
|
{ "n": "SessionA", "c": "/.../foo" } ← persistent, not query-induced
|
||||||
|
{ "n": "SessionB", "c": "/.../foo" }
|
||||||
|
$
|
||||||
|
$ # In SessionA's shell:
|
||||||
|
$ claudemesh peer list --mesh foo
|
||||||
|
should include SessionB.
|
||||||
|
$
|
||||||
|
$ # Kill SessionB (Ctrl-C in shell 2). Wait up to 30 s (one reaper tick).
|
||||||
|
$ claudemesh peer list --mesh foo
|
||||||
|
should NOT include SessionB (reaper closed its WS).
|
||||||
|
```
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
- Should the per-session WS also drain *its own* outbox subset, or stay
|
||||||
|
presence-only? Recommend presence-only for v1 — keeps state machines
|
||||||
|
simple, daemon's member-keyed WS handles all sends. Can be revisited
|
||||||
|
when per-session policy DSL ships.
|
||||||
|
- Should the parent attestation be revocable mid-session? Could add an
|
||||||
|
IPC route on the daemon. Out of scope for v1; revoke = drop the
|
||||||
|
whole member enrollment.
|
||||||
288
.artifacts/specs/2026-05-04-session-capabilities.md
Normal file
@@ -0,0 +1,288 @@
|
|||||||
|
# Session capabilities — first-class concept
|
||||||
|
|
||||||
|
**Status:** spec, queued behind v0.3.0 topic-encryption work.
|
||||||
|
**Owner:** alezmad
|
||||||
|
**Author:** Claude (Sprint B follow-up, 2026-05-04)
|
||||||
|
**Related:** `2026-04-15-per-peer-capabilities.md` (existing per-peer
|
||||||
|
caps system, member-keyed), `2026-05-04-per-session-presence.md`
|
||||||
|
(per-launch session presence — what we're now restricting).
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
Per-peer capability grants (`apps/broker/src/index.ts:2178+, 2309+`)
|
||||||
|
are keyed on the sender's **stable member pubkey**. The grant model
|
||||||
|
gives the recipient fine-grained control: "alice can DM me",
|
||||||
|
"bob can read state but not broadcast", etc.
|
||||||
|
|
||||||
|
But: as of v1.30.0 (`per-session-presence`), every `claudemesh
|
||||||
|
launch` mints a per-launch ephemeral keypair with a parent attestation
|
||||||
|
binding it to the member identity. The launched session inherits **all**
|
||||||
|
the member's capabilities transitively, because cap enforcement always
|
||||||
|
falls through to the member key.
|
||||||
|
|
||||||
|
Concretely:
|
||||||
|
|
||||||
|
- Member `alice` is in mesh `flexicar`, granted `dm + state-read +
|
||||||
|
state-write` by everyone.
|
||||||
|
- Alice launches a session with `claudemesh launch` to do an automated
|
||||||
|
task — say, run a Claude Code agent that iterates over PRs.
|
||||||
|
- That session has full member privileges. It can DM peers, write
|
||||||
|
shared state keys (e.g. clobber `current-pr`), grant new caps, ban
|
||||||
|
members, etc. — none of which the user wanted to delegate.
|
||||||
|
|
||||||
|
There is no way to express "this session can DM peers but cannot
|
||||||
|
deploy services or grant caps." The parent attestation is a binary
|
||||||
|
existence proof — "this session was vouched by a member" — with no
|
||||||
|
capability subset.
|
||||||
|
|
||||||
|
Plus an adjacent footgun: `set_state` (`apps/broker/src/index.ts:2949`)
|
||||||
|
has **no cap check at all**. Anyone in the mesh can write any key. The
|
||||||
|
spec at `2026-04-15-per-peer-capabilities.md` lists `state-write` as a
|
||||||
|
planned cap but it was never wired into the broker. Shared keys like
|
||||||
|
`current-pr` are write-anyone today.
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
|
||||||
|
A launched session can be issued **a capability subset** of its
|
||||||
|
parent member, signed by the parent at launch time, and the broker
|
||||||
|
enforces the **intersection** of recipient grants × session caps on
|
||||||
|
every protected operation.
|
||||||
|
|
||||||
|
## Non-goals
|
||||||
|
|
||||||
|
- Changing the existing per-peer cap model. Member-keyed grants stay
|
||||||
|
authoritative for "who is allowed to talk to me."
|
||||||
|
- Cross-machine session caps (waiting on 2.0.0 HKDF identity).
|
||||||
|
- Per-tool granularity inside the Claude Code MCP surface — this
|
||||||
|
spec only covers the broker-enforceable verbs (dm, broadcast,
|
||||||
|
state-read, state-write, grant, kick, ban, profile-write,
|
||||||
|
service-deploy).
|
||||||
|
- Delegation: a session cannot re-vouch a sub-session with its own
|
||||||
|
cap subset. Only members can attest sessions. (Could be lifted in
|
||||||
|
a future spec; today's launch flow doesn't need it.)
|
||||||
|
|
||||||
|
## Design
|
||||||
|
|
||||||
|
### Capability vocabulary
|
||||||
|
|
||||||
|
Existing (today, member-level):
|
||||||
|
|
||||||
|
| Capability    | Effect when GRANTED on a recipient → sender pair |
|---------------|---------------------------------------------------|
| `read`        | Sender appears in recipient's `list_peers`        |
| `dm`          | Sender can DM recipient                           |
| `broadcast`   | Sender's broadcasts reach recipient               |
| `state-read`  | Sender can read shared state                      |
| `state-write` | (planned) Sender can write shared state           |
| `file-read`   | Sender can fetch files recipient shared           |
|
||||||
|
|
||||||
|
New (session-level — cap subset on the attestation):
|
||||||
|
|
||||||
|
These are the **verbs the session is allowed to invoke**, NOT what
|
||||||
|
peers can do TO it. A session attestation declaring `["dm", "read"]`
|
||||||
|
means the session can SEND dm/read-list operations; it cannot
|
||||||
|
broadcast, write state, grant, etc.
|
||||||
|
|
||||||
|
| Session cap       | Gates which broker operations                  |
|-------------------|------------------------------------------------|
| `dm`              | `send` with single recipient                   |
| `broadcast`       | `send` with `*`, `@group`, `#topic`            |
| `state-read`      | `get_state`, `list_state`                      |
| `state-write`     | `set_state`                                    |
| `grant`           | `grant`, `revoke`, `block`                     |
| `kick`            | `kick`, `disconnect`                           |
| `ban`             | `ban`, `unban`                                 |
| `profile-write`   | `set_profile`, `set_summary`, `set_status`     |
| `service-deploy`  | `mesh_service_register`, `_unregister`         |
|
||||||
|
|
||||||
|
The default cap set when no subset is declared: the **full member
|
||||||
|
set** (today's behavior — opt-in restriction, not breaking).
|
||||||
|
|
||||||
|
### Attestation v2
|
||||||
|
|
||||||
|
Existing v1 (`apps/cli/src/services/broker/session-hello-sig.ts`):
|
||||||
|
|
||||||
|
```
|
||||||
|
canonical = `claudemesh-session-attest|<parent>|<session>|<expires>`
|
||||||
|
```
|
||||||
|
|
||||||
|
New v2 (additive — broker accepts both):
|
||||||
|
|
||||||
|
```
|
||||||
|
canonical = `claudemesh-session-attest-v2|<parent>|<session>|<expires>|<sorted-caps-csv>`
|
||||||
|
```
|
||||||
|
|
||||||
|
Where `<sorted-caps-csv>` is the lower-cased, comma-joined,
ASCII-sorted cap list. An omitted list means full member caps (default,
back-compat); an explicitly empty list means no caps at all (see the
test plan).
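
As an illustration, the two canonical strings could be built like this. The `AttestationFields` type and the function names are hypothetical, and the sketch assumes the `<parent>`, `<session>`, and `<expires>` values are already serialized as strings the way the v1 signer does it.

```ts
// Hypothetical input type; values are assumed to be pre-serialized strings.
interface AttestationFields {
  parent: string;  // parent member pubkey
  session: string; // session pubkey
  expires: string; // expiry timestamp
}

function canonicalV1(a: AttestationFields): string {
  return ["claudemesh-session-attest", a.parent, a.session, a.expires].join("|");
}

// Lower-case, then sort. The default sort compares UTF-16 code units,
// which is plain ASCII order for ASCII cap names.
function sortedCapsCsv(caps: string[]): string {
  return caps.map((c) => c.toLowerCase()).sort().join(",");
}

function canonicalV2(a: AttestationFields, caps: string[]): string {
  return [
    "claudemesh-session-attest-v2",
    a.parent,
    a.session,
    a.expires,
    sortedCapsCsv(caps),
  ].join("|");
}
```

Normalizing the cap list before signing means the CLI and the broker agree on the signed bytes regardless of the order in which `--caps` was typed.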
|
||||||
|
|
||||||
|
**Wire shape additions on `session_hello`:**
|
||||||
|
|
||||||
|
```ts
|
||||||
|
{
|
||||||
|
type: "session_hello",
|
||||||
|
...existing fields...,
|
||||||
|
parentAttestation: {
|
||||||
|
sessionPubkey,
|
||||||
|
parentMemberPubkey,
|
||||||
|
expiresAt,
|
||||||
|
signature,
|
||||||
|
// NEW:
|
||||||
|
allowed_caps?: string[], // omitted = full member set
|
||||||
|
version?: 2, // omitted = v1
|
||||||
|
},
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
The broker version-detects: `version === 2` → verify v2 canonical
|
||||||
|
including `allowed_caps`. Default behavior is unchanged for clients
|
||||||
|
that don't pass it.
|
||||||
|
|
||||||
|
### Enforcement
|
||||||
|
|
||||||
|
Add `allowed_caps: string[] | null` to the in-memory `PeerConn`
|
||||||
|
shape (`apps/broker/src/index.ts:131`). Populated from
|
||||||
|
`handleSessionHello` (the v2 attestation supplies it) and from
|
||||||
|
`handleHello` (control-plane / member connection — set to `null`,
|
||||||
|
meaning "full member caps").
|
||||||
|
|
||||||
|
**Effective cap check** for a sending peer needing `cap`:
|
||||||
|
|
||||||
|
```ts
function senderHasCap(conn: PeerConn, cap: string): boolean {
  // null = member-level connection, no subset declared: every cap is allowed.
  if (conn.allowed_caps === null) return true;
  // Session-level connection: only the caps listed in the attestation.
  return conn.allowed_caps.includes(cap);
}
```
|
||||||
|
|
||||||
|
Wire this into every broker operation in the table above. The
|
||||||
|
existing per-peer recipient-cap check at `2178+, 2309+` stays —
|
||||||
|
session caps gate the **sender side**, recipient grants gate the
|
||||||
|
**receive side**, and both must allow:
|
||||||
|
|
||||||
|
```
|
||||||
|
allowed = senderHasCap(conn, capNeeded) && recipientGrants[sender][capNeeded]
|
||||||
|
```
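
One way to wire the check into each verb is a lookup from operation name to required cap, with `send` split by target. This is a sketch only: the `CAP_FOR_OPERATION` map, `capForSend`, and `isAllowed` are hypothetical names, `_unregister` from the table is expanded to `mesh_service_unregister` as an assumption, and the recipient grant is passed in as a plain boolean instead of being looked up.

```ts
type SessionCap =
  | "dm" | "broadcast" | "state-read" | "state-write"
  | "grant" | "kick" | "ban" | "profile-write" | "service-deploy";

interface PeerConn {
  allowed_caps: string[] | null; // null = member-level connection
}

// Operation names follow the session-cap table.
const CAP_FOR_OPERATION: Record<string, SessionCap> = {
  get_state: "state-read",
  list_state: "state-read",
  set_state: "state-write",
  grant: "grant",
  revoke: "grant",
  block: "grant",
  kick: "kick",
  disconnect: "kick",
  ban: "ban",
  unban: "ban",
  set_profile: "profile-write",
  set_summary: "profile-write",
  set_status: "profile-write",
  mesh_service_register: "service-deploy",
  mesh_service_unregister: "service-deploy", // assumed expansion of `_unregister`
};

// `send` needs `broadcast` for `*`, `@group`, `#topic`, and `dm` otherwise.
function capForSend(target: string): SessionCap {
  const first = target.charAt(0);
  return target === "*" || first === "@" || first === "#" ? "broadcast" : "dm";
}

function senderHasCap(conn: PeerConn, cap: string): boolean {
  if (conn.allowed_caps === null) return true;
  return conn.allowed_caps.includes(cap);
}

// Both sides must allow: session caps (sender) and recipient grants (receiver).
function isAllowed(conn: PeerConn, cap: SessionCap, recipientGrantsCap: boolean): boolean {
  return senderHasCap(conn, cap) && recipientGrantsCap;
}
```

For sender-only gates such as `state-write`, a caller would pass `true` for the recipient side.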
|
||||||
|
|
||||||
|
### `set_state` gate (bonus, ship together)
|
||||||
|
|
||||||
|
Today: no cap check. After this spec: `set_state` requires
|
||||||
|
`state-write` on the sender side. Migration: existing members
|
||||||
|
default to having `state-write` in their member caps (no recipient
|
||||||
|
grant model for state-write — it's a sender-side gate only, mesh-
|
||||||
|
wide). New attestations can omit it to forbid the session.
|
||||||
|
|
||||||
|
The recipient-side analog (per-peer state-write grants) is left for
|
||||||
|
a future spec — today the value of guarding state-write is
|
||||||
|
session-level (avoid an automated session clobbering shared keys),
|
||||||
|
not peer-level.
|
||||||
|
|
||||||
|
### CLI surface
|
||||||
|
|
||||||
|
```
|
||||||
|
claudemesh launch --caps dm,read # tight: read-only chat agent
|
||||||
|
claudemesh launch --caps dm,broadcast # send-only, no state writes
|
||||||
|
claudemesh launch # default: full member caps
|
||||||
|
```
|
||||||
|
|
||||||
|
`claudemesh launch --caps ?` prints the table above with descriptions.
|
||||||
|
|
||||||
|
`claudemesh peer list --json` includes `allowed_caps` per row when
|
||||||
|
present (`null` = full member). Lets users audit what their running
|
||||||
|
sessions can actually do.
|
||||||
|
|
||||||
|
### Migration plan (mirrors `2026-04-15-per-peer-capabilities.md` §"Migration plan")
|
||||||
|
|
||||||
|
1. **Broker schema additive** — `PeerConn.allowed_caps` in-memory
|
||||||
|
only; no DB column. Reload-on-reconnect is fine because the
|
||||||
|
attestation is re-sent on every WS open (it's the proof of
|
||||||
|
identity).
|
||||||
|
|
||||||
|
2. **CLI ships v2 attestation alongside v1.** New `--caps` flag
|
||||||
|
defaults to omitted (= v1 attestation, full caps). Older
|
||||||
|
brokers ignore the new fields entirely.
|
||||||
|
|
||||||
|
3. **Broker accepts v2.** When `allowed_caps` arrives, store it.
   No enforcement yet: count would-be-denied operations in a
   `cap_check_dryrun` metric counter and still allow them through.
|
||||||
|
|
||||||
|
4. **Dry-run release.** Ship one CLI + broker release that emits
|
||||||
|
the metric but doesn't enforce. Watch for false positives in
|
||||||
|
real meshes for ≥ 1 week.
|
||||||
|
|
||||||
|
5. **Flip enforcement on.** Broker rejects operations failing the
|
||||||
|
cap check with `forbidden: missing session capability "<cap>"`.
|
||||||
|
Default ("no caps declared = full member") keeps existing
|
||||||
|
sessions unaffected.
|
||||||
|
|
||||||
|
6. **`set_state` gate** ships in step 5 alongside the rest. Default
|
||||||
|
member caps include `state-write`, so flipping it on doesn't
|
||||||
|
break existing flows. Only sessions that explicitly omit
|
||||||
|
`state-write` from `--caps` lose write access.
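
The difference between the dry-run release (steps 3 and 4) and enforcement (step 5) could be a single mode switch around the same check. The sketch below is illustrative: `CapGate`, `CapMode`, and the in-memory counter standing in for the `cap_check_dryrun` metric are assumptions; only the error string comes from step 5.

```ts
type CapMode = "dryrun" | "enforce";

interface CapDecision {
  allowed: boolean;
  error?: string;
}

class CapGate {
  // Stands in for the `cap_check_dryrun` metric counter.
  dryRunDenials = 0;
  private readonly mode: CapMode;

  constructor(mode: CapMode) {
    this.mode = mode;
  }

  check(allowedCaps: string[] | null, cap: string): CapDecision {
    // null = no subset declared = full member caps.
    const has = allowedCaps === null || allowedCaps.includes(cap);
    if (has) return { allowed: true };
    if (this.mode === "dryrun") {
      // Count the would-be denial but let the operation through.
      this.dryRunDenials += 1;
      return { allowed: true };
    }
    return { allowed: false, error: `forbidden: missing session capability "${cap}"` };
  }
}
```

Keeping one code path for both modes means the dry-run numbers reflect exactly what enforcement would later reject.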
|
||||||
|
|
||||||
|
### Crypto notes
|
||||||
|
|
||||||
|
- v2 attestation re-uses `crypto_sign_detached` over the new
  canonical string; same parent member secret key, same TTL bound
  (≤24 h), same `expiresAt` semantics.
|
||||||
|
- v1 signatures are NOT valid v2 signatures, and vice versa. Although
  one prefix extends the other, the canonical strings diverge at the
  first delimiter (`claudemesh-session-attest|` vs
  `claudemesh-session-attest-v2|`), so no v1 string can equal a v2
  string. Domain separation is intrinsic.
|
||||||
|
- Like the existing per-peer cap system: caps are server-enforced
|
||||||
|
metadata, not capability tokens. A malicious broker can ignore
|
||||||
|
them. This is about UX trust + footgun prevention, not protocol-
|
||||||
|
level security.
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
1. **Should the session attestation also bind to a fingerprint of
|
||||||
|
the launched binary / Claude version?** Would let a member say
|
||||||
|
"this session is constrained to Claude Code v1.34.15" so a
|
||||||
|
compromised launched-binary doesn't get reused. Probably no — too
|
||||||
|
much friction for the threat model.
|
||||||
|
|
||||||
|
2. **What's the right default for `claudemesh launch` going forward?**
|
||||||
|
Once enforcement ships, do we change the default `--caps` from
|
||||||
|
"full member" to "dm + read + state-read"? Tighter but breaks
|
||||||
|
existing automation that writes state. Probably worth a one-
|
||||||
|
release deprecation warning ("your session will lose state-write
|
||||||
|
in v2.0.0 unless you pass --caps state-write") and then flip in
|
||||||
|
v2.0.0.
|
||||||
|
|
||||||
|
3. **Does `--caps` belong in `~/.claudemesh/config.json` per-mesh
|
||||||
|
defaults too?** A user who always launches read-only agents
|
||||||
|
wants `caps: ["dm", "read"]` as a personal default. Easy add;
|
||||||
|
defer until users ask for it.
|
||||||
|
|
||||||
|
4. **Per-tool MCP cap surface?** Out of scope here, but: a `claudemesh
|
||||||
|
launch --tools peer:read,memory:write` would be a finer cut than
|
||||||
|
broker-verb caps. The broker can't enforce that — it'd live in the
|
||||||
|
MCP wrapper / Claude Code's allowedTools. Different layer.
|
||||||
|
|
||||||
|
## Test plan
|
||||||
|
|
||||||
|
- Pure-logic tests on `senderHasCap` (member-level → always true,
|
||||||
|
empty caps → always false, declared caps → exact match).
|
||||||
|
- Broker integration: launch a session with `--caps dm`, attempt
|
||||||
|
`set_state` → expect `forbidden: missing session capability
|
||||||
|
"state-write"`.
|
||||||
|
- v1 attestation still accepted, no `allowed_caps` set, all caps
|
||||||
|
permitted (back-compat).
|
||||||
|
- v2 attestation with empty `allowed_caps` array → broker treats
|
||||||
|
as "explicitly empty, no caps allowed" (NOT "full member"). The
|
||||||
|
full-member default is "field omitted entirely". Test both.
|
||||||
|
- Dry-run mode: cap fail increments the counter but the operation
|
||||||
|
proceeds. Smoke-test before flipping enforcement.
|
||||||
|
|
||||||
|
## Estimate
|
||||||
|
|
||||||
|
- Spec review + open-question resolution: 1–2 days.
|
||||||
|
- Broker change (PeerConn field, attestation v2 accept, per-verb
|
||||||
|
enforcement, dry-run mode): 2–3 days.
|
||||||
|
- CLI change (`--caps` flag, attestation builder, peer list
|
||||||
|
surface): 1 day.
|
||||||
|
- Tests: 1 day.
|
||||||
|
- Dry-run release window: ≥ 1 week.
|
||||||
|
|
||||||
|
Total: ~1 sprint of focused work, plus a dry-run window.
|
||||||
104
.artifacts/specs/2026-05-04-v2-roadmap-completion.md
Normal file
@@ -0,0 +1,104 @@
|
|||||||
|
# v2.0.0 Daemon Redesign — Completion Roadmap
|
||||||
|
|
||||||
|
**Date:** 2026-05-04
|
||||||
|
**Owner:** alezmad
|
||||||
|
**Status:** in-progress (1.24.0 + 1.25.0 land most of it; remainder is two follow-up arcs)
|
||||||
|
|
||||||
|
## What's done
|
||||||
|
|
||||||
|
| v2.0.0 bullet | Version | Status |
|---|---|---|
| `claudemesh-daemon` long-lived launchd / systemd unit | 1.22.0 | ✅ Done |
| MCP server shrinks to thin daemon adapter | 1.24.0 | ✅ Done: 979 → ~200 LoC of push-pipe, daemon-required, no fallback |
| `claudemesh install` auto-installs + starts daemon | 1.24.0 | ✅ Done |
| `claudemesh launch` ensures daemon | 1.24.0 | ✅ Done |
| Daemon outbound routing (Sprint 4: real targets + crypto) | 1.25.0 | ✅ Done: outbox stores `mesh`, `target_spec`, `nonce`, `ciphertext`, `priority`; resolution + `crypto_box` happens at IPC accept time; drain is a forwarder |
| CLI thin-client routing for read verbs | 1.25.0 | ✅ Partial: `peer list`, `skill list/get` route through daemon when present; same `trySendViaDaemon` fallback shape |
| Ambient mode (raw `claude` Just Works) | 1.25.0 | ✅ Documented + functional for the daemon's attached mesh |
|
||||||
|
|
||||||
|
## What remains (in dependency order)
|
||||||
|
|
||||||
|
### A. Daemon multi-mesh (the prerequisite for "ambient mode for everything")
|
||||||
|
|
||||||
|
**Why it's the critical path:** ambient mode today only works for the single mesh the daemon is attached to. Users with N meshes either run N daemons (different sock paths) or restart the daemon to switch. Neither is acceptable for the v2.0.0 promise.
|
||||||
|
|
||||||
|
**What it takes:**
|
||||||
|
- Daemon holds `Map<slug, DaemonBrokerClient>` instead of one broker.
|
||||||
|
- Outbox row's `mesh` column (1.25.0 added) is the dispatch key.
|
||||||
|
- IPC `/v1/send` requires `mesh` field (or infers from target prefix `<slug>:<target>`).
|
||||||
|
- IPC read endpoints (`/v1/peers`, `/v1/skills`, `/v1/profile`) accept `?mesh=<slug>` or return mesh-grouped results.
|
||||||
|
- SSE event payloads already include `mesh` slug; no change needed.
|
||||||
|
- Drain worker selects broker by row's `mesh` column.
|
||||||
|
- `daemon up` with no `--mesh` attaches to all joined meshes; with `--mesh X` restricts to X (legacy mode for explicit single-mesh).
|
||||||
|
- Inbox dedupe keeps using `client_message_id` UNIQUE; mesh column for filtering only.
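The dispatch rules above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: the `DaemonBrokerClient.send` method, the `OutboxRow` field subset, and the helper names `resolveMesh` and `dispatch` are hypothetical, and it assumes an explicit `mesh` field takes precedence over the `<slug>:<target>` prefix.

```ts
// Hypothetical shapes: the spec names DaemonBrokerClient and the outbox
// `mesh` column, but not their fields or methods.
interface DaemonBrokerClient {
  send(row: OutboxRow): void;
}

interface OutboxRow {
  mesh: string;        // dispatch key (column added in 1.25.0)
  target_spec: string;
  ciphertext: string;
}

// Resolve the mesh for /v1/send: an explicit `mesh` field wins (assumption),
// otherwise infer it from a "<slug>:<target>" prefix.
function resolveMesh(target: string, explicitMesh?: string): { mesh: string; target: string } {
  if (explicitMesh) return { mesh: explicitMesh, target };
  const idx = target.indexOf(":");
  if (idx > 0) {
    return { mesh: target.slice(0, idx), target: target.slice(idx + 1) };
  }
  throw new Error("mesh is required when target has no <slug>: prefix");
}

// Drain step: select the broker by the row's mesh column.
function dispatch(brokers: Map<string, DaemonBrokerClient>, row: OutboxRow): boolean {
  const broker = brokers.get(row.mesh);
  if (!broker) return false; // daemon not attached to this mesh; row stays queued
  broker.send(row);
  return true;
}
```

Returning `false` for an unattached mesh keeps the drain worker a plain forwarder: it never guesses a fallback broker.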
|
||||||
|
|
||||||
|
**Estimated effort:** 1 week. ~600 LoC across `run.ts`, `drain.ts`, `ipc/server.ts`, plus tests for per-mesh dispatch.
|
||||||
|
|
||||||
|
**Risk:** medium. The single-mesh assumption is baked into a few places (peer-list response shape, skill-list response shape). Need to choose: per-mesh tagged responses (breaking) or array-of-meshes wrapped responses (additive). Recommend the latter for back-compat.
|
||||||
|
|
||||||
|
### B. HKDF-derived peer keypairs (cross-machine identity)
|
||||||
|
|
||||||
|
**Why it matters:** today each install on each machine generates a fresh keypair, and therefore a different mesh member identity. A user who signs in on a laptop and a desktop shows up as two different members. v2.0.0 promised "same identity across machines."
|
||||||
|
|
||||||
|
**What it takes:**
|
||||||
|
- `HKDF(account_secret, info: "claudemesh/mesh/<mesh_id>/peer", salt: <user_id>)` yields a 32-byte seed, from which a deterministic ed25519 keypair is derived per mesh.
|
||||||
|
- `account_secret` derives from the user's authenticated session — needs broker-side endpoint to vend it on first install.
|
||||||
|
- Enrollment flow changes: instead of generating a fresh keypair, derive it. Subsequent installs find the same pubkey already in `mesh.member` and skip enrollment.
|
||||||
|
- Migration: existing members keep their old keypairs (they're stored in config). Only new joins use HKDF. Optional: opt-in re-enrollment for users who want cross-machine sync.
|
||||||
|
- Broker hello-sig protocol unchanged (still ed25519 sign).
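A minimal sketch of the derivation step, assuming Node's built-in `crypto` module and SHA-256 as the HKDF hash (the spec does not name a hash). The function names `derivePeerKeypair` and `pubkeyHex` are hypothetical. This illustrates determinism only and is not a reviewed design.

```ts
import { hkdfSync, createPrivateKey, createPublicKey, KeyObject } from "node:crypto";

// Fixed PKCS#8 DER prefix for an Ed25519 private key (RFC 8410);
// the 32-byte seed is appended to it.
const ED25519_PKCS8_PREFIX = Buffer.from("302e020100300506032b657004220420", "hex");

// Derive a per-mesh keypair from the account secret. SHA-256 is an assumption.
function derivePeerKeypair(accountSecret: Buffer, userId: string, meshId: string) {
  const info = Buffer.from(`claudemesh/mesh/${meshId}/peer`);
  const salt = Buffer.from(userId);
  const seed = Buffer.from(hkdfSync("sha256", accountSecret, salt, info, 32));
  const privateKey = createPrivateKey({
    key: Buffer.concat([ED25519_PKCS8_PREFIX, seed]),
    format: "der",
    type: "pkcs8",
  });
  return { privateKey, publicKey: createPublicKey(privateKey) };
}

// The raw 32-byte public key is the tail of the SPKI DER encoding.
function pubkeyHex(key: KeyObject): string {
  const der = key.export({ format: "der", type: "spki" });
  return der.subarray(der.length - 32).toString("hex");
}
```

Because the output depends only on `(account_secret, user_id, mesh_id)`, a second install derives the same pubkey and can detect that it is already enrolled.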
|
||||||
|
|
||||||
|
**Estimated effort:** 2-3 weeks. Touches enrollment, broker auth, dashboard, security review.
|
||||||
|
|
||||||
|
**Risk:** high. Crypto change with security implications. Needs design review (account_secret distribution security, HKDF salt choice, key compromise recovery story).
|
||||||
|
|
||||||
|
### C. Mesh → workspace public surface rename
|
||||||
|
|
||||||
|
**Why it matters:** "mesh" is internal jargon for what users experience as "a workspace." v2.0.0 calls for the rename to align UX language.
|
||||||
|
|
||||||
|
**What it takes:**
|
||||||
|
- All CLI verbs gain `workspace` aliases (`claudemesh workspace list` ≡ `claudemesh list`).
|
||||||
|
- Help text, docs, README, marketing site updated.
|
||||||
|
- DB tables stay `mesh_*` (migration cost prohibitive; not user-visible).
|
||||||
|
- Wire protocol stays `mesh_*` (broker change too disruptive).
|
||||||
|
- Eventually deprecate the `mesh` aliases (~2 minor versions later).
|
||||||
|
|
||||||
|
**Estimated effort:** 3-4 days. Mostly rote search/replace + new aliases.
|
||||||
|
|
||||||
|
**Risk:** low. Cosmetic.
|
||||||
|
|
||||||
|
### D. Full CLI-to-thin-client conversion
|
||||||
|
|
||||||
|
**Why it matters:** today the CLI has bridge + cold-path code that duplicates ~3000 LoC of broker WS / crypto / decode logic that the daemon also has. Once daemon is multi-mesh, every verb can become "open IPC, send request, render response."
|
||||||
|
|
||||||
|
**What it takes:**
|
||||||
|
- Each verb: replace `withMesh(...)` (which opens its own broker WS) with `daemonOnly(...)` (calls IPC, errors if daemon down).
|
||||||
|
- Drop `bridge/server.ts`, `bridge/client.ts`, `bridge/socket-broker.ts` entirely.
|
||||||
|
- Drop most of `services/broker/ws-client.ts` from the CLI build (kept only for daemon's internal use).
|
||||||
|
- CLI binary shrinks ~30-40%.
|
||||||
|
- Daemon becomes the only broker WS holder per user.
|
||||||
|
|
||||||
|
**Estimated effort:** 1 week. Mostly mechanical; strict TypeScript catches most issues.
|
||||||
|
|
||||||
|
**Risk:** medium. Breaks workflows where CLI is used without daemon (CI environments, headless scripts). Need to keep a `--no-daemon` escape hatch or document the constraint.
|
||||||
|
|
||||||
|
## Recommended sequencing
|
||||||
|
|
||||||
|
```
|
||||||
|
1.25.0 (today): Sprint 4 outbound routing + CLI thin-client read paths + ambient mode docs
|
||||||
|
1.26.0 (next): A. Daemon multi-mesh — "ambient mode for everything"
|
||||||
|
1.27.0: D. CLI-to-thin-client conversion — drops ~3000 LoC
|
||||||
|
1.28.0: C. Mesh → workspace rename (aliases shipped, no removal yet)
|
||||||
|
2.0.0: B. HKDF identity (separate security-reviewed arc)
|
||||||
|
```
|
||||||
|
|
||||||
|
A → D → C → B is the right order:
|
||||||
|
- A unblocks ambient mode for multi-mesh users (highest UX value).
|
||||||
|
- D unblocks the LoC reduction the v2.0.0 promise mentioned ("3000 LoC removed").
|
||||||
|
- C is cosmetic; do it once D has stabilized.
|
||||||
|
- B is the most security-sensitive; do it last, with proper review.
|
||||||
|
|
||||||
|
## Out of scope for the v2.0.0 endpoint
|
||||||
|
|
||||||
|
- **Topic crypto (Sprint 5+).** Topics still ship as base64 plaintext. Real per-topic encryption is a v0.3.0 operator-layer item, parallel track.
|
||||||
|
- **Broker hardening for daemon idempotency (Sprint 7).** Partial unique index on `(mesh_id, client_message_id) WHERE NOT NULL` and the `mesh.client_message_dedupe` table. Documented in `2026-05-03-daemon-spec-broker-hardening-followups.md`.
|
||||||
|
- **`launch` deprecation.** 1.25.0 docs now recommend ambient mode for default cases; `launch` stays as the override path. Full deprecation is a 2.x decision.
|
||||||
350
.artifacts/specs/2026-05-05-continuous-presence.md
Normal file
@@ -0,0 +1,350 @@
|
|||||||
|
# Continuous presence — lease model + resume token
|
||||||
|
|
||||||
|
**Status:** spec, ready for v0.3.0.
|
||||||
|
**Owner:** alezmad
|
||||||
|
**Author:** Claude (2026-05-05, follow-up to user-reported "after hours claudemesh disconnects")
|
||||||
|
**Related:** `2026-05-04-per-session-presence.md` (per-launch ephemeral keypair), `apps/broker/src/index.ts:5430-5436` (current 30s ping loop), `apps/cli/src/daemon/ws-lifecycle.ts` (current backoff reconnect).
|
||||||
|
|
||||||
|
## Problem
|
||||||
|
|
||||||
|
Today, presence is fused to a single TCP/WS connection. When the
|
||||||
|
connection breaks — half-dead NAT entries, ISP route changes, laptop
|
||||||
|
sleep, broker restart — the broker tears down the presence row, fires
|
||||||
|
`peer_left`, and waits for the daemon to dial a fresh socket and run
|
||||||
|
the full attestation hello again. Other peers see the user blink
|
||||||
|
offline → back online. Messages sent to the session during the gap are
|
||||||
|
either dropped (if it's a `now`/`next` priority DM with no recipient
|
||||||
|
match) or held in `message_queue` for `low` only.
|
||||||
|
|
||||||
|
Concrete symptom (user-reported): `claudemesh peer list` shows zero
|
||||||
|
peers despite multiple sessions being "up" — they're stuck on
|
||||||
|
half-dead TCP connections. Daemon hasn't noticed because no `close`
|
||||||
|
fired. Hours later, kernel TCP keepalive (default Linux: 7200s idle +
|
||||||
|
9 × 75s probes ≈ 2h11m) finally RSTs the socket, daemon's existing
|
||||||
|
backoff reconnects, peers reappear. Until then: zombie session.
|
||||||
|
|
||||||
|
Two coupled bugs:
|
||||||
|
|
||||||
|
1. **No application-layer staleness detection.** Broker pings every
|
||||||
|
30s (line 5431) and updates `lastPingAt` on pong, but never
|
||||||
|
`terminate()`s a connection that stops returning pongs. Daemon
|
||||||
|
doesn't ping at all. Both sides trust the kernel for liveness,
|
||||||
|
which only fires after hours.
|
||||||
|
|
||||||
|
2. **Presence == connection.** Even once the staleness IS detected
|
||||||
|
and the daemon reconnects, peers see a full `peer_left` /
|
||||||
|
`peer_joined` cycle for a network blip that took 1–30 seconds.
|
||||||
|
Outbound messages during the gap that target the session by
|
||||||
|
pubkey route to nothing.
|
||||||
|
|
||||||
|
The user's ask: peers should never see a gap during transient
|
||||||
|
disconnects. Presence should be continuous as long as the *session
|
||||||
|
intent* is alive, regardless of how many sockets carried it.
|
||||||
|
|
||||||
|
## Goal
|
||||||
|
|
||||||
|
Presence is a **lease** keyed off the session's stable identity
|
||||||
|
(`sessionPubkey`), held in broker memory + DB, with a TTL refreshed
|
||||||
|
on every keepalive. Sockets come and go beneath the lease. Other peers
|
||||||
|
see continuous online status across reconnects up to the lease TTL.
|
||||||
|
|
||||||
|
Specifically:
|
||||||
|
|
||||||
|
- A daemon (or per-session WS) can drop and re-establish the WS
|
||||||
|
within a configurable grace window (default 90s) without any peer
|
||||||
|
observing `peer_left` / `peer_joined`.
|
||||||
|
- Messages sent to a session while its socket is mid-flap are queued,
|
||||||
|
delivered on the next reattach, ordered.
|
||||||
|
- Reconnect itself is sub-second on the wire when a `resume_token` is
|
||||||
|
presented — broker recognises the session, restores the slot, no
|
||||||
|
re-attestation round-trip.
|
||||||
|
- After the grace window expires, the broker fires `peer_left`
|
||||||
|
exactly once; on a later reconnect it fires `peer_joined` exactly
|
||||||
|
once. No flapping.
|
||||||
|
|
||||||
|
## Non-goals
|
||||||
|
|
||||||
|
- **Multi-broker handoff.** Out of scope. If the broker process
|
||||||
|
restarts, leases are lost and we fall back to today's behavior
|
||||||
|
(clean reconnect, peers see one cycle). A future spec can address
|
||||||
|
this with a shared lease store (Redis / Postgres LISTEN).
|
||||||
|
- **Dual-socket on the daemon.** Useful gold-plating but not required
|
||||||
|
for the user-facing problem. Single-socket with watchdog +
|
||||||
|
resume-token covers the failure modes actually observed (NAT drops,
|
||||||
|
ISP blips, sleep <90s).
|
||||||
|
- **Manual `claudemesh reconnect` CLI.** Not needed; the lease model
|
||||||
|
makes it redundant. Re-evaluate if real support cases surface.
|
||||||
|
|
||||||
|
## Design
|
||||||
|
|
||||||
|
### Lease model
|
||||||
|
|
||||||
|
```
|
||||||
|
sessionPubkey → { transport: "online" | "offline",
|
||||||
|
leaseUntil: Date,
|
||||||
|
ws: WebSocket | null,
|
||||||
|
...existing PeerConn fields }
|
||||||
|
```
|
||||||
|
|
||||||
|
Today the `connections` Map IS keyed by `presenceId`, which is a fresh
|
||||||
|
UUID per WS. We change that key to `sessionPubkey` (member-WS:
|
||||||
|
`memberPubkey`; session-WS: `sessionPubkey`). The PeerConn struct
|
||||||
|
gains:
|
||||||
|
|
||||||
|
```ts
|
||||||
|
transport: "online" | "offline";
|
||||||
|
leaseUntil: Date; // Date.now() + LEASE_TTL_MS
|
||||||
|
evictionTimer: NodeJS.Timeout | null;
|
||||||
|
```
|
||||||
|
|
||||||
|
### State transitions
|
||||||
|
|
||||||
|
**On WS open + hello accepted (initial):**
|
||||||
|
- Insert into `connections` with `transport: "online"`,
|
||||||
|
`leaseUntil: now + 90s`, `evictionTimer: null`.
|
||||||
|
- Broadcast `peer_joined` (today's behavior).
|
||||||
|
- Issue `resume_token` (see below) in the `hello_ack`.
|
||||||
|
|
||||||
|
**On WS open + hello carries valid `resume_token`:**
|
||||||
|
- Look up by `sessionPubkey`, verify token signature + freshness
|
||||||
|
(now <= `exp`, so token age <= LEASE_TTL_MS). If valid AND entry exists with
|
||||||
|
`transport: "offline"`:
|
||||||
|
- Cancel `evictionTimer`.
|
||||||
|
- Swap `ws` reference.
|
||||||
|
- Set `transport: "online"`, refresh `leaseUntil`.
|
||||||
|
- **Do NOT** broadcast `peer_joined`. The lease never expired.
|
||||||
|
- Drain any queued DMs accumulated during offline window.
|
||||||
|
- Reply `hello_ack` with new `resume_token`.
|
||||||
|
- If entry exists with `transport: "online"` (token replay attack or
|
||||||
|
rapid reconnect race): close old `ws` with `1000, "session_replaced"`
|
||||||
|
before swapping. Same as today's `oldConn.ws.close(1000, ...)`
|
||||||
|
pattern at lines 1768/1996.
|
||||||
|
- If no entry exists or token is stale: treat as a fresh hello,
|
||||||
|
broadcast `peer_joined`. Token expired = same as a cold start.
|
||||||
|
|
||||||
|
**On WS close (any reason):**
|
||||||
|
- Look up by `sessionPubkey`. If not found, no-op (already evicted).
|
||||||
|
- Set `transport: "offline"`, clear `ws` reference.
|
||||||
|
- Start `evictionTimer = setTimeout(evict, GRACE_MS)`.
|
||||||
|
- **Do NOT** broadcast `peer_left`. **Do NOT** delete the entry.
|
||||||
|
- **Do NOT** call `disconnectPresence(presenceId)` yet.
|
||||||
|
|
||||||
|
**On `evictionTimer` fire (lease expired without reattach):**
|
||||||
|
- Delete from `connections`.
|
||||||
|
- Broadcast `peer_left` (today's behavior at lines 5167-5189).
|
||||||
|
- `decMeshCount`.
|
||||||
|
- `disconnectPresence(presenceId)`.
|
||||||
|
- Clean up URL watches, stream subs, MCP registry — same as today's
|
||||||
|
close handler.
|
||||||
|
- Audit `peer_left`.
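The transitions above can be modelled as a small in-memory table. This is a sketch under assumptions: the class name `LeaseTable`, its methods, and the `events` log (standing in for broadcasts) are hypothetical, token verification is assumed to have happened before `attach` is called with `resumed = true`, and a fresh hello over a still-leased entry (a case the spec does not cover) is handled here by evicting first.

```ts
type Transport = "online" | "offline";
type PeerEvent = "peer_joined" | "peer_left";

interface Lease {
  transport: Transport;
  leaseUntil: number; // epoch ms
  evictionTimer: ReturnType<typeof setTimeout> | null;
}

class LeaseTable {
  readonly leases = new Map<string, Lease>();
  readonly events: Array<{ type: PeerEvent; key: string }> = []; // stand-in for broadcasts
  readonly graceMs: number;

  constructor(graceMs = 90_000) {
    this.graceMs = graceMs;
  }

  // WS open + hello accepted. `resumed` means a resume token already verified.
  attach(key: string, resumed: boolean, now = Date.now()): void {
    const existing = this.leases.get(key);
    if (resumed && existing) {
      if (existing.evictionTimer) clearTimeout(existing.evictionTimer);
      existing.evictionTimer = null;
      existing.transport = "online";
      existing.leaseUntil = now + this.graceMs;
      return; // lease never expired: no peer_joined
    }
    if (existing) this.evict(key); // assumption: fresh hello replaces a leased entry
    this.leases.set(key, { transport: "online", leaseUntil: now + this.graceMs, evictionTimer: null });
    this.events.push({ type: "peer_joined", key });
  }

  // WS close: go offline and start the grace timer, but emit nothing.
  detach(key: string, now = Date.now()): void {
    const lease = this.leases.get(key);
    if (!lease) return; // already evicted
    if (lease.evictionTimer) clearTimeout(lease.evictionTimer);
    lease.transport = "offline";
    lease.leaseUntil = now + this.graceMs;
    lease.evictionTimer = setTimeout(() => this.evict(key), this.graceMs);
  }

  // Grace expired without reattach: emit peer_left exactly once.
  evict(key: string): void {
    const lease = this.leases.get(key);
    if (!lease) return;
    if (lease.evictionTimer) clearTimeout(lease.evictionTimer);
    this.leases.delete(key);
    this.events.push({ type: "peer_left", key });
  }
}
```

The real close handler would also defer `decMeshCount`, `disconnectPresence`, and the registry cleanups to `evict`; the sketch only tracks the join/leave events.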
|
||||||
|
|
||||||
|
**Watchdog (broker):**
|
||||||
|
- The 30s ping loop (line 5431) gains a staleness check: if any
|
||||||
|
conn's `transport === "online"` and `lastPingAt < now - 75s`, call
|
||||||
|
`ws.terminate()`. This converts the half-dead socket into a clean
|
||||||
|
`close` event, which fires the lease-offline transition above.
|
||||||
|
- Same logic on the daemon side (see § Daemon changes).
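The staleness check itself is small enough to sketch as a pure function that the existing ping loop could call on each tick. The `WatchedConn` shape and the name `terminateStale` are hypothetical; `terminate()` stands in for `ws.terminate()`.

```ts
interface WatchedConn {
  transport: "online" | "offline";
  lastPingAt: number; // epoch ms, bumped on every pong
  terminate(): void;  // stand-in for ws.terminate()
}

const STALE_PONG_THRESHOLD_MS = 75_000; // 2.5x the 30s ping interval

// Terminate every online connection whose last pong is older than the threshold.
// Returns how many sockets were terminated on this tick.
function terminateStale(conns: Iterable<WatchedConn>, now: number): number {
  let terminated = 0;
  for (const conn of conns) {
    if (conn.transport === "online" && now - conn.lastPingAt > STALE_PONG_THRESHOLD_MS) {
      conn.terminate();
      terminated++;
    }
  }
  return terminated;
}
```

Offline entries are skipped because they have no socket left to terminate; their fate is decided by the eviction timer.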
|
||||||
|
|
||||||
|
### Resume token
|
||||||
|
|
||||||
|
A short opaque string the broker hands the daemon in `hello_ack`.
|
||||||
|
Format: `mesh-resume.v1.<base64url(JSON-payload)>.<base64url(sig)>`
|
||||||
|
where `JSON-payload = { sub: <sessionPubkey>, mid: <meshId>, exp:
|
||||||
|
<unix-ms>, iat: <unix-ms> }` and `sig = ed25519(brokerSigningKey,
|
||||||
|
JSON-payload)`.
|
||||||
|
|
||||||
|
- **Why a token, not just sessionPubkey?** A session needs to prove
|
||||||
|
it's the holder of an existing lease without re-running the full
|
||||||
|
attestation handshake (which involves member key + parent
|
||||||
|
attestation lookup). The token is a server-issued cookie: cheap to
|
||||||
|
verify, scoped to a single session, expires with the lease.
|
||||||
|
- **Storage:** broker keeps the signing key in env (`RESUME_TOKEN_KEY`,
|
||||||
|
generated on first boot if missing, persisted to a config row). No
|
||||||
|
DB column needed for the tokens themselves — they're verified by
|
||||||
|
signature alone.
|
||||||
|
- **TTL:** equal to LEASE_TTL_MS (90s). After that the daemon must
|
||||||
|
re-handshake with full attestation. Refreshed on every successful
|
||||||
|
reattach.
|
||||||
|
- **Daemon storage:** in-memory only. Lost on daemon restart, which
|
||||||
|
is correct: a daemon restart is a real reconnect and should run
|
||||||
|
the full hello.
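A minimal sketch of issuing and verifying a token in the format above, assuming Node's built-in `crypto` module. The function names are hypothetical, and the key is generated in memory here for illustration, whereas the spec keeps it in `RESUME_TOKEN_KEY`.

```ts
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface ResumePayload { sub: string; mid: string; exp: number; iat: number }

const TOKEN_PREFIX = "mesh-resume.v1";

// Illustration only: a throwaway signing key instead of RESUME_TOKEN_KEY.
const brokerKeys = generateKeyPairSync("ed25519");

function issueToken(key: KeyObject, sub: string, mid: string, ttlMs: number, now = Date.now()): string {
  const payload: ResumePayload = { sub, mid, iat: now, exp: now + ttlMs };
  const body = Buffer.from(JSON.stringify(payload));
  const sig = sign(null, body, key); // Ed25519 takes no digest algorithm
  return `${TOKEN_PREFIX}.${body.toString("base64url")}.${sig.toString("base64url")}`;
}

// Returns the payload if the signature checks out and the token is fresh, else null.
function verifyToken(pub: KeyObject, token: string, now = Date.now()): ResumePayload | null {
  if (!token.startsWith(TOKEN_PREFIX + ".")) return null;
  const parts = token.slice(TOKEN_PREFIX.length + 1).split(".");
  const bodyB64 = parts[0];
  const sigB64 = parts[1];
  if (parts.length !== 2 || bodyB64 === undefined || sigB64 === undefined) return null;
  const body = Buffer.from(bodyB64, "base64url");
  if (!verify(null, body, pub, Buffer.from(sigB64, "base64url"))) return null;
  const payload = JSON.parse(body.toString("utf8")) as ResumePayload;
  return now <= payload.exp ? payload : null;
}
```

A caller would still compare `payload.sub` and `payload.mid` against the `sessionPubkey` and mesh in the hello before treating it as a reattach.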
|
||||||
|
|
||||||
|
### Wire protocol additions
|
||||||
|
|
||||||
|
`hello` (member-WS, session-WS, fresh-launch hello — all three):
|
||||||
|
```diff
|
||||||
|
{
|
||||||
|
type: "hello",
|
||||||
|
memberPubkey: "...",
|
||||||
|
sessionPubkey: "...", // session-WS only
|
||||||
|
attestation: "...", // session-WS only
|
||||||
|
signature: "...",
|
||||||
|
+ resumeToken?: "mesh-resume.v1...", // optional; if present, this hello is a reattach attempt
|
||||||
|
...
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
`hello_ack`:
|
||||||
|
```diff
|
||||||
|
{
|
||||||
|
type: "hello_ack",
|
||||||
|
presenceId: "...",
|
||||||
|
...
|
||||||
|
+ resumeToken: "mesh-resume.v1...", // always issued; replaces prior on reattach
|
||||||
|
+ leaseTtlMs: 90000, // informational; daemon may use for ping cadence
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
No new message types. Old daemons that don't send `resumeToken` get
|
||||||
|
today's full-handshake behavior — fully backward compatible.
|
||||||
|
|
||||||
|
### Message queue during grace window
|
||||||
|
|
||||||
|
Today: DMs to a presence whose WS is closed → routed to
|
||||||
|
`message_queue` only for `priority: low`; `now`/`next` either route
|
||||||
|
to a different connected session of the same member or drop.
|
||||||
|
|
||||||
|
Change: when broker would route to a session whose
|
||||||
|
`transport === "offline"` (lease still valid), enqueue regardless of
|
||||||
|
priority. On reattach, the existing inbox-drain path
|
||||||
|
(`maybePushQueuedMessages` at line 967) flushes them in order. The
|
||||||
|
`message_queue` already has the schema for this; we're just relaxing
|
||||||
|
the priority gate when the target is in grace.
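The relaxed gate can be sketched as a routing decision. The names `routeDm` and `TargetState` are hypothetical, and the sketch deliberately omits today's reroute of `now`/`next` DMs to a sibling session of the same member.

```ts
type Priority = "now" | "next" | "low";
type Route = "deliver" | "enqueue" | "drop";

interface TargetState {
  transport: "online" | "offline";
  leaseUntil: number; // epoch ms
}

// Decide what to do with a DM for a given target session.
function routeDm(target: TargetState | undefined, priority: Priority, now: number): Route {
  if (target && target.transport === "online") return "deliver";
  // In grace (offline, lease still valid): queue regardless of priority.
  if (target && target.transport === "offline" && now <= target.leaseUntil) return "enqueue";
  // No lease: today's gate applies, only `low` is held in message_queue.
  return priority === "low" ? "enqueue" : "drop";
}
```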
|
||||||
|
|
||||||
|
### Constants
|
||||||
|
|
||||||
|
```ts
|
||||||
|
const LEASE_TTL_MS = 90_000; // grace window after WS close
|
||||||
|
const PING_INTERVAL_MS = 30_000; // unchanged
|
||||||
|
const STALE_PONG_THRESHOLD_MS = 75_000; // 2.5x ping interval
|
||||||
|
const RESUME_TOKEN_TTL_MS = LEASE_TTL_MS;
|
||||||
|
```
|
||||||
|
|
||||||
|
`LEASE_TTL_MS` = 90s rationale: long enough to absorb a sleep/resume
|
||||||
|
cycle, NAT timeout, ISP route flap, mobile→wifi handover. Short
|
||||||
|
enough that a true crash (daemon killed, machine off) clears the
|
||||||
|
session within 90s — peers don't see ghost online status forever.
|
||||||
|
Configurable via env (`LEASE_TTL_MS`) for self-hosted brokers.
|
||||||
|
|
||||||
|
## Daemon changes
|
||||||
|
|
||||||
|
### Watchdog
|
||||||
|
|
||||||
|
In `ws-lifecycle.ts`, add an `idleWatchdog` parallel to the existing
|
||||||
|
backoff/reconnect machinery:
|
||||||
|
|
||||||
|
```ts
|
||||||
|
let lastActivity = Date.now(); // bumped on every incoming message + pong
|
||||||
|
const watchdog = setInterval(() => {
|
||||||
|
if (Date.now() - lastActivity > STALE_PONG_THRESHOLD_MS) {
|
||||||
|
log("warn", "ws_stale_terminate", { url: opts.url });
|
||||||
|
sock.terminate(); // fires existing close handler → reconnect path
|
||||||
|
} else if (sock.readyState === sock.OPEN) {
|
||||||
|
sock.ping(); // matches broker's 30s cadence, gives broker a pong
|
||||||
|
}
|
||||||
|
}, PING_INTERVAL_MS);
|
||||||
|
sock.on("message", () => { lastActivity = Date.now(); });
|
||||||
|
sock.on("pong", () => { lastActivity = Date.now(); });
|
||||||
|
```
|
||||||
|
|
||||||
|
Clean up with `clearInterval(watchdog)` in the close handler and the explicit
|
||||||
|
`close()` path.
|
||||||
|
|
||||||
|
### Resume token in hello
|
||||||
|
|
||||||
|
`apps/cli/src/daemon/broker.ts:136` and equivalent in
|
||||||
|
`session-broker.ts`: persist the `resumeToken` from each successful
|
||||||
|
`hello_ack` into a private field, include it in the next
|
||||||
|
`buildHello()` call. On daemon restart the field is empty → cold
|
||||||
|
start, exactly today's behavior.
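A sketch of the token round-trip on the daemon side. The `ResumeState` class and the trimmed `Hello` / `HelloAck` shapes are hypothetical; the real `buildHello()` carries more fields.

```ts
interface HelloAck {
  type: "hello_ack";
  presenceId: string;
  resumeToken?: string; // absent when talking to an old broker
  leaseTtlMs?: number;
}

interface Hello {
  type: "hello";
  memberPubkey: string;
  signature: string;
  resumeToken?: string;
}

class ResumeState {
  // In-memory only: a daemon restart starts with no token, i.e. a cold hello.
  private resumeToken: string | undefined;

  onHelloAck(ack: HelloAck): void {
    this.resumeToken = ack.resumeToken; // each ack replaces the prior token
  }

  buildHello(memberPubkey: string, signature: string): Hello {
    const hello: Hello = { type: "hello", memberPubkey, signature };
    if (this.resumeToken !== undefined) hello.resumeToken = this.resumeToken;
    return hello;
  }
}
```

Against an old broker the ack carries no token, so the field stays unset and every reconnect runs the full handshake.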
|
||||||
|
|
||||||
|
### No CLI changes
|
||||||
|
|
||||||
|
`claudemesh peer list` keeps reading the broker's `connections` Map
|
||||||
|
which now reflects continuous presence. Users see online sessions as
|
||||||
|
online during transient blips. No UX surface changes.
|
||||||
|
|
||||||
|
## Migration
|
||||||
|
|
||||||
|
- New broker is fully backward compatible with old daemons (resume
|
||||||
|
token is optional, defaults fall through to today's path).
|
||||||
|
- New daemons against an old broker: token is sent but ignored, full
|
||||||
|
handshake runs each reconnect — same as today.
|
||||||
|
- DB migration: none. `presence` table semantics unchanged. The
|
||||||
|
`disconnectedAt` column is now set only on lease eviction (>90s),
|
||||||
|
not on every WS close. This is a behavioral change but not a
|
||||||
|
schema change.
|
||||||
|
- Add ENV var `RESUME_TOKEN_KEY` (broker generates on first boot if
|
||||||
|
unset, persists to a singleton config row).
|
||||||
|
|
||||||
|
## Test plan
|
||||||
|
|
||||||
|
1. **Sleep test:** kill -STOP the daemon for 60s, then kill -CONT.
|
||||||
|
Expect: peers never see `peer_left`. Daemon's WS is dead-on-arrival
|
||||||
|
when it wakes; watchdog terminates it; reconnect with resume_token
|
||||||
|
succeeds within 1-2s; lease was at ~30s of its 90s TTL when the
|
||||||
|
daemon resumed.
|
||||||
|
|
||||||
|
2. **Hard offline:** kill -STOP for 120s, kill -CONT. Expect: peers
|
||||||
|
see exactly one `peer_left` at t=90s, then exactly one
|
||||||
|
`peer_joined` after the daemon resumes and reconnects (resume
|
||||||
|
token is now stale; full handshake runs).
|
||||||
|
|
||||||
|
3. **NAT drop simulation:** `iptables -A OUTPUT -p tcp --dport 443
|
||||||
|
-j DROP` for 60s on the daemon host, then remove the rule. Expect:
|
||||||
|
broker pings stop landing, broker-side watchdog calls
|
||||||
|
`ws.terminate()` at t=75s, lease enters grace, daemon's own
|
||||||
|
watchdog fires within ~30s, daemon reconnects with resume_token,
|
||||||
|
peers never see a flap.
|
||||||
|
|
||||||
|
4. **Message-during-grace:** while a target session is in grace
|
||||||
|
(offline, lease valid), send a `priority: now` DM. Expect: queued
|
||||||
|
in `message_queue`, delivered exactly once on reattach, no
|
||||||
|
`peer_left` visible to sender, ack returns delivered.
|
||||||
|
|
||||||
|
5. **Replay attack:** capture a resume_token in flight, replay it
|
||||||
|
against a different broker connection while the original session
|
||||||
|
is still online. Expect: broker treats it as a reconnect for an
|
||||||
|
already-online session → closes old WS with `session_replaced`,
|
||||||
|
new WS takes over. Equivalent to today's session-replacement
|
||||||
|
semantics; the original session detects the close and either
|
||||||
|
reconnects (if it's still alive) or gives up.
|
||||||
|
|
||||||
|
6. **Token forgery:** send a `resumeToken` not signed by the broker.
|
||||||
|
Expect: signature check fails, broker treats hello as a fresh
|
||||||
|
handshake (or rejects if the rest of the hello is invalid).
|
||||||
|
|
||||||
|
## Open questions
|
||||||
|
|
||||||
|
- **Should `peer list` expose a `transport` field** so callers can
|
||||||
|
distinguish "leased but offline" from "online"? Default no — the
|
||||||
|
abstraction we're selling is "they're online." But debugging may
|
||||||
|
want it; gate it behind `--all` or `--debug`.
|
||||||
|
- **What about the broker-side `mcpRegistry` cleanup?** Today we
|
||||||
|
delete non-persistent MCP entries on WS close (line 5217). With
|
||||||
|
leases, we should defer that to lease eviction, not WS close.
|
||||||
|
Otherwise an MCP server registered by a session disappears every
|
||||||
|
time its WS reconnects.
|
||||||
|
|
||||||
|
## Build order
|
||||||
|
|
||||||
|
1. **Broker lease model** — change `connections` keying, add
|
||||||
|
`transport`/`leaseUntil`/`evictionTimer`, refactor close handler
|
||||||
|
to start grace timer instead of immediate teardown, refactor
|
||||||
|
eviction path. (~80 lines.)
|
||||||
|
2. **Resume token** — signing key bootstrap, token issue/verify,
|
||||||
|
wire format, hello_ack changes. (~50 lines + 1 config row.)
|
||||||
|
3. **Daemon watchdog** — `ws-lifecycle.ts` adds `idleWatchdog` and
|
||||||
|
stores `resumeToken` from acks. (~25 lines.)
|
||||||
|
4. **Daemon hello** — pass `resumeToken` in next `buildHello()`.
|
||||||
|
(~10 lines across `broker.ts` + `session-broker.ts`.)
|
||||||
|
5. **Broker watchdog** — extend the 30s ping loop with
|
||||||
|
`terminate()`-on-stale logic. (~15 lines.)
|
||||||
|
6. **Queue-during-grace** — relax priority gate in DM routing.
|
||||||
|
(~5 lines.)
|
||||||
|
7. **Spec docs** — update `docs/protocol.md` with resume_token,
|
||||||
|
lease semantics. (~30 lines.)
|
||||||
|
8. **Tests** — six scenarios above. Likely ~3 new test files.
|
||||||
|
|
||||||
|
Estimated total: one focused day. The broker lease model is the load-
|
||||||
|
bearing change; everything else slots in cleanly once that's done.
|
||||||
@@ -1 +1 @@
|
|||||||
{"sessionId":"ae5dbe38-9c56-4d07-9fb6-a38cb8a250a6","pid":4612,"acquiredAt":1776217467441}
|
{"sessionId":"ae5dbe38-9c56-4d07-9fb6-a38cb8a250a6","pid":3633,"procStart":"Fri May 1 22:40:56 2026","acquiredAt":1777683244936}
|
||||||
71
.github/workflows/deploy-web.yml
vendored
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
name: Deploy claudemesh-web
|
||||||
|
|
||||||
|
# Triggers a Coolify deploy of the apps/web Next.js app on the OVH VPS.
|
||||||
|
# Coolify only auto-deploys the broker (it watches the gitea-vps mirror);
|
||||||
|
# the web app needs an explicit poke. This workflow is the poke.
|
||||||
|
#
|
||||||
|
# The Coolify dashboard is bound to a Tailscale-only address
|
||||||
|
# (100.122.34.28:8000), so the runner first joins the tailnet via
|
||||||
|
# an OAuth-issued ephemeral node, then hits Coolify's deploy API.
|
||||||
|
#
|
||||||
|
# Path filter: redeploy on changes to the web app, the API package
|
||||||
|
# (bundled into the web build), or any shared package the web app
|
||||||
|
# transpiles. Anything else (broker-only, cli-only, docs) skips it.
|
||||||
|
|
||||||
|
on:
|
||||||
|
push:
|
||||||
|
branches: [main]
|
||||||
|
paths:
|
||||||
|
- "apps/web/**"
|
||||||
|
- "packages/api/**"
|
||||||
|
- "packages/db/**"
|
||||||
|
- "packages/auth/**"
|
||||||
|
- "packages/ui/**"
|
||||||
|
- "packages/i18n/**"
|
||||||
|
- "packages/shared/**"
|
||||||
|
- "packages/email/**"
|
||||||
|
- "packages/billing/**"
|
||||||
|
- "packages/storage/**"
|
||||||
|
- "packages/monitoring-web/**"
|
||||||
|
- "pnpm-lock.yaml"
|
||||||
|
- ".github/workflows/deploy-web.yml"
|
||||||
|
workflow_dispatch:
|
||||||
|
|
||||||
|
# Coalesce rapid pushes — only one deploy in flight at a time, and
|
||||||
|
# if a newer push lands while one is queued, the older one is
|
||||||
|
# cancelled. Avoids the "5 commits, 5 deploys" stampede.
|
||||||
|
concurrency:
|
||||||
|
group: deploy-web
|
||||||
|
cancel-in-progress: true
|
||||||
|
|
||||||
|
jobs:
|
||||||
|
deploy:
|
||||||
|
runs-on: ubuntu-latest
|
||||||
|
steps:
|
||||||
|
- name: Connect to Tailscale
|
||||||
|
uses: tailscale/github-action@v3
|
||||||
|
with:
|
||||||
|
oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
|
||||||
|
oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
|
||||||
|
tags: tag:ci
|
||||||
|
|
||||||
|
- name: Trigger Coolify deploy
|
||||||
|
env:
|
||||||
|
COOLIFY_TOKEN: ${{ secrets.COOLIFY_TOKEN }}
|
||||||
|
APP_UUID: p68x1e3k4xmrjmblca5ybe09
|
||||||
|
run: |
|
||||||
|
if [ -z "$COOLIFY_TOKEN" ]; then
|
||||||
|
echo "::error::COOLIFY_TOKEN secret is not set"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
response=$(curl -sS -w "\n%{http_code}" -X GET \
|
||||||
|
"http://100.122.34.28:8000/api/v1/deploy?uuid=${APP_UUID}&force=true" \
|
||||||
|
-H "Authorization: Bearer ${COOLIFY_TOKEN}")
|
||||||
|
status=$(echo "$response" | tail -n1)
|
||||||
|
body=$(echo "$response" | sed '$d')
|
||||||
|
echo "HTTP $status"
|
||||||
|
echo "$body"
|
||||||
|
if [ "$status" != "200" ]; then
|
||||||
|
echo "::error::Coolify returned HTTP $status"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
@@ -20,11 +20,12 @@ Peer mesh for Claude Code sessions. Broker + CLI + MCP server.
|
|||||||
|
|
||||||
## Deploy
|
## Deploy
|
||||||
|
|
||||||
- **Broker:** `git push gitea-vps main` triggers Coolify auto-deploy. Manual: `curl -s -X GET "http://100.122.34.28:8000/api/v1/deploy?uuid=mcn8m74tbxfxbplmyb40b2ia" -H "Authorization: Bearer 3|K2vkSJzdUA69rj22CKZc5z0YB6pkY43GLEonti3UzcnqVJj6WhrqqYTAng6DzMUi"`. Pending migrations apply automatically on startup.
|
- **Broker:** `git push gitea-vps main` triggers Coolify auto-deploy via the gitea webhook. Pending migrations apply automatically on startup.
|
||||||
|
- **Web:** Coolify on the OVH VPS (`claudemesh.com` resolves to `135.125.191.245`, NOT Vercel — the `apps/web/Dockerfile` is what Coolify builds). Auto-deploys via `.github/workflows/deploy-web.yml` on push to `main` when paths under `apps/web/**` or `packages/{api,db,auth,ui,i18n,shared,email,billing,storage,monitoring-web}/**` change. The workflow joins the tailnet via Tailscale OAuth, then hits the Coolify API.
|
||||||
|
- **Manual deploy** (if the workflow is broken or the path filter missed something) — Coolify dashboard at `http://100.122.34.28:8000` (Tailscale only). Token in `COOLIFY_TOKEN` repo secret. App UUIDs: broker `mcn8m74tbxfxbplmyb40b2ia`, web `p68x1e3k4xmrjmblca5ybe09`.
|
||||||
- **CLI:**
|
- **CLI:**
|
||||||
- npm: `cd apps/cli && npm publish --tag alpha --access public --no-git-checks --ignore-scripts`
|
- npm: `cd apps/cli && npm publish --access public --no-git-checks --ignore-scripts`
|
||||||
- Binaries: `git tag cli-v<version> && git push github cli-v<version>` — workflow builds 5 platforms.
|
- Binaries: `git tag cli-v<version> && git push github cli-v<version>` — workflow builds 5 platforms.
|
||||||
- **Web:** Vercel auto-deploy on push to GitHub
|
|
||||||
|
|
||||||
## Dev
|
## Dev
|
||||||
|
|
||||||
|
|||||||
@@ -369,8 +369,19 @@ export interface ConnectParams {
|
|||||||
pid: number;
|
pid: number;
|
||||||
cwd: string;
|
cwd: string;
|
||||||
groups?: Array<{ name: string; role?: string }>;
|
groups?: Array<{ name: string; role?: string }>;
|
||||||
|
/**
|
||||||
|
* v2 agentic-comms (M1) — connection role.
|
||||||
|
* 'control-plane' — daemon WS (hidden from user-facing peer lists).
|
||||||
|
* 'session' — per-Claude-Code-session WS (default).
|
||||||
|
* 'service' — autonomous bots/services attached to the mesh.
|
||||||
|
* Optional for backwards compatibility; defaults to 'session'.
|
||||||
|
*/
|
||||||
|
role?: PresenceRole;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/** v2 agentic-comms (M1): typed connection roles. */
|
||||||
|
export type PresenceRole = "control-plane" | "session" | "service";
|
||||||
|
|
||||||
/** Create a presence row for a new WS connection. */
|
/** Create a presence row for a new WS connection. */
|
||||||
export async function connectPresence(
|
export async function connectPresence(
|
||||||
params: ConnectParams,
|
params: ConnectParams,
|
||||||
@@ -389,6 +400,7 @@ export async function connectPresence(
|
|||||||
statusSource: "jsonl",
|
statusSource: "jsonl",
|
||||||
statusUpdatedAt: now,
|
statusUpdatedAt: now,
|
||||||
groups: params.groups ?? [],
|
groups: params.groups ?? [],
|
||||||
|
role: params.role ?? "session",
|
||||||
connectedAt: now,
|
connectedAt: now,
|
||||||
lastPingAt: now,
|
lastPingAt: now,
|
||||||
})
|
})
|
||||||
@@ -415,6 +427,21 @@ export async function heartbeat(presenceId: string): Promise<void> {
|
|||||||
.where(eq(presence.id, presenceId));
|
.where(eq(presence.id, presenceId));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Restore a presence row to online state on lease reattach: clear
|
||||||
|
* `disconnectedAt` and bump `lastPingAt`. Needed because the DB-level
|
||||||
|
* stale-presence sweeper may have flipped the row to disconnected
|
||||||
|
* during the grace window — the lease is in-memory truth, but other
|
||||||
|
* code paths read presence.disconnectedAt directly.
|
||||||
|
*/
|
||||||
|
export async function restorePresence(presenceId: string): Promise<void> {
|
||||||
|
const now = new Date();
|
||||||
|
await db
|
||||||
|
.update(presence)
|
||||||
|
.set({ disconnectedAt: null, lastPingAt: now })
|
||||||
|
.where(eq(presence.id, presenceId));
|
||||||
|
}
|
||||||
|
|
||||||
// --- Peer discovery ---
|
// --- Peer discovery ---
|
||||||
|
|
||||||
/** Return all active (connected) presences in a mesh, joined with member info. */
|
/** Return all active (connected) presences in a mesh, joined with member info. */
|
||||||
@@ -431,6 +458,11 @@ export async function listPeersInMesh(
|
|||||||
sessionId: string;
|
sessionId: string;
|
||||||
cwd: string;
|
cwd: string;
|
||||||
connectedAt: Date;
|
connectedAt: Date;
|
||||||
|
/** v2 agentic-comms (M1): connection role. CLI uses this to hide
|
||||||
|
* control-plane daemons from user-facing lists. Wire-level field
|
||||||
|
* is `peerRole` to avoid collision with 1.31.5's top-level `role`
|
||||||
|
* lift of profile.role (user-supplied string like "lead"). */
|
||||||
|
peerRole: PresenceRole;
|
||||||
}>
|
}>
|
||||||
> {
|
> {
|
||||||
const rows = await db
|
const rows = await db
|
||||||
@@ -445,6 +477,7 @@ export async function listPeersInMesh(
|
|||||||
sessionId: presence.sessionId,
|
sessionId: presence.sessionId,
|
||||||
cwd: presence.cwd,
|
cwd: presence.cwd,
|
||||||
connectedAt: presence.connectedAt,
|
connectedAt: presence.connectedAt,
|
||||||
|
peerRole: presence.role,
|
||||||
})
|
})
|
||||||
.from(presence)
|
.from(presence)
|
||||||
.innerJoin(memberTable, eq(presence.memberId, memberTable.id))
|
.innerJoin(memberTable, eq(presence.memberId, memberTable.id))
|
||||||
@@ -469,6 +502,7 @@ export async function listPeersInMesh(
|
|||||||
sessionId: r.sessionId,
|
sessionId: r.sessionId,
|
||||||
cwd: r.cwd,
|
cwd: r.cwd,
|
||||||
connectedAt: r.connectedAt,
|
connectedAt: r.connectedAt,
|
||||||
|
peerRole: (r.peerRole ?? "session") as PresenceRole,
|
||||||
}));
|
}));
|
||||||
}
|
}
|
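The `peerRole` comment above says the CLI uses the field to hide control-plane daemons from user-facing lists. A minimal sketch of that client-side filter, assuming rows without a role are treated as `"session"` (as the server-side default does); `PeerSummary` and `visiblePeers` are hypothetical names, not part of this diff:

```typescript
type PresenceRole = "control-plane" | "session" | "service";

// Hypothetical shape: only the fields this sketch needs.
interface PeerSummary {
  sessionId: string;
  peerRole?: PresenceRole | null;
}

// Drop control-plane connections; a missing role counts as "session".
function visiblePeers<T extends PeerSummary>(peers: T[]): T[] {
  return peers.filter((p) => (p.peerRole ?? "session") !== "control-plane");
}
```

Filtering on the wire-level `peerRole` rather than `role` keeps the user-supplied profile role (for example "lead") out of the decision.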
||||||
|
|
||||||
@@ -593,13 +627,23 @@ export async function createTopic(args: {
|
|||||||
if (!row) throw new Error("failed to create topic");
|
if (!row) throw new Error("failed to create topic");
|
||||||
|
|
||||||
// Seal a copy for the creator immediately. Other members get sealed
|
// Seal a copy for the creator immediately. Other members get sealed
|
||||||
// copies as they join via joinTopic().
|
// copies as they join (re-seal flow). Wrap in try/catch so a seal
|
||||||
|
// failure (bad pubkey, transient DB error) doesn't roll back topic
|
||||||
|
// creation — the user can re-seal later.
|
||||||
if (args.createdByMemberId) {
|
if (args.createdByMemberId) {
|
||||||
await sealTopicKeyForMember({
|
try {
|
||||||
topicId: row.id,
|
await sealTopicKeyForMember({
|
||||||
memberId: args.createdByMemberId,
|
topicId: row.id,
|
||||||
bundle: topicKeyBundle,
|
memberId: args.createdByMemberId,
|
||||||
});
|
bundle: topicKeyBundle,
|
||||||
|
});
|
||||||
|
} catch (err) {
|
||||||
|
// Topic exists but no key sealed for the creator. They'll get
|
||||||
|
// 404 on GET /key until another holder re-seals. Phase-3 flow
|
||||||
|
// handles this for any member, including the creator.
|
||||||
|
// Silent in-band — the topic create itself succeeded.
|
||||||
|
void err;
|
||||||
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
return {
|
return {
|
||||||
@@ -624,7 +668,7 @@ async function generateTopicKeyBundle(): Promise<{
|
|||||||
senderPubkey: Uint8Array;
|
senderPubkey: Uint8Array;
|
||||||
senderPubkeyHex: string;
|
senderPubkeyHex: string;
|
||||||
}> {
|
}> {
|
||||||
const sodium = await import("libsodium-wrappers");
|
const sodium = (await import("libsodium-wrappers")).default;
|
||||||
await sodium.ready;
|
await sodium.ready;
|
||||||
const topicKey = sodium.randombytes_buf(32);
|
const topicKey = sodium.randombytes_buf(32);
|
||||||
const sender = sodium.crypto_box_keypair();
|
const sender = sodium.crypto_box_keypair();
|
||||||
@@ -664,7 +708,7 @@ async function sealTopicKeyForMember(args: {
|
|||||||
.where(eq(memberTable.id, args.memberId));
|
.where(eq(memberTable.id, args.memberId));
|
||||||
if (!member) return;
|
if (!member) return;
|
||||||
|
|
||||||
const sodium = await import("libsodium-wrappers");
|
const sodium = (await import("libsodium-wrappers")).default;
|
||||||
await sodium.ready;
|
await sodium.ready;
|
||||||
let recipientX25519: Uint8Array;
|
let recipientX25519: Uint8Array;
|
||||||
try {
|
try {
|
||||||
@@ -683,20 +727,28 @@ async function sealTopicKeyForMember(args: {
|
|||||||
recipientX25519,
|
recipientX25519,
|
||||||
args.bundle.senderSecret,
|
args.bundle.senderSecret,
|
||||||
);
|
);
|
||||||
|
// Embed sender x25519 pubkey as the first 32 bytes so re-sealed
|
||||||
|
// copies (which carry their own sender pubkey from a different
|
||||||
|
// member) decode the same way as creator-sealed copies.
|
||||||
|
const blob = new Uint8Array(32 + sealed.length);
|
||||||
|
blob.set(args.bundle.senderPubkey, 0);
|
||||||
|
blob.set(sealed, 32);
|
||||||
|
const encryptedKey = sodium.to_base64(blob, sodium.base64_variants.ORIGINAL);
|
||||||
|
const nonceB64 = sodium.to_base64(nonce, sodium.base64_variants.ORIGINAL);
|
||||||
|
|
||||||
await db
|
await db
|
||||||
.insert(meshTopicMemberKey)
|
.insert(meshTopicMemberKey)
|
||||||
.values({
|
.values({
|
||||||
topicId: args.topicId,
|
topicId: args.topicId,
|
||||||
memberId: args.memberId,
|
memberId: args.memberId,
|
||||||
encryptedKey: sodium.to_base64(sealed, sodium.base64_variants.ORIGINAL),
|
encryptedKey,
|
||||||
nonce: sodium.to_base64(nonce, sodium.base64_variants.ORIGINAL),
|
nonce: nonceB64,
|
||||||
})
|
})
|
||||||
.onConflictDoUpdate({
|
.onConflictDoUpdate({
|
||||||
target: [meshTopicMemberKey.topicId, meshTopicMemberKey.memberId],
|
target: [meshTopicMemberKey.topicId, meshTopicMemberKey.memberId],
|
||||||
set: {
|
set: {
|
||||||
encryptedKey: sodium.to_base64(sealed, sodium.base64_variants.ORIGINAL),
|
encryptedKey,
|
||||||
nonce: sodium.to_base64(nonce, sodium.base64_variants.ORIGINAL),
|
nonce: nonceB64,
|
||||||
rotatedAt: new Date(),
|
rotatedAt: new Date(),
|
||||||
},
|
},
|
||||||
});
|
});
|
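The blob layout introduced in this hunk (32-byte sender x25519 pubkey followed by the sealed bytes) can be sketched without libsodium. `packSealedKey` and `unpackSealedKey` are hypothetical helper names, not functions from this diff, and the base64 encoding step that the real code applies before storing the blob is left out:

```typescript
// Assumption: the sender x25519 public key is always 32 bytes.
const SENDER_PUBKEY_BYTES = 32;

// Prefix the sealed payload with the sender pubkey, as the hunk above does.
function packSealedKey(senderPubkey: Uint8Array, sealed: Uint8Array): Uint8Array {
  if (senderPubkey.length !== SENDER_PUBKEY_BYTES) {
    throw new Error("sender pubkey must be 32 bytes");
  }
  const blob = new Uint8Array(SENDER_PUBKEY_BYTES + sealed.length);
  blob.set(senderPubkey, 0);
  blob.set(sealed, SENDER_PUBKEY_BYTES);
  return blob;
}

// Inverse operation a reader of the blob would need: split prefix from payload.
function unpackSealedKey(blob: Uint8Array): { senderPubkey: Uint8Array; sealed: Uint8Array } {
  if (blob.length <= SENDER_PUBKEY_BYTES) {
    throw new Error("blob too short to contain a sender pubkey and payload");
  }
  return {
    senderPubkey: blob.slice(0, SENDER_PUBKEY_BYTES),
    sealed: blob.slice(SENDER_PUBKEY_BYTES),
  };
}
```

Because the sender pubkey travels with the blob, a copy re-sealed by another member decodes the same way as a creator-sealed one.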
||||||
@@ -820,6 +872,13 @@ export async function appendTopicMessage(args: {
|
|||||||
senderSessionPubkey?: string;
|
senderSessionPubkey?: string;
|
||||||
nonce: string;
|
nonce: string;
|
||||||
ciphertext: string;
|
ciphertext: string;
|
||||||
|
bodyVersion?: number;
|
||||||
|
/**
|
||||||
|
* Optional id of the parent topic message this one replies to. Server
|
||||||
|
* verifies the parent exists and lives in the same topic; otherwise
|
||||||
|
* silently drops the reference (treated as a top-level post).
|
||||||
|
*/
|
||||||
|
replyToId?: string;
|
||||||
/**
|
/**
|
||||||
* Optional client-extracted mention list (lowercased display names
|
* Optional client-extracted mention list (lowercased display names
|
||||||
* without the leading @). Required once per-topic encryption lands —
|
* without the leading @). Required once per-topic encryption lands —
|
||||||
@@ -828,6 +887,17 @@ export async function appendTopicMessage(args: {
|
|||||||
*/
|
*/
|
||||||
mentions?: string[];
|
mentions?: string[];
|
||||||
}): Promise<string> {
|
}): Promise<string> {
|
||||||
|
let validatedReplyTo: string | null = null;
|
||||||
|
if (args.replyToId) {
|
||||||
|
const [parent] = await db
|
||||||
|
.select({ id: meshTopicMessage.id, topicId: meshTopicMessage.topicId })
|
||||||
|
.from(meshTopicMessage)
|
||||||
|
.where(eq(meshTopicMessage.id, args.replyToId));
|
||||||
|
if (parent && parent.topicId === args.topicId) {
|
||||||
|
validatedReplyTo = parent.id;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
const [row] = await db
|
const [row] = await db
|
||||||
.insert(meshTopicMessage)
|
.insert(meshTopicMessage)
|
||||||
.values({
|
.values({
|
||||||
@@ -836,6 +906,8 @@ export async function appendTopicMessage(args: {
|
|||||||
senderSessionPubkey: args.senderSessionPubkey ?? null,
|
senderSessionPubkey: args.senderSessionPubkey ?? null,
|
||||||
nonce: args.nonce,
|
nonce: args.nonce,
|
||||||
ciphertext: args.ciphertext,
|
ciphertext: args.ciphertext,
|
||||||
|
bodyVersion: args.bodyVersion ?? 1,
|
||||||
|
replyToId: validatedReplyTo,
|
||||||
})
|
})
|
||||||
.returning({ id: meshTopicMessage.id });
|
.returning({ id: meshTopicMessage.id });
|
||||||
if (!row) throw new Error("failed to append topic message");
|
if (!row) throw new Error("failed to append topic message");
|
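The reply check in this hunk reduces to one rule: keep the reference only if the parent exists and belongs to the same topic. A minimal sketch of that rule as a pure function; `ParentRef` and `resolveReplyTo` are hypothetical names, and the database lookup that produces `parent` is assumed to have happened already:

```typescript
// Hypothetical shape of the parent row fetched by id.
interface ParentRef {
  id: string;
  topicId: string;
}

// Returns the parent id when it is valid for this topic, otherwise null,
// which makes the new message a top-level post.
function resolveReplyTo(parent: ParentRef | undefined, topicId: string): string | null {
  return parent !== undefined && parent.topicId === topicId ? parent.id : null;
}
```

Dropping an invalid reference instead of rejecting the message means a stale or cross-topic `replyToId` never blocks a post.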
||||||
@@ -940,8 +1012,11 @@ export async function topicHistory(args: {
|
|||||||
id: string;
|
id: string;
|
||||||
senderMemberId: string;
|
senderMemberId: string;
|
||||||
senderPubkey: string;
|
senderPubkey: string;
|
||||||
|
senderName: string;
|
||||||
nonce: string;
|
nonce: string;
|
||||||
ciphertext: string;
|
ciphertext: string;
|
||||||
|
bodyVersion: number;
|
||||||
|
replyToId: string | null;
|
||||||
createdAt: Date;
|
createdAt: Date;
|
||||||
}>
|
}>
|
||||||
> {
|
> {
|
||||||
@@ -953,13 +1028,18 @@ export async function topicHistory(args: {
|
|||||||
id: string;
|
id: string;
|
||||||
sender_member_id: string;
|
sender_member_id: string;
|
||||||
sender_pubkey: string;
|
sender_pubkey: string;
|
||||||
|
sender_name: string;
|
||||||
nonce: string;
|
nonce: string;
|
||||||
ciphertext: string;
|
ciphertext: string;
|
||||||
|
body_version: number;
|
||||||
|
reply_to_id: string | null;
|
||||||
created_at: Date;
|
created_at: Date;
|
||||||
}>(sql`
|
}>(sql`
|
||||||
SELECT tm.id, tm.sender_member_id,
|
SELECT tm.id, tm.sender_member_id,
|
||||||
COALESCE(tm.sender_session_pubkey, m.peer_pubkey) AS sender_pubkey,
|
COALESCE(tm.sender_session_pubkey, m.peer_pubkey) AS sender_pubkey,
|
||||||
tm.nonce, tm.ciphertext, tm.created_at
|
m.display_name AS sender_name,
|
||||||
|
tm.nonce, tm.ciphertext, tm.body_version, tm.reply_to_id,
|
||||||
|
tm.created_at
|
||||||
FROM mesh.topic_message tm
|
FROM mesh.topic_message tm
|
||||||
JOIN mesh.member m ON m.id = tm.sender_member_id
|
JOIN mesh.member m ON m.id = tm.sender_member_id
|
||||||
WHERE tm.topic_id = ${args.topicId}
|
WHERE tm.topic_id = ${args.topicId}
|
||||||
@@ -967,20 +1047,26 @@ export async function topicHistory(args: {
|
|||||||
ORDER BY tm.created_at DESC, tm.id DESC
|
ORDER BY tm.created_at DESC, tm.id DESC
|
||||||
LIMIT ${limit}
|
LIMIT ${limit}
|
||||||
`);
|
`);
|
||||||
const rows = (result.rows ?? result) as Array<{
|
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{
|
||||||
id: string;
|
id: string;
|
||||||
sender_member_id: string;
|
sender_member_id: string;
|
||||||
sender_pubkey: string;
|
sender_pubkey: string;
|
||||||
|
sender_name: string;
|
||||||
nonce: string;
|
nonce: string;
|
||||||
ciphertext: string;
|
ciphertext: string;
|
||||||
|
body_version: number;
|
||||||
|
reply_to_id: string | null;
|
||||||
created_at: Date;
|
created_at: Date;
|
||||||
}>;
|
}>;
|
||||||
return rows.map((r) => ({
|
return rows.map((r) => ({
|
||||||
id: r.id,
|
id: r.id,
|
||||||
senderMemberId: r.sender_member_id,
|
senderMemberId: r.sender_member_id,
|
||||||
senderPubkey: r.sender_pubkey,
|
senderPubkey: r.sender_pubkey,
|
||||||
|
senderName: r.sender_name,
|
||||||
nonce: r.nonce,
|
nonce: r.nonce,
|
||||||
ciphertext: r.ciphertext,
|
ciphertext: r.ciphertext,
|
||||||
|
bodyVersion: r.body_version ?? 1,
|
||||||
|
replyToId: r.reply_to_id,
|
||||||
createdAt: r.created_at instanceof Date ? r.created_at : new Date(r.created_at),
|
createdAt: r.created_at instanceof Date ? r.created_at : new Date(r.created_at),
|
||||||
}));
|
}));
|
||||||
}
|
}
|
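The `rows` cast repeated across this diff handles two result shapes: an object with a `rows` array, or a bare array of rows. A small helper could centralise that. This is a sketch only; `rowsOf` is a hypothetical name, and unlike the inline expression it returns an empty array when neither shape matches:

```typescript
// Normalise a query result to an array of rows.
// Assumption: the driver returns either T[] or { rows: T[] }.
function rowsOf<T>(result: unknown): T[] {
  if (Array.isArray(result)) return result as T[];
  const rows = (result as { rows?: unknown } | null | undefined)?.rows;
  return Array.isArray(rows) ? (rows as T[]) : [];
}
```

A call site would then read `const rows = rowsOf<{ id: string }>(result);` instead of repeating the double cast.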
||||||
@@ -1390,7 +1476,7 @@ export async function recallMemory(
|
|||||||
ORDER BY ts_rank(search_vector, plainto_tsquery('english', ${query})) DESC
|
ORDER BY ts_rank(search_vector, plainto_tsquery('english', ${query})) DESC
|
||||||
LIMIT 20
|
LIMIT 20
|
||||||
`);
|
`);
|
||||||
const rows = (result.rows ?? result) as Array<{
|
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{
|
||||||
id: string;
|
id: string;
|
||||||
content: string;
|
content: string;
|
||||||
tags: string[];
|
tags: string[];
|
||||||
@@ -1958,7 +2044,7 @@ export async function getContext(
|
|||||||
ORDER BY updated_at DESC
|
ORDER BY updated_at DESC
|
||||||
LIMIT 20
|
LIMIT 20
|
||||||
`);
|
`);
|
||||||
const rows = (result.rows ?? result) as Array<{
|
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{
|
||||||
peer_name: string | null;
|
peer_name: string | null;
|
||||||
summary: string;
|
summary: string;
|
||||||
files_read: string[] | null;
|
files_read: string[] | null;
|
||||||
@@ -2209,6 +2295,10 @@ export interface QueueParams {
|
|||||||
nonce: string;
|
nonce: string;
|
||||||
ciphertext: string;
|
ciphertext: string;
|
||||||
expiresAt?: Date;
|
expiresAt?: Date;
|
||||||
|
/** Daemon idempotency id (spec §4.2). Optional; pre-daemon callers omit. */
|
||||||
|
clientMessageId?: string;
|
||||||
|
/** Canonical request fingerprint hex (spec §4.4). Optional; pre-daemon callers omit. */
|
||||||
|
requestFingerprint?: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Insert an E2E envelope into the mesh's message queue. */
|
/** Insert an E2E envelope into the mesh's message queue. */
|
||||||
@@ -2224,6 +2314,8 @@ export async function queueMessage(params: QueueParams): Promise<string> {
|
|||||||
nonce: params.nonce,
|
nonce: params.nonce,
|
||||||
ciphertext: params.ciphertext,
|
ciphertext: params.ciphertext,
|
||||||
expiresAt: params.expiresAt,
|
expiresAt: params.expiresAt,
|
||||||
|
clientMessageId: params.clientMessageId ?? null,
|
||||||
|
requestFingerprint: params.requestFingerprint ?? null,
|
||||||
})
|
})
|
||||||
.returning({ id: messageQueue.id });
|
.returning({ id: messageQueue.id });
|
||||||
if (!row) throw new Error("failed to queue message");
|
if (!row) throw new Error("failed to queue message");
|
||||||
@@ -2253,6 +2345,22 @@ function deliverablePriorities(status: PeerStatus): Priority[] {
|
|||||||
* targetSpec routing: matches either the member's pubkey directly or
|
* targetSpec routing: matches either the member's pubkey directly or
|
||||||
* the broadcast wildcard ("*"). Channel/tag resolution is per-mesh
|
* the broadcast wildcard ("*"). Channel/tag resolution is per-mesh
|
||||||
* config that lives outside this function.
|
* config that lives outside this function.
|
||||||
|
*
|
||||||
|
* v2 agentic-comms (M1): two-phase claim/deliver with a 30s lease.
|
||||||
|
*
|
||||||
|
* The legacy implementation set `delivered_at = NOW()` in the same
|
||||||
|
* UPDATE that selected the row. If the recipient WS was no longer
|
||||||
|
* OPEN at push time, the message dropped silently (the row read as
|
||||||
|
* "delivered" so the next reconnect's drain skipped it).
|
||||||
|
*
|
||||||
|
* The new behaviour:
|
||||||
|
* - claim sets (claimed_at, claim_id, claim_expires_at = NOW() + 30s)
|
||||||
|
* - delivered_at stays NULL until the recipient acks via `client_ack`
|
||||||
|
* - re-eligibility predicate accepts rows whose claim has expired,
|
||||||
|
* so dropped pushes are redelivered (at-least-once)
|
||||||
|
*
|
||||||
|
* `claimerPresenceId` is recorded on the row purely for debugging — it
|
||||||
|
* never gates re-claim; expiry alone does.
|
||||||
*/
|
*/
|
||||||
export async function drainForMember(
|
export async function drainForMember(
|
||||||
meshId: string,
|
meshId: string,
|
||||||
@@ -2262,6 +2370,7 @@ export async function drainForMember(
|
|||||||
sessionPubkey?: string,
|
sessionPubkey?: string,
|
||||||
excludeSenderSessionPubkey?: string,
|
excludeSenderSessionPubkey?: string,
|
||||||
memberGroups?: string[],
|
memberGroups?: string[],
|
||||||
|
claimerPresenceId?: string,
|
||||||
): Promise<
|
): Promise<
|
||||||
Array<{
|
Array<{
|
||||||
id: string;
|
id: string;
|
||||||
@@ -2271,6 +2380,9 @@ export async function drainForMember(
|
|||||||
createdAt: Date;
|
createdAt: Date;
|
||||||
senderMemberId: string;
|
senderMemberId: string;
|
||||||
senderPubkey: string;
|
senderPubkey: string;
|
||||||
|
/** v0.9.0 daemon fields; null for legacy traffic. */
|
||||||
|
clientMessageId: string | null;
|
||||||
|
requestFingerprint: string | null;
|
||||||
}>
|
}>
|
||||||
> {
|
> {
|
||||||
const priorities = deliverablePriorities(status);
|
const priorities = deliverablePriorities(status);
|
||||||
@@ -2324,6 +2436,11 @@ export async function drainForMember(
|
|||||||
// (with id as tiebreaker so equal-timestamp rows stay deterministic).
|
// (with id as tiebreaker so equal-timestamp rows stay deterministic).
|
||||||
// Sorting in SQL avoids JS Date's millisecond-precision collapse of
|
// Sorting in SQL avoids JS Date's millisecond-precision collapse of
|
||||||
// Postgres microsecond timestamps.
|
// Postgres microsecond timestamps.
|
||||||
|
//
|
||||||
|
// v2 (M1): claim sets the lease columns, NOT delivered_at. Re-eligibility
|
||||||
|
// accepts unclaimed rows AND rows with an expired claim (NULL or past
|
||||||
|
// NOW()). delivered_at stays NULL until a `client_ack` lands.
|
||||||
|
const claimerId = claimerPresenceId ?? null;
|
||||||
const result = await db.execute<{
|
const result = await db.execute<{
|
||||||
id: string;
|
id: string;
|
||||||
priority: string;
|
priority: string;
|
||||||
@@ -2332,15 +2449,20 @@ export async function drainForMember(
|
|||||||
created_at: string | Date;
|
created_at: string | Date;
|
||||||
sender_member_id: string;
|
sender_member_id: string;
|
||||||
sender_pubkey: string;
|
sender_pubkey: string;
|
||||||
|
client_message_id: string | null;
|
||||||
|
request_fingerprint: string | null;
|
||||||
}>(sql`
|
}>(sql`
|
||||||
WITH claimed AS (
|
WITH claimed AS (
|
||||||
UPDATE mesh.message_queue AS mq
|
UPDATE mesh.message_queue AS mq
|
||||||
SET delivered_at = NOW()
|
SET claimed_at = NOW(),
|
||||||
|
claim_id = ${claimerId},
|
||||||
|
claim_expires_at = NOW() + INTERVAL '30 seconds'
|
||||||
FROM mesh.member AS m
|
FROM mesh.member AS m
|
||||||
WHERE mq.id IN (
|
WHERE mq.id IN (
|
||||||
SELECT id FROM mesh.message_queue
|
SELECT id FROM mesh.message_queue
|
||||||
WHERE mesh_id = ${meshId}
|
WHERE mesh_id = ${meshId}
|
||||||
AND delivered_at IS NULL
|
AND delivered_at IS NULL
|
||||||
|
AND (claimed_at IS NULL OR claim_expires_at IS NULL OR claim_expires_at < NOW())
|
||||||
AND priority::text IN (${priorityList})
|
AND priority::text IN (${priorityList})
|
||||||
AND (target_spec = ${memberPubkey} OR target_spec = '*'${sessionPubkey ? sql` OR target_spec = ${sessionPubkey}` : sql``} OR target_spec IN (${groupTargetList})${topicTargetList ? sql` OR target_spec IN (${topicTargetList})` : sql``})
|
AND (target_spec = ${memberPubkey} OR target_spec = '*'${sessionPubkey ? sql` OR target_spec = ${sessionPubkey}` : sql``} OR target_spec IN (${groupTargetList})${topicTargetList ? sql` OR target_spec IN (${topicTargetList})` : sql``})
|
||||||
${excludeSenderSessionPubkey ? sql`AND NOT (target_spec IN ('*') AND sender_session_pubkey = ${excludeSenderSessionPubkey})` : sql``}
|
${excludeSenderSessionPubkey ? sql`AND NOT (target_spec IN ('*') AND sender_session_pubkey = ${excludeSenderSessionPubkey})` : sql``}
|
||||||
@@ -2350,12 +2472,13 @@ export async function drainForMember(
|
|||||||
AND m.id = mq.sender_member_id
|
AND m.id = mq.sender_member_id
|
||||||
RETURNING mq.id, mq.priority, mq.nonce, mq.ciphertext,
|
RETURNING mq.id, mq.priority, mq.nonce, mq.ciphertext,
|
||||||
mq.created_at, mq.sender_member_id,
|
mq.created_at, mq.sender_member_id,
|
||||||
|
mq.client_message_id, mq.request_fingerprint,
|
||||||
COALESCE(mq.sender_session_pubkey, m.peer_pubkey) AS sender_pubkey
|
COALESCE(mq.sender_session_pubkey, m.peer_pubkey) AS sender_pubkey
|
||||||
)
|
)
|
||||||
SELECT * FROM claimed ORDER BY created_at ASC, id ASC
|
SELECT * FROM claimed ORDER BY created_at ASC, id ASC
|
||||||
`);
|
`);
|
||||||
|
|
||||||
const rows = (result.rows ?? result) as Array<{
|
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{
|
||||||
id: string;
|
id: string;
|
||||||
priority: string;
|
priority: string;
|
||||||
nonce: string;
|
nonce: string;
|
||||||
@@ -2363,6 +2486,8 @@ export async function drainForMember(
|
|||||||
created_at: string | Date;
|
created_at: string | Date;
|
||||||
sender_member_id: string;
|
sender_member_id: string;
|
||||||
sender_pubkey: string;
|
sender_pubkey: string;
|
||||||
|
client_message_id: string | null;
|
||||||
|
request_fingerprint: string | null;
|
||||||
}>;
|
}>;
|
||||||
if (!rows || rows.length === 0) return [];
|
if (!rows || rows.length === 0) return [];
|
||||||
return rows.map((r) => ({
|
return rows.map((r) => ({
|
||||||
@@ -2374,14 +2499,98 @@ export async function drainForMember(
|
|||||||
r.created_at instanceof Date ? r.created_at : new Date(r.created_at),
|
r.created_at instanceof Date ? r.created_at : new Date(r.created_at),
|
||||||
senderMemberId: r.sender_member_id,
|
senderMemberId: r.sender_member_id,
|
||||||
senderPubkey: r.sender_pubkey,
|
senderPubkey: r.sender_pubkey,
|
||||||
|
clientMessageId: r.client_message_id ?? null,
|
||||||
|
requestFingerprint: r.request_fingerprint ?? null,
|
||||||
}));
|
}));
|
||||||
}
|
}
|
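The two-phase claim/deliver behaviour described in the doc comment above can be modelled in memory to see why a dropped push is redelivered. This is a sketch under stated assumptions: `QueueRow`, `claimRows` and `ackRow` are hypothetical names, timestamps are plain millisecond numbers, and the real logic runs as SQL against `mesh.message_queue`:

```typescript
// In-memory stand-in for the lease columns on a queue row.
interface QueueRow {
  id: string;
  deliveredAt: number | null;
  claimedAt: number | null;
  claimExpiresAt: number | null;
}

const LEASE_MS = 30_000; // mirrors INTERVAL '30 seconds' in the claim UPDATE

// Same predicate as the SQL: undelivered, and either unclaimed or lease expired.
function isClaimable(row: QueueRow, now: number): boolean {
  if (row.deliveredAt !== null) return false;
  return row.claimedAt === null || row.claimExpiresAt === null || row.claimExpiresAt < now;
}

// Phase 1: take a lease on every eligible row; deliveredAt is left untouched.
function claimRows(rows: QueueRow[], now: number): string[] {
  const claimed: string[] = [];
  for (const row of rows) {
    if (!isClaimable(row, now)) continue;
    row.claimedAt = now;
    row.claimExpiresAt = now + LEASE_MS;
    claimed.push(row.id);
  }
  return claimed;
}

// Phase 2: the recipient's ack marks the row delivered; returns rows marked.
function ackRow(rows: QueueRow[], id: string, now: number): number {
  let marked = 0;
  for (const row of rows) {
    if (row.id === id && row.deliveredAt === null) {
      row.deliveredAt = now;
      marked += 1;
    }
  }
  return marked;
}
```

A row claimed at time 0 is skipped by a second drain at 10s, picked up again after the lease lapses at 30s, and stops being eligible once it is acked. That is the at-least-once guarantee: recipients may see a message twice, so the ack path has to be idempotent.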
||||||
|
|
||||||
|
/**
|
||||||
|
* v2 agentic-comms (M1): mark a message_queue row as delivered.
|
||||||
|
*
|
||||||
|
* Called when the recipient WS replies with a `client_ack` carrying the
|
||||||
|
* original `client_message_id`. Lookup is scoped to (mesh_id, target_spec)
|
||||||
|
* so a malicious peer can't ack messages addressed to others. Returns
|
||||||
|
* the number of rows marked (0 = unknown id, already delivered, or wrong
|
||||||
|
* recipient).
|
||||||
|
*/
|
||||||
|
export async function markDelivered(params: {
|
||||||
|
meshId: string;
|
||||||
|
/** memberId of the WS that's claiming to have received this message. */
|
||||||
|
recipientMemberId: string;
|
||||||
|
recipientMemberPubkey: string;
|
||||||
|
recipientSessionPubkey?: string | null;
|
||||||
|
clientMessageId?: string | null;
|
||||||
|
brokerMessageId?: string | null;
|
||||||
|
}): Promise<number> {
|
||||||
|
const {
|
||||||
|
meshId,
|
||||||
|
recipientMemberPubkey,
|
||||||
|
recipientSessionPubkey,
|
||||||
|
clientMessageId,
|
||||||
|
brokerMessageId,
|
||||||
|
} = params;
|
||||||
|
if (!clientMessageId && !brokerMessageId) return 0;
|
||||||
|
|
||||||
|
// Match on broker id and/or clientMessageId, whichever the ack carries.
|
||||||
|
// Scope to (mesh_id, target_spec ∈ {member-pubkey, session-pubkey, '*', @group, #topic}).
|
||||||
|
// For minimal blast radius we only allow direct/broadcast acks here —
|
||||||
|
// group/topic acks would need the same membership expansion drainForMember
|
||||||
|
// does and we'd rather under-ack than over-ack (re-claim is cheap).
|
||||||
|
const result = await db.execute<{ id: string }>(sql`
|
||||||
|
UPDATE mesh.message_queue
|
||||||
|
SET delivered_at = NOW()
|
||||||
|
WHERE mesh_id = ${meshId}
|
||||||
|
AND delivered_at IS NULL
|
||||||
|
AND (
|
||||||
|
${brokerMessageId ? sql`id = ${brokerMessageId}` : sql`FALSE`}
|
||||||
|
OR ${clientMessageId ? sql`client_message_id = ${clientMessageId}` : sql`FALSE`}
|
||||||
|
)
|
||||||
|
AND (
|
||||||
|
target_spec = ${recipientMemberPubkey}
|
||||||
|
${recipientSessionPubkey ? sql`OR target_spec = ${recipientSessionPubkey}` : sql``}
|
||||||
|
OR target_spec = '*'
|
||||||
|
OR target_spec LIKE '@%'
|
||||||
|
OR target_spec LIKE '#%'
|
||||||
|
)
|
||||||
|
RETURNING id
|
||||||
|
`);
|
||||||
|
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{ id: string }>;
|
||||||
|
return rows.length;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* v2 agentic-comms (M1): reap expired claims so dropped pushes redeliver.
|
||||||
|
*
|
||||||
|
* Runs every 15s. Clears (claimed_at, claim_id, claim_expires_at) on rows
|
||||||
|
* where the lease has expired and no `client_ack` arrived. The next
|
||||||
|
* `drainForMember` call will pick the row up again — at-least-once.
|
||||||
|
*
|
||||||
|
* Returns the number of rows reaped.
|
||||||
|
*/
|
||||||
|
export async function sweepExpiredClaims(): Promise<number> {
|
||||||
|
const result = await db.execute<{ id: string }>(sql`
|
||||||
|
UPDATE mesh.message_queue
|
||||||
|
SET claimed_at = NULL,
|
||||||
|
claim_id = NULL,
|
||||||
|
claim_expires_at = NULL
|
||||||
|
WHERE delivered_at IS NULL
|
||||||
|
AND claim_expires_at IS NOT NULL
|
||||||
|
AND claim_expires_at < NOW()
|
||||||
|
RETURNING id
|
||||||
|
`);
|
||||||
|
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{ id: string }>;
|
||||||
|
return rows.length;
|
||||||
|
}
|
||||||
|
|
||||||
// --- Lifecycle ---
|
// --- Lifecycle ---
|
||||||
|
|
||||||
let ttlTimer: ReturnType<typeof setInterval> | null = null;
|
let ttlTimer: ReturnType<typeof setInterval> | null = null;
|
||||||
let pendingTimer: ReturnType<typeof setInterval> | null = null;
|
let pendingTimer: ReturnType<typeof setInterval> | null = null;
|
||||||
let staleTimer: ReturnType<typeof setInterval> | null = null;
|
let staleTimer: ReturnType<typeof setInterval> | null = null;
|
||||||
|
let claimSweepTimer: ReturnType<typeof setInterval> | null = null;
|
||||||
|
|
||||||
|
/** v2 agentic-comms (M1): how often we reap expired message claims. */
|
||||||
|
const CLAIM_SWEEP_INTERVAL_MS = 15_000;
|
||||||
|
|
||||||
/** Start background sweepers. Idempotent. */
|
/** Start background sweepers. Idempotent. */
|
||||||
export function startSweepers(): void {
|
export function startSweepers(): void {
|
||||||
@@ -2399,6 +2608,13 @@ export function startSweepers(): void {
|
|||||||
console.error("[broker] stale presence sweep:", e),
|
console.error("[broker] stale presence sweep:", e),
|
||||||
);
|
);
|
||||||
}, 30_000);
|
}, 30_000);
|
||||||
|
claimSweepTimer = setInterval(() => {
|
||||||
|
sweepExpiredClaims()
|
||||||
|
.then((n) => {
|
||||||
|
if (n > 0) console.log(`[broker] expired claims swept: ${n}`);
|
||||||
|
})
|
||||||
|
.catch((e) => console.error("[broker] claim sweep:", e));
|
||||||
|
}, CLAIM_SWEEP_INTERVAL_MS);
|
||||||
// Orphan-message sweep every hour; cheap, rows are all >7d at deletion time.
|
// Orphan-message sweep every hour; cheap, rows are all >7d at deletion time.
|
||||||
setInterval(() => {
|
setInterval(() => {
|
||||||
sweepOrphanMessages()
|
sweepOrphanMessages()
|
||||||
@@ -2412,9 +2628,11 @@ export async function stopSweepers(): Promise<void> {
|
|||||||
if (ttlTimer) clearInterval(ttlTimer);
|
if (ttlTimer) clearInterval(ttlTimer);
|
||||||
if (pendingTimer) clearInterval(pendingTimer);
|
if (pendingTimer) clearInterval(pendingTimer);
|
||||||
if (staleTimer) clearInterval(staleTimer);
|
if (staleTimer) clearInterval(staleTimer);
|
||||||
|
if (claimSweepTimer) clearInterval(claimSweepTimer);
|
||||||
ttlTimer = null;
|
ttlTimer = null;
|
||||||
pendingTimer = null;
|
pendingTimer = null;
|
||||||
staleTimer = null;
|
staleTimer = null;
|
||||||
|
claimSweepTimer = null;
|
||||||
await db
|
await db
|
||||||
.update(presence)
|
.update(presence)
|
||||||
.set({ disconnectedAt: new Date() })
|
.set({ disconnectedAt: new Date() })
|
||||||
@@ -2597,7 +2815,11 @@ export async function findMemberByPubkey(
|
|||||||
),
|
),
|
||||||
)
|
)
|
||||||
.limit(1);
|
.limit(1);
|
||||||
return row ?? null;
|
if (!row) return null;
|
||||||
|
return {
|
||||||
|
...row,
|
||||||
|
defaultGroups: row.defaultGroups ?? [],
|
||||||
|
};
|
||||||
}
|
}
|
||||||
|
|
||||||
// --- Mesh databases (per-mesh PostgreSQL schemas) ---
|
// --- Mesh databases (per-mesh PostgreSQL schemas) ---
|
||||||
@@ -2651,7 +2873,7 @@ export async function meshQuery(
|
|||||||
sql.raw(`SET LOCAL search_path TO "${schema}"`)
|
sql.raw(`SET LOCAL search_path TO "${schema}"`)
|
||||||
);
|
);
|
||||||
const result = await tx.execute(sql.raw(query));
|
const result = await tx.execute(sql.raw(query));
|
||||||
const rows = (result.rows ?? []) as Array<Record<string, unknown>>;
|
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<Record<string, unknown>>;
|
||||||
const columns = rows.length > 0 ? Object.keys(rows[0]!) : [];
|
const columns = rows.length > 0 ? Object.keys(rows[0]!) : [];
|
||||||
return { columns, rows, rowCount: rows.length };
|
return { columns, rows, rowCount: rows.length };
|
||||||
});
|
});
|
||||||
@@ -2694,7 +2916,7 @@ export async function meshSchema(
|
|||||||
WHERE table_schema = ${schema}
|
WHERE table_schema = ${schema}
|
||||||
ORDER BY table_name, ordinal_position
|
ORDER BY table_name, ordinal_position
|
||||||
`);
|
`);
|
||||||
const rows = (result.rows ?? result) as Array<{
|
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{
|
||||||
table_name: string;
|
table_name: string;
|
||||||
column_name: string;
|
column_name: string;
|
||||||
data_type: string;
|
data_type: string;
|
||||||
|
|||||||
@@ -138,6 +138,128 @@ export async function sealRootKeyToRecipient(params: {
|
|||||||
|
|
||||||
export const HELLO_SKEW_MS = 60_000;
|
export const HELLO_SKEW_MS = 60_000;
|
||||||
|
|
||||||
|
/** Maximum lifetime of a parent attestation (24h). */
|
||||||
|
export const SESSION_ATTESTATION_MAX_TTL_MS = 24 * 60 * 60 * 1000;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Canonical bytes for a parent-vouches-session attestation.
|
||||||
|
*
|
||||||
|
* The parent member signs this with their stable ed25519 secret key when
|
||||||
|
* minting an attestation in `claudemesh launch`. The broker recomputes
|
||||||
|
* the same string at session_hello time and verifies the signature
|
||||||
|
* against `parent_member_pubkey`.
|
||||||
|
*
|
||||||
|
* Format: `claudemesh-session-attest|<parent_pubkey>|<session_pubkey>|<expires_at_ms>`
|
||||||
|
*/
|
||||||
|
export function canonicalSessionAttestation(
|
||||||
|
parentMemberPubkey: string,
|
||||||
|
sessionPubkey: string,
|
||||||
|
expiresAt: number,
|
||||||
|
): string {
|
||||||
|
return `claudemesh-session-attest|${parentMemberPubkey}|${sessionPubkey}|${expiresAt}`;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Canonical bytes for the session_hello signature.
|
||||||
|
*
|
||||||
|
* The session keypair (held by the daemon for the lifetime of the
|
||||||
|
* registration) signs this fresh on every WS connect, proving liveness +
|
||||||
|
* possession of the session secret key. Without this stage, an attacker
|
||||||
|
* who captured an attestation could replay it from any machine.
|
||||||
|
*
|
||||||
|
* Format: `claudemesh-session-hello|<mesh_id>|<parent_pubkey>|<session_pubkey>|<timestamp_ms>`
|
||||||
|
*/
|
||||||
|
export function canonicalSessionHello(
|
||||||
|
meshId: string,
|
||||||
|
parentMemberPubkey: string,
|
||||||
|
sessionPubkey: string,
|
||||||
|
timestamp: number,
|
||||||
|
): string {
|
||||||
|
return `claudemesh-session-hello|${meshId}|${parentMemberPubkey}|${sessionPubkey}|${timestamp}`;
|
||||||
|
}

/**
 * Validate a parent-vouches-session attestation: lifetime bound + signature.
 * Returns `{ ok: true }` on success or `{ ok: false, reason }` on failure.
 *
 * The TTL ceiling (24h) bounds replay damage if an attestation leaks; the
 * lower bound (already in the past) blocks reuse of expired ones.
 */
export async function verifySessionAttestation(args: {
  parentMemberPubkey: string;
  sessionPubkey: string;
  expiresAt: number;
  signature: string;
  now?: number;
}): Promise<
  | { ok: true }
  | { ok: false; reason: "expired" | "ttl_too_long" | "bad_signature" | "malformed" }
> {
  const now = args.now ?? Date.now();
  if (!Number.isFinite(args.expiresAt)) {
    return { ok: false, reason: "malformed" };
  }
  if (args.expiresAt <= now) {
    return { ok: false, reason: "expired" };
  }
  if (args.expiresAt > now + SESSION_ATTESTATION_MAX_TTL_MS) {
    return { ok: false, reason: "ttl_too_long" };
  }
  if (
    !/^[0-9a-f]{64}$/i.test(args.parentMemberPubkey) ||
    !/^[0-9a-f]{64}$/i.test(args.sessionPubkey) ||
    !/^[0-9a-f]{128}$/i.test(args.signature)
  ) {
    return { ok: false, reason: "malformed" };
  }
  const canonical = canonicalSessionAttestation(
    args.parentMemberPubkey,
    args.sessionPubkey,
    args.expiresAt,
  );
  const ok = await verifyEd25519(canonical, args.signature, args.parentMemberPubkey);
  return ok ? { ok: true } : { ok: false, reason: "bad_signature" };
}
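
Reviewer sketch (not part of the commit): the lifetime checks above define a half-open acceptance window, `now < expiresAt <= now + TTL`. The standalone function below isolates just that window so the boundary cases are easy to see. `classifyExpiry` and `MAX_TTL_MS` are hypothetical names, and the 24h value is assumed from the doc comment rather than read from `SESSION_ATTESTATION_MAX_TTL_MS`.

```typescript
// Assumed value: the doc comment above states a 24h ceiling.
const MAX_TTL_MS = 24 * 60 * 60 * 1000;

type ExpiryVerdict = "ok" | "malformed" | "expired" | "ttl_too_long";

// Mirrors the order of the three lifetime checks in verifySessionAttestation.
function classifyExpiry(
  expiresAt: number,
  now: number,
  maxTtlMs: number = MAX_TTL_MS,
): ExpiryVerdict {
  if (!Number.isFinite(expiresAt)) return "malformed";
  if (expiresAt <= now) return "expired";
  if (expiresAt > now + maxTtlMs) return "ttl_too_long";
  return "ok";
}

const fixedNow = 1_700_000_000_000; // arbitrary fixed clock for the example

const verdicts = {
  exactlyNow: classifyExpiry(fixedNow, fixedNow),
  oneMsAhead: classifyExpiry(fixedNow + 1, fixedNow),
  atCeiling: classifyExpiry(fixedNow + MAX_TTL_MS, fixedNow),
  pastCeiling: classifyExpiry(fixedNow + MAX_TTL_MS + 1, fixedNow),
  notANumber: classifyExpiry(Number.NaN, fixedNow),
};
```

Note that an attestation expiring exactly at `now` counts as expired, while one expiring exactly at the ceiling is still accepted.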

/**
 * Validate the session-side hello signature: timestamp skew + signature
 * by the session keypair over canonical session-hello bytes.
 */
export async function verifySessionHelloSignature(args: {
  meshId: string;
  parentMemberPubkey: string;
  sessionPubkey: string;
  timestamp: number;
  signature: string;
  now?: number;
}): Promise<
  | { ok: true }
  | { ok: false; reason: "timestamp_skew" | "bad_signature" | "malformed" }
> {
  const now = args.now ?? Date.now();
  if (
    !Number.isFinite(args.timestamp) ||
    Math.abs(now - args.timestamp) > HELLO_SKEW_MS
  ) {
    return { ok: false, reason: "timestamp_skew" };
  }
  if (
    !/^[0-9a-f]{64}$/i.test(args.parentMemberPubkey) ||
    !/^[0-9a-f]{64}$/i.test(args.sessionPubkey) ||
    !/^[0-9a-f]{128}$/i.test(args.signature)
  ) {
    return { ok: false, reason: "malformed" };
  }
  const canonical = canonicalSessionHello(
    args.meshId,
    args.parentMemberPubkey,
    args.sessionPubkey,
    args.timestamp,
  );
  const ok = await verifyEd25519(canonical, args.signature, args.sessionPubkey);
  return ok ? { ok: true } : { ok: false, reason: "bad_signature" };
}
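
Reviewer sketch (not part of the commit): the skew check above is symmetric, so a hello stamped in the future is treated the same as one stamped in the past, and a non-finite timestamp is reported as `timestamp_skew` rather than `malformed`. The helper below isolates that predicate. `helloTimestampOk` and `SKEW_MS` are hypothetical names, and the 60s value is assumed from the "±60s" note on `WSSessionHelloMessage.timestamp`, not read from `HELLO_SKEW_MS`.

```typescript
// Assumed value, taken from the "±60s" wording in the message type's docs.
const SKEW_MS = 60_000;

// True when the hello timestamp is finite and within ±skewMs of the clock.
function helloTimestampOk(
  timestamp: number,
  now: number,
  skewMs: number = SKEW_MS,
): boolean {
  return Number.isFinite(timestamp) && Math.abs(now - timestamp) <= skewMs;
}

const brokerNow = 1_700_000_000_000; // arbitrary fixed clock for the example

const skewCases = {
  sameInstant: helloTimestampOk(brokerNow, brokerNow),
  pastEdge: helloTimestampOk(brokerNow - SKEW_MS, brokerNow),
  futureEdge: helloTimestampOk(brokerNow + SKEW_MS, brokerNow),
  justTooOld: helloTimestampOk(brokerNow - SKEW_MS - 1, brokerNow),
  infinite: helloTimestampOk(Number.POSITIVE_INFINITY, brokerNow),
};
```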
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Verify a hello's ed25519 signature + timestamp skew.
|
* Verify a hello's ed25519 signature + timestamp skew.
|
||||||
* Returns { ok: true } on success, or { ok: false, reason } describing
|
* Returns { ok: true } on success, or { ok: false, reason } describing
|
||||||
|
|||||||
@@ -23,7 +23,7 @@ const envSchema = z.object({
|
|||||||
MINIO_ENDPOINT: z.string().default("minio:9000"),
|
MINIO_ENDPOINT: z.string().default("minio:9000"),
|
||||||
MINIO_ACCESS_KEY: z.string().default("claudemesh"),
|
MINIO_ACCESS_KEY: z.string().default("claudemesh"),
|
||||||
MINIO_SECRET_KEY: z.string().default("changeme"),
|
MINIO_SECRET_KEY: z.string().default("changeme"),
|
||||||
MINIO_USE_SSL: z.enum(["true", "false", ""]).transform(v => v === "true").default("false"),
|
MINIO_USE_SSL: z.enum(["true", "false", ""]).default("false").transform(v => v === "true"),
|
||||||
QDRANT_URL: z.string().default("http://qdrant:6333"),
|
QDRANT_URL: z.string().default("http://qdrant:6333"),
|
||||||
NEO4J_URL: z.string().default("bolt://neo4j:7687"),
|
NEO4J_URL: z.string().default("bolt://neo4j:7687"),
|
||||||
NEO4J_USER: z.string().default("neo4j"),
|
NEO4J_USER: z.string().default("neo4j"),
|
||||||
|
|||||||
File diff suppressed because it is too large
@@ -86,7 +86,7 @@ export async function verifySyncToken(
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Decode header — must be HS256
|
// Decode header — must be HS256
|
||||||
const header = JSON.parse(new TextDecoder().decode(base64UrlDecode(headerB64)));
|
const header = JSON.parse(new TextDecoder().decode(base64UrlDecode(headerB64))) as { alg?: string };
|
||||||
if (header.alg !== "HS256") {
|
if (header.alg !== "HS256") {
|
||||||
return { ok: false, error: `unsupported algorithm: ${header.alg}` };
|
return { ok: false, error: `unsupported algorithm: ${header.alg}` };
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -31,7 +31,7 @@ export interface MemberPermissionUpdate {
|
|||||||
|
|
||||||
export type MemberUpdateRequest = MemberProfileUpdate & MemberPermissionUpdate;
|
export type MemberUpdateRequest = MemberProfileUpdate & MemberPermissionUpdate;
|
||||||
|
|
||||||
interface SelfEditablePolicy {
|
export interface SelfEditablePolicy {
|
||||||
displayName: boolean;
|
displayName: boolean;
|
||||||
roleTag: boolean;
|
roleTag: boolean;
|
||||||
groups: boolean;
|
groups: boolean;
|
||||||
|
|||||||
@@ -115,11 +115,11 @@ function lastAssistantHasToolUse(filePath: string): boolean {
|
|||||||
if (!line) continue;
|
if (!line) continue;
|
||||||
if (!line.includes('"assistant"')) continue;
|
if (!line.includes('"assistant"')) continue;
|
||||||
try {
|
try {
|
||||||
const d = JSON.parse(line);
|
const d = JSON.parse(line) as { type?: string; message?: { content?: unknown } };
|
||||||
if (d.type !== "assistant") continue;
|
if (d.type !== "assistant") continue;
|
||||||
const content = d.message?.content;
|
const content = d.message?.content;
|
||||||
if (!Array.isArray(content)) continue;
|
if (!Array.isArray(content)) continue;
|
||||||
return content.some((c: { type?: string }) => c.type === "tool_use");
|
return (content as Array<{ type?: string }>).some((c) => c.type === "tool_use");
|
||||||
} catch {
|
} catch {
|
||||||
/* malformed line, skip */
|
/* malformed line, skip */
|
||||||
}
|
}
|
||||||
|
|||||||
@@ -169,7 +169,7 @@ function detectEntry(
|
|||||||
try {
|
try {
|
||||||
const pkg = JSON.parse(
|
const pkg = JSON.parse(
|
||||||
readFileSync(join(sourcePath, "package.json"), "utf-8"),
|
readFileSync(join(sourcePath, "package.json"), "utf-8"),
|
||||||
);
|
) as { main?: string; bin?: string | Record<string, string> };
|
||||||
if (pkg.main) return { command: cmd, args: [pkg.main] };
|
if (pkg.main) return { command: cmd, args: [pkg.main] };
|
||||||
if (pkg.bin) {
|
if (pkg.bin) {
|
||||||
const bin =
|
const bin =
|
||||||
@@ -372,7 +372,7 @@ function spawnService(svc: ManagedService): void {
|
|||||||
const rl = createInterface({ input: child.stdout! });
|
const rl = createInterface({ input: child.stdout! });
|
||||||
rl.on("line", (line) => {
|
rl.on("line", (line) => {
|
||||||
try {
|
try {
|
||||||
const msg = JSON.parse(line);
|
const msg = JSON.parse(line) as { id?: string | number; error?: { message?: string }; result?: unknown };
|
||||||
if (msg.id && svc.pendingCalls.has(String(msg.id))) {
|
if (msg.id && svc.pendingCalls.has(String(msg.id))) {
|
||||||
const pending = svc.pendingCalls.get(String(msg.id))!;
|
const pending = svc.pendingCalls.get(String(msg.id))!;
|
||||||
clearTimeout(pending.timer);
|
clearTimeout(pending.timer);
|
||||||
|
|||||||
@@ -13,6 +13,7 @@ import { Bot, InputFile } from "grammy";
|
|||||||
import WebSocket from "ws";
|
import WebSocket from "ws";
|
||||||
import sodium from "libsodium-wrappers";
|
import sodium from "libsodium-wrappers";
|
||||||
import { validateTelegramConnectToken } from "./telegram-token";
|
import { validateTelegramConnectToken } from "./telegram-token";
|
||||||
|
import { log } from "./logger";
|
||||||
|
|
||||||
// ---------------------------------------------------------------------------
|
// ---------------------------------------------------------------------------
|
||||||
// Types
|
// Types
|
||||||
@@ -22,11 +23,12 @@ export interface BridgeRow {
|
|||||||
chatId: number;
|
chatId: number;
|
||||||
meshId: string;
|
meshId: string;
|
||||||
meshSlug?: string;
|
meshSlug?: string;
|
||||||
memberId: string;
|
/** memberId can be null until the bridge claims a mesh.member row. */
|
||||||
|
memberId: string | null;
|
||||||
pubkey: string;
|
pubkey: string;
|
||||||
secretKey: string;
|
secretKey: string;
|
||||||
displayName: string;
|
displayName: string | null;
|
||||||
chatType: string;
|
chatType: string | null;
|
||||||
chatTitle: string | null;
|
chatTitle: string | null;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -228,7 +230,7 @@ class MeshConnection {
|
|||||||
|
|
||||||
ws.on("message", async (raw) => {
|
ws.on("message", async (raw) => {
|
||||||
try {
|
try {
|
||||||
const msg = JSON.parse(raw.toString());
|
const msg = JSON.parse(raw.toString()) as Record<string, any>;
|
||||||
|
|
||||||
if (msg.type === "hello_ack") {
|
if (msg.type === "hello_ack") {
|
||||||
clearTimeout(helloTimeout);
|
clearTimeout(helloTimeout);
|
||||||
@@ -674,8 +676,8 @@ function createPushHandler(bot: Bot) {
|
|||||||
for (const chatId of chatIds) {
|
for (const chatId of chatIds) {
|
||||||
bot.api
|
bot.api
|
||||||
.sendMessage(chatId, formatted)
|
.sendMessage(chatId, formatted)
|
||||||
.catch((e) => {
|
.catch((e: unknown) => {
|
||||||
console.error(`[tg-bridge] send to chat ${chatId} failed:`, e.message);
|
console.error(`[tg-bridge] send to chat ${chatId} failed:`, e instanceof Error ? e.message : String(e));
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
};
|
};
|
||||||
@@ -1729,11 +1731,12 @@ async function executeAiToolCall(
|
|||||||
for (const meshId of meshIds) {
|
for (const meshId of meshIds) {
|
||||||
const services = await listDbMeshServices(meshId);
|
const services = await listDbMeshServices(meshId);
|
||||||
for (const s of services) {
|
for (const s of services) {
|
||||||
|
const sx = s as Record<string, unknown>;
|
||||||
allServices.push({
|
allServices.push({
|
||||||
name: s.name,
|
name: String(sx.name ?? ""),
|
||||||
type: s.type ?? "mcp",
|
type: String(sx.type ?? "mcp"),
|
||||||
tools: s.tool_count ?? 0,
|
tools: Number(sx.tool_count ?? 0),
|
||||||
status: s.status ?? "running",
|
status: String(sx.status ?? "running"),
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
@@ -1841,6 +1844,9 @@ export async function bootTelegramBridge(
|
|||||||
for (const [meshId, meshRows] of byMesh) {
|
for (const [meshId, meshRows] of byMesh) {
|
||||||
const first = meshRows[0]!;
|
const first = meshRows[0]!;
|
||||||
try {
|
try {
|
||||||
|
// memberId/displayName come back from DB nullable; bridge only
|
||||||
|
// works once both are populated, so skip rows missing either.
|
||||||
|
if (!first.memberId || !first.displayName) continue;
|
||||||
await ensureMeshConnection(
|
await ensureMeshConnection(
|
||||||
{
|
{
|
||||||
meshId,
|
meshId,
|
||||||
|
|||||||
@@ -102,11 +102,11 @@ export function validateTelegramConnectToken(
|
|||||||
if (!timingSafeEqual(a, b)) return null;
|
if (!timingSafeEqual(a, b)) return null;
|
||||||
|
|
||||||
// Verify header algorithm
|
// Verify header algorithm
|
||||||
const header = JSON.parse(base64urlDecode(headerB64));
|
const header = JSON.parse(base64urlDecode(headerB64)) as { alg?: string };
|
||||||
if (header.alg !== "HS256") return null;
|
if (header.alg !== "HS256") return null;
|
||||||
|
|
||||||
// Decode and validate claims
|
// Decode and validate claims
|
||||||
const claims: JwtClaims = JSON.parse(base64urlDecode(payloadB64));
|
const claims = JSON.parse(base64urlDecode(payloadB64)) as JwtClaims;
|
||||||
|
|
||||||
// Check subject
|
// Check subject
|
||||||
if (claims.sub !== "telegram-connect") return null;
|
if (claims.sub !== "telegram-connect") return null;
|
||||||
|
|||||||
@@ -90,6 +90,66 @@ export interface WSHelloMessage {
|
|||||||
signature: string;
|
signature: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Client → broker: per-launch session hello, vouched by the parent member.
|
||||||
|
*
|
||||||
|
* Used by the daemon's per-session WebSocket connections (1.30.0+) so that
|
||||||
|
* each `claudemesh launch`-spawned session has its own long-lived presence
|
||||||
|
* row owned by an ephemeral session keypair. The parent member key vouches
|
||||||
|
* (out-of-band) that the session pubkey is theirs; the session keypair
|
||||||
|
* proves liveness on every connect.
|
||||||
|
*
|
||||||
|
* Two-stage proof:
|
||||||
|
* 1. `parentAttestation.signature` — ed25519 over
|
||||||
|
* `claudemesh-session-attest|<parent_pubkey>|<session_pubkey>|<expires_at_ms>`
|
||||||
|
* signed by the parent member's stable secret key. TTL ≤ 24h.
|
||||||
|
* 2. `signature` — ed25519 over
|
||||||
|
* `claudemesh-session-hello|<mesh_id>|<parent_pubkey>|<session_pubkey>|<timestamp>`
|
||||||
|
* signed by the session secret key (held by the daemon for the
|
||||||
|
* lifetime of the session registration).
|
||||||
|
*
|
||||||
|
* Older brokers don't recognize this message type and reply with
|
||||||
|
* `unknown_message_type`; clients fall back to the legacy `hello` flow.
|
||||||
|
*/
|
||||||
|
export interface WSSessionHelloMessage {
|
||||||
|
type: "session_hello";
|
||||||
|
/** Highest WS protocol version the client understands. */
|
||||||
|
protocolVersion?: number;
|
||||||
|
/** Optional feature strings the client supports. */
|
||||||
|
capabilities?: string[];
|
||||||
|
meshId: string;
|
||||||
|
/** Parent member's id (mesh.member.id) — used for revocation lookup. */
|
||||||
|
parentMemberId: string;
|
||||||
|
/** Parent member's stable ed25519 pubkey (hex), as found in mesh.member. */
|
||||||
|
parentMemberPubkey: string;
|
||||||
|
/** Per-launch ephemeral ed25519 pubkey (hex). Routes presence + DMs. */
|
||||||
|
sessionPubkey: string;
|
||||||
|
/** Pre-signed attestation by the parent member, presented per session. */
|
||||||
|
parentAttestation: {
|
||||||
|
sessionPubkey: string;
|
||||||
|
parentMemberPubkey: string;
|
||||||
|
/** Unix ms; broker rejects past or > now+24h. */
|
||||||
|
expiresAt: number;
|
||||||
|
signature: string;
|
||||||
|
};
|
||||||
|
/** Display name override for this session (optional, falls back to member). */
|
||||||
|
displayName?: string;
|
||||||
|
sessionId: string;
|
||||||
|
pid: number;
|
||||||
|
cwd: string;
|
||||||
|
hostname?: string;
|
||||||
|
peerType?: "ai" | "human" | "connector";
|
||||||
|
channel?: string;
|
||||||
|
model?: string;
|
||||||
|
groups?: Array<{ name: string; role?: string }>;
|
||||||
|
/** Initial role tag for the session. */
|
||||||
|
role?: string;
|
||||||
|
/** ms epoch; broker rejects if outside ±60s of its own clock. */
|
||||||
|
timestamp: number;
|
||||||
|
/** ed25519 signature (hex) by the SESSION secret key over canonical bytes. */
|
||||||
|
signature: string;
|
||||||
|
}
|
||||||
|
|
||||||
/** Client → broker: send an E2E-encrypted envelope to a target. */
|
/** Client → broker: send an E2E-encrypted envelope to a target. */
|
||||||
export interface WSSendMessage {
|
export interface WSSendMessage {
|
||||||
type: "send";
|
type: "send";
|
||||||
@@ -106,6 +166,14 @@ export interface WSSendMessage {
|
|||||||
* the body when this is absent.
|
* the body when this is absent.
|
||||||
*/
|
*/
|
||||||
mentions?: string[];
|
mentions?: string[];
|
||||||
|
/** Optional id of a previous topic message this one replies to.
|
||||||
|
* Server validates same-topic membership; FK is set null if parent
|
||||||
|
* later disappears. Ignored for non-topic targets. */
|
||||||
|
replyToId?: string;
|
||||||
|
/** Optional ciphertext-format version. 1 = v1 plaintext base64;
|
||||||
|
* 2 = v0.3.0 phase 3 per-topic encrypted body. Server passes this
|
||||||
|
* through verbatim into topic_message.body_version. */
|
||||||
|
bodyVersion?: number;
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Broker → client: an envelope addressed to this peer. */
|
/** Broker → client: an envelope addressed to this peer. */
|
||||||
@@ -113,11 +181,34 @@ export interface WSPushMessage {
|
|||||||
type: "push";
|
type: "push";
|
||||||
messageId: string;
|
messageId: string;
|
||||||
meshId: string;
|
meshId: string;
|
||||||
|
/** Sender's *session* pubkey — ephemeral, rotates on session restart.
|
||||||
|
* DMs are sealed against the recipient's session key paired with this.
|
||||||
|
* For replies prefer `senderMemberPubkey` / `senderMemberId`. */
|
||||||
senderPubkey: string;
|
senderPubkey: string;
|
||||||
|
/** Sender's *member* pubkey — stable across reconnects/restarts.
|
||||||
|
* Use this as the canonical reply target. */
|
||||||
|
senderMemberPubkey?: string;
|
||||||
|
/** Stable mesh.member id of the sender — survives display-name changes,
|
||||||
|
* use this as the canonical reply target when set. Optional for
|
||||||
|
* legacy/non-topic broker paths that haven't been wired yet. */
|
||||||
|
senderMemberId?: string;
|
||||||
|
/** Sender's current display name as a convenience for renderers. */
|
||||||
|
senderName?: string;
|
||||||
|
/** Topic name when the push originates from a topic post (vs DM). */
|
||||||
|
topic?: string;
|
||||||
|
/** Server-side message id of the parent message when this push is a
|
||||||
|
* reply, so the recipient can render thread context and re-thread. */
|
||||||
|
replyToId?: string;
|
||||||
priority: Priority;
|
priority: Priority;
|
||||||
nonce: string;
|
nonce: string;
|
||||||
ciphertext: string;
|
ciphertext: string;
|
||||||
createdAt: string;
|
createdAt: string;
|
||||||
|
/** v0.9.0 daemon fields. Echoed when the sender's send envelope
|
||||||
|
* carried them (spec §4.2/§4.4). Receivers use `client_message_id`
|
||||||
|
* for idempotent inbox dedupe and `request_fingerprint` for
|
||||||
|
* defense-in-depth verification. Both null on legacy traffic. */
|
||||||
|
client_message_id?: string | null;
|
||||||
|
request_fingerprint?: string | null;
|
||||||
/** Optional semantic tag — "reminder" when delivered by the scheduler,
|
/** Optional semantic tag — "reminder" when delivered by the scheduler,
|
||||||
* "system" for broker-originated topology events (peer join/leave). */
|
* "system" for broker-originated topology events (peer join/leave). */
|
||||||
subtype?: "reminder" | "system";
|
subtype?: "reminder" | "system";
|
||||||
@@ -133,6 +224,26 @@ export interface WSSetStatusMessage {
|
|||||||
status: PeerStatus;
|
status: PeerStatus;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
 * Client → broker: confirm receipt of a previously pushed envelope so the
 * broker can mark the message_queue row delivered.
 *
 * v2 agentic-comms (M1): pairs with the two-phase claim/lease introduced
 * in `drainForMember`. Without this ack, the lease expires after 30s and
 * the message is re-claimed and re-pushed (at-least-once retry).
 *
 * Either id is accepted; daemons that track inbox dedupe by clientMessageId
 * should send that one. brokerMessageId is the row primary key, useful when
 * the original send didn't carry a client_message_id (legacy traffic).
 */
export interface WSClientAckMessage {
  type: "client_ack";
  /** Original caller-supplied idempotency id from the `send` envelope. */
  clientMessageId?: string;
  /** Broker-side row id (the `messageId` field on the inbound `push`). */
  brokerMessageId?: string;
}
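
Reviewer sketch (not part of the commit): the doc comment above says either id is accepted, with `clientMessageId` preferred when the push carried one. A receiver could pick the id as follows. `PushLike`, `ClientAck`, and `buildClientAck` are hypothetical names that mirror only the fields needed here; how the real daemon builds its ack is not shown in this diff.

```typescript
// Minimal local stand-ins for the fields of WSPushMessage used here.
interface PushLike {
  messageId: string;
  client_message_id?: string | null;
}

// Local stand-in for WSClientAckMessage.
interface ClientAck {
  type: "client_ack";
  clientMessageId?: string;
  brokerMessageId?: string;
}

// Prefer the sender's idempotency id; fall back to the broker row id
// for legacy traffic where client_message_id is null or absent.
function buildClientAck(push: PushLike): ClientAck {
  if (push.client_message_id) {
    return { type: "client_ack", clientMessageId: push.client_message_id };
  }
  return { type: "client_ack", brokerMessageId: push.messageId };
}

const modernAck = buildClientAck({ messageId: "row-1", client_message_id: "cm-42" });
const legacyAck = buildClientAck({ messageId: "row-2", client_message_id: null });
```

Because delivery is at-least-once, the receiver should expect the same push again if its ack is lost, which is why deduping on the client id matters.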
|
||||||
|
|
||||||
/** Client → broker: request list of connected peers in the same mesh. */
|
/** Client → broker: request list of connected peers in the same mesh. */
|
||||||
export interface WSListPeersMessage {
|
export interface WSListPeersMessage {
|
||||||
type: "list_peers";
|
type: "list_peers";
|
||||||
@@ -345,8 +456,12 @@ export interface WSTopicHistoryResponseMessage {
|
|||||||
messages: Array<{
|
messages: Array<{
|
||||||
id: string;
|
id: string;
|
||||||
senderPubkey: string;
|
senderPubkey: string;
|
||||||
|
senderMemberId?: string;
|
||||||
|
senderName?: string;
|
||||||
nonce: string;
|
nonce: string;
|
||||||
ciphertext: string;
|
ciphertext: string;
|
||||||
|
bodyVersion?: number;
|
||||||
|
replyToId?: string | null;
|
||||||
createdAt: string;
|
createdAt: string;
|
||||||
}>;
|
}>;
|
||||||
_reqId?: string;
|
_reqId?: string;
|
||||||
@@ -423,6 +538,8 @@ export interface WSPeersListMessage {
|
|||||||
type: "peers_list";
|
type: "peers_list";
|
||||||
peers: Array<{
|
peers: Array<{
|
||||||
pubkey: string;
|
pubkey: string;
|
||||||
|
/** Stable member pubkey — present on M1+ broker responses. */
|
||||||
|
memberPubkey?: string;
|
||||||
displayName: string;
|
displayName: string;
|
||||||
status: PeerStatus;
|
status: PeerStatus;
|
||||||
summary: string | null;
|
summary: string | null;
|
||||||
@@ -430,6 +547,13 @@ export interface WSPeersListMessage {
|
|||||||
sessionId: string;
|
sessionId: string;
|
||||||
connectedAt: string;
|
connectedAt: string;
|
||||||
cwd?: string;
|
cwd?: string;
|
||||||
|
/** v2 agentic-comms (M1): typed connection role. CLI uses this to
|
||||||
|
* filter control-plane daemons out of user-facing peer lists.
|
||||||
|
* Optional for clients talking to a pre-M1 broker. Wire field is
|
||||||
|
* `peerRole` to avoid collision with 1.31.5's top-level `role`
|
||||||
|
* (which is a lift of `profile.role`, the user-supplied string
|
||||||
|
* like "lead" / "reviewer" / "human"). */
|
||||||
|
peerRole?: "control-plane" | "session" | "service";
|
||||||
hostname?: string;
|
hostname?: string;
|
||||||
peerType?: "ai" | "human" | "connector";
|
peerType?: "ai" | "human" | "connector";
|
||||||
channel?: string;
|
channel?: string;
|
||||||
@@ -1299,6 +1423,16 @@ export interface WSVaultGetMessage { type: "vault_get"; keys: string[]; _reqId?:
|
|||||||
export interface WSWatchMessage { type: "watch"; url: string; mode?: "hash" | "json" | "status"; extract?: string; interval?: number; notify_on?: string; headers?: Record<string, string>; label?: string; _reqId?: string; }
|
export interface WSWatchMessage { type: "watch"; url: string; mode?: "hash" | "json" | "status"; extract?: string; interval?: number; notify_on?: string; headers?: Record<string, string>; label?: string; _reqId?: string; }
|
||||||
/** Client → broker: stop watching. */
|
/** Client → broker: stop watching. */
|
||||||
export interface WSUnwatchMessage { type: "unwatch"; watchId: string; _reqId?: string; }
|
export interface WSUnwatchMessage { type: "unwatch"; watchId: string; _reqId?: string; }
|
||||||
|
/** Client → broker: soft-disconnect a peer (1000; CLI auto-reconnects). */
export interface WSDisconnectMessage { type: "disconnect"; target?: string; stale?: number; all?: boolean; _reqId?: string; }
/** Client → broker: hard-kick a peer (4001; CLI exits). */
export interface WSKickMessage { type: "kick"; target?: string; stale?: number; all?: boolean; _reqId?: string; }
/** Client → broker: ban a member by pubkey or display name. */
export interface WSBanMessage { type: "ban"; target: string; reason?: string; _reqId?: string; }
/** Client → broker: lift a ban. */
export interface WSUnbanMessage { type: "unban"; target: string; _reqId?: string; }
/** Client → broker: list active bans on the caller's mesh. */
export interface WSListBansMessage { type: "list_bans"; _reqId?: string; }
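
Reviewer sketch (not part of the commit): the comments above pair each verb with a close code, 1000 for `disconnect` (CLI auto-reconnects) and 4001 for `kick` (CLI exits). A client-side dispatch on the code might look like the following. `reactionToClose` and the constant names are hypothetical, and behavior for any other code is deliberately left unspecified because the diff does not state it.

```typescript
type CloseReaction = "reconnect" | "exit" | "unspecified";

const CLOSE_SOFT_DISCONNECT = 1000; // per the WSDisconnectMessage comment
const CLOSE_HARD_KICK = 4001; // per the WSKickMessage comment

// Map a WebSocket close code to the CLI reaction described in the comments.
function reactionToClose(code: number): CloseReaction {
  switch (code) {
    case CLOSE_SOFT_DISCONNECT:
      return "reconnect";
    case CLOSE_HARD_KICK:
      return "exit";
    default:
      return "unspecified"; // not covered by the diff
  }
}

const onDisconnect = reactionToClose(1000);
const onKick = reactionToClose(4001);
const onOther = reactionToClose(1006);
```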
|
||||||
/** Client → broker: list active watches. */
|
/** Client → broker: list active watches. */
|
||||||
export interface WSWatchListMessage { type: "watch_list"; _reqId?: string; }
|
export interface WSWatchListMessage { type: "watch_list"; _reqId?: string; }
|
||||||
/** Broker → client: watch created acknowledgement. */
|
/** Broker → client: watch created acknowledgement. */
|
||||||
@@ -1310,7 +1444,9 @@ export interface WSWatchTriggeredMessage { type: "watch_triggered"; watchId: str
|
|||||||
|
|
||||||
export type WSClientMessage =
|
export type WSClientMessage =
|
||||||
| WSHelloMessage
|
| WSHelloMessage
|
||||||
|
| WSSessionHelloMessage
|
||||||
| WSSendMessage
|
| WSSendMessage
|
||||||
|
| WSClientAckMessage
|
||||||
| WSSetStatusMessage
|
| WSSetStatusMessage
|
||||||
| WSListPeersMessage
|
| WSListPeersMessage
|
||||||
| WSSetSummaryMessage
|
| WSSetSummaryMessage
|
||||||
@@ -1402,7 +1538,12 @@ export type WSClientMessage =
|
|||||||
| WSVaultGetMessage
|
| WSVaultGetMessage
|
||||||
| WSWatchMessage
|
| WSWatchMessage
|
||||||
| WSUnwatchMessage
|
| WSUnwatchMessage
|
||||||
| WSWatchListMessage;
|
| WSWatchListMessage
|
||||||
|
| WSDisconnectMessage
|
||||||
|
| WSKickMessage
|
||||||
|
| WSBanMessage
|
||||||
|
| WSUnbanMessage
|
||||||
|
| WSListBansMessage;
|
||||||
|
|
||||||
// --- Skill messages ---
|
// --- Skill messages ---
|
||||||
|
|
||||||
@@ -1454,6 +1595,8 @@ export interface WSSkillDataMessage {
|
|||||||
instructions: string;
|
instructions: string;
|
||||||
tags: string[];
|
tags: string[];
|
||||||
author: string;
|
author: string;
|
||||||
|
/** Optional opaque metadata stored alongside the skill body. */
|
||||||
|
manifest?: unknown;
|
||||||
createdAt: string;
|
createdAt: string;
|
||||||
} | null;
|
} | null;
|
||||||
_reqId?: string;
|
_reqId?: string;
|
||||||
|
|||||||
47
apps/broker/tests/kick-control-plane-skip.test.ts
Normal file
@@ -0,0 +1,47 @@
|
|||||||
|
/**
|
||||||
|
* Kick control-plane skip: 1.34.15 (gap #3a) refuses to close
|
||||||
|
* long-lived control-plane connections (claudemesh daemon, dashboard)
|
||||||
|
* via `kick`, because they auto-reconnect within seconds and the verb
|
||||||
|
* was effectively a no-op. The soft `disconnect` verb keeps the old
|
||||||
|
* behavior so users can still nudge a control-plane peer to
|
||||||
|
* re-authenticate.
|
||||||
|
*
|
||||||
|
* Pure-logic test — mirrors the branch inside handleSend's kick case
|
||||||
|
* without spinning up a broker. Same pattern as
|
||||||
|
* grants-enforcement.test.ts.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { describe, expect, test } from "vitest";
|
||||||
|
|
||||||
|
type PeerRole = "control-plane" | "session" | "service";
|
||||||
|
|
||||||
|
/** Mirrors the predicate inserted into the kick handler. */
|
||||||
|
function shouldSkipKick(args: {
|
||||||
|
verb: "kick" | "disconnect";
|
||||||
|
peerRole: PeerRole;
|
||||||
|
}): boolean {
|
||||||
|
const skipControlPlane = args.verb === "kick";
|
||||||
|
return skipControlPlane && args.peerRole === "control-plane";
|
||||||
|
}
|
||||||
|
|
||||||
|
describe("kick control-plane skip (gap #3a)", () => {
|
||||||
|
test("kick on control-plane → skipped (would auto-reconnect)", () => {
|
||||||
|
expect(shouldSkipKick({ verb: "kick", peerRole: "control-plane" })).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
test("kick on session → not skipped (closes user session)", () => {
|
||||||
|
expect(shouldSkipKick({ verb: "kick", peerRole: "session" })).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
test("kick on service → not skipped", () => {
|
||||||
|
expect(shouldSkipKick({ verb: "kick", peerRole: "service" })).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
test("disconnect on control-plane → not skipped (intentional nudge)", () => {
|
||||||
|
expect(shouldSkipKick({ verb: "disconnect", peerRole: "control-plane" })).toBe(false);
|
||||||
|
});
|
||||||
|
|
||||||
|
test("disconnect on session → not skipped", () => {
|
||||||
|
expect(shouldSkipKick({ verb: "disconnect", peerRole: "session" })).toBe(false);
|
||||||
|
});
|
||||||
|
});
|
||||||
218
apps/broker/tests/session-hello-signature.test.ts
Normal file
@@ -0,0 +1,218 @@
|
|||||||
|
/**
|
||||||
|
* Session-hello signature + parent-attestation verification.
|
||||||
|
*
|
||||||
|
* Two-stage proof:
|
||||||
|
* 1. Parent member signs `canonicalSessionAttestation` (long-lived, ≤24h
|
||||||
|
* TTL) — vouches that the session pubkey belongs to them.
|
||||||
|
* 2. Session keypair signs `canonicalSessionHello` per WS-connect — proves
|
||||||
|
* liveness + possession.
|
||||||
|
*
|
||||||
|
* The broker rejects on any: expired/over-TTL attestation, bad signature,
|
||||||
|
* timestamp skew, malformed hex, or a session signature made with the
|
||||||
|
* wrong key (covers the "attestation leaked, attacker tries to use it
|
||||||
|
* without the session secret key" case).
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { beforeAll, describe, expect, test } from "vitest";
|
||||||
|
import sodium from "libsodium-wrappers";
|
||||||
|
import {
|
||||||
|
canonicalSessionAttestation,
|
||||||
|
canonicalSessionHello,
|
||||||
|
verifySessionAttestation,
|
||||||
|
verifySessionHelloSignature,
|
||||||
|
SESSION_ATTESTATION_MAX_TTL_MS,
|
||||||
|
HELLO_SKEW_MS,
|
||||||
|
} from "../src/crypto";
|
||||||
|
|
||||||
|
interface Keypair {
|
||||||
|
publicKey: string;
|
||||||
|
secretKey: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function makeKeypair(): Promise<Keypair> {
|
||||||
|
await sodium.ready;
|
||||||
|
const kp = sodium.crypto_sign_keypair();
|
||||||
|
return {
|
||||||
|
publicKey: sodium.to_hex(kp.publicKey),
|
||||||
|
secretKey: sodium.to_hex(kp.privateKey),
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
function sign(canonical: string, secretKeyHex: string): string {
|
||||||
|
return sodium.to_hex(
|
||||||
|
sodium.crypto_sign_detached(
|
||||||
|
sodium.from_string(canonical),
|
||||||
|
sodium.from_hex(secretKeyHex),
|
||||||
|
),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
describe("verifySessionAttestation", () => {
|
||||||
|
let parent: Keypair;
|
||||||
|
let session: Keypair;
|
||||||
|
|
||||||
|
beforeAll(async () => {
|
||||||
|
parent = await makeKeypair();
|
||||||
|
session = await makeKeypair();
|
||||||
|
});
|
||||||
|
|
||||||
|
test("valid attestation accepted", async () => {
|
||||||
|
const expiresAt = Date.now() + 60 * 60 * 1000;
|
||||||
|
const canonical = canonicalSessionAttestation(parent.publicKey, session.publicKey, expiresAt);
|
||||||
|
const signature = sign(canonical, parent.secretKey);
|
||||||
|
const result = await verifySessionAttestation({
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
expiresAt,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
test("expired attestation rejected", async () => {
|
||||||
|
const expiresAt = Date.now() - 1_000;
|
||||||
|
const canonical = canonicalSessionAttestation(parent.publicKey, session.publicKey, expiresAt);
|
||||||
|
const signature = sign(canonical, parent.secretKey);
|
||||||
|
const result = await verifySessionAttestation({
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
expiresAt,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(false);
|
||||||
|
if (!result.ok) expect(result.reason).toBe("expired");
|
||||||
|
});
|
||||||
|
|
||||||
|
test("over-24h TTL rejected", async () => {
|
||||||
|
const expiresAt = Date.now() + SESSION_ATTESTATION_MAX_TTL_MS + 60_000;
|
||||||
|
const canonical = canonicalSessionAttestation(parent.publicKey, session.publicKey, expiresAt);
|
||||||
|
const signature = sign(canonical, parent.secretKey);
|
||||||
|
const result = await verifySessionAttestation({
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
expiresAt,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(false);
|
||||||
|
if (!result.ok) expect(result.reason).toBe("ttl_too_long");
|
||||||
|
});
|
||||||
|
|
||||||
|
test("attestation signed by wrong key rejected", async () => {
|
||||||
|
const other = await makeKeypair();
|
||||||
|
const expiresAt = Date.now() + 60 * 60 * 1000;
|
||||||
|
const canonical = canonicalSessionAttestation(parent.publicKey, session.publicKey, expiresAt);
|
||||||
|
// Sign with a different parent — verifier still checks against
|
||||||
|
// claimed parentMemberPubkey, so it should fail.
|
||||||
|
const signature = sign(canonical, other.secretKey);
|
||||||
|
const result = await verifySessionAttestation({
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
expiresAt,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(false);
|
||||||
|
if (!result.ok) expect(result.reason).toBe("bad_signature");
|
||||||
|
});
|
||||||
|
|
||||||
|
test("tampered session_pubkey fails (canonical mismatch)", async () => {
|
||||||
|
const expiresAt = Date.now() + 60 * 60 * 1000;
|
||||||
|
const canonical = canonicalSessionAttestation(parent.publicKey, session.publicKey, expiresAt);
|
||||||
|
const signature = sign(canonical, parent.secretKey);
|
||||||
|
const evil = await makeKeypair();
|
||||||
|
const result = await verifySessionAttestation({
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: evil.publicKey, // claim a different session pubkey
|
||||||
|
expiresAt,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(false);
|
||||||
|
if (!result.ok) expect(result.reason).toBe("bad_signature");
|
||||||
|
});
|
||||||
|
|
||||||
|
test("malformed hex rejected", async () => {
|
||||||
|
const expiresAt = Date.now() + 60 * 60 * 1000;
|
||||||
|
const result = await verifySessionAttestation({
|
||||||
|
parentMemberPubkey: "not-hex",
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
expiresAt,
|
||||||
|
signature: "a".repeat(128),
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(false);
|
||||||
|
if (!result.ok) expect(result.reason).toBe("malformed");
|
||||||
|
});
|
||||||
|
});
|
||||||
|
|
||||||
|
describe("verifySessionHelloSignature", () => {
|
||||||
|
let parent: Keypair;
|
||||||
|
let session: Keypair;
|
||||||
|
|
||||||
|
beforeAll(async () => {
|
||||||
|
parent = await makeKeypair();
|
||||||
|
session = await makeKeypair();
|
||||||
|
});
|
||||||
|
|
||||||
|
test("valid session-hello signature accepted", async () => {
|
||||||
|
const meshId = "mesh-x";
|
||||||
|
const timestamp = Date.now();
|
||||||
|
const canonical = canonicalSessionHello(meshId, parent.publicKey, session.publicKey, timestamp);
|
||||||
|
const signature = sign(canonical, session.secretKey);
|
||||||
|
const result = await verifySessionHelloSignature({
|
||||||
|
meshId,
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
timestamp,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(true);
|
||||||
|
});
|
||||||
|
|
||||||
|
test("attacker without session secret key cannot forge session-hello", async () => {
|
||||||
|
// The hostile case: attacker captured a valid attestation but doesn't
|
||||||
|
// hold the session secret key. They try to sign session_hello with the
|
||||||
|
// parent's key — broker checks the signature against sessionPubkey,
|
||||||
|
// which fails because the parent didn't sign with the session key.
|
||||||
|
const meshId = "mesh-x";
|
||||||
|
const timestamp = Date.now();
|
||||||
|
const canonical = canonicalSessionHello(meshId, parent.publicKey, session.publicKey, timestamp);
|
||||||
|
const signature = sign(canonical, parent.secretKey); // wrong secret key
|
||||||
|
const result = await verifySessionHelloSignature({
|
||||||
|
meshId,
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
timestamp,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(false);
|
||||||
|
if (!result.ok) expect(result.reason).toBe("bad_signature");
|
||||||
|
});
|
||||||
|
|
||||||
|
test("timestamp skew rejected", async () => {
|
||||||
|
const timestamp = Date.now() - HELLO_SKEW_MS - 1_000;
|
||||||
|
const canonical = canonicalSessionHello("mesh-x", parent.publicKey, session.publicKey, timestamp);
|
||||||
|
const signature = sign(canonical, session.secretKey);
|
||||||
|
const result = await verifySessionHelloSignature({
|
||||||
|
meshId: "mesh-x",
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
timestamp,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(false);
|
||||||
|
if (!result.ok) expect(result.reason).toBe("timestamp_skew");
|
||||||
|
});
|
||||||
|
|
||||||
|
test("tampered meshId fails verification", async () => {
|
||||||
|
const timestamp = Date.now();
|
||||||
|
const canonical = canonicalSessionHello("mesh-A", parent.publicKey, session.publicKey, timestamp);
|
||||||
|
const signature = sign(canonical, session.secretKey);
|
||||||
|
const result = await verifySessionHelloSignature({
|
||||||
|
meshId: "mesh-B", // claim a different mesh
|
||||||
|
parentMemberPubkey: parent.publicKey,
|
||||||
|
sessionPubkey: session.publicKey,
|
||||||
|
timestamp,
|
||||||
|
signature,
|
||||||
|
});
|
||||||
|
expect(result.ok).toBe(false);
|
||||||
|
if (!result.ok) expect(result.reason).toBe("bad_signature");
|
||||||
|
});
|
||||||
|
});
|
||||||
File diff suppressed because it is too large
@@ -2,7 +2,11 @@
|
|||||||
|
|
||||||
Peer mesh for Claude Code sessions. Connect multiple Claude Code instances into a shared mesh with real-time messaging, shared state, memory, file sharing, vector store, scheduled jobs, and more — all driven from the `claudemesh` CLI. The MCP server is a tool-less push-pipe that delivers inbound peer messages to Claude as `<channel>` interrupts; everything else lives behind CLI verbs that Claude learns from the auto-installed `claudemesh` skill.
|
Peer mesh for Claude Code sessions. Connect multiple Claude Code instances into a shared mesh with real-time messaging, shared state, memory, file sharing, vector store, scheduled jobs, and more — all driven from the `claudemesh` CLI. The MCP server is a tool-less push-pipe that delivers inbound peer messages to Claude as `<channel>` interrupts; everything else lives behind CLI verbs that Claude learns from the auto-installed `claudemesh` skill.
|
||||||
|
|
||||||
> **What's new in 1.7.0:** terminal parity for the v1.6.x server features. New verbs: `claudemesh topic tail` (live SSE message stream — Ctrl-C to exit), `claudemesh notification list` (recent `@you` mentions across topics), `claudemesh member list` (mesh roster with online dots, distinct from `peer list`'s live-session view). Each command auto-mints a 5-minute read-only apikey via the WebSocket and revokes it on exit, so no token plumbing is needed.
|
> **What's new in 1.9.x:** topic threading + multi-session reliability fixes. `claudemesh topic post <topic> <msg> --reply-to <id>` threads a reply onto a previous topic message (full id or 8+ char prefix); `topic tail` renders `↳ in reply to <name>: "<snippet>"` above replies and shows a copyable `#xxxxxxxx` short id on every row. `<channel>` MCP attrs now carry `from_member_id`, `from_pubkey` (stable), `from_session_pubkey` (ephemeral), `message_id`, `topic`, `reply_to_id`: everything the recipient needs to reply directly. Broker fixes (v0.3.2): replies to a stale session pubkey now resolve to the owning member's live session instead of bouncing with "not online", and broadcast `*` no longer loops decrypt-fail warnings back to the sender's sibling sessions.
|
||||||
|
>
|
||||||
|
> **What was new in 1.8.0:** per-topic end-to-end encryption (v0.3.0 phase 3, CLI side). `claudemesh topic post <topic> <msg>` encrypts the body with `crypto_secretbox` under the topic's symmetric key; the broker stores ciphertext only. `claudemesh topic tail` now decrypts v2 messages on render and runs a background re-seal loop every 30s, so new topic joiners get their sealed keys without manual action. The `topic-key` cache is process-only: kill the CLI and the key is gone. Web dashboard reads v1 plaintext for now (phase 3.5 brings browser-side identity).
|
||||||
|
>
|
||||||
|
> **What was new in 1.7.0:** terminal parity for the v1.6.x server features. New verbs: `claudemesh topic tail` (live SSE message stream — Ctrl-C to exit), `claudemesh notification list` (recent `@you` mentions across topics), `claudemesh member list` (mesh roster with online dots, distinct from `peer list`'s live-session view). Each command auto-mints a 5-minute read-only apikey via the WebSocket and revokes it on exit, so no token plumbing is needed.
|
||||||
>
|
>
|
||||||
> **What was new in 1.6.0:** topics (channel pub/sub), API keys for human/REST clients, and bridge peers that forward a topic between two meshes. New verbs: `claudemesh topic`, `claudemesh apikey`, `claudemesh bridge`. A REST surface at `https://claudemesh.com/api/v1/*` (messages, topics, peers, history) accepts `Authorization: Bearer cm_...` keys, so any HTTPS client can participate without WebSocket + ed25519 plumbing. **Note**: REST lives on the web host (`claudemesh.com`), not the broker host (`ic.claudemesh.com`) — the broker only speaks WebSocket.
|
> **What was new in 1.6.0:** topics (channel pub/sub), API keys for human/REST clients, and bridge peers that forward a topic between two meshes. New verbs: `claudemesh topic`, `claudemesh apikey`, `claudemesh bridge`. A REST surface at `https://claudemesh.com/api/v1/*` (messages, topics, peers, history) accepts `Authorization: Bearer cm_...` keys, so any HTTPS client can participate without WebSocket + ed25519 plumbing. **Note**: REST lives on the web host (`claudemesh.com`), not the broker host (`ic.claudemesh.com`) — the broker only speaks WebSocket.
|
||||||
>
|
>
|
||||||
@@ -45,7 +49,8 @@ USAGE
|
|||||||
claudemesh profile view or edit your profile
|
claudemesh profile view or edit your profile
|
||||||
|
|
||||||
claudemesh topic ... create, list, join, send to topics
|
claudemesh topic ... create, list, join, send to topics
|
||||||
claudemesh topic tail <t> live SSE tail of a topic
|
claudemesh topic tail <t> live SSE tail of a topic (decrypts v2)
|
||||||
|
claudemesh topic post <t> encrypted REST post (v2 ciphertext)
|
||||||
claudemesh member list mesh roster with online state
|
claudemesh member list mesh roster with online state
|
||||||
claudemesh notification list recent @-mentions of you
|
claudemesh notification list recent @-mentions of you
|
||||||
claudemesh apikey ... issue, list, revoke API keys (REST clients)
|
claudemesh apikey ... issue, list, revoke API keys (REST clients)
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
{
|
{
|
||||||
"name": "claudemesh-cli",
|
"name": "claudemesh-cli",
|
||||||
"version": "1.7.0",
|
"version": "1.34.16",
|
||||||
"description": "Peer mesh for Claude Code sessions — CLI + MCP server.",
|
"description": "Peer mesh for Claude Code sessions — CLI + MCP server.",
|
||||||
"keywords": [
|
"keywords": [
|
||||||
"claude-code",
|
"claude-code",
|
||||||
|
|||||||
@@ -9,14 +9,230 @@ description: Use when the user asks to send a message to a peer Claude session,
|
|||||||
|
|
||||||
**You invoke claudemesh exclusively through the `claudemesh` CLI via Bash.** There are no MCP tools to call — `tools/list` returns empty for the claudemesh MCP server. The MCP server exists only to deliver inbound peer messages as `<channel source="claudemesh">` interrupts mid-turn. Everything else is CLI.
|
**You invoke claudemesh exclusively through the `claudemesh` CLI via Bash.** There are no MCP tools to call — `tools/list` returns empty for the claudemesh MCP server. The MCP server exists only to deliver inbound peer messages as `<channel source="claudemesh">` interrupts mid-turn. Everything else is CLI.
|
||||||
|
|
||||||
|
## Launch welcome (`kind: "welcome"`) — 1.34.2+
|
||||||
|
|
||||||
|
5 seconds after Claude Code attaches to claudemesh via `claudemesh launch`, the MCP server emits ONE `<channel source="claudemesh">` push with `meta.kind: "welcome"`. It carries identity (`self_display_name`, `self_session_pubkey`, `self_role`), the active `mesh_slug`, live `peer_count` + `peer_names`, recent `unread_count` + `latest_message_ids`, and a CLI hint line. Treat it as the "mesh is connected" handshake — read it once, internalize identity + peers + inbox state, and use it to decide whether to act on unread items right away. Do NOT reply to a welcome push the way you reply to a DM; it has no sender.
|
||||||
|
|
||||||
## When you receive a `<channel source="claudemesh">` message
|
## When you receive a `<channel source="claudemesh">` message
|
||||||
|
|
||||||
Respond IMMEDIATELY. Pause your current task, reply via `claudemesh send`, then resume. Read `from_name`, `mesh_slug`, and `priority` from the channel attributes. Reply by setting `<to>` to the sender's `from_name`. Do not ignore low-priority messages — acknowledge them briefly even if you defer action. If the channel meta contains `subtype: reminder`, this is a scheduled reminder you set yourself — act on it.
|
Respond IMMEDIATELY (unless `meta.kind` is `"welcome"` or `"system"` — those are informational, no reply needed). Pause your current task, reply via `claudemesh send`, then resume. Read `from_name`, `mesh_slug`, and `priority` from the channel attributes. Reply by setting `<to>` to the sender's `from_name`. Do not ignore low-priority messages — acknowledge them briefly even if you defer action. If the channel meta contains `subtype: reminder`, this is a scheduled reminder you set yourself — act on it.
|
||||||
|
|
||||||
|
### Channel attributes (everything you need to reply is in the push)
|
||||||
|
|
||||||
|
The `<channel>` interrupt carries these attributes — no lookup needed:
|
||||||
|
|
||||||
|
| Attribute | What it is |
|
||||||
|
|---|---|
|
||||||
|
| `from_name` | Sender's display name. **Use as `to` in your reply** for DMs. Empty/absent on `kind: "welcome"` and `kind: "system"`. |
|
||||||
|
| `from_pubkey` | Sender's **session pubkey** (hex, ephemeral per-launch). Since 1.34.0 this is the session pubkey of the launched session that originated the send, NOT the daemon's stable member pubkey — sibling sessions of the same human are correctly disambiguated. |
|
||||||
|
| `from_session_pubkey` | Same as `from_pubkey` for session-originated DMs. Kept as a separate key so the model never confuses session vs member identity when a control-plane source is involved. |
|
||||||
|
| `from_member_id` / `from_member_pubkey` | Sender's stable mesh.member id / pubkey. Survives display-name and session rotation. Use to recognize "the same human across multiple Claude Code windows". |
|
||||||
|
| `mesh_slug` | Mesh the message arrived on. Pass via `--mesh <slug>` if the parent isn't on the same mesh. |
|
||||||
|
| `priority` | `now` / `next` / `low`. |
|
||||||
|
| `message_id` | Server-side id of THIS message. **Pass to `--reply-to <id>` to thread your reply** in topic posts. |
|
||||||
|
| `client_message_id` | Sender-stable idempotency id (UUID). Survives broker restarts; safe to log. |
|
||||||
|
| `topic` | Set when the source is a topic post. Reply via `topic post <topic> --reply-to <message_id>`. |
|
||||||
|
| `reply_to_id` | Set when the message itself is a reply to a previous one — render thread context. |
|
||||||
|
| `kind` (welcome/system meta only) | `"welcome"` for the launch handshake, `"system"` for peer_join/peer_leave/etc. — neither needs a reply. |
|
||||||
|
|
||||||
|
**Reply patterns:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# DM → use from_name as the target
|
||||||
|
claudemesh send "<from_name>" "ack — looking now"
|
||||||
|
|
||||||
|
# Topic reply → thread it onto the message you got
|
||||||
|
claudemesh topic post "<topic>" "yep, looks good" --reply-to <message_id>
|
||||||
|
|
||||||
|
# When the sender is on a different mesh you've joined
|
||||||
|
claudemesh send "<from_name>" "..." --mesh "<mesh_slug>"
|
||||||
|
```
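The reply patterns above can be summarized as a small decision function. This is a minimal sketch, not the CLI's own code: the `ChannelAttrs` type, `buildReplyArgs`, and the `currentMesh` parameter are hypothetical names introduced for illustration, and only the attributes and verbs documented above are used.

```typescript
// Attribute names mirror the channel-attribute table; the type and function
// names are hypothetical and not part of the claudemesh CLI.
interface ChannelAttrs {
  kind?: string; // "welcome" or "system" on informational pushes
  from_name?: string;
  mesh_slug?: string;
  topic?: string;
  message_id?: string;
}

// Returns the argv to pass to `claudemesh`, or null when no reply is expected.
function buildReplyArgs(
  attrs: ChannelAttrs,
  body: string,
  currentMesh?: string,
): string[] | null {
  // Welcome and system pushes are informational and have no sender.
  if (attrs.kind === "welcome" || attrs.kind === "system") return null;

  // Topic posts are answered in the topic, threaded when an id is available.
  if (attrs.topic) {
    const args = ["topic", "post", attrs.topic, body];
    if (attrs.message_id) args.push("--reply-to", attrs.message_id);
    return args;
  }

  // A DM needs a sender name to address the reply to.
  if (!attrs.from_name) return null;

  const args = ["send", attrs.from_name, body];
  // Add --mesh only when the message arrived on a different mesh.
  if (attrs.mesh_slug && currentMesh && attrs.mesh_slug !== currentMesh) {
    args.push("--mesh", attrs.mesh_slug);
  }
  return args;
}
```

Returning an argument array rather than a command string avoids shell-quoting problems when `from_name` contains spaces (for example `Lug Nut`).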
|
||||||
|
|
||||||
## Performance model (warm vs cold path)
|
## Performance model (warm vs cold path)
|
||||||
|
|
||||||
If the parent Claude session was launched via `claudemesh launch`, an MCP push-pipe is running and holds the per-mesh WS connection. CLI invocations dial `~/.claudemesh/sockets/<mesh-slug>.sock` and reuse that warm connection (~200ms total round-trip including Node.js startup). If no push-pipe is running (cron, scripts, hooks fired outside a session), the CLI opens its own WS, which takes ~500-700ms cold. **You don't manage this** — every verb auto-detects and falls through.
|
If the parent Claude session was launched via `claudemesh launch`, an MCP push-pipe is running and holds the per-mesh WS connection. CLI invocations dial `~/.claudemesh/sockets/<mesh-slug>.sock` and reuse that warm connection (~200ms total round-trip including Node.js startup). If no push-pipe is running (cron, scripts, hooks fired outside a session), the CLI opens its own WS, which takes ~500-700ms cold. **You don't manage this** — every verb auto-detects and falls through.
|
||||||
|
|
||||||
|
### Daemon path (v1.24.0+, REQUIRED for in-Claude-Code use)
|
||||||
|
|
||||||
|
`claudemesh daemon up [--mesh <slug>]` starts a persistent per-user runtime that holds the broker WS, a durable SQLite outbox/inbox, and listens on `~/.claudemesh/daemon/daemon.sock` (UDS) plus an optional loopback TCP. When the daemon socket is present, every verb routes through it first (~1ms IPC) before falling back to bridge / cold paths. The send envelope carries a caller-stable `client_message_id`, so a `claudemesh send` that started before a daemon crash survives the restart via the on-disk outbox.
|
||||||
|
|
||||||
|
Lifecycle:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
claudemesh daemon up --mesh <slug> # foreground
|
||||||
|
claudemesh daemon install-service --mesh <slug> # macOS launchd / Linux systemd-user
|
||||||
|
claudemesh daemon status [--json] # health + pid
|
||||||
|
claudemesh daemon outbox list [--failed|--pending|...] # local queue inspection
|
||||||
|
claudemesh daemon outbox requeue <id> # re-enqueue an aborted/dead row
|
||||||
|
claudemesh daemon down # SIGTERM + wait
|
||||||
|
```
|
||||||
|
|
||||||
|
As of 1.24.0 `claudemesh install` registers the MCP entry **and** installs/starts the daemon service for the user's primary mesh. The MCP shim hard-requires the daemon to be running — it bails at boot with actionable instructions if the socket isn't present. There is no fallback. CLI verbs (`send`, `peer list`, `inbox`, `skill list/get`, etc.) keep working without a daemon via bridge or cold paths, but for any in-Claude-Code use the daemon must be up.
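The routing order described in this section (daemon socket first, then the per-mesh bridge socket, then a cold WebSocket) can be sketched as follows. This is an illustration under assumptions, not the CLI's actual detection logic: `pickTransport`, the `Transport` type, and the injected `exists` predicate are hypothetical, and a real implementation would presumably also fall through when a socket file exists but nothing is listening on it.

```typescript
// Hypothetical sketch of the daemon -> bridge -> cold fallthrough.
// Socket paths are the ones named in the text; everything else is assumed.
type Transport = "daemon" | "bridge" | "cold";

function pickTransport(
  home: string,
  meshSlug: string,
  exists: (path: string) => boolean,
): Transport {
  const daemonSock = `${home}/.claudemesh/daemon/daemon.sock`;
  const bridgeSock = `${home}/.claudemesh/sockets/${meshSlug}.sock`;

  if (exists(daemonSock)) return "daemon"; // persistent per-user runtime
  if (exists(bridgeSock)) return "bridge"; // warm push-pipe held by a launched session
  return "cold"; // the CLI opens its own WebSocket
}
```

In Node.js, `exists` could be `existsSync` from `node:fs` and `home` could come from `homedir()` in `node:os`; injecting both keeps the function easy to test without touching the file system.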
|
||||||
|
|
||||||
|
### Ambient mode (1.25.0+)
|
||||||
|
|
||||||
|
Once `claudemesh install` has run (registers MCP entry + starts daemon service), **raw `claude` Just Works** for the daemon's attached mesh. No `claudemesh launch` ceremony, no manual flags, no per-session keypair. Channel push, slash commands, and resources all flow through the daemon-backed MCP shim. Use `claudemesh launch` only when you need to override defaults (different mesh, custom display name, system-prompt injection, headless modes).
|
||||||
|
|
||||||
|
## Spawning new sessions (no wizard)
|
||||||
|
|
||||||
|
`claudemesh launch` remains useful for non-default cases: explicit mesh selection, fresh display name, headless `--quiet` runs, system-prompt injection, multi-mesh users with one daemon attached to mesh A who want to spawn into mesh B. For the common case (single joined mesh, daemon installed), prefer raw `claude`. Pass every required flag up front so no interactive prompt fires — that's what makes the verb scriptable from tmux send-keys, AppleScript/iTerm spawn helpers, hooks, cron, and the `claudemesh launch` you call from inside another session.
|
||||||
|
|
||||||
|
### Full flag surface
|
||||||
|
|
||||||
|
| Flag | What it skips | Notes |
|
||||||
|
|---|---|---|
|
||||||
|
| `--name <display-name>` | the "What's your name?" prompt | required when spawning unattended; persists as the session's display name and `from_name` in inbound channels |
|
||||||
|
| `--mesh <slug>` | the multi-mesh picker | required when the user has joined >1 mesh; otherwise the single mesh is auto-selected |
|
||||||
|
| `--join <invite-url>` | the "join a mesh first" branch | run join + launch in one step; pair with `-y` for fully non-interactive |
|
||||||
|
| `--groups "name:role,name2:role2,all"` | the group selection prompt | comma-separated `<groupname>:<role>` entries; the literal `all` joins `@all` |
|
||||||
|
| `--role <lead\|member\|observer>` | the role prompt | applied to all groups in `--groups` that didn't specify their own |
|
||||||
|
| `--message-mode <push\|inbox>` | the message-mode prompt | `push` (default) emits `<channel>` notifications mid-turn; `inbox` only buffers — quieter for headless agents |
|
||||||
|
| `--system-prompt <text>` | nothing — pure pass-through | forwarded to `claude --system-prompt` (overrides default; pass a string, not a path) |
|
||||||
|
| `--resume <session-id>` | nothing — pure pass-through | forwarded to `claude --resume` to continue a prior Claude Code session |
|
||||||
|
| `--continue` | nothing — pure pass-through | forwarded to `claude --continue` (resumes the last session in this cwd) |
|
||||||
|
| `-y` / `--yes` | every confirmation prompt | including the "you'll skip ALL permission prompts" gate. **Use for autonomous agents; omit for shared/multi-person meshes.** |
|
||||||
|
| `--quiet` | the wizard + welcome banner | suppresses the launch wizard and banner. Combine with `-y` for true headless: `--quiet` alone won't bypass Claude's permission prompts, so a script using only `--quiet` will hang on the first tool call. |
|
||||||
|
| `--` | (separator) | everything after `--` is forwarded verbatim to `claude`. Example: `claudemesh launch --name X -y -- --resume abc123 --model opus` |
|
||||||
|
|
||||||
|
> **All twelve flags are end-to-end wired as of `claudemesh-cli@1.27.1`.** Earlier builds silently dropped `--role`, `--groups`, `--message-mode`, `--system-prompt`, `--continue`, and `--quiet` at the CLI entrypoint — they were declared but never reached `runLaunch`. If a script targets older versions, those flags are no-ops.
|
||||||
|
|
||||||
|
### Wizard-free spawn templates
|
||||||
|
|
||||||
|
#### Canonical fully-populated spawn (every flag set explicitly)
|
||||||
|
|
||||||
|
The kitchen-sink form — copy, set every value, and the session boots without a single interactive prompt or banner. Use as a base when scripting from cron, hooks, CI, or another agent:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
claudemesh launch \
|
||||||
|
--name "ci-bot" \
|
||||||
|
--mesh openclaw \
|
||||||
|
--role member \
|
||||||
|
--groups "frontend:lead,reviewers:observer,all" \
|
||||||
|
--message-mode inbox \
|
||||||
|
--system-prompt "$(cat ~/agents/ci-bot.md)" \
|
||||||
|
--quiet \
|
||||||
|
-y \
|
||||||
|
-- \
|
||||||
|
--model opus \
|
||||||
|
--resume "$LAST_SESSION_ID"
|
||||||
|
```
|
||||||
|
|
||||||
|
Annotated:
|
||||||
|
|
||||||
|
| Position | Value | Effect |
|
||||||
|
|---|---|---|
|
||||||
|
| `--name "ci-bot"` | identity | what peers see in `peer list` and `<channel from_name>` — pin so peers always see the same name across machines |
|
||||||
|
| `--mesh openclaw` | workspace | required when you have ≥2 joined meshes; safe to include even with 1 (becomes a no-op assertion) |
|
||||||
|
| `--role member` | session label | free-form tag used by group conventions; common values: `lead`, `member`, `observer`, `bot`, `oncall` |
|
||||||
|
| `--groups "frontend:lead,..."` | group memberships | comma-separated `<group>:<role>` pairs; bare `all` joins `@all` with no role |
|
||||||
|
| `--message-mode inbox` | delivery | `push` interrupts mid-turn (default); `inbox` buffers silently; `off` disables messages but keeps tool calls |
|
||||||
|
| `--system-prompt "..."` | claude system prompt | overrides Claude's default. Pass a string, not a path — wrap with `$(cat …)` if you keep prompts in files |
|
||||||
|
| `--quiet` | output | suppress the wizard and banner — clean stdout for the spawning script |
|
||||||
|
| `-y` | consent | skips every permission prompt (claudemesh's policy gate **and** Claude's `--dangerously-skip-permissions`). Required for true headless |
|
||||||
|
| `--` | separator | everything after is passed verbatim to `claude` |
|
||||||
|
| `--model opus` | claude flag | example claude-side override |
|
||||||
|
| `--resume "$LAST_SESSION_ID"` | claude flag | resume a prior Claude session inside this mesh identity |
|
||||||
|
|
||||||
|
**Rule of thumb:** for any unattended spawn, the minimum is `--name + --mesh + -y + --quiet`. Add `--system-prompt` to seed task context, `--message-mode inbox` to keep the bot quiet, and `--role` + `--groups` so peers know how to address it. Drop `--quiet` when a human is watching the script's stdout.
|
||||||
|
|
||||||
|
#### Trimmed templates
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Minimal — single joined mesh, fresh agent, autonomous:
|
||||||
|
claudemesh launch --name "Lug Nut" -y
|
||||||
|
|
||||||
|
# Multi-mesh user — pick mesh explicitly:
|
||||||
|
claudemesh launch --name "Mou" --mesh openclaw -y
|
||||||
|
|
||||||
|
# Cold-start a peer who hasn't joined the mesh yet:
|
||||||
|
claudemesh launch \
|
||||||
|
--name "Lug Nut" \
|
||||||
|
--join "https://claudemesh.com/i/abc123" \
|
||||||
|
--groups "frontend:member,reviewers:observer,all" \
|
||||||
|
--message-mode push \
|
||||||
|
-y
|
||||||
|
|
||||||
|
# Resume a specific Claude session inside claudemesh:
|
||||||
|
claudemesh launch --name "Mou" --mesh openclaw -y -- --resume abc123-...
|
||||||
|
|
||||||
|
# Quiet, headless, system-prompt loaded — for cron / hooks:
|
||||||
|
claudemesh launch --name "ci-bot" --mesh openclaw \
|
||||||
|
--system-prompt "$(cat ~/agents/ci-bot.md)" \
|
||||||
|
--message-mode inbox \
|
||||||
|
--quiet -y
|
||||||
|
```
|
||||||
|
|
||||||
|
If any required flag is missing AND stdin is a TTY, `launch` falls back to its prompt for that single field. **In a non-TTY context (Bash tool, cron, AppleScript pipe), a missing flag makes the verb fail closed; it never silently uses a default that affects identity.**
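The prompt-or-fail rule can be expressed in a few lines. The sketch below is a hypothetical illustration of the behavior described, not the code in `launch`: `resolveRequiredField` and its injected `prompt` callback are assumed names.

```typescript
// Hypothetical sketch: use the flag when given, prompt only on a TTY,
// and refuse to guess in non-interactive contexts.
function resolveRequiredField(
  name: string,
  flagValue: string | undefined,
  isTTY: boolean,
  prompt: (question: string) => string,
): string {
  // An explicit flag always wins.
  if (flagValue !== undefined && flagValue !== "") return flagValue;

  // Interactive terminal: ask for this single field only.
  if (isTTY) return prompt(`${name}: `);

  // Non-TTY (cron, hooks, piped stdin): fail closed, never pick a default.
  throw new Error(`missing required flag --${name} in a non-interactive context`);
}
```

In a Node.js process the `isTTY` argument would typically be `Boolean(process.stdin.isTTY)`.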
|
||||||
|
|
||||||
|
### Spawning into new terminal panes/windows
|
||||||
|
|
||||||
|
The launch verb itself is just a shell command — wrap it in whatever pane-creation primitive the host platform uses. The patterns that work today:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# tmux — send into a pane you control. NEVER send-keys into a pane
|
||||||
|
# you didn't create; you risk typing into another live TUI.
|
||||||
|
tmux new-window -t "$SESSION" -n claudemesh-lugnut
|
||||||
|
tmux send-keys -t "$SESSION:claudemesh-lugnut" \
|
||||||
|
'claudemesh launch --name "Lug Nut" --mesh openclaw -y' Enter
|
||||||
|
|
||||||
|
# macOS iTerm2 (open a new tab in the current window):
|
||||||
|
osascript <<'OSA'
|
||||||
|
tell application "iTerm2"
|
||||||
|
tell current window
|
||||||
|
create tab with default profile
|
||||||
|
tell current session of current tab
|
||||||
|
write text "claudemesh launch --name \"Lug Nut\" --mesh openclaw -y"
|
||||||
|
end tell
|
||||||
|
end tell
|
||||||
|
end tell
|
||||||
|
OSA
|
||||||
|
|
||||||
|
# macOS Terminal.app (new window):
|
||||||
|
osascript -e 'tell application "Terminal" to do script "claudemesh launch --name \"Lug Nut\" --mesh openclaw -y"'
|
||||||
|
|
||||||
|
# GNOME Terminal / generic Linux:
|
||||||
|
gnome-terminal -- bash -lc 'claudemesh launch --name "Lug Nut" --mesh openclaw -y'
|
||||||
|
|
||||||
|
# screen detached:
|
||||||
|
screen -dmS lugnut bash -lc 'claudemesh launch --name "Lug Nut" --mesh openclaw -y'
|
||||||
|
|
||||||
|
# Windows Terminal (wt.exe) — open a new tab:
|
||||||
|
wt.exe new-tab --title claudemesh-lugnut powershell -NoExit -Command "claudemesh launch --name 'Lug Nut' --mesh openclaw -y"
|
||||||
|
|
||||||
|
# Windows Terminal — split the current pane vertically instead:
|
||||||
|
wt.exe split-pane -V powershell -NoExit -Command "claudemesh launch --name 'Lug Nut' --mesh openclaw -y"
|
||||||
|
|
||||||
|
# PowerShell: open a separate PowerShell window
|
||||||
|
Start-Process powershell -ArgumentList '-NoExit','-Command','claudemesh launch --name "Lug Nut" --mesh openclaw -y'
|
||||||
|
|
||||||
|
# cmd.exe — start a new console window:
|
||||||
|
start "claudemesh-lugnut" cmd /k "claudemesh launch --name ""Lug Nut"" --mesh openclaw -y"
|
||||||
|
|
||||||
|
# WSL from a Windows host — same launch verb, just route through wsl.exe:
|
||||||
|
wsl.exe -- bash -lc 'claudemesh launch --name "Lug Nut" --mesh openclaw -y'
|
||||||
|
```
|
||||||
|
|
||||||
|
Windows-specific gotchas:
|
||||||
|
- **cmd.exe does not treat single quotes as quote characters.** Use `""` to escape inner double quotes (see the `cmd /k` example), or move to PowerShell, where single quotes work normally.
|
||||||
|
- **`-NoExit`** keeps the PowerShell window open after the command finishes. Without it, the window closes as soon as `claudemesh launch` (and the `claude` process it starts) exits.
|
||||||
|
- **WSL paths.** If you spawn from a Windows-side script into WSL, the `claudemesh` CLI in WSL writes to `~/.claudemesh/` on the Linux side, *not* `%USERPROFILE%\.claudemesh\`. The two installs are independent — match the spawn host to the install host.
|
||||||
|
- **Windows Terminal profile names.** Replace `powershell` with `pwsh` for PowerShell 7+, or use `--profile "<name>"` to target a configured profile (e.g. one preconfigured with WSL Ubuntu + a starting directory).
|
||||||
|
|
||||||
|
The user's environment may also have these pre-built helpers (CLAUDE.md will tell you):
|
||||||
|
|
||||||
|
- `~/tools/scripts/spawn-iterm-panes.sh` and `spawn-iterm-window.sh` — safer iTerm spawners that only write into sessions they themselves created.
|
||||||
|
- `~/tools/scripts/claude-peers.sh` — tmux wrapper that opens a split running `claudemesh launch` with sensible defaults.
|
||||||
|
|
||||||
|
Prefer those when available — they handle pane ownership / cleanup correctly.
|
||||||
|
|
||||||
|
### Sanity rules for unattended spawns
|
||||||
|
|
||||||
|
1. **Always pass `--name`.** A nameless session falls back to `<hostname>-<pid>`, which makes peer attribution opaque in `peer list` and inbound channels.
|
||||||
|
2. **Always pass `--mesh` when the user has multiple meshes joined.** Otherwise the picker fires and the spawn hangs waiting for stdin.
|
||||||
|
3. **Pass `-y` only when you understand the consent it grants.** It skips every permission gate — fine for an autonomous agent on a private mesh, dangerous on a shared mesh where peers can drive your file system.
|
||||||
|
4. **For long-running daemonised peers, use `--message-mode inbox`** so they don't fire `<channel>` interrupts on every received DM. They poll `claudemesh inbox` on their own cadence.
|
||||||
|
5. **Confirm the spawn worked** by waiting a few seconds and running `claudemesh peer list` — the new peer's `displayName` should appear with `status: "idle"`.
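Rule 5 can be automated once the peer list is available as data. The sketch below assumes the list has already been parsed into an array of objects with `displayName` and `status` fields, the two names mentioned in the rule; how the list is obtained and its full shape are not shown in this document, so `PeerRow` and `spawnConfirmed` are hypothetical.

```typescript
// Hypothetical row shape: only the two fields named in rule 5 are assumed.
interface PeerRow {
  displayName: string;
  status: string;
}

// True when the freshly spawned peer is visible and idle.
function spawnConfirmed(peers: PeerRow[], name: string): boolean {
  return peers.some((p) => p.displayName === name && p.status === "idle");
}
```

A spawning script would typically call this a few times with a short delay between attempts, since the new session needs a few seconds to attach.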
|
||||||
|
|
||||||
## Universal flags
|
## Universal flags
|
||||||
|
|
||||||
| Flag | Behavior |
|
| Flag | Behavior |
|
||||||
@@ -52,15 +268,57 @@ claudemesh topic history deploys --limit 50 # fetch back-scroll
|
|||||||
claudemesh topic history deploys --before <msg-id> # paginate older
|
claudemesh topic history deploys --before <msg-id> # paginate older
|
||||||
claudemesh topic read deploys # mark all as read
|
claudemesh topic read deploys # mark all as read
|
||||||
|
|
||||||
# Send to a topic — same `send` verb, target starts with #
|
# Send to a topic — same `send` verb, target starts with # (WS, v1 plaintext)
|
||||||
claudemesh send "#deploys" "rolling out 1.5.1 to staging"
|
claudemesh send "#deploys" "rolling out 1.5.1 to staging"
|
||||||
|
|
||||||
|
# v1.7.0+: live tail in the terminal — backfill last N + then SSE forward.
|
||||||
|
# Decrypts v2 messages on render. Runs a 30s re-seal loop while held.
|
||||||
|
claudemesh topic tail deploys --limit 50
|
||||||
|
|
||||||
|
# v1.8.0+: encrypted REST send (body_version 2). Falls back to v1
|
||||||
|
# automatically for legacy unencrypted topics. --plaintext forces v1.
|
||||||
|
claudemesh topic post deploys "rolling out, cc @Alexis stay around"
|
||||||
|
|
||||||
|
# v1.9.0+: thread a reply onto a previous topic message. Accepts the
|
||||||
|
# full id or an 8+ char prefix; resolved against recent history.
|
||||||
|
claudemesh topic post deploys "yes — same here" --reply-to 7XtIeF7o
|
||||||
```
|
```
|
||||||
|
|
||||||
|
In `topic tail` output, replies render with a `↳ in reply to <name>: "<snippet>"` line above the message and every row shows a short id tag (`#xxxxxxxx`) so you can copy-paste into `--reply-to`.
|
||||||
|
|
||||||
When to use topics vs groups vs DM:
|
When to use topics vs groups vs DM:
|
||||||
- **DM** (`send <peer>`) — 1:1, ephemeral.
|
- **DM** (`send <peer>`) — 1:1, ephemeral.
|
||||||
- **Group** (`send "@frontend"`) — addresses everyone in a group; ephemeral; for coordinating teams.
|
- **Group** (`send "@frontend"`) — addresses everyone in a group; ephemeral; for coordinating teams.
|
||||||
- **Topic** (`send "#deploys"`) — durable conversation room; for ongoing work threads, incident channels, build-status feeds.
|
- **Topic** (`send "#deploys"`) — durable conversation room; for ongoing work threads, incident channels, build-status feeds.
|
||||||
|
|
||||||
|
### `member` — mesh roster + online state (v1.7.0)
|
||||||
|
|
||||||
|
Distinct from `peer list`: `member list` shows the static roster (every joined member of a mesh, online or not), while `peer list` shows the live WS-connected sessions plus REST-active humans.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
claudemesh member list # everyone, with status dots
|
||||||
|
claudemesh member list --online # only online
|
||||||
|
claudemesh member list --mesh deploys --json
|
||||||
|
```
|
||||||
|
|
||||||
|
Status glyphs: `●` emerald = idle, `●` clay = working, `●` red = dnd, `○` dim = offline. `bot` tag appears on non-human members.
|
||||||
|
|
||||||
|
### `notification` — recent @-mentions (v1.7.0)
|
||||||
|
|
||||||
|
Server-side write-time fan-out from `mesh.notification` — one row per recipient per matching `@-mention`. Works for both v1 plaintext and v2 ciphertext (clients send the mention list explicitly on v2).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
claudemesh notification list # last 24h, all mentions of you
|
||||||
|
claudemesh notification list --since 2026-05-01T00:00Z # incremental for polling
|
||||||
|
claudemesh notification list --json # parseable
|
||||||
|
```
|
||||||
|
|
||||||
|
### Per-topic encryption (v0.3.0 / CLI 1.8.0)
|
||||||
|
|
||||||
|
Topics created on or after CLI 1.8.0 generate a 32-byte XSalsa20-Poly1305 symmetric key sealed for each member via `crypto_box`. The broker holds ciphertext only. `topic post` encrypts; `topic tail` decrypts. The `🔒 v2` glyph in tail output marks ciphertext rounds. v1 plaintext topics keep working unchanged.
|
||||||
|
|
||||||
|
When a new member joins an encrypted topic, they get a 404 from `GET /v1/topics/:name/key` until any holder re-seals for them. `topic tail` runs a 30s background loop that does the re-seal automatically while the tail is open. Otherwise the joiner waits for someone with the key to log in.
|
||||||
|
|
||||||
### `peer` — read connected peers + admin (kick / ban / verify)
|
### `peer` — read connected peers + admin (kick / ban / verify)
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
@@ -77,22 +335,36 @@ claudemesh peer bans # list banned members
|
|||||||
claudemesh peer verify [peer] # 6×5-digit safety numbers
|
claudemesh peer verify [peer] # 6×5-digit safety numbers
|
||||||
```
|
```
|
||||||
|
|
||||||
JSON shape (per peer):
|
JSON shape (per peer) — **render `role` and `groups` whenever you build a table for the user**, they're the highest-signal fields after `displayName`:
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"displayName": "Mou",
|
"displayName": "Mou",
|
||||||
"pubkey": "abc123...",
|
"pubkey": "abc123...", // session pubkey (rotates per claudemesh launch)
|
||||||
|
"memberPubkey": "def456...", // stable identity (same across all sibling sessions)
|
||||||
|
"sessionId": "uuid",
|
||||||
"status": "idle | working | dnd",
|
"status": "idle | working | dnd",
|
||||||
"summary": "string or null",
|
"summary": "string or null",
|
||||||
|
"role": "lead | reviewer | bot | ...", // 1.31.5+: top-level alias of profile.role
|
||||||
"groups": [{ "name": "reviewers", "role": "lead" }],
|
"groups": [{ "name": "reviewers", "role": "lead" }],
|
||||||
"peerType": "claude | telegram | ...",
|
"profile": {
|
||||||
|
"role": "lead",
|
||||||
|
"title": "string or null",
|
||||||
|
"bio": "string or null",
|
||||||
|
"avatar": "emoji or null",
|
||||||
|
"capabilities": ["..."]
|
||||||
|
},
|
||||||
|
"peerType": "claude | telegram | ai | human | connector | ...",
|
||||||
"channel": "claude-code | api | ...",
|
"channel": "claude-code | api | ...",
|
||||||
"model": "claude-opus-4-7 | ...",
|
"model": "claude-opus-4-7 | ...",
|
||||||
"cwd": "/path/to/working/dir or null",
|
"cwd": "/path/to/working/dir or null",
|
||||||
|
"isSelf": true, // peer is one of the caller's own sessions
|
||||||
|
"isThisSession": false, // peer is the exact session running the cli
|
||||||
"stats": { "messagesIn": 0, "messagesOut": 0, "toolCalls": 0, "errors": 0, "uptime": 1200 }
|
"stats": { "messagesIn": 0, "messagesOut": 0, "toolCalls": 0, "errors": 0, "uptime": 1200 }
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**When asked to "list peers" inside a launched session, prefer the human renderer (`claudemesh peer list`, no `--json`) — it already prints role + groups inline next to the name with an explicit `(none)` footer when both are absent. If you do need JSON for parsing, always include `role` and `groups` columns in any rendered table; the user's primary question is usually "who's in what role" and dropping those fields hides the answer.**
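If you do parse the JSON and build your own table, the guidance above can be sketched roughly as follows. This is an illustrative sketch, not the CLI's renderer: the `Peer` interface covers only a few of the documented fields, and `formatGroups`, `renderPeerTable`, and the `name:role` cell format are hypothetical choices made for this example.

```typescript
// Hypothetical subset of the per-peer JSON shape documented above.
interface PeerGroup { name: string; role?: string }
interface Peer {
  displayName: string;
  role?: string | null;
  groups?: PeerGroup[];
  status: string;
}

// Render groups as "name:role" pairs, or an explicit "(none)" when absent.
function formatGroups(groups: PeerGroup[] | undefined): string {
  if (!groups || groups.length === 0) return "(none)";
  return groups.map((g) => (g.role ? `${g.name}:${g.role}` : g.name)).join(", ");
}

// Build a plain-text table that always includes the role and groups columns.
function renderPeerTable(peers: Peer[]): string {
  const header = ["name", "role", "groups", "status"];
  const rows = peers.map((p) => [
    p.displayName,
    p.role ?? "(none)",
    formatGroups(p.groups),
    p.status,
  ]);
  const all = [header, ...rows];
  // Column width = longest cell in that column.
  const widths = header.map((_, col) => Math.max(...all.map((r) => r[col]!.length)));
  return all
    .map((r) => r.map((cell, col) => cell.padEnd(widths[col]!)).join("  ").trimEnd())
    .join("\n");
}
```

Keeping `(none)` explicit, rather than leaving the cell blank, mirrors the human renderer's footer and makes a missing role visible at a glance.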
|
||||||
|
|
||||||
### `message` — send and inspect messages
|
### `message` — send and inspect messages
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
@@ -105,15 +377,33 @@ claudemesh message send <p> "..." --priority now # bypass busy gates
|
|||||||
claudemesh message send <p> "..." --priority next # default
|
claudemesh message send <p> "..." --priority next # default
|
||||||
claudemesh message send <p> "..." --priority low # pull-only
|
claudemesh message send <p> "..." --priority low # pull-only
|
||||||
|
|
||||||
# inbox (alias: claudemesh inbox)
|
# inbox (alias: claudemesh inbox) — 1.34.0+ reads from inbox.db via daemon IPC
|
||||||
claudemesh message inbox
|
claudemesh inbox # all attached meshes, last 100
|
||||||
claudemesh message inbox --json
|
claudemesh inbox --mesh <slug> # scoped to one mesh
|
||||||
|
claudemesh inbox --mesh <slug> --limit 20 # custom cap
|
||||||
|
claudemesh inbox --json # full row (sender_pubkey, mesh, body, received_at, seen_at, …)
|
||||||
|
claudemesh inbox --unread # 1.34.8+ only rows whose seen_at IS NULL
|
||||||
|
|
||||||
|
# inbox flush + delete — 1.34.7+
|
||||||
|
claudemesh inbox flush --mesh <slug> # delete all rows on one mesh
|
||||||
|
claudemesh inbox flush --before <iso-timestamp> # delete rows older than timestamp
|
||||||
|
claudemesh inbox flush --all # delete every row on every mesh (required guard)
|
||||||
|
claudemesh inbox delete <id> # delete one inbox row by id (alias: rm)
|
||||||
|
claudemesh inbox flush --mesh <slug> --json # JSON: { ok: true, removed: N }
|
||||||
|
|
||||||
# delivery status (alias: claudemesh msg-status <id>)
|
# delivery status (alias: claudemesh msg-status <id>)
|
||||||
claudemesh message status <message-id>
|
claudemesh message status <message-id>
|
||||||
claudemesh message status <message-id> --json
|
claudemesh message status <message-id> --json
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Inbox source (1.34.0+):** `claudemesh inbox` queries the daemon's persistent `~/.claudemesh/daemon/inbox.db` over IPC — it is NOT a fresh broker-WS buffer drain. Rows survive daemon restarts. Sender attribution is the actual session pubkey of the launched session that originated the send (NOT the stable member pubkey of the sender's daemon), so two sibling sessions of the same human appear as distinct rows.
|
||||||
|
|
||||||
|
**Read-state (1.34.8+):** every inbox row carries a `seen_at` timestamp. `null` = never surfaced; an ISO string = first surfaced at that moment. The flag flips automatically when (a) the row is returned by an interactive `claudemesh inbox` listing, or (b) the MCP server emits a live `<channel>` reminder for it. The launch welcome push uses `unread_only=true` to surface only rows the user hasn't seen — so a session relaunched a day later sees what it actually missed, not the same 24h batch every time. Use `claudemesh inbox --unread` to get the same filter from the CLI.
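The read-state rule above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the daemon's code: the real rows live in `inbox.db`, and `InboxRow`, `listAndMarkSeen`, and the `unreadOnly` option are hypothetical names. It also assumes the listing returns rows as they were before the flag flips.

```typescript
// Hypothetical in-memory stand-in for an inbox row.
interface InboxRow {
  id: string;
  body: string;
  seen_at: string | null; // null = never surfaced
}

// List rows (optionally unread only), then stamp seen_at on rows that were
// never surfaced before. Already-seen rows keep their original timestamp.
function listAndMarkSeen(
  rows: InboxRow[],
  opts: { unreadOnly?: boolean; now?: Date } = {},
): InboxRow[] {
  const stamp = (opts.now ?? new Date()).toISOString();
  const selected = opts.unreadOnly ? rows.filter((r) => r.seen_at === null) : rows;
  const snapshot = selected.map((r) => ({ ...r })); // what the caller sees
  for (const r of selected) {
    if (r.seen_at === null) r.seen_at = stamp;
  }
  return snapshot;
}
```

Under this model, calling the function twice with `unreadOnly: true` returns the unread rows once and nothing the second time, which is the behaviour the launch welcome push relies on.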
|
||||||
|
|
||||||
|
**Self-echo guard (1.34.8+):** broker fan-out paths sometimes mirror an outbound DM back to the originating session-WS. The daemon now drops those at the WS boundary (matching on `senderPubkey === own.session_pubkey`), so the sender no longer sees their own `claudemesh send` arrive as a `← claudemesh: <self>: ...` channel push immediately after dispatching it.
|
||||||
|
|
||||||
|
**Inbox TTL (1.34.8+):** the daemon runs an hourly prune that deletes rows older than 30 days. Before this change the inbox grew without bound; now it self-trims while preserving "I went on holiday and want to see what I missed" recovery for a generous window. There is no CLI knob; it is a built-in retention policy. To prune sooner than the 30-day window, run `claudemesh inbox flush --before <iso>` manually.
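The retention arithmetic is simple enough to sketch. The snippet below is an assumption-laden illustration rather than the daemon's prune job: `StoredRow`, `pruneCutoff`, and `prune` are hypothetical names, and only the 30-day window comes from the text above.

```typescript
const RETENTION_DAYS = 30; // window stated in the docs above
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical minimal row: only the timestamp matters for pruning.
interface StoredRow {
  id: string;
  received_at: string; // ISO-8601 timestamp
}

// ISO timestamp before which rows are considered expired.
function pruneCutoff(now: Date, days: number = RETENTION_DAYS): string {
  return new Date(now.getTime() - days * DAY_MS).toISOString();
}

// Keep rows received at or after the cutoff; report how many were dropped.
function prune(rows: StoredRow[], now: Date): { kept: StoredRow[]; removed: number } {
  const cutoff = Date.parse(pruneCutoff(now));
  const kept = rows.filter((r) => Date.parse(r.received_at) >= cutoff);
  return { kept, removed: rows.length - kept.length };
}
```

The same kind of cutoff string, computed with a smaller `days` value, is what you would pass to `claudemesh inbox flush --before` to trim more aggressively by hand.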
|
||||||
|
|
||||||
`send` JSON output: `{"ok": true, "messageId": "...", "target": "..."}`. Errors: `{"ok": false, "error": "..."}`.
|
`send` JSON output: `{"ok": true, "messageId": "...", "target": "..."}`. Errors: `{"ok": false, "error": "..."}`.
|
||||||
|
|
||||||
### `state` — shared per-mesh key-value store
|
### `state` — shared per-mesh key-value store
|
||||||
@@ -262,12 +552,30 @@ claudemesh webhook delete <name>
|
|||||||
### `file` — shared mesh files
|
### `file` — shared mesh files
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
claudemesh file list [search-query] # list files
|
claudemesh file share <path> # upload to mesh (visible to all members)
|
||||||
claudemesh file status <file-id> # who has accessed
|
claudemesh file share <path> --to <peer> # share with one peer (same-host fast path if co-located)
|
||||||
|
claudemesh file share <path> --to <peer> --message "see line 42"
|
||||||
|
claudemesh file share <path> --upload # force network upload, skip same-host fast path
|
||||||
|
claudemesh file get <file-id> # download by id (saves to ./<name>)
|
||||||
|
claudemesh file get <file-id> --out /tmp/foo.bin # download to explicit path
|
||||||
|
claudemesh file list [search-query] # browse mesh files
|
||||||
|
claudemesh file status <file-id> # who has accessed
|
||||||
claudemesh file delete <file-id>
|
claudemesh file delete <file-id>
|
||||||
# Upload + retrieval currently via MCP `share_file` / `get_file` (binary streams)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
**Same-host fast path** (v0.6.0+): when `--to <peer>` resolves to a session
|
||||||
|
running on the same hostname as you, `claudemesh file share` skips MinIO
|
||||||
|
entirely and sends a DM with the absolute filepath. The receiver reads it
|
||||||
|
directly off disk. No 50 MB cap, no upload latency, nothing in the bucket.
|
||||||
|
Falls back to encrypted upload when the peer is remote, or always when
|
||||||
|
`--upload` is set. Routes by session pubkey, so sibling sessions of the
|
||||||
|
same member work without tripping the self-DM guard.
|
||||||
|
|
||||||
|
**Network upload cap**: 50 MB. Same-host fast path has no cap.
|
||||||
|
|
||||||
|
**`--to` accepts**: display name, member pubkey, session pubkey, or any
|
||||||
|
≥8-char prefix of a pubkey. Prefer pubkey when multiple peers share a name.
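A rough sketch of how a `--to` value could be matched against the peer list follows. It is not the CLI's resolver: `PeerRef`, `resolveTo`, the lookup order (pubkey prefix first, then exact display name), and the `ambiguous` result are assumptions made for illustration.

```typescript
// Hypothetical subset of the peer JSON: name plus both pubkeys.
interface PeerRef {
  displayName: string;
  pubkey: string;       // session pubkey
  memberPubkey: string; // stable member pubkey
}

type ToResolution =
  | { ok: true; peer: PeerRef }
  | { ok: false; reason: "not_found" | "ambiguous"; candidates: PeerRef[] };

const HEX_ONLY = /^[0-9a-f]+$/i;

function resolveTo(target: string, peers: PeerRef[]): ToResolution {
  const needle = target.toLowerCase();
  let matches: PeerRef[] = [];
  // 1. Treat 8-64 hex chars as a (possibly full) pubkey prefix.
  if (HEX_ONLY.test(target) && target.length >= 8 && target.length <= 64) {
    matches = peers.filter(
      (p) =>
        p.pubkey.toLowerCase().startsWith(needle) ||
        p.memberPubkey.toLowerCase().startsWith(needle),
    );
  }
  // 2. Otherwise (or if nothing matched) fall back to an exact display name.
  if (matches.length === 0) {
    matches = peers.filter((p) => p.displayName === target);
  }
  if (matches.length === 1) return { ok: true, peer: matches[0]! };
  return {
    ok: false,
    reason: matches.length === 0 ? "not_found" : "ambiguous",
    candidates: matches,
  };
}
```

Under this sketch a shared display name, or a member pubkey with several sibling sessions, comes back as `ambiguous`, which is the situation where a session pubkey is the safer thing to pass.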
|
||||||
|
|
||||||
### `mesh-mcp` — call MCP servers other peers deployed to the mesh
|
### `mesh-mcp` — call MCP servers other peers deployed to the mesh
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
|
|||||||
@@ -2,6 +2,43 @@ import { defineCommand, runMain } from "citty";
|
|||||||
|
|
||||||
export interface ParsedArgs { command: string; positionals: string[]; flags: Record<string, string | boolean | undefined>; }
|
export interface ParsedArgs { command: string; positionals: string[]; flags: Record<string, string | boolean | undefined>; }
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Flags that NEVER take a value. The parser's default behavior is greedy
|
||||||
|
* (any `--flag` consumes the next non-`-` arg as its value), which is
|
||||||
|
* fine for `--mesh foo` and `--priority now` but breaks for booleans:
|
||||||
|
* `claudemesh send --self <pubkey> "msg"` was eating the pubkey as the
|
||||||
|
* value of --self, leaving zero positionals and triggering Usage errors.
|
||||||
|
*
|
||||||
|
* Adding to this set: any new boolean / no-arg switch.
|
||||||
|
*/
|
||||||
|
const BOOLEAN_FLAGS = new Set([
|
||||||
|
"self",
|
||||||
|
"json", // also accepts --json=a,b,c form below
|
||||||
|
"all",
|
||||||
|
"yes", "y",
|
||||||
|
"help", "h",
|
||||||
|
"version", "v",
|
||||||
|
"quiet",
|
||||||
|
"strict",
|
||||||
|
"continue",
|
||||||
|
"no-daemon",
|
||||||
|
"no-color",
|
||||||
|
"debug",
|
||||||
|
"allow-ci-persistent",
|
||||||
|
"force",
|
||||||
|
"dry-run",
|
||||||
|
"verbose",
|
||||||
|
"skip-service",
|
||||||
|
// 1.34.8: `--unread` filters `claudemesh inbox` to rows whose
|
||||||
|
// seen_at is NULL. No value — pure switch.
|
||||||
|
"unread",
|
||||||
|
// 1.34.12: `--foreground` keeps `claudemesh daemon up` attached
|
||||||
|
// to the terminal (pre-1.34.12 behavior). Default is detached now.
|
||||||
|
"foreground",
|
||||||
|
"no-tcp",
|
||||||
|
"public-health",
|
||||||
|
]);
|
||||||
|
|
||||||
export function parseArgv(argv: string[]): ParsedArgs {
|
export function parseArgv(argv: string[]): ParsedArgs {
|
||||||
const args = argv.slice(2);
|
const args = argv.slice(2);
|
||||||
const flags: Record<string, string | boolean | undefined> = {};
|
const flags: Record<string, string | boolean | undefined> = {};
|
||||||
@@ -10,14 +47,26 @@ export function parseArgv(argv: string[]): ParsedArgs {
|
|||||||
|
|
||||||
for (let i = 0; i < args.length; i++) {
|
for (let i = 0; i < args.length; i++) {
|
||||||
const arg = args[i]!;
|
const arg = args[i]!;
|
||||||
|
// --flag=value (always parsed as a value, regardless of boolean set)
|
||||||
|
if (arg.startsWith("--") && arg.includes("=")) {
|
||||||
|
const eq = arg.indexOf("=");
|
||||||
|
const key = arg.slice(2, eq);
|
||||||
|
flags[key] = arg.slice(eq + 1);
|
||||||
|
continue;
|
||||||
|
}
|
||||||
if (arg.startsWith("--")) {
|
if (arg.startsWith("--")) {
|
||||||
const key = arg.slice(2);
|
const key = arg.slice(2);
|
||||||
|
// Known boolean → never consume the next token as a value.
|
||||||
|
if (BOOLEAN_FLAGS.has(key)) { flags[key] = true; continue; }
|
||||||
const next = args[i + 1];
|
const next = args[i + 1];
|
||||||
if (next && !next.startsWith("-")) { flags[key] = next; i++; } else flags[key] = true;
|
if (next !== undefined && !next.startsWith("-")) { flags[key] = next; i++; }
|
||||||
|
else flags[key] = true;
|
||||||
} else if (arg.startsWith("-") && arg.length === 2) {
|
} else if (arg.startsWith("-") && arg.length === 2) {
|
||||||
const key = arg.slice(1);
|
const key = arg.slice(1);
|
||||||
|
if (BOOLEAN_FLAGS.has(key)) { flags[key] = true; continue; }
|
||||||
const next = args[i + 1];
|
const next = args[i + 1];
|
||||||
if (next && !next.startsWith("-")) { flags[key] = next; i++; } else flags[key] = true;
|
if (next !== undefined && !next.startsWith("-")) { flags[key] = next; i++; }
|
||||||
|
else flags[key] = true;
|
||||||
} else if (!command) {
|
} else if (!command) {
|
||||||
command = arg;
|
command = arg;
|
||||||
} else {
|
} else {
|
||||||
|
|||||||
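The boolean-flag rule in the hunk above can be restated as a small standalone function to show its effect. This is a simplified sketch, not the CLI's `parseArgv`: it handles long flags only, does no command extraction, and `parseFlags` with its `booleanFlags` parameter is a hypothetical name.

```typescript
type FlagValue = string | boolean;

// Parse long flags. A flag listed in `booleanFlags` never consumes the next
// token; any other flag greedily takes the next non-dash token as its value.
function parseFlags(
  args: string[],
  booleanFlags: Set<string>,
): { positionals: string[]; flags: Record<string, FlagValue> } {
  const positionals: string[] = [];
  const flags: Record<string, FlagValue> = {};
  for (let i = 0; i < args.length; i++) {
    const arg = args[i]!;
    if (!arg.startsWith("--")) {
      positionals.push(arg);
      continue;
    }
    const eq = arg.indexOf("=");
    if (eq !== -1) {
      // --flag=value always carries a value, even for boolean flags.
      flags[arg.slice(2, eq)] = arg.slice(eq + 1);
      continue;
    }
    const key = arg.slice(2);
    const next = args[i + 1];
    if (!booleanFlags.has(key) && next !== undefined && !next.startsWith("-")) {
      flags[key] = next;
      i++; // skip the consumed value
    } else {
      flags[key] = true;
    }
  }
  return { positionals, flags };
}
```

With `self` in the boolean set, `["--self", "<pubkey>", "msg"]` keeps both trailing tokens as positionals; without it, the pubkey is swallowed as the value of `--self`, which is the bug the doc comment describes.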
198
apps/cli/src/cli/validators.ts
Normal file
@@ -0,0 +1,198 @@
|
|||||||
|
/**
|
||||||
|
* Argument validators — fail loud at the boundary, with specific reasons.
|
||||||
|
*
|
||||||
|
* Each validator returns a discriminated `ValidationResult` so callers can
|
||||||
|
* branch cleanly between "shape is wrong" (INVALID_ARGS exit) vs "value
|
||||||
|
* is well-shaped, do the lookup" (proceed). Hints (`reason`, `expected`,
|
||||||
|
* `nearest`) drive the three-tier error message contract:
|
||||||
|
*
|
||||||
|
* 1. WHAT'S WRONG — the failed assertion.
|
||||||
|
* 2. WHAT WOULD BE VALID — the canonical shape.
|
||||||
|
* 3. CLOSEST VALID ALTERNATIVE — best-effort suggestion.
|
||||||
|
*
|
||||||
|
* Use these instead of throwing strings or returning `null` for malformed
|
||||||
|
* input. They make argument errors structurally distinct from "thing
|
||||||
|
* doesn't exist" errors, which today's CLI conflates.
|
||||||
|
*/
|
||||||
|
|
||||||
|
export type ValidationResult<T = string> =
|
||||||
|
| { ok: true; value: T }
|
||||||
|
| { ok: false; code: string; reason: string; expected?: string };
|
||||||
|
|
||||||
|
const HEX_RE = /^[0-9a-f]+$/i;
|
||||||
|
const BASE62_RE = /^[A-Za-z0-9]+$/;
|
||||||
|
const SLUG_RE = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 64-char lowercase hex peer pubkey (member or session).
|
||||||
|
* Accepts UPPERCASE hex and normalizes to lowercase.
|
||||||
|
*/
|
||||||
|
export function validatePubkey(input: string | undefined): ValidationResult {
|
||||||
|
if (!input) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "missing",
|
||||||
|
reason: "pubkey is required",
|
||||||
|
expected: "64 lowercase hex chars",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (input.length !== 64) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "wrong_length",
|
||||||
|
reason: `pubkey is ${input.length} chars, expected 64`,
|
||||||
|
expected: "64 lowercase hex chars (try `claudemesh peer list --json`)",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (!HEX_RE.test(input)) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "non_hex",
|
||||||
|
reason: "pubkey contains non-hex characters",
|
||||||
|
expected: "characters [0-9a-f] only",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
return { ok: true, value: input.toLowerCase() };
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Hex pubkey *prefix* — used for short-form references. Min 8 chars
|
||||||
|
* to keep collisions vanishingly rare on a per-mesh roster, max 64.
|
||||||
|
*/
|
||||||
|
export function validatePubkeyPrefix(
|
||||||
|
input: string | undefined,
|
||||||
|
{ min = 8 }: { min?: number } = {},
|
||||||
|
): ValidationResult {
|
||||||
|
if (!input) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "missing",
|
||||||
|
reason: "pubkey prefix is required",
|
||||||
|
expected: `${min}-64 lowercase hex chars`,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (input.length < min) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "too_short",
|
||||||
|
reason: `prefix is ${input.length} chars, needs ≥${min}`,
|
||||||
|
expected: `${min}+ hex chars (full pubkey is 64)`,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (input.length > 64) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "too_long",
|
||||||
|
reason: `prefix is ${input.length} chars, max 64`,
|
||||||
|
expected: "drop trailing characters",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (!HEX_RE.test(input)) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "non_hex",
|
||||||
|
reason: "prefix contains non-hex characters",
|
||||||
|
expected: "characters [0-9a-f] only",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
return { ok: true, value: input.toLowerCase() };
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Message id — base62, 32 chars exact, OR a prefix of ≥8 chars.
|
||||||
|
* Returns `{ value, isPrefix }` so callers can decide whether to
|
||||||
|
* resolve via lookup or treat as full id.
|
||||||
|
*/
|
||||||
|
export function validateMessageId(
|
||||||
|
input: string | undefined,
|
||||||
|
): ValidationResult<{ value: string; isPrefix: boolean }> {
|
||||||
|
if (!input) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "missing",
|
||||||
|
reason: "message id is required",
|
||||||
|
expected: "32-char base62 id, or ≥8-char prefix",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (input.length < 8) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "too_short",
|
||||||
|
reason: `id is ${input.length} chars, needs ≥8`,
|
||||||
|
expected: "8+ chars (paste from a previous send/post output)",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (input.length > 32) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "too_long",
|
||||||
|
reason: `id is ${input.length} chars, max 32`,
|
||||||
|
expected: "trim trailing characters",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (!BASE62_RE.test(input)) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "bad_charset",
|
||||||
|
reason: "id contains characters outside [A-Za-z0-9]",
|
||||||
|
expected: "base62 only",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
return { ok: true, value: { value: input, isPrefix: input.length < 32 } };
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Mesh slug — kebab-case, lowercase, 2-64 chars.
|
||||||
|
*/
|
||||||
|
export function validateMeshSlug(input: string | undefined): ValidationResult {
|
||||||
|
if (!input) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "missing",
|
||||||
|
reason: "mesh slug is required",
|
||||||
|
expected: "kebab-case slug (e.g. `openclaw`)",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (input.length < 2 || input.length > 64) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "wrong_length",
|
||||||
|
reason: `slug is ${input.length} chars, expected 2-64`,
|
||||||
|
expected: "lowercase kebab-case",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
if (!SLUG_RE.test(input)) {
|
||||||
|
return {
|
||||||
|
ok: false,
|
||||||
|
code: "bad_format",
|
||||||
|
reason: "slug must be lowercase letters, digits, and hyphens (no leading/trailing hyphen)",
|
||||||
|
expected: "e.g. `team-alpha`, `flexicar-2`",
|
||||||
|
};
|
||||||
|
}
|
||||||
|
return { ok: true, value: input };
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Render a structured validation error to stderr in the canonical
|
||||||
|
* three-line shape: `✘ <verb> <input>` / ` <reason>` / ` <expected>`.
|
||||||
|
*
|
||||||
|
* Optional fourth line for `nearest` when a fuzzy suggestion is available.
|
||||||
|
*/
|
||||||
|
export function renderValidationError(
|
||||||
|
args: {
|
||||||
|
verb: string;
|
||||||
|
input: string;
|
||||||
|
result: Extract<ValidationResult, { ok: false }>;
|
||||||
|
nearest?: string;
|
||||||
|
},
|
||||||
|
write: (s: string) => void = (s) => process.stderr.write(s),
|
||||||
|
): void {
|
||||||
|
write(` \x1b[31m✘\x1b[0m ${args.verb} ${args.input}\n`);
|
||||||
|
write(` ${args.result.reason}.\n`);
|
||||||
|
if (args.result.expected) {
|
||||||
|
write(` expected: ${args.result.expected}\n`);
|
||||||
|
}
|
||||||
|
if (args.nearest) {
|
||||||
|
write(` did you mean: \x1b[36m${args.nearest}\x1b[0m\n`);
|
||||||
|
}
|
||||||
|
}
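The diff does not show how the optional `nearest` hint passed to `renderValidationError` is computed. One plausible approach, sketched here purely as an assumption, is an edit-distance search over known-valid values such as joined mesh slugs. `editDistance`, `nearestCandidate`, and the default threshold of 2 are hypothetical and not part of the changed files.

```typescript
// Levenshtein distance using a single rolling row.
function editDistance(a: string, b: string): number {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]!; // value of D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]!; // value of D[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1]! + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length]!;
}

// Return the closest candidate within maxDistance edits, if any.
function nearestCandidate(
  input: string,
  candidates: string[],
  maxDistance = 2,
): string | undefined {
  let best: string | undefined;
  let bestDist = maxDistance + 1;
  for (const c of candidates) {
    const d = editDistance(input, c);
    if (d < bestDist) {
      best = c;
      bestDist = d;
    }
  }
  return best;
}
```

A caller could pass the result straight through as the `nearest` field; when nothing is close enough the function returns `undefined` and the "did you mean" line is simply omitted.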
|
||||||
@@ -15,23 +15,15 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
import { withMesh } from "./connect.js";
|
import { withMesh } from "./connect.js";
|
||||||
import { readConfig } from "~/services/config/facade.js";
|
import { tryForgetViaDaemon } from "~/services/bridge/daemon-route.js";
|
||||||
import { tryBridge } from "~/services/bridge/client.js";
|
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { bold, clay, dim } from "~/ui/styles.js";
|
import { bold, clay, dim } from "~/ui/styles.js";
|
||||||
import { EXIT } from "~/constants/exit-codes.js";
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
|
import { validateMessageId, renderValidationError } from "~/cli/validators.js";
|
||||||
|
|
||||||
type StateFlags = { mesh?: string; json?: boolean };
|
type StateFlags = { mesh?: string; json?: boolean };
|
||||||
type PeerStatus = "idle" | "working" | "dnd";
|
type PeerStatus = "idle" | "working" | "dnd";
|
||||||
|
|
||||||
/** Resolve unambiguous mesh slug for warm-path bridging. Returns null if
|
|
||||||
* the user has multiple joined meshes and didn't pick one. */
|
|
||||||
function unambiguousMesh(opts: StateFlags): string | null {
|
|
||||||
if (opts.mesh) return opts.mesh;
|
|
||||||
const config = readConfig();
|
|
||||||
return config.meshes.length === 1 ? config.meshes[0]!.slug : null;
|
|
||||||
}
|
|
||||||
|
|
||||||
// --- status ---
|
// --- status ---
|
||||||
|
|
||||||
export async function runStatusSet(state: string, opts: StateFlags): Promise<number> {
|
export async function runStatusSet(state: string, opts: StateFlags): Promise<number> {
|
||||||
@@ -41,21 +33,9 @@ export async function runStatusSet(state: string, opts: StateFlags): Promise<num
|
|||||||
return EXIT.INVALID_ARGS;
|
return EXIT.INVALID_ARGS;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Warm path
|
// Bridge tier deleted in 1.28.0 (dead code; the orphaned warm-path
|
||||||
const meshSlug = unambiguousMesh(opts);
|
// socket was never opened by anyone). Daemon route would belong here;
|
||||||
if (meshSlug) {
|
// adding it for status/summary/visible is queued for 1.29.0.
|
||||||
const bridged = await tryBridge(meshSlug, "status_set", { status: state });
|
|
||||||
if (bridged !== null) {
|
|
||||||
if (bridged.ok) {
|
|
||||||
if (opts.json) console.log(JSON.stringify({ status: state }));
|
|
||||||
else render.ok(`status set to ${bold(state)}`);
|
|
||||||
return EXIT.SUCCESS;
|
|
||||||
}
|
|
||||||
render.err(bridged.error);
|
|
||||||
return EXIT.INTERNAL_ERROR;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
await client.setStatus(state as PeerStatus);
|
await client.setStatus(state as PeerStatus);
|
||||||
});
|
});
|
||||||
@@ -72,21 +52,6 @@ export async function runSummary(text: string, opts: StateFlags): Promise<number
|
|||||||
return EXIT.INVALID_ARGS;
|
return EXIT.INVALID_ARGS;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Warm path
|
|
||||||
const meshSlug = unambiguousMesh(opts);
|
|
||||||
if (meshSlug) {
|
|
||||||
const bridged = await tryBridge(meshSlug, "summary", { summary: text });
|
|
||||||
if (bridged !== null) {
|
|
||||||
if (bridged.ok) {
|
|
||||||
if (opts.json) console.log(JSON.stringify({ summary: text }));
|
|
||||||
else render.ok("summary set", dim(text));
|
|
||||||
return EXIT.SUCCESS;
|
|
||||||
}
|
|
||||||
render.err(bridged.error);
|
|
||||||
return EXIT.INTERNAL_ERROR;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
await client.setSummary(text);
|
await client.setSummary(text);
|
||||||
});
|
});
|
||||||
@@ -106,21 +71,6 @@ export async function runVisible(value: string | undefined, opts: StateFlags): P
|
|||||||
return EXIT.INVALID_ARGS;
|
return EXIT.INVALID_ARGS;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Warm path
|
|
||||||
const meshSlug = unambiguousMesh(opts);
|
|
||||||
if (meshSlug) {
|
|
||||||
const bridged = await tryBridge(meshSlug, "visible", { visible });
|
|
||||||
if (bridged !== null) {
|
|
||||||
if (bridged.ok) {
|
|
||||||
if (opts.json) console.log(JSON.stringify({ visible }));
|
|
||||||
else render.ok(visible ? "you are now visible to peers" : "you are now hidden", visible ? undefined : "direct messages still reach you");
|
|
||||||
return EXIT.SUCCESS;
|
|
||||||
}
|
|
||||||
render.err(bridged.error);
|
|
||||||
return EXIT.INTERNAL_ERROR;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
await client.setVisible(visible);
|
await client.setVisible(visible);
|
||||||
});
|
});
|
||||||
@@ -172,6 +122,14 @@ export async function runForget(id: string | undefined, opts: StateFlags): Promi
|
|||||||
render.err("Usage: claudemesh forget <memory-id>");
|
render.err("Usage: claudemesh forget <memory-id>");
|
||||||
return EXIT.INVALID_ARGS;
|
return EXIT.INVALID_ARGS;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Daemon path first.
|
||||||
|
if (await tryForgetViaDaemon(id, opts.mesh)) {
|
||||||
|
if (opts.json) { console.log(JSON.stringify({ id, forgotten: true })); return EXIT.SUCCESS; }
|
||||||
|
render.ok(`forgot ${dim(id.slice(0, 8))}`);
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
|
|
||||||
await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
await client.forget(id);
|
await client.forget(id);
|
||||||
});
|
});
|
||||||
@@ -186,22 +144,57 @@ export async function runForget(id: string | undefined, opts: StateFlags): Promi
|
|||||||
// --- msg-status ---
|
// --- msg-status ---
|
||||||
|
|
||||||
export async function runMsgStatus(id: string | undefined, opts: StateFlags): Promise<number> {
|
export async function runMsgStatus(id: string | undefined, opts: StateFlags): Promise<number> {
|
||||||
if (!id) {
|
// Validate input shape *before* we open a WS connection, so a typo
|
||||||
render.err("Usage: claudemesh msg-status <message-id>");
|
// returns a structured error instead of "not found or timed out".
|
||||||
|
const v = validateMessageId(id);
|
||||||
|
if (!v.ok) {
|
||||||
|
if (opts.json) {
|
||||||
|
console.log(
|
||||||
|
JSON.stringify({
|
||||||
|
ok: false,
|
||||||
|
error: "invalid_argument",
|
||||||
|
field: "messageId",
|
||||||
|
code: v.code,
|
||||||
|
reason: v.reason,
|
||||||
|
expected: v.expected,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
} else {
|
||||||
|
renderValidationError({
|
||||||
|
verb: "msg-status",
|
||||||
|
input: id ?? "(missing)",
|
||||||
|
result: v,
|
||||||
|
});
|
||||||
|
}
|
||||||
return EXIT.INVALID_ARGS;
|
return EXIT.INVALID_ARGS;
|
||||||
}
|
}
|
||||||
|
const lookupId = v.value.value;
|
||||||
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
const result = await client.messageStatus(id);
|
const result = await client.messageStatus(lookupId);
|
||||||
if (!result) {
|
if (!result) {
|
||||||
if (opts.json) console.log(JSON.stringify({ id, found: false }));
|
if (opts.json) {
|
||||||
else render.err(`Message ${id} not found or timed out.`);
|
console.log(
|
||||||
|
JSON.stringify({
|
||||||
|
ok: false,
|
||||||
|
error: "not_found",
|
||||||
|
id: lookupId,
|
||||||
|
isPrefix: v.value.isPrefix,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
} else {
|
||||||
|
const hint = v.value.isPrefix
|
||||||
|
? ` no message id starts with ${dim("\"" + lookupId + "\"")} in this mesh.\n try: claudemesh msg-status <full-32-char-id>`
|
||||||
|
: ` message ${dim(lookupId.slice(0, 12) + "…")} not in queue (already drained, expired, or never sent in this mesh).`;
|
||||||
|
render.err(`message not found`);
|
||||||
|
process.stderr.write(hint + "\n");
|
||||||
|
}
|
||||||
return EXIT.NOT_FOUND;
|
return EXIT.NOT_FOUND;
|
||||||
}
|
}
|
||||||
if (opts.json) {
|
if (opts.json) {
|
||||||
console.log(JSON.stringify(result, null, 2));
|
console.log(JSON.stringify(result, null, 2));
|
||||||
return EXIT.SUCCESS;
|
return EXIT.SUCCESS;
|
||||||
}
|
}
|
||||||
render.section(`message ${id.slice(0, 12)}…`);
|
render.section(`message ${lookupId.slice(0, 12)}…`);
|
||||||
render.kv([
|
render.kv([
|
||||||
["target", result.targetSpec],
|
["target", result.targetSpec],
|
||||||
["delivered", result.delivered ? "yes" : "no"],
|
["delivered", result.delivered ? "yes" : "no"],
|
||||||
|
|||||||
@@ -10,6 +10,7 @@ import { createInterface } from "node:readline";
|
|||||||
import { BrokerClient } from "~/services/broker/facade.js";
|
import { BrokerClient } from "~/services/broker/facade.js";
|
||||||
import { readConfig } from "~/services/config/facade.js";
|
import { readConfig } from "~/services/config/facade.js";
|
||||||
import type { JoinedMesh } from "~/services/config/facade.js";
|
import type { JoinedMesh } from "~/services/config/facade.js";
|
||||||
|
import { getDaemonPolicy } from "~/services/daemon/policy.js";
|
||||||
|
|
||||||
export interface ConnectOpts {
|
export interface ConnectOpts {
|
||||||
/** Mesh slug to connect to. Auto-selects if only one mesh joined. */
|
/** Mesh slug to connect to. Auto-selects if only one mesh joined. */
|
||||||
@@ -46,6 +47,18 @@ export async function withMesh<T>(
|
|||||||
opts: ConnectOpts,
|
opts: ConnectOpts,
|
||||||
fn: (client: BrokerClient, mesh: JoinedMesh) => Promise<T>,
|
fn: (client: BrokerClient, mesh: JoinedMesh) => Promise<T>,
|
||||||
): Promise<T> {
|
): Promise<T> {
|
||||||
|
// --strict gate: every cold-path verb funnels through here, so a single
|
||||||
|
// policy check covers the whole CLI surface. The daemon-routing helpers
|
||||||
|
// already returned null (auto-spawn failed); under --strict we refuse
|
||||||
|
// the cold-path fallback and exit loudly instead.
|
||||||
|
if (getDaemonPolicy().mode === "strict") {
|
||||||
|
console.error(
|
||||||
|
"\n ✘ daemon not reachable — --strict refuses cold-path fallback.\n" +
|
||||||
|
" run `claudemesh daemon up` (or `claudemesh doctor`) and retry.\n",
|
||||||
|
);
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
const config = readConfig();
|
const config = readConfig();
|
||||||
if (config.meshes.length === 0) {
|
if (config.meshes.length === 0) {
|
||||||
console.error("No meshes joined. Run `claudemesh join <url>` first.");
|
console.error("No meshes joined. Run `claudemesh join <url>` first.");
|
||||||
|
|||||||
431
apps/cli/src/commands/daemon.ts
Normal file
@@ -0,0 +1,431 @@
|
|||||||
|
import { spawn } from "node:child_process";
|
||||||
|
import { existsSync, openSync, mkdirSync } from "node:fs";
|
||||||
|
import { join } from "node:path";
|
||||||
|
|
||||||
|
import { runDaemon } from "~/daemon/run.js";
|
||||||
|
import { ipc, IpcError } from "~/daemon/ipc/client.js";
|
||||||
|
import { readRunningPid } from "~/daemon/lock.js";
|
||||||
|
import { DAEMON_PATHS } from "~/daemon/paths.js";
|
||||||
|
|
||||||
|
export interface DaemonOptions {
|
||||||
|
json?: boolean;
|
||||||
|
noTcp?: boolean;
|
||||||
|
publicHealth?: boolean;
|
||||||
|
mesh?: string;
|
||||||
|
displayName?: string;
|
||||||
|
/** 1.34.12: keep the daemon attached to the current shell instead
|
||||||
|
* of double-forking. Default behavior changed in 1.34.12 — `up`
|
||||||
|
* now detaches by default and writes JSON logs to
|
||||||
|
* ~/.claudemesh/daemon/daemon.log. Pass `--foreground` to get the
|
||||||
|
* pre-1.34.12 behavior (logs streaming to stdout, blocks the
|
||||||
|
* terminal until Ctrl-C). install-service and `claudemesh launch`'s
|
||||||
|
* auto-spawn path always pass --foreground because their parents
|
||||||
|
* (launchd / the launch helper) own the lifecycle. */
|
||||||
|
foreground?: boolean;
|
||||||
|
/** outbox-list status filter, set from boolean flags --failed/--pending/etc. */
|
||||||
|
outboxStatus?: "pending" | "inflight" | "done" | "dead" | "aborted";
|
||||||
|
/** outbox requeue: optional client_message_id to assign to the freshly minted row. */
|
||||||
|
newClientId?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function runDaemonCommand(
|
||||||
|
sub: string | undefined,
|
||||||
|
opts: DaemonOptions,
|
||||||
|
rest: string[] = [],
|
||||||
|
): Promise<number> {
|
||||||
|
switch (sub) {
|
||||||
|
case undefined:
|
||||||
|
return printDaemonUsage();
|
||||||
|
|
||||||
|
case "up":
|
||||||
|
case "start":
|
||||||
|
// 1.34.10: `--mesh` and `--name` deprecated.
|
||||||
|
// --mesh: daemon attaches to every joined mesh automatically;
|
||||||
|
// pinning at start time blocks new meshes from being picked up.
|
||||||
|
// --name: overrides the daemon-WS display name GLOBALLY across
|
||||||
|
// every mesh, but each mesh has its own per-mesh display name
|
||||||
|
// in config.json (set at `claudemesh join` time). Passing one
|
||||||
|
// name flattens that out. Sessions advertise their own
|
||||||
|
// CLAUDEMESH_DISPLAY_NAME at `claudemesh launch` time anyway,
|
||||||
|
// and the daemon-WS presence is hidden from peer lists since
|
||||||
|
// 1.32, so the daemon's display name isn't user-visible.
|
||||||
|
if (opts.mesh) {
|
||||||
|
process.stderr.write(
|
||||||
|
`[claudemesh] --mesh on \`daemon up\` is deprecated; the daemon attaches to every joined mesh automatically. ` +
|
||||||
|
`Ignoring --mesh ${opts.mesh}.\n`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
if (opts.displayName) {
|
||||||
|
process.stderr.write(
|
||||||
|
`[claudemesh] --name on \`daemon up\` is deprecated; per-mesh display names live in config.json (set at join time), ` +
|
||||||
|
`and session display names come from \`claudemesh launch --name\`. Ignoring --name ${opts.displayName}.\n`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
// 1.34.12: detach by default. The pre-1.34.12 behavior streamed
|
||||||
|
// JSON logs to the controlling terminal and blocked the shell —
|
||||||
|
// fine for debugging, surprising for users who just want the
|
||||||
|
// daemon "up." `--foreground` opts back into the old behavior;
|
||||||
|
// launchd / systemd-user units always pass it because the unit
|
||||||
|
// manager owns lifecycle and stdio redirection.
|
||||||
|
if (!opts.foreground) {
|
||||||
|
return spawnDetachedDaemon(opts);
|
||||||
|
}
|
||||||
|
return runDaemon({
|
||||||
|
tcpEnabled: !opts.noTcp,
|
||||||
|
publicHealthCheck: opts.publicHealth,
|
||||||
|
});
|
||||||
|
|
||||||
|
case "help":
|
||||||
|
case "--help":
|
||||||
|
case "-h":
|
||||||
|
return printDaemonUsage();
|
||||||
|
|
||||||
|
case "status":
|
||||||
|
return runStatus(opts);
|
||||||
|
|
||||||
|
case "version":
|
||||||
|
return runVersion(opts);
|
||||||
|
|
||||||
|
case "down":
|
||||||
|
case "stop":
|
||||||
|
return runStop(opts);
|
||||||
|
|
||||||
|
case "accept-host":
|
||||||
|
return runAcceptHost(opts);
|
||||||
|
|
||||||
|
case "outbox":
|
||||||
|
return runOutbox(rest, opts);
|
||||||
|
|
||||||
|
case "install-service":
|
||||||
|
return runInstallService(opts);
|
||||||
|
|
||||||
|
case "uninstall-service":
|
||||||
|
return runUninstallService(opts);
|
||||||
|
|
||||||
|
default:
|
||||||
|
process.stderr.write(`unknown daemon subcommand: ${sub}\n\n`);
|
||||||
|
printDaemonUsage(process.stderr);
|
||||||
|
return 2;
|
||||||
|
}
|
||||||
|
}
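The `up` / `start` branch above reduces to two decisions: report deprecated flags without failing, then detach unless `--foreground` was passed. A minimal sketch of that decision as a pure function follows. The names `planDaemonUp`, `UpOptions`, and `UpPlan` are hypothetical and do not appear in the diff, and the warning strings are shortened stand-ins for the real messages.

```typescript
// Hypothetical helper, not part of the diff: models the `up` / `start`
// branch as a pure decision so it can be tested without spawning anything.
interface UpOptions {
  mesh?: string;
  displayName?: string;
  foreground?: boolean;
}

interface UpPlan {
  mode: "detached" | "foreground";
  warnings: string[];
}

function planDaemonUp(opts: UpOptions): UpPlan {
  const warnings: string[] = [];
  // Deprecated flags are reported and then ignored, never treated as errors.
  if (opts.mesh) warnings.push(`--mesh ${opts.mesh} ignored (deprecated)`);
  if (opts.displayName) warnings.push(`--name ${opts.displayName} ignored (deprecated)`);
  // Detach by default; only an explicit --foreground keeps the terminal attached.
  return { mode: opts.foreground ? "foreground" : "detached", warnings };
}
```

Under these assumptions, `planDaemonUp({ mesh: "team" })` yields a detached plan with one warning, and `planDaemonUp({ foreground: true })` yields a foreground plan with none.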
|
||||||
|
|
||||||
|
function printDaemonUsage(stream: NodeJS.WritableStream = process.stdout): number {
|
||||||
|
stream.write(`claudemesh daemon — long-lived peer mesh runtime (v0.9.0)
|
||||||
|
|
||||||
|
USAGE
|
||||||
|
claudemesh daemon <command> [options]
|
||||||
|
|
||||||
|
COMMANDS
|
||||||
|
up | start start the daemon (detached by default)
|
||||||
|
status show running pid + IPC health
|
||||||
|
version ipc + schema version of the running daemon
|
||||||
|
down | stop stop the running daemon (SIGTERM, then wait)
|
||||||
|
accept-host pin the current host fingerprint
|
||||||
|
outbox list list local outbox rows (newest first)
|
||||||
|
outbox requeue <id> re-enqueue an aborted / dead outbox row
|
||||||
|
install-service write launchd (macOS) / systemd-user (Linux) unit
|
||||||
|
uninstall-service remove the platform service unit
|
||||||
|
|
||||||
|
OPTIONS
|
||||||
|
--foreground keep daemon attached to terminal, JSON logs to stdout (1.34.12+)
|
||||||
|
--no-tcp disable the loopback TCP listener (UDS only)
|
||||||
|
--public-health expose /v1/health unauthenticated on TCP
|
||||||
|
--json machine-readable output where supported
|
||||||
|
|
||||||
|
OUTBOX FLAGS (for 'daemon outbox list')
|
||||||
|
--pending --inflight --done --failed --aborted filter by status
|
||||||
|
|
||||||
|
OUTBOX FLAGS (for 'daemon outbox requeue')
|
||||||
|
--new-client-id <id> mint the new row with this client_message_id
|
||||||
|
|
||||||
|
See ${"https://claudemesh.com/docs"} for the full daemon spec.
|
||||||
|
`);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
interface OutboxRowResp {
|
||||||
|
id: string;
|
||||||
|
client_message_id: string;
|
||||||
|
status: string;
|
||||||
|
attempts: number;
|
||||||
|
enqueued_at: string;
|
||||||
|
next_attempt_at: string;
|
||||||
|
delivered_at: string | null;
|
||||||
|
broker_message_id: string | null;
|
||||||
|
last_error: string | null;
|
||||||
|
aborted_at: string | null;
|
||||||
|
aborted_by: string | null;
|
||||||
|
superseded_by: string | null;
|
||||||
|
payload_bytes: number;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runOutbox(rest: string[], opts: DaemonOptions): Promise<number> {
|
||||||
|
const sub = rest[0];
|
||||||
|
switch (sub) {
|
||||||
|
case undefined:
|
||||||
|
case "list": {
|
||||||
|
const status = opts.outboxStatus;
|
||||||
|
const path = `/v1/outbox${status ? `?status=${status}` : ""}`;
|
||||||
|
try {
|
||||||
|
const res = await ipc<{ items: OutboxRowResp[] }>({ path });
|
||||||
|
if (opts.json) {
|
||||||
|
process.stdout.write(JSON.stringify(res.body) + "\n");
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
if (!res.body.items?.length) {
|
||||||
|
process.stdout.write("(empty)\n");
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
for (const r of res.body.items) {
|
||||||
|
const tag = r.status.padEnd(8);
|
||||||
|
const bm = r.broker_message_id ? ` → ${r.broker_message_id}` : "";
|
||||||
|
const err = r.last_error ? ` last_error="${r.last_error.slice(0, 60)}"` : "";
|
||||||
|
process.stdout.write(`${tag} ${r.id} cid=${r.client_message_id} attempts=${r.attempts}${bm}${err}\n`);
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
} catch (err) {
|
||||||
|
process.stderr.write(`daemon unreachable: ${String(err)}\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
case "requeue": {
|
||||||
|
const id = rest[1];
|
||||||
|
if (!id) { process.stderr.write("usage: claudemesh daemon outbox requeue <id> [--new-client-id <id>]\n"); return 2; }
|
||||||
|
const newClientMessageId = opts.newClientId;
|
||||||
|
try {
|
||||||
|
const res = await ipc<{
|
||||||
|
aborted_row_id: string; new_row_id: string; new_client_message_id: string; error?: string;
|
||||||
|
}>({
|
||||||
|
method: "POST",
|
||||||
|
path: "/v1/outbox/requeue",
|
||||||
|
body: { id, new_client_message_id: newClientMessageId },
|
||||||
|
});
|
||||||
|
if (res.status === 200) {
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify(res.body) + "\n");
|
||||||
|
else process.stdout.write(
|
||||||
|
`requeued: aborted ${res.body.aborted_row_id} → new ${res.body.new_row_id} ` +
|
||||||
|
`(client_message_id=${res.body.new_client_message_id})\n`,
|
||||||
|
);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
process.stderr.write(`requeue failed (${res.status}): ${res.body.error ?? "unknown"}\n`);
|
||||||
|
return 1;
|
||||||
|
} catch (err) {
|
||||||
|
process.stderr.write(`daemon unreachable: ${String(err)}\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
default:
|
||||||
|
process.stderr.write(`unknown outbox subcommand: ${sub}\n`);
|
||||||
|
process.stderr.write(`usage: claudemesh daemon outbox [list|requeue <id>]\n`);
|
||||||
|
return 2;
|
||||||
|
}
|
||||||
|
}
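The `outbox list` branch above does two separable things: it builds the IPC path with an optional status filter, and it renders each row on one line. The sketch below isolates both as pure functions. The field names follow `OutboxRowResp` from the diff, while `outboxListPath`, `formatOutboxRow`, and the trimmed `OutboxRow` type are hypothetical. The single-space separators are an assumption, since the rendered diff may have collapsed the original whitespace.

```typescript
// Illustrative sketch: the two helpers below are hypothetical and only
// restate logic that the diff keeps inline in runOutbox.
type OutboxStatus = "pending" | "inflight" | "done" | "dead" | "aborted";

interface OutboxRow {
  id: string;
  client_message_id: string;
  status: string;
  attempts: number;
  broker_message_id: string | null;
  last_error: string | null;
}

// Build the IPC path; the status query is omitted when no filter flag was given.
function outboxListPath(status?: OutboxStatus): string {
  return `/v1/outbox${status ? `?status=${status}` : ""}`;
}

// Render one row as a single line; long errors are truncated to 60 characters.
function formatOutboxRow(r: OutboxRow): string {
  const tag = r.status.padEnd(8);
  const bm = r.broker_message_id ? ` → ${r.broker_message_id}` : "";
  const err = r.last_error ? ` last_error="${r.last_error.slice(0, 60)}"` : "";
  return `${tag} ${r.id} cid=${r.client_message_id} attempts=${r.attempts}${bm}${err}`;
}
```

Keeping the formatter free of `process.stdout` makes the truncation and padding rules easy to check in isolation.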
|
||||||
|
|
||||||
|
async function runInstallService(opts: DaemonOptions): Promise<number> {
|
||||||
|
const { installService, detectPlatform } = await import("~/daemon/service-install.js");
|
||||||
|
const platform = detectPlatform();
|
||||||
|
if (!platform) {
|
||||||
|
process.stderr.write(`unsupported platform: ${process.platform}\n`);
|
||||||
|
return 2;
|
||||||
|
}
|
||||||
|
// Resolve the binary path. Prefer the running entrypoint (argv[1]) when it's an
|
||||||
|
// installed claudemesh binary; fall back to whichever `claudemesh` is
|
||||||
|
// first on PATH.
|
||||||
|
// 1.34.10: install-service no longer bakes --mesh into the unit. The
|
||||||
|
// daemon attaches to every joined mesh by default, and pinning the
|
||||||
|
// unit to one slug at install time was the source of the "joined a
|
||||||
|
// new mesh but my service ignores it" footgun. If the user passes
|
||||||
|
// --mesh anyway, we warn + ignore.
|
||||||
|
let binary = process.argv[1] ?? "";
|
||||||
|
if (!binary || /\.ts$/.test(binary) || /node_modules|src\/entrypoints/.test(binary)) {
|
||||||
|
try {
|
||||||
|
const { execSync } = await import("node:child_process");
|
||||||
|
binary = execSync("which claudemesh", { encoding: "utf8" }).trim();
|
||||||
|
} catch {
|
||||||
|
process.stderr.write(`couldn't resolve a 'claudemesh' binary on PATH; install via npm/homebrew first\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (opts.mesh) {
|
||||||
|
process.stderr.write(
|
||||||
|
`[claudemesh] --mesh on \`daemon install-service\` is deprecated and ignored; the daemon attaches to every joined mesh.\n`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
if (opts.displayName) {
|
||||||
|
process.stderr.write(
|
||||||
|
`[claudemesh] --name on \`daemon install-service\` is deprecated and ignored; per-mesh names live in config.json, session names come from \`claudemesh launch --name\`.\n`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
const r = installService({
|
||||||
|
binaryPath: binary,
|
||||||
|
});
|
||||||
|
if (opts.json) {
|
||||||
|
process.stdout.write(JSON.stringify({ ok: true, ...r }) + "\n");
|
||||||
|
} else {
|
||||||
|
process.stdout.write(`installed ${r.platform} service unit: ${r.unitPath}\n`);
|
||||||
|
process.stdout.write(`bring it up now: ${r.bootCommand}\n`);
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
} catch (err) {
|
||||||
|
process.stderr.write(`install-service failed: ${String(err)}\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runUninstallService(opts: DaemonOptions): Promise<number> {
|
||||||
|
const { uninstallService } = await import("~/daemon/service-install.js");
|
||||||
|
const r = uninstallService();
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify(r) + "\n");
|
||||||
|
else if (r.removed.length === 0) process.stdout.write("no service unit installed\n");
|
||||||
|
else process.stdout.write(`removed: ${r.removed.join(", ")}\n`);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runAcceptHost(opts: DaemonOptions): Promise<number> {
|
||||||
|
const { acceptCurrentHost } = await import("~/daemon/identity.js");
|
||||||
|
const fp = acceptCurrentHost();
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify({ ok: true, fingerprint_prefix: fp.fingerprint.slice(0, 16) }) + "\n");
|
||||||
|
else process.stdout.write(`host fingerprint accepted: ${fp.fingerprint.slice(0, 16)}…\n`);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runStatus(opts: DaemonOptions): Promise<number> {
|
||||||
|
const pid = readRunningPid();
|
||||||
|
if (!pid) {
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify({ running: false }) + "\n");
|
||||||
|
else process.stdout.write("daemon: not running\n");
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
const res = await ipc<{ ok: boolean; pid: number }>({ path: "/v1/health" });
|
||||||
|
if (opts.json) {
|
||||||
|
process.stdout.write(JSON.stringify({ running: true, pid, health: res.body }) + "\n");
|
||||||
|
} else {
|
||||||
|
process.stdout.write(`daemon: running (pid ${pid})\n`);
|
||||||
|
process.stdout.write(`socket: ${DAEMON_PATHS.SOCK_FILE}\n`);
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
} catch (err) {
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify({ running: true, pid, ipc_error: String(err) }) + "\n");
|
||||||
|
else process.stdout.write(`daemon: pid ${pid} alive but IPC unreachable (${String(err)})\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runVersion(opts: DaemonOptions): Promise<number> {
|
||||||
|
try {
|
||||||
|
const res = await ipc<Record<string, unknown>>({ path: "/v1/version" });
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify(res.body) + "\n");
|
||||||
|
else {
|
||||||
|
const v = res.body as { daemon_version?: string; ipc_api?: string; schema_version?: number };
|
||||||
|
process.stdout.write(`daemon ${v.daemon_version ?? "unknown"} (ipc ${v.ipc_api ?? "?"}, schema ${v.schema_version ?? "?"})\n`);
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
} catch (err) {
|
||||||
|
if (err instanceof IpcError) {
|
||||||
|
process.stderr.write(`${err.message}\n`);
|
||||||
|
return err.status === 401 ? 3 : 1;
|
||||||
|
}
|
||||||
|
process.stderr.write(`daemon unreachable: ${String(err)}\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runStop(opts: DaemonOptions): Promise<number> {
|
||||||
|
const pid = readRunningPid();
|
||||||
|
if (!pid) {
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify({ stopped: false, reason: "not_running" }) + "\n");
|
||||||
|
else process.stdout.write("daemon: not running\n");
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
try {
|
||||||
|
process.kill(pid, "SIGTERM");
|
||||||
|
} catch (err) {
|
||||||
|
process.stderr.write(`failed to signal pid ${pid}: ${String(err)}\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
// Brief wait for the daemon to release its lock cleanly.
|
||||||
|
for (let i = 0; i < 50; i++) {
|
||||||
|
await new Promise<void>((r) => setTimeout(r, 100));
|
||||||
|
if (!readRunningPid()) {
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify({ stopped: true, pid }) + "\n");
|
||||||
|
else process.stdout.write(`daemon: stopped (was pid ${pid})\n`);
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
if (opts.json) process.stdout.write(JSON.stringify({ stopped: false, pid, reason: "shutdown_timeout" }) + "\n");
|
||||||
|
else process.stdout.write(`daemon: signaled but did not exit within 5s (pid ${pid})\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 1.34.12: spawn the daemon as a detached background process. Re-execs
|
||||||
|
* the same `claudemesh` binary with `daemon up --foreground` (so the
|
||||||
|
* child runs the long-lived loop), redirects stdout/stderr to
|
||||||
|
* ~/.claudemesh/daemon/daemon.log, and `unref()`s so the parent shell
|
||||||
|
* can exit cleanly.
|
||||||
|
*
|
||||||
|
* The parent waits up to ~3s for the UDS socket to appear before
|
||||||
|
* declaring success — that's the same liveness check `claudemesh launch`
|
||||||
|
* uses, and it catches the "child crashed during boot" case (config
|
||||||
|
* read failed, port bind failed, etc.) with an actionable error
|
||||||
|
* pointing at the log file rather than silent loss.
|
||||||
|
*/
|
||||||
|
async function spawnDetachedDaemon(opts: DaemonOptions): Promise<number> {
|
||||||
|
// Ensure the log directory exists before opening the FDs.
|
||||||
|
mkdirSync(DAEMON_PATHS.DAEMON_DIR, { recursive: true, mode: 0o700 });
|
||||||
|
const logPath = join(DAEMON_PATHS.DAEMON_DIR, "daemon.log");
|
||||||
|
|
||||||
|
// The CLI binary path. process.argv[1] is the entrypoint script the
|
||||||
|
// node runtime is currently executing — for an installed CLI that's
|
||||||
|
// .../bin/claudemesh, for `bun run` dev that's the local dist file.
|
||||||
|
// Either way it's the right thing to re-exec.
|
||||||
|
const binary = process.argv[1] ?? "claudemesh";
|
||||||
|
const args = ["daemon", "up", "--foreground"];
|
||||||
|
if (opts.noTcp) args.push("--no-tcp");
|
||||||
|
if (opts.publicHealth) args.push("--public-health");
|
||||||
|
|
||||||
|
const out = openSync(logPath, "a");
|
||||||
|
const err = openSync(logPath, "a");
|
||||||
|
const child = spawn(process.execPath, [binary, ...args], {
|
||||||
|
detached: true,
|
||||||
|
stdio: ["ignore", out, err],
|
||||||
|
env: process.env,
|
||||||
|
});
|
||||||
|
// `detached: true` above already gives the child its own process group, so
|
||||||
|
// closing the shell doesn't SIGHUP it; unref() lets this parent exit without waiting.
|
||||||
|
child.unref();
|
||||||
|
|
||||||
|
// Wait for the socket to appear — the daemon's IPC listener binds
|
||||||
|
// ~immediately after the broker WS handshake starts, so socket
|
||||||
|
// existence is a reliable "the daemon is alive enough to accept
|
||||||
|
// requests" signal.
|
||||||
|
const sockPath = DAEMON_PATHS.SOCK_FILE;
|
||||||
|
const startedAt = Date.now();
|
||||||
|
while (Date.now() - startedAt < 3_000) {
|
||||||
|
if (existsSync(sockPath)) {
|
||||||
|
if (opts.json) {
|
||||||
|
process.stdout.write(JSON.stringify({ ok: true, detached: true, pid: child.pid, log: logPath }) + "\n");
|
||||||
|
} else {
|
||||||
|
process.stdout.write(` ✔ daemon started (pid ${child.pid})\n`);
|
||||||
|
process.stdout.write(` → log: ${logPath}\n`);
|
||||||
|
process.stdout.write(` → stop: claudemesh daemon down\n`);
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
await new Promise<void>((r) => setTimeout(r, 100));
|
||||||
|
}
|
||||||
|
|
||||||
|
if (opts.json) {
|
||||||
|
process.stdout.write(JSON.stringify({ ok: false, detached: true, pid: child.pid, reason: "socket_not_appeared", log: logPath }) + "\n");
|
||||||
|
} else {
|
||||||
|
process.stderr.write(` ✘ daemon spawn timeout: socket did not appear within 3s\n`);
|
||||||
|
process.stderr.write(` → check log: ${logPath}\n`);
|
||||||
|
process.stderr.write(` → run foreground for live output: claudemesh daemon up --foreground\n`);
|
||||||
|
}
|
||||||
|
return 1;
|
||||||
|
}
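The re-exec in `spawnDetachedDaemon` always starts the child with `daemon up --foreground` and forwards only the listener flags. A minimal sketch of that argv assembly is shown below. `buildDetachedArgs` and `DetachOptions` are hypothetical names introduced for illustration; the flag strings are the ones used in the diff.

```typescript
// Hypothetical helper: mirrors how the parent assembles the child's argv.
interface DetachOptions {
  noTcp?: boolean;
  publicHealth?: boolean;
}

function buildDetachedArgs(entrypoint: string, opts: DetachOptions): string[] {
  // The child always runs in the foreground; the parent is what detaches.
  const args = [entrypoint, "daemon", "up", "--foreground"];
  // Listener flags are forwarded; the deprecated --mesh / --name are not.
  if (opts.noTcp) args.push("--no-tcp");
  if (opts.publicHealth) args.push("--public-health");
  return args;
}
```

In the diff, the resulting array is passed to `spawn(process.execPath, ...)`, so the first element is the entrypoint script rather than the Node executable.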
|
||||||
@@ -45,7 +45,7 @@ export async function deleteMesh(slug: string, opts: { yes?: boolean } = {}): Pr
|
|||||||
}
|
}
|
||||||
render.section("select mesh to remove");
|
render.section("select mesh to remove");
|
||||||
config.meshes.forEach((m, i) => {
|
config.meshes.forEach((m, i) => {
|
||||||
process.stdout.write(` ${bold(String(i + 1) + ")")} ${clay(m.slug)} ${dim("(" + m.name + ")")}\n`);
|
process.stdout.write(` ${bold(String(i + 1) + ")")} ${clay(m.slug)}\n`);
|
||||||
});
|
});
|
||||||
render.blank();
|
render.blank();
|
||||||
const choice = await prompt(` ${dim("choice:")} `);
|
const choice = await prompt(` ${dim("choice:")} `);
|
||||||
|
|||||||
166
apps/cli/src/commands/file.ts
Normal file
@@ -0,0 +1,166 @@
|
|||||||
|
/**
|
||||||
|
* `claudemesh file share <path>` — upload a file to the mesh.
|
||||||
|
* `claudemesh file get <id>` — download a file by id.
|
||||||
|
*
|
||||||
|
* Same-host fast path: when `--to <peer>` is provided and the target
|
||||||
|
* peer's `hostname` matches this machine's, we skip the MinIO upload
|
||||||
|
* entirely and send a DM containing the absolute path. The receiver
|
||||||
|
* reads it directly off the local filesystem. Saves bandwidth + bucket
|
||||||
|
* space for the common "two Claude sessions on the same laptop" case.
|
||||||
|
*
|
||||||
|
* Falls back to encrypted MinIO upload + grant when:
|
||||||
|
* - `--to` not provided (sharing with the whole mesh)
|
||||||
|
* - target peer is on a different host
|
||||||
|
* - `--upload` flag forces the network path
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { hostname as osHostname } from "node:os";
|
||||||
|
import { resolve as resolvePath, basename, dirname } from "node:path";
|
||||||
|
import { statSync, existsSync, writeFileSync, mkdirSync } from "node:fs";
|
||||||
|
|
||||||
|
import { withMesh } from "./connect.js";
|
||||||
|
import { render } from "~/ui/render.js";
|
||||||
|
import { bold, dim, green } from "~/ui/styles.js";
|
||||||
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
|
|
||||||
|
// Broker enforces 50 MB on /upload (apps/broker/src/index.ts ~line 1204).
|
||||||
|
// We mirror it client-side so users get a clear error before bytes go on the wire.
|
||||||
|
const MAX_FILE_BYTES = 50 * 1024 * 1024;
|
||||||
|
|
||||||
|
type Flags = {
|
||||||
|
mesh?: string;
|
||||||
|
json?: boolean;
|
||||||
|
to?: string;
|
||||||
|
tags?: string;
|
||||||
|
out?: string;
|
||||||
|
upload?: boolean; // force network upload, skip same-host fast path
|
||||||
|
message?: string; // optional note attached to the share DM
|
||||||
|
};
|
||||||
|
|
||||||
|
function emitJson(data: unknown): void {
|
||||||
|
console.log(JSON.stringify(data, null, 2));
|
||||||
|
}
|
||||||
|
|
||||||
|
function formatSize(bytes: number): string {
|
||||||
|
if (bytes < 1024) return `${bytes} B`;
|
||||||
|
if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`;
|
||||||
|
return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
|
||||||
|
}
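The comment on `MAX_FILE_BYTES` describes mirroring the broker's 50 MB limit on the client, but this file does not reference the constant anywhere else. The sketch below shows one way such a guard could look. `checkUploadSize` and `SizeCheck` are assumptions made for illustration, as is treating the limit as inclusive. `formatSize` is copied from the diff so the block stands alone.

```typescript
// Sketch only: MAX_FILE_BYTES matches the constant in the diff; checkUploadSize
// and its return shape are hypothetical.
const MAX_FILE_BYTES = 50 * 1024 * 1024;

function formatSize(bytes: number): string {
  if (bytes < 1024) return `${bytes} B`;
  if (bytes < 1024 * 1024) return `${(bytes / 1024).toFixed(1)} KB`;
  return `${(bytes / (1024 * 1024)).toFixed(1)} MB`;
}

type SizeCheck = { ok: true } | { ok: false; message: string };

// Reject oversized files before any bytes are sent. The same-host fast path
// would skip this check because it only sends a path, not the file contents.
function checkUploadSize(sizeBytes: number): SizeCheck {
  if (sizeBytes <= MAX_FILE_BYTES) return { ok: true };
  return {
    ok: false,
    message: `File is ${formatSize(sizeBytes)}; network upload is capped at ${formatSize(MAX_FILE_BYTES)}.`,
  };
}
```

A caller on the network upload path could run this check on `stat.size` and return `EXIT.INVALID_ARGS` when `ok` is false.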
|
||||||
|
|
||||||
|
export async function runFileShare(filePath: string, opts: Flags): Promise<number> {
|
||||||
|
if (!filePath) {
|
||||||
|
render.err("Usage: claudemesh file share <path> [--to <peer>] [--tags a,b] [--message \"...\"] [--upload]");
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
const absPath = resolvePath(filePath);
|
||||||
|
if (!existsSync(absPath)) {
|
||||||
|
render.err(`File not found: ${absPath}`);
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
const stat = statSync(absPath);
|
||||||
|
if (!stat.isFile()) {
|
||||||
|
render.err(`Not a regular file: ${absPath}`);
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
// Network upload has a 50 MB cap (broker-enforced). The same-host fast
|
||||||
|
// path doesn't transfer bytes — it sends a filepath — so it has no cap.
|
||||||
|
|
||||||
|
const tags = opts.tags ? opts.tags.split(",").map((t) => t.trim()).filter(Boolean) : [];
|
||||||
|
|
||||||
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client, mesh) => {
|
||||||
|
// ── Same-host fast path ─────────────────────────────────────────────
|
||||||
|
// If --to points at a peer running on this same machine, just DM the
|
||||||
|
// absolute path. No upload, no MinIO, no presigned URLs.
|
||||||
|
if (opts.to && !opts.upload) {
|
||||||
|
const peers = await client.listPeers();
|
||||||
|
const myHost = osHostname();
|
||||||
|
const target = peers.find((p) => {
|
||||||
|
if (!p.hostname || p.hostname !== myHost) return false;
|
||||||
|
return (
|
||||||
|
p.displayName === opts.to ||
|
||||||
|
(p as { memberPubkey?: string }).memberPubkey === opts.to ||
|
||||||
|
p.pubkey === opts.to ||
|
||||||
|
(typeof opts.to === "string" && opts.to.length >= 8 && p.pubkey.startsWith(opts.to))
|
||||||
|
);
|
||||||
|
});
|
||||||
|
|
||||||
|
if (target) {
|
||||||
|
const note = opts.message ? `\n${opts.message}` : "";
|
||||||
|
const body = `📎 file://${absPath} (${formatSize(stat.size)} · same host, no upload)${note}`;
|
||||||
|
// Route by session pubkey, not displayName — sibling sessions of
|
||||||
|
// the same member share the displayName (and the v0.5.1 self-DM
|
||||||
|
// guard would otherwise reject sends targeting our own member).
|
||||||
|
const result = await client.send(target.pubkey, body, "next");
|
||||||
|
if (!result.ok) {
|
||||||
|
render.err(`Send failed: ${result.error ?? "unknown"}`);
|
||||||
|
return EXIT.NETWORK_ERROR;
|
||||||
|
}
|
||||||
|
if (opts.json) {
|
||||||
|
emitJson({ mode: "local", path: absPath, to: target.displayName, hostname: myHost, sizeBytes: stat.size });
|
||||||
|
} else {
|
||||||
|
render.ok(`shared ${bold(basename(absPath))} ${dim(`(${formatSize(stat.size)})`)} → ${green(target.displayName)} ${dim("[same host, no upload]")}`);
|
||||||
|
}
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
|
// No same-host match — fall through to upload path.
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Network upload path ─────────────────────────────────────────────
|
||||||
|
const fileId = await client.uploadFile(absPath, mesh.meshId, mesh.memberId, {
|
||||||
|
name: basename(absPath),
|
||||||
|
tags,
|
||||||
|
persistent: true,
|
||||||
|
targetSpec: opts.to,
|
||||||
|
});
|
||||||
|
|
||||||
|
// If --to was set, drop a DM so the recipient is notified + has the id.
|
||||||
|
if (opts.to) {
|
||||||
|
const note = opts.message ? `\n${opts.message}` : "";
|
||||||
|
const body = `📎 ${basename(absPath)} (${formatSize(stat.size)})\nclaudemesh file get ${fileId}${note}`;
|
||||||
|
await client.send(opts.to, body, "next");
|
||||||
|
}
|
||||||
|
|
||||||
|
if (opts.json) {
|
||||||
|
emitJson({ mode: "upload", fileId, name: basename(absPath), sizeBytes: stat.size, to: opts.to ?? null });
|
||||||
|
} else {
|
||||||
|
render.ok(`uploaded ${bold(basename(absPath))} ${dim(`(${formatSize(stat.size)})`)} ${dim("· id=" + fileId.slice(0, 12))}`);
|
||||||
|
if (opts.to) render.info(dim(` notified ${opts.to}`));
|
||||||
|
else render.info(dim(` retrieve: claudemesh file get ${fileId}`));
|
||||||
|
}
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
});
|
||||||
|
}
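The same-host fast path in `runFileShare` depends on one lookup: find a peer on this machine whose display name, member pubkey, session pubkey, or pubkey prefix of at least 8 characters matches `--to`. A minimal standalone sketch of that lookup follows. `findSameHostPeer` and `PeerInfo` are hypothetical names, and the peer shape is reduced to the fields the diff reads.

```typescript
// Hypothetical standalone version of the peer lookup in runFileShare.
interface PeerInfo {
  pubkey: string;
  displayName: string;
  hostname?: string;
  memberPubkey?: string;
}

function findSameHostPeer(peers: PeerInfo[], to: string, myHost: string): PeerInfo | undefined {
  return peers.find((p) => {
    // Peers without a hostname, or on another host, never take the fast path.
    if (!p.hostname || p.hostname !== myHost) return false;
    return (
      p.displayName === to ||
      p.memberPubkey === to ||
      p.pubkey === to ||
      // Prefix matching needs at least 8 characters to limit accidental hits.
      (to.length >= 8 && p.pubkey.startsWith(to))
    );
  });
}
```

When the lookup returns `undefined`, the diff falls through to the network upload path, so a miss costs only the `listPeers` call.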
|
||||||
|
|
||||||
|
export async function runFileGet(fileId: string, opts: Flags): Promise<number> {
|
||||||
|
if (!fileId) {
|
||||||
|
render.err("Usage: claudemesh file get <file-id> [--out <path>]");
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
|
const meta = await client.getFile(fileId);
|
||||||
|
if (!meta) {
|
||||||
|
render.err(`File not found or not accessible: ${fileId}`);
|
||||||
|
return EXIT.NOT_FOUND;
|
||||||
|
}
|
||||||
|
|
||||||
|
const res = await fetch(meta.url, { signal: AbortSignal.timeout(60_000) });
|
||||||
|
if (!res.ok) {
|
||||||
|
render.err(`Download failed: HTTP ${res.status}`);
|
||||||
|
return EXIT.NETWORK_ERROR;
|
||||||
|
}
|
||||||
|
const buf = Buffer.from(await res.arrayBuffer());
|
||||||
|
|
||||||
|
const outPath = opts.out
|
||||||
|
? resolvePath(opts.out)
|
||||||
|
: resolvePath(process.cwd(), meta.name);
|
||||||
|
mkdirSync(dirname(outPath), { recursive: true });
|
||||||
|
writeFileSync(outPath, buf);
|
||||||
|
|
||||||
|
if (opts.json) {
|
||||||
|
emitJson({ fileId, name: meta.name, savedTo: outPath, sizeBytes: buf.length });
|
||||||
|
} else {
|
||||||
|
render.ok(`saved ${bold(meta.name)} ${dim(`(${formatSize(buf.length)})`)} → ${dim(outPath)}`);
|
||||||
|
}
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
});
|
||||||
|
}
|
||||||
91
apps/cli/src/commands/inbox-actions.ts
Normal file
@@ -0,0 +1,91 @@
/**
 * `claudemesh inbox flush` and `claudemesh inbox delete <id>` —
 * mutate the daemon's persistent inbox store
 * (`~/.claudemesh/daemon/inbox.db`) over IPC.
 *
 * 1.34.7: until this version, the only way to clean the inbox was a
 * raw `sqlite3 inbox.db "DELETE FROM inbox"` against the daemon's
 * private DB. That works but bypasses the IPC layer (and any future
 * lifecycle hooks on row removal), and is invisible to a user who
 * doesn't know the schema. These two verbs make the operation visible
 * + safe + scriptable.
 */

import {
  tryFlushInboxViaDaemon,
  tryDeleteInboxRowViaDaemon,
} from "~/services/bridge/daemon-route.js";
import { render } from "~/ui/render.js";
import { dim } from "~/ui/styles.js";

export interface InboxFlushFlags {
  mesh?: string;
  /** ISO-8601 timestamp; deletes rows received_at < before. */
  before?: string;
  /** Required when neither --mesh nor --before is set, to prevent an
   * accidental "delete every row on every mesh". */
  all?: boolean;
  json?: boolean;
}

export async function runInboxFlush(flags: InboxFlushFlags): Promise<void> {
  const hasFilter = !!(flags.mesh || flags.before);
  if (!hasFilter && !flags.all) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "missing_filter" }) + "\n"); return; }
    render.info(dim(
      "Refusing to flush every row on every mesh.\n" +
      " Re-run with --mesh <slug>, --before <iso-timestamp>, or --all to confirm.",
    ));
    process.exit(1);
  }

  const removed = await tryFlushInboxViaDaemon({
    ...(flags.mesh ? { mesh: flags.mesh } : {}),
    ...(flags.before ? { beforeIso: flags.before } : {}),
  });

  if (removed === null) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "daemon_unreachable" }) + "\n"); return; }
    render.info(dim("Daemon not reachable. Run `claudemesh daemon up` and retry."));
    process.exit(1);
  }

  if (flags.json) {
    process.stdout.write(JSON.stringify({ ok: true, removed }) + "\n");
    return;
  }
  const scope = flags.mesh
    ? `mesh "${flags.mesh}"`
    : flags.before
      ? `older than ${flags.before}`
      : "all meshes";
  render.info(`✔ Flushed ${removed} message${removed === 1 ? "" : "s"} from ${scope}.`);
}

export interface InboxDeleteFlags {
  json?: boolean;
}

export async function runInboxDelete(id: string, flags: InboxDeleteFlags): Promise<void> {
  if (!id) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "missing_id" }) + "\n"); return; }
    render.info(dim("Usage: claudemesh inbox delete <message-id>"));
    process.exit(1);
  }
  const ok = await tryDeleteInboxRowViaDaemon(id);
  if (ok === null) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "daemon_unreachable" }) + "\n"); return; }
    render.info(dim("Daemon not reachable. Run `claudemesh daemon up` and retry."));
    process.exit(1);
  }
  if (!ok) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "not_found", id }) + "\n"); return; }
    render.info(dim(`No inbox row with id "${id}".`));
    process.exit(1);
  }
  if (flags.json) {
    process.stdout.write(JSON.stringify({ ok: true, id }) + "\n");
    return;
  }
  render.info(`✔ Deleted inbox row ${id}.`);
}
|
||||||
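Review note on `runInboxFlush`: the guard at the top of the function can be read as a small pure decision, where a flush needs either a filter or an explicit `--all`. The sketch below is only an illustration under assumptions: `planFlush`, `FlushFlags`, and `FlushPlan` are hypothetical names that do not exist in the codebase, and the scope text lists both filters when `--mesh` and `--before` are combined, which is a variation on the message built in the diff.

```typescript
// Hypothetical helper, not part of claudemesh: models the flush guard as data.
interface FlushFlags {
  mesh?: string;
  before?: string; // ISO-8601 timestamp
  all?: boolean;
}

type FlushPlan =
  | { ok: false; error: "missing_filter" }
  | { ok: true; filter: { mesh?: string; beforeIso?: string }; scope: string };

function planFlush(flags: FlushFlags): FlushPlan {
  const hasFilter = Boolean(flags.mesh || flags.before);
  // Without a filter, only an explicit --all may delete every row.
  if (!hasFilter && !flags.all) return { ok: false, error: "missing_filter" };

  // Describe every active filter, not just the first one.
  const parts: string[] = [];
  if (flags.mesh) parts.push(`mesh "${flags.mesh}"`);
  if (flags.before) parts.push(`older than ${flags.before}`);

  return {
    ok: true,
    filter: {
      ...(flags.mesh ? { mesh: flags.mesh } : {}),
      ...(flags.before ? { beforeIso: flags.before } : {}),
    },
    scope: parts.length > 0 ? parts.join(", ") : "all meshes",
  };
}
```

Keeping the decision separate from `process.exit` and rendering would make the refusal path easy to unit-test; whether that split is worth it here is a judgment call for the author.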
@@ -1,49 +1,101 @@
|
|||||||
/**
|
/**
|
||||||
* `claudemesh inbox` — read pending peer messages.
|
* `claudemesh inbox` — read pending peer messages from the daemon's
|
||||||
|
* persisted inbox (`~/.claudemesh/daemon/inbox.db`).
|
||||||
*
|
*
|
||||||
* Connects, waits briefly for push delivery, drains the buffer, prints.
|
* 1.34.0: switched from the legacy cold-path "open fresh broker WS,
|
||||||
* Works best when message-mode is "inbox" or "off" (messages held at broker).
|
* drain in-memory buffer" flow to a daemon IPC read against `/v1/inbox`.
|
||||||
|
* The cold path was structurally broken — the persistent inbox lives in
|
||||||
|
* the daemon, and pushes land on its session-WS, not on a freshly-opened
|
||||||
|
* standalone WS. The daemon-route `tryListInboxViaDaemon` returns rows
|
||||||
|
* persisted across daemon restarts and surfaces them with the correct
|
||||||
|
* mesh scoping (server-side mesh filter added in 1.34.0).
|
||||||
|
*
|
||||||
|
* Cold-path fallback removed: when the daemon isn't reachable, the
|
||||||
|
* prior implementation returned an empty list anyway (no broker state
|
||||||
|
* = no buffered pushes), so removing that path doesn't lose any
|
||||||
|
* functionality. Strict mode emits a clear error via daemon-route.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
import { withMesh } from "./connect.js";
|
import { tryListInboxViaDaemon } from "~/services/bridge/daemon-route.js";
|
||||||
import type { InboundPush } from "~/services/broker/facade.js";
|
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { bold, dim } from "~/ui/styles.js";
|
import { bold, dim } from "~/ui/styles.js";
|
||||||
|
|
||||||
export interface InboxFlags {
|
export interface InboxFlags {
|
||||||
mesh?: string;
|
mesh?: string;
|
||||||
json?: boolean;
|
json?: boolean;
|
||||||
wait?: number;
|
/** Cap the number of rows returned by the daemon. Default 100. */
|
||||||
|
limit?: number;
|
||||||
|
/** 1.34.8: only show rows whose seen_at is NULL (i.e., never
|
||||||
|
* surfaced via an interactive listing or live channel reminder).
|
||||||
|
* When omitted, every row is returned and an interactive listing
|
||||||
|
* stamps them seen as a side effect. */
|
||||||
|
unread?: boolean;
|
||||||
}
|
}
|
||||||
|
|
||||||
function formatMessage(msg: InboundPush): string {
|
interface FormattedItem {
|
||||||
const text = msg.plaintext ?? `[encrypted: ${msg.ciphertext.slice(0, 32)}…]`;
|
sender_pubkey: string;
|
||||||
const from = msg.senderPubkey.slice(0, 8);
|
sender_name: string;
|
||||||
const time = new Date(msg.createdAt).toLocaleTimeString();
|
body: string | null;
|
||||||
const kindTag = msg.kind === "direct" ? "→ direct" : msg.kind;
|
topic: string | null;
|
||||||
return ` ${bold(from)} ${dim(`[${kindTag}] ${time}`)}\n ${text}`;
|
received_at: string;
|
||||||
|
mesh: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
function formatMessage(msg: FormattedItem, includeMesh: boolean): string {
|
||||||
|
const text = msg.body ?? "[encrypted]";
|
||||||
|
const from = msg.sender_name && msg.sender_name !== msg.sender_pubkey.slice(0, 8)
|
||||||
|
? `${msg.sender_name} (${msg.sender_pubkey.slice(0, 8)})`
|
||||||
|
: msg.sender_pubkey.slice(0, 8);
|
||||||
|
const time = new Date(msg.received_at).toLocaleTimeString();
|
||||||
|
const topicTag = msg.topic ? ` (#${msg.topic})` : "";
|
||||||
|
const meshTag = includeMesh ? ` [${msg.mesh}]` : "";
|
||||||
|
return ` ${bold(from)} ${dim(`${meshTag}${topicTag} ${time}`)}\n ${text}`;
|
||||||
}
|
}
|
||||||
|
|
||||||
export async function runInbox(flags: InboxFlags): Promise<void> {
|
export async function runInbox(flags: InboxFlags): Promise<void> {
|
||||||
const waitMs = (flags.wait ?? 1) * 1000;
|
// Mesh resolution is owned by the daemon (it knows which meshes are
|
||||||
|
// attached) — the CLI just forwards the user's --mesh flag through.
|
||||||
|
// When omitted, the daemon's `/v1/inbox` honors the session-default
|
||||||
|
// mesh on auth-token requests; out-of-session callers see rows from
|
||||||
|
// every attached mesh. We don't pre-validate the mesh slug here so
|
||||||
|
// the command works even from a launch tmpdir whose local
|
||||||
|
// `config.json` only knows about the launch's mesh.
|
||||||
|
const meshSlug = flags.mesh;
|
||||||
|
|
||||||
await withMesh({ meshSlug: flags.mesh ?? null }, async (client, mesh) => {
|
const items = await tryListInboxViaDaemon(meshSlug, flags.limit ?? 100, {
|
||||||
await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
|
unreadOnly: flags.unread === true,
|
||||||
const messages = client.drainPushBuffer();
|
// CLI is the canonical "I'm reading my inbox" path — let the daemon
|
||||||
|
// auto-stamp seen_at on the rows we just rendered. The MCP welcome
|
||||||
if (flags.json) {
|
// path passes mark_seen=false instead and stamps explicitly after
|
||||||
process.stdout.write(JSON.stringify(messages, null, 2) + "\n");
|
// the channel notification succeeds.
|
||||||
return;
|
markSeen: true,
|
||||||
}
|
|
||||||
|
|
||||||
if (messages.length === 0) {
|
|
||||||
render.info(dim(`No messages on mesh "${mesh.slug}".`));
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
|
|
||||||
render.section(`inbox — ${mesh.slug} (${messages.length} message${messages.length === 1 ? "" : "s"})`);
|
|
||||||
for (const msg of messages) {
|
|
||||||
process.stdout.write(formatMessage(msg) + "\n\n");
|
|
||||||
}
|
|
||||||
});
|
});
|
||||||
|
if (items === null) {
|
||||||
|
if (flags.json) { process.stdout.write("[]\n"); return; }
|
||||||
|
render.info(dim("Daemon not reachable. Run `claudemesh daemon up` and retry."));
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (flags.json) {
|
||||||
|
process.stdout.write(JSON.stringify(items, null, 2) + "\n");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (items.length === 0) {
|
||||||
|
const scope = meshSlug ? `mesh "${meshSlug}"` : "any mesh";
|
||||||
|
const filter = flags.unread ? "unread " : "";
|
||||||
|
render.info(dim(`No ${filter}messages on ${scope}.`));
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
const filterTag = flags.unread ? " unread" : "";
|
||||||
|
const heading = meshSlug
|
||||||
|
? `inbox — ${meshSlug} (${items.length}${filterTag} message${items.length === 1 ? "" : "s"})`
|
||||||
|
: `inbox (${items.length}${filterTag} message${items.length === 1 ? "" : "s"})`;
|
||||||
|
render.section(heading);
|
||||||
|
// When the user didn't filter by mesh, surface the mesh slug per row
|
||||||
|
// so they can tell apart rows from different meshes at a glance.
|
||||||
|
for (const msg of items) {
|
||||||
|
process.stdout.write(formatMessage(msg, !meshSlug) + "\n\n");
|
||||||
|
}
|
||||||
}
|
}
|
||||||
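Review note on the `unread` / `markSeen` options in `runInbox`: the comments describe rows with a NULL `seen_at` as unread and say an interactive listing stamps them seen as a side effect. The sketch below models that behaviour with an in-memory array. Everything in it is an assumption made for illustration: `InboxRow`, `ListOptions`, and `listInbox` are hypothetical names, the real store is the daemon's SQLite database, and returning the pre-stamp snapshot is a guess about the daemon's behaviour, not something the diff confirms.

```typescript
// Hypothetical in-memory model of the seen_at semantics; not the daemon's code.
interface InboxRow {
  id: string;
  mesh: string;
  seen_at: string | null; // null means never surfaced to the user
}

interface ListOptions {
  mesh?: string;
  unreadOnly?: boolean;
  markSeen?: boolean;
  limit?: number;
}

function listInbox(
  store: InboxRow[],
  opts: ListOptions = {},
  now: () => string = () => new Date().toISOString(),
): InboxRow[] {
  const selected = store
    .filter((r) => (opts.mesh ? r.mesh === opts.mesh : true))
    .filter((r) => (opts.unreadOnly ? r.seen_at === null : true))
    .slice(0, opts.limit ?? 100);

  // Copy first so the caller sees the rows as they were before stamping.
  const snapshot = selected.map((r) => ({ ...r }));

  if (opts.markSeen) {
    const stamp = now();
    for (const r of selected) {
      if (r.seen_at === null) r.seen_at = stamp;
    }
  }
  return snapshot;
}
```

Under this model, two consecutive `--unread` listings with `markSeen: true` return the rows once and then nothing, which matches the "never surfaced" wording in the flag's doc comment.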
|
|||||||
@@ -29,4 +29,5 @@ export { runSeedTestMesh } from "./seed-test-mesh.js";
|
|||||||
export { runNotificationList } from "./notification.js";
|
export { runNotificationList } from "./notification.js";
|
||||||
export { runMemberList } from "./member.js";
|
export { runMemberList } from "./member.js";
|
||||||
export { runTopicTail } from "./topic-tail.js";
|
export { runTopicTail } from "./topic-tail.js";
|
||||||
|
export { runTopicPost } from "./topic-post.js";
|
||||||
export { withMesh } from "./connect.js";
|
export { withMesh } from "./connect.js";
|
||||||
|
|||||||
@@ -434,9 +434,10 @@ function installStatusLine(): { installed: boolean } {
|
|||||||
return { installed: true };
|
return { installed: true };
|
||||||
}
|
}
|
||||||
|
|
||||||
export function runInstall(args: string[] = []): void {
|
export async function runInstall(args: string[] = []): Promise<void> {
|
||||||
const skipHooks = args.includes("--no-hooks");
|
const skipHooks = args.includes("--no-hooks");
|
||||||
const skipSkill = args.includes("--no-skill");
|
const skipSkill = args.includes("--no-skill");
|
||||||
|
const skipService = args.includes("--no-service");
|
||||||
const wantStatusLine = args.includes("--status-line");
|
const wantStatusLine = args.includes("--status-line");
|
||||||
render.section("claudemesh install");
|
render.section("claudemesh install");
|
||||||
|
|
||||||
@@ -549,6 +550,30 @@ export function runInstall(args: string[] = []): void {
|
|||||||
hasMeshes = meshConfig.meshes.length > 0;
|
hasMeshes = meshConfig.meshes.length > 0;
|
||||||
} catch {}
|
} catch {}
|
||||||
|
|
||||||
|
// Daemon service install — required for MCP integration as of 1.24.0.
|
||||||
|
// The daemon owns the broker WS and feeds the MCP push-pipe via SSE;
|
||||||
|
// skipping it leaves channel push, slash commands, and resources broken.
|
||||||
|
// 1.30.2: install no longer locks the unit to a single mesh; the
|
||||||
|
// daemon attaches to every joined mesh on boot (1.26.0 multi-mesh
|
||||||
|
// design). Users who want single-mesh can pass `claudemesh daemon
|
||||||
|
// install-service --mesh <slug>` explicitly.
|
||||||
|
if (!skipService && hasMeshes) {
|
||||||
|
try {
|
||||||
|
await installDaemonService(entry);
|
||||||
|
} catch (e) {
|
||||||
|
render.warn(
|
||||||
|
`daemon service install failed: ${e instanceof Error ? e.message : String(e)}`,
|
||||||
|
"Run `claudemesh daemon install-service` to retry.",
|
||||||
|
);
|
||||||
|
}
|
||||||
|
} else if (skipService) {
|
||||||
|
render.info(dim("· Daemon service skipped (--no-service)"));
|
||||||
|
render.info(dim(" MCP integration will fail at boot until you start the daemon manually:"));
|
||||||
|
render.info(dim(" claudemesh daemon up --mesh <slug>"));
|
||||||
|
} else if (!hasMeshes) {
|
||||||
|
render.info(dim("· Daemon service deferred — join a mesh first, then run install again."));
|
||||||
|
}
|
||||||
|
|
||||||
render.blank();
|
render.blank();
|
||||||
render.warn(`${bold("RESTART CLAUDE CODE")} ${yellow("for MCP tools to appear.")}`);
|
render.warn(`${bold("RESTART CLAUDE CODE")} ${yellow("for MCP tools to appear.")}`);
|
||||||
|
|
||||||
@@ -569,6 +594,112 @@ export function runInstall(args: string[] = []): void {
|
|||||||
render.info(dim(` claudemesh completions zsh # shell completions`));
|
render.info(dim(` claudemesh completions zsh # shell completions`));
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Install + start the per-user daemon service (attaches to every joined mesh).
|
||||||
|
*
|
||||||
|
* Refuses on CI hosts (the service-install module guards this); falls
|
||||||
|
* back to a friendly message and lets the install otherwise succeed.
|
||||||
|
* The MCP push-pipe will fail loudly if the daemon isn't reachable, so
|
||||||
|
* the user knows there's a problem before it shows up as "no messages
|
||||||
|
* arriving."
|
||||||
|
*/
|
||||||
|
async function installDaemonService(binaryEntry: string): Promise<void> {
|
||||||
|
const {
|
||||||
|
installService,
|
||||||
|
detectPlatform,
|
||||||
|
} = require("~/daemon/service-install.js") as typeof import("../daemon/service-install.js");
|
||||||
|
|
||||||
|
const platform = detectPlatform();
|
||||||
|
if (!platform) {
|
||||||
|
render.info(dim(`· Daemon service skipped — unsupported platform: ${process.platform}`));
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Resolve the binary the service unit should launch. When invoked from a
|
||||||
|
// bundled binary, argv[1] is correct. When invoked under tsx / dev, fall
|
||||||
|
// back to whatever `claudemesh` resolves to on PATH so the unit launches
|
||||||
|
// a shipped binary, not a dev script.
|
||||||
|
let binary = process.argv[1] ?? binaryEntry;
|
||||||
|
if (!binary || /\.ts$/.test(binary) || /node_modules|src\/entrypoints/.test(binary)) {
|
||||||
|
try {
|
||||||
|
const { execSync } = require("node:child_process") as typeof import("node:child_process");
|
||||||
|
binary = execSync("which claudemesh", { encoding: "utf8" }).trim();
|
||||||
|
} catch {
|
||||||
|
render.warn(
|
||||||
|
"couldn't resolve a 'claudemesh' binary on PATH; daemon service skipped",
|
||||||
|
"Install via npm/homebrew, then run `claudemesh daemon install-service`",
|
||||||
|
);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const r = installService({ binaryPath: binary });
|
||||||
|
render.ok(`daemon service installed (${r.platform})`);
|
||||||
|
render.kv([
|
||||||
|
["unit", dim(r.unitPath)],
|
||||||
|
["mesh", dim("(all joined meshes)")],
|
||||||
|
]);
|
||||||
|
|
||||||
|
// Boot the unit immediately so MCP has a daemon to attach to on next
|
||||||
|
// Claude Code launch. Best-effort: if launchctl/systemctl errors out we
|
||||||
|
// log and continue — the user can run the boot command manually.
|
||||||
|
try {
|
||||||
|
const { execSync } = require("node:child_process") as typeof import("node:child_process");
|
||||||
|
execSync(r.bootCommand, { stdio: "ignore" });
|
||||||
|
render.ok("daemon started");
|
||||||
|
} catch (e) {
|
||||||
|
render.warn(
|
||||||
|
`daemon service installed but failed to start: ${e instanceof Error ? e.message : String(e)}`,
|
||||||
|
`Run manually: ${r.bootCommand}`,
|
||||||
|
);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 1.31.0 — post-flight: verify the daemon actually establishes a
|
||||||
|
// broker WebSocket. Boots that fail silently here (DNS, expired TLS,
|
||||||
|
// outbound :443 blocked, broker outage) used to surface only when
|
||||||
|
// the user's first `peer list` or `send` failed half an hour later.
|
||||||
|
// Polling /v1/health gives a clear, install-time signal.
|
||||||
|
await verifyBrokerConnectivity();
|
||||||
|
}
|
||||||
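Review note on the binary resolution in `installDaemonService`: the condition that decides whether to fall back to `which claudemesh` can be isolated as a predicate. This is a minimal sketch, assuming a hypothetical helper name `looksLikeDevEntry`; the two regular expressions are copied from the diff.

```typescript
// Hypothetical helper mirroring the checks in installDaemonService.
// Returns true when the entry path should NOT be written into a service unit.
function looksLikeDevEntry(entry: string | undefined): boolean {
  if (!entry) return true; // nothing usable: resolve from PATH instead
  return /\.ts$/.test(entry) || /node_modules|src\/entrypoints/.test(entry);
}
```

Two properties follow directly from the patterns: any path containing `node_modules` is treated as a dev entry, and the `src/entrypoints` check only matches forward slashes.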
|
|
||||||
|
async function verifyBrokerConnectivity(): Promise<void> {
|
||||||
|
const VERIFY_BUDGET_MS = 15_000;
|
||||||
|
const POLL_INTERVAL_MS = 500;
|
||||||
|
const { ipc } = await import("~/daemon/ipc/client.js");
|
||||||
|
const start = Date.now();
|
||||||
|
let lastBrokers: Record<string, string> = {};
|
||||||
|
|
||||||
|
while (Date.now() - start < VERIFY_BUDGET_MS) {
|
||||||
|
try {
|
||||||
|
const res = await ipc<{ ok: boolean; brokers?: Record<string, string> }>({
|
||||||
|
path: "/v1/health",
|
||||||
|
timeoutMs: 2_000,
|
||||||
|
});
|
||||||
|
lastBrokers = res.body?.brokers ?? {};
|
||||||
|
const openMesh = Object.entries(lastBrokers).find(([, s]) => s === "open");
|
||||||
|
if (openMesh) {
|
||||||
|
const others = Object.entries(lastBrokers).filter(([slug]) => slug !== openMesh[0]);
|
||||||
|
const tail = others.length > 0 ? `, ${others.length} other mesh${others.length === 1 ? "" : "es"} attaching` : "";
|
||||||
|
render.ok(`broker connected (mesh=${openMesh[0]}${tail})`);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
} catch { /* daemon may still be starting up; keep polling */ }
|
||||||
|
await new Promise((r) => setTimeout(r, POLL_INTERVAL_MS));
|
||||||
|
}
|
||||||
|
|
||||||
|
// Timed out without a single broker reaching `open`. Surface what we
|
||||||
|
// saw last so the user can act — this is exactly the bug class we
|
||||||
|
// want to catch at install time, not at first send.
|
||||||
|
const states = Object.keys(lastBrokers).length === 0
|
||||||
|
? "no health response from daemon"
|
||||||
|
: Object.entries(lastBrokers).map(([m, s]) => `${m}=${s}`).join(", ");
|
||||||
|
render.warn(
|
||||||
|
`broker did not reach open within ${Math.round(VERIFY_BUDGET_MS / 1000)}s (${states})`,
|
||||||
|
"Check ~/.claudemesh/daemon/daemon.log for connect errors. Common causes: outbound :443 blocked, expired TLS, DNS resolution.",
|
||||||
|
);
|
||||||
|
}
|
||||||
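Review note on `verifyBrokerConnectivity`: each poll iteration makes one decision from the `brokers` map returned by `/v1/health`, and the timeout branch summarises the last map seen. The sketch below pulls that decision into a pure function. The names `judgeHealth`, `BrokerStates`, and `HealthVerdict` are hypothetical, and any state string other than `"open"` (for example `"connecting"`) is an assumed value used only for illustration.

```typescript
// Hypothetical helper: the per-poll decision made inside verifyBrokerConnectivity.
type BrokerStates = Record<string, string>;

type HealthVerdict =
  | { connected: true; mesh: string; others: number }
  | { connected: false; summary: string };

function judgeHealth(brokers: BrokerStates): HealthVerdict {
  const entries = Object.entries(brokers);
  // One broker in the "open" state is enough to call the install healthy.
  const open = entries.find(([, state]) => state === "open");
  if (open) {
    return { connected: true, mesh: open[0], others: entries.length - 1 };
  }
  // Nothing open yet: build the text used in the timeout warning.
  const summary = entries.length === 0
    ? "no health response from daemon"
    : entries.map(([mesh, state]) => `${mesh}=${state}`).join(", ");
  return { connected: false, summary };
}
```

With the decision isolated, the polling loop only has to call the IPC endpoint, pass the body through this function, and sleep between attempts until the budget is spent.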
|
|
||||||
export function runUninstall(): void {
|
export function runUninstall(): void {
|
||||||
render.section("claudemesh uninstall");
|
render.section("claudemesh uninstall");
|
||||||
|
|
||||||
|
|||||||
@@ -39,7 +39,7 @@ export async function invite(
|
|||||||
// Show picker
|
// Show picker
|
||||||
console.log("\n Select mesh to share:\n");
|
console.log("\n Select mesh to share:\n");
|
||||||
config.meshes.forEach((m, i) => {
|
config.meshes.forEach((m, i) => {
|
||||||
console.log(` ${bold(String(i + 1) + ")")} ${m.slug} ${dim("(" + m.name + ")")}`);
|
console.log(` ${bold(String(i + 1) + ")")} ${m.slug}`);
|
||||||
});
|
});
|
||||||
console.log("");
|
console.log("");
|
||||||
const choice = await prompt(" Choice [1]: ") || "1";
|
const choice = await prompt(" Choice [1]: ") || "1";
|
||||||
|
|||||||
@@ -76,12 +76,32 @@ export async function runKick(
|
|||||||
if ("error" in built) { render.err(String(built.error)); return EXIT.INVALID_ARGS; }
|
if ("error" in built) { render.err(String(built.error)); return EXIT.INVALID_ARGS; }
|
||||||
|
|
||||||
return await withMesh({ meshSlug }, async (client) => {
|
return await withMesh({ meshSlug }, async (client) => {
|
||||||
const result = await client.sendAndWait(built as Record<string, unknown>) as { affected?: string[]; kicked?: string[] };
|
const result = await client.sendAndWait(built as Record<string, unknown>) as {
|
||||||
|
affected?: string[];
|
||||||
|
kicked?: string[];
|
||||||
|
// 1.34.15: broker refuses to kick control-plane WSes (they'd
|
||||||
|
// just auto-reconnect). Older brokers don't emit this field.
|
||||||
|
skipped_control_plane?: string[];
|
||||||
|
};
|
||||||
const peers = result?.affected ?? result?.kicked ?? [];
|
const peers = result?.affected ?? result?.kicked ?? [];
|
||||||
if (peers.length === 0) render.info("No peers matched.");
|
const skipped = result?.skipped_control_plane ?? [];
|
||||||
else {
|
|
||||||
|
if (peers.length === 0 && skipped.length === 0) {
|
||||||
|
render.info("No peers matched.");
|
||||||
|
} else if (peers.length === 0 && skipped.length > 0) {
|
||||||
|
render.warn(
|
||||||
|
`${skipped.length} match(es) refused: ${skipped.join(", ")} — control-plane connections (daemon / dashboard) auto-reconnect, so kick is a no-op.`,
|
||||||
|
"To take a daemon offline locally, run `claudemesh daemon down` on that machine. To remove a member from the mesh, use `claudemesh ban <peer>`.",
|
||||||
|
);
|
||||||
|
} else {
|
||||||
render.ok(`Kicked ${peers.length} peer(s): ${peers.join(", ")}`);
|
render.ok(`Kicked ${peers.length} peer(s): ${peers.join(", ")}`);
|
||||||
render.hint("Their Claude Code session ended. They can rejoin anytime by running `claudemesh`.");
|
render.hint("Their Claude Code session ended. They can rejoin anytime by running `claudemesh`.");
|
||||||
|
if (skipped.length > 0) {
|
||||||
|
render.warn(
|
||||||
|
`(also refused ${skipped.length} control-plane connection(s): ${skipped.join(", ")})`,
|
||||||
|
"Daemon / dashboard connections auto-reconnect; kick is a no-op against them. Use `claudemesh ban <peer>` to remove a member entirely.",
|
||||||
|
);
|
||||||
|
}
|
||||||
}
|
}
|
||||||
return EXIT.SUCCESS;
|
return EXIT.SUCCESS;
|
||||||
});
|
});
|
||||||
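Review note on the new `runKick` branches: the reply is now sorted into four outcomes depending on whether any peers were kicked and whether any control-plane connections were refused. A minimal sketch of that classification follows; `KickReply`, `KickOutcome`, and `classifyKick` are hypothetical names, while the field names come from the diff.

```typescript
// Hypothetical helper: maps a kick reply to the kind of message runKick prints.
interface KickReply {
  affected?: string[];
  kicked?: string[];
  skipped_control_plane?: string[]; // absent on older brokers
}

type KickOutcome = "none_matched" | "all_refused" | "kicked" | "kicked_with_refusals";

function classifyKick(reply: KickReply | undefined): KickOutcome {
  // `??` only falls through on null/undefined, so an empty `affected`
  // array wins over a populated `kicked` array.
  const peers = reply?.affected ?? reply?.kicked ?? [];
  const skipped = reply?.skipped_control_plane ?? [];
  if (peers.length === 0) {
    return skipped.length === 0 ? "none_matched" : "all_refused";
  }
  return skipped.length === 0 ? "kicked" : "kicked_with_refusals";
}
```

Because older brokers omit `skipped_control_plane`, their replies can only ever land in `none_matched` or `kicked`, which is the pre-1.34.15 behaviour.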
|
|||||||
@@ -44,6 +44,62 @@ export interface LaunchFlags {
|
|||||||
|
|
||||||
// --- Interactive mesh picker ---
|
// --- Interactive mesh picker ---
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Ensure the per-user daemon is running before we hand off to Claude Code.
|
||||||
|
*
|
||||||
|
* As of 1.24.0 the daemon owns the broker WS and feeds the MCP push-pipe
|
||||||
|
* over IPC SSE. If the socket is absent when Claude boots its MCP shim,
|
||||||
|
* the shim bails (no fallback). Delegates to the shared lifecycle helper
|
||||||
|
* (services/daemon/lifecycle.ts) which probes the socket properly
|
||||||
|
* (avoiding the stale-socket bug where existsSync was a false positive
|
||||||
|
* after a daemon crash), spawns under a file-lock, and polls for liveness.
|
||||||
|
*/
|
||||||
|
async function ensureDaemonRunning(meshSlug: string, quiet: boolean): Promise<void> {
|
||||||
|
const { ensureDaemonReady } = await import("~/services/daemon/lifecycle.js");
|
||||||
|
if (!quiet) render.info("ensuring claudemesh daemon is running…");
|
||||||
|
// Larger budget for `launch` — it's a one-shot flow where the user
|
||||||
|
// is actively waiting; cold node start + broker hello can take
|
||||||
|
// longer than the default 3s budget for ad-hoc verbs.
|
||||||
|
const res = await ensureDaemonReady({ budgetMs: 10_000, mesh: meshSlug });
|
||||||
|
if (res.state === "up") {
|
||||||
|
if (!quiet) render.ok("daemon already running");
|
||||||
|
await warnIfDaemonStale(quiet);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
if (res.state === "started") {
|
||||||
|
if (!quiet) render.ok(`daemon ready (${res.durationMs}ms)`);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
render.warn(
|
||||||
|
`daemon ${res.state}${res.reason ? `: ${res.reason}` : ""}`,
|
||||||
|
"Run `claudemesh daemon up` manually, then re-launch.",
|
||||||
|
);
|
||||||
|
}
|
||||||
|
|
||||||
|
/** 1.34.9: warn when the running daemon's version doesn't match the CLI
|
||||||
|
* that's about to launch a session. `npm i -g claudemesh-cli` upgrades
|
||||||
|
* the binaries on disk but doesn't restart a launchd / systemd-user
|
||||||
|
* service or a foreground `claudemesh daemon up`, so users routinely
|
||||||
|
* ship a fix to the CLI side and never see it because the WS lifecycle,
|
||||||
|
* echo guards, and self-join filters all live in the long-running
|
||||||
|
* daemon process. We probe `/v1/version` and emit a one-shot stderr
|
||||||
|
* warning when CLI ≠ daemon. Best-effort; failures are silent. */
|
||||||
|
async function warnIfDaemonStale(quiet: boolean): Promise<void> {
|
||||||
|
if (quiet) return;
|
||||||
|
try {
|
||||||
|
const { ipc } = await import("~/daemon/ipc/client.js");
|
||||||
|
const { VERSION } = await import("~/constants/urls.js");
|
||||||
|
const res = await ipc<{ daemon_version?: string }>({ path: "/v1/version", timeoutMs: 1_500 });
|
||||||
|
if (res.status !== 200) return;
|
||||||
|
const daemonVersion = res.body.daemon_version ?? "";
|
||||||
|
if (!daemonVersion || daemonVersion === VERSION) return;
|
||||||
|
render.warn(
|
||||||
|
`daemon is ${daemonVersion}, CLI is ${VERSION} — restart to pick up new fixes.`,
|
||||||
|
"Run: `claudemesh daemon down && claudemesh daemon up` (no --mesh — daemon attaches to every joined mesh; restart the launchd / systemd-user unit if you installed one).",
|
||||||
|
);
|
||||||
|
} catch { /* swallow — version probe is best-effort */ }
|
||||||
|
}
|
||||||
|
|
||||||
async function pickMesh(meshes: JoinedMesh[]): Promise<JoinedMesh> {
|
async function pickMesh(meshes: JoinedMesh[]): Promise<JoinedMesh> {
|
||||||
if (meshes.length === 1) return meshes[0]!;
|
if (meshes.length === 1) return meshes[0]!;
|
||||||
|
|
||||||
@@ -218,7 +274,7 @@ async function runLaunchWizard(opts: {
|
|||||||
spinner.stop();
|
spinner.stop();
|
||||||
const choice = await menuSelect({
|
const choice = await menuSelect({
|
||||||
title: "Select mesh",
|
title: "Select mesh",
|
||||||
items: opts.meshes.map(m => m.slug),
|
items: opts.meshes.map((m) => m.slug),
|
||||||
row,
|
row,
|
||||||
});
|
});
|
||||||
mesh = opts.meshes[choice]!;
|
mesh = opts.meshes[choice]!;
|
||||||
@@ -318,6 +374,66 @@ async function runLaunchWizard(opts: {
|
|||||||
return { mesh, role, groups, messageMode, skipPermissions };
|
return { mesh, role, groups, messageMode, skipPermissions };
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 1.32.0 — broker welcome line printed right after the launch banner.
|
||||||
|
* Polls the daemon's /v1/health (per-mesh broker WS state) and tries
|
||||||
|
* to fetch the inbox + peer count via daemon-route helpers. Best-effort:
|
||||||
|
* if any call fails the welcome simply prints what it knows and moves
|
||||||
|
* on — never blocks the launch path.
|
||||||
|
*/
|
||||||
|
async function printBrokerWelcome(meshSlug: string): Promise<void> {
|
||||||
|
const useColor = !process.env.NO_COLOR && process.env.TERM !== "dumb" && process.stdout.isTTY;
|
||||||
|
const dim = (s: string): string => (useColor ? `\x1b[2m${s}\x1b[22m` : s);
|
||||||
|
const green = (s: string): string => (useColor ? `\x1b[32m${s}\x1b[39m` : s);
|
||||||
|
const yellow = (s: string): string => (useColor ? `\x1b[33m${s}\x1b[39m` : s);
|
||||||
|
|
||||||
|
// Probe daemon health for broker WS state.
|
||||||
|
let brokerState = "unknown";
|
||||||
|
try {
|
||||||
|
const { ipc } = await import("~/daemon/ipc/client.js");
|
||||||
|
const res = await ipc<{ ok?: boolean; brokers?: Record<string, string> }>({
|
||||||
|
path: "/v1/health",
|
||||||
|
timeoutMs: 1_500,
|
||||||
|
});
|
||||||
|
if (res.status === 200 && res.body?.brokers) {
|
||||||
|
brokerState = res.body.brokers[meshSlug] ?? "unknown";
|
||||||
|
}
|
||||||
|
} catch { /* daemon unreachable — not fatal */ }
|
||||||
|
|
||||||
|
// Peer count (best-effort). 1.34.15: scope to the launched mesh so
|
||||||
|
// multi-mesh daemons don't inflate the welcome banner with peers
|
||||||
|
// from other meshes the user didn't just attach to.
|
||||||
|
let peerCount = -1;
|
||||||
|
try {
|
||||||
|
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||||
|
const peers = (await tryListPeersViaDaemon(meshSlug)) ?? [];
|
||||||
|
peerCount = peers.filter((p) =>
|
||||||
|
(p as { channel?: string }).channel !== "claudemesh-daemon",
|
||||||
|
).length;
|
||||||
|
} catch { /* skip peer count */ }
|
||||||
|
|
||||||
|
// Unread inbox count (best-effort).
|
||||||
|
let unread = -1;
|
||||||
|
try {
|
||||||
|
const { ipc } = await import("~/daemon/ipc/client.js");
|
||||||
|
const res = await ipc<{ messages?: unknown[] }>({
|
||||||
|
path: "/v1/inbox",
|
||||||
|
timeoutMs: 1_500,
|
||||||
|
});
|
||||||
|
if (res.status === 200 && Array.isArray(res.body?.messages)) {
|
||||||
|
unread = res.body.messages.length;
|
||||||
|
}
|
||||||
|
} catch { /* skip unread */ }
|
||||||
|
|
||||||
|
const dot = brokerState === "open" ? green("●") : yellow("●");
|
||||||
|
const parts: string[] = [];
|
||||||
|
parts.push(`broker ${brokerState === "open" ? "connected" : brokerState}`);
|
||||||
|
if (peerCount >= 0) parts.push(`${peerCount} peer${peerCount === 1 ? "" : "s"} online`);
|
||||||
|
if (unread >= 0) parts.push(`${unread} unread`);
|
||||||
|
console.log(`${dot} ${parts.join(dim(" · "))}`);
|
||||||
|
console.log("");
|
||||||
|
}
|
||||||
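Review note on `printBrokerWelcome`: the status line is assembled from three best-effort facts, each of which may be missing. The sketch below builds the same line as a pure function so it can be checked without a daemon. `sgr`, `WelcomeFacts`, and `buildWelcomeLine` are hypothetical names; the sketch pairs each SGR attribute with its matching reset code (22 for intensity, 39 for foreground color).

```typescript
// Hypothetical sketch: SGR wrappers whose reset code matches the attribute set.
const sgr = (open: number, close: number, on: boolean) =>
  (s: string): string => (on ? `\x1b[${open}m${s}\x1b[${close}m` : s);

interface WelcomeFacts {
  brokerState: string; // "open", or e.g. "unknown" when the probe failed
  peerCount: number;   // -1 when unavailable
  unread: number;      // -1 when unavailable
}

function buildWelcomeLine(facts: WelcomeFacts, useColor: boolean): string {
  const dim = sgr(2, 22, useColor);     // 22 resets intensity
  const green = sgr(32, 39, useColor);  // 39 resets foreground color
  const yellow = sgr(33, 39, useColor);

  const dot = facts.brokerState === "open" ? green("●") : yellow("●");
  const parts = [`broker ${facts.brokerState === "open" ? "connected" : facts.brokerState}`];
  // Negative counts mean the lookup failed, so the segment is omitted.
  if (facts.peerCount >= 0) parts.push(`${facts.peerCount} peer${facts.peerCount === 1 ? "" : "s"} online`);
  if (facts.unread >= 0) parts.push(`${facts.unread} unread`);
  return `${dot} ${parts.join(dim(" · "))}`;
}
```

For example, `buildWelcomeLine({ brokerState: "open", peerCount: 1, unread: -1 }, false)` yields `● broker connected · 1 peer online`.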
|
|
||||||
function printBanner(name: string, meshSlug: string, role: string | null, groups: GroupEntry[], messageMode: "push" | "inbox" | "off"): void {
|
function printBanner(name: string, meshSlug: string, role: string | null, groups: GroupEntry[], messageMode: "push" | "inbox" | "off"): void {
|
||||||
const useColor =
|
const useColor =
|
||||||
!process.env.NO_COLOR && process.env.TERM !== "dumb" && process.stdout.isTTY;
|
!process.env.NO_COLOR && process.env.TERM !== "dumb" && process.stdout.isTTY;
|
||||||
@@ -550,6 +666,12 @@ export async function runLaunch(flags: LaunchFlags, rawArgs: string[]): Promise<
|
|||||||
}
|
}
|
||||||
} catch { /* best effort */ }
|
} catch { /* best effort */ }
|
||||||
|
|
||||||
|
// Ensure the daemon is running before we spawn Claude. The MCP shim
|
||||||
|
// (loaded by --dangerously-load-development-channels server:claudemesh)
|
||||||
|
// requires the daemon's UDS to be reachable at boot — if it isn't,
|
||||||
|
// channel push, slash commands, and resources fail.
|
||||||
|
await ensureDaemonRunning(mesh.slug, args.quiet);
|
||||||
|
|
||||||
// Clean up stale mesh MCP entries from crashed sessions
|
// Clean up stale mesh MCP entries from crashed sessions
|
||||||
try {
|
try {
|
||||||
const claudeConfigPath = join(homedir(), ".claude.json");
|
const claudeConfigPath = join(homedir(), ".claude.json");
|
||||||
@@ -616,9 +738,109 @@ export async function runLaunch(flags: LaunchFlags, rawArgs: string[]): Promise<
|
|||||||
"utf-8",
|
"utf-8",
|
||||||
);
|
);
|
||||||
|
|
||||||
|
// 4b. Mint a per-session IPC token, persist it under tmpDir, and
|
||||||
|
// register it with the daemon. The token's path is exposed to
|
||||||
|
// the spawned claude (and all its descendants) via env so
|
||||||
|
// CLI invocations from inside the session auto-attribute to it.
|
||||||
|
//
|
||||||
|
// 1.30.0: also mint an ephemeral ed25519 session keypair and a
|
||||||
|
// parent-vouched attestation. The daemon uses these to open a
|
||||||
|
// long-lived broker WebSocket per session (presence row keyed on
|
||||||
|
// the session pubkey, member_id from the parent), so sibling
|
||||||
|
// sessions in the same mesh see each other in `peer list`.
|
||||||
|
//
|
||||||
|
// Session-id resolution: 1.29.0 referenced `claudeSessionId`
|
||||||
|
// before its `const` declaration further down the file, hitting
|
||||||
|
// the TDZ → ReferenceError swallowed by the surrounding catch.
|
||||||
|
// The IPC registration has been silently failing every launch
|
||||||
|
// since 1.29.0. Hoist the declaration up so it actually runs.
|
||||||
|
const isResume = args.resume !== null || args.continueSession;
|
||||||
|
const claudeSessionId = isResume ? undefined : randomUUID();
|
||||||
|
let sessionTokenFilePath: string | null = null;
|
||||||
|
let sessionTokenForCleanup: string | null = null;
|
||||||
+  try {
+    const { mintSessionToken, TOKEN_FILE_ENV } = await import("~/services/session/token.js");
+    const minted = mintSessionToken(tmpDir);
+    sessionTokenFilePath = minted.filePath;
+    sessionTokenForCleanup = minted.token;
+
+    // Per-session ephemeral keypair + parent attestation (1.30.0+).
+    // Older daemons ignore unknown body fields, so sending presence
+    // material always is forward-compatible.
+    let presencePayload: {
+      session_pubkey: string;
+      session_secret_key: string;
+      parent_attestation: {
+        session_pubkey: string;
+        parent_member_pubkey: string;
+        expires_at: number;
+        signature: string;
+      };
+    } | undefined;
+    try {
+      const { generateKeypair } = await import("~/services/crypto/facade.js");
+      const { signParentAttestation } = await import("~/services/broker/session-hello-sig.js");
+      const sessionKp = await generateKeypair();
+      const att = await signParentAttestation({
+        parentMemberPubkey: mesh.pubkey,
+        parentSecretKey: mesh.secretKey,
+        sessionPubkey: sessionKp.publicKey,
+      });
+      presencePayload = {
+        session_pubkey: sessionKp.publicKey,
+        session_secret_key: sessionKp.secretKey,
+        parent_attestation: {
+          session_pubkey: att.sessionPubkey,
+          parent_member_pubkey: att.parentMemberPubkey,
+          expires_at: att.expiresAt,
+          signature: att.signature,
+        },
+      };
+    } catch {
+      // Keypair / attestation failure — proceed without per-session
+      // presence. The session still registers; only the broker-side
+      // presence row is skipped.
+    }
+
+    // Register with the daemon. Best-effort: a daemon failure here
+    // means the session falls back to user-level scope, which is fine.
+    const { ipc } = await import("~/daemon/ipc/client.js");
+    const sessionIdForRegister = claudeSessionId ?? randomUUID();
+    await ipc({
+      method: "POST",
+      path: "/v1/sessions/register",
+      timeoutMs: 3_000,
+      body: {
+        token: minted.token,
+        session_id: sessionIdForRegister,
+        mesh: mesh.slug,
+        display_name: displayName,
+        pid: process.pid,
+        cwd: process.cwd(),
+        ...(role ? { role } : {}),
+        ...(parsedGroups.length > 0 ? { groups: parsedGroups.map((g) => `@${g.name}${g.role ? `:${g.role}` : ""}`) } : {}),
+        ...(presencePayload ? { presence: presencePayload } : {}),
+      },
+    }).catch(() => null);
+
+    // Pin the env name on a global so the spawn block below can pick it up.
+    (process as unknown as { _claudemeshTokenEnv?: { name: string; value: string } })._claudemeshTokenEnv = {
+      name: TOKEN_FILE_ENV,
+      value: minted.filePath,
+    };
+  } catch {
+    // Token mint or registration failed — proceed without per-session
+    // attribution. CLI invocations from the session will still work,
+    // they'll just default to user-level scope.
+  }
+
   // 5. Print summary banner (wizard already handled all interactive config).
   if (!args.quiet) {
     printBanner(displayName, mesh.slug, role, parsedGroups, messageMode);
+    // 1.32.0+: broker welcome — confirm the per-session WS is actually
+    // attached and surface peer count + unread inbox so the user lands
+    // in claude code with a clear state instead of silent assumptions.
+    await printBrokerWelcome(mesh.slug);
   }
 
   // --- Install native MCP entries for deployed mesh services ---
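The registration body in the hunk above uses conditional object spreads so that optional fields (`role`, `groups`, `presence`) are omitted entirely rather than sent as `undefined` or empty. A minimal sketch of that pattern follows; `Group`, `RegisterBody`, and `buildRegisterBody` are hypothetical names for illustration and are not part of the CLI, and the body is trimmed to three fields.

```typescript
interface Group {
  name: string;
  role?: string;
}

interface RegisterBody {
  session_id: string;
  role?: string;
  groups?: string[];
}

// Optional keys are spread in only when they carry a value, so the
// serialized JSON never contains `"role": null` or `"groups": []`.
function buildRegisterBody(
  sessionId: string,
  role: string | undefined,
  groups: Group[],
): RegisterBody {
  return {
    session_id: sessionId,
    ...(role ? { role } : {}),
    ...(groups.length > 0
      ? { groups: groups.map((g) => `@${g.name}${g.role ? `:${g.role}` : ""}`) }
      : {}),
  };
}

const minimal = buildRegisterBody("s-1", undefined, []);
const full = buildRegisterBody("s-2", "reviewer", [
  { name: "backend", role: "lead" },
  { name: "ops" },
]);
```

Here `minimal` serializes to a single `session_id` key, while `full` carries `role` and the group tags `@backend:lead` and `@ops`.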
@@ -689,10 +911,8 @@ export async function runLaunch(flags: LaunchFlags, rawArgs: string[]): Promise<
   // passes -y / --yes. Without it, claudemesh tools still work because
   // `claudemesh install` pre-approves them via allowedTools in settings.json.
   // This keeps permissions tight for multi-person meshes.
-  // Session identity: --resume reuses existing session, otherwise generate new.
-  // When resuming, Claude Code reuses the session ID so the mesh peer identity persists.
-  const isResume = args.resume !== null || args.continueSession;
-  const claudeSessionId = isResume ? undefined : randomUUID();
+  // Session identity: claudeSessionId was generated above (4b) so the
+  // session-token registration could include it. Reuse here.
 
   const claudeArgs = [
     "--dangerously-load-development-channels",
@@ -737,7 +957,14 @@ export async function runLaunch(flags: LaunchFlags, rawArgs: string[]): Promise<
       writeFileSync(claudeConfigPath, JSON.stringify(claudeConfig, null, 2) + "\n", "utf-8");
     } catch { /* best effort */ }
   }
-  // Ephemeral config dir
+  // The token's session-token file lives inside tmpDir; rmSync below
+  // shreds the secret. The daemon's session reaper notices the
+  // launched session's pid is gone within 30s and drops the registry
+  // entry. Explicit DELETE on /v1/sessions is feasible only from an
+  // async exit hook, which adds complexity for ~30s of memory the
+  // reaper will reclaim anyway. Leaving as-is; revisit if the
+  // registry ever grows persistence.
+  // Ephemeral config dir (also drops the session-token file)
   try {
     rmSync(tmpDir, { recursive: true, force: true });
   } catch { /* best effort */ }
@@ -799,6 +1026,7 @@ export async function runLaunch(flags: LaunchFlags, rawArgs: string[]): Promise<
     CLAUDEMESH_CONFIG_DIR: tmpDir,
     CLAUDEMESH_DISPLAY_NAME: displayName,
     ...(claudeSessionId ? { CLAUDEMESH_SESSION_ID: claudeSessionId } : {}),
+    ...(sessionTokenFilePath ? { CLAUDEMESH_IPC_TOKEN_FILE: sessionTokenFilePath } : {}),
     MCP_TIMEOUT: process.env.MCP_TIMEOUT ?? "30000",
     MAX_MCP_OUTPUT_TOKENS: process.env.MAX_MCP_OUTPUT_TOKENS ?? "50000",
     ...(role ? { CLAUDEMESH_ROLE: role } : {}),
755
apps/cli/src/commands/me.ts
Normal file
@@ -0,0 +1,755 @@
+/**
+ * `claudemesh me` — cross-mesh workspace overview for the caller's user.
+ *
+ * Calls GET /v1/me/workspace which aggregates over every mesh the
+ * authenticated user belongs to: peer count, online count, topic count,
+ * unread @-mention count per mesh + global totals.
+ *
+ * Auth: mints a temporary read-scoped REST apikey on whichever mesh
+ * the user has joined first (any mesh works — the endpoint resolves
+ * to the issuing user, not the apikey's mesh).
+ *
+ * v0.4.0 substrate. Future verbs (`me topics`, `me notifications`,
+ * `me activity`, `me search`) layer on top of similar aggregating
+ * endpoints once they ship.
+ */
+
+import { withRestKey } from "~/services/api/with-rest-key.js";
+import { request } from "~/services/api/client.js";
+import { readConfig } from "~/services/config/facade.js";
+import { render } from "~/ui/render.js";
+import { bold, clay, cyan, dim, green, yellow } from "~/ui/styles.js";
+import { EXIT } from "~/constants/exit-codes.js";
+
+/**
+ * /v1/me/* endpoints resolve the caller's user from the apikey issuer
+ * regardless of which mesh issued the key — every mesh works. When the
+ * user didn't pass --mesh, silently pick the first joined mesh for
+ * apikey-mint instead of prompting; the endpoint sees the same user.
+ */
+function resolveMeshForMint(explicit: string | null | undefined): string | null {
+  if (explicit) return explicit;
+  const cfg = readConfig();
+  return cfg.meshes[0]?.slug ?? null;
+}
+
+interface WorkspaceMesh {
+  meshId: string;
+  slug: string;
+  name: string;
+  memberId: string;
+  myRole: string;
+  joinedAt: string;
+  peers: number;
+  online: number;
+  topics: number;
+  unreadMentions: number;
+}
+
+interface WorkspaceResponse {
+  userId: string;
+  meshes: WorkspaceMesh[];
+  totals: {
+    meshes: number;
+    peers: number;
+    online: number;
+    topics: number;
+    unreadMentions: number;
+  };
+}
+
+export interface MeFlags {
+  mesh?: string;
+  json?: boolean;
+}
+
+export async function runMe(flags: MeFlags): Promise<number> {
+  return withRestKey(
+    {
+      meshSlug: resolveMeshForMint(flags.mesh),
+      purpose: "workspace-overview",
+      capabilities: ["read"],
+    },
+    async ({ secret }) => {
+      const ws = await request<WorkspaceResponse>({
+        path: "/api/v1/me/workspace",
+        token: secret,
+      });
+
+      if (flags.json) {
+        console.log(JSON.stringify(ws, null, 2));
+        return EXIT.SUCCESS;
+      }
+
+      render.section(
+        `${clay("workspace")} — ${bold(ws.userId.slice(0, 8))} ${dim(
+          `· ${ws.totals.meshes} mesh${ws.totals.meshes === 1 ? "" : "es"}`,
+        )}`,
+      );
+
+      const totalsLine = [
+        `${green(String(ws.totals.online))}/${ws.totals.peers} online`,
+        `${ws.totals.topics} topic${ws.totals.topics === 1 ? "" : "s"}`,
+        ws.totals.unreadMentions > 0
+          ? yellow(`${ws.totals.unreadMentions} unread @you`)
+          : dim("0 unread @you"),
+      ].join(dim(" · "));
+      process.stdout.write(" " + totalsLine + "\n\n");
+
+      if (ws.meshes.length === 0) {
+        process.stdout.write(
+          dim(" no meshes joined — run `claudemesh new` or accept an invite\n"),
+        );
+        return EXIT.SUCCESS;
+      }
+
+      const slugWidth = Math.max(...ws.meshes.map((m) => m.slug.length), 8);
+      for (const m of ws.meshes) {
+        const slug = cyan(m.slug.padEnd(slugWidth));
+        const peers = `${m.online}/${m.peers}`;
+        const role = dim(m.myRole);
+        const unread =
+          m.unreadMentions > 0
+            ? " " + yellow(`${m.unreadMentions} @you`)
+            : "";
+        process.stdout.write(
+          ` ${slug} ${peers.padStart(5)} online ${dim(
+            String(m.topics).padStart(2) + " topics",
+          )} ${role}${unread}\n`,
+        );
+      }
+      return EXIT.SUCCESS;
+    },
+  );
+}
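The per-mesh rows printed by `runMe` are aligned by padding each slug to the width of the longest slug, with a floor of 8 characters, and right-aligning the `online/peers` count. A minimal sketch of that idea on plain strings, leaving out the color helpers; `Row` and `formatRows` are hypothetical names used only for this illustration.

```typescript
interface Row {
  slug: string;
  online: number;
  peers: number;
}

// Pad every slug to the widest one (minimum 8) so the counts line up.
function formatRows(rows: Row[]): string[] {
  const slugWidth = Math.max(...rows.map((r) => r.slug.length), 8);
  return rows.map((r) => {
    const counts = `${r.online}/${r.peers}`;
    return `${r.slug.padEnd(slugWidth)} ${counts.padStart(5)} online`;
  });
}

const lines = formatRows([
  { slug: "dev", online: 2, peers: 5 },
  { slug: "platform-team", online: 10, peers: 12 },
]);
```

Because the floor value is passed as an extra argument to `Math.max`, the call stays well defined even when the list is empty. One thing to keep in mind is that padding has to happen before color codes are applied, which is the order the diff uses (`cyan(m.slug.padEnd(slugWidth))`), since ANSI escape sequences would otherwise count toward the string length.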
+
+interface WorkspaceTopic {
+  topicId: string;
+  name: string;
+  description: string | null;
+  visibility: string;
+  createdAt: string;
+  meshId: string;
+  meshSlug: string;
+  meshName: string;
+  memberId: string;
+  unread: number;
+  lastMessageAt: string | null;
+}
+
+interface WorkspaceTopicsResponse {
+  topics: WorkspaceTopic[];
+  totals: { topics: number; unread: number };
+}
+
+export interface MeTopicsFlags extends MeFlags {
+  unread?: boolean;
+}
+
+export async function runMeTopics(flags: MeTopicsFlags): Promise<number> {
+  return withRestKey(
+    {
+      meshSlug: resolveMeshForMint(flags.mesh),
+      purpose: "workspace-topics",
+      capabilities: ["read"],
+    },
+    async ({ secret }) => {
+      const ws = await request<WorkspaceTopicsResponse>({
+        path: "/api/v1/me/topics",
+        token: secret,
+      });
+
+      const visible = flags.unread
+        ? ws.topics.filter((t) => t.unread > 0)
+        : ws.topics;
+
+      if (flags.json) {
+        console.log(
+          JSON.stringify(
+            { topics: visible, totals: ws.totals },
+            null,
+            2,
+          ),
+        );
+        return EXIT.SUCCESS;
+      }
+
+      render.section(
+        `${clay("topics")} — ${ws.totals.topics} across all meshes ${dim(
+          ws.totals.unread > 0
+            ? `· ${ws.totals.unread} unread`
+            : "· all read",
+        )}`,
+      );
+
+      if (visible.length === 0) {
+        process.stdout.write(
+          dim(
+            flags.unread
+              ? " no unread topics\n"
+              : " no topics — run `claudemesh topic create #general`\n",
+          ),
+        );
+        return EXIT.SUCCESS;
+      }
+
+      const slugWidth = Math.max(...visible.map((t) => t.meshSlug.length), 6);
+      const nameWidth = Math.max(...visible.map((t) => t.name.length), 8);
+
+      for (const t of visible) {
+        const slug = dim(t.meshSlug.padEnd(slugWidth));
+        const name = cyan(t.name.padEnd(nameWidth));
+        const unread =
+          t.unread > 0
+            ? yellow(`${t.unread} unread`.padStart(10))
+            : dim("·".padStart(10));
+        const last = t.lastMessageAt
+          ? dim(formatRelativeTime(t.lastMessageAt))
+          : dim("never");
+        process.stdout.write(` ${slug} ${name} ${unread} ${last}\n`);
+      }
+      return EXIT.SUCCESS;
+    },
+  );
+}
+
+interface WorkspaceNotification {
+  notificationId: string;
+  messageId: string;
+  topicId: string;
+  topicName: string;
+  meshId: string;
+  meshSlug: string;
+  meshName: string;
+  senderName: string | null;
+  snippet: string | null;
+  ciphertext: string | null;
+  bodyVersion: number;
+  read: boolean;
+  readAt: string | null;
+  createdAt: string;
+}
+
+interface WorkspaceNotificationsResponse {
+  notifications: WorkspaceNotification[];
+  totals: { unread: number; total: number };
+}
+
+export interface MeNotificationsFlags extends MeFlags {
+  all?: boolean;
+  since?: string;
+}
+
+export async function runMeNotifications(
+  flags: MeNotificationsFlags,
+): Promise<number> {
+  return withRestKey(
+    {
+      meshSlug: resolveMeshForMint(flags.mesh),
+      purpose: "workspace-notifications",
+      capabilities: ["read"],
+    },
+    async ({ secret }) => {
+      const params = new URLSearchParams();
+      if (flags.all) params.set("include", "all");
+      if (flags.since) params.set("since", flags.since);
+      const path =
+        "/api/v1/me/notifications" +
+        (params.toString() ? `?${params.toString()}` : "");
+      const ws = await request<WorkspaceNotificationsResponse>({
+        path,
+        token: secret,
+      });
+
+      if (flags.json) {
+        console.log(JSON.stringify(ws, null, 2));
+        return EXIT.SUCCESS;
+      }
+
+      const headerLabel = flags.all ? "@-mentions (all)" : "@-mentions (unread)";
+      render.section(
+        `${clay(headerLabel)} — ${ws.totals.total} ${dim(
+          ws.totals.unread > 0 ? `· ${ws.totals.unread} unread` : "· nothing pending",
+        )}`,
+      );
+
+      if (ws.notifications.length === 0) {
+        process.stdout.write(
+          dim(
+            flags.all
+              ? " no @-mentions in window\n"
+              : " inbox zero — nothing waiting\n",
+          ),
+        );
+        return EXIT.SUCCESS;
+      }
+
+      const slugWidth = Math.max(
+        ...ws.notifications.map((n) => n.meshSlug.length),
+        6,
+      );
+
+      for (const n of ws.notifications) {
+        const slug = dim(n.meshSlug.padEnd(slugWidth));
+        const topic = cyan(`#${n.topicName}`);
+        const sender = n.senderName ? `from ${n.senderName}` : "from ?";
+        const ago = formatRelativeTime(n.createdAt);
+        const dot = n.read ? dim("·") : yellow("●");
+        const snippet =
+          n.snippet ?? (n.ciphertext ? dim("[encrypted]") : dim("[empty]"));
+        process.stdout.write(
+          ` ${dot} ${slug} ${topic} ${dim(sender)} ${dim(ago)}\n` +
+            ` ${snippet.length > 200 ? snippet.slice(0, 200) + "…" : snippet}\n`,
+        );
+      }
+      return EXIT.SUCCESS;
+    },
+  );
+}
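`runMeNotifications` and several of the later verbs build their request path the same way: collect only the filters that were actually passed into a `URLSearchParams`, then append `?…` only if something was set. A minimal sketch of that pattern as a standalone helper; `buildPath` is a hypothetical name, and the `since` value below is an assumed example, since the diff does not show which timestamp formats the endpoint accepts.

```typescript
// Build a request path, appending a query string only when a filter is set.
function buildPath(
  base: string,
  filters: Record<string, string | undefined>,
): string {
  const params = new URLSearchParams();
  for (const [key, value] of Object.entries(filters)) {
    if (value) params.set(key, value); // skip undefined and empty strings
  }
  const query = params.toString();
  return query ? `${base}?${query}` : base;
}

const bare = buildPath("/api/v1/me/notifications", {});
const filtered = buildPath("/api/v1/me/notifications", {
  include: "all",
  since: "2024-01-01T00:00:00Z", // assumed format, for illustration only
});
```

`URLSearchParams` percent-encodes reserved characters (the colons in the timestamp become `%3A`), which is why the diff routes user-supplied flag values through it instead of concatenating them into the path by hand.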
+
+interface WorkspaceActivity {
+  messageId: string;
+  topicId: string;
+  topicName: string;
+  meshId: string;
+  meshSlug: string;
+  meshName: string;
+  senderName: string;
+  senderMemberId: string;
+  snippet: string | null;
+  ciphertext: string | null;
+  bodyVersion: number;
+  createdAt: string;
+}
+
+interface WorkspaceActivityResponse {
+  activity: WorkspaceActivity[];
+  totals: { events: number };
+}
+
+export interface MeActivityFlags extends MeFlags {
+  since?: string;
+}
+
+export async function runMeActivity(flags: MeActivityFlags): Promise<number> {
+  return withRestKey(
+    {
+      meshSlug: resolveMeshForMint(flags.mesh),
+      purpose: "workspace-activity",
+      capabilities: ["read"],
+    },
+    async ({ secret }) => {
+      const params = new URLSearchParams();
+      if (flags.since) params.set("since", flags.since);
+      const path =
+        "/api/v1/me/activity" +
+        (params.toString() ? `?${params.toString()}` : "");
+      const ws = await request<WorkspaceActivityResponse>({
+        path,
+        token: secret,
+      });
+
+      if (flags.json) {
+        console.log(JSON.stringify(ws, null, 2));
+        return EXIT.SUCCESS;
+      }
+
+      render.section(
+        `${clay("activity")} — ${ws.totals.events} ${dim(
+          flags.since ? `since ${flags.since}` : "in the last 24h",
+        )}`,
+      );
+
+      if (ws.activity.length === 0) {
+        process.stdout.write(dim(" quiet — no activity in window\n"));
+        return EXIT.SUCCESS;
+      }
+
+      const slugWidth = Math.max(
+        ...ws.activity.map((a) => a.meshSlug.length),
+        6,
+      );
+
+      for (const a of ws.activity) {
+        const slug = dim(a.meshSlug.padEnd(slugWidth));
+        const topic = cyan(`#${a.topicName}`);
+        const sender = a.senderName ?? "?";
+        const ago = formatRelativeTime(a.createdAt);
+        const snippet =
+          a.snippet ?? (a.ciphertext ? dim("[encrypted]") : dim("[empty]"));
+        process.stdout.write(
+          ` ${slug} ${topic} ${dim(sender + " ·")} ${dim(ago)}\n` +
+            ` ${snippet.length > 200 ? snippet.slice(0, 200) + "…" : snippet}\n`,
+        );
+      }
+      return EXIT.SUCCESS;
+    },
+  );
+}
+
+interface WorkspaceSearchTopicHit {
+  id: string;
+  name: string;
+  description: string | null;
+  meshId: string;
+  meshSlug: string;
+  meshName: string;
+}
+
+interface WorkspaceSearchMessageHit {
+  messageId: string;
+  topicId: string;
+  topicName: string;
+  meshId: string;
+  meshSlug: string;
+  senderName: string;
+  snippet: string | null;
+  bodyVersion: number;
+  createdAt: string;
+}
+
+interface WorkspaceSearchResponse {
+  query: string;
+  topics: WorkspaceSearchTopicHit[];
+  messages: WorkspaceSearchMessageHit[];
+  totals: { topics: number; messages: number };
+}
+
+export interface MeSearchFlags extends MeFlags {
+  query: string;
+}
+
+export async function runMeSearch(flags: MeSearchFlags): Promise<number> {
+  if (!flags.query || flags.query.length < 2) {
+    process.stderr.write(
+      "Usage: claudemesh me search <query> (min 2 chars)\n",
+    );
+    return EXIT.INVALID_ARGS;
+  }
+
+  return withRestKey(
+    {
+      meshSlug: resolveMeshForMint(flags.mesh),
+      purpose: "workspace-search",
+      capabilities: ["read"],
+    },
+    async ({ secret }) => {
+      const params = new URLSearchParams({ q: flags.query });
+      const ws = await request<WorkspaceSearchResponse>({
+        path: `/api/v1/me/search?${params.toString()}`,
+        token: secret,
+      });
+
+      if (flags.json) {
+        console.log(JSON.stringify(ws, null, 2));
+        return EXIT.SUCCESS;
+      }
+
+      render.section(
+        `${clay("search")} — "${flags.query}" ${dim(
+          `${ws.totals.topics} topic${ws.totals.topics === 1 ? "" : "s"}, ` +
+            `${ws.totals.messages} message${ws.totals.messages === 1 ? "" : "s"}`,
+        )}`,
+      );
+
+      if (ws.topics.length === 0 && ws.messages.length === 0) {
+        process.stdout.write(dim(" no matches\n"));
+        return EXIT.SUCCESS;
+      }
+
+      if (ws.topics.length > 0) {
+        process.stdout.write(dim("\n topics\n"));
+        const slugWidth = Math.max(
+          ...ws.topics.map((t) => t.meshSlug.length),
+          6,
+        );
+        for (const t of ws.topics) {
+          const slug = dim(t.meshSlug.padEnd(slugWidth));
+          const name = cyan(`#${t.name}`);
+          const desc = t.description ? dim(` — ${t.description}`) : "";
+          process.stdout.write(` ${slug} ${name}${desc}\n`);
+        }
+      }
+
+      if (ws.messages.length > 0) {
+        process.stdout.write(dim("\n messages\n"));
+        const slugWidth = Math.max(
+          ...ws.messages.map((m) => m.meshSlug.length),
+          6,
+        );
+        for (const m of ws.messages) {
+          const slug = dim(m.meshSlug.padEnd(slugWidth));
+          const topic = cyan(`#${m.topicName}`);
+          const sender = m.senderName;
+          const ago = formatRelativeTime(m.createdAt);
+          const snippet =
+            m.snippet ??
+            (m.bodyVersion === 2 ? dim("[encrypted — open the topic to decrypt]") : dim("[empty]"));
+          const highlighted =
+            m.snippet
+              ? highlightMatch(snippet, flags.query)
+              : snippet;
+          process.stdout.write(
+            ` ${slug} ${topic} ${dim(sender + " ·")} ${dim(ago)}\n` +
+              ` ${highlighted}\n`,
+          );
+        }
+      }
+      return EXIT.SUCCESS;
+    },
+  );
+}
+
+function highlightMatch(text: string, query: string): string {
+  if (!query) return text;
+  const idx = text.toLowerCase().indexOf(query.toLowerCase());
+  if (idx === -1) return text;
+  const before = text.slice(0, idx);
+  const match = text.slice(idx, idx + query.length);
+  const after = text.slice(idx + query.length);
+  return `${before}${yellow(match)}${after}`;
+}
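`highlightMatch` does a case-insensitive search but slices the original string, so the highlighted text keeps the casing the sender typed, and only the first occurrence is marked. A minimal sketch that makes this visible without a terminal; `mark` is a hypothetical stand-in for the CLI's `yellow` helper, and `highlightFirst` mirrors the function above only for illustration.

```typescript
// Stand-in for the CLI's `yellow` helper: wrap the match in brackets so the
// result can be inspected without ANSI escape codes.
const mark = (s: string): string => `[${s}]`;

function highlightFirst(text: string, query: string): string {
  if (!query) return text;
  // Compare lowercased copies, but slice the original to preserve casing.
  const idx = text.toLowerCase().indexOf(query.toLowerCase());
  if (idx === -1) return text;
  return (
    text.slice(0, idx) +
    mark(text.slice(idx, idx + query.length)) +
    text.slice(idx + query.length)
  );
}

const hit = highlightFirst("Deploy the Broker, then restart broker", "broker");
const miss = highlightFirst("nothing here", "broker");
```

In this sketch `hit` marks `Broker` with its capital B intact and leaves the second, lowercase `broker` untouched, while `miss` is returned unchanged.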
+
+interface WorkspaceTask {
+  id: string;
+  meshId: string;
+  meshSlug: string;
+  title: string;
+  assignee: string | null;
+  claimedByName: string | null;
+  priority: string;
+  status: string;
+  tags: string[];
+  result: string | null;
+  createdByName: string | null;
+  createdAt: string;
+  claimedAt: string | null;
+  completedAt: string | null;
+}
+
+interface WorkspaceTasksResponse {
+  tasks: WorkspaceTask[];
+  totals: { open: number; claimed: number; completed: number };
+}
+
+export interface MeTasksFlags extends MeFlags {
+  status?: string;
+}
+
+export async function runMeTasks(flags: MeTasksFlags): Promise<number> {
+  return withRestKey(
+    {
+      meshSlug: resolveMeshForMint(flags.mesh),
+      purpose: "workspace-tasks",
+      capabilities: ["read"],
+    },
+    async ({ secret }) => {
+      const params = new URLSearchParams();
+      if (flags.status) params.set("status", flags.status);
+      const path =
+        "/api/v1/me/tasks" +
+        (params.toString() ? `?${params.toString()}` : "");
+      const ws = await request<WorkspaceTasksResponse>({
+        path,
+        token: secret,
+      });
+
+      if (flags.json) {
+        console.log(JSON.stringify(ws, null, 2));
+        return EXIT.SUCCESS;
+      }
+
+      render.section(
+        `${clay("tasks")} — ${dim(
+          `${ws.totals.open} open · ${ws.totals.claimed} in-flight · ${ws.totals.completed} done`,
+        )}`,
+      );
+
+      if (ws.tasks.length === 0) {
+        process.stdout.write(dim(" no tasks in window\n"));
+        return EXIT.SUCCESS;
+      }
+
+      const slugWidth = Math.max(...ws.tasks.map((t) => t.meshSlug.length), 6);
+      for (const t of ws.tasks) {
+        const slug = dim(t.meshSlug.padEnd(slugWidth));
+        const status =
+          t.status === "open"
+            ? yellow("open ")
+            : t.status === "claimed"
+              ? cyan("working ")
+              : green("done ");
+        const prio =
+          t.priority === "urgent"
+            ? yellow("!")
+            : t.priority === "low"
+              ? dim("·")
+              : " ";
+        const claimer = t.claimedByName ? dim(` ← ${t.claimedByName}`) : "";
+        process.stdout.write(
+          ` ${slug} ${prio} ${status} ${t.title}${claimer}\n`,
+        );
+      }
+      return EXIT.SUCCESS;
+    },
+  );
+}
+
+interface WorkspaceStateEntry {
+  meshId: string;
+  meshSlug: string;
+  key: string;
+  value: unknown;
+  updatedByName: string | null;
+  updatedAt: string;
+}
+
+interface WorkspaceStateResponse {
+  entries: WorkspaceStateEntry[];
+  totals: { entries: number; meshes: number };
+}
+
+export interface MeStateFlags extends MeFlags {
+  key?: string;
+}
+
+export async function runMeState(flags: MeStateFlags): Promise<number> {
+  return withRestKey(
+    {
+      meshSlug: resolveMeshForMint(flags.mesh),
+      purpose: "workspace-state",
+      capabilities: ["read"],
+    },
+    async ({ secret }) => {
+      const params = new URLSearchParams();
+      if (flags.key) params.set("key", flags.key);
+      const path =
+        "/api/v1/me/state" +
+        (params.toString() ? `?${params.toString()}` : "");
+      const ws = await request<WorkspaceStateResponse>({
+        path,
+        token: secret,
+      });
+
+      if (flags.json) {
+        console.log(JSON.stringify(ws, null, 2));
+        return EXIT.SUCCESS;
+      }
+
+      render.section(
+        `${clay("state")} — ${ws.totals.entries} entr${ws.totals.entries === 1 ? "y" : "ies"} ${dim(
+          `across ${ws.totals.meshes} mesh${ws.totals.meshes === 1 ? "" : "es"}`,
+        )}`,
+      );
+
+      if (ws.entries.length === 0) {
+        process.stdout.write(dim(" no state entries\n"));
+        return EXIT.SUCCESS;
+      }
+
+      const slugWidth = Math.max(...ws.entries.map((e) => e.meshSlug.length), 6);
+      const keyWidth = Math.max(...ws.entries.map((e) => e.key.length), 8);
+      for (const e of ws.entries) {
+        const slug = dim(e.meshSlug.padEnd(slugWidth));
+        const key = cyan(e.key.padEnd(keyWidth));
+        const valueStr =
+          typeof e.value === "string"
+            ? e.value
+            : JSON.stringify(e.value);
+        const trimmed =
+          valueStr.length > 80 ? valueStr.slice(0, 80) + "…" : valueStr;
+        const ago = dim(formatRelativeTime(e.updatedAt));
+        process.stdout.write(` ${slug} ${key} ${trimmed} ${ago}\n`);
+      }
+      return EXIT.SUCCESS;
+    },
+  );
+}
|
||||||
|
|
||||||
|
interface WorkspaceMemory {
|
||||||
|
id: string;
|
||||||
|
meshId: string;
|
  meshSlug: string;
  content: string;
  tags: string[];
  rememberedByName: string | null;
  rememberedAt: string;
}

interface WorkspaceMemoryResponse {
  query: string;
  memories: WorkspaceMemory[];
  totals: { entries: number };
}

export interface MeMemoryFlags extends MeFlags {
  query?: string;
}

export async function runMeMemory(flags: MeMemoryFlags): Promise<number> {
  return withRestKey(
    {
      meshSlug: resolveMeshForMint(flags.mesh),
      purpose: "workspace-memory",
      capabilities: ["read"],
    },
    async ({ secret }) => {
      const params = new URLSearchParams();
      if (flags.query) params.set("q", flags.query);
      const path =
        "/api/v1/me/memory" +
        (params.toString() ? `?${params.toString()}` : "");
      const ws = await request<WorkspaceMemoryResponse>({
        path,
        token: secret,
      });

      if (flags.json) {
        console.log(JSON.stringify(ws, null, 2));
        return EXIT.SUCCESS;
      }

      const headerLabel = flags.query
        ? `recall — "${flags.query}"`
        : "recall — last 30 days";
      render.section(
        `${clay(headerLabel)} ${dim(`${ws.totals.entries} match${ws.totals.entries === 1 ? "" : "es"}`)}`,
      );

      if (ws.memories.length === 0) {
        process.stdout.write(dim(" no memories\n"));
        return EXIT.SUCCESS;
      }

      const slugWidth = Math.max(
        ...ws.memories.map((m) => m.meshSlug.length),
        6,
      );
      for (const m of ws.memories) {
        const slug = dim(m.meshSlug.padEnd(slugWidth));
        const ago = dim(formatRelativeTime(m.rememberedAt));
        const tags =
          m.tags.length > 0
            ? " " + dim("[" + m.tags.join(", ") + "]")
            : "";
        const content =
          m.content.length > 240 ? m.content.slice(0, 240) + "…" : m.content;
        process.stdout.write(` ${slug} ${ago}${tags}\n ${content}\n`);
      }
      return EXIT.SUCCESS;
    },
  );
}
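The request-path and truncation steps in `runMeMemory` can be looked at in isolation. The following is a minimal sketch, not the project's code: the helper names `buildMemoryPath` and `truncateContent` and the two constants are hypothetical, and only the endpoint path, the `q` parameter, and the 240-character limit are taken from the function above.

```typescript
// Hypothetical helper names; only the endpoint path, the "q" parameter,
// and the 240-character limit come from runMeMemory above.
const MEMORY_ENDPOINT = "/api/v1/me/memory";
const MAX_CONTENT_CHARS = 240;

/** Build the request path, appending `?q=...` only when a query is given. */
function buildMemoryPath(query?: string): string {
  const params = new URLSearchParams();
  if (query) params.set("q", query);
  const qs = params.toString();
  return qs ? `${MEMORY_ENDPOINT}?${qs}` : MEMORY_ENDPOINT;
}

/** Cut long memory bodies to the limit and mark the cut with an ellipsis. */
function truncateContent(content: string): string {
  return content.length > MAX_CONTENT_CHARS
    ? content.slice(0, MAX_CONTENT_CHARS) + "…"
    : content;
}
```

Because `URLSearchParams` serializes in form-urlencoded style, a space in the query is sent as `+` rather than `%20`; a server that parses the query string the standard way should decode both the same.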

function formatRelativeTime(iso: string): string {
  const then = new Date(iso).getTime();
  const now = Date.now();
  const sec = Math.max(0, Math.floor((now - then) / 1000));
  if (sec < 60) return `${sec}s ago`;
  if (sec < 3600) return `${Math.floor(sec / 60)}m ago`;
  if (sec < 86_400) return `${Math.floor(sec / 3600)}h ago`;
  if (sec < 86_400 * 30) return `${Math.floor(sec / 86_400)}d ago`;
  if (sec < 86_400 * 365)
    return `${Math.floor(sec / (86_400 * 30))}mo ago`;
  return `${Math.floor(sec / (86_400 * 365))}y ago`;
}
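The bucketing in `formatRelativeTime` is easier to check when the clock is injectable. This sketch keeps the same thresholds (60 s, 1 h, 1 day, 30-day months, 365-day years) but takes `now` as an optional second parameter; that parameter and the `DAY` constant are assumptions added for testability and are not part of the diff.

```typescript
const DAY = 86400; // seconds per day

/** Same buckets as the diff's helper; `now` is an added, optional test seam. */
function formatRelativeTime(iso: string, now: number = Date.now()): string {
  const then = new Date(iso).getTime();
  // Future timestamps (clock skew) clamp to zero instead of going negative.
  const sec = Math.max(0, Math.floor((now - then) / 1000));
  if (sec < 60) return `${sec}s ago`;
  if (sec < 3600) return `${Math.floor(sec / 60)}m ago`;
  if (sec < DAY) return `${Math.floor(sec / 3600)}h ago`;
  if (sec < DAY * 30) return `${Math.floor(sec / DAY)}d ago`;
  if (sec < DAY * 365) return `${Math.floor(sec / (DAY * 30))}mo ago`;
  return `${Math.floor(sec / (DAY * 365))}y ago`;
}
```

One edge case worth knowing: an unparseable `iso` string makes `sec` evaluate to `NaN`, every comparison is false, and the result is `NaNy ago`. A caller that cannot trust `rememberedAt` may want to validate it first.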
@@ -14,7 +14,6 @@
|
|||||||
|
|
||||||
import { withMesh } from "./connect.js";
|
import { withMesh } from "./connect.js";
|
||||||
import { readConfig } from "~/services/config/facade.js";
|
import { readConfig } from "~/services/config/facade.js";
|
||||||
import { tryBridge } from "~/services/bridge/client.js";
|
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { bold, dim, green, yellow } from "~/ui/styles.js";
|
import { bold, dim, green, yellow } from "~/ui/styles.js";
|
||||||
|
|
||||||
@@ -22,18 +21,79 @@ export interface PeersFlags {
|
|||||||
mesh?: string;
|
mesh?: string;
|
||||||
/** `true`/`undefined` = full record; comma-separated string = field projection. */
|
/** `true`/`undefined` = full record; comma-separated string = field projection. */
|
||||||
json?: boolean | string;
|
json?: boolean | string;
|
||||||
|
/** When false (default), hide control-plane presence rows from the
|
||||||
|
* human renderer — they're infrastructure (daemon-WS member-keyed
|
||||||
|
* presence), not interactive peers, and confused users into thinking
|
||||||
|
* the daemon counted as a "peer". The JSON output still includes them
|
||||||
|
* so scripts that need a full inventory can opt in via --all (or
|
||||||
|
* just consume JSON).
|
||||||
|
*
|
||||||
|
* Source of truth is the broker-side `role` field
|
||||||
|
* (`'control-plane' | 'session' | 'service'`). Older brokers don't
|
||||||
|
* emit `role` yet — this code falls back to treating missing role as
|
||||||
|
* `'session'` so legacy peer rows stay visible. */
|
||||||
|
all?: boolean;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Broker-emitted peer classification, added 2026-05-04. Older brokers
|
||||||
|
* may omit it — treat missing as 'session' so legacy meshes still
|
||||||
|
* render their peers (and don't accidentally hide them all). The CLI
|
||||||
|
* never emits 'control-plane' on its own; that comes from the broker.
|
||||||
|
*/
|
||||||
|
export type PeerRole = "control-plane" | "session" | "service";
|
||||||
|
|
||||||
interface PeerRecord {
|
interface PeerRecord {
|
||||||
pubkey: string;
|
pubkey: string;
|
||||||
|
/** Stable member pubkey (independent of session). When sender shares
|
||||||
|
* this with a peer, they're talking to the same person across all
|
||||||
|
* their open sessions. */
|
||||||
|
memberPubkey?: string;
|
||||||
|
/** Per-launch session identifier (uuid). Used by the renderer to
|
||||||
|
* disambiguate sibling sessions of the same member that otherwise
|
||||||
|
* look identical (same name, same cwd). */
|
||||||
|
sessionId?: string;
|
||||||
displayName: string;
|
displayName: string;
|
||||||
status?: string;
|
status?: string;
|
||||||
summary?: string;
|
summary?: string;
|
||||||
groups: Array<{ name: string; role?: string }>;
|
groups: Array<{ name: string; role?: string }>;
|
||||||
|
/** Top-level convenience alias for `profile.role`, lifted by the CLI
|
||||||
|
* since 1.31.5 so JSON consumers (the agent-vibes claudemesh skill,
|
||||||
|
* launched-session LLMs) see the user-supplied role string at the
|
||||||
|
* shape's top level. Same value as `profile.role`. Distinct from
|
||||||
|
* `peerRole` below — that's the broker's presence-class taxonomy. */
|
||||||
|
role?: string;
|
||||||
|
/** Broker-emitted presence classification: 'control-plane' | 'session'
|
||||||
|
* | 'service'. Source of truth for the --all visibility filter and
|
||||||
|
* the default-hide rule. Older brokers omit this; the CLI fills
|
||||||
|
* missing values with 'session' so legacy peer rows stay visible.
|
||||||
|
*
|
||||||
|
* Renamed from `role` to avoid collision with 1.31.5's profile.role
|
||||||
|
* lift above. Wire-level field on the broker is also `peerRole`. */
|
||||||
|
peerRole?: PeerRole;
|
||||||
peerType?: string;
|
peerType?: string;
|
||||||
channel?: string;
|
channel?: string;
|
||||||
model?: string;
|
model?: string;
|
||||||
cwd?: string;
|
cwd?: string;
|
||||||
|
/** Peer-level profile metadata (set via `claudemesh profile`). The
|
||||||
|
* broker passes this through verbatim; the most common field is
|
||||||
|
* `role` ("lead", "reviewer", "human", etc.) but capabilities, bio,
|
||||||
|
* avatar, and title also live here when set. */
|
||||||
|
profile?: {
|
||||||
|
role?: string;
|
||||||
|
title?: string;
|
||||||
|
bio?: string;
|
||||||
|
avatar?: string;
|
||||||
|
capabilities?: string[];
|
||||||
|
[k: string]: unknown;
|
||||||
|
};
|
||||||
|
/** True when this peer is one of the caller's own member's sessions.
|
||||||
|
* Set in the cli (not the broker) by comparing memberPubkey against
|
||||||
|
* the caller's stable JoinedMesh.pubkey. */
|
||||||
|
isSelf?: boolean;
|
||||||
|
/** When isSelf is true, true if this is the exact session running
|
||||||
|
* the command (vs a sibling session of the same member). */
|
||||||
|
isThisSession?: boolean;
|
||||||
[k: string]: unknown;
|
[k: string]: unknown;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -52,24 +112,115 @@ function projectFields(record: PeerRecord, fields: string[]): Record<string, unk
|
|||||||
}
|
}
|
||||||
|
|
||||||
async function listPeersForMesh(slug: string): Promise<PeerRecord[]> {
|
async function listPeersForMesh(slug: string): Promise<PeerRecord[]> {
|
||||||
// Try warm path first.
|
const config = readConfig();
|
||||||
const bridged = await tryBridge(slug, "peers");
|
const joined = config.meshes.find((m) => m.slug === slug);
|
||||||
if (bridged && bridged.ok) {
|
const selfMemberPubkey = joined?.pubkey ?? null;
|
||||||
return bridged.result as PeerRecord[];
|
|
||||||
}
|
// Resolve our own session pubkey via the daemon's /v1/sessions/me when
|
||||||
// Cold path — open our own WS.
|
// we're inside a launched session. Without this, isThisSession can't
|
||||||
|
// be set on the daemon path (only on the cold path where a fresh WS
|
||||||
|
// creates the keypair), and the renderer can't tell the user which
|
||||||
|
// row in `peer list` is them.
|
||||||
|
let selfSessionPubkey: string | null = null;
|
||||||
|
try {
|
||||||
|
const { getSessionInfo } = await import("~/services/session/resolve.js");
|
||||||
|
const sess = await getSessionInfo();
|
||||||
|
if (sess && sess.mesh === slug && sess.presence?.sessionPubkey) {
|
||||||
|
selfSessionPubkey = sess.presence.sessionPubkey;
|
||||||
|
}
|
||||||
|
} catch { /* not in a launched session; isThisSession stays false */ }
|
||||||
|
|
||||||
|
// Daemon path — preferred when running. Same routing pattern as send.ts:
|
||||||
|
// ~1 ms IPC round-trip; broker WS already warm in the daemon. The
|
||||||
|
// lifecycle helper inside tryListPeersViaDaemon auto-spawns the
|
||||||
|
// daemon if it's down and probes it for liveness — no separate bridge
|
||||||
|
// tier is needed any more (1.28.0).
|
||||||
|
//
|
||||||
|
// 1.34.15: forward `slug` to the daemon as `?mesh=<slug>` so the
|
||||||
|
// server-side aggregator narrows to the requested mesh. Pre-1.34.15
|
||||||
|
// we called this with no argument, so a multi-mesh daemon returned
|
||||||
|
// peers from every attached mesh and the renderer printed "peers on
|
||||||
|
// flexicar" with cross-mesh rows mixed in. The daemon's
|
||||||
|
// `meshFromCtx` already does the right scoping when the slug is
|
||||||
|
// passed; the CLI just wasn't passing it.
|
||||||
|
try {
|
||||||
|
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||||
|
const dr = await tryListPeersViaDaemon(slug);
|
||||||
|
if (dr !== null) {
|
||||||
|
return dr.map((p) => annotateSelf(p as PeerRecord, selfMemberPubkey, selfSessionPubkey));
|
||||||
|
}
|
||||||
|
} catch { /* daemon route helper not available; fall through */ }
|
||||||
|
|
||||||
|
// Cold path — open our own WS. Reached only when the lifecycle helper
|
||||||
|
// could not bring the daemon up.
|
||||||
let result: PeerRecord[] = [];
|
let result: PeerRecord[] = [];
|
||||||
await withMesh({ meshSlug: slug }, async (client) => {
|
await withMesh({ meshSlug: slug }, async (client) => {
|
||||||
const all = await client.listPeers();
|
const all = (await client.listPeers()) as unknown as PeerRecord[];
|
||||||
const selfPubkey = client.getSessionPubkey();
|
const selfSessionPubkey = client.getSessionPubkey();
|
||||||
result = (selfPubkey ? all.filter((p) => p.pubkey !== selfPubkey) : all) as unknown as PeerRecord[];
|
result = all.map((p) =>
|
||||||
|
annotateSelf(p, selfMemberPubkey, selfSessionPubkey),
|
||||||
|
);
|
||||||
});
|
});
|
||||||
return result;
|
return result;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Tag each peer record with `isSelf` / `isThisSession` so the renderer
|
||||||
|
* (and downstream code that picks targets, e.g. `claudemesh send`) can
|
||||||
|
* tell sender's own sessions from real peers. The broker has always
|
||||||
|
* surfaced a sender's siblings as separate rows because they're separate
|
||||||
|
* presence rows; the cli just hadn't been making that visible.
|
||||||
|
*
|
||||||
|
* Also normalizes the broker's `peerRole` classification: missing
|
||||||
|
* values (older brokers) default to 'session' so legacy peer rows stay
|
||||||
|
* visible under the default `--all=false` filter.
|
||||||
|
*
|
||||||
|
* And lifts `profile.role` to a top-level `role` field — the 1.31.5
|
||||||
|
* convenience alias for JSON consumers (skill SKILL.md, launched-session
|
||||||
|
* LLMs, jq pipelines). Same value as profile.role; distinct from
|
||||||
|
* peerRole (presence taxonomy).
|
||||||
|
*/
|
||||||
|
function annotateSelf(
|
||||||
|
peer: PeerRecord,
|
||||||
|
selfMemberPubkey: string | null,
|
||||||
|
selfSessionPubkey: string | null,
|
||||||
|
): PeerRecord {
|
||||||
|
const isSelf = !!(
|
||||||
|
selfMemberPubkey &&
|
||||||
|
peer.memberPubkey &&
|
||||||
|
peer.memberPubkey === selfMemberPubkey
|
||||||
|
);
|
||||||
|
const isThisSession = !!(
|
||||||
|
isSelf &&
|
||||||
|
selfSessionPubkey &&
|
||||||
|
peer.pubkey === selfSessionPubkey
|
||||||
|
);
|
||||||
|
const peerRole: PeerRole = peer.peerRole ?? "session";
|
||||||
|
const profileRole = peer.profile?.role?.trim() || undefined;
|
||||||
|
return {
|
||||||
|
...peer,
|
||||||
|
...(profileRole ? { role: profileRole } : {}),
|
||||||
|
peerRole,
|
||||||
|
isSelf,
|
||||||
|
isThisSession,
|
||||||
|
};
|
||||||
|
}
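The three normalizations performed by `annotateSelf` (self detection, the `peerRole` default, and the `profile.role` lift) can be exercised on a trimmed-down record. This is a sketch under assumptions: the `Peer` interface below is a reduced, hypothetical stand-in for `PeerRecord`, and `annotate` is an illustrative name.

```typescript
type PeerRole = "control-plane" | "session" | "service";

// Reduced stand-in for PeerRecord (assumption: only the fields used here).
interface Peer {
  pubkey: string;
  memberPubkey?: string;
  peerRole?: PeerRole;
  role?: string;
  profile?: { role?: string };
  isSelf?: boolean;
  isThisSession?: boolean;
}

function annotate(
  peer: Peer,
  selfMember: string | null,
  selfSession: string | null,
): Peer {
  // Same member key means "one of my sessions", whichever session it is.
  const isSelf = !!selfMember && peer.memberPubkey === selfMember;
  // The exact session additionally matches the per-session pubkey.
  const isThisSession = isSelf && !!selfSession && peer.pubkey === selfSession;
  // Whitespace-only roles count as unset.
  const profileRole = peer.profile?.role?.trim() || undefined;
  return {
    ...peer,
    ...(profileRole ? { role: profileRole } : {}),
    peerRole: peer.peerRole ?? "session", // older brokers omit peerRole
    isSelf,
    isThisSession,
  };
}
```

Spreading `role` conditionally, rather than assigning `undefined`, keeps the key out of the JSON output when no profile role is set.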
|
||||||
|
|
||||||
export async function runPeers(flags: PeersFlags): Promise<void> {
|
export async function runPeers(flags: PeersFlags): Promise<void> {
|
||||||
const config = readConfig();
|
const config = readConfig();
|
||||||
const slugs = flags.mesh ? [flags.mesh] : config.meshes.map((m) => m.slug);
|
|
||||||
|
// Mesh selection precedence:
|
||||||
|
// 1. explicit --mesh <slug> (always wins)
|
||||||
|
// 2. session-token mesh (when invoked from inside a launched session)
|
||||||
|
// 3. all joined meshes (default for bare shells)
|
||||||
|
let slugs: string[];
|
||||||
|
if (flags.mesh) {
|
||||||
|
slugs = [flags.mesh];
|
||||||
|
} else {
|
||||||
|
const { getSessionInfo } = await import("~/services/session/resolve.js");
|
||||||
|
const sess = await getSessionInfo();
|
||||||
|
slugs = sess ? [sess.mesh] : config.meshes.map((m) => m.slug);
|
||||||
|
}
|
||||||
|
|
||||||
if (slugs.length === 0) {
|
if (slugs.length === 0) {
|
||||||
render.err("No meshes joined.");
|
render.err("No meshes joined.");
|
||||||
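The mesh selection precedence listed in `runPeers` reduces to a three-way choice. A minimal sketch, assuming a hypothetical helper `selectMeshSlugs` that receives the already-resolved inputs (the real code reads them from `flags`, `getSessionInfo()`, and `readConfig()`):

```typescript
/**
 * Precedence, highest first:
 *   1. explicit --mesh flag
 *   2. mesh of the launched session, if any
 *   3. every joined mesh
 */
function selectMeshSlugs(
  flagMesh: string | undefined,
  sessionMesh: string | null,
  joined: string[],
): string[] {
  if (flagMesh) return [flagMesh];
  if (sessionMesh) return [sessionMesh];
  return joined;
}
```

An empty result can only come from the third branch, which is the case the "No meshes joined." check then handles.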
@@ -98,21 +249,41 @@ export async function runPeers(flags: PeersFlags): Promise<void> {
|
|||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
render.section(`peers on ${slug} (${peers.length})`);
|
// Hide control-plane rows by default — they're infrastructure
|
||||||
|
// (daemon-WS member-keyed presence), not interactive peers, and
|
||||||
|
// they confused users into thinking the daemon counted as a
|
||||||
|
// separate peer. --all opts back in for debugging.
|
||||||
|
//
|
||||||
|
// Source of truth: broker-emitted `peerRole` field (added
|
||||||
|
// 2026-05-04). annotateSelf() filled in 'session' for older
|
||||||
|
// brokers that don't emit peerRole yet, so this filter is
|
||||||
|
// backwards-compatible by construction — legacy rows show up.
|
||||||
|
const visible = flags.all
|
||||||
|
? peers
|
||||||
|
: peers.filter((p) => p.peerRole !== "control-plane");
|
||||||
|
|
||||||
if (peers.length === 0) {
|
// Sort: this-session first, then your-other-sessions, then real
|
||||||
|
// peers. Within each group, idle/working ahead of dnd. Inside the
|
||||||
|
// groups, leave broker order. The point is: when you run peer
|
||||||
|
// list, the row that's YOU is row 1.
|
||||||
|
const sorted = visible.slice().sort((a, b) => {
|
||||||
|
const score = (p: PeerRecord) =>
|
||||||
|
p.isThisSession ? 0 : p.isSelf ? 1 : 2;
|
||||||
|
return score(a) - score(b);
|
||||||
|
});
|
||||||
|
|
||||||
|
const hiddenControlPlane = peers.length - visible.length;
|
||||||
|
const header = hiddenControlPlane > 0
|
||||||
|
? `peers on ${slug} (${sorted.length}, ${hiddenControlPlane} control-plane hidden — use --all)`
|
||||||
|
: `peers on ${slug} (${sorted.length})`;
|
||||||
|
render.section(header);
|
||||||
|
|
||||||
|
if (sorted.length === 0) {
|
||||||
render.info(dim(" (no peers connected)"));
|
render.info(dim(" (no peers connected)"));
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
for (const p of peers) {
|
for (const p of sorted) {
|
||||||
const groups = p.groups.length
|
|
||||||
? " [" +
|
|
||||||
p.groups
|
|
||||||
.map((g) => `@${g.name}${g.role ? `:${g.role}` : ""}`)
|
|
||||||
.join(", ") +
|
|
||||||
"]"
|
|
||||||
: "";
|
|
||||||
const statusDot = p.status === "working" ? yellow("●") : green("●");
|
const statusDot = p.status === "working" ? yellow("●") : green("●");
|
||||||
const name = bold(p.displayName);
|
const name = bold(p.displayName);
|
||||||
const meta: string[] = [];
|
const meta: string[] = [];
|
||||||
@@ -122,8 +293,46 @@ export async function runPeers(flags: PeersFlags): Promise<void> {
|
|||||||
const metaStr = meta.length ? dim(` (${meta.join(", ")})`) : "";
|
const metaStr = meta.length ? dim(` (${meta.join(", ")})`) : "";
|
||||||
const summary = p.summary ? dim(` — ${p.summary}`) : "";
|
const summary = p.summary ? dim(` — ${p.summary}`) : "";
|
||||||
const pubkeyTag = dim(` · ${p.pubkey.slice(0, 16)}…`);
|
const pubkeyTag = dim(` · ${p.pubkey.slice(0, 16)}…`);
|
||||||
render.info(`${statusDot} ${name}${groups}${metaStr}${pubkeyTag}${summary}`);
|
// Short sessionId tag — appears for sibling sessions of the same
|
||||||
|
// member that would otherwise be visually identical (same name,
|
||||||
|
// same cwd, only the truncated pubkey on the right differs).
|
||||||
|
const sidTag = p.sessionId
|
||||||
|
? dim(` · sid:${p.sessionId.slice(0, 8)}`)
|
||||||
|
: "";
|
||||||
|
const selfTag = p.isThisSession
|
||||||
|
? dim(" ") + yellow("(this session)")
|
||||||
|
: p.isSelf
|
||||||
|
? dim(" ") + yellow("(your other session)")
|
||||||
|
: "";
|
||||||
|
|
||||||
|
// Inline tags ("role:lead [@flexicar:reviewer, @oncall]") so the
|
||||||
|
// first thing the user sees beside the name is the access /
|
||||||
|
// affiliation context. Empty role + empty groups → omit the
|
||||||
|
// bracket entirely (the dim summary line below carries the
|
||||||
|
// explicit "(no role / no groups)" so JSON output is unaffected
|
||||||
|
// and screen readers don't get spammed with literal "no").
|
||||||
|
const inlineTags: string[] = [];
|
||||||
|
const peerRole = p.profile?.role?.trim();
|
||||||
|
if (peerRole) inlineTags.push(`role:${peerRole}`);
|
||||||
|
if (p.groups.length) {
|
||||||
|
inlineTags.push(
|
||||||
|
...p.groups.map((g) => `@${g.name}${g.role ? `:${g.role}` : ""}`),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
const tagsStr = inlineTags.length ? " [" + inlineTags.join(", ") + "]" : "";
|
||||||
|
|
||||||
|
render.info(
|
||||||
|
`${statusDot} ${name}${selfTag}${tagsStr}${metaStr}${pubkeyTag}${sidTag}${summary}`,
|
||||||
|
);
|
||||||
|
|
||||||
|
// Second line: cwd + an explicit role/groups footer when both
|
||||||
|
// are absent. Surfacing the absence is important — the previous
|
||||||
|
// renderer hid it, so users couldn't tell "no role set" from
|
||||||
|
// "the cli isn't showing roles".
|
||||||
if (p.cwd) render.info(dim(` cwd: ${p.cwd}`));
|
if (p.cwd) render.info(dim(` cwd: ${p.cwd}`));
|
||||||
|
if (!peerRole && p.groups.length === 0) {
|
||||||
|
render.info(dim(" role: (none) groups: (none)"));
|
||||||
|
}
|
||||||
}
|
}
|
||||||
} catch (e) {
|
} catch (e) {
|
||||||
render.err(`${slug}: ${e instanceof Error ? e.message : String(e)}`);
|
render.err(`${slug}: ${e instanceof Error ? e.message : String(e)}`);
|
||||||
|
|||||||
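The default-hide filter and the "you first" ordering in the peers renderer can be sketched together. The `Row` type and `visibleSorted` helper are hypothetical simplifications; the filter predicate and the scoring function follow the diff above.

```typescript
interface Row {
  name: string;
  peerRole: "control-plane" | "session" | "service";
  isSelf?: boolean;
  isThisSession?: boolean;
}

function visibleSorted(peers: Row[], all: boolean): { rows: Row[]; hidden: number } {
  // Control-plane rows are infrastructure; show them only with --all.
  const visible = all ? peers : peers.filter((p) => p.peerRole !== "control-plane");
  // 0 = this session, 1 = own sibling sessions, 2 = everyone else.
  const score = (p: Row) => (p.isThisSession ? 0 : p.isSelf ? 1 : 2);
  // slice() avoids mutating the input; the sort is stable in ES2019+ engines,
  // so rows with equal scores keep broker order.
  const rows = visible.slice().sort((a, b) => score(a) - score(b));
  return { rows, hidden: peers.length - visible.length };
}
```

The comparator ranks only by self-ness. Any ordering by status within a group would need an extra term in `score`, which neither this sketch nor the comparator in the diff includes.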
@@ -273,6 +273,24 @@ export async function runSqlSchema(opts: Flags): Promise<number> {
|
|||||||
// ════════════════════════════════════════════════════════════════════════
|
// ════════════════════════════════════════════════════════════════════════
|
||||||
|
|
||||||
export async function runSkillList(opts: Flags & { query?: string }): Promise<number> {
|
export async function runSkillList(opts: Flags & { query?: string }): Promise<number> {
|
||||||
|
// Daemon path — preferred when running. Mirror trySendViaDaemon shape.
|
||||||
|
try {
|
||||||
|
const { tryListSkillsViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||||
|
const dr = await tryListSkillsViaDaemon();
|
||||||
|
if (dr !== null) {
|
||||||
|
const skills = dr as Array<{ name: string; description: string; author: string; tags: string[] }>;
|
||||||
|
if (opts.json) { emitJson(skills); return EXIT.SUCCESS; }
|
||||||
|
if (skills.length === 0) { render.info(dim("(no skills)")); return EXIT.SUCCESS; }
|
||||||
|
render.section(`mesh skills (${skills.length})`);
|
||||||
|
for (const s of skills) {
|
||||||
|
process.stdout.write(` ${bold(s.name)} ${dim("· by " + s.author)}\n`);
|
||||||
|
process.stdout.write(` ${s.description}\n`);
|
||||||
|
if (s.tags?.length) process.stdout.write(` ${dim("tags: " + s.tags.join(", "))}\n`);
|
||||||
|
}
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
|
} catch { /* fall through to cold path */ }
|
||||||
|
|
||||||
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
const skills = await client.listSkills(opts.query);
|
const skills = await client.listSkills(opts.query);
|
||||||
if (opts.json) { emitJson(skills); return EXIT.SUCCESS; }
|
if (opts.json) { emitJson(skills); return EXIT.SUCCESS; }
|
||||||
@@ -289,6 +307,27 @@ export async function runSkillList(opts: Flags & { query?: string }): Promise<nu
|
|||||||
|
|
||||||
export async function runSkillGet(name: string, opts: Flags): Promise<number> {
|
export async function runSkillGet(name: string, opts: Flags): Promise<number> {
|
||||||
if (!name) { render.err("Usage: claudemesh skill get <name>"); return EXIT.INVALID_ARGS; }
|
if (!name) { render.err("Usage: claudemesh skill get <name>"); return EXIT.INVALID_ARGS; }
|
||||||
|
// Daemon path first.
|
||||||
|
try {
|
||||||
|
const { tryGetSkillViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||||
|
const dr = await tryGetSkillViaDaemon(name);
|
||||||
|
if (dr !== null) {
|
||||||
|
const skill = dr as { name: string; description: string; instructions: string; tags: string[]; author: string; createdAt: string };
|
||||||
|
if (opts.json) { emitJson(skill); return EXIT.SUCCESS; }
|
||||||
|
render.section(skill.name);
|
||||||
|
render.kv([
|
||||||
|
["author", skill.author],
|
||||||
|
["created", skill.createdAt],
|
||||||
|
["tags", skill.tags?.join(", ") || dim("(none)")],
|
||||||
|
]);
|
||||||
|
render.blank();
|
||||||
|
render.info(skill.description);
|
||||||
|
render.blank();
|
||||||
|
process.stdout.write(skill.instructions + "\n");
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
|
} catch { /* fall through */ }
|
||||||
|
|
||||||
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
const skill = await client.getSkill(name);
|
const skill = await client.getSkill(name);
|
||||||
if (!skill) { render.err(`skill "${name}" not found`); return EXIT.NOT_FOUND; }
|
if (!skill) { render.err(`skill "${name}" not found`); return EXIT.NOT_FOUND; }
|
||||||
@@ -348,6 +387,52 @@ export async function runVaultDelete(key: string, opts: Flags): Promise<number>
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
export interface VaultSetOpts extends Flags {
|
||||||
|
entryType?: "env" | "file";
|
||||||
|
mountPath?: string;
|
||||||
|
description?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function runVaultSet(key: string, value: string, opts: VaultSetOpts): Promise<number> {
|
||||||
|
if (!key || value == null) {
|
||||||
|
render.err("Usage: claudemesh vault set <key> <value> [--type env|file] [--mount /path] [--description ...]");
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
const { encryptFile, sealKeyForPeer } = await import("~/services/crypto/file-crypto.js");
|
||||||
|
const { getMeshConfig } = await import("~/services/config/facade.js");
|
||||||
|
const { readConfig } = await import("~/services/config/facade.js");
|
||||||
|
|
||||||
|
const config = readConfig();
|
||||||
|
const slug = opts.mesh ?? (config.meshes.length === 1 ? config.meshes[0]!.slug : null);
|
||||||
|
if (!slug) {
|
||||||
|
render.err("multiple meshes joined; pass --mesh <slug>");
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
const mesh = getMeshConfig(slug);
|
||||||
|
if (!mesh) { render.err(`not joined to mesh "${slug}"`); return EXIT.NOT_FOUND; }
|
||||||
|
|
||||||
|
const plaintext = new TextEncoder().encode(value);
|
||||||
|
const enc = await encryptFile(plaintext);
|
||||||
|
const ciphertextB64 = Buffer.from(enc.ciphertext).toString("base64");
|
||||||
|
const sealed = await sealKeyForPeer(enc.key, mesh.pubkey);
|
||||||
|
|
||||||
|
return await withMesh({ meshSlug: slug }, async (client) => {
|
||||||
|
const ok = await client.vaultSet(
|
||||||
|
key,
|
||||||
|
ciphertextB64,
|
||||||
|
enc.nonce,
|
||||||
|
sealed,
|
||||||
|
opts.entryType ?? "env",
|
||||||
|
opts.mountPath,
|
||||||
|
opts.description,
|
||||||
|
);
|
||||||
|
if (opts.json) emitJson({ key, stored: ok });
|
||||||
|
else if (ok) render.ok(`vault[${bold(key)}] stored`, dim(`(${ciphertextB64.length}b)`));
|
||||||
|
else render.err(`vault set failed for "${key}"`);
|
||||||
|
return ok ? EXIT.SUCCESS : EXIT.IO_ERROR;
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
// ════════════════════════════════════════════════════════════════════════
|
// ════════════════════════════════════════════════════════════════════════
|
||||||
// watch — URL change watchers
|
// watch — URL change watchers
|
||||||
// ════════════════════════════════════════════════════════════════════════
|
// ════════════════════════════════════════════════════════════════════════
|
||||||
@@ -368,6 +453,39 @@ export async function runWatchList(opts: Flags): Promise<number> {
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
export interface WatchAddOpts extends Flags {
|
||||||
|
label?: string;
|
||||||
|
interval?: number;
|
||||||
|
mode?: string;
|
||||||
|
extract?: string;
|
||||||
|
notifyOn?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function runWatchAdd(url: string, opts: WatchAddOpts): Promise<number> {
|
||||||
|
if (!url) {
|
||||||
|
render.err("Usage: claudemesh watch add <url> [--label ...] [--interval <sec>] [--extract <css>] [--notify-on changed|always]");
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
|
const result = await client.watch(url, {
|
||||||
|
label: opts.label,
|
||||||
|
interval: opts.interval,
|
||||||
|
mode: opts.mode,
|
||||||
|
extract: opts.extract,
|
||||||
|
notify_on: opts.notifyOn,
|
||||||
|
});
|
||||||
|
if (result?.error) {
|
||||||
|
if (opts.json) emitJson({ ok: false, error: result.error });
|
||||||
|
else render.err(`watch add failed: ${result.error}`);
|
||||||
|
return EXIT.IO_ERROR;
|
||||||
|
}
|
||||||
|
const id = String((result as any)?.id ?? (result as any)?.watch_id ?? "?");
|
||||||
|
if (opts.json) emitJson({ ok: true, id, url, ...(opts.label ? { label: opts.label } : {}) });
|
||||||
|
else render.ok(`watching ${clay(url)}`, dim(id.slice(0, 8)));
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
export async function runUnwatch(id: string, opts: Flags): Promise<number> {
|
export async function runUnwatch(id: string, opts: Flags): Promise<number> {
|
||||||
if (!id) { render.err("Usage: claudemesh watch remove <id>"); return EXIT.INVALID_ARGS; }
|
if (!id) { render.err("Usage: claudemesh watch remove <id>"); return EXIT.INVALID_ARGS; }
|
||||||
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
@@ -397,6 +515,28 @@ export async function runWebhookList(opts: Flags): Promise<number> {
|
|||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
|
export async function runWebhookCreate(name: string, opts: Flags): Promise<number> {
|
||||||
|
if (!name) {
|
||||||
|
render.err("Usage: claudemesh webhook create <name>");
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
|
const created = await client.createWebhook(name);
|
||||||
|
if (!created) {
|
||||||
|
if (opts.json) emitJson({ ok: false, error: "create failed (timeout or duplicate)" });
|
||||||
|
else render.err(`webhook create "${name}" failed`);
|
||||||
|
return EXIT.IO_ERROR;
|
||||||
|
}
|
||||||
|
if (opts.json) emitJson({ ok: true, ...created });
|
||||||
|
else {
|
||||||
|
render.ok(`created webhook ${bold(created.name)}`);
|
||||||
|
process.stdout.write(` url: ${clay(created.url)}\n`);
|
||||||
|
process.stdout.write(` secret: ${dim(created.secret)} ${dim("(shown once)")}\n`);
|
||||||
|
}
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
export async function runWebhookDelete(name: string, opts: Flags): Promise<number> {
|
export async function runWebhookDelete(name: string, opts: Flags): Promise<number> {
|
||||||
if (!name) { render.err("Usage: claudemesh webhook delete <name>"); return EXIT.INVALID_ARGS; }
|
if (!name) { render.err("Usage: claudemesh webhook delete <name>"); return EXIT.INVALID_ARGS; }
|
||||||
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
|
|||||||
@@ -1,4 +1,5 @@
|
|||||||
import { withMesh } from "./connect.js";
|
import { withMesh } from "./connect.js";
|
||||||
|
import { tryRecallViaDaemon } from "~/services/bridge/daemon-route.js";
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { bold, clay, dim } from "~/ui/styles.js";
|
import { bold, clay, dim } from "~/ui/styles.js";
|
||||||
import { EXIT } from "~/constants/exit-codes.js";
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
@@ -11,6 +12,22 @@ export async function recall(
|
|||||||
render.err("Usage: claudemesh recall <query>");
|
render.err("Usage: claudemesh recall <query>");
|
||||||
return EXIT.INVALID_ARGS;
|
return EXIT.INVALID_ARGS;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Daemon path first.
|
||||||
|
const daemonMatches = await tryRecallViaDaemon(query, opts.mesh);
|
||||||
|
if (daemonMatches !== null) {
|
||||||
|
if (opts.json) { console.log(JSON.stringify(daemonMatches, null, 2)); return EXIT.SUCCESS; }
|
||||||
|
if (daemonMatches.length === 0) { render.info(dim("no memories found.")); return EXIT.SUCCESS; }
|
||||||
|
render.section(`memories (${daemonMatches.length})`);
|
||||||
|
for (const m of daemonMatches) {
|
||||||
|
const tags = m.tags.length ? dim(` [${m.tags.map((t) => clay(t)).join(dim(", "))}]`) : "";
|
||||||
|
process.stdout.write(` ${bold(m.id.slice(0, 8))}${tags}\n`);
|
||||||
|
process.stdout.write(` ${m.content}\n`);
|
||||||
|
process.stdout.write(` ${dim(m.rememberedBy + " · " + new Date(m.rememberedAt).toLocaleString())}\n\n`);
|
||||||
|
}
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
|
|
||||||
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
const memories = await client.recall(query);
|
const memories = await client.recall(query);
|
||||||
|
|
||||||
|
|||||||
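Several commands in this diff (`runSkillList`, `runSkillGet`, `recall`, `remember`) share one routing shape: try the daemon, treat `null` as "daemon unavailable", and otherwise fall back to opening a fresh connection. The generic helper below is a sketch of that shape only; `daemonFirst` is a hypothetical name and does not exist in the codebase shown. The `try`/`catch` mirrors the skill commands; `recall` calls its daemon helper without one.

```typescript
/**
 * Run the fast path first. A `null` result means "daemon unavailable";
 * any other value, including an empty array, is a real answer.
 */
async function daemonFirst<T>(
  tryDaemon: () => Promise<T | null>,
  coldPath: () => Promise<T>,
): Promise<T> {
  try {
    const viaDaemon = await tryDaemon();
    if (viaDaemon !== null) return viaDaemon;
  } catch {
    // Daemon helper threw: fall through to the cold path.
  }
  return coldPath();
}
```

Checking `!== null` rather than truthiness matters here: an empty match list from the daemon is a valid "no memories found" answer and should not trigger a second, slower lookup.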
@@ -1,4 +1,5 @@
|
|||||||
import { withMesh } from "./connect.js";
|
import { withMesh } from "./connect.js";
|
||||||
|
import { tryRememberViaDaemon } from "~/services/bridge/daemon-route.js";
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { dim } from "~/ui/styles.js";
|
import { dim } from "~/ui/styles.js";
|
||||||
import { EXIT } from "~/constants/exit-codes.js";
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
@@ -12,6 +13,18 @@ export async function remember(
|
|||||||
return EXIT.INVALID_ARGS;
|
return EXIT.INVALID_ARGS;
|
||||||
}
|
}
|
||||||
const tags = opts.tags?.split(",").map((t) => t.trim()).filter(Boolean);
|
const tags = opts.tags?.split(",").map((t) => t.trim()).filter(Boolean);
|
||||||
|
|
||||||
|
// Daemon path first.
|
||||||
|
const daemonRes = await tryRememberViaDaemon(content, tags, opts.mesh);
|
||||||
|
if (daemonRes) {
|
||||||
|
if (opts.json) {
|
||||||
|
console.log(JSON.stringify({ id: daemonRes.id, content, tags, mesh: daemonRes.mesh }));
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
|
render.ok("remembered", dim(daemonRes.id.slice(0, 8)));
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
|
|
||||||
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
return await withMesh({ meshSlug: opts.mesh ?? null }, async (client) => {
|
||||||
const id = await client.remember(content, tags);
|
const id = await client.remember(content, tags);
|
||||||
|
|
||||||
|
|||||||
@@ -1,13 +1,72 @@
|
|||||||
import { rename as renameMesh } from "~/services/mesh/facade.js";
|
/**
|
||||||
import { green, icons } from "~/ui/styles.js";
|
* `claudemesh rename <old-slug> <new-slug>` — change a mesh's identifier.
|
||||||
|
*
|
||||||
|
* v0.7.0 collapse: slug IS the identifier — there is no separate
|
||||||
|
* "display name". Pre-launch we collapsed the model so users only ever
|
||||||
|
* deal with one identifier per mesh. The mesh.name column on the DB is
|
||||||
|
* kept for now (avoids touching ~25 reader sites) but is always synced
|
||||||
|
* to slug; a follow-up migration drops it.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { reslug as reslugMesh } from "~/services/mesh/facade.js";
|
||||||
|
import { getStoredToken } from "~/services/auth/facade.js";
|
||||||
|
import { ApiError } from "~/services/api/facade.js";
|
||||||
|
import { readConfig, setMeshConfig, removeMeshConfig } from "~/services/config/facade.js";
|
||||||
|
import { bold, dim, green, icons } from "~/ui/styles.js";
|
||||||
import { EXIT } from "~/constants/exit-codes.js";
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
|
|
||||||
export async function rename(slug: string, newName: string): Promise<number> {
|
const SLUG_RE = /^[a-z0-9][a-z0-9-]{1,31}$/;
|
||||||
|
|
||||||
|
export async function rename(oldSlug: string, newSlug: string): Promise<number> {
|
||||||
|
if (!oldSlug || !newSlug) {
|
||||||
|
console.error(` ${icons.cross} Usage: ${bold("claudemesh rename")} <old-slug> <new-slug>`);
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
if (!SLUG_RE.test(newSlug)) {
|
||||||
|
console.error(` ${icons.cross} Invalid slug: must be 2-32 chars, lowercase alnum + hyphens, start with alnum`);
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
if (oldSlug === newSlug) {
|
||||||
|
console.error(` ${icons.cross} Old and new slug are the same.`);
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
|
||||||
|
const auth = getStoredToken();
|
||||||
|
if (!auth) {
|
||||||
|
console.error(` ${icons.cross} Renaming a mesh requires a claudemesh.com account session.`);
|
||||||
|
console.error(` ${dim("Run")} ${bold("claudemesh login")} ${dim("first.")}`);
|
||||||
|
return EXIT.AUTH_FAILED;
|
||||||
|
}
|
||||||
|
|
||||||
|
const cfg = readConfig();
|
||||||
|
const collision = cfg.meshes.find((m) => m.slug === newSlug && m.slug !== oldSlug);
|
||||||
|
if (collision) {
|
||||||
|
console.error(` ${icons.cross} Slug "${newSlug}" already used locally by another joined mesh.`);
|
||||||
|
console.error(` ${dim("Pick a different slug, or leave the other mesh first.")}`);
|
||||||
|
return EXIT.ALREADY_EXISTS;
|
||||||
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
await renameMesh(slug, newName);
|
const updated = await reslugMesh(oldSlug, newSlug);
|
||||||
console.log(` ${green(icons.check)} Renamed "${slug}" to "${newName}"`);
|
const local = cfg.meshes.find((m) => m.slug === oldSlug);
|
||||||
|
if (local) {
|
||||||
|
removeMeshConfig(oldSlug);
|
||||||
|
setMeshConfig(updated.slug, { ...local, slug: updated.slug, name: updated.slug });
|
||||||
|
}
|
||||||
|
console.log(` ${green(icons.check)} Renamed: "${oldSlug}" → "${updated.slug}"`);
|
||||||
|
console.log(` ${dim("Other peers will pick up the new identifier after they run")} ${bold("claudemesh sync")}`);
|
||||||
return EXIT.SUCCESS;
|
return EXIT.SUCCESS;
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
|
if (err instanceof ApiError) {
|
||||||
|
const body = err.body as { error?: string } | undefined;
|
||||||
|
console.error(` ${icons.cross} ${body?.error ?? err.statusText}`);
|
||||||
|
if (err.status === 401) return EXIT.AUTH_FAILED;
|
||||||
|
if (err.status === 403) return EXIT.PERMISSION_DENIED;
|
||||||
|
if (err.status === 404) return EXIT.NOT_FOUND;
|
||||||
|
if (err.status === 409) return EXIT.ALREADY_EXISTS;
|
||||||
|
if (err.status === 400) return EXIT.INVALID_ARGS;
|
||||||
|
return EXIT.INTERNAL_ERROR;
|
||||||
|
}
|
||||||
console.error(` ${icons.cross} Failed: ${err instanceof Error ? err.message : err}`);
|
console.error(` ${icons.cross} Failed: ${err instanceof Error ? err.message : err}`);
|
||||||
return EXIT.INTERNAL_ERROR;
|
return EXIT.INTERNAL_ERROR;
|
||||||
}
|
}
|
||||||
|
|||||||
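The pre-flight checks that the new `rename` runs before calling the API can be read as one pure function. The sketch below is illustrative only: `checkNewSlug`, `SlugCheck`, and the reason strings are hypothetical names that do not appear in the diff. Only `SLUG_RE` and the order of the checks are taken from it.

```typescript
// Hypothetical helper, not part of the diff. It mirrors the order of the
// argument checks in rename(): presence, format, no-op, local collision.
const SLUG_RE = /^[a-z0-9][a-z0-9-]{1,31}$/;

type SlugCheck = { ok: true } | { ok: false; reason: string };

function checkNewSlug(
  oldSlug: string,
  newSlug: string,
  joinedSlugs: string[],
): SlugCheck {
  if (!oldSlug || !newSlug) return { ok: false, reason: "missing-argument" };
  // 2-32 chars, lowercase alphanumerics and hyphens, first char alphanumeric.
  if (!SLUG_RE.test(newSlug)) return { ok: false, reason: "invalid-slug" };
  if (oldSlug === newSlug) return { ok: false, reason: "unchanged" };
  // Another locally joined mesh already uses the requested slug.
  if (joinedSlugs.includes(newSlug)) return { ok: false, reason: "local-collision" };
  return { ok: true };
}
```

Keeping these checks ahead of the network call means a malformed or colliding slug is rejected without touching the server, which matches the early `EXIT.INVALID_ARGS` and `EXIT.ALREADY_EXISTS` returns in the diff.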
@@ -13,7 +13,7 @@
|
|||||||
|
|
||||||
import { withMesh } from "./connect.js";
|
import { withMesh } from "./connect.js";
|
||||||
import { readConfig } from "~/services/config/facade.js";
|
import { readConfig } from "~/services/config/facade.js";
|
||||||
import { tryBridge } from "~/services/bridge/client.js";
|
import { trySendViaDaemon } from "~/services/bridge/daemon-route.js";
|
||||||
import type { Priority } from "~/services/broker/facade.js";
|
import type { Priority } from "~/services/broker/facade.js";
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { dim } from "~/ui/styles.js";
|
import { dim } from "~/ui/styles.js";
|
||||||
@@ -22,6 +22,11 @@ export interface SendFlags {
|
|||||||
mesh?: string;
|
mesh?: string;
|
||||||
priority?: string;
|
priority?: string;
|
||||||
json?: boolean;
|
json?: boolean;
|
||||||
|
/** Allow sending to a target that resolves to one of the caller's
|
||||||
|
* own sessions. Off by default — trying to message your own
|
||||||
|
* sibling session is almost always an accident (copying a hex
|
||||||
|
* pubkey from `peer list` without realizing it was your own row). */
|
||||||
|
self?: boolean;
|
||||||
}
|
}
|
||||||
|
|
||||||
export async function runSend(flags: SendFlags, to: string, message: string): Promise<void> {
|
export async function runSend(flags: SendFlags, to: string, message: string): Promise<void> {
|
||||||
@@ -42,31 +47,166 @@ export async function runSend(flags: SendFlags, to: string, message: string): Pr
|
|||||||
flags.mesh ??
|
flags.mesh ??
|
||||||
(config.meshes.length === 1 ? config.meshes[0]!.slug : null);
|
(config.meshes.length === 1 ? config.meshes[0]!.slug : null);
|
||||||
|
|
||||||
// Warm path — only when mesh is unambiguous.
|
// 1.31.6: hex-prefix resolution. If `to` looks like hex but isn't a
|
||||||
if (meshSlug) {
|
// full 64-char pubkey, resolve it against the peer list and replace
|
||||||
const bridged = await tryBridge(meshSlug, "send", { to, message, priority });
|
// it with the matching full pubkey. The broker stores `targetSpec`
|
||||||
if (bridged !== null) {
|
// verbatim and the drain query at apps/broker/src/broker.ts:2408
|
||||||
if (bridged.ok) {
|
// matches only on full pubkeys, so a 16-hex prefix would queue
|
||||||
const r = bridged.result as { messageId?: string };
|
// successfully but never fetch — sender saw "sent", recipient saw
|
||||||
if (flags.json) {
|
// nothing. Resolving here makes the CLI's prefix UX work end-to-end
|
||||||
console.log(JSON.stringify({ ok: true, messageId: r.messageId, target: to }));
|
// and surfaces ambiguous / unmatched prefixes with a clear error
|
||||||
} else {
|
// instead of a silent drop.
|
||||||
render.ok(`sent to ${to}`, r.messageId ? dim(r.messageId.slice(0, 8)) : undefined);
|
if (
|
||||||
}
|
!to.startsWith("@") &&
|
||||||
return;
|
!to.startsWith("#") &&
|
||||||
|
to !== "*" &&
|
||||||
|
/^[0-9a-f]{4,63}$/i.test(to)
|
||||||
|
) {
|
||||||
|
try {
|
||||||
|
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||||
|
const peers = (await tryListPeersViaDaemon()) ?? [];
|
||||||
|
const lower = to.toLowerCase();
|
||||||
|
const matches = peers.filter((p) => {
|
||||||
|
const pk = (p as { pubkey?: string }).pubkey ?? "";
|
||||||
|
const mpk = (p as { memberPubkey?: string }).memberPubkey ?? "";
|
||||||
|
return pk.toLowerCase().startsWith(lower) || mpk.toLowerCase().startsWith(lower);
|
||||||
|
});
|
||||||
|
if (matches.length === 0) {
|
||||||
|
render.err(`No peer matches hex prefix "${to}".`);
|
||||||
|
const names = peers
|
||||||
|
.map((p) => (p as { displayName?: string }).displayName)
|
||||||
|
.filter(Boolean)
|
||||||
|
.join(", ");
|
||||||
|
if (names) render.hint(`online: ${names}`);
|
||||||
|
process.exit(1);
|
||||||
}
|
}
|
||||||
// Bridge reachable but op failed — surface error, don't fall through.
|
if (matches.length > 1) {
|
||||||
if (flags.json) {
|
const candidates = matches
|
||||||
console.log(JSON.stringify({ ok: false, error: bridged.error }));
|
.map((p) => {
|
||||||
} else {
|
const pk = (p as { pubkey?: string }).pubkey ?? "";
|
||||||
render.err(`send failed: ${bridged.error}`);
|
const dn = (p as { displayName?: string }).displayName ?? "?";
|
||||||
|
return `${dn} ${pk.slice(0, 16)}…`;
|
||||||
|
})
|
||||||
|
.join(", ");
|
||||||
|
render.err(`Ambiguous hex prefix "${to}" — matches ${matches.length} peers.`);
|
||||||
|
render.hint(`candidates: ${candidates}`);
|
||||||
|
render.hint("Use a longer prefix or paste the full 64-char pubkey.");
|
||||||
|
process.exit(1);
|
||||||
}
|
}
|
||||||
process.exit(1);
|
to = (matches[0] as { pubkey?: string }).pubkey ?? to;
|
||||||
|
} catch {
|
||||||
|
// Daemon unreachable — fall through; cold path will try a name
|
||||||
|
// lookup and surface its own error if that also fails.
|
||||||
}
|
}
|
||||||
// bridged === null → bridge unreachable, fall through to cold path
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// Cold path
|
// Self-DM safety check: if target is a 64-char hex that matches the
|
||||||
|
// caller's own member pubkey, refuse without --self. Catches the
|
||||||
|
// common pasted-from-peer-list-not-realizing-it-was-mine footgun.
|
||||||
|
// With --self, member-pubkey targeting fans out to every connected
|
||||||
|
// sibling session of your member (the broker's drain only matches
|
||||||
|
// exact session pubkeys, so we resolve here in the CLI).
|
||||||
|
if (meshSlug) {
|
||||||
|
const joined = config.meshes.find((m) => m.slug === meshSlug);
|
||||||
|
const isOwnMemberKey =
|
||||||
|
joined && /^[0-9a-f]{64}$/i.test(to) && to.toLowerCase() === joined.pubkey.toLowerCase();
|
||||||
|
|
||||||
|
if (isOwnMemberKey && !flags.self) {
|
||||||
|
render.err(
|
||||||
|
`Target "${to.slice(0, 16)}…" is your own member pubkey on mesh "${meshSlug}".`,
|
||||||
|
);
|
||||||
|
render.hint(
|
||||||
|
"Pass --self to message a sibling session of your own member, or pick a different peer's pubkey.",
|
||||||
|
);
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
|
||||||
|
if (isOwnMemberKey && flags.self) {
|
||||||
|
// Member-pubkey fan-out: resolve to every connected sibling
|
||||||
|
// session pubkey and send one message per recipient. Required
|
||||||
|
// because the broker's drain query at apps/broker/src/broker.ts
|
||||||
|
// matches target_spec only against full session pubkeys —
|
||||||
|
// sending to a member pubkey would queue successfully but no
|
||||||
|
// drain would fetch.
|
||||||
|
try {
|
||||||
|
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||||
|
const { getSessionInfo } = await import("~/services/session/resolve.js");
|
||||||
|
const peers = (await tryListPeersViaDaemon()) ?? [];
|
||||||
|
const session = await getSessionInfo();
|
||||||
|
const ownSessionPk = session?.presence?.sessionPubkey?.toLowerCase();
|
||||||
|
const siblings = peers.filter((p) => {
|
||||||
|
const r = p as { memberPubkey?: string; pubkey?: string; channel?: string };
|
||||||
|
if (!r.pubkey) return false;
|
||||||
|
if (ownSessionPk && r.pubkey.toLowerCase() === ownSessionPk) return false;
|
||||||
|
if (r.channel === "claudemesh-daemon") return false;
|
||||||
|
return r.memberPubkey?.toLowerCase() === to.toLowerCase();
|
||||||
|
});
|
||||||
|
if (siblings.length === 0) {
|
||||||
|
render.err(`--self fan-out: no other sibling sessions of your member online.`);
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
const results: Array<{ pubkey: string; ok: boolean; messageId?: string; error?: string }> = [];
|
||||||
|
for (const peer of siblings) {
|
||||||
|
const pk = (peer as { pubkey: string }).pubkey;
|
||||||
|
const dr = await trySendViaDaemon({ to: pk, message, priority, expectedMesh: meshSlug ?? undefined });
|
||||||
|
if (dr === null) {
|
||||||
|
results.push({ pubkey: pk, ok: false, error: "daemon path unavailable" });
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
if (dr.ok) {
|
||||||
|
results.push({
|
||||||
|
pubkey: pk,
|
||||||
|
ok: true,
|
||||||
|
...(dr.messageId ? { messageId: dr.messageId } : {}),
|
||||||
|
});
|
||||||
|
} else {
|
||||||
|
results.push({ pubkey: pk, ok: false, error: dr.error });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
const okCount = results.filter((r) => r.ok).length;
|
||||||
|
if (flags.json) {
|
||||||
|
console.log(JSON.stringify({ ok: okCount > 0, fanout: results, via: "daemon" }));
|
||||||
|
} else if (okCount === results.length) {
|
||||||
|
render.ok(`fanned out to ${okCount} sibling session${okCount === 1 ? "" : "s"} (daemon)`);
|
||||||
|
for (const r of results) render.info(dim(` → ${r.pubkey.slice(0, 16)}… ${r.messageId ? dim(r.messageId.slice(0, 8)) : ""}`));
|
||||||
|
} else {
|
||||||
|
render.warn(`fanned out: ${okCount}/${results.length} delivered`);
|
||||||
|
for (const r of results) {
|
||||||
|
const tag = r.ok ? "✔" : "✘";
|
||||||
|
render.info(` ${tag} ${r.pubkey.slice(0, 16)}… ${r.error ? dim(`— ${r.error}`) : ""}`);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
} catch (e) {
|
||||||
|
render.err(`--self fan-out failed: ${e instanceof Error ? e.message : String(e)}`);
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Daemon path — preferred when a long-lived daemon is local. UDS at
|
||||||
|
// ~/.claudemesh/daemon/daemon.sock; ~1ms round-trip; persists outbox
|
||||||
|
// across CLI invocations so a `claudemesh send` survives a daemon
|
||||||
|
// crash via the on-disk outbox.
|
||||||
|
{
|
||||||
|
const dr = await trySendViaDaemon({ to, message, priority, expectedMesh: meshSlug ?? undefined });
|
||||||
|
if (dr !== null) {
|
||||||
|
if (dr.ok) {
|
||||||
|
if (flags.json) console.log(JSON.stringify({ ok: true, messageId: dr.messageId, target: to, via: "daemon", duplicate: !!dr.duplicate }));
|
||||||
|
else render.ok(`sent to ${to} (daemon)`, dr.messageId ? dim(dr.messageId.slice(0, 8)) : undefined);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
// Daemon answered but rejected (409 idempotency, 400 schema). Surface; do not fall through.
|
||||||
|
if (flags.json) console.log(JSON.stringify({ ok: false, error: dr.error, via: "daemon" }));
|
||||||
|
else render.err(`send failed (daemon): ${dr.error}`);
|
||||||
|
process.exit(1);
|
||||||
|
}
|
||||||
|
// dr === null → daemon not running and lifecycle couldn't auto-
|
||||||
|
// spawn it; fall through to cold path. The orphaned bridge tier
|
||||||
|
// was removed in 1.28.0.
|
||||||
|
}
|
||||||
|
|
||||||
|
// Cold path — open our own WS, encrypt locally, fire envelope.
|
||||||
await withMesh({ meshSlug: flags.mesh ?? null }, async (client) => {
|
await withMesh({ meshSlug: flags.mesh ?? null }, async (client) => {
|
||||||
let targetSpec = to;
|
let targetSpec = to;
|
||||||
if (to.startsWith("#") && !/^#[0-9a-z_-]{20,}$/i.test(to)) {
|
if (to.startsWith("#") && !/^#[0-9a-z_-]{20,}$/i.test(to)) {
|
||||||
|
|||||||
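The hex-prefix resolution added to `runSend` can be sketched as a pure function over a peer list. This is an illustration under assumptions, not the code from the diff: `Peer`, `Resolution`, and `resolveHexPrefix` are hypothetical names, and the real command fetches peers from the daemon and exits the process on failure instead of returning a value.

```typescript
// Hypothetical types and helper; only the matching rules come from the diff.
interface Peer {
  pubkey: string;
  memberPubkey?: string;
  displayName?: string;
}

type Resolution =
  | { kind: "passthrough"; target: string } // not a hex prefix, leave as-is
  | { kind: "resolved"; target: string }    // exactly one peer matched
  | { kind: "none" }                        // no peer matched
  | { kind: "ambiguous"; candidates: Peer[] };

function resolveHexPrefix(to: string, peers: Peer[]): Resolution {
  // 4-63 hex chars: shorter is treated as a name, 64 is already a full key.
  const looksLikePrefix =
    !to.startsWith("@") &&
    !to.startsWith("#") &&
    to !== "*" &&
    /^[0-9a-f]{4,63}$/i.test(to);
  if (!looksLikePrefix) return { kind: "passthrough", target: to };

  const lower = to.toLowerCase();
  const matches = peers.filter(
    (p) =>
      p.pubkey.toLowerCase().startsWith(lower) ||
      (p.memberPubkey ?? "").toLowerCase().startsWith(lower),
  );
  if (matches.length === 0) return { kind: "none" };
  if (matches.length > 1) return { kind: "ambiguous", candidates: matches };
  // Replace the prefix with the full session pubkey of the single match.
  return { kind: "resolved", target: matches[0]!.pubkey };
}
```

Returning a tagged result keeps the three outcomes (unique match, no match, ambiguous) explicit, which is the same distinction the diff surfaces through `render.err` and `render.hint` instead of letting a prefix queue silently.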
21
apps/cli/src/commands/skill.ts
Normal file
@@ -0,0 +1,21 @@
|
|||||||
|
/**
|
||||||
|
* `claudemesh skill` — print the bundled SKILL.md to stdout.
|
||||||
|
*
|
||||||
|
* Zero-install access: the skill is embedded into the binary at build
|
||||||
|
* time via Bun's text-import attribute, so a fresh `npm i -g` user
|
||||||
|
* (or someone running the prebuilt binary) can pipe the contents into
|
||||||
|
* Claude Code (or anywhere else) without copying files into
|
||||||
|
* ~/.claude/skills.
|
||||||
|
*
|
||||||
|
* claudemesh skill | claude --skill-add -
|
||||||
|
* claudemesh skill > /tmp/cm.md
|
||||||
|
*/
|
||||||
|
|
||||||
|
import skillContent from "../../skills/claudemesh/SKILL.md" with { type: "text" };
|
||||||
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
|
|
||||||
|
export async function runSkill(): Promise<number> {
|
||||||
|
process.stdout.write(skillContent);
|
||||||
|
if (!skillContent.endsWith("\n")) process.stdout.write("\n");
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
@@ -5,6 +5,7 @@
|
|||||||
*/
|
*/
|
||||||
|
|
||||||
import { withMesh } from "./connect.js";
|
import { withMesh } from "./connect.js";
|
||||||
|
import { tryGetStateViaDaemon, tryListStateViaDaemon, trySetStateViaDaemon } from "~/services/bridge/daemon-route.js";
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { bold, dim } from "~/ui/styles.js";
|
import { bold, dim } from "~/ui/styles.js";
|
||||||
|
|
||||||
@@ -14,6 +15,16 @@ export interface StateFlags {
|
|||||||
}
|
}
|
||||||
|
|
||||||
export async function runStateGet(flags: StateFlags, key: string): Promise<void> {
|
export async function runStateGet(flags: StateFlags, key: string): Promise<void> {
|
||||||
|
// Daemon path first.
|
||||||
|
const daemonEntry = await tryGetStateViaDaemon(key, flags.mesh);
|
||||||
|
if (daemonEntry !== null) {
|
||||||
|
if (!daemonEntry) { render.info(dim("(not set)")); return; }
|
||||||
|
if (flags.json) { console.log(JSON.stringify(daemonEntry, null, 2)); return; }
|
||||||
|
const val = typeof daemonEntry.value === "string" ? daemonEntry.value : JSON.stringify(daemonEntry.value);
|
||||||
|
render.info(val);
|
||||||
|
render.info(dim(` set by ${daemonEntry.updatedBy} at ${new Date(daemonEntry.updatedAt).toLocaleString()}`));
|
||||||
|
return;
|
||||||
|
}
|
||||||
await withMesh({ meshSlug: flags.mesh ?? null }, async (client) => {
|
await withMesh({ meshSlug: flags.mesh ?? null }, async (client) => {
|
||||||
const entry = await client.getState(key);
|
const entry = await client.getState(key);
|
||||||
if (!entry) {
|
if (!entry) {
|
||||||
@@ -38,6 +49,12 @@ export async function runStateSet(flags: StateFlags, key: string, value: string)
|
|||||||
parsed = value;
|
parsed = value;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Daemon path first.
|
||||||
|
const daemonOk = await trySetStateViaDaemon(key, parsed, flags.mesh);
|
||||||
|
if (daemonOk) {
|
||||||
|
render.ok(`${bold(key)} = ${JSON.stringify(parsed)}`);
|
||||||
|
return;
|
||||||
|
}
|
||||||
await withMesh({ meshSlug: flags.mesh ?? null }, async (client) => {
|
await withMesh({ meshSlug: flags.mesh ?? null }, async (client) => {
|
||||||
await client.setState(key, parsed);
|
await client.setState(key, parsed);
|
||||||
render.ok(`${bold(key)} = ${JSON.stringify(parsed)}`);
|
render.ok(`${bold(key)} = ${JSON.stringify(parsed)}`);
|
||||||
@@ -45,6 +62,19 @@ export async function runStateSet(flags: StateFlags, key: string, value: string)
|
|||||||
}
|
}
|
||||||
|
|
||||||
export async function runStateList(flags: StateFlags): Promise<void> {
|
export async function runStateList(flags: StateFlags): Promise<void> {
|
||||||
|
// Daemon path first.
|
||||||
|
const daemonRows = await tryListStateViaDaemon(flags.mesh);
|
||||||
|
if (daemonRows !== null) {
|
||||||
|
if (flags.json) { console.log(JSON.stringify(daemonRows, null, 2)); return; }
|
||||||
|
if (daemonRows.length === 0) { render.info(dim("(no state)")); return; }
|
||||||
|
render.section(`state (${daemonRows.length})`);
|
||||||
|
for (const e of daemonRows) {
|
||||||
|
const val = typeof e.value === "string" ? e.value : JSON.stringify(e.value);
|
||||||
|
process.stdout.write(` ${bold(e.key)}: ${val}\n`);
|
||||||
|
process.stdout.write(` ${dim(e.updatedBy + " · " + new Date(e.updatedAt).toLocaleString())}\n`);
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
await withMesh({ meshSlug: flags.mesh ?? null }, async (client, mesh) => {
|
await withMesh({ meshSlug: flags.mesh ?? null }, async (client, mesh) => {
|
||||||
const entries = await client.listState();
|
const entries = await client.listState();
|
||||||
|
|
||||||
|
|||||||
167
apps/cli/src/commands/topic-post.ts
Normal file
@@ -0,0 +1,167 @@
|
|||||||
|
/**
|
||||||
|
* `claudemesh topic post <name> <message>` — REST-encrypted send.
|
||||||
|
*
|
||||||
|
* Distinct from `claudemesh topic send` (WS-based, currently v1
|
||||||
|
* plaintext). This verb:
|
||||||
|
* 1. Mints an ephemeral REST apikey scoped to the topic.
|
||||||
|
* 2. Fetches + decrypts the topic key (crypto_box).
|
||||||
|
* 3. Encrypts the body with crypto_secretbox under the topic key.
|
||||||
|
* 4. POSTs body_version: 2 ciphertext to /api/v1/messages.
|
||||||
|
* 5. Revokes the apikey.
|
||||||
|
*
|
||||||
|
* If the topic doesn't yet have a sealed key for this member (404
|
||||||
|
* not_sealed) we surface a clear error and skip — the user must wait
|
||||||
|
* for a holder to re-seal.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { withRestKey } from "~/services/api/with-rest-key.js";
|
||||||
|
import { request } from "~/services/api/client.js";
|
||||||
|
import {
|
||||||
|
getTopicKey,
|
||||||
|
encryptMessage,
|
||||||
|
} from "~/services/crypto/topic-key.js";
|
||||||
|
import { render } from "~/ui/render.js";
|
||||||
|
import { clay, dim, green } from "~/ui/styles.js";
|
||||||
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
|
|
||||||
|
export interface TopicPostFlags {
|
||||||
|
mesh?: string;
|
||||||
|
json?: boolean;
|
||||||
|
/** Force v1 plaintext send even if the topic is encrypted. */
|
||||||
|
plaintext?: boolean;
|
||||||
|
/** Reply-to message id (full id, or a prefix of at least 6 characters). */
|
||||||
|
replyTo?: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
interface PostResponse {
|
||||||
|
messageId: string | null;
|
||||||
|
historyId: string | null;
|
||||||
|
topic: string;
|
||||||
|
topicId: string;
|
||||||
|
notifications: number;
|
||||||
|
replyToId?: string | null;
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function runTopicPost(
|
||||||
|
topicName: string,
|
||||||
|
message: string,
|
||||||
|
flags: TopicPostFlags,
|
||||||
|
): Promise<number> {
|
||||||
|
if (!topicName || !message) {
|
||||||
|
render.err("Usage: claudemesh topic post <topic> <message>");
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
const cleanName = topicName.replace(/^#/, "");
|
||||||
|
|
||||||
|
// Extract @-mention tokens for write-time fan-out so the server can
|
||||||
|
// populate notifications without reading ciphertext.
|
||||||
|
const mentions: string[] = [];
|
||||||
|
const mentionRe = /(^|[^A-Za-z0-9_-])@([A-Za-z0-9_-]{1,64})(?=$|[^A-Za-z0-9_-])/g;
|
||||||
|
let m: RegExpExecArray | null;
|
||||||
|
while ((m = mentionRe.exec(message)) !== null) {
|
||||||
|
mentions.push(m[2]!.toLowerCase());
|
||||||
|
if (mentions.length >= 16) break;
|
||||||
|
}
|
||||||
|
|
||||||
|
return withRestKey(
|
||||||
|
{
|
||||||
|
meshSlug: flags.mesh ?? null,
|
||||||
|
purpose: `post-${cleanName}`,
|
||||||
|
capabilities: ["read", "send"],
|
||||||
|
topicScopes: [cleanName],
|
||||||
|
},
|
||||||
|
async ({ secret, mesh }) => {
|
||||||
|
let bodyVersion: 1 | 2 = 1;
|
||||||
|
let ciphertext: string;
|
||||||
|
let nonce: string;
|
||||||
|
|
||||||
|
if (flags.plaintext) {
|
||||||
|
// Explicit v1: caller wants plaintext. Encode UTF-8 → base64.
|
||||||
|
ciphertext = Buffer.from(message, "utf-8").toString("base64");
|
||||||
|
nonce = Buffer.from(new Uint8Array(24)).toString("base64");
|
||||||
|
} else {
|
||||||
|
const keyResult = await getTopicKey({
|
||||||
|
apiKeySecret: secret,
|
||||||
|
memberSecretKeyHex: mesh.secretKey,
|
||||||
|
topicName: cleanName,
|
||||||
|
});
|
||||||
|
if (keyResult.ok && keyResult.topicKey) {
|
||||||
|
const enc = await encryptMessage(keyResult.topicKey, message);
|
||||||
|
ciphertext = enc.ciphertext;
|
||||||
|
nonce = enc.nonce;
|
||||||
|
bodyVersion = 2;
|
||||||
|
} else if (keyResult.error === "topic_unencrypted") {
|
||||||
|
// Legacy v0.2.0 topic — fall back to v1 plaintext.
|
||||||
|
ciphertext = Buffer.from(message, "utf-8").toString("base64");
|
||||||
|
nonce = Buffer.from(new Uint8Array(24)).toString("base64");
|
||||||
|
} else {
|
||||||
|
render.err(
|
||||||
|
`cannot encrypt for #${cleanName}: ${keyResult.error ?? "unknown"}${
|
||||||
|
keyResult.message ? " — " + keyResult.message : ""
|
||||||
|
}`,
|
||||||
|
);
|
||||||
|
return EXIT.INTERNAL_ERROR;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Resolve reply-to: accept full id or 6+ char prefix by querying recent
|
||||||
|
// history once and matching. Server validates same-topic membership.
|
||||||
|
let replyToId: string | undefined;
|
||||||
|
if (flags.replyTo) {
|
||||||
|
if (flags.replyTo.length >= 16) {
|
||||||
|
replyToId = flags.replyTo;
|
||||||
|
} else if (flags.replyTo.length >= 6) {
|
||||||
|
const recent = await request<{
|
||||||
|
messages: Array<{ id: string }>;
|
||||||
|
}>({
|
||||||
|
path: `/api/v1/topics/${encodeURIComponent(cleanName)}/messages?limit=200`,
|
||||||
|
method: "GET",
|
||||||
|
token: secret,
|
||||||
|
});
|
||||||
|
const hit = recent.messages?.find((r) =>
|
||||||
|
r.id.startsWith(flags.replyTo!),
|
||||||
|
);
|
||||||
|
if (!hit) {
|
||||||
|
render.err(
|
||||||
|
`--reply-to ${flags.replyTo}: no recent message id starts with that prefix`,
|
||||||
|
);
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
replyToId = hit.id;
|
||||||
|
} else {
|
||||||
|
render.err("--reply-to needs at least 6 characters of the message id");
|
||||||
|
return EXIT.INVALID_ARGS;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
const result = await request<PostResponse>({
|
||||||
|
path: "/api/v1/messages",
|
||||||
|
method: "POST",
|
||||||
|
token: secret,
|
||||||
|
body: {
|
||||||
|
topic: cleanName,
|
||||||
|
ciphertext,
|
||||||
|
nonce,
|
||||||
|
bodyVersion,
|
||||||
|
...(mentions.length > 0 ? { mentions } : {}),
|
||||||
|
...(replyToId ? { replyToId } : {}),
|
||||||
|
},
|
||||||
|
});
|
||||||
|
|
||||||
|
if (flags.json) {
|
||||||
|
console.log(JSON.stringify({ ...result, bodyVersion, mentions }));
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
}
|
||||||
|
|
||||||
|
const versionTag = bodyVersion === 2 ? green("🔒 v2") : dim("v1");
|
||||||
|
const replyTag = result.replyToId
|
||||||
|
? ` ${dim("↳ " + result.replyToId.slice(0, 8))}`
|
||||||
|
: "";
|
||||||
|
render.ok(
|
||||||
|
"posted",
|
||||||
|
`${clay("#" + cleanName)} ${versionTag}${replyTag} ${dim(`(${result.notifications} mentions)`)}`,
|
||||||
|
);
|
||||||
|
return EXIT.SUCCESS;
|
||||||
|
},
|
||||||
|
);
|
||||||
|
}
|
||||||
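The mention extraction in `runTopicPost` is easiest to follow in isolation. The sketch below wraps the same regular expression and 16-mention cap in a standalone function. The name `extractMentions` and its `limit` parameter are hypothetical and introduced only for illustration.

```typescript
// Hypothetical wrapper around the regex used in topic-post.ts.
// A mention is "@" followed by 1-64 name characters, and it must not be
// glued to other name characters on either side.
function extractMentions(message: string, limit = 16): string[] {
  const mentions: string[] = [];
  const mentionRe =
    /(^|[^A-Za-z0-9_-])@([A-Za-z0-9_-]{1,64})(?=$|[^A-Za-z0-9_-])/g;
  let m: RegExpExecArray | null;
  while ((m = mentionRe.exec(message)) !== null) {
    // Group 2 is the name without the "@"; names are lowercased.
    mentions.push(m[2]!.toLowerCase());
    if (mentions.length >= limit) break;
  }
  return mentions;
}

console.log(extractMentions("hi @Alice, ping @bob-2!")); // [ 'alice', 'bob-2' ]
console.log(extractMentions("mail foo@example.com"));    // []
```

Because the character before `@` must be the start of the string or a non-name character, email-style text such as `foo@example.com` does not count as a mention. The names are sent alongside the ciphertext so the server can create notifications without reading the message body.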
@@ -8,8 +8,13 @@
|
|||||||
import { URLS } from "~/constants/urls.js";
|
import { URLS } from "~/constants/urls.js";
|
||||||
import { withRestKey } from "~/services/api/with-rest-key.js";
|
import { withRestKey } from "~/services/api/with-rest-key.js";
|
||||||
import { request } from "~/services/api/client.js";
|
import { request } from "~/services/api/client.js";
|
||||||
|
import {
|
||||||
|
getTopicKey,
|
||||||
|
decryptMessage,
|
||||||
|
sealTopicKeyFor,
|
||||||
|
} from "~/services/crypto/topic-key.js";
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { bold, clay, dim } from "~/ui/styles.js";
|
import { bold, clay, dim, yellow } from "~/ui/styles.js";
|
||||||
import { EXIT } from "~/constants/exit-codes.js";
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
|
|
||||||
export interface TopicTailFlags {
|
export interface TopicTailFlags {
|
||||||
@@ -22,20 +27,45 @@ export interface TopicTailFlags {
|
|||||||
|
|
||||||
interface TopicMessage {
|
interface TopicMessage {
|
||||||
id: string;
|
id: string;
|
||||||
|
senderMemberId?: string;
|
||||||
senderPubkey: string;
|
senderPubkey: string;
|
||||||
senderName: string;
|
senderName: string;
|
||||||
nonce: string;
|
nonce: string;
|
||||||
ciphertext: string;
|
ciphertext: string;
|
||||||
|
bodyVersion?: number;
|
||||||
|
replyToId?: string | null;
|
||||||
createdAt: string;
|
createdAt: string;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
/** Bounded recent-message cache used to render reply-context lines. */
|
||||||
|
type RenderedSnippet = { name: string; snippet: string };
|
||||||
|
const RECENT_CACHE_MAX = 256;
|
||||||
|
function rememberRendered(
|
||||||
|
cache: Map<string, RenderedSnippet>,
|
||||||
|
m: TopicMessage,
|
||||||
|
text: string,
|
||||||
|
): void {
|
||||||
|
cache.set(m.id, {
|
||||||
|
name: m.senderName || m.senderPubkey.slice(0, 8),
|
||||||
|
snippet: text.replace(/\s+/g, " ").slice(0, 60),
|
||||||
|
});
|
||||||
|
if (cache.size > RECENT_CACHE_MAX) {
|
||||||
|
const firstKey = cache.keys().next().value;
|
||||||
|
if (firstKey) cache.delete(firstKey);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
interface HistoryResponse {
|
interface HistoryResponse {
|
||||||
topic: string;
|
topic: string;
|
||||||
topicId: string;
|
topicId: string;
|
||||||
messages: TopicMessage[];
|
messages: TopicMessage[];
|
||||||
}
|
}
|
||||||
|
|
||||||
function decodeCiphertext(b64: string): string {
|
/**
|
||||||
|
* v1 (legacy plaintext-base64) decode. v2 messages are decrypted via
|
||||||
|
* the topic key separately — see decryptForRender below.
|
||||||
|
*/
|
||||||
|
function decodeV1(b64: string): string {
|
||||||
try {
|
try {
|
||||||
return Buffer.from(b64, "base64").toString("utf-8");
|
return Buffer.from(b64, "base64").toString("utf-8");
|
||||||
} catch {
|
} catch {
|
||||||
@@ -43,6 +73,16 @@ function decodeCiphertext(b64: string): string {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
async function decryptForRender(
|
||||||
|
m: TopicMessage,
|
||||||
|
topicKey: Uint8Array | null,
|
||||||
|
): Promise<string> {
|
||||||
|
if ((m.bodyVersion ?? 1) === 1) return decodeV1(m.ciphertext);
|
||||||
|
if (!topicKey) return "[encrypted — no topic key]";
|
||||||
|
const plain = await decryptMessage(topicKey, m.ciphertext, m.nonce);
|
||||||
|
return plain ?? "[decrypt failed]";
|
||||||
|
}
|
||||||
|
|
||||||
function fmtTime(iso: string): string {
|
function fmtTime(iso: string): string {
|
||||||
try {
|
try {
|
||||||
return new Date(iso).toLocaleTimeString([], {
|
return new Date(iso).toLocaleTimeString([], {
|
||||||
@@ -55,15 +95,31 @@ function fmtTime(iso: string): string {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
function printMessage(m: TopicMessage, json: boolean): void {
|
async function printMessage(
|
||||||
const text = decodeCiphertext(m.ciphertext);
|
m: TopicMessage,
|
||||||
|
topicKey: Uint8Array | null,
|
||||||
|
json: boolean,
|
||||||
|
cache: Map<string, RenderedSnippet>,
|
||||||
|
): Promise<void> {
|
||||||
|
const text = await decryptForRender(m, topicKey);
|
||||||
if (json) {
|
if (json) {
|
||||||
console.log(JSON.stringify({ ...m, message: text }));
|
console.log(JSON.stringify({ ...m, message: text }));
|
||||||
|
rememberRendered(cache, m, text);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
const v2Marker = (m.bodyVersion ?? 1) === 2 ? dim("🔒 ") : "";
|
||||||
|
if (m.replyToId) {
|
||||||
|
const parent = cache.get(m.replyToId);
|
||||||
|
const ref = parent
|
||||||
|
? `${parent.name}: "${parent.snippet}${parent.snippet.length === 60 ? "…" : ""}"`
|
||||||
|
: `${m.replyToId.slice(0, 8)}…`;
|
||||||
|
process.stdout.write(` ${dim("↳ in reply to " + ref)}\n`);
|
||||||
|
}
|
||||||
|
const idTag = dim(`#${m.id.slice(0, 8)}`);
|
||||||
process.stdout.write(
|
process.stdout.write(
|
||||||
` ${dim(fmtTime(m.createdAt))} ${bold(m.senderName || m.senderPubkey.slice(0, 8))} ${text}\n`,
|
` ${dim(fmtTime(m.createdAt))} ${bold(m.senderName || m.senderPubkey.slice(0, 8))} ${idTag} ${v2Marker}${text}\n`,
|
||||||
);
|
);
|
||||||
|
rememberRendered(cache, m, text);
|
||||||
}
|
}
|
||||||
|
|
||||||
interface SseEvent {
|
interface SseEvent {
|
||||||
@@ -118,7 +174,90 @@ export async function runTopicTail(name: string, flags: TopicTailFlags): Promise
|
|||||||
capabilities: ["read"],
|
capabilities: ["read"],
|
||||||
topicScopes: [cleanName],
|
topicScopes: [cleanName],
|
||||||
},
|
},
|
||||||
async ({ secret, meshSlug }) => {
|
async ({ secret, meshSlug, mesh }) => {
|
||||||
|
// Fetch + decrypt the topic key once. Stays in memory for this
|
||||||
|
// invocation; tail dies → key forgotten. v1 topics return
|
||||||
|
// not_sealed/topic_unencrypted and we just don't decrypt.
|
||||||
|
const keyResult = await getTopicKey({
|
||||||
|
apiKeySecret: secret,
|
||||||
|
memberSecretKeyHex: mesh.secretKey,
|
||||||
|
topicName: cleanName,
|
||||||
|
});
|
||||||
|
const topicKey = keyResult.ok ? keyResult.topicKey ?? null : null;
|
||||||
|
const snippetCache = new Map<string, RenderedSnippet>();
|
||||||
|
|
||||||
|
// Re-seal background loop. While we hold the topic key, every
|
||||||
|
// 30s we look for newly-joined members who don't have a sealed
|
||||||
|
// copy yet, seal the key for each, and POST. Soft-failures stay
|
||||||
|
// silent so a flaky network doesn't spam the tail output.
|
||||||
|
let resealTimer: ReturnType<typeof setInterval> | null = null;
|
||||||
|
if (topicKey) {
|
||||||
|
const reseal = async () => {
|
||||||
|
try {
|
||||||
|
const pending = await request<{
|
||||||
|
pending: Array<{
|
||||||
|
memberId: string;
|
||||||
|
pubkey: string;
|
||||||
|
displayName: string;
|
||||||
|
}>;
|
||||||
|
}>({
|
||||||
|
path: `/api/v1/topics/${encodeURIComponent(cleanName)}/pending-seals`,
|
||||||
|
token: secret,
|
||||||
|
});
|
||||||
|
for (const target of pending.pending) {
|
||||||
|
const sealed = await sealTopicKeyFor(
|
||||||
|
topicKey,
|
||||||
|
target.pubkey,
|
||||||
|
mesh.secretKey,
|
||||||
|
);
|
||||||
|
if (!sealed) continue;
|
||||||
|
try {
|
||||||
|
await request({
|
||||||
|
path: `/api/v1/topics/${encodeURIComponent(cleanName)}/seal`,
|
||||||
|
method: "POST",
|
||||||
|
token: secret,
|
||||||
|
body: {
|
||||||
|
memberId: target.memberId,
|
||||||
|
encryptedKey: sealed.encryptedKey,
|
||||||
|
nonce: sealed.nonce,
|
||||||
|
},
|
||||||
|
});
|
||||||
|
if (!flags.json) {
|
||||||
|
render.info(
|
||||||
|
dim(`re-sealed topic key for ${target.displayName}`),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
} catch {
|
||||||
|
// Another holder likely sealed first — ignore.
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} catch {
|
||||||
|
// Soft-fail; next tick retries.
|
||||||
|
}
|
||||||
|
};
|
||||||
|
void reseal();
|
||||||
|
resealTimer = setInterval(reseal, 30_000);
|
||||||
|
}
|
||||||
|
if (!flags.json && !keyResult.ok) {
|
||||||
|
if (keyResult.error === "topic_unencrypted") {
|
||||||
|
render.info(
|
||||||
|
dim("topic is on v1 (plaintext) — encryption will activate after creator-seal"),
|
||||||
|
);
|
||||||
|
} else if (keyResult.error === "not_sealed") {
|
||||||
|
render.warn(
|
||||||
|
yellow(
|
||||||
|
"no topic key sealed for you yet — wait for a holder to re-seal",
|
||||||
|
),
|
||||||
|
);
|
||||||
|
} else if (keyResult.error === "decrypt_failed") {
|
||||||
|
render.warn(
|
||||||
|
yellow(
|
||||||
|
`topic key fetched but decrypt failed: ${keyResult.message ?? ""}`,
|
||||||
|
),
|
||||||
|
);
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
// 1. Backfill the most recent N messages so the user sees context
|
// 1. Backfill the most recent N messages so the user sees context
|
||||||
// when they tail an active topic.
|
// when they tail an active topic.
|
||||||
if (!flags.forwardOnly && limit > 0) {
|
if (!flags.forwardOnly && limit > 0) {
|
||||||
@@ -134,7 +273,7 @@ export async function runTopicTail(name: string, flags: TopicTailFlags): Promise
|
|||||||
}
|
}
|
||||||
// History is newest-first; reverse for chronological display.
|
// History is newest-first; reverse for chronological display.
|
||||||
for (const m of history.messages.slice().reverse()) {
|
for (const m of history.messages.slice().reverse()) {
|
||||||
printMessage(m, flags.json ?? false);
|
await printMessage(m, topicKey, flags.json ?? false, snippetCache);
|
||||||
}
|
}
|
||||||
} catch (err) {
|
} catch (err) {
|
||||||
render.warn(`backfill failed: ${(err as Error).message}`);
|
render.warn(`backfill failed: ${(err as Error).message}`);
|
||||||
@@ -176,7 +315,7 @@ export async function runTopicTail(name: string, flags: TopicTailFlags): Promise
|
|||||||
if (ev.event === "message") {
|
if (ev.event === "message") {
|
||||||
try {
|
try {
|
||||||
const m = JSON.parse(ev.data) as TopicMessage;
|
const m = JSON.parse(ev.data) as TopicMessage;
|
||||||
printMessage(m, flags.json ?? false);
|
await printMessage(m, topicKey, flags.json ?? false, snippetCache);
|
||||||
} catch {
|
} catch {
|
||||||
// skip malformed
|
// skip malformed
|
||||||
}
|
}
|
||||||
@@ -190,6 +329,7 @@ export async function runTopicTail(name: string, flags: TopicTailFlags): Promise
|
|||||||
} finally {
|
} finally {
|
||||||
process.removeListener("SIGINT", onSig);
|
process.removeListener("SIGINT", onSig);
|
||||||
process.removeListener("SIGTERM", onSig);
|
process.removeListener("SIGTERM", onSig);
|
||||||
|
if (resealTimer) clearInterval(resealTimer);
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
);
|
);
|
||||||
|
|||||||
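The re-seal loop added to `runTopicTail` in the hunk above can be read as "one pass, repeated every 30s". The following is a minimal standalone sketch of a single pass, not the project's implementation. `resealPass`, `ResealDeps`, and the three injected functions are hypothetical stand-ins for the `request` and `sealTopicKeyFor` calls shown in the diff. The soft-fail behaviour mirrors the two `catch` blocks there.

```typescript
// Hypothetical shapes, modelled on the fields used in the diff above.
interface PendingMember {
  memberId: string;
  pubkey: string;
  displayName: string;
}

interface SealedKey {
  encryptedKey: string;
  nonce: string;
}

// Injected dependencies (assumptions of this sketch): in the real code
// these are HTTP calls and a crypto helper.
interface ResealDeps {
  fetchPending: () => Promise<PendingMember[]>;
  sealFor: (pubkey: string) => Promise<SealedKey | null>;
  postSeal: (memberId: string, sealed: SealedKey) => Promise<void>;
}

/** Run one re-seal pass; returns the display names that were sealed. */
async function resealPass(deps: ResealDeps): Promise<string[]> {
  const sealedFor: string[] = [];
  try {
    const pending = await deps.fetchPending();
    for (const target of pending) {
      const sealed = await deps.sealFor(target.pubkey);
      if (!sealed) continue; // could not seal for this member; skip
      try {
        await deps.postSeal(target.memberId, sealed);
        sealedFor.push(target.displayName);
      } catch {
        // Another key holder may have sealed first; ignore and move on.
      }
    }
  } catch {
    // Soft-fail: the caller's next tick retries.
  }
  return sealedFor;
}
```

Injecting the dependencies keeps the pass testable without a network. A caller would wrap it in `setInterval` and clear the timer on shutdown, as the diff does in its `finally` block.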
@@ -1,25 +1,51 @@
|
|||||||
import { whoAmI } from "~/services/auth/facade.js";
|
import { whoAmI } from "~/services/auth/facade.js";
|
||||||
|
import { getSessionInfo } from "~/services/session/resolve.js";
|
||||||
import { render } from "~/ui/render.js";
|
import { render } from "~/ui/render.js";
|
||||||
import { bold, clay, dim } from "~/ui/styles.js";
|
import { bold, clay, dim, yellow } from "~/ui/styles.js";
|
||||||
import { EXIT } from "~/constants/exit-codes.js";
|
import { EXIT } from "~/constants/exit-codes.js";
|
||||||
|
|
||||||
export async function whoami(opts: { json?: boolean }): Promise<number> {
|
export async function whoami(opts: { json?: boolean }): Promise<number> {
|
||||||
const result = await whoAmI();
|
const result = await whoAmI();
|
||||||
|
// 1.32.0+: surface the calling session's identity when whoami is run
|
||||||
|
// from inside a `claudemesh launch`-spawned shell. Previously the
|
||||||
|
// command only reported web sign-in + local mesh memberships, and a
|
||||||
|
// launched session had to dig env vars + parse config.json to figure
|
||||||
|
// out its own session pubkey.
|
||||||
|
const session = await getSessionInfo();
|
||||||
|
|
||||||
if (opts.json) {
|
if (opts.json) {
|
||||||
console.log(JSON.stringify({ schema_version: "1.0", ...result }, null, 2));
|
console.log(JSON.stringify({ schema_version: "1.0", ...result, session }, null, 2));
|
||||||
return result.signed_in || result.local ? EXIT.SUCCESS : EXIT.AUTH_FAILED;
|
return result.signed_in || result.local || session ? EXIT.SUCCESS : EXIT.AUTH_FAILED;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Show whatever we have. Both the web session and the local mesh
|
// Show whatever we have. Web session, local mesh config, and the
|
||||||
// config are independent surfaces of identity; suppress sections that
|
// launched-session identity are three independent surfaces.
|
||||||
// are empty.
|
if (!result.signed_in && !result.local && !session) {
|
||||||
if (!result.signed_in && !result.local) {
|
|
||||||
render.err("Not signed in", "Run `claudemesh login` to sign in or `claudemesh <invite>` to join.");
|
render.err("Not signed in", "Run `claudemesh login` to sign in or `claudemesh <invite>` to join.");
|
||||||
return EXIT.AUTH_FAILED;
|
return EXIT.AUTH_FAILED;
|
||||||
}
|
}
|
||||||
|
|
||||||
render.section("whoami");
|
render.section("whoami");
|
||||||
|
|
||||||
|
if (session) {
|
||||||
|
const sessionPk = session.presence?.sessionPubkey;
|
||||||
|
const groups = (session.groups ?? []).join(", ") || dim("(none)");
|
||||||
|
render.kv([
|
||||||
|
["this session", `${yellow(session.displayName)} on ${bold(session.mesh)}`],
|
||||||
|
["session id", dim(session.sessionId)],
|
||||||
|
...(sessionPk
|
||||||
|
? [["session pubkey", dim(`${sessionPk.slice(0, 16)}… (full: ${sessionPk})`)] as [string, string]]
|
||||||
|
: []),
|
||||||
|
...(session.role
|
||||||
|
? [["role", session.role] as [string, string]]
|
||||||
|
: []),
|
||||||
|
["groups", groups],
|
||||||
|
...(session.cwd ? [["cwd", dim(session.cwd)] as [string, string]] : []),
|
||||||
|
["pid", String(session.pid)],
|
||||||
|
]);
|
||||||
|
render.blank();
|
||||||
|
}
|
||||||
|
|
||||||
if (result.signed_in) {
|
if (result.signed_in) {
|
||||||
render.kv([
|
render.kv([
|
||||||
["user", `${bold(result.user!.display_name)} ${dim(`(${result.user!.email})`)}`],
|
["user", `${bold(result.user!.display_name)} ${dim(`(${result.user!.email})`)}`],
|
||||||
|
|||||||
@@ -9,6 +9,7 @@ export const EXIT = {
|
|||||||
PERMISSION_DENIED: 7,
|
PERMISSION_DENIED: 7,
|
||||||
INTERNAL_ERROR: 8,
|
INTERNAL_ERROR: 8,
|
||||||
CLAUDE_MISSING: 9,
|
CLAUDE_MISSING: 9,
|
||||||
|
IO_ERROR: 10,
|
||||||
} as const;
|
} as const;
|
||||||
|
|
||||||
export type ExitCode = (typeof EXIT)[keyof typeof EXIT];
|
export type ExitCode = (typeof EXIT)[keyof typeof EXIT];
|
||||||
|
|||||||
@@ -1,10 +1,82 @@
|
|||||||
|
import { existsSync } from "node:fs";
|
||||||
import { homedir } from "node:os";
|
import { homedir } from "node:os";
|
||||||
import { join } from "node:path";
|
import { join } from "node:path";
|
||||||
|
|
||||||
const home = homedir();
|
const home = homedir();
|
||||||
|
const DEFAULT_CONFIG_DIR = join(home, ".claudemesh");
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Resolve `CONFIG_DIR` once, with stale-env detection.
|
||||||
|
*
|
||||||
|
* `claudemesh launch` exposes `CLAUDEMESH_CONFIG_DIR=<tmpdir>` to its
|
||||||
|
* spawned `claude` so the per-session mesh selection is isolated from
|
||||||
|
* `~/.claudemesh/config.json`. The tmpdir is rmSync'd on launch exit.
|
||||||
|
*
|
||||||
|
* Footgun: if a `claudemesh` invocation INHERITS that env from an
|
||||||
|
* already-launched (or previously-launched) session — e.g. a Bash tool
|
||||||
|
* call inside Claude Code, or a tmux pane that captured the env via
|
||||||
|
* `update-environment` — the inherited path may point at a tmpdir that
|
||||||
|
* no longer exists. Pre-1.34.14 we silently used the dead path,
|
||||||
|
* `readConfig()` came back empty, and the user saw "No meshes joined"
|
||||||
|
* from an otherwise-working install.
|
||||||
|
*
|
||||||
|
* Resolution rules:
|
||||||
|
* 1. No env var → `~/.claudemesh` (default).
|
||||||
|
* 2. Env points at an existing directory → trust it, even if
|
||||||
|
* `config.json` is not there yet (fresh per-session launch).
|
||||||
|
* 3. Env set but stale (directory missing) → warn
|
||||||
|
* once on stderr (TTY-only) and fall back to `~/.claudemesh`.
|
||||||
|
*
|
||||||
|
* Memoized: resolves once on first access. Mid-process env mutations
|
||||||
|
* are intentionally ignored — paths must stay stable across one CLI
|
||||||
|
* invocation.
|
||||||
|
*/
|
||||||
|
let _resolvedConfigDir: string | null = null;
|
||||||
|
let _warnedStaleEnv = false;
|
||||||
|
|
||||||
|
function resolveConfigDir(): string {
|
||||||
|
if (_resolvedConfigDir !== null) return _resolvedConfigDir;
|
||||||
|
const envDir = process.env.CLAUDEMESH_CONFIG_DIR;
|
||||||
|
if (!envDir) {
|
||||||
|
_resolvedConfigDir = DEFAULT_CONFIG_DIR;
|
||||||
|
return DEFAULT_CONFIG_DIR;
|
||||||
|
}
|
||||||
|
// Trust the env when it resolves to a real directory. We check
|
||||||
|
// the DIR (not `config.json`) because the legitimate "fresh launch
|
||||||
|
// before any write" case has the dir but no config.json yet.
|
||||||
|
// The stale signature we want to catch is `rmSync(tmpDir,
|
||||||
|
// {recursive: true})` from the outer launch's cleanup — that
|
||||||
|
// removes the directory entirely, so a missing dir is the
|
||||||
|
// unambiguous "stale" signal.
|
||||||
|
if (existsSync(envDir)) {
|
||||||
|
_resolvedConfigDir = envDir;
|
||||||
|
return envDir;
|
||||||
|
}
|
||||||
|
// Stale: env set but the dir is gone. Most likely the outer
|
||||||
|
// launch's cleanup ran and we inherited its (now-dead) tmpdir
|
||||||
|
// path. Fall back to default and warn the user once on stderr —
|
||||||
|
// only when attached to a TTY, so non-interactive callers (CI,
|
||||||
|
// MCP boot, scripts piping stdout) stay quiet.
|
||||||
|
if (!_warnedStaleEnv && process.stderr.isTTY) {
|
||||||
|
_warnedStaleEnv = true;
|
||||||
|
const unsetHint =
|
||||||
|
process.env.SHELL?.endsWith("fish")
|
||||||
|
? "set -e CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE"
|
||||||
|
: "unset CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE";
|
||||||
|
process.stderr.write(
|
||||||
|
`claudemesh: ignoring stale CLAUDEMESH_CONFIG_DIR=${envDir} (directory does not exist); using ${DEFAULT_CONFIG_DIR}.\n`
|
||||||
|
+ ` Hint: this is usually a leftover env from a previous \`claudemesh launch\`. Clean it with:\n`
|
||||||
|
+ ` ${unsetHint}\n`,
|
||||||
|
);
|
||||||
|
}
|
||||||
|
_resolvedConfigDir = DEFAULT_CONFIG_DIR;
|
||||||
|
return DEFAULT_CONFIG_DIR;
|
||||||
|
}
|
||||||
|
|
||||||
export const PATHS = {
|
export const PATHS = {
|
||||||
CONFIG_DIR: process.env.CLAUDEMESH_CONFIG_DIR || join(home, ".claudemesh"),
|
get CONFIG_DIR() {
|
||||||
|
return resolveConfigDir();
|
||||||
|
},
|
||||||
get CONFIG_FILE() {
|
get CONFIG_FILE() {
|
||||||
return join(this.CONFIG_DIR, "config.json");
|
return join(this.CONFIG_DIR, "config.json");
|
||||||
},
|
},
|
||||||
@@ -20,3 +92,12 @@ export const PATHS = {
|
|||||||
CLAUDE_JSON: join(home, ".claude.json"),
|
CLAUDE_JSON: join(home, ".claude.json"),
|
||||||
CLAUDE_SETTINGS: join(home, ".claude", "settings.json"),
|
CLAUDE_SETTINGS: join(home, ".claude", "settings.json"),
|
||||||
} as const;
|
} as const;
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Test-only: reset the memoized resolution. Not exported from the
|
||||||
|
* package barrel; reach in via the relative path from a test file.
|
||||||
|
*/
|
||||||
|
export function _resetPathsForTest(): void {
|
||||||
|
_resolvedConfigDir = null;
|
||||||
|
_warnedStaleEnv = false;
|
||||||
|
}
|
||||||
|
|||||||
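The resolution logic in the `resolveConfigDir` hunk above reduces to three cases plus memoization. Below is a minimal sketch of that decision as a pure function, assuming the directory-existence check is the only staleness signal (as the code's `existsSync(envDir)` call suggests). `resolveConfigDirPure`, `ConfigDirResolution`, and `memoize` are hypothetical names introduced for illustration, and `dirExists` is injected so the logic can be exercised without touching disk.

```typescript
// Result of resolving the config directory. `stale` is true when the
// environment variable was set but had to be ignored.
interface ConfigDirResolution {
  dir: string;
  stale: boolean;
}

/**
 * Pure version of the resolution rules. `dirExists` is injected
 * (an assumption of this sketch) instead of calling the filesystem.
 */
function resolveConfigDirPure(
  envDir: string | undefined,
  defaultDir: string,
  dirExists: (path: string) => boolean,
): ConfigDirResolution {
  // Rule 1: no env var (or empty) → default directory.
  if (!envDir) return { dir: defaultDir, stale: false };
  // Rule 2: env points at an existing directory → trust it.
  if (dirExists(envDir)) return { dir: envDir, stale: false };
  // Rule 3: env set but the directory is gone → fall back and flag it.
  return { dir: defaultDir, stale: true };
}

/** Wrap a resolver so it runs once; later calls reuse the first answer. */
function memoize<T>(compute: () => T): () => T {
  let cached: { value: T } | null = null;
  return () => {
    if (cached === null) cached = { value: compute() };
    return cached.value;
  };
}
```

Returning a `stale` flag instead of writing to stderr inside the resolver leaves the "warn once, TTY only" policy to the caller, which is one way to keep the decision itself free of side effects.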
503
apps/cli/src/daemon/broker.ts
Normal file
@@ -0,0 +1,503 @@
|
|||||||
|
// Minimal broker WS connector for the daemon. Reuses the existing CLI
|
||||||
|
// hello-sign protocol so it speaks the wire current brokers understand.
|
||||||
|
//
|
||||||
|
// Differences from BrokerClient (services/broker/ws-client.ts):
|
||||||
|
// - Slim: no in-memory pending-sends queue, no list_peers/state/topic
|
||||||
|
// RPCs. The daemon's outbox is the source of truth.
|
||||||
|
// - Wire envelope adds `client_message_id` (broker may ignore in legacy
|
||||||
|
// mode; Sprint 7 promotes it to authoritative dedupe).
|
||||||
|
// - Reconnect with exponential backoff, signaled to the drain worker.
|
||||||
|
//
|
||||||
|
// 2026-05-04: lifecycle (connect / hello-ack / close-reconnect) now
|
||||||
|
// lives in `ws-lifecycle.ts`. This class supplies the daemon-WS hello
|
||||||
|
// content and routes incoming RPC replies / pushes; the helper handles
|
||||||
|
// the rest. The hello no longer carries an ephemeral `sessionPubkey` —
|
||||||
|
// session-targeted DMs land on the per-session WS (SessionBrokerClient)
|
||||||
|
// since 1.32.1, so this socket only needs the member identity.
|
||||||
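The header comment above says this class routes incoming RPC replies. One common way to do that, and the shape the resolver maps later in this file appear to follow, is to key each outstanding request by an id and pair it with a timeout. The sketch below isolates that pattern under stated assumptions: `PendingReplies` and its method names are hypothetical and do not appear in the diff.

```typescript
// One outstanding request: how to deliver the reply, and the timer that
// delivers a fallback if no reply arrives in time.
interface PendingEntry<T> {
  resolve: (value: T) => void;
  timer: ReturnType<typeof setTimeout>;
}

/** Tracks in-flight requests keyed by request id (hypothetical helper). */
class PendingReplies<T> {
  private entries = new Map<string, PendingEntry<T>>();

  /** Register a request; `fallback` is delivered if `timeoutMs` elapses. */
  register(reqId: string, resolve: (value: T) => void, fallback: T, timeoutMs: number): void {
    const timer = setTimeout(() => {
      // Only deliver the fallback if the entry is still outstanding.
      if (this.entries.delete(reqId)) resolve(fallback);
    }, timeoutMs);
    this.entries.set(reqId, { resolve, timer });
  }

  /** Deliver a reply. Returns false for unknown or already-settled ids. */
  settle(reqId: string, value: T): boolean {
    const entry = this.entries.get(reqId);
    if (!entry) return false;
    this.entries.delete(reqId);
    clearTimeout(entry.timer);
    entry.resolve(value);
    return true;
  }

  /** Number of requests still waiting for a reply. */
  get size(): number {
    return this.entries.size;
  }
}
```

Deleting the entry before resolving means a late or duplicate reply for the same id is ignored, so each request settles at most once.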
|
|
||||||
|
import type { JoinedMesh } from "~/services/config/facade.js";
|
||||||
|
import { signHello } from "~/services/broker/hello-sig.js";
|
||||||
|
import { connectWsWithBackoff, type WsLifecycle, type WsStatus } from "./ws-lifecycle.js";
|
||||||
|
|
||||||
|
export type ConnStatus = WsStatus;
|
||||||
|
|
||||||
|
export interface BrokerSendArgs {
|
||||||
|
/** Target as the broker expects it: peer name | pubkey | @group | * | topic. */
|
||||||
|
targetSpec: string;
|
||||||
|
priority: "now" | "next" | "low";
|
||||||
|
nonce: string;
|
||||||
|
ciphertext: string;
|
||||||
|
/** Daemon-issued idempotency id. Echoed back by the broker for dedupe. */
|
||||||
|
client_message_id: string;
|
||||||
|
/** Sha256-32 fingerprint of the request, hex. Forwarded for Sprint 7 dedupe. */
|
||||||
|
request_fingerprint_hex: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export type BrokerSendResult =
|
||||||
|
| { ok: true; messageId: string }
|
||||||
|
| { ok: false; error: string; permanent: boolean };
|
||||||
|
|
||||||
|
interface PendingAck {
|
||||||
|
resolve: (r: BrokerSendResult) => void;
|
||||||
|
timer: NodeJS.Timeout;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface PeerSummary {
|
||||||
|
pubkey: string;
|
||||||
|
memberPubkey?: string;
|
||||||
|
displayName: string;
|
||||||
|
status: string;
|
||||||
|
summary: string | null;
|
||||||
|
groups: Array<{ name: string; role?: string }>;
|
||||||
|
sessionId: string;
|
||||||
|
connectedAt: string;
|
||||||
|
cwd?: string;
|
||||||
|
hostname?: string;
|
||||||
|
peerType?: string;
|
||||||
|
channel?: string;
|
||||||
|
/** Broker-side classification, added 2026-05-04. Missing in older brokers. */
|
||||||
|
role?: "control-plane" | "session" | "service";
|
||||||
|
}
|
||||||
|
|
||||||
|
interface PendingPeerList {
|
||||||
|
resolve: (peers: PeerSummary[]) => void;
|
||||||
|
timer: NodeJS.Timeout;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface SkillSummary {
|
||||||
|
name: string;
|
||||||
|
description: string;
|
||||||
|
tags: string[];
|
||||||
|
author: string;
|
||||||
|
createdAt: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface SkillFull extends SkillSummary {
|
||||||
|
instructions: string;
|
||||||
|
manifest?: unknown;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface StateRow {
|
||||||
|
key: string;
|
||||||
|
value: unknown;
|
||||||
|
updatedBy: string;
|
||||||
|
updatedAt: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface MemoryRow {
|
||||||
|
id: string;
|
||||||
|
content: string;
|
||||||
|
tags: string[];
|
||||||
|
rememberedBy: string;
|
||||||
|
rememberedAt: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
const SEND_ACK_TIMEOUT_MS = 15_000;
|
||||||
|
|
||||||
|
export interface DaemonBrokerOptions {
|
||||||
|
displayName?: string;
|
||||||
|
onStatusChange?: (s: ConnStatus) => void;
|
||||||
|
onPush?: (msg: Record<string, unknown>) => void;
|
||||||
|
log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
|
||||||
|
}
|
||||||
|
|
||||||
|
export class DaemonBrokerClient {
|
||||||
|
private lifecycle: WsLifecycle | null = null;
|
||||||
|
private _status: ConnStatus = "closed";
|
||||||
|
private closed = false;
|
||||||
|
private pendingAcks = new Map<string, PendingAck>();
|
||||||
|
private peerListResolvers = new Map<string, PendingPeerList>();
|
||||||
|
private skillListResolvers = new Map<string, { resolve: (rows: SkillSummary[]) => void; timer: NodeJS.Timeout }>();
|
||||||
|
private skillDataResolvers = new Map<string, { resolve: (row: SkillFull | null) => void; timer: NodeJS.Timeout }>();
|
||||||
|
private stateGetResolvers = new Map<string, { resolve: (row: StateRow | null) => void; timer: NodeJS.Timeout }>();
|
||||||
|
private stateListResolvers = new Map<string, { resolve: (rows: StateRow[]) => void; timer: NodeJS.Timeout }>();
|
||||||
|
private memoryStoreResolvers = new Map<string, { resolve: (id: string | null) => void; timer: NodeJS.Timeout }>();
|
||||||
|
private memoryRecallResolvers = new Map<string, { resolve: (rows: MemoryRow[]) => void; timer: NodeJS.Timeout }>();
|
||||||
|
private opens: Array<() => void> = [];
|
||||||
|
private reqCounter = 0;
|
||||||
|
|
||||||
|
constructor(private mesh: JoinedMesh, private opts: DaemonBrokerOptions = {}) {}
|
||||||
|
|
||||||
|
get status(): ConnStatus { return this._status; }
|
||||||
|
get meshSlug(): string { return this.mesh.slug; }
|
||||||
|
get meshId(): string { return this.mesh.meshId; }
|
||||||
|
|
||||||
|
private log = (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => {
|
||||||
|
(this.opts.log ?? defaultLog)(level, msg, { mesh: this.mesh.slug, ...meta });
|
||||||
|
};
|
||||||
|
|
||||||
|
/** Open the WS, run the hello handshake, resolve once the broker accepts. */
|
||||||
|
async connect(): Promise<void> {
|
||||||
|
if (this.closed) throw new Error("client_closed");
|
||||||
|
if (this._status === "connecting" || this._status === "open") return;
|
||||||
|
|
||||||
|
this.lifecycle = await connectWsWithBackoff({
|
||||||
|
url: this.mesh.brokerUrl,
|
||||||
|
buildHello: async () => {
|
||||||
|
const { timestamp, signature } = await signHello(
|
||||||
|
this.mesh.meshId, this.mesh.memberId, this.mesh.pubkey, this.mesh.secretKey,
|
||||||
|
);
|
||||||
|
return {
|
||||||
|
type: "hello",
|
||||||
|
meshId: this.mesh.meshId,
|
||||||
|
memberId: this.mesh.memberId,
|
||||||
|
pubkey: this.mesh.pubkey,
|
||||||
|
// No `sessionPubkey` — daemon-WS is member-keyed only. The
|
||||||
|
// per-session presence WS (SessionBrokerClient) carries the
|
||||||
|
// ephemeral session pubkey. Spec §"Layer 1: Identity → Member identity".
|
||||||
|
displayName: this.opts.displayName,
|
||||||
|
sessionId: `daemon-${process.pid}`,
|
||||||
|
pid: process.pid,
|
||||||
|
cwd: process.cwd(),
|
||||||
|
hostname: require("node:os").hostname(),
|
||||||
|
peerType: "ai" as const,
|
||||||
|
channel: "claudemesh-daemon",
|
||||||
|
timestamp,
|
||||||
|
signature,
|
||||||
|
};
|
||||||
|
},
|
||||||
|
isHelloAck: (msg) => msg.type === "hello_ack",
|
||||||
|
onMessage: (msg) => this.handleMessage(msg),
|
||||||
|
onStatusChange: (s) => {
|
||||||
|
this._status = s;
|
||||||
|
this.opts.onStatusChange?.(s);
|
||||||
|
if (s === "open") {
|
||||||
|
// Flush deferred openers (drain worker, etc.).
|
||||||
|
const queued = this.opens.slice();
|
||||||
|
this.opens.length = 0;
|
||||||
|
for (const fn of queued) {
|
||||||
|
try { fn(); } catch (e) { this.log("warn", "open_handler_failed", { err: String(e) }); }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
onBeforeReconnect: (code) => this.failPendingAcks(`broker_disconnected_${code}`),
|
||||||
|
log: (level, msg, meta) => this.log(level, `broker_${msg}`, meta),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
private handleMessage(msg: Record<string, unknown>): void {
|
||||||
|
if (msg.type === "ack") {
|
||||||
|
// Broker shape: { type: "ack", id, messageId, queued, error? }
|
||||||
|
const id = String(msg.id ?? "");
|
||||||
|
const ack = this.pendingAcks.get(id);
|
||||||
|
if (ack) {
|
||||||
|
this.pendingAcks.delete(id);
|
||||||
|
clearTimeout(ack.timer);
|
||||||
|
if (typeof msg.error === "string" && msg.error.length > 0) {
|
||||||
|
ack.resolve({ ok: false, error: msg.error, permanent: classifyPermanent(msg.error) });
|
||||||
|
} else {
|
||||||
|
ack.resolve({ ok: true, messageId: String(msg.messageId ?? id) });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (msg.type === "peers_list") {
|
||||||
|
const reqId = String(msg._reqId ?? "");
|
||||||
|
const pending = this.peerListResolvers.get(reqId);
|
||||||
|
if (pending) {
|
||||||
|
this.peerListResolvers.delete(reqId);
|
||||||
|
clearTimeout(pending.timer);
|
||||||
|
pending.resolve(Array.isArray(msg.peers) ? (msg.peers as PeerSummary[]) : []);
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (msg.type === "skill_list") {
|
||||||
|
const reqId = String(msg._reqId ?? "");
|
||||||
|
const pending = this.skillListResolvers.get(reqId);
|
||||||
|
if (pending) {
|
||||||
|
this.skillListResolvers.delete(reqId);
|
||||||
|
clearTimeout(pending.timer);
|
||||||
|
pending.resolve(Array.isArray(msg.skills) ? (msg.skills as SkillSummary[]) : []);
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (msg.type === "skill_data") {
|
||||||
|
const reqId = String(msg._reqId ?? "");
|
||||||
|
const pending = this.skillDataResolvers.get(reqId);
|
||||||
|
if (pending) {
|
||||||
|
this.skillDataResolvers.delete(reqId);
|
||||||
|
clearTimeout(pending.timer);
|
||||||
|
pending.resolve((msg.skill as SkillFull) ?? null);
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (msg.type === "state_value" || msg.type === "state_data") {
|
||||||
|
const reqId = String(msg._reqId ?? "");
|
||||||
|
const pending = this.stateGetResolvers.get(reqId);
|
||||||
|
if (pending) {
|
||||||
|
this.stateGetResolvers.delete(reqId);
|
||||||
|
clearTimeout(pending.timer);
|
||||||
|
pending.resolve((msg.state ?? msg.row ?? null) as StateRow | null);
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (msg.type === "state_list") {
|
||||||
|
const reqId = String(msg._reqId ?? "");
|
||||||
|
const pending = this.stateListResolvers.get(reqId);
|
||||||
|
if (pending) {
|
||||||
|
this.stateListResolvers.delete(reqId);
|
||||||
|
clearTimeout(pending.timer);
|
||||||
|
pending.resolve(Array.isArray(msg.entries) ? (msg.entries as StateRow[]) : []);
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (msg.type === "memory_stored") {
|
||||||
|
const reqId = String(msg._reqId ?? "");
|
||||||
|
const pending = this.memoryStoreResolvers.get(reqId);
|
||||||
|
if (pending) {
|
||||||
|
this.memoryStoreResolvers.delete(reqId);
|
||||||
|
clearTimeout(pending.timer);
|
||||||
|
pending.resolve(typeof msg.memoryId === "string" ? msg.memoryId : null);
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (msg.type === "memory_recall_result") {
|
||||||
|
const reqId = String(msg._reqId ?? "");
|
||||||
|
const pending = this.memoryRecallResolvers.get(reqId);
|
||||||
|
if (pending) {
|
||||||
|
this.memoryRecallResolvers.delete(reqId);
|
||||||
|
clearTimeout(pending.timer);
|
||||||
|
pending.resolve(Array.isArray(msg.matches) ? (msg.matches as MemoryRow[]) : []);
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
if (msg.type === "push" || msg.type === "inbound") {
|
||||||
|
this.opts.onPush?.(msg);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
/** True when the underlying socket exists and is in the OPEN state, so direct sends are safe. */
|
||||||
|
private isOpen(): boolean {
|
||||||
|
const sock = this.lifecycle?.ws;
|
||||||
|
return !!sock && sock.readyState === sock.OPEN;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** v2 agentic-comms (M1): send `client_ack` back to the broker after
|
||||||
|
* successfully landing an inbound push in inbox.db. Broker uses the
|
||||||
|
* ack to set `delivered_at` (atomic at-least-once). Best-effort —
|
||||||
|
* if the WS isn't open, drop the ack; broker's 30s lease will
|
||||||
|
* re-deliver. */
|
||||||
|
sendClientAck(clientMessageId: string, brokerMessageId: string | null): void {
|
||||||
|
if (!this.isOpen()) return;
|
||||||
|
try {
|
||||||
|
this.lifecycle!.send({
|
||||||
|
type: "client_ack",
|
||||||
|
clientMessageId,
|
||||||
|
...(brokerMessageId ? { brokerMessageId } : {}),
|
||||||
|
});
|
||||||
|
} catch { /* drop; lease re-delivers */ }
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Send one outbox row. Resolves on broker ack/timeout. */
|
||||||
|
send(req: BrokerSendArgs): Promise<BrokerSendResult> {
|
||||||
|
return new Promise<BrokerSendResult>((resolve) => {
|
||||||
|
const dispatch = () => {
|
||||||
|
if (!this.isOpen()) {
|
||||||
|
resolve({ ok: false, error: "broker_not_open", permanent: false });
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
const id = req.client_message_id;
|
||||||
|
const timer = setTimeout(() => {
|
||||||
|
if (this.pendingAcks.delete(id)) {
|
||||||
|
resolve({ ok: false, error: "ack_timeout", permanent: false });
|
||||||
|
}
|
||||||
|
}, SEND_ACK_TIMEOUT_MS);
|
||||||
|
this.pendingAcks.set(id, { resolve, timer });
|
||||||
|
try {
|
||||||
|
this.lifecycle!.send({
|
||||||
|
type: "send",
|
||||||
|
id, // legacy correlation id
|
||||||
|
client_message_id: id, // forward-compat per spec §4.2
|
||||||
|
request_fingerprint: req.request_fingerprint_hex,
|
||||||
|
targetSpec: req.targetSpec,
|
||||||
|
priority: req.priority,
|
||||||
|
nonce: req.nonce,
|
||||||
|
ciphertext: req.ciphertext,
|
||||||
|
});
|
||||||
|
} catch (e) {
|
||||||
|
this.pendingAcks.delete(id);
|
||||||
|
clearTimeout(timer);
|
||||||
|
resolve({ ok: false, error: `ws_write_failed: ${String(e)}`, permanent: false });
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
if (this._status === "open") dispatch();
|
||||||
|
else this.opens.push(dispatch);
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Ask the broker for the current peer list. */
|
||||||
|
async listPeers(timeoutMs = 5_000): Promise<PeerSummary[]> {
|
||||||
|
if (this._status !== "open" || !this.lifecycle) return [];
|
||||||
|
return new Promise<PeerSummary[]>((resolve) => {
|
||||||
|
const reqId = `pl-${++this.reqCounter}`;
|
||||||
|
const timer = setTimeout(() => {
|
||||||
|
if (this.peerListResolvers.delete(reqId)) resolve([]);
|
||||||
|
}, timeoutMs);
|
||||||
|
this.peerListResolvers.set(reqId, { resolve, timer });
|
||||||
|
try { this.lifecycle!.send({ type: "list_peers", _reqId: reqId }); }
|
||||||
|
catch { this.peerListResolvers.delete(reqId); clearTimeout(timer); resolve([]); }
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/** List mesh-published skills. Empty array on disconnect / timeout. */
|
||||||
|
async listSkills(query?: string, timeoutMs = 5_000): Promise<SkillSummary[]> {
|
||||||
|
if (this._status !== "open" || !this.lifecycle) return [];
|
||||||
|
return new Promise<SkillSummary[]>((resolve) => {
|
||||||
|
const reqId = `sl-${++this.reqCounter}`;
|
||||||
|
const timer = setTimeout(() => {
|
||||||
|
if (this.skillListResolvers.delete(reqId)) resolve([]);
|
||||||
|
}, timeoutMs);
|
||||||
|
this.skillListResolvers.set(reqId, { resolve, timer });
|
||||||
|
try { this.lifecycle!.send({ type: "list_skills", query, _reqId: reqId }); }
|
||||||
|
catch { this.skillListResolvers.delete(reqId); clearTimeout(timer); resolve([]); }
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Fetch one skill's full body. Null on not-found / disconnect / timeout. */
|
||||||
|
async getSkill(name: string, timeoutMs = 5_000): Promise<SkillFull | null> {
|
||||||
|
if (this._status !== "open" || !this.lifecycle) return null;
|
||||||
|
return new Promise<SkillFull | null>((resolve) => {
|
||||||
|
const reqId = `sg-${++this.reqCounter}`;
|
||||||
|
const timer = setTimeout(() => {
|
||||||
|
if (this.skillDataResolvers.delete(reqId)) resolve(null);
|
||||||
|
}, timeoutMs);
|
||||||
|
this.skillDataResolvers.set(reqId, { resolve, timer });
|
||||||
|
try { this.lifecycle!.send({ type: "get_skill", name, _reqId: reqId }); }
|
||||||
|
catch { this.skillDataResolvers.delete(reqId); clearTimeout(timer); resolve(null); }
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Read a single shared state row. Null on disconnect / timeout / not-found. */
|
||||||
|
async getState(key: string, timeoutMs = 5_000): Promise<StateRow | null> {
|
||||||
|
if (this._status !== "open" || !this.lifecycle) return null;
|
||||||
|
return new Promise<StateRow | null>((resolve) => {
|
||||||
|
const reqId = `sg-${++this.reqCounter}`;
|
||||||
|
const timer = setTimeout(() => {
|
||||||
|
if (this.stateGetResolvers.delete(reqId)) resolve(null);
|
||||||
|
}, timeoutMs);
|
||||||
|
this.stateGetResolvers.set(reqId, { resolve, timer });
|
||||||
|
try { this.lifecycle!.send({ type: "get_state", key, _reqId: reqId }); }
|
||||||
|
catch { this.stateGetResolvers.delete(reqId); clearTimeout(timer); resolve(null); }
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
  /** List all shared state rows in the mesh. */
  async listState(timeoutMs = 5_000): Promise<StateRow[]> {
    if (this._status !== "open" || !this.lifecycle) return [];
    return new Promise<StateRow[]>((resolve) => {
      const reqId = `sl-${++this.reqCounter}`;
      const timer = setTimeout(() => {
        if (this.stateListResolvers.delete(reqId)) resolve([]);
      }, timeoutMs);
      this.stateListResolvers.set(reqId, { resolve, timer });
      try { this.lifecycle!.send({ type: "list_state", _reqId: reqId }); }
      catch { this.stateListResolvers.delete(reqId); clearTimeout(timer); resolve([]); }
    });
  }

  /** Set a shared state value. Fire-and-forget. */
  setState(key: string, value: unknown): void {
    if (this._status !== "open" || !this.lifecycle) return;
    try { this.lifecycle.send({ type: "set_state", key, value }); }
    catch { /* ignore */ }
  }

  /** Store a memory in the mesh. Returns the assigned id, or null on timeout. */
  async remember(content: string, tags?: string[], timeoutMs = 5_000): Promise<string | null> {
    if (this._status !== "open" || !this.lifecycle) return null;
    return new Promise<string | null>((resolve) => {
      const reqId = `mr-${++this.reqCounter}`;
      const timer = setTimeout(() => {
        if (this.memoryStoreResolvers.delete(reqId)) resolve(null);
      }, timeoutMs);
      this.memoryStoreResolvers.set(reqId, { resolve, timer });
      try { this.lifecycle!.send({ type: "remember", content, tags, _reqId: reqId }); }
      catch { this.memoryStoreResolvers.delete(reqId); clearTimeout(timer); resolve(null); }
    });
  }

  /** Search memories by relevance. */
  async recall(query: string, timeoutMs = 5_000): Promise<MemoryRow[]> {
    if (this._status !== "open" || !this.lifecycle) return [];
    return new Promise<MemoryRow[]>((resolve) => {
      const reqId = `mc-${++this.reqCounter}`;
      const timer = setTimeout(() => {
        if (this.memoryRecallResolvers.delete(reqId)) resolve([]);
      }, timeoutMs);
      this.memoryRecallResolvers.set(reqId, { resolve, timer });
      try { this.lifecycle!.send({ type: "recall", query, _reqId: reqId }); }
      catch { this.memoryRecallResolvers.delete(reqId); clearTimeout(timer); resolve([]); }
    });
  }

  /** Forget a memory by id. Fire-and-forget. */
  forget(memoryId: string): void {
    if (this._status !== "open" || !this.lifecycle) return;
    try { this.lifecycle.send({ type: "forget", memoryId }); }
    catch { /* ignore */ }
  }

  /** Set the daemon's profile (avatar/title/bio/capabilities). Fire-and-forget. */
  setProfile(profile: { avatar?: string; title?: string; bio?: string; capabilities?: string[] }): void {
    if (this._status !== "open" || !this.lifecycle) return;
    try { this.lifecycle.send({ type: "set_profile", ...profile }); }
    catch { /* ignore */ }
  }

  setSummary(summary: string): void {
    if (this._status !== "open" || !this.lifecycle) return;
    try { this.lifecycle.send({ type: "set_summary", summary }); }
    catch { /* ignore */ }
  }

  setStatus(status: "idle" | "working" | "dnd"): void {
    if (this._status !== "open" || !this.lifecycle) return;
    try { this.lifecycle.send({ type: "set_status", status }); }
    catch { /* ignore */ }
  }

  setVisible(visible: boolean): void {
    if (this._status !== "open" || !this.lifecycle) return;
    try { this.lifecycle.send({ type: "set_visible", visible }); }
    catch { /* ignore */ }
  }

  async close(): Promise<void> {
    this.closed = true;
    this.failPendingAcks("daemon_shutdown");
    if (this.lifecycle) {
      try { await this.lifecycle.close(); } catch { /* ignore */ }
      this.lifecycle = null;
    }
    this._status = "closed";
  }

  private failPendingAcks(reason: string) {
    for (const [id, ack] of this.pendingAcks) {
      clearTimeout(ack.timer);
      ack.resolve({ ok: false, error: reason, permanent: false });
      this.pendingAcks.delete(id);
    }
  }
}

function defaultLog(level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) {
  const line = JSON.stringify({ level, msg, ...meta, ts: new Date().toISOString() });
  if (level === "info") process.stdout.write(line + "\n");
  else process.stderr.write(line + "\n");
}

/** Heuristic: which broker errors are unrecoverable for this id. */
function classifyPermanent(err: string): boolean {
  return /payload_too_large|forbidden|not_found|invalid|schema|auth|signature/i.test(err);
}
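The request/response helpers above (`getState`, `listState`, `remember`, `recall`) all repeat one shape: allocate a request id, arm a timeout that resolves to a fallback value, register the resolver, then send. A minimal sketch of that shape factored into a helper follows. `PendingRequests`, `request`, and `settle` are hypothetical names that do not appear in this diff, and the `settle` side (what runs when the broker's reply arrives) is an assumption, since the reply handler is not shown here.

```typescript
// Hypothetical helper illustrating the resolver-map-with-timeout pattern.
type PendingEntry<T> = { resolve: (value: T) => void; timer: ReturnType<typeof setTimeout> };

class PendingRequests<T> {
  private pending = new Map<string, PendingEntry<T>>();
  private counter = 0;
  private prefix: string;
  private fallback: T;

  constructor(prefix: string, fallback: T) {
    this.prefix = prefix;
    this.fallback = fallback;
  }

  /** Register a request, then call `send(reqId)`. Resolves to the fallback on timeout or send failure. */
  request(send: (reqId: string) => void, timeoutMs = 5_000): Promise<T> {
    return new Promise<T>((resolve) => {
      const reqId = `${this.prefix}-${++this.counter}`;
      const timer = setTimeout(() => {
        // Only resolve if nobody settled this request first.
        if (this.pending.delete(reqId)) resolve(this.fallback);
      }, timeoutMs);
      this.pending.set(reqId, { resolve, timer });
      try { send(reqId); }
      catch { this.pending.delete(reqId); clearTimeout(timer); resolve(this.fallback); }
    });
  }

  /** Assumed reply path: resolve the matching request. Returns false for unknown or already-settled ids. */
  settle(reqId: string, value: T): boolean {
    const entry = this.pending.get(reqId);
    if (!entry) return false;
    this.pending.delete(reqId);
    clearTimeout(entry.timer);
    entry.resolve(value);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Deleting the map entry before resolving is what keeps the timeout and the reply from both resolving the same promise, which is the same guard the `if (this.stateGetResolvers.delete(reqId))` checks provide above.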
225
apps/cli/src/daemon/db/inbox.ts
Normal file
@@ -0,0 +1,225 @@
// Inbox schema + accessors. Schema is the v0.9.0 spec §4.10 / v3 §4.5
// content table; FTS5 index is deferred to the followups doc.

import type { SqliteDb } from "./sqlite.js";

export interface InboxRow {
  id: string;
  client_message_id: string;
  broker_message_id: string | null;
  mesh: string;
  topic: string | null;
  sender_pubkey: string;
  sender_name: string;
  body: string | null;
  meta: string | null;
  received_at: number;
  reply_to_id: string | null;
  /** 1.34.8: Unix ms of when this row was first surfaced to the user
   *  (returned by an interactive `inbox` listing or pushed via channel
   *  reminder). NULL = never seen. Welcome filters on `seen_at IS NULL`
   *  so freshly-launched sessions only see what they actually missed. */
  seen_at: number | null;
  /** 1.34.11: pubkey of the WS that received this push. Either the
   *  daemon's member pubkey for member-keyed broadcasts, or one of
   *  our session pubkeys for session-targeted DMs. Without this, two
   *  sessions on the same daemon shared one inbox table and each saw
   *  every other session's messages — same bug shape the 1.34.10 SSE
   *  demux fixed for the live event path, just at the storage layer.
   *  Pre-1.34.11 rows have NULL here and are visible to every session
   *  on the same mesh (best-effort back-compat for already-stored
   *  history). */
  recipient_pubkey: string | null;
  /** 1.34.11: matches `recipient_kind` on the bus event. "session" =
   *  scoped to one session pubkey; "member" = visible to every
   *  session of that member on the mesh. NULL on legacy rows. */
  recipient_kind: string | null;
}

export function migrateInbox(db: SqliteDb): void {
  db.exec(`
    CREATE TABLE IF NOT EXISTS inbox (
      id                 TEXT PRIMARY KEY,
      client_message_id  TEXT NOT NULL UNIQUE,
      broker_message_id  TEXT,
      mesh               TEXT NOT NULL,
      topic              TEXT,
      sender_pubkey      TEXT NOT NULL,
      sender_name        TEXT NOT NULL,
      body               TEXT,
      meta               TEXT,
      received_at        INTEGER NOT NULL,
      reply_to_id        TEXT
    );
    CREATE INDEX IF NOT EXISTS inbox_received_at ON inbox(received_at);
    CREATE INDEX IF NOT EXISTS inbox_topic       ON inbox(topic);
    CREATE INDEX IF NOT EXISTS inbox_sender      ON inbox(sender_pubkey);
  `);
  // 1.34.8: read-state tracking. Pre-1.34.8 rows land with seen_at=NULL
  // (treated as unread); welcome surfaces them once and the listing
  // marks them seen. Indexed because welcome queries WHERE seen_at IS
  // NULL on every launch.
  const cols = db.prepare(`PRAGMA table_info(inbox)`).all<{ name: string }>();
  if (!cols.some((c) => c.name === "seen_at")) {
    db.exec(`ALTER TABLE inbox ADD COLUMN seen_at INTEGER`);
    db.exec(`CREATE INDEX IF NOT EXISTS inbox_seen_at ON inbox(seen_at)`);
  }
  // 1.34.11: per-recipient scoping. Two sessions on the same daemon
  // share one inbox table; without this column, listInbox returns
  // every row regardless of which session is asking. Indexed
  // because every interactive listing + welcome path filters by it.
  if (!cols.some((c) => c.name === "recipient_pubkey")) {
    db.exec(`ALTER TABLE inbox ADD COLUMN recipient_pubkey TEXT`);
    db.exec(`ALTER TABLE inbox ADD COLUMN recipient_kind TEXT`);
    db.exec(`CREATE INDEX IF NOT EXISTS inbox_recipient ON inbox(recipient_pubkey)`);
  }
}

/**
 * Spec §4.5 insert path:
 *   INSERT ... ON CONFLICT(client_message_id) DO NOTHING RETURNING id
 *
 * Returns the new row id when this was a fresh insert, or null when the
 * message id was already known (idempotent receive).
 */
export function insertIfNew(
  db: SqliteDb,
  // 1.34.8: callers don't pass `seen_at` — it's always NULL on insert
  // (a freshly-received row is by definition unread). Stripping the
  // field from the input type keeps inbound.ts callers from having to
  // construct it.
  row: Omit<InboxRow, "id" | "seen_at"> & { id: string },
): string | null {
  // Both node:sqlite and bun:sqlite support RETURNING, but this path does
  // not rely on it: we look the row up by client_message_id before and
  // after the insert, which behaves the same on both runtimes.
  const before = db.prepare(`SELECT id FROM inbox WHERE client_message_id = ?`).get<{ id: string }>(row.client_message_id);
  if (before) return null;
  db.prepare(`
    INSERT INTO inbox (
      id, client_message_id, broker_message_id, mesh, topic,
      sender_pubkey, sender_name, body, meta, received_at, reply_to_id,
      recipient_pubkey, recipient_kind
    ) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    ON CONFLICT(client_message_id) DO NOTHING
  `).run(
    row.id, row.client_message_id, row.broker_message_id, row.mesh, row.topic,
    row.sender_pubkey, row.sender_name, row.body, row.meta, row.received_at, row.reply_to_id,
    row.recipient_pubkey, row.recipient_kind,
  );
  // Confirm the insert landed (handles the conflict-noop race).
  const after = db.prepare(`SELECT id FROM inbox WHERE client_message_id = ?`).get<{ id: string }>(row.client_message_id);
  return after?.id === row.id ? row.id : null;
}

export interface ListInboxParams {
  since?: number; // received_at >= since
  topic?: string;
  fromPubkey?: string;
  /** 1.34.0: filter by mesh slug. Omit to return rows across all meshes. */
  mesh?: string;
  /** 1.34.8: only rows with `seen_at IS NULL`. Used by the welcome
   *  push so a freshly-launched session surfaces what it actually
   *  missed instead of every row from the last 24h. */
  unreadOnly?: boolean;
  /** 1.34.11: scope to rows whose recipient is this session pubkey,
   *  PLUS member-keyed rows for the same member, PLUS legacy rows
   *  with a NULL recipient (best-effort back-compat with pre-1.34.11
   *  history). Set by the IPC `/v1/inbox` route from the bearer
   *  session token; without it the listing returns everything.
   *  `recipientMemberPubkey` widens the match to include broadcasts
   *  / member DMs that should reach every session of this member. */
  recipientPubkey?: string;
  recipientMemberPubkey?: string;
  limit?: number;
}

export function listInbox(db: SqliteDb, p: ListInboxParams): InboxRow[] {
  const where: string[] = [];
  const args: unknown[] = [];
  if (p.since !== undefined) { where.push("received_at >= ?"); args.push(p.since); }
  if (p.topic !== undefined) { where.push("topic = ?"); args.push(p.topic); }
  if (p.fromPubkey !== undefined) { where.push("sender_pubkey = ?"); args.push(p.fromPubkey); }
  if (p.mesh !== undefined) { where.push("mesh = ?"); args.push(p.mesh); }
  if (p.unreadOnly === true) { where.push("seen_at IS NULL"); }
  // 1.34.11: recipient scoping. A session sees:
  //   - rows whose recipient_pubkey === its session pubkey (its DMs),
  //   - rows whose recipient_pubkey === the daemon's member pubkey
  //     (broadcasts / member-keyed DMs to anyone in this member's
  //     identity — every sibling session sees them),
  //   - legacy rows where recipient_pubkey IS NULL (pre-1.34.11
  //     history; we can't tell who they were for, so surface to all).
  if (p.recipientPubkey) {
    const ors: string[] = ["recipient_pubkey IS NULL", "recipient_pubkey = ?"];
    args.push(p.recipientPubkey);
    if (p.recipientMemberPubkey) {
      ors.push("recipient_pubkey = ?");
      args.push(p.recipientMemberPubkey);
    }
    where.push(`(${ors.join(" OR ")})`);
  }
  const sql = `
    SELECT id, client_message_id, broker_message_id, mesh, topic,
           sender_pubkey, sender_name, body, meta, received_at, reply_to_id, seen_at,
           recipient_pubkey, recipient_kind
    FROM inbox
    ${where.length ? "WHERE " + where.join(" AND ") : ""}
    ORDER BY received_at DESC
    LIMIT ?
  `;
  args.push(Math.min(Math.max(p.limit ?? 100, 1), 1000));
  return db.prepare(sql).all<InboxRow>(...args);
}

/** 1.34.8: stamp `seen_at = now` on every row whose id is in `ids`,
 *  but only when `seen_at IS NULL` so re-marking doesn't bump the
 *  timestamp on a row the user already knew about. Returns the number
 *  of rows that flipped from unread → seen. Used by:
 *    - the IPC `/v1/inbox` route when called by an interactive
 *      listing (the daemon stamps after returning rows so the human
 *      who just looked at their inbox doesn't see the same rows
 *      flagged "unread" on next launch);
 *    - the MCP server when the SSE message event surfaces a live
 *      `<channel>` reminder (Claude Code already saw the row inline,
 *      no need to surface it again on welcome). */
export function markInboxSeen(db: SqliteDb, ids: readonly string[], now = Date.now()): number {
  if (ids.length === 0) return 0;
  const placeholders = ids.map(() => "?").join(",");
  const r = db.prepare(
    `UPDATE inbox SET seen_at = ? WHERE seen_at IS NULL AND id IN (${placeholders})`,
  ).run(now, ...ids);
  return Number(r.changes);
}

/** 1.34.8: TTL prune. Removes inbox rows older than `cutoffMs`
 *  (received_at < cutoffMs). Daemon schedules this hourly with a 30-day
 *  default retention (see startInboxPruner). Returns the number of
 *  rows removed so the caller can log the volume. */
export function pruneInboxBefore(db: SqliteDb, cutoffMs: number): number {
  const r = db.prepare(`DELETE FROM inbox WHERE received_at < ?`).run(cutoffMs);
  return Number(r.changes);
}

/** 1.34.7: delete a single inbox row by id. Returns true iff a row was
 *  removed. The CLI exposes this as `claudemesh inbox delete <id>`. */
export function deleteInboxRow(db: SqliteDb, id: string): boolean {
  const r = db.prepare(`DELETE FROM inbox WHERE id = ?`).run(id);
  return Number(r.changes) > 0;
}

/** 1.34.7: bulk delete with mesh / age filters. Returns the number of
 *  rows removed. With no filter, deletes ALL rows on ALL meshes —
 *  caller is expected to gate this behind a `--all` confirmation. */
export interface FlushInboxParams {
  mesh?: string;
  /** Unix ms — delete rows received_at < before. */
  before?: number;
}
export function flushInbox(db: SqliteDb, p: FlushInboxParams): number {
  const where: string[] = [];
  const args: unknown[] = [];
  if (p.mesh !== undefined) { where.push("mesh = ?"); args.push(p.mesh); }
  if (p.before !== undefined) { where.push("received_at < ?"); args.push(p.before); }
  const sql = `DELETE FROM inbox ${where.length ? "WHERE " + where.join(" AND ") : ""}`;
  const r = db.prepare(sql).run(...args);
  return Number(r.changes);
}
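The recipient scoping that `listInbox` builds into its WHERE clause can be restated as a plain predicate, which makes the three visibility rules easier to check by hand. This is a minimal sketch mirroring the SQL semantics in memory; `ScopedRow`, `Scope`, `isVisible`, and the sample pubkey strings are hypothetical and exist only for illustration.

```typescript
// Hypothetical in-memory mirror of the 1.34.11 recipient filter in listInbox.
interface ScopedRow {
  id: string;
  recipient_pubkey: string | null;
}

interface Scope {
  recipientPubkey?: string;
  recipientMemberPubkey?: string;
}

function isVisible(row: ScopedRow, scope: Scope): boolean {
  // No session pubkey supplied: the listing is unscoped and returns everything.
  if (!scope.recipientPubkey) return true;
  // Legacy rows (pre-1.34.11) have no recipient and are surfaced to all sessions.
  if (row.recipient_pubkey === null) return true;
  // Session-targeted DMs for this session.
  if (row.recipient_pubkey === scope.recipientPubkey) return true;
  // Member-keyed rows, only when the member pubkey was supplied.
  return Boolean(scope.recipientMemberPubkey) && row.recipient_pubkey === scope.recipientMemberPubkey;
}

const sampleRows: ScopedRow[] = [
  { id: "dm-a", recipient_pubkey: "session-a" },
  { id: "dm-b", recipient_pubkey: "session-b" },
  { id: "broadcast", recipient_pubkey: "member-1" },
  { id: "legacy", recipient_pubkey: null },
];

function visibleIds(scope: Scope): string[] {
  return sampleRows.filter((row) => isVisible(row, scope)).map((row) => row.id);
}
```

With this sample data, session A scoped with the member pubkey sees its own DM, the broadcast, and the legacy row, but never session B's DM.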
245
apps/cli/src/daemon/db/outbox.ts
Normal file
@@ -0,0 +1,245 @@
// Outbox schema + accessors. Schema is the v0.9.0 spec §4.5.2 shape:
// includes `aborted` status and audit columns from the v7 pull.

import type { SqliteDb } from "./sqlite.js";

export type OutboxStatus = "pending" | "inflight" | "done" | "dead" | "aborted";

export interface OutboxRow {
  id: string;
  client_message_id: string;
  request_fingerprint: Uint8Array;
  payload: Uint8Array;
  enqueued_at: number;
  attempts: number;
  next_attempt_at: number;
  status: OutboxStatus;
  last_error: string | null;
  delivered_at: number | null;
  broker_message_id: string | null;
  aborted_at: number | null;
  aborted_by: string | null;
  superseded_by: string | null;
  /** Sprint 4 routing: NULL on v0.9.0 rows, drained via broadcast fallback. */
  mesh: string | null;
  target_spec: string | null;
  nonce: string | null;
  ciphertext: string | null;
  priority: string | null;
  /**
   * 1.34.0: hex pubkey of the launched session that originated this row.
   * NULL when the send came from outside a registered session
   * (cold-path CLI, system-issued sends, etc.) — drain falls through to
   * the daemon-WS in that case. When set, drain prefers the matching
   * SessionBrokerClient so the broker fan-out attributes the push to
   * the session pubkey instead of the daemon's stable member pubkey.
   */
  sender_session_pubkey: string | null;
}

export function migrateOutbox(db: SqliteDb): void {
  db.exec(`
    CREATE TABLE IF NOT EXISTS outbox (
      id                   TEXT PRIMARY KEY,
      client_message_id    TEXT NOT NULL UNIQUE,
      request_fingerprint  BLOB NOT NULL,
      payload              BLOB NOT NULL,
      enqueued_at          INTEGER NOT NULL,
      attempts             INTEGER NOT NULL DEFAULT 0,
      next_attempt_at      INTEGER NOT NULL,
      status               TEXT NOT NULL CHECK(status IN
                             ('pending','inflight','done','dead','aborted')),
      last_error           TEXT,
      delivered_at         INTEGER,
      broker_message_id    TEXT,
      aborted_at           INTEGER,
      aborted_by           TEXT,
      superseded_by        TEXT
    );
    CREATE INDEX IF NOT EXISTS outbox_pending
      ON outbox(status, next_attempt_at);
    CREATE INDEX IF NOT EXISTS outbox_aborted
      ON outbox(status, aborted_at) WHERE status = 'aborted';
  `);

  // v1.25.0 / Sprint 4: real outbound routing. Adds the broker-format
  // target spec, mesh slug, and the already-encrypted ciphertext+nonce so
  // the drain worker can dispatch each row without re-resolving names or
  // re-running crypto. Existing rows from v0.9.0 land with NULLs and get
  // drained via the legacy broadcast fallback (preserves no-regression).
  const hasMesh = columnExists(db, "outbox", "mesh");
  const hasTargetSpec = columnExists(db, "outbox", "target_spec");
  const hasNonce = columnExists(db, "outbox", "nonce");
  const hasCiphertext = columnExists(db, "outbox", "ciphertext");
  const hasPriority = columnExists(db, "outbox", "priority");
  if (!hasMesh) db.exec(`ALTER TABLE outbox ADD COLUMN mesh TEXT`);
  if (!hasTargetSpec) db.exec(`ALTER TABLE outbox ADD COLUMN target_spec TEXT`);
  if (!hasNonce) db.exec(`ALTER TABLE outbox ADD COLUMN nonce TEXT`);
  if (!hasCiphertext) db.exec(`ALTER TABLE outbox ADD COLUMN ciphertext TEXT`);
  if (!hasPriority) db.exec(`ALTER TABLE outbox ADD COLUMN priority TEXT`);

  // 1.34.0: per-row sender session pubkey, used by the drain worker to
  // route via the originating session's WS so broker fan-out attributes
  // the push to the session pubkey, not the daemon's member pubkey.
  // Pre-1.34.0 rows land with NULL — drain falls back to the daemon-WS
  // path (legacy attribution).
  const hasSenderSessionPk = columnExists(db, "outbox", "sender_session_pubkey");
  if (!hasSenderSessionPk) db.exec(`ALTER TABLE outbox ADD COLUMN sender_session_pubkey TEXT`);
}

function columnExists(db: SqliteDb, table: string, column: string): boolean {
  const rows = db.prepare(`PRAGMA table_info(${table})`).all<{ name: string }>();
  return rows.some((r) => r.name === column);
}

export function findByClientId(db: SqliteDb, clientMessageId: string): OutboxRow | null {
  const row = db.prepare(`
    SELECT id, client_message_id, request_fingerprint, payload, enqueued_at,
           attempts, next_attempt_at, status, last_error, delivered_at,
           broker_message_id, aborted_at, aborted_by, superseded_by,
           mesh, target_spec, nonce, ciphertext, priority,
           sender_session_pubkey
    FROM outbox WHERE client_message_id = ?
  `).get<OutboxRow>(clientMessageId);
  return row ?? null;
}

export interface InsertPendingInput {
  id: string;
  client_message_id: string;
  request_fingerprint: Uint8Array;
  payload: Uint8Array;
  now: number;
  /** Sprint 4: routing fields. Optional only for legacy/v0.9.0 callers. */
  mesh?: string;
  target_spec?: string;
  nonce?: string;
  ciphertext?: string;
  priority?: string;
  /** 1.34.0: hex pubkey of the originating session (omit for cold-path
   *  CLI sends — drain will use the daemon-WS). */
  sender_session_pubkey?: string;
}

export function insertPending(db: SqliteDb, input: InsertPendingInput): void {
  db.prepare(`
    INSERT INTO outbox (
      id, client_message_id, request_fingerprint, payload,
      enqueued_at, attempts, next_attempt_at, status,
      mesh, target_spec, nonce, ciphertext, priority,
      sender_session_pubkey
    ) VALUES (?, ?, ?, ?, ?, 0, ?, 'pending', ?, ?, ?, ?, ?, ?)
  `).run(
    input.id,
    input.client_message_id,
    input.request_fingerprint,
    input.payload,
    input.now,
    input.now,
    input.mesh ?? null,
    input.target_spec ?? null,
    input.nonce ?? null,
    input.ciphertext ?? null,
    input.priority ?? null,
    input.sender_session_pubkey ?? null,
  );
}

export function markAborted(db: SqliteDb, id: string, by: string, supersededBy: string | null, now: number): void {
  db.prepare(`
    UPDATE outbox SET status = 'aborted', aborted_at = ?, aborted_by = ?, superseded_by = ?
    WHERE id = ?
  `).run(now, by, supersededBy, id);
}

export function fingerprintsEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= (a[i]! ^ b[i]!);
  return diff === 0;
}

export interface ListOutboxParams {
  status?: OutboxStatus;
  limit?: number;
}

export function listOutbox(db: SqliteDb, p: ListOutboxParams = {}): OutboxRow[] {
  const where: string[] = [];
  const args: unknown[] = [];
  if (p.status) { where.push("status = ?"); args.push(p.status); }
  const sql = `
    SELECT id, client_message_id, request_fingerprint, payload, enqueued_at,
           attempts, next_attempt_at, status, last_error, delivered_at,
           broker_message_id, aborted_at, aborted_by, superseded_by,
           mesh, target_spec, nonce, ciphertext, priority,
           sender_session_pubkey
    FROM outbox
    ${where.length ? "WHERE " + where.join(" AND ") : ""}
    ORDER BY enqueued_at DESC
    LIMIT ?
  `;
  args.push(Math.min(Math.max(p.limit ?? 50, 1), 500));
  return db.prepare(sql).all<OutboxRow>(...args);
}

export function findById(db: SqliteDb, id: string): OutboxRow | null {
  return db.prepare(`
    SELECT id, client_message_id, request_fingerprint, payload, enqueued_at,
           attempts, next_attempt_at, status, last_error, delivered_at,
           broker_message_id, aborted_at, aborted_by, superseded_by,
           mesh, target_spec, nonce, ciphertext, priority,
           sender_session_pubkey
    FROM outbox WHERE id = ?
  `).get<OutboxRow>(id) ?? null;
}

export interface RequeueResult {
  abortedRowId: string;
  newRowId: string;
  newClientMessageId: string;
}

/**
 * Operator recovery per spec §4.5.3 / §4.6.3:
 *   1. Mark the existing row aborted (audit columns set, status flipped).
 *   2. Insert a fresh pending row reusing the same payload+fingerprint
 *      under a new client_message_id.
 *   3. Wire superseded_by on the old row to the new row id (done in the
 *      same UPDATE as step 1).
 *
 * This function issues two separate statements and opens no transaction
 * itself, so the steps are atomic only when the caller wraps the call in
 * one (for example via `inImmediateTx`). The new row copies payload and
 * fingerprint only; its routing columns (mesh, target_spec, nonce,
 * ciphertext, priority, sender_session_pubkey) are left NULL.
 *
 * Returns null if the requested id doesn't exist or is already aborted/done.
 */
export function requeueDeadOrPending(
  db: SqliteDb,
  args: { id: string; newClientMessageId: string; newRowId: string; now: number; abortedBy: string },
): RequeueResult | null {
  const existing = findById(db, args.id);
  if (!existing) return null;
  if (existing.status === "aborted" || existing.status === "done") return null;

  db.prepare(`
    UPDATE outbox
    SET status = 'aborted', aborted_at = ?, aborted_by = ?, superseded_by = ?
    WHERE id = ? AND status IN ('pending','inflight','dead')
  `).run(args.now, args.abortedBy, args.newRowId, args.id);

  db.prepare(`
    INSERT INTO outbox (
      id, client_message_id, request_fingerprint, payload,
      enqueued_at, attempts, next_attempt_at, status
    ) VALUES (?, ?, ?, ?, ?, 0, ?, 'pending')
  `).run(
    args.newRowId,
    args.newClientMessageId,
    existing.request_fingerprint,
    existing.payload,
    args.now,
    args.now,
  );

  return {
    abortedRowId: existing.id,
    newRowId: args.newRowId,
    newClientMessageId: args.newClientMessageId,
  };
}
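`findByClientId` and `fingerprintsEqual` look like the two halves of an idempotent accept check: look up the row by `client_message_id`, then compare the stored fingerprint with the incoming one. The sketch below shows one way those pieces could combine. It is an assumption about the caller, which is not part of this diff; `AcceptDecision` and `decideAccept` are hypothetical names, and `fingerprintsEqual` is copied from the file above so the block runs on its own.

```typescript
// Copied from outbox.ts above: compares all bytes without an early exit.
function fingerprintsEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= (a[i]! ^ b[i]!);
  return diff === 0;
}

// Hypothetical decision type; the real IPC handler may model this differently.
type AcceptDecision = "insert" | "duplicate" | "conflict";

// `existing` stands in for the result of findByClientId(db, clientMessageId).
function decideAccept(
  existing: { request_fingerprint: Uint8Array } | null,
  incomingFingerprint: Uint8Array,
): AcceptDecision {
  // Unknown client_message_id: safe to enqueue a new pending row.
  if (existing === null) return "insert";
  // Same id and same request body: a retry of a request we already hold.
  if (fingerprintsEqual(existing.request_fingerprint, incomingFingerprint)) return "duplicate";
  // Same id but a different request body: the id was reused for something else.
  return "conflict";
}
```

Running the lookup and the insert inside one `BEGIN IMMEDIATE` transaction would keep two concurrent requests with the same id from both reaching the `"insert"` branch.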
76
apps/cli/src/daemon/db/sqlite.ts
Normal file
@@ -0,0 +1,76 @@
// SQLite shim. The daemon runs under Node 22.5+ in production (node:sqlite).
// During local dev (bun src/entrypoints/cli.ts daemon up) we fall back to
// bun:sqlite, which has a near-identical API surface for what we use.

export type SqliteDb = {
  prepare(sql: string): {
    run(...params: unknown[]): { changes: number; lastInsertRowid: number | bigint };
    get<T = unknown>(...params: unknown[]): T | undefined;
    all<T = unknown>(...params: unknown[]): T[];
  };
  exec(sql: string): void;
  close(): void;
};

interface DatabaseCtor {
  new (path: string): SqliteDb;
}

let cached: DatabaseCtor | null = null;

async function loadSqlite(): Promise<DatabaseCtor> {
  if (cached) return cached;

  // Prefer node:sqlite (production runtime).
  try {
    const mod = (await import("node:sqlite")) as { DatabaseSync: DatabaseCtor };
    cached = mod.DatabaseSync;
    return cached;
  } catch (nodeErr) {
    // Dev path: bun:sqlite. Bun's Database has prepare/exec/close already.
    try {
      const bunMod = (await import("bun:sqlite")) as { Database: DatabaseCtor };
      cached = bunMod.Database;
      return cached;
    } catch {
      const msg = `claudemesh daemon requires Node.js 22.5+ for the embedded SQLite store ` +
        `(node:sqlite), or Bun (bun:sqlite) for dev. ` +
        `Current: ${process.version}. Original error: ${String(nodeErr)}`;
      throw new Error(msg);
    }
  }
}

export async function openSqlite(path: string): Promise<SqliteDb> {
  const Database = await loadSqlite();
  const db = new Database(path);
  // Default pragmas for daemon use:
  //   journal_mode WAL — concurrent reads while one writer is in BEGIN IMMEDIATE.
  //   synchronous NORMAL — balance durability/throughput; daemon is the only writer.
  //   foreign_keys ON — enforce constraints if any are added later.
  //   busy_timeout — let BEGIN IMMEDIATE wait briefly for a contending writer.
  db.exec(`
    PRAGMA journal_mode = WAL;
    PRAGMA synchronous = NORMAL;
    PRAGMA foreign_keys = ON;
    PRAGMA busy_timeout = 5000;
  `);
  return db;
}

/**
 * Run `fn` inside a `BEGIN IMMEDIATE` transaction. Per spec §4.5.1, this is
 * what serializes IPC accept against concurrent same-id requests; SQLite has
 * no row-level lock and `SELECT FOR UPDATE` is not supported.
 */
export function inImmediateTx<T>(db: SqliteDb, fn: () => T): T {
  db.exec("BEGIN IMMEDIATE");
  try {
    const out = fn();
    db.exec("COMMIT");
    return out;
  } catch (err) {
    try { db.exec("ROLLBACK"); } catch { /* ignore */ }
    throw err;
  }
}
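The commit/rollback behaviour of `inImmediateTx` can be exercised without a real database by handing it a stub that records each statement. This is a minimal sketch: `ExecOnlyDb` and `recordingDb` are hypothetical test helpers, and `inImmediateTx` is copied from the file above with its parameter narrowed to the one method it calls.

```typescript
// Narrowed stand-in for SqliteDb: inImmediateTx only ever calls exec().
type ExecOnlyDb = { exec(sql: string): void };

// Same body as inImmediateTx in sqlite.ts above.
function inImmediateTx<T>(db: ExecOnlyDb, fn: () => T): T {
  db.exec("BEGIN IMMEDIATE");
  try {
    const out = fn();
    db.exec("COMMIT");
    return out;
  } catch (err) {
    try { db.exec("ROLLBACK"); } catch { /* ignore */ }
    throw err;
  }
}

// Hypothetical stub that records the statements it is asked to run.
function recordingDb(): { db: ExecOnlyDb; log: string[] } {
  const log: string[] = [];
  return { db: { exec: (sql: string) => { log.push(sql); } }, log };
}

// Success path: the callback's return value is passed through and COMMIT follows.
const demoOk = recordingDb();
const demoValue = inImmediateTx(demoOk.db, () => "row-1");

// Failure path: the error propagates to the caller after a ROLLBACK.
const demoFail = recordingDb();
let demoError: unknown = null;
try {
  inImmediateTx(demoFail.db, () => { throw new Error("constraint failed"); });
} catch (err) {
  demoError = err;
}
```

One consequence of this shape is that `fn` has to be synchronous: if it returned a promise, `COMMIT` would run before the awaited work finished.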
265
apps/cli/src/daemon/drain.ts
Normal file
@@ -0,0 +1,265 @@
// Outbox drain worker. Walks `outbox.pending` rows, sends them to the
// broker via DaemonBrokerClient, and transitions row state per spec §4.6.1.
//
// Lifecycle per row:
//   pending → inflight → done            (broker accepted)
//                      → pending+backoff (transient broker error)
//                      → dead            (permanent broker error or
//                                         attempt cap reached)
//
// Wakeable: insertPending in the IPC handler can call wake() to skip the
// idle interval. We use a simple promise-replacing pattern instead of a
// pollable signal.

import type { SqliteDb } from "./db/sqlite.js";
import type { DaemonBrokerClient } from "./broker.js";
import type { SessionBrokerClient } from "./session-broker.js";
import type { OutboxStatus } from "./db/outbox.js";

const POLL_INTERVAL_MS = 500;
const MAX_ATTEMPTS_PER_ROW = 25;
const BACKOFF_BASE_MS = 500;
const BACKOFF_CAP_MS = 30_000;
|
|
||||||
interface PendingRow {
  id: string;
  client_message_id: string;
  request_fingerprint: Uint8Array;
  payload: Uint8Array;
  attempts: number;
  /** Sprint 4 routing fields. NULL on legacy v0.9.0 rows → broadcast fallback. */
  target_spec: string | null;
  nonce: string | null;
  ciphertext: string | null;
  priority: string | null;
  mesh: string | null;
  /** 1.34.0: hex pubkey of the originating session — drain prefers
   *  routing via that session's WS so broker fan-out attributes the
   *  push to the session pubkey. NULL on cold-path / pre-1.34.0 rows. */
  sender_session_pubkey: string | null;
}

export interface DrainOptions {
  db: SqliteDb;
  /** v1.26.0: per-mesh broker map. Drain dispatches each row to the
   *  broker keyed by its `mesh` column. Single-mesh daemons pass a
   *  Map of size 1; multi-mesh daemons pass one entry per joined mesh. */
  brokers: Map<string, DaemonBrokerClient>;
  /**
   * 1.34.0: lookup for the per-session WS keyed by hex session pubkey.
   * When an outbox row has `sender_session_pubkey` set and this lookup
   * returns an open client, the drain routes via the session-WS so the
   * broker fan-out attributes the push to the session pubkey instead
   * of the daemon's stable member pubkey.
   *
   * Returning `undefined` (or an unopened client) signals "no session
   * WS available" — the drain backs off and retries; it does NOT fall
   * back to the daemon-WS, because the row was encrypted with the
   * session secret and would fail to decrypt on the recipient side
   * if attribution silently changed mid-flight.
   */
  getSessionBrokerByPubkey?: (sessionPubkey: string) => SessionBrokerClient | undefined;
  log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
}

export interface DrainHandle {
  wake(): void;
  close(): Promise<void>;
}

export function startDrainWorker(opts: DrainOptions): DrainHandle {
  const log = opts.log ?? defaultLog;
  let stopped = false;
  let wakeResolve: (() => void) | null = null;
  let wakePromise = new Promise<void>((r) => { wakeResolve = r; });

  const wake = () => {
    if (wakeResolve) {
      const r = wakeResolve;
      wakeResolve = null;
      r();
    }
  };

  const tick = async () => {
    while (!stopped) {
      try { await drainOnce(opts, log); }
      catch (e) { log("warn", "drain_tick_failed", { err: String(e) }); }
      // Sleep up to POLL_INTERVAL_MS, but wake immediately on signal.
      await Promise.race([
        wakePromise,
        new Promise<void>((r) => setTimeout(r, POLL_INTERVAL_MS)),
      ]);
      // Reset wake promise after each loop.
      wakePromise = new Promise<void>((r) => { wakeResolve = r; });
    }
  };

  void tick();

  return {
    wake,
    close: async () => { stopped = true; wake(); },
  };
}

async function drainOnce(opts: DrainOptions, log: NonNullable<DrainOptions["log"]>): Promise<void> {
  const now = Date.now();
  const rows = opts.db.prepare(`
    SELECT id, client_message_id, request_fingerprint, payload, attempts,
           target_spec, nonce, ciphertext, priority, mesh,
           sender_session_pubkey
      FROM outbox
     WHERE status = 'pending' AND next_attempt_at <= ?
     ORDER BY enqueued_at
     LIMIT 32
  `).all<PendingRow>(now);

  if (rows.length === 0) return;

  for (const row of rows) {
    if (markInflight(opts.db, row.id, now) === 0) continue; // raced with another drainer
    const fpHex = bufferToHex(row.request_fingerprint);

    // v1.26.0: pick the daemon-WS broker keyed by the row's mesh.
    // Legacy rows (mesh=NULL) fall back to the only broker if there's
    // exactly one; otherwise mark dead because we don't know where to
    // send them.
    let daemonBroker: DaemonBrokerClient | undefined;
    if (row.mesh) {
      daemonBroker = opts.brokers.get(row.mesh);
    } else if (opts.brokers.size === 1) {
      daemonBroker = opts.brokers.values().next().value;
    }
    if (!daemonBroker) {
      log("warn", "drain_no_broker_for_mesh", { id: row.id, mesh: row.mesh ?? "(null)" });
      markDead(opts.db, row.id, `no_broker_for_mesh:${row.mesh ?? "null"}`);
      continue;
    }

    // 1.34.0: when the row was written by an authenticated session,
    // dispatch via the matching SessionBrokerClient so broker fan-out
    // attributes the push to the session pubkey. Encryption is
    // session-secret based on those rows, so we MUST NOT silently fall
    // back to the daemon-WS — the recipient's decrypt would fail. If
    // the session-WS is closed (reconnecting / session terminated), we
    // back off and retry.
    let sessionBroker: SessionBrokerClient | undefined;
    if (row.sender_session_pubkey && opts.getSessionBrokerByPubkey) {
      sessionBroker = opts.getSessionBrokerByPubkey(row.sender_session_pubkey);
    }

    // Sprint 4: use the row's resolved target/ciphertext if present.
    // Legacy v0.9.0 rows (NULL on these columns) fall back to the
    // broadcast smoke-test shape so existing in-flight rows still drain.
    let targetSpec: string;
    let nonce: string;
    let ciphertext: string;
    let priority: "now" | "next" | "low";
    if (row.target_spec && row.nonce && row.ciphertext) {
      targetSpec = row.target_spec;
      nonce = row.nonce;
      ciphertext = row.ciphertext;
      priority = (row.priority === "now" || row.priority === "low") ? row.priority : "next";
    } else {
      targetSpec = "*";
      nonce = await randomNonce();
      ciphertext = Buffer.from(row.payload).toString("base64");
      priority = "next";
    }

    const sendArgs = {
      targetSpec,
      priority,
      nonce,
      ciphertext,
      client_message_id: row.client_message_id,
      request_fingerprint_hex: fpHex,
    };

    let res;
    try {
      if (row.sender_session_pubkey) {
        // Session-attributed row. Require an open session-WS — see comment
        // above on why we don't fall back to the daemon-WS.
        if (!sessionBroker || !sessionBroker.isOpen()) {
          log("info", "drain_session_ws_not_ready", {
            id: row.id, session_pubkey: row.sender_session_pubkey.slice(0, 12),
          });
          backoffPending(opts.db, row.id, row.attempts + 1, "session_ws_not_open", "session_ws_not_open");
          continue;
        }
        res = await sessionBroker.send(sendArgs);
      } else {
        res = await daemonBroker.send(sendArgs);
      }
    } catch (e) {
      log("warn", "drain_send_threw", { id: row.id, err: String(e) });
      backoffPending(opts.db, row.id, row.attempts + 1, "exception", String(e));
      continue;
    }

    if (res.ok) {
      markDone(opts.db, row.id, res.messageId, Date.now());
    } else if (res.permanent) {
      log("warn", "drain_permanent_failure", { id: row.id, err: res.error });
      markDead(opts.db, row.id, res.error);
    } else if (row.attempts + 1 >= MAX_ATTEMPTS_PER_ROW) {
      log("warn", "drain_max_attempts", { id: row.id, err: res.error });
      markDead(opts.db, row.id, `max_attempts: ${res.error}`);
    } else {
      backoffPending(opts.db, row.id, row.attempts + 1, "retry", res.error);
    }
  }
}

function markInflight(db: SqliteDb, id: string, now: number): number {
  return Number(db.prepare(`
    UPDATE outbox
       SET status = 'inflight', attempts = attempts + 1, next_attempt_at = ?
     WHERE id = ? AND status = 'pending'
  `).run(now + BACKOFF_CAP_MS, id).changes);
}

function markDone(db: SqliteDb, id: string, brokerMessageId: string, now: number) {
  db.prepare(`
    UPDATE outbox
       SET status = 'done', delivered_at = ?, broker_message_id = ?, last_error = NULL
     WHERE id = ?
  `).run(now, brokerMessageId, id);
}

function markDead(db: SqliteDb, id: string, err: string) {
  db.prepare(`UPDATE outbox SET status = 'dead', last_error = ? WHERE id = ?`).run(err, id);
}

function backoffPending(db: SqliteDb, id: string, attempts: number, _kind: string, err: string) {
  const wait = Math.min(BACKOFF_CAP_MS, BACKOFF_BASE_MS * (2 ** Math.min(attempts, 12)));
  const next = Date.now() + wait;
  db.prepare(`
    UPDATE outbox
       SET status = 'pending', attempts = ?, next_attempt_at = ?, last_error = ?
     WHERE id = ?
  `).run(attempts, next, err, id);
}

function bufferToHex(b: Uint8Array): string {
  let s = "";
  for (let i = 0; i < b.length; i++) s += b[i]!.toString(16).padStart(2, "0");
  return s;
}

async function randomNonce(): Promise<string> {
  const { randomBytes } = await import("node:crypto");
  return randomBytes(24).toString("base64");
}

function defaultLog(level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) {
  const line = JSON.stringify({ level, msg, ...meta, ts: new Date().toISOString() });
  if (level === "info") process.stdout.write(line + "\n");
  else process.stderr.write(line + "\n");
}

// Suppress unused-status warning under strict tsc:
const _statuses: OutboxStatus[] = ["pending", "inflight", "done", "dead", "aborted"];
void _statuses;
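To make the retry timing in `backoffPending` concrete, the sketch below evaluates the same formula with the constants from this file. `backoffDelayMs` is a hypothetical name for the extracted expression; the real function also writes the row. The total assumes the transient-error path only, where a row is retried with `attempts` from 1 to 24 before the cap marks it dead.

```typescript
// Constants copied from drain.ts.
const MAX_ATTEMPTS_PER_ROW = 25;
const BACKOFF_BASE_MS = 500;
const BACKOFF_CAP_MS = 30_000;

// Hypothetical helper: the delay expression from backoffPending, on its own.
function backoffDelayMs(attempts: number): number {
  return Math.min(BACKOFF_CAP_MS, BACKOFF_BASE_MS * (2 ** Math.min(attempts, 12)));
}

// Delays after the 1st..7th failed attempt:
// 1000, 2000, 4000, 8000, 16000, then capped at 30000 from the 6th on.
const schedule = [1, 2, 3, 4, 5, 6, 7].map(backoffDelayMs);

// Cumulative backoff before a transiently failing row hits the attempt cap
// (attempts 1..24 each schedule one wait): 601000 ms, roughly ten minutes.
let totalMs = 0;
for (let attempts = 1; attempts < MAX_ATTEMPTS_PER_ROW; attempts++) {
  totalMs += backoffDelayMs(attempts);
}
```

Because the cap is reached at the sixth attempt, the inner `Math.min(attempts, 12)` never decides the result with these constants. It only guards against overflow if the cap were raised.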
132
apps/cli/src/daemon/events.ts
Normal file
@@ -0,0 +1,132 @@
// Lightweight in-process event bus + SSE writer. Used by /v1/events SSE
// stream and consumed by hooks (post-v0.9.0).

import type { ServerResponse } from "node:http";

export type DaemonEventKind =
  | "message"
  | "peer_join"
  | "peer_leave"
  | "broker_status"
  | "system";

export interface DaemonEvent {
  kind: DaemonEventKind;
  ts: string;
  data: Record<string, unknown>;
}

type Subscriber = (e: DaemonEvent) => void;

export class EventBus {
  private subs = new Set<Subscriber>();

  publish(kind: DaemonEventKind, data: Record<string, unknown>): void {
    const e: DaemonEvent = { kind, ts: new Date().toISOString(), data };
    for (const s of this.subs) {
      try { s(e); } catch { /* one bad subscriber must not poison the rest */ }
    }
  }

  subscribe(fn: Subscriber): () => void {
    this.subs.add(fn);
    return () => this.subs.delete(fn);
  }
}

/** Write an event to an open SSE response. */
export function writeSse(res: ServerResponse, e: DaemonEvent, idCounter: number): void {
  res.write(`id: ${idCounter}\n`);
  res.write(`event: ${e.kind}\n`);
  res.write(`data: ${JSON.stringify({ ts: e.ts, ...e.data })}\n\n`);
}

/** 1.34.10: per-subscriber demux options. The MCP server passes its
 *  own session pubkey + member pubkey when binding so the bus only
 *  sends events meant for that session. Without this, every MCP on a
 *  multi-session daemon receives every inbox row and emits a
 *  duplicate channel notification — manifests as session A seeing its
 *  own outbound DM to B because B's session-WS published the row to
 *  the shared bus. */
export interface SseFilterOptions {
  /** Session pubkey the subscribing MCP serves. Events tagged
   *  `recipient_kind: "session"` only flow when their
   *  `recipient_pubkey` matches this. */
  sessionPubkey?: string;
  /** Daemon's member pubkey for this mesh. Events tagged
   *  `recipient_kind: "member"` flow when their `recipient_pubkey`
   *  matches — those are member-keyed broadcasts / DMs that should
   *  reach every session of this member, but not OTHER members. */
  memberPubkey?: string;
  /** Mesh slug the subscriber is bound to (from session registry).
   *  When set, system events (peer_join etc.) are filtered to this
   *  mesh; without it every system event surfaces. */
  meshSlug?: string;
}

function shouldDeliver(e: DaemonEvent, f: SseFilterOptions): boolean {
  // No filter set → legacy behavior: deliver everything (used by
  // diagnostic tooling like `claudemesh daemon events`).
  if (!f.sessionPubkey && !f.memberPubkey && !f.meshSlug) return true;

  // Mesh scoping for events that carry a mesh slug. peer_join /
  // peer_leave / broker_status all carry `data.mesh`; if the
  // subscriber is bound to a specific mesh, drop events from other
  // meshes.
  if (f.meshSlug) {
    const eventMesh = typeof e.data.mesh === "string" ? e.data.mesh : null;
    if (eventMesh && eventMesh !== f.meshSlug) return false;
  }

  // System events (peer_join etc.) flow to every session on the same
  // mesh — they're informational, not addressed.
  if (e.kind !== "message") return true;

  const recipientKind = typeof e.data.recipient_kind === "string" ? e.data.recipient_kind : null;
  const recipientPubkey = typeof e.data.recipient_pubkey === "string" ? e.data.recipient_pubkey.toLowerCase() : null;

  // Legacy publish without recipient context → everyone gets it. Keeps
  // backward compatibility with older daemon code paths until they're
  // migrated. Also covers test paths that don't thread context.
  if (!recipientKind || !recipientPubkey) return true;

  if (recipientKind === "session") {
    return !!f.sessionPubkey && f.sessionPubkey.toLowerCase() === recipientPubkey;
  }
  if (recipientKind === "member") {
    return !!f.memberPubkey && f.memberPubkey.toLowerCase() === recipientPubkey;
  }
  return true;
}

/** Open an SSE stream on the response and route bus events to it.
 *  1.34.10: optional `filter` scopes the stream to one session/member;
 *  see SseFilterOptions. */
export function bindSseStream(res: ServerResponse, bus: EventBus, filter: SseFilterOptions = {}): () => void {
  res.statusCode = 200;
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache, no-transform");
  res.setHeader("Connection", "keep-alive");
  res.setHeader("X-Accel-Buffering", "no");
  res.write(": connected\n\n");

  let counter = 0;
  const unsubscribe = bus.subscribe((e) => {
    if (!shouldDeliver(e, filter)) return;
    writeSse(res, e, ++counter);
  });

  const heartbeat = setInterval(() => {
    try { res.write(": keepalive\n\n"); }
    catch { /* socket already torn down; cleanup handled below */ }
  }, 15_000);

  const cleanup = () => {
    clearInterval(heartbeat);
    unsubscribe();
    try { res.end(); } catch { /* ignore */ }
  };
  res.on("close", cleanup);
  res.on("error", cleanup);
  return cleanup;
}
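The wire format produced by `writeSse` is easier to see as a single string. The following is a sketch under assumptions: `formatSseFrame` is a hypothetical pure variant that returns the frame instead of writing it to a `ServerResponse`, and the local `DaemonEvent` type widens `kind` to `string` to keep the example self-contained.

```typescript
// Local copy of the event shape; `kind` is widened to string for brevity.
interface DaemonEvent {
  kind: string;
  ts: string;
  data: Record<string, unknown>;
}

// Hypothetical pure variant of writeSse: same three fields, returned as one string.
function formatSseFrame(e: DaemonEvent, id: number): string {
  return (
    `id: ${id}\n` +
    `event: ${e.kind}\n` +
    `data: ${JSON.stringify({ ts: e.ts, ...e.data })}\n\n`
  );
}

// One frame: three field lines, then a blank line that ends the event.
const frame = formatSseFrame(
  { kind: "peer_join", ts: "2024-01-01T00:00:00.000Z", data: { mesh: "demo" } },
  1,
);

// Because `data` is spread after `ts`, a `ts` key inside `data` replaces
// the event timestamp in the payload.
const overridden = formatSseFrame(
  { kind: "system", ts: "2024-01-01T00:00:00.000Z", data: { ts: "custom" } },
  2,
);
```

`JSON.stringify` never emits raw newlines, so the payload always fits on a single `data:` line.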
71
apps/cli/src/daemon/fingerprint.ts
Normal file
@@ -0,0 +1,71 @@
// Canonical request fingerprint per spec §4.4.
//
//   request_fingerprint = sha256(
//     envelope_version || 0x00 ||
//     destination_kind || 0x00 ||
//     destination_ref || 0x00 ||
//     reply_to_id_or_empty || 0x00 ||
//     priority || 0x00 ||
//     meta_canonical_json || 0x00 ||
//     body_hash
//   )

import { createHash } from "node:crypto";

export type DestKind = "topic" | "dm" | "queue";
export type Priority = "now" | "next" | "low";

export interface SendRequestForFingerprint {
  envelope_version: number;
  destination_kind: DestKind;
  destination_ref: string;
  reply_to_id?: string | null;
  priority: Priority;
  meta?: Record<string, unknown> | null;
  /** UTF-8 body bytes. */
  body: Uint8Array;
}

const NUL = Buffer.from([0]);

export function computeRequestFingerprint(req: SendRequestForFingerprint): Buffer {
  const h = createHash("sha256");
  h.update(String(req.envelope_version), "utf8"); h.update(NUL);
  h.update(req.destination_kind, "utf8"); h.update(NUL);
  h.update(req.destination_ref, "utf8"); h.update(NUL);
  h.update(req.reply_to_id ?? "", "utf8"); h.update(NUL);
  h.update(req.priority, "utf8"); h.update(NUL);
  h.update(req.meta ? canonicalJson(req.meta) : "", "utf8");
  h.update(NUL);
  h.update(createHash("sha256").update(req.body).digest());
  return h.digest();
}

/**
 * Minimal JCS-like canonicalization: sort object keys, no whitespace, no
 * non-ASCII escape funny business. Sufficient for v0.9.0 (TS-only).
 * Cross-language SDK ports get a vetted JCS lib + conformance tests
 * (deferred per followups doc).
 */
export function canonicalJson(value: unknown): string {
  return JSON.stringify(sortKeys(value));
}

function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(obj).sort()) out[k] = sortKeys(obj[k]);
    return out;
  }
  return value;
}

export function fingerprintHexPrefix(fp: Uint8Array, bytes = 8): string {
  let s = "";
  for (let i = 0; i < bytes && i < fp.length; i++) {
    s += fp[i]!.toString(16).padStart(2, "0");
  }
  return s;
}
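Two properties of the fingerprint are worth seeing in isolation: key order in `meta` does not affect the canonical JSON, and the `0x00` separators keep adjacent fields from blending into each other. The sketch below copies `sortKeys` and `canonicalJson` from the file above. `hashFields` is a hypothetical, simplified helper that appends a NUL after every field, whereas the real `computeRequestFingerprint` omits the separator after the final `body_hash`.

```typescript
import { createHash } from "node:crypto";

// Copied from fingerprint.ts.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const k of Object.keys(obj).sort()) out[k] = sortKeys(obj[k]);
    return out;
  }
  return value;
}

function canonicalJson(value: unknown): string {
  return JSON.stringify(sortKeys(value));
}

// Same content, different key order: both canonicalize to one string.
const a = canonicalJson({ b: 1, a: { d: 2, c: 3 } });
const b = canonicalJson({ a: { c: 3, d: 2 }, b: 1 });

// Hypothetical simplified helper: hash each field followed by a NUL byte.
function hashFields(fields: string[]): string {
  const h = createHash("sha256");
  for (const f of fields) {
    h.update(f, "utf8");
    h.update(new Uint8Array([0]));
  }
  return h.digest("hex");
}

// Without separators, ("ab", "c") and ("a", "bc") would hash the same bytes.
const h1 = hashFields(["ab", "c"]);
const h2 = hashFields(["a", "bc"]);
```

Array order is preserved by `sortKeys`, so two `meta` values that differ only in array element order still produce different fingerprints.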
123
apps/cli/src/daemon/identity.ts
Normal file
@@ -0,0 +1,123 @@
// Accidental-clone detection per spec §2.2. Catches restored backups
// and copy-pasted homedirs by comparing a stable host fingerprint
// against the one we wrote at first daemon start.
//
// NOT attacker-grade: anyone copying both the keypair AND the
// host_fingerprint defeats this. Threat model §16 says so explicitly.

import { existsSync, readFileSync, writeFileSync } from "node:fs";
import { join } from "node:path";
import { createHash, randomUUID } from "node:crypto";
import { networkInterfaces } from "node:os";

import { DAEMON_PATHS } from "./paths.js";

export type ClonePolicy = "refuse" | "warn" | "allow";

export interface FingerprintRecord {
  schema_version: 1;
  fingerprint: string; // sha256 hex
  host_id: string; // raw, for diagnostics
  stable_mac: string; // raw, for diagnostics
  written_at: string; // ISO date
}

export interface FingerprintCheck {
  result: "first_run" | "match" | "mismatch" | "unavailable";
  current: FingerprintRecord;
  stored?: FingerprintRecord;
}

const FILE_NAME = "host_fingerprint.json";

function path(): string { return join(DAEMON_PATHS.DAEMON_DIR, FILE_NAME); }

/** Compute (without writing) the current host fingerprint. */
export function computeCurrentFingerprint(): FingerprintRecord {
  // Spec §2.2 / followups doc describe a persisted random UUID as the
  // fallback when neither host_id nor a stable MAC is readable. That
  // fallback is not implemented here. We DO NOT mint a fresh random per
  // call (that would make every restart look like a clone). Instead,
  // host_id is left empty when unknown, and the MAC alone identifies the
  // host for accidental-clone detection.
  const host_id = readHostId() ?? "";
  const stable_mac = pickStableMac() ?? "";
  const fp = createHash("sha256").update(host_id, "utf8").update("\0").update(stable_mac, "utf8").digest("hex");
  return {
    schema_version: 1,
    fingerprint: fp,
    host_id,
    stable_mac,
    written_at: new Date().toISOString(),
  };
}

// `randomUUID` is no longer used after the random-fallback fix; keep the
// import only if other helpers need it.
void randomUUID;

/** Read or write the persisted fingerprint and report the result. */
export function checkFingerprint(): FingerprintCheck {
  const current = computeCurrentFingerprint();
  if (!existsSync(path())) {
    writeFileSync(path(), JSON.stringify(current, null, 2), { mode: 0o600 });
    return { result: "first_run", current };
  }
  let stored: FingerprintRecord;
  try { stored = JSON.parse(readFileSync(path(), "utf8")) as FingerprintRecord; }
  catch { return { result: "unavailable", current }; }
  if (stored.fingerprint === current.fingerprint) return { result: "match", current, stored };
  return { result: "mismatch", current, stored };
}

/** Re-write the fingerprint file. Used by `daemon accept-host`. */
export function acceptCurrentHost(): FingerprintRecord {
  const current = computeCurrentFingerprint();
  writeFileSync(path(), JSON.stringify(current, null, 2), { mode: 0o600 });
  return current;
}

// ── platform helpers ───────────────────────────────────────────────────

function readHostId(): string | null {
  // Linux: /etc/machine-id (or /var/lib/dbus/machine-id).
  if (process.platform === "linux") {
    for (const p of ["/etc/machine-id", "/var/lib/dbus/machine-id"]) {
      try {
        const raw = readFileSync(p, "utf8").trim();
        if (raw) return `linux:${raw}`;
      } catch { /* try next */ }
    }
    return null;
  }
  // macOS: IOPlatformUUID comes from ioreg, and we avoid spawning a process.
  if (process.platform === "darwin") {
    // No reliable file; fall back to MAC-only fingerprint.
    return null;
  }
  // Windows: HKLM\SOFTWARE\Microsoft\Cryptography\MachineGuid. Skip in v0.9.0.
  return null;
}

function pickStableMac(): string | null {
  const ifs = networkInterfaces();
  const candidates: string[] = [];
  for (const [name, addrs] of Object.entries(ifs)) {
    if (!addrs) continue;
    if (isIgnoredInterface(name)) continue;
    for (const a of addrs) {
      if (a.internal) continue;
      if (!a.mac || a.mac === "00:00:00:00:00:00") continue;
      candidates.push(`${name}::${a.mac}`);
      break;
    }
  }
  if (candidates.length === 0) return null;
  candidates.sort(); // lex by interface name
  const first = candidates[0]!;
  const idx = first.indexOf("::");
  return idx >= 0 ? first.slice(idx + 2) : first;
}

function isIgnoredInterface(name: string): boolean {
  return /^(lo|docker|br-|veth|tap|tun|tailscale|wg|utun|ppp|vboxnet|vmnet|awdl|llw)/i.test(name);
}
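The interface filter and the hash input can be exercised without touching the real network stack. This is a sketch under assumptions: `fingerprintOf` is a hypothetical helper that isolates the hash expression from `computeCurrentFingerprint`, and the interface names and MAC address are made-up sample values.

```typescript
import { createHash } from "node:crypto";

// Copied from identity.ts.
function isIgnoredInterface(name: string): boolean {
  return /^(lo|docker|br-|veth|tap|tun|tailscale|wg|utun|ppp|vboxnet|vmnet|awdl|llw)/i.test(name);
}

// Hypothetical helper: the hash expression from computeCurrentFingerprint.
function fingerprintOf(hostId: string, stableMac: string): string {
  return createHash("sha256")
    .update(hostId, "utf8")
    .update("\0")
    .update(stableMac, "utf8")
    .digest("hex");
}

// Sample names: loopback, container, and tunnel interfaces are dropped.
const kept = ["eth0", "en0", "wlan0", "lo", "docker0", "utun3", "veth12ab"]
  .filter((name) => !isIgnoredInterface(name));

// The pattern is a case-insensitive prefix match, so any name starting
// with "lo" is skipped too, not only the loopback device.
const skipsLongName = isIgnoredInterface("Local Area Connection");

// Same MAC, different host_id: the fingerprints differ.
const withHostId = fingerprintOf("linux:abc123", "aa:bb:cc:dd:ee:ff");
const macOnly = fingerprintOf("", "aa:bb:cc:dd:ee:ff");

// Neither input readable: every such host hashes the same single NUL byte.
const emptyA = fingerprintOf("", "");
const emptyB = fingerprintOf("", "");
```

One consequence of the last case: when both `host_id` and the MAC are unavailable, two different machines compute the same fingerprint, so a copied home directory would report `match` rather than `mismatch`.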
196
apps/cli/src/daemon/inbound.ts
Normal file
@@ -0,0 +1,196 @@
// Decode incoming broker pushes and dedupe-insert them into the daemon
// inbox. Publishes a `message` event to the daemon's event bus on every
// new row (idempotent receives suppress the event).

import { randomUUID } from "node:crypto";

import type { SqliteDb } from "./db/sqlite.js";
import { insertIfNew } from "./db/inbox.js";
import type { EventBus } from "./events.js";
import { decryptDirect } from "~/services/crypto/facade.js";

export interface InboundContext {
  db: SqliteDb;
  bus: EventBus;
  meshSlug: string;
  /** Daemon's mesh secret key hex, used to decrypt sealed DMs. */
  recipientSecretKeyHex?: string;
  /** Daemon's session secret key hex (rotates per connect). When the
   *  sender encrypted to our session pubkey, decrypt with this instead. */
  sessionSecretKeyHex?: string;
  /** 1.34.10: recipient pubkey of the WS that received this push.
   *  Either the daemon's member pubkey (member-WS) or one of our
   *  session pubkeys (session-WS). Threaded through to the bus event
   *  so each MCP subscriber can filter to events meant for its own
   *  session — without it, every MCP on the same daemon renders every
   *  inbox row, which manifests as session A seeing its own outbound
   *  to B (because A's MCP also picks up the bus event B's WS just
   *  published). */
  recipientPubkey?: string;
  /** 1.34.10: kind of WS this push arrived on. "session" pushes only
   *  surface to the matching session's MCP; "member" pushes surface to
   *  every session on the same mesh (member-keyed broadcasts, member
   *  DMs that don't have a session). */
  recipientKind?: "session" | "member";
  /** v2 agentic-comms (M1): emit `client_ack` back to the broker after
   *  the message lands in inbox.db. Broker uses the ack to set
   *  `delivered_at` (atomic at-least-once). Without it, the broker's
   *  30s lease expires and re-delivers — correct but noisy. The WS
   *  client owns this callback because it's the one that owns the
   *  socket; inbound.ts just signals "I accepted this id." */
  ackClientMessage?: (clientMessageId: string, brokerMessageId: string | null) => void;
  /** 1.34.9: drops system events (peer_joined / peer_left /
   *  peer_returned) whose eventData.pubkey is one of our own. The broker
   *  fans peer_joined to every OTHER connection in the mesh — but our
   *  daemon's member-WS counts as "other" relative to our session-WS,
   *  so without this filter the user sees `[system] Peer "<self>"
   *  joined the mesh` every time their own session reconnects.
   *  Implementation passes a closure that walks the live broker map
   *  rather than a static set, so newly-spawned sessions are visible
   *  immediately. */
  isOwnPubkey?: (pubkey: string) => boolean;
  log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
}

/**
 * Spec §4.5: dedupe by `client_message_id` (broker echoes it from the
 * sender's daemon). When the broker doesn't yet propagate the field
 * (Sprint 7 prereq), fall back to the broker's `messageId` as the
 * dedupe key — at-least-once still holds; we just lose the
 * sender-attested form.
 */
export async function handleBrokerPush(msg: Record<string, unknown>, ctx: InboundContext): Promise<void> {
  // System/topology pushes (peer_join, tick, …) — emit verbatim.
  if (msg.subtype === "system" && typeof msg.event === "string") {
    const eventData = (msg.eventData as Record<string, unknown> | undefined) ?? {};
    // 1.34.9: drop self-joins. The broker excludes the JOINING
    // connection from the fan-out, but our daemon owns multiple
    // connections per mesh (member-WS + N session-WSs), and each is a
    // distinct "other" from the broker's view — so a session's own
    // peer_joined arrives at the same daemon's member-WS and used to
    // surface as `[system] Peer "<self>" joined`. The session-WS path
    // already skips system events entirely (see session-broker.ts
    // 1.34.9), and this filter handles the member-WS path.
    const eventPubkey = typeof eventData.pubkey === "string" ? eventData.pubkey : "";
    if (eventPubkey && ctx.isOwnPubkey?.(eventPubkey)) return;
    ctx.bus.publish(mapSystemEventKind(msg.event), {
      mesh: ctx.meshSlug,
      event: msg.event,
      ...eventData,
    });
    return;
  }

  if (msg.type !== "push") return;

  const brokerMessageId = stringOrNull(msg.messageId);
  const senderPubkey = stringOrNull(msg.senderPubkey) ?? "";
  const senderName = stringOrNull(msg.senderName) ?? senderPubkey.slice(0, 8);
  const senderMemberPk = stringOrNull(msg.senderMemberPubkey);
  const topic = stringOrNull(msg.topic);
  const replyToId = stringOrNull(msg.replyToId);
  const ciphertext = stringOrNull(msg.ciphertext) ?? "";
  const nonce = stringOrNull(msg.nonce) ?? "";
  const createdAt = stringOrNull(msg.createdAt);
  const priority = stringOrNull(msg.priority) ?? "next";
  const subtype = stringOrNull(msg.subtype);
  // Forward-compat: Sprint 7 brokers will send client_message_id alongside.
  const clientMessageId = stringOrNull(msg.client_message_id) ?? brokerMessageId ?? randomUUID();
  const body = await decryptOrFallback({
    ciphertext, nonce, senderPubkey, ctx,
  });

  const id = randomUUID();
  const inserted = insertIfNew(ctx.db, {
    id,
    client_message_id: clientMessageId,
    broker_message_id: brokerMessageId,
    mesh: ctx.meshSlug,
    topic,
    sender_pubkey: senderPubkey,
    sender_name: senderName,
    body,
    meta: createdAt ? JSON.stringify({ created_at: createdAt }) : null,
    received_at: Date.now(),
    reply_to_id: replyToId,
    // 1.34.11: persist the recipient context so /v1/inbox can scope
    // queries to the asking session. Mirrors the same fields on the
    // bus event added in 1.34.10. Falls back to NULL when the caller
    // didn't pass them (legacy paths, tests).
    recipient_pubkey: ctx.recipientPubkey ?? null,
    recipient_kind: ctx.recipientKind ?? null,
  });

  // Whether the row was newly inserted or already existed (dedupe), the
  // broker still wants to know we received and processed this message —
  // ack regardless. Skipping ack on dedupe would leak: broker would
  // re-deliver after lease, and the receiver would re-dedupe forever.
  ctx.ackClientMessage?.(clientMessageId, brokerMessageId);

  if (!inserted) return; // already had this id; no event

  ctx.bus.publish("message", {
    id,
    mesh: ctx.meshSlug,
    client_message_id: clientMessageId,
    broker_message_id: brokerMessageId,
    sender_pubkey: senderPubkey,
    sender_member_pubkey: senderMemberPk,
    sender_name: senderName,
    topic,
    reply_to_id: replyToId,
    priority,
||||||
|
...(subtype ? { subtype } : {}),
|
||||||
|
body,
|
||||||
|
created_at: createdAt,
|
||||||
|
// 1.34.10: per-recipient routing context. SSE subscribers (the
|
||||||
|
// MCP servers that translate bus events into channel notifications)
|
||||||
|
// use this to filter to events meant for their own session. Without
|
||||||
|
// it, every MCP on the same daemon emits a channel push for every
|
||||||
|
// inbox row, which means session A sees its own outbound to B
|
||||||
|
// because B's session-WS published the inbox row to the shared bus.
|
||||||
|
...(ctx.recipientPubkey ? { recipient_pubkey: ctx.recipientPubkey } : {}),
|
||||||
|
...(ctx.recipientKind ? { recipient_kind: ctx.recipientKind } : {}),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
async function decryptOrFallback(args: {
  ciphertext: string;
  nonce: string;
  senderPubkey: string;
  ctx: InboundContext;
}): Promise<string | null> {
  const { ciphertext, nonce, senderPubkey, ctx } = args;
  if (!ciphertext) return null;

  // Try DM decrypt first (sender used crypto_box against our session/member key).
  if (nonce && senderPubkey) {
    const envelope = { nonce, ciphertext };
    // Try session key (sender encrypted to our session pubkey, the common case).
    if (ctx.sessionSecretKeyHex) {
      const pt = await decryptDirect(envelope, senderPubkey, ctx.sessionSecretKeyHex);
      if (pt !== null) return pt;
    }
    // Fall back to member key (sender encrypted to our stable mesh pubkey).
    if (ctx.recipientSecretKeyHex) {
      const pt = await decryptDirect(envelope, senderPubkey, ctx.recipientSecretKeyHex);
      if (pt !== null) return pt;
    }
  }

  // Fallback: broadcast/topic posts are base64 plaintext (existing CLI
  // pre-encryption convention for `*` and `@topic`). Sprint 7+ adds
  // per-topic symmetric keys.
  try { return Buffer.from(ciphertext, "base64").toString("utf8"); }
  catch (e) { ctx.log?.("warn", "inbound_b64_decode_failed", { err: String(e) }); return null; }
}

function stringOrNull(v: unknown): string | null {
  return typeof v === "string" && v.length > 0 ? v : null;
}

function mapSystemEventKind(event: string): "peer_join" | "peer_leave" | "system" {
  if (event === "peer_joined") return "peer_join";
  if (event === "peer_left") return "peer_leave";
  return "system";
}
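The dedupe-then-ack ordering in the inbound handler above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the daemon's code: `Inbox`, `receive`, `acked`, and `published` are hypothetical names, and an in-memory `Map` stands in for the SQLite-backed `insertIfNew`.

```typescript
// Hypothetical stand-ins for illustration only; none of these names exist in the daemon.
type Push = { clientMessageId: string; body: string };

class Inbox {
  private rows = new Map<string, string>();
  readonly acked: string[] = [];
  readonly published: string[] = [];

  /** Returns true only when the row was newly inserted. */
  private insertIfNew(p: Push): boolean {
    if (this.rows.has(p.clientMessageId)) return false;
    this.rows.set(p.clientMessageId, p.body);
    return true;
  }

  receive(p: Push): void {
    const inserted = this.insertIfNew(p);
    // Ack whether or not the row was a duplicate, so the sender stops re-delivering.
    this.acked.push(p.clientMessageId);
    // Publish an event only for rows that are new to this inbox.
    if (inserted) this.published.push(p.clientMessageId);
  }
}

const inbox = new Inbox();
inbox.receive({ clientMessageId: "m1", body: "hello" });
inbox.receive({ clientMessageId: "m1", body: "hello" }); // simulated redelivery
```

After the simulated redelivery both pushes have been acked, but only the first produced an event.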
73
apps/cli/src/daemon/inbox-pruner.ts
Normal file
@@ -0,0 +1,73 @@
// 1.34.8: TTL prune for inbox.db.
//
// The inbox grows monotonically — every received DM lands as a row and
// nothing removes it except an explicit `claudemesh inbox flush`. For
// chatty meshes that's tens of thousands of rows over a few weeks.
// SQLite handles that volume fine, but the rows are sitting there
// forever and `claudemesh inbox` queries get slower as the table grows.
//
// The pruner runs hourly inside the daemon process and deletes rows
// whose received_at is older than `retentionMs`. Default is 30 days,
// which is generous for the "I went on holiday and want to see what I
// missed" case but won't carry old rows into next year.
//
// Best-effort: a failure logs a warning and the pruner keeps trying on
// the next interval. There's no shared state to corrupt — pruneInboxBefore
// is a single DELETE statement.

import { pruneInboxBefore } from "./db/inbox.js";
import type { SqliteDb } from "./db/sqlite.js";

export interface InboxPrunerOptions {
  db: SqliteDb;
  /** Retention window in ms. Rows with received_at < (now - retentionMs)
   * are deleted. Default: 30 days. */
  retentionMs?: number;
  /** How often to run the prune. Default: 1 hour. */
  intervalMs?: number;
  log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
}

export interface InboxPrunerHandle {
  stop: () => void;
}

const DEFAULT_RETENTION_MS = 30 * 24 * 60 * 60 * 1000;
const DEFAULT_INTERVAL_MS = 60 * 60 * 1000;

export function startInboxPruner(opts: InboxPrunerOptions): InboxPrunerHandle {
  const retentionMs = opts.retentionMs ?? DEFAULT_RETENTION_MS;
  const intervalMs = opts.intervalMs ?? DEFAULT_INTERVAL_MS;
  const log = opts.log ?? defaultLog;

  const tick = (): void => {
    try {
      const cutoff = Date.now() - retentionMs;
      const removed = pruneInboxBefore(opts.db, cutoff);
      if (removed > 0) {
        log("info", "inbox_prune_completed", {
          removed,
          retention_days: Math.round(retentionMs / (24 * 60 * 60 * 1000)),
        });
      }
    } catch (e) {
      log("warn", "inbox_prune_failed", { err: String(e) });
    }
  };

  // Run once at startup so a daemon that's been down for weeks reaps
  // immediately rather than waiting an hour.
  tick();

  const handle = setInterval(tick, intervalMs);
  // Don't let the pruner block daemon shutdown.
  if (typeof handle.unref === "function") handle.unref();

  return { stop: () => clearInterval(handle) };
}

function defaultLog(level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) {
  const line = JSON.stringify({ level, msg, ...meta, ts: new Date().toISOString() });
  if (level === "info") process.stdout.write(line + "\n");
  else process.stderr.write(line + "\n");
}
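The cutoff arithmetic used by the pruner above can be checked with a small in-memory sketch. `pruneBefore`, `Row`, and the fixed `now` value are hypothetical stand-ins; the real `pruneInboxBefore` is described in the source only as a single DELETE statement, so its exact boundary behaviour is assumed here to match the `received_at < (now - retentionMs)` wording in the options doc.

```typescript
// Hypothetical in-memory stand-in for the DELETE performed by pruneInboxBefore.
type Row = { id: string; received_at: number };

const DAY_MS = 24 * 60 * 60 * 1000;

function pruneBefore(rows: Row[], cutoff: number): { kept: Row[]; removed: number } {
  // Assumed boundary: rows strictly older than the cutoff are removed.
  const kept = rows.filter((r) => r.received_at >= cutoff);
  return { kept, removed: rows.length - kept.length };
}

const now = 1_700_000_000_000; // fixed timestamp so the example is deterministic
const retentionMs = 30 * DAY_MS;
const rows: Row[] = [
  { id: "old", received_at: now - 31 * DAY_MS },
  { id: "edge", received_at: now - 30 * DAY_MS },
  { id: "new", received_at: now - 1 * DAY_MS },
];
const pruned = pruneBefore(rows, now - retentionMs);
```

With a 30-day window only the 31-day-old row falls before the cutoff; a row exactly at the cutoff is kept under the assumed strict comparison.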
82
apps/cli/src/daemon/ipc/client.ts
Normal file
@@ -0,0 +1,82 @@
import { request as httpRequest } from "node:http";

import { DAEMON_PATHS, DAEMON_TCP_HOST, DAEMON_TCP_DEFAULT_PORT } from "../paths.js";
import { readLocalToken } from "../local-token.js";
import { readSessionTokenFromEnv } from "~/services/session/token.js";

export interface IpcRequestOptions {
  method?: "GET" | "POST" | "PATCH" | "DELETE";
  path: string;
  body?: unknown;
  /** Force TCP loopback instead of UDS (for tests / cross-container scenarios). */
  preferTcp?: boolean;
  timeoutMs?: number;
}

export interface IpcResponse<T = unknown> {
  status: number;
  body: T;
}

export class IpcError extends Error {
  constructor(public status: number, public payload: unknown, msg: string) {
    super(msg);
  }
}

/** Small, dependency-free IPC client for talking to the local daemon. */
export async function ipc<T = unknown>(opts: IpcRequestOptions): Promise<IpcResponse<T>> {
  const useTcp = !!opts.preferTcp;
  const headers: Record<string, string> = {
    accept: "application/json",
    host: "localhost",
  };

  let bodyBuf: Buffer | undefined;
  if (opts.body !== undefined) {
    bodyBuf = Buffer.from(JSON.stringify(opts.body), "utf8");
    headers["content-type"] = "application/json";
    headers["content-length"] = String(bodyBuf.length);
  }

  if (useTcp) {
    const tok = readLocalToken();
    if (!tok) throw new IpcError(0, null, "daemon local token not found; is the daemon running?");
    headers.authorization = `Bearer ${tok}`;
  }

  // Per-session token attribution. When the calling process has
  // CLAUDEMESH_IPC_TOKEN_FILE set (a launched session and its
  // descendants), attach the session token. The daemon's auth
  // middleware resolves it to a SessionInfo and uses it for
  // default-mesh scoping. HTTP semantics don't allow a second
  // Authorization header, so the two schemes are layered: UDS callers
  // send the session token when one exists, and the bearer token is
  // used only by TCP loopback callers.
  if (!useTcp) {
    const sessionTok = readSessionTokenFromEnv();
    if (sessionTok) headers.authorization = `ClaudeMesh-Session ${sessionTok}`;
  }

  return new Promise<IpcResponse<T>>((resolve, reject) => {
    const req = httpRequest(
      useTcp
        ? { host: DAEMON_TCP_HOST, port: DAEMON_TCP_DEFAULT_PORT, path: opts.path, method: opts.method ?? "GET", headers }
        : { socketPath: DAEMON_PATHS.SOCK_FILE, path: opts.path, method: opts.method ?? "GET", headers },
      (res) => {
        const chunks: Buffer[] = [];
        res.on("data", (c) => chunks.push(c));
        res.on("end", () => {
          const raw = Buffer.concat(chunks).toString("utf8");
          let parsed: unknown = raw;
          try { parsed = raw.length > 0 ? JSON.parse(raw) : null; } catch { /* leave raw */ }
          resolve({ status: res.statusCode ?? 0, body: parsed as T });
        });
      },
    );
    req.setTimeout(opts.timeoutMs ?? 5_000, () => req.destroy(new Error("ipc_timeout")));
    req.on("error", (err) => reject(err));
    if (bodyBuf) req.write(bodyBuf);
    req.end();
  });
}
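The layering of the two Authorization schemes in the client above reduces to a small decision function. The sketch below is illustrative only: `pickAuthorization` and `AuthInputs` are hypothetical names, and the token values are passed in rather than read from disk or the environment as the real client does.

```typescript
// Hypothetical pure-function restatement of the header choice; not part of the client.
type AuthInputs = {
  useTcp: boolean;
  localToken: string | null;
  sessionToken: string | null;
};

function pickAuthorization(i: AuthInputs): string | undefined {
  if (i.useTcp) {
    // TCP loopback callers must present the local bearer token.
    if (!i.localToken) throw new Error("daemon local token not found");
    return `Bearer ${i.localToken}`;
  }
  // UDS callers attach the session token when one exists, otherwise no header.
  return i.sessionToken ? `ClaudeMesh-Session ${i.sessionToken}` : undefined;
}

const tcpHeader = pickAuthorization({ useTcp: true, localToken: "abc", sessionToken: "s1" });
const udsHeader = pickAuthorization({ useTcp: false, localToken: "abc", sessionToken: "s1" });
const udsBare = pickAuthorization({ useTcp: false, localToken: "abc", sessionToken: null });
```

A TCP caller never sends the session scheme, even when a session token is available, which matches the "bearer only for TCP loopback" comment in the source.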
170
apps/cli/src/daemon/ipc/handlers/send.ts
Normal file
@@ -0,0 +1,170 @@
// IPC accept handler for POST /v1/send. Implements the §4.5.1 lookup table:
// daemon-local idempotency over outbox states × fingerprint match/mismatch.
//
// Broker delivery (drain → broker WS) is a separate concern and not part of
// this handler — this only serializes the daemon-local accept.

import { randomUUID } from "node:crypto";

import {
  findByClientId,
  fingerprintsEqual,
  insertPending,
  type OutboxRow,
} from "../../db/outbox.js";
import { inImmediateTx, type SqliteDb } from "../../db/sqlite.js";
import {
  computeRequestFingerprint,
  fingerprintHexPrefix,
  type DestKind,
  type Priority,
} from "../../fingerprint.js";

export interface SendRequest {
  to: string; // peer name | pubkey hex | @group | * | topic name
  message: string;
  priority?: Priority;
  meta?: Record<string, unknown>;
  reply_to_id?: string;
  /** Optional caller-supplied id. Wins over Idempotency-Key header. */
  client_message_id?: string;
  /** Destination kind + ref must be supplied by the IPC layer after parsing `to`. */
  destination_kind: DestKind;
  destination_ref: string;
  /** Sprint 4: pre-resolved broker-format target (pubkey hex, "#topicId", @group, *). */
  target_spec?: string;
  /** Sprint 4: pre-encrypted ciphertext (base64). For DMs: crypto_box. For broadcast/topic: base64-of-plaintext. */
  ciphertext?: string;
  /** Sprint 4: nonce that pairs with ciphertext (base64). */
  nonce?: string;
  /** Sprint 4: which mesh this send is for (single-mesh daemon today; multi-mesh later). */
  mesh?: string;
  /** 1.34.0: when the IPC request authenticated as a launched session,
   * the IPC layer fills this with the session's hex pubkey. The drain
   * worker uses it to route via the matching SessionBrokerClient so
   * broker fan-out attributes the push to the session pubkey instead
   * of the daemon's member pubkey. */
  sender_session_pubkey?: string;
}

export type AcceptOutcome =
  | { kind: "accepted_pending"; status: 202; client_message_id: string }
  | { kind: "accepted_inflight"; status: 202; client_message_id: string }
  | { kind: "accepted_done"; status: 200; client_message_id: string; broker_message_id: string | null }
  | { kind: "conflict"; status: 409; reason: string; daemon_fingerprint_prefix: string; broker_message_id?: string | null };

export interface AcceptDeps {
  db: SqliteDb;
  /** Override for testing. */
  now?: () => number;
  /** Override for testing. */
  newId?: () => string;
}

export const ENVELOPE_VERSION = 1;

/**
 * Daemon-local idempotency: serialized via BEGIN IMMEDIATE so concurrent
 * IPC requests with the same client_message_id produce one outcome.
 */
export function acceptSend(req: SendRequest, deps: AcceptDeps): AcceptOutcome {
  const now = (deps.now ?? Date.now)();
  const newId = deps.newId ?? randomUUID;

  // Per spec, caller-supplied client_message_id wins; otherwise daemon mints one.
  const clientId = req.client_message_id?.trim() || ulidLike(newId);

  const body = Buffer.from(req.message, "utf8");
  const fingerprint = computeRequestFingerprint({
    envelope_version: ENVELOPE_VERSION,
    destination_kind: req.destination_kind,
    destination_ref: req.destination_ref,
    reply_to_id: req.reply_to_id ?? null,
    priority: req.priority ?? "next",
    meta: req.meta ?? null,
    body,
  });

  return inImmediateTx(deps.db, () => {
    const existing = findByClientId(deps.db, clientId);
    if (!existing) {
      insertPending(deps.db, {
        id: newId(),
        client_message_id: clientId,
        request_fingerprint: fingerprint,
        payload: body,
        now,
        mesh: req.mesh,
        target_spec: req.target_spec,
        nonce: req.nonce,
        ciphertext: req.ciphertext,
        priority: req.priority,
        sender_session_pubkey: req.sender_session_pubkey,
      });
      return { kind: "accepted_pending", status: 202, client_message_id: clientId };
    }

    return decideForExistingRow(existing, fingerprint);
  });
}

function decideForExistingRow(row: OutboxRow, fp: Buffer): AcceptOutcome {
  const match = fingerprintsEqual(fp, row.request_fingerprint);
  const fpPrefix = fingerprintHexPrefix(fp);

  // Spec §4.5.1 lookup table.
  switch (row.status) {
    case "pending":
      return match
        ? { kind: "accepted_pending", status: 202, client_message_id: row.client_message_id }
        : conflict("outbox_pending_fingerprint_mismatch", fpPrefix);

    case "inflight":
      return match
        ? { kind: "accepted_inflight", status: 202, client_message_id: row.client_message_id }
        : conflict("outbox_inflight_fingerprint_mismatch", fpPrefix);

    case "done":
      return match
        ? {
            kind: "accepted_done",
            status: 200,
            client_message_id: row.client_message_id,
            broker_message_id: row.broker_message_id,
          }
        : conflict("outbox_done_fingerprint_mismatch", fpPrefix, row.broker_message_id);

    case "dead":
      return match
        ? conflict("outbox_dead_fingerprint_match", fpPrefix, row.broker_message_id)
        : conflict("outbox_dead_fingerprint_mismatch", fpPrefix);

    case "aborted":
      return match
        ? conflict("outbox_aborted_fingerprint_match", fpPrefix)
        : conflict("outbox_aborted_fingerprint_mismatch", fpPrefix);

    default: {
      // Exhaustiveness check.
      const _: never = row.status;
      throw new Error(`unknown outbox status: ${String(_)}`);
    }
  }
}

function conflict(reason: string, fpPrefix: string, brokerMessageId: string | null = null): AcceptOutcome {
  return {
    kind: "conflict",
    status: 409,
    reason,
    daemon_fingerprint_prefix: fpPrefix,
    broker_message_id: brokerMessageId,
  };
}

/**
 * Fallback id generator. Despite the name, this does not build a ULID:
 * it returns whatever `newId` yields (a UUID by default).
 */
function ulidLike(newId: () => string): string {
  // We don't ship a full ULID lib for one fallback path; uuid is fine here.
  // The wire-stable id is whatever we return; downstream just uses it as text.
  return newId();
}
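The status × fingerprint lookup implemented by `decideForExistingRow` above can be condensed into a table-like function for quick reference. This is a simplified sketch: `decide`, `Status`, and `Outcome` are hypothetical names, and the per-state `reason` strings and `broker_message_id` fields of the real outcome type are omitted.

```typescript
// Hypothetical condensed view of the lookup; reason strings are intentionally dropped.
type Status = "pending" | "inflight" | "done" | "dead" | "aborted";
type Outcome = { status: 200 | 202 | 409; kind: string };

function decide(status: Status, fingerprintMatches: boolean): Outcome {
  // Any fingerprint mismatch is a conflict, whatever the row's state.
  if (!fingerprintMatches) return { status: 409, kind: "conflict" };
  switch (status) {
    case "pending":
      return { status: 202, kind: "accepted_pending" };
    case "inflight":
      return { status: 202, kind: "accepted_inflight" };
    case "done":
      return { status: 200, kind: "accepted_done" };
    case "dead":
    case "aborted":
      // Terminal failure states conflict even when the fingerprint matches.
      return { status: 409, kind: "conflict" };
  }
}

const retryWhilePending = decide("pending", true);
const retryAfterDone = decide("done", true);
const reuseWithNewBody = decide("done", false);
const retryAfterDead = decide("dead", true);
```

Only the three live states accept a matching retry; `dead` and `aborted` rows reject reuse of the same `client_message_id` in every case.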
1034
apps/cli/src/daemon/ipc/server.ts
Normal file
File diff suppressed because it is too large
26
apps/cli/src/daemon/local-token.ts
Normal file
@@ -0,0 +1,26 @@
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";
import { randomBytes } from "node:crypto";

import { DAEMON_PATHS } from "./paths.js";

/**
 * Local IPC bearer token. Mode 0600. Rotated by deleting the file and
 * restarting the daemon.
 */
export function readLocalToken(): string | null {
  try {
    return readFileSync(DAEMON_PATHS.TOKEN_FILE, "utf8").trim();
  } catch {
    return null;
  }
}

export function ensureLocalToken(): string {
  const existing = readLocalToken();
  if (existing) return existing;
  mkdirSync(dirname(DAEMON_PATHS.TOKEN_FILE), { recursive: true, mode: 0o700 });
  const tok = randomBytes(32).toString("base64url");
  writeFileSync(DAEMON_PATHS.TOKEN_FILE, tok + "\n", { mode: 0o600 });
  return tok;
}
59
apps/cli/src/daemon/lock.ts
Normal file
@@ -0,0 +1,59 @@
import { existsSync, mkdirSync, readFileSync, unlinkSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

import { DAEMON_PATHS } from "./paths.js";

/**
 * Single-instance lock via PID file. Returns:
 *   - 'acquired'        — we hold the lock now, file written.
 *   - 'already-running' — another live process owns it.
 *   - 'stale'           — file existed but the recorded PID is dead;
 *                         caller should treat as acquired (we overwrote it).
 */
export type LockResult = "acquired" | "already-running" | "stale";

export function acquireSingletonLock(): { result: LockResult; pid: number } {
  mkdirSync(dirname(DAEMON_PATHS.PID_FILE), { recursive: true, mode: 0o700 });

  if (existsSync(DAEMON_PATHS.PID_FILE)) {
    const raw = readFileSync(DAEMON_PATHS.PID_FILE, "utf8").trim();
    const oldPid = Number.parseInt(raw, 10);
    if (Number.isFinite(oldPid) && oldPid > 0 && isProcessAlive(oldPid)) {
      return { result: "already-running", pid: oldPid };
    }
    // stale → unlink and re-acquire
    try { unlinkSync(DAEMON_PATHS.PID_FILE); } catch { /* race with another acquirer; tolerate */ }
    writeFileSync(DAEMON_PATHS.PID_FILE, String(process.pid), { mode: 0o600 });
    return { result: "stale", pid: process.pid };
  }

  writeFileSync(DAEMON_PATHS.PID_FILE, String(process.pid), { mode: 0o600 });
  return { result: "acquired", pid: process.pid };
}

export function releaseSingletonLock(): void {
  try {
    const raw = readFileSync(DAEMON_PATHS.PID_FILE, "utf8").trim();
    if (Number.parseInt(raw, 10) === process.pid) unlinkSync(DAEMON_PATHS.PID_FILE);
  } catch { /* file already gone, fine */ }
}

export function readRunningPid(): number | null {
  try {
    const raw = readFileSync(DAEMON_PATHS.PID_FILE, "utf8").trim();
    const pid = Number.parseInt(raw, 10);
    if (Number.isFinite(pid) && pid > 0 && isProcessAlive(pid)) return pid;
  } catch { /* no pid file */ }
  return null;
}

function isProcessAlive(pid: number): boolean {
  try {
    // signal 0: no-op; throws if process doesn't exist or we lack permission.
    // EPERM means it does exist (just not ours), so treat as alive.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}
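The three lock outcomes above depend only on the PID file contents and a liveness check, so the decision can be sketched without touching the filesystem. `decideLock` is a hypothetical helper, and the liveness probe is injected as a plain predicate in place of the `process.kill(pid, 0)` call used by the real code.

```typescript
// Hypothetical pure restatement of the lock decision; file I/O is left out.
type LockResult = "acquired" | "already-running" | "stale";

function decideLock(
  pidFileContents: string | null, // null means the PID file does not exist
  isAlive: (pid: number) => boolean,
): LockResult {
  if (pidFileContents === null) return "acquired";
  const oldPid = Number.parseInt(pidFileContents.trim(), 10);
  if (Number.isFinite(oldPid) && oldPid > 0 && isAlive(oldPid)) return "already-running";
  // Unparseable contents or a dead pid: the file is stale and gets overwritten.
  return "stale";
}

const noFile = decideLock(null, () => true);
const liveOwner = decideLock("4242\n", (pid) => pid === 4242);
const deadOwner = decideLock("4242\n", () => false);
const garbage = decideLock("not-a-pid", () => true);
```

Garbage in the PID file is treated the same way as a dead owner, because `Number.parseInt` yields `NaN` and the finiteness guard fails before the liveness probe runs.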
37
apps/cli/src/daemon/paths.ts
Normal file
@@ -0,0 +1,37 @@
import { homedir } from "node:os";
import { join } from "node:path";

/**
 * Daemon paths intentionally do NOT honor `CLAUDEMESH_CONFIG_DIR`.
 *
 * `claudemesh launch` sets `CLAUDEMESH_CONFIG_DIR` to a per-session
 * tmpdir so that joined-mesh state, last-used selections, and the
 * IPC session token stay isolated from the host's shared config.
 * The daemon, however, is a single per-machine process serving every
 * launched session — its socket, pid file, on-disk outbox, and SQLite
 * stores all live under `~/.claudemesh/daemon/`. Letting them inherit
 * the per-session tmpdir would point each CLI invocation inside a
 * launched session at a daemon socket that doesn't exist, force the
 * cold path, and surface as "service-managed daemon not responding
 * within 8000ms" (1.31.0 regression observed in real install).
 *
 * `CLAUDEMESH_DAEMON_DIR` exists as an explicit override for tests
 * and for the rare case of running multiple daemon instances side by
 * side (e.g. integration tests). Production callers should never set
 * it.
 */
const DAEMON_DIR_ROOT =
  process.env.CLAUDEMESH_DAEMON_DIR || join(homedir(), ".claudemesh", "daemon");

export const DAEMON_PATHS = {
  get DAEMON_DIR() { return DAEMON_DIR_ROOT; },
  get PID_FILE() { return join(this.DAEMON_DIR, "daemon.pid"); },
  get SOCK_FILE() { return join(this.DAEMON_DIR, "daemon.sock"); },
  get TOKEN_FILE() { return join(this.DAEMON_DIR, "local-token"); },
  get OUTBOX_DB() { return join(this.DAEMON_DIR, "outbox.db"); },
  get INBOX_DB() { return join(this.DAEMON_DIR, "inbox.db"); },
  get LOG_FILE() { return join(this.DAEMON_DIR, "daemon.log"); },
} as const;

export const DAEMON_TCP_HOST = "127.0.0.1";
export const DAEMON_TCP_DEFAULT_PORT = 47823;
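The override rule described in the paths module above (honor `CLAUDEMESH_DAEMON_DIR`, ignore `CLAUDEMESH_CONFIG_DIR`) can be illustrated with a small pure function. `resolveDaemonDir` is a hypothetical name, the environment and home directory are passed in as arguments, and POSIX-style string joining stands in for `node:path`'s `join`.

```typescript
// Hypothetical helper; the real module reads process.env and homedir() directly.
function resolveDaemonDir(env: Record<string, string | undefined>, home: string): string {
  // Only the explicit daemon override is consulted; the per-session config dir is not.
  return env.CLAUDEMESH_DAEMON_DIR || `${home}/.claudemesh/daemon`;
}

const insideLaunchedSession = resolveDaemonDir(
  { CLAUDEMESH_CONFIG_DIR: "/tmp/session-abc" },
  "/home/alice",
);
const underTest = resolveDaemonDir(
  { CLAUDEMESH_DAEMON_DIR: "/tmp/daemon-test" },
  "/home/alice",
);
```

Inside a launched session the daemon directory still resolves under the home directory, so every session reaches the same socket.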
98
apps/cli/src/daemon/process-info.ts
Normal file
@@ -0,0 +1,98 @@
/**
 * Process-info helpers used by the session reaper to detect dead-pid AND
 * pid-reuse safely.
 *
 * `process.kill(pid, 0)` alone is insufficient: a recently-recycled pid
 * passes the liveness check even though the process registered under it
 * is long gone. To avoid mistakenly trusting a recycled pid, we capture
 * a stable per-process start-time at register, and compare it on each
 * sweep — if it changed, treat the original process as dead.
 *
 * macOS + Linux both expose `ps -o lstart=` returning a fixed-format
 * timestamp ("Sun May 4 09:14:00 2026"). Equality is the only
 * operation the reaper needs, so we keep the value as an opaque string.
 *
 * IMPORTANT (1.31.1): every synchronous ps invocation (execFileSync)
 * blocks the daemon's event loop until ps completes (~30-80 ms per call
 * on macOS). The first 1.31.0 implementation called execFileSync once
 * per registered session every 5 s, and with 10+ sessions that stalled
 * IPC for hundreds of milliseconds at a time — long enough that probes
 * against /v1/version were declared "stale" and the CLI fell back to
 * the cold path with the misleading "service-managed daemon not
 * responding" warning. This module now exposes:
 *
 *   - `getProcessStartTime(pid)`: async, single-pid, used at register.
 *   - `getProcessStartTimes(pids)`: async, batched, used by the reaper.
 *     One ps invocation handles N pids, so the per-sweep cost is fixed
 *     and tiny regardless of how many sessions are registered.
 */

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

/**
 * Returns a stable process-start identifier for `pid`, or null if the
 * process is dead or unreachable. Async — never blocks the event loop.
 */
export async function getProcessStartTime(pid: number): Promise<string | null> {
  if (!Number.isFinite(pid) || pid <= 0) return null;
  try {
    const { stdout } = await execFileAsync("ps", ["-o", "lstart=", "-p", String(pid)], {
      encoding: "utf8",
      timeout: 1_000,
    });
    const out = stdout.trim();
    return out.length > 0 ? out : null;
  } catch {
    return null;
  }
}

/**
 * Batched form: returns a Map<pid, lstart> for every pid that is still
 * alive. Pids that ps doesn't return (i.e. dead) are absent from the
 * map. One ps fork handles all pids — O(1) sweep cost regardless of
 * session count.
 */
export async function getProcessStartTimes(pids: number[]): Promise<Map<number, string>> {
  const result = new Map<number, string>();
  const valid = pids.filter((p) => Number.isFinite(p) && p > 0);
  if (valid.length === 0) return result;
  // ps -o pid=,lstart= -p p1,p2,... emits one row per live pid:
  //   " 12345 Sun May 4 09:14:00 2026"
  // Dead pids are silently omitted.
  try {
    const { stdout } = await execFileAsync(
      "ps",
      ["-o", "pid=,lstart=", "-p", valid.join(",")],
      { encoding: "utf8", timeout: 2_000 },
    );
    for (const raw of stdout.split("\n")) {
      const line = raw.trim();
      if (!line) continue;
      const m = /^(\d+)\s+(.+)$/.exec(line);
      if (!m) continue;
      const pid = Number.parseInt(m[1]!, 10);
      const lstart = m[2]!.trim();
      if (Number.isFinite(pid) && lstart.length > 0) result.set(pid, lstart);
    }
  } catch {
    // ps failure (timeout, ENOENT) — treat as "no info available" and
    // let the reaper fall back to bare liveness for these pids. Better
    // to keep entries than to nuke them on a transient ps error.
  }
  return result;
}

/** Liveness-only probe (signal 0). Use together with start-time guard. */
export function isPidAlive(pid: number): boolean {
  if (!Number.isFinite(pid) || pid <= 0) return false;
  try {
    process.kill(pid, 0);
    return true;
  } catch {
    return false;
  }
}
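The row parsing and the pid-reuse comparison described above can be exercised without forking `ps`. In this sketch `parsePsOutput` and `isSameProcess` are hypothetical helpers, and the sample output is hand-written in the `pid=,lstart=` layout rather than captured from a real system.

```typescript
// Hypothetical helpers; the sample text below is hand-written, not real ps output.
function parsePsOutput(stdout: string): Map<number, string> {
  const result = new Map<number, string>();
  for (const raw of stdout.split("\n")) {
    const m = /^(\d+)\s+(.+)$/.exec(raw.trim());
    if (!m) continue;
    const [, pidText, lstart] = m;
    if (pidText === undefined || lstart === undefined) continue;
    result.set(Number.parseInt(pidText, 10), lstart.trim());
  }
  return result;
}

/** A pid counts as the same process only if its start time is unchanged. */
function isSameProcess(live: Map<number, string>, pid: number, registeredStart: string): boolean {
  return live.get(pid) === registeredStart;
}

const sample = "  12345 Sun May  4 09:14:00 2026\n    777 Mon May  5 10:00:00 2026\n";
const live = parsePsOutput(sample);

const unchanged = isSameProcess(live, 12345, "Sun May  4 09:14:00 2026");
const recycled = isSameProcess(live, 777, "Sat May  3 08:00:00 2026");
const gone = isSameProcess(live, 999, "Sun May  4 09:14:00 2026");
```

A recycled pid shows up in the map with a different start string, so plain string equality is enough to tell it apart from the originally registered process.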
356
apps/cli/src/daemon/run.ts
Normal file
@@ -0,0 +1,356 @@
|
|||||||
|
import { existsSync, mkdirSync, readFileSync } from "node:fs";
|
||||||
|
|
||||||
|
import { DAEMON_PATHS } from "./paths.js";
|
||||||
|
import { acquireSingletonLock, releaseSingletonLock } from "./lock.js";
|
||||||
|
import { ensureLocalToken } from "./local-token.js";
|
||||||
|
import { startIpcServer } from "./ipc/server.js";
|
||||||
|
import { setRegistryHooks, startReaper, type SessionInfo } from "./session-registry.js";
|
||||||
|
import { openSqlite, type SqliteDb } from "./db/sqlite.js";
|
||||||
|
import { migrateOutbox } from "./db/outbox.js";
|
||||||
|
import { migrateInbox } from "./db/inbox.js";
|
||||||
|
import { DaemonBrokerClient } from "./broker.js";
|
||||||
|
import { SessionBrokerClient } from "./session-broker.js";
|
||||||
|
import { startDrainWorker, type DrainHandle } from "./drain.js";
|
||||||
|
import { startInboxPruner, type InboxPrunerHandle } from "./inbox-pruner.js";
|
||||||
|
import { handleBrokerPush } from "./inbound.js";
|
||||||
|
import { EventBus } from "./events.js";
|
||||||
|
import { checkFingerprint, type ClonePolicy } from "./identity.js";
|
||||||
|
import { readConfig } from "~/services/config/facade.js";
|
||||||
|
import { VERSION } from "~/constants/urls.js";
|
||||||
|
|
||||||
|
export interface RunDaemonOptions {
|
||||||
|
/** Enable the TCP loopback listener. Defaults to false (UDS-only) in container envs, true otherwise. */
|
||||||
|
tcpEnabled?: boolean;
|
||||||
|
publicHealthCheck?: boolean;
|
||||||
|
/** Behavior on host_fingerprint mismatch. Defaults 'refuse'. */
|
||||||
|
clonePolicy?: ClonePolicy;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Detect a few common container environments to pick UDS-only by default. */
|
||||||
|
function detectContainer(): boolean {
|
||||||
|
if (process.env.KUBERNETES_SERVICE_HOST) return true;
|
||||||
|
if (process.env.CONTAINER === "1") return true;
|
||||||
|
try {
|
||||||
|
if (existsSync("/.dockerenv")) return true;
|
||||||
|
const cg = readFileSync("/proc/1/cgroup", "utf8");
|
||||||
|
if (/(docker|kubepods|containerd)/.test(cg)) return true;
|
||||||
|
} catch { /* not linux or no /proc */ }
|
||||||
|
return false;
|
||||||
|
}
|
||||||
|
|
||||||
|
export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {
|
||||||
|
mkdirSync(DAEMON_PATHS.DAEMON_DIR, { recursive: true, mode: 0o700 });
|
||||||
|
|
||||||
|
const lock = acquireSingletonLock();
|
||||||
|
if (lock.result === "already-running") {
|
||||||
|
process.stderr.write(`daemon already running (pid ${lock.pid})\n`);
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
if (lock.result === "stale") {
|
||||||
|
process.stderr.write(`recovered stale pid file; starting fresh\n`);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Accidental-clone detection (spec §2.2). Default policy: refuse.
|
||||||
|
const fpCheck = checkFingerprint();
|
||||||
|
const policy: ClonePolicy = opts.clonePolicy ?? "refuse";
|
||||||
|
if (fpCheck.result === "mismatch") {
|
||||||
|
const msg = `host_fingerprint mismatch: this daemon dir was started on a different host.`;
|
||||||
|
if (policy === "refuse") {
|
||||||
|
process.stderr.write(`${msg}\n`);
|
||||||
|
process.stderr.write(` stored host_id: ${fpCheck.stored?.host_id}\n`);
|
||||||
|
process.stderr.write(` current host_id: ${fpCheck.current.host_id}\n`);
|
||||||
|
process.stderr.write(`Run \`claudemesh daemon accept-host\` to write a fresh fingerprint, or\n`);
|
||||||
|
process.stderr.write(`run \`claudemesh daemon remint\` to mint a new keypair (Sprint 7+).\n`);
|
||||||
|
releaseSingletonLock();
|
||||||
|
return 4;
|
||||||
|
}
|
||||||
|
if (policy === "warn") {
|
||||||
|
process.stderr.write(`WARN: ${msg} (continuing per [clone] policy=warn)\n`);
|
||||||
|
}
|
||||||
|
// 'allow' is silent.
|
||||||
|
}
|
||||||
|
if (fpCheck.result === "first_run") {
|
||||||
|
process.stdout.write(JSON.stringify({
|
||||||
|
msg: "host_fingerprint_written", fingerprint_prefix: fpCheck.current.fingerprint.slice(0, 16), ts: new Date().toISOString(),
|
||||||
|
}) + "\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
const localToken = ensureLocalToken();
|
||||||
|
const tcpEnabled = opts.tcpEnabled ?? !detectContainer();
|
||||||
|
|
||||||
|
let outboxDb: SqliteDb;
|
||||||
|
let inboxDb: SqliteDb;
|
||||||
|
try {
|
||||||
|
outboxDb = await openSqlite(DAEMON_PATHS.OUTBOX_DB);
|
||||||
|
migrateOutbox(outboxDb);
|
||||||
|
inboxDb = await openSqlite(DAEMON_PATHS.INBOX_DB);
|
||||||
|
migrateInbox(inboxDb);
|
||||||
|
} catch (err) {
|
||||||
|
process.stderr.write(`db open failed: ${String(err)}\n`);
|
||||||
|
releaseSingletonLock();
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
const bus = new EventBus();
|
||||||
|
|
||||||
|
// 1.34.10: the daemon is universal — attaches to every mesh listed
|
||||||
|
// in config.json. Single-mesh isolation is handled by simply joining
|
||||||
|
// only one mesh in that environment (containers, etc.). No --mesh
|
||||||
|
// flag, no per-mesh service unit; one daemon, every mesh.
|
||||||
|
const cfg = readConfig();
|
||||||
|
if (cfg.meshes.length === 0) {
|
||||||
|
process.stderr.write(`no mesh joined; run \`claudemesh join <invite-url>\` first\n`);
|
||||||
|
releaseSingletonLock();
|
||||||
|
try { outboxDb.close(); } catch { /* ignore */ }
try { inboxDb.close(); } catch { /* ignore */ }
|
||||||
|
return 2;
|
||||||
|
}
|
||||||
|
const meshes = cfg.meshes;
|
||||||
|
|
||||||
|
// 1.34.9 — declared upfront so the daemon-WS onPush closure can
|
||||||
|
// reach into the per-session map for the isOwnPubkey filter (drops
|
||||||
|
// peer_joined / peer_left events for our own session pubkeys before
|
||||||
|
// they surface as `[system] Peer "<self>" joined`). Populated below
|
||||||
|
// by setRegistryHooks; empty until the first session registers, but
|
||||||
|
// that's fine — the closure walks it lazily.
|
||||||
|
const sessionBrokers = new Map<string, SessionBrokerClient>();
|
||||||
|
const sessionBrokersByPubkey = new Map<string, SessionBrokerClient>();
|
||||||
|
|
||||||
|
// Spin up one broker per mesh. Connection failures are non-fatal:
|
||||||
|
// the outbox keeps queuing per-mesh and reconnect logic in
|
||||||
|
// DaemonBrokerClient handles reattach.
|
||||||
|
const brokers = new Map<string, DaemonBrokerClient>();
|
||||||
|
const meshConfigs = new Map<string, typeof cfg.meshes[number]>();
|
||||||
|
for (const mesh of meshes) {
|
||||||
|
meshConfigs.set(mesh.slug, mesh);
|
||||||
|
// 1.34.10: no global displayName override anymore. Each mesh's
|
||||||
|
// hello uses its own per-mesh display name from config.json (set
|
||||||
|
// at `claudemesh join` time). Sessions advertise their own name
|
||||||
|
// via `claudemesh launch --name`.
|
||||||
|
const broker: DaemonBrokerClient = new DaemonBrokerClient(mesh, {
|
||||||
|
onStatusChange: (s) => {
|
||||||
|
process.stdout.write(JSON.stringify({
|
||||||
|
msg: "broker_status", status: s, mesh: mesh.slug, ts: new Date().toISOString(),
|
||||||
|
}) + "\n");
|
||||||
|
bus.publish("broker_status", { mesh: mesh.slug, status: s });
|
||||||
|
},
|
||||||
|
onPush: (m) => {
|
||||||
|
// Daemon-WS is member-keyed, not session-keyed. Session-targeted
|
||||||
|
// DMs land on the per-session WS (SessionBrokerClient) since
|
||||||
|
// 1.32.1 and decrypt with the session secret there. Anything that
|
||||||
|
// arrives here can only be member-keyed (broadcasts, member DMs,
|
||||||
|
// system events) — pass member secret only.
|
||||||
|
// 1.34.9: drop self-echoes — broker fan-out paths mirror an
|
||||||
|
// outbound back to the SAME daemon's member-WS even when the
|
||||||
|
// send originated on a session-WS (because both connections
|
||||||
|
// belong to the same member from the broker's view). Filter on
|
||||||
|
// senderMemberPubkey alone: anything attributed to OUR member is
|
||||||
|
// either our own send echoing back or, theoretically, a peer
|
||||||
|
// send from a different connection that happens to share our
|
||||||
|
// pubkey — but two-different-clients-same-pubkey is impossible
|
||||||
|
// by construction (member pubkeys are stable + unique per
|
||||||
|
// identity). Sibling-session DMs don't fan to our member-WS;
|
||||||
|
// they fan session-to-session. So this is safe.
|
||||||
|
const senderMemberPk = String((m as Record<string, unknown>).senderMemberPubkey ?? "").toLowerCase();
|
||||||
|
const ownMember = mesh.pubkey.toLowerCase();
|
||||||
|
if (senderMemberPk && senderMemberPk === ownMember) {
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
void handleBrokerPush(m, {
|
||||||
|
db: inboxDb,
|
||||||
|
bus,
|
||||||
|
meshSlug: mesh.slug,
|
||||||
|
recipientSecretKeyHex: mesh.secretKey,
|
||||||
|
// v2 agentic-comms (M1): client_ack closes the at-least-once
|
||||||
|
// loop. Broker holds the row claimed (not delivered) until ack.
|
||||||
|
ackClientMessage: (cmid, bmid) => broker.sendClientAck(cmid, bmid),
|
||||||
|
// 1.34.9: drop self-join system events. Member pubkey + every
|
||||||
|
// live session pubkey on this daemon all count as "us".
|
||||||
|
isOwnPubkey: (pubkey) => {
|
||||||
|
const lower = pubkey.toLowerCase();
|
||||||
|
if (lower === ownMember) return true;
|
||||||
|
return sessionBrokersByPubkey.has(lower);
|
||||||
|
},
|
||||||
|
// 1.34.10: tag the bus event with our member pubkey so the
|
||||||
|
// SSE demux only fans this row to MCPs whose subscriber
|
||||||
|
// matches (member-keyed broadcasts / DMs).
|
||||||
|
recipientPubkey: mesh.pubkey,
|
||||||
|
recipientKind: "member",
|
||||||
|
});
|
||||||
|
},
|
||||||
|
});
|
||||||
|
broker.connect().catch((err) => process.stderr.write(`broker connect failed for ${mesh.slug}: ${String(err)}\n`));
|
||||||
|
brokers.set(mesh.slug, broker);
|
||||||
|
}
|
||||||
|
|
||||||
|
// 1.30.0 — per-session broker presence. Always on. Older CLIs that
|
||||||
|
// don't include `presence` material in the register body just won't
|
||||||
|
// get a session WS; the daemon's own member-keyed broker still
|
||||||
|
// covers them.
|
||||||
|
//
|
||||||
|
// The two index maps (sessionBrokers by token, sessionBrokersByPubkey
|
||||||
|
// by session pubkey) are declared earlier in this function so the
|
||||||
|
// daemon-WS onPush closure can reference them for the isOwnPubkey
|
||||||
|
// self-join filter.
|
||||||
|
|
||||||
|
// Start the drain worker. With multi-mesh, drain dispatches each
|
||||||
|
// outbox row to its mesh's broker via the `mesh` column.
|
||||||
|
// 1.34.0: drain also accepts a session-pubkey lookup so rows
|
||||||
|
// written by authenticated sessions route via the matching session-WS
|
||||||
|
// (broker fan-out then attributes the push to the session pubkey).
|
||||||
|
let drain: DrainHandle | null = null;
|
||||||
|
drain = startDrainWorker({
|
||||||
|
db: outboxDb,
|
||||||
|
brokers,
|
||||||
|
getSessionBrokerByPubkey: (pubkey) => sessionBrokersByPubkey.get(pubkey),
|
||||||
|
});
|
||||||
|
|
||||||
|
// 1.34.8 — TTL prune for inbox.db. Runs hourly with a 30-day default
|
||||||
|
// retention. Without this the inbox grows unbounded; even on a moderate
|
||||||
|
// mesh that's tens of thousands of rows over a few weeks. Prune is a
|
||||||
|
// single DELETE; failures are non-fatal and the next interval retries.
|
||||||
|
const inboxPruner: InboxPrunerHandle = startInboxPruner({ db: inboxDb });
|
||||||
|
setRegistryHooks({
|
||||||
|
onRegister: (info) => {
|
||||||
|
if (!info.presence) return;
|
||||||
|
const meshConfig = meshConfigs.get(info.mesh);
|
||||||
|
if (!meshConfig) {
|
||||||
|
process.stderr.write(JSON.stringify({
|
||||||
|
level: "warn", msg: "session_broker_no_mesh_config", mesh: info.mesh,
|
||||||
|
ts: new Date().toISOString(),
|
||||||
|
}) + "\n");
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
// Drop any pre-existing session WS under this token (re-register).
|
||||||
|
const prior = sessionBrokers.get(info.token);
|
||||||
|
if (prior) {
|
||||||
|
sessionBrokers.delete(info.token);
|
||||||
|
// 1.34.0: keep both indices in sync.
|
||||||
|
if (sessionBrokersByPubkey.get(prior.sessionPubkey) === prior) {
|
||||||
|
sessionBrokersByPubkey.delete(prior.sessionPubkey);
|
||||||
|
}
|
||||||
|
prior.close().catch(() => { /* ignore */ });
|
||||||
|
}
|
||||||
|
// 1.32.1 — wire push delivery. Messages targeted at the launched
|
||||||
|
// session's pubkey land on THIS WS, not on the member-keyed one,
|
||||||
|
// so without this forward they'd silently disappear (the bug that
|
||||||
|
// kept inbox.db at zero rows since 1.30.0). Decrypt prefers the
|
||||||
|
// session secret key; member key remains the fallback for legacy
|
||||||
|
// member-targeted traffic that happens to fan out here.
|
||||||
|
const sessionSecretKeyHex = info.presence.sessionSecretKey;
|
||||||
|
// Capture the pubkey for the onPush closure below — TS can't
|
||||||
|
// narrow `info.presence` inside the async arrow even though we
|
||||||
|
// guard `if (!info.presence) return` earlier.
|
||||||
|
const sessionPubkeyHex = info.presence.sessionPubkey;
|
||||||
|
const client: SessionBrokerClient = new SessionBrokerClient({
|
||||||
|
mesh: meshConfig,
|
||||||
|
sessionPubkey: info.presence.sessionPubkey,
|
||||||
|
sessionSecretKey: info.presence.sessionSecretKey,
|
||||||
|
parentAttestation: info.presence.parentAttestation,
|
||||||
|
sessionId: info.sessionId,
|
||||||
|
displayName: info.displayName,
|
||||||
|
...(info.role ? { role: info.role } : {}),
|
||||||
|
...(info.cwd ? { cwd: info.cwd } : {}),
|
||||||
|
pid: info.pid,
|
||||||
|
onPush: (m) => {
|
||||||
|
void handleBrokerPush(m, {
|
||||||
|
db: inboxDb,
|
||||||
|
bus,
|
||||||
|
meshSlug: meshConfig.slug,
|
||||||
|
recipientSecretKeyHex: meshConfig.secretKey,
|
||||||
|
sessionSecretKeyHex,
|
||||||
|
// v2 agentic-comms (M1): close the at-least-once loop.
|
||||||
|
ackClientMessage: (cmid, bmid) => client.sendClientAck(cmid, bmid),
|
||||||
|
// 1.34.10: tag the bus event with this session's pubkey so
|
||||||
|
// the SSE demux only delivers to the MCP serving THIS
|
||||||
|
// session — not its siblings on the same daemon. Without
|
||||||
|
// this, A's MCP also rendered DMs intended for B because
|
||||||
|
// the bus was a single shared stream.
|
||||||
|
recipientPubkey: sessionPubkeyHex,
|
||||||
|
recipientKind: "session",
|
||||||
|
});
|
||||||
|
},
|
||||||
|
});
|
||||||
|
sessionBrokers.set(info.token, client);
|
||||||
|
sessionBrokersByPubkey.set(info.presence.sessionPubkey, client);
|
||||||
|
client.connect().catch((err) =>
|
||||||
|
process.stderr.write(JSON.stringify({
|
||||||
|
level: "warn", msg: "session_broker_connect_failed",
|
||||||
|
mesh: info.mesh, err: String(err), ts: new Date().toISOString(),
|
||||||
|
}) + "\n"),
|
||||||
|
);
|
||||||
|
},
|
||||||
|
onDeregister: (info: SessionInfo) => {
|
||||||
|
const client = sessionBrokers.get(info.token);
|
||||||
|
if (!client) return;
|
||||||
|
sessionBrokers.delete(info.token);
|
||||||
|
// 1.34.0: drop the pubkey index iff this client still owns it
|
||||||
|
// (a re-register may have already swapped the entry).
|
||||||
|
if (sessionBrokersByPubkey.get(client.sessionPubkey) === client) {
|
||||||
|
sessionBrokersByPubkey.delete(client.sessionPubkey);
|
||||||
|
}
|
||||||
|
client.close().catch(() => { /* ignore */ });
|
||||||
|
},
|
||||||
|
});
|
||||||
|
|
||||||
|
startReaper();
|
||||||
|
|
||||||
|
const ipc = startIpcServer({
|
||||||
|
localToken,
|
||||||
|
tcpEnabled,
|
||||||
|
publicHealthCheck: opts.publicHealthCheck,
|
||||||
|
outboxDb,
|
||||||
|
inboxDb,
|
||||||
|
bus,
|
||||||
|
brokers,
|
||||||
|
meshConfigs,
|
||||||
|
onPendingInserted: () => drain?.wake(),
|
||||||
|
});
|
||||||
|
|
||||||
|
try {
|
||||||
|
await ipc.ready;
|
||||||
|
} catch (err) {
|
||||||
|
process.stderr.write(`ipc listen failed: ${String(err)}\n`);
|
||||||
|
releaseSingletonLock();
|
||||||
|
return 1;
|
||||||
|
}
|
||||||
|
|
||||||
|
process.stdout.write(JSON.stringify({
|
||||||
|
msg: "daemon_started",
|
||||||
|
// 1.34.10: stamp the version so users can tell whether the
|
||||||
|
// running daemon picked up a recent CLI ship. Read off the same
|
||||||
|
// VERSION constant the IPC `/v1/version` endpoint serves.
|
||||||
|
version: VERSION,
|
||||||
|
pid: process.pid,
|
||||||
|
sock: DAEMON_PATHS.SOCK_FILE,
|
||||||
|
tcp: tcpEnabled ? `127.0.0.1:47823` : null,
|
||||||
|
meshes: meshes.map((m) => m.slug),
|
||||||
|
ts: new Date().toISOString(),
|
||||||
|
}) + "\n");
|
||||||
|
|
||||||
|
let shuttingDown = false;
|
||||||
|
const shutdown = async (sig: string) => {
|
||||||
|
if (shuttingDown) return;
|
||||||
|
shuttingDown = true;
|
||||||
|
process.stdout.write(JSON.stringify({ msg: "daemon_shutdown", signal: sig, ts: new Date().toISOString() }) + "\n");
|
||||||
|
inboxPruner.stop();
|
||||||
|
if (drain) await drain.close();
|
||||||
|
for (const b of brokers.values()) {
|
||||||
|
try { await b.close(); } catch { /* ignore */ }
|
||||||
|
}
|
||||||
|
for (const b of sessionBrokers.values()) {
|
||||||
|
try { await b.close(); } catch { /* ignore */ }
|
||||||
|
}
|
||||||
|
sessionBrokers.clear();
sessionBrokersByPubkey.clear();
|
||||||
|
await ipc.close();
|
||||||
|
try { outboxDb.close(); } catch { /* ignore */ }
|
||||||
|
try { inboxDb.close(); } catch { /* ignore */ }
|
||||||
|
releaseSingletonLock();
|
||||||
|
process.exit(0);
|
||||||
|
};
|
||||||
|
|
||||||
|
process.on("SIGINT", () => shutdown("SIGINT"));
|
||||||
|
process.on("SIGTERM", () => shutdown("SIGTERM"));
|
||||||
|
|
||||||
|
// Hold the event loop open until a signal arrives.
|
||||||
|
return new Promise<number>(() => { /* never resolves; signals call process.exit */ });
|
||||||
|
}
|
||||||
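The registry hooks in `run.ts` keep two maps in step: one keyed by session token, one keyed by session pubkey, with the rule that a pubkey entry is dropped only if it still points at the client being removed. The sketch below isolates that rule. `DualIndex` and the `Client` interface are hypothetical stand-ins (the real code uses two plain `Map`s and `SessionBrokerClient`); connection handling and `close()` are left out.

```typescript
// Hypothetical stand-in for SessionBrokerClient; only the indexed field is modelled.
interface Client {
  sessionPubkey: string;
}

class DualIndex<C extends Client> {
  private byToken = new Map<string, C>();
  private byPubkey = new Map<string, C>();

  /** Registers `client` under `token` and returns any client it displaced. */
  register(token: string, client: C): C | undefined {
    const prior = this.byToken.get(token);
    if (prior) this.dropPubkeyIfOwned(prior);
    this.byToken.set(token, client);
    this.byPubkey.set(client.sessionPubkey, client);
    return prior;
  }

  /** Removes the client for `token`; its pubkey entry goes only if it still owns it. */
  deregister(token: string): C | undefined {
    const client = this.byToken.get(token);
    if (!client) return undefined;
    this.byToken.delete(token);
    this.dropPubkeyIfOwned(client);
    return client;
  }

  getByPubkey(pubkey: string): C | undefined {
    return this.byPubkey.get(pubkey);
  }

  private dropPubkeyIfOwned(client: C): void {
    // Identity check: a later registration may already have replaced this entry.
    if (this.byPubkey.get(client.sessionPubkey) === client) {
      this.byPubkey.delete(client.sessionPubkey);
    }
  }
}
```

The identity comparison is what stops a late deregister of an old client from removing the entry that a newer registration with the same pubkey has already taken over.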
259
apps/cli/src/daemon/service-install.ts
Normal file
@@ -0,0 +1,259 @@
|
|||||||
|
// Service-install for daemon mode (spec §9). Two platforms:
|
||||||
|
// - macOS: ~/Library/LaunchAgents/com.claudemesh.daemon.plist (launchctl bootstrap)
|
||||||
|
// - Linux: ~/.config/systemd/user/claudemesh-daemon.service (systemctl --user enable)
|
||||||
|
//
|
||||||
|
// Both run as the invoking user, redirect stdout/stderr to ~/.claudemesh/
|
||||||
|
// daemon/daemon.log, restart on crash, and start at login. CI envs are
|
||||||
|
// refused unless --allow-ci-persistent is passed (spec §9 / §16.3).
|
||||||
|
|
||||||
|
import { existsSync, mkdirSync, writeFileSync, unlinkSync, readFileSync } from "node:fs";
|
||||||
|
import { execSync } from "node:child_process";
|
||||||
|
import { homedir } from "node:os";
|
||||||
|
import { join, dirname } from "node:path";
|
||||||
|
|
||||||
|
import { DAEMON_PATHS } from "./paths.js";
|
||||||
|
|
||||||
|
export type ServicePlatform = "darwin" | "linux";
|
||||||
|
export interface InstallResult {
|
||||||
|
platform: ServicePlatform;
|
||||||
|
unitPath: string;
|
||||||
|
/** Shell snippet that the operator can run to bring the service up now. */
|
||||||
|
bootCommand: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
const SERVICE_LABEL = "com.claudemesh.daemon";
|
||||||
|
const SYSTEMD_UNIT = "claudemesh-daemon.service";
|
||||||
|
|
||||||
|
export function detectPlatform(): ServicePlatform | null {
|
||||||
|
if (process.platform === "darwin") return "darwin";
|
||||||
|
if (process.platform === "linux") return "linux";
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
|
||||||
|
function isCi(): boolean {
|
||||||
|
return !!(process.env.CI || process.env.GITHUB_ACTIONS || process.env.GITLAB_CI || process.env.BUILDKITE
|
||||||
|
|| process.env.CIRCLECI || process.env.JENKINS_URL);
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface InstallArgs {
|
||||||
|
/** Path to the `claudemesh` binary, e.g. /opt/homebrew/bin/claudemesh */
|
||||||
|
binaryPath: string;
|
||||||
|
/**
|
||||||
|
* Optional mesh slug to lock the daemon to. Omit (the new default) so
|
||||||
|
* the daemon attaches to every joined mesh — matches the 1.26.0
|
||||||
|
* multi-mesh design. Single-mesh lock is preserved for users who
|
||||||
|
* explicitly want it (testing, CI, host with one mesh).
|
||||||
|
*/
|
||||||
|
meshSlug?: string;
|
||||||
|
/** Optional display name. */
|
||||||
|
displayName?: string;
|
||||||
|
/** Override the auto-detected CI refusal. */
|
||||||
|
allowCi?: boolean;
|
||||||
|
}
|
||||||
|
|
||||||
|
export function installService(args: InstallArgs): InstallResult {
|
||||||
|
const platform = detectPlatform();
|
||||||
|
if (!platform) throw new Error(`unsupported platform: ${process.platform}`);
|
||||||
|
if (isCi() && !args.allowCi) {
|
||||||
|
throw new Error("Refusing to install persistent service in CI; pass --allow-ci-persistent to override.");
|
||||||
|
}
|
||||||
|
if (!existsSync(args.binaryPath)) {
|
||||||
|
throw new Error(`binary not found at ${args.binaryPath}`);
|
||||||
|
}
|
||||||
|
// Make sure the daemon dir exists so the launchd/systemd log paths resolve.
|
||||||
|
mkdirSync(DAEMON_PATHS.DAEMON_DIR, { recursive: true, mode: 0o700 });
|
||||||
|
|
||||||
|
if (platform === "darwin") return installDarwin(args);
|
||||||
|
return installLinux(args);
|
||||||
|
}
|
||||||
|
|
||||||
|
export function uninstallService(): { platform: ServicePlatform | null; removed: string[] } {
|
||||||
|
const platform = detectPlatform();
|
||||||
|
const removed: string[] = [];
|
||||||
|
if (platform === "darwin") {
|
||||||
|
const p = darwinPlistPath();
|
||||||
|
try { execSync(`launchctl bootout gui/$(id -u)/${SERVICE_LABEL}`, { stdio: "ignore" }); } catch { /* not loaded */ }
|
||||||
|
if (existsSync(p)) { unlinkSync(p); removed.push(p); }
|
||||||
|
} else if (platform === "linux") {
|
||||||
|
const p = linuxUnitPath();
|
||||||
|
try { execSync(`systemctl --user disable --now ${SYSTEMD_UNIT}`, { stdio: "ignore" }); } catch { /* not loaded */ }
|
||||||
|
if (existsSync(p)) { unlinkSync(p); removed.push(p); }
|
||||||
|
}
|
||||||
|
return { platform, removed };
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── macOS ──────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
function darwinPlistPath(): string {
|
||||||
|
return join(homedir(), "Library", "LaunchAgents", `${SERVICE_LABEL}.plist`);
|
||||||
|
}
|
||||||
|
|
||||||
|
function installDarwin(args: InstallArgs): InstallResult {
|
||||||
|
const plist = darwinPlistPath();
|
||||||
|
mkdirSync(dirname(plist), { recursive: true });
|
||||||
|
const log = DAEMON_PATHS.LOG_FILE;
|
||||||
|
// Resolve `node` explicitly. The bin script in node_modules/.bin starts
|
||||||
|
// with `#!/usr/bin/env node`; under launchd's restricted PATH that would
|
||||||
|
// resolve `node` to a system Node (often the wrong major) instead of the
|
||||||
|
// one that installed claudemesh-cli. Pinning process.execPath here means
|
||||||
|
// the daemon always runs under the same Node that ran `claudemesh install`.
|
||||||
|
const nodeBin = process.execPath;
|
||||||
|
// 1.34.12: --foreground because launchd manages lifecycle + stdio.
|
||||||
|
// Without it, the daemon would re-spawn itself detached (the new
|
||||||
|
// default) and launchd would lose track of the actual long-lived
|
||||||
|
// process — KeepAlive wouldn't work and stdout redirect would
|
||||||
|
// capture only the parent's brief boot.
|
||||||
|
const meshArgs = [
|
||||||
|
`<string>${escapeXml(args.binaryPath)}</string>`,
|
||||||
|
"<string>daemon</string>",
|
||||||
|
"<string>up</string>",
|
||||||
|
"<string>--foreground</string>",
|
||||||
|
...(args.meshSlug
|
||||||
|
? ["<string>--mesh</string>", `<string>${escapeXml(args.meshSlug)}</string>`]
|
||||||
|
: []),
|
||||||
|
...(args.displayName ? ["<string>--name</string>", `<string>${escapeXml(args.displayName)}</string>`] : []),
|
||||||
|
].join("\n ");
|
||||||
|
|
||||||
|
const xml = `<?xml version="1.0" encoding="UTF-8"?>
|
||||||
|
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
|
||||||
|
<plist version="1.0">
|
||||||
|
<dict>
|
||||||
|
<key>Label</key>
|
||||||
|
<string>${SERVICE_LABEL}</string>
|
||||||
|
<key>ProgramArguments</key>
|
||||||
|
<array>
|
||||||
|
<string>${escapeXml(nodeBin)}</string>
|
||||||
|
${meshArgs}
|
||||||
|
</array>
|
||||||
|
<key>RunAtLoad</key>
|
||||||
|
<true/>
|
||||||
|
<key>KeepAlive</key>
|
||||||
|
<true/>
|
||||||
|
<key>StandardOutPath</key>
|
||||||
|
<string>${escapeXml(log)}</string>
|
||||||
|
<key>StandardErrorPath</key>
|
||||||
|
<string>${escapeXml(log)}</string>
|
||||||
|
<key>WorkingDirectory</key>
|
||||||
|
<string>${escapeXml(homedir())}</string>
|
||||||
|
<key>EnvironmentVariables</key>
|
||||||
|
<dict>
|
||||||
|
<key>HOME</key>
|
||||||
|
<string>${escapeXml(homedir())}</string>
|
||||||
|
<key>PATH</key>
|
||||||
|
<string>/usr/local/bin:/opt/homebrew/bin:/usr/bin:/bin:/usr/sbin:/sbin</string>
|
||||||
|
</dict>
|
||||||
|
</dict>
|
||||||
|
</plist>
|
||||||
|
`;
|
||||||
|
writeFileSync(plist, xml, { mode: 0o644 });
|
||||||
|
|
||||||
|
// Stop any prior incarnation BEFORE bootstrapping so an upgrade run
|
||||||
|
// doesn't hit "service already loaded" → bootstrap exit-5 IO_ERROR.
|
||||||
|
// Both calls are best-effort: launchctl prints to stderr if the unit
|
||||||
|
// isn't loaded, and we don't want to fail install for that.
|
||||||
|
try {
|
||||||
|
execSync(`launchctl bootout gui/$(id -u)/${SERVICE_LABEL}`, { stdio: "ignore" });
|
||||||
|
} catch { /* unit not loaded — fine */ }
|
||||||
|
// Also kill any orphaned daemon process (started manually or by an
|
||||||
|
// older script) so the new launchd-managed one can claim the singleton
|
||||||
|
// lock on first start.
|
||||||
|
try {
|
||||||
|
const pidPath = DAEMON_PATHS.PID_FILE;
|
||||||
|
if (existsSync(pidPath)) {
|
||||||
|
const pid = parseInt(readFileSync(pidPath, "utf8").trim(), 10);
|
||||||
|
if (Number.isFinite(pid) && pid > 0) {
|
||||||
|
try { process.kill(pid, "SIGTERM"); } catch { /* already dead */ }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} catch { /* pid file missing — fine */ }
|
||||||
|
|
||||||
|
return {
|
||||||
|
platform: "darwin",
|
||||||
|
unitPath: plist,
|
||||||
|
bootCommand: `launchctl bootstrap gui/$(id -u) ${shellQuote(plist)}`,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── Linux ──────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
function linuxUnitPath(): string {
|
||||||
|
return join(homedir(), ".config", "systemd", "user", SYSTEMD_UNIT);
|
||||||
|
}
|
||||||
|
|
||||||
|
function installLinux(args: InstallArgs): InstallResult {
|
||||||
|
const unit = linuxUnitPath();
|
||||||
|
mkdirSync(dirname(unit), { recursive: true });
|
||||||
|
// Same node-pinning rationale as macOS: the systemd user manager's environment is
|
||||||
|
// similarly minimal; resolve node by absolute path.
|
||||||
|
const nodeBin = process.execPath;
|
||||||
|
// 1.34.12: --foreground because systemd-user owns process lifecycle
|
||||||
|
// and stdio capture; we don't want the child to double-fork into a
|
||||||
|
// detached grandchild systemd can't track.
|
||||||
|
const execArgs = [
|
||||||
|
"daemon", "up", "--foreground",
|
||||||
|
...(args.meshSlug ? ["--mesh", args.meshSlug] : []),
|
||||||
|
...(args.displayName ? ["--name", args.displayName] : []),
|
||||||
|
].map(shellQuote).join(" ");
|
||||||
|
|
||||||
|
const content = `[Unit]
|
||||||
|
Description=claudemesh daemon (peer mesh runtime)
|
||||||
|
After=network-online.target
|
||||||
|
Wants=network-online.target
|
||||||
|
|
||||||
|
[Service]
|
||||||
|
Type=simple
|
||||||
|
ExecStart=${shellQuote(nodeBin)} ${shellQuote(args.binaryPath)} ${execArgs}
|
||||||
|
Restart=always
|
||||||
|
RestartSec=3
|
||||||
|
StandardOutput=append:${DAEMON_PATHS.LOG_FILE}
|
||||||
|
StandardError=append:${DAEMON_PATHS.LOG_FILE}
|
||||||
|
Environment=PATH=/usr/local/bin:/usr/bin:/bin
|
||||||
|
|
||||||
|
[Install]
|
||||||
|
WantedBy=default.target
|
||||||
|
`;
|
||||||
|
writeFileSync(unit, content, { mode: 0o644 });
|
||||||
|
|
||||||
|
// Mirror the darwin path: stop the previous unit (if any) so an
|
||||||
|
// upgrade run replaces it cleanly, plus kill any orphaned manual
|
||||||
|
// daemon process holding the singleton lock.
|
||||||
|
try {
|
||||||
|
execSync(`systemctl --user stop ${SYSTEMD_UNIT}`, { stdio: "ignore" });
|
||||||
|
} catch { /* not loaded — fine */ }
|
||||||
|
try {
|
||||||
|
const pidPath = DAEMON_PATHS.PID_FILE;
|
||||||
|
if (existsSync(pidPath)) {
|
||||||
|
const pid = parseInt(readFileSync(pidPath, "utf8").trim(), 10);
|
||||||
|
if (Number.isFinite(pid) && pid > 0) {
|
||||||
|
try { process.kill(pid, "SIGTERM"); } catch { /* already dead */ }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
} catch { /* pid file missing — fine */ }
|
||||||
|
|
||||||
|
return {
|
||||||
|
platform: "linux",
|
||||||
|
unitPath: unit,
|
||||||
|
bootCommand: `systemctl --user daemon-reload && systemctl --user enable --now ${SYSTEMD_UNIT}`,
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
// ── helpers ────────────────────────────────────────────────────────────
|
||||||
|
|
||||||
|
function escapeXml(s: string): string {
|
||||||
|
return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
|
||||||
|
}
|
||||||
|
|
||||||
|
function shellQuote(s: string): string {
|
||||||
|
if (/^[\w@%+=:,./-]+$/.test(s)) return s;
|
||||||
|
return "'" + s.replace(/'/g, "'\"'\"'") + "'";
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Diagnostic helper: dump current install status for `claudemesh daemon status --json`. */
|
||||||
|
export function readInstalledUnit(): { platform: ServicePlatform | null; path: string | null; content: string | null } {
|
||||||
|
const platform = detectPlatform();
|
||||||
|
if (!platform) return { platform: null, path: null, content: null };
|
||||||
|
const path = platform === "darwin" ? darwinPlistPath() : linuxUnitPath();
|
||||||
|
if (!existsSync(path)) return { platform, path: null, content: null };
|
||||||
|
try { return { platform, path, content: readFileSync(path, "utf8") }; }
|
||||||
|
catch { return { platform, path, content: null }; }
|
||||||
|
}
|
||||||
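The two helpers at the end of `service-install.ts` decide how arguments are embedded in the plist and in the systemd unit. The sketch below shows what they would produce for a few inputs. It assumes `escapeXml` emits the standard XML entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`), since the rendered diff shows those strings in decoded form; `shellQuote` is copied as shown. The sample inputs are made up.

```typescript
// Assumed entity mapping: the four standard XML escapes, ampersand first so
// that already-produced entities are not escaped a second time.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Strings made only of "safe" characters pass through unchanged; anything else
// is wrapped in single quotes, with embedded single quotes spliced in as '"'"'.
function shellQuote(s: string): string {
  if (/^[\w@%+=:,./-]+$/.test(s)) return s;
  return "'" + s.replace(/'/g, "'\"'\"'") + "'";
}

console.log(escapeXml('R&D <"mesh">')); // R&amp;D &lt;&quot;mesh&quot;&gt;
console.log(shellQuote("/usr/local/bin/node")); // /usr/local/bin/node
console.log(shellQuote("my mesh")); // 'my mesh'
console.log(shellQuote("it's")); // 'it'"'"'s'
```

Escaping `&` before the other characters is the important ordering detail: doing it last would turn `&lt;` into `&amp;lt;`.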
357
apps/cli/src/daemon/session-broker.ts
Normal file
@@ -0,0 +1,357 @@
|
|||||||
|
/**
|
||||||
|
* Per-launch session broker WebSocket.
|
||||||
|
*
|
||||||
|
* Owned by the daemon, one per registered session. Holds a long-lived
|
||||||
|
* presence row on the broker keyed on the session's ephemeral pubkey
|
||||||
|
* (rather than the parent member's stable pubkey). Sibling sessions —
|
||||||
|
* two `claudemesh launch` runs in the same cwd — finally see each other
|
||||||
|
* in `peer list` because their presence rows coexist instead of fighting
|
||||||
|
* over the same memberPubkey snapshot.
|
||||||
|
*
|
||||||
|
* Differences from `DaemonBrokerClient`:
|
||||||
|
* - Uses session_hello (1.30.0+ broker), with a parent-vouched
|
||||||
|
* attestation provided at construction time.
|
||||||
|
* - Does NOT carry list_peers / state / memory RPCs. This client is
|
||||||
|
* presence + inbound DM delivery + (1.34.0) outbound send for
|
||||||
|
* messages that originate from this session. Routing those through
|
||||||
|
* here is what makes the broker fan-out attribute the push to the
|
||||||
|
* session pubkey instead of the daemon's stable member pubkey.
|
||||||
|
*
|
||||||
|
* Outbox routing (1.34.0): the drain worker now consults
|
||||||
|
* `outbox.sender_session_pubkey`. If a row was written by an
|
||||||
|
* authenticated session and the matching session-WS is `open`, the
|
||||||
|
* drain dispatches via `SessionBrokerClient.send()` — this
|
||||||
|
* connection's `conn.sessionPubkey` server-side is the session pubkey,
|
||||||
|
* so the broker's existing fan-out attribution
|
||||||
|
* (`senderPubkey: conn.sessionPubkey ?? conn.memberPubkey`) just works.
|
||||||
|
* Pre-1.34.0 every drain went through DaemonBrokerClient (member-WS),
|
||||||
|
* so every push showed up as "from <daemon-member-pubkey>" regardless
|
||||||
|
* of which session typed `claudemesh send`.
|
||||||
|
*
|
||||||
|
* Old brokers reply with `unknown_message_type` on session_hello — we
|
||||||
|
* surface that as a one-shot `error` event and the daemon decides
|
||||||
|
* whether to fall back. For 1.30.0 we just log + retry; the broker is
|
||||||
|
* expected to be deployed first.
|
||||||
|
*
|
||||||
|
* Spec: .artifacts/specs/2026-05-04-per-session-presence.md.
|
||||||
|
*
|
||||||
|
* 2026-05-04: lifecycle (connect / hello-ack / close-reconnect) lives
|
||||||
|
* in `ws-lifecycle.ts`. This class supplies session_hello content and
|
||||||
|
* routes the inbound onPush; the helper handles the rest.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { hostname as osHostname } from "node:os";
|
||||||
|
|
||||||
|
import type { JoinedMesh } from "~/services/config/facade.js";
|
||||||
|
import { signSessionHello } from "~/services/broker/session-hello-sig.js";
|
||||||
|
import { connectWsWithBackoff, type WsLifecycle, type WsStatus } from "./ws-lifecycle.js";
|
||||||
|
import type { BrokerSendArgs, BrokerSendResult } from "./broker.js";
|
||||||
|
|
||||||
|
export type SessionBrokerStatus = WsStatus;
|
||||||
|
|
||||||
|
/** Ack-tracking shape, mirrors DaemonBrokerClient.PendingAck. Kept
|
||||||
|
* internal — callers see only the resolved BrokerSendResult. */
|
||||||
|
interface PendingAck {
|
||||||
|
resolve: (r: BrokerSendResult) => void;
|
||||||
|
timer: NodeJS.Timeout;
|
||||||
|
}
|
||||||
|
|
||||||
|
const SEND_ACK_TIMEOUT_MS = 15_000;
|
||||||
|
|
||||||
|
/** Heuristic: which broker-reported send errors are permanent enough
|
||||||
|
* that the drain worker should give up rather than retry. Mirrors the
|
||||||
|
* daemon-WS classifier so behavior is identical regardless of which
|
||||||
|
* socket the row went out on. */
|
||||||
|
function classifyPermanent(error: string): boolean {
|
||||||
|
return /unknown|invalid|forbidden|not_authorized|target_not_found/i.test(error);
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface ParentAttestation {
|
||||||
|
sessionPubkey: string;
|
||||||
|
parentMemberPubkey: string;
|
||||||
|
/** Unix ms. Broker rejects > now+24h or already past. */
|
||||||
|
expiresAt: number;
|
||||||
|
signature: string;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface SessionBrokerOptions {
|
||||||
|
mesh: JoinedMesh;
|
||||||
|
/** Per-launch ephemeral keypair. */
|
||||||
|
sessionPubkey: string;
|
||||||
|
sessionSecretKey: string;
|
||||||
|
/** Parent-vouched attestation, signed by mesh.secretKey at launch time. */
|
||||||
|
parentAttestation: ParentAttestation;
|
||||||
|
/** Stable session_id from the launch (used for dedup on the broker). */
|
||||||
|
sessionId: string;
|
||||||
|
/** Display name override for this session. */
|
||||||
|
displayName?: string;
|
||||||
|
/** Initial groups. Format mirrors the regular hello. */
|
||||||
|
groups?: Array<{ name: string; role?: string }>;
|
||||||
|
/** Role tag (informational, not auth-bearing). */
|
||||||
|
role?: string;
|
||||||
|
/** Working directory (informational, surfaced in peer list). */
|
||||||
|
cwd?: string;
|
||||||
|
/** Pid of the launched session (NOT the daemon). */
|
||||||
|
pid: number;
|
||||||
|
onStatusChange?: (s: SessionBrokerStatus) => void;
|
||||||
|
/**
|
||||||
|
   * Handler for inbound `push` / `inbound` frames. The broker fans
   * messages targeted at a session pubkey out over the corresponding
   * session WS. Without this callback they are dropped and the daemon's
   * inbox.db never sees them. Wired in run.ts to a handleBrokerPush call
   * that decrypts with this session's secret key (member key as fallback).
|
||||||
|
*/
|
||||||
|
onPush?: (msg: Record<string, unknown>) => void;
|
||||||
|
log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
|
||||||
|
}
|
||||||
|
|
||||||
|
export class SessionBrokerClient {
|
||||||
|
private lifecycle: WsLifecycle | null = null;
|
||||||
|
private _status: SessionBrokerStatus = "closed";
|
||||||
|
private closed = false;
|
||||||
|
/** Set when the broker rejects session_hello with `unknown_message_type` —
|
||||||
|
* older brokers without the 1.30.0 surface. We stop retrying. */
|
||||||
|
private brokerUnsupported = false;
|
||||||
|
/** 1.34.0: outbound send tracking. Keyed by client_message_id. The
|
||||||
|
* drain worker registers an entry on dispatch; the WS message
|
||||||
|
* handler resolves it on broker `ack`. Times out after 15s. */
|
||||||
|
private pendingAcks = new Map<string, PendingAck>();
|
||||||
|
/** 1.34.0: dispatchers queued while the WS is reconnecting — flushed
|
||||||
|
* in onStatusChange when status flips to `open`. Mirrors the
|
||||||
|
* daemon-WS `opens` array. */
|
||||||
|
private opens: Array<() => void> = [];
|
||||||
|
|
||||||
|
constructor(private opts: SessionBrokerOptions) {}
|
||||||
|
|
||||||
|
get status(): SessionBrokerStatus { return this._status; }
|
||||||
|
get meshSlug(): string { return this.opts.mesh.slug; }
|
||||||
|
get sessionPubkey(): string { return this.opts.sessionPubkey; }
|
||||||
|
|
||||||
|
private log = (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => {
|
||||||
|
(this.opts.log ?? defaultLog)(level, msg, {
|
||||||
|
mesh: this.opts.mesh.slug,
|
||||||
|
session_pubkey: this.opts.sessionPubkey.slice(0, 12),
|
||||||
|
...meta,
|
||||||
|
});
|
||||||
|
};
|
||||||
|
|
||||||
|
/** Open the WS, run session_hello, resolve once the broker accepts. */
|
||||||
|
async connect(): Promise<void> {
|
||||||
|
if (this.closed) throw new Error("client_closed");
|
||||||
|
if (this._status === "connecting" || this._status === "open") return;
|
||||||
|
|
||||||
|
this.lifecycle = await connectWsWithBackoff({
|
||||||
|
url: this.opts.mesh.brokerUrl,
|
||||||
|
buildHello: async () => {
|
||||||
|
const { timestamp, signature } = await signSessionHello({
|
||||||
|
meshId: this.opts.mesh.meshId,
|
||||||
|
parentMemberPubkey: this.opts.mesh.pubkey,
|
||||||
|
sessionPubkey: this.opts.sessionPubkey,
|
||||||
|
sessionSecretKey: this.opts.sessionSecretKey,
|
||||||
|
});
|
||||||
|
return {
|
||||||
|
type: "session_hello",
|
||||||
|
meshId: this.opts.mesh.meshId,
|
||||||
|
parentMemberId: this.opts.mesh.memberId,
|
||||||
|
parentMemberPubkey: this.opts.mesh.pubkey,
|
||||||
|
sessionPubkey: this.opts.sessionPubkey,
|
||||||
|
parentAttestation: this.opts.parentAttestation,
|
||||||
|
displayName: this.opts.displayName,
|
||||||
|
sessionId: this.opts.sessionId,
|
||||||
|
pid: this.opts.pid,
|
||||||
|
cwd: this.opts.cwd ?? process.cwd(),
|
||||||
|
hostname: osHostname(),
|
||||||
|
peerType: "ai" as const,
|
||||||
|
channel: "claudemesh-session",
|
||||||
|
...(this.opts.groups && this.opts.groups.length > 0 ? { groups: this.opts.groups } : {}),
|
||||||
|
...(this.opts.role ? { role: this.opts.role } : {}),
|
||||||
|
timestamp,
|
||||||
|
signature,
|
||||||
|
};
|
||||||
|
},
|
||||||
|
isHelloAck: (msg) => msg.type === "hello_ack",
|
||||||
|
onMessage: (msg) => {
|
||||||
|
if (msg.type === "error") {
|
||||||
|
          // Older brokers respond with `unknown_message_type` to session_hello;
          // surface that so the daemon can decide to skip per-session presence
          // rather than churn through reconnects. Setting `closed` blocks any
          // further connect() call; closing the lifecycle stops the helper's
          // reconnect loop.
|
||||||
|
this.log("warn", "broker_error", { code: msg.code, message: msg.message });
|
||||||
|
if (msg.code === "unknown_message_type") {
|
||||||
|
this.brokerUnsupported = true;
|
||||||
|
this.closed = true;
|
||||||
|
void this.lifecycle?.close();
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 1.34.0: outbox `send` ack arriving on the session-WS. Resolves
|
||||||
|
// the Promise the drain worker is awaiting. Mirrors the
|
||||||
|
// daemon-WS handler exactly.
|
||||||
|
if (msg.type === "ack") {
|
||||||
|
const id = String(msg.id ?? "");
|
||||||
|
const ack = this.pendingAcks.get(id);
|
||||||
|
if (ack) {
|
||||||
|
this.pendingAcks.delete(id);
|
||||||
|
clearTimeout(ack.timer);
|
||||||
|
if (typeof msg.error === "string" && msg.error.length > 0) {
|
||||||
|
ack.resolve({ ok: false, error: msg.error, permanent: classifyPermanent(msg.error) });
|
||||||
|
} else {
|
||||||
|
ack.resolve({ ok: true, messageId: String(msg.messageId ?? id) });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
|
||||||
|
// 1.32.1 — DMs targeted at the launched session's pubkey arrive
|
||||||
|
// here, NOT on the daemon's member-keyed WS. Forward to the
|
||||||
|
// daemon-level push handler so they land in inbox.db.
|
||||||
|
if (msg.type === "push" || msg.type === "inbound") {
|
||||||
|
// 1.34.9: skip system events on the session-WS — the daemon-WS
|
||||||
|
// already receives the same broker broadcast and publishes it
|
||||||
|
// to the bus, so forwarding here just produces duplicate
|
||||||
|
// `[system] Peer "X" joined the mesh` channel pushes (one per
|
||||||
|
// connection: 1 member-WS + 1 session-WS = 2 messages, +
|
||||||
|
// another set per sibling session). Caught in the 2026-05-04
|
||||||
|
// peer-rejoin smoke.
|
||||||
|
if ((msg as Record<string, unknown>).subtype === "system") return;
|
||||||
|
// 1.34.8: drop self-echoes. Some broker fan-out paths mirror an
|
||||||
|
// outbound DM back to the originating session-WS; without this
|
||||||
|
// guard the sender's own message lands in inbox.db, publishes a
|
||||||
|
// `message` bus event, and Claude Code surfaces it as
|
||||||
|
// `← claudemesh: <self>: <text>` immediately after the user
|
||||||
|
// typed `claudemesh send`. Caught in the 2026-05-04 two-session
|
||||||
|
// smoke. Match on session pubkey only — sibling sessions of the
|
||||||
|
// same member share `senderMemberPubkey`, so a member-level
|
||||||
|
// filter would wrongly drop legit sibling DMs.
|
||||||
|
const senderPubkey = String((msg as Record<string, unknown>).senderPubkey ?? "").toLowerCase();
|
||||||
|
if (senderPubkey && senderPubkey === this.opts.sessionPubkey.toLowerCase()) {
|
||||||
|
this.log("info", "self_echo_dropped", { sender: senderPubkey.slice(0, 12) });
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
this.opts.onPush?.(msg);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
},
|
||||||
|
onStatusChange: (s) => {
|
||||||
|
this._status = s;
|
||||||
|
this.opts.onStatusChange?.(s);
|
||||||
|
if (s === "open") {
|
||||||
|
// 1.34.0: flush queued send dispatchers so any outbox row that
|
||||||
|
// tried to dispatch while we were reconnecting goes out now.
|
||||||
|
const queued = this.opens.slice();
|
||||||
|
this.opens.length = 0;
|
||||||
|
for (const fn of queued) {
|
||||||
|
try { fn(); } catch (e) { this.log("warn", "session_open_handler_failed", { err: String(e) }); }
|
||||||
|
}
|
||||||
|
} else if (s === "closed" || s === "reconnecting") {
|
||||||
|
// Fail any in-flight acks so the drain worker can retry/backoff
|
||||||
|
// instead of hanging on a dead promise. The daemon-WS does the
|
||||||
|
// same thing via onBeforeReconnect; we centralize it here
|
||||||
|
// because session-broker uses status transitions directly.
|
||||||
|
this.failPendingAcks(`session_ws_${s}`);
|
||||||
|
}
|
||||||
|
},
|
||||||
|
log: (level, msg, meta) => this.log(level, `session_broker_${msg}`, meta),
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/** v2 agentic-comms (M1): send `client_ack` back to the broker after
|
||||||
|
* successfully landing an inbound push in inbox.db. Broker uses the
|
||||||
|
* ack to set `delivered_at`. Best-effort. */
|
||||||
|
sendClientAck(clientMessageId: string, brokerMessageId: string | null): void {
|
||||||
|
if (this._status !== "open" || !this.lifecycle) return;
|
||||||
|
try {
|
||||||
|
this.lifecycle.send({
|
||||||
|
type: "client_ack",
|
||||||
|
clientMessageId,
|
||||||
|
...(brokerMessageId ? { brokerMessageId } : {}),
|
||||||
|
});
|
||||||
|
} catch { /* drop; lease re-delivers */ }
|
||||||
|
}
|
||||||
|
|
||||||
|
  /** True when the underlying socket is in the OPEN state, i.e. ready for direct sends. */
|
||||||
|
isOpen(): boolean {
|
||||||
|
const sock = this.lifecycle?.ws;
|
||||||
|
return !!sock && sock.readyState === sock.OPEN;
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* 1.34.0 — Send one outbox row over the session-WS. Same wire format
|
||||||
|
* as DaemonBrokerClient.send, but routed via this connection so the
|
||||||
|
* broker's fan-out attributes the push to the session pubkey.
|
||||||
|
*
|
||||||
|
* Used by the drain worker for rows whose `sender_session_pubkey`
|
||||||
|
* matches this client's session pubkey. When the WS is reconnecting
|
||||||
|
* the dispatcher is queued via `opens` and flushed on the next
|
||||||
|
* status flip.
|
||||||
|
*/
|
||||||
|
send(req: BrokerSendArgs): Promise<BrokerSendResult> {
|
||||||
|
return new Promise<BrokerSendResult>((resolve) => {
|
||||||
|
const dispatch = () => {
|
||||||
|
if (!this.isOpen() || !this.lifecycle) {
|
||||||
|
resolve({ ok: false, error: "session_ws_not_open", permanent: false });
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
const id = req.client_message_id;
|
||||||
|
const timer = setTimeout(() => {
|
||||||
|
if (this.pendingAcks.delete(id)) {
|
||||||
|
resolve({ ok: false, error: "ack_timeout", permanent: false });
|
||||||
|
}
|
||||||
|
}, SEND_ACK_TIMEOUT_MS);
|
||||||
|
this.pendingAcks.set(id, { resolve, timer });
|
||||||
|
try {
|
||||||
|
this.lifecycle.send({
|
||||||
|
type: "send",
|
||||||
|
id,
|
||||||
|
client_message_id: id,
|
||||||
|
request_fingerprint: req.request_fingerprint_hex,
|
||||||
|
targetSpec: req.targetSpec,
|
||||||
|
priority: req.priority,
|
||||||
|
nonce: req.nonce,
|
||||||
|
ciphertext: req.ciphertext,
|
||||||
|
});
|
||||||
|
} catch (e) {
|
||||||
|
this.pendingAcks.delete(id);
|
||||||
|
clearTimeout(timer);
|
||||||
|
resolve({ ok: false, error: `ws_write_failed: ${String(e)}`, permanent: false });
|
||||||
|
}
|
||||||
|
};
|
||||||
|
|
||||||
|
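      // A client that was closed for good never flips back to `open`, so a
      // queued dispatcher would never run. Fail fast instead of hanging.
      if (this._status === "open") dispatch();
      else if (this.closed) resolve({ ok: false, error: "session_ws_closed", permanent: false });
      else this.opens.push(dispatch);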
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Resolve every in-flight ack with a synthetic failure. Called on
|
||||||
|
* WS close so the drain worker stops waiting and either retries or
|
||||||
|
* reroutes via the daemon-WS. */
|
||||||
|
private failPendingAcks(reason: string): void {
|
||||||
|
if (this.pendingAcks.size === 0) return;
|
||||||
|
const entries = [...this.pendingAcks.entries()];
|
||||||
|
this.pendingAcks.clear();
|
||||||
|
for (const [, ack] of entries) {
|
||||||
|
clearTimeout(ack.timer);
|
||||||
|
ack.resolve({ ok: false, error: reason, permanent: false });
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
async close(): Promise<void> {
|
||||||
|
this.closed = true;
|
||||||
|
if (this.lifecycle) {
|
||||||
|
try { await this.lifecycle.close(); } catch { /* ignore */ }
|
||||||
|
this.lifecycle = null;
|
||||||
|
}
|
||||||
|
this._status = "closed";
|
||||||
|
}
|
||||||
|
|
||||||
|
/** True when the broker rejected our session_hello as unknown — caller
|
||||||
|
* may want to skip per-session presence entirely on this mesh. */
|
||||||
|
get isBrokerUnsupported(): boolean { return this.brokerUnsupported; }
|
||||||
|
}
|
||||||
|
|
||||||
|
function defaultLog(level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) {
|
||||||
|
const line = JSON.stringify({ level, msg, ...meta, ts: new Date().toISOString() });
|
||||||
|
if (level === "info") process.stdout.write(line + "\n");
|
||||||
|
else process.stderr.write(line + "\n");
|
||||||
|
}
|
||||||
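The ack tracking done by `send()` and `failPendingAcks()` above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the project's implementation: `AckTracker`, `SendResult`, `isPermanent`, and the method names are hypothetical. Only the regex, the 15 s default timeout, and the error strings are taken from the file above.

```typescript
// Hypothetical sketch: AckTracker, SendResult and isPermanent are
// illustrative names, not part of the claudemesh codebase.
type SendResult =
  | { ok: true; messageId: string }
  | { ok: false; error: string; permanent: boolean };

interface Pending {
  resolve: (r: SendResult) => void;
  timer: ReturnType<typeof setTimeout>;
}

// Same pattern as classifyPermanent in the file above.
function isPermanent(error: string): boolean {
  return /unknown|invalid|forbidden|not_authorized|target_not_found/i.test(error);
}

class AckTracker {
  private pending = new Map<string, Pending>();

  /** Track one outbound send; `resolve` is called at most once. */
  register(id: string, resolve: (r: SendResult) => void, timeoutMs = 15_000): void {
    const timer = setTimeout(() => {
      // Only report a timeout if the entry is still in flight.
      if (this.pending.delete(id)) {
        resolve({ ok: false, error: "ack_timeout", permanent: false });
      }
    }, timeoutMs);
    this.pending.set(id, { resolve, timer });
  }

  /** Settle on a broker ack. Returns false for unknown or already-settled ids. */
  settle(id: string, error?: string, messageId?: string): boolean {
    const entry = this.pending.get(id);
    if (!entry) return false;
    this.pending.delete(id);
    clearTimeout(entry.timer);
    if (error) entry.resolve({ ok: false, error, permanent: isPermanent(error) });
    else entry.resolve({ ok: true, messageId: messageId ?? id });
    return true;
  }

  /** Fail everything in flight, e.g. when the socket closes or reconnects. */
  failAll(reason: string): number {
    const entries = Array.from(this.pending.values());
    this.pending.clear();
    for (const entry of entries) {
      clearTimeout(entry.timer);
      entry.resolve({ ok: false, error: reason, permanent: false });
    }
    return entries.length;
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Deleting the map entry before calling `resolve` is what keeps a late ack, a timeout, and a disconnect from settling the same send twice: whichever path removes the entry first wins, and the others find nothing to resolve.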
220
apps/cli/src/daemon/session-registry.ts
Normal file
@@ -0,0 +1,220 @@
|
|||||||
|
/**
|
||||||
|
* In-memory per-token session registry kept by the daemon.
|
||||||
|
*
|
||||||
|
* `claudemesh launch` POSTs `/v1/sessions/register` with the token it
|
||||||
|
* minted plus session metadata (sessionId, mesh, displayName, pid,
|
||||||
|
* cwd, role, groups). Subsequent CLI invocations from inside that
|
||||||
|
* session present the token via `Authorization: ClaudeMesh-Session
|
||||||
|
* <hex>` and the daemon's IPC auth middleware resolves it here in O(1).
|
||||||
|
*
|
||||||
|
* Lifecycle:
|
||||||
|
* - register replaces any prior entry under the same `sessionId`
|
||||||
|
* (handles re-launch and `--resume` flows cleanly).
|
||||||
|
* - reaper polls every 5 s. An entry is dropped when its pid is dead
|
||||||
|
* OR when its captured start-time no longer matches the running
|
||||||
|
* process (PID reuse — original is gone, OS recycled the number).
|
||||||
|
* - hard ttl ceiling of 24 h is a leak guard for forgotten sessions.
|
||||||
|
*
|
||||||
|
* Persistence: in-memory only for v1. A daemon restart clears the
|
||||||
|
* registry — every launched session needs to re-register. That's fine
|
||||||
|
* for now because launch.ts re-registers on `ensureDaemonRunning`'s
|
||||||
|
* success path, and most ad-hoc CLI invocations from outside a launched
|
||||||
|
* session have no token to begin with.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import { getProcessStartTime, getProcessStartTimes, isPidAlive } from "./process-info.js";
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Optional per-launch presence material. Carried opaquely through the
|
||||||
|
* registry; the daemon's session-broker subsystem (1.30.0+) reads it to
|
||||||
|
* open a long-lived broker WebSocket per session. Absent on older CLIs
|
||||||
|
* — register accepts payloads without it for backward compat.
|
||||||
|
*/
|
||||||
|
export interface SessionPresence {
|
||||||
|
/** Hex ed25519 pubkey, 64 chars. */
|
||||||
|
sessionPubkey: string;
|
||||||
|
/** Hex ed25519 secret key (held in-memory only; never disk). */
|
||||||
|
sessionSecretKey: string;
|
||||||
|
/** Parent-member-signed attestation; see signParentAttestation. */
|
||||||
|
parentAttestation: {
|
||||||
|
sessionPubkey: string;
|
||||||
|
parentMemberPubkey: string;
|
||||||
|
expiresAt: number;
|
||||||
|
signature: string;
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface SessionInfo {
|
||||||
|
token: string;
|
||||||
|
sessionId: string;
|
||||||
|
mesh: string;
|
||||||
|
displayName: string;
|
||||||
|
pid: number;
|
||||||
|
cwd?: string;
|
||||||
|
role?: string;
|
||||||
|
groups?: string[];
|
||||||
|
/** 1.30.0+: per-launch presence material. */
|
||||||
|
presence?: SessionPresence;
|
||||||
|
/**
|
||||||
|
* 1.31.0+: opaque per-process start-time captured at register. The
|
||||||
|
* reaper compares the live value against this on every sweep — a
|
||||||
|
* mismatch means the original process exited and the pid was reused
|
||||||
|
* by an unrelated program, so the registry entry must be dropped.
|
||||||
|
* `undefined` when capture failed (process already dead at register
|
||||||
|
* time, ps unavailable, etc.) — the reaper falls back to bare
|
||||||
|
* liveness in that case.
|
||||||
|
*/
|
||||||
|
startTime?: string;
|
||||||
|
registeredAt: number;
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Lifecycle callbacks invoked synchronously after registry mutation. */
|
||||||
|
export interface RegistryHooks {
|
||||||
|
onRegister?: (info: SessionInfo) => void;
|
||||||
|
onDeregister?: (info: SessionInfo) => void;
|
||||||
|
}
|
||||||
|
|
||||||
|
const TTL_MS = 24 * 60 * 60 * 1000;
|
||||||
|
const REAPER_INTERVAL_MS = 5 * 1000;
|
||||||
|
|
||||||
|
const byToken = new Map<string, SessionInfo>();
|
||||||
|
const bySessionId = new Map<string, string>();
|
||||||
|
const hooks: RegistryHooks = {};
|
||||||
|
|
||||||
|
let reaperHandle: NodeJS.Timeout | null = null;
|
||||||
|
|
||||||
|
export function startReaper(): void {
|
||||||
|
if (reaperHandle) return;
|
||||||
|
  // The sweep is async (batched ps). `void` only discards the returned
  // promise, so a catch is attached as well: an unexpected rejection must
  // not surface as an unhandled rejection in the daemon.
  const handle = setInterval(() => {
    void reapDead().catch(() => { /* registry deliberately stays log-free */ });
  }, REAPER_INTERVAL_MS);
  // unref() so the reaper alone never keeps the process alive.
  handle.unref?.();
  reaperHandle = handle;
|
||||||
|
}
|
||||||
|
|
||||||
|
export function stopReaper(): void {
|
||||||
|
if (reaperHandle) { clearInterval(reaperHandle); reaperHandle = null; }
|
||||||
|
}
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Wire daemon-level lifecycle hooks. Called once at daemon boot — passing
|
||||||
|
* `{}` clears them. Idempotent across calls so tests can re-bind.
|
||||||
|
*/
|
||||||
|
export function setRegistryHooks(next: RegistryHooks): void {
|
||||||
|
hooks.onRegister = next.onRegister;
|
||||||
|
hooks.onDeregister = next.onDeregister;
|
||||||
|
}
|
||||||
|
|
||||||
|
export function registerSession(info: Omit<SessionInfo, "registeredAt">): SessionInfo {
|
||||||
|
// Replace any prior entry under the same sessionId.
|
||||||
|
const priorToken = bySessionId.get(info.sessionId);
|
||||||
|
if (priorToken && priorToken !== info.token) {
|
||||||
|
const prior = byToken.get(priorToken);
|
||||||
|
if (prior) {
|
||||||
|
byToken.delete(priorToken);
|
||||||
|
      try { hooks.onDeregister?.(prior); } catch { /* hook errors must never break the registry */ }
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
// Caller may pre-fill info.startTime (tests do this for determinism).
|
||||||
|
// For the real path we fire-and-forget an async ps probe — register
|
||||||
|
// stays sync and microsecond-fast, and the start-time lands on the
|
||||||
|
// entry within a few ms. Until it lands, the reaper falls back to
|
||||||
|
// bare liveness for this entry, which is fine for the common case
|
||||||
|
// (PID reuse is rare; the brief window without the guard is
|
||||||
|
// tolerable).
|
||||||
|
const stored: SessionInfo = { ...info, registeredAt: Date.now() };
|
||||||
|
byToken.set(info.token, stored);
|
||||||
|
bySessionId.set(info.sessionId, info.token);
|
||||||
|
try { hooks.onRegister?.(stored); } catch { /* see above */ }
|
||||||
|
if (stored.startTime === undefined) {
|
||||||
|
void captureStartTimeAsync(info.token, info.pid);
|
||||||
|
}
|
||||||
|
return stored;
|
||||||
|
}
|
||||||
|
|
||||||
|
async function captureStartTimeAsync(token: string, pid: number): Promise<void> {
|
||||||
|
const lstart = await getProcessStartTime(pid);
|
||||||
|
if (lstart === null) return;
|
||||||
|
const entry = byToken.get(token);
|
||||||
|
if (!entry || entry.pid !== pid) return; // entry was replaced; skip
|
||||||
|
entry.startTime = lstart;
|
||||||
|
}
|
||||||
|
|
||||||
|
export function deregisterByToken(token: string): boolean {
|
||||||
|
const entry = byToken.get(token);
|
||||||
|
if (!entry) return false;
|
||||||
|
byToken.delete(token);
|
||||||
|
if (bySessionId.get(entry.sessionId) === token) bySessionId.delete(entry.sessionId);
|
||||||
|
try { hooks.onDeregister?.(entry); } catch { /* see above */ }
|
||||||
|
return true;
|
||||||
|
}
|
||||||
|
|
||||||
|
export function resolveToken(token: string): SessionInfo | null {
|
||||||
|
const entry = byToken.get(token);
|
||||||
|
if (!entry) return null;
|
||||||
|
if (Date.now() - entry.registeredAt > TTL_MS) {
|
||||||
|
deregisterByToken(token);
|
||||||
|
return null;
|
||||||
|
}
|
||||||
|
return entry;
|
||||||
|
}
|
||||||
|
|
||||||
|
export function listSessions(): SessionInfo[] {
|
||||||
|
return [...byToken.values()];
|
||||||
|
}
|
||||||
|
|
||||||
|
async function reapDead(): Promise<void> {
|
||||||
|
// Snapshot first; the second (async) phase calls ps and we must not
|
||||||
|
// mutate the registry mid-iteration.
|
||||||
|
const entries = [...byToken.entries()];
|
||||||
|
|
||||||
|
// Phase 1 — TTL + bare liveness. Sync, microsecond-fast.
|
||||||
|
const dead: string[] = [];
|
||||||
|
const survivors: Array<[string, SessionInfo]> = [];
|
||||||
|
for (const [token, info] of entries) {
|
||||||
|
if (Date.now() - info.registeredAt > TTL_MS) { dead.push(token); continue; }
|
||||||
|
if (!isPidAlive(info.pid)) { dead.push(token); continue; }
|
||||||
|
survivors.push([token, info]);
|
||||||
|
}
|
||||||
|
|
||||||
|
// Phase 2 — PID-reuse guard for survivors that have a captured
|
||||||
|
// start-time. Single batched ps call: O(1) forks regardless of
|
||||||
|
// session count. Survivors without a start-time keep the bare-
|
||||||
|
// liveness verdict from phase 1 (their captureStartTimeAsync may
|
||||||
|
// still be in-flight from a recent register).
|
||||||
|
const guardedPids = survivors
|
||||||
|
.filter(([, info]) => info.startTime !== undefined)
|
||||||
|
.map(([, info]) => info.pid);
|
||||||
|
if (guardedPids.length > 0) {
|
||||||
|
try {
|
||||||
|
const live = await getProcessStartTimes(guardedPids);
|
||||||
|
for (const [token, info] of survivors) {
|
||||||
|
if (info.startTime === undefined) continue;
|
||||||
|
const lstart = live.get(info.pid);
|
||||||
|
// ps may transiently miss a pid that was alive when isPidAlive
|
||||||
|
// ran — treat absence as "racing", let the next sweep decide.
|
||||||
|
if (lstart === undefined) continue;
|
||||||
|
if (lstart !== info.startTime) dead.push(token);
|
||||||
|
}
|
||||||
|
} catch {
|
||||||
|
// ps failure here is non-fatal: survivors keep their phase-1
|
||||||
|
// verdict. Logging is the daemon's responsibility — the
|
||||||
|
// registry deliberately stays log-free.
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
for (const t of dead) deregisterByToken(t);
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Test helper: run a single reaper pass. */
|
||||||
|
export async function _runReaperOnce(): Promise<void> {
|
||||||
|
await reapDead();
|
||||||
|
}
|
||||||
|
|
||||||
|
/** Test helper. */
|
||||||
|
export function _resetRegistry(): void {
|
||||||
|
byToken.clear();
|
||||||
|
bySessionId.clear();
|
||||||
|
hooks.onRegister = undefined;
|
||||||
|
hooks.onDeregister = undefined;
|
||||||
|
}
|
||||||
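The reaper's decision rules (TTL, bare liveness, then the PID-reuse guard) can be expressed as one pure function, which makes them easy to check without spawning `ps`. This is a simplified sketch, not the registry's code: `Entry`, `findDead`, and its parameters are hypothetical, and the two phases of `reapDead` are folded into a single loop by passing the batched start-time lookup in as an already-resolved `Map`.

```typescript
// Hypothetical sketch: Entry and findDead are illustrative names only.
interface Entry {
  token: string;
  pid: number;
  /** Start-time captured at register; undefined while the probe is in flight. */
  startTime?: string;
  registeredAt: number;
}

const TTL_MS = 24 * 60 * 60 * 1000;

function findDead(
  entries: Entry[],
  now: number,
  isAlive: (pid: number) => boolean,
  liveStartTimes: Map<number, string>,
): string[] {
  const dead: string[] = [];
  for (const entry of entries) {
    // Hard TTL ceiling: leak guard for forgotten sessions.
    if (now - entry.registeredAt > TTL_MS) { dead.push(entry.token); continue; }
    // Bare liveness: the pid no longer exists.
    if (!isAlive(entry.pid)) { dead.push(entry.token); continue; }
    // No captured start-time yet: keep the bare-liveness verdict.
    if (entry.startTime === undefined) continue;
    const live = liveStartTimes.get(entry.pid);
    // The lookup missed a pid that looked alive: treat as racing, decide next sweep.
    if (live === undefined) continue;
    // Same pid, different start-time: the OS recycled the number.
    if (live !== entry.startTime) dead.push(entry.token);
  }
  return dead;
}
```

Treating a missing start-time as "undecided" rather than "dead" errs toward keeping a session for one more sweep, which matches the comments in `reapDead` above.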
274
apps/cli/src/daemon/ws-lifecycle.ts
Normal file
@@ -0,0 +1,274 @@
|
|||||||
|
/**
|
||||||
|
* Shared WS lifecycle helper for the daemon's two broker clients.
|
||||||
|
*
|
||||||
|
* Both `DaemonBrokerClient` (member-keyed, one per joined mesh) and
|
||||||
|
* `SessionBrokerClient` (session-keyed, one per launched session) used
|
||||||
|
* to inline the same connect/hello/ack-timeout/close-reconnect logic.
|
||||||
|
* They drifted apart subtly — different ack-timeout names, different
|
||||||
|
* reconnect log messages, slightly different status flips — and that's
|
||||||
|
* how 1.32.x bugs shipped (push handler attached to the wrong client,
|
||||||
|
* etc).
|
||||||
|
*
|
||||||
|
* This helper owns ONLY the lifecycle:
|
||||||
|
* - new WebSocket(url), wire up open/message/close/error
|
||||||
|
* - on open → call buildHello() and send the result
|
||||||
|
* - start an ack-timeout timer; if it fires before the hello ack
|
||||||
|
* arrives, close the socket and reject the connect promise
|
||||||
|
* - on message, gate on isHelloAck(); when true, flip status to
|
||||||
|
* "open", clear the ack timer, resolve. All other messages are
|
||||||
|
* forwarded to onMessage()
|
||||||
|
* - on close, schedule a backoff reconnect (unless explicitly closed)
|
||||||
|
*
|
||||||
|
* Each client keeps its own concerns: DaemonBrokerClient still owns
|
||||||
|
* pendingAcks / peerListResolvers / etc; SessionBrokerClient still owns
|
||||||
|
* its onPush callback. The helper just hands them an open WS and a
|
||||||
|
* stable status field, and reconnects under their feet on disconnect.
|
||||||
|
*
|
||||||
|
* Composition over inheritance — callers receive a `WsLifecycle` handle
|
||||||
|
* with `send` / `close` / `status`, NOT a subclass.
|
||||||
|
*/
|
||||||
|
|
||||||
|
import WebSocket from "ws";
|
||||||
|
|
||||||
|
export type WsStatus = "connecting" | "open" | "closed" | "reconnecting";
|
||||||
|
|
||||||
|
export type WsLogLevel = "info" | "warn" | "error";
|
||||||
|
export type WsLog = (level: WsLogLevel, msg: string, meta?: Record<string, unknown>) => void;
|
||||||
|
|
||||||
|
export interface WsLifecycleOptions {
|
||||||
|
/** Broker URL (e.g. wss://ic.claudemesh.com/ws). */
|
||||||
|
url: string;
|
||||||
|
/**
|
||||||
|
* Build the hello frame to send right after the WS opens. Async because
|
||||||
|
* signing the hello may need libsodium initialization. Whatever this
|
||||||
|
* returns is JSON.stringified and sent verbatim — the helper does NOT
|
||||||
|
* inspect or modify it.
|
||||||
|
*/
|
||||||
|
buildHello: () => Promise<unknown>;
|
||||||
|
/**
|
||||||
|
* Returns true iff `msg` is the hello ack the helper should treat as
|
||||||
|
* "broker accepted us; flip status to open". Both daemon-WS and
|
||||||
|
* session-WS use `{ type: "hello_ack" }` today, but keeping this a
|
||||||
|
* predicate lets either client narrow further (e.g. on a `code` field)
|
||||||
|
* without leaking client-specific shape into the helper.
|
||||||
|
*/
|
||||||
|
isHelloAck: (msg: Record<string, unknown>) => boolean;
|
||||||
|
/**
|
||||||
|
* Called for every parsed message that is NOT the hello ack. The
|
||||||
|
* helper does NOT decide which messages are pushes vs RPCs vs errors;
|
||||||
|
* that's the caller's concern.
|
||||||
|
*/
|
||||||
|
onMessage: (msg: Record<string, unknown>) => void;
|
||||||
|
onStatusChange?: (s: WsStatus) => void;
|
||||||
|
/**
|
||||||
|
   * How long to wait for the broker's hello ack before giving up and
   * forcing a close. Defaults to 5 s, the same as both pre-refactor clients.
|
||||||
|
*/
|
||||||
|
helloAckTimeoutMs?: number;
|
||||||
|
/**
|
||||||
|
   * Reconnect backoff schedule. Defaults to [1s, 2s, 4s, 8s, 16s, 30s],
   * matching both pre-refactor clients exactly.
|
||||||
|
*/
|
||||||
|
backoffCapsMs?: readonly number[];
|
||||||
|
log?: WsLog;
|
||||||
|
/**
|
||||||
|
* Hook for the close path BEFORE the helper schedules a reconnect.
|
||||||
|
* Used by DaemonBrokerClient to fail its in-flight pendingAcks map
|
||||||
|
* with a "broker_disconnected_<code>" reason. The helper passes the
|
||||||
|
* raw close code so the caller can shape its rejection text.
|
||||||
|
*
|
||||||
|
* Returns nothing — close handling continues regardless.
|
||||||
|
*/
|
||||||
|
onBeforeReconnect?: (code: number, reason: string) => void;
|
||||||
|
}
|
||||||
|
|
||||||
|
export interface WsLifecycle {
|
||||||
|
/** Current connection status. Updated synchronously before onStatusChange fires. */
|
||||||
|
readonly status: WsStatus;
|
||||||
|
/** Underlying socket. Exposed for callers that need OPEN-state checks
|
||||||
|
* before sending (mirrors the pre-refactor `this.ws.readyState` checks). */
|
||||||
|
readonly ws: WebSocket | null;
|
||||||
|
/** Send a JSON payload over the open WS. Throws if not open — callers
|
||||||
|
* that need queue-while-disconnected semantics should layer that
|
||||||
|
* themselves (DaemonBrokerClient does, via its `opens` deferred-fn array). */
|
||||||
|
send(payload: unknown): void;
|
||||||
|
/** Close the WS and stop reconnecting. Idempotent. */
|
||||||
|
close(): Promise<void>;
|
||||||
|
}
|
||||||
|
|
||||||
|
const DEFAULT_HELLO_ACK_TIMEOUT_MS = 5_000;
|
||||||
|
const DEFAULT_BACKOFF_CAPS_MS: readonly number[] = [1_000, 2_000, 4_000, 8_000, 16_000, 30_000];
|
||||||
|
|
||||||
|
const defaultLog: WsLog = (level, msg, meta) => {
|
||||||
|
const line = JSON.stringify({ level, msg, ...meta, ts: new Date().toISOString() });
|
||||||
|
if (level === "info") process.stdout.write(line + "\n");
|
||||||
|
else process.stderr.write(line + "\n");
|
||||||
|
};
|
||||||
|
|
||||||
|
/**
|
||||||
|
* Connect a WebSocket with hello-handshake, ack-timeout, and reconnect
|
||||||
|
* with exponential backoff. Resolves once the broker accepts the hello;
|
||||||
|
* rejects if the first connect closes before the ack lands.
|
||||||
|
*
|
||||||
|
* Subsequent automatic reconnects are silent — they fire on the close
|
||||||
|
* handler's backoff timer and surface only via onStatusChange (and any
|
||||||
|
* caller-installed log).
|
||||||
|
*/
|
||||||
|
export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecycle> {
|
||||||
|
const helloAckTimeoutMs = opts.helloAckTimeoutMs ?? DEFAULT_HELLO_ACK_TIMEOUT_MS;
|
||||||
|
const backoffCapsMs = opts.backoffCapsMs ?? DEFAULT_BACKOFF_CAPS_MS;
|
||||||
|
const log: WsLog = opts.log ?? defaultLog;
|
||||||
|
|
||||||
|
let ws: WebSocket | null = null;
|
||||||
|
let status: WsStatus = "closed";
|
||||||
|
let closed = false;
|
||||||
|
let reconnectAttempt = 0;
|
||||||
|
let reconnectTimer: NodeJS.Timeout | null = null;
|
||||||
|
let helloTimer: NodeJS.Timeout | null = null;
|
||||||
|
|
||||||
|
const setStatus = (s: WsStatus) => {
|
||||||
|
if (status === s) return;
|
||||||
|
status = s;
|
||||||
|
opts.onStatusChange?.(s);
|
||||||
|
};
|
||||||
|
|
||||||
|
  // Liveness watchdog: same cadence (30s) as the broker's outbound
  // ping. Two jobs per tick:
  //   1. If we haven't heard from the broker in >75s (2.5x the ping
  //      cadence: two missed pings plus some slack), terminate
  //      the socket. That fires the close handler, so the backoff
  //      reconnect runs its normal path. This is what catches
  //      NAT-dropped half-dead connections that the kernel won't
  //      RST for ~2 hours.
  //   2. Otherwise, send our own ping. The broker's `ws` library
  //      auto-replies with a pong, which bumps lastActivity. This
  //      keeps the broker's stale-pong watchdog seeing us as alive.
  //
  // Bare `ping` and `pong` events both bump lastActivity, as does
  // any inbound application message: any sign of life resets the
  // dead-man's switch.
  const PING_INTERVAL_MS = 30_000;
  const STALE_THRESHOLD_MS = 75_000;
  let lastActivity = Date.now();
  let watchdogTimer: NodeJS.Timeout | null = null;

  /**
   * Open one WS attempt. Returns a promise that resolves on hello ack
   * or rejects if the socket closes before we get one. Used by both the
   * initial connect and the close-handler backoff timer (which logs
   * but otherwise ignores the rejection; by then the close handler has
   * already scheduled its own reconnect).
   */
  const openOnce = (): Promise<void> => {
    if (closed) return Promise.reject(new Error("client_closed"));
    setStatus("connecting");

    log("info", "ws_open_attempt", { url: opts.url });
    const sock = new WebSocket(opts.url);
    ws = sock;
    lastActivity = Date.now();

    return new Promise<void>((resolve, reject) => {
      sock.on("open", () => {
        log("info", "ws_open_ok", { url: opts.url });
        // Build and send the hello inside an async IIFE so that a throw
        // or rejection from buildHello() rejects this connect attempt cleanly.
        (async () => {
          try {
            const hello = await opts.buildHello();
            sock.send(JSON.stringify(hello));
            log("info", "ws_hello_sent", { url: opts.url });
            helloTimer = setTimeout(() => {
              log("warn", "hello_ack_timeout", { url: opts.url });
              try { sock.close(); } catch { /* ignore */ }
              reject(new Error("hello_ack_timeout"));
            }, helloAckTimeoutMs);
          } catch (e) {
            log("warn", "ws_build_hello_threw", { err: String(e) });
            reject(e instanceof Error ? e : new Error(String(e)));
          }
        })();
      });

      sock.on("message", (raw) => {
        lastActivity = Date.now();
        let msg: Record<string, unknown>;
        try { msg = JSON.parse(raw.toString()) as Record<string, unknown>; }
        catch { return; }

        if (opts.isHelloAck(msg)) {
          if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
          setStatus("open");
          reconnectAttempt = 0;
          log("info", "ws_hello_acked", { url: opts.url });
          // Start liveness watchdog only after a successful handshake.
          if (watchdogTimer) clearInterval(watchdogTimer);
          watchdogTimer = setInterval(() => {
            if (sock.readyState !== sock.OPEN) return;
            const idle = Date.now() - lastActivity;
            if (idle > STALE_THRESHOLD_MS) {
              log("warn", "ws_stale_terminate", { url: opts.url, idle_ms: idle });
              try { sock.terminate(); } catch { /* socket already gone */ }
              return;
            }
            try { sock.ping(); } catch { /* ignore */ }
          }, PING_INTERVAL_MS);
          resolve();
          return;
        }

        opts.onMessage(msg);
      });

      sock.on("ping", () => { lastActivity = Date.now(); });
      sock.on("pong", () => { lastActivity = Date.now(); });

      sock.on("close", (code, reason) => {
        if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
        if (watchdogTimer) { clearInterval(watchdogTimer); watchdogTimer = null; }
        const reasonStr = reason.toString("utf8");
        // Capture this before setStatus("reconnecting") below overwrites it;
        // checking `status` afterwards would always see "reconnecting".
        const closedBeforeHello = status === "connecting" || status === "reconnecting";
        log("warn", "ws_closed", { url: opts.url, code, reason: reasonStr, status });
        opts.onBeforeReconnect?.(code, reasonStr);

        if (closed) {
          setStatus("closed");
          return;
        }
        setStatus("reconnecting");
        const wait = backoffCapsMs[Math.min(reconnectAttempt, backoffCapsMs.length - 1)] ?? 30_000;
        reconnectAttempt++;
        log("info", "ws_reconnect_scheduled", { url: opts.url, wait_ms: wait, code, reason: reasonStr });
        reconnectTimer = setTimeout(
          () => openOnce().catch((err) => log("warn", "ws_reconnect_failed", { url: opts.url, err: String(err) })),
          wait,
        );
        // Rejecting is a no-op if the hello was already acked and the promise resolved.
        if (closedBeforeHello) {
          reject(new Error(`closed_before_hello_${code}`));
        }
      });

      sock.on("error", (err) => log("warn", "ws_error", { url: opts.url, err: err.message }));
    });
  };

  return openOnce().then(() => {
    const handle: WsLifecycle = {
      get status() { return status; },
      get ws() { return ws; },
      send(payload: unknown) {
        if (!ws || ws.readyState !== ws.OPEN) {
          throw new Error("ws_not_open");
        }
        ws.send(JSON.stringify(payload));
      },
      async close() {
        closed = true;
        if (reconnectTimer) { clearTimeout(reconnectTimer); reconnectTimer = null; }
        if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
        if (watchdogTimer) { clearInterval(watchdogTimer); watchdogTimer = null; }
        try { ws?.close(); } catch { /* ignore */ }
        setStatus("closed");
      },
    };
    return handle;
  });
}
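
The delay lookup in the close handler can be read in isolation. The following is a minimal sketch, not the module's code: `pickBackoffMs` and `SCHEDULE_MS` are hypothetical names, and the schedule values are copied from `DEFAULT_BACKOFF_CAPS_MS`. It illustrates that attempts past the end of the list keep reusing the last entry, and that the 30 s fallback only applies when the list is empty.

```typescript
// Hypothetical helper: mirrors the indexing used in the close handler.
const SCHEDULE_MS: readonly number[] = [1_000, 2_000, 4_000, 8_000, 16_000, 30_000];

function pickBackoffMs(attempt: number, capsMs: readonly number[] = SCHEDULE_MS): number {
  // Clamp the attempt index to the last entry; fall back to 30 s if the list is empty.
  return capsMs[Math.min(attempt, capsMs.length - 1)] ?? 30_000;
}

// Delays for eight consecutive failed attempts.
const delays = Array.from({ length: 8 }, (_, attempt) => pickBackoffMs(attempt));
console.log(delays.join(","));
// prints "1000,2000,4000,8000,16000,30000,30000,30000"
```

Because `reconnectAttempt` is reset to 0 on every acked hello, a connection that flaps after a successful handshake starts again from the first entry.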
|
||||||
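
The per-tick watchdog decision can also be sketched as a pure function. This is an illustration under stated assumptions: `watchdogAction`, `WatchdogAction`, and `STALE_MS` are hypothetical names, and the threshold is copied from `STALE_THRESHOLD_MS`. Since the check only runs on the 30 s tick, a socket that goes silent is first seen as stale on the tick at 90 s, not at exactly 75 s.

```typescript
type WatchdogAction = "terminate" | "ping";

// Hypothetical pure version of the per-tick decision; threshold copied from the diff.
const STALE_MS = 75_000;

function watchdogAction(nowMs: number, lastActivityMs: number, staleMs: number = STALE_MS): WatchdogAction {
  const idle = nowMs - lastActivityMs;
  // Strictly greater than the threshold, matching `idle > STALE_THRESHOLD_MS`.
  return idle > staleMs ? "terminate" : "ping";
}

// Ticks every 30 s with no inbound traffic since t = 0.
const ticks = [30_000, 60_000, 90_000].map((t) => watchdogAction(t, 0));
console.log(ticks.join(","));
// prints "ping,ping,terminate"
```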
@@ -9,12 +9,19 @@ import { renderVersion } from "~/cli/output/version.js";
|
|||||||
import { isInviteUrl, normaliseInviteUrl } from "~/utils/url.js";
|
import { isInviteUrl, normaliseInviteUrl } from "~/utils/url.js";
|
||||||
import { classifyInvocation } from "~/cli/policy-classify.js";
|
import { classifyInvocation } from "~/cli/policy-classify.js";
|
||||||
import { gate, type ApprovalMode } from "~/services/policy/index.js";
|
import { gate, type ApprovalMode } from "~/services/policy/index.js";
|
||||||
|
import { setDaemonPolicy, policyFromFlags } from "~/services/daemon/policy.js";
|
||||||
|
import { bold, clay, cyan, dim, orange } from "~/ui/styles.js";
|
||||||
|
|
||||||
installSignalHandlers();
|
installSignalHandlers();
|
||||||
installErrorHandlers();
|
installErrorHandlers();
|
||||||
|
|
||||||
const { command, positionals, flags } = parseArgv(process.argv);
|
const { command, positionals, flags } = parseArgv(process.argv);
|
||||||
|
|
||||||
|
// Resolve daemon policy once at boot — daemon-routing helpers read this
|
||||||
|
// instead of inspecting flags themselves. --no-daemon and --strict are
|
||||||
|
// mutually exclusive (--no-daemon wins if both are passed).
|
||||||
|
setDaemonPolicy(policyFromFlags(flags));
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* Resolve the coarse approval mode from CLI flags + env.
|
* Resolve the coarse approval mode from CLI flags + env.
|
||||||
* --approval-mode <plan|read-only|write|yolo> explicit
|
* --approval-mode <plan|read-only|write|yolo> explicit
|
||||||
@@ -66,13 +73,13 @@ USAGE
|
|||||||
claudemesh <invite-url> join a mesh, then launch
|
claudemesh <invite-url> join a mesh, then launch
|
||||||
claudemesh launch --name <n> --join <url> join + launch in one step
|
claudemesh launch --name <n> --join <url> join + launch in one step
|
||||||
|
|
||||||
Mesh
|
Mesh (alias: "workspace" — claudemesh workspace <verb> mirrors each)
|
||||||
claudemesh create <name> create a new mesh
|
claudemesh create <name> create a new mesh
|
||||||
claudemesh join <url> join a mesh (accepts short /i/ or long /join/ link)
|
claudemesh join <url> join a mesh (accepts short /i/ or long /join/ link)
|
||||||
claudemesh launch [slug] launch Claude Code on a mesh (alias: connect)
|
claudemesh launch [slug] launch Claude Code on a mesh (alias: connect)
|
||||||
claudemesh list show your meshes (alias: ls)
|
claudemesh list show your meshes (alias: ls)
|
||||||
claudemesh delete [slug] delete a mesh (alias: rm)
|
claudemesh delete [slug] delete a mesh (alias: rm)
|
||||||
claudemesh rename <slug> <name> rename a mesh
|
claudemesh rename <old> <new> change a mesh's slug (the identifier you see and type)
|
||||||
claudemesh share [email] share mesh (invite link / send email)
|
claudemesh share [email] share mesh (invite link / send email)
|
||||||
|
|
||||||
Peer (resource form, recommended)
|
Peer (resource form, recommended)
|
||||||
@@ -86,7 +93,19 @@ Peer (resource form, recommended)
|
|||||||
|
|
||||||
Message (resource form)
|
Message (resource form)
|
||||||
claudemesh message send <to> <m> send a message (alias: send)
|
claudemesh message send <to> <m> send a message (alias: send)
|
||||||
claudemesh message inbox drain pending (alias: inbox)
|
flags: [--priority now|next|low] [--mesh <slug>]
|
||||||
|
[--self] (allow targeting your own member/session pubkey;
|
||||||
|
fans out to every sibling session of your member)
|
||||||
|
[--json] (machine-readable result)
|
||||||
|
claudemesh message inbox read persisted inbox (alias: inbox)
|
||||||
|
flags: [--mesh <slug>] [--limit N] [--unread] [--json]
|
||||||
|
reads ~/.claudemesh/daemon/inbox.db via daemon
|
||||||
|
--unread → only rows never surfaced before (seen_at IS NULL);
|
||||||
|
listing stamps returned rows seen as a side effect
|
||||||
|
claudemesh inbox flush bulk-delete inbox rows
|
||||||
|
flags: [--mesh <slug>] [--before <iso-timestamp>] [--all]
|
||||||
|
--all required when neither --mesh nor --before is set
|
||||||
|
claudemesh inbox delete <id> delete one inbox row by id (alias: rm)
|
||||||
claudemesh message status <id> delivery status (alias: msg-status)
|
claudemesh message status <id> delivery status (alias: msg-status)
|
||||||
|
|
||||||
Memory (resource form)
|
Memory (resource form)
|
||||||
@@ -113,16 +132,23 @@ Bridge (forward a topic between two meshes, v0.2.0)
|
|||||||
|
|
||||||
Topic (conversation scope, v0.2.0)
|
Topic (conversation scope, v0.2.0)
|
||||||
claudemesh topic create <name> create a topic [--description --visibility]
|
claudemesh topic create <name> create a topic [--description --visibility]
|
||||||
claudemesh topic list list topics in the mesh
|
claudemesh topic list list topics across all meshes (or --mesh <slug>)
|
||||||
claudemesh topic join <topic> subscribe (via name or id)
|
claudemesh topic join <topic> subscribe (via name or id)
|
||||||
claudemesh topic leave <topic> unsubscribe
|
claudemesh topic leave <topic> unsubscribe
|
||||||
claudemesh topic members <t> list topic subscribers
|
claudemesh topic members <t> list topic subscribers
|
||||||
claudemesh topic history <t> fetch message history [--limit --before]
|
claudemesh topic history <t> fetch message history [--limit --before]
|
||||||
claudemesh topic read <topic> mark all as read
|
claudemesh topic read <topic> mark all as read
|
||||||
claudemesh topic tail <topic> live SSE tail [--limit --forward-only]
|
claudemesh topic tail <topic> live SSE tail [--limit --forward-only]
|
||||||
claudemesh send "#topic" "msg" send to a topic
|
claudemesh topic post <t> <msg> encrypted REST post (v0.3.0 v2) [--reply-to <id>]
|
||||||
|
claudemesh send "#topic" "msg" send to a topic (WS path, v1 plaintext)
|
||||||
|
claudemesh skill print the bundled SKILL.md to stdout
|
||||||
|
claudemesh me cross-mesh workspace overview (v0.4.0)
|
||||||
|
claudemesh me topics cross-mesh topic list [--unread]
|
||||||
|
claudemesh me notifications cross-mesh @-mentions [--all] [--since=ISO]
|
||||||
|
claudemesh me activity cross-mesh recent messages [--since=ISO]
|
||||||
|
claudemesh me search <q> cross-mesh search (topics + messages)
|
||||||
claudemesh member list mesh roster with online state [--online]
|
claudemesh member list mesh roster with online state [--online]
|
||||||
claudemesh notification list recent @-mentions of you [--since <ISO>]
|
claudemesh notification list @-mentions across all meshes (or --mesh <slug>)
|
||||||
|
|
||||||
Schedule (resource form)
|
Schedule (resource form)
|
||||||
claudemesh schedule msg <m> one-shot or recurring (alias: remind)
|
claudemesh schedule msg <m> one-shot or recurring (alias: remind)
|
||||||
@@ -148,9 +174,11 @@ Platform
|
|||||||
claudemesh stream create|publish|list pub/sub event bus
|
claudemesh stream create|publish|list pub/sub event bus
|
||||||
claudemesh sql query|execute|schema per-mesh SQL
|
claudemesh sql query|execute|schema per-mesh SQL
|
||||||
claudemesh skill list|get|remove mesh-published skills
|
claudemesh skill list|get|remove mesh-published skills
|
||||||
claudemesh vault list|delete encrypted secrets
|
claudemesh vault set|list|delete encrypted secrets (set: --type env|file --mount /p)
|
||||||
claudemesh watch list|remove URL change watchers
|
claudemesh watch add|list|remove URL change watchers (add: --label --interval --extract)
|
||||||
claudemesh webhook list|delete outbound HTTP triggers
|
claudemesh webhook create|list|delete outbound HTTP triggers
|
||||||
|
claudemesh file share <path> [--to peer] upload (or local-host fast path if --to matches)
|
||||||
|
claudemesh file get <id> [--out path] download by id
|
||||||
claudemesh file list|status|delete shared mesh files
|
claudemesh file list|status|delete shared mesh files
|
||||||
claudemesh mesh-mcp list|call|catalog deployed mesh-MCP servers
|
claudemesh mesh-mcp list|call|catalog deployed mesh-MCP servers
|
||||||
claudemesh clock set|pause|resume mesh logical clock
|
claudemesh clock set|pause|resume mesh logical clock
|
||||||
@@ -170,6 +198,19 @@ Security
|
|||||||
claudemesh backup [file] encrypt config → portable recovery file
|
claudemesh backup [file] encrypt config → portable recovery file
|
||||||
claudemesh restore <file> restore config from a backup file
|
claudemesh restore <file> restore config from a backup file
|
||||||
|
|
||||||
|
Daemon (long-lived peer mesh runtime — universal across every joined mesh)
|
||||||
|
claudemesh daemon up start daemon (alias: start) [--no-tcp]
|
||||||
|
claudemesh daemon status show running pid + IPC health [--json]
|
||||||
|
claudemesh daemon down stop daemon (alias: stop)
|
||||||
|
claudemesh daemon version ipc + schema version of running daemon
|
||||||
|
claudemesh daemon outbox list  list local outbox rows [--failed|--pending|--inflight|--done|--aborted]
|
||||||
|
claudemesh daemon outbox requeue <id> re-enqueue an aborted/dead row [--new-client-id <id>]
|
||||||
|
claudemesh daemon accept-host pin current host fingerprint
|
||||||
|
claudemesh daemon install-service write launchd / systemd-user unit
|
||||||
|
claudemesh daemon uninstall-service remove the unit
|
||||||
|
Note: the daemon attaches to every mesh in ~/.claudemesh/config.json
|
||||||
|
automatically; --mesh on up / install-service is deprecated and ignored.
|
||||||
|
|
||||||
Setup
|
Setup
|
||||||
claudemesh install register MCP server + hooks
|
claudemesh install register MCP server + hooks
|
||||||
claudemesh uninstall remove MCP server + hooks
|
claudemesh uninstall remove MCP server + hooks
|
||||||
@@ -189,10 +230,63 @@ Flags
|
|||||||
--policy <path> override policy file
|
--policy <path> override policy file
|
||||||
-y, --yes skip confirmations (= --approval-mode yolo)
|
-y, --yes skip confirmations (= --approval-mode yolo)
|
||||||
-q, --quiet suppress non-essential output
|
-q, --quiet suppress non-essential output
|
||||||
|
--strict require daemon for broker-touching verbs (no cold-path fallback)
|
||||||
|
--no-daemon skip daemon entirely; open broker WS directly (CI / sandboxed scripts)
|
||||||
`;
|
`;
|
||||||
|
|
||||||
|
/**
 * Apply color treatment to the HELP block for terminal readability.
 *
 * Strategy is line-based and intentionally conservative:
 *   - Section header lines (unindented categories like `Mesh`,
 *     `Topic`, `Auth`, `USAGE`) get bold + accent; a trailing
 *     parenthetical on the header is dimmed.
 *   - Each verb row (`  claudemesh <verb> ...`) gets the command tinted
 *     cyan up to the first run of two or more spaces (the gap separating
 *     the syntax from the description), and any `(alias: ...)`
 *     parenthetical dimmed so it reads as secondary metadata.
 *   - The header (program name + version) gets the brand orange.
 *
 * Falls through to plain output when stdout is not a TTY or NO_COLOR
 * is set; the underlying style helpers already gate on that.
 */
function colorizeHelp(raw: string): string {
  const lines = raw.split("\n");
  const SECTION_HEADER_RE = /^([A-Z][A-Za-z0-9 /+-]*?)(\s*\(.*\))?$/;
  const VERB_ROW_RE = /^(\s{2})(claudemesh[^\s]*(?:\s+[^\s]+)*?)(\s{2,})(.*)$/;
  const ALIAS_RE = /(\(alias[^)]*\))/g;
  const out: string[] = [];
  for (const line of lines) {
    if (line.startsWith("claudemesh —")) {
      out.push(orange(line));
      continue;
    }
    if (line.trim() === "") {
      out.push(line);
      continue;
    }
    // Section header: a line with no leading spaces that isn't a verb.
    const headerMatch = line.startsWith(" ") ? null : line.match(SECTION_HEADER_RE);
    if (headerMatch) {
      const head = bold(clay(headerMatch[1]!));
      const meta = headerMatch[2] ? dim(headerMatch[2]) : "";
      out.push(head + meta);
      continue;
    }
    // Verb row: tint the syntax, dim the alias parenthetical.
    const verbMatch = line.match(VERB_ROW_RE);
    if (verbMatch) {
      const [, indent, syntax, gap, rest] = verbMatch;
      const dimmedRest = rest!.replace(ALIAS_RE, (m) => dim(m));
      out.push(`${indent}${cyan(syntax!)}${gap}${dimmedRest}`);
      continue;
    }
    out.push(line);
  }
  return out.join("\n");
}
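
To see how the verb-row pattern splits a help line, here is a small sketch. The two regexes are copied from `colorizeHelp`; `splitVerbRow`, `VerbRow`, the sample help lines, and the `[[...]]` marker (standing in for `dim`) are assumptions made for illustration, not part of the CLI.

```typescript
// Regexes copied from colorizeHelp; everything else here is illustrative only.
const VERB_ROW = /^(\s{2})(claudemesh[^\s]*(?:\s+[^\s]+)*?)(\s{2,})(.*)$/;
const ALIAS = /(\(alias[^)]*\))/g;

interface VerbRow { indent: string; syntax: string; gap: string; rest: string }

function splitVerbRow(line: string): VerbRow | null {
  const m = line.match(VERB_ROW);
  if (!m) return null;
  return { indent: m[1] ?? "", syntax: m[2] ?? "", gap: m[3] ?? "", rest: m[4] ?? "" };
}

// Single spaces stay inside the syntax; the first run of 2+ spaces ends it.
const row = splitVerbRow("  claudemesh message send <to> <m>  send a message (alias: send)");
console.log(row?.syntax);
// prints "claudemesh message send <to> <m>"
console.log(row?.rest.replace(ALIAS, (a) => `[[${a}]]`));
// prints "send a message [[(alias: send)]]"

// An unindented section header is not a verb row.
const header = splitVerbRow("Message (resource form)");
console.log(header);
// prints null
```

The lazy `(?:\s+[^\s]+)*?` group is what lets multi-word syntax such as `message send <to> <m>` stay in one capture while still stopping at the first wide gap.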
|
||||||
|
|
||||||
async function main(): Promise<void> {
|
async function main(): Promise<void> {
|
||||||
if (flags.help || flags.h) { console.log(HELP); process.exit(EXIT.SUCCESS); }
|
if (flags.help || flags.h) { console.log(colorizeHelp(HELP)); process.exit(EXIT.SUCCESS); }
|
||||||
if (flags.version || flags.V) { console.log(renderVersion()); process.exit(EXIT.SUCCESS); }
|
if (flags.version || flags.V) { console.log(renderVersion()); process.exit(EXIT.SUCCESS); }
|
||||||
|
|
||||||
// Policy gate — runs before any broker-touching command. Skipped for help,
|
// Policy gate — runs before any broker-touching command. Skipped for help,
|
||||||
@@ -211,6 +305,12 @@ async function main(): Promise<void> {
|
|||||||
join: normaliseInviteUrl(command),
|
join: normaliseInviteUrl(command),
|
||||||
yes: !!flags.y || !!flags.yes,
|
yes: !!flags.y || !!flags.yes,
|
||||||
resume: flags.resume as string | undefined,
|
resume: flags.resume as string | undefined,
|
||||||
|
role: flags.role as string | undefined,
|
||||||
|
groups: flags.groups as string | undefined,
|
||||||
|
"message-mode": flags["message-mode"] as string | undefined,
|
||||||
|
"system-prompt": flags["system-prompt"] as string | undefined,
|
||||||
|
continue: !!flags.continue,
|
||||||
|
quiet: !!flags.quiet,
|
||||||
}, process.argv.slice(2));
|
}, process.argv.slice(2));
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
@@ -226,6 +326,12 @@ async function main(): Promise<void> {
|
|||||||
name: flags.name as string | undefined,
|
name: flags.name as string | undefined,
|
||||||
yes: !!flags.y || !!flags.yes,
|
yes: !!flags.y || !!flags.yes,
|
||||||
resume: flags.resume as string | undefined,
|
resume: flags.resume as string | undefined,
|
||||||
|
role: flags.role as string | undefined,
|
||||||
|
groups: flags.groups as string | undefined,
|
||||||
|
"message-mode": flags["message-mode"] as string | undefined,
|
||||||
|
"system-prompt": flags["system-prompt"] as string | undefined,
|
||||||
|
continue: !!flags.continue,
|
||||||
|
quiet: !!flags.quiet,
|
||||||
}, process.argv.slice(2));
|
}, process.argv.slice(2));
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
@@ -244,6 +350,12 @@ async function main(): Promise<void> {
|
|||||||
join: flags.join as string,
|
join: flags.join as string,
|
||||||
yes: !!flags.y || !!flags.yes,
|
yes: !!flags.y || !!flags.yes,
|
||||||
resume: flags.resume as string,
|
resume: flags.resume as string,
|
||||||
|
role: flags.role as string,
|
||||||
|
groups: flags.groups as string,
|
||||||
|
"message-mode": flags["message-mode"] as string,
|
||||||
|
"system-prompt": flags["system-prompt"] as string,
|
||||||
|
continue: !!flags.continue,
|
||||||
|
quiet: !!flags.quiet,
|
||||||
}, process.argv.slice(2));
|
}, process.argv.slice(2));
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
@@ -252,6 +364,37 @@ async function main(): Promise<void> {
|
|||||||
case "delete": case "rm": { const { deleteMesh } = await import("~/commands/delete-mesh.js"); process.exit(await deleteMesh(positionals[0] ?? "", { yes: !!flags.y || !!flags.yes })); break; }
|
case "delete": case "rm": { const { deleteMesh } = await import("~/commands/delete-mesh.js"); process.exit(await deleteMesh(positionals[0] ?? "", { yes: !!flags.y || !!flags.yes })); break; }
|
||||||
case "rename": { const { rename } = await import("~/commands/rename.js"); process.exit(await rename(positionals[0] ?? "", positionals[1] ?? "")); break; }
|
case "rename": { const { rename } = await import("~/commands/rename.js"); process.exit(await rename(positionals[0] ?? "", positionals[1] ?? "")); break; }
|
||||||
case "share": case "invite": { const { invite } = await import("~/commands/invite.js"); process.exit(await invite(positionals[0], { mesh: flags.mesh as string, json: !!flags.json })); break; }
|
case "share": case "invite": { const { invite } = await import("~/commands/invite.js"); process.exit(await invite(positionals[0], { mesh: flags.mesh as string, json: !!flags.json })); break; }
|
||||||
|
// workspace — alias surface for mesh-management verbs (v1.27.0 teaser; full
|
||||||
|
// rename arrives in 1.28.0). Each sub mirrors an existing top-level verb.
|
||||||
|
case "workspace": {
|
||||||
|
const sub = positionals[0];
|
||||||
|
if (!sub || sub === "launch" || sub === "connect" || sub === "open") {
|
||||||
|
const { runLaunch } = await import("~/commands/launch.js");
|
||||||
|
await runLaunch({
|
||||||
|
mesh: positionals[1] ?? flags.mesh as string,
|
||||||
|
name: flags.name as string,
|
||||||
|
join: flags.join as string,
|
||||||
|
yes: !!flags.y || !!flags.yes,
|
||||||
|
resume: flags.resume as string,
|
||||||
|
role: flags.role as string,
|
||||||
|
groups: flags.groups as string,
|
||||||
|
"message-mode": flags["message-mode"] as string,
|
||||||
|
"system-prompt": flags["system-prompt"] as string,
|
||||||
|
continue: !!flags.continue,
|
||||||
|
quiet: !!flags.quiet,
|
||||||
|
}, process.argv.slice(2));
|
||||||
|
}
|
||||||
|
else if (sub === "list" || sub === "ls") { const { runList } = await import("~/commands/list.js"); await runList(); }
|
||||||
|
else if (sub === "info") { const { runInfo } = await import("~/commands/info.js"); await runInfo({}); }
|
||||||
|
else if (sub === "create" || sub === "new") { const { newMesh } = await import("~/commands/new.js"); process.exit(await newMesh(positionals[1] ?? "", { json: !!flags.json })); }
|
||||||
|
else if (sub === "join" || sub === "add") { const { runJoin } = await import("~/commands/join.js"); await runJoin(positionals.slice(1)); }
|
||||||
|
else if (sub === "delete" || sub === "rm") { const { deleteMesh } = await import("~/commands/delete-mesh.js"); process.exit(await deleteMesh(positionals[1] ?? "", { yes: !!flags.y || !!flags.yes })); }
|
||||||
|
else if (sub === "rename") { const { rename } = await import("~/commands/rename.js"); process.exit(await rename(positionals[1] ?? "", positionals[2] ?? "")); }
|
||||||
|
else if (sub === "share" || sub === "invite") { const { invite } = await import("~/commands/invite.js"); process.exit(await invite(positionals[1], { mesh: flags.mesh as string, json: !!flags.json })); }
|
||||||
|
else if (sub === "overview") { const { runMe } = await import("~/commands/me.js"); process.exit(await runMe({ mesh: flags.mesh as string, json: !!flags.json })); }
|
||||||
|
else { console.error("Usage: claudemesh workspace <list|info|create|join|delete|rename|share|launch|overview>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
|
break;
|
||||||
|
}
|
||||||
case "disconnect": { const { runDisconnect } = await import("~/commands/kick.js"); process.exit(await runDisconnect(positionals[0], { mesh: flags.mesh as string, stale: flags.stale as string, all: !!flags.all })); break; }
|
case "disconnect": { const { runDisconnect } = await import("~/commands/kick.js"); process.exit(await runDisconnect(positionals[0], { mesh: flags.mesh as string, stale: flags.stale as string, all: !!flags.all })); break; }
|
||||||
case "kick": { const { runKick } = await import("~/commands/kick.js"); process.exit(await runKick(positionals[0], { mesh: flags.mesh as string, stale: flags.stale as string, all: !!flags.all })); break; }
|
case "kick": { const { runKick } = await import("~/commands/kick.js"); process.exit(await runKick(positionals[0], { mesh: flags.mesh as string, stale: flags.stale as string, all: !!flags.all })); break; }
|
||||||
case "ban": { const { runBan } = await import("~/commands/ban.js"); process.exit(await runBan(positionals[0], { mesh: flags.mesh as string })); break; }
|
case "ban": { const { runBan } = await import("~/commands/ban.js"); process.exit(await runBan(positionals[0], { mesh: flags.mesh as string })); break; }
|
||||||
@@ -259,19 +402,59 @@ async function main(): Promise<void> {
|
|||||||
case "bans": { const { runBans } = await import("~/commands/ban.js"); process.exit(await runBans({ mesh: flags.mesh as string, json: !!flags.json })); break; }
|
case "bans": { const { runBans } = await import("~/commands/ban.js"); process.exit(await runBans({ mesh: flags.mesh as string, json: !!flags.json })); break; }
|
||||||
|
|
||||||
// Messaging
|
// Messaging
|
||||||
case "peers": { const { runPeers } = await import("~/commands/peers.js"); await runPeers({ mesh: flags.mesh as string, json: flags.json as boolean | string | undefined }); break; }
|
case "peers": { const { runPeers } = await import("~/commands/peers.js"); await runPeers({ mesh: flags.mesh as string, json: flags.json as boolean | string | undefined, all: !!flags.all }); break; }
|
||||||
case "send": { const { runSend } = await import("~/commands/send.js"); await runSend({ mesh: flags.mesh as string, priority: flags.priority as string, json: !!flags.json }, positionals[0] ?? "", positionals.slice(1).join(" ")); break; }
|
case "send": { const { runSend } = await import("~/commands/send.js"); await runSend({ mesh: flags.mesh as string, priority: flags.priority as string, json: !!flags.json, self: !!flags.self }, positionals[0] ?? "", positionals.slice(1).join(" ")); break; }
|
||||||
case "inbox": { const { runInbox } = await import("~/commands/inbox.js"); await runInbox({ json: !!flags.json }); break; }
|
    case "inbox": {
      const sub = positionals[0];
      if (sub === "flush") {
        const { runInboxFlush } = await import("~/commands/inbox-actions.js");
        await runInboxFlush({
          mesh: flags.mesh as string | undefined,
          before: flags.before as string | undefined,
          all: !!flags.all,
          json: !!flags.json,
        });
      } else if (sub === "delete" || sub === "rm") {
        const { runInboxDelete } = await import("~/commands/inbox-actions.js");
        await runInboxDelete(positionals[1] ?? "", { json: !!flags.json });
      } else {
        const { runInbox } = await import("~/commands/inbox.js");
        await runInbox({
          mesh: flags.mesh as string | undefined,
          json: !!flags.json,
          // --limit may arrive as a number or a string depending on how argv was parsed.
          limit:
            typeof flags.limit === "number" ? flags.limit :
            typeof flags.limit === "string" ? Number.parseInt(flags.limit, 10) :
            undefined,
          unread: !!flags.unread,
        });
      }
      break;
}
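
One detail of the `limit` coercion in the inbox case: `Number.parseInt("abc", 10)` is `NaN`, so a non-numeric `--limit` value is forwarded as `NaN`. Whether `runInbox` guards against that is not visible in this diff. A stricter coercion might look like the sketch below, where `parseLimit` is a hypothetical helper and not part of the CLI.

```typescript
// Hypothetical helper: coerce a parsed flag value into a positive integer or undefined.
function parseLimit(raw: unknown): number | undefined {
  const n =
    typeof raw === "number" ? raw :
    typeof raw === "string" ? Number.parseInt(raw, 10) :
    Number.NaN;
  // Reject NaN, fractions, zero, and negatives instead of forwarding them.
  return Number.isInteger(n) && n > 0 ? n : undefined;
}

console.log(parseLimit("25"), parseLimit(10), parseLimit("abc"), parseLimit(true));
// prints "25 10 undefined undefined"
```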
|
||||||
case "state": {
|
case "state": {
|
||||||
const sub = positionals[0];
|
const sub = positionals[0];
|
||||||
if (sub === "set") { const { runStateSet } = await import("~/commands/state.js"); await runStateSet({}, positionals[1] ?? "", positionals[2] ?? ""); }
|
if (sub === "set") { const { runStateSet } = await import("~/commands/state.js"); await runStateSet({}, positionals[1] ?? "", positionals[2] ?? ""); }
|
||||||
else if (sub === "list") { const { runStateList } = await import("~/commands/state.js"); await runStateList({}); }
|
else if (sub === "list") {
|
||||||
|
// v0.5.0 phase 2: aggregate across every mesh when --mesh is omitted.
|
||||||
|
if (!flags.mesh) {
|
||||||
|
const { runMeState } = await import("~/commands/me.js");
|
||||||
|
process.exit(await runMeState({ json: !!flags.json, key: flags.key as string | undefined }));
|
||||||
|
}
|
||||||
|
const { runStateList } = await import("~/commands/state.js");
|
||||||
|
await runStateList({});
|
||||||
|
}
|
||||||
else { const { runStateGet } = await import("~/commands/state.js"); await runStateGet({}, positionals[0] ?? ""); }
|
else { const { runStateGet } = await import("~/commands/state.js"); await runStateGet({}, positionals[0] ?? ""); }
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
case "info": { const { runInfo } = await import("~/commands/info.js"); await runInfo({}); break; }
|
case "info": { const { runInfo } = await import("~/commands/info.js"); await runInfo({}); break; }
|
||||||
case "remember": { const { remember } = await import("~/commands/remember.js"); process.exit(await remember(positionals.join(" "), { mesh: flags.mesh as string, tags: flags.tags as string, json: !!flags.json })); break; }
|
case "remember": { const { remember } = await import("~/commands/remember.js"); process.exit(await remember(positionals.join(" "), { mesh: flags.mesh as string, tags: flags.tags as string, json: !!flags.json })); break; }
|
||||||
case "recall": { const { recall } = await import("~/commands/recall.js"); process.exit(await recall(positionals.join(" "), { mesh: flags.mesh as string, json: !!flags.json })); break; }
|
case "recall": {
|
||||||
|
// v0.5.0 phase 2: aggregate across every mesh when --mesh is omitted.
|
||||||
|
if (!flags.mesh) {
|
||||||
|
const { runMeMemory } = await import("~/commands/me.js");
|
||||||
|
process.exit(await runMeMemory({ json: !!flags.json, query: positionals.join(" ") }));
|
||||||
|
}
|
||||||
|
const { recall } = await import("~/commands/recall.js");
|
||||||
|
process.exit(await recall(positionals.join(" "), { mesh: flags.mesh as string, json: !!flags.json }));
|
||||||
|
break;
|
||||||
|
}
|
||||||
case "forget": { const { runForget } = await import("~/commands/broker-actions.js"); process.exit(await runForget(positionals[0], { mesh: flags.mesh as string, json: !!flags.json })); break; }
|
case "forget": { const { runForget } = await import("~/commands/broker-actions.js"); process.exit(await runForget(positionals[0], { mesh: flags.mesh as string, json: !!flags.json })); break; }
|
||||||
case "remind": { const { runRemind } = await import("~/commands/remind.js"); await runRemind({ mesh: flags.mesh as string }, positionals); break; }
|
case "remind": { const { runRemind } = await import("~/commands/remind.js"); await runRemind({ mesh: flags.mesh as string }, positionals); break; }
|
||||||
// (profile case moved to resource-aliases block below for sub-command extensibility)
|
// (profile case moved to resource-aliases block below for sub-command extensibility)
|
||||||
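Note on the `recall` change above: when `--mesh` is omitted, the new code routes to the cross-mesh handler in `~/commands/me.js` instead of the per-mesh `recall`. The routing decision can be sketched as a small pure function. This is only an illustration: `resolveRecallTarget` and the `RecallTarget` type are hypothetical names that do not appear in the diff, and the real code calls `process.exit` directly rather than returning a value.

```typescript
// Hypothetical sketch: which handler the "recall" case would pick.
// "me-memory" stands for runMeMemory (cross-mesh), "recall" for the per-mesh recall.
interface RecallTarget {
  handler: "me-memory" | "recall";
  query: string;
  mesh?: string;
}

function resolveRecallTarget(
  flags: { mesh?: string },
  positionals: string[],
): RecallTarget {
  // Both branches in the diff join the positionals into one query string.
  const query = positionals.join(" ");
  if (!flags.mesh) {
    // No --mesh: aggregate across every mesh.
    return { handler: "me-memory", query };
  }
  return { handler: "recall", query, mesh: flags.mesh };
}
```

Keeping the decision separate from the side effects makes the fallback rule easy to unit-test without spawning the CLI.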
@@ -299,8 +482,37 @@ async function main(): Promise<void> {
|
|||||||
case "logout": { const { logout } = await import("~/commands/logout.js"); process.exit(await logout()); break; }
|
case "logout": { const { logout } = await import("~/commands/logout.js"); process.exit(await logout()); break; }
|
||||||
case "whoami": { const { whoami } = await import("~/commands/whoami.js"); process.exit(await whoami({ json: !!flags.json })); break; }
|
case "whoami": { const { whoami } = await import("~/commands/whoami.js"); process.exit(await whoami({ json: !!flags.json })); break; }
|
||||||
|
|
||||||
|
// Daemon (v0.9.0)
|
||||||
|
case "daemon": {
|
||||||
|
const { runDaemonCommand } = await import("~/commands/daemon.js");
|
||||||
|
const sub = positionals[0];
|
||||||
|
const rest = positionals.slice(1);
|
||||||
|
const outboxStatus =
|
||||||
|
flags.failed ? "dead" :
|
||||||
|
flags.pending ? "pending" :
|
||||||
|
flags.inflight ? "inflight" :
|
||||||
|
flags.done ? "done" :
|
||||||
|
flags.aborted ? "aborted" : undefined;
|
||||||
|
const code = await runDaemonCommand(sub, {
|
||||||
|
json: !!flags.json,
|
||||||
|
noTcp: !!flags["no-tcp"],
|
||||||
|
publicHealth: !!flags["public-health"],
|
||||||
|
mesh: flags.mesh as string | undefined,
|
||||||
|
displayName: flags.name as string | undefined,
|
||||||
|
// 1.34.12: --foreground opts out of the new "detach by default"
|
||||||
|
// behavior. install-service and `claudemesh launch`'s auto-spawn
|
||||||
|
// path always run with --foreground so their parents (launchd /
|
||||||
|
// the launch helper) own lifecycle and stdio redirection.
|
||||||
|
foreground: !!flags.foreground,
|
||||||
|
outboxStatus,
|
||||||
|
newClientId: flags["new-client-id"] as string | undefined,
|
||||||
|
}, rest);
|
||||||
|
process.exit(code);
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
// Setup
|
// Setup
|
||||||
case "install": { const { runInstall } = await import("~/commands/install.js"); runInstall(positionals); break; }
|
case "install": { const { runInstall } = await import("~/commands/install.js"); await runInstall(positionals); break; }
|
||||||
case "uninstall": { const { uninstall } = await import("~/commands/uninstall.js"); process.exit(await uninstall()); break; }
|
case "uninstall": { const { uninstall } = await import("~/commands/uninstall.js"); process.exit(await uninstall()); break; }
|
||||||
case "doctor": { const { runDoctor } = await import("~/commands/doctor.js"); await runDoctor(); break; }
|
case "doctor": { const { runDoctor } = await import("~/commands/doctor.js"); await runDoctor(); break; }
|
||||||
case "status": {
|
case "status": {
|
||||||
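Note on the `daemon` case above: `outboxStatus` is derived from five mutually exclusive boolean flags through a chained ternary, so the first flag that is set wins (`--failed` before `--pending`, and so on). A table-driven equivalent, assuming the same priority order, might look like the sketch below. `resolveOutboxStatus` and `OUTBOX_FLAGS` are hypothetical names used only for illustration.

```typescript
type OutboxStatus = "dead" | "pending" | "inflight" | "done" | "aborted";

// Flag name -> status, listed in the same priority order as the ternary chain.
const OUTBOX_FLAGS: ReadonlyArray<readonly [string, OutboxStatus]> = [
  ["failed", "dead"],
  ["pending", "pending"],
  ["inflight", "inflight"],
  ["done", "done"],
  ["aborted", "aborted"],
];

function resolveOutboxStatus(
  flags: Record<string, unknown>,
): OutboxStatus | undefined {
  for (const [flag, status] of OUTBOX_FLAGS) {
    // Truthiness check, matching the `flags.x ? ... :` tests in the diff.
    if (flags[flag]) return status;
  }
  // No status flag given: the caller passes undefined through.
  return undefined;
}
```

The table form makes the `--failed` to `"dead"` renaming visible in one place, which the ternary chain hides a little.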
@@ -340,7 +552,7 @@ async function main(): Promise<void> {
|
|||||||
|
|
||||||
case "peer": {
|
case "peer": {
|
||||||
const sub = positionals[0];
|
const sub = positionals[0];
|
||||||
const f = { mesh: flags.mesh as string, json: flags.json as boolean | string | undefined };
|
const f = { mesh: flags.mesh as string, json: flags.json as boolean | string | undefined, all: !!flags.all };
|
||||||
const id = positionals[1] ?? "";
|
const id = positionals[1] ?? "";
|
||||||
if (sub === "list") { const { runPeers } = await import("~/commands/peers.js"); await runPeers(f); }
|
if (sub === "list") { const { runPeers } = await import("~/commands/peers.js"); await runPeers(f); }
|
||||||
else if (sub === "kick") { const { runKick } = await import("~/commands/kick.js"); process.exit(await runKick(id, { mesh: flags.mesh as string, stale: flags.stale as string, all: !!flags.all })); }
|
else if (sub === "kick") { const { runKick } = await import("~/commands/kick.js"); process.exit(await runKick(id, { mesh: flags.mesh as string, stale: flags.stale as string, all: !!flags.all })); }
|
||||||
@@ -355,8 +567,30 @@ async function main(): Promise<void> {
|
|||||||
|
|
||||||
case "message": {
|
case "message": {
|
||||||
const sub = positionals[0];
|
const sub = positionals[0];
|
||||||
if (sub === "send") { const { runSend } = await import("~/commands/send.js"); await runSend({ mesh: flags.mesh as string, priority: flags.priority as string, json: !!flags.json }, positionals[1] ?? "", positionals.slice(2).join(" ")); }
|
if (sub === "send") { const { runSend } = await import("~/commands/send.js"); await runSend({ mesh: flags.mesh as string, priority: flags.priority as string, json: !!flags.json, self: !!flags.self }, positionals[1] ?? "", positionals.slice(2).join(" ")); }
|
||||||
else if (sub === "inbox") { const { runInbox } = await import("~/commands/inbox.js"); await runInbox({ json: !!flags.json }); }
|
else if (sub === "inbox") {
|
||||||
|
const sub2 = positionals[1];
|
||||||
|
if (sub2 === "flush") {
|
||||||
|
const { runInboxFlush } = await import("~/commands/inbox-actions.js");
|
||||||
|
await runInboxFlush({
|
||||||
|
mesh: flags.mesh as string | undefined,
|
||||||
|
before: flags.before as string | undefined,
|
||||||
|
all: !!flags.all,
|
||||||
|
json: !!flags.json,
|
||||||
|
});
|
||||||
|
} else if (sub2 === "delete" || sub2 === "rm") {
|
||||||
|
const { runInboxDelete } = await import("~/commands/inbox-actions.js");
|
||||||
|
await runInboxDelete(positionals[2] ?? "", { json: !!flags.json });
|
||||||
|
} else {
|
||||||
|
const { runInbox } = await import("~/commands/inbox.js");
|
||||||
|
await runInbox({
|
||||||
|
mesh: flags.mesh as string | undefined,
|
||||||
|
json: !!flags.json,
|
||||||
|
limit: typeof flags.limit === "number" ? flags.limit : (typeof flags.limit === "string" ? Number.parseInt(flags.limit, 10) : undefined),
|
||||||
|
unread: !!flags.unread,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
}
|
||||||
else if (sub === "status") { const { runMsgStatus } = await import("~/commands/broker-actions.js"); process.exit(await runMsgStatus(positionals[1], { mesh: flags.mesh as string, json: !!flags.json })); }
|
else if (sub === "status") { const { runMsgStatus } = await import("~/commands/broker-actions.js"); process.exit(await runMsgStatus(positionals[1], { mesh: flags.mesh as string, json: !!flags.json })); }
|
||||||
else { console.error("Usage: claudemesh message <send|inbox|status>"); process.exit(EXIT.INVALID_ARGS); }
|
else { console.error("Usage: claudemesh message <send|inbox|status>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
break;
|
break;
|
||||||
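Note on the `inbox` change above: `limit` is accepted either as a number or as a string and parsed with `Number.parseInt(…, 10)`. As written in the diff, a non-numeric string such as `--limit=abc` yields `NaN`, which is then passed to `runInbox`. A slightly stricter helper, offered as a sketch rather than as what the project does, could normalise such input to `undefined`. `parseLimit` is a hypothetical name.

```typescript
// Hypothetical helper: normalise a raw --limit flag value.
// Unlike the inline expression in the diff, unparsable input becomes undefined
// instead of NaN.
function parseLimit(raw: unknown): number | undefined {
  if (typeof raw === "number") {
    return Number.isFinite(raw) ? raw : undefined;
  }
  if (typeof raw === "string") {
    const n = Number.parseInt(raw, 10);
    return Number.isNaN(n) ? undefined : n;
  }
  // Booleans (a bare --limit) and missing flags carry no usable value.
  return undefined;
}
```

Note that `Number.parseInt` accepts leading digits and ignores trailing text, so `"12px"` still parses to `12` under this sketch.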
@@ -463,10 +697,14 @@ async function main(): Promise<void> {
|
|||||||
case "skill": {
|
case "skill": {
|
||||||
const sub = positionals[0];
|
const sub = positionals[0];
|
||||||
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
||||||
if (sub === "list") { const { runSkillList } = await import("~/commands/platform-actions.js"); process.exit(await runSkillList({ ...f, query: positionals[1] })); }
|
// No subcommand → print the bundled SKILL.md to stdout. Lets a
|
||||||
|
// fresh user pipe `claudemesh skill | claude --skill-add -`
|
||||||
|
// without copying anything into ~/.claude/skills (v1.18.0).
|
||||||
|
if (!sub) { const { runSkill } = await import("~/commands/skill.js"); process.exit(await runSkill()); }
|
||||||
|
else if (sub === "list") { const { runSkillList } = await import("~/commands/platform-actions.js"); process.exit(await runSkillList({ ...f, query: positionals[1] })); }
|
||||||
else if (sub === "get") { const { runSkillGet } = await import("~/commands/platform-actions.js"); process.exit(await runSkillGet(positionals[1] ?? "", f)); }
|
else if (sub === "get") { const { runSkillGet } = await import("~/commands/platform-actions.js"); process.exit(await runSkillGet(positionals[1] ?? "", f)); }
|
||||||
else if (sub === "remove") { const { runSkillRemove } = await import("~/commands/platform-actions.js"); process.exit(await runSkillRemove(positionals[1] ?? "", f)); }
|
else if (sub === "remove") { const { runSkillRemove } = await import("~/commands/platform-actions.js"); process.exit(await runSkillRemove(positionals[1] ?? "", f)); }
|
||||||
else { console.error("Usage: claudemesh skill <list|get|remove>"); process.exit(EXIT.INVALID_ARGS); }
|
else { console.error("Usage: claudemesh skill (print bundled SKILL.md)\n claudemesh skill <list|get|remove>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
case "vault": {
|
case "vault": {
|
||||||
@@ -474,7 +712,16 @@ async function main(): Promise<void> {
|
|||||||
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
||||||
if (sub === "list") { const { runVaultList } = await import("~/commands/platform-actions.js"); process.exit(await runVaultList(f)); }
|
if (sub === "list") { const { runVaultList } = await import("~/commands/platform-actions.js"); process.exit(await runVaultList(f)); }
|
||||||
else if (sub === "delete") { const { runVaultDelete } = await import("~/commands/platform-actions.js"); process.exit(await runVaultDelete(positionals[1] ?? "", f)); }
|
else if (sub === "delete") { const { runVaultDelete } = await import("~/commands/platform-actions.js"); process.exit(await runVaultDelete(positionals[1] ?? "", f)); }
|
||||||
else { console.error("Usage: claudemesh vault <list|delete> (set/get currently via MCP — needs crypto)"); process.exit(EXIT.INVALID_ARGS); }
|
else if (sub === "set") {
|
||||||
|
const { runVaultSet } = await import("~/commands/platform-actions.js");
|
||||||
|
process.exit(await runVaultSet(positionals[1] ?? "", positionals[2] ?? "", {
|
||||||
|
...f,
|
||||||
|
entryType: (flags.type as "env" | "file" | undefined),
|
||||||
|
mountPath: flags.mount as string | undefined,
|
||||||
|
description: flags.description as string | undefined,
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
else { console.error("Usage: claudemesh vault <list|set|delete>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
case "watch": {
|
case "watch": {
|
||||||
@@ -482,7 +729,18 @@ async function main(): Promise<void> {
|
|||||||
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
||||||
if (sub === "list") { const { runWatchList } = await import("~/commands/platform-actions.js"); process.exit(await runWatchList(f)); }
|
if (sub === "list") { const { runWatchList } = await import("~/commands/platform-actions.js"); process.exit(await runWatchList(f)); }
|
||||||
else if (sub === "remove") { const { runUnwatch } = await import("~/commands/platform-actions.js"); process.exit(await runUnwatch(positionals[1] ?? "", f)); }
|
else if (sub === "remove") { const { runUnwatch } = await import("~/commands/platform-actions.js"); process.exit(await runUnwatch(positionals[1] ?? "", f)); }
|
||||||
else { console.error("Usage: claudemesh watch <list|remove>"); process.exit(EXIT.INVALID_ARGS); }
|
else if (sub === "add") {
|
||||||
|
const { runWatchAdd } = await import("~/commands/platform-actions.js");
|
||||||
|
process.exit(await runWatchAdd(positionals[1] ?? "", {
|
||||||
|
...f,
|
||||||
|
label: flags.label as string | undefined,
|
||||||
|
interval: flags.interval ? Number(flags.interval) : undefined,
|
||||||
|
mode: flags.mode as string | undefined,
|
||||||
|
extract: flags.extract as string | undefined,
|
||||||
|
notifyOn: flags["notify-on"] as string | undefined,
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
else { console.error("Usage: claudemesh watch <list|add|remove>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
case "webhook": {
|
case "webhook": {
|
||||||
@@ -490,16 +748,31 @@ async function main(): Promise<void> {
|
|||||||
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
||||||
if (sub === "list") { const { runWebhookList } = await import("~/commands/platform-actions.js"); process.exit(await runWebhookList(f)); }
|
if (sub === "list") { const { runWebhookList } = await import("~/commands/platform-actions.js"); process.exit(await runWebhookList(f)); }
|
||||||
else if (sub === "delete") { const { runWebhookDelete } = await import("~/commands/platform-actions.js"); process.exit(await runWebhookDelete(positionals[1] ?? "", f)); }
|
else if (sub === "delete") { const { runWebhookDelete } = await import("~/commands/platform-actions.js"); process.exit(await runWebhookDelete(positionals[1] ?? "", f)); }
|
||||||
else { console.error("Usage: claudemesh webhook <list|delete>"); process.exit(EXIT.INVALID_ARGS); }
|
else if (sub === "create") { const { runWebhookCreate } = await import("~/commands/platform-actions.js"); process.exit(await runWebhookCreate(positionals[1] ?? "", f)); }
|
||||||
|
else { console.error("Usage: claudemesh webhook <list|create|delete>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
case "file": {
|
case "file": {
|
||||||
const sub = positionals[0];
|
const sub = positionals[0];
|
||||||
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
||||||
if (sub === "list") { const { runFileList } = await import("~/commands/platform-actions.js"); process.exit(await runFileList({ ...f, query: positionals[1] })); }
|
if (sub === "share") {
|
||||||
|
const { runFileShare } = await import("~/commands/file.js");
|
||||||
|
process.exit(await runFileShare(positionals[1] ?? "", {
|
||||||
|
...f,
|
||||||
|
to: flags.to as string | undefined,
|
||||||
|
tags: flags.tags as string | undefined,
|
||||||
|
message: flags.message as string | undefined,
|
||||||
|
upload: !!flags.upload,
|
||||||
|
}));
|
||||||
|
}
|
||||||
|
else if (sub === "get") {
|
||||||
|
const { runFileGet } = await import("~/commands/file.js");
|
||||||
|
process.exit(await runFileGet(positionals[1] ?? "", { ...f, out: flags.out as string | undefined }));
|
||||||
|
}
|
||||||
|
else if (sub === "list") { const { runFileList } = await import("~/commands/platform-actions.js"); process.exit(await runFileList({ ...f, query: positionals[1] })); }
|
||||||
else if (sub === "status") { const { runFileStatus } = await import("~/commands/platform-actions.js"); process.exit(await runFileStatus(positionals[1] ?? "", f)); }
|
else if (sub === "status") { const { runFileStatus } = await import("~/commands/platform-actions.js"); process.exit(await runFileStatus(positionals[1] ?? "", f)); }
|
||||||
else if (sub === "delete") { const { runFileDelete } = await import("~/commands/platform-actions.js"); process.exit(await runFileDelete(positionals[1] ?? "", f)); }
|
else if (sub === "delete") { const { runFileDelete } = await import("~/commands/platform-actions.js"); process.exit(await runFileDelete(positionals[1] ?? "", f)); }
|
||||||
else { console.error("Usage: claudemesh file <list|status|delete>"); process.exit(EXIT.INVALID_ARGS); }
|
else { console.error("Usage: claudemesh file <share|get|list|status|delete>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
case "mesh-mcp": {
|
case "mesh-mcp": {
|
||||||
@@ -570,7 +843,17 @@ async function main(): Promise<void> {
|
|||||||
};
|
};
|
||||||
const arg = positionals[1] ?? "";
|
const arg = positionals[1] ?? "";
|
||||||
if (sub === "create") { const { runTopicCreate } = await import("~/commands/topic.js"); process.exit(await runTopicCreate(arg, f)); }
|
if (sub === "create") { const { runTopicCreate } = await import("~/commands/topic.js"); process.exit(await runTopicCreate(arg, f)); }
|
||||||
else if (sub === "list") { const { runTopicList } = await import("~/commands/topic.js"); process.exit(await runTopicList(f)); }
|
else if (sub === "list") {
|
||||||
|
// v0.5.0: aggregate across every joined mesh when --mesh is omitted.
|
||||||
|
// The per-mesh runTopicList prompts for a mesh; me topics doesn't
|
||||||
|
// need one and surfaces last-activity + unread per topic.
|
||||||
|
if (!f.mesh) {
|
||||||
|
const { runMeTopics } = await import("~/commands/me.js");
|
||||||
|
process.exit(await runMeTopics({ mesh: undefined, json: f.json, unread: !!flags.unread }));
|
||||||
|
}
|
||||||
|
const { runTopicList } = await import("~/commands/topic.js");
|
||||||
|
process.exit(await runTopicList(f));
|
||||||
|
}
|
||||||
else if (sub === "join") { const { runTopicJoin } = await import("~/commands/topic.js"); process.exit(await runTopicJoin(arg, f)); }
|
else if (sub === "join") { const { runTopicJoin } = await import("~/commands/topic.js"); process.exit(await runTopicJoin(arg, f)); }
|
||||||
else if (sub === "leave") { const { runTopicLeave } = await import("~/commands/topic.js"); process.exit(await runTopicLeave(arg, f)); }
|
else if (sub === "leave") { const { runTopicLeave } = await import("~/commands/topic.js"); process.exit(await runTopicLeave(arg, f)); }
|
||||||
else if (sub === "members") { const { runTopicMembers } = await import("~/commands/topic.js"); process.exit(await runTopicMembers(arg, f)); }
|
else if (sub === "members") { const { runTopicMembers } = await import("~/commands/topic.js"); process.exit(await runTopicMembers(arg, f)); }
|
||||||
@@ -586,7 +869,18 @@ async function main(): Promise<void> {
|
|||||||
const { runTopicTail } = await import("~/commands/topic-tail.js");
|
const { runTopicTail } = await import("~/commands/topic-tail.js");
|
||||||
process.exit(await runTopicTail(arg, tailFlags));
|
process.exit(await runTopicTail(arg, tailFlags));
|
||||||
}
|
}
|
||||||
else { console.error("Usage: claudemesh topic <create|list|join|leave|members|history|read|tail>"); process.exit(EXIT.INVALID_ARGS); }
|
else if (sub === "post") {
|
||||||
|
const postFlags = {
|
||||||
|
mesh: flags.mesh as string,
|
||||||
|
json: !!flags.json,
|
||||||
|
plaintext: !!flags.plaintext,
|
||||||
|
replyTo: (flags["reply-to"] as string) || (flags.replyTo as string),
|
||||||
|
};
|
||||||
|
const message = positionals.slice(2).join(" ");
|
||||||
|
const { runTopicPost } = await import("~/commands/topic-post.js");
|
||||||
|
process.exit(await runTopicPost(arg, message, postFlags));
|
||||||
|
}
|
||||||
|
else { console.error("Usage: claudemesh topic <create|list|join|leave|members|history|read|tail|post>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
break;
|
break;
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -599,10 +893,73 @@ async function main(): Promise<void> {
|
|||||||
since: flags.since as string,
|
since: flags.since as string,
|
||||||
};
|
};
|
||||||
if (sub === "list") {
|
if (sub === "list") {
|
||||||
|
// v0.5.0: aggregate across every joined mesh when --mesh is omitted.
|
||||||
|
if (!f.mesh) {
|
||||||
|
const { runMeNotifications } = await import("~/commands/me.js");
|
||||||
|
process.exit(
|
||||||
|
await runMeNotifications({
|
||||||
|
mesh: undefined,
|
||||||
|
json: f.json,
|
||||||
|
all: !!flags.all,
|
||||||
|
since: f.since,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
}
|
||||||
const { runNotificationList } = await import("~/commands/notification.js");
|
const { runNotificationList } = await import("~/commands/notification.js");
|
||||||
process.exit(await runNotificationList(f));
|
process.exit(await runNotificationList(f));
|
||||||
} else {
|
} else {
|
||||||
console.error("Usage: claudemesh notification list [--since <ISO>]");
|
console.error("Usage: claudemesh notification list [--mesh <slug>] [--since <ISO>] [--all]");
|
||||||
|
process.exit(EXIT.INVALID_ARGS);
|
||||||
|
}
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
|
||||||
|
// me — cross-mesh workspace overview (v0.4.0)
|
||||||
|
case "me": {
|
||||||
|
const sub = positionals[0];
|
||||||
|
const f = {
|
||||||
|
mesh: flags.mesh as string,
|
||||||
|
json: !!flags.json,
|
||||||
|
};
|
||||||
|
if (!sub || sub === "workspace" || sub === "overview") {
|
||||||
|
const { runMe } = await import("~/commands/me.js");
|
||||||
|
process.exit(await runMe(f));
|
||||||
|
} else if (sub === "topics") {
|
||||||
|
const { runMeTopics } = await import("~/commands/me.js");
|
||||||
|
process.exit(await runMeTopics({ ...f, unread: !!flags.unread }));
|
||||||
|
} else if (sub === "notifications" || sub === "notifs") {
|
||||||
|
const { runMeNotifications } = await import("~/commands/me.js");
|
||||||
|
process.exit(
|
||||||
|
await runMeNotifications({
|
||||||
|
...f,
|
||||||
|
all: !!flags.all,
|
||||||
|
since: flags.since as string | undefined,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
} else if (sub === "activity") {
|
||||||
|
const { runMeActivity } = await import("~/commands/me.js");
|
||||||
|
process.exit(
|
||||||
|
await runMeActivity({
|
||||||
|
...f,
|
||||||
|
since: flags.since as string | undefined,
|
||||||
|
}),
|
||||||
|
);
|
||||||
|
} else if (sub === "search") {
|
||||||
|
const { runMeSearch } = await import("~/commands/me.js");
|
||||||
|
const query = positionals.slice(1).join(" ").trim();
|
||||||
|
process.exit(await runMeSearch({ ...f, query }));
|
||||||
|
} else {
|
||||||
|
console.error(
|
||||||
|
"Usage: claudemesh me (cross-mesh overview)\n" +
|
||||||
|
" claudemesh me topics (cross-mesh topic list)\n" +
|
||||||
|
" claudemesh me topics --unread (only unread topics)\n" +
|
||||||
|
" claudemesh me notifications (unread @-mentions, last 7d)\n" +
|
||||||
|
" claudemesh me notifications --all (include already-read)\n" +
|
||||||
|
" claudemesh me notifications --since=ISO (custom window)\n" +
|
||||||
|
" claudemesh me activity (recent messages, last 24h)\n" +
|
||||||
|
" claudemesh me activity --since=ISO (custom window)\n" +
|
||||||
|
" claudemesh me search <query> (cross-mesh search)",
|
||||||
|
);
|
||||||
process.exit(EXIT.INVALID_ARGS);
|
process.exit(EXIT.INVALID_ARGS);
|
||||||
}
|
}
|
||||||
break;
|
break;
|
||||||
@@ -632,7 +989,15 @@ async function main(): Promise<void> {
|
|||||||
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
const f = { mesh: flags.mesh as string, json: !!flags.json };
|
||||||
if (sub === "claim") { const { runTaskClaim } = await import("~/commands/broker-actions.js"); process.exit(await runTaskClaim(positionals[1], f)); }
|
if (sub === "claim") { const { runTaskClaim } = await import("~/commands/broker-actions.js"); process.exit(await runTaskClaim(positionals[1], f)); }
|
||||||
else if (sub === "complete") { const { runTaskComplete } = await import("~/commands/broker-actions.js"); process.exit(await runTaskComplete(positionals[1], positionals.slice(2).join(" ") || undefined, f)); }
|
else if (sub === "complete") { const { runTaskComplete } = await import("~/commands/broker-actions.js"); process.exit(await runTaskComplete(positionals[1], positionals.slice(2).join(" ") || undefined, f)); }
|
||||||
else if (sub === "list") { const { runTaskList } = await import("~/commands/platform-actions.js"); process.exit(await runTaskList({ ...f, status: flags.status as string, assignee: flags.assignee as string })); }
|
else if (sub === "list") {
|
||||||
|
// v0.5.0 phase 2: aggregate across every mesh when --mesh is omitted.
|
||||||
|
if (!f.mesh) {
|
||||||
|
const { runMeTasks } = await import("~/commands/me.js");
|
||||||
|
process.exit(await runMeTasks({ json: f.json, status: flags.status as string | undefined }));
|
||||||
|
}
|
||||||
|
const { runTaskList } = await import("~/commands/platform-actions.js");
|
||||||
|
process.exit(await runTaskList({ ...f, status: flags.status as string, assignee: flags.assignee as string }));
|
||||||
|
}
|
||||||
else if (sub === "create") { const { runTaskCreate } = await import("~/commands/platform-actions.js"); process.exit(await runTaskCreate(positionals.slice(1).join(" "), { ...f, assignee: flags.assignee as string, priority: flags.priority as string, tags: flags.tags as string })); }
|
else if (sub === "create") { const { runTaskCreate } = await import("~/commands/platform-actions.js"); process.exit(await runTaskCreate(positionals.slice(1).join(" "), { ...f, assignee: flags.assignee as string, priority: flags.priority as string, tags: flags.tags as string })); }
|
||||||
else { console.error("Usage: claudemesh task <create|list|claim|complete>"); process.exit(EXIT.INVALID_ARGS); }
|
else { console.error("Usage: claudemesh task <create|list|claim|complete>"); process.exit(EXIT.INVALID_ARGS); }
|
||||||
break;
|
break;
|
||||||
|
|||||||
@@ -1 +0,0 @@
|
|||||||
export { CallToolRequestSchema, ListToolsRequestSchema } from "@modelcontextprotocol/sdk/types.js";
|
|
||||||
@@ -1 +0,0 @@
|
|||||||
export { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
|
|
||||||
@@ -1 +0,0 @@
|
|||||||
export function formatToolError(err: unknown): string { return err instanceof Error ? err.message : String(err); }
|
|
||||||
@@ -1,3 +0,0 @@
|
|||||||
export function logToolCall(toolName: string, durationMs: number): void {
|
|
||||||
if (process.env.CLAUDEMESH_DEBUG === "1") process.stderr.write("[mcp] " + toolName + " (" + durationMs + "ms)\n");
|
|
||||||
}
|
|
||||||
@@ -1,2 +0,0 @@
|
|||||||
// Tool dispatch — server.ts handles all routing via switch statement.
|
|
||||||
export const ROUTER_VERSION = "1.0" as const;
|
|
||||||
File diff suppressed because it is too large
@@ -1,4 +0,0 @@
|
|||||||
// MCP tool family: clock-write
|
|
||||||
// Handlers in mcp/server.ts; this file defines the family for the spec's folder structure.
|
|
||||||
export const FAMILY = "clock-write" as const;
|
|
||||||
export const TOOLS = ["mesh_set_clock", "mesh_pause_clock", "mesh_resume_clock"] as const;
|
|
||||||
@@ -1,4 +0,0 @@
|
|||||||
// MCP tool family: contexts
|
|
||||||
// Handlers in mcp/server.ts; this file defines the family for the spec's folder structure.
|
|
||||||
export const FAMILY = "contexts" as const;
|
|
||||||
export const TOOLS = ["share_context", "get_context", "list_contexts"] as const;
|
|
||||||
@@ -1,4 +0,0 @@
|
|||||||
// MCP tool family: files
|
|
||||||
// Handlers in mcp/server.ts; this file defines the family for the spec's folder structure.
|
|
||||||
export const FAMILY = "files" as const;
|
|
||||||
export const TOOLS = ["share_file", "get_file", "list_files", "file_status", "delete_file", "grant_file_access", "read_peer_file", "list_peer_files"] as const;
|
|
||||||
Some files were not shown because too many files have changed in this diff