Compare commits
9 Commits
6780899185
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| | 1b28550f30 | | |
| | 9d1b4f3d4c | | |
| | ffd0621ccc | | |
| | b9ecbe79ad | | |
| | 33051b95bf | | |
| | 64d9f9f6f9 | | |
| | 7f61a711f1 | | |
| | 96520394ff | | |
| | a2a53ff355 | | |
288
.artifacts/specs/2026-05-04-session-capabilities.md
Normal file
@@ -0,0 +1,288 @@
|
||||
# Session capabilities — first-class concept
|
||||
|
||||
**Status:** spec, queued behind v0.3.0 topic-encryption work.
|
||||
**Owner:** alezmad
|
||||
**Author:** Claude (Sprint B follow-up, 2026-05-04)
|
||||
**Related:** `2026-04-15-per-peer-capabilities.md` (existing per-peer
|
||||
caps system, member-keyed), `2026-05-04-per-session-presence.md`
|
||||
(per-launch session presence — what we're now restricting).
|
||||
|
||||
## Problem
|
||||
|
||||
Per-peer capability grants (`apps/broker/src/index.ts:2178+, 2309+`)
|
||||
are keyed on the sender's **stable member pubkey**. The grant model
|
||||
gives the recipient fine-grained control: "alice can DM me",
|
||||
"bob can read state but not broadcast", etc.
|
||||
|
||||
But: as of v1.30.0 (`per-session-presence`), every `claudemesh
|
||||
launch` mints a per-launch ephemeral keypair with a parent attestation
|
||||
binding it to the member identity. The launched session inherits **all**
|
||||
the member's capabilities transitively, because cap enforcement always
|
||||
falls through to the member key.
|
||||
|
||||
Concretely:
|
||||
|
||||
- Member `alice` is in mesh `flexicar`, granted `dm + state-read +
|
||||
state-write` by everyone.
|
||||
- Alice launches a session with `claudemesh launch` to do an automated
|
||||
task — say, run a Claude Code agent that iterates over PRs.
|
||||
- That session has full member privileges. It can DM peers, write
|
||||
shared state keys (e.g. clobber `current-pr`), grant new caps, ban
|
||||
members, etc. — none of which the user wanted to delegate.
|
||||
|
||||
There is no way to express "this session can DM peers but cannot
|
||||
deploy services or grant caps." The parent attestation is a binary
|
||||
existence proof — "this session was vouched by a member" — with no
|
||||
capability subset.
|
||||
|
||||
Plus an adjacent footgun: `set_state` (`apps/broker/src/index.ts:2949`)
|
||||
has **no cap check at all**. Anyone in the mesh can write any key. The
|
||||
spec at `2026-04-15-per-peer-capabilities.md` lists `state-write` as a
|
||||
planned cap but it was never wired into the broker. Shared keys like
|
||||
`current-pr` are write-anyone today.
|
||||
|
||||
## Goal
|
||||
|
||||
A launched session can be issued **a capability subset** of its
|
||||
parent member, signed by the parent at launch time, and the broker
|
||||
enforces the **intersection** of recipient grants × session caps on
|
||||
every protected operation.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- Changing the existing per-peer cap model. Member-keyed grants stay
|
||||
authoritative for "who is allowed to talk to me."
|
||||
- Cross-machine session caps (waiting on 2.0.0 HKDF identity).
|
||||
- Per-tool granularity inside the Claude Code MCP surface — this
|
||||
spec only covers the broker-enforceable verbs (dm, broadcast,
|
||||
state-read, state-write, grant, kick, ban, profile-write,
|
||||
service-deploy).
|
||||
- Delegation: a session cannot re-vouch a sub-session with its own
|
||||
cap subset. Only members can attest sessions. (Could be lifted in
|
||||
a future spec; today's launch flow doesn't need it.)
|
||||
|
||||
## Design
|
||||
|
||||
### Capability vocabulary
|
||||
|
||||
Existing (today, member-level):
|
||||
|
||||
| Capability | Effect when GRANTED on a recipient → sender pair |
|
||||
|---------------|---------------------------------------------------|
|
||||
| `read` | Sender appears in recipient's `list_peers` |
|
||||
| `dm` | Sender can DM recipient |
|
||||
| `broadcast` | Sender's broadcasts reach recipient |
|
||||
| `state-read` | Sender can read shared state |
|
||||
| `state-write` | (planned) Sender can write shared state |
|
||||
| `file-read` | Sender can fetch files recipient shared |
|
||||
|
||||
New (session-level — cap subset on the attestation):
|
||||
|
||||
These are the **verbs the session is allowed to invoke**, NOT what
|
||||
peers can do TO it. A session attestation declaring `["dm", "read"]`
|
||||
means the session can SEND dm/read-list operations; it cannot
|
||||
broadcast, write state, grant, etc.
|
||||
|
||||
| Session cap | Gates which broker operations |
|
||||
|-------------------|------------------------------------------------|
|
||||
| `dm` | `send` with single recipient |
|
||||
| `broadcast` | `send` with `*`, `@group`, `#topic` |
|
||||
| `state-read` | `get_state`, `list_state` |
|
||||
| `state-write` | `set_state` |
|
||||
| `grant` | `grant`, `revoke`, `block` |
|
||||
| `kick` | `kick`, `disconnect` |
|
||||
| `ban` | `ban`, `unban` |
|
||||
| `profile-write` | `set_profile`, `set_summary`, `set_status` |
|
||||
| `service-deploy` | `mesh_service_register`, `_unregister` |
|
||||
|
||||
The default cap set when no subset is declared: the **full member
|
||||
set** (today's behavior — opt-in restriction, not breaking).
|
||||
|
||||
### Attestation v2
|
||||
|
||||
Existing v1 (`apps/cli/src/services/broker/session-hello-sig.ts`):
|
||||
|
||||
```
|
||||
canonical = `claudemesh-session-attest|<parent>|<session>|<expires>`
|
||||
```
|
||||
|
||||
New v2 (additive — broker accepts both):
|
||||
|
||||
```
|
||||
canonical = `claudemesh-session-attest-v2|<parent>|<session>|<expires>|<sorted-caps-csv>`
|
||||
```
|
||||
|
||||
Where `<sorted-caps-csv>` is the lower-cased, comma-joined,
|
||||
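ASCII-sorted cap list. Omitting `allowed_caps` entirely (a v1
attestation) = full member caps (default, back-compat). An explicitly
empty v2 list means "no caps allowed", matching the test plan.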
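
A minimal sketch of how the `<sorted-caps-csv>` segment could be derived. The helper name `sortedCapsCsv` is hypothetical (it is not taken from `session-hello-sig.ts`), and dropping duplicate caps is an added assumption that the spec does not state.

```ts
// Hypothetical helper: normalizes a cap list into the <sorted-caps-csv> form.
function sortedCapsCsv(caps: string[]): string {
  // Lower-case every cap, then drop duplicates (assumption, not in the spec).
  const normalized = Array.from(new Set(caps.map((c) => c.toLowerCase())));
  // Compare by code unit so the order is plain ASCII order, not locale order.
  normalized.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  return normalized.join(",");
}
```

Sorting before signing matters because `["dm", "broadcast"]` and `["broadcast", "dm"]` must produce the same canonical string, and therefore the same signature input.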
|
||||
|
||||
**Wire shape additions on `session_hello`:**
|
||||
|
||||
```ts
|
||||
{
|
||||
type: "session_hello",
|
||||
...existing fields...,
|
||||
parentAttestation: {
|
||||
sessionPubkey,
|
||||
parentMemberPubkey,
|
||||
expiresAt,
|
||||
signature,
|
||||
// NEW:
|
||||
allowed_caps?: string[], // omitted = full member set
|
||||
version?: 2, // omitted = v1
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
The broker version-detects: `version === 2` → verify v2 canonical
|
||||
including `allowed_caps`. Default behavior is unchanged for clients
|
||||
that don't pass it.
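
The version detection could look roughly like the sketch below. `canonicalFor` is a hypothetical name, `expiresAt` is assumed to be a number serialized with `String()`, and a v2 attestation with no `allowed_caps` is assumed to sign an empty caps segment. None of these details are confirmed by the spec.

```ts
// Shape of the attestation as described in the wire-shape block above.
interface ParentAttestation {
  sessionPubkey: string;
  parentMemberPubkey: string;
  expiresAt: number; // assumption: numeric timestamp
  signature: string;
  allowed_caps?: string[];
  version?: 2;
}

// Hypothetical: returns the canonical string whose signature should be verified.
function canonicalFor(att: ParentAttestation): string {
  const base = [att.parentMemberPubkey, att.sessionPubkey, String(att.expiresAt)];
  if (att.version === 2) {
    // v2 appends the lower-cased, ASCII-sorted, comma-joined cap list.
    const caps = [...(att.allowed_caps ?? [])].map((c) => c.toLowerCase()).sort();
    return ["claudemesh-session-attest-v2", ...base, caps.join(",")].join("|");
  }
  // Anything without version === 2 falls back to the v1 canonical.
  return ["claudemesh-session-attest", ...base].join("|");
}
```

Because the two prefixes differ, a signature over one canonical form can never validate against the other.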
|
||||
|
||||
### Enforcement
|
||||
|
||||
Add `allowed_caps: string[] | null` to the in-memory `PeerConn`
|
||||
shape (`apps/broker/src/index.ts:131`). Populated from
|
||||
`handleSessionHello` (the v2 attestation supplies it) and from
|
||||
`handleHello` (control-plane / member connection — set to `null`,
|
||||
meaning "full member caps").
|
||||
|
||||
**Effective cap check** for a sending peer needing `cap`:
|
||||
|
||||
```ts
|
||||
function senderHasCap(conn: PeerConn, cap: string): boolean {
|
||||
if (conn.allowed_caps === null) return true; // member-level, no subset
|
||||
return conn.allowed_caps.includes(cap);
|
||||
}
|
||||
```
|
||||
|
||||
Wire this into every broker operation in the table above. The
|
||||
existing per-peer recipient-cap check at `2178+, 2309+` stays —
|
||||
session caps gate the **sender side**, recipient grants gate the
|
||||
**receive side**, and both must allow:
|
||||
|
||||
```
|
||||
allowed = senderHasCap(conn, capNeeded) && recipientGrants[sender][capNeeded]
|
||||
```
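
The intersection rule can be sketched as a pure function. The `GrantTable` shape (sender member pubkey mapped to a set of granted caps) and the `PeerConnLike` type are illustrative assumptions; the real recipient-grant lookup lives in the broker and may be shaped differently.

```ts
interface PeerConnLike {
  memberPubkey: string;
  allowed_caps: string[] | null; // null = member-level connection, no subset
}

// Hypothetical: one recipient's grants, keyed by the sender's member pubkey.
type GrantTable = Map<string, Set<string>>;

function senderHasCap(conn: PeerConnLike, cap: string): boolean {
  if (conn.allowed_caps === null) return true;
  return conn.allowed_caps.includes(cap);
}

// Both gates must pass: the session's own cap subset AND the recipient's grant.
function isAllowed(conn: PeerConnLike, recipientGrants: GrantTable, cap: string): boolean {
  const granted = recipientGrants.get(conn.memberPubkey)?.has(cap) ?? false;
  return senderHasCap(conn, cap) && granted;
}
```

A session restricted to `["dm"]` is refused a broadcast even when the recipient grants `broadcast` to its parent member, and a member-level connection is still refused when the recipient never granted the cap.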
|
||||
|
||||
### `set_state` gate (bonus, ship together)
|
||||
|
||||
Today: no cap check. After this spec: `set_state` requires
|
||||
`state-write` on the sender side. Migration: existing members
|
||||
default to having `state-write` in their member caps (no recipient
|
||||
grant model for state-write — it's a sender-side gate only, mesh-
|
||||
wide). New attestations can omit it to forbid the session.
|
||||
|
||||
The recipient-side analog (per-peer state-write grants) is left for
|
||||
a future spec — today the value of guarding state-write is
|
||||
session-level (avoid an automated session clobbering shared keys),
|
||||
not peer-level.
|
||||
|
||||
### CLI surface
|
||||
|
||||
```
|
||||
claudemesh launch --caps dm,read # tight: read-only chat agent
|
||||
claudemesh launch --caps dm,broadcast # send-only, no state writes
|
||||
claudemesh launch # default: full member caps
|
||||
```
|
||||
|
||||
`claudemesh launch --caps ?` prints the table above with descriptions.
|
||||
|
||||
`claudemesh peer list --json` includes `allowed_caps` per row when
|
||||
present (`null` = full member). Lets users audit what their running
|
||||
sessions can actually do.
|
||||
|
||||
### Migration plan (mirrors `2026-04-15-per-peer-capabilities.md` §"Migration plan")
|
||||
|
||||
1. **Broker schema additive** — `PeerConn.allowed_caps` in-memory
|
||||
only; no DB column. Reload-on-reconnect is fine because the
|
||||
attestation is re-sent on every WS open (it's the proof of
|
||||
identity).
|
||||
|
||||
2. **CLI ships v2 attestation alongside v1.** New `--caps` flag
|
||||
defaults to omitted (= v1 attestation, full caps). Older
|
||||
brokers ignore the new fields entirely.
|
||||
|
||||
3. **Broker accepts v2.** When `allowed_caps` arrives, store it.
|
||||
No enforcement yet — log denied operations as `cap_check_dryrun`
|
||||
metric counter, still allow them through.
|
||||
|
||||
4. **Dry-run release.** Ship one CLI + broker release that emits
|
||||
the metric but doesn't enforce. Watch for false positives in
|
||||
real meshes for ≥ 1 week.
|
||||
|
||||
5. **Flip enforcement on.** Broker rejects operations failing the
|
||||
cap check with `forbidden: missing session capability "<cap>"`.
|
||||
Default ("no caps declared = full member") keeps existing
|
||||
sessions unaffected.
|
||||
|
||||
6. **`set_state` gate** ships in step 5 alongside the rest. Default
|
||||
member caps include `state-write`, so flipping it on doesn't
|
||||
break existing flows. Only sessions that explicitly omit
|
||||
`state-write` from `--caps` lose write access.
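
Steps 3 to 5 differ only in what happens when the check fails, so a single mode switch can cover both. The sketch below is illustrative: `checkSessionCap`, the `EnforcementMode` type, and the in-process `metrics` object are hypothetical names, and only the counter name and the error string come from the plan above.

```ts
type EnforcementMode = "dry-run" | "enforce";

interface CapCheckResult {
  allowed: boolean;
  error?: string;
}

// Hypothetical in-process counter standing in for the real metrics sink.
const metrics = { cap_check_dryrun: 0 };

function checkSessionCap(
  allowedCaps: string[] | null,
  cap: string,
  mode: EnforcementMode,
): CapCheckResult {
  // null = member-level connection or v1 attestation: full member caps.
  const has = allowedCaps === null || allowedCaps.includes(cap);
  if (has) return { allowed: true };
  if (mode === "dry-run") {
    // Steps 3-4: count the would-be denial, but let the operation through.
    metrics.cap_check_dryrun += 1;
    return { allowed: true };
  }
  // Step 5: reject with the error string from the plan.
  return { allowed: false, error: `forbidden: missing session capability "${cap}"` };
}
```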
|
||||
|
||||
### Crypto notes
|
||||
|
||||
- v2 attestation re-uses `crypto_sign_detached` over the new
|
||||
canonical string; same parent member secret key, same TTL caps
|
||||
(≤24 h), same `expiresAt` semantics.
|
||||
- v1 signatures are NOT v2 signatures — collision is impossible
|
||||
because the canonical strings have different prefixes
|
||||
(`claudemesh-session-attest` vs `claudemesh-session-attest-v2`).
|
||||
Domain separation is intrinsic.
|
||||
- Like the existing per-peer cap system: caps are server-enforced
|
||||
metadata, not capability tokens. A malicious broker can ignore
|
||||
them. This is about UX trust + footgun prevention, not protocol-
|
||||
level security.
|
||||
|
||||
## Open questions
|
||||
|
||||
1. **Should the session attestation also bind to a fingerprint of
|
||||
the launched binary / Claude version?** Would let a member say
|
||||
"this session is constrained to Claude Code v1.34.15" so a
|
||||
compromised launched-binary doesn't get reused. Probably no — too
|
||||
much friction for the threat model.
|
||||
|
||||
2. **What's the right default for `claudemesh launch` going forward?**
|
||||
Once enforcement ships, do we change the default `--caps` from
|
||||
"full member" to "dm + read + state-read"? Tighter but breaks
|
||||
existing automation that writes state. Probably worth a one-
|
||||
release deprecation warning ("your session will lose state-write
|
||||
in v2.0.0 unless you pass --caps state-write") and then flip in
|
||||
v2.0.0.
|
||||
|
||||
3. **Does `--caps` belong in `~/.claudemesh/config.json` per-mesh
|
||||
defaults too?** A user who always launches read-only agents
|
||||
wants `caps: ["dm", "read"]` as a personal default. Easy add;
|
||||
defer until users ask for it.
|
||||
|
||||
4. **Per-tool MCP cap surface?** Out of scope here, but: a `claudemesh
|
||||
launch --tools peer:read,memory:write` would be a finer cut than
|
||||
broker-verb caps. The broker can't enforce that — it'd live in the
|
||||
MCP wrapper / Claude Code's allowedTools. Different layer.
|
||||
|
||||
## Test plan
|
||||
|
||||
- Pure-logic tests on `senderHasCap` (member-level → always true,
|
||||
empty caps → always false, declared caps → exact match).
|
||||
- Broker integration: launch a session with `--caps dm`, attempt
|
||||
`set_state` → expect `forbidden: missing session capability
|
||||
"state-write"`.
|
||||
- v1 attestation still accepted, no `allowed_caps` set, all caps
|
||||
permitted (back-compat).
|
||||
- v2 attestation with empty `allowed_caps` array → broker treats
|
||||
as "explicitly empty, no caps allowed" (NOT "full member"). The
|
||||
full-member default is "field omitted entirely". Test both.
|
||||
- Dry-run mode: cap fail increments the counter but the operation
|
||||
proceeds. Smoke-test before flipping enforcement.
|
||||
|
||||
## Estimate
|
||||
|
||||
- Spec review + open-question resolution: 1–2 days.
|
||||
- Broker change (PeerConn field, attestation v2 accept, per-verb
|
||||
enforcement, dry-run mode): 2–3 days.
|
||||
- CLI change (`--caps` flag, attestation builder, peer list
|
||||
surface): 1 day.
|
||||
- Tests: 1 day.
|
||||
- Dry-run release window: ≥ 1 week.
|
||||
|
||||
Total: ~1 sprint of focused work, plus a dry-run window.
|
||||
350
.artifacts/specs/2026-05-05-continuous-presence.md
Normal file
@@ -0,0 +1,350 @@
|
||||
# Continuous presence — lease model + resume token
|
||||
|
||||
**Status:** spec, ready for v0.3.0.
|
||||
**Owner:** alezmad
|
||||
**Author:** Claude (2026-05-05, follow-up to user-reported "after hours claudemesh disconnects")
|
||||
**Related:** `2026-05-04-per-session-presence.md` (per-launch ephemeral keypair), `apps/broker/src/index.ts:5430-5436` (current 30s ping loop), `apps/cli/src/daemon/ws-lifecycle.ts` (current backoff reconnect).
|
||||
|
||||
## Problem
|
||||
|
||||
Today, presence is fused to a single TCP/WS connection. When the
|
||||
connection breaks — half-dead NAT entries, ISP route changes, laptop
|
||||
sleep, broker restart — the broker tears down the presence row, fires
|
||||
`peer_left`, and waits for the daemon to dial a fresh socket and run
|
||||
the full attestation hello again. Other peers see the user blink
|
||||
offline → back online. Messages sent to the session during the gap are
|
||||
either dropped (if it's a `now`/`next` priority DM with no recipient
|
||||
match) or held in `message_queue` for `low` only.
|
||||
|
||||
Concrete symptom (user-reported): `claudemesh peer list` shows zero
|
||||
peers despite multiple sessions being "up" — they're stuck on
|
||||
half-dead TCP connections. Daemon hasn't noticed because no `close`
|
||||
fired. Hours later, kernel TCP keepalive (default Linux: 7200s idle +
|
||||
9 × 75s probes ≈ 2h11m) finally RSTs the socket, daemon's existing
|
||||
backoff reconnects, peers reappear. Until then: zombie session.
|
||||
|
||||
Two coupled bugs:
|
||||
|
||||
1. **No application-layer staleness detection.** Broker pings every
|
||||
30s (line 5431) and updates `lastPingAt` on pong, but never
|
||||
`terminate()`s a connection that stops returning pongs. Daemon
|
||||
doesn't ping at all. Both sides trust the kernel for liveness,
|
||||
which only fires after hours.
|
||||
|
||||
2. **Presence == connection.** Even once the staleness IS detected
|
||||
and the daemon reconnects, peers see a full `peer_left` /
|
||||
`peer_joined` cycle for a network blip that took 1–30 seconds.
|
||||
Outbound messages during the gap that target the session by
|
||||
pubkey route to nothing.
|
||||
|
||||
The user's ask: peers should never see a gap during transient
|
||||
disconnects. Presence should be continuous as long as the *session
|
||||
intent* is alive, regardless of how many sockets carried it.
|
||||
|
||||
## Goal
|
||||
|
||||
Presence is a **lease** keyed off the session's stable identity
|
||||
(`sessionPubkey`), held in broker memory + DB, with a TTL refreshed
|
||||
on every keepalive. Sockets come and go beneath the lease. Other peers
|
||||
see continuous online status across reconnects up to the lease TTL.
|
||||
|
||||
Specifically:
|
||||
|
||||
- A daemon (or per-session WS) can drop and re-establish the WS
|
||||
within a configurable grace window (default 90s) without any peer
|
||||
observing `peer_left` / `peer_joined`.
|
||||
- Messages sent to a session while its socket is mid-flap are queued,
|
||||
delivered on the next reattach, ordered.
|
||||
- Reconnect itself is sub-second on the wire when a `resume_token` is
|
||||
presented — broker recognises the session, restores the slot, no
|
||||
re-attestation round-trip.
|
||||
- After the grace window expires, the broker fires `peer_left`
|
||||
exactly once; on a later reconnect it fires `peer_joined` exactly
|
||||
once. No flapping.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **Multi-broker handoff.** Out of scope. If the broker process
|
||||
restarts, leases are lost and we fall back to today's behavior
|
||||
(clean reconnect, peers see one cycle). A future spec can address
|
||||
this with a shared lease store (Redis / Postgres LISTEN).
|
||||
- **Dual-socket on the daemon.** Useful gold-plating but not required
|
||||
for the user-facing problem. Single-socket with watchdog +
|
||||
resume-token covers the failure modes actually observed (NAT drops,
|
||||
ISP blips, sleep <90s).
|
||||
- **Manual `claudemesh reconnect` CLI.** Not needed; the lease model
|
||||
makes it redundant. Re-evaluate if real support cases surface.
|
||||
|
||||
## Design
|
||||
|
||||
### Lease model
|
||||
|
||||
```
|
||||
sessionPubkey → { transport: "online" | "offline",
|
||||
leaseUntil: Date,
|
||||
ws: WebSocket | null,
|
||||
...existing PeerConn fields }
|
||||
```
|
||||
|
||||
Today the `connections` Map IS keyed by `presenceId`, which is a fresh
|
||||
UUID per WS. We change that key to `sessionPubkey` (member-WS:
|
||||
`memberPubkey`; session-WS: `sessionPubkey`). The PeerConn struct
|
||||
gains:
|
||||
|
||||
```ts
|
||||
transport: "online" | "offline";
|
||||
leaseUntil: Date; // Date.now() + LEASE_TTL_MS
|
||||
evictionTimer: NodeJS.Timeout | null;
|
||||
```
|
||||
|
||||
### State transitions
|
||||
|
||||
**On WS open + hello accepted (initial):**
|
||||
- Insert into `connections` with `transport: "online"`,
|
||||
`leaseUntil: now + 90s`, `evictionTimer: null`.
|
||||
- Broadcast `peer_joined` (today's behavior).
|
||||
- Issue `resume_token` (see below) in the `hello_ack`.
|
||||
|
||||
**On WS open + hello carries valid `resume_token`:**
|
||||
- Look up by `sessionPubkey`, verify token signature + freshness
|
||||
(TTL <= LEASE_TTL_MS). If valid AND entry exists with
|
||||
`transport: "offline"`:
|
||||
- Cancel `evictionTimer`.
|
||||
- Swap `ws` reference.
|
||||
- Set `transport: "online"`, refresh `leaseUntil`.
|
||||
- **Do NOT** broadcast `peer_joined`. The lease never expired.
|
||||
- Drain any queued DMs accumulated during offline window.
|
||||
- Reply `hello_ack` with new `resume_token`.
|
||||
- If entry exists with `transport: "online"` (token replay attack or
|
||||
rapid reconnect race): close old `ws` with `1000, "session_replaced"`
|
||||
before swapping. Same as today's `oldConn.ws.close(1000, ...)`
|
||||
pattern at lines 1768/1996.
|
||||
- If no entry exists or token is stale: treat as a fresh hello,
|
||||
broadcast `peer_joined`. Token expired = same as a cold start.
|
||||
|
||||
**On WS close (any reason):**
|
||||
- Look up by `sessionPubkey`. If not found, no-op (already evicted).
|
||||
- Set `transport: "offline"`, clear `ws` reference.
|
||||
- Start `evictionTimer = setTimeout(evict, GRACE_MS)`.
|
||||
- **Do NOT** broadcast `peer_left`. **Do NOT** delete the entry.
|
||||
- **Do NOT** call `disconnectPresence(presenceId)` yet.
|
||||
|
||||
**On `evictionTimer` fire (lease expired without reattach):**
|
||||
- Delete from `connections`.
|
||||
- Broadcast `peer_left` (today's behavior at lines 5167-5189).
|
||||
- `decMeshCount`.
|
||||
- `disconnectPresence(presenceId)`.
|
||||
- Clean up URL watches, stream subs, MCP registry — same as today's
|
||||
close handler.
|
||||
- Audit `peer_left`.
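
The open, close, and eviction transitions above can be sketched as a small state machine. This is a simplified model, not the broker code: `LeaseTable` and its injectable `Schedule` function are hypothetical, resume-token verification is omitted, the `session_replaced` close of an old socket is not modelled, and broadcasts are recorded in an `events` array instead of being sent.

```ts
type Transport = "online" | "offline";
// Hypothetical scheduler: runs fn after ms and returns a cancel function.
type Schedule = (fn: () => void, ms: number) => () => void;

interface Lease {
  transport: Transport;
  leaseUntil: number;
  cancelEviction: (() => void) | null;
}

class LeaseTable {
  readonly events: string[] = [];
  private leases = new Map<string, Lease>();
  private schedule: Schedule;
  private graceMs: number;
  private now: () => number;

  constructor(schedule: Schedule, graceMs: number, now: () => number) {
    this.schedule = schedule;
    this.graceMs = graceMs;
    this.now = now;
  }

  // WS open + accepted hello. Reattach within grace stays silent.
  open(key: string): void {
    const existing = this.leases.get(key);
    if (existing) {
      existing.cancelEviction?.();
      existing.cancelEviction = null;
      existing.transport = "online";
      existing.leaseUntil = this.now() + this.graceMs;
      return; // lease never expired: no peer_joined
    }
    this.leases.set(key, {
      transport: "online",
      leaseUntil: this.now() + this.graceMs,
      cancelEviction: null,
    });
    this.events.push(`peer_joined:${key}`);
  }

  // WS close: go offline and start the grace timer; no peer_left yet.
  close(key: string): void {
    const lease = this.leases.get(key);
    if (!lease || lease.transport === "offline") return;
    lease.transport = "offline";
    lease.cancelEviction = this.schedule(() => this.evict(key), this.graceMs);
  }

  // Grace expired without reattach: remove the entry and announce once.
  private evict(key: string): void {
    if (!this.leases.delete(key)) return;
    this.events.push(`peer_left:${key}`);
  }
}
```

In production the scheduler would wrap `setTimeout` and `clearTimeout`; injecting it keeps the transitions testable without real timers.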
|
||||
|
||||
**Watchdog (broker):**
|
||||
- The 30s ping loop (line 5431) gains a staleness check: if any
|
||||
conn's `transport === "online"` and `lastPingAt < now - 75s`, call
|
||||
`ws.terminate()`. This converts the half-dead socket into a clean
|
||||
`close` event, which fires the lease-offline transition above.
|
||||
- Same logic on the daemon side (see § Daemon changes).
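
A sketch of the broker-side staleness sweep, assuming it runs inside the existing 30s ping loop. `sweepStale` and the `Watched` interface are hypothetical; the timestamp field is named `lastPongAt` after the field added in the broker diff, whereas the text above calls it `lastPingAt`.

```ts
const STALE_PONG_THRESHOLD_MS = 75_000; // 2.5x the 30s ping interval

// Minimal view of a connection for the purposes of the sweep.
interface Watched {
  transport: "online" | "offline";
  lastPongAt: number; // ms since epoch of the last pong received
  terminate(): void; // stands in for ws.terminate()
}

// Terminates every online connection whose last pong is too old.
// Returns how many sockets were terminated.
function sweepStale(conns: Watched[], now: number): number {
  let terminated = 0;
  for (const conn of conns) {
    const silentFor = now - conn.lastPongAt;
    if (conn.transport === "online" && silentFor > STALE_PONG_THRESHOLD_MS) {
      conn.terminate(); // produces a normal close event, which starts the grace timer
      terminated += 1;
    }
  }
  return terminated;
}
```

Offline entries are skipped because they have no socket to terminate and are already covered by their eviction timer.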
|
||||
|
||||
### Resume token
|
||||
|
||||
A short opaque string the broker hands the daemon in `hello_ack`.
|
||||
Format: `mesh-resume.v1.<base64url(JSON-payload)>.<base64url(sig)>`
|
||||
where `JSON-payload = { sub: <sessionPubkey>, mid: <meshId>, exp:
|
||||
<unix-ms>, iat: <unix-ms> }` and `sig = ed25519(brokerSigningKey,
|
||||
JSON-payload)`.
|
||||
|
||||
- **Why a token, not just sessionPubkey?** A session needs to prove
|
||||
it's the holder of an existing lease without re-running the full
|
||||
attestation handshake (which involves member key + parent
|
||||
attestation lookup). The token is a server-issued cookie: cheap to
|
||||
verify, scoped to a single session, expires with the lease.
|
||||
- **Storage:** broker keeps the signing key in env (`RESUME_TOKEN_KEY`,
|
||||
generated on first boot if missing, persisted to a config row). No
|
||||
DB column needed for the tokens themselves — they're verified by
|
||||
signature alone.
|
||||
- **TTL:** equal to LEASE_TTL_MS (90s). After that the daemon must
|
||||
re-handshake with full attestation. Refreshed on every successful
|
||||
reattach.
|
||||
- **Daemon storage:** in-memory only. Lost on daemon restart, which
|
||||
is correct: a daemon restart is a real reconnect and should run
|
||||
the full hello.
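
Issuing and verifying such a token could look like the sketch below, which uses Node's built-in `node:crypto` Ed25519 support. The function names are hypothetical, signing the raw JSON bytes is one reading of `ed25519(brokerSigningKey, JSON-payload)`, and checking only `sub` and `exp` is a simplification (a real verifier would presumably also check `mid`).

```ts
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

interface ResumePayload {
  sub: string; // sessionPubkey
  mid: string; // meshId
  exp: number; // unix ms
  iat: number; // unix ms
}

const PREFIX = "mesh-resume.v1";

function issueResumeToken(key: KeyObject, sub: string, mid: string, now: number, ttlMs: number): string {
  const payload: ResumePayload = { sub, mid, iat: now, exp: now + ttlMs };
  const body = Buffer.from(JSON.stringify(payload), "utf8");
  const sig = sign(null, body, key); // Ed25519 takes no digest algorithm
  return `${PREFIX}.${body.toString("base64url")}.${sig.toString("base64url")}`;
}

// Returns the payload when the token is authentic, unexpired and bound to
// the expected session; otherwise null (caller falls back to a full hello).
function verifyResumeToken(pub: KeyObject, token: string, expectedSub: string, now: number): ResumePayload | null {
  if (!token.startsWith(PREFIX + ".")) return null;
  const parts = token.slice(PREFIX.length + 1).split(".");
  if (parts.length !== 2) return null;
  const body = Buffer.from(parts[0], "base64url");
  const sig = Buffer.from(parts[1], "base64url");
  if (!verify(null, body, pub, sig)) return null;
  // Parse only after the signature check, so untrusted JSON is never read.
  const payload = JSON.parse(body.toString("utf8")) as ResumePayload;
  if (payload.sub !== expectedSub || payload.exp <= now) return null;
  return payload;
}

// Example key pair; the spec keeps the real signing key in RESUME_TOKEN_KEY.
const demoKeys = generateKeyPairSync("ed25519");
```

No token table is needed on the broker: the signature and `exp` field carry everything required to accept or reject a reattach.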
|
||||
|
||||
### Wire protocol additions
|
||||
|
||||
`hello` (member-WS, session-WS, fresh-launch hello — all three):
|
||||
```diff
|
||||
{
|
||||
type: "hello",
|
||||
memberPubkey: "...",
|
||||
sessionPubkey: "...", // session-WS only
|
||||
attestation: "...", // session-WS only
|
||||
signature: "...",
|
||||
+ resumeToken?: "mesh-resume.v1...", // optional; presence = reattach attempt
|
||||
...
|
||||
}
|
||||
```
|
||||
|
||||
`hello_ack`:
|
||||
```diff
|
||||
{
|
||||
type: "hello_ack",
|
||||
presenceId: "...",
|
||||
...
|
||||
+ resumeToken: "mesh-resume.v1...", // always issued; replaces prior on reattach
|
||||
+ leaseTtlMs: 90000, // informational; daemon may use for ping cadence
|
||||
}
|
||||
```
|
||||
|
||||
No new message types. Old daemons that don't send `resumeToken` get
|
||||
today's full-handshake behavior — fully backward compatible.
|
||||
|
||||
### Message queue during grace window
|
||||
|
||||
Today: DMs to a presence whose WS is closed → routed to
|
||||
`message_queue` only for `priority: low`; `now`/`next` either route
|
||||
to a different connected session of the same member or drop.
|
||||
|
||||
Change: when broker would route to a session whose
|
||||
`transport === "offline"` (lease still valid), enqueue regardless of
|
||||
priority. On reattach, the existing inbox-drain path
|
||||
(`maybePushQueuedMessages` at line 967) flushes them in order. The
|
||||
`message_queue` already has the schema for this; we're just relaxing
|
||||
the priority gate when the target is in grace.
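
The relaxed gate reduces to a small routing decision. The sketch below is illustrative only: `routeDm`, the `Route` labels, and the `Target` shape are hypothetical, and the `reroute-or-drop` branch merely stands in for today's behavior for `now`/`next` messages.

```ts
type Priority = "now" | "next" | "low";
type Route = "deliver" | "enqueue" | "reroute-or-drop";

interface Target {
  transport: "online" | "offline";
  leaseUntil: number; // ms since epoch
}

// target is undefined when no lease exists for the session.
function routeDm(target: Target | undefined, priority: Priority, now: number): Route {
  if (target && target.transport === "online") return "deliver";
  // In grace: lease still valid, so queue regardless of priority.
  if (target && target.transport === "offline" && target.leaseUntil > now) return "enqueue";
  // No lease (or lease expired): today's rule, only low priority is queued.
  return priority === "low" ? "enqueue" : "reroute-or-drop";
}
```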
|
||||
|
||||
### Constants
|
||||
|
||||
```ts
|
||||
const LEASE_TTL_MS = 90_000; // grace window after WS close
|
||||
const PING_INTERVAL_MS = 30_000; // unchanged
|
||||
const STALE_PONG_THRESHOLD_MS = 75_000; // 2.5x ping interval
|
||||
const RESUME_TOKEN_TTL_MS = LEASE_TTL_MS;
|
||||
```
|
||||
|
||||
`LEASE_TTL_MS` = 90s rationale: long enough to absorb a sleep/resume
|
||||
cycle, NAT timeout, ISP route flap, mobile→wifi handover. Short
|
||||
enough that a true crash (daemon killed, machine off) clears the
|
||||
session within 90s — peers don't see ghost online status forever.
|
||||
Configurable via env (`LEASE_TTL_MS`) for self-hosted brokers.
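
One way the env override could be read, as a sketch. `leaseTtlFromEnv` is a hypothetical helper, and falling back to the 90s default on a missing, non-numeric, or non-positive value is an assumption; the spec does not say how invalid values are handled.

```ts
const DEFAULT_LEASE_TTL_MS = 90_000;

// Takes the environment as a plain record so it can be tested without process.env.
function leaseTtlFromEnv(env: Record<string, string | undefined>): number {
  const raw = env["LEASE_TTL_MS"];
  if (raw === undefined || raw.trim() === "") return DEFAULT_LEASE_TTL_MS;
  const parsed = Number(raw);
  // Assumption: reject NaN, Infinity, zero and negatives rather than crash.
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_LEASE_TTL_MS;
  return Math.floor(parsed);
}
```

A broker would call this once at boot, for example `leaseTtlFromEnv(process.env)`.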
|
||||
|
||||
## Daemon changes
|
||||
|
||||
### Watchdog
|
||||
|
||||
In `ws-lifecycle.ts`, add an `idleWatchdog` parallel to the existing
|
||||
backoff/reconnect machinery:
|
||||
|
||||
```ts
|
||||
let lastActivity = Date.now(); // bumped on every incoming message + pong
|
||||
const watchdog = setInterval(() => {
|
||||
if (Date.now() - lastActivity > STALE_THRESHOLD_MS) {
|
||||
log("warn", "ws_stale_terminate", { url: opts.url });
|
||||
sock.terminate(); // fires existing close handler → reconnect path
|
||||
} else if (sock.readyState === sock.OPEN) {
|
||||
sock.ping(); // matches broker's 30s cadence, gives broker a pong
|
||||
}
|
||||
}, PING_INTERVAL_MS);
|
||||
sock.on("message", () => { lastActivity = Date.now(); });
|
||||
sock.on("pong", () => { lastActivity = Date.now(); });
|
||||
```
|
||||
|
||||
Call `clearInterval(watchdog)` in both the close handler and the
explicit `close()` path, so a terminated socket does not leave the
interval running.
|
||||
|
||||
### Resume token in hello
|
||||
|
||||
`apps/cli/src/daemon/broker.ts:136` and equivalent in
|
||||
`session-broker.ts`: persist the `resumeToken` from each successful
|
||||
`hello_ack` into a private field, include it in the next
|
||||
`buildHello()` call. On daemon restart the field is empty → cold
|
||||
start, exactly today's behavior.
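
A sketch of the daemon-side bookkeeping, assuming the hello is a plain object. `ResumeState`, `onHelloAck`, and `decorate` are hypothetical names; the real change would live in `broker.ts` and `session-broker.ts` around `buildHello()`.

```ts
interface HelloAck {
  type: "hello_ack";
  resumeToken?: string; // absent when talking to an old broker
}

class ResumeState {
  // In-memory only: a daemon restart starts with no token (cold start).
  private token: string | null = null;

  // Store (or clear) the token carried by the latest hello_ack.
  onHelloAck(ack: HelloAck): void {
    this.token = ack.resumeToken ?? null;
  }

  // Returns a copy of the hello, with resumeToken attached when one is held.
  decorate<T extends object>(hello: T): T & { resumeToken?: string } {
    return this.token ? { ...hello, resumeToken: this.token } : { ...hello };
  }
}
```

Each ack replaces the previous token, which matches the rule that the broker issues a fresh one on every successful reattach.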
|
||||
|
||||
### No CLI changes
|
||||
|
||||
`claudemesh peer list` keeps reading the broker's `connections` Map
|
||||
which now reflects continuous presence. Users see online sessions as
|
||||
online during transient blips. No UX surface changes.
|
||||
|
||||
## Migration
|
||||
|
||||
- New broker is fully backward compatible with old daemons (resume
|
||||
token is optional, defaults fall through to today's path).
|
||||
- New daemons against an old broker: token is sent but ignored, full
|
||||
handshake runs each reconnect — same as today.
|
||||
- DB migration: none. `presence` table semantics unchanged. The
|
||||
`disconnectedAt` column is now set only on lease eviction (>90s),
|
||||
not on every WS close. This is a behavioral change but not a
|
||||
schema change.
|
||||
- Add ENV var `RESUME_TOKEN_KEY` (broker generates on first boot if
|
||||
unset, persists to a singleton config row).
|
||||
|
||||
## Test plan
|
||||
|
||||
1. **Sleep test:** kill -STOP the daemon for 60s, then kill -CONT.
|
||||
Expect: peers never see `peer_left`. Daemon's WS is dead-on-arrival
|
||||
when it wakes; watchdog terminates it; reconnect with resume_token
|
||||
succeeds within 1-2s; lease was at ~30s of its 90s TTL when the
|
||||
daemon resumed.
|
||||
|
||||
2. **Hard offline:** kill -STOP for 120s, kill -CONT. Expect: peers
|
||||
see exactly one `peer_left` at t=90s, then exactly one
|
||||
`peer_joined` after the daemon resumes and reconnects (resume
|
||||
token is now stale; full handshake runs).
|
||||
|
||||
3. **NAT drop simulation:** `iptables -A OUTPUT -p tcp --dport 443
|
||||
-j DROP` for 60s on the daemon host, then remove the rule. Expect:
|
||||
broker pings stop landing, broker-side watchdog calls
|
||||
`ws.terminate()` at t=75s, lease enters grace, daemon's own
|
||||
watchdog fires within ~30s, daemon reconnects with resume_token,
|
||||
peers never see a flap.
|
||||
|
||||
4. **Message-during-grace:** while a target session is in grace
|
||||
(offline, lease valid), send a `priority: now` DM. Expect: queued
|
||||
in `message_queue`, delivered exactly once on reattach, no
|
||||
`peer_left` visible to sender, ack returns delivered.
|
||||
|
||||
5. **Replay attack:** capture a resume_token in flight, replay it
|
||||
against a different broker connection while the original session
|
||||
is still online. Expect: broker treats it as a reconnect for an
|
||||
already-online session → closes old WS with `session_replaced`,
|
||||
new WS takes over. Equivalent to today's session-replacement
|
||||
semantics; the original session detects the close and either
|
||||
reconnects (if it's still alive) or gives up.
|
||||
|
||||
6. **Token forgery:** send a `resumeToken` not signed by the broker.
|
||||
Expect: signature check fails, broker treats hello as a fresh
|
||||
handshake (or rejects if the rest of the hello is invalid).
|
||||
|
||||
## Open questions
|
||||
|
||||
- **Should `peer list` expose a `transport` field** so callers can
|
||||
distinguish "leased but offline" from "online"? Default no — the
|
||||
abstraction we're selling is "they're online." But debugging may
|
||||
want it; gate it behind `--all` or `--debug`.
|
||||
- **What about the broker-side `mcpRegistry` cleanup?** Today we
|
||||
delete non-persistent MCP entries on WS close (line 5217). With
|
||||
leases, we should defer that to lease eviction, not WS close.
|
||||
Otherwise an MCP server registered by a session disappears every
|
||||
time its WS reconnects.
|
||||
|
||||
## Build order
|
||||
|
||||
1. **Broker lease model** — change `connections` keying, add
|
||||
`transport`/`leaseUntil`/`evictionTimer`, refactor close handler
|
||||
to start grace timer instead of immediate teardown, refactor
|
||||
eviction path. (~80 lines.)
|
||||
2. **Resume token** — signing key bootstrap, token issue/verify,
|
||||
wire format, hello_ack changes. (~50 lines + 1 config row.)
|
||||
3. **Daemon watchdog** — `ws-lifecycle.ts` adds `idleWatchdog` and
|
||||
stores `resumeToken` from acks. (~25 lines.)
|
||||
4. **Daemon hello** — pass `resumeToken` in next `buildHello()`.
|
||||
(~10 lines across `broker.ts` + `session-broker.ts`.)
|
||||
5. **Broker watchdog** — extend the 30s ping loop with
|
||||
`terminate()`-on-stale logic. (~15 lines.)
|
||||
6. **Queue-during-grace** — relax priority gate in DM routing.
|
||||
(~5 lines.)
|
||||
7. **Spec docs** — update `docs/protocol.md` with resume_token,
|
||||
lease semantics. (~30 lines.)
|
||||
8. **Tests** — six scenarios above. Likely ~3 new test files.
|
||||
|
||||
Estimated total: one focused day. The broker lease model is the load-
|
||||
bearing change; everything else slots in cleanly once that's done.
|
||||
@@ -427,6 +427,21 @@ export async function heartbeat(presenceId: string): Promise<void> {
|
||||
.where(eq(presence.id, presenceId));
|
||||
}
|
||||
|
||||
/**
|
||||
* Restore a presence row to online state on lease reattach: clear
|
||||
* `disconnectedAt` and bump `lastPingAt`. Needed because the DB-level
|
||||
* stale-presence sweeper may have flipped the row to disconnected
|
||||
* during the grace window — the lease is in-memory truth, but other
|
||||
* code paths read presence.disconnectedAt directly.
|
||||
*/
|
||||
export async function restorePresence(presenceId: string): Promise<void> {
|
||||
const now = new Date();
|
||||
await db
|
||||
.update(presence)
|
||||
.set({ disconnectedAt: null, lastPingAt: now })
|
||||
.where(eq(presence.id, presenceId));
|
||||
}
|
||||
|
||||
// --- Peer discovery ---
|
||||
|
||||
/** Return all active (connected) presences in a mesh, joined with member info. */
|
||||
|
||||
@@ -41,6 +41,7 @@ import {
|
||||
grantFileKey,
|
||||
handleHookSetStatus,
|
||||
heartbeat,
|
||||
restorePresence,
|
||||
insertFileKeys,
|
||||
joinGroup,
|
||||
joinMesh,
|
||||
@@ -156,11 +157,53 @@ interface PeerConn {
|
||||
bio?: string;
|
||||
capabilities?: string[];
|
||||
};
|
||||
/** v2 agentic-comms presence taxonomy. Mirrors the value passed to
|
||||
* `recordPresence`. Used by the kick handler to refuse no-op kicks
|
||||
* on long-lived control-plane connections (daemon, dashboard) that
|
||||
* would just auto-reconnect. */
|
||||
peerRole: "control-plane" | "session" | "service";
|
||||
/** Last time this connection's WS replied to a broker ping. Bumped
|
||||
* in the `pong` handler. Used by the staleness watchdog to detect
|
||||
* half-dead TCP/NAT-dropped connections that the kernel hasn't yet
|
||||
* RST'd (Linux default keepalive ≈ 2hrs). */
|
||||
lastPongAt: number;
|
||||
/** Lease state: "online" while the WS is healthy, "offline" during
|
||||
* the GRACE window after a WS close. While offline, the entry stays
|
||||
* in `connections` so peer_list / sendToPeer still see it; DMs land
|
||||
* in the message_queue (sendToPeer no-ops on dead WS, but the queue
|
||||
* row stays with deliveredAt=NULL and drains on reattach). After
|
||||
* GRACE_MS without a reattach, evictionTimer fires the full peer_left
|
||||
* + cleanup. Reattach (same sessionPubkey hello arriving on a fresh
|
||||
* WS) cancels the timer, swaps in the new ws, restores online. */
|
||||
leaseState: "online" | "offline";
|
||||
/** When the lease will be evicted if no reattach happens. 0 when online. */
|
||||
leaseUntil: number;
|
||||
/** Timer that fires evictPresenceFully(presenceId) at leaseUntil. null when online. */
|
||||
evictionTimer: NodeJS.Timeout | null;
|
||||
}
|
||||
|
||||
const connections = new Map<string, PeerConn>();
|
||||
const connectionsPerMesh = new Map<string, number>();
|
||||
|
||||
/**
|
||||
* Lease grace window — how long after a WS close the broker will hold
|
||||
* the presence row open before evicting and broadcasting peer_left.
|
||||
*
|
||||
* 90s: long enough to absorb a sleep/resume cycle, NAT timeout, ISP
|
||||
* route flap, mobile→wifi handover, broker restart of the daemon's
|
||||
* machine. Short enough that a true crash (machine off, daemon killed)
|
||||
* clears the session within 90s — peers don't see ghost online status
|
||||
* forever.
|
||||
*
|
||||
* During grace: lease stays in `connections`, peer_list keeps showing
|
||||
* the session as online to other peers, DMs route through message_queue
|
||||
* (sendToPeer no-ops on dead WS, drain happens on reattach). On
|
||||
* reattach (same sessionPubkey hello on a new WS): silent swap, no
|
||||
* peer_joined / peer_left visible to anyone. After grace expires:
|
||||
* full eviction (peer_left + cleanup) fires exactly once.
|
||||
*/
|
||||
const GRACE_MS = 90_000;
|
||||
|
||||
// Rate limiter for /tg/token endpoint (IP → count, cleared hourly)
|
||||
const tgTokenRateLimit = new Map<string, number>();
|
||||
setInterval(() => tgTokenRateLimit.clear(), 60 * 60_000).unref();
|
||||
@@ -525,6 +568,97 @@ function sendToPeer(presenceId: string, msg: WSServerMessage): void {
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* Run the full presence-cleanup path: broadcast peer_left, decMeshCount,
|
||||
* disconnectPresence in DB, audit, clean up URL watches / streams /
|
||||
* MCP entries / clock. Removes the entry from `connections`.
|
||||
*
|
||||
* Called from two places:
|
||||
* 1. `ws.on("close")` when the closing WS belongs to a connection
|
||||
* with no active lease (no grace) — i.e. the lease had already
|
||||
* been evicted, or the close fires before lease is established.
|
||||
* 2. The grace-window evictionTimer when no reattach happened in
|
||||
* GRACE_MS. This is the "presence is really gone" path.
|
||||
*
|
||||
* Idempotent: re-entering when the connections entry is already gone
|
||||
* is a no-op.
|
||||
*/
|
||||
async function evictPresenceFully(presenceId: string): Promise<void> {
|
||||
const conn = connections.get(presenceId);
|
||||
if (!conn) return; // already evicted
|
||||
if (conn.evictionTimer) {
|
||||
clearTimeout(conn.evictionTimer);
|
||||
conn.evictionTimer = null;
|
||||
}
|
||||
connections.delete(presenceId);
|
||||
decMeshCount(conn.meshId);
|
||||
|
||||
const leaveMsg: WSPushMessage = {
|
||||
type: "push",
|
||||
subtype: "system",
|
||||
event: "peer_left",
|
||||
eventData: {
|
||||
name: conn.displayName,
|
||||
pubkey: conn.sessionPubkey ?? conn.memberPubkey,
|
||||
},
|
||||
messageId: crypto.randomUUID(),
|
||||
meshId: conn.meshId,
|
||||
senderPubkey: "system",
|
||||
priority: "low",
|
||||
nonce: "",
|
||||
ciphertext: "",
|
||||
createdAt: new Date().toISOString(),
|
||||
};
|
||||
for (const [pid, peer] of connections) {
|
||||
if (peer.meshId !== conn.meshId) continue;
|
||||
// Don't tell the user's own other sessions they "left" when one
|
||||
// of their Claude Code instances closes. Same pubkey = same user.
|
||||
if (peer.memberPubkey === conn.memberPubkey) continue;
|
||||
sendToPeer(pid, leaveMsg);
|
||||
}
|
||||
|
||||
await disconnectPresence(presenceId);
|
||||
void audit(conn.meshId, "peer_left", conn.memberId, conn.displayName, {});
|
||||
|
||||
// URL watches owned by this presence — interval would otherwise
|
||||
// happily fetch forever after the peer is gone.
|
||||
for (const [watchId, watch] of urlWatches) {
|
||||
if (watch.presenceId === presenceId) {
|
||||
clearInterval(watch.timer);
|
||||
urlWatches.delete(watchId);
|
||||
}
|
||||
}
|
||||
// Stream subscriptions for this presence.
|
||||
for (const [key, subs] of streamSubscriptions) {
|
||||
subs.delete(presenceId);
|
||||
if (subs.size === 0) streamSubscriptions.delete(key);
|
||||
}
|
||||
// MCP servers registered by this presence.
|
||||
for (const [key, entry] of mcpRegistry) {
|
||||
if (entry.presenceId === presenceId) {
|
||||
if (entry.persistent) {
|
||||
// Keep persistent entries but mark offline
|
||||
entry.online = false;
|
||||
entry.offlineSince = new Date().toISOString();
|
||||
entry.presenceId = "";
|
||||
} else {
|
||||
mcpRegistry.delete(key);
|
||||
}
|
||||
}
|
||||
}
|
||||
// Auto-pause clock when mesh becomes empty.
|
||||
if (!connectionsPerMesh.has(conn.meshId)) {
|
||||
const clock = meshClocks.get(conn.meshId);
|
||||
if (clock && clock.timer) {
|
||||
clearInterval(clock.timer);
|
||||
clock.timer = null;
|
||||
clock.paused = true;
|
||||
log.info("clock auto-paused (mesh empty)", { mesh_id: conn.meshId });
|
||||
}
|
||||
}
|
||||
log.info("ws evict full", { presence_id: presenceId });
|
||||
}
|
||||
|
||||
async function maybePushQueuedMessages(
|
||||
presenceId: string,
|
||||
excludeSenderSessionPubkey?: string,
|
||||
@@ -1661,6 +1795,10 @@ async function handleHello(
|
||||
lastSeenAt?: string;
|
||||
restoredGroups?: Array<{ name: string; role?: string }>;
|
||||
restoredStats?: unknown;
|
||||
/** True when this hello reattached an existing offline lease — caller
|
||||
* must skip the peer_joined broadcast and the services-list ack
|
||||
* augmentation. The session was never visibly absent from peers. */
|
||||
silent?: boolean;
|
||||
} | null> {
|
||||
// Validate sessionPubkey shape — it becomes a routable identity in
|
||||
// listPeers/drainForMember, so arbitrary strings let a client claim
|
||||
@@ -1753,6 +1891,61 @@ async function handleHello(
|
||||
const initialGroups = helloHasGroups
|
||||
? hello.groups!
|
||||
: (saved?.groups?.length ? saved.groups : (member.defaultGroups ?? []));
|
||||
// Reattach check: if an offline-leased lease exists for the same
|
||||
// stable identity (sessionPubkey when present, otherwise sessionId
|
||||
// for member-WS), this hello is a transient reconnect within the
|
||||
// grace window — swap the WS reference, clear the eviction timer,
|
||||
// restore online state. No peer_joined broadcast — peers never saw
|
||||
// this session leave.
|
||||
for (const [pid, oldConn] of connections) {
|
||||
if (oldConn.meshId !== hello.meshId) continue;
|
||||
if (oldConn.leaseState !== "offline") continue;
|
||||
const matchByPubkey =
|
||||
!!hello.sessionPubkey
|
||||
&& oldConn.sessionPubkey === hello.sessionPubkey;
|
||||
const matchBySessionId =
|
||||
!hello.sessionPubkey
|
||||
&& !oldConn.sessionPubkey
|
||||
&& oldConn.sessionId === hello.sessionId
|
||||
&& oldConn.memberPubkey === hello.pubkey;
|
||||
if (!matchByPubkey && !matchBySessionId) continue;
|
||||
|
||||
if (oldConn.evictionTimer) {
|
||||
clearTimeout(oldConn.evictionTimer);
|
||||
oldConn.evictionTimer = null;
|
||||
}
|
||||
oldConn.ws = ws;
|
||||
oldConn.leaseState = "online";
|
||||
oldConn.leaseUntil = 0;
|
||||
oldConn.lastPongAt = Date.now();
|
||||
// Refresh mutable fields from the new hello — the same session may
|
||||
// have moved cwd / changed display name across the blip.
|
||||
oldConn.cwd = hello.cwd;
|
||||
if (hello.displayName) oldConn.displayName = hello.displayName;
|
||||
log.info("ws hello reattach (lease)", {
|
||||
presence_id: pid,
|
||||
session_pubkey: hello.sessionPubkey?.slice(0, 12) ?? "(member-WS)",
|
||||
session_id: hello.sessionId,
|
||||
});
|
||||
// Reset DB row to online: the stale-presence sweeper may have set
|
||||
// disconnectedAt during the grace window. Lease is in-memory truth
|
||||
// but downstream code paths read presence.disconnectedAt directly.
|
||||
void restorePresence(pid);
|
||||
// Drain any queued DMs that landed during the offline window.
|
||||
void maybePushQueuedMessages(pid);
|
||||
return {
|
||||
presenceId: pid,
|
||||
memberDisplayName: oldConn.displayName,
|
||||
memberProfile: {
|
||||
roleTag: member.roleTag,
|
||||
groups: member.defaultGroups ?? [],
|
||||
messageMode: member.messageMode ?? "push",
|
||||
},
|
||||
meshPolicy,
|
||||
silent: true,
|
||||
};
|
||||
}
|
||||
|
||||
// Session-id dedup: if this session_id already has an active presence,
|
||||
// disconnect the ghost. Happens when a client reconnects after a
|
||||
// network blip or broker restart before the 90s stale sweeper runs.
|
||||
@@ -1797,6 +1990,11 @@ async function handleHello(
|
||||
groups: initialGroups,
|
||||
visible: saved?.visible ?? true,
|
||||
profile: saved?.profile ?? {},
|
||||
peerRole: "control-plane",
|
||||
lastPongAt: Date.now(),
|
||||
leaseState: "online",
|
||||
leaseUntil: 0,
|
||||
evictionTimer: null,
|
||||
});
|
||||
incMeshCount(hello.meshId);
|
||||
void audit(hello.meshId, "peer_joined", member.id, effectiveDisplayName, {
|
||||
@@ -1853,6 +2051,10 @@ async function handleSessionHello(
|
||||
memberDisplayName: string;
|
||||
memberProfile?: unknown;
|
||||
meshPolicy?: Record<string, unknown>;
|
||||
/** True when this hello reattached an existing offline lease — caller
|
||||
* must skip the peer_joined broadcast. The session was never visibly
|
||||
* absent from peers. */
|
||||
silent?: boolean;
|
||||
} | null> {
|
||||
// Shape checks. The crypto helpers also enforce these but bailing
|
||||
// early gives a clearer error code on the wire.
|
||||
@@ -1982,6 +2184,42 @@ async function handleSessionHello(
|
||||
|
||||
const initialGroups = hello.groups ?? member.defaultGroups ?? [];
|
||||
|
||||
// Reattach check: an offline-leased connection with the same
|
||||
// sessionPubkey is the same launched session resuming inside the
|
||||
// grace window. Cancel the eviction timer, swap the WS, restore
|
||||
// online state. No peer_joined broadcast — peers never saw the
|
||||
// session leave.
|
||||
for (const [pid, oldConn] of connections) {
|
||||
if (oldConn.meshId !== hello.meshId) continue;
|
||||
if (oldConn.leaseState !== "offline") continue;
|
||||
if (oldConn.sessionPubkey !== hello.sessionPubkey) continue;
|
||||
|
||||
if (oldConn.evictionTimer) {
|
||||
clearTimeout(oldConn.evictionTimer);
|
||||
oldConn.evictionTimer = null;
|
||||
}
|
||||
oldConn.ws = ws;
|
||||
oldConn.leaseState = "online";
|
||||
oldConn.leaseUntil = 0;
|
||||
oldConn.lastPongAt = Date.now();
|
||||
// Refresh mutable fields from the new hello.
|
||||
oldConn.cwd = hello.cwd;
|
||||
if (hello.displayName) oldConn.displayName = hello.displayName;
|
||||
log.info("session_hello reattach (lease)", {
|
||||
presence_id: pid,
|
||||
session_pubkey: hello.sessionPubkey.slice(0, 12),
|
||||
});
|
||||
void restorePresence(pid);
|
||||
void maybePushQueuedMessages(pid);
|
||||
return {
|
||||
presenceId: pid,
|
||||
memberDisplayName: oldConn.displayName,
|
||||
memberProfile: undefined,
|
||||
meshPolicy,
|
||||
silent: true,
|
||||
};
|
||||
}
|
||||
|
||||
// Session-id dedup: if the same session_id is already connected, kick
|
||||
// the ghost. Reconnect after a network blip lands here cleanly.
|
||||
for (const [oldPid, oldConn] of connections) {
|
||||
@@ -2022,6 +2260,11 @@ async function handleSessionHello(
|
||||
groups: initialGroups,
|
||||
visible: true,
|
||||
profile: {},
|
||||
peerRole: "session",
|
||||
lastPongAt: Date.now(),
|
||||
leaseState: "online",
|
||||
leaseUntil: 0,
|
||||
evictionTimer: null,
|
||||
});
|
||||
incMeshCount(hello.meshId);
|
||||
void audit(hello.meshId, "peer_joined", member.id, effectiveDisplayName, {
|
||||
@@ -2420,8 +2663,10 @@ function handleConnection(ws: WebSocket): void {
|
||||
}
|
||||
// Broadcast peer_joined to siblings — same shape as the regular
|
||||
// hello path, so list_peers consumers don't need to special-case.
|
||||
// Skipped on lease reattach: the session was never visibly absent,
|
||||
// so no synthetic join event should fire.
|
||||
const joinedConn = connections.get(presenceId);
|
||||
if (joinedConn) {
|
||||
if (joinedConn && !result.silent) {
|
||||
const joinMsg: WSPushMessage = {
|
||||
type: "push",
|
||||
subtype: "system",
|
||||
@@ -2504,9 +2749,11 @@ function handleConnection(ws: WebSocket): void {
|
||||
} catch {
|
||||
/* ws closed during hello */
|
||||
}
|
||||
// Broadcast peer_joined or peer_returned to all other peers in the same mesh.
|
||||
// Broadcast peer_joined or peer_returned to all other peers in the
|
||||
// same mesh. Skipped on lease reattach: the session never appeared
|
||||
// offline so no synthetic join event should fire.
|
||||
const joinedConn = connections.get(presenceId);
|
||||
if (joinedConn) {
|
||||
if (joinedConn && !result.silent) {
|
||||
const isReturning = !!result.restored;
|
||||
const joinMsg: WSPushMessage = {
|
||||
type: "push",
|
||||
@@ -4645,11 +4892,30 @@ function handleConnection(ws: WebSocket): void {
|
||||
}
|
||||
|
||||
const affected: string[] = [];
|
||||
// 1.34.15 (gap #3a): kick was a no-op against long-lived
|
||||
// control-plane connections (daemon, dashboard) — closing
|
||||
// their WS just triggered the auto-reconnect loop, the
|
||||
// kicker's CLI rendered "Their Claude Code session ended"
|
||||
// (which was misleading), and the user-visible state was
|
||||
// unchanged seconds later. We now refuse to close control-
|
||||
// plane WSes and surface the skipped peers in a new
|
||||
// additive ack field. Pre-1.34.15 CLI clients only read
|
||||
// `kicked`/`affected`, so this stays back-compat.
|
||||
//
|
||||
// For `kick`-only: the soft `disconnect` verb still closes
|
||||
// control-plane WSes intentionally — that's what users want
|
||||
// when they're nudging a peer for it to re-authenticate.
|
||||
const skippedControlPlane: string[] = [];
|
||||
const skipControlPlane = isKick;
|
||||
const now = Date.now();
|
||||
|
||||
if (km.all) {
|
||||
for (const [pid, peer] of connections) {
|
||||
if (peer.meshId !== conn.meshId || pid === presenceId) continue;
|
||||
if (skipControlPlane && peer.peerRole === "control-plane") {
|
||||
skippedControlPlane.push(peer.displayName || pid);
|
||||
continue;
|
||||
}
|
||||
try { peer.ws.close(closeCode, closeReason); } catch {}
|
||||
connections.delete(pid);
|
||||
void disconnectPresence(pid);
|
||||
@@ -4661,6 +4927,10 @@ function handleConnection(ws: WebSocket): void {
|
||||
if (peer.meshId !== conn.meshId || pid === presenceId) continue;
|
||||
const [pres] = await db.select({ lastPingAt: presence.lastPingAt }).from(presence).where(eq(presence.id, pid)).limit(1);
|
||||
if (pres && pres.lastPingAt && pres.lastPingAt.getTime() < cutoff) {
|
||||
if (skipControlPlane && peer.peerRole === "control-plane") {
|
||||
skippedControlPlane.push(peer.displayName || pid);
|
||||
continue;
|
||||
}
|
||||
try { peer.ws.close(closeCode, `${closeReason}_stale`); } catch {}
|
||||
connections.delete(pid);
|
||||
void disconnectPresence(pid);
|
||||
@@ -4671,6 +4941,10 @@ function handleConnection(ws: WebSocket): void {
|
||||
for (const [pid, peer] of connections) {
|
||||
if (peer.meshId !== conn.meshId) continue;
|
||||
if (peer.displayName === km.target || peer.memberPubkey === km.target || peer.memberPubkey.startsWith(km.target)) {
|
||||
if (skipControlPlane && peer.peerRole === "control-plane") {
|
||||
skippedControlPlane.push(peer.displayName || pid);
|
||||
continue;
|
||||
}
|
||||
try { peer.ws.close(closeCode, closeReason); } catch {}
|
||||
connections.delete(pid);
|
||||
void disconnectPresence(pid);
|
||||
@@ -4679,8 +4953,20 @@ function handleConnection(ws: WebSocket): void {
|
||||
}
|
||||
}
|
||||
|
||||
conn.ws.send(JSON.stringify({ type: ackType, kicked: affected, affected, _reqId: km._reqId }));
|
||||
log.info(`ws ${closeReason}`, { presence_id: presenceId, count: affected.length, target: km.target ?? km.stale ?? "all" });
|
||||
conn.ws.send(JSON.stringify({
|
||||
type: ackType,
|
||||
kicked: affected,
|
||||
affected,
|
||||
// Additive — older CLI clients ignore this field.
|
||||
...(skippedControlPlane.length > 0 ? { skipped_control_plane: skippedControlPlane } : {}),
|
||||
_reqId: km._reqId,
|
||||
}));
|
||||
log.info(`ws ${closeReason}`, {
|
||||
presence_id: presenceId,
|
||||
count: affected.length,
|
||||
target: km.target ?? km.stale ?? "all",
|
||||
skipped_control_plane: skippedControlPlane.length,
|
||||
});
|
||||
break;
|
||||
}
|
||||
|
||||
@@ -5108,88 +5394,52 @@ function handleConnection(ws: WebSocket): void {
|
||||
}
|
||||
});
|
||||
ws.on("close", async () => {
|
||||
if (presenceId) {
|
||||
const conn = connections.get(presenceId);
|
||||
// Persist peer state BEFORE removing from connections.
|
||||
if (conn) {
|
||||
await savePeerState(conn, conn.memberId, conn.meshId);
|
||||
}
|
||||
connections.delete(presenceId);
|
||||
if (conn) {
|
||||
decMeshCount(conn.meshId);
|
||||
// Broadcast peer_left to remaining peers in the same mesh.
|
||||
const leaveMsg: WSPushMessage = {
|
||||
type: "push",
|
||||
subtype: "system",
|
||||
event: "peer_left",
|
||||
eventData: {
|
||||
name: conn.displayName,
|
||||
pubkey: conn.sessionPubkey ?? conn.memberPubkey,
|
||||
},
|
||||
messageId: crypto.randomUUID(),
|
||||
meshId: conn.meshId,
|
||||
senderPubkey: "system",
|
||||
priority: "low",
|
||||
nonce: "",
|
||||
ciphertext: "",
|
||||
createdAt: new Date().toISOString(),
|
||||
};
|
||||
for (const [pid, peer] of connections) {
|
||||
if (peer.meshId !== conn.meshId) continue;
|
||||
// Don't tell the user's own other sessions they "left" when one
|
||||
// of their Claude Code instances closes. Same pubkey = same user.
|
||||
if (peer.memberPubkey === conn.memberPubkey) continue;
|
||||
sendToPeer(pid, leaveMsg);
|
||||
}
|
||||
}
|
||||
await disconnectPresence(presenceId);
|
||||
if (conn) {
|
||||
void audit(conn.meshId, "peer_left", conn.memberId, conn.displayName, {});
|
||||
}
|
||||
// Clean up URL watches owned by this peer — the interval was
|
||||
// happily fetching forever after the peer disconnected.
|
||||
for (const [watchId, watch] of urlWatches) {
|
||||
if (watch.presenceId === presenceId) {
|
||||
clearInterval(watch.timer);
|
||||
urlWatches.delete(watchId);
|
||||
}
|
||||
}
|
||||
// Clean up stream subscriptions for this peer
|
||||
for (const [key, subs] of streamSubscriptions) {
|
||||
subs.delete(presenceId);
|
||||
if (subs.size === 0) streamSubscriptions.delete(key);
|
||||
}
|
||||
// Clean up MCP servers registered by this peer
|
||||
for (const [key, entry] of mcpRegistry) {
|
||||
if (entry.presenceId === presenceId) {
|
||||
if (entry.persistent) {
|
||||
// Keep persistent entries but mark offline
|
||||
entry.online = false;
|
||||
entry.offlineSince = new Date().toISOString();
|
||||
entry.presenceId = "";
|
||||
} else {
|
||||
mcpRegistry.delete(key);
|
||||
}
|
||||
}
|
||||
}
|
||||
// Auto-pause clock when mesh becomes empty
|
||||
if (conn && !connectionsPerMesh.has(conn.meshId)) {
|
||||
const clock = meshClocks.get(conn.meshId);
|
||||
if (clock && clock.timer) {
|
||||
clearInterval(clock.timer);
|
||||
clock.timer = null;
|
||||
clock.paused = true;
|
||||
log.info("clock auto-paused (mesh empty)", { mesh_id: conn.meshId });
|
||||
}
|
||||
}
|
||||
log.info("ws close", { presence_id: presenceId });
|
||||
if (!presenceId) return;
|
||||
const conn = connections.get(presenceId);
|
||||
if (!conn) return; // already evicted
|
||||
|
||||
// If the conn's `ws` is no longer THIS ws, the close belongs to an
|
||||
// older socket that was already replaced by a reattach. Ignore — the
|
||||
// lease is healthy with the new WS, no eviction needed.
|
||||
if (conn.ws !== ws) {
|
||||
log.debug("ws close on replaced socket — ignoring", { presence_id: presenceId });
|
||||
return;
|
||||
}
|
||||
|
||||
await savePeerState(conn, conn.memberId, conn.meshId);
|
||||
|
||||
// If lease is currently online, enter grace. Other peers see the
|
||||
// session as still online; DMs queue (sendToPeer no-ops on dead
|
||||
// WS, drain on reattach). After GRACE_MS without a reattach, the
|
||||
// timer fires evictPresenceFully and cleanup runs as before.
|
||||
const pid = presenceId;
|
||||
if (conn.leaseState === "online") {
|
||||
conn.leaseState = "offline";
|
||||
conn.leaseUntil = Date.now() + GRACE_MS;
|
||||
conn.evictionTimer = setTimeout(() => {
|
||||
log.info("lease grace expired — evicting", { presence_id: pid });
|
||||
void evictPresenceFully(pid);
|
||||
}, GRACE_MS);
|
||||
log.info("ws close — lease grace started", {
|
||||
presence_id: pid,
|
||||
grace_ms: GRACE_MS,
|
||||
});
|
||||
return;
|
||||
}
|
||||
|
||||
// Not online (already in grace from an earlier close, or odd state).
|
||||
// Run full eviction immediately.
|
||||
await evictPresenceFully(pid);
|
||||
});
|
||||
ws.on("error", (err) => {
|
||||
log.warn("ws error", { error: err.message });
|
||||
});
|
||||
ws.on("pong", () => {
|
||||
if (presenceId) void heartbeat(presenceId);
|
||||
if (presenceId) {
|
||||
const conn = connections.get(presenceId);
|
||||
if (conn) conn.lastPongAt = Date.now();
|
||||
void heartbeat(presenceId);
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
@@ -5381,10 +5631,29 @@ async function main(): Promise<void> {
|
||||
});
|
||||
});
|
||||
|
||||
// WS heartbeat ping every 30s; clients reply with pong → bumps lastPingAt.
|
||||
// WS heartbeat ping every 30s; clients reply with pong → bumps
|
||||
// lastPongAt. Connections whose pong is older than 75s (2.5x the
|
||||
// ping interval) are considered half-dead — kernel hasn't yet RST'd
|
||||
// the socket but no application traffic is flowing. Force-terminate
|
||||
// them to fire the close handler and free the connection slot.
|
||||
const STALE_PONG_THRESHOLD_MS = 75_000;
|
||||
const pingInterval = setInterval(() => {
|
||||
for (const { ws } of connections.values()) {
|
||||
if (ws.readyState === ws.OPEN) ws.ping();
|
||||
const now = Date.now();
|
||||
for (const [pid, conn] of connections) {
|
||||
// Skip offline-leased entries: their WS is intentionally dead
|
||||
// during grace; the eviction timer handles their lifecycle.
|
||||
if (conn.leaseState === "offline") continue;
|
||||
const { ws } = conn;
|
||||
if (ws.readyState !== ws.OPEN) continue;
|
||||
if (now - conn.lastPongAt > STALE_PONG_THRESHOLD_MS) {
|
||||
log.warn("ws stale terminate", {
|
||||
presence_id: pid,
|
||||
last_pong_ago_ms: now - conn.lastPongAt,
|
||||
});
|
||||
try { ws.terminate(); } catch { /* socket already gone */ }
|
||||
continue;
|
||||
}
|
||||
ws.ping();
|
||||
}
|
||||
}, 30_000);
|
||||
pingInterval.unref();
|
||||
|
||||
47
apps/broker/tests/kick-control-plane-skip.test.ts
Normal file
@@ -0,0 +1,47 @@
|
||||
/**
|
||||
* Kick control-plane skip: 1.34.15 (gap #3a) refuses to close
|
||||
* long-lived control-plane connections (claudemesh daemon, dashboard)
|
||||
* via `kick`, because they auto-reconnect within seconds and the verb
|
||||
* was effectively a no-op. The soft `disconnect` verb keeps the old
|
||||
* behavior so users can still nudge a control-plane peer to
|
||||
* re-authenticate.
|
||||
*
|
||||
* Pure-logic test: mirrors the branch inside the broker's kick handler
|
||||
* without spinning up a broker. Same pattern as
|
||||
* grants-enforcement.test.ts.
|
||||
*/
|
||||
|
||||
import { describe, expect, test } from "vitest";
|
||||
|
||||
type PeerRole = "control-plane" | "session" | "service";
|
||||
|
||||
/** Mirrors the predicate inserted into the kick handler. */
|
||||
function shouldSkipKick(args: {
|
||||
verb: "kick" | "disconnect";
|
||||
peerRole: PeerRole;
|
||||
}): boolean {
|
||||
const skipControlPlane = args.verb === "kick";
|
||||
return skipControlPlane && args.peerRole === "control-plane";
|
||||
}
|
||||
|
||||
describe("kick control-plane skip (gap #3a)", () => {
|
||||
test("kick on control-plane → skipped (would auto-reconnect)", () => {
|
||||
expect(shouldSkipKick({ verb: "kick", peerRole: "control-plane" })).toBe(true);
|
||||
});
|
||||
|
||||
test("kick on session → not skipped (closes user session)", () => {
|
||||
expect(shouldSkipKick({ verb: "kick", peerRole: "session" })).toBe(false);
|
||||
});
|
||||
|
||||
test("kick on service → not skipped", () => {
|
||||
expect(shouldSkipKick({ verb: "kick", peerRole: "service" })).toBe(false);
|
||||
});
|
||||
|
||||
test("disconnect on control-plane → not skipped (intentional nudge)", () => {
|
||||
expect(shouldSkipKick({ verb: "disconnect", peerRole: "control-plane" })).toBe(false);
|
||||
});
|
||||
|
||||
test("disconnect on session → not skipped", () => {
|
||||
expect(shouldSkipKick({ verb: "disconnect", peerRole: "session" })).toBe(false);
|
||||
});
|
||||
});
|
||||
@@ -1,5 +1,110 @@
|
||||
# Changelog
|
||||
|
||||
## 1.34.15 (2026-05-04) — `peer list --mesh` actually scopes + `kick` refuses control-plane
|
||||
|
||||
Two follow-ups from the 1.34.x train, both backwards-compatible.
|
||||
|
||||
### `peer list --mesh <slug>` no longer aggregates across meshes
|
||||
|
||||
`apps/cli/src/commands/peers.ts:140` was calling
|
||||
`tryListPeersViaDaemon()` with no argument, so a multi-mesh daemon
|
||||
returned peers from EVERY attached mesh and the renderer printed
|
||||
"peers on flexicar" with cross-mesh rows mixed in. The daemon's
|
||||
`/v1/peers?mesh=<slug>` filter (server-side, since 1.26.0) was
|
||||
already correctly scoping when the slug was passed; the CLI just
|
||||
wasn't passing it. Fixed.
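
A minimal sketch of the idea behind the fix, assuming the daemon is reached over local HTTP. `buildPeersUrl` and the base URL are hypothetical names used only for illustration; the only detail taken from this entry is the `/v1/peers?mesh=<slug>` filter.

```typescript
// Hypothetical helper, not the actual CLI code. Omitting the slug
// reproduces the old behavior (peers from every attached mesh);
// passing it lets the daemon scope the listing server-side.
function buildPeersUrl(daemonBase: string, meshSlug?: string): string {
  const url = new URL("/v1/peers", daemonBase);
  if (meshSlug) url.searchParams.set("mesh", meshSlug);
  return url.toString();
}

// The port below is an assumption made for the example.
console.log(buildPeersUrl("http://127.0.0.1:7777", "flexicar"));
```

The point of the sketch is that the scoping lives in the query itself, so the renderer never has to filter cross-mesh rows after the fact.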
|
||||
|
||||
`apps/cli/src/commands/launch.ts:407` (the `printBrokerWelcome` peer
|
||||
count in the launch banner) had the same bug. The "N peers online"
|
||||
line in the welcome now shows the count for the launched mesh only.
|
||||
|
||||
Hex-prefix resolution in `apps/cli/src/commands/send.ts` is
|
||||
intentionally cross-mesh (the user is targeting by hex without
|
||||
specifying a mesh) and was deliberately left as-is.
|
||||
|
||||
### `claudemesh kick` refuses no-op kicks on control-plane connections
|
||||
|
||||
Pre-1.34.15, kicking a daemon's member-WS or a dashboard connection
|
||||
just closed the socket — the daemon's WS-lifecycle reconnect loop
|
||||
brought it back within seconds, the kicker's CLI rendered "Their
|
||||
Claude Code session ended" (which was misleading), and the user-
|
||||
visible state was unchanged. The verb was effectively a no-op, but
|
||||
the user had to learn that the hard way.
|
||||
|
||||
The broker's kick handler (`apps/broker/src/index.ts:4628+`) now
|
||||
skips peers where `peerRole === "control-plane"` and surfaces the
|
||||
skipped peers in a new additive ack field `skipped_control_plane`.
|
||||
The soft `disconnect` verb keeps the old behavior — useful when
|
||||
intentionally nudging a control-plane peer to re-authenticate.
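
The skip rule can be sketched as a pure function. This is an illustration under assumptions, not the broker's code: `PeerSummary`, `partitionTargets`, and `buildAck` are hypothetical names, and the real handler works on live connections rather than plain objects. Only the `kicked` / `affected` / `skipped_control_plane` field names and the role values come from this entry.

```typescript
type PeerRole = "control-plane" | "session" | "service";
type Verb = "kick" | "disconnect";

// Hypothetical reduced view of a matched connection.
interface PeerSummary {
  displayName: string;
  peerRole: PeerRole;
}

// Splits matched peers into those whose socket would be closed and
// those reported back as skipped. Only `kick` skips control-plane.
function partitionTargets(
  verb: Verb,
  matched: PeerSummary[],
): { affected: string[]; skippedControlPlane: string[] } {
  const affected: string[] = [];
  const skippedControlPlane: string[] = [];
  for (const peer of matched) {
    if (verb === "kick" && peer.peerRole === "control-plane") {
      skippedControlPlane.push(peer.displayName);
    } else {
      affected.push(peer.displayName);
    }
  }
  return { affected, skippedControlPlane };
}

// Builds the ack payload; the additive field appears only when non-empty,
// so clients that predate it see the same shape as before.
function buildAck(verb: Verb, matched: PeerSummary[]): Record<string, unknown> {
  const { affected, skippedControlPlane } = partitionTargets(verb, matched);
  return {
    kicked: affected,
    affected,
    ...(skippedControlPlane.length > 0
      ? { skipped_control_plane: skippedControlPlane }
      : {}),
  };
}
```

Keeping the field absent rather than empty is what lets a client treat "field missing" and "nothing skipped" the same way.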
|
||||
|
||||
The CLI (`apps/cli/src/commands/kick.ts`) reads the new field and
|
||||
prints a clearer message: refused peers are listed, with the hint
|
||||
that `claudemesh ban <peer>` is the right tool to remove a member,
|
||||
or `claudemesh daemon down` to take a daemon offline locally.
|
||||
|
||||
`apps/broker/src/index.ts` adds `peerRole` to the in-memory
|
||||
`PeerConn` shape, populated from both connection paths
|
||||
(member-keyed `hello` → `"control-plane"`, per-launch
|
||||
`session_hello` → `"session"`). The DB-side role taxonomy is
|
||||
unchanged.
|
||||
|
||||
### Back-compat
|
||||
|
||||
- Older CLI clients ignore the new `skipped_control_plane` ack
|
||||
field; against a control-plane target their kick prints "No peers
|
||||
matched." because `affected` comes back empty.
|
||||
- Older brokers don't emit the field at all; newer CLI handles
|
||||
the absence (the new branch is only reached when the field is
|
||||
present and non-empty).
|
||||
- The new `peerRole` slot on `PeerConn` is filled at every
|
||||
`connections.set` callsite, so older code paths never read
|
||||
`undefined`.
|
||||
|
||||
### Tests
|
||||
|
||||
- `apps/broker/tests/kick-control-plane-skip.test.ts` — 5 cases
|
||||
covering the kick/disconnect × control-plane/session/service
|
||||
truth table.
|
||||
|
||||
## 1.34.14 (2026-05-04) — stale `CLAUDEMESH_CONFIG_DIR` falls back
|
||||
|
||||
`claudemesh launch` exports `CLAUDEMESH_CONFIG_DIR=<tmpdir>` to its
|
||||
spawned `claude` so the per-session mesh selection is isolated from
|
||||
`~/.claudemesh/config.json`. The tmpdir is `rmSync`'d on launch exit
|
||||
via the `process.on('exit', cleanup)` handler.
|
||||
|
||||
Footgun: if a later `claudemesh` invocation INHERITED that env — a
|
||||
Bash tool call inside Claude Code, a tmux pane that captured the env
|
||||
via `update-environment`, an exported var the user forgot to clear —
|
||||
the inherited path pointed at a tmpdir that no longer existed.
|
||||
Pre-1.34.14 we silently used the dead path, `readConfig()` came back
|
||||
empty, and the user saw "No meshes joined" from an otherwise-working
|
||||
install. Fish users hit it harder because fish has no `unset` —
|
||||
they had to discover `set -e CLAUDEMESH_CONFIG_DIR`.
|
||||
|
||||
`apps/cli/src/constants/paths.ts` now resolves `CONFIG_DIR` once via
|
||||
a memoized `resolveConfigDir()`:
|
||||
|
||||
1. No env var → `~/.claudemesh` (default, unchanged).
|
||||
2. Env points at a dir containing `config.json` → trust it. The
|
||||
legitimate per-session-launch case is byte-identical to before.
|
||||
3. Env set but stale (dir gone) → warn once on stderr (TTY-only —
|
||||
CI / MCP boot / piped scripts stay quiet) with a shell-specific
|
||||
unset hint, then fall back to `~/.claudemesh`.
|
||||
|
||||
The check is on the directory's existence, not on `config.json`,
|
||||
because a fresh-launch tmpdir legitimately has no `config.json` until
|
||||
the first write. The stale signature we catch is the outer launch's
|
||||
`rmSync(tmpDir, {recursive: true})` cleanup, which removes the
|
||||
directory entirely.
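
The three cases above can be sketched roughly as follows. This is a hedged illustration, not the contents of `paths.ts`: the injected `env` and `warn` parameters, the `resetForTest` helper, and the `SHELL`-based fish detection are assumptions made to keep the example self-contained. Following the paragraph above, the check is on the directory's existence.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

let cached: string | undefined;

// Sketch only; the real resolveConfigDir() may differ in detail.
function resolveConfigDir(
  env: Record<string, string | undefined> = process.env,
  warn: (msg: string) => void = (m) => process.stderr.write(m + "\n"),
): string {
  if (cached !== undefined) return cached; // memoized: resolve once

  const fallback = join(homedir(), ".claudemesh");
  const fromEnv = env.CLAUDEMESH_CONFIG_DIR;
  let resolved: string;

  if (!fromEnv) {
    resolved = fallback; // 1. no env var: default location
  } else if (existsSync(fromEnv)) {
    resolved = fromEnv; // 2. directory exists: trust the env
  } else {
    // 3. stale env: warn on a TTY only, then fall back.
    if (process.stderr.isTTY) {
      // Assumption: the shell is detected from $SHELL.
      const isFish = (env.SHELL ?? "").endsWith("fish");
      const hint = isFish
        ? "set -e CLAUDEMESH_CONFIG_DIR"
        : "unset CLAUDEMESH_CONFIG_DIR";
      warn(`CLAUDEMESH_CONFIG_DIR points at a missing directory; run \`${hint}\``);
    }
    resolved = fallback;
  }

  cached = resolved;
  return resolved;
}

// Hypothetical stand-in for the exported test reset hook.
function resetForTest(): void {
  cached = undefined;
}
```

Memoizing the result means the warning can fire at most once per process, which matches the "warn once" behavior described in case 3.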
|
||||
|
||||
The "no meshes" check from the original triage was deliberately NOT
|
||||
adopted: a launched session that legitimately joins one mesh would
|
||||
hit it.
|
||||
|
||||
No back-compat surface affected. No other files changed. `_resetPathsForTest()`
|
||||
exported for unit tests.
|
||||
|
||||
## 1.34.13 (2026-05-04) — MCP forwards session token on /v1/events
|
||||
|
||||
The 1.34.10 SSE demux + 1.34.11 inbox per-recipient column were both
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "claudemesh-cli",
|
||||
"version": "1.34.13",
|
||||
"version": "1.34.16",
|
||||
"description": "Peer mesh for Claude Code sessions — CLI + MCP server.",
|
||||
"keywords": [
|
||||
"claude-code",
|
||||
|
||||
@@ -76,12 +76,32 @@ export async function runKick(
|
||||
if ("error" in built) { render.err(String(built.error)); return EXIT.INVALID_ARGS; }
|
||||
|
||||
return await withMesh({ meshSlug }, async (client) => {
|
||||
const result = await client.sendAndWait(built as Record<string, unknown>) as { affected?: string[]; kicked?: string[] };
|
||||
const result = await client.sendAndWait(built as Record<string, unknown>) as {
|
||||
affected?: string[];
|
||||
kicked?: string[];
|
||||
// 1.34.15: broker refuses to kick control-plane WSes (they'd
|
||||
// just auto-reconnect). Older brokers don't emit this field.
|
||||
skipped_control_plane?: string[];
|
||||
};
|
||||
const peers = result?.affected ?? result?.kicked ?? [];
|
||||
if (peers.length === 0) render.info("No peers matched.");
|
||||
else {
|
||||
const skipped = result?.skipped_control_plane ?? [];
|
||||
|
||||
if (peers.length === 0 && skipped.length === 0) {
|
||||
render.info("No peers matched.");
|
||||
} else if (peers.length === 0 && skipped.length > 0) {
|
||||
render.warn(
|
||||
`${skipped.length} match(es) refused: ${skipped.join(", ")} — control-plane connections (daemon / dashboard) auto-reconnect, so kick is a no-op.`,
|
||||
"To take a daemon offline locally, run `claudemesh daemon down` on that machine. To remove a member from the mesh, use `claudemesh ban <peer>`.",
|
||||
);
|
||||
} else {
|
||||
render.ok(`Kicked ${peers.length} peer(s): ${peers.join(", ")}`);
|
||||
render.hint("Their Claude Code session ended. They can rejoin anytime by running `claudemesh`.");
|
||||
if (skipped.length > 0) {
|
||||
render.warn(
|
||||
`(also refused ${skipped.length} control-plane connection(s): ${skipped.join(", ")})`,
|
||||
"Daemon / dashboard connections auto-reconnect; kick is a no-op against them. Use `claudemesh ban <peer>` to remove a member entirely.",
|
||||
);
|
||||
}
|
||||
}
|
||||
return EXIT.SUCCESS;
|
||||
});
|
||||
|
||||
@@ -400,11 +400,13 @@ async function printBrokerWelcome(meshSlug: string): Promise<void> {
|
||||
}
|
||||
} catch { /* daemon unreachable — not fatal */ }
|
||||
|
||||
// Peer count (best-effort).
|
||||
// Peer count (best-effort). 1.34.15: scope to the launched mesh so
|
||||
// multi-mesh daemons don't inflate the welcome banner with peers
|
||||
// from other meshes the user didn't just attach to.
|
||||
let peerCount = -1;
|
||||
try {
|
||||
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||
const peers = (await tryListPeersViaDaemon()) ?? [];
|
||||
const peers = (await tryListPeersViaDaemon(meshSlug)) ?? [];
|
||||
peerCount = peers.filter((p) =>
|
||||
(p as { channel?: string }).channel !== "claudemesh-daemon",
|
||||
).length;
|
||||
|
||||
@@ -135,9 +135,17 @@ async function listPeersForMesh(slug: string): Promise<PeerRecord[]> {
|
||||
// lifecycle helper inside tryListPeersViaDaemon auto-spawns the
|
||||
// daemon if it's down and probes it for liveness — no separate bridge
|
||||
// tier is needed any more (1.28.0).
|
||||
//
|
||||
// 1.34.15: forward `slug` to the daemon as `?mesh=<slug>` so the
|
||||
// server-side aggregator narrows to the requested mesh. Pre-1.34.15
|
||||
// we called this with no argument, so a multi-mesh daemon returned
|
||||
// peers from every attached mesh and the renderer printed "peers on
|
||||
// flexicar" with cross-mesh rows mixed in. The daemon's
|
||||
// `meshFromCtx` already does the right scoping when the slug is
|
||||
// passed; the CLI just wasn't passing it.
|
||||
try {
|
||||
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||
const dr = await tryListPeersViaDaemon();
|
||||
const dr = await tryListPeersViaDaemon(slug);
|
||||
if (dr !== null) {
|
||||
return dr.map((p) => annotateSelf(p as PeerRecord, selfMemberPubkey, selfSessionPubkey));
|
||||
}
|
||||
|
||||
@@ -1,10 +1,82 @@
|
||||
import { existsSync } from "node:fs";
|
||||
import { homedir } from "node:os";
|
||||
import { join } from "node:path";
|
||||
|
||||
const home = homedir();
|
||||
const DEFAULT_CONFIG_DIR = join(home, ".claudemesh");
|
||||
|
||||
/**
 * Resolve `CONFIG_DIR` once, with stale-env detection.
 *
 * `claudemesh launch` exposes `CLAUDEMESH_CONFIG_DIR=<tmpdir>` to its
 * spawned `claude` so the per-session mesh selection is isolated from
 * `~/.claudemesh/config.json`. The tmpdir is rmSync'd on launch exit.
 *
 * Footgun: if a `claudemesh` invocation INHERITS that env from an
 * already-launched (or previously-launched) session (e.g. a Bash tool
 * call inside Claude Code, or a tmux pane that captured the env via
 * `update-environment`), the inherited path may point at a tmpdir that
 * no longer exists. Pre-1.34.14 we silently used the dead path,
 * `readConfig()` came back empty, and the user saw "No meshes joined"
 * from an otherwise-working install.
 *
 * Resolution rules:
 *   1. No env var → `~/.claudemesh` (default).
 *   2. Env points at a directory that exists → trust it, with or
 *      without a `config.json` (the legitimate per-session-launch case).
 *   3. Env set but stale (directory missing) → warn once on stderr
 *      (TTY-only) and fall back to `~/.claudemesh`.
 *
 * Memoized: resolves once on first access. Mid-process env mutations
 * are intentionally ignored; paths must stay stable across one CLI
 * invocation.
 */
|
||||
let _resolvedConfigDir: string | null = null;
|
||||
let _warnedStaleEnv = false;
|
||||
|
||||
function resolveConfigDir(): string {
|
||||
if (_resolvedConfigDir !== null) return _resolvedConfigDir;
|
||||
const envDir = process.env.CLAUDEMESH_CONFIG_DIR;
|
||||
if (!envDir) {
|
||||
_resolvedConfigDir = DEFAULT_CONFIG_DIR;
|
||||
return DEFAULT_CONFIG_DIR;
|
||||
}
|
||||
// Trust the env when it resolves to a real directory. We check
|
||||
// the DIR (not `config.json`) because the legitimate "fresh launch
|
||||
// before any write" case has the dir but no config.json yet.
|
||||
// The stale signature we want to catch is `rmSync(tmpDir,
|
||||
// {recursive: true})` from the outer launch's cleanup — that
|
||||
// removes the directory entirely, so a missing dir is the
|
||||
// unambiguous "stale" signal.
|
||||
if (existsSync(envDir)) {
|
||||
_resolvedConfigDir = envDir;
|
||||
return envDir;
|
||||
}
|
||||
// Stale: env set but the dir is gone. Most likely the outer
|
||||
// launch's cleanup ran and we inherited its (now-dead) tmpdir
|
||||
// path. Fall back to default and warn the user once on stderr —
|
||||
// only when attached to a TTY, so non-interactive callers (CI,
|
||||
// MCP boot, scripts piping stdout) stay quiet.
|
||||
if (!_warnedStaleEnv && process.stderr.isTTY) {
|
||||
_warnedStaleEnv = true;
|
||||
const unsetHint =
|
||||
process.env.SHELL?.endsWith("fish")
|
||||
? "set -e CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE"
|
||||
: "unset CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE";
|
||||
process.stderr.write(
|
||||
`claudemesh: ignoring stale CLAUDEMESH_CONFIG_DIR=${envDir} (directory does not exist); using ${DEFAULT_CONFIG_DIR}.\n`
|
||||
+ ` Hint: this is usually a leftover env from a previous \`claudemesh launch\`. Clean it with:\n`
|
||||
+ ` ${unsetHint}\n`,
|
||||
);
|
||||
}
|
||||
_resolvedConfigDir = DEFAULT_CONFIG_DIR;
|
||||
return DEFAULT_CONFIG_DIR;
|
||||
}
|
||||
|
||||
export const PATHS = {
|
||||
CONFIG_DIR: process.env.CLAUDEMESH_CONFIG_DIR || join(home, ".claudemesh"),
|
||||
get CONFIG_DIR() {
|
||||
return resolveConfigDir();
|
||||
},
|
||||
get CONFIG_FILE() {
|
||||
return join(this.CONFIG_DIR, "config.json");
|
||||
},
|
||||
@@ -20,3 +92,12 @@ export const PATHS = {
|
||||
CLAUDE_JSON: join(home, ".claude.json"),
|
||||
CLAUDE_SETTINGS: join(home, ".claude", "settings.json"),
|
||||
} as const;
|
||||
|
||||
/**
|
||||
* Test-only: reset the memoized resolution. Not exported from the
|
||||
* package barrel; reach in via the relative path from a test file.
|
||||
*/
|
||||
export function _resetPathsForTest(): void {
|
||||
_resolvedConfigDir = null;
|
||||
_warnedStaleEnv = false;
|
||||
}
|
||||
|
||||
@@ -139,6 +139,25 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
|
||||
* but ignores the rejection — by then the close handler has already
|
||||
* scheduled its own reconnect).
|
||||
*/
|
||||
// Liveness watchdog: same cadence (30s) as the broker's outbound
|
||||
// ping. Two jobs per tick:
|
||||
// 1. If we haven't heard from the broker in >75s (2.5x the ping
// cadence, which tolerates two missed pings plus some slack), terminate
|
||||
// the socket. Fires the close handler → backoff reconnect runs
|
||||
// its normal path. This is what catches NAT-dropped half-dead
|
||||
// connections that the kernel won't RST for ~2 hours.
|
||||
// 2. Otherwise, send our own ping. The broker's `ws` library
|
||||
// auto-replies with a pong, which bumps lastActivity. This
|
||||
// keeps the broker's stale-pong watchdog seeing us as alive.
|
||||
//
|
||||
// Bare `ping` and `pong` events both bump lastActivity, as does
|
||||
// any inbound application message — any sign of life resets the
|
||||
// dead-man's-switch.
|
||||
const PING_INTERVAL_MS = 30_000;
|
||||
const STALE_THRESHOLD_MS = 75_000;
|
||||
let lastActivity = Date.now();
|
||||
let watchdogTimer: NodeJS.Timeout | null = null;
|
||||
|
||||
const openOnce = (): Promise<void> => {
|
||||
if (closed) return Promise.reject(new Error("client_closed"));
|
||||
setStatus("connecting");
|
||||
@@ -146,6 +165,7 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
|
||||
log("info", "ws_open_attempt", { url: opts.url });
|
||||
const sock = new WebSocket(opts.url);
|
||||
ws = sock;
|
||||
lastActivity = Date.now();
|
||||
|
||||
return new Promise<void>((resolve, reject) => {
|
||||
sock.on("open", () => {
|
||||
@@ -170,6 +190,7 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
|
||||
});
|
||||
|
||||
sock.on("message", (raw) => {
|
||||
lastActivity = Date.now();
|
||||
let msg: Record<string, unknown>;
|
||||
try { msg = JSON.parse(raw.toString()) as Record<string, unknown>; }
|
||||
catch { return; }
|
||||
@@ -179,6 +200,18 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
|
||||
setStatus("open");
|
||||
reconnectAttempt = 0;
|
||||
log("info", "ws_hello_acked", { url: opts.url });
|
||||
// Start liveness watchdog only after a successful handshake.
|
||||
if (watchdogTimer) clearInterval(watchdogTimer);
|
||||
watchdogTimer = setInterval(() => {
|
||||
if (sock.readyState !== sock.OPEN) return;
|
||||
const idle = Date.now() - lastActivity;
|
||||
if (idle > STALE_THRESHOLD_MS) {
|
||||
log("warn", "ws_stale_terminate", { url: opts.url, idle_ms: idle });
|
||||
try { sock.terminate(); } catch { /* socket already gone */ }
|
||||
return;
|
||||
}
|
||||
try { sock.ping(); } catch { /* ignore */ }
|
||||
}, PING_INTERVAL_MS);
|
||||
resolve();
|
||||
return;
|
||||
}
|
||||
@@ -186,8 +219,12 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
|
||||
opts.onMessage(msg);
|
||||
});
|
||||
|
||||
sock.on("ping", () => { lastActivity = Date.now(); });
|
||||
sock.on("pong", () => { lastActivity = Date.now(); });
|
||||
|
||||
sock.on("close", (code, reason) => {
|
||||
if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
|
||||
if (watchdogTimer) { clearInterval(watchdogTimer); watchdogTimer = null; }
|
||||
const reasonStr = reason.toString("utf8");
|
||||
log("warn", "ws_closed", { url: opts.url, code, reason: reasonStr, status });
|
||||
opts.onBeforeReconnect?.(code, reasonStr);
|
||||
@@ -227,6 +264,7 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
|
||||
closed = true;
|
||||
if (reconnectTimer) { clearTimeout(reconnectTimer); reconnectTimer = null; }
|
||||
if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
|
||||
if (watchdogTimer) { clearInterval(watchdogTimer); watchdogTimer = null; }
|
||||
try { ws?.close(); } catch { /* ignore */ }
|
||||
setStatus("closed");
|
||||
},
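
The watchdog's per-tick decision can be isolated as a pure function, which makes the timing easy to reason about. The sketch below is illustrative only: `watchdogAction` and `ticksUntilTerminate` are hypothetical helpers, not part of the diff, and the simulation assumes the peer goes silent exactly at the moment a tick fires.

```typescript
// Same constants as the diff above.
const PING_INTERVAL_MS = 30_000;
const STALE_THRESHOLD_MS = 75_000;

type WatchdogAction = "terminate" | "ping";

// Hypothetical pure helper: what a single watchdog tick should do,
// given the current time and the timestamp of the last sign of life.
function watchdogAction(nowMs: number, lastActivityMs: number): WatchdogAction {
  return nowMs - lastActivityMs > STALE_THRESHOLD_MS ? "terminate" : "ping";
}

// Hypothetical simulation: count ticks (one every PING_INTERVAL_MS)
// until the watchdog terminates a connection that went silent at
// `lastActivityMs`.
function ticksUntilTerminate(lastActivityMs: number): number {
  let tick = 0;
  let now = lastActivityMs;
  do {
    tick += 1;
    now += PING_INTERVAL_MS;
  } while (watchdogAction(now, lastActivityMs) === "ping");
  return tick;
}

console.log(ticksUntilTerminate(0)); // 3
```

Because the check only runs on 30 s ticks, a 75 s threshold is first exceeded on the third tick, so under this assumption a dead socket is terminated roughly 90 s after the last activity rather than at exactly 75 s.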
|
||||
|
||||
57
apps/cli/tests/unit/paths-stale-env.test.ts
Normal file
@@ -0,0 +1,57 @@
|
||||
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
|
||||
import { mkdirSync, rmSync, existsSync } from "node:fs";
|
||||
import { join } from "node:path";
|
||||
import { tmpdir, homedir } from "node:os";
|
||||
|
||||
/** Each test dynamically imports paths.ts and calls
 * `_resetPathsForTest()`. The import itself is module-cached, so the
 * reset is what keeps memoization from leaking across cases. */
|
||||
|
||||
const TEST_DIR = join(tmpdir(), "claudemesh-paths-test-" + Date.now());
|
||||
|
||||
describe("paths CONFIG_DIR resolution", () => {
|
||||
beforeEach(() => {
|
||||
delete process.env.CLAUDEMESH_CONFIG_DIR;
|
||||
if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
|
||||
});
|
||||
afterEach(() => {
|
||||
delete process.env.CLAUDEMESH_CONFIG_DIR;
|
||||
if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
it("falls back to ~/.claudemesh when env var is unset", async () => {
|
||||
const mod = await import("~/constants/paths.js");
|
||||
mod._resetPathsForTest();
|
||||
expect(mod.PATHS.CONFIG_DIR).toBe(join(homedir(), ".claudemesh"));
|
||||
});
|
||||
|
||||
it("honors CLAUDEMESH_CONFIG_DIR when the dir exists, even without config.json", async () => {
|
||||
mkdirSync(TEST_DIR, { recursive: true });
|
||||
process.env.CLAUDEMESH_CONFIG_DIR = TEST_DIR;
|
||||
const mod = await import("~/constants/paths.js");
|
||||
mod._resetPathsForTest();
|
||||
expect(mod.PATHS.CONFIG_DIR).toBe(TEST_DIR);
|
||||
});
|
||||
|
||||
it("falls back to default when env points at a missing dir (stale-tmpdir case)", async () => {
|
||||
process.env.CLAUDEMESH_CONFIG_DIR = "/var/folders/_nonexistent_claudemesh_dir_xyz123";
|
||||
const mod = await import("~/constants/paths.js");
|
||||
mod._resetPathsForTest();
|
||||
// Suppress the stderr warning to keep test output clean
|
||||
const stderr = vi.spyOn(process.stderr, "write").mockImplementation(() => true);
|
||||
try {
|
||||
expect(mod.PATHS.CONFIG_DIR).toBe(join(homedir(), ".claudemesh"));
|
||||
} finally {
|
||||
stderr.mockRestore();
|
||||
}
|
||||
});
|
||||
|
||||
it("memoizes — second access returns the same path even if env changes mid-process", async () => {
|
||||
mkdirSync(TEST_DIR, { recursive: true });
|
||||
process.env.CLAUDEMESH_CONFIG_DIR = TEST_DIR;
|
||||
const mod = await import("~/constants/paths.js");
|
||||
mod._resetPathsForTest();
|
||||
const first = mod.PATHS.CONFIG_DIR;
|
||||
process.env.CLAUDEMESH_CONFIG_DIR = "/something/else";
|
||||
expect(mod.PATHS.CONFIG_DIR).toBe(first);
|
||||
});
|
||||
});
|
||||
@@ -1,55 +1,161 @@
|
||||
import Link from "next/link";
|
||||
|
||||
import {
|
||||
CHANGELOG_ENTRIES,
|
||||
CHANGELOG_TYPE_COLOR,
|
||||
CHANGELOG_TYPE_LABELS,
|
||||
} from "~/modules/marketing/home/changelog-data";
|
||||
|
||||
export const metadata = {
|
||||
title: "Changelog — claudemesh",
|
||||
description: "Release history for claudemesh-cli.",
|
||||
description:
|
||||
"Release history for claudemesh-cli — every shipped version, with the why behind it.",
|
||||
};
|
||||
|
||||
const ENTRIES = [
|
||||
{ version: "0.1.4", date: "2026-04-06", type: "feat", summary: "Stateful welcome screen, PROTOCOL.md, THREAT_MODEL.md, Windows CI matrix" },
|
||||
{ version: "0.1.3", date: "2026-04-05", type: "feat", summary: "claudemesh --version, status, doctor commands" },
|
||||
{ version: "0.1.2", date: "2026-04-05", type: "feat", summary: "claudemesh launch command, transparency banner, decrypt fix, Windows support" },
|
||||
];
|
||||
|
||||
const TYPE_LABELS: Record<string, string> = { feat: "Feature", fix: "Fix", docs: "Docs" };
|
||||
const TYPE_COLORS: Record<string, string> = { feat: "bg-[var(--cm-clay)]", fix: "bg-[var(--cm-cactus)]", docs: "bg-[var(--cm-oat)]" };
|
||||
|
||||
export default function ChangelogPage() {
|
||||
return (
|
||||
<section className="mx-auto max-w-3xl px-6 py-24 md:py-32">
|
||||
<h1
|
||||
className="text-[clamp(2rem,4.5vw,3rem)] font-medium leading-[1.1] text-[var(--cm-fg)]"
|
||||
style={{ fontFamily: "var(--cm-font-serif)" }}
|
||||
>
|
||||
Changelog
|
||||
</h1>
|
||||
<p
|
||||
className="mt-4 text-[15px] text-[var(--cm-fg-secondary)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
Every shipped version of claudemesh-cli.
|
||||
</p>
|
||||
<div className="mt-12 space-y-8">
|
||||
{ENTRIES.map((entry) => (
|
||||
<article key={entry.version} className="border-b border-[var(--cm-border)] pb-6">
|
||||
<div className="flex items-center gap-3">
|
||||
<span
|
||||
className={`rounded-[4px] px-2 py-0.5 text-[10px] font-medium uppercase tracking-wider text-[var(--cm-bg)] ${TYPE_COLORS[entry.type] || "bg-[var(--cm-fg-tertiary)]"}`}
|
||||
style={{ fontFamily: "var(--cm-font-mono)" }}
|
||||
>
|
||||
{TYPE_LABELS[entry.type] || entry.type}
|
||||
</span>
|
||||
<span className="text-[18px] font-medium text-[var(--cm-fg)]" style={{ fontFamily: "var(--cm-font-serif)" }}>
|
||||
v{entry.version}
|
||||
</span>
|
||||
<time dateTime={entry.date} className="text-[11px] text-[var(--cm-fg-tertiary)]" style={{ fontFamily: "var(--cm-font-mono)" }}>
|
||||
{new Date(entry.date).toLocaleDateString("en-US", { year: "numeric", month: "short", day: "numeric" })}
|
||||
</time>
|
||||
</div>
|
||||
<p className="mt-2 text-[14px] leading-[1.6] text-[var(--cm-fg-secondary)]" style={{ fontFamily: "var(--cm-font-sans)" }}>
|
||||
{entry.summary}
|
||||
</p>
|
||||
</article>
|
||||
))}
|
||||
<div className="mb-12">
|
||||
<p
|
||||
className="text-[11px] uppercase tracking-[0.2em] text-[var(--cm-fg-tertiary)]"
|
||||
style={{ fontFamily: "var(--cm-font-mono)" }}
|
||||
>
|
||||
claudemesh-cli · release log
|
||||
</p>
|
||||
<h1
|
||||
className="mt-3 text-[clamp(2rem,4.5vw,3rem)] font-medium leading-[1.1] text-[var(--cm-fg)]"
|
||||
style={{ fontFamily: "var(--cm-font-serif)" }}
|
||||
>
|
||||
Changelog
|
||||
</h1>
|
||||
<p
|
||||
className="mt-4 max-w-xl text-[15px] leading-[1.65] text-[var(--cm-fg-secondary)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
Hand-picked, load-bearing ships from{" "}
|
||||
<span className="text-[var(--cm-fg)]">v0.1.0</span> through{" "}
|
||||
<span className="text-[var(--cm-clay)]">v1.34.15</span>. For the
|
||||
byte-level diff, the canonical{" "}
|
||||
<Link
|
||||
href="https://github.com/alezmad/claudemesh/blob/main/apps/cli/CHANGELOG.md"
|
||||
className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
|
||||
>
|
||||
CHANGELOG.md
|
||||
</Link>{" "}
|
||||
lives in the repo.
|
||||
</p>
|
||||
</div>
|
||||
|
||||
{/* Vertical timeline rail */}
|
||||
<div className="relative">
|
||||
<div
|
||||
className="absolute left-[7px] top-2 hidden h-full w-px md:block"
|
||||
style={{
|
||||
background:
|
||||
"linear-gradient(to bottom, var(--cm-clay) 0%, var(--cm-fig) 30%, var(--cm-cactus) 60%, transparent 100%)",
|
||||
}}
|
||||
/>
|
||||
|
||||
<div className="space-y-10">
|
||||
{CHANGELOG_ENTRIES.map((entry, idx) => (
|
||||
<article
|
||||
key={entry.version + entry.date}
|
||||
className="relative md:pl-10"
|
||||
>
|
||||
{/* Dot on rail */}
|
||||
<div
|
||||
className="absolute left-0 top-[10px] hidden h-[15px] w-[15px] rounded-full border-2 md:block"
|
||||
style={{
|
||||
borderColor: CHANGELOG_TYPE_COLOR[entry.type],
|
||||
backgroundColor: "var(--cm-bg)",
|
||||
}}
|
||||
>
|
||||
<div
|
||||
className="absolute inset-[3px] rounded-full"
|
||||
style={{
|
||||
backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
|
||||
opacity: idx === 0 ? 1 : 0.5,
|
||||
}}
|
||||
/>
|
||||
</div>
|
||||
|
||||
<header className="mb-3 flex flex-wrap items-baseline gap-x-3 gap-y-1">
|
||||
<span
|
||||
className="rounded-[3px] px-1.5 py-0.5 text-[10px] font-medium uppercase tracking-wider"
|
||||
style={{
|
||||
fontFamily: "var(--cm-font-mono)",
|
||||
backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
|
||||
color: "var(--cm-gray-900)",
|
||||
}}
|
||||
>
|
||||
{CHANGELOG_TYPE_LABELS[entry.type]}
|
||||
</span>
|
||||
<span
|
||||
className="text-[18px] font-medium text-[var(--cm-fg)]"
|
||||
style={{ fontFamily: "var(--cm-font-serif)" }}
|
||||
>
|
||||
v{entry.version}
|
||||
</span>
|
||||
<time
|
||||
dateTime={entry.date}
|
||||
className="text-[11px] text-[var(--cm-fg-tertiary)]"
|
||||
style={{ fontFamily: "var(--cm-font-mono)" }}
|
||||
>
|
||||
{new Date(entry.date).toLocaleDateString("en-US", {
|
||||
year: "numeric",
|
||||
month: "short",
|
||||
day: "numeric",
|
||||
})}
|
||||
</time>
|
||||
</header>
|
||||
|
||||
<h2
|
||||
className="text-[15px] font-medium text-[var(--cm-fg)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
{entry.title}
|
||||
</h2>
|
||||
|
||||
<p
|
||||
className="mt-2 text-[14px] leading-[1.7] text-[var(--cm-fg-secondary)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
{entry.summary}
|
||||
</p>
|
||||
</article>
|
||||
))}
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<footer className="mt-20 border-t border-[var(--cm-border)] pt-8">
|
||||
<p
|
||||
className="text-[13px] text-[var(--cm-fg-tertiary)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
Tracked at{" "}
|
||||
<Link
|
||||
href="https://github.com/alezmad/claudemesh/blob/main/docs/roadmap.md"
|
||||
className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
|
||||
>
|
||||
docs/roadmap.md
|
||||
</Link>
|
||||
. Specs at{" "}
|
||||
<Link
|
||||
href="https://github.com/alezmad/claudemesh/tree/main/.artifacts/specs"
|
||||
className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
|
||||
>
|
||||
.artifacts/specs/
|
||||
</Link>
|
||||
. Tagged binaries on{" "}
|
||||
<Link
|
||||
href="https://github.com/alezmad/claudemesh/releases"
|
||||
className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
|
||||
>
|
||||
GitHub Releases
|
||||
</Link>
|
||||
.
|
||||
</p>
|
||||
</footer>
|
||||
</section>
|
||||
);
|
||||
}
|
||||
|
||||
@@ -3,6 +3,7 @@ import { Features } from "~/modules/marketing/home/features";
|
||||
import { WhereMeshFits } from "~/modules/marketing/home/where-mesh-fits";
|
||||
import { WhatIsClaudemesh } from "~/modules/marketing/home/what-is-claudemesh";
|
||||
import { Timeline } from "~/modules/marketing/home/timeline";
|
||||
import { LatestReleases } from "~/modules/marketing/home/latest-releases";
|
||||
import { Pricing } from "~/modules/marketing/home/pricing";
|
||||
import { FAQ } from "~/modules/marketing/home/faq";
|
||||
import { CallToAction } from "~/modules/marketing/home/cta";
|
||||
@@ -22,6 +23,7 @@ const HomePage = () => {
|
||||
<WhereMeshFits />
|
||||
<WhatIsClaudemesh />
|
||||
<Timeline />
|
||||
<LatestReleases count={5} />
|
||||
<Pricing />
|
||||
<FAQ />
|
||||
<CallToAction />
|
||||
|
||||
168
apps/web/src/modules/marketing/home/changelog-data.ts
Normal file
@@ -0,0 +1,168 @@
|
||||
/**
|
||||
* Single source of truth for the curated release log surfaced on:
|
||||
* - /changelog (full timeline)
|
||||
* - / (Latest Releases compact strip)
|
||||
*
|
||||
* Lives outside `app/.../page.tsx` because Next.js's app-router type generator
|
||||
* rejects non-conforming exports from route files (only `default`, `metadata`,
|
||||
* `dynamic`, etc. are allowed). Importing data from a plain module sidesteps
|
||||
* the constraint without changing route semantics.
|
||||
*
|
||||
* Hand-picked load-bearing ships, newest first. For the byte-level history
|
||||
* see `apps/cli/CHANGELOG.md` in the repo.
|
||||
*/
|
||||
|
||||
export type ChangelogEntry = {
|
||||
version: string;
|
||||
date: string;
|
||||
type: "feat" | "fix" | "docs" | "perf" | "infra";
|
||||
title: string;
|
||||
summary: string;
|
||||
};
|
||||
|
||||
export const CHANGELOG_ENTRIES: ChangelogEntry[] = [
|
||||
{
|
||||
version: "1.34.15",
|
||||
date: "2026-05-04",
|
||||
type: "fix",
|
||||
title: "peer list --mesh scopes; kick refuses control-plane",
|
||||
summary:
|
||||
"Two follow-ups from the multi-session correctness train. peer list --mesh now forwards the slug to the daemon (was aggregating across all attached meshes). The broker refuses no-op kicks against control-plane connections (daemon, dashboard) — they auto-reconnected within seconds — and surfaces them in a new additive ack field. Soft `disconnect` keeps old behavior.",
|
||||
},
|
||||
{
|
||||
version: "1.34.14",
|
||||
date: "2026-05-04",
|
||||
type: "fix",
|
||||
title: "stale CLAUDEMESH_CONFIG_DIR falls back",
|
||||
summary:
|
||||
"When the launched-session env leaked into a later CLI invocation and pointed at a tmpdir that no longer existed, the resolver silently used the dead path and showed “No meshes joined”. Now memoized: env unset → default; env points at a real dir → trust; env set but dir gone → TTY-only stderr warning + fallback to ~/.claudemesh.",
|
||||
},
|
||||
{
|
||||
version: "1.34.7 → 1.34.13",
|
||||
date: "2026-05-04",
|
||||
type: "fix",
|
||||
title: "multi-session correctness train",
|
||||
summary:
|
||||
"Seven releases over a few hours that took claudemesh from “works for one session” to “internally consistent for N sessions on one daemon.” Per-session SSE demux at the bind layer, inbox per-recipient column, daemon detached by default, MCP forwards session token on /v1/events. Architecture invariant: every shared store / channel scopes by recipient.",
|
||||
},
|
||||
{
|
||||
version: "1.32.0",
|
||||
date: "2026-05-04",
|
||||
type: "feat",
|
||||
title: "multi-session UX bundle",
|
||||
summary:
|
||||
"Self-identity via session pubkey, `--self` fan-out for member-pubkey targeting, broker welcome on launch (broker state + peer count + unread inbox). Resolves hex prefixes to full pubkeys before send.",
|
||||
},
|
||||
{
|
||||
version: "1.30.0",
|
||||
date: "2026-05-04",
|
||||
type: "feat",
|
||||
title: "per-session broker presence",
|
||||
summary:
|
||||
"Two `claudemesh launch` sessions in the same cwd finally see each other in `peer list`. Each session has a long-lived broker presence row owned by the daemon, identified by a per-launch ephemeral keypair vouched by the member's stable key. Broker `session_hello` handler with parent-attestation TTL and session-signature checks.",
|
||||
},
|
||||
{
|
||||
version: "1.26.0 → 1.29.0",
|
||||
date: "2026-05-04",
|
||||
type: "feat",
|
||||
title: "multi-mesh daemon · per-session IPC tokens",
|
||||
summary:
|
||||
"One daemon process attaches to every joined mesh simultaneously. Aggregate read routes (/v1/peers, /v1/skills) tag each record with its mesh; explicit ?mesh=<slug> narrows server-side. Per-session IPC tokens scoped to tmpdir mode-0600 so CLI invocations from inside a launched session auto-attribute to its workspace. Self-healing daemon lifecycle (auto-spawn under file-lock, version probe).",
|
||||
},
|
||||
{
|
||||
version: "1.24.0",
|
||||
date: "2026-05-03",
|
||||
type: "feat",
|
||||
title: "daemon required + thin MCP",
|
||||
summary:
|
||||
"MCP server shrinks from 979 LoC to ~200 LoC of push-pipe. The daemon owns the broker WS and feeds the MCP push channel over IPC SSE. `claudemesh install` auto-installs and starts the daemon service. `claudemesh launch` ensures daemon is running before spawning Claude.",
|
||||
},
|
||||
{
|
||||
version: "0.9.0 (1.22.0)",
|
||||
date: "2026-05-03",
|
||||
type: "feat",
|
||||
title: "daemon foundation",
|
||||
summary:
|
||||
"Long-lived process holding one broker WS per attached mesh, durable outbox/inbox in SQLite, IPC over UDS (+ optional loopback TCP w/ bearer), SSE event stream. Caller-stable idempotency on every send. Service install (launchd / systemd-user). Outbox CLI with atomic abort+insert on requeue. Host-fingerprint pin on first run.",
|
||||
},
|
||||
{
|
||||
version: "0.7.0 (1.21.0)",
|
||||
date: "2026-05-03",
|
||||
type: "infra",
|
||||
title: "slug = identifier",
|
||||
summary:
|
||||
"Pre-launch correction of generic SaaS scaffolding. mesh.name and mesh.slug collapse — slug IS the identifier. `claudemesh rename <old-slug> <new-slug>` is the entire rename surface. CLI picker drops the (parens). Server PATCH /api/cli/meshes/:slug body becomes `{ slug }`.",
|
||||
},
|
||||
{
|
||||
version: "0.4.0 → 0.5.2 (1.10.0–1.18.0)",
|
||||
date: "2026-05-03",
|
||||
type: "feat",
|
||||
title: "me/* cross-mesh aggregation",
|
||||
summary:
|
||||
"First cross-mesh read-aggregating verbs. /v1/me/workspace, /v1/me/topics, /v1/me/notifications, /v1/me/activity, /v1/me/search — every aggregating read verb has CLI + web parity. Default-aggregation for `topic list`, `notification list`, `task list`, `state list`, `memory recall` when no --mesh is passed. file share / get with same-host fast path.",
|
||||
},
|
||||
{
|
||||
version: "0.3.0 (1.8.0)",
|
||||
date: "2026-05-02",
|
||||
type: "feat",
|
||||
title: "per-topic encryption (CLI + web)",
|
||||
summary:
|
||||
"Topics generate a 32-byte symmetric key on creation; broker seals via crypto_box for the creator. Pending-seals endpoint, seal POST, claudemesh topic post for encrypted REST sends, decrypt-on-render in topic tail, 30s background re-seal loop. Web side: browser-side persistent ed25519 identity in IndexedDB + encrypt-on-send / decrypt-on-render.",
|
||||
},
|
||||
{
|
||||
version: "1.7.0",
|
||||
date: "2026-05-02",
|
||||
type: "feat",
|
||||
title: "demo cut: topic tail, member list, notifications",
|
||||
summary:
|
||||
"Member sidebar in chat panel with names, online dots, presence summaries. Topic search + member-mention autocomplete. Notification feed at /dashboard listing every @<your-name> reference across all meshes (last 7 days). CLI parity: `claudemesh topic tail` (live SSE consumer), `claudemesh member list`, `claudemesh notification list`.",
|
||||
},
|
||||
{
|
||||
version: "0.2.0 (1.6.0)",
|
||||
date: "2026-05-02",
|
||||
type: "feat",
|
||||
title: "topics + REST gateway + bridge peers",
|
||||
summary:
|
||||
"Topics (channel pub/sub) with mesh = trust boundary, group = identity tag, topic = conversation scope — three orthogonal axes. API keys for non-WebSocket clients. REST /api/v1/* with bearer-token auth (messages, topics, peers, history). Bridge peers belonging to two meshes forwarding a topic between them. Humans-as-peers — peer_type: human plumbed end-to-end.",
|
||||
},
|
||||
{
|
||||
version: "1.5.0",
|
||||
date: "2026-05-02",
|
||||
type: "feat",
|
||||
title: "CLI-first architecture lock-in",
|
||||
summary:
|
||||
"Tool-less MCP — tools/list returns []. Inbound peer messages still arrive as experimental.claude/channel notifications mid-turn. Bundle size −42%. Resource-noun-verb CLI (peer list, message send, memory recall). Bundled claudemesh skill installed to ~/.claude/skills/. Unix-socket bridge for warm WS reuse (~220 ms warm vs ~600 ms cold). Policy engine + audit log.",
|
||||
},
|
||||
{
|
||||
version: "1.0.0-alpha",
|
||||
date: "2026-04-15",
|
||||
type: "feat",
|
||||
title: "single-binary distribution + per-peer caps",
|
||||
summary:
|
||||
"curl -fsSL claudemesh.com/install | sh downloads the right binary (darwin/linux/windows × x64/arm64). claudemesh:// URL scheme makes invite emails one-click. Per-peer capability grants: claudemesh grant/revoke/block/grants enforced server-side. Encrypted backup / restore with Argon2id + XChaCha20-Poly1305. Safety numbers (`claudemesh verify <peer>`).",
|
||||
},
|
||||
{
|
||||
version: "0.1.0",
|
||||
date: "2026-04-04",
|
||||
type: "feat",
|
||||
title: "public launch",
|
||||
summary:
|
||||
"Direct peer-to-peer messaging through a hosted broker, ready for real teams. End-to-end encryption — crypto_box direct, crypto_secretbox group. Signed ed25519 identities + signed invite links (ic://join/...). Hello-sig handshake auth. Hosted broker at wss://ic.claudemesh.com/ws. Claude Code MCP tools: list_peers, send_message, check_messages, set_summary, set_status.",
|
||||
},
|
||||
];
|
||||
|
||||
export const CHANGELOG_TYPE_LABELS: Record<ChangelogEntry["type"], string> = {
|
||||
feat: "Feature",
|
||||
fix: "Fix",
|
||||
docs: "Docs",
|
||||
perf: "Perf",
|
||||
infra: "Infra",
|
||||
};
|
||||
|
||||
export const CHANGELOG_TYPE_COLOR: Record<ChangelogEntry["type"], string> = {
|
||||
feat: "var(--cm-clay)",
|
||||
fix: "var(--cm-cactus)",
|
||||
docs: "var(--cm-oat)",
|
||||
perf: "var(--cm-fig)",
|
||||
infra: "var(--cm-fg-tertiary)",
|
||||
};
|
||||
@@ -32,9 +32,9 @@ export const CallToAction = () => {
|
||||
className="mx-auto mt-8 max-w-2xl text-lg leading-[1.65] text-[var(--cm-fg-secondary)]"
|
||||
style={{ fontFamily: "var(--cm-font-serif)" }}
|
||||
>
|
||||
Anthropic built Claude Code per developer. The next unlock is
|
||||
between developers. Hosted on claudemesh.com or self-hosted in
|
||||
your VPC — same CLI, same features, same encryption.
|
||||
Anthropic Agent Teams stops at the edge of one laptop. claudemesh
|
||||
starts there — across machines, users, and organizations. Hosted
|
||||
on claudemesh.com or self-hosted in your VPC, same CLI either way.
|
||||
</p>
|
||||
</Reveal>
|
||||
<Reveal delay={3}>
|
||||
|
||||
@@ -5,7 +5,7 @@ import { Reveal } from "./_reveal";
|
||||
const ITEMS = [
|
||||
{
|
||||
q: "Is claudemesh free?",
|
||||
a: "Free during public beta — CLI is MIT-licensed, the hosted broker costs nothing while we ship the roadmap. Paid tiers launch when the dashboard ships. Beta users keep the free plan for life.",
|
||||
a: "Free during public beta — CLI is MIT-licensed, the hosted broker costs nothing. Paid tiers launch when we exit beta and add team-scale features (SSO, audit retention, dedicated brokers). Beta users keep the free plan for life.",
|
||||
},
|
||||
{
|
||||
q: "How do I get started?",
|
||||
@@ -33,7 +33,11 @@ const ITEMS = [
|
||||
},
|
||||
{
|
||||
q: "How is this different from MCP?",
|
||||
a: "MCP connects one Claude to tools and services. claudemesh connects many Claudes to each other. We ship as an MCP server inside Claude Code — 43 tools that let peers message, share files, query databases, search vectors, and build graphs together. From the agent's view, other peers look like callable tools. It composes on top of MCP; it doesn't replace it.",
|
||||
a: "MCP connects one Claude to tools and services. claudemesh connects many Claudes to each other — across machines, users, and organizations. As of v1.5.0 the MCP shim is intentionally thin: tools/list returns []. Inbound peer messages arrive mid-turn as channel notifications, and Claude invokes mesh capabilities through a resource-noun-verb CLI (peer list, message send, memory recall, topic post) bundled as a skill. claudemesh composes on top of MCP; it doesn't replace it.",
|
||||
},
|
||||
{
|
||||
q: "How is this different from Anthropic's Agent Teams?",
|
||||
a: "Anthropic's experimental Agent Teams (shipped Feb 2026, Claude Code v2.1.32+) coordinates multiple Claude Code sessions inside ONE Unix user's ~/.claude/ directory on ONE machine. Mailbox lives in process. Task list is a markdown file. Lead is fixed for the team's lifetime. Cleanup wipes the state. claudemesh runs across machines, users, and organizations. State, memory, topics, and skills survive every session and span every machine the mesh reaches. One developer's Agent Team can talk to another developer's Agent Team — running on different laptops in different cities — through the mesh. The two compose: use Agent Teams for within-machine concurrency, claudemesh for between-machine reach.",
|
||||
},
|
||||
{
|
||||
q: "What persistence backends does the mesh include?",
|
||||
@@ -53,7 +57,7 @@ const ITEMS = [
|
||||
},
|
||||
{
|
||||
q: "Can a peer be in multiple meshes?",
|
||||
a: "Yes. Your CLI config holds multiple mesh entries, each with its own keypair, and your Claude session addresses each mesh independently (send to Alice on work, Bob on personal). Cross-mesh bridge peers that auto-forward tagged messages are v0.2; cross-broker federation (your self-host ↔ claudemesh.com) is v0.3.",
|
||||
a: "Yes. Your CLI config holds multiple mesh entries, each with its own keypair. As of v1.26.0, the daemon attaches to every joined mesh simultaneously — `claudemesh peer list` aggregates across all of them, `--mesh <slug>` narrows to one. Cross-mesh bridge peers that auto-forward tagged topics shipped in v0.2.0 (v1.6.0). Cross-broker federation (your self-host ↔ claudemesh.com) is the next major direction.",
|
||||
},
|
||||
];
|
||||
|
||||
|
||||
@@ -67,9 +67,10 @@ export const HeroWithMesh = () => {
|
||||
textShadow: "0 2px 20px rgba(0,0,0,0.8)",
|
||||
}}
|
||||
>
|
||||
Share context, files, skills, and MCPs across every Claude Code
|
||||
session — end-to-end encrypted. Hosted on claudemesh.com or
|
||||
self-hosted in your VPC. Same CLI, same wire, your choice.
|
||||
The encrypted backbone where Claude Code sessions, autonomous
|
||||
agents, and humans coordinate — across machines, across users,
|
||||
across organizations. Hosted on claudemesh.com or self-hosted in
|
||||
your VPC. Same CLI, same wire, your choice.
|
||||
</p>
|
||||
</Reveal>
|
||||
|
||||
|
||||
141
apps/web/src/modules/marketing/home/latest-releases.tsx
Normal file
@@ -0,0 +1,141 @@
|
||||
import Link from "next/link";
|
||||
|
||||
import {
|
||||
CHANGELOG_ENTRIES,
|
||||
CHANGELOG_TYPE_COLOR,
|
||||
CHANGELOG_TYPE_LABELS,
|
||||
} from "./changelog-data";
|
||||
import { Reveal, SectionIcon } from "./_reveal";
|
||||
|
||||
/**
|
||||
* Compact recent-releases strip for the home page. Pulls the top N entries
|
||||
* from the same data source as the full /changelog page so they never
|
||||
* disagree.
|
||||
*/
|
||||
export const LatestReleases = ({ count = 5 }: { count?: number }) => {
|
||||
const recent = CHANGELOG_ENTRIES.slice(0, count);
|
||||
|
||||
return (
|
||||
<section className="border-b border-[var(--cm-border)] bg-[var(--cm-bg-elevated)] px-6 py-24 md:px-12 md:py-28">
|
||||
<div className="mx-auto max-w-[var(--cm-max-w)]">
|
||||
<Reveal className="mb-6 flex justify-center">
|
||||
<SectionIcon glyph="grid" />
|
||||
</Reveal>
|
||||
|
||||
<Reveal delay={1}>
|
||||
<p
|
||||
className="text-center text-[11px] uppercase tracking-[0.2em] text-[var(--cm-fg-tertiary)]"
|
||||
style={{ fontFamily: "var(--cm-font-mono)" }}
|
||||
>
|
||||
release log · last {count} ships
|
||||
</p>
|
||||
</Reveal>
|
||||
|
||||
<Reveal delay={2}>
|
||||
<h2
|
||||
className="mt-3 text-center text-[clamp(1.75rem,3.5vw,2.5rem)] font-medium leading-[1.15] text-[var(--cm-fg)]"
|
||||
style={{ fontFamily: "var(--cm-font-serif)" }}
|
||||
>
|
||||
What shipped this week
|
||||
</h2>
|
||||
</Reveal>
|
||||
|
||||
<Reveal delay={3}>
|
||||
<p
|
||||
className="mx-auto mt-3 max-w-xl text-center text-[14px] leading-[1.65] text-[var(--cm-fg-secondary)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
Every release is in production on{" "}
|
||||
<span
|
||||
className="text-[var(--cm-fg)]"
|
||||
style={{ fontFamily: "var(--cm-font-mono)" }}
|
||||
>
|
||||
wss://ic.claudemesh.com
|
||||
</span>{" "}
|
||||
within minutes. The CLI publishes to npm; the broker auto-deploys.
|
||||
</p>
|
||||
</Reveal>
|
||||
|
||||
<Reveal delay={4}>
|
||||
<ol className="mx-auto mt-12 max-w-3xl space-y-4">
|
||||
{recent.map((entry, idx) => (
|
||||
<li key={entry.version + entry.date}>
|
||||
<Link
|
||||
href="/changelog"
|
||||
className="group block rounded-[var(--cm-radius-md)] border border-[var(--cm-border)] bg-[var(--cm-bg)] p-5 transition-colors hover:border-[var(--cm-clay)]/40"
|
||||
>
|
||||
<div className="flex flex-wrap items-baseline gap-x-3 gap-y-1">
|
||||
<span
|
||||
className="rounded-[3px] px-1.5 py-0.5 text-[10px] font-medium uppercase tracking-wider"
|
||||
style={{
|
||||
fontFamily: "var(--cm-font-mono)",
|
||||
backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
|
||||
color: "var(--cm-gray-900)",
|
||||
}}
|
||||
>
|
||||
{CHANGELOG_TYPE_LABELS[entry.type]}
|
||||
</span>
|
||||
<span
|
||||
className="text-[16px] font-medium text-[var(--cm-fg)]"
|
||||
style={{ fontFamily: "var(--cm-font-serif)" }}
|
||||
>
|
||||
v{entry.version}
|
||||
</span>
|
||||
<time
|
||||
dateTime={entry.date}
|
||||
className="text-[11px] text-[var(--cm-fg-tertiary)]"
|
||||
style={{ fontFamily: "var(--cm-font-mono)" }}
|
||||
>
|
||||
{new Date(entry.date).toLocaleDateString("en-US", {
|
||||
year: "numeric",
|
||||
month: "short",
|
||||
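day: "numeric",
// entry.date is a date-only ISO string, which parses as UTC midnight; format in UTC so the day does not shift in negative-offset timezones
timeZone: "UTC",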
|
||||
})}
|
||||
</time>
|
||||
{idx === 0 && (
|
||||
<span
|
||||
className="rounded-full bg-[var(--cm-clay)]/15 px-2 py-0.5 text-[10px] font-medium uppercase tracking-wider text-[var(--cm-clay)]"
|
||||
style={{ fontFamily: "var(--cm-font-mono)" }}
|
||||
>
|
||||
latest
|
||||
</span>
|
||||
)}
|
||||
</div>
|
||||
<h3
|
||||
className="mt-2.5 text-[15px] font-medium text-[var(--cm-fg)] transition-colors group-hover:text-[var(--cm-clay)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
{entry.title}
|
||||
</h3>
|
||||
<p
|
||||
className="mt-2 line-clamp-2 text-[13px] leading-[1.6] text-[var(--cm-fg-secondary)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
{entry.summary}
|
||||
</p>
|
||||
</Link>
|
||||
</li>
|
||||
))}
|
||||
</ol>
|
||||
</Reveal>
|
||||
|
||||
<Reveal delay={5}>
|
||||
<div className="mt-10 flex justify-center">
|
||||
<Link
|
||||
href="/changelog"
|
||||
className="group inline-flex items-center gap-2 text-[13px] font-medium text-[var(--cm-fg-secondary)] transition-colors hover:text-[var(--cm-clay)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
<span className="border-b border-dashed border-[var(--cm-fg-tertiary)] pb-0.5 transition-colors group-hover:border-[var(--cm-clay)]">
|
||||
Read the full changelog
|
||||
</span>
|
||||
<span className="transition-transform duration-300 group-hover:translate-x-1">
|
||||
→
|
||||
</span>
|
||||
</Link>
|
||||
</div>
|
||||
</Reveal>
|
||||
</div>
|
||||
</section>
|
||||
);
|
||||
};
|
||||
@@ -111,8 +111,9 @@ export const Pricing = () => {
|
||||
className="mb-4 text-[12px] leading-[1.5] text-[var(--cm-fg-tertiary)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
Paid tiers launch when the dashboard ships. Beta users keep
|
||||
the free plan for life.
|
||||
Paid tiers launch when we exit beta and add team-scale
|
||||
features (SSO, audit retention, dedicated brokers). Beta
|
||||
users keep the free plan for life.
|
||||
</p>
|
||||
<Link
|
||||
href="/auth/register"
|
||||
|
||||
@@ -85,6 +85,23 @@ const MILESTONES = [
|
||||
],
|
||||
stat: "43 MCP tools total",
|
||||
},
|
||||
{
|
||||
version: "v0.9 → 1.34",
|
||||
phase: "Daemon · multi-mesh · multi-session",
|
||||
color: "var(--cm-cactus)",
|
||||
items: [
|
||||
"Persistent daemon — long-lived broker WS, durable outbox/inbox",
|
||||
"Universal multi-mesh daemon — one process, every joined mesh",
|
||||
"Per-session IPC tokens — auto-scope to the launched session",
|
||||
"Per-session broker presence — sibling sessions see each other",
|
||||
"Self-healing daemon lifecycle (auto-spawn, version probe)",
|
||||
"Multi-session correctness train — per-recipient SSE demux + inbox scoping",
|
||||
"Refuse-to-kick on control-plane (no more no-op kicks)",
|
||||
"Caller-stable idempotency on every send",
|
||||
"Stale CLAUDEMESH_CONFIG_DIR fallback",
|
||||
],
|
||||
stat: "1.34.15 shipped",
|
||||
},
|
||||
];
|
||||
|
||||
export const Timeline = () => {
|
||||
@@ -94,7 +111,7 @@ export const Timeline = () => {
|
||||
<section className="border-b border-[var(--cm-border)] bg-[var(--cm-bg)] px-6 py-24 md:px-12 md:py-32">
|
||||
<div className="mx-auto max-w-[var(--cm-max-w)]">
|
||||
<Reveal className="mb-6 flex justify-center">
|
||||
<SectionIcon glyph="layers" />
|
||||
<SectionIcon glyph="grid" />
|
||||
</Reveal>
|
||||
<Reveal delay={1}>
|
||||
<h2
|
||||
@@ -109,7 +126,8 @@ export const Timeline = () => {
|
||||
className="mx-auto mt-4 max-w-xl text-center text-[15px] leading-[1.6] text-[var(--cm-fg-secondary)]"
|
||||
style={{ fontFamily: "var(--cm-font-sans)" }}
|
||||
>
|
||||
66 npm releases. Every feature below is in production today.
|
||||
120+ npm releases through v1.34.15. Every feature below is in
|
||||
production today.
|
||||
</p>
|
||||
</Reveal>
|
||||
|
||||
@@ -210,8 +228,8 @@ export const Timeline = () => {
|
||||
className="text-[14px] text-[var(--cm-fg-tertiary)]"
|
||||
style={{ fontFamily: "var(--cm-font-serif)" }}
|
||||
>
|
||||
Daemon redesign · per-topic encryption · self-host
|
||||
packaging · federation
|
||||
HKDF cross-machine identity · session capabilities · A2A
|
||||
interop · self-host packaging · federation
|
||||
</span>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
@@ -4,28 +4,28 @@ import Link from "next/link";
|
||||
|
||||
const NEWS = [
|
||||
{
|
||||
tag: "New",
|
||||
title: "claudemesh launch (v0.1.4)",
|
||||
body: "Real-time peer messages pushed into Claude Code mid-turn. One command. Source open at github.com/alezmad/claudemesh-cli.",
|
||||
href: "https://github.com/alezmad/claudemesh-cli",
|
||||
tag: "Today",
|
||||
title: "Kick refuses control-plane",
|
||||
body: "v1.34.15 — broker now skips control-plane peers on kick and acks the skip. Use ban for hard removal, or take the daemon down for transient cases.",
|
||||
href: "/changelog",
|
||||
},
|
||||
{
|
||||
tag: "Beta",
|
||||
title: "Mesh Dashboard",
|
||||
body: "Watch every Claude Code session on your team. Routes, presence, priority — all live.",
|
||||
href: "#",
|
||||
tag: "This week",
|
||||
title: "Multi-session correctness",
|
||||
body: "1.34.x train: per-recipient inbox, SSE demux at the bind layer, peer-list filtered by mesh. Multiple sessions on one machine no longer cross-talk.",
|
||||
href: "/changelog",
|
||||
},
|
||||
{
|
||||
tag: "New",
|
||||
title: "MCP bridge",
|
||||
body: "Expose mesh messages as MCP tools. Your agent can message peers without leaving its context.",
|
||||
href: "#",
|
||||
tag: "Shipped",
|
||||
title: "Per-session presence",
|
||||
body: "v1.30.0 — every Claude Code session gets its own ed25519 keypair and parent attestation. The broker tracks sessions, not machines.",
|
||||
href: "/changelog",
|
||||
},
|
||||
{
|
||||
tag: "Launch",
|
||||
title: "Self-hosted broker",
|
||||
body: "One binary. SQLite-backed. Runs on a Pi. Your mesh, never the cloud's.",
|
||||
href: "#",
|
||||
tag: "Shipped",
|
||||
title: "Multi-mesh daemon",
|
||||
body: "v1.26.0 — one daemon, every mesh you've joined. Switch context with a flag. Self-host the broker in your VPC; same CLI, your URL.",
|
||||
href: "/changelog",
|
||||
},
|
||||
];
|
||||
|
||||
|
||||
@@ -25,6 +25,14 @@ const CARDS: Card[] = [
|
||||
weDo: "claudemesh connects full, independent Claude Code sessions across machines, across developers, across continents. Each peer keeps its own repo, its own perspective, its own scrollback.",
|
||||
tone: "compare",
|
||||
},
|
||||
{
|
||||
label: "vs. Agent Teams",
|
||||
title: "Multi-agent within one machine",
|
||||
theyDo:
|
||||
"Anthropic's experimental Agent Teams (Feb 2026, Claude Code v2.1.32+) coordinates multiple Claude Code sessions inside ONE Unix user's ~/.claude/ directory on ONE machine. Mailbox in process. Task list in a markdown file. Lead is fixed. Cleanup wipes the state.",
|
||||
weDo: "claudemesh runs across machines, users, and organizations. State, memory, topics, and skills survive every session. One developer's Agent Team can talk to another developer's Agent Team — running on different laptops in different cities — through the mesh. Use Agent Teams for within-machine concurrency, claudemesh for between-machine reach.",
|
||||
tone: "compare",
|
||||
},
|
||||
{
|
||||
label: "vs. OpenClaw",
|
||||
title: "Autonomous agents that run while you sleep",
|
||||
@@ -35,10 +43,10 @@ const CARDS: Card[] = [
|
||||
},
|
||||
{
|
||||
label: "What claudemesh is",
|
||||
title: "The wire between Claude Code sessions",
|
||||
title: "The wire across machines, users, and orgs",
|
||||
theyDo:
|
||||
"Every Claude Code session today is an island. Context dies with the terminal. Skills and MCPs are per-developer. Teammates relay insights through Slack.",
|
||||
weDo: "claudemesh is one thing: a peer network for Claude Code. Share context, files, skills, MCPs, and slash commands across sessions — end-to-end encrypted. Host the broker on claudemesh.com or run it in your VPC. Same CLI either way.",
|
||||
"Every Claude Code session is an island unless you wrap it. Anthropic's Agent Teams now ties them together within one Unix user, one machine. Beyond that — across laptops, across team members, across companies — the gap is still wide.",
|
||||
weDo: "claudemesh is one thing: an end-to-end encrypted backbone where Claude Code sessions, autonomous agents, and humans coordinate across every boundary your existing tools stop at. Persistent state, topics, memory, and skills span every machine the mesh reaches. Host the broker on claudemesh.com or run it in your VPC. Same CLI either way.",
|
||||
tone: "claim",
|
||||
},
|
||||
];
|
||||
|
||||
112
docs/roadmap.md
@@ -382,24 +382,102 @@ one chokepoint per layer.
|
||||
| inbox.db | `recipient_pubkey` / `recipient_kind` columns | 1.34.11 |
|
||||
| outbox.db | `sender_session_pubkey` for routing | 1.34.0 |
|
||||
|
||||
### Known gaps tracked for follow-ups
|
||||
### Known gaps — status after the 2026-05-04 follow-up sprint
|
||||
|
||||
- `claudemesh launch` exports `CLAUDEMESH_CONFIG_DIR` /
|
||||
`CLAUDEMESH_IPC_TOKEN_FILE` into the parent shell; vars persist
|
||||
after the launched session exits and silently break subsequent
|
||||
CLI calls until unset. Fish lacks `unset`; users hit
|
||||
`set -e CLAUDEMESH_CONFIG_DIR`.
|
||||
- Broker `listPeers` ignores `--mesh` filter (server-side returns
|
||||
global peer set across all meshes regardless of the query
|
||||
param). Read-view noise only; doesn't affect correctness.
|
||||
- `kick` on a daemon's control-plane WS is effectively a no-op
|
||||
(it auto-reconnects within seconds). Wants either a mesh-admin
|
||||
cap check or a `presence pause [--mesh X]` verb.
|
||||
- Session capabilities don't exist as a first-class concept — a
|
||||
launched session inherits ALL of its parent member's grants.
|
||||
Parent attestation is just an existence proof; it doesn't carry
|
||||
a capability subset. Worth filling in before any cross-org
|
||||
use case lands.
|
||||
Three of the four 1.34.x triage gaps shipped in 1.34.14 + 1.34.15
|
||||
(2026-05-04). Gap #4 is spec'd and queued.
|
||||
|
||||
- ✅ **Stale `CLAUDEMESH_CONFIG_DIR` falls back** *(1.34.14)*. The
|
||||
env var no longer silently breaks subsequent CLI calls. When the
|
||||
inherited path points at a tmpdir that no longer exists,
|
||||
`paths.ts` warns once on stderr (TTY-only) with a shell-specific
|
||||
unset hint and falls back to `~/.claudemesh`. The dir-existence
|
||||
check (not `config.json`) keeps fresh-launch first-write working.
|
||||
- ✅ **`peer list --mesh <slug>` actually scopes** *(1.34.15)*.
|
||||
Diagnosis from the original triage was wrong — broker has been
|
||||
scoping correctly since 1.26.0 via `conn.meshId`. Bug was CLI-
|
||||
side: `tryListPeersViaDaemon()` was called with no argument in
|
||||
`commands/peers.ts:140` and `commands/launch.ts:407`. Both now
|
||||
forward the slug as `?mesh=<slug>`. `send.ts` cross-mesh hex-
|
||||
prefix resolution intentionally untouched.
|
||||
- ✅ **`kick` refuses no-op kicks on control-plane** *(1.34.15)*.
|
||||
Broker now skips peers where `peerRole === "control-plane"` and
|
||||
surfaces them in a new additive ack field
|
||||
`skipped_control_plane`; CLI reads it and points the user at
|
||||
`ban` (remove member) or `daemon down` (take a daemon offline
|
||||
locally). Soft `disconnect` keeps old behavior — useful when
|
||||
intentionally nudging a control-plane peer to re-authenticate.
|
||||
`PeerConn` gains a `peerRole` slot populated at both
|
||||
`connections.set` sites. The richer `presence pause [--mesh X]`
|
||||
verb (option (b) from the triage) deferred as its own feature.
|
||||
- 📋 **Session capabilities — spec only**. Launched sessions still
|
||||
inherit all member grants transitively. Spec at
|
||||
`.artifacts/specs/2026-05-04-session-capabilities.md` covers a v2
|
||||
parent attestation alongside v1 with an `allowed_caps[]` subset,
|
||||
broker enforcement as `intersection(member.peerGrants, session.
|
||||
allowed_caps)`, and a bonus `state-write` cap to close the "any
|
||||
session can clobber shared keys like `current-pr`" footgun.
|
||||
Default when no caps subset is declared = full member set
|
||||
(today's behavior; opt-in restriction). Ships behind a 1-week
|
||||
dry-run window before flipping enforcement, mirroring the
|
||||
original per-peer-capabilities rollout. ~1 sprint of focused
|
||||
work; queued behind v0.3.0 topic-encryption.
|
||||
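The enforcement rule in the session-capabilities item above (`intersection(member.peerGrants, session.allowed_caps)`, with the full member set as the default) can be sketched roughly as follows. This is a minimal illustration, not the broker's code: the `SessionAttestation` shape, the `allowedCaps` field name, and the `effectiveCaps` / `isAllowed` helpers are assumptions made for the example; only the capability names `dm`, `state-read`, and `state-write` come from the spec text.

```typescript
// Hypothetical types for illustration; the real attestation format lives in the spec.
type Capability = string;

interface SessionAttestation {
  // Assumed field: present only when the session declares a caps subset.
  allowedCaps?: readonly Capability[];
}

/**
 * Effective caps = member grants ∩ session subset.
 * When no subset is declared, the session keeps the full member set
 * (today's behavior, i.e. restriction is opt-in).
 */
function effectiveCaps(
  memberGrants: readonly Capability[],
  attestation: SessionAttestation,
): Set<Capability> {
  const granted = new Set<Capability>(memberGrants);
  if (attestation.allowedCaps === undefined) return granted;

  const allowed = new Set<Capability>();
  for (const cap of attestation.allowedCaps) {
    // A session can never gain a cap the member itself does not hold.
    if (granted.has(cap)) allowed.add(cap);
  }
  return allowed;
}

function isAllowed(
  cap: Capability,
  memberGrants: readonly Capability[],
  attestation: SessionAttestation,
): boolean {
  return effectiveCaps(memberGrants, attestation).has(cap);
}
```

Under these assumptions, a session launched with only `state-read` declared cannot write shared keys such as `current-pr`, even though its parent member holds `state-write`.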
|
||||
---
|
||||
|
||||
## v1.34.16 + broker — *continuous presence* — *shipped*
|
||||
|
||||
User report on 2026-05-05: `claudemesh peer list` returned zero
|
||||
peers despite running sessions. Diagnosis: half-dead WS connections
|
||||
that NAT/CGNAT silently dropped, with no application-layer staleness
|
||||
detection on either side. Linux TCP keepalive default ≈ 2hrs idle
|
||||
+ 11min probes — sessions stayed zombie for hours before the kernel
|
||||
RST'd the socket and the daemon's existing close-handler reconnect
|
||||
fired.
|
||||
|
||||
Two layers shipped together:
|
||||
|
||||
- **Liveness watchdogs** *(broker + CLI 1.34.16)*. Both sides now
|
||||
detect stalled WS in 75s instead of waiting for the kernel.
|
||||
- Broker: `PeerConn.lastPongAt` bumped on every `pong`. The 30s
|
||||
ping loop also calls `ws.terminate()` on conns whose pong is
|
||||
>75s stale, firing the close handler → existing peer_left
|
||||
cleanup.
|
||||
- Daemon: `ws-lifecycle.ts` adds an idle watchdog at 30s cadence,
|
||||
started after hello-ack. Bumps `lastActivity` on incoming
|
||||
message + ping + pong frames. Sends its own `sock.ping()` if
|
||||
activity is recent, `sock.terminate()` if idle >75s. Watchdog
|
||||
cleared on close + explicit close().
|
||||
- Roughly 100x improvement on detection time (2hrs → 75s).
|
||||
- **Lease model** *(broker only, no protocol change)*. Peers no
|
||||
longer see `peer_left`/`peer_joined` for transient reconnects.
|
||||
- `PeerConn` gains `leaseState` ("online"|"offline"), `leaseUntil`,
|
||||
`evictionTimer`. On WS close, the conn enters **offline-leased**
|
||||
state for 90s instead of immediate cleanup.
|
||||
- `handleHello` and `handleSessionHello` check for an offline-
|
||||
leased entry matching the stable identity before running session-
|
||||
id dedup. On match: clear `evictionTimer`, swap `ws`, restore
|
||||
online state, drain queued DMs, return `silent: true`. The
|
||||
hello dispatcher skips the peer_joined broadcast.
|
||||
- `evictPresenceFully` extracted from the close handler — runs
|
||||
the peer_left broadcast + cleanup (URL watches, streams, MCP
|
||||
registry, clock auto-pause). Called by `evictionTimer` after 90s
|
||||
grace, or directly when no lease was online (defensive).
|
||||
- `broker.ts` exports `restorePresence(presenceId)` — clears
|
||||
`disconnectedAt` + bumps `lastPingAt`, called on reattach to
|
||||
undo the DB-level stale-presence sweeper if it fired during
|
||||
grace.
|
||||
- DMs sent during grace fall through to the existing message_queue
|
||||
path (sendToPeer no-ops on dead WS, queue row stays with
|
||||
deliveredAt=NULL, drained on reattach). Backward compatible
|
||||
with old daemons.
|
||||
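The watchdog decision described in the liveness item above (30s cadence, terminate once the connection has been idle for more than 75s, otherwise ping) can be sketched as below. This is an illustrative sketch under stated assumptions, not the contents of `ws-lifecycle.ts`: `SocketLike`, `watchdogTick`, and `startWatchdog` are hypothetical names, and the socket is reduced to the two calls the text mentions.

```typescript
const CHECK_INTERVAL_MS = 30_000; // watchdog cadence from the text
const STALE_AFTER_MS = 75_000; // idle threshold from the text

type WatchdogAction = "ping" | "terminate";

// Hypothetical minimal socket surface: only the two calls the text mentions.
interface SocketLike {
  ping(): void;
  terminate(): void;
}

/** Pure decision step: terminate when idle for more than 75s, otherwise ping. */
function watchdogTick(lastActivityAt: number, now: number): WatchdogAction {
  return now - lastActivityAt > STALE_AFTER_MS ? "terminate" : "ping";
}

/**
 * Starts the periodic check. The caller is assumed to bump the value returned by
 * getLastActivity() on every incoming message, ping, and pong frame.
 * Returns a stop function to call on close.
 */
function startWatchdog(
  sock: SocketLike,
  getLastActivity: () => number,
  now: () => number = Date.now,
): () => void {
  const timer = setInterval(() => {
    if (watchdogTick(getLastActivity(), now()) === "terminate") {
      sock.terminate(); // fires the close handler, which drives reconnect
    } else {
      sock.ping();
    }
  }, CHECK_INTERVAL_MS);
  return () => clearInterval(timer);
}
```

Keeping the decision in a pure function makes the threshold easy to test without a real socket or timers.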
|
||||
Spec at `.artifacts/specs/2026-05-05-continuous-presence.md`.
|
||||
Layer 3 (resume token to skip full attestation on reconnect) deferred
|
||||
— pure optimization, not needed for the user-visible "no
|
||||
invisibility moment" goal.
|
||||
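The lease model above can be approximated with a small in-memory table: a close moves the entry to an offline-leased state for 90s, and a hello from the same stable identity inside that window reattaches silently. This is a simplified sketch, not the broker implementation: `LeaseTable`, its method names, and the `onEvict` callback are assumptions, and session-id dedup, DM draining, and the DB-level `restorePresence` step are left out.

```typescript
const LEASE_GRACE_MS = 90_000; // grace window from the text

type LeaseState = "online" | "offline";

interface LeasedConn {
  leaseState: LeaseState;
  leaseUntil: number | null;
  evictionTimer: ReturnType<typeof setTimeout> | null;
}

// Hypothetical table keyed by stable identity.
class LeaseTable {
  private conns = new Map<string, LeasedConn>();
  private onEvict: (identity: string) => void;

  constructor(onEvict: (identity: string) => void) {
    this.onEvict = onEvict; // stands in for the peer_left broadcast + cleanup
  }

  /** Returns silent: true when the hello reattaches to an offline-leased entry. */
  hello(identity: string): { silent: boolean } {
    const existing = this.conns.get(identity);
    if (existing !== undefined && existing.leaseState === "offline") {
      if (existing.evictionTimer !== null) clearTimeout(existing.evictionTimer);
      existing.evictionTimer = null;
      existing.leaseUntil = null;
      existing.leaseState = "online";
      return { silent: true }; // caller skips the peer_joined broadcast
    }
    this.conns.set(identity, { leaseState: "online", leaseUntil: null, evictionTimer: null });
    return { silent: false };
  }

  /** On WS close: keep the entry for the grace window instead of evicting at once. */
  close(identity: string, now: number = Date.now()): void {
    const conn = this.conns.get(identity);
    if (conn === undefined || conn.leaseState === "offline") return;
    conn.leaseState = "offline";
    conn.leaseUntil = now + LEASE_GRACE_MS;
    conn.evictionTimer = setTimeout(() => {
      this.conns.delete(identity);
      this.onEvict(identity);
    }, LEASE_GRACE_MS);
  }
}
```

In this sketch, peers only observe a departure if the eviction timer actually fires, which is the property the lease model is after for transient reconnects.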
|
||||
*Shipped 2026-05-05.*
|
||||
|
||||
---
|
||||
|
||||
|
||||