9 Commits

Author SHA1 Message Date
Alejandro Gutiérrez
1b28550f30 docs(roadmap): v1.34.16 + broker — continuous presence shipped
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Watchdogs (75s stale detect) and lease model (90s grace window for
silent reconnects) both shipped 2026-05-05.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:41:25 +01:00
Alejandro Gutiérrez
9d1b4f3d4c feat(broker): lease model — 90s grace window across WS reconnects
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Continuous presence: peers no longer see peer_left/peer_joined for
transient WS reconnects. After a WS close, the connection enters a
90s grace window in offline-leased state. If the same session
reconnects (matched by sessionPubkey, or sessionId+memberPubkey for
member-WS) within grace, it silently swaps the WS reference, restores
online state, drains queued DMs, and resets the DB row. No peer ever
sees the session leave.

Mechanics:
- PeerConn gains leaseState ("online"|"offline"), leaseUntil, evictionTimer
- ws.on("close") starts grace instead of immediate cleanup; old
  socket close after a reattach is detected (conn.ws !== ws) and
  ignored, since the lease is already healthy on the new socket
- handleHello / handleSessionHello check for offline-leased entry
  matching the stable identity BEFORE running session-id dedup;
  reattach swaps ws, resets state, returns silent: true
- The hello dispatcher skips peer_joined broadcast when result.silent
- evictPresenceFully extracted from the close handler — runs the
  peer_left broadcast + cleanup (URL watches, streams, MCP registry,
  clock auto-pause). Called by evictionTimer after 90s, or directly
  if lease wasn't online (defensive)
- Stale-pong watchdog skips offline-leased entries (their WS is
  intentionally dead during grace)
- broker.ts exports restorePresence(presenceId) — clears
  disconnectedAt + bumps lastPingAt, called on reattach to undo any
  damage the DB-level stale-presence sweeper may have done during
  grace

DMs sent to a session in grace fall through to today's existing
queueing path (sendToPeer no-ops on dead WS, the message_queue row
sits with deliveredAt=NULL, drained on reattach via the existing
maybePushQueuedMessages call). No protocol change. No DB schema
change. Backward compatible — old daemons against this broker get
silent reconnects within 90s, full peer_joined cycle beyond.

Layer 2 of the continuous-presence work; spec at
.artifacts/specs/2026-05-05-continuous-presence.md. Layer 3
(daemon-side resume token storage + send) is optional polish, not
needed for the user-visible behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:31:55 +01:00
Alejandro Gutiérrez
ffd0621ccc feat(broker,cli): liveness watchdogs — 75s stale-pong terminate
Some checks failed
CI / Typecheck (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Both sides now actively detect half-dead WS connections instead of
waiting for kernel TCP keepalive (~2hrs default on Linux). Bug user
reported: "claudemesh peer list" shows zero peers despite running
sessions, because NAT/CGNAT silently dropped the WS flow but neither
side noticed.

Broker (apps/broker/src/index.ts):
- Add lastPongAt to PeerConn, populate at connections.set sites,
  bump in ws.on("pong").
- 30s ping loop now also terminates conns whose pong is >75s stale.
  ws.terminate() fires the close handler → existing peer_left path.

Daemon (apps/cli/src/daemon/ws-lifecycle.ts):
- Add idle watchdog at 30s cadence, started after hello-ack.
- Bumps lastActivity on incoming message, ping, and pong frames.
- Sends sock.ping() if recent activity, terminates if idle >75s.
- Watchdog cleared on close handler + explicit close().

CLI 1.34.15 → 1.34.16. Broker stays 0.1.0 (deploys from main).

Spec: .artifacts/specs/2026-05-05-continuous-presence.md (full lease
model + resume token, this commit ships only the watchdogs — first
of four progressive layers).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 11:22:15 +01:00
Alejandro Gutiérrez
b9ecbe79ad feat(web): refresh Latest News toaster — current shipped work
Some checks failed
CI / Typecheck (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Replace four April-vintage entries (claudemesh launch v0.1.4, Mesh
Dashboard placeholder, MCP bridge placeholder, "SQLite-backed"
self-host) with the four most recent shipped milestones: kick refuses
control-plane (v1.34.15), 1.34.x multi-session correctness train,
per-session presence (v1.30.0), multi-mesh daemon (v1.26.0). All
entries link to /changelog instead of dead "#" hrefs or the old
github.com/alezmad/claudemesh-cli repo.

Copy passes Strunk: active voice, concrete versions, no puffery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:48:24 +01:00
Alejandro Gutiérrez
33051b95bf feat(web): marketing audit — Agent Teams positioning, MCP/dashboard claims fixed
Some checks failed
CI / Typecheck (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Comprehensive review of all home-page marketing components against
the post-correction positioning. Five surgical fixes, zero hand-waving.

CTA copy. The previous "Anthropic built Claude Code per developer.
The next unlock is between developers." was a strong line in 2025
but Anthropic Agent Teams (Feb 2026) IS now between-developers
within one machine. Replaced with the accurate distinction:
"Anthropic Agent Teams stops at the edge of one laptop. claudemesh
starts there — across machines, users, and organizations."

WhereMeshFits — new "vs. Agent Teams" comparison card. The single
most important card the page can have right now. Most readers
arriving in May 2026 know about Agent Teams; the comparison they
want to read is exactly this one. Also tightened the "What
claudemesh is" claim card to lean into "across machines, users,
orgs" instead of the narrower "peer network for Claude Code"
framing.

FAQ (four updates):
  1. "How is this different from MCP?" was claiming "43 tools that
     let peers message, share files…" which contradicted v1.5.0's
     ship of tool-less MCP (tools/list returns []). Replaced with
     the actual current architecture: thin push-pipe + resource-
     noun-verb CLI bundled as a skill.
  2. New entry "How is this different from Anthropic's Agent
     Teams?" — the biggest gap in the FAQ given the new ecosystem.
     Same shape as the WhereMeshFits card so the messaging stays
     consistent across surfaces.
  3. "Can a peer be in multiple meshes?" updated to reflect
     v1.26.0's universal multi-mesh daemon (was speaking about it
     as roadmap; it's been shipped for ~2 days). Bridge peers
     promoted from "v0.2 roadmap" to "shipped in v0.2.0 (v1.6.0)".
  4. "Free during public beta" no longer claims paid tiers launch
     "when the dashboard ships" — dashboard already shipped (v1.5+
     web chat, v1.7 demo cut). Replaced with team-scale features
     (SSO, audit retention, dedicated brokers) as the pricing
     trigger.

Pricing card — same "dashboard ships" → "team-scale features"
language fix as the FAQ pricing entry. Single source of truth
maintained between FAQ + Pricing card.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 23:10:27 +01:00
Alejandro Gutiérrez
64d9f9f6f9 feat(web): refresh marketing site — accurate timeline, live changelog, cross-boundary positioning
Some checks failed
CI / Typecheck (push) Has been cancelled
CI / Lint (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
The site had drifted ~6 months behind the product. Three problems
addressed in one push:

1. Timeline ("Shipped, not promised") topped out at v0.6–0.8 and
   claimed "66 npm releases" — both stale. Adds a v0.9 → 1.34 tier
   covering daemon, multi-mesh, multi-session correctness train,
   refuse-to-kick on control-plane, env-var fallback. Updates count
   to "120+ npm releases through v1.34.15." Rewrites the "next"
   block from the now-shipped "Daemon redesign · per-topic
   encryption" to the actually-pending "HKDF cross-machine identity
   · session capabilities · A2A interop · self-host packaging ·
   federation."

2. Hero subhead leaned into the original "Claude Code peer mesh"
   framing, which is undercut by Anthropic Agent Teams (Feb 2026,
   single-machine native mailbox). Now reframes claudemesh as the
   encrypted backbone where Claude Code sessions, autonomous
   agents, and humans coordinate "across machines, across users,
   across organizations" — the four words that distinguish the
   product from anything Anthropic structurally can ship from
   inside Claude Code.

3. /changelog had three entries from April 2026 (v0.1.2 → v0.1.4)
   and was 70+ versions out of date. Replaced with a curated
   16-entry timeline from v0.1.0 → v1.34.15, hand-picked to tell
   the story (load-bearing ships, not every patch). Adds links
   back to docs/roadmap.md, .artifacts/specs/, and GitHub Releases.

New module: apps/web/src/modules/marketing/home/changelog-data.ts
holds the curated entries as a single source of truth. Imported by
both the /changelog page and a new home-page component
LatestReleases (compact 5-entry strip, slotted between Timeline
and Pricing) so they never disagree.

Misc fixes pulled in:
- timeline.tsx had glyph="layers" which isn't in SectionIcon's
  valid set; switched to "grid" (changelog-data.ts uses same).
- changelog data extracted to a non-route module so Next.js's
  route-export validator stops complaining about exporting
  CHANGELOG_ENTRIES from app/.../changelog/page.tsx.

Pre-existing typecheck noise in packages/ui/web/sidebar.tsx
(csstype version mismatch) + billing modules unrelated to this
change. My files all typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:55:30 +01:00
Alejandro Gutiérrez
7f61a711f1 docs(roadmap): mark 1.34.x triage gaps 1-3 shipped, gap 4 spec'd
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Updates the "Known gaps tracked for follow-ups" subsection of the
v1.34.x section to reflect the 2026-05-04 follow-up sprint:

- Gap 1 (stale CLAUDEMESH_CONFIG_DIR) shipped in 1.34.14.
- Gap 2 (peer list --mesh scope) shipped in 1.34.15. Notes the
  diagnosis correction — bug was CLI-side, not broker.
- Gap 3 (kick no-op on control-plane) shipped in 1.34.15 as
  refuse-with-hint. Richer presence-pause verb deferred.
- Gap 4 (session capabilities) has a written spec at
  .artifacts/specs/2026-05-04-session-capabilities.md;
  implementation queued behind v0.3.0 topic-encryption.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 22:05:30 +01:00
Alejandro Gutiérrez
96520394ff docs(spec): session capabilities — first-class concept
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Spec for the gap #4 follow-up from the 1.34.x triage. Builds on
2026-04-15-per-peer-capabilities.md (member-keyed recipient grants)
by adding a sender-side cap subset on session attestations: parent
member signs {session_pubkey, allowed_caps[], expires_at}, broker
enforces intersection of recipient grants × session caps on every
protected operation.

v2 attestation alongside v1 (different canonical prefix
"claudemesh-session-attest-v2|..." → no collision). Default when
no caps subset is declared = full member caps (today's behavior;
opt-in restriction, not breaking).

CLI surface: claudemesh launch --caps dm,read. Bonus: set_state
gate (state-write cap) ships in the same release — closes the
"any session can clobber shared keys like current-pr" footgun.

Migration: dry-run mode for one release before flipping
enforcement. Mirrors the original per-peer-capabilities rollout.

Estimate: ~1 sprint + 1 week dry-run window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 21:59:18 +01:00
Alejandro Gutiérrez
a2a53ff355 feat(cli,broker): 1.34.14 + 1.34.15 — env-var fallback, peer list scope, kick refuses control-plane
Three follow-ups from the 1.34.x multi-session correctness train,
all backwards-compatible.

1.34.14 — stale CLAUDEMESH_CONFIG_DIR falls back. The launch flow
exposes CLAUDEMESH_CONFIG_DIR=<tmpdir> to its spawned claude; if a
later claudemesh invocation inherited that env (Bash tool inside
Claude Code, tmux update-environment, exported var), the inherited
path pointed at a tmpdir that no longer existed and readConfig()
silently returned empty. paths.ts now memoizes resolution: env unset
→ default; env points at a real dir → trust it; env set but dir gone
→ TTY-only stderr warning with shell-specific unset hint, fall back
to ~/.claudemesh.

1.34.15 — peer list --mesh actually scopes. peers.ts and launch.ts
were calling tryListPeersViaDaemon() with no argument; the daemon's
?mesh= filter (server-side, since 1.26.0) was already correct, the
CLI just wasn't passing the slug. Forwarding fixed in both sites;
send.ts cross-mesh hex-prefix resolution intentionally untouched.

1.34.15 — kick refuses no-op kicks on control-plane. Pre-1.34.15
kicking a daemon's member-WS just closed the socket and triggered
auto-reconnect — a no-op with a misleading "session ended" message.
Broker now skips peers where peerRole === "control-plane" and
surfaces them in a new additive ack field skipped_control_plane;
the CLI reads it and prints a clearer hint pointing at ban / daemon
down. Soft disconnect verb keeps old behavior. PeerConn gains a
peerRole slot populated at both connections.set sites.

Tests: 4 new for paths-stale-env, 5 for kick-control-plane-skip.
CLI 87/87 green; broker 55/55 unit green (integration tests
pre-existing infra failure on this machine).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 21:59:06 +01:00
25 changed files with 1994 additions and 187 deletions

@@ -0,0 +1,288 @@
# Session capabilities — first-class concept
**Status:** spec, queued behind v0.3.0 topic-encryption work.
**Owner:** alezmad
**Author:** Claude (Sprint B follow-up, 2026-05-04)
**Related:** `2026-04-15-per-peer-capabilities.md` (existing per-peer
caps system, member-keyed), `2026-05-04-per-session-presence.md`
(per-launch session presence — what we're now restricting).
## Problem
Per-peer capability grants (`apps/broker/src/index.ts:2178+, 2309+`)
are keyed on the sender's **stable member pubkey**. The grant model
gives the recipient fine-grained control: "alice can DM me",
"bob can read state but not broadcast", etc.
But: as of v1.30.0 (`per-session-presence`), every `claudemesh
launch` mints a per-launch ephemeral keypair with a parent attestation
binding it to the member identity. The launched session inherits **all**
the member's capabilities transitively, because cap enforcement always
falls through to the member key.
Concretely:
- Member `alice` is in mesh `flexicar`, granted `dm + state-read +
state-write` by everyone.
- Alice launches a session with `claudemesh launch` to do an automated
task — say, run a Claude Code agent that iterates over PRs.
- That session has full member privileges. It can DM peers, write
shared state keys (e.g. clobber `current-pr`), grant new caps, ban
members, etc. — none of which the user wanted to delegate.
There is no way to express "this session can DM peers but cannot
deploy services or grant caps." The parent attestation is a binary
existence proof — "this session was vouched by a member" — with no
capability subset.
Plus an adjacent footgun: `set_state` (`apps/broker/src/index.ts:2949`)
has **no cap check at all**. Anyone in the mesh can write any key. The
spec at `2026-04-15-per-peer-capabilities.md` lists `state-write` as a
planned cap but it was never wired into the broker. Shared keys like
`current-pr` are write-anyone today.
## Goal
A launched session can be issued **a capability subset** of its
parent member, signed by the parent at launch time, and the broker
enforces the **intersection** of recipient grants × session caps on
every protected operation.
## Non-goals
- Changing the existing per-peer cap model. Member-keyed grants stay
authoritative for "who is allowed to talk to me."
- Cross-machine session caps (waiting on 2.0.0 HKDF identity).
- Per-tool granularity inside the Claude Code MCP surface — this
spec only covers the broker-enforceable verbs (dm, broadcast,
state-read, state-write, grant, kick, ban, profile-write,
service-deploy).
- Delegation: a session cannot re-vouch a sub-session with its own
cap subset. Only members can attest sessions. (Could be lifted in
a future spec; today's launch flow doesn't need it.)
## Design
### Capability vocabulary
Existing (today, member-level):
| Capability | Effect when GRANTED on a recipient → sender pair |
|---------------|---------------------------------------------------|
| `read` | Sender appears in recipient's `list_peers` |
| `dm` | Sender can DM recipient |
| `broadcast` | Sender's broadcasts reach recipient |
| `state-read` | Sender can read shared state |
| `state-write` | (planned) Sender can write shared state |
| `file-read` | Sender can fetch files recipient shared |
New (session-level — cap subset on the attestation):
These are the **verbs the session is allowed to invoke**, NOT what
peers can do TO it. A session attestation declaring `["dm", "read"]`
means the session can SEND dm/read-list operations; it cannot
broadcast, write state, grant, etc.
| Session cap | Gates which broker operations |
|-------------------|------------------------------------------------|
| `dm` | `send` with single recipient |
| `broadcast` | `send` with `*`, `@group`, `#topic` |
| `state-read` | `get_state`, `list_state` |
| `state-write` | `set_state` |
| `grant` | `grant`, `revoke`, `block` |
| `kick` | `kick`, `disconnect` |
| `ban` | `ban`, `unban` |
| `profile-write` | `set_profile`, `set_summary`, `set_status` |
| `service-deploy` | `mesh_service_register`, `_unregister` |
The default cap set when no subset is declared: the **full member
set** (today's behavior — opt-in restriction, not breaking).
### Attestation v2
Existing v1 (`apps/cli/src/services/broker/session-hello-sig.ts`):
```
canonical = `claudemesh-session-attest|<parent>|<session>|<expires>`
```
New v2 (additive — broker accepts both):
```
canonical = `claudemesh-session-attest-v2|<parent>|<session>|<expires>|<sorted-caps-csv>`
```
Where `<sorted-caps-csv>` is the lower-cased, comma-joined,
ASCII-sorted cap list. An omitted `allowed_caps` field means full
member caps (default, back-compat); an explicitly empty list means
no caps at all (see the test plan).
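As a minimal sketch of how the v2 canonical string could be assembled: the helper name, its parameter names, and the choice of a numeric `expiresAt` are assumptions made for illustration, not code from the repository.

```ts
// Hypothetical helper: builds the v2 canonical string described above.
// The function name and the numeric expiresAt type are assumptions.
function buildSessionAttestCanonicalV2(
  parentMemberPubkey: string,
  sessionPubkey: string,
  expiresAt: number,
  allowedCaps: readonly string[],
): string {
  // Lower-case, then sort. The default sort compares UTF-16 code units,
  // which is the same as ASCII order for ASCII-only cap names.
  const sortedCapsCsv = allowedCaps
    .map((cap) => cap.toLowerCase())
    .sort()
    .join(",");

  return [
    "claudemesh-session-attest-v2",
    parentMemberPubkey,
    sessionPubkey,
    String(expiresAt),
    sortedCapsCsv,
  ].join("|");
}
```

Sorting before signing means the signer and the verifier derive the same bytes regardless of the order in which caps were passed on the command line.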
**Wire shape additions on `session_hello`:**
```ts
{
  type: "session_hello",
  ...existing fields...,
  parentAttestation: {
    sessionPubkey,
    parentMemberPubkey,
    expiresAt,
    signature,
    // NEW:
    allowed_caps?: string[], // omitted = full member set
    version?: 2,             // omitted = v1
  },
}
```
The broker version-detects: `version === 2` → verify v2 canonical
including `allowed_caps`. Default behavior is unchanged for clients
that don't pass it.
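The version detection could resolve the in-memory cap set roughly as follows. This is a sketch under stated assumptions: the `ParentAttestation` interface mirrors the wire shape above, `resolveAllowedCaps` is a hypothetical name, and ignoring `allowed_caps` on a v1 attestation (because the v1 signature does not cover it) is an assumed policy, not something the spec states.

```ts
// Mirrors the wire shape above; field types are assumptions.
interface ParentAttestation {
  sessionPubkey: string;
  parentMemberPubkey: string;
  expiresAt: number;
  signature: string;
  allowed_caps?: string[];
  version?: 2;
}

// Hypothetical helper: returns null for "full member caps",
// or the declared subset for a v2 attestation.
function resolveAllowedCaps(att: ParentAttestation): string[] | null {
  // v1, or v2 without a declared subset: behavior is unchanged.
  if (att.version !== 2 || att.allowed_caps === undefined) return null;
  // An explicitly empty array stays empty: no caps allowed.
  return att.allowed_caps.map((cap) => cap.toLowerCase());
}
```

Signature verification over the matching canonical string would happen before this step and is left out of the sketch.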
### Enforcement
Add `allowed_caps: string[] | null` to the in-memory `PeerConn`
shape (`apps/broker/src/index.ts:131`). Populated from
`handleSessionHello` (the v2 attestation supplies it) and from
`handleHello` (control-plane / member connection — set to `null`,
meaning "full member caps").
**Effective cap check** for a sending peer needing `cap`:
```ts
function senderHasCap(conn: PeerConn, cap: string): boolean {
  if (conn.allowed_caps === null) return true; // member-level, no subset
  return conn.allowed_caps.includes(cap);
}
```
Wire this into every broker operation in the table above. The
existing per-peer recipient-cap check at `2178+, 2309+` stays —
session caps gate the **sender side**, recipient grants gate the
**receive side**, and both must allow:
```
allowed = senderHasCap(conn, capNeeded) && recipientGrants[sender][capNeeded]
```
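Put together, the two-sided check might look like the sketch below. The `PeerConnLike` and `RecipientGrants` shapes are hypothetical stand-ins (the real `PeerConn` has many more fields, and the real grant storage is not shown in this spec); only the intersection logic is taken from the text above.

```ts
// Hypothetical, trimmed-down stand-in for PeerConn.
interface PeerConnLike {
  memberPubkey: string;
  allowed_caps: string[] | null; // null = full member caps
}

// Assumed shape: sender member pubkey -> cap -> granted by this recipient.
type RecipientGrants = Record<string, Record<string, boolean>>;

function senderHasCap(conn: PeerConnLike, cap: string): boolean {
  if (conn.allowed_caps === null) return true; // member-level, no subset
  return conn.allowed_caps.includes(cap);
}

// Both sides must allow: session caps gate sending, recipient grants gate receiving.
function isOperationAllowed(
  conn: PeerConnLike,
  capNeeded: string,
  recipientGrants: RecipientGrants,
): boolean {
  // Grants stay keyed on the sender's stable member pubkey, not the session key.
  const grantedByRecipient = recipientGrants[conn.memberPubkey]?.[capNeeded] === true;
  return senderHasCap(conn, capNeeded) && grantedByRecipient;
}
```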
### `set_state` gate (bonus, ship together)
Today: no cap check. After this spec: `set_state` requires
`state-write` on the sender side. Migration: existing members
default to having `state-write` in their member caps (no recipient
grant model for state-write — it's a sender-side gate only, mesh-
wide). New attestations can omit it to forbid the session.
The recipient-side analog (per-peer state-write grants) is left for
a future spec — today the value of guarding state-write is
session-level (avoid an automated session clobbering shared keys),
not peer-level.
### CLI surface
```
claudemesh launch --caps dm,read # tight: read-only chat agent
claudemesh launch --caps dm,broadcast # send-only, no state writes
claudemesh launch # default: full member caps
```
`claudemesh launch --caps ?` prints the table above with descriptions.
`claudemesh peer list --json` includes `allowed_caps` per row when
present (`null` = full member). Lets users audit what their running
sessions can actually do.
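Parsing the `--caps` value could be as small as the sketch below. `parseCapsFlag` is a hypothetical helper, and trimming whitespace, de-duplicating, and throwing on unknown names are assumptions about desirable CLI behavior rather than specified behavior. The vocabulary is passed in so the sketch does not commit to a particular cap list.

```ts
// Hypothetical helper for the --caps flag.
// Returns undefined when the flag is omitted (v1 attestation, full member caps).
function parseCapsFlag(
  raw: string | undefined,
  knownCaps: ReadonlySet<string>,
): string[] | undefined {
  if (raw === undefined) return undefined;

  const caps = raw
    .split(",")
    .map((cap) => cap.trim().toLowerCase())
    .filter((cap) => cap.length > 0);

  // Assumption: reject typos loudly instead of silently dropping them.
  const unknown = caps.filter((cap) => !knownCaps.has(cap));
  if (unknown.length > 0) {
    throw new Error(`unknown capability: ${unknown.join(", ")}`);
  }

  // De-duplicate and sort so the result can feed the canonical string directly.
  return Array.from(new Set(caps)).sort();
}
```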
### Migration plan (mirrors `2026-04-15-per-peer-capabilities.md` §"Migration plan")
1. **Broker schema additive** — `PeerConn.allowed_caps` in-memory
only; no DB column. Reload-on-reconnect is fine because the
attestation is re-sent on every WS open (it's the proof of
identity).
2. **CLI ships v2 attestation alongside v1.** New `--caps` flag
defaults to omitted (= v1 attestation, full caps). Older
brokers ignore the new fields entirely.
3. **Broker accepts v2.** When `allowed_caps` arrives, store it.
No enforcement yet — log denied operations as `cap_check_dryrun`
metric counter, still allow them through.
4. **Dry-run release.** Ship one CLI + broker release that emits
the metric but doesn't enforce. Watch for false positives in
real meshes for ≥ 1 week.
5. **Flip enforcement on.** Broker rejects operations failing the
cap check with `forbidden: missing session capability "<cap>"`.
Default ("no caps declared = full member") keeps existing
sessions unaffected.
6. **`set_state` gate** ships in step 5 alongside the rest. Default
member caps include `state-write`, so flipping it on doesn't
break existing flows. Only sessions that explicitly omit
`state-write` from `--caps` lose write access.
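Steps 3 to 5 amount to one check with two modes. A minimal sketch, assuming a hypothetical `EnforcementMode` switch and an in-process counter standing in for the `cap_check_dryrun` metric:

```ts
type EnforcementMode = "dryrun" | "enforce";

interface CapDecision {
  allowed: boolean;
  error?: string;
}

// Stand-in for the real metrics sink; only the counter name comes from the spec.
const metrics = { cap_check_dryrun: 0 };

function checkSessionCap(
  allowedCaps: string[] | null, // null = full member caps
  cap: string,
  mode: EnforcementMode,
): CapDecision {
  const ok = allowedCaps === null || allowedCaps.includes(cap);
  if (ok) return { allowed: true };

  if (mode === "dryrun") {
    // Steps 3-4: count the would-be denial, still let the operation through.
    metrics.cap_check_dryrun += 1;
    return { allowed: true };
  }

  // Step 5: enforcement is on.
  return { allowed: false, error: `forbidden: missing session capability "${cap}"` };
}
```

Keeping both modes in one function means the dry-run counter measures exactly the set of operations that enforcement would later reject.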
### Crypto notes
- v2 attestation re-uses `crypto_sign_detached` over the new
canonical string; same parent member secret key, same TTL caps
(≤24 h), same `expiresAt` semantics.
- v1 signatures are NOT v2 signatures — collision is impossible
because the canonical strings have different prefixes
(`claudemesh-session-attest` vs `claudemesh-session-attest-v2`).
Domain separation is intrinsic.
- Like the existing per-peer cap system: caps are server-enforced
metadata, not capability tokens. A malicious broker can ignore
them. This is about UX trust + footgun prevention, not protocol-
level security.
## Open questions
1. **Should the session attestation also bind to a fingerprint of
the launched binary / Claude version?** Would let a member say
"this session is constrained to Claude Code v1.34.15" so a
compromised launched-binary doesn't get reused. Probably no — too
much friction for the threat model.
2. **What's the right default for `claudemesh launch` going forward?**
Once enforcement ships, do we change the default `--caps` from
"full member" to "dm + read + state-read"? Tighter but breaks
existing automation that writes state. Probably worth a one-
release deprecation warning ("your session will lose state-write
in v2.0.0 unless you pass --caps state-write") and then flip in
v2.0.0.
3. **Does `--caps` belong in `~/.claudemesh/config.json` per-mesh
defaults too?** A user who always launches read-only agents
wants `caps: ["dm", "read"]` as a personal default. Easy add;
defer until users ask for it.
4. **Per-tool MCP cap surface?** Out of scope here, but: a `claudemesh
launch --tools peer:read,memory:write` would be a finer cut than
broker-verb caps. The broker can't enforce that — it'd live in the
MCP wrapper / Claude Code's allowedTools. Different layer.
## Test plan
- Pure-logic tests on `senderHasCap` (member-level → always true,
empty caps → always false, declared caps → exact match).
- Broker integration: launch a session with `--caps dm`, attempt
`set_state` → expect `forbidden: missing session capability
"state-write"`.
- v1 attestation still accepted, no `allowed_caps` set, all caps
permitted (back-compat).
- v2 attestation with empty `allowed_caps` array → broker treats
as "explicitly empty, no caps allowed" (NOT "full member"). The
full-member default is "field omitted entirely". Test both.
- Dry-run mode: cap fail increments the counter but the operation
proceeds. Smoke-test before flipping enforcement.
## Estimate
- Spec review + open-question resolution: 1–2 days.
- Broker change (PeerConn field, attestation v2 accept, per-verb
enforcement, dry-run mode): 2–3 days.
- CLI change (`--caps` flag, attestation builder, peer list
surface): 1 day.
- Tests: 1 day.
- Dry-run release window: ≥ 1 week.
Total: ~1 sprint of focused work, plus a dry-run window.

@@ -0,0 +1,350 @@
# Continuous presence — lease model + resume token
**Status:** spec, ready for v0.3.0.
**Owner:** alezmad
**Author:** Claude (2026-05-05, follow-up to user-reported "after hours claudemesh disconnects")
**Related:** `2026-05-04-per-session-presence.md` (per-launch ephemeral keypair), `apps/broker/src/index.ts:5430-5436` (current 30s ping loop), `apps/cli/src/daemon/ws-lifecycle.ts` (current backoff reconnect).
## Problem
Today, presence is fused to a single TCP/WS connection. When the
connection breaks — half-dead NAT entries, ISP route changes, laptop
sleep, broker restart — the broker tears down the presence row, fires
`peer_left`, and waits for the daemon to dial a fresh socket and run
the full attestation hello again. Other peers see the user blink
offline → back online. Messages sent to the session during the gap are
either dropped (if it's a `now`/`next` priority DM with no recipient
match) or held in `message_queue` for `low` only.
Concrete symptom (user-reported): `claudemesh peer list` shows zero
peers despite multiple sessions being "up" — they're stuck on
half-dead TCP connections. Daemon hasn't noticed because no `close`
fired. Hours later, kernel TCP keepalive (default Linux: 7200s idle +
9 × 75s probes ≈ 2h11m) finally RSTs the socket, daemon's existing
backoff reconnects, peers reappear. Until then: zombie session.
Two coupled bugs:
1. **No application-layer staleness detection.** Broker pings every
30s (line 5431) and updates `lastPingAt` on pong, but never
`terminate()`s a connection that stops returning pongs. Daemon
doesn't ping at all. Both sides trust the kernel for liveness,
which only fires after hours.
2. **Presence == connection.** Even once the staleness IS detected
and the daemon reconnects, peers see a full `peer_left` /
`peer_joined` cycle for a network blip that took 1–30 seconds.
Outbound messages during the gap that target the session by
pubkey route to nothing.
The user's ask: peers should never see a gap during transient
disconnects. Presence should be continuous as long as the *session
intent* is alive, regardless of how many sockets carried it.
## Goal
Presence is a **lease** keyed off the session's stable identity
(`sessionPubkey`), held in broker memory + DB, with a TTL refreshed
on every keepalive. Sockets come and go beneath the lease. Other peers
see continuous online status across reconnects up to the lease TTL.
Specifically:
- A daemon (or per-session WS) can drop and re-establish the WS
within a configurable grace window (default 90s) without any peer
observing `peer_left` / `peer_joined`.
- Messages sent to a session while its socket is mid-flap are queued,
delivered on the next reattach, ordered.
- Reconnect itself is sub-second on the wire when a `resume_token` is
presented — broker recognises the session, restores the slot, no
re-attestation round-trip.
- After the grace window expires, the broker fires `peer_left`
exactly once; on a later reconnect it fires `peer_joined` exactly
once. No flapping.
## Non-goals
- **Multi-broker handoff.** Out of scope. If the broker process
restarts, leases are lost and we fall back to today's behavior
(clean reconnect, peers see one cycle). A future spec can address
this with a shared lease store (Redis / Postgres LISTEN).
- **Dual-socket on the daemon.** Useful gold-plating but not required
for the user-facing problem. Single-socket with watchdog +
resume-token covers the failure modes actually observed (NAT drops,
ISP blips, sleep <90s).
- **Manual `claudemesh reconnect` CLI.** Not needed; the lease model
makes it redundant. Re-evaluate if real support cases surface.
## Design
### Lease model
```
sessionPubkey → { transport: "online" | "offline",
                  leaseUntil: Date,
                  ws: WebSocket | null,
                  ...existing PeerConn fields }
```
Today the `connections` Map IS keyed by `presenceId`, which is a fresh
UUID per WS. We change that key to `sessionPubkey` (member-WS:
`memberPubkey`; session-WS: `sessionPubkey`). The PeerConn struct
gains:
```ts
transport: "online" | "offline";
leaseUntil: Date; // Date.now() + LEASE_TTL_MS
evictionTimer: NodeJS.Timeout | null;
```
### State transitions
**On WS open + hello accepted (initial):**
- Insert into `connections` with `transport: "online"`,
`leaseUntil: now + 90s`, `evictionTimer: null`.
- Broadcast `peer_joined` (today's behavior).
- Issue `resume_token` (see below) in the `hello_ack`.
**On WS open + hello carries valid `resume_token`:**
- Look up by `sessionPubkey`, verify token signature + freshness
(TTL <= LEASE_TTL_MS). If valid AND entry exists with
`transport: "offline"`:
- Cancel `evictionTimer`.
- Swap `ws` reference.
- Set `transport: "online"`, refresh `leaseUntil`.
- **Do NOT** broadcast `peer_joined`. The lease never expired.
- Drain any queued DMs accumulated during offline window.
- Reply `hello_ack` with new `resume_token`.
- If entry exists with `transport: "online"` (token replay attack or
rapid reconnect race): close old `ws` with `1000, "session_replaced"`
before swapping. Same as today's `oldConn.ws.close(1000, ...)`
pattern at lines 1768/1996.
- If no entry exists or token is stale: treat as a fresh hello,
broadcast `peer_joined`. Token expired = same as a cold start.
**On WS close (any reason):**
- Look up by `sessionPubkey`. If not found, no-op (already evicted).
- Set `transport: "offline"`, clear `ws` reference.
- Start `evictionTimer = setTimeout(evict, GRACE_MS)`.
- **Do NOT** broadcast `peer_left`. **Do NOT** delete the entry.
- **Do NOT** call `disconnectPresence(presenceId)` yet.
**On `evictionTimer` fire (lease expired without reattach):**
- Delete from `connections`.
- Broadcast `peer_left` (today's behavior at lines 5167-5189).
- `decMeshCount`.
- `disconnectPresence(presenceId)`.
- Clean up URL watches, stream subs, MCP registry — same as today's
close handler.
- Audit `peer_left`.
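The open, close, reattach, and evict transitions above can be sketched as a small state table. Everything named here (`createLeaseTable`, the injected `schedule` and `now` functions, the `events` array standing in for broadcasts) is hypothetical. Identity and token verification are assumed to have passed before `attach` is called, and the cleanup steps (URL watches, streams, audit) are omitted.

```ts
type Transport = "online" | "offline";
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

interface Lease<S> {
  transport: Transport;
  leaseUntil: number;
  socket: S | null;
  cancelEviction: Cancel | null;
}

function createLeaseTable<S>(graceMs: number, schedule: Schedule, now: () => number) {
  const leases = new Map<string, Lease<S>>();
  const events: string[] = []; // stand-in for peer_joined / peer_left broadcasts

  function evict(sessionPubkey: string): void {
    if (!leases.delete(sessionPubkey)) return; // already gone
    events.push(`peer_left:${sessionPubkey}`);
  }

  function attach(sessionPubkey: string, socket: S): void {
    const lease = leases.get(sessionPubkey);
    if (lease) {
      // Reattach: cancel any pending eviction and swap the socket silently.
      // (A real broker would also close the old socket if it is still online.)
      lease.cancelEviction?.();
      lease.cancelEviction = null;
      lease.socket = socket;
      lease.transport = "online";
      lease.leaseUntil = now() + graceMs;
      return;
    }
    leases.set(sessionPubkey, {
      transport: "online",
      leaseUntil: now() + graceMs,
      socket,
      cancelEviction: null,
    });
    events.push(`peer_joined:${sessionPubkey}`);
  }

  function detach(sessionPubkey: string, socket: S): void {
    const lease = leases.get(sessionPubkey);
    // Ignore unknown sessions and old sockets closing after a reattach.
    if (!lease || lease.socket !== socket) return;
    lease.transport = "offline";
    lease.socket = null;
    lease.leaseUntil = now() + graceMs;
    lease.cancelEviction = schedule(() => evict(sessionPubkey), graceMs);
  }

  return { attach, detach, events, leases };
}
```

In a broker, `schedule` would likely wrap `setTimeout` and return a function that calls `clearTimeout`; injecting it keeps the transitions testable without waiting out a real 90s window.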
**Watchdog (broker):**
- The 30s ping loop (line 5431) gains a staleness check: if any
conn's `transport === "online"` and `lastPingAt < now - 75s`, call
`ws.terminate()`. This converts the half-dead socket into a clean
`close` event, which fires the lease-offline transition above.
- Same logic on the daemon side (see § Daemon changes).
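The staleness check itself is a single comparison per connection. A sketch, assuming a hypothetical `WatchedConn` shape with millisecond timestamps and a `terminate` callback standing in for `ws.terminate()`:

```ts
// Hypothetical, trimmed-down view of a connection for the watchdog.
interface WatchedConn {
  transport: "online" | "offline";
  lastPingAt: number; // ms since epoch, bumped when a pong arrives
  terminate: () => void; // stand-in for ws.terminate()
}

const STALE_MS = 75000;

// Meant to run from the existing 30s ping loop; returns how many sockets it terminated.
function terminateStale(
  conns: readonly WatchedConn[],
  now: number,
  staleMs: number = STALE_MS,
): number {
  let terminated = 0;
  for (const conn of conns) {
    // Offline-leased entries have no live socket, so they are skipped.
    if (conn.transport === "online" && conn.lastPingAt < now - staleMs) {
      conn.terminate();
      terminated += 1;
    }
  }
  return terminated;
}
```

With a 30s ping cadence, a 75s threshold tolerates two missed pongs before the socket is terminated.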
### Resume token
A short opaque string the broker hands the daemon in `hello_ack`.
Format: `mesh-resume.v1.<base64url(JSON-payload)>.<base64url(sig)>`
where `JSON-payload = { sub: <sessionPubkey>, mid: <meshId>, exp:
<unix-ms>, iat: <unix-ms> }` and `sig = ed25519(brokerSigningKey,
JSON-payload)`.
- **Why a token, not just sessionPubkey?** A session needs to prove
it's the holder of an existing lease without re-running the full
attestation handshake (which involves member key + parent
attestation lookup). The token is a server-issued cookie: cheap to
verify, scoped to a single session, expires with the lease.
- **Storage:** broker keeps the signing key in env (`RESUME_TOKEN_KEY`,
generated on first boot if missing, persisted to a config row). No
DB column needed for the tokens themselves — they're verified by
signature alone.
- **TTL:** equal to LEASE_TTL_MS (90s). After that the daemon must
re-handshake with full attestation. Refreshed on every successful
reattach.
- **Daemon storage:** in-memory only. Lost on daemon restart, which
is correct: a daemon restart is a real reconnect and should run
the full hello.
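
As a rough sketch of the checks that surround signature verification, the following splits a token in the `mesh-resume.v1.<payload>.<sig>` format and validates the decoded claims. The function names, the result shape, and the `reason` strings are assumptions for illustration; base64url decoding and the ed25519 check itself are deliberately left out.

```typescript
// Hypothetical helpers; only the token format and claim names come from the spec.
interface ResumeClaims {
  sub: string; // sessionPubkey
  mid: string; // meshId
  exp: number; // unix ms
  iat: number; // unix ms
}

type ClaimsResult = { ok: true } | { ok: false; reason: string };

const LEASE_TTL_MS = 90_000;

// base64url never contains ".", so a plain split is enough to find the parts.
function splitResumeToken(
  token: string,
): { payloadB64: string; sigB64: string } | null {
  const parts = token.split(".");
  if (parts.length !== 4) return null;
  const [scheme, version, payloadB64, sigB64] = parts;
  if (scheme !== "mesh-resume" || version !== "v1") return null;
  if (!payloadB64 || !sigB64) return null;
  return { payloadB64, sigB64 };
}

// Run only after the signature over the payload has been verified.
function checkResumeClaims(
  claims: ResumeClaims,
  expected: { sessionPubkey: string; meshId: string },
  nowMs: number,
): ClaimsResult {
  if (claims.sub !== expected.sessionPubkey) return { ok: false, reason: "wrong_session" };
  if (claims.mid !== expected.meshId) return { ok: false, reason: "wrong_mesh" };
  if (claims.exp - claims.iat > LEASE_TTL_MS) return { ok: false, reason: "ttl_too_long" };
  if (nowMs >= claims.exp) return { ok: false, reason: "expired" };
  return { ok: true };
}
```

Any failure here would fall through to the fresh-hello path described above rather than rejecting the connection outright.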
### Wire protocol additions
`hello` (member-WS, session-WS, fresh-launch hello — all three):
```diff
 {
   type: "hello",
   memberPubkey: "...",
   sessionPubkey: "...", // session-WS only
   attestation: "...", // session-WS only
   signature: "...",
+  resumeToken?: "mesh-resume.v1...", // optional; presence = reattach attempt
   ...
 }
```
`hello_ack`:
```diff
 {
   type: "hello_ack",
   presenceId: "...",
   ...
+  resumeToken: "mesh-resume.v1...", // always issued; replaces prior on reattach
+  leaseTtlMs: 90000, // informational; daemon may use for ping cadence
 }
```
No new message types. Old daemons that don't send `resumeToken` get
today's full-handshake behavior — fully backward compatible.
### Message queue during grace window
Today: DMs to a presence whose WS is closed → routed to
`message_queue` only for `priority: low`; `now`/`next` either route
to a different connected session of the same member or drop.
Change: when broker would route to a session whose
`transport === "offline"` (lease still valid), enqueue regardless of
priority. On reattach, the existing inbox-drain path
(`maybePushQueuedMessages` at line 967) flushes them in order. The
`message_queue` already has the schema for this; we're just relaxing
the priority gate when the target is in grace.
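
The relaxed gate can be sketched as a pure routing decision. The names (`routeDm`, `TargetLease`, the `Route` values) are hypothetical, and the "legacy" branch only summarizes today's behavior as described above.

```typescript
// Hypothetical routing sketch; not the broker's actual DM path.
type Priority = "now" | "next" | "low";
type Route = "deliver" | "enqueue" | "reroute_or_drop";

interface TargetLease {
  transport: "online" | "offline";
  leaseUntil: number; // unix ms
}

// Today's rule for a closed WS: only `low` is queued.
function legacyRoute(priority: Priority): Route {
  return priority === "low" ? "enqueue" : "reroute_or_drop";
}

function routeDm(
  target: TargetLease | undefined,
  priority: Priority,
  nowMs: number,
): Route {
  if (!target) return legacyRoute(priority); // no lease at all
  if (target.transport === "online") return "deliver";
  // Offline but lease still valid: queue regardless of priority.
  if (nowMs < target.leaseUntil) return "enqueue";
  return legacyRoute(priority);
}
```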
### Constants
```ts
const LEASE_TTL_MS = 90_000; // grace window after WS close
const PING_INTERVAL_MS = 30_000; // unchanged
const STALE_PONG_THRESHOLD_MS = 75_000; // 2.5x ping interval
const RESUME_TOKEN_TTL_MS = LEASE_TTL_MS;
```
`LEASE_TTL_MS` = 90s rationale: long enough to absorb a sleep/resume
cycle, NAT timeout, ISP route flap, mobile→wifi handover. Short
enough that a true crash (daemon killed, machine off) clears the
session within 90s — peers don't see ghost online status forever.
Configurable via env (`LEASE_TTL_MS`) for self-hosted brokers.
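
A small worked calculation may help when reasoning about these constants. It assumes the staleness check runs only on the ping loop's 30s ticks, as the Watchdog bullet describes; `firstStaleTick` is a hypothetical helper, not broker code.

```typescript
const PING_INTERVAL_MS = 30_000;
const STALE_PONG_THRESHOLD_MS = 75_000;

// Returns the time of the first ping-loop tick at which a connection whose
// last pong arrived at `lastPongAt` is considered stale.
function firstStaleTick(lastPongAt: number, firstTickAt: number): number {
  let tick = firstTickAt;
  while (tick - lastPongAt <= STALE_PONG_THRESHOLD_MS) {
    tick += PING_INTERVAL_MS;
  }
  return tick;
}

// Last pong 50ms after the tick at t=0; the next ticks are 30s, 60s, 90s.
const detectedAt = firstStaleTick(50, 30_000); // 90_000
```

Under that assumption, termination lands somewhere between 75s and 105s after the last pong, depending on where the pong fell relative to the tick schedule.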
## Daemon changes
### Watchdog
In `ws-lifecycle.ts`, add an `idleWatchdog` parallel to the existing
backoff/reconnect machinery:
```ts
let lastActivity = Date.now(); // bumped on every incoming message + pong
const watchdog = setInterval(() => {
  if (Date.now() - lastActivity > STALE_THRESHOLD_MS) {
    log("warn", "ws_stale_terminate", { url: opts.url });
    sock.terminate(); // fires existing close handler → reconnect path
  } else if (sock.readyState === sock.OPEN) {
    sock.ping(); // matches broker's 30s cadence, gives broker a pong
  }
}, PING_INTERVAL_MS);
sock.on("message", () => { lastActivity = Date.now(); });
sock.on("pong", () => { lastActivity = Date.now(); });
```
Call `clearInterval(watchdog)` in the close handler and in the explicit
`close()` path.
### Resume token in hello
`apps/cli/src/daemon/broker.ts:136` and equivalent in
`session-broker.ts`: persist the `resumeToken` from each successful
`hello_ack` into a private field, include it in the next
`buildHello()` call. On daemon restart the field is empty → cold
start, exactly today's behavior.
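
A minimal sketch of the in-memory handling, assuming a hypothetical `ResumeTokenHolder` class and trimmed-down message shapes (the real `buildHello()` carries more fields). Clearing the token when an ack omits it is an assumption about how an old broker should be handled.

```typescript
// Trimmed, hypothetical message shapes for illustration.
interface HelloAck {
  type: "hello_ack";
  presenceId: string;
  resumeToken?: string; // absent when talking to an old broker
}

interface Hello {
  type: "hello";
  memberPubkey: string;
  resumeToken?: string;
}

class ResumeTokenHolder {
  // In-memory only: a daemon restart starts with no token (cold start).
  private token: string | undefined = undefined;

  onHelloAck(ack: HelloAck): void {
    this.token = ack.resumeToken;
  }

  buildHello(memberPubkey: string): Hello {
    const hello: Hello = { type: "hello", memberPubkey };
    if (this.token !== undefined) hello.resumeToken = this.token;
    return hello;
  }
}
```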
### No CLI changes
`claudemesh peer list` keeps reading the broker's `connections` Map
which now reflects continuous presence. Users see online sessions as
online during transient blips. No UX surface changes.
## Migration
- New broker is fully backward compatible with old daemons (resume
token is optional, defaults fall through to today's path).
- New daemons against an old broker: token is sent but ignored, full
handshake runs each reconnect — same as today.
- DB migration: none. `presence` table semantics unchanged. The
`disconnectedAt` column is now set only on lease eviction (>90s),
not on every WS close. This is a behavioral change but not a
schema change.
- Add ENV var `RESUME_TOKEN_KEY` (broker generates on first boot if
unset, persists to a singleton config row).
## Test plan
1. **Sleep test:** kill -STOP the daemon for 60s, then kill -CONT.
Expect: peers never see `peer_left`. Daemon's WS is dead-on-arrival
when it wakes; watchdog terminates it; reconnect with resume_token
succeeds within 1-2s; lease was at ~30s of its 90s TTL when the
daemon resumed.
2. **Hard offline:** kill -STOP for 120s, kill -CONT. Expect: peers
see exactly one `peer_left` at t=90s, then exactly one
`peer_joined` after the daemon resumes and reconnects (resume
token is now stale; full handshake runs).
3. **NAT drop simulation:** `iptables -A OUTPUT -p tcp --dport 443
-j DROP` for 60s on the daemon host, then remove the rule. Expect:
broker pings stop landing, broker-side watchdog calls
`ws.terminate()` at t=75s, lease enters grace, daemon's own
watchdog fires within ~30s, daemon reconnects with resume_token,
peers never see a flap.
4. **Message-during-grace:** while a target session is in grace
(offline, lease valid), send a `priority: now` DM. Expect: queued
in `message_queue`, delivered exactly once on reattach, no
`peer_left` visible to sender, ack returns delivered.
5. **Replay attack:** capture a resume_token in flight, replay it
against a different broker connection while the original session
is still online. Expect: broker treats it as a reconnect for an
already-online session → closes old WS with `session_replaced`,
new WS takes over. Equivalent to today's session-replacement
semantics; the original session detects the close and either
reconnects (if it's still alive) or gives up.
6. **Token forgery:** send a `resumeToken` not signed by the broker.
Expect: signature check fails, broker treats hello as a fresh
handshake (or rejects if the rest of the hello is invalid).
## Open questions
- **Should `peer list` expose a `transport` field** so callers can
distinguish "leased but offline" from "online"? Default no — the
abstraction we're selling is "they're online." But debugging may
want it; gate it behind `--all` or `--debug`.
- **What about the broker-side `mcpRegistry` cleanup?** Today we
delete non-persistent MCP entries on WS close (line 5217). With
leases, we should defer that to lease eviction, not WS close.
Otherwise an MCP server registered by a session disappears every
time its WS reconnects.
## Build order
1. **Broker lease model** — change `connections` keying, add
`transport`/`leaseUntil`/`evictionTimer`, refactor close handler
to start grace timer instead of immediate teardown, refactor
eviction path. (~80 lines.)
2. **Resume token** — signing key bootstrap, token issue/verify,
wire format, hello_ack changes. (~50 lines + 1 config row.)
3. **Daemon watchdog** — `ws-lifecycle.ts` adds `idleWatchdog` and
stores `resumeToken` from acks. (~25 lines.)
4. **Daemon hello** — pass `resumeToken` in next `buildHello()`.
(~10 lines across `broker.ts` + `session-broker.ts`.)
5. **Broker watchdog** — extend the 30s ping loop with
`terminate()`-on-stale logic. (~15 lines.)
6. **Queue-during-grace** — relax priority gate in DM routing.
(~5 lines.)
7. **Spec docs** — update `docs/protocol.md` with resume_token,
lease semantics. (~30 lines.)
8. **Tests** — six scenarios above. Likely ~3 new test files.
Estimated total: one focused day. The broker lease model is the
load-bearing change; everything else slots in cleanly once that's done.


@@ -427,6 +427,21 @@ export async function heartbeat(presenceId: string): Promise<void> {
.where(eq(presence.id, presenceId));
}
/**
* Restore a presence row to online state on lease reattach: clear
* `disconnectedAt` and bump `lastPingAt`. Needed because the DB-level
* stale-presence sweeper may have flipped the row to disconnected
* during the grace window — the lease is in-memory truth, but other
* code paths read presence.disconnectedAt directly.
*/
export async function restorePresence(presenceId: string): Promise<void> {
const now = new Date();
await db
.update(presence)
.set({ disconnectedAt: null, lastPingAt: now })
.where(eq(presence.id, presenceId));
}
// --- Peer discovery ---
/** Return all active (connected) presences in a mesh, joined with member info. */


@@ -41,6 +41,7 @@ import {
grantFileKey,
handleHookSetStatus,
heartbeat,
restorePresence,
insertFileKeys,
joinGroup,
joinMesh,
@@ -156,11 +157,53 @@ interface PeerConn {
bio?: string;
capabilities?: string[];
};
/** v2 agentic-comms presence taxonomy. Mirrors the value passed to
* `recordPresence`. Used by the kick handler to refuse no-op kicks
* on long-lived control-plane connections (daemon, dashboard) that
* would just auto-reconnect. */
peerRole: "control-plane" | "session" | "service";
/** Last time this connection's WS replied to a broker ping. Bumped
* in the `pong` handler. Used by the staleness watchdog to detect
* half-dead TCP/NAT-dropped connections that the kernel hasn't yet
* RST'd (Linux default keepalive ≈ 2hrs). */
lastPongAt: number;
/** Lease state: "online" while the WS is healthy, "offline" during
* the GRACE window after a WS close. While offline, the entry stays
* in `connections` so peer_list / sendToPeer still see it; DMs land
* in the message_queue (sendToPeer no-ops on dead WS, but the queue
* row stays with deliveredAt=NULL and drains on reattach). After
* GRACE_MS without a reattach, evictionTimer fires the full peer_left
* + cleanup. Reattach (same sessionPubkey hello arriving on a fresh
* WS) cancels the timer, swaps in the new ws, restores online. */
leaseState: "online" | "offline";
/** When the lease will be evicted if no reattach happens. 0 when online. */
leaseUntil: number;
/** Timer that fires evictPresenceFully(presenceId) at leaseUntil. null when online. */
evictionTimer: NodeJS.Timeout | null;
}
const connections = new Map<string, PeerConn>();
const connectionsPerMesh = new Map<string, number>();
/**
* Lease grace window — how long after a WS close the broker will hold
* the presence row open before evicting and broadcasting peer_left.
*
* 90s: long enough to absorb a sleep/resume cycle, NAT timeout, ISP
* route flap, mobile→wifi handover, broker restart of the daemon's
* machine. Short enough that a true crash (machine off, daemon killed)
* clears the session within 90s — peers don't see ghost online status
* forever.
*
* During grace: lease stays in `connections`, peer_list keeps showing
* the session as online to other peers, DMs route through message_queue
* (sendToPeer no-ops on dead WS, drain happens on reattach). On
* reattach (same sessionPubkey hello on a new WS): silent swap, no
* peer_joined / peer_left visible to anyone. After grace expires:
* full eviction (peer_left + cleanup) fires exactly once.
*/
const GRACE_MS = 90_000;
// Rate limiter for /tg/token endpoint (IP → count, cleared hourly)
const tgTokenRateLimit = new Map<string, number>();
setInterval(() => tgTokenRateLimit.clear(), 60 * 60_000).unref();
@@ -525,6 +568,97 @@ function sendToPeer(presenceId: string, msg: WSServerMessage): void {
}
}
/**
* Run the full presence-cleanup path: broadcast peer_left, decMeshCount,
* disconnectPresence in DB, audit, clean up URL watches / streams /
* MCP entries / clock. Removes the entry from `connections`.
*
* Called from two places:
* 1. `ws.on("close")` when the closing WS belongs to a connection
* with no active lease (no grace) — i.e. the lease had already
* been evicted, or the close fires before lease is established.
* 2. The grace-window evictionTimer when no reattach happened in
* GRACE_MS. This is the "presence is really gone" path.
*
* Idempotent: re-entering when the connections entry is already gone
* is a no-op.
*/
async function evictPresenceFully(presenceId: string): Promise<void> {
const conn = connections.get(presenceId);
if (!conn) return; // already evicted
if (conn.evictionTimer) {
clearTimeout(conn.evictionTimer);
conn.evictionTimer = null;
}
connections.delete(presenceId);
decMeshCount(conn.meshId);
const leaveMsg: WSPushMessage = {
type: "push",
subtype: "system",
event: "peer_left",
eventData: {
name: conn.displayName,
pubkey: conn.sessionPubkey ?? conn.memberPubkey,
},
messageId: crypto.randomUUID(),
meshId: conn.meshId,
senderPubkey: "system",
priority: "low",
nonce: "",
ciphertext: "",
createdAt: new Date().toISOString(),
};
for (const [pid, peer] of connections) {
if (peer.meshId !== conn.meshId) continue;
// Don't tell the user's own other sessions they "left" when one
// of their Claude Code instances closes. Same pubkey = same user.
if (peer.memberPubkey === conn.memberPubkey) continue;
sendToPeer(pid, leaveMsg);
}
await disconnectPresence(presenceId);
void audit(conn.meshId, "peer_left", conn.memberId, conn.displayName, {});
// URL watches owned by this presence — interval would otherwise
// happily fetch forever after the peer is gone.
for (const [watchId, watch] of urlWatches) {
if (watch.presenceId === presenceId) {
clearInterval(watch.timer);
urlWatches.delete(watchId);
}
}
// Stream subscriptions for this presence.
for (const [key, subs] of streamSubscriptions) {
subs.delete(presenceId);
if (subs.size === 0) streamSubscriptions.delete(key);
}
// MCP servers registered by this presence.
for (const [key, entry] of mcpRegistry) {
if (entry.presenceId === presenceId) {
if (entry.persistent) {
// Keep persistent entries but mark offline
entry.online = false;
entry.offlineSince = new Date().toISOString();
entry.presenceId = "";
} else {
mcpRegistry.delete(key);
}
}
}
// Auto-pause clock when mesh becomes empty.
if (!connectionsPerMesh.has(conn.meshId)) {
const clock = meshClocks.get(conn.meshId);
if (clock && clock.timer) {
clearInterval(clock.timer);
clock.timer = null;
clock.paused = true;
log.info("clock auto-paused (mesh empty)", { mesh_id: conn.meshId });
}
}
log.info("ws evict full", { presence_id: presenceId });
}
async function maybePushQueuedMessages(
presenceId: string,
excludeSenderSessionPubkey?: string,
@@ -1661,6 +1795,10 @@ async function handleHello(
lastSeenAt?: string;
restoredGroups?: Array<{ name: string; role?: string }>;
restoredStats?: unknown;
/** True when this hello reattached an existing offline lease — caller
* must skip the peer_joined broadcast and the services-list ack
* augmentation. The session was never visibly absent from peers. */
silent?: boolean;
} | null> {
// Validate sessionPubkey shape — it becomes a routable identity in
// listPeers/drainForMember, so arbitrary strings let a client claim
@@ -1753,6 +1891,61 @@ async function handleHello(
const initialGroups = helloHasGroups
? hello.groups!
: (saved?.groups?.length ? saved.groups : (member.defaultGroups ?? []));
// Reattach check: if an offline-leased lease exists for the same
// stable identity (sessionPubkey when present, otherwise sessionId
// for member-WS), this hello is a transient reconnect within the
// grace window — swap the WS reference, clear the eviction timer,
// restore online state. No peer_joined broadcast — peers never saw
// this session leave.
for (const [pid, oldConn] of connections) {
if (oldConn.meshId !== hello.meshId) continue;
if (oldConn.leaseState !== "offline") continue;
const matchByPubkey =
!!hello.sessionPubkey
&& oldConn.sessionPubkey === hello.sessionPubkey;
const matchBySessionId =
!hello.sessionPubkey
&& !oldConn.sessionPubkey
&& oldConn.sessionId === hello.sessionId
&& oldConn.memberPubkey === hello.pubkey;
if (!matchByPubkey && !matchBySessionId) continue;
if (oldConn.evictionTimer) {
clearTimeout(oldConn.evictionTimer);
oldConn.evictionTimer = null;
}
oldConn.ws = ws;
oldConn.leaseState = "online";
oldConn.leaseUntil = 0;
oldConn.lastPongAt = Date.now();
// Refresh mutable fields from the new hello — the same session may
// have moved cwd / changed display name across the blip.
oldConn.cwd = hello.cwd;
if (hello.displayName) oldConn.displayName = hello.displayName;
log.info("ws hello reattach (lease)", {
presence_id: pid,
session_pubkey: hello.sessionPubkey?.slice(0, 12) ?? "(member-WS)",
session_id: hello.sessionId,
});
// Reset DB row to online: the stale-presence sweeper may have set
// disconnectedAt during the grace window. Lease is in-memory truth
// but downstream code paths read presence.disconnectedAt directly.
void restorePresence(pid);
// Drain any queued DMs that landed during the offline window.
void maybePushQueuedMessages(pid);
return {
presenceId: pid,
memberDisplayName: oldConn.displayName,
memberProfile: {
roleTag: member.roleTag,
groups: member.defaultGroups ?? [],
messageMode: member.messageMode ?? "push",
},
meshPolicy,
silent: true,
};
}
// Session-id dedup: if this session_id already has an active presence,
// disconnect the ghost. Happens when a client reconnects after a
// network blip or broker restart before the 90s stale sweeper runs.
@@ -1797,6 +1990,11 @@ async function handleHello(
groups: initialGroups,
visible: saved?.visible ?? true,
profile: saved?.profile ?? {},
peerRole: "control-plane",
lastPongAt: Date.now(),
leaseState: "online",
leaseUntil: 0,
evictionTimer: null,
});
incMeshCount(hello.meshId);
void audit(hello.meshId, "peer_joined", member.id, effectiveDisplayName, {
@@ -1853,6 +2051,10 @@ async function handleSessionHello(
memberDisplayName: string;
memberProfile?: unknown;
meshPolicy?: Record<string, unknown>;
/** True when this hello reattached an existing offline lease — caller
* must skip the peer_joined broadcast. The session was never visibly
* absent from peers. */
silent?: boolean;
} | null> {
// Shape checks. The crypto helpers also enforce these but bailing
// early gives a clearer error code on the wire.
@@ -1982,6 +2184,42 @@ async function handleSessionHello(
const initialGroups = hello.groups ?? member.defaultGroups ?? [];
// Reattach check: an offline-leased connection with the same
// sessionPubkey is the same launched session resuming inside the
// grace window. Cancel the eviction timer, swap the WS, restore
// online state. No peer_joined broadcast — peers never saw the
// session leave.
for (const [pid, oldConn] of connections) {
if (oldConn.meshId !== hello.meshId) continue;
if (oldConn.leaseState !== "offline") continue;
if (oldConn.sessionPubkey !== hello.sessionPubkey) continue;
if (oldConn.evictionTimer) {
clearTimeout(oldConn.evictionTimer);
oldConn.evictionTimer = null;
}
oldConn.ws = ws;
oldConn.leaseState = "online";
oldConn.leaseUntil = 0;
oldConn.lastPongAt = Date.now();
// Refresh mutable fields from the new hello.
oldConn.cwd = hello.cwd;
if (hello.displayName) oldConn.displayName = hello.displayName;
log.info("session_hello reattach (lease)", {
presence_id: pid,
session_pubkey: hello.sessionPubkey.slice(0, 12),
});
void restorePresence(pid);
void maybePushQueuedMessages(pid);
return {
presenceId: pid,
memberDisplayName: oldConn.displayName,
memberProfile: undefined,
meshPolicy,
silent: true,
};
}
// Session-id dedup: if the same session_id is already connected, kick
// the ghost. Reconnect after a network blip lands here cleanly.
for (const [oldPid, oldConn] of connections) {
@@ -2022,6 +2260,11 @@ async function handleSessionHello(
groups: initialGroups,
visible: true,
profile: {},
peerRole: "session",
lastPongAt: Date.now(),
leaseState: "online",
leaseUntil: 0,
evictionTimer: null,
});
incMeshCount(hello.meshId);
void audit(hello.meshId, "peer_joined", member.id, effectiveDisplayName, {
@@ -2420,8 +2663,10 @@ function handleConnection(ws: WebSocket): void {
}
// Broadcast peer_joined to siblings — same shape as the regular
// hello path, so list_peers consumers don't need to special-case.
+// Skipped on lease reattach: the session was never visibly absent,
+// so no synthetic join event should fire.
const joinedConn = connections.get(presenceId);
-if (joinedConn) {
+if (joinedConn && !result.silent) {
const joinMsg: WSPushMessage = {
type: "push",
subtype: "system",
@@ -2504,9 +2749,11 @@ function handleConnection(ws: WebSocket): void {
} catch {
/* ws closed during hello */
}
-// Broadcast peer_joined or peer_returned to all other peers in the same mesh.
+// Broadcast peer_joined or peer_returned to all other peers in the
+// same mesh. Skipped on lease reattach: the session never appeared
+// offline so no synthetic join event should fire.
const joinedConn = connections.get(presenceId);
-if (joinedConn) {
+if (joinedConn && !result.silent) {
const isReturning = !!result.restored;
const joinMsg: WSPushMessage = {
type: "push",
@@ -4645,11 +4892,30 @@ function handleConnection(ws: WebSocket): void {
}
const affected: string[] = [];
// 1.34.15 (gap #3a): kick was a no-op against long-lived
// control-plane connections (daemon, dashboard) — closing
// their WS just triggered the auto-reconnect loop, the
// kicker's CLI rendered "Their Claude Code session ended"
// (which was misleading), and the user-visible state was
// unchanged seconds later. We now refuse to close control-
// plane WSes and surface the skipped peers in a new
// additive ack field. Pre-1.34.15 CLI clients only read
// `kicked`/`affected`, so this stays back-compat.
//
// For `kick`-only: the soft `disconnect` verb still closes
// control-plane WSes intentionally — that's what users want
// when they're nudging a peer for it to re-authenticate.
const skippedControlPlane: string[] = [];
const skipControlPlane = isKick;
const now = Date.now();
if (km.all) {
for (const [pid, peer] of connections) {
if (peer.meshId !== conn.meshId || pid === presenceId) continue;
if (skipControlPlane && peer.peerRole === "control-plane") {
skippedControlPlane.push(peer.displayName || pid);
continue;
}
try { peer.ws.close(closeCode, closeReason); } catch {}
connections.delete(pid);
void disconnectPresence(pid);
@@ -4661,6 +4927,10 @@ function handleConnection(ws: WebSocket): void {
if (peer.meshId !== conn.meshId || pid === presenceId) continue;
const [pres] = await db.select({ lastPingAt: presence.lastPingAt }).from(presence).where(eq(presence.id, pid)).limit(1);
if (pres && pres.lastPingAt && pres.lastPingAt.getTime() < cutoff) {
if (skipControlPlane && peer.peerRole === "control-plane") {
skippedControlPlane.push(peer.displayName || pid);
continue;
}
try { peer.ws.close(closeCode, `${closeReason}_stale`); } catch {}
connections.delete(pid);
void disconnectPresence(pid);
@@ -4671,6 +4941,10 @@ function handleConnection(ws: WebSocket): void {
for (const [pid, peer] of connections) {
if (peer.meshId !== conn.meshId) continue;
if (peer.displayName === km.target || peer.memberPubkey === km.target || peer.memberPubkey.startsWith(km.target)) {
if (skipControlPlane && peer.peerRole === "control-plane") {
skippedControlPlane.push(peer.displayName || pid);
continue;
}
try { peer.ws.close(closeCode, closeReason); } catch {}
connections.delete(pid);
void disconnectPresence(pid);
@@ -4679,8 +4953,20 @@ function handleConnection(ws: WebSocket): void {
}
}
-conn.ws.send(JSON.stringify({ type: ackType, kicked: affected, affected, _reqId: km._reqId }));
-log.info(`ws ${closeReason}`, { presence_id: presenceId, count: affected.length, target: km.target ?? km.stale ?? "all" });
+conn.ws.send(JSON.stringify({
+type: ackType,
+kicked: affected,
+affected,
+// Additive — older CLI clients ignore this field.
+...(skippedControlPlane.length > 0 ? { skipped_control_plane: skippedControlPlane } : {}),
+_reqId: km._reqId,
+}));
+log.info(`ws ${closeReason}`, {
+presence_id: presenceId,
+count: affected.length,
+target: km.target ?? km.stale ?? "all",
+skipped_control_plane: skippedControlPlane.length,
+});
break;
}
@@ -5108,88 +5394,52 @@ function handleConnection(ws: WebSocket): void {
}
});
ws.on("close", async () => {
-if (presenceId) {
-const conn = connections.get(presenceId);
-// Persist peer state BEFORE removing from connections.
-if (conn) {
-await savePeerState(conn, conn.memberId, conn.meshId);
-}
-connections.delete(presenceId);
-if (conn) {
-decMeshCount(conn.meshId);
-// Broadcast peer_left to remaining peers in the same mesh.
-const leaveMsg: WSPushMessage = {
-type: "push",
-subtype: "system",
-event: "peer_left",
-eventData: {
-name: conn.displayName,
-pubkey: conn.sessionPubkey ?? conn.memberPubkey,
-},
-messageId: crypto.randomUUID(),
-meshId: conn.meshId,
-senderPubkey: "system",
-priority: "low",
-nonce: "",
-ciphertext: "",
-createdAt: new Date().toISOString(),
-};
-for (const [pid, peer] of connections) {
-if (peer.meshId !== conn.meshId) continue;
-// Don't tell the user's own other sessions they "left" when one
-// of their Claude Code instances closes. Same pubkey = same user.
-if (peer.memberPubkey === conn.memberPubkey) continue;
-sendToPeer(pid, leaveMsg);
-}
-}
-await disconnectPresence(presenceId);
-if (conn) {
-void audit(conn.meshId, "peer_left", conn.memberId, conn.displayName, {});
-}
-// Clean up URL watches owned by this peer — the interval was
-// happily fetching forever after the peer disconnected.
-for (const [watchId, watch] of urlWatches) {
-if (watch.presenceId === presenceId) {
-clearInterval(watch.timer);
-urlWatches.delete(watchId);
-}
-}
-// Clean up stream subscriptions for this peer
-for (const [key, subs] of streamSubscriptions) {
-subs.delete(presenceId);
-if (subs.size === 0) streamSubscriptions.delete(key);
-}
-// Clean up MCP servers registered by this peer
-for (const [key, entry] of mcpRegistry) {
-if (entry.presenceId === presenceId) {
-if (entry.persistent) {
-// Keep persistent entries but mark offline
-entry.online = false;
-entry.offlineSince = new Date().toISOString();
-entry.presenceId = "";
-} else {
-mcpRegistry.delete(key);
-}
-}
-}
-// Auto-pause clock when mesh becomes empty
-if (conn && !connectionsPerMesh.has(conn.meshId)) {
-const clock = meshClocks.get(conn.meshId);
-if (clock && clock.timer) {
-clearInterval(clock.timer);
-clock.timer = null;
-clock.paused = true;
-log.info("clock auto-paused (mesh empty)", { mesh_id: conn.meshId });
-}
-}
-log.info("ws close", { presence_id: presenceId });
+if (!presenceId) return;
+const conn = connections.get(presenceId);
+if (!conn) return; // already evicted
+// If the conn's `ws` is no longer THIS ws, the close belongs to an
+// older socket that was already replaced by a reattach. Ignore — the
+// lease is healthy with the new WS, no eviction needed.
+if (conn.ws !== ws) {
+log.debug("ws close on replaced socket — ignoring", { presence_id: presenceId });
+return;
+}
+await savePeerState(conn, conn.memberId, conn.meshId);
+// If lease is currently online, enter grace. Other peers see the
+// session as still online; DMs queue (sendToPeer no-ops on dead
+// WS, drain on reattach). After GRACE_MS without a reattach, the
+// timer fires evictPresenceFully and cleanup runs as before.
+const pid = presenceId;
+if (conn.leaseState === "online") {
+conn.leaseState = "offline";
+conn.leaseUntil = Date.now() + GRACE_MS;
+conn.evictionTimer = setTimeout(() => {
+log.info("lease grace expired — evicting", { presence_id: pid });
+void evictPresenceFully(pid);
+}, GRACE_MS);
+log.info("ws close — lease grace started", {
+presence_id: pid,
+grace_ms: GRACE_MS,
+});
+return;
+}
+// Not online (already in grace from an earlier close, or odd state).
+// Run full eviction immediately.
+await evictPresenceFully(pid);
});
ws.on("error", (err) => {
log.warn("ws error", { error: err.message });
});
ws.on("pong", () => {
-if (presenceId) void heartbeat(presenceId);
+if (presenceId) {
+const conn = connections.get(presenceId);
+if (conn) conn.lastPongAt = Date.now();
+void heartbeat(presenceId);
+}
});
}
@@ -5381,10 +5631,29 @@ async function main(): Promise<void> {
});
});
-// WS heartbeat ping every 30s; clients reply with pong → bumps lastPingAt.
+// WS heartbeat ping every 30s; clients reply with pong → bumps
+// lastPongAt. Connections whose pong is older than 75s (2.5x the
+// ping interval) are considered half-dead — kernel hasn't yet RST'd
+// the socket but no application traffic is flowing. Force-terminate
+// them to fire the close handler and free the connection slot.
+const STALE_PONG_THRESHOLD_MS = 75_000;
const pingInterval = setInterval(() => {
-for (const { ws } of connections.values()) {
-if (ws.readyState === ws.OPEN) ws.ping();
+const now = Date.now();
+for (const [pid, conn] of connections) {
+// Skip offline-leased entries: their WS is intentionally dead
+// during grace; the eviction timer handles their lifecycle.
+if (conn.leaseState === "offline") continue;
+const { ws } = conn;
+if (ws.readyState !== ws.OPEN) continue;
+if (now - conn.lastPongAt > STALE_PONG_THRESHOLD_MS) {
+log.warn("ws stale terminate", {
+presence_id: pid,
+last_pong_ago_ms: now - conn.lastPongAt,
+});
+try { ws.terminate(); } catch { /* socket already gone */ }
+continue;
+}
+ws.ping();
+}
}, 30_000);
pingInterval.unref();


@@ -0,0 +1,47 @@
/**
* Kick control-plane skip: 1.34.15 (gap #3a) refuses to close
* long-lived control-plane connections (claudemesh daemon, dashboard)
* via `kick`, because they auto-reconnect within seconds and the verb
* was effectively a no-op. The soft `disconnect` verb keeps the old
* behavior so users can still nudge a control-plane peer to
* re-authenticate.
*
* Pure-logic test — mirrors the branch inside handleSend's kick case
* without spinning up a broker. Same pattern as
* grants-enforcement.test.ts.
*/
import { describe, expect, test } from "vitest";
type PeerRole = "control-plane" | "session" | "service";
/** Mirrors the predicate inserted into the kick handler. */
function shouldSkipKick(args: {
verb: "kick" | "disconnect";
peerRole: PeerRole;
}): boolean {
const skipControlPlane = args.verb === "kick";
return skipControlPlane && args.peerRole === "control-plane";
}
describe("kick control-plane skip (gap #3a)", () => {
test("kick on control-plane → skipped (would auto-reconnect)", () => {
expect(shouldSkipKick({ verb: "kick", peerRole: "control-plane" })).toBe(true);
});
test("kick on session → not skipped (closes user session)", () => {
expect(shouldSkipKick({ verb: "kick", peerRole: "session" })).toBe(false);
});
test("kick on service → not skipped", () => {
expect(shouldSkipKick({ verb: "kick", peerRole: "service" })).toBe(false);
});
test("disconnect on control-plane → not skipped (intentional nudge)", () => {
expect(shouldSkipKick({ verb: "disconnect", peerRole: "control-plane" })).toBe(false);
});
test("disconnect on session → not skipped", () => {
expect(shouldSkipKick({ verb: "disconnect", peerRole: "session" })).toBe(false);
});
});


@@ -1,5 +1,110 @@
# Changelog
## 1.34.15 (2026-05-04) — `peer list --mesh` actually scopes + `kick` refuses control-plane
Two follow-ups from the 1.34.x train, both backwards-compatible.
### `peer list --mesh <slug>` no longer aggregates across meshes
`apps/cli/src/commands/peers.ts:140` was calling
`tryListPeersViaDaemon()` with no argument, so a multi-mesh daemon
returned peers from EVERY attached mesh and the renderer printed
"peers on flexicar" with cross-mesh rows mixed in. The daemon's
`/v1/peers?mesh=<slug>` filter (server-side, since 1.26.0) was
already correctly scoping when the slug was passed; the CLI just
wasn't passing it. Fixed.
`apps/cli/src/commands/launch.ts:407` (the `printBrokerWelcome` peer
count in the launch banner) had the same bug. The "N peers online"
line in the welcome now shows the count for the launched mesh only.
`apps/cli/src/commands/send.ts` cross-mesh hex-prefix resolution is
intentionally cross-mesh (the user is targeting by hex without
specifying a mesh) and was deliberately left as-is.
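
For illustration, the scoping fix amounts to always threading the slug into the daemon request. The helper below is a hypothetical sketch; only the `/v1/peers?mesh=<slug>` path comes from the text above.

```typescript
// Hypothetical helper: builds the daemon peers path, scoped when a slug is given.
function peersPath(meshSlug?: string): string {
  return meshSlug
    ? `/v1/peers?mesh=${encodeURIComponent(meshSlug)}`
    : "/v1/peers";
}

const scoped = peersPath("flexicar"); // "/v1/peers?mesh=flexicar"
```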
### `claudemesh kick` refuses no-op kicks on control-plane connections
Pre-1.34.15, kicking a daemon's member-WS or a dashboard connection
just closed the socket — the daemon's WS-lifecycle reconnect loop
brought it back within seconds, the kicker's CLI rendered "Their
Claude Code session ended" (which was misleading), and the user-
visible state was unchanged. The verb was effectively a no-op, but
the user had to learn that the hard way.
The broker's kick handler (`apps/broker/src/index.ts:4628+`) now
skips peers where `peerRole === "control-plane"` and surfaces the
skipped peers in a new additive ack field `skipped_control_plane`.
The soft `disconnect` verb keeps the old behavior — useful when
intentionally nudging a control-plane peer to re-authenticate.
The CLI (`apps/cli/src/commands/kick.ts`) reads the new field and
prints a clearer message: refused peers are listed, with the hint
that `claudemesh ban <peer>` is the right tool to remove a member,
or `claudemesh daemon down` to take a daemon offline locally.
`apps/broker/src/index.ts` adds `peerRole` to the in-memory
`PeerConn` shape, populated from both connection paths
(member-keyed `hello` → `"control-plane"`, per-launch
`session_hello` → `"session"`). The DB-side role taxonomy is
unchanged.
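
The skip rule can be sketched as a partition over the matched peers. This is an illustration under assumptions, not the broker's handler. The `Peer` type and `applyVerb` helper are hypothetical, and whether the real broker emits an empty `skipped_control_plane` array or omits the field is not stated here.

```typescript
type PeerRole = "control-plane" | "session" | "service";
type Verb = "kick" | "disconnect";

// Hypothetical minimal view of a matched connection.
type Peer = { name: string; peerRole: PeerRole };

// Only `kick` against a control-plane connection is refused; the soft
// `disconnect` verb still goes through for every role.
function shouldSkip(verb: Verb, peerRole: PeerRole): boolean {
  return verb === "kick" && peerRole === "control-plane";
}

// Hypothetical helper: split matches into acted-on and refused names.
function applyVerb(verb: Verb, matched: Peer[]) {
  const affected: string[] = [];
  const skipped_control_plane: string[] = [];
  for (const p of matched) {
    if (shouldSkip(verb, p.peerRole)) skipped_control_plane.push(p.name);
    else affected.push(p.name);
  }
  return { affected, skipped_control_plane };
}

const matched: Peer[] = [
  { name: "daemon-1", peerRole: "control-plane" },
  { name: "session-1", peerRole: "session" },
];
const kickAck = applyVerb("kick", matched);
const disconnectAck = applyVerb("disconnect", matched);
```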
### Back-compat
- Older CLI clients ignore the new `skipped_control_plane` ack
field; against a control-plane target their kick now prints
"No peers matched." because the ack lists no kicked peers.
- Older brokers don't emit the field at all; newer CLI handles
the absence (the new branch is only reached when the field is
present and non-empty).
- The new `peerRole` slot on `PeerConn` is filled at every
`connections.set` callsite, so older code paths never read
`undefined`.
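
The back-compat behavior on the CLI side can be sketched as a pure classification of the ack. The `Outcome` labels and `classifyKickAck` name are hypothetical; only the three ack field names come from the change itself.

```typescript
// Ack shape: every field is optional so older brokers (no
// `skipped_control_plane`) and newer ones parse the same way.
type KickAck = {
  affected?: string[];
  kicked?: string[];
  skipped_control_plane?: string[];
};

// Hypothetical labels for the four rendering branches.
type Outcome = "no-match" | "all-refused" | "kicked" | "kicked-with-refusals";

function classifyKickAck(ack: KickAck): Outcome {
  const peers = ack.affected ?? ack.kicked ?? [];
  // A missing field is treated as "nothing was skipped".
  const skipped = ack.skipped_control_plane ?? [];
  if (peers.length === 0) {
    return skipped.length === 0 ? "no-match" : "all-refused";
  }
  return skipped.length === 0 ? "kicked" : "kicked-with-refusals";
}
```

Because an absent field and an empty array collapse to the same case, an older broker never reaches the refusal branches.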
### Tests
- `apps/broker/tests/kick-control-plane-skip.test.ts` — 5 cases
covering the kick/disconnect × control-plane/session/service
truth table.
## 1.34.14 (2026-05-04) — stale `CLAUDEMESH_CONFIG_DIR` falls back
`claudemesh launch` exports `CLAUDEMESH_CONFIG_DIR=<tmpdir>` to its
spawned `claude` so the per-session mesh selection is isolated from
`~/.claudemesh/config.json`. The tmpdir is `rmSync`'d on launch exit
via the `process.on('exit', cleanup)` handler.
Footgun: if a later `claudemesh` invocation INHERITED that env — a
Bash tool call inside Claude Code, a tmux pane that captured the env
via `update-environment`, an exported var the user forgot to clear —
the inherited path pointed at a tmpdir that no longer existed.
Pre-1.34.14 we silently used the dead path, `readConfig()` came back
empty, and the user saw "No meshes joined" from an otherwise-working
install. Fish users hit it harder because fish has no `unset`;
they had to discover `set -e CLAUDEMESH_CONFIG_DIR`.
`apps/cli/src/constants/paths.ts` now resolves `CONFIG_DIR` once via
a memoized `resolveConfigDir()`:
1. No env var → `~/.claudemesh` (default, unchanged).
2. Env points at an existing directory → trust it. The
legitimate per-session-launch case is byte-identical to before.
3. Env set but stale (dir gone) → warn once on stderr (TTY-only —
CI / MCP boot / piped scripts stay quiet) with a shell-specific
unset hint, then fall back to `~/.claudemesh`.
The check is on the directory's existence, not on `config.json`,
because a fresh-launch tmpdir legitimately has no `config.json` until
the first write. The stale signature we catch is the outer launch's
`rmSync(tmpDir, {recursive: true})` cleanup, which removes the
directory entirely.
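
The resolution rules can be sketched as a pure function, assuming the directory-existence check described in the paragraph above. The signature is hypothetical: the existence probe and default path are injected so the sketch runs without touching the filesystem, and memoization and the TTY gate are left out.

```typescript
// Hypothetical result shape: the chosen dir plus whether the env was stale.
type Resolution = { dir: string; stale: boolean };

function resolveConfigDir(
  envDir: string | undefined,
  defaultDir: string,
  dirExists: (path: string) => boolean, // stand-in for an fs existence check
): Resolution {
  // Rule 1: no env var, use the default.
  if (!envDir) return { dir: defaultDir, stale: false };
  // Rule 2: env points at an existing directory, trust it.
  if (dirExists(envDir)) return { dir: envDir, stale: false };
  // Rule 3: env set but the directory is gone, fall back and flag it.
  return { dir: defaultDir, stale: true };
}

// Shell-specific hint; only the fish vs. non-fish split comes from the text.
function unsetHint(shell: string | undefined): string {
  return shell?.endsWith("fish")
    ? "set -e CLAUDEMESH_CONFIG_DIR"
    : "unset CLAUDEMESH_CONFIG_DIR";
}
```

A caller would print the warning only when `stale` is true and stderr is a TTY.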
The "no meshes" check from the original triage was deliberately NOT
adopted: a launched session that legitimately joins one mesh would
hit it.
No back-compat surface affected. No other files changed. `_resetPathsForTest()`
exported for unit tests.
## 1.34.13 (2026-05-04) — MCP forwards session token on /v1/events
The 1.34.10 SSE demux + 1.34.11 inbox per-recipient column were both

@@ -1,6 +1,6 @@
{
"name": "claudemesh-cli",
"version": "1.34.13",
"version": "1.34.16",
"description": "Peer mesh for Claude Code sessions — CLI + MCP server.",
"keywords": [
"claude-code",

@@ -76,12 +76,32 @@ export async function runKick(
if ("error" in built) { render.err(String(built.error)); return EXIT.INVALID_ARGS; }
return await withMesh({ meshSlug }, async (client) => {
const result = await client.sendAndWait(built as Record<string, unknown>) as { affected?: string[]; kicked?: string[] };
const result = await client.sendAndWait(built as Record<string, unknown>) as {
affected?: string[];
kicked?: string[];
// 1.34.15: broker refuses to kick control-plane WSes (they'd
// just auto-reconnect). Older brokers don't emit this field.
skipped_control_plane?: string[];
};
const peers = result?.affected ?? result?.kicked ?? [];
if (peers.length === 0) render.info("No peers matched.");
else {
const skipped = result?.skipped_control_plane ?? [];
if (peers.length === 0 && skipped.length === 0) {
render.info("No peers matched.");
} else if (peers.length === 0 && skipped.length > 0) {
render.warn(
`${skipped.length} match(es) refused: ${skipped.join(", ")} — control-plane connections (daemon / dashboard) auto-reconnect, so kick is a no-op.`,
"To take a daemon offline locally, run `claudemesh daemon down` on that machine. To remove a member from the mesh, use `claudemesh ban <peer>`.",
);
} else {
render.ok(`Kicked ${peers.length} peer(s): ${peers.join(", ")}`);
render.hint("Their Claude Code session ended. They can rejoin anytime by running `claudemesh`.");
if (skipped.length > 0) {
render.warn(
`(also refused ${skipped.length} control-plane connection(s): ${skipped.join(", ")})`,
"Daemon / dashboard connections auto-reconnect; kick is a no-op against them. Use `claudemesh ban <peer>` to remove a member entirely.",
);
}
}
return EXIT.SUCCESS;
});

@@ -400,11 +400,13 @@ async function printBrokerWelcome(meshSlug: string): Promise<void> {
}
} catch { /* daemon unreachable — not fatal */ }
// Peer count (best-effort).
// Peer count (best-effort). 1.34.15: scope to the launched mesh so
// multi-mesh daemons don't inflate the welcome banner with peers
// from other meshes the user didn't just attach to.
let peerCount = -1;
try {
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
const peers = (await tryListPeersViaDaemon()) ?? [];
const peers = (await tryListPeersViaDaemon(meshSlug)) ?? [];
peerCount = peers.filter((p) =>
(p as { channel?: string }).channel !== "claudemesh-daemon",
).length;

@@ -135,9 +135,17 @@ async function listPeersForMesh(slug: string): Promise<PeerRecord[]> {
// lifecycle helper inside tryListPeersViaDaemon auto-spawns the
// daemon if it's down and probes it for liveness — no separate bridge
// tier is needed any more (1.28.0).
//
// 1.34.15: forward `slug` to the daemon as `?mesh=<slug>` so the
// server-side aggregator narrows to the requested mesh. Pre-1.34.15
// we called this with no argument, so a multi-mesh daemon returned
// peers from every attached mesh and the renderer printed "peers on
// flexicar" with cross-mesh rows mixed in. The daemon's
// `meshFromCtx` already does the right scoping when the slug is
// passed; the CLI just wasn't passing it.
try {
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
const dr = await tryListPeersViaDaemon();
const dr = await tryListPeersViaDaemon(slug);
if (dr !== null) {
return dr.map((p) => annotateSelf(p as PeerRecord, selfMemberPubkey, selfSessionPubkey));
}

@@ -1,10 +1,82 @@
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";
const home = homedir();
const DEFAULT_CONFIG_DIR = join(home, ".claudemesh");
/**
* Resolve `CONFIG_DIR` once, with stale-env detection.
*
* `claudemesh launch` exposes `CLAUDEMESH_CONFIG_DIR=<tmpdir>` to its
* spawned `claude` so the per-session mesh selection is isolated from
* `~/.claudemesh/config.json`. The tmpdir is rmSync'd on launch exit.
*
* Footgun: if a `claudemesh` invocation INHERITS that env from an
* already-launched (or previously-launched) session — e.g. a Bash tool
* call inside Claude Code, or a tmux pane that captured the env via
* `update-environment` — the inherited path may point at a tmpdir that
* no longer exists. Pre-1.34.14 we silently used the dead path,
* `readConfig()` came back empty, and the user saw "No meshes joined"
* from an otherwise-working install.
*
* Resolution rules:
* 1. No env var → `~/.claudemesh` (default).
* 2. Env points at an existing dir → trust it
* (the legitimate per-session-launch case).
* 3. Env set but stale (dir missing) → warn
* once on stderr (TTY-only) and fall back to `~/.claudemesh`.
*
* Memoized: resolves once on first access. Mid-process env mutations
* are intentionally ignored — paths must stay stable across one CLI
* invocation.
*/
let _resolvedConfigDir: string | null = null;
let _warnedStaleEnv = false;
function resolveConfigDir(): string {
if (_resolvedConfigDir !== null) return _resolvedConfigDir;
const envDir = process.env.CLAUDEMESH_CONFIG_DIR;
if (!envDir) {
_resolvedConfigDir = DEFAULT_CONFIG_DIR;
return DEFAULT_CONFIG_DIR;
}
// Trust the env when it resolves to a real directory. We check
// the DIR (not `config.json`) because the legitimate "fresh launch
// before any write" case has the dir but no config.json yet.
// The stale signature we want to catch is `rmSync(tmpDir,
// {recursive: true})` from the outer launch's cleanup — that
// removes the directory entirely, so a missing dir is the
// unambiguous "stale" signal.
if (existsSync(envDir)) {
_resolvedConfigDir = envDir;
return envDir;
}
// Stale: env set but the dir is gone. Most likely the outer
// launch's cleanup ran and we inherited its (now-dead) tmpdir
// path. Fall back to default and warn the user once on stderr —
// only when attached to a TTY, so non-interactive callers (CI,
// MCP boot, scripts piping stdout) stay quiet.
if (!_warnedStaleEnv && process.stderr.isTTY) {
_warnedStaleEnv = true;
const unsetHint =
process.env.SHELL?.endsWith("fish")
? "set -e CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE"
: "unset CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE";
process.stderr.write(
`claudemesh: ignoring stale CLAUDEMESH_CONFIG_DIR=${envDir} (directory does not exist); using ${DEFAULT_CONFIG_DIR}.\n`
+ ` Hint: this is usually a leftover env from a previous \`claudemesh launch\`. Clean it with:\n`
+ ` ${unsetHint}\n`,
);
}
_resolvedConfigDir = DEFAULT_CONFIG_DIR;
return DEFAULT_CONFIG_DIR;
}
export const PATHS = {
CONFIG_DIR: process.env.CLAUDEMESH_CONFIG_DIR || join(home, ".claudemesh"),
get CONFIG_DIR() {
return resolveConfigDir();
},
get CONFIG_FILE() {
return join(this.CONFIG_DIR, "config.json");
},
@@ -20,3 +92,12 @@ export const PATHS = {
CLAUDE_JSON: join(home, ".claude.json"),
CLAUDE_SETTINGS: join(home, ".claude", "settings.json"),
} as const;
/**
* Test-only: reset the memoized resolution. Not exported from the
* package barrel; reach in via the relative path from a test file.
*/
export function _resetPathsForTest(): void {
_resolvedConfigDir = null;
_warnedStaleEnv = false;
}

@@ -139,6 +139,25 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
* but ignores the rejection — by then the close handler has already
* scheduled its own reconnect).
*/
// Liveness watchdog: same cadence (30s) as the broker's outbound
// ping. Two jobs per tick:
// 1. If we haven't heard from the broker in >75s (2.5x the ping
// cadence — covers one missed ping plus some slack), terminate
// the socket. Fires the close handler → backoff reconnect runs
// its normal path. This is what catches NAT-dropped half-dead
// connections that the kernel won't RST for ~2 hours.
// 2. Otherwise, send our own ping. The broker's `ws` library
// auto-replies with a pong, which bumps lastActivity. This
// keeps the broker's stale-pong watchdog seeing us as alive.
//
// Bare `ping` and `pong` events both bump lastActivity, as does
// any inbound application message — any sign of life resets the
// dead-man's-switch.
const PING_INTERVAL_MS = 30_000;
const STALE_THRESHOLD_MS = 75_000;
let lastActivity = Date.now();
let watchdogTimer: NodeJS.Timeout | null = null;
const openOnce = (): Promise<void> => {
if (closed) return Promise.reject(new Error("client_closed"));
setStatus("connecting");
@@ -146,6 +165,7 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
log("info", "ws_open_attempt", { url: opts.url });
const sock = new WebSocket(opts.url);
ws = sock;
lastActivity = Date.now();
return new Promise<void>((resolve, reject) => {
sock.on("open", () => {
@@ -170,6 +190,7 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
});
sock.on("message", (raw) => {
lastActivity = Date.now();
let msg: Record<string, unknown>;
try { msg = JSON.parse(raw.toString()) as Record<string, unknown>; }
catch { return; }
@@ -179,6 +200,18 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
setStatus("open");
reconnectAttempt = 0;
log("info", "ws_hello_acked", { url: opts.url });
// Start liveness watchdog only after a successful handshake.
if (watchdogTimer) clearInterval(watchdogTimer);
watchdogTimer = setInterval(() => {
if (sock.readyState !== sock.OPEN) return;
const idle = Date.now() - lastActivity;
if (idle > STALE_THRESHOLD_MS) {
log("warn", "ws_stale_terminate", { url: opts.url, idle_ms: idle });
try { sock.terminate(); } catch { /* socket already gone */ }
return;
}
try { sock.ping(); } catch { /* ignore */ }
}, PING_INTERVAL_MS);
resolve();
return;
}
@@ -186,8 +219,12 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
opts.onMessage(msg);
});
sock.on("ping", () => { lastActivity = Date.now(); });
sock.on("pong", () => { lastActivity = Date.now(); });
sock.on("close", (code, reason) => {
if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
if (watchdogTimer) { clearInterval(watchdogTimer); watchdogTimer = null; }
const reasonStr = reason.toString("utf8");
log("warn", "ws_closed", { url: opts.url, code, reason: reasonStr, status });
opts.onBeforeReconnect?.(code, reasonStr);
@@ -227,6 +264,7 @@ export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecy
closed = true;
if (reconnectTimer) { clearTimeout(reconnectTimer); reconnectTimer = null; }
if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
if (watchdogTimer) { clearInterval(watchdogTimer); watchdogTimer = null; }
try { ws?.close(); } catch { /* ignore */ }
setStatus("closed");
},
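
The per-tick decision in the watchdog hunk above can be sketched as a pure function. The two constants are copied from the diff; `watchdogAction` and `firstTerminateTick` are hypothetical names, and the simulation assumes the peer goes completely silent (no pings, pongs, or messages) after `lastActivityMs`.

```typescript
const PING_INTERVAL_MS = 30_000; // tick cadence, from the diff
const STALE_THRESHOLD_MS = 75_000; // idle limit, from the diff

type WatchdogAction = "terminate" | "ping";

// One tick: terminate if idle for longer than the threshold, else ping.
function watchdogAction(nowMs: number, lastActivityMs: number): WatchdogAction {
  const idle = nowMs - lastActivityMs;
  return idle > STALE_THRESHOLD_MS ? "terminate" : "ping";
}

// Hypothetical simulation: the timer starts at startMs and ticks every
// PING_INTERVAL_MS; returns the time of the first tick that terminates.
function firstTerminateTick(startMs: number, lastActivityMs: number): number {
  let tick = startMs + PING_INTERVAL_MS;
  while (watchdogAction(tick, lastActivityMs) === "ping") {
    tick += PING_INTERVAL_MS;
  }
  return tick;
}
```

Because the check only runs on a 30s tick, a silent socket is terminated somewhere between 75s and 105s after its last sign of life in this model, depending on where the silence starts relative to the tick.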

@@ -0,0 +1,57 @@
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
import { mkdirSync, rmSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir, homedir } from "node:os";
/** Each test imports a fresh copy of paths.ts via dynamic import +
* `_resetPathsForTest()` so memoization doesn't leak across cases. */
const TEST_DIR = join(tmpdir(), "claudemesh-paths-test-" + Date.now());
describe("paths CONFIG_DIR resolution", () => {
beforeEach(() => {
delete process.env.CLAUDEMESH_CONFIG_DIR;
if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
});
afterEach(() => {
delete process.env.CLAUDEMESH_CONFIG_DIR;
if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
});
it("falls back to ~/.claudemesh when env var is unset", async () => {
const mod = await import("~/constants/paths.js");
mod._resetPathsForTest();
expect(mod.PATHS.CONFIG_DIR).toBe(join(homedir(), ".claudemesh"));
});
it("honors CLAUDEMESH_CONFIG_DIR when the dir exists, even without config.json", async () => {
mkdirSync(TEST_DIR, { recursive: true });
process.env.CLAUDEMESH_CONFIG_DIR = TEST_DIR;
const mod = await import("~/constants/paths.js");
mod._resetPathsForTest();
expect(mod.PATHS.CONFIG_DIR).toBe(TEST_DIR);
});
it("falls back to default when env points at a missing dir (stale-tmpdir case)", async () => {
process.env.CLAUDEMESH_CONFIG_DIR = "/var/folders/_nonexistent_claudemesh_dir_xyz123";
const mod = await import("~/constants/paths.js");
mod._resetPathsForTest();
// Suppress the stderr warning to keep test output clean
const stderr = vi.spyOn(process.stderr, "write").mockImplementation(() => true);
try {
expect(mod.PATHS.CONFIG_DIR).toBe(join(homedir(), ".claudemesh"));
} finally {
stderr.mockRestore();
}
});
it("memoizes — second access returns the same path even if env changes mid-process", async () => {
mkdirSync(TEST_DIR, { recursive: true });
process.env.CLAUDEMESH_CONFIG_DIR = TEST_DIR;
const mod = await import("~/constants/paths.js");
mod._resetPathsForTest();
const first = mod.PATHS.CONFIG_DIR;
process.env.CLAUDEMESH_CONFIG_DIR = "/something/else";
expect(mod.PATHS.CONFIG_DIR).toBe(first);
});
});

@@ -1,55 +1,161 @@
import Link from "next/link";
import {
CHANGELOG_ENTRIES,
CHANGELOG_TYPE_COLOR,
CHANGELOG_TYPE_LABELS,
} from "~/modules/marketing/home/changelog-data";
export const metadata = {
title: "Changelog — claudemesh",
description: "Release history for claudemesh-cli.",
description:
"Release history for claudemesh-cli — every shipped version, with the why behind it.",
};
const ENTRIES = [
{ version: "0.1.4", date: "2026-04-06", type: "feat", summary: "Stateful welcome screen, PROTOCOL.md, THREAT_MODEL.md, Windows CI matrix" },
{ version: "0.1.3", date: "2026-04-05", type: "feat", summary: "claudemesh --version, status, doctor commands" },
{ version: "0.1.2", date: "2026-04-05", type: "feat", summary: "claudemesh launch command, transparency banner, decrypt fix, Windows support" },
];
const TYPE_LABELS: Record<string, string> = { feat: "Feature", fix: "Fix", docs: "Docs" };
const TYPE_COLORS: Record<string, string> = { feat: "bg-[var(--cm-clay)]", fix: "bg-[var(--cm-cactus)]", docs: "bg-[var(--cm-oat)]" };
export default function ChangelogPage() {
return (
<section className="mx-auto max-w-3xl px-6 py-24 md:py-32">
<h1
className="text-[clamp(2rem,4.5vw,3rem)] font-medium leading-[1.1] text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
Changelog
</h1>
<p
className="mt-4 text-[15px] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
Every shipped version of claudemesh-cli.
</p>
<div className="mt-12 space-y-8">
{ENTRIES.map((entry) => (
<article key={entry.version} className="border-b border-[var(--cm-border)] pb-6">
<div className="flex items-center gap-3">
<span
className={`rounded-[4px] px-2 py-0.5 text-[10px] font-medium uppercase tracking-wider text-[var(--cm-bg)] ${TYPE_COLORS[entry.type] || "bg-[var(--cm-fg-tertiary)]"}`}
style={{ fontFamily: "var(--cm-font-mono)" }}
>
{TYPE_LABELS[entry.type] || entry.type}
</span>
<span className="text-[18px] font-medium text-[var(--cm-fg)]" style={{ fontFamily: "var(--cm-font-serif)" }}>
v{entry.version}
</span>
<time dateTime={entry.date} className="text-[11px] text-[var(--cm-fg-tertiary)]" style={{ fontFamily: "var(--cm-font-mono)" }}>
{new Date(entry.date).toLocaleDateString("en-US", { year: "numeric", month: "short", day: "numeric" })}
</time>
</div>
<p className="mt-2 text-[14px] leading-[1.6] text-[var(--cm-fg-secondary)]" style={{ fontFamily: "var(--cm-font-sans)" }}>
{entry.summary}
</p>
</article>
))}
<div className="mb-12">
<p
className="text-[11px] uppercase tracking-[0.2em] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
claudemesh-cli · release log
</p>
<h1
className="mt-3 text-[clamp(2rem,4.5vw,3rem)] font-medium leading-[1.1] text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
Changelog
</h1>
<p
className="mt-4 max-w-xl text-[15px] leading-[1.65] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
Hand-picked, load-bearing ships from{" "}
<span className="text-[var(--cm-fg)]">v0.1.0</span> through{" "}
<span className="text-[var(--cm-clay)]">v1.34.15</span>. For the
byte-level diff, the canonical{" "}
<Link
href="https://github.com/alezmad/claudemesh/blob/main/apps/cli/CHANGELOG.md"
className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
>
CHANGELOG.md
</Link>{" "}
lives in the repo.
</p>
</div>
{/* Vertical timeline rail */}
<div className="relative">
<div
className="absolute left-[7px] top-2 hidden h-full w-px md:block"
style={{
background:
"linear-gradient(to bottom, var(--cm-clay) 0%, var(--cm-fig) 30%, var(--cm-cactus) 60%, transparent 100%)",
}}
/>
<div className="space-y-10">
{CHANGELOG_ENTRIES.map((entry, idx) => (
<article
key={entry.version + entry.date}
className="relative md:pl-10"
>
{/* Dot on rail */}
<div
className="absolute left-0 top-[10px] hidden h-[15px] w-[15px] rounded-full border-2 md:block"
style={{
borderColor: CHANGELOG_TYPE_COLOR[entry.type],
backgroundColor: "var(--cm-bg)",
}}
>
<div
className="absolute inset-[3px] rounded-full"
style={{
backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
opacity: idx === 0 ? 1 : 0.5,
}}
/>
</div>
<header className="mb-3 flex flex-wrap items-baseline gap-x-3 gap-y-1">
<span
className="rounded-[3px] px-1.5 py-0.5 text-[10px] font-medium uppercase tracking-wider"
style={{
fontFamily: "var(--cm-font-mono)",
backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
color: "var(--cm-gray-900)",
}}
>
{CHANGELOG_TYPE_LABELS[entry.type]}
</span>
<span
className="text-[18px] font-medium text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
v{entry.version}
</span>
<time
dateTime={entry.date}
className="text-[11px] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
{new Date(entry.date).toLocaleDateString("en-US", {
year: "numeric",
month: "short",
day: "numeric",
})}
</time>
</header>
<h2
className="text-[15px] font-medium text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
{entry.title}
</h2>
<p
className="mt-2 text-[14px] leading-[1.7] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
{entry.summary}
</p>
</article>
))}
</div>
</div>
<footer className="mt-20 border-t border-[var(--cm-border)] pt-8">
<p
className="text-[13px] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
Tracked at{" "}
<Link
href="https://github.com/alezmad/claudemesh/blob/main/docs/roadmap.md"
className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
>
docs/roadmap.md
</Link>
. Specs at{" "}
<Link
href="https://github.com/alezmad/claudemesh/tree/main/.artifacts/specs"
className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
>
.artifacts/specs/
</Link>
. Tagged binaries on{" "}
<Link
href="https://github.com/alezmad/claudemesh/releases"
className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
>
GitHub Releases
</Link>
.
</p>
</footer>
</section>
);
}

@@ -3,6 +3,7 @@ import { Features } from "~/modules/marketing/home/features";
import { WhereMeshFits } from "~/modules/marketing/home/where-mesh-fits";
import { WhatIsClaudemesh } from "~/modules/marketing/home/what-is-claudemesh";
import { Timeline } from "~/modules/marketing/home/timeline";
import { LatestReleases } from "~/modules/marketing/home/latest-releases";
import { Pricing } from "~/modules/marketing/home/pricing";
import { FAQ } from "~/modules/marketing/home/faq";
import { CallToAction } from "~/modules/marketing/home/cta";
@@ -22,6 +23,7 @@ const HomePage = () => {
<WhereMeshFits />
<WhatIsClaudemesh />
<Timeline />
<LatestReleases count={5} />
<Pricing />
<FAQ />
<CallToAction />

@@ -0,0 +1,168 @@
/**
* Single source of truth for the curated release log surfaced on:
* - /changelog (full timeline)
* - / (Latest Releases compact strip)
*
* Lives outside `app/.../page.tsx` because Next.js's app-router type generator
* rejects non-conforming exports from route files (only `default`, `metadata`,
* `dynamic`, etc. are allowed). Importing data from a plain module sidesteps
* the constraint without changing route semantics.
*
* Hand-picked load-bearing ships, newest first. For the byte-level history
* see `apps/cli/CHANGELOG.md` in the repo.
*/
export type ChangelogEntry = {
version: string;
date: string;
type: "feat" | "fix" | "docs" | "perf" | "infra";
title: string;
summary: string;
};
export const CHANGELOG_ENTRIES: ChangelogEntry[] = [
{
version: "1.34.15",
date: "2026-05-04",
type: "fix",
title: "peer list --mesh scopes; kick refuses control-plane",
summary:
"Two follow-ups from the multi-session correctness train. peer list --mesh now forwards the slug to the daemon (was aggregating across all attached meshes). The broker refuses no-op kicks against control-plane connections (daemon, dashboard) — they auto-reconnected within seconds — and surfaces them in a new additive ack field. Soft `disconnect` keeps old behavior.",
},
{
version: "1.34.14",
date: "2026-05-04",
type: "fix",
title: "stale CLAUDEMESH_CONFIG_DIR falls back",
summary:
"When the launched-session env leaked into a later CLI invocation and pointed at a tmpdir that no longer existed, the resolver silently used the dead path and showed “No meshes joined”. Now memoized: env unset → default; env points at a real dir → trust; env set but dir gone → TTY-only stderr warning + fallback to ~/.claudemesh.",
},
{
version: "1.34.7 → 1.34.13",
date: "2026-05-04",
type: "fix",
title: "multi-session correctness train",
summary:
"Seven releases over a few hours that took claudemesh from “works for one session” to “internally consistent for N sessions on one daemon.” Per-session SSE demux at the bind layer, inbox per-recipient column, daemon detached by default, MCP forwards session token on /v1/events. Architecture invariant: every shared store / channel scopes by recipient.",
},
{
version: "1.32.0",
date: "2026-05-04",
type: "feat",
title: "multi-session UX bundle",
summary:
"Self-identity via session pubkey, `--self` fan-out for member-pubkey targeting, broker welcome on launch (broker state + peer count + unread inbox). Resolves hex prefixes to full pubkeys before send.",
},
{
version: "1.30.0",
date: "2026-05-04",
type: "feat",
title: "per-session broker presence",
summary:
"Two `claudemesh launch` sessions in the same cwd finally see each other in `peer list`. Each session has a long-lived broker presence row owned by the daemon, identified by a per-launch ephemeral keypair vouched by the member's stable key. Broker `session_hello` handler with parent-attestation TTL and session-signature checks.",
},
{
version: "1.26.0 → 1.29.0",
date: "2026-05-04",
type: "feat",
title: "multi-mesh daemon · per-session IPC tokens",
summary:
"One daemon process attaches to every joined mesh simultaneously. Aggregate read routes (/v1/peers, /v1/skills) tag each record with its mesh; explicit ?mesh=<slug> narrows server-side. Per-session IPC tokens scoped to tmpdir mode-0600 so CLI invocations from inside a launched session auto-attribute to its workspace. Self-healing daemon lifecycle (auto-spawn under file-lock, version probe).",
},
{
version: "1.24.0",
date: "2026-05-03",
type: "feat",
title: "daemon required + thin MCP",
summary:
"MCP server shrinks from 979 LoC to ~200 LoC of push-pipe. The daemon owns the broker WS and feeds the MCP push channel over IPC SSE. `claudemesh install` auto-installs and starts the daemon service. `claudemesh launch` ensures daemon is running before spawning Claude.",
},
{
version: "0.9.0 (1.22.0)",
date: "2026-05-03",
type: "feat",
title: "daemon foundation",
summary:
"Long-lived process holding one broker WS per attached mesh, durable outbox/inbox in SQLite, IPC over UDS (+ optional loopback TCP w/ bearer), SSE event stream. Caller-stable idempotency on every send. Service install (launchd / systemd-user). Outbox CLI with atomic abort+insert on requeue. Host-fingerprint pin on first run.",
},
{
version: "0.7.0 (1.21.0)",
date: "2026-05-03",
type: "infra",
title: "slug = identifier",
summary:
"Pre-launch correction of generic SaaS scaffolding. mesh.name and mesh.slug collapse — slug IS the identifier. `claudemesh rename <old-slug> <new-slug>` is the entire rename surface. CLI picker drops the (parens). Server PATCH /api/cli/meshes/:slug body becomes `{ slug }`.",
},
{
version: "0.4.0 → 0.5.2 (1.10.0 → 1.18.0)",
date: "2026-05-03",
type: "feat",
title: "me/* cross-mesh aggregation",
summary:
"First cross-mesh read-aggregating verbs. /v1/me/workspace, /v1/me/topics, /v1/me/notifications, /v1/me/activity, /v1/me/search — every aggregating read verb has CLI + web parity. Default-aggregation for `topic list`, `notification list`, `task list`, `state list`, `memory recall` when no --mesh is passed. file share / get with same-host fast path.",
},
{
version: "0.3.0 (1.8.0)",
date: "2026-05-02",
type: "feat",
title: "per-topic encryption (CLI + web)",
summary:
"Topics generate a 32-byte symmetric key on creation; broker seals via crypto_box for the creator. Pending-seals endpoint, seal POST, claudemesh topic post for encrypted REST sends, decrypt-on-render in topic tail, 30s background re-seal loop. Web side: browser-side persistent ed25519 identity in IndexedDB + encrypt-on-send / decrypt-on-render.",
},
{
version: "1.7.0",
date: "2026-05-02",
type: "feat",
title: "demo cut: topic tail, member list, notifications",
summary:
"Member sidebar in chat panel with names, online dots, presence summaries. Topic search + member-mention autocomplete. Notification feed at /dashboard listing every @<your-name> reference across all meshes (last 7 days). CLI parity: `claudemesh topic tail` (live SSE consumer), `claudemesh member list`, `claudemesh notification list`.",
},
{
version: "0.2.0 (1.6.0)",
date: "2026-05-02",
type: "feat",
title: "topics + REST gateway + bridge peers",
summary:
"Topics (channel pub/sub) with mesh = trust boundary, group = identity tag, topic = conversation scope — three orthogonal axes. API keys for non-WebSocket clients. REST /api/v1/* with bearer-token auth (messages, topics, peers, history). Bridge peers belonging to two meshes forwarding a topic between them. Humans-as-peers — peer_type: human plumbed end-to-end.",
},
{
version: "1.5.0",
date: "2026-05-02",
type: "feat",
title: "CLI-first architecture lock-in",
summary:
"Tool-less MCP — tools/list returns []. Inbound peer messages still arrive as experimental.claude/channel notifications mid-turn. Bundle size −42%. Resource-noun-verb CLI (peer list, message send, memory recall). Bundled claudemesh skill installed to ~/.claude/skills/. Unix-socket bridge for warm WS reuse (~220 ms warm vs ~600 ms cold). Policy engine + audit log.",
},
{
version: "1.0.0-alpha",
date: "2026-04-15",
type: "feat",
title: "single-binary distribution + per-peer caps",
summary:
"curl -fsSL claudemesh.com/install | sh downloads the right binary (darwin/linux/windows × x64/arm64). claudemesh:// URL scheme makes invite emails one-click. Per-peer capability grants: claudemesh grant/revoke/block/grants enforced server-side. Encrypted backup / restore with Argon2id + XChaCha20-Poly1305. Safety numbers (`claudemesh verify <peer>`).",
},
{
version: "0.1.0",
date: "2026-04-04",
type: "feat",
title: "public launch",
summary:
"Direct peer-to-peer messaging through a hosted broker, ready for real teams. End-to-end encryption — crypto_box direct, crypto_secretbox group. Signed ed25519 identities + signed invite links (ic://join/...). Hello-sig handshake auth. Hosted broker at wss://ic.claudemesh.com/ws. Claude Code MCP tools: list_peers, send_message, check_messages, set_summary, set_status.",
},
];
export const CHANGELOG_TYPE_LABELS: Record<ChangelogEntry["type"], string> = {
feat: "Feature",
fix: "Fix",
docs: "Docs",
perf: "Perf",
infra: "Infra",
};
export const CHANGELOG_TYPE_COLOR: Record<ChangelogEntry["type"], string> = {
feat: "var(--cm-clay)",
fix: "var(--cm-cactus)",
docs: "var(--cm-oat)",
perf: "var(--cm-fig)",
infra: "var(--cm-fg-tertiary)",
};

@@ -32,9 +32,9 @@ export const CallToAction = () => {
className="mx-auto mt-8 max-w-2xl text-lg leading-[1.65] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
Anthropic built Claude Code per developer. The next unlock is
between developers. Hosted on claudemesh.com or self-hosted in
your VPC, same CLI, same features, same encryption.
Anthropic Agent Teams stops at the edge of one laptop. claudemesh
starts there, across machines, users, and organizations. Hosted
on claudemesh.com or self-hosted in your VPC, same CLI either way.
</p>
</Reveal>
<Reveal delay={3}>

@@ -5,7 +5,7 @@ import { Reveal } from "./_reveal";
const ITEMS = [
{
q: "Is claudemesh free?",
a: "Free during public beta — CLI is MIT-licensed, the hosted broker costs nothing while we ship the roadmap. Paid tiers launch when the dashboard ships. Beta users keep the free plan for life.",
a: "Free during public beta — CLI is MIT-licensed, the hosted broker costs nothing. Paid tiers launch when we exit beta and add team-scale features (SSO, audit retention, dedicated brokers). Beta users keep the free plan for life.",
},
{
q: "How do I get started?",
@@ -33,7 +33,11 @@ const ITEMS = [
},
{
q: "How is this different from MCP?",
a: "MCP connects one Claude to tools and services. claudemesh connects many Claudes to each other. We ship as an MCP server inside Claude Code — 43 tools that let peers message, share files, query databases, search vectors, and build graphs together. From the agent's view, other peers look like callable tools. It composes on top of MCP; it doesn't replace it.",
a: "MCP connects one Claude to tools and services. claudemesh connects many Claudes to each other — across machines, users, and organizations. As of v1.5.0 the MCP shim is intentionally thin: tools/list returns []. Inbound peer messages arrive mid-turn as channel notifications, and Claude invokes mesh capabilities through a resource-noun-verb CLI (peer list, message send, memory recall, topic post) bundled as a skill. claudemesh composes on top of MCP; it doesn't replace it.",
},
{
q: "How is this different from Anthropic's Agent Teams?",
a: "Anthropic's experimental Agent Teams (shipped Feb 2026, Claude Code v2.1.32+) coordinates multiple Claude Code sessions inside ONE Unix user's ~/.claude/ directory on ONE machine. Mailbox lives in process. Task list is a markdown file. Lead is fixed for the team's lifetime. Cleanup wipes the state. claudemesh runs across machines, users, and organizations. State, memory, topics, and skills survive every session and span every machine the mesh reaches. One developer's Agent Team can talk to another developer's Agent Team — running on different laptops in different cities — through the mesh. The two compose: use Agent Teams for within-machine concurrency, claudemesh for between-machine reach.",
},
{
q: "What persistence backends does the mesh include?",
@@ -53,7 +57,7 @@ const ITEMS = [
},
{
q: "Can a peer be in multiple meshes?",
a: "Yes. Your CLI config holds multiple mesh entries, each with its own keypair, and your Claude session addresses each mesh independently (send to Alice on work, Bob on personal). Cross-mesh bridge peers that auto-forward tagged messages are v0.2; cross-broker federation (your self-host ↔ claudemesh.com) is v0.3.",
a: "Yes. Your CLI config holds multiple mesh entries, each with its own keypair. As of v1.26.0, the daemon attaches to every joined mesh simultaneously — `claudemesh peer list` aggregates across all of them, `--mesh <slug>` narrows to one. Cross-mesh bridge peers that auto-forward tagged topics shipped in v0.2.0 (v1.6.0). Cross-broker federation (your self-host ↔ claudemesh.com) is the next major direction.",
},
];

@@ -67,9 +67,10 @@ export const HeroWithMesh = () => {
textShadow: "0 2px 20px rgba(0,0,0,0.8)",
}}
>
Share context, files, skills, and MCPs across every Claude Code
session, end-to-end encrypted. Hosted on claudemesh.com or
self-hosted in your VPC. Same CLI, same wire, your choice.
The encrypted backbone where Claude Code sessions, autonomous
agents, and humans coordinate across machines, across users,
across organizations. Hosted on claudemesh.com or self-hosted in
your VPC. Same CLI, same wire, your choice.
</p>
</Reveal>

@@ -0,0 +1,141 @@
import Link from "next/link";
import {
CHANGELOG_ENTRIES,
CHANGELOG_TYPE_COLOR,
CHANGELOG_TYPE_LABELS,
} from "./changelog-data";
import { Reveal, SectionIcon } from "./_reveal";
/**
* Compact recent-releases strip for the home page. Pulls the top N entries
* from the same data source as the full /changelog page so they never
* disagree.
*/
export const LatestReleases = ({ count = 5 }: { count?: number }) => {
const recent = CHANGELOG_ENTRIES.slice(0, count);
return (
<section className="border-b border-[var(--cm-border)] bg-[var(--cm-bg-elevated)] px-6 py-24 md:px-12 md:py-28">
<div className="mx-auto max-w-[var(--cm-max-w)]">
<Reveal className="mb-6 flex justify-center">
<SectionIcon glyph="grid" />
</Reveal>
<Reveal delay={1}>
<p
className="text-center text-[11px] uppercase tracking-[0.2em] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
release log · last {count} ships
</p>
</Reveal>
<Reveal delay={2}>
<h2
className="mt-3 text-center text-[clamp(1.75rem,3.5vw,2.5rem)] font-medium leading-[1.15] text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
What shipped this week
</h2>
</Reveal>
<Reveal delay={3}>
<p
className="mx-auto mt-3 max-w-xl text-center text-[14px] leading-[1.65] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
Every release is in production on{" "}
<span
className="text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
wss://ic.claudemesh.com
</span>{" "}
within minutes. The CLI publishes to npm; the broker auto-deploys.
</p>
</Reveal>
<Reveal delay={4}>
<ol className="mx-auto mt-12 max-w-3xl space-y-4">
{recent.map((entry, idx) => (
<li key={entry.version + entry.date}>
<Link
href="/changelog"
className="group block rounded-[var(--cm-radius-md)] border border-[var(--cm-border)] bg-[var(--cm-bg)] p-5 transition-colors hover:border-[var(--cm-clay)]/40"
>
<div className="flex flex-wrap items-baseline gap-x-3 gap-y-1">
<span
className="rounded-[3px] px-1.5 py-0.5 text-[10px] font-medium uppercase tracking-wider"
style={{
fontFamily: "var(--cm-font-mono)",
backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
color: "var(--cm-gray-900)",
}}
>
{CHANGELOG_TYPE_LABELS[entry.type]}
</span>
<span
className="text-[16px] font-medium text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
v{entry.version}
</span>
<time
dateTime={entry.date}
className="text-[11px] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
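{new Date(entry.date).toLocaleDateString("en-US", {
year: "numeric",
month: "short",
day: "numeric",
// Date-only ISO strings parse as UTC midnight; format in UTC so the day cannot shift.
timeZone: "UTC",
})}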
</time>
{idx === 0 && (
<span
className="rounded-full bg-[var(--cm-clay)]/15 px-2 py-0.5 text-[10px] font-medium uppercase tracking-wider text-[var(--cm-clay)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
latest
</span>
)}
</div>
<h3
className="mt-2.5 text-[15px] font-medium text-[var(--cm-fg)] transition-colors group-hover:text-[var(--cm-clay)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
{entry.title}
</h3>
<p
className="mt-2 line-clamp-2 text-[13px] leading-[1.6] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
{entry.summary}
</p>
</Link>
</li>
))}
</ol>
</Reveal>
<Reveal delay={5}>
<div className="mt-10 flex justify-center">
<Link
href="/changelog"
className="group inline-flex items-center gap-2 text-[13px] font-medium text-[var(--cm-fg-secondary)] transition-colors hover:text-[var(--cm-clay)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
<span className="border-b border-dashed border-[var(--cm-fg-tertiary)] pb-0.5 transition-colors group-hover:border-[var(--cm-clay)]">
Read the full changelog
</span>
<span className="transition-transform duration-300 group-hover:translate-x-1">
→
</span>
</Link>
</div>
</Reveal>
</div>
</section>
);
};

@@ -111,8 +111,9 @@ export const Pricing = () => {
className="mb-4 text-[12px] leading-[1.5] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
Paid tiers launch when the dashboard ships. Beta users keep
the free plan for life.
Paid tiers launch when we exit beta and add team-scale
features (SSO, audit retention, dedicated brokers). Beta
users keep the free plan for life.
</p>
<Link
href="/auth/register"

@@ -85,6 +85,23 @@ const MILESTONES = [
],
stat: "43 MCP tools total",
},
{
version: "v0.9 → 1.34",
phase: "Daemon · multi-mesh · multi-session",
color: "var(--cm-cactus)",
items: [
"Persistent daemon — long-lived broker WS, durable outbox/inbox",
"Universal multi-mesh daemon — one process, every joined mesh",
"Per-session IPC tokens — auto-scope to the launched session",
"Per-session broker presence — sibling sessions see each other",
"Self-healing daemon lifecycle (auto-spawn, version probe)",
"Multi-session correctness train — per-recipient SSE demux + inbox scoping",
"Refuse-to-kick on control-plane (no more no-op kicks)",
"Caller-stable idempotency on every send",
"Stale CLAUDEMESH_CONFIG_DIR fallback",
],
stat: "1.34.15 shipped",
},
];
export const Timeline = () => {
@@ -94,7 +111,7 @@ export const Timeline = () => {
<section className="border-b border-[var(--cm-border)] bg-[var(--cm-bg)] px-6 py-24 md:px-12 md:py-32">
<div className="mx-auto max-w-[var(--cm-max-w)]">
<Reveal className="mb-6 flex justify-center">
<SectionIcon glyph="layers" />
<SectionIcon glyph="grid" />
</Reveal>
<Reveal delay={1}>
<h2
@@ -109,7 +126,8 @@ export const Timeline = () => {
className="mx-auto mt-4 max-w-xl text-center text-[15px] leading-[1.6] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
66 npm releases. Every feature below is in production today.
120+ npm releases through v1.34.15. Every feature below is in
production today.
</p>
</Reveal>
@@ -210,8 +228,8 @@ export const Timeline = () => {
className="text-[14px] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
Daemon redesign · per-topic encryption · self-host
packaging · federation
HKDF cross-machine identity · session capabilities · A2A
interop · self-host packaging · federation
</span>
</div>
</div>

@@ -4,28 +4,28 @@ import Link from "next/link";
const NEWS = [
{
tag: "New",
title: "claudemesh launch (v0.1.4)",
body: "Real-time peer messages pushed into Claude Code mid-turn. One command. Source open at github.com/alezmad/claudemesh-cli.",
href: "https://github.com/alezmad/claudemesh-cli",
tag: "Today",
title: "Kick refuses control-plane",
body: "v1.34.15 — broker now skips control-plane peers on kick and acks the skip. Use ban for hard removal, or take the daemon down for transient cases.",
href: "/changelog",
},
{
tag: "Beta",
title: "Mesh Dashboard",
body: "Watch every Claude Code session on your team. Routes, presence, priority — all live.",
href: "#",
tag: "This week",
title: "Multi-session correctness",
body: "1.34.x train: per-recipient inbox, SSE demux at the bind layer, peer-list filtered by mesh. Multiple sessions on one machine no longer cross-talk.",
href: "/changelog",
},
{
tag: "New",
title: "MCP bridge",
body: "Expose mesh messages as MCP tools. Your agent can message peers without leaving its context.",
href: "#",
tag: "Shipped",
title: "Per-session presence",
body: "v1.30.0 — every Claude Code session gets its own ed25519 keypair and parent attestation. The broker tracks sessions, not machines.",
href: "/changelog",
},
{
tag: "Launch",
title: "Self-hosted broker",
body: "One binary. SQLite-backed. Runs on a Pi. Your mesh, never the cloud's.",
href: "#",
tag: "Shipped",
title: "Multi-mesh daemon",
body: "v1.26.0 — one daemon, every mesh you've joined. Switch context with a flag. Self-host the broker in your VPC; same CLI, your URL.",
href: "/changelog",
},
];

@@ -25,6 +25,14 @@ const CARDS: Card[] = [
weDo: "claudemesh connects full, independent Claude Code sessions across machines, across developers, across continents. Each peer keeps its own repo, its own perspective, its own scrollback.",
tone: "compare",
},
{
label: "vs. Agent Teams",
title: "Multi-agent within one machine",
theyDo:
"Anthropic's experimental Agent Teams (Feb 2026, Claude Code v2.1.32+) coordinates multiple Claude Code sessions inside ONE Unix user's ~/.claude/ directory on ONE machine. Mailbox in process. Task list in a markdown file. Lead is fixed. Cleanup wipes the state.",
weDo: "claudemesh runs across machines, users, and organizations. State, memory, topics, and skills survive every session. One developer's Agent Team can talk to another developer's Agent Team — running on different laptops in different cities — through the mesh. Use Agent Teams for within-machine concurrency, claudemesh for between-machine reach.",
tone: "compare",
},
{
label: "vs. OpenClaw",
title: "Autonomous agents that run while you sleep",
@@ -35,10 +43,10 @@ const CARDS: Card[] = [
},
{
label: "What claudemesh is",
title: "The wire between Claude Code sessions",
title: "The wire across machines, users, and orgs",
theyDo:
"Every Claude Code session today is an island. Context dies with the terminal. Skills and MCPs are per-developer. Teammates relay insights through Slack.",
weDo: "claudemesh is one thing: a peer network for Claude Code. Share context, files, skills, MCPs, and slash commands across sessions — end-to-end encrypted. Host the broker on claudemesh.com or run it in your VPC. Same CLI either way.",
"Every Claude Code session is an island unless you wrap it. Anthropic's Agent Teams now ties them together within one Unix user, one machine. Beyond that — across laptops, across team members, across companies — the gap is still wide.",
weDo: "claudemesh is one thing: an end-to-end encrypted backbone where Claude Code sessions, autonomous agents, and humans coordinate across every boundary your existing tools stop at. Persistent state, topics, memory, and skills span every machine the mesh reaches. Host the broker on claudemesh.com or run it in your VPC. Same CLI either way.",
tone: "claim",
},
];

@@ -382,24 +382,102 @@ one chokepoint per layer.
| inbox.db | `recipient_pubkey` / `recipient_kind` columns | 1.34.11 |
| outbox.db | `sender_session_pubkey` for routing | 1.34.0 |
### Known gaps tracked for follow-ups
### Known gaps — status after the 2026-05-04 follow-up sprint
- `claudemesh launch` exports `CLAUDEMESH_CONFIG_DIR` /
`CLAUDEMESH_IPC_TOKEN_FILE` into the parent shell; vars persist
after the launched session exits and silently break subsequent
CLI calls until unset. Fish lacks `unset`; users hit
`set -e CLAUDEMESH_CONFIG_DIR`.
- Broker `listPeers` ignores `--mesh` filter (server-side returns
global peer set across all meshes regardless of the query
param). Read-view noise only; doesn't affect correctness.
- `kick` on a daemon's control-plane WS is effectively a no-op
(it auto-reconnects within seconds). Wants either a mesh-admin
cap check or a `presence pause [--mesh X]` verb.
- Session capabilities don't exist as a first-class concept — a
launched session inherits ALL of its parent member's grants.
Parent attestation is just an existence proof; it doesn't carry
a capability subset. Worth filling in before any cross-org
use case lands.
Three of the four 1.34.x triage gaps shipped in 1.34.14 + 1.34.15
(2026-05-04). Gap #4 is spec'd and queued.
- **Stale `CLAUDEMESH_CONFIG_DIR` falls back** *(1.34.14)*. The
env var no longer silently breaks subsequent CLI calls. When the
inherited path points at a tmpdir that no longer exists,
`paths.ts` warns once on stderr (TTY-only) with a shell-specific
unset hint and falls back to `~/.claudemesh`. The dir-existence
check (not `config.json`) keeps fresh-launch first-write working.
- **`peer list --mesh <slug>` actually scopes** *(1.34.15)*.
Diagnosis from the original triage was wrong — broker has been
scoping correctly since 1.26.0 via `conn.meshId`. Bug was CLI-
side: `tryListPeersViaDaemon()` was called with no argument in
`commands/peers.ts:140` and `commands/launch.ts:407`. Both now
forward the slug as `?mesh=<slug>`. `send.ts` cross-mesh hex-
prefix resolution intentionally untouched.
- **`kick` refuses no-op kicks on control-plane** *(1.34.15)*.
Broker now skips peers where `peerRole === "control-plane"` and
surfaces them in a new additive ack field
`skipped_control_plane`; CLI reads it and points the user at
`ban` (remove member) or `daemon down` (take a daemon offline
locally). Soft `disconnect` keeps old behavior — useful when
intentionally nudging a control-plane peer to re-authenticate.
`PeerConn` gains a `peerRole` slot populated at both
`connections.set` sites. The richer `presence pause [--mesh X]`
verb (option (b) from the triage) deferred as its own feature.
- 📋 **Session capabilities — spec only**. Launched sessions still
inherit all member grants transitively. Spec at
`.artifacts/specs/2026-05-04-session-capabilities.md` covers a v2
parent attestation alongside v1 with an `allowed_caps[]` subset,
broker enforcement as `intersection(member.peerGrants, session.
allowed_caps)`, and a bonus `state-write` cap to close the "any
session can clobber shared keys like `current-pr`" footgun.
Default when no caps subset is declared = full member set
(today's behavior; opt-in restriction). Ships behind a 1-week
dry-run window before flipping enforcement, mirroring the
original per-peer-capabilities rollout. ~1 sprint of focused
work; queued behind v0.3.0 topic-encryption.
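
The enforcement rule in the session-capabilities item can be sketched as a plain set intersection. This is a minimal illustration, not the spec'd implementation: `effectiveCaps`, `SessionAttestation`, and the `dm` / `files` capability names are hypothetical. Only the `intersection(member.peerGrants, session.allowed_caps)` rule, the `state-write` cap, and the "no subset declared = full member set" default come from the text above.

```typescript
// Hypothetical types; the real attestation format lives in the spec.
type Cap = string;

interface SessionAttestation {
  // A v2 attestation may declare a subset; a v1 attestation declares none.
  allowedCaps?: readonly Cap[];
}

function effectiveCaps(
  memberGrants: readonly Cap[],
  attestation: SessionAttestation,
): Set<Cap> {
  // No declared subset: keep today's behavior (session gets every member grant).
  if (attestation.allowedCaps === undefined) {
    return new Set(memberGrants);
  }
  const declared = new Set(attestation.allowedCaps);
  // Intersection: a session never holds a cap its parent member lacks,
  // and never holds a cap it did not declare.
  return new Set(memberGrants.filter((cap) => declared.has(cap)));
}

// Example: the member holds "dm" and "state-write"; the session asks for
// "dm" and "files". Only "dm" survives the intersection.
const sessionCaps = effectiveCaps(["dm", "state-write"], {
  allowedCaps: ["dm", "files"],
});
console.log(Array.from(sessionCaps));
```

Treating a missing subset as "everything" keeps existing sessions working unchanged, which matches the opt-in restriction described above.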
---
## v1.34.16 + broker — *continuous presence* — *shipped*
User report on 2026-05-05: `claudemesh peer list` returned zero
peers despite running sessions. Diagnosis: half-dead WS connections
that NAT/CGNAT silently dropped, with no application-layer staleness
detection on either side. Linux TCP keepalive default ≈ 2hrs idle
+ 11min probes — sessions stayed zombie for hours before the kernel
RST'd the socket and the daemon's existing close-handler reconnect
fired.
Two layers shipped together:
- **Liveness watchdogs** *(broker + CLI 1.34.16)*. Both sides now
detect stalled WS in 75s instead of waiting for the kernel.
- Broker: `PeerConn.lastPongAt` bumped on every `pong`. The 30s
ping loop also calls `ws.terminate()` on conns whose pong is
>75s stale, firing the close handler → existing peer_left
cleanup.
- Daemon: `ws-lifecycle.ts` adds an idle watchdog at 30s cadence,
started after hello-ack. Bumps `lastActivity` on incoming
message + ping + pong frames. Sends its own `sock.ping()` if
activity is recent, `sock.terminate()` if idle >75s. Watchdog
cleared on close + explicit close().
- Roughly 100x faster detection (2hrs → 75s).
- **Lease model** *(broker only, no protocol change)*. Peers no
longer see `peer_left`/`peer_joined` for transient reconnects.
- `PeerConn` gains `leaseState` ("online"|"offline"), `leaseUntil`,
`evictionTimer`. On WS close, the conn enters **offline-leased**
state for 90s instead of immediate cleanup.
- `handleHello` and `handleSessionHello` check for an offline-
leased entry matching the stable identity before running session-
id dedup. On match: clear `evictionTimer`, swap `ws`, restore
online state, drain queued DMs, return `silent: true`. The
hello dispatcher skips the peer_joined broadcast.
- `evictPresenceFully` extracted from the close handler — runs
the peer_left broadcast + cleanup (URL watches, streams, MCP
registry, clock auto-pause). Called by `evictionTimer` after 90s
grace, or directly when no lease was online (defensive).
- `broker.ts` exports `restorePresence(presenceId)` — clears
`disconnectedAt` + bumps `lastPingAt`, called on reattach to
undo the DB-level stale-presence sweeper if it fired during
grace.
- DMs sent during grace fall through to the existing message_queue
path (sendToPeer no-ops on dead WS, queue row stays with
deliveredAt=NULL, drained on reattach). Backward compatible
with old daemons.
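
The daemon-side idle watchdog from the first bullet can be sketched as a pure decision function plus a timer wrapper. This is an illustration under assumptions, not the code in `ws-lifecycle.ts`: `watchdogTick`, `startIdleWatchdog`, and `SocketLike` are hypothetical names. Only the 30s cadence, the 75s threshold, and the ping-or-terminate behavior come from the text above.

```typescript
const CHECK_INTERVAL_MS = 30000; // watchdog cadence from the text
const STALE_AFTER_MS = 75000; // idle threshold from the text

type WatchdogAction = "ping" | "terminate";

// Pure decision: terminate only when the socket has been idle for more than 75s.
function watchdogTick(lastActivityMs: number, nowMs: number): WatchdogAction {
  return nowMs - lastActivityMs > STALE_AFTER_MS ? "terminate" : "ping";
}

// Minimal socket shape so the sketch stays self-contained.
interface SocketLike {
  ping(): void;
  terminate(): void;
}

// Starts the watchdog and returns a function that stops it (call on close).
function startIdleWatchdog(
  sock: SocketLike,
  getLastActivity: () => number,
  now: () => number = Date.now,
): () => void {
  const timer = setInterval(() => {
    if (watchdogTick(getLastActivity(), now()) === "terminate") {
      sock.terminate();
    } else {
      sock.ping();
    }
  }, CHECK_INTERVAL_MS);
  return () => clearInterval(timer);
}
```

Keeping the decision separate from the timer makes the threshold testable without waiting on real time.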
Spec at `.artifacts/specs/2026-05-05-continuous-presence.md`.
Layer 3 (resume token to skip full attestation on reconnect) deferred
— pure optimization, not needed for the user-visible "no
invisibility moment" goal.
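
The lease model described above can be sketched as a small in-memory table. This is a simplified illustration, not the broker's code: `LeaseTable`, `hello`, `close`, and `stateOf` are hypothetical names, the identity key stands in for `sessionPubkey`, and DM draining, DB restore, and session-id dedup are left out. The field names `leaseState`, `leaseUntil`, `evictionTimer`, the 90s grace, the `silent` result, and the stale-close check (`conn.ws !== ws`) follow the text.

```typescript
const GRACE_MS = 90000; // grace window from the text

type LeaseState = "online" | "offline";

interface PeerConn<S> {
  ws: S;
  leaseState: LeaseState;
  leaseUntil: number | null;
  evictionTimer: ReturnType<typeof setTimeout> | null;
}

class LeaseTable<S> {
  private conns = new Map<string, PeerConn<S>>();
  private onEvict: (identity: string) => void;
  private graceMs: number;

  constructor(onEvict: (identity: string) => void, graceMs: number = GRACE_MS) {
    this.onEvict = onEvict;
    this.graceMs = graceMs;
  }

  // Returns silent: true on reattach, so the caller skips the peer_joined broadcast.
  hello(identity: string, ws: S): { silent: boolean } {
    const existing = this.conns.get(identity);
    if (existing && existing.leaseState === "offline") {
      if (existing.evictionTimer) clearTimeout(existing.evictionTimer);
      existing.evictionTimer = null;
      existing.leaseUntil = null;
      existing.ws = ws; // swap the socket reference
      existing.leaseState = "online";
      return { silent: true };
    }
    this.conns.set(identity, {
      ws,
      leaseState: "online",
      leaseUntil: null,
      evictionTimer: null,
    });
    return { silent: false };
  }

  // On socket close: start the grace window instead of cleaning up immediately.
  close(identity: string, ws: S): void {
    const conn = this.conns.get(identity);
    // A late close from the old socket after a reattach is ignored.
    if (!conn || conn.ws !== ws) return;
    conn.leaseState = "offline";
    conn.leaseUntil = Date.now() + this.graceMs;
    conn.evictionTimer = setTimeout(() => {
      this.conns.delete(identity);
      this.onEvict(identity); // stands in for the peer_left broadcast + cleanup
    }, this.graceMs);
  }

  stateOf(identity: string): LeaseState | undefined {
    return this.conns.get(identity)?.leaseState;
  }
}
```

In this sketch, `onEvict` fires only when no reattach arrives within the grace window.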
*Shipped 2026-05-05.*
---