Compare commits
2 Commits
6780899185
...
96520394ff
| Author | SHA1 | Date | |
|---|---|---|---|
| | 96520394ff | | |
| | a2a53ff355 | | |
288
.artifacts/specs/2026-05-04-session-capabilities.md
Normal file
@@ -0,0 +1,288 @@
|
||||
# Session capabilities — first-class concept
|
||||
|
||||
**Status:** spec, queued behind v0.3.0 topic-encryption work.
|
||||
**Owner:** alezmad
|
||||
**Author:** Claude (Sprint B follow-up, 2026-05-04)
|
||||
**Related:** `2026-04-15-per-peer-capabilities.md` (existing per-peer
|
||||
caps system, member-keyed), `2026-05-04-per-session-presence.md`
|
||||
(per-launch session presence — what we're now restricting).
|
||||
|
||||
## Problem
|
||||
|
||||
Per-peer capability grants (`apps/broker/src/index.ts:2178+, 2309+`)
|
||||
are keyed on the sender's **stable member pubkey**. The grant model
|
||||
gives the recipient fine-grained control: "alice can DM me",
|
||||
"bob can read state but not broadcast", etc.
|
||||
|
||||
But: as of v1.30.0 (`per-session-presence`), every `claudemesh
|
||||
launch` mints a per-launch ephemeral keypair with a parent attestation
|
||||
binding it to the member identity. The launched session inherits **all**
|
||||
the member's capabilities transitively, because cap enforcement always
|
||||
falls through to the member key.
|
||||
|
||||
Concretely:
|
||||
|
||||
- Member `alice` is in mesh `flexicar`, granted `dm + state-read +
|
||||
state-write` by everyone.
|
||||
- Alice launches a session with `claudemesh launch` to do an automated
|
||||
task — say, run a Claude Code agent that iterates over PRs.
|
||||
- That session has full member privileges. It can DM peers, write
|
||||
shared state keys (e.g. clobber `current-pr`), grant new caps, ban
|
||||
members, etc. — none of which the user wanted to delegate.
|
||||
|
||||
There is no way to express "this session can DM peers but cannot
|
||||
deploy services or grant caps." The parent attestation is a binary
|
||||
existence proof — "this session was vouched by a member" — with no
|
||||
capability subset.
|
||||
|
||||
Plus an adjacent footgun: `set_state` (`apps/broker/src/index.ts:2949`)
|
||||
has **no cap check at all**. Anyone in the mesh can write any key. The
|
||||
spec at `2026-04-15-per-peer-capabilities.md` lists `state-write` as a
|
||||
planned cap but it was never wired into the broker. Shared keys like
|
||||
`current-pr` are write-anyone today.
|
||||
|
||||
## Goal
|
||||
|
||||
A launched session can be issued **a capability subset** of its
|
||||
parent member, signed by the parent at launch time, and the broker
|
||||
enforces the **intersection** of recipient grants × session caps on
|
||||
every protected operation.
|
||||
|
||||
## Non-goals
|
||||
|
||||
- Changing the existing per-peer cap model. Member-keyed grants stay
|
||||
authoritative for "who is allowed to talk to me."
|
||||
- Cross-machine session caps (waiting on 2.0.0 HKDF identity).
|
||||
- Per-tool granularity inside the Claude Code MCP surface — this
|
||||
spec only covers the broker-enforceable verbs (dm, broadcast,
|
||||
state-read, state-write, grant, kick, ban, profile-write,
|
||||
service-deploy).
|
||||
- Delegation: a session cannot re-vouch a sub-session with its own
|
||||
cap subset. Only members can attest sessions. (Could be lifted in
|
||||
a future spec; today's launch flow doesn't need it.)
|
||||
|
||||
## Design
|
||||
|
||||
### Capability vocabulary
|
||||
|
||||
Existing (today, member-level):
|
||||
|
||||
| Capability | Effect when GRANTED on a recipient → sender pair |
|
||||
|---------------|---------------------------------------------------|
|
||||
| `read` | Sender appears in recipient's `list_peers` |
|
||||
| `dm` | Sender can DM recipient |
|
||||
| `broadcast` | Sender's broadcasts reach recipient |
|
||||
| `state-read` | Sender can read shared state |
|
||||
| `state-write` | (planned) Sender can write shared state |
|
||||
| `file-read` | Sender can fetch files recipient shared |
|
||||
|
||||
New (session-level — cap subset on the attestation):
|
||||
|
||||
These are the **verbs the session is allowed to invoke**, NOT what
|
||||
peers can do TO it. A session attestation declaring `["dm", "read"]`
|
||||
means the session can SEND dm/read-list operations; it cannot
|
||||
broadcast, write state, grant, etc.
|
||||
|
||||
| Session cap | Gates which broker operations |
|
||||
|-------------------|------------------------------------------------|
|
||||
| `dm` | `send` with single recipient |
|
||||
| `broadcast` | `send` with `*`, `@group`, `#topic` |
|
||||
| `state-read` | `get_state`, `list_state` |
|
||||
| `state-write` | `set_state` |
|
||||
| `grant` | `grant`, `revoke`, `block` |
|
||||
| `kick` | `kick`, `disconnect` |
|
||||
| `ban` | `ban`, `unban` |
|
||||
| `profile-write` | `set_profile`, `set_summary`, `set_status` |
|
||||
| `service-deploy` | `mesh_service_register`, `_unregister` |
|
||||
|
||||
The default cap set when no subset is declared: the **full member
|
||||
set** (today's behavior — opt-in restriction, not breaking).
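The session-cap table above can be read as a lookup from broker operation to required cap. The following is a minimal sketch under stated assumptions, not the broker's actual code: `SessionCap`, `FIXED_OP_CAPS`, `capForOp`, and the `to` parameter are hypothetical names, and the abbreviated `_unregister` verb is left out because its full name is not spelled out in the table.

```ts
// Sketch only: names below are illustrative; the verb-to-cap pairs
// are copied from the session-cap table in this spec.
type SessionCap =
  | "dm" | "broadcast" | "state-read" | "state-write"
  | "grant" | "kick" | "ban" | "profile-write" | "service-deploy";

// A Map avoids accidental hits on Object.prototype keys such as "toString".
const FIXED_OP_CAPS = new Map<string, SessionCap>([
  ["get_state", "state-read"],
  ["list_state", "state-read"],
  ["set_state", "state-write"],
  ["grant", "grant"],
  ["revoke", "grant"],
  ["block", "grant"],
  ["kick", "kick"],
  ["disconnect", "kick"],
  ["ban", "ban"],
  ["unban", "ban"],
  ["set_profile", "profile-write"],
  ["set_summary", "profile-write"],
  ["set_status", "profile-write"],
  ["mesh_service_register", "service-deploy"],
]);

/** Returns the session cap an operation needs, or null if it is ungated. */
function capForOp(op: string, to?: string): SessionCap | null {
  if (op === "send") {
    if (to === undefined) return null; // malformed send; handled elsewhere
    // `*`, `@group`, and `#topic` targets count as broadcast; anything else is a DM.
    const isFanOut = to === "*" || to.startsWith("@") || to.startsWith("#");
    return isFanOut ? "broadcast" : "dm";
  }
  return FIXED_OP_CAPS.get(op) ?? null;
}
```

Keeping the mapping in one table makes it easy to check that every gated verb has exactly one cap.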
|
||||
|
||||
### Attestation v2
|
||||
|
||||
Existing v1 (`apps/cli/src/services/broker/session-hello-sig.ts`):
|
||||
|
||||
```
|
||||
canonical = `claudemesh-session-attest|<parent>|<session>|<expires>`
|
||||
```
|
||||
|
||||
New v2 (additive — broker accepts both):
|
||||
|
||||
```
|
||||
canonical = `claudemesh-session-attest-v2|<parent>|<session>|<expires>|<sorted-caps-csv>`
|
||||
```
|
||||
|
||||
Where `<sorted-caps-csv>` is the lower-cased, comma-joined,
|
||||
ASCII-sorted cap list. An omitted list = full member caps (default,
back-compat); an explicitly empty list = no caps allowed (see Test plan).
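As an illustration of the v2 canonical string, here is a small sketch that assumes the pubkeys are already encoded as strings and that `expires` is a plain number; the spec does not state either encoding. The helper names `sortedCapsCsv` and `canonicalV2` are hypothetical.

```ts
// Sketch only: helper names and field encodings are assumptions;
// the `|`-delimited layout and the prefix come from the spec text.
function sortedCapsCsv(caps: readonly string[]): string {
  // Default sort() compares UTF-16 code units, which matches ASCII
  // order for the lower-cased ASCII cap names used here.
  return caps.map((c) => c.toLowerCase()).sort().join(",");
}

function canonicalV2(
  parent: string,
  session: string,
  expires: number,
  caps: readonly string[],
): string {
  return [
    "claudemesh-session-attest-v2",
    parent,
    session,
    String(expires),
    sortedCapsCsv(caps),
  ].join("|");
}
```

Because the cap list is normalized before signing, the CLI and the broker derive the same bytes regardless of the order in which caps were passed.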
|
||||
|
||||
**Wire shape additions on `session_hello`:**
|
||||
|
||||
```ts
|
||||
{
|
||||
type: "session_hello",
|
||||
...existing fields...,
|
||||
parentAttestation: {
|
||||
sessionPubkey,
|
||||
parentMemberPubkey,
|
||||
expiresAt,
|
||||
signature,
|
||||
// NEW:
|
||||
allowed_caps?: string[], // omitted = full member set
|
||||
version?: 2, // omitted = v1
|
||||
},
|
||||
}
|
||||
```
|
||||
|
||||
The broker version-detects: `version === 2` → verify v2 canonical
|
||||
including `allowed_caps`. Default behavior is unchanged for clients
|
||||
that don't pass it.
|
||||
|
||||
### Enforcement
|
||||
|
||||
Add `allowed_caps: string[] | null` to the in-memory `PeerConn`
|
||||
shape (`apps/broker/src/index.ts:131`). Populated from
|
||||
`handleSessionHello` (the v2 attestation supplies it) and from
|
||||
`handleHello` (control-plane / member connection — set to `null`,
|
||||
meaning "full member caps").
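A sketch of how `allowed_caps` might be derived from the attestation before it is stored on `PeerConn`. The interface field types and the function name `resolveAllowedCaps` are assumptions, and signature verification is deliberately left out. It keeps "field omitted" (full member caps, `null`) distinct from "explicitly empty" (no caps), as the test plan requires.

```ts
// Sketch only: field types are assumed; field names follow the
// `session_hello` wire shape in this spec.
interface ParentAttestation {
  sessionPubkey: string;
  parentMemberPubkey: string;
  expiresAt: number;
  signature: string;
  allowed_caps?: string[]; // omitted = full member set
  version?: 2;             // omitted = v1
}

/** null means "full member caps"; an array (even empty) is an explicit subset. */
function resolveAllowedCaps(att: ParentAttestation): string[] | null {
  if (att.version !== 2 || att.allowed_caps === undefined) return null;
  // Normalize the same way the canonical string does.
  return att.allowed_caps.map((c) => c.toLowerCase()).sort();
}
```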
|
||||
|
||||
**Effective cap check** for a sending peer needing `cap`:
|
||||
|
||||
```ts
|
||||
function senderHasCap(conn: PeerConn, cap: string): boolean {
|
||||
if (conn.allowed_caps === null) return true; // member-level, no subset
|
||||
return conn.allowed_caps.includes(cap);
|
||||
}
|
||||
```
|
||||
|
||||
Wire this into every broker operation in the table above. The
|
||||
existing per-peer recipient-cap check at `2178+, 2309+` stays —
|
||||
session caps gate the **sender side**, recipient grants gate the
|
||||
**receive side**, and both must allow:
|
||||
|
||||
```
|
||||
allowed = senderHasCap(conn, capNeeded) && recipientGrants[sender][capNeeded]
|
||||
```
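To make the two-sided check concrete, here is a self-contained sketch. The `RecipientGrants` shape (a map from sender member pubkey to granted caps) and the name `isAllowed` are hypothetical; the real grant storage in the broker may look different.

```ts
// Sketch only: the grant table shape is an assumption for illustration.
interface SenderConn {
  memberPubkey: string;
  allowed_caps: string[] | null; // null = member-level, no subset
}

// One recipient's grants: sender member pubkey -> caps granted to that sender.
type RecipientGrants = Map<string, Set<string>>;

function senderHasCap(conn: SenderConn, cap: string): boolean {
  if (conn.allowed_caps === null) return true;
  return conn.allowed_caps.includes(cap);
}

/** Both the session subset and the recipient's grant must allow the cap. */
function isAllowed(conn: SenderConn, cap: string, grants: RecipientGrants): boolean {
  const granted = grants.get(conn.memberPubkey)?.has(cap) ?? false;
  return senderHasCap(conn, cap) && granted;
}
```

A session restricted to `["dm"]` is refused a broadcast even when the recipient has granted `broadcast` to the parent member, and a full-caps member is still refused when the recipient has granted nothing.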
|
||||
|
||||
### `set_state` gate (bonus, ship together)
|
||||
|
||||
Today: no cap check. After this spec: `set_state` requires
|
||||
`state-write` on the sender side. Migration: existing members
|
||||
default to having `state-write` in their member caps (no recipient
|
||||
grant model for state-write — it's a sender-side gate only, mesh-
|
||||
wide). New attestations can omit it to forbid the session.
|
||||
|
||||
The recipient-side analog (per-peer state-write grants) is left for
|
||||
a future spec — today the value of guarding state-write is
|
||||
session-level (avoid an automated session clobbering shared keys),
|
||||
not peer-level.
|
||||
|
||||
### CLI surface
|
||||
|
||||
```
|
||||
claudemesh launch --caps dm,read # tight: read-only chat agent
|
||||
claudemesh launch --caps dm,broadcast # send-only, no state writes
|
||||
claudemesh launch # default: full member caps
|
||||
```
|
||||
|
||||
`claudemesh launch --caps ?` prints the table above with descriptions.
|
||||
|
||||
`claudemesh peer list --json` includes `allowed_caps` per row when
|
||||
present (`null` = full member). Lets users audit what their running
|
||||
sessions can actually do.
|
||||
|
||||
### Migration plan (mirrors `2026-04-15-per-peer-capabilities.md` §"Migration plan")
|
||||
|
||||
1. **Broker schema additive** — `PeerConn.allowed_caps` in-memory
|
||||
only; no DB column. Reload-on-reconnect is fine because the
|
||||
attestation is re-sent on every WS open (it's the proof of
|
||||
identity).
|
||||
|
||||
2. **CLI ships v2 attestation alongside v1.** New `--caps` flag
|
||||
defaults to omitted (= v1 attestation, full caps). Older
|
||||
brokers ignore the new fields entirely.
|
||||
|
||||
3. **Broker accepts v2.** When `allowed_caps` arrives, store it.
|
||||
No enforcement yet — log denied operations as `cap_check_dryrun`
|
||||
metric counter, still allow them through.
|
||||
|
||||
4. **Dry-run release.** Ship one CLI + broker release that emits
|
||||
the metric but doesn't enforce. Watch for false positives in
|
||||
real meshes for ≥ 1 week.
|
||||
|
||||
5. **Flip enforcement on.** Broker rejects operations failing the
|
||||
cap check with `forbidden: missing session capability "<cap>"`.
|
||||
Default ("no caps declared = full member") keeps existing
|
||||
sessions unaffected.
|
||||
|
||||
6. **`set_state` gate** ships in step 5 alongside the rest. Default
|
||||
member caps include `state-write`, so flipping it on doesn't
|
||||
break existing flows. Only sessions that explicitly omit
|
||||
`state-write` from `--caps` lose write access.
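Steps 3 to 5 differ only in what happens when the check fails. A minimal sketch, assuming a hypothetical `EnforcementMode` switch and an in-process counter object; the counter name and the error text are taken from the steps above, everything else is illustrative.

```ts
// Sketch only: `EnforcementMode`, `metrics`, and `checkSessionCap`
// are hypothetical names, not the broker's actual API.
type EnforcementMode = "dry-run" | "enforce";

interface CapCheckResult {
  allowed: boolean;
  error?: string;
}

const metrics = { cap_check_dryrun: 0 };

function checkSessionCap(
  allowedCaps: string[] | null,
  cap: string,
  mode: EnforcementMode,
): CapCheckResult {
  const ok = allowedCaps === null || allowedCaps.includes(cap);
  if (ok) return { allowed: true };
  if (mode === "dry-run") {
    // Count the would-be denial but let the operation through.
    metrics.cap_check_dryrun += 1;
    return { allowed: true };
  }
  return { allowed: false, error: `forbidden: missing session capability "${cap}"` };
}
```

Flipping enforcement on is then a one-value change, and the dry-run counter shows beforehand how many operations would be refused.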
|
||||
|
||||
### Crypto notes
|
||||
|
||||
- v2 attestation re-uses `crypto_sign_detached` over the new
|
||||
canonical string; same parent member secret key, same TTL caps
|
||||
(≤24 h), same `expiresAt` semantics.
|
||||
- v1 signatures cannot be replayed as v2 signatures: the canonical
strings differ in their first `|`-delimited field
(`claudemesh-session-attest` vs `claudemesh-session-attest-v2`)
and in field count, so domain separation is intrinsic as long as
no field can itself contain `|`.
|
||||
- Like the existing per-peer cap system: caps are server-enforced
|
||||
metadata, not capability tokens. A malicious broker can ignore
|
||||
them. This is about UX trust + footgun prevention, not protocol-
|
||||
level security.
|
||||
|
||||
## Open questions
|
||||
|
||||
1. **Should the session attestation also bind to a fingerprint of
|
||||
the launched binary / Claude version?** Would let a member say
|
||||
"this session is constrained to Claude Code v1.34.15" so a
|
||||
compromised launched-binary doesn't get reused. Probably no — too
|
||||
much friction for the threat model.
|
||||
|
||||
2. **What's the right default for `claudemesh launch` going forward?**
|
||||
Once enforcement ships, do we change the default `--caps` from
|
||||
"full member" to "dm + read + state-read"? Tighter but breaks
|
||||
existing automation that writes state. Probably worth a one-
|
||||
release deprecation warning ("your session will lose state-write
|
||||
in v2.0.0 unless you pass --caps state-write") and then flip in
|
||||
v2.0.0.
|
||||
|
||||
3. **Does `--caps` belong in `~/.claudemesh/config.json` per-mesh
|
||||
defaults too?** A user who always launches read-only agents
|
||||
wants `caps: ["dm", "read"]` as a personal default. Easy add;
|
||||
defer until users ask for it.
|
||||
|
||||
4. **Per-tool MCP cap surface?** Out of scope here, but: a `claudemesh
|
||||
launch --tools peer:read,memory:write` would be a finer cut than
|
||||
broker-verb caps. The broker can't enforce that — it'd live in the
|
||||
MCP wrapper / Claude Code's allowedTools. Different layer.
|
||||
|
||||
## Test plan
|
||||
|
||||
- Pure-logic tests on `senderHasCap` (member-level → always true,
|
||||
empty caps → always false, declared caps → exact match).
|
||||
- Broker integration: launch a session with `--caps dm`, attempt
|
||||
`set_state` → expect `forbidden: missing session capability
|
||||
"state-write"`.
|
||||
- v1 attestation still accepted, no `allowed_caps` set, all caps
|
||||
permitted (back-compat).
|
||||
- v2 attestation with empty `allowed_caps` array → broker treats
|
||||
as "explicitly empty, no caps allowed" (NOT "full member"). The
|
||||
full-member default is "field omitted entirely". Test both.
|
||||
- Dry-run mode: cap fail increments the counter but the operation
|
||||
proceeds. Smoke-test before flipping enforcement.
|
||||
|
||||
## Estimate
|
||||
|
||||
- Spec review + open-question resolution: 1–2 days.
|
||||
- Broker change (PeerConn field, attestation v2 accept, per-verb
|
||||
enforcement, dry-run mode): 2–3 days.
|
||||
- CLI change (`--caps` flag, attestation builder, peer list
|
||||
surface): 1 day.
|
||||
- Tests: 1 day.
|
||||
- Dry-run release window: ≥ 1 week.
|
||||
|
||||
Total: ~1 sprint of focused work, plus a dry-run window.
|
||||
@@ -156,6 +156,11 @@ interface PeerConn {
|
||||
bio?: string;
|
||||
capabilities?: string[];
|
||||
};
|
||||
/** v2 agentic-comms presence taxonomy. Mirrors the value passed to
|
||||
* `recordPresence`. Used by the kick handler to refuse no-op kicks
|
||||
* on long-lived control-plane connections (daemon, dashboard) that
|
||||
* would just auto-reconnect. */
|
||||
peerRole: "control-plane" | "session" | "service";
|
||||
}
|
||||
|
||||
const connections = new Map<string, PeerConn>();
|
||||
@@ -1797,6 +1802,7 @@ async function handleHello(
|
||||
groups: initialGroups,
|
||||
visible: saved?.visible ?? true,
|
||||
profile: saved?.profile ?? {},
|
||||
peerRole: "control-plane",
|
||||
});
|
||||
incMeshCount(hello.meshId);
|
||||
void audit(hello.meshId, "peer_joined", member.id, effectiveDisplayName, {
|
||||
@@ -2022,6 +2028,7 @@ async function handleSessionHello(
|
||||
groups: initialGroups,
|
||||
visible: true,
|
||||
profile: {},
|
||||
peerRole: "session",
|
||||
});
|
||||
incMeshCount(hello.meshId);
|
||||
void audit(hello.meshId, "peer_joined", member.id, effectiveDisplayName, {
|
||||
@@ -4645,11 +4652,30 @@ function handleConnection(ws: WebSocket): void {
|
||||
}
|
||||
|
||||
const affected: string[] = [];
|
||||
// 1.34.15 (gap #3a): kick was a no-op against long-lived
|
||||
// control-plane connections (daemon, dashboard) — closing
|
||||
// their WS just triggered the auto-reconnect loop, the
|
||||
// kicker's CLI rendered "Their Claude Code session ended"
|
||||
// (which was misleading), and the user-visible state was
|
||||
// unchanged seconds later. We now refuse to close control-
|
||||
// plane WSes and surface the skipped peers in a new
|
||||
// additive ack field. Pre-1.34.15 CLI clients only read
|
||||
// `kicked`/`affected`, so this stays back-compat.
|
||||
//
|
||||
// Applies to `kick` only: the soft `disconnect` verb still closes
|
||||
// control-plane WSes intentionally — that's what users want
|
||||
// when they're nudging a peer to re-authenticate.
|
||||
const skippedControlPlane: string[] = [];
|
||||
const skipControlPlane = isKick;
|
||||
const now = Date.now();
|
||||
|
||||
if (km.all) {
|
||||
for (const [pid, peer] of connections) {
|
||||
if (peer.meshId !== conn.meshId || pid === presenceId) continue;
|
||||
if (skipControlPlane && peer.peerRole === "control-plane") {
|
||||
skippedControlPlane.push(peer.displayName || pid);
|
||||
continue;
|
||||
}
|
||||
try { peer.ws.close(closeCode, closeReason); } catch {}
|
||||
connections.delete(pid);
|
||||
void disconnectPresence(pid);
|
||||
@@ -4661,6 +4687,10 @@ function handleConnection(ws: WebSocket): void {
|
||||
if (peer.meshId !== conn.meshId || pid === presenceId) continue;
|
||||
const [pres] = await db.select({ lastPingAt: presence.lastPingAt }).from(presence).where(eq(presence.id, pid)).limit(1);
|
||||
if (pres && pres.lastPingAt && pres.lastPingAt.getTime() < cutoff) {
|
||||
if (skipControlPlane && peer.peerRole === "control-plane") {
|
||||
skippedControlPlane.push(peer.displayName || pid);
|
||||
continue;
|
||||
}
|
||||
try { peer.ws.close(closeCode, `${closeReason}_stale`); } catch {}
|
||||
connections.delete(pid);
|
||||
void disconnectPresence(pid);
|
||||
@@ -4671,6 +4701,10 @@ function handleConnection(ws: WebSocket): void {
|
||||
for (const [pid, peer] of connections) {
|
||||
if (peer.meshId !== conn.meshId) continue;
|
||||
if (peer.displayName === km.target || peer.memberPubkey === km.target || peer.memberPubkey.startsWith(km.target)) {
|
||||
if (skipControlPlane && peer.peerRole === "control-plane") {
|
||||
skippedControlPlane.push(peer.displayName || pid);
|
||||
continue;
|
||||
}
|
||||
try { peer.ws.close(closeCode, closeReason); } catch {}
|
||||
connections.delete(pid);
|
||||
void disconnectPresence(pid);
|
||||
@@ -4679,8 +4713,20 @@ function handleConnection(ws: WebSocket): void {
|
||||
}
|
||||
}
|
||||
|
||||
conn.ws.send(JSON.stringify({ type: ackType, kicked: affected, affected, _reqId: km._reqId }));
|
||||
log.info(`ws ${closeReason}`, { presence_id: presenceId, count: affected.length, target: km.target ?? km.stale ?? "all" });
|
||||
conn.ws.send(JSON.stringify({
|
||||
type: ackType,
|
||||
kicked: affected,
|
||||
affected,
|
||||
// Additive — older CLI clients ignore this field.
|
||||
...(skippedControlPlane.length > 0 ? { skipped_control_plane: skippedControlPlane } : {}),
|
||||
_reqId: km._reqId,
|
||||
}));
|
||||
log.info(`ws ${closeReason}`, {
|
||||
presence_id: presenceId,
|
||||
count: affected.length,
|
||||
target: km.target ?? km.stale ?? "all",
|
||||
skipped_control_plane: skippedControlPlane.length,
|
||||
});
|
||||
break;
|
||||
}
|
||||
|
||||
|
||||
47
apps/broker/tests/kick-control-plane-skip.test.ts
Normal file
@@ -0,0 +1,47 @@
|
||||
/**
|
||||
* Kick control-plane skip: 1.34.15 (gap #3a) refuses to close
|
||||
* long-lived control-plane connections (claudemesh daemon, dashboard)
|
||||
* via `kick`, because they auto-reconnect within seconds and the verb
|
||||
* was effectively a no-op. The soft `disconnect` verb keeps the old
|
||||
* behavior so users can still nudge a control-plane peer to
|
||||
* re-authenticate.
|
||||
*
|
||||
* Pure-logic test: mirrors the branch inside the broker's kick handler
|
||||
* without spinning up a broker. Same pattern as
|
||||
* grants-enforcement.test.ts.
|
||||
*/
|
||||
|
||||
import { describe, expect, test } from "vitest";
|
||||
|
||||
type PeerRole = "control-plane" | "session" | "service";
|
||||
|
||||
/** Mirrors the predicate inserted into the kick handler. */
|
||||
function shouldSkipKick(args: {
|
||||
verb: "kick" | "disconnect";
|
||||
peerRole: PeerRole;
|
||||
}): boolean {
|
||||
const skipControlPlane = args.verb === "kick";
|
||||
return skipControlPlane && args.peerRole === "control-plane";
|
||||
}
|
||||
|
||||
describe("kick control-plane skip (gap #3a)", () => {
|
||||
test("kick on control-plane → skipped (would auto-reconnect)", () => {
|
||||
expect(shouldSkipKick({ verb: "kick", peerRole: "control-plane" })).toBe(true);
|
||||
});
|
||||
|
||||
test("kick on session → not skipped (closes user session)", () => {
|
||||
expect(shouldSkipKick({ verb: "kick", peerRole: "session" })).toBe(false);
|
||||
});
|
||||
|
||||
test("kick on service → not skipped", () => {
|
||||
expect(shouldSkipKick({ verb: "kick", peerRole: "service" })).toBe(false);
|
||||
});
|
||||
|
||||
test("disconnect on control-plane → not skipped (intentional nudge)", () => {
|
||||
expect(shouldSkipKick({ verb: "disconnect", peerRole: "control-plane" })).toBe(false);
|
||||
});
|
||||
|
||||
test("disconnect on session → not skipped", () => {
|
||||
expect(shouldSkipKick({ verb: "disconnect", peerRole: "session" })).toBe(false);
|
||||
});
|
||||
});
|
||||
@@ -1,5 +1,110 @@
|
||||
# Changelog
|
||||
|
||||
## 1.34.15 (2026-05-04) — `peer list --mesh` actually scopes + `kick` refuses control-plane
|
||||
|
||||
Two follow-ups from the 1.34.x train, both backwards-compatible.
|
||||
|
||||
### `peer list --mesh <slug>` no longer aggregates across meshes
|
||||
|
||||
`apps/cli/src/commands/peers.ts:140` was calling
|
||||
`tryListPeersViaDaemon()` with no argument, so a multi-mesh daemon
|
||||
returned peers from EVERY attached mesh and the renderer printed
|
||||
"peers on flexicar" with cross-mesh rows mixed in. The daemon's
|
||||
`/v1/peers?mesh=<slug>` filter (server-side, since 1.26.0) was
|
||||
already correctly scoping when the slug was passed; the CLI just
|
||||
wasn't passing it. Fixed.
|
||||
|
||||
`apps/cli/src/commands/launch.ts:407` (the `printBrokerWelcome` peer
|
||||
count in the launch banner) had the same bug. The "N peers online"
|
||||
line in the welcome now shows the count for the launched mesh only.
|
||||
|
||||
The hex-prefix resolution in `apps/cli/src/commands/send.ts` is
intentionally cross-mesh (the user is targeting by hex without
specifying a mesh) and was deliberately left as-is.
|
||||
|
||||
### `claudemesh kick` refuses no-op kicks on control-plane connections
|
||||
|
||||
Pre-1.34.15, kicking a daemon's member-WS or a dashboard connection
|
||||
just closed the socket — the daemon's WS-lifecycle reconnect loop
|
||||
brought it back within seconds, the kicker's CLI rendered "Their
|
||||
Claude Code session ended" (which was misleading), and the user-
|
||||
visible state was unchanged. The verb was effectively a no-op, but
|
||||
the user had to learn that the hard way.
|
||||
|
||||
The broker's kick handler (`apps/broker/src/index.ts:4628+`) now
|
||||
skips peers where `peerRole === "control-plane"` and surfaces the
|
||||
skipped peers in a new additive ack field `skipped_control_plane`.
|
||||
The soft `disconnect` verb keeps the old behavior — useful when
|
||||
intentionally nudging a control-plane peer to re-authenticate.
|
||||
|
||||
The CLI (`apps/cli/src/commands/kick.ts`) reads the new field and
|
||||
prints a clearer message: refused peers are listed, with the hint
|
||||
that `claudemesh ban <peer>` is the right tool to remove a member,
|
||||
or `claudemesh daemon down` to take a daemon offline locally.
|
||||
|
||||
`apps/broker/src/index.ts` adds `peerRole` to the in-memory
|
||||
`PeerConn` shape, populated from both connection paths
|
||||
(member-keyed `hello` → `"control-plane"`, per-launch
|
||||
`session_hello` → `"session"`). The DB-side role taxonomy is
|
||||
unchanged.
|
||||
|
||||
### Back-compat
|
||||
|
||||
- Older CLI clients ignore the new `skipped_control_plane` ack
|
||||
field; against a control-plane target their kick now prints
"No peers matched.", because the broker reports no kicked peers.
|
||||
- Older brokers don't emit the field at all; newer CLI handles
|
||||
the absence (the new branch is only reached when the field is
|
||||
present and non-empty).
|
||||
- The new `peerRole` slot on `PeerConn` is filled at every
|
||||
`connections.set` callsite, so older code paths never read
|
||||
`undefined`.
|
||||
|
||||
### Tests
|
||||
|
||||
- `apps/broker/tests/kick-control-plane-skip.test.ts` — 5 cases
|
||||
covering the kick/disconnect × control-plane/session/service
|
||||
truth table.
|
||||
|
||||
## 1.34.14 (2026-05-04) — stale `CLAUDEMESH_CONFIG_DIR` falls back
|
||||
|
||||
`claudemesh launch` exports `CLAUDEMESH_CONFIG_DIR=<tmpdir>` to its
|
||||
spawned `claude` so the per-session mesh selection is isolated from
|
||||
`~/.claudemesh/config.json`. The tmpdir is `rmSync`'d on launch exit
|
||||
via the `process.on('exit', cleanup)` handler.
|
||||
|
||||
Footgun: if a later `claudemesh` invocation INHERITED that env — a
|
||||
Bash tool call inside Claude Code, a tmux pane that captured the env
|
||||
via `update-environment`, an exported var the user forgot to clear —
|
||||
the inherited path pointed at a tmpdir that no longer existed.
|
||||
Pre-1.34.14 we silently used the dead path, `readConfig()` came back
|
||||
empty, and the user saw "No meshes joined" from an otherwise-working
|
||||
install. Fish users hit it harder because fish has no `unset` —
|
||||
they had to discover `set -e CLAUDEMESH_CONFIG_DIR`.
|
||||
|
||||
`apps/cli/src/constants/paths.ts` now resolves `CONFIG_DIR` once via
|
||||
a memoized `resolveConfigDir()`:
|
||||
|
||||
1. No env var → `~/.claudemesh` (default, unchanged).
|
||||
2. Env points at a directory that exists → trust it. The
|
||||
legitimate per-session-launch case is byte-identical to before.
|
||||
3. Env set but stale (dir gone) → warn once on stderr (TTY-only —
|
||||
CI / MCP boot / piped scripts stay quiet) with a shell-specific
|
||||
unset hint, then fall back to `~/.claudemesh`.
|
||||
|
||||
The check is on the directory's existence, not on `config.json`,
|
||||
because a fresh-launch tmpdir legitimately has no `config.json` until
|
||||
the first write. The stale signature we catch is the outer launch's
|
||||
`rmSync(tmpDir, {recursive: true})` cleanup, which removes the
|
||||
directory entirely.
|
||||
|
||||
The "no meshes" check from the original triage was deliberately NOT
|
||||
adopted: a launched session that legitimately joins one mesh would
|
||||
hit it.
|
||||
|
||||
No back-compat surface affected. No other files changed. `_resetPathsForTest()`
|
||||
exported for unit tests.
|
||||
|
||||
## 1.34.13 (2026-05-04) — MCP forwards session token on /v1/events
|
||||
|
||||
The 1.34.10 SSE demux + 1.34.11 inbox per-recipient column were both
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "claudemesh-cli",
|
||||
- "version": "1.34.13",
+ "version": "1.34.15",
|
||||
"description": "Peer mesh for Claude Code sessions — CLI + MCP server.",
|
||||
"keywords": [
|
||||
"claude-code",
|
||||
|
||||
@@ -76,12 +76,32 @@ export async function runKick(
|
||||
if ("error" in built) { render.err(String(built.error)); return EXIT.INVALID_ARGS; }
|
||||
|
||||
return await withMesh({ meshSlug }, async (client) => {
|
||||
const result = await client.sendAndWait(built as Record<string, unknown>) as { affected?: string[]; kicked?: string[] };
|
||||
const result = await client.sendAndWait(built as Record<string, unknown>) as {
|
||||
affected?: string[];
|
||||
kicked?: string[];
|
||||
// 1.34.15: broker refuses to kick control-plane WSes (they'd
|
||||
// just auto-reconnect). Older brokers don't emit this field.
|
||||
skipped_control_plane?: string[];
|
||||
};
|
||||
const peers = result?.affected ?? result?.kicked ?? [];
|
||||
if (peers.length === 0) render.info("No peers matched.");
|
||||
else {
|
||||
const skipped = result?.skipped_control_plane ?? [];
|
||||
|
||||
if (peers.length === 0 && skipped.length === 0) {
|
||||
render.info("No peers matched.");
|
||||
} else if (peers.length === 0 && skipped.length > 0) {
|
||||
render.warn(
|
||||
`${skipped.length} match(es) refused: ${skipped.join(", ")} — control-plane connections (daemon / dashboard) auto-reconnect, so kick is a no-op.`,
|
||||
"To take a daemon offline locally, run `claudemesh daemon down` on that machine. To remove a member from the mesh, use `claudemesh ban <peer>`.",
|
||||
);
|
||||
} else {
|
||||
render.ok(`Kicked ${peers.length} peer(s): ${peers.join(", ")}`);
|
||||
render.hint("Their Claude Code session ended. They can rejoin anytime by running `claudemesh`.");
|
||||
if (skipped.length > 0) {
|
||||
render.warn(
|
||||
`(also refused ${skipped.length} control-plane connection(s): ${skipped.join(", ")})`,
|
||||
"Daemon / dashboard connections auto-reconnect; kick is a no-op against them. Use `claudemesh ban <peer>` to remove a member entirely.",
|
||||
);
|
||||
}
|
||||
}
|
||||
return EXIT.SUCCESS;
|
||||
});
|
||||
|
||||
@@ -400,11 +400,13 @@ async function printBrokerWelcome(meshSlug: string): Promise<void> {
|
||||
}
|
||||
} catch { /* daemon unreachable — not fatal */ }
|
||||
|
||||
// Peer count (best-effort).
|
||||
// Peer count (best-effort). 1.34.15: scope to the launched mesh so
|
||||
// multi-mesh daemons don't inflate the welcome banner with peers
|
||||
// from other meshes the user didn't just attach to.
|
||||
let peerCount = -1;
|
||||
try {
|
||||
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||
- const peers = (await tryListPeersViaDaemon()) ?? [];
+ const peers = (await tryListPeersViaDaemon(meshSlug)) ?? [];
|
||||
peerCount = peers.filter((p) =>
|
||||
(p as { channel?: string }).channel !== "claudemesh-daemon",
|
||||
).length;
|
||||
|
||||
@@ -135,9 +135,17 @@ async function listPeersForMesh(slug: string): Promise<PeerRecord[]> {
|
||||
// lifecycle helper inside tryListPeersViaDaemon auto-spawns the
|
||||
// daemon if it's down and probes it for liveness — no separate bridge
|
||||
// tier is needed any more (1.28.0).
|
||||
//
|
||||
// 1.34.15: forward `slug` to the daemon as `?mesh=<slug>` so the
|
||||
// server-side aggregator narrows to the requested mesh. Pre-1.34.15
|
||||
// we called this with no argument, so a multi-mesh daemon returned
|
||||
// peers from every attached mesh and the renderer printed "peers on
|
||||
// flexicar" with cross-mesh rows mixed in. The daemon's
|
||||
// `meshFromCtx` already does the right scoping when the slug is
|
||||
// passed; the CLI just wasn't passing it.
|
||||
try {
|
||||
const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
|
||||
- const dr = await tryListPeersViaDaemon();
+ const dr = await tryListPeersViaDaemon(slug);
|
||||
if (dr !== null) {
|
||||
return dr.map((p) => annotateSelf(p as PeerRecord, selfMemberPubkey, selfSessionPubkey));
|
||||
}
|
||||
|
||||
@@ -1,10 +1,82 @@
|
||||
import { existsSync } from "node:fs";
|
||||
import { homedir } from "node:os";
|
||||
import { join } from "node:path";
|
||||
|
||||
const home = homedir();
|
||||
const DEFAULT_CONFIG_DIR = join(home, ".claudemesh");
|
||||
|
||||
/**
|
||||
* Resolve `CONFIG_DIR` once, with stale-env detection.
|
||||
*
|
||||
* `claudemesh launch` exposes `CLAUDEMESH_CONFIG_DIR=<tmpdir>` to its
|
||||
* spawned `claude` so the per-session mesh selection is isolated from
|
||||
* `~/.claudemesh/config.json`. The tmpdir is rmSync'd on launch exit.
|
||||
*
|
||||
* Footgun: if a `claudemesh` invocation INHERITS that env from an
|
||||
* already-launched (or previously-launched) session — e.g. a Bash tool
|
||||
* call inside Claude Code, or a tmux pane that captured the env via
|
||||
* `update-environment` — the inherited path may point at a tmpdir that
|
||||
* no longer exists. Pre-1.34.14 we silently used the dead path,
|
||||
* `readConfig()` came back empty, and the user saw "No meshes joined"
|
||||
* from an otherwise-working install.
|
||||
*
|
||||
* Resolution rules:
|
||||
* 1. No env var → `~/.claudemesh` (default).
|
||||
* 2. Env points at an existing dir → trust it, even before
*    `config.json` is written (the legitimate per-session-launch case).
* 3. Env set but stale (dir missing) → warn
*    once on stderr (TTY-only) and fall back to `~/.claudemesh`.
|
||||
*
|
||||
* Memoized: resolves once on first access. Mid-process env mutations
|
||||
* are intentionally ignored — paths must stay stable across one CLI
|
||||
* invocation.
|
||||
*/
|
||||
let _resolvedConfigDir: string | null = null;
|
||||
let _warnedStaleEnv = false;
|
||||
|
||||
function resolveConfigDir(): string {
|
||||
if (_resolvedConfigDir !== null) return _resolvedConfigDir;
|
||||
const envDir = process.env.CLAUDEMESH_CONFIG_DIR;
|
||||
if (!envDir) {
|
||||
_resolvedConfigDir = DEFAULT_CONFIG_DIR;
|
||||
return DEFAULT_CONFIG_DIR;
|
||||
}
|
||||
// Trust the env when it resolves to a real directory. We check
|
||||
// the DIR (not `config.json`) because the legitimate "fresh launch
|
||||
// before any write" case has the dir but no config.json yet.
|
||||
// The stale signature we want to catch is `rmSync(tmpDir,
|
||||
// {recursive: true})` from the outer launch's cleanup — that
|
||||
// removes the directory entirely, so a missing dir is the
|
||||
// unambiguous "stale" signal.
|
||||
if (existsSync(envDir)) {
|
||||
_resolvedConfigDir = envDir;
|
||||
return envDir;
|
||||
}
|
||||
// Stale: env set but the dir is gone. Most likely the outer
|
||||
// launch's cleanup ran and we inherited its (now-dead) tmpdir
|
||||
// path. Fall back to default and warn the user once on stderr —
|
||||
// only when attached to a TTY, so non-interactive callers (CI,
|
||||
// MCP boot, scripts piping stdout) stay quiet.
|
||||
if (!_warnedStaleEnv && process.stderr.isTTY) {
|
||||
_warnedStaleEnv = true;
|
||||
const unsetHint =
|
||||
process.env.SHELL?.endsWith("fish")
|
||||
? "set -e CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE"
|
||||
: "unset CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE";
|
||||
process.stderr.write(
|
||||
`claudemesh: ignoring stale CLAUDEMESH_CONFIG_DIR=${envDir} (directory does not exist); using ${DEFAULT_CONFIG_DIR}.\n`
|
||||
+ ` Hint: this is usually a leftover env from a previous \`claudemesh launch\`. Clean it with:\n`
|
||||
+ ` ${unsetHint}\n`,
|
||||
);
|
||||
}
|
||||
_resolvedConfigDir = DEFAULT_CONFIG_DIR;
|
||||
return DEFAULT_CONFIG_DIR;
|
||||
}
|
||||
|
||||
export const PATHS = {
|
||||
// Removed by this change (replaced by the getter below): CONFIG_DIR: process.env.CLAUDEMESH_CONFIG_DIR || join(home, ".claudemesh"),
|
||||
get CONFIG_DIR() {
|
||||
return resolveConfigDir();
|
||||
},
|
||||
get CONFIG_FILE() {
|
||||
return join(this.CONFIG_DIR, "config.json");
|
||||
},
|
||||
@@ -20,3 +92,12 @@ export const PATHS = {
|
||||
CLAUDE_JSON: join(home, ".claude.json"),
|
||||
CLAUDE_SETTINGS: join(home, ".claude", "settings.json"),
|
||||
} as const;
|
||||
|
||||
/**
|
||||
* Test-only: reset the memoized resolution. Not exported from the
|
||||
* package barrel; reach in via the relative path from a test file.
|
||||
*/
|
||||
export function _resetPathsForTest(): void {
|
||||
_resolvedConfigDir = null;
|
||||
_warnedStaleEnv = false;
|
||||
}
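The resolution rules implemented by `resolveConfigDir()` above can be restated as a pure function, which makes the three cases easy to check in isolation. This is a minimal sketch, not the patch's code: `resolveConfigDirPure`, `Resolution`, and the injected `dirExists` probe are hypothetical names, and the paths are invented examples.

```typescript
// Hypothetical sketch (not part of the patch): the three resolution rules
// expressed without process.env, the filesystem, or memoization.
type Resolution = { dir: string; stale: boolean };

function resolveConfigDirPure(
  envDir: string | undefined,
  defaultDir: string,
  dirExists: (path: string) => boolean,
): Resolution {
  // Rule 1: env var unset (or empty) -> use the default directory.
  if (!envDir) return { dir: defaultDir, stale: false };
  // Rule 2: env var names an existing directory -> trust it.
  if (dirExists(envDir)) return { dir: envDir, stale: false };
  // Rule 3: env var set but the directory is gone -> fall back, flag as stale.
  return { dir: defaultDir, stale: true };
}

// Example paths are invented for illustration only.
const DEFAULT_DIR = "/home/alice/.claudemesh";
const liveDirs = new Set<string>(["/tmp/claudemesh-live"]);
const probe = (path: string): boolean => liveDirs.has(path);

console.log(resolveConfigDirPure(undefined, DEFAULT_DIR, probe));
console.log(resolveConfigDirPure("/tmp/claudemesh-live", DEFAULT_DIR, probe));
console.log(resolveConfigDirPure("/tmp/claudemesh-gone", DEFAULT_DIR, probe));
```

Returning a `stale` flag instead of writing to stderr keeps the decision separate from the side effect, so a caller could choose whether and where to warn.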
|
||||
|
||||
57
apps/cli/tests/unit/paths-stale-env.test.ts
Normal file
@@ -0,0 +1,57 @@
|
||||
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
|
||||
import { mkdirSync, rmSync, existsSync } from "node:fs";
|
||||
import { join } from "node:path";
|
||||
import { tmpdir, homedir } from "node:os";
|
||||
|
||||
/** The dynamic import returns the cached paths.ts module, so each test calls
* `_resetPathsForTest()` to clear the memoized path and keep state from leaking across cases. */
|
||||
|
||||
const TEST_DIR = join(tmpdir(), "claudemesh-paths-test-" + Date.now());
|
||||
|
||||
describe("paths CONFIG_DIR resolution", () => {
|
||||
beforeEach(() => {
|
||||
delete process.env.CLAUDEMESH_CONFIG_DIR;
|
||||
if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
|
||||
});
|
||||
afterEach(() => {
|
||||
delete process.env.CLAUDEMESH_CONFIG_DIR;
|
||||
if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
|
||||
});
|
||||
|
||||
it("falls back to ~/.claudemesh when env var is unset", async () => {
|
||||
const mod = await import("~/constants/paths.js");
|
||||
mod._resetPathsForTest();
|
||||
expect(mod.PATHS.CONFIG_DIR).toBe(join(homedir(), ".claudemesh"));
|
||||
});
|
||||
|
||||
it("honors CLAUDEMESH_CONFIG_DIR when the dir exists, even without config.json", async () => {
|
||||
mkdirSync(TEST_DIR, { recursive: true });
|
||||
process.env.CLAUDEMESH_CONFIG_DIR = TEST_DIR;
|
||||
const mod = await import("~/constants/paths.js");
|
||||
mod._resetPathsForTest();
|
||||
expect(mod.PATHS.CONFIG_DIR).toBe(TEST_DIR);
|
||||
});
|
||||
|
||||
it("falls back to default when env points at a missing dir (stale-tmpdir case)", async () => {
|
||||
process.env.CLAUDEMESH_CONFIG_DIR = "/var/folders/_nonexistent_claudemesh_dir_xyz123";
|
||||
const mod = await import("~/constants/paths.js");
|
||||
mod._resetPathsForTest();
|
||||
// Suppress the stderr warning to keep test output clean
|
||||
const stderr = vi.spyOn(process.stderr, "write").mockImplementation(() => true);
|
||||
try {
|
||||
expect(mod.PATHS.CONFIG_DIR).toBe(join(homedir(), ".claudemesh"));
|
||||
} finally {
|
||||
stderr.mockRestore();
|
||||
}
|
||||
});
|
||||
|
||||
it("memoizes — second access returns the same path even if env changes mid-process", async () => {
|
||||
mkdirSync(TEST_DIR, { recursive: true });
|
||||
process.env.CLAUDEMESH_CONFIG_DIR = TEST_DIR;
|
||||
const mod = await import("~/constants/paths.js");
|
||||
mod._resetPathsForTest();
|
||||
const first = mod.PATHS.CONFIG_DIR;
|
||||
process.env.CLAUDEMESH_CONFIG_DIR = "/something/else";
|
||||
expect(mod.PATHS.CONFIG_DIR).toBe(first);
|
||||
});
|
||||
});
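The cases above cover path resolution but not the warn-once, TTY-only gating of the stale-env message. One way to make that branch testable without touching `process.stderr` is to inject the TTY flag and the output sink. The following is a sketch under that assumption; `makeStaleWarner` and its parameters are hypothetical and do not exist in the patch, and the message text is abbreviated.

```typescript
// Hypothetical sketch (not part of the patch): warn-once, TTY-only gating
// with the TTY flag, output sink, and shell path injected for testability.
function makeStaleWarner(
  isTTY: boolean,
  write: (chunk: string) => void,
  shell?: string,
): (envDir: string, fallbackDir: string) => void {
  let warned = false;
  return (envDir, fallbackDir) => {
    // Stay quiet for non-interactive callers and after the first warning.
    if (warned || !isTTY) return;
    warned = true;
    // fish uses `set -e` to erase a variable; POSIX-style shells use `unset`.
    const unsetHint = shell?.endsWith("fish")
      ? "set -e CLAUDEMESH_CONFIG_DIR"
      : "unset CLAUDEMESH_CONFIG_DIR";
    write(
      `claudemesh: ignoring stale CLAUDEMESH_CONFIG_DIR=${envDir}; using ${fallbackDir}.\n` +
        `  Hint: ${unsetHint}\n`,
    );
  };
}

const captured: string[] = [];
const warn = makeStaleWarner(true, (chunk) => { captured.push(chunk); }, "/usr/bin/fish");
warn("/tmp/gone", "/home/alice/.claudemesh");
warn("/tmp/gone", "/home/alice/.claudemesh"); // second call is suppressed
console.log(captured.length); // 1
```

With the sink injected, a test can assert on the captured chunks directly instead of spying on `process.stderr.write`.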
|
||||