---
title: claudemesh — full end-state architecture for agentic peer communication
status: "draft (v2 — supersedes v1: removes time-boxed phasing, adds P2P data plane, applies Codex-2 correctness/scope-gap edits)"
target: end-state (architectural milestones, not version timelines)
author: Alejandro + Claude (Codex GPT-5.2 cross-checked twice)
date: 2026-05-04
supersedes: 2026-05-04-agentic-comms-architecture.md (v1)
references:
  - 2026-05-02-architecture-north-star.md (CLI-first commitment, push-pipe)
  - 2026-05-04-per-session-presence.md (per-launch session pubkey + attestation)
  - apps/cli/CHANGELOG.md (1.30.0–1.32.1 history)
---

# claudemesh — agentic peer communication, full end-state

## What this document is

The end-state architecture for claudemesh as a transport-agnostic agentic peer-comms platform. Not a release plan, not a sprint roadmap — the **shape** the system needs to converge on. Implementation order at the end is a *suggestion*, not a contract; time estimates are deliberately omitted because the surface is too cross-cutting to phase by weeks.

v1 of this spec (same date, no `-v2` suffix) treated the broker as the sole data plane. v2 corrects that: **the broker is a coordination plane (signaling, discovery, offline queue, fan-out, registry, revocation); the data plane is hybrid P2P** with broker fallback for the cases P2P can't cover. This is closer to how Tailscale, libp2p, LiveKit, and modern WebRTC stacks work in production.

## TL;DR

- **Identity** — three keypair types (member, session, service), all rooted in a member's secret key. Member is durable, session is per-launch, service is a member-scoped delegate for non-Claude integrations. Every service has its own pubkey and explicit revocation.
- **Coordination plane** — broker handles signaling, peer discovery, offline message queue, group/topic fan-out, mesh state authority, revocation gossip. Always reachable.
- **Data plane** — hybrid:
  - **P2P first** (WebRTC data channels, future: QUIC) when both peers are online and NAT-traversable.
  - **Broker-relayed** when peers are NAT-blocked, when one peer is offline, or for group/topic/broadcast where fan-out at the broker is structurally cheaper than N-way sender-side fan-out.
  - **Pure broker** for service identities that can't run a P2P stack (HTTP webhook senders, OpenAI Assistants, browser SDKs without WebRTC).
- **Channels** — typed envelope (dm, group, topic, rpc, system, stream). Channel type drives crypto, routing, and transport selection. `meta` is required in the v2 envelope.
- **Transports** — pluggable adapters under one interface: WS-to-broker (today), WebRTC P2P, HTTP webhook, future LiveKit/QUIC/etc. The sender's daemon picks the adapter for each peer pair from the capabilities the broker advertises; the broker carries the signaling.
- **Crypto** — every direct message is E2E encrypted to the recipient's pubkey regardless of transport. Broker never sees plaintext. P2P doesn't get any extra trust just because it's direct.
- **Delivery** — at-least-once **requires receiver ack** before broker marks `delivered_at`. Until that ack path exists, retry is best-effort with idempotent dedupe at the receiver.

The CLI-first commitment from the North Star spec stays intact. Every channel type and every transport is invocable from `claudemesh <verb>`. MCP serves only `claude/channel` mid-turn push.

---

## The forcing functions (why this shape, not a smaller one)

1. **Multi-session interconnect already broke** (1.30.0 → 1.32.1) because the per-session WS subsystem shipped without a push handler. That is a symptom of "broker is the data plane and we keep bolting on" thinking. Roles and transport adapters need to be formalized before the next bolt-on.

2. **Codex review surfaced a correctness bug** in `drainForMember`: it sets `delivered_at = NOW()` *before* the WS push succeeds, so if `ws.readyState !== OPEN` the row is marked delivered and the message is lost. That is at-most-once delivery with no retry, and every channel/transport added later inherits it unless it is fixed at the foundation.

3. **The agentic-comms domain has standardized on hybrid P2P + central coordinator.** Tailscale (control plane + WireGuard P2P), LiveKit (signaling + SFU + data channels), libp2p (DHT discovery + multi-transport), Iroh (gossip + QUIC P2P). Pure-broker is a 2010s pattern; pure-P2P is academic. Hybrid is the norm.

4. **claudemesh's pricing/economics demand P2P.** Every byte through the broker is your cost, and voice transcripts, file transfers, and real-time tool I/O are all bandwidth-heavy. A P2P data plane lets the broker scale linearly with peer count, not message volume.

5. **Privacy/sovereignty matters as the agent ecosystem grows.** "Your agents talk to my agents" should default to peer-to-peer paths when possible. Broker as relay is fine; broker as forced middleman is not.

---

## Audience for this architecture

| Peer type | Identity | Online presence | Data plane preference | Notes |
|---|---|---|---|---|
| **Claude Code session** | Per-launch session pubkey, member-attested | WS to broker (control + signaling) | P2P first, broker fallback | Mid-turn push via MCP `claude/channel` |
| **Daemon, no launch** (idle Mac with daemon running) | Member pubkey | WS to broker | Broker only (no P2P partner unless launched) | Receives broadcasts + member-targeted DMs |
| **Voice agent** (LiveKit, Pipecat) | Service identity, member-signed | LiveKit room + bridge | LiveKit room data channels intra-room; bridge over broker for cross-mesh | Side-car bridges room ↔ broker |
| **OpenAI Assistant / Anthropic Skill** | Service identity, scoped token | HTTP outbound, webhook inbound | Broker only (can't run P2P) | Daemon does delegated re-encryption |
| **Browser-based peer** (web dashboard, SDK) | Member or service identity | WS to broker, WebRTC for P2P | P2P-where-possible (browsers ARE WebRTC-native) | Full feature parity once on-mesh |
| **Webhook consumer** (Stripe-style passive) | Service identity | HTTP webhook inbound only | Broker only | Topic subscriptions; no outbound channel |
| **Bridge** (Slack, WhatsApp, IRC, Matrix) | Service identity per bridge + per-end-user delegated | WS to broker | Broker only for bridge ↔ broker; native protocol for bridge ↔ external | Trust delegated to bridge operator |
| **Cron / scheduled actor** | Member pubkey or service identity | Ephemeral; HTTP send only | Broker only | No long-lived connection |
| **CLI-only user** (no Claude Code) | Member pubkey | Ephemeral on each `claudemesh send` | Broker only | Command-line agent, queues via outbox |

Every row in this table works without changing the broker's coordination plane.

---

## Layer 1: Identity

Three keypair types, one auth model.

### Member identity (durable)
- Ed25519 keypair, generated at `claudemesh join <invite>`. Held in `~/.claudemesh/config.json` per mesh.
- The auth boundary — grants, kicks, bans operate on members.
- Used for hello signature on the daemon's control-plane WS.
- Used as cryptographic root of trust for sibling sessions and service identities.

### Session identity (ephemeral, per-launch)
- Ed25519 keypair generated by each `claudemesh launch`. Held in process memory only.
- Parent-signed attestation vouches for it (TTL 12h, broker cap 24h). Rotation = new launch.
- Used for hello signature on the per-session WS, and as routing key for DMs targeted at *this specific launched session*.
- Session secret never touches disk; lives only in the daemon's `sessionBrokers` map keyed by IPC token.

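The parent-signed session attestation can be sketched with Node's built-in Ed25519 support. This is a minimal sketch under assumptions. The field names, the JSON-array signing payload, and millisecond timestamps are all hypothetical. Only the member-signs-session relationship and the 12h default / 24h broker cap come from this spec.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

const DEFAULT_TTL_MS = 12 * 60 * 60 * 1000; // 12h default TTL (from the spec)
const BROKER_CAP_MS = 24 * 60 * 60 * 1000;  // 24h broker cap (from the spec)

// Hypothetical wire shape for a session attestation.
interface SessionAttestation {
  sessionPubkey: string; // base64 SPKI DER of the session's Ed25519 public key
  issuedAt: number;      // ms since epoch
  expiresAt: number;     // ms since epoch
  signature: string;     // base64 Ed25519 signature by the member's secret key
}

// Fixed field order so signer and verifier process identical bytes.
function attestationPayload(a: { sessionPubkey: string; issuedAt: number; expiresAt: number }): Buffer {
  return Buffer.from(JSON.stringify([a.sessionPubkey, a.issuedAt, a.expiresAt]));
}

// The member (parent) vouches for a freshly generated per-launch session key.
function attestSession(memberKey: KeyObject, sessionPub: KeyObject, now: number, ttlMs = DEFAULT_TTL_MS): SessionAttestation {
  const unsigned = {
    sessionPubkey: sessionPub.export({ type: "spki", format: "der" }).toString("base64"),
    issuedAt: now,
    expiresAt: now + ttlMs,
  };
  const signature = sign(null, attestationPayload(unsigned), memberKey).toString("base64");
  return { ...unsigned, signature };
}

// Broker-side acceptance: lifetime within the cap, currently valid, signed by the member.
function acceptAttestation(a: SessionAttestation, memberPub: KeyObject, now: number): boolean {
  if (a.expiresAt - a.issuedAt > BROKER_CAP_MS) return false;
  if (now < a.issuedAt || now >= a.expiresAt) return false;
  return verify(null, attestationPayload(a), memberPub, Buffer.from(a.signature, "base64"));
}

// Example: a member key attests a per-launch session key.
const member = generateKeyPairSync("ed25519");
const session = generateKeyPairSync("ed25519");
const t0 = Date.now();
const att = attestSession(member.privateKey, session.publicKey, t0);
```
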
### Service identity (third type, additive)

For non-Claude integrations that can't or shouldn't use a per-launch session.

```typescript
interface ServiceIdentity {
  service_id: string;          // Stable string id ("openai-assistant-foo", "livekit-room-bar")
  service_pubkey: string;      // Ed25519 pubkey, the cryptographic identity. crypto_box targets this.
  member_id: string;           // The mesh member that owns this service (auth boundary)
  service_type: string;        // "openai-assistant" | "livekit-room" | "webhook" | "voice-agent" | ...
  scopes: string[];            // ["dm:read", "topic:write", "rpc:invoke", ...]
  attestation: {               // member-signed
    service_id: string;
    service_pubkey: string;
    scopes: string[];
    expires_at: number;        // timestamp; unit not fixed by this spec
    signature: string;
  };
  transport_hint: "ws" | "http-webhook" | "sse" | "livekit"; // informs how the broker reaches it
  delegate_daemon_pubkey?: string; // Optional. Set when the daemon holds the service's secret on its behalf.
}
```

Two flavors:
- **Holds-secret service** — has its own keypair (`service_pubkey` + service-secret kept by the service itself). Runs the crypto end-to-end. Voice agent side-cars, browser SDK, MQTT bridges.
- **Delegated service** — daemon holds the service-secret on the service's behalf. Senders still encrypt to `service_pubkey`; daemon decrypts on receipt and forwards plaintext (or re-signs) to the service via its `transport_hint`. Used by HTTP webhook consumers, OpenAI Assistants. Trust is in the daemon owner. `delegate_daemon_pubkey` records who's holding.

All three identity types resolve to a `member_id` for authorization. They differ in liveness (member = always; session = per-launch; service = scoped) and transport hint (member/session = WS-resident; service = polymorphic).

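A minimal sketch of "everything resolves to a `member_id`", assuming an in-memory identity table. The names `Identity` and `authorize` are illustrative assumptions, not settled design. So is the rule that member and session keys carry the member's full authority while service keys are limited to their attested `scopes`.

```typescript
// Hypothetical in-memory view of the identities the broker knows for one mesh.
type Identity =
  | { kind: "member"; pubkey: string; memberId: string }
  | { kind: "session"; pubkey: string; memberId: string; expiresAt: number }
  | { kind: "service"; pubkey: string; memberId: string; scopes: string[]; expiresAt: number };

// Returns the member_id to authorize as, or null if the request must be rejected.
// Assumption: member and session keys act with the member's full authority;
// service keys are limited to their attested scopes.
function authorize(
  id: Identity | undefined,
  scope: string,
  now: number,
  revokedPubkeys: ReadonlySet<string>,
  revokedMembers: ReadonlySet<string>,
): string | null {
  if (!id) return null;
  if (revokedPubkeys.has(id.pubkey) || revokedMembers.has(id.memberId)) return null;
  if (id.kind !== "member" && now >= id.expiresAt) return null; // session/service attestations expire
  if (id.kind === "service" && !id.scopes.includes(scope)) return null;
  return id.memberId;
}
```
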
### Identity revocation (explicit)

v1 of this spec left revocation implicit. v2 makes it concrete:

- **CLI verb:** `claudemesh service revoke <service_id>` (also `claudemesh peer revoke <pubkey>` for member revocation).
- **Broker effect:** add a row to the `revocation` table with `(mesh_id, revoked_pubkey, revoked_at, revoked_by, reason?)`. Drop any active WS for that pubkey (close code 4002 "revoked"). Reject future hellos.
- **Drain effect:** `drainForMember` checks the revocation list at drain time; ciphertext in flight from the revoked sender is dropped (the sender already got the broker ack, but the recipient never sees the message).
- **Gossip:** revocation events publish on the `system` channel (highest priority). Online peers cache them; offline peers see them on reconnect. This is required so P2P sessions also honor revocation (otherwise a revoked peer's stored attestations could keep working over direct paths).
- **Latency target:** <30s for online peers to receive and apply.
- **Expiry vs revoke distinction:** `expires_at` is graceful (predictable, scheduled rotation); revoke is emergency (leaked secret, fired employee, compromised host). Both use the same revocation table; `expires_at` enforces silently when reached, revoke is logged as an audit event.

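A peer-side cache for revocation gossip might look like the following. This is a sketch assuming one cache per mesh and events that mirror the `revocation` table columns. The class and method names are hypothetical. The same cache serves both the drain-time filter and the per-message P2P check.

```typescript
// Mirrors the revocation table columns; mesh_id is implicit (one cache per mesh).
interface RevocationRow {
  revokedPubkey: string;
  revokedAt: number;
  revokedBy: string;
  reason?: string;
}

class RevocationCache {
  private rows = new Map<string, RevocationRow>();

  // Apply a revocation event from the system channel or a reconnect replay.
  // Idempotent: replays are harmless, and the earliest revocation time is kept for audit.
  apply(row: RevocationRow): void {
    const prev = this.rows.get(row.revokedPubkey);
    if (!prev || row.revokedAt < prev.revokedAt) this.rows.set(row.revokedPubkey, row);
  }

  get(pubkey: string): RevocationRow | undefined {
    return this.rows.get(pubkey);
  }

  // Per-message check, usable for P2P sessions and broker-relayed inbound alike.
  isRevoked(pubkey: string): boolean {
    return this.rows.has(pubkey);
  }

  // Drain-time filter: queued ciphertext from a revoked sender is dropped, not delivered.
  filterDeliverable<T extends { from: string }>(queued: readonly T[]): T[] {
    return queued.filter((m) => !this.isRevoked(m.from));
  }
}
```
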
---

## Layer 2: Coordination plane (the broker, properly scoped)

The broker is **not** the data plane. Its real responsibilities:

1. **Mesh state authority** — member roster, group memberships, topic registry, service registrations, revocation list. Source of truth for who's in a mesh and what they can do.
2. **Peer discovery** — `list_peers` returns currently-online presences. Broker is the only system that knows which peers are reachable now and over which transports.
3. **Signaling for P2P upgrades** — when peer A wants to open a P2P connection to peer B, A sends an SDP offer through the broker; B responds with an SDP answer through the broker. Once the data channel is up, the broker is out of the path. Same as WebRTC signaling.
4. **Offline message queue** — when recipient is offline, broker stores the (encrypted) message until they reconnect. P2P can't do this without an "always-on peer" model, which is awkward to bootstrap.
5. **Group / topic / broadcast fan-out** — broker is the cheap fan-out point. Sender publishes once; broker delivers to N recipients. P2P fan-out (gossipsub) is possible but adds significant complexity for a feature most meshes won't need at scale.
6. **TURN-style relay for NAT-blocked pairs** — when P2P negotiation fails (symmetric NAT, restrictive corporate firewall), broker carries the data. Functionally equivalent to TURN.
7. **Revocation gossip publisher** — broker pushes revocation events to all online peers via the `system` channel; peers cache them.
8. **Audit log + persistence layer** — metadata of encrypted messages, kept for compliance. Bodies are E2E-encrypted, so audit is over (sender, recipient, channel, timestamp, size), not content.

The broker is **NOT**:
- The default path for online-online direct messages (P2P should win).
- The decryptor for any direct message (E2E means broker sees ciphertext only).
- A bottleneck on bulk data (file transfer, voice, screen share — these go P2P or fail).
- The sole identity authority for active sessions (P2P sessions verify attestations locally via cached mesh state).

### Two roles per mesh on the WS layer (Codex-1 correction, kept)

Within the broker's WS surface, the daemon holds two roles per mesh, not one connection per launch:

- **Control-plane connection** — one per mesh, member-keyed. Carries: signaling + outbox drain + RPCs + broadcast/member-targeted inbound + revocation gossip subscription.
- **Session connections** — N per mesh, session-keyed. Carries: presence row keyed on session pubkey + signaling for P2P upgrades involving this session + inbound for session-targeted DMs that arrive via broker fallback.

A peer who's purely on the broker (no P2P) functions exactly as today. A peer who upgrades to P2P with another peer keeps its broker WS for the other roles.

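The inbound split between the two roles can be written as a small routing function. Milestone 1 tightens it so that `*` broadcasts and member-targeted DMs go to the control-plane connection, and session-targeted DMs go only to that session's connection. This is a sketch. `InboundPush`, `routeInbound`, and sending group/topic traffic to the control-plane connection are assumptions. Routing `system` events there follows from the control plane owning the revocation-gossip subscription.

```typescript
type InboundRoute =
  | { role: "control-plane" }
  | { role: "session"; sessionPubkey: string }
  | { role: "drop"; reason: string };

interface InboundPush {
  channel: "dm" | "group" | "topic" | "rpc" | "system" | "stream";
  target: string; // same meaning as Envelope.target; "*" for broadcasts
}

// Decide which of the daemon's connections for one mesh should surface an inbound push.
function routeInbound(push: InboundPush, memberPubkey: string, hostedSessions: ReadonlySet<string>): InboundRoute {
  if (push.target === "*") return { role: "control-plane" }; // broadcasts
  if (push.channel === "dm" || push.channel === "rpc" || push.channel === "stream") {
    if (push.target === memberPubkey) return { role: "control-plane" }; // member-targeted
    if (hostedSessions.has(push.target)) return { role: "session", sessionPubkey: push.target };
    return { role: "drop", reason: "not addressed to this member or its sessions" };
  }
  // system events (incl. revocation gossip) ride the control-plane connection;
  // routing group/topic there too is an assumption of this sketch.
  return { role: "control-plane" };
}
```
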
---

## Layer 3: Data plane (hybrid P2P + broker fallback)

The data plane is what carries actual message bodies. Three modes, selected per (sender, recipient, channel) tuple:

### Mode 1: Direct P2P (preferred when possible)

Two peers run a WebRTC data channel (or QUIC stream — pluggable, see Layer 4) between their daemons. Established via signaling through the broker; once up, broker is out of the path.

**When P2P is selected:**
- Both peers are online (have an active broker WS).
- Both peers' transports advertise P2P capability (WebRTC available; not a webhook-only service identity; not a browser without `RTCPeerConnection`).
- ICE negotiation succeeds (at least one candidate pair works: host, server-reflexive, or peer-reflexive).
- Channel type is `dm`, `rpc`, or `stream` (the 1:1 cases).

**P2P session lifecycle:**
- Established lazily on first message (warm-up cost ~200ms, dominated by the ICE + DTLS handshake). Subsequent messages reuse the channel.
- Idle timeout: 5min of no traffic → tear down. Re-established on next message.
- Hard timeout: 1h max regardless of activity, then re-handshake. Limits the damage of compromised session keys.
- Either side can demote to broker relay at any time; the broker is always the fallback.

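The lifecycle rules reduce to two clocks per peer: time since the last traffic (5 min idle limit) and time since the session opened (1 h hard cap). Below is a sketch of that bookkeeping. It assumes the caller owns the actual data channel and passes `now` in explicitly so the policy is testable. `P2pSessionTable` and its methods are hypothetical names.

```typescript
const IDLE_TIMEOUT_MS = 5 * 60 * 1000;  // 5 min of no traffic
const HARD_TIMEOUT_MS = 60 * 60 * 1000; // 1 h regardless of activity

interface P2pEntry {
  openedAt: number;
  lastActivityAt: number;
}

// Bookkeeping for warm P2P sessions, keyed by remote pubkey. The caller owns the
// actual RTCDataChannel and tears it down whenever an entry disappears.
class P2pSessionTable {
  private entries = new Map<string, P2pEntry>();

  opened(peer: string, now: number): void {
    this.entries.set(peer, { openedAt: now, lastActivityAt: now });
  }

  // Record traffic in either direction.
  touch(peer: string, now: number): void {
    const e = this.entries.get(peer);
    if (e) e.lastActivityAt = now;
  }

  // Warm-path check for transport selection. Expired entries are evicted,
  // so the next send performs a fresh handshake.
  hasActive(peer: string, now: number): boolean {
    const e = this.entries.get(peer);
    if (!e) return false;
    const idle = now - e.lastActivityAt >= IDLE_TIMEOUT_MS;
    const tooOld = now - e.openedAt >= HARD_TIMEOUT_MS;
    if (idle || tooOld) {
      this.entries.delete(peer);
      return false;
    }
    return true;
  }

  // Either side may demote to broker relay at any time.
  demote(peer: string): void {
    this.entries.delete(peer);
  }
}
```
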
**Crypto on P2P:**
- DTLS handshake provides transport encryption (forward secrecy; recipient pubkey verified via cached attestation chain).
- Application-layer crypto_box ALSO runs on top — same as broker-relayed messages — so the wire format and decryption path are identical on the receiver side. Defense in depth, no special-case code.

### Mode 2: Broker-relayed (fallback)

The current path. Sender encrypts to recipient pubkey (member or session or service), pushes to broker via WS, broker queues, recipient pulls (or broker pushes to recipient's WS).

**When broker-relay is selected:**
- One peer offline → broker queues, delivers on reconnect.
- ICE negotiation fails → broker becomes the relay.
- Channel type is `group`, `topic`, or `broadcast` → broker fan-out is structurally cheaper than P2P fan-out for any group >2.
- Service identity at either end can't run P2P → broker is the only path.

**Crypto:** unchanged from today — E2E crypto_box, broker sees ciphertext only.

### Mode 3: Direct webhook (broker as broker, not as relay)

For service identities advertising `transport_hint: "http-webhook"`. Sender encrypts to service's `service_pubkey` (or to delegate-daemon's pubkey for delegated services), broker POSTs the ciphertext to the service's registered URL with HMAC signature + retry. No long-lived connection on the service side.

This is functionally a "broker queue, custom delivery transport" — broker still mediates, but delivery is HTTP not WS.

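The HMAC on webhook deliveries can be sketched with `node:crypto`. This spec fixes none of the following assumptions. The first is a per-service shared secret established at registration. The second is HMAC-SHA256 over `"<timestamp>.<body>"`, where the timestamp bounds replay in the Stripe style. The third is a 5-minute tolerance window. The function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

const TOLERANCE_MS = 5 * 60 * 1000; // assumed replay window

// Broker side: the signature travels with the POST (e.g. in a header) alongside the timestamp.
function signWebhook(secret: string, body: string, timestampMs: number): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${body}`).digest("hex");
}

// Receiver side: reject stale timestamps, then compare MACs in constant time.
function verifyWebhook(secret: string, body: string, timestampMs: number, signatureHex: string, nowMs: number): boolean {
  if (Math.abs(nowMs - timestampMs) > TOLERANCE_MS) return false;
  const expected = Buffer.from(signWebhook(secret, body, timestampMs), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on unequal lengths, so check that first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```
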
### Selection logic (deterministic, sender-side)

```typescript
// Route labels correspond to the modes above; "broker.webhook" is Mode 3.
type Route = "p2p.session" | "broker.relay" | "broker.queue" | "broker.webhook";

interface PeerView {
  online: boolean;                // has an active broker WS
  transportHint?: string;         // e.g. "ws" | "http-webhook"
  capabilities: { p2p: boolean }; // advertised at hello, returned by list_peers
}

interface P2pSessions {
  hasActive(sender: PeerView, recipient: PeerView): boolean;
  handshake(sender: PeerView, recipient: PeerView, timeoutMs: number): Promise<boolean>;
}

async function pickTransport(sender: PeerView, recipient: PeerView, channel: string, p2p: P2pSessions): Promise<Route> {
  if (channel === "group" || channel === "topic" || channel === "broadcast") {
    return "broker.relay"; // fan-out semantics
  }
  if (recipient.transportHint === "http-webhook") return "broker.webhook"; // broker calls webhook
  if (!recipient.online) return "broker.queue";                            // store-and-forward
  if (!recipient.capabilities.p2p) return "broker.relay";                   // one end can't P2P
  if (!sender.capabilities.p2p) return "broker.relay";                      // we can't P2P
  if (p2p.hasActive(sender, recipient)) return "p2p.session";               // warm path

  // Cold path: attempt a handshake; on failure fall through to the broker and log as degraded.
  const ok = await p2p.handshake(sender, recipient, 2_000);
  return ok ? "p2p.session" : "broker.relay";
}
```

Policy lives in the daemon's send path. Broker doesn't know or care — it sees only the messages that actually go through it.

---

## Layer 4: Transport adapters (pluggable)

A transport adapter is an implementation of how *one peer pair* moves bytes. Defined by an interface; new adapters added without touching upper layers.

```typescript
interface PeerTransport {
  readonly kind: string; // "ws-broker" | "webrtc-p2p" | "http-webhook" | ...

  readonly capabilities: {
    p2p: boolean;
    bidirectional: boolean;
    midTurnPush: boolean;
    maxMessageBytes: number;
    streamingChunks: boolean;
  };

  open(opts: TransportOpenOpts): Promise<TransportSession>;
  send(envelope: Envelope): Promise<TransportSendResult>;
  inbound(): AsyncIterable<Envelope>;
  heartbeat(): Promise<boolean>;
  close(reason?: string): Promise<void>;
}
```

### Concrete adapters at end-state

1. **`WsBrokerTransport`** — current code. WebSocket to `wss://ic.claudemesh.com/ws`. Underpins both broker-relay (Mode 2) and signaling for P2P upgrades.
2. **`WebRtcP2pTransport`** — RTCPeerConnection + RTCDataChannel. Browser, Node (`node-datachannel` or similar), CLI all supported. Chunking handled at envelope layer for `stream` channel.
3. **`HttpWebhookTransport`** — outbound HTTP POST to broker `/v1/send`; inbound HTTP POST to a registered webhook URL. Unidirectional from peer's perspective. Mid-turn push: no.
4. **`LiveKitRoomTransport`** — for voice agents. Side-car bridges a LiveKit room to claudemesh. Maps a LiveKit participant → claudemesh service identity.

Future adapters TBD as concrete needs surface — no commitments here. (v1 listed MQTT/gRPC/SSE as future named adapters; v2 drops the named list per Codex-2 should-cut feedback.)

The peer's daemon advertises transport capabilities at hello time; broker stores them in the presence row; senders consult them via `list_peers` (capability fields added to the response).

---

## Layer 5: Channels (typed envelope)

Channels define **semantics**: what the message means, what crypto to apply, what delivery guarantees, what fan-out, what backpressure.

```typescript
type ChannelType =
  | "dm"     // 1:1 direct, encrypted to recipient pubkey, at-least-once with ack
  | "group"  // post to named group, per-recipient encrypt or symmetric, at-least-once with ack
  | "topic"  // pub/sub topic, persisted history, per-topic symmetric key, at-least-once with ack
  | "rpc"    // request/response with correlation id + timeout, exactly-once via dedupe
  | "system" // peer_joined / peer_left / topology / lifecycle / revocation (broker-originated)
  | "stream"; // long-lived ordered chunks, idempotent per (stream_id, chunk_id)

interface Envelope {
  v: 2;
  channel: ChannelType;
  /** Routing target — meaning depends on channel:
   *  dm:     recipient pubkey (member, session, or service)
   *  group:  group name (e.g. "@admins")
   *  topic:  topic id (e.g. "#abc123")
   *  rpc:    recipient pubkey
   *  system: ignored (sender-determined fan-out; broker fills in)
   *  stream: recipient pubkey (the stream_id is in meta.streamId — see below) */
  target: string;
  /** Sender identity pubkey (member, session, or service). */
  from: string;
  /** Encrypted payload. Channel + recipient determines crypto recipe:
   *  dm/rpc/stream: crypto_box to recipient pubkey
   *  group:  per-recipient seal (or symmetric in v3)
   *  topic:  per-topic symmetric key (v0.2.0 spec)
   *  system: broker-signed, plaintext metadata (event has no body) */
  body: { nonce: string; ciphertext: string; bodyVersion: number };
  /** Required in v2 (was optional in v1). Even minimal envelopes must carry
   *  clientMessageId for idempotent dedupe. */
  meta: {
    clientMessageId: string;           // REQUIRED — idempotency id (spec §4.2)
    requestFingerprint?: string;
    priority?: "now" | "next" | "low"; // dm: gates mid-turn push; group/topic: fan-out priority
    timeoutMs?: number;                // rpc only
    streamId?: string;                 // REQUIRED for channel:"stream"; identifies the stream
    streamChunkId?: number;            // stream only; monotonic; receiver dedupes
    streamTerminator?: boolean;        // stream only; signals end
    rpcCorrelationId?: string;         // rpc only; back-edge for response
    rpcResponse?: boolean;             // rpc only; this is a response, not request
    replyToId?: string;                // dm/topic threading
    mentions?: string[];               // dm/topic; @-callouts
    expiresAt?: number;                // any; broker drops past this; default 7d for queued
  };
  /** Sender Ed25519 signature over canonical bytes. Verified by recipient
   *  (and by broker for system-message origin). */
  signature: string;
}
```

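The signer and the verifier must serialize the "canonical bytes" identically. The spec doesn't define the canonical form yet. This sketch assumes sorted-key JSON of every envelope field except `signature`, in the spirit of RFC 8785 (JCS) but not a conforming implementation; for example, it doesn't normalize number formatting. It uses Node's built-in Ed25519. Resolving `from` to a `KeyObject` is left to the caller.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Deterministic JSON: object keys sorted recursively, undefined fields omitted.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${canonicalize(v)}`).join(",")}}`;
  }
  return JSON.stringify(value) ?? "null"; // undefined inside arrays becomes null, as in JSON.stringify
}

// Everything except `signature` is covered: routing, meta, and the body ciphertext.
function signEnvelope(unsigned: Record<string, unknown>, senderKey: KeyObject): string {
  return sign(null, Buffer.from(canonicalize(unsigned)), senderKey).toString("base64");
}

function verifyEnvelope(envelope: Record<string, unknown> & { signature: string }, senderPub: KeyObject): boolean {
  const { signature, ...unsigned } = envelope;
  return verify(null, Buffer.from(canonicalize(unsigned)), senderPub, Buffer.from(signature, "base64"));
}

// Example: sign a v2 dm envelope.
const sender = generateKeyPairSync("ed25519");
const draft = {
  v: 2,
  channel: "dm",
  target: "recipient-pubkey",
  from: "sender-pubkey",
  body: { nonce: "bm9uY2U=", ciphertext: "Y2lwaGVy", bodyVersion: 1 },
  meta: { clientMessageId: "example-id-1" },
};
const envelope = { ...draft, signature: signEnvelope(draft, sender.privateKey) };
```
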
### Stream concurrency

For `channel: "stream"`, **`meta.streamId` is required**. Two concurrent streams to the same recipient pubkey use distinct streamIds; receiver demuxes by `(from, streamId)`. Without this, multi-stream voice transcripts or file transfers from the same peer would collide.

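Here is a receiver-side sketch of the `(from, streamId)` demux with per-chunk dedupe and in-order release. It goes beyond the spec on a few points, all assumptions. Chunk ids are taken to start at 0 and be contiguous. Out-of-order chunks are buffered without a bound. A finished stream's state is kept so late duplicates are still dropped. `StreamDemux` is a hypothetical name.

```typescript
interface StreamChunk {
  from: string;         // sender pubkey
  streamId: string;     // meta.streamId
  chunkId: number;      // meta.streamChunkId
  data: string;
  terminator?: boolean; // meta.streamTerminator
}

interface StreamState {
  next: number;                       // next chunk id to release
  pending: Map<number, StreamChunk>;  // out-of-order chunks waiting for a gap to fill
  done: boolean;                      // terminator already released
}

class StreamDemux {
  private streams = new Map<string, StreamState>();

  // Accepts a chunk (possibly duplicated or out of order) and returns the chunks
  // that can now be delivered, in order, for that (from, streamId) pair.
  push(c: StreamChunk): StreamChunk[] {
    const key = JSON.stringify([c.from, c.streamId]);
    let s = this.streams.get(key);
    if (!s) {
      s = { next: 0, pending: new Map(), done: false };
      this.streams.set(key, s);
    }
    // Duplicate, already-delivered, or post-terminator chunk: idempotent drop.
    if (s.done || c.chunkId < s.next || s.pending.has(c.chunkId)) return [];
    s.pending.set(c.chunkId, c);

    const ready: StreamChunk[] = [];
    let chunk = s.pending.get(s.next);
    while (chunk) {
      s.pending.delete(s.next);
      s.next++;
      ready.push(chunk);
      if (chunk.terminator) {
        s.done = true;
        s.pending.clear();
        break;
      }
      chunk = s.pending.get(s.next);
    }
    return ready;
  }
}
```
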
### Crypto by channel

- `dm`, `rpc`, `stream` → crypto_box(plaintext, recipient_pubkey, sender_secretkey). The sender verifies the recipient's attestation chain before encrypting, so recipient_pubkey is known to be a valid identity rooted in a current member; the receiver does the same for the sender's pubkey on receipt. crypto_box itself operates on X25519 keys, so the Ed25519 identity keys are converted first (as libsodium's `crypto_sign_ed25519_pk_to_curve25519` / `crypto_sign_ed25519_sk_to_curve25519` do).
- `group` → for now: per-recipient crypto_box (sender encrypts N times, broker fans out). Future: hybrid Curve25519 → AES-GCM with sender key wrap, like Signal Sender Keys.
- `topic` → per-topic symmetric key (already in v0.2.0 spec). Key rotation = new topic + members re-subscribe. Keys distributed via DM at join time, encrypted to each member's pubkey.
- `system` → broker is the signer; receivers verify against the broker's published Ed25519 pubkey. Plaintext bodies allowed since these are operational events.

### Delivery semantics (Codex-2 correction applied)

**At-least-once requires receiver ack.** Today's broker sets `delivered_at = NOW()` inside the claim CTE before the WS push succeeds, which is at-most-once with no retry. The end-state behavior:

1. Sender's daemon writes to outbox (durable).
2. Drain worker sends to broker; broker acks with `client_message_id` echo (this is sender → broker delivery ack, NOT end-to-end).
3. Broker queues with `claimed_at` NULL, `delivered_at` NULL.
4. On recipient hello / push opportunity: broker claims by setting `claimed_at = NOW(), claim_id = <presenceId>` (lease 30s).
5. Broker `sendToPeer` writes to WS / P2P / webhook.
6. Receiver processes envelope and emits `client_ack { clientMessageId }` back to broker.
7. Broker sets `delivered_at = NOW()` ON ACK RECEIPT.
8. If the lease expires without an ack → the row becomes eligible to be claimed and re-delivered.
9. Receiver dedupes by `clientMessageId` (idempotent insert into inbox).

Until the ack path is wired, the accurate label for this transitional state is **best-effort retry with idempotent dedupe**, not at-least-once. The outbox + claim/lease + dedupe combination becomes at-least-once once the ack path is in place.

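Steps 3 to 9 can be modelled in memory to show the key change: claiming and marking delivered are separate, and only a receiver ack sets `delivered_at`. This is a sketch only; the real queue is a broker DB table. The names `DeliveryQueue` and `Inbox` are assumptions. So are keying rows by `(recipient, clientMessageId)` and treating `clientMessageId` as globally unique in the inbox.

```typescript
const LEASE_MS = 30_000; // claim lease from step 4

interface QueuedRow {
  recipient: string;
  clientMessageId: string;
  ciphertext: string;
  claimedAt: number | null;
  claimId: string | null; // presenceId of the current claimer
  claimExpiresAt: number | null;
  deliveredAt: number | null;
}

class DeliveryQueue {
  private rows = new Map<string, QueuedRow>();

  private static key(recipient: string, clientMessageId: string): string {
    return JSON.stringify([recipient, clientMessageId]);
  }

  // Step 3: queue with claim and delivery unset. Re-sends of the same id are no-ops.
  enqueue(recipient: string, clientMessageId: string, ciphertext: string): void {
    const k = DeliveryQueue.key(recipient, clientMessageId);
    if (this.rows.has(k)) return;
    this.rows.set(k, { recipient, clientMessageId, ciphertext, claimedAt: null, claimId: null, claimExpiresAt: null, deliveredAt: null });
  }

  // Steps 4 and 8: claim undelivered rows that are unclaimed or whose lease expired.
  // delivered_at is deliberately NOT set here.
  claim(recipient: string, presenceId: string, now: number): QueuedRow[] {
    const claimed: QueuedRow[] = [];
    for (const row of this.rows.values()) {
      if (row.recipient !== recipient || row.deliveredAt !== null) continue;
      if (row.claimExpiresAt !== null && row.claimExpiresAt > now) continue; // lease still held
      row.claimedAt = now;
      row.claimId = presenceId;
      row.claimExpiresAt = now + LEASE_MS;
      claimed.push(row);
    }
    return claimed;
  }

  // Step 7: the receiver's client_ack is the only thing that marks a row delivered.
  ack(recipient: string, clientMessageId: string, now: number): boolean {
    const row = this.rows.get(DeliveryQueue.key(recipient, clientMessageId));
    if (!row || row.deliveredAt !== null) return false;
    row.deliveredAt = now;
    return true;
  }
}

// Step 9: receiver-side idempotent insert, so redelivery after a lost ack is harmless.
class Inbox {
  private seen = new Set<string>();
  readonly items: string[] = [];

  insert(clientMessageId: string, plaintext: string): boolean {
    if (this.seen.has(clientMessageId)) return false;
    this.seen.add(clientMessageId);
    this.items.push(plaintext);
    return true;
  }
}
```
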
`rpc` exactly-once is the same path with the addition that the response carries the `rpcCorrelationId`; sender retries the request until response received OR `timeoutMs` elapses; receiver-side dedupe ensures the handler runs at most once.

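On the receiver, "handler runs at most once" means remembering each request's result so that a retried request gets the cached response instead of a second execution. This sketch assumes retries reuse the request's `clientMessageId` and that the handler is synchronous. A real daemon would cache the pending promise of an async handler and expire entries after `timeoutMs`. `RpcResponder` is a hypothetical name.

```typescript
type RpcHandler = (method: string, args: unknown) => unknown;

class RpcResponder {
  private readonly handler: RpcHandler;
  private results = new Map<string, unknown>(); // unbounded here; real code would expire entries

  constructor(handler: RpcHandler) {
    this.handler = handler;
  }

  // Returns the response for a request, running the handler only for the first copy.
  handle(from: string, clientMessageId: string, method: string, args: unknown): unknown {
    const key = JSON.stringify([from, clientMessageId]);
    if (this.results.has(key)) return this.results.get(key); // retry: replay cached response
    const result = this.handler(method, args);
    this.results.set(key, result);
    return result;
  }
}
```
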
### Mid-turn push

When a `channel: "dm"` envelope carries `meta.priority: "now"` and the recipient is a launched Claude Code session, the recipient's daemon emits a `claude/channel` MCP push and the session's Claude Code reads it mid-turn. Other priorities deliver via `claudemesh inbox` poll or at the next tool boundary.

### Reply threading + mentions

Uniform across `dm` and `topic`: `meta.replyToId` references the original message's `clientMessageId`. `meta.mentions` is an array of pubkeys (or `@<group>`) — UI/CLI surfaces them; broker doesn't enforce.

---

## Layer 6: Mesh state — broker authority + signed gossip

The mesh state (members, groups, topics, services, revocations, policies) needs both:

- **Authority** — single source of truth. The broker DB. Mutations (add member, revoke, change policy) go through broker, signed by mesh owner / admin.
- **Replication** — every peer needs a current-enough copy to authorize incoming P2P messages locally (otherwise revoke can't be enforced when peers chat directly).

End-state: broker publishes signed mesh-state-update events on the `system` channel; peers cache and apply. Conflict resolution is trivial because broker is authority — peers merge updates by version vector. Eventually consistent in seconds, not the open-ended convergence of CRDT-only systems.

For peer revocation specifically: revocation gossip is highest priority and must propagate within 30s to all online peers. Offline peers see it on reconnect.

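Because the broker is the only writer, this sketch collapses the version vector to a single per-mesh version number, which is a one-entry vector. That makes stale and missing updates easy to detect. The event kinds, the `version` field, and the resync-on-gap behaviour are all hypothetical. Broker-signature verification is assumed to happen before `apply`. `isTrusted` covers member keys only; session and service keys would additionally be checked through their attestation chains.

```typescript
// Hypothetical mesh-state events; the broker numbers them per mesh.
interface MeshStateUpdate {
  version: number;
  kind: "member_added" | "member_removed" | "revoked";
  pubkey: string;
}

type ApplyResult = "applied" | "stale" | "gap";

class MeshStateCache {
  version = 0;
  readonly members = new Set<string>();
  readonly revoked = new Set<string>();

  // Assumes the broker's signature on `u` was verified before this call.
  apply(u: MeshStateUpdate): ApplyResult {
    if (u.version <= this.version) return "stale";  // duplicate or replayed event
    if (u.version > this.version + 1) return "gap"; // missed events: fetch a snapshot from the broker
    if (u.kind === "member_added") this.members.add(u.pubkey);
    else if (u.kind === "member_removed") this.members.delete(u.pubkey);
    else this.revoked.add(u.pubkey);
    this.version = u.version;
    return "applied";
  }

  // Per-message check for P2P sessions: two set lookups.
  isTrusted(memberPubkey: string): boolean {
    return this.members.has(memberPubkey) && !this.revoked.has(memberPubkey);
  }
}
```
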
---

## Crypto — what doesn't change vs what does

### Doesn't change
- Per-peer Ed25519 keypairs (member + session + service).
- crypto_box (Curve25519 + XSalsa20 + Poly1305) for DMs/RPC/stream.
- Parent-attestation flow for sessions and services.

### Does change (additive)
- DTLS layer underneath WebRTC P2P (transport-level encryption for fingerprint binding).
- Per-topic symmetric keys (v0.2.0 baseline; v2 makes it a hard requirement for topics).
- Broker signing key for `system` channel events (single Ed25519 keypair the broker holds; pubkey published in mesh state).
- Service identity attestations carry `service_pubkey` + `scopes`.
- Forward-secrecy for long-lived P2P sessions: post-handshake, derive a fresh symmetric key per session epoch (1h max); rotate.

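One way to get the per-epoch rotation is a one-way hash ratchet. Each epoch key is derived from the previous one with HKDF, and the previous key is then discarded. Compromising the current key therefore doesn't expose earlier epochs. This is a sketch under assumptions: HKDF-SHA256, the info labels, and seeding from a shared handshake secret are hypothetical choices. Both peers must also rotate in lockstep, for example by carrying the epoch number with each message. A ratchet alone gives forward secrecy but not recovery after a compromise; recovery would need a fresh DH exchange per epoch.

```typescript
import { hkdfSync, randomBytes } from "node:crypto";

// One HKDF-SHA256 step; the epoch number is bound into the info label.
function deriveEpochKey(ikm: Buffer, epoch: number): Buffer {
  return Buffer.from(hkdfSync("sha256", ikm, Buffer.alloc(0), `claudemesh-p2p-epoch-${epoch}`, 32));
}

class EpochKeyRatchet {
  private key: Buffer;
  private epochNo = 0;

  constructor(handshakeSecret: Buffer) {
    this.key = deriveEpochKey(handshakeSecret, 0);
  }

  get epoch(): number {
    return this.epochNo;
  }

  // Returns a copy, so callers never hold the buffer that rotate() wipes.
  currentKey(): Buffer {
    return Buffer.from(this.key);
  }

  // Called at each epoch boundary (at most 1h per the spec).
  rotate(): void {
    const next = deriveEpochKey(this.key, this.epochNo + 1);
    this.key.fill(0); // best-effort wipe of the previous epoch's key
    this.key = next;
    this.epochNo += 1;
  }
}

const ratchet = new EpochKeyRatchet(randomBytes(32));
```
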
---

## Migration order (architectural milestones, NO time estimates)

The end-state above doesn't ship in one PR. The following ordering minimizes regression risk and lets each milestone be useful on its own. **No weeks/sprints attached** — work proceeds when the prior milestone is stable.

### Milestone 1 — Foundational correctness
*Required before anything else. Without this, every later milestone inherits the bugs.*

- Extract `connectWsWithBackoff` helper. Refactor `DaemonBrokerClient` and `SessionBrokerClient` to use it. Eliminates the drift bug class.
- Drop daemon's stray `sessionPubkey` field (or rename + document).
- Tighten daemon-WS inbound filter — `*` broadcasts and member-targeted DMs only; session-targeted DMs land on session WS exclusively.
- Add `presence.role` column at broker (`control-plane | session | service`); list_peers + fan-out + reconnect honor it.
- **Fix broker drain race** — schema migration adds `claimed_at`, `claim_id`, `claim_expires_at` columns. Rewrite `drainForMember` for two-phase claim/deliver. Re-claim if `claimed_at` older than lease (30s).
- Receiver-side `client_ack` for at-least-once with ack (Codex-2 correction). Without ack wiring this stays at "best-effort retry with idempotent dedupe."
- Receiver-side dedupe: finish the idempotent insert on `clientMessageId`, and make `clientMessageId` required for v2 envelopes.

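A shared helper is what keeps the daemon and session clients from drifting apart. The sketch below shows what `connectWsWithBackoff` could factor out, assuming exponential backoff with full jitter. The base and cap values and the names `backoffDelayMs` / `connectWithBackoff` are hypothetical. `connect` stands in for whatever opens the WebSocket and resolves once it is open.

```typescript
const BASE_DELAY_MS = 500;   // assumed
const MAX_DELAY_MS = 30_000; // assumed

// Full jitter: uniform in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, rand: () => number = Math.random): number {
  const ceiling = Math.min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

// Shared retry loop both broker clients could call. `connect` resolves once the socket
// is open and rejects on failure; `sleep` is injectable for tests.
async function connectWithBackoff<T>(
  connect: () => Promise<T>,
  maxAttempts: number,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```
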
### Milestone 2 — Capability advertisement + transport abstraction
*Sets up the interface. No new transport yet.*

- Define `PeerTransport` interface; refactor existing WS code to be the first implementation. No behavioral change.
- Add capabilities field to hello payload + presence row + `list_peers` response.
- Define `Envelope v2` schema with `meta` required + `streamId` requirement on `stream` channel. Broker accepts both v1 and v2 (v1 auto-upgraded server-side by inferring `channel` from `targetSpec` shape). Senders start emitting v2.

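Server-side auto-upgrade of v1 envelopes might look like the following sketch. Apart from `targetSpec` and the optional `meta`, the v1 field layout is hypothetical. The `@` and `#` prefixes follow the examples in the Envelope comment, and pubkey-shaped targets default to `dm`. `*` is left unresolved because this spec doesn't name a v2 channel for broadcasts. If a v1 sender omitted `clientMessageId`, the sketch derives one from the signature so that identical retries still dedupe. The carried-over signature still covers the v1 bytes, so verification has to run against the v1 form.

```typescript
import { createHash } from "node:crypto";

type ChannelType = "dm" | "group" | "topic" | "rpc" | "system" | "stream";

interface EnvelopeBody { nonce: string; ciphertext: string; bodyVersion: number }

// Hypothetical v1 layout: only `targetSpec` and the optional `meta` come from this spec.
interface EnvelopeV1 {
  targetSpec: string;
  from: string;
  body: EnvelopeBody;
  meta?: { clientMessageId?: string; [k: string]: unknown };
  signature: string;
}

interface UpgradedEnvelope {
  v: 2;
  channel: ChannelType;
  target: string;
  from: string;
  body: EnvelopeBody;
  meta: { clientMessageId: string; [k: string]: unknown };
  signature: string; // still the v1 signature over v1 bytes
}

// Infer the v2 channel from the shape of a v1 targetSpec.
function inferChannel(targetSpec: string): ChannelType | null {
  if (targetSpec === "*") return null;            // broadcast: no v2 channel named in the spec yet
  if (targetSpec.startsWith("@")) return "group"; // e.g. "@admins"
  if (targetSpec.startsWith("#")) return "topic"; // e.g. "#abc123"
  return "dm";                                    // otherwise assume a recipient pubkey
}

function upgradeV1(e: EnvelopeV1): UpgradedEnvelope | null {
  const channel = inferChannel(e.targetSpec);
  if (channel === null) return null; // caller keeps handling it on the v1 path
  // Deterministic fallback id so identical v1 retries still dedupe.
  const clientMessageId =
    e.meta?.clientMessageId ?? createHash("sha256").update(e.signature).digest("hex").slice(0, 32);
  return {
    v: 2,
    channel,
    target: e.targetSpec,
    from: e.from,
    body: e.body,
    meta: { ...e.meta, clientMessageId },
    signature: e.signature,
  };
}
```
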
### Milestone 3 — Service identity + HTTP webhook transport
*First non-WS transport. Validates the abstraction. Includes revocation.*

- Service identity registration: `claudemesh service register --type webhook --pubkey <hex> --scopes ...` mints an attestation and stores it broker-side. The service pubkey is explicit in the attestation.
- Service revocation: `claudemesh service revoke <service_id>` writes to the broker denylist, closes any active connections, and publishes a `system` revocation event.
- Add `HttpWebhookTransport` (broker-side outbound: POST with HMAC + retry; daemon-side inbound: an HTTP server receives webhook callbacks → `handleBrokerPush`).
- Add a `/v1/send` HTTP POST endpoint on the broker (today the broker is WS-only for sends).
- Demo: a cron job using only `curl` posts to the mesh; a webhook subscriber receives it.
- (`SseTransport` deferred per Codex-2 should-cut feedback. Pull it in when a concrete browser need arises.)

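This sketch shows roughly what broker-side HMAC signing for `HttpWebhookTransport` could look like, using Node's `node:crypto`. Three details are assumptions borrowed from common webhook practice, not decisions this spec makes:

- the signed string, `<timestamp>.<body>`;
- the five-minute replay window;
- a shared secret per subscriber.

Verification compares the two signatures in constant time with `timingSafeEqual`.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sketch only: signed-string format, tolerance, and per-subscriber secret are assumptions.
const TOLERANCE_MS = 5 * 60 * 1000;

function signWebhook(secret: string, body: string, timestampMs: number): string {
  return createHmac("sha256", secret).update(`${timestampMs}.${body}`).digest("hex");
}

function verifyWebhook(
  secret: string,
  body: string,
  timestampMs: number,
  signatureHex: string,
  nowMs: number = Date.now(),
): boolean {
  // Reject stale or future-dated deliveries to limit replay.
  if (Math.abs(nowMs - timestampMs) > TOLERANCE_MS) return false;
  const expected = Buffer.from(signWebhook(secret, body, timestampMs), "hex");
  const given = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && timingSafeEqual(given, expected);
}
```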
### Milestone 4 — Typed channels: rpc, stream, system
*The channel layer becomes real.*

- `channel: "rpc"` end-to-end: correlation-id routing through any transport, response timeout, and a `claudemesh rpc <peer> <method> <args>` CLI verb.
- `channel: "stream"` end-to-end: chunked, ordered, and idempotent delivery; multi-stream demux via `meta.streamId`; a `claudemesh stream <peer> <stream-id>` CLI verb.
- `channel: "system"` formalized (broker-signed events for peer_joined, peer_left, topology, revocation, and mesh-state updates).

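Correlation-id routing with a response timeout, as `channel: "rpc"` requires, comes down to a table of pending requests. The sketch below is hypothetical: `RpcClient` and its `send` callback are illustrative names, and the transport is abstracted away. The v1 draft envelope names the back-edge field `meta.rpcCorrelationId`; here it is simply `correlationId`. A response that arrives after its request has timed out is dropped.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical caller-side RPC table keyed by correlation id.
interface RpcRequest {
  correlationId: string;
  method: string;
  args: unknown;
}

interface RpcResponse {
  correlationId: string;
  result?: unknown;
  error?: string;
}

interface Pending {
  resolve: (v: unknown) => void;
  reject: (e: Error) => void;
  timer: ReturnType<typeof setTimeout>;
}

class RpcClient {
  private readonly pending = new Map<string, Pending>();
  private readonly send: (req: RpcRequest) => void;

  constructor(send: (req: RpcRequest) => void) {
    this.send = send;
  }

  call(method: string, args: unknown, timeoutMs: number): Promise<unknown> {
    const correlationId = randomUUID();
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(correlationId);
        reject(new Error(`rpc_timeout after ${timeoutMs}ms`));
      }, timeoutMs);
      this.pending.set(correlationId, { resolve, reject, timer });
      this.send({ correlationId, method, args });
    });
  }

  /** Called by the inbound path for every rpc-channel response envelope. */
  handleResponse(res: RpcResponse): void {
    const entry = this.pending.get(res.correlationId);
    if (!entry) return; // late or duplicate response: already settled or timed out
    clearTimeout(entry.timer);
    this.pending.delete(res.correlationId);
    if (res.error !== undefined) entry.reject(new Error(res.error));
    else entry.resolve(res.result);
  }
}
```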
### Milestone 5 — P2P data plane (WebRTC adapter)
*The big architectural shift. The broker becomes a coordinator, not the data path.*

- Add a `WebRtcP2pTransport` adapter. It uses `node-datachannel` (or another libdatachannel binding) on Node and native WebRTC in the browser.
- Add a signaling protocol over the existing broker WS:
  - `p2p_offer` (sender → broker → recipient): SDP offer + ICE candidates.
  - `p2p_answer` (recipient → broker → sender): SDP answer + ICE candidates.
  - `p2p_candidate` (either direction): trickle ICE candidates.
  - All signaling messages are broker-attested (only valid sender/recipient pairs).
- Add a `pickTransport()` policy in the daemon send path.
- Add a P2P session manager: warm cache, idle timeout, hard timeout, demote-to-broker on failure.
- Tag broker-relayed messages that *could have* gone P2P with a metric, so the degradation rate is observable.

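The sketch below shows one possible shape for the daemon's `pickTransport()` policy and the P2P session manager: a warm cache, an idle timeout, a hard timeout, and demotion to the broker on failure. Everything in it is an assumption for illustration:

- the timeout values;
- the `SendIntent` fields;
- the rule that group, topic and system traffic always stays on the broker. This rule follows from the broker keeping group fan-out and mesh authority.

When no warm session exists, the policy answers `"broker"`. Establishing a fresh P2P session would happen in the background.

```typescript
// Sketch of a daemon-side pickTransport() policy plus a small P2P session cache.
// Timeouts, field names, and policy order are illustrative assumptions.
type TransportKind = "p2p" | "broker";

interface P2pSession {
  peer: string;
  openedAt: number;
  lastUsedAt: number;
  healthy: boolean;
}

const IDLE_TIMEOUT_MS = 60_000;
const HARD_TIMEOUT_MS = 30 * 60_000;

class P2pSessionManager {
  private readonly sessions = new Map<string, P2pSession>();

  open(peer: string, now: number): void {
    this.sessions.set(peer, { peer, openedAt: now, lastUsedAt: now, healthy: true });
  }

  /** Mark a session failed so the next send demotes to the broker. */
  fail(peer: string): void {
    const s = this.sessions.get(peer);
    if (s) s.healthy = false;
  }

  /** Returns a usable session, evicting it if idle, past its hard timeout, or unhealthy. */
  usable(peer: string, now: number): P2pSession | undefined {
    const s = this.sessions.get(peer);
    if (!s) return undefined;
    const idle = now - s.lastUsedAt > IDLE_TIMEOUT_MS;
    const expired = now - s.openedAt > HARD_TIMEOUT_MS;
    if (!s.healthy || idle || expired) {
      this.sessions.delete(peer);
      return undefined;
    }
    s.lastUsedAt = now;
    return s;
  }
}

interface SendIntent {
  channel: "dm" | "group" | "topic" | "rpc" | "system" | "stream";
  target: string;
  recipientOnline: boolean;
  recipientSupportsP2p: boolean;
}

function pickTransport(intent: SendIntent, mgr: P2pSessionManager, now: number): TransportKind {
  // Fan-out channels and system events stay on the broker.
  if (intent.channel === "group" || intent.channel === "topic" || intent.channel === "system") {
    return "broker";
  }
  // Offline recipients need the broker's queue.
  if (!intent.recipientOnline || !intent.recipientSupportsP2p) return "broker";
  return mgr.usable(intent.target, now) ? "p2p" : "broker";
}
```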
### Milestone 6 — Mesh state replication + revocation gossip
*Required before P2P is safe at scale.*

- The broker publishes signed `system` events for all mesh-state mutations.
- Peers subscribe, then cache and apply them.
- Revocation propagation latency target: <30s for online peers.
- P2P sessions verify peer identity against cached state on every message (cheap: a single map lookup).

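This sketch shows how a peer might cache broker-signed `system` events and run the cheap per-message identity check described above. It assumes three things, none of which the spec fixes:

- the broker signs with Ed25519 (via `node:crypto`);
- each mesh has a strictly increasing `seq`;
- events use a simplified shape.

A gap or a replayed event is rejected, so the peer can fall back to a full resync from the broker.

```typescript
import { generateKeyPairSync, sign, verify, type KeyObject } from "node:crypto";

// Sketch: event shape, seq numbering, and Ed25519 signing are assumptions.
interface SystemEvent {
  seq: number;
  kind: "peer_joined" | "peer_left" | "revocation";
  pubkey: string;
}

interface SignedSystemEvent {
  event: SystemEvent;
  signature: Buffer;
}

function encodeEvent(e: SystemEvent): Buffer {
  return Buffer.from(JSON.stringify([e.seq, e.kind, e.pubkey]));
}

/** Broker side: sign an event with the broker's Ed25519 private key. */
function signEvent(e: SystemEvent, brokerKey: KeyObject): SignedSystemEvent {
  return { event: e, signature: sign(null, encodeEvent(e), brokerKey) };
}

class MeshStateCache {
  private lastSeq = 0;
  private readonly members = new Set<string>();
  private readonly revoked = new Set<string>();
  private readonly brokerPub: KeyObject;

  constructor(brokerPub: KeyObject) {
    this.brokerPub = brokerPub;
  }

  /** Applies an event only if the signature is valid and it is next in sequence. */
  apply(s: SignedSystemEvent): boolean {
    if (!verify(null, encodeEvent(s.event), this.brokerPub, s.signature)) return false;
    if (s.event.seq !== this.lastSeq + 1) return false; // gap or replay: resync from broker
    this.lastSeq = s.event.seq;
    if (s.event.kind === "peer_joined") this.members.add(s.event.pubkey);
    if (s.event.kind === "peer_left") this.members.delete(s.event.pubkey);
    if (s.event.kind === "revocation") {
      this.members.delete(s.event.pubkey);
      this.revoked.add(s.event.pubkey);
    }
    return true;
  }

  /** The per-message check a P2P session runs: two set lookups. */
  isTrusted(pubkey: string): boolean {
    return this.members.has(pubkey) && !this.revoked.has(pubkey);
  }
}
```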
### Milestone 7 — External integrations (proof points, parallel)
*One PoC per category, built opportunistically, to validate the architecture.*

- LiveKit side-car (validates LiveKit room transport).
- OpenAI Assistant (validates delegated-key crypto + webhook transport).
- WhatsApp / Slack bridge (validates human-bridge service identity).
- Browser SDK (validates browser as a peer; uses WebRTC adapter natively).

### Milestone 8 — Group/topic crypto upgrade
*Group fan-out crypto efficiency.*

- Sender Keys protocol for groups: the sender derives a group key, encrypts the content once, and encrypts the group key per recipient. This avoids encrypting the full message N times.
- Per-topic key rotation policy (member join → optional re-key; member leave → forced re-key).

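The fan-out saving comes from encrypting the message body once, under a fresh group key, and then wrapping only that 32-byte key for each recipient. This sketch illustrates the cost structure with `node:crypto` primitives swapped in for claudemesh's libsodium-based crypto: X25519 key agreement, HKDF-SHA256 and AES-256-GCM.

It is not the Signal Sender Keys ratchet. It also omits signatures, forward secrecy and key rotation. `sealForGroup` and `openFromGroup` are hypothetical names.

```typescript
import {
  createCipheriv, createDecipheriv, diffieHellman, generateKeyPairSync,
  hkdfSync, randomBytes, type KeyObject,
} from "node:crypto";

// Sketch: node:crypto stand-ins for libsodium, illustrating encrypt-once / wrap-per-recipient.
interface Sealed { iv: Buffer; tag: Buffer; data: Buffer }

function aesSeal(key: Buffer, plaintext: Buffer): Sealed {
  const iv = randomBytes(12);
  const c = createCipheriv("aes-256-gcm", key, iv);
  const data = Buffer.concat([c.update(plaintext), c.final()]);
  return { iv, tag: c.getAuthTag(), data };
}

function aesOpen(key: Buffer, s: Sealed): Buffer {
  const d = createDecipheriv("aes-256-gcm", key, s.iv);
  d.setAuthTag(s.tag);
  return Buffer.concat([d.update(s.data), d.final()]);
}

/** X25519 shared secret -> 32-byte wrapping key. Symmetric for sender and recipient. */
function wrapKey(myPriv: KeyObject, theirPub: KeyObject): Buffer {
  const shared = diffieHellman({ privateKey: myPriv, publicKey: theirPub });
  return Buffer.from(hkdfSync("sha256", shared, Buffer.alloc(0), "group-key-wrap", 32));
}

function sealForGroup(message: string, senderPriv: KeyObject, recipients: Map<string, KeyObject>) {
  const groupKey = randomBytes(32);
  const content = aesSeal(groupKey, Buffer.from(message, "utf8")); // once, regardless of N
  const wrapped = new Map<string, Sealed>();
  for (const [id, pub] of recipients) {
    wrapped.set(id, aesSeal(wrapKey(senderPriv, pub), groupKey)); // only 32 bytes per recipient
  }
  return { content, wrapped };
}

function openFromGroup(
  sealed: ReturnType<typeof sealForGroup>, me: string, myPriv: KeyObject, senderPub: KeyObject,
): string {
  const mine = sealed.wrapped.get(me);
  if (!mine) throw new Error("not a recipient");
  const groupKey = aesOpen(wrapKey(myPriv, senderPub), mine);
  return aesOpen(groupKey, sealed.content).toString("utf8");
}
```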
### Beyond Milestone 8
- Future transport adapters as concrete needs surface (no commitments).
- Multi-broker federation (a mesh spans multiple brokers, with gossip between them).
- Onion routing option for adversarial environments.

---

## Non-goals (explicit)

- **Replacing Slack / Discord / Matrix as a human chat product.** claudemesh is for agent coordination; humans participate via bridges or direct DMs, but the UX is CLI-first.
- **Pure-P2P with no central coordinator.** The broker stays, for the offline queue, group fan-out, mesh authority, and revocation. "P2P-first hybrid" is the commitment, not "P2P-only."
- **Replacing the MCP `claude/channel` push-pipe.** Mid-turn interrupt stays MCP. The data-plane changes don't touch the daemon-to-Claude-Code path.
- **Real-time media (audio/video) directly in claudemesh data channels.** Bandwidth-heavy media goes through dedicated stacks (LiveKit, WebRTC SFU); claudemesh metadata and signaling glue them together.

---

## Open questions

1. **Mid-turn push when the sender uses a P2P session.** P2P delivery reaches the recipient's daemon, which then emits the MCP push: the same shape as a broker-delivered message. Confirm that the MCP push respects per-session targeting (sibling sessions of the same member have different session pubkeys).

2. **Browser peers and NAT traversal.** Browser ↔ browser via WebRTC works. Browser ↔ daemon (Node WebRTC binding) needs testing under symmetric NAT. It may require running a STUN server (Google's for now; eventually self-hosted). Symmetric NAT usually defeats STUN-based hole punching, so those pairs need a relay; here the relay fallback is the broker WS, in the role a TURN server would normally play.

3. **Backpressure on the stream channel.** WebRTC data channels have built-in flow control. Broker-relayed streams need per-stream backpressure signaling to avoid OOM at the broker. Proposal: the receiver periodically advertises `stream_window_bytes`; the sender pauses when the window is used up.

4. **Multi-region brokers.** Today there is a single broker. If we add a second broker (or federation), how do peers in mesh A on broker 1 talk to peers in mesh A on broker 2? Out of scope here; a separate spec when forced.

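The proposal in open question 3 amounts to credit-based flow control. The sender-side sketch below assumes `stream_window_bytes` is a cumulative byte grant; the spec does not say whether it is cumulative or incremental. Chunks queue locally and are released only while total bytes sent stay within the advertised window, so the broker never buffers more than the receiver allowed. `StreamSender` is a hypothetical name.

```typescript
// Sketch of credit-based stream backpressure, assuming a cumulative window grant.
class StreamSender {
  private sentBytes = 0;
  private grantedBytes: number;
  private readonly queue: Uint8Array[] = [];
  private readonly emit: (chunk: Uint8Array) => void;

  constructor(initialWindowBytes: number, emit: (chunk: Uint8Array) => void) {
    this.grantedBytes = initialWindowBytes;
    this.emit = emit;
  }

  write(chunk: Uint8Array): void {
    this.queue.push(chunk);
    this.flush();
  }

  /** Receiver advertisement: total bytes it is willing to have received so far. */
  onWindowUpdate(streamWindowBytes: number): void {
    this.grantedBytes = Math.max(this.grantedBytes, streamWindowBytes);
    this.flush();
  }

  /** True while chunks are held back waiting for more window. */
  get paused(): boolean {
    return this.queue.length > 0;
  }

  private flush(): void {
    while (this.queue.length > 0) {
      const next = this.queue[0];
      if (this.sentBytes + next.byteLength > this.grantedBytes) return; // window used up: pause
      this.queue.shift();
      this.sentBytes += next.byteLength;
      this.emit(next);
    }
  }
}
```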
---

## Acknowledgements

**Codex-1 (initial architecture review of existing code) caught:**
- The "remove daemon-WS inbound entirely" idea silently loses broadcasts + member-targeted DMs whenever zero launches exist. Corrected → retained.
- Inheritance for the duplicated lifecycle would become a god class. Composition via helper kept.
- Drain race needs `claimed_at` + delivered-on-success; "check OPEN before claim" still drops on crash. Kept.
- Token-keyed registry is correct (token = auth boundary), not a smell. Kept.

**Codex-2 (single-pass review of v1 of this spec) caught:**
- At-least-once requires receiver ack, not just "set delivered_at on success." → Layer 5 delivery semantics rewritten to require client_ack.
- Service identity needs explicit `service_pubkey` field, included in attestation. → Added to ServiceIdentity definition.
- v2 envelope `meta` should be non-optional with `clientMessageId` always present. → meta is now required.
- Service identity needed explicit revocation/disable story. → New CLI verb `claudemesh service revoke`, broker denylist, system-channel gossip propagation.
- `streamId` location ambiguous; concurrent streams to same peer would collide. → `meta.streamId` made REQUIRED for `channel: "stream"`.
- Defer `SseTransport` from Milestone 3. → Done.
- Drop named future-adapter list (MQTT/gRPC) to avoid false commitments. → Done.

The hybrid P2P data plane, transport adapter abstraction, typed channel envelope, mesh state replication, and milestone reordering are mine. Codex's reviews were targeted at correctness/scope-gap/should-cut, not redesign.

**This spec is now frozen for implementation.** No further architectural drift; deviations during implementation surface as new spec-deltas with explicit rationale, not silent edits to this document.

360
.artifacts/specs/2026-05-04-agentic-comms-architecture.md
Normal file
---
title: claudemesh as agentic communication platform — architecture spec
status: draft
target: 2.0.0 (foundational cleanup) → 2.1.0 (transport adapters) → 2.2.0 (channel typing)
author: Alejandro + Claude (cross-checked with Codex GPT-5.2)
date: 2026-05-04
supersedes: none
references:
  - 2026-05-02-architecture-north-star.md (CLI-first commitment, push-pipe)
  - 2026-05-04-per-session-presence.md (per-launch session pubkey + attestation)
  - apps/cli/CHANGELOG.md (1.30.0–1.32.1 history)
---

# claudemesh as agentic communication platform

## TL;DR

Today claudemesh is a **peer mesh for Claude Code sessions**: broker + CLI + per-session WS, encrypted DMs, peer list, and mid-turn push via MCP. Tomorrow it has to be a **transport-agnostic agentic communication platform** that:

- treats Claude Code as **one channel type** among many (with first-class support for mid-turn interrupts via `claude/channel`)
- accepts **non-Claude agents** as peers: voice agents (LiveKit/Pipecat), OpenAI Assistants, raw HTTP webhook consumers, scheduled cron actors, human IM bridges
- exposes **typed channels** (DM, group, topic, RPC, system event, stream) so message semantics aren't shoved through one `targetSpec` string
- has a **pluggable transport layer** so a peer can join the mesh over WS, HTTP webhook, SSE, MQTT, or gRPC without changing the broker's data plane
- preserves **end-to-end encryption** as a non-negotiable for direct messages

This document specifies the architecture in three layers (identity, transport, channel), the foundational cleanup needed before adding any of it (Codex caught a few sharp issues), and the migration path that gets us there without a "v2 rewrite" event.

The CLI-first commitment from the North Star spec stays intact: every channel type and transport adapter must be invocable from `claudemesh <verb>` first, with MCP serving only `claude/channel` push.

---

## Why now

Three forcing functions:

1. **Multi-session interconnect already broke** (1.30.0 → 1.32.1). The per-session WS subsystem shipped without a push handler because the architecture assumed "one daemon WS per mesh handles everything," and we then bolted session WSes on top without finishing the inbound side. The shape is right; the wiring was incomplete. We need to formalize the role split before adding more transports.

2. **Codex review surfaced a correctness bug in the broker's drain.** `drainForMember` claims rows by setting `delivered_at = NOW()` *before* the WS push succeeds. If `ws.readyState !== OPEN` at push time, the row is marked delivered and the message is gone. This is at-most-once with no retry. Any future channel type or transport adapter inherits this bug if we don't fix it at the foundation.

3. **The agentic-comms market is taking shape.** Voice agents (LiveKit, Pipecat, ElevenLabs Conversational), OpenAI Assistants threads, MCP servers acting as autonomous workers, scheduled cron actors: all of them need a "mesh" to coordinate. claudemesh has the right primitives (E2E crypto, peer presence, typed routing); it just needs the architecture to admit non-Claude peers without forking the codebase.

---

## Audience for this architecture

| Peer type | Identity | Transport | Channels they speak |
|---|---|---|---|
| **Claude Code session** (today) | Per-launch session pubkey, parent-attested by member key | WS to broker | DM, group, topic, system events; receives mid-turn push via MCP `claude/channel` |
| **Headless agent** (e.g. cron job, Hermes/OpenClaw worker) | Member pubkey (no per-launch session) | WS to broker, OR HTTP webhook outbound | DM, group, topic; no mid-turn push (polls inbox) |
| **Voice agent** (LiveKit/Pipecat call) | Service identity (signed by mesh owner) | WS to broker, possibly via TURN relay | DM (transcript stream), group (call participants), system events (call lifecycle) |
| **OpenAI Assistant / Anthropic Agent** (Skill SDK) | Service identity, OAuth-style scoped token | HTTP webhook (server-side push) OR WS | DM, RPC (tool-style request/response) |
| **Human via Slack/WhatsApp bridge** | Service identity for the bridge, end-user mapped via membership | WS (bridge to broker) | DM, topic |
| **Webhook consumer** (Stripe-style passive listener) | Service identity, scoped to one channel | HTTP webhook outbound only | Topic (subscribe to events) |

Every row in this table needs to work without changing the broker's data plane.

---

## Layer 1: Identity

### Today

Two identity types coexist:

- **Member identity**: a stable Ed25519 keypair held in `~/.claudemesh/config.json`, one per joined mesh. It signs the hello on the daemon's main WS and is the cryptographic root of trust for sibling sessions.
- **Session identity**: an ephemeral Ed25519 keypair generated per `claudemesh launch`. A parent-signed attestation vouches for it (TTL 12h, broker cap 24h). It signs the hello on the per-session WS and is the routing key for DMs targeted at *this specific launched session*.

This is enough for Claude Code peers. It's not enough for the audience table above.

### Proposed: third identity type — **service identity**

A service identity is what a non-Claude integration uses to authenticate:

```typescript
interface ServiceIdentity {
  member_id: string;      // The mesh member that owns this service (auth boundary)
  service_id: string;     // Stable id for the service ("openai-assistant-foo", "livekit-room-bar")
  service_type: string;   // "openai-assistant" | "livekit-room" | "webhook" | "voice-agent" | ...
  scopes: string[];       // ["dm:read", "topic:write", "rpc:invoke", ...]
  attestation: {          // member-signed
    service_id: string;
    scopes: string[];
    expires_at: number;   // assumed: Unix epoch milliseconds
    signature: string;
  };
  transport_hint: "ws" | "http-webhook" | "sse"; // informs how the broker reaches it
}
```

**Three identity types, one auth model:**
- All identities resolve to a `member_id` (the auth boundary; grants, kicks, and bans operate on members).
- Identities differ in *liveness* (member = always; session = per-launch; service = scoped/scheduled) and in *transport hint* (member/session = WS-resident; service = polymorphic).

**Backward compatibility:** existing member and session identities are unchanged. Service identity is additive.

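To make the member-signed attestation concrete, here is a sketch of two steps: a member key signs a service's claims, and the broker checks the signature, the expiry and the scope before honoring a call. Three parts are assumptions, because the spec does not define the attestation wire format:

- the canonical encoding: a JSON array in fixed field order, with scopes sorted;
- Ed25519 signing via `node:crypto`;
- exact-match scope checks.

Because the scopes are covered by the signature, a service cannot widen its own grant.

```typescript
import { generateKeyPairSync, sign, verify, type KeyObject } from "node:crypto";

// Sketch: canonical encoding and scope matching are assumptions, not claudemesh's format.
interface AttestationClaims {
  service_id: string;
  scopes: string[];
  expires_at: number; // Unix epoch ms (assumed)
}

function canonical(c: AttestationClaims): Buffer {
  return Buffer.from(JSON.stringify([c.service_id, [...c.scopes].sort(), c.expires_at]));
}

/** Member side: sign the claims with the member's Ed25519 private key. */
function attest(c: AttestationClaims, memberPriv: KeyObject): string {
  return sign(null, canonical(c), memberPriv).toString("base64");
}

/** Broker side: signature, then expiry, then scope. */
function authorize(
  c: AttestationClaims,
  signatureB64: string,
  memberPub: KeyObject,
  scope: string,
  nowMs: number,
): boolean {
  if (!verify(null, canonical(c), memberPub, Buffer.from(signatureB64, "base64"))) return false;
  if (nowMs >= c.expires_at) return false;
  return c.scopes.includes(scope);
}
```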
### Cryptographic implications

- E2E encryption (`crypto_box`) targets a public key. Member, session, and service pubkeys all work the same way.
- A service that can't hold a long-lived secret (e.g. an OpenAI Assistant calling out via HTTPS) gets a **delegated identity** held by the daemon: the sender encrypts to the daemon's per-member key, and the daemon re-encrypts and forwards over the service's webhook. This adds trust in the daemon, but it's the only way to bridge to non-crypto-native peers without giving them raw secrets.

---

## Layer 2: Transport

### Today

One transport: **WebSocket to broker** (`wss://ic.claudemesh.com/ws`). Everything goes through it: hello, send, push, RPC. The CLI's daemon holds two kinds of WS client per mesh: the member-keyed `DaemonBrokerClient` and a per-launch `SessionBrokerClient`.

### Proposed: transport adapter interface

```typescript
interface BrokerTransport {
  /** One-time hello + auth handshake. Identity is opaque to the transport. */
  connect(opts: TransportConnectOpts): Promise<TransportSession>;

  /** Send a typed envelope. Returns a delivery promise (ack or terminal failure). */
  send(envelope: Envelope): Promise<SendResult>;

  /** Stream of inbound envelopes. Pull-model so a transport can be a webhook,
   *  not just a long-lived socket. */
  inbound(): AsyncIterable<Envelope>;

  /** Close cleanly. */
  close(reason?: string): Promise<void>;

  /** Capabilities surfaced to the daemon; the broker uses this to decide
   *  whether mid-turn push is possible, whether RPC blocks are
   *  supported, etc. */
  capabilities: TransportCapabilities;
}
```

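Because `inbound()` is pull-based, a push-driven source needs a small buffer between itself and the consumer. Examples of such sources are a WS `message` event or an HTTP handler receiving webhook callbacks. The sketch below shows one such buffer. `InboundQueue` is a hypothetical helper, and `Envelope` is reduced to a placeholder type for brevity. Items pushed before anyone is iterating are buffered; items pushed while a consumer is waiting go straight to that consumer.

```typescript
// Sketch: bridging push-style delivery to the pull-model inbound(): AsyncIterable<Envelope>.
type Envelope = { clientMessageId: string; target: string }; // placeholder shape

class InboundQueue<T> implements AsyncIterable<T> {
  private readonly buffered: T[] = [];
  private readonly waiters: ((r: IteratorResult<T>) => void)[] = [];
  private closed = false;

  /** Called by the transport when something arrives (socket frame, webhook POST). */
  push(item: T): void {
    if (this.closed) return;
    const waiter = this.waiters.shift();
    if (waiter) waiter({ value: item, done: false });
    else this.buffered.push(item);
  }

  /** Ends iteration once buffered items are drained. */
  close(): void {
    this.closed = true;
    for (const w of this.waiters.splice(0)) w({ value: undefined, done: true });
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: (): Promise<IteratorResult<T>> => {
        if (this.buffered.length > 0) {
          return Promise.resolve({ value: this.buffered.shift() as T, done: false });
        }
        if (this.closed) return Promise.resolve({ value: undefined, done: true });
        return new Promise((resolve) => this.waiters.push(resolve));
      },
    };
  }
}
```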
**Concrete adapters at v2.1.0:**

1. **`WsBrokerTransport`**: the current WS implementation. The `DaemonBrokerClient` and `SessionBrokerClient` are recast as two roles using this transport with different hello payloads.
2. **`HttpWebhookTransport`**: for service identities that can't hold a WS open. Outbound: HTTP POST to the broker's `/v1/send`. Inbound: the broker calls back to a registered webhook URL with retry + signature. Mid-turn push is not possible (degrades gracefully).
3. **`SseTransport`**: for browsers / restricted environments. Outbound: HTTP POST. Inbound: SSE stream from broker to client.

**Future adapters (v2.3+):**

4. **`LiveKitTransport`**: for voice agents. The "broker" is a LiveKit room; messages are LiveKit data-channel packets. Bridges to the central broker via a daemon side-car.
5. **`MqttTransport`**: for IoT / fleet scenarios.
6. **`GrpcTransport`**: for low-latency intra-cluster traffic.

Any new adapter implements the same interface; broker logic is transport-agnostic at the API boundary.

### The two-role model (Codex's correction)

Even within one transport, the daemon holds **two roles per mesh**, not one connection per launch:

- **Control-plane connection**: one per mesh, member-keyed. Carries outbox drain (one queue, so it can't race), `list_peers`/state/memory/skill RPCs, and inbound `*` broadcasts and member-targeted DMs (legacy traffic + zero-launch state).
- **Session connections**: N per mesh, session-keyed. Carry the presence row keyed on the session pubkey and inbound session-targeted DMs.

This is what we have today; the spec just makes the role split explicit. The mistake in 1.30.0–1.32.0 was treating session connections as "presence-only" instead of "second-class peers." 1.32.1 corrects that.

### Foundational cleanup (ship first, before any new transport)

1. **Extract a `connectWsWithBackoff` helper.** The current `DaemonBrokerClient` and `SessionBrokerClient` duplicate the WS lifecycle (open, hello, ack-timeout, close, backoff, reconnect). Codex's recommendation: composition, not inheritance. A single helper takes `{ url, buildHello, onMessage, onStatusChange }` and both clients call it. This eliminates the class of drift bugs that produced `session_replaced` thrashing.

2. **Drop the daemon's stray `sessionPubkey`** (`apps/cli/src/daemon/broker.ts:113`). It's a leftover from the era when the daemon WS was the only WS. The session role now owns session pubkeys. If we want the daemon itself to be addressable by a stable pubkey, rename it `daemonPubkey` and document it; today it's dead ballast.

3. **Tighten the daemon-WS inbound filter; don't remove it** (Codex's correction to my prior take). The daemon WS should still receive `*` broadcasts and member-targeted DMs (legacy senders, zero-launch state). It should NOT decrypt session-targeted DMs: that's the session WS's job, and decryption requires the session secret, which the daemon WS doesn't have anyway.

4. **Fix the broker drain race** (`apps/broker/src/broker.ts:2399-2402`). Add `claimed_at` + `claim_id` columns. The claim sets `claimed_at = NOW()` (NOT `delivered_at`); the push runs; `delivered_at = NOW()` is set ONLY after `ws.send` succeeds. A row becomes re-eligible if `claimed_at` is older than the lease timeout (e.g. 30s). Combined with `client_message_id` dedupe on the receiver side, this gives at-least-once semantics, which is what an agentic comms platform needs.

5. **Decouple presence-WS-role from session-WS-role at the broker.** Today `connectPresence` is called from both `handleHello` and `handleSessionHello`. The two paths diverge in identity (member vs session pubkey) and dedup key (sessionId in both cases). Make the role explicit on the presence row (`role: "control-plane" | "session" | "service"`) so list_peers, fan-out, and reconnect can reason about it. The hidden `claudemesh-daemon` rows in 1.32.0's `peer list` are a hack covering for missing typing.

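The two-phase drain in item 4 can be modeled in memory to show why a failed push no longer loses the message. In this sketch the queue is a plain array and `push` stands in for `ws.send`. The camelCase fields mirror the spec's `claimed_at`, `claim_id` and `delivered_at` columns, and the 30s lease follows the example value in item 4. In SQL the claim phase would typically be a single `UPDATE ... RETURNING`, so that concurrent drains cannot claim the same row. That part is an assumption, not something this spec prescribes.

```typescript
// In-memory sketch of two-phase claim/deliver with a lease.
interface QueuedRow {
  id: number;
  recipient: string;
  claimedAt: number | null;
  claimId: string | null;
  deliveredAt: number | null;
}

const LEASE_MS = 30_000;

function drainForMember(
  rows: QueuedRow[],
  recipient: string,
  claimId: string,
  now: number,
  push: (row: QueuedRow) => boolean, // true only if the socket send succeeded
): number {
  // Phase 1: claim undelivered rows that are unclaimed or whose lease has expired.
  const claimed = rows.filter(
    (r) =>
      r.recipient === recipient &&
      r.deliveredAt === null &&
      (r.claimedAt === null || now - r.claimedAt > LEASE_MS),
  );
  for (const r of claimed) {
    r.claimedAt = now;
    r.claimId = claimId;
  }
  // Phase 2: mark delivered only after a successful push. Failures stay claimed
  // and become eligible again once the lease expires.
  let delivered = 0;
  for (const r of claimed) {
    if (r.claimId === claimId && push(r)) {
      r.deliveredAt = now;
      delivered++;
    }
  }
  return delivered;
}
```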
---

## Layer 3: Channels

### Today

One channel type: **direct messages with target-spec routing**. `targetSpec` is a string that the broker pattern-matches:
- `<64-hex-pubkey>` → DM to that member or session
- `*` → broadcast to mesh
- `@<groupname>` → group post
- `#<topicId>` → topic post

This works, but it's overloaded: the same `send` verb covers DMs, broadcasts, groups, topics, and (since v0.9) tagged messages. As we add agentic peers, the semantics matter, and the routing-key string can't carry them.

### Proposed: typed channel envelope

```typescript
type ChannelType =
  | "dm"      // 1:1 message, encrypted to recipient pubkey
  | "group"   // post to named group, encrypted per-recipient (today: base64 plaintext)
  | "topic"   // pub/sub topic, persisted, history available, per-topic symmetric key
  | "rpc"     // request/response, correlation id, timeout, structured result
  | "system"  // peer_joined / peer_left / topology / lifecycle events
  | "stream"; // long-lived data stream (voice transcript, log tail, file transfer chunks)

interface Envelope {
  /** Schema version. v1 = current opaque shape. v2 = this typed shape. */
  v: 2;
  /** What semantics the receiver should apply. */
  channel: ChannelType;
  /** Target: pubkey for dm, group name for group, topic id for topic, etc.
   *  Same wire format as today's targetSpec, but typed. */
  target: string;
  /** Sender identity (member, session, or service pubkey). */
  from: string;
  /** Encrypted payload + crypto envelope. Channel type drives crypto:
   *  - dm: crypto_box to recipient pubkey
   *  - group: per-recipient seal (today: plaintext)
   *  - topic: symmetric key (today: plaintext, v0.2.0+ adds per-topic key)
   *  - rpc / system / stream: same as DM (crypto_box) */
  body: { nonce: string; ciphertext: string; bodyVersion: number };
  /** Optional metadata, varies by channel type. */
  meta?: {
    /** Stable client-supplied id for dedupe (existing field, made required for v2). */
    clientMessageId: string;
    /** Sender's canonical fingerprint per spec §4.4 (existing field). */
    requestFingerprint?: string;
    /** dm/group: priority gate (now/next/low). rpc: timeout_ms. stream: chunk_id. */
    priority?: "now" | "next" | "low";
    timeoutMs?: number;
    streamChunkId?: number;
    /** dm/topic: replyTo for threading. */
    replyToId?: string;
    /** topic: mentions list (existing field). */
    mentions?: string[];
    /** rpc: correlation back-edge so the broker can route the response. */
    rpcCorrelationId?: string;
  };
  /** Sender signature over (channel, target, from, nonce, ciphertext, meta). */
  signature?: string;
}
```

**Why this matters for agentic peers:**

- A voice agent sending a partial transcript wants `channel: "stream"` semantics: high-frequency, small chunks, idempotent, no per-message ack required.
- An OpenAI Assistant calling a tool wants `channel: "rpc"`: request-response with a timeout and a correlation back-edge so the response can be routed.
- A scheduled cron actor reporting completion wants `channel: "topic"`: fire-and-forget, persisted history.
- Today all of these get bolted onto `dm` with conventions; the v2 envelope makes them first-class.

### Claude Code channels — first-class support

Three specific channel features for Claude Code:

1. **Mid-turn interrupt** (`claude/channel` push). Already implemented via the MCP push-pipe. The new envelope makes it explicit: `channel: "dm"` with `meta.priority: "now"` triggers MCP push to a launched session. Other priorities deliver at the next inbox poll.

2. **Reply threading** (`meta.replyToId`). Already partially supported on topics; v2 makes it work uniformly across `dm` and `topic`. The receiving Claude Code session sees a structured reply thread instead of flat history.

3. **Mentions** (`meta.mentions`). Already supported on topics; v2 surfaces them on `dm` too, which is useful for `@<peer>` callouts in groups even when the message body is encrypted.

### Backward compatibility

Envelope v1 (today's shape) stays accepted by the broker until v3.x. v1 envelopes are auto-upgraded server-side, with `channel` inferred from the `targetSpec` shape (`*` → group/broadcast, `@<groupname>` → group, `#<topicId>` → topic, hex pubkey → dm). Existing CLIs keep working.

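The server-side v1 → v2 upgrade comes down to classifying the `targetSpec` string. This sketch assumes a simplified `V1Envelope` shape. It models broadcast (`*`) as a group post, which is one reading of "group/broadcast", and it rejects any spec that matches no known shape rather than guessing. The target keeps today's wire format, as the envelope's `target` comment specifies. `inferChannel` and `upgradeV1` are hypothetical names.

```typescript
// Sketch of v1 -> v2 upgrade by targetSpec shape. V1Envelope is a simplified assumption.
type ChannelType = "dm" | "group" | "topic" | "rpc" | "system" | "stream";

interface V1Envelope {
  targetSpec: string;
  clientMessageId: string;
}

interface UpgradedEnvelope {
  v: 2;
  channel: ChannelType;
  target: string;
  meta: { clientMessageId: string };
}

function inferChannel(targetSpec: string): ChannelType | null {
  if (targetSpec === "*") return "group"; // mesh-wide broadcast, modeled as a group post
  if (targetSpec.startsWith("@") && targetSpec.length > 1) return "group";
  if (targetSpec.startsWith("#") && targetSpec.length > 1) return "topic";
  if (/^[0-9a-f]{64}$/i.test(targetSpec)) return "dm"; // member or session pubkey
  return null;
}

function upgradeV1(env: V1Envelope): UpgradedEnvelope {
  const channel = inferChannel(env.targetSpec);
  if (channel === null) {
    throw new Error(`cannot infer channel from targetSpec ${JSON.stringify(env.targetSpec)}`);
  }
  return { v: 2, channel, target: env.targetSpec, meta: { clientMessageId: env.clientMessageId } };
}
```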
---

## Future integrations (concrete)

These are not part of v2.0; they're the test cases the architecture must support:

### LiveKit voice agent
- Service identity: `livekit-room-<id>`, signed by mesh owner.
- Transport: a dedicated daemon side-car hosts a LiveKit participant; data-channel packets bridge to the central broker via WS.
- Channels: `stream` for transcript chunks, `system` for call lifecycle (joined/left/muted), `dm` for sidebar text.
- E2E: per-call ephemeral keypair held by the side-car; participants' member keys are discovered via the mesh peer list.

### OpenAI Assistant integration
- Service identity: `openai-assistant-<id>`, scoped to one or more topics + RPC.
- Transport: HTTP webhook out (broker → assistant API), HTTP POST in (assistant → broker `/v1/send`).
- Channels: `rpc` for tool-style invocations from claudemesh peers, `topic` for assistant-published events.
- Crypto: delegated to the daemon (the assistant can't hold a libsodium secret; the daemon re-encrypts on its behalf).

### Generic webhook consumer (Stripe-style)
- Service identity: `webhook-<consumer-id>`, scoped to subscribed topics.
- Transport: HTTP webhook out only. No inbound; it's a passive sink.
- Channels: `topic` only.
- Crypto: not E2E; webhook bodies are signed (HMAC-SHA256, sender = mesh) but plaintext.

### Human-via-WhatsApp bridge
- Service identity: `whatsapp-bridge`, with member-mapping for each end-user.
- Transport: WS (the bridge holds a long connection to the broker), bridging to the WhatsApp Business API.
- Channels: `dm` (1:1 chat → WhatsApp DM), `topic` (claudemesh topic → WhatsApp group).
- E2E: the bridge holds a per-end-user delegated key. This is not "true" E2E to the WhatsApp side, but it is signaled clearly in the UX.

---

## Migration plan

### v2.0.0 — Foundational cleanup (no new external surface)
**Target: 1–2 weeks**

- [ ] Extract `connectWsWithBackoff` helper, refactor `DaemonBrokerClient` + `SessionBrokerClient` to use it.
- [ ] Drop daemon's stray `sessionPubkey` (or rename + document).
- [ ] Tighten daemon-WS inbound filter (broadcast + member-targeted only).
- [ ] Add `presence.role` column (`control-plane | session | service`); broker fan-out + list_peers honor it.
- [ ] **Fix drain race**: schema migration adds `claimed_at`, `claim_id`, `claim_expires_at` columns; rewrite `drainForMember` for two-phase claim/deliver; add re-claim path for stale leases.
- [ ] Receiver-side: harden `client_message_id` dedupe (already partial in 1.32.x; finish for at-least-once). Add an idempotent insert that returns the existing row on conflict.

**Success criteria:**
- Two-session smoke test still passes (1.32.1 baseline).
- Crash-mid-push test: kill the broker between claim and send; verify the message is redelivered after broker restart + recipient reconnect.
- Reconnect storm test: 100 reconnect cycles per session over 60s; zero message loss.

### v2.1.0 — Transport adapter interface
**Target: 2–3 weeks after v2.0.0**

- [ ] Define `BrokerTransport` interface; refactor existing WS code to be the first implementation.
- [ ] Add `HttpWebhookTransport` adapter (broker side: outbound HTTP POST with retry + HMAC signature; daemon side: HTTP server that receives webhook callbacks and inserts into inbox).
- [ ] Add `/v1/send` HTTP endpoint on the broker (today the broker is WS-only for sends).
- [ ] Service identity registration flow: `claudemesh service register --type webhook --scopes dm:read,topic:write` mints an attestation and stores it locally + on the broker.
- [ ] Basic `SseTransport` for browser/CI use cases.

**Success criteria:**
- A scheduled cron job using only `curl` can send to the mesh (no daemon required).
- A webhook consumer subscribed to a topic receives messages within 5s of post.

### v2.2.0 — Typed channels (envelope v2)
**Target: 2–3 weeks after v2.1.0**

- [ ] Define `Envelope v2` schema; broker accepts both v1 and v2; sender-side code emits v2.
- [ ] `channel: "rpc"` end-to-end: correlation id routing, response timeout, `claudemesh rpc <peer> <method> <args>` CLI verb.
- [ ] `channel: "stream"` end-to-end: chunked delivery, ordered, idempotent, `claudemesh stream <peer> <stream-id>` CLI verb.
- [ ] Mid-turn push (`claude/channel`) honors `channel: "dm"` with `meta.priority: "now"` only.
- [ ] Mentions + replyToId surface uniformly across dm and topic.

**Success criteria:**
- Demo: a Claude Code session sends an `rpc` to another Claude Code session and gets a structured response.
- Demo: a voice-agent prototype sends `stream` chunks; another peer receives them in order with no gaps.

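The "in order with no gaps" criterion implies reassembly on the receiving side, per stream. This sketch assumes chunk ids start at 0 and increase by one per stream; the envelope names `streamChunkId` but not its numbering. The reassembler does three things:

- drops chunks that were already delivered;
- holds out-of-order chunks until the gap fills;
- keeps concurrent streams independent.

`StreamReassembler` is a hypothetical name.

```typescript
// Sketch of per-stream in-order reassembly with idempotent duplicate handling.
interface StreamChunk {
  streamId: string;
  chunkId: number;
  data: string;
}

class StreamReassembler {
  private readonly nextId = new Map<string, number>();
  private readonly pending = new Map<string, Map<number, string>>();
  private readonly deliver: (streamId: string, data: string) => void;

  constructor(deliver: (streamId: string, data: string) => void) {
    this.deliver = deliver;
  }

  accept(chunk: StreamChunk): void {
    const expected = this.nextId.get(chunk.streamId) ?? 0;
    if (chunk.chunkId < expected) return; // already delivered: idempotent drop
    let held = this.pending.get(chunk.streamId);
    if (!held) {
      held = new Map();
      this.pending.set(chunk.streamId, held);
    }
    held.set(chunk.chunkId, chunk.data); // re-setting a held duplicate is harmless
    // Release the contiguous run starting at `expected`.
    let next = expected;
    while (held.has(next)) {
      this.deliver(chunk.streamId, held.get(next) as string);
      held.delete(next);
      next++;
    }
    this.nextId.set(chunk.streamId, next);
  }

  /** Chunk ids currently buffered behind a gap for one stream. */
  gaps(streamId: string): number[] {
    return [...(this.pending.get(streamId)?.keys() ?? [])].sort((a, b) => a - b);
  }
}
```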
### v2.3+ — Concrete external integrations
**Target: opportunistic**

- LiveKit side-car (one PoC integration to validate the architecture).
- OpenAI Assistant integration (validate delegated-key crypto path).
- WhatsApp bridge (validate human-bridge service identity).

These are not on the critical path for the architecture; they prove it.

---

## Non-goals (explicit)

- **Replacing Slack / Discord.** claudemesh is for agent coordination. Human chat is a side effect, not the headline.
- **Federation across multiple brokers.** v2.0 stays single-broker per mesh. Multi-broker (gossip / federation) is a separate spec, post-v3.
- **Sync-only / no-broker P2P.** Direct peer-to-peer (without the central broker) is a different architecture (libp2p, Iroh). Not in scope.
- **Replacing the MCP push-pipe.** Mid-turn interrupt stays MCP-based. The transport-adapter layer is broker-side; MCP is daemon-to-Claude-Code, untouched.

---

## Open questions

1. **How does a service identity prove liveness?** WS gives us implicit liveness via the connection. HTTP webhook services need an explicit heartbeat / health check. Proposal: the broker periodically POSTs to `<webhook>/health`; the service is marked offline after 3 consecutive failures.

2. **RPC routing through offline peers: what's the failure mode?** If `claudemesh rpc <peer> ...` is called while the peer is offline, do we (a) queue and wait (DM semantics) or (b) fail fast (REST semantics)? Proposal: RPC fails fast with `peer_offline` after a 5s probe; an explicit `--wait` flag opts into DM-style queueing.

3. **Per-topic symmetric key rotation.** The existing v0.2.0 spec mentions per-topic keys. Rotation policy (when, who triggers it, how members re-sync) is unsolved. Defer to a separate spec; v2.2.0 ships with one-shot keys (rotate by re-creating the topic).

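The liveness proposal in open question 1 is a small state machine, sketched below under two assumptions. First, a single success immediately restores `online`; the spec only defines the transition to offline. Second, a thrown network error counts as a failure. `WebhookLiveness` is hypothetical, and `probe` stands in for the broker's POST to `<webhook>/health`.

```typescript
// Sketch: offline after 3 consecutive failed health checks, online again on success.
const MAX_CONSECUTIVE_FAILURES = 3;

class WebhookLiveness {
  private failures = 0;
  online = true;

  async check(probe: () => Promise<boolean>): Promise<boolean> {
    let ok = false;
    try {
      ok = await probe();
    } catch {
      ok = false; // network errors count as failures
    }
    if (ok) {
      this.failures = 0;
      this.online = true;
    } else if (++this.failures >= MAX_CONSECUTIVE_FAILURES) {
      this.online = false;
    }
    return this.online;
  }
}
```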
---

## Acknowledgements

Cross-checked with Codex (GPT-5.2, high reasoning) on the foundational cleanup section. Codex caught:
- The "remove daemon-WS inbound entirely" idea would silently lose broadcasts + member-targeted DMs whenever zero launches exist. Corrected.
- Inheritance for the duplicated lifecycle would become a god class. Composition via a helper is the right call.
- The drain race needs a `claimed_at` + delivered-on-success fix; "check OPEN before claim" still drops on crash.
- Token-keyed registry is correct (token = auth boundary), not a smell.

The agentic-comms / typed-channels / transport-adapter layers are mine; Codex didn't touch those because the question I asked was about the existing architecture's smells, not the future roadmap.

288
.artifacts/specs/2026-05-04-session-capabilities.md
Normal file
@@ -0,0 +1,288 @@
# Session capabilities — first-class concept

**Status:** spec, queued behind v0.3.0 topic-encryption work.
**Owner:** alezmad
**Author:** Claude (Sprint B follow-up, 2026-05-04)
**Related:** `2026-04-15-per-peer-capabilities.md` (existing per-peer
caps system, member-keyed), `2026-05-04-per-session-presence.md`
(per-launch session presence — what we're now restricting).

## Problem

Per-peer capability grants (`apps/broker/src/index.ts:2178+, 2309+`)
are keyed on the sender's **stable member pubkey**. The grant model
gives the recipient fine-grained control: "alice can DM me",
"bob can read state but not broadcast", etc.

But: as of v1.30.0 (`per-session-presence`), every `claudemesh
launch` mints a per-launch ephemeral keypair with a parent attestation
binding it to the member identity. The launched session inherits **all**
the member's capabilities transitively, because cap enforcement always
falls through to the member key.

Concretely:

- Member `alice` is in mesh `flexicar`, granted `dm + state-read +
  state-write` by everyone.
- Alice launches a session with `claudemesh launch` to do an automated
  task — say, run a Claude Code agent that iterates over PRs.
- That session has full member privileges. It can DM peers, write
  shared state keys (e.g. clobber `current-pr`), grant new caps, ban
  members, etc. — none of which the user wanted to delegate.

There is no way to express "this session can DM peers but cannot
deploy services or grant caps." The parent attestation is a binary
existence proof — "this session was vouched by a member" — with no
capability subset.

Plus an adjacent footgun: `set_state` (`apps/broker/src/index.ts:2949`)
has **no cap check at all**. Anyone in the mesh can write any key. The
spec at `2026-04-15-per-peer-capabilities.md` lists `state-write` as a
planned cap but it was never wired into the broker. Shared keys like
`current-pr` are write-anyone today.

## Goal

A launched session can be issued **a capability subset** of its
parent member, signed by the parent at launch time, and the broker
enforces the **intersection** of recipient grants × session caps on
every protected operation.

## Non-goals

- Changing the existing per-peer cap model. Member-keyed grants stay
  authoritative for "who is allowed to talk to me."
- Cross-machine session caps (waiting on 2.0.0 HKDF identity).
- Per-tool granularity inside the Claude Code MCP surface — this
  spec only covers the broker-enforceable verbs (read, dm, broadcast,
  state-read, state-write, grant, kick, ban, profile-write,
  service-deploy).
- Delegation: a session cannot re-vouch a sub-session with its own
  cap subset. Only members can attest sessions. (Could be lifted in
  a future spec; today's launch flow doesn't need it.)

## Design

### Capability vocabulary

Existing (today, member-level):

| Capability    | Effect when GRANTED on a recipient → sender pair |
|---------------|---------------------------------------------------|
| `read`        | Sender appears in recipient's `list_peers`        |
| `dm`          | Sender can DM recipient                           |
| `broadcast`   | Sender's broadcasts reach recipient               |
| `state-read`  | Sender can read shared state                      |
| `state-write` | (planned) Sender can write shared state           |
| `file-read`   | Sender can fetch files recipient shared           |

New (session-level — cap subset on the attestation):

These are the **verbs the session is allowed to invoke**, NOT what
peers can do TO it. A session attestation declaring `["dm", "read"]`
means the session can SEND dm/read-list operations; it cannot
broadcast, write state, grant, etc.

| Session cap       | Gates which broker operations                  |
|-------------------|------------------------------------------------|
| `read`            | `list_peers`                                   |
| `dm`              | `send` with single recipient                   |
| `broadcast`       | `send` with `*`, `@group`, `#topic`            |
| `state-read`      | `get_state`, `list_state`                      |
| `state-write`     | `set_state`                                    |
| `grant`           | `grant`, `revoke`, `block`                     |
| `kick`            | `kick`, `disconnect`                           |
| `ban`             | `ban`, `unban`                                 |
| `profile-write`   | `set_profile`, `set_summary`, `set_status`     |
| `service-deploy`  | `mesh_service_register`, `_unregister`         |

The default cap set when no subset is declared: the **full member
set** (today's behavior — opt-in restriction, not breaking).

### Attestation v2

Existing v1 (`apps/cli/src/services/broker/session-hello-sig.ts`):

```
canonical = `claudemesh-session-attest|<parent>|<session>|<expires>`
```

New v2 (additive — broker accepts both):

```
canonical = `claudemesh-session-attest-v2|<parent>|<session>|<expires>|<sorted-caps-csv>`
```

Where `<sorted-caps-csv>` is the lower-cased, comma-joined,
ASCII-sorted cap list. An empty list means the session holds **no**
caps. The full-member default (back-compat) is expressed by omitting
`allowed_caps` entirely, i.e. by sending a v1 attestation.

**Wire shape additions on `session_hello`:**

```ts
{
  type: "session_hello",
  // ...existing fields...
  parentAttestation: {
    sessionPubkey,
    parentMemberPubkey,
    expiresAt,
    signature,
    // NEW:
    allowed_caps?: string[], // omitted = full member set
    version?: 2,             // omitted = v1
  },
}
```

The broker version-detects: `version === 2` → verify v2 canonical
including `allowed_caps`. Default behavior is unchanged for clients
that don't pass it.

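Below is a sketch of the two canonical strings and the version check described above. Only the string layouts and the `version` / `allowed_caps` semantics come from this spec. The function names, the `ParentAttestation` type and the `number | string` type of `expiresAt` are assumptions. The spec also leaves one case open: a v2 attestation that omits `allowed_caps`. This sketch treats it as an empty list.

```typescript
// Hypothetical sketch of the v1/v2 canonical strings plus version detection.
interface ParentAttestation {
  sessionPubkey: string;
  parentMemberPubkey: string;
  expiresAt: number | string; // type assumed; only its string form is signed
  signature: string;
  allowed_caps?: string[]; // omitted = full member set
  version?: 2;             // omitted = v1
}

function canonicalV1(a: ParentAttestation): string {
  return `claudemesh-session-attest|${a.parentMemberPubkey}|${a.sessionPubkey}|${a.expiresAt}`;
}

function canonicalV2(a: ParentAttestation): string {
  // Lower-case, then sort. sort() without a comparator compares UTF-16 code
  // units, which is plain ASCII order for ASCII cap names.
  const csv = (a.allowed_caps ?? []).map((c) => c.toLowerCase()).sort().join(",");
  return `claudemesh-session-attest-v2|${a.parentMemberPubkey}|${a.sessionPubkey}|${a.expiresAt}|${csv}`;
}

// Which string the broker verifies the signature against, and the resulting
// session cap subset (null = full member caps).
function canonicalFor(a: ParentAttestation): { canonical: string; allowedCaps: string[] | null } {
  if (a.version === 2) {
    // v2 carries an explicit list; an empty list grants no session caps.
    return { canonical: canonicalV2(a), allowedCaps: (a.allowed_caps ?? []).map((c) => c.toLowerCase()) };
  }
  return { canonical: canonicalV1(a), allowedCaps: null };
}
```

The sketch does not remove duplicate caps because the spec does not say whether duplicates are allowed.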
### Enforcement

Add `allowed_caps: string[] | null` to the in-memory `PeerConn`
shape (`apps/broker/src/index.ts:131`). Populated from
`handleSessionHello` (the v2 attestation supplies it) and from
`handleHello` (control-plane / member connection — set to `null`,
meaning "full member caps").

**Effective cap check** for a sending peer needing `cap`:

```ts
function senderHasCap(conn: PeerConn, cap: string): boolean {
  if (conn.allowed_caps === null) return true; // member-level, no subset
  return conn.allowed_caps.includes(cap);
}
```

Wire this into every broker operation in the table above. The
existing per-peer recipient-cap check at `2178+, 2309+` stays —
session caps gate the **sender side**, recipient grants gate the
**receive side**, and both must allow:

```
allowed = senderHasCap(conn, capNeeded) && recipientGrants[sender][capNeeded]
```

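Below is a runnable sketch of the two-sided rule above. `PeerConn` is cut down to the two fields the check needs, and the `RecipientGrants` map layout is a hypothetical stand-in for however the broker really stores grants. Two further points are assumptions. First, only the caps listed in `RECIPIENT_GATED` have a recipient-side grant. Second, a missing grant counts as "not granted".

```typescript
// Hypothetical minimal shapes; the real PeerConn has many more fields.
interface PeerConn {
  memberPubkey: string;          // stable member key that recipient grants are keyed on
  allowed_caps: string[] | null; // null = member-level connection, no subset
}

// recipient member pubkey -> sender member pubkey -> caps the recipient granted
type RecipientGrants = Map<string, Map<string, Set<string>>>;

// Assumption: caps outside this set (state-write, grant, kick, ...) are
// gated on the sender side only.
const RECIPIENT_GATED = new Set(["read", "dm", "broadcast", "state-read"]);

function senderHasCap(conn: PeerConn, cap: string): boolean {
  if (conn.allowed_caps === null) return true;
  return conn.allowed_caps.includes(cap);
}

function isAllowed(conn: PeerConn, recipient: string, cap: string, grants: RecipientGrants): boolean {
  if (!senderHasCap(conn, cap)) return false; // session subset says no
  if (!RECIPIENT_GATED.has(cap)) return true; // no recipient-side grant for this cap
  return grants.get(recipient)?.get(conn.memberPubkey)?.has(cap) ?? false;
}
```

Keying the recipient lookup on `memberPubkey` keeps the existing member-keyed grants authoritative. The session subset can only narrow what those grants allow.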
### `set_state` gate (bonus, ship together)

Today: no cap check. After this spec: `set_state` requires
`state-write` on the sender side. Migration: existing members
default to having `state-write` in their member caps (no recipient
grant model for state-write — it's a sender-side gate only,
mesh-wide). New attestations can omit it to forbid the session.

The recipient-side analog (per-peer state-write grants) is left for
a future spec — today the value of guarding state-write is
session-level (avoid an automated session clobbering shared keys),
not peer-level.

### CLI surface

```
claudemesh launch --caps dm,read         # tight: read-only chat agent
claudemesh launch --caps dm,broadcast    # send-only, no state writes
claudemesh launch                        # default: full member caps
```

`claudemesh launch --caps ?` prints the table above with descriptions.

`claudemesh peer list --json` includes `allowed_caps` per row when
present (`null` = full member). Lets users audit what their running
sessions can actually do.

### Migration plan (mirrors `2026-04-15-per-peer-capabilities.md` §"Migration plan")

1. **Broker schema additive** — `PeerConn.allowed_caps` in-memory
   only; no DB column. Reload-on-reconnect is fine because the
   attestation is re-sent on every WS open (it's the proof of
   identity).

2. **CLI ships v2 attestation alongside v1.** New `--caps` flag
   defaults to omitted (= v1 attestation, full caps). Older
   brokers ignore the new fields entirely.

3. **Broker accepts v2.** When `allowed_caps` arrives, store it.
   No enforcement yet — log denied operations as `cap_check_dryrun`
   metric counter, still allow them through.

4. **Dry-run release.** Ship one CLI + broker release that emits
   the metric but doesn't enforce. Watch for false positives in
   real meshes for ≥ 1 week.

5. **Flip enforcement on.** Broker rejects operations failing the
   cap check with `forbidden: missing session capability "<cap>"`.
   Default ("no caps declared = full member") keeps existing
   sessions unaffected.

6. **`set_state` gate** ships in step 5 alongside the rest. Default
   member caps include `state-write`, so flipping it on doesn't
   break existing flows. Only sessions that explicitly omit
   `state-write` from `--caps` lose write access.

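Steps 3 to 5 amount to one check run in two modes. In dry-run it counts would-be denials and lets them through; in enforce mode it rejects them. The metric name `cap_check_dryrun` and the error text come from the plan above. `EnforcementMode`, `checkSessionCap` and the in-process `metrics` object are hypothetical; a real broker would use its own metrics client.

```typescript
// Hypothetical sketch of the dry-run / enforce switch for session caps.
type EnforcementMode = "dryrun" | "enforce";

interface CapCheckResult {
  allowed: boolean;
  error?: string;
}

const metrics = { cap_check_dryrun: 0 };

function checkSessionCap(allowedCaps: string[] | null, cap: string, mode: EnforcementMode): CapCheckResult {
  // null = no subset declared (v1 attestation or member connection).
  if (allowedCaps === null || allowedCaps.includes(cap)) return { allowed: true };
  if (mode === "dryrun") {
    metrics.cap_check_dryrun += 1; // record the would-be denial, let it through
    return { allowed: true };
  }
  return { allowed: false, error: `forbidden: missing session capability "${cap}"` };
}
```

Flipping enforcement on then becomes a change to `mode` alone, so the release watched in step 4 runs exactly the decision logic that step 5 enforces.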
### Crypto notes

- v2 attestation re-uses `crypto_sign_detached` over the new
  canonical string; same parent member secret key, same TTL ceiling
  (≤24 h), same `expiresAt` semantics.
- v1 signatures are NOT v2 signatures. Collision is impossible
  because the first `|`-delimited field differs
  (`claudemesh-session-attest` vs `claudemesh-session-attest-v2`), so
  no v1 canonical string can equal a v2 one. Domain separation is
  intrinsic.
- Like the existing per-peer cap system: caps are server-enforced
  metadata, not capability tokens. A malicious broker can ignore
  them. This is about UX trust + footgun prevention, not
  protocol-level security.

## Open questions

1. **Should the session attestation also bind to a fingerprint of
   the launched binary / Claude version?** Would let a member say
   "this session is constrained to Claude Code v1.34.15" so a
   compromised launched-binary doesn't get reused. Probably no — too
   much friction for the threat model.

2. **What's the right default for `claudemesh launch` going forward?**
   Once enforcement ships, do we change the default `--caps` from
   "full member" to "dm + read + state-read"? Tighter but breaks
   existing automation that writes state. Probably worth a
   one-release deprecation warning ("your session will lose state-write
   in v2.0.0 unless you pass --caps state-write") and then flip in
   v2.0.0.

3. **Does `--caps` belong in `~/.claudemesh/config.json` per-mesh
   defaults too?** A user who always launches read-only agents
   wants `caps: ["dm", "read"]` as a personal default. Easy add;
   defer until users ask for it.

4. **Per-tool MCP cap surface?** Out of scope here, but: a `claudemesh
   launch --tools peer:read,memory:write` would be a finer cut than
   broker-verb caps. The broker can't enforce that — it'd live in the
   MCP wrapper / Claude Code's allowedTools. Different layer.

## Test plan

- Pure-logic tests on `senderHasCap` (member-level → always true,
  empty caps → always false, declared caps → exact match).
- Broker integration: launch a session with `--caps dm`, attempt
  `set_state` → expect `forbidden: missing session capability
  "state-write"`.
- v1 attestation still accepted, no `allowed_caps` set, all caps
  permitted (back-compat).
- v2 attestation with empty `allowed_caps` array → broker treats
  as "explicitly empty, no caps allowed" (NOT "full member"). The
  full-member default is "field omitted entirely". Test both.
- Dry-run mode: cap fail increments the counter but the operation
  proceeds. Smoke-test before flipping enforcement.

## Estimate

- Spec review + open-question resolution: 1–2 days.
- Broker change (PeerConn field, attestation v2 accept, per-verb
  enforcement, dry-run mode): 2–3 days.
- CLI change (`--caps` flag, attestation builder, peer list
  surface): 1 day.
- Tests: 1 day.
- Dry-run release window: ≥ 1 week.

Total: ~1 sprint of focused work, plus a dry-run window.
350
.artifacts/specs/2026-05-05-continuous-presence.md
Normal file
@@ -0,0 +1,350 @@
# Continuous presence — lease model + resume token

**Status:** spec, ready for v0.3.0.
**Owner:** alezmad
**Author:** Claude (2026-05-05, follow-up to user-reported "after hours claudemesh disconnects")
**Related:** `2026-05-04-per-session-presence.md` (per-launch ephemeral keypair), `apps/broker/src/index.ts:5430-5436` (current 30s ping loop), `apps/cli/src/daemon/ws-lifecycle.ts` (current backoff reconnect).

## Problem

Today, presence is fused to a single TCP/WS connection. When the
connection breaks — half-dead NAT entries, ISP route changes, laptop
sleep, broker restart — the broker tears down the presence row, fires
`peer_left`, and waits for the daemon to dial a fresh socket and run
the full attestation hello again. Other peers see the user blink
offline → back online. Messages sent to the session during the gap are
either dropped (if it's a `now`/`next` priority DM with no recipient
match) or held in `message_queue` for `low` only.

Concrete symptom (user-reported): `claudemesh peer list` shows zero
peers despite multiple sessions being "up" — they're stuck on
half-dead TCP connections. Daemon hasn't noticed because no `close`
fired. Hours later, kernel TCP keepalive (default Linux: 7200s idle +
9 × 75s probes ≈ 2h11m) finally RSTs the socket, daemon's existing
backoff reconnects, peers reappear. Until then: zombie session.

Two coupled bugs:

1. **No application-layer staleness detection.** Broker pings every
   30s (line 5431) and updates `lastPingAt` on pong, but never
   `terminate()`s a connection that stops returning pongs. Daemon
   doesn't ping at all. Both sides trust the kernel for liveness,
   which only fires after hours.

2. **Presence == connection.** Even once the staleness IS detected
   and the daemon reconnects, peers see a full `peer_left` /
   `peer_joined` cycle for a network blip that took 1–30 seconds.
   Outbound messages during the gap that target the session by
   pubkey route to nothing.

The user's ask: peers should never see a gap during transient
disconnects. Presence should be continuous as long as the *session
intent* is alive, regardless of how many sockets carried it.

## Goal

Presence is a **lease** keyed off the session's stable identity
(`sessionPubkey`), held in broker memory + DB, with a TTL refreshed
on every keepalive. Sockets come and go beneath the lease. Other peers
see continuous online status across reconnects up to the lease TTL.

Specifically:

- A daemon (or per-session WS) can drop and re-establish the WS
  within a configurable grace window (default 90s) without any peer
  observing `peer_left` / `peer_joined`.
- Messages sent to a session while its socket is mid-flap are queued
  and delivered, in order, on the next reattach.
- Reconnect itself is sub-second on the wire when a `resume_token` is
  presented — broker recognises the session, restores the slot, no
  re-attestation round-trip.
- After the grace window expires, the broker fires `peer_left`
  exactly once; on a later reconnect it fires `peer_joined` exactly
  once. No flapping.

## Non-goals

- **Multi-broker handoff.** Out of scope. If the broker process
  restarts, leases are lost and we fall back to today's behavior
  (clean reconnect, peers see one cycle). A future spec can address
  this with a shared lease store (Redis / Postgres LISTEN).
- **Dual-socket on the daemon.** Useful gold-plating but not required
  for the user-facing problem. Single-socket with watchdog +
  resume-token covers the failure modes actually observed (NAT drops,
  ISP blips, sleep <90s).
- **Manual `claudemesh reconnect` CLI.** Not needed; the lease model
  makes it redundant. Re-evaluate if real support cases surface.

## Design

### Lease model

```
sessionPubkey → { transport: "online" | "offline",
                  leaseUntil: Date,
                  ws: WebSocket | null,
                  ...existing PeerConn fields }
```

Today the `connections` Map IS keyed by `presenceId`, which is a fresh
UUID per WS. We change that key to `sessionPubkey` (member-WS:
`memberPubkey`; session-WS: `sessionPubkey`). The PeerConn struct
gains:

```ts
transport: "online" | "offline";
leaseUntil: Date;                     // new Date(Date.now() + LEASE_TTL_MS)
evictionTimer: NodeJS.Timeout | null;
```

### State transitions

**On WS open + hello accepted (initial):**
- Insert into `connections` with `transport: "online"`,
  `leaseUntil: now + 90s`, `evictionTimer: null`.
- Broadcast `peer_joined` (today's behavior).
- Issue `resume_token` (see below) in the `hello_ack`.

**On WS open + hello carries valid `resume_token`:**
- Look up by `sessionPubkey`, verify token signature + freshness
  (TTL <= LEASE_TTL_MS). If valid AND entry exists with
  `transport: "offline"`:
  - Cancel `evictionTimer`.
  - Swap `ws` reference.
  - Set `transport: "online"`, refresh `leaseUntil`.
  - **Do NOT** broadcast `peer_joined`. The lease never expired.
  - Drain any queued DMs accumulated during offline window.
  - Reply `hello_ack` with new `resume_token`.
- If entry exists with `transport: "online"` (token replay attack or
  rapid reconnect race): close old `ws` with `1000, "session_replaced"`
  before swapping. Same as today's `oldConn.ws.close(1000, ...)`
  pattern at lines 1768/1996.
- If no entry exists or token is stale: treat as a fresh hello,
  broadcast `peer_joined`. Token expired = same as a cold start.

**On WS close (any reason):**
- Look up by `sessionPubkey`. If not found, no-op (already evicted).
  If found but its `ws` is not the socket that just closed (it was
  already swapped out by a reattach, e.g. the `session_replaced` close
  above), also no-op; otherwise the replaced socket's close would take
  the new one offline.
- Set `transport: "offline"`, clear `ws` reference.
- Start `evictionTimer = setTimeout(evict, LEASE_TTL_MS)`.
- **Do NOT** broadcast `peer_left`. **Do NOT** delete the entry.
- **Do NOT** call `disconnectPresence(presenceId)` yet.

**On `evictionTimer` fire (lease expired without reattach):**
- Delete from `connections`.
- Broadcast `peer_left` (today's behavior at lines 5167-5189).
- `decMeshCount`.
- `disconnectPresence(presenceId)`.
- Clean up URL watches, stream subs, MCP registry — same as today's
  close handler.
- Audit `peer_left`.

**Watchdog (broker):**
- The 30s ping loop (line 5431) gains a staleness check: if any
  conn's `transport === "online"` and `lastPingAt < now - 75s`, call
  `ws.terminate()`. This converts the half-dead socket into a clean
  `close` event, which fires the lease-offline transition above.
- Same logic on the daemon side (see § Daemon changes).

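Below is a transport-agnostic sketch of these transitions. It leaves out token verification, the DB and the message queue. `LeaseTable`, `LeaseHooks` and the injected `Schedule` are hypothetical names; the real state lives on `PeerConn` in the `connections` Map. Timers are injected so the logic can be exercised without waiting 90s. The sketch also ignores closes from a socket that has already been swapped out, and it only evicts leases that are still offline.

```typescript
// Hypothetical sketch of the lease state machine.
type Cancel = () => void;
type Schedule = (fn: () => void, ms: number) => Cancel;

const realSchedule: Schedule = (fn, ms) => {
  const t = setTimeout(fn, ms);
  return () => clearTimeout(t);
};

interface Lease<S> {
  transport: "online" | "offline";
  socket: S | null;
  leaseUntil: number; // unix ms
  cancelEviction: Cancel | null;
}

interface LeaseHooks<S> {
  peerJoined(key: string): void;
  peerLeft(key: string): void;
  closeReplaced(old: S): void; // e.g. old.close(1000, "session_replaced")
}

class LeaseTable<S> {
  private readonly leases = new Map<string, Lease<S>>();
  private readonly hooks: LeaseHooks<S>;
  private readonly graceMs: number;
  private readonly schedule: Schedule;

  constructor(hooks: LeaseHooks<S>, graceMs = 90_000, schedule: Schedule = realSchedule) {
    this.hooks = hooks;
    this.graceMs = graceMs;
    this.schedule = schedule;
  }

  // `resumeValid`: the caller already checked the token's signature and freshness.
  attach(key: string, socket: S, resumeValid: boolean): void {
    const prev = this.leases.get(key);
    if (prev) {
      prev.cancelEviction?.();
      if (prev.transport === "online" && prev.socket !== null) this.hooks.closeReplaced(prev.socket);
    }
    this.leases.set(key, { transport: "online", socket, leaseUntil: Date.now() + this.graceMs, cancelEviction: null });
    // Only a resume onto an existing lease is silent; anything else is a fresh hello.
    if (!(prev && resumeValid)) this.hooks.peerJoined(key);
  }

  close(key: string, socket: S): void {
    const lease = this.leases.get(key);
    if (!lease || lease.socket !== socket) return; // unknown key or stale socket
    lease.transport = "offline";
    lease.socket = null;
    lease.leaseUntil = Date.now() + this.graceMs;
    lease.cancelEviction = this.schedule(() => this.evict(key), this.graceMs);
  }

  get(key: string): Lease<S> | undefined {
    return this.leases.get(key);
  }

  private evict(key: string): void {
    const lease = this.leases.get(key);
    if (!lease || lease.transport !== "offline") return; // reattached meanwhile
    this.leases.delete(key);
    this.hooks.peerLeft(key);
  }
}
```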
### Resume token

A short opaque string the broker hands the daemon in `hello_ack`.
Format: `mesh-resume.v1.<base64url(JSON-payload)>.<base64url(sig)>`
where `JSON-payload = { sub: <sessionPubkey>, mid: <meshId>, exp:
<unix-ms>, iat: <unix-ms> }` and `sig = ed25519(brokerSigningKey,
JSON-payload)`.

- **Why a token, not just sessionPubkey?** A session needs to prove
  it's the holder of an existing lease without re-running the full
  attestation handshake (which involves member key + parent
  attestation lookup). The token is a server-issued cookie: cheap to
  verify, scoped to a single session, expires with the lease.
- **Storage:** broker keeps the signing key in env (`RESUME_TOKEN_KEY`,
  generated on first boot if missing, persisted to a config row). No
  DB column needed for the tokens themselves — they're verified by
  signature alone.
- **TTL:** equal to LEASE_TTL_MS (90s). After that the daemon must
  re-handshake with full attestation. Refreshed on every successful
  reattach. Note that `exp` is fixed when the token is issued. If
  tokens are refreshed only on reattach, a connection that lived
  longer than 90s already holds a stale token when it drops and
  falls back to the full handshake.
- **Daemon storage:** in-memory only. Lost on daemon restart, which
  is correct: a daemon restart is a real reconnect and should run
  the full hello.

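Below is a sketch of issuing and verifying the `mesh-resume.v1` format with Node's built-in Ed25519 (`crypto.sign` / `crypto.verify` with a `null` algorithm). The function names and the `expected` scope check are assumptions, not the broker's API. The `RESUME_TOKEN_KEY` bootstrap is left out; the test just generates a key pair in memory.

```typescript
import { generateKeyPairSync, sign, verify } from "node:crypto";
import type { KeyObject } from "node:crypto";

// Hypothetical sketch of mesh-resume.v1 tokens.
const PREFIX = "mesh-resume.v1";

interface ResumePayload {
  sub: string; // sessionPubkey
  mid: string; // meshId
  exp: number; // unix ms
  iat: number; // unix ms
}

function issueResumeToken(key: KeyObject, sub: string, mid: string, ttlMs: number, now = Date.now()): string {
  const payload: ResumePayload = { sub, mid, exp: now + ttlMs, iat: now };
  const body = Buffer.from(JSON.stringify(payload), "utf8");
  const sig = sign(null, body, key); // Ed25519 takes no digest name, hence null
  return `${PREFIX}.${body.toString("base64url")}.${sig.toString("base64url")}`;
}

// Returns the payload if the token is well-formed, signed by `key`,
// unexpired, and scoped to the expected session + mesh; otherwise null.
function verifyResumeToken(
  key: KeyObject,
  token: string,
  expected: { sub: string; mid: string },
  now = Date.now(),
): ResumePayload | null {
  if (!token.startsWith(`${PREFIX}.`)) return null;
  const [b64Body, b64Sig, ...rest] = token.slice(PREFIX.length + 1).split(".");
  if (!b64Body || !b64Sig || rest.length > 0) return null;
  const body = Buffer.from(b64Body, "base64url");
  try {
    // Check the signature over the exact signed bytes before parsing them.
    if (!verify(null, body, key, Buffer.from(b64Sig, "base64url"))) return null;
    const p = JSON.parse(body.toString("utf8")) as ResumePayload;
    if (p.sub !== expected.sub || p.mid !== expected.mid) return null;
    return typeof p.exp === "number" && p.exp > now ? p : null;
  } catch {
    return null; // malformed signature or payload
  }
}
```

Checking `sub` and `mid` against the hello keeps a token valid only for the session it was issued to. It still behaves as a bearer credential for that session, which is the replay case that test plan item 5 covers.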
### Wire protocol additions

`hello` (member-WS, session-WS, fresh-launch hello — all three):
```diff
 {
   type: "hello",
   memberPubkey: "...",
   sessionPubkey: "...",                // session-WS only
   attestation: "...",                  // session-WS only
   signature: "...",
+  resumeToken?: "mesh-resume.v1...",   // optional; presence = reattach attempt
   ...
 }
```

`hello_ack`:
```diff
 {
   type: "hello_ack",
   presenceId: "...",
   ...
+  resumeToken: "mesh-resume.v1...",    // always issued; replaces prior on reattach
+  leaseTtlMs: 90000,                   // informational; daemon may use for ping cadence
 }
```

No new message types. Old daemons that don't send `resumeToken` get
today's full-handshake behavior — fully backward compatible.

### Message queue during grace window

Today: DMs to a presence whose WS is closed → routed to
`message_queue` only for `priority: low`; `now`/`next` either route
to a different connected session of the same member or drop.

Change: when broker would route to a session whose
`transport === "offline"` (lease still valid), enqueue regardless of
priority. On reattach, the existing inbox-drain path
(`maybePushQueuedMessages` at line 967) flushes them in order. The
`message_queue` already has the schema for this; we're just relaxing
the priority gate when the target is in grace.

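Below is the relaxed gate as one routing decision. `TargetLease` is a hypothetical minimal view of the recipient's `PeerConn`. `"drop_or_reroute"` stands in for today's handling of `now`/`next` (route to another connected session of the same member, or drop). That handling is unchanged and not modelled here.

```typescript
// Hypothetical sketch of DM routing with the grace-window change.
type Priority = "now" | "next" | "low";

interface TargetLease {
  transport: "online" | "offline";
  leaseUntil: number; // unix ms
}

type Route = "push" | "enqueue" | "drop_or_reroute";

function routeDm(target: TargetLease | undefined, priority: Priority, now = Date.now()): Route {
  if (target?.transport === "online") return "push";
  // New rule: the recipient is mid-flap with a valid lease, so queue at any
  // priority and let the reattach drain deliver it.
  if (target && target.leaseUntil > now) return "enqueue";
  // No live lease: today's rule, only `low` is held in message_queue.
  return priority === "low" ? "enqueue" : "drop_or_reroute";
}
```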
### Constants

```ts
const LEASE_TTL_MS = 90_000;            // grace window after WS close
const PING_INTERVAL_MS = 30_000;        // unchanged
const STALE_PONG_THRESHOLD_MS = 75_000; // 2.5x ping interval
const RESUME_TOKEN_TTL_MS = LEASE_TTL_MS;
```

`LEASE_TTL_MS` = 90s rationale: long enough to absorb a sleep/resume
cycle, NAT timeout, ISP route flap, mobile→wifi handover. Short
enough that a true crash (daemon killed, machine off) clears the
session within 90s of the broker seeing the socket close, so peers
don't see ghost online status forever. For a silent drop such as a
machine powering off, add the 75-105s the ping watchdog needs to
notice. Configurable via env (`LEASE_TTL_MS`) for self-hosted brokers.

## Daemon changes

### Watchdog

In `ws-lifecycle.ts`, add an `idleWatchdog` parallel to the existing
backoff/reconnect machinery:

```ts
let lastActivity = Date.now(); // bumped on every incoming message + pong
const watchdog = setInterval(() => {
  if (Date.now() - lastActivity > STALE_PONG_THRESHOLD_MS) {
    log("warn", "ws_stale_terminate", { url: opts.url });
    sock.terminate(); // fires existing close handler → reconnect path
  } else if (sock.readyState === sock.OPEN) {
    sock.ping(); // same 30s cadence as the broker; the broker's pong bumps lastActivity
  }
}, PING_INTERVAL_MS);
sock.on("message", () => { lastActivity = Date.now(); });
sock.on("pong", () => { lastActivity = Date.now(); });
```

Call `clearInterval(watchdog)` in the close handler and in the
explicit `close()` path.

### Resume token in hello

`apps/cli/src/daemon/broker.ts:136` and equivalent in
`session-broker.ts`: persist the `resumeToken` from each successful
`hello_ack` into a private field, include it in the next
`buildHello()` call. On daemon restart the field is empty → cold
start, exactly today's behavior.

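Below is a sketch of the daemon-side bookkeeping: remember the newest token from each `hello_ack` and attach it to the next hello. The message shapes are trimmed to the fields this needs. `ResumeTokenStore` and `withToken` are hypothetical names, not the existing `buildHello()` code.

```typescript
// Hypothetical sketch; real hello / hello_ack messages carry more fields.
interface HelloAck {
  type: "hello_ack";
  presenceId: string;
  resumeToken?: string; // absent when talking to an older broker
}

interface Hello {
  type: "hello";
  memberPubkey: string;
  resumeToken?: string;
}

class ResumeTokenStore {
  // In-memory only: a daemon restart starts with no token, i.e. a cold hello.
  private token: string | null = null;

  onHelloAck(ack: HelloAck): void {
    if (ack.resumeToken) this.token = ack.resumeToken; // newest token replaces the prior one
  }

  withToken(hello: Hello): Hello {
    return this.token === null ? hello : { ...hello, resumeToken: this.token };
  }
}
```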
### No CLI changes

`claudemesh peer list` keeps reading the broker's `connections` Map
which now reflects continuous presence. Users see online sessions as
online during transient blips. No UX surface changes.

## Migration

- New broker is fully backward compatible with old daemons (resume
  token is optional, defaults fall through to today's path).
- New daemons against an old broker: the old broker never issues a
  `resumeToken` (and would ignore one if sent), so every reconnect
  runs the full handshake, same as today.
- DB migration: none. `presence` table semantics unchanged. The
  `disconnectedAt` column is now set only on lease eviction (>90s),
  not on every WS close. This is a behavioral change but not a
  schema change.
- Add ENV var `RESUME_TOKEN_KEY` (broker generates on first boot if
  unset, persists to a singleton config row).

## Test plan

1. **Sleep test:** kill -STOP the daemon for 60s, then kill -CONT.
   Expect: peers never see `peer_left`. Depending on where the pause
   falls in the 30s ping cycle, the socket either survives (pongs
   resume before the 75s stale threshold) or a watchdog on either
   side terminates it around resume. In the latter case the daemon
   reconnects with resume_token within 1-2s, well inside the 90s
   grace window that starts at the close.

2. **Hard offline:** kill -STOP for 240s, kill -CONT. Expect: the
   broker watchdog terminates the socket 75-105s after the last pong
   (75s threshold, checked every 30s), the 90s grace window then
   expires, and peers see exactly one `peer_left` (roughly 165-195s
   after the last pong). After the daemon resumes and reconnects they
   see exactly one `peer_joined` (resume token is now stale; full
   handshake runs). The pause has to outlast watchdog delay plus
   grace; a 120s pause would resume inside the grace window.

3. **NAT drop simulation:** `iptables -A OUTPUT -p tcp --dport 443
   -j DROP` for 60s on the daemon host, then remove the rule. Expect:
   broker pings stop landing, broker-side watchdog calls
   `ws.terminate()` at t=75s, lease enters grace, daemon's own
   watchdog fires within ~30s, daemon reconnects with resume_token,
   peers never see a flap.

4. **Message-during-grace:** while a target session is in grace
   (offline, lease valid), send a `priority: now` DM. Expect: queued
   in `message_queue`, delivered exactly once on reattach, no
   `peer_left` visible to sender, ack returns delivered.

5. **Replay attack:** capture a resume_token in flight, replay it
   against a different broker connection while the original session
   is still online. Expect: broker treats it as a reconnect for an
   already-online session → closes old WS with `session_replaced`,
   new WS takes over. Equivalent to today's session-replacement
   semantics; the original session detects the close and either
   reconnects (if it's still alive) or gives up.

6. **Token forgery:** send a `resumeToken` not signed by the broker.
   Expect: signature check fails, broker treats hello as a fresh
   handshake (or rejects if the rest of the hello is invalid).

## Open questions

- **Should `peer list` expose a `transport` field** so callers can
  distinguish "leased but offline" from "online"? Default no — the
  abstraction we're selling is "they're online." But debugging may
  want it; gate it behind `--all` or `--debug`.
- **What about the broker-side `mcpRegistry` cleanup?** Today we
  delete non-persistent MCP entries on WS close (line 5217). With
  leases, we should defer that to lease eviction, not WS close.
  Otherwise an MCP server registered by a session disappears every
  time its WS reconnects.

## Build order

1. **Broker lease model** — change `connections` keying, add
   `transport`/`leaseUntil`/`evictionTimer`, refactor close handler
   to start grace timer instead of immediate teardown, refactor
   eviction path. (~80 lines.)
2. **Resume token** — signing key bootstrap, token issue/verify,
   wire format, hello_ack changes. (~50 lines + 1 config row.)
3. **Daemon watchdog** — `ws-lifecycle.ts` adds `idleWatchdog` and
   stores `resumeToken` from acks. (~25 lines.)
4. **Daemon hello** — pass `resumeToken` in next `buildHello()`.
   (~10 lines across `broker.ts` + `session-broker.ts`.)
5. **Broker watchdog** — extend the 30s ping loop with
   `terminate()`-on-stale logic. (~15 lines.)
6. **Queue-during-grace** — relax priority gate in DM routing.
   (~5 lines.)
7. **Spec docs** — update `docs/protocol.md` with resume_token,
   lease semantics. (~30 lines.)
8. **Tests** — six scenarios above. Likely ~3 new test files.

Estimated total: one focused day. The broker lease model is the
load-bearing change; everything else slots in cleanly once that's done.
@@ -369,8 +369,19 @@ export interface ConnectParams {
  pid: number;
  cwd: string;
  groups?: Array<{ name: string; role?: string }>;
  /**
   * v2 agentic-comms (M1) — connection role.
   * 'control-plane' — daemon WS (hidden from user-facing peer lists).
   * 'session' — per-Claude-Code-session WS (default).
   * 'service' — autonomous bots/services attached to the mesh.
   * Optional for backwards compatibility; defaults to 'session'.
   */
  role?: PresenceRole;
}

/** v2 agentic-comms (M1): typed connection roles. */
export type PresenceRole = "control-plane" | "session" | "service";

/** Create a presence row for a new WS connection. */
export async function connectPresence(
  params: ConnectParams,
@@ -389,6 +400,7 @@ export async function connectPresence(
      statusSource: "jsonl",
      statusUpdatedAt: now,
      groups: params.groups ?? [],
      role: params.role ?? "session",
      connectedAt: now,
      lastPingAt: now,
    })
@@ -415,6 +427,21 @@ export async function heartbeat(presenceId: string): Promise<void> {
    .where(eq(presence.id, presenceId));
}

/**
 * Restore a presence row to online state on lease reattach: clear
 * `disconnectedAt` and bump `lastPingAt`. Needed because the DB-level
 * stale-presence sweeper may have flipped the row to disconnected
 * during the grace window — the lease is in-memory truth, but other
 * code paths read presence.disconnectedAt directly.
 */
export async function restorePresence(presenceId: string): Promise<void> {
  const now = new Date();
  await db
    .update(presence)
    .set({ disconnectedAt: null, lastPingAt: now })
    .where(eq(presence.id, presenceId));
}

// --- Peer discovery ---

/** Return all active (connected) presences in a mesh, joined with member info. */
@@ -431,6 +458,11 @@ export async function listPeersInMesh(
    sessionId: string;
    cwd: string;
    connectedAt: Date;
    /** v2 agentic-comms (M1): connection role. CLI uses this to hide
     * control-plane daemons from user-facing lists. Wire-level field
     * is `peerRole` to avoid collision with 1.31.5's top-level `role`
     * lift of profile.role (user-supplied string like "lead"). */
    peerRole: PresenceRole;
  }>
> {
  const rows = await db
@@ -445,6 +477,7 @@ export async function listPeersInMesh(
|
||||
sessionId: presence.sessionId,
|
||||
cwd: presence.cwd,
|
||||
connectedAt: presence.connectedAt,
|
||||
peerRole: presence.role,
|
||||
})
|
||||
.from(presence)
|
||||
.innerJoin(memberTable, eq(presence.memberId, memberTable.id))
|
||||
@@ -469,6 +502,7 @@ export async function listPeersInMesh(
|
||||
sessionId: r.sessionId,
|
||||
cwd: r.cwd,
|
||||
connectedAt: r.connectedAt,
|
||||
peerRole: (r.peerRole ?? "session") as PresenceRole,
|
||||
}));
|
||||
}
|
||||
|
||||
@@ -2311,6 +2345,22 @@ function deliverablePriorities(status: PeerStatus): Priority[] {
* targetSpec routing: matches either the member's pubkey directly or
* the broadcast wildcard ("*"). Channel/tag resolution is per-mesh
* config that lives outside this function.
*
* v2 agentic-comms (M1): two-phase claim/deliver with a 30s lease.
*
* The legacy implementation set `delivered_at = NOW()` in the same
* UPDATE that selected the row. If the recipient WS was no longer
* OPEN at push time, the message dropped silently (the row read as
* "delivered" so the next reconnect's drain skipped it).
*
* The new behaviour:
* - claim sets (claimed_at, claim_id, claim_expires_at = NOW() + 30s)
* - delivered_at stays NULL until the recipient acks via `client_ack`
* - re-eligibility predicate accepts rows whose claim has expired,
* so dropped pushes are redelivered (at-least-once)
*
* `claimerPresenceId` is recorded on the row purely for debugging — it
* never gates re-claim; expiry alone does.
*/
export async function drainForMember(
meshId: string,
@@ -2320,6 +2370,7 @@ export async function drainForMember(
sessionPubkey?: string,
excludeSenderSessionPubkey?: string,
memberGroups?: string[],
claimerPresenceId?: string,
): Promise<
Array<{
id: string;
@@ -2385,6 +2436,11 @@ export async function drainForMember(
// (with id as tiebreaker so equal-timestamp rows stay deterministic).
// Sorting in SQL avoids JS Date's millisecond-precision collapse of
// Postgres microsecond timestamps.
//
// v2 (M1): claim sets the lease columns, NOT delivered_at. Re-eligibility
// accepts unclaimed rows AND rows with an expired claim (NULL or past
// NOW()). delivered_at stays NULL until a `client_ack` lands.
const claimerId = claimerPresenceId ?? null;
const result = await db.execute<{
id: string;
priority: string;
@@ -2398,12 +2454,15 @@ export async function drainForMember(
}>(sql`
WITH claimed AS (
UPDATE mesh.message_queue AS mq
SET delivered_at = NOW()
SET claimed_at = NOW(),
claim_id = ${claimerId},
claim_expires_at = NOW() + INTERVAL '30 seconds'
FROM mesh.member AS m
WHERE mq.id IN (
SELECT id FROM mesh.message_queue
WHERE mesh_id = ${meshId}
AND delivered_at IS NULL
AND (claimed_at IS NULL OR claim_expires_at IS NULL OR claim_expires_at < NOW())
AND priority::text IN (${priorityList})
AND (target_spec = ${memberPubkey} OR target_spec = '*'${sessionPubkey ? sql` OR target_spec = ${sessionPubkey}` : sql``} OR target_spec IN (${groupTargetList})${topicTargetList ? sql` OR target_spec IN (${topicTargetList})` : sql``})
${excludeSenderSessionPubkey ? sql`AND NOT (target_spec IN ('*') AND sender_session_pubkey = ${excludeSenderSessionPubkey})` : sql``}
@@ -2445,11 +2504,93 @@ export async function drainForMember(
}));
}

/**
* v2 agentic-comms (M1): mark a message_queue row as delivered.
*
* Called when the recipient WS replies with a `client_ack` carrying the
* original `client_message_id`. Lookup is scoped to (mesh_id, member_id)
* so a malicious peer can't ack messages addressed to others. Returns
* the number of rows marked (0 = unknown id, already delivered, or wrong
* recipient).
*/
export async function markDelivered(params: {
meshId: string;
/** memberId of the WS that's claiming to have received this message. */
recipientMemberId: string;
recipientMemberPubkey: string;
recipientSessionPubkey?: string | null;
clientMessageId?: string | null;
brokerMessageId?: string | null;
}): Promise<number> {
const {
meshId,
recipientMemberPubkey,
recipientSessionPubkey,
clientMessageId,
brokerMessageId,
} = params;
if (!clientMessageId && !brokerMessageId) return 0;

// Prefer broker id when available; falls back to clientMessageId.
// Scope to (mesh_id, target_spec ∈ {member-pubkey, session-pubkey, '*', @group, #topic}).
// For minimal blast radius we only allow direct/broadcast acks here —
// group/topic acks would need the same membership expansion drainForMember
// does and we'd rather under-ack than over-ack (re-claim is cheap).
const result = await db.execute<{ id: string }>(sql`
UPDATE mesh.message_queue
SET delivered_at = NOW()
WHERE mesh_id = ${meshId}
AND delivered_at IS NULL
AND (
${brokerMessageId ? sql`id = ${brokerMessageId}` : sql`FALSE`}
OR ${clientMessageId ? sql`client_message_id = ${clientMessageId}` : sql`FALSE`}
)
AND (
target_spec = ${recipientMemberPubkey}
${recipientSessionPubkey ? sql`OR target_spec = ${recipientSessionPubkey}` : sql``}
OR target_spec = '*'
OR target_spec LIKE '@%'
OR target_spec LIKE '#%'
)
RETURNING id
`);
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{ id: string }>;
return rows.length;
}

/**
* v2 agentic-comms (M1): reap expired claims so dropped pushes redeliver.
*
* Runs every 15s. Clears (claimed_at, claim_id, claim_expires_at) on rows
* where the lease has expired and no `client_ack` arrived. The next
* `drainForMember` call will pick the row up again — at-least-once.
*
* Returns the number of rows reaped.
*/
export async function sweepExpiredClaims(): Promise<number> {
const result = await db.execute<{ id: string }>(sql`
UPDATE mesh.message_queue
SET claimed_at = NULL,
claim_id = NULL,
claim_expires_at = NULL
WHERE delivered_at IS NULL
AND claim_expires_at IS NOT NULL
AND claim_expires_at < NOW()
RETURNING id
`);
const rows = ((result as unknown as { rows?: unknown[] }).rows ?? (result as unknown as unknown[])) as Array<{ id: string }>;
return rows.length;
}

// --- Lifecycle ---

let ttlTimer: ReturnType<typeof setInterval> | null = null;
let pendingTimer: ReturnType<typeof setInterval> | null = null;
let staleTimer: ReturnType<typeof setInterval> | null = null;
let claimSweepTimer: ReturnType<typeof setInterval> | null = null;

/** v2 agentic-comms (M1): how often we reap expired message claims. */
const CLAIM_SWEEP_INTERVAL_MS = 15_000;

/** Start background sweepers. Idempotent. */
export function startSweepers(): void {
@@ -2467,6 +2608,13 @@ export function startSweepers(): void {
console.error("[broker] stale presence sweep:", e),
);
}, 30_000);
claimSweepTimer = setInterval(() => {
sweepExpiredClaims()
.then((n) => {
if (n > 0) console.log(`[broker] expired claims swept: ${n}`);
})
.catch((e) => console.error("[broker] claim sweep:", e));
}, CLAIM_SWEEP_INTERVAL_MS);
// Orphan-message sweep every hour; cheap, rows are all >7d at deletion time.
setInterval(() => {
sweepOrphanMessages()
@@ -2480,9 +2628,11 @@ export async function stopSweepers(): Promise<void> {
if (ttlTimer) clearInterval(ttlTimer);
if (pendingTimer) clearInterval(pendingTimer);
if (staleTimer) clearInterval(staleTimer);
if (claimSweepTimer) clearInterval(claimSweepTimer);
ttlTimer = null;
pendingTimer = null;
staleTimer = null;
claimSweepTimer = null;
await db
.update(presence)
.set({ disconnectedAt: new Date() })

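
The two-phase claim/deliver cycle in the hunks above (`drainForMember` claims, `markDelivered` closes a row out on `client_ack`, `sweepExpiredClaims` reaps) can be sketched as an in-memory model. This is a simplification under stated assumptions: `QueueRow`, `newRow`, `claimBatch`, `ack` and `sweep` are hypothetical names, the clock is an explicit millisecond argument instead of `NOW()`, and target-spec, priority and recipient scoping are left out. The walk-through shows an unacked push becoming claimable again once its 30 s lease has lapsed.

```typescript
// Hypothetical in-memory model of the message_queue lease columns.
// Times are plain milliseconds so the walk-through is deterministic.
const LEASE_MS = 30_000; // mirrors INTERVAL '30 seconds'

interface QueueRow {
  id: string;
  deliveredAt: number | null;
  claimedAt: number | null;
  claimId: string | null;
  claimExpiresAt: number | null;
}

function newRow(id: string): QueueRow {
  return { id, deliveredAt: null, claimedAt: null, claimId: null, claimExpiresAt: null };
}

// Re-eligibility: undelivered AND (never claimed OR lease missing/expired).
function isClaimable(row: QueueRow, now: number): boolean {
  return (
    row.deliveredAt === null &&
    (row.claimedAt === null || row.claimExpiresAt === null || row.claimExpiresAt < now)
  );
}

// Phase 1: claim. Sets the lease columns, leaves deliveredAt untouched.
function claimBatch(rows: QueueRow[], claimerId: string | null, now: number): string[] {
  const claimed: string[] = [];
  for (const row of rows) {
    if (!isClaimable(row, now)) continue;
    row.claimedAt = now;
    row.claimId = claimerId;
    row.claimExpiresAt = now + LEASE_MS;
    claimed.push(row.id);
  }
  return claimed;
}

// Phase 2: the recipient's client_ack marks the row delivered.
function ack(rows: QueueRow[], id: string, now: number): number {
  let marked = 0;
  for (const row of rows) {
    if (row.id === id && row.deliveredAt === null) {
      row.deliveredAt = now;
      marked++;
    }
  }
  return marked;
}

// Reaper: clear expired, unacked leases.
function sweep(rows: QueueRow[], now: number): number {
  let reaped = 0;
  for (const row of rows) {
    if (row.deliveredAt === null && row.claimExpiresAt !== null && row.claimExpiresAt < now) {
      row.claimedAt = null;
      row.claimId = null;
      row.claimExpiresAt = null;
      reaped++;
    }
  }
  return reaped;
}

// Walk-through: m1 is acked; m2's push is lost and gets redelivered.
const queue = [newRow("m1"), newRow("m2")];
const first = claimBatch(queue, "presence-A", 0); // ["m1", "m2"]
ack(queue, "m1", 1_000); // m1 delivered
const tooEarly = claimBatch(queue, "presence-B", 10_000); // [] (m2 lease still held)
const reaped = sweep(queue, 31_000); // 1 (m2)
const second = claimBatch(queue, "presence-B", 31_000); // ["m2"]
```

In this model, as in the SQL predicate of `drainForMember`, an expired lease is already claimable without the sweep; the sweeper's role is to reset the lease columns so unacked rows stop looking claimed.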
@@ -41,6 +41,7 @@ import {
grantFileKey,
handleHookSetStatus,
heartbeat,
restorePresence,
insertFileKeys,
joinGroup,
joinMesh,
@@ -49,6 +50,7 @@ import {
listFiles,
listPeersInMesh,
listState,
markDelivered,
listTasks,
queueMessage,
recallMemory,
@@ -155,11 +157,53 @@ interface PeerConn {
bio?: string;
capabilities?: string[];
};
/** v2 agentic-comms presence taxonomy. Mirrors the value passed to
* `recordPresence`. Used by the kick handler to refuse no-op kicks
* on long-lived control-plane connections (daemon, dashboard) that
* would just auto-reconnect. */
peerRole: "control-plane" | "session" | "service";
/** Last time this connection's WS replied to a broker ping. Bumped
* in the `pong` handler. Used by the staleness watchdog to detect
* half-dead TCP/NAT-dropped connections that the kernel hasn't yet
* RST'd (Linux default keepalive ≈ 2hrs). */
lastPongAt: number;
/** Lease state: "online" while the WS is healthy, "offline" during
* the GRACE window after a WS close. While offline, the entry stays
* in `connections` so peer_list / sendToPeer still see it; DMs land
* in the message_queue (sendToPeer no-ops on dead WS, but the queue
* row stays with deliveredAt=NULL and drains on reattach). After
* GRACE_MS without a reattach, evictionTimer fires the full peer_left
* + cleanup. Reattach (same sessionPubkey hello arriving on a fresh
* WS) cancels the timer, swaps in the new ws, restores online. */
leaseState: "online" | "offline";
/** When the lease will be evicted if no reattach happens. 0 when online. */
leaseUntil: number;
/** Timer that fires evictPresenceFully(presenceId) at leaseUntil. null when online. */
evictionTimer: NodeJS.Timeout | null;
}

const connections = new Map<string, PeerConn>();
const connectionsPerMesh = new Map<string, number>();

/**
* Lease grace window — how long after a WS close the broker will hold
* the presence row open before evicting and broadcasting peer_left.
*
* 90s: long enough to absorb a sleep/resume cycle, NAT timeout, ISP
* route flap, mobile→wifi handover, broker restart of the daemon's
* machine. Short enough that a true crash (machine off, daemon killed)
* clears the session within 90s — peers don't see ghost online status
* forever.
*
* During grace: lease stays in `connections`, peer_list keeps showing
* the session as online to other peers, DMs route through message_queue
* (sendToPeer no-ops on dead WS, drain happens on reattach). On
* reattach (same sessionPubkey hello on a new WS): silent swap, no
* peer_joined / peer_left visible to anyone. After grace expires:
* full eviction (peer_left + cleanup) fires exactly once.
*/
const GRACE_MS = 90_000;

// Rate limiter for /tg/token endpoint (IP → count, cleared hourly)
const tgTokenRateLimit = new Map<string, number>();
setInterval(() => tgTokenRateLimit.clear(), 60 * 60_000).unref();
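
The lease fields on `PeerConn` and the `GRACE_MS` comment describe a small state machine: a WS close moves the entry to `offline` with a deadline, a reattach inside the window silently restores `online`, and only an expired deadline produces `peer_left`. A minimal sketch of that machine, assuming a caller-supplied clock: `LeaseTable`, `open`, `close`, `reattach` and `tick` are hypothetical names, and `tick(now)` stands in for the per-connection `evictionTimer` the broker arms with a real timer.

```typescript
// Hypothetical model of the PeerConn lease fields. The broker arms a
// timer per connection; here an explicit tick(now) stands in for it.
const GRACE_MS = 90_000; // same value as the broker constant

type LeaseState = "online" | "offline";

interface Lease {
  leaseState: LeaseState;
  leaseUntil: number; // 0 while online
}

class LeaseTable {
  private leases = new Map<string, Lease>();
  readonly peerLeft: string[] = []; // ids for which peer_left would be broadcast

  open(id: string): void {
    this.leases.set(id, { leaseState: "online", leaseUntil: 0 });
  }

  // WS closed: keep the entry listed and start the grace window.
  close(id: string, now: number): void {
    const lease = this.leases.get(id);
    if (!lease || lease.leaseState === "offline") return;
    lease.leaseState = "offline";
    lease.leaseUntil = now + GRACE_MS;
  }

  // Same identity said hello on a fresh WS. true means a silent reattach.
  reattach(id: string): boolean {
    const lease = this.leases.get(id);
    if (!lease || lease.leaseState !== "offline") return false;
    lease.leaseState = "online";
    lease.leaseUntil = 0;
    return true;
  }

  // Evict every offline lease whose deadline has passed. Deleting the
  // entry makes a repeated tick for the same id a no-op.
  tick(now: number): void {
    for (const [id, lease] of this.leases) {
      if (lease.leaseState === "offline" && lease.leaseUntil <= now) {
        this.leases.delete(id);
        this.peerLeft.push(id);
      }
    }
  }

  isListed(id: string): boolean {
    return this.leases.has(id);
  }
}

const table = new LeaseTable();
table.open("s1");
table.open("s2");
table.close("s1", 0);
table.close("s2", 0);
const silent = table.reattach("s1"); // true: the blip was inside the grace window
table.tick(90_000); // s2's deadline is reached, so only s2 is evicted
table.tick(200_000); // nothing left to evict
```

Removing the entry on eviction plays the same role as the `if (!conn) return` guard at the top of `evictPresenceFully`: a second firing finds nothing and does nothing.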
@@ -524,6 +568,97 @@ function sendToPeer(presenceId: string, msg: WSServerMessage): void {
}
}

/**
* Run the full presence-cleanup path: broadcast peer_left, decMeshCount,
* disconnectPresence in DB, audit, clean up URL watches / streams /
* MCP entries / clock. Removes the entry from `connections`.
*
* Called from two places:
* 1. `ws.on("close")` when the closing WS belongs to a connection
* with no active lease (no grace) — i.e. the lease had already
* been evicted, or the close fires before lease is established.
* 2. The grace-window evictionTimer when no reattach happened in
* GRACE_MS. This is the "presence is really gone" path.
*
* Idempotent: re-entering when the connections entry is already gone
* is a no-op.
*/
async function evictPresenceFully(presenceId: string): Promise<void> {
const conn = connections.get(presenceId);
if (!conn) return; // already evicted
if (conn.evictionTimer) {
clearTimeout(conn.evictionTimer);
conn.evictionTimer = null;
}
connections.delete(presenceId);
decMeshCount(conn.meshId);

const leaveMsg: WSPushMessage = {
type: "push",
subtype: "system",
event: "peer_left",
eventData: {
name: conn.displayName,
pubkey: conn.sessionPubkey ?? conn.memberPubkey,
},
messageId: crypto.randomUUID(),
meshId: conn.meshId,
senderPubkey: "system",
priority: "low",
nonce: "",
ciphertext: "",
createdAt: new Date().toISOString(),
};
for (const [pid, peer] of connections) {
if (peer.meshId !== conn.meshId) continue;
// Don't tell the user's own other sessions they "left" when one
// of their Claude Code instances closes. Same pubkey = same user.
if (peer.memberPubkey === conn.memberPubkey) continue;
sendToPeer(pid, leaveMsg);
}

await disconnectPresence(presenceId);
void audit(conn.meshId, "peer_left", conn.memberId, conn.displayName, {});

// URL watches owned by this presence — interval would otherwise
// happily fetch forever after the peer is gone.
for (const [watchId, watch] of urlWatches) {
if (watch.presenceId === presenceId) {
clearInterval(watch.timer);
urlWatches.delete(watchId);
}
}
// Stream subscriptions for this presence.
for (const [key, subs] of streamSubscriptions) {
subs.delete(presenceId);
if (subs.size === 0) streamSubscriptions.delete(key);
}
// MCP servers registered by this presence.
for (const [key, entry] of mcpRegistry) {
if (entry.presenceId === presenceId) {
if (entry.persistent) {
// Keep persistent entries but mark offline
entry.online = false;
entry.offlineSince = new Date().toISOString();
entry.presenceId = "";
} else {
mcpRegistry.delete(key);
}
}
}
// Auto-pause clock when mesh becomes empty.
if (!connectionsPerMesh.has(conn.meshId)) {
const clock = meshClocks.get(conn.meshId);
if (clock && clock.timer) {
clearInterval(clock.timer);
clock.timer = null;
clock.paused = true;
log.info("clock auto-paused (mesh empty)", { mesh_id: conn.meshId });
}
}
log.info("ws evict full", { presence_id: presenceId });
}

async function maybePushQueuedMessages(
presenceId: string,
excludeSenderSessionPubkey?: string,
@@ -546,6 +681,7 @@ async function maybePushQueuedMessages(
conn.sessionPubkey ?? undefined,
excludeSenderSessionPubkey,
conn.groups.map((g) => g.name),
presenceId,
);
log.info("maybePush", {
presence_id: presenceId,
@@ -1659,6 +1795,10 @@ async function handleHello(
lastSeenAt?: string;
restoredGroups?: Array<{ name: string; role?: string }>;
restoredStats?: unknown;
/** True when this hello reattached an existing offline lease — caller
* must skip the peer_joined broadcast and the services-list ack
* augmentation. The session was never visibly absent from peers. */
silent?: boolean;
} | null> {
// Validate sessionPubkey shape — it becomes a routable identity in
// listPeers/drainForMember, so arbitrary strings let a client claim
@@ -1751,6 +1891,61 @@ async function handleHello(
const initialGroups = helloHasGroups
? hello.groups!
: (saved?.groups?.length ? saved.groups : (member.defaultGroups ?? []));
// Reattach check: if an offline-leased lease exists for the same
// stable identity (sessionPubkey when present, otherwise sessionId
// for member-WS), this hello is a transient reconnect within the
// grace window — swap the WS reference, clear the eviction timer,
// restore online state. No peer_joined broadcast — peers never saw
// this session leave.
for (const [pid, oldConn] of connections) {
if (oldConn.meshId !== hello.meshId) continue;
if (oldConn.leaseState !== "offline") continue;
const matchByPubkey =
!!hello.sessionPubkey
&& oldConn.sessionPubkey === hello.sessionPubkey;
const matchBySessionId =
!hello.sessionPubkey
&& !oldConn.sessionPubkey
&& oldConn.sessionId === hello.sessionId
&& oldConn.memberPubkey === hello.pubkey;
if (!matchByPubkey && !matchBySessionId) continue;

if (oldConn.evictionTimer) {
clearTimeout(oldConn.evictionTimer);
oldConn.evictionTimer = null;
}
oldConn.ws = ws;
oldConn.leaseState = "online";
oldConn.leaseUntil = 0;
oldConn.lastPongAt = Date.now();
// Refresh mutable fields from the new hello — the same session may
// have moved cwd / changed display name across the blip.
oldConn.cwd = hello.cwd;
if (hello.displayName) oldConn.displayName = hello.displayName;
log.info("ws hello reattach (lease)", {
presence_id: pid,
session_pubkey: hello.sessionPubkey?.slice(0, 12) ?? "(member-WS)",
session_id: hello.sessionId,
});
// Reset DB row to online: the stale-presence sweeper may have set
// disconnectedAt during the grace window. Lease is in-memory truth
// but downstream code paths read presence.disconnectedAt directly.
void restorePresence(pid);
// Drain any queued DMs that landed during the offline window.
void maybePushQueuedMessages(pid);
return {
presenceId: pid,
memberDisplayName: oldConn.displayName,
memberProfile: {
roleTag: member.roleTag,
groups: member.defaultGroups ?? [],
messageMode: member.messageMode ?? "push",
},
meshPolicy,
silent: true,
};
}

// Session-id dedup: if this session_id already has an active presence,
// disconnect the ghost. Happens when a client reconnects after a
// network blip or broker restart before the 90s stale sweeper runs.
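
The reattach loop in `handleHello` matches an offline lease by one of two identities. Pulling that predicate out as a pure function makes the two rules easy to test in isolation. This is a sketch: `ConnIdentity`, `HelloIdentity` and `findReattachTarget` are hypothetical names, and the shapes keep only the fields the match reads. `handleSessionHello` applies only the first rule, since a session hello always carries a `sessionPubkey`.

```typescript
// Hypothetical, trimmed-down shapes: only the fields the match reads.
interface ConnIdentity {
  meshId: string;
  leaseState: "online" | "offline";
  sessionPubkey: string | null;
  sessionId: string;
  memberPubkey: string;
}

interface HelloIdentity {
  meshId: string;
  sessionPubkey?: string;
  sessionId: string;
  pubkey: string; // member pubkey
}

// Returns the presence id of the offline lease this hello should take
// over, or null when the hello must go through the normal join path.
function findReattachTarget(
  conns: Map<string, ConnIdentity>,
  hello: HelloIdentity,
): string | null {
  for (const [pid, c] of conns) {
    if (c.meshId !== hello.meshId || c.leaseState !== "offline") continue;
    // Rule 1: a session pubkey, when present, is the stable identity.
    const byPubkey = !!hello.sessionPubkey && c.sessionPubkey === hello.sessionPubkey;
    // Rule 2: member-keyed WS (no session pubkey on either side) matches on
    // sessionId AND member pubkey, so one member cannot adopt another's lease.
    const bySessionId =
      !hello.sessionPubkey &&
      !c.sessionPubkey &&
      c.sessionId === hello.sessionId &&
      c.memberPubkey === hello.pubkey;
    if (byPubkey || bySessionId) return pid;
  }
  return null;
}

const conns = new Map<string, ConnIdentity>([
  ["p1", { meshId: "m", leaseState: "offline", sessionPubkey: "sk1", sessionId: "s1", memberPubkey: "mk" }],
  ["p2", { meshId: "m", leaseState: "offline", sessionPubkey: null, sessionId: "s2", memberPubkey: "mk" }],
  ["p3", { meshId: "m", leaseState: "online", sessionPubkey: "sk3", sessionId: "s3", memberPubkey: "mk" }],
]);
const r1 = findReattachTarget(conns, { meshId: "m", sessionPubkey: "sk1", sessionId: "other", pubkey: "x" }); // "p1"
const r2 = findReattachTarget(conns, { meshId: "m", sessionId: "s2", pubkey: "mk" }); // "p2"
const r3 = findReattachTarget(conns, { meshId: "m", sessionId: "s2", pubkey: "someone-else" }); // null
const r4 = findReattachTarget(conns, { meshId: "m", sessionPubkey: "sk3", sessionId: "s3", pubkey: "mk" }); // null (p3 is online)
```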
@@ -1772,6 +1967,11 @@ async function handleHello(
pid: hello.pid,
cwd: hello.cwd,
groups: initialGroups,
// v2 agentic-comms (M1): the regular member-keyed `hello` path is
// used by long-lived control-plane connections (claudemesh daemon,
// dashboard, automation). Per-Claude-Code sessions go through
// `session_hello` and get role='session'.
role: "control-plane",
});
const effectiveDisplayName = hello.displayName || member.displayName;
connections.set(presenceId, {
@@ -1790,12 +1990,18 @@ async function handleHello(
groups: initialGroups,
visible: saved?.visible ?? true,
profile: saved?.profile ?? {},
peerRole: "control-plane",
lastPongAt: Date.now(),
leaseState: "online",
leaseUntil: 0,
evictionTimer: null,
});
incMeshCount(hello.meshId);
void audit(hello.meshId, "peer_joined", member.id, effectiveDisplayName, {
pubkey: hello.pubkey,
groups: initialGroups,
restored: !!saved,
role: "control-plane",
});
log.info("ws hello", {
mesh_id: hello.meshId,
@@ -1845,6 +2051,10 @@ async function handleSessionHello(
memberDisplayName: string;
memberProfile?: unknown;
meshPolicy?: Record<string, unknown>;
/** True when this hello reattached an existing offline lease — caller
* must skip the peer_joined broadcast. The session was never visibly
* absent from peers. */
silent?: boolean;
} | null> {
// Shape checks. The crypto helpers also enforce these but bailing
// early gives a clearer error code on the wire.
@@ -1974,6 +2184,42 @@ async function handleSessionHello(

const initialGroups = hello.groups ?? member.defaultGroups ?? [];

// Reattach check: an offline-leased connection with the same
// sessionPubkey is the same launched session resuming inside the
// grace window. Cancel the eviction timer, swap the WS, restore
// online state. No peer_joined broadcast — peers never saw the
// session leave.
for (const [pid, oldConn] of connections) {
if (oldConn.meshId !== hello.meshId) continue;
if (oldConn.leaseState !== "offline") continue;
if (oldConn.sessionPubkey !== hello.sessionPubkey) continue;

if (oldConn.evictionTimer) {
clearTimeout(oldConn.evictionTimer);
oldConn.evictionTimer = null;
}
oldConn.ws = ws;
oldConn.leaseState = "online";
oldConn.leaseUntil = 0;
oldConn.lastPongAt = Date.now();
// Refresh mutable fields from the new hello.
oldConn.cwd = hello.cwd;
if (hello.displayName) oldConn.displayName = hello.displayName;
log.info("session_hello reattach (lease)", {
presence_id: pid,
session_pubkey: hello.sessionPubkey.slice(0, 12),
});
void restorePresence(pid);
void maybePushQueuedMessages(pid);
return {
presenceId: pid,
memberDisplayName: oldConn.displayName,
memberProfile: undefined,
meshPolicy,
silent: true,
};
}

// Session-id dedup: if the same session_id is already connected, kick
// the ghost. Reconnect after a network blip lands here cleanly.
for (const [oldPid, oldConn] of connections) {
@@ -1993,6 +2239,9 @@ async function handleSessionHello(
pid: hello.pid,
cwd: hello.cwd,
groups: initialGroups,
// v2 agentic-comms (M1): per-Claude-Code session WS — these are the
// user-facing peers shown in `claudemesh peer list`.
role: "session",
});
const effectiveDisplayName = hello.displayName || member.displayName;
connections.set(presenceId, {
@@ -2011,6 +2260,11 @@ async function handleSessionHello(
groups: initialGroups,
visible: true,
profile: {},
peerRole: "session",
lastPongAt: Date.now(),
leaseState: "online",
leaseUntil: 0,
evictionTimer: null,
});
incMeshCount(hello.meshId);
void audit(hello.meshId, "peer_joined", member.id, effectiveDisplayName, {
@@ -2018,6 +2272,7 @@ async function handleSessionHello(
session_pubkey: hello.sessionPubkey,
groups: initialGroups,
via: "session_hello",
role: "session",
});
log.info("ws session_hello", {
mesh_id: hello.meshId,
@@ -2408,8 +2663,10 @@ function handleConnection(ws: WebSocket): void {
}
// Broadcast peer_joined to siblings — same shape as the regular
// hello path, so list_peers consumers don't need to special-case.
// Skipped on lease reattach: the session was never visibly absent,
// so no synthetic join event should fire.
const joinedConn = connections.get(presenceId);
if (joinedConn) {
if (joinedConn && !result.silent) {
const joinMsg: WSPushMessage = {
type: "push",
subtype: "system",
@@ -2492,9 +2749,11 @@ function handleConnection(ws: WebSocket): void {
} catch {
/* ws closed during hello */
}
// Broadcast peer_joined or peer_returned to all other peers in the same mesh.
// Broadcast peer_joined or peer_returned to all other peers in the
// same mesh. Skipped on lease reattach: the session never appeared
// offline so no synthetic join event should fire.
const joinedConn = connections.get(presenceId);
if (joinedConn) {
if (joinedConn && !result.silent) {
const isReturning = !!result.restored;
const joinMsg: WSPushMessage = {
type: "push",
@@ -2567,6 +2826,39 @@ function handleConnection(ws: WebSocket): void {
case "send":
await handleSend(conn, msg);
break;
case "client_ack": {
// v2 agentic-comms (M1): close out a previously pushed message.
// Lookup is scoped to (mesh_id, recipient pubkey) so a peer can
// only ack messages addressed to itself.
const ack = msg as Extract<WSClientMessage, { type: "client_ack" }>;
if (!ack.clientMessageId && !ack.brokerMessageId) {
// Nothing to do; don't error — the daemon may speculatively
// ack and we'd rather be lenient than break a CLI release.
break;
}
try {
const n = await markDelivered({
meshId: conn.meshId,
recipientMemberId: conn.memberId,
recipientMemberPubkey: conn.memberPubkey,
recipientSessionPubkey: conn.sessionPubkey ?? null,
clientMessageId: ack.clientMessageId ?? null,
brokerMessageId: ack.brokerMessageId ?? null,
});
log.debug("ws client_ack", {
presence_id: presenceId,
client_message_id: ack.clientMessageId,
broker_message_id: ack.brokerMessageId,
marked: n,
});
} catch (e) {
log.warn("ws client_ack failed", {
presence_id: presenceId,
error: e instanceof Error ? e.message : String(e),
});
}
break;
}
case "set_status":
await writeStatus(presenceId, msg.status, "manual", new Date());
log.info("ws set_status", {
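
Because delivery is now at-least-once, a session can see the same push twice: once on the original claim, and again after the 30 s lease lapses if its `client_ack` never reached the broker. The sketch here surfaces each message once but acks every copy. Assumptions: `IncomingPush`, `ClientAck` and `DedupingReceiver` are hypothetical names; the ack fields `brokerMessageId` and `clientMessageId` follow the `client_ack` handler above, while treating the push's `messageId` as the broker-side row id is an assumption this diff does not confirm.

```typescript
// Hypothetical receiver for an at-least-once push stream.
interface IncomingPush {
  messageId: string; // ASSUMPTION: the broker-side message_queue id
  clientMessageId?: string;
  body: string;
}

interface ClientAck {
  type: "client_ack";
  brokerMessageId?: string;
  clientMessageId?: string;
}

class DedupingReceiver {
  private readonly send: (ack: ClientAck) => void;
  private seen = new Set<string>();
  readonly surfaced: string[] = [];

  constructor(send: (ack: ClientAck) => void) {
    this.send = send;
  }

  onPush(push: IncomingPush): void {
    // Surface a message only the first time its id is seen.
    if (!this.seen.has(push.messageId)) {
      this.seen.add(push.messageId);
      this.surfaced.push(push.body);
    }
    // Ack every copy, duplicates included: a redelivery means the broker
    // never recorded the earlier ack, so the row is still open.
    this.send({
      type: "client_ack",
      brokerMessageId: push.messageId,
      ...(push.clientMessageId ? { clientMessageId: push.clientMessageId } : {}),
    });
  }
}

const acks: ClientAck[] = [];
const rx = new DedupingReceiver((a) => acks.push(a));
rx.onPush({ messageId: "b1", clientMessageId: "c1", body: "hello" });
rx.onPush({ messageId: "b1", clientMessageId: "c1", body: "hello" }); // redelivery
rx.onPush({ messageId: "b2", body: "second" });
```

Acking after the message has been recorded, rather than before, means a crash in between costs a redelivery instead of a lost message. The `seen` set grows without bound here; a long-lived daemon would need to cap or expire it.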
@@ -2604,6 +2896,12 @@ function handleConnection(ws: WebSocket): void {
sessionId: p.sessionId,
connectedAt: p.connectedAt.toISOString(),
cwd: pc?.cwd ?? p.cwd,
// v2 agentic-comms (M1): typed connection role. CLI uses
// this to hide control-plane daemons from user-facing
// peer lists (filter swap from peerType happens CLI-side).
// Wire field is `peerRole` to avoid collision with the
// 1.31.5 top-level `role` lift of profile.role.
peerRole: p.peerRole,
...(pc?.hostname ? { hostname: pc.hostname } : {}),
...(pc?.peerType ? { peerType: pc.peerType } : {}),
...(pc?.channel ? { channel: pc.channel } : {}),
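
The `peerRole` field added to the peer-list payload is what lets the CLI hide control-plane connections. A minimal client-side filter, sketched under assumptions: `PeerListEntry` and `userFacingPeers` are hypothetical names, a missing `peerRole` (as from an older broker) is read as `"session"` to match the broker's own default in `listPeersInMesh`, and `service` peers stay visible because the diff only calls out control-plane daemons as hidden.

```typescript
// Hypothetical subset of a peer-list entry as the CLI might receive it.
type PeerRole = "control-plane" | "session" | "service";

interface PeerListEntry {
  sessionId: string;
  cwd: string;
  peerRole?: PeerRole; // absent when talking to a pre-M1 broker
}

// Hide daemons and other control-plane sockets from user-facing output.
// A missing peerRole is treated as "session", the broker's default.
function userFacingPeers(peers: PeerListEntry[]): PeerListEntry[] {
  return peers.filter((p) => (p.peerRole ?? "session") !== "control-plane");
}

const listed = userFacingPeers([
  { sessionId: "a", cwd: "/repo", peerRole: "control-plane" },
  { sessionId: "b", cwd: "/repo", peerRole: "session" },
  { sessionId: "c", cwd: "/repo" },
  { sessionId: "d", cwd: "/srv", peerRole: "service" },
]);
const ids = listed.map((p) => p.sessionId); // ["b", "c", "d"]
```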
@@ -4594,11 +4892,30 @@ function handleConnection(ws: WebSocket): void {
}

const affected: string[] = [];
// 1.34.15 (gap #3a): kick was a no-op against long-lived
// control-plane connections (daemon, dashboard) — closing
// their WS just triggered the auto-reconnect loop, the
// kicker's CLI rendered "Their Claude Code session ended"
// (which was misleading), and the user-visible state was
// unchanged seconds later. We now refuse to close control-
// plane WSes and surface the skipped peers in a new
// additive ack field. Pre-1.34.15 CLI clients only read
// `kicked`/`affected`, so this stays back-compat.
//
// For `kick`-only: the soft `disconnect` verb still closes
// control-plane WSes intentionally — that's what users want
// when they're nudging a peer for it to re-authenticate.
const skippedControlPlane: string[] = [];
const skipControlPlane = isKick;
const now = Date.now();

if (km.all) {
for (const [pid, peer] of connections) {
if (peer.meshId !== conn.meshId || pid === presenceId) continue;
if (skipControlPlane && peer.peerRole === "control-plane") {
skippedControlPlane.push(peer.displayName || pid);
continue;
}
try { peer.ws.close(closeCode, closeReason); } catch {}
connections.delete(pid);
void disconnectPresence(pid);
@@ -4610,6 +4927,10 @@ function handleConnection(ws: WebSocket): void {
if (peer.meshId !== conn.meshId || pid === presenceId) continue;
const [pres] = await db.select({ lastPingAt: presence.lastPingAt }).from(presence).where(eq(presence.id, pid)).limit(1);
if (pres && pres.lastPingAt && pres.lastPingAt.getTime() < cutoff) {
if (skipControlPlane && peer.peerRole === "control-plane") {
skippedControlPlane.push(peer.displayName || pid);
continue;
}
try { peer.ws.close(closeCode, `${closeReason}_stale`); } catch {}
connections.delete(pid);
void disconnectPresence(pid);
@@ -4620,6 +4941,10 @@ function handleConnection(ws: WebSocket): void {
for (const [pid, peer] of connections) {
if (peer.meshId !== conn.meshId) continue;
if (peer.displayName === km.target || peer.memberPubkey === km.target || peer.memberPubkey.startsWith(km.target)) {
if (skipControlPlane && peer.peerRole === "control-plane") {
skippedControlPlane.push(peer.displayName || pid);
continue;
}
try { peer.ws.close(closeCode, closeReason); } catch {}
connections.delete(pid);
void disconnectPresence(pid);
@@ -4628,8 +4953,20 @@ function handleConnection(ws: WebSocket): void {
}
}

conn.ws.send(JSON.stringify({ type: ackType, kicked: affected, affected, _reqId: km._reqId }));
log.info(`ws ${closeReason}`, { presence_id: presenceId, count: affected.length, target: km.target ?? km.stale ?? "all" });
conn.ws.send(JSON.stringify({
type: ackType,
kicked: affected,
affected,
// Additive — older CLI clients ignore this field.
...(skippedControlPlane.length > 0 ? { skipped_control_plane: skippedControlPlane } : {}),
_reqId: km._reqId,
}));
log.info(`ws ${closeReason}`, {
presence_id: presenceId,
count: affected.length,
target: km.target ?? km.stale ?? "all",
skipped_control_plane: skippedControlPlane.length,
});
break;
}

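
The kick changes reduce to partitioning the matched peers and adding one optional field to the ack. A sketch of that logic as pure functions: `KickCandidate`, `KickAck`, `planKick`, `buildKickAck` and the `"kick_ack"` type string are hypothetical. This diff does not show what the broker pushes into `affected`, so the sketch records presence ids there.

```typescript
// Hypothetical, pared-down peer record.
interface KickCandidate {
  presenceId: string;
  displayName: string;
  peerRole: "control-plane" | "session" | "service";
}

interface KickAck {
  type: string;
  kicked: string[];
  affected: string[];
  skipped_control_plane?: string[];
  _reqId?: string;
}

// Split matched peers into "close these" and "leave these alone".
// Only kick skips control-plane peers; disconnect closes them too.
function planKick(
  matched: KickCandidate[],
  isKick: boolean,
): { toClose: string[]; skipped: string[] } {
  const toClose: string[] = [];
  const skipped: string[] = [];
  for (const peer of matched) {
    if (isKick && peer.peerRole === "control-plane") {
      skipped.push(peer.displayName || peer.presenceId);
      continue;
    }
    toClose.push(peer.presenceId);
  }
  return { toClose, skipped };
}

// skipped_control_plane is present only when non-empty, so the common
// case keeps the ack shape older clients already parse.
function buildKickAck(ackType: string, affected: string[], skipped: string[], reqId?: string): KickAck {
  return {
    type: ackType,
    kicked: affected,
    affected,
    ...(skipped.length > 0 ? { skipped_control_plane: skipped } : {}),
    _reqId: reqId,
  };
}

const matched: KickCandidate[] = [
  { presenceId: "p1", displayName: "daemon", peerRole: "control-plane" },
  { presenceId: "p2", displayName: "alice", peerRole: "session" },
];
const kickPlan = planKick(matched, true); // toClose ["p2"], skipped ["daemon"]
const disconnectPlan = planKick(matched, false); // toClose ["p1", "p2"], skipped []
const ackMsg = buildKickAck("kick_ack", kickPlan.toClose, kickPlan.skipped, "r1");
```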
@@ -5057,88 +5394,52 @@ function handleConnection(ws: WebSocket): void {
}
});
ws.on("close", async () => {
if (presenceId) {
const conn = connections.get(presenceId);
// Persist peer state BEFORE removing from connections.
if (conn) {
await savePeerState(conn, conn.memberId, conn.meshId);
}
connections.delete(presenceId);
if (conn) {
decMeshCount(conn.meshId);
// Broadcast peer_left to remaining peers in the same mesh.
const leaveMsg: WSPushMessage = {
type: "push",
subtype: "system",
event: "peer_left",
eventData: {
name: conn.displayName,
pubkey: conn.sessionPubkey ?? conn.memberPubkey,
},
messageId: crypto.randomUUID(),
meshId: conn.meshId,
senderPubkey: "system",
priority: "low",
nonce: "",
ciphertext: "",
createdAt: new Date().toISOString(),
};
for (const [pid, peer] of connections) {
if (peer.meshId !== conn.meshId) continue;
// Don't tell the user's own other sessions they "left" when one
// of their Claude Code instances closes. Same pubkey = same user.
if (peer.memberPubkey === conn.memberPubkey) continue;
sendToPeer(pid, leaveMsg);
}
}
await disconnectPresence(presenceId);
if (conn) {
void audit(conn.meshId, "peer_left", conn.memberId, conn.displayName, {});
}
// Clean up URL watches owned by this peer — the interval was
// happily fetching forever after the peer disconnected.
for (const [watchId, watch] of urlWatches) {
if (watch.presenceId === presenceId) {
clearInterval(watch.timer);
urlWatches.delete(watchId);
}
}
// Clean up stream subscriptions for this peer
for (const [key, subs] of streamSubscriptions) {
subs.delete(presenceId);
if (subs.size === 0) streamSubscriptions.delete(key);
}
// Clean up MCP servers registered by this peer
for (const [key, entry] of mcpRegistry) {
if (entry.presenceId === presenceId) {
if (entry.persistent) {
// Keep persistent entries but mark offline
entry.online = false;
entry.offlineSince = new Date().toISOString();
entry.presenceId = "";
} else {
mcpRegistry.delete(key);
}
}
}
// Auto-pause clock when mesh becomes empty
if (conn && !connectionsPerMesh.has(conn.meshId)) {
const clock = meshClocks.get(conn.meshId);
if (clock && clock.timer) {
clearInterval(clock.timer);
clock.timer = null;
clock.paused = true;
log.info("clock auto-paused (mesh empty)", { mesh_id: conn.meshId });
}
}
log.info("ws close", { presence_id: presenceId });
if (!presenceId) return;
const conn = connections.get(presenceId);
if (!conn) return; // already evicted

// If the conn's `ws` is no longer THIS ws, the close belongs to an
// older socket that was already replaced by a reattach. Ignore — the
// lease is healthy with the new WS, no eviction needed.
if (conn.ws !== ws) {
log.debug("ws close on replaced socket — ignoring", { presence_id: presenceId });
return;
}

await savePeerState(conn, conn.memberId, conn.meshId);

// If lease is currently online, enter grace. Other peers see the
// session as still online; DMs queue (sendToPeer no-ops on dead
// WS, drain on reattach). After GRACE_MS without a reattach, the
// timer fires evictPresenceFully and cleanup runs as before.
const pid = presenceId;
if (conn.leaseState === "online") {
conn.leaseState = "offline";
conn.leaseUntil = Date.now() + GRACE_MS;
conn.evictionTimer = setTimeout(() => {
log.info("lease grace expired — evicting", { presence_id: pid });
void evictPresenceFully(pid);
}, GRACE_MS);
log.info("ws close — lease grace started", {
presence_id: pid,
grace_ms: GRACE_MS,
});
return;
}

// Not online (already in grace from an earlier close, or odd state).
// Run full eviction immediately.
await evictPresenceFully(pid);
});
ws.on("error", (err) => {
log.warn("ws error", { error: err.message });
});
ws.on("pong", () => {
if (presenceId) void heartbeat(presenceId);
if (presenceId) {
const conn = connections.get(presenceId);
if (conn) conn.lastPongAt = Date.now();
void heartbeat(presenceId);
}
});
}

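
The new close handler no longer evicts immediately. It works as a small state machine:

- a close from a socket that has already been replaced is ignored;
- an online lease goes offline and gets a `GRACE_MS` eviction timer;
- a close on a lease that is already offline evicts at once.

Below is a minimal sketch of those transitions with an injectable scheduler so they can be checked without real timers. `GraceTracker`, `Scheduler`, and the `attach` reattach path are hypothetical names. The reattach side is not shown in the hunk, so cancelling the timer there is an assumption.

```typescript
// Minimal model of the lease-grace close path. Names are hypothetical;
// the broker's real handler also persists state, logs, and runs the full
// cleanup inside evictPresenceFully.
type LeaseState = "online" | "offline";

// Injectable timer API so the state machine can be tested deterministically.
interface Scheduler {
  set(fn: () => void, ms: number): number;
  clear(handle: number): void;
}

interface Lease {
  socketId: string; // which WS currently owns the lease
  state: LeaseState;
  leaseUntil?: number;
  timer?: number;
}

class GraceTracker {
  private readonly leases = new Map<string, Lease>();
  readonly evicted: string[] = [];
  private readonly sched: Scheduler;
  private readonly graceMs: number;
  private readonly now: () => number;

  constructor(sched: Scheduler, graceMs: number, now: () => number) {
    this.sched = sched;
    this.graceMs = graceMs;
    this.now = now;
  }

  // Assumed reattach path: bind a new socket and cancel any pending eviction.
  attach(presenceId: string, socketId: string): void {
    const pending = this.leases.get(presenceId)?.timer;
    if (pending !== undefined) this.sched.clear(pending);
    this.leases.set(presenceId, { socketId, state: "online" });
  }

  onClose(presenceId: string, socketId: string): "ignored" | "grace" | "evicted" {
    const lease = this.leases.get(presenceId);
    if (!lease) return "ignored"; // already evicted
    if (lease.socketId !== socketId) return "ignored"; // close of a replaced socket
    if (lease.state === "online") {
      lease.state = "offline";
      lease.leaseUntil = this.now() + this.graceMs;
      lease.timer = this.sched.set(() => this.evict(presenceId), this.graceMs);
      return "grace";
    }
    this.evict(presenceId); // already offline: evict immediately
    return "evicted";
  }

  private evict(presenceId: string): void {
    const pending = this.leases.get(presenceId)?.timer;
    if (pending !== undefined) this.sched.clear(pending);
    this.leases.delete(presenceId);
    this.evicted.push(presenceId);
  }
}
```

Comparing socket identity first is what lets a fast reattach survive the late close event from the old socket.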
@@ -5330,10 +5631,29 @@ async function main(): Promise<void> {
});
});

// WS heartbeat ping every 30s; clients reply with pong → bumps lastPingAt.
// WS heartbeat ping every 30s; clients reply with pong → bumps
// lastPongAt. Connections whose pong is older than 75s (2.5x the
// ping interval) are considered half-dead — kernel hasn't yet RST'd
// the socket but no application traffic is flowing. Force-terminate
// them to fire the close handler and free the connection slot.
const STALE_PONG_THRESHOLD_MS = 75_000;
const pingInterval = setInterval(() => {
for (const { ws } of connections.values()) {
if (ws.readyState === ws.OPEN) ws.ping();
const now = Date.now();
for (const [pid, conn] of connections) {
// Skip offline-leased entries: their WS is intentionally dead
// during grace; the eviction timer handles their lifecycle.
if (conn.leaseState === "offline") continue;
const { ws } = conn;
if (ws.readyState !== ws.OPEN) continue;
if (now - conn.lastPongAt > STALE_PONG_THRESHOLD_MS) {
log.warn("ws stale terminate", {
presence_id: pid,
last_pong_ago_ms: now - conn.lastPongAt,
});
try { ws.terminate(); } catch { /* socket already gone */ }
continue;
}
ws.ping();
}
}, 30_000);
pingInterval.unref();

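
On every tick the new ping loop makes one of three decisions per connection:

- skip leases that are in grace;
- terminate open sockets whose last pong is older than 75 s (2.5 times the 30 s interval);
- ping everything else.

Below is a pure-function sketch of that classification over a simplified connection record. `classifyForPing` and `ConnInfo` are hypothetical, and the real loop acts on `ws` directly. The sketch also assumes `lastPongAt` is seeded when a connection is created.

```typescript
const PING_INTERVAL_MS = 30_000;
const STALE_PONG_THRESHOLD_MS = 2.5 * PING_INTERVAL_MS; // 75_000, as in the diff

interface ConnInfo {
  leaseState: "online" | "offline";
  open: boolean;      // stands in for ws.readyState === ws.OPEN
  lastPongAt: number; // epoch ms of the last pong received
}

type PingAction = "skip" | "terminate" | "ping";

function classifyForPing(conn: ConnInfo, now: number): PingAction {
  if (conn.leaseState === "offline") return "skip"; // grace timer owns this lease
  if (!conn.open) return "skip";
  // Strictly greater: a pong exactly 75 s old still gets pinged.
  if (now - conn.lastPongAt > STALE_PONG_THRESHOLD_MS) return "terminate";
  return "ping";
}
```

With ticks every 30 s, a socket whose last pong arrived right after a tick is pinged twice more (at 30 s and 60 s). It is terminated on the third tick, about 90 s after that pong.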
@@ -224,6 +224,26 @@ export interface WSSetStatusMessage {
status: PeerStatus;
}

/**
* Client → broker: confirm receipt of a previously pushed envelope so the
* broker can mark the message_queue row delivered.
*
* v2 agentic-comms (M1): pairs with the two-phase claim/lease introduced
* in `drainForMember`. Without this ack, the lease expires after 30s and
* the message is re-claimed and re-pushed (at-least-once retry).
*
* Either id is accepted; daemons that track inbox dedupe by clientMessageId
* should send that one. brokerMessageId is the row primary key, useful when
* the original send didn't carry a client_message_id (legacy traffic).
*/
export interface WSClientAckMessage {
type: "client_ack";
/** Original caller-supplied idempotency id from the `send` envelope. */
clientMessageId?: string;
/** Broker-side row id (the `messageId` field on the inbound `push`). */
brokerMessageId?: string;
}

/** Client → broker: request list of connected peers in the same mesh. */
export interface WSListPeersMessage {
type: "list_peers";
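
The `client_ack` comment describes at-least-once delivery in two phases:

1. Draining claims queued rows under a 30 s lease and pushes them.
2. An ack carrying either id marks the row delivered.

A claimed row that is never acked becomes claimable again once its lease lapses. Below is an in-memory sketch of those semantics, assuming a single broker process. `LeaseQueue`, `claim`, and `ack` are hypothetical names; they are not the broker's `drainForMember` or its `message_queue` table.

```typescript
const LEASE_MS = 30_000;

interface QueuedRow {
  brokerMessageId: string;
  clientMessageId?: string;
  body: string;
  delivered: boolean;
  leaseExpiresAt: number; // 0 = never claimed
}

class LeaseQueue {
  private rows: QueuedRow[] = [];

  enqueue(brokerMessageId: string, body: string, clientMessageId?: string): void {
    this.rows.push({ brokerMessageId, clientMessageId, body, delivered: false, leaseExpiresAt: 0 });
  }

  // Phase 1: claim every undelivered row whose lease is free or expired.
  // Claimed rows get pushed to the client but stay undelivered until acked.
  claim(now: number): QueuedRow[] {
    const claimed: QueuedRow[] = [];
    for (const row of this.rows) {
      if (row.delivered || row.leaseExpiresAt > now) continue;
      row.leaseExpiresAt = now + LEASE_MS;
      claimed.push(row);
    }
    return claimed;
  }

  // Phase 2: a client_ack with either id marks the matching row delivered.
  ack(msg: { clientMessageId?: string; brokerMessageId?: string }): boolean {
    const row = this.rows.find(
      (r) =>
        (msg.brokerMessageId !== undefined && r.brokerMessageId === msg.brokerMessageId) ||
        (msg.clientMessageId !== undefined && r.clientMessageId === msg.clientMessageId),
    );
    if (!row) return false;
    row.delivered = true;
    return true;
  }
}
```

Because an unacked row is pushed again, receivers can see duplicates. That is why the comment recommends acking (and deduping) by `clientMessageId` when the daemon tracks it.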
@@ -518,6 +538,8 @@ export interface WSPeersListMessage {
type: "peers_list";
peers: Array<{
pubkey: string;
/** Stable member pubkey — present on M1+ broker responses. */
memberPubkey?: string;
displayName: string;
status: PeerStatus;
summary: string | null;
@@ -525,6 +547,13 @@ export interface WSPeersListMessage {
sessionId: string;
connectedAt: string;
cwd?: string;
/** v2 agentic-comms (M1): typed connection role. CLI uses this to
* filter control-plane daemons out of user-facing peer lists.
* Optional for clients talking to a pre-M1 broker. Wire field is
* `peerRole` to avoid collision with 1.31.5's top-level `role`
* (which is a lift of `profile.role`, the user-supplied string
* like "lead" / "reviewer" / "human"). */
peerRole?: "control-plane" | "session" | "service";
hostname?: string;
peerType?: "ai" | "human" | "connector";
channel?: string;
@@ -1417,6 +1446,7 @@ export type WSClientMessage =
| WSHelloMessage
| WSSessionHelloMessage
| WSSendMessage
| WSClientAckMessage
| WSSetStatusMessage
| WSListPeersMessage
| WSSetSummaryMessage

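
Both `peerRole` and `memberPubkey` are optional so that clients can still talk to pre-M1 brokers. Below is a sketch of the CLI-side filtering and identity fallback those comments describe. `visiblePeers`, `memberKey`, and the trimmed `PeerEntry` type are hypothetical. Showing an entry whose `peerRole` is missing, rather than hiding it, is an assumption of this sketch.

```typescript
type PeerRole = "control-plane" | "session" | "service";

// Trimmed-down view of a peers_list entry; only the fields used here.
interface PeerEntry {
  pubkey: string;
  memberPubkey?: string;
  displayName: string;
  peerRole?: PeerRole;
}

// Hide control-plane connections (daemons, dashboards) from user-facing lists.
// Assumption: an absent peerRole (pre-M1 broker) is shown rather than hidden.
function visiblePeers(peers: PeerEntry[]): PeerEntry[] {
  return peers.filter((p) => p.peerRole !== "control-plane");
}

// Stable identity for grouping sibling sessions of one member. memberPubkey
// is only present on M1+ brokers, so fall back to the session pubkey.
function memberKey(p: PeerEntry): string {
  return p.memberPubkey ?? p.pubkey;
}
```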
47
apps/broker/tests/kick-control-plane-skip.test.ts
Normal file
@@ -0,0 +1,47 @@
/**
* Kick control-plane skip: 1.34.15 (gap #3a) refuses to close
* long-lived control-plane connections (claudemesh daemon, dashboard)
* via `kick`, because they auto-reconnect within seconds and the verb
* was effectively a no-op. The soft `disconnect` verb keeps the old
* behavior so users can still nudge a control-plane peer to
* re-authenticate.
*
* Pure-logic test — mirrors the branch inside handleSend's kick case
* without spinning up a broker. Same pattern as
* grants-enforcement.test.ts.
*/

import { describe, expect, test } from "vitest";

type PeerRole = "control-plane" | "session" | "service";

/** Mirrors the predicate inserted into the kick handler. */
function shouldSkipKick(args: {
verb: "kick" | "disconnect";
peerRole: PeerRole;
}): boolean {
const skipControlPlane = args.verb === "kick";
return skipControlPlane && args.peerRole === "control-plane";
}

describe("kick control-plane skip (gap #3a)", () => {
test("kick on control-plane → skipped (would auto-reconnect)", () => {
expect(shouldSkipKick({ verb: "kick", peerRole: "control-plane" })).toBe(true);
});

test("kick on session → not skipped (closes user session)", () => {
expect(shouldSkipKick({ verb: "kick", peerRole: "session" })).toBe(false);
});

test("kick on service → not skipped", () => {
expect(shouldSkipKick({ verb: "kick", peerRole: "service" })).toBe(false);
});

test("disconnect on control-plane → not skipped (intentional nudge)", () => {
expect(shouldSkipKick({ verb: "disconnect", peerRole: "control-plane" })).toBe(false);
});

test("disconnect on session → not skipped", () => {
expect(shouldSkipKick({ verb: "disconnect", peerRole: "session" })).toBe(false);
});
});
File diff suppressed because it is too large
@@ -1,6 +1,6 @@
{
"name": "claudemesh-cli",
"version": "1.30.0",
"version": "1.34.16",
"description": "Peer mesh for Claude Code sessions — CLI + MCP server.",
"keywords": [
"claude-code",
@@ -9,9 +9,13 @@ description: Use when the user asks to send a message to a peer Claude session,

**You invoke claudemesh exclusively through the `claudemesh` CLI via Bash.** There are no MCP tools to call — `tools/list` returns empty for the claudemesh MCP server. The MCP server exists only to deliver inbound peer messages as `<channel source="claudemesh">` interrupts mid-turn. Everything else is CLI.

## Launch welcome (`kind: "welcome"`) — 1.34.2+

5 seconds after Claude Code attaches to claudemesh via `claudemesh launch`, the MCP server emits ONE `<channel source="claudemesh">` push with `meta.kind: "welcome"`. It carries identity (`self_display_name`, `self_session_pubkey`, `self_role`), the active `mesh_slug`, live `peer_count` + `peer_names`, recent `unread_count` + `latest_message_ids`, and a CLI hint line. Treat it as the "mesh is connected" handshake — read it once, internalize identity + peers + inbox state, and use it to decide whether to act on unread items right away. Do NOT reply to a welcome push the way you reply to a DM; it has no sender.

## When you receive a `<channel source="claudemesh">` message

Respond IMMEDIATELY. Pause your current task, reply via `claudemesh send`, then resume. Read `from_name`, `mesh_slug`, and `priority` from the channel attributes. Reply by setting `<to>` to the sender's `from_name`. Do not ignore low-priority messages — acknowledge them briefly even if you defer action. If the channel meta contains `subtype: reminder`, this is a scheduled reminder you set yourself — act on it.
Respond IMMEDIATELY (unless `meta.kind` is `"welcome"` or `"system"` — those are informational, no reply needed). Pause your current task, reply via `claudemesh send`, then resume. Read `from_name`, `mesh_slug`, and `priority` from the channel attributes. Reply by setting `<to>` to the sender's `from_name`. Do not ignore low-priority messages — acknowledge them briefly even if you defer action. If the channel meta contains `subtype: reminder`, this is a scheduled reminder you set yourself — act on it.

### Channel attributes (everything you need to reply is in the push)

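
The updated response rule reduces to a small decision:

- `meta.kind` of `"welcome"` or `"system"`: informational, no reply;
- `subtype: reminder`: act on it;
- anything else: reply addressed to `from_name`, adding `--mesh` when the message arrived on a different mesh.

Below is a sketch that assumes the channel attributes arrive as a flat string map. `planReply`, `ChannelMeta`, and the `currentMesh` parameter are hypothetical. The returned argv only uses the `claudemesh send <to> <message>` and `--mesh <slug>` forms the skill documents. Returning "none" when `from_name` is missing is a choice of this sketch.

```typescript
// Flat view of the <channel source="claudemesh"> attributes used here.
interface ChannelMeta {
  kind?: string;      // "welcome" | "system" on informational pushes
  subtype?: string;   // "reminder" for self-scheduled reminders
  from_name?: string;
  mesh_slug?: string;
}

type ReplyPlan =
  | { action: "none" }                   // informational, nothing to send
  | { action: "act" }                    // reminder you set yourself: do the task
  | { action: "reply"; argv: string[] }; // DM reply via the CLI

function planReply(meta: ChannelMeta, text: string, currentMesh: string): ReplyPlan {
  if (meta.kind === "welcome" || meta.kind === "system") return { action: "none" };
  if (meta.subtype === "reminder") return { action: "act" };
  if (!meta.from_name) return { action: "none" }; // nobody to address
  const argv = ["claudemesh", "send", meta.from_name, text];
  // Only pin the mesh when the message came from a different one.
  if (meta.mesh_slug && meta.mesh_slug !== currentMesh) argv.push("--mesh", meta.mesh_slug);
  return { action: "reply", argv };
}
```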
@@ -19,14 +23,17 @@ The `<channel>` interrupt carries these attributes — no lookup needed:

| Attribute | What it is |
|---|---|
| `from_name` | Sender's display name. **Use as `to` in your reply** for DMs. |
| `from_pubkey` | Sender's session pubkey (hex). Stable per-session. |
| `from_member_id` | Sender's stable mesh.member id. Survives display-name changes — the canonical id. |
| `from_name` | Sender's display name. **Use as `to` in your reply** for DMs. Empty/absent on `kind: "welcome"` and `kind: "system"`. |
| `from_pubkey` | Sender's **session pubkey** (hex, ephemeral per-launch). Since 1.34.0 this is the session pubkey of the launched session that originated the send, NOT the daemon's stable member pubkey — sibling sessions of the same human are correctly disambiguated. |
| `from_session_pubkey` | Same as `from_pubkey` for session-originated DMs. Kept as a separate key so the model never confuses session vs member identity when a control-plane source is involved. |
| `from_member_id` / `from_member_pubkey` | Sender's stable mesh.member id / pubkey. Survives display-name and session rotation. Use to recognize "the same human across multiple Claude Code windows". |
| `mesh_slug` | Mesh the message arrived on. Pass via `--mesh <slug>` if the parent isn't on the same mesh. |
| `priority` | `now` / `next` / `low`. |
| `message_id` | Server-side id of THIS message. **Pass to `--reply-to <id>` to thread your reply** in topic posts. |
| `client_message_id` | Sender-stable idempotency id (UUID). Survives broker restarts; safe to log. |
| `topic` | Set when the source is a topic post. Reply via `topic post <topic> --reply-to <message_id>`. |
| `reply_to_id` | Set when the message itself is a reply to a previous one — render thread context. |
| `kind` (welcome/system meta only) | `"welcome"` for the launch handshake, `"system"` for peer_join/peer_leave/etc. — neither needs a reply. |

**Reply patterns:**

@@ -328,22 +335,36 @@ claudemesh peer bans # list banned members
claudemesh peer verify [peer] # 6×5-digit safety numbers
```

JSON shape (per peer):
JSON shape (per peer) — **render `role` and `groups` whenever you build a table for the user**, they're the highest-signal fields after `displayName`:
```json
{
"displayName": "Mou",
"pubkey": "abc123...",
"pubkey": "abc123...", // session pubkey (rotates per claudemesh launch)
"memberPubkey": "def456...", // stable identity (same across all sibling sessions)
"sessionId": "uuid",
"status": "idle | working | dnd",
"summary": "string or null",
"role": "lead | reviewer | bot | ...", // 1.31.5+: top-level alias of profile.role
"groups": [{ "name": "reviewers", "role": "lead" }],
"peerType": "claude | telegram | ...",
"profile": {
"role": "lead",
"title": "string or null",
"bio": "string or null",
"avatar": "emoji or null",
"capabilities": ["..."]
},
"peerType": "claude | telegram | ai | human | connector | ...",
"channel": "claude-code | api | ...",
"model": "claude-opus-4-7 | ...",
"cwd": "/path/to/working/dir or null",
"isSelf": true, // peer is one of the caller's own sessions
"isThisSession": false, // peer is the exact session running the cli
"stats": { "messagesIn": 0, "messagesOut": 0, "toolCalls": 0, "errors": 0, "uptime": 1200 }
}
```

**When asked to "list peers" inside a launched session, prefer the human renderer (`claudemesh peer list`, no `--json`) — it already prints role + groups inline next to the name with an explicit `(none)` footer when both are absent. If you do need JSON for parsing, always include `role` and `groups` columns in any rendered table; the user's primary question is usually "who's in what role" and dropping those fields hides the answer.**

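
The rule to always surface `role` and `groups` can be applied mechanically when turning `claudemesh peer list --json` output into rows. Below is a sketch that reads a subset of the per-peer JSON shape shown above and falls back to `(none)` when both fields are absent, similar to the human renderer. `renderPeerRows` and the trimmed `PeerJson` type are hypothetical. The sketch also assumes the command emits an array of peer objects.

```typescript
// Subset of the documented per-peer JSON shape.
interface PeerJson {
  displayName: string;
  pubkey: string;
  memberPubkey?: string;
  status?: string;
  role?: string;
  groups?: Array<{ name: string; role?: string }>;
  isSelf?: boolean;
}

// One row per session; role and groups are always rendered, "(none)" if both absent.
function renderPeerRows(peers: PeerJson[]): string[] {
  return peers.map((p) => {
    const groups = (p.groups ?? [])
      .map((g) => (g.role ? `${g.name}:${g.role}` : g.name))
      .join(", ");
    const role = p.role ?? "";
    const roleAndGroups = role || groups ? `role=${role || "-"} groups=${groups || "-"}` : "(none)";
    const self = p.isSelf ? " (you)" : "";
    return `${p.displayName}${self}  ${p.status ?? "?"}  ${roleAndGroups}`;
  });
}
```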
### `message` — send and inspect messages

```bash
@@ -356,15 +377,33 @@ claudemesh message send <p> "..." --priority now # bypass busy gates
claudemesh message send <p> "..." --priority next # default
claudemesh message send <p> "..." --priority low # pull-only

# inbox (alias: claudemesh inbox)
claudemesh message inbox
claudemesh message inbox --json
# inbox (alias: claudemesh inbox) — 1.34.0+ reads from inbox.db via daemon IPC
claudemesh inbox # all attached meshes, last 100
claudemesh inbox --mesh <slug> # scoped to one mesh
claudemesh inbox --mesh <slug> --limit 20 # custom cap
claudemesh inbox --json # full row (sender_pubkey, mesh, body, received_at, seen_at, …)
claudemesh inbox --unread # 1.34.8+ only rows whose seen_at IS NULL

# inbox flush + delete — 1.34.7+
claudemesh inbox flush --mesh <slug> # delete all rows on one mesh
claudemesh inbox flush --before <iso-timestamp> # delete rows older than timestamp
claudemesh inbox flush --all # delete every row on every mesh (required guard)
claudemesh inbox delete <id> # delete one inbox row by id (alias: rm)
claudemesh inbox flush --mesh <slug> --json # JSON: { ok: true, removed: N }

# delivery status (alias: claudemesh msg-status <id>)
claudemesh message status <message-id>
claudemesh message status <message-id> --json
```

**Inbox source (1.34.0+):** `claudemesh inbox` queries the daemon's persistent `~/.claudemesh/daemon/inbox.db` over IPC — it is NOT a fresh broker-WS buffer drain. Rows survive daemon restarts. Sender attribution is the actual session pubkey of the launched session that originated the send (NOT the stable member pubkey of the sender's daemon), so two sibling sessions of the same human appear as distinct rows.

**Read-state (1.34.8+):** every inbox row carries a `seen_at` timestamp. `null` = never surfaced; an ISO string = first surfaced at that moment. The flag flips automatically when (a) the row is returned by an interactive `claudemesh inbox` listing, or (b) the MCP server emits a live `<channel>` reminder for it. The launch welcome push uses `unread_only=true` to surface only rows the user hasn't seen — so a session relaunched a day later sees what it actually missed, not the same 24h batch every time. Use `claudemesh inbox --unread` to get the same filter from the CLI.

**Self-echo guard (1.34.8+):** broker fan-out paths sometimes mirror an outbound DM back to the originating session-WS. The daemon now drops those at the WS boundary (matching on `senderPubkey === own.session_pubkey`), so the sender no longer sees their own `claudemesh send` arrive as a `← claudemesh: <self>: ...` channel push immediately after dispatching it.

**Inbox TTL (1.34.8+):** the daemon runs an hourly prune that deletes rows older than 30 days. Without this the inbox grew unbounded; now it self-trims while preserving "I went on holiday and want to see what I missed" recovery for a generous window. No CLI knob — it's a built-in retention policy. To override, manually `claudemesh inbox flush --before <iso>`.

`send` JSON output: `{"ok": true, "messageId": "...", "target": "..."}`. Errors: `{"ok": false, "error": "..."}`.
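
The read-state, self-echo and TTL rules in the inbox paragraphs are each a small predicate. Below is a sketch of all three. The row type uses the `sender_pubkey`, `received_at` and `seen_at` fields that `claudemesh inbox --json` is documented to return. The helper names and the in-memory array are hypothetical; the daemon applies these rules against `inbox.db` and at its WS boundary.

```typescript
const INBOX_TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30-day retention

interface InboxRow {
  id: string;
  sender_pubkey: string;
  received_at: string;    // ISO timestamp
  seen_at: string | null; // null = never surfaced
}

// Self-echo guard: drop an inbound envelope that this session itself sent.
function isSelfEcho(envelope: { senderPubkey: string }, ownSessionPubkey: string): boolean {
  return envelope.senderPubkey === ownSessionPubkey;
}

// `--unread`: rows that have never been surfaced.
function unreadRows(rows: InboxRow[]): InboxRow[] {
  return rows.filter((r) => r.seen_at === null);
}

// First surfacing (listing or live <channel> push) stamps seen_at once;
// later surfacings keep the original timestamp.
function markSeen(row: InboxRow, now: Date): InboxRow {
  return row.seen_at === null ? { ...row, seen_at: now.toISOString() } : row;
}

// Hourly prune: keep only rows received within the TTL window.
function pruneExpired(rows: InboxRow[], now: Date): InboxRow[] {
  const cutoff = now.getTime() - INBOX_TTL_MS;
  return rows.filter((r) => Date.parse(r.received_at) >= cutoff);
}
```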

### `state` — shared per-mesh key-value store
@@ -2,6 +2,43 @@ import { defineCommand, runMain } from "citty";

export interface ParsedArgs { command: string; positionals: string[]; flags: Record<string, string | boolean | undefined>; }

/**
* Flags that NEVER take a value. The parser's default behavior is greedy
* (any `--flag` consumes the next non-`-` arg as its value), which is
* fine for `--mesh foo` and `--priority now` but breaks for booleans:
* `claudemesh send --self <pubkey> "msg"` was eating the pubkey as the
* value of --self, leaving zero positionals and triggering Usage errors.
*
* Adding to this set: any new boolean / no-arg switch.
*/
const BOOLEAN_FLAGS = new Set([
"self",
"json", // also accepts --json=a,b,c form below
"all",
"yes", "y",
"help", "h",
"version", "v",
"quiet",
"strict",
"continue",
"no-daemon",
"no-color",
"debug",
"allow-ci-persistent",
"force",
"dry-run",
"verbose",
"skip-service",
// 1.34.8: `--unread` filters `claudemesh inbox` to rows whose
// seen_at is NULL. No value — pure switch.
"unread",
// 1.34.12: `--foreground` keeps `claudemesh daemon up` attached
// to the terminal (pre-1.34.12 behavior). Default is detached now.
"foreground",
"no-tcp",
"public-health",
]);

export function parseArgv(argv: string[]): ParsedArgs {
const args = argv.slice(2);
const flags: Record<string, string | boolean | undefined> = {};
@@ -10,14 +47,26 @@ export function parseArgv(argv: string[]): ParsedArgs {

for (let i = 0; i < args.length; i++) {
const arg = args[i]!;
// --flag=value (always parsed as a value, regardless of boolean set)
if (arg.startsWith("--") && arg.includes("=")) {
const eq = arg.indexOf("=");
const key = arg.slice(2, eq);
flags[key] = arg.slice(eq + 1);
continue;
}
if (arg.startsWith("--")) {
const key = arg.slice(2);
// Known boolean → never consume the next token as a value.
if (BOOLEAN_FLAGS.has(key)) { flags[key] = true; continue; }
const next = args[i + 1];
if (next && !next.startsWith("-")) { flags[key] = next; i++; } else flags[key] = true;
if (next !== undefined && !next.startsWith("-")) { flags[key] = next; i++; }
else flags[key] = true;
} else if (arg.startsWith("-") && arg.length === 2) {
const key = arg.slice(1);
if (BOOLEAN_FLAGS.has(key)) { flags[key] = true; continue; }
const next = args[i + 1];
if (next && !next.startsWith("-")) { flags[key] = next; i++; } else flags[key] = true;
if (next !== undefined && !next.startsWith("-")) { flags[key] = next; i++; }
else flags[key] = true;
} else if (!command) {
command = arg;
} else {

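
Below is a self-contained sketch of the fixed parsing rule, trimmed to a few flags. It shows why `--self <pubkey> "msg"` now yields two positionals. It follows the diff's logic:

- `--flag=value` always takes a value;
- known booleans never consume the next token;
- other flags consume a following token unless it starts with `-`.

The check is `next !== undefined`, so an empty-string value still binds. The `BOOLEAN_FLAGS` subset and the `parseArgvSketch` name are illustrative. Pushing later bare arguments to `positionals` is an assumption, since that branch is cut off in the hunk.

```typescript
interface ParsedArgs {
  command: string;
  positionals: string[];
  flags: Record<string, string | boolean | undefined>;
}

// Illustrative subset of the real BOOLEAN_FLAGS set.
const BOOLEAN_FLAGS = new Set(["self", "json", "unread", "foreground", "h"]);

function parseArgvSketch(argv: string[]): ParsedArgs {
  const args = argv.slice(2); // drop the runtime and script paths
  const flags: ParsedArgs["flags"] = {};
  const positionals: string[] = [];
  let command = "";
  for (let i = 0; i < args.length; i++) {
    const arg = args[i]!;
    if (arg.startsWith("--") && arg.includes("=")) {
      const eq = arg.indexOf("=");
      flags[arg.slice(2, eq)] = arg.slice(eq + 1);
      continue;
    }
    const isLong = arg.startsWith("--");
    const isShort = !isLong && arg.startsWith("-") && arg.length === 2;
    if (isLong || isShort) {
      const key = arg.slice(isLong ? 2 : 1);
      if (BOOLEAN_FLAGS.has(key)) { flags[key] = true; continue; }
      const next = args[i + 1];
      // `!== undefined` rather than truthiness, so `--name ""` binds "".
      if (next !== undefined && !next.startsWith("-")) { flags[key] = next; i++; }
      else flags[key] = true;
    } else if (!command) {
      command = arg;
    } else {
      positionals.push(arg); // assumed handling for later bare args
    }
  }
  return { command, positionals, flags };
}
```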
@@ -1,3 +1,7 @@
import { spawn } from "node:child_process";
import { existsSync, openSync, mkdirSync } from "node:fs";
import { join } from "node:path";

import { runDaemon } from "~/daemon/run.js";
import { ipc, IpcError } from "~/daemon/ipc/client.js";
import { readRunningPid } from "~/daemon/lock.js";
@@ -9,6 +13,15 @@ export interface DaemonOptions {
publicHealth?: boolean;
mesh?: string;
displayName?: string;
/** 1.34.12: keep the daemon attached to the current shell instead
* of double-forking. Default behavior changed in 1.34.12 — `up`
* now detaches by default and writes JSON logs to
* ~/.claudemesh/daemon/daemon.log. Pass `--foreground` to get the
* pre-1.34.12 behavior (logs streaming to stdout, blocks the
* terminal until Ctrl-C). install-service and `claudemesh launch`'s
* auto-spawn path always pass --foreground because their parents
* (launchd / the launch helper) own the lifecycle. */
foreground?: boolean;
/** outbox-list status filter, set from boolean flags --failed/--pending/etc. */
outboxStatus?: "pending" | "inflight" | "done" | "dead" | "aborted";
/** outbox requeue: optional id to mint a fresh client_message_id with. */
@@ -26,11 +39,40 @@ export async function runDaemonCommand(

case "up":
case "start":
// 1.34.10: `--mesh` and `--name` deprecated.
// --mesh: daemon attaches to every joined mesh automatically;
// pinning at start time blocks new meshes from being picked up.
// --name: overrides the daemon-WS display name GLOBALLY across
// every mesh, but each mesh has its own per-mesh display name
// in config.json (set at `claudemesh join` time). Passing one
// name flattens that out. Sessions advertise their own
// CLAUDEMESH_DISPLAY_NAME at `claudemesh launch` time anyway,
// and the daemon-WS presence is hidden from peer lists since
// 1.32, so the daemon's display name isn't user-visible.
if (opts.mesh) {
process.stderr.write(
`[claudemesh] --mesh on \`daemon up\` is deprecated; the daemon attaches to every joined mesh automatically. ` +
`Ignoring --mesh ${opts.mesh}.\n`,
);
}
if (opts.displayName) {
process.stderr.write(
`[claudemesh] --name on \`daemon up\` is deprecated; per-mesh display names live in config.json (set at join time), ` +
`and session display names come from \`claudemesh launch --name\`. Ignoring --name ${opts.displayName}.\n`,
);
}
// 1.34.12: detach by default. The pre-1.34.12 behavior streamed
// JSON logs to the controlling terminal and blocked the shell —
// fine for debugging, surprising for users who just want the
// daemon "up." `--foreground` opts back into the old behavior;
// launchd / systemd-user units always pass it because the unit
// manager owns lifecycle and stdio redirection.
if (!opts.foreground) {
return spawnDetachedDaemon(opts);
}
return runDaemon({
tcpEnabled: !opts.noTcp,
publicHealthCheck: opts.publicHealth,
mesh: opts.mesh,
displayName: opts.displayName,
});

case "help":
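
Before starting, the `up`/`start` branch now does two things:

1. It warns about `--mesh` and `--name`.
2. It picks detached mode unless `--foreground` is set. The detached child re-runs `daemon up --foreground` and receives only the transport flags.

Below is a pure sketch of that decision so it can be checked without spawning a process. `planDaemonUp`, `UpOptions`, and `UpPlan` are hypothetical. The warning strings are shortened paraphrases, not the CLI's exact output. The child argv mirrors the `args` array built in `spawnDetachedDaemon`.

```typescript
interface UpOptions {
  mesh?: string;
  displayName?: string;
  foreground?: boolean;
  noTcp?: boolean;
  publicHealth?: boolean;
}

interface UpPlan {
  warnings: string[];
  mode: "detached" | "foreground";
  childArgs?: string[]; // argv for the re-exec'd child when detaching
}

function planDaemonUp(opts: UpOptions): UpPlan {
  const warnings: string[] = [];
  if (opts.mesh) warnings.push(`--mesh is deprecated; ignoring --mesh ${opts.mesh}`);
  if (opts.displayName) warnings.push(`--name is deprecated; ignoring --name ${opts.displayName}`);
  if (opts.foreground) return { warnings, mode: "foreground" };
  // Detached: the child runs the long-lived loop in the foreground and only
  // inherits the transport flags, never --mesh or --name.
  const childArgs = ["daemon", "up", "--foreground"];
  if (opts.noTcp) childArgs.push("--no-tcp");
  if (opts.publicHealth) childArgs.push("--public-health");
  return { warnings, mode: "detached", childArgs };
}
```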
@@ -74,19 +116,18 @@ USAGE
|
||||
claudemesh daemon <command> [options]
|
||||
|
||||
COMMANDS
|
||||
up | start start the daemon in the foreground
|
||||
up | start start the daemon (detached by default)
|
||||
status show running pid + IPC health
|
||||
version ipc + schema version of the running daemon
|
||||
down | stop stop the running daemon (SIGTERM, then wait)
|
||||
accept-host pin the current host fingerprint
|
||||
outbox list list local outbox rows (newest first)
|
||||
outbox requeue <id> re-enqueue an aborted / dead outbox row
|
||||
install-service --mesh <s> write launchd (macOS) / systemd-user (Linux) unit
|
||||
install-service write launchd (macOS) / systemd-user (Linux) unit
|
||||
uninstall-service remove the platform service unit
|
||||
|
||||
OPTIONS
|
||||
--mesh <slug> attach to / target this mesh
|
||||
--name <displayName> override CLAUDEMESH_DISPLAY_NAME
|
||||
--foreground keep daemon attached to terminal, JSON logs to stdout (1.34.12+)
|
||||
--no-tcp disable the loopback TCP listener (UDS only)
|
||||
--public-health expose /v1/health unauthenticated on TCP
|
||||
--json machine-readable output where supported
|
||||
@@ -190,13 +231,14 @@ async function runInstallService(opts: DaemonOptions): Promise<number> {
|
||||
process.stderr.write(`unsupported platform: ${process.platform}\n`);
|
||||
return 2;
|
||||
}
|
||||
if (!opts.mesh) {
|
||||
process.stderr.write(`pass --mesh <slug> so the service knows which mesh to attach to\n`);
|
||||
return 2;
|
||||
}
|
||||
// Resolve the binary path. Prefer the running argv[0] when it's an
|
||||
// installed claudemesh binary; fall back to whichever `claudemesh` is
|
||||
// first on PATH.
|
||||
// 1.34.10: install-service no longer bakes --mesh into the unit. The
|
||||
// daemon attaches to every joined mesh by default, and pinning the
|
||||
// unit to one slug at install time was the source of the "joined a
|
||||
// new mesh but my service ignores it" footgun. If the user passes
|
||||
// --mesh anyway, we warn + ignore.
|
||||
let binary = process.argv[1] ?? "";
|
||||
if (!binary || /\.ts$/.test(binary) || /node_modules|src\/entrypoints/.test(binary)) {
|
||||
try {
|
||||
@@ -207,11 +249,19 @@ async function runInstallService(opts: DaemonOptions): Promise<number> {
|
||||
return 1;
|
||||
}
|
||||
}
|
||||
if (opts.mesh) {
|
||||
process.stderr.write(
|
||||
`[claudemesh] --mesh on \`daemon install-service\` is deprecated and ignored; the daemon attaches to every joined mesh.\n`,
|
||||
);
|
||||
}
|
||||
if (opts.displayName) {
|
||||
process.stderr.write(
|
||||
`[claudemesh] --name on \`daemon install-service\` is deprecated and ignored; per-mesh names live in config.json, session names come from \`claudemesh launch --name\`.\n`,
|
||||
);
|
||||
}
|
||||
try {
|
||||
const r = installService({
|
||||
binaryPath: binary,
|
||||
meshSlug: opts.mesh,
|
||||
displayName: opts.displayName,
|
||||
});
|
||||
if (opts.json) {
|
||||
process.stdout.write(JSON.stringify({ ok: true, ...r }) + "\n");
|
||||
@@ -311,3 +361,71 @@ async function runStop(opts: DaemonOptions): Promise<number> {
|
||||
else process.stdout.write(`daemon: signaled but did not exit within 5s (pid ${pid})\n`);
|
||||
return 1;
|
||||
}
|
||||
|
||||
/**
|
||||
* 1.34.12: spawn the daemon as a detached background process. Re-execs
|
||||
* the same `claudemesh` binary with `daemon up --foreground` (so the
|
||||
* child runs the long-lived loop), redirects stdout/stderr to
|
||||
* ~/.claudemesh/daemon/daemon.log, and `unref()`s so the parent shell
|
||||
* can exit cleanly.
|
||||
*
|
||||
* The parent waits up to ~3s for the UDS socket to appear before
|
||||
* declaring success — that's the same liveness check `claudemesh launch`
|
||||
* uses, and it catches the "child crashed during boot" case (config
|
||||
* read failed, port bind failed, etc.) with an actionable error
|
||||
* pointing at the log file rather than silent loss.
|
||||
*/
|
||||
async function spawnDetachedDaemon(opts: DaemonOptions): Promise<number> {
|
||||
// Ensure the log directory exists before opening the FDs.
|
||||
mkdirSync(DAEMON_PATHS.DAEMON_DIR, { recursive: true, mode: 0o700 });
|
||||
const logPath = join(DAEMON_PATHS.DAEMON_DIR, "daemon.log");
|
||||
|
||||
// The CLI binary path. process.argv[1] is the entrypoint script the
|
||||
// node runtime is currently executing — for an installed CLI that's
|
||||
// .../bin/claudemesh, for `bun run` dev that's the local dist file.
|
||||
// Either way it's the right thing to re-exec.
|
||||
const binary = process.argv[1] ?? "claudemesh";
|
||||
const args = ["daemon", "up", "--foreground"];
|
||||
if (opts.noTcp) args.push("--no-tcp");
|
||||
if (opts.publicHealth) args.push("--public-health");
|
||||
|
||||
const out = openSync(logPath, "a");
|
||||
const err = openSync(logPath, "a");
|
||||
const child = spawn(process.execPath, [binary, ...args], {
|
||||
detached: true,
|
||||
stdio: ["ignore", out, err],
|
||||
env: process.env,
|
||||
});
|
||||
// Decouple the child from the parent's process group so closing the
|
||||
// shell doesn't SIGHUP the daemon.
|
||||
child.unref();
|
||||

  // Wait for the socket to appear — the daemon's IPC listener binds
  // ~immediately after the broker WS handshake starts, so socket
  // existence is a reliable "the daemon is alive enough to accept
  // requests" signal.
  const sockPath = DAEMON_PATHS.SOCK_FILE;
  const startedAt = Date.now();
  while (Date.now() - startedAt < 3_000) {
    if (existsSync(sockPath)) {
      if (opts.json) {
        process.stdout.write(JSON.stringify({ ok: true, detached: true, pid: child.pid, log: logPath }) + "\n");
      } else {
        process.stdout.write(` ✔ daemon started (pid ${child.pid})\n`);
        process.stdout.write(` → log: ${logPath}\n`);
        process.stdout.write(` → stop: claudemesh daemon down\n`);
      }
      return 0;
    }
    await new Promise<void>((r) => setTimeout(r, 100));
  }

  if (opts.json) {
    process.stdout.write(JSON.stringify({ ok: false, detached: true, pid: child.pid, reason: "socket_not_appeared", log: logPath }) + "\n");
  } else {
    process.stderr.write(` ✘ daemon spawn timeout: socket did not appear within 3s\n`);
    process.stderr.write(` → check log: ${logPath}\n`);
    process.stderr.write(` → run foreground for live output: claudemesh daemon up --foreground\n`);
  }
  return 1;
}
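The spawn-then-poll liveness check in `spawnDetachedDaemon` reduces to a generic "poll until true or deadline" loop. Here is a minimal sketch built around a hypothetical `pollUntil` helper, which is not part of the claudemesh codebase. The 3 s budget and 100 ms interval above would map to `timeoutMs` and `intervalMs`:

```typescript
// Hypothetical helper (not part of claudemesh): call `check` every
// `intervalMs` until it returns true or `timeoutMs` has elapsed.
// Resolves true as soon as the check passes, false on timeout.
async function pollUntil(
  check: () => boolean,
  timeoutMs: number,
  intervalMs: number,
): Promise<boolean> {
  const startedAt = Date.now();
  while (Date.now() - startedAt < timeoutMs) {
    if (check()) return true;
    // Sleep between probes so the loop doesn't spin the CPU.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

Using `existsSync` from `node:fs`, the daemon check would become `await pollUntil(() => existsSync(sockPath), 3_000, 100)`. The first probe runs immediately after spawning, so a stale socket file left behind by a crashed daemon could pass it. A stricter probe, such as an IPC round-trip, may be worth considering.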
91
apps/cli/src/commands/inbox-actions.ts
Normal file
@@ -0,0 +1,91 @@
/**
 * `claudemesh inbox flush` and `claudemesh inbox delete <id>` —
 * mutate the daemon's persistent inbox store
 * (`~/.claudemesh/daemon/inbox.db`) over IPC.
 *
 * 1.34.7: until this version, the only way to clean the inbox was a
 * raw `sqlite3 inbox.db "DELETE FROM inbox"` against the daemon's
 * private DB. That works but bypasses the IPC layer (and any future
 * lifecycle hooks on row removal), and is invisible to a user who
 * doesn't know the schema. These two verbs make the operation visible
 * + safe + scriptable.
 */

import {
  tryFlushInboxViaDaemon,
  tryDeleteInboxRowViaDaemon,
} from "~/services/bridge/daemon-route.js";
import { render } from "~/ui/render.js";
import { dim } from "~/ui/styles.js";

export interface InboxFlushFlags {
  mesh?: string;
  /** ISO-8601 timestamp; deletes rows received_at < before. */
  before?: string;
  /** Required when neither --mesh nor --before is set, to prevent an
   * accidental "delete every row on every mesh". */
  all?: boolean;
  json?: boolean;
}

export async function runInboxFlush(flags: InboxFlushFlags): Promise<void> {
  const hasFilter = !!(flags.mesh || flags.before);
  if (!hasFilter && !flags.all) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "missing_filter" }) + "\n"); return; }
    render.info(dim(
      "Refusing to flush every row on every mesh.\n" +
      " Re-run with --mesh <slug>, --before <iso-timestamp>, or --all to confirm.",
    ));
    process.exit(1);
  }

  const removed = await tryFlushInboxViaDaemon({
    ...(flags.mesh ? { mesh: flags.mesh } : {}),
    ...(flags.before ? { beforeIso: flags.before } : {}),
  });

  if (removed === null) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "daemon_unreachable" }) + "\n"); return; }
    render.info(dim("Daemon not reachable. Run `claudemesh daemon up` and retry."));
    process.exit(1);
  }

  if (flags.json) {
    process.stdout.write(JSON.stringify({ ok: true, removed }) + "\n");
    return;
  }
  const scope = flags.mesh
    ? `mesh "${flags.mesh}"`
    : flags.before
      ? `older than ${flags.before}`
      : "all meshes";
  render.info(`✔ Flushed ${removed} message${removed === 1 ? "" : "s"} from ${scope}.`);
}

export interface InboxDeleteFlags {
  json?: boolean;
}

export async function runInboxDelete(id: string, flags: InboxDeleteFlags): Promise<void> {
  if (!id) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "missing_id" }) + "\n"); return; }
    render.info(dim("Usage: claudemesh inbox delete <message-id>"));
    process.exit(1);
  }
  const ok = await tryDeleteInboxRowViaDaemon(id);
  if (ok === null) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "daemon_unreachable" }) + "\n"); return; }
    render.info(dim("Daemon not reachable. Run `claudemesh daemon up` and retry."));
    process.exit(1);
  }
  if (!ok) {
    if (flags.json) { process.stdout.write(JSON.stringify({ ok: false, error: "not_found", id }) + "\n"); return; }
    render.info(dim(`No inbox row with id "${id}".`));
    process.exit(1);
  }
  if (flags.json) {
    process.stdout.write(JSON.stringify({ ok: true, id }) + "\n");
    return;
  }
  render.info(`✔ Deleted inbox row ${id}.`);
}
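The refusal rule at the top of `runInboxFlush` is easier to unit-test once it is separated from I/O and `process.exit`. Here is a sketch built around a hypothetical pure `planFlush` function, which is not in the source. It returns either the `missing_filter` error or the scope string that the command prints, using the same mesh-then-before-then-all precedence:

```typescript
// Hypothetical, side-effect-free version of runInboxFlush's guard.
interface FlushFlags {
  mesh?: string;
  before?: string; // ISO-8601 timestamp
  all?: boolean;
}

type FlushPlan =
  | { ok: true; scope: string }
  | { ok: false; error: "missing_filter" };

function planFlush(flags: FlushFlags): FlushPlan {
  const hasFilter = !!(flags.mesh || flags.before);
  // Without a filter, only an explicit --all may wipe every mesh.
  if (!hasFilter && !flags.all) return { ok: false, error: "missing_filter" };
  const scope = flags.mesh
    ? `mesh "${flags.mesh}"`
    : flags.before
      ? `older than ${flags.before}`
      : "all meshes";
  return { ok: true, scope };
}
```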
@@ -1,49 +1,101 @@
/**
 * `claudemesh inbox` — read pending peer messages.
 * `claudemesh inbox` — read pending peer messages from the daemon's
 * persisted inbox (`~/.claudemesh/daemon/inbox.db`).
 *
 * Connects, waits briefly for push delivery, drains the buffer, prints.
 * Works best when message-mode is "inbox" or "off" (messages held at broker).
 * 1.34.0: switched from the legacy cold-path "open fresh broker WS,
 * drain in-memory buffer" flow to a daemon IPC read against `/v1/inbox`.
 * The cold path was structurally broken — the persistent inbox lives in
 * the daemon, and pushes land on its session-WS, not on a freshly-opened
 * standalone WS. The daemon-route `tryListInboxViaDaemon` returns rows
 * persisted across daemon restarts and surfaces them with the correct
 * mesh scoping (server-side mesh filter added in 1.34.0).
 *
 * Cold-path fallback removed: when the daemon isn't reachable, the
 * prior implementation returned an empty list anyway (no broker state
 * = no buffered pushes), so removing that path doesn't lose any
 * functionality. Strict mode emits a clear error via daemon-route.
 */

import { withMesh } from "./connect.js";
import type { InboundPush } from "~/services/broker/facade.js";
import { tryListInboxViaDaemon } from "~/services/bridge/daemon-route.js";
import { render } from "~/ui/render.js";
import { bold, dim } from "~/ui/styles.js";

export interface InboxFlags {
  mesh?: string;
  json?: boolean;
  wait?: number;
  /** Cap the number of rows returned by the daemon. Default 100. */
  limit?: number;
  /** 1.34.8: only show rows whose seen_at is NULL (i.e., never
   * surfaced via an interactive listing or live channel reminder).
   * When omitted, every row is returned and an interactive listing
   * stamps them seen as a side effect. */
  unread?: boolean;
}

function formatMessage(msg: InboundPush): string {
  const text = msg.plaintext ?? `[encrypted: ${msg.ciphertext.slice(0, 32)}…]`;
  const from = msg.senderPubkey.slice(0, 8);
  const time = new Date(msg.createdAt).toLocaleTimeString();
  const kindTag = msg.kind === "direct" ? "→ direct" : msg.kind;
  return ` ${bold(from)} ${dim(`[${kindTag}] ${time}`)}\n ${text}`;
interface FormattedItem {
  sender_pubkey: string;
  sender_name: string;
  body: string | null;
  topic: string | null;
  received_at: string;
  mesh: string;
}

function formatMessage(msg: FormattedItem, includeMesh: boolean): string {
  const text = msg.body ?? "[encrypted]";
  const from = msg.sender_name && msg.sender_name !== msg.sender_pubkey.slice(0, 8)
    ? `${msg.sender_name} (${msg.sender_pubkey.slice(0, 8)})`
    : msg.sender_pubkey.slice(0, 8);
  const time = new Date(msg.received_at).toLocaleTimeString();
  const topicTag = msg.topic ? ` (#${msg.topic})` : "";
  const meshTag = includeMesh ? ` [${msg.mesh}]` : "";
  return ` ${bold(from)} ${dim(`${meshTag}${topicTag} ${time}`)}\n ${text}`;
}

export async function runInbox(flags: InboxFlags): Promise<void> {
  const waitMs = (flags.wait ?? 1) * 1000;
  // Mesh resolution is owned by the daemon (it knows which meshes are
  // attached) — the CLI just forwards the user's --mesh flag through.
  // When omitted, the daemon's `/v1/inbox` honors the session-default
  // mesh on auth-token requests; out-of-session callers see rows from
  // every attached mesh. We don't pre-validate the mesh slug here so
  // the command works even from a launch tmpdir whose local
  // `config.json` only knows about the launch's mesh.
  const meshSlug = flags.mesh;

  await withMesh({ meshSlug: flags.mesh ?? null }, async (client, mesh) => {
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    const messages = client.drainPushBuffer();

    if (flags.json) {
      process.stdout.write(JSON.stringify(messages, null, 2) + "\n");
      return;
    }

    if (messages.length === 0) {
      render.info(dim(`No messages on mesh "${mesh.slug}".`));
      return;
    }

    render.section(`inbox — ${mesh.slug} (${messages.length} message${messages.length === 1 ? "" : "s"})`);
    for (const msg of messages) {
      process.stdout.write(formatMessage(msg) + "\n\n");
    }
  const items = await tryListInboxViaDaemon(meshSlug, flags.limit ?? 100, {
    unreadOnly: flags.unread === true,
    // CLI is the canonical "I'm reading my inbox" path — let the daemon
    // auto-stamp seen_at on the rows we just rendered. The MCP welcome
    // path passes mark_seen=false instead and stamps explicitly after
    // the channel notification succeeds.
    markSeen: true,
  });
  if (items === null) {
    if (flags.json) { process.stdout.write("[]\n"); return; }
    render.info(dim("Daemon not reachable. Run `claudemesh daemon up` and retry."));
    return;
  }

  if (flags.json) {
    process.stdout.write(JSON.stringify(items, null, 2) + "\n");
    return;
  }

  if (items.length === 0) {
    const scope = meshSlug ? `mesh "${meshSlug}"` : "any mesh";
    const filter = flags.unread ? "unread " : "";
    render.info(dim(`No ${filter}messages on ${scope}.`));
    return;
  }

  const filterTag = flags.unread ? " unread" : "";
  const heading = meshSlug
    ? `inbox — ${meshSlug} (${items.length}${filterTag} message${items.length === 1 ? "" : "s"})`
    : `inbox (${items.length}${filterTag} message${items.length === 1 ? "" : "s"})`;
  render.section(heading);
  // When the user didn't filter by mesh, surface the mesh slug per row
  // so they can tell apart rows from different meshes at a glance.
  for (const msg of items) {
    process.stdout.write(formatMessage(msg, !meshSlug) + "\n\n");
  }
}
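The new `formatMessage` makes two display decisions. It prefers "name (pubkey prefix)" over a bare 8-character pubkey prefix, and it tags each row with its mesh only when the listing spans meshes. The sketch below factors these out into two hypothetical pure helpers, `senderLabel` and `rowTags`, which are illustrative names that do not appear in the source. A display name equal to the pubkey prefix is treated as "no name", as in the diff:

```typescript
// Hypothetical helper: "alice (abcdef01)" when a distinct display name is
// known, otherwise just the 8-char pubkey prefix.
function senderLabel(senderName: string, senderPubkey: string): string {
  const short = senderPubkey.slice(0, 8);
  return senderName && senderName !== short ? `${senderName} (${short})` : short;
}

// Hypothetical helper: optional mesh tag (only for cross-mesh listings),
// followed by an optional topic tag.
function rowTags(mesh: string, topic: string | null, includeMesh: boolean): string {
  const meshTag = includeMesh ? ` [${mesh}]` : "";
  const topicTag = topic ? ` (#${topic})` : "";
  return `${meshTag}${topicTag}`;
}
```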
@@ -434,7 +434,7 @@ function installStatusLine(): { installed: boolean } {
  return { installed: true };
}

export function runInstall(args: string[] = []): void {
export async function runInstall(args: string[] = []): Promise<void> {
  const skipHooks = args.includes("--no-hooks");
  const skipSkill = args.includes("--no-skill");
  const skipService = args.includes("--no-service");
@@ -545,23 +545,25 @@ export function runInstall(args: string[] = []): void {
  }

  let hasMeshes = false;
  let primaryMesh: string | undefined;
  try {
    const meshConfig = readConfig();
    hasMeshes = meshConfig.meshes.length > 0;
    primaryMesh = meshConfig.meshes[0]?.slug;
  } catch {}

  // Daemon service install — required for MCP integration as of 1.24.0.
  // The daemon owns the broker WS and feeds the MCP push-pipe via SSE;
  // skipping it leaves channel push, slash commands, and resources broken.
  if (!skipService && hasMeshes && primaryMesh) {
  // 1.30.2: install no longer locks the unit to a single mesh; the
  // daemon attaches to every joined mesh on boot (1.26.0 multi-mesh
  // design). Users who want single-mesh can pass `claudemesh daemon
  // install-service --mesh <slug>` explicitly.
  if (!skipService && hasMeshes) {
    try {
      installDaemonService(entry, primaryMesh);
      await installDaemonService(entry);
    } catch (e) {
      render.warn(
        `daemon service install failed: ${e instanceof Error ? e.message : String(e)}`,
        "Run `claudemesh daemon install-service --mesh <slug>` to retry.",
        "Run `claudemesh daemon install-service` to retry.",
      );
    }
  } else if (skipService) {
@@ -601,7 +603,7 @@ export function runInstall(args: string[] = []): void {
 * the user knows there's a problem before it shows up as "no messages
 * arriving."
 */
function installDaemonService(binaryEntry: string, meshSlug: string): void {
async function installDaemonService(binaryEntry: string): Promise<void> {
  const {
    installService,
    detectPlatform,
@@ -625,17 +627,17 @@ function installDaemonService(binaryEntry: string, meshSlug: string): void {
    } catch {
      render.warn(
        "couldn't resolve a 'claudemesh' binary on PATH; daemon service skipped",
        "Install via npm/homebrew, then run `claudemesh daemon install-service --mesh " + meshSlug + "`",
        "Install via npm/homebrew, then run `claudemesh daemon install-service`",
      );
      return;
    }
  }

  const r = installService({ binaryPath: binary, meshSlug });
  const r = installService({ binaryPath: binary });
  render.ok(`daemon service installed (${r.platform})`);
  render.kv([
    ["unit", dim(r.unitPath)],
    ["mesh", dim(meshSlug)],
    ["mesh", dim("(all joined meshes)")],
  ]);

  // Boot the unit immediately so MCP has a daemon to attach to on next
@@ -650,7 +652,52 @@ function installDaemonService(binaryEntry: string, meshSlug: string): void {
      `daemon service installed but failed to start: ${e instanceof Error ? e.message : String(e)}`,
      `Run manually: ${r.bootCommand}`,
    );
    return;
  }

  // 1.31.0 — post-flight: verify the daemon actually establishes a
  // broker WebSocket. Boots that fail silently here (DNS, expired TLS,
  // outbound :443 blocked, broker outage) used to surface only when
  // the user's first `peer list` or `send` failed half an hour later.
  // Polling /v1/health gives a clear, install-time signal.
  await verifyBrokerConnectivity();
}

async function verifyBrokerConnectivity(): Promise<void> {
  const VERIFY_BUDGET_MS = 15_000;
  const POLL_INTERVAL_MS = 500;
  const { ipc } = await import("~/daemon/ipc/client.js");
  const start = Date.now();
  let lastBrokers: Record<string, string> = {};

  while (Date.now() - start < VERIFY_BUDGET_MS) {
    try {
      const res = await ipc<{ ok: boolean; brokers?: Record<string, string> }>({
        path: "/v1/health",
        timeoutMs: 2_000,
      });
      lastBrokers = res.body?.brokers ?? {};
      const openMesh = Object.entries(lastBrokers).find(([, s]) => s === "open");
      if (openMesh) {
        const others = Object.entries(lastBrokers).filter(([slug]) => slug !== openMesh[0]);
        const tail = others.length > 0 ? `, ${others.length} other mesh${others.length === 1 ? "" : "es"} attaching` : "";
        render.ok(`broker connected (mesh=${openMesh[0]}${tail})`);
        return;
      }
    } catch { /* daemon may still be starting up; keep polling */ }
    await new Promise((r) => setTimeout(r, POLL_INTERVAL_MS));
  }

  // Timed out without a single broker reaching `open`. Surface what we
  // saw last so the user can act — this is exactly the bug class we
  // want to catch at install time, not at first send.
  const states = Object.keys(lastBrokers).length === 0
    ? "no health response from daemon"
    : Object.entries(lastBrokers).map(([m, s]) => `${m}=${s}`).join(", ");
  render.warn(
    `broker did not reach open within ${Math.round(VERIFY_BUDGET_MS / 1000)}s (${states})`,
    "Check ~/.claudemesh/daemon/daemon.log for connect errors. Common causes: outbound :443 blocked, expired TLS, DNS resolution.",
  );
}

export function runUninstall(): void {
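Each iteration of the `verifyBrokerConnectivity` poll turns the `/v1/health` `brokers` map (mesh slug to WS state) into either a success line or a state summary. The sketch below isolates that decision in a hypothetical pure `summarizeBrokers` function, an illustrative name that is not in the source. As in the diff, the other meshes are reported as "attaching" whatever their actual state:

```typescript
// Hypothetical pure version of the per-poll decision in
// verifyBrokerConnectivity. `brokers` maps mesh slug -> broker WS state.
type BrokerStates = Record<string, string>;

function summarizeBrokers(brokers: BrokerStates): { open: boolean; message: string } {
  const entries = Object.entries(brokers);
  const openMesh = entries.find(([, state]) => state === "open");
  if (openMesh) {
    // Keys are unique, so "other meshes" is every entry but the open one.
    const others = entries.length - 1;
    const tail = others > 0 ? `, ${others} other mesh${others === 1 ? "" : "es"} attaching` : "";
    return { open: true, message: `broker connected (mesh=${openMesh[0]}${tail})` };
  }
  const states = entries.length === 0
    ? "no health response from daemon"
    : entries.map(([mesh, state]) => `${mesh}=${state}`).join(", ");
  return { open: false, message: states };
}
```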
@@ -76,12 +76,32 @@ export async function runKick(
  if ("error" in built) { render.err(String(built.error)); return EXIT.INVALID_ARGS; }

  return await withMesh({ meshSlug }, async (client) => {
    const result = await client.sendAndWait(built as Record<string, unknown>) as { affected?: string[]; kicked?: string[] };
    const result = await client.sendAndWait(built as Record<string, unknown>) as {
      affected?: string[];
      kicked?: string[];
      // 1.34.15: broker refuses to kick control-plane WSes (they'd
      // just auto-reconnect). Older brokers don't emit this field.
      skipped_control_plane?: string[];
    };
    const peers = result?.affected ?? result?.kicked ?? [];
    if (peers.length === 0) render.info("No peers matched.");
    else {
    const skipped = result?.skipped_control_plane ?? [];

    if (peers.length === 0 && skipped.length === 0) {
      render.info("No peers matched.");
    } else if (peers.length === 0 && skipped.length > 0) {
      render.warn(
        `${skipped.length} match(es) refused: ${skipped.join(", ")} — control-plane connections (daemon / dashboard) auto-reconnect, so kick is a no-op.`,
        "To take a daemon offline locally, run `claudemesh daemon down` on that machine. To remove a member from the mesh, use `claudemesh ban <peer>`.",
      );
    } else {
      render.ok(`Kicked ${peers.length} peer(s): ${peers.join(", ")}`);
      render.hint("Their Claude Code session ended. They can rejoin anytime by running `claudemesh`.");
      if (skipped.length > 0) {
        render.warn(
          `(also refused ${skipped.length} control-plane connection(s): ${skipped.join(", ")})`,
          "Daemon / dashboard connections auto-reconnect; kick is a no-op against them. Use `claudemesh ban <peer>` to remove a member entirely.",
        );
      }
    }
    return EXIT.SUCCESS;
  });
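The new `runKick` renders one of three outcomes: nothing matched, only control-plane connections matched (all refused), or real peers were kicked, possibly alongside refused control-plane connections. Here is a sketch that names those cases explicitly. `classifyKick` and `KickOutcome` are hypothetical names that are not in the source; the field names follow the broker response type shown in the diff:

```typescript
// Hypothetical classifier for the broker's kick response.
interface KickResult {
  affected?: string[];
  kicked?: string[];
  skipped_control_plane?: string[]; // absent on pre-1.34.15 brokers
}

type KickOutcome =
  | { kind: "none" }
  | { kind: "only_control_plane"; skipped: string[] }
  | { kind: "kicked"; peers: string[]; skipped: string[] };

function classifyKick(result: KickResult | undefined): KickOutcome {
  // `??` only falls through on null/undefined, so an empty `affected`
  // array wins over `kicked`, matching the expression in runKick.
  const peers = result?.affected ?? result?.kicked ?? [];
  const skipped = result?.skipped_control_plane ?? [];
  if (peers.length === 0 && skipped.length === 0) return { kind: "none" };
  if (peers.length === 0) return { kind: "only_control_plane", skipped };
  return { kind: "kicked", peers, skipped };
}
```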
@@ -63,6 +63,7 @@ async function ensureDaemonRunning(meshSlug: string, quiet: boolean): Promise<vo
  const res = await ensureDaemonReady({ budgetMs: 10_000, mesh: meshSlug });
  if (res.state === "up") {
    if (!quiet) render.ok("daemon already running");
    await warnIfDaemonStale(quiet);
    return;
  }
  if (res.state === "started") {
@@ -71,10 +72,34 @@ async function ensureDaemonRunning(meshSlug: string, quiet: boolean): Promise<vo
  }
  render.warn(
    `daemon ${res.state}${res.reason ? `: ${res.reason}` : ""}`,
    "Run `claudemesh daemon up --mesh " + meshSlug + "` manually, then re-launch.",
    "Run `claudemesh daemon up` manually, then re-launch.",
  );
}

/** 1.34.9: warn when the running daemon's version doesn't match the CLI
 * that's about to launch a session. `npm i -g claudemesh-cli` upgrades
 * the binaries on disk but doesn't restart a launchd / systemd-user
 * service or a foreground `claudemesh daemon up`, so users routinely
 * ship a fix to the CLI side and never see it because the WS lifecycle,
 * echo guards, and self-join filters all live in the long-running
 * daemon process. We probe `/v1/version` and emit a one-shot stderr
 * warning when CLI ≠ daemon. Best-effort; failures are silent. */
async function warnIfDaemonStale(quiet: boolean): Promise<void> {
  if (quiet) return;
  try {
    const { ipc } = await import("~/daemon/ipc/client.js");
    const { VERSION } = await import("~/constants/urls.js");
    const res = await ipc<{ daemon_version?: string }>({ path: "/v1/version", timeoutMs: 1_500 });
    if (res.status !== 200) return;
    const daemonVersion = res.body.daemon_version ?? "";
    if (!daemonVersion || daemonVersion === VERSION) return;
    render.warn(
      `daemon is ${daemonVersion}, CLI is ${VERSION} — restart to pick up new fixes.`,
      "Run: `claudemesh daemon down && claudemesh daemon up` (no --mesh — daemon attaches to every joined mesh; restart the launchd / systemd-user unit if you installed one).",
    );
  } catch { /* swallow — version probe is best-effort */ }
}

async function pickMesh(meshes: JoinedMesh[]): Promise<JoinedMesh> {
  if (meshes.length === 1) return meshes[0]!;

@@ -349,6 +374,66 @@ async function runLaunchWizard(opts: {
  return { mesh, role, groups, messageMode, skipPermissions };
}

/**
 * 1.32.0 — broker welcome line printed right after the launch banner.
 * Polls the daemon's /v1/health (per-mesh broker WS state) and tries
 * to fetch the inbox + peer count via daemon-route helpers. Best-effort:
 * if any call fails the welcome simply prints what it knows and moves
 * on — never blocks the launch path.
 */
async function printBrokerWelcome(meshSlug: string): Promise<void> {
  const useColor = !process.env.NO_COLOR && process.env.TERM !== "dumb" && process.stdout.isTTY;
  const dim = (s: string): string => (useColor ? `\x1b[2m${s}\x1b[22m` : s);
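  // SGR 39 resets the foreground color; SGR 22 only resets bold/dim
  // intensity, so it would leave the rest of the line colored.
  const green = (s: string): string => (useColor ? `\x1b[32m${s}\x1b[39m` : s);
  const yellow = (s: string): string => (useColor ? `\x1b[33m${s}\x1b[39m` : s);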

  // Probe daemon health for broker WS state.
  let brokerState = "unknown";
  try {
    const { ipc } = await import("~/daemon/ipc/client.js");
    const res = await ipc<{ ok?: boolean; brokers?: Record<string, string> }>({
      path: "/v1/health",
      timeoutMs: 1_500,
    });
    if (res.status === 200 && res.body?.brokers) {
      brokerState = res.body.brokers[meshSlug] ?? "unknown";
    }
  } catch { /* daemon unreachable — not fatal */ }

  // Peer count (best-effort). 1.34.15: scope to the launched mesh so
  // multi-mesh daemons don't inflate the welcome banner with peers
  // from other meshes the user didn't just attach to.
  let peerCount = -1;
  try {
    const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
    const peers = (await tryListPeersViaDaemon(meshSlug)) ?? [];
    peerCount = peers.filter((p) =>
      (p as { channel?: string }).channel !== "claudemesh-daemon",
    ).length;
  } catch { /* skip peer count */ }

  // Unread inbox count (best-effort).
  let unread = -1;
  try {
    const { ipc } = await import("~/daemon/ipc/client.js");
    const res = await ipc<{ messages?: unknown[] }>({
      path: "/v1/inbox",
      timeoutMs: 1_500,
    });
    if (res.status === 200 && Array.isArray(res.body?.messages)) {
      unread = res.body.messages.length;
    }
  } catch { /* skip unread */ }

  const dot = brokerState === "open" ? green("●") : yellow("●");
  const parts: string[] = [];
  parts.push(`broker ${brokerState === "open" ? "connected" : brokerState}`);
  if (peerCount >= 0) parts.push(`${peerCount} peer${peerCount === 1 ? "" : "s"} online`);
  if (unread >= 0) parts.push(`${unread} unread`);
  console.log(`${dot} ${parts.join(dim(" · "))}`);
  console.log("");
}

function printBanner(name: string, meshSlug: string, role: string | null, groups: GroupEntry[], messageMode: "push" | "inbox" | "off"): void {
  const useColor =
    !process.env.NO_COLOR && process.env.TERM !== "dumb" && process.stdout.isTTY;
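The welcome banner in `printBrokerWelcome` is assembled from three best-effort probes, where `-1` means "unknown, omit this part". The sketch below captures the assembly as hypothetical pure helpers (`welcomeParts`, `countSessionPeers`, `colorize`), none of which appear in the source, so the wording can be tested without a daemon. `colorize` resets with SGR 39, which restores the default foreground color:

```typescript
// Hypothetical pure helpers mirroring printBrokerWelcome's output logic.
// A negative count means the probe failed and that part is omitted.
function welcomeParts(brokerState: string, peerCount: number, unread: number): string[] {
  const parts: string[] = [];
  parts.push(`broker ${brokerState === "open" ? "connected" : brokerState}`);
  if (peerCount >= 0) parts.push(`${peerCount} peer${peerCount === 1 ? "" : "s"} online`);
  if (unread >= 0) parts.push(`${unread} unread`);
  return parts;
}

// Exclude the daemon's own presence row, as the banner does.
function countSessionPeers(peers: Array<{ channel?: string }>): number {
  return peers.filter((p) => p.channel !== "claudemesh-daemon").length;
}

// Wrap `s` in an SGR foreground color (e.g. 32 green, 33 yellow) and
// reset with SGR 39 so following text keeps the default color.
function colorize(code: number, s: string, enabled: boolean): string {
  return enabled ? `\x1b[${code}m${s}\x1b[39m` : s;
}
```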
@@ -752,6 +837,10 @@ export async function runLaunch(flags: LaunchFlags, rawArgs: string[]): Promise<
  // 5. Print summary banner (wizard already handled all interactive config).
  if (!args.quiet) {
    printBanner(displayName, mesh.slug, role, parsedGroups, messageMode);
    // 1.32.0+: broker welcome — confirm the per-session WS is actually
    // attached and surface peer count + unread inbox so the user lands
    // in claude code with a clear state instead of silent assumptions.
    await printBrokerWelcome(mesh.slug);
  }

  // --- Install native MCP entries for deployed mesh services ---

@@ -21,22 +21,72 @@ export interface PeersFlags {
  mesh?: string;
  /** `true`/`undefined` = full record; comma-separated string = field projection. */
  json?: boolean | string;
  /** When false (default), hide control-plane presence rows from the
   * human renderer — they're infrastructure (daemon-WS member-keyed
   * presence), not interactive peers, and confused users into thinking
   * the daemon counted as a "peer". The JSON output still includes them
   * so scripts that need a full inventory can opt in via --all (or
   * just consume JSON).
   *
   * Source of truth is the broker-side `peerRole` field
   * (`'control-plane' | 'session' | 'service'`). Older brokers don't
   * emit `peerRole` yet — this code falls back to treating a missing
   * peerRole as `'session'` so legacy peer rows stay visible. */
  all?: boolean;
}

/**
 * Broker-emitted peer classification, added 2026-05-04. Older brokers
 * may omit it — treat missing as 'session' so legacy meshes still
 * render their peers (and don't accidentally hide them all). The CLI
 * never emits 'control-plane' on its own; that comes from the broker.
 */
export type PeerRole = "control-plane" | "session" | "service";

interface PeerRecord {
  pubkey: string;
  /** Stable member pubkey (independent of session). When sender shares
   * this with a peer, they're talking to the same person across all
   * their open sessions. */
  memberPubkey?: string;
  /** Per-launch session identifier (uuid). Used by the renderer to
   * disambiguate sibling sessions of the same member that otherwise
   * look identical (same name, same cwd). */
  sessionId?: string;
  displayName: string;
  status?: string;
  summary?: string;
  groups: Array<{ name: string; role?: string }>;
  /** Top-level convenience alias for `profile.role`, lifted by the CLI
   * since 1.31.5 so JSON consumers (the agent-vibes claudemesh skill,
   * launched-session LLMs) see the user-supplied role string at the
   * shape's top level. Same value as `profile.role`. Distinct from
   * `peerRole` below — that's the broker's presence-class taxonomy. */
  role?: string;
  /** Broker-emitted presence classification: 'control-plane' | 'session'
   * | 'service'. Source of truth for the --all visibility filter and
   * the default-hide rule. Older brokers omit this; the CLI fills
   * missing values with 'session' so legacy peer rows stay visible.
   *
   * Renamed from `role` to avoid collision with 1.31.5's profile.role
   * lift above. Wire-level field on the broker is also `peerRole`. */
  peerRole?: PeerRole;
  peerType?: string;
  channel?: string;
  model?: string;
  cwd?: string;
  /** Peer-level profile metadata (set via `claudemesh profile`). The
   * broker passes this through verbatim; the most common field is
   * `role` ("lead", "reviewer", "human", etc.) but capabilities, bio,
   * avatar, and title also live here when set. */
  profile?: {
    role?: string;
    title?: string;
    bio?: string;
    avatar?: string;
    capabilities?: string[];
    [k: string]: unknown;
  };
  /** True when this peer is one of the caller's own member's sessions.
   * Set in the cli (not the broker) by comparing memberPubkey against
   * the caller's stable JoinedMesh.pubkey. */
@@ -66,16 +116,38 @@ async function listPeersForMesh(slug: string): Promise<PeerRecord[]> {
  const joined = config.meshes.find((m) => m.slug === slug);
  const selfMemberPubkey = joined?.pubkey ?? null;

  // Resolve our own session pubkey via the daemon's /v1/sessions/me when
  // we're inside a launched session. Without this, isThisSession can't
  // be set on the daemon path (only on the cold path where a fresh WS
  // creates the keypair), and the renderer can't tell the user which
  // row in `peer list` is them.
  let selfSessionPubkey: string | null = null;
  try {
    const { getSessionInfo } = await import("~/services/session/resolve.js");
    const sess = await getSessionInfo();
    if (sess && sess.mesh === slug && sess.presence?.sessionPubkey) {
      selfSessionPubkey = sess.presence.sessionPubkey;
    }
  } catch { /* not in a launched session; isThisSession stays false */ }

  // Daemon path — preferred when running. Same routing pattern as send.ts:
  // ~1 ms IPC round-trip; broker WS already warm in the daemon. The
  // lifecycle helper inside tryListPeersViaDaemon auto-spawns the
  // daemon if it's down and probes it for liveness — no separate bridge
  // tier is needed any more (1.28.0).
  //
  // 1.34.15: forward `slug` to the daemon as `?mesh=<slug>` so the
  // server-side aggregator narrows to the requested mesh. Pre-1.34.15
  // we called this with no argument, so a multi-mesh daemon returned
  // peers from every attached mesh and the renderer printed "peers on
  // flexicar" with cross-mesh rows mixed in. The daemon's
  // `meshFromCtx` already does the right scoping when the slug is
  // passed; the CLI just wasn't passing it.
  try {
    const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
    const dr = await tryListPeersViaDaemon();
    const dr = await tryListPeersViaDaemon(slug);
    if (dr !== null) {
      return dr.map((p) => annotateSelf(p as PeerRecord, selfMemberPubkey, null));
      return dr.map((p) => annotateSelf(p as PeerRecord, selfMemberPubkey, selfSessionPubkey));
    }
  } catch { /* daemon route helper not available; fall through */ }

@@ -98,6 +170,15 @@ async function listPeersForMesh(slug: string): Promise<PeerRecord[]> {
 * tell sender's own sessions from real peers. The broker has always
 * surfaced a sender's siblings as separate rows because they're separate
 * presence rows; the cli just hadn't been making that visible.
 *
 * Also normalizes the broker's `peerRole` classification: missing
 * values (older brokers) default to 'session' so legacy peer rows stay
 * visible under the default `--all=false` filter.
 *
 * And lifts `profile.role` to a top-level `role` field — the 1.31.5
 * convenience alias for JSON consumers (skill SKILL.md, launched-session
 * LLMs, jq pipelines). Same value as profile.role; distinct from
 * peerRole (presence taxonomy).
 */
function annotateSelf(
  peer: PeerRecord,
@@ -114,7 +195,15 @@ function annotateSelf(
    selfSessionPubkey &&
    peer.pubkey === selfSessionPubkey
  );
  return { ...peer, isSelf, isThisSession };
  const peerRole: PeerRole = peer.peerRole ?? "session";
  const profileRole = peer.profile?.role?.trim() || undefined;
  return {
    ...peer,
    ...(profileRole ? { role: profileRole } : {}),
    peerRole,
    isSelf,
    isThisSession,
  };
}

export async function runPeers(flags: PeersFlags): Promise<void> {
@@ -160,21 +249,41 @@ export async function runPeers(flags: PeersFlags): Promise<void> {
|
||||
 continue;
 }
 
-render.section(`peers on ${slug} (${peers.length})`);
+// Hide control-plane rows by default — they're infrastructure
+// (daemon-WS member-keyed presence), not interactive peers, and
+// they confused users into thinking the daemon counted as a
+// separate peer. --all opts back in for debugging.
+//
+// Source of truth: broker-emitted `peerRole` field (added
+// 2026-05-04). annotateSelf() filled in 'session' for older
+// brokers that don't emit peerRole yet, so this filter is
+// backwards-compatible by construction — legacy rows show up.
+const visible = flags.all
+? peers
+: peers.filter((p) => p.peerRole !== "control-plane");
 
-if (peers.length === 0) {
+// Sort: this-session first, then your-other-sessions, then real
+// peers. Within each group, idle/working ahead of dnd. Inside the
+// groups, leave broker order. The point is: when you run peer
+// list, the row that's YOU is row 1.
+const sorted = visible.slice().sort((a, b) => {
+const score = (p: PeerRecord) =>
+p.isThisSession ? 0 : p.isSelf ? 1 : 2;
+return score(a) - score(b);
+});
+
+const hiddenControlPlane = peers.length - visible.length;
+const header = hiddenControlPlane > 0
+? `peers on ${slug} (${sorted.length}, ${hiddenControlPlane} control-plane hidden — use --all)`
+: `peers on ${slug} (${sorted.length})`;
+render.section(header);
+
+if (sorted.length === 0) {
 render.info(dim(" (no peers connected)"));
 continue;
 }
 
-for (const p of peers) {
-const groups = p.groups.length
-? " [" +
-p.groups
-.map((g) => `@${g.name}${g.role ? `:${g.role}` : ""}`)
-.join(", ") +
-"]"
-: "";
+for (const p of sorted) {
 const statusDot = p.status === "working" ? yellow("●") : green("●");
 const name = bold(p.displayName);
 const meta: string[] = [];
@@ -184,15 +293,46 @@ export async function runPeers(flags: PeersFlags): Promise<void> {
|
||||
 const metaStr = meta.length ? dim(` (${meta.join(", ")})`) : "";
 const summary = p.summary ? dim(` — ${p.summary}`) : "";
 const pubkeyTag = dim(` · ${p.pubkey.slice(0, 16)}…`);
+// Short sessionId tag — appears for sibling sessions of the same
+// member that would otherwise be visually identical (same name,
+// same cwd, only the truncated pubkey on the right differs).
+const sidTag = p.sessionId
+? dim(` · sid:${p.sessionId.slice(0, 8)}`)
+: "";
 const selfTag = p.isThisSession
 ? dim(" ") + yellow("(this session)")
 : p.isSelf
 ? dim(" ") + yellow("(your other session)")
 : "";
+
+// Inline tags ("role:lead [@flexicar:reviewer, @oncall]") so the
+// first thing the user sees beside the name is the access /
+// affiliation context. Empty role + empty groups → omit the
+// bracket entirely (the dim summary line below carries the
+// explicit "(no role / no groups)" so JSON output is unaffected
+// and screen readers don't get spammed with literal "no").
+const inlineTags: string[] = [];
+const peerRole = p.profile?.role?.trim();
+if (peerRole) inlineTags.push(`role:${peerRole}`);
+if (p.groups.length) {
+inlineTags.push(
+...p.groups.map((g) => `@${g.name}${g.role ? `:${g.role}` : ""}`),
+);
+}
+const tagsStr = inlineTags.length ? " [" + inlineTags.join(", ") + "]" : "";
 
 render.info(
-`${statusDot} ${name}${selfTag}${groups}${metaStr}${pubkeyTag}${summary}`,
+`${statusDot} ${name}${selfTag}${tagsStr}${metaStr}${pubkeyTag}${sidTag}${summary}`,
 );
+
+// Second line: cwd + an explicit role/groups footer when both
+// are absent. Surfacing the absence is important — the previous
+// renderer hid it, so users couldn't tell "no role set" from
+// "the cli isn't showing roles".
+if (p.cwd) render.info(dim(` cwd: ${p.cwd}`));
+if (!peerRole && p.groups.length === 0) {
+render.info(dim(" role: (none) groups: (none)"));
+}
 }
 } catch (e) {
 render.err(`${slug}: ${e instanceof Error ? e.message : String(e)}`);
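Together, the `runPeers` changes do three things: they normalize each row, hide `control-plane` rows unless `--all` is set, and move your own sessions to the top. The sketch below isolates that pipeline under simplifying assumptions. `PeerRow`, `normalize` and `visibleSorted` are hypothetical names, and the row shape is trimmed to the fields the logic reads rather than the CLI's full `PeerRecord`.

```typescript
// Hypothetical, trimmed-down peer shape; the CLI's PeerRecord carries more fields.
type PeerRole = "control-plane" | "session" | "service";

interface PeerRow {
  pubkey: string;
  displayName: string;
  peerRole?: PeerRole;
  profile?: { role?: string };
  isSelf?: boolean;
  isThisSession?: boolean;
}

type NormalizedPeer = PeerRow & { peerRole: PeerRole; role?: string };

// Older brokers omit peerRole: default it to "session" so legacy rows
// survive the control-plane filter. Also lift a non-blank profile.role
// to a top-level `role` alias.
function normalize(peer: PeerRow): NormalizedPeer {
  const profileRole = peer.profile?.role?.trim() || undefined;
  return {
    ...peer,
    ...(profileRole ? { role: profileRole } : {}),
    peerRole: peer.peerRole ?? "session",
  };
}

// Drop control-plane rows unless showAll, then order the rest:
// this session (0), your other sessions (1), everyone else (2).
// Array.prototype.sort is stable (ES2019+), so broker order is kept
// inside each group.
function visibleSorted(peers: PeerRow[], showAll: boolean) {
  const normalized = peers.map(normalize);
  const visible = showAll
    ? normalized
    : normalized.filter((p) => p.peerRole !== "control-plane");
  const score = (p: PeerRow) => (p.isThisSession ? 0 : p.isSelf ? 1 : 2);
  const sorted = visible.slice().sort((a, b) => score(a) - score(b));
  return { sorted, hidden: normalized.length - visible.length };
}

const demo: PeerRow[] = [
  { pubkey: "aa", displayName: "daemon", peerRole: "control-plane" },
  { pubkey: "bb", displayName: "alice" },
  { pubkey: "cc", displayName: "me-elsewhere", isSelf: true },
  { pubkey: "dd", displayName: "me", isSelf: true, isThisSession: true, profile: { role: " lead " } },
];

const { sorted, hidden } = visibleSorted(demo, false);
console.log(sorted.map((p) => p.displayName).join(", "), `(${hidden} hidden)`);
// prints: me, me-elsewhere, alice (1 hidden)
```

With `showAll` set to `true`, the daemon row reappears after your own sessions. Because the sort is stable, it keeps its broker position relative to the other peers.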
@@ -47,13 +47,71 @@ export async function runSend(flags: SendFlags, to: string, message: string): Pr
|
||||
 flags.mesh ??
 (config.meshes.length === 1 ? config.meshes[0]!.slug : null);
 
+// 1.31.6: hex-prefix resolution. If `to` looks like hex but isn't a
+// full 64-char pubkey, resolve it against the peer list and replace
+// it with the matching full pubkey. The broker stores `targetSpec`
+// verbatim and the drain query at apps/broker/src/broker.ts:2408
+// matches only on full pubkeys, so a 16-hex prefix would queue
+// successfully but never fetch — sender saw "sent", recipient saw
+// nothing. Resolving here makes the CLI's prefix UX work end-to-end
+// and surfaces ambiguous / unmatched prefixes with a clear error
+// instead of a silent drop.
+if (
+!to.startsWith("@") &&
+!to.startsWith("#") &&
+to !== "*" &&
+/^[0-9a-f]{4,63}$/i.test(to)
+) {
+try {
+const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
+const peers = (await tryListPeersViaDaemon()) ?? [];
+const lower = to.toLowerCase();
+const matches = peers.filter((p) => {
+const pk = (p as { pubkey?: string }).pubkey ?? "";
+const mpk = (p as { memberPubkey?: string }).memberPubkey ?? "";
+return pk.toLowerCase().startsWith(lower) || mpk.toLowerCase().startsWith(lower);
+});
+if (matches.length === 0) {
+render.err(`No peer matches hex prefix "${to}".`);
+const names = peers
+.map((p) => (p as { displayName?: string }).displayName)
+.filter(Boolean)
+.join(", ");
+if (names) render.hint(`online: ${names}`);
+process.exit(1);
+}
+if (matches.length > 1) {
+const candidates = matches
+.map((p) => {
+const pk = (p as { pubkey?: string }).pubkey ?? "";
+const dn = (p as { displayName?: string }).displayName ?? "?";
+return `${dn} ${pk.slice(0, 16)}…`;
+})
+.join(", ");
+render.err(`Ambiguous hex prefix "${to}" — matches ${matches.length} peers.`);
+render.hint(`candidates: ${candidates}`);
+render.hint("Use a longer prefix or paste the full 64-char pubkey.");
+process.exit(1);
+}
+to = (matches[0] as { pubkey?: string }).pubkey ?? to;
+} catch {
+// Daemon unreachable — fall through; cold path will try a name
+// lookup and surface its own error if that also fails.
+}
+}
+
 // Self-DM safety check: if target is a 64-char hex that matches the
-// caller's own member pubkey (or any of the caller's session/member
-// entries), refuse without --self. Catches the common pasted-from-
-// peer-list-not-realizing-it-was-mine footgun.
-if (!flags.self && meshSlug) {
+// caller's own member pubkey, refuse without --self. Catches the
+// common pasted-from-peer-list-not-realizing-it-was-mine footgun.
+// With --self, member-pubkey targeting fans out to every connected
+// sibling session of your member (the broker's drain only matches
+// exact session pubkeys, so we resolve here in the CLI).
+if (meshSlug) {
 const joined = config.meshes.find((m) => m.slug === meshSlug);
-if (joined && /^[0-9a-f]{64}$/i.test(to) && to.toLowerCase() === joined.pubkey.toLowerCase()) {
+const isOwnMemberKey =
+joined && /^[0-9a-f]{64}$/i.test(to) && to.toLowerCase() === joined.pubkey.toLowerCase();
+
+if (isOwnMemberKey && !flags.self) {
 render.err(
 `Target "${to.slice(0, 16)}…" is your own member pubkey on mesh "${meshSlug}".`,
 );
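The 1.31.6 comment above explains why a short hex prefix has to be expanded on the client: the broker stores `targetSpec` verbatim and matches only full pubkeys. Below is a minimal sketch of that resolution step as a pure function. `resolveHexPrefix` and the `PrefixResult` union are illustrative names, not part of the CLI. The sketch returns a result instead of calling `process.exit` so that it can be tested in isolation.

```typescript
// Hypothetical input shape: only the fields the resolver reads.
interface PeerKeys {
  pubkey?: string;
  memberPubkey?: string;
  displayName?: string;
}

type PrefixResult =
  | { kind: "passthrough" }                      // not a partial hex key; send `to` unchanged
  | { kind: "resolved"; pubkey: string }         // exactly one match
  | { kind: "none" }                             // no peer matched
  | { kind: "ambiguous"; candidates: string[] }; // more than one match

function resolveHexPrefix(to: string, peers: PeerKeys[]): PrefixResult {
  // Same guard as the diff: @group, #topic, * and full 64-char keys pass through;
  // only 4..63 hex characters are treated as a prefix.
  if (to.startsWith("@") || to.startsWith("#") || to === "*" || !/^[0-9a-f]{4,63}$/i.test(to)) {
    return { kind: "passthrough" };
  }
  const lower = to.toLowerCase();
  const matches = peers.filter(
    (p) =>
      (p.pubkey ?? "").toLowerCase().startsWith(lower) ||
      (p.memberPubkey ?? "").toLowerCase().startsWith(lower),
  );
  if (matches.length === 0) return { kind: "none" };
  if (matches.length > 1) {
    return {
      kind: "ambiguous",
      candidates: matches.map((p) => `${p.displayName ?? "?"} ${(p.pubkey ?? "").slice(0, 16)}…`),
    };
  }
  // Even a member-pubkey match resolves to the row's (session) pubkey,
  // which is what the broker's drain query compares against.
  return { kind: "resolved", pubkey: matches[0]!.pubkey ?? to };
}

const alice = "abcd1" + "0".repeat(59);
const bob = "abcd2" + "1".repeat(59);
const online: PeerKeys[] = [
  { pubkey: alice, displayName: "alice" },
  { pubkey: bob, displayName: "bob" },
];

console.log(resolveHexPrefix("abcd1", online).kind); // resolved
console.log(resolveHexPrefix("abcd", online).kind);  // ambiguous
console.log(resolveHexPrefix("ffff", online).kind);  // none
console.log(resolveHexPrefix("@ops", online).kind);  // passthrough
```

Because the ambiguous case returns every candidate instead of picking one, the caller can only guess wrong if the user explicitly supplies a longer, still-matching prefix. This is the same fail-loud behaviour the diff implements with `render.err` and `process.exit(1)`.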
@@ -62,6 +120,68 @@ export async function runSend(flags: SendFlags, to: string, message: string): Pr
|
||||
 );
 process.exit(1);
 }
+
+if (isOwnMemberKey && flags.self) {
+// Member-pubkey fan-out: resolve to every connected sibling
+// session pubkey and send one message per recipient. Required
+// because the broker's drain query at apps/broker/src/broker.ts
+// matches target_spec only against full session pubkeys —
+// sending to a member pubkey would queue successfully but no
+// drain would fetch.
+try {
+const { tryListPeersViaDaemon } = await import("~/services/bridge/daemon-route.js");
+const { getSessionInfo } = await import("~/services/session/resolve.js");
+const peers = (await tryListPeersViaDaemon()) ?? [];
+const session = await getSessionInfo();
+const ownSessionPk = session?.presence?.sessionPubkey?.toLowerCase();
+const siblings = peers.filter((p) => {
+const r = p as { memberPubkey?: string; pubkey?: string; channel?: string };
+if (!r.pubkey) return false;
+if (ownSessionPk && r.pubkey.toLowerCase() === ownSessionPk) return false;
+if (r.channel === "claudemesh-daemon") return false;
+return r.memberPubkey?.toLowerCase() === to.toLowerCase();
+});
+if (siblings.length === 0) {
+render.err(`--self fan-out: no other sibling sessions of your member online.`);
+process.exit(1);
+}
+const results: Array<{ pubkey: string; ok: boolean; messageId?: string; error?: string }> = [];
+for (const peer of siblings) {
+const pk = (peer as { pubkey: string }).pubkey;
+const dr = await trySendViaDaemon({ to: pk, message, priority, expectedMesh: meshSlug ?? undefined });
+if (dr === null) {
+results.push({ pubkey: pk, ok: false, error: "daemon path unavailable" });
+continue;
+}
+if (dr.ok) {
+results.push({
+pubkey: pk,
+ok: true,
+...(dr.messageId ? { messageId: dr.messageId } : {}),
+});
+} else {
+results.push({ pubkey: pk, ok: false, error: dr.error });
+}
+}
+const okCount = results.filter((r) => r.ok).length;
+if (flags.json) {
+console.log(JSON.stringify({ ok: okCount > 0, fanout: results, via: "daemon" }));
+} else if (okCount === results.length) {
+render.ok(`fanned out to ${okCount} sibling session${okCount === 1 ? "" : "s"} (daemon)`);
+for (const r of results) render.info(dim(` → ${r.pubkey.slice(0, 16)}… ${r.messageId ? dim(r.messageId.slice(0, 8)) : ""}`));
+} else {
+render.warn(`fanned out: ${okCount}/${results.length} delivered`);
+for (const r of results) {
+const tag = r.ok ? "✔" : "✘";
+render.info(` ${tag} ${r.pubkey.slice(0, 16)}… ${r.error ? dim(`— ${r.error}`) : ""}`);
+}
+}
+return;
+} catch (e) {
+render.err(`--self fan-out failed: ${e instanceof Error ? e.message : String(e)}`);
+process.exit(1);
+}
+}
 }
 
 // Daemon path — preferred when a long-lived daemon is local. UDS at
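The `--self` branch above works around the same drain limitation for member pubkeys. It expands your member key into the session pubkeys of your other live sessions and sends one message to each. The following sketch separates the selection rule from the send loop. `pickSiblings`, `fanOut`, `summarize` and the injected `send` callback are hypothetical. In the diff, the callback's role is played by `trySendViaDaemon`, whose full signature is visible here only through its call site.

```typescript
// Hypothetical, minimal shapes mirroring the fields the diff reads.
interface PeerRow {
  pubkey?: string;
  memberPubkey?: string;
  channel?: string;
}

interface FanoutResult {
  pubkey: string;
  ok: boolean;
  messageId?: string;
  error?: string;
}

// null models "daemon path unavailable", as in the diff.
type SendFn = (to: string) => Promise<{ ok: true; messageId?: string } | { ok: false; error: string } | null>;

// Siblings share your member pubkey, excluding the calling session itself
// and the daemon's member-keyed row (channel "claudemesh-daemon").
function pickSiblings(peers: PeerRow[], memberPubkey: string, ownSessionPubkey?: string): string[] {
  const member = memberPubkey.toLowerCase();
  const own = ownSessionPubkey?.toLowerCase();
  const out: string[] = [];
  for (const p of peers) {
    if (!p.pubkey) continue;
    if (own && p.pubkey.toLowerCase() === own) continue;
    if (p.channel === "claudemesh-daemon") continue;
    if (p.memberPubkey?.toLowerCase() === member) out.push(p.pubkey);
  }
  return out;
}

// One message per recipient, sent sequentially; failures are recorded, not thrown.
async function fanOut(siblings: string[], send: SendFn): Promise<FanoutResult[]> {
  const results: FanoutResult[] = [];
  for (const pubkey of siblings) {
    const r = await send(pubkey);
    if (r === null) results.push({ pubkey, ok: false, error: "daemon path unavailable" });
    else if (r.ok) results.push({ pubkey, ok: true, ...(r.messageId ? { messageId: r.messageId } : {}) });
    else results.push({ pubkey, ok: false, error: r.error });
  }
  return results;
}

// Same rule as the diff's JSON output: overall ok if at least one send landed.
function summarize(results: FanoutResult[]) {
  const okCount = results.filter((r) => r.ok).length;
  return { ok: okCount > 0, okCount, total: results.length };
}

const peers: PeerRow[] = [
  { pubkey: "S1", memberPubkey: "M1" },                               // this session
  { pubkey: "s2", memberPubkey: "m1" },                               // sibling
  { pubkey: "d0", memberPubkey: "m1", channel: "claudemesh-daemon" }, // daemon row
  { pubkey: "x9", memberPubkey: "m2" },                               // another member
];
const siblings = pickSiblings(peers, "m1", "s1");
console.log(siblings); // [ 's2' ]

fanOut(siblings, async (to) => ({ ok: true as const, messageId: `msg-${to}` }))
  .then((results) => console.log(summarize(results)));
```

Sending sequentially, as the diff's `for ... await` loop does, keeps the per-recipient report in a predictable order. A `Promise.all` variant would finish sooner with many siblings, but the report order would then depend on how results are collected.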
@@ -1,25 +1,51 @@
|
||||
 import { whoAmI } from "~/services/auth/facade.js";
+import { getSessionInfo } from "~/services/session/resolve.js";
 import { render } from "~/ui/render.js";
-import { bold, clay, dim } from "~/ui/styles.js";
+import { bold, clay, dim, yellow } from "~/ui/styles.js";
 import { EXIT } from "~/constants/exit-codes.js";
 
 export async function whoami(opts: { json?: boolean }): Promise<number> {
 const result = await whoAmI();
+// 1.32.0+: surface the calling session's identity when whoami is run
+// from inside a `claudemesh launch`-spawned shell. Previously the
+// command only reported web sign-in + local mesh memberships, and a
+// launched session had to dig env vars + parse config.json to figure
+// out its own session pubkey.
+const session = await getSessionInfo();
 
 if (opts.json) {
-console.log(JSON.stringify({ schema_version: "1.0", ...result }, null, 2));
-return result.signed_in || result.local ? EXIT.SUCCESS : EXIT.AUTH_FAILED;
+console.log(JSON.stringify({ schema_version: "1.0", ...result, session }, null, 2));
+return result.signed_in || result.local || session ? EXIT.SUCCESS : EXIT.AUTH_FAILED;
 }
 
-// Show whatever we have. Both the web session and the local mesh
-// config are independent surfaces of identity; suppress sections that
-// are empty.
-if (!result.signed_in && !result.local) {
+// Show whatever we have. Web session, local mesh config, and the
+// launched-session identity are three independent surfaces.
+if (!result.signed_in && !result.local && !session) {
 render.err("Not signed in", "Run `claudemesh login` to sign in or `claudemesh <invite>` to join.");
 return EXIT.AUTH_FAILED;
 }
 
 render.section("whoami");
+
+if (session) {
+const sessionPk = session.presence?.sessionPubkey;
+const groups = (session.groups ?? []).join(", ") || dim("(none)");
+render.kv([
+["this session", `${yellow(session.displayName)} on ${bold(session.mesh)}`],
+["session id", dim(session.sessionId)],
+...(sessionPk
+? [["session pubkey", dim(`${sessionPk.slice(0, 16)}… (full: ${sessionPk})`)] as [string, string]]
+: []),
+...(session.role
+? [["role", session.role] as [string, string]]
+: []),
+["groups", groups],
+...(session.cwd ? [["cwd", dim(session.cwd)] as [string, string]] : []),
+["pid", String(session.pid)],
+]);
+render.blank();
+}
+
 if (result.signed_in) {
 render.kv([
 ["user", `${bold(result.user!.display_name)} ${dim(`(${result.user!.email})`)}`],
@@ -1,10 +1,82 @@
|
||||
+import { existsSync } from "node:fs";
 import { homedir } from "node:os";
 import { join } from "node:path";
 
 const home = homedir();
+const DEFAULT_CONFIG_DIR = join(home, ".claudemesh");
 
+/**
+* Resolve `CONFIG_DIR` once, with stale-env detection.
+*
+* `claudemesh launch` exposes `CLAUDEMESH_CONFIG_DIR=<tmpdir>` to its
+* spawned `claude` so the per-session mesh selection is isolated from
+* `~/.claudemesh/config.json`. The tmpdir is rmSync'd on launch exit.
+*
+* Footgun: if a `claudemesh` invocation INHERITS that env from an
+* already-launched (or previously-launched) session — e.g. a Bash tool
+* call inside Claude Code, or a tmux pane that captured the env via
+* `update-environment` — the inherited path may point at a tmpdir that
+* no longer exists. Pre-1.34.14 we silently used the dead path,
+* `readConfig()` came back empty, and the user saw "No meshes joined"
+* from an otherwise-working install.
+*
+* Resolution rules:
+* 1. No env var → `~/.claudemesh` (default).
+* 2. Env points at a dir containing `config.json` → trust it
+* (the legitimate per-session-launch case).
+* 3. Env set but stale (dir missing or no `config.json`) → warn
+* once on stderr (TTY-only) and fall back to `~/.claudemesh`.
+*
+* Memoized: resolves once on first access. Mid-process env mutations
+* are intentionally ignored — paths must stay stable across one CLI
+* invocation.
+*/
+let _resolvedConfigDir: string | null = null;
+let _warnedStaleEnv = false;
+
+function resolveConfigDir(): string {
+if (_resolvedConfigDir !== null) return _resolvedConfigDir;
+const envDir = process.env.CLAUDEMESH_CONFIG_DIR;
+if (!envDir) {
+_resolvedConfigDir = DEFAULT_CONFIG_DIR;
+return DEFAULT_CONFIG_DIR;
+}
+// Trust the env when it resolves to a real directory. We check
+// the DIR (not `config.json`) because the legitimate "fresh launch
+// before any write" case has the dir but no config.json yet.
+// The stale signature we want to catch is `rmSync(tmpDir,
+// {recursive: true})` from the outer launch's cleanup — that
+// removes the directory entirely, so a missing dir is the
+// unambiguous "stale" signal.
+if (existsSync(envDir)) {
+_resolvedConfigDir = envDir;
+return envDir;
+}
+// Stale: env set but the dir is gone. Most likely the outer
+// launch's cleanup ran and we inherited its (now-dead) tmpdir
+// path. Fall back to default and warn the user once on stderr —
+// only when attached to a TTY, so non-interactive callers (CI,
+// MCP boot, scripts piping stdout) stay quiet.
+if (!_warnedStaleEnv && process.stderr.isTTY) {
+_warnedStaleEnv = true;
+const unsetHint =
+process.env.SHELL?.endsWith("fish")
+? "set -e CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE"
+: "unset CLAUDEMESH_CONFIG_DIR CLAUDEMESH_IPC_TOKEN_FILE";
+process.stderr.write(
+`claudemesh: ignoring stale CLAUDEMESH_CONFIG_DIR=${envDir} (no config.json there); using ${DEFAULT_CONFIG_DIR}.\n`
++ ` Hint: this is usually a leftover env from a previous \`claudemesh launch\`. Clean it with:\n`
++ ` ${unsetHint}\n`,
+);
+}
+_resolvedConfigDir = DEFAULT_CONFIG_DIR;
+return DEFAULT_CONFIG_DIR;
+}
+
 export const PATHS = {
-CONFIG_DIR: process.env.CLAUDEMESH_CONFIG_DIR || join(home, ".claudemesh"),
+get CONFIG_DIR() {
+return resolveConfigDir();
+},
 get CONFIG_FILE() {
 return join(this.CONFIG_DIR, "config.json");
 },
@@ -20,3 +92,12 @@ export const PATHS = {
|
||||
 CLAUDE_JSON: join(home, ".claude.json"),
 CLAUDE_SETTINGS: join(home, ".claude", "settings.json"),
 } as const;
+
+/**
+* Test-only: reset the memoized resolution. Not exported from the
+* package barrel; reach in via the relative path from a test file.
+*/
+export function _resetPathsForTest(): void {
+_resolvedConfigDir = null;
+_warnedStaleEnv = false;
+}
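`resolveConfigDir()` above keeps its memo in module state and exposes a test-only reset hook. The same three rules can be written as a factory with injected dependencies. That structure makes the stale-env case testable without touching the real filesystem or `process.env`. This is an alternative structure shown for illustration, not what the diff ships, and `makeConfigDirResolver` and `ResolverDeps` are hypothetical names.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

interface ResolverDeps {
  env: Record<string, string | undefined>;
  dirExists: (path: string) => boolean;
  warn: (message: string) => void;
}

// Rules, as in the diff:
//   1. CLAUDEMESH_CONFIG_DIR unset     -> default dir
//   2. env points at an existing dir   -> trust it (per-launch tmpdir)
//   3. env set but the dir is gone     -> warn, fall back to default
// The result is memoized, so the warning can fire at most once per resolver.
function makeConfigDirResolver(defaultDir: string, deps: ResolverDeps): () => string {
  let resolved: string | null = null;
  return () => {
    if (resolved !== null) return resolved;
    const envDir = deps.env.CLAUDEMESH_CONFIG_DIR;
    if (!envDir) {
      resolved = defaultDir;
    } else if (deps.dirExists(envDir)) {
      resolved = envDir;
    } else {
      deps.warn(`ignoring stale CLAUDEMESH_CONFIG_DIR=${envDir}; using ${defaultDir}`);
      resolved = defaultDir;
    }
    return resolved;
  };
}

// Production wiring: real env and filesystem, warning only on an interactive stderr.
const resolveConfigDir = makeConfigDirResolver(join(homedir(), ".claudemesh"), {
  env: process.env,
  dirExists: existsSync,
  warn: (m) => {
    if (process.stderr.isTTY) process.stderr.write(`claudemesh: ${m}\n`);
  },
});

// Test wiring: a launch tmpdir that has already been removed.
const warnings: string[] = [];
const staleResolver = makeConfigDirResolver("/home/u/.claudemesh", {
  env: { CLAUDEMESH_CONFIG_DIR: "/tmp/claudemesh-launch-gone" },
  dirExists: () => false,
  warn: (m) => warnings.push(m),
});

console.log(staleResolver(), staleResolver(), warnings.length);
// prints: /home/u/.claudemesh /home/u/.claudemesh 1
```

Passing the existence check through `dirExists` also makes the inline comment's design choice explicit: the directory, not `config.json`, is the stale signal, because a fresh launch may not have written its config yet. Since each test builds its own resolver, no reset hook is needed.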
@@ -7,13 +7,19 @@
|
||||
 // - Wire envelope adds `client_message_id` (broker may ignore in legacy
 // mode; Sprint 7 promotes it to authoritative dedupe).
 // - Reconnect with exponential backoff, signaled to the drain worker.
-
-import WebSocket from "ws";
+//
+// 2026-05-04: lifecycle (connect / hello-ack / close-reconnect) now
+// lives in `ws-lifecycle.ts`. This class supplies the daemon-WS hello
+// content and routes incoming RPC replies / pushes; the helper handles
+// the rest. The hello no longer carries an ephemeral `sessionPubkey` —
+// session-targeted DMs land on the per-session WS (SessionBrokerClient)
+// since 1.32.1, so this socket only needs the member identity.
 
 import type { JoinedMesh } from "~/services/config/facade.js";
 import { signHello } from "~/services/broker/hello-sig.js";
+import { connectWsWithBackoff, type WsLifecycle, type WsStatus } from "./ws-lifecycle.js";
 
-export type ConnStatus = "connecting" | "open" | "closed" | "reconnecting";
+export type ConnStatus = WsStatus;
 
 export interface BrokerSendArgs {
 /** Target as the broker expects it: peer name | pubkey | @group | * | topic. */
@@ -49,6 +55,8 @@ export interface PeerSummary {
|
||||
 hostname?: string;
 peerType?: string;
 channel?: string;
+/** Broker-side classification, added 2026-05-04. Missing in older brokers. */
+role?: "control-plane" | "session" | "service";
 }
 
 interface PendingPeerList {
@@ -84,9 +92,7 @@ export interface MemoryRow {
|
||||
 rememberedAt: string;
 }
 
-const HELLO_ACK_TIMEOUT_MS = 5_000;
-const SEND_ACK_TIMEOUT_MS = 15_000;
-const BACKOFF_CAPS_MS = [1_000, 2_000, 4_000, 8_000, 16_000, 30_000];
+const SEND_ACK_TIMEOUT_MS = 15_000;
 
 export interface DaemonBrokerOptions {
 displayName?: string;
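This hunk removes `HELLO_ACK_TIMEOUT_MS` and the `BACKOFF_CAPS_MS` table from the client, because connect and reconnect handling now live in `ws-lifecycle.ts`, which is not part of this diff. In the removed `close` handler, each reconnect delay came from that table. The table was indexed by an attempt counter that the `hello_ack` handler reset to zero. A standalone sketch of that schedule follows. `ReconnectSchedule` is a hypothetical name, and it is an assumption that the new helper keeps the same table.

```typescript
// Delay table copied from the removed constant: doubles up to a 30 s ceiling.
const BACKOFF_CAPS_MS = [1_000, 2_000, 4_000, 8_000, 16_000, 30_000] as const;

class ReconnectSchedule {
  private attempt = 0;

  // Delay before the next reconnect; attempts past the end of the table
  // keep using the last (largest) entry, as the old close handler did.
  next(): number {
    const idx = Math.min(this.attempt, BACKOFF_CAPS_MS.length - 1);
    this.attempt++;
    return BACKOFF_CAPS_MS[idx] ?? 30_000;
  }

  // Call once the broker has accepted the hello (the old code did this on hello_ack).
  reset(): void {
    this.attempt = 0;
  }
}

const schedule = new ReconnectSchedule();
const delays = Array.from({ length: 8 }, () => schedule.next());
console.log(delays.join(", "));
// prints: 1000, 2000, 4000, 8000, 16000, 30000, 30000, 30000
schedule.reset();
console.log(schedule.next()); // 1000
```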
@@ -96,12 +102,9 @@ export interface DaemonBrokerOptions {
|
||||
 }
 
 export class DaemonBrokerClient {
-private ws: WebSocket | null = null;
+private lifecycle: WsLifecycle | null = null;
 private _status: ConnStatus = "closed";
 private closed = false;
-private reconnectAttempt = 0;
-private reconnectTimer: NodeJS.Timeout | null = null;
-private helloTimer: NodeJS.Timeout | null = null;
 private pendingAcks = new Map<string, PendingAck>();
 private peerListResolvers = new Map<string, PendingPeerList>();
 private skillListResolvers = new Map<string, { resolve: (rows: SkillSummary[]) => void; timer: NodeJS.Timeout }>();
@@ -110,8 +113,6 @@ export class DaemonBrokerClient {
|
||||
 private stateListResolvers = new Map<string, { resolve: (rows: StateRow[]) => void; timer: NodeJS.Timeout }>();
 private memoryStoreResolvers = new Map<string, { resolve: (id: string | null) => void; timer: NodeJS.Timeout }>();
 private memoryRecallResolvers = new Map<string, { resolve: (rows: MemoryRow[]) => void; timer: NodeJS.Timeout }>();
-private sessionPubkey: string | null = null;
-private sessionSecretKey: string | null = null;
 private opens: Array<() => void> = [];
 private reqCounter = 0;
 
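The resolver maps declared on `DaemonBrokerClient` (`peerListResolvers`, `stateListResolvers`, `memoryRecallResolvers` and the rest) all follow one pattern. Each stores a `resolve` callback and a timer under a request id, and settles the entry when a reply carrying the matching `_reqId` arrives. The request side is outside this diff, so the sketch below generalizes the pattern under stated assumptions. `PendingRequests` is a hypothetical helper. Resolving with a fallback value on timeout, rather than rejecting, is also an assumption, modeled on how the reply handlers coerce malformed payloads to `[]` or `null`.

```typescript
interface Pending<T> {
  resolve: (value: T) => void;
  timer: ReturnType<typeof setTimeout>;
}

class PendingRequests<T> {
  private readonly pending = new Map<string, Pending<T>>();
  private counter = 0;

  // Allocate a request id and a promise that settles on reply or timeout.
  register(timeoutMs: number, fallback: T): { reqId: string; promise: Promise<T> } {
    const reqId = `req-${++this.counter}`;
    const promise = new Promise<T>((resolve) => {
      // The executor runs synchronously, so the entry exists before register() returns.
      const timer = setTimeout(() => {
        this.pending.delete(reqId);
        resolve(fallback);
      }, timeoutMs);
      this.pending.set(reqId, { resolve, timer });
    });
    return { reqId, promise };
  }

  // Called from the message handler; returns false for late or unknown replies.
  settle(reqId: string, value: T): boolean {
    const entry = this.pending.get(reqId);
    if (!entry) return false;
    this.pending.delete(reqId);
    clearTimeout(entry.timer);
    entry.resolve(value);
    return true;
  }

  get size(): number {
    return this.pending.size;
  }
}

// Example: a peer-list request answered by a { type: "peers_list", _reqId, peers } reply.
const peerLists = new PendingRequests<string[]>();
const { reqId, promise } = peerLists.register(5_000, []);

const reply: Record<string, unknown> = { type: "peers_list", _reqId: reqId, peers: ["alice", "bob"] };
const handled = peerLists.settle(
  String(reply._reqId ?? ""),
  Array.isArray(reply.peers) ? (reply.peers as string[]) : [],
);

promise.then((rows) => console.log(handled, rows)); // true [ 'alice', 'bob' ]
```

Deleting the entry before resolving is what makes duplicate or late replies harmless: the second `settle` finds nothing and returns `false`, the same way the diff's handlers skip work when `get(reqId)` misses.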
@@ -125,198 +126,182 @@ export class DaemonBrokerClient {
|
||||
 (this.opts.log ?? defaultLog)(level, msg, { mesh: this.mesh.slug, ...meta });
 };
 
-private setConnStatus(s: ConnStatus) {
-if (this._status === s) return;
-this._status = s;
-this.opts.onStatusChange?.(s);
-}
-
 /** Open the WS, run the hello handshake, resolve once the broker accepts. */
 async connect(): Promise<void> {
 if (this.closed) throw new Error("client_closed");
 if (this._status === "connecting" || this._status === "open") return;
-this.setConnStatus("connecting");
-
-const ws = new WebSocket(this.mesh.brokerUrl);
-this.ws = ws;
-
-return new Promise<void>((resolve, reject) => {
-ws.on("open", async () => {
-try {
-if (!this.sessionPubkey) {
-const { generateKeypair } = await import("~/services/crypto/facade.js");
-const kp = await generateKeypair();
-this.sessionPubkey = kp.publicKey;
-this.sessionSecretKey = kp.secretKey;
-}
-const { timestamp, signature } = await signHello(
-this.mesh.meshId, this.mesh.memberId, this.mesh.pubkey, this.mesh.secretKey,
-);
-ws.send(JSON.stringify({
-type: "hello",
-meshId: this.mesh.meshId,
-memberId: this.mesh.memberId,
-pubkey: this.mesh.pubkey,
-sessionPubkey: this.sessionPubkey,
-displayName: this.opts.displayName,
-sessionId: `daemon-${process.pid}`,
-pid: process.pid,
-cwd: process.cwd(),
-hostname: require("node:os").hostname(),
-peerType: "ai" as const,
-channel: "claudemesh-daemon",
-timestamp,
-signature,
-}));
-this.helloTimer = setTimeout(() => {
-this.log("warn", "broker_hello_ack_timeout");
-try { ws.close(); } catch { /* ignore */ }
-reject(new Error("hello_ack_timeout"));
-}, HELLO_ACK_TIMEOUT_MS);
-} catch (e) {
-reject(e instanceof Error ? e : new Error(String(e)));
-}
-});
-
-ws.on("message", (raw) => {
-let msg: Record<string, unknown>;
-try { msg = JSON.parse(raw.toString()) as Record<string, unknown>; }
-catch { return; }
-
-if (msg.type === "hello_ack") {
-if (this.helloTimer) clearTimeout(this.helloTimer);
-this.helloTimer = null;
-this.setConnStatus("open");
-this.reconnectAttempt = 0;
-// Flush deferred openers (drain worker, etc.)
+this.lifecycle = await connectWsWithBackoff({
+url: this.mesh.brokerUrl,
+buildHello: async () => {
+const { timestamp, signature } = await signHello(
+this.mesh.meshId, this.mesh.memberId, this.mesh.pubkey, this.mesh.secretKey,
+);
+return {
+type: "hello",
+meshId: this.mesh.meshId,
+memberId: this.mesh.memberId,
+pubkey: this.mesh.pubkey,
+// No `sessionPubkey` — daemon-WS is member-keyed only. The
+// per-session presence WS (SessionBrokerClient) carries the
+// ephemeral session pubkey. Spec §"Layer 1: Identity → Member identity".
+displayName: this.opts.displayName,
+sessionId: `daemon-${process.pid}`,
+pid: process.pid,
+cwd: process.cwd(),
+hostname: require("node:os").hostname(),
+peerType: "ai" as const,
+channel: "claudemesh-daemon",
+timestamp,
+signature,
+};
+},
+isHelloAck: (msg) => msg.type === "hello_ack",
+onMessage: (msg) => this.handleMessage(msg),
+onStatusChange: (s) => {
+this._status = s;
+this.opts.onStatusChange?.(s);
+if (s === "open") {
+// Flush deferred openers (drain worker, etc.).
 const queued = this.opens.slice();
 this.opens.length = 0;
-for (const fn of queued) { try { fn(); } catch (e) { this.log("warn", "open_handler_failed", { err: String(e) }); } }
-resolve();
-return;
-}
-
-if (msg.type === "ack") {
-// Broker shape: { type: "ack", id, messageId, queued, error? }
-const id = String(msg.id ?? "");
-const ack = this.pendingAcks.get(id);
-if (ack) {
-this.pendingAcks.delete(id);
-clearTimeout(ack.timer);
-if (typeof msg.error === "string" && msg.error.length > 0) {
-ack.resolve({ ok: false, error: msg.error, permanent: classifyPermanent(msg.error) });
-} else {
-ack.resolve({ ok: true, messageId: String(msg.messageId ?? id) });
-}
+for (const fn of queued) {
+try { fn(); } catch (e) { this.log("warn", "open_handler_failed", { err: String(e) }); }
 }
-return;
 }
-
-if (msg.type === "peers_list") {
-const reqId = String(msg._reqId ?? "");
-const pending = this.peerListResolvers.get(reqId);
-if (pending) {
-this.peerListResolvers.delete(reqId);
-clearTimeout(pending.timer);
-pending.resolve(Array.isArray(msg.peers) ? (msg.peers as PeerSummary[]) : []);
-}
-return;
-}
-
-if (msg.type === "skill_list") {
-const reqId = String(msg._reqId ?? "");
-const pending = this.skillListResolvers.get(reqId);
-if (pending) {
-this.skillListResolvers.delete(reqId);
-clearTimeout(pending.timer);
-pending.resolve(Array.isArray(msg.skills) ? (msg.skills as SkillSummary[]) : []);
-}
-return;
-}
-
-if (msg.type === "skill_data") {
-const reqId = String(msg._reqId ?? "");
-const pending = this.skillDataResolvers.get(reqId);
-if (pending) {
-this.skillDataResolvers.delete(reqId);
-clearTimeout(pending.timer);
-pending.resolve((msg.skill as SkillFull) ?? null);
-}
-return;
-}
-
-if (msg.type === "state_value" || msg.type === "state_data") {
-const reqId = String(msg._reqId ?? "");
-const pending = this.stateGetResolvers.get(reqId);
-if (pending) {
-this.stateGetResolvers.delete(reqId);
-clearTimeout(pending.timer);
-pending.resolve((msg.state ?? msg.row ?? null) as StateRow | null);
-}
-return;
-}
-
-if (msg.type === "state_list") {
-const reqId = String(msg._reqId ?? "");
-const pending = this.stateListResolvers.get(reqId);
-if (pending) {
-this.stateListResolvers.delete(reqId);
-clearTimeout(pending.timer);
-pending.resolve(Array.isArray(msg.entries) ? (msg.entries as StateRow[]) : []);
-}
-return;
-}
-
-if (msg.type === "memory_stored") {
-const reqId = String(msg._reqId ?? "");
-const pending = this.memoryStoreResolvers.get(reqId);
-if (pending) {
-this.memoryStoreResolvers.delete(reqId);
-clearTimeout(pending.timer);
-pending.resolve(typeof msg.memoryId === "string" ? msg.memoryId : null);
-}
-return;
-}
-
-if (msg.type === "memory_recall_result") {
-const reqId = String(msg._reqId ?? "");
-const pending = this.memoryRecallResolvers.get(reqId);
-if (pending) {
-this.memoryRecallResolvers.delete(reqId);
-clearTimeout(pending.timer);
-pending.resolve(Array.isArray(msg.matches) ? (msg.matches as MemoryRow[]) : []);
-}
-return;
-}
-
-if (msg.type === "push" || msg.type === "inbound") {
-this.opts.onPush?.(msg);
-return;
-}
-});
-
-ws.on("close", (code, reason) => {
-if (this.helloTimer) { clearTimeout(this.helloTimer); this.helloTimer = null; }
-this.failPendingAcks(`broker_disconnected_${code}`);
-if (this.closed) { this.setConnStatus("closed"); return; }
-this.setConnStatus("reconnecting");
-const wait = BACKOFF_CAPS_MS[Math.min(this.reconnectAttempt, BACKOFF_CAPS_MS.length - 1)] ?? 30_000;
-this.reconnectAttempt++;
-this.log("info", "broker_reconnect_scheduled", { wait_ms: wait, code, reason: reason.toString("utf8") });
-this.reconnectTimer = setTimeout(() => this.connect().catch((err) => this.log("warn", "broker_reconnect_failed", { err: String(err) })), wait);
-// First connection failure also rejects the original connect() promise.
-if (this._status === "connecting") reject(new Error(`closed_before_hello_${code}`));
-});
-
-ws.on("error", (err) => this.log("warn", "broker_ws_error", { err: err.message }));
+},
+onBeforeReconnect: (code) => this.failPendingAcks(`broker_disconnected_${code}`),
+log: (level, msg, meta) => this.log(level, `broker_${msg}`, meta),
 });
 }
 
+private handleMessage(msg: Record<string, unknown>): void {
+if (msg.type === "ack") {
+// Broker shape: { type: "ack", id, messageId, queued, error? }
+const id = String(msg.id ?? "");
+const ack = this.pendingAcks.get(id);
+if (ack) {
+this.pendingAcks.delete(id);
+clearTimeout(ack.timer);
+if (typeof msg.error === "string" && msg.error.length > 0) {
+ack.resolve({ ok: false, error: msg.error, permanent: classifyPermanent(msg.error) });
+} else {
+ack.resolve({ ok: true, messageId: String(msg.messageId ?? id) });
+}
+}
+return;
+}
+
+if (msg.type === "peers_list") {
+const reqId = String(msg._reqId ?? "");
+const pending = this.peerListResolvers.get(reqId);
+if (pending) {
+this.peerListResolvers.delete(reqId);
+clearTimeout(pending.timer);
+pending.resolve(Array.isArray(msg.peers) ? (msg.peers as PeerSummary[]) : []);
+}
+return;
+}
+
+if (msg.type === "skill_list") {
+const reqId = String(msg._reqId ?? "");
+const pending = this.skillListResolvers.get(reqId);
+if (pending) {
+this.skillListResolvers.delete(reqId);
+clearTimeout(pending.timer);
+pending.resolve(Array.isArray(msg.skills) ? (msg.skills as SkillSummary[]) : []);
+}
+return;
+}
+
+if (msg.type === "skill_data") {
+const reqId = String(msg._reqId ?? "");
+const pending = this.skillDataResolvers.get(reqId);
+if (pending) {
+this.skillDataResolvers.delete(reqId);
+clearTimeout(pending.timer);
+pending.resolve((msg.skill as SkillFull) ?? null);
+}
+return;
+}
+
+if (msg.type === "state_value" || msg.type === "state_data") {
+const reqId = String(msg._reqId ?? "");
+const pending = this.stateGetResolvers.get(reqId);
+if (pending) {
+this.stateGetResolvers.delete(reqId);
+clearTimeout(pending.timer);
+pending.resolve((msg.state ?? msg.row ?? null) as StateRow | null);
+}
+return;
+}
+
+if (msg.type === "state_list") {
+const reqId = String(msg._reqId ?? "");
+const pending = this.stateListResolvers.get(reqId);
+if (pending) {
+this.stateListResolvers.delete(reqId);
+clearTimeout(pending.timer);
+pending.resolve(Array.isArray(msg.entries) ? (msg.entries as StateRow[]) : []);
+}
+return;
+}
+
+if (msg.type === "memory_stored") {
+const reqId = String(msg._reqId ?? "");
+const pending = this.memoryStoreResolvers.get(reqId);
+if (pending) {
+this.memoryStoreResolvers.delete(reqId);
+clearTimeout(pending.timer);
+pending.resolve(typeof msg.memoryId === "string" ? msg.memoryId : null);
+}
+return;
+}
+
+if (msg.type === "memory_recall_result") {
+const reqId = String(msg._reqId ?? "");
+const pending = this.memoryRecallResolvers.get(reqId);
+if (pending) {
+this.memoryRecallResolvers.delete(reqId);
+clearTimeout(pending.timer);
+pending.resolve(Array.isArray(msg.matches) ? (msg.matches as MemoryRow[]) : []);
+}
+return;
+}
+
+if (msg.type === "push" || msg.type === "inbound") {
+this.opts.onPush?.(msg);
|
||||
return;
|
||||
}
|
||||
}
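Every reply branch in `handleMessage` above follows the same four steps: look up the resolver by `_reqId`, delete it, clear its timer, and resolve with a coerced payload. A table-driven dispatcher is one way to collapse those branches. The following is a minimal sketch, not the daemon's code. `ReplyDispatcher`, `on`, and `Pending` are hypothetical names.

```typescript
// Hypothetical table-driven variant of the reply branches in handleMessage.
// Each broker reply `type` maps to the resolver table that owns its _reqId
// and to a function that coerces the raw frame into the caller's result type.
type Pending<T> = { resolve: (value: T) => void; timer: ReturnType<typeof setTimeout> };

interface ReplyRoute<T> {
  table: Map<string, Pending<T>>;
  extract: (msg: Record<string, unknown>) => T;
}

class ReplyDispatcher {
  private routes = new Map<string, ReplyRoute<any>>();

  /** Bind a broker reply `type` to a resolver table. */
  on<T>(type: string, table: Map<string, Pending<T>>, extract: (msg: Record<string, unknown>) => T): void {
    this.routes.set(type, { table, extract });
  }

  /** Returns true when the frame is a correlated reply, matched or stale. */
  handle(msg: Record<string, unknown>): boolean {
    const route = this.routes.get(String(msg.type ?? ""));
    if (!route) return false;
    const reqId = String(msg._reqId ?? "");
    const pending = route.table.get(reqId);
    if (pending) {
      // Delete first so a late timeout callback finds nothing to resolve.
      route.table.delete(reqId);
      clearTimeout(pending.timer);
      pending.resolve(route.extract(msg));
    }
    return true;
  }
}
```

Aliased reply types such as `state_value` / `state_data` can be registered twice against the same table. A frame that matches a route but carries an unknown `_reqId` (a reply that arrived after its timeout) is still reported as handled, so it does not fall through to the push path.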

/** True when underlying socket is OPEN-ready for direct sends. */
private isOpen(): boolean {
const sock = this.lifecycle?.ws;
return !!sock && sock.readyState === sock.OPEN;
}

/** v2 agentic-comms (M1): send `client_ack` back to the broker after
* successfully landing an inbound push in inbox.db. Broker uses the
* ack to set `delivered_at` (atomic at-least-once). Best-effort —
* if the WS isn't open, drop the ack; broker's 30s lease will
* re-deliver. */
sendClientAck(clientMessageId: string, brokerMessageId: string | null): void {
if (!this.isOpen()) return;
try {
this.lifecycle!.send({
type: "client_ack",
clientMessageId,
...(brokerMessageId ? { brokerMessageId } : {}),
});
} catch { /* drop; lease re-delivers */ }
}
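The `sendClientAck` comment implies an ordering on the receive side. The push is persisted to inbox.db first and acked second, so a crash between the two produces a re-delivery rather than a loss. A minimal in-memory sketch of that ordering follows. It assumes, though the diff does not show this, that duplicates arriving through lease re-delivery are acked as well. `MemoryInbox` and `handleInbound` are hypothetical stand-ins for `insertIfNew` and the inbound handler.

```typescript
// Hypothetical receive path pairing idempotent storage with client_ack.
interface InboundFrame {
  clientMessageId: string;
  brokerMessageId: string | null;
  body: string;
}

class MemoryInbox {
  private rows = new Map<string, InboundFrame>();

  /** Stand-in for insertIfNew: true only on first sight of clientMessageId. */
  insertIfNew(frame: InboundFrame): boolean {
    if (this.rows.has(frame.clientMessageId)) return false;
    this.rows.set(frame.clientMessageId, frame);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}

type AckSender = (clientMessageId: string, brokerMessageId: string | null) => void;

/** Persist first, then ack. Duplicates are acked too (assumption); otherwise
 *  the broker's lease would keep re-delivering a row we already hold. */
function handleInbound(inbox: MemoryInbox, frame: InboundFrame, sendAck: AckSender): boolean {
  const fresh = inbox.insertIfNew(frame);
  sendAck(frame.clientMessageId, frame.brokerMessageId);
  return fresh; // caller surfaces a notification only for fresh rows
}
```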

/** Send one outbox row. Resolves on broker ack/timeout. */
send(req: BrokerSendArgs): Promise<BrokerSendResult> {
return new Promise<BrokerSendResult>((resolve) => {
const dispatch = () => {
if (!this.ws || this.ws.readyState !== this.ws.OPEN) {
if (!this.isOpen()) {
resolve({ ok: false, error: "broker_not_open", permanent: false });
return;
}
@@ -328,7 +313,7 @@ export class DaemonBrokerClient {
}, SEND_ACK_TIMEOUT_MS);
this.pendingAcks.set(id, { resolve, timer });
try {
this.ws.send(JSON.stringify({
this.lifecycle!.send({
type: "send",
id, // legacy correlation id
client_message_id: id, // forward-compat per spec §4.2
@@ -337,7 +322,7 @@ export class DaemonBrokerClient {
priority: req.priority,
nonce: req.nonce,
ciphertext: req.ciphertext,
}));
});
} catch (e) {
this.pendingAcks.delete(id);
clearTimeout(timer);
@@ -352,153 +337,149 @@ export class DaemonBrokerClient {

/** Ask the broker for the current peer list. */
async listPeers(timeoutMs = 5_000): Promise<PeerSummary[]> {
if (this._status !== "open" || !this.ws) return [];
if (this._status !== "open" || !this.lifecycle) return [];
return new Promise<PeerSummary[]>((resolve) => {
const reqId = `pl-${++this.reqCounter}`;
const timer = setTimeout(() => {
if (this.peerListResolvers.delete(reqId)) resolve([]);
}, timeoutMs);
this.peerListResolvers.set(reqId, { resolve, timer });
try { this.ws!.send(JSON.stringify({ type: "list_peers", _reqId: reqId })); }
try { this.lifecycle!.send({ type: "list_peers", _reqId: reqId }); }
catch { this.peerListResolvers.delete(reqId); clearTimeout(timer); resolve([]); }
});
}

/** List mesh-published skills. Empty array on disconnect / timeout. */
async listSkills(query?: string, timeoutMs = 5_000): Promise<SkillSummary[]> {
if (this._status !== "open" || !this.ws) return [];
if (this._status !== "open" || !this.lifecycle) return [];
return new Promise<SkillSummary[]>((resolve) => {
const reqId = `sl-${++this.reqCounter}`;
const timer = setTimeout(() => {
if (this.skillListResolvers.delete(reqId)) resolve([]);
}, timeoutMs);
this.skillListResolvers.set(reqId, { resolve, timer });
try { this.ws!.send(JSON.stringify({ type: "list_skills", query, _reqId: reqId })); }
try { this.lifecycle!.send({ type: "list_skills", query, _reqId: reqId }); }
catch { this.skillListResolvers.delete(reqId); clearTimeout(timer); resolve([]); }
});
}

/** Fetch one skill's full body. Null on not-found / disconnect / timeout. */
async getSkill(name: string, timeoutMs = 5_000): Promise<SkillFull | null> {
if (this._status !== "open" || !this.ws) return null;
if (this._status !== "open" || !this.lifecycle) return null;
return new Promise<SkillFull | null>((resolve) => {
const reqId = `sg-${++this.reqCounter}`;
const timer = setTimeout(() => {
if (this.skillDataResolvers.delete(reqId)) resolve(null);
}, timeoutMs);
this.skillDataResolvers.set(reqId, { resolve, timer });
try { this.ws!.send(JSON.stringify({ type: "get_skill", name, _reqId: reqId })); }
try { this.lifecycle!.send({ type: "get_skill", name, _reqId: reqId }); }
catch { this.skillDataResolvers.delete(reqId); clearTimeout(timer); resolve(null); }
});
}

/** Read a single shared state row. Null on disconnect / timeout / not-found. */
async getState(key: string, timeoutMs = 5_000): Promise<StateRow | null> {
if (this._status !== "open" || !this.ws) return null;
if (this._status !== "open" || !this.lifecycle) return null;
return new Promise<StateRow | null>((resolve) => {
const reqId = `sg-${++this.reqCounter}`;
const timer = setTimeout(() => {
if (this.stateGetResolvers.delete(reqId)) resolve(null);
}, timeoutMs);
this.stateGetResolvers.set(reqId, { resolve, timer });
try { this.ws!.send(JSON.stringify({ type: "get_state", key, _reqId: reqId })); }
try { this.lifecycle!.send({ type: "get_state", key, _reqId: reqId }); }
catch { this.stateGetResolvers.delete(reqId); clearTimeout(timer); resolve(null); }
});
}

/** List all shared state rows in the mesh. */
async listState(timeoutMs = 5_000): Promise<StateRow[]> {
if (this._status !== "open" || !this.ws) return [];
if (this._status !== "open" || !this.lifecycle) return [];
return new Promise<StateRow[]>((resolve) => {
const reqId = `sl-${++this.reqCounter}`;
const timer = setTimeout(() => {
if (this.stateListResolvers.delete(reqId)) resolve([]);
}, timeoutMs);
this.stateListResolvers.set(reqId, { resolve, timer });
try { this.ws!.send(JSON.stringify({ type: "list_state", _reqId: reqId })); }
try { this.lifecycle!.send({ type: "list_state", _reqId: reqId }); }
catch { this.stateListResolvers.delete(reqId); clearTimeout(timer); resolve([]); }
});
}

/** Set a shared state value. Fire-and-forget. */
setState(key: string, value: unknown): void {
if (this._status !== "open" || !this.ws) return;
try { this.ws.send(JSON.stringify({ type: "set_state", key, value })); }
if (this._status !== "open" || !this.lifecycle) return;
try { this.lifecycle.send({ type: "set_state", key, value }); }
catch { /* ignore */ }
}

/** Store a memory in the mesh. Returns the assigned id, or null on timeout. */
async remember(content: string, tags?: string[], timeoutMs = 5_000): Promise<string | null> {
if (this._status !== "open" || !this.ws) return null;
if (this._status !== "open" || !this.lifecycle) return null;
return new Promise<string | null>((resolve) => {
const reqId = `mr-${++this.reqCounter}`;
const timer = setTimeout(() => {
if (this.memoryStoreResolvers.delete(reqId)) resolve(null);
}, timeoutMs);
this.memoryStoreResolvers.set(reqId, { resolve, timer });
try { this.ws!.send(JSON.stringify({ type: "remember", content, tags, _reqId: reqId })); }
try { this.lifecycle!.send({ type: "remember", content, tags, _reqId: reqId }); }
catch { this.memoryStoreResolvers.delete(reqId); clearTimeout(timer); resolve(null); }
});
}

/** Search memories by relevance. */
async recall(query: string, timeoutMs = 5_000): Promise<MemoryRow[]> {
if (this._status !== "open" || !this.ws) return [];
if (this._status !== "open" || !this.lifecycle) return [];
return new Promise<MemoryRow[]>((resolve) => {
const reqId = `mc-${++this.reqCounter}`;
const timer = setTimeout(() => {
if (this.memoryRecallResolvers.delete(reqId)) resolve([]);
}, timeoutMs);
this.memoryRecallResolvers.set(reqId, { resolve, timer });
try { this.ws!.send(JSON.stringify({ type: "recall", query, _reqId: reqId })); }
try { this.lifecycle!.send({ type: "recall", query, _reqId: reqId }); }
catch { this.memoryRecallResolvers.delete(reqId); clearTimeout(timer); resolve([]); }
});
}
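`listPeers`, `listSkills`, `getSkill`, `getState`, `listState`, `remember`, and `recall` all repeat one correlation pattern. Each mints a prefixed `_reqId`, parks a resolver with a timeout, sends the frame, and resolves with a fallback value on timeout or send failure. The sketch below factors that into a single generic method. `Correlator` and its `request` signature are hypothetical, and the `_status` gate from the real methods is left out.

```typescript
// Hypothetical generic form of the request/reply methods above.
type Pending<T> = { resolve: (value: T) => void; timer: ReturnType<typeof setTimeout> };

class Correlator {
  private counter = 0;
  private readonly sendFrame: (frame: Record<string, unknown>) => void;

  constructor(sendFrame: (frame: Record<string, unknown>) => void) {
    this.sendFrame = sendFrame;
  }

  /** Send `frame` tagged with a fresh `_reqId`; resolve with the reply, or
   *  with `fallback` on timeout or send failure. Never rejects. */
  request<T>(
    table: Map<string, Pending<T>>,
    prefix: string,
    frame: Record<string, unknown>,
    fallback: T,
    timeoutMs = 5_000,
  ): Promise<T> {
    return new Promise<T>((resolve) => {
      const reqId = `${prefix}-${++this.counter}`;
      const timer = setTimeout(() => {
        // Only the first of {reply, timeout} still finds the entry.
        if (table.delete(reqId)) resolve(fallback);
      }, timeoutMs);
      table.set(reqId, { resolve, timer });
      try {
        this.sendFrame({ ...frame, _reqId: reqId });
      } catch {
        // Send failed synchronously: unpark and fall back immediately.
        table.delete(reqId);
        clearTimeout(timer);
        resolve(fallback);
      }
    });
  }
}
```

The timer resolves only if `table.delete(reqId)` succeeds, and the reply branch deletes the entry before resolving, so whichever of reply or timeout arrives first wins and the other becomes a no-op. As in the methods above, the returned promise never rejects; callers see an empty array or `null` instead.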

/** Forget a memory by id. Fire-and-forget. */
forget(memoryId: string): void {
if (this._status !== "open" || !this.ws) return;
try { this.ws.send(JSON.stringify({ type: "forget", memoryId })); }
if (this._status !== "open" || !this.lifecycle) return;
try { this.lifecycle.send({ type: "forget", memoryId }); }
catch { /* ignore */ }
}

/** Set the daemon's profile (avatar/title/bio/capabilities). Fire-and-forget. */
setProfile(profile: { avatar?: string; title?: string; bio?: string; capabilities?: string[] }): void {
if (this._status !== "open" || !this.ws) return;
try { this.ws.send(JSON.stringify({ type: "set_profile", ...profile })); }
if (this._status !== "open" || !this.lifecycle) return;
try { this.lifecycle.send({ type: "set_profile", ...profile }); }
catch { /* ignore */ }
}

setSummary(summary: string): void {
if (this._status !== "open" || !this.ws) return;
try { this.ws.send(JSON.stringify({ type: "set_summary", summary })); }
if (this._status !== "open" || !this.lifecycle) return;
try { this.lifecycle.send({ type: "set_summary", summary }); }
catch { /* ignore */ }
}

setStatus(status: "idle" | "working" | "dnd"): void {
if (this._status !== "open" || !this.ws) return;
try { this.ws.send(JSON.stringify({ type: "set_status", status })); }
if (this._status !== "open" || !this.lifecycle) return;
try { this.lifecycle.send({ type: "set_status", status }); }
catch { /* ignore */ }
}

setVisible(visible: boolean): void {
if (this._status !== "open" || !this.ws) return;
try { this.ws.send(JSON.stringify({ type: "set_visible", visible })); }
if (this._status !== "open" || !this.lifecycle) return;
try { this.lifecycle.send({ type: "set_visible", visible }); }
catch { /* ignore */ }
}

async close(): Promise<void> {
this.closed = true;
if (this.reconnectTimer) { clearTimeout(this.reconnectTimer); this.reconnectTimer = null; }
if (this.helloTimer) { clearTimeout(this.helloTimer); this.helloTimer = null; }
this.failPendingAcks("daemon_shutdown");
try { this.ws?.close(); } catch { /* ignore */ }
this.setConnStatus("closed");
}

getSessionKeys(): { sessionPubkey: string; sessionSecretKey: string } | null {
if (!this.sessionPubkey || !this.sessionSecretKey) return null;
return { sessionPubkey: this.sessionPubkey, sessionSecretKey: this.sessionSecretKey };
if (this.lifecycle) {
try { await this.lifecycle.close(); } catch { /* ignore */ }
this.lifecycle = null;
}
this._status = "closed";
}

private failPendingAcks(reason: string) {

@@ -15,6 +15,25 @@ export interface InboxRow {
meta: string | null;
received_at: number;
reply_to_id: string | null;
/** 1.34.8: Unix ms of when this row was first surfaced to the user
* (returned by an interactive `inbox` listing or pushed via channel
* reminder). NULL = never seen. Welcome filters on `seen_at IS NULL`
* so freshly-launched sessions only see what they actually missed. */
seen_at: number | null;
/** 1.34.11: pubkey of the WS that received this push. Either the
* daemon's member pubkey for member-keyed broadcasts, or one of
* our session pubkeys for session-targeted DMs. Without this, two
* sessions on the same daemon shared one inbox table and each saw
* every other session's messages — same bug shape the 1.34.10 SSE
* demux fixed for the live event path, just at the storage layer.
* Pre-1.34.11 rows have NULL here and are visible to every session
* on the same mesh (best-effort back-compat for already-stored
* history). */
recipient_pubkey: string | null;
/** 1.34.11: matches `recipient_kind` on the bus event. "session" =
* scoped to one session pubkey; "member" = visible to every
* session of that member on the mesh. NULL on legacy rows. */
recipient_kind: string | null;
}

export function migrateInbox(db: SqliteDb): void {
@@ -36,6 +55,24 @@ export function migrateInbox(db: SqliteDb): void {
CREATE INDEX IF NOT EXISTS inbox_topic ON inbox(topic);
CREATE INDEX IF NOT EXISTS inbox_sender ON inbox(sender_pubkey);
`);
// 1.34.8: read-state tracking. Pre-1.34.8 rows land with seen_at=NULL
// (treated as unread); welcome surfaces them once and the listing
// marks them seen. Indexed because welcome queries WHERE seen_at IS
// NULL on every launch.
const cols = db.prepare(`PRAGMA table_info(inbox)`).all<{ name: string }>();
if (!cols.some((c) => c.name === "seen_at")) {
db.exec(`ALTER TABLE inbox ADD COLUMN seen_at INTEGER`);
db.exec(`CREATE INDEX IF NOT EXISTS inbox_seen_at ON inbox(seen_at)`);
}
// 1.34.11: per-recipient scoping. Two sessions on the same daemon
// share one inbox table; without this column, listInbox returns
// every row regardless of which session is asking. Indexed
// because every interactive listing + welcome path filters by it.
if (!cols.some((c) => c.name === "recipient_pubkey")) {
db.exec(`ALTER TABLE inbox ADD COLUMN recipient_pubkey TEXT`);
db.exec(`ALTER TABLE inbox ADD COLUMN recipient_kind TEXT`);
db.exec(`CREATE INDEX IF NOT EXISTS inbox_recipient ON inbox(recipient_pubkey)`);
}
}
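`migrateInbox` adds columns idempotently. It reads `PRAGMA table_info(inbox)` and issues `ALTER TABLE ... ADD COLUMN` only for names that are missing. The planner below is a hypothetical generalization of that check: it turns the existing column names into the DDL still to run. The original tests only `recipient_pubkey` before adding both recipient columns. This sketch instead checks each column separately, so a migration interrupted between the two `ALTER` statements would still converge on the next start.

```typescript
// Hypothetical planner for the additive-column migration pattern.
interface ColumnSpec {
  name: string;
  ddl: string; // column type, e.g. "INTEGER" or "TEXT"
  index?: string; // optional index created alongside the column
}

function planColumnMigrations(
  table: string,
  existing: readonly string[],
  wanted: readonly ColumnSpec[],
): string[] {
  const have = new Set(existing);
  const stmts: string[] = [];
  for (const col of wanted) {
    if (have.has(col.name)) continue; // already migrated: re-running is a no-op
    stmts.push(`ALTER TABLE ${table} ADD COLUMN ${col.name} ${col.ddl}`);
    if (col.index) stmts.push(`CREATE INDEX IF NOT EXISTS ${col.index} ON ${table}(${col.name})`);
  }
  return stmts;
}
```

Each returned statement would be passed to `db.exec` in order. Table and column names are interpolated directly, so only trusted constants should ever reach `planColumnMigrations`.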

/**
@@ -45,7 +82,14 @@ export function migrateInbox(db: SqliteDb): void {
* Returns the new row id when this was a fresh insert, or null when the
* message id was already known (idempotent receive).
*/
export function insertIfNew(db: SqliteDb, row: Omit<InboxRow, "id"> & { id: string }): string | null {
export function insertIfNew(
db: SqliteDb,
// 1.34.8: callers don't pass `seen_at` — it's always NULL on insert
// (a freshly-received row is by definition unread). Stripping the
// field from the input type keeps inbound.ts callers from having to
// construct it.
row: Omit<InboxRow, "id" | "seen_at"> & { id: string },
): string | null {
// node:sqlite does support RETURNING. bun:sqlite does too. We branch on
// the row count instead so it works on both.
const before = db.prepare(`SELECT id FROM inbox WHERE client_message_id = ?`).get<{ id: string }>(row.client_message_id);
@@ -53,12 +97,14 @@ export function insertIfNew(db: SqliteDb, row: Omit<InboxRow, "id"> & { id: stri
db.prepare(`
INSERT INTO inbox (
id, client_message_id, broker_message_id, mesh, topic,
sender_pubkey, sender_name, body, meta, received_at, reply_to_id
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
sender_pubkey, sender_name, body, meta, received_at, reply_to_id,
recipient_pubkey, recipient_kind
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(client_message_id) DO NOTHING
`).run(
row.id, row.client_message_id, row.broker_message_id, row.mesh, row.topic,
row.sender_pubkey, row.sender_name, row.body, row.meta, row.received_at, row.reply_to_id,
row.recipient_pubkey, row.recipient_kind,
);
// Confirm the insert landed (handles the conflict-noop race).
const after = db.prepare(`SELECT id FROM inbox WHERE client_message_id = ?`).get<{ id: string }>(row.client_message_id);
@@ -69,6 +115,21 @@ export interface ListInboxParams {
since?: number; // received_at >= since
topic?: string;
fromPubkey?: string;
/** 1.34.0: filter by mesh slug. Omit to return rows across all meshes. */
mesh?: string;
/** 1.34.8: only rows with `seen_at IS NULL`. Used by the welcome
* push so a freshly-launched session surfaces what it actually
* missed instead of every row from the last 24h. */
unreadOnly?: boolean;
/** 1.34.11: scope to rows whose recipient is this session pubkey,
* PLUS member-keyed rows for the same member, PLUS legacy rows
* with a NULL recipient (best-effort back-compat with pre-1.34.11
* history). Set by the IPC `/v1/inbox` route from the bearer
* session token; without it the listing returns everything.
* `recipientMemberPubkey` widens the match to include broadcasts
* / member DMs that should reach every session of this member. */
recipientPubkey?: string;
recipientMemberPubkey?: string;
limit?: number;
}

@@ -78,9 +139,28 @@ export function listInbox(db: SqliteDb, p: ListInboxParams): InboxRow[] {
if (p.since !== undefined) { where.push("received_at >= ?"); args.push(p.since); }
if (p.topic !== undefined) { where.push("topic = ?"); args.push(p.topic); }
if (p.fromPubkey !== undefined){ where.push("sender_pubkey = ?"); args.push(p.fromPubkey); }
if (p.mesh !== undefined) { where.push("mesh = ?"); args.push(p.mesh); }
if (p.unreadOnly === true) { where.push("seen_at IS NULL"); }
// 1.34.11: recipient scoping. A session sees:
// - rows whose recipient_pubkey === its session pubkey (its DMs),
// - rows whose recipient_pubkey === the daemon's member pubkey
// (broadcasts / member-keyed DMs to anyone in this member's
// identity — every sibling session sees them),
// - legacy rows where recipient_pubkey IS NULL (pre-1.34.11
// history; we can't tell who they were for, so surface to all).
if (p.recipientPubkey) {
const ors: string[] = ["recipient_pubkey IS NULL", "recipient_pubkey = ?"];
args.push(p.recipientPubkey);
if (p.recipientMemberPubkey) {
ors.push("recipient_pubkey = ?");
args.push(p.recipientMemberPubkey);
}
where.push(`(${ors.join(" OR ")})`);
}
const sql = `
SELECT id, client_message_id, broker_message_id, mesh, topic,
sender_pubkey, sender_name, body, meta, received_at, reply_to_id
sender_pubkey, sender_name, body, meta, received_at, reply_to_id, seen_at,
recipient_pubkey, recipient_kind
FROM inbox
${where.length ? "WHERE " + where.join(" AND ") : ""}
ORDER BY received_at DESC
@@ -89,3 +169,57 @@ export function listInbox(db: SqliteDb, p: ListInboxParams): InboxRow[] {
args.push(Math.min(Math.max(p.limit ?? 100, 1), 1000));
return db.prepare(sql).all<InboxRow>(...args);
}
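The recipient filter in `listInbox` is an OR of up to three alternatives, AND-joined with the other filters. The parentheses around it matter. SQL binds AND tighter than OR, so without them, rows addressed to the session or member pubkey would bypass the `since`, `topic`, and `mesh` filters. The sketch below pulls the clause builder out as a pure function. It also defines an in-memory predicate that should agree with that clause. `buildRecipientClause`, `visibleTo`, and `RecipientScope` are hypothetical names.

```typescript
// Hypothetical extraction of listInbox's recipient scoping.
interface RecipientScope {
  recipientPubkey?: string;
  recipientMemberPubkey?: string;
}

/** SQL fragment + args for the scope, or null when the listing is unscoped. */
function buildRecipientClause(p: RecipientScope): { sql: string; args: string[] } | null {
  if (!p.recipientPubkey) return null; // unscoped: caller sees every row
  const ors = ["recipient_pubkey IS NULL", "recipient_pubkey = ?"];
  const args = [p.recipientPubkey];
  if (p.recipientMemberPubkey) {
    ors.push("recipient_pubkey = ?");
    args.push(p.recipientMemberPubkey);
  }
  // Parenthesized so the OR cannot leak past the surrounding AND chain.
  return { sql: `(${ors.join(" OR ")})`, args };
}

/** In-memory mirror of the same predicate, useful for unit tests. */
function visibleTo(rowRecipient: string | null, p: RecipientScope): boolean {
  if (!p.recipientPubkey) return true;
  if (rowRecipient === null) return true; // legacy pre-1.34.11 row
  if (rowRecipient === p.recipientPubkey) return true; // session DM
  return !!p.recipientMemberPubkey && rowRecipient === p.recipientMemberPubkey; // member-keyed
}
```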

/** 1.34.8: stamp `seen_at = now` on every row whose id is in `ids`,
* but only when `seen_at IS NULL` so re-marking doesn't bump the
* timestamp on a row the user already knew about. Returns the number
* of rows that flipped from unread → seen. Used by:
* - the IPC `/v1/inbox` route when called by an interactive
* listing (the daemon stamps after returning rows so the human
* who just looked at their inbox doesn't see the same rows
* flagged "unread" on next launch);
* - the MCP server when the SSE message event surfaces a live
* `<channel>` reminder (Claude Code already saw the row inline,
* no need to surface it again on welcome). */
export function markInboxSeen(db: SqliteDb, ids: readonly string[], now = Date.now()): number {
if (ids.length === 0) return 0;
const placeholders = ids.map(() => "?").join(",");
const r = db.prepare(
`UPDATE inbox SET seen_at = ? WHERE seen_at IS NULL AND id IN (${placeholders})`,
).run(now, ...ids);
return Number(r.changes);
}
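`markInboxSeen` binds one placeholder per id plus one for `now`. With a recent bundled SQLite this is unlikely to matter, because the default `SQLITE_MAX_VARIABLE_NUMBER` has been 32766 since SQLite 3.32.0. On a build with the older limit of 999, however, a full 1000-row listing would exceed it. A defensive variant could batch the update. `chunkedSeenUpdates` below is a hypothetical helper, not part of the daemon.

```typescript
// Hypothetical batching of the markInboxSeen UPDATE under a bind-parameter cap.
interface SeenBatch {
  sql: string;
  args: Array<string | number>;
}

function chunkedSeenUpdates(ids: readonly string[], now: number, maxParams = 999): SeenBatch[] {
  if (maxParams < 2) throw new RangeError("maxParams must leave room for at least one id");
  const perBatch = maxParams - 1; // one slot is taken by `seen_at = ?`
  const out: SeenBatch[] = [];
  for (let i = 0; i < ids.length; i += perBatch) {
    const batch = ids.slice(i, i + perBatch);
    const placeholders = batch.map(() => "?").join(",");
    out.push({
      sql: `UPDATE inbox SET seen_at = ? WHERE seen_at IS NULL AND id IN (${placeholders})`,
      args: [now, ...batch],
    });
  }
  return out;
}
```

Summing `changes` across the batches gives the same total as the single statement. The `seen_at IS NULL` guard ensures no row is counted twice, even if an id appears in more than one batch.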

/** 1.34.8: TTL prune. Removes inbox rows older than `cutoffMs`
* (received_at < cutoffMs). Daemon schedules this hourly with a 30-day
* default retention (see startInboxPruner). Returns the number of
* rows removed so the caller can log the volume. */
export function pruneInboxBefore(db: SqliteDb, cutoffMs: number): number {
const r = db.prepare(`DELETE FROM inbox WHERE received_at < ?`).run(cutoffMs);
return Number(r.changes);
}
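The comment on `pruneInboxBefore` says the daemon runs it hourly with a 30-day default retention, via a `startInboxPruner` that is not part of this diff. A minimal sketch of such a scheduler follows. `retentionCutoff`, `startPruner`, and the run-once-at-startup behaviour are assumptions.

```typescript
// Hypothetical scheduler around pruneInboxBefore; not the daemon's startInboxPruner.
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

/** Rows with received_at below this value are past retention. */
function retentionCutoff(nowMs: number, retentionDays = 30): number {
  return nowMs - retentionDays * DAY_MS;
}

/** Prune once immediately (assumed, so a long-offline daemon catches up),
 *  then hourly. Returns a stop function for shutdown. */
function startPruner(
  prune: (cutoffMs: number) => number,
  log: (removed: number) => void,
  retentionDays = 30,
): () => void {
  const tick = () => {
    const removed = prune(retentionCutoff(Date.now(), retentionDays));
    if (removed > 0) log(removed);
  };
  tick();
  const timer = setInterval(tick, HOUR_MS);
  return () => clearInterval(timer);
}
```

In the daemon, `prune` would close over the inbox database, for example `(cutoff) => pruneInboxBefore(db, cutoff)`.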

/** 1.34.7: delete a single inbox row by id. Returns true iff a row was
* removed. The CLI exposes this as `claudemesh inbox delete <id>`. */
export function deleteInboxRow(db: SqliteDb, id: string): boolean {
const r = db.prepare(`DELETE FROM inbox WHERE id = ?`).run(id);
return Number(r.changes) > 0;
}

/** 1.34.7: bulk delete with mesh / age filters. Returns the number of
* rows removed. With no filter, deletes ALL rows on ALL meshes —
* caller is expected to gate this behind a `--all` confirmation. */
export interface FlushInboxParams {
mesh?: string;
/** Unix ms — delete rows received_at < before. */
before?: number;
}
export function flushInbox(db: SqliteDb, p: FlushInboxParams): number {
const where: string[] = [];
const args: unknown[] = [];
if (p.mesh !== undefined) { where.push("mesh = ?"); args.push(p.mesh); }
if (p.before !== undefined) { where.push("received_at < ?"); args.push(p.before); }
const sql = `DELETE FROM inbox ${where.length ? "WHERE " + where.join(" AND ") : ""}`;
const r = db.prepare(sql).run(...args);
return Number(r.changes);
}

@@ -26,6 +26,15 @@ export interface OutboxRow {
nonce: string | null;
ciphertext: string | null;
priority: string | null;
/**
* 1.34.0: hex pubkey of the launched session that originated this row.
* NULL when the send came from outside a registered session
* (cold-path CLI, system-issued sends, etc.) — drain falls through to
* the daemon-WS in that case. When set, drain prefers the matching
* SessionBrokerClient so the broker fan-out attributes the push to
* the session pubkey instead of the daemon's stable member pubkey.
*/
sender_session_pubkey: string | null;
}

export function migrateOutbox(db: SqliteDb): void {
@@ -68,6 +77,14 @@ export function migrateOutbox(db: SqliteDb): void {
if (!hasNonce) db.exec(`ALTER TABLE outbox ADD COLUMN nonce TEXT`);
if (!hasCiphertext) db.exec(`ALTER TABLE outbox ADD COLUMN ciphertext TEXT`);
if (!hasPriority) db.exec(`ALTER TABLE outbox ADD COLUMN priority TEXT`);

// 1.34.0: per-row sender session pubkey, used by the drain worker to
// route via the originating session's WS so broker fan-out attributes
// the push to the session pubkey, not the daemon's member pubkey.
// Pre-1.34.0 rows land with NULL — drain falls back to the daemon-WS
// path (legacy attribution).
const hasSenderSessionPk = columnExists(db, "outbox", "sender_session_pubkey");
if (!hasSenderSessionPk) db.exec(`ALTER TABLE outbox ADD COLUMN sender_session_pubkey TEXT`);
}

function columnExists(db: SqliteDb, table: string, column: string): boolean {
@@ -80,7 +97,8 @@ export function findByClientId(db: SqliteDb, clientMessageId: string): OutboxRow
SELECT id, client_message_id, request_fingerprint, payload, enqueued_at,
attempts, next_attempt_at, status, last_error, delivered_at,
broker_message_id, aborted_at, aborted_by, superseded_by,
mesh, target_spec, nonce, ciphertext, priority
mesh, target_spec, nonce, ciphertext, priority,
sender_session_pubkey
FROM outbox WHERE client_message_id = ?
`).get<OutboxRow>(clientMessageId);
return row ?? null;
@@ -98,6 +116,9 @@ export interface InsertPendingInput {
nonce?: string;
ciphertext?: string;
priority?: string;
/** 1.34.0: hex pubkey of the originating session (omit for cold-path
* CLI sends — drain will use the daemon-WS). */
sender_session_pubkey?: string;
}

export function insertPending(db: SqliteDb, input: InsertPendingInput): void {
@@ -105,8 +126,9 @@ export function insertPending(db: SqliteDb, input: InsertPendingInput): void {
INSERT INTO outbox (
id, client_message_id, request_fingerprint, payload,
enqueued_at, attempts, next_attempt_at, status,
mesh, target_spec, nonce, ciphertext, priority
) VALUES (?, ?, ?, ?, ?, 0, ?, 'pending', ?, ?, ?, ?, ?)
mesh, target_spec, nonce, ciphertext, priority,
sender_session_pubkey
) VALUES (?, ?, ?, ?, ?, 0, ?, 'pending', ?, ?, ?, ?, ?, ?)
`).run(
input.id,
input.client_message_id,
@@ -114,11 +136,12 @@ export function insertPending(db: SqliteDb, input: InsertPendingInput): void {
input.payload,
input.now,
input.now,
input.mesh ?? null,
input.target_spec ?? null,
input.nonce ?? null,
input.ciphertext ?? null,
input.priority ?? null,
input.mesh ?? null,
input.target_spec ?? null,
input.nonce ?? null,
input.ciphertext ?? null,
input.priority ?? null,
input.sender_session_pubkey ?? null,
);
}

@@ -149,7 +172,8 @@ export function listOutbox(db: SqliteDb, p: ListOutboxParams = {}): OutboxRow[]
SELECT id, client_message_id, request_fingerprint, payload, enqueued_at,
attempts, next_attempt_at, status, last_error, delivered_at,
broker_message_id, aborted_at, aborted_by, superseded_by,
mesh, target_spec, nonce, ciphertext, priority
mesh, target_spec, nonce, ciphertext, priority,
sender_session_pubkey
FROM outbox
${where.length ? "WHERE " + where.join(" AND ") : ""}
ORDER BY enqueued_at DESC
@@ -164,7 +188,8 @@ export function findById(db: SqliteDb, id: string): OutboxRow | null {
SELECT id, client_message_id, request_fingerprint, payload, enqueued_at,
attempts, next_attempt_at, status, last_error, delivered_at,
broker_message_id, aborted_at, aborted_by, superseded_by,
mesh, target_spec, nonce, ciphertext, priority
mesh, target_spec, nonce, ciphertext, priority,
sender_session_pubkey
FROM outbox WHERE id = ?
`).get<OutboxRow>(id) ?? null;
}

@@ -13,6 +13,7 @@

import type { SqliteDb } from "./db/sqlite.js";
import type { DaemonBrokerClient } from "./broker.js";
import type { SessionBrokerClient } from "./session-broker.js";
import type { OutboxStatus } from "./db/outbox.js";

const POLL_INTERVAL_MS = 500;
@@ -32,6 +33,10 @@ interface PendingRow {
ciphertext: string | null;
priority: string | null;
mesh: string | null;
/** 1.34.0: hex pubkey of the originating session — drain prefers
* routing via that session's WS so broker fan-out attributes the
* push to the session pubkey. NULL on cold-path / pre-1.34.0 rows. */
sender_session_pubkey: string | null;
}

export interface DrainOptions {
@@ -40,6 +45,20 @@ export interface DrainOptions {
* broker keyed by its `mesh` column. Single-mesh daemons pass a
* Map of size 1; multi-mesh daemons pass one entry per joined mesh. */
brokers: Map<string, DaemonBrokerClient>;
/**
* 1.34.0: lookup for the per-session WS keyed by hex session pubkey.
* When an outbox row has `sender_session_pubkey` set and this lookup
* returns an open client, the drain routes via the session-WS so the
* broker fan-out attributes the push to the session pubkey instead
* of the daemon's stable member pubkey.
*
* Returning `undefined` (or an unopened client) signals "no session
* WS available" — the drain backs off and retries; it does NOT fall
* back to the daemon-WS, because the row was encrypted with the
* session secret and would fail to decrypt on the recipient side
* if attribution silently changed mid-flight.
*/
getSessionBrokerByPubkey?: (sessionPubkey: string) => SessionBrokerClient | undefined;
log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
}
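`DrainOptions` and the `drainOnce` hunks that follow encode a four-way routing decision for each outbox row. The pure function sketched here mirrors the visible order of checks. It first resolves the daemon broker by mesh and marks the row dead when none matches. For session-originated rows, it then prefers an open session client, and backs off rather than falling back to the daemon-WS. `pickRoute`, `Openable`, and the `isOpen()` method are hypothetical, since the real `SessionBrokerClient` interface is not shown. The lookup is also made mandatory here, because the truncated diff does not show what happens when it is absent.

```typescript
// Hypothetical pure routing decision mirroring the drain rules.
interface Openable {
  isOpen(): boolean;
}

type Route<D, S> =
  | { kind: "daemon"; broker: D }
  | { kind: "session"; broker: S }
  | { kind: "retry"; reason: string }
  | { kind: "dead"; reason: string };

function pickRoute<D, S extends Openable>(
  row: { mesh: string | null; sender_session_pubkey: string | null },
  brokers: Map<string, D>,
  getSession: (pubkey: string) => S | undefined,
): Route<D, S> {
  // Legacy rows (mesh=NULL) only resolve when exactly one mesh is joined.
  let daemon: D | undefined;
  if (row.mesh) daemon = brokers.get(row.mesh);
  else if (brokers.size === 1) daemon = brokers.values().next().value;
  if (daemon === undefined) return { kind: "dead", reason: `no_broker_for_mesh:${row.mesh ?? "null"}` };

  if (row.sender_session_pubkey) {
    const session = getSession(row.sender_session_pubkey);
    // Session-encrypted rows must never fall back to the daemon-WS.
    if (session && session.isOpen()) return { kind: "session", broker: session };
    return { kind: "retry", reason: "session_ws_unavailable" };
  }
  return { kind: "daemon", broker: daemon };
}
```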
|
||||
|
||||
@@ -88,7 +107,8 @@ async function drainOnce(opts: DrainOptions, log: NonNullable<DrainOptions["log"
|
||||
const now = Date.now();
|
||||
const rows = opts.db.prepare(`
|
||||
SELECT id, client_message_id, request_fingerprint, payload, attempts,
|
||||
target_spec, nonce, ciphertext, priority, mesh
|
||||
target_spec, nonce, ciphertext, priority, mesh,
|
||||
sender_session_pubkey
|
||||
FROM outbox
|
||||
WHERE status = 'pending' AND next_attempt_at <= ?
|
||||
ORDER BY enqueued_at
|
||||
@@ -101,21 +121,34 @@ async function drainOnce(opts: DrainOptions, log: NonNullable<DrainOptions["log"
     if (markInflight(opts.db, row.id, now) === 0) continue; // raced with another drainer
     const fpHex = bufferToHex(row.request_fingerprint);

-    // v1.26.0: pick the broker keyed by the row's mesh. Legacy rows
-    // (mesh=NULL) fall back to the only broker if there's exactly one;
-    // otherwise mark dead because we don't know where to send them.
-    let broker: DaemonBrokerClient | undefined;
+    // v1.26.0: pick the daemon-WS broker keyed by the row's mesh.
+    // Legacy rows (mesh=NULL) fall back to the only broker if there's
+    // exactly one; otherwise mark dead because we don't know where to
+    // send them.
+    let daemonBroker: DaemonBrokerClient | undefined;
     if (row.mesh) {
-      broker = opts.brokers.get(row.mesh);
+      daemonBroker = opts.brokers.get(row.mesh);
     } else if (opts.brokers.size === 1) {
-      broker = opts.brokers.values().next().value;
+      daemonBroker = opts.brokers.values().next().value;
     }
-    if (!broker) {
+    if (!daemonBroker) {
       log("warn", "drain_no_broker_for_mesh", { id: row.id, mesh: row.mesh ?? "(null)" });
       markDead(opts.db, row.id, `no_broker_for_mesh:${row.mesh ?? "null"}`);
       continue;
     }

+    // 1.34.0: when the row was written by an authenticated session,
+    // dispatch via the matching SessionBrokerClient so broker fan-out
+    // attributes the push to the session pubkey. Encryption is
+    // session-secret based on those rows, so we MUST NOT silently fall
+    // back to the daemon-WS — the recipient's decrypt would fail. If
+    // the session-WS is closed (reconnecting / session terminated), we
+    // back off and retry.
+    let sessionBroker: SessionBrokerClient | undefined;
+    if (row.sender_session_pubkey && opts.getSessionBrokerByPubkey) {
+      sessionBroker = opts.getSessionBrokerByPubkey(row.sender_session_pubkey);
+    }
+
     // Sprint 4: use the row's resolved target/ciphertext if present.
     // Legacy v0.9.0 rows (NULL on these columns) fall back to the
     // broadcast smoke-test shape so existing in-flight rows still drain.
@@ -135,16 +168,31 @@ async function drainOnce(opts: DrainOptions, log: NonNullable<DrainOptions["log"
       priority = "next";
     }

+    const sendArgs = {
+      targetSpec,
+      priority,
+      nonce,
+      ciphertext,
+      client_message_id: row.client_message_id,
+      request_fingerprint_hex: fpHex,
+    };
+
     let res;
     try {
-      res = await broker.send({
-        targetSpec,
-        priority,
-        nonce,
-        ciphertext,
-        client_message_id: row.client_message_id,
-        request_fingerprint_hex: fpHex,
-      });
+      if (row.sender_session_pubkey) {
+        // Session-attributed row. Require an open session-WS — see comment
+        // above on why we don't fall back to the daemon-WS.
+        if (!sessionBroker || !sessionBroker.isOpen()) {
+          log("info", "drain_session_ws_not_ready", {
+            id: row.id, session_pubkey: row.sender_session_pubkey.slice(0, 12),
+          });
+          backoffPending(opts.db, row.id, row.attempts + 1, "session_ws_not_open", "session_ws_not_open");
+          continue;
+        }
+        res = await sessionBroker.send(sendArgs);
+      } else {
+        res = await daemonBroker.send(sendArgs);
+      }
     } catch (e) {
       log("warn", "drain_send_threw", { id: row.id, err: String(e) });
       backoffPending(opts.db, row.id, row.attempts + 1, "exception", String(e));
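The routing rule introduced in the drain hunks can be summarised as a pure decision function. The sketch below is a simplified model, not the drain's code: `WsClient`, `OutboxRow`, `RouteDecision` and `chooseRoute` are hypothetical names, the two maps stand in for `opts.brokers` and `getSessionBrokerByPubkey`, and the real drain performs its side effects (`send`, `backoffPending`, `markDead`) inline instead of returning a decision.

```typescript
// Hypothetical model of the outbox drain's routing rule (names are illustrative).
interface WsClient {
  isOpen(): boolean;
}

interface OutboxRow {
  id: string;
  mesh: string | null;
  sender_session_pubkey: string | null;
}

type RouteDecision =
  | { kind: "send"; via: "session" | "daemon"; client: WsClient }
  | { kind: "backoff"; reason: "session_ws_not_open" }
  | { kind: "dead"; reason: string };

function chooseRoute(
  row: OutboxRow,
  daemonBrokers: Map<string, WsClient>,
  sessionBrokers: Map<string, WsClient>,
): RouteDecision {
  // Resolve the daemon-WS broker by mesh; legacy rows (mesh = null) use
  // the only attached broker when there is exactly one.
  let daemonBroker: WsClient | undefined;
  if (row.mesh) {
    daemonBroker = daemonBrokers.get(row.mesh);
  } else if (daemonBrokers.size === 1) {
    daemonBroker = daemonBrokers.values().next().value;
  }
  if (!daemonBroker) {
    return { kind: "dead", reason: `no_broker_for_mesh:${row.mesh ?? "null"}` };
  }

  if (row.sender_session_pubkey) {
    // Session-encrypted rows never fall back to the daemon-WS: they wait
    // (back off) until their own session-WS is open again.
    const session = sessionBrokers.get(row.sender_session_pubkey);
    if (!session || !session.isOpen()) {
      return { kind: "backoff", reason: "session_ws_not_open" };
    }
    return { kind: "send", via: "session", client: session };
  }
  return { kind: "send", via: "daemon", client: daemonBroker };
}
```

Keeping the "no fallback" branch explicit makes the invariant easy to test: a session-attributed row either leaves on its own session-WS or stays pending.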
@@ -41,8 +41,68 @@ export function writeSse(res: ServerResponse, e: DaemonEvent, idCounter: number)
   res.write(`data: ${JSON.stringify({ ts: e.ts, ...e.data })}\n\n`);
 }

-/** Open an SSE stream on the response and route bus events to it. */
-export function bindSseStream(res: ServerResponse, bus: EventBus): () => void {
+/** 1.34.10: per-subscriber demux options. The MCP server passes its
+ * own session pubkey + member pubkey when binding so the bus only
+ * sends events meant for that session. Without this, every MCP on a
+ * multi-session daemon receives every inbox row and emits a
+ * duplicate channel notification — manifests as session A seeing its
+ * own outbound DM to B because B's session-WS published the row to
+ * the shared bus. */
+export interface SseFilterOptions {
+  /** Session pubkey the subscribing MCP serves. Events tagged
+   * `recipient_kind: "session"` only flow when their
+   * `recipient_pubkey` matches this. */
+  sessionPubkey?: string;
+  /** Daemon's member pubkey for this mesh. Events tagged
+   * `recipient_kind: "member"` flow when their `recipient_pubkey`
+   * matches — those are member-keyed broadcasts / DMs that should
+   * reach every session of this member, but not OTHER members. */
+  memberPubkey?: string;
+  /** Mesh slug the subscriber is bound to (from session registry).
+   * When set, system events (peer_join etc.) are filtered to this
+   * mesh; without it every system event surfaces. */
+  meshSlug?: string;
+}
+
+function shouldDeliver(e: DaemonEvent, f: SseFilterOptions): boolean {
+  // No filter set → legacy behavior: deliver everything (used by
+  // diagnostic tooling like `claudemesh daemon events`).
+  if (!f.sessionPubkey && !f.memberPubkey && !f.meshSlug) return true;
+
+  // Mesh scoping for events that carry a mesh slug. peer_join /
+  // peer_leave / broker_status all carry `data.mesh`; if the
+  // subscriber is bound to a specific mesh, drop events from other
+  // meshes.
+  if (f.meshSlug) {
+    const eventMesh = typeof e.data.mesh === "string" ? e.data.mesh : null;
+    if (eventMesh && eventMesh !== f.meshSlug) return false;
+  }
+
+  // System events (peer_join etc.) flow to every session on the same
+  // mesh — they're informational, not addressed.
+  if (e.kind !== "message") return true;
+
+  const recipientKind = typeof e.data.recipient_kind === "string" ? e.data.recipient_kind : null;
+  const recipientPubkey = typeof e.data.recipient_pubkey === "string" ? e.data.recipient_pubkey.toLowerCase() : null;
+
+  // Legacy publish without recipient context → everyone gets it. Keeps
+  // backward compatibility with older daemon code paths until they're
+  // migrated. Also covers test paths that don't thread context.
+  if (!recipientKind || !recipientPubkey) return true;
+
+  if (recipientKind === "session") {
+    return !!f.sessionPubkey && f.sessionPubkey.toLowerCase() === recipientPubkey;
+  }
+  if (recipientKind === "member") {
+    return !!f.memberPubkey && f.memberPubkey.toLowerCase() === recipientPubkey;
+  }
+  return true;
+}
+
+/** Open an SSE stream on the response and route bus events to it.
+ * 1.34.10: optional `filter` scopes the stream to one session/member;
+ * see SseFilterOptions. */
+export function bindSseStream(res: ServerResponse, bus: EventBus, filter: SseFilterOptions = {}): () => void {
   res.statusCode = 200;
   res.setHeader("Content-Type", "text/event-stream");
   res.setHeader("Cache-Control", "no-cache, no-transform");
@@ -51,7 +111,10 @@ export function bindSseStream(res: ServerResponse, bus: EventBus): () => void {
   res.write(": connected\n\n");

   let counter = 0;
-  const unsubscribe = bus.subscribe((e) => writeSse(res, e, ++counter));
+  const unsubscribe = bus.subscribe((e) => {
+    if (!shouldDeliver(e, filter)) return;
+    writeSse(res, e, ++counter);
+  });

   const heartbeat = setInterval(() => {
     try { res.write(": keepalive\n\n"); }
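On the consuming side, a subscriber has to split the stream that `bindSseStream` writes into frames and skip the `: connected` / `: keepalive` comment lines. The parser below is a simplified sketch of the HTML event-stream rules, not the MCP server's actual client. `SseFrame` and `parseSseChunk` are hypothetical names, and the `id:` / `event:` fields are an assumption: only the `data:` line written by `writeSse` is visible in this diff.

```typescript
// Simplified SSE frame parser (subset of the event-stream format).
interface SseFrame {
  id?: string;
  event?: string;
  data: string;
}

function parseSseChunk(buffer: string): { frames: SseFrame[]; rest: string } {
  const frames: SseFrame[] = [];
  // Normalise CRLF so frames are always separated by "\n\n".
  const text = buffer.replace(/\r\n/g, "\n");
  const blocks = text.split("\n\n");
  // The last piece is an incomplete frame (or ""); keep it for the next chunk.
  const rest = blocks.pop() ?? "";
  for (const block of blocks) {
    const frame: SseFrame = { data: "" };
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // comment / keepalive
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional space
      if (field === "data") dataLines.push(value);
      else if (field === "event") frame.event = value;
      else if (field === "id") frame.id = value;
    }
    if (dataLines.length === 0) continue; // comment-only block: nothing to dispatch
    frame.data = dataLines.join("\n");
    frames.push(frame);
  }
  return { frames, rest };
}
```

A caller would append each network chunk to `rest` from the previous call and `JSON.parse` each frame's `data`.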
@@ -18,6 +18,37 @@ export interface InboundContext {
   /** Daemon's session secret key hex (rotates per connect). When the
    * sender encrypted to our session pubkey, decrypt with this instead. */
   sessionSecretKeyHex?: string;
+  /** 1.34.10: recipient pubkey of the WS that received this push.
+   * Either the daemon's member pubkey (member-WS) or one of our
+   * session pubkeys (session-WS). Threaded through to the bus event
+   * so each MCP subscriber can filter to events meant for its own
+   * session — without it, every MCP on the same daemon renders every
+   * inbox row, which manifests as session A seeing its own outbound
+   * to B (because A's MCP also picks up the bus event B's WS just
+   * published). */
+  recipientPubkey?: string;
+  /** 1.34.10: kind of WS this push arrived on. "session" pushes only
+   * surface to the matching session's MCP; "member" pushes surface to
+   * every session on the same mesh (member-keyed broadcasts, member
+   * DMs that don't have a session). */
+  recipientKind?: "session" | "member";
+  /** v2 agentic-comms (M1): emit `client_ack` back to the broker after
+   * the message lands in inbox.db. Broker uses the ack to set
+   * `delivered_at` (atomic at-least-once). Without it, the broker's
+   * 30s lease expires and re-delivers — correct but noisy. The WS
+   * client owns this callback because it's the one that owns the
+   * socket; inbound.ts just signals "I accepted this id." */
+  ackClientMessage?: (clientMessageId: string, brokerMessageId: string | null) => void;
+  /** 1.34.9: drops system events (peer_joined / peer_left /
+   * peer_returned) whose eventData.pubkey is one of our own. The broker
+   * fans peer_joined to every OTHER connection in the mesh — but our
+   * daemon's member-WS counts as "other" relative to our session-WS,
+   * so without this filter the user sees `[system] Peer "<self>"
+   * joined the mesh` every time their own session reconnects.
+   * Implementation passes a closure that walks the live broker map
+   * rather than a static set, so newly-spawned sessions are visible
+   * immediately. */
+  isOwnPubkey?: (pubkey: string) => boolean;
   log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
 }

@@ -31,10 +62,21 @@ export interface InboundContext {
 export async function handleBrokerPush(msg: Record<string, unknown>, ctx: InboundContext): Promise<void> {
   // System/topology pushes (peer_join, tick, …) — emit verbatim.
   if (msg.subtype === "system" && typeof msg.event === "string") {
+    const eventData = (msg.eventData as Record<string, unknown> | undefined) ?? {};
+    // 1.34.9: drop self-joins. The broker excludes the JOINING
+    // connection from the fan-out, but our daemon owns multiple
+    // connections per mesh (member-WS + N session-WSs), and each is a
+    // distinct "other" from the broker's view — so a session's own
+    // peer_joined arrives at the same daemon's member-WS and used to
+    // surface as `[system] Peer "<self>" joined`. The session-WS path
+    // already skips system events entirely (see session-broker.ts
+    // 1.34.9), and this filter handles the member-WS path.
+    const eventPubkey = typeof eventData.pubkey === "string" ? eventData.pubkey : "";
+    if (eventPubkey && ctx.isOwnPubkey?.(eventPubkey)) return;
     ctx.bus.publish(mapSystemEventKind(msg.event), {
       mesh: ctx.meshSlug,
       event: msg.event,
-      ...(msg.eventData as Record<string, unknown> | undefined ?? {}),
+      ...eventData,
     });
     return;
   }
@@ -71,8 +113,20 @@ export async function handleBrokerPush(msg: Record<string, unknown>, ctx: Inboun
     meta: createdAt ? JSON.stringify({ created_at: createdAt }) : null,
     received_at: Date.now(),
     reply_to_id: replyToId,
+    // 1.34.11: persist the recipient context so /v1/inbox can scope
+    // queries to the asking session. Mirrors the same fields on the
+    // bus event added in 1.34.10. Falls back to NULL when the caller
+    // didn't pass them (legacy paths, tests).
+    recipient_pubkey: ctx.recipientPubkey ?? null,
+    recipient_kind: ctx.recipientKind ?? null,
   });

+  // Whether the row was newly inserted or already existed (dedupe), the
+  // broker still wants to know we received and processed this message —
+  // ack regardless. Skipping ack on dedupe would leak: broker would
+  // re-deliver after lease, and the receiver would re-dedupe forever.
+  ctx.ackClientMessage?.(clientMessageId, brokerMessageId);
+
   if (!inserted) return; // already had this id; no event

   ctx.bus.publish("message", {
@@ -89,6 +143,14 @@ export async function handleBrokerPush(msg: Record<string, unknown>, ctx: Inboun
     ...(subtype ? { subtype } : {}),
     body,
     created_at: createdAt,
+    // 1.34.10: per-recipient routing context. SSE subscribers (the
+    // MCP servers that translate bus events into channel notifications)
+    // use this to filter to events meant for their own session. Without
+    // it, every MCP on the same daemon emits a channel push for every
+    // inbox row, which means session A sees its own outbound to B
+    // because B's session-WS published the inbox row to the shared bus.
+    ...(ctx.recipientPubkey ? { recipient_pubkey: ctx.recipientPubkey } : {}),
+    ...(ctx.recipientKind ? { recipient_kind: ctx.recipientKind } : {}),
   });
 }

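The inbound ordering in `handleBrokerPush` (persist, ack even on a duplicate, publish only on the first insert) is what makes broker redelivery harmless. A minimal in-memory model, assuming a `Map` in place of inbox.db; `InboxStore`, `PushedMessage` and `handlePush` are hypothetical names, and the two callbacks stand in for `ctx.ackClientMessage` and `ctx.bus.publish`.

```typescript
// Hypothetical model of the persist / ack / publish ordering.
interface PushedMessage {
  clientMessageId: string;
  brokerMessageId: string | null;
  body: string;
}

class InboxStore {
  private rows = new Map<string, PushedMessage>();

  /** Returns true when newly inserted, false when the id was already stored. */
  insertIfAbsent(msg: PushedMessage): boolean {
    if (this.rows.has(msg.clientMessageId)) return false;
    this.rows.set(msg.clientMessageId, msg);
    return true;
  }

  get size(): number {
    return this.rows.size;
  }
}

function handlePush(
  msg: PushedMessage,
  store: InboxStore,
  ack: (clientMessageId: string, brokerMessageId: string | null) => void,
  publish: (msg: PushedMessage) => void,
): void {
  const inserted = store.insertIfAbsent(msg);
  // Ack regardless: skipping it on a duplicate would make the broker
  // re-deliver after its lease expires, and we would re-dedupe forever.
  ack(msg.clientMessageId, msg.brokerMessageId);
  if (!inserted) return; // duplicate: no second notification
  publish(msg);
}
```

Acking after the local write (rather than on receipt) is what lets the broker treat `delivered_at` as "durably stored", at the cost of occasional duplicates that the dedupe absorbs.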
73
apps/cli/src/daemon/inbox-pruner.ts
Normal file
@@ -0,0 +1,73 @@
+// 1.34.8: TTL prune for inbox.db.
+//
+// The inbox grows monotonically — every received DM lands as a row and
+// nothing removes it except an explicit `claudemesh inbox flush`. For
+// chatty meshes that's tens of thousands of rows over a few weeks.
+// SQLite handles that volume fine, but the rows are sitting there
+// forever and `claudemesh inbox` queries get slower as the table grows.
+//
+// The pruner runs hourly inside the daemon process and deletes rows
+// whose received_at is older than `retentionMs`. Default is 30 days,
+// which is generous for the "I went on holiday and want to see what I
+// missed" case but won't carry old rows into next year.
+//
+// Best-effort: a failure logs a warning and the pruner keeps trying on
+// the next interval. There's no shared state to corrupt — pruneInboxBefore
+// is a single DELETE statement.
+
+import { pruneInboxBefore } from "./db/inbox.js";
+import type { SqliteDb } from "./db/sqlite.js";
+
+export interface InboxPrunerOptions {
+  db: SqliteDb;
+  /** Retention window in ms. Rows with received_at < (now - retentionMs)
+   * are deleted. Default: 30 days. */
+  retentionMs?: number;
+  /** How often to run the prune. Default: 1 hour. */
+  intervalMs?: number;
+  log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
+}
+
+export interface InboxPrunerHandle {
+  stop: () => void;
+}
+
+const DEFAULT_RETENTION_MS = 30 * 24 * 60 * 60 * 1000;
+const DEFAULT_INTERVAL_MS = 60 * 60 * 1000;
+
+export function startInboxPruner(opts: InboxPrunerOptions): InboxPrunerHandle {
+  const retentionMs = opts.retentionMs ?? DEFAULT_RETENTION_MS;
+  const intervalMs = opts.intervalMs ?? DEFAULT_INTERVAL_MS;
+  const log = opts.log ?? defaultLog;
+
+  const tick = (): void => {
+    try {
+      const cutoff = Date.now() - retentionMs;
+      const removed = pruneInboxBefore(opts.db, cutoff);
+      if (removed > 0) {
+        log("info", "inbox_prune_completed", {
+          removed,
+          retention_days: Math.round(retentionMs / (24 * 60 * 60 * 1000)),
+        });
+      }
+    } catch (e) {
+      log("warn", "inbox_prune_failed", { err: String(e) });
+    }
+  };
+
+  // Run once at startup so a daemon that's been down for weeks reaps
+  // immediately rather than waiting an hour.
+  tick();
+
+  const handle = setInterval(tick, intervalMs);
+  // Don't let the pruner block daemon shutdown.
+  if (typeof handle.unref === "function") handle.unref();
+
+  return { stop: () => clearInterval(handle) };
+}
+
+function defaultLog(level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) {
+  const line = JSON.stringify({ level, msg, ...meta, ts: new Date().toISOString() });
+  if (level === "info") process.stdout.write(line + "\n");
+  else process.stderr.write(line + "\n");
+}
@@ -39,6 +39,12 @@ export interface SendRequest {
   nonce?: string;
   /** Sprint 4: which mesh this send is for (single-mesh daemon today; multi-mesh later). */
   mesh?: string;
+  /** 1.34.0: when the IPC request authenticated as a launched session,
+   * the IPC layer fills this with the session's hex pubkey. The drain
+   * worker uses it to route via the matching SessionBrokerClient so
+   * broker fan-out attributes the push to the session pubkey instead
+   * of the daemon's member pubkey. */
+  sender_session_pubkey?: string;
 }

 export type AcceptOutcome =
@@ -93,6 +99,7 @@ export function acceptSend(req: SendRequest, deps: AcceptDeps): AcceptOutcome {
     nonce: req.nonce,
     ciphertext: req.ciphertext,
     priority: req.priority,
+    sender_session_pubkey: req.sender_session_pubkey,
   });
   return { kind: "accepted_pending", status: 202, client_message_id: clientId };
 }
@@ -5,7 +5,7 @@ import { timingSafeEqual } from "node:crypto";
 import { DAEMON_PATHS, DAEMON_TCP_HOST, DAEMON_TCP_DEFAULT_PORT } from "../paths.js";
 import type { SqliteDb } from "../db/sqlite.js";
 import { acceptSend, type SendRequest } from "./handlers/send.js";
-import { listInbox } from "../db/inbox.js";
+import { listInbox, deleteInboxRow, flushInbox, markInboxSeen } from "../db/inbox.js";
 import { listOutbox, requeueDeadOrPending, type OutboxStatus } from "../db/outbox.js";
 import { randomUUID } from "node:crypto";
 import { bindSseStream, type EventBus } from "../events.js";
@@ -204,7 +204,17 @@ function makeHandler(opts: {
     }

     if (req.method === "GET" && url.pathname === "/v1/health") {
-      respond(res, 200, { ok: true, pid: process.pid });
+      // 1.31.0: include per-mesh broker WS state so callers can verify
+      // functional connectivity, not just that the daemon process is
+      // running. Used by `claudemesh install` post-flight to wait for
+      // at least one broker to be `open` before declaring success —
+      // catches dead WS / DNS / TLS / outbound-blocked-port issues at
+      // install time instead of when the user's first message fails.
+      const brokers: Record<string, string> = {};
+      if (opts.brokers) {
+        for (const [slug, client] of opts.brokers) brokers[slug] = client.status;
+      }
+      respond(res, 200, { ok: true, pid: process.pid, brokers });
       return;
     }

@@ -309,7 +319,21 @@ function makeHandler(opts: {
         respond(res, 503, { error: "event bus not initialised" });
         return;
       }
-      bindSseStream(res, opts.bus);
+      // 1.34.10: per-session SSE demux. When the subscriber presented
+      // a ClaudeMesh-Session token (the MCP server always does post-
+      // 1.34.10), scope the stream to that session's pubkey + the
+      // matching mesh's member pubkey. Diagnostic callers without a
+      // session token (`claudemesh daemon events`) get the unfiltered
+      // legacy stream. The bus itself stays single-shot; demux lives
+      // entirely at the SSE bind layer (events.ts shouldDeliver).
+      const filter: Record<string, string> = {};
+      if (session?.presence?.sessionPubkey) filter.sessionPubkey = session.presence.sessionPubkey;
+      if (session?.mesh) {
+        filter.meshSlug = session.mesh;
+        const meshCfg = opts.meshConfigs?.get(session.mesh);
+        if (meshCfg?.pubkey) filter.memberPubkey = meshCfg.pubkey;
+      }
+      bindSseStream(res, opts.bus, filter);
       return;
     }

@@ -569,12 +593,46 @@ function makeHandler(opts: {
       const fromPubkey = url.searchParams.get("from") ?? undefined;
       const limitRaw = url.searchParams.get("limit");
       const limit = limitRaw ? Number.parseInt(limitRaw, 10) : undefined;
+      // 1.34.0: mesh filter. Falls back to session-default if header set.
+      const meshFilter = meshFromCtx(url.searchParams.get("mesh")) ?? undefined;
+      // 1.34.8: read-state filter. ?unread_only=true narrows to rows
+      // whose seen_at is NULL — used by the welcome push so a freshly
+      // launched session surfaces only what it actually missed.
+      const unreadOnly = url.searchParams.get("unread_only") === "true";
+      // 1.34.8: ?mark_seen=false opts out of the auto-stamp behavior. By
+      // default an interactive listing flips seen_at on the rows it just
+      // returned (the user "saw" them), which is what we want for the
+      // CLI but not for diagnostic tooling that wants to peek without
+      // affecting state. The MCP server uses mark_seen=false on the
+      // welcome path; it stamps explicitly via /v1/inbox/seen instead.
+      const markSeen = url.searchParams.get("mark_seen") !== "false";
+      // 1.34.11: scope by recipient when the caller is an authenticated
+      // session. The daemon receives every inbox row for every session
+      // it hosts, so a query without scoping returns the global table —
+      // session A would see B's DMs (the bug 1.34.10 fixed for the
+      // live event path; this is the storage half). Scope = session
+      // pubkey (DMs) + member pubkey (broadcasts/member DMs the whole
+      // member should see) + NULL (legacy rows we can't attribute).
+      const recipientPubkey = session?.presence?.sessionPubkey;
+      const meshCfgForRecipient = session?.mesh ? opts.meshConfigs?.get(session.mesh) : undefined;
+      const recipientMemberPubkey = meshCfgForRecipient?.pubkey;
       const rows = listInbox(opts.inboxDb, {
         since: Number.isFinite(since) ? since : undefined,
         topic,
         fromPubkey,
+        ...(meshFilter ? { mesh: meshFilter } : {}),
+        unreadOnly,
+        ...(recipientPubkey ? { recipientPubkey } : {}),
+        ...(recipientMemberPubkey ? { recipientMemberPubkey } : {}),
         limit: Number.isFinite(limit ?? NaN) ? limit : undefined,
       });
+      let flippedCount = 0;
+      if (markSeen) {
+        const unreadIds = rows.filter((r) => r.seen_at == null).map((r) => r.id);
+        if (unreadIds.length > 0) {
+          flippedCount = markInboxSeen(opts.inboxDb, unreadIds);
+        }
+      }
       respond(res, 200, {
         items: rows.map((r) => ({
           id: r.id,
@@ -587,11 +645,72 @@ function makeHandler(opts: {
           body: r.body,
           received_at: new Date(r.received_at).toISOString(),
           reply_to_id: r.reply_to_id,
+          // 1.34.8: surface read-state. `null` = never seen (welcome
+          // candidate). Note that if mark_seen=true (default), we just
+          // stamped these rows — but the snapshot reflects the value
+          // BEFORE the stamp so callers can still tell which rows were
+          // unread when they asked.
+          seen_at: r.seen_at ? new Date(r.seen_at).toISOString() : null,
+          // 1.34.11: recipient context. Lets `--json` consumers tell
+          // a session DM apart from a member-keyed broadcast, and
+          // distinguishes pre-1.34.11 legacy rows (NULL) from
+          // properly-scoped ones.
+          recipient_pubkey: r.recipient_pubkey,
+          recipient_kind: r.recipient_kind,
         })),
+        // 1.34.8: how many rows just flipped from unread → seen. Useful
+        // for telemetry and lets the CLI render "marked N as read".
+        marked_seen: flippedCount,
       });
       return;
     }

+    // 1.34.8: explicit mark-seen endpoint. Used by the MCP server after
+    // it surfaces a live `<channel>` reminder for an inbox row — Claude
+    // Code already saw the row inline, so welcome shouldn't re-surface
+    // it on the next launch. Body: { ids: string[] }. Returns the
+    // number of rows that flipped from unread → seen.
+    if (req.method === "POST" && url.pathname === "/v1/inbox/seen") {
+      if (!opts.inboxDb) { respond(res, 503, { error: "inbox not initialised" }); return; }
+      try {
+        const body = await readJsonBody(req, 64 * 1024) as Record<string, unknown> | null;
+        const ids = Array.isArray(body?.ids)
+          ? (body!.ids as unknown[]).filter((x): x is string => typeof x === "string")
+          : [];
+        if (ids.length === 0) { respond(res, 400, { error: "missing 'ids' (string[])" }); return; }
+        const flipped = markInboxSeen(opts.inboxDb, ids);
+        respond(res, 200, { marked_seen: flipped });
+      } catch (e) {
+        respond(res, 400, { error: String(e) });
+      }
+      return;
+    }
+
+    // 1.34.7: inbox flush + per-row delete. The inbox is the daemon's
+    // local persisted SQLite store — there's no broker-side state to
+    // coordinate, so these are simple local writes.
+    if (req.method === "DELETE" && url.pathname === "/v1/inbox") {
+      if (!opts.inboxDb) { respond(res, 503, { error: "inbox not initialised" }); return; }
+      const meshFilter = meshFromCtx(url.searchParams.get("mesh")) ?? undefined;
+      const beforeRaw = url.searchParams.get("before");
+      const before = beforeRaw ? Date.parse(beforeRaw) : undefined;
+      const removed = flushInbox(opts.inboxDb, {
+        ...(meshFilter ? { mesh: meshFilter } : {}),
+        ...(Number.isFinite(before) ? { before } : {}),
+      });
+      respond(res, 200, { removed });
+      return;
+    }
+    if (req.method === "DELETE" && url.pathname.startsWith("/v1/inbox/")) {
+      if (!opts.inboxDb) { respond(res, 503, { error: "inbox not initialised" }); return; }
+      const id = url.pathname.slice("/v1/inbox/".length);
+      if (!id) { respond(res, 400, { error: "missing id" }); return; }
+      const ok = deleteInboxRow(opts.inboxDb, id);
+      if (!ok) { respond(res, 404, { error: "not found", id }); return; }
+      respond(res, 200, { removed: 1, id });
+      return;
+    }
+
     if (req.method === "GET" && url.pathname === "/v1/outbox") {
       if (!opts.outboxDb) { respond(res, 503, { error: "outbox not initialised" }); return; }
       const statusParam = url.searchParams.get("status") ?? undefined;
@@ -691,12 +810,23 @@ function makeHandler(opts: {
         respond(res, 404, { error: "mesh_not_attached", mesh: chosenSlug });
         return;
       }
+      // 1.34.0: authenticated session sends encrypt with the session
+      // secret key + carry the session pubkey through to the outbox
+      // row, so the drain worker can route via SessionBrokerClient
+      // and the broker fan-out attributes the push to the session
+      // pubkey instead of the daemon's member pubkey. Cold-path
+      // sends (no session token) keep the legacy member-key flow.
+      const senderSessionPubkey = session?.presence?.sessionPubkey;
+      const senderSecretKey = session?.presence?.sessionSecretKey ?? meshCfg.secretKey;
       try {
-        const routed = await resolveAndEncrypt(parsed.req, broker, meshCfg.secretKey, chosenSlug);
+        const routed = await resolveAndEncrypt(parsed.req, broker, senderSecretKey, chosenSlug);
         parsed.req.target_spec = routed.target_spec;
         parsed.req.ciphertext = routed.ciphertext;
         parsed.req.nonce = routed.nonce;
         parsed.req.mesh = routed.mesh;
+        if (senderSessionPubkey) {
+          parsed.req.sender_session_pubkey = senderSessionPubkey;
+        }
       } catch (e) {
         respond(res, 502, { error: "route_failed", detail: String(e) });
         return;
@@ -860,11 +990,14 @@ async function resolveAndEncrypt(
     return { target_spec: to, ciphertext, nonce, mesh: meshSlug ?? "" };
   }

-  // 64-char hex pubkey → DM directly.
+  // 64-char hex pubkey → DM directly. Encrypt with the daemon's member
+  // secret: recipient decrypts using THEIR session pubkey's matching
+  // secret on their session-WS, so the sender side just needs any
+  // private key whose public counterpart is known to the recipient as
+  // "the sender". Member key is the stable choice and is what the
+  // recipient already trusts via mesh membership.
   if (/^[0-9a-f]{64}$/i.test(to)) {
-    const sessionKeys = broker.getSessionKeys();
-    const senderSecret = sessionKeys?.sessionSecretKey ?? meshSecretKey;
-    const env = await encryptDirect(req.message, to, senderSecret);
+    const env = await encryptDirect(req.message, to, meshSecretKey);
     return { target_spec: to, ciphertext: env.ciphertext, nonce: env.nonce, mesh: meshSlug ?? "" };
   }

@@ -880,9 +1013,7 @@ async function resolveAndEncrypt(
     if (matches.length === 0) throw new Error(`no peer matching prefix "${to}"`);
     if (matches.length > 1) throw new Error(`prefix "${to}" is ambiguous (${matches.length} matches)`);
     const recipient = matches[0]!.pubkey;
-    const sessionKeys = broker.getSessionKeys();
-    const senderSecret = sessionKeys?.sessionSecretKey ?? meshSecretKey;
-    const env = await encryptDirect(req.message, recipient, senderSecret);
+    const env = await encryptDirect(req.message, recipient, meshSecretKey);
     return { target_spec: recipient, ciphertext: env.ciphertext, nonce: env.nonce, mesh: meshSlug ?? "" };
   }

@@ -890,9 +1021,7 @@ async function resolveAndEncrypt(
   const match = peers.find((p) => p.displayName.toLowerCase() === to.toLowerCase());
   if (!match) throw new Error(`peer "${to}" not found`);
   const recipient = match.pubkey;
-  const sessionKeys = broker.getSessionKeys();
-  const senderSecret = sessionKeys?.sessionSecretKey ?? meshSecretKey;
-  const env = await encryptDirect(req.message, recipient, senderSecret);
+  const env = await encryptDirect(req.message, recipient, meshSecretKey);
   return { target_spec: recipient, ciphertext: env.ciphertext, nonce: env.nonce, mesh: meshSlug ?? "" };
 }

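The storage-side scoping in `GET /v1/inbox` (session pubkey, plus member pubkey, plus legacy `NULL` rows) and the "snapshot before stamping" behaviour of `mark_seen` can be modelled in memory. This is a sketch under assumptions, not the `listInbox` / `markInboxSeen` implementation, which is SQL and not shown in this diff: `InboxRow`, `ListOptions`, `inScope` and `listAndMarkSeen` are hypothetical, and the case-insensitive pubkey comparison mirrors `shouldDeliver` rather than anything confirmed for the SQL query.

```typescript
// Hypothetical in-memory model of /v1/inbox recipient scoping + mark-seen.
interface InboxRow {
  id: string;
  recipient_pubkey: string | null; // null for legacy (pre-1.34.11) rows
  seen_at: number | null;          // epoch ms; null = never seen
}

interface ListOptions {
  recipientPubkey?: string;       // asking session's pubkey
  recipientMemberPubkey?: string; // daemon's member pubkey for that mesh
  unreadOnly?: boolean;           // ?unread_only=true
  markSeen?: boolean;             // ?mark_seen, defaults to true
}

function inScope(row: InboxRow, opts: ListOptions): boolean {
  // No session context (diagnostic callers): the whole table is visible.
  if (!opts.recipientPubkey && !opts.recipientMemberPubkey) return true;
  // Legacy rows cannot be attributed, so every session sees them.
  if (row.recipient_pubkey === null) return true;
  const r = row.recipient_pubkey.toLowerCase();
  return r === opts.recipientPubkey?.toLowerCase() || r === opts.recipientMemberPubkey?.toLowerCase();
}

function listAndMarkSeen(rows: InboxRow[], opts: ListOptions, now: number) {
  const selected = rows.filter(
    (row) => inScope(row, opts) && (!opts.unreadOnly || row.seen_at === null),
  );
  // Copy first so the response reflects seen_at BEFORE the stamp.
  const items = selected.map((row) => ({ ...row }));
  let markedSeen = 0;
  if (opts.markSeen !== false) {
    for (const row of selected) {
      if (row.seen_at === null) {
        row.seen_at = now; // stamp the stored row
        markedSeen++;
      }
    }
  }
  return { items, marked_seen: markedSeen };
}
```

A second call with `unreadOnly: true` then returns nothing, which is the property the welcome push relies on.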
@@ -1,9 +1,30 @@
import { homedir } from "node:os";
import { join } from "node:path";

import { PATHS } from "~/constants/paths.js";
/**
* Daemon paths intentionally do NOT honor `CLAUDEMESH_CONFIG_DIR`.
*
* `claudemesh launch` sets `CLAUDEMESH_CONFIG_DIR` to a per-session
* tmpdir so that joined-mesh state, last-used selections, and the
* IPC session token stay isolated from the host's shared config.
* The daemon, however, is a single per-machine process serving every
* launched session — its socket, pid file, on-disk outbox, and SQLite
* stores all live under `~/.claudemesh/daemon/`. Letting them inherit
* the per-session tmpdir would point each CLI invocation inside a
* launched session at a daemon socket that doesn't exist, force the
* cold path, and surface as "service-managed daemon not responding
* within 8000ms" (1.31.0 regression observed in real install).
*
* `CLAUDEMESH_DAEMON_DIR` exists as an explicit override for tests
* and for the rare case of running multiple daemon instances side by
* side (e.g. integration tests). Production callers should never set
* it.
*/
const DAEMON_DIR_ROOT =
process.env.CLAUDEMESH_DAEMON_DIR || join(homedir(), ".claudemesh", "daemon");

export const DAEMON_PATHS = {
get DAEMON_DIR() { return join(PATHS.CONFIG_DIR, "daemon"); },
get DAEMON_DIR() { return DAEMON_DIR_ROOT; },
get PID_FILE() { return join(this.DAEMON_DIR, "daemon.pid"); },
get SOCK_FILE() { return join(this.DAEMON_DIR, "daemon.sock"); },
get TOKEN_FILE() { return join(this.DAEMON_DIR, "local-token"); },

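The resolution order behind `DAEMON_DIR_ROOT` can be exercised without mutating `process.env` by passing the environment in explicitly. A minimal sketch, assuming a hypothetical `resolveDaemonDir` helper that is not part of this diff: a non-empty `CLAUDEMESH_DAEMON_DIR` wins, otherwise the path falls back to `~/.claudemesh/daemon`, and `CLAUDEMESH_CONFIG_DIR` is never consulted.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical helper (not in the diff): same rule as DAEMON_DIR_ROOT, but
// with the environment and home directory injected so it can be tested.
export function resolveDaemonDir(
  env: Record<string, string | undefined>,
  home: string = homedir(),
): string {
  // `||` rather than `??`, so an empty CLAUDEMESH_DAEMON_DIR also falls back.
  // CLAUDEMESH_CONFIG_DIR is ignored on purpose: launched sessions point it
  // at a per-session tmpdir, while the daemon is shared per machine.
  return env.CLAUDEMESH_DAEMON_DIR || join(home, ".claudemesh", "daemon");
}
```

A launched session with `CLAUDEMESH_CONFIG_DIR` pointed at its tmpdir therefore still resolves the shared daemon directory.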
98
apps/cli/src/daemon/process-info.ts
Normal file
@@ -0,0 +1,98 @@
/**
* Process-info helpers used by the session reaper to detect dead-pid AND
* pid-reuse safely.
*
* `process.kill(pid, 0)` alone is insufficient: a recently-recycled pid
* passes the liveness check even though the process registered under it
* is long gone. To avoid mistakenly trusting a recycled pid, we capture
* a stable per-process start-time at register, and compare it on each
* sweep — if it changed, treat the original process as dead.
*
* macOS + Linux both expose `ps -o lstart=` returning a fixed-format
* timestamp ("Sun May 4 09:14:00 2026"). Equality is the only
* operation the reaper needs, so we keep the value as an opaque string.
*
* IMPORTANT (1.31.1): every fork / execFile blocks the daemon's event
* loop until ps completes (~30-80 ms per call on macOS). The first
* 1.31.0 implementation called execFileSync once per registered
* session every 5 s, and with 10+ sessions that stalled IPC for hundreds
* of milliseconds at a time — long enough that probes against
* /v1/version were declared "stale" and the CLI fell back to the cold
* path with the misleading "service-managed daemon not responding"
* warning. This module now exposes:
*
* - `getProcessStartTime(pid)`: async, single-pid, used at register.
* - `getProcessStartTimes(pids)`: async, batched, used by the reaper.
* One ps invocation handles N pids, so the per-sweep cost is fixed
* and tiny regardless of how many sessions are registered.
*/

import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

/**
* Returns a stable process-start identifier for `pid`, or null if the
* process is dead or unreachable. Async — never blocks the event loop.
*/
export async function getProcessStartTime(pid: number): Promise<string | null> {
if (!Number.isFinite(pid) || pid <= 0) return null;
try {
const { stdout } = await execFileAsync("ps", ["-o", "lstart=", "-p", String(pid)], {
encoding: "utf8",
timeout: 1_000,
});
const out = stdout.trim();
return out.length > 0 ? out : null;
} catch {
return null;
}
}

/**
* Batched form: returns a Map<pid, lstart> for every pid that is still
* alive. Pids that ps doesn't return (i.e. dead) are absent from the
* map. One ps fork handles all pids — O(1) sweep cost regardless of
* session count.
*/
export async function getProcessStartTimes(pids: number[]): Promise<Map<number, string>> {
const result = new Map<number, string>();
const valid = pids.filter((p) => Number.isFinite(p) && p > 0);
if (valid.length === 0) return result;
// ps -o pid,lstart= -p p1,p2,... emits one row per live pid:
// " 12345 Sun May 4 09:14:00 2026"
// Dead pids are silently omitted.
try {
const { stdout } = await execFileAsync(
"ps",
["-o", "pid=,lstart=", "-p", valid.join(",")],
{ encoding: "utf8", timeout: 2_000 },
);
for (const raw of stdout.split("\n")) {
const line = raw.trim();
if (!line) continue;
const m = /^(\d+)\s+(.+)$/.exec(line);
if (!m) continue;
const pid = Number.parseInt(m[1]!, 10);
const lstart = m[2]!.trim();
if (Number.isFinite(pid) && lstart.length > 0) result.set(pid, lstart);
}
} catch {
// ps failure (timeout, ENOENT) — treat as "no info available" and
// let the reaper fall back to bare liveness for these pids. Better
// to keep entries than to nuke them on a transient ps error.
}
return result;
}

/** Liveness-only probe (signal 0). Use together with start-time guard. */
export function isPidAlive(pid: number): boolean {
if (!Number.isFinite(pid) || pid <= 0) return false;
try {
process.kill(pid, 0);
return true;
} catch {
return false;
}
}
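The header comment describes how a sweep combines the batched start-time map with the signal-0 probe, but the reaper itself is not part of this diff. A minimal sketch of that decision, assuming a hypothetical `RegisteredSession` record that stores the `lstart` string captured at register; `parsePsRows` mirrors the parsing loop in `getProcessStartTimes`, and the injected `isAlive` stands in for `isPidAlive`.

```typescript
// Hypothetical record the reaper might keep per registered session.
interface RegisteredSession {
  pid: number;
  // lstart captured by getProcessStartTime at register; null if ps failed then.
  startTime: string | null;
}

// Same parsing as the loop in getProcessStartTimes: "<pid> <lstart>" rows.
export function parsePsRows(stdout: string): Map<number, string> {
  const result = new Map<number, string>();
  for (const raw of stdout.split("\n")) {
    const m = /^(\d+)\s+(.+)$/.exec(raw.trim());
    if (!m) continue;
    const pid = Number.parseInt(m[1]!, 10);
    const lstart = m[2]!.trim();
    if (Number.isFinite(pid) && lstart.length > 0) result.set(pid, lstart);
  }
  return result;
}

// Dead if the pid is gone, or if the pid now belongs to a process with a
// different start time (pid reuse).
export function isSessionDead(
  session: RegisteredSession,
  live: Map<number, string>,
  isAlive: (pid: number) => boolean,
): boolean {
  const current = live.get(session.pid);
  if (current !== undefined) {
    // No baseline captured at register: nothing to compare, trust the pid.
    if (session.startTime === null) return false;
    return current !== session.startTime;
  }
  // Missing from the batch (dead, or ps itself failed): fall back to signal 0.
  return !isAlive(session.pid);
}
```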
@@ -11,19 +11,17 @@ import { migrateInbox } from "./db/inbox.js";
import { DaemonBrokerClient } from "./broker.js";
import { SessionBrokerClient } from "./session-broker.js";
import { startDrainWorker, type DrainHandle } from "./drain.js";
import { startInboxPruner, type InboxPrunerHandle } from "./inbox-pruner.js";
import { handleBrokerPush } from "./inbound.js";
import { EventBus } from "./events.js";
import { checkFingerprint, type ClonePolicy } from "./identity.js";
import { readConfig } from "~/services/config/facade.js";
import { VERSION } from "~/constants/urls.js";

export interface RunDaemonOptions {
/** Disable TCP loopback (UDS-only). Defaults true in container envs. */
tcpEnabled?: boolean;
publicHealthCheck?: boolean;
/** Mesh slug to attach to. Required when the user has joined multiple meshes. */
mesh?: string;
/** Daemon's display name on the mesh. */
displayName?: string;
/** Behavior on host_fingerprint mismatch. Defaults 'refuse'. */
clonePolicy?: ClonePolicy;
}
@@ -95,30 +93,27 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {

const bus = new EventBus();

// 1.26.0 — multi-mesh by default. With --mesh <slug>, the daemon
// scopes to one mesh (legacy mode). Without it, attaches to every
// joined mesh simultaneously so ambient mode (raw `claude`) works
// for all meshes with one daemon process.
// 1.34.10: the daemon is universal — attaches to every mesh listed
// in config.json. Single-mesh isolation is handled by simply joining
// only one mesh in that environment (containers, etc.). No --mesh
// flag, no per-mesh service unit; one daemon, every mesh.
const cfg = readConfig();
let meshes: Array<typeof cfg.meshes[number]>;
if (opts.mesh) {
const found = cfg.meshes.find((m) => m.slug === opts.mesh);
if (!found) {
process.stderr.write(`mesh not found: ${opts.mesh}\n`);
process.stderr.write(`joined meshes: ${cfg.meshes.map((m) => m.slug).join(", ") || "(none)"}\n`);
releaseSingletonLock();
try { outboxDb.close(); } catch { /* ignore */ }
return 2;
}
meshes = [found];
} else if (cfg.meshes.length === 0) {
if (cfg.meshes.length === 0) {
process.stderr.write(`no mesh joined; run \`claudemesh join <invite-url>\` first\n`);
releaseSingletonLock();
try { outboxDb.close(); } catch { /* ignore */ }
return 2;
} else {
meshes = cfg.meshes;
}
const meshes = cfg.meshes;

// 1.34.9 — declared upfront so the daemon-WS onPush closure can
// reach into the per-session map for the isOwnPubkey filter (drops
// peer_joined / peer_left events for our own session pubkeys before
// they surface as `[system] Peer "<self>" joined`). Populated below
// by setRegistryHooks; empty until the first session registers, but
// that's fine — the closure walks it lazily.
const sessionBrokers = new Map<string, SessionBrokerClient>();
const sessionBrokersByPubkey = new Map<string, SessionBrokerClient>();

// Spin up one broker per mesh. Connection failures are non-fatal:
// the outbox keeps queuing per-mesh and reconnect logic in
@@ -127,8 +122,11 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {
const meshConfigs = new Map<string, typeof cfg.meshes[number]>();
for (const mesh of meshes) {
meshConfigs.set(mesh.slug, mesh);
const broker = new DaemonBrokerClient(mesh, {
displayName: opts.displayName,
// 1.34.10: no global displayName override anymore. Each mesh's
// hello uses its own per-mesh display name from config.json (set
// at `claudemesh join` time). Sessions advertise their own name
// via `claudemesh launch --name`.
const broker: DaemonBrokerClient = new DaemonBrokerClient(mesh, {
onStatusChange: (s) => {
process.stdout.write(JSON.stringify({
msg: "broker_status", status: s, mesh: mesh.slug, ts: new Date().toISOString(),
@@ -136,13 +134,47 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {
bus.publish("broker_status", { mesh: mesh.slug, status: s });
},
onPush: (m) => {
const sessionKeys = broker.getSessionKeys();
// Daemon-WS is member-keyed, not session-keyed. Session-targeted
// DMs land on the per-session WS (SessionBrokerClient) since
// 1.32.1 and decrypt with the session secret there. Anything that
// arrives here can only be member-keyed (broadcasts, member DMs,
// system events) — pass member secret only.
// 1.34.9: drop self-echoes — broker fan-out paths mirror an
// outbound back to the SAME daemon's member-WS even when the
// send originated on a session-WS (because both connections
// belong to the same member from the broker's view). Filter on
// senderMemberPubkey alone: anything attributed to OUR member is
// either our own send echoing back or, theoretically, a peer
// send from a different connection that happens to share our
// pubkey — but two-different-clients-same-pubkey is impossible
// by construction (member pubkeys are stable + unique per
// identity). Sibling-session DMs don't fan to our member-WS;
// they fan session-to-session. So this is safe.
const senderMemberPk = String((m as Record<string, unknown>).senderMemberPubkey ?? "").toLowerCase();
const ownMember = mesh.pubkey.toLowerCase();
if (senderMemberPk && senderMemberPk === ownMember) {
return;
}
void handleBrokerPush(m, {
db: inboxDb,
bus,
meshSlug: mesh.slug,
recipientSecretKeyHex: mesh.secretKey,
sessionSecretKeyHex: sessionKeys?.sessionSecretKey,
// v2 agentic-comms (M1): client_ack closes the at-least-once
// loop. Broker holds the row claimed (not delivered) until ack.
ackClientMessage: (cmid, bmid) => broker.sendClientAck(cmid, bmid),
// 1.34.9: drop self-join system events. Member pubkey + every
// live session pubkey on this daemon all count as "us".
isOwnPubkey: (pubkey) => {
const lower = pubkey.toLowerCase();
if (lower === ownMember) return true;
return sessionBrokersByPubkey.has(lower);
},
// 1.34.10: tag the bus event with our member pubkey so the
// SSE demux only fans this row to MCPs whose subscriber
// matches (member-keyed broadcasts / DMs).
recipientPubkey: mesh.pubkey,
recipientKind: "member",
});
},
});
@@ -150,16 +182,33 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {
brokers.set(mesh.slug, broker);
}

// Start the drain worker. With multi-mesh, drain dispatches each
// outbox row to its mesh's broker via the `mesh` column.
let drain: DrainHandle | null = null;
drain = startDrainWorker({ db: outboxDb, brokers });

// 1.30.0 — per-session broker presence. Always on. Older CLIs that
// don't include `presence` material in the register body just won't
// get a session WS; the daemon's own member-keyed broker still
// covers them.
const sessionBrokers = new Map<string, SessionBrokerClient>();
//
// The two index maps (sessionBrokers by token, sessionBrokersByPubkey
// by session pubkey) are declared earlier in this function so the
// daemon-WS onPush closure can reference them for the isOwnPubkey
// self-join filter.

// Start the drain worker. With multi-mesh, drain dispatches each
// outbox row to its mesh's broker via the `mesh` column.
// 1.34.0: drain also accepts a session-pubkey lookup so rows
// written by authenticated sessions route via the matching session-WS
// (broker fan-out then attributes the push to the session pubkey).
let drain: DrainHandle | null = null;
drain = startDrainWorker({
db: outboxDb,
brokers,
getSessionBrokerByPubkey: (pubkey) => sessionBrokersByPubkey.get(pubkey),
});

// 1.34.8 — TTL prune for inbox.db. Runs hourly with a 30-day default
// retention. Without this the inbox grows unbounded; even on a moderate
// mesh that's tens of thousands of rows over a few weeks. Prune is a
// single DELETE; failures are non-fatal and the next interval retries.
const inboxPruner: InboxPrunerHandle = startInboxPruner({ db: inboxDb });
setRegistryHooks({
onRegister: (info) => {
if (!info.presence) return;
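The 1.34.0 drain comment implies a per-row routing rule: prefer the sending session's WS so the broker attributes the push to the session pubkey. A minimal sketch of that choice, assuming hypothetical `OutboxRowView` and `RowSender` shapes and a hypothetical `pickSender` function; the fallback to the mesh's member-keyed broker when the session WS is missing or not `open` is an assumption (it matches the pre-1.34.0 path), not code shown in this diff.

```typescript
// Hypothetical minimal views; the real drain worker's types are not shown.
type SocketStatus = "connecting" | "open" | "closed" | "reconnecting";

interface RowSender {
  readonly status: SocketStatus;
}

interface OutboxRowView {
  mesh: string;
  senderSessionPubkey: string | null; // outbox.sender_session_pubkey
}

// Prefer the authenticated session's WS when it is open; otherwise
// (assumption) fall back to the mesh's member-keyed broker.
export function pickSender<T extends RowSender>(
  row: OutboxRowView,
  brokers: Map<string, T>,
  getSessionBrokerByPubkey: (pubkey: string) => T | undefined,
): T | undefined {
  if (row.senderSessionPubkey) {
    const session = getSessionBrokerByPubkey(row.senderSessionPubkey);
    if (session?.status === "open") return session;
  }
  return brokers.get(row.mesh);
}
```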
@@ -175,9 +224,24 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {
const prior = sessionBrokers.get(info.token);
if (prior) {
sessionBrokers.delete(info.token);
// 1.34.0: keep both indices in sync.
if (sessionBrokersByPubkey.get(prior.sessionPubkey) === prior) {
sessionBrokersByPubkey.delete(prior.sessionPubkey);
}
prior.close().catch(() => { /* ignore */ });
}
const client = new SessionBrokerClient({
// 1.32.1 — wire push delivery. Messages targeted at the launched
// session's pubkey land on THIS WS, not on the member-keyed one,
// so without this forward they'd silently disappear (the bug that
// kept inbox.db at zero rows since 1.30.0). Decrypt prefers the
// session secret key; member key remains the fallback for legacy
// member-targeted traffic that happens to fan out here.
const sessionSecretKeyHex = info.presence.sessionSecretKey;
// Capture the pubkey for the onPush closure below — TS can't
// narrow `info.presence` inside the async arrow even though we
// guard `if (!info.presence) return` earlier.
const sessionPubkeyHex = info.presence.sessionPubkey;
const client: SessionBrokerClient = new SessionBrokerClient({
mesh: meshConfig,
sessionPubkey: info.presence.sessionPubkey,
sessionSecretKey: info.presence.sessionSecretKey,
@@ -187,8 +251,27 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {
...(info.role ? { role: info.role } : {}),
...(info.cwd ? { cwd: info.cwd } : {}),
pid: info.pid,
onPush: (m) => {
void handleBrokerPush(m, {
db: inboxDb,
bus,
meshSlug: meshConfig.slug,
recipientSecretKeyHex: meshConfig.secretKey,
sessionSecretKeyHex,
// v2 agentic-comms (M1): close the at-least-once loop.
ackClientMessage: (cmid, bmid) => client.sendClientAck(cmid, bmid),
// 1.34.10: tag the bus event with this session's pubkey so
// the SSE demux only delivers to the MCP serving THIS
// session — not its siblings on the same daemon. Without
// this, A's MCP also rendered DMs intended for B because
// the bus was a single shared stream.
recipientPubkey: sessionPubkeyHex,
recipientKind: "session",
});
},
});
sessionBrokers.set(info.token, client);
sessionBrokersByPubkey.set(info.presence.sessionPubkey, client);
client.connect().catch((err) =>
process.stderr.write(JSON.stringify({
level: "warn", msg: "session_broker_connect_failed",
@@ -200,6 +283,11 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {
const client = sessionBrokers.get(info.token);
if (!client) return;
sessionBrokers.delete(info.token);
// 1.34.0: drop the pubkey index iff this client still owns it
// (a re-register may have already swapped the entry).
if (sessionBrokersByPubkey.get(client.sessionPubkey) === client) {
sessionBrokersByPubkey.delete(client.sessionPubkey);
}
client.close().catch(() => { /* ignore */ });
},
});
@@ -228,6 +316,10 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {

process.stdout.write(JSON.stringify({
msg: "daemon_started",
// 1.34.10: stamp the version so users can tell whether the
// running daemon picked up a recent CLI ship. Read off the same
// VERSION constant the IPC `/v1/version` endpoint serves.
version: VERSION,
pid: process.pid,
sock: DAEMON_PATHS.SOCK_FILE,
tcp: tcpEnabled ? `127.0.0.1:47823` : null,
@@ -240,6 +332,7 @@ export async function runDaemon(opts: RunDaemonOptions = {}): Promise<number> {
if (shuttingDown) return;
shuttingDown = true;
process.stdout.write(JSON.stringify({ msg: "daemon_shutdown", signal: sig, ts: new Date().toISOString() }) + "\n");
inboxPruner.stop();
if (drain) await drain.close();
for (const b of brokers.values()) {
try { await b.close(); } catch { /* ignore */ }

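The member-WS `onPush` guard and the 1.34.10 `recipientPubkey` / `recipientKind` tags can be read as two small predicates. A minimal sketch, assuming hypothetical `TaggedEvent` and `Subscriber` shapes (the real bus and SSE types are not in this diff). `isSelfEcho` mirrors the lowercase comparison in the daemon-WS `onPush`; the rule in `shouldDeliver` (session-tagged rows reach only the matching session, member-tagged rows reach any subscriber for that member) is an assumption inferred from the comments above.

```typescript
// Hypothetical shapes; the real bus event and SSE subscriber types are not shown.
interface TaggedEvent {
  recipientPubkey: string;
  recipientKind: "member" | "session";
}

interface Subscriber {
  memberPubkey: string;
  sessionPubkey?: string; // set when the MCP serves a launched session
}

// Mirrors the member-WS onPush guard: drop pushes attributed to our own member.
export function isSelfEcho(push: Record<string, unknown>, ownMemberPubkey: string): boolean {
  const sender = String(push["senderMemberPubkey"] ?? "").toLowerCase();
  return sender !== "" && sender === ownMemberPubkey.toLowerCase();
}

// Assumed demux rule for the SSE fan-out.
export function shouldDeliver(ev: TaggedEvent, sub: Subscriber): boolean {
  const target = ev.recipientPubkey.toLowerCase();
  if (ev.recipientKind === "session") {
    return sub.sessionPubkey?.toLowerCase() === target;
  }
  return sub.memberPubkey.toLowerCase() === target;
}
```

Under this rule, a DM tagged for session A is never rendered by sibling session B's MCP, which is the 1.34.10 bug the session-side comment describes.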
@@ -38,8 +38,13 @@ function isCi(): boolean {
export interface InstallArgs {
/** Path to the `claudemesh` binary, e.g. /opt/homebrew/bin/claudemesh */
binaryPath: string;
/** Mesh slug to attach to. */
meshSlug: string;
/**
* Optional mesh slug to lock the daemon to. Omit (the new default) so
* the daemon attaches to every joined mesh — matches the 1.26.0
* multi-mesh design. Single-mesh lock is preserved for users who
* explicitly want it (testing, CI, host with one mesh).
*/
meshSlug?: string;
/** Optional display name. */
displayName?: string;
/** Override the auto-detected CI refusal. */
@@ -87,11 +92,25 @@ function installDarwin(args: InstallArgs): InstallResult {
const plist = darwinPlistPath();
mkdirSync(dirname(plist), { recursive: true });
const log = DAEMON_PATHS.LOG_FILE;
// Resolve `node` explicitly. The bin script in node_modules/.bin starts
// with `#!/usr/bin/env node`; under launchd's restricted PATH that would
// resolve `node` to a system Node (often the wrong major) instead of the
// one that installed claudemesh-cli. Pinning process.execPath here means
// the daemon always runs under the same Node that ran `claudemesh install`.
const nodeBin = process.execPath;
// 1.34.12: --foreground because launchd manages lifecycle + stdio.
// Without it, the daemon would re-spawn itself detached (the new
// default) and launchd would lose track of the actual long-lived
// process — KeepAlive wouldn't work and stdout redirect would
// capture only the parent's brief boot.
const meshArgs = [
`<string>${escapeXml(args.binaryPath)}</string>`,
"<string>daemon</string>",
"<string>up</string>",
"<string>--mesh</string>",
`<string>${escapeXml(args.meshSlug)}</string>`,
"<string>--foreground</string>",
...(args.meshSlug
? ["<string>--mesh</string>", `<string>${escapeXml(args.meshSlug)}</string>`]
: []),
...(args.displayName ? ["<string>--name</string>", `<string>${escapeXml(args.displayName)}</string>`] : []),
].join("\n ");

@@ -103,7 +122,7 @@ function installDarwin(args: InstallArgs): InstallResult {
<string>${SERVICE_LABEL}</string>
<key>ProgramArguments</key>
<array>
<string>${escapeXml(args.binaryPath)}</string>
<string>${escapeXml(nodeBin)}</string>
${meshArgs}
</array>
<key>RunAtLoad</key>
@@ -128,6 +147,26 @@ function installDarwin(args: InstallArgs): InstallResult {
`;
writeFileSync(plist, xml, { mode: 0o644 });

// Stop any prior incarnation BEFORE bootstrapping so an upgrade run
// doesn't hit "service already loaded" → bootstrap exit-5 IO_ERROR.
// Both calls are best-effort: launchctl prints to stderr if the unit
// isn't loaded, and we don't want to fail install for that.
try {
execSync(`launchctl bootout gui/$(id -u)/${SERVICE_LABEL}`, { stdio: "ignore" });
} catch { /* unit not loaded — fine */ }
// Also kill any orphaned daemon process (started manually or by an
// older script) so the new launchd-managed one can claim the singleton
// lock on first start.
try {
const pidPath = DAEMON_PATHS.PID_FILE;
if (existsSync(pidPath)) {
const pid = parseInt(readFileSync(pidPath, "utf8").trim(), 10);
if (Number.isFinite(pid) && pid > 0) {
try { process.kill(pid, "SIGTERM"); } catch { /* already dead */ }
}
}
} catch { /* pid file missing — fine */ }

return {
platform: "darwin",
unitPath: plist,
@@ -144,9 +183,15 @@ function linuxUnitPath(): string {
function installLinux(args: InstallArgs): InstallResult {
const unit = linuxUnitPath();
mkdirSync(dirname(unit), { recursive: true });
// Same node-pinning rationale as macOS — systemd's User= environment is
// similarly minimal; resolve node by absolute path.
const nodeBin = process.execPath;
// 1.34.12: --foreground because systemd-user owns process lifecycle
// and stdio capture; we don't want the child to double-fork into a
// detached grandchild systemd can't track.
const execArgs = [
"daemon", "up",
"--mesh", args.meshSlug,
"daemon", "up", "--foreground",
...(args.meshSlug ? ["--mesh", args.meshSlug] : []),
...(args.displayName ? ["--name", args.displayName] : []),
].map(shellQuote).join(" ");

@@ -157,7 +202,7 @@ Wants=network-online.target

[Service]
Type=simple
ExecStart=${shellQuote(args.binaryPath)} ${execArgs}
ExecStart=${shellQuote(nodeBin)} ${shellQuote(args.binaryPath)} ${execArgs}
Restart=always
RestartSec=3
StandardOutput=append:${DAEMON_PATHS.LOG_FILE}
@@ -169,6 +214,22 @@ WantedBy=default.target
`;
writeFileSync(unit, content, { mode: 0o644 });

// Mirror the darwin path: stop the previous unit (if any) so an
// upgrade run replaces it cleanly, plus kill any orphaned manual
// daemon process holding the singleton lock.
try {
execSync(`systemctl --user stop ${SYSTEMD_UNIT}`, { stdio: "ignore" });
} catch { /* not loaded — fine */ }
try {
const pidPath = DAEMON_PATHS.PID_FILE;
if (existsSync(pidPath)) {
const pid = parseInt(readFileSync(pidPath, "utf8").trim(), 10);
if (Number.isFinite(pid) && pid > 0) {
try { process.kill(pid, "SIGTERM"); } catch { /* already dead */ }
}
}
} catch { /* pid file missing — fine */ }

return {
platform: "linux",
unitPath: unit,

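After this change both service definitions run the same argument vector: the pinned `node` binary, the CLI entry script, then `daemon up --foreground` plus the optional flags. A minimal sketch of that vector as a pure function, assuming a hypothetical `buildDaemonArgv` helper and `DaemonArgvOptions` type; the launchd and systemd code in the diff inline this logic instead.

```typescript
// Hypothetical subset of InstallArgs relevant to the command line.
interface DaemonArgvOptions {
  meshSlug?: string;
  displayName?: string;
}

// argv for the service unit: pinned node, CLI script, then the subcommand.
// --foreground keeps the daemon in the process the supervisor started.
export function buildDaemonArgv(
  nodeBin: string,
  binaryPath: string,
  opts: DaemonArgvOptions = {},
): string[] {
  return [
    nodeBin,
    binaryPath,
    "daemon", "up", "--foreground",
    // No --mesh by default: the daemon attaches to every joined mesh.
    ...(opts.meshSlug ? ["--mesh", opts.meshSlug] : []),
    ...(opts.displayName ? ["--name", opts.displayName] : []),
  ];
}
```

Each platform then encodes the same vector its own way: every element wrapped in `<string>${escapeXml(...)}</string>` for the plist, or passed through `shellQuote` for `ExecStart`.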
@@ -11,12 +11,22 @@
* Differences from `DaemonBrokerClient`:
* - Uses session_hello (1.30.0+ broker), with a parent-vouched
* attestation provided at construction time.
* - Does NOT drain the outbox — that stays the parent member-keyed
* DaemonBrokerClient's job. Keeps the responsibility split clean
* and avoids two clients fighting over the same outbox row.
* - Does NOT carry list_peers / state / memory RPCs. This client is
* presence-only (and inbound DM delivery for messages targeted at
* the session pubkey).
* presence + inbound DM delivery + (1.34.0) outbound send for
* messages that originate from this session. Routing those through
* here is what makes the broker fan-out attribute the push to the
* session pubkey instead of the daemon's stable member pubkey.
*
* Outbox routing (1.34.0): the drain worker now consults
* `outbox.sender_session_pubkey`. If a row was written by an
* authenticated session and the matching session-WS is `open`, the
* drain dispatches via `SessionBrokerClient.send()` — this
* connection's `conn.sessionPubkey` server-side is the session pubkey,
* so the broker's existing fan-out attribution
* (`senderPubkey: conn.sessionPubkey ?? conn.memberPubkey`) just works.
* Pre-1.34.0 every drain went through DaemonBrokerClient (member-WS),
* so every push showed up as "from <daemon-member-pubkey>" regardless
* of which session typed `claudemesh send`.
*
* Old brokers reply with `unknown_message_type` on session_hello — we
* surface that as a one-shot `error` event and the daemon decides
@@ -24,15 +34,37 @@
* expected to be deployed first.
*
* Spec: .artifacts/specs/2026-05-04-per-session-presence.md.
*
* 2026-05-04: lifecycle (connect / hello-ack / close-reconnect) lives
* in `ws-lifecycle.ts`. This class supplies session_hello content and
* routes the inbound onPush; the helper handles the rest.
*/

import { hostname as osHostname } from "node:os";
import WebSocket from "ws";

import type { JoinedMesh } from "~/services/config/facade.js";
import { signSessionHello } from "~/services/broker/session-hello-sig.js";
import { connectWsWithBackoff, type WsLifecycle, type WsStatus } from "./ws-lifecycle.js";
import type { BrokerSendArgs, BrokerSendResult } from "./broker.js";

export type SessionBrokerStatus = "connecting" | "open" | "closed" | "reconnecting";
export type SessionBrokerStatus = WsStatus;

/** Ack-tracking shape, mirrors DaemonBrokerClient.PendingAck. Kept
* internal — callers see only the resolved BrokerSendResult. */
interface PendingAck {
resolve: (r: BrokerSendResult) => void;
timer: NodeJS.Timeout;
}

const SEND_ACK_TIMEOUT_MS = 15_000;

/** Heuristic: which broker-reported send errors are permanent enough
* that the drain worker should give up rather than retry. Mirrors the
* daemon-WS classifier so behavior is identical regardless of which
* socket the row went out on. */
function classifyPermanent(error: string): boolean {
return /unknown|invalid|forbidden|not_authorized|target_not_found/i.test(error);
}

export interface ParentAttestation {
sessionPubkey: string;
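The `PendingAck` map, the 15 s `SEND_ACK_TIMEOUT_MS`, and `classifyPermanent` together form a request/ack tracker. A minimal sketch of that pattern, assuming a hypothetical `SendResult` shape in place of `BrokerSendResult` (whose fields live in `broker.ts` and are not shown here) and a hypothetical `AckTracker` class; treating a timeout as retryable rather than permanent is also an assumption.

```typescript
// Hypothetical stand-in for BrokerSendResult.
type SendResult =
  | { ok: true; brokerMessageId: string | null }
  | { ok: false; error: string; permanent: boolean };

// Same heuristic as classifyPermanent in the diff.
function classifyPermanent(error: string): boolean {
  return /unknown|invalid|forbidden|not_authorized|target_not_found/i.test(error);
}

interface Pending {
  resolve: (r: SendResult) => void;
  timer: ReturnType<typeof setTimeout>;
}

export class AckTracker {
  private pending = new Map<string, Pending>();

  constructor(private readonly timeoutMs = 15_000) {}

  // Register before the frame is sent; settles on ack, error, or timeout.
  wait(clientMessageId: string): Promise<SendResult> {
    return new Promise((resolve) => {
      const timer = setTimeout(() => {
        this.pending.delete(clientMessageId);
        // Assumed transient: the drain worker can retry the row later.
        resolve({ ok: false, error: "ack_timeout", permanent: false });
      }, this.timeoutMs);
      this.pending.set(clientMessageId, { resolve, timer });
    });
  }

  // Called from the WS message handler. Returns false for late or
  // duplicate acks, which are ignored.
  settle(clientMessageId: string, error?: string, brokerMessageId?: string): boolean {
    const entry = this.pending.get(clientMessageId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(clientMessageId);
    entry.resolve(
      error === undefined
        ? { ok: true, brokerMessageId: brokerMessageId ?? null }
        : { ok: false, error, permanent: classifyPermanent(error) },
    );
    return true;
  }
}
```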
@@ -62,19 +94,32 @@ export interface SessionBrokerOptions {
/** Pid of the launched session (NOT the daemon). */
pid: number;
onStatusChange?: (s: SessionBrokerStatus) => void;
/**
* Inbound push/inbound dispatch. The broker fans messages targeted at
* a session pubkey out over the corresponding session WS — without
* this callback they hit the floor and the daemon's inbox.db never
* sees them. Wired in run.ts to a handleBrokerPush call that decrypts
* with this session's secret key (member key as fallback).
*/
onPush?: (msg: Record<string, unknown>) => void;
log?: (level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) => void;
}

const HELLO_ACK_TIMEOUT_MS = 5_000;
const BACKOFF_CAPS_MS = [1_000, 2_000, 4_000, 8_000, 16_000, 30_000];

export class SessionBrokerClient {
private ws: WebSocket | null = null;
private lifecycle: WsLifecycle | null = null;
private _status: SessionBrokerStatus = "closed";
private closed = false;
private reconnectAttempt = 0;
private reconnectTimer: NodeJS.Timeout | null = null;
private helloTimer: NodeJS.Timeout | null = null;
/** Set when the broker rejects session_hello with `unknown_message_type` —
* older brokers without the 1.30.0 surface. We stop retrying. */
private brokerUnsupported = false;
/** 1.34.0: outbound send tracking. Keyed by client_message_id. The
* drain worker registers an entry on dispatch; the WS message
* handler resolves it on broker `ack`. Times out after 15s. */
private pendingAcks = new Map<string, PendingAck>();
/** 1.34.0: dispatchers queued while the WS is reconnecting — flushed
* in onStatusChange when status flips to `open`. Mirrors the
* daemon-WS `opens` array. */
private opens: Array<() => void> = [];

constructor(private opts: SessionBrokerOptions) {}

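`BACKOFF_CAPS_MS` and the `opens` array describe two small behaviors: a capped reconnect delay indexed by attempt number, and dispatchers parked until the socket reports `open`. A minimal sketch of both, assuming hypothetical `backoffDelay` and `OpenQueue` names; per the header comment, the lifecycle logic now lives behind `connectWsWithBackoff` in `ws-lifecycle.ts`, whose code is not shown here.

```typescript
const BACKOFF_CAPS_MS = [1_000, 2_000, 4_000, 8_000, 16_000, 30_000];

// Same indexing as the close handler: attempts past the end reuse the cap.
export function backoffDelay(attempt: number): number {
  return BACKOFF_CAPS_MS[Math.min(attempt, BACKOFF_CAPS_MS.length - 1)] ?? 30_000;
}

type Status = "connecting" | "open" | "closed" | "reconnecting";

// Hypothetical queue: run a dispatcher now if the socket is open,
// otherwise park it and flush everything on the next transition to open.
export class OpenQueue {
  private opens: Array<() => void> = [];
  private status: Status = "closed";

  whenOpen(fn: () => void): void {
    if (this.status === "open") fn();
    else this.opens.push(fn);
  }

  setStatus(s: Status): void {
    this.status = s;
    if (s !== "open") return;
    // Take the parked dispatchers and clear the queue before running them.
    const queued = this.opens;
    this.opens = [];
    for (const fn of queued) fn();
  }
}
```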
@@ -90,112 +135,219 @@ export class SessionBrokerClient {
});
};

private setStatus(s: SessionBrokerStatus) {
if (this._status === s) return;
this._status = s;
this.opts.onStatusChange?.(s);
}

/** Open the WS, run session_hello, resolve once the broker accepts. */
async connect(): Promise<void> {
if (this.closed) throw new Error("client_closed");
if (this._status === "connecting" || this._status === "open") return;
this.setStatus("connecting");

const ws = new WebSocket(this.opts.mesh.brokerUrl);
this.ws = ws;

return new Promise<void>((resolve, reject) => {
ws.on("open", async () => {
try {
const { timestamp, signature } = await signSessionHello({
meshId: this.opts.mesh.meshId,
parentMemberPubkey: this.opts.mesh.pubkey,
sessionPubkey: this.opts.sessionPubkey,
sessionSecretKey: this.opts.sessionSecretKey,
});
ws.send(JSON.stringify({
type: "session_hello",
meshId: this.opts.mesh.meshId,
parentMemberId: this.opts.mesh.memberId,
parentMemberPubkey: this.opts.mesh.pubkey,
sessionPubkey: this.opts.sessionPubkey,
parentAttestation: this.opts.parentAttestation,
displayName: this.opts.displayName,
sessionId: this.opts.sessionId,
pid: this.opts.pid,
cwd: this.opts.cwd ?? process.cwd(),
hostname: osHostname(),
peerType: "ai" as const,
channel: "claudemesh-session",
...(this.opts.groups && this.opts.groups.length > 0 ? { groups: this.opts.groups } : {}),
...(this.opts.role ? { role: this.opts.role } : {}),
timestamp,
signature,
}));
this.helloTimer = setTimeout(() => {
this.log("warn", "session_hello_ack_timeout");
try { ws.close(); } catch { /* ignore */ }
reject(new Error("session_hello_ack_timeout"));
}, HELLO_ACK_TIMEOUT_MS);
} catch (e) {
reject(e instanceof Error ? e : new Error(String(e)));
}
});

ws.on("message", (raw) => {
let msg: Record<string, unknown>;
try { msg = JSON.parse(raw.toString()) as Record<string, unknown>; }
catch { return; }

if (msg.type === "hello_ack") {
if (this.helloTimer) clearTimeout(this.helloTimer);
this.helloTimer = null;
this.setStatus("open");
this.reconnectAttempt = 0;
resolve();
return;
}

this.lifecycle = await connectWsWithBackoff({
url: this.opts.mesh.brokerUrl,
buildHello: async () => {
const { timestamp, signature } = await signSessionHello({
meshId: this.opts.mesh.meshId,
parentMemberPubkey: this.opts.mesh.pubkey,
sessionPubkey: this.opts.sessionPubkey,
sessionSecretKey: this.opts.sessionSecretKey,
});
return {
type: "session_hello",
meshId: this.opts.mesh.meshId,
parentMemberId: this.opts.mesh.memberId,
parentMemberPubkey: this.opts.mesh.pubkey,
sessionPubkey: this.opts.sessionPubkey,
parentAttestation: this.opts.parentAttestation,
displayName: this.opts.displayName,
sessionId: this.opts.sessionId,
pid: this.opts.pid,
cwd: this.opts.cwd ?? process.cwd(),
hostname: osHostname(),
peerType: "ai" as const,
channel: "claudemesh-session",
...(this.opts.groups && this.opts.groups.length > 0 ? { groups: this.opts.groups } : {}),
...(this.opts.role ? { role: this.opts.role } : {}),
timestamp,
signature,
};
},
isHelloAck: (msg) => msg.type === "hello_ack",
onMessage: (msg) => {
if (msg.type === "error") {
// Older brokers respond with `unknown_message_type` to session_hello;
// surface that so the daemon can decide to skip per-session presence
// rather than churn through reconnects.
// rather than churn through reconnects. Setting `closed` halts the
// helper's reconnect loop on the next close.
this.log("warn", "broker_error", { code: msg.code, message: msg.message });
if (msg.code === "unknown_message_type") {
this.brokerUnsupported = true;
this.closed = true;
void this.lifecycle?.close();
}
return;
}
// push / inbound — presence-only client ignores them; the daemon's
// member-keyed client handles all DM decryption.
});

ws.on("close", (code, reason) => {
if (this.helloTimer) { clearTimeout(this.helloTimer); this.helloTimer = null; }
if (this.closed) { this.setStatus("closed"); return; }
this.setStatus("reconnecting");
const wait = BACKOFF_CAPS_MS[Math.min(this.reconnectAttempt, BACKOFF_CAPS_MS.length - 1)] ?? 30_000;
|
||||
this.reconnectAttempt++;
|
||||
this.log("info", "session_broker_reconnect_scheduled", { wait_ms: wait, code, reason: reason.toString("utf8") });
|
||||
this.reconnectTimer = setTimeout(
|
||||
() => this.connect().catch((err) => this.log("warn", "session_broker_reconnect_failed", { err: String(err) })),
|
||||
wait,
|
||||
);
|
||||
if (this._status === "connecting") reject(new Error(`closed_before_hello_${code}`));
|
||||
});
|
||||
// 1.34.0: outbox `send` ack arriving on the session-WS. Resolves
|
||||
// the Promise the drain worker is awaiting. Mirrors the
|
||||
// daemon-WS handler exactly.
|
||||
if (msg.type === "ack") {
|
||||
const id = String(msg.id ?? "");
|
||||
const ack = this.pendingAcks.get(id);
|
||||
if (ack) {
|
||||
this.pendingAcks.delete(id);
|
||||
clearTimeout(ack.timer);
|
||||
if (typeof msg.error === "string" && msg.error.length > 0) {
|
||||
ack.resolve({ ok: false, error: msg.error, permanent: classifyPermanent(msg.error) });
|
||||
} else {
|
||||
ack.resolve({ ok: true, messageId: String(msg.messageId ?? id) });
|
||||
}
|
||||
}
|
||||
return;
|
||||
}
|
||||
|
||||
ws.on("error", (err) => this.log("warn", "session_broker_ws_error", { err: err.message }));
|
||||
// 1.32.1 — DMs targeted at the launched session's pubkey arrive
|
||||
// here, NOT on the daemon's member-keyed WS. Forward to the
|
||||
// daemon-level push handler so they land in inbox.db.
|
||||
if (msg.type === "push" || msg.type === "inbound") {
|
||||
// 1.34.9: skip system events on the session-WS — the daemon-WS
|
||||
// already receives the same broker broadcast and publishes it
|
||||
// to the bus, so forwarding here just produces duplicate
|
||||
// `[system] Peer "X" joined the mesh` channel pushes (one per
|
||||
// connection: 1 member-WS + 1 session-WS = 2 messages, +
|
||||
// another set per sibling session). Caught in the 2026-05-04
|
||||
// peer-rejoin smoke.
|
||||
if ((msg as Record<string, unknown>).subtype === "system") return;
|
||||
// 1.34.8: drop self-echoes. Some broker fan-out paths mirror an
|
||||
// outbound DM back to the originating session-WS; without this
|
||||
// guard the sender's own message lands in inbox.db, publishes a
|
||||
// `message` bus event, and Claude Code surfaces it as
|
||||
// `← claudemesh: <self>: <text>` immediately after the user
|
||||
// typed `claudemesh send`. Caught in the 2026-05-04 two-session
|
||||
// smoke. Match on session pubkey only — sibling sessions of the
|
||||
// same member share `senderMemberPubkey`, so a member-level
|
||||
// filter would wrongly drop legit sibling DMs.
|
||||
const senderPubkey = String((msg as Record<string, unknown>).senderPubkey ?? "").toLowerCase();
|
||||
if (senderPubkey && senderPubkey === this.opts.sessionPubkey.toLowerCase()) {
|
||||
this.log("info", "self_echo_dropped", { sender: senderPubkey.slice(0, 12) });
|
||||
return;
|
||||
}
|
||||
this.opts.onPush?.(msg);
|
||||
return;
|
||||
}
|
||||
},
|
||||
onStatusChange: (s) => {
|
||||
this._status = s;
|
||||
this.opts.onStatusChange?.(s);
|
||||
if (s === "open") {
|
||||
// 1.34.0: flush queued send dispatchers so any outbox row that
|
||||
// tried to dispatch while we were reconnecting goes out now.
|
||||
const queued = this.opens.slice();
|
||||
this.opens.length = 0;
|
||||
for (const fn of queued) {
|
||||
try { fn(); } catch (e) { this.log("warn", "session_open_handler_failed", { err: String(e) }); }
|
||||
}
|
||||
} else if (s === "closed" || s === "reconnecting") {
|
||||
// Fail any in-flight acks so the drain worker can retry/backoff
|
||||
// instead of hanging on a dead promise. The daemon-WS does the
|
||||
// same thing via onBeforeReconnect; we centralize it here
|
||||
// because session-broker uses status transitions directly.
|
||||
this.failPendingAcks(`session_ws_${s}`);
|
||||
}
|
||||
},
|
||||
log: (level, msg, meta) => this.log(level, `session_broker_${msg}`, meta),
|
||||
});
|
||||
}
|
||||
|
||||
/** v2 agentic-comms (M1): send `client_ack` back to the broker after
* successfully landing an inbound push in inbox.db. Broker uses the
* ack to set `delivered_at`. Best-effort. */
sendClientAck(clientMessageId: string, brokerMessageId: string | null): void {
if (this._status !== "open" || !this.lifecycle) return;
try {
this.lifecycle.send({
type: "client_ack",
clientMessageId,
...(brokerMessageId ? { brokerMessageId } : {}),
});
} catch { /* drop; lease re-delivers */ }
}

/** True when underlying socket is OPEN-ready for direct sends. */
isOpen(): boolean {
const sock = this.lifecycle?.ws;
return !!sock && sock.readyState === sock.OPEN;
}

/**
* 1.34.0 — Send one outbox row over the session-WS. Same wire format
* as DaemonBrokerClient.send, but routed via this connection so the
* broker's fan-out attributes the push to the session pubkey.
*
* Used by the drain worker for rows whose `sender_session_pubkey`
* matches this client's session pubkey. When the WS is reconnecting
* the dispatcher is queued via `opens` and flushed on the next
* status flip.
*/
send(req: BrokerSendArgs): Promise<BrokerSendResult> {
return new Promise<BrokerSendResult>((resolve) => {
const dispatch = () => {
if (!this.isOpen() || !this.lifecycle) {
resolve({ ok: false, error: "session_ws_not_open", permanent: false });
return;
}
const id = req.client_message_id;
const timer = setTimeout(() => {
if (this.pendingAcks.delete(id)) {
resolve({ ok: false, error: "ack_timeout", permanent: false });
}
}, SEND_ACK_TIMEOUT_MS);
this.pendingAcks.set(id, { resolve, timer });
try {
this.lifecycle.send({
type: "send",
id,
client_message_id: id,
request_fingerprint: req.request_fingerprint_hex,
targetSpec: req.targetSpec,
priority: req.priority,
nonce: req.nonce,
ciphertext: req.ciphertext,
});
} catch (e) {
this.pendingAcks.delete(id);
clearTimeout(timer);
resolve({ ok: false, error: `ws_write_failed: ${String(e)}`, permanent: false });
}
};

if (this._status === "open") dispatch();
else this.opens.push(dispatch);
});
}

/** Resolve every in-flight ack with a synthetic failure. Called on
* WS close so the drain worker stops waiting and either retries or
* reroutes via the daemon-WS. */
private failPendingAcks(reason: string): void {
if (this.pendingAcks.size === 0) return;
const entries = [...this.pendingAcks.entries()];
this.pendingAcks.clear();
for (const [, ack] of entries) {
clearTimeout(ack.timer);
ack.resolve({ ok: false, error: reason, permanent: false });
}
}

async close(): Promise<void> {
this.closed = true;
if (this.reconnectTimer) { clearTimeout(this.reconnectTimer); this.reconnectTimer = null; }
if (this.helloTimer) { clearTimeout(this.helloTimer); this.helloTimer = null; }
try { this.ws?.close(); } catch { /* ignore */ }
this.setStatus("closed");
if (this.lifecycle) {
try { await this.lifecycle.close(); } catch { /* ignore */ }
this.lifecycle = null;
}
this._status = "closed";
}

/** True when the broker rejected our session_hello as unknown — caller
* may want to skip per-session presence entirely on this mesh. */
get isBrokerUnsupported(): boolean { return this.brokerUnsupported; }
}

function defaultLog(level: "info" | "warn" | "error", msg: string, meta?: Record<string, unknown>) {

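The session-WS push path above applies two filters before forwarding a `push`/`inbound` frame: it skips `subtype: "system"` broadcasts (the daemon-WS already publishes those to the bus) and drops frames whose `senderPubkey` equals this session's own pubkey, compared case-insensitively. The sketch below restates that decision as a pure function so it can be unit-tested without a socket. `shouldForwardPush` and `BrokerFrame` are hypothetical names, not part of the claudemesh codebase.

```typescript
// Hypothetical pure restatement of the session-WS push filter.
type BrokerFrame = Record<string, unknown>;

function shouldForwardPush(msg: BrokerFrame, sessionPubkey: string): boolean {
  // Only push/inbound frames are candidates for forwarding.
  if (msg.type !== "push" && msg.type !== "inbound") return false;
  // System broadcasts already reach the bus via the member-keyed daemon-WS.
  if (msg.subtype === "system") return false;
  // Drop self-echoes by session pubkey only; member pubkeys are ignored
  // on purpose so DMs from sibling sessions of the same member still pass.
  const sender = String(msg.senderPubkey ?? "").toLowerCase();
  if (sender !== "" && sender === sessionPubkey.toLowerCase()) return false;
  return true;
}
```

Matching on the session pubkey rather than `senderMemberPubkey` is what keeps sibling-session DMs flowing, as the 1.34.8 comment explains; a frame with no `senderPubkey` at all is forwarded, mirroring the `senderPubkey &&` guard in the diff.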
@@ -10,7 +10,9 @@
* Lifecycle:
* - register replaces any prior entry under the same `sessionId`
* (handles re-launch and `--resume` flows cleanly).
* - reaper polls every 30 s and drops entries whose pid is dead.
* - reaper polls every 5 s. An entry is dropped when its pid is dead
* OR when its captured start-time no longer matches the running
* process (PID reuse — original is gone, OS recycled the number).
* - hard ttl ceiling of 24 h is a leak guard for forgotten sessions.
*
* Persistence: in-memory only for v1. A daemon restart clears the
@@ -20,6 +22,8 @@
* session have no token to begin with.
*/

import { getProcessStartTime, getProcessStartTimes, isPidAlive } from "./process-info.js";

/**
* Optional per-launch presence material. Carried opaquely through the
* registry; the daemon's session-broker subsystem (1.30.0+) reads it to
@@ -51,6 +55,16 @@ export interface SessionInfo {
groups?: string[];
/** 1.30.0+: per-launch presence material. */
presence?: SessionPresence;
/**
* 1.31.0+: opaque per-process start-time captured at register. The
* reaper compares the live value against this on every sweep — a
* mismatch means the original process exited and the pid was reused
* by an unrelated program, so the registry entry must be dropped.
* `undefined` when capture failed (process already dead at register
* time, ps unavailable, etc.) — the reaper falls back to bare
* liveness in that case.
*/
startTime?: string;
registeredAt: number;
}

@@ -61,7 +75,7 @@ export interface RegistryHooks {
}

const TTL_MS = 24 * 60 * 60 * 1000;
const REAPER_INTERVAL_MS = 30 * 1000;
const REAPER_INTERVAL_MS = 5 * 1000;

const byToken = new Map<string, SessionInfo>();
const bySessionId = new Map<string, string>();
@@ -71,7 +85,10 @@ let reaperHandle: NodeJS.Timeout | null = null;

export function startReaper(): void {
if (reaperHandle) return;
reaperHandle = setInterval(reapDead, REAPER_INTERVAL_MS).unref?.() ?? reaperHandle;
// The sweep is async (batched ps) — wrap in `void` so setInterval
// doesn't try to await us, and so an unexpected throw doesn't crash
// the daemon. Errors are swallowed inside reapDead.
reaperHandle = setInterval(() => { void reapDead(); }, REAPER_INTERVAL_MS).unref?.() ?? reaperHandle;
}

export function stopReaper(): void {
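The new `startReaper` comment says wrapping the async sweep in `void` keeps an unexpected throw from crashing the daemon. Strictly, `void` only discards the promise: since Node.js 15 an unhandled rejection terminates the process by default, so the protection really comes from `reapDead` catching its own errors, which it does only around the batched `ps` call. A minimal sketch of a timer wrapper that attaches an explicit rejection handler, assuming hypothetical helpers `runGuarded` and `startGuardedInterval` (neither exists in the registry):

```typescript
// Hypothetical helpers, not part of the claudemesh codebase.
// runGuarded routes a sync throw or an async rejection from `task`
// to `onError`, so the returned promise never rejects.
function runGuarded(task: () => Promise<void>, onError: (err: unknown) => void): Promise<void> {
  // Promise.resolve().then(task) also captures a synchronous throw.
  return Promise.resolve().then(task).catch(onError);
}

// Start a periodic sweep; each tick is fire-and-forget but guarded.
function startGuardedInterval(
  task: () => Promise<void>,
  intervalMs: number,
  onError: (err: unknown) => void,
): ReturnType<typeof setInterval> {
  const handle = setInterval(() => { void runGuarded(task, onError); }, intervalMs);
  // Node.js timers expose unref() so the sweep alone does not keep the
  // process alive; the optional call is a no-op where it is missing.
  (handle as unknown as { unref?: () => void }).unref?.();
  return handle;
}
```

In the registry this would read `startGuardedInterval(reapDead, REAPER_INTERVAL_MS, onError)`, with `onError` supplied by the daemon, since the registry deliberately stays log-free.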
@@ -98,13 +115,31 @@ export function registerSession(info: Omit<SessionInfo, "registeredAt">): Sessio
}
}

// Caller may pre-fill info.startTime (tests do this for determinism).
// For the real path we fire-and-forget an async ps probe — register
// stays sync and microsecond-fast, and the start-time lands on the
// entry within a few ms. Until it lands, the reaper falls back to
// bare liveness for this entry, which is fine for the common case
// (PID reuse is rare; the brief window without the guard is
// tolerable).
const stored: SessionInfo = { ...info, registeredAt: Date.now() };
byToken.set(info.token, stored);
bySessionId.set(info.sessionId, info.token);
try { hooks.onRegister?.(stored); } catch { /* see above */ }
if (stored.startTime === undefined) {
void captureStartTimeAsync(info.token, info.pid);
}
return stored;
}

async function captureStartTimeAsync(token: string, pid: number): Promise<void> {
const lstart = await getProcessStartTime(pid);
if (lstart === null) return;
const entry = byToken.get(token);
if (!entry || entry.pid !== pid) return; // entry was replaced; skip
entry.startTime = lstart;
}

export function deregisterByToken(token: string): boolean {
const entry = byToken.get(token);
if (!entry) return false;
@@ -128,15 +163,54 @@ export function listSessions(): SessionInfo[] {
return [...byToken.values()];
}

function reapDead(): void {
async function reapDead(): Promise<void> {
// Snapshot first; the second (async) phase calls ps and we must not
// mutate the registry mid-iteration.
const entries = [...byToken.entries()];

// Phase 1 — TTL + bare liveness. Sync, microsecond-fast.
const dead: string[] = [];
for (const [token, info] of byToken.entries()) {
const survivors: Array<[string, SessionInfo]> = [];
for (const [token, info] of entries) {
if (Date.now() - info.registeredAt > TTL_MS) { dead.push(token); continue; }
try { process.kill(info.pid, 0); } catch { dead.push(token); }
if (!isPidAlive(info.pid)) { dead.push(token); continue; }
survivors.push([token, info]);
}

// Phase 2 — PID-reuse guard for survivors that have a captured
// start-time. Single batched ps call: O(1) forks regardless of
// session count. Survivors without a start-time keep the bare-
// liveness verdict from phase 1 (their captureStartTimeAsync may
// still be in-flight from a recent register).
const guardedPids = survivors
.filter(([, info]) => info.startTime !== undefined)
.map(([, info]) => info.pid);
if (guardedPids.length > 0) {
try {
const live = await getProcessStartTimes(guardedPids);
for (const [token, info] of survivors) {
if (info.startTime === undefined) continue;
const lstart = live.get(info.pid);
// ps may transiently miss a pid that was alive when isPidAlive
// ran — treat absence as "racing", let the next sweep decide.
if (lstart === undefined) continue;
if (lstart !== info.startTime) dead.push(token);
}
} catch {
// ps failure here is non-fatal: survivors keep their phase-1
// verdict. Logging is the daemon's responsibility — the
// registry deliberately stays log-free.
}
}

for (const t of dead) deregisterByToken(t);
}

/** Test helper: run a single reaper pass. */
export async function _runReaperOnce(): Promise<void> {
await reapDead();
}

/** Test helper. */
export function _resetRegistry(): void {
byToken.clear();

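The rewritten `reapDead` works on a snapshot in two phases: a synchronous pass that drops entries past the 24 h TTL or whose pid is dead, then one batched start-time probe that drops survivors whose recorded start time no longer matches (PID reuse). The sketch below models that decision as an I/O-free function so the edge cases are easy to pin down. `selectDead`, `RegistryEntry`, and the injected `isAlive` and `liveStartTimes` stand in for `isPidAlive` and `getProcessStartTimes`; they are assumptions, not the registry's API. A failed batched probe corresponds to passing an empty map, which leaves every survivor alive, as the `catch` branch does.

```typescript
// Hypothetical, I/O-free model of the two-phase reaper sweep.
interface RegistryEntry {
  pid: number;
  registeredAt: number; // epoch ms at register time
  startTime?: string;   // opaque per-process start time, if captured
}

function selectDead(
  entries: ReadonlyArray<readonly [string, RegistryEntry]>,
  now: number,
  ttlMs: number,
  isAlive: (pid: number) => boolean,
  liveStartTimes: ReadonlyMap<number, string>,
): string[] {
  const dead: string[] = [];
  for (const [token, entry] of entries) {
    // Phase 1: hard TTL ceiling, then bare liveness.
    if (now - entry.registeredAt > ttlMs || !isAlive(entry.pid)) {
      dead.push(token);
      continue;
    }
    // Phase 2: PID-reuse guard, only for entries with a captured start time.
    if (entry.startTime === undefined) continue;
    const live = liveStartTimes.get(entry.pid);
    // Missing from the batched probe: treat as a race, let the next sweep decide.
    if (live === undefined) continue;
    if (live !== entry.startTime) dead.push(token);
  }
  return dead;
}
```

Passing `now` in, instead of calling `Date.now()` per entry as the diff does, also judges every entry in one sweep against the same instant.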
274
apps/cli/src/daemon/ws-lifecycle.ts
Normal file
@@ -0,0 +1,274 @@
/**
* Shared WS lifecycle helper for the daemon's two broker clients.
*
* Both `DaemonBrokerClient` (member-keyed, one per joined mesh) and
* `SessionBrokerClient` (session-keyed, one per launched session) used
* to inline the same connect/hello/ack-timeout/close-reconnect logic.
* They drifted apart subtly — different ack-timeout names, different
* reconnect log messages, slightly different status flips — and that's
* how 1.32.x bugs shipped (push handler attached to the wrong client,
* etc).
*
* This helper owns ONLY the lifecycle:
* - new WebSocket(url), wire up open/message/close/error
* - on open → call buildHello() and send the result
* - start an ack-timeout timer; if it fires before the hello ack
* arrives, close the socket and reject the connect promise
* - on message, gate on isHelloAck(); when true, flip status to
* "open", clear the ack timer, resolve. All other messages are
* forwarded to onMessage()
* - on close, schedule a backoff reconnect (unless explicitly closed)
*
* Each client keeps its own concerns: DaemonBrokerClient still owns
* pendingAcks / peerListResolvers / etc; SessionBrokerClient still owns
* its onPush callback. The helper just hands them an open WS and a
* stable status field, and reconnects under their feet on disconnect.
*
* Composition over inheritance — callers receive a `WsLifecycle` handle
* with `send` / `close` / `status`, NOT a subclass.
*/

import WebSocket from "ws";

export type WsStatus = "connecting" | "open" | "closed" | "reconnecting";

export type WsLogLevel = "info" | "warn" | "error";
export type WsLog = (level: WsLogLevel, msg: string, meta?: Record<string, unknown>) => void;

export interface WsLifecycleOptions {
/** Broker URL (e.g. wss://ic.claudemesh.com/ws). */
url: string;
/**
* Build the hello frame to send right after the WS opens. Async because
* signing the hello may need libsodium initialization. Whatever this
* returns is JSON.stringified and sent verbatim — the helper does NOT
* inspect or modify it.
*/
buildHello: () => Promise<unknown>;
/**
* Returns true iff `msg` is the hello ack the helper should treat as
* "broker accepted us; flip status to open". Both daemon-WS and
* session-WS use `{ type: "hello_ack" }` today, but keeping this a
* predicate lets either client narrow further (e.g. on a `code` field)
* without leaking client-specific shape into the helper.
*/
isHelloAck: (msg: Record<string, unknown>) => boolean;
/**
* Called for every parsed message that is NOT the hello ack. The
* helper does NOT decide which messages are pushes vs RPCs vs errors;
* that's the caller's concern.
*/
onMessage: (msg: Record<string, unknown>) => void;
onStatusChange?: (s: WsStatus) => void;
/**
* How long to wait for the broker's hello ack before giving up and
* forcing a close. Defaults 5s — same as both pre-refactor clients.
*/
helloAckTimeoutMs?: number;
/**
* Reconnect backoff schedule. Defaults [1s, 2s, 4s, 8s, 16s, 30s] —
* matches both pre-refactor clients exactly.
*/
backoffCapsMs?: readonly number[];
log?: WsLog;
/**
* Hook for the close path BEFORE the helper schedules a reconnect.
* Used by DaemonBrokerClient to fail its in-flight pendingAcks map
* with a "broker_disconnected_<code>" reason. The helper passes the
* raw close code so the caller can shape its rejection text.
*
* Returns nothing — close handling continues regardless.
*/
onBeforeReconnect?: (code: number, reason: string) => void;
}

export interface WsLifecycle {
/** Current connection status. Updated synchronously before onStatusChange fires. */
readonly status: WsStatus;
/** Underlying socket. Exposed for callers that need OPEN-state checks
* before sending (mirrors the pre-refactor `this.ws.readyState` checks). */
readonly ws: WebSocket | null;
/** Send a JSON payload over the open WS. Throws if not open — callers
* that need queue-while-disconnected semantics should layer that
* themselves (DaemonBrokerClient does, via its `opens` deferred-fn array). */
send(payload: unknown): void;
/** Close the WS and stop reconnecting. Idempotent. */
close(): Promise<void>;
}

const DEFAULT_HELLO_ACK_TIMEOUT_MS = 5_000;
const DEFAULT_BACKOFF_CAPS_MS: readonly number[] = [1_000, 2_000, 4_000, 8_000, 16_000, 30_000];

const defaultLog: WsLog = (level, msg, meta) => {
const line = JSON.stringify({ level, msg, ...meta, ts: new Date().toISOString() });
if (level === "info") process.stdout.write(line + "\n");
else process.stderr.write(line + "\n");
};

/**
* Connect a WebSocket with hello-handshake, ack-timeout, and reconnect
* with exponential backoff. Resolves once the broker accepts the hello;
* rejects if the first connect closes before the ack lands.
*
* Subsequent automatic reconnects are silent — they fire on the close
* handler's backoff timer and surface only via onStatusChange (and any
* caller-installed log).
*/
export function connectWsWithBackoff(opts: WsLifecycleOptions): Promise<WsLifecycle> {
const helloAckTimeoutMs = opts.helloAckTimeoutMs ?? DEFAULT_HELLO_ACK_TIMEOUT_MS;
const backoffCapsMs = opts.backoffCapsMs ?? DEFAULT_BACKOFF_CAPS_MS;
const log: WsLog = opts.log ?? defaultLog;

let ws: WebSocket | null = null;
let status: WsStatus = "closed";
let closed = false;
let reconnectAttempt = 0;
let reconnectTimer: NodeJS.Timeout | null = null;
let helloTimer: NodeJS.Timeout | null = null;

const setStatus = (s: WsStatus) => {
if (status === s) return;
status = s;
opts.onStatusChange?.(s);
};

/**
* Open one WS attempt. Returns a promise that resolves on hello ack
* or rejects if the socket closes before we get one. Used by both the
* initial connect and the close-handler backoff timer (which awaits
* but ignores the rejection — by then the close handler has already
* scheduled its own reconnect).
*/
// Liveness watchdog: same cadence (30s) as the broker's outbound
// ping. Two jobs per tick:
// 1. If we haven't heard from the broker in >75s (2.5x the ping
// cadence — covers one missed ping plus some slack), terminate
// the socket. Fires the close handler → backoff reconnect runs
// its normal path. This is what catches NAT-dropped half-dead
// connections that the kernel won't RST for ~2 hours.
// 2. Otherwise, send our own ping. The broker's `ws` library
// auto-replies with a pong, which bumps lastActivity. This
// keeps the broker's stale-pong watchdog seeing us as alive.
//
// Bare `ping` and `pong` events both bump lastActivity, as does
// any inbound application message — any sign of life resets the
// dead-man's-switch.
const PING_INTERVAL_MS = 30_000;
const STALE_THRESHOLD_MS = 75_000;
let lastActivity = Date.now();
let watchdogTimer: NodeJS.Timeout | null = null;

const openOnce = (): Promise<void> => {
if (closed) return Promise.reject(new Error("client_closed"));
setStatus("connecting");

log("info", "ws_open_attempt", { url: opts.url });
const sock = new WebSocket(opts.url);
ws = sock;
lastActivity = Date.now();

return new Promise<void>((resolve, reject) => {
sock.on("open", () => {
log("info", "ws_open_ok", { url: opts.url });
// Build and send the hello inside a microtask so any sync
// throws from buildHello() reject this connect attempt cleanly.
(async () => {
try {
const hello = await opts.buildHello();
sock.send(JSON.stringify(hello));
log("info", "ws_hello_sent", { url: opts.url });
helloTimer = setTimeout(() => {
log("warn", "hello_ack_timeout", { url: opts.url });
try { sock.close(); } catch { /* ignore */ }
reject(new Error("hello_ack_timeout"));
}, helloAckTimeoutMs);
} catch (e) {
log("warn", "ws_build_hello_threw", { err: String(e) });
reject(e instanceof Error ? e : new Error(String(e)));
}
})();
});

sock.on("message", (raw) => {
lastActivity = Date.now();
let msg: Record<string, unknown>;
try { msg = JSON.parse(raw.toString()) as Record<string, unknown>; }
catch { return; }

if (opts.isHelloAck(msg)) {
if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
setStatus("open");
reconnectAttempt = 0;
log("info", "ws_hello_acked", { url: opts.url });
// Start liveness watchdog only after a successful handshake.
if (watchdogTimer) clearInterval(watchdogTimer);
watchdogTimer = setInterval(() => {
if (sock.readyState !== sock.OPEN) return;
const idle = Date.now() - lastActivity;
if (idle > STALE_THRESHOLD_MS) {
log("warn", "ws_stale_terminate", { url: opts.url, idle_ms: idle });
try { sock.terminate(); } catch { /* socket already gone */ }
return;
}
try { sock.ping(); } catch { /* ignore */ }
}, PING_INTERVAL_MS);
resolve();
return;
}

opts.onMessage(msg);
});

sock.on("ping", () => { lastActivity = Date.now(); });
sock.on("pong", () => { lastActivity = Date.now(); });

sock.on("close", (code, reason) => {
if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
if (watchdogTimer) { clearInterval(watchdogTimer); watchdogTimer = null; }
const reasonStr = reason.toString("utf8");
log("warn", "ws_closed", { url: opts.url, code, reason: reasonStr, status });
opts.onBeforeReconnect?.(code, reasonStr);

if (closed) {
setStatus("closed");
return;
}
setStatus("reconnecting");
const wait = backoffCapsMs[Math.min(reconnectAttempt, backoffCapsMs.length - 1)] ?? 30_000;
reconnectAttempt++;
log("info", "ws_reconnect_scheduled", { url: opts.url, wait_ms: wait, code, reason: reasonStr });
reconnectTimer = setTimeout(
() => openOnce().catch((err) => log("warn", "ws_reconnect_failed", { url: opts.url, err: String(err) })),
wait,
);
if (status === "connecting" || status === "reconnecting") {
reject(new Error(`closed_before_hello_${code}`));
}
});

sock.on("error", (err) => log("warn", "ws_error", { url: opts.url, err: err.message }));
});
};

return openOnce().then(() => {
const handle: WsLifecycle = {
get status() { return status; },
get ws() { return ws; },
send(payload: unknown) {
if (!ws || ws.readyState !== ws.OPEN) {
throw new Error("ws_not_open");
}
ws.send(JSON.stringify(payload));
},
async close() {
closed = true;
if (reconnectTimer) { clearTimeout(reconnectTimer); reconnectTimer = null; }
if (helloTimer) { clearTimeout(helloTimer); helloTimer = null; }
if (watchdogTimer) { clearInterval(watchdogTimer); watchdogTimer = null; }
try { ws?.close(); } catch { /* ignore */ }
setStatus("closed");
},
};
return handle;
});
}
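Two timing rules in `connectWsWithBackoff` are easy to get subtly wrong: the reconnect delay is looked up in `backoffCapsMs` by attempt number and clamps at the last entry, and the watchdog terminates a socket only when it has been silent for strictly more than 75 s, otherwise it sends a ping. A minimal sketch of both rules as pure functions; `backoffDelay`, `watchdogAction`, and the `"skip"` outcome for a socket that is not OPEN are hypothetical names for what the helper does inline.

```typescript
// Hypothetical pure versions of the inline rules in connectWsWithBackoff.
const BACKOFF_CAPS_MS: readonly number[] = [1_000, 2_000, 4_000, 8_000, 16_000, 30_000];
const STALE_THRESHOLD_MS = 75_000; // 2.5x the 30 s ping cadence

// Attempt 0 waits caps[0]; attempts past the end reuse the last cap.
// An empty schedule falls back to 30 s, like the `?? 30_000` in the helper.
function backoffDelay(attempt: number, caps: readonly number[] = BACKOFF_CAPS_MS): number {
  return caps[Math.min(attempt, caps.length - 1)] ?? 30_000;
}

type WatchdogAction = "skip" | "terminate" | "ping";

// One watchdog tick: do nothing unless the socket is OPEN, terminate a
// connection idle for longer than the threshold, otherwise ping it.
function watchdogAction(
  now: number,
  lastActivity: number,
  socketOpen: boolean,
  staleMs: number = STALE_THRESHOLD_MS,
): WatchdogAction {
  if (!socketOpen) return "skip";
  return now - lastActivity > staleMs ? "terminate" : "ping";
}
```

Because `reconnectAttempt` resets to 0 on every hello ack, a broker that accepts the hello and then drops the socket restarts the schedule at 1 s each time; only consecutive failed handshakes walk up to the 30 s cap.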
@@ -93,7 +93,19 @@ Peer (resource form, recommended)

Message (resource form)
claudemesh message send <to> <m> send a message (alias: send)
claudemesh message inbox drain pending (alias: inbox)
flags: [--priority now|next|low] [--mesh <slug>]
[--self] (allow targeting your own member/session pubkey;
fans out to every sibling session of your member)
[--json] (machine-readable result)
claudemesh message inbox read persisted inbox (alias: inbox)
flags: [--mesh <slug>] [--limit N] [--unread] [--json]
reads ~/.claudemesh/daemon/inbox.db via daemon
--unread → only rows never surfaced before (seen_at IS NULL);
listing stamps returned rows seen as a side effect
claudemesh inbox flush bulk-delete inbox rows
flags: [--mesh <slug>] [--before <iso-timestamp>] [--all]
--all required when neither --mesh nor --before is set
claudemesh inbox delete <id> delete one inbox row by id (alias: rm)
claudemesh message status <id> delivery status (alias: msg-status)

Memory (resource form)
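The new `inbox flush` help text makes `--all` mandatory when neither `--mesh` nor `--before` narrows the delete. One way to enforce that guard, and to reject an unparseable `--before` value before anything is deleted, is sketched below. `validateFlushArgs` and its result shape are hypothetical and do not describe what `runInboxFlush` actually does.

```typescript
// Hypothetical argument check for `claudemesh inbox flush`.
interface FlushArgs {
  mesh?: string;
  before?: string; // ISO-8601 timestamp
  all?: boolean;
}

type FlushCheck = { ok: true; beforeMs?: number } | { ok: false; error: string };

function validateFlushArgs(args: FlushArgs): FlushCheck {
  // An unscoped flush deletes every row, so require explicit --all.
  if (!args.mesh && !args.before && !args.all) {
    return { ok: false, error: "refusing to flush every mesh: pass --mesh, --before, or --all" };
  }
  if (args.before === undefined) return { ok: true };
  // Date.parse returns NaN for strings it cannot interpret.
  const beforeMs = Date.parse(args.before);
  if (Number.isNaN(beforeMs)) {
    return { ok: false, error: `--before is not a valid timestamp: ${args.before}` };
  }
  return { ok: true, beforeMs };
}
```

Validating on the CLI side keeps a typo in `--before` from silently widening or narrowing the delete; the daemon may still apply its own checks.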
@@ -186,16 +198,18 @@ Security
claudemesh backup [file] encrypt config → portable recovery file
claudemesh restore <file> restore config from a backup file

Daemon (long-lived peer mesh runtime, v0.9.0)
claudemesh daemon up start daemon (alias: start) [--mesh <slug>] [--no-tcp]
Daemon (long-lived peer mesh runtime — universal across every joined mesh)
claudemesh daemon up start daemon (alias: start) [--no-tcp]
claudemesh daemon status show running pid + IPC health [--json]
claudemesh daemon down stop daemon (alias: stop)
claudemesh daemon version ipc + schema version of running daemon
claudemesh daemon outbox list list local outbox rows [--failed|--pending|--inflight|--done]
claudemesh daemon outbox requeue <id> re-enqueue an aborted/dead row [--new-client-id <id>]
claudemesh daemon accept-host pin current host fingerprint
claudemesh daemon install-service --mesh <slug> write launchd / systemd-user unit
claudemesh daemon uninstall-service remove the unit
claudemesh daemon install-service write launchd / systemd-user unit
claudemesh daemon uninstall-service remove the unit
Note: the daemon attaches to every mesh in ~/.claudemesh/config.json
automatically; --mesh on up / install-service is deprecated and ignored.

Setup
claudemesh install register MCP server + hooks
@@ -388,9 +402,32 @@ async function main(): Promise<void> {
case "bans": { const { runBans } = await import("~/commands/ban.js"); process.exit(await runBans({ mesh: flags.mesh as string, json: !!flags.json })); break; }

// Messaging
case "peers": { const { runPeers } = await import("~/commands/peers.js"); await runPeers({ mesh: flags.mesh as string, json: flags.json as boolean | string | undefined }); break; }
case "peers": { const { runPeers } = await import("~/commands/peers.js"); await runPeers({ mesh: flags.mesh as string, json: flags.json as boolean | string | undefined, all: !!flags.all }); break; }
case "send": { const { runSend } = await import("~/commands/send.js"); await runSend({ mesh: flags.mesh as string, priority: flags.priority as string, json: !!flags.json, self: !!flags.self }, positionals[0] ?? "", positionals.slice(1).join(" ")); break; }
case "inbox": { const { runInbox } = await import("~/commands/inbox.js"); await runInbox({ json: !!flags.json }); break; }
case "inbox": {
const sub = positionals[0];
if (sub === "flush") {
const { runInboxFlush } = await import("~/commands/inbox-actions.js");
await runInboxFlush({
mesh: flags.mesh as string | undefined,
before: flags.before as string | undefined,
all: !!flags.all,
json: !!flags.json,
});
} else if (sub === "delete" || sub === "rm") {
const { runInboxDelete } = await import("~/commands/inbox-actions.js");
await runInboxDelete(positionals[1] ?? "", { json: !!flags.json });
} else {
const { runInbox } = await import("~/commands/inbox.js");
await runInbox({
mesh: flags.mesh as string | undefined,
json: !!flags.json,
limit: typeof flags.limit === "number" ? flags.limit : (typeof flags.limit === "string" ? Number.parseInt(flags.limit, 10) : undefined),
unread: !!flags.unread,
});
}
break;
}
case "state": {
const sub = positionals[0];
if (sub === "set") { const { runStateSet } = await import("~/commands/state.js"); await runStateSet({}, positionals[1] ?? "", positionals[2] ?? ""); }
@@ -462,6 +499,11 @@ async function main(): Promise<void> {
        publicHealth: !!flags["public-health"],
        mesh: flags.mesh as string | undefined,
        displayName: flags.name as string | undefined,
        // 1.34.12: --foreground opts out of the new "detach by default"
        // behavior. install-service and `claudemesh launch`'s auto-spawn
        // path always run with --foreground so their parents (launchd /
        // the launch helper) own lifecycle and stdio redirection.
        foreground: !!flags.foreground,
        outboxStatus,
        newClientId: flags["new-client-id"] as string | undefined,
      }, rest);
@@ -470,7 +512,7 @@ async function main(): Promise<void> {
    }

    // Setup
    case "install": { const { runInstall } = await import("~/commands/install.js"); runInstall(positionals); break; }
    case "install": { const { runInstall } = await import("~/commands/install.js"); await runInstall(positionals); break; }
    case "uninstall": { const { uninstall } = await import("~/commands/uninstall.js"); process.exit(await uninstall()); break; }
    case "doctor": { const { runDoctor } = await import("~/commands/doctor.js"); await runDoctor(); break; }
    case "status": {
@@ -510,7 +552,7 @@ async function main(): Promise<void> {

    case "peer": {
      const sub = positionals[0];
      const f = { mesh: flags.mesh as string, json: flags.json as boolean | string | undefined };
      const f = { mesh: flags.mesh as string, json: flags.json as boolean | string | undefined, all: !!flags.all };
      const id = positionals[1] ?? "";
      if (sub === "list") { const { runPeers } = await import("~/commands/peers.js"); await runPeers(f); }
      else if (sub === "kick") { const { runKick } = await import("~/commands/kick.js"); process.exit(await runKick(id, { mesh: flags.mesh as string, stale: flags.stale as string, all: !!flags.all })); }
@@ -525,8 +567,30 @@ async function main(): Promise<void> {

    case "message": {
      const sub = positionals[0];
      if (sub === "send") { const { runSend } = await import("~/commands/send.js"); await runSend({ mesh: flags.mesh as string, priority: flags.priority as string, json: !!flags.json }, positionals[1] ?? "", positionals.slice(2).join(" ")); }
      else if (sub === "inbox") { const { runInbox } = await import("~/commands/inbox.js"); await runInbox({ json: !!flags.json }); }
      if (sub === "send") { const { runSend } = await import("~/commands/send.js"); await runSend({ mesh: flags.mesh as string, priority: flags.priority as string, json: !!flags.json, self: !!flags.self }, positionals[1] ?? "", positionals.slice(2).join(" ")); }
      else if (sub === "inbox") {
        const sub2 = positionals[1];
        if (sub2 === "flush") {
          const { runInboxFlush } = await import("~/commands/inbox-actions.js");
          await runInboxFlush({
            mesh: flags.mesh as string | undefined,
            before: flags.before as string | undefined,
            all: !!flags.all,
            json: !!flags.json,
          });
        } else if (sub2 === "delete" || sub2 === "rm") {
          const { runInboxDelete } = await import("~/commands/inbox-actions.js");
          await runInboxDelete(positionals[2] ?? "", { json: !!flags.json });
        } else {
          const { runInbox } = await import("~/commands/inbox.js");
          await runInbox({
            mesh: flags.mesh as string | undefined,
            json: !!flags.json,
            limit: typeof flags.limit === "number" ? flags.limit : (typeof flags.limit === "string" ? Number.parseInt(flags.limit, 10) : undefined),
            unread: !!flags.unread,
          });
        }
      }
      else if (sub === "status") { const { runMsgStatus } = await import("~/commands/broker-actions.js"); process.exit(await runMsgStatus(positionals[1], { mesh: flags.mesh as string, json: !!flags.json })); }
      else { console.error("Usage: claudemesh message <send|inbox|status>"); process.exit(EXIT.INVALID_ARGS); }
      break;
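The `inbox` and `message inbox` branches in the CLI dispatcher repeat the same inline `--limit` coercion. Below is a minimal sketch of a shared helper. It assumes the flag parser hands back a number, a string, or `undefined`, and `parseLimitFlag` is a hypothetical name, not part of the CLI. The inline ternary forwards `NaN` from `Number.parseInt("abc", 10)` and any negative number as-is. This helper drops both.

```typescript
// Hypothetical helper: normalizes a `--limit` flag value as the CLI's
// argument parser may deliver it (number, string, or absent).
function parseLimitFlag(value: unknown): number | undefined {
  let n: number | undefined;
  if (typeof value === "number") {
    n = value;
  } else if (typeof value === "string") {
    // Same base-10 parse the inline ternary uses.
    n = Number.parseInt(value, 10);
  }
  // Reject NaN (e.g. "--limit abc"), fractions and non-positive values so
  // the daemon falls back to its own default limit.
  if (n === undefined || !Number.isInteger(n) || n <= 0) return undefined;
  return n;
}
```

Both call sites could then pass `limit: parseLimitFlag(flags.limit)`.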
@@ -30,8 +30,9 @@ import {
  ListResourcesRequestSchema,
  ReadResourceRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";
import { existsSync } from "node:fs";
import { existsSync, appendFileSync } from "node:fs";
import { request as httpRequest, type IncomingMessage } from "node:http";
import { join } from "node:path";

import { DAEMON_PATHS } from "~/daemon/paths.js";
import { VERSION } from "~/constants/urls.js";
@@ -69,10 +70,15 @@ function bailNoDaemon(): never {

interface DaemonGetResult { status: number; body: any }

function daemonGet(path: string): Promise<DaemonGetResult> {
function daemonGet(path: string, opts: { sessionToken?: string | null } = {}): Promise<DaemonGetResult> {
  return new Promise((resolve, reject) => {
    const headers: Record<string, string> = {};
    // 1.34.2+: when the launched process gave us a session token, forward
    // it on every IPC. Routes like `/v1/sessions/me` 401 without it, and
    // routes like `/v1/peers` use it for default-mesh scoping.
    if (opts.sessionToken) headers.Authorization = `ClaudeMesh-Session ${opts.sessionToken}`;
    const req = httpRequest(
      { socketPath: DAEMON_PATHS.SOCK_FILE, path, method: "GET", timeout: 5_000 },
      { socketPath: DAEMON_PATHS.SOCK_FILE, path, method: "GET", timeout: 5_000, headers },
      (res: IncomingMessage) => {
        const chunks: Buffer[] = [];
        res.on("data", (c) => chunks.push(c as Buffer));
@@ -90,21 +96,54 @@ function daemonGet(path: string): Promise<DaemonGetResult> {
  });
}

/** 1.34.8: best-effort POST /v1/inbox/seen so the MCP can stamp rows it
 * just surfaced via a `<channel>` reminder. Failures are swallowed —
 * read-state is a UX optimization, not a correctness gate. */
function daemonMarkSeen(ids: string[], sessionToken?: string | null): Promise<void> {
  return new Promise((resolve) => {
    if (ids.length === 0) { resolve(); return; }
    const body = JSON.stringify({ ids });
    const headers: Record<string, string> = {
      "Content-Type": "application/json",
      "Content-Length": String(Buffer.byteLength(body)),
    };
    if (sessionToken) headers.Authorization = `ClaudeMesh-Session ${sessionToken}`;
    const req = httpRequest(
      { socketPath: DAEMON_PATHS.SOCK_FILE, path: "/v1/inbox/seen", method: "POST", timeout: 3_000, headers },
      (res: IncomingMessage) => { res.on("data", () => { /* drain */ }); res.on("end", () => resolve()); },
    );
    req.on("error", () => resolve());
    req.on("timeout", () => { req.destroy(); resolve(); });
    req.write(body);
    req.end();
  });
}

// ── daemon SSE subscription ────────────────────────────────────────────

interface DaemonEvent { kind: string; ts: string; data: Record<string, any> }

function subscribeEvents(onEvent: (e: DaemonEvent) => void): { close: () => void } {
function subscribeEvents(onEvent: (e: DaemonEvent) => void, opts: { sessionToken?: string | null } = {}): { close: () => void } {
  let active = true;
  let req: ReturnType<typeof httpRequest> | null = null;

  const connect = (): void => {
    if (!active) return;
    // 1.34.13: forward the session token on the SSE subscription so the
    // daemon's `/v1/events` route can scope the stream to this session
    // via the SseFilterOptions demux added in 1.34.10. Without this
    // header, `session` resolves to null in the IPC handler, the filter
    // is empty, and every MCP receives every event — manifests as
    // session A rendering DMs that arrived on B's session-WS. The
    // launch helper sets CLAUDEMESH_IPC_TOKEN_FILE in the child env;
    // readSessionTokenFromEnv() picks it up at MCP boot time.
    const headers: Record<string, string> = { Accept: "text/event-stream" };
    if (opts.sessionToken) headers.Authorization = `ClaudeMesh-Session ${opts.sessionToken}`;
    req = httpRequest({
      socketPath: DAEMON_PATHS.SOCK_FILE,
      path: "/v1/events",
      method: "GET",
      headers: { Accept: "text/event-stream" },
      headers,
    });
    let buffer = "";
    req.on("response", (res: IncomingMessage) => {
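The diff cuts off just as `subscribeEvents` starts consuming the `text/event-stream` body into `buffer`. Below is a sketch of an incremental parser for that step, following the Server-Sent Events framing rules: frames end at a blank line, `:` lines are comments or heartbeats, and multiple `data:` lines are joined with `\n`. The daemon's exact wire shape is not shown here. The sketch assumes each frame's `data:` payload is JSON shaped like `DaemonEvent`, with the SSE `event:` name as a fallback for `kind`. `parseSseChunk` is a hypothetical name.

```typescript
interface DaemonEvent { kind: string; ts: string; data: Record<string, any> }

// Hypothetical incremental SSE parser: feed it each text chunk as it
// arrives; it returns every complete event plus the unconsumed remainder,
// which the caller stores back into its `buffer`.
function parseSseChunk(buffer: string, chunk: string): { events: DaemonEvent[]; rest: string } {
  // Normalize CRLF so frame splitting only has to look for "\n\n".
  let text = (buffer + chunk).replace(/\r\n/g, "\n");
  const events: DaemonEvent[] = [];
  let sep: number;
  while ((sep = text.indexOf("\n\n")) !== -1) {
    const frame = text.slice(0, sep);
    text = text.slice(sep + 2);
    let eventName: string | undefined;
    const dataLines: string[] = [];
    for (const line of frame.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // comment / heartbeat
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional space
      if (field === "event") eventName = value;
      else if (field === "data") dataLines.push(value);
    }
    if (dataLines.length === 0) continue;
    try {
      const parsed = JSON.parse(dataLines.join("\n")) as Partial<DaemonEvent>;
      events.push({
        // SSE's default event type is "message" when no `event:` is sent.
        kind: parsed.kind ?? eventName ?? "message",
        ts: parsed.ts ?? "",
        data: parsed.data ?? {},
      });
    } catch {
      // Malformed payload: skip the frame instead of tearing down the stream.
    }
  }
  return { events, rest: text };
}
```

In the `response` handler, calling `res.setEncoding("utf8")` first keeps multi-byte characters split across chunks intact. Each `data` callback then does `const r = parseSseChunk(buffer, chunk); buffer = r.rest; r.events.forEach(onEvent);`.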
@@ -166,7 +205,26 @@ export async function startMcpServer(): Promise<void> {

  const server = new Server(
    { name: "claudemesh", version: VERSION },
    { capabilities: { tools: {}, prompts: {}, resources: {} } },
    {
      capabilities: {
        tools: {},
        prompts: {},
        resources: {},
        // 1.34.1 — declare the experimental `claude/channel` capability.
        // Claude Code v2.1.x gates `notifications/claude/channel` on this
        // exact key: its `xJ_(serverName, capabilities, pluginSource)` check
        // returns {action:"skip", kind:"capability"} when
        // `capabilities.experimental?.["claude/channel"]` is missing, and
        // the notification handler is never registered → every channel
        // emit lands on the floor, regardless of the
        // `--dangerously-load-development-channels server:claudemesh` flag.
        // This was the silent regression: pre-2.1.x clients didn't gate on
        // this key, so the same MCP wire shape "worked" until Claude Code
        // tightened the check. Verified by reading the binary at the
        // offsets near `notifications/claude/channel` in the strings dump.
        experimental: { "claude/channel": {} },
      },
    },
  );

  // Tools: empty. The CLI is the API; the model invokes it via Bash.
@@ -264,8 +322,33 @@ export async function startMcpServer(): Promise<void> {
    return { contents: [{ uri, mimeType: "text/markdown", text: fm.join("\n") + skill.instructions }] };
  });

  // 1.34.1: every channel emit (and SSE event arrival) writes to a
  // per-pid log file under ~/.claudemesh/daemon/. Stderr from a Claude
  // Code-spawned MCP server isn't surfaced anywhere visible to the
  // user; without an on-disk trace we can't tell whether the SSE
  // delivered the event, whether the bus reached the MCP, or whether
  // server.notification rejected. The file path is stable across MCP
  // restarts so users can `tail -f` to watch live.
  const mcpLogPath = join(DAEMON_PATHS.DAEMON_DIR, `mcp-${process.pid}.log`);
  const mcpLog = (msg: string, meta?: Record<string, unknown>): void => {
    const line = JSON.stringify({ ts: new Date().toISOString(), pid: process.pid, msg, ...meta }) + "\n";
    try { appendFileSync(mcpLogPath, line); } catch { /* logging must never crash */ }
  };
  mcpLog("mcp_started", { version: VERSION });

  // 1.34.8: forward session token on /v1/inbox/seen so the daemon can
  // resolve mesh scoping if it ever needs to. We read it once here and
  // capture it in the closure since the MCP runs for the lifetime of
  // the session; the env var doesn't rotate mid-process.
  const { readSessionTokenFromEnv } = await import("~/services/session/token.js");
  const sessionTokenForSeen = readSessionTokenFromEnv();

  // Subscribe to daemon events; translate to channel notifications.
  // 1.34.13: pass the session token so the daemon scopes the SSE
  // stream via SseFilterOptions. Re-uses the same token already read
  // for /v1/inbox/seen above.
  const sub = subscribeEvents(async (ev) => {
    mcpLog("sse_event_received", { kind: ev.kind });
    if (ev.kind === "message") {
      const d = ev.data;
      const fromName = String(d.sender_name ?? "unknown");
@@ -295,17 +378,51 @@ export async function startMcpServer(): Promise<void> {
            },
          },
        });
        mcpLog("channel_emitted", { content_preview: content.slice(0, 80), mesh: String(d.mesh ?? "") });
        // 1.34.8: this row was just surfaced inline as a channel
        // reminder; mark it seen so the next launch's welcome doesn't
        // re-surface it as "unread." Best-effort: a failure here just
        // means the welcome will list one extra row, not data loss.
        const inboxRowId = String(d.id ?? "");
        if (inboxRowId) {
          void daemonMarkSeen([inboxRowId], sessionTokenForSeen).catch(() => { /* swallow */ });
        }
      } catch (err) {
        mcpLog("channel_emit_failed", { err: String(err) });
        process.stderr.write(`[claudemesh-mcp] channel emit failed: ${err}\n`);
      }
    } else if (ev.kind === "peer_join" || ev.kind === "peer_leave" || ev.kind === "system") {
      const d = ev.data;
      const eventName = String(d.event ?? ev.kind);
      // 1.34.9: enrich peer_join/leave with the context the broker
      // already ships (name, pubkey prefix, groups, returning summary).
      // Pre-1.34.9 we surfaced just the displayName, which is ambiguous
      // when two sessions share a name (e.g. two `agutierrez` peers in
      // different cwds). Pubkey prefix disambiguates; groups hint at
      // role (e.g. "[ops, devs]"). cwd / role aren't in the broker
      // event yet, so they're skipped — adding them broker-side is a
      // separate ship.
      const renderPeerLine = (verb: string): string => {
        const name = String(d.name ?? "unknown");
        const pubkey = String(d.pubkey ?? "");
        const pubkeyTag = pubkey ? ` (${pubkey.slice(0, 8)})` : "";
        const groups = Array.isArray(d.groups) ? d.groups : [];
        const groupNames = groups
          .map((g) => (typeof g === "object" && g !== null && "name" in g ? String((g as { name: unknown }).name) : typeof g === "string" ? g : ""))
          .filter(Boolean);
        const groupsTag = groupNames.length > 0 ? ` [${groupNames.join(", ")}]` : "";
        const lastSeen = typeof d.lastSeenAt === "string" ? d.lastSeenAt : null;
        const summary = typeof d.summary === "string" && d.summary.trim() ? d.summary.trim() : null;
        const returningTail = lastSeen
          ? ` — last seen ${new Date(lastSeen).toLocaleTimeString()}${summary ? ` · "${summary.slice(0, 80)}"` : ""}`
          : "";
        return `[system] Peer "${name}"${pubkeyTag}${groupsTag} ${verb} the mesh${returningTail}`;
      };
      let content: string;
      if (ev.kind === "peer_join") {
        content = `[system] Peer "${String(d.name ?? "unknown")}" joined the mesh`;
        content = renderPeerLine(eventName === "peer_returned" ? "returned to" : "joined");
      } else if (ev.kind === "peer_leave") {
        content = `[system] Peer "${String(d.name ?? "unknown")}" left the mesh`;
        content = renderPeerLine("left");
      } else {
        content = `[system] ${eventName}: ${JSON.stringify(d).slice(0, 240)}`;
      }
@@ -318,12 +435,55 @@ export async function startMcpServer(): Promise<void> {
              kind: "system",
              event: eventName,
              mesh_slug: String(d.mesh ?? ""),
              ...(typeof d.name === "string" ? { peer_name: d.name } : {}),
              ...(typeof d.pubkey === "string" ? { peer_pubkey: d.pubkey } : {}),
              ...(Array.isArray(d.groups) ? { peer_groups: JSON.stringify(d.groups) } : {}),
              ...(typeof d.lastSeenAt === "string" ? { peer_last_seen_at: d.lastSeenAt } : {}),
              ...(typeof d.summary === "string" ? { peer_summary: d.summary } : {}),
            },
          },
        });
      } catch { /* best effort */ }
    }
  });
  }, { sessionToken: sessionTokenForSeen });

  // 1.34.6 — Welcome: single emit on oninitialized + 3s grace.
  //
  // The earlier "timing race" theory was wrong. Reading Claude Code's
  // binary at the `notifications/claude/channel` Zod schema:
  //
  //   IJ_ = y.object({
  //     method: y.literal("notifications/claude/channel"),
  //     params: y.object({
  //       content: y.string(),
  //       meta: y.record(y.string(), y.string()).optional()
  //     })
  //   })
  //
  // `meta` MUST be a record of string-to-string. Pre-1.34.6 the
  // welcome shipped numbers (`peer_count`, `unread_count`) and arrays
  // (`peer_names`, `latest_message_ids`) — Zod rejected the entire
  // notification before it ever reached the channel handler.
  //
  // Live peer DMs always survived because their meta values all went
  // through `String(...)`. The welcome was the only notification
  // shape with non-string meta — uniquely affected, schema-rejected,
  // silently dropped.
  //
  // 1.34.6 fixes the meta values (see `emitMeshWelcome`) so the
  // notification passes validation; the dual-lane retry from 1.34.5
  // is no longer necessary and would now surface a duplicate. Back to
  // a single emit, with a 3s grace after `oninitialized` — enough for
  // the React effect that registers the channel handler to run, but
  // tight enough to feel like a launch handshake.
  const WELCOME_GRACE_MS = 3_000;
  let welcomeSent = false;
  server.oninitialized = () => {
    mcpLog("server_initialized");
    if (welcomeSent) return;
    welcomeSent = true;
    setTimeout(() => { void emitMeshWelcome(server, mcpLog); }, WELCOME_GRACE_MS);
  };

  const transport = new StdioServerTransport();
  await server.connect(transport);
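The comments above trace the welcome regression to the channel schema's `meta: y.record(y.string(), y.string())`: one non-string value rejects the whole notification. Below is a sketch of a single coercion helper that every emit path could route through, plus a runtime guard usable in a unit test. `toChannelMeta` and `isStringRecord` are hypothetical names. The encoding follows the choices the welcome already makes: numbers and booleans become `String(...)`, and arrays and objects become JSON. Absent values are omitted here, like the conditional spreads on the system path.

```typescript
// Hypothetical helper: channel-notification `meta` must be a record of
// string -> string, so every value is coerced before emitting.
function toChannelMeta(input: Record<string, unknown>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(input)) {
    // Omit absent values rather than sending "undefined" / "null".
    if (value === undefined || value === null) continue;
    if (typeof value === "string") {
      out[key] = value;
    } else if (typeof value === "number" || typeof value === "boolean" || typeof value === "bigint") {
      out[key] = String(value);
    } else {
      // Arrays and objects become JSON so consumers can re-parse them.
      // JSON.stringify returns undefined for functions/symbols: skip those.
      const json = JSON.stringify(value);
      if (json !== undefined) out[key] = json;
    }
  }
  return out;
}

// Runtime check mirroring the string-only constraint, so a test can pin
// the meta shape before it ever reaches Claude Code.
function isStringRecord(value: unknown): value is Record<string, string> {
  return typeof value === "object" && value !== null && !Array.isArray(value) &&
    Object.values(value).every((v) => typeof v === "string");
}
```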
@@ -341,6 +501,193 @@ export async function startMcpServer(): Promise<void> {
  process.on("SIGINT", shutdown);
}

/**
 * Mesh-connected welcome. Runs once, WELCOME_GRACE_MS (3s) after the
 * client signals `initialized`, regardless of inbox state. The point
 * isn't just to summarize unread: an empty welcome still confirms to the
 * user that the mesh pipe is live, names the session, says how many
 * peers are visible, and lists the canonical CLI commands so the model
 * can use them mid-turn.
 *
 * Composes from up to three best-effort daemon queries:
 *   - `/v1/sessions/me` → display name + session pubkey + mesh
 *     (requires session token; absent on bare `claudemesh mcp`)
 *   - `/v1/peers?mesh=…` → live peer count, filtered to non-control-plane
 *   - `/v1/inbox?…` → recent message count + up to 3 previews
 *
 * Each query degrades silently — a missing field becomes "unknown" or
 * is omitted. The welcome ALWAYS emits unless the IPC socket is
 * unreachable; that's the design contract: "you launched into the
 * mesh, here's what you've got."
 */
async function emitMeshWelcome(
  server: import("@modelcontextprotocol/sdk/server/index.js").Server,
  mcpLog: (msg: string, meta?: Record<string, unknown>) => void,
): Promise<void> {
  const { readSessionTokenFromEnv } = await import("~/services/session/token.js");
  const sessionToken = readSessionTokenFromEnv();

  // 1) Self identity. Token-less path (bare `claudemesh mcp` outside a
  // launch) just leaves these undefined; the welcome still goes out.
  let selfDisplayName: string | undefined;
  let selfSessionPubkey: string | undefined;
  let selfMeshSlug: string | undefined;
  let selfRole: string | undefined;
  if (sessionToken) {
    try {
      const { status, body } = await daemonGet("/v1/sessions/me", { sessionToken });
      if (status === 200 && body?.session) {
        selfDisplayName = body.session.displayName;
        selfMeshSlug = body.session.mesh;
        selfRole = body.session.role;
        selfSessionPubkey = body.session.presence?.sessionPubkey;
      }
    } catch (e) { mcpLog("welcome_self_lookup_failed", { err: String(e) }); }
  }

  // 2) Live peer count. Match the same filter the launch banner uses
  // (`channel !== "claudemesh-daemon"`) so the welcome's number agrees
  // with the "N peers online" line that just printed in the terminal.
  // We also fall back to `peerRole !== "control-plane"` for newer
  // brokers that emit the role taxonomy. Self is excluded by session
  // pubkey rather than display name, since two sessions can share a
  // name.
  let peerCount = -1;
  let peerNames: string[] = [];
  try {
    const path = selfMeshSlug ? `/v1/peers?mesh=${encodeURIComponent(selfMeshSlug)}` : "/v1/peers";
    const { status, body } = await daemonGet(path, { sessionToken });
    if (status === 200 && Array.isArray(body?.peers)) {
      const peers = body.peers as Array<Record<string, unknown>>;
      const real = peers.filter((p) => {
        const channel = String(p.channel ?? "");
        const peerRole = String(p.peerRole ?? "");
        const isInfra = channel === "claudemesh-daemon" || peerRole === "control-plane";
        if (isInfra) return false;
        if (selfSessionPubkey && p.pubkey === selfSessionPubkey) return false;
        return true;
      });
      peerCount = real.length;
      peerNames = real
        .map((p) => String(p.displayName ?? "unknown"))
        .filter((n, i, arr) => arr.indexOf(n) === i)
        .slice(0, 5);
      mcpLog("welcome_peers_resolved", { total: peers.length, real: real.length });
    } else {
      mcpLog("welcome_peers_status", { status });
    }
  } catch (e) { mcpLog("welcome_peers_lookup_failed", { err: String(e) }); }

  // 3) Unread inbox. 1.34.8 replaced the "last 24h" window with the
  // proper read-state filter — `?unread_only=true` returns rows whose
  // `seen_at` is NULL. The list call uses `mark_seen=false` so the
  // welcome doesn't auto-stamp; we stamp explicitly via /v1/inbox/seen
  // *after* we know the channel notification went out (otherwise a
  // schema rejection would silently mark rows seen that the user
  // never actually saw — the original 1.34.6 bug shape).
  const inboxPath = selfMeshSlug
    ? `/v1/inbox?mesh=${encodeURIComponent(selfMeshSlug)}&unread_only=true&mark_seen=false&limit=50`
    : `/v1/inbox?unread_only=true&mark_seen=false&limit=50`;
  let inboxItems: Array<Record<string, unknown>> = [];
  try {
    const { status, body } = await daemonGet(inboxPath, { sessionToken });
    if (status === 200 && Array.isArray(body?.items)) {
      inboxItems = body.items as Array<Record<string, unknown>>;
    }
  } catch (e) { mcpLog("welcome_inbox_lookup_failed", { err: String(e) }); }

  // Compose the body. Markdown-friendly so it renders cleanly in the
  // Claude Code channel reminder block.
  const lines: string[] = [];
  const idTag = selfDisplayName
    ? `${selfDisplayName}${selfSessionPubkey ? ` (${selfSessionPubkey.slice(0, 8)})` : ""}${selfRole ? ` [${selfRole}]` : ""}`
    : "session";
  const meshTag = selfMeshSlug ? ` on mesh \`${selfMeshSlug}\`` : "";
  lines.push(`🌐 [welcome] claudemesh connected — you are **${idTag}**${meshTag}.`);

  if (peerCount === 0) {
    lines.push(`👥 No other peers online right now.`);
  } else if (peerCount > 0) {
    const namesPreview = peerNames.join(", ");
    const more = peerCount > peerNames.length ? ` …and ${peerCount - peerNames.length} more` : "";
    lines.push(`👥 ${peerCount} peer${peerCount === 1 ? "" : "s"} online: ${namesPreview}${more}`);
  } else {
    lines.push(`👥 Peer list unavailable (daemon query failed).`);
  }

  if (inboxItems.length === 0) {
    lines.push(`📥 No unread messages.`);
  } else {
    lines.push(`📥 ${inboxItems.length} unread message${inboxItems.length === 1 ? "" : "s"}:`);
    for (const it of inboxItems.slice(0, 3)) {
      const sender = String(it.sender_name ?? "unknown");
      const senderPub = String(it.sender_pubkey ?? "").slice(0, 8);
      const tag = sender !== senderPub ? `${sender} (${senderPub})` : senderPub;
      const bodyText = (typeof it.body === "string" ? it.body : "(encrypted)").slice(0, 60);
      const time = it.received_at ? new Date(String(it.received_at)).toLocaleTimeString() : "";
      lines.push(` ${tag} ${time}: ${bodyText}`);
    }
    if (inboxItems.length > 3) lines.push(` …and ${inboxItems.length - 3} more`);
  }

  // CLI hints — what the model should call when the user asks. Listed
  // here as a one-liner so the welcome stays compact.
  lines.push(`💡 Use: \`claudemesh peer list\` · \`claudemesh send <peer> <msg>\` · \`claudemesh inbox\``);
  // Skill pointer — the `claudemesh` skill in the user's Claude install
  // documents every CLI verb, JSON shapes, channel attributes, and
  // common patterns. If the model isn't already loaded with it, this is
  // the cue to read it once before acting on the mesh.
  lines.push(`📚 Read the \`claudemesh\` skill (SKILL.md) for full CLI / channel / inbox reference if not yet in context.`);

  const content = lines.join("\n");
  try {
    // Claude Code's `notifications/claude/channel` schema is
    // `meta: y.record(y.string(), y.string())` — string values only.
    // Pre-1.34.6 we sent numbers / arrays in `peer_count`, `unread_count`,
    // `peer_names`, `latest_message_ids`; Zod silently rejected the
    // whole notification before it reached the channel handler. Live
    // peer DMs survived because their meta values all went through
    // `String(...)`. Coerce everything here too — arrays stringify as
    // JSON so downstream consumers can re-parse if they want, and the
    // counts become digit strings (parseable on the receiving side).
    await server.notification({
      method: "notifications/claude/channel",
      params: {
        content,
        meta: {
          kind: "welcome",
          self_display_name: selfDisplayName ?? "",
          self_session_pubkey: selfSessionPubkey ?? "",
          self_role: selfRole ?? "",
          mesh_slug: selfMeshSlug ?? "",
          peer_count: peerCount >= 0 ? String(peerCount) : "",
          peer_names: JSON.stringify(peerNames),
          unread_count: String(inboxItems.length),
          latest_message_ids: JSON.stringify(
            inboxItems.slice(0, 10).map((it) => String(it.id ?? "")),
          ),
        },
      },
    });
    mcpLog("welcome_emitted", {
      mesh: selfMeshSlug ?? "",
      peer_count: peerCount,
      unread_count: inboxItems.length,
    });
    // 1.34.8: stamp the rows we just surfaced. Done AFTER the
    // notification succeeds so a Zod-rejected welcome (the 1.34.6 bug
    // shape) doesn't silently mark rows seen that the user never
    // actually saw. Best-effort.
    if (inboxItems.length > 0) {
      const ids = inboxItems.map((it) => String(it.id ?? "")).filter(Boolean);
      if (ids.length > 0) {
        void daemonMarkSeen(ids, sessionToken).catch(() => { /* swallow */ });
      }
    }
  } catch (err) {
    mcpLog("welcome_emit_failed", { err: String(err) });
  }
}

// ── mesh-service proxy mode (unchanged from prior versions) ────────────

/**
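Step 2 of `emitMeshWelcome` filters out daemon and control-plane rows and the session itself, then de-duplicates display names for the preview. Below is a pure-function sketch of that step so it can be unit-tested without a running daemon. `PeerRow` and `summarizePeers` are hypothetical names. The field names are assumed from what the welcome reads off `/v1/peers` rows: `pubkey`, `displayName`, `channel`, and `peerRole`.

```typescript
// Assumed shape of a `/v1/peers` row (only the fields the welcome reads).
interface PeerRow { pubkey?: string; displayName?: string; channel?: string; peerRole?: string }

// Hypothetical pure version of the welcome's peer filter.
function summarizePeers(
  peers: PeerRow[],
  selfPubkey?: string,
  maxNames = 5,
): { count: number; names: string[] } {
  const real = peers.filter((p) => {
    // Same infra test as the welcome: daemon channel or control-plane role.
    const isInfra = p.channel === "claudemesh-daemon" || p.peerRole === "control-plane";
    if (isInfra) return false;
    // Exclude this session by pubkey, never by (possibly shared) name.
    return !(selfPubkey && p.pubkey === selfPubkey);
  });
  // A Set keeps first-seen order, so this matches the indexOf-based
  // de-duplication in linear rather than quadratic time.
  const names = Array.from(new Set(real.map((p) => p.displayName ?? "unknown"))).slice(0, maxNames);
  return { count: real.length, names };
}
```

`count` counts sessions while `names` is de-duplicated. As a result, the welcome's "…and N more" suffix also counts same-named sessions that are already represented in the preview.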
@@ -52,6 +52,105 @@ export async function tryListPeersViaDaemon(mesh?: string): Promise<unknown[] |
|
||||
}
|
||||
}
|
||||
|
||||
/**
|
||||
* 1.34.0 — Try fetching the persisted inbox from the daemon.
|
||||
*
|
||||
* Reads from `~/.claudemesh/daemon/inbox.db` via `/v1/inbox`. This is
|
||||
* the authoritative source of received messages — pushes from the
|
||||
* broker land here through the daemon's session-WS / member-WS push
|
||||
* handler. The pre-1.34.0 cold-path inbox command opened a fresh
|
||||
* BrokerClient and drained an empty in-memory buffer, which never
|
||||
* matched what the daemon was actually receiving.
|
||||
*/
|
||||
export interface InboxItem {
|
||||
id: string;
|
||||
client_message_id: string;
|
||||
broker_message_id: string | null;
|
||||
mesh: string;
|
||||
topic: string | null;
|
||||
sender_pubkey: string;
|
||||
sender_name: string;
|
||||
body: string | null;
|
||||
received_at: string;
|
||||
reply_to_id: string | null;
|
||||
/** 1.34.8: ISO timestamp of when the row was first surfaced to the
|
||||
* user (interactive listing or live channel reminder). `null` =
|
||||
* never seen. */
|
||||
seen_at?: string | null;
|
||||
}
|
||||
|
||||
export async function tryListInboxViaDaemon(
|
||||
mesh?: string,
|
||||
limit = 100,
|
||||
opts: { unreadOnly?: boolean; markSeen?: boolean } = {},
|
||||
): Promise<InboxItem[] | null> {
|
||||
if (!(await daemonReachable())) return null;
|
||||
try {
|
||||
const params: string[] = [`limit=${limit}`];
|
||||
if (mesh) params.push(`mesh=${encodeURIComponent(mesh)}`);
|
||||
// 1.34.8: read-state filters. `unread_only=true` narrows to seen_at
|
||||
// IS NULL; `mark_seen=false` lets the caller peek without flipping
|
||||
// the seen flag (used by the welcome push on the MCP side, not the
|
||||
// CLI). Default behavior matches pre-1.34.8 — return everything
|
||||
// and stamp it seen — so existing callers keep working.
|
||||
if (opts.unreadOnly) params.push("unread_only=true");
|
||||
if (opts.markSeen === false) params.push("mark_seen=false");
|
||||
const path = `/v1/inbox?${params.join("&")}`;
|
||||
const res = await ipc<{ items?: InboxItem[] }>({ path, timeoutMs: 3_000 });
|
||||
if (res.status !== 200) return null;
|
||||
return Array.isArray(res.body.items) ? res.body.items : [];
|
||||
} catch (err) {
|
||||
const msg = String(err);
|
||||
if (/ENOENT|ECONNREFUSED|ipc_timeout/.test(msg)) return null;
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
/**
 * 1.34.7: bulk-delete inbox rows. `mesh` scopes to one mesh (omit =
 * across every attached mesh); `beforeIso` filters by `received_at <
 * Date.parse(beforeIso)`. Returns the number of rows removed, or null
 * when the daemon couldn't be reached.
 */
export async function tryFlushInboxViaDaemon(
  args: { mesh?: string; beforeIso?: string } = {},
): Promise<number | null> {
  if (!(await daemonReachable())) return null;
  try {
    const params: string[] = [];
    if (args.mesh) params.push(`mesh=${encodeURIComponent(args.mesh)}`);
    if (args.beforeIso) params.push(`before=${encodeURIComponent(args.beforeIso)}`);
    const path = `/v1/inbox${params.length ? `?${params.join("&")}` : ""}`;
    const res = await ipc<{ removed?: number }>({ path, method: "DELETE", timeoutMs: 3_000 });
    if (res.status !== 200) return null;
    return typeof res.body.removed === "number" ? res.body.removed : null;
  } catch (err) {
    const msg = String(err);
    if (/ENOENT|ECONNREFUSED|ipc_timeout/.test(msg)) return null;
    return null;
  }
}

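The flush semantics in the doc comment (optional mesh scope plus a strict `received_at < Date.parse(beforeIso)` cutoff) can be pinned down with an in-memory sketch. This assumes a hypothetical `InboxRow` shape and `flushInbox` helper; the real daemon applies the filter in SQLite, which is not part of this diff.

```typescript
// Hypothetical row shape; the real daemon stores inbox rows in SQLite.
interface InboxRow {
  id: string;
  mesh: string;
  receivedAt: string; // ISO-8601 timestamp
}

// Splits rows into kept/removed: `mesh` narrows to one mesh (unset = every
// mesh) and `beforeIso` removes only rows strictly older than the cutoff.
// An unset or empty `beforeIso` means "no age filter", mirroring the
// truthiness check in tryFlushInboxViaDaemon.
function flushInbox(
  rows: InboxRow[],
  args: { mesh?: string; beforeIso?: string } = {},
): { kept: InboxRow[]; removed: number } {
  const cutoff = args.beforeIso ? Date.parse(args.beforeIso) : Infinity;
  if (Number.isNaN(cutoff)) throw new Error(`invalid beforeIso: ${args.beforeIso}`);
  const kept: InboxRow[] = [];
  let removed = 0;
  for (const row of rows) {
    const meshMatch = args.mesh === undefined || row.mesh === args.mesh;
    const older = Date.parse(row.receivedAt) < cutoff;
    if (meshMatch && older) removed++;
    else kept.push(row);
  }
  return { kept, removed };
}
```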
/** 1.34.7: delete one inbox row by id. Returns true iff the row was
 * removed; false on 404; null on transport failure. */
export async function tryDeleteInboxRowViaDaemon(id: string): Promise<boolean | null> {
  if (!(await daemonReachable())) return null;
  try {
    const res = await ipc<{ removed?: number }>({
      path: `/v1/inbox/${encodeURIComponent(id)}`,
      method: "DELETE",
      timeoutMs: 3_000,
    });
    if (res.status === 404) return false;
    if (res.status !== 200) return null;
    return (res.body.removed ?? 0) > 0;
  } catch (err) {
    const msg = String(err);
    if (/ENOENT|ECONNREFUSED|ipc_timeout/.test(msg)) return null;
    return null;
  }
}

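`tryDeleteInboxRowViaDaemon` returns a tri-state rather than a boolean so callers can tell "row not found" apart from "daemon unavailable". A sketch of that mapping as a pure function, assuming `undefined` stands in for an IPC call that threw; `DeleteResponse` and `interpretDelete` are illustrative names.

```typescript
// Hypothetical minimal response type; mirrors the `res.status` and
// `res.body.removed` fields read in the hunk above.
interface DeleteResponse {
  status: number;
  body: { removed?: number };
}

// true  -> the daemon removed at least one row
// false -> the daemon answered 404, or 200 with nothing removed
// null  -> transport failure or any other status
function interpretDelete(res: DeleteResponse | undefined): boolean | null {
  if (res === undefined) return null; // IPC threw / daemon unreachable
  if (res.status === 404) return false;
  if (res.status !== 200) return null;
  return (res.body.removed ?? 0) > 0;
}
```

A caller could report "no such message" on `false` and fall back to the cold path on `null`; collapsing both into `false` would hide a daemon outage behind a misleading not-found.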
/** Try fetching mesh-published skills through the daemon. */
export async function tryListSkillsViaDaemon(mesh?: string): Promise<unknown[] | null> {
  if (!(await daemonReachable())) return null;

@@ -30,6 +30,7 @@
 */

import { existsSync, readFileSync, statSync, unlinkSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

import { ipc, IpcError } from "~/daemon/ipc/client.js";
@@ -40,7 +41,11 @@ export type DaemonReadyState =
  | "started"
  | "down"
  | "spawn-failed"
  | "spawn-suppressed";
  | "spawn-suppressed"
  /** 1.31.0+: launchd / systemd manages the daemon and it didn't respond
   * within the service budget. Distinct from spawn-failed: the CLI did
   * not attempt to spawn (the OS owns the lifecycle). */
  | "service-not-ready";

export interface EnsureDaemonResult {
  state: DaemonReadyState;
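Adding `"service-not-ready"` widens the `DaemonReadyState` union, and every `switch` over it has to learn the new member. One way to have the compiler enforce that is a `never` check in the default branch. The sketch assumes the union also contains `"up"` (the lifecycle code returns `state: "up"`, but that member sits above the hunk), and the cold-path classification in the hypothetical `usesColdPath` helper is an assumption based on the warnings printed by `warnDaemonState`.

```typescript
// Re-declared so the sketch is self-contained; "up" is assumed to be a
// member because runEnsureDaemon returns { state: "up" }.
type DaemonReadyState =
  | "up"
  | "started"
  | "down"
  | "spawn-failed"
  | "spawn-suppressed"
  | "service-not-ready";

// Hypothetical helper: true when the CLI should skip the daemon and use
// the cold path instead.
function usesColdPath(state: DaemonReadyState): boolean {
  switch (state) {
    case "up":
    case "started":
      return false;
    case "down":
    case "spawn-failed":
    case "spawn-suppressed":
    case "service-not-ready":
      return true;
    default: {
      // Exhaustiveness check: a new union member that is not handled
      // above makes this assignment a compile-time error.
      const unreachable: never = state;
      throw new Error(`unhandled state: ${String(unreachable)}`);
    }
  }
}
```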
@@ -62,7 +67,16 @@ export interface EnsureDaemonOpts {
const SPAWN_LOCK_FILE = () => join(DAEMON_PATHS.DAEMON_DIR, ".spawn.lock");
const SPAWN_FAIL_FILE = () => join(DAEMON_PATHS.DAEMON_DIR, ".spawn-failure");
const SPAWN_FAIL_TTL_MS = 30_000;
const PROBE_TIMEOUT_MS = 800;
// 1.31.0: 800 ms was too tight — the daemon's first IPC after a launchd
// (re)start can take a beat while it migrates SQLite, opens broker WSes,
// and warms up the event loop. False "stale" probes triggered the
// pointless spawn → "socket did not appear" warning even on a perfectly
// healthy service-managed daemon. 2500 ms still bounds the worst case.
const PROBE_TIMEOUT_MS = 2_500;
// When the daemon is service-managed (launchd/systemd) and KeepAlive=true,
// the OS guarantees a restart on death — the CLI must NOT race that with
// its own spawn. Just wait longer for the service unit to come up.
const SERVICE_BUDGET_MS = 8_000;

let lastResultThisProcess: EnsureDaemonResult | null = null;

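`SERVICE_BUDGET_MS` and the raised `PROBE_TIMEOUT_MS` bound how long the CLI waits for a launchd/systemd-managed daemon to come up. `pollForSocket` itself is outside this diff; a minimal sketch of a deadline-bounded poll loop, using a hypothetical `pollUntil` name and an injected probe, could look like this.

```typescript
// Hypothetical stand-in for pollForSocket(): retry an async readiness
// probe until it succeeds or the time budget runs out.
async function pollUntil(
  probe: () => Promise<boolean>,
  budgetMs: number,
  intervalMs = 100,
): Promise<{ ok: boolean; attempts: number }> {
  const deadline = Date.now() + budgetMs;
  let attempts = 0;
  for (;;) {
    attempts++;
    if (await probe()) return { ok: true, attempts };
    const remaining = deadline - Date.now();
    if (remaining <= 0) return { ok: false, attempts };
    // Never sleep past the deadline.
    await new Promise((resolve) => setTimeout(resolve, Math.min(intervalMs, remaining)));
  }
}
```

With an injected probe, the loop is testable without a real socket; in the CLI the probe would be the IPC health check bounded by `PROBE_TIMEOUT_MS`.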
@@ -91,9 +105,30 @@ async function runEnsureDaemon(opts: EnsureDaemonOpts): Promise<EnsureDaemonResu
  // Step 1 — probe.
  const probe = await probeDaemon();
  if (probe === "up") return { state: "up", durationMs: Date.now() - t0 };

  // Step 2 — service-managed shortcut. When launchd / systemd manages
  // the daemon and KeepAlive is set, the OS will restart a crashed
  // daemon on its own; the CLI must NOT race that with its own spawn
  // (would double-bind the singleton lock and trigger "daemon already
  // running" errors). Just wait quietly for the service to bring the
  // socket up.
  if (isServiceManaged()) {
    if (probe === "stale") cleanupStaleFiles();
    const polled = await pollForSocket(SERVICE_BUDGET_MS);
    if (polled.ok) return { state: "up", durationMs: Date.now() - t0 };
    const tool = process.platform === "darwin"
      ? `launchctl print gui/$(id -u)/${SERVICE_LABEL}`
      : `systemctl --user status ${SYSTEMD_UNIT}`;
    return {
      state: "service-not-ready",
      durationMs: Date.now() - t0,
      reason: `service-managed daemon not responding within ${SERVICE_BUDGET_MS}ms (run \`${tool}\`)`,
    };
  }

  if (probe === "stale") cleanupStaleFiles();

  // Step 2 — auto-spawn unless forbidden.
  // Step 3 — auto-spawn unless forbidden.
  if (opts.noAutoSpawn) {
    return { state: "down", durationMs: Date.now() - t0, reason: "auto-spawn disabled" };
  }
@@ -105,17 +140,37 @@ async function runEnsureDaemon(opts: EnsureDaemonOpts): Promise<EnsureDaemonResu
    };
  }

  // Step 3 — spawn detached.
  // Step 4 — spawn detached.
  const spawnRes = await spawnDaemon(opts);
  if (spawnRes.ok) {
    return { state: "started", durationMs: Date.now() - t0 };
  }

  // Step 4 — record failure for backoff and report.
  // Step 5 — record failure for backoff and report.
  markSpawnFailure();
  return { state: "spawn-failed", durationMs: Date.now() - t0, reason: spawnRes.reason };
}

const SERVICE_LABEL = "com.claudemesh.daemon";
const SYSTEMD_UNIT = "claudemesh-daemon.service";

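The renumbered steps in `runEnsureDaemon` form a fixed decision order. Reducing it to a pure function makes the precedence explicit: the CLI never spawns a service-managed daemon, even when the probe reports a stale socket. `decide`, `EnsureInputs` and `Action` are hypothetical, and placing the backoff check (`spawn-suppressed`) after the `noAutoSpawn` check is an assumption, since that code falls between the two hunks.

```typescript
type Probe = "up" | "absent" | "stale";

// Hypothetical inputs; the real function reads these from probeDaemon(),
// isServiceManaged(), opts.noAutoSpawn and the spawn-failure marker file.
interface EnsureInputs {
  probe: Probe;
  serviceManaged: boolean;
  noAutoSpawn: boolean;
  recentSpawnFailure: boolean;
}

type Action = "return-up" | "wait-for-service" | "report-down" | "report-suppressed" | "spawn";

// Mirrors the step order: probe, service-managed shortcut, opt-out,
// backoff, spawn. Stale-file cleanup is a side effect in the real code
// and is left out here.
function decide(i: EnsureInputs): Action {
  if (i.probe === "up") return "return-up";
  if (i.serviceManaged) return "wait-for-service"; // never race launchd/systemd
  if (i.noAutoSpawn) return "report-down";
  if (i.recentSpawnFailure) return "report-suppressed";
  return "spawn";
}
```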
/**
 * Returns true when the user has installed the daemon as a launchd
 * agent (macOS) or systemd --user unit (Linux). We detect by file
 * presence rather than shelling out to launchctl/systemctl on every
 * CLI invocation — this stays cheap and avoids spurious permission
 * prompts on locked-down hosts.
 */
function isServiceManaged(): boolean {
  if (process.platform === "darwin") {
    return existsSync(join(homedir(), "Library", "LaunchAgents", `${SERVICE_LABEL}.plist`));
  }
  if (process.platform === "linux") {
    return existsSync(join(homedir(), ".config", "systemd", "user", SYSTEMD_UNIT));
  }
  return false;
}

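`isServiceManaged` keys off the presence of a unit file. Separating path selection from the `existsSync` call gives a pure function that can be tested on any host; `serviceUnitPath` is a hypothetical refactor, not code from this commit.

```typescript
import { posix } from "node:path";

// Same labels as the hunk above.
const SERVICE_LABEL = "com.claudemesh.daemon";
const SYSTEMD_UNIT = "claudemesh-daemon.service";

// Returns the unit file whose presence marks the daemon as service-managed,
// or null on platforms without a supported service manager. Taking the
// platform and home directory as parameters keeps it free of disk access.
function serviceUnitPath(platform: string, home: string): string | null {
  if (platform === "darwin") {
    return posix.join(home, "Library", "LaunchAgents", `${SERVICE_LABEL}.plist`);
  }
  if (platform === "linux") {
    return posix.join(home, ".config", "systemd", "user", SYSTEMD_UNIT);
  }
  return null;
}
```

The check itself then becomes `const p = serviceUnitPath(process.platform, homedir()); const managed = p !== null && existsSync(p);`.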
async function probeDaemon(): Promise<"up" | "absent" | "stale"> {
  if (!existsSync(DAEMON_PATHS.SOCK_FILE)) return "absent";
  try {
@@ -165,8 +220,12 @@ async function spawnDaemon(opts: EnsureDaemonOpts): Promise<SpawnResult> {
  try {
    const { spawn } = await import("node:child_process");
    const binary = await resolveCliBinary();
    const args = ["daemon", "up"];
    if (opts.mesh) args.push("--mesh", opts.mesh);
    // 1.34.12: pass --foreground because the lifecycle helper IS the
    // detacher in this path — it spawns with detached:true + stdio:
    // ignore. If we let the child re-detach (the new default), we'd
    // double-fork and orphan the grandchild. --mesh is dropped (1.34.10
    // deprecation; daemon attaches to every joined mesh).
    const args = ["daemon", "up", "--foreground"];

    const child = spawn(binary, args, {
      detached: true,

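The 1.34.12 comment relies on a split of responsibility: the lifecycle helper detaches, so the daemon it starts must stay in the foreground. A minimal sketch of the detaching side with Node's `child_process.spawn`; `spawnDetached` is a hypothetical name, and the real helper also resolves the CLI binary and reports spawn failures.

```typescript
import { spawn } from "node:child_process";

// Launch a child that outlives the parent CLI process.
// - detached: true  -> on POSIX the child leads a new process group and session
// - stdio: "ignore" -> no pipes to the parent keep either side alive
// - unref()         -> the parent's event loop can exit without waiting
function spawnDetached(binary: string, args: string[]): number | undefined {
  const child = spawn(binary, args, { detached: true, stdio: "ignore" });
  child.unref();
  return child.pid;
}
```

Called as `spawnDetached(binary, ["daemon", "up", "--foreground"])`, the daemon is the direct, already-detached child. If it forked again on its own, the process the helper spawned would exit and the real daemon would be an orphaned grandchild, which is the double-fork the comment describes.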
@@ -28,6 +28,14 @@ export interface ResolvedSession {
  cwd?: string;
  role?: string;
  groups?: string[];
  /** 1.32.0+: per-launch presence material lifted from the daemon's
   * registry so callers (peer list, whoami) can identify themselves
   * in the broker's peer list without re-handshaking a fresh WS. */
  presence?: {
    sessionPubkey: string;
    sessionSecretKey: string;
    parentAttestation?: unknown;
  };
}

let cached: ResolvedSession | null | undefined = undefined;

@@ -50,6 +50,9 @@ export function warnDaemonState(
    case "spawn-failed":
      process.stderr.write(`${tag("warn")} daemon spawn failed${res.reason ? `: ${res.reason}` : ""} — using cold path ${hint("(check ~/.claudemesh/daemon/daemon.log)")}\n`);
      return true;
    case "service-not-ready":
      process.stderr.write(`${tag("warn")} ${res.reason ?? "service-managed daemon not responding"} — using cold path ${hint("(check ~/.claudemesh/daemon/daemon.log)")}\n`);
      return true;
  }
  return false;
}

57
apps/cli/tests/unit/paths-stale-env.test.ts
Normal file
@@ -0,0 +1,57 @@
import { describe, it, expect, beforeEach, afterEach, vi } from "vitest";
import { mkdirSync, rmSync, existsSync } from "node:fs";
import { join } from "node:path";
import { tmpdir, homedir } from "node:os";

/** Each test imports a fresh copy of paths.ts via dynamic import +
 * `_resetPathsForTest()` so memoization doesn't leak across cases. */

const TEST_DIR = join(tmpdir(), "claudemesh-paths-test-" + Date.now());

describe("paths CONFIG_DIR resolution", () => {
  beforeEach(() => {
    delete process.env.CLAUDEMESH_CONFIG_DIR;
    if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
  });
  afterEach(() => {
    delete process.env.CLAUDEMESH_CONFIG_DIR;
    if (existsSync(TEST_DIR)) rmSync(TEST_DIR, { recursive: true, force: true });
  });

  it("falls back to ~/.claudemesh when env var is unset", async () => {
    const mod = await import("~/constants/paths.js");
    mod._resetPathsForTest();
    expect(mod.PATHS.CONFIG_DIR).toBe(join(homedir(), ".claudemesh"));
  });

  it("honors CLAUDEMESH_CONFIG_DIR when the dir exists, even without config.json", async () => {
    mkdirSync(TEST_DIR, { recursive: true });
    process.env.CLAUDEMESH_CONFIG_DIR = TEST_DIR;
    const mod = await import("~/constants/paths.js");
    mod._resetPathsForTest();
    expect(mod.PATHS.CONFIG_DIR).toBe(TEST_DIR);
  });

  it("falls back to default when env points at a missing dir (stale-tmpdir case)", async () => {
    process.env.CLAUDEMESH_CONFIG_DIR = "/var/folders/_nonexistent_claudemesh_dir_xyz123";
    const mod = await import("~/constants/paths.js");
    mod._resetPathsForTest();
    // Suppress the stderr warning to keep test output clean
    const stderr = vi.spyOn(process.stderr, "write").mockImplementation(() => true);
    try {
      expect(mod.PATHS.CONFIG_DIR).toBe(join(homedir(), ".claudemesh"));
    } finally {
      stderr.mockRestore();
    }
  });

  it("memoizes — second access returns the same path even if env changes mid-process", async () => {
    mkdirSync(TEST_DIR, { recursive: true });
    process.env.CLAUDEMESH_CONFIG_DIR = TEST_DIR;
    const mod = await import("~/constants/paths.js");
    mod._resetPathsForTest();
    const first = mod.PATHS.CONFIG_DIR;
    process.env.CLAUDEMESH_CONFIG_DIR = "/something/else";
    expect(mod.PATHS.CONFIG_DIR).toBe(first);
  });
});
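The four tests above pin down the `CONFIG_DIR` rules that the 1.34.14 changelog entry also describes: env unset, env pointing at a real directory, env pointing at a vanished directory, and memoization. A sketch of a resolver that satisfies them, with the environment, existence check, and home directory injected for testing; `makeConfigDirResolver` is hypothetical, and `paths.ts` itself is not part of this diff.

```typescript
import { existsSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical resolver:
//   env unset            -> ~/.claudemesh
//   env points at a dir  -> trust it (config.json not required)
//   env set, dir missing -> warn on a TTY, fall back to ~/.claudemesh
// The first result is memoized; later env changes are ignored until reset().
function makeConfigDirResolver(
  env: Record<string, string | undefined>,
  exists: (p: string) => boolean = existsSync,
  home: string = homedir(),
) {
  let memo: string | undefined;
  const fallback = join(home, ".claudemesh");
  return {
    get(): string {
      if (memo !== undefined) return memo;
      const fromEnv = env.CLAUDEMESH_CONFIG_DIR;
      if (!fromEnv) {
        memo = fallback;
      } else if (exists(fromEnv)) {
        memo = fromEnv;
      } else {
        if (process.stderr.isTTY) {
          process.stderr.write(`CLAUDEMESH_CONFIG_DIR=${fromEnv} does not exist; using ${fallback}\n`);
        }
        memo = fallback;
      }
      return memo;
    },
    reset(): void {
      memo = undefined;
    },
  };
}
```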
129
apps/cli/tests/unit/session-reaper.test.ts
Normal file
@@ -0,0 +1,129 @@
/**
 * Session reaper — PID-watcher autoclean (1.31.0).
 *
 * Verifies that registry entries are dropped when:
 *   1. their pid is no longer alive,
 *   2. their pid is alive but its start-time changed since register
 *      (PID reuse — original process gone, OS recycled the number).
 *
 * The reaper is the autoclean source-of-truth: process-exit IPC from
 * the launched session is best-effort (skipped on SIGKILL, OOM, hard
 * crash, kernel panic) so this sweep is what actually keeps the
 * broker presence honest. Both signals must work or stale "ghost"
 * sessions linger on the broker.
 */

import { afterEach, describe, expect, test, vi } from "vitest";

import {
  _resetRegistry,
  _runReaperOnce,
  listSessions,
  registerSession,
  setRegistryHooks,
  type SessionInfo,
} from "../../src/daemon/session-registry.js";

afterEach(() => {
  _resetRegistry();
  vi.restoreAllMocks();
});

describe("session reaper", () => {
  test("drops entry when pid is dead", async () => {
    const onDeregister = vi.fn();
    setRegistryHooks({ onDeregister });

    // Use a high pid that is exceedingly unlikely to be alive on any
    // host — the alive check uses signal 0 which returns ESRCH for
    // unused pids.
    registerSession({
      token: "a".repeat(64),
      sessionId: "sess-dead",
      mesh: "m",
      displayName: "x",
      pid: 999_999,
      startTime: "Fri May 1 09:00:00 2026",
    });
    expect(listSessions()).toHaveLength(1);

    await _runReaperOnce();

    expect(listSessions()).toHaveLength(0);
    expect(onDeregister).toHaveBeenCalledTimes(1);
    const arg = onDeregister.mock.calls[0]![0] as SessionInfo;
    expect(arg.sessionId).toBe("sess-dead");
  });

  test("keeps entry when pid is alive and start-time matches", async () => {
    const onDeregister = vi.fn();
    setRegistryHooks({ onDeregister });

    // Use the test runner's own pid (process.pid is always alive here)
    // and capture its real start-time so the start-time guard sees a
    // match. Pre-seed startTime so registerSession's async ps probe
    // doesn't race the test.
    const { execFileSync } = require("node:child_process");
    const realStart = execFileSync("ps", ["-o", "lstart=", "-p", String(process.pid)], {
      encoding: "utf8",
    }).trim();

    registerSession({
      token: "b".repeat(64),
      sessionId: "sess-live",
      mesh: "m",
      displayName: "x",
      pid: process.pid,
      startTime: realStart,
    });

    await _runReaperOnce();

    expect(listSessions()).toHaveLength(1);
    expect(onDeregister).not.toHaveBeenCalled();
  });

  test("drops entry when pid is alive but start-time mismatched (PID reuse)", async () => {
    const onDeregister = vi.fn();
    setRegistryHooks({ onDeregister });

    // Pid IS alive (process.pid) but we register a fake start-time
    // that won't match. Reaper must reap.
    registerSession({
      token: "c".repeat(64),
      sessionId: "sess-reused",
      mesh: "m",
      displayName: "x",
      pid: process.pid,
      startTime: "Sat Jan 1 00:00:00 1980",
    });

    await _runReaperOnce();

    expect(listSessions()).toHaveLength(0);
    expect(onDeregister).toHaveBeenCalledTimes(1);
  });

  test("keeps entry when start-time wasn't captured (best-effort fallback)", async () => {
    const onDeregister = vi.fn();
    setRegistryHooks({ onDeregister });

    // Register without startTime → reaper falls back to bare liveness.
    // process.pid is alive, so the entry must survive. (The fire-and-
    // forget capture inside registerSession will eventually populate
    // startTime, but it does so after a real fork — for this test we
    // rely on the synchronous reaper pass not seeing it yet.)
    registerSession({
      token: "d".repeat(64),
      sessionId: "sess-no-start-" + Math.random().toString(36).slice(2),
      mesh: "m",
      displayName: "x",
      pid: process.pid,
    });

    await _runReaperOnce();

    expect(listSessions()).toHaveLength(1);
    expect(onDeregister).not.toHaveBeenCalled();
  });
});
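The reaper rules these tests encode reduce to a small predicate over a registry entry. The sketch injects the liveness and start-time lookups so it runs without real processes; `TrackedSession`, `shouldReap` and `pidAlive` are hypothetical names, and keeping the entry when the current start time cannot be read is an assumption the tests do not cover.

```typescript
// Hypothetical registry entry; the real SessionInfo carries more fields.
interface TrackedSession {
  sessionId: string;
  pid: number;
  startTime?: string; // process start time captured at register, if any
}

function shouldReap(
  s: TrackedSession,
  isAlive: (pid: number) => boolean,
  readStartTime: (pid: number) => string | undefined,
): boolean {
  if (!isAlive(s.pid)) return true; // process is gone
  if (s.startTime === undefined) return false; // best-effort: liveness only
  const current = readStartTime(s.pid);
  if (current === undefined) return false; // assumption: unknown means keep
  return current !== s.startTime; // same pid, different process: PID reuse
}

// POSIX liveness probe via signal 0, as the test comments describe.
// ESRCH means no such process; EPERM means it exists but is owned by
// another user, so it still counts as alive.
function pidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}
```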
@@ -1,55 +1,161 @@
import Link from "next/link";

import {
  CHANGELOG_ENTRIES,
  CHANGELOG_TYPE_COLOR,
  CHANGELOG_TYPE_LABELS,
} from "~/modules/marketing/home/changelog-data";

export const metadata = {
  title: "Changelog — claudemesh",
  description: "Release history for claudemesh-cli.",
  description:
    "Release history for claudemesh-cli — every shipped version, with the why behind it.",
};

const ENTRIES = [
  { version: "0.1.4", date: "2026-04-06", type: "feat", summary: "Stateful welcome screen, PROTOCOL.md, THREAT_MODEL.md, Windows CI matrix" },
  { version: "0.1.3", date: "2026-04-05", type: "feat", summary: "claudemesh --version, status, doctor commands" },
  { version: "0.1.2", date: "2026-04-05", type: "feat", summary: "claudemesh launch command, transparency banner, decrypt fix, Windows support" },
];

const TYPE_LABELS: Record<string, string> = { feat: "Feature", fix: "Fix", docs: "Docs" };
const TYPE_COLORS: Record<string, string> = { feat: "bg-[var(--cm-clay)]", fix: "bg-[var(--cm-cactus)]", docs: "bg-[var(--cm-oat)]" };

export default function ChangelogPage() {
  return (
    <section className="mx-auto max-w-3xl px-6 py-24 md:py-32">
      <h1
        className="text-[clamp(2rem,4.5vw,3rem)] font-medium leading-[1.1] text-[var(--cm-fg)]"
        style={{ fontFamily: "var(--cm-font-serif)" }}
      >
        Changelog
      </h1>
      <p
        className="mt-4 text-[15px] text-[var(--cm-fg-secondary)]"
        style={{ fontFamily: "var(--cm-font-sans)" }}
      >
        Every shipped version of claudemesh-cli.
      </p>
      <div className="mt-12 space-y-8">
        {ENTRIES.map((entry) => (
          <article key={entry.version} className="border-b border-[var(--cm-border)] pb-6">
            <div className="flex items-center gap-3">
              <span
                className={`rounded-[4px] px-2 py-0.5 text-[10px] font-medium uppercase tracking-wider text-[var(--cm-bg)] ${TYPE_COLORS[entry.type] || "bg-[var(--cm-fg-tertiary)]"}`}
                style={{ fontFamily: "var(--cm-font-mono)" }}
              >
                {TYPE_LABELS[entry.type] || entry.type}
              </span>
              <span className="text-[18px] font-medium text-[var(--cm-fg)]" style={{ fontFamily: "var(--cm-font-serif)" }}>
                v{entry.version}
              </span>
              <time dateTime={entry.date} className="text-[11px] text-[var(--cm-fg-tertiary)]" style={{ fontFamily: "var(--cm-font-mono)" }}>
                {new Date(entry.date).toLocaleDateString("en-US", { year: "numeric", month: "short", day: "numeric" })}
              </time>
            </div>
            <p className="mt-2 text-[14px] leading-[1.6] text-[var(--cm-fg-secondary)]" style={{ fontFamily: "var(--cm-font-sans)" }}>
              {entry.summary}
            </p>
          </article>
        ))}
      <div className="mb-12">
        <p
          className="text-[11px] uppercase tracking-[0.2em] text-[var(--cm-fg-tertiary)]"
          style={{ fontFamily: "var(--cm-font-mono)" }}
        >
          claudemesh-cli · release log
        </p>
        <h1
          className="mt-3 text-[clamp(2rem,4.5vw,3rem)] font-medium leading-[1.1] text-[var(--cm-fg)]"
          style={{ fontFamily: "var(--cm-font-serif)" }}
        >
          Changelog
        </h1>
        <p
          className="mt-4 max-w-xl text-[15px] leading-[1.65] text-[var(--cm-fg-secondary)]"
          style={{ fontFamily: "var(--cm-font-sans)" }}
        >
          Hand-picked, load-bearing ships from{" "}
          <span className="text-[var(--cm-fg)]">v0.1.0</span> through{" "}
          <span className="text-[var(--cm-clay)]">v1.34.15</span>. For the
          byte-level diff, the canonical{" "}
          <Link
            href="https://github.com/alezmad/claudemesh/blob/main/apps/cli/CHANGELOG.md"
            className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
          >
            CHANGELOG.md
          </Link>{" "}
          lives in the repo.
        </p>
      </div>

      {/* Vertical timeline rail */}
      <div className="relative">
        <div
          className="absolute left-[7px] top-2 hidden h-full w-px md:block"
          style={{
            background:
              "linear-gradient(to bottom, var(--cm-clay) 0%, var(--cm-fig) 30%, var(--cm-cactus) 60%, transparent 100%)",
          }}
        />

        <div className="space-y-10">
          {CHANGELOG_ENTRIES.map((entry, idx) => (
            <article
              key={entry.version + entry.date}
              className="relative md:pl-10"
            >
              {/* Dot on rail */}
              <div
                className="absolute left-0 top-[10px] hidden h-[15px] w-[15px] rounded-full border-2 md:block"
                style={{
                  borderColor: CHANGELOG_TYPE_COLOR[entry.type],
                  backgroundColor: "var(--cm-bg)",
                }}
              >
                <div
                  className="absolute inset-[3px] rounded-full"
                  style={{
                    backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
                    opacity: idx === 0 ? 1 : 0.5,
                  }}
                />
              </div>

              <header className="mb-3 flex flex-wrap items-baseline gap-x-3 gap-y-1">
                <span
                  className="rounded-[3px] px-1.5 py-0.5 text-[10px] font-medium uppercase tracking-wider"
                  style={{
                    fontFamily: "var(--cm-font-mono)",
                    backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
                    color: "var(--cm-gray-900)",
                  }}
                >
                  {CHANGELOG_TYPE_LABELS[entry.type]}
                </span>
                <span
                  className="text-[18px] font-medium text-[var(--cm-fg)]"
                  style={{ fontFamily: "var(--cm-font-serif)" }}
                >
                  v{entry.version}
                </span>
                <time
                  dateTime={entry.date}
                  className="text-[11px] text-[var(--cm-fg-tertiary)]"
                  style={{ fontFamily: "var(--cm-font-mono)" }}
                >
                  {new Date(entry.date).toLocaleDateString("en-US", {
                    year: "numeric",
                    month: "short",
                    day: "numeric",
                  })}
                </time>
              </header>

              <h2
                className="text-[15px] font-medium text-[var(--cm-fg)]"
                style={{ fontFamily: "var(--cm-font-sans)" }}
              >
                {entry.title}
              </h2>

              <p
                className="mt-2 text-[14px] leading-[1.7] text-[var(--cm-fg-secondary)]"
                style={{ fontFamily: "var(--cm-font-sans)" }}
              >
                {entry.summary}
              </p>
            </article>
          ))}
        </div>
      </div>

      <footer className="mt-20 border-t border-[var(--cm-border)] pt-8">
        <p
          className="text-[13px] text-[var(--cm-fg-tertiary)]"
          style={{ fontFamily: "var(--cm-font-sans)" }}
        >
          Tracked at{" "}
          <Link
            href="https://github.com/alezmad/claudemesh/blob/main/docs/roadmap.md"
            className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
          >
            docs/roadmap.md
          </Link>
          . Specs at{" "}
          <Link
            href="https://github.com/alezmad/claudemesh/tree/main/.artifacts/specs"
            className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
          >
            .artifacts/specs/
          </Link>
          . Tagged binaries on{" "}
          <Link
            href="https://github.com/alezmad/claudemesh/releases"
            className="underline decoration-[var(--cm-fg-tertiary)] underline-offset-4 transition-colors hover:text-[var(--cm-fg)] hover:decoration-[var(--cm-clay)]"
          >
            GitHub Releases
          </Link>
          .
        </p>
      </footer>
    </section>
  );
}

@@ -3,6 +3,7 @@ import { Features } from "~/modules/marketing/home/features";
import { WhereMeshFits } from "~/modules/marketing/home/where-mesh-fits";
import { WhatIsClaudemesh } from "~/modules/marketing/home/what-is-claudemesh";
import { Timeline } from "~/modules/marketing/home/timeline";
import { LatestReleases } from "~/modules/marketing/home/latest-releases";
import { Pricing } from "~/modules/marketing/home/pricing";
import { FAQ } from "~/modules/marketing/home/faq";
import { CallToAction } from "~/modules/marketing/home/cta";
@@ -22,6 +23,7 @@ const HomePage = () => {
      <WhereMeshFits />
      <WhatIsClaudemesh />
      <Timeline />
      <LatestReleases count={5} />
      <Pricing />
      <FAQ />
      <CallToAction />

168
apps/web/src/modules/marketing/home/changelog-data.ts
Normal file
@@ -0,0 +1,168 @@
/**
 * Single source of truth for the curated release log surfaced on:
 *   - /changelog (full timeline)
 *   - / (Latest Releases compact strip)
 *
 * Lives outside `app/.../page.tsx` because Next.js's app-router type generator
 * rejects non-conforming exports from route files (only `default`, `metadata`,
 * `dynamic`, etc. are allowed). Importing data from a plain module sidesteps
 * the constraint without changing route semantics.
 *
 * Hand-picked load-bearing ships, newest first. For the byte-level history
 * see `apps/cli/CHANGELOG.md` in the repo.
 */

export type ChangelogEntry = {
  version: string;
  date: string;
  type: "feat" | "fix" | "docs" | "perf" | "infra";
  title: string;
  summary: string;
};

export const CHANGELOG_ENTRIES: ChangelogEntry[] = [
  {
    version: "1.34.15",
    date: "2026-05-04",
    type: "fix",
    title: "peer list --mesh scopes; kick refuses control-plane",
    summary:
      "Two follow-ups from the multi-session correctness train. peer list --mesh now forwards the slug to the daemon (was aggregating across all attached meshes). The broker refuses no-op kicks against control-plane connections (daemon, dashboard) — they auto-reconnected within seconds — and surfaces them in a new additive ack field. Soft `disconnect` keeps old behavior.",
  },
  {
    version: "1.34.14",
    date: "2026-05-04",
    type: "fix",
    title: "stale CLAUDEMESH_CONFIG_DIR falls back",
    summary:
      "When the launched-session env leaked into a later CLI invocation and pointed at a tmpdir that no longer existed, the resolver silently used the dead path and showed “No meshes joined”. Now memoized: env unset → default; env points at a real dir → trust; env set but dir gone → TTY-only stderr warning + fallback to ~/.claudemesh.",
  },
  {
    version: "1.34.7 → 1.34.13",
    date: "2026-05-04",
    type: "fix",
    title: "multi-session correctness train",
    summary:
      "Seven releases over a few hours that took claudemesh from “works for one session” to “internally consistent for N sessions on one daemon.” Per-session SSE demux at the bind layer, inbox per-recipient column, daemon detached by default, MCP forwards session token on /v1/events. Architecture invariant: every shared store / channel scopes by recipient.",
  },
  {
    version: "1.32.0",
    date: "2026-05-04",
    type: "feat",
    title: "multi-session UX bundle",
    summary:
      "Self-identity via session pubkey, `--self` fan-out for member-pubkey targeting, broker welcome on launch (broker state + peer count + unread inbox). Resolves hex prefixes to full pubkeys before send.",
  },
  {
    version: "1.30.0",
    date: "2026-05-04",
    type: "feat",
    title: "per-session broker presence",
    summary:
      "Two `claudemesh launch` sessions in the same cwd finally see each other in `peer list`. Each session has a long-lived broker presence row owned by the daemon, identified by a per-launch ephemeral keypair vouched by the member's stable key. Broker `session_hello` handler with parent-attestation TTL and session-signature checks.",
  },
  {
    version: "1.26.0 → 1.29.0",
    date: "2026-05-04",
    type: "feat",
    title: "multi-mesh daemon · per-session IPC tokens",
    summary:
      "One daemon process attaches to every joined mesh simultaneously. Aggregate read routes (/v1/peers, /v1/skills) tag each record with its mesh; explicit ?mesh=<slug> narrows server-side. Per-session IPC tokens scoped to tmpdir mode-0600 so CLI invocations from inside a launched session auto-attribute to its workspace. Self-healing daemon lifecycle (auto-spawn under file-lock, version probe).",
  },
  {
    version: "1.24.0",
    date: "2026-05-03",
    type: "feat",
    title: "daemon required + thin MCP",
    summary:
      "MCP server shrinks from 979 LoC to ~200 LoC of push-pipe. The daemon owns the broker WS and feeds the MCP push channel over IPC SSE. `claudemesh install` auto-installs and starts the daemon service. `claudemesh launch` ensures daemon is running before spawning Claude.",
  },
  {
    version: "0.9.0 (1.22.0)",
    date: "2026-05-03",
    type: "feat",
    title: "daemon foundation",
    summary:
      "Long-lived process holding one broker WS per attached mesh, durable outbox/inbox in SQLite, IPC over UDS (+ optional loopback TCP w/ bearer), SSE event stream. Caller-stable idempotency on every send. Service install (launchd / systemd-user). Outbox CLI with atomic abort+insert on requeue. Host-fingerprint pin on first run.",
  },
  {
    version: "0.7.0 (1.21.0)",
    date: "2026-05-03",
    type: "infra",
    title: "slug = identifier",
    summary:
      "Pre-launch correction of generic SaaS scaffolding. mesh.name and mesh.slug collapse — slug IS the identifier. `claudemesh rename <old-slug> <new-slug>` is the entire rename surface. CLI picker drops the (parens). Server PATCH /api/cli/meshes/:slug body becomes `{ slug }`.",
  },
  {
    version: "0.4.0 → 0.5.2 (1.10.0–1.18.0)",
    date: "2026-05-03",
    type: "feat",
    title: "me/* cross-mesh aggregation",
    summary:
      "First cross-mesh read-aggregating verbs. /v1/me/workspace, /v1/me/topics, /v1/me/notifications, /v1/me/activity, /v1/me/search — every aggregating read verb has CLI + web parity. Default-aggregation for `topic list`, `notification list`, `task list`, `state list`, `memory recall` when no --mesh is passed. file share / get with same-host fast path.",
  },
  {
    version: "0.3.0 (1.8.0)",
    date: "2026-05-02",
    type: "feat",
    title: "per-topic encryption (CLI + web)",
    summary:
      "Topics generate a 32-byte symmetric key on creation; broker seals via crypto_box for the creator. Pending-seals endpoint, seal POST, claudemesh topic post for encrypted REST sends, decrypt-on-render in topic tail, 30s background re-seal loop. Web side: browser-side persistent ed25519 identity in IndexedDB + encrypt-on-send / decrypt-on-render.",
  },
  {
    version: "1.7.0",
    date: "2026-05-02",
    type: "feat",
    title: "demo cut: topic tail, member list, notifications",
    summary:
      "Member sidebar in chat panel with names, online dots, presence summaries. Topic search + member-mention autocomplete. Notification feed at /dashboard listing every @<your-name> reference across all meshes (last 7 days). CLI parity: `claudemesh topic tail` (live SSE consumer), `claudemesh member list`, `claudemesh notification list`.",
  },
  {
    version: "0.2.0 (1.6.0)",
    date: "2026-05-02",
    type: "feat",
    title: "topics + REST gateway + bridge peers",
    summary:
      "Topics (channel pub/sub) with mesh = trust boundary, group = identity tag, topic = conversation scope — three orthogonal axes. API keys for non-WebSocket clients. REST /api/v1/* with bearer-token auth (messages, topics, peers, history). Bridge peers belonging to two meshes forwarding a topic between them. Humans-as-peers — peer_type: human plumbed end-to-end.",
  },
  {
    version: "1.5.0",
    date: "2026-05-02",
    type: "feat",
    title: "CLI-first architecture lock-in",
    summary:
      "Tool-less MCP — tools/list returns []. Inbound peer messages still arrive as experimental.claude/channel notifications mid-turn. Bundle size −42%. Resource-noun-verb CLI (peer list, message send, memory recall). Bundled claudemesh skill installed to ~/.claude/skills/. Unix-socket bridge for warm WS reuse (~220 ms warm vs ~600 ms cold). Policy engine + audit log.",
  },
  {
    version: "1.0.0-alpha",
    date: "2026-04-15",
    type: "feat",
    title: "single-binary distribution + per-peer caps",
    summary:
      "curl -fsSL claudemesh.com/install | sh downloads the right binary (darwin/linux/windows × x64/arm64). claudemesh:// URL scheme makes invite emails one-click. Per-peer capability grants: claudemesh grant/revoke/block/grants enforced server-side. Encrypted backup / restore with Argon2id + XChaCha20-Poly1305. Safety numbers (`claudemesh verify <peer>`).",
  },
  {
    version: "0.1.0",
    date: "2026-04-04",
    type: "feat",
    title: "public launch",
    summary:
      "Direct peer-to-peer messaging through a hosted broker, ready for real teams. End-to-end encryption — crypto_box direct, crypto_secretbox group. Signed ed25519 identities + signed invite links (ic://join/...). Hello-sig handshake auth. Hosted broker at wss://ic.claudemesh.com/ws. Claude Code MCP tools: list_peers, send_message, check_messages, set_summary, set_status.",
  },
];

export const CHANGELOG_TYPE_LABELS: Record<ChangelogEntry["type"], string> = {
  feat: "Feature",
  fix: "Fix",
  docs: "Docs",
  perf: "Perf",
  infra: "Infra",
};

export const CHANGELOG_TYPE_COLOR: Record<ChangelogEntry["type"], string> = {
  feat: "var(--cm-clay)",
  fix: "var(--cm-cactus)",
  docs: "var(--cm-oat)",
  perf: "var(--cm-fig)",
  infra: "var(--cm-fg-tertiary)",
};
@@ -32,9 +32,9 @@ export const CallToAction = () => {
className="mx-auto mt-8 max-w-2xl text-lg leading-[1.65] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
Anthropic built Claude Code per developer. The next unlock is
between developers. Hosted on claudemesh.com or self-hosted in
your VPC — same CLI, same features, same encryption.
Anthropic Agent Teams stops at the edge of one laptop. claudemesh
starts there — across machines, users, and organizations. Hosted
on claudemesh.com or self-hosted in your VPC, same CLI either way.
</p>
</Reveal>
<Reveal delay={3}>

@@ -5,7 +5,7 @@ import { Reveal } from "./_reveal";
const ITEMS = [
{
q: "Is claudemesh free?",
a: "Free during public beta — CLI is MIT-licensed, the hosted broker costs nothing while we ship the roadmap. Paid tiers launch when the dashboard ships. Beta users keep the free plan for life.",
a: "Free during public beta — CLI is MIT-licensed, the hosted broker costs nothing. Paid tiers launch when we exit beta and add team-scale features (SSO, audit retention, dedicated brokers). Beta users keep the free plan for life.",
},
{
q: "How do I get started?",
@@ -33,7 +33,11 @@ const ITEMS = [
},
{
q: "How is this different from MCP?",
a: "MCP connects one Claude to tools and services. claudemesh connects many Claudes to each other. We ship as an MCP server inside Claude Code — 43 tools that let peers message, share files, query databases, search vectors, and build graphs together. From the agent's view, other peers look like callable tools. It composes on top of MCP; it doesn't replace it.",
a: "MCP connects one Claude to tools and services. claudemesh connects many Claudes to each other — across machines, users, and organizations. As of v1.5.0 the MCP shim is intentionally thin: tools/list returns []. Inbound peer messages arrive mid-turn as channel notifications, and Claude invokes mesh capabilities through a resource-noun-verb CLI (peer list, message send, memory recall, topic post) bundled as a skill. claudemesh composes on top of MCP; it doesn't replace it.",
},
{
q: "How is this different from Anthropic's Agent Teams?",
a: "Anthropic's experimental Agent Teams (shipped Feb 2026, Claude Code v2.1.32+) coordinates multiple Claude Code sessions inside ONE Unix user's ~/.claude/ directory on ONE machine. Mailbox lives in process. Task list is a markdown file. Lead is fixed for the team's lifetime. Cleanup wipes the state. claudemesh runs across machines, users, and organizations. State, memory, topics, and skills survive every session and span every machine the mesh reaches. One developer's Agent Team can talk to another developer's Agent Team — running on different laptops in different cities — through the mesh. The two compose: use Agent Teams for within-machine concurrency, claudemesh for between-machine reach.",
},
{
q: "What persistence backends does the mesh include?",
@@ -53,7 +57,7 @@ const ITEMS = [
},
{
q: "Can a peer be in multiple meshes?",
a: "Yes. Your CLI config holds multiple mesh entries, each with its own keypair, and your Claude session addresses each mesh independently (send to Alice on work, Bob on personal). Cross-mesh bridge peers that auto-forward tagged messages are v0.2; cross-broker federation (your self-host ↔ claudemesh.com) is v0.3.",
a: "Yes. Your CLI config holds multiple mesh entries, each with its own keypair. As of v1.26.0, the daemon attaches to every joined mesh simultaneously — `claudemesh peer list` aggregates across all of them, `--mesh <slug>` narrows to one. Cross-mesh bridge peers that auto-forward tagged topics shipped in v0.2.0 (v1.6.0). Cross-broker federation (your self-host ↔ claudemesh.com) is the next major direction.",
},
];

@@ -67,9 +67,10 @@ export const HeroWithMesh = () => {
textShadow: "0 2px 20px rgba(0,0,0,0.8)",
}}
>
Share context, files, skills, and MCPs across every Claude Code
session — end-to-end encrypted. Hosted on claudemesh.com or
self-hosted in your VPC. Same CLI, same wire, your choice.
The encrypted backbone where Claude Code sessions, autonomous
agents, and humans coordinate — across machines, across users,
across organizations. Hosted on claudemesh.com or self-hosted in
your VPC. Same CLI, same wire, your choice.
</p>
</Reveal>

141
apps/web/src/modules/marketing/home/latest-releases.tsx
Normal file
@@ -0,0 +1,141 @@
import Link from "next/link";

import {
CHANGELOG_ENTRIES,
CHANGELOG_TYPE_COLOR,
CHANGELOG_TYPE_LABELS,
} from "./changelog-data";
import { Reveal, SectionIcon } from "./_reveal";

/**
* Compact recent-releases strip for the home page. Pulls the top N entries
* from the same data source as the full /changelog page so they never
* disagree.
*/
export const LatestReleases = ({ count = 5 }: { count?: number }) => {
const recent = CHANGELOG_ENTRIES.slice(0, count);

return (
<section className="border-b border-[var(--cm-border)] bg-[var(--cm-bg-elevated)] px-6 py-24 md:px-12 md:py-28">
<div className="mx-auto max-w-[var(--cm-max-w)]">
<Reveal className="mb-6 flex justify-center">
<SectionIcon glyph="grid" />
</Reveal>

<Reveal delay={1}>
<p
className="text-center text-[11px] uppercase tracking-[0.2em] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
release log · last {count} ships
</p>
</Reveal>

<Reveal delay={2}>
<h2
className="mt-3 text-center text-[clamp(1.75rem,3.5vw,2.5rem)] font-medium leading-[1.15] text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
What shipped this week
</h2>
</Reveal>

<Reveal delay={3}>
<p
className="mx-auto mt-3 max-w-xl text-center text-[14px] leading-[1.65] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
Every release is in production on{" "}
<span
className="text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
wss://ic.claudemesh.com
</span>{" "}
within minutes. The CLI publishes to npm; the broker auto-deploys.
</p>
</Reveal>

<Reveal delay={4}>
<ol className="mx-auto mt-12 max-w-3xl space-y-4">
{recent.map((entry, idx) => (
<li key={entry.version + entry.date}>
<Link
href="/changelog"
className="group block rounded-[var(--cm-radius-md)] border border-[var(--cm-border)] bg-[var(--cm-bg)] p-5 transition-colors hover:border-[var(--cm-clay)]/40"
>
<div className="flex flex-wrap items-baseline gap-x-3 gap-y-1">
<span
className="rounded-[3px] px-1.5 py-0.5 text-[10px] font-medium uppercase tracking-wider"
style={{
fontFamily: "var(--cm-font-mono)",
backgroundColor: CHANGELOG_TYPE_COLOR[entry.type],
color: "var(--cm-gray-900)",
}}
>
{CHANGELOG_TYPE_LABELS[entry.type]}
</span>
<span
className="text-[16px] font-medium text-[var(--cm-fg)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
v{entry.version}
</span>
<time
dateTime={entry.date}
className="text-[11px] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
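{new Date(entry.date).toLocaleDateString("en-US", {
year: "numeric",
month: "short",
day: "numeric",
// entry.date is a bare YYYY-MM-DD string, which parses as UTC midnight;
// format in UTC so it doesn't render a day early west of UTC.
timeZone: "UTC",
})}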
</time>
{idx === 0 && (
<span
className="rounded-full bg-[var(--cm-clay)]/15 px-2 py-0.5 text-[10px] font-medium uppercase tracking-wider text-[var(--cm-clay)]"
style={{ fontFamily: "var(--cm-font-mono)" }}
>
latest
</span>
)}
</div>
<h3
className="mt-2.5 text-[15px] font-medium text-[var(--cm-fg)] transition-colors group-hover:text-[var(--cm-clay)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
{entry.title}
</h3>
<p
className="mt-2 line-clamp-2 text-[13px] leading-[1.6] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
{entry.summary}
</p>
</Link>
</li>
))}
</ol>
</Reveal>

<Reveal delay={5}>
<div className="mt-10 flex justify-center">
<Link
href="/changelog"
className="group inline-flex items-center gap-2 text-[13px] font-medium text-[var(--cm-fg-secondary)] transition-colors hover:text-[var(--cm-clay)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
<span className="border-b border-dashed border-[var(--cm-fg-tertiary)] pb-0.5 transition-colors group-hover:border-[var(--cm-clay)]">
Read the full changelog
</span>
<span className="transition-transform duration-300 group-hover:translate-x-1">
→
</span>
</Link>
</div>
</Reveal>
</div>
</section>
);
};
@@ -111,8 +111,9 @@ export const Pricing = () => {
className="mb-4 text-[12px] leading-[1.5] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
Paid tiers launch when the dashboard ships. Beta users keep
the free plan for life.
Paid tiers launch when we exit beta and add team-scale
features (SSO, audit retention, dedicated brokers). Beta
users keep the free plan for life.
</p>
<Link
href="/auth/register"

@@ -85,6 +85,23 @@ const MILESTONES = [
],
stat: "43 MCP tools total",
},
{
version: "v0.9 → 1.34",
phase: "Daemon · multi-mesh · multi-session",
color: "var(--cm-cactus)",
items: [
"Persistent daemon — long-lived broker WS, durable outbox/inbox",
"Universal multi-mesh daemon — one process, every joined mesh",
"Per-session IPC tokens — auto-scope to the launched session",
"Per-session broker presence — sibling sessions see each other",
"Self-healing daemon lifecycle (auto-spawn, version probe)",
"Multi-session correctness train — per-recipient SSE demux + inbox scoping",
"Refuse-to-kick on control-plane (no more no-op kicks)",
"Caller-stable idempotency on every send",
"Stale CLAUDEMESH_CONFIG_DIR fallback",
],
stat: "1.34.15 shipped",
},
];

export const Timeline = () => {
@@ -94,7 +111,7 @@ export const Timeline = () => {
<section className="border-b border-[var(--cm-border)] bg-[var(--cm-bg)] px-6 py-24 md:px-12 md:py-32">
<div className="mx-auto max-w-[var(--cm-max-w)]">
<Reveal className="mb-6 flex justify-center">
<SectionIcon glyph="layers" />
<SectionIcon glyph="grid" />
</Reveal>
<Reveal delay={1}>
<h2
@@ -109,7 +126,8 @@ export const Timeline = () => {
className="mx-auto mt-4 max-w-xl text-center text-[15px] leading-[1.6] text-[var(--cm-fg-secondary)]"
style={{ fontFamily: "var(--cm-font-sans)" }}
>
66 npm releases. Every feature below is in production today.
120+ npm releases through v1.34.15. Every feature below is in
production today.
</p>
</Reveal>

@@ -210,8 +228,8 @@ export const Timeline = () => {
className="text-[14px] text-[var(--cm-fg-tertiary)]"
style={{ fontFamily: "var(--cm-font-serif)" }}
>
Daemon redesign · per-topic encryption · self-host
packaging · federation
HKDF cross-machine identity · session capabilities · A2A
interop · self-host packaging · federation
</span>
</div>
</div>

@@ -4,28 +4,28 @@ import Link from "next/link";

const NEWS = [
{
tag: "New",
title: "claudemesh launch (v0.1.4)",
body: "Real-time peer messages pushed into Claude Code mid-turn. One command. Source open at github.com/alezmad/claudemesh-cli.",
href: "https://github.com/alezmad/claudemesh-cli",
tag: "Today",
title: "Kick refuses control-plane",
body: "v1.34.15 — broker now skips control-plane peers on kick and acks the skip. Use ban for hard removal, or take the daemon down for transient cases.",
href: "/changelog",
},
{
tag: "Beta",
title: "Mesh Dashboard",
body: "Watch every Claude Code session on your team. Routes, presence, priority — all live.",
href: "#",
tag: "This week",
title: "Multi-session correctness",
body: "1.34.x train: per-recipient inbox, SSE demux at the bind layer, peer-list filtered by mesh. Multiple sessions on one machine no longer cross-talk.",
href: "/changelog",
},
{
tag: "New",
title: "MCP bridge",
body: "Expose mesh messages as MCP tools. Your agent can message peers without leaving its context.",
href: "#",
tag: "Shipped",
title: "Per-session presence",
body: "v1.30.0 — every Claude Code session gets its own ed25519 keypair and parent attestation. The broker tracks sessions, not machines.",
href: "/changelog",
},
{
tag: "Launch",
title: "Self-hosted broker",
body: "One binary. SQLite-backed. Runs on a Pi. Your mesh, never the cloud's.",
href: "#",
tag: "Shipped",
title: "Multi-mesh daemon",
body: "v1.26.0 — one daemon, every mesh you've joined. Switch context with a flag. Self-host the broker in your VPC; same CLI, your URL.",
href: "/changelog",
},
];

@@ -25,6 +25,14 @@ const CARDS: Card[] = [
weDo: "claudemesh connects full, independent Claude Code sessions across machines, across developers, across continents. Each peer keeps its own repo, its own perspective, its own scrollback.",
tone: "compare",
},
{
label: "vs. Agent Teams",
title: "Multi-agent within one machine",
theyDo:
"Anthropic's experimental Agent Teams (Feb 2026, Claude Code v2.1.32+) coordinates multiple Claude Code sessions inside ONE Unix user's ~/.claude/ directory on ONE machine. Mailbox in process. Task list in a markdown file. Lead is fixed. Cleanup wipes the state.",
weDo: "claudemesh runs across machines, users, and organizations. State, memory, topics, and skills survive every session. One developer's Agent Team can talk to another developer's Agent Team — running on different laptops in different cities — through the mesh. Use Agent Teams for within-machine concurrency, claudemesh for between-machine reach.",
tone: "compare",
},
{
label: "vs. OpenClaw",
title: "Autonomous agents that run while you sleep",
@@ -35,10 +43,10 @@ const CARDS: Card[] = [
},
{
label: "What claudemesh is",
title: "The wire between Claude Code sessions",
title: "The wire across machines, users, and orgs",
theyDo:
"Every Claude Code session today is an island. Context dies with the terminal. Skills and MCPs are per-developer. Teammates relay insights through Slack.",
weDo: "claudemesh is one thing: a peer network for Claude Code. Share context, files, skills, MCPs, and slash commands across sessions — end-to-end encrypted. Host the broker on claudemesh.com or run it in your VPC. Same CLI either way.",
"Every Claude Code session is an island unless you wrap it. Anthropic's Agent Teams now ties them together within one Unix user, one machine. Beyond that — across laptops, across team members, across companies — the gap is still wide.",
weDo: "claudemesh is one thing: an end-to-end encrypted backbone where Claude Code sessions, autonomous agents, and humans coordinate across every boundary your existing tools stop at. Persistent state, topics, memory, and skills span every machine the mesh reaches. Host the broker on claudemesh.com or run it in your VPC. Same CLI either way.",
tone: "claim",
},
];

@@ -3,8 +3,14 @@ import type { UserConfig } from "@commitlint/types";
const Configuration: UserConfig = {
extends: ["@commitlint/config-conventional"],
rules: {
"body-max-length": [1, "always", 100],
"body-max-line-length": [1, "always", 100],
// body-max-length capped TOTAL body length at 100 chars — meaningless
// for technical commits, fired a warning on every substantive
// changelog-style message. Disabled (level 0).
"body-max-length": [0, "always", 0],
// Per-line body cap. Bumped from 100 to 200 so long URLs, file
// paths, and copy-pasted error lines don't trip a warning that
// adds nothing — but still catches accidental no-wrap.
"body-max-line-length": [1, "always", 200],
},
};

194
docs/roadmap.md
@@ -277,8 +277,9 @@ identity (deferred for security review).
signature checks; daemon adds a slim `SessionBrokerClient` and
registry lifecycle hooks. Also fixes a latent 1.29.0 TDZ bug where
`claudemesh launch`'s IPC session-token registration was silently
failing every run. Spec at
`.artifacts/specs/2026-05-04-per-session-presence.md`.
failing every run. Side-cleanup: 87 accumulated TS errors (77 broker,
10 CLI) paid down to zero. *Shipped 2026-05-04 in CLI v1.30.0.*
Spec at `.artifacts/specs/2026-05-04-per-session-presence.md`.

What's left for true v2.0.0 (next sessions):

@@ -291,6 +292,195 @@ What's left for true v2.0.0 (next sessions):

---

## v1.31.0 → v1.32.0 — *multi-session UX bundle* — *shipped*

The Sprint B push that made multiple Claude Code sessions on the
same daemon actually pleasant — self-identity via session pubkey,
`--self` fan-out, broker welcome.

- **1.31.x** — peer list shows `profile.role` and groups; resolves
hex prefixes to full pubkeys before send; clean rebuild path with
correct VERSION baked in.
- **1.32.0** — multi-session UX bundle (self-identity, `--self`
fan-out, broker welcome). *Shipped 2026-05-04 in CLI v1.32.0.*

---

## v1.34.x — *multi-session correctness train* — *shipped*

The 2026-05-04 ship train — seven releases over a few hours that
took claudemesh from "works for one session" to "internally
consistent for N sessions on one daemon." Every layer that was
shared between sessions either grew per-recipient scoping or
demuxed at its boundary.

The throughline: any time the daemon held shared state — bus,
inbox, broker fan-out — two sessions belonging to the same member
silently saw each other's traffic. Each release fixed one layer and
exposed the next gap.

- **1.34.7 — inbox flush + delete commands.** First-class CLI
cleanup for the persisted inbox; previously you had to drop into
raw `sqlite3`. `claudemesh inbox flush --mesh|--before|--all`
with an `--all` confirmation guard, plus
`claudemesh inbox delete <id>`. *Shipped 2026-05-04.*
- **1.34.8 — read-state + TTL prune + first echo guard.** New
`seen_at` column on `inbox`; live channel emits and interactive
listings flip it, and welcome filters on `seen_at IS NULL` instead
of an arbitrary 24h window. An hourly prune deletes rows older than
30 days. First attempt at a self-echo guard at the WS boundary
(later proven incomplete in 1.34.13). *Shipped 2026-05-04.*
- **1.34.9 — broader echo guard + system event polish.** Daemon-WS
guard relaxed (1.34.8 required both axes; session-attributed
echoes carry the session pubkey on `senderPubkey`, so the strict
filter never triggered). Session-WS skips system events to dedupe
peer_join broadcasts. Richer peer-join channel render
(pubkey prefix + groups + last-seen for `peer_returned`).
Daemon-staleness warning when the CLI version ≠ the running daemon
version. *Shipped 2026-05-04.*
- **1.34.10 — per-session SSE demux + universal daemon.** The
bus stays single-shot; demux happens at the SSE bind layer
via `SseFilterOptions`. Each subscriber's session token resolves
server-side to a session pubkey + member pubkey, and
`shouldDeliver` filters on `recipient_pubkey` + `recipient_kind`.
Also: `daemon up` and `install-service` deprecate `--mesh` /
`--name` (the universal daemon attaches to every joined mesh
automatically); the `daemon_started` boot log stamps the version.
*Shipped 2026-05-04.*
- **1.34.11 — inbox per-recipient column.** Storage half of
1.34.10. New `recipient_pubkey` + `recipient_kind` columns on
`inbox` (indexed, non-destructive migration; legacy rows land
NULL and stay visible to everyone). `listInbox` accepts
`recipientPubkey` + `recipientMemberPubkey`; `/v1/inbox`
resolves them from the session token. Welcome is fixed
automatically because it already passed the token.
*Shipped 2026-05-04.*
- **1.34.12 — `daemon up` detaches by default.** Before 1.34.12 it
ran in the foreground and streamed JSON logs to the terminal until
Ctrl-C. It now spawns a detached child that re-execs
`daemon up --foreground` with stdout/stderr redirected to
`~/.claudemesh/daemon/daemon.log`; the parent exits cleanly with
the pid + log path. Service units (launchd plist, systemd-user)
explicitly pass `--foreground` so the service manager owns the
lifecycle. *Shipped 2026-05-04.*
- **1.34.13 — MCP forwards session token on `/v1/events`.** The
actual fix that activated 1.34.10's demux. The MCP server's
SSE subscription wasn't sending the session token, so the
daemon's `/v1/events` resolved `session` to null and the demux
filter was empty — every MCP received the unfiltered global
stream. `subscribeEvents` now passes
`Authorization: ClaudeMesh-Session <token>`. *Shipped 2026-05-04.*
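
The per-recipient demux from 1.34.10 and 1.34.11 can be sketched as a pure predicate plus a fan-out loop over one bus. This is a minimal sketch, not the daemon's code. Only `shouldDeliver`, `SseFilterOptions`, `recipient_pubkey` and `recipient_kind` come from the entries above. The event shape, the `"session"` / `"member"` kind values, and the rule that a member-addressed event reaches every session of that member are all assumptions.

```typescript
// Illustrative sketch. The names shouldDeliver, SseFilterOptions,
// recipient_pubkey and recipient_kind come from the roadmap; the shapes and
// the "session" | "member" kind values are assumptions.
type RecipientKind = "session" | "member";

interface BusEvent {
  kind: string;
  recipient_pubkey: string | null; // null: legacy row or mesh-wide event
  recipient_kind: RecipientKind | null;
}

interface SseFilterOptions {
  sessionPubkey: string | null; // resolved server-side from the session token
  memberPubkey: string | null;
}

function shouldDeliver(ev: BusEvent, opts: SseFilterOptions): boolean {
  // No token forwarded (the pre-1.34.13 MCP bug): the filter is empty,
  // so the subscriber sees the unfiltered global stream.
  if (opts.sessionPubkey === null && opts.memberPubkey === null) return true;
  // Unaddressed events stay visible to everyone, like legacy NULL inbox rows.
  if (ev.recipient_pubkey === null) return true;
  if (ev.recipient_kind === "session") return ev.recipient_pubkey === opts.sessionPubkey;
  if (ev.recipient_kind === "member") return ev.recipient_pubkey === opts.memberPubkey;
  return false;
}

// One bus, one demux chokepoint: each subscriber binds with its own filter.
function fanOut(
  ev: BusEvent,
  subscribers: ReadonlyArray<{ opts: SseFilterOptions; send: (e: BusEvent) => void }>,
): void {
  for (const sub of subscribers) {
    if (shouldDeliver(ev, sub.opts)) sub.send(ev);
  }
}
```

Under these assumptions, two sessions of the same member diverge on a session-addressed event. Only the subscriber whose session pubkey matches receives it, while a member-addressed event still reaches both.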

### Architecture invariant after 1.34.13

Every shared store / channel on the daemon now scopes by recipient.
A single bus and single tables remain canonical; demux is isolated to
one chokepoint per layer.

| Layer | Scoping mechanism | Shipped |
|---|---|---|
| EventBus | SSE demux at bind layer + token forwarding | 1.34.10 + 1.34.13 |
| inbox.db | `recipient_pubkey` / `recipient_kind` columns | 1.34.11 |
| outbox.db | `sender_session_pubkey` for routing | 1.34.0 |

### Known gaps — status after the 2026-05-04 follow-up sprint

Three of the four 1.34.x triage gaps shipped in 1.34.14 + 1.34.15
(2026-05-04). Gap #4 is spec'd and queued.

- ✅ **Stale `CLAUDEMESH_CONFIG_DIR` falls back** *(1.34.14)*. The
env var no longer silently breaks subsequent CLI calls. When the
inherited path points at a tmpdir that no longer exists,
`paths.ts` warns once on stderr (TTY-only) with a shell-specific
unset hint and falls back to `~/.claudemesh`. Checking for the
directory (not `config.json`) keeps the first write on a fresh
launch working.
- ✅ **`peer list --mesh <slug>` actually scopes** *(1.34.15)*.
The diagnosis from the original triage was wrong: the broker has
been scoping correctly since 1.26.0 via `conn.meshId`. The bug was
CLI-side: `tryListPeersViaDaemon()` was called with no argument in
`commands/peers.ts:140` and `commands/launch.ts:407`. Both now
forward the slug as `?mesh=<slug>`. Cross-mesh hex-prefix
resolution in `send.ts` is intentionally untouched.
- ✅ **`kick` refuses no-op kicks on control-plane** *(1.34.15)*.
The broker now skips peers where `peerRole === "control-plane"` and
surfaces them in a new additive ack field,
`skipped_control_plane`; the CLI reads it and points the user at
`ban` (remove the member) or `daemon down` (take a daemon offline
locally). Soft `disconnect` keeps the old behavior, which is useful
when intentionally nudging a control-plane peer to re-authenticate.
`PeerConn` gains a `peerRole` slot populated at both
`connections.set` sites. The richer `presence pause [--mesh X]`
verb (option (b) from the triage) is deferred as its own feature.
- 📋 **Session capabilities — spec only**. Launched sessions still
inherit all member grants transitively. The spec at
`.artifacts/specs/2026-05-04-session-capabilities.md` covers a v2
parent attestation alongside v1 with an `allowed_caps[]` subset,
broker enforcement as
`intersection(member.peerGrants, session.allowed_caps)`, and a bonus
`state-write` cap to close the "any session can clobber shared keys
like `current-pr`" footgun. When no caps subset is declared, the
default is the full member set (today's behavior; restriction is
opt-in). It ships behind a 1-week dry-run window before enforcement
is flipped on, mirroring the original per-peer-capabilities rollout.
About 1 sprint of focused work; queued behind v0.3.0
topic-encryption.
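
The spec'd enforcement rule for session capabilities can be sketched as two small functions. The rule is `intersection(member.peerGrants, session.allowed_caps)`, with the full member set as the default. This is a hypothetical model, not the broker's implementation. Apart from `state-write`, the capability names are placeholders. The `enforce` flag is one plausible way to express the 1-week dry-run window, in which would-be denials are only observed.

```typescript
// Hypothetical model of the spec'd rule; not the broker's implementation.
type Capability = string;

function effectiveCaps(
  memberGrants: ReadonlySet<Capability>,
  allowedCaps?: readonly Capability[],
): Set<Capability> {
  // No subset declared: today's behaviour, the session inherits every member grant.
  if (allowedCaps === undefined) return new Set(memberGrants);
  // A session can only narrow its member's grants, never widen them.
  return new Set(allowedCaps.filter((cap) => memberGrants.has(cap)));
}

interface Decision {
  allowed: boolean;
  wouldDeny: boolean; // what enforcement would decide; worth logging in dry-run
}

function authorize(
  cap: Capability,
  memberGrants: ReadonlySet<Capability>,
  allowedCaps: readonly Capability[] | undefined,
  enforce: boolean,
): Decision {
  const wouldDeny = !effectiveCaps(memberGrants, allowedCaps).has(cap);
  // Dry-run keeps only the member-level check, so observable behaviour is unchanged.
  const allowed = enforce ? !wouldDeny : memberGrants.has(cap);
  return { allowed, wouldDeny };
}
```

Because the result is an intersection, declaring a cap the member was never granted has no effect. A session that omits `state-write` from its subset can no longer write shared state once enforcement is on.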

---

## v1.34.16 + broker — *continuous presence* — *shipped*

User report on 2026-05-05: `claudemesh peer list` returned zero
peers despite running sessions. Diagnosis: half-dead WS connections
that NAT/CGNAT silently dropped, with no application-layer staleness
detection on either side. Linux's default TCP keepalive is ≈ 2hrs
idle plus 11min of probes, so sessions stayed zombie for hours before
the kernel RST'd the socket and the daemon's existing close-handler
reconnect fired.

Two layers shipped together:

- **Liveness watchdogs** *(broker + CLI 1.34.16)*. Both sides now
  detect a stalled WS in 75s instead of waiting for the kernel.
  - Broker: `PeerConn.lastPongAt` is bumped on every `pong`. The 30s
    ping loop also calls `ws.terminate()` on conns whose last pong is
    more than 75s stale, firing the close handler, which runs the
    existing peer_left cleanup.
  - Daemon: `ws-lifecycle.ts` adds an idle watchdog at 30s cadence,
    started after hello-ack. It bumps `lastActivity` on incoming
    message, ping, and pong frames, sends its own `sock.ping()` if
    activity is recent, and calls `sock.terminate()` if idle >75s.
    The watchdog is cleared on close and on explicit close().
  - ~100x improvement on detection time (2hrs → 75s).
- **Lease model** *(broker only, no protocol change)*. Peers no
  longer see `peer_left`/`peer_joined` for transient reconnects.
  - `PeerConn` gains `leaseState` ("online"|"offline"), `leaseUntil`,
    and `evictionTimer`. On WS close, the conn enters the
    **offline-leased** state for 90s instead of immediate cleanup.
  - `handleHello` and `handleSessionHello` check for an
    offline-leased entry matching the stable identity before running
    session-id dedup. On a match they clear `evictionTimer`, swap
    `ws`, restore online state, drain queued DMs, and return
    `silent: true`; the hello dispatcher then skips the peer_joined
    broadcast.
  - `evictPresenceFully` is extracted from the close handler. It runs
    the peer_left broadcast + cleanup (URL watches, streams, MCP
    registry, clock auto-pause) and is called by `evictionTimer`
    after the 90s grace, or directly when no lease was online
    (defensive).
  - `broker.ts` exports `restorePresence(presenceId)`, which clears
    `disconnectedAt` and bumps `lastPingAt`. It is called on reattach
    to undo the DB-level stale-presence sweeper if it fired during
    grace.
  - DMs sent during grace fall through to the existing message_queue
    path: sendToPeer no-ops on a dead WS, so the queue row stays with
    deliveredAt=NULL and is drained on reattach. Backward compatible
    with old daemons.
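
The daemon-side idle watchdog can be modelled as a small class with an injectable clock. This is a sketch under stated assumptions, not the contents of `ws-lifecycle.ts`. `WatchedSocket` stands in for the WebSocket object; the `ws` package exposes `ping()` and `terminate()` under these names. The class and method names are illustrative.

```typescript
// Sketch of a 30s-cadence idle watchdog with a 75s limit (values from the
// roadmap); class and method names are illustrative.
interface WatchedSocket {
  ping(): void;
  terminate(): void;
}

const WATCHDOG_INTERVAL_MS = 30_000;
const IDLE_LIMIT_MS = 75_000;

class IdleWatchdog {
  private lastActivity: number;
  private timer: ReturnType<typeof setInterval> | null = null;
  private readonly sock: WatchedSocket;
  private readonly now: () => number;

  constructor(sock: WatchedSocket, now: () => number = Date.now) {
    this.sock = sock;
    this.now = now;
    this.lastActivity = now();
  }

  // Call from the message, ping, and pong handlers.
  bump(): void {
    this.lastActivity = this.now();
  }

  // One check; public so a test can drive it with a fake clock.
  tick(): "ping" | "terminate" {
    if (this.now() - this.lastActivity > IDLE_LIMIT_MS) {
      this.stop();
      this.sock.terminate(); // on a real socket this fires the close handler
      return "terminate";
    }
    this.sock.ping(); // the peer's pong should lead to a bump()
    return "ping";
  }

  // Start after hello-ack.
  start(): void {
    this.lastActivity = this.now();
    this.timer = setInterval(() => this.tick(), WATCHDOG_INTERVAL_MS);
  }

  // Call on close and on explicit close().
  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}
```

With a 30s tick and a strict `> 75s` check, this sketch catches a stall somewhere after 75s and at most about 105s after the last frame. The exact delay depends on where the last frame falls relative to the tick.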

Spec at `.artifacts/specs/2026-05-05-continuous-presence.md`.
Layer 3 (a resume token to skip full attestation on reconnect) is
deferred: it is a pure optimization, not needed for the user-visible
"no invisibility moment" goal.

*Shipped 2026-05-05.*
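
The lease transitions can be modelled as a presence table keyed by stable identity. An injectable scheduler lets the 90s grace be tested without waiting. This is a hypothetical sketch: `PresenceTable`, `Scheduler`, and the recorded string events are illustrative. Only `leaseState`, `leaseUntil`, `evictionTimer`, the `silent` flag and `evictPresenceFully` come from the roadmap. Session-id dedup and the queued-DM drain are elided.

```typescript
// Hypothetical model of the broker lease; names other than the roadmap's
// field names are illustrative.
type LeaseState = "online" | "offline";

interface PeerConn {
  identity: string; // stable identity a reattaching hello must match
  ws: unknown;
  leaseState: LeaseState;
  leaseUntil: number | null;
  evictionTimer: unknown;
}

interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const LEASE_GRACE_MS = 90_000;

class PresenceTable {
  readonly events: string[] = []; // broadcasts, recorded for illustration
  private readonly conns = new Map<string, PeerConn>();
  private readonly scheduler: Scheduler;
  private readonly now: () => number;

  constructor(scheduler: Scheduler, now: () => number) {
    this.scheduler = scheduler;
    this.now = now;
  }

  hello(identity: string, ws: unknown): { silent: boolean } {
    const existing = this.conns.get(identity);
    if (existing !== undefined && existing.leaseState === "offline") {
      // Reattach within grace: cancel eviction, swap the socket, stay quiet.
      this.scheduler.clear(existing.evictionTimer);
      existing.evictionTimer = null;
      existing.ws = ws;
      existing.leaseState = "online";
      existing.leaseUntil = null;
      return { silent: true }; // queued DMs would be drained here
    }
    this.conns.set(identity, { identity, ws, leaseState: "online", leaseUntil: null, evictionTimer: null });
    this.events.push(`peer_joined:${identity}`);
    return { silent: false };
  }

  onClose(identity: string): void {
    const conn = this.conns.get(identity);
    if (conn === undefined) return;
    if (conn.leaseState !== "online") {
      this.evictPresenceFully(identity); // defensive path
      return;
    }
    conn.leaseState = "offline";
    conn.leaseUntil = this.now() + LEASE_GRACE_MS;
    conn.evictionTimer = this.scheduler.set(() => this.evictPresenceFully(identity), LEASE_GRACE_MS);
  }

  private evictPresenceFully(identity: string): void {
    if (this.conns.delete(identity)) this.events.push(`peer_left:${identity}`);
  }
}
```

A reconnect inside the grace window produces no broadcast at all. Only a lease that actually expires emits `peer_left`.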
|
||||
---
|
||||
|
||||
## v2.0.0 — *HKDF cross-machine identity*
|
||||
|
||||
The remaining v2 promise after Sprint A: the user's account secret
|
||||
|
||||
@@ -0,0 +1,48 @@
|
||||
-- Milestone 1 (v2 agentic-comms architecture).
|
||||
--
|
||||
-- Two concerns rolled into one migration because both are tiny and both
|
||||
-- ship together with the broker change in the same PR:
|
||||
--
|
||||
-- 1. message_queue claim/lease columns (drainForMember race fix)
|
||||
-- --------------------------------------------------------------
|
||||
-- Before this migration, drainForMember claimed rows by setting
|
||||
-- `delivered_at = NOW()` inside the same UPDATE that selected them.
|
||||
-- If the recipient WS was closed between claim-time and ws.send(),
|
||||
-- the message was silently dropped — the row read as "delivered" so
|
||||
-- the next reconnect's drain skipped it. At-most-once semantics with
|
||||
-- no retry hook.
|
||||
--
|
||||
-- The fix moves to two-phase claim/deliver with a lease:
|
||||
-- claimed_at — set when drainForMember picks the row
|
||||
-- claim_id — presenceId of the claimer (debugging)
|
||||
-- claim_expires_at — claimed_at + 30s; if no `client_ack` lands by
|
||||
-- then, a sweeper clears the claim and the row
|
||||
-- is re-eligible for a new drain (at-least-once).
|
||||
--
|
||||
-- `delivered_at` only gets set when the recipient WS replies with a
|
||||
-- `client_ack` containing the original client_message_id. Until any
|
||||
-- daemon emits `client_ack`, claims will simply expire and re-deliver
|
||||
-- — which is the desired retry behaviour for unreliable transports.
|
||||
--
|
||||
-- 2. presence.role column
|
||||
-- --------------------------------------------------------------
|
||||
-- The CLI currently hides daemon connections from `peer list` by
|
||||
-- matching `peerType === 'claudemesh-daemon'`, which is fragile and
|
||||
-- overloads a free-form field. M1 introduces a typed `role` column on
|
||||
-- presence with three documented values:
|
||||
-- 'control-plane' — long-lived daemon WS (one per host)
|
||||
-- 'session' — per-Claude-Code-session WS (default)
|
||||
-- 'service' — autonomous bots/services attached to a mesh
|
||||
--
|
||||
-- Backfilled to 'session' (default) so legacy presence rows keep their
|
||||
-- existing visibility. The two hello paths in the broker pass
|
||||
-- 'control-plane' / 'session' explicitly. CLI-side filter swap
|
||||
-- (peerType -> role) is a follow-up worktree.
|
||||
|
||||
ALTER TABLE "mesh"."message_queue"
|
||||
ADD COLUMN "claimed_at" timestamp,
|
||||
ADD COLUMN "claim_id" text,
|
||||
ADD COLUMN "claim_expires_at" timestamp;
|
||||
|
||||
ALTER TABLE "mesh"."presence"
|
||||
ADD COLUMN "role" text NOT NULL DEFAULT 'session';
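The migration comment notes that the CLI still filters daemons by `peerType` and that switching to `role` is follow-up work. This is a minimal TypeScript sketch of what a role-keyed `peer list` filter could look like. `PresenceRow`, `displayName`, `parseRole`, `visibleLegacy`, and `visibleByRole` are hypothetical. Mapping unrecognized role strings to `'session'` (the column default) is an assumption, motivated by the fact that `role` is a plain `text` column and the migration adds no CHECK constraint.

```typescript
// Hypothetical CLI-side `peer list` filter keyed on presence.role.

type PresenceRole = "control-plane" | "session" | "service";

const KNOWN_ROLES: readonly string[] = ["control-plane", "session", "service"];

interface PresenceRow {
  displayName: string; // illustrative field
  peerType?: string;   // legacy free-form field
  role: string;        // text column, so validate before trusting it
}

// Narrow the raw column value; unknown strings fall back to the default (assumption).
function parseRole(raw: string): PresenceRole {
  return KNOWN_ROLES.includes(raw) ? (raw as PresenceRole) : "session";
}

// Legacy filter: depends on one magic peerType string.
function visibleLegacy(rows: PresenceRow[]): PresenceRow[] {
  return rows.filter((r) => r.peerType !== "claudemesh-daemon");
}

// Role-based filter: hide control-plane connections, show sessions and services.
function visibleByRole(rows: PresenceRow[]): PresenceRow[] {
  return rows.filter((r) => parseRole(r.role) !== "control-plane");
}

const presenceRows: PresenceRow[] = [
  { displayName: "laptop-daemon", peerType: "claudemesh-daemon", role: "control-plane" },
  { displayName: "claude-session-1", role: "session" },
  { displayName: "ci-bot", peerType: "bot", role: "service" },
  // A daemon whose peerType drifted: the legacy filter leaks it, the role filter does not.
  { displayName: "desktop-daemon", peerType: "daemon", role: "control-plane" },
];

const visibleNames = visibleByRole(presenceRows).map((r) => r.displayName);
const legacyNames = visibleLegacy(presenceRows).map((r) => r.displayName);
```

The last row shows the fragility the comment describes. Visibility should not hinge on an exact string in a field that also carries other meanings.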
@@ -326,6 +326,14 @@ export const presence = meshSchema.table("presence", {
  statusUpdatedAt: timestamp().defaultNow().notNull(),
  summary: text(),
  groups: jsonb().$type<{ name: string; role?: string }[]>().default([]),
  // v2 agentic-comms (M1): connection role for routing/visibility.
  //   'control-plane' — long-lived daemon WS (claudemesh daemon),
  //                     used for fan-out and presence orchestration.
  //                     Hidden from user-facing peer lists.
  //   'session'       — per-Claude-Code-session WS (default).
  //   'service'       — autonomous bots/services attached to the mesh.
  // Always populated; default 'session' keeps legacy hellos working.
  role: text().notNull().default("session"),
  connectedAt: timestamp().defaultNow().notNull(),
  lastPingAt: timestamp().defaultNow().notNull(),
  disconnectedAt: timestamp(),
@@ -367,6 +375,14 @@ export const messageQueue = meshSchema.table("message_queue", {
  // §4.4), hex-encoded. Nullable for legacy traffic. Brokers that want
  // to enforce idempotency on retries will read this column.
  requestFingerprint: text("request_fingerprint"),
  // v2 agentic-comms (M1): two-phase claim/deliver with lease.
  // `drainForMember` claims a row by setting (claimedAt, claimId,
  // claimExpiresAt), NOT deliveredAt. deliveredAt is only set once the
  // recipient's WS replies with a `client_ack`. A periodic sweeper
  // reaps expired claims so dropped pushes are redelivered (at-least-once).
  claimedAt: timestamp(),
  claimId: text("claim_id"),
  claimExpiresAt: timestamp(),
});
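The two-phase claim/deliver protocol is easiest to follow as a set of state transitions on a single row. This is a minimal TypeScript sketch over an in-memory array with a caller-supplied millisecond clock. `drainForMember` is named in the source, but its signature here is invented. `onClientAck`, `sweepExpiredClaims`, `Row`, and `LEASE_MS` are hypothetical. A real broker would run each step as one atomic UPDATE in Postgres rather than mutating objects.

```typescript
// Hypothetical in-memory model of the M1 claim/lease state machine.
// Field names mirror the new message_queue columns; times are in ms.

const LEASE_MS = 30_000; // claim_expires_at = claimed_at + 30s

interface Row {
  id: number;
  recipient: string;
  clientMessageId: string;
  claimedAt: number | null;
  claimId: string | null;
  claimExpiresAt: number | null;
  deliveredAt: number | null;
}

// Phase 1 (claim): pick undelivered, unclaimed rows. deliveredAt is NOT touched.
function drainForMember(rows: Row[], recipient: string, presenceId: string, now: number): Row[] {
  const claimed: Row[] = [];
  for (const r of rows) {
    if (r.recipient !== recipient || r.deliveredAt !== null || r.claimedAt !== null) continue;
    r.claimedAt = now;
    r.claimId = presenceId;
    r.claimExpiresAt = now + LEASE_MS;
    claimed.push(r);
  }
  return claimed;
}

// Phase 2 (deliver): only a client_ack with the original id marks delivery.
function onClientAck(rows: Row[], clientMessageId: string, now: number): boolean {
  const r = rows.find((x) => x.clientMessageId === clientMessageId && x.deliveredAt === null);
  if (r === undefined) return false;
  r.deliveredAt = now;
  return true;
}

// Sweeper: clear expired, unacked claims so the next drain picks the row up again.
function sweepExpiredClaims(rows: Row[], now: number): number {
  let reaped = 0;
  for (const r of rows) {
    if (r.deliveredAt !== null || r.claimExpiresAt === null || r.claimExpiresAt > now) continue;
    r.claimedAt = null;
    r.claimId = null;
    r.claimExpiresAt = null;
    reaped++;
  }
  return reaped;
}

// Usage: the WS dies after the claim but before any ack, which used to lose the message.
const rows: Row[] = [
  { id: 1, recipient: "peer-a", clientMessageId: "m1",
    claimedAt: null, claimId: null, claimExpiresAt: null, deliveredAt: null },
];
const first = drainForMember(rows, "peer-a", "presence-1", 0);       // claimed, never acked
const blocked = drainForMember(rows, "peer-a", "presence-2", 10_000); // lease still active
const reaped = sweepExpiredClaims(rows, 30_000);                      // lease expired
const second = drainForMember(rows, "peer-a", "presence-2", 31_000); // redelivered
const acked = onClientAck(rows, "m1", 31_500);
```

Because redelivery is at-least-once, the same client_message_id can reach a recipient more than once. The receiving side should therefore treat that id as an idempotency key.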
|
/**