feat(broker+cli): topics — conversation scope within a mesh (v0.2.0)

Adds the third axis of mesh organization: mesh = trust boundary,
group = identity tag, topic = conversation scope. Topic-tagged
messages filter delivery by topic_member rows and persist to a
topic_message history table for back-scroll on reconnect.

Schema (additive):
- mesh.topic, mesh.topic_member, mesh.topic_message tables
- topic_visibility (public|private|dm) and topic_member_role
  (lead|member|observer) enums
- migration 0022_topics.sql, hand-written following project convention
  (drizzle journal has been drifting since 0011)

Broker:
- 10 helpers (createTopic, listTopics, findTopicByName, joinTopic,
  leaveTopic, topicMembers, getMemberTopicIds, appendTopicMessage,
  topicHistory, markTopicRead)
- drainForMember matches "#<topicId>" target_specs via member's
  topic memberships
- 7 WS handlers (topic_create/list/join/leave/members/history/mark_read)
  + resolveTopicId helper accepting id-or-name
- handleSend auto-persists topic-tagged messages to history

CLI:
- claudemesh topic create/list/join/leave/members/history/read
- claudemesh send "#deploys" "..." resolves topic name to id
- bundled skill teaches Claude the DM/group/topic decision matrix
- policy-classify recognizes topic create/join/leave as writes

Spec: .artifacts/specs/2026-05-02-v0.2.0-scope.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Alejandro Gutiérrez
2026-05-02 01:53:42 +01:00
parent b4f457fceb
commit 1afae7a507
12 changed files with 1741 additions and 196 deletions


@@ -0,0 +1,271 @@
# claudemesh v0.2.0 — scope
**Date:** 2026-05-02
**Status:** draft
**Predecessor:** [`2026-05-02-architecture-north-star.md`](./2026-05-02-architecture-north-star.md) (1.5.0 architecture lock)
---
## Cut
**Theme: from agent-only mesh to mesh of agents, humans, and external systems — with conversation context.**
| # | Feature | Effort | Spine |
|---|---------|--------|-------|
| 1 | **Topics** (channels/rooms within a mesh) | 2-3 d | yes |
| 2 | **Humans in the mesh** (web chat panel) | 2-3 d | depends on #1 |
| 3 | **REST API + external WS** (API keys per mesh) | 2-3 d | depends on #1 |
| 4 | **Bridge peer** (forwards one topic between meshes) | 1 d | depends on #1 |
Optional pickup if all four ship early:
- **Local peer aliases** (~0.5 d) — IRC-style local labels for hard-to-remember displayNames.
- **Semantic peer search** (~0.5 d) — already in vision doc; useful once topics exist.
Total: 7-10 days (the sum of the four estimates above) plus 1-2 days slack. Targeting **release window: 2026-05-12 to 2026-05-16**.
---
## Why this cut
The 1.5.0 architecture (CLI-first, tool-less MCP, policy engine) is finished. The next bottleneck is **product surface**, not engineering.
The current taxonomy `mesh + group + role` is the right *organizational* structure but lacks a *conversational* primitive. Every message is a DM or an `@group` broadcast, so there's no continuity for "the deploys conversation," no scoped state/memory/files, no way for a human to join a topic without joining the whole mesh, and no way for a bridge to forward a single thread of work.
**Topics fix this.** They are the spine of v0.2.0:
- Without topics, "humans in mesh" floods every human with every peer's chatter.
- Without topics, "bridge" forwards everything (loop risk, signal-to-noise problem).
- Without topics, REST API endpoints have no natural sub-mesh scope.
Once topics exist, humans + REST + bridge each become 50% smaller because they slot into a clean primitive instead of inventing one.
---
## Deferred
| Item | Why later |
|---|---|
| **Federation** (broker-to-broker) | Bridges prototype it. Learn from real use first. |
| **Sandboxes** (E2B / Modal) | Orthogonal capability. Separate release. |
| **Sim SDK** (`@claudemesh/sim`) | Niche audience; long-tail. v0.3.0+. |
| **Welcome back / persistent MCP** | Already in progress as 1.6.0 patch. |
| **Mesh telemetry** | Pre-PMF telemetry is busywork; users first. |
---
## Design sketches
### 1. Topics
**Mental model:** mesh is *who you trust*; group is *who you are*; topic is *what you're talking about*. Three orthogonal axes.
**Wire shape:**
```yaml
topic:
  id: <ulid>
  mesh_slug: openclaw
  name: deploys                 # unique within mesh
  description: "deploy + on-call"
  visibility: public            # public | private (invite-only) | dm (1:1, autocreated)
  created_by: <pubkey>
  created_at: <ts>
```
**Membership:**
```yaml
topic_member:
  topic_id: <ulid>
  pubkey: <hex>                 # session pubkey OR member_pubkey for durable identity
  role: lead | member | observer
  joined_at: <ts>
  last_read_at: <ts>            # for unread counts
```
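The `last_read_at` column is described as the basis for unread counts. A minimal sketch of that computation follows, assuming timestamps are epoch milliseconds and that a member's own messages do not count as unread. The interface and function names are hypothetical, not the broker's actual helpers.

```typescript
// Hypothetical in-memory shapes mirroring the topic_member and topic_message rows.
interface TopicMember {
  topicId: string;
  pubkey: string;
  role: "lead" | "member" | "observer";
  joinedAt: number;   // assumed: epoch milliseconds
  lastReadAt: number; // assumed: epoch milliseconds
}

interface TopicMessage {
  topicId: string;
  senderPubkey: string;
  createdAt: number; // assumed: epoch milliseconds
}

// Count messages in the member's topic that arrived after last_read_at,
// skipping anything the member sent themselves.
function unreadCount(member: TopicMember, history: TopicMessage[]): number {
  return history.filter(
    (m) =>
      m.topicId === member.topicId &&
      m.createdAt > member.lastReadAt &&
      m.senderPubkey !== member.pubkey,
  ).length;
}
```

Marking a topic as read then amounts to setting `last_read_at` to the current time, after which the count for that member drops to zero.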
**Messages reference a topic, not just a target:**
```jsonc
// existing send_message envelope gains a `topic` field
{
  "to": "#deploys",     // or topic id, or peer name (DM)
  "topic": "deploys",   // optional; when omitted, inferred from `to: "#<topic>"`
  "message": "...",
  "priority": "next"
}
```
**Resolution rules:**
- `to: "alice"` → DM to peer alice (no topic).
- `to: "@frontend"` → group broadcast (no topic — backwards compatible with 1.5.0).
- `to: "#deploys"` → topic message; delivered only to topic subscribers.
- `to: "*"` → mesh-wide broadcast (kept; lower-priority than topic for new comms).
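The resolution rules above can be sketched as a small classifier over the `to` field. This is an illustration only: `Target` and `resolveTarget` are hypothetical names, and name-to-id lookup (which the broker does separately) is left out.

```typescript
// Hypothetical result type; one variant per rule in the list above.
type Target =
  | { kind: "broadcast" }
  | { kind: "group"; group: string }
  | { kind: "topic"; topic: string }
  | { kind: "dm"; peer: string };

// Classify a `to` value by its prefix: "*" is mesh-wide, "@" is a group,
// "#" is a topic, and anything else is treated as a peer name (DM).
function resolveTarget(to: string): Target {
  if (to === "*") return { kind: "broadcast" };
  if (to.startsWith("@")) return { kind: "group", group: to.slice(1) };
  if (to.startsWith("#")) return { kind: "topic", topic: to.slice(1) };
  return { kind: "dm", peer: to };
}
```

Keeping `@` and `#` as distinct prefixes is what lets group broadcasts stay backwards compatible with 1.5.0 while topic delivery is added alongside.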
**State/memory/files scoping:**
- `claudemesh state set <k> <v> --topic deploys` — namespace under topic.
- `claudemesh remember "..." --topic deploys` — topic-scoped memory.
- `claudemesh file list --topic deploys` — files visible only to topic members.
**CLI:**
```bash
claudemesh topic create deploys --description "deploy + on-call"
claudemesh topic list # all topics in mesh
claudemesh topic join deploys
claudemesh topic leave deploys
claudemesh topic invite deploys <peer> # private topics
claudemesh topic members deploys
claudemesh topic delete deploys # creator/admin only
claudemesh send "#deploys" "rolling out 1.5.1"
```
**MCP `claude/channel` notification gains `topic`** as an attribute so peers know which conversation an inbound message belongs to.
**Effort breakdown:** schema + drizzle migration + CLI verbs + broker routing changes (filter by topic membership) + skill update. ~250 LoC across CLI + ~200 LoC broker.
---
### 2. Humans in the mesh
**Mental model:** a human is a peer with `peer_type: "human"` whose presence is durable (no session pubkey rotation; identity tied to an account). They join *topics*, not the whole mesh — so they only see relevant traffic.
**Wire:**
```jsonc
// hello envelope gains:
{
  "peer_type": "human",
  "session_pubkey": "<ephemeral, per browser tab>",
  "member_pubkey": "<durable, account-tied>",
  "display_name": "Alejandro"
}
```
**Web panel (`apps/web`):**
```
/dashboard/mesh/<slug>/topic/<topic-name>
├── topic header (members, settings)
├── message stream (WS-driven, infinite scroll on history)
├── compose box (typing indicator broadcast on focus)
└── members sidebar (presence, profile, last_read_at)
```
**Backend changes:**
- Persistent message history per topic (drizzle table `topic_message`; existing direct messages stay ephemeral by design).
- Topic-scoped read receipts (`topic_member.last_read_at`).
- Typing indicator: short-lived broadcast on the topic channel (`{type: "typing", peer: "..."}`).
**Privacy invariant:** a human in `#deploys` sees only `#deploys` traffic + DMs sent to them. Never the whole mesh. This is the *whole reason* topics come first.
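One way to read the privacy invariant is as a single delivery predicate. The sketch below assumes a simplified inbound message shape and keys DMs on the durable `member_pubkey`; all type and function names are hypothetical.

```typescript
// Hypothetical, simplified view of an inbound message after target resolution.
type Inbound =
  | { kind: "topic"; topic: string }
  | { kind: "dm"; recipientPubkey: string }
  | { kind: "group"; group: string }
  | { kind: "broadcast" };

interface HumanPeer {
  memberPubkey: string;      // durable, account-tied identity
  joinedTopics: Set<string>; // names of topics the human has joined
}

// A human sees traffic for topics they joined plus DMs addressed to them.
// Group and mesh-wide messages are never delivered to a human peer.
function visibleToHuman(human: HumanPeer, msg: Inbound): boolean {
  switch (msg.kind) {
    case "topic":
      return human.joinedTopics.has(msg.topic);
    case "dm":
      return msg.recipientPubkey === human.memberPubkey;
    default:
      return false;
  }
}
```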
**Effort:** WS endpoint already exists (broker side). Add: `topic_message` table, history endpoint, web UI components (compose, stream, members). ~3 days.
---
### 3. REST API + external WS
**Auth:** API keys per mesh, scoped by capability + topic.
```yaml
api_key:
  id: <ulid>
  mesh_slug: openclaw
  label: "ci-bot"
  hash: <argon2id>
  capabilities: ["send", "read"]
  topic_scopes: ["#deploys"]    # null = all topics; explicit = whitelist
  created_at: <ts>
  last_used_at: <ts>
  revoked_at: <ts | null>
```
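The scoping fields in this record suggest a three-step authorization check: not revoked, capability granted, topic in scope. A minimal sketch under those assumptions follows; `ApiKey` and `isAllowed` are hypothetical names, and hash verification of the presented key is deliberately omitted.

```typescript
// Hypothetical in-memory view of the api_key record (only the fields the check needs).
interface ApiKey {
  capabilities: string[];       // e.g. ["send", "read"]
  topicScopes: string[] | null; // null = all topics; otherwise an explicit whitelist
  revokedAt: number | null;     // assumed: epoch milliseconds, null while active
}

// Returns true only if the key is active, holds the capability,
// and the topic (written with its "#" prefix) is within scope.
function isAllowed(key: ApiKey, capability: string, topic: string): boolean {
  if (key.revokedAt !== null) return false;
  if (!key.capabilities.includes(capability)) return false;
  return key.topicScopes === null || key.topicScopes.includes(topic);
}
```

With the `ci-bot` key shown above, `isAllowed(key, "send", "#deploys")` passes while the same call for `"#secrets"` is rejected.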
**CLI for issuance (admin only):**
```bash
claudemesh apikey create --label "ci-bot" --topic deploys --cap send,read
claudemesh apikey list
claudemesh apikey revoke <id>
```
**REST endpoints (claudemesh.com/api/v1):**
```
POST /v1/messages Send a message (auth: api key).
GET /v1/topics/:name/messages History (with pagination cursor).
GET /v1/peers List online peers (filtered by key scope).
GET /v1/state Read mesh state.
POST /v1/state Write mesh state.
```
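As a rough illustration of `POST /v1/messages`, the helper below only builds a request description and sends nothing. Several details are assumptions not fixed by this spec: the `Authorization: Bearer` header scheme, the JSON body reusing the `to` and `message` fields of the send envelope, and the base URL, which the caller supplies. `buildSendRequest` is a hypothetical name.

```typescript
// Plain description of an HTTP request, so the sketch has no runtime dependencies.
interface SendRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build a topic-scoped send request. The base URL is supplied by the caller
// and trailing slashes are trimmed before the path is appended.
function buildSendRequest(
  baseUrl: string,
  apiKey: string,
  topic: string,
  message: string,
): SendRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/messages`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumed scheme; the spec only says "auth: api key"
    },
    body: JSON.stringify({ to: `#${topic}`, message }), // assumed body shape
  };
}
```

The returned object can be handed to any HTTP client once the real header and body conventions are settled.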
**External WS:** `wss://ic.claudemesh.com/ws?api_key=...&topic=deploys` — connects with `peer_type: "external"`. Push-pipe parity with internal sessions; can subscribe to topic streams.
**Why REST keys not session keypairs:** external clients (Zapier, GitHub Actions, mobile apps, Slack workspace bots) need long-lived bearer-like creds, not ephemeral keypairs. Different threat model — scope tightly via topic + capability.
**Effort:** ~3 days. Mostly broker work; CLI gets the issuance verbs.
---
### 4. Bridge peer
**Mental model:** a bridge is a peer that holds memberships in two meshes and forwards traffic on a single topic between them. SDK-only (no broker changes).
**Implementation (uses existing `@claudemesh/sdk`):**
```typescript
import { Bridge } from "@claudemesh/sdk";

const bridge = new Bridge({
  meshes: ["work", "external"],
  topic: "incidents",
  filter: (msg) => !msg.tags.includes("internal-only"),
  loop_prevention: { tag: "via-bridge", max_hops: 2 },
});

await bridge.start();
```
**Loop prevention:** every forwarded message gets a `bridge_hop_<n>` tag; bridges drop messages that already carry their own tag (prevents echo) and any message with `max_hops` exceeded.
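The two drop rules can be sketched as a pure function over a message's tags. This is one possible reading, not the SDK's implementation: it assumes each bridge has its own identifying tag (for example `via-bridge:<id>`, a hypothetical format) and that the hop count is carried in a single `bridge_hop_<n>` tag that is rewritten on each forward.

```typescript
interface BridgedMessage {
  body: string;
  tags: string[];
}

// Matches hop tags such as "bridge_hop_1"; no global flag, so test() is stateless.
const HOP_TAG = /^bridge_hop_(\d+)$/;

// Highest hop number found among the tags, or 0 if the message was never bridged.
function currentHops(tags: string[]): number {
  let hops = 0;
  for (const tag of tags) {
    const match = HOP_TAG.exec(tag);
    if (match) hops = Math.max(hops, Number(match[1]));
  }
  return hops;
}

// Returns the message to forward, or null if it must be dropped because this
// bridge already handled it (echo) or the hop limit has been reached.
function forward(
  msg: BridgedMessage,
  ownTag: string,
  maxHops: number,
): BridgedMessage | null {
  if (msg.tags.includes(ownTag)) return null;
  const hops = currentHops(msg.tags);
  if (hops >= maxHops) return null;
  const kept = msg.tags.filter((t) => !HOP_TAG.test(t));
  return { ...msg, tags: [...kept, ownTag, `bridge_hop_${hops + 1}`] };
}
```

Under these assumptions, with `max_hops: 2` a message can cross two bridges and is dropped by a third, while a bridge that sees its own tag drops the message regardless of the hop count.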
**CLI:** `claudemesh bridge run <config.yaml>` — runs an SDK bridge as a long-lived process. Useful for "run a bridge inside a docker container or systemd unit."
**What it deliberately doesn't do:**
- Cross-broker federation (that's a separate broker-to-broker protocol).
- Bidirectional state/memory sync (only messages on a single topic).
- Identity unification (a peer in mesh A is *not* the same peer in mesh B; the bridge appears as the messenger).
**Effort:** ~1 day on top of the existing SDK.
---
## Acceptance signals
v0.2.0 ships when all four are demonstrable end-to-end:
1. A peer creates `#deploys`, two other peers join it, traffic is topic-scoped, mesh-wide chat doesn't see it.
2. A human signs in at `claudemesh.com`, joins `#deploys`, sends a message, a Claude session in the mesh receives it as a `<channel>` interrupt with `topic="deploys"`.
3. A `curl` POST against `/v1/messages` with an API key delivers a message into `#deploys`; the same API key is rejected on `#secrets`.
4. A bridge peer running locally forwards `#incidents` between two test meshes; loop is prevented; one-shot demo recorded.
---
## Out of scope (explicitly)
- Topic hierarchy / nesting (flat namespace per mesh; revisit at scale).
- Topic-scoped capability grants (`grant <peer> read:#topic`) — solvable later via capability extension.
- Threads-within-topics (Slack-style). Defer.
- Voice / video / file-upload UX for humans — text only in v0.2.0.
- Federation, sandboxes, sim-sdk — explicitly deferred above.
---
## Risks
- **Topics retrofit risk** — existing 1.5.0 message envelope assumes "to" is peer/group/star. Adding `topic` is additive on the wire but changes routing logic. Test path: backfill existing meshes with a default `#general` topic; opt-in to topic-only routing.
- **Web chat session lifecycle** — humans expect "I closed the tab and came back, my place is preserved." Ephemeral session pubkeys break that. Workaround: tie human peer identity to `member_pubkey` + last_read_at on the topic; session pubkey rotates per tab but membership is durable.
- **API key abuse** — leaked keys = anyone can post. Mitigations: capability + topic scoping; rate limits per key; `last_used_at` + audit trail; revoke verb is fast.
---
## Open questions
1. Do existing `@group` semantics survive intact, or do we collapse `@group` and `#topic` into one primitive? (Answer favored: keep both — different axes.)
2. Should topics persist messages by default, or be opt-in? (Default: yes for `peer_type: "human"`-touched topics; configurable per topic for agent-only ones.)
3. Where does mesh-MCP discovery live in the topic model — per topic or per mesh? (Likely per mesh; mesh-MCP is infrastructure, not conversation.)