
Alejandro Gutiérrez b315b31cc9 docs: add peer session persistence and MCP notification to vision

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-04-08 00:15:49 +01:00


claudemesh — Vision & Feature Brainstorm

Date: 2026-04-07 23:01 CEST
Author: Alejandro Gutiérrez + Claude (Opus 4.6)
Status: Internal brainstorm, not committed to public roadmap
Last updated: 2026-04-08 00:09 CEST

Tier 1 — High impact, buildable now

1. Session path (pwd) sharing — DONE

Add cwd to the WS hello handshake. The broker stores it in the peer record, and list_peers returns it. Peers on the same machine see each other's working directories, which lets AI reference files across sessions without guessing paths.

Effort: 30 min. One field in hello + peer list.
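A minimal sketch of the handshake change. Only `cwd` comes from the text; the `HelloMessage` and `PeerRecord` shapes and the `registerPeer`/`listPeers` helpers are hypothetical stand-ins for the broker's real types:

```typescript
// Hypothetical message shapes; only `cwd` is taken from the design above.
interface HelloMessage {
  type: "hello";
  memberId: string;
  name: string;
  cwd?: string; // working directory of the Claude Code session
}

interface PeerRecord {
  memberId: string;
  name: string;
  cwd: string | null; // null when the client did not send one
}

// Broker side: copy cwd from the hello into the peer record.
function registerPeer(peers: Map<string, PeerRecord>, hello: HelloMessage): PeerRecord {
  const record: PeerRecord = {
    memberId: hello.memberId,
    name: hello.name,
    cwd: hello.cwd ?? null,
  };
  peers.set(hello.memberId, record);
  return record;
}

// list_peers: every record is returned with its cwd included.
function listPeers(peers: Map<string, PeerRecord>): PeerRecord[] {
  return Array.from(peers.values());
}
```

Keeping `cwd` optional means older CLIs that omit it still connect; the broker simply records `null` for them.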

Implemented: 2026-04-07 23:30 · 810f372 · CLI 0.6.9 + broker deployed

2. Peer metadata: human vs AI, channel type, model — DONE

Why: Foundation for connectors, human peers, and smart routing (send complex analysis to the Opus peer, quick tasks to Sonnet).

Effort: 1 hour.

Implemented: 2026-04-07 23:30 · 810f372 · Shipped with item 1 (same commit)

3. System notifications (join/leave/resource events) — DONE

Broker pushes system-level messages when peers connect/disconnect, files get shared, state changes, tasks get created. Same subtype pattern as reminders: { type: "push", subtype: "system", event: "peer_joined", ... }.
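The push shape above can be modeled as a discriminated union on `subtype`, which lets a client switch on it exhaustively. In this sketch `peerName` and the reminder's `message` field are assumptions, not the broker's confirmed payload:

```typescript
// `type`, `subtype` and `event` follow the text; other fields are illustrative.
type SystemEvent = "peer_joined" | "peer_left";

interface SystemPush {
  type: "push";
  subtype: "system";
  event: SystemEvent;
  peerName: string;
}

interface ReminderPush {
  type: "push";
  subtype: "reminder";
  message: string;
}

type Push = SystemPush | ReminderPush;

// Client side: turn a push into a one-line notice the AI can react to.
function describePush(push: Push): string {
  switch (push.subtype) {
    case "system":
      return push.event === "peer_joined"
        ? `${push.peerName} joined the mesh`
        : `${push.peerName} left the mesh`;
    case "reminder":
      return `Reminder: ${push.message}`;
  }
}
```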

Why: Mesh feels alive. AI can react to topology changes without polling.

Effort: 2 hours.

Implemented: 2026-04-07 23:20 · 453705a · peer_joined + peer_left broadcasts, system subtype in push

4. Cron-based reminders — DONE

Replace setTimeout with a persistent cron scheduler (broker-side). AI sends schedule_reminder --cron "0 */2 * * *" --message "check deploy status". The broker uses node-cron or a Drizzle-backed scheduler, so schedules survive broker restarts.
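The implemented scheduler uses its own zero-dependency cron parser. The code below is not that parser, only a sketch of how a minimal five-field matcher could work; `parseField`, `cronMatches` and `toCronTime` are hypothetical names, and evaluating times in UTC is an assumption:

```typescript
// Bounds in cron order: minute, hour, day of month, month, day of week.
const FIELD_RANGES: Array<[number, number]> = [[0, 59], [0, 23], [1, 31], [1, 12], [0, 6]];

// Digits only; anything else becomes NaN so validation rejects it.
const toInt = (s: string | undefined): number =>
  s !== undefined && /^\d+$/.test(s) ? Number(s) : NaN;

// Expand one field ("*", "*/n", "a", "a-b", "a-b/n", "a/n", comma lists).
function parseField(field: string, min: number, max: number): Set<number> {
  const values = new Set<number>();
  for (const part of field.split(",")) {
    const [range, stepText] = part.split("/");
    const step = stepText === undefined ? 1 : toInt(stepText);
    let lo = min;
    let hi = max;
    if (range !== "*") {
      const [start, end] = range.split("-");
      lo = toInt(start);
      // "5" alone means exactly 5; "5/15" means 5, 20, 35, ... up to max.
      hi = end !== undefined ? toInt(end) : stepText !== undefined ? max : lo;
    }
    const valid = [lo, hi, step].every(Number.isInteger) && step >= 1 && lo >= min && hi <= max && lo <= hi;
    if (!valid) throw new Error(`invalid cron field: ${field}`);
    for (let v = lo; v <= hi; v += step) values.add(v);
  }
  return values;
}

interface CronTime { minute: number; hour: number; dayOfMonth: number; month: number; dayOfWeek: number; }

function toCronTime(d: Date): CronTime {
  return {
    minute: d.getUTCMinutes(),
    hour: d.getUTCHours(),
    dayOfMonth: d.getUTCDate(),
    month: d.getUTCMonth() + 1,
    dayOfWeek: d.getUTCDay(),
  };
}

// Simplification: all five fields must match.
function cronMatches(expr: string, t: CronTime): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error(`expected 5 fields, got: ${expr}`);
  const actual = [t.minute, t.hour, t.dayOfMonth, t.month, t.dayOfWeek];
  return fields.every((f, i) => parseField(f, FIELD_RANGES[i][0], FIELD_RANGES[i][1]).has(actual[i]));
}
```

A broker loop could call `cronMatches` once per minute for each persisted schedule. Classic cron treats a restricted day-of-month and day-of-week as OR, while this sketch requires both to match.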

Why: Current reminders die if the broker restarts. Cron syntax is already familiar to AI.

Effort: 2 hours (+ DB migration for persistence).

Implemented: 2026-04-07 23:35 · e873807 · DB-persisted schedules, zero-dep cron parser, restart recovery, --cron CLI flag

5. Heartbeats / session supervisor + simulation clock — DONE

Keepalive layer: WebSocket ping/pong for connection health. A CLI-side supervisor monitors the WS connection and relaunches Claude Code if it drops. Broker marks peers as disconnected on WS close.

Simulation clock layer: Heartbeats become a broker-driven clock that peers can subscribe to. The broker broadcasts periodic { subtype: "heartbeat", tick: 42, simTime: "2026-04-08T14:30:00Z", speed: "x10" } messages at a configurable rate.

Time multiplier for load testing:

mesh_set_clock(speed: "x1") — real-time, normal operation
mesh_set_clock(speed: "x10") — 1 hour of simulated activity in 6 minutes
mesh_set_clock(speed: "x100") — 1 day of simulated activity in ~15 minutes
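The multiplier reduces to one line of arithmetic: simulated time advances `speed` times faster than wall-clock time since the clock started. A sketch, assuming the broker keeps a start instant for both clocks and a fixed real-time tick interval (all field and function names are hypothetical):

```typescript
interface SimClock {
  speed: number;     // multiplier: 1, 10 or 100
  simStart: number;  // simulated time (epoch ms) when the clock started
  realStart: number; // wall-clock time (epoch ms) when the clock started
  tickMs: number;    // real milliseconds between heartbeat broadcasts
}

function simTimeAt(clock: SimClock, realNow: number): number {
  return clock.simStart + (realNow - clock.realStart) * clock.speed;
}

function tickAt(clock: SimClock, realNow: number): number {
  return Math.floor((realNow - clock.realStart) / clock.tickMs);
}

// Heartbeat payload in the shape shown above (toISOString adds milliseconds).
function heartbeat(clock: SimClock, realNow: number) {
  return {
    subtype: "heartbeat",
    tick: tickAt(clock, realNow),
    simTime: new Date(simTimeAt(clock, realNow)).toISOString(),
    speed: `x${clock.speed}`,
  };
}
```

At x10, 6 real minutes cover 60 simulated minutes. At x100 a simulated day takes 86,400 s / 100 = 864 s, about 14.4 minutes, which is where "~15 minutes" comes from. A 36 s real tick at x10 advances 6 simulated minutes per tick, matching the 14:30 to 14:36 step in the example.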

Use case — infrastructure stress testing: Spawn 10 AI peers, each simulating a real user persona (sales rep, admin, customer). Set the clock to x10. Each peer receives heartbeat ticks and acts according to the simulated time: "it's 9am, log in and check dashboard", "it's 11am, process 5 orders", "it's 3pm, run reports". The infrastructure sees realistic usage patterns at 10x speed.

What peers see:

> mesh_clock()
Simulation clock: x10 | sim time: 2026-04-08 14:30 | tick: 42/480

> [heartbeat tick 43 — sim time: 14:36]
  AI peer "Sales-Rep-1": creates 3 orders, searches inventory
  AI peer "Admin-1": approves pending orders, checks stock levels
  AI peer "Customer-1": browses catalog, adds to cart, checks out

Components:

Broker: clock state + periodic broadcast to all peers
MCP tools: mesh_set_clock(speed), mesh_clock(), mesh_pause_clock(), mesh_resume_clock()
Peer behavior: AI reads tick + simTime from heartbeat, decides actions based on its persona and the simulated time of day
Reporting: broker collects action counts per tick, produces load profile after the run

Why this is powerful: Unlike scripted load testers (k6, Locust) that replay fixed request sequences, AI peers exercise the full stack: UI flows, API sequences, edge cases, realistic data entry. Because they improvise like real users, they can find bugs that scripted tests miss.

Effort: 1 day (heartbeat + clock), 1 day (simulation framework + personas).

Implemented: 2026-04-07 · 05d9b56 · Per-mesh clock state, configurable speed x1-x100, auto-pause on empty mesh, heartbeat ticks via system push

Tier 2 — Strong ideas, needs design

6. Mesh webhooks / REST API / external WebSocket — PARTIAL (webhooks done)

Three surfaces for external integration:

Inbound webhooks: POST https://ic.claudemesh.com/hook/<mesh-id>/<secret> → broker injects as a push to all peers or a specific group. GitHub, CI/CD, monitoring alerts become mesh messages.
REST API: Authenticated endpoints to send messages, read state, list peers from outside. Makes the mesh programmable from any language.
External WS: Non-Claude clients connect via WS with an API key (not a session keypair). Same protocol, different auth.
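A sketch of the inbound webhook route, assuming the broker keeps one secret per mesh and wraps the request body in a push. The `webhook` subtype and `handleWebhook` are illustrative names. The secret is compared in constant time, so response timing does not leak how much of a guess was correct:

```typescript
interface WebhookPush { type: "push"; subtype: "webhook"; meshId: string; payload: unknown; }

// Compare without stopping at the first mismatch.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// Handles POST /hook/<mesh-id>/<secret>; returns the push to broadcast or null.
function handleWebhook(
  method: string,
  path: string,
  body: unknown,
  secrets: Map<string, string>, // meshId -> webhook secret (hypothetical store)
): WebhookPush | null {
  const match = /^\/hook\/([^/]+)\/([^/]+)$/.exec(path);
  if (method !== "POST" || !match) return null;
  const [, meshId, secret] = match;
  const expected = secrets.get(meshId);
  if (expected === undefined || !constantTimeEqual(secret, expected)) return null;
  return { type: "push", subtype: "webhook", meshId, payload: body };
}
```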

Prerequisite: API keys per mesh (not ephemeral session keypairs).

Effort: Half day (webhooks alone), 2-3 days (full API surface).

Partial: 2026-04-07 · b55cf26 · Inbound webhooks implemented (POST /hook/:meshId/:secret → push to mesh). REST API and external WS remain.

7. Connectors: Slack, Telegram as peers — DONE

Approach 1 — Connector-as-peer (recommended start): A bridge process joins the mesh as a peer named "Slack-#general" and relays messages bidirectionally. Peers see it in list_peers with peerType: "connector". One connector per channel.

Approach 2 — Connector-as-router: Broker-level integration — messages to #slack:general route through a registered connector. More elegant, but complex.

Ship as claudemesh-connector-slack, claudemesh-connector-telegram.
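Echo prevention is the core of the connector-as-peer approach. The bridge must not relay its own relayed messages back to where they came from, or every message loops. A sketch with hypothetical message shapes; the `send*` callbacks stand in for the mesh client and the Slack or Telegram API client:

```typescript
interface MeshMessage { from: string; text: string; }
interface ChatMessage { userId: string; userName: string; text: string; }
type Send = (text: string) => void;

class ConnectorBridge {
  private readonly peerName: string;  // e.g. "Slack-#general"
  private readonly botUserId: string; // the bot's own user id on the chat side
  private readonly sendToChat: Send;
  private readonly sendToMesh: Send;

  constructor(peerName: string, botUserId: string, sendToChat: Send, sendToMesh: Send) {
    this.peerName = peerName;
    this.botUserId = botUserId;
    this.sendToChat = sendToChat;
    this.sendToMesh = sendToMesh;
  }

  // Mesh to chat: skip anything this bridge itself sent into the mesh.
  onMeshMessage(msg: MeshMessage): void {
    if (msg.from === this.peerName) return;
    this.sendToChat(`[${msg.from}] ${msg.text}`);
  }

  // Chat to mesh: skip the bot's own posts, which are relayed mesh messages.
  onChatMessage(msg: ChatMessage): void {
    if (msg.userId === this.botUserId) return;
    this.sendToMesh(`${msg.userName}: ${msg.text}`);
  }
}
```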

Effort: 1-2 days each.

Implemented: 2026-04-07 · Slack: 5563f90 (Socket Mode, echo prevention, auto-reconnect) · Telegram: fe92853 (zero-dep Bot API, long polling)

8. Humans in the mesh

Humans connect via the web dashboard or mobile app using the same WS protocol. peerType: "human" metadata tells AI to adjust communication style. The push system works natively in browsers (WS is bidirectional).

Challenge: UX. Humans need a chat interface with typing indicators, read receipts, message history — not raw JSON. The dashboard already exists at claudemesh.com; extend it with a chat panel.

Effort: 2-3 days (web chat panel).

9. Connecting non-Claude-Code AI — DONE

Any process that speaks the WS protocol can join. The barrier isn't the protocol — it's the MCP tool surface that makes Claude Code sessions first-class. For other LLMs:

SDK approach: a client library (npm install claudemesh-sdk for JS, with a Python counterpart) that handles WS connection, crypto, and message parsing. Wrap any LLM's function-calling interface around it.
Push delivery: The push system works over WS. Non-Claude clients receive pushes the same way. The challenge is injecting them into the LLM's context — each platform has a different mechanism (OpenAI function results, Gemini tool responses, etc.).
Adapter pattern: claudemesh-adapter-openai, claudemesh-adapter-cursor, etc.

Effort: 1 day (SDK), 1 day per adapter.

Implemented: 2026-04-07 · 7e102a2 · @claudemesh/sdk — standalone TypeScript SDK with libsodium crypto_box, EventEmitter API, auto-reconnect

10. Mesh skills catalog — DONE

Peers publish skills: share_skill({ name: "pdf-generation", description: "...", instructions: "..." }). Other peers list_skills() and get_skill("pdf-generation") to load instructions into their context. Broker stores skills like memory/state.
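A sketch of the catalog semantics, using an in-memory `Map` in place of the Drizzle table. `share` upserts by name, and `list(query)` does a case-insensitive substring match, roughly what an ILIKE `'%query%'` search does (without ILIKE's `%`/`_` wildcards inside the query). Class and method names are hypothetical:

```typescript
interface Skill { name: string; description: string; instructions: string; }

class SkillCatalog {
  private readonly skills = new Map<string, Skill>();

  // Upsert: sharing a skill with an existing name replaces it.
  share(skill: Skill): void {
    this.skills.set(skill.name, skill);
  }

  get(name: string): Skill | undefined {
    return this.skills.get(name);
  }

  remove(name: string): boolean {
    return this.skills.delete(name);
  }

  // Case-insensitive search over name and description.
  list(query?: string): Skill[] {
    const all = Array.from(this.skills.values());
    if (!query) return all;
    const q = query.toLowerCase();
    return all.filter(
      (s) => s.name.toLowerCase().includes(q) || s.description.toLowerCase().includes(q),
    );
  }
}
```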

Why: A mesh becomes a capability marketplace. One session installs a skill, all peers benefit. Skills can include tool definitions, system prompts, reference docs, and example workflows.

This is the killer feature. It turns claudemesh from a messaging layer into a knowledge-sharing platform.

Effort: 1 day.

Implemented: 2026-04-07 · c8cb1e3 · Full CRUD (share/get/list/remove), upsert by name, ILIKE search, Drizzle schema

11. Shared project files across peers — DONE

When a peer connects, it registers accessible paths (opt-in per directory). Other peers request files: get_peer_file(peer: "Alice", path: "src/auth.ts"). The owning peer reads the file and returns it over the mesh.

Security scoping options:

Opt-in per directory: claudemesh launch --share-dir ./src
Same-machine only (detect via hostname/IP)
Approval per request
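Whatever scoping option is chosen, the serving peer still has to validate each requested path. A sketch of strict checks consistent with the implementation note for this item (traversal rejected, hidden files excluded, 1MB cap). Reading the cap as 1024 * 1024 bytes and the function names are assumptions:

```typescript
const MAX_FILE_BYTES = 1024 * 1024;

type PathCheck = { ok: true; segments: string[] } | { ok: false; reason: string };

// Returns the segments to read under the shared directory, or a rejection.
function validateRequestPath(requested: string): PathCheck {
  // Absolute POSIX paths and Windows drive letters are never allowed.
  if (requested.startsWith("/") || /^[A-Za-z]:/.test(requested)) {
    return { ok: false, reason: "absolute path" };
  }
  // Treat both separators as separators; drop empty and "." segments.
  const segments = requested.split(/[\\/]+/).filter((s) => s !== "" && s !== ".");
  for (const s of segments) {
    if (s === "..") return { ok: false, reason: "path traversal" };
    if (s.startsWith(".")) return { ok: false, reason: "hidden file" };
  }
  if (segments.length === 0) return { ok: false, reason: "empty path" };
  return { ok: true, segments };
}

function withinSizeCap(bytes: number): boolean {
  return bytes <= MAX_FILE_BYTES;
}
```

Rejecting `..` outright is stricter than normalizing it, which keeps the check simple. A real implementation would also need to resolve symlinks inside the shared directory, which this string-level check cannot see.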

Effort: 1 day.

Implemented: 2026-04-07 · 504111c · Broker relay (never reads content), CLI file serving with 1MB cap, path traversal rejection, hidden files excluded, 2-level dir listing. Plus hostname-based local/remote detection (2c9c8c7) and filesystem shortcut hint (a92cf6b).

12. Peer stats (context consumption, token usage) — DONE

Peers self-report: set_status extended with contextUsed: 85000, contextMax: 200000, tokensIn: 12000, tokensOut: 8000. Dashboard shows burn rate. Useful for load balancing — route work to the peer with the most context headroom.
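Routing by headroom can be as simple as picking the largest `contextMax - contextUsed`. A sketch using the field names above; skipping peers that have not reported stats is an assumption, and since Claude Code does not expose context usage, the inputs would be estimates:

```typescript
interface PeerStats { name: string; contextUsed?: number; contextMax?: number; }

// Name of the peer with the most free context, or null if nobody reported.
function mostHeadroom(peers: PeerStats[]): string | null {
  let best: string | null = null;
  let bestFree = -Infinity;
  for (const p of peers) {
    if (p.contextUsed === undefined || p.contextMax === undefined) continue;
    const free = p.contextMax - p.contextUsed;
    if (free > bestFree) {
      bestFree = free;
      best = p.name;
    }
  }
  return best;
}
```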

Limitation: Claude Code doesn't expose context usage via API. Would need estimation from conversation length or /cost command parsing.

Effort: Half day (reporting infrastructure), unknown (accurate context measurement).

Implemented: 2026-04-07 · b3b9972 · Auto-reporting every 60s (messagesIn/Out, toolCalls, uptime, errors), mesh_stats MCP tool, stats in list_peers

Tier 3 — Big bets, needs careful thought

13. Mesh blockchain / signed audit log — DONE (audit log)

Honest assessment: A full blockchain is overkill for a cooperative mesh. What's actually valuable are a few of its properties:

Signed append-only log: Immutable record of all decisions, state changes, and messages. Merkle tree integrity. Useful for compliance, debugging, and "who decided what."
Conflict resolution: Vector clocks or CRDTs for state, instead of last-write-wins.
Reputation: Track which peers deliver on tasks, respond promptly, produce quality work.

Reframe as: Signed audit trail with integrity proofs. Not a blockchain, but the valuable properties of one.
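The integrity property can come from a hash chain, which is what the implementation note for this item says was built (a Merkle tree is a different structure that also supports efficient inclusion proofs). A sketch using Node's `crypto.createHash`; the entry fields and helper names are illustrative and, like the real log, carry no message content:

```typescript
import { createHash } from "node:crypto";

interface AuditEntry { seq: number; event: string; actor: string; at: string; prevHash: string; hash: string; }

const GENESIS = "0".repeat(64);

// Hash a JSON array so field boundaries are unambiguous.
function entryHash(seq: number, event: string, actor: string, at: string, prevHash: string): string {
  return createHash("sha256").update(JSON.stringify([seq, event, actor, at, prevHash])).digest("hex");
}

// Each new entry commits to the hash of the previous one.
function append(log: AuditEntry[], event: string, actor: string, at: string): AuditEntry {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS;
  const seq = log.length;
  const entry: AuditEntry = { seq, event, actor, at, prevHash, hash: entryHash(seq, event, actor, at, prevHash) };
  log.push(entry);
  return entry;
}

// Recompute every link from the genesis value forward.
function verify(log: AuditEntry[]): boolean {
  let prev = GENESIS;
  for (let i = 0; i < log.length; i++) {
    const e = log[i];
    if (e.seq !== i || e.prevHash !== prev) return false;
    if (e.hash !== entryHash(e.seq, e.event, e.actor, e.at, e.prevHash)) return false;
    prev = e.hash;
  }
  return true;
}
```

Editing, reordering, or removing any entry that has a successor breaks `verify`. Truncating the newest entries does not, unless the latest hash is also recorded somewhere outside the log.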

Effort: 3-5 days.

Implemented: 2026-04-07 · 86a2583 · SHA-256 hash chain audit log, append-only, no message content logged, chain verification endpoint, paginated query

14. Mesh of meshes / bridge

A meta-broker that routes between meshes. Use case: dev-team mesh and ops-team mesh coordinate on deploys.

Simple version: A bridge peer joins both meshes and relays tagged messages. No broker changes needed. Already feasible with today's protocol.

Federation version: Broker-to-broker peering protocol. Brokers exchange presence and route ciphertext across organizations.

Effort: 1 day (bridge peer), 1-2 weeks (federation protocol).

15. Mesh templates on creation — DONE

Predefined mesh configurations: roles, groups, state keys, system prompts, skills, and governance rules. Examples:

dev-team: @frontend, @backend, @devops groups; lead/member roles; state keys for sprint/deploy-frozen
research: @analysis, @writing groups; shared memory focus; context-sharing optimized
ops-incident: @oncall, @comms groups; high-urgency defaults; auto-escalation rules

Templates are JSON files. claudemesh create --template dev-team applies them at mesh creation. Templates are editable post-creation by mesh admin (or anyone, depending on governance).
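Since templates are JSON files, a typed shape plus a validation pass can catch mistakes before `claudemesh create --template` applies one. A sketch; the field names, the kebab-case rule and the initial state values are hypothetical, with the groups and roles taken from the dev-team example above:

```typescript
interface MeshTemplate {
  name: string;
  groups: string[];
  roles: string[];
  stateKeys: Record<string, string | boolean>; // key -> initial value (illustrative)
}

const devTeam: MeshTemplate = {
  name: "dev-team",
  groups: ["frontend", "backend", "devops"],
  roles: ["lead", "member"],
  stateKeys: { sprint: "", "deploy-frozen": false },
};

function validateTemplate(t: MeshTemplate): string[] {
  const errors: string[] = [];
  if (!/^[a-z0-9-]+$/.test(t.name)) errors.push("name must be lowercase kebab-case");
  if (new Set(t.groups).size !== t.groups.length) errors.push("duplicate group");
  if (t.roles.length === 0) errors.push("at least one role required");
  return errors;
}

// Parse a template file's contents, then validate before applying it.
function loadTemplate(json: string): MeshTemplate {
  const t = JSON.parse(json) as MeshTemplate; // shape is trusted here, not checked
  const errors = validateTemplate(t);
  if (errors.length > 0) throw new Error(`invalid template: ${errors.join("; ")}`);
  return t;
}
```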

Effort: Half day.

Implemented: 2026-04-07 · 69e93d4 · 5 templates (dev-team, research, ops-incident, simulation, personal) + claudemesh create command

16. Default private mesh per user — DONE

On claudemesh install, auto-create a personal mesh with the user as sole member. All their Claude Code sessions join by default. Zero-config — instant value without understanding meshes.

Effort: Half day.

Implemented: 2026-04-07 · b0dc538 · Install detects empty meshes, shows join guidance. Local-only mesh deferred (requires broker enrollment).

17. Mesh MCP proxy (dynamic tools without session restart) — DONE

Problem: Claude Code loads MCP servers at startup. You can't inject new tool definitions into a running session.

Solution: Route through the existing claudemesh MCP connection. A generic mesh_tool_call tool proxies to MCP servers registered in the mesh at runtime — no restart needed.

Flow:

A peer registers an MCP server: mesh_mcp_register(name: "github", transport: "stdio", command: "npx @github/mcp")
Broker stores the registration
Any peer calls mesh_tool_call(server: "github", tool: "list_repos", args: {...})
Broker routes to the hosting peer or a shared sidecar process
That host invokes the actual MCP server, returns the result through the mesh
Calling peer gets the response — all through the existing claudemesh WS connection
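A sketch of the broker side of this flow: an in-memory registry mapping server names to hosting peers, pending calls keyed by a call id with a 30 s deadline, and cleanup when a host disconnects (the three behaviors the implementation note lists). Class, method and field names are hypothetical. Deadlines are checked by a periodic `expire(now)` sweep rather than per-call timers, so the sketch runs without real time passing:

```typescript
type CallResult = { ok: true; value: unknown } | { ok: false; error: string };
type Callback = (result: CallResult) => void;

interface PendingCall { host: string; callback: Callback; deadline: number; }

const CALL_TIMEOUT_MS = 30000;

class McpRouter {
  private readonly servers = new Map<string, string>(); // server name -> hosting peer
  private readonly pending = new Map<string, PendingCall>();
  private nextId = 0;

  register(server: string, hostPeer: string): void {
    this.servers.set(server, hostPeer);
  }

  list(): string[] {
    return Array.from(this.servers.keys());
  }

  // Returns the message to forward to the host, or null for unknown servers.
  call(server: string, tool: string, args: unknown, now: number, callback: Callback) {
    const host = this.servers.get(server);
    if (host === undefined) {
      callback({ ok: false, error: `unknown server: ${server}` });
      return null;
    }
    const callId = `call-${this.nextId++}`;
    this.pending.set(callId, { host, callback, deadline: now + CALL_TIMEOUT_MS });
    return { to: host, callId, server, tool, args };
  }

  // The host sent a result back; late or unknown results are dropped.
  complete(callId: string, value: unknown): void {
    const call = this.pending.get(callId);
    if (call === undefined) return;
    this.pending.delete(callId);
    call.callback({ ok: true, value });
  }

  // Run periodically: fail every call whose deadline has passed.
  expire(now: number): void {
    this.pending.forEach((call, id) => {
      if (now >= call.deadline) {
        this.pending.delete(id);
        call.callback({ ok: false, error: "timeout" });
      }
    });
  }

  // A disconnecting host takes its servers and in-flight calls with it.
  onPeerDisconnect(peer: string): void {
    this.servers.forEach((host, name) => {
      if (host === peer) this.servers.delete(name);
    });
    this.pending.forEach((call, id) => {
      if (call.host === peer) {
        this.pending.delete(id);
        call.callback({ ok: false, error: "host disconnected" });
      }
    });
  }
}
```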

Two hosting models:

Peer-hosted: The registering peer runs the MCP server locally. Other peers proxy through them. If that peer disconnects, the MCP goes offline.
Broker-hosted: The broker spawns the MCP server as a sidecar. Always available. Better for shared tools (database, GitHub, Jira).

What AI sees:

> mesh_mcp_list()
Available mesh MCP servers:
- github (hosted by: Alice) — tools: list_repos, create_issue, ...
- jira (hosted by: broker) — tools: search_issues, create_ticket, ...
- postgres-prod (hosted by: broker) — tools: query, execute

> mesh_tool_call(server: "github", tool: "create_issue", args: {repo: "...", title: "..."})
Issue #42 created.

Limitation: Claude Code won't see these as first-class tools in its tool list — AI needs to know to use mesh_tool_call. MCP server instructions document the proxy pattern.

New MCP tools needed: mesh_mcp_register, mesh_mcp_list, mesh_tool_call, mesh_mcp_remove

Effort: 2-3 days.

Implemented: 2026-04-07 · 08e289a · Full round-trip: register → list → call → forward → execute → result. In-memory registry, 30s call timeout, auto-cleanup on disconnect.

18. Sandbox for code execution

Each mesh gets optional compute sandboxes (Docker containers, Firecracker VMs, or E2B-style). Peers request: execute_code(lang: "python", code: "..."). Broker provisions a sandbox, runs the code, returns stdout/stderr. Resources scale on demand as peers need sandboxes.

Build vs integrate:

Build: Docker-in-Docker on the broker host. Simple but security-sensitive.
Integrate: E2B, Modal, or Fly Machines as the sandbox backend. claudemesh MCP tool is a thin client. Scales naturally.

Effort: 2-3 days (E2B integration), 1-2 weeks (self-hosted sandboxes).

19. Mesh dashboard (real-time situational awareness) — DONE

Live web UI at claudemesh.com/dashboard showing:

Peer graph: Who's connected, status, groups, roles — nodes and edges
Message flow: Animated edges showing real-time traffic between peers
State/memory timeline: When values changed and who changed them
Resource panel: Files shared, tasks active, skills available
Peer detail: Click a peer → see summary, context usage, message history

Broker already tracks everything needed. Dashboard subscribes via WS and renders with D3/React.

Effort: 2-3 days (functional), 1 week (polished).

Implemented: 2026-04-07 · 59332dc peer graph (radial SVG, animated edges, group rings) + 7d432b3 state timeline + resource panel. Peer detail view remains.

20. Peer visibility and spatial topology — PARTIAL (visibility + profiles done)

Control which peers can see each other. Instead of a flat mesh where everyone sees everyone, the broker filters list_peers responses and message routing based on visibility rules.

Three visibility models:

Proximity-based (simulation): Each peer has coordinates (x, y) and a visibility radius. Only peers within range appear in list_peers. set_position(x, y) changes who you can see — spatial fog of war. Combined with the simulation clock, this creates emergent behavior: a "customer" peer walks into a "store zone", suddenly sees "sales rep" peers, initiates interaction.
Scope-based (organizational): Visibility follows group membership. Peers in @frontend see each other and @leads, but not @backend internals. Org-chart visibility without exposing every department.
Manual/dynamic: Peers or admins explicitly show/hide. set_visible(false) to go stealth (connected but invisible). Admin can force visibility/invisibility.
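The proximity and manual models reduce to a filter over the peer list. A sketch that applies the stealth flag in every mode and the radius check only in proximity mode. Field names and the `flat` mode name are assumptions, and scope-based rules are left out:

```typescript
interface VisPeer { id: string; visible: boolean; x: number; y: number; radius: number; }

type VisibilityMode = "flat" | "proximity";

// Ids the viewer would see in list_peers.
function visiblePeers(viewer: VisPeer, all: VisPeer[], mode: VisibilityMode): string[] {
  return all
    .filter((p) => p.id !== viewer.id && p.visible) // stealth peers never listed
    .filter((p) => mode === "flat" || Math.hypot(p.x - viewer.x, p.y - viewer.y) <= viewer.radius)
    .map((p) => p.id);
}
```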

Who controls visibility:

Broker rules — mesh-wide policy set at creation or via template (e.g., "proximity" mode for simulations, "scope" for orgs)
Peer self-control — set_visible(false) to go stealth, set_position(x, y) to move in proximity mode
Admin override — mesh admin force-shows or force-hides peers
Dynamic conditions — broker changes visibility based on state keys, clock ticks, or events

Notifications: Peers receive { subtype: "system", event: "peer_visible" } when a new peer enters their visibility and peer_hidden when one leaves. Different from join/leave — the peer is still connected, just not visible to you.
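These events can be derived by diffing a viewer's visible set before and after any change (a move, a `set_visible` call, an admin override). A sketch; the `peerId` field is an assumption:

```typescript
interface VisibilityEvent {
  subtype: "system";
  event: "peer_visible" | "peer_hidden";
  peerId: string;
}

// Entered the set: peer_visible. Left the set: peer_hidden.
function visibilityEvents(before: Set<string>, after: Set<string>): VisibilityEvent[] {
  const events: VisibilityEvent[] = [];
  after.forEach((id) => {
    if (!before.has(id)) events.push({ subtype: "system", event: "peer_visible", peerId: id });
  });
  before.forEach((id) => {
    if (!after.has(id)) events.push({ subtype: "system", event: "peer_hidden", peerId: id });
  });
  return events;
}
```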

Peer public profile (outside image): Each peer has a public-facing profile that other peers see — a curated view separate from internal state. Fields: avatar (emoji or URL), title (short role label), bio (one-liner), capabilities (what I can help with). Set via set_profile({ avatar: "🔧", title: "DevOps Lead", bio: "Infrastructure and deploys" }). This is what appears on the peer graph node and in list_peers. Peers choose how they present themselves to the mesh.

MCP tools: set_visible(visible), set_position(x, y), set_profile(profile), get_visible_peers(), set_visibility_mode(mode) (admin only)

Effort: 2-3 days.

Partial: 2026-04-07 · Visibility toggle (set_visible), public profiles (set_profile), hidden peer filtering in list_peers, peer_visible/peer_hidden system events, direct messages still reach hidden peers. Remaining: proximity-based (x,y coordinates), scope-based (group visibility rules).

21. Semantic peer search

In large meshes (50+ peers), list_peers output becomes too long to scan usefully. A search_peers tool filters and ranks peers by multiple dimensions:

Structured filters: name, group, role, status, peerType, channel, model, cwd
Free-text search: matches against peer summaries, profile bios, capabilities, and shared skills
Capability matching: "find a peer that knows about database migrations" searches across profile capabilities + skills catalog + recent summaries
Ranking: peers with more matching dimensions rank higher; active (idle/working) peers rank above DND/offline

MCP tool: search_peers(query, filters?) — returns a ranked list of matching peers with relevance scores.

Implementation: Broker-side — accepts a search_peers message, runs multi-field matching against the in-memory peer list + skills table. No external search engine needed for <500 peers; for larger meshes, wire into the existing Qdrant vector store (already available via vector_search).
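A sketch of the in-memory ranking for small meshes: one point per query term per matching field, active peers sorted ahead of DND/offline ones, then by score. Plain substring matching stands in for capability matching here; it is not semantic search, and the field names are assumptions:

```typescript
interface SearchablePeer {
  name: string;
  groups: string[];
  status: "idle" | "working" | "dnd" | "offline";
  summary: string;
  capabilities: string[];
  skills: string[];
}

function searchPeers(peers: SearchablePeer[], query: string): Array<{ name: string; score: number }> {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const scored = peers.map((p) => {
    const fields = [p.name, p.groups.join(" "), p.summary, p.capabilities.join(" "), p.skills.join(" ")]
      .map((f) => f.toLowerCase());
    // More matching (term, field) pairs means a higher score.
    let score = 0;
    for (const t of terms) for (const f of fields) if (f.includes(t)) score++;
    return { name: p.name, score, active: p.status === "idle" || p.status === "working" };
  });
  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => Number(b.active) - Number(a.active) || b.score - a.score)
    .map(({ name, score }) => ({ name, score }));
}
```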

Effort: Half day.

22. Mesh telemetry and debugging

A structured logging system where peers report errors, warnings, and debug info to the broker. Goes beyond the audit log (which tracks events) — this tracks operational health.

What peers report:

Errors: tool failures, connection drops, unhandled exceptions
Warnings: high context usage, slow responses, retry patterns
Debug: decision traces, task reasoning, why a particular approach was chosen
Performance: response latency per tool call, message round-trip times

Broker storage: Structured logs indexed by mesh, peer, timestamp, severity. Retained for N days (configurable). Queryable via WS messages.

AI self-analysis: Peers query their own logs to identify patterns: "I've hit this error 3 times in the last hour — what's common?" The mesh becomes self-diagnosing. Leads can query team-wide logs: "Which peers are seeing errors in the deploy flow?"

Reporting: Aggregated metrics per peer, per mesh, per time window. Error rates, common failure modes, response time percentiles. Surfaced in the dashboard or via mesh_report(timeframe: "24h").

MCP tools:

mesh_log(level, message, data?) — report a log entry
mesh_logs(query?, peer?, level?, last?) — query logs
mesh_report(timeframe?) — aggregated health report
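The aggregation behind mesh_report could group entries per peer within the time window and compute an error rate and a latency percentile. A sketch using the nearest-rank percentile; the `LogEntry` shape and function names are assumptions:

```typescript
type Level = "debug" | "info" | "warn" | "error";
interface LogEntry { peer: string; level: Level; at: number; latencyMs?: number; }
interface PeerReport { peer: string; entries: number; errors: number; errorRate: number; p95LatencyMs: number | null; }

// Nearest-rank percentile: smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number | null {
  if (values.length === 0) return null;
  const sorted = values.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

function meshReport(logs: LogEntry[], since: number): PeerReport[] {
  const byPeer = new Map<string, LogEntry[]>();
  for (const entry of logs) {
    if (entry.at < since) continue; // outside the report window
    const list = byPeer.get(entry.peer) ?? [];
    list.push(entry);
    byPeer.set(entry.peer, list);
  }
  const reports: PeerReport[] = [];
  byPeer.forEach((entries, peer) => {
    const errors = entries.filter((e) => e.level === "error").length;
    const latencies = entries
      .map((e) => e.latencyMs)
      .filter((ms): ms is number => ms !== undefined);
    reports.push({
      peer,
      entries: entries.length,
      errors,
      errorRate: errors / entries.length,
      p95LatencyMs: percentile(latencies, 95),
    });
  });
  return reports;
}
```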

Effort: 1-2 days.

23. Peer session persistence ("welcome back")

When a peer disconnects, their state is lost (groups, profile, visibility, stats, summary). On reconnect they start blank. Persist peer state so returning peers resume where they left off.

What persists (keyed by meshId + memberId):

Groups and roles
Profile (avatar, title, bio, capabilities)
Visibility setting
Last summary
Cumulative stats (messages, tool calls across all sessions)
Last seen timestamp

What resets: status (always "idle" on connect), WebSocket/presenceId (ephemeral).

Reconnect flow:

Peer sends hello with same memberId
Broker looks up peer_state table for (meshId, memberId)
If found: restore groups, profile, visibility, stats — hello fields take precedence if explicitly set
Enriched hello_ack includes restored: true and previous summary
System notification: "Welcome back, Alice! Last seen 2h ago. Restored: @frontend:lead, @devops:member"
On disconnect: upsert current state to peer_state
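The restore step in this flow is a merge in which explicitly set hello fields win over persisted state and status always resets. A sketch with hypothetical shapes; storing groups as `@group:role` strings is an assumption chosen to match the welcome message:

```typescript
interface Profile { avatar?: string; title?: string; bio?: string; capabilities?: string[]; }

interface PeerState {   // one peer_state row, keyed by (meshId, memberId)
  groups: string[];
  profile: Profile;
  visible: boolean;
  summary: string | null;
  lastSeen: number;     // epoch ms
}

interface HelloFields { memberId: string; groups?: string[]; profile?: Profile; visible?: boolean; }

interface SessionStart {
  restored: boolean;
  groups: string[];
  profile: Profile;
  visible: boolean;
  previousSummary: string | null;
  status: "idle";       // always reset on connect
}

function startSession(hello: HelloFields, saved: PeerState | undefined): SessionStart {
  return {
    restored: saved !== undefined,
    // Explicitly set hello fields take precedence over restored state.
    groups: hello.groups ?? saved?.groups ?? [],
    profile: { ...saved?.profile, ...hello.profile },
    visible: hello.visible ?? saved?.visible ?? true,
    previousSummary: saved?.summary ?? null,
    status: "idle",
  };
}

function welcomeBack(name: string, saved: PeerState, now: number): string {
  const hours = Math.round((now - saved.lastSeen) / 3600000);
  return `Welcome back, ${name}! Last seen ${hours}h ago. Restored: ${saved.groups.join(", ")}`;
}
```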

Why: AI sessions restart often (context limits, crashes, new tasks). Without persistence, every reconnect requires manual group joins and profile setup. With it, the mesh remembers who you are.

Effort: Half day.

Suggested build order

| # | Feature | Effort | Unlocks | Status |
|---|---------|--------|---------|--------|
| 1 | Session path sharing | 30 min | File referencing across sessions | DONE `810f372` |
| 2 | Peer metadata (type/channel/model) | 1 hour | Connectors, humans, smart routing | DONE `810f372` |
| 3 | System notifications | 2 hours | Reactive mesh, awareness | DONE `453705a` |
| 4 | Cron reminders | 2 hours | Persistent scheduling | DONE `e873807` |
| 5 | Mesh templates | Half day | Better onboarding | DONE `69e93d4` |
| 6 | Default personal mesh | Half day | Zero-config start | DONE `b0dc538` |
| 7 | Inbound webhooks | Half day | External integrations | DONE `b55cf26` |
| 8 | Skills catalog | 1 day | Knowledge marketplace | DONE `c8cb1e3` |
| 9 | Shared project files | 1 day | Cross-session file access | DONE `504111c` |
| 10 | Slack connector | 1-2 days | Reach beyond Claude Code | DONE `5563f90` |
| 11 | Mesh MCP proxy | 2-3 days | Dynamic tools without restart | DONE `08e289a` |
| 12 | Dashboard (real-time) | 2-3 days | Visual situational awareness | DONE `59332dc` + `7d432b3` |
| 13 | Human peers (web chat) | 2-3 days | Humans in the loop | |
| 14 | Simulation clock (heartbeat x1-x100) | 2 days | AI-driven load testing | DONE `05d9b56` |
| 15 | Sandboxes (E2B) | 2-3 days | Shared compute | |
| 16 | Signed audit log | 3-5 days | Trust, compliance | DONE `86a2583` |
| 17 | Bridge / federation | 1-2 weeks | Multi-mesh coordination | |
| 18 | Peer visibility + profiles | 2-3 days | Simulation fog-of-war, org scoping | PARTIAL (visibility + profiles; types.ts/index.ts) |
| 19 | Semantic peer search | Half day | Discovery in large meshes | |
| 20 | Peer stats reporting | Half day | Resource awareness, load balancing | DONE `b3b9972` |
| 21 | SDK (@claudemesh/sdk) | 1 day | Non-Claude-Code clients | DONE `7e102a2` |
| 22 | Telegram connector | 1-2 days | Reach beyond Claude Code | DONE `fe92853` |
| 23 | Mesh telemetry + debugging | 1-2 days | Self-diagnosing mesh | |
| 24 | Peer session persistence | Half day | "Welcome back" on reconnect | |

This document captures a brainstorming session. Items are not commitments. Priorities will shift as we build and learn.
