docs: update vision — 17 of 23 items implemented, add telemetry idea

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-08 00:00:37 +01:00
parent b55cf269a4
commit 025a53a70c
1 changed files with 53 additions and 10 deletions
--- a/docs/vision-20260407.md
+++ b/docs/vision-20260407.md
@@ -285,6 +285,44 @@ Control which peers can see each other. Instead of a flat mesh where everyone se

 **Effort:** 2-3 days.

+### 21. Semantic peer search
+
+In large meshes (50+ peers), raw `list_peers` output is too noisy to scan by hand. A `search_peers` tool would filter and rank peers along several dimensions:
+
+- **Structured filters:** name, group, role, status, peerType, channel, model, cwd
+- **Free-text search:** matches against peer summaries, profile bios, capabilities, and shared skills
+- **Capability matching:** "find a peer that knows about database migrations" searches across profile capabilities + skills catalog + recent summaries
+- **Ranking:** peers with more matching dimensions rank higher; active (idle/working) peers rank above DND/offline
+
+**MCP tool:** `search_peers(query, filters?)` — returns a ranked list of matching peers with relevance scores.
+
+**Implementation:** Broker-side — accepts a `search_peers` message, runs multi-field matching against the in-memory peer list + skills table. No external search engine needed for <500 peers; for larger meshes, wire into the existing Qdrant vector store (already available via `vector_search`).
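The multi-field matching and ranking described above could be sketched roughly as follows. This is a minimal in-memory illustration, not the broker's actual code: the `Peer` shape, the `PeerFilters` type, and the scoring rule (one point per matching text dimension, plus a half-point bonus for idle/working peers so that activity only breaks ties) are all assumptions made for the example.

```typescript
// Hypothetical peer shape; field names are assumptions, not the broker's real types.
type PeerStatus = "idle" | "working" | "dnd" | "offline";

interface Peer {
  name: string;
  group?: string;
  role?: string;
  status: PeerStatus;
  summary?: string;
  bio?: string;
  capabilities?: string[];
  skills?: string[];
}

// Structured filters: exact-match constraints on a few scalar fields.
type PeerFilters = Partial<Pick<Peer, "name" | "group" | "role" | "status">>;

interface SearchHit {
  peer: Peer;
  score: number;
}

function searchPeers(peers: Peer[], query: string, filters: PeerFilters = {}): SearchHit[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const hits: SearchHit[] = [];

  for (const peer of peers) {
    // Every supplied filter must match exactly, otherwise the peer is dropped.
    const passes = (Object.keys(filters) as (keyof PeerFilters)[]).every(
      (key) => filters[key] === undefined || peer[key] === filters[key],
    );
    if (!passes) continue;

    // Each text dimension that contains at least one query term counts once.
    const dimensions = [
      peer.summary ?? "",
      peer.bio ?? "",
      (peer.capabilities ?? []).join(" "),
      (peer.skills ?? []).join(" "),
    ];
    const matched = dimensions.filter((text) => {
      const lower = text.toLowerCase();
      return terms.some((term) => lower.includes(term));
    }).length;

    // With a non-empty query, peers matching no dimension are excluded.
    if (terms.length > 0 && matched === 0) continue;

    // Assumed scoring: activity bonus is smaller than one dimension, so it only breaks ties.
    const active = peer.status === "idle" || peer.status === "working";
    hits.push({ peer, score: matched + (active ? 0.5 : 0) });
  }

  // Highest score first; Array.prototype.sort is stable, so ties keep input order.
  return hits.sort((a, b) => b.score - a.score);
}
```

For example, `searchPeers(peers, "database migrations", { status: "idle" })` would return only idle peers whose summary, bio, capabilities, or skills mention either term. Whether activity should outrank match count, rather than only break ties, is a design choice the text leaves open.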
+
+**Effort:** Half day.
+
+### 22. Mesh telemetry and debugging
+
+A structured logging system in which peers report errors, warnings, and debug info to the broker. It goes beyond the audit log, which tracks events, by tracking operational health.
+
+**What peers report:**
+- Errors: tool failures, connection drops, unhandled exceptions
+- Warnings: high context usage, slow responses, retry patterns
+- Debug: decision traces, task reasoning, why a particular approach was chosen
+- Performance: response latency per tool call, message round-trip times
+
+**Broker storage:** Structured logs indexed by mesh, peer, timestamp, severity. Retained for N days (configurable). Queryable via WS messages.
+
+**AI self-analysis:** Peers query their own logs to identify patterns: "I've hit this error 3 times in the last hour — what's common?" The mesh becomes self-diagnosing. Leads can query team-wide logs: "Which peers are seeing errors in the deploy flow?"
+
+**Reporting:** Aggregated metrics per peer, per mesh, per time window. Error rates, common failure modes, response time percentiles. Surfaced in the dashboard or via `mesh_report(timeframe: "24h")`.
+
+**MCP tools:**
+- `mesh_log(level, message, data?)` — report a log entry
+- `mesh_logs(query?, peer?, level?, last?)` — query logs
+- `mesh_report(timeframe?)` — aggregated health report
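To make the three tools above more concrete, here is a rough in-memory sketch of the broker-side store they might sit on. Everything in it is hypothetical: the `MeshLogStore` class, the `LogEntry` fields, the four level names, and the shape of the report are assumptions for illustration, and a real broker would presumably persist entries rather than keep them in an array.

```typescript
// Hypothetical level names, mirroring the four report categories listed above.
type LogLevel = "error" | "warning" | "debug" | "performance";

interface LogEntry {
  meshId: string;
  peer: string;
  level: LogLevel;
  message: string;
  timestamp: number; // milliseconds since epoch
  data?: Record<string, unknown>;
}

interface LogQuery {
  peer?: string;
  level?: LogLevel;
  sinceMs?: number; // only entries at or after this timestamp
  text?: string; // case-insensitive substring of the message
}

interface PeerReport {
  total: number;
  errors: number;
  errorRate: number;
}

class MeshLogStore {
  private entries: LogEntry[] = [];
  private readonly retentionMs: number;

  constructor(retentionMs: number) {
    this.retentionMs = retentionMs;
  }

  // Backs a mesh_log-style call: append one entry.
  log(entry: LogEntry): void {
    this.entries.push(entry);
  }

  // Drops entries older than the retention window, measured from `now`.
  prune(now: number): void {
    this.entries = this.entries.filter((e) => now - e.timestamp <= this.retentionMs);
  }

  // Backs a mesh_logs-style call: all supplied criteria must match.
  query(q: LogQuery = {}): LogEntry[] {
    const text = q.text?.toLowerCase();
    return this.entries.filter(
      (e) =>
        (q.peer === undefined || e.peer === q.peer) &&
        (q.level === undefined || e.level === q.level) &&
        (q.sinceMs === undefined || e.timestamp >= q.sinceMs) &&
        (text === undefined || e.message.toLowerCase().includes(text)),
    );
  }

  // Backs a mesh_report-style call: per-peer totals and error rate since `sinceMs`.
  report(sinceMs: number): Map<string, PeerReport> {
    const result = new Map<string, PeerReport>();
    for (const e of this.query({ sinceMs })) {
      const r = result.get(e.peer) ?? { total: 0, errors: 0, errorRate: 0 };
      r.total += 1;
      if (e.level === "error") r.errors += 1;
      r.errorRate = r.errors / r.total;
      result.set(e.peer, r);
    }
    return result;
  }
}
```

A peer asking "what errors have I hit in the last hour?" would then map to something like `store.query({ peer: "me", level: "error", sinceMs: Date.now() - 3_600_000 })`. Response time percentiles and failure-mode grouping are left out of the sketch.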
+
+**Effort:** 1-2 days.
+
 ---

 ## Suggested build order
@@ -296,19 +334,24 @@ Control which peers can see each other. Instead of a flat mesh where everyone se
 | 3 | System notifications | 2 hours | Reactive mesh, awareness | **DONE** `453705a` |
 | 4 | Cron reminders | 2 hours | Persistent scheduling | **DONE** `e873807` |
 | 5 | Mesh templates | Half day | Better onboarding | **DONE** `69e93d4` |
-| 6 | Default personal mesh | Half day | Zero-config start | |
-| 7 | Inbound webhooks | Half day | External integrations | |
-| 8 | Skills catalog | 1 day | Knowledge marketplace | |
-| 9 | Shared project files | 1 day | Cross-session file access | |
-| 10 | Slack connector | 1-2 days | Reach beyond Claude Code | |
-| 11 | Mesh MCP proxy | 2-3 days | Dynamic tools without restart | |
-| 12 | Dashboard (real-time) | 2-3 days | Visual situational awareness | **PARTIAL** `59332dc` |
+| 6 | Default personal mesh | Half day | Zero-config start | **DONE** `b0dc538` |
+| 7 | Inbound webhooks | Half day | External integrations | **DONE** `b55cf26` |
+| 8 | Skills catalog | 1 day | Knowledge marketplace | **DONE** `c8cb1e3` |
+| 9 | Shared project files | 1 day | Cross-session file access | **DONE** `504111c` |
+| 10 | Slack connector | 1-2 days | Reach beyond Claude Code | **DONE** `5563f90` |
+| 11 | Mesh MCP proxy | 2-3 days | Dynamic tools without restart | **DONE** `08e289a` |
+| 12 | Dashboard (real-time) | 2-3 days | Visual situational awareness | **DONE** `59332dc` + `7d432b3` |
 | 13 | Human peers (web chat) | 2-3 days | Humans in the loop | |
-| 14 | Simulation clock (heartbeat x1-x100) | 2 days | AI-driven load testing | |
+| 14 | Simulation clock (heartbeat x1-x100) | 2 days | AI-driven load testing | **DONE** `05d9b56` |
 | 15 | Sandboxes (E2B) | 2-3 days | Shared compute | |
-| 16 | Signed audit log | 3-5 days | Trust, compliance | |
+| 16 | Signed audit log | 3-5 days | Trust, compliance | **DONE** `86a2583` |
 | 17 | Bridge / federation | 1-2 weeks | Multi-mesh coordination | |
-| 18 | Peer visibility + spatial topology | 2-3 days | Simulation fog-of-war, org scoping | |
+| 18 | Peer visibility + profiles | 2-3 days | Simulation fog-of-war, org scoping | **DONE** (types.ts/index.ts) |
+| 19 | Semantic peer search | Half day | Discovery in large meshes | |
+| 20 | Peer stats reporting | Half day | Resource awareness, load balancing | **DONE** `b3b9972` |
+| 21 | SDK (@claudemesh/sdk) | 1 day | Non-Claude-Code clients | **DONE** `7e102a2` |
+| 22 | Telegram connector | 1-2 days | Reach beyond Claude Code | **DONE** `fe92853` |
+| 23 | Mesh telemetry + debugging | 1-2 days | Self-diagnosing mesh | |

 ---