Commit Graph

8 Commits

Author SHA1 Message Date
Alejandro Gutiérrez
0c4a9591fa feat(broker): invite signature verification + atomic one-time-use
Completes the v0.1.0 security model. Every /join is now gated by a
signed invite that the broker re-verifies against the mesh owner's
ed25519 pubkey, plus an atomic single-use counter.

schema (migrations/0001_demonic_karnak.sql):
- mesh.mesh.owner_pubkey: ed25519 hex of the invite signer
- mesh.invite.token_bytes: canonical signed bytes (for re-verification)
Both nullable; required for new meshes going forward.

canonical invite format (signed bytes):
  `${v}|${mesh_id}|${mesh_slug}|${broker_url}|${expires_at}|
   ${mesh_root_key}|${role}|${owner_pubkey}`

wire format — invite payload in ic://join/<base64url(JSON)> now has:
  owner_pubkey: "<64 hex>"
  signature:    "<128 hex>"

broker joinMesh() (apps/broker/src/broker.ts):
1. verify ed25519 signature over canonical bytes using payload's
   owner_pubkey → else invite_bad_signature
2. load mesh, ensure mesh.owner_pubkey matches payload's owner_pubkey
   → else invite_owner_mismatch (prevents a malicious admin from
   substituting their own owner key)
3. load invite row by token, verify mesh_id matches → else
   invite_mesh_mismatch
4. expiry check → else invite_expired
5. revoked check → else invite_revoked
6. idempotency: if pubkey is already a member, return existing id
   WITHOUT burning an invite use
7. atomic CAS: UPDATE used_count = used_count + 1 WHERE used_count <
   max_uses → if 0 rows affected, return invite_exhausted
8. insert member with role from payload

cli side:
- apps/cli/src/invite/parse.ts: zod-validated owner_pubkey + signature
  fields; client verifies signature immediately and rejects tampered
  links (fail-fast before even touching the broker)
- buildSignedInvite() helper: owners sign invites client-side
- enrollWithBroker sends {invite_token, invite_payload, peer_pubkey,
  display_name} (was: {mesh_id, peer_pubkey, display_name, role})
- parseInviteLink is now async (libsodium ready + verify)

seed-test-mesh.ts generates an owner keypair, sets mesh.owner_pubkey,
builds + signs an invite, stores the invite row, emits ownerPubkey +
ownerSecretKey + inviteToken + inviteLink in the output JSON.

tests — invite-signature.test.ts (9 new):
- valid signed invite → join succeeds
- tampered payload → invite_bad_signature
- signer not the mesh owner → invite_owner_mismatch
- expired invite → invite_expired
- revoked invite → invite_revoked
- exhausted (maxUses=2, 3rd join) → invite_exhausted
- idempotent re-join doesn't burn a use
- atomic single-use: 5 concurrent joins → exactly 1 success, 4 exhausted
- mesh_id payload vs DB row mismatch → invite_mesh_mismatch

verified live: tampered link blocked client-side with a clear error.
Unmodified link joins cleanly end-to-end (roundtrip.ts + join-roundtrip.ts
both pass). 64/64 tests green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:02:12 +01:00
Alejandro Gutiérrez
9d3dbcecaf feat(broker): verify ed25519 hello signature against member pubkey
WS handshake is now authenticated end-to-end. The broker proves that
every connected peer actually holds the secret key for the pubkey
they claim as identity — not just that they know the pubkey.

wire format change:
  {type:"hello", meshId, memberId, pubkey, sessionId, pid, cwd,
   timestamp, signature}
  where signature = ed25519_sign(canonical, secretKey)
  and canonical = `${meshId}|${memberId}|${pubkey}|${timestamp}`

broker verifies on every hello:
1. timestamp within ±60s of broker clock → else close(1008, timestamp_skew)
2. pubkey is 64 hex chars, signature is 128 hex chars → else malformed
3. crypto_sign_verify_detached(signature, canonical, pubkey) → else bad_signature
4. (existing) mesh.member row exists for (meshId, pubkey) → else unauthorized

All rejection paths close the WS with code 1008 + structured error
message + metrics counter increment (connections_rejected_total by
reason).

new modules:
- apps/broker/src/crypto.ts: canonicalHello, verifyHelloSignature,
  HELLO_SKEW_MS constant
- apps/cli/src/crypto/hello-sig.ts: matching signHello helper

clients updated:
- apps/cli/src/ws/client.ts: signs hello before send
- apps/broker/scripts/{peer-a,peer-b}.ts (smoke-test): sign hellos
  with seed-provided secret keys

new regression tests — tests/hello-signature.test.ts (7):
- valid signature accepted
- bad signature (signed with wrong key) rejected
- timestamp too old rejected (>60s)
- timestamp too far in future rejected (>60s)
- tampered canonical field (different meshId at verify time) rejected
- malformed hex pubkey rejected
- malformed signature length rejected

verified live:
- apps/broker/scripts/smoke-test.sh: full hello+ack+send+push flow
- apps/cli/scripts/roundtrip.ts: signed hello + encrypted message
- 55/55 tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:53:40 +01:00
Alejandro Gutiérrez
39b914bdce feat(broker): add /join endpoint for peer self-registration
Single HTTP POST /join the CLI calls after parsing an invite link +
generating an ed25519 keypair client-side. Broker validates the mesh
exists + is not archived, inserts a mesh.member row (or returns the
existing id for idempotency), returns {ok, memberId, alreadyMember?}.

body: {mesh_id, peer_pubkey, display_name, role}
- peer_pubkey must be 64 hex chars (32 bytes)
- role is "admin" | "member"

v0.1.0 trusts the request — no invite-token validation, no ed25519
signature check. Both land in Step 18 alongside libsodium wrapping.

size cap enforced via MAX_MESSAGE_BYTES (shared with hook endpoint).
structured log line per enrollment with truncated pubkey + whether
it was a new member or re-enrolled existing one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:36:16 +01:00
Alejandro Gutiérrez
20d968f989 feat(cli): websocket client + MCP tool integration
broker-client: full WS client with hello handshake + ack, auto-reconnect
with exponential backoff (1s → 30s capped), in-memory outbound queue
(max 100) during reconnect, 500-entry push buffer for check_messages.

MCP tool integration:
- send_message: "slug:target" prefix or single-mesh fast path
- check_messages: drains push buffers across all clients
- set_status: fans manual override across all connected meshes
- set_summary: stubbed (broker protocol extension needed)
- list_peers: stubbed — lists connected mesh slugs + statuses

manager module holds Map<meshId, BrokerClient>, starts on MCP server
boot for every joined mesh in ~/.claudemesh/config.json.

new CLI command: seed-test-mesh injects a mesh row for dev testing.

also fixes a broker-side hello race: handleHello sent hello_ack before
the caller closure assigned presenceId, so clients sending right after
the ack hit the no_hello check. Fix: return presenceId, caller sets
closure var, THEN sends hello_ack. Queue drain is fire-and-forget now.

round-trip verified: two clients, A→B, push received with correct
senderPubkey + ciphertext. 44/44 broker tests still pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:30:11 +01:00
Alejandro Gutiérrez
5bf815b304 feat(broker): production hardening — caps, limits, metrics, logging
Adds the minimum ops surface area for a production broker without
over-engineering. All new config knobs are env-var driven with sane
defaults.

New modules:
- logger.ts: structured JSON logs (one line, stderr, ready for
  Loki/Datadog ingestion without preprocessing)
- metrics.ts: in-process Prometheus counters + gauges, exposed at
  GET /metrics. Tracks connections, messages, queue depth, TTL
  sweeps, hook requests, DB health.
- rate-limit.ts: token-bucket rate limiter keyed by (pid, cwd).
  Applied to POST /hook/set-status at 30/min default.
- db-health.ts: Postgres ping loop with exponential-backoff retry.
  GET /health returns 503 while DB is down.
- build-info.ts: version + gitSha (from GIT_SHA env or `git rev-parse`
  fallback) + uptime, surfaced on /health.

Behavior changes:
- Connection caps: MAX_CONNECTIONS_PER_MESH (default 100). Exceed →
  close(1008, "capacity") + metric increment.
- Message size: MAX_MESSAGE_BYTES (default 65536). WS applies it via
  `ws.maxPayload`. Hook POST bodies cap out with 413.
- Structured logs everywhere replacing the old `log()` helper.
- Env validation stricter: DATABASE_URL required + regex-checked for
  postgres:// prefix.

New endpoints:
- GET /health → {status, db, version, gitSha, uptime}. 503 if DB down.
- GET /metrics → Prometheus text format.

Verified: 21/21 tests still pass. Hit /health + /metrics live —
gitSha resolves correctly via `git rev-parse --short HEAD` in dev.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:14:31 +01:00
Alejandro Gutiérrez
0a97a0c369 refactor(broker): merge HTTP+WS to single port, populate senderPubkey on push
Single-port refactor:
- Drop the BROKER_PORT+1 HTTP side-port. Use `ws` with noServer:true
  and attach to a single node:http server via the 'upgrade' event.
- Clients connect to ws://host:PORT/ws
- Hook POSTs go to http://host:PORT/hook/set-status
- Health probe at http://host:PORT/health
- One port = one Traefik label, one cert, one deploy route. Matches
  the Coolify/VPS operational constraints.

senderPubkey on push:
- drainForMember now joins mesh.message_queue → mesh.member to return
  the sender's peerPubkey alongside each envelope. No extra round-trip,
  no cache invalidation needed (option A from review).
- index.ts populates WSPushMessage.senderPubkey from the join result
  instead of the empty-string placeholder.
- Receivers can now identify who sent a message directly from the push.

README updated with a routes table for the single-port layout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:35:05 +01:00
Alejandro Gutiérrez
3c0154ae70 feat(broker): port routing + status model from claude-intercom to postgres
Ports the proven claude-intercom broker logic into apps/broker with
SQLite → Drizzle/Postgres translation. Core state engine kept verbatim:
source-priority writes (hook > manual > jsonl), fresh-gating, TTL
sweeper for stuck-working, pending-status race handler, priority
delivery gates (now/next/low), Windows path encoding (5-candidate
fallback incl. Roberto's H:\Claude → H--Claude rule).

New modules:
- broker.ts (492 lines): writeStatus, handleHookSetStatus, sweepers,
  presence lifecycle, message queueing + drainForMember, sourceRank +
  isHookFresh / isSourceFresh logic, findMemberByPubkey (WS auth hook).
- paths.ts (141): cwdToProjectKeyCandidates + findActiveJsonl +
  inferStatusFromJsonl — JSONL fallback inference for peers without
  hooks installed or with stale hook signals.
- types.ts (111): WS protocol envelopes (hello/send/push/ack/error/
  set_status), HookSetStatusRequest/Response, ConnectedPeer view.
- index.ts (323): HTTP on BROKER_PORT+1 for /hook/set-status + /health;
  WebSocket on BROKER_PORT for authenticated peer connections with
  hello/send/set_status handlers; connections registry; heartbeat
  ping/pong every 30s; graceful SIGTERM/SIGINT that marks all active
  presences disconnected.

Mesh scoping: every query/mutation includes meshId. Peer identity is
split between mesh.member (stable) and mesh.presence (ephemeral). WS
hello authenticates by pubkey against mesh.member (signature verify is
stubbed — libsodium wiring lands in client-side package later).

Broker never sees plaintext: nonce + ciphertext are opaque text fields
passed through. Routing happens on targetSpec (pubkey | "#channel" |
"tag:xyz" | "*"), resolved against currently-connected peers.

Dependencies not installed; no tests run. Verified via static review
of imports against @turbostarter/db exports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:32:14 +01:00
Alejandro Gutiérrez
d5d0e6fdbb feat(broker): scaffold apps/broker workspace (bun WS runtime, no port yet)
- @claudemesh/broker package with bun dev/start scripts
- src/index.ts stub: WS server on BROKER_PORT, SIGTERM cleanup
- src/env.ts: Zod-validated env (BROKER_PORT, DATABASE_URL, STATUS_TTL_SECONDS, HOOK_FRESH_WINDOW_SECONDS)
- src/db.ts: re-exports Drizzle client from @turbostarter/db
- src/broker.ts + src/types.ts: placeholders for step 8 port
- README documents run commands, env vars, deploy targets
- tsconfig extends @turbostarter/tsconfig base
- eslint.config.js extends @turbostarter/eslint-config/base

Dependencies declared but not installed yet (ws, drizzle-orm, zod,
libsodium-wrappers + workspace deps). turbo.json unchanged: the global
dev task already has persistent=true + cache=false which is what the
broker needs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:24:17 +01:00