claudemesh

Author	SHA1	Message	Date
Alejandro Gutiérrez	033a2d37e1	feat(broker): canonical session-hello + parent-attestation helpers Adds the crypto primitives the 1.30.0 per-session broker presence flow needs: canonicalSessionAttestation/canonicalSessionHello bytes, and verifySessionAttestation/verifySessionHelloSignature with TTL bounds (≤24h) plus standard ed25519 + skew checks. 10 unit tests cover the hostile cases — expired attestation, over-TTL, wrong-key signing, tampered fields, and the "attacker captured the attestation but doesn't hold the session secret key" scenario. No wire changes yet — types and dispatch land in the next two commits. Spec: .artifacts/specs/2026-05-04-per-session-presence.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 12:57:28 +01:00
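The attestation-then-hello flow described above can be sketched as follows. This is a minimal illustration, not the broker's code. It uses Node's built-in `node:crypto` ed25519 in place of libsodium. The following are all assumptions:

- the field names
- the `|`-joined canonical layout
- millisecond timestamps
- the 60-second skew window
- the result codes

The real byte layout is defined in the spec, which is not reproduced here.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// The 24h cap comes from the commit; the skew window is a hypothetical value.
const MAX_ATTESTATION_TTL_MS = 24 * 60 * 60 * 1000;
const SKEW_MS = 60_000;

// DER SubjectPublicKeyInfo header for a raw 32-byte ed25519 public key.
const SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");

function pubkeyToHex(key: KeyObject): string {
  return key.export({ type: "spki", format: "der" }).subarray(-32).toString("hex");
}

function verifyHex(message: string, signatureHex: string, pubkeyHex: string): boolean {
  try {
    const key = createPublicKey({
      key: Buffer.concat([SPKI_PREFIX, Buffer.from(pubkeyHex, "hex")]),
      format: "der",
      type: "spki",
    });
    return verify(null, Buffer.from(message), key, Buffer.from(signatureHex, "hex"));
  } catch {
    return false; // malformed key or signature bytes
  }
}

function signHex(message: string, privateKey: KeyObject): string {
  return sign(null, Buffer.from(message), privateKey).toString("hex");
}

// Hypothetical shapes: the parent (member) key vouches for a per-session key.
interface SessionAttestation { parentPubkey: string; sessionPubkey: string; expiresAt: number; signature: string }
interface SessionHello { sessionPubkey: string; timestamp: number; signature: string }

const canonicalSessionAttestation = (a: Omit<SessionAttestation, "signature">): string =>
  `${a.parentPubkey}|${a.sessionPubkey}|${a.expiresAt}`;
const canonicalSessionHello = (h: Omit<SessionHello, "signature">): string =>
  `${h.sessionPubkey}|${h.timestamp}`;

type SessionCheck =
  | "ok" | "expired" | "over_ttl" | "bad_attestation" | "session_mismatch" | "timestamp_skew" | "bad_signature";

function verifySessionHello(att: SessionAttestation, hello: SessionHello, now: number): SessionCheck {
  if (att.expiresAt <= now) return "expired";
  if (att.expiresAt - now > MAX_ATTESTATION_TTL_MS) return "over_ttl";
  if (!verifyHex(canonicalSessionAttestation(att), att.signature, att.parentPubkey)) return "bad_attestation";
  if (hello.sessionPubkey !== att.sessionPubkey) return "session_mismatch";
  if (Math.abs(now - hello.timestamp) > SKEW_MS) return "timestamp_skew";
  // Possession check: only the holder of the session secret key can sign the
  // hello, so a captured attestation on its own is not enough.
  if (!verifyHex(canonicalSessionHello(hello), hello.signature, hello.sessionPubkey)) return "bad_signature";
  return "ok";
}

// Demo: a parent key attests a fresh session key for one hour.
const parent = generateKeyPairSync("ed25519");
const session = generateKeyPairSync("ed25519");
const now = Date.now();
const attBody = {
  parentPubkey: pubkeyToHex(parent.publicKey),
  sessionPubkey: pubkeyToHex(session.publicKey),
  expiresAt: now + 3_600_000,
};
const attestation: SessionAttestation = { ...attBody, signature: signHex(canonicalSessionAttestation(attBody), parent.privateKey) };
const helloBody = { sessionPubkey: attBody.sessionPubkey, timestamp: now };
const hello: SessionHello = { ...helloBody, signature: signHex(canonicalSessionHello(helloBody), session.privateKey) };
console.log(verifySessionHello(attestation, hello, now)); // prints: ok
```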
Alejandro Gutiérrez	05729ad8a4	feat(ga): close remaining GA blockers (backcompat, HA prep, tests, docs) Backwards compat shim (task 27) - requireCliAuth() falls back to body.user_id when BROKER_LEGACY_AUTH=1 and no bearer present. Sets Deprecation + Warning headers + bumps a broker_legacy_auth_hits_total metric so operators can watch the legacy traffic drain to 0 before removing the shim. - All handlers parse body BEFORE requireCliAuth so the fallback can read user_id out of it. HA readiness (task 29) - .artifacts/specs/2026-04-15-broker-ha-statelessness-audit.md documents every in-memory symbol and rollout plan (phase 0-4). - packaging/docker-compose.ha-local.yml spins up 2 broker replicas behind Traefik sticky sessions for local smoke testing. - apps/broker/src/audit.ts now wraps writes in a transaction that takes pg_advisory_xact_lock(meshId) and re-reads the tail hash inside the txn. Concurrent broker replicas can no longer fork the audit chain. Deploy gate (task 30) - /health stays permissive (200 even on transient DB blips) so Docker doesn't kill the container on a glitch. - New /health/ready checks DB + optional EXPECTED_MIGRATION pin, returns 503 if either fails. External deploy gate can poll this and refuse to promote a broken deploy. Metrics dashboard (task 32) - packaging/grafana/claudemesh-broker.json: ready-to-import Grafana dashboard covering active conns, queue depth, routed/rejected rates, grant drops, legacy-auth hits, conn rejects. Tests (task 28) - audit-canonical.test.ts (4 tests) pins canonical JSON semantics. - grants-enforcement.test.ts (6 tests) covers the member-then-session-pubkey lookup with default/explicit/blocked branches. Docs (task 34) - docs/env-vars.md catalogues every env var the broker + CLI read. Crypto review prep (task 35) - .artifacts/specs/2026-04-15-crypto-review-packet.md: reviewer brief, threat model, scope, test coverage list, deliverables. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 23:51:28 +01:00
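The audit chain that the advisory lock protects can be pictured with a small in-memory model. This is a hedged sketch, not the contents of `apps/broker/src/audit.ts`. The following are assumptions:

- the canonical-JSON rules (sorted keys, no whitespace)
- SHA-256 as the link hash
- the all-zero genesis value
- the `appendAudit` and `verifyChain` names

It shows why the tail must be re-read under the lock. Each entry's hash commits to the previous one, so two replicas hashing from the same stale tail would produce two competing successors.

```typescript
import { createHash } from "node:crypto";

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Canonical JSON (assumed rules): object keys sorted, no insignificant whitespace,
// so logically equal events always serialize to identical bytes.
function canonicalJson(value: Json): string {
  if (Array.isArray(value)) return `[${value.map((v) => canonicalJson(v)).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value; // narrowed to the object variant
    const keys = Object.keys(obj).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(obj[k])}`).join(",")}}`;
  }
  return JSON.stringify(value);
}

interface AuditEntry { event: Json; prevHash: string; hash: string }

const GENESIS_HASH = "0".repeat(64); // hypothetical starting link

function linkHash(prevHash: string, event: Json): string {
  return createHash("sha256").update(prevHash).update(canonicalJson(event)).digest("hex");
}

// In the broker, "read tail, hash, insert" runs in one transaction holding a
// per-mesh advisory lock; here a single-threaded array append stands in for it.
function appendAudit(chain: AuditEntry[], event: Json): AuditEntry {
  const prevHash = chain.length > 0 ? chain[chain.length - 1].hash : GENESIS_HASH;
  const entry = { event, prevHash, hash: linkHash(prevHash, event) };
  chain.push(entry);
  return entry;
}

// Recompute every link; any edited event or re-parented entry breaks the chain.
function verifyChain(chain: AuditEntry[]): boolean {
  let prev = GENESIS_HASH;
  for (const entry of chain) {
    if (entry.prevHash !== prev || entry.hash !== linkHash(prev, entry.event)) return false;
    prev = entry.hash;
  }
  return true;
}
```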
Alejandro Gutiérrez	c1fa3bcb5c	feat: anthropic-style mesh + invite redesign (wave 1 checkpoint) Ships the user-visible friction fixes and the foundation for the v2 invite protocol. API wiring + CLI client + email UI ship in wave 2. Meshes (shipped) - Drop global UNIQUE on mesh.slug; mesh.id is canonical everywhere - Server derives slug from name; create form has no slug field - Two users can freely name their mesh "platform"; no collision errors - Migration 0017 Invites v1 (shipped; URL shortener, backward compatible) - New invite.code column (base62, 8 chars, nullable unique index) - createMyInvite mints both token + short code; returns shortUrl - GET /api/public/invite-code/:code resolves short code to token - New route /i/[code] server-redirects to /join/[token] - Invite generator UI shows short URL; QR encodes short URL - Advanced fields (role/maxUses/expiresInDays) collapsed under disclosure - Migration 0018 Invites v2 foundation (broker + DB only; API+CLI+Web wiring in wave 2) - Broker: canonicalInviteV2, verifyInviteV2, sealRootKeyToRecipient - Broker: POST /invites/:code/claim endpoint (atomic single-use accounting) - Broker tests: invite-v2.test.ts (signature, expiry, revocation, exhaustion) - DB: mesh.invite gains version/capabilityV2/claimedByPubkey columns - DB: new mesh.pending_invite table for email invites - Migration 0019 - Contract locked in docs/protocol.md §v2 + SPEC.md §14b Consent landing (shipped) - /join/[token] redesigned: explicit role, inviter, mesh stats, consent - New server components: invite-card, role-badge, inviter-line, consent-summary - "Join [mesh] as [Role]" primary action (not just "Join") Error surfacing (shipped) - handle() now parses {error} responses from hono route catch blocks - onError fallback includes timestamp so handle() can match apiErrorSchema - Real error messages reach the UI instead of "Something went wrong" Docs - SPEC.md §14b: v2 invite protocol - docs/protocol.md: v2 claim wire format - docs/roadmap.md: status - .artifacts/specs/2026-04-10-anthropic-vision-meshes-invites.md Deferred to wave 2/3 - API claim route wiring (packages/api) - createMyInvite v2 capability generation - Email invite mutation + Postmark delivery - CLI v2 join flow (x25519 keypair + unseal) - Web invite-generator email field + v2 display Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 13:41:11 +01:00
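The short invite code (base62, 8 characters) can be minted with a uniform random draw per character. This is a minimal sketch using Node's `node:crypto`. The alphabet ordering, the `newInviteCode`, `isInviteCode` and `mintUniqueCode` names, and the retry-on-conflict loop are hypothetical. Eight base62 characters give 62^8 ≈ 2.2 × 10^14 possible codes. The nullable unique index on `invite.code` still catches the rare collision, which the retry absorbs.

```typescript
import { randomInt } from "node:crypto";

const BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
const CODE_RE = /^[0-9A-Za-z]{8}$/;

// randomInt(max) is uniform over [0, max), so there is no modulo bias
// toward the first characters of the alphabet.
function newInviteCode(length = 8): string {
  let code = "";
  for (let i = 0; i < length; i++) code += BASE62[randomInt(BASE62.length)];
  return code;
}

// Cheap shape check before looking a code up (e.g. in a /i/[code] route).
function isInviteCode(s: string): boolean {
  return CODE_RE.test(s);
}

// Hypothetical insert hook: tryInsert returns false on a unique-index conflict.
function mintUniqueCode(tryInsert: (code: string) => boolean, attempts = 5): string {
  for (let i = 0; i < attempts; i++) {
    const code = newInviteCode();
    if (tryInsert(code)) return code;
  }
  throw new Error("could not mint a unique invite code");
}

console.log(newInviteCode()); // a random 8-character code
```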
Alejandro Gutiérrez	0c4a9591fa	feat(broker): invite signature verification + atomic one-time-use Completes the v0.1.0 security model. Every /join is now gated by a signed invite that the broker re-verifies against the mesh owner's ed25519 pubkey, plus an atomic single-use counter. schema (migrations/0001_demonic_karnak.sql): - mesh.mesh.owner_pubkey: ed25519 hex of the invite signer - mesh.invite.token_bytes: canonical signed bytes (for re-verification) Both nullable; required for new meshes going forward. canonical invite format (signed bytes): `${v}|${mesh_id}|${mesh_slug}|${broker_url}|${expires_at}|${mesh_root_key}|${role}|${owner_pubkey}` wire format: the invite payload in ic://join/<base64url(JSON)> now has: owner_pubkey: "<64 hex>" signature: "<128 hex>" broker joinMesh() (apps/broker/src/broker.ts): 1. verify ed25519 signature over canonical bytes using payload's owner_pubkey → else invite_bad_signature 2. load mesh, ensure mesh.owner_pubkey matches payload's owner_pubkey → else invite_owner_mismatch (prevents a malicious admin from substituting their own owner key) 3. load invite row by token, verify mesh_id matches → else invite_mesh_mismatch 4. expiry check → else invite_expired 5. revoked check → else invite_revoked 6. idempotency: if pubkey is already a member, return existing id WITHOUT burning an invite use 7. atomic CAS: UPDATE used_count = used_count + 1 WHERE used_count < max_uses → if 0 rows affected, return invite_exhausted 8. insert member with role from payload cli side: - apps/cli/src/invite/parse.ts: zod-validated owner_pubkey + signature fields; client verifies signature immediately and rejects tampered links (fail-fast before even touching the broker) - buildSignedInvite() helper: owners sign invites client-side - enrollWithBroker sends {invite_token, invite_payload, peer_pubkey, display_name} (was: {mesh_id, peer_pubkey, display_name, role}) - parseInviteLink is now async (libsodium ready + verify) seed-test-mesh.ts generates an owner keypair, sets mesh.owner_pubkey, builds + signs an invite, stores the invite row, emits ownerPubkey + ownerSecretKey + inviteToken + inviteLink in the output JSON. tests (invite-signature.test.ts, 9 new): - valid signed invite → join succeeds - tampered payload → invite_bad_signature - signer not the mesh owner → invite_owner_mismatch - expired invite → invite_expired - revoked invite → invite_revoked - exhausted (maxUses=2, 3rd join) → invite_exhausted - idempotent re-join doesn't burn a use - atomic single-use: 5 concurrent joins → exactly 1 success, 4 exhausted - mesh_id payload vs DB row mismatch → invite_mesh_mismatch verified live: tampered link blocked client-side with a clear error. Unmodified link joins cleanly end-to-end (roundtrip.ts + join-roundtrip.ts both pass). 64/64 tests green. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 23:02:12 +01:00
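The eight join checks above can be modelled in memory to show their ordering. In particular, an idempotent re-join returns before the use counter is touched. This sketch rests on several assumptions:

- It uses Node's `node:crypto` ed25519 rather than libsodium.
- It treats `expires_at` as milliseconds since epoch and checks it from the signed payload.
- It replaces the database with hypothetical `Mesh` and `InviteRow` objects. The SQL compare-and-swap therefore becomes a plain increment, which is safe only because this code is single-threaded.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Payload fields from the commit; expires_at is assumed to be ms since epoch.
interface InvitePayload {
  v: number; mesh_id: string; mesh_slug: string; broker_url: string; expires_at: number;
  mesh_root_key: string; role: string; owner_pubkey: string; signature: string;
}

// Canonical signed bytes, in the field order given in the commit message.
function canonicalInvite(p: Omit<InvitePayload, "signature">): string {
  return [p.v, p.mesh_id, p.mesh_slug, p.broker_url, p.expires_at, p.mesh_root_key, p.role, p.owner_pubkey].join("|");
}

const SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex"); // DER header for raw ed25519 keys
const toHex = (k: KeyObject): string => k.export({ type: "spki", format: "der" }).subarray(-32).toString("hex");

function verifyHex(message: string, sigHex: string, pubHex: string): boolean {
  try {
    const key = createPublicKey({ key: Buffer.concat([SPKI_PREFIX, Buffer.from(pubHex, "hex")]), format: "der", type: "spki" });
    return verify(null, Buffer.from(message), key, Buffer.from(sigHex, "hex"));
  } catch {
    return false;
  }
}

// Hypothetical in-memory stand-ins for the mesh, invite and member rows.
interface Mesh { id: string; ownerPubkey: string; members: Map<string, { id: string; role: string }> }
interface InviteRow { meshId: string; revoked: boolean; usedCount: number; maxUses: number }

type JoinError =
  | "invite_bad_signature" | "invite_owner_mismatch" | "invite_mesh_mismatch"
  | "invite_expired" | "invite_revoked" | "invite_exhausted";
type JoinResult = { ok: true; memberId: string } | { ok: false; error: JoinError };

function joinMesh(mesh: Mesh, invite: InviteRow, p: InvitePayload, peerPubkey: string, now: number): JoinResult {
  if (!verifyHex(canonicalInvite(p), p.signature, p.owner_pubkey)) return { ok: false, error: "invite_bad_signature" };
  if (mesh.ownerPubkey !== p.owner_pubkey) return { ok: false, error: "invite_owner_mismatch" };
  if (invite.meshId !== p.mesh_id) return { ok: false, error: "invite_mesh_mismatch" };
  if (now >= p.expires_at) return { ok: false, error: "invite_expired" };
  if (invite.revoked) return { ok: false, error: "invite_revoked" };
  const existing = mesh.members.get(peerPubkey);
  if (existing) return { ok: true, memberId: existing.id }; // idempotent: no use burned
  // Stand-in for the atomic CAS: UPDATE ... SET used_count = used_count + 1
  // WHERE used_count < max_uses; zero rows affected means exhausted.
  if (invite.usedCount >= invite.maxUses) return { ok: false, error: "invite_exhausted" };
  invite.usedCount += 1;
  const memberId = `member-${mesh.members.size + 1}`;
  mesh.members.set(peerPubkey, { id: memberId, role: p.role });
  return { ok: true, memberId };
}

// Demo fixtures (all values hypothetical).
const owner = generateKeyPairSync("ed25519");
const body = {
  v: 1, mesh_id: "mesh-1", mesh_slug: "platform", broker_url: "wss://broker.example.com",
  expires_at: Date.now() + 86_400_000, mesh_root_key: "root-key-placeholder", role: "member",
  owner_pubkey: toHex(owner.publicKey),
};
const payload: InvitePayload = { ...body, signature: sign(null, Buffer.from(canonicalInvite(body)), owner.privateKey).toString("hex") };
const mesh: Mesh = { id: "mesh-1", ownerPubkey: body.owner_pubkey, members: new Map() };
const invite: InviteRow = { meshId: "mesh-1", revoked: false, usedCount: 0, maxUses: 2 };
```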
Alejandro Gutiérrez	9d3dbcecaf	feat(broker): verify ed25519 hello signature against member pubkey WS handshake is now authenticated end-to-end. The broker proves that every connected peer actually holds the secret key for the pubkey they claim as identity, not just that they know the pubkey. wire format change: {type:"hello", meshId, memberId, pubkey, sessionId, pid, cwd, timestamp, signature} where signature = ed25519_sign(canonical, secretKey) and canonical = `${meshId}|${memberId}|${pubkey}|${timestamp}` broker verifies on every hello: 1. timestamp within ±60s of broker clock → else close(1008, timestamp_skew) 2. pubkey is 64 hex chars, signature is 128 hex chars → else malformed 3. crypto_sign_verify_detached(signature, canonical, pubkey) → else bad_signature 4. (existing) mesh.member row exists for (meshId, pubkey) → else unauthorized All rejection paths close the WS with code 1008 + structured error message + metrics counter increment (connections_rejected_total by reason). new modules: - apps/broker/src/crypto.ts: canonicalHello, verifyHelloSignature, HELLO_SKEW_MS constant - apps/cli/src/crypto/hello-sig.ts: matching signHello helper clients updated: - apps/cli/src/ws/client.ts: signs hello before send - apps/broker/scripts/{peer-a,peer-b}.ts (smoke-test): sign hellos with seed-provided secret keys new regression tests in tests/hello-signature.test.ts (7): - valid signature accepted - bad signature (signed with wrong key) rejected - timestamp too old rejected (>60s) - timestamp too far in future rejected (>60s) - tampered canonical field (different meshId at verify time) rejected - malformed hex pubkey rejected - malformed signature length rejected verified live: - apps/broker/scripts/smoke-test.sh: full hello+ack+send+push flow - apps/cli/scripts/roundtrip.ts: signed hello + encrypted message - 55/55 tests pass Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:53:40 +01:00
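The hello checks can be sketched with Node's built-in ed25519 support standing in for libsodium's `crypto_sign_verify_detached`. The function names mirror those listed in the commit. Their signatures, the `HelloCheck` result shape, and the client-side `signHello` shown here are all assumptions. The fourth check (member row lookup) needs the database and is left out.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

const HELLO_SKEW_MS = 60_000; // ±60s, per the commit
const HEX64 = /^[0-9a-f]{64}$/i;
const HEX128 = /^[0-9a-f]{128}$/i;
const SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex"); // DER header for raw ed25519 keys

interface Hello { meshId: string; memberId: string; pubkey: string; timestamp: number; signature: string }
type HelloCheck = { ok: true } | { ok: false; reason: "timestamp_skew" | "malformed" | "bad_signature" };

function canonicalHello(h: Omit<Hello, "signature">): string {
  return `${h.meshId}|${h.memberId}|${h.pubkey}|${h.timestamp}`;
}

// Checks 1-3 from the commit, in the same order.
function verifyHelloSignature(h: Hello, now: number): HelloCheck {
  if (Math.abs(now - h.timestamp) > HELLO_SKEW_MS) return { ok: false, reason: "timestamp_skew" };
  if (!HEX64.test(h.pubkey) || !HEX128.test(h.signature)) return { ok: false, reason: "malformed" };
  try {
    const key = createPublicKey({ key: Buffer.concat([SPKI_PREFIX, Buffer.from(h.pubkey, "hex")]), format: "der", type: "spki" });
    const valid = verify(null, Buffer.from(canonicalHello(h)), key, Buffer.from(h.signature, "hex"));
    if (!valid) return { ok: false, reason: "bad_signature" };
    return { ok: true };
  } catch {
    return { ok: false, reason: "bad_signature" };
  }
}

// Hypothetical client-side counterpart: sign the canonical string with the member's key.
function signHello(h: Omit<Hello, "signature">, secretKey: KeyObject): Hello {
  return { ...h, signature: sign(null, Buffer.from(canonicalHello(h)), secretKey).toString("hex") };
}

const member = generateKeyPairSync("ed25519");
const pubkey = member.publicKey.export({ type: "spki", format: "der" }).subarray(-32).toString("hex");
const hello = signHello({ meshId: "mesh-1", memberId: "m-1", pubkey, timestamp: Date.now() }, member.privateKey);
```

Checking the clock and hex shapes before the signature keeps the cheap rejections ahead of the public-key operation.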
Alejandro Gutiérrez	81a8d0714b	feat(crypto): client-side direct-message encryption with crypto_box Direct messages between peers are now end-to-end encrypted. The broker only ever sees {nonce, ciphertext}; plaintext lives on the two endpoints. apps/cli/src/crypto/envelope.ts: - encryptDirect(message, recipientPubkeyHex, senderSecretKeyHex) → {nonce, ciphertext} via crypto_box_easy, 24-byte fresh nonce - decryptDirect(envelope, senderPubkeyHex, recipientSecretKeyHex) → plaintext or null (null on MAC failure / malformed input) - ed25519 keys (from Step 17) are converted to X25519 on the fly via crypto_sign_ed25519_{pk,sk}_to_curve25519, so one signing keypair covers both signing + encryption roles. BrokerClient.send(): - if targetSpec is a 64-hex pubkey → encrypt via crypto_box - else (broadcast "*" or channel "#foo") → base64-wrapped plaintext (shared-key encryption for channels lands in a later step) InboundPush now carries: - plaintext: string | null (decrypted body, null if decryption failed OR it's a non-direct message) - kind: "direct" | "broadcast" | "channel" | "unknown" MCP check_messages formatter reads plaintext directly. side-fixes pulled in during 18a: - apps/broker/scripts/seed-test-mesh.ts now generates real ed25519 keypairs (the previous "aaaa…" / "bbbb…" fillers weren't valid curve points, so crypto_sign_ed25519_pk_to_curve25519 rejected them). Seed output now includes secretKey for each peer. - apps/broker/src/broker.ts drainForMember wraps the atomic claim in a CTE + outer ORDER BY so FIFO ordering is SQL-sourced, not JS-sorted (Postgres microsecond timestamps collapse to the same Date.getTime() milliseconds otherwise). - vitest.config.ts fileParallelism: false (test files share DB state via cleanupAllTestMeshes afterAll, so running them in parallel caused one file's cleanup to race another's inserts). - integration/health.test.ts "returns 200" now uses waitFullyHealthy (a 200-only waiter) instead of waitHealthyOrAny; this prevents a race with the startup DB ping. verified live: - apps/cli/scripts/roundtrip.ts (direct A→B): ciphertext in DB is opaque bytes (not base64-plaintext), decrypted correctly on arrival - apps/cli/scripts/join-roundtrip.ts (full join → encrypted send): PASSED - 48/48 broker tests green Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:48:33 +01:00
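The routing rule in `BrokerClient.send()` can be sketched as a small classifier. A pubkey target means encrypt, while `*` or `#channel` means wrap. This is an illustration under assumptions: the regexes, the `classifyTarget` and `wrapPlaintext` names, and treating hex case-insensitively are not taken from the CLI. The crypto_box step itself is omitted because it needs libsodium.

```typescript
type PushKind = "direct" | "broadcast" | "channel" | "unknown";

const PUBKEY_HEX = /^[0-9a-f]{64}$/i;

// Hypothetical mirror of BrokerClient.send()'s routing decision.
function classifyTarget(targetSpec: string): PushKind {
  if (PUBKEY_HEX.test(targetSpec)) return "direct"; // would go through crypto_box
  if (targetSpec === "*") return "broadcast";
  if (/^#\S+$/.test(targetSpec)) return "channel";
  return "unknown";
}

// Non-direct bodies are only base64-wrapped for now. This is an encoding, not
// encryption, so the broker can read them until shared-key channel encryption lands.
function wrapPlaintext(message: string): string {
  return Buffer.from(message, "utf8").toString("base64");
}

function unwrapPlaintext(body: string): string {
  return Buffer.from(body, "base64").toString("utf8");
}
```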
Alejandro Gutiérrez	cd389c6bdd	fix(broker): atomic message claim to prevent duplicate delivery drainForMember previously ran SELECT undelivered rows, THEN UPDATE delivered_at. Two concurrent callers (e.g. WS fan-out on send + handleHello's own drain for the target) could both SELECT the same row before either UPDATEd, pushing the same envelope twice. now: single atomic UPDATE ... FROM member ... WHERE id IN ( SELECT id ... FOR UPDATE SKIP LOCKED ) RETURNING mq.*, m.peer_pubkey AS sender_pubkey. FOR UPDATE SKIP LOCKED is the key primitive — concurrent callers each claim DISJOINT sets, so a message can never be drained twice. Union of all concurrent drains still covers every eligible row. re-sorts RETURNING rows by created_at client-side (Postgres makes no FIFO guarantee on the RETURNING clause's output order), and normalizes created_at to Date since raw-sql results can come back as ISO strings. regression: tests/dup-delivery.test.ts (4 tests) - two concurrent drains produce disjoint result sets - six concurrent drains partition cleanly (20 messages, each drained once) - subsequent drain after success returns empty - FIFO ordering preserved within a single drain 48/48 tests pass. Live round-trip no longer logs the double-push. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:39:48 +01:00
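The claim pattern can be sketched as a query plus the client-side post-processing the commit describes. Only the query's shape comes from the commit: `UPDATE ... FROM member ... WHERE id IN (SELECT ... FOR UPDATE SKIP LOCKED) RETURNING mq.*, m.peer_pubkey AS sender_pubkey`. The table and column names (`message_queue`, `sender_member_id`, `target_member_id`) and the simplified eligibility predicate are hypothetical.

```typescript
// Hypothetical schema names; the real eligibility predicate is not shown in the commit.
const CLAIM_SQL = `
  UPDATE message_queue AS mq
     SET delivered_at = now()
    FROM member AS m
   WHERE m.id = mq.sender_member_id
     AND mq.id IN (
       SELECT id FROM message_queue
        WHERE mesh_id = $1 AND target_member_id = $2 AND delivered_at IS NULL
        FOR UPDATE SKIP LOCKED
     )
  RETURNING mq.*, m.peer_pubkey AS sender_pubkey`;

interface RawClaimedRow { id: string; created_at: Date | string; sender_pubkey: string }
interface ClaimedRow { id: string; created_at: Date; sender_pubkey: string }

// RETURNING output has no guaranteed order, and raw-SQL results may carry
// created_at as an ISO string, so normalize to Date and sort oldest-first.
function orderClaimed(rows: RawClaimedRow[]): ClaimedRow[] {
  return rows
    .map((r) => ({ ...r, created_at: r.created_at instanceof Date ? r.created_at : new Date(r.created_at) }))
    .sort((a, b) => a.created_at.getTime() - b.created_at.getTime());
}
```

Because `Date` keeps only millisecond precision, rows inserted within the same millisecond tie in this sort. Commit 81a8d0714b later moves the ordering into SQL for that reason.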
Alejandro Gutiérrez	3458860c1f	test(broker): coverage for hardening modules — caps, limits, metrics, health, logs Adds 23 tests across 4 files, taking total broker coverage from 21 → 44 passing in ~2.5s. Unit tests (no I/O): - tests/rate-limit.test.ts (6): TokenBucket capacity, refill rate, no-overflow cap, independent buckets per key, sweep GC. - tests/metrics.test.ts (5): all 10 series present in /metrics, counter increment semantics, labelled series produce distinct lines, gauge set overwrites, Prometheus format well-formed. - tests/logging.test.ts (5): JSON per line, required fields (ts, level, component, msg), context merging, level preservation, no plain-text escape hatches. Integration tests (spawn real broker subprocesses on random ports): - tests/integration/health.test.ts (7): * GET /health 200 + {status, db, version, gitSha, uptime} (healthy DB) * GET /health 503 + {status:degraded, db:down} (unreachable DB) * GET /metrics 200 text/plain with all expected series * GET /nope → 404 * POST /hook/set-status oversized body → 413 * POST /hook/set-status 6th req/min → 429 * Rate limit isolation by (pid, cwd) key Integration tests use node:child_process (vitest runs under Node, not Bun — Bun.spawn isn't available). Each suite spawns its own broker subprocess with a random port + tailored env vars. Not yet covered (flagged for follow-up): - WebSocket connection caps (needs seeded mesh + WS client setup) - WebSocket message-size rejection (ws.maxPayload behavior) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:19:14 +01:00
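The rate-limit tests check five properties: capacity, refill rate, no overflow past capacity, independent buckets per key, and sweep GC. A token bucket with those properties might look like the sketch below. The class shape, the per-millisecond refill unit, and the injectable clock are assumptions made so the behavior is testable. The 5-per-minute figure is inferred from the integration test in which the 6th request per minute gets a 429.

```typescript
// Hypothetical API; the broker's TokenBucket may use different names and units.
class TokenBucket {
  private readonly buckets = new Map<string, { tokens: number; updatedAt: number }>();
  private readonly capacity: number;
  private readonly refillPerMs: number;
  private readonly now: () => number;

  constructor(capacity: number, refillPerMs: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
    this.now = now;
  }

  // Refill for elapsed time, capped at capacity (no overflow).
  private level(b: { tokens: number; updatedAt: number }, t: number): number {
    return Math.min(this.capacity, b.tokens + (t - b.updatedAt) * this.refillPerMs);
  }

  // Consume one token for `key`; false means the caller should answer 429.
  take(key: string): boolean {
    const t = this.now();
    const b = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: t };
    b.tokens = this.level(b, t);
    b.updatedAt = t;
    const allowed = b.tokens >= 1;
    if (allowed) b.tokens -= 1;
    this.buckets.set(key, b);
    return allowed;
  }

  // GC: a bucket that has refilled to capacity is indistinguishable from a new one.
  sweep(): void {
    const t = this.now();
    for (const [key, b] of this.buckets) {
      if (this.level(b, t) >= this.capacity) this.buckets.delete(key);
    }
  }

  get size(): number {
    return this.buckets.size;
  }
}

// 5 requests per minute, keyed by something like "pid|cwd".
const hookLimiter = new TokenBucket(5, 5 / 60_000);
```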
Alejandro Gutiérrez	e25115f1b0	test(broker): port test suite from claude-intercom to drizzle/postgres 21 integration tests (14 broker behavior + 7 path encoding), all passing in ~1s against a real Postgres (claudemesh_test database on the dev container). Test infrastructure: - apps/broker/vitest.config.ts extends @turbostarter/vitest-config/base - tests/helpers.ts: setupTestMesh() creates a fresh mesh + 2 members per test with a unique slug, returns cleanup function that cascades the delete. cleanupAllTestMeshes() as an afterAll safety net. - Mesh isolation in broker logic means tests don't interfere even when they share a database — no per-test TRUNCATE needed. Ported behavior tests (broker.test.ts, 14 tests): - hook flips status + queued "next" messages unblock - "now"-priority bypasses the working gate - DND is sacred (hooks cannot unset it) - hook source stays fresh through jsonl refresh - source decays to jsonl when hook signal goes stale - isHookFresh freshness window + source-type rules - TTL sweep flips stuck "working" → idle - TTL sweep leaves DND alone - first-turn race: hook fired pre-connect stashed in pending_status - applyPendingHookStatus picks newest matching entry - expired pending entries are ignored on connect - broadcast targetSpec (*) reaches all members - pubkey mismatch → message not drained - mesh isolation: peer in mesh X doesn't drain from mesh Y Ported encoding tests (encoding.test.ts, 7 tests): - macOS, Linux, Windows path encoding first-candidate correctness - Roberto's H:\Claude → H--Claude regression test (2026-04-04) - Candidate dedup, drive-stripped fallback, leading-dash fallback How to run: from apps/broker, DATABASE_URL="postgresql://.../claudemesh_test" pnpm test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-04 22:09:06 +01:00
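The path-encoding rule can be sketched under one stated assumption: every character outside `[A-Za-z0-9]` becomes `-`. That rule reproduces Roberto's `H:\Claude → H--Claude` regression case quoted above. The two fallbacks are hypothetical readings of the "drive-stripped" and "leading-dash" fallbacks. The first drops a Windows drive prefix, and the second strips leading dashes. `encodingCandidates` is an illustrative name.

```typescript
// Assumed rule: each non-alphanumeric character maps to "-".
function encodePath(p: string): string {
  return p.replace(/[^A-Za-z0-9]/g, "-");
}

// Ordered, de-duplicated candidates; the first is the primary encoding.
function encodingCandidates(cwd: string): string[] {
  const primary = encodePath(cwd);
  const driveStripped = encodePath(cwd.replace(/^[A-Za-z]:/, "")); // hypothetical fallback
  const noLeadingDash = primary.replace(/^-+/, "");                // hypothetical fallback
  return Array.from(new Set([primary, driveStripped, noLeadingDash]));
}

console.log(encodingCandidates("H:\\Claude")); // [ 'H--Claude', '-Claude' ]
```

A `Set` keeps insertion order, so de-duplication never displaces the primary candidate from the front.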

9 Commits