30 Commits

Author SHA1 Message Date
Alejandro Gutiérrez
02b1e5695f feat: v0.2.0 — Groups (@group routing, roles, wizard)
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Release / Publish multi-arch images (push) Has been cancelled
Phase A of the claudemesh spec. Peers can now join named groups
with roles, and messages route to @group targets.

Broker:
- @group routing in fan-out (matches peer group membership)
- @all alias for broadcast
- join_group/leave_group WS messages + DB persistence
- list_peers returns group metadata
- drainForMember matches @group targetSpecs in SQL

CLI:
- join_group/leave_group MCP tools
- send_message supports @group targets
- list_peers shows group membership
- PeerInfo includes groups array
- Peer name cache for push notifications

Launch:
- --role flag (optional peer role)
- --groups flag (comma-separated, e.g. "frontend:lead,reviewers")
- Interactive wizard for role + groups when flags omitted
- Groups written to session config for broker hello

Spec: SPEC.md added with full v0.2 vision (groups, state, memory)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 13:06:16 +01:00
Alejandro Gutiérrez
663f800b4b fix: v0.1.16 — fix message delivery between same-member sessions
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Release / Publish multi-arch images (push) Has been cancelled
excludeSenderMemberId blocked delivery to ALL peers sharing the
same member_id (all sessions from one join). Replaced with
excludeSenderSessionPubkey which only excludes the sender's own
session — peers with different session pubkeys receive correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 12:44:29 +01:00
Alejandro Gutiérrez
2557235c68 fix: v0.1.15 — production hardening (7 fixes)
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Release / Publish multi-arch images (push) Has been cancelled
Broker:
- Sweep stale presences (3 missed pings = disconnect, 30s interval)
- Exclude sender from broadcast fan-out + queue drain

CLI:
- Decrypt fallback: try base64 plaintext if crypto_box fails
- Stable session keypair across WS reconnects
- Peer name cache (30s TTL) instead of list_peers per push
- Clean up orphaned tmpdirs from crashed sessions (>1 hour old)
- Read displayName from config file (not just env var)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 12:22:04 +01:00
Alejandro Gutiérrez
92bb276a3e fix: v0.1.11 — fix crypto_box decryption with session pubkeys
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Release / Publish multi-arch images (push) Has been cancelled
Store sender's sessionPubkey on message_queue at send time.
drainForMember returns COALESCE(sender_session_pubkey, peer_pubkey)
so the recipient gets the correct sender key for decryption.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:23:42 +01:00
Alejandro Gutiérrez
af8f8ed1f9 feat: v0.1.10 — per-session ephemeral keypairs
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Release / Publish multi-arch images (push) Has been cancelled
Each WS connection generates its own ed25519 keypair (sessionPubkey)
sent in the hello handshake. The broker stores it on the presence
row and uses it for message routing + list_peers. This gives every
`claudemesh launch` a unique crypto identity without burning invite
uses — member auth stays permanent, session identity is ephemeral.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 11:14:33 +01:00
Alejandro Gutiérrez
2a2aac3622 feat(cli): v0.1.7 — --name, --mesh, --join flags for launch
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Release / Publish multi-arch images (push) Has been cancelled
- `claudemesh launch --name Mou` sets per-session display name
- `claudemesh launch --mesh car-dealers` selects mesh (interactive picker if >1)
- `claudemesh launch --join <token-or-url>` joins a mesh inline before launching
- Broker stores per-presence displayName override (prefers over member default)
- Session config isolated via tmpdir (auto-cleanup on exit)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 10:45:29 +01:00
Alejandro Gutiérrez
d8bafe3144 fix(web): fully remove payload runtime from production build
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Remove ALL Payload imports, withPayload wrapper, and (payload)
routes. Blog index + changelog are now static data arrays.
Blog post at /blog/peer-messaging-claude-code is static TSX.

Payload CMS stays as a dev dependency for future local admin
but has zero presence in the production build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 09:25:02 +01:00
Alejandro Gutiérrez
f4bcad91b0 refactor(deploy): trim docker images via pnpm deploy --legacy
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
Use pnpm deploy to flatten each package's runtime subset into /deploy,
then copy ONLY that into the runtime stage. Catalog + workspace:*
specifiers previously forced full-workspace resolution into every
image's node_modules — unnecessary for either runtime.

Results (arm64, same smoke tests pass):
- broker:   3.26GB → 341MB  (-90%, drops all devDeps incl. drizzle-kit)
- migrate:  3.27GB → 653MB  (-80%, keeps drizzle-kit which IS runtime)

Broker /health confirms GIT_SHA build-arg still propagates (gitSha:
"30bc24f" in smoke test). Migrate still reads drizzle.config.ts and
attempts the connection correctly.

--legacy flag needed because pnpm 10 defaults to inject-workspace-
packages mode which the monorepo doesn't opt into; legacy is safe here.
--ignore-scripts on deploy skips the root postinstall (sherif lint:ws)
which has nothing to do with runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-05 15:37:21 +01:00
Alejandro Gutiérrez
2412267fb4 fix(web): disable anonymous login by default (guest button removal)
Some checks failed
CI / Lint (push) Has been cancelled
CI / Typecheck (push) Has been cancelled
CI / Broker tests (Postgres) (push) Has been cancelled
CI / Docker build (linux/amd64) (push) Has been cancelled
claudemesh requires an account — mesh membership is tied to user.id.
e8ad7a5 flipped the config default but the env var override at
env.config.ts:43 still defaulted to true, keeping the button visible.

Fixed at env var level + example files. Needs Coolify rebuild since
NEXT_PUBLIC_* is build-time in Next standalone.
2026-04-05 15:26:13 +01:00
Alejandro Gutiérrez
759a22e7c0 fix(api): sign invites with stored owner keypair instead of unsigned placeholder
Some checks failed
CI / Tests / 🧪 Test (push) Has been cancelled
Production /join on the broker (from feat 18c) rejects every invite
with invite_bad_signature because the web UI was emitting unsigned
payloads. This fixes that.

createMyMesh now generates ed25519 owner keypair + 32-byte root key
and stores all three on the mesh row. createMyInvite loads them,
signs the canonical invite bytes via crypto_sign_detached, and
emits a fully-signed payload matching what the broker expects:

  payload = {v, mesh_id, mesh_slug, broker_url, expires_at,
             mesh_root_key, role, owner_pubkey, signature}
  canonical = same fields minus signature, "|"-delimited
  signature = ed25519_sign(canonical, mesh.owner_secret_key)
  token = base64url(JSON(payload))   ← stored as invite.token

The base64url(JSON) token IS the DB lookup key — broker's /join
does `WHERE invite.token = <that string>`, then re-verifies the
signature it extracts from the decoded payload.

Also drops the sha256 derivePlaceholderRootKey() helper and the
encodeInviteLink helper, both replaced by inline logic.

backfill extended: the one-off script now populates owner_pubkey
AND owner_secret_key AND root_key together in a single pass. Query
condition is `WHERE any of the three IS NULL`, so running it
post-migration catches every row regardless of partial prior fills.

requires packages/api to depend on libsodium-wrappers + types
(added). 64/64 broker tests still green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:12:04 +01:00
Alejandro Gutiérrez
fa23525c46 feat(broker): one-off owner_pubkey backfill script
Populates mesh.mesh.owner_pubkey for pre-18c rows by generating a
fresh ed25519 keypair per mesh + emitting the secret key to stdout
for out-of-band hand-off.

Idempotent: only patches rows WHERE owner_pubkey IS NULL. Machine-
readable output (tab-separated: mesh_id, slug, pubkey, secret_key)
so operators can pipe into a secure store.

Usage:
  DATABASE_URL=... bun apps/broker/scripts/backfill-owner-pubkey.ts > owners.tsv
  # then securely distribute secrets to mesh owners

Verified locally: nulled smoke-test mesh's owner_pubkey → ran backfill
→ fresh keypair written, secret emitted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:08:07 +01:00
Alejandro Gutiérrez
0c4a9591fa feat(broker): invite signature verification + atomic one-time-use
Completes the v0.1.0 security model. Every /join is now gated by a
signed invite that the broker re-verifies against the mesh owner's
ed25519 pubkey, plus an atomic single-use counter.

schema (migrations/0001_demonic_karnak.sql):
- mesh.mesh.owner_pubkey: ed25519 hex of the invite signer
- mesh.invite.token_bytes: canonical signed bytes (for re-verification)
Both nullable; required for new meshes going forward.

canonical invite format (signed bytes):
  `${v}|${mesh_id}|${mesh_slug}|${broker_url}|${expires_at}|
   ${mesh_root_key}|${role}|${owner_pubkey}`

wire format — invite payload in ic://join/<base64url(JSON)> now has:
  owner_pubkey: "<64 hex>"
  signature:    "<128 hex>"

broker joinMesh() (apps/broker/src/broker.ts):
1. verify ed25519 signature over canonical bytes using payload's
   owner_pubkey → else invite_bad_signature
2. load mesh, ensure mesh.owner_pubkey matches payload's owner_pubkey
   → else invite_owner_mismatch (prevents a malicious admin from
   substituting their own owner key)
3. load invite row by token, verify mesh_id matches → else
   invite_mesh_mismatch
4. expiry check → else invite_expired
5. revoked check → else invite_revoked
6. idempotency: if pubkey is already a member, return existing id
   WITHOUT burning an invite use
7. atomic CAS: UPDATE used_count = used_count + 1 WHERE used_count <
   max_uses → if 0 rows affected, return invite_exhausted
8. insert member with role from payload

cli side:
- apps/cli/src/invite/parse.ts: zod-validated owner_pubkey + signature
  fields; client verifies signature immediately and rejects tampered
  links (fail-fast before even touching the broker)
- buildSignedInvite() helper: owners sign invites client-side
- enrollWithBroker sends {invite_token, invite_payload, peer_pubkey,
  display_name} (was: {mesh_id, peer_pubkey, display_name, role})
- parseInviteLink is now async (libsodium ready + verify)

seed-test-mesh.ts generates an owner keypair, sets mesh.owner_pubkey,
builds + signs an invite, stores the invite row, emits ownerPubkey +
ownerSecretKey + inviteToken + inviteLink in the output JSON.

tests — invite-signature.test.ts (9 new):
- valid signed invite → join succeeds
- tampered payload → invite_bad_signature
- signer not the mesh owner → invite_owner_mismatch
- expired invite → invite_expired
- revoked invite → invite_revoked
- exhausted (maxUses=2, 3rd join) → invite_exhausted
- idempotent re-join doesn't burn a use
- atomic single-use: 5 concurrent joins → exactly 1 success, 4 exhausted
- mesh_id payload vs DB row mismatch → invite_mesh_mismatch

verified live: tampered link blocked client-side with a clear error.
Unmodified link joins cleanly end-to-end (roundtrip.ts + join-roundtrip.ts
both pass). 64/64 tests green.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 23:02:12 +01:00
Alejandro Gutiérrez
9d3dbcecaf feat(broker): verify ed25519 hello signature against member pubkey
WS handshake is now authenticated end-to-end. The broker proves that
every connected peer actually holds the secret key for the pubkey
they claim as identity — not just that they know the pubkey.

wire format change:
  {type:"hello", meshId, memberId, pubkey, sessionId, pid, cwd,
   timestamp, signature}
  where signature = ed25519_sign(canonical, secretKey)
  and canonical = `${meshId}|${memberId}|${pubkey}|${timestamp}`

broker verifies on every hello:
1. timestamp within ±60s of broker clock → else close(1008, timestamp_skew)
2. pubkey is 64 hex chars, signature is 128 hex chars → else malformed
3. crypto_sign_verify_detached(signature, canonical, pubkey) → else bad_signature
4. (existing) mesh.member row exists for (meshId, pubkey) → else unauthorized

All rejection paths close the WS with code 1008 + structured error
message + metrics counter increment (connections_rejected_total by
reason).

new modules:
- apps/broker/src/crypto.ts: canonicalHello, verifyHelloSignature,
  HELLO_SKEW_MS constant
- apps/cli/src/crypto/hello-sig.ts: matching signHello helper

clients updated:
- apps/cli/src/ws/client.ts: signs hello before send
- apps/broker/scripts/{peer-a,peer-b}.ts (smoke-test): sign hellos
  with seed-provided secret keys

new regression tests — tests/hello-signature.test.ts (7):
- valid signature accepted
- bad signature (signed with wrong key) rejected
- timestamp too old rejected (>60s)
- timestamp too far in future rejected (>60s)
- tampered canonical field (different meshId at verify time) rejected
- malformed hex pubkey rejected
- malformed signature length rejected

verified live:
- apps/broker/scripts/smoke-test.sh: full hello+ack+send+push flow
- apps/cli/scripts/roundtrip.ts: signed hello + encrypted message
- 55/55 tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:53:40 +01:00
Alejandro Gutiérrez
81a8d0714b feat(crypto): client-side direct-message encryption with crypto_box
Direct messages between peers are now end-to-end encrypted. The
broker only ever sees {nonce, ciphertext} — plaintext lives on the
two endpoints.

apps/cli/src/crypto/envelope.ts:
- encryptDirect(message, recipientPubkeyHex, senderSecretKeyHex)
  → {nonce, ciphertext} via crypto_box_easy, 24-byte fresh nonce
- decryptDirect(envelope, senderPubkeyHex, recipientSecretKeyHex)
  → plaintext or null (null on MAC failure / malformed input)
- ed25519 keys (from Step 17) are converted to X25519 on the fly via
  crypto_sign_ed25519_{pk,sk}_to_curve25519 — one signing keypair
  covers both signing + encryption roles.

BrokerClient.send():
- if targetSpec is a 64-hex pubkey → encrypt via crypto_box
- else (broadcast "*" or channel "#foo") → base64-wrapped plaintext
  (shared-key encryption for channels lands in a later step)

InboundPush now carries:
- plaintext: string | null   (decrypted body, null if decryption failed
                              OR it's a non-direct message)
- kind: "direct" | "broadcast" | "channel" | "unknown"
MCP check_messages formatter reads plaintext directly.

side-fixes pulled in during 18a:
- apps/broker/scripts/seed-test-mesh.ts now generates real ed25519
  keypairs (the previous "aaaa…" / "bbbb…" fillers weren't valid
  curve points, so crypto_sign_ed25519_pk_to_curve25519 rejected
  them). Seed output now includes secretKey for each peer.
- apps/broker/src/broker.ts drainForMember wraps the atomic claim in
  a CTE + outer ORDER BY so FIFO ordering is SQL-sourced, not
  JS-sorted (Postgres microsecond timestamps collapse to the same
  Date.getTime() milliseconds otherwise).
- vitest.config.ts fileParallelism: false — test files share
  DB state via cleanupAllTestMeshes afterAll, so running them in
  parallel caused one file's cleanup to race another's inserts.
- integration/health.test.ts "returns 200" now uses waitFullyHealthy
  (a 200-only waiter) instead of waitHealthyOrAny — prevents a race
  with the startup DB ping.

verified live:
- apps/cli/scripts/roundtrip.ts (direct A→B): ciphertext in DB is
  opaque bytes (not base64-plaintext), decrypted correctly on arrival
- apps/cli/scripts/join-roundtrip.ts (full join → encrypted send):
  PASSED
- 48/48 broker tests green

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:48:33 +01:00
Alejandro Gutiérrez
cd389c6bdd fix(broker): atomic message claim to prevent duplicate delivery
drainForMember previously ran SELECT undelivered rows, THEN UPDATE
delivered_at. Two concurrent callers (e.g. WS fan-out on send +
handleHello's own drain for the target) could both SELECT the same
row before either UPDATEd, pushing the same envelope twice.

now: single atomic UPDATE ... FROM member ... WHERE id IN (
  SELECT id ... FOR UPDATE SKIP LOCKED
) RETURNING mq.*, m.peer_pubkey AS sender_pubkey.

FOR UPDATE SKIP LOCKED is the key primitive — concurrent callers
each claim DISJOINT sets, so a message can never be drained twice.
Union of all concurrent drains still covers every eligible row.

re-sorts RETURNING rows by created_at client-side (Postgres makes no
FIFO guarantee on the RETURNING clause's output order), and normalizes
created_at to Date since raw-sql results can come back as ISO strings.

regression: tests/dup-delivery.test.ts (4 tests)
- two concurrent drains produce disjoint result sets
- six concurrent drains partition cleanly (20 messages, each drained once)
- subsequent drain after success returns empty
- FIFO ordering preserved within a single drain

48/48 tests pass. Live round-trip no longer logs the double-push.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:39:48 +01:00
Alejandro Gutiérrez
39b914bdce feat(broker): add /join endpoint for peer self-registration
Single HTTP POST /join the CLI calls after parsing an invite link +
generating an ed25519 keypair client-side. Broker validates the mesh
exists + is not archived, inserts a mesh.member row (or returns the
existing id for idempotency), returns {ok, memberId, alreadyMember?}.

body: {mesh_id, peer_pubkey, display_name, role}
- peer_pubkey must be 64 hex chars (32 bytes)
- role is "admin" | "member"

v0.1.0 trusts the request — no invite-token validation, no ed25519
signature check. Both land in Step 18 alongside libsodium wrapping.

size cap enforced via MAX_MESSAGE_BYTES (shared with hook endpoint).
structured log line per enrollment with truncated pubkey + whether
it was a new member or re-enrolled existing one.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:36:16 +01:00
Alejandro Gutiérrez
20d968f989 feat(cli): websocket client + MCP tool integration
broker-client: full WS client with hello handshake + ack, auto-reconnect
with exponential backoff (1s → 30s capped), in-memory outbound queue
(max 100) during reconnect, 500-entry push buffer for check_messages.

MCP tool integration:
- send_message: "slug:target" prefix or single-mesh fast path
- check_messages: drains push buffers across all clients
- set_status: fans manual override across all connected meshes
- set_summary: stubbed (broker protocol extension needed)
- list_peers: stubbed — lists connected mesh slugs + statuses

manager module holds Map<meshId, BrokerClient>, starts on MCP server
boot for every joined mesh in ~/.claudemesh/config.json.

new CLI command: seed-test-mesh injects a mesh row for dev testing.

also fixes a broker-side hello race: handleHello sent hello_ack before
the caller closure assigned presenceId, so clients sending right after
the ack hit the no_hello check. Fix: return presenceId, caller sets
closure var, THEN sends hello_ack. Queue drain is fire-and-forget now.

round-trip verified: two clients, A→B, push received with correct
senderPubkey + ciphertext. 44/44 broker tests still pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:30:11 +01:00
Alejandro Gutiérrez
c6674e971a chore(deploy): production Dockerfiles for broker + web + env template
Some checks failed
CI / Tests / 🧪 Test (push) Has been cancelled
- apps/broker/Dockerfile: oven/bun 1.2-slim runtime, multi-stage, pnpm deps,
  non-root bun user, GIT_SHA build-arg, /health-based HEALTHCHECK, port 7900
- apps/web/Dockerfile: Next.js 15 standalone, multi-stage, non-root nextjs
  user, NEXT_PUBLIC_* baked as build args, port 3000
- .env.production.template: DATABASE_URL, BetterAuth, OAuth, broker caps;
  no secrets
- Build context: repo root (pnpm workspace needs root pnpm-lock.yaml +
  pnpm-workspace.yaml); build with -f apps/{broker,web}/Dockerfile .

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:22:05 +01:00
Alejandro Gutiérrez
3458860c1f test(broker): coverage for hardening modules — caps, limits, metrics, health, logs
Adds 23 tests across 4 files, taking total broker coverage from
21 → 44 passing in ~2.5s.

Unit tests (no I/O):
- tests/rate-limit.test.ts (6): TokenBucket capacity, refill rate,
  no-overflow cap, independent buckets per key, sweep GC.
- tests/metrics.test.ts (5): all 10 series present in /metrics,
  counter increment semantics, labelled series produce distinct lines,
  gauge set overwrites, Prometheus format well-formed.
- tests/logging.test.ts (5): JSON per line, required fields (ts, level,
  component, msg), context merging, level preservation, no plain-text
  escape hatches.

Integration tests (spawn real broker subprocesses on random ports):
- tests/integration/health.test.ts (7):
  * GET /health 200 + {status, db, version, gitSha, uptime} (healthy DB)
  * GET /health 503 + {status:degraded, db:down} (unreachable DB)
  * GET /metrics 200 text/plain with all expected series
  * GET /nope → 404
  * POST /hook/set-status oversized body → 413
  * POST /hook/set-status 6th req/min → 429
  * Rate limit isolation by (pid, cwd) key

Integration tests use node:child_process (vitest runs under Node, not
Bun — Bun.spawn isn't available). Each suite spawns its own broker
subprocess with a random port + tailored env vars.

Not yet covered (flagged for follow-up):
- WebSocket connection caps (needs seeded mesh + WS client setup)
- WebSocket message-size rejection (ws.maxPayload behavior)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:19:14 +01:00
Alejandro Gutiérrez
5f8567614a docs(broker): production deployment spec
Authoritative runtime contract for the broker. Documents:
- HTTP + WS routes (single-port architecture)
- Required + optional env vars (DATABASE_URL, caps, TTLs, limits)
- /health and /metrics semantics, including 503 behavior on DB drop
- SIGTERM/SIGINT graceful shutdown sequence
- Recommended multi-stage Docker build (node:slim for pnpm, oven/bun
  for runtime) with GIT_SHA build-arg convention
- Signal/grace-period guidance for orchestrators
- Prometheus metric names + suggested alert thresholds
- CI pattern for the test suite (needs a live Postgres)
- Deployment target hand-off to the deploy lane

Complements the existing Dockerfile (claudemesh-3's work) with the
runtime contract the Dockerfile implements.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:15:24 +01:00
Alejandro Gutiérrez
5bf815b304 feat(broker): production hardening — caps, limits, metrics, logging
Adds the minimum ops surface area for a production broker without
over-engineering. All new config knobs are env-var driven with sane
defaults.

New modules:
- logger.ts: structured JSON logs (one line, stderr, ready for
  Loki/Datadog ingestion without preprocessing)
- metrics.ts: in-process Prometheus counters + gauges, exposed at
  GET /metrics. Tracks connections, messages, queue depth, TTL
  sweeps, hook requests, DB health.
- rate-limit.ts: token-bucket rate limiter keyed by (pid, cwd).
  Applied to POST /hook/set-status at 30/min default.
- db-health.ts: Postgres ping loop with exponential-backoff retry.
  GET /health returns 503 while DB is down.
- build-info.ts: version + gitSha (from GIT_SHA env or `git rev-parse`
  fallback) + uptime, surfaced on /health.

Behavior changes:
- Connection caps: MAX_CONNECTIONS_PER_MESH (default 100). Exceed →
  close(1008, "capacity") + metric increment.
- Message size: MAX_MESSAGE_BYTES (default 65536). WS applies it via
  `ws.maxPayload`. Hook POST bodies cap out with 413.
- Structured logs everywhere replacing the old `log()` helper.
- Env validation stricter: DATABASE_URL required + regex-checked for
  postgres:// prefix.

New endpoints:
- GET /health → {status, db, version, gitSha, uptime}. 503 if DB down.
- GET /metrics → Prometheus text format.

Verified: 21/21 tests still pass. Hit /health + /metrics live —
gitSha resolves correctly via `git rev-parse --short HEAD` in dev.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:14:31 +01:00
Alejandro Gutiérrez
e25115f1b0 test(broker): port test suite from claude-intercom to drizzle/postgres
21 integration tests (14 broker behavior + 7 path encoding), all
passing in ~1s against a real Postgres (claudemesh_test database on
the dev container).

Test infrastructure:
- apps/broker/vitest.config.ts extends @turbostarter/vitest-config/base
- tests/helpers.ts: setupTestMesh() creates a fresh mesh + 2 members
  per test with a unique slug, returns cleanup function that cascades
  the delete. cleanupAllTestMeshes() as an afterAll safety net.
- Mesh isolation in broker logic means tests don't interfere even when
  they share a database — no per-test TRUNCATE needed.

Ported behavior tests (broker.test.ts, 14 tests):
- hook flips status + queued "next" messages unblock
- "now"-priority bypasses the working gate
- DND is sacred (hooks cannot unset it)
- hook source stays fresh through jsonl refresh
- source decays to jsonl when hook signal goes stale
- isHookFresh freshness window + source-type rules
- TTL sweep flips stuck "working" → idle
- TTL sweep leaves DND alone
- first-turn race: hook fired pre-connect stashed in pending_status
- applyPendingHookStatus picks newest matching entry
- expired pending entries are ignored on connect
- broadcast targetSpec (*) reaches all members
- pubkey mismatch → message not drained
- mesh isolation: peer in mesh X doesn't drain from mesh Y

Ported encoding tests (encoding.test.ts, 7 tests):
- macOS, Linux, Windows path encoding first-candidate correctness
- Roberto's H:\Claude → H--Claude regression test (2026-04-04)
- Candidate dedup, drive-stripped fallback, leading-dash fallback

How to run: from apps/broker,
  DATABASE_URL="postgresql://.../claudemesh_test" pnpm test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 22:09:06 +01:00
Alejandro Gutiérrez
76760c9b8c test(broker): smoke test for hello + direct message flow
Some checks failed
CI / Tests / 🧪 Test (push) Has been cancelled
Adds scripts/{seed-test-mesh,peer-a,peer-b,smoke-test}.ts|.sh that
prove an end-to-end message flow works against a real Postgres:

- seed-test-mesh.ts creates user+mesh+2 members with deterministic
  hex pubkeys ("aa..aa", "bb..bb"), writes seed JSON to stdout
- peer-a.ts sends hello then a direct "send" message to peer B's
  pubkey with fake ciphertext "hello-from-a"
- peer-b.ts sends hello, waits up to 5s for a push, asserts
  senderPubkey matches peer A, exits 0/1
- smoke-test.sh wires the three together

Verified flow: hello registers presence row → send queues into
mesh.message_queue → fanout matches connected peer by pubkey →
drainForMember joins on mesh.member for senderPubkey → push lands
with ciphertext + correct sender attribution.

Also fixes a date-serialization bug that blocked the first run:
applyPendingHookStatus used `sql${col} >= ${jsDate}` which passed
JS Date.toString() to Postgres (failed to parse). Replaced raw
sql`` template with typed gte/desc/isNotNull operators from
drizzle-orm. Same fix applied in sweepPendingStatuses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:53:33 +01:00
Alejandro Gutiérrez
56b70ac54c fix(broker): default port 7899 → 7900 to avoid collision with claude-intercom dev
Port 7899 is used by claude-intercom's broker on dev machines (it's
the convention for that tool). claudemesh is a distinct product and
should have its own default port. 7900 is unreserved and unconflicted.

Prod deploys override via BROKER_PORT env var, so this only affects
local dev ergonomics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:48:57 +01:00
Alejandro Gutiérrez
beeaa3b3c6 fix(db): rename mesh.member export to meshMember to avoid collision with auth.member
The schema/index.ts barrel does `export * from "./mesh"` + `export *
from "./auth"`. Both modules exported a symbol named `member`, which
caused TypeScript to silently exclude the ambiguous re-export and
drizzle-kit's introspection couldn't see mesh.member — its generated
migration was missing that table entirely.

Fix: rename the TypeScript binding only. The DB table name stays
"member" inside pgSchema "mesh" (still mesh.member in SQL):
- `export const member = schema.table("member", ...)` →
  `export const meshMember = schema.table("member", ...)`
- Internal references in mesh.ts updated (FK lambdas, relations,
  Zod schemas, inferred TS types)
- apps/broker/src/broker.ts import updated to meshMember as memberTable
- migrations/0000_sloppy_stryfe.sql regenerated — now includes all 7
  mesh.* tables (audit_log, invite, member, mesh, message_queue,
  pending_status, presence)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:47:02 +01:00
Alejandro Gutiérrez
cde08ea3c3 fix(broker,api): pin real ws version, drop @turbostarter/ai from packages/api
- apps/broker: ws 8.19.1 (didn't exist) → 8.20.0 (latest)
- packages/api: drop dangling @turbostarter/ai workspace ref (same
  prune debt as apps/web)
- pnpm-lock.yaml regenerated from 27 workspaces, 2476 resolved packages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:38:49 +01:00
Alejandro Gutiérrez
8438e498b6 fix(web): drop dangling @turbostarter/ai and @turbostarter/cms deps after prune
Step 3 pruned packages/ai + packages/cms but left workspace refs in
apps/web/package.json, which blocked pnpm install. Removes the two
dangling entries.

apps/web source imports remain broken until a later cleanup pass —
scope limited to unblocking the broker smoke test. Cleanup debt
inventory: 48 files import @turbostarter/ai, 5 files import
@turbostarter/cms (53 total, mostly .tsx under src/).

Also pins apps/broker's drizzle-orm to 0.44.7 (same as packages/db)
since there's no catalog entry for drizzle-orm.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:37:22 +01:00
Alejandro Gutiérrez
0a97a0c369 refactor(broker): merge HTTP+WS to single port, populate senderPubkey on push
Single-port refactor:
- Drop the BROKER_PORT+1 HTTP side-port. Use `ws` with noServer:true
  and attach to a single node:http server via the 'upgrade' event.
- Clients connect to ws://host:PORT/ws
- Hook POSTs go to http://host:PORT/hook/set-status
- Health probe at http://host:PORT/health
- One port = one Traefik label, one cert, one deploy route. Matches
  the Coolify/VPS operational constraints.

senderPubkey on push:
- drainForMember now joins mesh.message_queue → mesh.member to return
  the sender's peerPubkey alongside each envelope. No extra round-trip,
  no cache invalidation needed (option A from review).
- index.ts populates WSPushMessage.senderPubkey from the join result
  instead of the empty-string placeholder.
- Receivers can now identify who sent a message directly from the push.

README updated with a routes table for the single-port layout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:35:05 +01:00
Alejandro Gutiérrez
3c0154ae70 feat(broker): port routing + status model from claude-intercom to postgres
Ports the proven claude-intercom broker logic into apps/broker with
SQLite → Drizzle/Postgres translation. Core state engine kept verbatim:
source-priority writes (hook > manual > jsonl), fresh-gating, TTL
sweeper for stuck-working, pending-status race handler, priority
delivery gates (now/next/low), Windows path encoding (5-candidate
fallback incl. Roberto's H:\Claude → H--Claude rule).

New modules:
- broker.ts (492 lines): writeStatus, handleHookSetStatus, sweepers,
  presence lifecycle, message queueing + drainForMember, sourceRank +
  isHookFresh / isSourceFresh logic, findMemberByPubkey (WS auth hook).
- paths.ts (141): cwdToProjectKeyCandidates + findActiveJsonl +
  inferStatusFromJsonl — JSONL fallback inference for peers without
  hooks installed or with stale hook signals.
- types.ts (111): WS protocol envelopes (hello/send/push/ack/error/
  set_status), HookSetStatusRequest/Response, ConnectedPeer view.
- index.ts (323): HTTP on BROKER_PORT+1 for /hook/set-status + /health;
  WebSocket on BROKER_PORT for authenticated peer connections with
  hello/send/set_status handlers; connections registry; heartbeat
  ping/pong every 30s; graceful SIGTERM/SIGINT that marks all active
  presences disconnected.

Mesh scoping: every query/mutation includes meshId. Peer identity is
split between mesh.member (stable) and mesh.presence (ephemeral). WS
hello authenticates by pubkey against mesh.member (signature verify is
stubbed — libsodium wiring lands in client-side package later).

Broker never sees plaintext: nonce + ciphertext are opaque text fields
passed through. Routing happens on targetSpec (pubkey | "#channel" |
"tag:xyz" | "*"), resolved against currently-connected peers.

Dependencies not installed; no tests run. Verified via static review
of imports against @turbostarter/db exports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:32:14 +01:00
Alejandro Gutiérrez
d5d0e6fdbb feat(broker): scaffold apps/broker workspace (bun WS runtime, no port yet)
- @claudemesh/broker package with bun dev/start scripts
- src/index.ts stub: WS server on BROKER_PORT, SIGTERM cleanup
- src/env.ts: Zod-validated env (BROKER_PORT, DATABASE_URL, STATUS_TTL_SECONDS, HOOK_FRESH_WINDOW_SECONDS)
- src/db.ts: re-exports Drizzle client from @turbostarter/db
- src/broker.ts + src/types.ts: placeholders for step 8 port
- README documents run commands, env vars, deploy targets
- tsconfig extends @turbostarter/tsconfig base
- eslint.config.js extends @turbostarter/eslint-config/base

Dependencies declared but not installed yet (ws, drizzle-orm, zod,
libsodium-wrappers + workspace deps). turbo.json unchanged: the global
dev task already has persistent=true + cache=false which is what the
broker needs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-04 21:24:17 +01:00