Single HTTP POST /join endpoint that the CLI calls after parsing an
invite link + generating an ed25519 keypair client-side. Broker
validates that the mesh exists + is not archived, inserts a mesh.member
row (or returns the existing id for idempotency), and returns
{ok, memberId, alreadyMember?}.
body: {mesh_id, peer_pubkey, display_name, role}
- peer_pubkey must be 64 hex chars (32 bytes)
- role is "admin" | "member"
v0.1.0 trusts the request — no invite-token validation, no ed25519
signature check. Both land in Step 18 alongside libsodium wrapping.
size cap enforced via MAX_MESSAGE_BYTES (shared with hook endpoint).
structured log line per enrollment with truncated pubkey + whether
it was a new member or re-enrolled existing one.
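A minimal sketch of the body checks described above, assuming a hand-rolled
validator. JoinBody, isRole and parseJoinBody are hypothetical names; only
the field names and the two constraints (64 hex chars, "admin" | "member")
come from this commit.

```typescript
// Sketch only: JoinBody, isRole and parseJoinBody are hypothetical names.
type Role = "admin" | "member";

interface JoinBody {
  mesh_id: string;
  peer_pubkey: string;
  display_name: string;
  role: Role;
}

type ParseResult = { ok: true; body: JoinBody } | { ok: false; error: string };

const PUBKEY_HEX = /^[0-9a-fA-F]{64}$/; // 64 hex chars = 32 bytes

function isRole(value: unknown): value is Role {
  return value === "admin" || value === "member";
}

function parseJoinBody(input: unknown): ParseResult {
  if (typeof input !== "object" || input === null) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const { mesh_id, peer_pubkey, display_name, role } = input as Record<string, unknown>;
  if (typeof mesh_id !== "string" || mesh_id.length === 0) {
    return { ok: false, error: "mesh_id is required" };
  }
  if (typeof peer_pubkey !== "string" || !PUBKEY_HEX.test(peer_pubkey)) {
    return { ok: false, error: "peer_pubkey must be 64 hex chars" };
  }
  if (typeof display_name !== "string") {
    return { ok: false, error: "display_name is required" };
  }
  if (!isRole(role)) {
    return { ok: false, error: 'role must be "admin" or "member"' };
  }
  return { ok: true, body: { mesh_id, peer_pubkey, display_name, role } };
}
```

This checks shape only, matching the v0.1.0 trust model: no invite token and
no signature is inspected.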
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
broker-client: full WS client with hello handshake + ack, auto-reconnect
with exponential backoff (1s → 30s capped), in-memory outbound queue
(max 100) during reconnect, 500-entry push buffer for check_messages.
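The reconnect behavior above could look roughly like this. It is a sketch
under two assumptions that the commit does not state: the delay doubles per
failed attempt, and a full outbound queue drops its oldest entry.
reconnectDelayMs and BoundedQueue are hypothetical names.

```typescript
// Assumption: the delay doubles per failed attempt; the commit only says
// "exponential backoff (1s → 30s capped)".
function reconnectDelayMs(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Assumption: when the queue is full the oldest entry is dropped. The commit
// gives the limit (100) but not the overflow policy.
class BoundedQueue<T> {
  private items: T[] = [];
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  push(item: T): void {
    if (this.items.length >= this.max) this.items.shift();
    this.items.push(item);
  }

  // Hands back everything queued so far and empties the queue.
  drain(): T[] {
    const out = this.items;
    this.items = [];
    return out;
  }
}
```

With these defaults the delays run 1s, 2s, 4s, 8s, 16s and then stay at 30s.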
MCP tool integration:
- send_message: "slug:target" prefix or single-mesh fast path
- check_messages: drains push buffers across all clients
- set_status: fans manual override across all connected meshes
- set_summary: stubbed (broker protocol extension needed)
- list_peers: stubbed — lists connected mesh slugs + statuses
manager module holds Map<meshId, BrokerClient>, starts on MCP server
boot for every joined mesh in ~/.claudemesh/config.json.
new CLI command: seed-test-mesh injects a mesh row for dev testing.
also fixes a broker-side hello race: handleHello sent hello_ack before
the caller closure assigned presenceId, so clients sending right after
the ack hit the no_hello check. Fix: return presenceId, caller sets
closure var, THEN sends hello_ack. Queue drain is fire-and-forget now.
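The ordering fix can be sketched as follows. This is a simplified,
synchronous stand-in, not the broker code: Frame, createConnection and
register are hypothetical, and the real handleHello presumably does async DB
work. Only the order (assign presenceId, then send hello_ack) is taken from
this commit.

```typescript
// Simplified sketch; names are hypothetical.
type Frame =
  | { type: "hello_ack" }
  | { type: "error"; code: "no_hello" }
  | { type: "accepted" };

function createConnection(send: (frame: Frame) => void, register: () => string) {
  let presenceId: string | null = null; // closure var checked on every message

  return {
    onHello(): void {
      const id = register(); // stands in for handleHello, which now only returns the id
      presenceId = id; // 1. the caller assigns the closure variable
      send({ type: "hello_ack" }); // 2. only then is the ack sent
    },
    onMessage(): void {
      if (presenceId === null) {
        send({ type: "error", code: "no_hello" });
        return;
      }
      send({ type: "accepted" });
    },
  };
}
```

Because the ack is the last step, a client that sends immediately after
receiving it always finds presenceId set.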
round-trip verified: two clients, A→B, push received with correct
senderPubkey + ciphertext. 44/44 broker tests still pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds 23 tests across 4 files, taking the broker suite from 21 to 44
passing tests in ~2.5s.
Unit tests (no I/O):
- tests/rate-limit.test.ts (6): TokenBucket capacity, refill rate,
no-overflow cap, independent buckets per key, sweep GC.
- tests/metrics.test.ts (5): all 10 series present in /metrics,
counter increment semantics, labelled series produce distinct lines,
gauge set overwrites, Prometheus format well-formed.
- tests/logging.test.ts (5): JSON per line, required fields (ts, level,
component, msg), context merging, level preservation, no plain-text
escape hatches.
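For reference, the log-line shape the logging tests check might be produced
by something like the sketch below. formatLogLine is a hypothetical helper;
the required fields (ts, level, component, msg) are from the test list
above. Letting the required fields override same-named context keys is an
assumption.

```typescript
// Hypothetical helper, not the actual logger.ts.
type Level = "debug" | "info" | "warn" | "error";

function formatLogLine(
  level: Level,
  component: string,
  msg: string,
  context: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  // Context is spread first so it cannot overwrite the required fields.
  // JSON.stringify escapes newlines, so the result is always a single line.
  return JSON.stringify({ ...context, ts: now.toISOString(), level, component, msg });
}
```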
Integration tests (spawn real broker subprocesses on random ports):
- tests/integration/health.test.ts (7):
* GET /health 200 + {status, db, version, gitSha, uptime} (healthy DB)
* GET /health 503 + {status:degraded, db:down} (unreachable DB)
* GET /metrics 200 text/plain with all expected series
* GET /nope → 404
* POST /hook/set-status oversized body → 413
* POST /hook/set-status 6th req/min → 429
* Rate limit isolation by (pid, cwd) key
Integration tests use node:child_process (vitest runs under Node, not
Bun — Bun.spawn isn't available). Each suite spawns its own broker
subprocess with a random port + tailored env vars.
Not yet covered (flagged for follow-up):
- WebSocket connection caps (needs seeded mesh + WS client setup)
- WebSocket message-size rejection (ws.maxPayload behavior)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Authoritative runtime contract for the broker. Documents:
- HTTP + WS routes (single-port architecture)
- Required + optional env vars (DATABASE_URL, caps, TTLs, limits)
- /health and /metrics semantics, including 503 behavior on DB drop
- SIGTERM/SIGINT graceful shutdown sequence
- Recommended multi-stage Docker build (node:slim for pnpm, oven/bun
for runtime) with GIT_SHA build-arg convention
- Signal/grace-period guidance for orchestrators
- Prometheus metric names + suggested alert thresholds
- CI pattern for the test suite (needs a live Postgres)
- Deployment target hand-off to the deploy lane
Complements the existing Dockerfile (claudemesh-3's work) with the
runtime contract the Dockerfile implements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the minimum ops surface area for a production broker without
over-engineering. All new config knobs are env-var driven with sane
defaults.
New modules:
- logger.ts: structured JSON logs (one line, stderr, ready for
Loki/Datadog ingestion without preprocessing)
- metrics.ts: in-process Prometheus counters + gauges, exposed at
GET /metrics. Tracks connections, messages, queue depth, TTL
sweeps, hook requests, DB health.
- rate-limit.ts: token-bucket rate limiter keyed by (pid, cwd).
Applied to POST /hook/set-status at 30/min default.
- db-health.ts: Postgres ping loop with exponential-backoff retry.
GET /health returns 503 while DB is down.
- build-info.ts: version + gitSha (from GIT_SHA env or `git rev-parse`
fallback) + uptime, surfaced on /health.
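A rough sketch of a token bucket keyed by (pid, cwd), to illustrate the
rate-limit.ts idea. The class shape, method names and the hookKey helper are
assumptions; the actual module may differ. Time is passed in explicitly so
refill behavior is deterministic.

```typescript
// Hypothetical sketch; the real rate-limit.ts may differ.
interface BucketState {
  tokens: number;
  updatedAt: number;
}

class TokenBucket {
  private readonly buckets = new Map<string, BucketState>();
  private readonly capacity: number;
  private readonly refillPerMinute: number;

  constructor(capacity: number, refillPerMinute: number) {
    this.capacity = capacity;
    this.refillPerMinute = refillPerMinute;
  }

  // Returns true and consumes one token if the key still has budget.
  take(key: string, nowMs: number): boolean {
    const prev = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: nowMs };
    const refill = ((nowMs - prev.updatedAt) * this.refillPerMinute) / 60_000;
    const tokens = Math.min(this.capacity, prev.tokens + refill); // never exceeds capacity
    const allowed = tokens >= 1;
    this.buckets.set(key, { tokens: allowed ? tokens - 1 : tokens, updatedAt: nowMs });
    return allowed;
  }

  // Drops buckets that have been idle longer than idleMs (the "sweep GC").
  sweep(nowMs: number, idleMs: number): void {
    this.buckets.forEach((state, key) => {
      if (nowMs - state.updatedAt > idleMs) this.buckets.delete(key);
    });
  }

  get size(): number {
    return this.buckets.size;
  }
}

// One bucket per (pid, cwd) pair.
const hookKey = (pid: number, cwd: string): string => `${pid}:${cwd}`;
```

With capacity 30 and a refill of 30 per minute, a caller can burst 30
requests and then sustain one every two seconds.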
Behavior changes:
- Connection caps: MAX_CONNECTIONS_PER_MESH (default 100). Exceed →
close(1008, "capacity") + metric increment.
- Message size: MAX_MESSAGE_BYTES (default 65536). WS applies it via
`ws.maxPayload`. Hook POST bodies cap out with 413.
- Structured logs everywhere replacing the old `log()` helper.
- Env validation stricter: DATABASE_URL required + regex-checked for
postgres:// prefix.
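The env-driven knobs above could be read roughly like this. It is a
dependency-free sketch (the broker's env.ts is Zod-validated), and
loadConfig, intFromEnv and BrokerConfig are hypothetical names. Accepting
both postgres:// and postgresql:// is an assumption; the commit only
mentions a postgres:// prefix check.

```typescript
// Hypothetical sketch of env parsing with defaults.
interface BrokerConfig {
  databaseUrl: string;
  maxConnectionsPerMesh: number;
  maxMessageBytes: number;
}

function intFromEnv(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`expected a positive integer, got "${raw}"`);
  }
  return n;
}

function loadConfig(env: Record<string, string | undefined>): BrokerConfig {
  const databaseUrl = env.DATABASE_URL;
  // Assumption: both URL scheme spellings are accepted.
  if (!databaseUrl || !/^postgres(ql)?:\/\//.test(databaseUrl)) {
    throw new Error("DATABASE_URL is required and must be a postgres:// URL");
  }
  return {
    databaseUrl,
    maxConnectionsPerMesh: intFromEnv(env.MAX_CONNECTIONS_PER_MESH, 100),
    maxMessageBytes: intFromEnv(env.MAX_MESSAGE_BYTES, 65_536),
  };
}
```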
New endpoints:
- GET /health → {status, db, version, gitSha, uptime}. 503 if DB down.
- GET /metrics → Prometheus text format.
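As an illustration of the Prometheus text format served by /metrics, a
series can be rendered as below. renderSeries is a hypothetical helper, and
the metric name in the usage note is made up for the example rather than
taken from metrics.ts.

```typescript
// Hypothetical renderer for one metric family in Prometheus text format.
interface Sample {
  labels: Record<string, string>;
  value: number;
}

// Label values must escape backslash, double quote and newline.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

function renderSeries(
  name: string,
  type: "counter" | "gauge",
  help: string,
  samples: Sample[],
): string {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} ${type}`];
  for (const sample of samples) {
    const pairs = Object.keys(sample.labels)
      .map((k) => `${k}="${escapeLabelValue(sample.labels[k])}"`)
      .join(",");
    lines.push(pairs ? `${name}{${pairs}} ${sample.value}` : `${name} ${sample.value}`);
  }
  return lines.join("\n") + "\n";
}
```

Calling renderSeries("example_hook_requests_total", "counter", "Hook
requests.", [...]) with two differently labelled samples yields two distinct
sample lines under one HELP/TYPE header.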
Verified: 21/21 tests still pass. Hit /health + /metrics live —
gitSha resolves correctly via `git rev-parse --short HEAD` in dev.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
21 integration tests (14 broker behavior + 7 path encoding), all
passing in ~1s against a real Postgres (claudemesh_test database on
the dev container).
Test infrastructure:
- apps/broker/vitest.config.ts extends @turbostarter/vitest-config/base
- tests/helpers.ts: setupTestMesh() creates a fresh mesh + 2 members
per test with a unique slug, returns cleanup function that cascades
the delete. cleanupAllTestMeshes() as an afterAll safety net.
- Mesh isolation in broker logic means tests don't interfere even when
they share a database — no per-test TRUNCATE needed.
Ported behavior tests (broker.test.ts, 14 tests):
- hook flips status + queued "next" messages unblock
- "now"-priority bypasses the working gate
- DND is sacred (hooks cannot unset it)
- hook source stays fresh through jsonl refresh
- source decays to jsonl when hook signal goes stale
- isHookFresh freshness window + source-type rules
- TTL sweep flips stuck "working" → idle
- TTL sweep leaves DND alone
- first-turn race: hook fired pre-connect stashed in pending_status
- applyPendingHookStatus picks newest matching entry
- expired pending entries are ignored on connect
- broadcast targetSpec (*) reaches all members
- pubkey mismatch → message not drained
- mesh isolation: peer in mesh X doesn't drain from mesh Y
Ported encoding tests (encoding.test.ts, 7 tests):
- macOS, Linux, Windows path encoding first-candidate correctness
- Roberto's H:\Claude → H--Claude regression test (2026-04-04)
- Candidate dedup, drive-stripped fallback, leading-dash fallback
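A sketch of the first-candidate encoding, inferred from the H:\Claude →
H--Claude regression case: drive colon and path separators each become a
dash. The POSIX behavior shown in the comment is an assumption, the function
names are hypothetical, and the drive-stripped and leading-dash fallbacks
are not modeled here.

```typescript
// Hypothetical sketch of the primary candidate only.
function encodeProjectPath(path: string): string {
  // ':' plus both separator styles map to '-', so "H:\Claude" → "H--Claude".
  // Assumption: POSIX paths follow the same rule ("/a/b" → "-a-b").
  return path.replace(/[:\\/]/g, "-");
}

// Candidate dedup: keeps the first occurrence of each candidate, in order.
function uniqueCandidates(candidates: string[]): string[] {
  return Array.from(new Set(candidates));
}
```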
How to run: from apps/broker,
DATABASE_URL="postgresql://.../claudemesh_test" pnpm test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 3 pruned packages/{ai,cms,cognitive-context} but left whole
route groups + feature modules that depended on them. Those files
have been unbuildable since that prune. Removes them now so the
workspace can be validated:
Route groups:
- apps/web/src/app/[locale]/(apps)/{chat,image,pdf,tts}/
- apps/web/src/app/[locale]/(marketing)/blog/
Feature modules:
- apps/web/src/modules/{chat,image,pdf,tts,common/ai,marketing/blog}/
- packages/api/src/modules/ai/ (chat, image, pdf, stt, tts, router)
3 stragglers remain (separate handoff to claudemesh-2):
- apps/web/src/app/[locale]/(marketing)/legal/[slug]/page.tsx (cms)
- apps/web/src/app/sitemap.ts (cms)
- apps/web/src/modules/common/layout/credits/index.tsx (ai)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds scripts/{seed-test-mesh,peer-a,peer-b,smoke-test}.ts|.sh that
prove an end-to-end message flow works against a real Postgres:
- seed-test-mesh.ts creates user+mesh+2 members with deterministic
hex pubkeys ("aa..aa", "bb..bb"), writes seed JSON to stdout
- peer-a.ts sends hello then a direct "send" message to peer B's
pubkey with fake ciphertext "hello-from-a"
- peer-b.ts sends hello, waits up to 5s for a push, asserts
senderPubkey matches peer A, exits 0/1
- smoke-test.sh wires the three together
Verified flow: hello registers presence row → send queues into
mesh.message_queue → fanout matches connected peer by pubkey →
drainForMember joins on mesh.member for senderPubkey → push lands
with ciphertext + correct sender attribution.
Also fixes a date-serialization bug that blocked the first run:
applyPendingHookStatus used sql`${col} >= ${jsDate}`, which passed
the JS Date.toString() output to Postgres (which failed to parse it).
Replaced the raw sql`` template with typed gte/desc/isNotNull operators from
drizzle-orm. Same fix applied in sweepPendingStatuses.
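To make the serialization bug concrete: stringifying a Date the way a raw
template interpolation would gives the human-readable toString() form, not
ISO 8601. The snippet below only demonstrates the two string forms; the
drizzle-orm call in the trailing comment is illustrative and uses a
placeholder column name.

```typescript
// 2026-04-04T12:00:00Z, built from UTC parts so the ISO form is deterministic.
const since = new Date(Date.UTC(2026, 3, 4, 12, 0, 0));

// What ends up in the query when the Date is coerced to a string.
// The exact text depends on the machine's timezone, e.g. "Sat Apr 04 2026 ... GMT+0000 (...)".
const coercedText = String(since);

// An ISO 8601 timestamp, which Postgres accepts.
const isoText = since.toISOString();

// With the typed operator the driver handles the Date value, roughly:
//   gte(someTimestampColumn, since)   // someTimestampColumn is a placeholder
```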
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port 7899 is used by claude-intercom's broker on dev machines (it's
the convention for that tool). claudemesh is a distinct product and
should have its own default port. 7900 is unreserved and unconflicted.
Prod deploys override via BROKER_PORT env var, so this only affects
local dev ergonomics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The schema/index.ts barrel does `export * from "./mesh"` + `export *
from "./auth"`. Both modules exported a symbol named `member`, so
TypeScript silently excluded the ambiguous re-export. As a result,
drizzle-kit's introspection couldn't see mesh.member, and its generated
migration was missing that table entirely.
Fix: rename the TypeScript binding only. The DB table name stays
"member" inside pgSchema "mesh" (still mesh.member in SQL):
- `export const member = schema.table("member", ...)` →
`export const meshMember = schema.table("member", ...)`
- Internal references in mesh.ts updated (FK lambdas, relations,
Zod schemas, inferred TS types)
- apps/broker/src/broker.ts import updated to meshMember as memberTable
- migrations/0000_sloppy_stryfe.sql regenerated — now includes all 7
mesh.* tables (audit_log, invite, member, mesh, message_queue,
pending_status, presence)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 3 pruned packages/ai + packages/cms but left workspace refs in
apps/web/package.json, which blocked pnpm install. Removes the two
dangling entries.
apps/web source imports remain broken until a later cleanup pass —
scope limited to unblocking the broker smoke test. Cleanup debt
inventory: 48 files import @turbostarter/ai, 5 files import
@turbostarter/cms (53 total, mostly .tsx under src/).
Also pins apps/broker's drizzle-orm to 0.44.7 (same as packages/db)
since there's no catalog entry for drizzle-orm.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single-port refactor:
- Drop the BROKER_PORT+1 HTTP side-port. Use `ws` with noServer:true
and attach to a single node:http server via the 'upgrade' event.
- Clients connect to ws://host:PORT/ws
- Hook POSTs go to http://host:PORT/hook/set-status
- Health probe at http://host:PORT/health
- One port = one Traefik label, one cert, one deploy route. Matches
the Coolify/VPS operational constraints.
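The single-port dispatch can be pictured as one routing decision per
incoming request. This is a pure sketch with hypothetical names (Route,
routeFor); in the broker the "ws-upgrade" branch would hand the socket to a
`ws` server created with noServer: true from the http server's 'upgrade'
event.

```typescript
// Hypothetical sketch of the routing decision on the single port.
type Route = "ws-upgrade" | "hook-set-status" | "health" | "not-found";

function routeFor(method: string, pathname: string, isUpgrade: boolean): Route {
  if (isUpgrade) {
    // Only /ws accepts WebSocket upgrades.
    return pathname === "/ws" ? "ws-upgrade" : "not-found";
  }
  if (method === "POST" && pathname === "/hook/set-status") return "hook-set-status";
  if (method === "GET" && pathname === "/health") return "health";
  return "not-found";
}
```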
senderPubkey on push:
- drainForMember now joins mesh.message_queue → mesh.member to return
the sender's peerPubkey alongside each envelope. No extra round-trip,
no cache invalidation needed (option A from review).
- index.ts populates WSPushMessage.senderPubkey from the join result
instead of the empty-string placeholder.
- Receivers can now identify who sent a message directly from the push.
README updated with a routes table for the single-port layout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- @claudemesh/broker package with bun dev/start scripts
- src/index.ts stub: WS server on BROKER_PORT, SIGTERM cleanup
- src/env.ts: Zod-validated env (BROKER_PORT, DATABASE_URL, STATUS_TTL_SECONDS, HOOK_FRESH_WINDOW_SECONDS)
- src/db.ts: re-exports Drizzle client from @turbostarter/db
- src/broker.ts + src/types.ts: placeholders for step 8 port
- README documents run commands, env vars, deploy targets
- tsconfig extends @turbostarter/tsconfig base
- eslint.config.js extends @turbostarter/eslint-config/base
Dependencies declared but not installed yet (ws, drizzle-orm, zod,
libsodium-wrappers + workspace deps). turbo.json unchanged: the global
dev task already has persistent=true + cache=false which is what the
broker needs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- pgSchema "mesh" with 4 tables isolating the peer mesh domain
- Enums: visibility, transport, tier, role
- audit_log is metadata-only (E2E encryption enforced at broker/client)
- Cascade on mesh delete, soft-delete via archivedAt/revokedAt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>