Authoritative runtime contract for the broker. Documents:
- HTTP + WS routes (single-port architecture)
- Required + optional env vars (DATABASE_URL, caps, TTLs, limits)
- /health and /metrics semantics, including 503 behavior on DB drop
- SIGTERM/SIGINT graceful shutdown sequence
- Recommended multi-stage Docker build (node:slim for pnpm, oven/bun
for runtime) with GIT_SHA build-arg convention
- Signal/grace-period guidance for orchestrators
- Prometheus metric names + suggested alert thresholds
- CI pattern for the test suite (needs a live Postgres)
- Deployment target hand-off to the deploy lane
Complements the existing Dockerfile (claudemesh-3's work) with the
runtime contract the Dockerfile implements.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds the minimum ops surface area for a production broker without
over-engineering. All new config knobs are env-var driven with sane
defaults.
New modules:
- logger.ts: structured JSON logs (one line, stderr, ready for
Loki/Datadog ingestion without preprocessing)
- metrics.ts: in-process Prometheus counters + gauges, exposed at
GET /metrics. Tracks connections, messages, queue depth, TTL
sweeps, hook requests, DB health.
- rate-limit.ts: token-bucket rate limiter keyed by (pid, cwd).
Applied to POST /hook/set-status at 30/min default.
- db-health.ts: Postgres ping loop with exponential-backoff retry.
GET /health returns 503 while DB is down.
- build-info.ts: version + gitSha (from GIT_SHA env or `git rev-parse`
fallback) + uptime, surfaced on /health.
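A minimal sketch of the token-bucket idea behind the limiter, assuming an in-memory map keyed by (pid, cwd). The class name, method name, and injected clock are hypothetical and are not taken from rate-limit.ts; only the key and the 30/min default come from this commit.

```typescript
// Hypothetical sketch: names are illustrative, not the real rate-limit.ts API.
interface Bucket {
  tokens: number;       // tokens currently available
  lastRefillMs: number; // timestamp of the last refill
}

class TokenBucketLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerMs: number;

  constructor(capacity: number, refillPerMs: number) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
  }

  /** Returns true and spends one token if the caller is under the limit. */
  tryConsume(pid: number, cwd: string, nowMs: number = Date.now()): boolean {
    const key = `${pid}:${cwd}`;
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, lastRefillMs: nowMs };
    // Refill in proportion to elapsed time, never above capacity.
    const elapsedMs = Math.max(0, nowMs - bucket.lastRefillMs);
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedMs * this.refillPerMs);
    bucket.lastRefillMs = nowMs;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}

// 30 requests per minute: a burst of 30, refilled at 30 tokens per 60 000 ms.
const hookLimiter = new TokenBucketLimiter(30, 30 / 60_000);
```

A handler for POST /hook/set-status would call `hookLimiter.tryConsume(pid, cwd)` and reject the request when it returns false. The response code for a rejected request is not stated in this commit.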
Behavior changes:
- Connection caps: MAX_CONNECTIONS_PER_MESH (default 100). Connections
over the cap are closed with close(1008, "capacity") and counted in a
metric.
- Message size: MAX_MESSAGE_BYTES (default 65536). WS applies it via
`ws.maxPayload`. Hook POST bodies over the limit are rejected with 413.
- Structured logs everywhere replacing the old `log()` helper.
- Env validation stricter: DATABASE_URL required + regex-checked for
postgres:// prefix.
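The stricter env handling could look roughly like the sketch below. It is written in plain TypeScript so it stands alone; the function names and the exact regex are assumptions, and accepting both postgres:// and postgresql:// is a guess based on the test URL used elsewhere in this log.

```typescript
// Hypothetical sketch: loadEnv and parsePositiveInt are illustrative names.
interface BrokerEnv {
  databaseUrl: string;
  maxConnectionsPerMesh: number;
  maxMessageBytes: number;
}

function parsePositiveInt(name: string, raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const n = Number(raw);
  if (!Number.isInteger(n) || n <= 0) {
    throw new Error(`${name} must be a positive integer, got "${raw}"`);
  }
  return n;
}

function loadEnv(source: Record<string, string | undefined>): BrokerEnv {
  const databaseUrl = source.DATABASE_URL;
  if (!databaseUrl) throw new Error("DATABASE_URL is required");
  // Assumption: both the postgres:// and postgresql:// schemes are accepted.
  if (!/^postgres(ql)?:\/\//.test(databaseUrl)) {
    throw new Error("DATABASE_URL must start with postgres:// or postgresql://");
  }
  return {
    databaseUrl,
    // Defaults are the ones stated in this commit.
    maxConnectionsPerMesh: parsePositiveInt("MAX_CONNECTIONS_PER_MESH", source.MAX_CONNECTIONS_PER_MESH, 100),
    maxMessageBytes: parsePositiveInt("MAX_MESSAGE_BYTES", source.MAX_MESSAGE_BYTES, 65536),
  };
}
```

Passing the env source as an argument (rather than reading a global) keeps the validation easy to exercise in tests.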
New endpoints:
- GET /health → {status, db, version, gitSha, uptime}. 503 if DB down.
- GET /metrics → Prometheus text format.
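As an illustration of the /health contract, the sketch below builds the response as a pure function. The field names come from this commit; the concrete values ("ok", "degraded", "up", "down") and uptime being in whole seconds are assumptions.

```typescript
// Hypothetical sketch: value strings and the seconds unit are assumptions.
interface BuildInfo {
  version: string;
  gitSha: string;
  startedAtMs: number;
}

interface HealthBody {
  status: "ok" | "degraded";
  db: "up" | "down";
  version: string;
  gitSha: string;
  uptime: number;
}

interface HealthResponse {
  statusCode: 200 | 503;
  body: HealthBody;
}

function buildHealthResponse(dbUp: boolean, info: BuildInfo, nowMs: number = Date.now()): HealthResponse {
  return {
    // 503 while the DB is down, so probes can take the instance out of rotation.
    statusCode: dbUp ? 200 : 503,
    body: {
      status: dbUp ? "ok" : "degraded",
      db: dbUp ? "up" : "down",
      version: info.version,
      gitSha: info.gitSha,
      uptime: Math.floor((nowMs - info.startedAtMs) / 1000),
    },
  };
}
```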
Verified: 21/21 tests still pass. Hit /health + /metrics live —
gitSha resolves correctly via `git rev-parse --short HEAD` in dev.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
21 integration tests (14 broker behavior + 7 path encoding), all
passing in ~1s against a real Postgres (claudemesh_test database on
the dev container).
Test infrastructure:
- apps/broker/vitest.config.ts extends @turbostarter/vitest-config/base
- tests/helpers.ts: setupTestMesh() creates a fresh mesh + 2 members
per test with a unique slug, returns cleanup function that cascades
the delete. cleanupAllTestMeshes() as an afterAll safety net.
- Mesh isolation in broker logic means tests don't interfere even when
they share a database — no per-test TRUNCATE needed.
Ported behavior tests (broker.test.ts, 14 tests):
- hook flips status + queued "next" messages unblock
- "now"-priority bypasses the working gate
- DND is sacred (hooks cannot unset it)
- hook source stays fresh through jsonl refresh
- source decays to jsonl when hook signal goes stale
- isHookFresh freshness window + source-type rules
- TTL sweep flips stuck "working" → idle
- TTL sweep leaves DND alone
- first-turn race: hook fired pre-connect stashed in pending_status
- applyPendingHookStatus picks newest matching entry
- expired pending entries are ignored on connect
- broadcast targetSpec (*) reaches all members
- pubkey mismatch → message not drained
- mesh isolation: peer in mesh X doesn't drain from mesh Y
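The two TTL-sweep cases above can be pictured with a small in-memory sketch. The row shape and function name are hypothetical; the real sweep runs against Postgres, and only the rule itself (stale "working" becomes idle, DND is never touched) is taken from the test list.

```typescript
// Hypothetical sketch: the real sweep is a DB update, not an array map.
type Status = "idle" | "working" | "dnd";

interface PresenceRow {
  memberId: string;
  status: Status;
  statusUpdatedAtMs: number;
}

function sweepStuckWorking(rows: PresenceRow[], ttlSeconds: number, nowMs: number): PresenceRow[] {
  const cutoffMs = nowMs - ttlSeconds * 1000;
  return rows.map((row) => {
    // Only "working" rows older than the TTL are flipped; DND and idle pass through.
    if (row.status === "working" && row.statusUpdatedAtMs < cutoffMs) {
      return { ...row, status: "idle", statusUpdatedAtMs: nowMs };
    }
    return row;
  });
}
```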
Ported encoding tests (encoding.test.ts, 7 tests):
- macOS, Linux, Windows path encoding first-candidate correctness
- Roberto's H:\Claude → H--Claude regression test (2026-04-04)
- Candidate dedup, drive-stripped fallback, leading-dash fallback
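A rough sketch of the encoding these tests cover, assuming each `:`, `\` and `/` maps to `-` (which reproduces the H:\Claude → H--Claude case). The fallback candidates are a guess at what "drive-stripped" and "leading-dash" mean; the function names are hypothetical.

```typescript
// Hypothetical sketch: the fallback rules are assumptions, not the tested code.
function encodeProjectPath(path: string): string {
  // Each drive colon and path separator becomes a single dash.
  return path.replace(/[:\\/]/g, "-");
}

function candidateEncodings(path: string): string[] {
  const primary = encodeProjectPath(path);
  // Assumed drive-stripped fallback: drop a leading "X:" before encoding.
  const driveStripped = encodeProjectPath(path.replace(/^[A-Za-z]:/, ""));
  // Assumed leading-dash fallback: force the encoded name to start with "-".
  const leadingDash = primary.startsWith("-") ? primary : `-${primary}`;
  // A Set keeps first-seen order and removes duplicates.
  return [...new Set([primary, driveStripped, leadingDash])];
}
```

For a POSIX path such as /home/u/app all three candidates coincide, so dedup leaves a single entry; for H:\Claude the first candidate is H--Claude.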
How to run: from apps/broker,
DATABASE_URL="postgresql://.../claudemesh_test" pnpm test
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 3 pruned packages/{ai,cms,cognitive-context} but left behind whole
route groups + feature modules that depended on them. Those files
have been unbuildable since that prune. Removes them now so the
workspace can be validated:
Route groups:
- apps/web/src/app/[locale]/(apps)/{chat,image,pdf,tts}/
- apps/web/src/app/[locale]/(marketing)/blog/
Feature modules:
- apps/web/src/modules/{chat,image,pdf,tts,common/ai,marketing/blog}/
- packages/api/src/modules/ai/ (chat, image, pdf, stt, tts, router)
3 stragglers remain (separate handoff to claudemesh-2):
- apps/web/src/app/[locale]/(marketing)/legal/[slug]/page.tsx (cms)
- apps/web/src/app/sitemap.ts (cms)
- apps/web/src/modules/common/layout/credits/index.tsx (ai)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds scripts/{seed-test-mesh,peer-a,peer-b,smoke-test}.ts|.sh that
prove an end-to-end message flow works against a real Postgres:
- seed-test-mesh.ts creates user+mesh+2 members with deterministic
hex pubkeys ("aa..aa", "bb..bb"), writes seed JSON to stdout
- peer-a.ts sends hello then a direct "send" message to peer B's
pubkey with fake ciphertext "hello-from-a"
- peer-b.ts sends hello, waits up to 5s for a push, asserts
senderPubkey matches peer A, exits 0/1
- smoke-test.sh wires the three together
Verified flow: hello registers presence row → send queues into
mesh.message_queue → fanout matches connected peer by pubkey →
drainForMember joins on mesh.member for senderPubkey → push lands
with ciphertext + correct sender attribution.
Also fixes a date-serialization bug that blocked the first run:
applyPendingHookStatus used sql`${col} >= ${jsDate}`, which passed
JS Date.toString() output to Postgres (failed to parse). Replaced the
raw sql`` template with typed gte/desc/isNotNull operators from
drizzle-orm. Same fix applied in sweepPendingStatuses.
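To make the failure mode concrete, the snippet below contrasts the two string forms of a JS Date. It does not use drizzle-orm; it only shows why interpolating a Date as a plain string yields a locale-style value rather than an ISO 8601 timestamp. The cutoff value is an arbitrary example.

```typescript
// Arbitrary example instant, built in UTC so the ISO form is deterministic.
const cutoff = new Date(Date.UTC(2026, 3, 4, 12, 0, 0));

// String interpolation calls Date.prototype.toString(): a human-readable,
// local-timezone form that varies by machine.
const interpolated = `${cutoff}`;

// toISOString() always yields ISO 8601 in UTC.
const iso = cutoff.toISOString(); // "2026-04-04T12:00:00.000Z"

console.log({ interpolated, iso });
```

Typed comparison operators avoid the question entirely, because the column type decides how the Date is serialized rather than string interpolation.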
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Port 7899 is used by claude-intercom's broker on dev machines (it's
the convention for that tool). claudemesh is a distinct product and
should have its own default port. 7900 is unreserved and unconflicted.
Prod deploys override via BROKER_PORT env var, so this only affects
local dev ergonomics.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The schema/index.ts barrel does `export * from "./mesh"` + `export *
from "./auth"`. Both modules exported a symbol named `member`, which
caused TypeScript to silently exclude the ambiguous re-export. As a
result, drizzle-kit's introspection couldn't see mesh.member, and its
generated migration was missing that table entirely.
Fix: rename the TypeScript binding only. The DB table name stays
"member" inside pgSchema "mesh" (still mesh.member in SQL):
- `export const member = schema.table("member", ...)` →
`export const meshMember = schema.table("member", ...)`
- Internal references in mesh.ts updated (FK lambdas, relations,
Zod schemas, inferred TS types)
- apps/broker/src/broker.ts import updated to meshMember as memberTable
- migrations/0000_sloppy_stryfe.sql regenerated — now includes all 7
mesh.* tables (audit_log, invite, member, mesh, message_queue,
pending_status, presence)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Step 3 pruned packages/ai + packages/cms but left workspace refs in
apps/web/package.json, which blocked pnpm install. Removes the two
dangling entries.
apps/web source imports remain broken until a later cleanup pass —
scope limited to unblocking the broker smoke test. Cleanup debt
inventory: 48 files import @turbostarter/ai, 5 files import
@turbostarter/cms (53 total, mostly .tsx under src/).
Also pins apps/broker's drizzle-orm to 0.44.7 (same as packages/db)
since there's no catalog entry for drizzle-orm.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single-port refactor:
- Drop the BROKER_PORT+1 HTTP side-port. Use `ws` with noServer:true
and attach to a single node:http server via the 'upgrade' event.
- Clients connect to ws://host:PORT/ws
- Hook POSTs go to http://host:PORT/hook/set-status
- Health probe at http://host:PORT/health
- One port = one Traefik label, one cert, one deploy route. Matches
the Coolify/VPS operational constraints.
senderPubkey on push:
- drainForMember now joins mesh.message_queue → mesh.member to return
the sender's peerPubkey alongside each envelope. No extra round-trip,
no cache invalidation needed (option A from review).
- index.ts populates WSPushMessage.senderPubkey from the join result
instead of the empty-string placeholder.
- Receivers can now identify who sent a message directly from the push.
README updated with a routes table for the single-port layout.
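The single-port layout amounts to one dispatch decision per incoming request. The sketch below models only that decision as a pure function; the function and type names are hypothetical. In a real server the "ws-upgrade" branch is where the `ws` server created with noServer:true would take over the socket from the http server's 'upgrade' event.

```typescript
// Hypothetical sketch: models the routes table, not the actual index.ts.
type Route = "ws-upgrade" | "hook-set-status" | "health" | "not-found";

function classifyRequest(method: string, rawUrl: string, isUpgrade: boolean): Route {
  // Request URLs are path-only, so resolve against a dummy base to parse them.
  const { pathname } = new URL(rawUrl, "http://localhost");
  if (isUpgrade) {
    // Upgrade requests are only honoured on /ws.
    return pathname === "/ws" ? "ws-upgrade" : "not-found";
  }
  if (method === "POST" && pathname === "/hook/set-status") return "hook-set-status";
  if (method === "GET" && pathname === "/health") return "health";
  return "not-found";
}
```

Keeping the decision separate from the socket handling makes the routes table testable without opening a port.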
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- @claudemesh/broker package with bun dev/start scripts
- src/index.ts stub: WS server on BROKER_PORT, SIGTERM cleanup
- src/env.ts: Zod-validated env (BROKER_PORT, DATABASE_URL, STATUS_TTL_SECONDS, HOOK_FRESH_WINDOW_SECONDS)
- src/db.ts: re-exports Drizzle client from @turbostarter/db
- src/broker.ts + src/types.ts: placeholders for step 8 port
- README documents run commands, env vars, deploy targets
- tsconfig extends @turbostarter/tsconfig base
- eslint.config.js extends @turbostarter/eslint-config/base
Dependencies declared but not installed yet (ws, drizzle-orm, zod,
libsodium-wrappers + workspace deps). turbo.json unchanged: the global
dev task already has persistent=true + cache=false which is what the
broker needs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- pgSchema "mesh" with 4 tables isolating the peer mesh domain
- Enums: visibility, transport, tier, role
- audit_log is metadata-only (E2E encryption enforced at broker/client)
- Cascade on mesh delete, soft-delete via archivedAt/revokedAt
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>