Document v3.1.2 conventions: dedup scoping and sentinel values

Two micro-risk mitigations documented:

1. dedup_group_id: Format "{business_id}:{hash}" to prevent
   cross-tenant collision on similar reviews.

2. Sentinel conventions: 'ALL' (spatial) vs 'all' (semantic).
   Case matters — do not normalize.

Spec frozen as v3.1.2.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Alejandro Gutiérrez
2026-01-24 12:50:29 +00:00
parent 5ce3248efd
commit 3987a9ab4e


@@ -233,7 +233,7 @@ CREATE TABLE reviews_enriched (
-- Quality control
trust_score FLOAT DEFAULT 1.0, -- 0.0 to 1.0
-dedup_group_id TEXT, -- Groups duplicate/near-duplicate reviews
+dedup_group_id TEXT, -- Tenant-scoped: format "{business_id}:{hash}"
is_suspicious BOOLEAN DEFAULT FALSE,
-- Processing metadata
@@ -405,7 +405,11 @@ CREATE INDEX idx_events_review ON issue_events(source, review_id, review_version
### 2.4 Unified Analytics Spine
**Design Decision**: `place_id = 'ALL'` is the sentinel for "all locations" rollups. This avoids NULL handling complexity while keeping the schema simple.
**Design Decision**: Sentinel value conventions (do not normalize):
- `place_id = 'ALL'` — spatial rollup (all locations)
- `subject_id = 'all'` — semantic rollup (all subjects within type)
Case matters: `'ALL'` ≠ `'all'`. This avoids NULL handling while keeping the schema simple.
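
The sentinel convention above can be sketched in application code as exact, case-sensitive comparisons. This is a minimal illustration, not part of the spec: the constant and function names (`PLACE_ALL`, `SUBJECT_ALL`, `is_place_rollup`, `is_subject_rollup`) are hypothetical, and only the two literal values `'ALL'` and `'all'` come from the text.

```python
# Hypothetical constants; only the literal values come from the spec.
PLACE_ALL = "ALL"    # spatial rollup sentinel for place_id (all locations)
SUBJECT_ALL = "all"  # semantic rollup sentinel for subject_id (all subjects within type)


def is_place_rollup(place_id: str) -> bool:
    """True only for the exact spatial sentinel; no case folding."""
    return place_id == PLACE_ALL


def is_subject_rollup(subject_id: str) -> bool:
    """True only for the exact semantic sentinel; no case folding."""
    return subject_id == SUBJECT_ALL
```

Comparing without `.lower()` or `.upper()` keeps the two sentinels distinct, which is the point of "do not normalize": a lowercased `place_id` would no longer match the spatial sentinel.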
```sql
-- Fact table: pre-aggregated time-series metrics