Document v3.1.2 conventions: dedup scoping and sentinel values
Two micro-risk mitigations documented:
1. dedup_group_id: Format "{business_id}:{hash}" to prevent
cross-tenant collision on similar reviews.
2. Sentinel conventions: 'ALL' (spatial) vs 'all' (semantic).
Case matters — do not normalize.
Spec frozen as v3.1.2.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
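Illustrative note (not part of the commit): the two conventions above could be enforced in application code roughly as sketched below. The helper names, the constant names, and the choice of a truncated SHA-256 digest are all assumptions made for illustration. The spec only fixes the "{business_id}:{hash}" shape and the exact sentinel strings.

```python
import hashlib

# Sentinel values from the spec. Case is significant, so compare exactly.
PLACE_ALL = "ALL"    # spatial rollup: all locations
SUBJECT_ALL = "all"  # semantic rollup: all subjects within a type


def make_dedup_group_id(business_id: str, normalized_text: str) -> str:
    """Build a tenant-scoped id in the "{business_id}:{hash}" format.

    Assumption: the hash is SHA-256 over the review text, truncated to
    16 hex characters. The spec does not say which hash is used.
    """
    digest = hashlib.sha256(normalized_text.encode("utf-8")).hexdigest()[:16]
    return f"{business_id}:{digest}"


def is_place_rollup(place_id: str) -> bool:
    # No .upper()/.lower() here: 'all' must not match the spatial sentinel.
    return place_id == PLACE_ALL


def is_subject_rollup(subject_id: str) -> bool:
    # Likewise, 'ALL' must not match the semantic sentinel.
    return subject_id == SUBJECT_ALL
```

With this sketch, identical review text under two different business ids shares the hash part but yields different group ids, so near-duplicate detection cannot group reviews across tenants.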
@@ -233,7 +233,7 @@ CREATE TABLE reviews_enriched (
 
 -- Quality control
 trust_score FLOAT DEFAULT 1.0, -- 0.0 to 1.0
-dedup_group_id TEXT, -- Groups duplicate/near-duplicate reviews
+dedup_group_id TEXT, -- Tenant-scoped: format "{business_id}:{hash}"
 is_suspicious BOOLEAN DEFAULT FALSE,
 
 -- Processing metadata
@@ -405,7 +405,11 @@ CREATE INDEX idx_events_review ON issue_events(source, review_id, review_version
 
 ### 2.4 Unified Analytics Spine
 
-**Design Decision**: `place_id = 'ALL'` is the sentinel for "all locations" rollups. This avoids NULL handling complexity while keeping the schema simple.
+**Design Decision**: Sentinel value conventions (do not normalize):
+- `place_id = 'ALL'` — spatial rollup (all locations)
+- `subject_id = 'all'` — semantic rollup (all subjects within type)
+
+Case matters: `'ALL'` ≠ `'all'`. This avoids NULL handling while keeping the schema simple.
 
 ```sql
 -- Fact table: pre-aggregated time-series metrics