New artifacts:
- ReviewIQ-Pipeline-DevGuide.md: Entry point for pipeline work
- ReviewIQ-Pipeline-Contracts-v1.md: Stage I/O specs, validation rules, test fixtures
- ReviewIQ-Pipeline-Checklist.md: Per-stage implementation checklists
- ReviewIQ-Codebase-Overview.md: File structure, integration points
- ReviewIQ-v3.2.1-Taxonomy-Versioning.md: Taxonomy versioning addendum

Updated:
- ReviewIQ-v32-Decisions.md: Added B2 audit findings, taxonomy versioning decisions, pipeline status

These artifacts enable parallel development of pipeline stages 1-4 with:
- Independent validation (35 rules across stages)
- Clear input/output contracts
- Test fixtures for each stage
- Definition of done criteria

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ReviewIQ v3.2 Design Decisions
Fast context-recovery document — all key decisions without the full spec.
1. Markpoint
ID: reviewiq-v32-span-layer-2026-01-24-004
Status: Pipeline contracts defined, ready for parallel implementation
Based on: v3.2 (commit 43fd151)
START HERE: ReviewIQ-Pipeline-DevGuide.md (for pipeline implementation)
Key Documents:
- ReviewIQ-Pipeline-DevGuide.md (entry point for pipeline work)
- ReviewIQ-Codebase-Overview.md (file structure, what exists)
- ReviewIQ-Pipeline-Contracts-v1.md (stage I/O contracts, validation)
- ReviewIQ-Pipeline-Checklist.md (implementation checklist)
- ReviewIQ-v3.2.1-Taxonomy-Versioning.md (taxonomy versioning spec)
2. Core Design Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Span granularity | Clause/topic-level | Preserves multi-domain signal |
| span_id format | ULID (TEXT) | Survives re-segmentation |
| Span offsets | Required (NOT NULL) | Deterministic reconstruction |
| Offsets reference | reviews_enriched.text | Not text_normalized |
| Span → Issue mapping | One-to-one (UNIQUE span_id) | Atomic unit per issue |
| Primary span enforcement | Partial unique index | Exactly one per review version |
| Primary selection | Intensity (I3 > I2 > I1), then valence (V- > V± > V0 > V+), then span_index | Deterministic, stable |
| Reprocessing strategy | Soft-switch with is_active | No transient empty states |
| Span overlap | GiST exclusion constraint | Non-overlapping ranges enforced |
| Secondary codes | Array with cardinality ≤ 2 | Could normalize to link table later |
| Causal chain storage | JSONB | Flexibility, normalize later if needed |
| relation_type vs causal_chain | Separate concerns | relation = within-review, causal = root cause |
| Dimension columns | Postgres ENUMs | Type safety, storage efficiency |
| Trust score floor | 0.2 (GREATEST clamp) | Prevent multiplicative collapse |
| Issue routing key | (business_id, place_id, urt_primary, entity_normalized) | Deterministic, entity-aware |
| Issue ID generation | SHA256 via pgcrypto | Deterministic, collision-resistant |
| Text validation trigger | Conditional via session setting | Performance: skip in bulk loads |
| Relation validation | Application-level post-insert | Handles insertion order |
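The non-overlap rule in the table is enforced in the database by the GiST exclusion constraint, but the same check is handy in application code before an insert. The sketch below is illustrative only: it assumes spans are half-open `[span_start, span_end)` character ranges (the source does not state the range bounds), and `find_overlaps` is a hypothetical helper name.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (span_start, span_end), assumed half-open


def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return overlapping neighbours after sorting by start offset.

    An empty result means the whole set is non-overlapping: if any two
    spans overlap, at least one adjacent pair in sorted order overlaps too.
    """
    ordered = sorted(spans)
    overlaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        # With half-open ranges, touching spans (end == next start) are fine.
        if cur[0] < prev[1]:
            overlaps.append((prev, cur))
    return overlaps
```

Running this on the active spans of one review before writing them gives a readable error earlier than the constraint violation would.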
3. Extensions Required
| Extension | Purpose |
|---|---|
| `btree_gist` | Exclusion constraint for non-overlapping spans |
| `pgcrypto` | SHA256-based issue ID generation |
4. New Tables
| Table | Purpose |
|---|---|
| `review_spans` | Span-level URT classification |
| `review_span_secondary_codes` | (Optional) Normalized secondary codes |
5. Modified Tables
| Table | Changes |
|---|---|
| `issue_spans` | Added span_id FK (NOT NULL), removed direct review FK as canonical |
6. New ENUM Types
Valence & Intensity:
- `urt_valence` — V-, V±, V0, V+
- `urt_intensity` — I1, I2, I3
Specificity & Actionability:
- `urt_specificity` — S1, S2, S3
- `urt_actionability` — A1, A2, A3
Context & Evidence:
- `urt_temporal` — TC (current), TR (recent), TH (historical), TF (future)
- `urt_evidence` — ES (stated), EI (inferred), EC (contextual)
- `urt_comparative` — CR-N (none), CR-B (better), CR-W (worse), CR-S (same)
Classification:
- `urt_profile` — lite, core, standard, full
- `urt_confidence` — low, medium, high
- `urt_relation` — cause_of, effect_of, contrast, resolution
- `urt_entity_type` — location, staff, product, process, time, other
7. Key Functions
| Function | Purpose |
|---|---|
| `urt_validate_causal_chain()` | Validates causal JSONB structure |
| `validate_review_relations()` | Ensures related_span_id same-parent |
| `validate_active_spans()` | Ensures valid active span set |
| `set_primary_span()` | Deterministic primary selection |
| `generate_issue_id()` | SHA256-based issue ID |
8. Key Triggers
| Trigger | Purpose |
|---|---|
| `review_spans_validate_bounds` | span_end ≤ text length |
| `review_spans_validate_text` | span_text matches substring |
| `review_spans_validate_causal_chain` | causal_chain JSONB valid |
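The bounds and text triggers can be mirrored in application code so that a pipeline stage rejects bad spans before they reach the database. This is a minimal sketch, not the trigger logic itself: it assumes 0-based, half-open offsets into `reviews_enriched.text` (the actual convention is not stated here, and Postgres `substring` is 1-based), and `validate_span` is a hypothetical name.

```python
from typing import List


def validate_span(review_text: str, span_start: int, span_end: int,
                  span_text: str) -> List[str]:
    """Return a list of violations; an empty list means the span is valid."""
    errors = []
    if not (0 <= span_start < span_end):
        errors.append("span_start must be >= 0 and < span_end")
    if span_end > len(review_text):
        errors.append("span_end exceeds text length")
    # Only compare text when the offsets themselves are usable.
    if not errors and review_text[span_start:span_end] != span_text:
        errors.append("span_text does not match substring")
    return errors
```

For example, `validate_span("Great food but slow service", 0, 10, "Great food")` returns an empty list, because the first ten characters of the review are exactly the span text.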
9. USN Format
Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full: URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
Examples:
- `URT:S:SVC.SPD:V-I3:S3A3T2.E2.CR1` — Specific service speed complaint
- `URT:F:PRD.QUA:V+I2:S2A1T1.E3.CR2:staff→training` — Product quality praise with causal chain
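To make the field layout concrete, here is a small parsing sketch for the two USN forms shown above. It is an illustration under assumptions: `parse_usn` and the dictionary keys are hypothetical, the `{codes}` and `{S}{A}{T}` groups are kept as raw strings because their internal grammar is not specified here, and only colon and dot positions are validated.

```python
from typing import Dict, Optional


def parse_usn(usn: str) -> Dict[str, Optional[str]]:
    """Split a USN string into its colon- and dot-delimited fields."""
    # At most 5 splits so a causal chain containing ':' stays in one piece.
    parts = usn.split(":", 5)
    if len(parts) < 5 or parts[0] != "URT":
        raise ValueError(f"not a USN string: {usn!r}")
    profile, codes, valence_intensity, dims = parts[1:5]
    causal = parts[5] if len(parts) == 6 else None

    # {V}{I}: the intensity always starts at the last 'I' (e.g. 'V-I3').
    cut = valence_intensity.rindex("I")
    dim_parts = dims.split(".")
    if len(dim_parts) != 3:
        raise ValueError(f"expected {{S}}{{A}}{{T}}.{{E}}.{{CR}}, got {dims!r}")

    return {
        "profile": profile,          # 'S' (standard) or 'F' (full)
        "codes": codes,              # raw, e.g. 'SVC.SPD'
        "valence": valence_intensity[:cut],
        "intensity": valence_intensity[cut:],
        "sat": dim_parts[0],         # raw {S}{A}{T} group
        "evidence": dim_parts[1],
        "comparative": dim_parts[2],
        "causal": causal,            # present only in the full form
    }
```

Parsing the first example yields valence `V-` and intensity `I3`; the second additionally yields the causal field `staff→training`.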
10. Span Boundary Rules
- Split on contrasting conjunctions — "but", "however", "although"
- Split on topic/target change — Different entity or aspect
- Split on valence change — Positive → Negative or vice versa
- Split on domain change — SVC → PRD → AMB
- Keep cause→effect together — Causal chain stays in one span
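Only the first rule is purely lexical, so it is the easiest one to sketch. The example below splits on the three listed conjunctions and returns offsets that reconstruct each span from the original text. It is a rough illustration: `split_on_contrast` is a hypothetical helper, and the topic, valence, domain, and cause→effect rules need a classifier that this sketch does not attempt.

```python
import re
from typing import List, Tuple

# The contrasting conjunctions named in the first rule.
CONTRAST = re.compile(r"\b(?:but|however|although)\b", re.IGNORECASE)
TRIM = " ,;."  # separators dropped from span edges


def split_on_contrast(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, span_text) triples with text[start:end] == span_text."""
    cuts = [0] + [m.start() for m in CONTRAST.finditer(text)] + [len(text)]
    spans = []
    for start, end in zip(cuts, cuts[1:]):
        chunk = text[start:end]
        stripped = chunk.strip(TRIM)
        if not stripped:
            continue  # e.g. the empty chunk before a leading "Although"
        # Shift the start past any trimmed leading separators.
        lead = len(chunk) - len(chunk.lstrip(TRIM))
        span_start = start + lead
        spans.append((span_start, span_start + len(stripped), stripped))
    return spans
```

Because every triple satisfies `text[start:end] == span_text`, the output is compatible with required, non-overlapping offsets.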
11. Deferred to v3.3+
| Item | Reason |
|---|---|
| Entity extraction implementation | Requires NER pipeline |
| Trust-weighted fact aggregation | Needs more span data |
| Secondary domain enforcement | App-level validation sufficient |
| Span-based fact counting | Currently review-based, optimize later |
12. Open Questions Resolved
| Question | Resolution |
|---|---|
| Span → Issue cardinality? | One-to-one (not many-to-many) |
| Offsets nullable for LLM-inferred? | No — required, NOT NULL |
| Reprocessing strategy? | Soft-switch with is_active flag |
| TEXT vs ENUM for dimensions? | ENUMs — committed to Postgres |
| Taxonomy evolution tracking? | Yes — versioned codes with explicit mappings (v3.2.1) |
| B2 schema vs v3.2 divergence? | Documented — B2 is canonical URT, v3.2 is app layer |
| Taxonomy versioning? | Yes — taxonomy_version column on spans, versioned code tables |
13. B2 Schema Audit Findings
Audit Date: 2026-01-24
The B2-database-schema.sql (canonical URT v5.1) and ReviewIQ v3.2 spec have deliberate divergences:
| Aspect | B2 (URT v5.1) | v3.2 (ReviewIQ) | Resolution |
|---|---|---|---|
| Purpose | Source-agnostic taxonomy | Google Reviews app layer | Keep both |
| ID strategy | UUIDs + sequential | Deterministic SHA256 | v3.2 choice |
| Type safety | VARCHAR + CHECK | Postgres ENUMs | v3.2 choice |
| Span table | `spans` | `review_spans` | v3.2 naming |
| Offset columns | `char_start`/`char_end` | `span_start`/`span_end` | Document divergence |
| Tenant model | Single-tenant | Multi-tenant (business_id) | v3.2 requirement |
| Issue-span mapping | Many-to-many | One-to-one | v3.2 choice |
| Causal chain | Normalized table | JSONB column | v3.2 flexibility |
| Reprocessing | Not supported | Soft-switch pattern | v3.2 innovation |
Action Items:
- Import reference data (domains, categories, subcodes) from B2 INSERTs
- Seed `urt_codes`/`urt_codes_versioned` from B1-urt-codes.yaml
- Do NOT adopt B2 structure directly — v3.2 has specific app requirements
14. Taxonomy Versioning (v3.2.1)
| Decision | Choice | Rationale |
|---|---|---|
| Track taxonomy version | Required column on spans | Classifications only meaningful in version context |
| Version ID format | `v{major}.{minor}` | Human-readable, matches URT releases |
| Code FK strategy | Composite `(code, version_id)` | Prevents orphaned classifications |
| Cross-version mappings | Explicit mapping table | Enables normalized trend queries |
| Mapping direction | Forward only (old→new) | Simpler model, matches time flow |
| Default version | `'v5.1'` hardcoded | Safe baseline, explicit upgrade path |
| Fact table versioning | Per-row `taxonomy_version` | Enables version-specific aggregation |
Key Tables Added:
- `urt_taxonomy_versions` — Version registry with validity periods
- `urt_codes_versioned` — Full code definitions per version (SCD Type 2)
- `urt_code_mappings` — Cross-version translation rules
Key Functions Added:
- `translate_urt_code(code, from_version, to_version)` — Single code translation
- `get_code_lineage(code, version)` — Full historical lineage
- `detect_taxonomy_drift(from_version, to_version)` — Impact analysis
- `aggregate_spans_normalized(...)` — Version-normalized aggregation
Principle: Facts are immutable. A span classified as J1.01 in v5.1 stays that way forever. Translation is explicit and auditable.
See: .artifacts/ReviewIQ-v3.2.1-Taxonomy-Versioning.md
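The forward-only mapping model can be illustrated with an in-memory stand-in for `urt_code_mappings`. This is a sketch under assumptions, not the database function: the mapping rows, the versions `v5.2` and `v6.0`, and the code `J1.02` are invented for the example, and returning `None` for an untranslatable code is one possible convention.

```python
from typing import Dict, Optional, Tuple

CodeAtVersion = Tuple[str, str]  # (code, version_id)

# Hypothetical mapping rows, old -> new only.
MAPPINGS: Dict[CodeAtVersion, CodeAtVersion] = {
    ("J1.01", "v5.1"): ("J1.02", "v5.2"),
    ("J1.02", "v5.2"): ("J1.02", "v6.0"),
}


def translate_urt_code(code: str, from_version: str, to_version: str,
                       mappings: Dict[CodeAtVersion, CodeAtVersion] = MAPPINGS
                       ) -> Optional[str]:
    """Follow forward mappings until to_version is reached, else return None."""
    current = (code, from_version)
    seen = set()
    while current[1] != to_version:
        # Stop on a missing mapping or a cycle instead of guessing.
        if current in seen or current not in mappings:
            return None
        seen.add(current)
        current = mappings[current]
    return current[0]
```

The stored span keeps its original code and version; translation happens only at query time, which matches the immutability principle. A backward request such as `v5.2` to `v5.1` returns `None` because no reverse rows exist.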
15. Pipeline Implementation Status
Overall: ~55% Complete (as of 2026-01-24)
| Stage | Name | Status | Owner |
|---|---|---|---|
| 0 | Raw Ingestion | ✅ DONE | Scraper Team |
| 1 | Normalization | ❌ TODO | TBD |
| 2 | LLM Classification | ❌ TODO | TBD |
| 3 | Issue Routing | ❌ TODO | TBD |
| 4 | Fact Aggregation | ❌ TODO | TBD |
What's Working:
- Google Maps scraping (v1.0.0)
- Job orchestration & queuing
- Webhook delivery
- Frontend job management
- Real-time SSE streaming
What's Missing:
- Entire enrichment pipeline (Stages 1-4)
- LLM integration
- Span extraction
- Issue routing
- Analytics aggregation
Parallel Development: Each stage can be implemented independently using the contracts defined in:
- `ReviewIQ-Pipeline-Contracts-v1.md` — Full I/O specs, validation rules, test fixtures
- `ReviewIQ-Pipeline-Checklist.md` — Implementation checklist, definition of done
Estimated Effort to 100%: 6-8 weeks
Quick Reference
Primary Span Selection Algorithm
ORDER BY:
1. intensity DESC (I3 > I2 > I1)
2. valence ASC (V- > V± > V0 > V+)
3. span_index ASC (first wins ties)
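The ordering above can be expressed as a single sort key. The following sketch shows the idea in application code; it is not the body of `set_primary_span()`, and the dictionary-based span representation and the name `select_primary` are assumptions made for the example.

```python
from typing import Dict, List

# Lower rank wins; mirrors "intensity DESC" and "valence ASC" above.
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary(spans: List[Dict]) -> Dict:
    """Pick the primary span: strongest intensity, then most negative
    valence, then lowest span_index as the tie-breaker."""
    return min(
        spans,
        key=lambda s: (
            INTENSITY_RANK[s["intensity"]],
            VALENCE_RANK[s["valence"]],
            s["span_index"],
        ),
    )
```

Because `span_index` is unique within a review version, the key is a total order and the same input always yields the same primary span.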
Issue Routing Key
(business_id, place_id, urt_primary, entity_normalized)
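A deterministic issue ID can be derived by hashing the routing key. The sketch below shows the general technique with Python's `hashlib`; the field delimiter, UTF-8 encoding, and hex output are assumptions, so the digests it produces should not be expected to match those of the pgcrypto-based `generate_issue_id()`.

```python
import hashlib

# Assumed delimiter: the ASCII unit separator, unlikely to appear in any field.
_SEP = "\x1f"


def sketch_issue_id(business_id: str, place_id: str, urt_primary: str,
                    entity_normalized: str) -> str:
    """Hash the four routing-key fields into a 64-character hex digest."""
    key = _SEP.join([business_id, place_id, urt_primary, entity_normalized])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Joining with an explicit separator avoids ambiguous concatenations, where for instance `("ab", "c")` and `("a", "bc")` would otherwise hash to the same value.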
Trust Score Calculation
GREATEST(0.2, base_trust * modifiers) -- Floor prevents collapse
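The effect of the floor is easiest to see with numbers. In this sketch the modifiers are assumed to be plain multiplicative factors, and `trust_score` is a hypothetical helper mirroring the `GREATEST` expression.

```python
import math
from typing import Iterable

TRUST_FLOOR = 0.2  # floor value from the design decisions table


def trust_score(base_trust: float, modifiers: Iterable[float]) -> float:
    """Multiply the base trust by every modifier, clamped at the floor."""
    return max(TRUST_FLOOR, base_trust * math.prod(modifiers))
```

With a base of 0.9 and modifiers 0.5 and 0.3 the raw product is 0.135, which the floor raises to 0.2; without the clamp, a few small modifiers would drive the score toward zero.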
Last updated: 2026-01-24 (pipeline contracts + codebase overview)