# ReviewIQ v3.2 Design Decisions

> Fast context-recovery document — all key decisions without the full spec.

---

## 1. Markpoint

```
ID: reviewiq-v32-span-layer-2026-01-24-004
Status: Pipeline contracts defined, ready for parallel implementation
Based on: v3.2 (commit 43fd151)

START HERE: ReviewIQ-Pipeline-DevGuide.md (for pipeline implementation)

Key Documents:
- ReviewIQ-Pipeline-DevGuide.md (entry point for pipeline work)
- ReviewIQ-Codebase-Overview.md (file structure, what exists)
- ReviewIQ-Pipeline-Contracts-v1.md (stage I/O contracts, validation)
- ReviewIQ-Pipeline-Checklist.md (implementation checklist)
- ReviewIQ-v3.2.1-Taxonomy-Versioning.md (taxonomy versioning spec)
```

---

## 2. Core Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Span granularity | Clause/topic-level | Preserves multi-domain signal |
| span_id format | ULID (TEXT) | Survives re-segmentation |
| Span offsets | Required (NOT NULL) | Deterministic reconstruction |
| Offsets reference | reviews_enriched.text | Not text_normalized |
| Span → Issue mapping | One-to-one (UNIQUE span_id) | Atomic unit per issue |
| Primary span enforcement | Partial unique index | Exactly one per review version |
| Primary selection | Intensity (I3 > I2 > I1), then valence (V- > V± > V0 > V+), then span_index | Deterministic, stable |
| Reprocessing strategy | Soft-switch with is_active | No transient empty states |
| Span overlap | GiST exclusion constraint | Non-overlapping ranges enforced |
| Secondary codes | Array with cardinality ≤ 2 | Could be normalized to a link table later |
| Causal chain storage | JSONB | Flexibility, normalize later if needed |
| relation_type vs causal_chain | Separate concerns | relation = within-review, causal = root cause |
| Dimension columns | Postgres ENUMs | Type safety, storage efficiency |
| Trust score floor | 0.2 (GREATEST clamp) | Prevents multiplicative collapse |
| Issue routing key | (business_id, place_id, urt_primary, entity_normalized) | Deterministic, entity-aware |
| Issue ID generation | SHA256 via pgcrypto | Deterministic, collision-resistant |
| Text validation trigger | Conditional via session setting | Performance: skip in bulk loads |
| Relation validation | Application-level post-insert | Handles insertion order |

---

## 3. Extensions Required

| Extension | Purpose |
|-----------|---------|
| `btree_gist` | Exclusion constraint for non-overlapping spans |
| `pgcrypto` | SHA256-based issue ID generation |

---

## 4. New Tables

| Table | Purpose |
|-------|---------|
| `review_spans` | Span-level URT classification |
| `review_span_secondary_codes` | (Optional) Normalized secondary codes |

---

## 5. Modified Tables

| Table | Changes |
|-------|---------|
| `issue_spans` | Added `span_id` FK (NOT NULL); the direct review FK is no longer the canonical link |

---

## 6. New ENUM Types

**Valence & Intensity:**
- `urt_valence` — V-, V±, V0, V+
- `urt_intensity` — I1, I2, I3

**Specificity & Actionability:**
- `urt_specificity` — S1, S2, S3
- `urt_actionability` — A1, A2, A3

**Context & Evidence:**
- `urt_temporal` — TC (current), TR (recent), TH (historical), TF (future)
- `urt_evidence` — ES (stated), EI (inferred), EC (contextual)
- `urt_comparative` — CR-N (none), CR-B (better), CR-W (worse), CR-S (same)

**Classification:**
- `urt_profile` — lite, core, standard, full
- `urt_confidence` — low, medium, high
- `urt_relation` — cause_of, effect_of, contrast, resolution
- `urt_entity_type` — location, staff, product, process, time, other

---

## 7. Key Functions

| Function | Purpose |
|----------|---------|
| `urt_validate_causal_chain()` | Validates causal JSONB structure |
| `validate_review_relations()` | Ensures `related_span_id` points to a span of the same parent review |
| `validate_active_spans()` | Ensures valid active span set |
| `set_primary_span()` | Deterministic primary selection |
| `generate_issue_id()` | SHA256-based issue ID |

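
The following is a minimal Python sketch of the deterministic primary selection that `set_primary_span()` performs. It can help test pipeline output before that output reaches the database. The sketch assumes the ENUM ranks follow the declaration order listed in section 6. `SpanRow` and `pick_primary` are hypothetical names, not part of the schema.

```python
from dataclasses import dataclass
from typing import List

# Ranks assume the ENUMs are declared in the order listed in section 6.
INTENSITY_RANK = {"I1": 1, "I2": 2, "I3": 3}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


@dataclass
class SpanRow:
    """Hypothetical in-memory view of the columns the ranking needs."""
    span_index: int
    intensity: str
    valence: str


def pick_primary(spans: List[SpanRow]) -> SpanRow:
    """Return the primary span: highest intensity, then most negative
    valence, then lowest span_index as the final tie-breaker."""
    if not spans:
        raise ValueError("a review version needs at least one span")
    return min(
        spans,
        key=lambda s: (-INTENSITY_RANK[s.intensity],
                       VALENCE_RANK[s.valence],
                       s.span_index),
    )
```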

---

## 8. Key Triggers

| Trigger | Purpose |
|---------|---------|
| `review_spans_validate_bounds` | span_end ≤ text length |
| `review_spans_validate_text` | span_text matches the substring of `reviews_enriched.text` at the span offsets |
| `review_spans_validate_causal_chain` | causal_chain JSONB passes structural validation |

---

## 9. USN Format

```
Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
```

**Examples:**
- `URT:S:SVC.SPD:V-I3:S3A3T2.E2.CR1` — Specific service speed complaint
- `URT:F:PRD.QUA:V+I2:S2A1T1.E3.CR2:staff→training` — Product quality praise with causal chain

---

## 10. Span Boundary Rules

1. **Split on contrasting conjunctions** — "but", "however", "although"
2. **Split on topic/target change** — Different entity or aspect
3. **Split on valence change** — Positive → Negative or vice versa
4. **Split on domain change** — SVC → PRD → AMB
5. **Keep cause→effect together** — Causal chain stays in one span

---

## 11. Deferred to v3.3+

| Item | Reason |
|------|--------|
| Entity extraction implementation | Requires NER pipeline |
| Trust-weighted fact aggregation | Needs more span data |
| Secondary domain enforcement | App-level validation sufficient |
| Span-based fact counting | Currently review-based, optimize later |

---

## 12. Open Questions Resolved

| Question | Resolution |
|----------|------------|
| Span → Issue cardinality? | **One-to-one** (not many-to-many) |
| Offsets nullable for LLM-inferred? | **No** — required, NOT NULL |
| Reprocessing strategy? | **Soft-switch** with is_active flag |
| TEXT vs ENUM for dimensions? | **ENUMs** — committed to Postgres |
| Taxonomy evolution tracking? | **Yes** — versioned codes with explicit mappings (v3.2.1) |
| B2 schema vs v3.2 divergence? | **Documented** — B2 is canonical URT, v3.2 is app layer |
| Taxonomy versioning? | **Yes** — `taxonomy_version` column on spans, versioned code tables |

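
The soft-switch resolution in the table above can be sketched in memory. A reprocessed span set is staged with `is_active = false`, validated, and only then swapped in, so a review never passes through a state with no active spans. This is an illustrative model only. The `Span` dataclass, its `version` field, the `activate` helper, and the exactly-one-primary check (standing in for the partial unique index) are assumptions. In Postgres, the flip would presumably run inside a single transaction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    review_id: str
    version: int          # hypothetical reprocessing generation counter
    span_index: int
    is_primary: bool
    is_active: bool = False


def activate(spans: List[Span], review_id: str, new_version: int) -> None:
    """Switch the active span set of one review to new_version.

    All checks run before any flag changes, so a failed check leaves the
    previously active set untouched (no transient empty state).
    """
    staged = [s for s in spans
              if s.review_id == review_id and s.version == new_version]
    if not staged:
        raise ValueError("no staged spans for the new version")
    if sum(s.is_primary for s in staged) != 1:
        raise ValueError("staged set must contain exactly one primary span")
    # Single pass: new version becomes active, every older version inactive.
    for s in spans:
        if s.review_id == review_id:
            s.is_active = s.version == new_version
```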

---

## 13. B2 Schema Audit Findings

**Audit Date**: 2026-01-24

`B2-database-schema.sql` (canonical URT v5.1) and the ReviewIQ v3.2 spec diverge deliberately:

| Aspect | B2 (URT v5.1) | v3.2 (ReviewIQ) | Resolution |
|--------|---------------|-----------------|------------|
| Purpose | Source-agnostic taxonomy | Google Reviews app layer | Keep both |
| ID strategy | UUIDs + sequential | Deterministic SHA256 | v3.2 choice |
| Type safety | VARCHAR + CHECK | Postgres ENUMs | v3.2 choice |
| Span table | `spans` | `review_spans` | v3.2 naming |
| Offset columns | `char_start/char_end` | `span_start/span_end` | Document divergence |
| Tenant model | Single-tenant | Multi-tenant (business_id) | v3.2 requirement |
| Issue-span mapping | Many-to-many | One-to-one | v3.2 choice |
| Causal chain | Normalized table | JSONB column | v3.2 flexibility |
| Reprocessing | Not supported | Soft-switch pattern | v3.2 innovation |

**Action Items**:
1. Import reference data (domains, categories, subcodes) from B2 INSERTs
2. Seed `urt_codes` / `urt_codes_versioned` from B1-urt-codes.yaml
3. Do NOT adopt B2 structure directly — v3.2 has specific app requirements

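
The ID strategy row above shows the benefit of the deterministic approach. Hashing the issue routing key means the same key always yields the same issue ID, with no sequence or lookup involved. Below is a minimal Python sketch using `hashlib.sha256`. Several details are assumptions:

- the function name `issue_id`
- the unit-separator delimiter between fields
- the hex output
- encoding a missing `entity_normalized` as an empty string

The real `generate_issue_id()` runs in Postgres via pgcrypto and may serialize the key differently.

```python
import hashlib
from typing import Optional

# Hypothetical field separator (ASCII unit separator). A delimiter keeps
# ("ab", "c") and ("a", "bc") from hashing to the same payload.
_SEP = "\x1f"


def issue_id(business_id: str, place_id: str, urt_primary: str,
             entity_normalized: Optional[str]) -> str:
    """Deterministic issue ID over the routing key (illustrative sketch).

    Assumption: a missing entity is encoded as the empty string.
    """
    parts = [business_id, place_id, urt_primary, entity_normalized or ""]
    payload = _SEP.join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```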

---

## 14. Taxonomy Versioning (v3.2.1)

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Track taxonomy version | Required column on spans | Classifications are only meaningful in a version context |
| Version ID format | `v{major}.{minor}` | Human-readable, matches URT releases |
| Code FK strategy | Composite `(code, version_id)` | Prevents orphaned classifications |
| Cross-version mappings | Explicit mapping table | Enables normalized trend queries |
| Mapping direction | Forward only (old→new) | Simpler model, matches time flow |
| Default version | `'v5.1'` hardcoded | Safe baseline, explicit upgrade path |
| Fact table versioning | Per-row `taxonomy_version` | Enables version-specific aggregation |

**Key Tables Added**:
- `urt_taxonomy_versions` — Version registry with validity periods
- `urt_codes_versioned` — Full code definitions per version (SCD Type 2)
- `urt_code_mappings` — Cross-version translation rules

**Key Functions Added**:
- `translate_urt_code(code, from_version, to_version)` — Single code translation
- `get_code_lineage(code, version)` — Full historical lineage
- `detect_taxonomy_drift(from_version, to_version)` — Impact analysis
- `aggregate_spans_normalized(...)` — Version-normalized aggregation

**Principle**: Facts are immutable. A span classified as `J1.01` in v5.1 stays that way forever. Translation is explicit and auditable.

See: `.artifacts/ReviewIQ-v3.2.1-Taxonomy-Versioning.md`

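
Because mappings are forward only, translating a code across several releases amounts to chaining one-step mappings in version order. A minimal sketch of that idea follows. Several details are hypothetical:

- the version list after `v5.1`
- the target codes `J1.02` and `J4.01`
- the one-to-one mapping shape
- the rule that an unmapped code carries over unchanged

`translate_code` is not the actual `translate_urt_code()`. The stored span keeps its original code. Translation only happens when a query asks for it, in line with the immutability principle above.

```python
from typing import Dict, List, Tuple

# Hypothetical version registry, oldest first.
VERSIONS: List[str] = ["v5.1", "v5.2", "v6.0"]

# Hypothetical forward-only, one-to-one mappings:
# (code, from_version) -> code in the next version.
MAPPINGS: Dict[Tuple[str, str], str] = {
    ("J1.01", "v5.1"): "J1.02",
    ("J1.02", "v5.2"): "J4.01",
}


def translate_code(code: str, from_version: str, to_version: str) -> str:
    """Translate a code forward by chaining single-step mappings.

    Assumption: a code with no mapping entry for a step is unchanged
    in the next version.
    """
    start = VERSIONS.index(from_version)
    end = VERSIONS.index(to_version)
    if end < start:
        raise ValueError("mappings are forward only (old -> new)")
    for version in VERSIONS[start:end]:
        code = MAPPINGS.get((code, version), code)
    return code
```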

---

## 15. Pipeline Implementation Status

**Overall: ~55% Complete** (as of 2026-01-24)

| Stage | Name | Status | Owner |
|-------|------|--------|-------|
| 0 | Raw Ingestion | ✅ DONE | Scraper Team |
| 1 | Normalization | ❌ TODO | TBD |
| 2 | LLM Classification | ❌ TODO | TBD |
| 3 | Issue Routing | ❌ TODO | TBD |
| 4 | Fact Aggregation | ❌ TODO | TBD |

**What's Working**:
- Google Maps scraping (v1.0.0)
- Job orchestration & queuing
- Webhook delivery
- Frontend job management
- Real-time SSE streaming

**What's Missing**:
- Entire enrichment pipeline (Stages 1-4)
- LLM integration
- Span extraction
- Issue routing
- Analytics aggregation

**Parallel Development**: Each stage can be implemented independently using the contracts defined in:
- `ReviewIQ-Pipeline-Contracts-v1.md` — Full I/O specs, validation rules, test fixtures
- `ReviewIQ-Pipeline-Checklist.md` — Implementation checklist, definition of done

**Estimated Effort to 100%**: 6-8 weeks

---

## Quick Reference

### Primary Span Selection Algorithm

```sql
-- Relies on Postgres ENUM ordering following declaration order (section 6):
-- urt_intensity I1 < I2 < I3, urt_valence V- < V± < V0 < V+
ORDER BY
  intensity  DESC,  -- I3 > I2 > I1
  valence    ASC,   -- V- > V± > V0 > V+
  span_index ASC    -- lowest span_index wins ties
```

### Issue Routing Key

```sql
(business_id, place_id, urt_primary, entity_normalized)
```

### Trust Score Calculation

```sql
GREATEST(0.2, base_trust * modifiers)  -- Floor prevents collapse
```

---

*Last updated: 2026-01-24 (pipeline contracts + codebase overview)*