whyrating-engine-legacy/.artifacts/ReviewIQ-v32-Decisions.md
Alejandro Gutiérrez acd3b22e88 docs: Add pipeline development artifacts for parallel implementation
New artifacts:
- ReviewIQ-Pipeline-DevGuide.md: Entry point for pipeline work
- ReviewIQ-Pipeline-Contracts-v1.md: Stage I/O specs, validation rules, test fixtures
- ReviewIQ-Pipeline-Checklist.md: Per-stage implementation checklists
- ReviewIQ-Codebase-Overview.md: File structure, integration points
- ReviewIQ-v3.2.1-Taxonomy-Versioning.md: Taxonomy versioning addendum

Updated:
- ReviewIQ-v32-Decisions.md: Added B2 audit findings, taxonomy versioning decisions, pipeline status

These artifacts enable parallel development of pipeline stages 1-4 with:
- Independent validation (35 rules across stages)
- Clear input/output contracts
- Test fixtures for each stage
- Definition of done criteria

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 17:08:40 +00:00


ReviewIQ v3.2 Design Decisions

Fast context-recovery document — all key decisions without the full spec.


1. Markpoint

ID:       reviewiq-v32-span-layer-2026-01-24-004
Status:   Pipeline contracts defined, ready for parallel implementation
Based on: v3.2 (commit 43fd151)

START HERE: ReviewIQ-Pipeline-DevGuide.md (for pipeline implementation)

Key Documents:
  - ReviewIQ-Pipeline-DevGuide.md      (entry point for pipeline work)
  - ReviewIQ-Codebase-Overview.md      (file structure, what exists)
  - ReviewIQ-Pipeline-Contracts-v1.md  (stage I/O contracts, validation)
  - ReviewIQ-Pipeline-Checklist.md     (implementation checklist)
  - ReviewIQ-v3.2.1-Taxonomy-Versioning.md (taxonomy versioning spec)

2. Core Design Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Span granularity | Clause/topic-level | Preserves multi-domain signal |
| span_id format | ULID (TEXT) | Survives re-segmentation |
| Span offsets | Required (NOT NULL) | Deterministic reconstruction |
| Offsets reference | reviews_enriched.text | Not text_normalized |
| Span → Issue mapping | One-to-one (UNIQUE span_id) | Atomic unit per issue |
| Primary span enforcement | Partial unique index | Exactly one per review version |
| Primary selection | I3 > I2 > I1, V- > V± > V0 > V+, span_index | Deterministic, stable |
| Reprocessing strategy | Soft-switch with is_active | No transient empty states |
| Span overlap | GiST exclusion constraint | Non-overlapping ranges enforced |
| Secondary codes | Array with cardinality ≤ 2 | Could normalize to link table later |
| Causal chain storage | JSONB | Flexibility, normalize later if needed |
| relation_type vs causal_chain | Separate concerns | relation = within-review, causal = root cause |
| Dimension columns | Postgres ENUMs | Type safety, storage efficiency |
| Trust score floor | 0.2 (GREATEST clamp) | Prevent multiplicative collapse |
| Issue routing key | (business_id, place_id, urt_primary, entity_normalized) | Deterministic, entity-aware |
| Issue ID generation | SHA256 via pgcrypto | Deterministic, collision-resistant |
| Text validation trigger | Conditional via session setting | Performance: skip in bulk loads |
| Relation validation | Application-level post-insert | Handles insertion order |
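
The non-overlap rule above is enforced in the database by the GiST exclusion constraint. A pipeline stage may still want to check the same invariant before insert. The following is a minimal Python sketch, not the project's implementation: the function name `find_overlaps` and the half-open `[span_start, span_end)` convention are assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (span_start, span_end), assumed half-open


def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (earlier, later) pairs where a span starts before the
    furthest-reaching earlier span ends. An empty list means no overlap."""
    overlaps = []
    widest = None  # earlier span with the largest end offset seen so far
    for cur in sorted(spans):
        # With half-open ranges, touching spans (end == next start) are allowed.
        if widest is not None and cur[0] < widest[1]:
            overlaps.append((widest, cur))
        if widest is None or cur[1] > widest[1]:
            widest = cur
    return overlaps
```

An empty result corresponds to a span set that the exclusion constraint would accept, under the stated half-open assumption.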

3. Extensions Required

| Extension | Purpose |
| --- | --- |
| btree_gist | Exclusion constraint for non-overlapping spans |
| pgcrypto | SHA256-based issue ID generation |

4. New Tables

| Table | Purpose |
| --- | --- |
| review_spans | Span-level URT classification |
| review_span_secondary_codes | (Optional) Normalized secondary codes |

5. Modified Tables

| Table | Changes |
| --- | --- |
| issue_spans | Added span_id FK (NOT NULL); the direct review FK is no longer the canonical link |

6. New ENUM Types

Valence & Intensity:

  • urt_valence — V-, V±, V0, V+
  • urt_intensity — I1, I2, I3

Specificity & Actionability:

  • urt_specificity — S1, S2, S3
  • urt_actionability — A1, A2, A3

Context & Evidence:

  • urt_temporal — TC (current), TR (recent), TH (historical), TF (future)
  • urt_evidence — ES (stated), EI (inferred), EC (contextual)
  • urt_comparative — CR-N (none), CR-B (better), CR-W (worse), CR-S (same)

Classification:

  • urt_profile — lite, core, standard, full
  • urt_confidence — low, medium, high
  • urt_relation — cause_of, effect_of, contrast, resolution
  • urt_entity_type — location, staff, product, process, time, other

7. Key Functions

| Function | Purpose |
| --- | --- |
| urt_validate_causal_chain() | Validates causal JSONB structure |
| validate_review_relations() | Ensures related_span_id references a span under the same parent review |
| validate_active_spans() | Ensures valid active span set |
| set_primary_span() | Deterministic primary selection |
| generate_issue_id() | SHA256-based issue ID |

8. Key Triggers

| Trigger | Purpose |
| --- | --- |
| review_spans_validate_bounds | span_end ≤ text length |
| review_spans_validate_text | span_text matches substring |
| review_spans_validate_causal_chain | causal_chain JSONB valid |
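
The bounds and text triggers can be mirrored in application code so that a stage fails fast before touching the database. This is a minimal Python sketch under stated assumptions: offsets are 0-based and half-open (the spec does not say, and Postgres `substring` is 1-based), and `validate_span` is a hypothetical helper name.

```python
from typing import List


def validate_span(review_text: str, span_start: int, span_end: int,
                  span_text: str) -> List[str]:
    """Return a list of violations; an empty list means the span is valid.

    review_text stands for reviews_enriched.text, which the offsets reference.
    """
    errors = []
    if not (0 <= span_start < span_end):
        errors.append("span_start must be >= 0 and < span_end")
    if span_end > len(review_text):
        errors.append("span_end exceeds text length")
    # Only compare text when the offsets themselves are usable.
    if not errors and review_text[span_start:span_end] != span_text:
        errors.append("span_text does not match substring")
    return errors
```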

9. USN Format

Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}

Examples:

  • URT:S:SVC.SPD:V-I3:S3A3T2.E2.CR1 — Specific service speed complaint
  • URT:F:PRD.QUA:V+I2:S2A1T1.E3.CR2:staff→training — Product quality praise with causal chain
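
To make the two layouts concrete, here is a small Python sketch that assembles a USN string from its parts. `format_usn` is a hypothetical helper, not part of the spec. It passes dimension codes through exactly as given and assumes that a causal suffix implies the Full (`F`) layout.

```python
from typing import Optional


def format_usn(codes: str, valence: str, intensity: str, specificity: str,
               actionability: str, temporal: str, evidence: str,
               comparative: str, causal: Optional[str] = None) -> str:
    """Build URT:S:... or, when a causal chain is supplied, URT:F:...:{causal}."""
    layout = "F" if causal is not None else "S"
    usn = (f"URT:{layout}:{codes}:{valence}{intensity}:"
           f"{specificity}{actionability}{temporal}.{evidence}.{comparative}")
    if causal is not None:
        usn += f":{causal}"
    return usn


# Reproduces the first example above.
print(format_usn("SVC.SPD", "V-", "I3", "S3", "A3", "T2", "E2", "CR1"))
```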

10. Span Boundary Rules

  1. Split on contrasting conjunctions — "but", "however", "although"
  2. Split on topic/target change — Different entity or aspect
  3. Split on valence change — Positive → Negative or vice versa
  4. Split on domain change — SVC → PRD → AMB
  5. Keep cause→effect together — Causal chain stays in one span
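
Only rule 1 is purely lexical; rules 2 to 5 need classification output. As an illustration of rule 1 alone, the following Python sketch cuts a review before each contrasting conjunction and returns exact offsets into the original text. The function name, the choice to keep the conjunction with the following span, and the trimmed separator set are all assumptions.

```python
import re
from typing import List, Tuple

# Conjunction list taken from rule 1 above.
CONTRAST = re.compile(r"\b(?:but|however|although)\b", re.IGNORECASE)


def split_on_contrast(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, span_text) tuples, cutting before each conjunction."""
    cuts = ([0]
            + [m.start() for m in CONTRAST.finditer(text) if m.start() > 0]
            + [len(text)])
    spans = []
    for start, end in zip(cuts, cuts[1:]):
        chunk = text[start:end]
        # Trim surrounding separators while keeping offsets exact.
        lead = len(chunk) - len(chunk.lstrip(" ,;"))
        trail = len(chunk) - len(chunk.rstrip(" ,;"))
        if start + lead < end - trail:
            spans.append((start + lead, end - trail,
                          text[start + lead:end - trail]))
    return spans
```

Because each tuple carries offsets into the untouched input, `text[start:end] == span_text` holds for every result, which is the property the text validation trigger checks.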

11. Deferred to v3.3+

| Item | Reason |
| --- | --- |
| Entity extraction implementation | Requires NER pipeline |
| Trust-weighted fact aggregation | Needs more span data |
| Secondary domain enforcement | App-level validation sufficient |
| Span-based fact counting | Currently review-based, optimize later |

12. Open Questions Resolved

| Question | Resolution |
| --- | --- |
| Span → Issue cardinality? | One-to-one (not many-to-many) |
| Offsets nullable for LLM-inferred? | No: required, NOT NULL |
| Reprocessing strategy? | Soft-switch with is_active flag |
| TEXT vs ENUM for dimensions? | ENUMs: committed to Postgres |
| Taxonomy evolution tracking? | Yes: versioned codes with explicit mappings (v3.2.1) |
| B2 schema vs v3.2 divergence? | Documented: B2 is canonical URT, v3.2 is app layer |
| Taxonomy versioning? | Yes: taxonomy_version column on spans, versioned code tables |

13. B2 Schema Audit Findings

Audit Date: 2026-01-24

The B2-database-schema.sql (canonical URT v5.1) and ReviewIQ v3.2 spec have deliberate divergences:

| Aspect | B2 (URT v5.1) | v3.2 (ReviewIQ) | Resolution |
| --- | --- | --- | --- |
| Purpose | Source-agnostic taxonomy | Google Reviews app layer | Keep both |
| ID strategy | UUIDs + sequential | Deterministic SHA256 | v3.2 choice |
| Type safety | VARCHAR + CHECK | Postgres ENUMs | v3.2 choice |
| Span table | spans | review_spans | v3.2 naming |
| Offset columns | char_start/char_end | span_start/span_end | Document divergence |
| Tenant model | Single-tenant | Multi-tenant (business_id) | v3.2 requirement |
| Issue-span mapping | Many-to-many | One-to-one | v3.2 choice |
| Causal chain | Normalized table | JSONB column | v3.2 flexibility |
| Reprocessing | Not supported | Soft-switch pattern | v3.2 innovation |

Action Items:

  1. Import reference data (domains, categories, subcodes) from B2 INSERTs
  2. Seed urt_codes / urt_codes_versioned from B1-urt-codes.yaml
  3. Do NOT adopt B2 structure directly — v3.2 has specific app requirements

14. Taxonomy Versioning (v3.2.1)

| Decision | Choice | Rationale |
| --- | --- | --- |
| Track taxonomy version | Required column on spans | Classifications only meaningful in version context |
| Version ID format | v{major}.{minor} | Human-readable, matches URT releases |
| Code FK strategy | Composite (code, version_id) | Prevents orphaned classifications |
| Cross-version mappings | Explicit mapping table | Enables normalized trend queries |
| Mapping direction | Forward only (old→new) | Simpler model, matches time flow |
| Default version | 'v5.1' hardcoded | Safe baseline, explicit upgrade path |
| Fact table versioning | Per-row taxonomy_version | Enables version-specific aggregation |

Key Tables Added:

  • urt_taxonomy_versions — Version registry with validity periods
  • urt_codes_versioned — Full code definitions per version (SCD Type 2)
  • urt_code_mappings — Cross-version translation rules

Key Functions Added:

  • translate_urt_code(code, from_version, to_version) — Single code translation
  • get_code_lineage(code, version) — Full historical lineage
  • detect_taxonomy_drift(from_version, to_version) — Impact analysis
  • aggregate_spans_normalized(...) — Version-normalized aggregation

Principle: Facts are immutable. A span classified as J1.01 in v5.1 stays that way forever. Translation is explicit and auditable.
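
The forward-only mapping model can be illustrated with an in-memory lookup. This Python sketch mirrors the signature of translate_urt_code but is not the database function. The `v5.2` version and the `J1.02` code are invented for the example, and the sketch assumes every carried-over code has an explicit mapping row.

```python
from typing import Dict, Optional, Tuple

CodeAtVersion = Tuple[str, str]  # (code, version_id)

# Hypothetical forward-only mappings (old -> new). Values are illustrative.
MAPPINGS: Dict[CodeAtVersion, CodeAtVersion] = {
    ("J1.01", "v5.1"): ("J1.02", "v5.2"),
}


def translate_urt_code(code: str, from_version: str, to_version: str,
                       mappings: Dict[CodeAtVersion, CodeAtVersion] = MAPPINGS
                       ) -> Optional[str]:
    """Follow forward mappings until to_version is reached.

    Returns None when no forward path exists, including any attempt to
    translate backward. The stored classification is never modified.
    """
    current = (code, from_version)
    seen = set()
    while current[1] != to_version:
        if current in seen or current not in mappings:
            return None
        seen.add(current)
        current = mappings[current]
    return current[0]
```

Translation here is a read-time lookup, which matches the principle above: the span keeps its original code and version, and the translated code is derived on demand.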

See: .artifacts/ReviewIQ-v3.2.1-Taxonomy-Versioning.md


15. Pipeline Implementation Status

Overall: ~55% Complete (as of 2026-01-24)

| Stage | Name | Status | Owner |
| --- | --- | --- | --- |
| 0 | Raw Ingestion | DONE | Scraper Team |
| 1 | Normalization | TODO | TBD |
| 2 | LLM Classification | TODO | TBD |
| 3 | Issue Routing | TODO | TBD |
| 4 | Fact Aggregation | TODO | TBD |

What's Working:

  • Google Maps scraping (v1.0.0)
  • Job orchestration & queuing
  • Webhook delivery
  • Frontend job management
  • Real-time SSE streaming

What's Missing:

  • Entire enrichment pipeline (Stages 1-4)
  • LLM integration
  • Span extraction
  • Issue routing
  • Analytics aggregation

Parallel Development: Each stage can be implemented independently using the contracts defined in:

  • ReviewIQ-Pipeline-Contracts-v1.md — Full I/O specs, validation rules, test fixtures
  • ReviewIQ-Pipeline-Checklist.md — Implementation checklist, definition of done

Estimated Effort to 100%: 6-8 weeks


Quick Reference

Primary Span Selection Algorithm

ORDER BY
  intensity DESC,   -- I3 > I2 > I1 (enum declaration order is I1, I2, I3)
  valence ASC,      -- V- > V± > V0 > V+ (enum declaration order)
  span_index ASC    -- first wins ties
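
The same ordering can be expressed outside the database, which may help when writing test fixtures for set_primary_span(). This is a minimal Python sketch: the dict-based span shape and the name `select_primary` are assumptions, and only the ranking rules come from the algorithm above.

```python
from typing import Dict, List

# Lower rank wins. Orders follow the algorithm above.
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary(spans: List[Dict]) -> Dict:
    """Pick the primary span: highest intensity, then most negative valence,
    then lowest span_index."""
    return min(
        spans,
        key=lambda s: (INTENSITY_RANK[s["intensity"]],
                       VALENCE_RANK[s["valence"]],
                       s["span_index"]),
    )
```

Because the key is a total order over (intensity, valence, span_index) and span_index is unique within a review, the result does not depend on input order.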

Issue Routing Key

(business_id, place_id, urt_primary, entity_normalized)
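
Since issue IDs are derived from this key with SHA256, the derivation can be sketched in Python with `hashlib`. The details below are assumptions, not the behaviour of generate_issue_id(): the unit-separator delimiter, the hex encoding, and treating a missing entity as an empty string are all illustrative choices.

```python
import hashlib
from typing import Optional


def generate_issue_id(business_id: str, place_id: str, urt_primary: str,
                      entity_normalized: Optional[str]) -> str:
    """Derive a deterministic issue ID from the routing key."""
    # A delimiter that cannot appear in normal text keeps ("ab", "c")
    # distinct from ("a", "bc").
    key = "\x1f".join([business_id, place_id, urt_primary,
                       entity_normalized or ""])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

The same routing key always yields the same ID, so re-running Stage 3 on unchanged input would route spans to the same issues.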

Trust Score Calculation

GREATEST(0.2, base_trust * modifiers)  -- Floor prevents collapse
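
A short worked example shows what the floor prevents. In this Python sketch, treating `modifiers` as a list of multiplicative factors is an assumption, and the function name `trust_score` is hypothetical.

```python
from math import prod
from typing import Iterable

TRUST_FLOOR = 0.2  # from the trust score floor decision


def trust_score(base_trust: float, modifiers: Iterable[float]) -> float:
    """Multiply base trust by all modifiers, clamped to the floor."""
    return max(TRUST_FLOOR, base_trust * prod(modifiers))


# Two strong penalties would otherwise drive the score to about 0.135.
print(trust_score(0.9, [0.5, 0.3]))
```

Without the clamp, stacked penalties multiply toward zero and the review effectively disappears from any trust-weighted aggregate.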

Last updated: 2026-01-24 (pipeline contracts + codebase overview)