whyrating-engine-legacy/.artifacts/ReviewIQ-v32-Decisions.md
Alejandro Gutiérrez 544e028c3f Phase 0: Project restructure to ReviewIQ platform architecture
New structure:
- scrapers/google_reviews/v1_0_0.py (was modules/scraper_clean.py)
- scrapers/base.py (BaseScraper interface)
- scrapers/registry.py (ScraperRegistry for version routing)
- core/database.py, models.py, config.py, enums.py
- utils/logger.py, crash_analyzer.py, health_checks.py, helpers.py, date_converter.py
- workers/chrome_pool.py
- services/webhook_service.py
- api/ routes structure (empty, ready for Phase 2)
- tests/ structure mirroring source

All imports updated in:
- api_server_production.py (7 import paths updated)
- utils/health_checks.py (scraper import path)

Legacy modules moved to modules/_legacy/:
- data_storage.py, image_handler.py, s3_handler.py (unused)

Syntax verified, frontend build passing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:22:08 +00:00


ReviewIQ v3.2 Design Decisions

Fast context-recovery document — all key decisions without the full spec.


1. Markpoint

ID:       reviewiq-v32-span-layer-2026-01-24-001
Status:   v3.2 span layer complete
Based on: v3.1.2 (commit f998277)

2. Core Design Decisions

| Decision | Choice | Rationale |
|---|---|---|
| Span granularity | Clause/topic-level | Preserves multi-domain signal |
| span_id format | ULID (TEXT) | Survives re-segmentation |
| Span offsets | Required (NOT NULL) | Deterministic reconstruction |
| Offsets reference | reviews_enriched.text | Not text_normalized |
| Span → Issue mapping | One-to-one (UNIQUE span_id) | Atomic unit per issue |
| Primary span enforcement | Partial unique index | Exactly one per review version |
| Primary selection | I3 > I2 > I1, V- > V± > V0 > V+, span_index | Deterministic, stable |
| Reprocessing strategy | Soft-switch with is_active | No transient empty states |
| Span overlap | GiST exclusion constraint | Non-overlapping ranges enforced |
| Secondary codes | Array with cardinality ≤ 2 | Could normalize to link table later |
| Causal chain storage | JSONB | Flexibility, normalize later if needed |
| relation_type vs causal_chain | Separate concerns | relation = within-review, causal = root cause |
| Dimension columns | Postgres ENUMs | Type safety, storage efficiency |
| Trust score floor | 0.2 (GREATEST clamp) | Prevent multiplicative collapse |
| Issue routing key | (business_id, place_id, urt_primary, entity_normalized) | Deterministic, entity-aware |
| Issue ID generation | SHA256 via pgcrypto | Deterministic, collision-resistant |
| Text validation trigger | Conditional via session setting | Performance: skip in bulk loads |
| Relation validation | Application-level post-insert | Handles insertion order |
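
The non-overlap rule enforced by the GiST exclusion constraint can also be checked in application code before rows are inserted. The following is a minimal Python sketch, not the project's implementation. It assumes half-open `[start, end)` character offsets for the spans of a single review, and `spans_are_disjoint` is a hypothetical helper name; the document does not state whether range bounds are inclusive.

```python
from typing import List, Tuple


def spans_are_disjoint(spans: List[Tuple[int, int]]) -> bool:
    """Return True if no two half-open [start, end) spans overlap.

    Assumption: every span is non-empty (start < end) and all spans
    belong to the same review version.
    """
    ordered = sorted(spans)
    # After sorting by start, any overlap shows up between neighbours.
    for (_, prev_end), (cur_start, _) in zip(ordered, ordered[1:]):
        if cur_start < prev_end:
            return False
    return True


print(spans_are_disjoint([(0, 18), (20, 44)]))  # adjacent, no overlap
print(spans_are_disjoint([(0, 21), (20, 44)]))  # overlap at offset 20
```

Checking only neighbouring spans after sorting is enough to detect whether any overlap exists, which keeps the check at O(n log n).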

3. Extensions Required

| Extension | Purpose |
|---|---|
| btree_gist | Exclusion constraint for non-overlapping spans |
| pgcrypto | SHA256-based issue ID generation |

4. New Tables

| Table | Purpose |
|---|---|
| review_spans | Span-level URT classification |
| review_span_secondary_codes | (Optional) Normalized secondary codes |

5. Modified Tables

| Table | Changes |
|---|---|
| issue_spans | Added span_id FK (NOT NULL); the direct review FK is no longer the canonical link |

6. New ENUM Types

Valence & Intensity:

  • urt_valence — V-, V±, V0, V+
  • urt_intensity — I1, I2, I3

Specificity & Actionability:

  • urt_specificity — S1, S2, S3
  • urt_actionability — A1, A2, A3

Context & Evidence:

  • urt_temporal — T1, T2, T3
  • urt_evidence — E1, E2, E3
  • urt_comparative — CR1, CR2, CR3

Classification:

  • urt_profile — factual, emotional, comparative, etc.
  • urt_confidence — low, medium, high
  • urt_relation — elaborates, contrasts, causes, etc.
  • urt_entity_type — person, product, location, etc.

7. Key Functions

| Function | Purpose |
|---|---|
| urt_validate_causal_chain() | Validates causal JSONB structure |
| validate_review_relations() | Ensures related_span_id points to a span of the same parent review |
| validate_active_spans() | Ensures valid active span set |
| set_primary_span() | Deterministic primary selection |
| generate_issue_id() | SHA256-based issue ID |

8. Key Triggers

| Trigger | Purpose |
|---|---|
| review_spans_validate_bounds | span_end ≤ text length |
| review_spans_validate_text | span_text matches substring |
| review_spans_validate_causal_chain | causal_chain JSONB valid |
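
The bounds and text checks can be mirrored in application code, for example to pre-validate spans before a bulk load that skips the text trigger. The sketch below is illustrative only. It assumes 0-based, half-open character offsets into `reviews_enriched.text`; the `span_start` name and the `span_errors` helper are hypothetical, and the offset convention used by the real triggers is not stated here.

```python
from typing import List


def span_errors(review_text: str, span_start: int, span_end: int,
                span_text: str) -> List[str]:
    """Return a list of validation problems (an empty list means valid)."""
    errors = []
    if not 0 <= span_start < span_end:
        errors.append("span_start must satisfy 0 <= span_start < span_end")
    if span_end > len(review_text):
        # Mirrors the bounds rule: span_end <= text length.
        errors.append("span_end exceeds text length")
    if not errors and review_text[span_start:span_end] != span_text:
        # Mirrors the text rule: span_text matches the substring.
        errors.append("span_text does not match substring at offsets")
    return errors


text = "Great food but slow service"
print(span_errors(text, 0, 10, "Great food"))   # valid span
print(span_errors(text, 0, 10, "great food"))   # case mismatch
```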

9. USN Format

Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}

Examples:

  • URT:S:SVC.SPD:V-I3:S3A3T2.E2.CR1 — Specific service speed complaint
  • URT:F:PRD.QUA:V+I2:S2A1T1.E3.CR2:staff→training — Product quality praise with causal chain
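
The two templates can be assembled mechanically from the dimension values. The sketch below is a minimal, assumed formatter: `format_usn` and its parameter names are hypothetical, and `codes` is passed through as an already-joined string because the document does not specify how multiple codes are combined.

```python
from typing import Optional


def format_usn(codes: str, valence: str, intensity: str, specificity: str,
               actionability: str, temporal: str, evidence: str,
               comparative: str, causal: Optional[str] = None) -> str:
    """Build a USN string; uses the Full form only when a causal chain is given."""
    kind = "F" if causal else "S"
    usn = (
        f"URT:{kind}:{codes}:{valence}{intensity}:"
        f"{specificity}{actionability}{temporal}.{evidence}.{comparative}"
    )
    if causal:
        usn += f":{causal}"
    return usn


print(format_usn("SVC.SPD", "V-", "I3", "S3", "A3", "T2", "E2", "CR1"))
print(format_usn("PRD.QUA", "V+", "I2", "S2", "A1", "T1", "E3", "CR2",
                 causal="staff→training"))
```

Both calls reproduce the two example strings listed above.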

10. Span Boundary Rules

  1. Split on contrasting conjunctions — "but", "however", "although"
  2. Split on topic/target change — Different entity or aspect
  3. Split on valence change — Positive → Negative or vice versa
  4. Split on domain change — SVC → PRD → AMB
  5. Keep cause→effect together — Causal chain stays in one span
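
Rule 1 is the only rule that can be approximated purely lexically; rules 2 to 5 need classification output. The following rough sketch covers rule 1 only and is not the project's segmenter. It assumes the conjunction starts the following span, that offsets are 0-based and half-open into the original text, and that surrounding whitespace and punctuation are trimmed; `split_on_contrast` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Contrasting conjunctions named in rule 1.
CONTRAST = re.compile(r"\b(?:but|however|although)\b", re.IGNORECASE)
TRIM_CHARS = " \t\n,;."


def _trim(text: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink [start, end) so it does not begin or end on TRIM_CHARS."""
    while start < end and text[start] in TRIM_CHARS:
        start += 1
    while end > start and text[end - 1] in TRIM_CHARS:
        end -= 1
    return start, end


def split_on_contrast(text: str) -> List[Tuple[int, int, str]]:
    """Split text before each contrasting conjunction, keeping offsets."""
    cuts = [m.start() for m in CONTRAST.finditer(text)]
    bounds = [0] + cuts + [len(text)]
    spans = []
    for raw_start, raw_end in zip(bounds, bounds[1:]):
        start, end = _trim(text, raw_start, raw_end)
        if start < end:  # skip empty pieces, e.g. a leading conjunction
            spans.append((start, end, text[start:end]))
    return spans


for span in split_on_contrast("The food was great, but the service was slow"):
    print(span)
```

Because each span carries offsets into the untouched input, `text[start:end]` always equals the stored span text, which is the property the text validation trigger checks.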

11. Deferred to v3.3+

| Item | Reason |
|---|---|
| Entity extraction implementation | Requires NER pipeline |
| Trust-weighted fact aggregation | Needs more span data |
| Secondary domain enforcement | App-level validation sufficient |
| Span-based fact counting | Currently review-based, optimize later |

12. Open Questions Resolved

| Question | Resolution |
|---|---|
| Span → Issue cardinality? | One-to-one (not many-to-many) |
| Offsets nullable for LLM-inferred? | No: required, NOT NULL |
| Reprocessing strategy? | Soft-switch with is_active flag |
| TEXT vs ENUM for dimensions? | ENUMs: committed to Postgres |

Quick Reference

Primary Span Selection Algorithm

ORDER BY:
  1. intensity DESC (I3 > I2 > I1)
  2. valence ASC (V- > V± > V0 > V+)
  3. span_index ASC (first wins ties)
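
The same ordering can be expressed as a composite sort key. This is a minimal Python sketch of the rule, not the `set_primary_span()` implementation; the dictionary field names and the `select_primary` helper are assumptions made for illustration.

```python
from typing import Dict, List

# Lower rank sorts first: I3 beats I2 beats I1; V- beats V± beats V0 beats V+.
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary(spans: List[Dict]) -> Dict:
    """Pick the primary span: highest intensity, then most negative
    valence, then lowest span_index as the tie-breaker."""
    return min(
        spans,
        key=lambda s: (
            INTENSITY_RANK[s["intensity"]],
            VALENCE_RANK[s["valence"]],
            s["span_index"],
        ),
    )


spans = [
    {"span_index": 0, "intensity": "I2", "valence": "V-"},
    {"span_index": 1, "intensity": "I3", "valence": "V+"},
    {"span_index": 2, "intensity": "I3", "valence": "V-"},
]
print(select_primary(spans)["span_index"])
```

Because every component of the key is total and `span_index` is unique within a review version, the selection is deterministic regardless of input order.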

Issue Routing Key

(business_id, place_id, urt_primary, entity_normalized)
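
A deterministic issue ID can be derived by hashing the routing key. The sketch below shows the general idea using Python's `hashlib`. It is not the `generate_issue_id()` database function: the unit-separator delimiter, the empty-string handling of a missing entity, the hex output, and the name `issue_id_from_routing_key` are all assumptions.

```python
import hashlib
from typing import Optional


def issue_id_from_routing_key(business_id: str, place_id: str,
                              urt_primary: str,
                              entity_normalized: Optional[str]) -> str:
    """Hash the four routing-key fields into a hex SHA-256 digest."""
    parts = [business_id, place_id, urt_primary, entity_normalized or ""]
    # A separator unlikely to occur in the fields avoids ambiguous
    # concatenations such as ("ab", "c") vs ("a", "bc").
    key = "\x1f".join(parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


a = issue_id_from_routing_key("biz-1", "place-9", "SVC.SPD", "front desk")
b = issue_id_from_routing_key("biz-1", "place-9", "SVC.SPD", "front desk")
print(a == b, len(a))
```

Equal routing keys always hash to the same ID, so repeated processing of the same issue routes to one record.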

Trust Score Calculation

GREATEST(0.2, base_trust * modifiers)  -- Floor prevents collapse
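
To see why the floor matters, consider several penalty modifiers multiplied together. This is a small Python sketch of the clamp; it assumes the modifiers combine multiplicatively, and the function name and sample values are illustrative only.

```python
import math
from typing import Iterable

TRUST_FLOOR = 0.2  # floor value from the design decisions


def trust_score(base_trust: float, modifiers: Iterable[float],
                floor: float = TRUST_FLOOR) -> float:
    """Multiply base trust by all modifiers, never dropping below floor."""
    return max(floor, base_trust * math.prod(modifiers))


print(trust_score(0.9, [1.0]))        # unmodified score passes through
print(trust_score(0.9, [0.5, 0.3]))   # product falls below the floor, so it is clamped
```

Without the clamp, a few stacked penalties would push the score toward zero and effectively erase the review from any trust-weighted aggregate.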

Last updated: 2026-01-24