# ReviewIQ: Review Intelligence Pipeline

**Version**: 3.0
**Status**: Architecture Specification
**Date**: 2026-01-24

---

## Executive Summary

ReviewIQ transforms customer reviews into actionable business intelligence through a three-stage pipeline:

1. **Ingest**: LLM-powered URT classification with semantic embeddings
2. **Analyze**: Issue lifecycle management with sub-pattern discovery
3. **Report**: Statistically rigorous insights with trend detection

**Design Principles**:
- **Accuracy over heuristics**: LLM classification at ingest (~$0.0002/review)
- **Taxonomy as structure**: URT provides stable, interpretable categories
- **Local ML for depth**: Sub-clustering reveals actionable patterns within categories
- **Feedback loop**: CR (Comparative Reference) signals verify resolution effectiveness

---

## Part 1: System Architecture

```
┌─────────────────────────────────────────────────────────────────────────────────┐
│                                REVIEWIQ PIPELINE                                │
├─────────────────────────────────────────────────────────────────────────────────┤
│                                                                                 │
│  ┌─────────────┐     ┌──────────────────────────────────────────────────────┐   │
│  │             │     │                     INGEST LAYER                     │   │
│  │   Reviews   │────▶│  ┌─────────┐    ┌─────────┐    ┌─────────────────┐   │   │
│  │   (Input)   │     │  │  Embed  │    │   LLM   │    │      Store      │   │   │
│  │             │     │  │ Review  │───▶│Classify │───▶│  (PostgreSQL)   │   │   │
│  └─────────────┘     │  └─────────┘    └─────────┘    └─────────────────┘   │   │
│                      │       │                                 │            │   │
│                      │       │ ~$0.00                          │ ~$0.0002   │   │
│                      │       │ (local)                         │ per review │   │
│                      └───────┼─────────────────────────────────┼────────────┘   │
│                              │                                 │                │
│                              ▼                                 ▼                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │                            ISSUE AGGREGATION                             │   │
│  │                                                                          │   │
│  │  V- classified reviews ───▶ Match or Create Issue ───▶ Track State       │   │
│  │                                                                          │   │
│  │  Rules: Same URT code + entity + location + time window = same issue     │   │
│  │  States: DETECTED → ACKNOWLEDGED → IN_PROGRESS → RESOLVED → VERIFIED     │   │
│  │                                                                          │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                              │                                 │                │
│                              ▼                                 ▼                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │                            REPORT GENERATION                             │   │
│  │                                                                          │   │
│  │  ┌──────────────┐   ┌──────────────┐   ┌──────────────┐   ┌──────────┐   │   │
│  │  │  Aggregate   │   │ Sub-Cluster  │   │    Trend     │   │   LLM    │   │   │
│  │  │ by URT Code  │──▶│ Within Codes │──▶│   Analysis   │──▶│ Narrate  │   │   │
│  │  │    (SQL)     │   │  (HDBSCAN)   │   │ (CR + Rate)  │   │          │   │   │
│  │  └──────────────┘   └──────────────┘   └──────────────┘   └──────────┘   │   │
│  │       $0.00              $0.00              $0.00            ~$0.15      │   │
│  │                                                                          │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                        │                                        │
│                                        ▼                                        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │                                  OUTPUT                                  │   │
│  │                                                                          │   │
│  │  • Executive Summary with statistically defensible claims                │   │
│  │  • Issues ranked by priority with sub-pattern breakdown                  │   │
│  │  • Strengths with trend signals                                          │   │
│  │  • Staff performance insights                                            │   │
│  │  • Actionable recommendations                                            │   │
│  │                                                                          │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                 │
└─────────────────────────────────────────────────────────────────────────────────┘
```

---

## Part 2: Ingest Layer

### 2.1 Design Philosophy

The review is the atomic unit. We do not split reviews into fragments; keeping the review whole preserves context and enables accurate classification. A single review may contain multiple topics; URT's multi-coding (primary + up to 2 secondary codes) handles this naturally.
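The multi-coding constraint can be checked mechanically before a classification is stored. The sketch below is illustrative only: `ReviewCoding` and `validate` are hypothetical names, not part of the pipeline, and the domain letters and the different-domain rule are taken from the classification rules in section 2.4. It also assumes that "different domains" applies across all codes of a review, primary included.

```python
from dataclasses import dataclass, field

# The seven URT domain letters listed in this spec (section 2.4)
VALID_DOMAINS = set("OPJEAVR")


@dataclass
class ReviewCoding:
    """One review's URT codes: a primary plus up to two secondary codes."""
    primary: str
    secondary: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the coding is valid."""
        problems = []
        codes = [self.primary, *self.secondary]
        for code in codes:
            if not code or code[0] not in VALID_DOMAINS:
                problems.append(f"unknown domain in code {code!r}")
        if len(self.secondary) > 2:
            problems.append("more than 2 secondary codes")
        # Assumption: no two codes on one review may share a domain letter
        domains = [code[0] for code in codes if code]
        if len(set(domains)) != len(domains):
            problems.append("codes must come from different domains")
        return problems


# Usage: one review covering wait time (J) and staff attitude (P)
print(ReviewCoding("J1.01", ["P1.02"]).validate())  # []
print(ReviewCoding("P1.02", ["P3.01"]).validate())  # ['codes must come from different domains']
```

Returning a list of problems rather than raising lets the ingest step decide whether to retry the LLM call or store the review with a low-confidence flag.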
### 2.2 Dual Processing

Each review undergoes two operations that run concurrently:

```python
import asyncio

async def ingest_review(review: dict) -> dict:
    """Ingest a single review: embed + classify."""
    text = review['text'].strip()

    # Parallel execution. embed_review is a synchronous, CPU-bound function,
    # so it runs in a worker thread while the LLM call is awaited.
    embedding, classification = await asyncio.gather(
        asyncio.to_thread(embed_review, text),
        classify_review_llm(text),
    )

    return {
        'review_id': review['review_id'],
        'business_id': review['business_id'],
        'text': text,
        'embedding': embedding,
        'date': review['date'],
        'rating': review.get('rating'),
        **classification,  # URT codes, valence, intensity, etc.
    }
```

### 2.3 Embedding

Local multilingual embeddings for semantic capabilities:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('intfloat/multilingual-e5-small')

def embed_review(text: str) -> np.ndarray:
    """Generate normalized embedding for semantic search and clustering."""
    # E5 models expect a "passage: " (or "query: ") prefix on their inputs
    embedding = model.encode(
        f"passage: {text}",
        normalize_embeddings=True
    )
    return embedding  # 384 dimensions
```

**Why embed if we have URT codes?**
- Sub-clustering within URT codes (pattern discovery)
- Semantic quote selection (centroid-closest)
- Similarity search for emerging patterns
- Backup for low-confidence classifications

### 2.4 LLM Classification

A single LLM call extracts the complete URT classification:

```python
import json

CLASSIFICATION_PROMPT = """You are a customer feedback classifier using the Universal Review Taxonomy (URT).
Analyze the review and return JSON with:
{
  "urt_primary": "X1.23",           // Main URT subcode
  "urt_secondary": ["Y2.34"],       // 0-2 additional codes (different domains only)
  "valence": "V-",                  // V+, V-, V0, V±
  "intensity": "I2",                // I1 (mild), I2 (moderate), I3 (strong)
  "comparative": "CR-N",            // CR-N (none), CR-B (better), CR-W (worse), CR-S (same)
  "staff_mentions": ["Mike"],       // Employee names mentioned
  "quotes": {                       // Key phrase for each code
    "X1.23": "exact phrase from review",
    "Y2.34": "another phrase"
  }
}

URT DOMAINS (choose primary from most impactful):
- O (Offering): Product/service quality, function, completeness, fit
- P (People): Staff attitude, competence, responsiveness, communication
- J (Journey): Timing, ease, reliability, resolution
- E (Environment): Physical space, digital interface, ambiance, safety
- A (Access): Availability, accessibility, inclusivity, convenience
- V (Value): Price, transparency, effort, worth
- R (Relationship): Trust, dependability, recovery, loyalty

RULES:
1. Primary = what customer is MOST affected by
2. Secondary must be DIFFERENT domains (P1.02 + P3.01 is invalid)
3. V± only when genuinely mixed (positive AND negative on different topics)
4. CR-B/W/S only for explicit self-comparison ("better than last time", "still broken")
5. quotes must be EXACT phrases from the review text

Return valid JSON only."""

async def classify_review_llm(text: str) -> dict:
    """Complete URT classification via LLM."""
    response = await llm.chat(
        model="gpt-4o-mini",  # ~$0.0002 per review
        messages=[
            {"role": "system", "content": CLASSIFICATION_PROMPT},
            {"role": "user", "content": text}
        ],
        response_format={"type": "json_object"},
        temperature=0.1  # Low temperature for consistency
    )
    return json.loads(response.content)
```

### 2.5 Batch Processing for Efficiency

For bulk ingestion, batch multiple reviews per LLM call:

```python
async def classify_batch(reviews: list[dict], batch_size: int = 10) -> list[dict]:
    """Process reviews in batches for ~40% cost reduction."""
    results = []

    for i in range(0, len(reviews), batch_size):
        batch = reviews[i:i+batch_size]

        # BATCH_CLASSIFICATION_PROMPT: batch variant of CLASSIFICATION_PROMPT,
        # expected to return {"classifications": [...]} in input order
        prompt = BATCH_CLASSIFICATION_PROMPT + "\n\nREVIEWS:\n"
        for j, review in enumerate(batch):
            prompt += f"\n[{j}] {review['text']}\n---\n"

        response = await llm.chat(
            model="gpt-4o-mini",
            messages=[{"role": "system", "content": prompt}],
            response_format={"type": "json_object"},
            temperature=0.1  # Same setting as single-review classification
        )

        batch_results = json.loads(response.content)["classifications"]
        # Guard against the model dropping or merging reviews within a batch
        if len(batch_results) != len(batch):
            raise ValueError(
                f"Expected {len(batch)} classifications, got {len(batch_results)}"
            )
        results.extend(batch_results)

    return results
```

### 2.6 Data Model

```sql
-- Core review storage with URT classification
-- (the VECTOR type and HNSW index assume the pgvector extension is installed)
CREATE TABLE reviews (
    review_id TEXT PRIMARY KEY,
    business_id TEXT NOT NULL,
    text TEXT NOT NULL,
    embedding VECTOR(384),
    date TIMESTAMP NOT NULL,
    rating SMALLINT,

    -- URT Classification (from LLM)
    urt_primary TEXT NOT NULL,              -- 'J1.01', 'P1.02', etc.
    urt_secondary TEXT[] DEFAULT '{}',      -- Max 2
    valence TEXT NOT NULL,                  -- 'V+', 'V-', 'V0', 'V±'
    intensity TEXT NOT NULL,                -- 'I1', 'I2', 'I3'
    comparative TEXT DEFAULT 'CR-N',        -- 'CR-N', 'CR-B', 'CR-W', 'CR-S'

    -- Extracted entities
    staff_mentions TEXT[] DEFAULT '{}',
    quotes JSONB,                           -- {"code": "phrase", ...}

    -- Metadata
    created_at TIMESTAMP DEFAULT NOW(),
    classification_model TEXT DEFAULT 'gpt-4o-mini'
);

-- Indexes for query patterns
CREATE INDEX idx_reviews_business_date ON reviews(business_id, date DESC);
CREATE INDEX idx_reviews_urt_primary ON reviews(business_id, urt_primary);
CREATE INDEX idx_reviews_valence ON reviews(business_id, valence, date);
CREATE INDEX idx_reviews_comparative ON reviews(comparative) WHERE comparative != 'CR-N';
CREATE INDEX idx_reviews_embedding ON reviews USING hnsw (embedding vector_cosine_ops);
```

---

## Part 3: Issue Lifecycle Management

Following the URT Issue Lifecycle Framework (C1), negative feedback generates trackable issues. This covers V- reviews and, as the code below shows, mixed V± reviews.

### 3.1 Issue Aggregation

Multiple reviews about the same problem aggregate into a single issue:

```python
def aggregate_to_issue(review: dict) -> str | None:
    """Match review to existing issue or create new one."""

    if review['valence'] not in ('V-', 'V±'):
        return None  # Only negative (or mixed) feedback creates issues

    # Find matching open issues
    matching = db.query("""
        SELECT issue_id, primary_subcode, entity, location
        FROM issues
        WHERE business_id = %s
          AND primary_subcode = %s
          AND state NOT IN ('VERIFIED', 'DECLINED')
          AND created_at > NOW() - INTERVAL '30 days'
    """, [review['business_id'], review['urt_primary']])

    for issue in matching:
        if is_same_issue(review, issue):
            # Aggregate to existing issue
            add_span_to_issue(issue['issue_id'], review)
            recalculate_priority(issue['issue_id'])
            return issue['issue_id']

    # Check intensity threshold for new issue creation
    if should_create_issue(review):
        return create_issue(review)

    return None  # Stored in buffer for future aggregation

def should_create_issue(review: dict) -> bool:
    """Intensity-based issue creation thresholds."""
    if review['intensity'] == 'I3':
        return True  # Critical = immediate issue

    # Check aggregation buffer for patterns
    similar_count = count_similar_in_buffer(review, window_days=30)

    if review['intensity'] == 'I2' and similar_count >= 2:
        return True  # Moderate + 2 others = issue

    if review['intensity'] == 'I1' and similar_count >= 4:
        return True  # Mild + 4 others = issue

    return False
```

### 3.2 Issue State Machine

```
                    DETECTED
                        │
           ┌────────────┼────────────┐
           ▼            ▼            ▼
     ACKNOWLEDGED   DECLINED    (escalate)
           │                         │
           ▼                         │
      IN_PROGRESS                    │
           │                         │
           ▼                         │
       RESOLVED ◀────────────────────┘
           │
     ┌─────┼─────┐
     ▼           ▼
 VERIFIED    REOPENED
                │
                └──▶ (back to IN_PROGRESS)
```

### 3.3 Priority Scoring

```python
import math
from datetime import datetime

def calculate_priority(issue: dict) -> float:
    """
    Priority combines intensity, volume, recency, recurrence, and CR trend.

    P = I_weight × (1 + log(span_count)) × decay(days) × recurrence_boost × trend_modifier
    """
    INTENSITY_WEIGHTS = {'I1': 1.0, 'I2': 2.0, 'I3': 4.0}

    i_weight = INTENSITY_WEIGHTS[issue['max_intensity']]
    volume_factor = 1 + math.log(issue['span_count'])

    days_old = (datetime.now() - issue['created_at']).days
    decay = math.exp(-0.023 * days_old)  # Half-life ~30 days (ln 2 / 0.023)

    recurrence_boost = 1.0 + 0.5 * math.log2(issue['reopen_count'] + 1)

    # Trend modifier from CR signals
    if issue['recent_cr_w_count'] >= 2:
        trend_modifier = 1.3  # Worsening
    elif issue['recent_cr_b_count'] >= 2:
        trend_modifier = 0.7  # Improving
    else:
        trend_modifier = 1.0  # Stable

    return i_weight * volume_factor * decay * recurrence_boost * trend_modifier
```

### 3.4 Resolution Verification via CR Signals

The Comparative Reference (CR) dimension enables automatic verification:

```python
def process_cr_signal(review: dict):
    """Handle CR-B/W/S signals for issue lifecycle."""

    if review['comparative'] == 'CR-N':
        return

    # Find resolved issues with matching code
    resolved_issues = db.query("""
        SELECT issue_id, state, resolved_at
        FROM issues
        WHERE business_id = %s
          AND primary_subcode = %s
          AND state IN ('RESOLVED',
                        'VERIFIED')
          AND resolved_at > NOW() - INTERVAL '60 days'
    """, [review['business_id'], review['urt_primary']])

    for issue in resolved_issues:
        if review['comparative'] == 'CR-B':
            # Improvement signal → verify resolution
            if issue['state'] == 'RESOLVED':
                verify_issue(issue['issue_id'], review['review_id'])

        elif review['comparative'] in ('CR-S', 'CR-W'):
            # Unchanged or worsening → reopen
            reopen_issue(issue['issue_id'], review['review_id'])

            if review['comparative'] == 'CR-W':
                escalate_issue(issue['issue_id'], reason='REGRESSION')
```

### 3.5 Issue Data Model

```sql
CREATE TABLE issues (
    issue_id TEXT PRIMARY KEY,
    business_id TEXT NOT NULL,
    primary_subcode TEXT NOT NULL,
    domain TEXT NOT NULL,

    -- State
    state TEXT NOT NULL DEFAULT 'DETECTED',
    priority_score FLOAT NOT NULL,
    confidence_score FLOAT NOT NULL,

    -- Aggregation
    review_ids TEXT[] NOT NULL,
    span_count INT NOT NULL DEFAULT 1,
    max_intensity TEXT NOT NULL,

    -- Ownership
    owner_team TEXT,
    owner_individual TEXT,

    -- Timestamps
    created_at TIMESTAMP DEFAULT NOW(),
    acknowledged_at TIMESTAMP,
    resolved_at TIMESTAMP,
    verified_at TIMESTAMP,

    -- Resolution
    reopen_count INT DEFAULT 0,
    resolution_code TEXT,
    resolution_notes TEXT,
    decline_reason TEXT,

    -- Context
    entity TEXT,           -- Product, staff member, feature
    location TEXT,         -- Physical or logical location
    causal_codes TEXT[],   -- CD-O, MG-T, etc.

    -- Verification
    verification_window_days INT DEFAULT 60
);

CREATE TABLE issue_events (
    event_id SERIAL PRIMARY KEY,
    issue_id TEXT REFERENCES issues(issue_id),
    event_type TEXT NOT NULL,  -- 'state_change', 'span_added', 'priority_update'
    from_state TEXT,
    to_state TEXT,
    actor TEXT,
    notes TEXT,
    review_id TEXT,            -- Triggering review if applicable
    created_at TIMESTAMP DEFAULT NOW()
);

-- Time-series aggregation for impact charts
CREATE TABLE issue_timeseries (
    id SERIAL PRIMARY KEY,
    business_id TEXT NOT NULL,
    code TEXT NOT NULL,               -- URT code (e.g., 'J1.01')
    period DATE NOT NULL,             -- Bucket date (day/week/month)
    bucket_type TEXT NOT NULL,        -- 'day', 'week', 'month'

    -- Counts
    review_count INT NOT NULL DEFAULT 0,
    negative_count INT NOT NULL DEFAULT 0,
    positive_count INT NOT NULL DEFAULT 0,

    -- Strength metrics
    strength_score FLOAT NOT NULL DEFAULT 0,  -- Weighted by intensity
    avg_intensity FLOAT,
    max_intensity TEXT,

    -- CR signals in period
    cr_better INT DEFAULT 0,
    cr_worse INT DEFAULT 0,
    cr_same INT DEFAULT 0,

    UNIQUE(business_id, code, period, bucket_type)
);

CREATE INDEX idx_timeseries_lookup ON issue_timeseries(business_id, code, period);
```

### 3.6 Issue Review Drill-Down

Retrieve all reviews belonging to a specific issue for detailed inspection:

```python
def get_issue_reviews(issue_id: str, sort_by: str = 'date',
                      limit: int = 50) -> list[dict]:
    """Fetch all reviews aggregated into an issue."""

    issue = db.query_one("""
        SELECT issue_id, review_ids, primary_subcode, business_id
        FROM issues
        WHERE issue_id = %s
    """, [issue_id])

    if not issue:
        return []

    # Whitelisted ORDER BY fragments; sort_by never reaches the SQL directly
    order_clause = {
        'date': 'date DESC',
        'intensity': "CASE intensity WHEN 'I3' THEN 1 WHEN 'I2' THEN 2 ELSE 3 END",
        'relevance': 'date DESC'  # Could enhance with embedding similarity
    }.get(sort_by, 'date DESC')

    reviews = db.query(f"""
        SELECT review_id, text, date, rating, valence, intensity,
               comparative, quotes->>%s as quote, staff_mentions
        FROM reviews
        WHERE review_id = ANY(%s)
        ORDER BY {order_clause}
        LIMIT %s
    """,
    [issue['primary_subcode'], issue['review_ids'], limit])

    return [{
        **r,
        'intensity_weight': {'I1': 1, 'I2': 2, 'I3': 4}[r['intensity']]
    } for r in reviews]
```

### 3.7 Strength Score Aggregation

A unified metric combining volume and intensity for impact measurement:

```
Strength Score = Σ (intensity_weight × review_count)

Where:
  I1 (mild)     → weight = 1
  I2 (moderate) → weight = 2
  I3 (strong)   → weight = 4
```

```python
from datetime import date, timedelta

import numpy as np

INTENSITY_WEIGHTS = {'I1': 1, 'I2': 2, 'I3': 4}
# Ordinal levels for averages; matches the 1/2/3 scale used by the SQL in 3.8
INTENSITY_LEVELS = {'I1': 1, 'I2': 2, 'I3': 3}

def compute_strength_score(reviews: list[dict]) -> float:
    """
    Aggregate strength from multiple reviews.

    A single I3 review (weight=4) has same impact as:
    - 4 I1 reviews, or
    - 2 I2 reviews

    This captures that one "furious" customer signals more than
    four "mildly annoyed" customers.
    """
    return sum(INTENSITY_WEIGHTS.get(r['intensity'], 1) for r in reviews)

def compute_strength_by_code(business_id: str, code: str,
                             start: date, end: date) -> dict:
    """Compute strength metrics for a URT code."""
    reviews = db.query("""
        SELECT intensity, valence, date
        FROM reviews
        WHERE business_id = %s
          AND (urt_primary = %s OR %s = ANY(urt_secondary))
          AND date BETWEEN %s AND %s
    """, [business_id, code, code, start, end])

    neg_reviews = [r for r in reviews if r['valence'] in ('V-', 'V±')]
    pos_reviews = [r for r in reviews if r['valence'] == 'V+']

    return {
        'code': code,
        'total_count': len(reviews),
        'negative_count': len(neg_reviews),
        'positive_count': len(pos_reviews),
        'negative_strength': compute_strength_score(neg_reviews),
        'positive_strength': compute_strength_score(pos_reviews),
        'avg_intensity': np.mean([
            INTENSITY_LEVELS[r['intensity']] for r in reviews
        ]) if reviews else 0,
        'max_intensity': max(
            (r['intensity'] for r in reviews),
            key=lambda i: INTENSITY_WEIGHTS.get(i, 0),
            default='I1'
        )
    }
```

### 3.8 Impact Timeline (Time-Series Aggregation)

Generate data for line charts showing issue/strength evolution over time:

```python
def build_impact_timeline(business_id: str, code: str,
                          start: date, end: date,
                          bucket: str = 'week') -> list[dict]:
    """
    Time-series strength aggregation for charts.

    Returns data points for plotting:
    - X-axis: time periods
    - Y-axis: strength score (or count, rate)
    """
    timeline = db.query("""
        SELECT
            date_trunc(%s, date)::date as period,
            COUNT(*) as review_count,
            COUNT(*) FILTER (WHERE valence IN ('V-', 'V±')) as negative_count,
            COUNT(*) FILTER (WHERE valence = 'V+') as positive_count,
            SUM(CASE intensity WHEN 'I3' THEN 4 WHEN 'I2' THEN 2 ELSE 1 END) as strength_score,
            SUM(CASE intensity WHEN 'I3' THEN 4 WHEN 'I2' THEN 2 ELSE 1 END)
                FILTER (WHERE valence IN ('V-', 'V±')) as negative_strength,
            AVG(CASE intensity WHEN 'I3' THEN 3 WHEN 'I2' THEN 2 ELSE 1 END) as avg_intensity,
            MAX(CASE intensity WHEN 'I3' THEN 3 WHEN 'I2' THEN 2 ELSE 1 END) as max_intensity_num,
            COUNT(*) FILTER (WHERE comparative = 'CR-B') as cr_better,
            COUNT(*) FILTER (WHERE comparative = 'CR-W') as cr_worse,
            COUNT(*) FILTER (WHERE comparative = 'CR-S') as cr_same
        FROM reviews
        WHERE business_id = %s
          AND (urt_primary = %s OR %s = ANY(urt_secondary))
          AND date BETWEEN %s AND %s
        GROUP BY 1
        ORDER BY 1
    """, [bucket, business_id, code, code, start, end])

    # Fill gaps for continuous chart
    return fill_timeline_gaps(timeline, start, end, bucket)

def fill_timeline_gaps(data: list[dict], start: date, end: date,
                       bucket: str) -> list[dict]:
    """Ensure continuous timeline with zero-fill for missing periods."""
    from pandas import date_range

    # Snap start back to its bucket boundary so the first (partial) bucket
    # returned by date_trunc is not dropped from the chart
    if bucket == 'week':
        start = start - timedelta(days=start.weekday())  # Monday, as date_trunc('week')
    elif bucket == 'month':
        start = start.replace(day=1)

    freq = {'day': 'D', 'week': 'W-MON', 'month': 'MS'}[bucket]
    all_periods = date_range(start, end, freq=freq)

    data_map = {row['period']: row for row in data}

    result = []
    for period in all_periods:
        period_date = period.date()
        if period_date in data_map:
            result.append(data_map[period_date])
        else:
            result.append({
                'period': period_date,
                'review_count': 0,
                'negative_count': 0,
                'positive_count': 0,
                'strength_score': 0,
                'negative_strength': 0,
                'avg_intensity': None,
                'max_intensity_num': None,
                'cr_better': 0,
                'cr_worse': 0,
                'cr_same': 0
            })

    return result

def get_issue_impact_chart_data(issue_id:
                                str, months_back: int = 6) -> dict:
    """
    Generate chart-ready data for a specific issue.

    Returns structure suitable for Recharts/Chart.js:
    {
        "issue": {...},
        "timeline": [
            {"period": "2026-01-06", "strength": 12, "count": 5, ...},
            {"period": "2026-01-13", "strength": 8, "count": 3, ...},
            ...
        ],
        "summary": {
            "total_strength": 156,
            "peak_period": "2026-01-06",
            "trend": "improving"
        }
    }
    """
    issue = db.query_one("""
        SELECT issue_id, primary_subcode, business_id, created_at
        FROM issues
        WHERE issue_id = %s
    """, [issue_id])

    if not issue:
        raise ValueError(f"Unknown issue_id: {issue_id}")

    end = date.today()
    start = end - timedelta(days=months_back * 30)  # Approximate month length

    timeline = build_impact_timeline(
        issue['business_id'],
        issue['primary_subcode'],
        start, end,
        bucket='week'
    )

    # Compute summary stats
    total_strength = sum(t['negative_strength'] or 0 for t in timeline)
    peak = max(timeline, key=lambda t: t['negative_strength'] or 0)

    # Trend: compare last 4 weeks vs prior 4 weeks
    recent = timeline[-4:] if len(timeline) >= 4 else timeline
    prior = timeline[-8:-4] if len(timeline) >= 8 else []

    recent_avg = np.mean([t['negative_strength'] or 0 for t in recent])
    # Without a prior window the comparison defaults to 'stable'
    prior_avg = np.mean([t['negative_strength'] or 0 for t in prior]) if prior else recent_avg

    if recent_avg < prior_avg * 0.7:
        trend = 'improving'
    elif recent_avg > prior_avg * 1.3:
        trend = 'worsening'
    else:
        trend = 'stable'

    return {
        'issue': {
            'issue_id': issue['issue_id'],
            'code': issue['primary_subcode'],
            'name': URT_CODE_NAMES.get(issue['primary_subcode'], issue['primary_subcode'])
        },
        'timeline': [
            {
                'period': t['period'].isoformat(),
                'strength': t['negative_strength'] or 0,
                'count': t['negative_count'],
                # float() because the database may return AVG() as Decimal
                'avg_intensity': round(float(t['avg_intensity']), 2)
                                 if t['avg_intensity'] is not None else None,
                'cr_signals': {
                    'better': t['cr_better'],
                    'worse': t['cr_worse'],
                    'same': t['cr_same']
                }
            }
            for t in timeline
        ],
        'summary': {
            'total_strength': total_strength,
            'peak_period': peak['period'].isoformat(),
            'peak_strength': peak['negative_strength'] or 0,
            'trend': trend
        }
    }
```

### 3.9 Timeline Data Model (Chart-Ready)

```typescript
// TypeScript interface for frontend consumption
interface IssueTimelinePoint {
  period: string;          // ISO date "2026-01-06"
  strength: number;        // Weighted strength score
  count: number;           // Raw review count
  avg_intensity: number | null;
  cr_signals: {
    better: number;
    worse: number;
    same: number;
  };
}

interface IssueImpactChart {
  issue: {
    issue_id: string;
    code: string;          // "J1.01"
    name: string;          // "Wait Time"
  };
  timeline: IssueTimelinePoint[];
  summary: {
    total_strength: number;
    peak_period: string;
    peak_strength: number;
    trend: 'improving' | 'worsening' | 'stable';
  };
}

// Intended for a Recharts line chart over `timeline` (x: period, y: strength).
```

---

## Part 4: Report Generation

### 4.1 Report Structure

```python
async def generate_report(business_id: str, start: date, end: date) -> dict:
    """Generate comprehensive business intelligence report."""

    # 1. Aggregate statistics by URT code
    code_stats = compute_code_statistics(business_id, start, end)

    # 2. Deep analysis of top issues (sub-clustering)
    top_issues = analyze_top_issues(business_id, code_stats, start, end)

    # 3. Strength analysis
    strengths = analyze_strengths(business_id, code_stats, start, end)

    # 4. Trend analysis (vs prior period)
    trends = compute_trends(business_id, start, end)

    # 5. Staff insights
    staff = analyze_staff_mentions(business_id, start, end)

    # 6. Open issues summary
    open_issues = get_open_issues(business_id)

    # 7. Build payload
    payload = build_report_payload(
        business_id, start, end,
        top_issues, strengths, trends, staff, open_issues
    )

    # 8. LLM narration (awaited, so this function must be async)
    narrative = await generate_narrative(payload)

    return {
        'payload': payload,
        'narrative': narrative,
        'generated_at': datetime.now().isoformat()
    }
```

### 4.2 Statistics Computation

Review-level presence with Wilson confidence intervals:

```python
import math

def compute_code_statistics(business_id: str, start: date, end: date) -> list[dict]:
    """Aggregate statistics by URT code with confidence intervals."""

    stats = db.query("""
        WITH review_codes AS (
            SELECT review_id, urt_primary as code, valence, intensity
            FROM reviews
            WHERE business_id = %s AND date BETWEEN %s AND %s

            UNION ALL

            SELECT review_id, unnest(urt_secondary) as code, valence, intensity
            FROM reviews
            WHERE business_id = %s AND date BETWEEN %s AND %s
              AND array_length(urt_secondary, 1) > 0
        ),
        code_stats AS (
            SELECT
                code,
                COUNT(DISTINCT review_id) as k,
                COUNT(DISTINCT review_id) FILTER (WHERE valence = 'V-') as k_neg,
                COUNT(DISTINCT review_id) FILTER (WHERE valence = 'V+') as k_pos,
                MAX(CASE intensity WHEN 'I3' THEN 3 WHEN 'I2' THEN 2 ELSE 1 END) as max_intensity
            FROM review_codes
            GROUP BY code
        )
        SELECT
            cs.*,
            (SELECT COUNT(DISTINCT review_id) FROM reviews
             WHERE business_id = %s AND date BETWEEN %s AND %s) as n
        FROM code_stats cs
        WHERE k >= 3
        ORDER BY k_neg DESC
    """, [business_id, start, end] * 3)  # Three (business_id, start, end) placeholder groups

    results = []
    for row in stats:
        n = row['n']

        # Wilson confidence intervals
        ci_neg = wilson_ci(row['k_neg'], n) if row['k_neg'] > 0 else (0, 0)
        ci_pos = wilson_ci(row['k_pos'], n) if row['k_pos'] > 0 else (0, 0)

        results.append({
            'code': row['code'],
            'domain': row['code'][0],
            # Fall back to the raw code when no display name is registered
            'name': URT_CODE_NAMES.get(row['code'], row['code']),
            'k': row['k'],
            'k_neg': row['k_neg'],
            'k_pos': row['k_pos'],
            'n': n,
            'rate_neg': row['k_neg'] / n if n > 0 else 0,
            'rate_pos': row['k_pos'] / n if n > 0 else 0,
            'ci_neg': ci_neg,
            'ci_pos': ci_pos,
            'max_intensity': f"I{row['max_intensity']}",
        })

    return results

def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for binomial proportion."""
    if n == 0:
        return (0.0, 0.0)

    p = k / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2*n)) / denom
    margin = z * math.sqrt((p*(1-p) + z**2/(4*n)) / n) / denom

    return (max(0, center - margin), min(1, center + margin))
```

### 4.3 Sub-Pattern Discovery (Local ML)

The key insight: **LLM gives categories, local ML reveals patterns within categories.**

```python
import hdbscan
import numpy as np
from sklearn.cluster import KMeans

def analyze_top_issues(business_id: str, code_stats: list,
                       start: date, end: date, top_k: int = 5) -> list[dict]:
    """Deep analysis of top negative codes with sub-clustering."""

    # Filter to significant negative codes: enough reviews and a CI no wider than 0.30
    issues_to_analyze = [
        cs for cs in code_stats
        if cs['k_neg'] >= 8 and cs['ci_neg'][1] - cs['ci_neg'][0] <= 0.30
    ][:top_k]

    results = []
    for code_stat in issues_to_analyze:
        code = code_stat['code']

        # Fetch all negative reviews for this code
        reviews = db.query("""
            SELECT review_id, text, embedding, intensity, quotes, date
            FROM reviews
            WHERE business_id = %s
              AND (urt_primary = %s OR %s = ANY(urt_secondary))
              AND valence IN ('V-', 'V±')
              AND date BETWEEN %s AND %s
        """, [business_id, code, code, start, end])

        # Sub-cluster to find patterns
        sub_patterns = discover_sub_patterns(reviews, code)

        results.append({
            'code': code,
            'name': code_stat['name'],
            'total_reviews': code_stat['k_neg'],
            'rate': code_stat['rate_neg'],
            'ci': code_stat['ci_neg'],
            'max_intensity': code_stat['max_intensity'],
            'sub_patterns': sub_patterns,
        })

    return results

def discover_sub_patterns(reviews: list[dict], code: str,
                          min_cluster_size: int = 3) -> list[dict]:
    """Cluster reviews within a URT code to find actionable sub-patterns."""

    if len(reviews) < min_cluster_size * 2:
        # Too few for meaningful clustering
        return [{
            'label': 'General',
            'count': len(reviews),
            'percentage': 1.0,
            'representative_quote': select_representative(reviews, code),
            'sharpest_quote': select_sharpest(reviews, code),
        }]

    embeddings = np.array([r['embedding'] for r in reviews])

    # HDBSCAN for small datasets, KMeans for larger
    if len(reviews) < 500:
        clusterer = hdbscan.HDBSCAN(
            min_cluster_size=min_cluster_size,
            min_samples=2,
            metric='euclidean'
        )
        labels = clusterer.fit_predict(embeddings)
    else:
        k = min(8, max(3, int(np.sqrt(len(reviews) / 5))))
        kmeans = KMeans(n_clusters=k, n_init=3)
        labels = kmeans.fit_predict(embeddings)

    # Group by cluster
    clusters = {}
    for review, label in zip(reviews, labels):
        if label == -1:  # Noise
            continue
        if label not in clusters:
            clusters[label] = []
        clusters[label].append(review)

    # Build sub-pattern descriptions
    patterns = []
    for label, cluster_reviews in clusters.items():
        if len(cluster_reviews) < min_cluster_size:
            continue

        cluster_embeddings = np.array([r['embedding'] for r in cluster_reviews])
        centroid = cluster_embeddings.mean(axis=0)
        centroid /= np.linalg.norm(centroid)

        patterns.append({
            'label': extract_cluster_label(cluster_reviews, code),
            'count': len(cluster_reviews),
            'percentage': len(cluster_reviews) / len(reviews),
            'representative_quote': select_representative(cluster_reviews, code, centroid),
            'sharpest_quote': select_sharpest(cluster_reviews, code),
            'avg_intensity': float(np.mean([
                {'I1': 1, 'I2': 2, 'I3': 3}[r['intensity']]
                for r in cluster_reviews
            ])),
            # For trend matching; stored as a list so the report payload
            # stays JSON-serializable
            'centroid': centroid.tolist(),
        })

    patterns.sort(key=lambda x: x['count'], reverse=True)
    return patterns[:4]  # Top 4 sub-patterns

def select_representative(reviews: list, code: str,
                          centroid: np.ndarray = None) -> str:
    """Select quote closest to centroid (most representative)."""
    if centroid is None:
        embeddings = np.array([r['embedding'] for r in reviews])
        centroid = embeddings.mean(axis=0)
        centroid /= np.linalg.norm(centroid)

    # Embeddings are unit-normalized, so the dot product is cosine similarity
    best_review = max(reviews, key=lambda r: r['embedding'] @ centroid)

    # Return the extracted quote for this code, or truncated text
    if best_review.get('quotes') and code in best_review['quotes']:
        return best_review['quotes'][code]
    return best_review['text'][:150]

def select_sharpest(reviews: list, code: str) -> str:
    """Select highest intensity quote (sharpest criticism)."""
    intensity_order = {'I3': 3, 'I2': 2, 'I1': 1}
    best_review = max(reviews, key=lambda r: intensity_order.get(r['intensity'], 0))

    if best_review.get('quotes') and code in best_review['quotes']:
        return best_review['quotes'][code]
    return best_review['text'][:150]

def extract_cluster_label(reviews: list, code: str) -> str:
    """Generate a concise label for the cluster."""
    import re
    from collections import Counter

    # Extract common phrases from quotes
    texts = []
    for r in reviews:
        if r.get('quotes') and code in r['quotes']:
            texts.append(r['quotes'][code].lower())
        else:
            texts.append(r['text'][:100].lower())

    # Find the most frequent two-word phrase
    all_text = ' '.join(texts)
    words = re.findall(r'\b[a-z]{3,}\b', all_text)

    # Bigrams, skip stopwords
    stopwords = {'the', 'and', 'was', 'were', 'for', 'that', 'this', 'with', 'but', 'have', 'had'}
    bigrams = [
        f"{words[i]} {words[i+1]}"
        for i in range(len(words)-1)
        if words[i] not in stopwords and words[i+1] not in stopwords
    ]

    counts = Counter(bigrams)
    if counts:
        label = counts.most_common(1)[0][0]
        return label.title()

    return URT_CODE_NAMES.get(code, "General")
```

### 4.4 Trend Analysis

Combine rate comparison with CR signals:

```python
from datetime import date, timedelta

def compute_trends(business_id: str, current_start: date,
                   current_end: date) -> dict:
    """Compute trends vs prior period using rates and CR signals."""

    # The prior period has the same length and ends where the current one starts
    period_length = (current_end - current_start).days
    prior_start = current_start - timedelta(days=period_length)
    prior_end = current_start

    # Current period stats
    current_stats = compute_code_statistics(business_id, current_start, current_end)
    current_map = {cs['code']: cs for cs in current_stats}

    # Prior period stats
    prior_stats = compute_code_statistics(business_id, prior_start, prior_end)
    prior_map = {cs['code']: cs for cs in prior_stats}

    # CR signals in current period
    cr_signals = db.query("""
        SELECT urt_primary as code, comparative, COUNT(*) as count
        FROM reviews
        WHERE business_id = %s
          AND date BETWEEN %s AND %s
          AND comparative != 'CR-N'
        GROUP BY urt_primary, comparative
    """, [business_id, current_start,
          current_end])

    cr_map = {}
    for row in cr_signals:
        if row['code'] not in cr_map:
            cr_map[row['code']] = {'CR-B': 0, 'CR-W': 0, 'CR-S': 0}
        cr_map[row['code']][row['comparative']] = row['count']

    # Compute trends
    trends = {}
    for code, current in current_map.items():
        prior = prior_map.get(code, {'rate_neg': 0, 'rate_pos': 0, 'k_neg': 0, 'k_pos': 0})
        cr = cr_map.get(code, {'CR-B': 0, 'CR-W': 0, 'CR-S': 0})

        # Rate-based trend (issues)
        rate_trend_neg = current['rate_neg'] - prior['rate_neg']

        # CR-enhanced trend signal: explicit CR signals take precedence over rate changes
        if cr['CR-W'] >= 2:
            trend_signal = 'worsening'
        elif cr['CR-B'] >= 2:
            trend_signal = 'improving'
        elif cr['CR-S'] >= 2:
            trend_signal = 'persistent'
        elif rate_trend_neg > 0.05:
            trend_signal = 'worsening'
        elif rate_trend_neg < -0.05:
            trend_signal = 'improving'
        else:
            trend_signal = 'stable'

        trends[code] = {
            'rate_change_neg': rate_trend_neg,
            'rate_change_pos': current['rate_pos'] - prior['rate_pos'],
            'signal': trend_signal,
            'cr_better': cr['CR-B'],
            'cr_worse': cr['CR-W'],
            'cr_same': cr['CR-S'],
        }

    return trends
```

### 4.5 Staff Analysis

```python
from collections import Counter

def analyze_staff_mentions(business_id: str, start: date, end: date) -> dict:
    """Aggregate staff performance from mentions."""

    staff_data = db.query("""
        SELECT
            unnest(staff_mentions) as staff_name,
            valence,
            intensity,
            urt_primary,
            quotes
        FROM reviews
        WHERE business_id = %s
          AND date BETWEEN %s AND %s
          AND array_length(staff_mentions, 1) > 0
    """, [business_id, start, end])

    staff_map = {}
    for row in staff_data:
        name = row['staff_name']
        if name not in staff_map:
            staff_map[name] = {
                'positive': [], 'negative': [], 'codes': Counter()
            }

        if row['valence'] in ('V+',):
            staff_map[name]['positive'].append(row)
        elif row['valence'] in ('V-',):
            staff_map[name]['negative'].append(row)

        staff_map[name]['codes'][row['urt_primary']] += 1

    # Build summary
    staff_summary = []
    for name, data in staff_map.items():
        total = len(data['positive']) + len(data['negative'])
        if total < 2:
            continue  # Need multiple mentions

        staff_summary.append({
            'name': name,
            'total_mentions': total,
            'positive': len(data['positive']),
            'negative': len(data['negative']),
            'sentiment_ratio': len(data['positive']) / total,
            'top_codes': data['codes'].most_common(3),
            'sample_praise': data['positive'][0]['quotes'] if data['positive'] else None,
            'sample_criticism': data['negative'][0]['quotes'] if data['negative'] else None,
        })

    staff_summary.sort(key=lambda x: x['total_mentions'], reverse=True)

    return {
        'staff': staff_summary,
        'top_performer': max(staff_summary, key=lambda x: x['sentiment_ratio']) if staff_summary else None,
        'needs_attention': [s for s in staff_summary if s['sentiment_ratio'] < 0.5],
    }
```

### 4.6 LLM Narrative Generation

```python
import json

NARRATIVE_PROMPT = """You are a business intelligence analyst writing an executive summary of customer feedback.

You MUST follow these rules:
1. Only state claims supported by the provided data
2. Include specific numbers (percentages, counts) for every claim
3. Do not invent or hallucinate any statistics
4. Be direct and actionable, not vague
5. Highlight the most impactful findings first

The report payload follows.
Write a concise executive summary (~300 words) covering:
- Top issues with their sub-patterns and severity
- Notable strengths
- Trend signals (improving/worsening/persistent)
- Staff highlights
- 2-3 prioritized recommendations

REPORT DATA:
{payload}"""


async def generate_narrative(payload: dict) -> str:
    """Generate executive narrative from structured payload."""
    response = await llm.chat(
        model="gpt-4o",  # Use stronger model for narrative
        messages=[
            {"role": "system", "content": NARRATIVE_PROMPT.format(
                payload=json.dumps(payload, indent=2)
            )}
        ],
        temperature=0.3
    )
    return response.content
```

---

## Part 5: Report Output Example

### 5.1 Structured Payload

```json
{
  "business_id": "rest_12345",
  "period": "2026-01-01 to 2026-01-31",
  "total_reviews": 234,
  "issues": [
    {
      "code": "J1.01",
      "name": "Wait Time",
      "total_reviews": 47,
      "rate": 0.201,
      "ci": [0.153, 0.258],
      "max_intensity": "I3",
      "trend": {
        "signal": "worsening",
        "cr_worse": 3,
        "rate_change": 0.042
      },
      "sub_patterns": [
        {
          "label": "Table Seating",
          "count": 20,
          "percentage": 0.426,
          "representative_quote": "Waited 45 minutes for a table even with reservation",
          "sharpest_quote": "HOUR wait on a Tuesday. Unacceptable."
        },
        {
          "label": "Food After Ordering",
          "count": 15,
          "percentage": 0.319,
          "representative_quote": "Food took 40 minutes after we ordered",
          "sharpest_quote": "Over an hour for cold pasta"
        },
        {
          "label": "Check Payment",
          "count": 12,
          "percentage": 0.255,
          "representative_quote": "Had to flag someone down just to pay",
          "sharpest_quote": "20 minutes for the check, ridiculous"
        }
      ]
    }
  ],
  "strengths": [
    {
      "code": "O2.02",
      "name": "Craftsmanship",
      "total_reviews": 89,
      "rate": 0.380,
      "ci": [0.318, 0.446],
      "trend": {"signal": "stable"},
      "representative_quote": "The pasta is clearly made fresh, incredible quality"
    }
  ],
  "staff": {
    "top_performer": {
      "name": "Maria",
      "mentions": 12,
      "sentiment_ratio": 0.917
    },
    "needs_attention": [
      {
        "name": "Tom",
        "mentions": 8,
        "sentiment_ratio": 0.375,
        "top_issues": ["P1.02", "P3.01"]
      }
    ]
  },
  "open_issues": [
    {
      "issue_id": "ISSUE-2026-0142",
      "code": "J1.01",
      "state": "IN_PROGRESS",
      "priority": 7.45,
      "days_open": 12
    }
  ]
}
```

### 5.2 Generated Narrative

> **Executive Summary: January 2026**
>
> Analysis of 234 reviews reveals **wait times as the critical issue**, affecting 20.1% of customers (95% CI: 15.3%-25.8%), a worsening trend with 3 explicit "worse than before" signals this month.
>
> **Wait Time Breakdown:**
> - **Seating delays (43%)**: Customers report 30-60 minute waits despite reservations. *"Waited 45 minutes for a table even with reservation."*
> - **Kitchen delays (32%)**: Food taking 40+ minutes after ordering. *"Over an hour for cold pasta."*
> - **Checkout friction (26%)**: Difficulty getting the check. *"20 minutes for the check, ridiculous."*
>
> **Strengths remain strong**: Food craftsmanship praised in 38% of reviews, stable month-over-month. *"The pasta is clearly made fresh, incredible quality."*
>
> **Staff Notes**: Maria received 12 mentions with 92% positive sentiment. Tom (8 mentions, 38% positive) shows patterns in P1.02 (Respect) and P3.01 (Attentiveness); recommend coaching session.
>
> **Prioritized Recommendations:**
> 1. **Immediate**: Audit reservation system; seating bottleneck is primary wait issue
> 2. **This Week**: Review kitchen workflow for food delivery timing
> 3. **This Month**: Implement checkout process training (e.g., table check-in rotation)
>
> One high-priority issue (ISSUE-2026-0142) is in progress with 12 days elapsed.

---

## Part 6: Cost Model

| Stage | When | Cost | Notes |
|-------|------|------|-------|
| **Embedding** | Per review ingested | $0.00 | Local model, ~50ms/review |
| **LLM Classification** | Per review ingested | ~$0.0002 | GPT-4o-mini, batched |
| **Issue Aggregation** | Per V- review | $0.00 | SQL queries |
| **Sub-Clustering** | Per report | $0.00 | HDBSCAN/KMeans, <1s |
| **Trend Analysis** | Per report | $0.00 | SQL + computation |
| **LLM Narrative** | Per report | ~$0.15 | GPT-4o, single call |

**Total Costs:**

| Volume | Monthly Ingest | Reports (10/month) | Total |
|--------|---------------|-------------------|-------|
| 1K reviews | $0.20 | $1.50 | **$1.70** |
| 10K reviews | $2.00 | $1.50 | **$3.50** |
| 100K reviews | $20.00 | $1.50 | **$21.50** |

---

## Part 7: Implementation Checklist

### Phase 1: Core Pipeline
- [ ] Set up PostgreSQL with pgvector extension
- [ ] Implement embedding generation (multilingual-e5-small)
- [ ] Build LLM classification module with batching
- [ ] Create review ingestion pipeline
- [ ] Implement URT code reference data

### Phase 2: Issue Lifecycle
- [ ] Implement issue aggregation logic
- [ ] Build state machine with transitions
- [ ] Create priority scoring function
- [ ] Add CR signal processing for verification
- [ ] Set up issue event logging
- [ ] Implement strength score aggregation
- [ ] Build issue review drill-down query
- [ ] Create impact timeline aggregation
- [ ] Set up issue_timeseries table and population

### Phase 3: Report Generation
- [ ] Build statistics aggregation queries
- [ ] Implement sub-pattern clustering
- [ ] Add trend analysis with CR integration
- [ ] Create staff analysis module
- [ ] Build narrative generation prompt

### Phase 4: Integration
- [ ] API endpoints for ingestion
- [ ] Report generation endpoint
- [ ] Issue management endpoints
- [ ] Issue timeline chart endpoint
- [ ] Issue review table endpoint
- [ ] Dashboard queries
- [ ] Alert/notification hooks

---

## Part 8: Key Innovations

| Innovation | Benefit |
|------------|---------|
| **LLM at ingest, not report** | Accurate classification amortized across all reports |
| **URT as structure** | Stable, interpretable categories; no clustering drift |
| **Multi-coding** | Handle complex reviews without fragmentation |
| **Sub-clustering within codes** | Actionable patterns beyond category level |
| **CR for verification** | Automatic resolution validation from customer feedback |
| **Review as unit** | Preserve context; avoid embedding quality loss |
| **Issue lifecycle** | Operational tracking with statistical rigor |
| **Strength score** | Unified impact metric: volume × intensity |
| **Impact timeline** | Time-series visualization for trend analysis |
| **Issue drill-down** | Full review table for any aggregated issue |

---

## Document Control

| Field | Value |
|-------|-------|
| **Document** | ReviewIQ Architecture v3.0 |
| **Status** | Specification Complete |
| **Date** | 2026-01-24 |
| **Dependencies** | URT Specification v5.1, Issue Lifecycle Framework C1 |
| **Cost Target** | <$25/month at 100K reviews |
| **Accuracy Target** | >90% URT classification, >85% sub-pattern relevance |

### Changelog v3.0

| Addition | Description |
|----------|-------------|
| **3.6 Issue Review Drill-Down** | Query to fetch all reviews for a specific issue |
| **3.7 Strength Score Aggregation** | Unified metric: count × intensity weight |
| **3.8 Impact Timeline** | Time-series aggregation for line charts |
| **3.9 Timeline Data Model** | TypeScript interface for frontend charts |
| **issue_timeseries table** | Persistent time-bucketed aggregation |

---

*End of ReviewIQ Architecture v3.0*