From ee596c7969e53a95a355ffd7ff34009f1af0ae26 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Alejandro=20Guti=C3=A9rrez?= <35082514+alezmad@users.noreply.github.com>
Date: Sat, 31 Jan 2026 23:24:57 +0000
Subject: [PATCH] docs(reputation-report): Add comprehensive pipeline documentation

Documents:
- Data flow and architecture
- CLI options and usage
- Output schema with examples
- Scoring formulas
- Production guardrails
- Thresholds and domain mapping
- Testing instructions

Co-Authored-By: Claude Opus 4.5
---
 .../docs/REPUTATION_REPORT.md | 387 ++++++++++++++++++
 1 file changed, 387 insertions(+)
 create mode 100644 packages/reviewiq-pipeline/docs/REPUTATION_REPORT.md

diff --git a/packages/reviewiq-pipeline/docs/REPUTATION_REPORT.md b/packages/reviewiq-pipeline/docs/REPUTATION_REPORT.md
new file mode 100644
index 0000000..ed32b4c
--- /dev/null
+++ b/packages/reviewiq-pipeline/docs/REPUTATION_REPORT.md
@@ -0,0 +1,387 @@
+# Reputation Report Pipeline
+
+**Version:** 1.0 (v8)
+**Status:** Production-ready
+**Location:** `packages/reviewiq-pipeline/scripts/reputation_report.py`
+
+## Overview
+
+The Reputation Report generates business-facing, time-windowed reputation analytics from classified review spans.
+It produces a €50-value report suitable for SMB business owners, including:
+
+- Overall performance score (-100 to 100 scale)
+- Domain and primitive breakdowns
+- Positive and negative drivers with evidence
+- Time comparisons (current vs previous period)
+- Sector benchmarks
+- Timeline visualization data
+- LLM-generated executive summary
+
+## Quick Start
+
+```bash
+# Basic usage (last 365 days)
+python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365
+
+# With output file
+python scripts/reputation_report.py --business "Business Name" --days 30 --output report.json
+
+# Custom date range
+python scripts/reputation_report.py --business "Business Name" --start 2025-01-01 --end 2025-12-31
+
+# Production mode (fail if LLM summary fails)
+python scripts/reputation_report.py --business "Business Name" --days 365 --require-summary
+```
+
+## CLI Options
+
+| Option | Description | Default |
+|--------|-------------|---------|
+| `--business` | Business ID or search pattern (required) | - |
+| `--days` | Last N days to analyze | 30 |
+| `--start` | Window start (ISO-8601) | - |
+| `--end` | Window end (ISO-8601) | - |
+| `--run-id` | Specific run ID (overrides time window) | - |
+| `--timezone` | IANA timezone for window | UTC |
+| `--output, -o` | Output file path | stdout |
+| `--quiet, -q` | Suppress console summary | false |
+| `--no-summary` | Disable executive summary | false |
+| `--require-summary` | Exit code 2 if LLM fails | false |
+| `--summary-model` | LLM model for summary | gpt-4o-mini |
+
+## Data Flow
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                              INPUT                              │
+├─────────────────────────────────────────────────────────────────┤
+│  detected_spans_v2 ←──JOIN──→ review_facts_v1                   │
+│  (primitives, valence,        (review_time_utc, rating,         │
+│   confidence, intensity)       business_id)                     │
+└─────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                         SPAN SELECTION                          │
+├─────────────────────────────────────────────────────────────────┤
+│  Mode: time_window                                              │
+│    → Filter by review_time_utc in [start, end)                  │
+│    → Join on (review_id, business_id) for data isolation        │
+│                                                                 │
+│  Mode: latest_run                                               │
+│    → Filter by run_id                                           │
+└─────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                           COMPUTATION                           │
+├─────────────────────────────────────────────────────────────────┤
+│  1. Population stats (review count, language distribution)      │
+│  2. Overall score: 100 × Σ(valence × conf × intensity) / Σ(...) │
+│  3. Domain scores (O/P/J/E/V weighted averages)                 │
+│  4. Primitive scores (per-primitive breakdown)                  │
+│  5. Drivers (impact = weighted share of total)                  │
+│  6. Alerts (SAFETY, UNMAPPED thresholds)                        │
+│  7. Recommendations (templated playbooks)                       │
+└─────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                        TIME COMPARISONS                         │
+├─────────────────────────────────────────────────────────────────┤
+│  Previous Window:                                               │
+│    → Same duration, immediately preceding current               │
+│    → Requires MIN_REVIEWS_FOR_COMPARISON (10)                   │
+│    → Requires MIN_COVERAGE_FOR_COMPARISON (80%)                 │
+│                                                                 │
+│  Sector Benchmark:                                              │
+│    → Requires 500+ spans, 3+ businesses in sector               │
+│    → Status: ok | insufficient_data | missing_sector_code       │
+└─────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                        EXECUTIVE SUMMARY                        │
+├─────────────────────────────────────────────────────────────────┤
+│  LLM-generated (gpt-4o-mini) with narrative guardrails:         │
+│                                                                 │
+│  Weakness Priority:                                             │
+│  1. Negative driver (if drivers.negatives non-empty)            │
+│  2. Qualifying dip (within 90d, review_count ≥ 3)               │
+│  3. None ("no persistent weaknesses surfaced")                  │
+│                                                                 │
+│  Guardrails:                                                    │
+│  - No "recent dip" + "no major issues" contradiction            │
+│  - Most recent qualifying dip if multiple exist                 │
+│  - Action must tie to cited weakness or top positive            │
+│                                                                 │
+│  Fallback: Deterministic summary if LLM unavailable             │
+└─────────────────────────────────────────────────────────────────┘
+                                 │
+                                 ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                             OUTPUT                              │
+├─────────────────────────────────────────────────────────────────┤
+│  JSON Report (schema_version: 1.0)                              │
+│  - business, window, population                                 │
+│  - scores (overall, domains, primitives)                        │
+│  - drivers (positives, negatives with evidence)                 │
+│  - alerts, recommendations                                      │
+│  - comparisons (previous_window, sector_benchmark)              │
+│  - timeline (granularity, points)                               │
+│  - executive_summary, executive_summary_meta                    │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+## Output Schema
+
+### Top-Level Fields
+
+```json
+{
+  "schema_version": "1.0",
+  "report_id": "uuid",
+  "primary_run_id": "uuid | null",
+  "generated_at": "2026-01-31T12:00:00Z",
+  "window": { "start", "end", "timezone", "mode" },
+  "business": { "business_id", "sector_code", "gbp_path" },
+  "population": { ... },
+  "scores": { "overall", "domains", "primitives" },
+  "drivers": { "positives", "negatives" },
+  "alerts": [ ... ],
+  "recommendations": [ ...
+  ],
+  "comparisons": { "previous_window", "sector_benchmark" },
+  "timeline": { "granularity", "points" },
+  "executive_summary": "string | null",
+  "executive_summary_meta": { "enabled", "generated", "model", "error", "fallback_used" }
+}
+```
+
+### Scores Structure
+
+```json
+{
+  "overall": {
+    "score": 85.3,
+    "score_domain_weighted": 85.7,
+    "positive_share": 0.897,
+    "negative_share": 0.077,
+    "mixed_share": 0.013,
+    "neutral_share": 0.013
+  },
+  "domains": {
+    "O": { "score": 100.0, "volume": 10 },
+    "P": { "score": 86.2, "volume": 17 },
+    "J": { "score": -23.4, "volume": 5 },
+    "E": { "score": 94.8, "volume": 35 },
+    "V": { "score": 100.0, "volume": 10 }
+  },
+  "primitives": {
+    "VALUE_FOR_MONEY": {
+      "domain": "V",
+      "score": 100.0,
+      "volume": 10,
+      "valence_counts": { "+": 10, "-": 0, "0": 0, "±": 0 },
+      "top_entities": [ ... ]
+    }
+  }
+}
+```
+
+### Driver Structure
+
+```json
+{
+  "positives": [
+    {
+      "primitive": "VALUE_FOR_MONEY",
+      "impact": 0.147,
+      "summary": "Positive V/VALUE_FOR_MONEY mentions.",
+      "evidence": [
+        {
+          "review_id": "abc123",
+          "language": "en",
+          "span_text": "the prices are super affordable.",
+          "valence": "+",
+          "intensity": 2,
+          "confidence": 0.9
+        }
+      ]
+    }
+  ],
+  "negatives": [ ... ]
+}
+```
+
+### Timeline Structure
+
+```json
+{
+  "granularity": "month",
+  "points": [
+    {
+      "bucket_start_utc": "2025-12-01T00:00:00Z",
+      "review_count": 8,
+      "span_count": 25,
+      "positive_count": 15,
+      "negative_count": 8,
+      "avg_rating": 2.88,
+      "strength_score": -32.6
+    }
+  ]
+}
+```
+
+## Production Guardrails
+
+### Data Isolation
+
+All queries join `detected_spans_v2` with `review_facts_v1` on **both** `review_id` AND `business_id` to prevent cross-business contamination:
+
+```sql
+JOIN pipeline.review_facts_v1 f
+  ON f.review_id = s.review_id
+  AND f.business_id = s.business_id
+```
+
+### Score Consistency
+
+An invariant check ensures `scores.overall.score` matches `comparisons.previous_window.scores.overall.current`.
+If delta > 1.0, an `internal_inconsistency` alert is emitted.
+
+### Executive Summary Meta
+
+```json
+{
+  "enabled": true,
+  "generated": true,
+  "model": "gpt-4o-mini",
+  "error": null,
+  "generated_at": "2026-01-31T12:00:00Z",
+  "fallback_used": false
+}
+```
+
+- `enabled`: Whether summary generation was requested
+- `generated`: Whether LLM successfully produced a summary
+- `error`: Error message if generation failed
+- `fallback_used`: Whether deterministic fallback was used
+
+### Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| 0 | Success |
+| 1 | Business not found or no spans |
+| 2 | `--require-summary` and LLM failed |
+
+## Scoring Formula
+
+### Overall Score
+
+Same formula as `PERIOD_SCORES_QUERY` for consistency:
+
+```
+score = 100 × Σ(valence × confidence × intensity) / Σ(confidence × intensity)
+```
+
+Where:
+- valence: +1 for positive, -1 for negative, 0 for neutral/mixed
+- confidence: 0.0 to 1.0
+- intensity: 1 to 3
+
+### Domain-Weighted Score
+
+Alternative metric (exposed as `score_domain_weighted`):
+
+```
+score = Σ(domain_score × domain_volume) / Σ(domain_volume)
+```
+
+### Primitive Score
+
+```
+score = 100 × Σ(w × valence_num) / Σ(w)
+w = confidence × (0.75 + 0.25×(detail-1)) × (0.8 + 0.2×(intensity-1))
+```
+
+## Thresholds
+
+| Threshold | Value | Purpose |
+|-----------|-------|---------|
+| MIN_REVIEWS_FOR_COMPARISON | 10 | Minimum reviews per period for trend |
+| MIN_COVERAGE_FOR_COMPARISON | 0.80 | Minimum review_time coverage |
+| Sector benchmark spans | 500 | Minimum sector spans for benchmark |
+| Sector benchmark businesses | 3 | Minimum businesses in sector |
+| UNMAPPED rate warn | 0.10 | Alert if >10% unmapped |
+| UNMAPPED rate critical | 0.15 | Critical alert if >15% unmapped |
+| SAFETY negative warn | 0.05 | Alert if >5% SAFETY negative |
+| SAFETY negative critical | 0.10 | Critical alert if >10% SAFETY negative |
+| Dip recency | 90 days | Maximum age for "recent" dip |
+| Dip volume | 3 reviews | Minimum reviews to qualify as dip |
+
+## Domain Mapping
+
+| Domain | Code | Primitives |
+|--------|------|------------|
+| Output/Product | O | TASTE, CRAFT, FRESHNESS, TEMPERATURE, EFFECTIVENESS, ACCURACY, CONDITION, CONSISTENCY |
+| People/Service | P | MANNER, COMPETENCE, ATTENTIVENESS, COMMUNICATION |
+| Journey/Process | J | SPEED, FRICTION, RELIABILITY, AVAILABILITY |
+| Environment | E | CLEANLINESS, COMFORT, SAFETY, AMBIANCE, ACCESSIBILITY, DIGITAL_UX |
+| Value | V | PRICE_LEVEL, PRICE_FAIRNESS, PRICE_TRANSPARENCY, VALUE_FOR_MONEY |
+| Meta | meta | HONESTY, ETHICS, PROMISES, ACKNOWLEDGMENT, RESPONSE_QUALITY, RECOVERY, RETURN_INTENT, RECOMMEND, RECOGNITION, UNMAPPED, NON_INFORMATIVE |
+
+## Testing
+
+```bash
+cd packages/reviewiq-pipeline
+python -m pytest tests/test_executive_summary.py -v
+```
+
+16 tests covering:
+- Negative driver priority over dips
+- Qualifying dip selection (90 days + review_count ≥ 3)
+- Most recent dip when multiple qualify
+- Contradiction detection (dip + "no major issues")
+- Non-qualifying dips not cited as "recent"
+- Summary input construction
+
+## Environment Variables
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| DATABASE_URL | Yes | PostgreSQL connection string |
+| OPENAI_API_KEY | No | Required for LLM summary (fallback used if missing) |
+
+## Example Output
+
+```bash
+$ python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365 --quiet
+
+Report written to stdout
+
+============================================================
+REPUTATION REPORT: Go Karts Mar Menor
+============================================================
+Window: 2025-01-31T12:00:00Z - 2026-01-31T12:00:00Z
+Reviews: 27
+Content spans: 78
+Overall score: 85.3
+Positive share: 89.7%
+Negative share: 7.7%
+
+Top positive drivers:
+  VALUE_FOR_MONEY: 14.7% impact
+  RECOMMEND: 14.5% impact
+  MANNER: 13.5% impact
+
+Top negative drivers:
+============================================================
+```
+
+## Changelog
+
+### v8 (2026-01-31)
+- Initial production release
+- Cross-business join safety
+- Score formula alignment
+- Executive summary with narrative guardrails
+- Comprehensive test suite
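
Note on the "Scoring Formula" section: the three formulas documented in this patch can be checked by hand with a minimal Python sketch. It is not taken from `reputation_report.py`. The `Span` dataclass, the `VALENCE_NUM` mapping, and all function names are hypothetical, and `detail` is assumed to start at 1 because the document does not state its range.

```python
from dataclasses import dataclass

# Hypothetical mapping from the valence symbols used in the report's
# evidence entries to the numeric values listed under "Overall Score"
# (+1 positive, -1 negative, 0 neutral/mixed).
VALENCE_NUM = {"+": 1.0, "-": -1.0, "0": 0.0, "±": 0.0}


@dataclass
class Span:
    """Illustrative span record; field names mirror the evidence schema."""
    valence: str
    confidence: float  # 0.0 to 1.0
    intensity: int     # 1 to 3
    detail: int = 1    # assumption: starts at 1, range not documented


def overall_score(spans):
    """100 * sum(valence * conf * intensity) / sum(conf * intensity)."""
    den = sum(s.confidence * s.intensity for s in spans)
    if den == 0:
        return None  # no weighted evidence, so no score
    num = sum(VALENCE_NUM[s.valence] * s.confidence * s.intensity for s in spans)
    return 100.0 * num / den


def primitive_weight(span):
    """w = confidence * (0.75 + 0.25*(detail-1)) * (0.8 + 0.2*(intensity-1))."""
    return (span.confidence
            * (0.75 + 0.25 * (span.detail - 1))
            * (0.8 + 0.2 * (span.intensity - 1)))


def primitive_score(spans):
    """100 * sum(w * valence_num) / sum(w), with w from primitive_weight."""
    den = sum(primitive_weight(s) for s in spans)
    if den == 0:
        return None
    num = sum(primitive_weight(s) * VALENCE_NUM[s.valence] for s in spans)
    return 100.0 * num / den


def domain_weighted_score(domains):
    """domains maps a domain code to {"score": float, "volume": int}."""
    total = sum(d["volume"] for d in domains.values())
    if total == 0:
        return None
    return sum(d["score"] * d["volume"] for d in domains.values()) / total


if __name__ == "__main__":
    sample = [Span("+", 0.9, 2), Span("-", 0.8, 1), Span("0", 1.0, 1)]
    print(round(overall_score(sample), 1))
    print(round(primitive_score(sample), 1))
```

Because valence is signed, a set of spans that is entirely positive scores 100 and one that is entirely negative scores -100 under these formulas; neutral and mixed spans only add to the denominator.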