docs(reputation-report): Add comprehensive pipeline documentation

Documents:
- Data flow and architecture
- CLI options and usage
- Output schema with examples
- Scoring formulas
- Production guardrails
- Thresholds and domain mapping
- Testing instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 23:24:57 +00:00
parent 5324542e72
commit ee596c7969
1 changed files with 387 additions and 0 deletions
--- a/packages/reviewiq-pipeline/docs/REPUTATION_REPORT.md
+++ b/packages/reviewiq-pipeline/docs/REPUTATION_REPORT.md
@@ -0,0 +1,387 @@
+# Reputation Report Pipeline
+
+**Version:** 1.0 (v8)
+**Status:** Production-ready
+**Location:** `packages/reviewiq-pipeline/scripts/reputation_report.py`
+
+## Overview
+
+The Reputation Report generates business-facing, time-windowed reputation analytics from classified review spans. It produces a report, positioned as a €50-value product for SMB owners, that includes:
+
+- Overall performance score (-100 to 100, per the scoring formula)
+- Domain and primitive breakdowns
+- Positive and negative drivers with evidence
+- Time comparisons (current vs previous period)
+- Sector benchmarks
+- Timeline visualization data
+- LLM-generated executive summary
+
+## Quick Start
+
+```bash
+# Basic usage (last 365 days)
+python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365
+
+# With output file
+python scripts/reputation_report.py --business "Business Name" --days 30 --output report.json
+
+# Custom date range
+python scripts/reputation_report.py --business "Business Name" --start 2025-01-01 --end 2025-12-31
+
+# Production mode (fail if LLM summary fails)
+python scripts/reputation_report.py --business "Business Name" --days 365 --require-summary
+```
+
+## CLI Options
+
+| Option | Description | Default |
+|--------|-------------|---------|
+| `--business` | Business ID or search pattern (required) | - |
+| `--days` | Last N days to analyze | 30 |
+| `--start` | Window start (ISO-8601) | - |
+| `--end` | Window end (ISO-8601) | - |
+| `--run-id` | Specific run ID (overrides time window) | - |
+| `--timezone` | IANA timezone for window | UTC |
+| `--output`, `-o` | Output file path | stdout |
+| `--quiet`, `-q` | Suppress console summary | false |
+| `--no-summary` | Disable executive summary | false |
+| `--require-summary` | Exit code 2 if LLM fails | false |
+| `--summary-model` | LLM model for summary | gpt-4o-mini |
+
+## Data Flow
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                         INPUT                                    │
+├─────────────────────────────────────────────────────────────────┤
+│  detected_spans_v2  ←──JOIN──→  review_facts_v1                 │
+│  (primitives, valence,         (review_time_utc, rating,        │
+│   confidence, intensity)        business_id)                     │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                    SPAN SELECTION                                │
+├─────────────────────────────────────────────────────────────────┤
+│  Mode: time_window                                               │
+│    → Filter by review_time_utc in [start, end)                  │
+│    → Join on (review_id, business_id) for data isolation        │
+│                                                                  │
+│  Mode: latest_run                                                │
+│    → Filter by run_id                                           │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                     COMPUTATION                                  │
+├─────────────────────────────────────────────────────────────────┤
+│  1. Population stats (review count, language distribution)       │
+│  2. Overall score: 100 × Σ(valence × conf × intensity) / Σ(...)│
+│  3. Domain scores (O/P/J/E/V weighted averages)                 │
+│  4. Primitive scores (per-primitive breakdown)                   │
+│  5. Drivers (impact = weighted share of total)                   │
+│  6. Alerts (SAFETY, UNMAPPED thresholds)                        │
+│  7. Recommendations (templated playbooks)                        │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                   TIME COMPARISONS                               │
+├─────────────────────────────────────────────────────────────────┤
+│  Previous Window:                                                │
+│    → Same duration, immediately preceding current                │
+│    → Requires MIN_REVIEWS_FOR_COMPARISON (10)                   │
+│    → Requires MIN_COVERAGE_FOR_COMPARISON (80%)                 │
+│                                                                  │
+│  Sector Benchmark:                                               │
+│    → Requires 500+ spans, 3+ businesses in sector               │
+│    → Status: ok | insufficient_data | missing_sector_code       │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                  EXECUTIVE SUMMARY                               │
+├─────────────────────────────────────────────────────────────────┤
+│  LLM-generated (gpt-4o-mini) with narrative guardrails:         │
+│                                                                  │
+│  Weakness Priority:                                              │
+│    1. Negative driver (if drivers.negatives non-empty)          │
+│    2. Qualifying dip (within 90d, review_count ≥ 3)             │
+│    3. None ("no persistent weaknesses surfaced")                │
+│                                                                  │
+│  Guardrails:                                                     │
+│    - No "recent dip" + "no major issues" contradiction          │
+│    - Most recent qualifying dip if multiple exist               │
+│    - Action must tie to cited weakness or top positive          │
+│                                                                  │
+│  Fallback: Deterministic summary if LLM unavailable             │
+└─────────────────────────────────────────────────────────────────┘
+                              │
+                              ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                        OUTPUT                                    │
+├─────────────────────────────────────────────────────────────────┤
+│  JSON Report (schema_version: 1.0)                              │
+│    - business, window, population                                │
+│    - scores (overall, domains, primitives)                       │
+│    - drivers (positives, negatives with evidence)               │
+│    - alerts, recommendations                                     │
+│    - comparisons (previous_window, sector_benchmark)            │
+│    - timeline (granularity, points)                             │
+│    - executive_summary, executive_summary_meta                   │
+└─────────────────────────────────────────────────────────────────┘
+```
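
The previous-window rule in the diagram above can be sketched as follows. This is a minimal illustration, not the pipeline's code: the function names `previous_window` and `comparison_allowed` are hypothetical, and the sketch assumes half-open `[start, end)` windows and a coverage value expressed as a fraction between 0 and 1.

```python
from datetime import datetime, timezone

# Threshold values taken from the Thresholds table in this document.
MIN_REVIEWS_FOR_COMPARISON = 10
MIN_COVERAGE_FOR_COMPARISON = 0.80


def previous_window(start: datetime, end: datetime) -> tuple:
    """Return the window of equal duration that ends where the current one starts."""
    duration = end - start
    return start - duration, start


def comparison_allowed(review_count: int, coverage: float) -> bool:
    """Hypothetical gate: a period is comparable only if both minimums are met."""
    return (
        review_count >= MIN_REVIEWS_FOR_COMPARISON
        and coverage >= MIN_COVERAGE_FOR_COMPARISON
    )


current_start = datetime(2025, 1, 31, 12, 0, tzinfo=timezone.utc)
current_end = datetime(2026, 1, 31, 12, 0, tzinfo=timezone.utc)
prev_start, prev_end = previous_window(current_start, current_end)
print(prev_start.isoformat(), "->", prev_end.isoformat())
```

Because the previous window ends exactly at the current start, the two half-open intervals never share a review.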
+
+## Output Schema
+
+### Top-Level Fields
+
+```json
+{
+  "schema_version": "1.0",
+  "report_id": "uuid",
+  "primary_run_id": "uuid | null",
+  "generated_at": "2026-01-31T12:00:00Z",
+  "window": { "start", "end", "timezone", "mode" },
+  "business": { "business_id", "sector_code", "gbp_path" },
+  "population": { ... },
+  "scores": { "overall", "domains", "primitives" },
+  "drivers": { "positives", "negatives" },
+  "alerts": [ ... ],
+  "recommendations": [ ... ],
+  "comparisons": { "previous_window", "sector_benchmark" },
+  "timeline": { "granularity", "points" },
+  "executive_summary": "string | null",
+  "executive_summary_meta": { "enabled", "generated", "model", "error", "fallback_used" }
+}
+```
+
+### Scores Structure
+
+```json
+{
+  "overall": {
+    "score": 85.3,
+    "score_domain_weighted": 85.7,
+    "positive_share": 0.897,
+    "negative_share": 0.077,
+    "mixed_share": 0.013,
+    "neutral_share": 0.013
+  },
+  "domains": {
+    "O": { "score": 100.0, "volume": 10 },
+    "P": { "score": 86.2, "volume": 17 },
+    "J": { "score": -23.4, "volume": 5 },
+    "E": { "score": 94.8, "volume": 35 },
+    "V": { "score": 100.0, "volume": 10 }
+  },
+  "primitives": {
+    "VALUE_FOR_MONEY": {
+      "domain": "V",
+      "score": 100.0,
+      "volume": 10,
+      "valence_counts": { "+": 10, "-": 0, "0": 0, "±": 0 },
+      "top_entities": [ ... ]
+    }
+  }
+}
+```
+
+### Driver Structure
+
+```json
+{
+  "positives": [
+    {
+      "primitive": "VALUE_FOR_MONEY",
+      "impact": 0.147,
+      "summary": "Positive V/VALUE_FOR_MONEY mentions.",
+      "evidence": [
+        {
+          "review_id": "abc123",
+          "language": "en",
+          "span_text": "the prices are super affordable.",
+          "valence": "+",
+          "intensity": 2,
+          "confidence": 0.9
+        }
+      ]
+    }
+  ],
+  "negatives": [ ... ]
+}
+```
+
+### Timeline Structure
+
+```json
+{
+  "granularity": "month",
+  "points": [
+    {
+      "bucket_start_utc": "2025-12-01T00:00:00Z",
+      "review_count": 8,
+      "span_count": 25,
+      "positive_count": 15,
+      "negative_count": 8,
+      "avg_rating": 2.88,
+      "strength_score": -32.6
+    }
+  ]
+}
+```
+
+## Production Guardrails
+
+### Data Isolation
+
+All queries join `detected_spans_v2` with `review_facts_v1` on **both** `review_id` AND `business_id` to prevent cross-business contamination:
+
+```sql
+JOIN pipeline.review_facts_v1 f
+  ON f.review_id = s.review_id
+ AND f.business_id = s.business_id
+```
+
+### Score Consistency
+
+An invariant check verifies that `scores.overall.score` matches `comparisons.previous_window.scores.overall.current`. If the two values differ by more than 1.0, an `internal_inconsistency` alert is emitted.
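
A minimal sketch of such an invariant check, operating on a report dictionary shaped like the output schema. The function name `check_score_consistency`, the alert fields, and the use of an absolute difference are assumptions made for illustration; only the two field paths, the 1.0 tolerance, and the alert name come from this document.

```python
def check_score_consistency(report: dict, tolerance: float = 1.0) -> list:
    """Return a list with one alert if the two overall scores disagree."""
    overall = report["scores"]["overall"]["score"]
    previous = report.get("comparisons", {}).get("previous_window") or {}
    current = previous.get("scores", {}).get("overall", {}).get("current")

    # Nothing to compare when the previous-window comparison is absent.
    if current is None:
        return []

    delta = abs(overall - current)  # assumption: the delta is absolute
    if delta > tolerance:
        return [{
            "type": "internal_inconsistency",  # alert name from this document
            "delta": round(delta, 2),          # hypothetical field
        }]
    return []


report = {
    "scores": {"overall": {"score": 85.3}},
    "comparisons": {
        "previous_window": {"scores": {"overall": {"current": 83.0}}}
    },
}
print(check_score_consistency(report))
```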
+
+### Executive Summary Meta
+
+```json
+{
+  "enabled": true,
+  "generated": true,
+  "model": "gpt-4o-mini",
+  "error": null,
+  "generated_at": "2026-01-31T12:00:00Z",
+  "fallback_used": false
+}
+```
+
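+- `enabled`: Whether summary generation was requested
+- `generated`: Whether the LLM successfully produced a summary
+- `model`: LLM model used for the summary (see `--summary-model`)
+- `error`: Error message if generation failed, otherwise `null`
+- `generated_at`: Timestamp of summary generation
+- `fallback_used`: Whether the deterministic fallback was used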
+
+### Exit Codes
+
+| Code | Meaning |
+|------|---------|
+| 0 | Success |
+| 1 | Business not found or no spans |
+| 2 | `--require-summary` and LLM failed |
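
A caller such as a scheduler can branch on these codes. The wrapper below is a sketch under stated assumptions: `describe_exit_code` and `run_report` are hypothetical helpers, and `run_report` assumes it is started from `packages/reviewiq-pipeline` so that the relative script path resolves.

```python
import subprocess
import sys

# Meanings copied from the exit-code table in this document.
EXIT_CODE_MEANINGS = {
    0: "success",
    1: "business not found or no spans",
    2: "--require-summary was set and the LLM summary failed",
}


def describe_exit_code(code: int) -> str:
    """Map a process exit code to its documented meaning."""
    return EXIT_CODE_MEANINGS.get(code, f"undocumented exit code {code}")


def run_report(business: str, days: int = 30) -> int:
    """Run the report script in production mode and return its exit code."""
    cmd = [
        sys.executable, "scripts/reputation_report.py",
        "--business", business,
        "--days", str(days),
        "--require-summary",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(describe_exit_code(result.returncode), file=sys.stderr)
    return result.returncode
```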
+
+## Scoring Formula
+
+### Overall Score
+
+The overall score uses the same formula as `PERIOD_SCORES_QUERY`, for consistency:
+
+```
+score = 100 × Σ(valence × confidence × intensity) / Σ(confidence × intensity)
+```
+
+Where:
+- valence: +1 for positive, -1 for negative, 0 for neutral/mixed
+- confidence: 0.0 to 1.0
+- intensity: 1 to 3
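
The formula can be sketched in a few lines of Python. This is an illustration, not the pipeline's implementation: the `Span` class and `overall_score` function are hypothetical, and the mapping of the `"0"` and `"±"` symbols to neutral and mixed is an assumption based on the `valence_counts` keys in the output schema.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed symbol mapping: "+" positive, "-" negative, "0" neutral, "±" mixed.
VALENCE_NUM = {"+": 1, "-": -1, "0": 0, "±": 0}


@dataclass
class Span:
    valence: str       # one of the keys of VALENCE_NUM
    confidence: float  # 0.0 to 1.0
    intensity: int     # 1 to 3


def overall_score(spans: list) -> Optional[float]:
    """100 x sum(valence x confidence x intensity) / sum(confidence x intensity)."""
    numerator = sum(VALENCE_NUM[s.valence] * s.confidence * s.intensity for s in spans)
    denominator = sum(s.confidence * s.intensity for s in spans)
    if denominator == 0:
        return None  # no weighted spans, so the score is undefined
    return 100 * numerator / denominator


spans = [
    Span("+", 0.9, 2),  # contributes +1.8 to the numerator, 1.8 to the denominator
    Span("-", 0.8, 1),  # contributes -0.8 to the numerator, 0.8 to the denominator
    Span("0", 0.5, 1),  # contributes 0 to the numerator, 0.5 to the denominator
]
print(overall_score(spans))
```

Neutral and mixed spans add weight to the denominator only, so they pull the score toward zero rather than being ignored.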
+
+### Domain-Weighted Score
+
+Alternative metric (exposed as `score_domain_weighted`):
+
+```
+score = Σ(domain_score × domain_volume) / Σ(domain_volume)
+```
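
As a sketch, the volume-weighted average can be computed directly from a `domains` object shaped like the one in the Scores Structure example. The function name is hypothetical, and which domains the pipeline actually includes in the sum is not specified in this document.

```python
from typing import Optional


def domain_weighted_score(domains: dict) -> Optional[float]:
    """sum(domain_score x domain_volume) / sum(domain_volume)."""
    total_volume = sum(d["volume"] for d in domains.values())
    if total_volume == 0:
        return None  # no spans in any domain
    weighted = sum(d["score"] * d["volume"] for d in domains.values())
    return weighted / total_volume


# Illustrative values, not taken from a real report.
example = {
    "P": {"score": 80.0, "volume": 3},
    "J": {"score": -20.0, "volume": 1},
}
print(domain_weighted_score(example))
```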
+
+### Primitive Score
+
+```
+score = 100 × Σ(w × valence_num) / Σ(w)
+w = confidence × (0.75 + 0.25×(detail-1)) × (0.8 + 0.2×(intensity-1))
+```
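
A minimal sketch of the weighted primitive score. The names `span_weight` and `primitive_score` are hypothetical. The sketch assumes `detail` is an integer level starting at 1 (its range is not defined in this document) and that `valence_num` uses the same +1 / -1 / 0 mapping as the overall score.

```python
from typing import Optional


def span_weight(confidence: float, detail: int, intensity: int) -> float:
    """w = confidence x (0.75 + 0.25 x (detail - 1)) x (0.8 + 0.2 x (intensity - 1))."""
    return confidence * (0.75 + 0.25 * (detail - 1)) * (0.8 + 0.2 * (intensity - 1))


def primitive_score(spans: list) -> Optional[float]:
    """100 x sum(w x valence_num) / sum(w) over the spans of one primitive.

    Each span is a tuple: (valence_num, confidence, detail, intensity).
    """
    weights = [span_weight(c, d, i) for _, c, d, i in spans]
    total = sum(weights)
    if total == 0:
        return None  # no weighted spans for this primitive
    weighted = sum(w * v for w, (v, _, _, _) in zip(weights, spans))
    return 100 * weighted / total


spans = [
    (+1, 0.9, 1, 2),  # weight 0.9 x 0.75 x 1.0
    (-1, 0.8, 2, 1),  # weight 0.8 x 1.0 x 0.8
]
print(primitive_score(spans))
```

With these coefficients, a span at `detail` 1 and `intensity` 1 carries 60% of its confidence as weight, and each extra level raises the corresponding factor linearly.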
+
+## Thresholds
+
+| Threshold | Value | Purpose |
+|-----------|-------|---------|
+| MIN_REVIEWS_FOR_COMPARISON | 10 | Minimum reviews per period for trend |
+| MIN_COVERAGE_FOR_COMPARISON | 0.80 | Minimum review_time coverage |
+| Sector benchmark spans | 500 | Minimum sector spans for benchmark |
+| Sector benchmark businesses | 3 | Minimum businesses in sector |
+| UNMAPPED rate warn | 0.10 | Alert if >10% unmapped |
+| UNMAPPED rate critical | 0.15 | Critical alert if >15% unmapped |
+| SAFETY negative warn | 0.05 | Alert if >5% SAFETY negative |
+| SAFETY negative critical | 0.10 | Critical alert if >10% SAFETY negative |
+| Dip recency | 90 days | Maximum age for "recent" dip |
+| Dip volume | 3 reviews | Minimum reviews to qualify as dip |
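
The two-level alert thresholds can be applied with one small helper. This is a sketch: `classify_rate` is a hypothetical name, the comparisons are strict because the table says "if >10%", and how each rate's denominator is defined is not specified in this document.

```python
from typing import Optional

# (warn, critical) pairs copied from the Thresholds table.
UNMAPPED_THRESHOLDS = (0.10, 0.15)
SAFETY_NEGATIVE_THRESHOLDS = (0.05, 0.10)


def classify_rate(rate: float, warn: float, critical: float) -> Optional[str]:
    """Return "critical", "warn", or None for a rate between 0 and 1."""
    if rate > critical:
        return "critical"
    if rate > warn:
        return "warn"
    return None


print(classify_rate(0.12, *UNMAPPED_THRESHOLDS))
print(classify_rate(0.12, *SAFETY_NEGATIVE_THRESHOLDS))
```

The same 12% rate is a warning for UNMAPPED but critical for SAFETY, since the SAFETY thresholds are tighter.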
+
+## Domain Mapping
+
+| Domain | Code | Primitives |
+|--------|------|------------|
+| Output/Product | O | TASTE, CRAFT, FRESHNESS, TEMPERATURE, EFFECTIVENESS, ACCURACY, CONDITION, CONSISTENCY |
+| People/Service | P | MANNER, COMPETENCE, ATTENTIVENESS, COMMUNICATION |
+| Journey/Process | J | SPEED, FRICTION, RELIABILITY, AVAILABILITY |
+| Environment | E | CLEANLINESS, COMFORT, SAFETY, AMBIANCE, ACCESSIBILITY, DIGITAL_UX |
+| Value | V | PRICE_LEVEL, PRICE_FAIRNESS, PRICE_TRANSPARENCY, VALUE_FOR_MONEY |
+| Meta | meta | HONESTY, ETHICS, PROMISES, ACKNOWLEDGMENT, RESPONSE_QUALITY, RECOVERY, RETURN_INTENT, RECOMMEND, RECOGNITION, UNMAPPED, NON_INFORMATIVE |
+
+## Testing
+
+```bash
+cd packages/reviewiq-pipeline
+python -m pytest tests/test_executive_summary.py -v
+```
+
+16 tests covering:
+- Negative driver priority over dips
+- Qualifying dip selection (90 days + review_count ≥ 3)
+- Most recent dip when multiple qualify
+- Contradiction detection (dip + "no major issues")
+- Non-qualifying dips not cited as "recent"
+- Summary input construction
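
The weakness-selection behavior these tests cover can be sketched as below. It is not the tested implementation: `pick_weakness` is a hypothetical name, dips are assumed to be dictionaries with `bucket_start_utc` and `review_count` keys, recency is assumed to be measured from a reference time `as_of`, and the first entry of `negatives` is assumed to be the top negative driver.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DIP_RECENCY_DAYS = 90  # "Dip recency" threshold
DIP_MIN_REVIEWS = 3    # "Dip volume" threshold


def pick_weakness(negatives: list, dips: list, as_of: datetime) -> Optional[dict]:
    """Priority: negative driver, then most recent qualifying dip, then None."""
    if negatives:
        return {"kind": "negative_driver", "item": negatives[0]}

    cutoff = as_of - timedelta(days=DIP_RECENCY_DAYS)
    qualifying = [
        d for d in dips
        if d["bucket_start_utc"] >= cutoff and d["review_count"] >= DIP_MIN_REVIEWS
    ]
    if qualifying:
        latest = max(qualifying, key=lambda d: d["bucket_start_utc"])
        return {"kind": "dip", "item": latest}

    return None  # "no persistent weaknesses surfaced"


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
dips = [
    {"bucket_start_utc": now - timedelta(days=200), "review_count": 9},  # too old
    {"bucket_start_utc": now - timedelta(days=10), "review_count": 2},   # too few reviews
    {"bucket_start_utc": now - timedelta(days=60), "review_count": 5},   # qualifies
    {"bucket_start_utc": now - timedelta(days=30), "review_count": 4},   # qualifies, newest
]
print(pick_weakness([], dips, now))
```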
+
+## Environment Variables
+
+| Variable | Required | Description |
+|----------|----------|-------------|
+| DATABASE_URL | Yes | PostgreSQL connection string |
+| OPENAI_API_KEY | No | Required for LLM summary (fallback used if missing) |
+
+## Example Output
+
+```bash
+$ python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365
+
+Report written to stdout
+
+============================================================
+REPUTATION REPORT: Go Karts Mar Menor
+============================================================
+Window: 2025-01-31T12:00:00Z - 2026-01-31T12:00:00Z
+Reviews: 27
+Content spans: 78
+Overall score: 85.3
+Positive share: 89.7%
+Negative share: 7.7%
+
+Top positive drivers:
+  VALUE_FOR_MONEY: 14.7% impact
+  RECOMMEND: 14.5% impact
+  MANNER: 13.5% impact
+
+Top negative drivers:
+============================================================
+```
+
+## Changelog
+
+### v8 (2026-01-31)
+- Initial production release
+- Cross-business join safety
+- Score formula alignment
+- Executive summary with narrative guardrails
+- Comprehensive test suite