Alejandro Gutiérrez ee596c7969 docs(reputation-report): Add comprehensive pipeline documentation

Documents:
- Data flow and architecture
- CLI options and usage
- Output schema with examples
- Scoring formulas
- Production guardrails
- Thresholds and domain mapping
- Testing instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-31 23:24:57 +00:00

Reputation Report Pipeline

Version: 1.0 (v8)
Status: Production-ready
Location: packages/reviewiq-pipeline/scripts/reputation_report.py

Overview

The Reputation Report generates business-facing, time-windowed reputation analytics from classified review spans. It produces a €50-value report suitable for SMB business owners, including:

Overall performance score (0-100 scale)
Domain and primitive breakdowns
Positive and negative drivers with evidence
Time comparisons (current vs previous period)
Sector benchmarks
Timeline visualization data
LLM-generated executive summary

Quick Start

# Basic usage (last 365 days)
python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365

# With output file
python scripts/reputation_report.py --business "Business Name" --days 30 --output report.json

# Custom date range
python scripts/reputation_report.py --business "Business Name" --start 2025-01-01 --end 2025-12-31

# Production mode (fail if LLM summary fails)
python scripts/reputation_report.py --business "Business Name" --days 365 --require-summary

CLI Options

Option	Description	Default
`--business`	Business ID or search pattern (required)	-
`--days`	Last N days to analyze	30
`--start`	Window start (ISO-8601)	-
`--end`	Window end (ISO-8601)	-
`--run-id`	Specific run ID (overrides time window)	-
`--timezone`	IANA timezone for window	UTC
`--output`, `-o`	Output file path	stdout
`--quiet`, `-q`	Suppress console summary	false
`--no-summary`	Disable executive summary	false
`--require-summary`	Exit code 2 if LLM fails	false
`--summary-model`	LLM model for summary	gpt-4o-mini
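
The option table can be mirrored in a small argparse sketch. This is an illustration only: it assumes the script uses argparse, and `build_parser` is a hypothetical name. The flags and defaults are copied from the table above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="reputation_report.py")
    parser.add_argument("--business", required=True,
                        help="Business ID or search pattern")
    parser.add_argument("--days", type=int, default=30,
                        help="Last N days to analyze")
    parser.add_argument("--start", help="Window start (ISO-8601)")
    parser.add_argument("--end", help="Window end (ISO-8601)")
    parser.add_argument("--run-id",
                        help="Specific run ID (overrides time window)")
    parser.add_argument("--timezone", default="UTC",
                        help="IANA timezone for window")
    parser.add_argument("--output", "-o",
                        help="Output file path (stdout when omitted)")
    parser.add_argument("--quiet", "-q", action="store_true",
                        help="Suppress console summary")
    parser.add_argument("--no-summary", action="store_true",
                        help="Disable executive summary")
    parser.add_argument("--require-summary", action="store_true",
                        help="Exit code 2 if LLM fails")
    parser.add_argument("--summary-model", default="gpt-4o-mini",
                        help="LLM model for summary")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--business", "Go Karts Mar Menor", "--days", "365"])
print(args.business, args.days, args.timezone)
```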

Data Flow

┌─────────────────────────────────────────────────────────────────┐
│                         INPUT                                    │
├─────────────────────────────────────────────────────────────────┤
│  detected_spans_v2  ←──JOIN──→  review_facts_v1                 │
│  (primitives, valence,         (review_time_utc, rating,        │
│   confidence, intensity)        business_id)                     │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    SPAN SELECTION                                │
├─────────────────────────────────────────────────────────────────┤
│  Mode: time_window                                               │
│    → Filter by review_time_utc in [start, end)                  │
│    → Join on (review_id, business_id) for data isolation        │
│                                                                  │
│  Mode: latest_run                                                │
│    → Filter by run_id                                           │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     COMPUTATION                                  │
├─────────────────────────────────────────────────────────────────┤
│  1. Population stats (review count, language distribution)       │
│  2. Overall score: 100 × Σ(valence × conf × intensity) / Σ(...)│
│  3. Domain scores (O/P/J/E/V weighted averages)                 │
│  4. Primitive scores (per-primitive breakdown)                   │
│  5. Drivers (impact = weighted share of total)                   │
│  6. Alerts (SAFETY, UNMAPPED thresholds)                        │
│  7. Recommendations (templated playbooks)                        │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   TIME COMPARISONS                               │
├─────────────────────────────────────────────────────────────────┤
│  Previous Window:                                                │
│    → Same duration, immediately preceding current                │
│    → Requires MIN_REVIEWS_FOR_COMPARISON (10)                   │
│    → Requires MIN_COVERAGE_FOR_COMPARISON (80%)                 │
│                                                                  │
│  Sector Benchmark:                                               │
│    → Requires 500+ spans, 3+ businesses in sector               │
│    → Status: ok | insufficient_data | missing_sector_code       │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                  EXECUTIVE SUMMARY                               │
├─────────────────────────────────────────────────────────────────┤
│  LLM-generated (gpt-4o-mini) with narrative guardrails:         │
│                                                                  │
│  Weakness Priority:                                              │
│    1. Negative driver (if drivers.negatives non-empty)          │
│    2. Qualifying dip (within 90d, review_count ≥ 3)             │
│    3. None ("no persistent weaknesses surfaced")                │
│                                                                  │
│  Guardrails:                                                     │
│    - No "recent dip" + "no major issues" contradiction          │
│    - Most recent qualifying dip if multiple exist               │
│    - Action must tie to cited weakness or top positive          │
│                                                                  │
│  Fallback: Deterministic summary if LLM unavailable             │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        OUTPUT                                    │
├─────────────────────────────────────────────────────────────────┤
│  JSON Report (schema_version: 1.0)                              │
│    - business, window, population                                │
│    - scores (overall, domains, primitives)                       │
│    - drivers (positives, negatives with evidence)               │
│    - alerts, recommendations                                     │
│    - comparisons (previous_window, sector_benchmark)            │
│    - timeline (granularity, points)                             │
│    - executive_summary, executive_summary_meta                   │
└─────────────────────────────────────────────────────────────────┘
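
The window arithmetic in the diagram (a half-open `[start, end)` filter and a previous window of the same duration ending where the current one starts) can be sketched as below. The function names are hypothetical and the sketch assumes timezone-aware datetimes; it is not the script's actual code.

```python
from datetime import datetime, timedelta, timezone


def current_window(end: datetime, days: int) -> tuple[datetime, datetime]:
    """Window covering the last `days` days up to `end`."""
    return end - timedelta(days=days), end


def previous_window(start: datetime, end: datetime) -> tuple[datetime, datetime]:
    """Same duration as [start, end), immediately preceding it."""
    duration = end - start
    return start - duration, start


def in_window(ts: datetime, start: datetime, end: datetime) -> bool:
    """Half-open membership test: start is included, end is excluded."""
    return start <= ts < end


end = datetime(2026, 1, 31, 12, 0, tzinfo=timezone.utc)
start, end = current_window(end, days=365)
prev_start, prev_end = previous_window(start, end)
print(start.isoformat(), end.isoformat())
print(prev_start.isoformat(), prev_end.isoformat())
```

Because the interval is half-open, a review stamped exactly at `start` belongs to the current window and not to the previous one, so no review is counted in both.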

Output Schema

Top-Level Fields

{
  "schema_version": "1.0",
  "report_id": "uuid",
  "primary_run_id": "uuid | null",
  "generated_at": "2026-01-31T12:00:00Z",
  "window": { "start", "end", "timezone", "mode" },
  "business": { "business_id", "sector_code", "gbp_path" },
  "population": { ... },
  "scores": { "overall", "domains", "primitives" },
  "drivers": { "positives", "negatives" },
  "alerts": [ ... ],
  "recommendations": [ ... ],
  "comparisons": { "previous_window", "sector_benchmark" },
  "timeline": { "granularity", "points" },
  "executive_summary": "string | null",
  "executive_summary_meta": { "enabled", "generated", "model", "error", "generated_at", "fallback_used" }
}

Scores Structure

{
  "overall": {
    "score": 85.3,
    "score_domain_weighted": 85.7,
    "positive_share": 0.897,
    "negative_share": 0.077,
    "mixed_share": 0.013,
    "neutral_share": 0.013
  },
  "domains": {
    "O": { "score": 100.0, "volume": 10 },
    "P": { "score": 86.2, "volume": 17 },
    "J": { "score": -23.4, "volume": 5 },
    "E": { "score": 94.8, "volume": 35 },
    "V": { "score": 100.0, "volume": 10 }
  },
  "primitives": {
    "VALUE_FOR_MONEY": {
      "domain": "V",
      "score": 100.0,
      "volume": 10,
      "valence_counts": { "+": 10, "-": 0, "0": 0, "±": 0 },
      "top_entities": [ ... ]
    }
  }
}

Driver Structure

{
  "positives": [
    {
      "primitive": "VALUE_FOR_MONEY",
      "impact": 0.147,
      "summary": "Positive V/VALUE_FOR_MONEY mentions.",
      "evidence": [
        {
          "review_id": "abc123",
          "language": "en",
          "span_text": "the prices are super affordable.",
          "valence": "+",
          "intensity": 2,
          "confidence": 0.9
        }
      ]
    }
  ],
  "negatives": [ ... ]
}

Timeline Structure

{
  "granularity": "month",
  "points": [
    {
      "bucket_start_utc": "2025-12-01T00:00:00Z",
      "review_count": 8,
      "span_count": 25,
      "positive_count": 15,
      "negative_count": 8,
      "avg_rating": 2.88,
      "strength_score": -32.6
    }
  ]
}

Production Guardrails

Data Isolation

All queries join detected_spans_v2 with review_facts_v1 on both review_id AND business_id to prevent cross-business contamination:

JOIN pipeline.review_facts_v1 f
  ON f.review_id = s.review_id
 AND f.business_id = s.business_id

Score Consistency

An invariant check verifies that scores.overall.score matches comparisons.previous_window.scores.overall.current. If the two values differ by more than 1.0 point, an internal_inconsistency alert is emitted.
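
A minimal sketch of such a check, assuming the difference is compared as an absolute value. The function name and the shape of the returned alert are hypothetical; only the alert name `internal_inconsistency` and the 1.0 tolerance come from the text above.

```python
from typing import Optional


def consistency_alert(overall_score: float,
                      comparison_current: float,
                      tolerance: float = 1.0) -> Optional[dict]:
    """Return an alert dict when the two scores disagree by more than `tolerance`."""
    delta = abs(overall_score - comparison_current)
    if delta > tolerance:
        # Hypothetical alert shape; the real report may use different fields.
        return {"type": "internal_inconsistency", "delta": round(delta, 2)}
    return None


print(consistency_alert(85.3, 85.3))
print(consistency_alert(85.3, 83.0))
```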

Executive Summary Meta

{
  "enabled": true,
  "generated": true,
  "model": "gpt-4o-mini",
  "error": null,
  "generated_at": "2026-01-31T12:00:00Z",
  "fallback_used": false
}

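enabled: Whether summary generation was requested
generated: Whether the LLM successfully produced a summary
model: LLM model used for the summary
error: Error message if generation failed
generated_at: Timestamp of the summary (ISO-8601, UTC)
fallback_used: Whether the deterministic fallback was used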

Exit Codes

Code	Meaning
0	Success
1	Business not found or no spans
2	`--require-summary` and LLM failed
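
The exit-code table can be read as a small decision function. This is a sketch under assumptions: `exit_code` and its parameters are hypothetical names, and it assumes that "LLM failed" corresponds to `generated` being false in executive_summary_meta.

```python
def exit_code(has_spans: bool, require_summary: bool, summary_meta: dict) -> int:
    """Map a report outcome to the documented exit codes."""
    if not has_spans:
        return 1  # business not found or no spans
    if require_summary and not summary_meta.get("generated", False):
        return 2  # --require-summary was set and the LLM did not produce a summary
    return 0  # success (a deterministic fallback still counts without --require-summary)


meta_failed = {"enabled": True, "generated": False, "fallback_used": True}
print(exit_code(True, False, meta_failed))
print(exit_code(True, True, meta_failed))
```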

Scoring Formula

Overall Score

The overall score uses the same formula as PERIOD_SCORES_QUERY, so the report and the period comparison stay consistent:

score = 100 × Σ(valence × confidence × intensity) / Σ(confidence × intensity)

Where:

valence: +1 for positive, -1 for negative, 0 for neutral/mixed
confidence: 0.0 to 1.0
intensity: 1 to 3
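
The formula can be sketched directly in Python. The span dictionaries and the function name are hypothetical; the valence symbols follow the `valence_counts` keys in the output schema, and returning None for an empty input is an assumption.

```python
from typing import Optional

# Numeric valence per the definitions above: mixed and neutral both count as 0.
VALENCE_NUM = {"+": 1, "-": -1, "0": 0, "±": 0}


def overall_score(spans: list) -> Optional[float]:
    """100 x sum(valence x confidence x intensity) / sum(confidence x intensity)."""
    numerator = sum(VALENCE_NUM[s["valence"]] * s["confidence"] * s["intensity"]
                    for s in spans)
    denominator = sum(s["confidence"] * s["intensity"] for s in spans)
    if denominator == 0:
        return None  # assumption: no score when there is nothing to weigh
    return 100.0 * numerator / denominator


spans = [
    {"valence": "+", "confidence": 0.9, "intensity": 2},
    {"valence": "-", "confidence": 0.8, "intensity": 1},
    {"valence": "0", "confidence": 0.5, "intensity": 1},
]
print(round(overall_score(spans), 1))
```

Neutral and mixed spans add nothing to the numerator but still add weight to the denominator, so they pull the score toward zero.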

Domain-Weighted Score

Alternative metric (exposed as score_domain_weighted):

score = Σ(domain_score × domain_volume) / Σ(domain_volume)
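
As a sketch, the volume-weighted average over domains looks like this. The function name and the sample numbers are hypothetical, and the sketch does not say which domains the real report includes in the sum.

```python
from typing import Optional


def domain_weighted_score(domains: dict) -> Optional[float]:
    """Volume-weighted mean of per-domain scores."""
    total_volume = sum(d["volume"] for d in domains.values())
    if total_volume == 0:
        return None  # assumption: undefined when no domain has any spans
    weighted = sum(d["score"] * d["volume"] for d in domains.values())
    return weighted / total_volume


# Hypothetical input: one strong low-volume domain, one weak higher-volume domain.
example = {
    "O": {"score": 100.0, "volume": 1},
    "J": {"score": -50.0, "volume": 3},
}
print(domain_weighted_score(example))  # -12.5
```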

Primitive Score

score = 100 × Σ(w × valence_num) / Σ(w)
w = confidence × (0.75 + 0.25×(detail-1)) × (0.8 + 0.2×(intensity-1))
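
The weight and score can be sketched as follows. The document does not define `detail`; this sketch assumes it is a per-span integer that starts at 1, like intensity. It also assumes `valence_num` uses the same +1/-1/0 mapping as the overall score. All names are hypothetical.

```python
from typing import Optional

# Assumed to match the overall-score valence mapping.
VALENCE_NUM = {"+": 1, "-": -1, "0": 0, "±": 0}


def span_weight(confidence: float, detail: int, intensity: int) -> float:
    """w = confidence x (0.75 + 0.25 x (detail - 1)) x (0.8 + 0.2 x (intensity - 1))."""
    return confidence * (0.75 + 0.25 * (detail - 1)) * (0.8 + 0.2 * (intensity - 1))


def primitive_score(spans: list) -> Optional[float]:
    """100 x sum(w x valence_num) / sum(w) over the spans of one primitive."""
    weights = [span_weight(s["confidence"], s["detail"], s["intensity"]) for s in spans]
    total = sum(weights)
    if total == 0:
        return None  # assumption: no score without weighted spans
    weighted = sum(w * VALENCE_NUM[s["valence"]] for w, s in zip(weights, spans))
    return 100.0 * weighted / total


spans = [
    {"valence": "+", "confidence": 0.9, "detail": 2, "intensity": 2},
    {"valence": "-", "confidence": 0.6, "detail": 1, "intensity": 1},
]
print(round(primitive_score(spans), 1))
```

With detail = 1 and intensity = 1 the weight reduces to 0.6 × confidence, so more detailed and more intense spans count for more.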

Thresholds

Threshold	Value	Purpose
MIN_REVIEWS_FOR_COMPARISON	10	Minimum reviews per period for trend
MIN_COVERAGE_FOR_COMPARISON	0.80	Minimum review_time coverage
Sector benchmark spans	500	Minimum sector spans for benchmark
Sector benchmark businesses	3	Minimum businesses in sector
UNMAPPED rate warn	0.10	Alert if >10% unmapped
UNMAPPED rate critical	0.15	Critical alert if >15% unmapped
SAFETY negative warn	0.05	Alert if >5% SAFETY negative
SAFETY negative critical	0.10	Critical alert if >10% SAFETY negative
Dip recency	90 days	Maximum age for "recent" dip
Dip volume	3 reviews	Minimum reviews to qualify as dip
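
A sketch of how these thresholds might be applied. The two constants and their values come from the table; the function names are hypothetical. It assumes alert rates use a strict "greater than" test, as the table's ">10%" wording suggests, and that the comparison minimums are inclusive.

```python
from typing import Optional

MIN_REVIEWS_FOR_COMPARISON = 10
MIN_COVERAGE_FOR_COMPARISON = 0.80

# (warn, critical) rate thresholds from the table.
UNMAPPED_THRESHOLDS = (0.10, 0.15)
SAFETY_NEGATIVE_THRESHOLDS = (0.05, 0.10)


def rate_severity(rate: float, thresholds: tuple) -> Optional[str]:
    """Classify a rate as 'critical', 'warn', or None (no alert)."""
    warn, critical = thresholds
    if rate > critical:
        return "critical"
    if rate > warn:
        return "warn"
    return None


def comparison_allowed(current_reviews: int, previous_reviews: int,
                       coverage: float) -> bool:
    """Both periods need enough reviews, and review_time coverage must be high enough."""
    return (current_reviews >= MIN_REVIEWS_FOR_COMPARISON
            and previous_reviews >= MIN_REVIEWS_FOR_COMPARISON
            and coverage >= MIN_COVERAGE_FOR_COMPARISON)


print(rate_severity(0.12, UNMAPPED_THRESHOLDS))
print(comparison_allowed(27, 9, 0.95))
```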

Domain Mapping

Domain	Code	Primitives
Output/Product	O	TASTE, CRAFT, FRESHNESS, TEMPERATURE, EFFECTIVENESS, ACCURACY, CONDITION, CONSISTENCY
People/Service	P	MANNER, COMPETENCE, ATTENTIVENESS, COMMUNICATION
Journey/Process	J	SPEED, FRICTION, RELIABILITY, AVAILABILITY
Environment	E	CLEANLINESS, COMFORT, SAFETY, AMBIANCE, ACCESSIBILITY, DIGITAL_UX
Value	V	PRICE_LEVEL, PRICE_FAIRNESS, PRICE_TRANSPARENCY, VALUE_FOR_MONEY
Meta	meta	HONESTY, ETHICS, PROMISES, ACKNOWLEDGMENT, RESPONSE_QUALITY, RECOVERY, RETURN_INTENT, RECOMMEND, RECOGNITION, UNMAPPED, NON_INFORMATIVE
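
The table can be held as a dictionary and inverted for primitive-to-domain lookups. The variable names are hypothetical; the codes and primitive names are copied from the table.

```python
DOMAIN_PRIMITIVES = {
    "O": ["TASTE", "CRAFT", "FRESHNESS", "TEMPERATURE", "EFFECTIVENESS",
          "ACCURACY", "CONDITION", "CONSISTENCY"],
    "P": ["MANNER", "COMPETENCE", "ATTENTIVENESS", "COMMUNICATION"],
    "J": ["SPEED", "FRICTION", "RELIABILITY", "AVAILABILITY"],
    "E": ["CLEANLINESS", "COMFORT", "SAFETY", "AMBIANCE", "ACCESSIBILITY",
          "DIGITAL_UX"],
    "V": ["PRICE_LEVEL", "PRICE_FAIRNESS", "PRICE_TRANSPARENCY",
          "VALUE_FOR_MONEY"],
    "meta": ["HONESTY", "ETHICS", "PROMISES", "ACKNOWLEDGMENT",
             "RESPONSE_QUALITY", "RECOVERY", "RETURN_INTENT", "RECOMMEND",
             "RECOGNITION", "UNMAPPED", "NON_INFORMATIVE"],
}

# Reverse index: primitive name -> domain code.
PRIMITIVE_TO_DOMAIN = {
    primitive: domain
    for domain, primitives in DOMAIN_PRIMITIVES.items()
    for primitive in primitives
}

print(PRIMITIVE_TO_DOMAIN["VALUE_FOR_MONEY"])  # V
print(PRIMITIVE_TO_DOMAIN["SAFETY"])           # E
```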

Testing

cd packages/reviewiq-pipeline
python -m pytest tests/test_executive_summary.py -v

16 tests covering:

Negative driver priority over dips
Qualifying dip selection (90 days + review_count ≥ 3)
Most recent dip when multiple qualify
Contradiction detection (dip + "no major issues")
Non-qualifying dips not cited as "recent"
Summary input construction
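
The weakness-priority rules these tests cover can be sketched as below. This is not the tested implementation. The function name, the dip dictionaries, and the returned shape are hypothetical, and the sketch assumes a dip's age is measured from its bucket start.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DIP_RECENCY_DAYS = 90   # maximum age for a "recent" dip
DIP_MIN_REVIEWS = 3     # minimum reviews for a dip to qualify


def pick_weakness(negative_drivers: list, dips: list,
                  now: datetime) -> Optional[dict]:
    """Negative driver first, then the most recent qualifying dip, else None."""
    if negative_drivers:
        return {"kind": "negative_driver",
                "primitive": negative_drivers[0]["primitive"]}
    cutoff = now - timedelta(days=DIP_RECENCY_DAYS)
    qualifying = [d for d in dips
                  if d["bucket_start"] >= cutoff
                  and d["review_count"] >= DIP_MIN_REVIEWS]
    if qualifying:
        latest = max(qualifying, key=lambda d: d["bucket_start"])
        return {"kind": "dip", "bucket_start": latest["bucket_start"]}
    return None  # summary should say no persistent weaknesses surfaced


now = datetime(2026, 1, 31, tzinfo=timezone.utc)
dips = [
    {"bucket_start": datetime(2025, 12, 1, tzinfo=timezone.utc), "review_count": 8},
    {"bucket_start": datetime(2025, 10, 1, tzinfo=timezone.utc), "review_count": 9},
]
print(pick_weakness([], dips, now))
```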

Environment Variables

Variable	Required	Description
DATABASE_URL	Yes	PostgreSQL connection string
OPENAI_API_KEY	No	Required for LLM summary (fallback used if missing)
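
A minimal sketch of reading these variables, assuming the script fails fast on a missing DATABASE_URL and treats a missing OPENAI_API_KEY as the signal to use the deterministic fallback. `load_settings` and the returned keys are hypothetical.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the documented environment variables from `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise RuntimeError("DATABASE_URL is required")
    return {
        "database_url": database_url,
        # Without a key, the summary step would use the deterministic fallback.
        "llm_available": bool(env.get("OPENAI_API_KEY")),
    }


# Placeholder connection string for illustration only.
settings = load_settings({"DATABASE_URL": "postgresql://localhost/reviewiq"})
print(settings["llm_available"])  # False
```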

Example Output

$ python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365

Report written to stdout

============================================================
REPUTATION REPORT: Go Karts Mar Menor
============================================================
Window: 2025-01-31T12:00:00Z - 2026-01-31T12:00:00Z
Reviews: 27
Content spans: 78
Overall score: 85.3
Positive share: 89.7%
Negative share: 7.7%

Top positive drivers:
  VALUE_FOR_MONEY: 14.7% impact
  RECOMMEND: 14.5% impact
  MANNER: 13.5% impact

Top negative drivers:
============================================================

Changelog

v8 (2026-01-31)

Initial production release
Cross-business join safety
Score formula alignment
Executive summary with narrative guardrails
Comprehensive test suite
