whyrating-engine-legacy/packages/reviewiq-pipeline/docs/REPUTATION_REPORT.md
Alejandro Gutiérrez ee596c7969 docs(reputation-report): Add comprehensive pipeline documentation
Documents:
- Data flow and architecture
- CLI options and usage
- Output schema with examples
- Scoring formulas
- Production guardrails
- Thresholds and domain mapping
- Testing instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-31 23:24:57 +00:00

# Reputation Report Pipeline
**Version:** 1.0 (v8)

**Status:** Production-ready

**Location:** `packages/reviewiq-pipeline/scripts/reputation_report.py`
## Overview
The Reputation Report generates business-facing, time-windowed reputation analytics from classified review spans. It produces a €50-value report aimed at small and medium-sized business (SMB) owners, including:
- Overall performance score (0-100 scale)
- Domain and primitive breakdowns
- Positive and negative drivers with evidence
- Time comparisons (current vs previous period)
- Sector benchmarks
- Timeline visualization data
- LLM-generated executive summary
## Quick Start
```bash
# Basic usage (last 365 days)
python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365
# With output file
python scripts/reputation_report.py --business "Business Name" --days 30 --output report.json
# Custom date range
python scripts/reputation_report.py --business "Business Name" --start 2025-01-01 --end 2025-12-31
# Production mode (exit with code 2 if the LLM summary fails)
python scripts/reputation_report.py --business "Business Name" --days 365 --require-summary
```
## CLI Options
| Option | Description | Default |
|--------|-------------|---------|
| `--business` | Business ID or search pattern (required) | - |
| `--days` | Last N days to analyze | 30 |
| `--start` | Window start (ISO-8601) | - |
| `--end` | Window end (ISO-8601) | - |
| `--run-id` | Specific run ID (overrides time window) | - |
| `--timezone` | IANA timezone for window | UTC |
| `--output, -o` | Output file path | stdout |
| `--quiet, -q` | Suppress console summary | false |
| `--no-summary` | Disable executive summary | false |
| `--require-summary` | Exit code 2 if LLM fails | false |
| `--summary-model` | LLM model for summary | gpt-4o-mini |
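The table implies an order of precedence: `--run-id` overrides the time window, and `--start`/`--end` give an explicit range. The Python sketch below shows one plausible way to turn the remaining flags into a half-open `[start, end)` window and to derive the equal-length previous window used for comparisons. `Window`, `resolve_window`, the rule that explicit dates win over `--days`, and reading naive dates in `--timezone` are assumptions for illustration, not the script's actual code; `--run-id` handling is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class Window:
    start: datetime  # inclusive bound
    end: datetime    # exclusive bound

    @property
    def duration(self) -> timedelta:
        return self.end - self.start

    def previous(self) -> "Window":
        # Same duration, immediately preceding the current window.
        return Window(self.start - self.duration, self.start)


def resolve_window(days: int = 30, start: Optional[str] = None, end: Optional[str] = None,
                   tz_name: str = "UTC", now: Optional[datetime] = None) -> Window:
    """Hypothetical helper mapping --days/--start/--end/--timezone to [start, end)."""
    tz = timezone.utc if tz_name == "UTC" else ZoneInfo(tz_name)

    def parse(value: str) -> datetime:
        dt = datetime.fromisoformat(value)
        # Naive ISO-8601 values are read in the requested timezone (assumption).
        return dt if dt.tzinfo else dt.replace(tzinfo=tz)

    end_dt = parse(end) if end else (now or datetime.now(tz))
    start_dt = parse(start) if start else end_dt - timedelta(days=days)
    if start_dt >= end_dt:
        raise ValueError("window start must be before window end")
    return Window(start_dt, end_dt)
```

With this sketch, `--start 2025-01-01 --end 2025-12-31` excludes 31 December itself, because the end bound is exclusive.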
## Data Flow
```
┌─────────────────────────────────────────────────────────────────┐
│ INPUT                                                           │
├─────────────────────────────────────────────────────────────────┤
│ detected_spans_v2  ←──JOIN──→  review_facts_v1                  │
│ (primitives, valence,          (review_time_utc, rating,        │
│  confidence, intensity)         business_id)                    │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ SPAN SELECTION                                                  │
├─────────────────────────────────────────────────────────────────┤
│ Mode: time_window                                               │
│   → Filter by review_time_utc in [start, end)                   │
│   → Join on (review_id, business_id) for data isolation         │
│                                                                 │
│ Mode: latest_run                                                │
│   → Filter by run_id                                            │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ COMPUTATION                                                     │
├─────────────────────────────────────────────────────────────────┤
│ 1. Population stats (review count, language distribution)       │
│ 2. Overall score:                                               │
│    100 × Σ(valence × conf × intensity) / Σ(conf × intensity)    │
│ 3. Domain scores (O/P/J/E/V weighted averages)                  │
│ 4. Primitive scores (per-primitive breakdown)                   │
│ 5. Drivers (impact = weighted share of total)                   │
│ 6. Alerts (SAFETY, UNMAPPED thresholds)                         │
│ 7. Recommendations (templated playbooks)                        │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ TIME COMPARISONS                                                │
├─────────────────────────────────────────────────────────────────┤
│ Previous Window:                                                │
│   → Same duration, immediately preceding current                │
│   → Requires MIN_REVIEWS_FOR_COMPARISON (10)                    │
│   → Requires MIN_COVERAGE_FOR_COMPARISON (80%)                  │
│                                                                 │
│ Sector Benchmark:                                               │
│   → Requires 500+ spans, 3+ businesses in sector                │
│   → Status: ok | insufficient_data | missing_sector_code        │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ EXECUTIVE SUMMARY                                               │
├─────────────────────────────────────────────────────────────────┤
│ LLM-generated (gpt-4o-mini) with narrative guardrails:          │
│                                                                 │
│ Weakness Priority:                                              │
│   1. Negative driver (if drivers.negatives non-empty)           │
│   2. Qualifying dip (within 90d, review_count ≥ 3)              │
│   3. None ("no persistent weaknesses surfaced")                 │
│                                                                 │
│ Guardrails:                                                     │
│   - No "recent dip" + "no major issues" contradiction           │
│   - Most recent qualifying dip if multiple exist                │
│   - Action must tie to cited weakness or top positive           │
│                                                                 │
│ Fallback: Deterministic summary if LLM unavailable              │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ OUTPUT                                                          │
├─────────────────────────────────────────────────────────────────┤
│ JSON Report (schema_version: 1.0)                               │
│   - business, window, population                                │
│   - scores (overall, domains, primitives)                       │
│   - drivers (positives, negatives with evidence)                │
│   - alerts, recommendations                                     │
│   - comparisons (previous_window, sector_benchmark)             │
│   - timeline (granularity, points)                              │
│   - executive_summary, executive_summary_meta                   │
└─────────────────────────────────────────────────────────────────┘
```
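The weakness-priority rules in the executive-summary stage reduce to a small selection function. A minimal sketch, assuming negative drivers arrive sorted by impact (strongest first) and each dip is a dict with hypothetical `bucket_start` and `review_count` keys; recency is measured from an `as_of` timestamp such as the window end:

```python
from datetime import datetime, timedelta
from typing import Optional

DIP_MAX_AGE = timedelta(days=90)  # "Dip recency" threshold
DIP_MIN_REVIEWS = 3               # "Dip volume" threshold


def select_weakness(negatives: list, dips: list, as_of: datetime) -> Optional[dict]:
    """Pick the weakness the summary may cite (hypothetical helper)."""
    # 1. A negative driver always takes priority over any dip.
    if negatives:
        return {"kind": "negative_driver", "detail": negatives[0]}
    # 2. Otherwise, only dips that are recent enough and backed by enough reviews qualify.
    qualifying = [
        d for d in dips
        if as_of - d["bucket_start"] <= DIP_MAX_AGE and d["review_count"] >= DIP_MIN_REVIEWS
    ]
    if qualifying:
        # When several dips qualify, cite the most recent one.
        return {"kind": "dip", "detail": max(qualifying, key=lambda d: d["bucket_start"])}
    # 3. Nothing qualifies: the summary should say no persistent weaknesses surfaced.
    return None
```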
## Output Schema
### Top-Level Fields
```json
{
  "schema_version": "1.0",
  "report_id": "uuid",
  "primary_run_id": "uuid | null",
  "generated_at": "2026-01-31T12:00:00Z",
  "window": { "start", "end", "timezone", "mode" },
  "business": { "business_id", "sector_code", "gbp_path" },
  "population": { ... },
  "scores": { "overall", "domains", "primitives" },
  "drivers": { "positives", "negatives" },
  "alerts": [ ... ],
  "recommendations": [ ... ],
  "comparisons": { "previous_window", "sector_benchmark" },
  "timeline": { "granularity", "points" },
  "executive_summary": "string | null",
  "executive_summary_meta": { "enabled", "generated", "model", "error", "generated_at", "fallback_used" }
}
```
### Scores Structure
```json
{
  "overall": {
    "score": 85.3,
    "score_domain_weighted": 85.7,
    "positive_share": 0.897,
    "negative_share": 0.077,
    "mixed_share": 0.013,
    "neutral_share": 0.013
  },
  "domains": {
    "O": { "score": 100.0, "volume": 10 },
    "P": { "score": 86.2, "volume": 17 },
    "J": { "score": -23.4, "volume": 5 },
    "E": { "score": 94.8, "volume": 35 },
    "V": { "score": 100.0, "volume": 10 }
  },
  "primitives": {
    "VALUE_FOR_MONEY": {
      "domain": "V",
      "score": 100.0,
      "volume": 10,
      "valence_counts": { "+": 10, "-": 0, "0": 0, "±": 0 },
      "top_entities": [ ... ]
    }
  }
}
```
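The sample shares are consistent with plain span counts: 70 positive, 6 negative, 1 mixed and 1 neutral span out of the 78 content spans in the example output give 0.897, 0.077, 0.013 and 0.013. The sketch below assumes the shares are unweighted count ratios; the document does not state this explicitly, and `valence_shares` is a hypothetical name.

```python
from collections import Counter

# Output field -> valence label used in valence_counts.
SHARE_LABELS = {"positive_share": "+", "negative_share": "-", "mixed_share": "±", "neutral_share": "0"}


def valence_shares(valences: list) -> dict:
    """Fraction of spans per valence label, rounded to 3 decimals (assumed count-based)."""
    counts = Counter(valences)  # missing labels count as 0
    total = sum(counts.values())
    return {
        field: round(counts[label] / total, 3) if total else 0.0
        for field, label in SHARE_LABELS.items()
    }
```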
### Driver Structure
```json
{
  "positives": [
    {
      "primitive": "VALUE_FOR_MONEY",
      "impact": 0.147,
      "summary": "Positive V/VALUE_FOR_MONEY mentions.",
      "evidence": [
        {
          "review_id": "abc123",
          "language": "en",
          "span_text": "the prices are super affordable.",
          "valence": "+",
          "intensity": 2,
          "confidence": 0.9
        }
      ]
    }
  ],
  "negatives": [ ... ]
}
```
### Timeline Structure
```json
{
  "granularity": "month",
  "points": [
    {
      "bucket_start_utc": "2025-12-01T00:00:00Z",
      "review_count": 8,
      "span_count": 25,
      "positive_count": 15,
      "negative_count": 8,
      "avg_rating": 2.88,
      "strength_score": -32.6
    }
  ]
}
```
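A minimal sketch of how monthly points could be assembled from span rows, assuming each row carries hypothetical `review_id`, `review_time_utc` (timezone-aware), `rating` and `valence` fields and that buckets are computed in UTC. `strength_score` is omitted because its formula is not given here.

```python
from collections import defaultdict
from datetime import datetime, timezone


def month_start(ts: datetime) -> datetime:
    # Normalise to UTC, then truncate to the first instant of the month.
    ts = ts.astimezone(timezone.utc)
    return datetime(ts.year, ts.month, 1, tzinfo=timezone.utc)


def monthly_timeline(spans: list) -> dict:
    buckets = defaultdict(list)
    for span in spans:
        buckets[month_start(span["review_time_utc"])].append(span)

    points = []
    for start in sorted(buckets):
        rows = buckets[start]
        # One rating per review, even when a review contributes several spans.
        ratings = {r["review_id"]: r["rating"] for r in rows}
        points.append({
            "bucket_start_utc": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "review_count": len(ratings),
            "span_count": len(rows),
            "positive_count": sum(r["valence"] == "+" for r in rows),
            "negative_count": sum(r["valence"] == "-" for r in rows),
            "avg_rating": round(sum(ratings.values()) / len(ratings), 2),
        })
    return {"granularity": "month", "points": points}
```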
## Production Guardrails
### Data Isolation
All queries join `detected_spans_v2` with `review_facts_v1` on **both** `review_id` AND `business_id` to prevent cross-business contamination:
```sql
JOIN pipeline.review_facts_v1 f
  ON f.review_id = s.review_id
 AND f.business_id = s.business_id
```
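To see why both keys matter, consider two businesses whose reviews share a `review_id` value: joining on `review_id` alone attaches the other business's facts to each span. This self-contained illustration uses an in-memory SQLite database with simplified, hypothetical versions of the two tables (no `pipeline` schema, only the columns needed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE detected_spans_v2 (review_id TEXT, business_id TEXT, valence TEXT);
    CREATE TABLE review_facts_v1 (review_id TEXT, business_id TEXT, rating INTEGER);
    -- Two businesses happen to reuse the same review_id value.
    INSERT INTO detected_spans_v2 VALUES ('r1', 'biz_a', '+'), ('r1', 'biz_b', '-');
    INSERT INTO review_facts_v1 VALUES ('r1', 'biz_a', 5), ('r1', 'biz_b', 1);
""")


def joined_rows(join_on_business: bool) -> list:
    condition = "f.review_id = s.review_id"
    if join_on_business:
        condition += " AND f.business_id = s.business_id"
    query = f"""
        SELECT s.business_id, f.business_id, f.rating
        FROM detected_spans_v2 s
        JOIN review_facts_v1 f ON {condition}
        WHERE s.business_id = ?
        ORDER BY f.business_id
    """
    return conn.execute(query, ("biz_a",)).fetchall()


print(joined_rows(join_on_business=False))  # biz_a's span also picks up biz_b's facts
print(joined_rows(join_on_business=True))   # only biz_a's own facts
```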
### Score Consistency
An invariant check verifies that `scores.overall.score` matches `comparisons.previous_window.scores.overall.current`. If the two values differ by more than 1.0 point, an `internal_inconsistency` alert is emitted.
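The invariant can be expressed as a small check over the finished report dict. A sketch, assuming a hypothetical alert shape (`type`, `delta`) and that a missing previous-window comparison simply skips the check:

```python
from typing import Optional

MAX_SCORE_DELTA = 1.0  # tolerance stated in the guardrail


def check_score_consistency(report: dict) -> Optional[dict]:
    """Return an internal_inconsistency alert if the two overall scores diverge."""
    overall = report["scores"]["overall"]["score"]
    previous = report.get("comparisons", {}).get("previous_window") or {}
    current = previous.get("scores", {}).get("overall", {}).get("current")
    if current is None:
        return None  # nothing to compare, e.g. the comparison was skipped
    delta = abs(overall - current)
    if delta > MAX_SCORE_DELTA:
        return {"type": "internal_inconsistency", "delta": round(delta, 2)}
    return None
```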
### Executive Summary Meta
```json
{
  "enabled": true,
  "generated": true,
  "model": "gpt-4o-mini",
  "error": null,
  "generated_at": "2026-01-31T12:00:00Z",
  "fallback_used": false
}
```
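- `enabled`: Whether summary generation was requested
- `generated`: Whether the LLM successfully produced a summary
- `model`: LLM model used for the summary (set by `--summary-model`)
- `error`: Error message if generation failed, otherwise `null`
- `generated_at`: Timestamp at which the summary was generated
- `fallback_used`: Whether the deterministic fallback summary was used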
### Exit Codes
| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Business not found or no spans |
| 2 | `--require-summary` and LLM failed |
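The interplay between `executive_summary_meta`, the deterministic fallback and exit code 2 can be sketched as follows. `build_summary` and its `generate`/`fallback` callables are hypothetical stand-ins for the LLM call and the deterministic summary; the document does not say whether a fallback summary is still written when `--require-summary` fails, so this sketch returns none in that case.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

EXIT_OK, EXIT_NO_DATA, EXIT_SUMMARY_FAILED = 0, 1, 2  # codes from the exit-code table


def build_summary(generate: Optional[Callable[[], str]], fallback: Callable[[], str],
                  enabled: bool = True, require_summary: bool = False,
                  model: str = "gpt-4o-mini") -> tuple:
    """Return (executive_summary, executive_summary_meta, exit_code); a sketch only."""
    meta = {"enabled": enabled, "generated": False, "model": model, "error": None,
            "generated_at": None, "fallback_used": False}
    if not enabled:  # e.g. --no-summary
        return None, meta, EXIT_OK
    try:
        if generate is None:  # e.g. OPENAI_API_KEY missing, so no LLM client
            raise RuntimeError("LLM client unavailable")
        text = generate()
        meta.update(generated=True, generated_at=datetime.now(timezone.utc).isoformat())
        return text, meta, EXIT_OK
    except Exception as exc:  # any LLM failure: timeout, API error, missing client
        meta["error"] = str(exc)
        if require_summary:  # --require-summary turns the failure into exit code 2
            return None, meta, EXIT_SUMMARY_FAILED
        meta["fallback_used"] = True
        return fallback(), meta, EXIT_OK
```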
## Scoring Formula
### Overall Score
Same formula as `PERIOD_SCORES_QUERY` for consistency:
```
score = 100 × Σ(valence × confidence × intensity) / Σ(confidence × intensity)
```
Where:
- valence: +1 for positive, -1 for negative, 0 for neutral/mixed
- confidence: 0.0 to 1.0
- intensity: 1 to 3
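A direct transcription of the overall-score formula, assuming spans are dicts with `valence` (one of the `+`, `-`, `0`, `±` labels used in `valence_counts`), `confidence` and `intensity`; returning 0.0 for an empty window is an assumption:

```python
# Numeric valence: neutral ("0") and mixed ("±") both contribute 0.
VALENCE_NUM = {"+": 1, "-": -1, "0": 0, "±": 0}


def overall_score(spans: list) -> float:
    """100 × Σ(valence × confidence × intensity) / Σ(confidence × intensity)."""
    num = den = 0.0
    for s in spans:
        w = s["confidence"] * s["intensity"]
        num += VALENCE_NUM[s["valence"]] * w
        den += w
    return 100.0 * num / den if den else 0.0
```

Because neutral and mixed spans add weight to the denominator but nothing to the numerator, they pull the score toward 0.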
### Domain-Weighted Score
Alternative metric (exposed as `score_domain_weighted`):
```
score = Σ(domain_score × domain_volume) / Σ(domain_volume)
```
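The domain-weighted variant is a volume-weighted mean of the per-domain scores. A sketch over the `domains` object from the scores structure; the function name and the 0.0 result for zero total volume are assumptions:

```python
def domain_weighted_score(domains: dict) -> float:
    """Σ(domain_score × domain_volume) / Σ(domain_volume)."""
    total_volume = sum(d["volume"] for d in domains.values())
    if total_volume == 0:
        return 0.0
    return sum(d["score"] * d["volume"] for d in domains.values()) / total_volume
```

A domain with few mentions, such as `J` with a volume of 5 in the sample, therefore moves this metric far less than `E` with 35.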
### Primitive Score
```
score = 100 × Σ(w × valence_num) / Σ(w)
w = confidence × (0.75 + 0.25×(detail-1)) × (0.8 + 0.2×(intensity-1))
```
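The primitive weight `w` scales confidence by two multipliers. The document does not define `detail`; this sketch assumes it is an integer level starting at 1 (so `detail = 1` and `intensity = 1` give multipliers of 0.75 and 0.8) and reuses the numeric valence mapping of the overall score. Both function names are hypothetical.

```python
VALENCE_NUM = {"+": 1, "-": -1, "0": 0, "±": 0}


def span_weight(confidence: float, detail: int, intensity: int) -> float:
    # detail is assumed to start at 1, like intensity.
    return confidence * (0.75 + 0.25 * (detail - 1)) * (0.8 + 0.2 * (intensity - 1))


def primitive_score(spans: list) -> float:
    """100 × Σ(w × valence_num) / Σ(w) over the spans of one primitive."""
    num = den = 0.0
    for s in spans:
        w = span_weight(s["confidence"], s["detail"], s["intensity"])
        num += w * VALENCE_NUM[s["valence"]]
        den += w
    return 100.0 * num / den if den else 0.0
```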
## Thresholds
| Threshold | Value | Purpose |
|-----------|-------|---------|
| MIN_REVIEWS_FOR_COMPARISON | 10 | Minimum reviews per period for trend |
| MIN_COVERAGE_FOR_COMPARISON | 0.80 | Minimum review_time coverage |
| Sector benchmark spans | 500 | Minimum sector spans for benchmark |
| Sector benchmark businesses | 3 | Minimum businesses in sector |
| UNMAPPED rate warn | 0.10 | Alert if >10% unmapped |
| UNMAPPED rate critical | 0.15 | Critical alert if >15% unmapped |
| SAFETY negative warn | 0.05 | Alert if >5% SAFETY negative |
| SAFETY negative critical | 0.10 | Critical alert if >10% SAFETY negative |
| Dip recency | 90 days | Maximum age for "recent" dip |
| Dip volume | 3 reviews | Minimum reviews to qualify as dip |
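The alert thresholds are warn/critical pairs on a rate, and the comparison thresholds gate the previous-window trend. A sketch assuming strict `>` comparisons for alerts (matching the ">10%" wording), `>=` for the minimums, hypothetical severity names `warn`/`critical`, and that the caller has already computed each rate:

```python
from typing import Optional

# (warn, critical) pairs from the thresholds table; alerts fire strictly above them.
ALERT_THRESHOLDS = {
    "UNMAPPED": (0.10, 0.15),
    "SAFETY": (0.05, 0.10),
}


def classify_rate(kind: str, rate: float) -> Optional[str]:
    warn, critical = ALERT_THRESHOLDS[kind]
    if rate > critical:
        return "critical"
    if rate > warn:
        return "warn"
    return None


def comparison_allowed(reviews_current: int, reviews_previous: int, coverage: float,
                       min_reviews: int = 10, min_coverage: float = 0.80) -> bool:
    # Both periods need enough reviews, and review_time coverage must be high enough.
    return min(reviews_current, reviews_previous) >= min_reviews and coverage >= min_coverage
```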
## Domain Mapping
| Domain | Code | Primitives |
|--------|------|------------|
| Output/Product | O | TASTE, CRAFT, FRESHNESS, TEMPERATURE, EFFECTIVENESS, ACCURACY, CONDITION, CONSISTENCY |
| People/Service | P | MANNER, COMPETENCE, ATTENTIVENESS, COMMUNICATION |
| Journey/Process | J | SPEED, FRICTION, RELIABILITY, AVAILABILITY |
| Environment | E | CLEANLINESS, COMFORT, SAFETY, AMBIANCE, ACCESSIBILITY, DIGITAL_UX |
| Value | V | PRICE_LEVEL, PRICE_FAIRNESS, PRICE_TRANSPARENCY, VALUE_FOR_MONEY |
| Meta | meta | HONESTY, ETHICS, PROMISES, ACKNOWLEDGMENT, RESPONSE_QUALITY, RECOVERY, RETURN_INTENT, RECOMMEND, RECOGNITION, UNMAPPED, NON_INFORMATIVE |
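The mapping can be held as a dict and inverted into a primitive-to-domain lookup. A sketch; the constant and function names are hypothetical, and unknown primitives raise `KeyError` rather than being guessed:

```python
DOMAIN_PRIMITIVES = {
    "O": ["TASTE", "CRAFT", "FRESHNESS", "TEMPERATURE", "EFFECTIVENESS", "ACCURACY",
          "CONDITION", "CONSISTENCY"],
    "P": ["MANNER", "COMPETENCE", "ATTENTIVENESS", "COMMUNICATION"],
    "J": ["SPEED", "FRICTION", "RELIABILITY", "AVAILABILITY"],
    "E": ["CLEANLINESS", "COMFORT", "SAFETY", "AMBIANCE", "ACCESSIBILITY", "DIGITAL_UX"],
    "V": ["PRICE_LEVEL", "PRICE_FAIRNESS", "PRICE_TRANSPARENCY", "VALUE_FOR_MONEY"],
    "meta": ["HONESTY", "ETHICS", "PROMISES", "ACKNOWLEDGMENT", "RESPONSE_QUALITY", "RECOVERY",
             "RETURN_INTENT", "RECOMMEND", "RECOGNITION", "UNMAPPED", "NON_INFORMATIVE"],
}

# Inverted index: primitive -> domain code.
PRIMITIVE_DOMAIN = {p: d for d, prims in DOMAIN_PRIMITIVES.items() for p in prims}

# Each primitive must belong to exactly one domain.
assert len(PRIMITIVE_DOMAIN) == sum(len(v) for v in DOMAIN_PRIMITIVES.values())


def domain_of(primitive: str) -> str:
    return PRIMITIVE_DOMAIN[primitive]  # KeyError for unknown primitives
```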
## Testing
```bash
cd packages/reviewiq-pipeline
python -m pytest tests/test_executive_summary.py -v
```
16 tests covering:
- Negative driver priority over dips
- Qualifying dip selection (90 days + review_count ≥ 3)
- Most recent dip when multiple qualify
- Contradiction detection (dip + "no major issues")
- Non-qualifying dips not cited as "recent"
- Summary input construction
## Environment Variables
| Variable | Required | Description |
|----------|----------|-------------|
| DATABASE_URL | Yes | PostgreSQL connection string |
| OPENAI_API_KEY | No | Required for LLM summary (fallback used if missing) |
## Example Output
```bash
$ python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365
Report written to stdout
============================================================
REPUTATION REPORT: Go Karts Mar Menor
============================================================
Window: 2025-01-31T12:00:00Z - 2026-01-31T12:00:00Z
Reviews: 27
Content spans: 78
Overall score: 85.3
Positive share: 89.7%
Negative share: 7.7%
Top positive drivers:
VALUE_FOR_MONEY: 14.7% impact
RECOMMEND: 14.5% impact
MANNER: 13.5% impact
Top negative drivers:
============================================================
```
## Changelog
### v8 (2026-01-31)
- Initial production release
- Cross-business join safety
- Score formula alignment
- Executive summary with narrative guardrails
- Comprehensive test suite