docs(reputation-report): Add comprehensive pipeline documentation

Documents:
- Data flow and architecture
- CLI options and usage
- Output schema with examples
- Scoring formulas
- Production guardrails
- Thresholds and domain mapping
- Testing instructions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
387
packages/reviewiq-pipeline/docs/REPUTATION_REPORT.md
Normal file
@@ -0,0 +1,387 @@
# Reputation Report Pipeline

**Version:** 1.0 (v8)
**Status:** Production-ready
**Location:** `packages/reviewiq-pipeline/scripts/reputation_report.py`

## Overview

The Reputation Report generates business-facing, time-windowed reputation analytics from classified review spans. It produces a €50-value report suitable for SMB business owners, including:

- Overall performance score (-100 to 100 scale, per the scoring formula)
- Domain and primitive breakdowns
- Positive and negative drivers with evidence
- Time comparisons (current vs previous period)
- Sector benchmarks
- Timeline visualization data
- LLM-generated executive summary

## Quick Start

```bash
# Basic usage (last 365 days)
python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365

# With output file
python scripts/reputation_report.py --business "Business Name" --days 30 --output report.json

# Custom date range
python scripts/reputation_report.py --business "Business Name" --start 2025-01-01 --end 2025-12-31

# Production mode (fail if LLM summary fails)
python scripts/reputation_report.py --business "Business Name" --days 365 --require-summary
```

## CLI Options

| Option | Description | Default |
|--------|-------------|---------|
| `--business` | Business ID or search pattern (required) | - |
| `--days` | Last N days to analyze | 30 |
| `--start` | Window start (ISO-8601) | - |
| `--end` | Window end (ISO-8601) | - |
| `--run-id` | Specific run ID (overrides time window) | - |
| `--timezone` | IANA timezone for window | UTC |
| `--output`, `-o` | Output file path | stdout |
| `--quiet`, `-q` | Suppress console summary | false |
| `--no-summary` | Disable executive summary | false |
| `--require-summary` | Exit code 2 if LLM fails | false |
| `--summary-model` | LLM model for summary | gpt-4o-mini |
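
For orientation, the options in the table could be declared with `argparse` roughly as follows. This is a minimal sketch, not the script's actual parser: `build_parser` is a hypothetical name, the help strings are paraphrased from the table, and whether the real script uses `argparse` at all is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(prog="reputation_report.py")
    p.add_argument("--business", required=True, help="Business ID or search pattern")
    p.add_argument("--days", type=int, default=30, help="Last N days to analyze")
    p.add_argument("--start", help="Window start (ISO-8601)")
    p.add_argument("--end", help="Window end (ISO-8601)")
    p.add_argument("--run-id", help="Specific run ID (overrides time window)")
    p.add_argument("--timezone", default="UTC", help="IANA timezone for window")
    p.add_argument("--output", "-o", help="Output file path (stdout if omitted)")
    p.add_argument("--quiet", "-q", action="store_true", help="Suppress console summary")
    p.add_argument("--no-summary", action="store_true", help="Disable executive summary")
    p.add_argument("--require-summary", action="store_true", help="Exit code 2 if LLM fails")
    p.add_argument("--summary-model", default="gpt-4o-mini", help="LLM model for summary")
    return p


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--business", "Business Name", "--days", "365", "--require-summary"]
)
print(args.days, args.timezone, args.require_summary)  # 365 UTC True
```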

## Data Flow

```
┌─────────────────────────────────────────────────────────────────┐
│  INPUT                                                          │
├─────────────────────────────────────────────────────────────────┤
│  detected_spans_v2 ←──JOIN──→ review_facts_v1                   │
│  (primitives, valence,        (review_time_utc, rating,         │
│   confidence, intensity)       business_id)                     │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  SPAN SELECTION                                                 │
├─────────────────────────────────────────────────────────────────┤
│  Mode: time_window                                              │
│    → Filter by review_time_utc in [start, end)                  │
│    → Join on (review_id, business_id) for data isolation        │
│                                                                 │
│  Mode: latest_run                                               │
│    → Filter by run_id                                           │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  COMPUTATION                                                    │
├─────────────────────────────────────────────────────────────────┤
│  1. Population stats (review count, language distribution)      │
│  2. Overall score: 100 × Σ(valence × conf × intensity) / Σ(...) │
│  3. Domain scores (O/P/J/E/V weighted averages)                 │
│  4. Primitive scores (per-primitive breakdown)                  │
│  5. Drivers (impact = weighted share of total)                  │
│  6. Alerts (SAFETY, UNMAPPED thresholds)                        │
│  7. Recommendations (templated playbooks)                       │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  TIME COMPARISONS                                               │
├─────────────────────────────────────────────────────────────────┤
│  Previous Window:                                               │
│    → Same duration, immediately preceding current               │
│    → Requires MIN_REVIEWS_FOR_COMPARISON (10)                   │
│    → Requires MIN_COVERAGE_FOR_COMPARISON (80%)                 │
│                                                                 │
│  Sector Benchmark:                                              │
│    → Requires 500+ spans, 3+ businesses in sector               │
│    → Status: ok | insufficient_data | missing_sector_code       │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  EXECUTIVE SUMMARY                                              │
├─────────────────────────────────────────────────────────────────┤
│  LLM-generated (gpt-4o-mini) with narrative guardrails:         │
│                                                                 │
│  Weakness Priority:                                             │
│    1. Negative driver (if drivers.negatives non-empty)          │
│    2. Qualifying dip (within 90d, review_count ≥ 3)             │
│    3. None ("no persistent weaknesses surfaced")                │
│                                                                 │
│  Guardrails:                                                    │
│    - No "recent dip" + "no major issues" contradiction          │
│    - Most recent qualifying dip if multiple exist               │
│    - Action must tie to cited weakness or top positive          │
│                                                                 │
│  Fallback: Deterministic summary if LLM unavailable             │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  OUTPUT                                                         │
├─────────────────────────────────────────────────────────────────┤
│  JSON Report (schema_version: 1.0)                              │
│    - business, window, population                               │
│    - scores (overall, domains, primitives)                      │
│    - drivers (positives, negatives with evidence)               │
│    - alerts, recommendations                                    │
│    - comparisons (previous_window, sector_benchmark)            │
│    - timeline (granularity, points)                             │
│    - executive_summary, executive_summary_meta                  │
└─────────────────────────────────────────────────────────────────┘
```
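
The previous-window rule in the diagram (same duration, immediately preceding the current window) can be sketched as below. The helper names `previous_window` and `comparison_allowed` are hypothetical, and treating both thresholds as inclusive minimums on a single period is an assumption; only the two constants and their values come from this document.

```python
from datetime import datetime, timezone

MIN_REVIEWS_FOR_COMPARISON = 10      # value from this document
MIN_COVERAGE_FOR_COMPARISON = 0.80   # value from this document


def previous_window(start: datetime, end: datetime) -> tuple:
    """Return the half-open window [prev_start, start) with the same duration."""
    duration = end - start
    return start - duration, start


def comparison_allowed(review_count: int, coverage: float) -> bool:
    """Assumed gate: a period qualifies only if it meets both minimums."""
    return (
        review_count >= MIN_REVIEWS_FOR_COMPARISON
        and coverage >= MIN_COVERAGE_FOR_COMPARISON
    )


current_start = datetime(2025, 1, 1, tzinfo=timezone.utc)
current_end = datetime(2025, 1, 31, tzinfo=timezone.utc)
prev_start, prev_end = previous_window(current_start, current_end)
print(prev_start.date(), prev_end.date())  # 2024-12-02 2025-01-01
```

Because both windows are half-open, the previous window ends exactly where the current one starts, so no review is counted in both periods.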

## Output Schema

### Top-Level Fields

```json
{
  "schema_version": "1.0",
  "report_id": "uuid",
  "primary_run_id": "uuid | null",
  "generated_at": "2026-01-31T12:00:00Z",
  "window": { "start", "end", "timezone", "mode" },
  "business": { "business_id", "sector_code", "gbp_path" },
  "population": { ... },
  "scores": { "overall", "domains", "primitives" },
  "drivers": { "positives", "negatives" },
  "alerts": [ ... ],
  "recommendations": [ ... ],
  "comparisons": { "previous_window", "sector_benchmark" },
  "timeline": { "granularity", "points" },
  "executive_summary": "string | null",
  "executive_summary_meta": { "enabled", "generated", "model", "error", "fallback_used" }
}
```

### Scores Structure

```json
{
  "overall": {
    "score": 85.3,
    "score_domain_weighted": 85.7,
    "positive_share": 0.897,
    "negative_share": 0.077,
    "mixed_share": 0.013,
    "neutral_share": 0.013
  },
  "domains": {
    "O": { "score": 100.0, "volume": 10 },
    "P": { "score": 86.2, "volume": 17 },
    "J": { "score": -23.4, "volume": 5 },
    "E": { "score": 94.8, "volume": 35 },
    "V": { "score": 100.0, "volume": 10 }
  },
  "primitives": {
    "VALUE_FOR_MONEY": {
      "domain": "V",
      "score": 100.0,
      "volume": 10,
      "valence_counts": { "+": 10, "-": 0, "0": 0, "±": 0 },
      "top_entities": [ ... ]
    }
  }
}
```

### Driver Structure

```json
{
  "positives": [
    {
      "primitive": "VALUE_FOR_MONEY",
      "impact": 0.147,
      "summary": "Positive V/VALUE_FOR_MONEY mentions.",
      "evidence": [
        {
          "review_id": "abc123",
          "language": "en",
          "span_text": "the prices are super affordable.",
          "valence": "+",
          "intensity": 2,
          "confidence": 0.9
        }
      ]
    }
  ],
  "negatives": [ ... ]
}
```

### Timeline Structure

```json
{
  "granularity": "month",
  "points": [
    {
      "bucket_start_utc": "2025-12-01T00:00:00Z",
      "review_count": 8,
      "span_count": 25,
      "positive_count": 15,
      "negative_count": 8,
      "avg_rating": 2.88,
      "strength_score": -32.6
    }
  ]
}
```

## Production Guardrails

### Data Isolation

All queries join `detected_spans_v2` with `review_facts_v1` on **both** `review_id` AND `business_id` to prevent cross-business contamination:

```sql
JOIN pipeline.review_facts_v1 f
  ON f.review_id = s.review_id
  AND f.business_id = s.business_id
```

### Score Consistency

An invariant check ensures `scores.overall.score` matches `comparisons.previous_window.scores.overall.current`. If the absolute difference between the two exceeds 1.0, an `internal_inconsistency` alert is emitted.
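
A minimal sketch of such an invariant check, assuming the report is available as a plain `dict` with the documented field paths. The function name `check_score_consistency` and the shape of the returned alert are hypothetical; only the alert name and the 1.0 tolerance come from this document.

```python
from typing import Optional


def check_score_consistency(report: dict, tolerance: float = 1.0) -> Optional[dict]:
    """Return an alert dict when the two copies of the overall score diverge."""
    overall = report["scores"]["overall"]["score"]
    current = report["comparisons"]["previous_window"]["scores"]["overall"]["current"]
    delta = abs(overall - current)
    if delta > tolerance:
        # Hypothetical alert shape; only the alert name is documented.
        return {"type": "internal_inconsistency", "delta": round(delta, 2)}
    return None


report = {
    "scores": {"overall": {"score": 85.3}},
    "comparisons": {
        "previous_window": {"scores": {"overall": {"current": 85.3}}}
    },
}
print(check_score_consistency(report))  # None
```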

### Executive Summary Meta

```json
{
  "enabled": true,
  "generated": true,
  "model": "gpt-4o-mini",
  "error": null,
  "generated_at": "2026-01-31T12:00:00Z",
  "fallback_used": false
}
```

- `enabled`: Whether summary generation was requested
- `generated`: Whether LLM successfully produced a summary
- `error`: Error message if generation failed
- `fallback_used`: Whether deterministic fallback was used

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Business not found or no spans |
| 2 | `--require-summary` and LLM failed |
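
The exit-code table can be read as a small decision function. The sketch below is an illustration only: `exit_code` and its parameters are hypothetical names, and equating "LLM failed" with `executive_summary_meta.generated` being false is an assumption.

```python
def exit_code(
    business_found: bool,
    span_count: int,
    require_summary: bool,
    summary_meta: dict,
) -> int:
    """Map a run's outcome to the documented exit codes 0, 1 and 2."""
    if not business_found or span_count == 0:
        return 1  # business not found or no spans
    if require_summary and not summary_meta.get("generated", False):
        return 2  # --require-summary was set and the LLM summary was not produced
    return 0


# A run where the LLM failed and the deterministic fallback was used.
meta = {"enabled": True, "generated": False, "fallback_used": True}
print(exit_code(True, 78, True, meta))   # 2
print(exit_code(True, 78, False, meta))  # 0
```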

## Scoring Formula

### Overall Score

Same formula as `PERIOD_SCORES_QUERY` for consistency:

```
score = 100 × Σ(valence × confidence × intensity) / Σ(confidence × intensity)
```

Where:
- valence: +1 for positive, -1 for negative, 0 for neutral/mixed
- confidence: 0.0 to 1.0
- intensity: 1 to 3
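
As a worked example, the formula can be evaluated over a handful of spans. This is a sketch under stated assumptions: spans are plain dicts using the valence symbols shown in `valence_counts`, the name `overall_score` is hypothetical, and returning `None` for an empty or zero-weight input is a choice made here, not documented behavior.

```python
from typing import Optional

# Numeric valence as described above: neutral ("0") and mixed ("±") count as 0.
VALENCE_NUM = {"+": 1, "-": -1, "0": 0, "±": 0}


def overall_score(spans: list) -> Optional[float]:
    """100 × Σ(valence × confidence × intensity) / Σ(confidence × intensity)."""
    num = sum(VALENCE_NUM[s["valence"]] * s["confidence"] * s["intensity"] for s in spans)
    den = sum(s["confidence"] * s["intensity"] for s in spans)
    return 100 * num / den if den else None


spans = [
    {"valence": "+", "confidence": 0.9, "intensity": 2},  # contributes +1.8
    {"valence": "-", "confidence": 0.8, "intensity": 1},  # contributes -0.8
    {"valence": "0", "confidence": 0.5, "intensity": 1},  # contributes 0 to the numerator
]
print(round(overall_score(spans), 1))  # 32.3
```

Neutral and mixed spans still add to the denominator, so they pull the score toward 0 without changing its sign.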

### Domain-Weighted Score

Alternative metric (exposed as `score_domain_weighted`):

```
score = Σ(domain_score × domain_volume) / Σ(domain_volume)
```
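
The domain-weighted formula is a volume-weighted mean of the per-domain scores. A minimal sketch, assuming the `domains` mapping has the shape shown under Scores Structure; the function name and the example numbers are invented for illustration.

```python
from typing import Optional


def domain_weighted_score(domains: dict) -> Optional[float]:
    """Σ(domain_score × domain_volume) / Σ(domain_volume)."""
    total_volume = sum(d["volume"] for d in domains.values())
    if total_volume == 0:
        return None  # assumed handling for an empty report
    weighted = sum(d["score"] * d["volume"] for d in domains.values())
    return weighted / total_volume


# Illustrative numbers only: (80 × 3 + (-20) × 1) / (3 + 1) = 55
domains = {
    "P": {"score": 80.0, "volume": 3},
    "J": {"score": -20.0, "volume": 1},
}
print(domain_weighted_score(domains))  # 55.0
```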

### Primitive Score

```
score = 100 × Σ(w × valence_num) / Σ(w)
w = confidence × (0.75 + 0.25×(detail-1)) × (0.8 + 0.2×(intensity-1))
```
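
The weight `w` and the resulting primitive score can be sketched as follows. Assumptions are labeled in the code: `valence_num` is taken to use the same +1 / -1 / 0 mapping as the overall score, `detail` is taken to be a small integer scale like `intensity` (its range is not stated here), and the function names are hypothetical.

```python
from typing import Optional

# Assumed to match the overall-score mapping; not confirmed by this document.
VALENCE_NUM = {"+": 1, "-": -1, "0": 0, "±": 0}


def span_weight(confidence: float, detail: int, intensity: int) -> float:
    """w = confidence × (0.75 + 0.25×(detail-1)) × (0.8 + 0.2×(intensity-1))."""
    return confidence * (0.75 + 0.25 * (detail - 1)) * (0.8 + 0.2 * (intensity - 1))


def primitive_score(spans: list) -> Optional[float]:
    """100 × Σ(w × valence_num) / Σ(w) over the spans of one primitive."""
    weights = [span_weight(s["confidence"], s["detail"], s["intensity"]) for s in spans]
    total = sum(weights)
    if total == 0:
        return None  # assumed handling when there is nothing to score
    weighted = sum(w * VALENCE_NUM[s["valence"]] for w, s in zip(weights, spans))
    return 100 * weighted / total


spans = [
    {"valence": "+", "confidence": 1.0, "detail": 3, "intensity": 3},  # w = 1.25 × 1.2 = 1.5
    {"valence": "-", "confidence": 1.0, "detail": 1, "intensity": 1},  # w = 0.75 × 0.8 = 0.6
]
print(round(primitive_score(spans), 1))  # 42.9
```

With equal confidence, the detailed and intense positive span outweighs the terse negative one by 1.5 to 0.6, which is why the score lands well above zero.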

## Thresholds

| Threshold | Value | Purpose |
|-----------|-------|---------|
| MIN_REVIEWS_FOR_COMPARISON | 10 | Minimum reviews per period for trend |
| MIN_COVERAGE_FOR_COMPARISON | 0.80 | Minimum review_time coverage |
| Sector benchmark spans | 500 | Minimum sector spans for benchmark |
| Sector benchmark businesses | 3 | Minimum businesses in sector |
| UNMAPPED rate warn | 0.10 | Alert if >10% unmapped |
| UNMAPPED rate critical | 0.15 | Critical alert if >15% unmapped |
| SAFETY negative warn | 0.05 | Alert if >5% SAFETY negative |
| SAFETY negative critical | 0.10 | Critical alert if >10% SAFETY negative |
| Dip recency | 90 days | Maximum age for "recent" dip |
| Dip volume | 3 reviews | Minimum reviews to qualify as dip |
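
The warn and critical rows can be read as a two-level classifier with strict "greater than" comparisons, as the Purpose column states. In this sketch the constant names, the `alert_level` helper, and the returned level strings are hypothetical; how each rate's denominator is computed is not specified here, so the caller is assumed to pass an already-computed rate.

```python
from typing import Optional

# Values from the table; the constant names are invented for this sketch.
UNMAPPED_WARN, UNMAPPED_CRITICAL = 0.10, 0.15
SAFETY_WARN, SAFETY_CRITICAL = 0.05, 0.10


def alert_level(rate: float, warn: float, critical: float) -> Optional[str]:
    """Classify a rate against strict thresholds; the critical check runs first."""
    if rate > critical:
        return "critical"
    if rate > warn:
        return "warn"
    return None


print(alert_level(0.12, UNMAPPED_WARN, UNMAPPED_CRITICAL))  # warn
print(alert_level(0.12, SAFETY_WARN, SAFETY_CRITICAL))      # critical
```

The same 12% rate is a warning for UNMAPPED but critical for SAFETY, reflecting the lower SAFETY thresholds in the table.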

## Domain Mapping

| Domain | Code | Primitives |
|--------|------|------------|
| Output/Product | O | TASTE, CRAFT, FRESHNESS, TEMPERATURE, EFFECTIVENESS, ACCURACY, CONDITION, CONSISTENCY |
| People/Service | P | MANNER, COMPETENCE, ATTENTIVENESS, COMMUNICATION |
| Journey/Process | J | SPEED, FRICTION, RELIABILITY, AVAILABILITY |
| Environment | E | CLEANLINESS, COMFORT, SAFETY, AMBIANCE, ACCESSIBILITY, DIGITAL_UX |
| Value | V | PRICE_LEVEL, PRICE_FAIRNESS, PRICE_TRANSPARENCY, VALUE_FOR_MONEY |
| Meta | meta | HONESTY, ETHICS, PROMISES, ACKNOWLEDGMENT, RESPONSE_QUALITY, RECOVERY, RETURN_INTENT, RECOMMEND, RECOGNITION, UNMAPPED, NON_INFORMATIVE |
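
For lookups in code, the table can be held as a dictionary and inverted to map each primitive to its domain code. The variable names below are hypothetical, and the pipeline may store this mapping differently; the contents are transcribed from the table.

```python
# Domain code -> primitives, transcribed from the Domain Mapping table.
DOMAIN_PRIMITIVES = {
    "O": ["TASTE", "CRAFT", "FRESHNESS", "TEMPERATURE", "EFFECTIVENESS",
          "ACCURACY", "CONDITION", "CONSISTENCY"],
    "P": ["MANNER", "COMPETENCE", "ATTENTIVENESS", "COMMUNICATION"],
    "J": ["SPEED", "FRICTION", "RELIABILITY", "AVAILABILITY"],
    "E": ["CLEANLINESS", "COMFORT", "SAFETY", "AMBIANCE", "ACCESSIBILITY",
          "DIGITAL_UX"],
    "V": ["PRICE_LEVEL", "PRICE_FAIRNESS", "PRICE_TRANSPARENCY", "VALUE_FOR_MONEY"],
    "meta": ["HONESTY", "ETHICS", "PROMISES", "ACKNOWLEDGMENT", "RESPONSE_QUALITY",
             "RECOVERY", "RETURN_INTENT", "RECOMMEND", "RECOGNITION", "UNMAPPED",
             "NON_INFORMATIVE"],
}

# Inverted view: primitive -> domain code.
PRIMITIVE_TO_DOMAIN = {
    primitive: code
    for code, primitives in DOMAIN_PRIMITIVES.items()
    for primitive in primitives
}

print(PRIMITIVE_TO_DOMAIN["VALUE_FOR_MONEY"])  # V
print(PRIMITIVE_TO_DOMAIN["SAFETY"])           # E
```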

## Testing

```bash
cd packages/reviewiq-pipeline
python -m pytest tests/test_executive_summary.py -v
```

16 tests covering:
- Negative driver priority over dips
- Qualifying dip selection (90 days + review_count ≥ 3)
- Most recent dip when multiple qualify
- Contradiction detection (dip + "no major issues")
- Non-qualifying dips not cited as "recent"
- Summary input construction
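
The selection rules these tests exercise can be sketched as a single function. Everything named here is hypothetical (`pick_weakness`, the dip dict keys, the returned shape), and measuring the 90-day recency from a caller-supplied `now` is an assumption; how a timeline bucket is classified as a dip in the first place is outside this sketch.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DIP_RECENCY_DAYS = 90  # "Dip recency" threshold from this document
DIP_MIN_REVIEWS = 3    # "Dip volume" threshold from this document


def pick_weakness(negative_drivers: list, dips: list, now: datetime) -> Optional[dict]:
    """Apply the priority: negative driver, then most recent qualifying dip, then none."""
    if negative_drivers:
        return {"kind": "negative_driver", "item": negative_drivers[0]}
    cutoff = now - timedelta(days=DIP_RECENCY_DAYS)
    qualifying = [
        d for d in dips
        if d["bucket_start"] >= cutoff and d["review_count"] >= DIP_MIN_REVIEWS
    ]
    if qualifying:
        latest = max(qualifying, key=lambda d: d["bucket_start"])
        return {"kind": "dip", "item": latest}
    return None  # nothing to cite as a weakness


utc = timezone.utc
now = datetime(2026, 1, 31, tzinfo=utc)
dips = [
    {"bucket_start": datetime(2025, 12, 1, tzinfo=utc), "review_count": 8},  # qualifies
    {"bucket_start": datetime(2026, 1, 1, tzinfo=utc), "review_count": 2},   # too few reviews
    {"bucket_start": datetime(2025, 6, 1, tzinfo=utc), "review_count": 5},   # too old
]
choice = pick_weakness([], dips, now)
print(choice["kind"], choice["item"]["bucket_start"].date())  # dip 2025-12-01
```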

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| DATABASE_URL | Yes | PostgreSQL connection string |
| OPENAI_API_KEY | No | Required for LLM summary (fallback used if missing) |

## Example Output

```bash
$ python scripts/reputation_report.py --business "Go Karts Mar Menor" --days 365 --quiet

Report written to stdout

============================================================
REPUTATION REPORT: Go Karts Mar Menor
============================================================
Window: 2025-01-31T12:00:00Z - 2026-01-31T12:00:00Z
Reviews: 27
Content spans: 78
Overall score: 85.3
Positive share: 89.7%
Negative share: 7.7%

Top positive drivers:
  VALUE_FOR_MONEY: 14.7% impact
  RECOMMEND: 14.5% impact
  MANNER: 13.5% impact

Top negative drivers:
============================================================
```

## Changelog

### v8 (2026-01-31)
- Initial production release
- Cross-business join safety
- Score formula alignment
- Executive summary with narrative guardrails
- Comprehensive test suite