whyrating-engine-legacy/.artifacts/ReviewIQ-Pipeline-DevGuide.md
Alejandro Gutiérrez acd3b22e88 docs: Add pipeline development artifacts for parallel implementation
New artifacts:
- ReviewIQ-Pipeline-DevGuide.md: Entry point for pipeline work
- ReviewIQ-Pipeline-Contracts-v1.md: Stage I/O specs, validation rules, test fixtures
- ReviewIQ-Pipeline-Checklist.md: Per-stage implementation checklists
- ReviewIQ-Codebase-Overview.md: File structure, integration points
- ReviewIQ-v3.2.1-Taxonomy-Versioning.md: Taxonomy versioning addendum

Updated:
- ReviewIQ-v32-Decisions.md: Added B2 audit findings, taxonomy versioning decisions, pipeline status

These artifacts enable parallel development of pipeline stages 1-4 with:
- Independent validation (35 rules across stages)
- Clear input/output contracts
- Test fixtures for each stage
- Definition of done criteria

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 17:08:40 +00:00

ReviewIQ Pipeline Development Guide

Purpose: Entry point for agents implementing the enrichment pipeline
Last Updated: 2026-01-24


TL;DR - Current State

Pipeline Implementation: ~55% complete

✅ WORKING                          ❌ NOT IMPLEMENTED
──────────                          ──────────────────
Google Maps scraping                Stage 1: Normalization
Job orchestration                   Stage 2: LLM Classification
Chrome worker pool                  Stage 3: Issue Routing
Webhook delivery                    Stage 4: Fact Aggregation
SSE streaming                       Enrichment database schema
Frontend (job management)           Advanced analytics UI

Estimated effort to 100%: 6-8 weeks


Cold Start Instructions

A new agent should:

| Step | Action | Time |
|------|--------|------|
| 1 | Read this file (ReviewIQ-Pipeline-DevGuide.md) | 2 min |
| 2 | Read ReviewIQ-v32-Decisions.md | 5 min |
| 3 | Read ReviewIQ-Codebase-Overview.md | 10 min |
| 4 | Read assigned stage in ReviewIQ-Pipeline-Contracts-v1.md | 15 min |
| 5 | Use ReviewIQ-Pipeline-Checklist.md to verify completion | Reference |

Document Map

                       ┌─────────────────────────────────────┐
                       │  ReviewIQ-Pipeline-DevGuide.md      │
                       │         (YOU ARE HERE)              │
                       └─────────────────┬───────────────────┘
                                         │
           ┌─────────────────────────────┼─────────────────────────────┐
           │                             │                             │
           ▼                             ▼                             ▼
┌─────────────────────┐    ┌─────────────────────────┐    ┌─────────────────────┐
│ CONTEXT RECOVERY    │    │    IMPLEMENTATION       │    │    REFERENCE        │
├─────────────────────┤    ├─────────────────────────┤    ├─────────────────────┤
│                     │    │                         │    │                     │
│ ReviewIQ-v32-       │    │ Pipeline-Contracts-v1   │    │ Architecture-v3.2   │
│ Decisions.md        │    │ (I/O specs, validation) │    │ (full DDL spec)     │
│ (key decisions,     │    │                         │    │                     │
│ markpoint)          │    │ Pipeline-Checklist      │    │ v3.2.1-Taxonomy-    │
│                     │    │ (implementation tasks)  │    │ Versioning          │
│ Codebase-Overview   │    │                         │    │ (versioning spec)   │
│ (file structure,    │    │ LLM-Classification-     │    │                     │
│ integration points) │    │ Contract-v1             │    │ URT-v5.1-Reference  │
│                     │    │ (prompt engineering)    │    │ (dimension codes)   │
└─────────────────────┘    └─────────────────────────┘    └─────────────────────┘

Core Documents

Context & Status (Read First)

| File | Purpose | Est. Read Time |
|------|---------|----------------|
| ReviewIQ-Pipeline-DevGuide.md | Entry point, document map | 2 min |
| ReviewIQ-v32-Decisions.md | Key decisions, current markpoint | 5 min |
| ReviewIQ-Codebase-Overview.md | File structure, what code exists, integration points | 10 min |

Implementation Guides (For Building)

| File | Purpose | Est. Read Time |
|------|---------|----------------|
| ReviewIQ-Pipeline-Contracts-v1.md | Stage I/O specs, validation rules, test fixtures | 15 min |
| ReviewIQ-Pipeline-Checklist.md | Per-stage implementation checklist, definition of done | 5 min |
| LLM-Classification-Contract-v1.md | LLM prompt engineering spec (Stage 2) | 10 min |

Full Specifications (Reference)

| File | Purpose | When to Read |
|------|---------|--------------|
| ReviewIQ-Architecture-v3.2.md | Complete v3.2 spec with DDL | Schema details |
| ReviewIQ-v3.2.1-Taxonomy-Versioning.md | Taxonomy versioning addendum | Future-proofing |
| URT-v5.1-Reference.md | URT dimension codes reference | Classification reference |

Legacy (Superseded - Reference Only)

| File | Note |
|------|------|
| ReviewIQ-Architecture-v2.md | Superseded by v3.2 |
| ReviewIQ-Architecture-v3.md | Superseded by v3.2 |
| ReviewIQ-Architecture-v3.1.md | Superseded by v3.2 |
| CONTEXT-KEEPER.md | Use ReviewIQ-v32-Decisions.md instead |

What's Captured in Artifacts

| Context | Document |
|---------|----------|
| Key architectural decisions | ReviewIQ-v32-Decisions.md |
| Current implementation status (~55%) | ReviewIQ-Codebase-Overview.md |
| Existing file structure | ReviewIQ-Codebase-Overview.md |
| Integration points (where new code connects) | ReviewIQ-Codebase-Overview.md |
| Stage input/output contracts | ReviewIQ-Pipeline-Contracts-v1.md |
| Validation rules (35 total across stages) | ReviewIQ-Pipeline-Contracts-v1.md |
| Test fixtures (5 sample JSON payloads) | ReviewIQ-Pipeline-Contracts-v1.md |
| Implementation checklists | ReviewIQ-Pipeline-Checklist.md |
| Definition of done per stage | ReviewIQ-Pipeline-Checklist.md |
| LLM prompt specification | LLM-Classification-Contract-v1.md |
| URT taxonomy codes | URT-v5.1-Reference.md |
| Full database DDL | ReviewIQ-Architecture-v3.2.md |
| Taxonomy versioning schema | ReviewIQ-v3.2.1-Taxonomy-Versioning.md |

Pipeline Stages

| Stage | Name | Status | Contract Section | Validation Rules |
|-------|------|--------|------------------|------------------|
| 0 | Raw Ingestion | Done | Pipeline-Contracts § Stage 0 | V0.1-V0.5 |
| 1 | Normalization | TODO | Pipeline-Contracts § Stage 1 | V1.1-V1.6 |
| 2 | LLM Classification | TODO | Pipeline-Contracts § Stage 2 | V2.1-V2.12 |
| 3 | Issue Routing | TODO | Pipeline-Contracts § Stage 3 | V3.1-V3.5 |
| 4 | Fact Aggregation | TODO | Pipeline-Contracts § Stage 4 | V4.1-V4.7 |
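
The rule ranges in the stage table can be expanded into individual rule IDs, which gives a quick check that the per-stage counts add up to the 35 rules quoted elsewhere in this guide. The sketch below is illustrative only: `STAGE_RULE_RANGES`, `expand_rule_range`, and `ALL_RULES` are hypothetical names that do not exist in the codebase, and the actual rule definitions live in ReviewIQ-Pipeline-Contracts-v1.md.

```python
# Hypothetical helper (not part of the repository): expands the
# "V<stage>.<first>-V<stage>.<last>" ranges from the stage table
# into individual rule IDs.
STAGE_RULE_RANGES = {
    0: "V0.1-V0.5",
    1: "V1.1-V1.6",
    2: "V2.1-V2.12",
    3: "V3.1-V3.5",
    4: "V4.1-V4.7",
}


def expand_rule_range(rule_range: str) -> list[str]:
    """Turn 'V2.1-V2.12' into ['V2.1', 'V2.2', ..., 'V2.12']."""
    first, last = rule_range.split("-")
    stage, start = first[1:].split(".")      # drop the leading "V"
    last_stage, end = last[1:].split(".")
    if stage != last_stage:
        raise ValueError(f"range spans more than one stage: {rule_range}")
    return [f"V{stage}.{n}" for n in range(int(start), int(end) + 1)]


ALL_RULES = {
    stage: expand_rule_range(rule_range)
    for stage, rule_range in STAGE_RULE_RANGES.items()
}
TOTAL_RULES = sum(len(rules) for rules in ALL_RULES.values())
```

With the ranges listed in the table, the per-stage counts are 5, 6, 12, 5, and 7, so `TOTAL_RULES` comes to 35.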

Parallel Development Assignment

Agent 1 - Stage 1 (Normalization)

Read:
  - ReviewIQ-Pipeline-Contracts-v1.md § Stage 1
  - ReviewIQ-Codebase-Overview.md (integration points)

Create:
  - pipeline/stage1_normalize.py
  - migrations/005_create_reviews_tables.sql
  - pipeline/tests/test_stage1.py

Validate:
  - V1.1-V1.6 rules pass
  - Integration test: Stage 0 → Stage 1 passes

Agent 2 - Stage 2 (LLM Classification)

Read:
  - ReviewIQ-Pipeline-Contracts-v1.md § Stage 2
  - LLM-Classification-Contract-v1.md
  - URT-v5.1-Reference.md

Create:
  - pipeline/stage2_classify.py
  - pipeline/llm_client.py
  - pipeline/span_extractor.py
  - migrations/006_create_spans_table.sql
  - migrations/007_create_urt_enums.sql
  - pipeline/tests/test_stage2.py

Validate:
  - V2.1-V2.12 rules pass
  - Integration test: Stage 1 → Stage 2 passes

Agent 3 - Stage 3 (Issue Routing)

Read:
  - ReviewIQ-Pipeline-Contracts-v1.md § Stage 3
  - ReviewIQ-Architecture-v3.2.md § Part 5 (issue lifecycle)

Create:
  - pipeline/stage3_route.py
  - pipeline/issue_manager.py
  - migrations/008_create_issues_tables.sql
  - pipeline/tests/test_stage3.py

Validate:
  - V3.1-V3.5 rules pass
  - Integration test: Stage 2 → Stage 3 passes

Agent 4 - Stage 4 (Fact Aggregation)

Read:
  - ReviewIQ-Pipeline-Contracts-v1.md § Stage 4
  - ReviewIQ-Architecture-v3.2.md § Part 6 (analytics)

Create:
  - pipeline/stage4_aggregate.py
  - migrations/009_create_facts_table.sql
  - pipeline/tests/test_stage4.py

Validate:
  - V4.1-V4.7 rules pass
  - E2E pipeline test passes
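
One way to track progress across the four assignments is to check which of the listed files exist yet. The following is a minimal sketch, assuming it is run against the repository root; `AGENT_DELIVERABLES` and `missing_deliverables` are hypothetical names introduced for illustration, and file presence says nothing about whether the validation rules pass.

```python
from pathlib import Path

# Files each agent is asked to create, copied from the assignment lists.
AGENT_DELIVERABLES = {
    "Stage 1": [
        "pipeline/stage1_normalize.py",
        "migrations/005_create_reviews_tables.sql",
        "pipeline/tests/test_stage1.py",
    ],
    "Stage 2": [
        "pipeline/stage2_classify.py",
        "pipeline/llm_client.py",
        "pipeline/span_extractor.py",
        "migrations/006_create_spans_table.sql",
        "migrations/007_create_urt_enums.sql",
        "pipeline/tests/test_stage2.py",
    ],
    "Stage 3": [
        "pipeline/stage3_route.py",
        "pipeline/issue_manager.py",
        "migrations/008_create_issues_tables.sql",
        "pipeline/tests/test_stage3.py",
    ],
    "Stage 4": [
        "pipeline/stage4_aggregate.py",
        "migrations/009_create_facts_table.sql",
        "pipeline/tests/test_stage4.py",
    ],
}


def missing_deliverables(repo_root: Path) -> dict[str, list[str]]:
    """Return, per stage, the listed files that do not exist under repo_root."""
    missing: dict[str, list[str]] = {}
    for stage, paths in AGENT_DELIVERABLES.items():
        absent = [p for p in paths if not (repo_root / p).is_file()]
        if absent:
            missing[stage] = absent
    return missing
```

Calling `missing_deliverables(Path("."))` from the repository root returns an empty dict once every listed file has been created.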

Success Criteria

Pipeline is complete when:

python -m pipeline.validate --job-id <JOB_ID> --verbose

# Expected output:
Stage 0: ✅ PASS (5/5 rules)
Stage 1: ✅ PASS (6/6 rules)
Stage 2: ✅ PASS (12/12 rules)
Stage 3: ✅ PASS (5/5 rules)
Stage 4: ✅ PASS (7/7 rules)
E2E Integration: ✅ PASS
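
For CI, the report above could be checked mechanically rather than by eye. This is a sketch under stated assumptions: `pipeline.validate` is not implemented yet, so the line format is taken from the expected output shown here, and the `FAIL` wording is a guess at what a failing line might look like. `pipeline_complete` is a hypothetical name.

```python
import re

# Matches lines such as "Stage 2: ✅ PASS (12/12 rules)".
# The FAIL alternative is an assumption; only PASS lines appear in this guide.
STAGE_LINE = re.compile(r"^Stage (\d+): .*?(PASS|FAIL) \((\d+)/(\d+) rules\)$")

EXPECTED_STAGES = {0, 1, 2, 3, 4}


def pipeline_complete(report: str) -> bool:
    """True only if stages 0-4 all pass every rule and the E2E line passes."""
    stages: dict[int, bool] = {}
    e2e_passed = False
    for raw_line in report.splitlines():
        line = raw_line.strip()
        match = STAGE_LINE.match(line)
        if match:
            stage, status, passed, total = match.groups()
            stages[int(stage)] = status == "PASS" and passed == total
        elif line.startswith("E2E Integration:"):
            e2e_passed = "PASS" in line
    return set(stages) == EXPECTED_STAGES and all(stages.values()) and e2e_passed
```

A report with a missing stage line is treated as incomplete, so a validator that crashes partway through does not count as a pass.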

Quick Commands

# Check current branch
git branch --show-current
# Expected: feature/platform-restructure

# View recent commits
git log --oneline -5

# Start database
docker-compose -f docker-compose.production.yml up -d postgres

# Run API server
python api_server_production.py

# Run frontend
cd frontend && npm run dev

# Run migrations (when created)
psql "$DATABASE_URL" -f migrations/005_create_reviews_tables.sql

# Run tests
pytest pipeline/tests/ -v

# Validate pipeline
python -m pipeline.validate --job-id <JOB_ID>

Environment Variables

# Database (required)
DATABASE_URL=postgresql://user:pass@localhost:5432/reviewiq

# LLM Provider (Stage 2)
OPENAI_API_KEY=sk-...
# OR
ANTHROPIC_API_KEY=sk-ant-...

# Embedding model (Stage 2)
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Taxonomy version
DEFAULT_TAXONOMY_VERSION=v5.1
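
The variables above could be read once at startup and validated before any stage runs. The sketch below is one possible shape, not the project's actual configuration code: `PipelineConfig` and `load_config` are hypothetical names, the defaults simply reuse the example values shown above, and preferring OpenAI when both keys are set is an arbitrary choice. Only Stage 2 needs an LLM key, so requiring one for every stage is a simplification.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineConfig:
    database_url: str
    llm_provider: str      # "openai" or "anthropic"
    llm_api_key: str
    embedding_model: str
    taxonomy_version: str


def load_config(env: Mapping[str, str] = os.environ) -> PipelineConfig:
    """Build a config from environment variables, failing fast on gaps."""
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise RuntimeError("DATABASE_URL is required")

    # Exactly one provider key is needed; OpenAI wins if both are set
    # (an assumption made for this sketch).
    if env.get("OPENAI_API_KEY"):
        provider, api_key = "openai", env["OPENAI_API_KEY"]
    elif env.get("ANTHROPIC_API_KEY"):
        provider, api_key = "anthropic", env["ANTHROPIC_API_KEY"]
    else:
        raise RuntimeError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY")

    return PipelineConfig(
        database_url=database_url,
        llm_provider=provider,
        llm_api_key=api_key,
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        taxonomy_version=env.get("DEFAULT_TAXONOMY_VERSION", "v5.1"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests without touching the real environment.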

File Structure After Implementation

google-reviews-scraper-pro/
├── .artifacts/                    # ← Design documents
│   ├── ReviewIQ-Pipeline-DevGuide.md  # ← START HERE (for pipeline work)
│   ├── ReviewIQ-v32-Decisions.md
│   ├── ReviewIQ-Codebase-Overview.md
│   ├── ReviewIQ-Pipeline-Contracts-v1.md
│   ├── ReviewIQ-Pipeline-Checklist.md
│   └── ...
│
├── api_server_production.py       # ✅ Exists - Main API
├── core/database.py               # ✅ Exists - DB layer
├── scrapers/google_reviews/       # ✅ Exists - Scraper
│
├── pipeline/                      # ❌ TO CREATE
│   ├── stage1_normalize.py
│   ├── stage2_classify.py
│   ├── stage3_route.py
│   ├── stage4_aggregate.py
│   ├── llm_client.py
│   ├── span_extractor.py
│   ├── issue_manager.py
│   └── tests/
│
└── migrations/
    ├── 001-004                    # ✅ Exists
    └── 005-009                    # ❌ TO CREATE

Keep this guide updated when adding new artifacts or completing stages.