# ReviewIQ Pipeline Development Guide

**Purpose**: Entry point for agents implementing the enrichment pipeline

**Last Updated**: 2026-01-24

---

## TL;DR - Current State

**Pipeline Implementation: ~55% complete**

```
✅ WORKING                    ❌ NOT IMPLEMENTED
──────────                    ──────────────────
Google Maps scraping          Stage 1: Normalization
Job orchestration             Stage 2: LLM Classification
Chrome worker pool            Stage 3: Issue Routing
Webhook delivery              Stage 4: Fact Aggregation
SSE streaming                 Enrichment database schema
Frontend (job management)     Advanced analytics UI
```

**Estimated effort to 100%**: 6-8 weeks

---

## Cold Start Instructions

A new agent should:

| Step | Action | Time |
|------|--------|------|
| 1 | Read this file (`ReviewIQ-Pipeline-DevGuide.md`) | 2 min |
| 2 | Read `ReviewIQ-v32-Decisions.md` | 5 min |
| 3 | Read `ReviewIQ-Codebase-Overview.md` | 10 min |
| 4 | Read assigned stage in `ReviewIQ-Pipeline-Contracts-v1.md` | 15 min |
| 5 | Use `ReviewIQ-Pipeline-Checklist.md` to verify completion | Reference |

---

## Document Map

```
                       ┌─────────────────────────────────────┐
                       │    ReviewIQ-Pipeline-DevGuide.md    │
                       │           (YOU ARE HERE)            │
                       └─────────────────┬───────────────────┘
                                         │
           ┌─────────────────────────────┼─────────────────────────────┐
           │                             │                             │
           ▼                             ▼                             ▼
┌─────────────────────┐     ┌─────────────────────────┐     ┌─────────────────────┐
│  CONTEXT RECOVERY   │     │     IMPLEMENTATION      │     │      REFERENCE      │
├─────────────────────┤     ├─────────────────────────┤     ├─────────────────────┤
│                     │     │                         │     │                     │
│ ReviewIQ-v32-       │     │ Pipeline-Contracts-v1   │     │ Architecture-v3.2   │
│ Decisions.md        │     │ (I/O specs, validation) │     │ (full DDL spec)     │
│ (key decisions,     │     │                         │     │                     │
│  markpoint)         │     │ Pipeline-Checklist      │     │ v3.2.1-Taxonomy-    │
│                     │     │ (implementation tasks)  │     │ Versioning          │
│ Codebase-Overview   │     │                         │     │ (versioning spec)   │
│ (file structure,    │     │ LLM-Classification-     │     │                     │
│  integration points)│     │ Contract-v1             │     │ URT-v5.1-Reference  │
│                     │     │ (prompt engineering)    │     │ (dimension codes)   │
└─────────────────────┘     └─────────────────────────┘     └─────────────────────┘
```

---

## Core Documents

### Context & Status (Read First)

| File | Purpose | Est. Read Time |
|------|---------|----------------|
| `ReviewIQ-Pipeline-DevGuide.md` | Entry point, document map | 2 min |
| `ReviewIQ-v32-Decisions.md` | Key decisions, current markpoint | 5 min |
| `ReviewIQ-Codebase-Overview.md` | File structure, what code exists, integration points | 10 min |

### Implementation Guides (For Building)

| File | Purpose | Est. Read Time |
|------|---------|----------------|
| `ReviewIQ-Pipeline-Contracts-v1.md` | Stage I/O specs, validation rules, test fixtures | 15 min |
| `ReviewIQ-Pipeline-Checklist.md` | Per-stage implementation checklist, definition of done | 5 min |
| `LLM-Classification-Contract-v1.md` | LLM prompt engineering spec (Stage 2) | 10 min |

### Full Specifications (Reference)

| File | Purpose | When to Read |
|------|---------|--------------|
| `ReviewIQ-Architecture-v3.2.md` | Complete v3.2 spec with DDL | Schema details |
| `ReviewIQ-v3.2.1-Taxonomy-Versioning.md` | Taxonomy versioning addendum | Future-proofing |
| `URT-v5.1-Reference.md` | URT dimension codes reference | Classification reference |

### Legacy (Superseded - Reference Only)

| File | Note |
|------|------|
| `ReviewIQ-Architecture-v2.md` | Superseded by v3.2 |
| `ReviewIQ-Architecture-v3.md` | Superseded by v3.2 |
| `ReviewIQ-Architecture-v3.1.md` | Superseded by v3.2 |
| `CONTEXT-KEEPER.md` | Use `ReviewIQ-v32-Decisions.md` instead |

---

## What's Captured in Artifacts

| Context | Document |
|---------|----------|
| Key architectural decisions | `ReviewIQ-v32-Decisions.md` |
| Current implementation status (~55%) | `ReviewIQ-Codebase-Overview.md` |
| Existing file structure | `ReviewIQ-Codebase-Overview.md` |
| Integration points (where new code connects) | `ReviewIQ-Codebase-Overview.md` |
| Stage input/output contracts | `ReviewIQ-Pipeline-Contracts-v1.md` |
| Validation rules (35 total across stages) | `ReviewIQ-Pipeline-Contracts-v1.md` |
| Test fixtures (5 sample JSON payloads) | `ReviewIQ-Pipeline-Contracts-v1.md` |
| Implementation checklists | `ReviewIQ-Pipeline-Checklist.md` |
| Definition of done per stage | `ReviewIQ-Pipeline-Checklist.md` |
| LLM prompt specification | `LLM-Classification-Contract-v1.md` |
| URT taxonomy codes | `URT-v5.1-Reference.md` |
| Full database DDL | `ReviewIQ-Architecture-v3.2.md` |
| Taxonomy versioning schema | `ReviewIQ-v3.2.1-Taxonomy-Versioning.md` |

---

## Pipeline Stages

| Stage | Name | Status | Contract Section | Validation Rules |
|-------|------|--------|------------------|------------------|
| 0 | Raw Ingestion | ✅ Done | Pipeline-Contracts § Stage 0 | V0.1-V0.5 |
| 1 | Normalization | ❌ TODO | Pipeline-Contracts § Stage 1 | V1.1-V1.6 |
| 2 | LLM Classification | ❌ TODO | Pipeline-Contracts § Stage 2 | V2.1-V2.12 |
| 3 | Issue Routing | ❌ TODO | Pipeline-Contracts § Stage 3 | V3.1-V3.5 |
| 4 | Fact Aggregation | ❌ TODO | Pipeline-Contracts § Stage 4 | V4.1-V4.7 |

---

## Parallel Development Assignment

### Agent 1 - Stage 1 (Normalization)

```
Read:
  - ReviewIQ-Pipeline-Contracts-v1.md § Stage 1
  - ReviewIQ-Codebase-Overview.md (integration points)

Create:
  - pipeline/stage1_normalize.py
  - migrations/005_create_reviews_tables.sql
  - pipeline/tests/test_stage1.py

Validate:
  - V1.1-V1.6 rules pass
  - Integration test: Stage 0 → Stage 1 passes
```

### Agent 2 - Stage 2 (LLM Classification)

```
Read:
  - ReviewIQ-Pipeline-Contracts-v1.md § Stage 2
  - LLM-Classification-Contract-v1.md
  - URT-v5.1-Reference.md

Create:
  - pipeline/stage2_classify.py
  - pipeline/llm_client.py
  - pipeline/span_extractor.py
  - migrations/006_create_spans_table.sql
  - migrations/007_create_urt_enums.sql
  - pipeline/tests/test_stage2.py

Validate:
  - V2.1-V2.12 rules pass
  - Integration test: Stage 1 → Stage 2 passes
```

### Agent 3 - Stage 3 (Issue Routing)

```
Read:
  - ReviewIQ-Pipeline-Contracts-v1.md § Stage 3
  - ReviewIQ-Architecture-v3.2.md § Part 5 (issue lifecycle)

Create:
  - pipeline/stage3_route.py
  - pipeline/issue_manager.py
  - migrations/008_create_issues_tables.sql
  - pipeline/tests/test_stage3.py

Validate:
  - V3.1-V3.5 rules pass
  - Integration test: Stage 2 → Stage 3 passes
```

### Agent 4 - Stage 4 (Fact Aggregation)

```
Read:
  - ReviewIQ-Pipeline-Contracts-v1.md § Stage 4
  - ReviewIQ-Architecture-v3.2.md § Part 6 (analytics)

Create:
  - pipeline/stage4_aggregate.py
  - migrations/009_create_facts_table.sql
  - pipeline/tests/test_stage4.py

Validate:
  - V4.1-V4.7 rules pass
  - E2E pipeline test passes
```

---

## Success Criteria

Pipeline is complete when:

```bash
python -m pipeline.validate --job-id <job-id> --verbose

# Expected output:
Stage 0: ✅ PASS (5/5 rules)
Stage 1: ✅ PASS (6/6 rules)
Stage 2: ✅ PASS (12/12 rules)
Stage 3: ✅ PASS (5/5 rules)
Stage 4: ✅ PASS (7/7 rules)
E2E Integration: ✅ PASS
```

---

## Quick Commands

```bash
# Check current branch
git branch --show-current
# Expected: feature/platform-restructure

# View recent commits
git log --oneline -5

# Start database
docker-compose -f docker-compose.production.yml up -d postgres

# Run API server
python api_server_production.py

# Run frontend
cd frontend && npm run dev

# Run migrations (when created)
psql $DATABASE_URL -f migrations/005_create_reviews_tables.sql

# Run tests
pytest pipeline/tests/ -v

# Validate pipeline
python -m pipeline.validate --job-id <job-id>
```

---

## Environment Variables

```bash
# Database (required)
DATABASE_URL=postgresql://user:pass@localhost:5432/reviewiq

# LLM Provider (Stage 2)
OPENAI_API_KEY=sk-...
# OR
ANTHROPIC_API_KEY=sk-ant-...

# Embedding model (Stage 2)
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Taxonomy version
DEFAULT_TAXONOMY_VERSION=v5.1
```

---

## File Structure After Implementation

```
google-reviews-scraper-pro/
├── .artifacts/                          # ← Design documents
│   ├── ReviewIQ-Pipeline-DevGuide.md    # ← START HERE (for pipeline work)
│   ├── ReviewIQ-v32-Decisions.md
│   ├── ReviewIQ-Codebase-Overview.md
│   ├── ReviewIQ-Pipeline-Contracts-v1.md
│   ├── ReviewIQ-Pipeline-Checklist.md
│   └── ...
│
├── api_server_production.py             # ✅ Exists - Main API
├── core/database.py                     # ✅ Exists - DB layer
├── scrapers/google_reviews/             # ✅ Exists - Scraper
│
├── pipeline/                            # ❌ TO CREATE
│   ├── stage1_normalize.py
│   ├── stage2_classify.py
│   ├── stage3_route.py
│   ├── stage4_aggregate.py
│   ├── llm_client.py
│   └── tests/
│
└── migrations/
    ├── 001-004                          # ✅ Exists
    └── 005-009                          # ❌ TO CREATE
```

---

*Keep this guide updated when adding new artifacts or completing stages.*
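
As a rough illustration of the Environment Variables section, the sketch below shows one way pipeline code might read and check those settings at startup. It is not taken from the codebase: `PipelineSettings` and `load_settings` are hypothetical names, and the sketch assumes that exactly one LLM key is needed (OpenAI checked first) and that the values shown for `EMBEDDING_MODEL` and `DEFAULT_TAXONOMY_VERSION` can serve as fallbacks when the variables are unset.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelineSettings:
    """Hypothetical container for the variables listed in this guide."""
    database_url: str
    llm_provider: str       # "openai" or "anthropic"
    llm_api_key: str
    embedding_model: str
    taxonomy_version: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> PipelineSettings:
    """Read settings from `env` (defaults to os.environ) and fail fast on gaps."""
    env = os.environ if env is None else env

    # DATABASE_URL is marked as required in the guide.
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise RuntimeError("DATABASE_URL is required")

    # Assumption: either provider key is enough; OpenAI wins if both are set.
    if env.get("OPENAI_API_KEY"):
        provider, api_key = "openai", env["OPENAI_API_KEY"]
    elif env.get("ANTHROPIC_API_KEY"):
        provider, api_key = "anthropic", env["ANTHROPIC_API_KEY"]
    else:
        raise RuntimeError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY for Stage 2")

    return PipelineSettings(
        database_url=database_url,
        llm_provider=provider,
        llm_api_key=api_key,
        # Assumption: the guide's example values double as defaults.
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        taxonomy_version=env.get("DEFAULT_TAXONOMY_VERSION", "v5.1"),
    )


if __name__ == "__main__":
    # Demo with an explicit mapping so it runs without a configured shell.
    demo = load_settings({
        "DATABASE_URL": "postgresql://user:pass@localhost:5432/reviewiq",
        "ANTHROPIC_API_KEY": "sk-ant-example",
    })
    print(demo.llm_provider, demo.embedding_model, demo.taxonomy_version)
```

Passing the environment as a mapping keeps the loader easy to exercise in `pytest` without touching the real process environment.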