New artifacts:
- ReviewIQ-Pipeline-DevGuide.md: Entry point for pipeline work
- ReviewIQ-Pipeline-Contracts-v1.md: Stage I/O specs, validation rules, test fixtures
- ReviewIQ-Pipeline-Checklist.md: Per-stage implementation checklists
- ReviewIQ-Codebase-Overview.md: File structure, integration points
- ReviewIQ-v3.2.1-Taxonomy-Versioning.md: Taxonomy versioning addendum

Updated:
- ReviewIQ-v32-Decisions.md: Added B2 audit findings, taxonomy versioning decisions, pipeline status

These artifacts enable parallel development of pipeline stages 1-4 with:
- Independent validation (35 rules across stages)
- Clear input/output contracts
- Test fixtures for each stage
- Definition of done criteria

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ReviewIQ Pipeline Development Guide
Purpose: Entry point for agents implementing the enrichment pipeline
Last Updated: 2026-01-24
TL;DR - Current State
Pipeline Implementation: ~55% complete
| ✅ WORKING | ❌ NOT IMPLEMENTED |
|---|---|
| Google Maps scraping | Stage 1: Normalization |
| Job orchestration | Stage 2: LLM Classification |
| Chrome worker pool | Stage 3: Issue Routing |
| Webhook delivery | Stage 4: Fact Aggregation |
| SSE streaming | Enrichment database schema |
| Frontend (job management) | Advanced analytics UI |
Estimated effort to 100%: 6-8 weeks
Cold Start Instructions
A new agent should:
| Step | Action | Time |
|---|---|---|
| 1 | Read this file (ReviewIQ-Pipeline-DevGuide.md) | 2 min |
| 2 | Read ReviewIQ-v32-Decisions.md | 5 min |
| 3 | Read ReviewIQ-Codebase-Overview.md | 10 min |
| 4 | Read assigned stage in ReviewIQ-Pipeline-Contracts-v1.md | 15 min |
| 5 | Use ReviewIQ-Pipeline-Checklist.md to verify completion | Reference |
Document Map
                    ┌─────────────────────────────────────┐
                    │    ReviewIQ-Pipeline-DevGuide.md    │
                    │           (YOU ARE HERE)            │
                    └──────────────────┬──────────────────┘
                                       │
           ┌───────────────────────────┼───────────────────────────┐
           │                           │                           │
           ▼                           ▼                           ▼
┌─────────────────────┐   ┌─────────────────────────┐   ┌─────────────────────┐
│  CONTEXT RECOVERY   │   │     IMPLEMENTATION      │   │      REFERENCE      │
├─────────────────────┤   ├─────────────────────────┤   ├─────────────────────┤
│                     │   │                         │   │                     │
│ ReviewIQ-v32-       │   │ Pipeline-Contracts-v1   │   │ Architecture-v3.2   │
│ Decisions.md        │   │ (I/O specs, validation) │   │ (full DDL spec)     │
│ (key decisions,     │   │                         │   │                     │
│ markpoint)          │   │ Pipeline-Checklist      │   │ v3.2.1-Taxonomy-    │
│                     │   │ (implementation tasks)  │   │ Versioning          │
│ Codebase-Overview   │   │                         │   │ (versioning spec)   │
│ (file structure,    │   │ LLM-Classification-     │   │                     │
│ integration points) │   │ Contract-v1             │   │ URT-v5.1-Reference  │
│                     │   │ (prompt engineering)    │   │ (dimension codes)   │
└─────────────────────┘   └─────────────────────────┘   └─────────────────────┘
Core Documents
Context & Status (Read First)
| File | Purpose | Est. Read Time |
|---|---|---|
| ReviewIQ-Pipeline-DevGuide.md | Entry point, document map | 2 min |
| ReviewIQ-v32-Decisions.md | Key decisions, current markpoint | 5 min |
| ReviewIQ-Codebase-Overview.md | File structure, what code exists, integration points | 10 min |
Implementation Guides (For Building)
| File | Purpose | Est. Read Time |
|---|---|---|
| ReviewIQ-Pipeline-Contracts-v1.md | Stage I/O specs, validation rules, test fixtures | 15 min |
| ReviewIQ-Pipeline-Checklist.md | Per-stage implementation checklist, definition of done | 5 min |
| LLM-Classification-Contract-v1.md | LLM prompt engineering spec (Stage 2) | 10 min |
Full Specifications (Reference)
| File | Purpose | When to Read |
|---|---|---|
| ReviewIQ-Architecture-v3.2.md | Complete v3.2 spec with DDL | Schema details |
| ReviewIQ-v3.2.1-Taxonomy-Versioning.md | Taxonomy versioning addendum | Future-proofing |
| URT-v5.1-Reference.md | URT dimension codes reference | Classification reference |
Legacy (Superseded - Reference Only)
| File | Note |
|---|---|
| ReviewIQ-Architecture-v2.md | Superseded by v3.2 |
| ReviewIQ-Architecture-v3.md | Superseded by v3.2 |
| ReviewIQ-Architecture-v3.1.md | Superseded by v3.2 |
| CONTEXT-KEEPER.md | Use ReviewIQ-v32-Decisions.md instead |
What's Captured in Artifacts
| Context | Document |
|---|---|
| Key architectural decisions | ReviewIQ-v32-Decisions.md |
| Current implementation status (~55%) | ReviewIQ-Codebase-Overview.md |
| Existing file structure | ReviewIQ-Codebase-Overview.md |
| Integration points (where new code connects) | ReviewIQ-Codebase-Overview.md |
| Stage input/output contracts | ReviewIQ-Pipeline-Contracts-v1.md |
| Validation rules (35 total across stages) | ReviewIQ-Pipeline-Contracts-v1.md |
| Test fixtures (5 sample JSON payloads) | ReviewIQ-Pipeline-Contracts-v1.md |
| Implementation checklists | ReviewIQ-Pipeline-Checklist.md |
| Definition of done per stage | ReviewIQ-Pipeline-Checklist.md |
| LLM prompt specification | LLM-Classification-Contract-v1.md |
| URT taxonomy codes | URT-v5.1-Reference.md |
| Full database DDL | ReviewIQ-Architecture-v3.2.md |
| Taxonomy versioning schema | ReviewIQ-v3.2.1-Taxonomy-Versioning.md |
Pipeline Stages
| Stage | Name | Status | Contract Section | Validation Rules |
|---|---|---|---|---|
| 0 | Raw Ingestion | ✅ Done | Pipeline-Contracts § Stage 0 | V0.1-V0.5 |
| 1 | Normalization | ❌ TODO | Pipeline-Contracts § Stage 1 | V1.1-V1.6 |
| 2 | LLM Classification | ❌ TODO | Pipeline-Contracts § Stage 2 | V2.1-V2.12 |
| 3 | Issue Routing | ❌ TODO | Pipeline-Contracts § Stage 3 | V3.1-V3.5 |
| 4 | Fact Aggregation | ❌ TODO | Pipeline-Contracts § Stage 4 | V4.1-V4.7 |
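The rule ranges in the table above add up to the 35 validation rules quoted elsewhere in this guide. A minimal sketch of that bookkeeping follows; `STAGE_RULE_COUNTS`, `rule_ids`, and `ALL_RULE_IDS` are illustrative names, not part of the existing codebase, and the rule definitions themselves live in ReviewIQ-Pipeline-Contracts-v1.md.

```python
# Rule counts per stage, copied from the Pipeline Stages table.
# All names in this block are hypothetical helpers, not existing code.
STAGE_RULE_COUNTS = {0: 5, 1: 6, 2: 12, 3: 5, 4: 7}


def rule_ids(stage: int) -> list[str]:
    """Return the rule IDs for one stage, e.g. 'V2.1' through 'V2.12'."""
    return [f"V{stage}.{n}" for n in range(1, STAGE_RULE_COUNTS[stage] + 1)]


# Flat list of every rule ID, ordered by stage and then by rule number.
ALL_RULE_IDS = [rid for stage in sorted(STAGE_RULE_COUNTS) for rid in rule_ids(stage)]

if __name__ == "__main__":
    print(len(ALL_RULE_IDS))   # total rules across stages 0-4
    print(rule_ids(2)[-1])     # last Stage 2 rule
```

A list like this could be used to confirm that a validator reports on every rule ID and skips none.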
Parallel Development Assignment
Agent 1 - Stage 1 (Normalization)
Read:
- ReviewIQ-Pipeline-Contracts-v1.md § Stage 1
- ReviewIQ-Codebase-Overview.md (integration points)
Create:
- pipeline/stage1_normalize.py
- migrations/005_create_reviews_tables.sql
- pipeline/tests/test_stage1.py
Validate:
- V1.1-V1.6 rules pass
- Integration test: Stage 0 → Stage 1 passes
Agent 2 - Stage 2 (LLM Classification)
Read:
- ReviewIQ-Pipeline-Contracts-v1.md § Stage 2
- LLM-Classification-Contract-v1.md
- URT-v5.1-Reference.md
Create:
- pipeline/stage2_classify.py
- pipeline/llm_client.py
- pipeline/span_extractor.py
- migrations/006_create_spans_table.sql
- migrations/007_create_urt_enums.sql
- pipeline/tests/test_stage2.py
Validate:
- V2.1-V2.12 rules pass
- Integration test: Stage 1 → Stage 2 passes
Agent 3 - Stage 3 (Issue Routing)
Read:
- ReviewIQ-Pipeline-Contracts-v1.md § Stage 3
- ReviewIQ-Architecture-v3.2.md § Part 5 (issue lifecycle)
Create:
- pipeline/stage3_route.py
- pipeline/issue_manager.py
- migrations/008_create_issues_tables.sql
- pipeline/tests/test_stage3.py
Validate:
- V3.1-V3.5 rules pass
- Integration test: Stage 2 → Stage 3 passes
Agent 4 - Stage 4 (Fact Aggregation)
Read:
- ReviewIQ-Pipeline-Contracts-v1.md § Stage 4
- ReviewIQ-Architecture-v3.2.md § Part 6 (analytics)
Create:
- pipeline/stage4_aggregate.py
- migrations/009_create_facts_table.sql
- pipeline/tests/test_stage4.py
Validate:
- V4.1-V4.7 rules pass
- E2E pipeline test passes
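One way to track the "Create" lists above is a small script that reports which deliverables are still missing from a checkout. This is only a sketch: the file paths are copied from the assignments, while `DELIVERABLES` and `missing_deliverables` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Files each stage is expected to create, copied from the assignment lists.
DELIVERABLES = {
    1: [
        "pipeline/stage1_normalize.py",
        "migrations/005_create_reviews_tables.sql",
        "pipeline/tests/test_stage1.py",
    ],
    2: [
        "pipeline/stage2_classify.py",
        "pipeline/llm_client.py",
        "pipeline/span_extractor.py",
        "migrations/006_create_spans_table.sql",
        "migrations/007_create_urt_enums.sql",
        "pipeline/tests/test_stage2.py",
    ],
    3: [
        "pipeline/stage3_route.py",
        "pipeline/issue_manager.py",
        "migrations/008_create_issues_tables.sql",
        "pipeline/tests/test_stage3.py",
    ],
    4: [
        "pipeline/stage4_aggregate.py",
        "migrations/009_create_facts_table.sql",
        "pipeline/tests/test_stage4.py",
    ],
}


def missing_deliverables(repo_root: Path) -> dict[int, list[str]]:
    """Map each stage number to the deliverable paths not yet present on disk."""
    return {
        stage: [p for p in paths if not (repo_root / p).is_file()]
        for stage, paths in DELIVERABLES.items()
    }


if __name__ == "__main__":
    for stage, missing in missing_deliverables(Path(".")).items():
        print(f"Stage {stage}: {len(missing)} file(s) missing")
```

File presence is only a coarse signal; the validation rules and integration tests listed under each agent remain the actual completion criteria.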
Success Criteria
Pipeline is complete when:
python -m pipeline.validate --job-id <JOB_ID> --verbose
# Expected output:
Stage 0: ✅ PASS (5/5 rules)
Stage 1: ✅ PASS (6/6 rules)
Stage 2: ✅ PASS (12/12 rules)
Stage 3: ✅ PASS (5/5 rules)
Stage 4: ✅ PASS (7/7 rules)
E2E Integration: ✅ PASS
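Since `pipeline.validate` is not implemented yet, the following is a rough sketch of how per-stage results might be rendered in the format shown above. `StageResult` and `format_report` are hypothetical names, and the `❌ FAIL` wording is an assumption; this guide only specifies the passing output.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    """Outcome of running one stage's validation rules (hypothetical type)."""
    stage: int
    passed: int
    total: int

    @property
    def ok(self) -> bool:
        return self.passed == self.total


def format_report(results: list[StageResult], e2e_ok: bool) -> str:
    """Render results in the 'Stage N: ✅ PASS (x/y rules)' layout."""
    lines = []
    for r in results:
        # The failure label is an assumption; only the PASS form is documented.
        status = "✅ PASS" if r.ok else "❌ FAIL"
        lines.append(f"Stage {r.stage}: {status} ({r.passed}/{r.total} rules)")
    lines.append(f"E2E Integration: {'✅ PASS' if e2e_ok else '❌ FAIL'}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [StageResult(0, 5, 5), StageResult(1, 6, 6), StageResult(2, 12, 12),
            StageResult(3, 5, 5), StageResult(4, 7, 7)]
    print(format_report(demo, e2e_ok=True))
```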
Quick Commands
# Check current branch
git branch --show-current
# Expected: feature/platform-restructure
# View recent commits
git log --oneline -5
# Start database
docker-compose -f docker-compose.production.yml up -d postgres
# Run API server
python api_server_production.py
# Run frontend
cd frontend && npm run dev
# Run migrations (when created)
psql $DATABASE_URL -f migrations/005_create_reviews_tables.sql
# Run tests
pytest pipeline/tests/ -v
# Validate pipeline
python -m pipeline.validate --job-id <JOB_ID>
Environment Variables
# Database (required)
DATABASE_URL=postgresql://user:pass@localhost:5432/reviewiq
# LLM Provider (Stage 2)
OPENAI_API_KEY=sk-...
# OR
ANTHROPIC_API_KEY=sk-ant-...
# Embedding model (Stage 2)
EMBEDDING_MODEL=all-MiniLM-L6-v2
# Taxonomy version
DEFAULT_TAXONOMY_VERSION=v5.1
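A minimal sketch of reading these variables at startup is shown below. `PipelineSettings` and `load_settings` are hypothetical names. Two behaviors are assumptions rather than decisions recorded in this guide: OpenAI is preferred when both keys are set, and the example values above are used as defaults for the embedding model and taxonomy version.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PipelineSettings:
    """Configuration for the enrichment pipeline (hypothetical container)."""
    database_url: str
    llm_provider: str       # "openai" or "anthropic"
    llm_api_key: str
    embedding_model: str
    taxonomy_version: str


def load_settings(env: Mapping[str, str] = os.environ) -> PipelineSettings:
    """Build settings from environment variables, failing fast on missing ones."""
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise RuntimeError("DATABASE_URL is required")

    # Assumption: OpenAI takes precedence if both provider keys are present.
    if env.get("OPENAI_API_KEY"):
        provider, key = "openai", env["OPENAI_API_KEY"]
    elif env.get("ANTHROPIC_API_KEY"):
        provider, key = "anthropic", env["ANTHROPIC_API_KEY"]
    else:
        raise RuntimeError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY for Stage 2")

    return PipelineSettings(
        database_url=database_url,
        llm_provider=provider,
        llm_api_key=key,
        # Assumption: fall back to the example values listed in this guide.
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        taxonomy_version=env.get("DEFAULT_TAXONOMY_VERSION", "v5.1"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests without touching the real environment.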
File Structure After Implementation
google-reviews-scraper-pro/
├── .artifacts/                          # ← Design documents
│   ├── ReviewIQ-Pipeline-DevGuide.md    # ← START HERE (for pipeline work)
│   ├── ReviewIQ-v32-Decisions.md
│   ├── ReviewIQ-Codebase-Overview.md
│   ├── ReviewIQ-Pipeline-Contracts-v1.md
│   ├── ReviewIQ-Pipeline-Checklist.md
│   └── ...
│
├── api_server_production.py             # ✅ Exists - Main API
├── core/database.py                     # ✅ Exists - DB layer
├── scrapers/google_reviews/             # ✅ Exists - Scraper
│
├── pipeline/                            # ❌ TO CREATE
│   ├── stage1_normalize.py
│   ├── stage2_classify.py
│   ├── stage3_route.py
│   ├── stage4_aggregate.py
│   ├── llm_client.py
│   ├── span_extractor.py
│   ├── issue_manager.py
│   └── tests/
│
└── migrations/
    ├── 001-004                          # ✅ Exists
    └── 005-009                          # ❌ TO CREATE
Keep this guide updated when adding new artifacts or completing stages.