# ReviewIQ Pipeline Development Guide

**Purpose**: Entry point for agents implementing the enrichment pipeline
**Last Updated**: 2026-01-24

---

## TL;DR - Current State

**Pipeline Implementation: ~55% complete**

```
✅ WORKING                    ❌ NOT IMPLEMENTED
──────────                    ──────────────────
Google Maps scraping          Stage 1: Normalization
Job orchestration             Stage 2: LLM Classification
Chrome worker pool            Stage 3: Issue Routing
Webhook delivery              Stage 4: Fact Aggregation
SSE streaming                 Enrichment database schema
Frontend (job management)     Advanced analytics UI
```

**Estimated effort to 100%**: 6-8 weeks

---

## Cold Start Instructions

A new agent should work through the following steps in order:

| Step | Action | Time |
|------|--------|------|
| 1 | Read this file (`ReviewIQ-Pipeline-DevGuide.md`) | 2 min |
| 2 | Read `ReviewIQ-v32-Decisions.md` | 5 min |
| 3 | Read `ReviewIQ-Codebase-Overview.md` | 10 min |
| 4 | Read assigned stage in `ReviewIQ-Pipeline-Contracts-v1.md` | 15 min |
| 5 | Use `ReviewIQ-Pipeline-Checklist.md` to verify completion | Reference |

---

## Document Map

```
                       ┌─────────────────────────────────────┐
                       │    ReviewIQ-Pipeline-DevGuide.md    │
                       │           (YOU ARE HERE)            │
                       └─────────────────┬───────────────────┘
                                         │
           ┌─────────────────────────────┼─────────────────────────────┐
           │                             │                             │
           ▼                             ▼                             ▼
┌─────────────────────┐     ┌─────────────────────────┐     ┌─────────────────────┐
│  CONTEXT RECOVERY   │     │     IMPLEMENTATION      │     │      REFERENCE      │
├─────────────────────┤     ├─────────────────────────┤     ├─────────────────────┤
│                     │     │                         │     │                     │
│ ReviewIQ-v32-       │     │ Pipeline-Contracts-v1   │     │ Architecture-v3.2   │
│ Decisions.md        │     │ (I/O specs, validation) │     │ (full DDL spec)     │
│ (key decisions,     │     │                         │     │                     │
│  markpoint)         │     │ Pipeline-Checklist      │     │ v3.2.1-Taxonomy-    │
│                     │     │ (implementation tasks)  │     │ Versioning          │
│ Codebase-Overview   │     │                         │     │ (versioning spec)   │
│ (file structure,    │     │ LLM-Classification-     │     │                     │
│  integration points)│     │ Contract-v1             │     │ URT-v5.1-Reference  │
│                     │     │ (prompt engineering)    │     │ (dimension codes)   │
└─────────────────────┘     └─────────────────────────┘     └─────────────────────┘
```

---

## Core Documents

### Context & Status (Read First)

| File | Purpose | Est. Read Time |
|------|---------|----------------|
| `ReviewIQ-Pipeline-DevGuide.md` | Entry point, document map | 2 min |
| `ReviewIQ-v32-Decisions.md` | Key decisions, current markpoint | 5 min |
| `ReviewIQ-Codebase-Overview.md` | File structure, what code exists, integration points | 10 min |

### Implementation Guides (For Building)

| File | Purpose | Est. Read Time |
|------|---------|----------------|
| `ReviewIQ-Pipeline-Contracts-v1.md` | Stage I/O specs, validation rules, test fixtures | 15 min |
| `ReviewIQ-Pipeline-Checklist.md` | Per-stage implementation checklist, definition of done | 5 min |
| `LLM-Classification-Contract-v1.md` | LLM prompt engineering spec (Stage 2) | 10 min |

### Full Specifications (Reference)

| File | Purpose | When to Read |
|------|---------|--------------|
| `ReviewIQ-Architecture-v3.2.md` | Complete v3.2 spec with DDL | Schema details |
| `ReviewIQ-v3.2.1-Taxonomy-Versioning.md` | Taxonomy versioning addendum | Future-proofing |
| `URT-v5.1-Reference.md` | URT dimension codes reference | Classification reference |

### Legacy (Superseded - Reference Only)

| File | Note |
|------|------|
| `ReviewIQ-Architecture-v2.md` | Superseded by v3.2 |
| `ReviewIQ-Architecture-v3.md` | Superseded by v3.2 |
| `ReviewIQ-Architecture-v3.1.md` | Superseded by v3.2 |
| `CONTEXT-KEEPER.md` | Use `ReviewIQ-v32-Decisions.md` instead |

---

## What's Captured in Artifacts

| Context | Document |
|---------|----------|
| Key architectural decisions | `ReviewIQ-v32-Decisions.md` |
| Current implementation status (~55%) | `ReviewIQ-Codebase-Overview.md` |
| Existing file structure | `ReviewIQ-Codebase-Overview.md` |
| Integration points (where new code connects) | `ReviewIQ-Codebase-Overview.md` |
| Stage input/output contracts | `ReviewIQ-Pipeline-Contracts-v1.md` |
| Validation rules (35 total across stages) | `ReviewIQ-Pipeline-Contracts-v1.md` |
| Test fixtures (5 sample JSON payloads) | `ReviewIQ-Pipeline-Contracts-v1.md` |
| Implementation checklists | `ReviewIQ-Pipeline-Checklist.md` |
| Definition of done per stage | `ReviewIQ-Pipeline-Checklist.md` |
| LLM prompt specification | `LLM-Classification-Contract-v1.md` |
| URT taxonomy codes | `URT-v5.1-Reference.md` |
| Full database DDL | `ReviewIQ-Architecture-v3.2.md` |
| Taxonomy versioning schema | `ReviewIQ-v3.2.1-Taxonomy-Versioning.md` |

---

## Pipeline Stages

| Stage | Name | Status | Contract Section | Validation Rules |
|-------|------|--------|------------------|------------------|
| 0 | Raw Ingestion | ✅ Done | Pipeline-Contracts § Stage 0 | V0.1-V0.5 |
| 1 | Normalization | ❌ TODO | Pipeline-Contracts § Stage 1 | V1.1-V1.6 |
| 2 | LLM Classification | ❌ TODO | Pipeline-Contracts § Stage 2 | V2.1-V2.12 |
| 3 | Issue Routing | ❌ TODO | Pipeline-Contracts § Stage 3 | V3.1-V3.5 |
| 4 | Fact Aggregation | ❌ TODO | Pipeline-Contracts § Stage 4 | V4.1-V4.7 |

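
The rule ranges in the table can be cross-checked against the "35 total" figure quoted elsewhere in this guide. The sketch below is only an illustration: the names `RULES_PER_STAGE`, `rule_ids`, and `ALL_RULE_IDS` are hypothetical and do not come from the contracts document, which remains the source of truth for what each rule checks.

```python
# Hypothetical sketch: rule counts per stage, copied from the table above
# (V0.1-V0.5, V1.1-V1.6, V2.1-V2.12, V3.1-V3.5, V4.1-V4.7).
RULES_PER_STAGE = {0: 5, 1: 6, 2: 12, 3: 5, 4: 7}


def rule_ids(stage: int) -> list[str]:
    """Expand a stage number into its rule IDs, e.g. stage 1 -> V1.1 .. V1.6."""
    return [f"V{stage}.{n}" for n in range(1, RULES_PER_STAGE[stage] + 1)]


# Every rule ID across stages 0-4, in stage order.
ALL_RULE_IDS = [rid for stage in sorted(RULES_PER_STAGE) for rid in rule_ids(stage)]

if __name__ == "__main__":
    print(len(ALL_RULE_IDS))  # 5 + 6 + 12 + 5 + 7
    print(rule_ids(1))
```

The five ranges sum to 35 rules, which matches the count given for `ReviewIQ-Pipeline-Contracts-v1.md`.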
---

## Parallel Development Assignment

### Agent 1 - Stage 1 (Normalization)
```
Read:
- ReviewIQ-Pipeline-Contracts-v1.md § Stage 1
- ReviewIQ-Codebase-Overview.md (integration points)

Create:
- pipeline/stage1_normalize.py
- migrations/005_create_reviews_tables.sql
- pipeline/tests/test_stage1.py

Validate:
- V1.1-V1.6 rules pass
- Integration test: Stage 0 → Stage 1 passes
```

### Agent 2 - Stage 2 (LLM Classification)
```
Read:
- ReviewIQ-Pipeline-Contracts-v1.md § Stage 2
- LLM-Classification-Contract-v1.md
- URT-v5.1-Reference.md

Create:
- pipeline/stage2_classify.py
- pipeline/llm_client.py
- pipeline/span_extractor.py
- migrations/006_create_spans_table.sql
- migrations/007_create_urt_enums.sql
- pipeline/tests/test_stage2.py

Validate:
- V2.1-V2.12 rules pass
- Integration test: Stage 1 → Stage 2 passes
```

### Agent 3 - Stage 3 (Issue Routing)
```
Read:
- ReviewIQ-Pipeline-Contracts-v1.md § Stage 3
- ReviewIQ-Architecture-v3.2.md § Part 5 (issue lifecycle)

Create:
- pipeline/stage3_route.py
- pipeline/issue_manager.py
- migrations/008_create_issues_tables.sql
- pipeline/tests/test_stage3.py

Validate:
- V3.1-V3.5 rules pass
- Integration test: Stage 2 → Stage 3 passes
```

### Agent 4 - Stage 4 (Fact Aggregation)
```
Read:
- ReviewIQ-Pipeline-Contracts-v1.md § Stage 4
- ReviewIQ-Architecture-v3.2.md § Part 6 (analytics)

Create:
- pipeline/stage4_aggregate.py
- migrations/009_create_facts_table.sql
- pipeline/tests/test_stage4.py

Validate:
- V4.1-V4.7 rules pass
- E2E pipeline test passes
```

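
The "Create" lists above can double as a quick progress check for each agent. The following is a minimal sketch, assuming the paths are relative to the repository root. The `DELIVERABLES` mapping and the `missing_deliverables` helper are hypothetical names introduced here for illustration, not part of the existing codebase.

```python
from pathlib import Path

# Files each agent is asked to create, copied from the assignments above.
DELIVERABLES = {
    1: [
        "pipeline/stage1_normalize.py",
        "migrations/005_create_reviews_tables.sql",
        "pipeline/tests/test_stage1.py",
    ],
    2: [
        "pipeline/stage2_classify.py",
        "pipeline/llm_client.py",
        "pipeline/span_extractor.py",
        "migrations/006_create_spans_table.sql",
        "migrations/007_create_urt_enums.sql",
        "pipeline/tests/test_stage2.py",
    ],
    3: [
        "pipeline/stage3_route.py",
        "pipeline/issue_manager.py",
        "migrations/008_create_issues_tables.sql",
        "pipeline/tests/test_stage3.py",
    ],
    4: [
        "pipeline/stage4_aggregate.py",
        "migrations/009_create_facts_table.sql",
        "pipeline/tests/test_stage4.py",
    ],
}


def missing_deliverables(root: Path, stage: int) -> list[str]:
    """Return the listed files for a stage that do not exist under root yet."""
    return [rel for rel in DELIVERABLES[stage] if not (root / rel).is_file()]


if __name__ == "__main__":
    for stage in sorted(DELIVERABLES):
        print(stage, missing_deliverables(Path("."), stage))
```

An empty list only means the files exist. The validation rules and integration tests listed under "Validate" still decide whether a stage is done.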
---

## Success Criteria

The pipeline is complete when the validator reports a pass for every stage:

```bash
python -m pipeline.validate --job-id <JOB_ID> --verbose

# Expected output:
# Stage 0: ✅ PASS (5/5 rules)
# Stage 1: ✅ PASS (6/6 rules)
# Stage 2: ✅ PASS (12/12 rules)
# Stage 3: ✅ PASS (5/5 rules)
# Stage 4: ✅ PASS (7/7 rules)
# E2E Integration: ✅ PASS
```

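
Since `pipeline.validate` is not implemented yet, the report format above is a target rather than existing behavior. As one possible starting point, the sketch below shows how per-stage results could be rendered into those lines. `StageResult`, `format_report`, and `pipeline_complete` are hypothetical names, and the E2E integration line is left out because this guide does not say how that check is computed.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    """Outcome of running one stage's validation rules (hypothetical shape)."""
    stage: int
    passed: int
    total: int

    @property
    def ok(self) -> bool:
        # A stage passes only when every one of its rules passes.
        return self.passed == self.total


def format_report(results: list[StageResult]) -> list[str]:
    """Render one report line per stage in the format shown above."""
    lines = []
    for r in results:
        status = "✅ PASS" if r.ok else "❌ FAIL"
        lines.append(f"Stage {r.stage}: {status} ({r.passed}/{r.total} rules)")
    return lines


def pipeline_complete(results: list[StageResult]) -> bool:
    """True when every stage reports all of its rules passing."""
    return all(r.ok for r in results)


if __name__ == "__main__":
    results = [StageResult(0, 5, 5), StageResult(1, 4, 6)]
    print("\n".join(format_report(results)))
    print(pipeline_complete(results))
```

The `❌ FAIL` wording for a failing stage is an assumption; the guide only shows the passing case.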
---

## Quick Commands

```bash
# Check current branch
git branch --show-current
# Expected: feature/platform-restructure

# View recent commits
git log --oneline -5

# Start database
docker-compose -f docker-compose.production.yml up -d postgres

# Run API server
python api_server_production.py

# Run frontend
cd frontend && npm run dev

# Run migrations (when created)
psql $DATABASE_URL -f migrations/005_create_reviews_tables.sql

# Validate pipeline
python -m pipeline.validate --job-id <JOB_ID>

# Run tests
pytest pipeline/tests/ -v
```

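
The migration command above covers only `005`. Once migrations 006-009 exist, they will presumably need to be applied in numeric order. The sketch below builds the corresponding `psql` invocations without running them. The file names come from the agent assignments in this guide, while `MIGRATIONS`, `migration_commands`, and the ordering by numeric prefix are assumptions made for illustration.

```python
# Migration files named in the agent assignments (005-009).
MIGRATIONS = [
    "005_create_reviews_tables.sql",
    "006_create_spans_table.sql",
    "007_create_urt_enums.sql",
    "008_create_issues_tables.sql",
    "009_create_facts_table.sql",
]


def migration_commands(database_url: str, names: list[str] = MIGRATIONS) -> list[list[str]]:
    """Build one `psql <url> -f migrations/<file>` argument list per migration.

    Files are ordered by their numeric prefix (assumed to define apply order).
    """
    ordered = sorted(names, key=lambda name: int(name.split("_", 1)[0]))
    return [["psql", database_url, "-f", f"migrations/{name}"] for name in ordered]


if __name__ == "__main__":
    for cmd in migration_commands("postgresql://user:pass@localhost:5432/reviewiq"):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` when the files are in place. Returning lists rather than shell strings avoids quoting problems with the connection URL.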
---

## Environment Variables

```bash
# Database (required)
DATABASE_URL=postgresql://user:pass@localhost:5432/reviewiq

# LLM Provider (Stage 2)
OPENAI_API_KEY=sk-...
# OR
ANTHROPIC_API_KEY=sk-ant-...

# Embedding model (Stage 2)
EMBEDDING_MODEL=all-MiniLM-L6-v2

# Taxonomy version
DEFAULT_TAXONOMY_VERSION=v5.1
```

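
A pipeline module might read these variables once at startup and fail early when something required is missing. The sketch below is one way to do that, not the project's actual loader: `PipelineConfig` and `load_config` are hypothetical names. Using the sample values above as defaults, preferring OpenAI when both keys are set, and requiring an LLM key even outside Stage 2 are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelineConfig:
    database_url: str
    llm_provider: str      # "openai" or "anthropic"
    llm_api_key: str
    embedding_model: str
    taxonomy_version: str


def load_config(env: Optional[Mapping[str, str]] = None) -> PipelineConfig:
    """Build a config from environment variables, raising on missing values."""
    env = os.environ if env is None else env

    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise ValueError("DATABASE_URL is required")

    # Exactly one provider key is needed; OpenAI wins if both are set (assumption).
    if env.get("OPENAI_API_KEY"):
        provider, api_key = "openai", env["OPENAI_API_KEY"]
    elif env.get("ANTHROPIC_API_KEY"):
        provider, api_key = "anthropic", env["ANTHROPIC_API_KEY"]
    else:
        raise ValueError("Set OPENAI_API_KEY or ANTHROPIC_API_KEY")

    return PipelineConfig(
        database_url=database_url,
        llm_provider=provider,
        llm_api_key=api_key,
        # Defaults mirror the sample values listed above (assumption).
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        taxonomy_version=env.get("DEFAULT_TAXONOMY_VERSION", "v5.1"),
    )
```

Passing a plain dict as `env` keeps the loader easy to test without touching the real process environment.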
---

## File Structure After Implementation

```
google-reviews-scraper-pro/
├── .artifacts/                          # ← Design documents
│   ├── ReviewIQ-Pipeline-DevGuide.md    # ← START HERE (for pipeline work)
│   ├── ReviewIQ-v32-Decisions.md
│   ├── ReviewIQ-Codebase-Overview.md
│   ├── ReviewIQ-Pipeline-Contracts-v1.md
│   ├── ReviewIQ-Pipeline-Checklist.md
│   └── ...
│
├── api_server_production.py             # ✅ Exists - Main API
├── core/database.py                     # ✅ Exists - DB layer
├── scrapers/google_reviews/             # ✅ Exists - Scraper
│
├── pipeline/                            # ❌ TO CREATE
│   ├── stage1_normalize.py
│   ├── stage2_classify.py
│   ├── stage3_route.py
│   ├── stage4_aggregate.py
│   ├── llm_client.py
│   ├── span_extractor.py
│   ├── issue_manager.py
│   └── tests/
│
└── migrations/
    ├── 001-004                          # ✅ Exists
    └── 005-009                          # ❌ TO CREATE
```

---

*Keep this guide updated when adding new artifacts or completing stages.*