docs: Add pipeline development artifacts for parallel implementation

New artifacts:
- ReviewIQ-Pipeline-DevGuide.md: Entry point for pipeline work
- ReviewIQ-Pipeline-Contracts-v1.md: Stage I/O specs, validation rules, test fixtures
- ReviewIQ-Pipeline-Checklist.md: Per-stage implementation checklists
- ReviewIQ-Codebase-Overview.md: File structure, integration points
- ReviewIQ-v3.2.1-Taxonomy-Versioning.md: Taxonomy versioning addendum

Updated:
- ReviewIQ-v32-Decisions.md: Added B2 audit findings, taxonomy versioning decisions, pipeline status

These artifacts enable parallel development of pipeline stages 1-4 with:
- Independent validation (35 rules across stages)
- Clear input/output contracts
- Test fixtures for each stage
- Definition of done criteria

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@@ -7,9 +7,18 @@
## 1. Markpoint

```
ID: reviewiq-v32-span-layer-2026-01-24-001
Status: v3.2 span layer complete
Based on: v3.1.2 (commit f998277)
ID: reviewiq-v32-span-layer-2026-01-24-004
Status: Pipeline contracts defined, ready for parallel implementation
Based on: v3.2 (commit 43fd151)

START HERE: ReviewIQ-Pipeline-DevGuide.md (for pipeline implementation)

Key Documents:
- ReviewIQ-Pipeline-DevGuide.md (entry point for pipeline work)
- ReviewIQ-Codebase-Overview.md (file structure, what exists)
- ReviewIQ-Pipeline-Contracts-v1.md (stage I/O contracts, validation)
- ReviewIQ-Pipeline-Checklist.md (implementation checklist)
- ReviewIQ-v3.2.1-Taxonomy-Versioning.md (taxonomy versioning spec)
```

---
@@ -152,6 +161,98 @@ Full: URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
| Offsets nullable for LLM-inferred? | **No**: required, NOT NULL |
| Reprocessing strategy? | **Soft-switch** with is_active flag |
| TEXT vs ENUM for dimensions? | **ENUMs**: committed to Postgres |
| Taxonomy evolution tracking? | **Yes**: versioned codes with explicit mappings (v3.2.1) |
| B2 schema vs v3.2 divergence? | **Documented**: B2 is canonical URT, v3.2 is app layer |
| Taxonomy versioning? | **Yes**: `taxonomy_version` column on spans, versioned code tables |
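
The soft-switch reprocessing decision can be illustrated with a minimal in-memory sketch. It assumes that a reprocessing run writes new span rows alongside the old ones and that switching only flips `is_active`, never deleting rows. The `Span` class, the `run_id` field, and the `soft_switch` helper are hypothetical names for illustration, not part of the v3.2 schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    # Hypothetical, simplified stand-in for a row in the span table.
    review_id: str
    run_id: str      # assumed identifier of the classification run
    code: str
    is_active: bool


def soft_switch(spans: List[Span], review_id: str, new_run_id: str) -> None:
    """Activate the spans of new_run_id for one review; deactivate its other runs.

    Rows are never removed, so the previous classification stays auditable.
    """
    for span in spans:
        if span.review_id == review_id:
            span.is_active = span.run_id == new_run_id


spans = [
    Span("r1", "run1", "J1.01", True),
    Span("r1", "run2", "J1.01", False),  # written by the reprocessing run
    Span("r2", "run1", "J1.01", True),   # other review, must stay untouched
]
soft_switch(spans, "r1", "run2")
```

In a database the same switch would presumably be a single transactional update, so readers filtering on `is_active` never see a half-switched review.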

---

## 13. B2 Schema Audit Findings

**Audit Date**: 2026-01-24

The B2-database-schema.sql (canonical URT v5.1) and ReviewIQ v3.2 spec have deliberate divergences:

| Aspect | B2 (URT v5.1) | v3.2 (ReviewIQ) | Resolution |
|--------|---------------|-----------------|------------|
| Purpose | Source-agnostic taxonomy | Google Reviews app layer | Keep both |
| ID strategy | UUIDs + sequential | Deterministic SHA256 | v3.2 choice |
| Type safety | VARCHAR + CHECK | Postgres ENUMs | v3.2 choice |
| Span table | `spans` | `review_spans` | v3.2 naming |
| Offset columns | `char_start/char_end` | `span_start/span_end` | Document divergence |
| Tenant model | Single-tenant | Multi-tenant (business_id) | v3.2 requirement |
| Issue-span mapping | Many-to-many | One-to-one | v3.2 choice |
| Causal chain | Normalized table | JSONB column | v3.2 flexibility |
| Reprocessing | Not supported | Soft-switch pattern | v3.2 innovation |

**Action Items**:
1. Import reference data (domains, categories, subcodes) from B2 INSERTs
2. Seed `urt_codes` / `urt_codes_versioned` from B1-urt-codes.yaml
3. Do NOT adopt B2 structure directly; v3.2 has specific app requirements
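
To make the "Deterministic SHA256" ID strategy from the table above concrete, here is a minimal sketch. The choice of input fields (`business_id`, `review_id`, `span_start`, `span_end`), the separator, and the function name `make_span_id` are assumptions for illustration; the actual key composition is defined in the v3.2 spec, not here.

```python
import hashlib


def make_span_id(business_id: str, review_id: str,
                 span_start: int, span_end: int) -> str:
    """Derive a stable span ID from the fields assumed to identify a span.

    The same inputs always hash to the same ID, so re-running ingestion
    does not mint new identifiers for unchanged spans.
    """
    # The unit-separator character keeps adjacent fields from running together.
    key = "\x1f".join([business_id, review_id, str(span_start), str(span_end)])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


first = make_span_id("biz-1", "rev-9", 0, 12)
again = make_span_id("biz-1", "rev-9", 0, 12)
shifted = make_span_id("biz-1", "rev-9", 0, 13)
```

Unlike the UUID and sequential IDs used by B2, an ID built this way can be recomputed from the data itself, which is what makes it deterministic.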

---

## 14. Taxonomy Versioning (v3.2.1)

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Track taxonomy version | Required column on spans | Classifications only meaningful in version context |
| Version ID format | `v{major}.{minor}` | Human-readable, matches URT releases |
| Code FK strategy | Composite `(code, version_id)` | Prevents orphaned classifications |
| Cross-version mappings | Explicit mapping table | Enables normalized trend queries |
| Mapping direction | Forward only (old→new) | Simpler model, matches time flow |
| Default version | `'v5.1'` hardcoded | Safe baseline, explicit upgrade path |
| Fact table versioning | Per-row `taxonomy_version` | Enables version-specific aggregation |

**Key Tables Added**:
- `urt_taxonomy_versions`: Version registry with validity periods
- `urt_codes_versioned`: Full code definitions per version (SCD Type 2)
- `urt_code_mappings`: Cross-version translation rules

**Key Functions Added**:
- `translate_urt_code(code, from_version, to_version)`: Single code translation
- `get_code_lineage(code, version)`: Full historical lineage
- `detect_taxonomy_drift(from_version, to_version)`: Impact analysis
- `aggregate_spans_normalized(...)`: Version-normalized aggregation

**Principle**: Facts are immutable. A span classified as `J1.01` in v5.1 stays that way forever. Translation is explicit and auditable.

See: `.artifacts/ReviewIQ-v3.2.1-Taxonomy-Versioning.md`
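
The forward-only mapping rule can be sketched in a few lines of Python. This is an illustration of the behaviour described above, not the actual `translate_urt_code` function from the addendum. The mapping rows, the target codes `J1.02` and `K3.01`, and the versions `v5.2` and `v6.0` are made up for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical mapping rows: (code, from_version) -> (code, to_version).
# Only old-to-new entries exist, mirroring the forward-only decision.
CODE_MAPPINGS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("J1.01", "v5.1"): ("J1.02", "v5.2"),
    ("J1.02", "v5.2"): ("K3.01", "v6.0"),
}


def translate_urt_code(code: str, from_version: str,
                       to_version: str) -> Optional[str]:
    """Follow explicit mappings forward until to_version is reached.

    Returns None when no chain of mappings leads to to_version, which
    includes every backward (new to old) request.
    """
    current_code, current_version = code, from_version
    seen = set()
    while current_version != to_version:
        key = (current_code, current_version)
        if key in seen or key not in CODE_MAPPINGS:
            return None  # no explicit mapping, so no silent guess
        seen.add(key)
        current_code, current_version = CODE_MAPPINGS[key]
    return current_code


forward = translate_urt_code("J1.01", "v5.1", "v6.0")
same = translate_urt_code("J1.01", "v5.1", "v5.1")
backward = translate_urt_code("K3.01", "v6.0", "v5.1")
```

The stored span keeps its original code and version; translation produces a derived value at query time, which keeps the fact itself immutable.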

---

## 15. Pipeline Implementation Status

**Overall: ~55% Complete** (as of 2026-01-24)

| Stage | Name | Status | Owner |
|-------|------|--------|-------|
| 0 | Raw Ingestion | ✅ DONE | Scraper Team |
| 1 | Normalization | ❌ TODO | TBD |
| 2 | LLM Classification | ❌ TODO | TBD |
| 3 | Issue Routing | ❌ TODO | TBD |
| 4 | Fact Aggregation | ❌ TODO | TBD |

**What's Working**:
- Google Maps scraping (v1.0.0)
- Job orchestration & queuing
- Webhook delivery
- Frontend job management
- Real-time SSE streaming

**What's Missing**:
- Entire enrichment pipeline (Stages 1-4)
- LLM integration
- Span extraction
- Issue routing
- Analytics aggregation

**Parallel Development**:
Each stage can be implemented independently using the contracts defined in:
- `ReviewIQ-Pipeline-Contracts-v1.md`: Full I/O specs, validation rules, test fixtures
- `ReviewIQ-Pipeline-Checklist.md`: Implementation checklist, definition of done
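
As an illustration of what an independently testable validation rule might look like, the sketch below checks the one span constraint this document states directly: offsets are required, never NULL. The function name, the error strings, and the end-exclusive bounds check are assumptions; the authoritative rules live in `ReviewIQ-Pipeline-Contracts-v1.md`.

```python
from typing import List, Optional


def validate_span_offsets(text: str, span_start: Optional[int],
                          span_end: Optional[int]) -> List[str]:
    """Return a list of rule violations for one span; empty means valid."""
    errors: List[str] = []
    if span_start is None or span_end is None:
        # Mirrors the "required, NOT NULL" decision for offsets.
        errors.append("offsets are required")
        return errors
    # Assumed convention: span_end is exclusive, as in Python slicing.
    if not (0 <= span_start < span_end <= len(text)):
        errors.append("offsets out of bounds")
    return errors


review = "Great food, slow service"
ok = validate_span_offsets(review, 0, 10)
missing = validate_span_offsets(review, None, 10)
inverted = validate_span_offsets(review, 12, 5)
```

Because the check takes plain values and returns plain strings, each stage can run it against its fixtures without depending on any other stage.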

**Estimated Effort to 100%**: 6-8 weeks

---

@@ -180,4 +281,4 @@ GREATEST(0.2, base_trust * modifiers) -- Floor prevents collapse

---

*Last updated: 2026-01-24*
*Last updated: 2026-01-24 (pipeline contracts + codebase overview)*