feat: Add reviewiq-pipeline package for LLM-powered review classification

Implement a standalone Python package for processing customer reviews through a 4-stage pipeline using URT (Universal Review Taxonomy) v5.1:

- Stage 1: Normalization (text cleaning, language detection, deduplication)
- Stage 2: LLM Classification (OpenAI/Anthropic span extraction with URT codes)
- Stage 3: Issue Routing (deterministic issue ID generation, span linking)
- Stage 4: Fact Aggregation (time series metrics for dashboards)

Package includes:

- TypedDict contracts matching Pipeline-Contracts-v1.md
- Async database layer with asyncpg and 5 SQL migrations
- LLM client abstraction supporting both OpenAI and Anthropic
- Sentence-transformers integration for embeddings
- Validation rules V1.x through V4.x
- CLI commands: migrate, run, validate, check
- 55 unit and integration tests (all passing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 18:07:11 +00:00
parent b780a23b66
commit 7d720f5378
34 changed files with 7222 additions and 0 deletions
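
Note on Stage 3: the commit message mentions deterministic issue ID generation but does not say how the ID is derived. One common approach is to hash the fields that define an issue, so that re-running the pipeline on the same input yields the same ID. The sketch below is only an illustration of that idea: the function name `make_issue_id`, the choice of fields (business ID, URT code, taxonomy version), the `issue_` prefix, and the code value `X1.01` are all assumptions and are not taken from the package.

```python
import hashlib


def make_issue_id(business_id: str, urt_code: str, taxonomy_version: str = "v5.1") -> str:
    """Derive a stable issue ID from the fields assumed to define an issue.

    Hypothetical helper: the real package may use different fields or a
    different hashing scheme.
    """
    # Normalize inputs so stray whitespace or letter case does not change the ID.
    parts = [business_id.strip(), urt_code.strip().upper(), taxonomy_version.strip()]
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc") differ.
    key = "\x1f".join(parts)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Truncate for readability; 16 hex characters keep 64 bits of the hash.
    return f"issue_{digest[:16]}"


if __name__ == "__main__":
    # "X1.01" is a made-up placeholder, not a real URT code.
    print(make_issue_id("biz-123", "X1.01"))
```

Because the ID depends only on its inputs, two spans with the same business and code map to the same issue without a database lookup, which is what makes the routing step repeatable.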
--- a/packages/reviewiq-pipeline/README.md
+++ b/packages/reviewiq-pipeline/README.md
@@ -0,0 +1,97 @@
+# ReviewIQ Pipeline
+
+LLM-powered review classification and analysis pipeline using URT (Universal Review Taxonomy) v5.1.
+
+## Features
+
+- **Stage 1: Normalization** - Text cleaning, language detection, deduplication
+- **Stage 2: LLM Classification** - Span extraction with URT codes using OpenAI/Anthropic
+- **Stage 3: Issue Routing** - Route negative spans to issues for tracking
+- **Stage 4: Fact Aggregation** - Pre-aggregate metrics for dashboard queries
+
+## Installation
+
+```bash
+pip install reviewiq-pipeline
+```
+
+Or install from source:
+
+```bash
+pip install -e packages/reviewiq-pipeline
+```
+
+## Quick Start
+
+### Python API
+
+```python
+from reviewiq_pipeline import Pipeline, Config
+
+# Initialize; the await calls below must run inside a coroutine (for example, one started with asyncio.run)
+config = Config(
+    database_url="postgresql://...",
+    llm_provider="openai",
+    llm_api_key="sk-...",
+    taxonomy_version="v5.1"
+)
+pipeline = Pipeline(config)
+
+# Run full pipeline
+result = await pipeline.process(scraper_output)
+
+# Or run individual stages
+stage1_result = await pipeline.normalize(scraper_output)
+stage2_result = await pipeline.classify(stage1_result)
+stage3_result = await pipeline.route(stage2_result)
+stage4_result = await pipeline.aggregate(business_id, date)
+
+# Validate
+validation = await pipeline.validate(job_id)
+```
+
+### CLI
+
+```bash
+# Run migrations
+reviewiq-pipeline migrate --database-url $DATABASE_URL
+
+# Process a job
+reviewiq-pipeline run --job-id <UUID> --stages 1,2,3,4
+
+# Validate pipeline output
+reviewiq-pipeline validate --job-id <UUID>
+```
+
+## Configuration
+
+Environment variables:
+
+- `DATABASE_URL` - PostgreSQL connection string
+- `LLM_PROVIDER` - `openai` or `anthropic`
+- `OPENAI_API_KEY` - OpenAI API key (if using OpenAI)
+- `ANTHROPIC_API_KEY` - Anthropic API key (if using Anthropic)
+- `TAXONOMY_VERSION` - URT taxonomy version (default: `v5.1`)
+
+## Development
+
+```bash
+# Install with dev dependencies
+pip install -e "packages/reviewiq-pipeline[dev]"
+
+# Run tests
+pytest
+
+# Run with coverage
+pytest --cov=reviewiq_pipeline
+
+# Type checking
+mypy src/reviewiq_pipeline
+
+# Linting
+ruff check src/reviewiq_pipeline
+```
+
+## License
+
+MIT
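
Note on the README's Configuration section: the environment variables it lists could be collected into a settings object roughly as follows. This is a sketch under stated assumptions, not the package's own `Config` loader: `EnvSettings` and `load_settings` are hypothetical names, and treating `LLM_PROVIDER` as required (with no default) is a guess, since the README only documents a default for `TAXONOMY_VERSION`.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container for the variables named in the README."""

    database_url: str
    llm_provider: str
    llm_api_key: str
    taxonomy_version: str


# Which API key variable belongs to which provider, per the README list.
_API_KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def load_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Read settings from a mapping of environment variables."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in _API_KEY_VARS:
        raise ValueError("LLM_PROVIDER must be 'openai' or 'anthropic'")

    # Only the key for the selected provider is required.
    key_var = _API_KEY_VARS[provider]
    missing = [name for name in ("DATABASE_URL", key_var) if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")

    return EnvSettings(
        database_url=env["DATABASE_URL"],
        llm_provider=provider,
        llm_api_key=env[key_var],
        # The README documents v5.1 as the default taxonomy version.
        taxonomy_version=env.get("TAXONOMY_VERSION", "v5.1"),
    )
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary, and failing early on a missing key avoids discovering the problem only when Stage 2 first calls the LLM provider.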