feat: Add reviewiq-pipeline package for LLM-powered review classification

Implement a standalone Python package for processing customer reviews through
a 4-stage pipeline using URT (Universal Review Taxonomy) v5.1:

- Stage 1: Normalization (text cleaning, language detection, deduplication)
- Stage 2: LLM Classification (OpenAI/Anthropic span extraction with URT codes)
- Stage 3: Issue Routing (deterministic issue ID generation, span linking)
- Stage 4: Fact Aggregation (time series metrics for dashboards)
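
For illustration, the "deterministic issue ID generation" in Stage 3 could be sketched as below. This is a minimal sketch, not the package's implementation: the function name `make_issue_id`, the choice of key fields, the `ISS-` prefix, and the example code `X1.01` are all hypothetical. The idea is only that hashing a normalized key makes reruns of the pipeline map the same inputs to the same issue ID.

```python
import hashlib


def make_issue_id(business_id: str, urt_code: str, location: str = "") -> str:
    """Derive a stable issue ID from the fields assumed to define an issue.

    Hypothetical helper: the real key fields and ID format may differ.
    """
    # Normalize so casing or stray whitespace cannot change the ID.
    parts = [p.strip().lower() for p in (business_id, urt_code, location)]
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc") differ.
    key = "\x1f".join(parts)
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # A truncated digest keeps the ID short; 12 hex chars is an arbitrary choice.
    return f"ISS-{digest[:12]}"
```

Because the ID depends only on its inputs, two spans that share the same key fields can be linked to one issue without a lookup table or a counter.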

Package includes:
- TypedDict contracts matching Pipeline-Contracts-v1.md
- Async database layer with asyncpg and 5 SQL migrations
- LLM client abstraction supporting both OpenAI and Anthropic
- Sentence-transformers integration for embeddings
- Validation rules V1.x through V4.x
- CLI commands: migrate, run, validate, check
- 55 unit and integration tests (all passing)
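
As a rough picture of what a TypedDict contract between stages can look like, consider the sketch below. The class names `SpanSketch` and `ReviewSketch`, every field name, and the helper `negative_spans` are assumptions made for illustration; they are not taken from Pipeline-Contracts-v1.md or from the package.

```python
from typing import TypedDict


class SpanSketch(TypedDict):
    """Hypothetical shape of one extracted span."""
    text: str
    urt_code: str
    sentiment: str  # assumed values: "positive", "neutral", "negative"


class ReviewSketch(TypedDict):
    """Hypothetical shape of one classified review."""
    review_id: str
    spans: list[SpanSketch]


def negative_spans(review: ReviewSketch) -> list[SpanSketch]:
    # Select the spans a routing stage would plausibly care about.
    return [s for s in review["spans"] if s["sentiment"] == "negative"]
```

TypedDicts are plain dictionaries at runtime, so a contract of this kind adds static type checking without changing how records are serialized or stored.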

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Alejandro Gutiérrez
2026-01-24 18:07:11 +00:00
parent b780a23b66
commit 7d720f5378
34 changed files with 7222 additions and 0 deletions

@@ -0,0 +1,56 @@
"""
ReviewIQ Pipeline - LLM-powered review classification and analysis.
This package provides a complete pipeline for processing customer reviews:
- Stage 1: Normalization (text cleaning, language detection, deduplication)
- Stage 2: LLM Classification (span extraction with URT codes)
- Stage 3: Issue Routing (route negative spans to issues)
- Stage 4: Fact Aggregation (pre-aggregate metrics for dashboards)
"""

from reviewiq_pipeline.config import Config
from reviewiq_pipeline.contracts import (
    ClassifiedReview,
    ExtractedSpan,
    FactRecord,
    NormalizedReview,
    RawReview,
    RoutedSpan,
    ScraperOutput,
    Stage1Input,
    Stage1Output,
    Stage2Input,
    Stage2Output,
    Stage3Input,
    Stage3Output,
    Stage4Input,
    Stage4Output,
    ValidationError,
    ValidationResult,
)
from reviewiq_pipeline.pipeline import Pipeline

__version__ = "0.1.0"

__all__ = [
    # Main API
    "Pipeline",
    "Config",
    # Contracts
    "ScraperOutput",
    "RawReview",
    "Stage1Input",
    "Stage1Output",
    "NormalizedReview",
    "Stage2Input",
    "Stage2Output",
    "ClassifiedReview",
    "ExtractedSpan",
    "Stage3Input",
    "Stage3Output",
    "RoutedSpan",
    "Stage4Input",
    "Stage4Output",
    "FactRecord",
    "ValidationResult",
    "ValidationError",
]