# ReviewIQ Pipeline

LLM-powered review classification and analysis pipeline using URT (Universal Review Taxonomy) v5.1.

## Features

- **Stage 1: Normalization** - Text cleaning, language detection, and deduplication
- **Stage 2: LLM Classification** - Span extraction with URT codes using OpenAI or Anthropic models
- **Stage 3: Issue Routing** - Routes negative spans to issues for tracking
- **Stage 4: Fact Aggregation** - Pre-aggregates metrics for dashboard queries

## Installation

```bash
pip install reviewiq-pipeline
```

Or install from a source checkout in editable mode:

```bash
pip install -e packages/reviewiq-pipeline
```

## Quick Start

### Python API

```python
from reviewiq_pipeline import Config, Pipeline


async def main(scraper_output, business_id, date, job_id):
    # Initialize
    config = Config(
        database_url="postgresql://...",
        llm_provider="openai",
        llm_api_key="sk-...",
        taxonomy_version="v5.1",
    )
    pipeline = Pipeline(config)

    # Run the full pipeline
    result = await pipeline.process(scraper_output)

    # Alternatively, run individual stages
    stage1_result = await pipeline.normalize(scraper_output)
    stage2_result = await pipeline.classify(stage1_result)
    stage3_result = await pipeline.route(stage2_result)
    stage4_result = await pipeline.aggregate(business_id, date)

    # Validate the output of a pipeline job
    validation = await pipeline.validate(job_id)


# The pipeline methods are coroutines, so call them from async code, e.g.
# asyncio.run(main(scraper_output, business_id, date, job_id))
```

### CLI

```bash
# Run migrations
reviewiq-pipeline migrate --database-url "$DATABASE_URL"

# Process a job
reviewiq-pipeline run --job-id "$JOB_ID" --stages 1,2,3,4

# Validate pipeline output
reviewiq-pipeline validate --job-id "$JOB_ID"
```

## Configuration

The pipeline reads the following environment variables:

- `DATABASE_URL` - PostgreSQL connection string
- `LLM_PROVIDER` - `openai` or `anthropic`
- `OPENAI_API_KEY` - OpenAI API key (required when `LLM_PROVIDER` is `openai`)
- `ANTHROPIC_API_KEY` - Anthropic API key (required when `LLM_PROVIDER` is `anthropic`)
- `TAXONOMY_VERSION` - URT taxonomy version (default: `v5.1`)

## Development

```bash
# Install with dev dependencies
pip install -e "packages/reviewiq-pipeline[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=reviewiq_pipeline

# Type checking
mypy src/reviewiq_pipeline

# Linting
ruff check src/reviewiq_pipeline
```

## License

MIT
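## Example: Settings from Environment Variables

The variables listed under Configuration can be collected and checked before a `Config` is built. The following is a minimal sketch of a hypothetical helper and is not part of the `reviewiq_pipeline` API. `PipelineSettings` and `settings_from_env` are illustrative names. The sketch assumes that `LLM_PROVIDER` has no default and that only the API key for the selected provider is required.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Provider name -> environment variable holding that provider's API key,
# as listed in the Configuration section.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


@dataclass(frozen=True)
class PipelineSettings:
    """Hypothetical settings container; not part of the reviewiq_pipeline API."""

    database_url: str
    llm_provider: str
    llm_api_key: str
    taxonomy_version: str = "v5.1"


def settings_from_env(env: Optional[Mapping[str, str]] = None) -> PipelineSettings:
    """Hypothetical helper: read and validate settings from environment variables."""
    env = os.environ if env is None else env

    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise ValueError("DATABASE_URL is not set")

    # Assumption: LLM_PROVIDER must be set explicitly (no default).
    provider = env.get("LLM_PROVIDER", "").lower()
    if provider not in API_KEY_VARS:
        raise ValueError(
            f"LLM_PROVIDER must be one of {sorted(API_KEY_VARS)}, got {provider!r}"
        )

    # Only the key for the selected provider is required.
    key_var = API_KEY_VARS[provider]
    api_key = env.get(key_var)
    if not api_key:
        raise ValueError(f"{key_var} is required when LLM_PROVIDER={provider}")

    return PipelineSettings(
        database_url=database_url,
        llm_provider=provider,
        llm_api_key=api_key,
        taxonomy_version=env.get("TAXONOMY_VERSION", "v5.1"),
    )
```

The fields mirror the keyword arguments in the Quick Start example, so `Config(**dataclasses.asdict(settings_from_env()))` should work, provided `Config` accepts exactly those arguments. Passing a plain dict as `env` keeps the helper easy to test without modifying the process environment.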
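## Example: Parsing `--stages`

The `run` command accepts stage numbers as a comma-separated list (`--stages 1,2,3,4`). The following sketch shows one way such a value could be validated. `parse_stages` is a hypothetical helper, and the real CLI's parsing rules may differ. It assumes stages always execute in ascending order, because in the Python API example stages 2 and 3 each consume the previous stage's output.

```python
from typing import List

# Stage numbers mapped to the Pipeline method names used in the Python API example.
STAGES = {
    1: "normalize",
    2: "classify",
    3: "route",
    4: "aggregate",
}


def parse_stages(arg: str) -> List[int]:
    """Hypothetical helper: parse a comma-separated --stages value.

    Returns a sorted, de-duplicated list of stage numbers. Sorting reflects the
    assumption that stages run in ascending order regardless of input order.
    """
    stages = set()
    for part in arg.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas such as "1,,2"
        if not part.isdigit() or int(part) not in STAGES:
            raise ValueError(f"unknown stage {part!r}; expected one of {sorted(STAGES)}")
        stages.add(int(part))
    if not stages:
        raise ValueError("no stages given")
    return sorted(stages)
```

For example, `parse_stages("3, 1,3")` yields `[1, 3]`, and `[STAGES[n] for n in parse_stages("1,2")]` gives the method names `["normalize", "classify"]` to call in order.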