ReviewIQ Pipeline
LLM-powered review classification and analysis pipeline using URT (Universal Review Taxonomy) v5.1.
Features
- Stage 1: Normalization - Text cleaning, language detection, deduplication
- Stage 2: LLM Classification - Span extraction with URT codes using OpenAI/Anthropic
- Stage 3: Issue Routing - Route negative spans to issues for tracking
- Stage 4: Fact Aggregation - Pre-aggregate metrics for dashboard queries
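To make Stages 3 and 4 more concrete, here is a minimal, self-contained sketch of the idea: negative spans are routed into issues keyed by business and URT code, and span counts are pre-aggregated per business, day, URT code, and polarity. The `Span` record, its fields, the polarity labels, and the example codes `"SPEED"` and `"CLEANLINESS"` are hypothetical stand-ins chosen for illustration. This is not the package's actual data model or implementation.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Span:
    # Hypothetical span record; the real Stage 2 output schema may differ.
    review_id: str
    business_id: str
    urt_code: str
    polarity: str  # assumed labels: "positive", "negative", "neutral"
    day: date


def route_issues(spans):
    """Stage 3 sketch: group negative spans into issues keyed by (business, URT code)."""
    issues = defaultdict(list)
    for span in spans:
        if span.polarity == "negative":
            issues[(span.business_id, span.urt_code)].append(span.review_id)
    return dict(issues)


def aggregate_facts(spans):
    """Stage 4 sketch: pre-count spans per (business, day, URT code, polarity)."""
    return Counter((s.business_id, s.day, s.urt_code, s.polarity) for s in spans)


spans = [
    Span("r1", "biz-1", "SPEED", "negative", date(2024, 5, 1)),
    Span("r2", "biz-1", "SPEED", "negative", date(2024, 5, 1)),
    Span("r3", "biz-1", "CLEANLINESS", "positive", date(2024, 5, 1)),
]
print(route_issues(spans))
# {('biz-1', 'SPEED'): ['r1', 'r2']}
print(aggregate_facts(spans)[("biz-1", date(2024, 5, 1), "SPEED", "negative")])
# 2
```

Pre-counting like this lets a dashboard read a single row per business, day, and code instead of scanning every span at query time.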
Installation
```bash
pip install reviewiq-pipeline
```
Or install from source:
```bash
pip install -e packages/reviewiq-pipeline
```
Quick Start
Python API
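```python
from reviewiq_pipeline import Config, Pipeline

# Initialize
config = Config(
    database_url="postgresql://...",
    llm_provider="openai",
    llm_api_key="sk-...",
    taxonomy_version="v5.1",
)
pipeline = Pipeline(config)


async def run_full(scraper_output):
    # Run the full pipeline in one call
    return await pipeline.process(scraper_output)


async def run_stages(scraper_output, business_id, date):
    # Or run the individual stages yourself
    stage1_result = await pipeline.normalize(scraper_output)
    stage2_result = await pipeline.classify(stage1_result)
    stage3_result = await pipeline.route(stage2_result)
    return await pipeline.aggregate(business_id, date)


async def validate(job_id):
    # Validate the output of a pipeline job
    return await pipeline.validate(job_id)

# The pipeline methods are coroutines: await them from async code,
# or drive them from a script with asyncio.run(...).
```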
CLI
```bash
# Run migrations
reviewiq-pipeline migrate --database-url $DATABASE_URL

# Process a job
reviewiq-pipeline run --job-id <UUID> --stages 1,2,3,4

# Validate pipeline output
reviewiq-pipeline validate --job-id <UUID>
```
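The `run` command's `--stages` option takes a comma-separated list of stage numbers. The sketch below shows one way such a value could be validated and ordered. It assumes stages are integers from 1 to 4 that run in ascending order, and `parse_stages` is a hypothetical helper, not part of the CLI's documented interface.

```python
def parse_stages(value: str) -> list[int]:
    """Parse a value like "1,2,3,4" into a sorted, de-duplicated list of stage numbers."""
    stages = set()
    for part in value.split(","):
        part = part.strip()
        # Reject empty entries, non-numeric entries, and numbers outside 1-4
        if not part.isdigit() or not 1 <= int(part) <= 4:
            raise ValueError(f"invalid stage: {part!r} (expected 1-4)")
        stages.add(int(part))
    # Stages depend on earlier output, so return them in execution order
    return sorted(stages)


print(parse_stages("1,2,3,4"))  # [1, 2, 3, 4]
print(parse_stages("3, 1"))     # [1, 3]
```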
Configuration
Environment variables:
- `DATABASE_URL` - PostgreSQL connection string
- `LLM_PROVIDER` - `openai` or `anthropic`
- `OPENAI_API_KEY` - OpenAI API key (if using OpenAI)
- `ANTHROPIC_API_KEY` - Anthropic API key (if using Anthropic)
- `TAXONOMY_VERSION` - URT taxonomy version (default: `v5.1`)
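If you build a `Config` yourself, the environment variables above map naturally onto its keyword arguments. A minimal sketch, assuming you want to resolve them manually; `config_from_env` is a hypothetical helper, and whether `Config` already reads these variables on its own is not covered here.

```python
import os
from collections.abc import Mapping


def config_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect Config keyword arguments from the documented environment variables."""
    provider = env["LLM_PROVIDER"]
    # Pick the API key variable that matches the selected provider
    key_vars = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}
    if provider not in key_vars:
        raise ValueError(f"LLM_PROVIDER must be 'openai' or 'anthropic', got {provider!r}")
    return {
        "database_url": env["DATABASE_URL"],
        "llm_provider": provider,
        "llm_api_key": env[key_vars[provider]],
        "taxonomy_version": env.get("TAXONOMY_VERSION", "v5.1"),
    }
```

Usage: `config = Config(**config_from_env())`. Only the key for the selected provider is required, so a deployment using Anthropic does not need `OPENAI_API_KEY` set.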
Development
```bash
# Install with dev dependencies
pip install -e "packages/reviewiq-pipeline[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=reviewiq_pipeline

# Type checking
mypy src/reviewiq_pipeline

# Linting
ruff check src/reviewiq_pipeline
```
License
MIT