Alejandro Gutiérrez 0543a08242 docs: Add Classification System & Primitives Taxonomy documentation
Comprehensive documentation covering:
- Actual production primitives (37 primitives across 5 domains)
  - O: TASTE, CRAFT, FRESHNESS, TEMPERATURE, EFFECTIVENESS, ACCURACY, CONDITION, CONSISTENCY
  - P: MANNER, COMPETENCE, ATTENTIVENESS, COMMUNICATION
  - J: SPEED, FRICTION, RELIABILITY, AVAILABILITY
  - E: CLEANLINESS, COMFORT, SAFETY, AMBIANCE, ACCESSIBILITY, DIGITAL_UX
  - V: PRICE_LEVEL, PRICE_FAIRNESS, PRICE_TRANSPARENCY, VALUE_FOR_MONEY
  - meta: HONESTY, ETHICS, PROMISES, etc. + UNMAPPED, NON_INFORMATIVE
- Classification pipeline with config resolution
- Non-informative detection (skip LLM for junk content)
- Language detection and per-language UNMAPPED tracking
- Database schema for detected_spans_v2
- Evaluation tooling and quality metrics

Note: A larger taxonomy (~150 primitives) exists in gbp_primitive_prompts.py
for future expansion. The production system uses the subset above.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-01 00:35:46 +00:00

ReviewIQ Pipeline

LLM-powered review classification and analysis pipeline using URT (Universal Review Taxonomy) v5.1.

Features

  • Stage 1: Normalization - Text cleaning, language detection, deduplication
  • Stage 2: LLM Classification - Span extraction with URT codes using OpenAI/Anthropic
  • Stage 3: Issue Routing - Route negative spans to issues for tracking
  • Stage 4: Fact Aggregation - Pre-aggregate metrics for dashboard queries
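The four stages can be pictured as a chain of plain functions. The sketch below is a toy, in-memory model and not the package's implementation. The `Span` record, its fields, the function names, and the placeholder codes (`SPEED`, `MANNER`, `TASTE`) are hypothetical. Stage 2, the LLM call, is left out. Deduplication is assumed to mean exact matches after whitespace and case normalization.

```python
"""Toy, in-memory model of ReviewIQ stages 1, 3 and 4 (illustrative only)."""
import hashlib
import re
import unicodedata
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    # Hypothetical span shape: a text fragment tagged with a code and a polarity.
    # The real URT code format is not documented in this README.
    review_id: str
    text: str
    urt_code: str
    sentiment: str  # "positive" | "negative" | "neutral"


def normalize(reviews: dict[str, str]) -> dict[str, str]:
    """Stage 1 sketch: NFC-normalize, collapse whitespace, drop empty and duplicate texts."""
    seen: set[str] = set()
    out: dict[str, str] = {}
    for review_id, raw in reviews.items():
        text = re.sub(r"\s+", " ", unicodedata.normalize("NFC", raw)).strip()
        # Assumption: duplicates are exact matches after case folding.
        digest = hashlib.sha256(text.casefold().encode("utf-8")).hexdigest()
        if text and digest not in seen:
            seen.add(digest)
            out[review_id] = text
    return out


def route(spans: list[Span]) -> dict[str, list[Span]]:
    """Stage 3 sketch: group negative spans into one issue bucket per code."""
    issues: dict[str, list[Span]] = {}
    for span in spans:
        if span.sentiment == "negative":
            issues.setdefault(span.urt_code, []).append(span)
    return issues


def aggregate(spans: list[Span]) -> Counter:
    """Stage 4 sketch: pre-count spans per (code, sentiment) for dashboard queries."""
    return Counter((s.urt_code, s.sentiment) for s in spans)
```

For example, `normalize({"r1": "Great  food,\n slow service", "r2": "Great food, slow service"})` keeps only `r1`, because both texts collapse to the same string after whitespace cleanup. Pre-counting in Stage 4 means dashboards read small aggregate rows instead of re-scanning spans.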

Installation

pip install reviewiq-pipeline

Or install from source:

pip install -e packages/reviewiq-pipeline

Quick Start

Python API

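import asyncio

from reviewiq_pipeline import Pipeline, Config


async def main(scraper_output, business_id, date, job_id):
    # Initialize
    config = Config(
        database_url="postgresql://...",
        llm_provider="openai",
        llm_api_key="sk-...",
        taxonomy_version="v5.1",
    )
    pipeline = Pipeline(config)

    # Run the full pipeline
    result = await pipeline.process(scraper_output)

    # Or run individual stages: stages 2 and 3 consume the previous stage's
    # result, while stage 4 aggregates by business and date
    stage1_result = await pipeline.normalize(scraper_output)
    stage2_result = await pipeline.classify(stage1_result)
    stage3_result = await pipeline.route(stage2_result)
    stage4_result = await pipeline.aggregate(business_id, date)

    # Validate the output of a job
    validation = await pipeline.validate(job_id)
    return result, validation


# The pipeline methods are coroutines, so run them inside an event loop, e.g.:
# asyncio.run(main(scraper_output, business_id, date, job_id))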

CLI

# Run migrations
reviewiq-pipeline migrate --database-url $DATABASE_URL

# Process a job
reviewiq-pipeline run --job-id <UUID> --stages 1,2,3,4

# Validate pipeline output
reviewiq-pipeline validate --job-id <UUID>
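If you drive the CLI from another Python process, such as a scheduler, it helps to validate arguments before shelling out. This is a minimal sketch with a few assumptions. It assumes `--stages` accepts any comma-separated subset of 1 to 4, though the README only shows `1,2,3,4`. It also assumes the job ID must be a UUID. `build_run_command` and `run_job` are hypothetical helpers, not part of the package.

```python
import shlex
import subprocess
import uuid

VALID_STAGES = {1, 2, 3, 4}


def build_run_command(job_id: str, stages: list[int]) -> list[str]:
    """Build argv for `reviewiq-pipeline run` after validating its inputs."""
    # uuid.UUID raises ValueError on malformed IDs; str() gives the canonical form.
    canonical_id = str(uuid.UUID(job_id))
    if not stages:
        raise ValueError("at least one stage is required")
    bad = sorted(set(stages) - VALID_STAGES)
    if bad:
        raise ValueError(f"unknown stage(s): {bad}")
    # De-duplicate and keep pipeline order, e.g. [3, 1, 3] -> "1,3".
    stage_arg = ",".join(str(s) for s in sorted(set(stages)))
    return ["reviewiq-pipeline", "run", "--job-id", canonical_id, "--stages", stage_arg]


def run_job(job_id: str, stages: list[int]) -> int:
    """Run the CLI and return its exit code (assumes it is on PATH)."""
    cmd = build_run_command(job_id, stages)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Passing an argument list rather than a shell string avoids quoting problems, and `shlex.join` is used only to log a copy-pasteable command.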

Configuration

Environment variables:

  • DATABASE_URL - PostgreSQL connection string
  • LLM_PROVIDER - openai or anthropic
  • OPENAI_API_KEY - OpenAI API key (if using OpenAI)
  • ANTHROPIC_API_KEY - Anthropic API key (if using Anthropic)
  • TAXONOMY_VERSION - URT taxonomy version (default: v5.1)
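This is a minimal sketch of resolving these variables up front, so a missing key fails at startup rather than at the first LLM call. `EnvSettings` and `load_settings` are hypothetical names. Requiring `LLM_PROVIDER` to be set explicitly is an assumption, since the README documents no default for it.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Which API-key variable each documented provider needs.
_KEY_VAR = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


@dataclass(frozen=True)
class EnvSettings:
    database_url: str
    llm_provider: str
    llm_api_key: str
    taxonomy_version: str


def load_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    """Read and validate the documented environment variables."""
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise ValueError("DATABASE_URL is not set")
    provider = env.get("LLM_PROVIDER", "")
    if provider not in _KEY_VAR:
        raise ValueError(f"LLM_PROVIDER must be 'openai' or 'anthropic', got {provider!r}")
    key_var = _KEY_VAR[provider]
    api_key = env.get(key_var)
    if not api_key:
        raise ValueError(f"{key_var} must be set when LLM_PROVIDER={provider}")
    # TAXONOMY_VERSION falls back to the documented default.
    return EnvSettings(database_url, provider, api_key, env.get("TAXONOMY_VERSION", "v5.1"))
```

The field names mirror the `Config` keyword arguments in the Python API example. Assuming `Config` accepts exactly those keywords, `Config(**dataclasses.asdict(load_settings()))` would build a config from the environment.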

Development

# Install with dev dependencies
pip install -e "packages/reviewiq-pipeline[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=reviewiq_pipeline

# Type checking
mypy src/reviewiq_pipeline

# Linting
ruff check src/reviewiq_pipeline

License

MIT