feat: Add reviewiq-pipeline package for LLM-powered review classification

Implement a standalone Python package for processing customer reviews
through a 4-stage pipeline using URT (Universal Review Taxonomy) v5.1:

- Stage 1: Normalization (text cleaning, language detection, deduplication)
- Stage 2: LLM Classification (OpenAI/Anthropic span extraction with URT codes)
- Stage 3: Issue Routing (deterministic issue ID generation, span linking)
- Stage 4: Fact Aggregation (time series metrics for dashboards)

Package includes:

- TypedDict contracts matching Pipeline-Contracts-v1.md
- Async database layer with asyncpg and 5 SQL migrations
- LLM client abstraction supporting both OpenAI and Anthropic
- Sentence-transformers integration for embeddings
- Validation rules V1.x through V4.x
- CLI commands: migrate, run, validate, check
- 55 unit and integration tests (all passing)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

packages/reviewiq-pipeline/README.md
@@ -0,0 +1,97 @@
# ReviewIQ Pipeline

LLM-powered review classification and analysis pipeline using URT (Universal Review Taxonomy) v5.1.

## Features

- **Stage 1: Normalization** - Text cleaning, language detection, deduplication
- **Stage 2: LLM Classification** - Span extraction with URT codes using OpenAI/Anthropic
- **Stage 3: Issue Routing** - Route negative spans to issues for tracking
- **Stage 4: Fact Aggregation** - Pre-aggregate metrics for dashboard queries

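
To make Stage 1 deduplication and Stage 3 deterministic issue IDs more concrete, here is a minimal sketch using only the standard library. It is not the package's implementation: the function names, the choice of SHA-256, the `business_id` and `urt_code` inputs to the ID, and the `issue_` prefix are all assumptions made for illustration.

```python
import hashlib
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization, collapse whitespace runs, and strip the ends."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def dedupe(reviews: list[str]) -> list[str]:
    """Keep the first occurrence of each review, comparing case-insensitively."""
    seen: set[str] = set()
    unique: list[str] = []
    for raw in reviews:
        cleaned = normalize_text(raw)
        # Hash the case-folded text so "Great food!" and "great food!" collide.
        key = hashlib.sha256(cleaned.casefold().encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique


def issue_id(business_id: str, urt_code: str) -> str:
    """Hypothetical stable ID: the same inputs always map to the same issue."""
    digest = hashlib.sha256(f"{business_id}|{urt_code}".encode("utf-8")).hexdigest()
    return f"issue_{digest[:16]}"
```

Because the ID is derived purely from its inputs, re-running the routing stage on the same spans would link them to the same issue instead of creating a new one, which is the property "deterministic" implies.
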
## Installation

```bash
pip install reviewiq-pipeline
```

Or install from source:

```bash
pip install -e packages/reviewiq-pipeline
```

## Quick Start

### Python API

```python
from reviewiq_pipeline import Pipeline, Config

# Note: the `await` calls below must run inside a coroutine
# (for example, an `async def main()` driven by `asyncio.run`).
# `scraper_output`, `business_id`, `date`, and `job_id` are supplied by the caller.

# Initialize
config = Config(
    database_url="postgresql://...",
    llm_provider="openai",
    llm_api_key="sk-...",
    taxonomy_version="v5.1"
)
pipeline = Pipeline(config)

# Run full pipeline
result = await pipeline.process(scraper_output)

# Or run individual stages
stage1_result = await pipeline.normalize(scraper_output)
stage2_result = await pipeline.classify(stage1_result)
stage3_result = await pipeline.route(stage2_result)
stage4_result = await pipeline.aggregate(business_id, date)

# Validate
validation = await pipeline.validate(job_id)
```

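
The Quick Start calls are coroutines, so in a plain script they need an event loop. The sketch below shows one way to chain stages 1 to 3 under `asyncio.run`. `StubPipeline` is a hypothetical stand-in that only mirrors the method names shown above, so the example runs without a database or API key. The dictionaries it returns are made up and say nothing about the real result types.

```python
import asyncio


class StubPipeline:
    """Hypothetical stand-in mirroring the README method names (not the real class)."""

    async def normalize(self, scraper_output):
        return {"stage": 1, "reviews": scraper_output}

    async def classify(self, stage1_result):
        return {"stage": 2, "input": stage1_result}

    async def route(self, stage2_result):
        return {"stage": 3, "input": stage2_result}


async def run_stages(pipeline, scraper_output):
    """Chain stages 1-3, passing each result to the next stage."""
    stage1_result = await pipeline.normalize(scraper_output)
    stage2_result = await pipeline.classify(stage1_result)
    return await pipeline.route(stage2_result)


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs the coroutine, and closes the loop.
    final = asyncio.run(run_stages(StubPipeline(), ["Great food, slow service"]))
    print(final["stage"])
```

With the real package, `StubPipeline()` would presumably be replaced by `Pipeline(config)`. The structure of `run_stages` stays the same as long as the method names match the README.
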
### CLI

```bash
# Run migrations
reviewiq-pipeline migrate --database-url $DATABASE_URL

# Process a job
reviewiq-pipeline run --job-id <UUID> --stages 1,2,3,4

# Validate pipeline output
reviewiq-pipeline validate --job-id <UUID>
```

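
As an illustration of how the `run` command's `--job-id` and `--stages` arguments could be parsed and validated, here is a sketch built on the standard library's `argparse`. The README does not say which CLI framework the package uses, so this is not its actual parser. Sorting and de-duplicating the stage list, and defaulting to all four stages, are assumptions.

```python
import argparse
import uuid


def parse_stages(value: str) -> list[int]:
    """Turn '1,2,3,4' into [1, 2, 3, 4], rejecting anything outside 1-4."""
    try:
        stages = sorted({int(part) for part in value.split(",")})
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"stages must be comma-separated integers: {value!r}"
        ) from None
    if any(stage < 1 or stage > 4 for stage in stages):
        raise argparse.ArgumentTypeError(f"stages must be between 1 and 4: {value!r}")
    return stages


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser covering only the `run` subcommand."""
    parser = argparse.ArgumentParser(prog="reviewiq-pipeline")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run", help="Process a job")
    # uuid.UUID raises ValueError on malformed input, which argparse reports as a usage error.
    run.add_argument("--job-id", type=uuid.UUID, required=True)
    run.add_argument("--stages", type=parse_stages, default=[1, 2, 3, 4])
    return parser
```

Validating the arguments at parse time means a malformed UUID or an out-of-range stage fails before any database or LLM call is made.
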
## Configuration

Environment variables:

- `DATABASE_URL` - PostgreSQL connection string
- `LLM_PROVIDER` - `openai` or `anthropic`
- `OPENAI_API_KEY` - OpenAI API key (if using OpenAI)
- `ANTHROPIC_API_KEY` - Anthropic API key (if using Anthropic)
- `TAXONOMY_VERSION` - URT taxonomy version (default: `v5.1`)

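
The variables listed above could be read and checked along these lines. This is a sketch, not the package's loader: `EnvSettings` and `load_settings` are hypothetical names, and treating `LLM_PROVIDER` as required (no default) is an assumption. Only the `v5.1` default for `TAXONOMY_VERSION` comes from the list.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container; the package's own Config class may differ."""

    database_url: str
    llm_provider: str
    llm_api_key: str
    taxonomy_version: str


# Which variable holds the API key for each provider, per the list above.
API_KEY_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def load_settings(env=os.environ) -> EnvSettings:
    """Read settings from a mapping (os.environ by default) and fail early on gaps."""
    provider = env.get("LLM_PROVIDER", "").lower()
    if provider not in API_KEY_VARS:
        raise ValueError("LLM_PROVIDER must be 'openai' or 'anthropic'")
    key_var = API_KEY_VARS[provider]
    # Only the key for the selected provider is required.
    missing = [name for name in ("DATABASE_URL", key_var) if not env.get(name)]
    if missing:
        raise ValueError(f"missing environment variables: {', '.join(missing)}")
    return EnvSettings(
        database_url=env["DATABASE_URL"],
        llm_provider=provider,
        llm_api_key=env[key_var],
        taxonomy_version=env.get("TAXONOMY_VERSION", "v5.1"),
    )
```

Accepting the environment as a parameter keeps the function easy to test with a plain dictionary instead of patching `os.environ`.
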
## Development

```bash
# Install with dev dependencies
pip install -e "packages/reviewiq-pipeline[dev]"

# Run tests
pytest

# Run with coverage
pytest --cov=reviewiq_pipeline

# Type checking
mypy src/reviewiq_pipeline

# Linting
ruff check src/reviewiq_pipeline
```

## License

MIT