Classification System & Primitives Taxonomy

Version: 2.0
Status: Production
Location: packages/reviewiq-pipeline/scripts/run_classification_v2.py

Overview

The Classification System transforms raw customer reviews into structured, actionable data by:

  1. Extracting spans - Identifying semantically meaningful segments within review text
  2. Classifying primitives - Mapping each span to a primitive (e.g., MANNER, SPEED, VALUE_FOR_MONEY)
  3. Scoring - Assigning valence, intensity, detail, and confidence to each span
  4. Filtering - Detecting non-informative content (emoji-only, translation artifacts)

The output is stored in pipeline.detected_spans_v2 and powers downstream analytics, issue routing, and the Reputation Report.

Note: A legacy system (stage2_classify.py) uses URT codes such as J1.01 and O1.01. The current production system replaces those codes with descriptively named primitives.

Quick Start

# Classify reviews for a business (dry run)
python scripts/run_classification_v2.py --business "Go Karts Mar Menor" --limit 100 --dry-run

# Real LLM classification
python scripts/run_classification_v2.py --business "Go Karts Mar Menor" --limit 100 --use-llm

# Evaluate classification quality
python scripts/run_classification_v2.py --evaluate "Go Karts Mar Menor"

# Language analysis across all data
python scripts/run_classification_v2.py --language-analysis

Primitives Taxonomy

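The production system uses 37 primitives: 26 across five domains (Output, People, Journey, Environment, Value) plus 11 meta primitives, which include the special primitives UNMAPPED and NON_INFORMATIVE. They are defined in DOMAIN_MAP in reputation_report.py.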

Note: A larger taxonomy of ~150 primitives exists in gbp_primitive_prompts.py for future expansion and business-specific configuration. The production system currently uses the subset below.

Domain Structure

| Domain | Code | Primitives |
|---|---|---|
| Output | O | TASTE, CRAFT, FRESHNESS, TEMPERATURE, EFFECTIVENESS, ACCURACY, CONDITION, CONSISTENCY |
| People | P | MANNER, COMPETENCE, ATTENTIVENESS, COMMUNICATION |
| Journey | J | SPEED, FRICTION, RELIABILITY, AVAILABILITY |
| Environment | E | CLEANLINESS, COMFORT, SAFETY, AMBIANCE, ACCESSIBILITY, DIGITAL_UX |
| Value | V | PRICE_LEVEL, PRICE_FAIRNESS, PRICE_TRANSPARENCY, VALUE_FOR_MONEY |
| Meta | meta | HONESTY, ETHICS, PROMISES, ACKNOWLEDGMENT, RESPONSE_QUALITY, RECOVERY, RETURN_INTENT, RECOMMEND, RECOGNITION, UNMAPPED, NON_INFORMATIVE |
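
As a rough illustration of the domain structure, the mapping can be written as a dictionary and inverted into a primitive-to-domain lookup. This is only a sketch: the names DOMAIN_PRIMITIVES, PRIMITIVE_TO_DOMAIN, and domain_of are hypothetical, and the actual DOMAIN_MAP in reputation_report.py may be shaped differently.

```python
# Hypothetical shape; the real DOMAIN_MAP in reputation_report.py may differ.
DOMAIN_PRIMITIVES: dict[str, list[str]] = {
    "O": ["TASTE", "CRAFT", "FRESHNESS", "TEMPERATURE",
          "EFFECTIVENESS", "ACCURACY", "CONDITION", "CONSISTENCY"],
    "P": ["MANNER", "COMPETENCE", "ATTENTIVENESS", "COMMUNICATION"],
    "J": ["SPEED", "FRICTION", "RELIABILITY", "AVAILABILITY"],
    "E": ["CLEANLINESS", "COMFORT", "SAFETY", "AMBIANCE",
          "ACCESSIBILITY", "DIGITAL_UX"],
    "V": ["PRICE_LEVEL", "PRICE_FAIRNESS", "PRICE_TRANSPARENCY",
          "VALUE_FOR_MONEY"],
    "meta": ["HONESTY", "ETHICS", "PROMISES", "ACKNOWLEDGMENT",
             "RESPONSE_QUALITY", "RECOVERY", "RETURN_INTENT", "RECOMMEND",
             "RECOGNITION", "UNMAPPED", "NON_INFORMATIVE"],
}

# Inverted lookup: primitive name -> domain code.
PRIMITIVE_TO_DOMAIN: dict[str, str] = {
    primitive: domain
    for domain, primitives in DOMAIN_PRIMITIVES.items()
    for primitive in primitives
}


def domain_of(primitive: str) -> str:
    """Return the domain code for a primitive; raises KeyError if unknown."""
    return PRIMITIVE_TO_DOMAIN[primitive]
```

Counting the entries of PRIMITIVE_TO_DOMAIN is a cheap way to confirm that the table still adds up to 37 primitives after an edit.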

Special Primitives

| Primitive | Purpose |
|---|---|
| UNMAPPED | Could not classify to any primitive (target: <10%) |
| NON_INFORMATIVE | No actionable content (emoji-only, translation artifacts) |

Full Primitive Reference

OUTPUT (O) - Product/Service Quality

| Primitive | Description | Example Signals |
|---|---|---|
| TASTE | Flavor quality (food/beverage) | "delicious", "bland", "amazing flavor" |
| CRAFT | Skill of execution | "expertly made", "sloppy work", "quality craftsmanship" |
| FRESHNESS | How fresh/new the product is | "fresh ingredients", "stale", "just made" |
| TEMPERATURE | Serving temperature | "served hot", "cold food", "perfect temperature" |
| EFFECTIVENESS | Does it work/achieve purpose | "works great", "didn't work", "effective" |
| ACCURACY | Correct execution of order | "exactly as ordered", "wrong order", "got it right" |
| CONDITION | State at delivery | "arrived perfect", "damaged", "pristine condition" |
| CONSISTENCY | Same quality each time | "always consistent", "hit or miss", "reliable quality" |

PEOPLE (P) - Staff Interactions

| Primitive | Description | Example Signals |
|---|---|---|
| MANNER | Friendliness and warmth | "so friendly", "rude", "welcoming" |
| COMPETENCE | Knowledge and skill | "very knowledgeable", "clueless", "professional" |
| ATTENTIVENESS | Being present and responsive | "attentive staff", "ignored us", "checked on us" |
| COMMUNICATION | Clarity and updates | "kept us informed", "no updates", "explained clearly" |

JOURNEY (J) - Process and Timing

| Primitive | Description | Example Signals |
|---|---|---|
| SPEED | How fast things happen | "quick service", "took forever", "fast" |
| FRICTION | Ease of process | "smooth process", "complicated", "hassle-free" |
| RELIABILITY | Dependable service | "always reliable", "unreliable", "consistent" |
| AVAILABILITY | Access to service/staff | "always available", "never open", "hard to reach" |

ENVIRONMENT (E) - Physical/Digital Space

| Primitive | Description | Example Signals |
|---|---|---|
| CLEANLINESS | Hygiene and tidiness | "spotless", "dirty", "very clean" |
| COMFORT | Physical ease | "comfortable", "cramped", "cozy seating" |
| SAFETY | Physical safety | "felt safe", "dangerous", "secure" |
| AMBIANCE | Overall mood/atmosphere | "great vibe", "loud", "nice atmosphere" |
| ACCESSIBILITY | Ease of access (physical/digital) | "wheelchair accessible", "hard to find", "easy to navigate" |
| DIGITAL_UX | Digital experience | "easy to use app", "website broken", "smooth online booking" |

VALUE (V) - Cost and Worth

| Primitive | Description | Example Signals |
|---|---|---|
| PRICE_LEVEL | Absolute cost | "affordable", "expensive", "cheap" |
| PRICE_FAIRNESS | Fair for what you get | "fair price", "overpriced", "worth every penny" |
| PRICE_TRANSPARENCY | Clear about costs | "no hidden fees", "surprise charges", "upfront pricing" |
| VALUE_FOR_MONEY | Overall value assessment | "great value", "not worth it", "bang for buck" |

META - Trust and Sentiment

| Primitive | Description | Example Signals |
|---|---|---|
| HONESTY | Truthfulness | "honest", "lied to us", "transparent" |
| ETHICS | Moral conduct | "ethical", "scam", "trustworthy" |
| PROMISES | Keeping commitments | "kept their word", "broke promises", "reliable" |
| ACKNOWLEDGMENT | Recognizing issues | "admitted mistake", "denied problem", "apologized" |
| RESPONSE_QUALITY | How business responds | "great response", "ignored complaint", "resolved quickly" |
| RECOVERY | Making amends | "made it right", "no compensation", "fixed the issue" |
| RETURN_INTENT | Would come back | "will be back", "never again", "definitely returning" |
| RECOMMEND | Would suggest to others | "highly recommend", "don't go", "tell your friends" |
| RECOGNITION | Customer acknowledgment | "remembered us", "treated like strangers", "knew our name" |

Span Classification

What is a Span?

A span is a contiguous segment of review text that expresses a single semantic unit about the customer experience.

Review: "The food was delicious but we waited 45 minutes for a table."

Span 1: "The food was delicious"
  → Primitive: TASTE (O)
  → Valence: + (positive)
  → Intensity: 2 (moderate)

Span 2: "we waited 45 minutes for a table"
  → Primitive: SPEED (J)
  → Valence: - (negative)
  → Intensity: 3 (high - specific number)

Span Fields

interface ClassificationSpan {
  // Position
  text: string;           // Extracted text from review
  start: number;          // Character offset start
  end: number;            // Character offset end

  // Classification
  primitive: string;      // e.g., "MANNER", "SPEED", "VALUE_FOR_MONEY", "UNMAPPED"
  valence: "+" | "-" | "0" | "±";
  intensity: 1 | 2 | 3;   // 1=low, 2=moderate, 3=high
  detail: 1 | 2 | 3;      // 1=vague, 2=some detail, 3=specific
  confidence: number;     // 0.0 - 1.0

  // Entity extraction (optional)
  entity?: string;        // Named entity (e.g., "John", "Room 302")
  entity_type?: "staff" | "location" | "product" | "process" | "time";

  // For UNMAPPED spans
  unmapped_keywords?: string[];  // Keywords that couldn't be mapped
}
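
For Python code that consumes spans, the interface above could be mirrored as a dataclass with a small validator. This is a sketch only: the field names come from the interface, while the Span class, the validate_span helper, and the assumption that end is an exclusive offset (so that review_text[start:end] equals text) are illustrative and not confirmed by the pipeline code.

```python
from dataclasses import dataclass, field
from typing import Optional

VALENCES = {"+", "-", "0", "±"}
ENTITY_TYPES = {"staff", "location", "product", "process", "time"}


@dataclass
class Span:
    """Python mirror of ClassificationSpan (hypothetical helper type)."""
    text: str
    start: int
    end: int  # assumed exclusive, like a Python slice bound
    primitive: str
    valence: str
    intensity: int
    detail: int
    confidence: float
    entity: Optional[str] = None
    entity_type: Optional[str] = None
    unmapped_keywords: list[str] = field(default_factory=list)


def validate_span(span: Span, review_text: str) -> list[str]:
    """Return a list of problems; an empty list means the span is well-formed."""
    problems = []
    if span.valence not in VALENCES:
        problems.append(f"invalid valence: {span.valence!r}")
    if span.intensity not in (1, 2, 3):
        problems.append(f"invalid intensity: {span.intensity}")
    if span.detail not in (1, 2, 3):
        problems.append(f"invalid detail: {span.detail}")
    if not 0.0 <= span.confidence <= 1.0:
        problems.append(f"confidence out of range: {span.confidence}")
    if span.entity_type is not None and span.entity_type not in ENTITY_TYPES:
        problems.append(f"invalid entity_type: {span.entity_type!r}")
    if review_text[span.start:span.end] != span.text:
        problems.append("offsets do not match span text")
    return problems
```

Checking the offsets against the review text is useful because LLM-reported character positions are often slightly off even when the quoted text is exact.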

Valence Types

| Code | Meaning | Example |
|---|---|---|
| + | Positive sentiment | "excellent service" |
| - | Negative sentiment | "terrible wait" |
| 0 | Neutral/factual | "open until 9pm" |
| ± | Mixed sentiment | "good but expensive" |

Intensity Levels

| Value | Level | Signals |
|---|---|---|
| 1 | Low | Generic mentions, implied sentiment |
| 2 | Moderate | Clear opinion, adjectives |
| 3 | High | Strong language, specifics, numbers |

Detail Levels

| Value | Level | Description |
|---|---|---|
| 1 | Vague | General statement, no specifics |
| 2 | Some detail | Has some context or explanation |
| 3 | Specific | Actionable detail, names, numbers |

Confidence

A float from 0.0 to 1.0 indicating how confident the classifier is:

  • ≥ 0.8: High confidence, clear signal
  • 0.5 to < 0.8: Medium confidence, reasonable inference
  • < 0.5: Low confidence; no primitive fits well enough, so the span is classified as UNMAPPED
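
The bands above can be expressed as a small helper. This is a minimal sketch: the function and constant names are hypothetical, and whether the production code enforces the 0.5 threshold after the LLM call or relies only on the prompt is not stated here.

```python
UNMAPPED_THRESHOLD = 0.5  # below this, no primitive is considered a fit
HIGH_CONFIDENCE = 0.8


def confidence_band(confidence: float) -> str:
    """Map a confidence score to 'high', 'medium', or 'low'."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence}")
    if confidence >= HIGH_CONFIDENCE:
        return "high"
    if confidence >= UNMAPPED_THRESHOLD:
        return "medium"
    return "low"


def effective_primitive(primitive: str, confidence: float) -> str:
    """Fall back to UNMAPPED when confidence is below the threshold."""
    return primitive if confidence >= UNMAPPED_THRESHOLD else "UNMAPPED"
```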

Classification Pipeline

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                    Classification V2                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────┐ │
│  │  Config Resolver│ ─→ │  LLM Classifier │ ─→ │  Store      │ │
│  │                 │    │                 │    │             │ │
│  │  • GBP path     │    │  • OpenAI API   │    │  • spans_v2 │ │
│  │  • Sector brief │    │  • Primitives   │    │  • run_id   │ │
│  │  • Enabled prims│    │  • Language det │    │  • audit    │ │
│  └────────┬────────┘    └────────┬────────┘    └──────┬──────┘ │
│           │                      │                     │        │
│           │         ┌────────────┴────────────┐        │        │
│           │         │   Non-Informative       │        │        │
│           │         │   Detection (skip LLM)  │        │        │
│           │         └─────────────────────────┘        │        │
│           ▼                      ▼                     ▼        │
│  ┌─────────────────────────────────────────────────────────────┐│
│  │              pipeline.detected_spans_v2                     ││
│  │    (primitive, valence, intensity, detail, confidence)      ││
│  └─────────────────────────────────────────────────────────────┘│
│                                                                  │
└─────────────────────────────────────────────────────────────────┘

Non-Informative Detection

Before calling the LLM, reviews are checked for non-informative content to save cost:

# Conservative detection - only skip when VERY sure
def is_non_informative(text: str) -> tuple[bool, str]:
    """
    Returns (is_non_informative, reason).
    Reasons: 'empty', 'junk_pattern', 'no_content', 'pure_repetition'
    """

Detection Rules:

  • Empty text
  • Emoji-only content: ^[\U0001F300-\U0001F9FF\s\.\!\?]+$
  • Translation artifacts: "translated by google"
  • No alphanumeric content
  • Pure repetition: "good good good good"

Reviews that are not flagged as non-informative are sent to the LLM.
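
One possible implementation of these rules is sketched below. Only the signature, the reason strings, and the emoji pattern come from this document. The mapping of rules to reasons, the empty reason string for informative text, the minimum of three repeated tokens, and treating a review as a translation artifact only when it consists solely of the marker are all assumptions.

```python
import re

# Emoji-only pattern from the detection rules (also matches bare "!!!" or "...").
EMOJI_ONLY = re.compile(r"^[\U0001F300-\U0001F9FF\s\.\!\?]+$")
# Assumption: only a review consisting solely of the marker counts as junk.
TRANSLATION_ONLY = re.compile(r"^\W*translated by google\W*$", re.IGNORECASE)
MIN_REPEATS = 3  # assumption: fewer repeated tokens are not treated as junk


def is_non_informative(text: str) -> tuple[bool, str]:
    """Return (is_non_informative, reason); reason is '' for informative text."""
    stripped = (text or "").strip()
    if not stripped:
        return True, "empty"
    if EMOJI_ONLY.match(stripped) or TRANSLATION_ONLY.match(stripped):
        return True, "junk_pattern"
    if not any(ch.isalnum() for ch in stripped):
        return True, "no_content"
    tokens = re.findall(r"\w+", stripped.lower())
    if len(tokens) >= MIN_REPEATS and len(set(tokens)) == 1:
        return True, "pure_repetition"
    return False, ""
```

The checks run from cheapest and most certain to least certain, which keeps the detector conservative: anything that contains at least two distinct words falls through to the LLM.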

Config Resolution

Each business gets a resolved config based on its GBP (Google Business Profile) category:

resolver = ConfigResolver()
config = await resolver.resolve("Go Karts Mar Menor", pool)

# Returns:
{
    "business_id": "Go Karts Mar Menor",
    "sector_code": "recreation",
    "gbp_path": "Recreation.Amusement_Parks.Go_Karts",
    "config_version": "v2.1-2026-01-15",
    "enabled_primitives": ["SPEED", "SAFETY", "VALUE_FOR_MONEY", ...],
    "weights": {"SAFETY": 1.5, "SPEED": 1.2, ...},
    "brief": {"what_customers_judge": [...]}
}
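
To illustrate how a resolved config might feed the classifier, the sketch below renders the enabled primitives and their weights as a bulleted prompt section. The function name render_enabled_primitives and the descriptions argument are hypothetical, and the real prompt builder may format this differently.

```python
def render_enabled_primitives(config: dict, descriptions: dict[str, str]) -> str:
    """Render enabled primitives as prompt lines, noting non-default weights."""
    lines = ["## ENABLED PRIMITIVES (use ONLY these)"]
    weights = config.get("weights", {})
    for name in config["enabled_primitives"]:
        # Fall back to the bare name if no description is available.
        line = f"- {name}: {descriptions.get(name, name)}"
        weight = weights.get(name)
        if weight is not None and weight != 1.0:
            line += f" (weight: {weight:g}x)"
        lines.append(line)
    return "\n".join(lines)


example_config = {
    "enabled_primitives": ["SAFETY", "SPEED", "VALUE_FOR_MONEY"],
    "weights": {"SAFETY": 1.5, "SPEED": 1.2},
}
example_descriptions = {
    "SAFETY": "Physical safety and protection",
    "SPEED": "How fast things happen",
}
print(render_enabled_primitives(example_config, example_descriptions))
```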

LLM Classification Prompt

The classifier uses a structured prompt with business-specific primitives:

You are a review classifier using primitive-based analysis.

## ENABLED PRIMITIVES (use ONLY these)
- MANNER: Friendliness and warmth of staff (weight: 1.2x)
- SPEED: How fast things happen
- SAFETY: Physical safety and protection
...

## RULES
1. Extract 1-5 spans per review
2. Each span gets exactly ONE primitive
3. If nothing fits with confidence ≥ 0.5, use UNMAPPED
4. Valence: + (positive), - (negative), 0 (neutral), ± (mixed)
5. Intensity: 1 (low), 2 (moderate), 3 (high)
6. Detail: 1 (vague), 2 (some detail), 3 (specific)

## OUTPUT FORMAT (JSON)
{
  "spans": [
    {
      "text": "exact text from review",
      "start": 0,
      "end": 25,
      "primitive": "MANNER",
      "valence": "+",
      "intensity": 2,
      "detail": 2,
      "confidence": 0.85,
      "entity": null,
      "entity_type": null
    }
  ]
}
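
Because the model's output is free-form text, it is worth validating against the rules before storing anything. The following is a hedged sketch of such a check. The parse_spans function is hypothetical, and truncating to five spans, dropping spans with an invalid valence, and remapping disabled primitives to UNMAPPED are assumptions about sensible handling rather than documented behavior.

```python
import json

ALLOWED_VALENCES = {"+", "-", "0", "±"}
MAX_SPANS = 5  # rule 1: extract 1-5 spans per review


def parse_spans(raw: str, enabled: set[str]) -> list[dict]:
    """Parse the model's JSON output and enforce the prompt rules."""
    payload = json.loads(raw)
    cleaned = []
    for span in payload.get("spans", [])[:MAX_SPANS]:
        if span.get("valence") not in ALLOWED_VALENCES:
            continue  # assumption: drop spans with an unrecognized valence
        primitive = span.get("primitive")
        if primitive not in enabled:
            primitive = "UNMAPPED"  # the model used a primitive that is not enabled
        cleaned.append({**span, "primitive": primitive})
    return cleaned
```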

Language Detection

The LLM classifier auto-detects review language and returns it with confidence. This enables:

  • Per-language UNMAPPED rate tracking
  • Identification of languages needing better signal coverage
  • Multilingual analytics (7+ languages: Spanish, English, Dutch, German, Polish, Finnish, Danish)
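
Per-language UNMAPPED tracking amounts to a grouped ratio. A minimal sketch over span dictionaries follows. The function name is hypothetical, and excluding NON_INFORMATIVE spans from the denominator and bucketing missing languages as "unknown" are assumptions.

```python
from collections import Counter
from typing import Iterable, Mapping


def unmapped_rate_by_language(spans: Iterable[Mapping]) -> dict[str, float]:
    """Return the share of content spans classified as UNMAPPED, per language."""
    totals: Counter = Counter()
    unmapped: Counter = Counter()
    for span in spans:
        if span["primitive"] == "NON_INFORMATIVE":
            continue  # assumption: only content spans count toward the rate
        language = span.get("language") or "unknown"
        totals[language] += 1
        if span["primitive"] == "UNMAPPED":
            unmapped[language] += 1
    return {lang: unmapped[lang] / count for lang, count in totals.items()}
```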

Database Schema

pipeline.detected_spans_v2

CREATE TABLE pipeline.detected_spans_v2 (
    id BIGSERIAL PRIMARY KEY,

    -- Context
    job_id VARCHAR(50),                   -- Scraper job ID
    business_id VARCHAR(255) NOT NULL,
    review_id VARCHAR(255) NOT NULL,
    gbp_path ltree,                       -- e.g., 'Recreation.Go_Karts'
    sector_code VARCHAR(50),              -- e.g., 'recreation'
    config_version VARCHAR(100),          -- Config version used
    run_id UUID,                          -- Classification run ID

    -- Classification (primitives-based)
    primitive VARCHAR(50) NOT NULL,       -- e.g., "MANNER", "SPEED", "UNMAPPED"
    valence VARCHAR(5) NOT NULL,          -- +, -, 0, ±
    intensity INTEGER,                    -- 1, 2, 3
    detail INTEGER,                       -- 1, 2, 3
    mode VARCHAR(50),                     -- e.g., "dine_in", "delivery"
    confidence FLOAT NOT NULL,            -- 0.0 - 1.0

    -- Span position
    span_text TEXT NOT NULL,
    span_start INTEGER,
    span_end INTEGER,

    -- Entity extraction
    entity VARCHAR(255),
    entity_type VARCHAR(50),
    unmapped_keywords TEXT[],             -- Keywords for UNMAPPED spans

    -- Audit trail
    model VARCHAR(100),                   -- e.g., "gpt-4o-mini"
    raw_response JSONB,                   -- Full LLM response
    review_hash VARCHAR(32),              -- For deduplication
    language VARCHAR(10),                 -- Detected language

    created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);

-- Key indexes (schema-qualified so they resolve regardless of search_path)
CREATE INDEX idx_spans_v2_business_id ON pipeline.detected_spans_v2(business_id);
CREATE INDEX idx_spans_v2_primitive ON pipeline.detected_spans_v2(primitive);
CREATE INDEX idx_spans_v2_valence ON pipeline.detected_spans_v2(valence);
CREATE INDEX idx_spans_v2_run_id ON pipeline.detected_spans_v2(run_id);
CREATE INDEX idx_spans_v2_language ON pipeline.detected_spans_v2(language);

Key Queries

Get all spans for a business in a time window:

SELECT s.*, f.review_time_utc, f.rating
FROM pipeline.detected_spans_v2 s
JOIN pipeline.review_facts_v1 f
  ON f.review_id = s.review_id
 AND f.business_id = s.business_id    -- CRITICAL: join on both!
WHERE s.business_id = $1
  AND f.review_time_utc >= $2
  AND f.review_time_utc < $3
ORDER BY f.review_time_utc DESC;

Aggregate by primitive:

SELECT
    primitive,
    valence,
    COUNT(*) as span_count,
    AVG(confidence) as avg_confidence,
    AVG(intensity) as avg_intensity
FROM pipeline.detected_spans_v2
WHERE business_id = $1
  AND created_at >= $2
GROUP BY primitive, valence
ORDER BY span_count DESC;

Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| OPENAI_API_KEY | Yes | For LLM classification |
| DATABASE_URL | Yes | PostgreSQL connection |

CLI Options

python run_classification_v2.py [OPTIONS]

Options:
  --business TEXT            Business name or pattern (required for classify/evaluate)
  --limit INT                Max reviews to process (default: 100)
  --dry-run                  Don't store results to database
  --evaluate BUSINESS        Evaluate existing classification quality
  --language-analysis        Analyze UNMAPPED rates by language across all data
  --ignore-legacy-language   Exclude rows with language='auto'/'unknown'/NULL
  --latest-hours INT         Only include spans from last N hours
  --use-existing             Use existing spans instead of jobs
  --use-llm                  Use real LLM classification (requires OPENAI_API_KEY)
  --model TEXT               Model for LLM (default: gpt-4o-mini)

Models

| Model | Cost | Use Case |
|---|---|---|
| gpt-4o-mini | Low | Default, good balance |
| gpt-4o | High | Complex reviews, higher accuracy |

Evaluation

The classifier includes built-in evaluation to measure quality:

# Evaluate classification quality for a business
python scripts/run_classification_v2.py --evaluate "Go Karts Mar Menor"

# Output includes:
# - UNMAPPED rate (target: < 10%)
# - UNMAPPED rate by language
# - Top primitives distribution
# - Contradiction detection (positive text + negative valence)
# - Confidence distribution

Quality Metrics

| Metric | Target | Description |
|---|---|---|
| UNMAPPED rate | < 10% | Content spans that couldn't be classified |
| NON_INFORMATIVE rate | < 30% | Reviews with no actionable content |
| Avg confidence | > 0.7 | Average classifier confidence |
| Contradictions | < 5% | Valence mismatches (e.g., "great" → negative) |
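
The targets above can be checked mechanically once the metrics have been computed. The sketch below assumes the metrics arrive as fractions in a dictionary. The key names and the check_targets function are hypothetical and are not part of the evaluation tooling's documented output.

```python
# Targets from the Quality Metrics table, expressed as fractions.
TARGETS: dict[str, tuple[str, float]] = {
    "unmapped_rate": ("<", 0.10),
    "non_informative_rate": ("<", 0.30),
    "avg_confidence": (">", 0.70),
    "contradiction_rate": ("<", 0.05),
}


def check_targets(metrics: dict[str, float]) -> dict[str, bool]:
    """Return, for each metric, whether it meets its target."""
    results = {}
    for name, (op, target) in TARGETS.items():
        value = metrics[name]
        results[name] = value < target if op == "<" else value > target
    return results


run_metrics = {
    "unmapped_rate": 0.08,
    "non_informative_rate": 0.35,
    "avg_confidence": 0.74,
    "contradiction_rate": 0.02,
}
failing = [name for name, ok in check_targets(run_metrics).items() if not ok]
print(failing)
```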

Language Analysis

# Analyze UNMAPPED rates across all languages and sectors
python scripts/run_classification_v2.py --language-analysis

# Exclude legacy data (language is 'auto', 'unknown', or NULL)
python scripts/run_classification_v2.py --language-analysis --ignore-legacy-language

# Only spans from the last 24 hours
python scripts/run_classification_v2.py --language-analysis --latest-hours 24

Changelog

v2.0 (2026-01)

  • New primitives-based taxonomy (MANNER, SPEED, etc.)
  • Config resolution from GBP category hierarchy
  • Sector-specific enabled primitives and weights
  • Language detection with per-language UNMAPPED tracking
  • Non-informative detection to skip LLM for junk content
  • run_id for tracking classification runs
  • Evaluation tooling built-in

v1.0 (Legacy)

  • URT code-based classification (J1.01, O1.01)
  • Stored in review_spans table
  • Part of original pipeline package