whyrating-engine-legacy/.artifacts/LLM-Classification-Contract-v1.md
Alejandro Gutiérrez 46cd54e275 Add LLM Classification Contract v1.0
Defines prompt, output schema, and validation rules for span-level
URT classification:

- System prompt with span extraction rules
- JSON schema for structured output
- 4 few-shot examples (multi-span, temporal, comparative)
- Structural and semantic validation rules
- Error handling with retry + fallback
- Performance considerations (token budget, batching, caching)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:07:31 +00:00


LLM Classification Contract v1.0

Purpose: Define the prompt, output schema, and validation rules for span-level URT classification.

Target Model: Claude 3.5 Sonnet / GPT-4o (structured output mode)

Date: 2026-01-24


1. Overview

The LLM receives a single review text and returns an array of spans — semantically distinct units of feedback. Each span is independently classified using URT v5.1.

Pipeline position:

reviews_raw.text → LLM → spans[] → review_spans table

2. System Prompt

You are a review classification system using URT (Universal Review Taxonomy) v5.1.

Your task is to extract semantic spans from customer reviews and classify each span independently.

## SPAN EXTRACTION RULES

1. **Split on contrasting conjunctions**: but, however, although, despite, yet, though
2. **Split on topic/target change**: food → service → bathroom = 3 spans
3. **Split on valence change**: positive → negative = split
4. **Split on domain change**: O (Offering) → J (Journey) → E (Environment) = split
5. **Keep together**: cause→effect within same feedback unit ("X because Y" = 1 span)

**Guardrails**:
- Max 3 spans per sentence (if 4+, re-check for over-splitting)
- Min 1 span per review (even single-word reviews)
- Spans must be non-overlapping and cover meaningful content

## URT DOMAINS (Tier-3 codes: X#.##)

| Domain | Code | Description |
|--------|------|-------------|
| Offering | O1-O4 | Product/service quality, features, variety |
| Price | P1-P4 | Value, pricing, promotions, payment |
| Journey | J1-J4 | Timing, process, convenience, accessibility |
| Environment | E1-E4 | Physical space, ambiance, cleanliness, digital UX |
| Attitude | A1-A4 | Staff behavior, helpfulness, professionalism |
| Voice | V1-V4 | Brand, communication, marketing, transparency |
| Relationship | R1-R4 | Loyalty, trust, consistency, personalization |

## DIMENSION CODES

### Valence
- V+ : Positive sentiment
- V- : Negative sentiment
- V0 : Neutral/factual
- V± : Mixed within the span

### Intensity
- I1 : Low ("okay", "fine", "decent")
- I2 : Moderate ("good", "bad", "slow")
- I3 : High ("amazing", "terrible", "unacceptable")

### Specificity
- S1 : Vague ("it was bad")
- S2 : Some detail ("the food was cold")
- S3 : Precise ("waited 45 minutes for appetizers")

### Actionability
- A1 : No clear action possible
- A2 : Possible actions, unclear which
- A3 : Clear, specific action ("train staff on X", "fix Y")

### Temporal
- TC : Current visit (default when no markers)
- TR : Recent pattern ("lately", "recently", "again")
- TH : Historical ("for years", "always", "used to")
- TF : Future ("won't return", "next time", "I expect")

### Evidence
- ES : Stated explicitly in text (default)
- EI : Inferred logically (not stated, but entailed)
- EC : Contextual (depends on surrounding text)

### Comparative
- CR-N : No comparison (default)
- CR-B : Better than alternatives
- CR-W : Worse than alternatives
- CR-S : Same as alternatives

## PRIMARY SPAN SELECTION

Mark exactly ONE span as is_primary=true using this order:
1. Highest intensity (I3 > I2 > I1)
2. Tie-break: negative over positive (V- > V± > V0 > V+)
3. Tie-break: earliest span_index

## USN (URT String Notation)

Generate a USN string for each span:

URT:S:{primary}[+{sec1}][+{sec2}]:{valence_sign}{intensity_num}:{S#}{A#}{temporal}.{evidence}.{CR_suffix}


Examples:
- `URT:S:J1.03:-2:22TC.ES.N` (J1.03, V-, I2, S2, A2, TC, ES, CR-N)
- `URT:S:P1.01+O2.03:+3:33TR.ES.B` (P1.01 primary, O2.03 secondary, V+, I3, S3, A3, TR, ES, CR-B)

Valence encoding: + for V+, - for V-, 0 for V0, ± for V±
CR suffix: N=CR-N, B=CR-B, W=CR-W, S=CR-S

## OUTPUT FORMAT

Return valid JSON matching the schema exactly. No markdown, no explanations.

3. Output JSON Schema

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "URT Span Extraction Response",
  "type": "object",
  "required": ["spans", "review_summary"],
  "additionalProperties": false,
  "properties": {
    "spans": {
      "type": "array",
      "minItems": 1,
      "maxItems": 15,
      "items": {
        "type": "object",
        "required": [
          "span_index",
          "span_text",
          "span_start",
          "span_end",
          "urt_primary",
          "urt_secondary",
          "valence",
          "intensity",
          "specificity",
          "actionability",
          "temporal",
          "evidence",
          "comparative",
          "is_primary",
          "usn"
        ],
        "additionalProperties": false,
        "properties": {
          "span_index": {
            "type": "integer",
            "minimum": 0,
            "description": "0-based position in review"
          },
          "span_text": {
            "type": "string",
            "minLength": 1,
            "description": "Exact text extracted from review"
          },
          "span_start": {
            "type": "integer",
            "minimum": 0,
            "description": "Character offset start (0-indexed)"
          },
          "span_end": {
            "type": "integer",
            "minimum": 1,
            "description": "Character offset end (exclusive)"
          },
          "urt_primary": {
            "type": "string",
            "pattern": "^[OPJEAVR][1-4]\\.[0-9]{2}$",
            "description": "Primary URT Tier-3 code"
          },
          "urt_secondary": {
            "type": "array",
            "maxItems": 2,
            "items": {
              "type": "string",
              "pattern": "^[OPJEAVR][1-4]\\.[0-9]{2}$"
            },
            "description": "Secondary codes (max 2, different domains preferred)"
          },
          "valence": {
            "type": "string",
            "enum": ["V+", "V-", "V0", "V±"]
          },
          "intensity": {
            "type": "string",
            "enum": ["I1", "I2", "I3"]
          },
          "specificity": {
            "type": "string",
            "enum": ["S1", "S2", "S3"]
          },
          "actionability": {
            "type": "string",
            "enum": ["A1", "A2", "A3"]
          },
          "temporal": {
            "type": "string",
            "enum": ["TC", "TR", "TH", "TF"]
          },
          "evidence": {
            "type": "string",
            "enum": ["ES", "EI", "EC"]
          },
          "comparative": {
            "type": "string",
            "enum": ["CR-N", "CR-B", "CR-W", "CR-S"]
          },
          "is_primary": {
            "type": "boolean",
            "description": "True for exactly one span per review"
          },
          "confidence": {
            "type": "string",
            "enum": ["high", "medium", "low"],
            "default": "medium"
          },
          "entity": {
            "type": ["string", "null"],
            "description": "Named entity if present (staff name, product, location)"
          },
          "entity_type": {
            "type": ["string", "null"],
            "enum": ["location", "staff", "product", "process", "time", "other", null]
          },
          "relation_type": {
            "type": ["string", "null"],
            "enum": ["cause_of", "effect_of", "contrast", "resolution", null],
            "description": "Relationship to another span in this review"
          },
          "related_span_index": {
            "type": ["integer", "null"],
            "minimum": 0,
            "description": "Index of related span (must be different from this span)"
          },
          "usn": {
            "type": "string",
            "pattern": "^URT:S:[OPJEAVR][1-4]\\.[0-9]{2}",
            "description": "URT String Notation for audit"
          }
        }
      }
    },
    "review_summary": {
      "type": "object",
      "required": ["dominant_valence", "dominant_domain", "span_count"],
      "properties": {
        "dominant_valence": {
          "type": "string",
          "enum": ["V+", "V-", "V0", "V±"]
        },
        "dominant_domain": {
          "type": "string",
          "pattern": "^[OPJEAVR]$"
        },
        "span_count": {
          "type": "integer",
          "minimum": 1
        },
        "has_comparative": {
          "type": "boolean"
        },
        "has_entity": {
          "type": "boolean"
        }
      }
    }
  }
}
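
The schema's `usn` pattern only checks the `URT:S:{code}` prefix. As a minimal sketch, assuming the USN template and suffix mappings from the system prompt in Section 2, the string can be rebuilt from the other span fields and compared with what the model returned. `build_usn` and `USN_FULL_RE` are hypothetical names, and the full-string regex is an illustration rather than part of this contract.

```python
import re

# Mappings copied from the "Valence encoding" and "CR suffix" lines of the prompt.
VALENCE_SIGN = {"V+": "+", "V-": "-", "V0": "0", "V±": "±"}
CR_SUFFIX = {"CR-N": "N", "CR-B": "B", "CR-W": "W", "CR-S": "S"}

_CODE = r"[OPJEAVR][1-4]\.[0-9]{2}"
# Hypothetical full-string pattern derived from the USN template (assumption).
USN_FULL_RE = re.compile(
    rf"^URT:S:{_CODE}(\+{_CODE}){{0,2}}:[+\-0±][1-3]:[1-3][1-3]T[CRHF]\.E[SIC]\.[NBWS]$"
)


def build_usn(span: dict) -> str:
    """Rebuild the USN string from a span's classified fields."""
    codes = "+".join([span["urt_primary"], *span["urt_secondary"]])
    return (
        f"URT:S:{codes}"
        f":{VALENCE_SIGN[span['valence']]}{span['intensity'][1]}"
        f":{span['specificity'][1]}{span['actionability'][1]}{span['temporal']}"
        f".{span['evidence']}.{CR_SUFFIX[span['comparative']]}"
    )
```

A validator could flag a span when `build_usn(span) != span["usn"]`, which catches a USN that drifted from the structured fields.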

4. Few-Shot Examples

Example 1: Multi-span review (3 spans)

Input:

{
  "review_text": "The food was incredible, best steak I've ever had. But we waited 45 minutes for a table even with a reservation. The manager comped our drinks which was nice.",
  "profile": "standard"
}

Output:

{
  "spans": [
    {
      "span_index": 0,
      "span_text": "The food was incredible, best steak I've ever had.",
      "span_start": 0,
      "span_end": 50,
      "urt_primary": "O1.01",
      "urt_secondary": [],
      "valence": "V+",
      "intensity": "I3",
      "specificity": "S2",
      "actionability": "A1",
      "temporal": "TC",
      "evidence": "ES",
      "comparative": "CR-B",
      "is_primary": false,
      "confidence": "high",
      "entity": "steak",
      "entity_type": "product",
      "relation_type": null,
      "related_span_index": null,
      "usn": "URT:S:O1.01:+3:21TC.ES.B"
    },
    {
      "span_index": 1,
      "span_text": "But we waited 45 minutes for a table even with a reservation.",
      "span_start": 51,
      "span_end": 112,
      "urt_primary": "J1.02",
      "urt_secondary": [],
      "valence": "V-",
      "intensity": "I3",
      "specificity": "S3",
      "actionability": "A3",
      "temporal": "TC",
      "evidence": "ES",
      "comparative": "CR-N",
      "is_primary": true,
      "confidence": "high",
      "entity": null,
      "entity_type": null,
      "relation_type": null,
      "related_span_index": null,
      "usn": "URT:S:J1.02:-3:33TC.ES.N"
    },
    {
      "span_index": 2,
      "span_text": "The manager comped our drinks which was nice.",
      "span_start": 113,
      "span_end": 158,
      "urt_primary": "A2.01",
      "urt_secondary": [],
      "valence": "V+",
      "intensity": "I2",
      "specificity": "S2",
      "actionability": "A1",
      "temporal": "TC",
      "evidence": "ES",
      "comparative": "CR-N",
      "is_primary": false,
      "confidence": "high",
      "entity": "manager",
      "entity_type": "staff",
      "relation_type": "resolution",
      "related_span_index": 1,
      "usn": "URT:S:A2.01:+2:21TC.ES.N"
    }
  ],
  "review_summary": {
    "dominant_valence": "V±",
    "dominant_domain": "J",
    "span_count": 3,
    "has_comparative": true,
    "has_entity": true
  }
}

Why this classification:

  • Span 0: Food quality (O1.01), superlative language (I3), "best ever" is comparative (CR-B)
  • Span 1: Wait time (J1.02), specific time (S3), actionable (A3), negative + I3 = primary
  • Span 2: Staff action (A2.01), resolution of span 1's problem

Example 2: Single-span review

Input:

{
  "review_text": "Great place!",
  "profile": "standard"
}

Output:

{
  "spans": [
    {
      "span_index": 0,
      "span_text": "Great place!",
      "span_start": 0,
      "span_end": 12,
      "urt_primary": "O1.01",
      "urt_secondary": [],
      "valence": "V+",
      "intensity": "I2",
      "specificity": "S1",
      "actionability": "A1",
      "temporal": "TC",
      "evidence": "ES",
      "comparative": "CR-N",
      "is_primary": true,
      "confidence": "medium",
      "entity": null,
      "entity_type": null,
      "relation_type": null,
      "related_span_index": null,
      "usn": "URT:S:O1.01:+2:11TC.ES.N"
    }
  ],
  "review_summary": {
    "dominant_valence": "V+",
    "dominant_domain": "O",
    "span_count": 1,
    "has_comparative": false,
    "has_entity": false
  }
}

Why this classification:

  • Vague positive (S1, A1), default to O1.01 for general positive
  • Lower confidence due to lack of specificity

Example 3: Historical + future temporal markers

Input:

{
  "review_text": "I've been coming here for years and the quality has always been consistent. Won't be back after today though - they've really gone downhill.",
  "profile": "standard"
}

Output:

{
  "spans": [
    {
      "span_index": 0,
      "span_text": "I've been coming here for years and the quality has always been consistent.",
      "span_start": 0,
      "span_end": 75,
      "urt_primary": "R1.01",
      "urt_secondary": ["O1.01"],
      "valence": "V+",
      "intensity": "I2",
      "specificity": "S2",
      "actionability": "A1",
      "temporal": "TH",
      "evidence": "ES",
      "comparative": "CR-N",
      "is_primary": false,
      "confidence": "high",
      "entity": null,
      "entity_type": null,
      "relation_type": "contrast",
      "related_span_index": 1,
      "usn": "URT:S:R1.01+O1.01:+2:21TH.ES.N"
    },
    {
      "span_index": 1,
      "span_text": "Won't be back after today though - they've really gone downhill.",
      "span_start": 76,
      "span_end": 140,
      "urt_primary": "R1.02",
      "urt_secondary": [],
      "valence": "V-",
      "intensity": "I3",
      "specificity": "S1",
      "actionability": "A1",
      "temporal": "TF",
      "evidence": "ES",
      "comparative": "CR-N",
      "is_primary": true,
      "confidence": "high",
      "entity": null,
      "entity_type": null,
      "relation_type": "contrast",
      "related_span_index": 0,
      "usn": "URT:S:R1.02:-3:11TF.ES.N"
    }
  ],
  "review_summary": {
    "dominant_valence": "V-",
    "dominant_domain": "R",
    "span_count": 2,
    "has_comparative": false,
    "has_entity": false
  }
}

Why this classification:

  • Span 0: Historical loyalty (TH), secondary O1.01 for quality mention
  • Span 1: Future intent (TF), I3 due to "really" + finality of "won't be back"
  • Mutual contrast relationship

Example 4: Comparative review

Input:

{
  "review_text": "Way better than the other coffee shops in the area. Their lattes are smoother and the prices are actually reasonable.",
  "profile": "standard"
}

Output:

{
  "spans": [
    {
      "span_index": 0,
      "span_text": "Way better than the other coffee shops in the area.",
      "span_start": 0,
      "span_end": 51,
      "urt_primary": "O1.01",
      "urt_secondary": [],
      "valence": "V+",
      "intensity": "I3",
      "specificity": "S1",
      "actionability": "A1",
      "temporal": "TC",
      "evidence": "ES",
      "comparative": "CR-B",
      "is_primary": true,
      "confidence": "high",
      "entity": null,
      "entity_type": null,
      "relation_type": null,
      "related_span_index": null,
      "usn": "URT:S:O1.01:+3:11TC.ES.B"
    },
    {
      "span_index": 1,
      "span_text": "Their lattes are smoother",
      "span_start": 52,
      "span_end": 77,
      "urt_primary": "O1.02",
      "urt_secondary": [],
      "valence": "V+",
      "intensity": "I2",
      "specificity": "S2",
      "actionability": "A1",
      "temporal": "TC",
      "evidence": "ES",
      "comparative": "CR-B",
      "is_primary": false,
      "confidence": "high",
      "entity": "lattes",
      "entity_type": "product",
      "relation_type": null,
      "related_span_index": null,
      "usn": "URT:S:O1.02:+2:21TC.ES.B"
    },
    {
      "span_index": 2,
      "span_text": "and the prices are actually reasonable.",
      "span_start": 78,
      "span_end": 117,
      "urt_primary": "P1.01",
      "urt_secondary": [],
      "valence": "V+",
      "intensity": "I2",
      "specificity": "S2",
      "actionability": "A1",
      "temporal": "TC",
      "evidence": "ES",
      "comparative": "CR-B",
      "is_primary": false,
      "confidence": "high",
      "entity": null,
      "entity_type": null,
      "relation_type": null,
      "related_span_index": null,
      "usn": "URT:S:P1.01:+2:21TC.ES.B"
    }
  ],
  "review_summary": {
    "dominant_valence": "V+",
    "dominant_domain": "O",
    "span_count": 3,
    "has_comparative": true,
    "has_entity": true
  }
}

5. Validation Rules

5.1 Structural Validation (pre-insert)

| Rule | Check | Error |
|------|-------|-------|
| Span count | `1 <= spans.length <= 15` | `INVALID_SPAN_COUNT` |
| Exactly one primary | `spans.filter(s => s.is_primary).length === 1` | `INVALID_PRIMARY_COUNT` |
| Contiguous indices | `spans[i].span_index === i` for all `i` | `NON_CONTIGUOUS_INDEX` |
| Non-overlapping | `spans[i].span_end <= spans[i+1].span_start` | `OVERLAPPING_SPANS` |
| Valid offsets | `span_end > span_start && span_start >= 0` | `INVALID_OFFSETS` |
| Text matches | `review_text.slice(span_start, span_end) ~= span_text` | `TEXT_MISMATCH` |
| USN format | Matches regex for profile | `INVALID_USN` |
| Self-reference | `related_span_index !== span_index` | `SELF_REFERENCE` |
| Related exists | `related_span_index < spans.length` | `INVALID_RELATION` |
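
The structural checks above can be sketched as a single pass over the spans. This is an illustration under stated assumptions: `structural_errors` is a hypothetical name, the text comparison uses the whitespace normalization from Section 5.3, and the USN check is left out because the per-profile regex is not spelled out in this contract.

```python
def structural_errors(review_text: str, spans: list[dict]) -> list[str]:
    """Return the Section 5.1 error codes that apply (empty list means valid)."""
    errors = []
    if not 1 <= len(spans) <= 15:
        errors.append("INVALID_SPAN_COUNT")
    if sum(1 for s in spans if s["is_primary"]) != 1:
        errors.append("INVALID_PRIMARY_COUNT")
    if any(s["span_index"] != i for i, s in enumerate(spans)):
        errors.append("NON_CONTIGUOUS_INDEX")
    # Each span must end at or before the start of the next one.
    if any(a["span_end"] > b["span_start"] for a, b in zip(spans, spans[1:])):
        errors.append("OVERLAPPING_SPANS")
    for s in spans:
        if not 0 <= s["span_start"] < s["span_end"]:
            errors.append("INVALID_OFFSETS")
        else:
            expected = review_text[s["span_start"]:s["span_end"]]
            # Whitespace-normalized comparison, case-sensitive (Section 5.3).
            if " ".join(expected.split()) != " ".join(s["span_text"].split()):
                errors.append("TEXT_MISMATCH")
        related = s.get("related_span_index")
        if related is not None:
            if related == s["span_index"]:
                errors.append("SELF_REFERENCE")
            elif not 0 <= related < len(spans):
                errors.append("INVALID_RELATION")
    return errors
```

Returning codes instead of raising lets the caller feed every failure into a single retry prompt.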

5.2 Semantic Validation (warnings, not errors)

| Rule | Check | Warning |
|------|-------|---------|
| Secondary domain | Secondary codes should differ from primary domain | `SAME_DOMAIN_SECONDARY` |
| Over-splitting | More than 3 spans per sentence | `POSSIBLE_OVERSPLIT` |
| Intensity/valence match | I3 + V0 is unusual | `UNUSUAL_INTENSITY_VALENCE` |
| Specificity/actionability | S1 + A3 is rare | `UNUSUAL_SPEC_ACTION` |
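
A rough sketch of how these warnings might be computed. `semantic_warnings` is a hypothetical name, and the sentence segmentation here is a naive assumption (a sentence ends at any run of `.`, `!` or `?`); a production pipeline would likely use a proper sentence splitter.

```python
import re
from bisect import bisect_right
from collections import Counter


def semantic_warnings(review_text: str, spans: list[dict]) -> list[tuple]:
    """Return (warning_code, span_index) pairs; span_index is None for review-level warnings."""
    warnings = []
    # Naive sentence boundaries: the offset just past each run of ., ! or ?
    ends = [m.end() for m in re.finditer(r"[.!?]+", review_text)]
    # A span belongs to the sentence in which it starts.
    per_sentence = Counter(bisect_right(ends, s["span_start"]) for s in spans)
    if any(count > 3 for count in per_sentence.values()):
        warnings.append(("POSSIBLE_OVERSPLIT", None))
    for s in spans:
        idx = s["span_index"]
        primary_domain = s["urt_primary"][0]
        if any(code[0] == primary_domain for code in s["urt_secondary"]):
            warnings.append(("SAME_DOMAIN_SECONDARY", idx))
        if s["intensity"] == "I3" and s["valence"] == "V0":
            warnings.append(("UNUSUAL_INTENSITY_VALENCE", idx))
        if s["specificity"] == "S1" and s["actionability"] == "A3":
            warnings.append(("UNUSUAL_SPEC_ACTION", idx))
    return warnings
```

Because these are warnings, the spans can still be inserted; the pairs are meant for logging or review sampling.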

5.3 Text Matching Rules

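Compare the offset slice with span_text after applying only the following normalization:

  • Whitespace collapse: runs of whitespace → single space
  • Trim: leading/trailing whitespace is ignored
  • Case: must match exactly (no case normalization)

def text_matches(review_text: str, span: dict) -> bool:
    """Return True if span_text equals the offset slice, ignoring whitespace differences."""
    expected = review_text[span['span_start']:span['span_end']]
    actual = span['span_text']

    # str.split() with no arguments drops leading/trailing whitespace and
    # splits on any whitespace run, so re-joining collapses and trims at once.
    expected_norm = ' '.join(expected.split())
    actual_norm = ' '.join(actual.split())

    # Comparison stays case-sensitive.
    return expected_norm == actual_norm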

6. Error Handling

6.1 Retry Strategy

| Error Type | Action |
|------------|--------|
| JSON parse error | Retry with "Return ONLY valid JSON" appended |
| Schema validation error | Retry with specific field errors in prompt |
| Offset mismatch | Retry with "Offsets must match exactly" warning |
| No primary span | Auto-select using primary selection rules |
| Multiple primary spans | Keep first by selection rules, unset others |
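
The last two rows repair the response locally instead of retrying. A minimal sketch, assuming the ordering from the PRIMARY SPAN SELECTION section of the prompt (intensity, then valence, then earliest index); `select_primary_index` and `repair_primary` are hypothetical names. It reads "keep first by selection rules" as: among the spans flagged primary, keep the one that ranks first.

```python
# Lower rank wins: I3 > I2 > I1, then V- > V± > V0 > V+.
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary_index(spans: list[dict]) -> int:
    """Pick the primary span: highest intensity, then most negative, then earliest."""
    best = min(
        spans,
        key=lambda s: (
            INTENSITY_RANK[s["intensity"]],
            VALENCE_RANK[s["valence"]],
            s["span_index"],
        ),
    )
    return best["span_index"]


def repair_primary(spans: list[dict]) -> list[dict]:
    """Return spans with exactly one is_primary=True, changing nothing if already valid."""
    flagged = [s for s in spans if s["is_primary"]]
    if len(flagged) == 1:
        return spans
    # No primary: choose among all spans. Several: choose among the flagged ones.
    winner = select_primary_index(flagged or spans)
    return [{**s, "is_primary": s["span_index"] == winner} for s in spans]
```

Applied to Example 1 with all flags cleared, spans 0 and 1 tie on I3 and the V- span (index 1) wins, which matches the example output.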

6.2 Fallback Behavior

If the LLM still fails after 3 retries, return a minimal single-span response that covers the whole review and is marked low confidence:

def fallback_single_span(review_text: str) -> dict:
    """Create minimal valid response for failed classification."""
    return {
        "spans": [{
            "span_index": 0,
            "span_text": review_text,
            "span_start": 0,
            "span_end": len(review_text),
            "urt_primary": "O1.01",  # Default: general offering
            "urt_secondary": [],
            "valence": "V0",  # Neutral - we don't know
            "intensity": "I1",
            "specificity": "S1",
            "actionability": "A1",
            "temporal": "TC",
            "evidence": "ES",
            "comparative": "CR-N",
            "is_primary": True,
            "confidence": "low",
            "entity": None,
            "entity_type": None,
            "relation_type": None,
            "related_span_index": None,
            "usn": "URT:S:O1.01:01:11TC.ES.N"
        }],
        "review_summary": {
            "dominant_valence": "V0",
            "dominant_domain": "O",
            "span_count": 1,
            "has_comparative": False,
            "has_entity": False
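        },
        # Pipeline-internal markers. They are not in the Section 3 schema
        # (top level sets additionalProperties: false), so strip them before
        # validating this dict against that schema.
        "_fallback": True,
        "_error": "Classification failed after 3 retries"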
    }
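
The retry table and the fallback can be tied together in one driver loop. The sketch below is an assumption about how they compose, not a prescribed implementation: `call_llm`, `validate`, and `fallback` are hypothetical callables supplied by the caller (for example, `fallback_single_span` above), and "3 retries" is read as one initial attempt plus three more. Mapping `TEXT_MISMATCH` and `INVALID_OFFSETS` to the offset warning is also an assumption.

```python
import json
from typing import Callable

MAX_RETRIES = 3


def classify_with_retry(
    review_text: str,
    call_llm: Callable[[str, str], str],           # (review_text, extra_hint) -> raw model output
    validate: Callable[[str, dict], list],         # (review_text, response) -> error codes
    fallback: Callable[[str], dict],               # review_text -> minimal valid response
) -> dict:
    """Call the model, retrying with a corrective hint, then fall back."""
    hint = ""
    for _attempt in range(1 + MAX_RETRIES):
        raw = call_llm(review_text, hint)
        try:
            response = json.loads(raw)
        except json.JSONDecodeError:
            hint = "Return ONLY valid JSON"
            continue
        errors = validate(review_text, response)
        if not errors:
            return response
        # Feed the specific failures back into the next attempt.
        hint = "Fix these validation errors: " + ", ".join(errors)
        if "TEXT_MISMATCH" in errors or "INVALID_OFFSETS" in errors:
            hint += ". Offsets must match exactly"
    return fallback(review_text)
```

Passing the three callables in keeps the loop independent of any particular model client or validator.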

7. Performance Considerations

7.1 Prompt Token Budget

| Component | Tokens (approx) |
|-----------|-----------------|
| System prompt | ~800 |
| Schema | ~400 |
| 3 few-shot examples | ~1,200 |
| Average review input | ~100 |
| Total input | ~2,500 |
| Average output | ~300-800 |

7.2 Batching

For high-volume processing, consider:

  • Batch 5-10 short reviews per request
  • Use review_id field in input/output for correlation
  • Validate each review's spans independently
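
One way the batching steps could look in code. Everything here is illustrative: `make_batches` and `correlate` are hypothetical helpers, the 280-character cutoff for a "short" review is an arbitrary assumption, and each review is assumed to be a dict with `review_id` and `review_text` keys.

```python
def make_batches(reviews: list[dict], batch_size: int = 10, short_limit: int = 280) -> list[list[dict]]:
    """Group short reviews into batches of up to batch_size; long reviews go alone."""
    batches, current = [], []
    for review in reviews:
        if len(review["review_text"]) > short_limit:
            batches.append([review])  # long reviews are sent on their own
            continue
        current.append(review)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


def correlate(batch: list[dict], outputs: list[dict]) -> dict:
    """Map each input review_id to its output, or None if the model dropped it."""
    by_id = {o["review_id"]: o for o in outputs if "review_id" in o}
    return {r["review_id"]: by_id.get(r["review_id"]) for r in batch}
```

Reviews that come back as `None` from `correlate` can be retried individually, and each returned entry is then validated on its own so one bad review does not fail the whole batch.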

7.3 Caching

Cache key: sha256(review_text + model_version + prompt_version)

Invalidate on:

  • Model version change
  • Prompt version change
  • URT code taxonomy change
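
A small sketch of the cache key using the standard-library `hashlib`. It departs from the plain concatenation shown above in two labeled ways, both assumptions: each part is length-prefixed so that different splits of the same characters cannot collide, and a `taxonomy_version` argument (hypothetical, defaulting to "5.1") is folded into the key so that a taxonomy change invalidates entries automatically.

```python
import hashlib


def cache_key(
    review_text: str,
    model_version: str,
    prompt_version: str,
    taxonomy_version: str = "5.1",  # assumption: not part of the key in the contract
) -> str:
    """Return a hex SHA-256 key over the review text and version identifiers."""
    digest = hashlib.sha256()
    for part in (review_text, model_version, prompt_version, taxonomy_version):
        encoded = part.encode("utf-8")
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(len(encoded).to_bytes(4, "big"))
        digest.update(encoded)
    return digest.hexdigest()
```

With versions inside the key, stale entries are never read after an upgrade and can be evicted lazily.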

8. Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-01-24 | Initial contract for URT-Standard profile |

9. Future Extensions (v2.0)

  • Full profile support: Add causal_chain to output schema
  • Confidence calibration: Train confidence based on validation results
  • Entity linking: Link entities across reviews for trend detection
  • Multi-language: Add language detection and localized prompts

End of LLM Classification Contract v1.0