whyrating-engine-legacy/.artifacts/LLM-Classification-Contract-v1.md
Alejandro Gutiérrez 43fd1515d2 Align artifacts with canonical URT v5.1 specification
Fixes inconsistencies discovered during audit against urt-taxonomy/:

- urt_profile ENUM: Add 'lite' and 'core' profiles (was missing)
- USN format: Use canonical regex from spec (was non-compliant)
- USN valence encoding: Add V0 (0) and V± (±) support
- USN grammar: Add Lite (URT:L:) and Core (URT:C:) formats
- Dimension codes: Fix temporal (TC/TR/TH/TF), evidence (ES/EI/EC),
  comparative (CR-N/CR-B/CR-W/CR-S) in decisions doc
- LLM contract: Full USN regex validation pattern

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:21:21 +00:00


# LLM Classification Contract v1.0
**Purpose**: Define the prompt, output schema, and validation rules for span-level URT classification.

**Target Model**: Claude 3.5 Sonnet / GPT-4o (structured output mode)

**Date**: 2026-01-24

---
## 1. Overview
The LLM receives a single review text and returns an array of **spans** — semantically distinct units of feedback. Each span is independently classified using URT v5.1.
**Pipeline position**:
```
reviews_raw.text → LLM → spans[] → review_spans table
```
---
## 2. System Prompt
```
You are a review classification system using URT (Universal Review Taxonomy) v5.1.
Your task is to extract semantic spans from customer reviews and classify each span independently.
## SPAN EXTRACTION RULES
1. **Split on contrasting conjunctions**: but, however, although, despite, yet, though
2. **Split on topic/target change**: food → service → bathroom = 3 spans
3. **Split on valence change**: positive → negative = split
4. **Split on domain change**: O (Offering) → J (Journey) → E (Environment) = split
5. **Keep together**: cause→effect within same feedback unit ("X because Y" = 1 span)
**Guardrails**:
- Max 3 spans per sentence (if 4+, re-check for over-splitting)
- Min 1 span per review (even single-word reviews)
- Spans must be non-overlapping and cover meaningful content
## URT DOMAINS (Tier-3 codes: X#.##)
| Domain | Code | Description |
|--------|------|-------------|
| Offering | O1-O4 | Product/service quality, features, variety |
| Price | P1-P4 | Value, pricing, promotions, payment |
| Journey | J1-J4 | Timing, process, convenience, accessibility |
| Environment | E1-E4 | Physical space, ambiance, cleanliness, digital UX |
| Attitude | A1-A4 | Staff behavior, helpfulness, professionalism |
| Voice | V1-V4 | Brand, communication, marketing, transparency |
| Relationship | R1-R4 | Loyalty, trust, consistency, personalization |
## DIMENSION CODES
### Valence
- V+ : Positive sentiment
- V- : Negative sentiment
- V0 : Neutral/factual
- V± : Mixed within the span
### Intensity
- I1 : Low ("okay", "fine", "decent")
- I2 : Moderate ("good", "bad", "slow")
- I3 : High ("amazing", "terrible", "unacceptable")
### Specificity
- S1 : Vague ("it was bad")
- S2 : Some detail ("the food was cold")
- S3 : Precise ("waited 45 minutes for appetizers")
### Actionability
- A1 : No clear action possible
- A2 : Possible actions, unclear which
- A3 : Clear, specific action ("train staff on X", "fix Y")
### Temporal
- TC : Current visit (default when no markers)
- TR : Recent pattern ("lately", "recently", "again")
- TH : Historical ("for years", "always", "used to")
- TF : Future ("won't return", "next time", "I expect")
### Evidence
- ES : Stated explicitly in text (default)
- EI : Inferred logically (not stated, but entailed)
- EC : Contextual (depends on surrounding text)
### Comparative
- CR-N : No comparison (default)
- CR-B : Better than alternatives
- CR-W : Worse than alternatives
- CR-S : Same as alternatives
## PRIMARY SPAN SELECTION
Mark exactly ONE span as is_primary=true using this order:
1. Highest intensity (I3 > I2 > I1)
2. Tie-break: negative over positive (V- > V± > V0 > V+)
3. Tie-break: earliest span_index
## USN (URT String Notation)
Generate a USN string for each span:
~~~
URT:S:{primary}[+{secondary_1}][+{secondary_2}]:{valence_sign}{intensity_digit}:{specificity_digit}{actionability_digit}{temporal}.{evidence}.{cr_suffix}
~~~
Examples:
- `URT:S:J1.03:-2:22TC.ES.N` (J1.03, V-, I2, S2, A2, TC, ES, CR-N)
- `URT:S:P1.01+O2.03:+3:33TR.ES.B` (P1.01 primary, O2.03 secondary, V+, I3, S3, A3, TR, ES, CR-B)
Valence encoding: + for V+, - for V-, 0 for V0, ± for V±
CR suffix: N=CR-N, B=CR-B, W=CR-W, S=CR-S
## OUTPUT FORMAT
Return valid JSON matching the schema exactly. No markdown, no explanations.
```
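
Every part of a USN is determined by the span's structured fields, so the string can be generated (or regenerated for comparison) in code instead of being trusted from the model. The following is a minimal Python sketch, assuming span dicts shaped like the section 3 schema. `build_usn`, `VALENCE_SIGN`, and `USN_STANDARD` are hypothetical names, and only the Standard (`URT:S:`) profile is covered.

```python
import re

# Standard-profile pattern, copied from the schema's "usn" field with the
# JSON string escaping removed.
USN_STANDARD = re.compile(
    r"^URT:S:[OPJEAVR][1-4]\.[0-9]{2}(\+[OPJEAVR][1-4]\.[0-9]{2}){0,2}"
    r":[+\-0±][123]:[1-3][1-3]T[CRHF]\.E[SIC]\.[NBWS]$"
)

# Valence enum value -> single-character USN sign.
VALENCE_SIGN = {"V+": "+", "V-": "-", "V0": "0", "V±": "±"}


def build_usn(span: dict) -> str:
    """Compose a Standard-profile USN from a span's structured fields."""
    codes = "+".join([span["urt_primary"], *span["urt_secondary"]])
    usn = (
        f"URT:S:{codes}"
        # Valence sign followed by the intensity digit, e.g. V- + I3 -> "-3".
        f":{VALENCE_SIGN[span['valence']]}{span['intensity'][1]}"
        # Specificity and actionability contribute only their digits: S2 + A1 -> "21".
        f":{span['specificity'][1]}{span['actionability'][1]}"
        # Temporal and evidence codes verbatim; CR-x contributes its letter x.
        f"{span['temporal']}.{span['evidence']}.{span['comparative'][-1]}"
    )
    if not USN_STANDARD.fullmatch(usn):
        raise ValueError(f"generated USN does not match the Standard pattern: {usn}")
    return usn
```

For the wait-time span in Example 1 (J1.02, V-, I3, S3, A3, TC, ES, CR-N), this yields `URT:S:J1.02:-3:33TC.ES.N`.
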
---
## 3. Output JSON Schema
```json
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "URT Span Extraction Response",
"type": "object",
"required": ["spans", "review_summary"],
"additionalProperties": false,
"properties": {
"spans": {
"type": "array",
"minItems": 1,
"maxItems": 15,
"items": {
"type": "object",
"required": [
"span_index",
"span_text",
"span_start",
"span_end",
"urt_primary",
"urt_secondary",
"valence",
"intensity",
"specificity",
"actionability",
"temporal",
"evidence",
"comparative",
"is_primary",
"usn"
],
"additionalProperties": false,
"properties": {
"span_index": {
"type": "integer",
"minimum": 0,
"description": "0-based position in review"
},
"span_text": {
"type": "string",
"minLength": 1,
"description": "Exact text extracted from review"
},
"span_start": {
"type": "integer",
"minimum": 0,
"description": "Character offset start (0-indexed)"
},
"span_end": {
"type": "integer",
"minimum": 1,
"description": "Character offset end (exclusive)"
},
"urt_primary": {
"type": "string",
"pattern": "^[OPJEAVR][1-4]\\.[0-9]{2}$",
"description": "Primary URT Tier-3 code"
},
"urt_secondary": {
"type": "array",
"maxItems": 2,
"items": {
"type": "string",
"pattern": "^[OPJEAVR][1-4]\\.[0-9]{2}$"
},
"description": "Secondary codes (max 2, different domains preferred)"
},
"valence": {
"type": "string",
"enum": ["V+", "V-", "V0", "V±"]
},
"intensity": {
"type": "string",
"enum": ["I1", "I2", "I3"]
},
"specificity": {
"type": "string",
"enum": ["S1", "S2", "S3"]
},
"actionability": {
"type": "string",
"enum": ["A1", "A2", "A3"]
},
"temporal": {
"type": "string",
"enum": ["TC", "TR", "TH", "TF"]
},
"evidence": {
"type": "string",
"enum": ["ES", "EI", "EC"]
},
"comparative": {
"type": "string",
"enum": ["CR-N", "CR-B", "CR-W", "CR-S"]
},
"is_primary": {
"type": "boolean",
"description": "True for exactly one span per review"
},
"confidence": {
"type": "string",
"enum": ["high", "medium", "low"],
"default": "medium"
},
"entity": {
"type": ["string", "null"],
"description": "Named entity if present (staff name, product, location)"
},
"entity_type": {
"type": ["string", "null"],
"enum": ["location", "staff", "product", "process", "time", "other", null]
},
"relation_type": {
"type": ["string", "null"],
"enum": ["cause_of", "effect_of", "contrast", "resolution", null],
"description": "Relationship to another span in this review"
},
"related_span_index": {
"type": ["integer", "null"],
"minimum": 0,
"description": "Index of related span (must be different from this span)"
},
"usn": {
"type": "string",
"pattern": "^URT:S:[OPJEAVR][1-4]\\.[0-9]{2}(\\+[OPJEAVR][1-4]\\.[0-9]{2}){0,2}:[+\\-0±][123]:[1-3][1-3]T[CRHF]\\.E[SIC]\\.[NBWS]$",
"description": "URT String Notation for audit (Standard profile)"
}
}
}
},
"review_summary": {
"type": "object",
"required": ["dominant_valence", "dominant_domain", "span_count"],
"properties": {
"dominant_valence": {
"type": "string",
"enum": ["V+", "V-", "V0", "V±"]
},
"dominant_domain": {
"type": "string",
"pattern": "^[OPJEAVR]$"
},
"span_count": {
"type": "integer",
"minimum": 1
},
"has_comparative": {
"type": "boolean"
},
"has_entity": {
"type": "boolean"
}
}
}
}
}
```
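
The schema validates `usn` and each dimension field independently. A response can therefore pass validation while its USN disagrees with the fields it summarizes, for example `"intensity": "I2"` alongside a USN containing `:+3:`. One possible consistency check is sketched below in Python. It assumes the structured fields are authoritative, and `usn_mismatches` is a hypothetical helper.

```python
import re

# Same grammar as the schema's "usn" pattern, with named groups for decoding.
USN_PARTS = re.compile(
    r"^URT:S:(?P<codes>[OPJEAVR][1-4]\.[0-9]{2}(?:\+[OPJEAVR][1-4]\.[0-9]{2}){0,2})"
    r":(?P<sign>[+\-0±])(?P<intensity>[123])"
    r":(?P<spec>[1-3])(?P<action>[1-3])(?P<temporal>T[CRHF])"
    r"\.(?P<evidence>E[SIC])\.(?P<cr>[NBWS])$"
)
SIGN_TO_VALENCE = {"+": "V+", "-": "V-", "0": "V0", "±": "V±"}


def usn_mismatches(span: dict) -> list[str]:
    """Return names of fields whose values disagree with the span's USN."""
    m = USN_PARTS.fullmatch(span["usn"])
    if m is None:
        return ["usn"]  # not a well-formed Standard-profile USN
    codes = m["codes"].split("+")
    decoded = {
        "urt_primary": codes[0],
        "urt_secondary": codes[1:],
        "valence": SIGN_TO_VALENCE[m["sign"]],
        "intensity": "I" + m["intensity"],
        "specificity": "S" + m["spec"],
        "actionability": "A" + m["action"],
        "temporal": m["temporal"],
        "evidence": m["evidence"],
        "comparative": "CR-" + m["cr"],
    }
    return [field for field, value in decoded.items() if span[field] != value]
```
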
---
## 4. Few-Shot Examples
### Example 1: Multi-span review (3 spans)
**Input**:
```json
{
"review_text": "The food was incredible, best steak I've ever had. But we waited 45 minutes for a table even with a reservation. The manager comped our drinks which was nice.",
"profile": "standard"
}
```
**Output**:
```json
{
"spans": [
{
"span_index": 0,
"span_text": "The food was incredible, best steak I've ever had.",
"span_start": 0,
"span_end": 50,
"urt_primary": "O1.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I3",
"specificity": "S2",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-B",
"is_primary": false,
"confidence": "high",
"entity": "steak",
"entity_type": "product",
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:O1.01:+3:21TC.ES.B"
},
{
"span_index": 1,
"span_text": "But we waited 45 minutes for a table even with a reservation.",
"span_start": 51,
"span_end": 112,
"urt_primary": "J1.02",
"urt_secondary": [],
"valence": "V-",
"intensity": "I3",
"specificity": "S3",
"actionability": "A3",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": true,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:J1.02:-3:33TC.ES.N"
},
{
"span_index": 2,
"span_text": "The manager comped our drinks which was nice.",
"span_start": 113,
"span_end": 158,
"urt_primary": "A2.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I2",
"specificity": "S2",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": false,
"confidence": "high",
"entity": "manager",
"entity_type": "staff",
"relation_type": "resolution",
"related_span_index": 1,
"usn": "URT:S:A2.01:+2:21TC.ES.N"
}
],
"review_summary": {
"dominant_valence": "V±",
"dominant_domain": "J",
"span_count": 3,
"has_comparative": true,
"has_entity": true
}
}
```
**Why this classification**:
- Span 0: Food quality (O1.01), superlative language (I3), "best ever" is comparative (CR-B)
- Span 1: Wait time (J1.02), specific time (S3), actionable (A3), negative + I3 = primary
- Span 2: Staff action (A2.01), resolution of span 1's problem
---
### Example 2: Single-span review
**Input**:
```json
{
"review_text": "Great place!",
"profile": "standard"
}
```
**Output**:
```json
{
"spans": [
{
"span_index": 0,
"span_text": "Great place!",
"span_start": 0,
"span_end": 12,
"urt_primary": "O1.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I2",
"specificity": "S1",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": true,
"confidence": "medium",
"entity": null,
"entity_type": null,
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:O1.01:+2:11TC.ES.N"
}
],
"review_summary": {
"dominant_valence": "V+",
"dominant_domain": "O",
"span_count": 1,
"has_comparative": false,
"has_entity": false
}
}
```
**Why this classification**:
- Vague positive (S1, A1); falls back to O1.01, the code used for general positive feedback
- Lower confidence due to lack of specificity
---
### Example 3: Historical + future temporal markers
**Input**:
```json
{
"review_text": "I've been coming here for years and the quality has always been consistent. Won't be back after today though - they've really gone downhill.",
"profile": "standard"
}
```
**Output**:
```json
{
"spans": [
{
"span_index": 0,
"span_text": "I've been coming here for years and the quality has always been consistent.",
"span_start": 0,
"span_end": 75,
"urt_primary": "R1.01",
"urt_secondary": ["O1.01"],
"valence": "V+",
"intensity": "I2",
"specificity": "S2",
"actionability": "A1",
"temporal": "TH",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": false,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": "contrast",
"related_span_index": 1,
"usn": "URT:S:R1.01+O1.01:+2:21TH.ES.N"
},
{
"span_index": 1,
"span_text": "Won't be back after today though - they've really gone downhill.",
"span_start": 76,
"span_end": 140,
"urt_primary": "R1.02",
"urt_secondary": [],
"valence": "V-",
"intensity": "I3",
"specificity": "S1",
"actionability": "A1",
"temporal": "TF",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": true,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": "contrast",
"related_span_index": 0,
"usn": "URT:S:R1.02:-3:11TF.ES.N"
}
],
"review_summary": {
"dominant_valence": "V-",
"dominant_domain": "R",
"span_count": 2,
"has_comparative": false,
"has_entity": false
}
}
```
**Why this classification**:
- Span 0: Historical loyalty (TH), secondary O1.01 for quality mention
- Span 1: Future intent (TF), I3 due to "really" + finality of "won't be back"
- Mutual contrast relationship
---
### Example 4: Comparative review
**Input**:
```json
{
"review_text": "Way better than the other coffee shops in the area. Their lattes are smoother and the prices are actually reasonable.",
"profile": "standard"
}
```
**Output**:
```json
{
"spans": [
{
"span_index": 0,
"span_text": "Way better than the other coffee shops in the area.",
"span_start": 0,
"span_end": 51,
"urt_primary": "O1.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I3",
"specificity": "S1",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-B",
"is_primary": true,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:O1.01:+3:11TC.ES.B"
},
{
"span_index": 1,
"span_text": "Their lattes are smoother",
"span_start": 52,
"span_end": 77,
"urt_primary": "O1.02",
"urt_secondary": [],
"valence": "V+",
"intensity": "I2",
"specificity": "S2",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-B",
"is_primary": false,
"confidence": "high",
"entity": "lattes",
"entity_type": "product",
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:O1.02:+2:21TC.ES.B"
},
{
"span_index": 2,
"span_text": "and the prices are actually reasonable.",
"span_start": 78,
"span_end": 117,
"urt_primary": "P1.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I2",
"specificity": "S2",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-B",
"is_primary": false,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:P1.01:+2:21TC.ES.B"
}
],
"review_summary": {
"dominant_valence": "V+",
"dominant_domain": "O",
"span_count": 3,
"has_comparative": true,
"has_entity": true
}
}
```
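
Most `review_summary` fields in the examples above follow mechanically from the spans, so they can be recomputed and compared with the model's values. The sketch below makes two choices the contract does not specify. It takes `dominant_domain` from the primary span's domain letter, which is an inference from Examples 1-4 because the contract states no rule. It also omits `dominant_valence`, because the examples do not pin down a rule for it. `derive_summary` is a hypothetical helper.

```python
def derive_summary(spans: list[dict]) -> dict:
    """Recompute the review_summary fields that follow directly from the spans."""
    primary = next(s for s in spans if s["is_primary"])
    return {
        # Assumption inferred from the examples: domain letter of the primary span.
        "dominant_domain": primary["urt_primary"][0],
        "span_count": len(spans),
        "has_comparative": any(s["comparative"] != "CR-N" for s in spans),
        "has_entity": any(s.get("entity") is not None for s in spans),
    }
```
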
---
## 5. Validation Rules
### 5.1 Structural Validation (pre-insert)
| Rule | Check | Error |
|------|-------|-------|
| Span count | `1 <= spans.length <= 15` | INVALID_SPAN_COUNT |
| Exactly one primary | `spans.filter(s => s.is_primary).length === 1` | INVALID_PRIMARY_COUNT |
| Contiguous indices | `spans[i].span_index === i` for all i | NON_CONTIGUOUS_INDEX |
| Non-overlapping | `spans[i].span_end <= spans[i+1].span_start` | OVERLAPPING_SPANS |
| Valid offsets | `span_end > span_start && span_start >= 0` | INVALID_OFFSETS |
| Text matches | `review_text.slice(span_start, span_end)` equals `span_text` after the normalization in 5.3 | TEXT_MISMATCH |
| USN format | Matches regex for profile | INVALID_USN |
| Self-reference | `related_span_index !== span_index` | SELF_REFERENCE |
| Related exists | `related_span_index < spans.length` | INVALID_RELATION |
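
The structural checks in the table above can be run together before insert. The following is a Python sketch of that table. It assumes a parsed response whose spans follow the section 3 schema and includes only the Standard-profile USN regex. `structural_errors` is a hypothetical name, and it returns each error code at most once.

```python
import re

USN_STANDARD = re.compile(
    r"^URT:S:[OPJEAVR][1-4]\.[0-9]{2}(\+[OPJEAVR][1-4]\.[0-9]{2}){0,2}"
    r":[+\-0±][123]:[1-3][1-3]T[CRHF]\.E[SIC]\.[NBWS]$"
)


def _norm(text: str) -> str:
    # Whitespace collapse + trim as in 5.3; case is preserved.
    return " ".join(text.split())


def structural_errors(review_text: str, spans: list[dict]) -> list[str]:
    """Return the distinct 5.1 error codes raised by a response's spans."""
    errors: list[str] = []
    if not 1 <= len(spans) <= 15:
        errors.append("INVALID_SPAN_COUNT")
    if sum(1 for s in spans if s["is_primary"]) != 1:
        errors.append("INVALID_PRIMARY_COUNT")
    if any(s["span_index"] != i for i, s in enumerate(spans)):
        errors.append("NON_CONTIGUOUS_INDEX")
    if any(a["span_end"] > b["span_start"] for a, b in zip(spans, spans[1:])):
        errors.append("OVERLAPPING_SPANS")
    for s in spans:
        start, end = s["span_start"], s["span_end"]
        if not (end > start and start >= 0):
            errors.append("INVALID_OFFSETS")
        elif _norm(review_text[start:end]) != _norm(s["span_text"]):
            errors.append("TEXT_MISMATCH")
        if not USN_STANDARD.fullmatch(s["usn"]):
            errors.append("INVALID_USN")
        related = s.get("related_span_index")
        if related is not None:  # null means "no relation", which is always valid
            if related == s["span_index"]:
                errors.append("SELF_REFERENCE")
            elif related >= len(spans):
                errors.append("INVALID_RELATION")
    return list(dict.fromkeys(errors))  # de-duplicate, keep first-seen order
```
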
### 5.2 Semantic Validation (warnings, not errors)
| Rule | Check | Warning |
|------|-------|---------|
| Secondary domain | Secondary codes should differ from primary domain | SAME_DOMAIN_SECONDARY |
| Over-splitting | More than 3 spans per sentence | POSSIBLE_OVERSPLIT |
| Intensity/valence match | I3 + V0 is unusual | UNUSUAL_INTENSITY_VALENCE |
| Specificity/actionability | S1 + A3 is rare | UNUSUAL_SPEC_ACTION |
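
The warnings above are cheap to compute once the structural checks pass. The following Python sketch uses a sentence splitter that is an assumption, because the contract does not define sentence boundaries. It treats a run of `.`, `!` or `?` followed by whitespace or end of text as a boundary, which will misfire on abbreviations such as "Dr.". `semantic_warnings` is a hypothetical name.

```python
import re


def semantic_warnings(review_text: str, spans: list[dict]) -> list[str]:
    """Return the distinct 5.2 warning codes for a structurally valid response."""
    warnings: list[str] = []
    for s in spans:
        domain = s["urt_primary"][0]  # e.g. "J" from "J1.02"
        if any(code[0] == domain for code in s["urt_secondary"]):
            warnings.append("SAME_DOMAIN_SECONDARY")
        if s["intensity"] == "I3" and s["valence"] == "V0":
            warnings.append("UNUSUAL_INTENSITY_VALENCE")
        if s["specificity"] == "S1" and s["actionability"] == "A3":
            warnings.append("UNUSUAL_SPEC_ACTION")

    # Crude sentence boundaries (assumption): offsets just past ".", "!" or "?"
    # runs that are followed by whitespace or the end of the text.
    ends = [m.end() for m in re.finditer(r"[.!?]+(?=\s|$)", review_text)]
    spans_per_sentence: dict[int, int] = {}
    for s in spans:
        # A span belongs to sentence k when k sentence ends precede its start.
        k = sum(1 for e in ends if e <= s["span_start"])
        spans_per_sentence[k] = spans_per_sentence.get(k, 0) + 1
    if any(n > 3 for n in spans_per_sentence.values()):
        warnings.append("POSSIBLE_OVERSPLIT")

    return list(dict.fromkeys(warnings))  # de-duplicate, keep first-seen order
```
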
### 5.3 Text Matching Rules
Allow normalization:
- Whitespace collapse: multiple spaces → single space
- Trim: leading/trailing whitespace
- Case: must match exactly (no case normalization)
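```python
def text_matches(review_text: str, span: dict) -> bool:
    """Compare a span's text with the slice its offsets point at (5.3 rules)."""
    expected = review_text[span['span_start']:span['span_end']]
    actual = span['span_text']
    # str.split() with no argument splits on any whitespace run and drops
    # leading/trailing whitespace, so this collapses and trims in one step.
    # Case is left untouched, so the comparison stays case-sensitive.
    expected_norm = ' '.join(expected.split())
    actual_norm = ' '.join(actual.split())
    return expected_norm == actual_norm
```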
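
Offset mismatches have their own retry rule in 6.1. As an alternative, a pipeline could recompute offsets from the span texts and compare them with what the model returned. The minimal sketch below assumes each `span_text` is an exact substring of the review and that spans appear in `span_index` order. `locate_spans` is a hypothetical helper. Texts that differ from the review only in whitespace would still need `text_matches`.

```python
def locate_spans(review_text: str, span_texts: list[str]) -> list[tuple[int, int]]:
    """Find (start, end) offsets for each span text, scanning left to right."""
    offsets: list[tuple[int, int]] = []
    cursor = 0
    for text in span_texts:
        start = review_text.find(text, cursor)
        if start == -1:
            raise ValueError(f"span not found at or after offset {cursor}: {text!r}")
        end = start + len(text)  # exclusive, like span_end
        offsets.append((start, end))
        cursor = end  # spans are non-overlapping and ordered
    return offsets
```

For the Example 3 review, this returns `[(0, 75), (76, 140)]`.
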
---
## 6. Error Handling
### 6.1 Retry Strategy
| Error Type | Action |
|------------|--------|
| JSON parse error | Retry with "Return ONLY valid JSON" appended |
| Schema validation error | Retry with specific field errors in prompt |
| Offset mismatch | Retry with "Offsets must match exactly" warning |
| No primary span | Auto-select using primary selection rules |
| Multiple primary spans | Keep the flagged span ranked highest by the primary selection rules; unset the others |
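
Both primary-count repairs reduce to ranking spans by the order given in the system prompt: intensity, then valence, then earliest `span_index`. The following is a minimal sketch. `select_primary` and `repair_primary` are hypothetical names. In the multiple-primary case, the sketch chooses among the spans the model flagged, which is one interpretation of the table.

```python
# Lower rank wins: I3 > I2 > I1, then V- > V± > V0 > V+.
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary(spans: list[dict]) -> int:
    """Return the span_index chosen by the primary-selection order."""
    best = min(
        spans,
        key=lambda s: (INTENSITY_RANK[s["intensity"]], VALENCE_RANK[s["valence"]], s["span_index"]),
    )
    return best["span_index"]


def repair_primary(spans: list[dict]) -> list[dict]:
    """Ensure exactly one span has is_primary=True."""
    flagged = [s for s in spans if s["is_primary"]]
    if len(flagged) == 1:
        return spans
    # None flagged: rank all spans. Several flagged: rank only the flagged ones.
    chosen = select_primary(flagged or spans)
    return [dict(s, is_primary=(s["span_index"] == chosen)) for s in spans]
```
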
### 6.2 Fallback Behavior
If the LLM still fails after 3 retries, return a minimal single-span response:
```python
def fallback_single_span(review_text: str) -> dict:
    """Create minimal valid response for failed classification."""
    return {
        "spans": [{
            "span_index": 0,
            "span_text": review_text,
            "span_start": 0,
            "span_end": len(review_text),
            "urt_primary": "O1.01",  # Default: general offering
            "urt_secondary": [],
            "valence": "V0",  # Neutral - we don't know
            "intensity": "I1",
            "specificity": "S1",
            "actionability": "A1",
            "temporal": "TC",
            "evidence": "ES",
            "comparative": "CR-N",
            "is_primary": True,
            "confidence": "low",
            "entity": None,
            "entity_type": None,
            "relation_type": None,
            "related_span_index": None,
            "usn": "URT:S:O1.01:01:11TC.ES.N"  # V0 -> "0", I1 -> "1"
        }],
        "review_summary": {
            "dominant_valence": "V0",
            "dominant_domain": "O",
            "span_count": 1,
            "has_comparative": False,
            "has_entity": False
        },
        # Internal markers, not schema fields: the top-level schema sets
        # "additionalProperties": false, so strip these before validating.
        "_fallback": True,
        "_error": "Classification failed after 3 retries"
    }
```
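
The retry table in 6.1 and this fallback can be combined into one loop. The following sketch rests on several assumptions. `call_llm(review_text, hint)` is a hypothetical stand-in for whatever client sends the prompt with an extra instruction appended. `validate` is any function that returns error strings, such as the structural checks from 5.1. "3 retries" is read as one initial call plus three retries.

```python
import json
from typing import Callable


def classify_with_retries(
    call_llm: Callable[[str, str], str],
    review_text: str,
    validate: Callable[[dict], list[str]],
    fallback: Callable[[str], dict],
    max_retries: int = 3,
) -> dict:
    """Call the model, feeding parse/validation errors back as a prompt hint."""
    hint = ""  # extra instruction appended to the prompt on a retry
    for _ in range(1 + max_retries):  # initial attempt plus max_retries retries
        raw = call_llm(review_text, hint)
        try:
            response = json.loads(raw)
        except json.JSONDecodeError:
            hint = "Return ONLY valid JSON."
            continue
        errors = validate(response)
        if not errors:
            return response
        hint = "Fix these errors: " + "; ".join(errors)
    return fallback(review_text)
```

Passing `fallback_single_span` as `fallback` reproduces the behavior described above.
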
---
## 7. Performance Considerations
### 7.1 Prompt Token Budget
| Component | Tokens (approx) |
|-----------|-----------------|
| System prompt | ~800 |
| Schema | ~400 |
| 3 few-shot examples | ~1,200 |
| Average review input | ~100 |
| **Total input** | ~2,500 |
| Average output | ~300-800 |
### 7.2 Batching
For high-volume processing, consider:
- Batch 5-10 short reviews per request
- Use `review_id` field in input/output for correlation
- Validate each review's spans independently
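
The following sketches the batching bookkeeping. It assumes that each batch item carries a `review_id` and that the model returns one result per item tagged with the same id. The v1 output schema defines no `review_id` and sets `additionalProperties: false`, so batched output would need a wrapper schema. `make_batches` and `split_batch_response` are hypothetical helpers.

```python
def make_batches(reviews: dict[str, str], size: int = 5) -> list[list[dict]]:
    """Group {review_id: review_text} into request payloads of at most `size` items."""
    items = [{"review_id": rid, "review_text": text} for rid, text in reviews.items()]
    return [items[i:i + size] for i in range(0, len(items), size)]


def split_batch_response(batch: list[dict], results: list[dict]) -> dict[str, dict]:
    """Map each returned result back to its review, in batch order."""
    by_id = {r["review_id"]: r for r in results}
    missing = [item["review_id"] for item in batch if item["review_id"] not in by_id]
    if missing:
        raise ValueError(f"batch response is missing reviews: {missing}")
    return {item["review_id"]: by_id[item["review_id"]] for item in batch}
```

Each mapped result can then go through the per-review validation independently.
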
### 7.3 Caching
Cache key: `sha256(review_text + model_version + prompt_version)`
Invalidate on:
- Model version change
- Prompt version change
- URT code taxonomy change
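
Plain string concatenation can make different inputs hash alike: `"ab" + "c"` and `"a" + "bc"` both give `"abc"`. Putting the taxonomy version into the key also makes the third invalidation trigger automatic. The following minimal sketch uses Python's standard `hashlib` and `json`. The `taxonomy_version` parameter is an assumption not present in the original key.

```python
import hashlib
import json


def cache_key(review_text: str, model_version: str, prompt_version: str,
              taxonomy_version: str) -> str:
    """SHA-256 over a JSON array, so field boundaries stay unambiguous."""
    payload = json.dumps(
        [review_text, model_version, prompt_version, taxonomy_version],
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this key, changing any of the four versioned inputs produces a new key, so stale entries are simply never looked up again.
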
---
## 8. Version History
| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-01-24 | Initial contract for URT-Standard profile |
---
## 9. Future Extensions (v2.0)
- **Full profile support**: Add `causal_chain` to output schema
- **Confidence calibration**: Train confidence based on validation results
- **Entity linking**: Link entities across reviews for trend detection
- **Multi-language**: Add language detection and localized prompts
---
*End of LLM Classification Contract v1.0*