LLM Classification Contract v1.0
Purpose: Define the prompt, output schema, and validation rules for span-level URT classification.
Target Model: Claude 3.5 Sonnet / GPT-4o (structured output mode)
Date: 2026-01-24
1. Overview
The LLM receives a single review text and returns an array of spans — semantically distinct units of feedback. Each span is independently classified using URT v5.1.
Pipeline position:
reviews_raw.text → LLM → spans[] → review_spans table
2. System Prompt
You are a review classification system using URT (Universal Review Taxonomy) v5.1.
Your task is to extract semantic spans from customer reviews and classify each span independently.
## SPAN EXTRACTION RULES
1. **Split on contrasting conjunctions**: but, however, although, despite, yet, though
2. **Split on topic/target change**: food → service → bathroom = 3 spans
3. **Split on valence change**: positive → negative = split
4. **Split on domain change**: O (Offering) → J (Journey) → E (Environment) = split
5. **Keep together**: cause→effect within same feedback unit ("X because Y" = 1 span)
**Guardrails**:
- Max 3 spans per sentence (if 4+, re-check for over-splitting)
- Min 1 span per review (even single-word reviews)
- Spans must be non-overlapping and cover meaningful content
## URT DOMAINS (Tier-3 codes: X#.##)
| Domain | Code | Description |
|--------|------|-------------|
| Offering | O1-O4 | Product/service quality, features, variety |
| Price | P1-P4 | Value, pricing, promotions, payment |
| Journey | J1-J4 | Timing, process, convenience, accessibility |
| Environment | E1-E4 | Physical space, ambiance, cleanliness, digital UX |
| Attitude | A1-A4 | Staff behavior, helpfulness, professionalism |
| Voice | V1-V4 | Brand, communication, marketing, transparency |
| Relationship | R1-R4 | Loyalty, trust, consistency, personalization |
## DIMENSION CODES
### Valence
- V+ : Positive sentiment
- V- : Negative sentiment
- V0 : Neutral/factual
- V± : Mixed within the span
### Intensity
- I1 : Low ("okay", "fine", "decent")
- I2 : Moderate ("good", "bad", "slow")
- I3 : High ("amazing", "terrible", "unacceptable")
### Specificity
- S1 : Vague ("it was bad")
- S2 : Some detail ("the food was cold")
- S3 : Precise ("waited 45 minutes for appetizers")
### Actionability
- A1 : No clear action possible
- A2 : Possible actions, unclear which
- A3 : Clear, specific action ("train staff on X", "fix Y")
### Temporal
- TC : Current visit (default when no markers)
- TR : Recent pattern ("lately", "recently", "again")
- TH : Historical ("for years", "always", "used to")
- TF : Future ("won't return", "next time", "I expect")
### Evidence
- ES : Stated explicitly in text (default)
- EI : Inferred logically (not stated, but entailed)
- EC : Contextual (depends on surrounding text)
### Comparative
- CR-N : No comparison (default)
- CR-B : Better than alternatives
- CR-W : Worse than alternatives
- CR-S : Same as alternatives
## PRIMARY SPAN SELECTION
Mark exactly ONE span as is_primary=true using this order:
1. Highest intensity (I3 > I2 > I1)
2. Tie-break: negative over positive (V- > V± > V0 > V+)
3. Tie-break: earliest span_index
## USN (URT String Notation)
Generate a USN string for each span:
URT:S:{primary}[+{sec1}][+{sec2}]:{valence_sign}{intensity_num}:{S#}{A#}{temporal}.{evidence}.{CR_suffix}
Examples:
- `URT:S:J1.03:-2:22TC.ES.N` (J1.03, V-, I2, S2, A2, TC, ES, CR-N)
- `URT:S:P1.01+O2.03:+3:33TR.ES.B` (P1.01 primary, O2.03 secondary, V+, I3, S3, A3, TR, ES, CR-B)
Valence encoding: + for V+, - for V-, 0 for V0, ± for V±
CR suffix: N=CR-N, B=CR-B, W=CR-W, S=CR-S
## OUTPUT FORMAT
Return valid JSON matching the schema exactly. No markdown, no explanations.
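Outside the prompt itself, the USN rule can be checked on the consumer side by rebuilding the string from the classified fields and comparing it with the `usn` the model returned. The following is a minimal sketch, assuming span dictionaries that use the field names from the output schema; `build_usn` and `VALENCE_SIGN` are hypothetical names, not part of the contract.

```python
# Hypothetical helper: rebuilds a USN string from a span's classified fields.
VALENCE_SIGN = {"V+": "+", "V-": "-", "V0": "0", "V±": "±"}


def build_usn(span: dict) -> str:
    """Assemble a USN string following the format given in the system prompt."""
    # Primary code first, then up to two secondary codes joined with "+".
    codes = "+".join([span["urt_primary"], *span["urt_secondary"]])
    sign = VALENCE_SIGN[span["valence"]]
    intensity = span["intensity"][1:]          # "I2" -> "2"
    specificity = span["specificity"][1:]      # "S2" -> "2"
    actionability = span["actionability"][1:]  # "A2" -> "2"
    cr_suffix = span["comparative"].split("-")[1]  # "CR-N" -> "N"
    return (
        f"URT:S:{codes}:{sign}{intensity}:"
        f"{specificity}{actionability}{span['temporal']}."
        f"{span['evidence']}.{cr_suffix}"
    )
```

A mismatch between `build_usn(span)` and `span["usn"]` could be treated as an `INVALID_USN` candidate, though how strictly to enforce that is a design choice this contract does not fix.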
3. Output JSON Schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "URT Span Extraction Response",
"type": "object",
"required": ["spans", "review_summary"],
"additionalProperties": false,
"properties": {
"spans": {
"type": "array",
"minItems": 1,
"maxItems": 15,
"items": {
"type": "object",
"required": [
"span_index",
"span_text",
"span_start",
"span_end",
"urt_primary",
"urt_secondary",
"valence",
"intensity",
"specificity",
"actionability",
"temporal",
"evidence",
"comparative",
"is_primary",
"usn"
],
"additionalProperties": false,
"properties": {
"span_index": {
"type": "integer",
"minimum": 0,
"description": "0-based position in review"
},
"span_text": {
"type": "string",
"minLength": 1,
"description": "Exact text extracted from review"
},
"span_start": {
"type": "integer",
"minimum": 0,
"description": "Character offset start (0-indexed)"
},
"span_end": {
"type": "integer",
"minimum": 1,
"description": "Character offset end (exclusive)"
},
"urt_primary": {
"type": "string",
"pattern": "^[OPJEAVR][1-4]\\.[0-9]{2}$",
"description": "Primary URT Tier-3 code"
},
"urt_secondary": {
"type": "array",
"maxItems": 2,
"items": {
"type": "string",
"pattern": "^[OPJEAVR][1-4]\\.[0-9]{2}$"
},
"description": "Secondary codes (max 2, different domains preferred)"
},
"valence": {
"type": "string",
"enum": ["V+", "V-", "V0", "V±"]
},
"intensity": {
"type": "string",
"enum": ["I1", "I2", "I3"]
},
"specificity": {
"type": "string",
"enum": ["S1", "S2", "S3"]
},
"actionability": {
"type": "string",
"enum": ["A1", "A2", "A3"]
},
"temporal": {
"type": "string",
"enum": ["TC", "TR", "TH", "TF"]
},
"evidence": {
"type": "string",
"enum": ["ES", "EI", "EC"]
},
"comparative": {
"type": "string",
"enum": ["CR-N", "CR-B", "CR-W", "CR-S"]
},
"is_primary": {
"type": "boolean",
"description": "True for exactly one span per review"
},
"confidence": {
"type": "string",
"enum": ["high", "medium", "low"],
"default": "medium"
},
"entity": {
"type": ["string", "null"],
"description": "Named entity if present (staff name, product, location)"
},
"entity_type": {
"type": ["string", "null"],
"enum": ["location", "staff", "product", "process", "time", "other", null]
},
"relation_type": {
"type": ["string", "null"],
"enum": ["cause_of", "effect_of", "contrast", "resolution", null],
"description": "Relationship to another span in this review"
},
"related_span_index": {
"type": ["integer", "null"],
"minimum": 0,
"description": "Index of related span (must be different from this span)"
},
"usn": {
"type": "string",
"pattern": "^URT:S:[OPJEAVR][1-4]\\.[0-9]{2}",
"description": "URT String Notation for audit"
}
}
}
},
"review_summary": {
"type": "object",
"required": ["dominant_valence", "dominant_domain", "span_count"],
"properties": {
"dominant_valence": {
"type": "string",
"enum": ["V+", "V-", "V0", "V±"]
},
"dominant_domain": {
"type": "string",
"pattern": "^[OPJEAVR]$"
},
"span_count": {
"type": "integer",
"minimum": 1
},
"has_comparative": {
"type": "boolean"
},
"has_entity": {
"type": "boolean"
}
}
}
}
}
4. Few-Shot Examples
Example 1: Multi-span review (3 spans)
Input:
{
"review_text": "The food was incredible, best steak I've ever had. But we waited 45 minutes for a table even with a reservation. The manager comped our drinks which was nice.",
"profile": "standard"
}
Output:
{
"spans": [
{
"span_index": 0,
"span_text": "The food was incredible, best steak I've ever had.",
"span_start": 0,
"span_end": 50,
"urt_primary": "O1.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I3",
"specificity": "S2",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-B",
"is_primary": false,
"confidence": "high",
"entity": "steak",
"entity_type": "product",
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:O1.01:+3:21TC.ES.B"
},
{
"span_index": 1,
"span_text": "But we waited 45 minutes for a table even with a reservation.",
"span_start": 51,
"span_end": 112,
"urt_primary": "J1.02",
"urt_secondary": [],
"valence": "V-",
"intensity": "I3",
"specificity": "S3",
"actionability": "A3",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": true,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:J1.02:-3:33TC.ES.N"
},
{
"span_index": 2,
"span_text": "The manager comped our drinks which was nice.",
"span_start": 113,
"span_end": 158,
"urt_primary": "A2.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I2",
"specificity": "S2",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": false,
"confidence": "high",
"entity": "manager",
"entity_type": "staff",
"relation_type": "resolution",
"related_span_index": 1,
"usn": "URT:S:A2.01:+2:21TC.ES.N"
}
],
"review_summary": {
"dominant_valence": "V±",
"dominant_domain": "J",
"span_count": 3,
"has_comparative": true,
"has_entity": true
}
}
Why this classification:
- Span 0: Food quality (O1.01), superlative language (I3), "best ever" is comparative (CR-B)
- Span 1: Wait time (J1.02), specific time (S3), actionable (A3), negative + I3 = primary
- Span 2: Staff action (A2.01), resolution of span 1's problem
Example 2: Single-span review
Input:
{
"review_text": "Great place!",
"profile": "standard"
}
Output:
{
"spans": [
{
"span_index": 0,
"span_text": "Great place!",
"span_start": 0,
"span_end": 12,
"urt_primary": "O1.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I2",
"specificity": "S1",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": true,
"confidence": "medium",
"entity": null,
"entity_type": null,
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:O1.01:+2:11TC.ES.N"
}
],
"review_summary": {
"dominant_valence": "V+",
"dominant_domain": "O",
"span_count": 1,
"has_comparative": false,
"has_entity": false
}
}
Why this classification:
- Vague positive (S1, A1), default to O1.01 for general positive
- Lower confidence due to lack of specificity
Example 3: Historical + future temporal markers
Input:
{
"review_text": "I've been coming here for years and the quality has always been consistent. Won't be back after today though - they've really gone downhill.",
"profile": "standard"
}
Output:
{
"spans": [
{
"span_index": 0,
"span_text": "I've been coming here for years and the quality has always been consistent.",
"span_start": 0,
"span_end": 75,
"urt_primary": "R1.01",
"urt_secondary": ["O1.01"],
"valence": "V+",
"intensity": "I2",
"specificity": "S2",
"actionability": "A1",
"temporal": "TH",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": false,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": "contrast",
"related_span_index": 1,
"usn": "URT:S:R1.01+O1.01:+2:21TH.ES.N"
},
{
"span_index": 1,
"span_text": "Won't be back after today though - they've really gone downhill.",
"span_start": 76,
"span_end": 140,
"urt_primary": "R1.02",
"urt_secondary": [],
"valence": "V-",
"intensity": "I3",
"specificity": "S1",
"actionability": "A1",
"temporal": "TF",
"evidence": "ES",
"comparative": "CR-N",
"is_primary": true,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": "contrast",
"related_span_index": 0,
"usn": "URT:S:R1.02:-3:11TF.ES.N"
}
],
"review_summary": {
"dominant_valence": "V-",
"dominant_domain": "R",
"span_count": 2,
"has_comparative": false,
"has_entity": false
}
}
Why this classification:
- Span 0: Historical loyalty (TH), secondary O1.01 for quality mention
- Span 1: Future intent (TF), I3 due to "really" + finality of "won't be back"
- Mutual contrast relationship
Example 4: Comparative review
Input:
{
"review_text": "Way better than the other coffee shops in the area. Their lattes are smoother and the prices are actually reasonable.",
"profile": "standard"
}
Output:
{
"spans": [
{
"span_index": 0,
"span_text": "Way better than the other coffee shops in the area.",
"span_start": 0,
"span_end": 51,
"urt_primary": "O1.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I3",
"specificity": "S1",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-B",
"is_primary": true,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:O1.01:+3:11TC.ES.B"
},
{
"span_index": 1,
"span_text": "Their lattes are smoother",
"span_start": 52,
"span_end": 77,
"urt_primary": "O1.02",
"urt_secondary": [],
"valence": "V+",
"intensity": "I2",
"specificity": "S2",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-B",
"is_primary": false,
"confidence": "high",
"entity": "lattes",
"entity_type": "product",
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:O1.02:+2:21TC.ES.B"
},
{
"span_index": 2,
"span_text": "and the prices are actually reasonable.",
"span_start": 78,
"span_end": 117,
"urt_primary": "P1.01",
"urt_secondary": [],
"valence": "V+",
"intensity": "I2",
"specificity": "S2",
"actionability": "A1",
"temporal": "TC",
"evidence": "ES",
"comparative": "CR-B",
"is_primary": false,
"confidence": "high",
"entity": null,
"entity_type": null,
"relation_type": null,
"related_span_index": null,
"usn": "URT:S:P1.01:+2:21TC.ES.B"
}
],
"review_summary": {
"dominant_valence": "V+",
"dominant_domain": "O",
"span_count": 3,
"has_comparative": true,
"has_entity": true
}
}
5. Validation Rules
5.1 Structural Validation (pre-insert)
| Rule | Check | Error |
|---|---|---|
| Span count | `1 <= spans.length <= 15` | `INVALID_SPAN_COUNT` |
| Exactly one primary | `spans.filter(s => s.is_primary).length === 1` | `INVALID_PRIMARY_COUNT` |
| Contiguous indices | `spans[i].span_index === i` for all i | `NON_CONTIGUOUS_INDEX` |
| Non-overlapping | `spans[i].span_end <= spans[i+1].span_start` | `OVERLAPPING_SPANS` |
| Valid offsets | `span_end > span_start && span_start >= 0` | `INVALID_OFFSETS` |
| Text matches | `review_text.slice(span_start, span_end) ~= span_text` | `TEXT_MISMATCH` |
| USN format | Matches regex for profile | `INVALID_USN` |
| Self-reference | `related_span_index !== span_index` | `SELF_REFERENCE` |
| Related exists | `related_span_index < spans.length` | `INVALID_RELATION` |
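The structural checks in the table can be collected into a single pass over the spans. This is a rough sketch rather than a reference implementation: `validate_structure` is a hypothetical name, the USN check is left out because the per-profile regex is not given here, and the text comparison uses whitespace-normalized, case-sensitive matching.

```python
def validate_structure(review_text: str, spans: list[dict]) -> list[str]:
    """Return the structural error codes that a list of spans violates."""
    errors: list[str] = []

    def flag(code: str) -> None:
        # Record each error code at most once.
        if code not in errors:
            errors.append(code)

    if not 1 <= len(spans) <= 15:
        flag("INVALID_SPAN_COUNT")
    if sum(1 for s in spans if s["is_primary"]) != 1:
        flag("INVALID_PRIMARY_COUNT")

    for i, span in enumerate(spans):
        start, end = span["span_start"], span["span_end"]
        if span["span_index"] != i:
            flag("NON_CONTIGUOUS_INDEX")
        if i + 1 < len(spans) and end > spans[i + 1]["span_start"]:
            flag("OVERLAPPING_SPANS")
        if not 0 <= start < end:
            flag("INVALID_OFFSETS")
        # Collapse and trim whitespace on both sides; case must match exactly.
        sliced = " ".join(review_text[start:end].split())
        if sliced != " ".join(span["span_text"].split()):
            flag("TEXT_MISMATCH")
        # related_span_index is optional in the schema and may be null.
        related = span.get("related_span_index")
        if related is not None:
            if related == span["span_index"]:
                flag("SELF_REFERENCE")
            if related >= len(spans):
                flag("INVALID_RELATION")
    return errors
```

An empty list means the response passed every check implemented here; any returned code can be fed back into the retry prompt.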
5.2 Semantic Validation (warnings, not errors)
| Rule | Check | Warning |
|---|---|---|
| Secondary domain | Secondary codes should differ from primary domain | SAME_DOMAIN_SECONDARY |
| Over-splitting | More than 3 spans per sentence | POSSIBLE_OVERSPLIT |
| Intensity/valence match | I3 + V0 is unusual | UNUSUAL_INTENSITY_VALENCE |
| Specificity/actionability | S1 + A3 is rare | UNUSUAL_SPEC_ACTION |
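Three of the four semantic rules depend only on a single span's fields, so they can be sketched as a per-span function. `semantic_warnings` is a hypothetical name; the over-splitting rule is omitted because it needs a sentence segmenter, which this contract does not specify.

```python
def semantic_warnings(span: dict) -> list[str]:
    """Return warning codes for one span; warnings do not block insertion."""
    warnings: list[str] = []
    # The first character of a Tier-3 code is its domain letter (O, P, J, ...).
    primary_domain = span["urt_primary"][0]
    if any(code[0] == primary_domain for code in span["urt_secondary"]):
        warnings.append("SAME_DOMAIN_SECONDARY")
    if span["intensity"] == "I3" and span["valence"] == "V0":
        warnings.append("UNUSUAL_INTENSITY_VALENCE")
    if span["specificity"] == "S1" and span["actionability"] == "A3":
        warnings.append("UNUSUAL_SPEC_ACTION")
    return warnings
```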
5.3 Text Matching Rules
Allow normalization:
- Whitespace collapse: multiple spaces → single space
- Trim: leading/trailing whitespace
- Case: must match exactly (no case normalization)
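def text_matches(review_text: str, span: dict) -> bool:
    """Compare the offset slice with span_text after whitespace normalization."""
    expected = review_text[span['span_start']:span['span_end']]
    actual = span['span_text']
    # Normalize whitespace: split() trims both ends and collapses runs of
    # whitespace; case is left untouched, so the match stays case-sensitive.
    expected_norm = ' '.join(expected.split())
    actual_norm = ' '.join(actual.split())
    return expected_norm == actual_norm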
6. Error Handling
6.1 Retry Strategy
| Error Type | Action |
|---|---|
| JSON parse error | Retry with "Return ONLY valid JSON" appended |
| Schema validation error | Retry with specific field errors in prompt |
| Offset mismatch | Retry with "Offsets must match exactly" warning |
| No primary span | Auto-select using primary selection rules |
| Multiple primary spans | Keep first by selection rules, unset others |
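The last two rows repair the primary flag without another model call. One way to sketch this, under the assumption that "keep first by selection rules" means ranking only the spans already flagged as primary, is shown below; `repair_primary` and the two rank tables are hypothetical names.

```python
INTENSITY_RANK = {"I3": 3, "I2": 2, "I1": 1}
# Lower value wins the tie-break: V- > V± > V0 > V+.
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def repair_primary(spans: list[dict]) -> list[dict]:
    """Ensure exactly one span has is_primary=True, using the selection order."""
    flagged = [s for s in spans if s["is_primary"]]
    if len(flagged) == 1:
        return spans  # Already valid; nothing to repair.
    # No primary: rank every span. Several primaries: rank only the flagged ones.
    candidates = flagged or spans
    best = min(
        candidates,
        key=lambda s: (
            -INTENSITY_RANK[s["intensity"]],  # highest intensity first
            VALENCE_RANK[s["valence"]],       # then negative over positive
            s["span_index"],                  # then earliest span
        ),
    )
    # Return copies so the caller's original dictionaries are not mutated.
    return [{**s, "is_primary": s is best} for s in spans]
```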
6.2 Fallback Behavior
If after 3 retries the LLM still fails:
def fallback_single_span(review_text: str) -> dict:
    """Create minimal valid response for failed classification."""
    return {
        "spans": [{
            "span_index": 0,
            "span_text": review_text,
            "span_start": 0,
            "span_end": len(review_text),
            "urt_primary": "O1.01",  # Default: general offering
            "urt_secondary": [],
            "valence": "V0",  # Neutral - we don't know
            "intensity": "I1",
            "specificity": "S1",
            "actionability": "A1",
            "temporal": "TC",
            "evidence": "ES",
            "comparative": "CR-N",
            "is_primary": True,
            "confidence": "low",
            "entity": None,
            "entity_type": None,
            "relation_type": None,
            "related_span_index": None,
            "usn": "URT:S:O1.01:01:11TC.ES.N"
        }],
        "review_summary": {
            "dominant_valence": "V0",
            "dominant_domain": "O",
            "span_count": 1,
            "has_comparative": False,
            "has_entity": False
        },
        "_fallback": True,
        "_error": "Classification failed after 3 retries"
    }
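The retry table and the fallback can be tied together in a small driver loop. The sketch below is illustrative only: `classify_with_retry`, `call_llm`, `validate`, and `fallback` are hypothetical names passed in by the caller, and "3 retries" is read as one initial attempt plus three retries.

```python
import json
from typing import Callable

MAX_RETRIES = 3  # Assumption: retries in addition to the first attempt.


def classify_with_retry(
    review_text: str,
    call_llm: Callable[[str, str], str],          # (review_text, hint) -> raw text
    validate: Callable[[str, dict], list[str]],   # returns error codes, [] if valid
    fallback: Callable[[str], dict],              # builds the single-span fallback
) -> dict:
    """Call the model, retrying with a corrective hint until the output validates."""
    hint = ""
    for _ in range(1 + MAX_RETRIES):
        raw = call_llm(review_text, hint)
        try:
            response = json.loads(raw)
        except json.JSONDecodeError:
            hint = "Return ONLY valid JSON"
            continue
        errors = validate(review_text, response)
        if not errors:
            return response
        # Feed the specific failures back into the next attempt.
        hint = "Fix these errors: " + ", ".join(errors)
    return fallback(review_text)
```

Passing the model call and validator as parameters keeps the loop independent of any particular provider SDK, which this contract does not prescribe.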
7. Performance Considerations
7.1 Prompt Token Budget
| Component | Tokens (approx) |
|---|---|
| System prompt | ~800 |
| Schema | ~400 |
| 3 few-shot examples | ~1,200 |
| Average review input | ~100 |
| Total input | ~2,500 |
| Average output | ~300-800 |
7.2 Batching
For high-volume processing, consider:
- Batch 5-10 short reviews per request
- Use `review_id` field in input/output for correlation
- Validate each review's spans independently
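A batching layer along these lines might look like the following sketch. It assumes each input review and each per-review output is a dictionary carrying a `review_id` key; `make_batches` and `correlate` are hypothetical helper names.

```python
def make_batches(reviews: list[dict], batch_size: int = 5) -> list[list[dict]]:
    """Split reviews into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [reviews[i:i + batch_size] for i in range(0, len(reviews), batch_size)]


def correlate(batch: list[dict], outputs: list[dict]) -> tuple[dict[str, dict], list[str]]:
    """Match model outputs to inputs by review_id and list ids with no output."""
    by_id = {out["review_id"]: out for out in outputs}
    matched = {
        r["review_id"]: by_id[r["review_id"]]
        for r in batch
        if r["review_id"] in by_id
    }
    missing = [r["review_id"] for r in batch if r["review_id"] not in by_id]
    return matched, missing
```

Reviews listed in `missing` can be re-sent individually, so one dropped item does not force a retry of the whole batch.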
7.3 Caching
Cache key: sha256(review_text + model_version + prompt_version)
Invalidate on:
- Model version change
- Prompt version change
- URT code taxonomy change
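As a sketch of the cache key, the function below hashes the listed inputs with SHA-256. Two details are assumptions beyond the formula above: the parts are joined with a separator character so that different splits of the same concatenated text cannot collide, and an optional `taxonomy_version` is included so that a taxonomy change invalidates entries automatically. `cache_key` is a hypothetical name.

```python
import hashlib


def cache_key(
    review_text: str,
    model_version: str,
    prompt_version: str,
    taxonomy_version: str = "",  # Assumption: not part of the stated formula.
) -> str:
    """Return a hex SHA-256 digest identifying one classification request."""
    parts = [review_text, model_version, prompt_version, taxonomy_version]
    # Join with the ASCII unit separator so ("ab", "c") and ("a", "bc") differ.
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Because every version string is part of the key, bumping any of them yields new keys and old entries simply stop being hit.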
8. Version History
| Version | Date | Changes |
|---|---|---|
| 1.0 | 2026-01-24 | Initial contract for URT-Standard profile |
9. Future Extensions (v2.0)
- Full profile support: Add `causal_chain` to output schema
- Confidence calibration: Train confidence based on validation results
- Entity linking: Link entities across reviews for trend detection
- Multi-language: Add language detection and localized prompts
End of LLM Classification Contract v1.0