Alejandro Gutiérrez 3eda9bdbfa Add complete URT v5.1 taxonomy framework (11 artifacts)

Universal Review Taxonomy v5.1 implementation with:
- Track A (Training): A1 Quickstart, A2 QA Protocol, A3 Calibration Set, A4 Full Manual
- Track B (Engineering): B1 Code Registry, B2 Database Schema, B3 Owner Routing, B4 API Contract
- Track C (Analytics): C1 Issue Lifecycle, C2 KPI Mapping Guide
- Track D (Integration): D1 Dashboard Specification

Covers 7 domains, 28 categories, 138 subcodes, 16 causal codes, and 7 metadata dimensions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

2026-01-24 10:51:41 +00:00

A3: Calibration Test Set

Universal Review Taxonomy (URT) v5.1 - Gold Standard Annotation Corpus

Purpose: Gold standard examples for annotator training, certification, and ongoing calibration
Version: 5.1 | Status: Production Ready | Date: 2026-01-23

1. Overview
2. Test Set Structure
3. Calibration Examples by Domain
4. Confusion Pair Examples
5. Multi-Code Span Examples
6. Edge Cases and Boundary Conditions
7. Certification Test Structure
8. Scoring Rubric
9. Pass/Fail Criteria

1. Overview

1.1 Purpose

This document serves as the gold standard annotation corpus for URT v5.1. It provides:

Training material for new annotators to learn correct classification
Certification tests to verify annotator competency at each profile level
Calibration benchmarks for ongoing quality assurance
Reference examples for resolving disagreements

1.2 Gold Standard Creation Process

All examples in this corpus have been validated through:

Triple-blind annotation by 3+ certified expert annotators
Krippendorff's Alpha >= 0.90 agreement threshold
Expert panel review for disputed cases
Documentation of rationale for all decisions
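
The agreement threshold above can be checked with a short script. The following is a minimal sketch of Krippendorff's Alpha for nominal labels, assuming each item is represented as the list of codes assigned by the annotators who coded it. The function name, the input layout, and the convention of returning 1.0 when only one label ever occurs are illustrative assumptions, not part of URT.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's Alpha for nominal data.

    units: list of lists; each inner list holds the labels given to one
    item by the annotators who coded it (hypothetical input layout).
    """
    # Build the coincidence matrix: every ordered pair of labels within an
    # item contributes 1 / (m - 1), where m is the number of labels on it.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single label carries no agreement information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item coded by two annotators")

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # assumption: only one label was ever used
    return 1.0 - observed / expected


# Three annotators, four items, one disagreement on the last item.
ratings = [
    ["O1.01", "O1.01", "O1.01"],
    ["P1.02", "P1.02", "P1.02"],
    ["J1.01", "J1.01", "J1.01"],
    ["O4.01", "O4.01", "J3.02"],
]
alpha = krippendorff_alpha_nominal(ratings)
meets_gold_threshold = alpha >= 0.90
```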

1.3 Usage Guidelines

Use Case	Process
New annotator training	Work through examples in order, domain by domain
Certification testing	Select items by profile level (see Section 7)
Calibration sessions	Use confusion pairs and edge cases
Dispute resolution	Reference rationale for analogous cases

1.4 Corpus Statistics

Metric	Count
Total Examples	62
Examples per Domain	8-10
Confusion Pair Examples	12
Multi-Code Examples	8
Edge Cases	10
Difficulty: Easy	20
Difficulty: Medium	22
Difficulty: Hard	14
Difficulty: Expert	6
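
As a quick consistency check on the table above, the four difficulty counts should add up to the stated total. The sketch below does that arithmetic; the variable names are illustrative.

```python
# Counts copied from the Corpus Statistics table.
difficulty_counts = {"Easy": 20, "Medium": 22, "Hard": 14, "Expert": 6}
total_examples = 62

# The difficulty levels partition the corpus, so the counts must sum to the total.
counts_match_total = sum(difficulty_counts.values()) == total_examples

# Share of the corpus at each difficulty level, as a fraction of the total.
difficulty_share = {
    level: count / total_examples for level, count in difficulty_counts.items()
}
```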

2. Test Set Structure

2.1 Difficulty Levels

Level	Description	Target Annotators	Typical Agreement
Easy	Clear single-domain, obvious valence/intensity	Entry-level	>= 0.95 Kappa
Medium	Standard cases requiring decision tree application	Standard	>= 0.85 Kappa
Hard	Confusion pairs, multi-code, subtle distinctions	Advanced	>= 0.75 Kappa
Expert	Edge cases, causal chains, complex metadata	Expert	>= 0.70 Kappa
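
The thresholds in this table can be applied mechanically once an agreement score is available. The sketch below assumes "Kappa" means Cohen's kappa between two annotators, which this section does not state explicitly; the function names and the sample labels are hypothetical.

```python
from collections import Counter

# Minimum typical agreement per difficulty level, copied from the table above.
MIN_KAPPA = {"Easy": 0.95, "Medium": 0.85, "Hard": 0.75, "Expert": 0.70}


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's own label distribution.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def meets_typical_agreement(level, kappa):
    """True when kappa reaches the typical agreement listed for the level."""
    return kappa >= MIN_KAPPA[level]


# Hypothetical valence labels from two annotators on four items.
annotator_1 = ["V-", "V-", "V+", "V+"]
annotator_2 = ["V-", "V-", "V+", "V-"]
kappa = cohen_kappa(annotator_1, annotator_2)
```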

2.2 Example Format

Each example includes:

ID: [Domain]-[Difficulty]-[Number]
Difficulty: Easy | Medium | Hard | Expert
Review Text: "[Realistic Google review style text]"

GOLD STANDARD CLASSIFICATION:
- Span: [Exact text boundaries]
- Primary Code: [Domain] > [Category] > [Subcode] ([Code])
- Secondary Codes: [If applicable]
- Valence: [V+/V-/V0/V+-]
- Intensity: [I1/I2/I3]
- Specificity: [S1/S2/S3] (Standard+)
- Actionability: [A1/A2/A3] (Standard+)
- Temporal: [TC/TR/TH/TF] (Standard+)
- Evidence: [ES/EI/EC] (Standard+)
- Comparative: [CR-N/CR-B/CR-W/CR-S] (Standard+)
- Causal Chain: [If applicable, Full profile only]

RATIONALE:
[Detailed explanation of why this code and not alternatives]

COMMON MISTAKES:
- [Mistake 1]: [Why it's wrong]
- [Mistake 2]: [Why it's wrong]
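
The format above lends itself to simple validation. The following is a minimal sketch, assuming example IDs use a single domain letter followed by an upper-case difficulty and a number (as in O-EASY-01) and that the allowed metadata values are exactly those listed in the template. The domain letters are inferred from the code prefixes that appear in this corpus; the function names are hypothetical.

```python
import re

# Assumption: domain letters inferred from code prefixes used in the examples.
EXAMPLE_ID = re.compile(
    r"^(?P<domain>[OPJEAVR])-(?P<difficulty>EASY|MEDIUM|HARD|EXPERT)-(?P<number>\d+)$"
)

# Allowed values per metadata dimension, copied from the template.
ALLOWED_VALUES = {
    "Valence": {"V+", "V-", "V0", "V+-"},
    "Intensity": {"I1", "I2", "I3"},
    "Specificity": {"S1", "S2", "S3"},
    "Actionability": {"A1", "A2", "A3"},
    "Temporal": {"TC", "TR", "TH", "TF"},
    "Evidence": {"ES", "EI", "EC"},
    "Comparative": {"CR-N", "CR-B", "CR-W", "CR-S"},
}


def parse_example_id(example_id):
    """Split an ID such as 'O-EASY-01' into (domain, difficulty, number)."""
    match = EXAMPLE_ID.match(example_id)
    if match is None:
        raise ValueError(f"not a valid example ID: {example_id!r}")
    return match["domain"], match["difficulty"], int(match["number"])


def invalid_fields(record):
    """Return the metadata fields whose value is not in the allowed set."""
    return sorted(
        field
        for field, value in record.items()
        if field in ALLOWED_VALUES and value not in ALLOWED_VALUES[field]
    )


record = {"Valence": "V-", "Intensity": "I3", "Temporal": "TC", "Comparative": "CR-N"}
problems = invalid_fields(record)  # empty list when every value is allowed
```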

2.3 Profile Requirements by Example

Profile	Required Fields	Example Sections to Complete
Lite	Primary (domain), Valence	Domain, V only
Core	Primary (category), Valence, Intensity	Category, V, I
Standard	All metadata except causal	Full minus causal
Full	All fields including causal when supported	Complete
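
One way to read this table is as a projection of a gold record onto the fields a given profile must supply. The sketch below illustrates that reading under stated assumptions: the primary code is truncated to domain level for Lite and category level for Core, Standard and Full keep the full subcode, and secondary codes are left out. The dictionary layout and function name are hypothetical.

```python
# How many levels of "Domain > Category > Subcode" each profile must supply.
PRIMARY_DEPTH = {"Lite": 1, "Core": 2, "Standard": 3, "Full": 3}

ALL_METADATA = [
    "Valence", "Intensity", "Specificity", "Actionability",
    "Temporal", "Evidence", "Comparative",
]
METADATA_BY_PROFILE = {
    "Lite": ["Valence"],
    "Core": ["Valence", "Intensity"],
    "Standard": ALL_METADATA,
    "Full": ALL_METADATA + ["Causal Chain"],
}


def expected_answer(gold, profile):
    """Reduce a gold record to the fields required at the given profile."""
    depth = PRIMARY_DEPTH[profile]
    if depth >= 3:
        primary = gold["Primary Code"]
    else:
        # Drop the trailing "(O1.01)"-style code, then keep the leading levels.
        path = gold["Primary Code"].rsplit(" (", 1)[0]
        levels = [part.strip() for part in path.split(">")]
        primary = " > ".join(levels[:depth])
    answer = {"Primary": primary}
    for field in METADATA_BY_PROFILE[profile]:
        if field in gold:  # e.g. Causal Chain is present only when supported
            answer[field] = gold[field]
    return answer


gold = {
    "Primary Code": "Offering > Function > Works/Doesn't Work (O1.01)",
    "Valence": "V-", "Intensity": "I3", "Specificity": "S2",
    "Actionability": "A3", "Temporal": "TC", "Evidence": "ES",
    "Comparative": "CR-N",
}
lite = expected_answer(gold, "Lite")
core = expected_answer(gold, "Core")
```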

3. Calibration Examples by Domain

3.1 Offering (O) Domain

O-EASY-01: Basic Product Function

Difficulty: Easy

Review Text: "My new phone arrived and it won't even turn on. Completely dead out of the box."

GOLD STANDARD CLASSIFICATION:

Span: "it won't even turn on. Completely dead out of the box"
Primary Code: Offering > Function > Works/Doesn't Work (O1.01)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S2
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: This is a clear product functionality failure. The product does not perform its basic function (turning on). "Completely dead" and "won't even" indicate strong intensity (I3). Actionable because it requires replacement/repair (A3).

COMMON MISTAKES:

Coding as E2.02 (Digital Functionality): Wrong because this is about the physical product, not an app/interface
Coding as O2.05 (Condition at Delivery): While delivery-related, the core complaint is function, not cosmetic condition
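
Because the template asks for exact text boundaries, the span in an example like this one should be a verbatim substring of the review text. The sketch below shows one way to check that and recover character offsets; the function name and the use of character offsets are assumptions for illustration.

```python
def locate_span(review_text, span):
    """Return (start, end) character offsets of span within review_text."""
    start = review_text.find(span)
    if start == -1:
        raise ValueError("span is not an exact substring of the review text")
    return start, start + len(span)


review = "My new phone arrived and it won't even turn on. Completely dead out of the box."
span = "it won't even turn on. Completely dead out of the box"
start, end = locate_span(review, span)
```

A paraphrased or trimmed span raises an error here, which helps catch boundary drift before annotations are compared against the gold standard.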

O-EASY-02: Product Quality Praise

Difficulty: Easy

Review Text: "The steak was perfectly cooked, medium-rare exactly as I ordered. The meat was so tender it practically melted."

GOLD STANDARD CLASSIFICATION:

Span: "The steak was perfectly cooked, medium-rare exactly as I ordered"
Primary Code: Offering > Quality > Craftsmanship (O2.02)
Secondary Codes: O4.01 (Specification Match)
Valence: V+
Intensity: I2
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: "Perfectly cooked" addresses how well the dish was executed (craftsmanship). Secondary code O4.01 captures "exactly as I ordered" (specification match). Specific (S3) due to exact doneness mentioned.

COMMON MISTAKES:

Coding as O2.01 (Materials/Inputs): "Tender meat" might suggest quality ingredients, but "perfectly cooked" emphasizes execution/skill
Missing secondary code O4.01: "Exactly as I ordered" is a distinct point about specification match

O-MEDIUM-01: Product Durability Issue

Difficulty: Medium

Review Text: "I loved this jacket for the first month, but after just two washes the zipper broke and seams started coming apart."

GOLD STANDARD CLASSIFICATION:

Span: "after just two washes the zipper broke and seams started coming apart"
Primary Code: Offering > Function > Durability (O1.03)
Secondary Codes: O2.02 (Craftsmanship)
Valence: V-
Intensity: I2
Specificity: S3
Actionability: A2
Temporal: TR
Evidence: ES
Comparative: CR-N

RATIONALE: Primary complaint is durability ("after just two washes"). The zipper and seams failing are symptoms of poor durability. Secondary O2.02 captures the craftsmanship defect. Temporal is TR (recent pattern) because it tracks degradation over time.

COMMON MISTAKES:

Coding as O2.01 (Materials): Materials may be the root cause, but the customer's complaint is about how long it lasted (durability)
Coding Temporal as TC: This spans multiple wash cycles, indicating a recent pattern (TR)

O-MEDIUM-02: Outcome Achievement

Difficulty: Medium

Review Text: "I hired them to fix my back pain and after 6 sessions I'm still in agony. Complete waste of time and money."

GOLD STANDARD CLASSIFICATION:

Span: "I hired them to fix my back pain and after 6 sessions I'm still in agony"
Primary Code: Offering > Function > Outcome Achievement (O1.05)
Secondary Codes: V4.01 (Overall Value)
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TR
Evidence: ES
Comparative: CR-S

RATIONALE: Customer's goal (pain relief) was not achieved. "Still in agony" after 6 sessions indicates the treatment didn't work. CR-S because "still" indicates the condition is unchanged. Secondary V4.01 captures "waste of time and money."

COMMON MISTAKES:

Coding as P2.02 (Technical Skill): While practitioner skill may be the cause, the customer is complaining about the outcome, not observing the technique
Missing CR-S: "Still in agony" is an explicit persistence signal
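
Codes such as O1.05 and V4.01 appear to combine a domain letter, a category number, and a two-digit subcode number. The following sketch parses codes on that assumption and flags when a secondary code sits in a different domain from the primary, as in the example above. The pattern is inferred from the codes shown in this corpus, not from the B1 Code Registry, and the names are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed shape: one domain letter, one category digit, a dot, two subcode digits.
CODE_PATTERN = re.compile(r"^([A-Z])(\d)\.(\d{2})$")


class Code(NamedTuple):
    domain: str
    category: int
    subcode: int


def parse_code(code):
    """Split a code such as 'O1.05' into domain, category, and subcode."""
    match = CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"unrecognized code format: {code!r}")
    return Code(match.group(1), int(match.group(2)), int(match.group(3)))


def crosses_domain(primary, secondary):
    """True when the secondary code belongs to a different domain."""
    return parse_code(primary).domain != parse_code(secondary).domain


# O-MEDIUM-02: primary O1.05 with secondary V4.01.
cross_domain = crosses_domain("O1.05", "V4.01")
```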

O-HARD-01: Specification Match vs Accuracy

Difficulty: Hard

Review Text: "I ordered the blue model size medium and received a red one in large. Not even close to what I wanted."

GOLD STANDARD CLASSIFICATION:

Span: "I ordered the blue model size medium and received a red one in large"
Primary Code: Offering > Fit > Specification Match (O4.01)
Secondary Codes: J3.02 (Accuracy)
Valence: V-
Intensity: I2
Specificity: S3
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: Primary is O4.01 because the customer is emphasizing the product doesn't match their specification/needs. Secondary J3.02 captures the process error that caused this. The distinction: O4.01 = "wrong product for me"; J3.02 = "they made a mistake in fulfillment."

COMMON MISTAKES:

Coding only as J3.02 (Accuracy): J3.02 alone misses that the customer's need wasn't met. Primary should be O4.01 (the product impact)
Confusing with O3.01 (All Components Present): Nothing is missing; it's the wrong item entirely

O-HARD-02: Quality vs Environment Aesthetics

Difficulty: Hard

Review Text: "The sushi presentation was amateur hour. Fish slapped on rice with no artistry whatsoever."

GOLD STANDARD CLASSIFICATION:

Span: "The sushi presentation was amateur hour. Fish slapped on rice with no artistry"
Primary Code: Offering > Quality > Presentation (O2.03)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: This is about the product's (sushi) visual/aesthetic quality, not the restaurant's ambiance. O2.03 covers how the product itself looks. "Amateur hour" and the vivid description indicate I3 intensity.

COMMON MISTAKES:

Coding as E3.05 (Aesthetics): E3.05 is for space/environment aesthetics, not product presentation
Coding as O2.02 (Craftsmanship): While related, "presentation" is the specific subcode for visual/aesthetic quality of the product

O-EXPERT-01: Feature Availability vs Inventory

Difficulty: Expert

Review Text: "The app advertises real-time translation but that feature has been 'coming soon' for 8 months now."

GOLD STANDARD CLASSIFICATION:

Span: "The app advertises real-time translation but that feature has been 'coming soon' for 8 months now"
Primary Code: Offering > Completeness > Feature Availability (O3.02)
Secondary Codes: V2.03 (Advertising Accuracy)
Valence: V-
Intensity: I2
Specificity: S3
Actionability: A2
Temporal: TH
Evidence: ES
Comparative: CR-S

RATIONALE: O3.02 captures that a promised feature isn't available. This differs from A1.03 (inventory) which is about physical stock. Secondary V2.03 addresses the misleading advertising. TH (historical) because 8 months is a long-standing pattern. CR-S because it's persistently unavailable.

COMMON MISTAKES:

Coding as A1.03 (Inventory/Capacity): A1.03 is for physical availability ("sold out"), not software features
Coding as E2.02 (Digital Functionality): The feature doesn't exist yet; it's not broken
Coding as R1.01 (Truthfulness): While deceptive, the primary focus is on the missing feature; advertising is secondary

3.2 People (P) Domain

P-EASY-01: Staff Warmth

Difficulty: Easy

Review Text: "Sarah at the front desk was so friendly and welcoming. She made us feel right at home from the moment we walked in."

GOLD STANDARD CLASSIFICATION:

Span: "Sarah at the front desk was so friendly and welcoming. She made us feel right at home"
Primary Code: People > Attitude > Warmth/Friendliness (P1.01)
Secondary Codes: None
Valence: V+
Intensity: I2
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: Classic example of staff warmth/friendliness. Named employee (S3). "So friendly and welcoming" is clear praise without extreme intensifiers (I2).

COMMON MISTAKES:

Coding as R4.03 (Relationship Building): R4.03 is about ongoing relationship investment, not single-visit warmth
Over-rating intensity to I3: "So friendly" is moderate; I3 would require "absolutely amazing" or similar

P-EASY-02: Staff Rudeness

Difficulty: Easy

Review Text: "The cashier rolled her eyes when I asked a question and said 'I already told you' in the most condescending tone."

GOLD STANDARD CLASSIFICATION:

Span: "The cashier rolled her eyes when I asked a question and said 'I already told you' in the most condescending tone"
Primary Code: People > Attitude > Respect (P1.02)
Secondary Codes: P4.05 (Tone)
Valence: V-
Intensity: I2
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: Eye-rolling and condescension are disrespect indicators (P1.02). Secondary P4.05 captures the communication tone specifically. Not I3 because language is descriptive but not extreme.

COMMON MISTAKES:

Coding as P4.01 (Clarity): The issue isn't unclear communication; it's disrespectful delivery
Coding as P1.03 (Empathy): While related, the specific behavior (eye-rolling, condescension) indicates disrespect

P-MEDIUM-01: Staff Competence vs Problem-Solving

Difficulty: Medium

Review Text: "I asked the tech about compatibility issues and he had no idea what I was talking about. Had to Google it myself."

GOLD STANDARD CLASSIFICATION:

Span: "I asked the tech about compatibility issues and he had no idea what I was talking about"
Primary Code: People > Competence > Knowledge (P2.01)
Secondary Codes: None
Valence: V-
Intensity: I2
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: This is about knowledge gap (P2.01), not problem-solving ability (P2.03). The tech didn't lack the ability to solve a problem; they lacked the knowledge to even understand the question.

COMMON MISTAKES:

Coding as P2.03 (Problem-Solving): P2.03 is for ability to find solutions; this is about not knowing basic information
Coding as O3.04 (Documentation): The customer needed staff knowledge, not documentation

P-MEDIUM-02: Responsiveness - Follow-Through

Difficulty: Medium

Review Text: "The manager promised to call me back within 24 hours to resolve the issue. It's been a week and nothing."

GOLD STANDARD CLASSIFICATION:

Span: "The manager promised to call me back within 24 hours to resolve the issue. It's been a week and nothing"
Primary Code: People > Responsiveness > Follow-Through (P3.04)
Secondary Codes: None
Valence: V-
Intensity: I2
Specificity: S3
Actionability: A2
Temporal: TR
Evidence: ES
Comparative: CR-N

RATIONALE: P3.04 (Follow-Through) is correct because this is a specific instance of failing to complete a promised action. Specific timeline mentioned (S3). Temporal is TR because it spans a week.

COMMON MISTAKES:

Coding as R1.02 (Promise Keeping): Use R1.02 for trust/pattern framing ("they never keep promises"). P3.04 is for specific interaction follow-through
Coding as J1.03 (Response Time): While timing is mentioned, the core issue is the broken promise, not response speed

P-HARD-01: Empathy vs Listening

Difficulty: Hard

Review Text: "I explained my situation three times but the rep just kept reading from the script. It's like she didn't hear a word I said."

GOLD STANDARD CLASSIFICATION:

Span: "I explained my situation three times but the rep just kept reading from the script"
Primary Code: People > Communication > Listening (P4.02)
Secondary Codes: P1.03 (Empathy)
Valence: V-
Intensity: I2
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: Primary is P4.02 (Listening) because "didn't hear a word" and "kept reading from the script" indicate a failure to process what was said. Secondary P1.03 captures the implied lack of empathy. The distinction: P4.02 = cognitive processing of input; P1.03 = emotional understanding.

COMMON MISTAKES:

Coding as P1.03 primary: The explicit complaint is about not listening (P4.02); empathy is implied but secondary
Adding P2.03 (Problem-Solving): The complaint isn't that they couldn't solve it; it's that they didn't listen

P-HARD-02: Professionalism vs Respect

Difficulty: Hard

Review Text: "The technician showed up in dirty clothes, was on his personal phone the whole time, and tracked mud through my house."

GOLD STANDARD CLASSIFICATION:

Span: "The technician showed up in dirty clothes, was on his personal phone the whole time, and tracked mud through my house"
Primary Code: People > Competence > Professionalism (P2.04)
Secondary Codes: P1.02 (Respect)
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: P2.04 (Professionalism) covers conduct/standards violations (dirty clothes, personal phone). Secondary P1.02 captures the disrespect shown to the customer's home (tracked mud). Multiple specific failures = I3.

COMMON MISTAKES:

Coding as E1.01 (Cleanliness): E1.01 is about the business's space, not a technician's appearance or behavior
Coding P1.02 as primary: The behaviors described are professional conduct violations; disrespect is a consequence

3.3 Journey (J) Domain

J-EASY-01: Wait Time

Difficulty: Easy

Review Text: "Had an appointment at 2pm and wasn't seen until 3:15. Over an hour wait with no explanation or apology."

GOLD STANDARD CLASSIFICATION:

Span: "Had an appointment at 2pm and wasn't seen until 3:15. Over an hour wait"
Primary Code: Journey > Timing > Wait Time (J1.01)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: Clear wait time complaint with specific times (2pm, 3:15). Over an hour indicates I3 severity. Note: "no explanation or apology" could be a separate span (P4.03 or R3.02).

COMMON MISTAKES:

Coding as A1.02 (Booking Access): A1.02 is about ability to schedule; J1.01 is about wait time once there
Combining "no explanation or apology" in the same span: This should be split out because it is a separate complaint about recovery

J-EASY-02: Process Simplicity

Difficulty: Easy

Review Text: "Returning the item was surprisingly easy. Just scanned the QR code, dropped it off, and got my refund in 2 days."

GOLD STANDARD CLASSIFICATION:

Span: "Returning the item was surprisingly easy. Just scanned the QR code, dropped it off, and got my refund in 2 days"
Primary Code: Journey > Ease > Simplicity (J2.01)
Secondary Codes: J4.03 (Resolution Speed)
Valence: V+
Intensity: I2
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: "Surprisingly easy" and the simple process description indicate J2.01. Secondary J4.03 captures the fast resolution (2 days). S3 because specific steps and timeline mentioned.

COMMON MISTAKES:

Coding as J4.02 (Resolution Process): J4.02 is for problem handling; this is a standard return, not a problem resolution
Missing secondary code: The 2-day refund speed is a distinct positive point

J-MEDIUM-01: Process Accuracy

Difficulty: Medium

Review Text: "They delivered the wrong prescription to the wrong address. Could have been dangerous."

GOLD STANDARD CLASSIFICATION:

Span: "They delivered the wrong prescription to the wrong address"
Primary Code: Journey > Reliability > Accuracy (J3.02)
Secondary Codes: E4.01 (Physical Safety)
Valence: V-
Intensity: I3
Specificity: S2
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: J3.02 (Accuracy) covers incorrect execution of requests. Two errors (wrong prescription + wrong address) compound to I3. Secondary E4.01 captures the safety concern ("could have been dangerous").

COMMON MISTAKES:

Coding as O4.01 (Specification Match): O4.01 is about product-customer fit; J3.02 is about execution error
Coding as E4.03 (Security): This is safety (E4.01), not security/property protection

J-MEDIUM-02: Resolution Process

Difficulty: Medium

Review Text: "When I reported the problem, I was transferred to 4 different departments and had to explain everything from scratch each time."

GOLD STANDARD CLASSIFICATION:

Span: "I was transferred to 4 different departments and had to explain everything from scratch each time"
Primary Code: Journey > Resolution > Resolution Process (J4.02)
Secondary Codes: J2.04 (Handoffs)
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: J4.02 covers how problems are handled - this describes a dysfunctional resolution process. Secondary J2.04 captures the handoff failures specifically. "4 different departments" = S3 and I3.

COMMON MISTAKES:

Coding as P4.03 (Proactive Updates): P4.03 is about keeping customer informed, not transfer issues
Coding as J2.04 alone: The primary complaint is the overall resolution process, not just handoffs

J-HARD-01: Resolution Speed vs Service Speed

Difficulty: Hard

Review Text: "After my complaint, it took them 3 weeks to finally send a replacement. The original delivery only took 2 days!"

GOLD STANDARD CLASSIFICATION:

Span: "After my complaint, it took them 3 weeks to finally send a replacement"
Primary Code: Journey > Resolution > Resolution Speed (J4.03)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-W

RATIONALE: J4.03 (Resolution Speed), not J1.02 (Service Speed), because the delay is in fixing a problem, not in the initial delivery. CR-W because the implicit comparison shows decline (the original delivery was fast, the resolution was slow).

COMMON MISTAKES:

Coding as J1.02 (Service Speed): J1.02 is for normal service delivery; J4.03 is specifically for problem resolution timing
Missing CR-W: The comparison to "original delivery only took 2 days" is an implicit decline signal

J-HARD-02: System Availability vs Product Function

Difficulty: Hard

Review Text: "The banking app was down for maintenance during tax weekend when I desperately needed to transfer funds."

GOLD STANDARD CLASSIFICATION:

Span: "The banking app was down for maintenance during tax weekend when I desperately needed to transfer funds"
Primary Code: Journey > Reliability > Availability (J3.03)
Secondary Codes: A1.01 (Operating Hours)
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: J3.03 (System Availability/Uptime) not O1.01 because the app works; it was just unavailable. Secondary A1.01 captures the timing issue (inaccessible when needed). "Desperately needed" = I3.

COMMON MISTAKES:

Coding as O1.01 (Works/Doesn't Work): The app functions; it was just down for maintenance
Coding as E2.02 (Digital Functionality): E2.02 is for broken features, not scheduled downtime

3.4 Environment (E) Domain

E-EASY-01: Physical Cleanliness

Difficulty: Easy

Review Text: "The bathroom was disgusting. Toilet unflushed, paper towels on the floor, and the mirror was covered in water spots."

GOLD STANDARD CLASSIFICATION:

Span: "The bathroom was disgusting. Toilet unflushed, paper towels on the floor, and the mirror was covered in water spots"
Primary Code: Environment > Physical Space > Cleanliness (E1.01)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: Clear cleanliness issue with specific examples. "Disgusting" indicates I3. Multiple specific problems (S3) that are directly actionable (A3).

COMMON MISTAKES:

Coding as E4.02 (Health/Hygiene): E4.02 is for sanitation protocols; E1.01 is for general cleanliness
Splitting into multiple spans: These are all part of one cleanliness complaint

E-EASY-02: Digital Interface Design

Difficulty: Easy

Review Text: "The new website design is beautiful. Clean layout, modern fonts, and the color scheme is perfect."

GOLD STANDARD CLASSIFICATION:

Span: "The new website design is beautiful. Clean layout, modern fonts, and the color scheme is perfect"
Primary Code: Environment > Digital Space > Interface Design (E2.01)
Secondary Codes: None
Valence: V+
Intensity: I2
Specificity: S2
Actionability: A1
Temporal: TC
Evidence: ES
Comparative: CR-B

RATIONALE: E2.01 covers visual/interaction quality of digital interfaces. "New website design" signals CR-B (improvement from before). A1 because praise doesn't suggest specific action.

COMMON MISTAKES:

Coding as O2.03 (Product Presentation): O2.03 is for product aesthetics; E2.01 is for interface design
Missing CR-B: "New website design" implies improvement from previous version

E-MEDIUM-01: Interface Navigation

Difficulty: Medium

Review Text: "I spent 20 minutes trying to find the cancellation option. It was buried 5 menus deep with no search function."

GOLD STANDARD CLASSIFICATION:

Span: "I spent 20 minutes trying to find the cancellation option. It was buried 5 menus deep"
Primary Code: Environment > Digital Space > Navigation (E2.04)
Secondary Codes: J2.01 (Simplicity)
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: E2.04 (Interface Navigation) because it's about finding things in the digital interface specifically. Secondary J2.01 captures the process friction this caused. "20 minutes" and "5 menus deep" = S3, I3.

COMMON MISTAKES:

Coding as J2.01 primary: J2.01 is process simplicity; E2.04 is specifically about interface navigation
Coding as V2.04 (Terms Fairness): Hiding cancellation could be a policy issue, but the complaint is about navigation

E-MEDIUM-02: Atmosphere/Vibe

Difficulty: Medium

Review Text: "The restaurant had such a cozy atmosphere. Soft lighting, gentle music, perfect for a date night."

GOLD STANDARD CLASSIFICATION:

Span: "The restaurant had such a cozy atmosphere. Soft lighting, gentle music, perfect for a date night"
Primary Code: Environment > Ambiance > Atmosphere/Vibe (E3.01)
Secondary Codes: None
Valence: V+
Intensity: I2
Specificity: S2
Actionability: A1
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: E3.01 captures overall mood/atmosphere. Lighting and music are components of atmosphere. This differs from E3.02 (noise specifically) or E3.05 (aesthetics/beauty).

COMMON MISTAKES:

Coding as E3.05 (Aesthetics): E3.05 is visual beauty; E3.01 is overall vibe/mood
Splitting lighting and music: These work together to create the atmosphere

E-HARD-01: Safety vs Cleanliness

Difficulty: Hard

Review Text: "The kitchen staff weren't wearing gloves and I saw the cook sneeze directly over the food prep area."

GOLD STANDARD CLASSIFICATION:

Span: "The kitchen staff weren't wearing gloves and I saw the cook sneeze directly over the food prep area"
Primary Code: Environment > Safety > Health/Hygiene (E4.02)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: E4.02 (Health/Hygiene) covers sanitation protocols and food safety, not just surface cleanliness. Specific witnessed violations (S3), highly actionable (A3), severe concern (I3).

COMMON MISTAKES:

Coding as E1.01 (Cleanliness): E1.01 is general tidiness; E4.02 is health/sanitation protocols
Coding as P2.04 (Professionalism): While unprofessional, the primary concern is health/safety

E-HARD-02: Mobile Experience vs Digital Accessibility

Difficulty: Hard

Review Text: "The mobile site is completely unusable. Buttons are tiny, text overlaps, and I can't even see the checkout button on my screen."

GOLD STANDARD CLASSIFICATION:

Span: "The mobile site is completely unusable. Buttons are tiny, text overlaps, and I can't even see the checkout button"
Primary Code: Environment > Digital Space > Mobile Experience (E2.05)
Secondary Codes: E2.01 (Interface Design)
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: E2.05 (Mobile Experience) is for smartphone optimization issues. This is not A2.05 (Digital Accessibility) which is about assistive technology support. Secondary E2.01 captures the design failures.

COMMON MISTAKES:

Coding as A2.05 (Digital Accessibility): A2.05 is for screen readers, assistive tech; E2.05 is for mobile optimization
Coding as E2.02 (Functionality): The features work; they're just poorly optimized for mobile

3.5 Access (A) Domain

A-EASY-01: Operating Hours

Difficulty: Easy

Review Text: "Love that they're open until midnight. Perfect for late-night cravings!"

GOLD STANDARD CLASSIFICATION:

Span: "Love that they're open until midnight. Perfect for late-night cravings"
Primary Code: Access > Availability > Operating Hours (A1.01)
Secondary Codes: None
Valence: V+
Intensity: I2
Specificity: S2
Actionability: A1
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: Clear praise for operating hours. The code is A1.01, not P3.03 (Staff Availability), which concerns whether staff are present when needed.

COMMON MISTAKES:

Coding as P3.03 (Staff Availability): P3.03 is about staff being present; A1.01 is about business hours
Coding as A4.01 (Location): Location is where; hours are when

A-EASY-02: Physical Accessibility

Difficulty: Easy

Review Text: "As a wheelchair user, I was so happy to see they have a ramp, automatic doors, and an accessible bathroom."

GOLD STANDARD CLASSIFICATION:

Span: "they have a ramp, automatic doors, and an accessible bathroom"
Primary Code: Access > Accessibility > Physical Accessibility (A2.01)
Secondary Codes: None
Valence: V+
Intensity: I2
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: A2.01 covers mobility accommodations. Multiple specific features mentioned (S3). Distinct from E4.01 (physical safety) which is about protection from harm.

COMMON MISTAKES:

Coding as E4.01 (Physical Safety): E4.01 is about danger/harm; A2.01 is about disability accommodation
Coding as E1.03 (Layout/Design): While related to design, accessibility is a distinct concern

A-MEDIUM-01: Language Support

Difficulty: Medium

Review Text: "Nobody spoke Spanish and there were no signs or menus in Spanish. Felt completely lost."

GOLD STANDARD CLASSIFICATION:

Span: "Nobody spoke Spanish and there were no signs or menus in Spanish"
Primary Code: Access > Inclusivity > Language Support (A3.01)
Secondary Codes: None
Valence: V-
Intensity: I2
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: A3.01 covers language accessibility. This is not P2.01 (staff knowledge) because the issue isn't staff competence but language support as an inclusion matter.

COMMON MISTAKES:

Coding as P2.01 (Knowledge): P2.01 is about product knowledge; A3.01 is about language inclusion
Coding as E1.05 (Signage): While signs are mentioned, the core issue is language access

A-MEDIUM-02: Equal Treatment / Discrimination

Difficulty: Medium

Review Text: "The staff was super helpful to other customers but completely ignored me. I'm pretty sure it's because of how I was dressed."

GOLD STANDARD CLASSIFICATION:

Span: "The staff was super helpful to other customers but completely ignored me. I'm pretty sure it's because of how I was dressed"
Primary Code: Access > Inclusivity > Equal Treatment (A3.05)
Secondary Codes: P3.01 (Attentiveness)
Valence: V-
Intensity: I2
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: EI
Comparative: CR-N

RATIONALE: A3.05 (Equal Treatment) because the customer perceives identity-based discrimination ("because of how I was dressed"). Secondary P3.01 captures the service failure. Evidence is EI because the discrimination is inferred ("I'm pretty sure"), not directly observed.

COMMON MISTAKES:

Coding as P1.02 (Respect): Use P1.02 for general disrespect; A3.05 when identity-based discrimination is perceived
Coding as P3.03 (Availability): Staff were available, just selectively helpful

A-HARD-01: Booking Access vs Wait Time

Difficulty: Hard

Review Text: "I tried to book an appointment for two months and the earliest they had was in December. That's a 4-month wait just to get an appointment!"

GOLD STANDARD CLASSIFICATION:

Span: "I tried to book an appointment for two months and the earliest they had was in December. That's a 4-month wait just to get an appointment"
Primary Code: Access > Availability > Booking Access (A1.02)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TR
Evidence: ES
Comparative: CR-N

RATIONALE: A1.02 (Booking Access) is about ability to schedule, distinct from J1.01 (Wait Time) which is waiting once you arrive. "4-month wait just to get an appointment" = scheduling access, not day-of wait.

COMMON MISTAKES:

Coding as J1.01 (Wait Time): J1.01 is waiting at the appointment; A1.02 is getting the appointment in the first place
Coding as A1.03 (Inventory/Capacity): While capacity may be the cause, the complaint is about booking access

A-HARD-02: Digital Accessibility vs Mobile Experience

Difficulty: Hard

Review Text: "My screen reader can't interpret any of the buttons on their website. It's completely inaccessible for blind users like me."

GOLD STANDARD CLASSIFICATION:

Span: "My screen reader can't interpret any of the buttons on their website. It's completely inaccessible for blind users like me"
Primary Code: Access > Accessibility > Digital Accessibility (A2.05)
Secondary Codes: A2.02 (Visual Accessibility)
Valence: V-
Intensity: I3
Specificity: S2
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: A2.05 (Digital Accessibility) covers assistive technology support like screen readers. Secondary A2.02 captures the visual accessibility dimension. This is NOT E2.05 (Mobile Experience) which is about smartphone optimization.

COMMON MISTAKES:

Coding as E2.05 (Mobile Experience): E2.05 is for mobile optimization; A2.05 is for assistive technology
Coding as E2.02 (Digital Functionality): The site may function; it's just not accessible

3.6 Value (V) Domain

V-EASY-01: Absolute Price

Difficulty: Easy

Review Text: "Prices are insanely high. $25 for a basic salad is just ridiculous."

GOLD STANDARD CLASSIFICATION:

Span: "$25 for a basic salad is just ridiculous"
Primary Code: Value > Price > Absolute Price (V1.01)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: V1.01 is the absolute cost itself. "Insanely high" and "ridiculous" indicate I3. Specific price mentioned (S3). This is not V4.02 (quality-price ratio) because quality isn't mentioned.

COMMON MISTAKES:

Coding as V4.02 (Quality-Price Ratio): V4.02 requires quality discussion; this is pure price complaint
Coding as V1.03 (Price vs Market): No competitor comparison made

V-EASY-02: Hidden Costs

Difficulty: Easy

Review Text: "At checkout they added a $15 'service fee' that wasn't mentioned anywhere. Total bait and switch."

GOLD STANDARD CLASSIFICATION:

Span: "At checkout they added a $15 'service fee' that wasn't mentioned anywhere"
Primary Code: Value > Price > Hidden Costs (V1.04)
Secondary Codes: V2.02 (Fee Disclosure)
Valence: V-
Intensity: I2
Specificity: S3
Actionability: A3
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: V1.04 (Hidden Costs) is the unexpected charge itself. Secondary V2.02 captures the disclosure failure. Note: "bait and switch" could be coded as a separate span (V2.03).

COMMON MISTAKES:

Coding as V2.02 primary: V1.04 is the charge itself; V2.02 is the disclosure failure
Coding as R1.04 (Ethics): While unethical, the explicit complaint is about the hidden fee

V-MEDIUM-01: Quality-Price Ratio

Difficulty: Medium

Review Text: "For what we paid, the food was disappointing. You can get much better quality at half the price elsewhere."

GOLD STANDARD CLASSIFICATION:

Span: "For what we paid, the food was disappointing"
Primary Code: Value > Worth > Quality-Price Ratio (V4.02)
Secondary Codes: None
Valence: V-
Intensity: I2
Specificity: S1
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: V4.02 specifically addresses what you get for what you pay. "For what we paid" directly frames quality relative to cost. Vague (S1) because no specifics on what was disappointing.

COMMON MISTAKES:

Coding as O2 (Quality): The quality alone isn't the complaint; it's quality relative to price
Coding as V1.03 (Price vs Market): While competitors mentioned, primary framing is quality-price ratio

V-MEDIUM-02: Terms Fairness

Difficulty: Medium

Review Text: "Their cancellation policy is criminal. Cancel more than 24 hours out and you still lose 50% of your payment."

GOLD STANDARD CLASSIFICATION:

Span: "Their cancellation policy is criminal. Cancel more than 24 hours out and you still lose 50% of your payment"
Primary Code: Value > Transparency > Terms Fairness (V2.04)
Secondary Codes: None
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: V2.04 covers policy/contract reasonableness. "Criminal" indicates I3. Specific terms described (S3). This is NOT R1.05 (Fair Dealing) which is about equitable treatment patterns.

COMMON MISTAKES:

Coding as R1.05 (Fair Dealing): R1.05 is relationship/trust framing; V2.04 is specifically about policy terms
Coding as R1.04 (Ethics): While "criminal" is strong, the complaint is about policy, not organizational ethics

V-HARD-01: Overall Value (Rip-off)

Difficulty: Hard

Review Text: "Absolute rip-off. Paid $500 for what turned out to be a 30-minute consultation with zero actionable advice."

GOLD STANDARD CLASSIFICATION:

Span: "Absolute rip-off. Paid $500 for what turned out to be a 30-minute consultation with zero actionable advice"
Primary Code: Value > Worth > Overall Value (V4.01)
Secondary Codes: O1.05 (Outcome Achievement)
Valence: V-
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: V4.01 (Overall Value) for "rip-off" - this is an exchange complaint (poor value), not a character judgment (scam). Secondary O1.05 captures "zero actionable advice" (outcome failure).

COMMON MISTAKES:

Coding as R1.04 (Ethics/Scam): "Rip-off" = exchange complaint (V4.01); "scam" would be R1.04
Missing O1.05 secondary: "Zero actionable advice" is a distinct outcome achievement failure

V-HARD-02: Advertising Accuracy vs Truthfulness

Difficulty: Hard

Review Text: "The photos show a huge suite but the actual room was barely bigger than a closet. False advertising!"

GOLD STANDARD CLASSIFICATION:

Span: "The photos show a huge suite but the actual room was barely bigger than a closet. False advertising"
Primary Code: Value > Transparency > Advertising Accuracy (V2.03)
Secondary Codes: O4.01 (Specification Match)
Valence: V-
Intensity: I3
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: V2.03 (Advertising Accuracy) is about marketing matching reality. This is NOT R1.01 (Truthfulness) which is about trust/integrity/character. Secondary O4.01 captures product mismatch.

COMMON MISTAKES:

Coding as R1.01 (Truthfulness): R1.01 is for trust/integrity framing; V2.03 is specifically about advertising
Coding as V2.05 (Honest Representation): V2.03 is more specific to advertising; V2.05 is broader claims

3.7 Relationship (R) Domain

R-EASY-01: Truthfulness

Difficulty: Easy

Review Text: "They flat out lied to me about the warranty coverage. When I tried to claim, suddenly all these exclusions appeared."

GOLD STANDARD CLASSIFICATION:

Span: "They flat out lied to me about the warranty coverage"
Primary Code: Relationship > Integrity > Truthfulness (R1.01)
Secondary Codes: R2.05 (Guarantee Honor)
Valence: V-
Intensity: I3
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: R1.01 for explicit lying accusation. Secondary R2.05 captures the warranty failure. "Flat out lied" indicates I3 and clear R domain (trust/integrity framing).

COMMON MISTAKES:

Coding as V2.04 (Terms Fairness): While terms are involved, the explicit complaint is about lying
Coding as V2.02 (Fee Disclosure): The warranty exclusions aren't fees; it's about truthfulness

R-EASY-02: Track Record

Difficulty: Easy

Review Text: "I've been a customer for 10 years and they have never once let me down. Truly dependable."

GOLD STANDARD CLASSIFICATION:

Span: "I've been a customer for 10 years and they have never once let me down. Truly dependable"
Primary Code: Relationship > Dependability > Track Record (R2.01)
Secondary Codes: None
Valence: V+
Intensity: I3
Specificity: S3
Actionability: A1
Temporal: TH
Evidence: ES
Comparative: CR-N

RATIONALE: R2.01 covers historical performance. "10 years" and "never once" establish a track record. TH (historical) because of the long timeframe.

COMMON MISTAKES:

Coding as R2.02 (Consistency): R2.02 is "same experience each time"; R2.01 is overall track record
Coding Temporal as TC: "10 years" clearly indicates historical (TH), not current visit

R-MEDIUM-01: Recovery - Acknowledgment

Difficulty: Medium

Review Text: "When I pointed out the error, the manager immediately admitted their mistake and took full responsibility. Refreshing honesty."

GOLD STANDARD CLASSIFICATION:

Span: "the manager immediately admitted their mistake and took full responsibility"
Primary Code: Relationship > Recovery > Acknowledgment (R3.01)
Secondary Codes: R3.05 (Ownership)
Valence: V+
Intensity: I2
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: R3.01 (Acknowledgment) for admitting failures. Secondary R3.05 (Ownership) for "took full responsibility." This is NOT J4.01 (Problem Acknowledgment) which is recognizing the issue exists, not admitting fault.

COMMON MISTAKES:

Coding as J4.01 (Problem Acknowledgment): J4.01 = "yes, there's a problem"; R3.01 = "yes, we were wrong"
Missing R3.05: "Took full responsibility" is a distinct ownership signal

R-MEDIUM-02: Compensation

Difficulty: Medium

Review Text: "After the terrible experience, they refunded my money AND gave me a $100 credit. They really went above and beyond."

GOLD STANDARD CLASSIFICATION:

Span: "they refunded my money AND gave me a $100 credit"
Primary Code: Relationship > Recovery > Compensation (R3.03)
Secondary Codes: None
Valence: V+
Intensity: I3
Specificity: S3
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: R3.03 (Compensation) covers making amends beyond fixing the issue. "Above and beyond" with specific dollar amount = I3, S3. This is NOT J4.04 (Resolution Quality) which is whether the fix worked.

COMMON MISTAKES:

Coding as J4.04 (Resolution Quality): J4.04 = was it fixed? R3.03 = did they make amends?
Coding as V4.01 (Overall Value): This is about recovery relationship, not value assessment

R-HARD-01: Ethics (Scam)

Difficulty: Hard

Review Text: "This company is a complete scam. They're con artists who prey on elderly people with fake urgency tactics."

GOLD STANDARD CLASSIFICATION:

Span: "This company is a complete scam. They're con artists who prey on elderly people with fake urgency tactics"
Primary Code: Relationship > Integrity > Ethics (R1.04)
Secondary Codes: A3.05 (Equal Treatment)
Valence: V-
Intensity: I3
Specificity: S2
Actionability: A2
Temporal: TH
Evidence: ES
Comparative: CR-N

RATIONALE: R1.04 (Ethics) for "scam," "con artists" - these are moral judgments about organizational character/intent. Secondary A3.05 captures the targeting of a vulnerable group. TH because the review implies an ongoing pattern.

COMMON MISTAKES:

Coding as V4.01 (Overall Value): "Scam" is character judgment (R1.04), not exchange complaint
Coding as V2.03 (Advertising Accuracy): While deceptive, "scam" goes beyond advertising to ethics

R-HARD-02: Recovery vs Resolution (J4 vs R3)

Difficulty: Hard

Review Text: "Yes, they eventually fixed the problem, but they never apologized or acknowledged they caused it in the first place."

GOLD STANDARD CLASSIFICATION:

Span: "they never apologized or acknowledged they caused it in the first place"
Primary Code: Relationship > Recovery > Apology (R3.02)
Secondary Codes: R3.01 (Acknowledgment)
Valence: V-
Intensity: I2
Specificity: S2
Actionability: A2
Temporal: TC
Evidence: ES
Comparative: CR-N

RATIONALE: Primary R3.02 (Apology) for the missing expression of regret. Secondary R3.01 (Acknowledgment) for not admitting fault. Note: "they eventually fixed the problem" could be coded as a separate span (J4.04, V+).

COMMON MISTAKES:

Coding as J4.01 (Problem Acknowledgment): J4.01 is recognizing the problem; R3.01 is admitting they caused it
Including the fix in same span: The fix and the apology failure are distinct feedback points

R-EXPERT-01: Promise Keeping vs Follow-Through

Difficulty: Expert

Review Text: "They promise the world to get your business but never deliver. Every single commitment they make turns out to be empty words."

GOLD STANDARD CLASSIFICATION:

Span: "They promise the world to get your business but never deliver. Every single commitment they make turns out to be empty words"
Primary Code: Relationship > Integrity > Promise Keeping (R1.02)
Secondary Codes: R2.04 (Trustworthiness)
Valence: V-
Intensity: I3
Specificity: S1
Actionability: A1
Temporal: TH
Evidence: ES
Comparative: CR-N

RATIONALE: R1.02 (Promise Keeping) not P3.04 (Follow-Through) because this is framed as trust/pattern ("Every single commitment," "never deliver") not a specific interaction. TH for ongoing pattern.

COMMON MISTAKES:

Coding as P3.04 (Follow-Through): P3.04 is specific interaction ("said they'd call, didn't"). R1.02 is pattern of broken promises
Coding Temporal as TC: "Every single commitment" indicates historical pattern (TH)

4. Confusion Pair Examples

These examples target the most common disambiguation challenges identified in the A1 Quickstart Guide.

4.1 V vs R: Scam vs Rip-off

CP-VR-01: Exchange Complaint (V)

Difficulty: Hard

Review Text: "Total rip-off! I paid premium prices for bargain basement quality. Won't get fooled again."

GOLD STANDARD CLASSIFICATION:

Primary Code: V4.01 (Overall Value)
Valence: V- | Intensity: I3 | CR: CR-N

RATIONALE: "Rip-off" is an exchange complaint - the focus is on poor value for money paid. This is V4.01, NOT R1.04. The test: Is the complaint about what I got vs. what I paid? (V) Or is it about their intent to deceive? (R)

CP-VR-02: Character Judgment (R)

Difficulty: Hard

Review Text: "This business is nothing but a scam operation. They deliberately mislead customers and have no intention of delivering what they promise."

GOLD STANDARD CLASSIFICATION:

Primary Code: R1.04 (Ethics)
Secondary Codes: R1.01 (Truthfulness)
Valence: V- | Intensity: I3 | CR: CR-N

RATIONALE: "Scam," "deliberately mislead," "no intention" - these are moral judgments about organizational character and intent. This is R1.04, NOT V4.01. The test: Is this about their intent to deceive/harm? (R)

4.2 J4 vs R3: Process vs Ownership

CP-JR-01: Resolution Process (J4)

Difficulty: Hard

Review Text: "When my order arrived wrong, they processed the return quickly and sent a replacement within 2 days."

GOLD STANDARD CLASSIFICATION:

Primary Code: J4.02 (Resolution Process)
Secondary Codes: J4.03 (Resolution Speed)
Valence: V+ | Intensity: I2 | CR: CR-N

RATIONALE: Focus is on what they did to fix it (process, speed). No mention of apology, ownership, or making amends beyond the fix. This is J4, NOT R3.

CP-JR-02: Recovery Ownership (R3)

Difficulty: Hard

Review Text: "When my order arrived wrong, they sincerely apologized, admitted it was their fault, and gave me a credit for my next order as a gesture of goodwill."

GOLD STANDARD CLASSIFICATION:

Primary Code: R3.02 (Apology)
Secondary Codes: R3.01 (Acknowledgment), R3.03 (Compensation)
Valence: V+ | Intensity: I2 | CR: CR-N

RATIONALE: Focus is on how they took responsibility - apology, admission, and goodwill gesture. This is R3, NOT J4. The test: Is this about the mechanics of fixing it (J4) or about accountability and making amends (R3)?

4.3 P2 vs P3: Competence vs Responsiveness

CP-PP-01: Competence - Problem-Solving (P2.03)

Difficulty: Medium

Review Text: "The tech couldn't figure out what was wrong with my computer. Tried three different things but had no idea what they were doing."

GOLD STANDARD CLASSIFICATION:

Primary Code: P2.03 (Problem-Solving)
Valence: V- | Intensity: I2 | CR: CR-N

RATIONALE: "Couldn't figure out," "had no idea" - this is about ability to solve, not willingness or attentiveness. This is P2.03 (skill to address issues).

CP-PP-02: Responsiveness - Follow-Through (P3.04)

Difficulty: Medium

Review Text: "The tech promised to get back to me with a solution within 24 hours but never called. Had to follow up myself three times."

GOLD STANDARD CLASSIFICATION:

Primary Code: P3.04 (Follow-Through)
Valence: V- | Intensity: I2 | CR: CR-N

RATIONALE: "Promised to get back," "never called," "follow up myself" - this is about completing promised actions, not technical ability. This is P3.04 (follow-through on commitments).

4.4 O1.01 vs J3.03: Product Function vs System Uptime

CP-OJ-01: Product Function (O1.01)

Difficulty: Medium

Review Text: "My new smartwatch won't sync with my phone no matter what I try. The Bluetooth connection just doesn't work."

GOLD STANDARD CLASSIFICATION:

Primary Code: O1.01 (Works/Doesn't Work)
Valence: V- | Intensity: I2 | CR: CR-N

RATIONALE: The product itself doesn't function. The watch's Bluetooth is broken. This is O1.01 (basic functionality), NOT J3.03.

CP-OJ-02: System Uptime (J3.03)

Difficulty: Medium

Review Text: "The app kept timing out all weekend. Their servers must have been down because nothing would load."

GOLD STANDARD CLASSIFICATION:

Primary Code: J3.03 (Availability)
Valence: V- | Intensity: I2 | CR: CR-N

RATIONALE: The service/system was unavailable; the app itself was not broken. "Servers must have been down" = system uptime issue. This is J3.03, NOT O1.01 or E2.02.

4.5 P3.04 vs R1.02: Specific vs Pattern

CP-PR-01: Specific Follow-Through (P3.04)

Difficulty: Expert

Review Text: "The salesperson said she'd email me the quote by EOD but I never received it."

GOLD STANDARD CLASSIFICATION:

Primary Code: P3.04 (Follow-Through)
Valence: V- | Intensity: I1 | CR: CR-N

RATIONALE: Single specific incident of not completing a promised action. This is P3.04, a People domain code about individual behavior.

CP-PR-02: Pattern of Broken Promises (R1.02)

Difficulty: Expert

Review Text: "This company never keeps their promises. Every deadline they commit to gets pushed back, every commitment is an empty word."

GOLD STANDARD CLASSIFICATION:

Primary Code: R1.02 (Promise Keeping)
Valence: V- | Intensity: I3 | Temporal: TH | CR: CR-N

RATIONALE: Pattern-level complaint about organizational trustworthiness ("never keeps," "every deadline," "every commitment"). This is R1.02, a Relationship domain code about trust patterns.

5. Multi-Code Span Examples

MC-01: Cause-Effect Single Span

Difficulty: Medium

Review Text: "The wait was ridiculous because they only had one cashier working during the lunch rush."

GOLD STANDARD CLASSIFICATION:

Span: Full sentence (keep together)
Primary Code: J1.01 (Wait Time) - the impact
Secondary Codes: A1.04 (Staffing Levels) - the cause
Valence: V-
Intensity: I3
CR: CR-N

RATIONALE: Cause-effect in same clause stays together. Primary = customer-experienced impact (wait). Secondary = underlying cause (understaffing). Do NOT split.

MC-02: Two Distinct Complaints

Difficulty: Medium

Review Text: "The pizza was cold AND the delivery driver was rude about it."

GOLD STANDARD CLASSIFICATION:

Span 1: "The pizza was cold"
- Primary: O2.05 (Condition at Delivery)
- V-/I2
Span 2: "the delivery driver was rude about it"
- Primary: P1.02 (Respect)
- V-/I2

RATIONALE: Two genuinely different issues (product condition + staff behavior). SPLIT into separate spans because they're different domains with independent fixes.

MC-03: Multi-Domain Complex

Difficulty: Hard

Review Text: "Beautiful restaurant with friendly staff, but the food took forever and when it arrived it was barely warm."

GOLD STANDARD CLASSIFICATION:

Span 1: "Beautiful restaurant"
- Primary: E3.05 (Aesthetics)
- V+/I2
Span 2: "friendly staff"
- Primary: P1.01 (Warmth/Friendliness)
- V+/I2
Span 3: "the food took forever"
- Primary: J1.02 (Service Speed)
- V-/I2
Span 4: "when it arrived it was barely warm"
- Primary: O2.05 (Condition at Delivery)
- V-/I1

RATIONALE: Four distinct feedback points across four domains. Split at "but" (valence change) and at subject changes.
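One way to hold a multi-span result such as MC-03 in code is a list of span records. The sketch below is illustrative only: the `Span` class and its field names are assumptions, not part of the URT specification. The codes and values are transcribed from the gold standard above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One annotated feedback point (hypothetical record layout)."""
    text: str
    primary: str    # subcode such as "E3.05"
    valence: str    # "V+", "V-", "V0", or "V+-"
    intensity: str  # "I1", "I2", or "I3"

    @property
    def domain(self) -> str:
        # The domain is the leading letter of the code (O, P, J, E, A, V, R).
        return self.primary[0]


# MC-03 gold standard, transcribed from the example above.
mc_03 = [
    Span("Beautiful restaurant", "E3.05", "V+", "I2"),
    Span("friendly staff", "P1.01", "V+", "I2"),
    Span("the food took forever", "J1.02", "V-", "I2"),
    Span("when it arrived it was barely warm", "O2.05", "V-", "I1"),
]

domains = [span.domain for span in mc_03]
valences = [span.valence for span in mc_03]
print(domains)   # ['E', 'P', 'J', 'O']
print(valences)  # ['V+', 'V+', 'V-', 'V-']
```

The valence list makes the split at "but" visible, and the domain list shows the four distinct domains named in the rationale.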

MC-04: Same Target, Mixed Assessment

Difficulty: Medium

Review Text: "The steak was good but overpriced for what you get."

GOLD STANDARD CLASSIFICATION:

Span: Keep as one (same target)
Primary Code: V4.02 (Quality-Price Ratio)
Valence: V+-
Intensity: I2

RATIONALE: Same target (the steak) with mixed assessment. Use V+- because positive and negative target the same thing. Do NOT split.

MC-05: Staff State Causal Chain

Difficulty: Expert

Review Text: "Our server seemed exhausted and kept forgetting our orders. When I asked about it, she said they'd been working 12-hour shifts all week."

GOLD STANDARD CLASSIFICATION:

Span: "Our server seemed exhausted and kept forgetting our orders"
Primary Code: J3.02 (Accuracy)
Secondary Codes: P3.01 (Attentiveness)
Valence: V-
Intensity: I2
Evidence: ES
Causal Chain: CD-S (Staff State: fatigue) - Evidence: ES (explicitly stated)

RATIONALE: Primary is the customer impact (forgotten orders). Causal code CD-S is valid because the fatigue and 12-hour shifts are explicitly stated, not inferred.

MC-06: Management Oversight Causal

Difficulty: Expert

Review Text: "This is the fourth time I've reported this broken equipment and nothing has been done about it."

GOLD STANDARD CLASSIFICATION:

Span: Full sentence
Primary Code: E1.02 (Maintenance)
Secondary Codes: None
Valence: V-
Intensity: I3
Temporal: TH
CR: CR-S
Causal Chain: MG-O (Oversight failure) - Evidence: EI

RATIONALE: CR-S because the equipment is unchanged (still broken) after multiple reports. Causal MG-O is valid via EI because "fourth time" + "nothing has been done" logically entails an oversight/supervision failure.

MC-07: Improvement Signal with Context

Difficulty: Hard

Review Text: "The customer service used to be terrible but they've really turned things around. Fast, helpful, and actually solve problems now."

GOLD STANDARD CLASSIFICATION:

Span 1: "The customer service used to be terrible"
- Primary: P (domain level for context)
- V-/I2/CR-W (implicit past-negative)
Span 2: "they've really turned things around. Fast, helpful, and actually solve problems now"
- Primary: P3.05 (Urgency) or P2.03 (Problem-Solving)
- Secondary: P1.01 (Warmth/Friendliness)
- V+/I2/CR-B

RATIONALE: CR-B on the second span captures the improvement signal. "Turned things around" is explicit improvement.

MC-08: Compound Issue with Different Owners

Difficulty: Hard

Review Text: "The app crashes constantly AND when I call support they put me on hold for 45 minutes."

GOLD STANDARD CLASSIFICATION:

Span 1: "The app crashes constantly"
- Primary: E2.02 (Digital Functionality)
- V-/I3
- Owner: IT
Span 2: "when I call support they put me on hold for 45 minutes"
- Primary: J1.01 (Wait Time)
- V-/I3
- Owner: Operations

RATIONALE: Two distinct issues requiring different owners. Split to enable proper routing.
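The routing benefit of splitting can be sketched as a lookup from primary code to owner. This is a minimal illustration under stated assumptions: `OWNER_BY_CODE`, `route_spans`, and the `"UNROUTED"` placeholder are hypothetical names, and only the two code-to-owner pairs shown in MC-08 come from this document.

```python
# Hypothetical routing table: only the two pairs shown in MC-08 come from
# this document; a real deployment would load the full mapping elsewhere.
OWNER_BY_CODE = {
    "E2.02": "IT",
    "J1.01": "Operations",
}


def route_spans(spans):
    """Group (span_text, primary_code) pairs by owning team."""
    routed = {}
    for text, code in spans:
        # "UNROUTED" is an assumed placeholder for codes with no known owner.
        owner = OWNER_BY_CODE.get(code, "UNROUTED")
        routed.setdefault(owner, []).append((text, code))
    return routed


mc_08 = [
    ("The app crashes constantly", "E2.02"),
    ("when I call support they put me on hold for 45 minutes", "J1.01"),
]
routed = route_spans(mc_08)
print(sorted(routed))  # ['IT', 'Operations']
```

Had the review been kept as a single span with one primary code, only one of the two teams would have received it.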

6. Edge Cases and Boundary Conditions

EC-01: Neutral Observation

Difficulty: Medium

Review Text: "The restaurant has about 50 seats and they serve Italian food. Parking lot fits maybe 20 cars."

GOLD STANDARD CLASSIFICATION:

Valence: V0 (Neutral)
Primary Code: E3.04 (Crowding/Capacity) or A4.02 (Parking)
Intensity: I1

RATIONALE: Pure factual observation without judgment. Assign V0 (Neutral). Code based on what information could be useful (capacity, parking).

EC-02: Sarcasm Detection

Difficulty: Hard

Review Text: "Oh sure, I LOVE waiting 2 hours for a table with a reservation. Really makes me feel valued."

GOLD STANDARD CLASSIFICATION:

Primary Code: J1.01 (Wait Time)
Valence: V- (negative despite positive words)
Intensity: I3 (CAPS, sarcasm = strong)
CR: CR-N

RATIONALE: Sarcasm inverts literal meaning. "LOVE" + extreme wait = clear complaint. Code the actual sentiment (V-), not the literal words.

EC-03: Future Intent Statement

Difficulty: Medium

Review Text: "If they don't fix the parking situation, I won't be coming back."

GOLD STANDARD CLASSIFICATION:

Primary Code: A4.02 (Parking)
Valence: V-
Intensity: I2
Temporal: TF (Future)

RATIONALE: Conditional future statement. Code the issue being complained about (parking). Temporal TF because it's about future intent/expectation.

EC-04: Competitor Comparison (NOT CR)

Difficulty: Hard

Review Text: "Their customer service is way better than Amazon's."

GOLD STANDARD CLASSIFICATION:

Primary Code: P (People domain)
Valence: V+
CR: CR-N (NOT CR-B)

RATIONALE: Competitor comparison is NOT CR. CR is only for self-comparison to customer's own past experience. This is CR-N despite "better than."

EC-05: Implicit Decline

Difficulty: Expert

Review Text: "This used to be my favorite restaurant."

GOLD STANDARD CLASSIFICATION:

Primary Code: R2.02 (Consistency)
Valence: V-
Intensity: I2
Temporal: TH
CR: CR-W

RATIONALE: "Used to be" implies past-positive, present-negative = CR-W (decline). The statement is negative despite no explicit current complaint. Past tense + "favorite" = implicit decline.

EC-06: Mixed Temporal Reference

Difficulty: Expert

Review Text: "The breakfast has always been good but lately the portions have gotten smaller."

GOLD STANDARD CLASSIFICATION:

Span 1: "The breakfast has always been good"
- Primary: O2.02 (Craftsmanship)
- V+/TH/CR-S
Span 2: "lately the portions have gotten smaller"
- Primary: O3.03 (Scope Delivery)
- V-/TR/CR-W

RATIONALE: Two different temporal references and valences. Split and assign appropriate T and CR values to each.

EC-07: Identity-Framed Complaint

Difficulty: Hard

Review Text: "As a vegetarian, there was literally nothing I could eat on the menu."

GOLD STANDARD CLASSIFICATION:

Primary Code: A3.03 (Dietary/Medical)
Valence: V-
Intensity: I3 ("literally nothing")

RATIONALE: A3.03 (dietary accommodation) not O4.03 (flexibility) because it's framed as identity-based inclusion ("as a vegetarian").

EC-08: Praise with Caveat

Difficulty: Medium

Review Text: "Everything was perfect... except for the price. But still worth it overall."

GOLD STANDARD CLASSIFICATION:

Span 1: "Everything was perfect"
- V+/I3
Span 2: "except for the price"
- Primary: V1.01 (Absolute Price)
- V-/I2
Span 3: "But still worth it overall"
- Primary: V4.01 (Overall Value)
- V+/I2

RATIONALE: Three distinct assessments. Final assessment is positive despite price complaint. Split at each valence shift.

EC-09: Ambiguous Referent

Difficulty: Expert

Review Text: "They really need to fix that."

GOLD STANDARD CLASSIFICATION:

Evidence: EC (Contextual)
Primary Code: [Cannot determine without context]

RATIONALE: Ambiguous referent ("that") requires surrounding context to classify. Mark Evidence as EC. If no context available, flag for review.

EC-10: Causal Without Evidence

Difficulty: Expert

Review Text: "The food took forever. They're probably understaffed or something."

GOLD STANDARD CLASSIFICATION:

Primary Code: J1.02 (Service Speed)
Valence: V-
Intensity: I2
Causal Chain: NONE

RATIONALE: "Probably" indicates speculation. Do NOT assign CD-O (Understaffing) as causal code. Causal codes require explicit statement or logical entailment, not customer guessing.
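The causal rule can be expressed as a small validation check. The sketch below assumes that a causal code is acceptable only when its evidence is ES (explicitly stated) or EI (logically entailed), as in MC-05 and MC-06. The function name and the treatment of EC as insufficient are assumptions for illustration.

```python
from typing import Optional

# Evidence values assumed to support a causal code under the rule above:
# ES = explicitly stated, EI = logically entailed.
SUPPORTING_EVIDENCE = {"ES", "EI"}


def causal_code_is_valid(causal_code: Optional[str], evidence: Optional[str]) -> bool:
    """Return True when a causal assignment (or its absence) is acceptable."""
    if causal_code is None:
        return True  # "Causal Chain: NONE" is always acceptable
    return evidence in SUPPORTING_EVIDENCE


print(causal_code_is_valid("CD-S", "ES"))  # MC-05: True
print(causal_code_is_valid("MG-O", "EI"))  # MC-06: True
print(causal_code_is_valid(None, None))    # EC-10 gold standard: True
print(causal_code_is_valid("CD-O", None))  # customer guess, no evidence: False
```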

7. Certification Test Structure

7.1 Entry-Level Test (Lite Profile)

Target: New annotators beginning URT training | Items: 20 | Time Limit: 30 minutes | Required Accuracy: >= 85%

Component	Items	Focus
Domain classification	14	Correctly identify O-P-J-E-A-V-R
Valence assignment	6	V+, V-, V0, V+-

Test Pool: Easy difficulty items only

O-EASY-01, O-EASY-02
P-EASY-01, P-EASY-02
J-EASY-01, J-EASY-02
E-EASY-01, E-EASY-02
A-EASY-01, A-EASY-02
V-EASY-01, V-EASY-02
R-EASY-01, R-EASY-02
Plus 6 additional Easy items for valence focus

7.2 Standard Certification (Core Profile)

Target: Annotators for dashboard/trend work | Items: 40 | Time Limit: 60 minutes | Required Accuracy: >= 85%

Component	Items	Focus
Domain classification	10	All 7 domains
Category classification	20	Correctly identify O1-O4, P1-P4, etc.
Valence + Intensity	10	Correct V and I assignment

Test Pool: Easy + Medium difficulty

All Easy items
All Medium items (O-MEDIUM-01, O-MEDIUM-02, etc.)
Selected confusion pairs (CP-VR-01, CP-VR-02)

7.3 Advanced Certification (Standard Profile)

Target: Annotators for full analytics pipeline | Items: 60 | Time Limit: 90 minutes | Required Accuracy: >= 80%

Component	Items	Focus
Subcode classification	30	Correct X.XX subcode
Multi-code assignment	10	Primary + Secondary
Full metadata	15	All 7 dimensions
Confusion pairs	5	V vs R, J4 vs R3, etc.

Test Pool: Easy + Medium + Hard difficulty

All Easy and Medium items
All Hard items
All confusion pair examples
Multi-code examples MC-01 through MC-04

7.4 Expert Certification (Full Profile)

Target: Lead annotators, QA reviewers, gold standard creators | Items: 80 | Time Limit: 120 minutes | Required Accuracy: >= 80%

Component	Items	Focus
Subcode classification	30	All difficulty levels
Multi-code + Causal	15	Including causal chain
Full metadata	20	All dimensions including CR
Edge cases	10	Boundary conditions
Complex scenarios	5	Expert difficulty items

Test Pool: All difficulty levels

All items from all difficulty levels
All multi-code examples (MC-01 through MC-08)
All edge cases (EC-01 through EC-10)
Expert items (O-EXPERT-01, R-EXPERT-01, etc.)
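The difficulty-based pools in Sections 7.1 through 7.4 can be sketched as a filter over item IDs. This assumes domain items follow the `<DOMAIN>-<DIFFICULTY>-<NN>` pattern used in this corpus (for example `V-EASY-01`). The names `POOL_DIFFICULTIES` and `domain_items_for` are hypothetical, and confusion pairs, multi-code examples, and edge cases would need to be added to each pool separately.

```python
# Difficulty tiers admitted to each profile's test pool (Sections 7.1-7.4).
POOL_DIFFICULTIES = {
    "Lite": {"EASY"},
    "Core": {"EASY", "MEDIUM"},
    "Standard": {"EASY", "MEDIUM", "HARD"},
    "Full": {"EASY", "MEDIUM", "HARD", "EXPERT"},
}


def domain_items_for(profile, item_ids):
    """Filter IDs of the form <DOMAIN>-<DIFFICULTY>-<NN> for a profile's pool."""
    allowed = POOL_DIFFICULTIES[profile]
    selected = []
    for item_id in item_ids:
        parts = item_id.split("-")
        # IDs such as MC-01 or CP-VR-01 carry no difficulty tier and are skipped.
        if len(parts) == 3 and parts[1] in allowed:
            selected.append(item_id)
    return selected


sample = ["V-EASY-01", "V-MEDIUM-02", "R-HARD-01", "R-EXPERT-01", "MC-01", "CP-VR-01"]
core_pool = domain_items_for("Core", sample)
full_pool = domain_items_for("Full", sample)
print(core_pool)  # ['V-EASY-01', 'V-MEDIUM-02']
print(full_pool)  # ['V-EASY-01', 'V-MEDIUM-02', 'R-HARD-01', 'R-EXPERT-01']
```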

8. Scoring Rubric

8.1 Point Values by Component

Component	Correct	Partial	Incorrect
Domain (Tier 1)	4 pts	0 pts	0 pts
Category (Tier 2)	3 pts	1 pt (correct domain)	0 pts
Subcode (Tier 3)	2 pts	1 pt (correct category)	0 pts
Valence	2 pts	0 pts	0 pts
Intensity	1 pt	0 pts	0 pts
Secondary Code (each)	1 pt	0.5 pt (correct domain)	0 pts
Specificity	1 pt	0 pts	0 pts
Actionability	1 pt	0 pts	0 pts
Temporal	1 pt	0 pts	0 pts
Evidence	1 pt	0 pts	0 pts
Comparative Reference	1 pt	0 pts	0 pts
Causal Code (each)	2 pts	0 pts	-1 pt (false positive)
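The tiered partial-credit rules for the primary code can be sketched as follows. This is an illustration, not the official scorer: `split_code` and `score_primary` are hypothetical helpers, and the sketch sums all three tiers at once, whereas each profile scores only the tiers listed for it in Section 8.3.

```python
def split_code(code):
    """Split a subcode such as 'V4.02' into (domain, category, subcode)."""
    category = code.split(".")[0]  # 'V4'
    return category[0], category, code


def score_primary(gold, predicted):
    """Score a primary code using the Domain/Category/Subcode rows of 8.1."""
    g_dom, g_cat, g_sub = split_code(gold)
    p_dom, p_cat, p_sub = split_code(predicted)

    domain_pts = 4 if p_dom == g_dom else 0

    if p_cat == g_cat:
        category_pts = 3
    elif p_dom == g_dom:
        category_pts = 1  # partial: wrong category, correct domain
    else:
        category_pts = 0

    if p_sub == g_sub:
        subcode_pts = 2
    elif p_cat == g_cat:
        subcode_pts = 1  # partial: wrong subcode, correct category
    else:
        subcode_pts = 0

    return domain_pts + category_pts + subcode_pts


print(score_primary("V4.01", "V4.01"))  # 9: all three tiers correct
print(score_primary("V4.01", "V4.02"))  # 8: subcode partial credit
print(score_primary("V4.01", "V1.01"))  # 5: domain plus category partial credit
print(score_primary("V4.01", "R1.04"))  # 0: wrong domain
```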

8.2 Error Severity Weights

Severity	Weight	Examples
Critical	1.0	Wrong domain, wrong valence
Major	0.5	Wrong category, intensity off by 2, wrong CR direction
Minor	0.25	Wrong subcode (same category), intensity off by 1
Slip	0.1	Typo, boundary off by <5 chars
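The severity table can be read as a classification of each mismatch. The sketch below covers only the code, valence, and intensity examples listed in the table. The function names are hypothetical, and how the weights are combined into a final score is not specified here, so the sketch stops at looking up the weight.

```python
SEVERITY_WEIGHTS = {"Critical": 1.0, "Major": 0.5, "Minor": 0.25, "Slip": 0.1}


def code_severity(gold, predicted):
    """Classify a primary-code error by the highest tier that differs."""
    if gold == predicted:
        return None
    if gold[0] != predicted[0]:
        return "Critical"  # wrong domain
    if gold.split(".")[0] != predicted.split(".")[0]:
        return "Major"     # wrong category
    return "Minor"         # wrong subcode, same category


def valence_severity(gold, predicted):
    """Any valence mismatch is listed as Critical."""
    return None if gold == predicted else "Critical"


def intensity_severity(gold, predicted):
    """Classify an intensity error ('I1'..'I3') by how far off it is."""
    gap = abs(int(gold[1]) - int(predicted[1]))
    if gap == 0:
        return None
    return "Major" if gap >= 2 else "Minor"


severity = code_severity("V4.01", "R1.04")
print(severity, SEVERITY_WEIGHTS[severity])  # Critical 1.0
print(intensity_severity("I3", "I1"))        # Major
print(intensity_severity("I2", "I3"))        # Minor
```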

8.3 Profile-Specific Max Scores

Profile	Max Points per Item	Components Scored
Lite	6 pts	Domain (4) + Valence (2)
Core	10 pts	Domain (4) + Category (3) + V (2) + I (1)
Standard	16 pts	Subcode (2) + V (2) + I (1) + S (1) + A (1) + T (1) + E (1) + CR (1) + Secondary (2x1) + Domain context (4)
Full	22 pts	Standard + Causal (up to 3x2)

9. Pass/Fail Criteria

9.1 Certification Thresholds

Level	Overall Accuracy	Critical Errors	Domain Accuracy
Entry (Lite)	>= 85%	0 allowed	>= 90%
Standard (Core)	>= 85%	0 allowed	>= 90%
Advanced (Standard)	>= 80%	0 allowed	>= 85%
Expert (Full)	>= 80%	0 allowed	>= 85%
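The three threshold columns combine as a single conjunction: a candidate must clear all of them. A minimal sketch, assuming accuracies are expressed as fractions between 0 and 1; `THRESHOLDS` and `meets_thresholds` are hypothetical names.

```python
# (minimum overall accuracy, minimum domain accuracy) per level, from the table.
THRESHOLDS = {
    "Entry": (0.85, 0.90),
    "Standard": (0.85, 0.90),
    "Advanced": (0.80, 0.85),
    "Expert": (0.80, 0.85),
}


def meets_thresholds(level, overall_acc, domain_acc, critical_errors):
    """Return True only if every column of the threshold table is satisfied."""
    min_overall, min_domain = THRESHOLDS[level]
    return (
        overall_acc >= min_overall
        and domain_acc >= min_domain
        and critical_errors == 0
    )


print(meets_thresholds("Advanced", 0.82, 0.86, 0))  # True
print(meets_thresholds("Entry", 0.90, 0.95, 1))     # False: one critical error
print(meets_thresholds("Standard", 0.88, 0.89, 0))  # False: domain accuracy < 90%
```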

9.2 Automatic Failure Conditions

The following result in automatic test failure regardless of overall score:

Wrong Domain on 3+ items - indicates fundamental misunderstanding
Wrong Valence on 2+ items - basic sentiment recognition failure
V vs R confusion on both test items - critical disambiguation failure
J4 vs R3 confusion on both test items - critical disambiguation failure
Invented causal codes - assigning a causal code without supporting evidence
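The automatic failure conditions can be sketched as a list of independent checks, any one of which fails the test. The function and parameter names are assumptions. Each argument is a count taken from a scored test, and the two confusion-pair arguments count how many of the two paired items were confused.

```python
def automatic_failures(wrong_domain, wrong_valence, vr_confused,
                       j4_r3_confused, unsupported_causal):
    """Return the automatic-failure conditions triggered by a scored test."""
    reasons = []
    if wrong_domain >= 3:
        reasons.append("wrong domain on 3+ items")
    if wrong_valence >= 2:
        reasons.append("wrong valence on 2+ items")
    if vr_confused >= 2:
        reasons.append("V vs R confusion on both test items")
    if j4_r3_confused >= 2:
        reasons.append("J4 vs R3 confusion on both test items")
    if unsupported_causal >= 1:
        reasons.append("causal code assigned without evidence")
    return reasons


clean = automatic_failures(2, 1, 1, 0, 0)
failed = automatic_failures(3, 0, 2, 0, 1)
print(clean)   # []
print(failed)  # three triggered conditions
```

An empty list means no automatic failure applies, so the result falls back to the score thresholds in Section 9.1.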

9.3 Retake Policy

Failure Type	Required Action	Wait Period
Score < threshold	Review A1 materials	3 days
Critical error(s)	Targeted training session	5 days
Domain accuracy failure	Domain-specific training	5 days
Second failure	Supervisor review required	7 days
Third failure	Consider alternative assignment	N/A

9.4 Ongoing Maintenance

Certified annotators must maintain:

Requirement	Frequency	Threshold
Accuracy spot check	Weekly	>= 90%
Calibration attendance	Weekly	>= 90% of sessions
Recertification quiz	Quarterly	>= 85%
IAA with peers	Bi-weekly	Kappa >= 0.75
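The IAA row does not say which kappa statistic is meant. As a sketch under the assumption that it is Cohen's kappa between two annotators labelling the same items, agreement can be computed as observed agreement corrected for chance agreement. The label lists below are made-up illustration data, not results from this corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Illustrative domain labels for eight items (invented for this example).
annotator_a = ["V", "V", "R", "R", "J", "J", "P", "P"]
annotator_b = ["V", "V", "R", "V", "J", "J", "P", "P"]

kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 3))  # 0.833
print(kappa >= 0.75)    # True
```

Here the annotators agree on 7 of 8 items (0.875) and chance agreement is 0.25, so kappa is (0.875 - 0.25) / (1 - 0.25), about 0.833.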

9.5 Certification Levels Summary

Level	Profile	Test Items	Pass Score	Privileges
Entry	Lite	20	85%	Can annotate Lite profile
Standard	Core	40	85%	Can annotate Core profile
Advanced	Standard	60	80%	Can annotate Standard profile
Expert	Full	80	80%	Full profile + QA reviewer + Gold standard creation

Document References

Document	Location	Purpose
URT-Specification-v5.1.md	`/urt-taxonomy/spec/`	Full taxonomy reference
A1-Annotator-Quickstart.md	`/urt-taxonomy/track-a-training/`	Quick reference guide
A2-QA-Protocol.md	`/urt-taxonomy/track-a-training/`	Quality assurance procedures
B1-urt-codes.yaml	`/urt-taxonomy/track-b-engineering/`	Machine-readable code registry

Appendix: Quick Reference Tables

All 7 Domains

Code	Domain	Core Question
O	Offering	Does it work?
P	People	How did they treat me?
J	Journey	Was it smooth?
E	Environment	Is the space okay?
A	Access	Can I get it?
V	Value	Is it worth it?
R	Relationship	Can I trust them?

All 28 Categories

O	P	J	E	A	V	R
O1 Function	P1 Attitude	J1 Timing	E1 Physical	A1 Availability	V1 Price	R1 Integrity
O2 Quality	P2 Competence	J2 Ease	E2 Digital	A2 Accessibility	V2 Transparency	R2 Dependability
O3 Completeness	P3 Responsiveness	J3 Reliability	E3 Ambiance	A3 Inclusivity	V3 Effort	R3 Recovery
O4 Fit	P4 Communication	J4 Resolution	E4 Safety	A4 Convenience	V4 Worth	R4 Loyalty
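
The two tables above determine how a code such as `P3.04` or `J4` breaks down: a domain letter, a category digit from 1 to 4, and an optional subcode. The sketch below is a hypothetical helper, not part of the B1 code registry. It assumes subcodes are written as a dot followed by two digits, as in the codes used in this document, and it does not check whether a given subcode number exists in the registry.

```python
import re

DOMAINS = {
    "O": "Offering", "P": "People", "J": "Journey", "E": "Environment",
    "A": "Access", "V": "Value", "R": "Relationship",
}

# Category names in order 1..4, copied from the table above.
CATEGORIES = {
    "O": ("Function", "Quality", "Completeness", "Fit"),
    "P": ("Attitude", "Competence", "Responsiveness", "Communication"),
    "J": ("Timing", "Ease", "Reliability", "Resolution"),
    "E": ("Physical", "Digital", "Ambiance", "Safety"),
    "A": ("Availability", "Accessibility", "Inclusivity", "Convenience"),
    "V": ("Price", "Transparency", "Effort", "Worth"),
    "R": ("Integrity", "Dependability", "Recovery", "Loyalty"),
}

CODE_PATTERN = re.compile(r"([OPJEAVR])([1-4])(?:\.(\d{2}))?")


def parse_code(code):
    """Split a code into (domain name, category name, subcode or None)."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid URT category or subcode: {code!r}")
    letter, category, subcode = match.groups()
    return DOMAINS[letter], CATEGORIES[letter][int(category) - 1], subcode
```

For instance, `parse_code("P3.04")` returns `("People", "Responsiveness", "04")`, and `parse_code("J4")` returns `("Journey", "Resolution", None)`.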

Metadata Quick Reference

Dimension	Values	Default
Valence	V+, V-, V0, V+-	None
Intensity	I1, I2, I3	None
Specificity	S1, S2, S3	None
Actionability	A1, A2, A3	None
Temporal	TC, TR, TH, TF	TC
Evidence	ES, EI, EC	ES
Comparative	CR-N, CR-B, CR-W, CR-S	CR-N
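
One way to apply the defaults in this table is sketched below. It interprets a default of "None" as "no default, so the annotator must supply a value", which is an assumption about the table's meaning. The dimension keys and the function name are illustrative, and profile-specific requirements (which dimensions each profile annotates) are ignored here.

```python
# dimension: (allowed values, default); a default of None means "must be supplied".
METADATA = {
    "valence": ({"V+", "V-", "V0", "V+-"}, None),
    "intensity": ({"I1", "I2", "I3"}, None),
    "specificity": ({"S1", "S2", "S3"}, None),
    "actionability": ({"A1", "A2", "A3"}, None),
    "temporal": ({"TC", "TR", "TH", "TF"}, "TC"),
    "evidence": ({"ES", "EI", "EC"}, "ES"),
    "comparative": ({"CR-N", "CR-B", "CR-W", "CR-S"}, "CR-N"),
}


def complete_metadata(annotation):
    """Validate metadata values and fill in the documented defaults."""
    result = {}
    for dimension, (allowed, default) in METADATA.items():
        value = annotation.get(dimension, default)
        if value is None:
            raise ValueError(f"{dimension} has no default and must be annotated")
        if value not in allowed:
            raise ValueError(f"{value!r} is not a valid {dimension} value")
        result[dimension] = value
    return result
```

An annotation that supplies only valence, intensity, specificity, and actionability would be completed with `TC`, `ES`, and `CR-N` for the remaining three dimensions.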

URT v5.1 Calibration Test Set | Track A: Training Materials
Gold Standard Corpus for Annotator Certification
