New structure: - scrapers/google_reviews/v1_0_0.py (was modules/scraper_clean.py) - scrapers/base.py (BaseScraper interface) - scrapers/registry.py (ScraperRegistry for version routing) - core/database.py, models.py, config.py, enums.py - utils/logger.py, crash_analyzer.py, health_checks.py, helpers.py, date_converter.py - workers/chrome_pool.py - services/webhook_service.py - api/ routes structure (empty, ready for Phase 2) - tests/ structure mirroring source All imports updated in: - api_server_production.py (7 import paths updated) - utils/health_checks.py (scraper import path) Legacy modules moved to modules/_legacy/: - data_storage.py, image_handler.py, s3_handler.py (unused) Syntax verified, frontend build passing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
7.7 KiB
Universal Review Taxonomy (URT) v5.1 Reference
Overview
The Universal Review Taxonomy (URT) is a classification system for customer feedback. It provides a structured approach to categorizing, annotating, and analyzing review content across any industry.
Key Characteristics
- Three Profiles: Core, Standard, Full (increasing detail)
- Seven Domains: Covering all aspects of customer experience
- Tier-3 Canonical Codes: Format
X#.##(e.g., J1.02, P2.15) - Dimensional Annotation: Valence, intensity, specificity, and more
- Causal Analysis: Root cause chains (Full profile)
Domain Codes
URT organizes feedback into seven domains, each identified by a single letter.
| Domain | Letter | Description |
|---|---|---|
| Offering | O | Product/service quality |
| Price | P | Value, pricing, promotions |
| Journey | J | Customer experience, timing, process |
| Environment | E | Physical/digital space |
| Attitude | A | Staff behavior, service attitude |
| Voice | V | Brand, communication, marketing |
| Relationship | R | Loyalty, trust, long-term relationship |
Tier-3 Code Format
Pattern: [OPJEAVR][1-4]\.[0-9]{2}
Examples:
J1.02- Journey domain, category 1, subcategory 02P2.15- Price domain, category 2, subcategory 15A3.01- Attitude domain, category 3, subcategory 01
Dimension Codes
Valence
Indicates the sentiment direction of the feedback.
| Code | Meaning |
|---|---|
| V+ | Positive |
| V- | Negative |
| V0 | Neutral |
| V± | Mixed |
Intensity
Indicates the strength of the expressed sentiment.
| Code | Meaning |
|---|---|
| I1 | Low intensity |
| I2 | Moderate intensity |
| I3 | High intensity |
Specificity (Standard+)
Indicates how detailed the feedback is.
| Code | Meaning |
|---|---|
| S1 | Low - vague, general |
| S2 | Medium - some detail |
| S3 | High - specific, precise |
Actionability (Standard+)
Indicates whether clear actions can be derived from the feedback.
| Code | Meaning |
|---|---|
| A1 | None - no clear action |
| A2 | Unclear - possible actions |
| A3 | Clear - specific actionable |
Temporal (Standard+)
Indicates the time frame referenced in the feedback.
| Code | Meaning | Markers |
|---|---|---|
| TC | Current - this visit | "today", "this time", "yesterday" |
| TR | Recent - last few visits | "lately", "recently", "again" |
| TH | Historical - long-standing | "for years", "always", "historically" |
| TF | Future - expectations | "I won't come back", "next time" |
Default: TC when no temporal language exists.
Evidence (Standard+)
Indicates how the information was obtained from the text.
| Code | Meaning | Example |
|---|---|---|
| ES | Stated - explicit in text | "Waited 45 minutes" |
| EI | Inferred - logically entailed | "Took 3 weeks to reply" → slow response |
| EC | Contextual - depends on context | "That happened again" |
Default: ES. Use EI/EC only when needed.
Comparative
Indicates whether the feedback compares to alternatives.
| Code | Meaning |
|---|---|
| CR-N | No comparison |
| CR-B | Better than alternatives |
| CR-W | Worse than alternatives |
| CR-S | Same as alternatives |
USN (URT String Notation)
USN is a compact string encoding for URT annotations.
Grammar
Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full: URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
Encoding Rules
Valence:
+for V+-for V-
Intensity:
1for I12for I23for I3
Examples
Standard Profile:
URT:S:J1.03:-2:22TC.ES.N
Decoded:
- Profile: Standard
- Code: J1.03
- Valence: V- (negative)
- Intensity: I2
- Specificity: S2
- Actionability: A2
- Temporal: TC
- Evidence: ES
- Comparative: CR-N
Full Profile with Causal Chain:
URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O
Decoded:
- Profile: Full
- Codes: J1.01, A1.04
- Valence: V- (negative)
- Intensity: I3
- Specificity: S2
- Actionability: A3
- Temporal: TR
- Evidence: EI
- Comparative: CR-S
- Causal: CD.O (Conditions-Operational), MG.O (Management-Oversight)
Causal Chain (Full Profile Only)
The causal chain identifies root causes across three layers, ordered from immediate to systemic.
Layers
| Layer | Codes | Scope |
|---|---|---|
| conditions | CD-S, CD-T, CD-E, CD-F, CD-O | Staff State, Team Dynamics, Equipment, Facility, Operational |
| management | MG-P, MG-T, MG-O, MG-R, MG-C | Planning, Training, Oversight, Resources, Communication |
| systemic | SY-R, SY-P, SY-C, SY-S, SY-H, SY-X | Resource Decisions, Policy, Culture, Standards, Human Capital, External |
Code Reference
Conditions Layer:
CD-S- Staff StateCD-T- Team DynamicsCD-E- EquipmentCD-F- FacilityCD-O- Operational
Management Layer:
MG-P- PlanningMG-T- TrainingMG-O- OversightMG-R- ResourcesMG-C- Communication
Systemic Layer:
SY-R- Resource DecisionsSY-P- PolicySY-C- CultureSY-S- StandardsSY-H- Human CapitalSY-X- External
JSONB Schema
[
{"layer": "conditions", "code": "CD-O", "evidence": "ES"},
{"layer": "management", "code": "MG-P", "evidence": "EI"}
]
Constraints
- Maximum 3 entries (one per layer)
- Only include when text explicitly supports it
- Order: conditions → management → systemic
Span Boundary Detection Rules
Spans are detected at the clause/topic level, not sentence level.
Split Rules (in priority order)
- Split on contrasting conjunctions: but, however, although, despite, yet
- Split when subject/target changes (topic shift)
- Split when valence changes (positive ↔ negative)
- Split when domain changes (O/P/J/E/A/V/R)
- Keep together for cause→effect within same feedback unit
Guidelines
- Maximum: ~3 spans per sentence
- Validation: If 4+ spans detected, re-check for over-splitting
Example
Input:
"The food was great but the service was slow and the bathroom was dirty."
Output: 3 spans
- "The food was great" (Offering, positive)
- "the service was slow" (Journey/Attitude, negative)
- "the bathroom was dirty" (Environment, negative)
Reasoning: Topic shift + domain shift at each boundary.
Primary Span Selection
When a review contains multiple spans, select the primary span using these criteria in order:
Selection Priority
- Highest intensity (I3 > I2 > I1)
- Tie-break: Negative over positive (V- > V± > V0 > V+)
- Tie-break: Earliest span_index
Example
Given spans:
- Span 0: I2, V+
- Span 1: I3, V+
- Span 2: I3, V-
Primary: Span 2 (highest intensity I3, negative valence wins tie-break)
Secondary Codes Rules
Secondary codes capture additional topics mentioned in a span.
Constraints
- Maximum: 2 secondary codes
- Format: Must be Tier-3 (X#.##)
- Recommendation: Should be different domain from primary
Example
Primary: J1.03 (Journey)
Secondary: A2.01, E1.05 (Attitude, Environment)
Quick Reference Card
Profiles
| Profile | Dimensions | Causal Chain |
|---|---|---|
| Core | V, I | No |
| Standard | V, I, S, A, T, E, CR | No |
| Full | V, I, S, A, T, E, CR | Yes |
USN Quick Format
URT:{S|F}:{tier3_codes}:{valence}{intensity}:{SAT}.{E}.{CR}[:{causal}]
Domain Letters
O P J E A V R
│ │ │ │ │ │ └─ Relationship
│ │ │ │ │ └─── Voice
│ │ │ │ └───── Attitude
│ │ │ └─────── Environment
│ │ └───────── Journey
│ └─────────── Price
└───────────── Offering