# Universal Review Taxonomy (URT) v5.1 Reference

## Overview

The Universal Review Taxonomy (URT) is a classification system for customer feedback. It provides a structured approach to categorizing, annotating, and analyzing review content across any industry.

### Key Characteristics

- **Three Profiles**: Core, Standard, Full (increasing detail)
- **Seven Domains**: Covering all aspects of customer experience
- **Tier-3 Canonical Codes**: Format `X#.##` (e.g., J1.02, P2.15)
- **Dimensional Annotation**: Valence, intensity, specificity, and more
- **Causal Analysis**: Root cause chains (Full profile)

---

## Domain Codes

URT organizes feedback into seven domains, each identified by a single letter.

| Domain | Letter | Description |
|--------|--------|-------------|
| Offering | O | Product/service quality |
| Price | P | Value, pricing, promotions |
| Journey | J | Customer experience, timing, process |
| Environment | E | Physical/digital space |
| Attitude | A | Staff behavior, service attitude |
| Voice | V | Brand, communication, marketing |
| Relationship | R | Loyalty, trust, long-term relationship |

### Tier-3 Code Format

```
Pattern: [OPJEAVR][1-4]\.[0-9]{2}
```

Examples:
- `J1.02` - Journey domain, category 1, subcategory 02
- `P2.15` - Price domain, category 2, subcategory 15
- `A3.01` - Attitude domain, category 3, subcategory 01

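The pattern above can be checked and decomposed with a few lines of Python. This is a minimal sketch, not part of the URT specification: the names `Tier3Code`, `DOMAINS`, and `parse_tier3` are hypothetical, and the only rules applied are the regular expression and the domain table given above.

```python
import re
from typing import NamedTuple

# Same pattern as documented above, with capture groups added.
TIER3_RE = re.compile(r"([OPJEAVR])([1-4])\.([0-9]{2})")

# Domain letters as listed in the Domain Codes table.
DOMAINS = {
    "O": "Offering", "P": "Price", "J": "Journey", "E": "Environment",
    "A": "Attitude", "V": "Voice", "R": "Relationship",
}


class Tier3Code(NamedTuple):
    domain: str
    category: int
    subcategory: int


def parse_tier3(code: str) -> Tier3Code:
    """Split a Tier-3 code into domain name, category and subcategory."""
    # fullmatch rejects strings with leading or trailing extra characters.
    match = TIER3_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a Tier-3 code: {code!r}")
    letter, category, subcategory = match.groups()
    return Tier3Code(DOMAINS[letter], int(category), int(subcategory))
```

For example, `parse_tier3("J1.02")` yields the Journey domain with category 1 and subcategory 2, while `parse_tier3("J5.02")` raises `ValueError` because the category digit falls outside `1-4`.
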

---

## Dimension Codes

### Valence

Indicates the sentiment direction of the feedback.

| Code | Meaning |
|------|---------|
| V+ | Positive |
| V- | Negative |
| V0 | Neutral |
| V± | Mixed |

### Intensity

Indicates the strength of the expressed sentiment.

| Code | Meaning |
|------|---------|
| I1 | Low intensity |
| I2 | Moderate intensity |
| I3 | High intensity |

### Specificity (Standard+)

Indicates how detailed the feedback is.

| Code | Meaning |
|------|---------|
| S1 | Low - vague, general |
| S2 | Medium - some detail |
| S3 | High - specific, precise |

### Actionability (Standard+)

Indicates whether clear actions can be derived from the feedback.

| Code | Meaning |
|------|---------|
| A1 | None - no clear action |
| A2 | Unclear - possible actions |
| A3 | Clear - specific, actionable |

### Temporal (Standard+)

Indicates the time frame referenced in the feedback.

| Code | Meaning | Markers |
|------|---------|---------|
| TC | Current - this visit | "today", "this time", "yesterday" |
| TR | Recent - last few visits | "lately", "recently", "again" |
| TH | Historical - long-standing | "for years", "always", "historically" |
| TF | Future - expectations | "I won't come back", "next time" |

**Default**: TC when no temporal language exists.

### Evidence (Standard+)

Indicates how the information was obtained from the text.

| Code | Meaning | Example |
|------|---------|---------|
| ES | Stated - explicit in text | "Waited 45 minutes" |
| EI | Inferred - logically entailed | "Took 3 weeks to reply" → slow response |
| EC | Contextual - depends on context | "That happened again" |

**Default**: ES. Use EI/EC only when needed.

### Comparative (Standard+)

Indicates whether the feedback compares to alternatives.

| Code | Meaning |
|------|---------|
| CR-N | No comparison |
| CR-B | Better than alternatives |
| CR-W | Worse than alternatives |
| CR-S | Same as alternatives |


---

## USN (URT String Notation)

USN is a compact string encoding for URT annotations.

### Grammar

```
Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
```

### Encoding Rules

**Valence**:
- `+` for V+
- `-` for V-

**Intensity**:
- `1` for I1
- `2` for I2
- `3` for I3

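To make the grammar concrete, here is a minimal sketch of an encoder for the Standard profile. The `Annotation` class and `encode_standard` function are hypothetical names introduced for illustration. Two details are assumptions inferred from the documented example strings rather than stated rules: multiple codes are joined with `+`, and the comparative code is written as its suffix letter (`CR-N` becomes `N`). Only `V+` and `V-` are handled, because the encoding rules list no symbols for `V0` or `V±`.

```python
from dataclasses import dataclass
from typing import List

# Only the two valence symbols listed in the encoding rules.
VALENCE_SYMBOL = {"V+": "+", "V-": "-"}


@dataclass
class Annotation:
    """Hypothetical container for one Standard-profile annotation."""
    codes: List[str]
    valence: str             # "V+" or "V-"
    intensity: str           # "I1", "I2" or "I3"
    specificity: str         # "S1", "S2" or "S3"
    actionability: str       # "A1", "A2" or "A3"
    temporal: str = "TC"     # documented default
    evidence: str = "ES"     # documented default
    comparative: str = "CR-N"


def encode_standard(a: Annotation) -> str:
    """Build a Standard-profile USN string from an annotation."""
    if a.valence not in VALENCE_SYMBOL:
        raise ValueError(f"no documented USN symbol for {a.valence!r}")
    codes = "+".join(a.codes)                        # assumption: '+' separator
    sentiment = VALENCE_SYMBOL[a.valence] + a.intensity[1:]
    detail = a.specificity[1:] + a.actionability[1:] + a.temporal
    comparative = a.comparative.split("-", 1)[1]     # assumption: "CR-N" -> "N"
    return f"URT:S:{codes}:{sentiment}:{detail}.{a.evidence}.{comparative}"
```

With the defaults in place, `encode_standard(Annotation(["J1.03"], "V-", "I2", "S2", "A2"))` produces `URT:S:J1.03:-2:22TC.ES.N`.
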
### Examples

**Standard Profile**:
```
URT:S:J1.03:-2:22TC.ES.N
```
Decoded:
- Profile: Standard
- Code: J1.03
- Valence: V- (negative)
- Intensity: I2
- Specificity: S2
- Actionability: A2
- Temporal: TC
- Evidence: ES
- Comparative: CR-N

**Full Profile with Causal Chain**:
```
URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O
```
Decoded:
- Profile: Full
- Codes: J1.01, A1.04
- Valence: V- (negative)
- Intensity: I3
- Specificity: S2
- Actionability: A3
- Temporal: TR
- Evidence: EI
- Comparative: CR-S
- Causal: CD.O (Conditions-Operational), MG.O (Management-Oversight)

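The decoding shown above can be sketched as a single regular expression with named groups. This is an illustrative parser, not a reference implementation: `USN_RE` and `parse_usn` are hypothetical names, only the `+` and `-` valence symbols are recognized, causal tokens are returned as raw strings without interpretation, and rejecting a causal segment on a Standard string is an assumption drawn from the grammar.

```python
import re

TIER3 = r"[OPJEAVR][1-4]\.[0-9]{2}"

USN_RE = re.compile(
    r"URT:(?P<profile>[SF])"
    rf":(?P<codes>{TIER3}(?:\+{TIER3})*)"
    r":(?P<valence>[+-])(?P<intensity>[1-3])"
    r":(?P<spec>[1-3])(?P<act>[1-3])(?P<temporal>T[CRHF])"
    r"\.(?P<evidence>E[SIC])\.(?P<cr>[NBWS])"
    r"(?::(?P<causal>.+))?"
)


def parse_usn(usn: str) -> dict:
    """Decode a Standard or Full USN string into a dictionary of codes."""
    m = USN_RE.fullmatch(usn)
    if m is None:
        raise ValueError(f"not a valid USN string: {usn!r}")
    if m["profile"] == "S" and m["causal"] is not None:
        # Assumption: only the Full profile carries a causal segment.
        raise ValueError("Standard profile does not carry a causal chain")
    return {
        "profile": {"S": "Standard", "F": "Full"}[m["profile"]],
        "codes": m["codes"].split("+"),
        "valence": "V" + m["valence"],
        "intensity": "I" + m["intensity"],
        "specificity": "S" + m["spec"],
        "actionability": "A" + m["act"],
        "temporal": m["temporal"],
        "evidence": m["evidence"],
        "comparative": "CR-" + m["cr"],
        "causal": m["causal"].split(",") if m["causal"] else [],
    }
```

Calling `parse_usn` on either example string returns the same field values as the decoded lists above.
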

---

## Causal Chain (Full Profile Only)

The causal chain identifies root causes across three layers, ordered from immediate to systemic.

### Layers

| Layer | Codes | Scope |
|-------|-------|-------|
| conditions | CD-S, CD-T, CD-E, CD-F, CD-O | Staff State, Team Dynamics, Equipment, Facility, Operational |
| management | MG-P, MG-T, MG-O, MG-R, MG-C | Planning, Training, Oversight, Resources, Communication |
| systemic | SY-R, SY-P, SY-C, SY-S, SY-H, SY-X | Resource Decisions, Policy, Culture, Standards, Human Capital, External |

### Code Reference

**Conditions Layer**:
- `CD-S` - Staff State
- `CD-T` - Team Dynamics
- `CD-E` - Equipment
- `CD-F` - Facility
- `CD-O` - Operational

**Management Layer**:
- `MG-P` - Planning
- `MG-T` - Training
- `MG-O` - Oversight
- `MG-R` - Resources
- `MG-C` - Communication

**Systemic Layer**:
- `SY-R` - Resource Decisions
- `SY-P` - Policy
- `SY-C` - Culture
- `SY-S` - Standards
- `SY-H` - Human Capital
- `SY-X` - External

### JSONB Schema

```json
[
  {"layer": "conditions", "code": "CD-O", "evidence": "ES"},
  {"layer": "management", "code": "MG-P", "evidence": "EI"}
]
```

### Constraints

- Maximum 3 entries (one per layer)
- Only include when text explicitly supports it
- Order: conditions → management → systemic

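The structural constraints above lend themselves to a mechanical check. The following is a sketch of such a validator, assuming the chain has already been loaded from JSONB into a list of dictionaries. `validate_causal_chain` is a hypothetical name, and the check that `evidence` is one of `ES`, `EI`, `EC` is an assumption based on the Evidence dimension. The rule that the text must support each entry cannot be verified this way.

```python
from typing import Dict, List

# Layer order and per-layer codes, as listed in the Layers table.
LAYER_ORDER = ["conditions", "management", "systemic"]
LAYER_CODES = {
    "conditions": {"CD-S", "CD-T", "CD-E", "CD-F", "CD-O"},
    "management": {"MG-P", "MG-T", "MG-O", "MG-R", "MG-C"},
    "systemic": {"SY-R", "SY-P", "SY-C", "SY-S", "SY-H", "SY-X"},
}
EVIDENCE_CODES = {"ES", "EI", "EC"}


def validate_causal_chain(chain: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the chain is well-formed."""
    problems = []
    if len(chain) > 3:
        problems.append("more than 3 entries")
    seen = []
    for entry in chain:
        layer, code = entry.get("layer"), entry.get("code")
        if layer not in LAYER_CODES:
            problems.append(f"unknown layer: {layer!r}")
            continue
        if code not in LAYER_CODES[layer]:
            problems.append(f"code {code!r} does not belong to layer {layer!r}")
        if entry.get("evidence") not in EVIDENCE_CODES:
            problems.append(f"unknown evidence code in layer {layer!r}")
        if layer in seen:
            problems.append(f"layer {layer!r} appears more than once")
        seen.append(layer)
    # Entries must run from the immediate layer to the systemic one.
    positions = [LAYER_ORDER.index(layer) for layer in seen]
    if positions != sorted(positions):
        problems.append("entries are not ordered conditions, management, systemic")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a caller report every issue in a chain at once.
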

---

## Span Boundary Detection Rules

Spans are detected at the clause/topic level, not sentence level.

### Split Rules (in priority order)

1. **Split on contrasting conjunctions**: but, however, although, despite, yet
2. **Split when subject/target changes** (topic shift)
3. **Split when valence changes** (positive ↔ negative)
4. **Split when domain changes** (O/P/J/E/A/V/R)
5. **Keep together** for cause→effect within same feedback unit

### Guidelines

- **Maximum**: ~3 spans per sentence
- **Validation**: If 4+ spans detected, re-check for over-splitting

### Example

**Input**:
> "The food was great but the service was slow and the bathroom was dirty."

**Output**: 3 spans
1. "The food was great" (Offering, positive)
2. "the service was slow" (Journey/Attitude, negative)
3. "the bathroom was dirty" (Environment, negative)

**Reasoning**: Topic shift + domain shift at each boundary.

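Only the first split rule is purely lexical, so it is the only one that a few lines of code can approximate. The sketch below implements rule 1 alone, plus the over-splitting check from the guidelines. `split_on_contrast` and `needs_review` are hypothetical names. Rules 2 to 5 need topic, valence, and domain judgments that this code does not attempt, and the word list will misfire on adverbial uses such as "not yet".

```python
import re
from typing import List

# Rule 1 only: the contrasting conjunctions listed above, with optional commas.
CONTRAST_RE = re.compile(
    r"\s*,?\s*\b(?:but|however|although|despite|yet)\b\s*,?\s*",
    re.IGNORECASE,
)


def split_on_contrast(sentence: str) -> List[str]:
    """Split a sentence at contrasting conjunctions, dropping empty pieces."""
    parts = [part.strip(" ,.") for part in CONTRAST_RE.split(sentence)]
    return [part for part in parts if part]


def needs_review(spans: List[str]) -> bool:
    """Flag sentences with 4+ spans for an over-splitting re-check."""
    return len(spans) >= 4
```

On the example sentence this yields two pieces, "The food was great" and "the service was slow and the bathroom was dirty". The remaining boundary at "and" comes from rule 2 (topic shift), which is why the documented output has three spans.
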

---

## Primary Span Selection

When a review contains multiple spans, select the primary span using these criteria in order:

### Selection Priority

1. **Highest intensity** (I3 > I2 > I1)
2. **Tie-break**: Negative over positive (V- > V± > V0 > V+)
3. **Tie-break**: Earliest span_index

### Example

Given spans:
- Span 0: I2, V+
- Span 1: I3, V+
- Span 2: I3, V-

**Primary**: Span 2 (highest intensity I3, negative valence wins tie-break)

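The three criteria map directly onto a composite sort key. The sketch below assumes each span is a dictionary with `intensity`, `valence`, and `span_index` keys. That layout and the name `select_primary` are illustrative, not taken from the URT schema.

```python
from typing import Dict, List

# Lower rank wins the valence tie-break: V- > V± > V0 > V+.
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary(spans: List[Dict]) -> Dict:
    """Pick the primary span: highest intensity, then valence, then earliest index."""
    return min(
        spans,
        key=lambda span: (
            -int(span["intensity"][1:]),    # I3 sorts before I2 before I1
            VALENCE_RANK[span["valence"]],  # negative before positive
            span["span_index"],             # earliest span last resort
        ),
    )


spans = [
    {"span_index": 0, "intensity": "I2", "valence": "V+"},
    {"span_index": 1, "intensity": "I3", "valence": "V+"},
    {"span_index": 2, "intensity": "I3", "valence": "V-"},
]
primary = select_primary(spans)
```

Here `primary["span_index"]` is `2`, matching the example above. Tuple comparison applies the criteria strictly in order, so a later criterion is consulted only when all earlier ones tie.
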

---

## Secondary Codes Rules

Secondary codes capture additional topics mentioned in a span.

### Constraints

- **Maximum**: 2 secondary codes
- **Format**: Must be Tier-3 (X#.##)
- **Recommendation**: Should come from a different domain than the primary code

### Example

Primary: `J1.03` (Journey)
Secondary: `A2.01`, `E1.05` (Attitude, Environment)

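A small check can separate the two hard constraints from the softer recommendation. In this sketch, `check_secondary_codes` is a hypothetical helper that returns errors for violations of the maximum and format rules, and warnings when a secondary code shares the primary code's domain letter.

```python
import re
from typing import List, Tuple

# Tier-3 pattern as documented in the Domain Codes section.
TIER3_RE = re.compile(r"[OPJEAVR][1-4]\.[0-9]{2}")


def check_secondary_codes(primary: str, secondary: List[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a span's secondary codes."""
    errors, warnings = [], []
    if len(secondary) > 2:
        errors.append("more than 2 secondary codes")
    for code in secondary:
        if not TIER3_RE.fullmatch(code):
            errors.append(f"{code!r} is not a Tier-3 code")
        elif code[0] == primary[0]:
            # Recommendation only, so this is a warning rather than an error.
            warnings.append(f"{code!r} is in the same domain as the primary code")
    return errors, warnings
```

For the example above, `check_secondary_codes("J1.03", ["A2.01", "E1.05"])` returns two empty lists.
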

---

## Quick Reference Card

### Profiles

| Profile | Dimensions | Causal Chain |
|---------|------------|--------------|
| Core | V, I | No |
| Standard | V, I, S, A, T, E, CR | No |
| Full | V, I, S, A, T, E, CR | Yes |

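The profile table can be expressed as a lookup that reports which dimensions an annotation still lacks. This is a sketch under the assumption that annotations are dictionaries keyed by the dimension abbreviations used in the table. `PROFILE_DIMENSIONS`, `PROFILE_HAS_CAUSAL`, and `missing_dimensions` are hypothetical names.

```python
from typing import Dict, List

STANDARD_DIMENSIONS = ("V", "I", "S", "A", "T", "E", "CR")

# Dimensions required by each profile, copied from the table above.
PROFILE_DIMENSIONS = {
    "Core": ("V", "I"),
    "Standard": STANDARD_DIMENSIONS,
    "Full": STANDARD_DIMENSIONS,
}

# Whether the profile supports a causal chain.
PROFILE_HAS_CAUSAL = {"Core": False, "Standard": False, "Full": True}


def missing_dimensions(profile: str, annotation: Dict[str, str]) -> List[str]:
    """List the dimensions the profile requires that the annotation lacks."""
    return [dim for dim in PROFILE_DIMENSIONS[profile] if dim not in annotation]
```

An annotation holding only `V` and `I` is complete for Core, while `missing_dimensions("Standard", ...)` on the same annotation returns `["S", "A", "T", "E", "CR"]`.
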

### USN Quick Format

```
URT:{S|F}:{tier3_codes}:{valence}{intensity}:{SAT}.{E}.{CR}[:{causal}]
```

### Domain Letters

```
O P J E A V R
│ │ │ │ │ │ └─ Relationship
│ │ │ │ │ └─── Voice
│ │ │ │ └───── Attitude
│ │ │ └─────── Environment
│ │ └───────── Journey
│ └─────────── Price
└───────────── Offering
```