Phase 0: Project restructure to ReviewIQ platform architecture
New structure: - scrapers/google_reviews/v1_0_0.py (was modules/scraper_clean.py) - scrapers/base.py (BaseScraper interface) - scrapers/registry.py (ScraperRegistry for version routing) - core/database.py, models.py, config.py, enums.py - utils/logger.py, crash_analyzer.py, health_checks.py, helpers.py, date_converter.py - workers/chrome_pool.py - services/webhook_service.py - api/ routes structure (empty, ready for Phase 2) - tests/ structure mirroring source All imports updated in: - api_server_production.py (7 import paths updated) - utils/health_checks.py (scraper import path) Legacy modules moved to modules/_legacy/: - data_storage.py, image_handler.py, s3_handler.py (unused) Syntax verified, frontend build passing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
331
.artifacts/URT-v5.1-Reference.md
Normal file
331
.artifacts/URT-v5.1-Reference.md
Normal file
@@ -0,0 +1,331 @@
|
||||
# Universal Review Taxonomy (URT) v5.1 Reference
|
||||
|
||||
## Overview
|
||||
|
||||
The Universal Review Taxonomy (URT) is a classification system for customer feedback. It provides a structured approach to categorizing, annotating, and analyzing review content across any industry.
|
||||
|
||||
### Key Characteristics
|
||||
|
||||
- **Three Profiles**: Core, Standard, Full (increasing detail)
|
||||
- **Seven Domains**: Covering all aspects of customer experience
|
||||
- **Tier-3 Canonical Codes**: Format `X#.##` (e.g., J1.02, P2.15)
|
||||
- **Dimensional Annotation**: Valence, intensity, specificity, and more
|
||||
- **Causal Analysis**: Root cause chains (Full profile)
|
||||
|
||||
---
|
||||
|
||||
## Domain Codes
|
||||
|
||||
URT organizes feedback into seven domains, each identified by a single letter.
|
||||
|
||||
| Domain | Letter | Description |
|
||||
|--------|--------|-------------|
|
||||
| Offering | O | Product/service quality |
|
||||
| Price | P | Value, pricing, promotions |
|
||||
| Journey | J | Customer experience, timing, process |
|
||||
| Environment | E | Physical/digital space |
|
||||
| Attitude | A | Staff behavior, service attitude |
|
||||
| Voice | V | Brand, communication, marketing |
|
||||
| Relationship | R | Loyalty, trust, long-term relationship |
|
||||
|
||||
### Tier-3 Code Format
|
||||
|
||||
```
|
||||
Pattern: [OPJEAVR][1-4]\.[0-9]{2}
|
||||
```
|
||||
|
||||
Examples:
|
||||
- `J1.02` - Journey domain, category 1, subcategory 02
|
||||
- `P2.15` - Price domain, category 2, subcategory 15
|
||||
- `A3.01` - Attitude domain, category 3, subcategory 01
|
||||
|
||||
---
|
||||
|
||||
## Dimension Codes
|
||||
|
||||
### Valence
|
||||
|
||||
Indicates the sentiment direction of the feedback.
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| V+ | Positive |
|
||||
| V- | Negative |
|
||||
| V0 | Neutral |
|
||||
| V± | Mixed |
|
||||
|
||||
### Intensity
|
||||
|
||||
Indicates the strength of the expressed sentiment.
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| I1 | Low intensity |
|
||||
| I2 | Moderate intensity |
|
||||
| I3 | High intensity |
|
||||
|
||||
### Specificity (Standard+)
|
||||
|
||||
Indicates how detailed the feedback is.
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| S1 | Low - vague, general |
|
||||
| S2 | Medium - some detail |
|
||||
| S3 | High - specific, precise |
|
||||
|
||||
### Actionability (Standard+)
|
||||
|
||||
Indicates whether clear actions can be derived from the feedback.
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| A1 | None - no clear action |
|
||||
| A2 | Unclear - possible actions |
|
||||
| A3 | Clear - specific actionable |
|
||||
|
||||
### Temporal (Standard+)
|
||||
|
||||
Indicates the time frame referenced in the feedback.
|
||||
|
||||
| Code | Meaning | Markers |
|
||||
|------|---------|---------|
|
||||
| TC | Current - this visit | "today", "this time", "yesterday" |
|
||||
| TR | Recent - last few visits | "lately", "recently", "again" |
|
||||
| TH | Historical - long-standing | "for years", "always", "historically" |
|
||||
| TF | Future - expectations | "I won't come back", "next time" |
|
||||
|
||||
**Default**: TC when no temporal language exists.
|
||||
|
||||
### Evidence (Standard+)
|
||||
|
||||
Indicates how the information was obtained from the text.
|
||||
|
||||
| Code | Meaning | Example |
|
||||
|------|---------|---------|
|
||||
| ES | Stated - explicit in text | "Waited 45 minutes" |
|
||||
| EI | Inferred - logically entailed | "Took 3 weeks to reply" → slow response |
|
||||
| EC | Contextual - depends on context | "That happened again" |
|
||||
|
||||
**Default**: ES. Use EI/EC only when needed.
|
||||
|
||||
### Comparative
|
||||
|
||||
Indicates whether the feedback compares to alternatives.
|
||||
|
||||
| Code | Meaning |
|
||||
|------|---------|
|
||||
| CR-N | No comparison |
|
||||
| CR-B | Better than alternatives |
|
||||
| CR-W | Worse than alternatives |
|
||||
| CR-S | Same as alternatives |
|
||||
|
||||
---
|
||||
|
||||
## USN (URT String Notation)
|
||||
|
||||
USN is a compact string encoding for URT annotations.
|
||||
|
||||
### Grammar
|
||||
|
||||
```
|
||||
Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
|
||||
Full: URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
|
||||
```
|
||||
|
||||
### Encoding Rules
|
||||
|
||||
**Valence**:
|
||||
- `+` for V+
|
||||
- `-` for V-
|
||||
|
||||
**Intensity**:
|
||||
- `1` for I1
|
||||
- `2` for I2
|
||||
- `3` for I3
|
||||
|
||||
### Examples
|
||||
|
||||
**Standard Profile**:
|
||||
```
|
||||
URT:S:J1.03:-2:22TC.ES.N
|
||||
```
|
||||
Decoded:
|
||||
- Profile: Standard
|
||||
- Code: J1.03
|
||||
- Valence: V- (negative)
|
||||
- Intensity: I2
|
||||
- Specificity: S2
|
||||
- Actionability: A2
|
||||
- Temporal: TC
|
||||
- Evidence: ES
|
||||
- Comparative: CR-N
|
||||
|
||||
**Full Profile with Causal Chain**:
|
||||
```
|
||||
URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O
|
||||
```
|
||||
Decoded:
|
||||
- Profile: Full
|
||||
- Codes: J1.01, A1.04
|
||||
- Valence: V- (negative)
|
||||
- Intensity: I3
|
||||
- Specificity: S2
|
||||
- Actionability: A3
|
||||
- Temporal: TR
|
||||
- Evidence: EI
|
||||
- Comparative: CR-S
|
||||
- Causal: CD.O (Conditions-Operational), MG.O (Management-Oversight)
|
||||
|
||||
---
|
||||
|
||||
## Causal Chain (Full Profile Only)
|
||||
|
||||
The causal chain identifies root causes across three layers, ordered from immediate to systemic.
|
||||
|
||||
### Layers
|
||||
|
||||
| Layer | Codes | Scope |
|
||||
|-------|-------|-------|
|
||||
| conditions | CD-S, CD-T, CD-E, CD-F, CD-O | Staff State, Team Dynamics, Equipment, Facility, Operational |
|
||||
| management | MG-P, MG-T, MG-O, MG-R, MG-C | Planning, Training, Oversight, Resources, Communication |
|
||||
| systemic | SY-R, SY-P, SY-C, SY-S, SY-H, SY-X | Resource Decisions, Policy, Culture, Standards, Human Capital, External |
|
||||
|
||||
### Code Reference
|
||||
|
||||
**Conditions Layer**:
|
||||
- `CD-S` - Staff State
|
||||
- `CD-T` - Team Dynamics
|
||||
- `CD-E` - Equipment
|
||||
- `CD-F` - Facility
|
||||
- `CD-O` - Operational
|
||||
|
||||
**Management Layer**:
|
||||
- `MG-P` - Planning
|
||||
- `MG-T` - Training
|
||||
- `MG-O` - Oversight
|
||||
- `MG-R` - Resources
|
||||
- `MG-C` - Communication
|
||||
|
||||
**Systemic Layer**:
|
||||
- `SY-R` - Resource Decisions
|
||||
- `SY-P` - Policy
|
||||
- `SY-C` - Culture
|
||||
- `SY-S` - Standards
|
||||
- `SY-H` - Human Capital
|
||||
- `SY-X` - External
|
||||
|
||||
### JSONB Schema
|
||||
|
||||
```json
|
||||
[
|
||||
{"layer": "conditions", "code": "CD-O", "evidence": "ES"},
|
||||
{"layer": "management", "code": "MG-P", "evidence": "EI"}
|
||||
]
|
||||
```
|
||||
|
||||
### Constraints
|
||||
|
||||
- Maximum 3 entries (one per layer)
|
||||
- Only include when text explicitly supports it
|
||||
- Order: conditions → management → systemic
|
||||
|
||||
---
|
||||
|
||||
## Span Boundary Detection Rules
|
||||
|
||||
Spans are detected at the clause/topic level, not sentence level.
|
||||
|
||||
### Split Rules (in priority order)
|
||||
|
||||
1. **Split on contrasting conjunctions**: but, however, although, despite, yet
|
||||
2. **Split when subject/target changes** (topic shift)
|
||||
3. **Split when valence changes** (positive ↔ negative)
|
||||
4. **Split when domain changes** (O/P/J/E/A/V/R)
|
||||
5. **Keep together** for cause→effect within same feedback unit
|
||||
|
||||
### Guidelines
|
||||
|
||||
- **Maximum**: ~3 spans per sentence
|
||||
- **Validation**: If 4+ spans detected, re-check for over-splitting
|
||||
|
||||
### Example
|
||||
|
||||
**Input**:
|
||||
> "The food was great but the service was slow and the bathroom was dirty."
|
||||
|
||||
**Output**: 3 spans
|
||||
1. "The food was great" (Offering, positive)
|
||||
2. "the service was slow" (Journey/Attitude, negative)
|
||||
3. "the bathroom was dirty" (Environment, negative)
|
||||
|
||||
**Reasoning**: Topic shift + domain shift at each boundary.
|
||||
|
||||
---
|
||||
|
||||
## Primary Span Selection
|
||||
|
||||
When a review contains multiple spans, select the primary span using these criteria in order:
|
||||
|
||||
### Selection Priority
|
||||
|
||||
1. **Highest intensity** (I3 > I2 > I1)
|
||||
2. **Tie-break**: Negative over positive (V- > V± > V0 > V+)
|
||||
3. **Tie-break**: Earliest span_index
|
||||
|
||||
### Example
|
||||
|
||||
Given spans:
|
||||
- Span 0: I2, V+
|
||||
- Span 1: I3, V+
|
||||
- Span 2: I3, V-
|
||||
|
||||
**Primary**: Span 2 (highest intensity I3, negative valence wins tie-break)
|
||||
|
||||
---
|
||||
|
||||
## Secondary Codes Rules
|
||||
|
||||
Secondary codes capture additional topics mentioned in a span.
|
||||
|
||||
### Constraints
|
||||
|
||||
- **Maximum**: 2 secondary codes
|
||||
- **Format**: Must be Tier-3 (X#.##)
|
||||
- **Recommendation**: Should be different domain from primary
|
||||
|
||||
### Example
|
||||
|
||||
Primary: `J1.03` (Journey)
|
||||
Secondary: `A2.01`, `E1.05` (Attitude, Environment)
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference Card
|
||||
|
||||
### Profiles
|
||||
|
||||
| Profile | Dimensions | Causal Chain |
|
||||
|---------|------------|--------------|
|
||||
| Core | V, I | No |
|
||||
| Standard | V, I, S, A, T, E, CR | No |
|
||||
| Full | V, I, S, A, T, E, CR | Yes |
|
||||
|
||||
### USN Quick Format
|
||||
|
||||
```
|
||||
URT:{S|F}:{tier3_codes}:{valence}{intensity}:{SAT}.{E}.{CR}[:{causal}]
|
||||
```
|
||||
|
||||
### Domain Letters
|
||||
|
||||
```
|
||||
O P J E A V R
|
||||
│ │ │ │ │ │ └─ Relationship
|
||||
│ │ │ │ │ └─── Voice
|
||||
│ │ │ │ └───── Attitude
|
||||
│ │ │ └─────── Environment
|
||||
│ │ └───────── Journey
|
||||
│ └─────────── Price
|
||||
└───────────── Offering
|
||||
```
|
||||
Reference in New Issue
Block a user