# URT Taxonomy Gap Analysis

## Executive Summary

The current taxonomy has **significant gaps** that cause ~30-40% of review content to be classified as generic codes (V4.03, O1.05) when more specific codes would be appropriate.

**Current State**: 7 domains, 28 categories, 552 subcodes
**Gap Impact**: ~653 reviews (58% of dataset) mention topics without specific codes

---

## Critical Gaps (High Frequency, No Coverage)

### 🔴 Gap 1: Family/Kids Experience
**Mentions**: 205 reviews (18% of dataset)
**Current Mapping**: → V4.03 (Generic) or O1.05 (Outcome)

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| O1.06 | Family Suitability | Appropriate for children and families |
| O1.07 | Age Appropriateness | Suitable for specific age groups |
| E1.06 | Child-Friendly Facilities | Amenities for children |

**Example Reviews Being Misclassified**:

- "Brilliant day for adults and kids" → V4.03 (should be O1.06)
- "Great family fun" → O1.05 (should be O1.06)
- "Los niños disfrutaron mucho" ("The kids enjoyed it a lot") → V4.03 (should be O1.06)
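Candidate spans for a gap like this one could be surfaced with a keyword filter over the spans currently assigned to the generic codes. The following is a minimal sketch, assuming PostgreSQL; `pipeline.review_spans` and `urt_primary` appear elsewhere in this analysis, but `span_text` is a hypothetical column name for the review text and the keyword list is purely illustrative.

```sql
-- Sketch only: `span_text` is a hypothetical column holding the review text,
-- and the keywords below are illustrative, not taken from the analysis.
SELECT urt_primary,
       COUNT(*) AS candidate_spans
FROM pipeline.review_spans
WHERE urt_primary IN ('V4.03', 'O1.05')          -- generic codes named above
  AND span_text ILIKE ANY (ARRAY['%kids%', '%children%', '%family%', '%niños%'])
GROUP BY urt_primary
ORDER BY candidate_spans DESC;
```

Keyword matching of this kind only yields candidates for manual or model-based review; it is not a substitute for reclassification.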

---

### 🔴 Gap 2: Fun/Entertainment Value
**Mentions**: 198 reviews (18% of dataset)
**Current Mapping**: → V4.03 (Generic) or O1.05 (Outcome)

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| O1.08 | Entertainment Value | How enjoyable/fun the experience was |
| O1.09 | Excitement Level | Thrill and adrenaline factor |
| O1.10 | Engagement | How captivating the experience was |

**Example Reviews Being Misclassified**:

- "Everyone had a blast" → V4.03 (should be O1.08)
- "Muy divertido" ("Very fun") → V4.03 (should be O1.08)
- "Fantastische kartbaan" ("Fantastic karting track") → V4.03 (should be O1.08)

---

### 🔴 Gap 3: Recommendation Intent
**Mentions**: 103 reviews (9% of dataset)
**Current Mapping**: → V4.03 (Generic)

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| R1.06 | Would Recommend | Intent to recommend to others |
| R1.07 | Would Not Recommend | Explicit anti-recommendation |
| V4.06 | Net Promoter Signal | Explicit NPS-style sentiment |

**Example Reviews Being Misclassified**:

- "100% recomendable" ("100% recommendable") → V4.03 (should be R1.06)
- "Highly recommend" → V4.03 (should be R1.06)
- "Don't come here" → V4.03 V- (should be R1.07)

---

### 🟡 Gap 4: Return Intent
**Mentions**: 65 reviews (6% of dataset)
**Current Mapping**: → V4.03 or R4.03

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| R1.08 | Will Return | Intent to visit again |
| R1.09 | Won't Return | Explicit no-return statement |

**Example Reviews**:

- "We'll definitely be back" → R4.03 (should be R1.08)
- "No volveré" ("I won't return") → V4.03 (should be R1.09)

---

### 🟡 Gap 5: Food & Beverage
**Mentions**: 59 reviews (5% of dataset)
**Current Mapping**: → O1.01 or V4.03

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| O2.06 | Food Quality | Taste, freshness, presentation |
| O2.07 | Drink Quality | Beverage quality |
| O2.08 | Menu Variety | Range of food/drink options |
| O2.09 | Portion Size | Amount of food served |

**Example Reviews**:

- "Great food at the cafe" → O1.01 (should be O2.06)
- "Drinks were overpriced" → V1.01 (should be O2.07 + V1.01)

---

### 🟡 Gap 6: Excitement/Thrill
**Mentions**: 23 reviews (2% of dataset)
**Current Mapping**: → V4.03 or O1.05

**Missing Code** (the same code is also proposed under Gap 2):

| Proposed Code | Name | Definition |
|---------------|------|------------|
| O1.09 | Excitement Level | Thrill and adrenaline factor |

---

## Medium Gaps (Moderate Frequency)

### Gap 7: Booking/Reservation Process
**Current**: J2.xx exists but limited

**Missing**:

| Code | Name | Definition |
|------|------|------------|
| J2.06 | Online Booking | Digital reservation experience |
| J2.07 | Booking Confirmation | Clear confirmation process |

---

### Gap 8: Group Experience
**Missing**:

| Code | Name | Definition |
|------|------|------------|
| O1.11 | Group Suitability | Good for groups/parties |
| O1.12 | Team Building | Corporate/team activities |

---

### Gap 9: Seasonal/Weather Factors
**Missing**:

| Code | Name | Definition |
|------|------|------------|
| E1.07 | Weather Protection | Shelter from elements |
| E1.08 | Seasonal Suitability | Appropriate for season |

---

## Impact Analysis

### Current Classification Distribution (V4.03 Overuse)

```
Code    | Count | %    | Issue
--------|-------|------|-------
P1.01   | 477   | 14%  | ✅ Correct usage
V4.03   | 319   | 10%  | ⚠️ Likely 50%+ misclassified
O1.02   | 270   | 8%   | ✅ Correct usage
V1.01   | 211   | 6%   | ✅ Correct usage
O1.01   | 174   | 5%   | ✅ Correct usage
```
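A distribution of this shape could be reproduced with a query along the following lines. This is a sketch, assuming PostgreSQL and assuming the figures above count rows of `pipeline.review_spans` by `urt_primary` (the table and column used in the validation query of this analysis); the source does not state how the table was actually produced.

```sql
-- Share of spans per primary code, largest first.
-- Assumption: one row per classified span in pipeline.review_spans.
SELECT urt_primary                                        AS code,
       COUNT(*)                                           AS span_count,
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 0) AS pct
FROM pipeline.review_spans
GROUP BY urt_primary
ORDER BY span_count DESC
LIMIT 5;
```

The window function `SUM(COUNT(*)) OVER ()` supplies the grand total in the same pass, so no separate subquery is needed for the denominator.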

### Estimated Misclassification Rate

| Gap Topic | Reviews | Est. Misclassified | % of Total |
|-----------|---------|-------------------|------------|
| Family/Kids | 205 | ~180 | 16% |
| Fun/Entertainment | 198 | ~170 | 15% |
| Recommendation | 103 | ~95 | 8% |
| Return Intent | 65 | ~50 | 4% |
| Food/Drinks | 59 | ~40 | 4% |
| Excitement | 23 | ~20 | 2% |
| **TOTAL** | **653** | **~555** | **~49%** |
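The percentages in this table can be cross-checked with a small self-contained query. The sketch below assumes PostgreSQL and a dataset size of 1126 reviews; that total is not stated in the analysis and is inferred here from "653 reviews (58% of dataset)", so it should be replaced with the real count if it is known.

```sql
-- Assumption: total_reviews is inferred (653 / 0.58 ≈ 1126), not given in the source.
WITH gaps (topic, mentions, est_misclassified) AS (
    VALUES
        ('Family/Kids',       205, 180),
        ('Fun/Entertainment', 198, 170),
        ('Recommendation',    103,  95),
        ('Return Intent',      65,  50),
        ('Food/Drinks',        59,  40),
        ('Excitement',         23,  20)
),
params AS (
    SELECT 1126 AS total_reviews
)
SELECT topic,
       mentions,
       est_misclassified,
       ROUND(100.0 * est_misclassified / total_reviews) AS pct_of_total
FROM gaps CROSS JOIN params
UNION ALL
-- Totals row: sums of the per-topic figures above.
SELECT 'TOTAL',
       SUM(mentions),
       SUM(est_misclassified),
       ROUND(100.0 * SUM(est_misclassified) / MAX(total_reviews))
FROM gaps CROSS JOIN params;
```

Note that the totals simply add the per-topic counts, so a review that mentions two topics is counted twice.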

---

## Recommended Taxonomy Additions

### Priority 1: Add to O1 (Core Product/Service)
```sql
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('O1.06', 'O1', 'O', 'Family Suitability', 'Appropriate for children and families'),
('O1.07', 'O1', 'O', 'Age Appropriateness', 'Suitable for specific age groups'),
('O1.08', 'O1', 'O', 'Entertainment Value', 'How enjoyable/fun the experience was'),
('O1.09', 'O1', 'O', 'Excitement Level', 'Thrill and adrenaline factor'),
('O1.10', 'O1', 'O', 'Engagement', 'How captivating the experience was'),
('O1.11', 'O1', 'O', 'Group Suitability', 'Good for groups/parties');
```

### Priority 2: Add to R1 (Relationship/Loyalty)
```sql
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('R1.06', 'R1', 'R', 'Would Recommend', 'Intent to recommend to others'),
('R1.07', 'R1', 'R', 'Would Not Recommend', 'Explicit anti-recommendation'),
('R1.08', 'R1', 'R', 'Will Return', 'Intent to visit again'),
('R1.09', 'R1', 'R', 'Won''t Return', 'Explicit no-return statement');
```

### Priority 3: Add to O2 (Product Features)
```sql
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('O2.06', 'O2', 'O', 'Food Quality', 'Taste, freshness, presentation of food'),
('O2.07', 'O2', 'O', 'Drink Quality', 'Quality of beverages'),
('O2.08', 'O2', 'O', 'Menu Variety', 'Range of food/drink options'),
('O2.09', 'O2', 'O', 'Portion Size', 'Amount of food served');
```
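Once the three inserts have been run, a quick check can confirm that all 14 codes landed. The following is a sketch, assuming PostgreSQL; it uses only the `pipeline.urt_subcodes` table and `code` column from the statements above, and the `expected` list is copied from those statements.

```sql
-- Lists any of the 14 proposed codes that are still absent from the taxonomy.
WITH expected (code) AS (
    VALUES ('O1.06'), ('O1.07'), ('O1.08'), ('O1.09'), ('O1.10'), ('O1.11'),
           ('R1.06'), ('R1.07'), ('R1.08'), ('R1.09'),
           ('O2.06'), ('O2.07'), ('O2.08'), ('O2.09')
)
SELECT e.code AS missing_code
FROM expected AS e
LEFT JOIN pipeline.urt_subcodes AS s ON s.code = e.code
WHERE s.code IS NULL
ORDER BY e.code;
```

An empty result means every proposed code is present.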

---

## Validation Query

After adding codes, verify reduction in V4.03 usage:

```sql
-- Before: V4.03 count
SELECT COUNT(*) FROM pipeline.review_spans WHERE urt_primary = 'V4.03';
-- Expected: ~319

-- After reclassification, target:
-- V4.03: ~100 (true generic)
-- O1.06-O1.11: ~200 (entertainment/family)
-- R1.06-R1.09: ~150 (recommendation/return)
```
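The "after" targets can be checked in a single pass by grouping the primary codes into the three buckets named in the comments. This is a sketch, assuming PostgreSQL and assuming that reclassification updates `urt_primary` in `pipeline.review_spans`; the bucket labels are illustrative.

```sql
-- Span counts per target bucket after reclassification.
SELECT CASE
           WHEN urt_primary = 'V4.03' THEN 'V4.03 (generic)'
           WHEN urt_primary IN ('O1.06', 'O1.07', 'O1.08', 'O1.09', 'O1.10', 'O1.11')
               THEN 'O1.06-O1.11'
           ELSE 'R1.06-R1.09'
       END      AS code_group,
       COUNT(*) AS span_count
FROM pipeline.review_spans
WHERE urt_primary IN ('V4.03',
                      'O1.06', 'O1.07', 'O1.08', 'O1.09', 'O1.10', 'O1.11',
                      'R1.06', 'R1.07', 'R1.08', 'R1.09')
GROUP BY code_group
ORDER BY code_group;
```

The `ELSE` branch is safe here because the `WHERE` clause already restricts rows to the three buckets.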

---

## Conclusion

**Is the taxonomy ready for production?** ❌ **No**

**Critical Issues**:

1. ~58% of reviews mention topics without specific codes, and an estimated ~49% are misclassified as a result
2. V4.03 is a catch-all masking actionable insights
3. Industry-specific codes (entertainment, F&B) are missing

**Recommendation**: Add 14 new subcodes before production to capture:

- Family/Kids experience (O1.06, O1.07)
- Entertainment value (O1.08, O1.09, O1.10)
- Group experience (O1.11)
- Recommendation intent (R1.06, R1.07)
- Return intent (R1.08, R1.09)
- Food/Beverage (O2.06-O2.09)

**Estimated Improvement**: Share of classifications using a specific (non-generic) code rises from ~50% to ~85%.