URT Taxonomy Gap Analysis
Executive Summary
The current taxonomy has significant gaps that cause ~30-40% of review content to fall back to generic codes (V4.03, O1.05) where a more specific code would be appropriate.
Current State: 7 domains, 28 categories, 552 subcodes
Gap Impact: ~653 reviews (58% of dataset) mention topics without specific codes
Critical Gaps (High Frequency, No Coverage)
🔴 Gap 1: Family/Kids Experience
Mentions: 205 reviews (18% of dataset)
Current Mapping: → V4.03 (Generic) or O1.05 (Outcome)
Missing Codes:
| Proposed Code | Name | Definition |
|---|---|---|
| O1.06 | Family Suitability | Appropriate for children and families |
| O1.07 | Age Appropriateness | Suitable for specific age groups |
| E1.06 | Child-Friendly Facilities | Amenities for children |
Example Reviews Being Misclassified:
- "Brilliant day for adults and kids" → V4.03 (should be O1.06)
- "Great family fun" → O1.05 (should be O1.06)
- "Los niños disfrutaron mucho" ("The kids enjoyed it a lot") → V4.03 (should be O1.06)
🔴 Gap 2: Fun/Entertainment Value
Mentions: 198 reviews (18% of dataset)
Current Mapping: → V4.03 (Generic) or O1.05 (Outcome)
Missing Codes:
| Proposed Code | Name | Definition |
|---|---|---|
| O1.08 | Entertainment Value | How enjoyable/fun the experience was |
| O1.09 | Excitement Level | Thrill and adrenaline factor |
| O1.10 | Engagement | How captivating the experience was |
Example Reviews Being Misclassified:
- "Everyone had a blast" → V4.03 (should be O1.08)
- "Muy divertido" ("Very fun") → V4.03 (should be O1.08)
- "Fantastische kartbaan" ("Fantastic go-kart track") → V4.03 (should be O1.08)
🔴 Gap 3: Recommendation Intent
Mentions: 103 reviews (9% of dataset)
Current Mapping: → V4.03 (Generic)
Missing Codes:
| Proposed Code | Name | Definition |
|---|---|---|
| R1.06 | Would Recommend | Intent to recommend to others |
| R1.07 | Would Not Recommend | Explicit anti-recommendation |
| V4.06 | Net Promoter Signal | Explicit NPS-style sentiment |
Example Reviews Being Misclassified:
- "100% recomendable" ("100% recommendable") → V4.03 (should be R1.06)
- "Highly recommend" → V4.03 (should be R1.06)
- "Don't come here" → V4.03 V- (should be R1.07)
🟡 Gap 4: Return Intent
Mentions: 65 reviews (6% of dataset)
Current Mapping: → V4.03 or R4.03
Missing Codes:
| Proposed Code | Name | Definition |
|---|---|---|
| R1.08 | Will Return | Intent to visit again |
| R1.09 | Won't Return | Explicit no-return statement |
Example Reviews:
- "We'll definitely be back" → R4.03 (should be R1.08)
- "No volveré" ("I won't return") → V4.03 (should be R1.09)
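The example remappings in Gaps 1 through 4 can be read as an ordered rule table. The following is a minimal sketch of that idea, not the pipeline's classifier: the phrase patterns and the `suggest_code` helper are hypothetical, chosen only to cover the examples quoted in this document. Negative intents are checked first so that a phrase such as "would not recommend" is not caught by the positive recommendation rule.

```python
import re

# Hypothetical phrase rules (an assumption for illustration, not from the pipeline).
# Order matters: negative intents come first, and the first match wins.
RULES = [
    ("R1.07", r"\b(?:don't come|would not recommend|no recomiendo)\b"),
    ("R1.09", r"\b(?:won't return|won't be back|no volver[ée])\b"),
    ("R1.06", r"\b(?:recommend\w*|recomendable)\b"),
    ("R1.08", r"\b(?:be back|come back)\b"),
    ("O1.06", r"\b(?:kids?|children|family|families|niños)\b"),
    ("O1.08", r"\b(?:fun|blast|divertid[oa])\b"),
]
COMPILED = [(code, re.compile(pattern, re.IGNORECASE)) for code, pattern in RULES]


def suggest_code(text: str, current: str) -> str:
    """Return the first proposed code whose pattern matches, else the current code."""
    for code, pattern in COMPILED:
        if pattern.search(text):
            return code
    return current


if __name__ == "__main__":
    samples = [
        ("Great family fun", "O1.05"),
        ("Don't come here", "V4.03"),
        ("No volveré", "V4.03"),
        ("Fantastische kartbaan", "V4.03"),  # no keyword cue, stays generic
    ]
    for text, current in samples:
        print(f"{text!r}: {current} -> {suggest_code(text, current)}")
```

Keyword rules of this kind only catch explicit cues. "Fantastische kartbaan", which this analysis maps to O1.08, carries no such cue and is left unchanged by the sketch, so a rule table would at best complement the classifier rather than replace it.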
🟡 Gap 5: Food & Beverage
Mentions: 59 reviews (5% of dataset)
Current Mapping: → O1.01 or V4.03
Missing Codes:
| Proposed Code | Name | Definition |
|---|---|---|
| O2.06 | Food Quality | Taste, freshness, presentation |
| O2.07 | Drink Quality | Beverage quality |
| O2.08 | Menu Variety | Range of food/drink options |
| O2.09 | Portion Size | Amount of food served |
Example Reviews:
- "Great food at the cafe" → O1.01 (should be O2.06)
- "Drinks were overpriced" → V1.01 (should be O2.07 + V1.01)
🟡 Gap 6: Excitement/Thrill
Mentions: 23 reviews (2% of dataset)
Current Mapping: → V4.03 or O1.05
Missing Code (the same O1.09 proposed under Gap 2):
| Proposed Code | Name | Definition |
|---|---|---|
| O1.09 | Excitement Level | Thrill and adrenaline factor |
Medium Gaps (Moderate Frequency)
Gap 7: Booking/Reservation Process
Current: J2.xx codes exist, but their coverage is limited
Missing:
| Code | Name | Definition |
|---|---|---|
| J2.06 | Online Booking | Digital reservation experience |
| J2.07 | Booking Confirmation | Clear confirmation process |
Gap 8: Group Experience
Missing:
| Code | Name | Definition |
|---|---|---|
| O1.11 | Group Suitability | Good for groups/parties |
| O1.12 | Team Building | Corporate/team activities |
Gap 9: Seasonal/Weather Factors
Missing:
| Code | Name | Definition |
|---|---|---|
| E1.07 | Weather Protection | Shelter from elements |
| E1.08 | Seasonal Suitability | Appropriate for season |
Impact Analysis
Current Classification Distribution (V4.03 Overuse)
| Code | Count | % | Issue |
|---|---|---|---|
| P1.01 | 477 | 14% | ✅ Correct usage |
| V4.03 | 319 | 10% | ⚠️ Likely 50%+ misclassified |
| O1.02 | 270 | 8% | ✅ Correct usage |
| V1.01 | 211 | 6% | ✅ Correct usage |
| O1.01 | 174 | 5% | ✅ Correct usage |
Estimated Misclassification Rate
| Gap Topic | Reviews | Est. Misclassified | % of Total |
|---|---|---|---|
| Family/Kids | 205 | ~180 | 16% |
| Fun/Entertainment | 198 | ~170 | 15% |
| Recommendation | 103 | ~95 | 8% |
| Return Intent | 65 | ~50 | 4% |
| Food/Drinks | 59 | ~40 | 4% |
| Excitement | 23 | ~20 | 2% |
| TOTAL | 653 | ~555 | ~49% |
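The percentages in this table can be re-derived from the counts. The sketch below assumes the dataset size implied by "653 reviews (58% of dataset)", roughly 1,126 reviews; that figure is inferred, not stated in the source. The totals are plain sums of the per-topic rows, so a review that mentions two topics would be counted once under each.

```python
# (mentions, estimated misclassified) per gap topic, copied from the table.
GAPS = {
    "Family/Kids": (205, 180),
    "Fun/Entertainment": (198, 170),
    "Recommendation": (103, 95),
    "Return Intent": (65, 50),
    "Food/Drinks": (59, 40),
    "Excitement": (23, 20),
}

total_mentions = sum(mentions for mentions, _ in GAPS.values())
total_misclassified = sum(missed for _, missed in GAPS.values())

# Assumption: dataset size inferred from "653 reviews (58% of dataset)".
dataset_size = round(total_mentions / 0.58)


def pct(count: int) -> int:
    """Whole-number percentage of the inferred dataset size."""
    return round(100 * count / dataset_size)


if __name__ == "__main__":
    for topic, (mentions, missed) in GAPS.items():
        print(f"{topic:<18} {mentions:>4} {missed:>4} {pct(missed):>3}%")
    print(f"{'TOTAL':<18} {total_mentions:>4} {total_misclassified:>4} "
          f"{pct(total_misclassified):>3}%")
```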
Recommended Taxonomy Additions
Priority 1: Add to O1 (Core Product/Service)
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('O1.06', 'O1', 'O', 'Family Suitability', 'Appropriate for children and families'),
('O1.07', 'O1', 'O', 'Age Appropriateness', 'Suitable for specific age groups'),
('O1.08', 'O1', 'O', 'Entertainment Value', 'How enjoyable/fun the experience was'),
('O1.09', 'O1', 'O', 'Excitement Level', 'Thrill and adrenaline factor'),
('O1.10', 'O1', 'O', 'Engagement', 'How captivating the experience was'),
('O1.11', 'O1', 'O', 'Group Suitability', 'Good for groups/parties');
Priority 2: Add to R1 (Relationship/Loyalty)
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('R1.06', 'R1', 'R', 'Would Recommend', 'Intent to recommend to others'),
('R1.07', 'R1', 'R', 'Would Not Recommend', 'Explicit anti-recommendation'),
('R1.08', 'R1', 'R', 'Will Return', 'Intent to visit again'),
('R1.09', 'R1', 'R', 'Won''t Return', 'Explicit no-return statement');
Priority 3: Add to O2 (Product Features)
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('O2.06', 'O2', 'O', 'Food Quality', 'Taste, freshness, presentation of food'),
('O2.07', 'O2', 'O', 'Drink Quality', 'Quality of beverages'),
('O2.08', 'O2', 'O', 'Menu Variety', 'Range of food/drink options'),
('O2.09', 'O2', 'O', 'Portion Size', 'Amount of food served');
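Before running the three statements against the pipeline database, the rows could be dry-run in a throwaway database. The sketch below uses an in-memory SQLite database with an attached schema named `pipeline` as a stand-in; the column types, the primary key on `code`, and the `dry_run` helper are assumptions, since the real table definition is not shown here.

```python
import sqlite3

# The 14 proposed rows from Priority 1 to 3, in the same column order.
NEW_SUBCODES = [
    ("O1.06", "O1", "O", "Family Suitability", "Appropriate for children and families"),
    ("O1.07", "O1", "O", "Age Appropriateness", "Suitable for specific age groups"),
    ("O1.08", "O1", "O", "Entertainment Value", "How enjoyable/fun the experience was"),
    ("O1.09", "O1", "O", "Excitement Level", "Thrill and adrenaline factor"),
    ("O1.10", "O1", "O", "Engagement", "How captivating the experience was"),
    ("O1.11", "O1", "O", "Group Suitability", "Good for groups/parties"),
    ("R1.06", "R1", "R", "Would Recommend", "Intent to recommend to others"),
    ("R1.07", "R1", "R", "Would Not Recommend", "Explicit anti-recommendation"),
    ("R1.08", "R1", "R", "Will Return", "Intent to visit again"),
    ("R1.09", "R1", "R", "Won't Return", "Explicit no-return statement"),
    ("O2.06", "O2", "O", "Food Quality", "Taste, freshness, presentation of food"),
    ("O2.07", "O2", "O", "Drink Quality", "Quality of beverages"),
    ("O2.08", "O2", "O", "Menu Variety", "Range of food/drink options"),
    ("O2.09", "O2", "O", "Portion Size", "Amount of food served"),
]


def dry_run(rows):
    """Insert rows into a scratch table; return (row count, inconsistent codes)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("ATTACH DATABASE ':memory:' AS pipeline")
        # Assumed schema: the primary key makes a duplicate code raise IntegrityError.
        conn.execute(
            "CREATE TABLE pipeline.urt_subcodes ("
            "code TEXT PRIMARY KEY, category_code TEXT NOT NULL, "
            "domain_code TEXT NOT NULL, name TEXT NOT NULL, definition TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT INTO pipeline.urt_subcodes "
            "(code, category_code, domain_code, name, definition) VALUES (?, ?, ?, ?, ?)",
            rows,
        )
        count = conn.execute("SELECT COUNT(*) FROM pipeline.urt_subcodes").fetchone()[0]
        # A code such as 'O1.06' should sit in category 'O1' and domain 'O'.
        bad = conn.execute(
            "SELECT code FROM pipeline.urt_subcodes "
            "WHERE substr(code, 1, 2) <> category_code "
            "OR substr(code, 1, 1) <> domain_code"
        ).fetchall()
        return count, [row[0] for row in bad]
    finally:
        conn.close()


if __name__ == "__main__":
    print(dry_run(NEW_SUBCODES))
```

Passing the rows as parameters also avoids the manual quote escaping needed for `'Won''t Return'` in the raw SQL.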
Validation Query
After adding codes, verify reduction in V4.03 usage:
-- Before: V4.03 count
SELECT COUNT(*) FROM pipeline.review_spans WHERE urt_primary = 'V4.03';
-- Expected: ~319
-- After reclassification, target:
-- V4.03: ~100 (true generic)
-- O1.06-O1.11: ~200 (entertainment/family)
-- R1.06-R1.09: ~150 (recommendation/return)
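The targets in the comments above group codes into three buckets. A small helper along the following lines could tally `urt_primary` values by those buckets once they have been fetched; the `bucket_counts` name and the sample values are hypothetical.

```python
from collections import Counter

# Buckets mirroring the validation targets: V4.03, O1.06-O1.11, R1.06-R1.09.
BUCKETS = {
    "V4.03": {"V4.03"},
    "O1.06-O1.11": {f"O1.{n:02d}" for n in range(6, 12)},
    "R1.06-R1.09": {f"R1.{n:02d}" for n in range(6, 10)},
}


def bucket_counts(primary_codes):
    """Count codes per validation bucket; codes outside every bucket are ignored."""
    counts = Counter()
    for code in primary_codes:
        for bucket, members in BUCKETS.items():
            if code in members:
                counts[bucket] += 1
    return {bucket: counts[bucket] for bucket in BUCKETS}


if __name__ == "__main__":
    # Made-up sample of urt_primary values, for illustration only.
    sample = ["V4.03", "O1.06", "O1.11", "R1.09", "P1.01", "O1.05"]
    print(bucket_counts(sample))
```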
Conclusion
Is the taxonomy ready for production? ❌ No
Critical Issues:
- ~58% of reviews (653) mention topics without specific codes, and an estimated ~49% (~555) are misclassified as a result
- V4.03 is a catch-all masking actionable insights
- Industry-specific codes (entertainment, F&B) are missing
Recommendation: Add 14 new subcodes before production to capture:
- Family/Kids experience (O1.06, O1.07)
- Entertainment value (O1.08, O1.09, O1.10)
- Group suitability (O1.11)
- Recommendation intent (R1.06, R1.07)
- Return intent (R1.08, R1.09)
- Food/Beverage (O2.06-O2.09)
Estimated Improvement: the share of reviews assigned a specific code rises from ~50% to ~85%.