# URT Taxonomy Gap Analysis

## Executive Summary

The current taxonomy has **significant gaps** that cause ~30-40% of review content to be classified under generic codes (V4.03, O1.05) when more specific codes would be appropriate.

**Current State**: 7 domains, 28 categories, 552 subcodes
**Gap Impact**: ~653 reviews (58% of dataset) mention topics without specific codes

---

## Critical Gaps (High Frequency, No Coverage)

### 🔴 Gap 1: Family/Kids Experience

**Mentions**: 205 reviews (18% of dataset)
**Current Mapping**: → V4.03 (Generic) or O1.05 (Outcome)

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| O1.06 | Family Suitability | Appropriate for children and families |
| O1.07 | Age Appropriateness | Suitable for specific age groups |
| E1.06 | Child-Friendly Facilities | Amenities for children |

**Example Reviews Being Misclassified**:

- "Brilliant day for adults and kids" → V4.03 (should be O1.06)
- "Great family fun" → O1.05 (should be O1.06)
- "Los niños disfrutaron mucho" (Spanish: "The kids enjoyed it a lot") → V4.03 (should be O1.06)

---

### 🔴 Gap 2: Fun/Entertainment Value

**Mentions**: 198 reviews (18% of dataset)
**Current Mapping**: → V4.03 (Generic) or O1.05 (Outcome)

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| O1.08 | Entertainment Value | How enjoyable/fun the experience was |
| O1.09 | Excitement Level | Thrill and adrenaline factor |
| O1.10 | Engagement | How captivating the experience was |

**Example Reviews Being Misclassified**:

- "Everyone had a blast" → V4.03 (should be O1.08)
- "Muy divertido" (Spanish: "Very fun") → V4.03 (should be O1.08)
- "Fantastische kartbaan" (Dutch: "Fantastic karting track") → V4.03 (should be O1.08)

---

### 🔴 Gap 3: Recommendation Intent

**Mentions**: 103 reviews (9% of dataset)
**Current Mapping**: → V4.03 (Generic)

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| R1.06 | Would Recommend | Intent to recommend to others |
| R1.07 | Would Not Recommend | Explicit anti-recommendation |
| V4.06 | Net Promoter Signal | Explicit NPS-style sentiment |

**Example Reviews Being Misclassified**:

- "100% recomendable" (Spanish: "100% recommended") → V4.03 (should be R1.06)
- "Highly recommend" → V4.03 (should be R1.06)
- "Don't come here" → V4.03 V- (should be R1.07)

---

### 🟡 Gap 4: Return Intent

**Mentions**: 65 reviews (6% of dataset)
**Current Mapping**: → V4.03 or R4.03

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| R1.08 | Will Return | Intent to visit again |
| R1.09 | Won't Return | Explicit no-return statement |

**Example Reviews**:

- "We'll definitely be back" → R4.03 (should be R1.08)
- "No volveré" (Spanish: "I won't come back") → V4.03 (should be R1.09)

---

### 🟡 Gap 5: Food & Beverage

**Mentions**: 59 reviews (5% of dataset)
**Current Mapping**: → O1.01 or V4.03

**Missing Codes**:

| Proposed Code | Name | Definition |
|---------------|------|------------|
| O2.06 | Food Quality | Taste, freshness, presentation |
| O2.07 | Drink Quality | Beverage quality |
| O2.08 | Menu Variety | Range of food/drink options |
| O2.09 | Portion Size | Amount of food served |

**Example Reviews**:

- "Great food at the cafe" → O1.01 (should be O2.06)
- "Drinks were overpriced" → V1.01 (should be O2.07 + V1.01)

---

### 🟡 Gap 6: Excitement/Thrill

**Mentions**: 23 reviews (2% of dataset)
**Current Mapping**: → V4.03 or O1.05

**Missing Code** (the same code proposed under Gap 2):

| Proposed Code | Name | Definition |
|---------------|------|------------|
| O1.09 | Excitement Level | Thrill and adrenaline factor |

---

## Medium Gaps (Moderate Frequency)

### Gap 7: Booking/Reservation Process

**Current**: J2.xx exists, but coverage is limited

**Missing**:

| Code | Name | Definition |
|------|------|------------|
| J2.06 | Online Booking | Digital reservation experience |
| J2.07 | Booking Confirmation | Clear confirmation process |

---

### Gap 8: Group Experience

**Missing**:

| Code | Name | Definition |
|------|------|------------|
| O1.11 | Group Suitability | Good for groups/parties |
| O1.12 | Team Building | Corporate/team activities |

---

### Gap 9: Seasonal/Weather Factors

**Missing**:

| Code | Name | Definition |
|------|------|------------|
| E1.07 | Weather Protection | Shelter from elements |
| E1.08 | Seasonal Suitability | Appropriate for season |

---

## Impact Analysis

### Current Classification Distribution (V4.03 Overuse)

```
Code    | Count | %    | Issue
--------|-------|------|-------
P1.01   | 477   | 14%  | ✅ Correct usage
V4.03   | 319   | 10%  | ⚠️ Likely 50%+ misclassified
O1.02   | 270   | 8%   | ✅ Correct usage
V1.01   | 211   | 6%   | ✅ Correct usage
O1.01   | 174   | 5%   | ✅ Correct usage
```

### Estimated Misclassification Rate

| Gap Topic | Reviews | Est. Misclassified | % of Total |
|-----------|---------|--------------------|------------|
| Family/Kids | 205 | ~180 | 16% |
| Fun/Entertainment | 198 | ~170 | 15% |
| Recommendation | 103 | ~95 | 8% |
| Return Intent | 65 | ~50 | 4% |
| Food/Drinks | 59 | ~40 | 4% |
| Excitement | 23 | ~20 | 2% |
| **TOTAL** | **653** | **~555** | **~49%** |

---

## Recommended Taxonomy Additions

### Priority 1: Add to O1 (Core Product/Service)

```sql
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition)
VALUES
    ('O1.06', 'O1', 'O', 'Family Suitability', 'Appropriate for children and families'),
    ('O1.07', 'O1', 'O', 'Age Appropriateness', 'Suitable for specific age groups'),
    ('O1.08', 'O1', 'O', 'Entertainment Value', 'How enjoyable/fun the experience was'),
    ('O1.09', 'O1', 'O', 'Excitement Level', 'Thrill and adrenaline factor'),
    ('O1.10', 'O1', 'O', 'Engagement', 'How captivating the experience was'),
    ('O1.11', 'O1', 'O', 'Group Suitability', 'Good for groups/parties');
```

### Priority 2: Add to R1 (Relationship/Loyalty)

```sql
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition)
VALUES
    ('R1.06', 'R1', 'R', 'Would Recommend', 'Intent to recommend to others'),
    ('R1.07', 'R1', 'R', 'Would Not Recommend', 'Explicit anti-recommendation'),
    ('R1.08', 'R1', 'R', 'Will Return', 'Intent to visit again'),
    ('R1.09', 'R1', 'R', 'Won''t Return', 'Explicit no-return statement');
```

### Priority 3: Add to O2 (Product Features)

```sql
INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition)
VALUES
    ('O2.06', 'O2', 'O', 'Food Quality', 'Taste, freshness, presentation of food'),
    ('O2.07', 'O2', 'O', 'Drink Quality', 'Quality of beverages'),
    ('O2.08', 'O2', 'O', 'Menu Variety', 'Range of food/drink options'),
    ('O2.09', 'O2', 'O', 'Portion Size', 'Amount of food served');
```

---

## Validation Query

After adding the codes, verify the reduction in V4.03 usage:

```sql
-- Before: V4.03 count
SELECT COUNT(*)
FROM pipeline.review_spans
WHERE urt_primary = 'V4.03';
-- Expected: ~319

-- After reclassification, target:
-- V4.03: ~100 (true generic)
-- O1.06-O1.11: ~200 (entertainment/family)
-- R1.06-R1.09: ~150 (recommendation/return)
```

---

## Conclusion

**Is the taxonomy ready for production?** ❌ **No**

**Critical Issues**:

1. ~58% of reviews mention topics without specific codes, and an estimated ~49% are misclassified as a result
2. V4.03 is a catch-all that masks actionable insights
3. Industry-specific codes (entertainment, F&B) are missing

**Recommendation**: Add 14 new subcodes before production to capture:

- Family/Kids experience (O1.06, O1.07)
- Entertainment value (O1.08, O1.09, O1.10)
- Group suitability (O1.11)
- Recommendation intent (R1.06, R1.07)
- Return intent (R1.08, R1.09)
- Food/Beverage (O2.06-O2.09)

**Estimated Improvement**: Specific (non-generic) classification rises from ~50% to ~85%.
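
To check the estimated improvement once the new subcodes are in place, a per-code breakdown can complement the single V4.03 count from the validation query. The query below is a minimal sketch, not part of the existing pipeline: it assumes the same `pipeline.review_spans` table and `urt_primary` column used in the validation query, that reclassified spans carry their new code in `urt_primary`, and a PostgreSQL-compatible dialect. The column aliases `span_count` and `pct_of_all_spans` are illustrative names.

```sql
-- Sketch: distribution of V4.03 and the 14 proposed subcodes after reclassification.
-- Assumption: pipeline.review_spans.urt_primary holds the final code for each span.
SELECT
    urt_primary,
    COUNT(*) AS span_count,
    -- Share of ALL spans, not only the codes listed in the WHERE clause
    ROUND(
        100.0 * COUNT(*) / (SELECT COUNT(*) FROM pipeline.review_spans),
        1
    ) AS pct_of_all_spans
FROM pipeline.review_spans
WHERE urt_primary IN (
    'V4.03',
    'O1.06', 'O1.07', 'O1.08', 'O1.09', 'O1.10', 'O1.11',
    'R1.06', 'R1.07', 'R1.08', 'R1.09',
    'O2.06', 'O2.07', 'O2.08', 'O2.09'
)
GROUP BY urt_primary
ORDER BY span_count DESC;
```

If V4.03 still ranks near the top of this output after reclassification, the remaining V4.03 spans are worth sampling by hand to see whether they are truly generic or point to a further gap.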