whyrating-engine-legacy/packages/reviewiq-pipeline/TAXONOMY_GAPS.md
2026-02-02 18:19:00 +00:00

URT Taxonomy Gap Analysis

Executive Summary

The current taxonomy has significant gaps that cause ~30-40% of review content to be classified as generic codes (V4.03, O1.05) when more specific codes would be appropriate.

Current State: 7 domains, 28 categories, 552 subcodes

Gap Impact: ~653 reviews (58% of dataset) mention topics without specific codes


Critical Gaps (High Frequency, No Coverage)

🔴 Gap 1: Family/Kids Experience

Mentions: 205 reviews (18% of dataset)

Current Mapping: → V4.03 (Generic) or O1.05 (Outcome)

Missing Codes:

Proposed Code | Name                      | Definition
--------------|---------------------------|-----------
O1.06         | Family Suitability        | Appropriate for children and families
O1.07         | Age Appropriateness       | Suitable for specific age groups
E1.06         | Child-Friendly Facilities | Amenities for children

Example Reviews Being Misclassified:

  • "Brilliant day for adults and kids" → V4.03 (should be O1.06)
  • "Great family fun" → O1.05 (should be O1.06)
  • "Los niños disfrutaron mucho" → V4.03 (should be O1.06)

🔴 Gap 2: Fun/Entertainment Value

Mentions: 198 reviews (18% of dataset)

Current Mapping: → V4.03 (Generic) or O1.05 (Outcome)

Missing Codes:

Proposed Code | Name                | Definition
--------------|---------------------|-----------
O1.08         | Entertainment Value | How enjoyable/fun the experience was
O1.09         | Excitement Level    | Thrill and adrenaline factor
O1.10         | Engagement          | How captivating the experience was

Example Reviews Being Misclassified:

  • "Everyone had a blast" → V4.03 (should be O1.08)
  • "Muy divertido" → V4.03 (should be O1.08)
  • "Fantastische kartbaan" → V4.03 (should be O1.08)

🔴 Gap 3: Recommendation Intent

Mentions: 103 reviews (9% of dataset)

Current Mapping: → V4.03 (Generic)

Missing Codes:

Proposed Code | Name                | Definition
--------------|---------------------|-----------
R1.06         | Would Recommend     | Intent to recommend to others
R1.07         | Would Not Recommend | Explicit anti-recommendation
V4.06         | Net Promoter Signal | Explicit NPS-style sentiment

Example Reviews Being Misclassified:

  • "100% recomendable" → V4.03 (should be R1.06)
  • "Highly recommend" → V4.03 (should be R1.06)
  • "Don't come here" → V4.03 V- (should be R1.07)

🟡 Gap 4: Return Intent

Mentions: 65 reviews (6% of dataset)

Current Mapping: → V4.03 or R4.03

Missing Codes:

Proposed Code | Name         | Definition
--------------|--------------|-----------
R1.08         | Will Return  | Intent to visit again
R1.09         | Won't Return | Explicit no-return statement

Example Reviews:

  • "We'll definitely be back" → R4.03 (should be R1.08)
  • "No volveré" → V4.03 (should be R1.09)

🟡 Gap 5: Food & Beverage

Mentions: 59 reviews (5% of dataset)

Current Mapping: → O1.01 or V4.03

Missing Codes:

Proposed Code | Name          | Definition
--------------|---------------|-----------
O2.06         | Food Quality  | Taste, freshness, presentation
O2.07         | Drink Quality | Beverage quality
O2.08         | Menu Variety  | Range of food/drink options
O2.09         | Portion Size  | Amount of food served

Example Reviews:

  • "Great food at the cafe" → O1.01 (should be O2.06)
  • "Drinks were overpriced" → V1.01 (should be O2.07 + V1.01)

🟡 Gap 6: Excitement/Thrill

Mentions: 23 reviews (2% of dataset)

Current Mapping: → V4.03 or O1.05

Missing Code (the same O1.09 already proposed under Gap 2):

Proposed Code | Name             | Definition
--------------|------------------|-----------
O1.09         | Excitement Level | Thrill and adrenaline factor
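
As a rough illustration of how the example reviews in Gaps 1-4 could be routed to the proposed codes, here is a minimal keyword sketch. The `RULES` patterns and the `suggest_code` helper are hypothetical and cover only phrases quoted in this document; the actual classifier in the pipeline is not shown here and may work very differently.

```python
import re

# HYPOTHETICAL keyword rules, one per proposed code. Order matters: negative
# and more specific patterns are checked before broader ones.
RULES = [
    ("R1.07", re.compile(r"don'?t come|not recommend", re.I)),
    ("R1.06", re.compile(r"recommend|recomendable", re.I)),
    ("R1.09", re.compile(r"no volver|won'?t (be back|return)", re.I)),
    ("R1.08", re.compile(r"be back|come back|volver", re.I)),
    ("O1.06", re.compile(r"\bkids?\b|famil|niños", re.I)),
    ("O1.08", re.compile(r"\bfun\b|blast|divertid", re.I)),
]


def suggest_code(text, current_code):
    """Return the first proposed code whose pattern matches, else keep the current code."""
    for code, pattern in RULES:
        if pattern.search(text):
            return code
    return current_code


# Example reviews quoted in the gap descriptions above.
for review in ["Great family fun", "Highly recommend", "No volveré", "Fantastische kartbaan"]:
    print(review, "->", suggest_code(review, "V4.03"))
```

"Fantastische kartbaan" matches none of these rules and stays on V4.03, which shows the limit of keyword matching for multilingual reviews.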

Medium Gaps (Moderate Frequency)

Gap 7: Booking/Reservation Process

Current: J2.xx codes exist, but coverage is limited

Missing:

Code  | Name                 | Definition
------|----------------------|-----------
J2.06 | Online Booking       | Digital reservation experience
J2.07 | Booking Confirmation | Clear confirmation process

Gap 8: Group Experience

Missing:

Code  | Name              | Definition
------|-------------------|-----------
O1.11 | Group Suitability | Good for groups/parties
O1.12 | Team Building     | Corporate/team activities

Gap 9: Seasonal/Weather Factors

Missing:

Code  | Name                 | Definition
------|----------------------|-----------
E1.07 | Weather Protection   | Shelter from elements
E1.08 | Seasonal Suitability | Appropriate for season

Impact Analysis

Current Classification Distribution (V4.03 Overuse)

Code    | Count | %    | Issue
--------|-------|------|-------
P1.01   |   477 | 14%  | ✅ Correct usage
V4.03   |   319 | 10%  | ⚠️ Likely 50%+ misclassified
O1.02   |   270 |  8%  | ✅ Correct usage
V1.01   |   211 |  6%  | ✅ Correct usage
O1.01   |   174 |  5%  | ✅ Correct usage

Estimated Misclassification Rate

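Gap Topic         | Reviews | Est. Misclassified | % of Total
------------------|---------|--------------------|-----------
Family/Kids       |     205 |               ~180 | 16%
Fun/Entertainment |     198 |               ~170 | 15%
Recommendation    |     103 |                ~95 | 8%
Return Intent     |      65 |                ~50 | 4%
Food/Drinks       |      59 |                ~40 | 4%
Excitement        |      23 |                ~20 | 2%
TOTAL             |     653 |               ~555 | ~49%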
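
The totals in the table above can be rechecked with a few lines of Python. This is only an arithmetic sketch: the dataset size is not stated in this document, so `ASSUMED_DATASET_SIZE` is an assumption inferred from figures such as "205 reviews (18% of dataset)". A review that mentions several topics is counted once per topic, so the totals may overstate the number of distinct reviews.

```python
# Rows copied from the table above: (topic, reviews mentioning it, estimated misclassified).
GAPS = [
    ("Family/Kids", 205, 180),
    ("Fun/Entertainment", 198, 170),
    ("Recommendation", 103, 95),
    ("Return Intent", 65, 50),
    ("Food/Drinks", 59, 40),
    ("Excitement", 23, 20),
]

# ASSUMPTION: roughly 1,130 reviews, inferred from the per-gap percentages.
ASSUMED_DATASET_SIZE = 1130


def summarize(gaps, dataset_size):
    """Sum mentions and estimated misclassifications and express both as percentages."""
    mentions = sum(row[1] for row in gaps)
    misclassified = sum(row[2] for row in gaps)
    return {
        "mentions": mentions,
        "misclassified": misclassified,
        "mention_share": round(100 * mentions / dataset_size),
        "misclassified_share": round(100 * misclassified / dataset_size),
    }


summary = summarize(GAPS, ASSUMED_DATASET_SIZE)
print(summary)
```

Under that assumed size, the mention share matches the 58% in the Executive Summary and the misclassified share matches the ~49% in the TOTAL row.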

Priority 1: Add to O1 (Core Product/Service)

INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('O1.06', 'O1', 'O', 'Family Suitability', 'Appropriate for children and families'),
('O1.07', 'O1', 'O', 'Age Appropriateness', 'Suitable for specific age groups'),
('O1.08', 'O1', 'O', 'Entertainment Value', 'How enjoyable/fun the experience was'),
('O1.09', 'O1', 'O', 'Excitement Level', 'Thrill and adrenaline factor'),
('O1.10', 'O1', 'O', 'Engagement', 'How captivating the experience was'),
('O1.11', 'O1', 'O', 'Group Suitability', 'Good for groups/parties');

Priority 2: Add to R1 (Relationship/Loyalty)

INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('R1.06', 'R1', 'R', 'Would Recommend', 'Intent to recommend to others'),
('R1.07', 'R1', 'R', 'Would Not Recommend', 'Explicit anti-recommendation'),
('R1.08', 'R1', 'R', 'Will Return', 'Intent to visit again'),
('R1.09', 'R1', 'R', 'Won''t Return', 'Explicit no-return statement');

Priority 3: Add to O2 (Product Features)

INSERT INTO pipeline.urt_subcodes (code, category_code, domain_code, name, definition) VALUES
('O2.06', 'O2', 'O', 'Food Quality', 'Taste, freshness, presentation of food'),
('O2.07', 'O2', 'O', 'Drink Quality', 'Quality of beverages'),
('O2.08', 'O2', 'O', 'Menu Variety', 'Range of food/drink options'),
('O2.09', 'O2', 'O', 'Portion Size', 'Amount of food served');
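
Before running the three INSERT statements above, it may help to sanity-check the proposed rows. The sketch below loads them into an in-memory SQLite table and counts them per category. The table is a simplified, hypothetical stand-in (no `definition` column, no `pipeline` schema), and `INSERT OR IGNORE` is SQLite syntax; the production database may need a different conflict clause.

```python
import sqlite3

# (code, name) pairs copied from the Priority 1-3 INSERT statements.
PROPOSED = [
    ("O1.06", "Family Suitability"),
    ("O1.07", "Age Appropriateness"),
    ("O1.08", "Entertainment Value"),
    ("O1.09", "Excitement Level"),
    ("O1.10", "Engagement"),
    ("O1.11", "Group Suitability"),
    ("R1.06", "Would Recommend"),
    ("R1.07", "Would Not Recommend"),
    ("R1.08", "Will Return"),
    ("R1.09", "Won't Return"),
    ("O2.06", "Food Quality"),
    ("O2.07", "Drink Quality"),
    ("O2.08", "Menu Variety"),
    ("O2.09", "Portion Size"),
]


def to_row(code, name):
    """Derive category and domain from the code, e.g. 'O1.06' -> ('O1', 'O')."""
    category = code.split(".")[0]
    return (code, category, category[0], name)


def load(conn, proposed):
    """Insert rows, skipping codes that already exist (SQLite syntax)."""
    conn.executemany(
        "INSERT OR IGNORE INTO urt_subcodes VALUES (?, ?, ?, ?)",
        [to_row(code, name) for code, name in proposed],
    )


conn = sqlite3.connect(":memory:")
# ASSUMPTION: simplified stand-in table; the real pipeline.urt_subcodes schema
# is not shown in this document.
conn.execute(
    "CREATE TABLE urt_subcodes ("
    "code TEXT PRIMARY KEY, category_code TEXT, domain_code TEXT, name TEXT)"
)
load(conn, PROPOSED)
load(conn, PROPOSED)  # second run is a no-op because of the primary key

counts = dict(
    conn.execute(
        "SELECT category_code, COUNT(*) FROM urt_subcodes GROUP BY category_code"
    )
)
print(counts)
```

The per-category counts (6 in O1, 4 in R1, 4 in O2) add up to the 14 subcodes recommended in the Conclusion.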

Validation Query

After adding codes, verify reduction in V4.03 usage:

-- Before: V4.03 count
SELECT COUNT(*) FROM pipeline.review_spans WHERE urt_primary = 'V4.03';
-- Expected: ~319

-- After reclassification, target:
-- V4.03: ~100 (true generic)
-- O1.06-O1.11: ~200 (entertainment/family)
-- R1.06-R1.09: ~150 (recommendation/return)
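
One way to compare post-reclassification counts against the targets above is sketched below. The `bucket` and `compare` helpers, the 25% tolerance, and the sample codes are all hypothetical; the sample is synthetic and stands in for the `urt_primary` values that the validation query would return.

```python
from collections import Counter

# Targets copied from the comments in the validation query above.
TARGETS = {"V4.03": 100, "O1.06-O1.11": 200, "R1.06-R1.09": 150}


def bucket(code):
    """Map a subcode such as 'O1.08' to the reporting bucket used in TARGETS."""
    category, _, number = code.partition(".")
    n = int(number)
    if code == "V4.03":
        return "V4.03"
    if category == "O1" and 6 <= n <= 11:
        return "O1.06-O1.11"
    if category == "R1" and 6 <= n <= 9:
        return "R1.06-R1.09"
    return "other"


def compare(primary_codes, targets, tolerance=0.25):
    """Return {bucket: (actual, target, within_tolerance)} for each target."""
    counts = Counter(bucket(code) for code in primary_codes)
    return {
        name: (counts.get(name, 0), target, abs(counts.get(name, 0) - target) <= tolerance * target)
        for name, target in targets.items()
    }


# SYNTHETIC sample, not real pipeline output.
sample = (
    ["V4.03"] * 110 + ["O1.06"] * 120 + ["O1.08"] * 70
    + ["R1.06"] * 90 + ["R1.08"] * 40 + ["P1.01"] * 477
)
report = compare(sample, TARGETS)
print(report)
```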

Conclusion

Is the taxonomy ready for production? No

Critical Issues:

  1. ~58% of reviews mention topics without specific codes, and an estimated ~49% are misclassified as a result
  2. V4.03 is a catch-all masking actionable insights
  3. Industry-specific codes (entertainment, F&B) are missing

Recommendation: Add 14 new subcodes before production to capture:

  • Family/Kids experience (O1.06, O1.07)
  • Entertainment value (O1.08, O1.09, O1.10)
  • Group suitability (O1.11)
  • Recommendation intent (R1.06, R1.07)
  • Return intent (R1.08, R1.09)
  • Food/Beverage (O2.06-O2.09)

Estimated Improvement: The share of reviews assigned a specific (non-generic) code rises from ~50% to ~85%.