# Review Data Structure Analysis

## ✅ Current Data Types (All Correct)

Based on analysis of reviews scraped from the API:

```typescript
interface Review {
  author: string;              // ✓ string
  rating: number;              // ✓ number (not string!)
  text: string | null;         // ✓ string or null
  date_text: string;           // ✓ string (relative dates)
  avatar_url: string | null;   // ✓ string or null
  profile_url: string | null;  // ✓ string or null
  review_id: string;           // ✓ string
}
```

**All API data types match the TypeScript interface; no conversion is needed.**

## 🐛 Bug Found & Fixed

### Issue: Date Parsing

**Problem:** The `parseDateText()` function used `parseInt(text)`, which returns `NaN` for strings like "Hace 2 semanas", and then fell back to `1` via `|| 1`. This caused:

- "Hace 2 semanas" (2 weeks ago) → parsed as **1 week ago** ❌
- "Hace 6 años" (6 years ago) → parsed as **1 year ago** ❌
- "Hace un año" (1 year ago) → parsed as **1 year ago** ✓ (correct by accident)

**Root cause:** `parseInt("Hace 2 semanas")` is `NaN` because the string does not start with a digit, and `NaN || 1` evaluates to `1`.

**Fix:** Added an `extractNumber()` function that uses a regex to extract the number:

```typescript
function extractNumber(text: string): number {
  // Take the first run of digits, e.g. "Hace 2 semanas" → 2
  const match = text.match(/\d+/);
  if (match) return parseInt(match[0], 10);

  // No digits: Spanish "un"/"una" (one), e.g. "Hace un año", means 1.
  // Any other unrecognized text also falls back to 1.
  return 1;
}
```

### Verified Results

All dates below are relative to 2026-01-18:

```
Date: "Hace 2 semanas" → 2026-01-04 ✓
Date: "Hace 2 meses"   → 2025-11-18 ✓
Date: "Hace un año"    → 2025-01-18 ✓
Date: "Hace 6 años"    → 2020-01-18 ✓
```

## 📅 Date Format Patterns Found

### Standard Formats

- `"Hace X semanas"`: X weeks ago
- `"Hace X meses"`: X months ago
- `"Hace X años"`: X years ago
- `"Hace un año"`: 1 year ago (special case: "un" instead of "1")

### Edited Review Format

- `"Fecha de edición: Hace X meses"`: edited X months ago ("Fecha de edición" means "edit date")

### Date Range Distribution (from 244 reviews)

- **Last week:** ~2 reviews
- **Last month:** ~5-7 reviews
- **Last year:** ~30-40 reviews
- **1-2 years:** ~20-30 reviews
- **2+ years:** ~150+ reviews

## ⚠️ Imprecision Considerations

### Current Approach

Relative dates like "Hace 2 meses" are converted to **exact dates** (e.g., exactly 2 months before today).

### Limitation

- "Hace 2 meses" could mean anywhere from 2.0 to 2.99 months ago
- Because the label is treated as the lower bound, the computed date can be up to roughly 30 days more recent than the true date for month labels; the error is one-sided, never older
- "Hace un año" has the same issue on a larger scale (it could mean 1.0 to 1.99 years)

### Potential Improvements

#### Option 1: Conservative Filtering (Current Implementation)

- Treat "Hace 2 meses" as exactly 2 months ago
- Simple and fast; slightly overestimates recency, since each review is assigned the most recent date its label allows
- **Status: ✓ Implemented**

#### Option 2: Range-Based Filtering

```typescript
// Consider "Hace 2 meses" as a range: [2 months, 3 months)
// Include in "last month" filter if lower bound < 1 month
```

- More accurate for boundary cases
- More complex implementation
- May include slightly older reviews

#### Option 3: Add Buffer Zones

```typescript
// Add a 10% buffer to the cutoff: ~33 days instead of 30 for "last month".
// Note: setMonth() truncates fractional arguments, so
// setMonth(getMonth() - 1.1) would not produce a 10% buffer.
const DAY_MS = 24 * 60 * 60 * 1000;
const monthAgo = new Date(Date.now() - 30 * 1.1 * DAY_MS); // include slight overlap
```

- Catches boundary cases
- Simple to implement
- May include some false positives

### Recommendation

**Keep the current implementation** (Option 1) because:

1. Date strings are already approximate ("Hace 2 meses" vs. an exact date)
2. Users expect "Last Month" to mean roughly 30 days, not an exact calendar boundary
3. Performance is better with simple date math
4. The error margin is acceptable for review analytics

## 🎯 Filter Accuracy

With the fixed parsing, date filters now work correctly:

| Filter | Cutoff Date | Expected Coverage |
|--------|-------------|-------------------|
| Last Week | 7 days ago | ~0-3 reviews |
| Last Month | 30 days ago | ~5-10 reviews |
| Last Year | 365 days ago | ~30-50 reviews |
| All Time | No limit | All 244 reviews |

## 🔍 Additional Data Quality Notes

1. **Rating is numeric:** Already a number (1-5); no parsing needed
2. **Duplicate review_ids:** Some reviews share the same `review_id`, hence the key change to `${index}-${review_id}`
3. **Null text:** Some reviews have `text: null`; handled with `|| 'No review text'`
4. **Avatar URLs:** Most reviews have avatar images (~90%+)
5. **Spanish language:** All dates are in Spanish and handled by the parsing logic

## 📊 Type Safety Checklist

- [x] Review interface matches API response
- [x] Rating is number type (not string)
- [x] Date parsing extracts numbers correctly
- [x] Null values handled for text, avatar_url, profile_url
- [x] Timeline data points typed correctly
- [x] Date range type defined (`'week' | 'month' | 'year' | 'all'`)

## ✨ Status: FIXED

Date filtering now works correctly with proper number extraction from Spanish date strings. All data types are validated and match the API schema.
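The **Current Data Types** section concludes that the API payload already matches the `Review` interface. A runtime guard could keep confirming that on every scraped batch instead of relying on a one-time check. This is a hypothetical sketch: `isReview` and `isNullableString` are illustrative names not taken from the original code, and the 1-5 rating bound comes from the data-quality notes rather than from the interface itself.

```typescript
// Hypothetical runtime check mirroring the Review interface.
interface Review {
  author: string;
  rating: number;
  text: string | null;
  date_text: string;
  avatar_url: string | null;
  profile_url: string | null;
  review_id: string;
}

// True for a string or an explicit null; false for undefined (a missing field).
function isNullableString(v: unknown): v is string | null {
  return v === null || typeof v === 'string';
}

function isReview(value: unknown): value is Review {
  if (typeof value !== 'object' || value === null) return false;
  const r = value as Record<string, unknown>;
  const rating = r.rating;
  return (
    typeof r.author === 'string' &&
    // Assumption: ratings are 1-5, per the data-quality notes (NaN fails both comparisons)
    typeof rating === 'number' && rating >= 1 && rating <= 5 &&
    isNullableString(r.text) &&
    typeof r.date_text === 'string' &&
    isNullableString(r.avatar_url) &&
    isNullableString(r.profile_url) &&
    typeof r.review_id === 'string'
  );
}
```

Because `isReview` is a type predicate, `scraped.filter(isReview)` on an `unknown[]` yields a `Review[]`, so malformed records are dropped (or can be logged) before they reach the UI.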
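The bug fix describes how `parseDateText()` now extracts the number but does not show how the number combines with the unit. Here is a minimal sketch written as a hypothetical `parseDateTextSketch()`, not the original function. It takes `now` as a parameter so results are reproducible, recognizes only the units listed under **Date Format Patterns Found** (semanas, meses, años), and returns `null` for anything else. Those choices, and the use of local-time `Date` setters, are assumptions.

```typescript
// Hypothetical sketch of the fixed date parsing; not the original parseDateText().
type Unit = 'week' | 'month' | 'year';

// Same behavior as extractNumber(): first run of digits, else 1 ("un"/"una").
function extractNumber(text: string): number {
  const match = text.match(/\d+/);
  return match ? parseInt(match[0], 10) : 1;
}

// Only the units observed in the scraped data are recognized.
function detectUnit(text: string): Unit | null {
  if (text.includes('semana')) return 'week';   // "semana" / "semanas"
  if (text.includes('mes')) return 'month';     // "mes" / "meses"
  if (text.includes('a\u00f1o')) return 'year'; // "año" / "años"
  return null;
}

function parseDateTextSketch(dateText: string, now: Date = new Date()): Date | null {
  // NFC so "ñ" matches whether it arrives precomposed or decomposed.
  // The "Fecha de edición:" prefix needs no special handling.
  const text = dateText.normalize('NFC').toLowerCase();
  const unit = detectUnit(text);
  if (unit === null) return null;

  const n = extractNumber(text);
  const date = new Date(now.getTime()); // copy so `now` is never mutated
  switch (unit) {
    case 'week':
      date.setDate(date.getDate() - 7 * n);
      break;
    case 'month':
      // Caveat: from a 31st this can overflow into the next month (e.g. Mar 31 → early March).
      date.setMonth(date.getMonth() - n);
      break;
    case 'year':
      date.setFullYear(date.getFullYear() - n);
      break;
  }
  return date;
}
```

Called with `now = new Date(2026, 0, 18)`, this reproduces the four dates listed under **Verified Results**. Returning `null` for unknown units (rather than defaulting) avoids silently repeating the original `|| 1` style of bug.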
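The **Filter Accuracy** table and the `'week' | 'month' | 'year' | 'all'` date range type point to a simple cutoff-based filter. A minimal sketch, assuming each review already carries a parsed `date` (for example from the fixed date parsing described in the bug-fix section). `filterByRange` and `RANGE_DAYS` are hypothetical names, and the day counts are taken directly from the table.

```typescript
// Hypothetical sketch of cutoff-based filtering using the table's day counts.
type DateRange = 'week' | 'month' | 'year' | 'all';

const DAY_MS = 24 * 60 * 60 * 1000;
const RANGE_DAYS: Record<Exclude<DateRange, 'all'>, number> = {
  week: 7,
  month: 30,
  year: 365,
};

// Keeps reviews dated on or after the cutoff; 'all' keeps everything.
function filterByRange<T extends { date: Date }>(
  reviews: T[],
  range: DateRange,
  now: Date = new Date(),
): T[] {
  if (range === 'all') return reviews;
  const cutoffMs = now.getTime() - RANGE_DAYS[range] * DAY_MS;
  return reviews.filter((r) => r.date.getTime() >= cutoffMs);
}
```

Subtracting fixed day counts, rather than calling `setMonth()`, matches the table's "30 days ago" and "365 days ago" cutoffs and sidesteps month-length quirks.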