whyrating-engine-legacy/GOOGLE_DATE_FORMAT_SPECIFICATION.md
Alejandro Gutiérrez faa0704737 Optimize scraper performance and add fallback selectors for robustness
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 19:49:24 +00:00

Google Maps Date Format Specification

Reverse-Engineered from 244 Reviews (English Locale)

Date: 2026-01-18
Source: Google Maps Reviews (hl=en)
Library: Google Internal (not moment.js, date-fns, or dayjs)


📋 Complete Pattern Catalog

Discovered Patterns (31 unique formats)

Standard Formats:
- a month ago
- a year ago
- 2 weeks ago, 3 weeks ago
- 2-11 months ago
- 2-11 years ago

Edited Variants:
- Edited 2 weeks ago
- Edited 3 months ago
- Edited a year ago
- Edited 2-11 years ago

🔬 Google's Algorithm (Reverse-Engineered)

Pattern Structure

Singular: "a {unit} ago"
Plural:   "{number} {unit}s ago"
Edited:   "Edited {pattern}"

Key Rules:

  1. Google NEVER shows "1 month ago" - always "a month ago"
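  2. Weeks: Only 2-3 weeks observed (no "4 weeks"; by the estimated thresholds, 28+ days presumably becomes "a month ago"). "a week ago" is presumably shown for 7-13 days, but this dataset has no reviews newer than 2 weeks to confirm it
  3. Months: "a month" then 2-11 months (no "1 month" or "12 months")
  4. Years: "a year" then 2+ years (11 years is the oldest review in this dataset, not a confirmed formatter limit)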

⏱️ Time Range Boundaries

Unit Thresholds (Estimated)

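| From | To | Unit Displayed | Example |
|------|----|----------------|---------|
| 0s | 59s | seconds | "30 seconds ago" |
| 1min | 59min | minutes | "45 minutes ago" |
| 1h | 23h | hours | "12 hours ago" |
| 1d | 6d | days | "5 days ago" |
| 7d | 27d | weeks | "2 weeks ago", "3 weeks ago" |
| 28d | 59d | month (singular) | "a month ago" |
| 60d | 364d | months (plural) | "2 months ago" ... "11 months ago" |
| 365d | 729d | year (singular) | "a year ago" |
| 730d | ∞ | years (plural) | "2 years ago" ... "11 years ago" |

Note: This table places the week/month boundary at 28 days, while the rest of this document approximates a month as 30 days and treats "a month ago" as 30-59 days. Both values are estimates.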
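The thresholds in the table above can be read as a bucketing function. The following is a minimal sketch, assuming the estimated day-level boundaries from the table are correct; `formatRelativeAge` is a hypothetical name, sub-day units are left out, and the singular form "a day ago" is an assumption that this dataset does not confirm.

```javascript
// Hypothetical helper: maps an age in whole days to the label the table predicts.
// Boundaries are the estimated thresholds, not confirmed Google behavior.
function formatRelativeAge(ageDays) {
  const label = (n, unit) => (n === 1 ? `a ${unit} ago` : `${n} ${unit}s ago`);

  if (ageDays < 1) return null; // sub-day units are not covered by this sketch
  if (ageDays < 7) return label(Math.floor(ageDays), 'day');
  if (ageDays < 28) return label(Math.floor(ageDays / 7), 'week');
  if (ageDays < 365) {
    // Clamp to 1..11 so 28-29 days reads "a month ago" and 360-364 days
    // reads "11 months ago" rather than "12 months ago".
    const months = Math.min(11, Math.max(1, Math.floor(ageDays / 30)));
    return label(months, 'month');
  }
  return label(Math.floor(ageDays / 365), 'year');
}

console.log(formatRelativeAge(34)); // "a month ago"
```

Running the sketch against known ages is a quick way to check whether a proposed threshold matches the labels seen in scraped data.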

Observed Ranges from 244 Reviews

| Unit | Values Found | Range |
|------|--------------|-------|
| Weeks | [2, 3] | 2-3 weeks |
| Months | [2, 3, 4, 5, 6, 7, 8, 9, 10, 11] | 2-11 months |
| Years | [2, 3, 4, 5, 6, 7, 8, 9, 10, 11] | 2-11 years |

Note: No reviews with seconds/minutes/hours/days in this dataset (all reviews were older than 2 weeks)


📊 Uncertainty Analysis

Why Dates Are Imprecise

Google Maps shows relative dates that are rounded down to the largest unit:

Review posted: December 15, 2025
Viewed on: January 18, 2026
Actual age: 34 days

Google shows: "a month ago"
Actual range: 30-59 days (±15 days uncertainty)
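
The 34-day figure in this example can be reproduced with plain date arithmetic. This is a small sketch; `ageInDays` is a hypothetical helper, and it works in UTC on the assumption that only calendar dates (not times of day) matter.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: whole days between two calendar dates given as
// [year, month, day]. Using UTC avoids daylight-saving shifts.
function ageInDays(posted, viewed) {
  const toUtc = ([y, m, d]) => Date.UTC(y, m - 1, d); // months are 0-based
  return Math.round((toUtc(viewed) - toUtc(posted)) / MS_PER_DAY);
}

const age = ageInDays([2025, 12, 15], [2026, 1, 18]);
console.log(age); // 34
```

An age of 34 days falls inside the 30-59 day bucket, which is why the review is labeled "a month ago".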

Uncertainty by Unit

| Pattern | Actual Range | Uncertainty | Example |
|---------|--------------|-------------|---------|
| "a month ago" | 30-59 days | ±15 days | Could be 30 or 59 days old |
| "2 months ago" | 60-89 days | ±15 days | Could be 60 or 89 days old |
| "3 months ago" | 90-119 days | ±15 days | Could be 90 or 119 days old |
| "a year ago" | 365-729 days | ±182 days (6 months!) | Could be 1 or 2 years old |
| "2 years ago" | 730-1094 days | ±182 days | Could be 2 or 3 years old |

Maximum Uncertainty

  • Months: ±15 days (~50% of a month)
  • Years: ±6 months (~50% of a year)

Option 1: Midpoint (Current Implementation)

Treat as exact midpoint

"a month ago" → 45 days ago (midpoint of 30-59)
"2 months ago" → 75 days ago (midpoint of 60-89)
"a year ago" → 547 days ago (midpoint of 365-729)

  • Pro: Simple to implement
  • Pro: Statistically balanced
  • Con: Can be off by ±15 days (months) or ±6 months (years)

Option 2: Conservative Lower Bound

Assume oldest possible date

"a month ago" → 59 days ago
"2 months ago" → 89 days ago
"a year ago" → 729 days ago

  • Pro: Ensures reviews are AT MOST this old
  • Pro: Good for "show me reviews from last month" (inclusive)
  • Con: May exclude recent reviews

Option 3: Optimistic Upper Bound

Assume newest possible date

"a month ago" → 30 days ago
"2 months ago" → 60 days ago
"a year ago" → 365 days ago

  • Pro: Good for "show me reviews from last year" (exclusive)
  • Con: May include older reviews than expected
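
Options 1 to 3 differ only in which point of the age range they pick. A minimal sketch of that choice follows; `estimateAgeDays` and the strategy names are hypothetical, and ranges are given in days as in the examples above.

```javascript
// Hypothetical helper: collapse an age range (in days) to one estimate.
// 'midpoint' corresponds to Option 1, 'oldest' to Option 2, 'newest' to Option 3.
function estimateAgeDays(range, strategy = 'midpoint') {
  switch (strategy) {
    case 'midpoint':
      return Math.round((range.min + range.max) / 2);
    case 'oldest':
      return range.max;
    case 'newest':
      return range.min;
    default:
      throw new Error(`Unknown strategy: ${strategy}`);
  }
}

const aMonthAgo = { min: 30, max: 59 };
console.log(estimateAgeDays(aMonthAgo, 'midpoint')); // 45
console.log(estimateAgeDays(aMonthAgo, 'oldest'));   // 59
console.log(estimateAgeDays(aMonthAgo, 'newest'));   // 30
```

Keeping the strategy as a parameter makes it easy to compare how each option changes filter results on the same data.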

Option 4: Range Filtering

Store both bounds and filter inclusively

"a month ago" → {min: 30 days, max: 59 days}

Filter "Last Month" (30 days):
  Include if review.min_age <= 30 days

  • Pro: Most accurate for filtering
  • Pro: Accounts for all uncertainty
  • Con: More complex implementation
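
Range filtering can be sketched as follows. The field names `minAge` and `maxAge` and the `strict` flag are assumptions made for illustration; the inclusive rule is the one stated above (keep a review if its youngest possible age is inside the window).

```javascript
// Hypothetical shape: each review carries the youngest and oldest age (in days)
// that its label allows, e.g. "a month ago" -> { minAge: 30, maxAge: 59 }.
function filterByWindow(reviews, windowDays, { strict = false } = {}) {
  return reviews.filter((r) =>
    // Inclusive: the review COULD be inside the window.
    // Strict: the review is GUARANTEED to be inside the window.
    strict ? r.maxAge <= windowDays : r.minAge <= windowDays
  );
}

const reviews = [
  { text: '3 weeks ago', minAge: 21, maxAge: 27 },
  { text: 'a month ago', minAge: 30, maxAge: 59 },
  { text: '2 months ago', minAge: 60, maxAge: 89 },
];

const inclusive = filterByWindow(reviews, 30).map((r) => r.text);
const strictOnly = filterByWindow(reviews, 30, { strict: true }).map((r) => r.text);
console.log(inclusive);  // [ '3 weeks ago', 'a month ago' ]
console.log(strictOnly); // [ '3 weeks ago' ]
```

The strict variant is not part of Option 4 as described; it is included only to show that storing both bounds lets the dashboard choose either behavior at query time.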


💡 Recommendation for Analytics Dashboard

Use Option 1 (Midpoint) + Grace Period

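const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Uses extractNumberAndUnit, calculateMidpointDays, and UNIT_RANGES from the
// Implementation Reference section. Returns null for unrecognized text.
function parseDateWithGracePeriod(dateText, graceFactor = 0.2, now = Date.now()) {
  const parsed = extractNumberAndUnit(dateText);
  if (!parsed || !UNIT_RANGES[parsed.unit]) return null;

  const midpointDays = calculateMidpointDays(parsed.number, parsed.unit);
  // Uncertainty is assumed to be half of one unit (e.g. 15 days for months)
  const graceDays = (UNIT_RANGES[parsed.unit].days / 2) * graceFactor;

  return {
    date: new Date(now - midpointDays * MS_PER_DAY),
    minDate: new Date(now - (midpointDays + graceDays) * MS_PER_DAY), // oldest plausible
    maxDate: new Date(now - (midpointDays - graceDays) * MS_PER_DAY)  // newest plausible
  };
}

// Filter example:
// "Last Month" filter includes reviews where:
//   review.date >= (30 days ago - grace)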

Grace Period Values:

  • Weeks: ±0.7 days (20% of 3.5 days)
  • Months: ±3 days (20% of 15 days)
  • Years: ±36 days (20% of 182 days)

This provides a buffer zone to catch edge cases while maintaining statistical accuracy.
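
The month and year grace values can be derived rather than hard-coded. This is a rough sketch, assuming the ±15 day and ±182 day uncertainties from the analysis above; `graceDays` and `passesFilter` are hypothetical names, and ages are expressed in days.

```javascript
// Half-widths (in days) of the month and year buckets, from the uncertainty analysis.
const UNCERTAINTY_DAYS = { month: 15, year: 182 };

// Grace period = uncertainty scaled by the grace factor.
function graceDays(unit, graceFactor = 0.2) {
  return UNCERTAINTY_DAYS[unit] * graceFactor;
}

// "Last N days" filter that widens the window by the grace period.
function passesFilter(estimatedAgeDays, unit, windowDays, graceFactor = 0.2) {
  return estimatedAgeDays <= windowDays + graceDays(unit, graceFactor);
}

console.log(Math.round(graceDays('month'))); // 3
console.log(Math.round(graceDays('year')));  // 36
```

Note that the grace period only rescues estimates that land within a few days of the cutoff: a 45-day midpoint for "a month ago" still fails a 30-day window under this sketch.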


🔧 Implementation Reference

Complete Pattern Regex (English)

const GOOGLE_DATE_PATTERNS = {
  // Singular
  singular: /^a (second|minute|hour|day|week|month|year) ago$/,

  // Plural
  plural: /^(\d+) (seconds|minutes|hours|days|weeks|months|years) ago$/,

  // Edited variants
  edited_singular: /^Edited a (second|minute|hour|day|week|month|year) ago$/,
  edited_plural: /^Edited (\d+) (seconds|minutes|hours|days|weeks|months|years) ago$/
};
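
As a usage illustration, the patterns can be tried in order to classify a scraped string. The sketch repeats the four regexes under a different constant name so it runs on its own; `classifyDateText` is a hypothetical helper.

```javascript
// Same regexes as above, repeated so this sketch is self-contained.
const PATTERNS = {
  singular: /^a (second|minute|hour|day|week|month|year) ago$/,
  plural: /^(\d+) (seconds|minutes|hours|days|weeks|months|years) ago$/,
  edited_singular: /^Edited a (second|minute|hour|day|week|month|year) ago$/,
  edited_plural: /^Edited (\d+) (seconds|minutes|hours|days|weeks|months|years) ago$/,
};

// Returns the name of the first matching pattern, or null if none match.
function classifyDateText(text) {
  for (const [name, regex] of Object.entries(PATTERNS)) {
    if (regex.test(text)) return name;
  }
  return null;
}

console.log(classifyDateText('a month ago'));        // "singular"
console.log(classifyDateText('Edited 2 years ago')); // "edited_plural"
console.log(classifyDateText('1 month ago'));        // null
```

A null result is worth logging in the scraper, since it signals either a non-English locale or a format this catalog has not seen.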

Extraction Function

function extractNumberAndUnit(dateText) {
  // Remove surrounding whitespace and the "Edited " prefix
  const cleaned = dateText.trim().replace(/^Edited\s+/i, '');

  // Check singular pattern; only known units are accepted
  const singularMatch = cleaned.match(/^a (second|minute|hour|day|week|month|year) ago$/);
  if (singularMatch) {
    return { number: 1, unit: singularMatch[1] };
  }

  // Check plural pattern; the capture group excludes the optional plural 's'
  const pluralMatch = cleaned.match(/^(\d+) (second|minute|hour|day|week|month|year)s? ago$/);
  if (pluralMatch) {
    return { number: parseInt(pluralMatch[1], 10), unit: pluralMatch[2] };
  }

  return null;
}

Midpoint Calculation with Uncertainty

const UNIT_RANGES = {
  second: { min: 1, max: 59, days: 0 },
  minute: { min: 1, max: 59, days: 0 },
  hour: { min: 1, max: 23, days: 0 },
  day: { min: 1, max: 6, days: 1 },
  week: { min: 1, max: 3.9, days: 7 },
  month: { min: 1, max: 11.9, days: 30 },
  year: { min: 1, max: Infinity, days: 365 }
};

function calculateMidpointDays(number, unit) {
  const range = UNIT_RANGES[unit];
  const daysPerUnit = range.days;

  // Special case for singular "a month ago" = 30-59 days
  if (number === 1 && unit === 'month') {
    return 45; // Midpoint of 30-59
  }

  // Special case for singular "a year ago" = 365-729 days
  if (number === 1 && unit === 'year') {
    return 547; // Midpoint of 365-729
  }

  // Standard calculation
  const minDays = number * daysPerUnit;
  const maxDays = (number + 0.999) * daysPerUnit;

  return (minDays + maxDays) / 2;
}

📈 Statistical Analysis from Dataset

Distribution of Review Ages (244 reviews)

| Time Range | Count | Percentage |
|------------|-------|------------|
| 2-3 weeks | ~2 | <1% |
| 1-12 months | ~15 | 6% |
| 1-2 years | ~30 | 12% |
| 2-5 years | ~60 | 25% |
| 5+ years | ~137 | 56% |

Median Age: ~5 years
Oldest Review: 11 years ago


Validation

Test Cases

// expected_days is the midpoint of the range, rounded to the nearest day
const testCases = [
  { input: "a month ago", expected_days: 45, range: [30, 59] },
  { input: "2 months ago", expected_days: 75, range: [60, 89] },
  { input: "3 weeks ago", expected_days: 24, range: [21, 27] },
  { input: "a year ago", expected_days: 547, range: [365, 729] },
  { input: "Edited 2 years ago", expected_days: 912, range: [730, 1094] }
];
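
A small runner for cases in this shape might look like the sketch below. `runCases` is a hypothetical name, and the parser is passed in as a function so the runner stays independent of any particular implementation; the lookup table here is only a stand-in used to exercise it.

```javascript
// Hypothetical runner: rounds the parser's result and checks it against both
// the expected value and the allowed range of each case.
function runCases(cases, toDays) {
  return cases.map(({ input, expected_days, range: [min, max] }) => {
    const actual = Math.round(toDays(input));
    const pass = actual === expected_days && actual >= min && actual <= max;
    return { input, actual, pass };
  });
}

// Stand-in for the real parser: a fixed lookup, for demonstration only.
const stubDays = { 'a month ago': 44.985, 'a year ago': 547 };
const results = runCases(
  [
    { input: 'a month ago', expected_days: 45, range: [30, 59] },
    { input: 'a year ago', expected_days: 547, range: [365, 729] }
  ],
  (text) => stubDays[text]
);

console.log(results.every((r) => r.pass)); // true
```

Rounding before comparing matters because the midpoint calculation returns fractional days.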

🎓 Conclusion

Google's Date Formatter:

  • Custom internal implementation (not a public library)
  • Simple, user-friendly patterns
  • Intentionally imprecise (UX over accuracy)
  • Maximum uncertainty: ±6 months for "a year ago"

For Analytics:

  • Use midpoint calculation for balanced accuracy
  • Add 10-20% grace period for filters
  • Accept that ±15 days is unavoidable for month-level precision
  • Consider showing date ranges in UI: "1-2 months ago" instead of "45 days ago"
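
The range-style label from the last point can be generated directly from the parsed number and unit. This is a minimal sketch; `toRangeLabel` is a hypothetical helper, and it assumes a label "N units ago" covers ages from N up to just under N + 1 units.

```javascript
// Hypothetical helper: build a UI label that exposes the uncertainty,
// e.g. { number: 1, unit: 'month' } -> "1-2 months ago".
function toRangeLabel(number, unit) {
  return `${number}-${number + 1} ${unit}s ago`;
}

console.log(toRangeLabel(1, 'month')); // "1-2 months ago"
console.log(toRangeLabel(3, 'week'));  // "3-4 weeks ago"
```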

Bottom Line: When only the rendered English text is available, a regex-based parser is the practical approach, and midpoint estimates are about as accurate as Google's intentional imprecision allows.