BREAKING IMPROVEMENTS:
1. Early Detection for No Reviews:
- Check for "no reviews" messages in 11+ languages before scraping
- Detect disabled reviews tabs and aria-labels with 0 reviews
- Return early with success when no reviews exist, avoiding wasted scraping attempts
2. Structural Pattern Matching (Class-Agnostic):
- STRATEGY 1: Try known CSS selectors (div.jftiEf.fontBodyMedium, etc.)
- STRATEGY 2: Structural matching - find containers with review-like structure
* Looks for elements containing: author + rating + text + date
* Treats an element as a review when it has 3+ of these indicators (class-agnostic, so it works across layouts)
- STRATEGY 3: Use role="article" with review content detection
- Falls back through strategies automatically
3. Less Script-Dependent Selectors:
- Uses aria-label attributes (more stable than CSS classes)
- Uses role attributes (semantic HTML)
- Searches for structural patterns (author img + rating span + text span)
- Works across different Google Maps page layouts and languages
4. Frontend Improvement:
- Hide "Open Analytics Dashboard" button when reviews_count is 0
- Only show action buttons for completed jobs with reviews
TECHNICAL DETAILS:
Structural Matching Logic:
- Scans all divs for review indicators:
* hasAuthor: img with photo/avatar in src
* hasRating: aria-label containing "star" or "rating"
* hasText: span with 20+ characters
* hasDate: text matching date patterns (day/week/month/year)
- Element is a review if it has 3+ of these indicators
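The indicator scoring above can be sketched in Python as follows. This is a minimal illustration, not the scraper's actual code (which runs against live DOM elements): the `Candidate` class, the function names, and the English-only date regex are all assumptions made for the example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for a DOM <div>; the real scraper
# inspects live elements in the browser.
@dataclass
class Candidate:
    img_srcs: list = field(default_factory=list)     # src of each <img>
    aria_labels: list = field(default_factory=list)  # aria-label values
    span_texts: list = field(default_factory=list)   # text of each <span>
    text: str = ""                                   # full element text

# Assumed date pattern: relative dates such as "3 weeks ago" (English only).
DATE_RE = re.compile(r"\b(day|week|month|year)s?\b", re.IGNORECASE)

def review_indicators(c):
    """Evaluate the four indicators named in the commit message."""
    return {
        "hasAuthor": any("photo" in s or "avatar" in s for s in c.img_srcs),
        "hasRating": any("star" in a.lower() or "rating" in a.lower()
                         for a in c.aria_labels),
        "hasText": any(len(t.strip()) >= 20 for t in c.span_texts),
        "hasDate": bool(DATE_RE.search(c.text)),
    }

def looks_like_review(c, threshold=3):
    """An element counts as a review if at least `threshold` indicators hold."""
    return sum(review_indicators(c).values()) >= threshold
```

Requiring 3 of 4 rather than all 4 means a card with a missing avatar or an unrecognized date format can still be picked up.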
Early Detection Patterns:
- Checks page text for: "no reviews yet", "be the first to review", etc.
- Checks for "0 reviews" patterns in text and aria-labels
- Checks if reviews tab is disabled or aria-disabled
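A rough Python sketch of these early-detection checks, under stated assumptions: the phrase list holds only the two English phrases quoted above (the real list covers 11+ languages), and the function name, parameters, and regex are hypothetical.

```python
import re

# Assumed phrase list: only the two English phrases quoted above.
NO_REVIEW_PHRASES = ("no reviews yet", "be the first to review")

# Matches "0 reviews" but not "10 reviews" or "120 reviews".
ZERO_REVIEWS_RE = re.compile(r"(?<![\d.,])0\s+reviews?\b", re.IGNORECASE)

def has_no_reviews(page_text, aria_labels=(), reviews_tab_disabled=False):
    """Return True if the page signals that the business has no reviews."""
    if reviews_tab_disabled:
        return True
    lowered = page_text.lower()
    if any(phrase in lowered for phrase in NO_REVIEW_PHRASES):
        return True
    # Check the visible text and every aria-label for a "0 reviews" pattern.
    return any(ZERO_REVIEWS_RE.search(s) for s in (page_text, *aria_labels))
```

A caller would run this once after the page loads and return a successful, empty result when it is True, before any scrolling starts.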
Benefits:
- Now works on the Lithuanian hospital page (previously scraped 0 of 271 reviews)
- Handles regional Google Maps variations automatically
- Faster exit for businesses with no reviews
- More reliable across Google Maps UI updates
- Better UX: no empty analytics dashboard for 0-review jobs
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Performance improvements:
- Validation speed: 59.71s → 10.96s (about 5.4x faster)
- Removed 50+ console.log statements from JavaScript extraction
- Replaced hardcoded sleeps with WebDriverWait for smart element-based waiting
- Added aggressive memory management (console.clear, GC, image unloading every 20 scrolls)
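To illustrate the idea behind replacing fixed sleeps with condition-based waits, here is a standalone polling helper. It is a sketch of the general technique only; `wait_until` is a hypothetical name, and in the scraper this role is played by Selenium's `WebDriverWait(driver, timeout).until(condition)`.

```python
import time

def wait_until(condition, timeout=10.0, poll=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.

    Returns as soon as the condition holds, instead of always sleeping
    for a fixed worst-case duration.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(poll)
```

The saving comes from the common case: when the element appears after 0.3s, the wait costs about 0.3s rather than the full hardcoded delay.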
Scraping improvements:
- Raised the idle-detection threshold from 6 to 12 consecutive idle scrolls so long review lists are scraped to the end
- Added real-time progress updates every 5 scrolls with percentage calculation
- Added crash recovery to extract partial reviews if Chrome crashes
- Removed artificial 200-review limit to scrape ALL reviews
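The idle-detection and progress-reporting behavior could look roughly like this. It is a sketch under assumptions: `scroll_once` is a hypothetical callable that scrolls the review pane and returns how many reviews are loaded, and the message format is invented for the example.

```python
def scroll_until_idle(scroll_once, total_expected=None, max_idle=12,
                      progress_every=5, report=print):
    """Scroll until `max_idle` consecutive scrolls load no new reviews.

    scroll_once() is assumed to scroll the pane once and return the
    number of reviews currently loaded. There is no cap on the total.
    """
    loaded = idle = scrolls = 0
    while idle < max_idle:
        count = scroll_once()
        scrolls += 1
        if count > loaded:
            loaded, idle = count, 0   # new reviews: reset the idle counter
        else:
            idle += 1                 # nothing new on this scroll
        if scrolls % progress_every == 0:
            if total_expected:
                pct = min(100.0, 100.0 * loaded / total_expected)
                report(f"{loaded}/{total_expected} reviews ({pct:.0f}%)")
            else:
                report(f"{loaded} reviews loaded")
    return loaded
```

A higher `max_idle` trades a slower exit for fewer premature stops when the page is slow to load the next batch.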
Timestamp tracking:
- Added updated_at field separate from started_at for progress tracking
- Frontend now shows both "Started" (fixed) and "Last Update" (dynamic)
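A minimal sketch of the two-timestamp idea, assuming a hypothetical `JobStatus` record; only the field names `started_at`, `updated_at`, and `reviews_count` come from these commit messages.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

def _now():
    return datetime.now(timezone.utc)

@dataclass
class JobStatus:
    # Set once when the job starts; never modified afterwards.
    started_at: datetime = field(default_factory=_now)
    # Refreshed on every progress update.
    updated_at: datetime = field(default_factory=_now)
    reviews_count: int = 0

    def record_progress(self, reviews_count):
        """Store the latest count and bump only `updated_at`."""
        self.reviews_count = reviews_count
        self.updated_at = _now()
```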
Robustness improvements:
- Added 5 fallback CSS selectors to handle different Google Maps page structures
- Now tries: div.jftiEf.fontBodyMedium, div.jftiEf, div[data-review-id], etc.
- Logs which selector matched, to aid debugging
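The fallback chain can be sketched as below. Only the three selectors named above are listed (the remaining ones are not given here), and `query` is a hypothetical callable that returns the elements matching a CSS selector.

```python
# Only the selectors named in the commit message; the others are omitted.
SELECTORS = [
    "div.jftiEf.fontBodyMedium",
    "div.jftiEf",
    "div[data-review-id]",
]

def find_review_cards(query, selectors=SELECTORS, log=print):
    """Try each selector in order and return the first one that matches.

    query(selector) is assumed to return a list of matching elements.
    Returns (selector, elements), or (None, []) if nothing matched.
    """
    for selector in selectors:
        cards = query(selector)
        if cards:
            log(f"selector matched: {selector} ({len(cards)} elements)")
            return selector, cards
    return None, []
```

With Selenium, `query` could be `lambda s: driver.find_elements(By.CSS_SELECTOR, s)`; ordering the list from most to least specific keeps false positives low on layouts where the specific selector still works.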
Test results:
- Successfully scraped 550 reviews in 150.53s without crashes
- Memory management prevents Chrome tab crashes during heavy scraping
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add new api_interceptor.py module for CDP network interception
- Capture Google Maps internal API responses during scrolling
- Parse protobuf-like JSON responses to extract review data
- Merge API-captured reviews with DOM-scraped data
- Update CSS selectors for January 2026 Google Maps structure
- Add cookie consent dismissal for multiple languages
- Add --api-intercept CLI flag and config option
- Fix review card and pane selectors (.jftiEf, .XiKgde)
- Improve review ID extraction from card elements
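One way to merge API-captured reviews with DOM-scraped ones is to key both on the review ID. The sketch below assumes reviews are plain dicts with a `review_id` key and that DOM values win while API values fill gaps; both the field names and that precedence are illustrative assumptions, not the module's confirmed behavior.

```python
def merge_reviews(dom_reviews, api_reviews, id_key="review_id"):
    """Merge two review lists by ID.

    Assumed policy: keep DOM-scraped values, use API values to fill
    fields that are missing or empty, and add reviews seen only via the API.
    """
    merged = {}
    for review in dom_reviews:
        merged[review[id_key]] = dict(review)  # copy so inputs stay untouched
    for review in api_reviews:
        target = merged.setdefault(review[id_key], {})
        for key, value in review.items():
            if target.get(key) in (None, ""):
                target[key] = value
    return list(merged.values())
```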
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Replace undetected-chromedriver with seleniumbase for better Chrome/ChromeDriver compatibility
- Automatic version matching eliminates manual cache clearing and version conflicts
- Enhanced anti-detection with UC Mode and CDP stealth settings
- Simplified requirements.txt (SeleniumBase manages common dependencies)
- Fix sort selection bug (was selecting wrong menu items)
- Improve scrolling patience (max_idle: 3→15, max_attempts: 10→50)
- Add scroll position tracking to detect when stuck
- Add fallback pane selectors for better reliability
- Update documentation (README, ARCHITECTURE, TROUBLESHOOTING)
- Add comprehensive test suite for SeleniumBase integration
- Version bump to 1.0.1
Developed by George Khananaev
Threw in some practical stuff:
- Detailed config.yaml with all the settings explained
- Sample JSON output showing what you actually get from this thing
- Comments in the sample so people know WTF each field means
Should help folks figure out how to set this up without having to read the whole damn README. I'll probably add more examples later when I get time.
Co-Authored-By: George K (MHG) <122952523+ttm-tech@users.noreply.github.com>
Initial release with multi-language support, MongoDB integration, image handling, URL replacement, and robust error handling. Includes detailed documentation, usage examples, and recommended usage guidelines. Built to effectively handle Google's 2025 interface changes.