Major refactoring to achieve 100% review collection:
CONTINUOUS SCROLLING:
- Background thread scrolls NON-STOP at 5ms intervals (no gaps!)
- Main thread checks every 2s while scrolling continues
- Stops immediately when all reviews collected
- Solves the core problem: gaps between bursts caused Google to stop loading
SMART TIMEOUT:
- Gap-based: 3x average gap between review loads
- Initial timeout: 3x time since first load (or 15s default)
- Adaptive: evolves from conservative early timeout to smart gap-based
- Detailed logging shows timeout calculations
RESULTS:
- 100% completion (271/271) vs previous 91% (247/271)
- 3.5x faster (~17s vs 60s)
- Clean thread management with proper shutdown
REMOVED:
- All burst scrolling code (~100 lines)
- Scroll stuck detection (no longer needed)
- Dynamic sleep logic (replaced with continuous scrolling)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Performance improvements:
- Validation speed: 59.71s → 10.96s (5.5x improvement)
- Removed 50+ console.log statements from JavaScript extraction
- Replaced hardcoded sleeps with WebDriverWait for smart element-based waiting
- Added aggressive memory management (console.clear, GC, image unloading every 20 scrolls)
Scraping improvements:
- Increased idle detection from 6 to 12 consecutive idle scrolls for completeness
- Added real-time progress updates every 5 scrolls with percentage calculation
- Added crash recovery to extract partial reviews if Chrome crashes
- Removed artificial 200-review limit to scrape ALL reviews
Timestamp tracking:
- Added updated_at field separate from started_at for progress tracking
- Frontend now shows both "Started" (fixed) and "Last Update" (dynamic)
Robustness improvements:
- Added 5 fallback CSS selectors to handle different Google Maps page structures
- Now tries: div.jftiEf.fontBodyMedium, div.jftiEf, div[data-review-id], etc.
- Automatic selector detection logs which selector works for debugging
Test results:
- Successfully scraped 550 reviews in 150.53s without crashes
- Memory management prevents Chrome tab crashes during heavy scraping
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add new api_interceptor.py module for CDP network interception
- Capture Google Maps internal API responses during scrolling
- Parse protobuf-like JSON responses to extract review data
- Merge API-captured reviews with DOM-scraped data
- Update CSS selectors for January 2026 Google Maps structure
- Add cookie consent dismissal for multiple languages
- Add --api-intercept CLI flag and config option
- Fix review card and pane selectors (.jftiEf, .XiKgde)
- Improve review ID extraction from card elements
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Replace undetected-chromedriver with seleniumbase for better Chrome/ChromeDriver compatibility
- Automatic version matching eliminates manual cache clearing and version conflicts
- Enhanced anti-detection with UC Mode and CDP stealth settings
- Simplified requirements.txt (SeleniumBase manages common dependencies)
- Fix sort selection bug (was selecting wrong menu items)
- Improve scrolling patience (max_idle: 3→15, max_attempts: 10→50)
- Add scroll position tracking to detect when stuck
- Add fallback pane selectors for better reliability
- Update documentation (README, ARCHITECTURE, TROUBLESHOOTING)
- Add comprehensive test suite for SeleniumBase integration
- Version bump to 1.0.1
Developed by George Khananaev
Initial release with multi-language support, MongoDB integration, image handling, URL replacement, and robust error handling. Includes detailed documentation, usage examples, and recommended usage guidelines. Built to effectively handle Google's 2025 interface changes.