The multi-sort loop was calling get_dom_reviews(), which doesn't exist.
API interception alone is sufficient for capturing reviews during
multi-sort passes, so we now use only api_reviews.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
v1.0.0 improvements:
- Add captcha detection (reCAPTCHA, unusual traffic, challenges)
- Block font, analytics, and map tile requests for faster scrolling
- Add 95% close-enough threshold to skip unnecessary retries
- Stop immediately if captcha detected instead of retrying
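A minimal sketch of how the captcha check and the 95% threshold could combine into one retry decision. The marker strings and function names below are hypothetical, chosen for illustration; the scraper's actual detection rules are not shown in this log.

```python
# Hypothetical marker phrases; the real detection rules may differ.
CAPTCHA_MARKERS = ("recaptcha", "unusual traffic", "verify you are human")


def looks_like_captcha(page_text: str) -> bool:
    """Return True if the page text contains any known captcha marker."""
    text = page_text.lower()
    return any(marker in text for marker in CAPTCHA_MARKERS)


def is_close_enough(collected: int, expected: int, threshold: float = 0.95) -> bool:
    """Return True once collected reviews reach the threshold share of expected."""
    if expected <= 0:
        return True
    return collected >= threshold * expected


def should_retry(page_text: str, collected: int, expected: int) -> bool:
    """Retry only when no captcha is present and the count is still short."""
    if looks_like_captcha(page_text):
        return False  # stop immediately; retrying would not help
    return not is_close_enough(collected, expected)
```

With these assumptions, a run that collected 960 of 1000 reviews is treated as done, while a captcha page ends the run regardless of the count.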
v1.1.0 new features:
- Multi-sort strategy to bypass the ~1000-review limit
- Cycles through newest/lowest/highest/relevant sorts
- Auto mode: enables multi-sort when total > 1000
- Diminishing returns detection (stops if a pass yields <5% new reviews)
- Configurable sort order and thresholds
Also adds test_scraper_v110.py CLI tool for testing multi-sort.
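A rough sketch of the multi-sort loop with diminishing returns detection. It assumes each review dict carries an "id" key and that "<5% new" is measured against the size of the current pass; fetch_pass, collect_multi_sort, and should_use_multi_sort are hypothetical names, not the scraper's actual API.

```python
from typing import Callable, Dict, Iterable, Sequence

# Sort names taken from the list above; the order is configurable.
SORT_ORDER = ("newest", "lowest", "highest", "relevant")


def should_use_multi_sort(total_reviews: int, limit: int = 1000) -> bool:
    """Auto mode: enable multi-sort only when the total exceeds the limit."""
    return total_reviews > limit


def collect_multi_sort(
    fetch_pass: Callable[[str], Iterable[dict]],
    sort_order: Sequence[str] = SORT_ORDER,
    min_new_ratio: float = 0.05,
) -> Dict[str, dict]:
    """Run one pass per sort and merge reviews by id until gains dry up."""
    seen: Dict[str, dict] = {}
    for sort in sort_order:
        batch = list(fetch_pass(sort))
        new = 0
        for review in batch:
            if review["id"] not in seen:
                seen[review["id"]] = review
                new += 1
        # Assumption: an empty pass or one with <5% new reviews ends the run.
        if not batch or new / len(batch) < min_new_ratio:
            break
    return seen
```

The first pass always counts as 100% new, so the stop condition only takes effect from the second sort onward.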
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: Cards were hidden but not removed from the DOM, causing
memory buildup (400+ nodes) that crashed Chrome tabs.
Changes:
- Actually remove processed cards from DOM (not just hide them)
- Keep last 50 cards for scroll reference/continuity
- Remove adjacent separator elements along with cards
- Add logging when DOM cleanup removes cards
- Cards near scroll end stay visible for reference
This should prevent "tab crashed" errors during long scraping
sessions with 500+ reviews.
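A small sketch of the selection step behind the cleanup, assuming processed cards are tracked as an ordered list of ids, oldest first. split_cards_for_cleanup is a hypothetical helper; the actual removal would happen in the page (for example via the DOM's Element.remove()) and is not shown here.

```python
from typing import List, Tuple


def split_cards_for_cleanup(
    processed_ids: List[str], keep_last: int = 50
) -> Tuple[List[str], List[str]]:
    """Return (to_remove, to_keep), preserving the newest keep_last cards."""
    if keep_last <= 0:
        return list(processed_ids), []
    # Slicing clamps safely when there are fewer cards than keep_last.
    return list(processed_ids[:-keep_last]), list(processed_ids[-keep_last:])


ids = [f"card-{i}" for i in range(420)]
to_remove, to_keep = split_cards_for_cleanup(ids)
# Log how many cards the cleanup would remove and how many stay for scroll reference.
print(f"DOM cleanup: removing {len(to_remove)} cards, keeping {len(to_keep)}")
```

Keeping the tail of the list rather than the head leaves the cards nearest the scroll end in place, which matches the continuity goal described above.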
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>