whyrating-engine-legacy/modules/scraper_clean.py
Alejandro Gutiérrez d989178119 7x faster scraping with JS parsing + batch flushing
Performance improvements:
- JS-based DOM parsing (single browser call vs Selenium round-trips)
- Batch flushing to disk every 500 reviews to free memory
- Hide parsed elements (display:none) to reduce DOM overhead
- Cycle timing instrumentation for debugging slowdowns
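The first and third items above (one JS call instead of per-element Selenium round-trips, and hiding elements once parsed) might be sketched roughly as follows. This is an illustrative sketch, not the code in scraper_clean.py: the selector `div.review`, the `.author`/`.text` child selectors, and the names `PARSE_REVIEWS_JS` and `parse_new_reviews` are all assumptions. The only WebDriver call relied on is `execute_script(script, *args)`.

```python
from __future__ import annotations

from typing import Any, Protocol

# Hypothetical selectors and field names; the real page structure is not
# shown in the commit. arguments[0] is the selector passed from Python.
PARSE_REVIEWS_JS = """
const out = [];
for (const el of document.querySelectorAll(arguments[0])) {
  if (el.style.display === 'none') continue;  // parsed in an earlier cycle
  const author = el.querySelector('.author');
  const text = el.querySelector('.text');
  out.push({
    author: author ? author.textContent : '',
    text: text ? text.textContent : '',
  });
  el.style.display = 'none';                  // hide so later cycles skip it
}
return out;
"""


class ScriptRunner(Protocol):
    """Anything with Selenium's execute_script(script, *args) method."""

    def execute_script(self, script: str, *args: Any) -> Any: ...


def parse_new_reviews(driver: ScriptRunner, selector: str = "div.review") -> list[dict]:
    """Extract every not-yet-parsed review with a single browser call."""
    rows = driver.execute_script(PARSE_REVIEWS_JS, selector)
    # Whitespace cleanup happens in Python so the JS stays minimal.
    return [
        {"author": row["author"].strip(), "text": row["text"].strip()}
        for row in rows or []
    ]
```

Because the whole extraction runs inside the page, the cost per cycle is one round-trip regardless of how many reviews were loaded, which is the effect the commit attributes to JS-based parsing.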

Results: 2826 reviews in 6.7 min (7.1/sec) vs. 2190 reviews in 37 min (1.0/sec) before this change
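The batch flushing and cycle timing listed above could look something like the sketch below. It assumes reviews are plain dicts appended to a JSON Lines file; the class `ReviewBuffer`, the helper `timed_cycle`, and the output format are hypothetical and not taken from scraper_clean.py. Only the 500-review threshold comes from the commit message.

```python
import json
import time
from pathlib import Path


class ReviewBuffer:
    """Hold reviews in memory and append them to disk in batches."""

    def __init__(self, path, batch_size=500):  # 500 per the commit message
        self.path = Path(path)
        self.batch_size = batch_size
        self._pending = []
        self.flushed = 0  # total reviews written so far

    def add(self, reviews):
        """Queue reviews; flush once at least batch_size are pending."""
        self._pending.extend(reviews)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Append pending reviews as JSON Lines and release the memory."""
        if not self._pending:
            return
        with self.path.open("a", encoding="utf-8") as fh:
            for review in self._pending:
                fh.write(json.dumps(review, ensure_ascii=False) + "\n")
        self.flushed += len(self._pending)
        self._pending.clear()


def timed_cycle(buffer, fetch):
    """Run one scrape cycle; return (review count, elapsed seconds)."""
    start = time.perf_counter()
    batch = fetch()  # hypothetical callable returning a list of dicts
    buffer.add(batch)
    return len(batch), time.perf_counter() - start
```

A caller would invoke `flush()` once more after the final cycle so that a partial batch is not lost, and could log the elapsed time from `timed_cycle` to spot cycles that slow down as the page grows.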

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 10:01:22 +00:00

28 KiB