whyrating-engine-legacy/modules/scraper_clean.py
Alejandro Gutiérrez ff03a4a1b7 Add hard refresh recovery for stuck scraper
When the scraper gets stuck (8+ failed soft recovery attempts), it now
performs a hard page refresh and re-runs the full page setup:
- Reloads the page
- Re-clicks reviews tab
- Re-sorts by newest
- Re-injects API interceptor
- Continues collecting with existing seen_ids for deduplication
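
The deduplication step in the last bullet can be sketched as follows. This is
a minimal illustration, not the code in scraper_clean.py: the function name
collect_new_reviews and the "id" key are assumptions; only the seen_ids set
comes from this commit.

```python
from typing import Dict, Iterable, List, Set


def collect_new_reviews(
    batch: Iterable[Dict[str, str]], seen_ids: Set[str]
) -> List[Dict[str, str]]:
    """Return only reviews not seen before, recording their ids in seen_ids.

    The "id" key is an assumed field name used for illustration.
    """
    fresh = []
    for review in batch:
        review_id = review["id"]
        if review_id in seen_ids:
            # Already collected before the refresh: skip it.
            continue
        seen_ids.add(review_id)
        fresh.append(review)
    return fresh


# seen_ids outlives the page reload, so reviews that reappear after a hard
# refresh are filtered out instead of being collected twice.
seen_ids: Set[str] = set()
before_refresh = collect_new_reviews([{"id": "r1"}, {"id": "r2"}], seen_ids)
after_refresh = collect_new_reviews(
    [{"id": "r1"}, {"id": "r2"}, {"id": "r3"}], seen_ids
)
```

Because the set is passed in rather than rebuilt during setup, a reload that
starts the review list from the top again only yields the entries that were
not collected earlier.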

Key changes:
- Extract page setup into reusable setup_reviews_page() function
- Add do_hard_refresh() that calls setup on refresh
- Trigger hard refresh after 8 failed soft recoveries
- Try a hard refresh before the timeout path gives up completely
- Max 3 hard refreshes before truly giving up
- Reset recovery counter after successful hard refresh

This ensures the scraper can recover from browser issues, DOM detachment,
or other problems that soft recovery (scroll tricks) can't fix.
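
The escalation described under "Key changes" could look roughly like the
sketch below. The thresholds (8 soft failures, 3 hard refreshes) and the names
setup_reviews_page and do_hard_refresh come from this commit; RecoveryState,
handle_failed_soft_recovery, the reload_page callback, and the return values
are hypothetical and may not match the real module.

```python
from dataclasses import dataclass
from typing import Callable

SOFT_RECOVERY_LIMIT = 8  # hard refresh after this many failed soft recoveries
MAX_HARD_REFRESHES = 3   # stop for good once this many refreshes were used


@dataclass
class RecoveryState:
    """Counters for the recovery logic (hypothetical structure)."""
    failed_soft_recoveries: int = 0
    hard_refreshes: int = 0


def do_hard_refresh(
    state: RecoveryState,
    reload_page: Callable[[], None],
    setup_reviews_page: Callable[[], bool],
) -> bool:
    """Reload the page and redo the setup; return True if setup succeeded."""
    state.hard_refreshes += 1
    reload_page()
    ok = setup_reviews_page()
    if ok:
        # Reset the soft counter only after a successful refresh.
        state.failed_soft_recoveries = 0
    return ok


def handle_failed_soft_recovery(
    state: RecoveryState,
    reload_page: Callable[[], None],
    setup_reviews_page: Callable[[], bool],
) -> str:
    """Return "continue" to keep scraping or "give_up" to stop."""
    state.failed_soft_recoveries += 1
    if state.failed_soft_recoveries < SOFT_RECOVERY_LIMIT:
        return "continue"
    if state.hard_refreshes >= MAX_HARD_REFRESHES:
        return "give_up"
    do_hard_refresh(state, reload_page, setup_reviews_page)
    return "continue"
```

In this sketch a failed refresh leaves the soft counter at its limit, so the
next failed soft recovery immediately tries another refresh until the budget
of three is used up.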

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 13:42:54 +00:00

39 KiB