Switch production to scraper_clean with hard refresh recovery

- Add fast_scrape_reviews() wrapper to scraper_clean.py so it is a drop-in replacement for the fast_scraper API
- Set window size (1200x900) in wrapper to ensure proper Google Maps rendering
- Update job_manager.py to import from scraper_clean instead of fast_scraper
- Production now uses clean scraper with:
  - Hard refresh recovery when stuck after 8+ soft recovery attempts
  - API interception + DOM parsing for complete data collection
  - Automatic deduplication across refreshes
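A minimal sketch of the recovery escalation and deduplication listed above. It assumes each collected review carries a stable `review_id` key. The function and callback names (`collect_reviews`, `fetch_batch`, `soft_recover`, `hard_refresh`) and the cap of three hard refreshes are hypothetical. Only the threshold of 8 soft attempts comes from this commit. Browser actions are passed in as callbacks so the control flow can be read and tested without Chrome.

```python
"""Hypothetical sketch: soft recovery escalating to a hard refresh, with dedup."""
from typing import Callable, Dict, List

# Threshold taken from the commit message ("8+ soft recovery attempts").
SOFT_RECOVERY_LIMIT = 8


def collect_reviews(
    fetch_batch: Callable[[], List[dict]],
    soft_recover: Callable[[], None],
    hard_refresh: Callable[[], None],
    expected_total: int,
    max_hard_refreshes: int = 3,  # assumed cap, not stated in the commit
) -> List[dict]:
    """Collect reviews until expected_total is reached or recovery is exhausted.

    fetch_batch returns the reviews currently available (for example from
    intercepted API responses merged with parsed DOM nodes). Each review is
    assumed to have a stable "review_id" used for deduplication.
    """
    seen: Dict[str, dict] = {}
    soft_attempts = 0
    hard_refreshes = 0

    while len(seen) < expected_total:
        new = 0
        for review in fetch_batch():
            rid = review["review_id"]
            if rid not in seen:  # skip reviews already collected before a refresh
                seen[rid] = review
                new += 1

        if new:
            soft_attempts = 0  # progress was made, so the page is not stuck
            continue

        if soft_attempts < SOFT_RECOVERY_LIMIT:
            soft_recover()  # cheap nudge, e.g. re-scrolling the review panel
            soft_attempts += 1
        elif hard_refreshes < max_hard_refreshes:
            hard_refresh()  # reload the page after the soft attempts are used up
            hard_refreshes += 1
            soft_attempts = 0
        else:
            break  # give up and return whatever was collected

    return list(seen.values())
```

Keying a dict by review ID is what makes the hard refresh safe: the reloaded page serves the already-collected reviews again, and they are simply skipped rather than duplicated.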

Tested: 589/589 reviews collected in 55s

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Alejandro Gutiérrez
2026-01-22 14:18:10 +00:00
parent ff03a4a1b7
commit a6d6531543
2 changed files with 96 additions and 1 deletion


@@ -15,7 +15,7 @@ from dataclasses import dataclass, asdict
from modules.config import load_config
from modules.scraper import GoogleReviewsScraper
-from modules.fast_scraper import fast_scrape_reviews
+from modules.scraper_clean import fast_scrape_reviews # Updated to use clean scraper with hard refresh recovery
from modules.chrome_pool import get_scraping_worker, release_scraping_worker
log = logging.getLogger("scraper")
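Because job_manager.py changes only its import line, scraper_clean.py has to expose a `fast_scrape_reviews` callable that accepts the same arguments as the fast_scraper version. The sketch below is a hypothetical adapter illustrating that idea. The real signature is not shown in this diff, so the `(driver, url)` parameters, the ignored extra keyword arguments, and the `make_fast_scrape_reviews` factory are all assumptions. `set_window_size(width, height)` is the method Selenium's WebDriver provides, and 1200x900 is the size from this commit.

```python
"""Hypothetical sketch: keep the old call shape while delegating to a new scraper."""
from typing import Any, Callable, List, Protocol

# Window size from the commit message; the rest of this module is assumed.
WINDOW_SIZE = (1200, 900)


class Driver(Protocol):
    """Minimal driver interface; Selenium's WebDriver provides this method."""

    def set_window_size(self, width: int, height: int) -> Any: ...


def make_fast_scrape_reviews(
    scrape_impl: Callable[[Driver, str], List[dict]],
) -> Callable[..., List[dict]]:
    """Wrap a new scraper so callers written for the old function keep working."""

    def fast_scrape_reviews(driver: Driver, url: str, **_ignored: Any) -> List[dict]:
        # Fix the viewport first so the Google Maps layout renders consistently.
        driver.set_window_size(*WINDOW_SIZE)
        # Extra keyword arguments the old API accepted are tolerated but unused here.
        return scrape_impl(driver, url)

    return fast_scrape_reviews
```

With the adapter in place, switching implementations is a one-line import change on the caller's side, which is all the diff above does.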