Alejandro Gutiérrez faa0704737 Optimize scraper performance and add fallback selectors for robustness
Performance improvements:
- Validation speed: 59.71s → 10.96s (about 5.4x faster)
- Removed 50+ console.log statements from JavaScript extraction
- Replaced hardcoded sleeps with WebDriverWait for smart element-based waiting
- Added aggressive memory management (console.clear, GC, image unloading every 20 scrolls)

Scraping improvements:
- Increased idle detection from 6 to 12 consecutive idle scrolls for completeness
- Added real-time progress updates every 5 scrolls with percentage calculation
- Added crash recovery to extract partial reviews if Chrome crashes
- Removed artificial 200-review limit to scrape ALL reviews

Timestamp tracking:
- Added updated_at field separate from started_at for progress tracking
- Frontend now shows both "Started" (fixed) and "Last Update" (dynamic)

Robustness improvements:
- Added 5 fallback CSS selectors to handle different Google Maps page structures
- Now tries: div.jftiEf.fontBodyMedium, div.jftiEf, div[data-review-id], etc.
- Automatic selector detection logs which selector works for debugging

Test results:
- Successfully scraped 550 reviews in 150.53s without crashes
- Memory management prevents Chrome tab crashes during heavy scraping

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 19:49:24 +00:00

Google Reviews Scraper - Testing Interface

A Next.js web interface for testing the containerized Google Reviews Scraper API.

Features

  • 🎯 URL Input - Paste any Google Maps business URL
  • 📊 Real-time Status - Live job tracking with polling
  • Performance Metrics - Reviews count, time, speed
  • 📱 Review Display - Beautiful UI for scraped reviews
  • 💾 Export JSON - Download reviews as JSON

Quick Start

1. Start the Scraper API

First, make sure the containerized scraper is running:

cd ..
docker-compose -f docker-compose.production.yml up -d

Once the containers are up, the API should be reachable at http://localhost:8000.

2. Start the Web Interface

npm install
npm run dev

Then open http://localhost:3000 in your browser.

Usage

  1. Paste a Google Maps URL

    https://www.google.com/maps/place/Business+Name/...
    
  2. Click "Scrape"

    • Job is submitted to the API
    • Status updates in real time while the job runs
    • Reviews appear when complete
  3. View Results

    • See all scraped reviews
    • Export as JSON
    • View performance metrics

Environment Variables

Create a .env.local file if you need to override the defaults:

# API URL (default: http://localhost:8000)
NEXT_PUBLIC_API_URL=http://localhost:8000
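
As a minimal sketch of how the default above could be applied in code, the helper below falls back to http://localhost:8000 when the variable is unset or empty. The function name `resolveApiUrl` and the trailing-slash trimming are illustrative assumptions, not part of this project.

```typescript
// Hypothetical helper (not from this repo): resolve the scraper API base URL.
const DEFAULT_API_URL = "http://localhost:8000";

function resolveApiUrl(env: Record<string, string | undefined>): string {
  // Treat an unset or blank variable as "use the default".
  const raw = env.NEXT_PUBLIC_API_URL?.trim();
  const base = raw && raw.length > 0 ? raw : DEFAULT_API_URL;
  // Drop trailing slashes so paths like "/scrape" can be appended safely.
  return base.replace(/\/+$/, "");
}
```

In a Next.js app this would likely be called as `resolveApiUrl({ NEXT_PUBLIC_API_URL: process.env.NEXT_PUBLIC_API_URL })`, since Next.js only inlines NEXT_PUBLIC_ variables into browser code when they are referenced by their full literal name.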

API Endpoints Used

This interface connects to:

  • POST /scrape - Submit scraping job
  • GET /jobs/{job_id} - Get job status
  • GET /jobs/{job_id}/reviews - Get reviews
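
The three endpoints above could be wrapped in a small typed client, sketched below. Only the paths and HTTP methods come from this README. The request body shape (`{ url }`), the `limit` query parameter name, and the names `createScraperClient`, `FetchLike`, and `RequestOptions` are assumptions for illustration; check the scraper API for the real contract.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface RequestOptions {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string, init?: RequestOptions) => Promise<ResponseLike>;

// Hypothetical client factory; fetchImpl is injected so it can be swapped in tests.
function createScraperClient(baseUrl: string, fetchImpl: FetchLike) {
  async function request(path: string, init?: RequestOptions): Promise<unknown> {
    const res = await fetchImpl(`${baseUrl}${path}`, init);
    if (!res.ok) {
      throw new Error(`Scraper API returned HTTP ${res.status} for ${path}`);
    }
    return res.json();
  }

  return {
    // POST /scrape: the body shape { url } is an assumption.
    submitJob: (mapsUrl: string) =>
      request("/scrape", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ url: mapsUrl }),
      }),
    // GET /jobs/{job_id}
    getJob: (jobId: string) => request(`/jobs/${encodeURIComponent(jobId)}`),
    // GET /jobs/{job_id}/reviews: the "limit" parameter name is an assumption.
    getReviews: (jobId: string, limit = 1000) =>
      request(`/jobs/${encodeURIComponent(jobId)}/reviews?limit=${limit}`),
  };
}
```

The global `fetch` should be usable as `fetchImpl` in both the browser and recent Node.js versions; responses are left as `unknown` because this README does not document their fields.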

Tech Stack

  • Next.js 15 - React framework
  • TypeScript - Type safety
  • Tailwind CSS - Styling
  • API Proxy - Next.js API routes proxy to scraper API
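
To illustrate the proxy idea, here is a rough sketch of the path mapping a proxy route might perform before forwarding a request. The `/api/` prefix, the allow-list, and the function name `toUpstreamUrl` are assumptions; the actual route layout in this project may differ.

```typescript
// Hypothetical mapping: /api/<rest> on the Next.js side -> <scraperBase>/<rest> upstream.
// Returns null for paths that should not be forwarded.
function toUpstreamUrl(
  incomingPath: string,
  search: string,
  scraperBase: string,
): string | null {
  const prefix = "/api/";
  if (!incomingPath.startsWith(prefix)) return null;
  const rest = incomingPath.slice(prefix.length);

  // Forward only the three endpoints this interface uses.
  const allowed = [/^scrape$/, /^jobs\/[^/]+$/, /^jobs\/[^/]+\/reviews$/];
  if (!allowed.some((pattern) => pattern.test(rest))) return null;

  // Normalize the base so exactly one slash separates base and path.
  return `${scraperBase.replace(/\/+$/, "")}/${rest}${search}`;
}
```

Restricting the proxy to known paths keeps the browser from reaching arbitrary routes on the scraper through the Next.js server.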

Development

npm run dev       # Start dev server
npm run build     # Build for production
npm run start     # Start production server
npm run lint      # Run ESLint

Notes

  • The interface polls job status every 2 seconds
  • Polling stops when the job completes or fails
  • Reviews are fetched with a limit of 1000 by default
  • The Export button downloads the reviews as formatted JSON
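
The polling and export behavior in these notes can be sketched as follows. The status strings "completed" and "failed", the `JobState` shape, and the function names are assumptions for illustration; the real API may report job state differently.

```typescript
// Assumed job shape: only a status string is needed for the loop.
type JobState = { status: string };

// Poll until the job reaches a terminal state, waiting intervalMs between checks.
// fetchStatus and sleep are injected so the loop can be exercised without a network or timers.
async function pollUntilDone(
  fetchStatus: () => Promise<JobState>,
  onUpdate: (state: JobState) => void,
  intervalMs = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<JobState> {
  for (;;) {
    const state = await fetchStatus();
    onUpdate(state);
    // Stop as soon as the job completes or fails (status names are assumed).
    if (state.status === "completed" || state.status === "failed") {
      return state;
    }
    await sleep(intervalMs);
  }
}

// Serialize reviews with two-space indentation for a readable download.
function toExportJson(reviews: unknown[]): string {
  return JSON.stringify(reviews, null, 2);
}
```

In a React component, a loop like this would also need to be cancelled when the component unmounts; that part is omitted here for brevity.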