Performance improvements:
- Validation speed: 59.71s → 10.96s (5.5x improvement)
- Removed 50+ console.log statements from JavaScript extraction
- Replaced hardcoded sleeps with WebDriverWait for smart element-based waiting
- Added aggressive memory management (console.clear, GC, image unloading every 20 scrolls)
Scraping improvements:
- Increased idle detection from 6 to 12 consecutive idle scrolls for completeness
- Added real-time progress updates every 5 scrolls with percentage calculation
- Added crash recovery to extract partial reviews if Chrome crashes
- Removed artificial 200-review limit to scrape ALL reviews
Timestamp tracking:
- Added updated_at field separate from started_at for progress tracking
- Frontend now shows both "Started" (fixed) and "Last Update" (dynamic)
Robustness improvements:
- Added 5 fallback CSS selectors to handle different Google Maps page structures
- Now tries: div.jftiEf.fontBodyMedium, div.jftiEf, div[data-review-id], etc.
- Automatic selector detection logs which selector works for debugging
Test results:
- Successfully scraped 550 reviews in 150.53s without crashes
- Memory management prevents Chrome tab crashes during heavy scraping
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Google Reviews Scraper - Testing Interface
A Next.js web interface for testing the containerized Google Reviews Scraper API.
Features
- 🎯 URL Input - Paste any Google Maps business URL
- 📊 Real-time Status - Live job tracking with polling
- ⚡ Performance Metrics - Reviews count, time, speed
- 📱 Review Display - Beautiful UI for scraped reviews
- 💾 Export JSON - Download reviews as JSON
Quick Start
1. Start the Scraper API
First, make sure the containerized scraper is running:
cd ..
docker-compose -f docker-compose.production.yml up -d
The API should be running at http://localhost:8000
2. Start the Web Interface
npm install
npm run dev
Usage
- Paste a Google Maps URL
  https://www.google.com/maps/place/Business+Name/...
- Click "Scrape"
  - Job is submitted to the API
  - Status updates in real-time
  - Reviews appear when complete
- View Results
  - See all scraped reviews
  - Export as JSON
  - View performance metrics
Environment Variables
Create .env.local if you need to customize:
# API URL (default: http://localhost:8000)
NEXT_PUBLIC_API_URL=http://localhost:8000
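As an illustration of how the default might be applied, here is a minimal sketch of resolving the API base URL from an environment-like object. The helper name `resolveApiUrl` and the trailing-slash trimming are assumptions for this example, not something taken from the project's code; only the variable name and the default URL come from this README.

```typescript
// Default taken from this README; everything else here is a hypothetical helper.
const DEFAULT_API_URL = "http://localhost:8000";

// Resolve the scraper API base URL from an env-like map.
// Falls back to the default when the variable is unset or blank,
// and strips trailing slashes so paths can be appended safely.
function resolveApiUrl(env: Record<string, string | undefined>): string {
  const raw = (env["NEXT_PUBLIC_API_URL"] ?? "").trim();
  const base = raw.length > 0 ? raw : DEFAULT_API_URL;
  return base.replace(/\/+$/, "");
}
```

In a Next.js app this would typically be called with `process.env`; passing the map in explicitly just keeps the sketch easy to test.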
API Endpoints Used
This interface connects to:
- POST /scrape - Submit scraping job
- GET /jobs/{job_id} - Get job status
- GET /jobs/{job_id}/reviews - Get reviews
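The three endpoints can be expressed as small URL builders. This is a sketch only: the `endpoints` object and its function names are hypothetical, and the `limit` query parameter is an assumption based on the note that reviews are fetched with a limit of 1000 by default. The README does not specify how that limit is actually passed.

```typescript
// URL builders for the scraper API paths listed above.
// Names are illustrative; the `limit` query parameter is an assumption.
const endpoints = {
  // POST target for submitting a scraping job
  submit: (base: string): string => `${base}/scrape`,
  // GET target for a job's status
  status: (base: string, jobId: string): string =>
    `${base}/jobs/${encodeURIComponent(jobId)}`,
  // GET target for a job's reviews
  reviews: (base: string, jobId: string, limit: number = 1000): string =>
    `${base}/jobs/${encodeURIComponent(jobId)}/reviews?limit=${limit}`,
};
```

Encoding the job ID guards against IDs containing characters that are not URL-safe, whatever format the API uses for them.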
Tech Stack
- Next.js 15 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- API Proxy - Next.js API routes proxy to scraper API
Development
npm run dev # Start dev server
npm run build # Build for production
npm run start # Start production server
npm run lint # Run ESLint
Notes
- The interface polls job status every 2 seconds
- Polling stops when job completes or fails
- Reviews are fetched with a limit of 1000 by default
- Export button downloads reviews as formatted JSON
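The polling and export behavior described in these notes could look roughly like the sketch below. The status strings `"completed"` and `"failed"` and the `status` field name are assumptions, since the README does not show the API's response shape. The function names are hypothetical, and the status fetcher is injected so the sketch stays independent of any real HTTP client.

```typescript
// Assumed terminal status values; the real API may use different strings.
const TERMINAL_STATUSES = new Set<string>(["completed", "failed"]);

function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Poll an injected status fetcher until the job reaches a terminal status,
// waiting intervalMs (2 seconds by default, per the notes) between attempts.
async function pollUntilDone(
  getStatus: () => Promise<{ status: string }>,
  intervalMs: number = 2000,
): Promise<{ status: string }> {
  for (;;) {
    const job = await getStatus();
    if (isTerminal(job.status)) return job;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Serialize reviews as indented JSON, as a formatted export might.
function toExportJson(reviews: unknown[]): string {
  return JSON.stringify(reviews, null, 2);
}
```

A production version would probably also cap the number of attempts or allow cancellation so an abandoned job does not poll forever.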