Files
whyrating-engine-legacy/web
Alejandro Gutiérrez c8c24ae483 Add robust structural pattern matching and early no-reviews detection
BREAKING IMPROVEMENTS:

1. Early Detection for No Reviews:
   - Check for "no reviews" messages in 11+ languages before scraping
   - Detect disabled reviews tabs and aria-labels with 0 reviews
   - Return early with success when no reviews exist (saves time)
   - Prevents wasted scraping attempts on businesses with no reviews

2. Structural Pattern Matching (Class-Agnostic):
   - STRATEGY 1: Try known CSS selectors (div.jftiEf.fontBodyMedium, etc.)
   - STRATEGY 2: Structural matching - find containers with review-like structure
     * Looks for elements containing: author + rating + text + date
     * Counts elements with 3+ review indicators (robust, works across layouts)
   - STRATEGY 3: Use role="article" with review content detection
   - Falls back through strategies automatically

3. Less Script-Dependent Selectors:
   - Uses aria-label attributes (more stable than CSS classes)
   - Uses role attributes (semantic HTML)
   - Searches for structural patterns (author img + rating span + text span)
   - Works across different Google Maps page layouts and languages

4. Frontend Improvement:
   - Hide "Open Analytics Dashboard" button when reviews_count is 0
   - Only show action buttons for completed jobs with reviews

TECHNICAL DETAILS:

Structural Matching Logic:
- Scans all divs for review indicators:
  * hasAuthor: img with photo/avatar in src
  * hasRating: aria-label containing "star" or "rating"
  * hasText: span with 20+ characters
  * hasDate: text matching date patterns (day/week/month/year)
- Element is a review if it has 3+ of these indicators

Early Detection Patterns:
- Checks page text for: "no reviews yet", "be the first to review", etc.
- Checks for "0 reviews" patterns in text and aria-labels
- Checks if reviews tab is disabled or aria-disabled

Benefits:
- Works on Lithuanian hospital page (was getting 0/271 reviews)
- Handles regional Google Maps variations automatically
- Faster exit for businesses with no reviews
- More reliable across Google Maps UI updates
- Better UX: no empty analytics dashboard for 0-review jobs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 19:52:39 +00:00
..

Google Reviews Scraper - Testing Interface

A Next.js web interface for testing the containerized Google Reviews Scraper API.

Features

  • 🎯 URL Input - Paste any Google Maps business URL
  • 📊 Real-time Status - Live job tracking with polling
  • Performance Metrics - Reviews count, time, speed
  • 📱 Review Display - Beautiful UI for scraped reviews
  • 💾 Export JSON - Download reviews as JSON

Quick Start

1. Start the Scraper API

First, make sure the containerized scraper is running:

cd ..
docker-compose -f docker-compose.production.yml up -d

The API should be running at http://localhost:8000

2. Start the Web Interface

npm install
npm run dev

Open http://localhost:3000

Usage

  1. Paste a Google Maps URL

    https://www.google.com/maps/place/Business+Name/...
    
  2. Click "Scrape"

    • Job is submitted to the API
    • Status updates in real-time
    • Reviews appear when complete
  3. View Results

    • See all scraped reviews
    • Export as JSON
    • View performance metrics

Environment Variables

Create .env.local if you need to customize:

# API URL (default: http://localhost:8000)
NEXT_PUBLIC_API_URL=http://localhost:8000

API Endpoints Used

This interface connects to:

  • POST /scrape - Submit scraping job
  • GET /jobs/{job_id} - Get job status
  • GET /jobs/{job_id}/reviews - Get reviews

Tech Stack

  • Next.js 15 - React framework
  • TypeScript - Type safety
  • Tailwind CSS - Styling
  • API Proxy - Next.js API routes proxy to scraper API

Development

npm run dev       # Start dev server
npm run build     # Build for production
npm run start     # Start production server
npm run lint      # Run ESLint

Notes

  • The interface polls job status every 2 seconds
  • Polling stops when job completes or fails
  • Reviews are fetched with a limit of 1000 by default
  • Export button downloads reviews as formatted JSON