Commit Graph

6 Commits

Author SHA1 Message Date
Alejandro Gutiérrez
7666b7aea2 Fix: Replace broken Google Maps iframe with interactive preview + add scraper type selection
- Replace non-working Google Maps embed iframe with animated location preview
- Add "Open in Google Maps" button to open location in new tab
- Add scraper type selection dropdown fetching from /api/admin/scrapers
- Show selected scraper info with formatted labels (Google Reviews v1.0.0)
- Include scraper_version and scraper_variant in job submission

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:15:58 +00:00
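The label formatting and job-submission changes in this commit can be sketched as follows. This is a minimal Python illustration, not the frontend's actual code. The scraper entry fields (`name`, `version`, `variant`) and the `url` key are assumptions, because the commit does not show the schema returned by /api/admin/scrapers or the exact job-submission body. `format_scraper_label` and `build_job_payload` are hypothetical names.

```python
from __future__ import annotations

from typing import Any


def format_scraper_label(name: str, version: str) -> str:
    """Turn an id like "google_reviews" plus "1.0.0" into "Google Reviews v1.0.0"."""
    # Split snake_case or kebab-case ids into words and title-case each one.
    words = [w for w in name.replace("-", "_").split("_") if w]
    return " ".join(w.capitalize() for w in words) + f" v{version}"


def build_job_payload(url: str, scraper: dict[str, Any]) -> dict[str, Any]:
    """Job-submission body carrying the selected scraper's version and variant.

    Field names are assumed for illustration; the real API schema is not shown.
    """
    return {
        "url": url,
        "scraper_version": scraper["version"],
        # None when the scraper entry has no variant (assumed optional here).
        "scraper_variant": scraper.get("variant"),
    }
```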
Alejandro Gutiérrez
a540ab97b1 Add browser fingerprint support and analytics metadata display
- Transfer the user's browser fingerprint (user agent, viewport, timezone,
  language, geolocation) to Chrome for more authentic scraping
- Display review topics from Google Maps in analytics dashboard
- Show business category badge in analytics header
- Fix date_text null handling in analytics (handle undefined/timestamp fields)
- Add review_topics and business_category to JobStatus interface

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 10:36:06 +00:00
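One possible way to transfer such a fingerprint to Chrome is through Chrome DevTools Protocol overrides, sent with Selenium's Chrome-only `execute_cdp_cmd`. The commit does not say which mechanism it actually uses (Chrome command-line flags such as --user-agent or --lang would be another option), and the fingerprint keys below (`userAgent`, `viewport`, `timezone`, `language`, `geolocation`) are hypothetical. The sketch separates building the command list from sending it, so the mapping can be checked without a browser.

```python
from __future__ import annotations

from typing import Any


def fingerprint_to_cdp_commands(fp: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Translate a (hypothetical) fingerprint dict into (CDP method, params) pairs.

    Only keys present in `fp` produce commands, so a partial fingerprint is fine.
    """
    commands: list[tuple[str, dict[str, Any]]] = []
    lang = fp.get("language")
    if fp.get("userAgent"):
        ua_params: dict[str, Any] = {"userAgent": fp["userAgent"]}
        if lang:
            ua_params["acceptLanguage"] = lang  # e.g. "es-ES"
        commands.append(("Network.setUserAgentOverride", ua_params))
    if fp.get("viewport"):
        vp = fp["viewport"]
        commands.append(("Emulation.setDeviceMetricsOverride", {
            "width": vp["width"],
            "height": vp["height"],
            "deviceScaleFactor": vp.get("deviceScaleFactor", 1),
            "mobile": vp.get("mobile", False),
        }))
    if fp.get("timezone"):
        commands.append(("Emulation.setTimezoneOverride", {"timezoneId": fp["timezone"]}))
    if lang:
        # setLocaleOverride expects an ICU-style locale such as "es_ES".
        commands.append(("Emulation.setLocaleOverride", {"locale": lang.replace("-", "_")}))
    if fp.get("geolocation"):
        geo = fp["geolocation"]
        # navigator.geolocation also needs the permission to be granted.
        commands.append(("Browser.grantPermissions", {"permissions": ["geolocation"]}))
        commands.append(("Emulation.setGeolocationOverride", {
            "latitude": geo["latitude"],
            "longitude": geo["longitude"],
            "accuracy": geo.get("accuracy", 100),
        }))
    return commands


def apply_fingerprint(driver: Any, fp: dict[str, Any]) -> None:
    """Send each override through a Chrome WebDriver's execute_cdp_cmd."""
    for method, params in fingerprint_to_cdp_commands(fp):
        driver.execute_cdp_cmd(method, params)
```

These overrides would need to be applied before navigating to the Google Maps URL so that the page sees them from its first request.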
Alejandro Gutiérrez
47bb032011 Clean up project root - remove 51 obsolete files
Deleted:
- 26 old markdown summary/documentation files
- 16 debug/test Python scripts (debug_*, test_*, diagnose_*)
- 10 untracked JSON files from api_response_samples
- terms-of-usage.md, pane_not_found.png

Also includes pending web app changes:
- Jobs management UI (JobsView, Sidebar components)
- API routes for job streaming and comparison
- Enhanced ReviewAnalytics and ScraperTest components

Final clean structure:
├── api_server_production.py  (main entry)
├── modules/                  (core Python)
├── web/                      (Next.js frontend)
├── tests/                    (test suite)
├── docs/                     (documentation)
└── examples/                 (usage examples)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 17:31:53 +00:00
Alejandro Gutiérrez
01ea18d91d Add test URL quick-select buttons to frontend
- Small (~79 reviews): R. Fleitas Peluqueros
- Medium (~589 reviews): ClickRent Gran Canaria
- Large (~2000+ reviews): Hospital Doctor Negrín

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 14:20:54 +00:00
Alejandro Gutiérrez
c8c24ae483 Add robust structural pattern matching and early no-reviews detection
MAJOR IMPROVEMENTS:

1. Early Detection for No Reviews:
   - Check for "no reviews" messages in 11+ languages before scraping
   - Detect a disabled reviews tab and aria-labels reporting 0 reviews
   - Return early with success when no reviews exist (saves time)
   - Avoid wasted scraping attempts on businesses with no reviews

2. Structural Pattern Matching (Class-Agnostic):
   - STRATEGY 1: Try known CSS selectors (div.jftiEf.fontBodyMedium, etc.)
   - STRATEGY 2: Structural matching - find containers with review-like structure
     * Looks for elements containing: author + rating + text + date
     * Treats an element as a review if it has 3+ of these 4 indicators (works across layouts)
   - STRATEGY 3: Use role="article" with review content detection
   - Falls through to the next strategy automatically when one finds no reviews

3. Selectors Less Dependent on CSS Classes:
   - Uses aria-label attributes (more stable than CSS classes)
   - Uses role attributes (semantic HTML)
   - Searches for structural patterns (author img + rating span + text span)
   - Works across different Google Maps page layouts and languages

4. Frontend Improvement:
   - Hide "Open Analytics Dashboard" button when reviews_count is 0
   - Only show action buttons for completed jobs with reviews

TECHNICAL DETAILS:

Structural Matching Logic:
- Scans all divs for review indicators:
  * hasAuthor: img with photo/avatar in src
  * hasRating: aria-label containing "star" or "rating"
  * hasText: span with 20+ characters
  * hasDate: text matching date patterns (day/week/month/year)
- An element counts as a review if it has at least 3 of these 4 indicators
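The four-indicator rule above can be sketched in Python on a simplified snapshot of one candidate element. `ElementSnapshot` and its fields are hypothetical stand-ins for what the in-browser JavaScript reads from the DOM, and the English-only date regex is a simplification; the commit's own patterns are not shown.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Relative dates such as "2 weeks ago" or "a year ago" (English only here).
DATE_PATTERN = re.compile(r"\b(day|week|month|year)s?\b", re.IGNORECASE)


@dataclass
class ElementSnapshot:
    """Hypothetical summary of one candidate <div> read from the DOM."""
    img_srcs: list[str] = field(default_factory=list)
    aria_labels: list[str] = field(default_factory=list)
    span_texts: list[str] = field(default_factory=list)
    text: str = ""


def review_indicators(el: ElementSnapshot) -> dict[str, bool]:
    """Evaluate the four structural indicators for one element."""
    return {
        "hasAuthor": any("photo" in s or "avatar" in s for s in el.img_srcs),
        "hasRating": any("star" in a.lower() or "rating" in a.lower() for a in el.aria_labels),
        "hasText": any(len(t.strip()) >= 20 for t in el.span_texts),
        "hasDate": DATE_PATTERN.search(el.text) is not None,
    }


def is_review(el: ElementSnapshot, threshold: int = 3) -> bool:
    """Class-agnostic test: enough review-like features, whatever the CSS classes."""
    return sum(review_indicators(el).values()) >= threshold
```

Requiring 3 of 4 rather than all 4 lets a review without an author photo, or with only a short text, still qualify.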

Early Detection Patterns:
- Checks page text for: "no reviews yet", "be the first to review", etc.
- Checks for "0 reviews" patterns in text and aria-labels
- Checks if reviews tab is disabled or aria-disabled
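A sketch of the early-exit check, assuming the page text, the aria-labels, and the reviews tab's attributes have already been read from the page. Only the English phrases quoted above are included; the commit does not list its 11+ languages, so they are left out here rather than guessed. `has_no_reviews` is a hypothetical name.

```python
from __future__ import annotations

import re
from typing import Optional

# English phrases from the commit; the other languages are not listed there.
NO_REVIEW_PHRASES = ("no reviews yet", "be the first to review")
# "0 reviews", but not the tail of "10 reviews" or "1,230 reviews".
ZERO_REVIEWS = re.compile(r"(?<![\w.,])0\s+reviews?\b", re.IGNORECASE)


def has_no_reviews(
    page_text: str,
    aria_labels: list[str],
    tab_attrs: Optional[dict[str, str]] = None,
) -> bool:
    """Return True when the page signals that there is nothing to scrape."""
    lowered = page_text.lower()
    if any(phrase in lowered for phrase in NO_REVIEW_PHRASES):
        return True
    if any(ZERO_REVIEWS.search(s) for s in [page_text, *aria_labels]):
        return True
    # A reviews tab rendered as disabled also means there are no reviews.
    if tab_attrs is not None and (
        "disabled" in tab_attrs or tab_attrs.get("aria-disabled") == "true"
    ):
        return True
    return False
```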

Benefits:
- Now works on a Lithuanian hospital page (previously extracted 0 of 271 reviews)
- Handles regional Google Maps variations automatically
- Faster exit for businesses with no reviews
- More reliable across Google Maps UI updates
- Better UX: no empty analytics dashboard for 0-review jobs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 19:52:39 +00:00
Alejandro Gutiérrez
faa0704737 Optimize scraper performance and add fallback selectors for robustness
Performance improvements:
- Validation speed: 59.71s → 10.96s (~5.4x faster)
- Removed 50+ console.log statements from the JavaScript extraction code
- Replaced hardcoded sleeps with WebDriverWait for smart element-based waiting
- Added aggressive memory management (console.clear, GC, image unloading every 20 scrolls)

Scraping improvements:
- Raised the idle-detection threshold from 6 to 12 consecutive idle scrolls for completeness
- Added real-time progress updates every 5 scrolls, including a completion percentage
- Added crash recovery to extract partial reviews if Chrome crashes
- Removed artificial 200-review limit to scrape ALL reviews
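The scroll loop these changes describe (an idle threshold, periodic progress with a percentage, partial results after a crash, no fixed review cap) might look roughly like this. `scroll_and_collect` is a hypothetical callback that scrolls once and returns the reviews currently visible, each with an assumed `id` key used for de-duplication. A real Selenium scraper would more likely catch `WebDriverException` than the broad `Exception` used here.

```python
from __future__ import annotations

from typing import Callable, Optional


def scrape_with_scrolling(
    scroll_and_collect: Callable[[], list[dict]],
    total_expected: Optional[int] = None,
    idle_limit: int = 12,
    progress_every: int = 5,
    on_progress: Optional[Callable[[int, Optional[float]], None]] = None,
) -> tuple[list[dict], bool]:
    """Scroll until `idle_limit` consecutive scrolls add no new reviews.

    Returns (reviews, completed); completed is False when a scroll raised
    (e.g. the Chrome tab crashed) and only a partial result was salvaged.
    """
    seen: dict[str, dict] = {}  # insertion-ordered, keyed by review id
    idle = 0
    scrolls = 0
    try:
        while idle < idle_limit:
            batch = scroll_and_collect()
            scrolls += 1
            before = len(seen)
            for review in batch:
                seen.setdefault(review["id"], review)
            # Reset the idle counter whenever a scroll yields something new.
            idle = idle + 1 if len(seen) == before else 0
            if on_progress is not None and scrolls % progress_every == 0:
                pct = None
                if total_expected:
                    pct = min(100.0, 100.0 * len(seen) / total_expected)
                on_progress(len(seen), pct)
    except Exception:
        # Crash recovery: keep whatever was collected before the failure.
        return list(seen.values()), False
    return list(seen.values()), True
```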

Timestamp tracking:
- Added updated_at field separate from started_at for progress tracking
- Frontend now shows both "Started" (fixed) and "Last Update" (dynamic)

Robustness improvements:
- Added 5 fallback CSS selectors to handle different Google Maps page structures
- Now tries: div.jftiEf.fontBodyMedium, div.jftiEf, div[data-review-id], etc.
- Logs which selector matched, to aid debugging
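A sketch of the selector fallback, trying selectors in order and logging the first one that matches. Only the three selectors named above are listed, because the commit elides the remaining two. `query` is a hypothetical callable that returns the elements matching a CSS selector, and `find_reviews_with_fallback` is an illustrative name.

```python
from __future__ import annotations

import logging
from typing import Any, Callable, Optional, Sequence

logger = logging.getLogger("scraper")

# The three selectors named in the commit; the other two are elided there.
REVIEW_SELECTORS: tuple[str, ...] = (
    "div.jftiEf.fontBodyMedium",
    "div.jftiEf",
    "div[data-review-id]",
)


def find_reviews_with_fallback(
    query: Callable[[str], Sequence[Any]],
    selectors: Sequence[str] = REVIEW_SELECTORS,
) -> tuple[Optional[str], list[Any]]:
    """Return (matching selector, elements) for the first selector that finds anything."""
    for selector in selectors:
        found = list(query(selector))
        if found:
            logger.info("review selector matched: %s (%d elements)", selector, len(found))
            return selector, found
    logger.warning("no review selector matched; tried %d", len(selectors))
    return None, []
```

With Selenium, `query` could be `lambda s: driver.find_elements(By.CSS_SELECTOR, s)`. Returning the matching selector alongside the elements lets the caller reuse it on later scrolls instead of probing every selector again.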

Test results:
- Successfully scraped 550 reviews in 150.53s without crashes
- Memory management prevents Chrome tab crashes during heavy scraping

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 19:49:24 +00:00