- Replace undetected-chromedriver with seleniumbase for better Chrome/ChromeDriver compatibility - Automatic version matching eliminates manual cache clearing and version conflicts - Enhanced anti-detection with UC Mode and CDP stealth settings - Simplified requirements.txt (SeleniumBase manages common dependencies) - Fix sort selection bug (was selecting wrong menu items) - Improve scrolling patience (max_idle: 3→15, max_attempts: 10→50) - Add scroll position tracking to detect when stuck - Add fallback pane selectors for better reliability - Update documentation (README, ARCHITECTURE, TROUBLESHOOTING) - Add comprehensive test suite for SeleniumBase integration - Version bump to 1.0.1 Developed by George Khananaev
Troubleshooting Guide
This guide covers common issues and their solutions when running Google Reviews Scraper Pro.
Table of Contents
- Chrome & ChromeDriver Issues
- MongoDB Issues
- AWS S3 Issues
- Scraping Issues
- API Server Issues
- Image Download Issues
- Configuration Issues
- Performance Issues
- Python & Dependencies Issues
Chrome & ChromeDriver Issues
Issue: ChromeDriver Version Mismatch
Error Message:
SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 143
Current browser version is 142.0.7444.176
Cause: Chrome/ChromeDriver version mismatch (this issue is now automatically handled by SeleniumBase).
Solution:
Good news: with SeleniumBase UC Mode, version mismatches are resolved automatically. If the error still appears:
- Update Chrome to the latest version:
  - macOS: Open Chrome → Menu → Help → About Google Chrome
  - Or run:
    open -a "Google Chrome" "chrome://settings/help"
- Upgrade SeleniumBase (if needed):
    pip install --upgrade seleniumbase
- Run the scraper again. SeleniumBase automatically downloads the matching ChromeDriver.
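The two versions in the error message tell you which side is out of date. Here is a minimal sketch with hypothetical helpers (parse_version_mismatch and suggest_fix are not part of the scraper or SeleniumBase). It pulls out both major versions with regular expressions, assuming the message uses the wording shown above:

```python
import re
from typing import Optional, Tuple

# Assumption: the message uses the wording shown above; other Chrome or
# Selenium versions may phrase it differently.
DRIVER_RE = re.compile(r"only supports Chrome version (\d+)")
BROWSER_RE = re.compile(r"Current browser version is (\d+)")


def parse_version_mismatch(message: str) -> Optional[Tuple[int, int]]:
    """Return (driver_major, browser_major), or None if the text doesn't match."""
    driver = DRIVER_RE.search(message)
    browser = BROWSER_RE.search(message)
    if driver is None or browser is None:
        return None
    return int(driver.group(1)), int(browser.group(1))


def suggest_fix(message: str) -> str:
    """Map the parsed versions to the matching step in this guide."""
    versions = parse_version_mismatch(message)
    if versions is None:
        return "Unrecognized message: update Chrome and upgrade SeleniumBase."
    driver_major, browser_major = versions
    if driver_major > browser_major:
        return "ChromeDriver is newer than Chrome: update Chrome."
    if driver_major < browser_major:
        return "ChromeDriver is older than Chrome: upgrade SeleniumBase and re-run."
    return "Major versions already match: re-run the scraper."
```

For the example error above, the driver supports 143 and the browser is 142, so the suggested fix is to update Chrome.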
Issue: ChromeOptions Reuse Error
Error Message:
RuntimeError: you cannot reuse the ChromeOptions object
Cause: Internal error when retrying Chrome initialization.
Solution: Restart the scraper. If the error persists, upgrade SeleniumBase (see above) and try again.
Issue: Chrome Binary Not Found
Error Message:
WebDriverException: Message: unknown error: cannot find Chrome binary
Cause: Chrome is not installed or not in the expected location.
Solution:
- Install Chrome:
  - Download from: https://www.google.com/chrome/
- For a custom Chrome location, set the environment variable:
    export CHROME_BIN=/path/to/chrome
- Docker users: ensure Chrome is installed in the Dockerfile (google-chrome-stable comes from Google's apt repository, which must be added to the image first):
    RUN apt-get update && apt-get install -y google-chrome-stable
    ENV CHROME_BIN=/usr/bin/google-chrome
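To see which Chrome binary can actually be found on a machine, one approach is to check CHROME_BIN first, then search PATH. This is a minimal sketch with a hypothetical helper (find_chrome_binary). The candidate names are common Linux executable names, the macOS path is Chrome's default install location, and this is not how SeleniumBase itself locates Chrome:

```python
import os
import shutil
from typing import Optional

# Assumption: common executable names; adjust for your distribution.
CANDIDATE_NAMES = ["google-chrome", "google-chrome-stable", "chromium", "chromium-browser", "chrome"]
MACOS_DEFAULT = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def find_chrome_binary() -> Optional[str]:
    """Return a path to a Chrome executable, or None if none is found."""
    # 1. An explicit CHROME_BIN wins if it points at an executable file.
    env_path = os.environ.get("CHROME_BIN")
    if env_path and os.path.isfile(env_path) and os.access(env_path, os.X_OK):
        return env_path
    # 2. Otherwise search PATH for well-known executable names.
    for name in CANDIDATE_NAMES:
        found = shutil.which(name)
        if found:
            return found
    # 3. Finally, the default macOS install location.
    if os.path.isfile(MACOS_DEFAULT):
        return MACOS_DEFAULT
    return None


if __name__ == "__main__":
    print(find_chrome_binary() or "Chrome not found: install it or set CHROME_BIN")
```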
Issue: Chrome Crashes in Headless Mode
Error Message:
WebDriverException: Message: chrome not reachable
Solution:
- Add the required flags (already included in the scraper, but verify):
    --no-sandbox --disable-dev-shm-usage --disable-gpu
- Increase shared memory (Docker):
    docker run --shm-size=2g your-image
- Try non-headless mode to debug:
    python start.py --headless false
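In containers, "chrome not reachable" crashes are often caused by Docker's small default /dev/shm (64 MB). The --disable-dev-shm-usage flag and --shm-size work around this. To check the size from inside the container, here is a sketch that uses shutil.disk_usage. The helper names are hypothetical, and the 1 GiB threshold is an arbitrary assumption, not a documented requirement:

```python
import shutil
from typing import Optional

SHM_PATH = "/dev/shm"  # Linux shared-memory mount; absent on macOS/Windows
MIN_RECOMMENDED_BYTES = 1024**3  # assumption: 1 GiB as a comfortable floor


def shm_size_bytes(path: str = SHM_PATH) -> Optional[int]:
    """Return the total size of the shared-memory mount, or None if missing."""
    try:
        return shutil.disk_usage(path).total
    except FileNotFoundError:
        return None


def check_shm(path: str = SHM_PATH) -> str:
    size = shm_size_bytes(path)
    if size is None:
        return f"{path} not found (not Linux?); --disable-dev-shm-usage is harmless"
    mib = size // 1024**2
    if size < MIN_RECOMMENDED_BYTES:
        return f"{path} is only {mib} MiB: add --shm-size=2g or keep --disable-dev-shm-usage"
    return f"{path} is {mib} MiB: shared memory is probably not the problem"


if __name__ == "__main__":
    print(check_shm())
```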
MongoDB Issues
Issue: Connection Timeout
Error Message:
ServerSelectionTimeoutError: connection timed out
Cause: MongoDB server unreachable or network issues.
Solution:
- Verify MongoDB is running:
    # Local MongoDB
    mongosh --eval "db.adminCommand('ping')"
    # Check service status
    sudo systemctl status mongod
- Check the connection URI:
    # config.yaml
    mongodb:
      uri: "mongodb://username:password@host:27017/"
- For MongoDB Atlas:
  - Whitelist your IP address in the Atlas dashboard
  - Verify the cluster is active
  - Check network connectivity
- Test the connection manually:
    python -c "from pymongo import MongoClient; c = MongoClient('your-uri', serverSelectionTimeoutMS=5000); print(c.server_info())"
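A ServerSelectionTimeoutError does not tell you whether the host is unreachable or MongoDB is the problem. A plain TCP connect to the URI's host and port separates the two. This is a stdlib-only sketch with hypothetical helper names. It assumes a single-host mongodb:// URI (not mongodb+srv:// or comma-separated replica-set hosts) and uses the default port 27017 when none is given:

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit

DEFAULT_PORT = 27017


def host_port_from_uri(uri: str) -> Tuple[str, int]:
    """Extract (host, port) from a single-host mongodb:// URI."""
    parts = urlsplit(uri)
    if parts.scheme != "mongodb":
        raise ValueError("only single-host mongodb:// URIs are supported here")
    return parts.hostname or "localhost", parts.port or DEFAULT_PORT


def can_connect(uri: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URI's host:port succeeds."""
    host, port = host_port_from_uri(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals and timeouts
        return False


if __name__ == "__main__":
    uri = "mongodb://username:password@localhost:27017/"
    print("reachable" if can_connect(uri) else "not reachable: check host, port, firewall")
```

If the TCP connect succeeds but pymongo still times out, look at TLS settings, replica-set configuration, or Atlas IP access rules rather than basic networking.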
Issue: Authentication Failed
Error Message:
OperationFailure: Authentication failed
Solution:
- Verify credentials in connection URI
- Check that the authentication database (authSource) matches the database where the user was created
- Use correct URI format:
mongodb://username:password@host:27017/database?authSource=admin
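Special characters in the username or password, such as @, : or /, must be percent-escaped in the URI. Otherwise the driver splits the URI in the wrong place and authenticates with the wrong credentials. PyMongo's documentation recommends urllib.parse.quote_plus for this. The build_mongo_uri helper below is a hypothetical convenience, not part of the scraper:

```python
from urllib.parse import quote_plus


def build_mongo_uri(user: str, password: str, host: str, port: int = 27017,
                    database: str = "", auth_source: str = "admin") -> str:
    """Build a mongodb:// URI with percent-escaped credentials."""
    return (
        f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"
        f"{database}?authSource={auth_source}"
    )


if __name__ == "__main__":
    # '@' and ':' in the password would break an unescaped URI.
    print(build_mongo_uri("scraper", "p@ss:word", "localhost", database="reviews"))
    # mongodb://scraper:p%40ss%3Aword@localhost:27017/reviews?authSource=admin
```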
Issue: SSL Certificate Error
Error Message:
SSL: CERTIFICATE_VERIFY_FAILED
Solution:
- On macOS, run (replace 3.x with your Python version):
    /Applications/Python\ 3.x/Install\ Certificates.command
- Or install/upgrade certifi:
    pip install --upgrade certifi
- The scraper handles this automatically, but if issues persist:
    import certifi
    import os
    os.environ['SSL_CERT_FILE'] = certifi.where()
AWS S3 Issues
Issue: Access Denied
Error Message:
ClientError: An error occurred (AccessDenied) when calling the PutObject operation
Solution:
- Verify AWS credentials:
    # config.yaml
    s3:
      aws_access_key_id: "YOUR_ACCESS_KEY"
      aws_secret_access_key: "YOUR_SECRET_KEY"
- Check IAM permissions. Required policy:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:ListBucket",
            "s3:PutObjectAcl"
          ],
          "Resource": [
            "arn:aws:s3:::your-bucket-name",
            "arn:aws:s3:::your-bucket-name/*"
          ]
        }
      ]
    }
- Check that the bucket policy allows public-read if you use public URLs
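A common cause of AccessDenied is a policy where one of the two ARNs is missing or mistyped. s3:ListBucket applies to the bucket ARN, while the object actions (PutObject, GetObject, PutObjectAcl) apply to the bucket-name/* ARN. The hypothetical helper below generates a scoped variant of the policy above for a given bucket name, split into two statements so each action is paired with the ARN it applies to:

```python
import json


def scraper_s3_policy(bucket: str) -> dict:
    """Return an IAM policy granting the S3 actions listed above on one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Bucket-level action: targets the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {   # Object-level actions: target every key inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:PutObjectAcl"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(scraper_s3_policy("your-bucket-name"), indent=2))
```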
Issue: Bucket Not Found
Error Message:
ClientError: An error occurred (NoSuchBucket)
Solution:
- Verify the bucket name in config.yaml
- Check that the region matches the bucket location:
    s3:
      region_name: "us-east-1"  # Must match bucket region
      bucket_name: "your-bucket"
- Create the bucket if it doesn't exist, via the AWS Console or CLI
Issue: Invalid Credentials
Error Message:
NoCredentialsError: Unable to locate credentials
Solution:
- Set credentials in config.yaml or as environment variables:
    export AWS_ACCESS_KEY_ID=your_key
    export AWS_SECRET_ACCESS_KEY=your_secret
- Or use the AWS credentials file (~/.aws/credentials):
    [default]
    aws_access_key_id = YOUR_KEY
    aws_secret_access_key = YOUR_SECRET
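To see which of the two sources above the process can actually see, this sketch checks the environment variables first, then a profile in ~/.aws/credentials. That matches the relative order boto3 uses for these two sources. It ignores boto3's other sources (keys passed in code, ~/.aws/config, AWS_SHARED_CREDENTIALS_FILE, instance roles, SSO), and find_aws_credentials_source is a hypothetical name:

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def find_aws_credentials_source(credentials_file: Optional[Path] = None,
                                profile: str = "default") -> Optional[str]:
    """Report which source would supply AWS keys, or None if neither does."""
    # 1. Environment variables (checked before the shared credentials file).
    if os.environ.get("AWS_ACCESS_KEY_ID") and os.environ.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # 2. Shared credentials file, defaulting to ~/.aws/credentials.
    path = credentials_file or Path.home() / ".aws" / "credentials"
    parser = configparser.ConfigParser()
    if parser.read(path) and parser.has_section(profile):
        section = parser[profile]
        if section.get("aws_access_key_id") and section.get("aws_secret_access_key"):
            return f"{path} [{profile}]"
    return None


if __name__ == "__main__":
    print(find_aws_credentials_source() or "No credentials found: NoCredentialsError expected")
```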
Scraping Issues
Issue: Reviews Tab Not Found
Error Message:
TimeoutException: Reviews tab not found or could not be clicked
Cause: Google Maps UI changed or page didn't load properly.
Solution:
- Try non-headless mode to see what's happening:
    python start.py --headless false
- Check that the URL is a valid Google Maps place URL
- Increase the timeout; the network may be slow
- Clear cookies/cache; Google may be showing consent dialogs
- Try a different sort order:
    python start.py --sort relevance
Issue: No Reviews Found
Error Message:
WARNING: No review cards found in this iteration
Cause: Page structure changed or place has no reviews.
Solution:
- Verify the place has reviews by opening URL in browser
- Check if page requires login for reviews
- Wait longer for page to load - add delay in config
- Check for CAPTCHA - may need to solve manually first
Issue: Stale Element Reference
Error Message:
StaleElementReferenceException: stale element reference: element is not attached to the page document
Cause: Page updated while scraping.
Solution: This is handled automatically by the scraper. If persistent:
- Reduce scroll speed - increase sleep time
- Run in non-headless mode to observe behavior
- Restart scraper - temporary DOM issue
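The usual way to tolerate stale elements is to re-locate the element and retry the action a few times. Here is a generic sketch of that pattern with a hypothetical retry helper. In Selenium code you would pass StaleElementReferenceException (from selenium.common.exceptions) as the exception type. The 0.5 s default delay is an arbitrary assumption that plays the role of the sleep time mentioned above:

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(action: Callable[[], T],
          exceptions: Tuple[Type[BaseException], ...],
          attempts: int = 3,
          delay: float = 0.5) -> T:
    """Call action(), retrying on the given exceptions up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except exceptions:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            time.sleep(delay)  # give the DOM time to settle before re-locating
    raise ValueError("attempts must be >= 1")
```

The callable should perform the lookup itself, for example `retry(lambda: driver.find_element(By.CSS_SELECTOR, REVIEW_SELECTOR).text, (StaleElementReferenceException,))`, where REVIEW_SELECTOR is a placeholder. Retrying only the `.text` access on an element you already hold would hit the same stale reference every time.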
Issue: Cookie Consent Blocking
Cause: Cookie dialog not being dismissed.
Solution:
- Clear browser data:
    rm -rf ~/Library/Application\ Support/undetected_chromedriver
- The scraper handles this automatically, but you can also:
  - Open the URL manually first and accept cookies
  - Use a different Google account region
API Server Issues
Issue: Port Already in Use
Error Message:
OSError: [Errno 48] Address already in use
Solution:
- Find and kill the process:
    # Find the process using port 8000
    lsof -i :8000
    # Kill the process
    kill -9 <PID>
- Use a different port:
    uvicorn api_server:app --port 8080
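You can check whether a port is free by trying to bind it, and ask the OS for an unused one if it is not. The errno varies by platform: EADDRINUSE is 48 on macOS and 98 on Linux, so on Linux the same problem shows up as [Errno 98]. This is a minimal sketch with hypothetical helper names. It binds 127.0.0.1, the default host uvicorn uses. Another process can still grab the port between the check and server startup:

```python
import errno
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing is currently bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # some other error (e.g. permission denied for ports < 1024)
    return True


def free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    port = 8000 if is_port_free(8000) else free_port()
    print(f"uvicorn api_server:app --port {port}")
```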
Issue: Max Concurrent Jobs Reached
Error Message:
HTTP 429: Maximum concurrent jobs (3) reached
Solution:
- Wait for existing jobs to complete
- Cancel pending jobs:
    curl -X POST "http://localhost:8000/jobs/{job_id}/cancel"
- Increase the limit in api_server.py (not recommended for stability)
Issue: CORS Errors (Browser)
Error Message:
Access-Control-Allow-Origin header missing
Solution: CORS is enabled by default. If issues persist:
- Check allowed origins in api_server.py
- For development, ensure the middleware is configured:
    from fastapi.middleware.cors import CORSMiddleware

    app.add_middleware(
        CORSMiddleware,
        allow_origins=["*"],
        allow_methods=["*"],
        allow_headers=["*"],
    )
Image Download Issues
Issue: Images Not Downloading
Cause: Network issues or Google blocking requests.
Solution:
- Check network connectivity
- Verify that image URLs are accessible
- Reduce parallel downloads:
    download_threads: 2  # Reduce from default 4
- Check disk space for image storage
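To check disk space from Python, shutil.disk_usage reports the free bytes on the filesystem that holds a given path. This is a small sketch with a hypothetical helper. The 500 MiB threshold is an assumption, and the review_images default is taken from the commands in this guide, so point it at your image_dir setting:

```python
import shutil
from pathlib import Path

MIN_FREE_BYTES = 500 * 1024**2  # assumption: warn below ~500 MiB free


def free_space_report(image_dir: str = "review_images") -> str:
    """Describe free space on the filesystem that holds image_dir."""
    # disk_usage needs an existing path, so walk up to the nearest existing parent.
    path = Path(image_dir).resolve()
    while not path.exists():
        path = path.parent
    free = shutil.disk_usage(path).free
    status = "LOW" if free < MIN_FREE_BYTES else "OK"
    return f"{status}: {free // 1024**2} MiB free on the filesystem holding {path}"


if __name__ == "__main__":
    print(free_space_report())
```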
Issue: Images Corrupted or Wrong Size
Cause: Partial downloads or URL issues.
Solution:
- Clear the image directory and re-run:
    rm -rf review_images/
- Check max dimensions in config:
    max_width: 1200
    max_height: 1200
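Instead of clearing the whole directory, you can flag only the files that look incomplete. A partial download usually leaves a JPEG without its end-of-image marker (bytes FF D9). This heuristic flags .jpg/.jpeg files that lack the FF D8 start marker or the FF D9 end marker. It assumes the images are saved as JPEG, the helper names are hypothetical, and it is only a heuristic. A file that passes is not guaranteed valid, and a valid JPEG with trailing padding would be flagged:

```python
from pathlib import Path
from typing import List

JPEG_START = b"\xff\xd8"
JPEG_END = b"\xff\xd9"


def looks_truncated(path: Path) -> bool:
    """Heuristic: True if the file lacks the JPEG start or end markers."""
    data = path.read_bytes()
    return len(data) < 4 or not data.startswith(JPEG_START) or not data.endswith(JPEG_END)


def find_suspect_images(image_dir: str = "review_images") -> List[Path]:
    """List .jpg/.jpeg files under image_dir that look truncated."""
    root = Path(image_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".jpg", ".jpeg"} and looks_truncated(p)
    )


if __name__ == "__main__":
    for suspect in find_suspect_images():
        print(f"Possibly truncated: {suspect}")
```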
Issue: Permission Denied Writing Images
Error Message:
PermissionError: [Errno 13] Permission denied
Solution:
- Check directory permissions:
    chmod 755 review_images/
- Use a different directory:
    image_dir: "/path/with/write/access"
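To confirm that the user running the scraper can write to the image directory, you can create the directory and write a temporary file into it, as the scraper would. This catches wrong ownership (for example, from an earlier sudo run) and read-only mounts. This is a minimal sketch with a hypothetical helper name:

```python
import tempfile
from pathlib import Path


def check_writable(image_dir: str = "review_images") -> str:
    """Create image_dir if needed and try writing a temp file inside it."""
    path = Path(image_dir)
    try:
        path.mkdir(parents=True, exist_ok=True)
        # The temporary file is removed automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
    except OSError as exc:  # PermissionError, read-only filesystem, etc.
        return f"Not writable: {exc}"
    return f"Writable: {path.resolve()}"


if __name__ == "__main__":
    print(check_writable())
```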
Configuration Issues
Issue: Config File Not Found
Error Message:
FileNotFoundError: config.yaml not found
Solution:
- Create config.yaml from the example:
    cp examples/config-example.txt config.yaml
- Specify a custom path:
    python start.py --config /path/to/config.yaml
Issue: Invalid YAML Syntax
Error Message:
yaml.scanner.ScannerError: mapping values are not allowed here
Solution:
- Validate YAML syntax using an online validator
- Check indentation: use spaces, not tabs
- Quote strings that contain special characters:
    url: "https://example.com?param=value"  # Use quotes
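YAML does not allow tab characters in indentation, and they are easy to miss in an editor. This stdlib-only sketch lists the lines of config.yaml whose leading whitespace contains a tab. find_tab_indents is a hypothetical name, and it does not replace a full parser such as PyYAML for catching other syntax errors:

```python
from pathlib import Path
from typing import List


def find_tab_indents(text: str) -> List[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(lineno)
    return bad


if __name__ == "__main__":
    config = Path("config.yaml")
    if config.exists():
        for n in find_tab_indents(config.read_text(encoding="utf-8")):
            print(f"config.yaml:{n}: tab used for indentation")
```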
Issue: Invalid Configuration Values
Error Message:
ValueError: Invalid sort_by value
Solution:
- Check allowed values:
    sort_by: newest, highest, lowest, relevance
    headless: true, false
- Verify types:
    download_threads: 4  # Integer, not string
    headless: true       # Boolean, not string "true"
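You can check the allowed values and types above before a run. This is a minimal sketch that works on an already-loaded config dict (for example, the result of yaml.safe_load). validate_config is hypothetical and covers only the three keys named in this section:

```python
from typing import Any, Dict, List

ALLOWED_SORT = {"newest", "highest", "lowest", "relevance"}


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means these keys look valid."""
    problems = []
    if "sort_by" in cfg and cfg["sort_by"] not in ALLOWED_SORT:
        problems.append(f"sort_by must be one of {sorted(ALLOWED_SORT)}, got {cfg['sort_by']!r}")
    # Strings such as "true" are not booleans and are rejected here.
    if "headless" in cfg and not isinstance(cfg["headless"], bool):
        problems.append(f"headless must be true/false (boolean), got {cfg['headless']!r}")
    # bool is a subclass of int in Python, so require exactly int.
    threads = cfg.get("download_threads")
    if threads is not None and (type(threads) is not int or threads < 1):
        problems.append(f"download_threads must be a positive integer, got {threads!r}")
    return problems
```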
Performance Issues
Issue: Scraping Too Slow
Solution:
- Use headless mode:
    python start.py --headless
- Reduce image download threads if the network is slow:
    download_threads: 2
- Disable image downloading for faster scraping:
    download_images: false
- Use an SSD for faster JSON/image writes
Issue: High Memory Usage
Solution:
- Process in batches: use stop_on_match for incremental scraping
- Disable image downloading temporarily
- Close other applications
- Increase system swap if needed
Issue: Chrome Using Too Much CPU
Solution:
- Use headless mode - reduces rendering overhead
- Add GPU flags:
    --disable-gpu --disable-software-rasterizer
- Limit concurrent jobs in API mode
Python & Dependencies Issues
Issue: Module Not Found
Error Message:
ModuleNotFoundError: No module named 'seleniumbase'
Solution:
- Install dependencies:
    pip install -r requirements.txt
- Verify that the virtual environment is activated:
    source venv/bin/activate    # Linux/macOS
    venv\Scripts\activate       # Windows
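You can check whether the running interpreter belongs to a virtual environment and whether the key packages are importable. Inside a venv, sys.prefix differs from sys.base_prefix, and importlib.util.find_spec returns None for modules that are not installed. The module list is an assumption based on packages mentioned in this guide (yaml is the import name of PyYAML), so edit it to match requirements.txt:

```python
import importlib.util
import sys
from typing import Dict, Iterable

# Assumption: packages referenced in this guide; adjust to requirements.txt.
MODULES = ("seleniumbase", "pymongo", "boto3", "yaml")


def in_virtualenv() -> bool:
    """True when running inside a venv (sys.prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def module_status(names: Iterable[str] = MODULES) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print(f"Python: {sys.executable} (venv: {in_virtualenv()})")
    for name, ok in module_status().items():
        print(f"  {name}: {'ok' if ok else 'MISSING'}")
```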
Issue: Incompatible Package Versions
Error Message:
ImportError: cannot import name 'X' from 'Y'
Solution:
- Reinstall all dependencies:
    pip uninstall -r requirements.txt -y
    pip install -r requirements.txt
- Create a fresh virtual environment:
    python -m venv fresh_venv
    source fresh_venv/bin/activate
    pip install -r requirements.txt
Issue: Python Version Incompatibility
Error Message:
SyntaxError: invalid syntax
Solution:
- Check your Python version (requires 3.9+):
    python --version
- Install a supported Python version:
    # macOS with pyenv
    pyenv install 3.13.1
    pyenv local 3.13.1
    # Or use your system package manager
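A version guard gives a clear message instead of a SyntaxError deep inside an import. This is a minimal sketch assuming the 3.9 minimum stated above. The guard only helps if it runs before any module that uses newer syntax is imported, because the SyntaxError is raised when that module is compiled:

```python
import sys
from typing import Tuple

MIN_VERSION = (3, 9)  # minimum stated in this guide


def python_ok(version: Tuple[int, ...] = tuple(sys.version_info[:3])) -> bool:
    """True if the given (major, minor, ...) version meets MIN_VERSION."""
    return tuple(version[:2]) >= MIN_VERSION


if __name__ == "__main__":
    if not python_ok():
        sys.exit(f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ required, found {sys.version.split()[0]}")
    print("Python version OK")
```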
Getting Help
If your issue isn't listed here:
- Enable debug logging:
    LOG_LEVEL=DEBUG python start.py
- Check the logs for detailed error messages
- Search existing issues on GitHub
- Create a new issue with:
  - Error message (full traceback)
  - Python version (python --version)
  - OS and version
  - Chrome version
  - Steps to reproduce
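The standard platform module can collect the Python and OS details requested above in one step. This is a small sketch with a hypothetical helper. It leaves the Chrome version as a placeholder, to be copied from chrome://version or About Google Chrome:

```python
import platform
import sys


def issue_report_header() -> str:
    """Return Python and OS details formatted for pasting into a GitHub issue."""
    lines = [
        f"- Python version: {platform.python_version()} ({sys.executable})",
        f"- OS: {platform.system()} {platform.release()} ({platform.machine()})",
        f"- Platform string: {platform.platform()}",
        "- Chrome version: <copy from chrome://version>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(issue_report_header())
```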