migrate to SeleniumBase UC Mode for automatic version management

- Replace undetected-chromedriver with seleniumbase for better Chrome/ChromeDriver compatibility
- Automatic version matching eliminates manual cache clearing and version conflicts
- Enhanced anti-detection with UC Mode and CDP stealth settings
- Simplified requirements.txt (SeleniumBase manages common dependencies)
- Fix sort selection bug (was selecting wrong menu items)
- Improve scrolling patience (max_idle: 3→15, max_attempts: 10→50)
- Add scroll position tracking to detect when stuck
- Add fallback pane selectors for better reliability
- Update documentation (README, ARCHITECTURE, TROUBLESHOOTING)
- Add comprehensive test suite for SeleniumBase integration
- Version bump to 1.0.1

Developed by George Khananaev
2025-12-07 19:40:13 +07:00
parent 6b60b02eec
commit 262f0c0be7
7 changed files with 3802 additions and 106 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -11,6 +11,7 @@ Desktop.ini
 # -----------------------------------------------------------
 .idea/
 .vscode/
 .claude/
 *.swp
 *.swo
 *~
@@ -48,6 +49,7 @@ logs.db
 *.sqlite
 *.sqlite3
 *.db
 docs/AGENTS_LOG
 # -----------------------------------------------------------
 # Config Files
@@ -68,6 +70,12 @@ review_images/
 images/
 downloaded_images/
 # -----------------------------------------------------------
 # SeleniumBase Files
 # -----------------------------------------------------------
 downloaded_files/
 *.lock
 # -----------------------------------------------------------
 # Temporary and Output Files
 # -----------------------------------------------------------
--- a/README.md
+++ b/README.md
@@ -1,16 +1,16 @@
 # 🔥 Google Reviews Scraper Pro (2025) 🔥
-![Google Reviews Scraper Pro](https://img.shields.io/badge/Version-1.0.0-brightgreen)
+![Google Reviews Scraper Pro](https://img.shields.io/badge/Version-1.0.1-brightgreen)
 ![Python](https://img.shields.io/badge/Python-3.10%20%7C%203.11%20%7C%203.12%20%7C%203.13-blue)
 ![License](https://img.shields.io/badge/License-MIT-yellow)
-![Last Update](https://img.shields.io/badge/Last%20Updated-April%202025-red)
+![Last Update](https://img.shields.io/badge/Last%20Updated-December%202025-red)
 **FINALLY! A scraper that ACTUALLY WORKS in 2025!** While others break with every Google update, this bad boy keeps on trucking. Say goodbye to the frustration of constantly broken scrapers and hello to a beast that rips through Google's defenses like a hot knife through butter. This battle-tested, rock-solid solution will extract every juicy detail from Google reviews while laughing in the face of rate limiting.
 ## 🌟 Feature Artillery
 - **Bulletproof in 2025**: While the competition falls apart, we've cracked Google's latest tricks
- **Ninja-Mode Selenium**: Our undetected-chromedriver flies under the radar where others get insta-blocked
+- **Enhanced SeleniumBase UC Mode**: Superior anti-detection with automatic Chrome/ChromeDriver version matching - no more version headaches!
 - **Polyglot Powerhouse**: Devours reviews in a smorgasbord of languages - English, Hebrew, Thai, German, you name it!
 - **MongoDB Mastery**: Dumps pristine data structures straight into your MongoDB instance
 - **Paranoid Backups**: Mirrors everything to local JSON files because losing data sucks
@@ -350,9 +350,10 @@ print(f"Reviews with images: {len(reviews_with_images)}")
 ### DEFCON Scenarios & Quick Fixes
 1. **Chrome/Driver Having a Lovers' Quarrel**
-   - Update your damn Chrome browser already! It's 2025, people
+   - **Good news!** SeleniumBase handles Chrome/ChromeDriver version matching automatically
-   - Nuke and reinstall the driver: `pip uninstall undetected-chromedriver` then `pip install undetected-chromedriver==3.5.4`
+   - Update Chrome browser: Go to chrome://settings/help
-   - If you're on Ubuntu, sometimes a simple `apt update && apt upgrade` fixes weird Chrome issues
+   - SeleniumBase will automatically download the matching ChromeDriver - no manual intervention needed!
   - If issues persist: `pip install --upgrade seleniumbase`
 2. **MongoDB Throwing a Tantrum**
   - Double-check your connection string - typos are the #1 culprit
--- a/docs/ARCHITECTURE.md
+++ b/docs/ARCHITECTURE.md
--- a/docs/TROUBLESHOOTING.md
+++ b/docs/TROUBLESHOOTING.md
@@ -0,0 +1,708 @@
 # Troubleshooting Guide
 This guide covers common issues and their solutions when running Google Reviews Scraper Pro.
 ---
 ## Table of Contents
 1. [Chrome & ChromeDriver Issues](#chrome--chromedriver-issues)
 2. [MongoDB Issues](#mongodb-issues)
 3. [AWS S3 Issues](#aws-s3-issues)
 4. [Scraping Issues](#scraping-issues)
 5. [API Server Issues](#api-server-issues)
 6. [Image Download Issues](#image-download-issues)
 7. [Configuration Issues](#configuration-issues)
 8. [Performance Issues](#performance-issues)
 9. [Python & Dependencies Issues](#python--dependencies-issues)
 ---
 ## Chrome & ChromeDriver Issues
 ### Issue: ChromeDriver Version Mismatch
 **Error Message:**
 ```
 SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 143
 Current browser version is 142.0.7444.176
 ```
 **Cause:** Chrome/ChromeDriver version mismatch (this issue is now automatically handled by SeleniumBase).
 **Solution:**
 **Good News:** With SeleniumBase UC Mode, version mismatches are automatically resolved!
 1. **Update Chrome to latest version:**
   - macOS: Open Chrome → Menu → Help → About Google Chrome
   - Or run: `open -a "Google Chrome" "chrome://settings/help"`
 2. **Upgrade SeleniumBase (if needed):**
   ```bash
   pip install --upgrade seleniumbase
   ```
 3. **Run scraper again** - SeleniumBase automatically downloads the matching ChromeDriver.
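
When triaging logs, it can help to pull the two major versions out of the error text before deciding whether to update Chrome or SeleniumBase. The following is a minimal sketch, not part of the scraper: `parse_version_mismatch` is a hypothetical helper, and the regular expressions assume the message wording shown in the error above.

```python
import re


def parse_version_mismatch(message: str):
    """Return (driver_major, browser_major) from a mismatch message, or None."""
    supported = re.search(r"only supports Chrome version (\d+)", message)
    current = re.search(r"Current browser version is (\d+)", message)
    if not (supported and current):
        return None
    return int(supported.group(1)), int(current.group(1))


# Message text copied from the error shown in this section.
message = (
    "session not created: This version of ChromeDriver only supports Chrome version 143\n"
    "Current browser version is 142.0.7444.176"
)
driver_major, browser_major = parse_version_mismatch(message)
print(f"ChromeDriver targets Chrome {driver_major}, installed Chrome is {browser_major}")
```

A driver major version higher than the browser's suggests Chrome itself is out of date, which matches step 1 above.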
 ---
 ### Issue: ChromeOptions Reuse Error
 **Error Message:**
 ```
 RuntimeError: you cannot reuse the ChromeOptions object
 ```
 **Cause:** Internal error when retrying Chrome initialization.
 **Solution:** Restart the scraper so Chrome is initialized with a fresh options object. If the error persists, upgrade SeleniumBase with `pip install --upgrade seleniumbase`.
 ---
 ### Issue: Chrome Binary Not Found
 **Error Message:**
 ```
 WebDriverException: Message: unknown error: cannot find Chrome binary
 ```
 **Cause:** Chrome is not installed or not in the expected location.
 **Solution:**
 1. **Install Chrome:**
   - Download from: https://www.google.com/chrome/
 2. **For custom Chrome location, set environment variable:**
   ```bash
   export CHROME_BIN=/path/to/chrome
   ```
 3. **Docker users:** Ensure Chrome is installed in Dockerfile:
   ```dockerfile
   RUN apt-get update && apt-get install -y google-chrome-stable
   ENV CHROME_BIN=/usr/bin/google-chrome
   ```
 ---
 ### Issue: Chrome Crashes in Headless Mode
 **Error Message:**
 ```
 WebDriverException: Message: chrome not reachable
 ```
 **Solution:**
 1. **Add required flags** (already included in scraper, but verify):
   ```
   --no-sandbox
   --disable-dev-shm-usage
   --disable-gpu
   ```
 2. **Increase shared memory** (Docker):
   ```bash
   docker run --shm-size=2g your-image
   ```
 3. **Try non-headless mode** to debug:
   ```bash
   python start.py --headless false
   ```
 ---
 ## MongoDB Issues
 ### Issue: Connection Timeout
 **Error Message:**
 ```
 ServerSelectionTimeoutError: connection timed out
 ```
 **Cause:** MongoDB server unreachable or network issues.
 **Solution:**
 1. **Verify MongoDB is running:**
   ```bash
   # Local MongoDB
   mongosh --eval "db.adminCommand('ping')"
   # Check service status
   sudo systemctl status mongod
   ```
 2. **Check connection URI:**
   ```yaml
   # config.yaml
   mongodb:
     uri: "mongodb://username:password@host:27017/"
   ```
 3. **For MongoDB Atlas:**
   - Whitelist your IP address in Atlas dashboard
   - Verify cluster is active
   - Check network connectivity
 4. **Test connection manually:**
   ```bash
   python -c "from pymongo import MongoClient; c = MongoClient('your-uri', serverSelectionTimeoutMS=5000); print(c.server_info())"
   ```
 ---
 ### Issue: Authentication Failed
 **Error Message:**
 ```
 OperationFailure: Authentication failed
 ```
 **Solution:**
 1. **Verify credentials** in connection URI
 2. **Check database name** matches the authentication database
 3. **Use correct URI format:**
   ```
   mongodb://username:password@host:27017/database?authSource=admin
   ```
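
One frequent cause of authentication failures is a password containing reserved characters such as `@`, `:` or `/`, which must be percent-encoded inside a MongoDB URI. The sketch below builds a URI in the format shown above using only the standard library. `build_mongo_uri` and the sample credentials are hypothetical, and the `authSource=admin` default is an assumption taken from the example URI.

```python
from urllib.parse import quote_plus


def build_mongo_uri(username, password, host, database,
                    auth_source="admin", port=27017):
    """Build a MongoDB URI with percent-encoded credentials."""
    user = quote_plus(username)
    pwd = quote_plus(password)
    return f"mongodb://{user}:{pwd}@{host}:{port}/{database}?authSource={auth_source}"


# Hypothetical credentials, chosen to include reserved characters.
uri = build_mongo_uri("scraper", "p@ss:w/rd", "localhost", "reviews")
print(uri)
```

The resulting string can be pasted into the `uri` field of `config.yaml`.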
 ---
 ### Issue: SSL Certificate Error
 **Error Message:**
 ```
 SSL: CERTIFICATE_VERIFY_FAILED
 ```
 **Solution:**
 1. **For macOS**, run:
   ```bash
   /Applications/Python\ 3.x/Install\ Certificates.command
   ```
 2. **Or install certifi:**
   ```bash
   pip install --upgrade certifi
   ```
 3. **The scraper auto-handles this**, but if issues persist:
   ```python
   import certifi
   import os
   os.environ['SSL_CERT_FILE'] = certifi.where()
   ```
 ---
 ## AWS S3 Issues
 ### Issue: Access Denied
 **Error Message:**
 ```
 ClientError: An error occurred (AccessDenied) when calling the PutObject operation
 ```
 **Solution:**
 1. **Verify AWS credentials:**
   ```yaml
   # config.yaml
   s3:
     aws_access_key_id: "YOUR_ACCESS_KEY"
     aws_secret_access_key: "YOUR_SECRET_KEY"
   ```
 2. **Check IAM permissions** - required policy:
   ```json
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "s3:PutObject",
           "s3:GetObject",
           "s3:ListBucket",
           "s3:PutObjectAcl"
         ],
         "Resource": [
           "arn:aws:s3:::your-bucket-name",
           "arn:aws:s3:::your-bucket-name/*"
         ]
       }
     ]
   }
   ```
 3. **Check bucket policy** allows public-read if using public URLs
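
The policy above lists two resources because `s3:ListBucket` applies to the bucket ARN while the object actions apply to the `/*` ARN. As a rough sketch, the same document can be generated for any bucket name; `build_bucket_policy` is a hypothetical helper and is not part of the scraper.

```python
import json


def build_bucket_policy(bucket_name):
    """Return the IAM policy from this section for the given bucket name."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:PutObjectAcl",
                ],
                # Bucket ARN for ListBucket, object ARN for the object actions.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


policy = build_bucket_policy("your-bucket-name")
print(json.dumps(policy, indent=2))
```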
 ---
 ### Issue: Bucket Not Found
 **Error Message:**
 ```
 ClientError: An error occurred (NoSuchBucket)
 ```
 **Solution:**
 1. **Verify bucket name** in config.yaml
 2. **Check region** matches bucket location:
   ```yaml
   s3:
     region_name: "us-east-1"  # Must match bucket region
     bucket_name: "your-bucket"
   ```
 3. **Create bucket** if it doesn't exist via AWS Console or CLI
 ---
 ### Issue: Invalid Credentials
 **Error Message:**
 ```
 NoCredentialsError: Unable to locate credentials
 ```
 **Solution:**
 1. **Set credentials in config.yaml** or environment variables:
   ```bash
   export AWS_ACCESS_KEY_ID=your_key
   export AWS_SECRET_ACCESS_KEY=your_secret
   ```
 2. **Or use AWS credentials file:**
   ```
   ~/.aws/credentials
   [default]
   aws_access_key_id = YOUR_KEY
   aws_secret_access_key = YOUR_SECRET
   ```
 ---
 ## Scraping Issues
 ### Issue: Reviews Tab Not Found
 **Error Message:**
 ```
 TimeoutException: Reviews tab not found or could not be clicked
 ```
 **Cause:** Google Maps UI changed or page didn't load properly.
 **Solution:**
 1. **Try non-headless mode** to see what's happening:
   ```bash
   python start.py --headless false
   ```
 2. **Check the URL** is a valid Google Maps place URL
 3. **Increase timeout** - network may be slow
 4. **Clear cookies/cache** - Google may be showing consent dialogs
 5. **Try different sort order:**
   ```bash
   python start.py --sort relevance
   ```
 ---
 ### Issue: No Reviews Found
 **Error Message:**
 ```
 WARNING: No review cards found in this iteration
 ```
 **Cause:** Page structure changed or place has no reviews.
 **Solution:**
 1. **Verify the place has reviews** by opening URL in browser
 2. **Check if page requires login** for reviews
 3. **Wait longer** for page to load - add delay in config
 4. **Check for CAPTCHA** - may need to solve manually first
 ---
 ### Issue: Stale Element Reference
 **Error Message:**
 ```
 StaleElementReferenceException: stale element reference: element is not attached to the page document
 ```
 **Cause:** Page updated while scraping.
 **Solution:** This is handled automatically by the scraper. If persistent:
 1. **Reduce scroll speed** - increase sleep time
 2. **Run in non-headless mode** to observe behavior
 3. **Restart scraper** - temporary DOM issue
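
The usual way to cope with stale elements is to re-run the lookup a few times before giving up. The sketch below shows that pattern in isolation. It is not the scraper's actual handling: `retry_on_stale` is a hypothetical helper, and `StaleElementError` is a stand-in for Selenium's `StaleElementReferenceException` so the example runs without a browser.

```python
import time


class StaleElementError(Exception):
    """Stand-in for selenium's StaleElementReferenceException."""


def retry_on_stale(action, attempts=3, delay=0.0, stale_exc=StaleElementError):
    """Call action() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except stale_exc as error:
            last_error = error
            time.sleep(delay)  # give the page time to settle before retrying
    raise last_error


calls = {"count": 0}


def flaky_read():
    """Fail twice with a stale reference, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise StaleElementError("element is not attached to the page document")
    return "review text"


result = retry_on_stale(flaky_read)
print(result, calls["count"])
```

With real Selenium code, `action` would re-locate the element on every call instead of reusing an old reference, and `stale_exc` would be the Selenium exception class.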
 ---
 ### Issue: Cookie Consent Blocking
 **Cause:** Cookie dialog not being dismissed.
 **Solution:**
 1. **Clear browser data:**
   ```bash
   rm -rf ~/Library/Application\ Support/undetected_chromedriver
   ```
 2. **The scraper handles this automatically**, but you can:
   - Open the URL manually first and accept cookies
   - Use a different Google account region
 ---
 ## API Server Issues
 ### Issue: Port Already in Use
 **Error Message:**
 ```
 OSError: [Errno 48] Address already in use
 ```
 **Solution:**
 1. **Find and kill the process:**
   ```bash
   # Find process using port 8000
   lsof -i :8000
   # Kill the process
   kill -9 <PID>
   ```
 2. **Use different port:**
   ```bash
   uvicorn api_server:app --port 8080
   ```
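
If `lsof` is not available, a port can also be probed from Python before starting the server. This is a minimal sketch under the assumption that the server binds to `127.0.0.1`; `port_is_free` and `first_free_port` are hypothetical helpers.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(candidates, host="127.0.0.1"):
    """Return the first free port from candidates, or None."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None


# Demonstration: hold a port open and confirm the probe reports it as busy.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as holder:
    holder.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    holder.listen(1)
    busy_port = holder.getsockname()[1]
    free_while_held = port_is_free(busy_port)

print(f"Port {busy_port} free while held: {free_while_held}")
```

Something like `first_free_port([8000, 8080])` could then choose the value passed to `--port`. Another process may still grab the port between the check and the server start, so treat the result as a hint.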
 ---
 ### Issue: Max Concurrent Jobs Reached
 **Error Message:**
 ```
 HTTP 429: Maximum concurrent jobs (3) reached
 ```
 **Solution:**
 1. **Wait for existing jobs** to complete
 2. **Cancel pending jobs:**
   ```bash
   curl -X POST "http://localhost:8000/jobs/{job_id}/cancel"
   ```
 3. **Increase limit** in `api_server.py` (not recommended for stability)
 ---
 ### Issue: CORS Errors (Browser)
 **Error Message:**
 ```
 Access-Control-Allow-Origin header missing
 ```
 **Solution:** CORS is enabled by default. If issues persist:
 1. **Check allowed origins** in `api_server.py`
 2. **For development**, ensure middleware is configured:
   ```python
   app.add_middleware(
       CORSMiddleware,
       allow_origins=["*"],
       allow_methods=["*"],
       allow_headers=["*"],
   )
   ```
 ---
 ## Image Download Issues
 ### Issue: Images Not Downloading
 **Cause:** Network issues or Google blocking requests.
 **Solution:**
 1. **Check network connectivity**
 2. **Verify image URLs** are accessible
 3. **Reduce parallel downloads:**
   ```yaml
   download_threads: 2  # Reduce from default 4
   ```
 4. **Check disk space** for image storage
 ---
 ### Issue: Images Corrupted or Wrong Size
 **Cause:** Partial downloads or URL issues.
 **Solution:**
 1. **Clear image directory** and re-run:
   ```bash
   rm -rf review_images/
   ```
 2. **Check max dimensions** in config:
   ```yaml
   max_width: 1200
   max_height: 1200
   ```
 ---
 ### Issue: Permission Denied Writing Images
 **Error Message:**
 ```
 PermissionError: [Errno 13] Permission denied
 ```
 **Solution:**
 1. **Check directory permissions:**
   ```bash
   chmod 755 review_images/
   ```
 2. **Use different directory:**
   ```yaml
   image_dir: "/path/with/write/access"
   ```
 ---
 ## Configuration Issues
 ### Issue: Config File Not Found
 **Error Message:**
 ```
 FileNotFoundError: config.yaml not found
 ```
 **Solution:**
 1. **Create config.yaml** from example:
   ```bash
   cp examples/config-example.txt config.yaml
   ```
 2. **Specify custom path:**
   ```bash
   python start.py --config /path/to/config.yaml
   ```
 ---
 ### Issue: Invalid YAML Syntax
 **Error Message:**
 ```
 yaml.scanner.ScannerError: mapping values are not allowed here
 ```
 **Solution:**
 1. **Validate YAML syntax** using online validator
 2. **Check indentation** - use spaces, not tabs
 3. **Escape special characters** in strings:
   ```yaml
   url: "https://example.com?param=value"  # Use quotes
   ```
 ---
 ### Issue: Invalid Configuration Values
 **Error Message:**
 ```
 ValueError: Invalid sort_by value
 ```
 **Solution:**
 1. **Check allowed values:**
   - `sort_by`: newest, highest, lowest, relevance
   - `headless`: true, false
 2. **Verify types:**
   ```yaml
   download_threads: 4      # Integer, not string
   headless: true           # Boolean, not string "true"
   ```
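
The checks above can be collected into one validation pass that reports every problem at once. The sketch below is illustrative only: `validate_config` is a hypothetical function, the allowed `sort_by` values come from the list above, and the defaults used for missing keys are assumptions.

```python
ALLOWED_SORT = {"newest", "highest", "lowest", "relevance"}


def validate_config(config):
    """Return a list of human-readable problems found in config."""
    errors = []

    sort_by = config.get("sort_by", "newest")  # default is an assumption
    if sort_by not in ALLOWED_SORT:
        errors.append(f"sort_by must be one of {sorted(ALLOWED_SORT)}, got {sort_by!r}")

    headless = config.get("headless", True)  # default is an assumption
    if not isinstance(headless, bool):
        errors.append(f"headless must be a boolean, got {type(headless).__name__}")

    threads = config.get("download_threads", 4)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(threads, bool) or not isinstance(threads, int) or threads < 1:
        errors.append(f"download_threads must be a positive integer, got {threads!r}")

    return errors


good = {"sort_by": "newest", "headless": True, "download_threads": 4}
bad = {"sort_by": "best", "headless": "true", "download_threads": "4"}
print(validate_config(good))
for problem in validate_config(bad):
    print(problem)
```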
 ---
 ## Performance Issues
 ### Issue: Scraping Too Slow
 **Solution:**
 1. **Use headless mode:**
   ```bash
   python start.py --headless
   ```
 2. **Reduce image download threads** if network is slow:
   ```yaml
   download_threads: 2
   ```
 3. **Disable image downloading** for faster scraping:
   ```yaml
   download_images: false
   ```
 4. **Use SSD** for faster JSON/image writes
 ---
 ### Issue: High Memory Usage
 **Solution:**
 1. **Process in batches** - use `stop_on_match` for incremental scraping
 2. **Disable image downloading** temporarily
 3. **Close other applications**
 4. **Increase system swap** if needed
 ---
 ### Issue: Chrome Using Too Much CPU
 **Solution:**
 1. **Use headless mode** - reduces rendering overhead
 2. **Add GPU flags:**
   ```
   --disable-gpu
   --disable-software-rasterizer
   ```
 3. **Limit concurrent jobs** in API mode
 ---
 ## Python & Dependencies Issues
 ### Issue: Module Not Found
 **Error Message:**
 ```
 ModuleNotFoundError: No module named 'seleniumbase'
 ```
 **Solution:**
 1. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```
 2. **Verify virtual environment is activated:**
   ```bash
   source venv/bin/activate  # Linux/macOS
   venv\Scripts\activate     # Windows
   ```
 ---
 ### Issue: Incompatible Package Versions
 **Error Message:**
 ```
 ImportError: cannot import name 'X' from 'Y'
 ```
 **Solution:**
 1. **Reinstall all dependencies:**
   ```bash
   pip uninstall -r requirements.txt -y
   pip install -r requirements.txt
   ```
 2. **Create fresh virtual environment:**
   ```bash
   python -m venv fresh_venv
   source fresh_venv/bin/activate
   pip install -r requirements.txt
   ```
 ---
 ### Issue: Python Version Incompatibility
 **Error Message:**
 ```
 SyntaxError: invalid syntax
 ```
 **Solution:**
 1. **Check Python version** (the README lists 3.10 through 3.13 as supported):
   ```bash
   python --version
   ```
 2. **Install correct Python version:**
   ```bash
   # macOS with pyenv
   pyenv install 3.13.1
   pyenv local 3.13.1
   # Or use system package manager
   ```
 ---
 ## Getting Help
 If your issue isn't listed here:
 1. **Enable debug logging:**
   ```bash
   LOG_LEVEL=DEBUG python start.py
   ```
 2. **Check logs** for detailed error messages
 3. **Search existing issues** on GitHub
 4. **Create a new issue** with:
   - Error message (full traceback)
   - Python version (`python --version`)
   - OS and version
   - Chrome version
   - Steps to reproduce
--- a/modules/scraper.py
+++ b/modules/scraper.py
@@ -1,5 +1,6 @@
 """
 Selenium scraping logic for Google Maps Reviews.
 Uses SeleniumBase UC Mode for enhanced anti-detection and better Chrome version management.
 """
 import logging
@@ -10,7 +11,7 @@ import time
 import traceback
 from typing import Dict, Any, List
-import undetected_chromedriver as uc
+from seleniumbase import Driver
 from selenium.common.exceptions import TimeoutException, StaleElementReferenceException
 from selenium.webdriver import Chrome
 from selenium.webdriver.common.action_chains import ActionChains
@@ -169,72 +170,87 @@ class GoogleReviewsScraper:
        self.backup_to_json = config.get("backup_to_json", True)
        self.overwrite_existing = config.get("overwrite_existing", False)
-    def setup_driver(self, headless: bool) -> Chrome:
+    def setup_driver(self, headless: bool):
        """
-        Set up and configure Chrome driver with flexibility for different environments.
+        Set up and configure Chrome driver using SeleniumBase UC Mode.
        SeleniumBase provides enhanced anti-detection and automatic Chrome/ChromeDriver version management.
        Works in both Docker containers and on regular OS installations (Windows, Mac, Linux).
        """
        # Determine if we're running in a container
        in_container = os.environ.get('CHROME_BIN') is not None
        # Create Chrome options
        opts = uc.ChromeOptions()
        opts.add_argument("--window-size=1400,900")
        opts.add_argument("--ignore-certificate-errors")
        opts.add_argument("--disable-gpu")  # Improves performance
        opts.add_argument("--disable-dev-shm-usage")  # Helps with stability
        opts.add_argument("--no-sandbox")  # More stable in some environments
        # Use headless mode if requested
        if headless:
            opts.add_argument("--headless=new")
        # Log platform information for debugging
        log.info(f"Platform: {platform.platform()}")
        log.info(f"Python version: {platform.python_version()}")
        log.info("Using SeleniumBase UC Mode for enhanced anti-detection")
        # Determine if we're running in a container
        in_container = os.environ.get('CHROME_BIN') is not None
        # If in container, use environment-provided binaries
        if in_container:
            chrome_binary = os.environ.get('CHROME_BIN')
            chromedriver_path = os.environ.get('CHROMEDRIVER_PATH')
            log.info(f"Container environment detected")
            log.info(f"Chrome binary: {chrome_binary}")
            log.info(f"ChromeDriver path: {chromedriver_path}")
            # Create driver with custom binary location for containers
            if chrome_binary and os.path.exists(chrome_binary):
-                log.info(f"Using Chrome binary from environment: {chrome_binary}")
+                try:
-                opts.binary_location = chrome_binary
+                    driver = Driver(
-
+                        uc=True,
-            try:
+                        headless=headless,
-                # Try creating Chrome driver with undetected_chromedriver
+                        binary_location=chrome_binary,
-                log.info("Attempting to create undetected_chromedriver instance")
+                        page_load_strategy="normal"
-                driver = uc.Chrome(options=opts)
+                    )
-                log.info("Successfully created undetected_chromedriver instance")
+                    log.info("Successfully created SeleniumBase UC driver with custom binary")
-            except Exception as e:
+                except Exception as e:
-                # Fall back to regular Selenium if undetected_chromedriver fails
+                    log.warning(f"Failed to create driver with custom binary: {e}")
-                log.warning(f"Failed to create undetected_chromedriver instance: {e}")
+                    # Fall back to default
-                log.info("Falling back to regular Selenium Chrome")
+                    driver = Driver(
-
+                        uc=True,
-                # Import Selenium webdriver here to avoid potential import issues
+                        headless=headless,
-                from selenium import webdriver
+                        page_load_strategy="normal"
-                from selenium.webdriver.chrome.service import Service
+                    )
-
+                    log.info("Successfully created SeleniumBase UC driver with defaults")
-                if chromedriver_path and os.path.exists(chromedriver_path):
+            else:
-                    log.info(f"Using ChromeDriver from path: {chromedriver_path}")
+                driver = Driver(
-                    service = Service(executable_path=chromedriver_path)
+                    uc=True,
-                    driver = webdriver.Chrome(service=service, options=opts)
+                    headless=headless,
-                else:
+                    page_load_strategy="normal"
-                    log.info("Using default ChromeDriver")
+                )
-                    driver = webdriver.Chrome(options=opts)
+                log.info("Successfully created SeleniumBase UC driver")
        else:
-            # On regular OS, use default undetected_chromedriver
+            # Regular OS environment - SeleniumBase handles version matching automatically
-            log.info("Using standard undetected_chromedriver setup")
+            log.info("Creating SeleniumBase UC Mode driver")
-            driver = uc.Chrome(options=opts)
+            try:
                driver = Driver(
                    uc=True,
                    headless=headless,
                    page_load_strategy="normal",
                    incognito=True  # Use incognito mode for better stealth
                )
                log.info("Successfully created SeleniumBase UC driver")
            except Exception as e:
                log.error(f"Failed to create SeleniumBase driver: {e}")
                raise
        # Set page load timeout to avoid hanging
        driver.set_page_load_timeout(30)
-        log.info("Chrome driver setup completed successfully")
+
        # Set window size
        driver.set_window_size(1400, 900)
        # Add additional stealth settings
        try:
            # Disable automation flags
            driver.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
                'source': '''
                    Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
                    Object.defineProperty(navigator, 'plugins', {get: () => [1, 2, 3, 4, 5]});
                    Object.defineProperty(navigator, 'languages', {get: () => ['en-US', 'en']});
                '''
            })
            log.info("Additional stealth settings applied")
        except Exception as e:
            log.debug(f"Could not apply additional stealth settings: {e}")
        log.info("SeleniumBase UC driver setup completed successfully")
        return driver
    def dismiss_cookies(self, driver: Chrome):
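
The hunk above repeats nearly identical `Driver(...)` calls across its branches. As a rough sketch of how the argument selection could be expressed in one place, the helper below builds only the keyword arguments. `build_driver_kwargs` is hypothetical and not part of this commit, it leaves out the try/except fallback, and `binary_exists` is injected so the example runs without Chrome or SeleniumBase installed.

```python
import os


def build_driver_kwargs(headless, chrome_binary=None, binary_exists=os.path.exists):
    """Choose Driver() keyword arguments the way the branches above do."""
    kwargs = {"uc": True, "headless": headless, "page_load_strategy": "normal"}
    if chrome_binary is None:
        # Regular OS branch: CHROME_BIN is not set, so incognito is enabled.
        kwargs["incognito"] = True
    elif binary_exists(chrome_binary):
        # Container branch with a usable binary: pass its location.
        kwargs["binary_location"] = chrome_binary
    # Container branch with a missing binary: fall through to the defaults.
    return kwargs


desktop = build_driver_kwargs(headless=False)
container = build_driver_kwargs(
    headless=True,
    chrome_binary="/usr/bin/google-chrome",
    binary_exists=lambda path: True,  # pretend the binary is present
)
print(desktop)
print(container)
# With SeleniumBase installed, the result would be used as Driver(**kwargs).
```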
@@ -471,9 +487,11 @@ class GoogleReviewsScraper:
                        parts = current_url.split('/place/')
                        new_url = f"{parts[0]}/place/{parts[1].split('/')[0]}/reviews?hl={lang_code}"
                        driver.get(new_url)
-                        time.sleep(2)
+                        time.sleep(3)  # Increased wait time for page load
                        if "review" in driver.current_url.lower():
                            log.info("Navigated directly to reviews page via URL")
                            # Extra wait for reviews to render after URL navigation
                            time.sleep(2)
                            return True
            # Try to identify reviews link in URL
@@ -481,9 +499,11 @@ class GoogleReviewsScraper:
                parts = current_url.split('/place/')
                new_url = f"{parts[0]}/place/{parts[1].split('/')[0]}/reviews"
                driver.get(new_url)
-                time.sleep(2)
+                time.sleep(3)  # Increased wait time for page load
                if "review" in driver.current_url.lower():
                    log.info("Navigated directly to reviews page via URL")
                    # Extra wait for reviews to render after URL navigation
                    time.sleep(2)
                    return True
        except Exception as url_error:
            log.warning(f"Failed to navigate to reviews via URL: {url_error}")
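
The URL rewrite in the two hunks above keeps everything before `/place/`, keeps the first path segment after it, and appends `/reviews` with an optional `hl` parameter. The sketch below isolates that string handling so it can be checked without a browser. `build_reviews_url` is a hypothetical helper, the sample URL is made up, and whether Google Maps honors the resulting URL is not something this sketch can verify.

```python
def build_reviews_url(current_url, lang_code=None):
    """Derive a direct reviews URL from a Google Maps place URL, or None."""
    if "/place/" not in current_url:
        return None
    base, rest = current_url.split("/place/", 1)
    place_slug = rest.split("/")[0]  # drop coordinates and data segments
    url = f"{base}/place/{place_slug}/reviews"
    if lang_code:
        url += f"?hl={lang_code}"
    return url


# Hypothetical place URL, used only for illustration.
sample = "https://www.google.com/maps/place/Some+Cafe/@13.7,100.5,17z/data=abc"
print(build_reviews_url(sample, "en"))
print(build_reviews_url(sample))
```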
@@ -831,34 +851,37 @@ class GoogleReviewsScraper:
                target_item = None
                matched_text = None
-                # 1. First try direct text matching
+                # Log all available menu items for debugging
-                wanted_labels = SORT_OPTIONS.get(method, [])
+                log.info(f"Available menu items: {[text for _, text in visible_items]}")
-                for item, text in visible_items:
+                # Use position-based selection (most reliable for Google Maps)
                position_map = {
                    "relevance": 0,  # Usually the first option
                    "newest": 1,  # Usually the second option
                    "highest": 2,  # Usually the third option
                    "lowest": 3  # Usually the fourth option
                }
                pos = position_map.get(method, -1)
                if pos >= 0 and pos < len(visible_items):
                    target_item, matched_text = visible_items[pos]
                    log.info(f"Selected menu item at position {pos + 1}: '{matched_text}' for sort method '{method}'")
                    # Validate the selection makes sense
                    wanted_labels = SORT_OPTIONS.get(method, [])
                    text_clean = matched_text.lower()
                    # Check if selected text contains any of the expected keywords
                    valid_selection = False
                    for label in wanted_labels:
-                        if (label in text or text in label or
+                        if label.lower() in text_clean or text_clean in label.lower():
-                                (len(text) > 0 and len(label) > 0 and
+                            valid_selection = True
                                 text.lower().startswith(label.lower()[:3]))):
                            target_item = item
                            matched_text = text
                            log.info(f"Found matching menu item: '{text}' for '{label}'")
                            break
                    if target_item:
                        break
-                # 2. If no match found, try position-based selection
+                    if not valid_selection:
-                if not target_item and visible_items:
+                        log.warning(f"WARNING: Selected '{matched_text}' doesn't match expected '{method}' - might be wrong sort!")
-                    position_map = {
+                else:
-                        "relevance": 0,  # Usually the first option
+                    log.warning(f"Position {pos} not available in menu (only {len(visible_items)} items)")
                        "newest": 1,  # Usually the second option
                        "highest": 2,  # Usually the third option
                        "lowest": 3  # Usually the fourth option
                    }
                    pos = position_map.get(method, -1)
                    if pos >= 0 and pos < len(visible_items):
                        target_item, matched_text = visible_items[pos]
                        log.info(f"Using position-based selection (position {pos}) for '{method}'")
                # 3. If target found, click it
                if target_item:
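
The new selection logic picks a menu item by position and then checks its text against the expected labels, warning if they disagree. The sketch below reproduces that flow on plain tuples so it can be tested without a browser. `pick_sort_item` is a hypothetical function, the `SORT_OPTIONS` labels are made-up English examples because the real table is not shown in this diff, and the empty-text guard is an addition that the commit does not have.

```python
POSITION_MAP = {"relevance": 0, "newest": 1, "highest": 2, "lowest": 3}

# Hypothetical labels; the real SORT_OPTIONS is not part of this diff.
SORT_OPTIONS = {
    "relevance": ["Most relevant"],
    "newest": ["Newest"],
    "highest": ["Highest rating"],
    "lowest": ["Lowest rating"],
}


def pick_sort_item(visible_items, method):
    """Return (item, text, is_valid); item is None if the position is missing."""
    pos = POSITION_MAP.get(method, -1)
    if not 0 <= pos < len(visible_items):
        return None, None, False
    item, text = visible_items[pos]
    text_clean = text.lower()
    # Guard against empty text, which would match any label as a substring.
    is_valid = bool(text_clean) and any(
        label.lower() in text_clean or text_clean in label.lower()
        for label in SORT_OPTIONS.get(method, [])
    )
    return item, text, is_valid


menu = [
    ("el0", "Most relevant"),
    ("el1", "Newest"),
    ("el2", "Highest rating"),
    ("el3", "Lowest rating"),
]
print(pick_sort_item(menu, "newest"))
print(pick_sort_item(menu[:1], "lowest"))
```

A result with `is_valid` set to `False` corresponds to the "might be wrong sort" warning in the hunk above, for example when the menu order differs from the assumed positions.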
@@ -1108,16 +1131,55 @@ class GoogleReviewsScraper:
            self.dismiss_cookies(driver)
            self.click_reviews_tab(driver)
            self.set_sort(driver, sort_by)
-            # Add a wait after setting sort to allow results to load
+            # Extra wait after clicking reviews tab to ensure page loads
-            time.sleep(1)
+            log.info("Waiting for reviews page to fully load...")
            time.sleep(3)
            # Wait for page to be fully interactive
            try:
                wait.until(lambda d: d.execute_script("return document.readyState") == "complete")
                log.info("Page DOM is ready")
            except:
                log.debug("Could not verify page ready state")
            # Verify we're on a reviews page before proceeding
            if "review" not in driver.current_url.lower():
                log.warning("URL doesn't contain 'review' - might not be on reviews page")
            # Try to set sort - but don't fail if it doesn't work
            try:
                self.set_sort(driver, sort_by)
            except Exception as sort_error:
                log.warning(f"Sort failed but continuing: {sort_error}")
            # Add a longer wait after setting sort to allow results to load
            log.info("Waiting for reviews to render...")
            time.sleep(3)
            # Use try-except to handle cases where the pane is not found
-            try:
+            # Try multiple selectors for the reviews pane
-                pane = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, PANE_SEL)))
+            pane = None
-            except TimeoutException:
+            pane_selectors = [
-                log.warning("Could not find reviews pane. Page structure might have changed.")
+                PANE_SEL,  # Primary selector
                'div[role="main"] div.m6QErb',  # Simplified version
                'div.m6QErb.DxyBCb',  # Even more simplified
                'div[role="main"]'  # Most generic
            ]
            for selector in pane_selectors:
                try:
                    log.info(f"Trying to find reviews pane with selector: {selector}")
                    pane = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, selector)))
                    if pane:
                        log.info(f"Found reviews pane with selector: {selector}")
                        break
                except TimeoutException:
                    log.debug(f"Pane not found with selector: {selector}")
                    continue
            if not pane:
                log.warning("Could not find reviews pane with any selector. Page structure might have changed.")
                return False
            pbar = tqdm(desc="Scraped", ncols=80, initial=len(seen))
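Note on the fallback-selector hunk above: the loop tries each CSS selector in order, from most specific to most generic, and keeps the first one that resolves. A minimal, driver-free sketch of that pattern follows. The names `find_first`, `LookupTimeout`, and `fake_locate` are hypothetical and exist only for illustration; in the scraper the lookup is `wait.until(...)` and the exception is Selenium's `TimeoutException`.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


class LookupTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def find_first(
    selectors: Iterable[str],
    locate: Callable[[str], T],
) -> Tuple[Optional[str], Optional[T]]:
    """Return (selector, element) for the first selector that resolves."""
    for selector in selectors:
        try:
            element = locate(selector)
        except LookupTimeout:
            continue  # this selector timed out; try the next, more generic one
        if element is not None:
            return selector, element
    return None, None  # nothing matched; the caller decides how to fail


# Fake page in which only the third selector matches.
fake_dom = {"div.m6QErb.DxyBCb": "<pane>"}


def fake_locate(selector: str) -> str:
    if selector not in fake_dom:
        raise LookupTimeout(selector)
    return fake_dom[selector]


matched, pane = find_first(
    ["PANE_SEL placeholder", 'div[role="main"] div.m6QErb', "div.m6QErb.DxyBCb", 'div[role="main"]'],
    fake_locate,
)
print(matched, pane)
```

Because every failed selector waits out the full timeout of the shared `wait` object before the next one is tried, the worst case grows with the number of selectors. A shorter per-selector timeout for the fallbacks may be worth considering.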
@@ -1132,8 +1194,12 @@ class GoogleReviewsScraper:
                log.warning(f"Error setting up scroll script: {e}")
                scroll_script = "window.scrollBy(0, 300);"  # Fallback to simple scrolling
-            max_attempts = 10  # Limit the number of attempts to find reviews
+            max_attempts = 50  # Increased from 10 to 50 for very patient scrolling
            attempts = 0
            max_idle = 15  # Increased from 3 to 15 - much more patience for lazy-loaded reviews
            consecutive_no_cards = 0  # Track how many times we find zero cards
            last_scroll_position = 0
            scroll_stuck_count = 0
            while attempts < max_attempts:
                try:
@@ -1142,12 +1208,23 @@ class GoogleReviewsScraper:
                    # Check for valid cards
                    if len(cards) == 0:
-                        log.debug("No review cards found in this iteration")
+                        consecutive_no_cards += 1
                        log.info(f"No review cards found in this iteration (consecutive: {consecutive_no_cards})")
                        # If we keep finding no cards, might have hit the end
                        if consecutive_no_cards > 5:
                            log.warning("No cards found for more than 5 consecutive iterations - might be at end of reviews")
                            break
                        attempts += 1
-                        # Try scrolling anyway
+                        # Try aggressive scrolling
                        driver.execute_script(scroll_script)
                        time.sleep(1)
                        driver.execute_script("window.scrollBy(0, 1000);")  # Extra scroll
                        time.sleep(1.5)
                        continue
                    else:
                        consecutive_no_cards = 0  # Reset counter when we find cards
                    for c in cards:
                        try:
@@ -1186,12 +1263,48 @@ class GoogleReviewsScraper:
                        idle = 0
                        attempts = 0  # Reset attempts counter when we successfully process a review
-                    if idle >= 3:
+                    if idle >= max_idle:
                        log.info(f"Stopping: No new reviews found after {max_idle} scroll attempts")
                        break
                    if not fresh_cards:
                        idle += 1
                        attempts += 1
                        log.info(f"No new reviews in this iteration (idle: {idle}/{max_idle}, attempts: {attempts}/{max_attempts}, total seen: {len(seen)})")
                        # When no new reviews, scroll more aggressively
                        try:
                            # Try multiple scroll methods
                            driver.execute_script(scroll_script)
                            time.sleep(0.5)
                            driver.execute_script("window.scrollBy(0, 500);")  # Extra scroll
                            time.sleep(0.5)
                        except Exception as e:
                            log.warning(f"Error scrolling: {e}")
                    else:
                        log.info(f"Found {len(fresh_cards)} new reviews in this iteration")
                    # Check if we're actually scrolling or stuck
                    try:
                        current_scroll = driver.execute_script("return arguments[0].scrollTop;", pane)
                        if current_scroll == last_scroll_position and len(fresh_cards) == 0:
                            scroll_stuck_count += 1
                            log.warning(f"Scroll position hasn't changed (stuck at {current_scroll}px, stuck count: {scroll_stuck_count})")
                            if scroll_stuck_count > 5:
                                log.warning("Scroll is stuck - trying alternative scroll method")
                                # Scroll the pane's last child into view to force lazy loading
                                try:
                                    driver.execute_script("arguments[0].lastElementChild.scrollIntoView();", pane)
                                    time.sleep(2)
                                except Exception:
                                    pass
                                scroll_stuck_count = 0
                        else:
                            scroll_stuck_count = 0
                            last_scroll_position = current_scroll
                    except Exception:
                        pass
                    # Use JavaScript for smoother scrolling
                    try:
@@ -1201,8 +1314,13 @@ class GoogleReviewsScraper:
                        # Try a simpler scroll method
                        driver.execute_script("window.scrollBy(0, 300);")
-                    # Dynamic sleep: sleep less when processing many reviews
+                    # Dynamic sleep: sleep less when processing many reviews, more when finding none
-                    sleep_time = 0.7 if len(fresh_cards) > 5 else 1.0
+                    if len(fresh_cards) > 5:
                        sleep_time = 0.7
                    elif len(fresh_cards) == 0:
                        sleep_time = 2.0  # Wait longer when finding nothing (let page load)
                    else:
                        sleep_time = 1.0
                    time.sleep(sleep_time)
                except StaleElementReferenceException:
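Note on the scrolling hunks above: the loop combines three signals: an idle counter (`max_idle`), a check for an unchanged scroll position, and a delay that depends on how many new cards were found. The sketch below isolates that bookkeeping so it can be reasoned about without a browser. `ScrollTracker`, `pick_sleep`, and the `"stop"`/`"nudge"`/`"continue"` return values are hypothetical names, and the sketch simplifies the ordering: it checks the idle limit right after incrementing, whereas the diff checks it before.

```python
from dataclasses import dataclass


def pick_sleep(fresh_count: int) -> float:
    """Three-tier delay: short when busy, long when nothing new loaded."""
    if fresh_count > 5:
        return 0.7
    if fresh_count == 0:
        return 2.0  # give lazy-loaded content time to arrive
    return 1.0


@dataclass
class ScrollTracker:
    """Counts idle iterations and detects an unchanged scroll position."""

    max_idle: int = 15
    stuck_limit: int = 5
    idle: int = 0
    stuck: int = 0
    last_position: int = 0

    def update(self, fresh_count: int, position: int) -> str:
        """Return 'stop', 'nudge', or 'continue' for this iteration."""
        if fresh_count > 0:
            self.idle = 0
        else:
            self.idle += 1

        if position == self.last_position and fresh_count == 0:
            self.stuck += 1
        else:
            self.stuck = 0
            self.last_position = position

        if self.idle >= self.max_idle:
            return "stop"  # too many iterations without new reviews
        if self.stuck > self.stuck_limit:
            self.stuck = 0
            return "nudge"  # caller should try an alternative scroll method
        return "continue"


tracker = ScrollTracker(max_idle=3, stuck_limit=1)
history = [
    tracker.update(fresh_count=4, position=100),
    tracker.update(fresh_count=0, position=100),
    tracker.update(fresh_count=0, position=100),
    tracker.update(fresh_count=0, position=100),
]
print(history)
```

Keeping the counters in one object makes the stop conditions testable in isolation. The real loop would call `update` once per iteration with `len(fresh_cards)` and the pane's `scrollTop`.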
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,17 +1,8 @@
-requests==2.32.3
+seleniumbase>=4.34.9
 beautifulsoup4==4.12.3
 aiohttp==3.11.11
 googletrans==4.0.2
-selenium==4.15.2
+tqdm>=4.66.3
 undetected-chromedriver==3.5.4
 tqdm==4.66.3
 pymongo==4.12.0
 pyyaml==6.0.1
 certifi==2024.7.4
 webdriver-manager==4.0.2
 setuptools==79.0.1
 boto3==1.35.1
 pytest==7.4.3
 fastapi==0.104.1
 uvicorn==0.24.0
 botocore~=1.35.99
--- a/tests/test_seleniumbase_integration.py
+++ b/tests/test_seleniumbase_integration.py
@@ -0,0 +1,110 @@
 """
 Tests for SeleniumBase UC Mode integration.
 Verifies that the driver setup works correctly with the new library.
 """
 import pytest
 from modules.scraper import GoogleReviewsScraper
 def test_seleniumbase_driver_creation():
    """Test that SeleniumBase driver can be created successfully"""
    config = {
        "url": "https://maps.app.goo.gl/test",
        "headless": True,
        "use_mongodb": False,
        "backup_to_json": False
    }
    scraper = GoogleReviewsScraper(config)
    # Test driver creation
    driver = None
    try:
        driver = scraper.setup_driver(headless=True)
        assert driver is not None
        assert driver.name == "chrome"
        # Verify driver can navigate
        driver.get("https://www.google.com")
        assert "google" in driver.current_url.lower()
    finally:
        if driver:
            driver.quit()
 def test_seleniumbase_driver_headless_mode():
    """Test that headless mode works correctly"""
    config = {
        "url": "https://maps.app.goo.gl/test",
        "headless": True,
        "use_mongodb": False,
        "backup_to_json": False
    }
    scraper = GoogleReviewsScraper(config)
    driver = None
    try:
        driver = scraper.setup_driver(headless=True)
        assert driver is not None
        # In headless mode, window size should still be set
        size = driver.get_window_size()
        assert size['width'] == 1400
        assert size['height'] == 900
    finally:
        if driver:
            driver.quit()
 def test_seleniumbase_driver_nonheadless_mode():
    """Test that non-headless mode works correctly"""
    config = {
        "url": "https://maps.app.goo.gl/test",
        "headless": False,
        "use_mongodb": False,
        "backup_to_json": False
    }
    scraper = GoogleReviewsScraper(config)
    driver = None
    try:
        driver = scraper.setup_driver(headless=False)
        assert driver is not None
        assert driver.name == "chrome"
    finally:
        if driver:
            driver.quit()
@pytest.mark.skip(reason="Integration test - requires network access")
 def test_seleniumbase_google_maps_access():
    """Test that driver can access Google Maps (integration test)"""
    config = {
        "url": "https://maps.app.goo.gl/6tkNMDjcj3SS6LJe9",
        "headless": True,
        "use_mongodb": False,
        "backup_to_json": False
    }
    scraper = GoogleReviewsScraper(config)
    driver = None
    try:
        driver = scraper.setup_driver(headless=True)
        driver.get(config["url"])
        # Wait for redirect to Google Maps
        import time
        time.sleep(3)
        assert "google.com/maps" in driver.current_url
    finally:
        if driver:
            driver.quit()
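Note on the test file above: each test repeats the same `driver = None` / `try` / `finally: driver.quit()` scaffold. One possible way to factor it out is a small context manager, sketched here with a fake driver so it runs without Chrome. `managed_driver` and `FakeDriver` are hypothetical names, not part of the commit or of SeleniumBase.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


class FakeDriver:
    """Hypothetical stand-in for a browser driver; records whether quit() ran."""

    def __init__(self) -> None:
        self.quit_called = False

    def quit(self) -> None:
        self.quit_called = True


@contextmanager
def managed_driver(factory: Callable[[], FakeDriver]) -> Iterator[FakeDriver]:
    """Create a driver via `factory` and always quit it on exit."""
    driver = None
    try:
        driver = factory()
        yield driver
    finally:
        # Mirrors the `if driver: driver.quit()` cleanup used in the tests.
        if driver is not None:
            driver.quit()


# Normal exit: the driver is quit after the block.
with managed_driver(FakeDriver) as ok_driver:
    pass

# Failing test body: the driver is still quit before the error propagates.
failed_driver = None
try:
    with managed_driver(FakeDriver) as failed_driver:
        raise AssertionError("simulated test failure")
except AssertionError:
    pass

print(ok_driver.quit_called, failed_driver.quit_called)
```

In the test suite this would presumably be called as `with managed_driver(lambda: scraper.setup_driver(headless=True)) as driver:`. A pytest fixture with `yield` would achieve the same cleanup.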