# API Quick Start - Fast Google Reviews Scraper

## ⚡ Ultra-Fast API (18.9 seconds!)

REST API for scraping Google Maps reviews using the optimized DOM-only scraper.

**Performance**: ~18.9 seconds for 244 reviews (8.2x faster than the original!)

---

## 🚀 Quick Start

### 1. Install & Run

```bash
# Install dependencies
pip install fastapi uvicorn seleniumbase pyyaml

# Start API server
python api_server.py
```

The server starts on `http://localhost:8000`.

### 2. Use the API

```bash
# Start a scraping job
curl -X POST "http://localhost:8000/scrape" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.google.com/maps/place/YOUR_BUSINESS_URL",
    "headless": true
  }'
```

**Response:**

```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "started"
}
```

### 3. Check Status

```bash
# Check job status
curl "http://localhost:8000/jobs/550e8400-e29b-41d4-a716-446655440000"
```

**Response:**

```json
{
  "status": "completed",
  "reviews_count": 244,
  "scrape_time": 18.9
}
```

### 4. Get Reviews

```bash
# Get the actual reviews
curl "http://localhost:8000/jobs/550e8400-e29b-41d4-a716-446655440000/reviews" \
  -o reviews.json
```

---

## 📋 Key Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/scrape` | POST | Start scraping job |
| `/jobs/{job_id}` | GET | Get job status |
| `/jobs/{job_id}/reviews` | GET | Get scraped reviews |
| `/jobs` | GET | List all jobs |
| `/stats` | GET | Get statistics |

---

## 💻 Python Example

```python
import requests
import time

# 1. Start job
response = requests.post(
    "http://localhost:8000/scrape",
    json={
        "url": "https://www.google.com/maps/place/...",
        "headless": True
    }
)
response.raise_for_status()  # Fail early if the server rejected the request
job_id = response.json()['job_id']

# 2. Wait for completion
while True:
    job = requests.get(f"http://localhost:8000/jobs/{job_id}").json()
    if job['status'] in ['completed', 'failed']:
        break
    time.sleep(2)

# Stop here if the job did not finish successfully
if job['status'] == 'failed':
    raise SystemExit("Scraping job failed")

# 3. Get reviews
reviews = requests.get(
    f"http://localhost:8000/jobs/{job_id}/reviews"
).json()['reviews']

print(f"Got {len(reviews)} reviews!")
```

---

## 🧪 Test It

```bash
# Run the test script
python test_fast_api.py
```

This will:

- Start a job
- Poll until complete
- Save reviews to JSON
- Show statistics

---

## 📚 Full Documentation

See [API_DOCUMENTATION.md](API_DOCUMENTATION.md) for:

- Complete endpoint reference
- Advanced examples
- Error handling
- Production deployment
- Monitoring & troubleshooting

---

## 🎯 API Features

- ✅ **Ultra-fast scraping** (18.9s average)
- ✅ **Background job processing** (non-blocking)
- ✅ **Concurrent jobs** (up to 3 simultaneous)
- ✅ **Job status tracking** (pending/running/completed/failed)
- ✅ **Review data retrieval** (via dedicated endpoint)
- ✅ **Automatic cleanup** (removes old jobs)
- ✅ **GDPR auto-handling** (no manual intervention)
- ✅ **REST API** (language-agnostic)
- ✅ **OpenAPI docs** (visit `/docs` for Swagger UI)

---

## 🔧 Configuration

### API Server

```python
# In api_server.py
job_manager = JobManager(max_concurrent_jobs=3)  # Max parallel jobs

uvicorn.run(
    "api_server:app",
    host="0.0.0.0",  # Listen on all interfaces
    port=8000,       # Port number
    reload=True      # Auto-reload on code changes
)
```

### Scraping Options

```json
{
  "url": "https://www.google.com/maps/place/...",
  "headless": true,
  "max_scrolls": 35
}
```

- `headless`: run Chrome in headless mode
- `max_scrolls`: maximum number of scrolls (default: 35)

---

## 📊 Performance

```
Operation                Time      % of Total
──────────────────────────────────────────────
Scrolling (dynamic)      ~14s      74%
Setup & navigation       ~4.5s     24%
JavaScript extraction    ~0.01s    0.1%
──────────────────────────────────────────────
TOTAL                    ~18.9s    100%
```

**8.2x faster than the original scraper!** 🚀

---

## 🌐 Interactive Documentation

Visit `http://localhost:8000/docs` for:

- Interactive API testing
- Request/response schemas
- Trying out endpoints directly in the browser

---

## ⚙️ What Changed?

The API now uses the **fast DOM-only scraper** (`modules/fast_scraper.py`) instead of the old scraper:

- **Before**: 155 seconds ❌
- **Now**: 18.9 seconds ✅

**Key optimizations**:

1. GDPR consent auto-handling
2. Dynamic scroll waiting (adapts to page speed)
3. JavaScript extraction (40x faster than Selenium)
4. Universal design (no hardcoded values)

---

**Ready to scrape at 8.2x speed via API!** 🚀
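
---

## ⏱️ Polling with a Timeout (Sketch)

The polling loop in the Python example runs until the job reports `completed` or `failed`, so a stuck job would keep the client waiting indefinitely. The sketch below shows one way to bound the wait. It is not part of the API itself: the helper names `fetch_status` and `wait_for_job`, the 120-second default timeout, and the use of the standard-library `urllib` instead of `requests` are all assumptions made for illustration. Only the `/jobs/{job_id}` endpoint and the `status` field come from this guide.

```python
import json
import time
import urllib.request

# Statuses after which polling should stop (taken from the example above)
TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status(job_id, base_url="http://localhost:8000"):
    """GET /jobs/{job_id} and return the decoded JSON body (hypothetical helper)."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_id}", timeout=10) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_status, timeout=120.0, interval=2.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job is completed or failed; raise TimeoutError otherwise.

    `fetch`, `sleep`, and `clock` are injectable so the loop can be
    exercised without a running server.
    """
    deadline = clock() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job_id} still {job.get('status')!r} after {timeout}s"
            )
        sleep(interval)
```

With the server running, `job = wait_for_job(job_id, timeout=60)` could replace the `while True` loop; the caller then checks `job["status"]` before requesting `/jobs/{job_id}/reviews`.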