# API Quick Start - Fast Google Reviews Scraper

## ⚡ Ultra-Fast API (18.9 seconds!)

REST API for scraping Google Maps reviews using the optimized DOM-only scraper.

**Performance**: ~18.9 seconds for 244 reviews, 8.2x faster than the original scraper (155 seconds).

---

## 🚀 Quick Start

### 1. Install & Run

```bash
# Install dependencies (requests is only needed for the Python client example)
pip install fastapi uvicorn seleniumbase pyyaml requests

# Start API server
python api_server.py
```

The server listens at `http://localhost:8000`.

### 2. Use the API

```bash
# Start a scraping job
curl -X POST "http://localhost:8000/scrape" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.google.com/maps/place/YOUR_BUSINESS_URL",
    "headless": true
  }'
```

**Response:**
```json
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "status": "started"
}
```

### 3. Check Status

```bash
# Check job status
curl "http://localhost:8000/jobs/550e8400-e29b-41d4-a716-446655440000"
```

**Response:**
```json
{
  "status": "completed",
  "reviews_count": 244,
  "scrape_time": 18.9
}
```

### 4. Get Reviews

```bash
# Get the actual reviews
curl "http://localhost:8000/jobs/550e8400-e29b-41d4-a716-446655440000/reviews" \
  -o reviews.json
```

---

## 📋 Key Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/scrape` | POST | Start scraping job |
| `/jobs/{job_id}` | GET | Get job status |
| `/jobs/{job_id}/reviews` | GET | Get scraped reviews |
| `/jobs` | GET | List all jobs |
| `/stats` | GET | Get statistics |

---

## 💻 Python Example

```python
import time

import requests

BASE_URL = "http://localhost:8000"

# 1. Start job
response = requests.post(
    f"{BASE_URL}/scrape",
    json={
        "url": "https://www.google.com/maps/place/...",
        "headless": True,
    },
)
response.raise_for_status()  # Stop early on HTTP errors
job_id = response.json()["job_id"]

# 2. Wait for completion (poll every 2 seconds)
while True:
    job = requests.get(f"{BASE_URL}/jobs/{job_id}").json()
    if job["status"] in ("completed", "failed"):
        break
    time.sleep(2)

# Don't try to fetch reviews for a failed job
if job["status"] == "failed":
    raise SystemExit(f"Job {job_id} failed: {job}")

# 3. Get reviews
reviews = requests.get(f"{BASE_URL}/jobs/{job_id}/reviews").json()["reviews"]

print(f"Got {len(reviews)} reviews!")
```
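
The polling loop above runs until the job reaches a final state, so a stuck job would block the client forever. One way to bound the wait is sketched below. `wait_for_job` is a hypothetical helper, not part of the API; the 120-second timeout is an arbitrary assumption, and the status values (`completed`, `failed`) are the ones used in the example above. The status fetcher, sleep function, and clock are passed in so the loop can be exercised without a running server.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    timeout: float = 120.0,   # assumed upper bound, tune for your workload
    interval: float = 2.0,    # same polling interval as the example above
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status() until the job completes, fails, or times out."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        status = job.get("status")
        if status == "completed":
            return job
        if status == "failed":
            raise RuntimeError(f"Scrape job failed: {job}")
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    import requests

    job_id = "550e8400-e29b-41d4-a716-446655440000"  # placeholder ID from this guide
    url = f"http://localhost:8000/jobs/{job_id}"
    job = wait_for_job(lambda: requests.get(url, timeout=10).json())
    print(job)
```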

---

## 🧪 Test It

```bash
# Run the test script
python test_fast_api.py
```

This will:
- Start a job
- Poll until complete
- Save reviews to JSON
- Show statistics

---

## 📚 Full Documentation

See [API_DOCUMENTATION.md](API_DOCUMENTATION.md) for:
- Complete endpoint reference
- Advanced examples
- Error handling
- Production deployment
- Monitoring & troubleshooting

---

## 🎯 API Features

- ✅ **Ultra-fast scraping** (~18.9s for 244 reviews)
- ✅ **Background job processing** (non-blocking)
- ✅ **Concurrent jobs** (up to 3 simultaneous)
- ✅ **Job status tracking** (pending/running/completed/failed)
- ✅ **Review data retrieval** (via dedicated endpoint)
- ✅ **Automatic cleanup** (removes old jobs)
- ✅ **GDPR auto-handling** (no manual intervention)
- ✅ **REST API** (language-agnostic)
- ✅ **OpenAPI docs** (visit `/docs` for Swagger UI)

---

## 🔧 Configuration

### API Server

```python
# In api_server.py
job_manager = JobManager(max_concurrent_jobs=3)  # Max parallel jobs

uvicorn.run(
    "api_server:app",
    host="0.0.0.0",  # Listen on all interfaces
    port=8000,       # Port number
    reload=True,     # Auto-reload on code changes (development only)
)
```

### Scraping Options

```json
{
  "url": "https://www.google.com/maps/place/...",
  "headless": true,
  "max_scrolls": 35
}
```

- `headless`: run Chrome in headless mode
- `max_scrolls`: maximum number of scrolls (default: 35)
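
From Python, the request body can be assembled and checked before it is sent. The sketch below is illustrative only: `build_scrape_payload` is a hypothetical helper, the positive-integer check on `max_scrolls` is an assumption about what the server accepts, and leaving `max_scrolls` out is assumed to select the server default of 35.

```python
import json
from typing import Optional


def build_scrape_payload(
    url: str,
    headless: bool = True,
    max_scrolls: Optional[int] = None,
) -> dict:
    """Build the JSON body for POST /scrape from the options listed above."""
    if not url:
        raise ValueError("url must not be empty")
    payload = {"url": url, "headless": headless}
    if max_scrolls is not None:
        # Assumed constraint: a scroll limit only makes sense as a positive integer.
        if not isinstance(max_scrolls, int) or max_scrolls < 1:
            raise ValueError("max_scrolls must be a positive integer")
        payload["max_scrolls"] = max_scrolls
    return payload


if __name__ == "__main__":
    body = build_scrape_payload("https://www.google.com/maps/place/...", max_scrolls=50)
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed directly as the `json=` argument of `requests.post`.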

---

## 📊 Performance

```
Operation                 Time      % of Total
──────────────────────────────────────────────
Scrolling (dynamic)       ~14s      74%
Setup & navigation        ~4.5s     24%
JavaScript extraction     ~0.01s    0.1%
──────────────────────────────────────────────
TOTAL                     ~18.9s    100%
```

**8.2x faster than the original scraper!** 🚀
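
The percentages and the speedup figure follow directly from the timings quoted in this guide (18.9 seconds now, 155 seconds for the original scraper). The short check below reproduces them; `share` and `speedup` are hypothetical helper names. Note that the three listed phases add up to roughly 18.5 seconds, so the table leaves a few tenths of a second unattributed.

```python
# Timings quoted in this guide, in seconds.
PHASES = {
    "Scrolling (dynamic)": 14.0,
    "Setup & navigation": 4.5,
    "JavaScript extraction": 0.01,
}
TOTAL_SECONDS = 18.9
ORIGINAL_SECONDS = 155.0


def share(seconds: float, total: float) -> float:
    """Percentage of the total run time spent in one phase, to one decimal."""
    return round(100 * seconds / total, 1)


def speedup(before: float, after: float) -> float:
    """How many times faster the new run is, to one decimal."""
    return round(before / after, 1)


if __name__ == "__main__":
    for name, seconds in PHASES.items():
        print(f"{name:<24} {seconds:>6}s  {share(seconds, TOTAL_SECONDS)}%")
    print(f"Speedup: {speedup(ORIGINAL_SECONDS, TOTAL_SECONDS)}x")
```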

---

## 🌐 Interactive Documentation

Visit `http://localhost:8000/docs` for:
- Interactive API testing
- Request/response schemas
- Try out endpoints directly in browser
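
The same schema that powers the Swagger UI can also be read programmatically. FastAPI serves it at `/openapi.json` by default; the sketch below assumes this server keeps that default. `list_operations` is a hypothetical helper that extracts the method and path of every operation from an OpenAPI document.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_operations(schema: dict) -> list:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path."""
    operations = []
    for path, item in schema.get("paths", {}).items():
        for key in item:
            # Path items may also hold non-operation keys such as "parameters".
            if key in HTTP_METHODS:
                operations.append((key.upper(), path))
    return sorted(operations, key=lambda op: (op[1], op[0]))


if __name__ == "__main__":
    # Assumes the default FastAPI schema URL on the default address from this guide.
    with urllib.request.urlopen("http://localhost:8000/openapi.json", timeout=10) as resp:
        schema = json.load(resp)
    for method, path in list_operations(schema):
        print(f"{method:<6} {path}")
```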

---

## ⚙️ What Changed?

The API now uses the **fast DOM-only scraper** (`modules/fast_scraper.py`) instead of the old scraper:

- **Before**: 155 seconds ❌
- **Now**: 18.9 seconds ✅

**Key optimizations**:
1. GDPR consent auto-handling
2. Dynamic scroll waiting (adapts to page speed)
3. JavaScript extraction (40x faster than Selenium)
4. Universal design (no hardcoded values)
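
To illustrate the idea behind dynamic scroll waiting (item 2), here is a generic sketch. It is not the code in `modules/fast_scraper.py`: the function name, parameters, and stopping rule are all assumptions chosen for illustration. Instead of sleeping a fixed time after each scroll, it polls the item count and moves on as soon as new items appear, and it stops once several consecutive scrolls load nothing new.

```python
import time
from typing import Callable


def scroll_until_stable(
    scroll: Callable[[], None],
    count_items: Callable[[], int],
    max_scrolls: int = 35,        # matches the documented max_scrolls default
    stable_rounds: int = 3,       # assumed: stop after 3 scrolls with no growth
    poll_interval: float = 0.2,   # assumed polling step, in seconds
    max_wait: float = 2.0,        # assumed per-scroll wait cap, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scroll until the item count stops growing; return the final count."""
    last = count_items()
    unchanged = 0
    for _ in range(max_scrolls):
        scroll()
        waited = 0.0
        current = count_items()
        # Dynamic wait: leave the loop as soon as new items show up.
        while current <= last and waited < max_wait:
            sleep(poll_interval)
            waited += poll_interval
            current = count_items()
        if current > last:
            last = current
            unchanged = 0
        else:
            unchanged += 1
            if unchanged >= stable_rounds:
                break
    return last
```

In a browser-driven scraper, `scroll` and `count_items` would wrap the driver calls that scroll the review pane and count the loaded review elements. On a fast page each scroll costs only one or two polls, while on a slow page it can wait up to `max_wait`.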

---

**Ready to scrape at 8.2x speed via API!** 🚀