# Google Reviews Scraper Pro - Claude Code Instructions

## Quick Start

### Run with NUC Database (Recommended)

The PostgreSQL database is hosted on the NUC server. Only the API runs locally.

```bash
# Use NUC database config
cp .env.nuc .env

# Start API only (connects to NUC database)
docker compose -f docker-compose.production.yml -f docker-compose.nuc.yml up -d

# View logs
docker compose -f docker-compose.production.yml logs -f api
```

### Run Fully Local (Legacy)

Runs both PostgreSQL and the API locally.

```bash
# Use local database config
cp .env.example .env
# Edit .env with your settings

# Start all services
docker compose -f docker-compose.production.yml up -d
```

## NUC Database Connection

| Property | Value |
|----------|-------|
| Host | 192.168.1.3 |
| Port | 5437 |
| Database | scraper |
| User | scraper |
| Password | scraper_nuc_2026 |
| Coolify UUID | g4s8w4csk8s8ocswg48kkogo |

```bash
# Direct connection
psql postgresql://scraper:scraper_nuc_2026@192.168.1.3:5437/scraper

# Via SSH tunnel (if needed)
ssh -L 5437:localhost:5437 nuc
```

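The direct and tunneled connections differ only in the host part of the URL. A minimal sketch, assuming a hypothetical helper named `nuc_db_url` (it is not part of the repository), that builds the connection string from the values in the table above:

```bash
# Hypothetical helper: print the NUC connection URL.
# The optional first argument overrides the host (default: the NUC's LAN address).
nuc_db_url() {
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "scraper" "scraper_nuc_2026" "${1:-192.168.1.3}" "5437" "scraper"
}

# Direct connection URL
nuc_db_url

# URL to use while the SSH tunnel is forwarding local port 5437
nuc_db_url localhost
```

With the tunnel open, `psql "$(nuc_db_url localhost)"` should reach the same database through the forwarded port.
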
## Service URLs

| Service | URL |
|---------|-----|
| API | http://localhost:8001 |
| API Docs | http://localhost:8001/docs |
| VNC (browser debugging) | http://localhost:6080 |
| VNC (client) | vnc://localhost:5900 |

## Common Commands

```bash
# Start services
docker compose -f docker-compose.production.yml -f docker-compose.nuc.yml up -d

# Stop services
docker compose -f docker-compose.production.yml -f docker-compose.nuc.yml down

# View API logs
docker logs -f scraper-api

# Rebuild API after code changes
docker compose -f docker-compose.production.yml -f docker-compose.nuc.yml up -d --build api

# Run a scrape job (example)
curl -X POST http://localhost:8001/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com/maps/place/..."}'

# Check job status
curl http://localhost:8001/api/jobs/{job_id}
```

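Checking a job by hand gets repetitive for long scrapes. The sketch below wraps the status call above in a simple polling loop. The function names `job_status_url` and `watch_job` are hypothetical, and because the shape of the status response is not documented here, the loop only prints the raw response rather than parsing any field:

```bash
# Fall back to the default API URL from this document if none is set.
API_BASE_URL="${API_BASE_URL:-http://localhost:8001}"

# Hypothetical helper: print the status URL for a job ID.
job_status_url() {
  printf '%s/api/jobs/%s\n' "$API_BASE_URL" "$1"
}

# Hypothetical helper: print the raw status response at a fixed interval.
# Usage: watch_job <job_id> [attempts] [delay_seconds]
watch_job() {
  _attempts="${2:-10}"
  _delay="${3:-5}"
  _i=1
  while [ "$_i" -le "$_attempts" ]; do
    # -f makes curl return non-zero on HTTP errors, which stops the loop.
    curl -fsS "$(job_status_url "$1")" || return 1
    echo
    sleep "$_delay"
    _i=$((_i + 1))
  done
}
```

For example, `watch_job <job_id> 20 3` would print the status up to 20 times, three seconds apart. Stop it with Ctrl-C once the job reports completion.
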
## Database Management

```bash
# Connect to NUC database
docker run --rm -it postgres:15-alpine psql postgresql://scraper:scraper_nuc_2026@192.168.1.3:5437/scraper

# Backup database
ssh nuc "docker exec postgres-g4s8w4csk8s8ocswg48kkogo pg_dump -U scraper scraper" > backup.sql

# Restore database
cat backup.sql | ssh nuc "docker exec -i postgres-g4s8w4csk8s8ocswg48kkogo psql -U scraper scraper"
```

## Project Structure

```
├── api/                          # FastAPI backend
├── packages/
│   ├── pipeline-core/            # Shared pipeline utilities
│   └── reviewiq-pipeline/        # Review analysis pipeline
├── web/                          # Next.js frontend (optional)
├── db/init/                      # Database initialization scripts
├── docker-compose.production.yml # Main compose file
├── docker-compose.nuc.yml        # NUC database override
├── .env.nuc                      # NUC environment config
└── Dockerfile                    # API container build
```

## Troubleshooting

### API can't connect to NUC database

```bash
# Check NUC is reachable
nc -zv 192.168.1.3 5437

# Check database is running
ssh nuc "docker ps | grep postgres-g4s8w4csk8s8ocswg48kkogo"

# Restart database on NUC
ssh nuc "docker restart postgres-g4s8w4csk8s8ocswg48kkogo"
```

### Chrome/Scraping issues

```bash
# Check VNC for visual debugging (open is macOS; use xdg-open on Linux)
open http://localhost:6080

# If Chrome crashes, increase shared memory
# In the compose file, set: shm_size: 4gb
```

## Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| DATABASE_URL | PostgreSQL connection string | (required) |
| API_BASE_URL | Public API URL | http://localhost:8001 |
| MAX_CONCURRENT_JOBS | Parallel scrape jobs | 5 |
| OPENAI_API_KEY | For ReviewIQ analysis | (optional) |
| ANTHROPIC_API_KEY | For ReviewIQ analysis | (optional) |

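A sketch of what a `.env` for the NUC setup might contain, assuming only the variables listed above and the connection details from the NUC Database Connection section. The actual contents of `.env.nuc` are not shown in this document and may differ:

```bash
# Illustrative .env contents; check .env.nuc for the real values.
DATABASE_URL=postgresql://scraper:scraper_nuc_2026@192.168.1.3:5437/scraper
API_BASE_URL=http://localhost:8001
MAX_CONCURRENT_JOBS=5

# Optional: only needed for ReviewIQ analysis
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
```
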