Initial commit - WhyRating Engine (Google Reviews Scraper)

Alejandro Gutiérrez
2026-02-02 18:19:00 +00:00
parent 0543a08242
commit 2206ddeff2
136 changed files with 51138 additions and 855 deletions

CLAUDE.md Normal file

@@ -0,0 +1,142 @@
# Google Reviews Scraper Pro - Claude Code Instructions
## Quick Start
### Run with NUC Database (Recommended)
The PostgreSQL database is hosted on the NUC server. Only the API runs locally.
```bash
# Use NUC database config
cp .env.nuc .env
# Start API only (connects to NUC database)
docker compose -f docker-compose.production.yml -f docker-compose.nuc.yml up -d
# View logs
docker compose -f docker-compose.production.yml logs -f api
```
### Run Fully Local (Legacy)
Runs both PostgreSQL and API locally.
```bash
# Use local database config
cp .env.example .env
# Edit .env with your settings
# Start all services
docker compose -f docker-compose.production.yml up -d
```
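The "Edit .env with your settings" step can also be scripted. The helper below is a minimal sketch and not part of the repository: `set_env_var` is a hypothetical name, and it assumes `.env` holds plain `KEY=value` lines with no quoting or `export` prefixes.

```bash
# Hypothetical helper: set KEY=value in an env file, replacing any
# existing assignment for KEY. Assumes plain KEY=value lines.
set_env_var() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Copy every line except an existing assignment for this key.
  # grep exits 1 when it prints nothing, which is acceptable here.
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp" || true
  fi
  # Append the new assignment and swap the file into place.
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}
```

For example, `set_env_var .env MAX_CONCURRENT_JOBS 3` would lower the parallel job limit without opening an editor.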
## NUC Database Connection
| Property | Value |
|----------|-------|
| Host | 192.168.1.3 |
| Port | 5437 |
| Database | scraper |
| User | scraper |
| Password | scraper_nuc_2026 |
| Coolify UUID | g4s8w4csk8s8ocswg48kkogo |
```bash
# Direct connection
psql postgresql://scraper:scraper_nuc_2026@192.168.1.3:5437/scraper
# Via SSH tunnel (if needed): forward local port 5437 to the NUC,
# then connect to localhost:5437 instead of 192.168.1.3
ssh -L 5437:localhost:5437 nuc
```
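The connection string used above is just the table's properties joined in the standard `postgresql://user:password@host:port/database` form. As a small sketch (the function name `pg_url` is hypothetical, and it assumes no component needs percent-encoding, which holds for the values in the table):

```bash
# Hypothetical helper: assemble a PostgreSQL connection URL from its parts.
# Assumes no component contains characters that need percent-encoding.
pg_url() {
  user=$1 password=$2 host=$3 port=$4 database=$5
  printf 'postgresql://%s:%s@%s:%s/%s\n' \
    "$user" "$password" "$host" "$port" "$database"
}
```

With the SSH tunnel open, the same helper would be called with `localhost` as the host, for example `psql "$(pg_url scraper scraper_nuc_2026 localhost 5437 scraper)"`.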
## Service URLs
| Service | URL |
|---------|-----|
| API | http://localhost:8001 |
| API Docs | http://localhost:8001/docs |
| VNC (browser debugging) | http://localhost:6080 |
| VNC (client) | vnc://localhost:5900 |
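Right after `up -d` the API may need a few seconds before these URLs respond. One way to wait for it is a generic retry loop; the sketch below is an illustration only, and `retry` is a hypothetical helper, not something the project ships.

```bash
# Hypothetical helper: run a command until it succeeds, up to N attempts.
# usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  attempts=$1 delay=$2
  shift 2
  n=1
  while ! "$@"; do
    # Give up once the attempt budget is spent.
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}
```

A possible use is `retry 15 2 curl -fsS -o /dev/null http://localhost:8001/docs`, which polls the API docs page every two seconds for up to 15 attempts.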
## Common Commands
```bash
# Start services
docker compose -f docker-compose.production.yml -f docker-compose.nuc.yml up -d
# Stop services
docker compose -f docker-compose.production.yml -f docker-compose.nuc.yml down
# View API logs
docker logs -f scraper-api
# Rebuild API after code changes
docker compose -f docker-compose.production.yml -f docker-compose.nuc.yml up -d --build api
# Run a scrape job (example)
curl -X POST http://localhost:8001/api/jobs \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com/maps/place/..."}'
# Check job status (replace {job_id} with the job's ID)
curl http://localhost:8001/api/jobs/{job_id}
```
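When the place URL comes from a variable rather than being typed inline, the JSON body has to be built with some care. The following is a sketch under assumptions: `job_payload` and `submit_job` are hypothetical names, only the `url` field is taken from the example above, and reading the base URL from `API_BASE_URL` is an assumption about how you might wire it up.

```bash
# Hypothetical helper: build the JSON body for POST /api/jobs.
# Only the "url" field is known from the example; nothing else is assumed.
job_payload() {
  # Escape backslashes and double quotes, which is enough for typical URLs
  # (control characters are not handled).
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"url": "%s"}\n' "$escaped"
}

# Hypothetical wrapper around the curl call shown above.
# Falls back to http://localhost:8001 when API_BASE_URL is unset.
submit_job() {
  curl -fsS -X POST "${API_BASE_URL:-http://localhost:8001}/api/jobs" \
    -H "Content-Type: application/json" \
    -d "$(job_payload "$1")"
}
```

Calling `submit_job "$PLACE_URL"` would then send the same request as the example, with `-f` making curl exit non-zero on an HTTP error status.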
## Database Management
```bash
# Connect to NUC database
docker run --rm -it postgres:15-alpine psql postgresql://scraper:scraper_nuc_2026@192.168.1.3:5437/scraper
# Backup database
ssh nuc "docker exec postgres-g4s8w4csk8s8ocswg48kkogo pg_dump -U scraper scraper" > backup.sql
# Restore database
ssh nuc "docker exec -i postgres-g4s8w4csk8s8ocswg48kkogo psql -U scraper scraper" < backup.sql
```
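Writing every dump to `backup.sql` overwrites the previous one. A timestamped name avoids that; this is a minimal sketch, and `backup_filename` is a hypothetical helper rather than part of the project.

```bash
# Hypothetical helper: print a timestamped backup filename (UTC),
# e.g. scraper-20260202T181900Z.sql. The prefix defaults to "scraper".
backup_filename() {
  printf '%s-%s.sql\n' "${1:-scraper}" "$(date -u +%Y%m%dT%H%M%SZ)"
}
```

The backup command above could then redirect to `"$(backup_filename)"` instead of `backup.sql`.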
## Project Structure
```
├── api/ # FastAPI backend
├── packages/
│ ├── pipeline-core/ # Shared pipeline utilities
│ └── reviewiq-pipeline/ # Review analysis pipeline
├── web/ # Next.js frontend (optional)
├── db/init/ # Database initialization scripts
├── docker-compose.production.yml # Main compose file
├── docker-compose.nuc.yml # NUC database override
├── .env.nuc # NUC environment config
└── Dockerfile # API container build
```
## Troubleshooting
### API can't connect to NUC database
```bash
# Check NUC is reachable
nc -zv 192.168.1.3 5437
# Check database is running
ssh nuc "docker ps | grep postgres-g4s8w4csk8s8ocswg48kkogo"
# Restart database on NUC
ssh nuc "docker restart postgres-g4s8w4csk8s8ocswg48kkogo"
```
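The reachability check above hard-codes the host and port. If `.env` points somewhere else, it can help to test the address the API is actually configured with. A sketch, assuming `DATABASE_URL` has the form `postgresql://user:password@host[:port]/database` (no IPv6 literals); `pg_host_port` is a hypothetical name:

```bash
# Hypothetical helper: print "host port" extracted from a PostgreSQL URL.
# Assumes postgresql://user:password@host[:port]/database.
pg_host_port() {
  rest=${1##*@}          # drop the scheme and credentials
  hostport=${rest%%/*}   # drop /database and anything after it
  host=${hostport%%:*}
  case $hostport in
    *:*) port=${hostport##*:} ;;
    *)   port=5432 ;;    # PostgreSQL's default port
  esac
  printf '%s %s\n' "$host" "$port"
}
```

For instance, `set -- $(pg_host_port "$DATABASE_URL"); nc -zv "$1" "$2"` runs the same `nc` check against whatever the current configuration says.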
### Chrome/Scraping issues
```bash
# Open the VNC web view for visual debugging (macOS; use xdg-open on Linux)
open http://localhost:6080
# If Chrome crashes, increase shared memory:
# set shm_size: 4gb on the service that runs Chrome in the docker-compose file
```
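Before changing `shm_size`, it can be worth checking how much shared memory the container currently has (Docker's default is 64 MB). The sketch below parses `df -Pk` output; `shm_mib` is a hypothetical helper, and the assumption that Chrome runs inside the `scraper-api` container is not confirmed by this document.

```bash
# Hypothetical helper: read `df -Pk /dev/shm` output on stdin and print
# the filesystem size in MiB (second column of the second line, in KiB).
shm_mib() {
  awk 'NR == 2 { printf "%d\n", $2 / 1024 }'
}
```

A possible invocation is `docker exec scraper-api df -Pk /dev/shm | shm_mib`; a result of 64 would suggest the compose setting is not being applied.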
## Environment Variables
| Variable | Description | Default |
|----------|-------------|---------|
| DATABASE_URL | PostgreSQL connection string | (required) |
| API_BASE_URL | Public API URL | http://localhost:8001 |
| MAX_CONCURRENT_JOBS | Parallel scrape jobs | 5 |
| OPENAI_API_KEY | For ReviewIQ analysis | (optional) |
| ANTHROPIC_API_KEY | For ReviewIQ analysis | (optional) |
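A pre-flight check can catch a missing `DATABASE_URL` before the containers start. The following is only a sketch that mirrors the table: `check_env` is a hypothetical function, and whether the API itself applies these defaults is not stated here.

```bash
# Hypothetical pre-flight check mirroring the table above.
check_env() {
  # DATABASE_URL has no default and must be set.
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is required" >&2
    return 1
  fi
  # Apply the documented defaults when the variables are unset or empty.
  : "${API_BASE_URL:=http://localhost:8001}"
  : "${MAX_CONCURRENT_JOBS:=5}"
  # Reject values that are not digits only, or that are exactly 0.
  case $MAX_CONCURRENT_JOBS in
    '' | *[!0-9]* | 0)
      echo "MAX_CONCURRENT_JOBS must be a positive integer" >&2
      return 1
      ;;
  esac
  export DATABASE_URL API_BASE_URL MAX_CONCURRENT_JOBS
}
```

The optional API keys are deliberately left unchecked, since the table marks them as optional.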