# ReviewIQ Platform - Implementation Context Keeper

> **Purpose**: Restore context quickly after conversation compaction. Read this first when resuming work.

---

## What Is This Project?

**ReviewIQ** is a multi-tenant scraping-as-a-service platform. Currently scrapes Google Reviews, will expand to other sources (Yelp, TripAdvisor, etc.).

**Primary consumer**: veritasreview.com (external service that generates insights from scraped data)

---

## Current State (as of 2025-01-24)

### Working Features

- Google Reviews scraper (`modules/scraper_clean.py`) - fully functional
- Job queue with PostgreSQL storage
- Real-time SSE streaming of logs/progress
- Web UI for job management and analytics
- Chrome pool for browser management
- Crash detection and analysis
- JobDevTools observability panel

### Repository

- **Location**: `/Users/agutierrez/Desktop/google-reviews-scraper-pro`
- **Branch**: `master`
- **Spec document**: `.artifacts/ReviewIQ-Platform-Spec.md` (v1.2)

---

## What We're Building (Spec Summary)

### New Capabilities

1. **Requester tracking** - who requested each scrape (client_id, source, purpose)
2. **Batch jobs** - submit multiple URLs as a group
3. **Webhooks** - callback when jobs complete
4. **Priority levels** - normal, high, urgent
5. **Scraper versioning** - stable/beta/canary with A/B traffic routing
6. **Main dashboard** - system health, client breakdown, scraper performance
7. **Multiple job types** - architecture supports future scrapers

### API Design

- Separate endpoints per job type: `POST /api/scrape/google-reviews`
- Batch endpoint: `POST /api/scrape/google-reviews/batch`
- Each scraper version is independent, registered in scraper_registry

### Key Data Model Additions

```
Jobs table (new fields):
- requester_client_id, requester_source, scrape_purpose, requester_metadata
- batch_id, batch_index
- job_type, scraper_version, scraper_variant, priority
- callback_url, callback_status, callback_sent_at

New tables:
- batches (batch grouping)
- scraper_registry (version management)
- api_keys (authentication)
```

---

## Target Project Structure

```
reviewiq/                     # Will rename from google-reviews-scraper-pro
├── api/
│   ├── server.py
│   └── routes/               # scrape.py, jobs.py, batches.py, dashboard.py, admin.py
├── scrapers/
│   ├── registry.py
│   ├── base.py
│   └── google_reviews/
│       └── v1_0_0.py         # Migrated from scraper_clean.py
├── core/
│   ├── database.py
│   ├── models.py
│   ├── enums.py
│   └── config.py
├── services/
│   ├── job_service.py
│   ├── batch_service.py
│   ├── webhook_service.py
│   └── dashboard_service.py
├── workers/
│   ├── chrome_pool.py
│   ├── job_executor.py
│   └── webhook_worker.py
├── utils/
│   ├── logger.py
│   ├── crash_analyzer.py
│   └── health_checks.py
├── tests/
├── web/                      # Next.js frontend (existing)
└── migrations/
```

---

## Implementation Phases

| Phase | Description | Status |
|-------|-------------|--------|
| 0 | Project restructure (move files to new locations) | ✅ COMPLETE |
| 1 | Database migrations (new fields + tables) | ✅ COMPLETE |
| 2 | Requester & batch support | Not started |
| 3 | Webhooks | Not started |
| 4 | Scraper versioning & registry | Not started |
| 5 | Main dashboard UI | Not started |
| 6 | A/B traffic management | Not started |
| 7 | Authentication (API keys) | Not started |

**Phase 0 must complete first.** Then phases 1-5 can parallelize.

---

## Key Decisions Made

1. **Separate endpoints per job type** - not a single `/api/scrape` with type parameter
2. **Scraper versions in files** - `v1_0_0.py`, `v2_0_0.py` (underscores for valid Python)
3. **No legacy aliases** - `scraper_clean.py` deleted after migration, not kept as alias
4. **API backwards compatible** - `POST /api/scrape` still works (routes to google-reviews)
5. **Output schema defined** - for external insights service integration (see spec section 6)

---

## Important Constraints

- **Don't break current scraper** - it works, migrate carefully
- **Backwards compatible API** - existing integrations must keep working
- **Clean architecture** - no legacy file names, proper structure from start
- **Database migrations** - preserve existing job data

---

## Files to Know

| Current Location | Purpose |
|------------------|---------|
| `modules/scraper_clean.py` | Main Google Reviews scraper (96KB) |
| `modules/database.py` | PostgreSQL database manager |
| `api_server_production.py` | FastAPI server (will be split into api/) |
| `web/app/jobs/[id]/page.tsx` | Job detail page with DevTools |
| `.artifacts/ReviewIQ-Platform-Spec.md` | Full specification document |

---

## Quick Commands

```bash
# Run backend
python api_server_production.py

# Run frontend
cd web && npm run dev

# Docker
docker-compose -f docker-compose.production.yml up

# Build frontend
cd web && npm run build
```

---

## Resuming Work

When resuming after context compaction:

1. Read this file first
2. Check `.artifacts/ReviewIQ-Platform-Spec.md` for full details
3. Check git log for recent changes: `git log --oneline -10`
4. Check current phase status in this file
5. Continue implementation from where left off

---

*Last updated: 2025-01-24*
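
---

## Illustrative Sketch: Scraper Registry and Variant Routing

The notes above mention priority levels (normal, high, urgent), stable/beta/canary variants with A/B traffic routing, and version files named like `v1_0_0.py`. The following is a minimal sketch of how those pieces might fit together in `scrapers/registry.py`, not the spec's actual design. The names `Priority`, `Variant`, `RegistryEntry`, `ScraperRegistry`, `traffic_weight`, `pick`, and `module_name_to_version` are all hypothetical, and weighted random selection is only one possible way to split traffic. Check the spec document for the real schema.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Priority(str, Enum):
    """Priority levels listed under New Capabilities."""
    NORMAL = "normal"
    HIGH = "high"
    URGENT = "urgent"


class Variant(str, Enum):
    """Release channels listed under Scraper versioning."""
    STABLE = "stable"
    BETA = "beta"
    CANARY = "canary"


def module_name_to_version(module_name: str) -> str:
    """Convert a version module name such as 'v1_0_0' to '1.0.0'."""
    if not module_name.startswith("v") or len(module_name) < 2:
        raise ValueError(f"not a version module name: {module_name!r}")
    return module_name[1:].replace("_", ".")


@dataclass(frozen=True)
class RegistryEntry:
    job_type: str          # e.g. "google-reviews"
    version: str           # e.g. "1.0.0"
    variant: Variant
    traffic_weight: float  # hypothetical field: relative share of traffic


class ScraperRegistry:
    """In-memory stand-in for the scraper_registry table (assumption)."""

    def __init__(self) -> None:
        self._entries: List[RegistryEntry] = []

    def register(self, entry: RegistryEntry) -> None:
        self._entries.append(entry)

    def pick(self, job_type: str,
             rng: Optional[random.Random] = None) -> RegistryEntry:
        """Choose a version for a job by weighted random selection."""
        candidates = [
            e for e in self._entries
            if e.job_type == job_type and e.traffic_weight > 0
        ]
        if not candidates:
            raise LookupError(f"no active scraper for job type {job_type!r}")
        chooser = rng if rng is not None else random
        weights = [e.traffic_weight for e in candidates]
        return chooser.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    registry = ScraperRegistry()
    registry.register(RegistryEntry(
        "google-reviews", module_name_to_version("v1_0_0"), Variant.STABLE, 0.9))
    registry.register(RegistryEntry(
        "google-reviews", module_name_to_version("v2_0_0"), Variant.CANARY, 0.1))
    chosen = registry.pick("google-reviews")
    print(chosen.version, chosen.variant.value, Priority.NORMAL.value)
```

Entries with a weight of zero are excluded from selection, so a canary could be paused without removing its row. Passing a seeded `random.Random` into `pick` keeps routing deterministic in tests.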