whyrating-engine-legacy/.artifacts/CONTEXT-KEEPER.md
Alejandro Gutiérrez 39c80fc8be Phases 5-7: Dashboard UI, Admin API, and Auth middleware
Phase 5 - Main Dashboard:
- Dashboard overview page with system health stats
- Jobs by status breakdown, success rates, top clients
- Dashboard API (/api/dashboard/overview, by-client, problems, by-version)

Phase 6 - Admin/Scraper Management:
- Scrapers management page with traffic allocation UI
- Admin API for scraper CRUD operations
- Traffic percentage updates for A/B testing
- Promote/deprecate scraper versions

Phase 7 - Authentication:
- API key authentication middleware
- SHA-256 key hashing (keys never stored in plain text)
- Scope-based authorization (jobs:read, jobs:write, admin)
- Rate limiting per API key

Also:
- Updated api_server_production.py to include new routers
- Extended core/database.py with dashboard query methods
- Added dashboard link to sidebar navigation
- Updated CONTEXT-KEEPER.md to mark all phases complete

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:43:00 +00:00


ReviewIQ Platform - Implementation Context Keeper

Purpose: Restore context quickly after conversation compaction. Read this first when resuming work.


What Is This Project?

ReviewIQ is a multi-tenant scraping-as-a-service platform. It currently scrapes Google Reviews and will expand to other sources (Yelp, TripAdvisor, etc.).

Primary consumer: veritasreview.com (external service that generates insights from scraped data)


Current State (as of 2026-01-24)

Working Features

  • Google Reviews scraper (scrapers/google_reviews/v1_0_0.py) - fully functional
  • Job queue with PostgreSQL storage
  • Real-time SSE streaming of logs/progress
  • Web UI for job management and analytics
  • Chrome pool for browser management
  • Crash detection and analysis
  • JobDevTools observability panel
  • NEW: Requester tracking (client_id, source, purpose)
  • NEW: Batch job submission API
  • NEW: Webhook/callback delivery with retries
  • NEW: Scraper versioning with A/B traffic routing
  • NEW: Main dashboard with system health stats
  • NEW: Admin API for scraper management
  • NEW: API key authentication middleware
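
The webhook delivery with retries listed above might follow a pattern like the one below. This is a minimal sketch, not the code in the repository: the function name `deliver_with_retries`, the exponential backoff schedule, and the "delivered"/"failed" status strings are all assumptions made for illustration.

```python
import time
from typing import Callable


def deliver_with_retries(
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `send` until it returns a 2xx status or attempts run out.

    `send` is a hypothetical callable that performs one HTTP POST to the
    job's callback_url and returns the response status code.
    Returns "delivered" or "failed" (assumed callback_status values).
    """
    for attempt in range(max_attempts):
        try:
            status = send()
        except OSError:
            status = 0  # treat network errors as a failed attempt
        if 200 <= status < 300:
            return "delivered"
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return "failed"
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting; a worker would leave it at the default.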

Repository

  • Location: /Users/agutierrez/Desktop/google-reviews-scraper-pro
  • Branch: master
  • Spec document: .artifacts/ReviewIQ-Platform-Spec.md (v1.2)

What We're Building (Spec Summary)

New Capabilities

  1. Requester tracking - who requested each scrape (client_id, source, purpose)
  2. Batch jobs - submit multiple URLs as a group
  3. Webhooks - callback when jobs complete
  4. Priority levels - normal, high, urgent
  5. Scraper versioning - stable/beta/canary with A/B traffic routing
  6. Main dashboard - system health, client breakdown, scraper performance
  7. Multiple job types - architecture supports future scrapers
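
As a rough illustration of item 5, A/B traffic routing can be done with a weighted random choice over the traffic percentages assigned to each scraper version. The sketch below assumes a plain dict of version to percentage; the function name `pick_version` and that data shape are hypothetical and may not match scrapers/registry.py.

```python
import random
from typing import Dict, Optional


def pick_version(traffic: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick a scraper version at random, weighted by its traffic percentage.

    `traffic` maps a version string to its share of traffic, for example
    {"1.0.0": 90, "2.0.0": 10}. Versions with zero traffic are never chosen.
    """
    rng = rng or random.Random()
    versions = [version for version, pct in traffic.items() if pct > 0]
    if not versions:
        raise ValueError("no scraper version has traffic allocated")
    weights = [traffic[version] for version in versions]
    return rng.choices(versions, weights=weights, k=1)[0]
```

With `{"1.0.0": 90, "2.0.0": 10}`, roughly one job in ten would be routed to the 2.0.0 variant. The weights do not need to sum to 100, since `random.choices` normalizes them.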

API Design

  • Separate endpoints per job type: POST /api/scrape/google-reviews
  • Batch endpoint: POST /api/scrape/google-reviews/batch
  • Each scraper version is independent, registered in scraper_registry
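
To make the endpoint layout concrete, here is a small sketch that builds (but does not send) a request for either the single or the batch endpoint. The endpoint paths come from the list above; the base URL and the JSON body fields `url` and `urls` are assumptions, so check the spec for the real request schema.

```python
import json
from typing import List
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: local development server


def build_scrape_request(job_type: str, urls: List[str]) -> Request:
    """Build a POST request for one URL, or for a batch when given several."""
    if not urls:
        raise ValueError("at least one URL is required")
    is_batch = len(urls) > 1
    path = f"/api/scrape/{job_type}" + ("/batch" if is_batch else "")
    # Hypothetical body shape: "urls" for batches, "url" for single jobs.
    body = {"urls": urls} if is_batch else {"url": urls[0]}
    return Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would submit the job. Any authentication header is left out here because its name is not stated in this file.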

Key Data Model Additions

Jobs table (new fields):
- requester_client_id, requester_source, scrape_purpose, requester_metadata
- batch_id, batch_index
- job_type, scraper_version, scraper_variant, priority
- callback_url, callback_status, callback_sent_at
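
For orientation, the new job fields could be mirrored in application code roughly as follows. The field names come from the list above and the priority values from the spec summary; the class names, Python types, and defaults are assumptions, and the real definitions in core/models.py and core/enums.py may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, Optional


class Priority(str, Enum):
    NORMAL = "normal"
    HIGH = "high"
    URGENT = "urgent"


@dataclass
class JobRequestFields:
    """Hypothetical container for the new columns on the jobs table."""

    job_type: str
    # Requester tracking
    requester_client_id: Optional[str] = None
    requester_source: Optional[str] = None
    scrape_purpose: Optional[str] = None
    requester_metadata: Dict[str, Any] = field(default_factory=dict)
    # Batch grouping
    batch_id: Optional[str] = None
    batch_index: Optional[int] = None
    # Scraper selection and scheduling
    scraper_version: Optional[str] = None
    scraper_variant: Optional[str] = None
    priority: Priority = Priority.NORMAL
    # Webhook delivery
    callback_url: Optional[str] = None
    callback_status: Optional[str] = None
    callback_sent_at: Optional[datetime] = None
```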

New tables:
- batches (batch grouping)
- scraper_registry (version management)
- api_keys (authentication)
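
The Phase 7 commit notes mention SHA-256 key hashing and the scopes jobs:read, jobs:write, and admin. A minimal sketch of how an api_keys row might be created and checked is shown below; the function names are hypothetical, and treating admin as a superset of the other scopes is an assumption, not something this file states.

```python
import hashlib
import hmac
import secrets
from typing import Set


def generate_api_key() -> str:
    """Create a random key to show to the client once (never stored as-is)."""
    return secrets.token_urlsafe(32)


def hash_api_key(raw_key: str) -> str:
    """Return the SHA-256 hex digest that would be stored in api_keys."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def verify_api_key(raw_key: str, stored_hash: str) -> bool:
    """Compare digests in constant time to avoid timing leaks."""
    return hmac.compare_digest(hash_api_key(raw_key), stored_hash)


def has_scope(granted: Set[str], required: str) -> bool:
    """Assumption: the admin scope implies every other scope."""
    return "admin" in granted or required in granted
```

An unsalted SHA-256 digest is a reasonable fit here only because the keys are long random strings rather than user-chosen passwords.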

Target Project Structure

reviewiq/                          # Will rename from google-reviews-scraper-pro
├── api/
│   ├── server.py
│   └── routes/                    # scrape.py, jobs.py, batches.py, dashboard.py, admin.py
├── scrapers/
│   ├── registry.py
│   ├── base.py
│   └── google_reviews/
│       └── v1_0_0.py              # Migrated from scraper_clean.py
├── core/
│   ├── database.py
│   ├── models.py
│   ├── enums.py
│   └── config.py
├── services/
│   ├── job_service.py
│   ├── batch_service.py
│   ├── webhook_service.py
│   └── dashboard_service.py
├── workers/
│   ├── chrome_pool.py
│   ├── job_executor.py
│   └── webhook_worker.py
├── utils/
│   ├── logger.py
│   ├── crash_analyzer.py
│   └── health_checks.py
├── tests/
├── web/                           # Next.js frontend (existing)
└── migrations/

Implementation Phases

Phase  Description                                          Status
0      Project restructure (move files to new locations)    COMPLETE
1      Database migrations (new fields + tables)            COMPLETE
2      Requester & batch support                            COMPLETE
3      Webhooks                                             COMPLETE
4      Scraper versioning & registry                        COMPLETE
5      Main dashboard UI                                    COMPLETE
6      A/B traffic management (Admin API)                   COMPLETE
7      Authentication middleware                            COMPLETE

All phases complete! Core platform ready for integration testing.


Key Decisions Made

  1. Separate endpoints per job type - not a single /api/scrape with type parameter
  2. Scraper versions in files - v1_0_0.py, v2_0_0.py (underscores for valid Python)
  3. No legacy aliases - scraper_clean.py deleted after migration, not kept as alias
  4. API backwards compatible - the legacy POST /api/scrape endpoint still works (routes to google-reviews)
  5. Output schema defined - for external insights service integration (see spec section 6)
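
Decision 2 implies a mapping between a semantic version such as 1.0.0 and a module file name such as v1_0_0.py, because dots are not valid in Python module names. One way to sketch that mapping is below; the helper names are hypothetical and the registry may do this differently.

```python
import re

_MODULE_RE = re.compile(r"^v(\d+)_(\d+)_(\d+)$")


def version_to_module(version: str) -> str:
    """Convert "1.0.0" to the module name "v1_0_0"."""
    major, minor, patch = version.split(".")
    return f"v{int(major)}_{int(minor)}_{int(patch)}"


def module_to_version(module_name: str) -> str:
    """Convert the module name "v1_0_0" back to "1.0.0"."""
    match = _MODULE_RE.match(module_name)
    if match is None:
        raise ValueError(f"not a versioned scraper module: {module_name!r}")
    return ".".join(match.groups())
```

A registry could then load a scraper with `importlib.import_module("scrapers.google_reviews." + version_to_module("1.0.0"))`, assuming the package layout shown in the target project structure.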

Important Constraints

  • Don't break current scraper - it works, migrate carefully
  • Backwards compatible API - existing integrations must keep working
  • Clean architecture - no legacy file names, proper structure from start
  • Database migrations - preserve existing job data

Files to Know

Location                              Purpose
scrapers/google_reviews/v1_0_0.py     Main Google Reviews scraper (migrated)
scrapers/registry.py                  Scraper version registry with A/B routing
core/database.py                      PostgreSQL database manager
api_server_production.py              FastAPI server with all routers
api/routes/dashboard.py               Dashboard API endpoints
api/routes/admin.py                   Admin/scraper management API
api/routes/batches.py                 Batch job submission API
api/middleware/auth.py                API key authentication middleware
web/app/dashboard/page.tsx            Main dashboard UI
web/app/dashboard/scrapers/page.tsx   Scraper management UI
web/app/jobs/[id]/page.tsx            Job detail page with DevTools
migrations/versions/                  SQL migration files (001-004)
.artifacts/ReviewIQ-Platform-Spec.md  Full specification document

Quick Commands

# Run backend
python api_server_production.py

# Run frontend
cd web && npm run dev

# Docker
docker-compose -f docker-compose.production.yml up

# Build frontend
cd web && npm run build

Resuming Work

When resuming after context compaction:

  1. Read this file first
  2. Check .artifacts/ReviewIQ-Platform-Spec.md for full details
  3. Check git log for recent changes: git log --oneline -10
  4. Check current phase status in this file
  5. Continue implementation from where you left off

Last updated: 2026-01-24