Phase 5 - Main Dashboard:
- Dashboard overview page with system health stats
- Jobs by status breakdown, success rates, top clients
- Dashboard API (/api/dashboard/overview, by-client, problems, by-version)
Phase 6 - Admin/Scraper Management:
- Scrapers management page with traffic allocation UI
- Admin API for scraper CRUD operations
- Traffic percentage updates for A/B testing
- Promote/deprecate scraper versions
Phase 7 - Authentication:
- API key authentication middleware
- SHA-256 key hashing (keys never stored in plain text)
- Scope-based authorization (jobs:read, jobs:write, admin)
- Rate limiting per API key
Also:
- Updated api_server_production.py to include new routers
- Extended core/database.py with dashboard query methods
- Added dashboard link to sidebar navigation
- Updated CONTEXT-KEEPER.md to mark all phases complete
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ReviewIQ Platform - Implementation Context Keeper
Purpose: Restore context quickly after conversation compaction. Read this first when resuming work.
What Is This Project?
ReviewIQ is a multi-tenant scraping-as-a-service platform. It currently scrapes Google Reviews and will expand to other sources (Yelp, TripAdvisor, etc.).
Primary consumer: veritasreview.com (external service that generates insights from scraped data)
Current State (as of 2026-01-24)
Working Features
- Google Reviews scraper (scrapers/google_reviews/v1_0_0.py) - fully functional
- Job queue with PostgreSQL storage
- Real-time SSE streaming of logs/progress
- Web UI for job management and analytics
- Chrome pool for browser management
- Crash detection and analysis
- JobDevTools observability panel
- NEW: Requester tracking (client_id, source, purpose)
- NEW: Batch job submission API
- NEW: Webhook/callback delivery with retries
- NEW: Scraper versioning with A/B traffic routing
- NEW: Main dashboard with system health stats
- NEW: Admin API for scraper management
- NEW: API key authentication middleware
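Webhook delivery with retries, as listed above, can be sketched roughly as follows. This is a minimal illustration and not the project's webhook_service.py: the function name deliver_with_retries, the exponential backoff schedule, and the default of 3 attempts are all assumptions. The actual HTTP call is passed in as a callable so the sketch runs without a network.

```python
import time
from typing import Callable


def deliver_with_retries(
    send: Callable[[], bool],
    max_attempts: int = 3,          # assumed default, not taken from the spec
    base_delay: float = 1.0,        # assumed base delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it reports success.

    Returns the attempt number that succeeded, or 0 if every attempt failed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            if send():
                return attempt
        except Exception:
            # Any exception counts as a failed delivery in this sketch.
            pass
        if attempt < max_attempts:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
    return 0
```

In a real worker, send would wrap the POST to callback_url, and the returned attempt count could feed callback_status.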
Repository
- Location: /Users/agutierrez/Desktop/google-reviews-scraper-pro
- Branch: master
- Spec document: .artifacts/ReviewIQ-Platform-Spec.md (v1.2)
What We're Building (Spec Summary)
New Capabilities
- Requester tracking - who requested each scrape (client_id, source, purpose)
- Batch jobs - submit multiple URLs as a group
- Webhooks - callback when jobs complete
- Priority levels - normal, high, urgent
- Scraper versioning - stable/beta/canary with A/B traffic routing
- Main dashboard - system health, client breakdown, scraper performance
- Multiple job types - architecture supports future scrapers
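One way A/B traffic routing between scraper versions might work is weighted random selection over the traffic percentages. The sketch below assumes a plain dict of version to percentage; the function name pick_scraper_version and that dict shape are hypothetical and are not the actual scrapers/registry.py interface.

```python
import random
from typing import Dict, Optional


def pick_scraper_version(
    traffic: Dict[str, float],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a scraper version at random, weighted by its traffic percentage."""
    rng = rng or random.Random()
    # Versions with zero (or negative) allocation never receive jobs.
    versions = [v for v, pct in traffic.items() if pct > 0]
    if not versions:
        raise ValueError("no scraper version has traffic allocated")
    weights = [traffic[v] for v in versions]
    return rng.choices(versions, weights=weights, k=1)[0]
```

For example, pick_scraper_version({"1.0.0": 90, "2.0.0": 10}) would send roughly one job in ten to the newer version. The weights do not need to sum to 100, since random.choices treats them as relative.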
API Design
- Separate endpoints per job type: POST /api/scrape/google-reviews
- Batch endpoint: POST /api/scrape/google-reviews/batch
- Each scraper version is independent, registered in scraper_registry
Key Data Model Additions
Jobs table (new fields):
- requester_client_id, requester_source, scrape_purpose, requester_metadata
- batch_id, batch_index
- job_type, scraper_version, scraper_variant, priority
- callback_url, callback_status, callback_sent_at
New tables:
- batches (batch grouping)
- scraper_registry (version management)
- api_keys (authentication)
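For the api_keys table, Phase 7 stores SHA-256 hashes instead of plain-text keys and authorizes by scope (jobs:read, jobs:write, admin). A rough sketch of that idea follows. The function names are hypothetical, and treating admin as implying every other scope is an assumption, not confirmed behavior of api/middleware/auth.py.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Tuple


def hash_api_key(raw_key: str) -> str:
    """Return the SHA-256 hex digest that would be stored in place of the key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key() -> Tuple[str, str]:
    """Create a new key; return (raw key shown to the client once, stored hash)."""
    raw = secrets.token_urlsafe(32)
    return raw, hash_api_key(raw)


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Compare hashes in constant time to avoid leaking timing information."""
    return hmac.compare_digest(hash_api_key(presented), stored_hash)


def has_scope(granted: Iterable[str], required: str) -> bool:
    """Check a required scope; 'admin' granting everything is an assumption."""
    granted_set = set(granted)
    return required in granted_set or "admin" in granted_set
```

A plain SHA-256 digest is reasonable here only because the keys are long random tokens; it would not be appropriate for user-chosen passwords.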
Target Project Structure
reviewiq/ # Will rename from google-reviews-scraper-pro
├── api/
│ ├── server.py
│ └── routes/ # scrape.py, jobs.py, batches.py, dashboard.py, admin.py
├── scrapers/
│ ├── registry.py
│ ├── base.py
│ └── google_reviews/
│ └── v1_0_0.py # Migrated from scraper_clean.py
├── core/
│ ├── database.py
│ ├── models.py
│ ├── enums.py
│ └── config.py
├── services/
│ ├── job_service.py
│ ├── batch_service.py
│ ├── webhook_service.py
│ └── dashboard_service.py
├── workers/
│ ├── chrome_pool.py
│ ├── job_executor.py
│ └── webhook_worker.py
├── utils/
│ ├── logger.py
│ ├── crash_analyzer.py
│ └── health_checks.py
├── tests/
├── web/ # Next.js frontend (existing)
└── migrations/
Implementation Phases
| Phase | Description | Status |
|---|---|---|
| 0 | Project restructure (move files to new locations) | ✅ COMPLETE |
| 1 | Database migrations (new fields + tables) | ✅ COMPLETE |
| 2 | Requester & batch support | ✅ COMPLETE |
| 3 | Webhooks | ✅ COMPLETE |
| 4 | Scraper versioning & registry | ✅ COMPLETE |
| 5 | Main dashboard UI | ✅ COMPLETE |
| 6 | A/B traffic management (Admin API) | ✅ COMPLETE |
| 7 | Authentication middleware | ✅ COMPLETE |
All phases complete! Core platform ready for integration testing.
Key Decisions Made
- Separate endpoints per job type - not a single /api/scrape with type parameter
- Scraper versions in files - v1_0_0.py, v2_0_0.py (underscores for valid Python)
- No legacy aliases - scraper_clean.py deleted after migration, not kept as alias
- API backwards compatible - POST /api/scrape still works (routes to google-reviews)
- Output schema defined - for external insights service integration (see spec section 6)
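The file naming decision (underscores so the version is a valid Python module name) implies a mapping between version strings and module names. A small sketch of that mapping, with hypothetical helper names that are not taken from the codebase:

```python
import re

# Strict MAJOR.MINOR.PATCH, e.g. "1.0.0"
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")
# Matching module name, e.g. "v1_0_0"
_MODULE_RE = re.compile(r"v\d+_\d+_\d+")


def version_to_module(version: str) -> str:
    """Convert '1.0.0' to 'v1_0_0', which is importable as a module name."""
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return "v" + version.replace(".", "_")


def module_to_version(module: str) -> str:
    """Convert 'v1_0_0' back to '1.0.0'."""
    if not _MODULE_RE.fullmatch(module):
        raise ValueError(f"not a versioned module name: {module!r}")
    return module[1:].replace("_", ".")
```

Dots are not allowed in a module name, and a name starting with a digit is not a valid identifier, which is why both the v prefix and the underscores are needed.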
Important Constraints
- Don't break current scraper - it works, migrate carefully
- Backwards compatible API - existing integrations must keep working
- Clean architecture - no legacy file names, proper structure from start
- Database migrations - preserve existing job data
Files to Know
| Location | Purpose |
|---|---|
| scrapers/google_reviews/v1_0_0.py | Main Google Reviews scraper (migrated) |
| scrapers/registry.py | Scraper version registry with A/B routing |
| core/database.py | PostgreSQL database manager |
| api_server_production.py | FastAPI server with all routers |
| api/routes/dashboard.py | Dashboard API endpoints |
| api/routes/admin.py | Admin/scraper management API |
| api/routes/batches.py | Batch job submission API |
| api/middleware/auth.py | API key authentication middleware |
| web/app/dashboard/page.tsx | Main dashboard UI |
| web/app/dashboard/scrapers/page.tsx | Scraper management UI |
| web/app/jobs/[id]/page.tsx | Job detail page with DevTools |
| migrations/versions/ | SQL migration files (001-004) |
| .artifacts/ReviewIQ-Platform-Spec.md | Full specification document |
Quick Commands
# Run backend
python api_server_production.py
# Run frontend
cd web && npm run dev
# Docker
docker-compose -f docker-compose.production.yml up
# Build frontend
cd web && npm run build
Resuming Work
When resuming after context compaction:
- Read this file first
- Check .artifacts/ReviewIQ-Platform-Spec.md for full details
- Check git log for recent changes: git log --oneline -10
- Check current phase status in this file
- Continue implementation from where left off
Last updated: 2026-01-24