Phase 5 - Main Dashboard:
- Dashboard overview page with system health stats
- Jobs by status breakdown, success rates, top clients
- Dashboard API (/api/dashboard/overview, by-client, problems, by-version)

Phase 6 - Admin/Scraper Management:
- Scrapers management page with traffic allocation UI
- Admin API for scraper CRUD operations
- Traffic percentage updates for A/B testing
- Promote/deprecate scraper versions

Phase 7 - Authentication:
- API key authentication middleware
- SHA-256 key hashing (keys never stored in plain text)
- Scope-based authorization (jobs:read, jobs:write, admin)
- Rate limiting per API key

Also:
- Updated api_server_production.py to include new routers
- Extended core/database.py with dashboard query methods
- Added dashboard link to sidebar navigation
- Updated CONTEXT-KEEPER.md to mark all phases complete

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

# ReviewIQ Platform - Implementation Context Keeper

> **Purpose**: Restore context quickly after conversation compaction. Read this first when resuming work.

---

## What Is This Project?

**ReviewIQ** is a multi-tenant scraping-as-a-service platform. It currently scrapes Google Reviews and will expand to other sources (Yelp, TripAdvisor, etc.).

**Primary consumer**: veritasreview.com (an external service that generates insights from scraped data)

---

## Current State (as of 2026-01-24)

### Working Features
- Google Reviews scraper (`scrapers/google_reviews/v1_0_0.py`) - fully functional
- Job queue with PostgreSQL storage
- Real-time SSE streaming of logs/progress
- Web UI for job management and analytics
- Chrome pool for browser management
- Crash detection and analysis
- JobDevTools observability panel
- **NEW**: Requester tracking (client_id, source, purpose)
- **NEW**: Batch job submission API
- **NEW**: Webhook/callback delivery with retries
- **NEW**: Scraper versioning with A/B traffic routing
- **NEW**: Main dashboard with system health stats
- **NEW**: Admin API for scraper management
- **NEW**: API key authentication middleware

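
The API key middleware listed above is described in the Phase 7 notes as SHA-256 key hashing with scope-based authorization (`jobs:read`, `jobs:write`, `admin`). The following is a minimal sketch of that idea, not the code in `api/middleware/auth.py`. The names `StoredKey`, `hash_key`, `generate_key`, and `authorize`, the `riq_` key prefix, and the rule that `admin` implies every other scope are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredKey:
    """Hypothetical shape of one row in the api_keys table."""
    key_hash: str            # hex SHA-256 digest; the raw key is never stored
    scopes: frozenset[str]   # e.g. {"jobs:read", "jobs:write"}


def hash_key(raw_key: str) -> str:
    """Return the hex SHA-256 digest of a raw API key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_key() -> tuple[str, str]:
    """Create a new key; return (raw key shown once to the client, digest to store)."""
    raw = "riq_" + secrets.token_urlsafe(32)  # "riq_" prefix is an assumption
    return raw, hash_key(raw)


def authorize(raw_key: str, stored: StoredKey, required_scope: str) -> bool:
    """Check the presented key against the stored digest, then check the scope."""
    # compare_digest avoids leaking digest contents through timing differences
    key_ok = hmac.compare_digest(hash_key(raw_key), stored.key_hash)
    # Assumption: the "admin" scope grants every other scope
    scope_ok = required_scope in stored.scopes or "admin" in stored.scopes
    return key_ok and scope_ok
```

In a FastAPI app, a check like `authorize` would typically sit in a dependency or middleware that reads the key from a request header; which header the real middleware uses is not stated here.
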
### Repository
- **Location**: `/Users/agutierrez/Desktop/google-reviews-scraper-pro`
- **Branch**: `master`
- **Spec document**: `.artifacts/ReviewIQ-Platform-Spec.md` (v1.2)

---

## What We're Building (Spec Summary)

### New Capabilities
1. **Requester tracking** - who requested each scrape (client_id, source, purpose)
2. **Batch jobs** - submit multiple URLs as a group
3. **Webhooks** - callback when jobs complete
4. **Priority levels** - normal, high, urgent
5. **Scraper versioning** - stable/beta/canary with A/B traffic routing
6. **Main dashboard** - system health, client breakdown, scraper performance
7. **Multiple job types** - architecture supports future scrapers

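
Webhook delivery (capability 3) is described elsewhere in this file as "callback delivery with retries". One common way to do that is exponential backoff, sketched below. This is an illustration only: the retry count, the delay schedule, the `deliver_with_retries` name, and the `"delivered"` / `"failed"` strings are assumptions, not the values used by the webhook worker or stored in `callback_status`. The HTTP call is passed in as a function so the sketch stays independent of any HTTP client.

```python
import time
from typing import Callable


def deliver_with_retries(
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call send() until it returns a 2xx status or attempts run out.

    send() should POST the payload to callback_url and return the HTTP
    status code. The return values here are hypothetical status strings.
    """
    for attempt in range(max_attempts):
        try:
            status = send()
        except OSError:
            status = None  # network-level failure, treated as retryable
        if status is not None and 200 <= status < 300:
            return "delivered"
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * (2 ** attempt))
    return "failed"
```

Injecting `sleep` keeps the function easy to test without real delays; a production worker would more likely schedule the retry than block a thread.
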
### API Design
- Separate endpoints per job type: `POST /api/scrape/google-reviews`
- Batch endpoint: `POST /api/scrape/google-reviews/batch`
- Each scraper version is independent, registered in scraper_registry

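
A/B traffic routing across registered scraper versions can be pictured as a weighted random choice over per-version traffic percentages. The sketch below assumes the registry can be reduced to a mapping of version string to percentage; `pick_version` and that mapping shape are hypothetical and are not taken from `scrapers/registry.py`.

```python
from __future__ import annotations

import random
from typing import Optional


def pick_version(
    traffic: dict[str, float],
    rng: Optional[random.Random] = None,
) -> str:
    """Pick a scraper version with probability proportional to its traffic share.

    traffic maps a version string to its traffic percentage, for example
    {"1.0.0": 90, "1.1.0": 10}. Versions with a share of 0 are never picked.
    """
    if not traffic:
        raise ValueError("no scraper versions registered")
    if sum(traffic.values()) <= 0:
        raise ValueError("traffic percentages must sum to a positive number")
    if rng is None:
        rng = random.Random()
    versions = list(traffic)
    weights = [traffic[v] for v in versions]
    return rng.choices(versions, weights=weights, k=1)[0]
```

Passing a seeded `random.Random` makes the routing reproducible in tests. Whether the real registry picks per job, per batch, or per client is not specified here.
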
### Key Data Model Additions
```
Jobs table (new fields):
- requester_client_id, requester_source, scrape_purpose, requester_metadata
- batch_id, batch_index
- job_type, scraper_version, scraper_variant, priority
- callback_url, callback_status, callback_sent_at

New tables:
- batches (batch grouping)
- scraper_registry (version management)
- api_keys (authentication)
```

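
To make the new job fields easier to picture, here is a hedged sketch of how they might look as a Python dataclass. Only the field names and the three priority levels come from this file. The class names `Priority` and `JobExtras`, every type, and every default value are assumptions, and the real schema lives in the SQL migrations.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Optional


class Priority(str, Enum):
    """The three priority levels named in the spec summary."""
    NORMAL = "normal"
    HIGH = "high"
    URGENT = "urgent"


@dataclass
class JobExtras:
    """New per-job fields; types and defaults are illustrative guesses."""
    # Requester tracking
    requester_client_id: Optional[str] = None
    requester_source: Optional[str] = None
    scrape_purpose: Optional[str] = None
    requester_metadata: dict[str, Any] = field(default_factory=dict)
    # Batch grouping
    batch_id: Optional[str] = None
    batch_index: Optional[int] = None
    # Scraper selection
    job_type: str = "google-reviews"      # assumed default
    scraper_version: Optional[str] = None
    scraper_variant: Optional[str] = None  # e.g. stable, beta, canary
    priority: Priority = Priority.NORMAL
    # Webhook delivery
    callback_url: Optional[str] = None
    callback_status: Optional[str] = None
    callback_sent_at: Optional[datetime] = None
```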
---

## Target Project Structure

```
reviewiq/                          # Will rename from google-reviews-scraper-pro
├── api/
│   ├── server.py
│   └── routes/                    # scrape.py, jobs.py, batches.py, dashboard.py, admin.py
├── scrapers/
│   ├── registry.py
│   ├── base.py
│   └── google_reviews/
│       └── v1_0_0.py              # Migrated from scraper_clean.py
├── core/
│   ├── database.py
│   ├── models.py
│   ├── enums.py
│   └── config.py
├── services/
│   ├── job_service.py
│   ├── batch_service.py
│   ├── webhook_service.py
│   └── dashboard_service.py
├── workers/
│   ├── chrome_pool.py
│   ├── job_executor.py
│   └── webhook_worker.py
├── utils/
│   ├── logger.py
│   ├── crash_analyzer.py
│   └── health_checks.py
├── tests/
├── web/                           # Next.js frontend (existing)
└── migrations/
```

---

## Implementation Phases

| Phase | Description | Status |
|-------|-------------|--------|
| 0 | Project restructure (move files to new locations) | ✅ COMPLETE |
| 1 | Database migrations (new fields + tables) | ✅ COMPLETE |
| 2 | Requester & batch support | ✅ COMPLETE |
| 3 | Webhooks | ✅ COMPLETE |
| 4 | Scraper versioning & registry | ✅ COMPLETE |
| 5 | Main dashboard UI | ✅ COMPLETE |
| 6 | A/B traffic management (Admin API) | ✅ COMPLETE |
| 7 | Authentication middleware | ✅ COMPLETE |

**All phases complete!** Core platform ready for integration testing.

---

## Key Decisions Made

1. **Separate endpoints per job type** - not a single `/api/scrape` with type parameter
2. **Scraper versions in files** - `v1_0_0.py`, `v2_0_0.py` (underscores for valid Python)
3. **No legacy aliases** - `scraper_clean.py` deleted after migration, not kept as alias
4. **API backwards compatible** - `POST /api/scrape` still works (routes to google-reviews)
5. **Output schema defined** - for external insights service integration (see spec section 6)

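
Decision 2 exists because a dotted version such as `1.0.0` cannot be used in a Python module name, so the file name uses underscores instead. A small sketch of the mapping follows. The `module_path` helper and the rule that the job type's hyphens become underscores in the package name are assumptions; the resulting string could then be handed to `importlib.import_module` to load the scraper.

```python
import re

# Strict MAJOR.MINOR.PATCH, digits only
_VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def module_path(job_type: str, version: str) -> str:
    """Map a job type and dotted version to an importable module path.

    Example: ("google-reviews", "1.0.0") -> "scrapers.google_reviews.v1_0_0"
    """
    if not _VERSION_RE.fullmatch(version):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    package = job_type.replace("-", "_")    # hyphens are not valid in module names
    filename = "v" + version.replace(".", "_")
    return f"scrapers.{package}.{filename}"
```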
---

## Important Constraints

- **Don't break current scraper** - it works, migrate carefully
- **Backwards compatible API** - existing integrations must keep working
- **Clean architecture** - no legacy file names, proper structure from start
- **Database migrations** - preserve existing job data

---

## Files to Know

| Location | Purpose |
|----------|---------|
| `scrapers/google_reviews/v1_0_0.py` | Main Google Reviews scraper (migrated) |
| `scrapers/registry.py` | Scraper version registry with A/B routing |
| `core/database.py` | PostgreSQL database manager |
| `api_server_production.py` | FastAPI server with all routers |
| `api/routes/dashboard.py` | Dashboard API endpoints |
| `api/routes/admin.py` | Admin/scraper management API |
| `api/routes/batches.py` | Batch job submission API |
| `api/middleware/auth.py` | API key authentication middleware |
| `web/app/dashboard/page.tsx` | Main dashboard UI |
| `web/app/dashboard/scrapers/page.tsx` | Scraper management UI |
| `web/app/jobs/[id]/page.tsx` | Job detail page with DevTools |
| `migrations/versions/` | SQL migration files (001-004) |
| `.artifacts/ReviewIQ-Platform-Spec.md` | Full specification document |

---

## Quick Commands

```bash
# Run backend
python api_server_production.py

# Run frontend
cd web && npm run dev

# Docker
docker-compose -f docker-compose.production.yml up

# Build frontend
cd web && npm run build
```

---

## Resuming Work

When resuming after context compaction:

1. Read this file first
2. Check `.artifacts/ReviewIQ-Platform-Spec.md` for full details
3. Check git log for recent changes: `git log --oneline -10`
4. Check current phase status in this file
5. Continue implementation from where you left off

---

*Last updated: 2026-01-24*