Add CONTEXT-KEEPER.md for conversation continuity

Quick-reference document for resuming work after context compaction. Contains: project overview, current state, spec summary, phases, key decisions, file locations, and resumption instructions.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:14:01 +00:00
parent 12d37e350b
commit bb0291f265
1 changed file with 180 additions and 0 deletions
--- a/.artifacts/CONTEXT-KEEPER.md
+++ b/.artifacts/CONTEXT-KEEPER.md
@@ -0,0 +1,180 @@
+# ReviewIQ Platform - Implementation Context Keeper
+
+> **Purpose**: Restore context quickly after conversation compaction. Read this first when resuming work.
+
+---
+
+## What Is This Project?
+
+**ReviewIQ** is a multi-tenant scraping-as-a-service platform. It currently scrapes Google Reviews and will expand to other sources (Yelp, TripAdvisor, etc.).
+
+**Primary consumer**: veritasreview.com (external service that generates insights from scraped data)
+
+---
+
+## Current State (as of 2026-01-24)
+
+### Working Features
+- Google Reviews scraper (`modules/scraper_clean.py`) - fully functional
+- Job queue with PostgreSQL storage
+- Real-time SSE streaming of logs/progress
+- Web UI for job management and analytics
+- Chrome pool for browser management
+- Crash detection and analysis
+- JobDevTools observability panel
+
+### Repository
+- **Location**: `/Users/agutierrez/Desktop/google-reviews-scraper-pro`
+- **Branch**: `master`
+- **Spec document**: `.artifacts/ReviewIQ-Platform-Spec.md` (v1.2)
+
+---
+
+## What We're Building (Spec Summary)
+
+### New Capabilities
+1. **Requester tracking** - who requested each scrape (client_id, source, purpose)
+2. **Batch jobs** - submit multiple URLs as a group
+3. **Webhooks** - callback when jobs complete
+4. **Priority levels** - normal, high, urgent
+5. **Scraper versioning** - stable/beta/canary with A/B traffic routing
+6. **Main dashboard** - system health, client breakdown, scraper performance
+7. **Multiple job types** - architecture supports future scrapers
+
+### API Design
+- Separate endpoints per job type: `POST /api/scrape/google-reviews`
+- Batch endpoint: `POST /api/scrape/google-reviews/batch`
+- Each scraper version is independent, registered in `scraper_registry`
+
+### Key Data Model Additions
+```
+Jobs table (new fields):
+- requester_client_id, requester_source, scrape_purpose, requester_metadata
+- batch_id, batch_index
+- job_type, scraper_version, scraper_variant, priority
+- callback_url, callback_status, callback_sent_at
+
+New tables:
+- batches (batch grouping)
+- scraper_registry (version management)
+- api_keys (authentication)
+```
+
+---
+
+## Target Project Structure
+
+```
+reviewiq/                          # Will rename from google-reviews-scraper-pro
+├── api/
+│   ├── server.py
+│   └── routes/                    # scrape.py, jobs.py, batches.py, dashboard.py, admin.py
+├── scrapers/
+│   ├── registry.py
+│   ├── base.py
+│   └── google_reviews/
+│       └── v1_0_0.py              # Migrated from scraper_clean.py
+├── core/
+│   ├── database.py
+│   ├── models.py
+│   ├── enums.py
+│   └── config.py
+├── services/
+│   ├── job_service.py
+│   ├── batch_service.py
+│   ├── webhook_service.py
+│   └── dashboard_service.py
+├── workers/
+│   ├── chrome_pool.py
+│   ├── job_executor.py
+│   └── webhook_worker.py
+├── utils/
+│   ├── logger.py
+│   ├── crash_analyzer.py
+│   └── health_checks.py
+├── tests/
+├── web/                           # Next.js frontend (existing)
+└── migrations/
+```
+
+---
+
+## Implementation Phases
+
+| Phase | Description | Status |
+|-------|-------------|--------|
+| 0 | Project restructure (move files to new locations) | Not started |
+| 1 | Database migrations (new fields + tables) | Not started |
+| 2 | Requester & batch support | Not started |
+| 3 | Webhooks | Not started |
+| 4 | Scraper versioning & registry | Not started |
+| 5 | Main dashboard UI | Not started |
+| 6 | A/B traffic management | Not started |
+| 7 | Authentication (API keys) | Not started |
+
+**Phase 0 must complete first.** Phases 1-5 can then proceed in parallel.
+
+---
+
+## Key Decisions Made
+
+1. **Separate endpoints per job type** - not a single `/api/scrape` with type parameter
+2. **Scraper versions in files** - `v1_0_0.py`, `v2_0_0.py` (underscores for valid Python)
+3. **No legacy aliases** - `scraper_clean.py` deleted after migration, not kept as alias
+4. **API backwards compatible** - `POST /api/scrape` still works (routes to google-reviews)
+5. **Output schema defined** - for external insights service integration (see spec section 6)
+
+---
+
+## Important Constraints
+
+- **Don't break current scraper** - it works, migrate carefully
+- **Backwards compatible API** - existing integrations must keep working
+- **Clean architecture** - no legacy file names, proper structure from start
+- **Database migrations** - preserve existing job data
+
+---
+
+## Files to Know
+
+| Current Location | Purpose |
+|------------------|---------|
+| `modules/scraper_clean.py` | Main Google Reviews scraper (96KB) |
+| `modules/database.py` | PostgreSQL database manager |
+| `api_server_production.py` | FastAPI server (will be split into api/) |
+| `web/app/jobs/[id]/page.tsx` | Job detail page with DevTools |
+| `.artifacts/ReviewIQ-Platform-Spec.md` | Full specification document |
+
+---
+
+## Quick Commands
+
+```bash
+# Run backend
+python api_server_production.py
+
+# Run frontend
+cd web && npm run dev
+
+# Docker
+docker-compose -f docker-compose.production.yml up
+
+# Build frontend
+cd web && npm run build
+```
+
+---
+
+## Resuming Work
+
+When resuming after context compaction:
+
+1. Read this file first
+2. Check `.artifacts/ReviewIQ-Platform-Spec.md` for full details
+3. Check git log for recent changes: `git log --oneline -10`
+4. Check current phase status in this file
+5. Continue implementation from where you left off
+
+---
+
+*Last updated: 2026-01-24*
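
The versioning decisions recorded in the file above (one module per scraper version, such as `v1_0_0.py`, plus stable/beta/canary variants with A/B traffic routing) could be sketched roughly as follows. This is an illustration only, not code from the repository: `ScraperEntry`, `module_name_for`, `pick_variant`, the `traffic_weight` field, and the example weights are all hypothetical and are not taken from the spec.

```python
import random
from dataclasses import dataclass


def module_name_for(version: str) -> str:
    """Map a version such as "1.0.0" to a module name such as "v1_0_0".

    Dots are not valid in Python identifiers, hence the underscores.
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    return "v" + "_".join(parts)


@dataclass(frozen=True)
class ScraperEntry:
    """Hypothetical registry row: one scraper version for one job type."""

    job_type: str          # e.g. "google-reviews"
    version: str           # e.g. "1.0.0"
    variant: str           # "stable", "beta", or "canary"
    traffic_weight: float  # assumed field: relative share of new jobs

    @property
    def module_path(self) -> str:
        # Follows the target layout, e.g. scrapers.google_reviews.v1_0_0
        package = self.job_type.replace("-", "_")
        return f"scrapers.{package}.{module_name_for(self.version)}"


def pick_variant(entries, job_type, rng=random):
    """Pick an entry for job_type by weighted random choice (A/B routing)."""
    candidates = [
        e for e in entries if e.job_type == job_type and e.traffic_weight > 0
    ]
    if not candidates:
        raise LookupError(f"no active scraper registered for {job_type!r}")
    weights = [e.traffic_weight for e in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Example registry contents; the 90/10 split is an assumed value.
REGISTRY = [
    ScraperEntry("google-reviews", "1.0.0", "stable", 0.9),
    ScraperEntry("google-reviews", "2.0.0", "canary", 0.1),
]
```

Calling `pick_variant(REGISTRY, "google-reviews")` returns one of the two entries, and its `module_path` gives the module to import for that job. Keeping the weights in the registry rather than in code would let traffic shift between variants without a deploy, although the spec may define this differently.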