whyrating-engine-legacy

Author	SHA1	Message	Date
Alejandro Gutiérrez	acd3b22e88	docs: Add pipeline development artifacts for parallel implementation New artifacts: - ReviewIQ-Pipeline-DevGuide.md: Entry point for pipeline work - ReviewIQ-Pipeline-Contracts-v1.md: Stage I/O specs, validation rules, test fixtures - ReviewIQ-Pipeline-Checklist.md: Per-stage implementation checklists - ReviewIQ-Codebase-Overview.md: File structure, integration points - ReviewIQ-v3.2.1-Taxonomy-Versioning.md: Taxonomy versioning addendum Updated: - ReviewIQ-v32-Decisions.md: Added B2 audit findings, taxonomy versioning decisions, pipeline status These artifacts enable parallel development of pipeline stages 1-4 with: - Independent validation (35 rules across stages) - Clear input/output contracts - Test fixtures for each stage - Definition of done criteria Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 17:08:40 +00:00
Alejandro Gutiérrez	43fd1515d2	Align artifacts with canonical URT v5.1 specification Fixes inconsistencies discovered during audit against urt-taxonomy/: - urt_profile ENUM: Add 'lite' and 'core' profiles (was missing) - USN format: Use canonical regex from spec (was non-compliant) - USN valence encoding: Add V0 (0) and V± (±) support - USN grammar: Add Lite (URT:L:) and Core (URT:C:) formats - Dimension codes: Fix temporal (TC/TR/TH/TF), evidence (ES/EI/EC), comparative (CR-N/CR-B/CR-W/CR-S) in decisions doc - LLM contract: Full USN regex validation pattern Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 16:21:21 +00:00
Alejandro Gutiérrez	46cd54e275	Add LLM Classification Contract v1.0 Defines prompt, output schema, and validation rules for span-level URT classification: - System prompt with span extraction rules - JSON schema for structured output - 4 few-shot examples (multi-span, temporal, comparative) - Structural and semantic validation rules - Error handling with retry + fallback - Performance considerations (token budget, batching, caching) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 16:07:31 +00:00
Alejandro Gutiérrez	39c80fc8be	Phases 5-7: Dashboard UI, Admin API, and Auth middleware Phase 5 - Main Dashboard: - Dashboard overview page with system health stats - Jobs by status breakdown, success rates, top clients - Dashboard API (/api/dashboard/overview, by-client, problems, by-version) Phase 6 - Admin/Scraper Management: - Scrapers management page with traffic allocation UI - Admin API for scraper CRUD operations - Traffic percentage updates for A/B testing - Promote/deprecate scraper versions Phase 7 - Authentication: - API key authentication middleware - SHA-256 key hashing (keys never stored in plain text) - Scope-based authorization (jobs:read, jobs:write, admin) - Rate limiting per API key Also: - Updated api_server_production.py to include new routers - Extended core/database.py with dashboard query methods - Added dashboard link to sidebar navigation - Updated CONTEXT-KEEPER.md to mark all phases complete Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 15:43:00 +00:00
Alejandro Gutiérrez	788ef84756	Phases 2-4: Requester support, batches, webhooks, scraper registry Phase 2 - Requester & Batch Support: - core/database.py: Added create_job params (requester_, batch_, priority, callback_*) - core/database.py: Added batch methods (create_batch, get_batch, update_batch_progress, get_batches) - core/database.py: Added update_job_callback for tracking webhook delivery - api/routes/batches.py: New endpoints: - POST /api/scrape/google-reviews/batch (submit batch) - GET /api/batches (list batches) - GET /api/batches/{id} (batch detail) - DELETE /api/batches/{id} (cancel batch) - api_server_production.py: Updated /api/scrape with requester, priority, callback fields - api_server_production.py: New primary endpoint POST /api/scrape/google-reviews Phase 3 - Webhooks: - services/job_callback_service.py: New service with: - JobCallbackService: send_job_callback, send_batch_callback, retry_failed_callbacks - JobCallbackDispatcher: Background worker for callback monitoring - Payload formats per spec (job.completed, job.failed, batch.completed) - Exponential backoff for retries - Error classification for failure payloads Phase 4 - Scraper Registry: - scrapers/registry.py: Database-backed version routing: - get_scraper(): Version/variant/A/B routing - _get_weighted_scraper(): Traffic-weighted random selection - 60-second TTL cache for performance - register_scraper, deprecate_scraper, update_traffic_allocation - LegacyScraperRegistry preserved for backwards compatibility Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 15:35:58 +00:00
Alejandro Gutiérrez	2412996c54	Phase 1: Database migrations for platform features Migrations created: - 001_add_job_platform_fields.sql: Add 15 new columns to jobs table - Requester tracking (client_id, source, purpose, metadata) - Batch support (batch_id, batch_index) - Execution tracking (job_type, scraper_version, variant, priority) - Webhook callbacks (url, status, sent_at, attempts) - Result summary (JSONB for cross-type dashboard) - 7 indexes for query performance - 5 CHECK constraints for data validation - 002_create_batches_table.sql: Batch job grouping - Tracks batch progress (total/completed/failed) - Batch-level callbacks - Requester association - 003_create_scraper_registry.sql: Scraper version management - Version routing (stable/beta/canary variants) - A/B traffic splitting (traffic_pct) - Priority-based routing - Seeds google_reviews v1.0.0 as stable default - 004_create_api_keys.sql: API authentication - Secure key storage (SHA-256 hashes, not plaintext) - Scopes-based permissions - Rate limiting support - Key lifecycle (expiry, active status) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 15:24:28 +00:00
Alejandro Gutiérrez	544e028c3f	Phase 0: Project restructure to ReviewIQ platform architecture New structure: - scrapers/google_reviews/v1_0_0.py (was modules/scraper_clean.py) - scrapers/base.py (BaseScraper interface) - scrapers/registry.py (ScraperRegistry for version routing) - core/database.py, models.py, config.py, enums.py - utils/logger.py, crash_analyzer.py, health_checks.py, helpers.py, date_converter.py - workers/chrome_pool.py - services/webhook_service.py - api/ routes structure (empty, ready for Phase 2) - tests/ structure mirroring source All imports updated in: - api_server_production.py (7 import paths updated) - utils/health_checks.py (scraper import path) Legacy modules moved to modules/_legacy/: - data_storage.py, image_handler.py, s3_handler.py (unused) Syntax verified, frontend build passing. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 15:22:08 +00:00
Alejandro Gutiérrez	bb0291f265	Add CONTEXT-KEEPER.md for conversation continuity Quick-reference document for resuming work after context compaction. Contains: project overview, current state, spec summary, phases, key decisions, file locations, and resumption instructions. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 15:14:01 +00:00
Alejandro Gutiérrez	12d37e350b	Fix JobDevTools contrast + log normalization, add Platform Spec - Fix contrast issues in JobDevTools (level badges, text colors, timestamps) - Make log normalization more robust (handles old/new formats, edge cases) - Add ReviewIQ Platform Spec v1.2 defining: - Multi-tenant scraping-as-a-service architecture - Requester metadata, batches, webhooks, priority - Scraper versioning with A/B testing (stable/beta/canary) - API endpoints for job types, dashboard, admin - Output schemas for external service integration - Project structure reorganization plan Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 15:13:19 +00:00
Alejandro Gutiérrez	f99827717f	Final polish: v3.1.2 operational safety constraints - Add chk_dedup_scoped constraint enforcing tenant-scoped dedup format - Filter location_type='owned' in populate_facts() for 'ALL' rollup - Document competitor exclusion from 'ALL' sentinel rollups - Add explicit comments in aggregation code for maintainability Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 12:55:31 +00:00
Alejandro Gutiérrez	3987a9ab4e	Document v3.1.2 conventions: dedup scoping and sentinel values Two micro-risk mitigations documented: 1. dedup_group_id: Format "{business_id}:{hash}" to prevent cross-tenant collision on similar reviews. 2. Sentinel conventions: 'ALL' (spatial) vs 'all' (semantic). Case matters — do not normalize. Spec frozen as v3.1.2. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 12:50:29 +00:00
Alejandro Gutiérrez	9515dd2d42	Polish ReviewIQ v3.1.2: tenant-scoping and FK integrity Final fixes for production-ready spec: 1. locations.location_type: Added 'owned'|'competitor' flag. Competitors now inserted into locations (preserves FK integrity). 2. Competitor fact query: Added business_id filter to prevent cross-tenant contamination when same competitor tracked by multiple customers. 3. issue_events versioning: Added source + review_version columns for complete review reference in audit log. 4. Enrichment tenant-scoping: business_id now passed from ingest job (not looked up). Validates place_id exists under tenant. 5. Footer: Fixed version string v3.1.1 → v3.1.2. Status: Ship-ready specification. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 12:34:35 +00:00
Alejandro Gutiérrez	44d017b3f7	Finalize ReviewIQ Architecture v3.1.2 (production-ready) Three final fixes applied: 1. issue_spans versioning: Added source + review_version columns with FK to reviews_enriched(source, review_id, review_version). Spans now correctly reference the exact review version. 2. Competitor business_id rule: Clarified that competitor reviews use customer's business_id + competitor's place_id (not NULL). Keeps facts and joins working without special-case logic. 3. Trust-weighted facts: Clarified trust_weighted_* columns are reserved but not populated in v3.1. Trust scoring applies to issue priority only. Aggregation deferred to v3.2. Status: Production-grade architecture specification. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 12:31:16 +00:00
Alejandro Gutiérrez	d43c574b0c	Add ReviewIQ Architecture v3.1.1 specification Complete pipeline architecture for Google Reviews intelligence: - Versioned reviews_enriched with (source, review_id, version) PK - Tenant-scoped locations with (business_id, place_id) PK - Relational issue_spans replacing array aggregation - Unified fact_timeseries spine with 'ALL' sentinel for rollups - Clean competitor model (separate table, no fake business_ids) - Trust scoring and dedup support - KPI-ready join keys Reviewed and fixed: PK for edited reviews, multi-tenant overlap, param ordering bugs, fact population scope, entity field deferral. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 12:25:46 +00:00
Alejandro Gutiérrez	3da243be79	Add ReviewIQ pipeline spec and metadata extraction test - reviewiq-pipeline-v1-final.md: Earlier pipeline specification - test_metadata_extraction.py: Test script for metadata extraction Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 11:21:33 +00:00
Alejandro Gutiérrez	59368a5bd5	Add Job DevTools implementation task breakdown 18 tasks organized in 5 parallel tracks: - Track A: Backend logging infrastructure (4 tasks) - Track B: Frontend log viewer (5 tasks) - Track C: Crash analysis (4 tasks) - Track D: Session & metrics (3 tasks) - Track E: Review topics (2 tasks) Includes dependency graph and 7-wave execution plan for parallel AI agent workflow. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 11:14:02 +00:00
Alejandro Gutiérrez	65fcaf43e8	Add Job DevTools specification document Comprehensive spec for observability suite including: - Structured logging system with categories - Crash intelligence and pattern analysis - Copy/export functionality - Session fingerprint panel - Real-time metrics dashboard - Review topics inference Organized by priority (P0-P3) with parallel implementation tracks. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-24 11:10:34 +00:00
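
Commits 39c80fc8be and 2412996c54 describe API keys that are stored only as SHA-256 hashes and checked against scopes such as jobs:read, jobs:write, and admin. The sketch below illustrates that idea under stated assumptions: the function names, the "riq_" key prefix, and the set-based scope check are hypothetical and are not taken from the repository.

```python
import hashlib
import hmac
import secrets


def generate_api_key() -> str:
    """Return a new random key. The 'riq_' prefix is a hypothetical convention."""
    return "riq_" + secrets.token_urlsafe(32)


def hash_api_key(key: str) -> str:
    """Hash a key with SHA-256 so only the hex digest needs to be stored."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def is_authorized(presented_key: str, stored_hash: str,
                  granted_scopes: set, required_scope: str) -> bool:
    """Check the presented key against the stored hash, then check the scope."""
    candidate = hash_api_key(presented_key)
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(candidate, stored_hash):
        return False
    return required_scope in granted_scopes


if __name__ == "__main__":
    key = generate_api_key()
    stored = hash_api_key(key)  # this digest is what would be persisted
    print(is_authorized(key, stored, {"jobs:read"}, "jobs:read"))
    print(is_authorized(key, stored, {"jobs:read"}, "jobs:write"))
```

Because only the digest is kept, the plaintext key can be shown to the client once at creation time and never recovered afterwards.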

17 Commits
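
Commit 788ef84756 mentions traffic-weighted random selection (`_get_weighted_scraper()`) over scraper versions, and 2412996c54 introduces a `traffic_pct` column for A/B splitting. A minimal sketch of weighted selection follows. The `ScraperVersion` dataclass and `pick_weighted` function are hypothetical illustrations; the actual registry code, its caching, and its database access are not shown in this log.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ScraperVersion:
    """Hypothetical in-memory stand-in for a scraper registry row."""
    version: str
    variant: str        # 'stable', 'beta', or 'canary'
    traffic_pct: float  # share of traffic assigned to this version


def pick_weighted(candidates: List[ScraperVersion],
                  rng: Optional[random.Random] = None) -> ScraperVersion:
    """Pick one version at random, with probability proportional to traffic_pct."""
    rng = rng or random.Random()
    # Versions with no traffic allocated are never eligible.
    active = [c for c in candidates if c.traffic_pct > 0]
    if not active:
        raise ValueError("no scraper version has traffic allocated")
    weights = [c.traffic_pct for c in active]
    return rng.choices(active, weights=weights, k=1)[0]


if __name__ == "__main__":
    versions = [
        ScraperVersion("1.0.0", "stable", 90.0),
        ScraperVersion("1.1.0", "canary", 10.0),
    ]
    print(pick_weighted(versions).version)
```

Passing a seeded `random.Random` instance makes the selection reproducible in tests, while the default uses a fresh generator.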