Phase 0: Project restructure to ReviewIQ platform architecture

New structure:
- scrapers/google_reviews/v1_0_0.py (was modules/scraper_clean.py)
- scrapers/base.py (BaseScraper interface)
- scrapers/registry.py (ScraperRegistry for version routing)
- core/database.py, models.py, config.py, enums.py
- utils/logger.py, crash_analyzer.py, health_checks.py, helpers.py, date_converter.py
- workers/chrome_pool.py
- services/webhook_service.py
- api/ routes structure (empty, ready for Phase 2)
- tests/ structure mirroring source

All imports updated in:
- api_server_production.py (7 import paths updated)
- utils/health_checks.py (scraper import path)

Legacy modules moved to modules/_legacy/:
- data_storage.py, image_handler.py, s3_handler.py (unused)

Syntax verified, frontend build passing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:22:08 +00:00
parent bb0291f265
commit 544e028c3f
37 changed files with 5782 additions and 30 deletions
--- a/.artifacts/ReviewIQ-Architecture-v2.md
+++ b/.artifacts/ReviewIQ-Architecture-v2.md
--- a/.artifacts/ReviewIQ-Architecture-v3.2.md
+++ b/.artifacts/ReviewIQ-Architecture-v3.2.md
--- a/.artifacts/ReviewIQ-Architecture-v3.md
+++ b/.artifacts/ReviewIQ-Architecture-v3.md
--- a/.artifacts/ReviewIQ-v32-Decisions.md
+++ b/.artifacts/ReviewIQ-v32-Decisions.md
@@ -0,0 +1,183 @@
+# ReviewIQ v3.2 Design Decisions
+
+> Fast context-recovery document — all key decisions without the full spec.
+
+---
+
+## 1. Markpoint
+
+```
+ID:       reviewiq-v32-span-layer-2026-01-24-001
+Status:   v3.2 span layer complete
+Based on: v3.1.2 (commit f998277)
+```
+
+---
+
+## 2. Core Design Decisions
+
+| Decision | Choice | Rationale |
+|----------|--------|-----------|
+| Span granularity | Clause/topic-level | Preserves multi-domain signal |
+| span_id format | ULID (TEXT) | Survives re-segmentation |
+| Span offsets | Required (NOT NULL) | Deterministic reconstruction |
+| Offsets reference | reviews_enriched.text | Not text_normalized |
+| Span → Issue mapping | One-to-one (UNIQUE span_id) | Atomic unit per issue |
+| Primary span enforcement | Partial unique index | At most one primary per review version |
+| Primary selection | I3 > I2 > I1, then V- > V± > V0 > V+, then span_index | Deterministic, stable |
+| Reprocessing strategy | Soft-switch with is_active | No transient empty states |
+| Span overlap | GiST exclusion constraint | Non-overlapping ranges enforced |
+| Secondary codes | Array with cardinality ≤ 2 | Could normalize to link table later |
+| Causal chain storage | JSONB | Flexibility, normalize later if needed |
+| relation_type vs causal_chain | Separate concerns | relation = within-review, causal = root cause |
+| Dimension columns | Postgres ENUMs | Type safety, storage efficiency |
+| Trust score floor | 0.2 (GREATEST clamp) | Prevent multiplicative collapse |
+| Issue routing key | (business_id, place_id, urt_primary, entity_normalized) | Deterministic, entity-aware |
+| Issue ID generation | SHA256 via pgcrypto | Deterministic, collision-resistant |
+| Text validation trigger | Conditional via session setting | Performance: skip in bulk loads |
+| Relation validation | Application-level post-insert | Handles insertion order |
+
+---
+
+## 3. Extensions Required
+
+| Extension | Purpose |
+|-----------|---------|
+| `btree_gist` | Exclusion constraint for non-overlapping spans |
+| `pgcrypto` | SHA256-based issue ID generation |
+
+---
+
+## 4. New Tables
+
+| Table | Purpose |
+|-------|---------|
+| `review_spans` | Span-level URT classification |
+| `review_span_secondary_codes` | (Optional) Normalized secondary codes |
+
+---
+
+## 5. Modified Tables
+
+| Table | Changes |
+|-------|---------|
+| `issue_spans` | Added `span_id` FK (NOT NULL); the direct review FK is no longer the canonical link |
+
+---
+
+## 6. New ENUM Types
+
+**Valence & Intensity:**
+- `urt_valence` — V-, V±, V0, V+
+- `urt_intensity` — I1, I2, I3
+
+**Specificity & Actionability:**
+- `urt_specificity` — S1, S2, S3
+- `urt_actionability` — A1, A2, A3
+
+**Context & Evidence:**
+- `urt_temporal` — T1, T2, T3
+- `urt_evidence` — E1, E2, E3
+- `urt_comparative` — CR1, CR2, CR3
+
+**Classification:**
+- `urt_profile` — factual, emotional, comparative, etc.
+- `urt_confidence` — low, medium, high
+- `urt_relation` — elaborates, contrasts, causes, etc.
+- `urt_entity_type` — person, product, location, etc.
+
+---
+
+## 7. Key Functions
+
+| Function | Purpose |
+|----------|---------|
+| `urt_validate_causal_chain()` | Validates causal JSONB structure |
+| `validate_review_relations()` | Ensures `related_span_id` points to a span with the same parent |
+| `validate_active_spans()` | Ensures valid active span set |
+| `set_primary_span()` | Deterministic primary selection |
+| `generate_issue_id()` | SHA256-based issue ID |
+
+---
+
+## 8. Key Triggers
+
+| Trigger | Purpose |
+|---------|---------|
+| `review_spans_validate_bounds` | span_end ≤ text length |
+| `review_spans_validate_text` | span_text matches substring |
+| `review_spans_validate_causal_chain` | causal_chain JSONB valid |
+
+---
+
+## 9. USN Format
+
+```
+Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
+Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
+```
+
+**Examples:**
+- `URT:S:SVC.SPD:V-I3:S3A3T2.E2.CR1` — Specific service speed complaint
+- `URT:F:PRD.QUA:V+I2:S2A1T1.E3.CR2:staff→training` — Product quality praise with causal chain
+
+---
+
+## 10. Span Boundary Rules
+
+1. **Split on contrasting conjunctions** — "but", "however", "although"
+2. **Split on topic/target change** — Different entity or aspect
+3. **Split on valence change** — Positive → Negative or vice versa
+4. **Split on domain change** — SVC → PRD → AMB
+5. **Keep cause→effect together** — Causal chain stays in one span
+
+---
+
+## 11. Deferred to v3.3+
+
+| Item | Reason |
+|------|--------|
+| Entity extraction implementation | Requires NER pipeline |
+| Trust-weighted fact aggregation | Needs more span data |
+| Secondary domain enforcement | App-level validation sufficient |
+| Span-based fact counting | Currently review-based, optimize later |
+
+---
+
+## 12. Open Questions Resolved
+
+| Question | Resolution |
+|----------|------------|
+| Span → Issue cardinality? | **One-to-one** (not many-to-many) |
+| Offsets nullable for LLM-inferred? | **No** — required, NOT NULL |
+| Reprocessing strategy? | **Soft-switch** with is_active flag |
+| TEXT vs ENUM for dimensions? | **ENUMs** — committed to Postgres |
+
+---
+
+## Quick Reference
+
+### Primary Span Selection Algorithm
+
+```
+ORDER BY:
+  1. intensity DESC (I3 > I2 > I1)
+  2. valence ASC (V- > V± > V0 > V+)
+  3. span_index ASC (first wins ties)
+```
+
+### Issue Routing Key
+
+```sql
+(business_id, place_id, urt_primary, entity_normalized)
+```
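A minimal Python sketch of how a deterministic issue ID could be derived from the routing key above. The document only states that IDs are SHA256-based and produced via pgcrypto; the `|` separator, the sample field values, and the name `make_issue_id` are assumptions for illustration and not the actual `generate_issue_id()` function.

```python
import hashlib


def make_issue_id(business_id: str, place_id: str, urt_primary: str,
                  entity_normalized: str) -> str:
    """Hash the four routing-key fields into a stable hex digest (hypothetical layout)."""
    # Assumption: fields are joined with "|". A real implementation should use a
    # separator that cannot occur inside a field, or length-prefix each field.
    key = "|".join([business_id, place_id, urt_primary, entity_normalized])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


same_a = make_issue_id("biz-1", "place-9", "J1.03", "front desk")
same_b = make_issue_id("biz-1", "place-9", "J1.03", "front desk")
other = make_issue_id("biz-1", "place-9", "J1.03", "checkout")
```

Identical routing keys always map to the same ID, so repeated processing of the same review routes to the same issue, while a different entity yields a different ID.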
+
+### Trust Score Calculation
+
+```sql
+GREATEST(0.2, base_trust * modifiers)  -- Floor prevents collapse
+```
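The floor can be illustrated in Python as follows. This is a sketch that assumes `modifiers` is a product of independent factors, as the "multiplicative collapse" rationale suggests; the function name and sample values are hypothetical.

```python
from math import prod
from typing import Sequence

TRUST_FLOOR = 0.2  # floor value taken from the decision table


def trust_score(base_trust: float, modifiers: Sequence[float]) -> float:
    """Multiply the base trust by every modifier, then clamp to the floor."""
    return max(TRUST_FLOOR, base_trust * prod(modifiers))


# Three small modifiers would drive the raw product to 0.054; the floor holds it at 0.2.
collapsed = trust_score(0.9, [0.5, 0.4, 0.3])
# A mild modifier leaves the score above the floor, so it passes through unchanged.
healthy = trust_score(0.9, [1.0, 0.9])
```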
+
+---
+
+*Last updated: 2026-01-24*
--- a/.artifacts/URT-v5.1-Reference.md
+++ b/.artifacts/URT-v5.1-Reference.md
@@ -0,0 +1,331 @@
+# Universal Review Taxonomy (URT) v5.1 Reference
+
+## Overview
+
+The Universal Review Taxonomy (URT) is a classification system for customer feedback. It provides a structured approach to categorizing, annotating, and analyzing review content across any industry.
+
+### Key Characteristics
+
+- **Three Profiles**: Core, Standard, Full (increasing detail)
+- **Seven Domains**: Covering all aspects of customer experience
+- **Tier-3 Canonical Codes**: Format `X#.##` (e.g., J1.02, P2.15)
+- **Dimensional Annotation**: Valence, intensity, specificity, and more
+- **Causal Analysis**: Root cause chains (Full profile)
+
+---
+
+## Domain Codes
+
+URT organizes feedback into seven domains, each identified by a single letter.
+
+| Domain | Letter | Description |
+|--------|--------|-------------|
+| Offering | O | Product/service quality |
+| Price | P | Value, pricing, promotions |
+| Journey | J | Customer experience, timing, process |
+| Environment | E | Physical/digital space |
+| Attitude | A | Staff behavior, service attitude |
+| Voice | V | Brand, communication, marketing |
+| Relationship | R | Loyalty, trust, long-term relationship |
+
+### Tier-3 Code Format
+
+```
+Pattern: [OPJEAVR][1-4]\.[0-9]{2}
+```
+
+Examples:
+- `J1.02` - Journey domain, category 1, subcategory 02
+- `P2.15` - Price domain, category 2, subcategory 15
+- `A3.01` - Attitude domain, category 3, subcategory 01
+
+---
+
+## Dimension Codes
+
+### Valence
+
+Indicates the sentiment direction of the feedback.
+
+| Code | Meaning |
+|------|---------|
+| V+ | Positive |
+| V- | Negative |
+| V0 | Neutral |
+| V± | Mixed |
+
+### Intensity
+
+Indicates the strength of the expressed sentiment.
+
+| Code | Meaning |
+|------|---------|
+| I1 | Low intensity |
+| I2 | Moderate intensity |
+| I3 | High intensity |
+
+### Specificity (Standard+)
+
+Indicates how detailed the feedback is.
+
+| Code | Meaning |
+|------|---------|
+| S1 | Low - vague, general |
+| S2 | Medium - some detail |
+| S3 | High - specific, precise |
+
+### Actionability (Standard+)
+
+Indicates whether clear actions can be derived from the feedback.
+
+| Code | Meaning |
+|------|---------|
+| A1 | None - no clear action |
+| A2 | Unclear - possible actions |
+| A3 | Clear - specific actionable |
+
+### Temporal (Standard+)
+
+Indicates the time frame referenced in the feedback.
+
+| Code | Meaning | Markers |
+|------|---------|---------|
+| TC | Current - this visit | "today", "this time", "yesterday" |
+| TR | Recent - last few visits | "lately", "recently", "again" |
+| TH | Historical - long-standing | "for years", "always", "historically" |
+| TF | Future - expectations | "I won't come back", "next time" |
+
+**Default**: TC when no temporal language exists.
+
+### Evidence (Standard+)
+
+Indicates how the information was obtained from the text.
+
+| Code | Meaning | Example |
+|------|---------|---------|
+| ES | Stated - explicit in text | "Waited 45 minutes" |
+| EI | Inferred - logically entailed | "Took 3 weeks to reply" → slow response |
+| EC | Contextual - depends on context | "That happened again" |
+
+**Default**: ES. Use EI/EC only when needed.
+
+### Comparative (Standard+)
+
+Indicates whether the feedback compares to alternatives.
+
+| Code | Meaning |
+|------|---------|
+| CR-N | No comparison |
+| CR-B | Better than alternatives |
+| CR-W | Worse than alternatives |
+| CR-S | Same as alternatives |
+
+---
+
+## USN (URT String Notation)
+
+USN is a compact string encoding for URT annotations.
+
+### Grammar
+
+```
+Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
+Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
+```
+
+### Encoding Rules
+
+**Valence**:
+- `+` for V+
+- `-` for V-
+
+**Intensity**:
+- `1` for I1
+- `2` for I2
+- `3` for I3
+
+### Examples
+
+**Standard Profile**:
+```
+URT:S:J1.03:-2:22TC.ES.N
+```
+Decoded:
+- Profile: Standard
+- Code: J1.03
+- Valence: V- (negative)
+- Intensity: I2
+- Specificity: S2
+- Actionability: A2
+- Temporal: TC
+- Evidence: ES
+- Comparative: CR-N
+
+**Full Profile with Causal Chain**:
+```
+URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O
+```
+Decoded:
+- Profile: Full
+- Codes: J1.01, A1.04
+- Valence: V- (negative)
+- Intensity: I3
+- Specificity: S2
+- Actionability: A3
+- Temporal: TR
+- Evidence: EI
+- Comparative: CR-S
+- Causal: CD.O (Conditions-Operational), MG.O (Management-Oversight)
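The two decoded examples above can be reproduced with a small parser. This is a sketch under stated assumptions: it handles only the `+` and `-` valence symbols, since those are the only ones listed under Encoding Rules, and the function name `parse_usn` and the output field names are illustrative.

```python
import re
from typing import Dict

# One named group per USN field, following the Standard/Full grammar.
USN_PATTERN = re.compile(
    r"^URT:(?P<profile>[SF])"
    r":(?P<codes>[^:]+)"
    r":(?P<valence>[+-])(?P<intensity>[1-3])"
    r":(?P<spec>[1-3])(?P<action>[1-3])(?P<temporal>T[CRHF])"
    r"\.(?P<evidence>E[SIC])\.(?P<comparative>[NBWS])"
    r"(?::(?P<causal>.+))?$"
)


def parse_usn(usn: str) -> Dict[str, object]:
    """Decode a USN string into its dimension codes."""
    match = USN_PATTERN.match(usn)
    if match is None:
        raise ValueError(f"not a recognised USN string: {usn!r}")
    causal = match["causal"]
    return {
        "profile": {"S": "Standard", "F": "Full"}[match["profile"]],
        "codes": match["codes"].split("+"),  # multiple codes are joined with "+"
        "valence": "V" + match["valence"],
        "intensity": "I" + match["intensity"],
        "specificity": "S" + match["spec"],
        "actionability": "A" + match["action"],
        "temporal": match["temporal"],
        "evidence": match["evidence"],
        "comparative": "CR-" + match["comparative"],
        "causal": causal.split(",") if causal else [],
    }


standard = parse_usn("URT:S:J1.03:-2:22TC.ES.N")
full = parse_usn("URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O")
```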
+
+---
+
+## Causal Chain (Full Profile Only)
+
+The causal chain identifies root causes across three layers, ordered from immediate to systemic.
+
+### Layers
+
+| Layer | Codes | Scope |
+|-------|-------|-------|
+| conditions | CD-S, CD-T, CD-E, CD-F, CD-O | Staff State, Team Dynamics, Equipment, Facility, Operational |
+| management | MG-P, MG-T, MG-O, MG-R, MG-C | Planning, Training, Oversight, Resources, Communication |
+| systemic | SY-R, SY-P, SY-C, SY-S, SY-H, SY-X | Resource Decisions, Policy, Culture, Standards, Human Capital, External |
+
+### Code Reference
+
+**Conditions Layer**:
+- `CD-S` - Staff State
+- `CD-T` - Team Dynamics
+- `CD-E` - Equipment
+- `CD-F` - Facility
+- `CD-O` - Operational
+
+**Management Layer**:
+- `MG-P` - Planning
+- `MG-T` - Training
+- `MG-O` - Oversight
+- `MG-R` - Resources
+- `MG-C` - Communication
+
+**Systemic Layer**:
+- `SY-R` - Resource Decisions
+- `SY-P` - Policy
+- `SY-C` - Culture
+- `SY-S` - Standards
+- `SY-H` - Human Capital
+- `SY-X` - External
+
+### JSONB Schema
+
+```json
+[
+  {"layer": "conditions", "code": "CD-O", "evidence": "ES"},
+  {"layer": "management", "code": "MG-P", "evidence": "EI"}
+]
+```
+
+### Constraints
+
+- Maximum 3 entries (one per layer)
+- Only include when text explicitly supports it
+- Order: conditions → management → systemic
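The structural constraints above can be checked mechanically. The sketch below is an application-side illustration, not the `urt_validate_causal_chain()` database function; it assumes the `evidence` field reuses the ES/EI/EC codes from the Evidence dimension, and the function name is hypothetical.

```python
from typing import Dict, List

LAYER_CODES = {
    "conditions": {"CD-S", "CD-T", "CD-E", "CD-F", "CD-O"},
    "management": {"MG-P", "MG-T", "MG-O", "MG-R", "MG-C"},
    "systemic": {"SY-R", "SY-P", "SY-C", "SY-S", "SY-H", "SY-X"},
}
LAYER_ORDER = ["conditions", "management", "systemic"]
EVIDENCE_CODES = {"ES", "EI", "EC"}  # assumption: same codes as the Evidence dimension


def causal_chain_errors(chain: List[Dict[str, str]]) -> List[str]:
    """Return a list of constraint violations; an empty list means the chain is valid."""
    errors = []
    if len(chain) > 3:
        errors.append("more than 3 entries")
    positions = []
    for i, entry in enumerate(chain):
        layer = entry.get("layer")
        if layer not in LAYER_CODES:
            errors.append(f"entry {i}: unknown layer {layer!r}")
            continue
        if entry.get("code") not in LAYER_CODES[layer]:
            errors.append(f"entry {i}: code {entry.get('code')!r} not in layer {layer}")
        if entry.get("evidence") not in EVIDENCE_CODES:
            errors.append(f"entry {i}: unknown evidence {entry.get('evidence')!r}")
        positions.append(LAYER_ORDER.index(layer))
    # Strictly increasing positions imply both the required order and one entry per layer.
    if any(a >= b for a, b in zip(positions, positions[1:])):
        errors.append("layers must appear at most once, ordered conditions, management, systemic")
    return errors


valid = causal_chain_errors([
    {"layer": "conditions", "code": "CD-O", "evidence": "ES"},
    {"layer": "management", "code": "MG-P", "evidence": "EI"},
])
reversed_order = causal_chain_errors([
    {"layer": "management", "code": "MG-P", "evidence": "EI"},
    {"layer": "conditions", "code": "CD-O", "evidence": "ES"},
])
```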
+
+---
+
+## Span Boundary Detection Rules
+
+Spans are detected at the clause/topic level, not sentence level.
+
+### Split Rules (in priority order)
+
+1. **Split on contrasting conjunctions**: but, however, although, despite, yet
+2. **Split when subject/target changes** (topic shift)
+3. **Split when valence changes** (positive ↔ negative)
+4. **Split when domain changes** (O/P/J/E/A/V/R)
+5. **Keep together** for cause→effect within same feedback unit
+
+### Guidelines
+
+- **Maximum**: ~3 spans per sentence
+- **Validation**: If 4+ spans detected, re-check for over-splitting
+
+### Example
+
+**Input**:
+> "The food was great but the service was slow and the bathroom was dirty."
+
+**Output**: 3 spans
+1. "The food was great" (Offering, positive)
+2. "the service was slow" (Journey/Attitude, negative)
+3. "the bathroom was dirty" (Environment, negative)
+
+**Reasoning**: Both boundaries mark a topic and domain shift; the first also falls on a contrasting conjunction ("but") and a valence change.
+
+---
+
+## Primary Span Selection
+
+When a review contains multiple spans, select the primary span using these criteria in order:
+
+### Selection Priority
+
+1. **Highest intensity** (I3 > I2 > I1)
+2. **Tie-break**: Negative over positive (V- > V± > V0 > V+)
+3. **Tie-break**: Earliest span_index
+
+### Example
+
+Given spans:
+- Span 0: I2, V+
+- Span 1: I3, V+
+- Span 2: I3, V-
+
+**Primary**: Span 2 (highest intensity I3, negative valence wins tie-break)
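The selection priority maps directly onto a sort key. The following Python sketch reproduces the example above; the dictionary layout and the function name `select_primary` are assumptions for illustration.

```python
from typing import Dict, List

# Lower rank wins: higher intensity first, then more negative valence.
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary(spans: List[Dict]) -> Dict:
    """Pick the primary span by intensity, then valence, then earliest span_index."""
    return min(
        spans,
        key=lambda s: (
            INTENSITY_RANK[s["intensity"]],
            VALENCE_RANK[s["valence"]],
            s["span_index"],
        ),
    )


spans = [
    {"span_index": 0, "intensity": "I2", "valence": "V+"},
    {"span_index": 1, "intensity": "I3", "valence": "V+"},
    {"span_index": 2, "intensity": "I3", "valence": "V-"},
]
primary = select_primary(spans)  # span_index 2, matching the example
```

Because every component of the key is totally ordered and `span_index` is unique within a review, the result does not depend on the input order.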
+
+---
+
+## Secondary Codes Rules
+
+Secondary codes capture additional topics mentioned in a span.
+
+### Constraints
+
+- **Maximum**: 2 secondary codes
+- **Format**: Must be Tier-3 (X#.##)
+- **Recommendation**: Should be different domain from primary
+
+### Example
+
+Primary: `J1.03` (Journey)
+Secondary: `A2.01`, `E1.05` (Attitude, Environment)
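A short Python sketch of these rules, reusing the Tier-3 pattern from the Domain Codes section. It treats the count and format rules as errors and the different-domain rule as a warning, since the text calls it a recommendation; the function name and return shape are hypothetical.

```python
import re
from typing import List, Tuple

TIER3 = re.compile(r"^[OPJEAVR][1-4]\.[0-9]{2}$")


def check_secondary_codes(primary: str, secondary: List[str]) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a span's secondary codes."""
    errors, warnings = [], []
    if len(secondary) > 2:
        errors.append("more than 2 secondary codes")
    for code in secondary:
        if not TIER3.match(code):
            errors.append(f"{code!r} is not a Tier-3 code")
        elif code[0] == primary[0]:
            # Same domain letter as the primary code: allowed, but discouraged.
            warnings.append(f"{code!r} shares the primary domain")
    return errors, warnings


ok = check_secondary_codes("J1.03", ["A2.01", "E1.05"])
bad = check_secondary_codes("J1.03", ["J2.01", "X9.99", "A1.01"])
```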
+
+---
+
+## Quick Reference Card
+
+### Profiles
+
+| Profile | Dimensions | Causal Chain |
+|---------|------------|--------------|
+| Core | V, I | No |
+| Standard | V, I, S, A, T, E, CR | No |
+| Full | V, I, S, A, T, E, CR | Yes |
+
+### USN Quick Format
+
+```
+URT:{S|F}:{tier3_codes}:{valence}{intensity}:{SAT}.{E}.{CR}[:{causal}]
+```
+
+### Domain Letters
+
+```
+O P J E A V R
+│ │ │ │ │ │ └─ Relationship
+│ │ │ │ │ └─── Voice
+│ │ │ │ └───── Attitude
+│ │ │ └─────── Environment
+│ │ └───────── Journey
+│ └─────────── Price
+└───────────── Offering
+```
--- a/api/__init__.py
+++ b/api/__init__.py
--- a/api/middleware/__init__.py
+++ b/api/middleware/__init__.py
--- a/api/routes/__init__.py
+++ b/api/routes/__init__.py
--- a/api_server_production.py
+++ b/api_server_production.py
@@ -20,13 +20,13 @@ from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel, HttpUrl, Field
 from fastapi.responses import JSONResponse, StreamingResponse

-from modules.database import DatabaseManager, JobStatus
-from modules.webhooks import WebhookDispatcher, WebhookManager
-from modules.health_checks import HealthCheckSystem
-from modules.scraper_clean import fast_scrape_reviews, LogCapture, get_business_card_info  # Clean scraper
-from modules.crash_analyzer import analyze_crash, summarize_crash_patterns, apply_auto_fix
-from modules.structured_logger import StructuredLogger, LogEntry
-from modules.chrome_pool import (
+from core.database import DatabaseManager, JobStatus
+from services.webhook_service import WebhookDispatcher, WebhookManager
+from utils.health_checks import HealthCheckSystem
+from scrapers.google_reviews.v1_0_0 import fast_scrape_reviews, LogCapture, get_business_card_info  # Clean scraper
+from utils.crash_analyzer import analyze_crash, summarize_crash_patterns, apply_auto_fix
+from utils.logger import StructuredLogger, LogEntry
+from workers.chrome_pool import (
    start_worker_pools,
    stop_worker_pools,
    get_validation_worker,
--- a/core/__init__.py
+++ b/core/__init__.py
--- a/modules/config.py
+++ b/modules/config.py
--- a/modules/database.py
+++ b/modules/database.py
@@ -8,22 +8,13 @@ import json
 from datetime import datetime
 from typing import Optional, List, Dict, Any
 from uuid import UUID, uuid4
-from enum import Enum
 import logging

+from core.enums import JobStatus
+
 log = logging.getLogger(__name__)


-class JobStatus(str, Enum):
-    """Job status enumeration"""
-    PENDING = "pending"
-    RUNNING = "running"
-    COMPLETED = "completed"
-    FAILED = "failed"
-    CANCELLED = "cancelled"
-    PARTIAL = "partial"  # Job crashed but has partial reviews saved
-
-
 class DatabaseManager:
    """PostgreSQL database manager with connection pooling"""

--- a/core/enums.py
+++ b/core/enums.py
@@ -0,0 +1,14 @@
+"""
+Enumerations for the ReviewIQ project.
+"""
+from enum import Enum
+
+
+class JobStatus(str, Enum):
+    """Job status enumeration"""
+    PENDING = "pending"
+    RUNNING = "running"
+    COMPLETED = "completed"
+    FAILED = "failed"
+    CANCELLED = "cancelled"
+    PARTIAL = "partial"  # Job crashed but has partial reviews saved
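Because `JobStatus` mixes in `str`, its members compare equal to their raw values and serialize as plain strings, which is presumably why existing callers keep working after the move to `core/enums.py`. The sketch below repeats the enum definition only so that it runs on its own.

```python
import json
from enum import Enum


class JobStatus(str, Enum):
    """Copy of the enum above, repeated so the example is self-contained."""
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
    PARTIAL = "partial"


# A raw value (for example, one read from a database column) maps back to the member.
status = JobStatus("partial")

# Members compare equal to their string values.
matches_raw = JobStatus.PENDING == "pending"

# The str mixin lets the standard json module serialize members directly.
payload = json.dumps({"status": JobStatus.RUNNING})
```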
--- a/modules/models.py
+++ b/modules/models.py
@@ -6,7 +6,7 @@ from dataclasses import dataclass, field

 from selenium.webdriver.remote.webelement import WebElement

-from modules.utils import (try_find, first_text, first_attr, safe_int, detect_lang, parse_date_to_iso)
+from utils.helpers import (try_find, first_text, first_attr, safe_int, detect_lang, parse_date_to_iso)


@dataclass
--- a/modules/_legacy/data_storage.py
+++ b/modules/_legacy/data_storage.py
--- a/modules/_legacy/image_handler.py
+++ b/modules/_legacy/image_handler.py
--- a/modules/_legacy/s3_handler.py
+++ b/modules/_legacy/s3_handler.py
--- a/scrapers/__init__.py
+++ b/scrapers/__init__.py
@@ -0,0 +1,10 @@
+"""
+Scrapers Package
+
+This package contains all scraper implementations for the ReviewIQ system.
+"""
+
+from scrapers.base import BaseScraper
+from scrapers.registry import ScraperRegistry, registry
+
+__all__ = ["BaseScraper", "ScraperRegistry", "registry"]
--- a/scrapers/base.py
+++ b/scrapers/base.py
@@ -0,0 +1,97 @@
+"""
+Base Scraper Interface
+
+This module defines the abstract base class that all scrapers must implement.
+It ensures consistent interface across different scraper implementations.
+"""
+
+from abc import ABC, abstractmethod
+from typing import Any, Callable, Dict, List, Optional
+
+
+class BaseScraper(ABC):
+    """
+    Abstract base class for all scrapers in the ReviewIQ system.
+
+    All concrete scraper implementations must inherit from this class
+    and implement the required abstract methods.
+    """
+
+    @abstractmethod
+    def scrape(
+        self,
+        driver: Any,
+        url: str,
+        max_reviews: int = 5000,
+        timeout_no_new: int = 15,
+        flush_callback: Optional[Callable[[List[Dict]], None]] = None,
+        flush_batch_size: int = 500,
+        progress_callback: Optional[Callable[[int, Optional[int]], None]] = None,
+        validation_only: bool = False
+    ) -> Dict[str, Any]:
+        """
+        Scrape reviews from the given URL.
+
+        Args:
+            driver: WebDriver instance (e.g., Selenium WebDriver)
+            url: The URL to scrape reviews from
+            max_reviews: Maximum number of reviews to collect
+            timeout_no_new: Seconds to wait with no new reviews before stopping
+            flush_callback: Optional callback called with reviews batches for streaming
+            flush_batch_size: Number of reviews before triggering flush_callback
+            progress_callback: Optional callback(current_count, total_count) for progress
+            validation_only: If True, return early after extracting metadata only
+
+        Returns:
+            Dictionary containing:
+                - reviews: List of review dictionaries
+                - total: Total number of reviews collected
+                - error: Error message if any, None otherwise
+                - Additional scraper-specific metadata
+        """
+        pass
+
+    @abstractmethod
+    def validate_url(self, url: str) -> bool:
+        """
+        Validate if the given URL is supported by this scraper.
+
+        Args:
+            url: The URL to validate
+
+        Returns:
+            True if the URL is valid for this scraper, False otherwise
+        """
+        pass
+
+    @abstractmethod
+    def get_business_info(self, driver: Any, url: str) -> Dict[str, Any]:
+        """
+        Extract business information from the URL without scraping reviews.
+
+        Args:
+            driver: WebDriver instance
+            url: The URL to extract info from
+
+        Returns:
+            Dictionary containing business metadata (name, rating, address, etc.)
+        """
+        pass
+
+    @property
+    @abstractmethod
+    def name(self) -> str:
+        """Return the human-readable name of this scraper."""
+        pass
+
+    @property
+    @abstractmethod
+    def version(self) -> str:
+        """Return the version string of this scraper."""
+        pass
+
+    @property
+    @abstractmethod
+    def supported_domains(self) -> List[str]:
+        """Return list of domains this scraper supports."""
+        pass
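A minimal sketch of what a concrete scraper might look like against this interface. To keep the example runnable on its own, it declares a trimmed stand-in for `BaseScraper` (without `scrape` and `get_business_info`); `ExampleScraper`, its name, and its domain are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List
from urllib.parse import urlparse


class BaseScraper(ABC):
    """Trimmed stand-in for scrapers.base.BaseScraper, for illustration only."""

    @abstractmethod
    def validate_url(self, url: str) -> bool: ...

    @property
    @abstractmethod
    def name(self) -> str: ...

    @property
    @abstractmethod
    def version(self) -> str: ...

    @property
    @abstractmethod
    def supported_domains(self) -> List[str]: ...


class ExampleScraper(BaseScraper):
    """Hypothetical scraper; every value here is illustrative."""

    @property
    def name(self) -> str:
        return "example"

    @property
    def version(self) -> str:
        return "1.0.0"

    @property
    def supported_domains(self) -> List[str]:
        return ["example.com"]

    def validate_url(self, url: str) -> bool:
        # Compare the host, minus any "www." prefix, against the supported domains.
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        return host in self.supported_domains


scraper = ExampleScraper()
accepted = scraper.validate_url("https://www.example.com/reviews/1")
rejected = scraper.validate_url("https://other.org/reviews/1")
```

The properties return constants rather than state set in `__init__`. That matters for `ScraperRegistry`, which reads `name`, `version`, and `supported_domains` from an instance created with `__new__`, so `__init__` never runs.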
--- a/scrapers/google_reviews/__init__.py
+++ b/scrapers/google_reviews/__init__.py
@@ -0,0 +1,21 @@
+"""
+Google Reviews Scraper Package
+
+This package contains the Google Reviews scraper implementations.
+"""
+
+from scrapers.google_reviews.v1_0_0 import (
+    scrape_reviews,
+    fast_scrape_reviews,
+    get_business_card_info,
+    extract_about_info,
+    LogCapture,
+)
+
+__all__ = [
+    "scrape_reviews",
+    "fast_scrape_reviews",
+    "get_business_card_info",
+    "extract_about_info",
+    "LogCapture",
+]
--- a/scrapers/google_reviews/v1_0_0.py
+++ b/scrapers/google_reviews/v1_0_0.py
@@ -1,7 +1,12 @@
 """
-Clean Google Maps Reviews Scraper
+Google Reviews Scraper v1.0.0
+
+This module provides the core Google Maps reviews scraping functionality.
 - Simple down scrolling
 - DOM scraping + API interception
+
+Version: 1.0.0
+Migrated from: modules/scraper_clean.py
 """

 import re
@@ -12,7 +17,7 @@ from datetime import datetime
 from typing import List, Optional
 from selenium.webdriver.common.by import By

-from modules.structured_logger import StructuredLogger
+from utils.logger import StructuredLogger

 def get_chrome_memory(driver) -> Optional[int]:
    """Get Chrome memory usage in MB using CDP."""
--- a/scrapers/registry.py
+++ b/scrapers/registry.py
@@ -0,0 +1,138 @@
+"""
+Scraper Registry
+
+This module provides a registry for managing and discovering scrapers.
+It allows dynamic registration and lookup of scraper implementations.
+"""
+
+from typing import Dict, List, Optional, Type
+
+from scrapers.base import BaseScraper
+
+
+class ScraperRegistry:
+    """
+    Registry for managing scraper implementations.
+
+    The registry allows:
+    - Registering scrapers by name and version
+    - Looking up scrapers by domain or name
+    - Listing all available scrapers
+
+    Usage:
+        registry = ScraperRegistry()
+        registry.register(GoogleReviewsScraper)
+        scraper = registry.get_scraper_for_url("https://google.com/maps/place/...")
+    """
+
+    _instance: Optional["ScraperRegistry"] = None
+    _scrapers: Dict[str, Type[BaseScraper]]
+    _domain_map: Dict[str, List[str]]
+
+    def __new__(cls) -> "ScraperRegistry":
+        """Singleton pattern to ensure one global registry."""
+        if cls._instance is None:
+            cls._instance = super().__new__(cls)
+            cls._instance._scrapers = {}
+            cls._instance._domain_map = {}
+        return cls._instance
+
+    def register(self, scraper_class: Type[BaseScraper], name: Optional[str] = None) -> None:
+        """
+        Register a scraper class with the registry.
+
+        Args:
+            scraper_class: The scraper class to register (must inherit from BaseScraper)
+            name: Optional name override, defaults to scraper_class.name property
+        """
+        # Create a temporary instance to get properties
+        # Note: In production, we might want scraper_class to have class-level properties
+        instance = scraper_class.__new__(scraper_class)
+
+        scraper_name = name or instance.name
+        scraper_version = instance.version
+        key = f"{scraper_name}:{scraper_version}"
+
+        self._scrapers[key] = scraper_class
+
+        # Map domains to this scraper
+        for domain in instance.supported_domains:
+            if domain not in self._domain_map:
+                self._domain_map[domain] = []
+            self._domain_map[domain].append(key)
+
+    def get_scraper(self, name: str, version: Optional[str] = None) -> Optional[Type[BaseScraper]]:
+        """
+        Get a scraper class by name and optional version.
+
+        Args:
+            name: The scraper name
+            version: Optional version string. If not provided, returns the latest.
+
+        Returns:
+            The scraper class, or None if not found
+        """
+        if version:
+            key = f"{name}:{version}"
+            return self._scrapers.get(key)
+
+        # Find latest version for this name
+        matching = [k for k in self._scrapers.keys() if k.startswith(f"{name}:")]
+        if not matching:
+            return None
+
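+        # Return the latest by numeric version ("1.10.0" > "1.9.0"; a string sort gets this wrong)
+        def version_key(key: str) -> tuple:
+            return tuple(int(p) if p.isdigit() else -1 for p in key.rsplit(":", 1)[1].split("."))
+
+        return self._scrapers.get(max(matching, key=version_key))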
+
+    def get_scraper_for_url(self, url: str) -> Optional[Type[BaseScraper]]:
+        """
+        Find a suitable scraper for the given URL.
+
+        Args:
+            url: The URL to find a scraper for
+
+        Returns:
+            The scraper class that can handle this URL, or None if no match
+        """
+        from urllib.parse import urlparse
+
+        parsed = urlparse(url)
+        domain = parsed.netloc.lower()
+
+        # Remove www. prefix for matching
+        if domain.startswith("www."):
+            domain = domain[4:]
+
+        scraper_keys = self._domain_map.get(domain, [])
+        if not scraper_keys:
+            return None
+
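+        # Return the latest by numeric version, without reordering the stored domain list
+        def version_key(key: str) -> tuple:
+            return tuple(int(p) if p.isdigit() else -1 for p in key.rsplit(":", 1)[1].split("."))
+
+        return self._scrapers.get(max(scraper_keys, key=version_key))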
+
+    def list_scrapers(self) -> List[Dict[str, object]]:
+        """
+        List all registered scrapers.
+
+        Returns:
+            List of dictionaries with scraper info (name, version, domains)
+        """
+        result = []
+        for key, scraper_class in self._scrapers.items():
+            instance = scraper_class.__new__(scraper_class)
+            result.append({
+                "name": instance.name,
+                "version": instance.version,
+                "domains": instance.supported_domains
+            })
+        return result
+
+    def clear(self) -> None:
+        """Clear all registered scrapers. Useful for testing."""
+        self._scrapers.clear()
+        self._domain_map.clear()
+
+
+# Global registry instance
+registry = ScraperRegistry()
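Registry keys have the form `name:version`, and "latest" depends on how those keys are ordered. The sketch below shows that comparing keys as text and comparing the version as a tuple of integers can disagree. The scraper name `google_reviews` and the helper `version_tuple` are assumptions for illustration, and the helper assumes purely numeric dotted versions.

```python
from typing import Tuple


def version_tuple(key: str) -> Tuple[int, ...]:
    """Turn 'name:1.10.0' into (1, 10, 0)."""
    version = key.rsplit(":", 1)[1]
    return tuple(int(part) for part in version.split("."))


keys = ["google_reviews:1.9.0", "google_reviews:1.10.0"]

# Text comparison looks at characters: "9" sorts after "1", so 1.9.0 ranks highest.
latest_as_text = sorted(keys, reverse=True)[0]

# Integer comparison treats 10 as greater than 9, so 1.10.0 ranks highest.
latest_numeric = max(keys, key=version_tuple)
```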
--- a/services/__init__.py
+++ b/services/__init__.py
--- a/services/webhook_service.py
+++ b/services/webhook_service.py
--- a/tests/api/__init__.py
+++ b/tests/api/__init__.py
--- a/tests/integration/__init__.py
+++ b/tests/integration/__init__.py
--- a/tests/scrapers/__init__.py
+++ b/tests/scrapers/__init__.py
--- a/tests/scrapers/google_reviews/__init__.py
+++ b/tests/scrapers/google_reviews/__init__.py
--- a/tests/services/__init__.py
+++ b/tests/services/__init__.py
--- a/utils/__init__.py
+++ b/utils/__init__.py
--- a/modules/crash_analyzer.py
+++ b/modules/crash_analyzer.py
--- a/modules/date_converter.py
+++ b/modules/date_converter.py
--- a/modules/health_checks.py
+++ b/modules/health_checks.py
@@ -67,7 +67,7 @@ class CanaryMonitor:
                # Alert if multiple consecutive failures
                if self.consecutive_failures >= 3:
                    await self.send_alert(
-                        f"🚨 CRITICAL: Scraper canary failed {self.consecutive_failures} times in a row! "
+                        f"CRITICAL: Scraper canary failed {self.consecutive_failures} times in a row! "
                        f"Last error: {str(e)[:200]}"
                    )

@@ -90,7 +90,7 @@ class CanaryMonitor:
        - Scrape time is reasonable
        - Data structure is valid
        """
-        from modules.scraper_clean import fast_scrape_reviews
+        from scrapers.google_reviews.v1_0_0 import fast_scrape_reviews

        log.info(f"Running canary scrape test on {self.test_url[:60]}...")
        self.last_run = datetime.now()
@@ -121,7 +121,7 @@ class CanaryMonitor:
            if all_passed:
                # Success!
                log.info(
-                    f"✅ Canary test PASSED: {result['count']} reviews in {result['time']:.1f}s"
+                    f"Canary test PASSED: {result['count']} reviews in {result['time']:.1f}s"
                )
                self.consecutive_failures = 0
                self.last_success = datetime.now()
@@ -144,7 +144,7 @@ class CanaryMonitor:
                # Validation failed
                failed_checks = [k for k, v in checks.items() if not v]
                log.error(
-                    f"❌ Canary test FAILED: validation failed on {failed_checks}"
+                    f"Canary test FAILED: validation failed on {failed_checks}"
                )
                self.consecutive_failures += 1
                self.last_result = {
@@ -167,12 +167,12 @@ class CanaryMonitor:
                # Alert on failure
                if self.consecutive_failures >= 3:
                    await self.send_alert(
-                        f"🚨 CRITICAL: Canary validation failed {self.consecutive_failures} times! "
+                        f"CRITICAL: Canary validation failed {self.consecutive_failures} times! "
                        f"Failed checks: {failed_checks}"
                    )

        except asyncio.TimeoutError:
-            log.error("❌ Canary test TIMEOUT (>60s)")
+            log.error("Canary test TIMEOUT (>60s)")
            self.consecutive_failures += 1
            self.last_result = {
                "status": "timeout",
@@ -186,11 +186,11 @@ class CanaryMonitor:

            if self.consecutive_failures >= 3:
                await self.send_alert(
-                    f"🚨 CRITICAL: Canary timeout {self.consecutive_failures} times!"
+                    f"CRITICAL: Canary timeout {self.consecutive_failures} times!"
                )

        except Exception as e:
-            log.error(f"❌ Canary test ERROR: {e}")
+            log.error(f"Canary test ERROR: {e}")
            self.consecutive_failures += 1
            self.last_result = {
                "status": "error",
--- a/utils/helpers.py
+++ b/utils/helpers.py
--- a/modules/structured_logger.py
+++ b/modules/structured_logger.py
--- a/workers/__init__.py
+++ b/workers/__init__.py
--- a/workers/chrome_pool.py
+++ b/workers/chrome_pool.py