Phase 0: Project restructure to ReviewIQ platform architecture

New structure:
- scrapers/google_reviews/v1_0_0.py (was modules/scraper_clean.py)
- scrapers/base.py (BaseScraper interface)
- scrapers/registry.py (ScraperRegistry for version routing)
- core/database.py, models.py, config.py, enums.py
- utils/logger.py, crash_analyzer.py, health_checks.py, helpers.py, date_converter.py
- workers/chrome_pool.py
- services/webhook_service.py
- api/ routes structure (empty, ready for Phase 2)
- tests/ structure mirroring source

All imports updated in:
- api_server_production.py (7 import paths updated)
- utils/health_checks.py (scraper import path)

Legacy modules moved to modules/_legacy/:
- data_storage.py, image_handler.py, s3_handler.py (unused)

Syntax verified, frontend build passing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:22:08 +00:00
parent bb0291f265
commit 544e028c3f
37 changed files with 5782 additions and 30 deletions
--- a/.artifacts/ReviewIQ-Architecture-v2.md
+++ b/.artifacts/ReviewIQ-Architecture-v2.md
--- a/.artifacts/ReviewIQ-Architecture-v3.2.md
+++ b/.artifacts/ReviewIQ-Architecture-v3.2.md
--- a/.artifacts/ReviewIQ-Architecture-v3.md
+++ b/.artifacts/ReviewIQ-Architecture-v3.md
--- a/.artifacts/ReviewIQ-v32-Decisions.md
+++ b/.artifacts/ReviewIQ-v32-Decisions.md
@@ -0,0 +1,183 @@
 # ReviewIQ v3.2 Design Decisions
 > Fast context-recovery document — all key decisions without the full spec.
 ---
 ## 1. Markpoint
 ```
 ID:       reviewiq-v32-span-layer-2026-01-24-001
 Status:   v3.2 span layer complete
 Based on: v3.1.2 (commit f998277)
 ```
 ---
 ## 2. Core Design Decisions
 | Decision | Choice | Rationale |
 |----------|--------|-----------|
 | Span granularity | Clause/topic-level | Preserves multi-domain signal |
 | span_id format | ULID (TEXT) | Survives re-segmentation |
 | Span offsets | Required (NOT NULL) | Deterministic reconstruction |
 | Offsets reference | reviews_enriched.text | Not text_normalized |
 | Span → Issue mapping | One-to-one (UNIQUE span_id) | Atomic unit per issue |
 | Primary span enforcement | Partial unique index | Exactly one per review version |
 | Primary selection | I3 > I2 > I1, then V- > V± > V0 > V+, then span_index | Deterministic, stable |
 | Reprocessing strategy | Soft-switch with is_active | No transient empty states |
 | Span overlap | GiST exclusion constraint | Non-overlapping ranges enforced |
 | Secondary codes | Array with cardinality ≤ 2 | Could normalize to link table later |
 | Causal chain storage | JSONB | Flexibility, normalize later if needed |
 | relation_type vs causal_chain | Separate concerns | relation = within-review, causal = root cause |
 | Dimension columns | Postgres ENUMs | Type safety, storage efficiency |
 | Trust score floor | 0.2 (GREATEST clamp) | Prevent multiplicative collapse |
 | Issue routing key | (business_id, place_id, urt_primary, entity_normalized) | Deterministic, entity-aware |
 | Issue ID generation | SHA256 via pgcrypto | Deterministic, collision-resistant |
 | Text validation trigger | Conditional via session setting | Performance: skip in bulk loads |
 | Relation validation | Application-level post-insert | Handles insertion order |
 ---
 ## 3. Extensions Required
 | Extension | Purpose |
 |-----------|---------|
 | `btree_gist` | Exclusion constraint for non-overlapping spans |
 | `pgcrypto` | SHA256-based issue ID generation |
 ---
 ## 4. New Tables
 | Table | Purpose |
 |-------|---------|
 | `review_spans` | Span-level URT classification |
 | `review_span_secondary_codes` | (Optional) Normalized secondary codes |
 ---
 ## 5. Modified Tables
 | Table | Changes |
 |-------|---------|
 | `issue_spans` | Added `span_id` FK (NOT NULL), removed direct review FK as canonical |
 ---
 ## 6. New ENUM Types
 **Valence & Intensity:**
 - `urt_valence` — V-, V±, V0, V+
 - `urt_intensity` — I1, I2, I3
 **Specificity & Actionability:**
 - `urt_specificity` — S1, S2, S3
 - `urt_actionability` — A1, A2, A3
 **Context & Evidence:**
 - `urt_temporal` — T1, T2, T3
 - `urt_evidence` — E1, E2, E3
 - `urt_comparative` — CR1, CR2, CR3
 **Classification:**
 - `urt_profile` — factual, emotional, comparative, etc.
 - `urt_confidence` — low, medium, high
 - `urt_relation` — elaborates, contrasts, causes, etc.
 - `urt_entity_type` — person, product, location, etc.
 ---
 ## 7. Key Functions
 | Function | Purpose |
 |----------|---------|
 | `urt_validate_causal_chain()` | Validates causal JSONB structure |
 | `validate_review_relations()` | Ensures related_span_id same-parent |
 | `validate_active_spans()` | Ensures valid active span set |
 | `set_primary_span()` | Deterministic primary selection |
 | `generate_issue_id()` | SHA256-based issue ID |
 ---
 ## 8. Key Triggers
 | Trigger | Purpose |
 |---------|---------|
 | `review_spans_validate_bounds` | span_end ≤ text length |
 | `review_spans_validate_text` | span_text matches substring |
 | `review_spans_validate_causal_chain` | causal_chain JSONB valid |
 ---
 ## 9. USN Format
 ```
 Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
 Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
 ```
 **Examples:**
 - `URT:S:SVC.SPD:V-I3:S3A3T2.E2.CR1` — Specific service speed complaint
 - `URT:F:PRD.QUA:V+I2:S2A1T1.E3.CR2:staff→training` — Product quality praise with causal chain
 ---
 ## 10. Span Boundary Rules
 1. **Split on contrasting conjunctions** — "but", "however", "although"
 2. **Split on topic/target change** — Different entity or aspect
 3. **Split on valence change** — Positive → Negative or vice versa
 4. **Split on domain change** — SVC → PRD → AMB
 5. **Keep cause→effect together** — Causal chain stays in one span
 ---
 ## 11. Deferred to v3.3+
 | Item | Reason |
 |------|--------|
 | Entity extraction implementation | Requires NER pipeline |
 | Trust-weighted fact aggregation | Needs more span data |
 | Secondary domain enforcement | App-level validation sufficient |
 | Span-based fact counting | Currently review-based, optimize later |
 ---
 ## 12. Open Questions Resolved
 | Question | Resolution |
 |----------|------------|
 | Span → Issue cardinality? | **One-to-one** (not many-to-many) |
 | Offsets nullable for LLM-inferred? | **No** — required, NOT NULL |
 | Reprocessing strategy? | **Soft-switch** with is_active flag |
 | TEXT vs ENUM for dimensions? | **ENUMs** — committed to Postgres |
 ---
 ## Quick Reference
 ### Primary Span Selection Algorithm
 ```
 ORDER BY:
  1. intensity DESC (I3 > I2 > I1)
  2. valence ASC (V- > V± > V0 > V+)
  3. span_index ASC (first wins ties)
 ```
 ### Issue Routing Key
 ```sql
 (business_id, place_id, urt_primary, entity_normalized)
 ```
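
A minimal Python sketch of how a deterministic issue ID could be derived from this routing key. The function name `issue_id_sketch`, the unit-separator delimiter, and the empty-string fallback for a missing entity are assumptions made for illustration; the real `generate_issue_id()` runs in Postgres via pgcrypto and may serialize the key differently.

```python
import hashlib
from typing import Optional


def issue_id_sketch(
    business_id: str,
    place_id: str,
    urt_primary: str,
    entity_normalized: Optional[str],
) -> str:
    """Hash the four routing-key fields into a stable hex ID (illustrative only)."""
    # Assumption: join with an unlikely delimiter so ("ab", "c") and ("a", "bc") differ.
    fields = [business_id, place_id, urt_primary, entity_normalized or ""]
    payload = "\x1f".join(fields).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = issue_id_sketch("biz-1", "place-9", "J1.03", "front desk")
again = issue_id_sketch("biz-1", "place-9", "J1.03", "front desk")
other = issue_id_sketch("biz-1", "place-9", "J1.03", "parking")
```

The same key always maps to the same ID, so reprocessing a review routes its spans to the existing issue rather than creating a new one.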
 ### Trust Score Calculation
 ```sql
 GREATEST(0.2, base_trust * modifiers)  -- Floor prevents collapse
 ```
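
The effect of the floor can be illustrated with a short Python sketch. The function name and the representation of `modifiers` as a sequence multiplied together are assumptions; the source only gives the `GREATEST(0.2, base_trust * modifiers)` expression.

```python
from math import prod
from typing import Sequence

TRUST_FLOOR = 0.2  # floor value taken from the decision table


def trust_score_sketch(base_trust: float, modifiers: Sequence[float]) -> float:
    """Multiply the base trust by every modifier, then clamp at the floor."""
    # Assumption: modifiers combine multiplicatively, as the SQL expression suggests.
    return max(TRUST_FLOOR, base_trust * prod(modifiers))


collapsed = trust_score_sketch(0.9, [0.5, 0.5, 0.5])  # raw product is 0.1125
untouched = trust_score_sketch(0.9, [1.0])
```

Without the clamp, three penalties of 0.5 would push a 0.9 base score down to 0.1125; with it, the score stops at 0.2.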
 ---
 *Last updated: 2026-01-24*
--- a/.artifacts/URT-v5.1-Reference.md
+++ b/.artifacts/URT-v5.1-Reference.md
@@ -0,0 +1,331 @@
 # Universal Review Taxonomy (URT) v5.1 Reference
 ## Overview
 The Universal Review Taxonomy (URT) is a classification system for customer feedback. It provides a structured approach to categorizing, annotating, and analyzing review content across any industry.
 ### Key Characteristics
 - **Three Profiles**: Core, Standard, Full (increasing detail)
 - **Seven Domains**: Covering all aspects of customer experience
 - **Tier-3 Canonical Codes**: Format `X#.##` (e.g., J1.02, P2.15)
 - **Dimensional Annotation**: Valence, intensity, specificity, and more
 - **Causal Analysis**: Root cause chains (Full profile)
 ---
 ## Domain Codes
 URT organizes feedback into seven domains, each identified by a single letter.
 | Domain | Letter | Description |
 |--------|--------|-------------|
 | Offering | O | Product/service quality |
 | Price | P | Value, pricing, promotions |
 | Journey | J | Customer experience, timing, process |
 | Environment | E | Physical/digital space |
 | Attitude | A | Staff behavior, service attitude |
 | Voice | V | Brand, communication, marketing |
 | Relationship | R | Loyalty, trust, long-term relationship |
 ### Tier-3 Code Format
 ```
 Pattern: [OPJEAVR][1-4]\.[0-9]{2}
 ```
 Examples:
 - `J1.02` - Journey domain, category 1, subcategory 02
 - `P2.15` - Price domain, category 2, subcategory 15
 - `A3.01` - Attitude domain, category 3, subcategory 01
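
As a sketch, the pattern above can be applied in Python to validate a code and look up its domain. The helper names are hypothetical; the regular expression and the domain letters come from this reference.

```python
import re

# Pattern copied from the Tier-3 Code Format section.
TIER3_RE = re.compile(r"[OPJEAVR][1-4]\.[0-9]{2}")

DOMAINS = {
    "O": "Offering", "P": "Price", "J": "Journey", "E": "Environment",
    "A": "Attitude", "V": "Voice", "R": "Relationship",
}


def is_tier3_code(code: str) -> bool:
    """Return True only if the whole string is a Tier-3 code."""
    return TIER3_RE.fullmatch(code) is not None


def domain_of(code: str) -> str:
    """Return the domain name for a valid Tier-3 code."""
    if not is_tier3_code(code):
        raise ValueError(f"not a Tier-3 code: {code!r}")
    return DOMAINS[code[0]]
```

`fullmatch` is used instead of `match` so that trailing characters, as in `J1.021`, are rejected.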
 ---
 ## Dimension Codes
 ### Valence
 Indicates the sentiment direction of the feedback.
 | Code | Meaning |
 |------|---------|
 | V+ | Positive |
 | V- | Negative |
 | V0 | Neutral |
 | V± | Mixed |
 ### Intensity
 Indicates the strength of the expressed sentiment.
 | Code | Meaning |
 |------|---------|
 | I1 | Low intensity |
 | I2 | Moderate intensity |
 | I3 | High intensity |
 ### Specificity (Standard+)
 Indicates how detailed the feedback is.
 | Code | Meaning |
 |------|---------|
 | S1 | Low - vague, general |
 | S2 | Medium - some detail |
 | S3 | High - specific, precise |
 ### Actionability (Standard+)
 Indicates whether clear actions can be derived from the feedback.
 | Code | Meaning |
 |------|---------|
 | A1 | None - no clear action |
 | A2 | Unclear - possible actions |
 | A3 | Clear - specific actionable |
 ### Temporal (Standard+)
 Indicates the time frame referenced in the feedback.
 | Code | Meaning | Markers |
 |------|---------|---------|
 | TC | Current - this visit | "today", "this time", "yesterday" |
 | TR | Recent - last few visits | "lately", "recently", "again" |
 | TH | Historical - long-standing | "for years", "always", "historically" |
 | TF | Future - expectations | "I won't come back", "next time" |
 **Default**: TC when no temporal language exists.
 ### Evidence (Standard+)
 Indicates how the information was obtained from the text.
 | Code | Meaning | Example |
 |------|---------|---------|
 | ES | Stated - explicit in text | "Waited 45 minutes" |
 | EI | Inferred - logically entailed | "Took 3 weeks to reply" → slow response |
 | EC | Contextual - depends on context | "That happened again" |
 **Default**: ES. Use EI/EC only when needed.
 ### Comparative (Standard+)
 Indicates whether the feedback compares to alternatives.
 | Code | Meaning |
 |------|---------|
 | CR-N | No comparison |
 | CR-B | Better than alternatives |
 | CR-W | Worse than alternatives |
 | CR-S | Same as alternatives |
 ---
 ## USN (URT String Notation)
 USN is a compact string encoding for URT annotations.
 ### Grammar
 ```
 Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
 Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
 ```
 ### Encoding Rules
 **Valence**:
 - `+` for V+
 - `-` for V-
 **Intensity**:
 - `1` for I1
 - `2` for I2
 - `3` for I3
 ### Examples
 **Standard Profile**:
 ```
 URT:S:J1.03:-2:22TC.ES.N
 ```
 Decoded:
 - Profile: Standard
 - Code: J1.03
 - Valence: V- (negative)
 - Intensity: I2
 - Specificity: S2
 - Actionability: A2
 - Temporal: TC
 - Evidence: ES
 - Comparative: CR-N
 **Full Profile with Causal Chain**:
 ```
 URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O
 ```
 Decoded:
 - Profile: Full
 - Codes: J1.01, A1.04
 - Valence: V- (negative)
 - Intensity: I3
 - Specificity: S2
 - Actionability: A3
 - Temporal: TR
 - Evidence: EI
 - Comparative: CR-S
 - Causal: CD.O (Conditions-Operational), MG.O (Management-Oversight)
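
The decoding shown above can be sketched as a small Python parser. The class and function names are hypothetical. Only the `+` and `-` valence symbols are documented in the encoding rules, so this sketch rejects anything else rather than guessing how V0 and V± are written.

```python
from dataclasses import dataclass, field
from typing import List

VALENCE = {"+": "V+", "-": "V-"}  # only these two symbols are documented
TEMPORAL = {"TC", "TR", "TH", "TF"}
EVIDENCE = {"ES", "EI", "EC"}
COMPARATIVE = {"N", "B", "W", "S"}


@dataclass
class UsnAnnotation:
    profile: str
    codes: List[str]
    valence: str
    intensity: str
    specificity: str
    actionability: str
    temporal: str
    evidence: str
    comparative: str
    causal: List[str] = field(default_factory=list)


def parse_usn(text: str) -> UsnAnnotation:
    """Decode a Standard or Full USN string into its named dimensions."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0] != "URT" or parts[1] not in ("S", "F"):
        raise ValueError(f"not a USN string: {text!r}")
    profile, codes, vi, dims = parts[1:5]
    if len(vi) != 2 or vi[0] not in VALENCE or vi[1] not in "123":
        raise ValueError(f"bad valence/intensity field: {vi!r}")
    # The dimension field is {S}{A}{T}.{E}.{CR}, for example "22TC.ES.N".
    sat, evidence, comparative = dims.split(".")
    if len(sat) != 4 or sat[0] not in "123" or sat[1] not in "123" or sat[2:] not in TEMPORAL:
        raise ValueError(f"bad S/A/T field: {sat!r}")
    if evidence not in EVIDENCE or comparative not in COMPARATIVE:
        raise ValueError(f"bad evidence/comparative field: {dims!r}")
    return UsnAnnotation(
        profile={"S": "Standard", "F": "Full"}[profile],
        codes=codes.split("+"),
        valence=VALENCE[vi[0]],
        intensity="I" + vi[1],
        specificity="S" + sat[0],
        actionability="A" + sat[1],
        temporal=sat[2:],
        evidence=evidence,
        comparative="CR-" + comparative,
        causal=parts[5].split(",") if len(parts) == 6 else [],
    )


standard = parse_usn("URT:S:J1.03:-2:22TC.ES.N")
full = parse_usn("URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O")
```

Splitting on `:` first is safe here because neither the code list nor the causal field contains a colon in the documented examples.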
 ---
 ## Causal Chain (Full Profile Only)
 The causal chain identifies root causes across three layers, ordered from immediate to systemic.
 ### Layers
 | Layer | Codes | Scope |
 |-------|-------|-------|
 | conditions | CD-S, CD-T, CD-E, CD-F, CD-O | Staff State, Team Dynamics, Equipment, Facility, Operational |
 | management | MG-P, MG-T, MG-O, MG-R, MG-C | Planning, Training, Oversight, Resources, Communication |
 | systemic | SY-R, SY-P, SY-C, SY-S, SY-H, SY-X | Resource Decisions, Policy, Culture, Standards, Human Capital, External |
 ### Code Reference
 **Conditions Layer**:
 - `CD-S` - Staff State
 - `CD-T` - Team Dynamics
 - `CD-E` - Equipment
 - `CD-F` - Facility
 - `CD-O` - Operational
 **Management Layer**:
 - `MG-P` - Planning
 - `MG-T` - Training
 - `MG-O` - Oversight
 - `MG-R` - Resources
 - `MG-C` - Communication
 **Systemic Layer**:
 - `SY-R` - Resource Decisions
 - `SY-P` - Policy
 - `SY-C` - Culture
 - `SY-S` - Standards
 - `SY-H` - Human Capital
 - `SY-X` - External
 ### JSONB Schema
 ```json
 [
  {"layer": "conditions", "code": "CD-O", "evidence": "ES"},
  {"layer": "management", "code": "MG-P", "evidence": "EI"}
 ]
 ```
 ### Constraints
 - Maximum 3 entries (one per layer)
 - Only include when text explicitly supports it
 - Order: conditions → management → systemic
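
A rough Python sketch of checking a causal chain against the structural constraints above. The function name is hypothetical, and treating `evidence` as a required key holding one of the Evidence codes is an assumption based on the schema example. Whether the text supports the chain is a judgment this check cannot make.

```python
from typing import Any, Dict, List

LAYER_ORDER = ["conditions", "management", "systemic"]
LAYER_CODES = {
    "conditions": {"CD-S", "CD-T", "CD-E", "CD-F", "CD-O"},
    "management": {"MG-P", "MG-T", "MG-O", "MG-R", "MG-C"},
    "systemic": {"SY-R", "SY-P", "SY-C", "SY-S", "SY-H", "SY-X"},
}
EVIDENCE = {"ES", "EI", "EC"}  # assumption: same codes as the Evidence dimension


def causal_chain_errors(chain: List[Dict[str, Any]]) -> List[str]:
    """Return a list of constraint violations; an empty list means the chain passes."""
    errors: List[str] = []
    if len(chain) > 3:
        errors.append("more than 3 entries")
    last_rank = -1
    for i, entry in enumerate(chain):
        layer = entry.get("layer")
        if layer not in LAYER_CODES:
            errors.append(f"entry {i}: unknown layer {layer!r}")
            continue
        if entry.get("code") not in LAYER_CODES[layer]:
            errors.append(f"entry {i}: code {entry.get('code')!r} not in layer {layer!r}")
        if entry.get("evidence") not in EVIDENCE:
            errors.append(f"entry {i}: unknown evidence {entry.get('evidence')!r}")
        # Ranks must strictly increase: this covers both ordering and one-per-layer.
        rank = LAYER_ORDER.index(layer)
        if rank <= last_rank:
            errors.append(f"entry {i}: layer {layer!r} repeated or out of order")
        last_rank = max(last_rank, rank)
    return errors


example = [
    {"layer": "conditions", "code": "CD-O", "evidence": "ES"},
    {"layer": "management", "code": "MG-P", "evidence": "EI"},
]
```

Calling `causal_chain_errors(example)` on the schema example returns an empty list; reversing the two entries yields a single ordering error.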
 ---
 ## Span Boundary Detection Rules
 Spans are detected at the clause/topic level, not sentence level.
 ### Split Rules (in priority order)
 1. **Split on contrasting conjunctions**: but, however, although, despite, yet
 2. **Split when subject/target changes** (topic shift)
 3. **Split when valence changes** (positive ↔ negative)
 4. **Split when domain changes** (O/P/J/E/A/V/R)
 5. **Keep together** for cause→effect within same feedback unit
 ### Guidelines
 - **Maximum**: ~3 spans per sentence
 - **Validation**: If 4+ spans detected, re-check for over-splitting
 ### Example
 **Input**:
 > "The food was great but the service was slow and the bathroom was dirty."
 **Output**: 3 spans
 1. "The food was great" (Offering, positive)
 2. "the service was slow" (Journey/Attitude, negative)
 3. "the bathroom was dirty" (Environment, negative)
 **Reasoning**: Topic shift + domain shift at each boundary.
 ---
 ## Primary Span Selection
 When a review contains multiple spans, select the primary span using these criteria in order:
 ### Selection Priority
 1. **Highest intensity** (I3 > I2 > I1)
 2. **Tie-break**: Negative over positive (V- > V± > V0 > V+)
 3. **Tie-break**: Earliest span_index
 ### Example
 Given spans:
 - Span 0: I2, V+
 - Span 1: I3, V+
 - Span 2: I3, V-
 **Primary**: Span 2 (highest intensity I3, negative valence wins tie-break)
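
The selection priority can be sketched as a single sort key in Python. The dictionary-based span shape and the function name are assumptions for illustration; the ranking itself follows the three criteria listed above.

```python
from typing import Any, Dict, List

# Lower rank wins: highest intensity first, then most negative valence.
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary(spans: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Pick the primary span by intensity, then valence, then earliest index."""
    return min(
        spans,
        key=lambda s: (
            INTENSITY_RANK[s["intensity"]],
            VALENCE_RANK[s["valence"]],
            s["span_index"],
        ),
    )


spans = [
    {"span_index": 0, "intensity": "I2", "valence": "V+"},
    {"span_index": 1, "intensity": "I3", "valence": "V+"},
    {"span_index": 2, "intensity": "I3", "valence": "V-"},
]
primary = select_primary(spans)
```

Because the key is a tuple, each later criterion is consulted only when all earlier ones tie, which keeps the choice deterministic.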
 ---
 ## Secondary Codes Rules
 Secondary codes capture additional topics mentioned in a span.
 ### Constraints
 - **Maximum**: 2 secondary codes
 - **Format**: Must be Tier-3 (X#.##)
 - **Recommendation**: Should be different domain from primary
 ### Example
 Primary: `J1.03` (Journey)
 Secondary: `A2.01`, `E1.05` (Attitude, Environment)
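
A small Python sketch of checking these rules. The function name and the message wording are hypothetical. The different-domain rule is reported as a note because the text states it as a recommendation, not a hard constraint.

```python
import re
from typing import List

TIER3_RE = re.compile(r"[OPJEAVR][1-4]\.[0-9]{2}")
MAX_SECONDARY = 2


def secondary_code_findings(primary: str, secondary: List[str]) -> List[str]:
    """Return human-readable findings; an empty list means nothing to report."""
    findings: List[str] = []
    if len(secondary) > MAX_SECONDARY:
        findings.append(f"error: {len(secondary)} secondary codes, maximum is {MAX_SECONDARY}")
    for code in secondary:
        if TIER3_RE.fullmatch(code) is None:
            findings.append(f"error: {code!r} is not a Tier-3 code")
        elif code[0] == primary[0]:
            # Recommendation only, so this is a note and not an error.
            findings.append(f"note: {code!r} shares the primary domain {primary[0]!r}")
    return findings


ok = secondary_code_findings("J1.03", ["A2.01", "E1.05"])
```

For the example above, `ok` is an empty list: both secondary codes are well-formed and come from domains other than Journey.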
 ---
 ## Quick Reference Card
 ### Profiles
 | Profile | Dimensions | Causal Chain |
 |---------|------------|--------------|
 | Core | V, I | No |
 | Standard | V, I, S, A, T, E, CR | No |
 | Full | V, I, S, A, T, E, CR | Yes |
 ### USN Quick Format
 ```
 URT:{S|F}:{tier3_codes}:{valence}{intensity}:{SAT}.{E}.{CR}[:{causal}]
 ```
 ### Domain Letters
 ```
 O P J E A V R
 │ │ │ │ │ │ └─ Relationship
 │ │ │ │ │ └─── Voice
 │ │ │ │ └───── Attitude
 │ │ │ └─────── Environment
 │ │ └───────── Journey
 │ └─────────── Price
 └───────────── Offering
 ```
--- a/api/__init__.py
+++ b/api/__init__.py
--- a/api/middleware/__init__.py
+++ b/api/middleware/__init__.py
--- a/api/routes/__init__.py
+++ b/api/routes/__init__.py
--- a/api_server_production.py
+++ b/api_server_production.py
@@ -20,13 +20,13 @@ from fastapi.middleware.cors import CORSMiddleware
 from pydantic import BaseModel, HttpUrl, Field
 from fastapi.responses import JSONResponse, StreamingResponse
-from modules.database import DatabaseManager, JobStatus
+from core.database import DatabaseManager, JobStatus
-from modules.webhooks import WebhookDispatcher, WebhookManager
+from services.webhook_service import WebhookDispatcher, WebhookManager
-from modules.health_checks import HealthCheckSystem
+from utils.health_checks import HealthCheckSystem
-from modules.scraper_clean import fast_scrape_reviews, LogCapture, get_business_card_info  # Clean scraper
+from scrapers.google_reviews.v1_0_0 import fast_scrape_reviews, LogCapture, get_business_card_info  # Clean scraper
-from modules.crash_analyzer import analyze_crash, summarize_crash_patterns, apply_auto_fix
+from utils.crash_analyzer import analyze_crash, summarize_crash_patterns, apply_auto_fix
-from modules.structured_logger import StructuredLogger, LogEntry
+from utils.logger import StructuredLogger, LogEntry
-from modules.chrome_pool import (
+from workers.chrome_pool import (
    start_worker_pools,
    stop_worker_pools,
    get_validation_worker,
--- a/core/__init__.py
+++ b/core/__init__.py
--- a/modules/config.py
+++ b/modules/config.py
--- a/modules/database.py
+++ b/modules/database.py
@@ -8,22 +8,13 @@ import json
 from datetime import datetime
 from typing import Optional, List, Dict, Any
 from uuid import UUID, uuid4
 from enum import Enum
 import logging
+from core.enums import JobStatus
 log = logging.getLogger(__name__)
-class JobStatus(str, Enum):
-    """Job status enumeration"""
-    PENDING = "pending"
-    RUNNING = "running"
-    COMPLETED = "completed"
-    FAILED = "failed"
-    CANCELLED = "cancelled"
-    PARTIAL = "partial"  # Job crashed but has partial reviews saved
 class DatabaseManager:
    """PostgreSQL database manager with connection pooling"""
--- a/core/enums.py
+++ b/core/enums.py
@@ -0,0 +1,14 @@
 """
 Enumerations for the ReviewIQ project.
 """
 from enum import Enum
 class JobStatus(str, Enum):
    """Job status enumeration"""
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"
    PARTIAL = "partial"  # Job crashed but has partial reviews saved
--- a/modules/models.py
+++ b/modules/models.py
@@ -6,7 +6,7 @@ from dataclasses import dataclass, field
 from selenium.webdriver.remote.webelement import WebElement
-from modules.utils import (try_find, first_text, first_attr, safe_int, detect_lang, parse_date_to_iso)
+from utils.helpers import (try_find, first_text, first_attr, safe_int, detect_lang, parse_date_to_iso)
@dataclass
@@ -27,7 +27,7 @@ class RawReview:
    owner_date: str = ""
    owner_text: str = ""
    review_date: str = ""  # ISO format date
-    
+
    # Translation fields
    translations: dict = field(default_factory=dict)  # Store translations by language code
--- a/modules/_legacy/data_storage.py
+++ b/modules/_legacy/data_storage.py
--- a/modules/_legacy/image_handler.py
+++ b/modules/_legacy/image_handler.py
--- a/modules/_legacy/s3_handler.py
+++ b/modules/_legacy/s3_handler.py
--- a/scrapers/__init__.py
+++ b/scrapers/__init__.py
@@ -0,0 +1,10 @@
 """
 Scrapers Package
 This package contains all scraper implementations for the ReviewIQ system.
 """
 from scrapers.base import BaseScraper
 from scrapers.registry import ScraperRegistry, registry
 __all__ = ["BaseScraper", "ScraperRegistry", "registry"]
--- a/scrapers/base.py
+++ b/scrapers/base.py
@@ -0,0 +1,97 @@
 """
 Base Scraper Interface
 This module defines the abstract base class that all scrapers must implement.
 It ensures consistent interface across different scraper implementations.
 """
 from abc import ABC, abstractmethod
 from typing import Any, Callable, Dict, List, Optional
 class BaseScraper(ABC):
    """
    Abstract base class for all scrapers in the ReviewIQ system.
    All concrete scraper implementations must inherit from this class
    and implement the required abstract methods.
    """
    @abstractmethod
    def scrape(
        self,
        driver: Any,
        url: str,
        max_reviews: int = 5000,
        timeout_no_new: int = 15,
        flush_callback: Optional[Callable[[List[Dict]], None]] = None,
        flush_batch_size: int = 500,
        progress_callback: Optional[Callable[[int, Optional[int]], None]] = None,
        validation_only: bool = False
    ) -> Dict[str, Any]:
        """
        Scrape reviews from the given URL.
        Args:
            driver: WebDriver instance (e.g., Selenium WebDriver)
            url: The URL to scrape reviews from
            max_reviews: Maximum number of reviews to collect
            timeout_no_new: Seconds to wait with no new reviews before stopping
            flush_callback: Optional callback called with reviews batches for streaming
            flush_batch_size: Number of reviews before triggering flush_callback
            progress_callback: Optional callback(current_count, total_count) for progress
            validation_only: If True, return early after extracting metadata only
        Returns:
            Dictionary containing:
                - reviews: List of review dictionaries
                - total: Total number of reviews collected
                - error: Error message if any, None otherwise
                - Additional scraper-specific metadata
        """
        pass
    @abstractmethod
    def validate_url(self, url: str) -> bool:
        """
        Validate if the given URL is supported by this scraper.
        Args:
            url: The URL to validate
        Returns:
            True if the URL is valid for this scraper, False otherwise
        """
        pass
    @abstractmethod
    def get_business_info(self, driver: Any, url: str) -> Dict[str, Any]:
        """
        Extract business information from the URL without scraping reviews.
        Args:
            driver: WebDriver instance
            url: The URL to extract info from
        Returns:
            Dictionary containing business metadata (name, rating, address, etc.)
        """
        pass
    @property
    @abstractmethod
    def name(self) -> str:
        """Return the human-readable name of this scraper."""
        pass
    @property
    @abstractmethod
    def version(self) -> str:
        """Return the version string of this scraper."""
        pass
    @property
    @abstractmethod
    def supported_domains(self) -> List[str]:
        """Return list of domains this scraper supports."""
        pass
--- a/scrapers/google_reviews/__init__.py
+++ b/scrapers/google_reviews/__init__.py
@@ -0,0 +1,21 @@
 """
 Google Reviews Scraper Package
 This package contains the Google Reviews scraper implementations.
 """
 from scrapers.google_reviews.v1_0_0 import (
    scrape_reviews,
    fast_scrape_reviews,
    get_business_card_info,
    extract_about_info,
    LogCapture,
 )
 __all__ = [
    "scrape_reviews",
    "fast_scrape_reviews",
    "get_business_card_info",
    "extract_about_info",
    "LogCapture",
 ]
--- a/scrapers/google_reviews/v1_0_0.py
+++ b/scrapers/google_reviews/v1_0_0.py
@@ -1,7 +1,12 @@
 """
-Clean Google Maps Reviews Scraper
+Google Reviews Scraper v1.0.0
 This module provides the core Google Maps reviews scraping functionality.
 - Simple down scrolling
 - DOM scraping + API interception
 Version: 1.0.0
 Migrated from: modules/scraper_clean.py
 """
 import re
@@ -12,7 +17,7 @@ from datetime import datetime
 from typing import List, Optional
 from selenium.webdriver.common.by import By
-from modules.structured_logger import StructuredLogger
+from utils.logger import StructuredLogger
 def get_chrome_memory(driver) -> Optional[int]:
    """Get Chrome memory usage in MB using CDP."""
--- a/scrapers/registry.py
+++ b/scrapers/registry.py
@@ -0,0 +1,138 @@
 """
 Scraper Registry
 This module provides a registry for managing and discovering scrapers.
 It allows dynamic registration and lookup of scraper implementations.
 """
 from typing import Dict, List, Optional, Type
 from scrapers.base import BaseScraper
 class ScraperRegistry:
    """
    Registry for managing scraper implementations.
    The registry allows:
    - Registering scrapers by name and version
    - Looking up scrapers by domain or name
    - Listing all available scrapers
    Usage:
        registry = ScraperRegistry()
        registry.register(GoogleReviewsScraper)
        scraper = registry.get_scraper_for_url("https://google.com/maps/place/...")
    """
    _instance: Optional["ScraperRegistry"] = None
    _scrapers: Dict[str, Type[BaseScraper]]
    def __new__(cls) -> "ScraperRegistry":
        """Singleton pattern to ensure one global registry."""
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._scrapers = {}
            cls._instance._domain_map = {}
        return cls._instance
    def register(self, scraper_class: Type[BaseScraper], name: Optional[str] = None) -> None:
        """
        Register a scraper class with the registry.
        Args:
            scraper_class: The scraper class to register (must inherit from BaseScraper)
            name: Optional name override, defaults to scraper_class.name property
        """
        # Create a temporary instance to get properties
        # Note: In production, we might want scraper_class to have class-level properties
        instance = scraper_class.__new__(scraper_class)
        scraper_name = name or instance.name
        scraper_version = instance.version
        key = f"{scraper_name}:{scraper_version}"
        self._scrapers[key] = scraper_class
        # Map domains to this scraper
        for domain in instance.supported_domains:
            if domain not in self._domain_map:
                self._domain_map[domain] = []
            self._domain_map[domain].append(key)
    def get_scraper(self, name: str, version: Optional[str] = None) -> Optional[Type[BaseScraper]]:
        """
        Get a scraper class by name and optional version.
        Args:
            name: The scraper name
            version: Optional version string. If not provided, returns the latest.
        Returns:
            The scraper class, or None if not found
        """
        if version:
            key = f"{name}:{version}"
            return self._scrapers.get(key)
        # Find latest version for this name
        matching = [k for k in self._scrapers.keys() if k.startswith(f"{name}:")]
        if not matching:
            return None
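        # Sort by numeric version parts so that 1.10.0 ranks above 1.9.0
        # (a plain string sort would compare the keys lexicographically)
        matching.sort(
            key=lambda k: tuple(int(p) for p in k.rsplit(":", 1)[-1].split(".") if p.isdigit()),
            reverse=True,
        )
        return self._scrapers.get(matching[0])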
    def get_scraper_for_url(self, url: str) -> Optional[Type[BaseScraper]]:
        """
        Find a suitable scraper for the given URL.
        Args:
            url: The URL to find a scraper for
        Returns:
            The scraper class that can handle this URL, or None if no match
        """
        from urllib.parse import urlparse
        parsed = urlparse(url)
        domain = parsed.netloc.lower()
        # Remove www. prefix for matching
        if domain.startswith("www."):
            domain = domain[4:]
        scraper_keys = self._domain_map.get(domain, [])
        if not scraper_keys:
            return None
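        # Return the latest version: order by name, then by numeric version parts
        # (a plain string sort would rank 1.9.0 above 1.10.0); max() also avoids
        # reordering the list stored in the domain map
        latest = max(
            scraper_keys,
            key=lambda k: (
                k.rsplit(":", 1)[0],
                tuple(int(p) for p in k.rsplit(":", 1)[-1].split(".") if p.isdigit()),
            ),
        )
        return self._scrapers.get(latest)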
    def list_scrapers(self) -> List[Dict[str, str]]:
        """
        List all registered scrapers.
        Returns:
            List of dictionaries with scraper info (name, version, domains)
        """
        result = []
        for key, scraper_class in self._scrapers.items():
            instance = scraper_class.__new__(scraper_class)
            result.append({
                "name": instance.name,
                "version": instance.version,
                "domains": instance.supported_domains
            })
        return result
    def clear(self) -> None:
        """Clear all registered scrapers. Useful for testing."""
        self._scrapers.clear()
        self._domain_map.clear()
 # Global registry instance
 registry = ScraperRegistry()
--- a/services/__init__.py
+++ b/services/__init__.py
--- a/services/webhook_service.py
+++ b/services/webhook_service.py
--- a/tests/api/__init__.py
+++ b/tests/api/__init__.py
--- a/tests/integration/__init__.py
+++ b/tests/integration/__init__.py
--- a/tests/scrapers/__init__.py
+++ b/tests/scrapers/__init__.py
--- a/tests/scrapers/google_reviews/__init__.py
+++ b/tests/scrapers/google_reviews/__init__.py
--- a/tests/services/__init__.py
+++ b/tests/services/__init__.py
--- a/utils/__init__.py
+++ b/utils/__init__.py
--- a/modules/crash_analyzer.py
+++ b/modules/crash_analyzer.py
--- a/modules/date_converter.py
+++ b/modules/date_converter.py
--- a/modules/health_checks.py
+++ b/modules/health_checks.py
@@ -67,7 +67,7 @@ class CanaryMonitor:
                # Alert if multiple consecutive failures
                if self.consecutive_failures >= 3:
                    await self.send_alert(
-                        f"🚨 CRITICAL: Scraper canary failed {self.consecutive_failures} times in a row! "
+                        f"CRITICAL: Scraper canary failed {self.consecutive_failures} times in a row! "
                        f"Last error: {str(e)[:200]}"
                    )
@@ -90,7 +90,7 @@ class CanaryMonitor:
        - Scrape time is reasonable
        - Data structure is valid
        """
-        from modules.scraper_clean import fast_scrape_reviews
+        from scrapers.google_reviews.v1_0_0 import fast_scrape_reviews
        log.info(f"Running canary scrape test on {self.test_url[:60]}...")
        self.last_run = datetime.now()
@@ -121,7 +121,7 @@ class CanaryMonitor:
            if all_passed:
                # Success!
                log.info(
-                    f"✅ Canary test PASSED: {result['count']} reviews in {result['time']:.1f}s"
+                    f"Canary test PASSED: {result['count']} reviews in {result['time']:.1f}s"
                )
                self.consecutive_failures = 0
                self.last_success = datetime.now()
@@ -144,7 +144,7 @@ class CanaryMonitor:
                # Validation failed
                failed_checks = [k for k, v in checks.items() if not v]
                log.error(
-                    f"❌ Canary test FAILED: validation failed on {failed_checks}"
+                    f"Canary test FAILED: validation failed on {failed_checks}"
                )
                self.consecutive_failures += 1
                self.last_result = {
@@ -167,12 +167,12 @@ class CanaryMonitor:
                # Alert on failure
                if self.consecutive_failures >= 3:
                    await self.send_alert(
-                        f"🚨 CRITICAL: Canary validation failed {self.consecutive_failures} times! "
+                        f"CRITICAL: Canary validation failed {self.consecutive_failures} times! "
                        f"Failed checks: {failed_checks}"
                    )
        except asyncio.TimeoutError:
-            log.error("❌ Canary test TIMEOUT (>60s)")
+            log.error("Canary test TIMEOUT (>60s)")
            self.consecutive_failures += 1
            self.last_result = {
                "status": "timeout",
@@ -186,11 +186,11 @@ class CanaryMonitor:
            if self.consecutive_failures >= 3:
                await self.send_alert(
-                    f"🚨 CRITICAL: Canary timeout {self.consecutive_failures} times!"
+                    f"CRITICAL: Canary timeout {self.consecutive_failures} times!"
                )
        except Exception as e:
-            log.error(f"❌ Canary test ERROR: {e}")
+            log.error(f"Canary test ERROR: {e}")
            self.consecutive_failures += 1
            self.last_result = {
                "status": "error",
--- a/utils/helpers.py
+++ b/utils/helpers.py
--- a/modules/structured_logger.py
+++ b/modules/structured_logger.py
--- a/workers/__init__.py
+++ b/workers/__init__.py
--- a/workers/chrome_pool.py
+++ b/workers/chrome_pool.py