# Job DevTools - Implementation Tasks ## Dependency Graph ``` Wave 1 (Parallel start): #1 StructuredLogger ──┬──▶ #2 Migrate scraper ──▶ #3 SSE stream ──▶ #5 JobDevTools │ │ ├──▶ #4 DB schema ──┬──▶ #10 Crash analyzer ▼ │ │ │ #6 LogViewer │ │ ▼ │ │ ├──▶ #11 Crash API ▼ │ │ │ #7 CopyToolbar │ │ ▼ │ │ │ #12 CrashReport ▼ │ │ #8 LogEntry │ └──▶ #13 Session capture │ │ │ │ └──▶ #9 Crash detection ▼ │ │ #14 SessionPanel │ │ │ │ └───────────────────┼───────────────┘ │ #16 Topics inference ──▶ #17 Topic tags ▼ #15 MetricsDashboard │ ▼ #18 INTEGRATION ``` --- ## Task Details ### Track A: Backend Logging Infrastructure #### Task #1: Create StructuredLogger class in Python backend **Priority:** P0 (Foundation) **Blocks:** #2, #3, #4, #9 Create `modules/structured_logger.py`: ```python from dataclasses import dataclass, field, asdict from typing import Optional, Dict, Any, List, Literal from datetime import datetime import threading import time LogLevel = Literal['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL'] LogCategory = Literal['scraper', 'browser', 'network', 'system'] @dataclass class LogEntry: timestamp: str timestamp_ms: int level: LogLevel category: LogCategory message: str metrics: Optional[Dict[str, Any]] = None network: Optional[Dict[str, Any]] = None snapshot_id: Optional[str] = None class StructuredLogger: def __init__(self, max_entries: int = 10000): self._logs: List[LogEntry] = [] self._lock = threading.Lock() self._max_entries = max_entries def _log(self, level: LogLevel, category: LogCategory, message: str, metrics: Dict = None, network: Dict = None, snapshot_id: str = None): now = datetime.utcnow() entry = LogEntry( timestamp=now.isoformat() + 'Z', timestamp_ms=int(time.time() * 1000), level=level, category=category, message=message, metrics=metrics, network=network, snapshot_id=snapshot_id ) with self._lock: self._logs.append(entry) if len(self._logs) > self._max_entries: self._logs = self._logs[-self._max_entries:] def debug(self, category: LogCategory, message: str, **kwargs): self._log('DEBUG', category, message, **kwargs) def info(self, category: LogCategory, message: str, **kwargs): self._log('INFO', category, message, **kwargs) def warn(self, category: LogCategory, message: str, **kwargs): self._log('WARN', category, message, **kwargs) def error(self, category: LogCategory, message: str, **kwargs): self._log('ERROR', category, message, **kwargs) def fatal(self, category: LogCategory, message: str, **kwargs): self._log('FATAL', category, message, **kwargs) def get_logs(self) -> List[Dict]: with self._lock: return [asdict(e) for e in self._logs] def get_logs_by_category(self, category: LogCategory) -> List[Dict]: with self._lock: return [asdict(e) for e in self._logs if e.category == category] ``` --- #### Task #2: Migrate scraper_clean.py to use StructuredLogger **Blocked by:** #1 **Blocks:** #3 Update all log calls in `modules/scraper_clean.py`: - Replace `LogCapture` with `StructuredLogger` - Add category to each log call - Add metrics where relevant (scroll_count, reviews_count, memory_mb) --- #### Task #3: Update SSE stream to emit structured log events **Blocked by:** #1, #2 **Blocks:** #5, #15 Update `api_server_production.py`: - Change log event format to include full LogEntry structure - Add metrics event type emitted every 5 seconds - Backward compatibility for old clients --- #### Task #4: Add crash_reports table and schema **Blocked by:** #1 **Blocks:** #10, #11, #13 Add to `modules/database.py`: ```sql CREATE TABLE crash_reports ( crash_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), job_id UUID REFERENCES jobs(job_id) ON DELETE CASCADE, created_at TIMESTAMP NOT NULL DEFAULT NOW(), crash_type VARCHAR(50) NOT NULL, error_message TEXT, state JSONB NOT NULL, metrics_history JSONB, logs_before_crash JSONB, analysis JSONB, screenshot_url TEXT ); ALTER TABLE jobs ADD COLUMN IF NOT EXISTS session_fingerprint JSONB; ALTER TABLE jobs ADD COLUMN IF NOT EXISTS metrics_history JSONB; ``` --- ### Track B: Frontend Log Viewer #### Task #5: Create JobDevTools React container component **Blocked by:** #3 **Blocks:** #6, #18 Create `web/components/JobDevTools/index.tsx`: - Tab bar: All, Scraper, Browser, Network, System - Count badges per tab - Renders LogViewer, CopyToolbar, SessionPanel, CrashReport --- #### Task #6: Create LogViewer component with virtualized list **Blocked by:** #5 **Blocks:** #7, #18 Create `web/components/JobDevTools/LogViewer.tsx`: - Virtualized list (react-window) - Level filter, search, auto-scroll toggle - Timestamp format toggle --- #### Task #7: Create CopyToolbar and copy utilities **Blocked by:** #6 **Blocks:** #8, #18 Create: - `web/components/JobDevTools/CopyToolbar.tsx` - `web/lib/copy-utils.ts` --- #### Task #8: Create LogEntry row component with click-to-copy **Blocked by:** #7 **Blocks:** #18 Create `web/components/JobDevTools/LogEntry.tsx`: - Click to copy, shift+click for range - Level/category badges with colors - Expandable metrics view --- ### Track C: Crash Analysis #### Task #9: Implement crash detection wrapper in scraper **Blocked by:** #1 **Blocks:** #10 Add to `modules/scraper_clean.py`: - Wrap execution in try/catch - Periodic metrics sampling (5s interval) - Compile CrashReport on failure - Helper: get_chrome_memory(), get_dom_node_count(), classify_crash() --- #### Task #10: Create crash pattern analyzer **Blocked by:** #4, #9 **Blocks:** #11 Create `modules/crash_analyzer.py`: - Pattern detection: memory_exhaustion, dom_bloat, rate_limited, consent_loop, scroll_timeout, element_stale - Confidence scoring - Suggested fix generation - Auto-fix parameters --- #### Task #11: Add crash report API endpoints **Blocked by:** #4, #10 **Blocks:** #12 Add to `api_server_production.py`: - GET /jobs/{job_id}/crash-report - POST /jobs/{job_id}/retry?apply_fix=... - GET /crashes/stats --- #### Task #12: Create CrashReport frontend component **Blocked by:** #11 **Blocks:** #18 Create `web/components/JobDevTools/CrashReport.tsx`: - Timeline to crash visualization - Pattern analysis display - "Apply Fix & Retry" button - Collapsible logs before crash --- ### Track D: Session & Metrics #### Task #13: Capture and store session fingerprint in backend **Blocked by:** #4 **Blocks:** #14 Add to `modules/scraper_clean.py`: - Compile SessionFingerprint at job start - Run bot detection tests - Store in job metadata --- #### Task #14: Create SessionPanel frontend component **Blocked by:** #13 **Blocks:** #18 Create `web/components/JobDevTools/SessionPanel.tsx`: - "What Google Saw" display - Identity, Geolocation, Viewport sections - Bot detection indicators (green/yellow/red) --- #### Task #15: Create MetricsDashboard with real-time charts **Blocked by:** #3 **Blocks:** #18 Create `web/components/JobDevTools/MetricsDashboard.tsx`: - Extraction rate line chart - Cumulative reviews area chart - Memory usage line chart - API vs DOM pie chart --- ### Track E: Review Topics #### Task #16: Implement review topics inference algorithm **Blocks:** #17 Add to `modules/scraper_clean.py`: - `infer_review_topics(review_text, topics)` function - Word boundary matching - Simple stemming variants - Add 'topics' field to each review --- #### Task #17: Add topic tags to review cards in frontend **Blocked by:** #16 Update: - `web/components/ReviewAnalytics.tsx` - `web/lib/analytics.ts` Add topic tags to reviews, topic filter, topic distribution chart. --- #### Task #18: Integrate JobDevTools into job detail page **Blocked by:** #5, #6, #7, #8, #12, #14, #15 Replace current log display with JobDevTools component. Handle both old and new log formats. Connect SSE stream for real-time updates. --- ## Execution Waves | Wave | Tasks | Parallel Agents | |------|-------|-----------------| | 1 | #1, #16 | 2 | | 2 | #2, #4, #9, #17 | 4 | | 3 | #3, #10, #13 | 3 | | 4 | #5, #11, #14, #15 | 4 | | 5 | #6, #12 | 2 | | 6 | #7 → #8 | 1 (sequential) | | 7 | #18 | 1 | **Critical Path:** #1 → #2 → #3 → #5 → #6 → #7 → #8 → #18