18 tasks organized in 5 parallel tracks: - Track A: Backend logging infrastructure (4 tasks) - Track B: Frontend log viewer (5 tasks) - Track C: Crash analysis (4 tasks) - Track D: Session & metrics (3 tasks) - Track E: Review topics (2 tasks) Includes dependency graph and 7-wave execution plan for parallel AI agent workflow. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
9.5 KiB
Job DevTools - Implementation Tasks
Dependency Graph
Wave 1 (Parallel start):
#1 StructuredLogger ──┬──▶ #2 Migrate scraper ──▶ #3 SSE stream ──▶ #5 JobDevTools
│ │
├──▶ #4 DB schema ──┬──▶ #10 Crash analyzer ▼
│ │ │ #6 LogViewer
│ │ ▼ │
│ ├──▶ #11 Crash API ▼
│ │ │ #7 CopyToolbar
│ │ ▼ │
│ │ #12 CrashReport ▼
│ │ #8 LogEntry
│ └──▶ #13 Session capture │
│ │ │
└──▶ #9 Crash detection ▼ │
│ #14 SessionPanel │
│ │ │
└───────────────────┼───────────────┘
│
#16 Topics inference ──▶ #17 Topic tags ▼
#15 MetricsDashboard
│
▼
#18 INTEGRATION
Task Details
Track A: Backend Logging Infrastructure
Task #1: Create StructuredLogger class in Python backend
Priority: P0 (Foundation) Blocks: #2, #3, #4, #9
Create modules/structured_logger.py:
from dataclasses import dataclass, field, asdict
from typing import Optional, Dict, Any, List, Literal
from datetime import datetime
import threading
import time
LogLevel = Literal['DEBUG', 'INFO', 'WARN', 'ERROR', 'FATAL']
LogCategory = Literal['scraper', 'browser', 'network', 'system']
@dataclass
class LogEntry:
timestamp: str
timestamp_ms: int
level: LogLevel
category: LogCategory
message: str
metrics: Optional[Dict[str, Any]] = None
network: Optional[Dict[str, Any]] = None
snapshot_id: Optional[str] = None
class StructuredLogger:
def __init__(self, max_entries: int = 10000):
self._logs: List[LogEntry] = []
self._lock = threading.Lock()
self._max_entries = max_entries
def _log(self, level: LogLevel, category: LogCategory, message: str,
metrics: Dict = None, network: Dict = None, snapshot_id: str = None):
now = datetime.utcnow()
entry = LogEntry(
timestamp=now.isoformat() + 'Z',
timestamp_ms=int(time.time() * 1000),
level=level,
category=category,
message=message,
metrics=metrics,
network=network,
snapshot_id=snapshot_id
)
with self._lock:
self._logs.append(entry)
if len(self._logs) > self._max_entries:
self._logs = self._logs[-self._max_entries:]
def debug(self, category: LogCategory, message: str, **kwargs):
self._log('DEBUG', category, message, **kwargs)
def info(self, category: LogCategory, message: str, **kwargs):
self._log('INFO', category, message, **kwargs)
def warn(self, category: LogCategory, message: str, **kwargs):
self._log('WARN', category, message, **kwargs)
def error(self, category: LogCategory, message: str, **kwargs):
self._log('ERROR', category, message, **kwargs)
def fatal(self, category: LogCategory, message: str, **kwargs):
self._log('FATAL', category, message, **kwargs)
def get_logs(self) -> List[Dict]:
with self._lock:
return [asdict(e) for e in self._logs]
def get_logs_by_category(self, category: LogCategory) -> List[Dict]:
with self._lock:
return [asdict(e) for e in self._logs if e.category == category]
Task #2: Migrate scraper_clean.py to use StructuredLogger
Blocked by: #1 Blocks: #3
Update all log calls in modules/scraper_clean.py:
- Replace
LogCapturewithStructuredLogger - Add category to each log call
- Add metrics where relevant (scroll_count, reviews_count, memory_mb)
Task #3: Update SSE stream to emit structured log events
Blocked by: #1, #2 Blocks: #5, #15
Update api_server_production.py:
- Change log event format to include full LogEntry structure
- Add metrics event type emitted every 5 seconds
- Backward compatibility for old clients
Task #4: Add crash_reports table and schema
Blocked by: #1 Blocks: #10, #11, #13
Add to modules/database.py:
CREATE TABLE crash_reports (
crash_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
job_id UUID REFERENCES jobs(job_id) ON DELETE CASCADE,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
crash_type VARCHAR(50) NOT NULL,
error_message TEXT,
state JSONB NOT NULL,
metrics_history JSONB,
logs_before_crash JSONB,
analysis JSONB,
screenshot_url TEXT
);
ALTER TABLE jobs ADD COLUMN IF NOT EXISTS session_fingerprint JSONB;
ALTER TABLE jobs ADD COLUMN IF NOT EXISTS metrics_history JSONB;
Track B: Frontend Log Viewer
Task #5: Create JobDevTools React container component
Blocked by: #3 Blocks: #6, #18
Create web/components/JobDevTools/index.tsx:
- Tab bar: All, Scraper, Browser, Network, System
- Count badges per tab
- Renders LogViewer, CopyToolbar, SessionPanel, CrashReport
Task #6: Create LogViewer component with virtualized list
Blocked by: #5 Blocks: #7, #18
Create web/components/JobDevTools/LogViewer.tsx:
- Virtualized list (react-window)
- Level filter, search, auto-scroll toggle
- Timestamp format toggle
Task #7: Create CopyToolbar and copy utilities
Blocked by: #6 Blocks: #8, #18
Create:
web/components/JobDevTools/CopyToolbar.tsxweb/lib/copy-utils.ts
Task #8: Create LogEntry row component with click-to-copy
Blocked by: #7 Blocks: #18
Create web/components/JobDevTools/LogEntry.tsx:
- Click to copy, shift+click for range
- Level/category badges with colors
- Expandable metrics view
Track C: Crash Analysis
Task #9: Implement crash detection wrapper in scraper
Blocked by: #1 Blocks: #10
Add to modules/scraper_clean.py:
- Wrap execution in try/catch
- Periodic metrics sampling (5s interval)
- Compile CrashReport on failure
- Helper: get_chrome_memory(), get_dom_node_count(), classify_crash()
Task #10: Create crash pattern analyzer
Blocked by: #4, #9 Blocks: #11
Create modules/crash_analyzer.py:
- Pattern detection: memory_exhaustion, dom_bloat, rate_limited, consent_loop, scroll_timeout, element_stale
- Confidence scoring
- Suggested fix generation
- Auto-fix parameters
Task #11: Add crash report API endpoints
Blocked by: #4, #10 Blocks: #12
Add to api_server_production.py:
- GET /jobs/{job_id}/crash-report
- POST /jobs/{job_id}/retry?apply_fix=...
- GET /crashes/stats
Task #12: Create CrashReport frontend component
Blocked by: #11 Blocks: #18
Create web/components/JobDevTools/CrashReport.tsx:
- Timeline to crash visualization
- Pattern analysis display
- "Apply Fix & Retry" button
- Collapsible logs before crash
Track D: Session & Metrics
Task #13: Capture and store session fingerprint in backend
Blocked by: #4 Blocks: #14
Add to modules/scraper_clean.py:
- Compile SessionFingerprint at job start
- Run bot detection tests
- Store in job metadata
Task #14: Create SessionPanel frontend component
Blocked by: #13 Blocks: #18
Create web/components/JobDevTools/SessionPanel.tsx:
- "What Google Saw" display
- Identity, Geolocation, Viewport sections
- Bot detection indicators (green/yellow/red)
Task #15: Create MetricsDashboard with real-time charts
Blocked by: #3 Blocks: #18
Create web/components/JobDevTools/MetricsDashboard.tsx:
- Extraction rate line chart
- Cumulative reviews area chart
- Memory usage line chart
- API vs DOM pie chart
Track E: Review Topics
Task #16: Implement review topics inference algorithm
Blocks: #17
Add to modules/scraper_clean.py:
infer_review_topics(review_text, topics)function- Word boundary matching
- Simple stemming variants
- Add 'topics' field to each review
Task #17: Add topic tags to review cards in frontend
Blocked by: #16
Update:
web/components/ReviewAnalytics.tsxweb/lib/analytics.ts
Add topic tags to reviews, topic filter, topic distribution chart.
Task #18: Integrate JobDevTools into job detail page
Blocked by: #5, #6, #7, #8, #12, #14, #15
Replace current log display with JobDevTools component. Handle both old and new log formats. Connect SSE stream for real-time updates.
Execution Waves
| Wave | Tasks | Parallel Agents |
|---|---|---|
| 1 | #1, #16 | 2 |
| 2 | #2, #4, #9, #17 | 4 |
| 3 | #3, #10, #13 | 3 |
| 4 | #5, #11, #14, #15 | 4 |
| 5 | #6, #12 | 2 |
| 6 | #7 → #8 | 1 (sequential) |
| 7 | #18 | 1 |
Critical Path: #1 → #2 → #3 → #5 → #6 → #7 → #8 → #18