Wave 3: SSE structured logs, crash analyzer, session fingerprint

- Task #3: Update SSE stream to emit structured log events
  (type: "log" for entries, type: "metrics" every 5s, ?format=legacy for backward compat)
- Task #10: Create crash pattern analyzer module
  (6 patterns: memory_exhaustion, dom_bloat, rate_limited, consent_loop, scroll_timeout, element_stale)
  (confidence scoring, auto-fix params, summarize_crash_patterns for recurring issues)
- Task #13: Capture session fingerprint in backend
  (user_agent, platform, timezone, webgl, canvas, bot_detection_tests)
  (saved on success and failure for debugging)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Alejandro Gutiérrez
2026-01-24 12:34:17 +00:00
parent 44d017b3f7
commit f4ca60349e
4 changed files with 1152 additions and 41 deletions

View File

@@ -430,6 +430,47 @@ class DatabaseManager:
log.debug(f"Incremental save: {len(reviews)} reviews for job {job_id}")
async def update_session_fingerprint(
self,
job_id: UUID,
session_fingerprint: Dict[str, Any]
):
"""
Update the session fingerprint for a job.
This should be called early in the scraping process after the browser
fingerprint is captured, to record browser characteristics for
bot detection analysis.
Args:
job_id: Job UUID
session_fingerprint: Dictionary containing browser fingerprint data:
- user_agent: Browser user agent string
- platform: OS platform
- language: Primary language
- languages: List of accepted languages
- timezone: Timezone string
- screen: {width, height, colorDepth}
- viewport: {width, height}
- webgl_vendor: WebGL vendor string
- webgl_renderer: WebGL renderer string
- canvas_fingerprint: Canvas fingerprint hash
- hardware_concurrency: Number of CPU cores
- device_memory: Device memory in GB
- bot_detection_tests: {webdriver_hidden, chrome_runtime, permissions_query}
- captured_at: ISO timestamp when fingerprint was captured
"""
async with self.pool.acquire() as conn:
await conn.execute("""
UPDATE jobs
SET
session_fingerprint = $2::jsonb,
updated_at = NOW()
WHERE job_id = $1
""", job_id, json.dumps(session_fingerprint))
log.debug(f"Updated session fingerprint for job {job_id}")
async def mark_job_partial(
self,
job_id: UUID,