Wave 3: SSE structured logs, crash analyzer, session fingerprint

- Task #3: Update SSE stream to emit structured log events
  (type: "log" for entries, type: "metrics" every 5s, ?format=legacy for backward compat)
- Task #10: Create crash pattern analyzer module
  (6 patterns: memory_exhaustion, dom_bloat, rate_limited, consent_loop, scroll_timeout, element_stale)
  (confidence scoring, auto-fix params, summarize_crash_patterns for recurring issues)
- Task #13: Capture session fingerprint in backend
  (user_agent, platform, timezone, webgl, canvas, bot_detection_tests)
  (saved on success and failure for debugging)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
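
Note on Task #3: the SSE changes are not part of the hunk shown here, so the following is only a rough sketch of what the two event types might look like on the wire. Only the `type` values ("log", "metrics") come from the commit message; the helper names and every other field (`level`, `message`, `reviews_count`, `memory_mb`) are assumptions made for illustration. The `data:` line followed by a blank line is the standard Server-Sent Events framing. The `?format=legacy` output is not sketched because its shape is not described.

```python
import json
from typing import Any, Dict


def format_sse(payload: Dict[str, Any]) -> str:
    """Frame one JSON payload as an SSE message: a 'data:' line plus a blank line."""
    # json.dumps escapes embedded newlines, so the payload stays on one data line.
    return f"data: {json.dumps(payload)}\n\n"


def log_event(level: str, message: str) -> str:
    """Build a structured log event (field names other than 'type' are assumed)."""
    return format_sse({"type": "log", "level": level, "message": message})


def metrics_event(reviews_count: int, memory_mb: float) -> str:
    """Build a periodic metrics event (metric names are assumed)."""
    return format_sse(
        {"type": "metrics", "reviews_count": reviews_count, "memory_mb": memory_mb}
    )


if __name__ == "__main__":
    # Print one event of each type as it would appear in the stream.
    print(log_event("info", "Scroll started"), end="")
    print(metrics_event(120, 512.5), end="")
```

A client can then branch on the `type` field of each parsed event instead of pattern-matching free-form log text.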
@@ -430,6 +430,47 @@ class DatabaseManager:
 
         log.debug(f"Incremental save: {len(reviews)} reviews for job {job_id}")
 
+    async def update_session_fingerprint(
+        self,
+        job_id: UUID,
+        session_fingerprint: Dict[str, Any]
+    ):
+        """
+        Update the session fingerprint for a job.
+
+        This should be called early in the scraping process after the browser
+        fingerprint is captured, to record browser characteristics for
+        bot detection analysis.
+
+        Args:
+            job_id: Job UUID
+            session_fingerprint: Dictionary containing browser fingerprint data:
+                - user_agent: Browser user agent string
+                - platform: OS platform
+                - language: Primary language
+                - languages: List of accepted languages
+                - timezone: Timezone string
+                - screen: {width, height, colorDepth}
+                - viewport: {width, height}
+                - webgl_vendor: WebGL vendor string
+                - webgl_renderer: WebGL renderer string
+                - canvas_fingerprint: Canvas fingerprint hash
+                - hardware_concurrency: Number of CPU cores
+                - device_memory: Device memory in GB
+                - bot_detection_tests: {webdriver_hidden, chrome_runtime, permissions_query}
+                - captured_at: ISO timestamp when fingerprint was captured
+        """
+        async with self.pool.acquire() as conn:
+            await conn.execute("""
+                UPDATE jobs
+                SET
+                    session_fingerprint = $2::jsonb,
+                    updated_at = NOW()
+                WHERE job_id = $1
+            """, job_id, json.dumps(session_fingerprint))
+
+        log.debug(f"Updated session fingerprint for job {job_id}")
+
     async def mark_job_partial(
         self,
         job_id: UUID,
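
Usage note for `update_session_fingerprint`: a minimal sketch of how a caller might invoke the new method, assuming the pool behaves like an async pool whose `acquire()` yields a connection with an `execute(query, *args)` coroutine. `FakePool`, `FakeConnection`, `sample_fingerprint`, and `demo` are hypothetical stand-ins so the snippet runs without a database, and every fingerprint value is made up. The `DatabaseManager` here is a trimmed copy that holds only the method from this hunk.

```python
import asyncio
import json
from contextlib import asynccontextmanager
from datetime import datetime, timezone
from typing import Any, Dict, List, Tuple
from uuid import UUID, uuid4


class FakeConnection:
    """Hypothetical connection: records calls instead of running SQL."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, tuple]] = []

    async def execute(self, query: str, *args: Any) -> None:
        self.calls.append((query, args))


class FakePool:
    """Hypothetical pool exposing the acquire() context manager the method uses."""

    def __init__(self) -> None:
        self.conn = FakeConnection()

    @asynccontextmanager
    async def acquire(self):
        yield self.conn


class DatabaseManager:
    """Trimmed copy containing only the method added in this commit."""

    def __init__(self, pool: Any) -> None:
        self.pool = pool

    async def update_session_fingerprint(
        self, job_id: UUID, session_fingerprint: Dict[str, Any]
    ):
        async with self.pool.acquire() as conn:
            await conn.execute("""
                UPDATE jobs
                SET
                    session_fingerprint = $2::jsonb,
                    updated_at = NOW()
                WHERE job_id = $1
            """, job_id, json.dumps(session_fingerprint))


def sample_fingerprint() -> Dict[str, Any]:
    """Illustrative values only; the keys follow the docstring in the diff."""
    return {
        "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
        "platform": "Linux x86_64",
        "language": "en-US",
        "languages": ["en-US", "en"],
        "timezone": "Europe/Madrid",
        "screen": {"width": 1920, "height": 1080, "colorDepth": 24},
        "viewport": {"width": 1280, "height": 720},
        "bot_detection_tests": {"webdriver_hidden": True},
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


async def demo():
    """Call the method once and return what was sent to the fake connection."""
    db = DatabaseManager(FakePool())
    job_id = uuid4()
    fingerprint = sample_fingerprint()
    await db.update_session_fingerprint(job_id, fingerprint)
    query, args = db.pool.conn.calls[0]
    return job_id, fingerprint, query, args


if __name__ == "__main__":
    job_id, fingerprint, query, args = asyncio.run(demo())
    print(f"job {job_id}: sent {len(args[1])} bytes of fingerprint JSON")
```

The dictionary is serialized with `json.dumps` and cast with `$2::jsonb` in the query, so callers pass a plain dict and need no driver-level JSON codec.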