Commit Graph

35 Commits

Author SHA1 Message Date
Alejandro Gutiérrez
194e6e0fbf feat: Add view toggle between table and card views on pipeline page
- Add ViewToggle component with table/cards icons
- Default to table view with TanStack table
- Card view shows execution cards in grid layout
- Toggle persists view preference during session

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:19:30 +00:00
Alejandro Gutiérrez
4d48437b21 feat: Add TanStack table for pipeline executions with debug modal
- Create ExecutionsView component with TanStack Table
- Add status filter buttons with count badges
- Add action buttons: Analytics, Metrics, Debug
- Add debug modal with AI copy-paste button for failed executions
- Generate detailed debug report with stage metrics and error context
- Update executions page to use new component

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:16:58 +00:00
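The status filter buttons with count badges in 4d48437b21 depend on a per-status tally. Below is a minimal Python sketch of that counting and filtering logic. The `status` key and the `"all"` bucket are assumptions for illustration, not the ExecutionsView component's actual data shape.

```python
from collections import Counter


def status_counts(executions: list[dict]) -> dict[str, int]:
    """Count executions per status, plus an 'all' total for the default filter.

    Assumes each execution is a dict with a 'status' key (hypothetical shape).
    """
    counts = Counter(e.get("status", "unknown") for e in executions)
    result = {"all": len(executions)}
    result.update(counts)
    return result


def filter_by_status(executions: list[dict], status: str) -> list[dict]:
    """Return the rows shown when a filter button is active ('all' shows everything)."""
    if status == "all":
        return list(executions)
    return [e for e in executions if e.get("status") == status]
```

Each badge reads its number from `status_counts`, and the active button selects rows through `filter_by_status`.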
Alejandro Gutiérrez
796f587c57 feat: Add pipeline execution UI, stage metrics, and API proxy routes
- Add run pipeline page with job selection UI
- Add execution detail page with stage metrics visualization
- Add stage_metrics and total_duration_ms to pipeline.executions table
- Create Next.js API proxy routes for all pipeline endpoints
- Fix trailing slash issues in pipeline-api.ts URLs
- Add Docker volume mounts for pipeline packages
- Add REVIEWIQ_DATABASE_URL and LLM API keys to docker-compose
- Fix JSONB field parsing in execution detail endpoint

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 21:14:27 +00:00
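One common source of JSONB parsing bugs is that some PostgreSQL drivers (asyncpg, for example) return `json`/`jsonb` columns as strings unless a type codec is registered. The sketch below assumes that this was the cause of the bug fixed in 796f587c57, which the commit does not state. It normalizes `stage_metrics` before the execution detail is returned. The helper names and the per-stage `duration_ms` field are hypothetical.

```python
import json
from typing import Any


def parse_jsonb(value: Any, default: Any = None) -> Any:
    """Return a JSONB value as Python data, whether the driver gave a str or a dict."""
    if value is None:
        return default
    if isinstance(value, (str, bytes, bytearray)):
        return json.loads(value)
    return value  # already decoded (dict or list)


def execution_row_to_dict(row: dict) -> dict:
    """Normalize an executions row; 'stage_metrics' is assumed to be a list of stage dicts."""
    stages = parse_jsonb(row.get("stage_metrics"), default=[])
    out = dict(row)
    out["stage_metrics"] = stages
    # Hypothetical fallback: derive the total from per-stage durations if the column is empty.
    if out.get("total_duration_ms") is None:
        out["total_duration_ms"] = sum(s.get("duration_ms", 0) for s in stages)
    return out
```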
Alejandro Gutiérrez
acdfed8044 fix: Improve version dropdown text contrast
Added text-gray-900 and font-medium classes to select element
for better readability.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 20:22:05 +00:00
Alejandro Gutiérrez
9f714913db feat: Add scraper version selector to frontend
- Add version selector dropdown in scrape confirmation modal
- Default to v1.1.0 (Multi-Sort), which bypasses the ~1000-review limit
- Pass scraper_version through API proxy to backend
- Update /new page fallback to show v1.1.0 as available
- Show version description explaining multi-sort benefits

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:13:52 +00:00
Alejandro Gutiérrez
824634aa76 feat: Add extensible multi-pipeline integration system
This commit implements a plugin-like pipeline architecture with:

Pipeline Core Package (packages/pipeline-core/):
- BasePipeline abstract class that all pipelines implement
- PipelineRegistry for database-backed discovery/management
- PipelineRunner for execution with status tracking
- DashboardConfig contracts for dynamic widget definitions

Database Migration (006_pipeline_registry.sql):
- pipeline.registry table for registered pipelines
- pipeline.executions table for execution history
- Views for execution stats and monitoring

ReviewIQ Pipeline Refactor:
- Implements BasePipeline interface
- Adds get_dashboard_config() with widget definitions
- Adds get_widget_data() methods for all dashboard widgets
- Maintains backward compatibility via a Pipeline alias

Generic Pipeline API (api/routes/pipelines.py):
- GET /api/pipelines - List all registered pipelines
- GET /api/pipelines/{id} - Pipeline details
- POST /api/pipelines/{id}/execute - Execute pipeline
- GET /api/pipelines/{id}/dashboard - Dashboard config
- GET /api/pipelines/{id}/widgets/{w} - Widget data
- GET /api/pipelines/{id}/executions - Execution history

Frontend Dynamic Dashboard System:
- DynamicDashboard component renders from config
- WidgetRegistry maps types to components
- Widget components: StatCard, LineChart, BarChart,
  PieChart, DataTable, Heatmap
- Pipeline API client library

Frontend Pipeline Pages:
- /pipelines - List all registered pipelines
- /pipelines/[id] - Dynamic dashboard for pipeline
- /pipelines/[id]/executions - Execution history
- Pipelines nav item in Sidebar

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:05:38 +00:00
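The plugin contract in 824634aa76 can be pictured as an abstract base class plus a registry keyed by pipeline id. This is a minimal sketch, not the pipeline-core code. `get_dashboard_config()` and `get_widget_data()` come from the commit message. The `execute` method name, the `pipeline_id` attribute, and `EchoPipeline` are hypothetical. The registry here is in-memory, whereas the commit describes a database-backed one.

```python
from abc import ABC, abstractmethod
from typing import Any


class BasePipeline(ABC):
    """Contract every pipeline implements (method set partly hypothetical)."""

    pipeline_id: str = ""

    @abstractmethod
    def execute(self, params: dict[str, Any]) -> dict[str, Any]:
        """Run the pipeline and return a result summary."""

    @abstractmethod
    def get_dashboard_config(self) -> dict[str, Any]:
        """Describe the widgets the frontend should render."""

    @abstractmethod
    def get_widget_data(self, widget_id: str) -> Any:
        """Return data for one widget declared in the dashboard config."""


class PipelineRegistry:
    """In-memory stand-in for the database-backed pipeline.registry table."""

    def __init__(self) -> None:
        self._pipelines: dict[str, BasePipeline] = {}

    def register(self, pipeline: BasePipeline) -> None:
        if pipeline.pipeline_id in self._pipelines:
            raise ValueError(f"duplicate pipeline id: {pipeline.pipeline_id}")
        self._pipelines[pipeline.pipeline_id] = pipeline

    def get(self, pipeline_id: str) -> BasePipeline:
        return self._pipelines[pipeline_id]

    def list_ids(self) -> list[str]:
        return sorted(self._pipelines)


class EchoPipeline(BasePipeline):
    """Toy pipeline used only to exercise the contract."""

    pipeline_id = "echo"

    def execute(self, params: dict[str, Any]) -> dict[str, Any]:
        return {"status": "completed", "echo": params}

    def get_dashboard_config(self) -> dict[str, Any]:
        return {"widgets": [{"id": "runs", "type": "StatCard"}]}

    def get_widget_data(self, widget_id: str) -> Any:
        if widget_id != "runs":
            raise KeyError(widget_id)
        return {"value": 0}
```

The generic API routes map cleanly onto this shape. `GET /api/pipelines` corresponds to `list_ids()`, `/dashboard` to `get_dashboard_config()`, and `/widgets/{w}` to `get_widget_data(w)`.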
Alejandro Gutiérrez
65eb979c12 feat: Add "Copy Crash Report" button for failed/partial jobs
- Generate structured markdown crash report optimized for Claude
- Includes: job metadata, timeline, progress, error, logs (last 50)
- Adds context and suggested investigation steps
- Orange clipboard button appears for failed/partial jobs
- Shows green checkmark briefly after successful copy
- Fetches logs asynchronously when generating the report

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 17:09:48 +00:00
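The crash report in 65eb979c12 is a markdown document built from job metadata plus the last 50 log lines. Below is a Python sketch of that assembly. The `job` field names and the two investigation hints are hypothetical placeholders, not the report's actual content.

```python
def build_crash_report(job: dict, logs: list[str], max_logs: int = 50) -> str:
    """Assemble a markdown crash report; field names in `job` are hypothetical."""
    tail = logs[-max_logs:] if max_logs > 0 else []
    lines = [
        f"# Crash report: job {job.get('id', '?')}",
        "",
        "## Metadata",
        f"- Status: {job.get('status', 'unknown')}",
        f"- URL: {job.get('url', 'n/a')}",
        "",
        "## Timeline",
        f"- Started: {job.get('started_at', 'n/a')}",
        f"- Last update: {job.get('updated_at', 'n/a')}",
        "",
        "## Progress",
        f"- Reviews scraped: {job.get('reviews_count', 0)}",
        "",
        "## Error",
        "```",
        str(job.get("error", "none")),
        "```",
        "",
        f"## Logs (last {len(tail)})",
        "```",
        *tail,
        "```",
        "",
        "## Suggested investigation",
        "- Check whether the error repeats at the same scroll position.",
        "- Compare memory and DOM metrics just before the failure.",
    ]
    return "\n".join(lines)
```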
Alejandro Gutiérrez
c2996bef1e fix: Calculate job speed using last successful data retrieval timestamp
- Use updated_at (last successful data loop) instead of Date.now()
- Speed now reflects the actual data retrieval rate instead of declining over time
- Updated in table column, monitored job view, and stats row
- Fall back to Date.now() if updated_at is not available

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 17:04:35 +00:00
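The speed fix in c2996bef1e changes only the end of the measured interval. Dividing by the time up to `updated_at` keeps the rate stable while the job sits between loops. Using `now` instead would make the rate shrink as wall-clock time passes. This is a minimal Python sketch of that formula. The function name and the reviews-per-minute unit are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def reviews_per_minute(
    reviews: int,
    started_at: datetime,
    updated_at: Optional[datetime],
    now: Optional[datetime] = None,
) -> float:
    """Rate measured up to the last successful data loop, falling back to 'now'."""
    end = updated_at or now or datetime.now(timezone.utc)
    elapsed_min = (end - started_at).total_seconds() / 60
    if elapsed_min <= 0:
        return 0.0
    return reviews / elapsed_min
```

For example, suppose 100 reviews were retrieved in the first 2 minutes and the job has been idle since. The rate stays at 50 reviews/min, while the old `Date.now()` approach would report 10 reviews/min at the 10-minute mark.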
Alejandro Gutiérrez
5165d65152 fix: Center confirmation modal using transform
- Use fixed positioning with top/left 50% and translate -50%
- More reliable centering regardless of parent containers
- Add max-width for mobile responsiveness

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:50:08 +00:00
Alejandro Gutiérrez
83b245bbfc fix: Show blue background with spinner during validation
- Keep blue background when isCheckingReviews is true
- Add cursor-wait during validation
- Move disabled styling to explicit condition check
- White spinner is now visible on the blue background

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:49:35 +00:00
Alejandro Gutiérrez
e0e86d2830 feat: Persist jobs to localStorage and reset search after launch
- Reset search fields after job is successfully launched
- Allow user to immediately start another scrape
- Save active jobs to localStorage for persistence across refresh
- Restore jobs from localStorage on page load
- Resume polling for non-terminal jobs (pending/running)
- Filter out jobs older than 24 hours
- Add remove button (X) to each job card
- Clean up localStorage when jobs are removed

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:47:01 +00:00
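The restore step in e0e86d2830 runs in the browser, but its rules are easy to check in isolation. Jobs older than 24 hours are dropped, and polling resumes for pending/running jobs. In the Python sketch below, a JSON string stands in for the localStorage value. The `created_at` (epoch milliseconds), `status`, and `id` field names are assumptions.

```python
import json
import time
from typing import Optional

MAX_AGE_MS = 24 * 60 * 60 * 1000  # drop jobs older than 24 hours
ACTIVE = {"pending", "running"}  # non-terminal statuses that resume polling


def restore_jobs(raw: Optional[str], now_ms: Optional[int] = None) -> tuple[list[dict], list[str]]:
    """Return (jobs to display, ids whose polling should resume)."""
    if not raw:
        return [], []
    try:
        jobs = json.loads(raw)
    except json.JSONDecodeError:
        return [], []  # corrupt storage: start fresh rather than crash
    now_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    fresh = [j for j in jobs if now_ms - j.get("created_at", 0) <= MAX_AGE_MS]
    to_poll = [j["id"] for j in fresh if j.get("status") in ACTIVE]
    return fresh, to_poll
```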
Alejandro Gutiérrez
0c8da54045 fix: Center confirmation modal properly
- Remove w-full that caused alignment issues
- Use fixed width (400px) for consistent centering

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:40:54 +00:00
Alejandro Gutiérrez
ccfe00cebe fix: Properly center map click modal
- Remove w-full and mx-auto that caused alignment issues
- Use fixed width (280px) instead of max-w-xs
- Let flex container handle centering

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:40:12 +00:00
Alejandro Gutiérrez
956d5dacda fix: Center map click modal with proper padding
- Center modal properly within map preview area
- Add 24px padding from map edges
- Make modal more compact (max-w-xs)
- Reduce text and element sizes for better fit

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:38:49 +00:00
Alejandro Gutiérrez
d4c3018429 refactor: Change search fields to horizontal layout
- Place Business Name, Location, and Validate button in same row
- Reduce padding and font sizes for compact inline layout
- Show abbreviated text on mobile (responsive)
- Use checkmark indicator for auto-detected location

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:37:08 +00:00
Alejandro Gutiérrez
82b2c51e4e feat: Split search into Business Name + Location fields
- Split single search input into two fields: Business Name (required)
  and Location (auto-detected from IP geolocation)
- Auto-fill location field with city/country from IP on page load
- Add click overlay on map iframe to prevent interaction
- Add warning modal when user clicks map, directing them to use search
- Update test URLs to use split format
- Make Validate button full-width for better UX

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:35:15 +00:00
Alejandro Gutiérrez
afab5127b3 Restore Google Maps iframe preview
- Restore original Google Maps embed iframe approach
- URL: maps.google.com/maps?q=...&output=embed&z=15
- Add "Open in Maps" overlay button on the map
- Set map height to 300px for better visibility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:29:33 +00:00
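The embed URL in afab5127b3 carries three query parameters: `q`, `output=embed`, and `z=15`. Encoding them with the standard library avoids hand-escaping names that contain commas, accents, or ampersands. This is a sketch under two assumptions. First, the Business Name and Location fields from 82b2c51e4e are joined with a comma into `q`. Second, an `https` scheme is used, although the commit omits the scheme.

```python
from urllib.parse import urlencode


def maps_embed_url(business_name: str, location: str = "", zoom: int = 15) -> str:
    """Build an embed URL of the form maps.google.com/maps?q=...&output=embed&z=15."""
    # Assumption: name and location are combined into a single query string.
    query = ", ".join(part.strip() for part in (business_name, location) if part.strip())
    return "https://maps.google.com/maps?" + urlencode({"q": query, "output": "embed", "z": zoom})
```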
Alejandro Gutiérrez
43fd1515d2 Align artifacts with canonical URT v5.1 specification
Fixes inconsistencies discovered during an audit against urt-taxonomy/:

- urt_profile ENUM: Add 'lite' and 'core' profiles (was missing)
- USN format: Use canonical regex from spec (was non-compliant)
- USN valence encoding: Add V0 (0) and V± (±) support
- USN grammar: Add Lite (URT:L:) and Core (URT:C:) formats
- Dimension codes: Fix temporal (TC/TR/TH/TF), evidence (ES/EI/EC),
  comparative (CR-N/CR-B/CR-W/CR-S) in decisions doc
- LLM contract: Full USN regex validation pattern

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:21:21 +00:00
Alejandro Gutiérrez
7666b7aea2 Fix: Replace broken Google Maps iframe with interactive preview + add scraper type selection
- Replace non-working Google Maps embed iframe with animated location preview
- Add "Open in Google Maps" button to open location in new tab
- Add scraper type selection dropdown fetching from /api/admin/scrapers
- Show selected scraper info with formatted labels (Google Reviews v1.0.0)
- Include scraper_version and scraper_variant in job submission

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:15:58 +00:00
Alejandro Gutiérrez
3317553658 Wire frontend to real API endpoints
Dashboard page:
- Fetch top clients from /api/dashboard/by-client
- Show loading state while fetching
- Display empty state when no client data
- Show real client_id, job count, and success rate

Scrapers page:
- Fetch versions from /api/admin/scrapers
- Wire promote/deprecate buttons to real API calls
- Wire add version form to POST /api/admin/scrapers
- Wire traffic allocation to PUT /api/admin/scrapers/{id}/traffic
- Add loading and error states

Dockerfile:
- Add COPY commands for new directories (api/, core/, scrapers/, etc.)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 16:05:29 +00:00
Alejandro Gutiérrez
39c80fc8be Phases 5-7: Dashboard UI, Admin API, and Auth middleware
Phase 5 - Main Dashboard:
- Dashboard overview page with system health stats
- Jobs by status breakdown, success rates, top clients
- Dashboard API (/api/dashboard/overview, by-client, problems, by-version)

Phase 6 - Admin/Scraper Management:
- Scrapers management page with traffic allocation UI
- Admin API for scraper CRUD operations
- Traffic percentage updates for A/B testing
- Promote/deprecate scraper versions

Phase 7 - Authentication:
- API key authentication middleware
- SHA-256 key hashing (keys never stored in plain text)
- Scope-based authorization (jobs:read, jobs:write, admin)
- Rate limiting per API key

Also:
- Updated api_server_production.py to include new routers
- Extended core/database.py with dashboard query methods
- Added dashboard link to sidebar navigation
- Updated CONTEXT-KEEPER.md to mark all phases complete

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:43:00 +00:00
Alejandro Gutiérrez
12d37e350b Fix JobDevTools contrast + log normalization, add Platform Spec
- Fix contrast issues in JobDevTools (level badges, text colors, timestamps)
- Make log normalization more robust (handles old/new formats, edge cases)
- Add ReviewIQ Platform Spec v1.2 defining:
  - Multi-tenant scraping-as-a-service architecture
  - Requester metadata, batches, webhooks, priority
  - Scraper versioning with A/B testing (stable/beta/canary)
  - API endpoints for job types, dashboard, admin
  - Output schemas for external service integration
  - Project structure reorganization plan

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 15:13:19 +00:00
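The A/B testing scheme in the Platform Spec (stable/beta/canary) implies routing each job to a scraper version in proportion to its traffic percentage. This Python sketch shows one way to do that with a cumulative-percentage roll. The function name and the requirement that the percentages sum to 100 are assumptions, not taken from the spec.

```python
import random
from typing import Optional


def pick_version(allocation: dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Choose a scraper version in proportion to its traffic percentage.

    `allocation` maps version labels (e.g. stable/beta/canary) to integer percentages.
    """
    total = sum(allocation.values())
    if total != 100 or any(p < 0 for p in allocation.values()):
        raise ValueError(f"traffic percentages must be non-negative and sum to 100, got {total}")
    rng = rng or random.Random()
    roll = rng.randrange(100)  # uniform integer in 0..99
    cumulative = 0
    for version, pct in allocation.items():
        cumulative += pct
        if roll < cumulative:
            return version
    raise AssertionError("unreachable when percentages sum to 100")
```

Passing a seeded `random.Random` makes routing reproducible in tests. A version at 0% is never chosen.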
Alejandro Gutiérrez
1e5401a9d1 Fix: Handle undefined rating_snapshot in job detail page
2026-01-24 13:15:14 +00:00
Alejandro Gutiérrez
eab0b4a7e9 Fix: Maximum update depth exceeded in NewScrapePage
Wrap handleJobsChange in useCallback to prevent infinite re-renders
caused by the onJobsChange dependency changing on every render.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 13:14:23 +00:00
Alejandro Gutiérrez
cd9639f3b1 Wave 7: Integrate JobDevTools into job detail page (FINAL)
- Task #18: Complete integration of all JobDevTools components
  - Updated job detail page (/jobs/[id]) with full JobDevTools UI
  - Connected SSE stream for real-time structured logs + metrics
  - Added crash-report and retry API routes for Next.js
  - Added format conversion for old/new log formats
  - Added DevTools links to JobsView modal and actions column
  - Wired up CrashReport retry with auto-fix parameters
  - Integrated SessionPanel for fingerprint display
  - Integrated MetricsDashboard for real-time charts

Job DevTools implementation complete: 18/18 tasks

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 13:11:19 +00:00
Alejandro Gutiérrez
c6443166b2 Wave 6: CopyToolbar utilities and LogEntry row component
- Task #7: Create CopyToolbar and copy utilities
  (copy-utils.ts with text/JSON/CSV formatting, clipboard API with fallback)
  (CopyToolbar with copy all/selected, format dropdown, download export)
- Task #8: Create LogEntry row component
  (click-to-copy with visual feedback, expandable metrics view)
  (level/category badges, search highlighting, shift+click selection)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 12:51:48 +00:00
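The copy utilities in c6443166b2 serialize the same log entries into text, JSON, or CSV. Below is a Python sketch of those three formatters. The entry keys (`timestamp`, `level`, `category`, `message`) and the text layout are assumptions. The CSV path uses the standard `csv` module so that messages containing commas or quotes are escaped correctly.

```python
import csv
import io
import json


def format_logs(entries: list[dict], fmt: str = "text") -> str:
    """Serialize log entries for the clipboard or a download.

    Entry keys (timestamp, level, category, message) are assumed for illustration.
    """
    fields = ["timestamp", "level", "category", "message"]
    if fmt == "json":
        return json.dumps(entries, indent=2, ensure_ascii=False)
    if fmt == "csv":
        buf = io.StringIO()
        # extrasaction="ignore" drops extra keys such as attached metrics.
        writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        writer.writerows(entries)
        return buf.getvalue()
    if fmt == "text":
        return "\n".join(
            f"{e.get('timestamp', '')} [{e.get('level', '').upper()}] "
            f"({e.get('category', '')}) {e.get('message', '')}"
            for e in entries
        )
    raise ValueError(f"unknown format: {fmt}")
```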
Alejandro Gutiérrez
5ce3248efd Wave 5: LogViewer virtualized list and CrashReport component
- Task #6: Create LogViewer with react-window virtualization
  (search with highlighting, auto-scroll toggle, timestamp format toggle)
  (shift+click range selection, level/category color badges)
- Task #12: Create CrashReport frontend component
  (crash timeline SVG, pattern analysis with confidence bar)
  (auto-fix params display, retry API integration)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 12:44:35 +00:00
Alejandro Gutiérrez
2637d982e0 Wave 4: JobDevTools UI components and crash report API
- Task #5: Create JobDevTools container component
  (tabs: All/Scraper/Browser/Network/System, level filters, count badges)
- Task #11: Add crash report API endpoints
  (GET /jobs/{id}/crash-report, POST /jobs/{id}/retry?apply_fix=true, GET /crashes/stats)
- Task #14: Create SessionPanel component
  (fingerprint display, bot detection indicators, collapsible sections)
- Task #15: Create MetricsDashboard with recharts
  (extraction rate, cumulative reviews, memory usage, scroll progress)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 12:37:56 +00:00
Alejandro Gutiérrez
9e1bcde981 Wave 2: Migrate scraper to StructuredLogger, add crash detection & topic tags
- Task #2: Migrate scraper_clean.py to use StructuredLogger with categories
  (37 log calls with metrics across browser/scraper/network/system)
- Task #4: Add crash_reports table schema and database methods
  (save_crash_report, get_crash_report, get_crash_stats)
- Task #9: Implement crash detection wrapper with metrics sampling
  (get_chrome_memory, get_dom_node_count, classify_crash)
- Task #17: Add topic tags to frontend ReviewAnalytics
  (topic filter UI, tags on cards, topics in modal)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 12:17:23 +00:00
Alejandro Gutiérrez
b1296059a9 Add URL-based routing with sidebar navigation
Replace client-side state switching with proper Next.js routes:
- /new - New scrape form
- /jobs - Jobs list with table view
- /jobs/[id] - Individual job details and logs
- /analytics - Analytics overview (completed jobs)
- /analytics/[id] - Analytics for specific job

Add JobsContext for shared state across routes. Update Sidebar
to use next/link with pathname matching. Root page redirects to /new.

Also adds partial job status styling to JobsView.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 10:58:48 +00:00
Alejandro Gutiérrez
a540ab97b1 Add browser fingerprint support and analytics metadata display
- Transfer user's browser fingerprint (user-agent, viewport, timezone,
  language, geolocation) to Chrome for more authentic scraping
- Display review topics from Google Maps in analytics dashboard
- Show business category badge in analytics header
- Fix date_text null handling in analytics (handle undefined/timestamp fields)
- Add review_topics and business_category to JobStatus interface

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 10:36:06 +00:00
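The fingerprint transfer in a540ab97b1 covers user-agent, viewport, timezone, language, and geolocation. This sketch shows one way to split those between Chrome command-line flags and Chrome DevTools Protocol commands. It may not match what the scraper actually does. The flags (`--user-agent`, `--window-size`, `--lang`) and the CDP methods (`Emulation.setTimezoneOverride`, `Emulation.setGeolocationOverride`) are real. The fingerprint dict's keys and the default accuracy of 100 are hypothetical. `--window-size` sets the window size, so it only approximates the viewport.

```python
def fingerprint_to_chrome(fp: dict) -> tuple[list[str], list[tuple[str, dict]]]:
    """Translate a browser fingerprint into Chrome flags plus CDP commands.

    Returns (command-line args, [(cdp_method, params), ...]). The fingerprint
    keys used here are hypothetical.
    """
    args: list[str] = []
    cdp: list[tuple[str, dict]] = []
    if ua := fp.get("user_agent"):
        args.append(f"--user-agent={ua}")
    if viewport := fp.get("viewport"):
        args.append(f"--window-size={viewport['width']},{viewport['height']}")
    if lang := fp.get("language"):
        args.append(f"--lang={lang}")
    if tz := fp.get("timezone"):
        cdp.append(("Emulation.setTimezoneOverride", {"timezoneId": tz}))
    if geo := fp.get("geolocation"):
        cdp.append((
            "Emulation.setGeolocationOverride",
            {"latitude": geo["lat"], "longitude": geo["lng"], "accuracy": geo.get("accuracy", 100)},
        ))
    return args, cdp
```

With Selenium, each flag can be passed to `ChromeOptions.add_argument`, and each CDP pair can be sent with `driver.execute_cdp_cmd(method, params)` after the driver starts.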
Alejandro Gutiérrez
47bb032011 Clean up project root - remove 51 obsolete files
Deleted:
- 26 old markdown summary/documentation files
- 16 debug/test Python scripts (debug_*, test_*, diagnose_*)
- 10 untracked JSON files from api_response_samples
- terms-of-usage.md, pane_not_found.png

Also includes pending web app changes:
- Jobs management UI (JobsView, Sidebar components)
- API routes for job streaming and comparison
- Enhanced ReviewAnalytics and ScraperTest components

Final clean structure:
├── api_server_production.py  (main entry)
├── modules/                  (core Python)
├── web/                      (Next.js frontend)
├── tests/                    (test suite)
├── docs/                     (documentation)
└── examples/                 (usage examples)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-23 17:31:53 +00:00
Alejandro Gutiérrez
01ea18d91d Add test URL quick-select buttons to frontend
- Small (~79 reviews): R. Fleitas Peluqueros
- Medium (~589 reviews): ClickRent Gran Canaria
- Large (~2000+ reviews): Hospital Doctor Negrín

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-22 14:20:54 +00:00
Alejandro Gutiérrez
c8c24ae483 Add robust structural pattern matching and early no-reviews detection
BREAKING IMPROVEMENTS:

1. Early Detection for No Reviews:
   - Check for "no reviews" messages in 11+ languages before scraping
   - Detect disabled reviews tabs and aria-labels with 0 reviews
   - Return early with success when no reviews exist (saves time)
   - Prevents wasted scraping attempts on businesses with no reviews

2. Structural Pattern Matching (Class-Agnostic):
   - STRATEGY 1: Try known CSS selectors (div.jftiEf.fontBodyMedium, etc.)
   - STRATEGY 2: Structural matching - find containers with review-like structure
     * Looks for elements containing: author + rating + text + date
     * Treats elements with 3+ review indicators as reviews (robust, works across layouts)
   - STRATEGY 3: Use role="article" with review content detection
   - Falls back through strategies automatically

3. Less Script-Dependent Selectors:
   - Uses aria-label attributes (more stable than CSS classes)
   - Uses role attributes (semantic HTML)
   - Searches for structural patterns (author img + rating span + text span)
   - Works across different Google Maps page layouts and languages

4. Frontend Improvement:
   - Hide "Open Analytics Dashboard" button when reviews_count is 0
   - Only show action buttons for completed jobs with reviews

TECHNICAL DETAILS:

Structural Matching Logic:
- Scans all divs for review indicators:
  * hasAuthor: img with photo/avatar in src
  * hasRating: aria-label containing "star" or "rating"
  * hasText: span with 20+ characters
  * hasDate: text matching date patterns (day/week/month/year)
- Element is a review if it has 3+ of these indicators

Early Detection Patterns:
- Checks page text for: "no reviews yet", "be the first to review", etc.
- Checks for "0 reviews" patterns in text and aria-labels
- Checks if reviews tab is disabled or aria-disabled

Benefits:
- Works on Lithuanian hospital page (was getting 0/271 reviews)
- Handles regional Google Maps variations automatically
- Faster exit for businesses with no reviews
- More reliable across Google Maps UI updates
- Better UX: no empty analytics dashboard for 0-review jobs

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 19:52:39 +00:00
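The structural matching and early-detection rules in c8c24ae483 reduce to two small predicates. An element counts as a review if it shows at least 3 of the 4 indicators (author image, rating label, 20+ character text, date phrase). A page has no reviews if a "no reviews" phrase, a "0 reviews" count, or a disabled reviews tab is found. The real checks run as in-page JavaScript. This Python sketch models only the scoring. It uses a hypothetical simplified element shape and only the two English phrases quoted in the commit, not the 11+ languages.

```python
import re

DATE_PATTERN = re.compile(r"\b(day|week|month|year)s?\b", re.IGNORECASE)
NO_REVIEW_PHRASES = ("no reviews yet", "be the first to review")  # English subset only
ZERO_REVIEWS = re.compile(r"\b0\s+reviews?\b", re.IGNORECASE)


def review_indicators(el: dict) -> int:
    """Count review indicators on a simplified element description (hypothetical shape)."""
    has_author = any("photo" in src or "avatar" in src for src in el.get("img_srcs", []))
    label = el.get("aria_label", "").lower()
    has_rating = "star" in label or "rating" in label
    has_text = any(len(t) >= 20 for t in el.get("span_texts", []))
    has_date = bool(DATE_PATTERN.search(el.get("text", "")))
    return sum((has_author, has_rating, has_text, has_date))


def is_review(el: dict, threshold: int = 3) -> bool:
    """An element is a review if it shows at least `threshold` indicators."""
    return review_indicators(el) >= threshold


def has_no_reviews(page_text: str, tab_disabled: bool = False) -> bool:
    """Early exit check before scraping starts."""
    text = page_text.lower()
    return tab_disabled or any(p in text for p in NO_REVIEW_PHRASES) or bool(ZERO_REVIEWS.search(text))
```

The 3-of-4 threshold tolerates one missing signal, such as a reviewer without a profile photo, without accepting rating-only widgets elsewhere on the page.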
Alejandro Gutiérrez
faa0704737 Optimize scraper performance and add fallback selectors for robustness
Performance improvements:
- Validation time: 59.71s → 10.96s (~5.4x faster)
- Removed 50+ console.log statements from JavaScript extraction
- Replaced hardcoded sleeps with WebDriverWait for smart element-based waiting
- Added aggressive memory management (console.clear, GC, image unloading every 20 scrolls)

Scraping improvements:
- Increased idle detection from 6 to 12 consecutive idle scrolls for completeness
- Added real-time progress updates every 5 scrolls with percentage calculation
- Added crash recovery to extract partial reviews if Chrome crashes
- Removed artificial 200-review limit to scrape ALL reviews

Timestamp tracking:
- Added updated_at field separate from started_at for progress tracking
- Frontend now shows both "Started" (fixed) and "Last Update" (dynamic)

Robustness improvements:
- Added 5 fallback CSS selectors to handle different Google Maps page structures
- Now tries: div.jftiEf.fontBodyMedium, div.jftiEf, div[data-review-id], etc.
- Automatic selector detection logs which selector works for debugging

Test results:
- Successfully scraped 550 reviews in 150.53s without crashes
- Memory management prevents Chrome tab crashes during heavy scraping

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-18 19:49:24 +00:00
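The scroll loop in faa0704737 combines three counters. It stops after 12 consecutive idle scrolls, reports progress every 5 scrolls, and runs memory cleanup every 20 scrolls. It also tries fallback selectors in order. The Python sketch below models that control flow with injected callbacks so it can run without a browser. The callback names are hypothetical. The selector list holds only the three selectors the commit names before "etc.".

```python
from typing import Callable, Optional

FALLBACK_SELECTORS = [
    "div.jftiEf.fontBodyMedium",
    "div.jftiEf",
    "div[data-review-id]",
]  # the three selectors named in the commit; the remaining ones are not listed there


def pick_selector(count_matches: Callable[[str], int]) -> Optional[str]:
    """Return the first selector that matches anything on the page."""
    for sel in FALLBACK_SELECTORS:
        if count_matches(sel) > 0:
            return sel
    return None


def scroll_until_idle(
    scroll_once: Callable[[], int],
    max_idle: int = 12,
    progress_every: int = 5,
    cleanup_every: int = 20,
    on_progress: Callable[[int, int], None] = lambda scrolls, n: None,
    cleanup: Callable[[], None] = lambda: None,
) -> int:
    """Scroll until `max_idle` consecutive scrolls add no new reviews.

    `scroll_once` returns the total number of reviews loaded so far.
    """
    seen, idle, scrolls = 0, 0, 0
    while idle < max_idle:
        total = scroll_once()
        scrolls += 1
        idle = idle + 1 if total <= seen else 0  # reset on any new reviews
        seen = max(seen, total)
        if scrolls % progress_every == 0:
            on_progress(scrolls, seen)
        if scrolls % cleanup_every == 0:
            cleanup()  # e.g. clear console, unload images
    return seen
```

Raising `max_idle` from 6 to 12 trades a few extra seconds at the end of each job for fewer early exits when Google Maps loads the next batch slowly.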