Commit Graph

1 Commit

Author SHA1 Message Date
Alejandro Gutiérrez
e2d7f6f118 feat: Add ScraperV1Adapter and real data pipeline test
- Add ScraperV1Adapter to transform scraped reviews into pipeline format
  - Handles relative timestamps (centerDate)
  - Generates deterministic IDs for DOM-sourced reviews
  - Filters out empty (rating-only) reviews
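
For illustration, the three adapter behaviours listed above could look roughly like the sketch below. This is not the actual ScraperV1Adapter code, which the commit message does not show. The field names (`author`, `text`, `rating`, `relative_date`), the helper names, the "N units ago" string format, and the SHA-1-based ID scheme are all assumptions made for the example, as is the reading of centerDate as a single estimated date for a relative timestamp.

```python
import hashlib
from datetime import datetime, timedelta, timezone

# Assumed unit lengths in days; months and years are rough approximations.
_UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def estimate_date(relative: str, scraped_at: datetime) -> datetime:
    """Estimate an absolute date from a string such as '3 months ago'."""
    parts = relative.lower().split()
    count = 1 if parts[0] in ("a", "an") else int(parts[0])
    unit = parts[1].rstrip("s")  # 'weeks' -> 'week'
    return scraped_at - timedelta(days=count * _UNIT_DAYS[unit])


def deterministic_id(author: str, text: str) -> str:
    """Derive a stable ID from review content so re-scrapes map to the same ID."""
    digest = hashlib.sha1(f"{author}\x1f{text}".encode("utf-8")).hexdigest()
    return digest[:16]


def adapt(raw_reviews, scraped_at):
    """Convert scraped reviews to pipeline records, skipping rating-only ones."""
    adapted = []
    for raw in raw_reviews:
        text = (raw.get("text") or "").strip()
        if not text:
            continue  # rating-only review: nothing to analyse
        adapted.append({
            "id": deterministic_id(raw.get("author", ""), text),
            "text": text,
            "rating": raw.get("rating"),
            "date": estimate_date(raw["relative_date"], scraped_at),
        })
    return adapted


if __name__ == "__main__":
    scraped = datetime(2026, 1, 24, tzinfo=timezone.utc)
    sample = [
        {"author": "A", "text": "Great cut", "rating": 5, "relative_date": "2 weeks ago"},
        {"author": "B", "text": "", "rating": 4, "relative_date": "a year ago"},
    ]
    for record in adapt(sample, scraped):
        print(record["id"], record["date"].date(), record["text"])
```

Hashing the content rather than using a counter means the same review gets the same ID on every run, which is one plausible reason to want deterministic IDs for DOM-sourced reviews.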

- Add sample barbershop reviews (79 reviews, 46 with text)
  - Real data from Las Palmas barbershop
  - Multi-language: Spanish, English, German, Norwegian, Italian

- Add test_pipeline_real_data.py for E2E testing with real data
  - Uses mock classifier based on keywords and rating
  - Full pipeline flow: raw -> enriched -> spans -> issues -> facts

Test results with real data:
- 46 reviews processed
- 6 languages detected (es: 35, en: 7, de: 1, no: 1, it: 1, ca: 1)
- 3 issues identified from negative reviews
- 29 fact records aggregated across date range 2017-2025

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 18:35:09 +00:00