Add API interception for hybrid scraping and update selectors

- Add new api_interceptor.py module for CDP network interception
- Capture Google Maps internal API responses during scrolling
- Parse protobuf-like JSON responses to extract review data
- Merge API-captured reviews with DOM-scraped data
- Update CSS selectors for January 2026 Google Maps structure
- Add cookie consent dismissal for multiple languages
- Add --api-intercept CLI flag and config option
- Fix review card and pane selectors (.jftiEf, .XiKgde)
- Improve review ID extraction from card elements

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit is contained in:
Alejandro Gutiérrez
2026-01-17 21:51:10 +00:00
parent 262f0c0be7
commit bdffb5eaac
5 changed files with 782 additions and 36 deletions

View File

@@ -57,6 +57,10 @@ def parse_arguments():
ap.add_argument("--custom-params", type=str, default=None,
help="JSON string with custom parameters to add to each document (e.g. '{\"company\":\"Thaitours\"}')")
# API interception option
ap.add_argument("--api-intercept", action="store_true", dest="enable_api_intercept",
help="enable API response interception for faster data capture (experimental)")
args = ap.parse_args()
# Handle config path