- Reject authors with <= 3 chars (language codes like "es", "it", "no")
- Reject known non-review authors ("google", "maps", etc.)
- Reject timestamps that are URLs or very short strings
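The rejection rules above could be sketched in Python roughly as follows. The function name `is_plausible_review`, the `NON_REVIEW_AUTHORS` set (the commit names only "google" and "maps"; the rest of the list is not shown), and the `min_timestamp_len` cutoff for "very short" are assumptions for illustration, not the committed code.

```python
# Hypothetical blocklist: only "google" and "maps" are named in the commit.
NON_REVIEW_AUTHORS = {"google", "maps"}


def is_plausible_review(author: str, timestamp: str, min_timestamp_len: int = 4) -> bool:
    """Return True if the (author, timestamp) pair looks like a real review."""
    author = author.strip()
    # Authors of 3 characters or fewer are usually language codes ("es", "it", "no").
    if len(author) <= 3:
        return False
    # Known UI labels that are not reviewers.
    if author.lower() in NON_REVIEW_AUTHORS:
        return False
    ts = timestamp.strip()
    # A URL in the timestamp slot means the wrong element was parsed.
    if ts.lower().startswith(("http://", "https://")):
        return False
    # Assumed cutoff for "very short"; the real threshold is not stated.
    if len(ts) < min_timestamp_len:
        return False
    return True
```

Calling `is_plausible_review("Jane Doe", "2 weeks ago")` passes all of these checks, while an author of `"es"` or a timestamp that is a URL is rejected.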
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Use div.jftiEf[data-review-id] selector to exclude button elements
- Reload original URL after consent (prevents URL corruption)
- Parse full DOM data after scrolling stops
- Deduplicate API reviews by author match
- Remove slow "More" button clicking for speed
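As an illustration of the author-match deduplication, here is a minimal sketch. It assumes reviews are dicts with an "author" key and that DOM-parsed reviews take priority over API-captured ones; neither detail is confirmed by the commit, and `merge_reviews` is a hypothetical name.

```python
def _author_key(review: dict) -> str:
    # Normalize case and whitespace so "Jane  Doe" and "jane doe" match.
    return " ".join(review.get("author", "").lower().split())


def merge_reviews(dom_reviews: list, api_reviews: list) -> list:
    """Append API reviews whose author is not already present in the DOM set."""
    seen = {_author_key(r) for r in dom_reviews}
    merged = list(dom_reviews)
    for review in api_reviews:
        key = _author_key(review)
        if key in seen:
            continue  # Same author already collected; skip the duplicate.
        seen.add(key)
        merged.append(review)
    return merged
```

Matching on author alone is cheap but would merge two different reviewers who share a name, so a stricter key (for example author plus review ID) may be preferable where IDs are available.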
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Key improvements:
- Background thread scrolling at 10 Hz (0.1 s intervals) for smooth continuous scroll
- JavaScript-based review ID collection (doesn't affect scroll position)
- API interception via injected fetch/XHR interceptor
- Total review count extraction from page
- Auto-stop when all reviews collected or timeout reached
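One possible shape for the injected fetch/XHR interceptor is sketched below. The global `window.__capturedResponses`, the helper names `install_interceptor` and `drain_captured`, and the decision to capture every response rather than filter by URL are all assumptions; the only driver call used is `execute_script`.

```python
# JavaScript injected into the page; wraps fetch and XMLHttpRequest once.
INTERCEPTOR_JS = """
if (!window.__capturedResponses) {
  window.__capturedResponses = [];
  const origFetch = window.fetch;
  window.fetch = async function (...args) {
    const response = await origFetch.apply(this, args);
    // Read a clone so the page can still consume the original body.
    response.clone().text()
      .then(body => window.__capturedResponses.push({url: response.url, body: body}))
      .catch(() => {});
    return response;
  };
  const origOpen = XMLHttpRequest.prototype.open;
  XMLHttpRequest.prototype.open = function (method, url, ...rest) {
    this.addEventListener('load', () => {
      try {
        window.__capturedResponses.push({url: String(url), body: this.responseText});
      } catch (e) { /* responseText is unavailable for non-text responseType */ }
    });
    return origOpen.call(this, method, url, ...rest);
  };
}
"""

# Returns everything captured so far and resets the buffer.
DRAIN_JS = """
const out = window.__capturedResponses || [];
window.__capturedResponses = [];
return out;
"""


def install_interceptor(driver) -> None:
    """Inject the interceptor into the current page."""
    driver.execute_script(INTERCEPTOR_JS)


def drain_captured(driver) -> list:
    """Fetch and clear the responses captured since the last drain."""
    return driver.execute_script(DRAIN_JS) or []
```

A script injected this way lasts only until the next navigation, so it would need to be re-installed after any page reload.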
The scroll issue was caused by Selenium's find_elements() disturbing the scroll
position. Collecting data in pure JavaScript keeps the scroll pinned to the bottom.
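A minimal sketch of the approach described above: a background thread that, every 0.1 s, scrolls the review pane and reads review IDs in a single JavaScript call, so no find_elements() call touches the page. `start_scroller`, `pane_selector`, and `total_expected` are hypothetical names; the `div.jftiEf[data-review-id]` selector is assumed to match review cards.

```python
import threading
import time

# Scroll the pane to the bottom and return all review IDs in one round trip.
SCROLL_AND_COLLECT_JS = """
const pane = document.querySelector(arguments[0]);
if (pane) { pane.scrollTop = pane.scrollHeight; }
return Array.from(
  document.querySelectorAll('div.jftiEf[data-review-id]'),
  el => el.getAttribute('data-review-id')
);
"""


def start_scroller(driver, pane_selector, total_expected=None,
                   timeout=60.0, interval=0.1):
    """Start a background scroll loop; returns (thread, stop_event, seen_ids)."""
    seen = set()
    stop = threading.Event()

    def worker():
        deadline = time.monotonic() + timeout
        while not stop.is_set() and time.monotonic() < deadline:
            ids = driver.execute_script(SCROLL_AND_COLLECT_JS, pane_selector) or []
            seen.update(ids)
            # Auto-stop once every expected review ID has been seen.
            if total_expected is not None and len(seen) >= total_expected:
                break
            stop.wait(interval)  # 0.1 s gives roughly 10 Hz.

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, stop, seen
```

The caller can `thread.join()` to wait for completion or `stop.set()` to end early. Selenium drivers are generally not safe to share across threads, so this sketch assumes nothing else uses `driver` while the loop runs.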
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>