Initial commit - WhyRating Engine (Google Reviews Scraper)

Alejandro Gutiérrez
2026-02-02 18:19:00 +00:00
parent 0543a08242
commit 2206ddeff2
136 changed files with 51138 additions and 855 deletions


@@ -308,11 +308,15 @@ You are a review classifier using primitive-based analysis.
"spans": [
{
"text": "exact text from review",
"start": 0,
"end": 25,
"primitive": "MANNER",
"valence": "+",
"intensity": 2,
"detail": 2,
-"confidence": 0.85
+"confidence": 0.85,
+"entity": null,
+"entity_type": null
}
]
}
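A minimal sketch of how a consumer might check one span object of the shape shown in this hunk. It assumes `start`/`end` are zero-based character offsets into the review text with `end` exclusive (the diff does not state this), and that `entity` and `entity_type` may be absent in older output and should default to null. The helper name `normalize_span` is hypothetical and not part of the repository.

```python
from typing import Any

# Keys present in the span example; "entity" and "entity_type" are handled
# separately because older classifier output may not include them.
REQUIRED_KEYS = ("text", "start", "end", "primitive", "valence",
                 "intensity", "detail", "confidence")


def normalize_span(span: dict[str, Any], review_text: str) -> dict[str, Any]:
    """Validate one span against its review and fill in the optional entity fields."""
    missing = [key for key in REQUIRED_KEYS if key not in span]
    if missing:
        raise ValueError(f"span is missing keys: {missing}")

    start, end = span["start"], span["end"]
    # Assumption: offsets are character indices, end exclusive (Python slice semantics).
    if not (0 <= start < end <= len(review_text)):
        raise ValueError(f"offsets out of range: start={start}, end={end}")
    if review_text[start:end] != span["text"]:
        raise ValueError("span text does not match the review at the given offsets")
    if not 0.0 <= span["confidence"] <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    normalized = dict(span)
    normalized.setdefault("entity", None)
    normalized.setdefault("entity_type", None)
    return normalized


review = "Staff were very friendly and fast."
span = {
    "text": "Staff were very friendly",
    "start": 0,
    "end": 24,
    "primitive": "MANNER",
    "valence": "+",
    "intensity": 2,
    "detail": 2,
    "confidence": 0.85,
}
checked = normalize_span(span, review)
```

Defaulting the two entity fields to `None` keeps spans written before this change readable by the same code path as spans written after it.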
@@ -427,13 +431,16 @@ ORDER BY span_count DESC;
python run_classification_v2.py [OPTIONS]
Options:
-  --business TEXT       Business name or pattern (required for classify/evaluate)
-  --limit INT           Max reviews to process (default: 100)
-  --dry-run             Don't store results to database
-  --evaluate BUSINESS   Evaluate existing classification quality
-  --language-analysis   Analyze UNMAPPED rates by language across all data
-  --use-llm             Use real LLM classification (default: mock)
-  --model TEXT          Model for LLM (default: gpt-4o-mini)
+  --business TEXT            Business name or pattern (required for classify/evaluate)
+  --limit INT                Max reviews to process (default: 100)
+  --dry-run                  Don't store results to database
+  --evaluate BUSINESS        Evaluate existing classification quality
+  --language-analysis        Analyze UNMAPPED rates by language across all data
+  --ignore-legacy-language   Exclude rows with language='auto'/'unknown'/NULL
+  --latest-hours INT         Only include spans from last N hours
+  --use-existing             Use existing spans instead of jobs
+  --use-llm                  Use real LLM classification (requires OPENAI_API_KEY)
+  --model TEXT               Model for LLM (default: gpt-4o-mini)
```
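The option list above could be declared with `argparse` roughly as follows. This is a sketch built only from the flags and defaults in the help text; it is not the actual parser in `run_classification_v2.py`, and the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options; flag names and defaults follow the help text."""
    parser = argparse.ArgumentParser(prog="run_classification_v2.py")
    parser.add_argument("--business", metavar="TEXT",
                        help="Business name or pattern (required for classify/evaluate)")
    parser.add_argument("--limit", type=int, default=100, metavar="INT",
                        help="Max reviews to process")
    parser.add_argument("--dry-run", action="store_true",
                        help="Don't store results to database")
    parser.add_argument("--evaluate", metavar="BUSINESS",
                        help="Evaluate existing classification quality")
    parser.add_argument("--language-analysis", action="store_true",
                        help="Analyze UNMAPPED rates by language across all data")
    parser.add_argument("--ignore-legacy-language", action="store_true",
                        help="Exclude rows with language='auto'/'unknown'/NULL")
    parser.add_argument("--latest-hours", type=int, metavar="INT",
                        help="Only include spans from last N hours")
    parser.add_argument("--use-existing", action="store_true",
                        help="Use existing spans instead of jobs")
    parser.add_argument("--use-llm", action="store_true",
                        help="Use real LLM classification (requires OPENAI_API_KEY)")
    parser.add_argument("--model", default="gpt-4o-mini", metavar="TEXT",
                        help="Model for LLM")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
# "Example Cafe" is a placeholder business name.
args = build_parser().parse_args(
    ["--business", "Example Cafe", "--latest-hours", "24", "--use-existing"]
)
```

`argparse` turns dashes in flag names into underscores, so the new options are read as `args.ignore_legacy_language`, `args.latest_hours`, and `args.use_existing`.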
### Models