fix(synthesis): Select most common business_id to handle data leakage
Changed the business name query to GROUP BY business_id and ORDER BY COUNT(*) DESC instead of taking an arbitrary row with SELECT DISTINCT ... LIMIT 1, so the correct business is identified even when trace amounts of another business's data leak into a job.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -486,10 +486,13 @@ class Stage5Synthesizer:
         ORDER BY negative DESC
     """, job_id)
 
-    # Business name
+    # Business name - get the most common one (in case of data leakage)
     business = await self.pool.fetchval("""
-        SELECT DISTINCT business_id FROM pipeline.reviews_enriched
-        WHERE job_id = $1::uuid LIMIT 1
+        SELECT business_id FROM pipeline.reviews_enriched
+        WHERE job_id = $1::uuid
+        GROUP BY business_id
+        ORDER BY COUNT(*) DESC
+        LIMIT 1
     """, job_id)
 
     # MOMENTUM: Calculate from data (not LLM guess)
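The effect of the new query can be checked in isolation. The following is a minimal sketch, not the project's code: it assumes an in-memory SQLite database in place of PostgreSQL, a simplified table named reviews_enriched without the pipeline schema, a hypothetical helper most_common_business, and made-up rows where one leaked review belongs to a different business.

```python
import sqlite3


def most_common_business(conn, job_id):
    """Return the business_id with the most rows for job_id, or None if the job has no rows."""
    row = conn.execute(
        """
        SELECT business_id FROM reviews_enriched
        WHERE job_id = ?
        GROUP BY business_id
        ORDER BY COUNT(*) DESC
        LIMIT 1
        """,
        (job_id,),
    ).fetchone()
    return row[0] if row else None


# Hypothetical data: job-1 has five rows for its real business and one leaked row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews_enriched (job_id TEXT, business_id TEXT)")
rows = (
    [("job-1", "leaked-biz")]
    + [("job-1", "main-biz")] * 5
    + [("job-2", "other-biz")]
)
conn.executemany("INSERT INTO reviews_enriched VALUES (?, ?)", rows)

print(most_common_business(conn, "job-1"))  # main-biz
```

With SELECT DISTINCT ... LIMIT 1 and no ORDER BY, either business could come back for job-1, since the database is free to return rows in any order. Note that the grouped query is still ambiguous when two businesses have exactly the same row count; adding business_id as a secondary sort key would be one way to make that case deterministic.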