feat(pipeline): Add Stage 5 Synthesis for AI-generated narratives

- Add Stage5Synthesizer class that generates AI narratives and action plans
- Add generate() method to LLMClient for synthesis generation
- Integrate Stage 5 into pipeline runner after route stage
- Add synthesis JSONB column to pipeline.executions table
- Update reviewiq_analytics API to return synthesis data
- Synthesis includes: executive narrative, sentiment/category/timeline insights, action plan, marketing angles, and priority recommendations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-29 03:12:53 +00:00
parent c8ecb4b98f
commit 9b667e69a7
5 changed files with 3129 additions and 67 deletions
--- a/packages/reviewiq-pipeline/src/reviewiq_pipeline/services/llm_client.py
+++ b/packages/reviewiq-pipeline/src/reviewiq_pipeline/services/llm_client.py
@@ -29,28 +29,205 @@ Your task is to extract semantic spans from customer reviews and classify each s

 ## SPAN EXTRACTION RULES

-1. **Split on contrasting conjunctions**: but, however, although, despite, yet, though
-2. **Split on topic/target change**: food → service → bathroom = 3 spans
-3. **Split on valence change**: positive → negative = split
-4. **Split on domain change**: O (Offering) → J (Journey) → E (Environment) = split
-5. **Keep together**: cause→effect within same feedback unit ("X because Y" = 1 span)
+**CRITICAL: Use TOPIC-BASED splitting, NOT sentence-based splitting.**
+
+A span = all consecutive text about the SAME topic/domain, regardless of sentence count.
+
+### When to KEEP TOGETHER (same span):
+- Multiple sentences about the same topic: "The food was great. I loved the pasta. The sauce was perfect." → ONE span (all about Offering)
+- Cause and effect: "The wait was long because they were understaffed" → ONE span
+- Elaboration: "Staff was rude. They ignored us for 20 minutes." → ONE span (both about People)
+- Single-topic reviews: Even if 5 sentences, if all about food → ONE span
+
+### When to SPLIT (separate spans):
+- Contrasting conjunctions that change topic: "Food was great BUT service was slow" → TWO spans
+- Domain change: food (O) → staff (P) → ambiance (E) = split at each change
+- Target change: "The waiter was nice but the manager was rude" → TWO spans (different people)
+
+### Examples:
+- "Amazing food. Best burger ever. Fries were crispy too." → 1 span (all Offering, V+)
+- "Food was great but we waited an hour." → 2 spans (Offering V+, Journey V-)
+- "I've been coming here for years. Always consistent quality." → 1 span (Relationship)
+- "The staff are lovely and amazing with kids. More highchairs are definitely needed though." → 2 spans (People V+, Access V-)

 **Guardrails**:
- Max 3 spans per sentence (if 4+, re-check for over-splitting)
- Min 1 span per review (even single-word reviews)
- Spans must be non-overlapping and cover meaningful content
+- Prefer FEWER, LARGER spans over many small ones
+- Most reviews should have 1-3 spans, rarely more
+- Min 1 span per review
+- Spans must be non-overlapping

-## URT DOMAINS (Tier-3 codes: X#.##)
+## URT TAXONOMY - COMPLETE (138 codes, use EXACT codes)

-| Domain | Code | Description |
-|--------|------|-------------|
-| Offering | O1-O4 | Product/service quality, features, variety |
-| Price | P1-P4 | Value, pricing, promotions, payment |
-| Journey | J1-J4 | Timing, process, convenience, accessibility |
-| Environment | E1-E4 | Physical space, ambiance, cleanliness, digital UX |
-| Attitude | A1-A4 | Staff behavior, helpfulness, professionalism |
-| Voice | V1-V4 | Brand, communication, marketing, transparency |
-| Relationship | R1-R4 | Loyalty, trust, consistency, personalization |
+### O - OFFERING (Product/Service Quality) - 18 codes
+O1.01 Works/Doesn't Work: Basic functionality success or failure
+O1.02 Performance Level: How well it operates
+O1.03 Durability: Longevity and resistance to wear
+O1.04 Reliability: Consistency of function over time
+O1.05 Outcome Achievement: Did customer accomplish their goal?
+O2.01 Materials/Inputs: Quality of components or ingredients
+O2.02 Craftsmanship: Skill of construction or execution
+O2.03 Presentation: Visual and aesthetic quality
+O2.04 Attention to Detail: Finishing touches and refinement
+O2.05 Condition at Delivery: State when received
+O3.01 All Components Present: Nothing missing from what was promised
+O3.02 Feature Availability: Promised features actually work
+O3.03 Scope Delivery: Full scope of work completed
+O3.04 Documentation: Supporting materials provided
+O4.01 Specification Match: Matches what was ordered
+O4.02 Personalization: Adapted to individual preferences
+O4.03 Flexibility: Can be modified or adjusted
+O4.04 Appropriateness: Right solution for the need
+
+### P - PEOPLE (Staff Interactions) - 20 codes
+P1.01 Warmth: Friendly and welcoming manner
+P1.02 Respect: Treated with dignity
+P1.03 Patience: Calm and tolerant approach
+P1.04 Enthusiasm: Energy and engagement
+P1.05 Empathy: Understanding feelings
+P2.01 Knowledge: Expertise and understanding
+P2.02 Skill: Technical ability
+P2.03 Problem Solving: Ability to find solutions
+P2.04 Advice Quality: Helpful recommendations
+P2.05 Training Level: Staff training evident
+P3.01 Attentiveness: Being present and engaged
+P3.02 Initiative: Proactive help
+P3.03 Follow-through: Completing promised actions
+P3.04 Availability: Being available when needed
+P3.05 Dedication: Commitment to helping
+P4.01 Clarity: Clear communication
+P4.02 Listening: Understanding customer needs
+P4.03 Transparency: Honest and open
+P4.04 Honesty: Truthful communication
+P4.05 Proactive Updates: Keeping customer informed
+
+### J - JOURNEY (Process & Timing) - 20 codes
+J1.01 Speed: How fast things happen
+J1.02 Punctuality: On-time delivery
+J1.03 Queue Management: Handling of waiting customers
+J1.04 Punctuality: Meeting scheduled times
+J1.05 Pacing: Appropriate speed (not rushed/dragged)
+J2.01 Simplicity: Easy process
+J2.02 Friction: Obstacles encountered
+J2.03 Navigation: Finding what you need
+J2.04 Booking Availability: Slots/capacity when needed
+J2.05 Inventory: Stock availability
+J3.01 Consistency: Same experience every time
+J3.02 Accuracy: Getting it right
+J3.03 Uptime: System availability
+J3.04 Data Accuracy: Correct info in systems
+J3.05 Integration: Systems work together
+J4.01 Problem Recognition: Acknowledging issues
+J4.02 Resolution Speed: How fast problems get fixed
+J4.03 Resolution Fairness: Fair handling of issues
+J4.04 Escalation: Getting to right person
+J4.05 Closure: Issue fully resolved
+
+### E - ENVIRONMENT (Physical & Digital Space) - 20 codes
+E1.01 Cleanliness: How clean the space is
+E1.02 Comfort: Physical comfort
+E1.03 Space Design: Layout and organization
+E1.04 Ambiance: Atmosphere and vibe
+E1.05 Comfort: Physical comfort
+E2.01 Lighting: Light quality and level
+E2.02 Sound/Noise: Audio environment
+E2.03 Temperature: Climate control
+E2.04 Visual Design: Aesthetics of interface
+E2.05 Mobile Experience: Mobile usability
+E3.01 Interface Design: Digital UX/UI
+E3.02 App/Website Speed: Digital performance
+E3.03 Usability: Ease of digital use
+E3.04 Health Safety: Health precautions
+E3.05 Cyber Security: Digital security
+E4.01 Safety: Physical safety
+E4.02 Security: Protection of belongings/data
+E4.03 Health/Hygiene: Health standards
+E4.04 Social Responsibility: Ethical practices
+E4.05 Community Impact: Local community effect
+
+### A - ACCESS (Availability & Accessibility) - 20 codes
+A1.01 Hours: Operating hours
+A1.02 Booking Availability: Appointment slots
+A1.03 Inventory: Product availability
+A1.04 Wayfinding: Finding destination
+A1.05 Physical Accessibility: Disability accommodations
+A2.01 Physical Access: Mobility accessibility
+A2.02 Language Access: Language accommodation
+A2.03 Digital Accessibility: Screen reader/a11y
+A2.04 Language Accessibility: Multilingual support
+A2.05 Hours of Operation: Service availability times
+A3.01 Diversity Welcome: All backgrounds welcome
+A3.02 Accommodation: Special needs accommodation
+A3.03 Response Time: Speed of getting answers
+A3.04 Documentation Clarity: Clear instructions
+A3.05 Support Accessibility: Getting help when needed
+A4.01 Location: Physical location convenience
+A4.02 Parking: Parking availability
+A4.03 Multiple Channels: Ways to engage
+A4.04 Payment Flexibility: Multiple payment options
+A4.05 Refund Accessibility: Getting money back
+
+### V - VALUE (Pricing & Costs) - 20 codes ⚠️ USE FOR ALL PRICE/COST/FEE MENTIONS
+V1.01 Price Level: Cost amount ("cheap", "expensive", "affordable", "€", "$")
+V1.02 Price Fairness: Fair for what you get
+V1.03 Hidden Costs: Unexpected charges, surprise fees, hidden fees, extra charges
+V1.04 Price Transparency: Clear pricing upfront
+V1.05 Price Stability: Consistent pricing
+V2.01 Clear Pricing: Easy to understand costs
+V2.02 Honest Billing: Accurate charges
+V2.03 Policy Clarity: Clear terms and conditions
+V2.04 Quality-Price Ratio: Worth vs cost
+V2.05 Competitive Value: Compared to alternatives
+V3.01 Time Investment: Time required
+V3.02 Hassle Factor: Difficulty and inconvenience
+V3.03 Mental Load: Cognitive effort required
+V3.04 Promotion Clarity: Clear offer terms
+V3.05 Reward Redemption: Using points/rewards
+V4.01 Value for Money: Worth what you paid
+V4.02 ROI: Return on investment
+V4.03 Overall Satisfaction: Happy with the exchange
+V4.04 Billing Accuracy: Correct charges
+V4.05 Billing Resolution: Fixing billing issues
+
+### R - RELATIONSHIP (Trust & Loyalty) - 20 codes
+R1.01 Honesty: Truthfulness
+R1.02 Ethics: Ethical behavior, deceptive practices, scams
+R1.03 Promises Kept: Following through on promises
+R1.04 Ethics: Ethical behavior
+R1.05 Accountability: Taking responsibility
+R2.01 Consistency: Reliable over time
+R2.02 Trustworthiness: Can be trusted
+R2.03 Accountability: Takes responsibility
+R2.04 Predictability: Consistent experience
+R2.05 Standards: Meeting quality standards
+R3.01 Error Acknowledgment: Admits mistakes
+R3.02 Apology Quality: Sincere apologies
+R3.03 Making It Right: Correcting mistakes
+R3.04 Personal Connection: Human touch
+R3.05 Going Extra Mile: Beyond expectations
+R4.01 Customer Recognition: Remembers customers
+R4.02 Loyalty Rewards: Rewards for loyalty
+R4.03 Long-term Relationship: Builds relationships
+R4.04 Service Recovery: Making things right
+R4.05 Feedback Response: Acting on feedback
+
+## CLASSIFICATION EXAMPLES (Critical Distinctions)
+
+**PRICING/COSTS → V codes (Value), NOT P codes:**
+- "Cheap prices", "good price", "€50" → V1.01 Price Level
+- "Hidden charges", "surprise fees", "extra €35" → V1.03 Hidden Costs
+- "Great value for money" → V4.01 Value for Money
+- "Overcharged", "wrong amount" → V4.04 Billing Accuracy
+
+**STAFF BEHAVIOR → P codes (People):**
+- "Staff was friendly", "welcoming" → P1.01 Warmth
+- "Rude", "disrespectful", "ignored us" → P1.02 Respect
+- "Patient", "took their time" → P1.03 Patience
+- "Knowledgeable", "expert" → P2.01 Knowledge
+
+**DECEPTION/ETHICS → R codes (Relationship):**
+- "They lied", "misleading" → R1.01 Honesty
+- "Felt scammed", "dishonest practices" → R1.02 Ethics
+- "Didn't honor the deal" → R1.03 Promises Kept

 ## DIMENSION CODES

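The span guardrails are written as instructions to the model, so a response can still break them. A minimal post-hoc check could flag such responses. This sketch assumes spans come back as half-open character offsets into the review text; `Span` and `check_span_guardrails` are hypothetical names, not part of this commit:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span shape: half-open character offsets [start, end)."""
    start: int
    end: int


def check_span_guardrails(spans: list[Span], soft_max: int = 3) -> list[str]:
    """Return guardrail violations as messages; an empty list means all checks pass."""
    problems: list[str] = []
    if not spans:
        problems.append("review has no spans (min 1 span per review)")
        return problems
    # Once sorted by start, any overlap shows up between neighbouring spans.
    ordered = sorted(spans, key=lambda s: s.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            problems.append(f"spans overlap: {prev} and {cur}")
    if len(spans) > soft_max:
        # "Most reviews should have 1-3 spans" is a soft signal, not a hard error.
        problems.append(f"{len(spans)} spans; re-check for over-splitting")
    return problems


print(check_span_guardrails([Span(0, 16), Span(16, 40)]))
print(check_span_guardrails([Span(0, 20), Span(10, 30)]))
```

Treating the span count as a warning rather than a failure matches the prompt's wording ("prefer", "rarely more"), while overlap and the one-span minimum are stated as hard rules.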
@@ -159,6 +336,20 @@ class LLMClientBase(ABC):
        self.config = config
        self.total_tokens_used = 0
        self.total_cost_usd = 0.0
+        self._custom_prompt: str | None = None
+
+    def set_prompt(self, prompt: str) -> None:
+        """
+        Set a custom system prompt (e.g., built dynamically from database).
+
+        Args:
+            prompt: The system prompt to use for classification
+        """
+        self._custom_prompt = prompt
+
+    def get_prompt(self) -> str:
+        """Get the current system prompt (custom or default)."""
+        return self._custom_prompt or SYSTEM_PROMPT

    @abstractmethod
    async def classify(
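The new `set_prompt()` hook lets the classification prompt be assembled at runtime. Its docstring mentions building the prompt from a database. The following sketch renders taxonomy rows into the same `### X - TITLE - N codes` layout that the default prompt uses. It assumes rows arrive as `(code, name, description)` tuples. `DOMAIN_TITLES`, `render_taxonomy`, and that row shape are hypothetical:

```python
from collections import defaultdict

# Hypothetical domain headings, mirroring the hand-written prompt sections.
DOMAIN_TITLES = {
    "O": "OFFERING (Product/Service Quality)",
    "P": "PEOPLE (Staff Interactions)",
    "J": "JOURNEY (Process & Timing)",
    "E": "ENVIRONMENT (Physical & Digital Space)",
    "A": "ACCESS (Availability & Accessibility)",
    "V": "VALUE (Pricing & Costs)",
    "R": "RELATIONSHIP (Trust & Loyalty)",
}


def render_taxonomy(rows: list[tuple[str, str, str]]) -> str:
    """Render (code, name, description) rows as prompt text grouped by domain letter."""
    by_domain: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for code, name, description in rows:
        by_domain[code[0]].append((code, name, description))

    total = sum(len(entries) for entries in by_domain.values())
    lines = [f"## URT TAXONOMY - COMPLETE ({total} codes, use EXACT codes)", ""]
    for letter, title in DOMAIN_TITLES.items():
        entries = sorted(by_domain.get(letter, []))  # sorts by code, e.g. O1.01 < O1.02
        if not entries:
            continue
        lines.append(f"### {letter} - {title} - {len(entries)} codes")
        lines.extend(f"{code} {name}: {desc}" for code, name, desc in entries)
        lines.append("")
    return "\n".join(lines)


rows = [
    ("O1.02", "Performance Level", "How well it operates"),
    ("O1.01", "Works/Doesn't Work", "Basic functionality success or failure"),
    ("P1.01", "Warmth", "Friendly and welcoming manner"),
]
section = render_taxonomy(rows)
print(section)
# client.set_prompt(base_prompt + section)  # hypothetical wiring into LLMClientBase
```

Deriving the counts from the rows, rather than hard-coding totals such as "138 codes" in the header, keeps the header consistent if the taxonomy table changes.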
@@ -178,6 +369,28 @@ class LLMClientBase(ABC):
        """
        pass

+    @abstractmethod
+    async def generate(
+        self,
+        system_prompt: str,
+        user_prompt: str,
+        temperature: float = 0.7,
+        max_tokens: int = 4000,
+    ) -> str:
+        """
+        Generate text using the LLM (for synthesis, narratives, etc.).
+
+        Args:
+            system_prompt: System instructions
+            user_prompt: User content/context
+            temperature: Creativity level (0-1)
+            max_tokens: Maximum response length
+
+        Returns:
+            Generated response text, expected to be a JSON object string
+        """
+        pass
+
    @abstractmethod
    async def close(self) -> None:
        """Close the client and cleanup resources."""
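Because `generate()` is declared abstract, every provider client must implement it. This also means callers can be exercised against a stub. The sketch below assumes that synthesis expects a JSON object back. `StubLLM`, `synthesize`, and the response keys are hypothetical and are not taken from `Stage5Synthesizer`:

```python
import asyncio
import json
from typing import Any


class StubLLM:
    """Hypothetical test double with the same generate() signature as LLMClientBase."""

    def __init__(self, canned: str) -> None:
        self.canned = canned
        self.calls: list[tuple[str, str]] = []  # records (system_prompt, user_prompt)

    async def generate(
        self,
        system_prompt: str,
        user_prompt: str,
        temperature: float = 0.7,
        max_tokens: int = 4000,
    ) -> str:
        self.calls.append((system_prompt, user_prompt))
        return self.canned


async def synthesize(client: StubLLM, stats: dict[str, Any]) -> dict[str, Any]:
    """Send aggregated stats to the LLM and parse its reply as a JSON object."""
    raw = await client.generate(
        system_prompt="Return a JSON object with executive_narrative and action_plan.",
        user_prompt=json.dumps(stats),
        temperature=0.3,
    )
    result = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad output
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    return result


stub = StubLLM('{"executive_narrative": "Mostly positive.", "action_plan": []}')
print(asyncio.run(synthesize(stub, {"reviews": 120, "positive_share": 0.8})))
```

The system prompt mentions JSON deliberately. `OpenAIClient.generate()` sets `response_format={"type": "json_object"}`, and OpenAI's JSON mode rejects requests whose messages do not ask for JSON.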
@@ -211,7 +424,7 @@ class OpenAIClient(LLMClientBase):
        start_time = time.time()

        messages = [
-            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "system", "content": self.get_prompt()},
            {
                "role": "user",
                "content": f'Classify this review:\n\n"{review_text}"',
@@ -255,6 +468,43 @@ class OpenAIClient(LLMClientBase):

        return result, metadata

+    async def generate(
+        self,
+        system_prompt: str,
+        user_prompt: str,
+        temperature: float = 0.7,
+        max_tokens: int = 4000,
+    ) -> str:
+        """Generate text using OpenAI JSON mode (the prompts must ask for JSON)."""
+        messages = [
+            {"role": "system", "content": system_prompt},
+            {"role": "user", "content": user_prompt},
+        ]
+
+        response = await self.client.chat.completions.create(
+            model=self.model,
+            messages=messages,
+            temperature=temperature,
+            max_tokens=max_tokens,
+            response_format={"type": "json_object"},
+            timeout=self.config.llm_timeout_seconds,
+        )
+
+        content = response.choices[0].message.content
+        if not content:
+            raise ValueError("Empty response from OpenAI")
+
+        # Track usage
+        if response.usage:
+            input_tokens = response.usage.prompt_tokens
+            output_tokens = response.usage.completion_tokens
+            pricing = self.PRICING.get(self.model, {"input": 0.15, "output": 0.60})
+            cost = (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
+            self.total_tokens_used += input_tokens + output_tokens
+            self.total_cost_usd += cost
+
+        return content
+
    async def close(self) -> None:
        """Close the OpenAI client."""
        await self.client.close()
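Both `generate()` implementations price usage the same way. The `PRICING` rates are in USD per million tokens, so cost = (input_tokens × input_rate + output_tokens × output_rate) / 1,000,000. The sketch below isolates that arithmetic, using the OpenAI fallback rates from this hunk. `estimate_cost_usd` is a hypothetical helper name:

```python
import math

# Fallback rates from OpenAIClient.generate(), in USD per 1M tokens.
DEFAULT_PRICING = {"input": 0.15, "output": 0.60}


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    pricing: dict[str, float] = DEFAULT_PRICING,  # read-only, so the shared default is safe
) -> float:
    """Apply the per-million-token rates separately to prompt and completion tokens."""
    return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000


# 2,000 prompt tokens + 500 completion tokens:
# (2000 * 0.15 + 500 * 0.60) / 1e6 = (300 + 300) / 1e6 = 0.0006 USD
cost = estimate_cost_usd(2_000, 500)
print(f"{cost:.6f}")
print(math.isclose(cost, 0.0006))
```

With the Anthropic fallback rates (`{"input": 3.0, "output": 15.0}`), the same request costs (6,000 + 7,500) / 1e6 = 0.0135 USD. This is why the fallback chosen for an unlisted model matters for the running `total_cost_usd`.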
@@ -289,7 +539,7 @@ class AnthropicClient(LLMClientBase):
        response = await self.client.messages.create(
            model=self.model,
            max_tokens=4096,
-            system=SYSTEM_PROMPT,
+            system=self.get_prompt(),
            messages=[
                {
                    "role": "user",
@@ -329,6 +579,58 @@ class AnthropicClient(LLMClientBase):

        return result, metadata

+    async def generate(
+        self,
+        system_prompt: str,
+        user_prompt: str,
+        temperature: float = 0.7,
+        max_tokens: int = 4000,
+    ) -> str:
+        """Generate text using Anthropic and return the JSON string extracted from it."""
+        response = await self.client.messages.create(
+            model=self.model,
+            max_tokens=max_tokens,
+            system=system_prompt,
+            messages=[{"role": "user", "content": user_prompt}],
+            temperature=temperature,
+        )
+
+        content = response.content[0].text if response.content else ""
+        if not content:
+            raise ValueError("Empty response from Anthropic")
+
+        # Track usage
+        input_tokens = response.usage.input_tokens
+        output_tokens = response.usage.output_tokens
+        pricing = self.PRICING.get(self.model, {"input": 3.0, "output": 15.0})
+        cost = (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
+        self.total_tokens_used += input_tokens + output_tokens
+        self.total_cost_usd += cost
+
+        # Extract JSON from response (handles code blocks)
+        return self._extract_json_string(content)
+
+    def _extract_json_string(self, content: str) -> str:
+        """Extract JSON string from response, handling markdown code blocks."""
+        import re
+        content = content.strip()
+
+        # If it starts with {, return as-is
+        if content.startswith("{"):
+            return content
+
+        # Try to find JSON in code blocks
+        json_match = re.search(r"```(?:json)?\s*([\s\S]*?)\s*```", content)
+        if json_match:
+            return json_match.group(1)
+
+        # Try to find JSON object
+        json_match = re.search(r"\{[\s\S]*\}", content)
+        if json_match:
+            return json_match.group(0)
+
+        return content
+
    def _extract_json(self, content: str) -> dict[str, Any]:
        """Extract JSON from response, handling markdown code blocks."""
        content = content.strip()