Phase 0: Project restructure to ReviewIQ platform architecture

New structure:
- scrapers/google_reviews/v1_0_0.py (was modules/scraper_clean.py)
- scrapers/base.py (BaseScraper interface)
- scrapers/registry.py (ScraperRegistry for version routing)
- core/database.py, models.py, config.py, enums.py
- utils/logger.py, crash_analyzer.py, health_checks.py, helpers.py, date_converter.py
- workers/chrome_pool.py
- services/webhook_service.py
- api/ routes structure (empty, ready for Phase 2)
- tests/ structure mirroring source

All imports updated in:
- api_server_production.py (7 import paths updated)
- utils/health_checks.py (scraper import path)

Legacy modules moved to modules/_legacy/:
- data_storage.py, image_handler.py, s3_handler.py (unused)

Syntax verified, frontend build passing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

.artifacts/ReviewIQ-Architecture-v2.md (normal file, 1143 lines changed; diff suppressed because it is too large)
.artifacts/ReviewIQ-Architecture-v3.2.md (normal file, 2306 lines changed; diff suppressed because it is too large)
.artifacts/ReviewIQ-Architecture-v3.md (normal file, 1513 lines changed; diff suppressed because it is too large)
.artifacts/ReviewIQ-v32-Decisions.md (normal file, 183 lines changed)
@@ -0,0 +1,183 @@
# ReviewIQ v3.2 Design Decisions

> Fast context-recovery document: all key decisions without the full spec.

---

## 1. Markpoint

```
ID: reviewiq-v32-span-layer-2026-01-24-001
Status: v3.2 span layer complete
Based on: v3.1.2 (commit f998277)
```

---

## 2. Core Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Span granularity | Clause/topic-level | Preserves multi-domain signal |
| span_id format | ULID (TEXT) | Survives re-segmentation |
| Span offsets | Required (NOT NULL) | Deterministic reconstruction |
| Offsets reference | reviews_enriched.text | Not text_normalized |
| Span → Issue mapping | One-to-one (UNIQUE span_id) | Atomic unit per issue |
| Primary span enforcement | Partial unique index | Exactly one per review version |
| Primary selection | I3 > I2 > I1; V- > V± > V0 > V+; span_index | Deterministic, stable |
| Reprocessing strategy | Soft-switch with is_active | No transient empty states |
| Span overlap | GiST exclusion constraint | Non-overlapping ranges enforced |
| Secondary codes | Array with cardinality ≤ 2 | Could normalize to link table later |
| Causal chain storage | JSONB | Flexibility, normalize later if needed |
| relation_type vs causal_chain | Separate concerns | relation = within-review, causal = root cause |
| Dimension columns | Postgres ENUMs | Type safety, storage efficiency |
| Trust score floor | 0.2 (GREATEST clamp) | Prevent multiplicative collapse |
| Issue routing key | (business_id, place_id, urt_primary, entity_normalized) | Deterministic, entity-aware |
| Issue ID generation | SHA256 via pgcrypto | Deterministic, collision-resistant |
| Text validation trigger | Conditional via session setting | Performance: skip in bulk loads |
| Relation validation | Application-level post-insert | Handles insertion order |

---

## 3. Extensions Required

| Extension | Purpose |
|-----------|---------|
| `btree_gist` | Exclusion constraint for non-overlapping spans |
| `pgcrypto` | SHA256-based issue ID generation |
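
To illustrate the invariant that the exclusion constraint enforces, here is a minimal Python sketch of a non-overlap check. It is not the Postgres constraint itself: the function name `find_overlaps` is hypothetical, and it assumes spans are `(start, end)` pairs with an exclusive end, which this document does not state. Sorting by start offset means only neighbouring ranges need comparing, because if any two ranges overlap, some adjacent pair in sorted order also overlaps.

```python
def find_overlaps(spans: list[tuple[int, int]]) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return adjacent (start, end) pairs that overlap; empty means the set is valid."""
    # Assumption: offsets are end-exclusive, so (0, 18) and (18, 30) do not overlap.
    ordered = sorted(spans)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b[0] < a[1]]


print(find_overlaps([(0, 18), (23, 43), (48, 70)]))  # disjoint spans
print(find_overlaps([(0, 18), (23, 43), (40, 50)]))  # last two spans collide
```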

---

## 4. New Tables

| Table | Purpose |
|-------|---------|
| `review_spans` | Span-level URT classification |
| `review_span_secondary_codes` | (Optional) Normalized secondary codes |

---

## 5. Modified Tables

| Table | Changes |
|-------|---------|
| `issue_spans` | Added `span_id` FK (NOT NULL); the direct review FK is no longer the canonical reference |

---

## 6. New ENUM Types

**Valence & Intensity:**
- `urt_valence`: V-, V±, V0, V+
- `urt_intensity`: I1, I2, I3

**Specificity & Actionability:**
- `urt_specificity`: S1, S2, S3
- `urt_actionability`: A1, A2, A3

**Context & Evidence:**
- `urt_temporal`: T1, T2, T3
- `urt_evidence`: E1, E2, E3
- `urt_comparative`: CR1, CR2, CR3

**Classification:**
- `urt_profile`: factual, emotional, comparative, etc.
- `urt_confidence`: low, medium, high
- `urt_relation`: elaborates, contrasts, causes, etc.
- `urt_entity_type`: person, product, location, etc.

---

## 7. Key Functions

| Function | Purpose |
|----------|---------|
| `urt_validate_causal_chain()` | Validates causal JSONB structure |
| `validate_review_relations()` | Ensures related_span_id same-parent |
| `validate_active_spans()` | Ensures valid active span set |
| `set_primary_span()` | Deterministic primary selection |
| `generate_issue_id()` | SHA256-based issue ID |

---

## 8. Key Triggers

| Trigger | Purpose |
|---------|---------|
| `review_spans_validate_bounds` | span_end ≤ text length |
| `review_spans_validate_text` | span_text matches substring |
| `review_spans_validate_causal_chain` | causal_chain JSONB valid |
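
The bounds and text checks can also be mirrored at the application level before rows reach the database. The following is a minimal Python sketch, not the trigger code: the function name `validate_span` is hypothetical, and it assumes 0-based, end-exclusive character offsets into `reviews_enriched.text`, a convention this document does not specify.

```python
def validate_span(review_text: str, span_start: int, span_end: int, span_text: str) -> list[str]:
    """Return a list of violations; an empty list means the span is consistent."""
    errors = []
    # Bounds check (assumed 0-based, end-exclusive offsets).
    if not (0 <= span_start < span_end <= len(review_text)):
        errors.append("offsets out of bounds")
    # Text check: the stored span_text must equal the referenced substring.
    elif review_text[span_start:span_end] != span_text:
        errors.append("span_text does not match substring")
    return errors


review = "The food was great but the service was slow."
print(validate_span(review, 0, 18, "The food was great"))  # consistent span
print(validate_span(review, 0, 99, "The food was great"))  # end past text length
```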

---

## 9. USN Format

```
Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full: URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
```

**Examples:**
- `URT:S:SVC.SPD:V-I3:S3A3T2.E2.CR1`: Specific service speed complaint
- `URT:F:PRD.QUA:V+I2:S2A1T1.E3.CR2:staff→training`: Product quality praise with causal chain

---

## 10. Span Boundary Rules

1. **Split on contrasting conjunctions**: "but", "however", "although"
2. **Split on topic/target change**: Different entity or aspect
3. **Split on valence change**: Positive → Negative or vice versa
4. **Split on domain change**: SVC → PRD → AMB
5. **Keep cause→effect together**: Causal chain stays in one span

---

## 11. Deferred to v3.3+

| Item | Reason |
|------|--------|
| Entity extraction implementation | Requires NER pipeline |
| Trust-weighted fact aggregation | Needs more span data |
| Secondary domain enforcement | App-level validation sufficient |
| Span-based fact counting | Currently review-based, optimize later |

---

## 12. Open Questions Resolved

| Question | Resolution |
|----------|------------|
| Span → Issue cardinality? | **One-to-one** (not many-to-many) |
| Offsets nullable for LLM-inferred? | **No**: required, NOT NULL |
| Reprocessing strategy? | **Soft-switch** with is_active flag |
| TEXT vs ENUM for dimensions? | **ENUMs**: committed to Postgres |

---

## Quick Reference

### Primary Span Selection Algorithm

```
ORDER BY:
1. intensity DESC (I3 > I2 > I1)
2. valence ASC (V- > V± > V0 > V+)
3. span_index ASC (first wins ties)
```

### Issue Routing Key

```sql
(business_id, place_id, urt_primary, entity_normalized)
```
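
To show why hashing the routing key yields a deterministic issue ID, here is a minimal Python sketch. It is not the `generate_issue_id()` function, which runs in Postgres via pgcrypto: the name `sketch_issue_id`, the field separator, the UTF-8 encoding, and the hex output format are all assumptions made for illustration. A separator is used so that adjacent fields cannot run together and produce the same input string for different keys.

```python
import hashlib


def sketch_issue_id(business_id: str, place_id: str, urt_primary: str, entity_normalized: str) -> str:
    """Hash the four routing-key fields into a stable hex identifier."""
    # Assumption: fields are joined with a unit separator so that
    # ("ab", "c") and ("a", "bc") do not hash the same concatenated string.
    key = "\x1f".join([business_id, place_id, urt_primary, entity_normalized])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


first = sketch_issue_id("biz-1", "place-9", "J1.03", "front desk")
again = sketch_issue_id("biz-1", "place-9", "J1.03", "front desk")
other = sketch_issue_id("biz-1", "place-9", "J1.03", "parking")
print(first == again, first == other)  # same key repeats, different entity differs
```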

### Trust Score Calculation

```sql
GREATEST(0.2, base_trust * modifiers)  -- Floor prevents collapse
```
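
The effect of the floor is easiest to see with numbers. The Python sketch below mirrors the `GREATEST` clamp; the function name `trust_score` is hypothetical, and it assumes `modifiers` is a set of factors that are multiplied together, which is suggested by "multiplicative collapse" but not spelled out here. Without the floor, several small factors would drive the score toward zero.

```python
from math import prod

TRUST_FLOOR = 0.2  # floor value from the decision table


def trust_score(base_trust: float, modifiers: list[float]) -> float:
    """Python equivalent of GREATEST(0.2, base_trust * modifiers)."""
    # Assumption: the individual modifiers combine by multiplication.
    return max(TRUST_FLOOR, base_trust * prod(modifiers))


print(trust_score(0.9, [0.5, 0.4, 0.3]))  # raw product 0.054 is clamped to the floor
print(trust_score(0.9, [1.0, 0.8]))       # raw product stays above the floor
```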

---

*Last updated: 2026-01-24*

.artifacts/URT-v5.1-Reference.md (normal file, 331 lines changed)
@@ -0,0 +1,331 @@
# Universal Review Taxonomy (URT) v5.1 Reference

## Overview

The Universal Review Taxonomy (URT) is a classification system for customer feedback. It provides a structured approach to categorizing, annotating, and analyzing review content across any industry.

### Key Characteristics

- **Three Profiles**: Core, Standard, Full (increasing detail)
- **Seven Domains**: Covering all aspects of customer experience
- **Tier-3 Canonical Codes**: Format `X#.##` (e.g., J1.02, P2.15)
- **Dimensional Annotation**: Valence, intensity, specificity, and more
- **Causal Analysis**: Root cause chains (Full profile)

---

## Domain Codes

URT organizes feedback into seven domains, each identified by a single letter.

| Domain | Letter | Description |
|--------|--------|-------------|
| Offering | O | Product/service quality |
| Price | P | Value, pricing, promotions |
| Journey | J | Customer experience, timing, process |
| Environment | E | Physical/digital space |
| Attitude | A | Staff behavior, service attitude |
| Voice | V | Brand, communication, marketing |
| Relationship | R | Loyalty, trust, long-term relationship |

### Tier-3 Code Format

```
Pattern: [OPJEAVR][1-4]\.[0-9]{2}
```

Examples:
- `J1.02` - Journey domain, category 1, subcategory 02
- `P2.15` - Price domain, category 2, subcategory 15
- `A3.01` - Attitude domain, category 3, subcategory 01
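
The pattern can be applied directly as a regular expression. The sketch below is one possible way to validate and split a Tier-3 code in Python; the names `TIER3_RE`, `DOMAINS`, and `parse_tier3` are hypothetical and not part of URT. It uses `fullmatch` so that trailing characters (for example `J1.023`) are rejected rather than silently ignored.

```python
import re

# Pattern taken from the Tier-3 code format above.
TIER3_RE = re.compile(r"[OPJEAVR][1-4]\.[0-9]{2}")

DOMAINS = {
    "O": "Offering", "P": "Price", "J": "Journey", "E": "Environment",
    "A": "Attitude", "V": "Voice", "R": "Relationship",
}


def parse_tier3(code: str) -> tuple[str, int, str]:
    """Split a Tier-3 code into (domain name, category, subcategory)."""
    if TIER3_RE.fullmatch(code) is None:
        raise ValueError(f"not a Tier-3 code: {code!r}")
    return DOMAINS[code[0]], int(code[1]), code[3:]


print(parse_tier3("J1.02"))  # ('Journey', 1, '02')
```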

---

## Dimension Codes

### Valence

Indicates the sentiment direction of the feedback.

| Code | Meaning |
|------|---------|
| V+ | Positive |
| V- | Negative |
| V0 | Neutral |
| V± | Mixed |

### Intensity

Indicates the strength of the expressed sentiment.

| Code | Meaning |
|------|---------|
| I1 | Low intensity |
| I2 | Moderate intensity |
| I3 | High intensity |

### Specificity (Standard+)

Indicates how detailed the feedback is.

| Code | Meaning |
|------|---------|
| S1 | Low - vague, general |
| S2 | Medium - some detail |
| S3 | High - specific, precise |

### Actionability (Standard+)

Indicates whether clear actions can be derived from the feedback.

| Code | Meaning |
|------|---------|
| A1 | None - no clear action |
| A2 | Unclear - possible actions |
| A3 | Clear - specific actionable |

### Temporal (Standard+)

Indicates the time frame referenced in the feedback.

| Code | Meaning | Markers |
|------|---------|---------|
| TC | Current - this visit | "today", "this time", "yesterday" |
| TR | Recent - last few visits | "lately", "recently", "again" |
| TH | Historical - long-standing | "for years", "always", "historically" |
| TF | Future - expectations | "I won't come back", "next time" |

**Default**: TC when no temporal language exists.

### Evidence (Standard+)

Indicates how the information was obtained from the text.

| Code | Meaning | Example |
|------|---------|---------|
| ES | Stated - explicit in text | "Waited 45 minutes" |
| EI | Inferred - logically entailed | "Took 3 weeks to reply" → slow response |
| EC | Contextual - depends on context | "That happened again" |

**Default**: ES. Use EI/EC only when needed.

### Comparative

Indicates whether the feedback compares to alternatives.

| Code | Meaning |
|------|---------|
| CR-N | No comparison |
| CR-B | Better than alternatives |
| CR-W | Worse than alternatives |
| CR-S | Same as alternatives |

---

## USN (URT String Notation)

USN is a compact string encoding for URT annotations.

### Grammar

```
Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full: URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
```

### Encoding Rules

**Valence**:
- `+` for V+
- `-` for V-

**Intensity**:
- `1` for I1
- `2` for I2
- `3` for I3

### Examples

**Standard Profile**:
```
URT:S:J1.03:-2:22TC.ES.N
```
Decoded:
- Profile: Standard
- Code: J1.03
- Valence: V- (negative)
- Intensity: I2
- Specificity: S2
- Actionability: A2
- Temporal: TC
- Evidence: ES
- Comparative: CR-N

**Full Profile with Causal Chain**:
```
URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O
```
Decoded:
- Profile: Full
- Codes: J1.01, A1.04
- Valence: V- (negative)
- Intensity: I3
- Specificity: S2
- Actionability: A3
- Temporal: TR
- Evidence: EI
- Comparative: CR-S
- Causal: CD.O (Conditions-Operational), MG.O (Management-Oversight)
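
The two decoded examples can be reproduced mechanically. Below is a minimal Python sketch of a USN decoder; `parse_usn` is a hypothetical name, not an existing function. It covers only the encodings documented above: valence `+` and `-` (the symbols for V0 and V± are not listed in the encoding rules, so the sketch rejects anything else), multiple codes joined with `+`, and causal entries joined with `,`.

```python
def parse_usn(usn: str) -> dict:
    """Decode a USN string into its fields (documented encodings only)."""
    parts = usn.split(":")
    if len(parts) not in (5, 6) or parts[0] != "URT" or parts[1] not in ("S", "F"):
        raise ValueError(f"not a USN string: {usn!r}")
    _, profile, codes, vi, dims = parts[:5]
    if len(vi) != 2 or vi[0] not in "+-" or vi[1] not in "123":
        raise ValueError(f"unsupported valence/intensity: {vi!r}")
    # {S}{A}{T}.{E}.{CR}: two digits, then the temporal code, then two dotted fields.
    sat, evidence, comparative = dims.split(".")
    return {
        "profile": {"S": "Standard", "F": "Full"}[profile],
        "codes": codes.split("+"),
        "valence": "V" + vi[0],
        "intensity": "I" + vi[1],
        "specificity": "S" + sat[0],
        "actionability": "A" + sat[1],
        "temporal": sat[2:],
        "evidence": evidence,
        "comparative": "CR-" + comparative,
        # The causal segment is present only in the Full form.
        "causal": parts[5].split(",") if len(parts) == 6 else [],
    }


decoded = parse_usn("URT:F:J1.01+A1.04:-3:23TR.EI.S:CD.O,MG.O")
print(decoded["codes"], decoded["temporal"], decoded["causal"])
```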

---

## Causal Chain (Full Profile Only)

The causal chain identifies root causes across three layers, ordered from immediate to systemic.

### Layers

| Layer | Codes | Scope |
|-------|-------|-------|
| conditions | CD-S, CD-T, CD-E, CD-F, CD-O | Staff State, Team Dynamics, Equipment, Facility, Operational |
| management | MG-P, MG-T, MG-O, MG-R, MG-C | Planning, Training, Oversight, Resources, Communication |
| systemic | SY-R, SY-P, SY-C, SY-S, SY-H, SY-X | Resource Decisions, Policy, Culture, Standards, Human Capital, External |

### Code Reference

**Conditions Layer**:
- `CD-S` - Staff State
- `CD-T` - Team Dynamics
- `CD-E` - Equipment
- `CD-F` - Facility
- `CD-O` - Operational

**Management Layer**:
- `MG-P` - Planning
- `MG-T` - Training
- `MG-O` - Oversight
- `MG-R` - Resources
- `MG-C` - Communication

**Systemic Layer**:
- `SY-R` - Resource Decisions
- `SY-P` - Policy
- `SY-C` - Culture
- `SY-S` - Standards
- `SY-H` - Human Capital
- `SY-X` - External

### JSONB Schema

```json
[
  {"layer": "conditions", "code": "CD-O", "evidence": "ES"},
  {"layer": "management", "code": "MG-P", "evidence": "EI"}
]
```

### Constraints

- Maximum 3 entries (one per layer)
- Only include when text explicitly supports it
- Order: conditions → management → systemic
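
The structural constraints (entry count, one entry per layer, layer order, code membership) lend themselves to a mechanical check. The Python sketch below is an illustration only: `causal_chain_errors` is a hypothetical name, and it makes no claim about how the database-side validation is implemented. The second constraint, textual support, needs human or model judgment and is not checked, and the `evidence` field is left unvalidated.

```python
LAYER_CODES = {
    "conditions": {"CD-S", "CD-T", "CD-E", "CD-F", "CD-O"},
    "management": {"MG-P", "MG-T", "MG-O", "MG-R", "MG-C"},
    "systemic": {"SY-R", "SY-P", "SY-C", "SY-S", "SY-H", "SY-X"},
}
LAYER_ORDER = ["conditions", "management", "systemic"]


def causal_chain_errors(chain: list[dict]) -> list[str]:
    """Return structural violations for a parsed causal chain; empty means valid."""
    errors = []
    if len(chain) > 3:
        errors.append("more than 3 entries")
    for entry in chain:
        layer, code = entry.get("layer"), entry.get("code")
        if layer not in LAYER_CODES:
            errors.append(f"unknown layer: {layer}")
        elif code not in LAYER_CODES[layer]:
            errors.append(f"code {code} not in layer {layer}")
    known = [e.get("layer") for e in chain if e.get("layer") in LAYER_CODES]
    if len(set(known)) != len(known):
        errors.append("duplicate layer")
    # Entries must run conditions -> management -> systemic.
    if known != sorted(known, key=LAYER_ORDER.index):
        errors.append("layers out of order")
    return errors


chain = [
    {"layer": "conditions", "code": "CD-O", "evidence": "ES"},
    {"layer": "management", "code": "MG-P", "evidence": "EI"},
]
print(causal_chain_errors(chain))        # the schema example is valid
print(causal_chain_errors(chain[::-1]))  # reversed entries break the ordering rule
```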

---

## Span Boundary Detection Rules

Spans are detected at the clause/topic level, not sentence level.

### Split Rules (in priority order)

1. **Split on contrasting conjunctions**: but, however, although, despite, yet
2. **Split when subject/target changes** (topic shift)
3. **Split when valence changes** (positive ↔ negative)
4. **Split when domain changes** (O/P/J/E/A/V/R)
5. **Keep together** for cause→effect within same feedback unit

### Guidelines

- **Maximum**: ~3 spans per sentence
- **Validation**: If 4+ spans detected, re-check for over-splitting

### Example

**Input**:
> "The food was great but the service was slow and the bathroom was dirty."

**Output**: 3 spans
1. "The food was great" (Offering, positive)
2. "the service was slow" (Journey/Attitude, negative)
3. "the bathroom was dirty" (Environment, negative)

**Reasoning**: Topic shift + domain shift at each boundary.

---

## Primary Span Selection

When a review contains multiple spans, select the primary span using these criteria in order:

### Selection Priority

1. **Highest intensity** (I3 > I2 > I1)
2. **Tie-break**: Negative over positive (V- > V± > V0 > V+)
3. **Tie-break**: Earliest span_index

### Example

Given spans:
- Span 0: I2, V+
- Span 1: I3, V+
- Span 2: I3, V-

**Primary**: Span 2 (highest intensity I3, negative valence wins tie-break)
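
The three criteria amount to a single lexicographic sort key. The Python sketch below shows one way to express that; the `Span` dataclass, the rank tables, and `select_primary` are hypothetical names introduced for illustration. Encoding each criterion as a rank and comparing tuples keeps the selection deterministic, since `span_index` breaks any remaining tie.

```python
from dataclasses import dataclass

# Lower rank wins: I3 before I2 before I1, and V- before V± before V0 before V+.
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


@dataclass
class Span:
    span_index: int
    intensity: str
    valence: str


def select_primary(spans: list[Span]) -> Span:
    """Pick the primary span by intensity, then valence, then earliest index."""
    return min(
        spans,
        key=lambda s: (INTENSITY_RANK[s.intensity], VALENCE_RANK[s.valence], s.span_index),
    )


spans = [Span(0, "I2", "V+"), Span(1, "I3", "V+"), Span(2, "I3", "V-")]
print(select_primary(spans).span_index)  # 2, as in the example above
```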

---

## Secondary Codes Rules

Secondary codes capture additional topics mentioned in a span.

### Constraints

- **Maximum**: 2 secondary codes
- **Format**: Must be Tier-3 (X#.##)
- **Recommendation**: Should be different domain from primary

### Example

Primary: `J1.03` (Journey)
Secondary: `A2.01`, `E1.05` (Attitude, Environment)
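
These rules mix two hard constraints with one recommendation, so a checker can reasonably report errors and warnings separately. The Python sketch below assumes that split; the name `check_secondary_codes` and the message wording are hypothetical. The Tier-3 pattern is the one given under Domain Codes.

```python
import re

TIER3_RE = re.compile(r"[OPJEAVR][1-4]\.[0-9]{2}")


def check_secondary_codes(primary: str, secondary: list[str]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a span's secondary codes."""
    errors, warnings = [], []
    if len(secondary) > 2:
        errors.append("more than 2 secondary codes")
    for code in secondary:
        if TIER3_RE.fullmatch(code) is None:
            errors.append(f"not Tier-3: {code}")
        elif code[0] == primary[0]:
            # Same domain as the primary is only discouraged, so it is a warning.
            warnings.append(f"{code} shares the primary domain")
    return errors, warnings


print(check_secondary_codes("J1.03", ["A2.01", "E1.05"]))  # the example above passes cleanly
print(check_secondary_codes("J1.03", ["J2.01", "A2", "E1.05"]))
```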

---

## Quick Reference Card

### Profiles

| Profile | Dimensions | Causal Chain |
|---------|------------|--------------|
| Core | V, I | No |
| Standard | V, I, S, A, T, E, CR | No |
| Full | V, I, S, A, T, E, CR | Yes |

### USN Quick Format

```
URT:{S|F}:{tier3_codes}:{valence}{intensity}:{SAT}.{E}.{CR}[:{causal}]
```

### Domain Letters

```
O P J E A V R
│ │ │ │ │ │ └─ Relationship
│ │ │ │ │ └─── Voice
│ │ │ │ └───── Attitude
│ │ │ └─────── Environment
│ │ └───────── Journey
│ └─────────── Price
└───────────── Offering
```