Phase 0: Project restructure to ReviewIQ platform architecture
New structure:
- scrapers/google_reviews/v1_0_0.py (was modules/scraper_clean.py)
- scrapers/base.py (BaseScraper interface)
- scrapers/registry.py (ScraperRegistry for version routing)
- core/database.py, models.py, config.py, enums.py
- utils/logger.py, crash_analyzer.py, health_checks.py, helpers.py, date_converter.py
- workers/chrome_pool.py
- services/webhook_service.py
- api/ routes structure (empty, ready for Phase 2)
- tests/ structure mirroring source

All imports updated in:
- api_server_production.py (7 import paths updated)
- utils/health_checks.py (scraper import path)

Legacy modules moved to modules/_legacy/:
- data_storage.py, image_handler.py, s3_handler.py (unused)

Syntax verified, frontend build passing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
183
.artifacts/ReviewIQ-v32-Decisions.md
Normal file
@@ -0,0 +1,183 @@
# ReviewIQ v3.2 Design Decisions

> Fast context-recovery document: all key decisions without the full spec.

---
## 1. Markpoint

```
ID: reviewiq-v32-span-layer-2026-01-24-001
Status: v3.2 span layer complete
Based on: v3.1.2 (commit f998277)
```

---
## 2. Core Design Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Span granularity | Clause/topic-level | Preserves multi-domain signal |
| span_id format | ULID (TEXT) | Survives re-segmentation |
| Span offsets | Required (NOT NULL) | Deterministic reconstruction |
| Offsets reference | reviews_enriched.text | Not text_normalized |
| Span → Issue mapping | One-to-one (UNIQUE span_id) | Atomic unit per issue |
| Primary span enforcement | Partial unique index | Exactly one per review version |
| Primary selection | I3 > I2 > I1, V- > V± > V0 > V+, span_index | Deterministic, stable |
| Reprocessing strategy | Soft-switch with is_active | No transient empty states |
| Span overlap | GiST exclusion constraint | Non-overlapping ranges enforced |
| Secondary codes | Array with cardinality ≤ 2 | Could normalize to link table later |
| Causal chain storage | JSONB | Flexibility, normalize later if needed |
| relation_type vs causal_chain | Separate concerns | relation = within-review, causal = root cause |
| Dimension columns | Postgres ENUMs | Type safety, storage efficiency |
| Trust score floor | 0.2 (GREATEST clamp) | Prevent multiplicative collapse |
| Issue routing key | (business_id, place_id, urt_primary, entity_normalized) | Deterministic, entity-aware |
| Issue ID generation | SHA256 via pgcrypto | Deterministic, collision-resistant |
| Text validation trigger | Conditional via session setting | Performance: skip in bulk loads |
| Relation validation | Application-level post-insert | Handles insertion order |

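
The non-overlap decision above is enforced in the database by the GiST exclusion constraint, but the same rule can be checked in application code before insert. The following is a minimal Python sketch under stated assumptions: offsets are treated as half-open `[span_start, span_end)` ranges, and the `Span` dataclass and `find_overlaps` helper are hypothetical names used only for illustration, not part of the schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    span_index: int
    span_start: int  # inclusive offset (assumption)
    span_end: int    # exclusive offset (assumption)


def find_overlaps(spans):
    """Return (earlier_index, later_index) pairs for overlapping spans.

    An empty list means the set is non-overlapping. Each offending span is
    reported once, against the earlier span that reaches furthest right.
    """
    ordered = sorted(spans, key=lambda s: (s.span_start, s.span_end))
    overlaps = []
    widest = None  # earlier span with the largest span_end seen so far
    for cur in ordered:
        if widest is not None and cur.span_start < widest.span_end:
            overlaps.append((widest.span_index, cur.span_index))
        if widest is None or cur.span_end > widest.span_end:
            widest = cur
    return overlaps


# Touching ranges (10 == 10) are fine; 15 < 20 is an overlap.
spans = [Span(0, 0, 10), Span(1, 10, 20), Span(2, 15, 25)]
conflicts = find_overlaps(spans)
```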
---
## 3. Extensions Required

| Extension | Purpose |
|-----------|---------|
| `btree_gist` | Exclusion constraint for non-overlapping spans |
| `pgcrypto` | SHA256-based issue ID generation |

---
## 4. New Tables

| Table | Purpose |
|-------|---------|
| `review_spans` | Span-level URT classification |
| `review_span_secondary_codes` | (Optional) Normalized secondary codes |

---
## 5. Modified Tables

| Table | Changes |
|-------|---------|
| `issue_spans` | Added `span_id` FK (NOT NULL), removed direct review FK as canonical |

---
## 6. New ENUM Types

**Valence & Intensity:**
- `urt_valence`: V-, V±, V0, V+
- `urt_intensity`: I1, I2, I3

**Specificity & Actionability:**
- `urt_specificity`: S1, S2, S3
- `urt_actionability`: A1, A2, A3

**Context & Evidence:**
- `urt_temporal`: T1, T2, T3
- `urt_evidence`: E1, E2, E3
- `urt_comparative`: CR1, CR2, CR3

**Classification:**
- `urt_profile`: factual, emotional, comparative, etc.
- `urt_confidence`: low, medium, high
- `urt_relation`: elaborates, contrasts, causes, etc.
- `urt_entity_type`: person, product, location, etc.

---
## 7. Key Functions

| Function | Purpose |
|----------|---------|
| `urt_validate_causal_chain()` | Validates causal JSONB structure |
| `validate_review_relations()` | Ensures related_span_id same-parent |
| `validate_active_spans()` | Ensures valid active span set |
| `set_primary_span()` | Deterministic primary selection |
| `generate_issue_id()` | SHA256-based issue ID |

---
## 8. Key Triggers

| Trigger | Purpose |
|---------|---------|
| `review_spans_validate_bounds` | span_end ≤ text length |
| `review_spans_validate_text` | span_text matches substring |
| `review_spans_validate_causal_chain` | causal_chain JSONB valid |

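
The bounds and text checks above can also be run in application code before a span is written, for example when the text trigger is skipped in bulk loads. This is a hedged Python sketch, not the trigger implementation: it assumes 0-based, half-open offsets into `reviews_enriched.text` (Postgres `substring` is 1-based, so the real triggers may index differently), and `validate_span` is a hypothetical helper name.

```python
def validate_span(review_text, span_start, span_end, span_text):
    """Raise ValueError if the span is out of bounds or its text does not match.

    Assumes 0-based, half-open [span_start, span_end) offsets (an assumption,
    not confirmed by the design document).
    """
    # Bounds check: mirrors "span_end <= text length", plus basic ordering.
    if not (0 <= span_start < span_end <= len(review_text)):
        raise ValueError(
            f"span [{span_start}, {span_end}) outside text of length {len(review_text)}"
        )
    # Text check: the stored span_text must equal the referenced substring.
    actual = review_text[span_start:span_end]
    if actual != span_text:
        raise ValueError(f"span_text {span_text!r} does not match substring {actual!r}")


review = "Great food but slow service"
validate_span(review, 0, 10, "Great food")  # passes silently
```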
---
## 9. USN Format

```
Standard: URT:S:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}
Full:     URT:F:{codes}:{V}{I}:{S}{A}{T}.{E}.{CR}:{causal}
```

**Examples:**
- `URT:S:SVC.SPD:V-I3:S3A3T2.E2.CR1`: Specific service speed complaint
- `URT:F:PRD.QUA:V+I2:S2A1T1.E3.CR2:staff→training`: Product quality praise with causal chain

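
To make the two layouts concrete, here is a small Python sketch that builds and parses USN strings. It is an illustration under assumptions: `{codes}` is treated as an opaque string, the dimension values are limited to those listed in section 6, and `format_usn` / `parse_usn` are hypothetical helper names rather than functions from the codebase.

```python
import re

# Pattern derived from the two layouts above; the allowed values per field
# are an assumption based on the ENUM lists in section 6.
USN_RE = re.compile(
    r"^URT:(?P<variant>S|F):(?P<codes>[^:]+):"
    r"(?P<valence>V[-±0+])(?P<intensity>I[1-3]):"
    r"(?P<specificity>S[1-3])(?P<actionability>A[1-3])(?P<temporal>T[1-3])"
    r"\.(?P<evidence>E[1-3])\.(?P<comparative>CR[1-3])"
    r"(?::(?P<causal>.+))?$"
)


def format_usn(codes, valence, intensity, specificity, actionability,
               temporal, evidence, comparative, causal=None):
    """Build a Standard (S) USN, or a Full (F) USN when a causal part is given."""
    variant = "F" if causal else "S"
    usn = (f"URT:{variant}:{codes}:{valence}{intensity}:"
           f"{specificity}{actionability}{temporal}.{evidence}.{comparative}")
    if causal:
        usn += f":{causal}"
    return usn


def parse_usn(usn):
    """Return the USN fields as a dict, or raise ValueError if malformed."""
    match = USN_RE.match(usn)
    if match is None:
        raise ValueError(f"not a valid USN: {usn!r}")
    return match.groupdict()


standard = format_usn("SVC.SPD", "V-", "I3", "S3", "A3", "T2", "E2", "CR1")
full = format_usn("PRD.QUA", "V+", "I2", "S2", "A1", "T1", "E3", "CR2",
                  causal="staff→training")
fields = parse_usn(full)
```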
---
## 10. Span Boundary Rules

1. **Split on contrasting conjunctions**: "but", "however", "although"
2. **Split on topic/target change**: Different entity or aspect
3. **Split on valence change**: Positive → Negative or vice versa
4. **Split on domain change**: SVC → PRD → AMB
5. **Keep cause→effect together**: Causal chain stays in one span

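
Rule 1 is the only rule above that can be approximated with plain string matching; the others need topic, valence, or domain classification. The Python sketch below covers rule 1 only and is a deliberately naive assumption-laden illustration: it splits before each contrasting conjunction, keeps the conjunction with the following span, and returns 0-based half-open offsets into the original text. `split_on_contrast` is a hypothetical name.

```python
import re

# Conjunctions from rule 1; word boundaries avoid matching inside other words.
CONTRAST = re.compile(r"\b(?:but|however|although)\b", re.IGNORECASE)
TRIM = " \t\n,;"


def split_on_contrast(text):
    """Split text before each contrasting conjunction.

    Returns (start, end, span_text) tuples whose offsets index into the
    original text, so text[start:end] == span_text always holds.
    """
    boundaries = [m.start() for m in CONTRAST.finditer(text)] + [len(text)]
    spans = []
    start = 0
    for end in boundaries:
        s, e = start, end
        # Trim separators at both edges without losing the offsets.
        while s < e and text[s] in TRIM:
            s += 1
        while e > s and text[e - 1] in TRIM:
            e -= 1
        if s < e:
            spans.append((s, e, text[s:e]))
        start = end
    return spans


review = "Great food, but the service was slow"
pieces = split_on_contrast(review)
```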
---
## 11. Deferred to v3.3+

| Item | Reason |
|------|--------|
| Entity extraction implementation | Requires NER pipeline |
| Trust-weighted fact aggregation | Needs more span data |
| Secondary domain enforcement | App-level validation sufficient |
| Span-based fact counting | Currently review-based, optimize later |

---
## 12. Open Questions Resolved

| Question | Resolution |
|----------|------------|
| Span → Issue cardinality? | **One-to-one** (not many-to-many) |
| Offsets nullable for LLM-inferred? | **No**: required, NOT NULL |
| Reprocessing strategy? | **Soft-switch** with is_active flag |
| TEXT vs ENUM for dimensions? | **ENUMs**: committed to Postgres |

---
## Quick Reference

### Primary Span Selection Algorithm

```
ORDER BY:
1. intensity DESC (I3 > I2 > I1)
2. valence ASC (V- > V± > V0 > V+)
3. span_index ASC (first wins ties)
```

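
The ordering above can be expressed as a composite sort key. The Python sketch below is an illustration only: spans are plain dicts with hypothetical keys `intensity`, `valence`, and `span_index`, and the rank tables encode the stated priority explicitly. In Postgres, ENUM values compare in declaration order, so `valence ASC` yields this priority only if `urt_valence` is declared as V-, V±, V0, V+ (an assumption based on the listing in section 6).

```python
# Lower rank wins. Intensity is ranked so that I3 sorts first (DESC).
INTENSITY_RANK = {"I3": 0, "I2": 1, "I1": 2}
VALENCE_RANK = {"V-": 0, "V±": 1, "V0": 2, "V+": 3}


def select_primary(spans):
    """Pick the primary span: highest intensity, then most negative valence,
    then lowest span_index as the tie-breaker."""
    return min(
        spans,
        key=lambda s: (
            INTENSITY_RANK[s["intensity"]],
            VALENCE_RANK[s["valence"]],
            s["span_index"],
        ),
    )


candidates = [
    {"span_index": 0, "intensity": "I2", "valence": "V-"},
    {"span_index": 1, "intensity": "I3", "valence": "V+"},
    {"span_index": 2, "intensity": "I3", "valence": "V-"},
    {"span_index": 3, "intensity": "I3", "valence": "V-"},
]
primary = select_primary(candidates)
```

Because the key is a total order over the three fields, the same input always yields the same primary span, which is the "deterministic, stable" property the design asks for.
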
### Issue Routing Key

```sql
(business_id, place_id, urt_primary, entity_normalized)
```

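
A deterministic issue ID can be derived from this key by hashing its fields. The Python sketch below uses `hashlib` as a stand-in for pgcrypto and is not the `generate_issue_id()` implementation: the field separator, the handling of a missing entity, and the hex output format are all assumptions, and `sketch_issue_id` is a hypothetical name.

```python
import hashlib

# ASCII unit separator: an assumed delimiter that keeps ("ab", "c") and
# ("a", "bc") from hashing to the same value.
SEP = "\x1f"


def sketch_issue_id(business_id, place_id, urt_primary, entity_normalized=None):
    """Hash the routing key into a 64-character hex SHA-256 digest."""
    parts = [business_id, place_id, urt_primary, entity_normalized or ""]
    key = SEP.join(parts)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


issue_a = sketch_issue_id("biz-1", "place-9", "SVC.SPD", "front desk")
issue_b = sketch_issue_id("biz-1", "place-9", "SVC.SPD", "front desk")
issue_c = sketch_issue_id("biz-1", "place-9", "SVC.SPD", "kitchen")
```

Identical routing keys produce identical IDs, while a different `entity_normalized` routes to a different issue.
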
### Trust Score Calculation

```sql
GREATEST(0.2, base_trust * modifiers)  -- Floor prevents collapse
```

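
The effect of the floor is easiest to see with numbers. In this Python sketch, `modifiers` is assumed to be a sequence of multiplicative factors (the design document does not say whether it is one value or several), and `trust_score` is a hypothetical helper name.

```python
import math

TRUST_FLOOR = 0.2  # floor value from the design decisions table


def trust_score(base_trust, modifiers):
    """Multiply base trust by every modifier, then clamp at the floor."""
    return max(TRUST_FLOOR, base_trust * math.prod(modifiers))


# 0.9 * 0.5 * 0.3 is about 0.135, which the floor lifts to 0.2.
penalized = trust_score(0.9, [0.5, 0.3])
# With a neutral modifier the base value passes through unchanged.
untouched = trust_score(0.9, [1.0])
```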
---

*Last updated: 2026-01-24*