# ReviewIQ v3.2.1 Addendum: Taxonomy Versioning

**Version**: 3.2.1
**Status**: Draft Specification
**Date**: 2026-01-24
**Extends**: ReviewIQ Architecture v3.2.0

---

## Executive Summary

This addendum introduces **taxonomy versioning** to ReviewIQ, enabling the system to track how URT classifications evolve over time. Classifications become fully contextualized facts that include the taxonomy version used, allowing accurate historical analysis, safe taxonomy evolution, and cross-version trend normalization.

**Key Additions**:
- `taxonomy_version` column on classification tables
- `urt_taxonomy_versions` table for version metadata
- `urt_codes_versioned` table for versioned code definitions
- `urt_code_mappings` table for cross-version translation
- Translation functions for normalized queries
- Lineage tracking for code evolution

**Design Principle**: Facts are immutable. A span classified as `J1.01` in taxonomy v5.1 stays that way forever. Translation between versions is explicit and auditable.

---

## Part 1: Problem Statement

### Why Taxonomy Versioning?

Without versioning, these scenarios cause data integrity issues:

| Scenario | Problem |
|----------|---------|
| Code renamed | Historical reports show wrong label |
| Definition changed | Same code means different things over time |
| Code split | Old data can't distinguish new subcategories |
| Code merged | Trend analysis shows artificial drop |
| Code deprecated | Orphaned data with no valid reference |

### The Three Dimensions of Classification Context

A classification is only meaningful when you know:

1. **Taxonomy Version**: What do the codes mean?
2. **Model Version**: How was the text processed? (already in v3.2)
3. **Prompt Version**: What instructions were given? (optional)

```
┌─────────────────────────────────────────────────────────────┐
│                     CLASSIFICATION FACT                     │
│                                                             │
│  "At time T, using taxonomy V and model M,                  │
│   this text was classified as code C with confidence X"     │
│                                                             │
│  This fact is IMMUTABLE.                                    │
└─────────────────────────────────────────────────────────────┘
```

---

## Part 2: Schema Additions

### 2.1 Taxonomy Version Registry

```sql
-- ═══════════════════════════════════════════════════════════════
-- TAXONOMY VERSION REGISTRY
-- Tracks taxonomy releases like git tags
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE urt_taxonomy_versions (
    version_id          TEXT PRIMARY KEY,               -- 'v5.1', 'v5.2', 'v6.0'
    semver              TEXT NOT NULL,                  -- '5.1.0' for programmatic comparison

    -- Validity period
    effective_from      DATE NOT NULL,
    effective_to        DATE,                           -- NULL = current/open-ended
    is_current          BOOLEAN NOT NULL DEFAULT FALSE,

    -- Metadata
    release_notes       TEXT,
    changelog_url       TEXT,

    -- Statistics (populated on release)
    domain_count        SMALLINT,
    category_count      SMALLINT,
    subcode_count       SMALLINT,

    -- Migration hints
    breaking_changes    BOOLEAN NOT NULL DEFAULT FALSE,
    migration_guide_url TEXT,
    predecessor_version TEXT REFERENCES urt_taxonomy_versions(version_id),

    -- Audit
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by          TEXT,

    -- Ensure semver format
    CONSTRAINT chk_semver_format CHECK (semver ~ '^\d+\.\d+\.\d+(-[a-zA-Z0-9]+)?$')
);

-- Only one current version allowed
CREATE UNIQUE INDEX uq_taxonomy_current ON urt_taxonomy_versions(is_current)
    WHERE is_current = TRUE;

CREATE INDEX idx_taxonomy_versions_effective ON urt_taxonomy_versions(effective_from, effective_to);

COMMENT ON TABLE urt_taxonomy_versions IS 'Registry of URT taxonomy versions with validity periods and migration metadata';
```

### 2.2 Versioned Code Definitions

```sql
-- ═══════════════════════════════════════════════════════════════
-- VERSIONED CODE DEFINITIONS
-- Full code definitions per taxonomy version (SCD Type 2 style)
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE urt_codes_versioned (
    -- Composite primary key
    code                TEXT NOT NULL,
    version_id          TEXT NOT NULL REFERENCES urt_taxonomy_versions(version_id),

    -- Classification hierarchy
    domain              CHAR(1) NOT NULL,
    category            TEXT NOT NULL,                  -- 'J1', 'P2', etc.
    subcategory         TEXT,                           -- 'J1.01', 'P2.03', etc.
    tier                SMALLINT NOT NULL,              -- 1=domain, 2=category, 3=subcode

    -- Semantics
    display_name        TEXT NOT NULL,
    definition          TEXT NOT NULL,
    keywords            TEXT[] DEFAULT '{}',

    -- Examples (for classifier training/validation)
    examples            JSONB,
    -- Format: {
    --   "positive": ["example text 1", "example text 2"],
    --   "negative": ["counter-example 1"],
    --   "boundary": ["edge case 1"]
    -- }

    -- Disambiguation
    dont_confuse_with   TEXT,                           -- Another code
    dont_confuse_reason TEXT,

    -- Hierarchy links (within same version)
    parent_code         TEXT,                           -- Category code if this is subcode

    -- Cross-version lineage
    change_type         TEXT NOT NULL DEFAULT 'unchanged',
    predecessor_codes   TEXT[] DEFAULT '{}',            -- Codes in previous version this evolved from
    deprecation_reason  TEXT,
    successor_hint      TEXT,                           -- Suggested replacement if deprecated

    -- Ownership
    default_owner       TEXT,                           -- Team responsible for this domain

    -- Ordering
    display_order       SMALLINT NOT NULL DEFAULT 0,

    -- Audit
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    PRIMARY KEY (code, version_id),

    CONSTRAINT chk_tier_valid CHECK (tier IN (1, 2, 3)),
    CONSTRAINT chk_change_type_valid CHECK (change_type IN (
        'new',          -- First appearance in this version
        'unchanged',    -- Identical to previous version
        'renamed',      -- Display name changed, same meaning
        'redefined',    -- Definition changed (semantic drift)
        'deprecated',   -- Marked for removal
        'split',        -- This code split from a previous code
        'merged'        -- This code merged from multiple previous codes
    )),
    CONSTRAINT chk_code_format CHECK (
        (tier = 1 AND code ~ '^[OPJEAVR]$') OR
        (tier = 2 AND code ~ '^[OPJEAVR][1-4]$') OR
        (tier = 3 AND code ~ '^[OPJEAVR][1-4]\.[0-9]{2}$')
    ),
    CONSTRAINT chk_deprecated_has_reason CHECK (change_type != 'deprecated' OR deprecation_reason IS NOT NULL),

    -- Parent must be in same version
    CONSTRAINT fk_parent_code FOREIGN KEY (parent_code, version_id)
        REFERENCES urt_codes_versioned(code, version_id)
);

CREATE INDEX idx_codes_versioned_domain ON
    urt_codes_versioned(version_id, domain);
CREATE INDEX idx_codes_versioned_category ON urt_codes_versioned(version_id, category);
CREATE INDEX idx_codes_versioned_change ON urt_codes_versioned(version_id, change_type)
    WHERE change_type != 'unchanged';
CREATE INDEX idx_codes_versioned_deprecated ON urt_codes_versioned(version_id)
    WHERE change_type = 'deprecated';

COMMENT ON TABLE urt_codes_versioned IS 'URT code definitions with full semantics, versioned per taxonomy release';
```

### 2.3 Cross-Version Code Mappings

```sql
-- ═══════════════════════════════════════════════════════════════
-- CROSS-VERSION CODE MAPPINGS
-- Explicit translation rules between taxonomy versions
-- ═══════════════════════════════════════════════════════════════

CREATE TABLE urt_code_mappings (
    id                  SERIAL PRIMARY KEY,

    -- Source (older version)
    from_code           TEXT NOT NULL,
    from_version        TEXT NOT NULL,

    -- Target (newer version)
    to_code             TEXT NOT NULL,
    to_version          TEXT NOT NULL,

    -- Mapping semantics
    mapping_type        TEXT NOT NULL,

    -- Confidence and applicability
    confidence          FLOAT NOT NULL DEFAULT 1.0,
    bidirectional       BOOLEAN NOT NULL DEFAULT FALSE,

    -- Context
    notes               TEXT,
    effective_from      DATE,                           -- When this mapping became valid

    -- Audit
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by          TEXT,

    UNIQUE (from_code, from_version, to_code, to_version),

    CONSTRAINT fk_from_code FOREIGN KEY (from_code, from_version)
        REFERENCES urt_codes_versioned(code, version_id),
    CONSTRAINT fk_to_code FOREIGN KEY (to_code, to_version)
        REFERENCES urt_codes_versioned(code, version_id),
    CONSTRAINT chk_mapping_type_valid CHECK (mapping_type IN (
        'equivalent',   -- Same meaning, safe 1:1 translation
        'broader',      -- to_code is more general (result of merge)
        'narrower',     -- to_code is more specific (result of split)
        'related',      -- Conceptually similar but not equivalent
        'superseded'    -- from_code deprecated, to_code is replacement
    )),
    CONSTRAINT chk_confidence_range CHECK (confidence > 0 AND confidence <= 1.0),
    CONSTRAINT
        chk_different_versions CHECK (from_version != to_version),
    -- Mappings go forward in time.
    -- NOTE: this compares version_id as TEXT, i.e. lexicographically, so
    -- 'v5.10' sorts before 'v5.9'. It matches release order only while each
    -- version component stays single-digit.
    CONSTRAINT chk_version_order CHECK (from_version < to_version)
);

CREATE INDEX idx_mappings_from ON urt_code_mappings(from_code, from_version);
CREATE INDEX idx_mappings_to ON urt_code_mappings(to_code, to_version);
CREATE INDEX idx_mappings_type ON urt_code_mappings(mapping_type);

COMMENT ON TABLE urt_code_mappings IS 'Explicit translation rules between taxonomy versions for query normalization';
```

### 2.4 Schema Modifications to Existing Tables

```sql
-- ═══════════════════════════════════════════════════════════════
-- MODIFICATIONS TO EXISTING TABLES
-- Add taxonomy_version to classification tables
-- ═══════════════════════════════════════════════════════════════

-- review_spans: Add taxonomy version
-- (existing rows are backfilled with the 'v5.1' default)
ALTER TABLE review_spans ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE review_spans ADD COLUMN prompt_version TEXT;  -- Optional: hash or ID of prompt template

-- FK to versioned codes (replaces existing FK to urt_codes)
ALTER TABLE review_spans DROP CONSTRAINT IF EXISTS fk_spans_urt_primary;
ALTER TABLE review_spans ADD CONSTRAINT fk_spans_urt_versioned
    FOREIGN KEY (urt_primary, taxonomy_version)
    REFERENCES urt_codes_versioned(code, version_id);

-- Index for version-aware queries
CREATE INDEX idx_spans_taxonomy_version ON review_spans(taxonomy_version);
CREATE INDEX idx_spans_code_version ON review_spans(urt_primary, taxonomy_version);

-- reviews_enriched: Add taxonomy version for review-level classification
ALTER TABLE reviews_enriched ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
CREATE INDEX idx_enriched_taxonomy_version ON reviews_enriched(taxonomy_version);

-- issues: Track which taxonomy version the issue was created under
ALTER TABLE issues ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE issues ADD CONSTRAINT fk_issues_urt_versioned
    FOREIGN KEY (primary_subcode, taxonomy_version) REFERENCES
    urt_codes_versioned(code, version_id);
CREATE INDEX idx_issues_taxonomy_version ON issues(taxonomy_version);

-- fact_timeseries: Track taxonomy version for aggregated facts
ALTER TABLE fact_timeseries ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';

-- Update unique constraint to include taxonomy version
ALTER TABLE fact_timeseries DROP CONSTRAINT IF EXISTS fact_timeseries_business_id_place_id_period_date_bucket_typ_key;
ALTER TABLE fact_timeseries ADD CONSTRAINT uq_fact_timeseries
    UNIQUE (business_id, place_id, period_date, bucket_type, subject_type, subject_id, taxonomy_version);
```

---

## Part 3: Functions

### 3.1 Code Translation

```sql
-- ═══════════════════════════════════════════════════════════════
-- TRANSLATION FUNCTIONS
-- ═══════════════════════════════════════════════════════════════

-- Translate a single code between versions.
-- Lookup order: identity, direct mapping, unchanged code, one-hop transitive mapping.
-- May return several rows when a code was split into multiple targets.
CREATE OR REPLACE FUNCTION translate_urt_code(
    p_code TEXT,
    p_from_version TEXT,
    p_to_version TEXT
) RETURNS TABLE (
    translated_code TEXT,
    mapping_type TEXT,
    confidence FLOAT,
    notes TEXT
) AS $$
BEGIN
    -- If same version, return as-is
    IF p_from_version = p_to_version THEN
        RETURN QUERY SELECT p_code, 'identity'::TEXT, 1.0::FLOAT, NULL::TEXT;
        RETURN;
    END IF;

    -- Check for direct mapping
    RETURN QUERY
    SELECT m.to_code, m.mapping_type, m.confidence, m.notes
    FROM urt_code_mappings m
    WHERE m.from_code = p_code
      AND m.from_version = p_from_version
      AND m.to_version = p_to_version;

    -- If no direct mapping, check if code exists unchanged in target.
    -- (RETURN QUERY sets FOUND to true when it returned at least one row.)
    IF NOT FOUND THEN
        RETURN QUERY
        SELECT p_code, 'unchanged'::TEXT, 1.0::FLOAT,
               'Code exists in both versions without explicit mapping'::TEXT
        FROM urt_codes_versioned
        WHERE code = p_code
          AND version_id = p_to_version
          AND change_type = 'unchanged';
    END IF;

    -- If still not found, check transitive mappings (one hop)
    IF NOT FOUND THEN
        RETURN QUERY
        SELECT m2.to_code, 'transitive'::TEXT, m1.confidence * m2.confidence,
               'Via ' || m1.to_version
        FROM urt_code_mappings m1
        JOIN urt_code_mappings m2 ON
            m1.to_code = m2.from_code AND m1.to_version = m2.from_version
        WHERE m1.from_code = p_code
          AND m1.from_version = p_from_version
          AND m2.to_version = p_to_version;
    END IF;
END;
$$ LANGUAGE plpgsql STABLE;

COMMENT ON FUNCTION translate_urt_code IS 'Translate a URT code from one taxonomy version to another';

-- Get full lineage for a code (all historical versions)
CREATE OR REPLACE FUNCTION get_code_lineage(
    p_code TEXT,
    p_version TEXT DEFAULT NULL
) RETURNS TABLE (
    code TEXT,
    version_id TEXT,
    display_name TEXT,
    relationship TEXT,
    confidence FLOAT,
    depth INT
) AS $$
DECLARE
    v_target_version TEXT;
BEGIN
    -- Default to current version
    v_target_version := COALESCE(p_version,
        (SELECT tv.version_id FROM urt_taxonomy_versions tv WHERE tv.is_current = TRUE));

    RETURN QUERY
    WITH RECURSIVE lineage AS (
        -- Base: the code itself
        SELECT
            cv.code,
            cv.version_id,
            cv.display_name,
            'self'::TEXT as relationship,
            1.0::FLOAT as confidence,
            0 as depth
        FROM urt_codes_versioned cv
        WHERE cv.code = p_code AND cv.version_id = v_target_version

        UNION ALL

        -- Recursive: predecessors via mappings
        SELECT
            m.from_code,
            m.from_version,
            cv.display_name,
            m.mapping_type,
            l.confidence * m.confidence,
            l.depth + 1
        FROM lineage l
        JOIN urt_code_mappings m ON m.to_code = l.code AND m.to_version = l.version_id
        JOIN urt_codes_versioned cv ON cv.code = m.from_code AND cv.version_id = m.from_version
        WHERE l.depth < 10  -- Prevent infinite loops
    )
    SELECT * FROM lineage
    ORDER BY depth, version_id;
END;
$$ LANGUAGE plpgsql STABLE;

COMMENT ON FUNCTION get_code_lineage IS 'Get full historical lineage for a code across all taxonomy versions';

-- Get current version of taxonomy
CREATE OR REPLACE FUNCTION get_current_taxonomy_version()
RETURNS TEXT AS $$
    SELECT version_id FROM urt_taxonomy_versions WHERE is_current = TRUE;
$$ LANGUAGE sql STABLE;

COMMENT ON FUNCTION get_current_taxonomy_version IS 'Returns the current/active taxonomy version ID';
```

### 3.2 Normalized Aggregation

```sql
-- ═══════════════════════════════════════════════════════════════
-- NORMALIZED AGGREGATION
-- Query spans with automatic translation to target version
-- ═══════════════════════════════════════════════════════════════

-- View: Spans normalized to current taxonomy version.
-- When a code maps to several targets (e.g. a split), the highest-confidence
-- target is chosen so that LIMIT 1 is deterministic.
-- If no translation exists, the original code is kept with confidence 1.0.
CREATE OR REPLACE VIEW v_spans_normalized AS
SELECT
    rs.*,
    COALESCE(
        (SELECT translated_code
         FROM translate_urt_code(
             rs.urt_primary,
             rs.taxonomy_version,
             get_current_taxonomy_version()
         )
         ORDER BY confidence DESC
         LIMIT 1),
        rs.urt_primary
    ) as urt_primary_normalized,
    COALESCE(
        (SELECT confidence
         FROM translate_urt_code(
             rs.urt_primary,
             rs.taxonomy_version,
             get_current_taxonomy_version()
         )
         ORDER BY confidence DESC
         LIMIT 1),
        1.0
    ) as translation_confidence,
    get_current_taxonomy_version() as normalized_to_version
FROM review_spans rs
WHERE rs.is_active = TRUE;

COMMENT ON VIEW v_spans_normalized IS 'Review spans with URT codes translated to current taxonomy version';

-- Function: Aggregate facts with version normalization
CREATE OR REPLACE FUNCTION aggregate_spans_normalized(
    p_business_id TEXT,
    p_place_id TEXT,
    p_start_date DATE,
    p_end_date DATE,
    p_target_version TEXT DEFAULT NULL
) RETURNS TABLE (
    urt_code TEXT,
    span_count BIGINT,
    negative_count BIGINT,
    positive_count BIGINT,
    avg_confidence FLOAT,
    source_versions TEXT[]
) AS $$
DECLARE
    v_target TEXT;
BEGIN
    v_target := COALESCE(p_target_version, get_current_taxonomy_version());

    RETURN QUERY
    SELECT
        COALESCE(
            (SELECT translated_code
             FROM translate_urt_code(rs.urt_primary, rs.taxonomy_version, v_target)
             ORDER BY confidence DESC
             LIMIT 1),
            rs.urt_primary
        ) as urt_code,
        COUNT(*) as span_count,
        COUNT(*) FILTER (WHERE rs.valence = 'V-') as negative_count,
        COUNT(*) FILTER (WHERE rs.valence = 'V+') as positive_count,
        AVG(COALESCE(
            (SELECT confidence
             FROM translate_urt_code(rs.urt_primary, rs.taxonomy_version, v_target)
             ORDER BY confidence DESC
             LIMIT 1),
            1.0
        )) as avg_confidence,
        array_agg(DISTINCT rs.taxonomy_version) as source_versions
    FROM review_spans rs
    WHERE rs.business_id = p_business_id
      AND (rs.place_id = p_place_id OR p_place_id = 'ALL')
      AND
          rs.review_time >= p_start_date
      AND rs.review_time < p_end_date
      AND rs.is_active = TRUE
    GROUP BY 1
    ORDER BY span_count DESC;
END;
$$ LANGUAGE plpgsql STABLE;

COMMENT ON FUNCTION aggregate_spans_normalized IS 'Aggregate span counts with automatic translation to target taxonomy version';
```

### 3.3 Drift Detection

```sql
-- ═══════════════════════════════════════════════════════════════
-- DRIFT DETECTION
-- Analyze classification changes between taxonomy versions
-- ═══════════════════════════════════════════════════════════════

-- Function: Detect potential drift when upgrading taxonomy
CREATE OR REPLACE FUNCTION detect_taxonomy_drift(
    p_from_version TEXT,
    p_to_version TEXT,
    p_business_id TEXT DEFAULT NULL
) RETURNS TABLE (
    from_code TEXT,
    from_display_name TEXT,
    to_code TEXT,
    to_display_name TEXT,
    mapping_type TEXT,
    affected_spans BIGINT,
    sample_span_ids TEXT[]
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        m.from_code,
        cv_from.display_name as from_display_name,
        m.to_code,
        cv_to.display_name as to_display_name,
        m.mapping_type,
        COUNT(rs.span_id) as affected_spans,
        (array_agg(rs.span_id ORDER BY rs.review_time DESC))[1:5] as sample_span_ids
    FROM urt_code_mappings m
    JOIN urt_codes_versioned cv_from
        ON cv_from.code = m.from_code AND cv_from.version_id = m.from_version
    JOIN urt_codes_versioned cv_to
        ON cv_to.code = m.to_code AND cv_to.version_id = m.to_version
    LEFT JOIN review_spans rs
        ON rs.urt_primary = m.from_code
       AND rs.taxonomy_version = m.from_version
       AND rs.is_active = TRUE
       AND (p_business_id IS NULL OR rs.business_id = p_business_id)
    WHERE m.from_version = p_from_version
      AND m.to_version = p_to_version
      AND m.mapping_type NOT IN ('equivalent', 'unchanged')
    GROUP BY m.from_code, cv_from.display_name, m.to_code, cv_to.display_name, m.mapping_type
    HAVING COUNT(rs.span_id) > 0
    ORDER BY affected_spans DESC;
END;
$$ LANGUAGE plpgsql STABLE;

COMMENT ON FUNCTION detect_taxonomy_drift IS 'Identify spans affected by non-equivalent mappings between taxonomy versions';

-- Function: Get
-- deprecated codes still in use
CREATE OR REPLACE FUNCTION get_deprecated_codes_in_use(
    p_version TEXT,
    p_business_id TEXT DEFAULT NULL
) RETURNS TABLE (
    code TEXT,
    display_name TEXT,
    deprecation_reason TEXT,
    successor_hint TEXT,
    span_count BIGINT,
    latest_span_date TIMESTAMPTZ
) AS $$
BEGIN
    RETURN QUERY
    SELECT
        cv.code,
        cv.display_name,
        cv.deprecation_reason,
        cv.successor_hint,
        COUNT(rs.span_id) as span_count,
        MAX(rs.review_time) as latest_span_date
    FROM urt_codes_versioned cv
    LEFT JOIN review_spans rs
        ON rs.urt_primary = cv.code
       AND rs.taxonomy_version = cv.version_id
       AND rs.is_active = TRUE
       AND (p_business_id IS NULL OR rs.business_id = p_business_id)
    WHERE cv.version_id = p_version
      AND cv.change_type = 'deprecated'
    GROUP BY cv.code, cv.display_name, cv.deprecation_reason, cv.successor_hint
    HAVING COUNT(rs.span_id) > 0
    ORDER BY span_count DESC;
END;
$$ LANGUAGE plpgsql STABLE;

COMMENT ON FUNCTION get_deprecated_codes_in_use IS 'Find deprecated codes that still have active spans';
```

---

## Part 4: Seed Data

### 4.1 Initial Version Registration

```sql
-- ═══════════════════════════════════════════════════════════════
-- SEED DATA: v5.1 Initial Version
-- ═══════════════════════════════════════════════════════════════

INSERT INTO urt_taxonomy_versions (
    version_id, semver, effective_from, is_current,
    release_notes, domain_count, category_count, subcode_count,
    breaking_changes, created_by
) VALUES (
    'v5.1',
    '5.1.0',
    '2026-01-01',
    TRUE,
    'Initial URT v5.1 release. 7 domains, 28 categories, 138 subcodes.',
    7, 28, 138,
    FALSE,
    'system'
);

-- Note: urt_codes_versioned should be populated from B1-urt-codes.yaml
-- using a separate seed script.
-- See: scripts/seed-urt-codes.py
```

### 4.2 Sample Versioned Codes (Subset)

```sql
-- Sample: Domain-level codes for v5.1
INSERT INTO urt_codes_versioned (
    code, version_id, domain, category, tier,
    display_name, definition, default_owner, display_order, change_type
) VALUES
('O', 'v5.1', 'O', 'O', 1, 'Offering', 'The core product, service, or outcome delivered', 'Product / Operations', 1, 'new'),
('P', 'v5.1', 'P', 'P', 1, 'People', 'Human interactions and personnel behavior', 'HR / Training', 2, 'new'),
('J', 'v5.1', 'J', 'J', 1, 'Journey', 'The process, timing, and operational flow', 'Operations / Process', 3, 'new'),
('E', 'v5.1', 'E', 'E', 1, 'Environment', 'Physical, digital, and ambient context', 'Facilities / IT', 4, 'new'),
('A', 'v5.1', 'A', 'A', 1, 'Access', 'Availability, accessibility, and inclusivity', 'Compliance / Design', 5, 'new'),
('V', 'v5.1', 'V', 'V', 1, 'Value', 'Cost, pricing, and worth of the exchange', 'Finance / Pricing', 6, 'new'),
('R', 'v5.1', 'R', 'R', 1, 'Relationship', 'Trust, reliability, and ongoing connection', 'Leadership / CX', 7, 'new');

-- Sample: Category-level codes for Journey domain
INSERT INTO urt_codes_versioned (
    code, version_id, domain, category, tier, parent_code,
    display_name, definition, display_order, change_type
) VALUES
('J1', 'v5.1', 'J', 'J1', 2, 'J', 'Timing', 'Speed, punctuality, and time management', 1, 'new'),
('J2', 'v5.1', 'J', 'J2', 2, 'J', 'Ease', 'Effort required and friction encountered', 2, 'new'),
('J3', 'v5.1', 'J', 'J3', 2, 'J', 'Reliability', 'Consistency and predictability of process', 3, 'new'),
('J4', 'v5.1', 'J', 'J4', 2, 'J', 'Resolution', 'How problems are handled when they arise', 4, 'new');

-- Sample: Subcode-level codes for J1 (Timing)
INSERT INTO urt_codes_versioned (
    code, version_id, domain, category, subcategory, tier, parent_code,
    display_name, definition, dont_confuse_with, dont_confuse_reason,
    display_order, change_type
) VALUES
('J1.01', 'v5.1', 'J', 'J1', 'J1.01', 3, 'J1',
 'Wait Time', 'Duration of waiting before service begins or between steps',
 'J1.03', 'J1.03 is total duration, J1.01 is specifically idle waiting', 1, 'new'),
('J1.02', 'v5.1', 'J', 'J1', 'J1.02', 3, 'J1',
 'Punctuality', 'On-time arrival, start, or delivery vs. scheduled time',
 'J3.01', 'J3.01 is consistency over time, J1.02 is single-instance timeliness', 2, 'new'),
('J1.03', 'v5.1', 'J', 'J1', 'J1.03', 3, 'J1',
 'Service Speed', 'Overall pace of active service delivery',
 'O1.02', 'O1.02 is product performance, J1.03 is service delivery speed', 3, 'new'),
('J1.04', 'v5.1', 'J', 'J1', 'J1.04', 3, 'J1',
 'Response Time', 'Speed of response to customer requests or inquiries',
 'P3.01', 'P3.01 is attentiveness, J1.04 is measured response time', 4, 'new'),
('J1.05', 'v5.1', 'J', 'J1', 'J1.05', 3, 'J1',
 'Time Respect', 'Respect for customer''s time and schedule constraints',
 'P1.01', 'P1.01 is general attitude, J1.05 is specifically about time', 5, 'new');
```

---

## Part 5: Migration Guide

### 5.1 Migration Steps (v3.2 → v3.2.1)

```sql
-- ═══════════════════════════════════════════════════════════════
-- MIGRATION SCRIPT: v3.2 to v3.2.1
-- Run in a transaction, test on staging first
-- ═══════════════════════════════════════════════════════════════

BEGIN;

-- Step 1: Create new tables
-- (Run the CREATE TABLE / CREATE INDEX DDL from Parts 2.1-2.3 above.
--  The ALTER TABLE statements from Part 2.4 are applied by Steps 4-7 below;
--  running both would fail on the duplicate column.)

-- Step 2: Register initial taxonomy version
-- (Skip if the seed row from Part 4.1 has already been inserted:
--  version_id is the primary key, so a second 'v5.1' insert fails.)
INSERT INTO urt_taxonomy_versions (
    version_id, semver, effective_from, is_current,
    release_notes, breaking_changes
) VALUES (
    'v5.1', '5.1.0', '2026-01-01', TRUE,
    'Initial URT v5.1 - baseline for versioning', FALSE
);

-- Step 3: Populate urt_codes_versioned from existing urt_codes
-- (tier and subcategory are derived from the shape of the code)
INSERT INTO urt_codes_versioned (
    code, version_id, domain, category, subcategory, tier,
    display_name, definition, keywords, display_order, change_type
)
SELECT
    code,
    'v5.1',
    domain,
    category,
    CASE WHEN code ~ '^[OPJEAVR][1-4]\.[0-9]{2}$' THEN code ELSE NULL END,
    CASE
        WHEN code ~ '^[OPJEAVR]$' THEN 1
        WHEN code ~ '^[OPJEAVR][1-4]$' THEN 2
        ELSE 3
    END,
    display_name,
    description,
    keywords,
    0,
    'new'
FROM urt_codes;

-- Step 4: Add taxonomy_version columns with default
-- (existing rows are backfilled with 'v5.1')
ALTER TABLE review_spans ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE reviews_enriched ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE issues ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE fact_timeseries ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';

-- Step 5: Add FK constraints
-- (requires Step 3: every existing code must already exist in urt_codes_versioned)
ALTER TABLE review_spans ADD CONSTRAINT fk_spans_urt_versioned
    FOREIGN KEY (urt_primary, taxonomy_version)
    REFERENCES urt_codes_versioned(code, version_id);

ALTER TABLE issues ADD CONSTRAINT fk_issues_urt_versioned
    FOREIGN KEY (primary_subcode, taxonomy_version)
    REFERENCES urt_codes_versioned(code, version_id);

-- Step 6: Create indexes
CREATE INDEX idx_spans_taxonomy_version ON review_spans(taxonomy_version);
CREATE INDEX idx_enriched_taxonomy_version ON reviews_enriched(taxonomy_version);
CREATE INDEX idx_issues_taxonomy_version ON issues(taxonomy_version);

-- Step 7: Update unique constraint on fact_timeseries
ALTER TABLE fact_timeseries DROP CONSTRAINT IF EXISTS fact_timeseries_business_id_place_id_period_date_bucket_typ_key;
ALTER TABLE fact_timeseries ADD CONSTRAINT uq_fact_timeseries
    UNIQUE (business_id, place_id, period_date, bucket_type, subject_type, subject_id, taxonomy_version);

COMMIT;
```

### 5.2 Application Code Changes

```python
from datetime import date
from typing import Optional

# `db` and `get_current_taxonomy_version` are provided by the application.

# Classification pipeline: Include taxonomy version
async def store_review_spans(enriched: dict, spans: list[dict], batch_id: str):
    """Store extracted spans with taxonomy version."""

    # Get current taxonomy version once per batch
    taxonomy_version = await get_current_taxonomy_version()

    for idx, span in enumerate(spans):
        await db.execute("""
            INSERT INTO review_spans (
                span_id, business_id, place_id, source, review_id, review_version,
                span_index, span_text, span_start, span_end,
                profile, urt_primary, valence, intensity,
                taxonomy_version, model_version,  -- NEW: taxonomy_version
                ...
            ) VALUES (...)
        """, [
            ...,
            taxonomy_version,
            'gpt-4o-mini-2024-07-18',
            ...
        ])


# Trend queries: Support version normalization
async def get_timeline(
    business_id: str,
    place_id: str,
    subject_id: str,
    start: date,
    end: date,
    normalize_to_version: Optional[str] = None  # NEW: optional normalization
) -> list[dict]:
    """Query timeline with optional version normalization."""

    if normalize_to_version:
        # Use normalized aggregation
        return await db.query("""
            SELECT * FROM aggregate_spans_normalized(%s, %s, %s, %s, %s)
            WHERE urt_code = %s
        """, [business_id, place_id, start, end, normalize_to_version, subject_id])
    else:
        # Standard query (preserves original versions)
        return await db.query("""
            SELECT * FROM fact_timeseries
            WHERE business_id = %s
              AND place_id = %s
              AND subject_type = 'urt_code'
              AND subject_id = %s
              AND period_date BETWEEN %s AND %s
        """, [business_id, place_id, subject_id, start, end])
```

---

## Part 6: Query Patterns

### 6.1 Point-in-Time Query (Historical Context)

```sql
-- "What were our J1 issues in Q3 2025, as classified then?"
-- Returns data with original taxonomy context
-- (half-open range so that all of 2025-09-30 is included for a timestamp column)

SELECT
    rs.span_id,
    rs.span_text,
    rs.urt_primary,
    rs.taxonomy_version,
    cv.display_name,
    cv.definition
FROM review_spans rs
JOIN urt_codes_versioned cv
    ON rs.urt_primary = cv.code
   AND rs.taxonomy_version = cv.version_id
WHERE rs.business_id = 'acme'
  AND rs.urt_primary LIKE 'J1%'
  AND rs.review_time >= '2025-07-01'
  AND rs.review_time < '2025-10-01'
  AND rs.is_active = TRUE
ORDER BY rs.review_time DESC;
```

### 6.2 Normalized Trend Query

```sql
-- "Show all historical data mapped to current J1.01"
-- Translates across taxonomy versions

SELECT
    DATE_TRUNC('month', rs.review_time) as month,
    rs.taxonomy_version as original_version,
    COUNT(*) as span_count
FROM review_spans rs
JOIN get_code_lineage('J1.01') lineage
    ON rs.urt_primary = lineage.code
   AND rs.taxonomy_version = lineage.version_id
WHERE rs.business_id = 'acme'
  AND rs.is_active = TRUE
GROUP BY 1, 2
ORDER BY 1, 2;
```

### 6.3 Drift Analysis

```sql
-- "What spans would be affected by upgrading to v5.2?"
SELECT * FROM detect_taxonomy_drift('v5.1', 'v5.2', 'acme');

-- "Are we still using deprecated codes?"
SELECT * FROM get_deprecated_codes_in_use('v5.2', 'acme');
```

### 6.4 Cross-Version Comparison

```sql
-- "Compare classification distribution between taxonomy versions"

SELECT
    rs.taxonomy_version,
    LEFT(rs.urt_primary, 1) as domain,
    COUNT(*) as span_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY rs.taxonomy_version), 1) as pct
FROM review_spans rs
WHERE rs.business_id = 'acme'
  AND rs.is_active = TRUE
GROUP BY rs.taxonomy_version, LEFT(rs.urt_primary, 1)
ORDER BY rs.taxonomy_version, span_count DESC;
```

---

## Part 7: Operational Procedures

### 7.1 Releasing a New Taxonomy Version

```sql
-- Step 1: Register new version (not current yet)
INSERT INTO urt_taxonomy_versions (
    version_id, semver, effective_from, is_current,
    release_notes, predecessor_version, breaking_changes
) VALUES (
    'v5.2', '5.2.0', '2026-04-01', FALSE,
    'Split J1.01 into J1.01a/J1.01b for queue vs service wait',
    'v5.1', TRUE
);

-- Step 2: Populate versioned codes
INSERT INTO urt_codes_versioned (code, version_id, ...)
SELECT ... FROM urt_codes_v52_staging;

-- Step 3: Define mappings from v5.1 to v5.2
-- NOTE: 'J1.01a' / 'J1.01b' are illustrative names. As written they do not
-- match chk_code_format (tier-3 codes must look like 'J1.01'), so a real split
-- needs either new two-digit subcodes or a relaxed constraint.
INSERT INTO urt_code_mappings (
    from_code, from_version, to_code, to_version,
    mapping_type, confidence, notes
) VALUES
('J1.01', 'v5.1', 'J1.01a', 'v5.2', 'narrower', 0.7, 'J1.01 split: queue wait'),
('J1.01', 'v5.1', 'J1.01b', 'v5.2', 'narrower', 0.3, 'J1.01 split: service wait');

-- Step 4: Run drift analysis
SELECT * FROM detect_taxonomy_drift('v5.1', 'v5.2');

-- Step 5: Activate new version
-- (one transaction, so readers never observe a state with no current version)
BEGIN;
UPDATE urt_taxonomy_versions SET is_current = FALSE WHERE is_current = TRUE;
UPDATE urt_taxonomy_versions SET is_current = TRUE WHERE version_id = 'v5.2';
UPDATE urt_taxonomy_versions SET effective_to = CURRENT_DATE WHERE version_id = 'v5.1';
COMMIT;
```

### 7.2 Reclassification Workflow

```sql
-- Reclassify historical spans with new taxonomy (creates parallel records)
-- Uses soft-switch pattern from v3.2

-- 1. Create new batch with new taxonomy version
INSERT INTO review_spans (
    span_id, ..., taxonomy_version, ingest_batch_id, is_active
)
SELECT
    generate_span_id(...),      -- New span ID
    ...,
    'v5.2',                     -- New taxonomy version
    'reclassify-batch-001',
    FALSE                       -- Not active yet
FROM review_spans_to_reclassify;

-- 2. Run LLM reclassification on batch
-- (Application code)

-- 3. Atomic switch (deactivate old, activate new)
BEGIN;
UPDATE review_spans SET is_active = FALSE
WHERE taxonomy_version = 'v5.1' AND business_id = 'acme';

UPDATE review_spans SET is_active = TRUE
WHERE ingest_batch_id = 'reclassify-batch-001';
COMMIT;
```

---

## Part 8: Decisions

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Version ID format | `v{major}.{minor}` | Human-readable, matches URT releases |
| Semver column | Separate TEXT field | Enables programmatic version comparison |
| Code FK strategy | Composite `(code, version_id)` | Prevents orphaned classifications |
| Mapping direction | Forward only (old→new) | Simpler model, matches time flow |
| Transitive mappings | Single hop in function | Balances accuracy vs complexity |
| Default version | `'v5.1'` hardcoded | Safe baseline, explicit upgrade path |
| Fact table versioning | Per-row `taxonomy_version` | Enables version-specific aggregation |

---

## Part 9: Future Considerations (v3.3+)

| Feature | Description |
|---------|-------------|
| **Prompt versioning** | Track prompt templates used for classification |
| **A/B testing** | Compare classifier accuracy across versions |
| **Automatic mapping suggestion** | LLM-powered mapping recommendations |
| **Version-aware dashboards** | UI toggle for normalized vs point-in-time |
| **Bulk reclassification pipeline** | Scheduled jobs for taxonomy upgrades |

---

## Document Control

| Field | Value |
|-------|-------|
| **Document** | ReviewIQ v3.2.1 Addendum: Taxonomy Versioning |
| **Status** | Draft Specification |
| **Date** | 2026-01-24 |
| **Extends** | ReviewIQ Architecture v3.2.0 |
| **Author** | Architecture Team |

### Changelog

| Version | Changes |
|---------|---------|
| v3.2.1-draft | Initial taxonomy versioning specification |

---

*End of ReviewIQ v3.2.1 Addendum*
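
Supplementary sketch for the "Semver column" decision in Part 8: version IDs such as `'v5.10'` and `'v5.9'` do not sort in release order when compared as plain text, which is why a parsed `semver` value is the safer basis for ordering. The Python below is a minimal illustration under stated assumptions, not part of the specification. The helper names `semver_key` and `is_forward_mapping` are hypothetical, the accepted format mirrors the `chk_semver_format` regex, and treating a release as newer than its own pre-release is an assumption of this sketch.

```python
import re

# Same shape as chk_semver_format: MAJOR.MINOR.PATCH with an optional -prerelease tag.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([a-zA-Z0-9]+))?$")


def semver_key(semver: str) -> tuple:
    """Return a sortable key for a semver string (hypothetical helper)."""
    match = _SEMVER.match(semver)
    if match is None:
        raise ValueError(f"not a valid semver string: {semver!r}")
    major, minor, patch, pre = match.groups()
    # Numeric components compare as integers; a final release (no tag)
    # sorts after any pre-release of the same MAJOR.MINOR.PATCH.
    return (int(major), int(minor), int(patch), pre is None, pre or "")


def is_forward_mapping(from_semver: str, to_semver: str) -> bool:
    """True when a mapping points from an older version to a newer one."""
    return semver_key(from_semver) < semver_key(to_semver)


if __name__ == "__main__":
    # Plain text comparison orders these the wrong way round.
    print("v5.10" < "v5.9")
    # Parsed comparison follows release order.
    print(is_forward_mapping("5.9.0", "5.10.0"))
    print(sorted(["5.10.0", "5.2.0", "5.1.0-rc1", "5.1.0"], key=semver_key))
```

A check like `is_forward_mapping` could run in the application before a row is inserted into `urt_code_mappings`, using the `semver` values of the two versions rather than their `version_id` strings.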