ReviewIQ v3.2.1 Addendum: Taxonomy Versioning
Version: 3.2.1
Status: Draft Specification
Date: 2026-01-24
Extends: ReviewIQ Architecture v3.2.0
Executive Summary
This addendum introduces taxonomy versioning to ReviewIQ, enabling the system to track how URT classifications evolve over time. Classifications become fully contextualized facts that include the taxonomy version used, allowing accurate historical analysis, safe taxonomy evolution, and cross-version trend normalization.
Key Additions:
- taxonomy_version column on classification tables
- urt_taxonomy_versions table for version metadata
- urt_codes_versioned table for versioned code definitions
- urt_code_mappings table for cross-version translation
- Translation functions for normalized queries
- Lineage tracking for code evolution
Design Principle: Facts are immutable. A span classified as J1.01 in taxonomy v5.1 stays that way forever. Translation between versions is explicit and auditable.
Part 1: Problem Statement
Why Taxonomy Versioning?
Without versioning, these scenarios cause data integrity issues:
| Scenario | Problem |
|---|---|
| Code renamed | Historical reports show wrong label |
| Definition changed | Same code means different things over time |
| Code split | Old data can't distinguish new subcategories |
| Code merged | Trend analysis shows artificial drop |
| Code deprecated | Orphaned data with no valid reference |
The Three Dimensions of Classification Context
A classification is only meaningful when you know:
- Taxonomy Version — What do the codes mean?
- Model Version — How was the text processed? (already in v3.2)
- Prompt Version — What instructions were given? (optional)
┌─────────────────────────────────────────────────────────────┐
│ CLASSIFICATION FACT │
│ │
│ "At time T, using taxonomy V and model M, │
│ this text was classified as code C with confidence X" │
│ │
│ This fact is IMMUTABLE. │
└─────────────────────────────────────────────────────────────┘
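One way to make this immutability concrete in application code is a frozen record type. The sketch below is illustrative only: the class name, field names, and sample values are assumptions, not part of the ReviewIQ schema.

```python
from dataclasses import dataclass, FrozenInstanceError
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClassificationFact:
    """One immutable classification record (hypothetical field names)."""
    span_id: str
    urt_code: str          # e.g. 'J1.01'
    taxonomy_version: str  # e.g. 'v5.1'
    model_version: str
    confidence: float
    classified_at: datetime


def rejects_mutation(fact: ClassificationFact) -> bool:
    """Return True if an attempt to change the fact is refused."""
    try:
        fact.urt_code = "J1.02"  # frozen dataclasses raise on assignment
    except FrozenInstanceError:
        return True
    return False


fact = ClassificationFact(
    span_id="span-001",
    urt_code="J1.01",
    taxonomy_version="v5.1",
    model_version="model-x",  # placeholder model identifier
    confidence=0.92,
    classified_at=datetime(2026, 1, 24, tzinfo=timezone.utc),
)
```

Reclassifying under a newer taxonomy would then produce a second fact with its own `taxonomy_version` instead of editing this one.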
Part 2: Schema Additions
2.1 Taxonomy Version Registry
-- ═══════════════════════════════════════════════════════════════
-- TAXONOMY VERSION REGISTRY
-- Tracks taxonomy releases like git tags
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE urt_taxonomy_versions (
version_id TEXT PRIMARY KEY, -- 'v5.1', 'v5.2', 'v6.0'
semver TEXT NOT NULL, -- '5.1.0' for programmatic comparison
-- Validity period
effective_from DATE NOT NULL,
effective_to DATE, -- NULL = current/open-ended
is_current BOOLEAN NOT NULL DEFAULT FALSE,
-- Metadata
release_notes TEXT,
changelog_url TEXT,
-- Statistics (populated on release)
domain_count SMALLINT,
category_count SMALLINT,
subcode_count SMALLINT,
-- Migration hints
breaking_changes BOOLEAN NOT NULL DEFAULT FALSE,
migration_guide_url TEXT,
predecessor_version TEXT REFERENCES urt_taxonomy_versions(version_id),
-- Audit
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT,
-- Ensure semver format
CONSTRAINT chk_semver_format
CHECK (semver ~ '^\d+\.\d+\.\d+(-[a-zA-Z0-9]+)?$')
);
-- Only one current version allowed
CREATE UNIQUE INDEX uq_taxonomy_current
ON urt_taxonomy_versions(is_current)
WHERE is_current = TRUE;
CREATE INDEX idx_taxonomy_versions_effective
ON urt_taxonomy_versions(effective_from, effective_to);
COMMENT ON TABLE urt_taxonomy_versions IS
'Registry of URT taxonomy versions with validity periods and migration metadata';
2.2 Versioned Code Definitions
-- ═══════════════════════════════════════════════════════════════
-- VERSIONED CODE DEFINITIONS
-- Full code definitions per taxonomy version (SCD Type 2 style)
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE urt_codes_versioned (
-- Composite primary key
code TEXT NOT NULL,
version_id TEXT NOT NULL REFERENCES urt_taxonomy_versions(version_id),
-- Classification hierarchy
domain CHAR(1) NOT NULL,
category TEXT NOT NULL, -- 'J1', 'P2', etc.
subcategory TEXT, -- 'J1.01', 'P2.03', etc.
tier SMALLINT NOT NULL, -- 1=domain, 2=category, 3=subcode
-- Semantics
display_name TEXT NOT NULL,
definition TEXT NOT NULL,
keywords TEXT[] DEFAULT '{}',
-- Examples (for classifier training/validation)
examples JSONB,
-- Format: {
-- "positive": ["example text 1", "example text 2"],
-- "negative": ["counter-example 1"],
-- "boundary": ["edge case 1"]
-- }
-- Disambiguation
dont_confuse_with TEXT, -- Another code
dont_confuse_reason TEXT,
-- Hierarchy links (within same version)
parent_code TEXT, -- Category code if this is subcode
-- Cross-version lineage
change_type TEXT NOT NULL DEFAULT 'unchanged',
predecessor_codes TEXT[] DEFAULT '{}', -- Codes in previous version this evolved from
deprecation_reason TEXT,
successor_hint TEXT, -- Suggested replacement if deprecated
-- Ownership
default_owner TEXT, -- Team responsible for this domain
-- Ordering
display_order SMALLINT NOT NULL DEFAULT 0,
-- Audit
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
PRIMARY KEY (code, version_id),
CONSTRAINT chk_tier_valid
CHECK (tier IN (1, 2, 3)),
CONSTRAINT chk_change_type_valid
CHECK (change_type IN (
'new', -- First appearance in this version
'unchanged', -- Identical to previous version
'renamed', -- Display name changed, same meaning
'redefined', -- Definition changed (semantic drift)
'deprecated', -- Marked for removal
'split', -- This code split from a previous code
'merged' -- This code merged from multiple previous codes
)),
CONSTRAINT chk_code_format
CHECK (
(tier = 1 AND code ~ '^[OPJEAVR]$') OR
(tier = 2 AND code ~ '^[OPJEAVR][1-4]$') OR
(tier = 3 AND code ~ '^[OPJEAVR][1-4]\.[0-9]{2}$')
),
CONSTRAINT chk_deprecated_has_reason
CHECK (change_type != 'deprecated' OR deprecation_reason IS NOT NULL),
-- Parent must be in same version
CONSTRAINT fk_parent_code
FOREIGN KEY (parent_code, version_id)
REFERENCES urt_codes_versioned(code, version_id)
);
CREATE INDEX idx_codes_versioned_domain
ON urt_codes_versioned(version_id, domain);
CREATE INDEX idx_codes_versioned_category
ON urt_codes_versioned(version_id, category);
CREATE INDEX idx_codes_versioned_change
ON urt_codes_versioned(version_id, change_type)
WHERE change_type != 'unchanged';
CREATE INDEX idx_codes_versioned_deprecated
ON urt_codes_versioned(version_id)
WHERE change_type = 'deprecated';
COMMENT ON TABLE urt_codes_versioned IS
'URT code definitions with full semantics, versioned per taxonomy release';
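The code-format rules in chk_code_format can be mirrored in a seed script to catch bad codes before they reach the database. This is a minimal sketch assuming the same three patterns as the constraint; the helper names `infer_tier` and `parent_of` are hypothetical.

```python
import re
from typing import Optional

# Same patterns as chk_code_format, keyed by tier.
TIER_PATTERNS = {
    1: re.compile(r"[OPJEAVR]"),
    2: re.compile(r"[OPJEAVR][1-4]"),
    3: re.compile(r"[OPJEAVR][1-4]\.[0-9]{2}"),
}


def infer_tier(code: str) -> Optional[int]:
    """Return 1, 2, or 3 for a well-formed code, or None if it matches no tier."""
    for tier, pattern in TIER_PATTERNS.items():
        if pattern.fullmatch(code):
            return tier
    return None


def parent_of(code: str) -> Optional[str]:
    """Derive parent_code: subcode -> category, category -> domain, domain -> None."""
    tier = infer_tier(code)
    if tier == 3:
        return code.split(".")[0]
    if tier == 2:
        return code[0]
    return None
```

Under these patterns a suffixed code such as 'J1.01a' is rejected, which is worth keeping in mind when planning how split codes will be named.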
2.3 Cross-Version Code Mappings
-- ═══════════════════════════════════════════════════════════════
-- CROSS-VERSION CODE MAPPINGS
-- Explicit translation rules between taxonomy versions
-- ═══════════════════════════════════════════════════════════════
CREATE TABLE urt_code_mappings (
id SERIAL PRIMARY KEY,
-- Source (older version)
from_code TEXT NOT NULL,
from_version TEXT NOT NULL,
-- Target (newer version)
to_code TEXT NOT NULL,
to_version TEXT NOT NULL,
-- Mapping semantics
mapping_type TEXT NOT NULL,
-- Confidence and applicability
confidence FLOAT NOT NULL DEFAULT 1.0,
bidirectional BOOLEAN NOT NULL DEFAULT FALSE,
-- Context
notes TEXT,
effective_from DATE, -- When this mapping became valid
-- Audit
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT,
UNIQUE (from_code, from_version, to_code, to_version),
CONSTRAINT fk_from_code
FOREIGN KEY (from_code, from_version)
REFERENCES urt_codes_versioned(code, version_id),
CONSTRAINT fk_to_code
FOREIGN KEY (to_code, to_version)
REFERENCES urt_codes_versioned(code, version_id),
CONSTRAINT chk_mapping_type_valid
CHECK (mapping_type IN (
'equivalent', -- Same meaning, safe 1:1 translation
'broader', -- to_code is more general (result of merge)
'narrower', -- to_code is more specific (result of split)
'related', -- Conceptually similar but not equivalent
'superseded' -- from_code deprecated, to_code is replacement
)),
CONSTRAINT chk_confidence_range
CHECK (confidence > 0 AND confidence <= 1.0),
CONSTRAINT chk_different_versions
CHECK (from_version != to_version),
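CONSTRAINT chk_version_order
-- Mappings go forward in time. Compare numerically: as plain text, 'v5.10' sorts before 'v5.9'.
-- Assumes version_id follows the v{major}.{minor} format.
CHECK (
string_to_array(ltrim(from_version, 'v'), '.')::int[]
< string_to_array(ltrim(to_version, 'v'), '.')::int[]
)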
);
CREATE INDEX idx_mappings_from
ON urt_code_mappings(from_code, from_version);
CREATE INDEX idx_mappings_to
ON urt_code_mappings(to_code, to_version);
CREATE INDEX idx_mappings_type
ON urt_code_mappings(mapping_type);
COMMENT ON TABLE urt_code_mappings IS
'Explicit translation rules between taxonomy versions for query normalization';
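Mapping rows can be pre-validated in application code before insertion. The sketch below mirrors the table's three CHECK rules (mapping type, confidence range, forward direction). It assumes version IDs follow the v{major}.{minor} format, and the function names are hypothetical.

```python
MAPPING_TYPES = {"equivalent", "broader", "narrower", "related", "superseded"}


def version_key(version_id: str) -> tuple:
    """Turn 'v5.10' into (5, 10) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version_id.lstrip("v").split("."))


def mapping_errors(from_version: str, to_version: str,
                   mapping_type: str, confidence: float) -> list:
    """Return a list of rule violations; an empty list means the row is acceptable."""
    errors = []
    if mapping_type not in MAPPING_TYPES:
        errors.append("invalid mapping_type")
    if not (0 < confidence <= 1.0):
        errors.append("confidence out of range")
    if version_key(from_version) >= version_key(to_version):
        errors.append("mapping must go forward in time")
    return errors
```

Comparing parsed tuples matters once minor versions reach two digits: as strings, 'v5.9' does not sort before 'v5.10'.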
2.4 Schema Modifications to Existing Tables
-- ═══════════════════════════════════════════════════════════════
-- MODIFICATIONS TO EXISTING TABLES
-- Add taxonomy_version to classification tables
-- ═══════════════════════════════════════════════════════════════
-- review_spans: Add taxonomy version
ALTER TABLE review_spans
ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE review_spans
ADD COLUMN prompt_version TEXT; -- Optional: hash or ID of prompt template
-- FK to versioned codes (replaces existing FK to urt_codes)
ALTER TABLE review_spans
DROP CONSTRAINT IF EXISTS fk_spans_urt_primary;
ALTER TABLE review_spans
ADD CONSTRAINT fk_spans_urt_versioned
FOREIGN KEY (urt_primary, taxonomy_version)
REFERENCES urt_codes_versioned(code, version_id);
-- Index for version-aware queries
CREATE INDEX idx_spans_taxonomy_version
ON review_spans(taxonomy_version);
CREATE INDEX idx_spans_code_version
ON review_spans(urt_primary, taxonomy_version);
-- reviews_enriched: Add taxonomy version for review-level classification
ALTER TABLE reviews_enriched
ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
CREATE INDEX idx_enriched_taxonomy_version
ON reviews_enriched(taxonomy_version);
-- issues: Track which taxonomy version the issue was created under
ALTER TABLE issues
ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE issues
ADD CONSTRAINT fk_issues_urt_versioned
FOREIGN KEY (primary_subcode, taxonomy_version)
REFERENCES urt_codes_versioned(code, version_id);
CREATE INDEX idx_issues_taxonomy_version
ON issues(taxonomy_version);
-- fact_timeseries: Track taxonomy version for aggregated facts
ALTER TABLE fact_timeseries
ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
-- Update unique constraint to include taxonomy version
ALTER TABLE fact_timeseries
DROP CONSTRAINT IF EXISTS fact_timeseries_business_id_place_id_period_date_bucket_typ_key;
ALTER TABLE fact_timeseries
ADD CONSTRAINT uq_fact_timeseries
UNIQUE (business_id, place_id, period_date, bucket_type,
subject_type, subject_id, taxonomy_version);
Part 3: Functions
3.1 Code Translation
-- ═══════════════════════════════════════════════════════════════
-- TRANSLATION FUNCTIONS
-- ═══════════════════════════════════════════════════════════════
-- Translate a single code between versions
CREATE OR REPLACE FUNCTION translate_urt_code(
p_code TEXT,
p_from_version TEXT,
p_to_version TEXT
) RETURNS TABLE (
translated_code TEXT,
mapping_type TEXT,
confidence FLOAT,
notes TEXT
) AS $$
BEGIN
-- If same version, return as-is
IF p_from_version = p_to_version THEN
RETURN QUERY SELECT p_code, 'identity'::TEXT, 1.0::FLOAT, NULL::TEXT;
RETURN;
END IF;
-- Check for direct mapping
RETURN QUERY
SELECT m.to_code, m.mapping_type, m.confidence, m.notes
FROM urt_code_mappings m
WHERE m.from_code = p_code
AND m.from_version = p_from_version
AND m.to_version = p_to_version;
-- If no direct mapping, check if code exists unchanged in target
IF NOT FOUND THEN
RETURN QUERY
SELECT p_code, 'unchanged'::TEXT, 1.0::FLOAT,
'Code exists in both versions without explicit mapping'::TEXT
FROM urt_codes_versioned
WHERE code = p_code
AND version_id = p_to_version
AND change_type = 'unchanged';
END IF;
-- If still not found, check transitive mappings (one hop)
IF NOT FOUND THEN
RETURN QUERY
SELECT m2.to_code,
'transitive'::TEXT,
m1.confidence * m2.confidence,
'Via ' || m1.to_version
FROM urt_code_mappings m1
JOIN urt_code_mappings m2
ON m1.to_code = m2.from_code
AND m1.to_version = m2.from_version
WHERE m1.from_code = p_code
AND m1.from_version = p_from_version
AND m2.to_version = p_to_version;
END IF;
END;
$$ LANGUAGE plpgsql STABLE;
COMMENT ON FUNCTION translate_urt_code IS
'Translate a URT code from one taxonomy version to another';
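The resolution order in translate_urt_code (identity, direct mapping, unchanged code, one-hop transitive) can be sketched in memory for unit testing. This is an illustration under assumptions: the codes 'J1.06', 'J1.07', and 'J2.01' and the version 'v6.0' are hypothetical sample data, and the Python names are not part of the spec.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    from_code: str
    from_version: str
    to_code: str
    to_version: str
    mapping_type: str
    confidence: float


class Translation(NamedTuple):
    code: str
    mapping_type: str
    confidence: float


def translate(code, from_v, to_v, mappings, unchanged):
    """unchanged: set of (code, version_id) pairs whose change_type is 'unchanged'."""
    if from_v == to_v:
        return [Translation(code, "identity", 1.0)]
    # 1. Direct mapping between the two versions.
    direct = [Translation(m.to_code, m.mapping_type, m.confidence)
              for m in mappings
              if (m.from_code, m.from_version, m.to_version) == (code, from_v, to_v)]
    if direct:
        return direct
    # 2. Code carried over unchanged into the target version.
    if (code, to_v) in unchanged:
        return [Translation(code, "unchanged", 1.0)]
    # 3. One-hop transitive mapping; confidences multiply.
    return [Translation(m2.to_code, "transitive", m1.confidence * m2.confidence)
            for m1 in mappings
            if (m1.from_code, m1.from_version) == (code, from_v)
            for m2 in mappings
            if (m2.from_code, m2.from_version, m2.to_version)
            == (m1.to_code, m1.to_version, to_v)]


SAMPLE = [
    Mapping("J1.01", "v5.1", "J1.06", "v5.2", "narrower", 0.7),
    Mapping("J1.01", "v5.1", "J1.07", "v5.2", "narrower", 0.3),
    Mapping("J1.06", "v5.2", "J2.01", "v6.0", "equivalent", 1.0),
]
```

A split yields several rows, so callers that need a single target have to choose one explicitly, for example the row with the highest confidence.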
-- Get full lineage for a code (all historical versions)
CREATE OR REPLACE FUNCTION get_code_lineage(
p_code TEXT,
p_version TEXT DEFAULT NULL
) RETURNS TABLE (
code TEXT,
version_id TEXT,
display_name TEXT,
relationship TEXT,
confidence FLOAT,
depth INT
) AS $$
DECLARE
v_target_version TEXT;
BEGIN
-- Default to current version
v_target_version := COALESCE(p_version,
(SELECT tv.version_id FROM urt_taxonomy_versions tv WHERE tv.is_current = TRUE));
RETURN QUERY
WITH RECURSIVE lineage AS (
-- Base: the code itself
SELECT
cv.code,
cv.version_id,
cv.display_name,
'self'::TEXT as relationship,
1.0::FLOAT as confidence,
0 as depth
FROM urt_codes_versioned cv
WHERE cv.code = p_code AND cv.version_id = v_target_version
UNION ALL
-- Recursive: predecessors via mappings
SELECT
m.from_code,
m.from_version,
cv.display_name,
m.mapping_type,
l.confidence * m.confidence,
l.depth + 1
FROM lineage l
JOIN urt_code_mappings m
ON m.to_code = l.code
AND m.to_version = l.version_id
JOIN urt_codes_versioned cv
ON cv.code = m.from_code
AND cv.version_id = m.from_version
WHERE l.depth < 10 -- Prevent infinite loops
)
SELECT * FROM lineage
ORDER BY depth, version_id;
END;
$$ LANGUAGE plpgsql STABLE;
COMMENT ON FUNCTION get_code_lineage IS
'Get full historical lineage for a code across all taxonomy versions';
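The recursive walk in get_code_lineage can be approximated with a breadth-first loop over mapping rows. This is a sketch under assumptions: mappings are plain tuples, the sample codes beyond 'J1.01' are hypothetical, and the depth cap of 10 copies the SQL guard.

```python
def lineage(code, version, mappings, max_depth=10):
    """Walk mappings backwards from (code, version).

    mappings: tuples of (from_code, from_version, to_code, to_version,
    mapping_type, confidence). Returns rows of
    (code, version_id, relationship, confidence, depth).
    """
    rows = [(code, version, "self", 1.0, 0)]
    frontier = [(code, version, 1.0)]
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for cur_code, cur_version, cur_conf in frontier:
            for fc, fv, tc, tv, mtype, conf in mappings:
                if (tc, tv) == (cur_code, cur_version):
                    combined = cur_conf * conf  # confidence decays along the chain
                    rows.append((fc, fv, mtype, combined, depth))
                    next_frontier.append((fc, fv, combined))
        if not next_frontier:
            break
        frontier = next_frontier
    return rows


SAMPLE_MAPPINGS = [
    ("J1.01", "v5.1", "J1.06", "v5.2", "narrower", 0.7),
    ("J1.06", "v5.2", "J2.01", "v6.0", "equivalent", 1.0),
]
```

Because mappings only point forward in time, the walk always moves to older versions and terminates on its own; the cap is a safety net against bad data.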
-- Get current version of taxonomy
CREATE OR REPLACE FUNCTION get_current_taxonomy_version()
RETURNS TEXT AS $$
SELECT version_id
FROM urt_taxonomy_versions
WHERE is_current = TRUE;
$$ LANGUAGE sql STABLE;
COMMENT ON FUNCTION get_current_taxonomy_version IS
'Returns the current/active taxonomy version ID';
3.2 Normalized Aggregation
-- ═══════════════════════════════════════════════════════════════
-- NORMALIZED AGGREGATION
-- Query spans with automatic translation to target version
-- ═══════════════════════════════════════════════════════════════
-- View: Spans normalized to current taxonomy version
CREATE OR REPLACE VIEW v_spans_normalized AS
SELECT
rs.*,
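COALESCE(
(SELECT t.translated_code FROM translate_urt_code(
rs.urt_primary,
rs.taxonomy_version,
get_current_taxonomy_version()
) t
-- A split code returns several rows; pick the highest-confidence target deterministically
ORDER BY t.confidence DESC, t.translated_code
LIMIT 1),
rs.urt_primary
) as urt_primary_normalized,
COALESCE(
(SELECT t.confidence FROM translate_urt_code(
rs.urt_primary,
rs.taxonomy_version,
get_current_taxonomy_version()
) t
ORDER BY t.confidence DESC, t.translated_code
LIMIT 1),
1.0
) as translation_confidence,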
get_current_taxonomy_version() as normalized_to_version
FROM review_spans rs
WHERE rs.is_active = TRUE;
COMMENT ON VIEW v_spans_normalized IS
'Review spans with URT codes translated to current taxonomy version';
-- Function: Aggregate facts with version normalization
CREATE OR REPLACE FUNCTION aggregate_spans_normalized(
p_business_id TEXT,
p_place_id TEXT,
p_start_date DATE,
p_end_date DATE,
p_target_version TEXT DEFAULT NULL
) RETURNS TABLE (
urt_code TEXT,
span_count BIGINT,
negative_count BIGINT,
positive_count BIGINT,
avg_confidence FLOAT,
source_versions TEXT[]
) AS $$
DECLARE
v_target TEXT;
BEGIN
v_target := COALESCE(p_target_version, get_current_taxonomy_version());
RETURN QUERY
SELECT
COALESCE(
(SELECT t.translated_code FROM translate_urt_code(
rs.urt_primary, rs.taxonomy_version, v_target
) t
-- A split code returns several rows; pick the highest-confidence target deterministically
ORDER BY t.confidence DESC, t.translated_code
LIMIT 1),
rs.urt_primary
) as urt_code,
COUNT(*) as span_count,
COUNT(*) FILTER (WHERE rs.valence = 'V-') as negative_count,
COUNT(*) FILTER (WHERE rs.valence = 'V+') as positive_count,
AVG(COALESCE(
(SELECT t.confidence FROM translate_urt_code(
rs.urt_primary, rs.taxonomy_version, v_target
) t
ORDER BY t.confidence DESC, t.translated_code
LIMIT 1),
1.0
)) as avg_confidence,
array_agg(DISTINCT rs.taxonomy_version) as source_versions
FROM review_spans rs
WHERE rs.business_id = p_business_id
AND (rs.place_id = p_place_id OR p_place_id = 'ALL')
AND rs.review_time >= p_start_date
AND rs.review_time < p_end_date
AND rs.is_active = TRUE
GROUP BY 1
ORDER BY span_count DESC;
END;
$$ LANGUAGE plpgsql STABLE;
COMMENT ON FUNCTION aggregate_spans_normalized IS
'Aggregate span counts with automatic translation to target taxonomy version';
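The effect of normalized aggregation can be illustrated in memory. The sketch below assumes the highest-confidence mapping is selected when a code has several targets, and that a code with no mapping is kept as-is; the sample codes 'J1.06' and 'J1.07' are hypothetical.

```python
from collections import Counter


def best_target(code, version, target, mappings):
    """Return (normalized_code, confidence) for one span.

    mappings: tuples of (from_code, from_version, to_code, to_version, confidence).
    """
    if version == target:
        return code, 1.0
    candidates = [(tc, conf) for fc, fv, tc, tv, conf in mappings
                  if (fc, fv, tv) == (code, version, target)]
    if not candidates:
        return code, 1.0  # no mapping: keep the original code
    # Highest confidence first; ties broken by code for a stable result.
    return min(candidates, key=lambda c: (-c[1], c[0]))


def aggregate(spans, target, mappings):
    """spans: (code, taxonomy_version) pairs. Returns (counts, source_versions)."""
    counts = Counter()
    sources = {}
    for code, version in spans:
        normalized, _conf = best_target(code, version, target, mappings)
        counts[normalized] += 1
        sources.setdefault(normalized, set()).add(version)
    return counts, sources


SPANS = [("J1.01", "v5.1"), ("J1.01", "v5.1"), ("J1.06", "v5.2"), ("J1.02", "v5.1")]
MAPPINGS = [
    ("J1.01", "v5.1", "J1.06", "v5.2", 0.7),
    ("J1.01", "v5.1", "J1.07", "v5.2", 0.3),
]
counts, sources = aggregate(SPANS, "v5.2", MAPPINGS)
```

Here the two v5.1 spans fold into the v5.2 code, so the trend line for that code shows three spans drawn from two source versions.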
3.3 Drift Detection
-- ═══════════════════════════════════════════════════════════════
-- DRIFT DETECTION
-- Analyze classification changes between taxonomy versions
-- ═══════════════════════════════════════════════════════════════
-- Function: Detect potential drift when upgrading taxonomy
CREATE OR REPLACE FUNCTION detect_taxonomy_drift(
p_from_version TEXT,
p_to_version TEXT,
p_business_id TEXT DEFAULT NULL
) RETURNS TABLE (
from_code TEXT,
from_display_name TEXT,
to_code TEXT,
to_display_name TEXT,
mapping_type TEXT,
affected_spans BIGINT,
sample_span_ids TEXT[]
) AS $$
BEGIN
RETURN QUERY
SELECT
m.from_code,
cv_from.display_name as from_display_name,
m.to_code,
cv_to.display_name as to_display_name,
m.mapping_type,
COUNT(rs.span_id) as affected_spans,
(array_agg(rs.span_id ORDER BY rs.review_time DESC))[1:5] as sample_span_ids
FROM urt_code_mappings m
JOIN urt_codes_versioned cv_from
ON cv_from.code = m.from_code AND cv_from.version_id = m.from_version
JOIN urt_codes_versioned cv_to
ON cv_to.code = m.to_code AND cv_to.version_id = m.to_version
LEFT JOIN review_spans rs
ON rs.urt_primary = m.from_code
AND rs.taxonomy_version = m.from_version
AND rs.is_active = TRUE
AND (p_business_id IS NULL OR rs.business_id = p_business_id)
WHERE m.from_version = p_from_version
AND m.to_version = p_to_version
AND m.mapping_type NOT IN ('equivalent', 'unchanged')
GROUP BY m.from_code, cv_from.display_name, m.to_code, cv_to.display_name, m.mapping_type
HAVING COUNT(rs.span_id) > 0
ORDER BY affected_spans DESC;
END;
$$ LANGUAGE plpgsql STABLE;
COMMENT ON FUNCTION detect_taxonomy_drift IS
'Identify spans affected by non-equivalent mappings between taxonomy versions';
-- Function: Get deprecated codes still in use
CREATE OR REPLACE FUNCTION get_deprecated_codes_in_use(
p_version TEXT,
p_business_id TEXT DEFAULT NULL
) RETURNS TABLE (
code TEXT,
display_name TEXT,
deprecation_reason TEXT,
successor_hint TEXT,
span_count BIGINT,
latest_span_date TIMESTAMPTZ
) AS $$
BEGIN
RETURN QUERY
SELECT
cv.code,
cv.display_name,
cv.deprecation_reason,
cv.successor_hint,
COUNT(rs.span_id) as span_count,
MAX(rs.review_time) as latest_span_date
FROM urt_codes_versioned cv
LEFT JOIN review_spans rs
ON rs.urt_primary = cv.code
AND rs.taxonomy_version = cv.version_id
AND rs.is_active = TRUE
AND (p_business_id IS NULL OR rs.business_id = p_business_id)
WHERE cv.version_id = p_version
AND cv.change_type = 'deprecated'
GROUP BY cv.code, cv.display_name, cv.deprecation_reason, cv.successor_hint
HAVING COUNT(rs.span_id) > 0
ORDER BY span_count DESC;
END;
$$ LANGUAGE plpgsql STABLE;
COMMENT ON FUNCTION get_deprecated_codes_in_use IS
'Find deprecated codes that still have active spans';
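The drift report reduces to counting active spans behind each non-equivalent mapping. A minimal in-memory sketch, assuming simplified tuples for spans and mappings and hypothetical sample codes:

```python
def detect_drift(spans, mappings, from_version, to_version):
    """Count spans affected by non-equivalent mappings.

    spans: (span_id, code, taxonomy_version) tuples, assumed active.
    mappings: (from_code, from_version, to_code, to_version, mapping_type) tuples.
    Returns rows of (from_code, to_code, mapping_type, affected_spans).
    """
    span_ids_by_code = {}
    for span_id, code, version in spans:
        if version == from_version:
            span_ids_by_code.setdefault(code, []).append(span_id)

    report = []
    for fc, fv, tc, tv, mtype in mappings:
        if (fv, tv) != (from_version, to_version) or mtype == "equivalent":
            continue
        affected = len(span_ids_by_code.get(fc, []))
        if affected > 0:  # same effect as HAVING COUNT(...) > 0
            report.append((fc, tc, mtype, affected))
    # Most affected first; sorted() is stable, so ties keep mapping order.
    return sorted(report, key=lambda row: -row[3])


DRIFT_SPANS = [
    ("s1", "J1.01", "v5.1"),
    ("s2", "J1.01", "v5.1"),
    ("s3", "J1.02", "v5.1"),
    ("s4", "J1.01", "v5.2"),
]
DRIFT_MAPPINGS = [
    ("J1.01", "v5.1", "J1.06", "v5.2", "narrower"),
    ("J1.01", "v5.1", "J1.07", "v5.2", "narrower"),
    ("J1.02", "v5.1", "J1.02", "v5.2", "equivalent"),
    ("J1.03", "v5.1", "J1.08", "v5.2", "superseded"),
]
```

A split code appears once per target, each row carrying the same span count, since every span under the old code is a candidate for either target.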
Part 4: Seed Data
4.1 Initial Version Registration
-- ═══════════════════════════════════════════════════════════════
-- SEED DATA: v5.1 Initial Version
-- ═══════════════════════════════════════════════════════════════
INSERT INTO urt_taxonomy_versions (
version_id, semver, effective_from, is_current,
release_notes, domain_count, category_count, subcode_count,
breaking_changes, created_by
) VALUES (
'v5.1', '5.1.0', '2026-01-01', TRUE,
'Initial URT v5.1 release. 7 domains, 28 categories, 138 subcodes.',
7, 28, 138,
FALSE, 'system'
);
-- Note: urt_codes_versioned should be populated from B1-urt-codes.yaml
-- using a separate seed script. See: scripts/seed-urt-codes.py
4.2 Sample Versioned Codes (Subset)
-- Sample: Domain-level codes for v5.1
INSERT INTO urt_codes_versioned (
code, version_id, domain, category, tier,
display_name, definition, default_owner, display_order, change_type
) VALUES
('O', 'v5.1', 'O', 'O', 1, 'Offering',
'The core product, service, or outcome delivered', 'Product / Operations', 1, 'new'),
('P', 'v5.1', 'P', 'P', 1, 'People',
'Human interactions and personnel behavior', 'HR / Training', 2, 'new'),
('J', 'v5.1', 'J', 'J', 1, 'Journey',
'The process, timing, and operational flow', 'Operations / Process', 3, 'new'),
('E', 'v5.1', 'E', 'E', 1, 'Environment',
'Physical, digital, and ambient context', 'Facilities / IT', 4, 'new'),
('A', 'v5.1', 'A', 'A', 1, 'Access',
'Availability, accessibility, and inclusivity', 'Compliance / Design', 5, 'new'),
('V', 'v5.1', 'V', 'V', 1, 'Value',
'Cost, pricing, and worth of the exchange', 'Finance / Pricing', 6, 'new'),
('R', 'v5.1', 'R', 'R', 1, 'Relationship',
'Trust, reliability, and ongoing connection', 'Leadership / CX', 7, 'new');
-- Sample: Category-level codes for Journey domain
INSERT INTO urt_codes_versioned (
code, version_id, domain, category, tier, parent_code,
display_name, definition, display_order, change_type
) VALUES
('J1', 'v5.1', 'J', 'J1', 2, 'J', 'Timing',
'Speed, punctuality, and time management', 1, 'new'),
('J2', 'v5.1', 'J', 'J2', 2, 'J', 'Ease',
'Effort required and friction encountered', 2, 'new'),
('J3', 'v5.1', 'J', 'J3', 2, 'J', 'Reliability',
'Consistency and predictability of process', 3, 'new'),
('J4', 'v5.1', 'J', 'J4', 2, 'J', 'Resolution',
'How problems are handled when they arise', 4, 'new');
-- Sample: Subcode-level codes for J1 (Timing)
INSERT INTO urt_codes_versioned (
code, version_id, domain, category, subcategory, tier, parent_code,
display_name, definition,
dont_confuse_with, dont_confuse_reason,
display_order, change_type
) VALUES
('J1.01', 'v5.1', 'J', 'J1', 'J1.01', 3, 'J1',
'Wait Time', 'Duration of waiting before service begins or between steps',
'J1.03', 'J1.03 is total duration, J1.01 is specifically idle waiting', 1, 'new'),
('J1.02', 'v5.1', 'J', 'J1', 'J1.02', 3, 'J1',
'Punctuality', 'On-time arrival, start, or delivery vs. scheduled time',
'J3.01', 'J3.01 is consistency over time, J1.02 is single-instance timeliness', 2, 'new'),
('J1.03', 'v5.1', 'J', 'J1', 'J1.03', 3, 'J1',
'Service Speed', 'Overall pace of active service delivery',
'O1.02', 'O1.02 is product performance, J1.03 is service delivery speed', 3, 'new'),
('J1.04', 'v5.1', 'J', 'J1', 'J1.04', 3, 'J1',
'Response Time', 'Speed of response to customer requests or inquiries',
'P3.01', 'P3.01 is attentiveness, J1.04 is measured response time', 4, 'new'),
('J1.05', 'v5.1', 'J', 'J1', 'J1.05', 3, 'J1',
'Time Respect', 'Respect for customer''s time and schedule constraints',
'P1.01', 'P1.01 is general attitude, J1.05 is specifically about time', 5, 'new');
Part 5: Migration Guide
5.1 Migration Steps (v3.2 → v3.2.1)
-- ═══════════════════════════════════════════════════════════════
-- MIGRATION SCRIPT: v3.2 to v3.2.1
-- Run in a transaction, test on staging first
-- ═══════════════════════════════════════════════════════════════
BEGIN;
-- Step 1: Create new tables
-- (Run the DDL from sections 2.1-2.3 only; the ALTER statements in 2.4 are repeated in Steps 4-7 below)
-- Step 2: Register initial taxonomy version
INSERT INTO urt_taxonomy_versions (
version_id, semver, effective_from, is_current,
release_notes, breaking_changes
) VALUES (
'v5.1', '5.1.0', '2026-01-01', TRUE,
'Initial URT v5.1 - baseline for versioning', FALSE
);
-- Step 3: Populate urt_codes_versioned from existing urt_codes
INSERT INTO urt_codes_versioned (
code, version_id, domain, category, subcategory, tier,
display_name, definition, keywords, display_order, change_type
)
SELECT
code,
'v5.1',
domain,
category,
CASE WHEN code ~ '^[OPJEAVR][1-4]\.[0-9]{2}$' THEN code ELSE NULL END,
CASE
WHEN code ~ '^[OPJEAVR]$' THEN 1
WHEN code ~ '^[OPJEAVR][1-4]$' THEN 2
ELSE 3
END,
display_name,
description,
keywords,
0,
'new'
FROM urt_codes;
-- Step 4: Add taxonomy_version columns with default
ALTER TABLE review_spans
ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE reviews_enriched
ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE issues
ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
ALTER TABLE fact_timeseries
ADD COLUMN taxonomy_version TEXT NOT NULL DEFAULT 'v5.1';
-- Step 5: Add FK constraints
ALTER TABLE review_spans
ADD CONSTRAINT fk_spans_urt_versioned
FOREIGN KEY (urt_primary, taxonomy_version)
REFERENCES urt_codes_versioned(code, version_id);
ALTER TABLE issues
ADD CONSTRAINT fk_issues_urt_versioned
FOREIGN KEY (primary_subcode, taxonomy_version)
REFERENCES urt_codes_versioned(code, version_id);
-- Step 6: Create indexes
CREATE INDEX idx_spans_taxonomy_version ON review_spans(taxonomy_version);
CREATE INDEX idx_enriched_taxonomy_version ON reviews_enriched(taxonomy_version);
CREATE INDEX idx_issues_taxonomy_version ON issues(taxonomy_version);
-- Step 7: Update unique constraint on fact_timeseries
ALTER TABLE fact_timeseries
DROP CONSTRAINT IF EXISTS fact_timeseries_business_id_place_id_period_date_bucket_typ_key;
ALTER TABLE fact_timeseries
ADD CONSTRAINT uq_fact_timeseries
UNIQUE (business_id, place_id, period_date, bucket_type,
subject_type, subject_id, taxonomy_version);
COMMIT;
5.2 Application Code Changes
from datetime import date

# Classification pipeline: Include taxonomy version
async def store_review_spans(enriched: dict, spans: list[dict], batch_id: str):
    """Store extracted spans with taxonomy version."""
    # Get current taxonomy version once per batch, not once per span
    taxonomy_version = await get_current_taxonomy_version()
    for idx, span in enumerate(spans):
        await db.execute("""
            INSERT INTO review_spans (
                span_id, business_id, place_id, source, review_id, review_version,
                span_index, span_text, span_start, span_end,
                profile, urt_primary, valence, intensity,
                taxonomy_version, model_version,  -- NEW: taxonomy_version
                ...
            ) VALUES (...)
        """, [
            ...,
            taxonomy_version,
            'gpt-4o-mini-2024-07-18',
            ...
        ])

# Trend queries: Support version normalization
async def get_timeline(
    business_id: str,
    place_id: str,
    subject_id: str,
    start: date,
    end: date,
    normalize_to_version: str | None = None  # NEW: optional normalization
) -> list[dict]:
    """Query timeline with optional version normalization."""
    if normalize_to_version:
        # Use normalized aggregation
        return await db.query("""
            SELECT * FROM aggregate_spans_normalized(%s, %s, %s, %s, %s)
            WHERE urt_code = %s
        """, [business_id, place_id, start, end, normalize_to_version, subject_id])
    else:
        # Standard query (preserves original versions)
        return await db.query("""
            SELECT * FROM fact_timeseries
            WHERE business_id = %s AND place_id = %s
              AND subject_type = 'urt_code' AND subject_id = %s
              AND period_date BETWEEN %s AND %s
        """, [business_id, place_id, subject_id, start, end])
Part 6: Query Patterns
6.1 Point-in-Time Query (Historical Context)
-- "What were our J1 issues in Q3 2025, as classified then?"
-- Returns data with original taxonomy context
SELECT
rs.span_id,
rs.span_text,
rs.urt_primary,
rs.taxonomy_version,
cv.display_name,
cv.definition
FROM review_spans rs
JOIN urt_codes_versioned cv
ON rs.urt_primary = cv.code
AND rs.taxonomy_version = cv.version_id
WHERE rs.business_id = 'acme'
AND rs.urt_primary LIKE 'J1%'
AND rs.review_time >= '2025-07-01'
AND rs.review_time < '2025-10-01' -- half-open range so all of Sept 30 is included
AND rs.is_active = TRUE
ORDER BY rs.review_time DESC;
6.2 Normalized Trend Query
-- "Show all historical data mapped to current J1.01"
-- Translates across taxonomy versions
SELECT
DATE_TRUNC('month', rs.review_time) as month,
rs.taxonomy_version as original_version,
COUNT(*) as span_count
FROM review_spans rs
JOIN get_code_lineage('J1.01') lineage
ON rs.urt_primary = lineage.code
AND rs.taxonomy_version = lineage.version_id
WHERE rs.business_id = 'acme'
AND rs.is_active = TRUE
GROUP BY 1, 2
ORDER BY 1, 2;
6.3 Drift Analysis
-- "What spans would be affected by upgrading to v5.2?"
SELECT * FROM detect_taxonomy_drift('v5.1', 'v5.2', 'acme');
-- "Are we still using deprecated codes?"
SELECT * FROM get_deprecated_codes_in_use('v5.2', 'acme');
6.4 Cross-Version Comparison
-- "Compare classification distribution between taxonomy versions"
SELECT
rs.taxonomy_version,
LEFT(rs.urt_primary, 1) as domain,
COUNT(*) as span_count,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (PARTITION BY rs.taxonomy_version), 1) as pct
FROM review_spans rs
WHERE rs.business_id = 'acme'
AND rs.is_active = TRUE
GROUP BY rs.taxonomy_version, LEFT(rs.urt_primary, 1)
ORDER BY rs.taxonomy_version, span_count DESC;
Part 7: Operational Procedures
7.1 Releasing a New Taxonomy Version
-- Step 1: Register new version (not current yet)
INSERT INTO urt_taxonomy_versions (
version_id, semver, effective_from, is_current,
release_notes, predecessor_version, breaking_changes
) VALUES (
'v5.2', '5.2.0', '2026-04-01', FALSE,
'Split J1.01 into J1.01a/J1.01b for queue vs service wait',
'v5.1', TRUE
);
-- Step 2: Populate versioned codes
INSERT INTO urt_codes_versioned (code, version_id, ...)
SELECT ... FROM urt_codes_v52_staging;
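-- Step 3: Define mappings from v5.1 to v5.2
-- Note: 'J1.01a' and 'J1.01b' do not match chk_code_format (section 2.2),
-- which allows only two-digit subcodes. Relax that constraint or pick
-- two-digit codes before running a split like this.
INSERT INTO urt_code_mappings (
from_code, from_version, to_code, to_version,
mapping_type, confidence, notes
) VALUES
('J1.01', 'v5.1', 'J1.01a', 'v5.2', 'narrower', 0.7,
'J1.01 split: queue wait'),
('J1.01', 'v5.1', 'J1.01b', 'v5.2', 'narrower', 0.3,
'J1.01 split: service wait');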
-- Step 4: Run drift analysis
SELECT * FROM detect_taxonomy_drift('v5.1', 'v5.2');
-- Step 5: Activate new version
-- Run as one transaction so readers never see zero current versions
BEGIN;
UPDATE urt_taxonomy_versions SET is_current = FALSE WHERE is_current = TRUE;
UPDATE urt_taxonomy_versions SET is_current = TRUE WHERE version_id = 'v5.2';
UPDATE urt_taxonomy_versions SET effective_to = CURRENT_DATE WHERE version_id = 'v5.1';
COMMIT;
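The single-current-version invariant and the activation step can be exercised locally. The sketch below uses SQLite as a stand-in for PostgreSQL, with a simplified `versions` table and a hypothetical `activate` helper. It assumes only that a partial unique index behaves the same way for this purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE versions (
        version_id TEXT PRIMARY KEY,
        is_current INTEGER NOT NULL DEFAULT 0
    );
    -- Partial unique index: at most one row may have is_current = 1
    CREATE UNIQUE INDEX uq_current ON versions(is_current) WHERE is_current = 1;
    INSERT INTO versions VALUES ('v5.1', 1), ('v5.2', 0);
""")


def activate(conn: sqlite3.Connection, version_id: str) -> None:
    """Switch the current version atomically; roll back if the target is unknown."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE versions SET is_current = 0 WHERE is_current = 1")
        cur = conn.execute(
            "UPDATE versions SET is_current = 1 WHERE version_id = ?", (version_id,)
        )
        if cur.rowcount != 1:
            raise ValueError(f"unknown version: {version_id}")


def current_version(conn: sqlite3.Connection):
    row = conn.execute(
        "SELECT version_id FROM versions WHERE is_current = 1"
    ).fetchone()
    return row[0] if row else None
```

Wrapping both updates in one transaction means a failed activation leaves the previous version current instead of leaving none.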
7.2 Reclassification Workflow
-- Reclassify historical spans with new taxonomy (creates parallel records)
-- Uses soft-switch pattern from v3.2
-- 1. Create new batch with new taxonomy version
INSERT INTO review_spans (
span_id, ..., taxonomy_version, ingest_batch_id, is_active
)
SELECT
generate_span_id(...), -- New span ID
...,
'v5.2', -- New taxonomy version
'reclassify-batch-001',
FALSE -- Not active yet
FROM review_spans_to_reclassify;
-- 2. Run LLM reclassification on batch
-- (Application code)
-- 3. Atomic switch (deactivate old, activate new)
BEGIN;
UPDATE review_spans SET is_active = FALSE
WHERE taxonomy_version = 'v5.1' AND business_id = 'acme';
UPDATE review_spans SET is_active = TRUE
WHERE ingest_batch_id = 'reclassify-batch-001';
COMMIT;
Part 8: Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Version ID format | v{major}.{minor} | Human-readable, matches URT releases |
| Semver column | Separate TEXT field | Enables programmatic version comparison |
| Code FK strategy | Composite (code, version_id) | Prevents orphaned classifications |
| Mapping direction | Forward only (old→new) | Simpler model, matches time flow |
| Transitive mappings | Single hop in function | Balances accuracy vs complexity |
| Default version | 'v5.1' hardcoded | Safe baseline, explicit upgrade path |
| Fact table versioning | Per-row taxonomy_version | Enables version-specific aggregation |
Part 9: Future Considerations (v3.3+)
| Feature | Description |
|---|---|
| Prompt versioning | Track prompt templates used for classification |
| A/B testing | Compare classifier accuracy across versions |
| Automatic mapping suggestion | LLM-powered mapping recommendations |
| Version-aware dashboards | UI toggle for normalized vs point-in-time |
| Bulk reclassification pipeline | Scheduled jobs for taxonomy upgrades |
Document Control
| Field | Value |
|---|---|
| Document | ReviewIQ v3.2.1 Addendum: Taxonomy Versioning |
| Status | Draft Specification |
| Date | 2026-01-24 |
| Extends | ReviewIQ Architecture v3.2.0 |
| Author | Architecture Team |
Changelog
| Version | Changes |
|---|---|
| v3.2.1-draft | Initial taxonomy versioning specification |
End of ReviewIQ v3.2.1 Addendum