feat: Add extensible multi-pipeline integration system

This commit implements a plugin-like pipeline architecture with: Pipeline Core Package (packages/pipeline-core/): - BasePipeline abstract class all pipelines implement - PipelineRegistry for database-backed discovery/management - PipelineRunner for execution with status tracking - DashboardConfig contracts for dynamic widget definitions Database Migration (006_pipeline_registry.sql): - pipeline.registry table for registered pipelines - pipeline.executions table for execution history - Views for execution stats and monitoring ReviewIQ Pipeline Refactor: - Implements BasePipeline interface - Adds get_dashboard_config() with widget definitions - Adds get_widget_data() methods for all dashboard widgets - Maintains backward compatibility with Pipeline alias Generic Pipeline API (api/routes/pipelines.py): - GET /api/pipelines - List all registered pipelines - GET /api/pipelines/{id} - Pipeline details - POST /api/pipelines/{id}/execute - Execute pipeline - GET /api/pipelines/{id}/dashboard - Dashboard config - GET /api/pipelines/{id}/widgets/{w} - Widget data - GET /api/pipelines/{id}/executions - Execution history Frontend Dynamic Dashboard System: - DynamicDashboard component renders from config - WidgetRegistry maps types to components - Widget components: StatCard, LineChart, BarChart, PieChart, DataTable, Heatmap - Pipeline API client library Frontend Pipeline Pages: - /pipelines - List all registered pipelines - /pipelines/[id] - Dynamic dashboard for pipeline - /pipelines/[id]/executions - Execution history - Pipelines nav item in Sidebar Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-24 19:05:38 +00:00
parent d64f06ba9e
commit 824634aa76
30 changed files with 5697 additions and 95 deletions
--- a/packages/pipeline-core/README.md
+++ b/packages/pipeline-core/README.md
@@ -0,0 +1,119 @@
+# Pipeline Core
+
+Extensible multi-pipeline framework with dynamic dashboards.
+
+## Overview
+
+Pipeline Core provides the base abstractions for building pipelines that can be:
+- Discovered and registered dynamically
+- Executed with status tracking
+- Rendered with auto-generated dashboards
+
+## Features
+
+- **BasePipeline** - Abstract base class all pipelines implement
+- **PipelineRegistry** - Database-backed pipeline discovery and management
+- **PipelineRunner** - Execution with status tracking
+- **Dashboard Contracts** - TypedDicts for widget configuration
+
+## Installation
+
+```bash
+pip install -e packages/pipeline-core
+```
+
+## Usage
+
+### Implementing a Pipeline
+
+```python
+from pipeline_core import BasePipeline, PipelineMetadata, DashboardConfig
+
+class MyPipeline(BasePipeline):
+    @property
+    def metadata(self) -> PipelineMetadata:
+        return {
+            "id": "my-pipeline",
+            "name": "My Pipeline",
+            "description": "Does something useful",
+            "version": "1.0.0",
+            "stages": ["stage1", "stage2"],
+            "input_type": "MyInputType",
+        }
+
+    async def initialize(self) -> None:
+        # Set up connections
+        pass
+
+    async def close(self) -> None:
+        # Clean up
+        pass
+
+    async def process(self, input_data, stages=None):
+        # Run the pipeline
+        pass
+
+    def get_dashboard_config(self) -> DashboardConfig:
+        return {
+            "pipeline_id": "my-pipeline",
+            "title": "My Dashboard",
+            "sections": [...]
+        }
+
+    async def get_widget_data(self, widget_id, params):
+        # Return widget data
+        pass
+```
+
+### Registering a Pipeline
+
+```python
+from pipeline_core import PipelineRegistry
+import asyncpg
+
+pool = await asyncpg.create_pool(database_url)
+registry = PipelineRegistry(pool)
+
+await registry.register(
+    pipeline_id="my-pipeline",
+    name="My Pipeline",
+    description="Does something useful",
+    version="1.0.0",
+    module_path="my_package.pipeline:MyPipeline",
+    stages=["stage1", "stage2"],
+    input_type="MyInputType",
+)
+```
+
+### Executing a Pipeline
+
+```python
+from pipeline_core import PipelineRunner
+
+runner = PipelineRunner(pool, registry)
+
+execution_id, result = await runner.execute(
+    pipeline_id="my-pipeline",
+    request={
+        "input_data": {"key": "value"},
+        "stages": ["stage1"],
+    }
+)
+```
+
+## Dashboard Widgets
+
+Pipelines declare dashboard widgets via `get_dashboard_config()`. Available widget types:
+
+- `stat_card` - KPI stat card with value and trend
+- `line_chart` - Time series line chart
+- `bar_chart` - Bar chart (horizontal or vertical)
+- `pie_chart` - Pie/donut chart
+- `table` - Data table with columns
+- `heatmap` - Heatmap grid visualization
+- `area_chart` - Stacked area chart
+- `gauge` - Gauge/meter visualization
+
+## License
+
+MIT
--- a/packages/pipeline-core/pyproject.toml
+++ b/packages/pipeline-core/pyproject.toml
@@ -0,0 +1,65 @@
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[project]
+name = "pipeline-core"
+version = "0.1.0"
+description = "Pipeline Core - Extensible multi-pipeline framework with dynamic dashboards"
+readme = "README.md"
+license = "MIT"
+requires-python = ">=3.11"
+authors = [
+    { name = "ReviewIQ Team" }
+]
+keywords = ["pipeline", "framework", "dashboard", "registry"]
+classifiers = [
+    "Development Status :: 3 - Alpha",
+    "Intended Audience :: Developers",
+    "License :: OSI Approved :: MIT License",
+    "Programming Language :: Python :: 3",
+    "Programming Language :: Python :: 3.11",
+    "Programming Language :: Python :: 3.12",
+]
+
+dependencies = [
+    "asyncpg>=0.28.0",
+    "pydantic>=2.0",
+    "pydantic-settings>=2.0",
+]
+
+[project.optional-dependencies]
+dev = [
+    "pytest>=7.0",
+    "pytest-asyncio>=0.21.0",
+    "pytest-cov>=4.0",
+    "ruff>=0.1.0",
+    "mypy>=1.0",
+]
+
+[project.urls]
+Homepage = "https://github.com/reviewiq/pipeline-core"
+Documentation = "https://github.com/reviewiq/pipeline-core#readme"
+Repository = "https://github.com/reviewiq/pipeline-core"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/pipeline_core"]
+
+[tool.pytest.ini_options]
+asyncio_mode = "auto"
+testpaths = ["tests"]
+pythonpath = ["src"]
+
+[tool.ruff]
+line-length = 100
+target-version = "py311"
+
+[tool.ruff.lint]
+select = ["E", "F", "I", "N", "W", "UP"]
+ignore = ["E501"]
+
+[tool.mypy]
+python_version = "3.11"
+strict = true
+warn_return_any = true
+warn_unused_ignores = true
--- a/packages/pipeline-core/src/pipeline_core/init.py
+++ b/packages/pipeline-core/src/pipeline_core/init.py
@@ -0,0 +1,34 @@
+"""
+Pipeline Core - Extensible multi-pipeline framework with dynamic dashboards.
+
+This package provides the base abstractions for building pipelines that can be
+discovered, registered, and rendered with dynamic dashboards.
+"""
+
+from pipeline_core.base import BasePipeline, PipelineMetadata, PipelineResult
+from pipeline_core.contracts import (
+    DashboardConfig,
+    DashboardSection,
+    WidgetConfig,
+    WidgetType,
+)
+from pipeline_core.registry import PipelineRegistry
+from pipeline_core.runner import PipelineRunner
+
+__version__ = "0.1.0"
+
+__all__ = [
+    # Base classes
+    "BasePipeline",
+    "PipelineMetadata",
+    "PipelineResult",
+    # Contracts
+    "DashboardConfig",
+    "DashboardSection",
+    "WidgetConfig",
+    "WidgetType",
+    # Registry
+    "PipelineRegistry",
+    # Runner
+    "PipelineRunner",
+]
--- a/packages/pipeline-core/src/pipeline_core/base.py
+++ b/packages/pipeline-core/src/pipeline_core/base.py
@@ -0,0 +1,263 @@
+"""
+Base Pipeline abstract class and related types.
+
+All pipelines must implement this interface to be compatible with the
+pipeline registry, runner, and dynamic dashboard system.
+"""
+
+from __future__ import annotations
+
+from abc import ABC, abstractmethod
+from typing import Any, TypedDict
+
+from pipeline_core.contracts import DashboardConfig
+
+
+class PipelineMetadata(TypedDict):
+    """Metadata describing a pipeline."""
+
+    id: str                    # Unique pipeline identifier (e.g., "reviewiq")
+    name: str                  # Display name (e.g., "ReviewIQ Classification Pipeline")
+    description: str           # Human-readable description
+    version: str               # Semantic version (e.g., "1.0.0")
+    stages: list[str]          # Ordered list of stage names
+    input_type: str            # Expected input type (e.g., "ScraperV1Output")
+
+
+class StageResult(TypedDict, total=False):
+    """Result from running a single pipeline stage."""
+
+    stage: str                 # Stage name
+    success: bool              # Whether the stage succeeded
+    data: dict[str, Any]       # Stage output data
+    error: str | None          # Error message if failed
+    duration_ms: int           # Stage execution time
+
+
+class PipelineResult:
+    """Result from running a pipeline."""
+
+    def __init__(
+        self,
+        pipeline_id: str,
+        stages_run: list[str] | None = None,
+        stage_results: dict[str, StageResult] | None = None,
+        success: bool = True,
+        error: str | None = None,
+    ):
+        """
+        Initialize pipeline result.
+
+        Args:
+            pipeline_id: Pipeline identifier
+            stages_run: List of stages that were run
+            stage_results: Results from each stage
+            success: Overall success status
+            error: Error message if failed
+        """
+        self.pipeline_id = pipeline_id
+        self.stages_run = stages_run or []
+        self.stage_results = stage_results or {}
+        self.success = success
+        self.error = error
+
+    def get_stage_result(self, stage: str) -> StageResult | None:
+        """Get result for a specific stage."""
+        return self.stage_results.get(stage)
+
+    def to_dict(self) -> dict[str, Any]:
+        """Convert to dictionary."""
+        return {
+            "pipeline_id": self.pipeline_id,
+            "stages_run": self.stages_run,
+            "stage_results": self.stage_results,
+            "success": self.success,
+            "error": self.error,
+        }
+
+    @classmethod
+    def from_error(cls, pipeline_id: str, error: str) -> PipelineResult:
+        """Create a failed result from an error."""
+        return cls(
+            pipeline_id=pipeline_id,
+            success=False,
+            error=error,
+        )
+
+
+class BasePipeline(ABC):
+    """
+    Abstract base class for all pipelines.
+
+    All pipelines must implement this interface to be compatible with:
+    - Pipeline registry (discovery and management)
+    - Pipeline runner (execution)
+    - Dynamic dashboard system (widget configuration and data)
+
+    Example implementation:
+
+        class ReviewIQPipeline(BasePipeline):
+            @property
+            def metadata(self) -> PipelineMetadata:
+                return {
+                    "id": "reviewiq",
+                    "name": "ReviewIQ Classification Pipeline",
+                    "description": "Classifies reviews using URT taxonomy",
+                    "version": "1.0.0",
+                    "stages": ["normalize", "classify", "route", "aggregate"],
+                    "input_type": "ScraperV1Output",
+                }
+
+            async def initialize(self) -> None:
+                # Set up database connections, etc.
+                pass
+
+            async def process(self, input_data, stages=None) -> PipelineResult:
+                # Run the pipeline
+                pass
+
+            def get_dashboard_config(self) -> DashboardConfig:
+                return {
+                    "pipeline_id": "reviewiq",
+                    "title": "ReviewIQ Analytics",
+                    "sections": [...]
+                }
+
+            async def get_widget_data(self, widget_id, params) -> dict:
+                # Return data for a specific widget
+                pass
+    """
+
+    @property
+    @abstractmethod
+    def metadata(self) -> PipelineMetadata:
+        """
+        Get pipeline metadata.
+
+        Returns:
+            PipelineMetadata with id, name, description, version, stages, input_type
+        """
+        ...
+
+    @abstractmethod
+    async def initialize(self) -> None:
+        """
+        Initialize the pipeline.
+
+        This is called before any processing. Use it to:
+        - Establish database connections
+        - Load configuration
+        - Initialize services
+
+        This method may be called multiple times but should be idempotent.
+        """
+        ...
+
+    @abstractmethod
+    async def close(self) -> None:
+        """
+        Close and cleanup pipeline resources.
+
+        This is called when the pipeline is no longer needed. Use it to:
+        - Close database connections
+        - Release resources
+        - Cleanup temporary files
+        """
+        ...
+
+    @abstractmethod
+    async def process(
+        self,
+        input_data: dict[str, Any],
+        stages: list[str] | None = None,
+    ) -> PipelineResult:
+        """
+        Process input data through the pipeline.
+
+        Args:
+            input_data: Input data dictionary (format depends on input_type)
+            stages: List of stages to run (default: all stages)
+
+        Returns:
+            PipelineResult with stage outputs and validation results
+        """
+        ...
+
+    @abstractmethod
+    def get_dashboard_config(self) -> DashboardConfig:
+        """
+        Get the dashboard configuration for this pipeline.
+
+        Returns:
+            DashboardConfig with sections and widget definitions
+
+        The frontend uses this configuration to dynamically render
+        the pipeline's dashboard with appropriate widgets.
+        """
+        ...
+
+    @abstractmethod
+    async def get_widget_data(
+        self,
+        widget_id: str,
+        params: dict[str, Any],
+    ) -> dict[str, Any]:
+        """
+        Get data for a specific dashboard widget.
+
+        Args:
+            widget_id: Widget identifier (from dashboard config)
+            params: Query parameters (e.g., time range, filters)
+
+        Returns:
+            Dictionary with widget data in the format expected by the widget type
+
+        Common params:
+            - business_id: Filter by business
+            - time_range: Time range (e.g., "7d", "30d", "custom")
+            - start_date: Start date for custom range
+            - end_date: End date for custom range
+        """
+        ...
+
+    # Optional methods with default implementations
+
+    async def validate_input(self, input_data: dict[str, Any]) -> list[str]:
+        """
+        Validate input data before processing.
+
+        Args:
+            input_data: Input data to validate
+
+        Returns:
+            List of validation error messages (empty if valid)
+
+        Override this to add custom input validation.
+        """
+        return []
+
+    async def health_check(self) -> dict[str, Any]:
+        """
+        Perform a health check on the pipeline.
+
+        Returns:
+            Dictionary with health status:
+                - healthy: bool
+                - checks: dict of individual check results
+                - message: optional message
+
+        Override this to add custom health checks.
+        """
+        return {
+            "healthy": True,
+            "checks": {},
+            "message": None,
+        }
+
+    def get_stage_names(self) -> list[str]:
+        """Get the list of stage names."""
+        return self.metadata["stages"]
+
+    def get_pipeline_id(self) -> str:
+        """Get the pipeline identifier."""
+        return self.metadata["id"]
--- a/packages/pipeline-core/src/pipeline_core/contracts.py
+++ b/packages/pipeline-core/src/pipeline_core/contracts.py
@@ -0,0 +1,252 @@
+"""
+Dashboard and Widget contracts for the pipeline system.
+
+These TypedDicts define the data structures for dynamic dashboard configuration,
+allowing pipelines to declare their dashboard widgets which the frontend renders.
+"""
+
+from __future__ import annotations
+
+from typing import Any, Literal, TypedDict
+
+
+# =============================================================================
+# Widget Types
+# =============================================================================
+
+WidgetType = Literal[
+    "stat_card",      # KPI stat card with value and optional trend
+    "line_chart",     # Time series line chart
+    "bar_chart",      # Bar chart (horizontal or vertical)
+    "pie_chart",      # Pie/donut chart
+    "table",          # Data table with columns
+    "heatmap",        # Heatmap grid visualization
+    "area_chart",     # Stacked area chart
+    "gauge",          # Gauge/meter visualization
+]
+
+
+# =============================================================================
+# Widget Configuration
+# =============================================================================
+
+
+class GridPosition(TypedDict):
+    """Grid position for a widget in the dashboard layout."""
+
+    x: int           # Column position (0-based)
+    y: int           # Row position (0-based)
+    w: int           # Width in grid units
+    h: int           # Height in grid units
+
+
+class StatCardConfig(TypedDict, total=False):
+    """Configuration specific to stat card widgets."""
+
+    value_key: str           # Key in data for the main value
+    label: str               # Label to display
+    format: str              # Format string (e.g., "{value:,}", "{value:.1%}")
+    trend_key: str | None    # Key for trend value (optional)
+    trend_format: str | None # Format for trend (e.g., "+{value:.1%}")
+    icon: str | None         # Icon name (optional)
+    color: str | None        # Color theme (e.g., "blue", "green", "red")
+
+
+class ChartAxisConfig(TypedDict, total=False):
+    """Configuration for chart axes."""
+
+    key: str                 # Data key for this axis
+    label: str               # Axis label
+    type: Literal["number", "category", "time"]
+    format: str | None       # Format string
+
+
+class ChartSeriesConfig(TypedDict, total=False):
+    """Configuration for a chart data series."""
+
+    key: str                 # Data key
+    name: str                # Display name
+    color: str | None        # Series color
+    type: Literal["line", "bar", "area"] | None
+
+
+class ChartConfig(TypedDict, total=False):
+    """Configuration for chart widgets (line, bar, area)."""
+
+    x_axis: ChartAxisConfig
+    y_axis: ChartAxisConfig
+    series: list[ChartSeriesConfig]
+    stacked: bool
+    show_legend: bool
+    show_grid: bool
+
+
+class PieChartConfig(TypedDict, total=False):
+    """Configuration for pie/donut chart widgets."""
+
+    value_key: str           # Key for segment value
+    label_key: str           # Key for segment label
+    colors: list[str] | None # Custom color palette
+    show_legend: bool
+    show_labels: bool
+    inner_radius: int | None # For donut chart (0 = pie)
+
+
+class TableColumnConfig(TypedDict, total=False):
+    """Configuration for a table column."""
+
+    key: str                 # Data key
+    header: str              # Column header
+    width: int | None        # Column width
+    align: Literal["left", "center", "right"]
+    format: str | None       # Format string
+    sortable: bool
+
+
+class TableConfig(TypedDict, total=False):
+    """Configuration for table widgets."""
+
+    columns: list[TableColumnConfig]
+    row_key: str             # Key for unique row identifier
+    page_size: int
+    show_pagination: bool
+    sortable: bool
+    filterable: bool
+
+
+class HeatmapConfig(TypedDict, total=False):
+    """Configuration for heatmap widgets."""
+
+    x_key: str               # Key for x-axis categories
+    y_key: str               # Key for y-axis categories
+    value_key: str           # Key for cell values
+    color_scale: list[str]   # Color gradient
+    show_values: bool
+    format: str | None       # Format for values
+
+
+class GaugeConfig(TypedDict, total=False):
+    """Configuration for gauge widgets."""
+
+    value_key: str           # Key for gauge value
+    min: float               # Minimum value
+    max: float               # Maximum value
+    thresholds: list[dict[str, Any]]  # Color thresholds
+    format: str | None       # Format string
+
+
+# Union of all widget-specific configs
+WidgetSpecificConfig = (
+    StatCardConfig
+    | ChartConfig
+    | PieChartConfig
+    | TableConfig
+    | HeatmapConfig
+    | GaugeConfig
+)
+
+
+class WidgetConfig(TypedDict, total=False):
+    """Configuration for a dashboard widget."""
+
+    id: str                  # Unique widget identifier
+    type: WidgetType         # Widget type
+    title: str               # Widget title
+    grid: GridPosition       # Grid position and size
+    config: dict[str, Any]   # Widget-specific configuration
+    data_endpoint: str | None  # Custom data endpoint (if not default)
+    refresh_interval: int | None  # Auto-refresh interval in seconds
+
+
+# =============================================================================
+# Dashboard Configuration
+# =============================================================================
+
+
+class DashboardSection(TypedDict):
+    """A section in the dashboard containing widgets."""
+
+    id: str                  # Unique section identifier
+    title: str               # Section title
+    description: str | None  # Optional description
+    widgets: list[WidgetConfig]
+    collapsed: bool | None   # Whether section is collapsed by default
+
+
+class DashboardConfig(TypedDict):
+    """Full dashboard configuration for a pipeline."""
+
+    pipeline_id: str         # Pipeline identifier
+    title: str               # Dashboard title
+    description: str | None  # Optional description
+    sections: list[DashboardSection]
+    default_time_range: str | None  # Default time range (e.g., "7d", "30d")
+    refresh_interval: int | None    # Global refresh interval in seconds
+
+
+# =============================================================================
+# Execution Types
+# =============================================================================
+
+
+class ExecutionStatus(TypedDict, total=False):
+    """Status of a pipeline execution."""
+
+    id: str                  # Execution ID
+    pipeline_id: str         # Pipeline identifier
+    job_id: str | None       # Associated job ID
+    business_id: str | None  # Business identifier
+    status: Literal["pending", "running", "completed", "failed", "cancelled"]
+    stages_requested: list[str]
+    stages_completed: list[str]
+    current_stage: str | None
+    progress: float          # 0.0 to 1.0
+    input_summary: dict[str, Any] | None
+    result_summary: dict[str, Any] | None
+    error_message: str | None
+    started_at: str | None
+    completed_at: str | None
+    created_at: str
+
+
+class ExecutionRequest(TypedDict, total=False):
+    """Request to execute a pipeline."""
+
+    job_id: str | None       # Job ID to process
+    business_id: str | None  # Business identifier
+    input_data: dict[str, Any] | None  # Direct input data
+    stages: list[str] | None  # Stages to run (default: all)
+    options: dict[str, Any] | None  # Pipeline-specific options
+
+
+# =============================================================================
+# Pipeline Info Types
+# =============================================================================
+
+
+class PipelineInfo(TypedDict):
+    """Summary information about a pipeline."""
+
+    id: str                  # Pipeline ID (e.g., "reviewiq")
+    name: str                # Display name
+    description: str
+    version: str
+    is_enabled: bool
+    stages: list[str]        # Available stages
+    input_type: str          # Expected input type
+
+
+class PipelineDetail(TypedDict):
+    """Detailed pipeline information including metadata."""
+
+    id: str
+    name: str
+    description: str
+    version: str
+    is_enabled: bool
+    stages: list[str]
+    input_type: str
+    module_path: str
+    config: dict[str, Any] | None
+    created_at: str
+    updated_at: str
--- a/packages/pipeline-core/src/pipeline_core/registry.py
+++ b/packages/pipeline-core/src/pipeline_core/registry.py
@@ -0,0 +1,455 @@
+"""
+Pipeline Registry - Database-backed discovery and management of pipelines.
+
+The registry maintains a list of registered pipelines and their metadata,
+allowing the system to discover available pipelines and instantiate them.
+"""
+
+from __future__ import annotations
+
+import importlib
+import logging
+from datetime import datetime
+from typing import TYPE_CHECKING, Any
+
+from pipeline_core.contracts import PipelineDetail, PipelineInfo
+
+if TYPE_CHECKING:
+    import asyncpg
+
+    from pipeline_core.base import BasePipeline
+
+logger = logging.getLogger(__name__)
+
+
+class PipelineRegistry:
+    """
+    Database-backed registry for pipeline discovery and management.
+
+    The registry stores pipeline metadata in a PostgreSQL table and provides
+    methods to register, list, and instantiate pipelines.
+
+    Usage:
+        pool = await asyncpg.create_pool(database_url)
+        registry = PipelineRegistry(pool)
+
+        # Register a pipeline
+        await registry.register(
+            pipeline_id="reviewiq",
+            name="ReviewIQ Pipeline",
+            description="Classifies reviews",
+            version="1.0.0",
+            module_path="reviewiq_pipeline.pipeline:ReviewIQPipeline",
+        )
+
+        # List pipelines
+        pipelines = await registry.list_pipelines()
+
+        # Get a pipeline instance
+        pipeline = await registry.get_pipeline("reviewiq")
+    """
+
+    def __init__(self, pool: asyncpg.Pool):
+        """
+        Initialize the registry.
+
+        Args:
+            pool: asyncpg connection pool
+        """
+        self._pool = pool
+        self._instances: dict[str, BasePipeline] = {}
+
+    async def register(
+        self,
+        pipeline_id: str,
+        name: str,
+        description: str,
+        version: str,
+        module_path: str,
+        stages: list[str],
+        input_type: str,
+        config: dict[str, Any] | None = None,
+        is_enabled: bool = True,
+    ) -> None:
+        """
+        Register a pipeline in the database.
+
+        Args:
+            pipeline_id: Unique pipeline identifier
+            name: Display name
+            description: Human-readable description
+            version: Semantic version
+            module_path: Python module path (e.g., "package.module:ClassName")
+            stages: List of stage names
+            input_type: Expected input type
+            config: Optional pipeline configuration
+            is_enabled: Whether the pipeline is enabled
+
+        Raises:
+            ValueError: If module_path is invalid
+        """
+        # Validate module path format
+        if ":" not in module_path:
+            raise ValueError(
+                f"Invalid module_path: {module_path}. "
+                "Expected format: 'package.module:ClassName'"
+            )
+
+        async with self._pool.acquire() as conn:
+            await conn.execute(
+                """
+                INSERT INTO pipeline.registry (
+                    pipeline_id, name, description, version, module_path,
+                    stages, input_type, config, is_enabled, updated_at
+                )
+                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, NOW())
+                ON CONFLICT (pipeline_id) DO UPDATE SET
+                    name = EXCLUDED.name,
+                    description = EXCLUDED.description,
+                    version = EXCLUDED.version,
+                    module_path = EXCLUDED.module_path,
+                    stages = EXCLUDED.stages,
+                    input_type = EXCLUDED.input_type,
+                    config = EXCLUDED.config,
+                    is_enabled = EXCLUDED.is_enabled,
+                    updated_at = NOW()
+                """,
+                pipeline_id,
+                name,
+                description,
+                version,
+                module_path,
+                stages,
+                input_type,
+                config,
+                is_enabled,
+            )
+
+        logger.info(f"Registered pipeline: {pipeline_id} v{version}")
+
+    async def register_from_instance(
+        self,
+        pipeline: BasePipeline,
+        module_path: str,
+        config: dict[str, Any] | None = None,
+    ) -> None:
+        """
+        Register a pipeline from an instance.
+
+        Args:
+            pipeline: Pipeline instance
+            module_path: Python module path
+            config: Optional configuration
+        """
+        metadata = pipeline.metadata
+        await self.register(
+            pipeline_id=metadata["id"],
+            name=metadata["name"],
+            description=metadata["description"],
+            version=metadata["version"],
+            module_path=module_path,
+            stages=metadata["stages"],
+            input_type=metadata["input_type"],
+            config=config,
+        )
+        # Cache the instance
+        self._instances[metadata["id"]] = pipeline
+
+    async def unregister(self, pipeline_id: str) -> bool:
+        """
+        Unregister a pipeline from the database.
+
+        Args:
+            pipeline_id: Pipeline identifier to remove
+
+        Returns:
+            True if pipeline was removed, False if not found
+        """
+        async with self._pool.acquire() as conn:
+            result = await conn.execute(
+                "DELETE FROM pipeline.registry WHERE pipeline_id = $1",
+                pipeline_id,
+            )
+
+        # Remove from cache
+        self._instances.pop(pipeline_id, None)
+
+        deleted = result.split()[-1] != "0"
+        if deleted:
+            logger.info(f"Unregistered pipeline: {pipeline_id}")
+        return deleted
+
+    async def set_enabled(self, pipeline_id: str, enabled: bool) -> bool:
+        """
+        Enable or disable a pipeline.
+
+        Args:
+            pipeline_id: Pipeline identifier
+            enabled: Whether to enable or disable
+
+        Returns:
+            True if pipeline was updated, False if not found
+        """
+        async with self._pool.acquire() as conn:
+            result = await conn.execute(
+                """
+                UPDATE pipeline.registry
+                SET is_enabled = $2, updated_at = NOW()
+                WHERE pipeline_id = $1
+                """,
+                pipeline_id,
+                enabled,
+            )
+
+        updated = result.split()[-1] != "0"
+        if updated:
+            logger.info(f"Set pipeline {pipeline_id} enabled={enabled}")
+        return updated
+
+    async def list_pipelines(
+        self,
+        enabled_only: bool = True,
+    ) -> list[PipelineInfo]:
+        """
+        List all registered pipelines.
+
+        Args:
+            enabled_only: Only return enabled pipelines
+
+        Returns:
+            List of PipelineInfo dictionaries
+        """
+        async with self._pool.acquire() as conn:
+            if enabled_only:
+                rows = await conn.fetch(
+                    """
+                    SELECT pipeline_id, name, description, version,
+                           is_enabled, stages, input_type
+                    FROM pipeline.registry
+                    WHERE is_enabled = TRUE
+                    ORDER BY name
+                    """
+                )
+            else:
+                rows = await conn.fetch(
+                    """
+                    SELECT pipeline_id, name, description, version,
+                           is_enabled, stages, input_type
+                    FROM pipeline.registry
+                    ORDER BY name
+                    """
+                )
+
+        return [
+            PipelineInfo(
+                id=row["pipeline_id"],
+                name=row["name"],
+                description=row["description"],
+                version=row["version"],
+                is_enabled=row["is_enabled"],
+                stages=row["stages"],
+                input_type=row["input_type"],
+            )
+            for row in rows
+        ]
+
+    async def get_pipeline_detail(
+        self,
+        pipeline_id: str,
+    ) -> PipelineDetail | None:
+        """
+        Get detailed information about a pipeline.
+
+        Args:
+            pipeline_id: Pipeline identifier
+
+        Returns:
+            PipelineDetail or None if not found
+        """
+        async with self._pool.acquire() as conn:
+            row = await conn.fetchrow(
+                """
+                SELECT pipeline_id, name, description, version, module_path,
+                       is_enabled, stages, input_type, config,
+                       created_at, updated_at
+                FROM pipeline.registry
+                WHERE pipeline_id = $1
+                """,
+                pipeline_id,
+            )
+
+        if not row:
+            return None
+
+        return PipelineDetail(
+            id=row["pipeline_id"],
+            name=row["name"],
+            description=row["description"],
+            version=row["version"],
+            is_enabled=row["is_enabled"],
+            stages=row["stages"],
+            input_type=row["input_type"],
+            module_path=row["module_path"],
+            config=row["config"],
+            created_at=row["created_at"].isoformat() if row["created_at"] else None,
+            updated_at=row["updated_at"].isoformat() if row["updated_at"] else None,
+        )
+
+    async def get_pipeline(
+        self,
+        pipeline_id: str,
+        initialize: bool = True,
+    ) -> BasePipeline | None:
+        """
+        Get a pipeline instance.
+
+        Args:
+            pipeline_id: Pipeline identifier
+            initialize: Whether to call initialize() on the pipeline
+
+        Returns:
+            Pipeline instance or None if not found
+
+        This method caches pipeline instances for reuse.
+        """
+        # Check cache first
+        if pipeline_id in self._instances:
+            return self._instances[pipeline_id]
+
+        # Get pipeline details from database
+        detail = await self.get_pipeline_detail(pipeline_id)
+        if not detail or not detail["is_enabled"]:
+            return None
+
+        # Import and instantiate the pipeline
+        try:
+            pipeline = self._import_pipeline(detail["module_path"])
+        except Exception as e:
+            logger.error(f"Failed to import pipeline {pipeline_id}: {e}")
+            return None
+
+        # Initialize if requested
+        if initialize:
+            try:
+                await pipeline.initialize()
+            except Exception as e:
+                logger.error(f"Failed to initialize pipeline {pipeline_id}: {e}")
+                return None
+
+        # Cache and return
+        self._instances[pipeline_id] = pipeline
+        return pipeline
+
+    def _import_pipeline(self, module_path: str) -> BasePipeline:
+        """
+        Import a pipeline class from a module path.
+
+        Args:
+            module_path: Path in format "package.module:ClassName"
+
+        Returns:
+            Pipeline instance
+        """
+        module_name, class_name = module_path.rsplit(":", 1)
+        module = importlib.import_module(module_name)
+        cls = getattr(module, class_name)
+        return cls()
+
+    async def close_all(self) -> None:
+        """Close all cached pipeline instances."""
+        for pipeline_id, pipeline in self._instances.items():
+            try:
+                await pipeline.close()
+            except Exception as e:
+                logger.error(f"Error closing pipeline {pipeline_id}: {e}")
+
+        self._instances.clear()
+
+
+class InMemoryPipelineRegistry:
+    """
+    In-memory pipeline registry for testing and simple deployments.
+
+    This registry doesn't persist to a database and stores pipelines in memory.
+    """
+
+    def __init__(self):
+        self._pipelines: dict[str, BasePipeline] = {}
+        self._enabled: dict[str, bool] = {}
+
+    async def register(self, pipeline: BasePipeline) -> None:
+        """Register a pipeline instance."""
+        pipeline_id = pipeline.metadata["id"]
+        self._pipelines[pipeline_id] = pipeline
+        self._enabled[pipeline_id] = True
+        logger.info(f"Registered pipeline: {pipeline_id}")
+
+    async def unregister(self, pipeline_id: str) -> bool:
+        """Unregister a pipeline."""
+        if pipeline_id in self._pipelines:
+            del self._pipelines[pipeline_id]
+            del self._enabled[pipeline_id]
+            return True
+        return False
+
+    async def set_enabled(self, pipeline_id: str, enabled: bool) -> bool:
+        """Enable or disable a pipeline."""
+        if pipeline_id in self._enabled:
+            self._enabled[pipeline_id] = enabled
+            return True
+        return False
+
+    async def list_pipelines(
+        self,
+        enabled_only: bool = True,
+    ) -> list[PipelineInfo]:
+        """List all registered pipelines."""
+        result = []
+        for pipeline_id, pipeline in self._pipelines.items():
+            is_enabled = self._enabled.get(pipeline_id, True)
+            if enabled_only and not is_enabled:
+                continue
+
+            metadata = pipeline.metadata
+            result.append(
+                PipelineInfo(
+                    id=pipeline_id,
+                    name=metadata["name"],
+                    description=metadata["description"],
+                    version=metadata["version"],
+                    is_enabled=is_enabled,
+                    stages=metadata["stages"],
+                    input_type=metadata["input_type"],
+                )
+            )
+        return result
+
+    async def get_pipeline(
+        self,
+        pipeline_id: str,
+        initialize: bool = True,
+    ) -> BasePipeline | None:
+        """Get a pipeline instance."""
+        if pipeline_id not in self._pipelines:
+            return None
+
+        if not self._enabled.get(pipeline_id, True):
+            return None
+
+        pipeline = self._pipelines[pipeline_id]
+
+        if initialize:
+            await pipeline.initialize()
+
+        return pipeline
+
+    async def close_all(self) -> None:
+        """Close all pipeline instances."""
+        for pipeline in self._pipelines.values():
+            try:
+                await pipeline.close()
+            except Exception as e:
+                logger.error(f"Error closing pipeline: {e}")
+
+        self._pipelines.clear()
+        self._enabled.clear()
--- a/packages/pipeline-core/src/pipeline_core/runner.py
+++ b/packages/pipeline-core/src/pipeline_core/runner.py
@@ -0,0 +1,467 @@
+"""
+Pipeline Runner - Executes pipelines and tracks execution history.
+
+The runner handles pipeline execution, tracking execution status in the database,
+and providing execution history for monitoring and debugging.
+"""
+
+from __future__ import annotations
+
+import logging
+import time
+import uuid
+from datetime import datetime
+from typing import TYPE_CHECKING, Any
+
+from pipeline_core.base import PipelineResult
+from pipeline_core.contracts import ExecutionRequest, ExecutionStatus
+
+if TYPE_CHECKING:
+    import asyncpg
+
+    from pipeline_core.base import BasePipeline
+    from pipeline_core.registry import PipelineRegistry
+
+logger = logging.getLogger(__name__)
+
+
+class PipelineRunner:
+    """
+    Executes pipelines and tracks execution history.
+
+    The runner:
+    - Gets pipeline instances from the registry
+    - Tracks execution status in the database
+    - Handles errors and updates status
+    - Provides execution history queries
+
+    Usage:
+        registry = PipelineRegistry(pool)
+        runner = PipelineRunner(pool, registry)
+
+        # Execute a pipeline
+        result = await runner.execute(
+            pipeline_id="reviewiq",
+            request=ExecutionRequest(
+                job_id="job-123",
+                stages=["normalize", "classify"],
+            )
+        )
+
+        # Get execution status
+        status = await runner.get_execution("exec-123")
+
+        # List executions
+        executions = await runner.list_executions(pipeline_id="reviewiq")
+    """
+
+    def __init__(
+        self,
+        pool: asyncpg.Pool,
+        registry: PipelineRegistry,
+    ):
+        """
+        Initialize the runner.
+
+        Args:
+            pool: asyncpg connection pool
+            registry: Pipeline registry for getting pipeline instances
+        """
+        self._pool = pool
+        self._registry = registry
+
+    async def execute(
+        self,
+        pipeline_id: str,
+        request: ExecutionRequest,
+    ) -> tuple[str, PipelineResult]:
+        """
+        Execute a pipeline.
+
+        Args:
+            pipeline_id: Pipeline identifier
+            request: Execution request with input data and options
+
+        Returns:
+            Tuple of (execution_id, PipelineResult)
+
+        Raises:
+            ValueError: If pipeline not found or disabled
+        """
+        # Get pipeline instance
+        pipeline = await self._registry.get_pipeline(pipeline_id)
+        if not pipeline:
+            raise ValueError(f"Pipeline not found or disabled: {pipeline_id}")
+
+        # Create execution record
+        execution_id = str(uuid.uuid4())
+        stages = request.get("stages") or pipeline.get_stage_names()
+
+        await self._create_execution(
+            execution_id=execution_id,
+            pipeline_id=pipeline_id,
+            job_id=request.get("job_id"),
+            business_id=request.get("business_id"),
+            stages_requested=stages,
+        )
+
+        # Update status to running
+        await self._update_execution_status(
+            execution_id=execution_id,
+            status="running",
+            started_at=datetime.utcnow(),
+        )
+
+        try:
+            # Prepare input data
+            input_data = request.get("input_data") or {}
+            if request.get("job_id"):
+                input_data["job_id"] = request["job_id"]
+            if request.get("business_id"):
+                input_data["business_id"] = request["business_id"]
+
+            # Validate input
+            validation_errors = await pipeline.validate_input(input_data)
+            if validation_errors:
+                error_msg = "; ".join(validation_errors)
+                await self._update_execution_status(
+                    execution_id=execution_id,
+                    status="failed",
+                    error_message=f"Validation failed: {error_msg}",
+                    completed_at=datetime.utcnow(),
+                )
+                return execution_id, PipelineResult.from_error(
+                    pipeline_id, f"Validation failed: {error_msg}"
+                )
+
+            # Execute pipeline
+            start_time = time.time()
+            result = await pipeline.process(input_data, stages=stages)
+            duration_ms = int((time.time() - start_time) * 1000)
+
+            # Update execution with result
+            if result.success:
+                await self._update_execution_status(
+                    execution_id=execution_id,
+                    status="completed",
+                    stages_completed=result.stages_run,
+                    result_summary=self._summarize_result(result),
+                    completed_at=datetime.utcnow(),
+                )
+            else:
+                await self._update_execution_status(
+                    execution_id=execution_id,
+                    status="failed",
+                    stages_completed=result.stages_run,
+                    error_message=result.error,
+                    completed_at=datetime.utcnow(),
+                )
+
+            logger.info(
+                f"Pipeline {pipeline_id} execution {execution_id} "
+                f"completed in {duration_ms}ms: success={result.success}"
+            )
+
+            return execution_id, result
+
+        except Exception as e:
+            logger.exception(f"Pipeline {pipeline_id} execution failed: {e}")
+
+            await self._update_execution_status(
+                execution_id=execution_id,
+                status="failed",
+                error_message=str(e),
+                completed_at=datetime.utcnow(),
+            )
+
+            return execution_id, PipelineResult.from_error(pipeline_id, str(e))
+
+    async def cancel(self, execution_id: str) -> bool:
+        """
+        Cancel a running execution.
+
+        Args:
+            execution_id: Execution identifier
+
+        Returns:
+            True if cancelled, False if not found or already completed
+        """
+        async with self._pool.acquire() as conn:
+            result = await conn.execute(
+                """
+                UPDATE pipeline.executions
+                SET status = 'cancelled', completed_at = NOW()
+                WHERE id = $1 AND status IN ('pending', 'running')
+                """,
+                uuid.UUID(execution_id),
+            )
+
+        cancelled = result.split()[-1] != "0"
+        if cancelled:
+            logger.info(f"Cancelled execution: {execution_id}")
+        return cancelled
+
+    async def get_execution(self, execution_id: str) -> ExecutionStatus | None:
+        """
+        Get execution status.
+
+        Args:
+            execution_id: Execution identifier
+
+        Returns:
+            ExecutionStatus or None if not found
+        """
+        async with self._pool.acquire() as conn:
+            row = await conn.fetchrow(
+                """
+                SELECT id, pipeline_id, job_id, business_id, status,
+                       stages_requested, stages_completed, current_stage,
+                       input_summary, result_summary, error_message,
+                       started_at, completed_at, created_at
+                FROM pipeline.executions
+                WHERE id = $1
+                """,
+                uuid.UUID(execution_id),
+            )
+
+        if not row:
+            return None
+
+        return self._row_to_execution_status(row)
+
+    async def list_executions(
+        self,
+        pipeline_id: str | None = None,
+        job_id: str | None = None,
+        business_id: str | None = None,
+        status: str | None = None,
+        limit: int = 50,
+        offset: int = 0,
+    ) -> list[ExecutionStatus]:
+        """
+        List execution history.
+
+        Args:
+            pipeline_id: Filter by pipeline
+            job_id: Filter by job
+            business_id: Filter by business
+            status: Filter by status
+            limit: Maximum results
+            offset: Result offset
+
+        Returns:
+            List of ExecutionStatus
+        """
+        conditions = []
+        params = []
+        param_idx = 1
+
+        if pipeline_id:
+            conditions.append(f"pipeline_id = ${param_idx}")
+            params.append(pipeline_id)
+            param_idx += 1
+
+        if job_id:
+            conditions.append(f"job_id = ${param_idx}")
+            params.append(uuid.UUID(job_id))
+            param_idx += 1
+
+        if business_id:
+            conditions.append(f"business_id = ${param_idx}")
+            params.append(business_id)
+            param_idx += 1
+
+        if status:
+            conditions.append(f"status = ${param_idx}")
+            params.append(status)
+            param_idx += 1
+
+        where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""
+
+        query = f"""
+            SELECT id, pipeline_id, job_id, business_id, status,
+                   stages_requested, stages_completed, current_stage,
+                   input_summary, result_summary, error_message,
+                   started_at, completed_at, created_at
+            FROM pipeline.executions
+            {where_clause}
+            ORDER BY created_at DESC
+            LIMIT ${param_idx} OFFSET ${param_idx + 1}
+        """
+        params.extend([limit, offset])
+
+        async with self._pool.acquire() as conn:
+            rows = await conn.fetch(query, *params)
+
+        return [self._row_to_execution_status(row) for row in rows]
+
+    async def get_execution_count(
+        self,
+        pipeline_id: str | None = None,
+        status: str | None = None,
+    ) -> int:
+        """
+        Get execution count.
+
+        Args:
+            pipeline_id: Filter by pipeline
+            status: Filter by status
+
+        Returns:
+            Count of executions matching filters
+        """
+        conditions = []
+        params = []
+        param_idx = 1
+
+        if pipeline_id:
+            conditions.append(f"pipeline_id = ${param_idx}")
+            params.append(pipeline_id)
+            param_idx += 1
+
+        if status:
+            conditions.append(f"status = ${param_idx}")
+            params.append(status)
+            param_idx += 1
+
+        where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""
+
+        async with self._pool.acquire() as conn:
+            result = await conn.fetchval(
+                f"SELECT COUNT(*) FROM pipeline.executions {where_clause}",
+                *params,
+            )
+
+        return result or 0
+
+    async def _create_execution(
+        self,
+        execution_id: str,
+        pipeline_id: str,
+        job_id: str | None,
+        business_id: str | None,
+        stages_requested: list[str],
+    ) -> None:
+        """Create an execution record."""
+        async with self._pool.acquire() as conn:
+            await conn.execute(
+                """
+                INSERT INTO pipeline.executions (
+                    id, pipeline_id, job_id, business_id,
+                    status, stages_requested, created_at
+                )
+                VALUES ($1, $2, $3, $4, 'pending', $5, NOW())
+                """,
+                uuid.UUID(execution_id),
+                pipeline_id,
+                uuid.UUID(job_id) if job_id else None,
+                business_id,
+                stages_requested,
+            )
+
+    async def _update_execution_status(
+        self,
+        execution_id: str,
+        status: str,
+        current_stage: str | None = None,
+        stages_completed: list[str] | None = None,
+        input_summary: dict[str, Any] | None = None,
+        result_summary: dict[str, Any] | None = None,
+        error_message: str | None = None,
+        started_at: datetime | None = None,
+        completed_at: datetime | None = None,
+    ) -> None:
+        """Update execution status."""
+        updates = ["status = $2"]
+        params: list[Any] = [uuid.UUID(execution_id), status]
+        param_idx = 3
+
+        if current_stage is not None:
+            updates.append(f"current_stage = ${param_idx}")
+            params.append(current_stage)
+            param_idx += 1
+
+        if stages_completed is not None:
+            updates.append(f"stages_completed = ${param_idx}")
+            params.append(stages_completed)
+            param_idx += 1
+
+        if input_summary is not None:
+            updates.append(f"input_summary = ${param_idx}")
+            params.append(input_summary)
+            param_idx += 1
+
+        if result_summary is not None:
+            updates.append(f"result_summary = ${param_idx}")
+            params.append(result_summary)
+            param_idx += 1
+
+        if error_message is not None:
+            updates.append(f"error_message = ${param_idx}")
+            params.append(error_message)
+            param_idx += 1
+
+        if started_at is not None:
+            updates.append(f"started_at = ${param_idx}")
+            params.append(started_at)
+            param_idx += 1
+
+        if completed_at is not None:
+            updates.append(f"completed_at = ${param_idx}")
+            params.append(completed_at)
+            param_idx += 1
+
+        query = f"""
+            UPDATE pipeline.executions
+            SET {", ".join(updates)}
+            WHERE id = $1
+        """
+
+        async with self._pool.acquire() as conn:
+            await conn.execute(query, *params)
+
+    def _row_to_execution_status(self, row: Any) -> ExecutionStatus:
+        """Convert database row to ExecutionStatus."""
+        # Calculate progress
+        stages_requested = row["stages_requested"] or []
+        stages_completed = row["stages_completed"] or []
+        progress = (
+            len(stages_completed) / len(stages_requested)
+            if stages_requested
+            else 0.0
+        )
+
+        return ExecutionStatus(
+            id=str(row["id"]),
+            pipeline_id=row["pipeline_id"],
+            job_id=str(row["job_id"]) if row["job_id"] else None,
+            business_id=row["business_id"],
+            status=row["status"],
+            stages_requested=stages_requested,
+            stages_completed=stages_completed,
+            current_stage=row["current_stage"],
+            progress=progress,
+            input_summary=row["input_summary"],
+            result_summary=row["result_summary"],
+            error_message=row["error_message"],
+            started_at=row["started_at"].isoformat() if row["started_at"] else None,
+            completed_at=row["completed_at"].isoformat() if row["completed_at"] else None,
+            created_at=row["created_at"].isoformat() if row["created_at"] else None,
+        )
+
+    def _summarize_result(self, result: PipelineResult) -> dict[str, Any]:
+        """Create a summary of the pipeline result for storage."""
+        summary: dict[str, Any] = {
+            "success": result.success,
+            "stages_run": result.stages_run,
+        }
+
+        # Add stage-specific summaries
+        for stage, stage_result in result.stage_results.items():
+            if stage_result.get("data"):
+                # Extract stats if available
+                data = stage_result["data"]
+                if "stats" in data:
+                    summary[f"{stage}_stats"] = data["stats"]
+
+        return summary