feat: Add extensible multi-pipeline integration system

This commit implements a plugin-like pipeline architecture with:

Pipeline Core Package (packages/pipeline-core/):
- BasePipeline abstract class all pipelines implement
- PipelineRegistry for database-backed discovery/management
- PipelineRunner for execution with status tracking
- DashboardConfig contracts for dynamic widget definitions

Database Migration (006_pipeline_registry.sql):
- pipeline.registry table for registered pipelines
- pipeline.executions table for execution history
- Views for execution stats and monitoring

ReviewIQ Pipeline Refactor:
- Implements BasePipeline interface
- Adds get_dashboard_config() with widget definitions
- Adds get_widget_data() methods for all dashboard widgets
- Maintains backward compatibility with Pipeline alias

Generic Pipeline API (api/routes/pipelines.py):
- GET /api/pipelines - List all registered pipelines
- GET /api/pipelines/{id} - Pipeline details
- POST /api/pipelines/{id}/execute - Execute pipeline
- GET /api/pipelines/{id}/dashboard - Dashboard config
- GET /api/pipelines/{id}/widgets/{w} - Widget data
- GET /api/pipelines/{id}/executions - Execution history

Frontend Dynamic Dashboard System:
- DynamicDashboard component renders from config
- WidgetRegistry maps types to components
- Widget components: StatCard, LineChart, BarChart,
  PieChart, DataTable, Heatmap
- Pipeline API client library

Frontend Pipeline Pages:
- /pipelines - List all registered pipelines
- /pipelines/[id] - Dynamic dashboard for pipeline
- /pipelines/[id]/executions - Execution history
- Pipelines nav item in Sidebar

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
This commit is contained in:
Alejandro Gutiérrez
2026-01-24 19:05:38 +00:00
parent d64f06ba9e
commit 824634aa76
30 changed files with 5697 additions and 95 deletions


@@ -0,0 +1,119 @@
# Pipeline Core

Extensible multi-pipeline framework with dynamic dashboards.

## Overview

Pipeline Core provides the base abstractions for building pipelines that can be:

- Discovered and registered dynamically
- Executed with status tracking
- Rendered with auto-generated dashboards

## Features

- **BasePipeline** - Abstract base class all pipelines implement
- **PipelineRegistry** - Database-backed pipeline discovery and management
- **PipelineRunner** - Execution with status tracking
- **Dashboard Contracts** - TypedDicts for widget configuration

## Installation

```bash
pip install -e packages/pipeline-core
```

## Usage

### Implementing a Pipeline

```python
from pipeline_core import BasePipeline, PipelineMetadata, DashboardConfig


class MyPipeline(BasePipeline):
    @property
    def metadata(self) -> PipelineMetadata:
        return {
            "id": "my-pipeline",
            "name": "My Pipeline",
            "description": "Does something useful",
            "version": "1.0.0",
            "stages": ["stage1", "stage2"],
            "input_type": "MyInputType",
        }

    async def initialize(self) -> None:
        # Set up connections
        pass

    async def close(self) -> None:
        # Clean up
        pass

    async def process(self, input_data, stages=None):
        # Run the pipeline
        pass

    def get_dashboard_config(self) -> DashboardConfig:
        return {
            "pipeline_id": "my-pipeline",
            "title": "My Dashboard",
            "sections": [...]
        }

    async def get_widget_data(self, widget_id, params):
        # Return widget data
        pass
```
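
One way to fill in `get_widget_data` is to dispatch on `widget_id`. The sketch below is an assumption rather than part of the package: the class name `WidgetRouter`, the handler names, the widget IDs, and the returned shapes are all hypothetical, and the class is a plain stand-in (not a `BasePipeline` subclass) so the block runs on its own.

```python
import asyncio
from typing import Any, Awaitable, Callable

WidgetHandler = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


class WidgetRouter:
    """Hypothetical helper: routes widget IDs to async handler methods."""

    def __init__(self) -> None:
        # Map each widget ID declared in the dashboard config to its handler.
        self._handlers: dict[str, WidgetHandler] = {
            "total_items": self._total_items,
            "items_over_time": self._items_over_time,
        }

    async def get_widget_data(
        self, widget_id: str, params: dict[str, Any]
    ) -> dict[str, Any]:
        handler = self._handlers.get(widget_id)
        if handler is None:
            raise KeyError(f"Unknown widget: {widget_id}")
        return await handler(params)

    async def _total_items(self, params: dict[str, Any]) -> dict[str, Any]:
        # A real pipeline would query its data store here; values are made up.
        return {"value": 1250, "trend": 0.05}

    async def _items_over_time(self, params: dict[str, Any]) -> dict[str, Any]:
        return {"points": [{"date": "2026-01-01", "count": 40}]}


data = asyncio.run(
    WidgetRouter().get_widget_data("total_items", {"time_range": "7d"})
)
```

Keeping the mapping in one dictionary makes it easy to check that every widget ID in the dashboard config has a matching handler.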

### Registering a Pipeline

```python
import asyncpg
from pipeline_core import PipelineRegistry

# Run inside an async function; database_url is your PostgreSQL connection string
pool = await asyncpg.create_pool(database_url)
registry = PipelineRegistry(pool)

await registry.register(
    pipeline_id="my-pipeline",
    name="My Pipeline",
    description="Does something useful",
    version="1.0.0",
    module_path="my_package.pipeline:MyPipeline",
    stages=["stage1", "stage2"],
    input_type="MyInputType",
)
```
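
The `module_path` argument uses a `package.module:ClassName` format. A minimal sketch of how such a path can be resolved with `importlib` follows; `load_class` is a hypothetical helper name, and a standard-library class stands in for a pipeline class so the block runs without the package installed.

```python
import importlib


def load_class(module_path: str) -> type:
    """Resolve 'package.module:ClassName' to the class object."""
    if ":" not in module_path:
        raise ValueError(
            f"Invalid module_path: {module_path}. "
            "Expected format: 'package.module:ClassName'"
        )
    # Split on the last colon: module on the left, class name on the right.
    module_name, class_name = module_path.rsplit(":", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A stdlib class stands in for "my_package.pipeline:MyPipeline".
cls = load_class("collections:OrderedDict")
instance = cls()
```

Validating the format before storing the path means a malformed entry fails at registration time instead of at first use.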

### Executing a Pipeline

```python
from pipeline_core import PipelineRunner

runner = PipelineRunner(pool, registry)

execution_id, result = await runner.execute(
    pipeline_id="my-pipeline",
    request={
        "input_data": {"key": "value"},
        "stages": ["stage1"],
    },
)
```
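
A request may name a subset of stages; when it does not, all stages run. The sketch below illustrates that stage-selection idea in isolation. `run_stages`, the two stage functions, and the result dictionary layout are assumptions for illustration and are not the package's runner.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional

StageFn = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


async def stage1(data: dict[str, Any]) -> dict[str, Any]:
    return {**data, "stage1": "done"}


async def stage2(data: dict[str, Any]) -> dict[str, Any]:
    return {**data, "stage2": "done"}


# Declared order of all stages (hypothetical pipeline).
ALL_STAGES: dict[str, StageFn] = {"stage1": stage1, "stage2": stage2}


async def run_stages(
    input_data: dict[str, Any], stages: Optional[list[str]] = None
) -> dict[str, Any]:
    """Run the requested stages in declared order; default to all stages."""
    requested = stages or list(ALL_STAGES)
    data = dict(input_data)  # copy so the caller's dict is not mutated
    stages_run: list[str] = []
    stage_results: dict[str, dict[str, Any]] = {}
    for name, fn in ALL_STAGES.items():
        if name not in requested:
            continue
        started = time.perf_counter()
        try:
            data = await fn(data)
        except Exception as exc:
            # Stop at the first failing stage and report what ran so far.
            stage_results[name] = {"stage": name, "success": False, "error": str(exc)}
            return {
                "success": False,
                "error": str(exc),
                "stages_run": stages_run,
                "stage_results": stage_results,
            }
        duration_ms = int((time.perf_counter() - started) * 1000)
        stage_results[name] = {
            "stage": name,
            "success": True,
            "data": data,
            "duration_ms": duration_ms,
        }
        stages_run.append(name)
    return {
        "success": True,
        "error": None,
        "stages_run": stages_run,
        "stage_results": stage_results,
    }


result = asyncio.run(run_stages({"key": "value"}, stages=["stage1"]))
```

Only `stage1` runs here, so `result["stages_run"]` lists a single stage and `stage2` has no entry in `stage_results`.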

## Dashboard Widgets

Pipelines declare dashboard widgets via `get_dashboard_config()`. Available widget types:

- `stat_card` - KPI stat card with value and trend
- `line_chart` - Time series line chart
- `bar_chart` - Bar chart (horizontal or vertical)
- `pie_chart` - Pie/donut chart
- `table` - Data table with columns
- `heatmap` - Heatmap grid visualization
- `area_chart` - Stacked area chart
- `gauge` - Gauge/meter visualization
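
As an illustration, a config with one section and two widgets could look like the sketch below. It uses plain dictionaries shaped like the `DashboardConfig` contract so it runs without the package installed; the widget IDs, titles, data keys, and the `render_stat` helper are hypothetical. The `format` and `trend_format` values follow the format-string style given in the contract comments (for example `"{value:,}"`), applied here with `str.format`.

```python
from typing import Any

dashboard_config: dict[str, Any] = {
    "pipeline_id": "my-pipeline",
    "title": "My Dashboard",
    "description": None,
    "default_time_range": "7d",
    "refresh_interval": None,
    "sections": [
        {
            "id": "overview",
            "title": "Overview",
            "description": None,
            "collapsed": False,
            "widgets": [
                {
                    "id": "total_items",
                    "type": "stat_card",
                    "title": "Total Items",
                    "grid": {"x": 0, "y": 0, "w": 3, "h": 2},
                    "config": {
                        "value_key": "total",
                        "label": "Items processed",
                        "format": "{value:,}",
                        "trend_key": "change",
                        "trend_format": "+{value:.1%}",
                    },
                },
                {
                    "id": "items_over_time",
                    "type": "line_chart",
                    "title": "Items Over Time",
                    "grid": {"x": 3, "y": 0, "w": 9, "h": 4},
                    "config": {
                        "x_axis": {"key": "date", "label": "Date", "type": "time"},
                        "y_axis": {"key": "count", "label": "Items", "type": "number"},
                        "series": [{"key": "count", "name": "Items"}],
                        "show_legend": False,
                    },
                },
            ],
        }
    ],
}


def render_stat(config: dict[str, Any], data: dict[str, Any]) -> str:
    """Format the main value of a stat card using its format string."""
    return config["format"].format(value=data[config["value_key"]])


stat_config = dashboard_config["sections"][0]["widgets"][0]["config"]
widget_data = {"total": 1234567, "change": 0.031}
text = render_stat(stat_config, widget_data)
trend = stat_config["trend_format"].format(value=widget_data[stat_config["trend_key"]])
```

With the sample data above, `text` is `"1,234,567"` and `trend` is `"+3.1%"`.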

## License

MIT


@@ -0,0 +1,65 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "pipeline-core"
version = "0.1.0"
description = "Pipeline Core - Extensible multi-pipeline framework with dynamic dashboards"
readme = "README.md"
license = "MIT"
requires-python = ">=3.11"
authors = [
    { name = "ReviewIQ Team" }
]
keywords = ["pipeline", "framework", "dashboard", "registry"]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]
dependencies = [
    "asyncpg>=0.28.0",
    "pydantic>=2.0",
    "pydantic-settings>=2.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-asyncio>=0.21.0",
    "pytest-cov>=4.0",
    "ruff>=0.1.0",
    "mypy>=1.0",
]

[project.urls]
Homepage = "https://github.com/reviewiq/pipeline-core"
Documentation = "https://github.com/reviewiq/pipeline-core#readme"
Repository = "https://github.com/reviewiq/pipeline-core"

[tool.hatch.build.targets.wheel]
packages = ["src/pipeline_core"]

[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
pythonpath = ["src"]

[tool.ruff]
line-length = 100
target-version = "py311"

[tool.ruff.lint]
select = ["E", "F", "I", "N", "W", "UP"]
ignore = ["E501"]

[tool.mypy]
python_version = "3.11"
strict = true
warn_return_any = true
warn_unused_ignores = true


@@ -0,0 +1,34 @@
"""
Pipeline Core - Extensible multi-pipeline framework with dynamic dashboards.

This package provides the base abstractions for building pipelines that can be
discovered, registered, and rendered with dynamic dashboards.
"""

from pipeline_core.base import BasePipeline, PipelineMetadata, PipelineResult
from pipeline_core.contracts import (
    DashboardConfig,
    DashboardSection,
    WidgetConfig,
    WidgetType,
)
from pipeline_core.registry import PipelineRegistry
from pipeline_core.runner import PipelineRunner

__version__ = "0.1.0"

__all__ = [
    # Base classes
    "BasePipeline",
    "PipelineMetadata",
    "PipelineResult",
    # Contracts
    "DashboardConfig",
    "DashboardSection",
    "WidgetConfig",
    "WidgetType",
    # Registry
    "PipelineRegistry",
    # Runner
    "PipelineRunner",
]


@@ -0,0 +1,263 @@
"""
Base Pipeline abstract class and related types.

All pipelines must implement this interface to be compatible with the
pipeline registry, runner, and dynamic dashboard system.
"""

from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, TypedDict

from pipeline_core.contracts import DashboardConfig


class PipelineMetadata(TypedDict):
    """Metadata describing a pipeline."""

    id: str  # Unique pipeline identifier (e.g., "reviewiq")
    name: str  # Display name (e.g., "ReviewIQ Classification Pipeline")
    description: str  # Human-readable description
    version: str  # Semantic version (e.g., "1.0.0")
    stages: list[str]  # Ordered list of stage names
    input_type: str  # Expected input type (e.g., "ScraperV1Output")


class StageResult(TypedDict, total=False):
    """Result from running a single pipeline stage."""

    stage: str  # Stage name
    success: bool  # Whether the stage succeeded
    data: dict[str, Any]  # Stage output data
    error: str | None  # Error message if failed
    duration_ms: int  # Stage execution time


class PipelineResult:
    """Result from running a pipeline."""

    def __init__(
        self,
        pipeline_id: str,
        stages_run: list[str] | None = None,
        stage_results: dict[str, StageResult] | None = None,
        success: bool = True,
        error: str | None = None,
    ):
        """
        Initialize pipeline result.

        Args:
            pipeline_id: Pipeline identifier
            stages_run: List of stages that were run
            stage_results: Results from each stage
            success: Overall success status
            error: Error message if failed
        """
        self.pipeline_id = pipeline_id
        self.stages_run = stages_run or []
        self.stage_results = stage_results or {}
        self.success = success
        self.error = error

    def get_stage_result(self, stage: str) -> StageResult | None:
        """Get result for a specific stage."""
        return self.stage_results.get(stage)

    def to_dict(self) -> dict[str, Any]:
        """Convert to dictionary."""
        return {
            "pipeline_id": self.pipeline_id,
            "stages_run": self.stages_run,
            "stage_results": self.stage_results,
            "success": self.success,
            "error": self.error,
        }

    @classmethod
    def from_error(cls, pipeline_id: str, error: str) -> PipelineResult:
        """Create a failed result from an error."""
        return cls(
            pipeline_id=pipeline_id,
            success=False,
            error=error,
        )


class BasePipeline(ABC):
    """
    Abstract base class for all pipelines.

    All pipelines must implement this interface to be compatible with:
    - Pipeline registry (discovery and management)
    - Pipeline runner (execution)
    - Dynamic dashboard system (widget configuration and data)

    Example implementation:

        class ReviewIQPipeline(BasePipeline):
            @property
            def metadata(self) -> PipelineMetadata:
                return {
                    "id": "reviewiq",
                    "name": "ReviewIQ Classification Pipeline",
                    "description": "Classifies reviews using URT taxonomy",
                    "version": "1.0.0",
                    "stages": ["normalize", "classify", "route", "aggregate"],
                    "input_type": "ScraperV1Output",
                }

            async def initialize(self) -> None:
                # Set up database connections, etc.
                pass

            async def close(self) -> None:
                # Release database connections, etc.
                pass

            async def process(self, input_data, stages=None) -> PipelineResult:
                # Run the pipeline
                pass

            def get_dashboard_config(self) -> DashboardConfig:
                return {
                    "pipeline_id": "reviewiq",
                    "title": "ReviewIQ Analytics",
                    "sections": [...]
                }

            async def get_widget_data(self, widget_id, params) -> dict:
                # Return data for a specific widget
                pass
    """

    @property
    @abstractmethod
    def metadata(self) -> PipelineMetadata:
        """
        Get pipeline metadata.

        Returns:
            PipelineMetadata with id, name, description, version, stages, input_type
        """
        ...

    @abstractmethod
    async def initialize(self) -> None:
        """
        Initialize the pipeline.

        This is called before any processing. Use it to:
        - Establish database connections
        - Load configuration
        - Initialize services

        This method may be called multiple times, so it must be idempotent.
        """
        ...

    @abstractmethod
    async def close(self) -> None:
        """
        Close and clean up pipeline resources.

        This is called when the pipeline is no longer needed. Use it to:
        - Close database connections
        - Release resources
        - Clean up temporary files
        """
        ...

    @abstractmethod
    async def process(
        self,
        input_data: dict[str, Any],
        stages: list[str] | None = None,
    ) -> PipelineResult:
        """
        Process input data through the pipeline.

        Args:
            input_data: Input data dictionary (format depends on input_type)
            stages: List of stages to run (default: all stages)

        Returns:
            PipelineResult with per-stage results and the overall success status
        """
        ...

    @abstractmethod
    def get_dashboard_config(self) -> DashboardConfig:
        """
        Get the dashboard configuration for this pipeline.

        Returns:
            DashboardConfig with sections and widget definitions

        The frontend uses this configuration to dynamically render
        the pipeline's dashboard with appropriate widgets.
        """
        ...

    @abstractmethod
    async def get_widget_data(
        self,
        widget_id: str,
        params: dict[str, Any],
    ) -> dict[str, Any]:
        """
        Get data for a specific dashboard widget.

        Args:
            widget_id: Widget identifier (from dashboard config)
            params: Query parameters (e.g., time range, filters)

        Returns:
            Dictionary with widget data in the format expected by the widget type

        Common params:
            - business_id: Filter by business
            - time_range: Time range (e.g., "7d", "30d", "custom")
            - start_date: Start date for custom range
            - end_date: End date for custom range
        """
        ...

    # Optional methods with default implementations

    async def validate_input(self, input_data: dict[str, Any]) -> list[str]:
        """
        Validate input data before processing.

        Args:
            input_data: Input data to validate

        Returns:
            List of validation error messages (empty if valid)

        Override this to add custom input validation.
        """
        return []

    async def health_check(self) -> dict[str, Any]:
        """
        Perform a health check on the pipeline.

        Returns:
            Dictionary with health status:
            - healthy: bool
            - checks: dict of individual check results
            - message: optional message

        Override this to add custom health checks.
        """
        return {
            "healthy": True,
            "checks": {},
            "message": None,
        }

    def get_stage_names(self) -> list[str]:
        """Get the list of stage names."""
        return self.metadata["stages"]

    def get_pipeline_id(self) -> str:
        """Get the pipeline identifier."""
        return self.metadata["id"]


@@ -0,0 +1,252 @@
"""
Dashboard and Widget contracts for the pipeline system.

These TypedDicts define the data structures for dynamic dashboard configuration,
allowing pipelines to declare their dashboard widgets which the frontend renders.
"""

from __future__ import annotations

from typing import Any, Literal, TypedDict

# =============================================================================
# Widget Types
# =============================================================================

WidgetType = Literal[
    "stat_card",  # KPI stat card with value and optional trend
    "line_chart",  # Time series line chart
    "bar_chart",  # Bar chart (horizontal or vertical)
    "pie_chart",  # Pie/donut chart
    "table",  # Data table with columns
    "heatmap",  # Heatmap grid visualization
    "area_chart",  # Stacked area chart
    "gauge",  # Gauge/meter visualization
]

# =============================================================================
# Widget Configuration
# =============================================================================


class GridPosition(TypedDict):
    """Grid position for a widget in the dashboard layout."""

    x: int  # Column position (0-based)
    y: int  # Row position (0-based)
    w: int  # Width in grid units
    h: int  # Height in grid units


class StatCardConfig(TypedDict, total=False):
    """Configuration specific to stat card widgets."""

    value_key: str  # Key in data for the main value
    label: str  # Label to display
    format: str  # Format string (e.g., "{value:,}", "{value:.1%}")
    trend_key: str | None  # Key for trend value (optional)
    trend_format: str | None  # Format for trend (e.g., "+{value:.1%}")
    icon: str | None  # Icon name (optional)
    color: str | None  # Color theme (e.g., "blue", "green", "red")


class ChartAxisConfig(TypedDict, total=False):
    """Configuration for chart axes."""

    key: str  # Data key for this axis
    label: str  # Axis label
    type: Literal["number", "category", "time"]
    format: str | None  # Format string


class ChartSeriesConfig(TypedDict, total=False):
    """Configuration for a chart data series."""

    key: str  # Data key
    name: str  # Display name
    color: str | None  # Series color
    type: Literal["line", "bar", "area"] | None


class ChartConfig(TypedDict, total=False):
    """Configuration for chart widgets (line, bar, area)."""

    x_axis: ChartAxisConfig
    y_axis: ChartAxisConfig
    series: list[ChartSeriesConfig]
    stacked: bool
    show_legend: bool
    show_grid: bool


class PieChartConfig(TypedDict, total=False):
    """Configuration for pie/donut chart widgets."""

    value_key: str  # Key for segment value
    label_key: str  # Key for segment label
    colors: list[str] | None  # Custom color palette
    show_legend: bool
    show_labels: bool
    inner_radius: int | None  # For donut chart (0 = pie)


class TableColumnConfig(TypedDict, total=False):
    """Configuration for a table column."""

    key: str  # Data key
    header: str  # Column header
    width: int | None  # Column width
    align: Literal["left", "center", "right"]
    format: str | None  # Format string
    sortable: bool


class TableConfig(TypedDict, total=False):
    """Configuration for table widgets."""

    columns: list[TableColumnConfig]
    row_key: str  # Key for unique row identifier
    page_size: int
    show_pagination: bool
    sortable: bool
    filterable: bool


class HeatmapConfig(TypedDict, total=False):
    """Configuration for heatmap widgets."""

    x_key: str  # Key for x-axis categories
    y_key: str  # Key for y-axis categories
    value_key: str  # Key for cell values
    color_scale: list[str]  # Color gradient
    show_values: bool
    format: str | None  # Format for values


class GaugeConfig(TypedDict, total=False):
    """Configuration for gauge widgets."""

    value_key: str  # Key for gauge value
    min: float  # Minimum value
    max: float  # Maximum value
    thresholds: list[dict[str, Any]]  # Color thresholds
    format: str | None  # Format string


# Union of all widget-specific configs
WidgetSpecificConfig = (
    StatCardConfig
    | ChartConfig
    | PieChartConfig
    | TableConfig
    | HeatmapConfig
    | GaugeConfig
)


class WidgetConfig(TypedDict, total=False):
    """Configuration for a dashboard widget."""

    id: str  # Unique widget identifier
    type: WidgetType  # Widget type
    title: str  # Widget title
    grid: GridPosition  # Grid position and size
    config: dict[str, Any]  # Widget-specific configuration
    data_endpoint: str | None  # Custom data endpoint (if not default)
    refresh_interval: int | None  # Auto-refresh interval in seconds


# =============================================================================
# Dashboard Configuration
# =============================================================================


class DashboardSection(TypedDict):
    """A section in the dashboard containing widgets."""

    id: str  # Unique section identifier
    title: str  # Section title
    description: str | None  # Optional description
    widgets: list[WidgetConfig]
    collapsed: bool | None  # Whether section is collapsed by default


class DashboardConfig(TypedDict):
    """Full dashboard configuration for a pipeline."""

    pipeline_id: str  # Pipeline identifier
    title: str  # Dashboard title
    description: str | None  # Optional description
    sections: list[DashboardSection]
    default_time_range: str | None  # Default time range (e.g., "7d", "30d")
    refresh_interval: int | None  # Global refresh interval in seconds


# =============================================================================
# Execution Types
# =============================================================================


class ExecutionStatus(TypedDict, total=False):
    """Status of a pipeline execution."""

    id: str  # Execution ID
    pipeline_id: str  # Pipeline identifier
    job_id: str | None  # Associated job ID
    business_id: str | None  # Business identifier
    status: Literal["pending", "running", "completed", "failed", "cancelled"]
    stages_requested: list[str]
    stages_completed: list[str]
    current_stage: str | None
    progress: float  # 0.0 to 1.0
    input_summary: dict[str, Any] | None
    result_summary: dict[str, Any] | None
    error_message: str | None
    started_at: str | None
    completed_at: str | None
    created_at: str


class ExecutionRequest(TypedDict, total=False):
    """Request to execute a pipeline."""

    job_id: str | None  # Job ID to process
    business_id: str | None  # Business identifier
    input_data: dict[str, Any] | None  # Direct input data
    stages: list[str] | None  # Stages to run (default: all)
    options: dict[str, Any] | None  # Pipeline-specific options


# =============================================================================
# Pipeline Info Types
# =============================================================================


class PipelineInfo(TypedDict):
    """Summary information about a pipeline."""

    id: str  # Pipeline ID (e.g., "reviewiq")
    name: str  # Display name
    description: str
    version: str
    is_enabled: bool
    stages: list[str]  # Available stages
    input_type: str  # Expected input type


class PipelineDetail(TypedDict):
    """Detailed pipeline information including metadata."""

    id: str
    name: str
    description: str
    version: str
    is_enabled: bool
    stages: list[str]
    input_type: str
    module_path: str
    config: dict[str, Any] | None
    created_at: str
    updated_at: str


@@ -0,0 +1,455 @@
"""
Pipeline Registry - Database-backed discovery and management of pipelines.

The registry maintains a list of registered pipelines and their metadata,
allowing the system to discover available pipelines and instantiate them.
"""

from __future__ import annotations

import importlib
import logging
from typing import TYPE_CHECKING, Any

from pipeline_core.contracts import PipelineDetail, PipelineInfo

if TYPE_CHECKING:
    import asyncpg

    from pipeline_core.base import BasePipeline

logger = logging.getLogger(__name__)


class PipelineRegistry:
    """
    Database-backed registry for pipeline discovery and management.

    The registry stores pipeline metadata in a PostgreSQL table and provides
    methods to register, list, and instantiate pipelines.

    Usage:
        pool = await asyncpg.create_pool(database_url)
        registry = PipelineRegistry(pool)

        # Register a pipeline
        await registry.register(
            pipeline_id="reviewiq",
            name="ReviewIQ Pipeline",
            description="Classifies reviews",
            version="1.0.0",
            module_path="reviewiq_pipeline.pipeline:ReviewIQPipeline",
            stages=["normalize", "classify", "route", "aggregate"],
            input_type="ScraperV1Output",
        )

        # List pipelines
        pipelines = await registry.list_pipelines()

        # Get a pipeline instance
        pipeline = await registry.get_pipeline("reviewiq")
    """

    def __init__(self, pool: asyncpg.Pool):
        """
        Initialize the registry.

        Args:
            pool: asyncpg connection pool
        """
        self._pool = pool
        self._instances: dict[str, BasePipeline] = {}

    async def register(
        self,
        pipeline_id: str,
        name: str,
        description: str,
        version: str,
        module_path: str,
        stages: list[str],
        input_type: str,
        config: dict[str, Any] | None = None,
        is_enabled: bool = True,
    ) -> None:
        """
        Register a pipeline in the database.

        Args:
            pipeline_id: Unique pipeline identifier
            name: Display name
            description: Human-readable description
            version: Semantic version
            module_path: Python module path (e.g., "package.module:ClassName")
            stages: List of stage names
            input_type: Expected input type
            config: Optional pipeline configuration
            is_enabled: Whether the pipeline is enabled

        Raises:
            ValueError: If module_path is invalid
        """
        # Validate module path format
        if ":" not in module_path:
            raise ValueError(
                f"Invalid module_path: {module_path}. "
                "Expected format: 'package.module:ClassName'"
            )

        async with self._pool.acquire() as conn:
            await conn.execute(
                """
                INSERT INTO pipeline.registry (
                    pipeline_id, name, description, version, module_path,
                    stages, input_type, config, is_enabled, updated_at
                )
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, NOW())
                ON CONFLICT (pipeline_id) DO UPDATE SET
                    name = EXCLUDED.name,
                    description = EXCLUDED.description,
                    version = EXCLUDED.version,
                    module_path = EXCLUDED.module_path,
                    stages = EXCLUDED.stages,
                    input_type = EXCLUDED.input_type,
                    config = EXCLUDED.config,
                    is_enabled = EXCLUDED.is_enabled,
                    updated_at = NOW()
                """,
                pipeline_id,
                name,
                description,
                version,
                module_path,
                stages,
                input_type,
                config,
                is_enabled,
            )

        logger.info(f"Registered pipeline: {pipeline_id} v{version}")

    async def register_from_instance(
        self,
        pipeline: BasePipeline,
        module_path: str,
        config: dict[str, Any] | None = None,
    ) -> None:
        """
        Register a pipeline from an instance.

        Args:
            pipeline: Pipeline instance
            module_path: Python module path
            config: Optional configuration
        """
        metadata = pipeline.metadata
        await self.register(
            pipeline_id=metadata["id"],
            name=metadata["name"],
            description=metadata["description"],
            version=metadata["version"],
            module_path=module_path,
            stages=metadata["stages"],
            input_type=metadata["input_type"],
            config=config,
        )

        # Cache the instance
        self._instances[metadata["id"]] = pipeline

    async def unregister(self, pipeline_id: str) -> bool:
        """
        Unregister a pipeline from the database.

        Args:
            pipeline_id: Pipeline identifier to remove

        Returns:
            True if pipeline was removed, False if not found
        """
        async with self._pool.acquire() as conn:
            result = await conn.execute(
                "DELETE FROM pipeline.registry WHERE pipeline_id = $1",
                pipeline_id,
            )

        # Remove from cache
        self._instances.pop(pipeline_id, None)

        deleted = result.split()[-1] != "0"
        if deleted:
            logger.info(f"Unregistered pipeline: {pipeline_id}")
        return deleted

    async def set_enabled(self, pipeline_id: str, enabled: bool) -> bool:
        """
        Enable or disable a pipeline.

        Args:
            pipeline_id: Pipeline identifier
            enabled: Whether to enable or disable

        Returns:
            True if pipeline was updated, False if not found
        """
        async with self._pool.acquire() as conn:
            result = await conn.execute(
                """
                UPDATE pipeline.registry
                SET is_enabled = $2, updated_at = NOW()
                WHERE pipeline_id = $1
                """,
                pipeline_id,
                enabled,
            )

        updated = result.split()[-1] != "0"
        if updated:
            logger.info(f"Set pipeline {pipeline_id} enabled={enabled}")
        return updated

    async def list_pipelines(
        self,
        enabled_only: bool = True,
    ) -> list[PipelineInfo]:
        """
        List all registered pipelines.

        Args:
            enabled_only: Only return enabled pipelines

        Returns:
            List of PipelineInfo dictionaries
        """
        async with self._pool.acquire() as conn:
            if enabled_only:
                rows = await conn.fetch(
                    """
                    SELECT pipeline_id, name, description, version,
                           is_enabled, stages, input_type
                    FROM pipeline.registry
                    WHERE is_enabled = TRUE
                    ORDER BY name
                    """
                )
            else:
                rows = await conn.fetch(
                    """
                    SELECT pipeline_id, name, description, version,
                           is_enabled, stages, input_type
                    FROM pipeline.registry
                    ORDER BY name
                    """
                )

        return [
            PipelineInfo(
                id=row["pipeline_id"],
                name=row["name"],
                description=row["description"],
                version=row["version"],
                is_enabled=row["is_enabled"],
                stages=row["stages"],
                input_type=row["input_type"],
            )
            for row in rows
        ]

    async def get_pipeline_detail(
        self,
        pipeline_id: str,
    ) -> PipelineDetail | None:
        """
        Get detailed information about a pipeline.

        Args:
            pipeline_id: Pipeline identifier

        Returns:
            PipelineDetail or None if not found
        """
        async with self._pool.acquire() as conn:
            row = await conn.fetchrow(
                """
                SELECT pipeline_id, name, description, version, module_path,
                       is_enabled, stages, input_type, config,
                       created_at, updated_at
                FROM pipeline.registry
                WHERE pipeline_id = $1
                """,
                pipeline_id,
            )

        if not row:
            return None

        return PipelineDetail(
            id=row["pipeline_id"],
            name=row["name"],
            description=row["description"],
            version=row["version"],
            is_enabled=row["is_enabled"],
            stages=row["stages"],
            input_type=row["input_type"],
            module_path=row["module_path"],
            config=row["config"],
            created_at=row["created_at"].isoformat() if row["created_at"] else None,
            updated_at=row["updated_at"].isoformat() if row["updated_at"] else None,
        )

    async def get_pipeline(
        self,
        pipeline_id: str,
        initialize: bool = True,
    ) -> BasePipeline | None:
        """
        Get a pipeline instance.

        Args:
            pipeline_id: Pipeline identifier
            initialize: Whether to call initialize() on the pipeline

        Returns:
            Pipeline instance, or None if the pipeline is not found, is
            disabled, or fails to import or initialize

        This method caches initialized pipeline instances for reuse.
        """
        # Check cache first
        if pipeline_id in self._instances:
            return self._instances[pipeline_id]

        # Get pipeline details from database
        detail = await self.get_pipeline_detail(pipeline_id)
        if not detail or not detail["is_enabled"]:
            return None

        # Import and instantiate the pipeline
        try:
            pipeline = self._import_pipeline(detail["module_path"])
        except Exception as e:
            logger.error(f"Failed to import pipeline {pipeline_id}: {e}")
            return None

        # Initialize if requested
        if initialize:
            try:
                await pipeline.initialize()
            except Exception as e:
                logger.error(f"Failed to initialize pipeline {pipeline_id}: {e}")
                return None
            # Cache only initialized instances, so a later call with
            # initialize=True never receives an uninitialized pipeline
            self._instances[pipeline_id] = pipeline

        return pipeline

    def _import_pipeline(self, module_path: str) -> BasePipeline:
        """
        Import a pipeline class from a module path and instantiate it.

        Args:
            module_path: Path in format "package.module:ClassName"

        Returns:
            Pipeline instance
        """
        module_name, class_name = module_path.rsplit(":", 1)
        module = importlib.import_module(module_name)
        cls = getattr(module, class_name)
        return cls()

    async def close_all(self) -> None:
        """Close all cached pipeline instances."""
        # Iterate over a snapshot: the cache may change while close() is awaited
        for pipeline_id, pipeline in list(self._instances.items()):
            try:
                await pipeline.close()
            except Exception as e:
                logger.error(f"Error closing pipeline {pipeline_id}: {e}")
        self._instances.clear()


class InMemoryPipelineRegistry:
    """
    In-memory pipeline registry for testing and simple deployments.

    This registry doesn't persist to a database and stores pipelines in memory.
    """

    def __init__(self):
        self._pipelines: dict[str, BasePipeline] = {}
        self._enabled: dict[str, bool] = {}

    async def register(self, pipeline: BasePipeline) -> None:
        """Register a pipeline instance."""
        pipeline_id = pipeline.metadata["id"]
        self._pipelines[pipeline_id] = pipeline
        self._enabled[pipeline_id] = True
        logger.info(f"Registered pipeline: {pipeline_id}")

    async def unregister(self, pipeline_id: str) -> bool:
        """Unregister a pipeline."""
        if pipeline_id in self._pipelines:
            del self._pipelines[pipeline_id]
            del self._enabled[pipeline_id]
            return True
        return False

    async def set_enabled(self, pipeline_id: str, enabled: bool) -> bool:
        """Enable or disable a pipeline."""
        if pipeline_id in self._enabled:
            self._enabled[pipeline_id] = enabled
            return True
        return False

    async def list_pipelines(
        self,
        enabled_only: bool = True,
    ) -> list[PipelineInfo]:
        """List all registered pipelines."""
        result = []
        for pipeline_id, pipeline in self._pipelines.items():
            is_enabled = self._enabled.get(pipeline_id, True)
            if enabled_only and not is_enabled:
                continue

            metadata = pipeline.metadata
            result.append(
                PipelineInfo(
                    id=pipeline_id,
                    name=metadata["name"],
                    description=metadata["description"],
                    version=metadata["version"],
                    is_enabled=is_enabled,
                    stages=metadata["stages"],
                    input_type=metadata["input_type"],
                )
            )
        return result

    async def get_pipeline(
        self,
        pipeline_id: str,
        initialize: bool = True,
    ) -> BasePipeline | None:
        """Get a pipeline instance."""
        if pipeline_id not in self._pipelines:
            return None
        if not self._enabled.get(pipeline_id, True):
            return None

        pipeline = self._pipelines[pipeline_id]
        if initialize:
            await pipeline.initialize()
        return pipeline

    async def close_all(self) -> None:
        """Close all pipeline instances."""
        for pipeline in self._pipelines.values():
            try:
                await pipeline.close()
            except Exception as e:
                logger.error(f"Error closing pipeline: {e}")
        self._pipelines.clear()
        self._enabled.clear()


@@ -0,0 +1,467 @@
"""
Pipeline Runner - Executes pipelines and tracks execution history.

The runner handles pipeline execution, tracking execution status in the database,
and providing execution history for monitoring and debugging.
"""

from __future__ import annotations

import logging
import time
import uuid
from datetime import datetime
from typing import TYPE_CHECKING, Any

from pipeline_core.base import PipelineResult
from pipeline_core.contracts import ExecutionRequest, ExecutionStatus

if TYPE_CHECKING:
    import asyncpg

    from pipeline_core.base import BasePipeline
    from pipeline_core.registry import PipelineRegistry

logger = logging.getLogger(__name__)


class PipelineRunner:
    """
    Executes pipelines and tracks execution history.

    The runner:
    - Gets pipeline instances from the registry
    - Tracks execution status in the database
    - Handles errors and updates status
    - Provides execution history queries

    Usage:
        registry = PipelineRegistry(pool)
        runner = PipelineRunner(pool, registry)

        # Execute a pipeline (returns the execution ID and the result)
        execution_id, result = await runner.execute(
            pipeline_id="reviewiq",
            request=ExecutionRequest(
                job_id="job-123",
                stages=["normalize", "classify"],
            )
        )

        # Get execution status
        status = await runner.get_execution(execution_id)

        # List executions
        executions = await runner.list_executions(pipeline_id="reviewiq")
    """

    def __init__(
        self,
        pool: asyncpg.Pool,
        registry: PipelineRegistry,
    ):
        """
        Initialize the runner.

        Args:
            pool: asyncpg connection pool
            registry: Pipeline registry for getting pipeline instances
        """
        self._pool = pool
        self._registry = registry
async def execute(
self,
pipeline_id: str,
request: ExecutionRequest,
) -> tuple[str, PipelineResult]:
"""
Execute a pipeline.
Args:
pipeline_id: Pipeline identifier
request: Execution request with input data and options
Returns:
Tuple of (execution_id, PipelineResult)
Raises:
ValueError: If pipeline not found or disabled
"""
# Get pipeline instance
pipeline = await self._registry.get_pipeline(pipeline_id)
if not pipeline:
raise ValueError(f"Pipeline not found or disabled: {pipeline_id}")
# Create execution record
execution_id = str(uuid.uuid4())
stages = request.get("stages") or pipeline.get_stage_names()
await self._create_execution(
execution_id=execution_id,
pipeline_id=pipeline_id,
job_id=request.get("job_id"),
business_id=request.get("business_id"),
stages_requested=stages,
)
# Update status to running
await self._update_execution_status(
execution_id=execution_id,
status="running",
started_at=datetime.utcnow(),
)
try:
# Prepare input data
# Copy so the caller's request payload is not mutated below
input_data = dict(request.get("input_data") or {})
if request.get("job_id"):
input_data["job_id"] = request["job_id"]
if request.get("business_id"):
input_data["business_id"] = request["business_id"]
# Validate input
validation_errors = await pipeline.validate_input(input_data)
if validation_errors:
error_msg = "; ".join(validation_errors)
await self._update_execution_status(
execution_id=execution_id,
status="failed",
error_message=f"Validation failed: {error_msg}",
completed_at=datetime.utcnow(),
)
return execution_id, PipelineResult.from_error(
pipeline_id, f"Validation failed: {error_msg}"
)
# Execute pipeline
# perf_counter() is monotonic, so the duration is unaffected by clock changes
start_time = time.perf_counter()
result = await pipeline.process(input_data, stages=stages)
duration_ms = int((time.perf_counter() - start_time) * 1000)
# Update execution with result
if result.success:
await self._update_execution_status(
execution_id=execution_id,
status="completed",
stages_completed=result.stages_run,
result_summary=self._summarize_result(result),
completed_at=datetime.utcnow(),
)
else:
await self._update_execution_status(
execution_id=execution_id,
status="failed",
stages_completed=result.stages_run,
error_message=result.error,
completed_at=datetime.utcnow(),
)
logger.info(
f"Pipeline {pipeline_id} execution {execution_id} "
f"finished in {duration_ms}ms: success={result.success}"
)
return execution_id, result
except Exception as e:
logger.exception(f"Pipeline {pipeline_id} execution failed: {e}")
await self._update_execution_status(
execution_id=execution_id,
status="failed",
error_message=str(e),
completed_at=datetime.utcnow(),
)
return execution_id, PipelineResult.from_error(pipeline_id, str(e))
async def cancel(self, execution_id: str) -> bool:
"""
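Mark a pending or running execution as cancelled.
This only updates the database record; it does not interrupt a
pipeline that is already running inside execute().
Args:
execution_id: Execution identifier (UUID string)
Returns:
True if cancelled, False if not found or already finished
Raises:
ValueError: If execution_id is not a valid UUID string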
"""
async with self._pool.acquire() as conn:
result = await conn.execute(
"""
UPDATE pipeline.executions
SET status = 'cancelled', completed_at = NOW()
WHERE id = $1 AND status IN ('pending', 'running')
""",
uuid.UUID(execution_id),
)
# asyncpg returns the command tag (e.g. "UPDATE 1"); the last token is the row count
cancelled = result.split()[-1] != "0"
if cancelled:
logger.info(f"Cancelled execution: {execution_id}")
return cancelled
async def get_execution(self, execution_id: str) -> ExecutionStatus | None:
"""
Get execution status.
Args:
execution_id: Execution identifier (UUID string)
Returns:
ExecutionStatus or None if not found
Raises:
ValueError: If execution_id is not a valid UUID string
"""
async with self._pool.acquire() as conn:
row = await conn.fetchrow(
"""
SELECT id, pipeline_id, job_id, business_id, status,
stages_requested, stages_completed, current_stage,
input_summary, result_summary, error_message,
started_at, completed_at, created_at
FROM pipeline.executions
WHERE id = $1
""",
uuid.UUID(execution_id),
)
if not row:
return None
return self._row_to_execution_status(row)
async def list_executions(
self,
pipeline_id: str | None = None,
job_id: str | None = None,
business_id: str | None = None,
status: str | None = None,
limit: int = 50,
offset: int = 0,
) -> list[ExecutionStatus]:
"""
List execution history.
Args:
pipeline_id: Filter by pipeline
job_id: Filter by job
business_id: Filter by business
status: Filter by status
limit: Maximum results
offset: Result offset
Returns:
List of ExecutionStatus
"""
conditions: list[str] = []
params: list[Any] = []
param_idx = 1
if pipeline_id:
conditions.append(f"pipeline_id = ${param_idx}")
params.append(pipeline_id)
param_idx += 1
if job_id:
conditions.append(f"job_id = ${param_idx}")
params.append(uuid.UUID(job_id))
param_idx += 1
if business_id:
conditions.append(f"business_id = ${param_idx}")
params.append(business_id)
param_idx += 1
if status:
conditions.append(f"status = ${param_idx}")
params.append(status)
param_idx += 1
where_clause = ("WHERE " + " AND ".join(conditions)) if conditions else ""
query = f"""
SELECT id, pipeline_id, job_id, business_id, status,
stages_requested, stages_completed, current_stage,
input_summary, result_summary, error_message,
started_at, completed_at, created_at
FROM pipeline.executions
{where_clause}
ORDER BY created_at DESC
LIMIT ${param_idx} OFFSET ${param_idx + 1}
"""
params.extend([limit, offset])
async with self._pool.acquire() as conn:
rows = await conn.fetch(query, *params)
return [self._row_to_execution_status(row) for row in rows]
async def get_execution_count(
self,
pipeline_id: str | None = None,
status: str | None = None,
) -> int:
"""
Get execution count.
Args:
pipeline_id: Filter by pipeline
status: Filter by status
Returns:
Count of executions matching filters
"""
conditions: list[str] = []
params: list[Any] = []
param_idx = 1
if pipeline_id:
conditions.append(f"pipeline_id = ${param_idx}")
params.append(pipeline_id)
param_idx += 1
if status:
conditions.append(f"status = ${param_idx}")
params.append(status)
param_idx += 1
where_clause = ("WHERE " + " AND ".join(conditions)) if conditions else ""
async with self._pool.acquire() as conn:
result = await conn.fetchval(
f"SELECT COUNT(*) FROM pipeline.executions {where_clause}",
*params,
)
return result or 0
async def _create_execution(
self,
execution_id: str,
pipeline_id: str,
job_id: str | None,
business_id: str | None,
stages_requested: list[str],
) -> None:
"""Create an execution record."""
async with self._pool.acquire() as conn:
await conn.execute(
"""
INSERT INTO pipeline.executions (
id, pipeline_id, job_id, business_id,
status, stages_requested, created_at
)
VALUES ($1, $2, $3, $4, 'pending', $5, NOW())
""",
uuid.UUID(execution_id),
pipeline_id,
uuid.UUID(job_id) if job_id else None,
business_id,
stages_requested,
)
async def _update_execution_status(
self,
execution_id: str,
status: str,
current_stage: str | None = None,
stages_completed: list[str] | None = None,
input_summary: dict[str, Any] | None = None,
result_summary: dict[str, Any] | None = None,
error_message: str | None = None,
started_at: datetime | None = None,
completed_at: datetime | None = None,
) -> None:
"""Update execution status."""
updates = ["status = $2"]
params: list[Any] = [uuid.UUID(execution_id), status]
param_idx = 3
if current_stage is not None:
updates.append(f"current_stage = ${param_idx}")
params.append(current_stage)
param_idx += 1
if stages_completed is not None:
updates.append(f"stages_completed = ${param_idx}")
params.append(stages_completed)
param_idx += 1
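# NOTE: if input_summary and result_summary are JSON/JSONB columns, passing
# dicts assumes a JSON codec is registered on the pool's connections (for
# example via asyncpg's set_type_codec); by default asyncpg expects JSON
# parameters as text.
if input_summary is not None: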
updates.append(f"input_summary = ${param_idx}")
params.append(input_summary)
param_idx += 1
if result_summary is not None:
updates.append(f"result_summary = ${param_idx}")
params.append(result_summary)
param_idx += 1
if error_message is not None:
updates.append(f"error_message = ${param_idx}")
params.append(error_message)
param_idx += 1
if started_at is not None:
updates.append(f"started_at = ${param_idx}")
params.append(started_at)
param_idx += 1
if completed_at is not None:
updates.append(f"completed_at = ${param_idx}")
params.append(completed_at)
param_idx += 1
query = f"""
UPDATE pipeline.executions
SET {", ".join(updates)}
WHERE id = $1
"""
async with self._pool.acquire() as conn:
await conn.execute(query, *params)
def _row_to_execution_status(self, row: Any) -> ExecutionStatus:
"""Convert database row to ExecutionStatus."""
# Calculate progress
stages_requested = row["stages_requested"] or []
stages_completed = row["stages_completed"] or []
progress = (
len(stages_completed) / len(stages_requested)
if stages_requested
else 0.0
)
return ExecutionStatus(
id=str(row["id"]),
pipeline_id=row["pipeline_id"],
job_id=str(row["job_id"]) if row["job_id"] else None,
business_id=row["business_id"],
status=row["status"],
stages_requested=stages_requested,
stages_completed=stages_completed,
current_stage=row["current_stage"],
progress=progress,
input_summary=row["input_summary"],
result_summary=row["result_summary"],
error_message=row["error_message"],
started_at=row["started_at"].isoformat() if row["started_at"] else None,
completed_at=row["completed_at"].isoformat() if row["completed_at"] else None,
created_at=row["created_at"].isoformat() if row["created_at"] else None,
)
def _summarize_result(self, result: PipelineResult) -> dict[str, Any]:
"""Create a summary of the pipeline result for storage."""
summary: dict[str, Any] = {
"success": result.success,
"stages_run": result.stages_run,
}
# Add stage-specific summaries
for stage, stage_result in result.stage_results.items():
if stage_result.get("data"):
# Extract stats if available
data = stage_result["data"]
if "stats" in data:
summary[f"{stage}_stats"] = data["stats"]
return summary