feat: Add extensible multi-pipeline integration system

This commit implements a plugin-like pipeline architecture with:

Pipeline Core Package (packages/pipeline-core/):
- BasePipeline abstract class all pipelines implement
- PipelineRegistry for database-backed discovery/management
- PipelineRunner for execution with status tracking
- DashboardConfig contracts for dynamic widget definitions

Database Migration (006_pipeline_registry.sql):
- pipeline.registry table for registered pipelines
- pipeline.executions table for execution history
- Views for execution stats and monitoring

ReviewIQ Pipeline Refactor:
- Implements BasePipeline interface
- Adds get_dashboard_config() with widget definitions
- Adds get_widget_data() methods for all dashboard widgets
- Maintains backward compatibility with Pipeline alias

Generic Pipeline API (api/routes/pipelines.py):
- GET /api/pipelines - List all registered pipelines
- GET /api/pipelines/{id} - Pipeline details
- POST /api/pipelines/{id}/execute - Execute pipeline
- GET /api/pipelines/{id}/dashboard - Dashboard config
- GET /api/pipelines/{id}/widgets/{w} - Widget data
- GET /api/pipelines/{id}/executions - Execution history

Frontend Dynamic Dashboard System:
- DynamicDashboard component renders from config
- WidgetRegistry maps types to components
- Widget components: StatCard, LineChart, BarChart,
  PieChart, DataTable, Heatmap
- Pipeline API client library

Frontend Pipeline Pages:
- /pipelines - List all registered pipelines
- /pipelines/[id] - Dynamic dashboard for pipeline
- /pipelines/[id]/executions - Execution history
- Pipelines nav item in Sidebar

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
119
packages/pipeline-core/README.md
Normal file
@@ -0,0 +1,119 @@
# Pipeline Core

Extensible multi-pipeline framework with dynamic dashboards.

## Overview

Pipeline Core provides the base abstractions for building pipelines that can be:
- Discovered and registered dynamically
- Executed with status tracking
- Rendered with auto-generated dashboards

## Features

- **BasePipeline** - Abstract base class all pipelines implement
- **PipelineRegistry** - Database-backed pipeline discovery and management
- **PipelineRunner** - Execution with status tracking
- **Dashboard Contracts** - TypedDicts for widget configuration

## Installation

```bash
pip install -e packages/pipeline-core
```

## Usage

### Implementing a Pipeline

```python
from pipeline_core import BasePipeline, PipelineMetadata, DashboardConfig

class MyPipeline(BasePipeline):
    @property
    def metadata(self) -> PipelineMetadata:
        return {
            "id": "my-pipeline",
            "name": "My Pipeline",
            "description": "Does something useful",
            "version": "1.0.0",
            "stages": ["stage1", "stage2"],
            "input_type": "MyInputType",
        }

    async def initialize(self) -> None:
        # Set up connections
        pass

    async def close(self) -> None:
        # Clean up
        pass

    async def process(self, input_data, stages=None):
        # Run the pipeline
        pass

    def get_dashboard_config(self) -> DashboardConfig:
        return {
            "pipeline_id": "my-pipeline",
            "title": "My Dashboard",
            "sections": [...]
        }

    async def get_widget_data(self, widget_id, params):
        # Return widget data
        pass
```

### Registering a Pipeline

```python
from pipeline_core import PipelineRegistry
import asyncpg

pool = await asyncpg.create_pool(database_url)
registry = PipelineRegistry(pool)

await registry.register(
    pipeline_id="my-pipeline",
    name="My Pipeline",
    description="Does something useful",
    version="1.0.0",
    module_path="my_package.pipeline:MyPipeline",
    stages=["stage1", "stage2"],
    input_type="MyInputType",
)
```
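
`module_path` uses a `package.module:ClassName` format. How `PipelineRegistry` resolves it is not shown here, but a string of that shape is commonly loaded with `importlib`. The helper below is a hypothetical illustration (`load_class` is not part of `pipeline_core`), demonstrated on a standard-library class so it runs without the package installed.

```python
import importlib


def load_class(module_path: str) -> type:
    """Resolve a 'package.module:ClassName' string to the class object."""
    module_name, sep, class_name = module_path.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(
            f"Invalid module_path: {module_path}. "
            "Expected format: 'package.module:ClassName'"
        )
    module = importlib.import_module(module_name)
    obj = getattr(module, class_name)  # AttributeError if the name is missing
    if not isinstance(obj, type):
        raise TypeError(f"{module_path} does not point to a class")
    return obj


# Standard-library stand-in for "my_package.pipeline:MyPipeline".
cls = load_class("collections:OrderedDict")
```

Failing early on a malformed path keeps a bad registration from surfacing later as an import error at execution time.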

### Executing a Pipeline

```python
from pipeline_core import PipelineRunner

runner = PipelineRunner(pool, registry)

execution_id, result = await runner.execute(
    pipeline_id="my-pipeline",
    request={
        "input_data": {"key": "value"},
        "stages": ["stage1"],
    }
)
```
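
`runner.execute` returns an execution ID and a result. Assuming the result is a `PipelineResult` (its exact type is not stated here), `to_dict()` yields `pipeline_id`, `stages_run`, `stage_results`, `success`, and `error`. The helper below is a hypothetical sketch that summarizes such a dictionary; `summarize_result` is not part of `pipeline_core`, and the sample data is made up for illustration.

```python
from typing import Any


def summarize_result(result: dict[str, Any]) -> str:
    """Build a one-line summary from a PipelineResult.to_dict()-shaped dictionary."""
    failed = [
        name
        for name, stage in result.get("stage_results", {}).items()
        if not stage.get("success", False)
    ]
    status = "ok" if result.get("success") and not failed else "failed"
    parts = [f"{result['pipeline_id']}: {status}"]
    if result.get("stages_run"):
        parts.append("ran " + ", ".join(result["stages_run"]))
    if failed:
        parts.append("failed in " + ", ".join(failed))
    if result.get("error"):
        parts.append(f"error: {result['error']}")
    return "; ".join(parts)


# Illustrative data only, shaped like PipelineResult.to_dict().
sample = {
    "pipeline_id": "my-pipeline",
    "stages_run": ["stage1"],
    "stage_results": {"stage1": {"stage": "stage1", "success": True, "data": {}}},
    "success": True,
    "error": None,
}
summary = summarize_result(sample)
```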

## Dashboard Widgets

Pipelines declare dashboard widgets via `get_dashboard_config()`. Available widget types:

- `stat_card` - KPI stat card with value and trend
- `line_chart` - Time series line chart
- `bar_chart` - Bar chart (horizontal or vertical)
- `pie_chart` - Pie/donut chart
- `table` - Data table with columns
- `heatmap` - Heatmap grid visualization
- `area_chart` - Stacked area chart
- `gauge` - Gauge/meter visualization
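
As a rough illustration of what `get_dashboard_config()` might return, the dictionary below follows the `DashboardConfig`, `DashboardSection`, and `WidgetConfig` shapes from `pipeline_core.contracts`. The widget IDs, titles, and data keys are invented for the example, and `check_dashboard` is a hypothetical helper, not a library function.

```python
from typing import Any

# Widget types from the list above.
WIDGET_TYPES = {
    "stat_card", "line_chart", "bar_chart", "pie_chart",
    "table", "heatmap", "area_chart", "gauge",
}

# Values are invented; only the key names follow the contracts.
dashboard: dict[str, Any] = {
    "pipeline_id": "my-pipeline",
    "title": "My Dashboard",
    "description": None,
    "default_time_range": "7d",
    "refresh_interval": None,
    "sections": [
        {
            "id": "overview",
            "title": "Overview",
            "description": None,
            "collapsed": False,
            "widgets": [
                {
                    "id": "total_items",
                    "type": "stat_card",
                    "title": "Total Items",
                    "grid": {"x": 0, "y": 0, "w": 3, "h": 2},
                    "config": {"value_key": "total", "label": "Items", "format": "{value:,}"},
                },
                {
                    "id": "items_over_time",
                    "type": "line_chart",
                    "title": "Items Over Time",
                    "grid": {"x": 3, "y": 0, "w": 9, "h": 4},
                    "config": {
                        "x_axis": {"key": "date", "label": "Date", "type": "time"},
                        "y_axis": {"key": "count", "label": "Items", "type": "number"},
                        "series": [{"key": "count", "name": "Items"}],
                    },
                },
            ],
        }
    ],
}


def check_dashboard(config: dict[str, Any]) -> list[str]:
    """Return a list of problems: unknown widget types or duplicate widget IDs."""
    problems: list[str] = []
    seen: set[str] = set()
    for section in config["sections"]:
        for widget in section["widgets"]:
            if widget["type"] not in WIDGET_TYPES:
                problems.append(f"unknown widget type: {widget['type']}")
            if widget["id"] in seen:
                problems.append(f"duplicate widget id: {widget['id']}")
            seen.add(widget["id"])
    return problems


problems = check_dashboard(dashboard)
```

A check like this is worth running in a unit test, since widget data is requested by widget ID and duplicates would be ambiguous.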

## License

MIT
65
packages/pipeline-core/pyproject.toml
Normal file
@@ -0,0 +1,65 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "pipeline-core"
version = "0.1.0"
description = "Pipeline Core - Extensible multi-pipeline framework with dynamic dashboards"
readme = "README.md"
license = "MIT"
requires-python = ">=3.11"
authors = [
    { name = "ReviewIQ Team" }
]
keywords = ["pipeline", "framework", "dashboard", "registry"]
classifiers = [
    "Development Status :: 3 - Alpha",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: MIT License",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]

dependencies = [
    "asyncpg>=0.28.0",
    "pydantic>=2.0",
    "pydantic-settings>=2.0",
]

[project.optional-dependencies]
dev = [
    "pytest>=7.0",
    "pytest-asyncio>=0.21.0",
    "pytest-cov>=4.0",
    "ruff>=0.1.0",
    "mypy>=1.0",
]

[project.urls]
Homepage = "https://github.com/reviewiq/pipeline-core"
Documentation = "https://github.com/reviewiq/pipeline-core#readme"
Repository = "https://github.com/reviewiq/pipeline-core"

[tool.hatch.build.targets.wheel]
packages = ["src/pipeline_core"]

[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
pythonpath = ["src"]

[tool.ruff]
line-length = 100
target-version = "py311"

[tool.ruff.lint]
select = ["E", "F", "I", "N", "W", "UP"]
ignore = ["E501"]

[tool.mypy]
python_version = "3.11"
strict = true
warn_return_any = true
warn_unused_ignores = true
34
packages/pipeline-core/src/pipeline_core/__init__.py
Normal file
@@ -0,0 +1,34 @@
"""
Pipeline Core - Extensible multi-pipeline framework with dynamic dashboards.

This package provides the base abstractions for building pipelines that can be
discovered, registered, and rendered with dynamic dashboards.
"""

from pipeline_core.base import BasePipeline, PipelineMetadata, PipelineResult
from pipeline_core.contracts import (
    DashboardConfig,
    DashboardSection,
    WidgetConfig,
    WidgetType,
)
from pipeline_core.registry import PipelineRegistry
from pipeline_core.runner import PipelineRunner

__version__ = "0.1.0"

__all__ = [
    # Base classes
    "BasePipeline",
    "PipelineMetadata",
    "PipelineResult",
    # Contracts
    "DashboardConfig",
    "DashboardSection",
    "WidgetConfig",
    "WidgetType",
    # Registry
    "PipelineRegistry",
    # Runner
    "PipelineRunner",
]
263
packages/pipeline-core/src/pipeline_core/base.py
Normal file
@@ -0,0 +1,263 @@
"""
Base Pipeline abstract class and related types.

All pipelines must implement this interface to be compatible with the
pipeline registry, runner, and dynamic dashboard system.
"""

from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any, TypedDict

from pipeline_core.contracts import DashboardConfig


class PipelineMetadata(TypedDict):
    """Metadata describing a pipeline."""

    id: str  # Unique pipeline identifier (e.g., "reviewiq")
    name: str  # Display name (e.g., "ReviewIQ Classification Pipeline")
    description: str  # Human-readable description
    version: str  # Semantic version (e.g., "1.0.0")
    stages: list[str]  # Ordered list of stage names
    input_type: str  # Expected input type (e.g., "ScraperV1Output")


class StageResult(TypedDict, total=False):
    """Result from running a single pipeline stage."""

    stage: str  # Stage name
    success: bool  # Whether the stage succeeded
    data: dict[str, Any]  # Stage output data
    error: str | None  # Error message if failed
    duration_ms: int  # Stage execution time


class PipelineResult:
    """Result from running a pipeline."""

    def __init__(
        self,
        pipeline_id: str,
        stages_run: list[str] | None = None,
        stage_results: dict[str, StageResult] | None = None,
        success: bool = True,
        error: str | None = None,
    ):
        """
        Initialize pipeline result.

        Args:
            pipeline_id: Pipeline identifier
            stages_run: List of stages that were run
            stage_results: Results from each stage
            success: Overall success status
            error: Error message if failed
        """
        self.pipeline_id = pipeline_id
        self.stages_run = stages_run or []
        self.stage_results = stage_results or {}
        self.success = success
        self.error = error

    def get_stage_result(self, stage: str) -> StageResult | None:
        """Get result for a specific stage."""
        return self.stage_results.get(stage)

    def to_dict(self) -> dict[str, Any]:
        """Convert to dictionary."""
        return {
            "pipeline_id": self.pipeline_id,
            "stages_run": self.stages_run,
            "stage_results": self.stage_results,
            "success": self.success,
            "error": self.error,
        }

    @classmethod
    def from_error(cls, pipeline_id: str, error: str) -> PipelineResult:
        """Create a failed result from an error."""
        return cls(
            pipeline_id=pipeline_id,
            success=False,
            error=error,
        )


class BasePipeline(ABC):
    """
    Abstract base class for all pipelines.

    All pipelines must implement this interface to be compatible with:
    - Pipeline registry (discovery and management)
    - Pipeline runner (execution)
    - Dynamic dashboard system (widget configuration and data)

    Example implementation:

        class ReviewIQPipeline(BasePipeline):
            @property
            def metadata(self) -> PipelineMetadata:
                return {
                    "id": "reviewiq",
                    "name": "ReviewIQ Classification Pipeline",
                    "description": "Classifies reviews using URT taxonomy",
                    "version": "1.0.0",
                    "stages": ["normalize", "classify", "route", "aggregate"],
                    "input_type": "ScraperV1Output",
                }

            async def initialize(self) -> None:
                # Set up database connections, etc.
                pass

            async def close(self) -> None:
                # Release database connections, etc.
                pass

            async def process(self, input_data, stages=None) -> PipelineResult:
                # Run the pipeline
                pass

            def get_dashboard_config(self) -> DashboardConfig:
                return {
                    "pipeline_id": "reviewiq",
                    "title": "ReviewIQ Analytics",
                    "sections": [...]
                }

            async def get_widget_data(self, widget_id, params) -> dict:
                # Return data for a specific widget
                pass
    """

    @property
    @abstractmethod
    def metadata(self) -> PipelineMetadata:
        """
        Get pipeline metadata.

        Returns:
            PipelineMetadata with id, name, description, version, stages, input_type
        """
        ...

    @abstractmethod
    async def initialize(self) -> None:
        """
        Initialize the pipeline.

        This is called before any processing. Use it to:
        - Establish database connections
        - Load configuration
        - Initialize services

        This method may be called multiple times but should be idempotent.
        """
        ...

    @abstractmethod
    async def close(self) -> None:
        """
        Close and cleanup pipeline resources.

        This is called when the pipeline is no longer needed. Use it to:
        - Close database connections
        - Release resources
        - Cleanup temporary files
        """
        ...

    @abstractmethod
    async def process(
        self,
        input_data: dict[str, Any],
        stages: list[str] | None = None,
    ) -> PipelineResult:
        """
        Process input data through the pipeline.

        Args:
            input_data: Input data dictionary (format depends on input_type)
            stages: List of stages to run (default: all stages)

        Returns:
            PipelineResult with stage outputs and validation results
        """
        ...

    @abstractmethod
    def get_dashboard_config(self) -> DashboardConfig:
        """
        Get the dashboard configuration for this pipeline.

        Returns:
            DashboardConfig with sections and widget definitions

        The frontend uses this configuration to dynamically render
        the pipeline's dashboard with appropriate widgets.
        """
        ...

    @abstractmethod
    async def get_widget_data(
        self,
        widget_id: str,
        params: dict[str, Any],
    ) -> dict[str, Any]:
        """
        Get data for a specific dashboard widget.

        Args:
            widget_id: Widget identifier (from dashboard config)
            params: Query parameters (e.g., time range, filters)

        Returns:
            Dictionary with widget data in the format expected by the widget type

        Common params:
            - business_id: Filter by business
            - time_range: Time range (e.g., "7d", "30d", "custom")
            - start_date: Start date for custom range
            - end_date: End date for custom range
        """
        ...

    # Optional methods with default implementations

    async def validate_input(self, input_data: dict[str, Any]) -> list[str]:
        """
        Validate input data before processing.

        Args:
            input_data: Input data to validate

        Returns:
            List of validation error messages (empty if valid)

        Override this to add custom input validation.
        """
        return []

    async def health_check(self) -> dict[str, Any]:
        """
        Perform a health check on the pipeline.

        Returns:
            Dictionary with health status:
            - healthy: bool
            - checks: dict of individual check results
            - message: optional message

        Override this to add custom health checks.
        """
        return {
            "healthy": True,
            "checks": {},
            "message": None,
        }

    def get_stage_names(self) -> list[str]:
        """Get the list of stage names."""
        return self.metadata["stages"]

    def get_pipeline_id(self) -> str:
        """Get the pipeline identifier."""
        return self.metadata["id"]
252
packages/pipeline-core/src/pipeline_core/contracts.py
Normal file
@@ -0,0 +1,252 @@
"""
Dashboard and Widget contracts for the pipeline system.

These TypedDicts define the data structures for dynamic dashboard configuration,
allowing pipelines to declare their dashboard widgets which the frontend renders.
"""

from __future__ import annotations

from typing import Any, Literal, TypedDict


# =============================================================================
# Widget Types
# =============================================================================

WidgetType = Literal[
    "stat_card",  # KPI stat card with value and optional trend
    "line_chart",  # Time series line chart
    "bar_chart",  # Bar chart (horizontal or vertical)
    "pie_chart",  # Pie/donut chart
    "table",  # Data table with columns
    "heatmap",  # Heatmap grid visualization
    "area_chart",  # Stacked area chart
    "gauge",  # Gauge/meter visualization
]


# =============================================================================
# Widget Configuration
# =============================================================================


class GridPosition(TypedDict):
    """Grid position for a widget in the dashboard layout."""

    x: int  # Column position (0-based)
    y: int  # Row position (0-based)
    w: int  # Width in grid units
    h: int  # Height in grid units


class StatCardConfig(TypedDict, total=False):
    """Configuration specific to stat card widgets."""

    value_key: str  # Key in data for the main value
    label: str  # Label to display
    format: str  # Format string (e.g., "{value:,}", "{value:.1%}")
    trend_key: str | None  # Key for trend value (optional)
    trend_format: str | None  # Format for trend (e.g., "+{value:.1%}")
    icon: str | None  # Icon name (optional)
    color: str | None  # Color theme (e.g., "blue", "green", "red")


class ChartAxisConfig(TypedDict, total=False):
    """Configuration for chart axes."""

    key: str  # Data key for this axis
    label: str  # Axis label
    type: Literal["number", "category", "time"]
    format: str | None  # Format string


class ChartSeriesConfig(TypedDict, total=False):
    """Configuration for a chart data series."""

    key: str  # Data key
    name: str  # Display name
    color: str | None  # Series color
    type: Literal["line", "bar", "area"] | None


class ChartConfig(TypedDict, total=False):
    """Configuration for chart widgets (line, bar, area)."""

    x_axis: ChartAxisConfig
    y_axis: ChartAxisConfig
    series: list[ChartSeriesConfig]
    stacked: bool
    show_legend: bool
    show_grid: bool


class PieChartConfig(TypedDict, total=False):
    """Configuration for pie/donut chart widgets."""

    value_key: str  # Key for segment value
    label_key: str  # Key for segment label
    colors: list[str] | None  # Custom color palette
    show_legend: bool
    show_labels: bool
    inner_radius: int | None  # For donut chart (0 = pie)


class TableColumnConfig(TypedDict, total=False):
    """Configuration for a table column."""

    key: str  # Data key
    header: str  # Column header
    width: int | None  # Column width
    align: Literal["left", "center", "right"]
    format: str | None  # Format string
    sortable: bool


class TableConfig(TypedDict, total=False):
    """Configuration for table widgets."""

    columns: list[TableColumnConfig]
    row_key: str  # Key for unique row identifier
    page_size: int
    show_pagination: bool
    sortable: bool
    filterable: bool


class HeatmapConfig(TypedDict, total=False):
    """Configuration for heatmap widgets."""

    x_key: str  # Key for x-axis categories
    y_key: str  # Key for y-axis categories
    value_key: str  # Key for cell values
    color_scale: list[str]  # Color gradient
    show_values: bool
    format: str | None  # Format for values


class GaugeConfig(TypedDict, total=False):
    """Configuration for gauge widgets."""

    value_key: str  # Key for gauge value
    min: float  # Minimum value
    max: float  # Maximum value
    thresholds: list[dict[str, Any]]  # Color thresholds
    format: str | None  # Format string


# Union of all widget-specific configs
WidgetSpecificConfig = (
    StatCardConfig
    | ChartConfig
    | PieChartConfig
    | TableConfig
    | HeatmapConfig
    | GaugeConfig
)


class WidgetConfig(TypedDict, total=False):
    """Configuration for a dashboard widget."""

    id: str  # Unique widget identifier
    type: WidgetType  # Widget type
    title: str  # Widget title
    grid: GridPosition  # Grid position and size
    config: dict[str, Any]  # Widget-specific configuration
    data_endpoint: str | None  # Custom data endpoint (if not default)
    refresh_interval: int | None  # Auto-refresh interval in seconds


# =============================================================================
# Dashboard Configuration
# =============================================================================


class DashboardSection(TypedDict):
    """A section in the dashboard containing widgets."""

    id: str  # Unique section identifier
    title: str  # Section title
    description: str | None  # Optional description
    widgets: list[WidgetConfig]
    collapsed: bool | None  # Whether section is collapsed by default


class DashboardConfig(TypedDict):
    """Full dashboard configuration for a pipeline."""

    pipeline_id: str  # Pipeline identifier
    title: str  # Dashboard title
    description: str | None  # Optional description
    sections: list[DashboardSection]
    default_time_range: str | None  # Default time range (e.g., "7d", "30d")
    refresh_interval: int | None  # Global refresh interval in seconds


# =============================================================================
# Execution Types
# =============================================================================


class ExecutionStatus(TypedDict, total=False):
    """Status of a pipeline execution."""

    id: str  # Execution ID
    pipeline_id: str  # Pipeline identifier
    job_id: str | None  # Associated job ID
    business_id: str | None  # Business identifier
    status: Literal["pending", "running", "completed", "failed", "cancelled"]
    stages_requested: list[str]
    stages_completed: list[str]
    current_stage: str | None
    progress: float  # 0.0 to 1.0
    input_summary: dict[str, Any] | None
    result_summary: dict[str, Any] | None
    error_message: str | None
    started_at: str | None
    completed_at: str | None
    created_at: str


class ExecutionRequest(TypedDict, total=False):
    """Request to execute a pipeline."""

    job_id: str | None  # Job ID to process
    business_id: str | None  # Business identifier
    input_data: dict[str, Any] | None  # Direct input data
    stages: list[str] | None  # Stages to run (default: all)
    options: dict[str, Any] | None  # Pipeline-specific options


# =============================================================================
# Pipeline Info Types
# =============================================================================


class PipelineInfo(TypedDict):
    """Summary information about a pipeline."""

    id: str  # Pipeline ID (e.g., "reviewiq")
    name: str  # Display name
    description: str
    version: str
    is_enabled: bool
    stages: list[str]  # Available stages
    input_type: str  # Expected input type


class PipelineDetail(TypedDict):
    """Detailed pipeline information including metadata."""

    id: str
    name: str
    description: str
    version: str
    is_enabled: bool
    stages: list[str]
    input_type: str
    module_path: str
    config: dict[str, Any] | None
    created_at: str
    updated_at: str
455
packages/pipeline-core/src/pipeline_core/registry.py
Normal file
@@ -0,0 +1,455 @@
"""
Pipeline Registry - Database-backed discovery and management of pipelines.

The registry maintains a list of registered pipelines and their metadata,
allowing the system to discover available pipelines and instantiate them.
"""

from __future__ import annotations

import importlib
import logging
from datetime import datetime
from typing import TYPE_CHECKING, Any

from pipeline_core.contracts import PipelineDetail, PipelineInfo

if TYPE_CHECKING:
    import asyncpg

    from pipeline_core.base import BasePipeline

logger = logging.getLogger(__name__)


class PipelineRegistry:
    """
    Database-backed registry for pipeline discovery and management.

    The registry stores pipeline metadata in a PostgreSQL table and provides
    methods to register, list, and instantiate pipelines.

    Usage:
        pool = await asyncpg.create_pool(database_url)
        registry = PipelineRegistry(pool)

        # Register a pipeline (stages and input_type are required arguments)
        await registry.register(
            pipeline_id="reviewiq",
            name="ReviewIQ Pipeline",
            description="Classifies reviews",
            version="1.0.0",
            module_path="reviewiq_pipeline.pipeline:ReviewIQPipeline",
            stages=["normalize", "classify", "route", "aggregate"],
            input_type="ScraperV1Output",
        )

        # List pipelines
        pipelines = await registry.list_pipelines()

        # Get a pipeline instance
        pipeline = await registry.get_pipeline("reviewiq")
    """

    def __init__(self, pool: asyncpg.Pool):
        """
        Initialize the registry.

        Args:
            pool: asyncpg connection pool
        """
        self._pool = pool
        self._instances: dict[str, BasePipeline] = {}

    async def register(
        self,
        pipeline_id: str,
        name: str,
        description: str,
        version: str,
        module_path: str,
        stages: list[str],
        input_type: str,
        config: dict[str, Any] | None = None,
        is_enabled: bool = True,
    ) -> None:
        """
        Register a pipeline in the database.

        Args:
            pipeline_id: Unique pipeline identifier
            name: Display name
            description: Human-readable description
            version: Semantic version
            module_path: Python module path (e.g., "package.module:ClassName")
            stages: List of stage names
            input_type: Expected input type
            config: Optional pipeline configuration
            is_enabled: Whether the pipeline is enabled

        Raises:
            ValueError: If module_path is invalid
        """
        # Validate module path format
        if ":" not in module_path:
            raise ValueError(
                f"Invalid module_path: {module_path}. "
                "Expected format: 'package.module:ClassName'"
            )

        async with self._pool.acquire() as conn:
            await conn.execute(
                """
                INSERT INTO pipeline.registry (
                    pipeline_id, name, description, version, module_path,
                    stages, input_type, config, is_enabled, updated_at
                )
                VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, NOW())
                ON CONFLICT (pipeline_id) DO UPDATE SET
                    name = EXCLUDED.name,
                    description = EXCLUDED.description,
                    version = EXCLUDED.version,
                    module_path = EXCLUDED.module_path,
                    stages = EXCLUDED.stages,
                    input_type = EXCLUDED.input_type,
                    config = EXCLUDED.config,
                    is_enabled = EXCLUDED.is_enabled,
                    updated_at = NOW()
                """,
                pipeline_id,
                name,
                description,
                version,
                module_path,
                stages,
                input_type,
                config,
                is_enabled,
            )

        logger.info(f"Registered pipeline: {pipeline_id} v{version}")

    async def register_from_instance(
        self,
        pipeline: BasePipeline,
        module_path: str,
        config: dict[str, Any] | None = None,
    ) -> None:
        """
        Register a pipeline from an instance.

        Args:
            pipeline: Pipeline instance
            module_path: Python module path
            config: Optional configuration
        """
        metadata = pipeline.metadata
        await self.register(
            pipeline_id=metadata["id"],
            name=metadata["name"],
            description=metadata["description"],
            version=metadata["version"],
            module_path=module_path,
            stages=metadata["stages"],
            input_type=metadata["input_type"],
            config=config,
        )
        # Cache the instance
        self._instances[metadata["id"]] = pipeline

    async def unregister(self, pipeline_id: str) -> bool:
        """
        Unregister a pipeline from the database.

        Args:
            pipeline_id: Pipeline identifier to remove

        Returns:
            True if pipeline was removed, False if not found
        """
        async with self._pool.acquire() as conn:
            result = await conn.execute(
                "DELETE FROM pipeline.registry WHERE pipeline_id = $1",
                pipeline_id,
            )

        # Remove from cache
        self._instances.pop(pipeline_id, None)

        deleted = result.split()[-1] != "0"
        if deleted:
            logger.info(f"Unregistered pipeline: {pipeline_id}")
        return deleted

    async def set_enabled(self, pipeline_id: str, enabled: bool) -> bool:
        """
        Enable or disable a pipeline.

        Args:
            pipeline_id: Pipeline identifier
            enabled: Whether to enable or disable

        Returns:
            True if pipeline was updated, False if not found
        """
        async with self._pool.acquire() as conn:
            result = await conn.execute(
                """
                UPDATE pipeline.registry
                SET is_enabled = $2, updated_at = NOW()
                WHERE pipeline_id = $1
                """,
                pipeline_id,
                enabled,
            )

        updated = result.split()[-1] != "0"
        if updated:
            logger.info(f"Set pipeline {pipeline_id} enabled={enabled}")
        return updated

    async def list_pipelines(
        self,
        enabled_only: bool = True,
    ) -> list[PipelineInfo]:
        """
        List all registered pipelines.

        Args:
            enabled_only: Only return enabled pipelines

        Returns:
            List of PipelineInfo dictionaries
        """
        async with self._pool.acquire() as conn:
            if enabled_only:
                rows = await conn.fetch(
                    """
                    SELECT pipeline_id, name, description, version,
                           is_enabled, stages, input_type
                    FROM pipeline.registry
                    WHERE is_enabled = TRUE
                    ORDER BY name
                    """
                )
            else:
                rows = await conn.fetch(
                    """
                    SELECT pipeline_id, name, description, version,
                           is_enabled, stages, input_type
                    FROM pipeline.registry
                    ORDER BY name
                    """
                )

        return [
            PipelineInfo(
                id=row["pipeline_id"],
                name=row["name"],
                description=row["description"],
                version=row["version"],
                is_enabled=row["is_enabled"],
                stages=row["stages"],
                input_type=row["input_type"],
            )
            for row in rows
        ]
|
||||
|
||||
    async def get_pipeline_detail(
        self,
        pipeline_id: str,
    ) -> PipelineDetail | None:
        """
        Get detailed information about a pipeline.

        Args:
            pipeline_id: Pipeline identifier

        Returns:
            PipelineDetail or None if not found
        """
        async with self._pool.acquire() as conn:
            row = await conn.fetchrow(
                """
                SELECT pipeline_id, name, description, version, module_path,
                       is_enabled, stages, input_type, config,
                       created_at, updated_at
                FROM pipeline.registry
                WHERE pipeline_id = $1
                """,
                pipeline_id,
            )

        if not row:
            return None

        return PipelineDetail(
            id=row["pipeline_id"],
            name=row["name"],
            description=row["description"],
            version=row["version"],
            is_enabled=row["is_enabled"],
            stages=row["stages"],
            input_type=row["input_type"],
            module_path=row["module_path"],
            config=row["config"],
            created_at=row["created_at"].isoformat() if row["created_at"] else None,
            updated_at=row["updated_at"].isoformat() if row["updated_at"] else None,
        )

    async def get_pipeline(
        self,
        pipeline_id: str,
        initialize: bool = True,
    ) -> BasePipeline | None:
        """
        Get a pipeline instance.

        Args:
            pipeline_id: Pipeline identifier
            initialize: Whether to call initialize() on the pipeline

        Returns:
            Pipeline instance, or None if the pipeline is not found, is
            disabled, or fails to import or initialize

        This method caches pipeline instances for reuse.
        """
        # Check cache first
        if pipeline_id in self._instances:
            return self._instances[pipeline_id]

        # Get pipeline details from database
        detail = await self.get_pipeline_detail(pipeline_id)
        if not detail or not detail["is_enabled"]:
            return None

        # Import and instantiate the pipeline
        try:
            pipeline = self._import_pipeline(detail["module_path"])
        except Exception as e:
            logger.error(f"Failed to import pipeline {pipeline_id}: {e}")
            return None

        # Initialize if requested
        if initialize:
            try:
                await pipeline.initialize()
            except Exception as e:
                logger.error(f"Failed to initialize pipeline {pipeline_id}: {e}")
                return None

        # Cache and return
        self._instances[pipeline_id] = pipeline
        return pipeline

    def _import_pipeline(self, module_path: str) -> BasePipeline:
        """
        Import a pipeline class from a module path and instantiate it.

        Args:
            module_path: Path in format "package.module:ClassName"

        Returns:
            Pipeline instance (the class is called with no arguments)
        """
        module_name, class_name = module_path.rsplit(":", 1)
        module = importlib.import_module(module_name)
        cls = getattr(module, class_name)
        return cls()

    async def close_all(self) -> None:
        """Close all cached pipeline instances."""
        for pipeline_id, pipeline in self._instances.items():
            try:
                await pipeline.close()
            except Exception as e:
                logger.error(f"Error closing pipeline {pipeline_id}: {e}")

        self._instances.clear()


class InMemoryPipelineRegistry:
    """
    In-memory pipeline registry for testing and simple deployments.

    This registry doesn't persist to a database and stores pipelines in memory.
    """

    def __init__(self):
        self._pipelines: dict[str, BasePipeline] = {}
        self._enabled: dict[str, bool] = {}

    async def register(self, pipeline: BasePipeline) -> None:
        """Register a pipeline instance."""
        pipeline_id = pipeline.metadata["id"]
        self._pipelines[pipeline_id] = pipeline
        self._enabled[pipeline_id] = True
        logger.info(f"Registered pipeline: {pipeline_id}")

    async def unregister(self, pipeline_id: str) -> bool:
        """Unregister a pipeline."""
        if pipeline_id in self._pipelines:
            del self._pipelines[pipeline_id]
            del self._enabled[pipeline_id]
            return True
        return False

    async def set_enabled(self, pipeline_id: str, enabled: bool) -> bool:
        """Enable or disable a pipeline."""
        if pipeline_id in self._enabled:
            self._enabled[pipeline_id] = enabled
            return True
        return False

    async def list_pipelines(
        self,
        enabled_only: bool = True,
    ) -> list[PipelineInfo]:
        """List all registered pipelines."""
        result = []
        for pipeline_id, pipeline in self._pipelines.items():
            is_enabled = self._enabled.get(pipeline_id, True)
            if enabled_only and not is_enabled:
                continue

            metadata = pipeline.metadata
            result.append(
                PipelineInfo(
                    id=pipeline_id,
                    name=metadata["name"],
                    description=metadata["description"],
                    version=metadata["version"],
                    is_enabled=is_enabled,
                    stages=metadata["stages"],
                    input_type=metadata["input_type"],
                )
            )
        return result

    async def get_pipeline(
        self,
        pipeline_id: str,
        initialize: bool = True,
    ) -> BasePipeline | None:
        """Get a pipeline instance, or None if it is unknown or disabled."""
        if pipeline_id not in self._pipelines:
            return None

        if not self._enabled.get(pipeline_id, True):
            return None

        pipeline = self._pipelines[pipeline_id]

        if initialize:
            await pipeline.initialize()

        return pipeline

    async def close_all(self) -> None:
        """Close all pipeline instances."""
        for pipeline_id, pipeline in self._pipelines.items():
            try:
                await pipeline.close()
            except Exception as e:
                # Include the pipeline id so a failure can be traced to its source
                logger.error(f"Error closing pipeline {pipeline_id}: {e}")

        self._pipelines.clear()
        self._enabled.clear()
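Review note: `_import_pipeline` resolves a `"package.module:ClassName"` string with `importlib` and instantiates the result. The lookup can be sketched on its own as below. This is a minimal illustration, not the registry's code: the helper name `load_object` and the explicit format check are assumptions added here, and the stdlib target `collections:OrderedDict` stands in for a real pipeline class.

```python
import importlib


def load_object(module_path: str):
    """Resolve a "package.module:Name" string to the named attribute.

    Hypothetical helper for illustration; it is not part of pipeline-core.
    """
    # rpartition splits on the last ":" and returns an empty separator if none exists
    module_name, sep, attr_name = module_path.rpartition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected 'package.module:ClassName', got {module_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# A stdlib class stands in for a pipeline class here
cls = load_object("collections:OrderedDict")
instance = cls()  # mirrors the registry calling the class with no arguments
```

Validating the format up front turns a malformed `module_path` into a clear `ValueError` rather than the tuple-unpacking error that `rsplit(":", 1)` would produce when the colon is missing.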
467
packages/pipeline-core/src/pipeline_core/runner.py
Normal file
@@ -0,0 +1,467 @@
"""
Pipeline Runner - Executes pipelines and tracks execution history.

The runner executes pipelines, tracks execution status in the database,
and provides execution history for monitoring and debugging.
"""

from __future__ import annotations

import logging
import time
import uuid
from datetime import datetime
from typing import TYPE_CHECKING, Any

from pipeline_core.base import PipelineResult
from pipeline_core.contracts import ExecutionRequest, ExecutionStatus

if TYPE_CHECKING:
    import asyncpg

    from pipeline_core.base import BasePipeline
    from pipeline_core.registry import PipelineRegistry

logger = logging.getLogger(__name__)


class PipelineRunner:
    """
    Executes pipelines and tracks execution history.

    The runner:
    - Gets pipeline instances from the registry
    - Tracks execution status in the database
    - Handles errors and updates status
    - Provides execution history queries

    Usage:
        registry = PipelineRegistry(pool)
        runner = PipelineRunner(pool, registry)

        # Execute a pipeline; execute() returns (execution_id, PipelineResult)
        execution_id, result = await runner.execute(
            pipeline_id="reviewiq",
            request=ExecutionRequest(
                # job_id is parsed with uuid.UUID(), so it must be a UUID string
                job_id="123e4567-e89b-12d3-a456-426614174000",
                stages=["normalize", "classify"],
            )
        )

        # Get execution status
        status = await runner.get_execution(execution_id)

        # List executions
        executions = await runner.list_executions(pipeline_id="reviewiq")
    """

    def __init__(
        self,
        pool: asyncpg.Pool,
        registry: PipelineRegistry,
    ):
        """
        Initialize the runner.

        Args:
            pool: asyncpg connection pool
            registry: Pipeline registry for getting pipeline instances
        """
        self._pool = pool
        self._registry = registry

    async def execute(
        self,
        pipeline_id: str,
        request: ExecutionRequest,
    ) -> tuple[str, PipelineResult]:
        """
        Execute a pipeline.

        Args:
            pipeline_id: Pipeline identifier
            request: Execution request with input data and options

        Returns:
            Tuple of (execution_id, PipelineResult)

        Raises:
            ValueError: If pipeline not found or disabled
        """
        # Get pipeline instance
        pipeline = await self._registry.get_pipeline(pipeline_id)
        if not pipeline:
            raise ValueError(f"Pipeline not found or disabled: {pipeline_id}")

        # Create execution record
        execution_id = str(uuid.uuid4())
        stages = request.get("stages") or pipeline.get_stage_names()

        await self._create_execution(
            execution_id=execution_id,
            pipeline_id=pipeline_id,
            job_id=request.get("job_id"),
            business_id=request.get("business_id"),
            stages_requested=stages,
        )

        # Update status to running
        await self._update_execution_status(
            execution_id=execution_id,
            status="running",
            started_at=datetime.utcnow(),
        )

        try:
            # Prepare input data (copied so the caller's dict is not mutated)
            input_data = dict(request.get("input_data") or {})
            if request.get("job_id"):
                input_data["job_id"] = request["job_id"]
            if request.get("business_id"):
                input_data["business_id"] = request["business_id"]

            # Validate input
            validation_errors = await pipeline.validate_input(input_data)
            if validation_errors:
                error_msg = "; ".join(validation_errors)
                await self._update_execution_status(
                    execution_id=execution_id,
                    status="failed",
                    error_message=f"Validation failed: {error_msg}",
                    completed_at=datetime.utcnow(),
                )
                return execution_id, PipelineResult.from_error(
                    pipeline_id, f"Validation failed: {error_msg}"
                )

            # Execute pipeline
            start_time = time.time()
            result = await pipeline.process(input_data, stages=stages)
            duration_ms = int((time.time() - start_time) * 1000)

            # Update execution with result
            if result.success:
                await self._update_execution_status(
                    execution_id=execution_id,
                    status="completed",
                    stages_completed=result.stages_run,
                    result_summary=self._summarize_result(result),
                    completed_at=datetime.utcnow(),
                )
            else:
                await self._update_execution_status(
                    execution_id=execution_id,
                    status="failed",
                    stages_completed=result.stages_run,
                    error_message=result.error,
                    completed_at=datetime.utcnow(),
                )

            logger.info(
                f"Pipeline {pipeline_id} execution {execution_id} "
                f"completed in {duration_ms}ms: success={result.success}"
            )

            return execution_id, result

        except Exception as e:
            logger.exception(f"Pipeline {pipeline_id} execution failed: {e}")

            await self._update_execution_status(
                execution_id=execution_id,
                status="failed",
                error_message=str(e),
                completed_at=datetime.utcnow(),
            )

            return execution_id, PipelineResult.from_error(pipeline_id, str(e))

    async def cancel(self, execution_id: str) -> bool:
        """
        Cancel a running execution.

        This only marks the database record as cancelled. It does not interrupt
        a pipeline that is already running inside execute(), and execute()
        still writes its own final status when that run finishes.

        Args:
            execution_id: Execution identifier

        Returns:
            True if cancelled, False if not found or already completed
        """
        async with self._pool.acquire() as conn:
            result = await conn.execute(
                """
                UPDATE pipeline.executions
                SET status = 'cancelled', completed_at = NOW()
                WHERE id = $1 AND status IN ('pending', 'running')
                """,
                uuid.UUID(execution_id),
            )

        cancelled = result.split()[-1] != "0"
        if cancelled:
            logger.info(f"Cancelled execution: {execution_id}")
        return cancelled

    async def get_execution(self, execution_id: str) -> ExecutionStatus | None:
        """
        Get execution status.

        Args:
            execution_id: Execution identifier

        Returns:
            ExecutionStatus or None if not found
        """
        async with self._pool.acquire() as conn:
            row = await conn.fetchrow(
                """
                SELECT id, pipeline_id, job_id, business_id, status,
                       stages_requested, stages_completed, current_stage,
                       input_summary, result_summary, error_message,
                       started_at, completed_at, created_at
                FROM pipeline.executions
                WHERE id = $1
                """,
                uuid.UUID(execution_id),
            )

        if not row:
            return None

        return self._row_to_execution_status(row)

    async def list_executions(
        self,
        pipeline_id: str | None = None,
        job_id: str | None = None,
        business_id: str | None = None,
        status: str | None = None,
        limit: int = 50,
        offset: int = 0,
    ) -> list[ExecutionStatus]:
        """
        List execution history.

        Args:
            pipeline_id: Filter by pipeline
            job_id: Filter by job
            business_id: Filter by business
            status: Filter by status
            limit: Maximum results
            offset: Result offset

        Returns:
            List of ExecutionStatus
        """
        conditions = []
        params = []
        param_idx = 1

        if pipeline_id:
            conditions.append(f"pipeline_id = ${param_idx}")
            params.append(pipeline_id)
            param_idx += 1

        if job_id:
            conditions.append(f"job_id = ${param_idx}")
            params.append(uuid.UUID(job_id))
            param_idx += 1

        if business_id:
            conditions.append(f"business_id = ${param_idx}")
            params.append(business_id)
            param_idx += 1

        if status:
            conditions.append(f"status = ${param_idx}")
            params.append(status)
            param_idx += 1

        where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""

        query = f"""
            SELECT id, pipeline_id, job_id, business_id, status,
                   stages_requested, stages_completed, current_stage,
                   input_summary, result_summary, error_message,
                   started_at, completed_at, created_at
            FROM pipeline.executions
            {where_clause}
            ORDER BY created_at DESC
            LIMIT ${param_idx} OFFSET ${param_idx + 1}
        """
        params.extend([limit, offset])

        async with self._pool.acquire() as conn:
            rows = await conn.fetch(query, *params)

        return [self._row_to_execution_status(row) for row in rows]

    async def get_execution_count(
        self,
        pipeline_id: str | None = None,
        status: str | None = None,
    ) -> int:
        """
        Get execution count.

        Args:
            pipeline_id: Filter by pipeline
            status: Filter by status

        Returns:
            Count of executions matching filters
        """
        conditions = []
        params = []
        param_idx = 1

        if pipeline_id:
            conditions.append(f"pipeline_id = ${param_idx}")
            params.append(pipeline_id)
            param_idx += 1

        if status:
            conditions.append(f"status = ${param_idx}")
            params.append(status)
            param_idx += 1

        where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""

        async with self._pool.acquire() as conn:
            result = await conn.fetchval(
                f"SELECT COUNT(*) FROM pipeline.executions {where_clause}",
                *params,
            )

        return result or 0

    async def _create_execution(
        self,
        execution_id: str,
        pipeline_id: str,
        job_id: str | None,
        business_id: str | None,
        stages_requested: list[str],
    ) -> None:
        """Create an execution record."""
        async with self._pool.acquire() as conn:
            await conn.execute(
                """
                INSERT INTO pipeline.executions (
                    id, pipeline_id, job_id, business_id,
                    status, stages_requested, created_at
                )
                VALUES ($1, $2, $3, $4, 'pending', $5, NOW())
                """,
                uuid.UUID(execution_id),
                pipeline_id,
                uuid.UUID(job_id) if job_id else None,
                business_id,
                stages_requested,
            )

    async def _update_execution_status(
        self,
        execution_id: str,
        status: str,
        current_stage: str | None = None,
        stages_completed: list[str] | None = None,
        input_summary: dict[str, Any] | None = None,
        result_summary: dict[str, Any] | None = None,
        error_message: str | None = None,
        started_at: datetime | None = None,
        completed_at: datetime | None = None,
    ) -> None:
        """Update execution status, setting only the fields that are provided."""
        updates = ["status = $2"]
        params: list[Any] = [uuid.UUID(execution_id), status]
        param_idx = 3

        if current_stage is not None:
            updates.append(f"current_stage = ${param_idx}")
            params.append(current_stage)
            param_idx += 1

        if stages_completed is not None:
            updates.append(f"stages_completed = ${param_idx}")
            params.append(stages_completed)
            param_idx += 1

        # Note: the summary dicts are passed as-is. This assumes the pool has a
        # JSON/JSONB type codec registered; without one, asyncpg expects a
        # JSON-encoded string for these columns.
        if input_summary is not None:
            updates.append(f"input_summary = ${param_idx}")
            params.append(input_summary)
            param_idx += 1

        if result_summary is not None:
            updates.append(f"result_summary = ${param_idx}")
            params.append(result_summary)
            param_idx += 1

        if error_message is not None:
            updates.append(f"error_message = ${param_idx}")
            params.append(error_message)
            param_idx += 1

        if started_at is not None:
            updates.append(f"started_at = ${param_idx}")
            params.append(started_at)
            param_idx += 1

        if completed_at is not None:
            updates.append(f"completed_at = ${param_idx}")
            params.append(completed_at)
            param_idx += 1

        query = f"""
            UPDATE pipeline.executions
            SET {", ".join(updates)}
            WHERE id = $1
        """

        async with self._pool.acquire() as conn:
            await conn.execute(query, *params)

    def _row_to_execution_status(self, row: Any) -> ExecutionStatus:
        """Convert database row to ExecutionStatus."""
        # Calculate progress
        stages_requested = row["stages_requested"] or []
        stages_completed = row["stages_completed"] or []
        progress = (
            len(stages_completed) / len(stages_requested)
            if stages_requested
            else 0.0
        )

        return ExecutionStatus(
            id=str(row["id"]),
            pipeline_id=row["pipeline_id"],
            job_id=str(row["job_id"]) if row["job_id"] else None,
            business_id=row["business_id"],
            status=row["status"],
            stages_requested=stages_requested,
            stages_completed=stages_completed,
            current_stage=row["current_stage"],
            progress=progress,
            input_summary=row["input_summary"],
            result_summary=row["result_summary"],
            error_message=row["error_message"],
            started_at=row["started_at"].isoformat() if row["started_at"] else None,
            completed_at=row["completed_at"].isoformat() if row["completed_at"] else None,
            created_at=row["created_at"].isoformat() if row["created_at"] else None,
        )

    def _summarize_result(self, result: PipelineResult) -> dict[str, Any]:
        """Create a summary of the pipeline result for storage."""
        summary: dict[str, Any] = {
            "success": result.success,
            "stages_run": result.stages_run,
        }

        # Add stage-specific summaries
        for stage, stage_result in result.stage_results.items():
            if stage_result.get("data"):
                # Extract stats if available
                data = stage_result["data"]
                if "stats" in data:
                    summary[f"{stage}_stats"] = data["stats"]

        return summary
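Review note: `list_executions` and `get_execution_count` both build a `WHERE` clause by hand with numbered `$n` placeholders. The shared pattern could be sketched as one helper, shown below. This is an illustration under assumptions, not code from the commit: `build_filters` is a hypothetical name, and it skips filters that are `None`, whereas the runner skips any falsy value.

```python
from typing import Any


def build_filters(
    filters: dict[str, Any], start_index: int = 1
) -> tuple[str, list[Any], int]:
    """Build a WHERE clause with $n placeholders, skipping filters set to None.

    Hypothetical helper. Column names must come from trusted code, never from
    user input; only the values are passed as bound parameters.
    """
    conditions: list[str] = []
    params: list[Any] = []
    idx = start_index
    for column, value in filters.items():
        if value is None:
            continue
        conditions.append(f"{column} = ${idx}")
        params.append(value)
        idx += 1
    where_clause = "WHERE " + " AND ".join(conditions) if conditions else ""
    # idx is returned so the caller can number LIMIT/OFFSET placeholders after it
    return where_clause, params, idx


where, params, next_idx = build_filters(
    {"pipeline_id": "reviewiq", "job_id": None, "status": "failed"}
)
query = f"SELECT COUNT(*) FROM pipeline.executions {where}"
```

Returning the next placeholder index lets a caller append `LIMIT ${next_idx} OFFSET ${next_idx + 1}` in the same way `list_executions` does.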
@@ -23,6 +23,7 @@ classifiers = [
 ]
 
 dependencies = [
+    "pipeline-core",
     "asyncpg>=0.28.0",
     "pydantic>=2.0",
     "pydantic-settings>=2.0",
@@ -6,6 +6,9 @@ This package provides a complete pipeline for processing customer reviews:
 - Stage 2: LLM Classification (span extraction with URT codes)
 - Stage 3: Issue Routing (route negative spans to issues)
 - Stage 4: Fact Aggregation (pre-aggregate metrics for dashboards)
+
+Implements the BasePipeline interface from pipeline-core for the extensible
+multi-pipeline system with dynamic dashboards.
 """
 
 from reviewiq_pipeline.config import Config
@@ -28,12 +31,14 @@ from reviewiq_pipeline.contracts import (
     ValidationError,
     ValidationResult,
 )
-from reviewiq_pipeline.pipeline import Pipeline
+from reviewiq_pipeline.pipeline import Pipeline, PipelineResult, ReviewIQPipeline
 
 __version__ = "0.1.0"
 __all__ = [
     # Main API
     "Pipeline",
+    "ReviewIQPipeline",
+    "PipelineResult",
     "Config",
     # Contracts
     "ScraperOutput",
File diff suppressed because it is too large