Initial commit - NUC server configuration and docs

- CLAUDE.md: Server instructions and service reference
- docs/: Persistent documentation (architecture, guides)
- .artifacts/: Session-generated notes
- playwriter-browser/: Remote browser container config

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Alejandro Gutiérrez
2026-02-01 20:49:20 +00:00
commit 390eda1595
25 changed files with 3664 additions and 0 deletions


@@ -0,0 +1,124 @@
# LiquidGym Database Engines Reference
**Date:** 2026-02-01 17:15
**Context:** Reference guide for LiquidGym's multi-engine SQL testing infrastructure
## Overview
LiquidGym is a multi-database testing environment for verifying that analytical queries run and return the same results across different database engines. This keeps generated queries engine-agnostic.
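
The core check can be sketched in a few shell functions: run one query on two engines, then compare the normalized output. This is a minimal sketch, not LiquidGym's actual harness. The `postgres` user, the localhost ports, and the function names are assumptions for illustration.

```bash
# Strip trailing whitespace and sort lines so that row order (undefined
# without ORDER BY) does not cause false mismatches.
normalize() {
  printf '%s\n' "$1" | sed 's/[[:space:]]*$//' | LC_ALL=C sort
}

# Succeeds (status 0) when two result sets match after normalization.
same_result() {
  [ "$(normalize "$1")" = "$(normalize "$2")" ]
}

# PostgreSQL: -A unaligned, -t tuples only (no header or footer),
# -F sets a tab field separator to match ClickHouse's TSV output.
# Host, port and user are assumptions.
run_postgres() {
  psql -h localhost -p 5433 -U postgres -At -F $'\t' -c "$1"
}

# ClickHouse HTTP interface: the query is sent as the POST body and the
# default output format is TabSeparated.
run_clickhouse() {
  curl -fsS 'http://localhost:8123/' --data-binary "$1"
}
```

For example, `same_result "$(run_postgres 'SELECT count(*) FROM customers')" "$(run_clickhouse 'SELECT count(*) FROM customers')" && echo match` assumes a hypothetical `customers` table loaded in both engines. Sorting only removes row-order differences. Engines can still format the same value differently (trailing zeros on decimals, for example), and a real comparison would need to normalize that too.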
## Engine Tiers
### Core (Always Started)
| Engine | Image | Port | Purpose |
|--------|-------|------|---------|
| PostgreSQL 16 | `postgres:16` | 5433 | Primary test database with sample datasets |
| CloudBeaver | `dbeaver/cloudbeaver` | 8978 | Web-based database UI |
### Tier 1: Essential Engines
Engines with distinct SQL dialects, used for cross-engine testing.
| Engine | Image | Port | Description |
|--------|-------|------|-------------|
| **ClickHouse** | `clickhouse/clickhouse-server` | 8123 (HTTP), 9000 (Native) | Column-oriented OLAP database, built for fast analytics over billions of rows. Used by Cloudflare, Uber, and eBay. Best for: logs, metrics, time-series analytics. |
| **MySQL 8** | `mysql:8` | 3306 | One of the most widely used open-source RDBMSs. Covers the MySQL-specific SQL dialect. |
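
Dialect differences show up even in simple expressions. As an illustration, here is "start of month" for a hypothetical timestamp column `ts` in PostgreSQL, ClickHouse, and MySQL. This is not taken from LiquidGym's query generator, and `month_start_expr` is a hypothetical helper name.

```bash
# Emit the dialect-specific "truncate ts to start of month" expression.
month_start_expr() {
  case "$1" in
    postgres)   echo "date_trunc('month', ts)" ;;
    clickhouse) echo "toStartOfMonth(ts)" ;;
    mysql)      echo "DATE_FORMAT(ts, '%Y-%m-01')" ;;
    *)          echo "unknown engine: $1" >&2; return 1 ;;
  esac
}
```

The three expressions also return different types: PostgreSQL returns a timestamp, ClickHouse a `Date`, and MySQL a string. A cross-engine comparison therefore has to normalize the values, not just translate the SQL text.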
### Tier 2: Distributed & Specialized
| Engine | Image | Port | Description |
|--------|-------|------|-------------|
| **Trino** | `trinodb/trino` | 8084 | Distributed SQL query engine. Queries data across multiple sources (Postgres, S3, Kafka) in a single SQL statement. Has no storage of its own; it is purely a query layer. |
| **StarRocks** | `starrocks/allin1-ubuntu` | 9030 (MySQL), 8030 (HTTP) | MPP analytics database built for sub-second queries on large datasets, commonly used behind BI dashboards. Originally a fork of Apache Doris. |
| **TimescaleDB** | `timescale/timescaledb:latest-pg16` | 5434 | PostgreSQL extension for time-series data. Automatically partitions tables by time (hypertables). Well suited to IoT, metrics, and event data, and uses standard PostgreSQL SQL. |
### Tier 3: Advanced/Specialized
| Engine | Image | Port | Description |
|--------|-------|------|-------------|
| **Apache Doris** | `apache/doris:doris-all-in-one-2.1.0` | 9031 (MySQL), 8031 (HTTP) | Real-time analytical database. MySQL-compatible. Good for real-time dashboards and ad-hoc queries. |
| **Apache Druid** | `apache/druid:26.0.0` | 8888 | Real-time OLAP database for sub-second slice-and-dice analytics. Used for dashboards at Airbnb, Netflix, and Alibaba. Best for: high-concurrency, low-latency queries. |
| **Apache Spark** | `apache/spark:3.5.0` | 7077 (Master), 8085 (UI) | Distributed compute engine for big data. ML pipelines, ETL, batch processing. Overkill for small datasets. |
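
StarRocks and Doris both accept MySQL-protocol connections on their query ports, so the standard `mysql` client can reach either one. This is a minimal sketch. It assumes the all-in-one images keep their default passwordless `root` user, and `mysql_fe` and `DRY_RUN` are hypothetical names.

```bash
# Connect to a MySQL-protocol FE by engine name; extra args go to mysql.
# DRY_RUN=1 prints the command instead of running it.
mysql_fe() {
  local port
  case "$1" in
    starrocks) port=9030 ;;
    doris)     port=9031 ;;
    *)         echo "not a MySQL-protocol FE: $1" >&2; return 1 ;;
  esac
  shift
  # 127.0.0.1 (not localhost) forces TCP instead of a Unix socket.
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo mysql -h 127.0.0.1 -P "$port" -u root "$@"
  else
    mysql -h 127.0.0.1 -P "$port" -u root "$@"
  fi
}
```

For example, `mysql_fe starrocks -e 'SHOW BACKENDS;'` lists the backend nodes. The host is `127.0.0.1` rather than `localhost` because the `mysql` client treats `localhost` as a request to connect over a Unix socket instead of TCP.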
## Observability Stack
| Tool | Image | Port | Description |
|------|-------|------|-------------|
| **Grafana** | `grafana/grafana` | 3005 | Visualization and dashboards over many data sources, with alerting. Login: admin/liquidgym |
| **Prometheus** | `prom/prometheus` | 9090 | Metrics collection and alerting. Scrapes metrics from the engine endpoints listed in its scrape configuration. |
| **Redis** | `redis:7-alpine` | 6379 | In-memory cache, used for session storage and caching query results. |
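
Several of the services above expose an HTTP health endpoint on their listed port. The sketch below checks them. It assumes everything runs on localhost with the host ports from these tables. The endpoint paths are the engines' standard ones, and `health_url` and `check_health` are hypothetical helper names.

```bash
# Map an engine to its HTTP health URL (host ports from the tables above).
health_url() {
  case "$1" in
    clickhouse) echo "http://localhost:8123/ping" ;;           # replies "Ok."
    trino)      echo "http://localhost:8084/v1/info" ;;        # JSON server info
    druid)      echo "http://localhost:8888/status/health" ;;  # replies true
    prometheus) echo "http://localhost:9090/-/healthy" ;;
    grafana)    echo "http://localhost:3005/api/health" ;;
    *)          return 1 ;;
  esac
}

# Print UP/DOWN per engine. A refused connection, a timeout, or an HTTP
# error status (>= 400, via curl -f) counts as DOWN.
check_health() {
  local engine url
  for engine in "$@"; do
    if ! url=$(health_url "$engine"); then
      echo "$engine: no HTTP health endpoint listed"
      continue
    fi
    if curl -fsS --max-time 3 -o /dev/null "$url"; then
      echo "$engine: UP"
    else
      echo "$engine: DOWN"
    fi
  done
}
```

Run it as `check_health clickhouse trino druid prometheus grafana`. This only confirms that the HTTP server answers. Trino, for instance, reports whether it is still starting in the JSON body of `/v1/info`, not in the status code.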
## Usage
```bash
cd ~/Desktop/liquidgym/infra
# Start core only (Postgres + CloudBeaver)
docker compose up -d
# Start with Tier 1 engines (+ ClickHouse, MySQL)
docker compose --profile tier1 up -d
# Start with Tier 2 engines (+ Trino, StarRocks, TimescaleDB)
docker compose --profile tier2 up -d
# Start with Tier 3 engines (+ Doris, Druid, Spark)
docker compose --profile tier3 up -d
# Start observability stack (+ Prometheus, Grafana, Redis)
docker compose --profile observability up -d
# Start everything
docker compose --profile all up -d
# Load sample datasets
docker compose --profile loader up
```
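
Docker Compose accepts `--profile` more than once, so several tiers can be started in one command. Services with no profile (the core ones) always start. The wrapper below is a sketch: `compose_up` and `DRY_RUN` are hypothetical names, and it must be run from the `infra` directory.

```bash
# Build one `docker compose up -d` with a --profile flag per argument.
# DRY_RUN=1 prints the command instead of running it.
compose_up() {
  local args=() p
  for p in "$@"; do
    args+=(--profile "$p")
  done
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo docker compose "${args[@]}" up -d
  else
    docker compose "${args[@]}" up -d
  fi
}
```

`compose_up tier1 tier2 observability` starts core plus those three profiles. Running `COMPOSE_PROFILES=tier1,tier2 docker compose up -d` has the same effect. Whether a single higher tier already pulls in the lower ones depends on the services' `profiles:` lists in `docker-compose.yml`, which this sketch does not assume.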
## Sample Datasets
| Dataset | Description | Tables |
|---------|-------------|--------|
| **Northwind** | Classic MS Access sample - orders, products, customers | 14 |
| **Pagila** | DVD rental store (PostgreSQL port of Sakila) | 29 |
| **Chinook** | Digital media store - artists, albums, tracks | 11 |
| **AdventureWorks** | Microsoft sample - sales, HR, production | 68 |
| **Employees** | Large HR dataset with 300K+ employee records | 6 |
| **LEGO** | LEGO sets, parts, themes, colors | 8 |
| **Netflix** | Netflix titles catalog | 1 |
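
After running the loader, table counts give a quick sanity check against the table above. This sketch makes several hypothetical assumptions: each dataset is loaded into its own PostgreSQL database named after it, and that database is reachable as user `postgres` on port 5433. The host defaults to localhost; set `PGHOST=192.168.1.3` to check the NUC copy. The query counts base tables in all non-system schemas, because some PostgreSQL ports of these datasets spread their tables over several schemas.

```bash
# Table counts from the Sample Datasets table above.
expected_tables() {
  case "$1" in
    northwind)      echo 14 ;;
    pagila)         echo 29 ;;
    chinook)        echo 11 ;;
    adventureworks) echo 68 ;;
    employees)      echo 6 ;;
    lego)           echo 8 ;;
    netflix)        echo 1 ;;
    *)              return 1 ;;
  esac
}

# Count base tables (not views) outside the system schemas of database $1
# and compare with the expected number. Database-per-dataset is an assumption.
check_dataset() {
  local db="$1" want got
  want=$(expected_tables "$db") || { echo "$db: unknown dataset"; return 1; }
  got=$(psql -h "${PGHOST:-localhost}" -p 5433 -U postgres -d "$db" -At -c \
    "SELECT count(*) FROM information_schema.tables
     WHERE table_type = 'BASE TABLE'
       AND table_schema NOT IN ('pg_catalog', 'information_schema')") || return 1
  if [ "$got" = "$want" ]; then
    echo "$db: OK ($got tables)"
  else
    echo "$db: expected $want, found $got"
  fi
}
```

`check_dataset chinook` prints either `OK` or the mismatch. Treat a mismatch as a prompt to look closer, not proof of a failed load. The counts in the table may include or exclude views and partitions.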
## When to Use Each Engine
| Use Case | Recommended Engine |
|----------|-------------------|
| General OLTP | PostgreSQL, MySQL |
| Analytics on large datasets | ClickHouse, StarRocks |
| Time-series / IoT | TimescaleDB |
| Real-time dashboards | Druid, Doris |
| Query across multiple DBs | Trino |
| Big data / ML pipelines | Spark |
| Caching | Redis |
## Resource Requirements
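| Profile | RAM (approx.) | CPU cores | Disk |
|---------|---------------|-----------|------|
| Core | 1GB | 1 | 1GB |
| + Tier 1 | 6GB | 2 | 3GB |
| + Tier 2 | 10GB | 4 | 5GB |
| + Tier 3 | 16GB+ | 6+ | 10GB+ |
| + Observability | +2GB | +1 | +1GB |

Tier rows are cumulative totals (core plus every tier up to that one). The Observability row is added on top of whichever tiers are running.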
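
To check whether a host can take a given set of profiles, the Resource Requirements table can be turned into a rough estimate. This sketch reads the tier rows as cumulative totals, treats `16GB+` as 16, and takes `all` to mean Tier 3 plus observability; all three are assumptions. The function names are hypothetical, and `host_ram_gb` reads Linux's `/proc/meminfo`.

```bash
# Rough RAM estimate (GB) for a set of profiles, read off the
# Resource Requirements table. Assumes tier rows are cumulative totals.
estimate_ram_gb() {
  local base=1 extra=0 p   # core alone needs ~1 GB
  for p in "$@"; do
    case "$p" in
      tier1) base=$(( base > 6 ? base : 6 )) ;;
      tier2) base=$(( base > 10 ? base : 10 )) ;;
      tier3) base=$(( base > 16 ? base : 16 )) ;;  # "16GB+" taken as 16
      observability) extra=2 ;;
      all) base=16; extra=2 ;;                      # assumed: tier3 + observability
      *) echo "unknown profile: $p" >&2; return 1 ;;
    esac
  done
  echo $(( base + extra ))
}

# Total host RAM in GB, rounded to the nearest whole number.
# /proc/meminfo reports MemTotal in kB (KiB); Linux only.
host_ram_gb() {
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1048576 + 0.5 }' /proc/meminfo
}
```

For example, `[ "$(host_ram_gb)" -ge "$(estimate_ram_gb tier2 observability)" ] || echo 'not enough RAM'`. Because tier rows are treated as cumulative, the estimate is an upper bound when a higher tier is started without the lower ones.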
## NUC Migration Status
The following services have been migrated to the NUC and no longer need local volumes:
| Service | NUC Location | Status |
|---------|--------------|--------|
| PostgreSQL (datasets) | 192.168.1.3:5433 | Migrated |
| MySQL | 192.168.1.3:3306 | Migrated |

The remaining Tier 1-3 engines (everything except MySQL) stay local-only for development testing.
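
With PostgreSQL and MySQL on the NUC and the other engines local, scripts need to know which host to use for each service. The sketch below uses a hypothetical `endpoint` helper and a `NUC_HOST` override variable; both names are illustrative. The addresses and ports come from this document.

```bash
# Host defaults to the NUC address from the migration table; override with NUC_HOST.
NUC_HOST="${NUC_HOST:-192.168.1.3}"

# Print host:port for a service. Migrated services point at the NUC;
# the rest use localhost and the host ports from the tier tables.
endpoint() {
  case "$1" in
    postgres)    echo "$NUC_HOST:5433" ;;
    mysql)       echo "$NUC_HOST:3306" ;;
    clickhouse)  echo "localhost:8123" ;;
    trino)       echo "localhost:8084" ;;
    starrocks)   echo "localhost:9030" ;;
    timescaledb) echo "localhost:5434" ;;
    *)           return 1 ;;
  esac
}
```

For example: `ep=$(endpoint mysql); mysql -h "${ep%:*}" -P "${ep##*:}" -u root`. The `root` user is an assumption. Only a few local engines are listed here; the others follow the same pattern, using the ports from the tier tables.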
## Related
- LiquidGym project: `~/Desktop/liquidgym/infra/`
- Docker Compose: `~/Desktop/liquidgym/infra/docker-compose.yml`
- Datasets: `~/Desktop/liquidgym/infra/datasets/`