# LiquidGym Database Engines Reference

**Date:** 2026-02-01 17:15

**Context:** Reference guide for LiquidGym's multi-engine SQL testing infrastructure

## Overview

LiquidGym is a multi-database testing environment designed to verify that analytical queries work identically across different database engines. This keeps query generation engine-agnostic.

## Engine Tiers

### Core (Always Started)

| Engine | Image | Port | Purpose |
|--------|-------|------|---------|
| PostgreSQL 16 | `postgres:16` | 5433 | Primary test database with sample datasets |
| CloudBeaver | `dbeaver/cloudbeaver` | 8978 | Web-based database UI |

### Tier 1: Essential Engines

Different SQL dialects for cross-engine testing.

| Engine | Image | Port | Description |
|--------|-------|------|-------------|
| **ClickHouse** | `clickhouse/clickhouse-server` | 8123 (HTTP), 9000 (Native) | Column-oriented OLAP database. Extremely fast for analytics on billions of rows. Used by Cloudflare, Uber, eBay. Best for: logs, metrics, time-series analytics. |
| **MySQL 8** | `mysql:8` | 3306 | Widely used open-source RDBMS. Tests the MySQL-specific SQL dialect. |

### Tier 2: Distributed & Specialized

| Engine | Image | Port | Description |
|--------|-------|------|-------------|
| **Trino** | `trinodb/trino` | 8084 | Distributed SQL query engine. Queries data across multiple sources (Postgres, S3, Kafka) with a single SQL statement. No storage of its own - just a query layer. |
| **StarRocks** | `starrocks/allin1-ubuntu` | 9030 (MySQL), 8030 (HTTP) | MPP analytics database. Sub-second queries on large datasets. Powers BI dashboards. Fork of Apache Doris with performance improvements. |
| **TimescaleDB** | `timescale/timescaledb:latest-pg16` | 5434 | PostgreSQL extension for time-series data. Auto-partitions by time. Well suited to IoT, metrics, events. Familiar Postgres SQL. |

### Tier 3: Advanced/Specialized

| Engine | Image | Port | Description |
|--------|-------|------|-------------|
| **Apache Doris** | `apache/doris:doris-all-in-one-2.1.0` | 9031 (MySQL), 8031 (HTTP) | Real-time analytical database. MySQL-compatible. Good for real-time dashboards and ad-hoc queries. |
| **Apache Druid** | `apache/druid:26.0.0` | 8888 | Real-time OLAP for sub-second slice-and-dice analytics. Powers Airbnb, Netflix, Alibaba dashboards. Best for: high-concurrency, low-latency queries. |
| **Apache Spark** | `apache/spark:3.5.0` | 7077 (Master), 8085 (UI) | Distributed compute engine for big data. ML pipelines, ETL, batch processing. Overkill for small datasets. |

## Observability Stack

| Tool | Image | Port | Description |
|------|-------|------|-------------|
| **Grafana** | `grafana/grafana` | 3005 | Visualization & dashboards. Query any data source, create alerts. Login: admin/liquidgym |
| **Prometheus** | `prom/prometheus` | 9090 | Metrics collection & alerting. Scrapes metrics from all engines. |
| **Redis** | `redis:7-alpine` | 6379 | In-memory cache. Used for session storage and caching query results. |

## Usage

```bash
cd ~/Desktop/liquidgym/infra

# Start core only (Postgres + CloudBeaver)
docker compose up -d

# Start with Tier 1 engines (+ ClickHouse, MySQL)
docker compose --profile tier1 up -d

# Start with Tier 2 engines (+ Trino, StarRocks, TimescaleDB)
docker compose --profile tier2 up -d

# Start with Tier 3 engines (+ Doris, Druid, Spark)
docker compose --profile tier3 up -d

# Start observability stack (+ Prometheus, Grafana, Redis)
docker compose --profile observability up -d

# Start everything
docker compose --profile all up -d

# Load sample datasets
docker compose --profile loader up
```

## Sample Datasets

| Dataset | Description | Tables |
|---------|-------------|--------|
| **Northwind** | Classic MS Access sample - orders, products, customers | 14 |
| **Pagila** | DVD rental store (PostgreSQL port of Sakila) | 29 |
| **Chinook** | Digital media store - artists, albums, tracks | 11 |
| **AdventureWorks** | Microsoft sample - sales, HR, production | 68 |
| **Employees** | Large HR dataset with 300K+ employee records | 6 |
| **LEGO** | LEGO sets, parts, themes, colors | 8 |
| **Netflix** | Netflix titles catalog | 1 |

## When to Use Each Engine

| Use Case | Recommended Engine |
|----------|-------------------|
| General OLTP | PostgreSQL, MySQL |
| Analytics on large datasets | ClickHouse, StarRocks |
| Time-series / IoT | TimescaleDB |
| Real-time dashboards | Druid, Doris |
| Query across multiple DBs | Trino |
| Big data / ML pipelines | Spark |
| Caching | Redis |

## Resource Requirements

| Profile | RAM | CPU (cores) | Disk |
|---------|-----|-------------|------|
| Core | 1GB | 1 | 1GB |
| + Tier 1 | 6GB | 2 | 3GB |
| + Tier 2 | 10GB | 4 | 5GB |
| + Tier 3 | 16GB+ | 6+ | 10GB+ |
| + Observability | +2GB | +1 | +1GB |

## NUC Migration Status

The following services have been migrated to the NUC and no longer need local volumes:

| Service | NUC Location | Status |
|---------|--------------|--------|
| PostgreSQL (datasets) | 192.168.1.3:5433 | Migrated |
| MySQL | 192.168.1.3:3306 | Migrated |

The other Tier 1-3 engines remain local-only for development testing.

## Related

- LiquidGym project: `~/Desktop/liquidgym/infra/`
- Docker Compose: `~/Desktop/liquidgym/infra/docker-compose.yml`
- Datasets: `~/Desktop/liquidgym/infra/datasets/`
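
## Example: Combining Profiles and Checking Ports

The Usage commands start one profile at a time, and the engine tables list the host port each service is expected to expose. The following is a minimal sketch of two conveniences built on that information: building repeated `--profile` flags (Docker Compose accepts the flag more than once) and probing the documented ports. The function names `compose_profile_args`, `engine_ports`, and `check_ports` are hypothetical and not part of LiquidGym. The port lists are copied from the tables in this guide, not read from `docker-compose.yml`. The probe relies on bash's `/dev/tcp` redirection, so it assumes bash rather than a plain POSIX shell.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative and not shipped with LiquidGym.

# Turn a list of profile names into repeated --profile flags,
# e.g. "tier1 observability" -> "--profile tier1 --profile observability".
compose_profile_args() {
  local args=""
  local profile
  for profile in "$@"; do
    args="${args:+$args }--profile $profile"
  done
  printf '%s\n' "$args"
}

# Print the host ports documented in this guide for one profile.
# Assumption: the tables in this guide match the actual compose file.
engine_ports() {
  case "$1" in
    core)          echo "5433 8978" ;;
    tier1)         echo "8123 9000 3306" ;;
    tier2)         echo "8084 9030 8030 5434" ;;
    tier3)         echo "9031 8031 8888 7077 8085" ;;
    observability) echo "3005 9090 6379" ;;
    *)             echo "unknown profile: $1" >&2; return 1 ;;
  esac
}

# Report whether each documented port for a profile accepts TCP connections.
# The second argument is the host to probe (defaults to localhost).
check_ports() {
  local profile="$1"
  local host="${2:-localhost}"
  local port
  for port in $(engine_ports "$profile"); do
    # bash-specific: opening /dev/tcp/HOST/PORT attempts a TCP connection.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$host:$port open"
    else
      echo "$host:$port closed"
    fi
  done
}
```

With the helpers sourced, `docker compose $(compose_profile_args tier1 observability) up -d` would start Tier 1 alongside the observability stack, and `check_ports tier1` would then probe 8123, 9000, and 3306 on localhost. Whether a higher tier's profile also starts the lower tiers depends on how the compose file defines its profiles, which this guide does not state. Passing the profiles explicitly avoids relying on either behavior.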