Files
nuc/docs/monitoring.md
Alejandro Gutiérrez 8b503a549c Add operational documentation
CloudBeaver database manager guide, Ecija intranet deployment,
Gitea-Coolify auto-deploy and integration docs, monitoring setup
with presentation, remote access guide, security architecture,
and Turbostarter deployment procedure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-18 15:17:18 +01:00

6.2 KiB

NUC Monitoring & Recovery Setup

Date: 2026-02-02 22:20 Context: Complete monitoring stack deployment with auto-recovery and remote access

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                         MONITORING STACK                            │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   ┌─────────────┐    scrape    ┌─────────────┐                     │
│   │   OpenWrt   │◄─────────────│  Prometheus │                     │
│   │   :9100     │              │   :9091     │                     │
│   └─────────────┘              └──────┬──────┘                     │
│                                       │                             │
│   ┌─────────────┐    scrape          │      ┌─────────────┐        │
│   │  NUC Node   │◄───────────────────┤      │ Alertmanager│        │
│   │  Exporter   │                    │      │   :9093     │        │
│   │   :9100     │              ┌─────┴────┐ └──────┬──────┘        │
│   └─────────────┘              │  Grafana │        │               │
│                                │  :3333   │        ▼               │
│                                └──────────┘  ┌───────────┐         │
│                                              │ntfy Bridge│         │
│                                              │  :9095    │         │
│                                              └─────┬─────┘         │
│                                                    │               │
│                                                    ▼               │
│                                            ntfy.sh/nuc-watchdog    │
└─────────────────────────────────────────────────────────────────────┘

Access URLs

Service Local URL Remote URL Credentials
Grafana http://192.168.1.3:3333 https://alezmad-nuc.tail58f5ad.ts.net admin / nucmonitoring
Prometheus http://192.168.1.3:9091 - -
Alertmanager http://192.168.1.3:9093 - -
OpenWrt Metrics http://192.168.1.1:9100 - -

Prometheus Targets

Job Target Scrape Interval
prometheus localhost:9090 15s
nuc-node 192.168.1.3:9100 15s
openwrt 192.168.1.1:9100 30s

Alert Rules

NUC Alerts (/opt/monitoring/alert_rules.yml)

Alert Condition Severity
NUCDown up{job="nuc-node"} == 0 for 1m critical
HighCPULoad CPU > 80% for 5m warning
HighMemoryUsage Memory > 85% for 5m warning
DiskSpaceLow Disk > 85% for 5m warning

OpenWrt Alerts

Alert Condition Severity
OpenWrtDown up{job="openwrt"} == 0 for 1m critical
OpenWrtHighLoad Load > 2 for 5m warning

Auto-Recovery Layers

Layer Component Action Trigger
1 OpenWrt Monitor WoL packet HTTP+Ping fail (3x)
2 Hardware Watchdog Auto-reboot System freeze
3 Kernel Panic Auto-reboot (10s) Kernel panic
4 WoL Wake from power off Manual or script

Configuration Files

NUC (/opt/monitoring/)

  • prometheus.yml - Prometheus configuration
  • alert_rules.yml - Alert rules
  • alertmanager.yml - Alertmanager → ntfy bridge
  • alertmanager-ntfy-bridge.py - Webhook translator

OpenWrt (/opt/)

  • monitor-nuc.sh - Health check daemon (0.5Hz)

Systemd Services

  • prometheus-node-exporter.service - NUC metrics
  • alertmanager-ntfy-bridge.service - Alert translator

Notifications

ntfy Topic: nuc-watchdog

Subscribe via:

Alert Sources:

  1. OpenWrt watchdog (direct to ntfy.sh)
  2. Alertmanager → bridge → ntfy.sh

Remote Access

Tailscale (NUC)

WireGuard (OpenWrt)

  • Endpoint: 5.224.196.245:51820
  • VPN Subnet: 10.10.10.0/24
  • Config: ~/wireguard/home-vpn.conf

Grafana Setup

Data Source: Prometheus (http://prometheus:9090)

Imported Dashboards:

  • Node Exporter Full (ID: 1860)

Maintenance Commands

# Check Prometheus targets
curl -s http://192.168.1.3:9091/api/v1/targets | jq '.data.activeTargets[].health'

# Check active alerts
curl -s http://192.168.1.3:9093/api/v2/alerts | jq '.[].labels.alertname'

# Restart monitoring stack
ssh nuc "docker restart prometheus-r0wg4gwoow44kkkc8skc4kwg alertmanager-r0wg4gwoow44kkkc8skc4kwg grafana-r0wg4gwoow44kkkc8skc4kwg"

# Check OpenWrt monitor
ssh root@192.168.1.1 "ps | grep monitor"

# Test alert
curl -X POST http://192.168.1.3:9093/api/v2/alerts -H "Content-Type: application/json" \
  -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Test"}}]'

# Manual ntfy test
curl -d "Test message" https://ntfy.sh/nuc-watchdog

Container UUIDs (Coolify)

Service UUID
Monitoring Stack r0wg4gwoow44kkkc8skc4kwg
  • OpenWrt NUC Monitor: /opt/monitor-nuc.sh
  • Kernel panic config: /etc/sysctl.d/99-auto-reboot.conf
  • WireGuard config: ~/wireguard/home-vpn.conf