# NUC Monitoring & Recovery Setup **Date:** 2026-02-02 22:20 **Context:** Complete monitoring stack deployment with auto-recovery and remote access ## Architecture ``` ┌─────────────────────────────────────────────────────────────────────┐ │ MONITORING STACK │ ├─────────────────────────────────────────────────────────────────────┤ │ │ │ ┌─────────────┐ scrape ┌─────────────┐ │ │ │ OpenWrt │◄─────────────│ Prometheus │ │ │ │ :9100 │ │ :9091 │ │ │ └─────────────┘ └──────┬──────┘ │ │ │ │ │ ┌─────────────┐ scrape │ ┌─────────────┐ │ │ │ NUC Node │◄───────────────────┤ │ Alertmanager│ │ │ │ Exporter │ │ │ :9093 │ │ │ │ :9100 │ ┌─────┴────┐ └──────┬──────┘ │ │ └─────────────┘ │ Grafana │ │ │ │ │ :3333 │ ▼ │ │ └──────────┘ ┌───────────┐ │ │ │ntfy Bridge│ │ │ │ :9095 │ │ │ └─────┬─────┘ │ │ │ │ │ ▼ │ │ ntfy.sh/nuc-watchdog │ └─────────────────────────────────────────────────────────────────────┘ ``` ## Access URLs | Service | Local URL | Remote URL | Credentials | |---------|-----------|------------|-------------| | **Grafana** | http://192.168.1.3:3333 | https://alezmad-nuc.tail58f5ad.ts.net | admin / nucmonitoring | | **Prometheus** | http://192.168.1.3:9091 | - | - | | **Alertmanager** | http://192.168.1.3:9093 | - | - | | **OpenWrt Metrics** | http://192.168.1.1:9100 | - | - | ## Prometheus Targets | Job | Target | Scrape Interval | |-----|--------|-----------------| | `prometheus` | localhost:9090 | 15s | | `nuc-node` | 192.168.1.3:9100 | 15s | | `openwrt` | 192.168.1.1:9100 | 30s | ## Alert Rules ### NUC Alerts (`/opt/monitoring/alert_rules.yml`) | Alert | Condition | Severity | |-------|-----------|----------| | NUCDown | `up{job="nuc-node"} == 0` for 1m | critical | | HighCPULoad | CPU > 80% for 5m | warning | | HighMemoryUsage | Memory > 85% for 5m | warning | | DiskSpaceLow | Disk > 85% for 5m | warning | ### OpenWrt Alerts | Alert | Condition | Severity | |-------|-----------|----------| | OpenWrtDown | `up{job="openwrt"} == 0` for 1m | critical | | OpenWrtHighLoad | Load > 2 for 5m | warning | ## Auto-Recovery Layers | Layer | Component | Action | Trigger | |-------|-----------|--------|---------| | 1 | OpenWrt Monitor | WoL packet | HTTP+Ping fail (3x) | | 2 | Hardware Watchdog | Auto-reboot | System freeze | | 3 | Kernel Panic | Auto-reboot (10s) | Kernel panic | | 4 | WoL | Wake from power off | Manual or script | ## Configuration Files ### NUC (`/opt/monitoring/`) - `prometheus.yml` - Prometheus configuration - `alert_rules.yml` - Alert rules - `alertmanager.yml` - Alertmanager → ntfy bridge - `alertmanager-ntfy-bridge.py` - Webhook translator ### OpenWrt (`/opt/`) - `monitor-nuc.sh` - Health check daemon (0.5Hz) ### Systemd Services - `prometheus-node-exporter.service` - NUC metrics - `alertmanager-ntfy-bridge.service` - Alert translator ## Notifications **ntfy Topic:** `nuc-watchdog` Subscribe via: - App: ntfy (iOS/Android) - Web: https://ntfy.sh/nuc-watchdog **Alert Sources:** 1. OpenWrt watchdog (direct to ntfy.sh) 2. Alertmanager → bridge → ntfy.sh ## Remote Access ### Tailscale (NUC) - Funnel URL: https://alezmad-nuc.tail58f5ad.ts.net - Exposes: Grafana (:3333) ### WireGuard (OpenWrt) - Endpoint: 5.224.196.245:51820 - VPN Subnet: 10.10.10.0/24 - Config: `~/wireguard/home-vpn.conf` ## Grafana Setup **Data Source:** Prometheus (http://prometheus:9090) **Imported Dashboards:** - Node Exporter Full (ID: 1860) ## Maintenance Commands ```bash # Check Prometheus targets curl -s http://192.168.1.3:9091/api/v1/targets | jq '.data.activeTargets[].health' # Check active alerts curl -s http://192.168.1.3:9093/api/v2/alerts | jq '.[].labels.alertname' # Restart monitoring stack ssh nuc "docker restart prometheus-r0wg4gwoow44kkkc8skc4kwg alertmanager-r0wg4gwoow44kkkc8skc4kwg grafana-r0wg4gwoow44kkkc8skc4kwg" # Check OpenWrt monitor ssh root@192.168.1.1 "ps | grep monitor" # Test alert curl -X POST http://192.168.1.3:9093/api/v2/alerts -H "Content-Type: application/json" \ -d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Test"}}]' # Manual ntfy test curl -d "Test message" https://ntfy.sh/nuc-watchdog ``` ## Container UUIDs (Coolify) | Service | UUID | |---------|------| | Monitoring Stack | r0wg4gwoow44kkkc8skc4kwg | ## Related - OpenWrt NUC Monitor: `/opt/monitor-nuc.sh` - Kernel panic config: `/etc/sysctl.d/99-auto-reboot.conf` - WireGuard config: `~/wireguard/home-vpn.conf`