CloudBeaver database manager guide, Ecija intranet deployment, Gitea-Coolify auto-deploy and integration docs, monitoring setup with presentation, remote access guide, security architecture, and Turbostarter deployment procedure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
157 lines
6.2 KiB
Markdown
157 lines
6.2 KiB
Markdown
# NUC Monitoring & Recovery Setup
|
|
|
|
**Date:** 2026-02-02 22:20
|
|
**Context:** Complete monitoring stack deployment with auto-recovery and remote access
|
|
|
|
## Architecture
|
|
|
|
```
|
|
┌─────────────────────────────────────────────────────────────────────┐
|
|
│ MONITORING STACK │
|
|
├─────────────────────────────────────────────────────────────────────┤
|
|
│ │
|
|
│ ┌─────────────┐ scrape ┌─────────────┐ │
|
|
│ │ OpenWrt │◄─────────────│ Prometheus │ │
|
|
│ │ :9100 │ │ :9091 │ │
|
|
│ └─────────────┘ └──────┬──────┘ │
|
|
│ │ │
|
|
│ ┌─────────────┐ scrape │ ┌─────────────┐ │
|
|
│ │ NUC Node │◄───────────────────┤ │ Alertmanager│ │
|
|
│ │ Exporter │ │ │ :9093 │ │
|
|
│ │ :9100 │ ┌─────┴────┐ └──────┬──────┘ │
|
|
│ └─────────────┘ │ Grafana │ │ │
|
|
│ │ :3333 │ ▼ │
|
|
│ └──────────┘ ┌───────────┐ │
|
|
│ │ntfy Bridge│ │
|
|
│ │ :9095 │ │
|
|
│ └─────┬─────┘ │
|
|
│ │ │
|
|
│ ▼ │
|
|
│ ntfy.sh/nuc-watchdog │
|
|
└─────────────────────────────────────────────────────────────────────┘
|
|
```
|
|
|
|
## Access URLs
|
|
|
|
| Service | Local URL | Remote URL | Credentials |
|
|
|---------|-----------|------------|-------------|
|
|
| **Grafana** | http://192.168.1.3:3333 | https://alezmad-nuc.tail58f5ad.ts.net | admin / nucmonitoring |
|
|
| **Prometheus** | http://192.168.1.3:9091 | - | - |
|
|
| **Alertmanager** | http://192.168.1.3:9093 | - | - |
|
|
| **OpenWrt Metrics** | http://192.168.1.1:9100 | - | - |
|
|
|
|
## Prometheus Targets
|
|
|
|
| Job | Target | Scrape Interval |
|
|
|-----|--------|-----------------|
|
|
| `prometheus` | localhost:9090 | 15s |
|
|
| `nuc-node` | 192.168.1.3:9100 | 15s |
|
|
| `openwrt` | 192.168.1.1:9100 | 30s |
|
|
|
|
## Alert Rules
|
|
|
|
### NUC Alerts (`/opt/monitoring/alert_rules.yml`)
|
|
|
|
| Alert | Condition | Severity |
|
|
|-------|-----------|----------|
|
|
| NUCDown | `up{job="nuc-node"} == 0` for 1m | critical |
|
|
| HighCPULoad | CPU > 80% for 5m | warning |
|
|
| HighMemoryUsage | Memory > 85% for 5m | warning |
|
|
| DiskSpaceLow | Disk > 85% for 5m | warning |
|
|
|
|
### OpenWrt Alerts
|
|
|
|
| Alert | Condition | Severity |
|
|
|-------|-----------|----------|
|
|
| OpenWrtDown | `up{job="openwrt"} == 0` for 1m | critical |
|
|
| OpenWrtHighLoad | Load > 2 for 5m | warning |
|
|
|
|
## Auto-Recovery Layers
|
|
|
|
| Layer | Component | Action | Trigger |
|
|
|-------|-----------|--------|---------|
|
|
| 1 | OpenWrt Monitor | WoL packet | HTTP+Ping fail (3x) |
|
|
| 2 | Hardware Watchdog | Auto-reboot | System freeze |
|
|
| 3 | Kernel Panic | Auto-reboot (10s) | Kernel panic |
|
|
| 4 | WoL | Wake from power off | Manual or script |
|
|
|
|
## Configuration Files
|
|
|
|
### NUC (`/opt/monitoring/`)
|
|
- `prometheus.yml` - Prometheus configuration
|
|
- `alert_rules.yml` - Alert rules
|
|
- `alertmanager.yml` - Alertmanager → ntfy bridge
|
|
- `alertmanager-ntfy-bridge.py` - Webhook translator
|
|
|
|
### OpenWrt (`/opt/`)
|
|
- `monitor-nuc.sh` - Health check daemon (0.5Hz)
|
|
|
|
### Systemd Services
|
|
- `prometheus-node-exporter.service` - NUC metrics
|
|
- `alertmanager-ntfy-bridge.service` - Alert translator
|
|
|
|
## Notifications
|
|
|
|
**ntfy Topic:** `nuc-watchdog`
|
|
|
|
Subscribe via:
|
|
- App: ntfy (iOS/Android)
|
|
- Web: https://ntfy.sh/nuc-watchdog
|
|
|
|
**Alert Sources:**
|
|
1. OpenWrt watchdog (direct to ntfy.sh)
|
|
2. Alertmanager → bridge → ntfy.sh
|
|
|
|
## Remote Access
|
|
|
|
### Tailscale (NUC)
|
|
- Funnel URL: https://alezmad-nuc.tail58f5ad.ts.net
|
|
- Exposes: Grafana (:3333)
|
|
|
|
### WireGuard (OpenWrt)
|
|
- Endpoint: 5.224.196.245:51820
|
|
- VPN Subnet: 10.10.10.0/24
|
|
- Config: `~/wireguard/home-vpn.conf`
|
|
|
|
## Grafana Setup
|
|
|
|
**Data Source:** Prometheus (http://prometheus:9090)
|
|
|
|
**Imported Dashboards:**
|
|
- Node Exporter Full (ID: 1860)
|
|
|
|
## Maintenance Commands
|
|
|
|
```bash
|
|
# Check Prometheus targets
|
|
curl -s http://192.168.1.3:9091/api/v1/targets | jq '.data.activeTargets[].health'
|
|
|
|
# Check active alerts
|
|
curl -s http://192.168.1.3:9093/api/v2/alerts | jq '.[].labels.alertname'
|
|
|
|
# Restart monitoring stack
|
|
ssh nuc "docker restart prometheus-r0wg4gwoow44kkkc8skc4kwg alertmanager-r0wg4gwoow44kkkc8skc4kwg grafana-r0wg4gwoow44kkkc8skc4kwg"
|
|
|
|
# Check OpenWrt monitor
|
|
ssh root@192.168.1.1 "ps | grep monitor"
|
|
|
|
# Test alert
|
|
curl -X POST http://192.168.1.3:9093/api/v2/alerts -H "Content-Type: application/json" \
|
|
-d '[{"labels":{"alertname":"TestAlert","severity":"warning"},"annotations":{"summary":"Test"}}]'
|
|
|
|
# Manual ntfy test
|
|
curl -d "Test message" https://ntfy.sh/nuc-watchdog
|
|
```
|
|
|
|
## Container UUIDs (Coolify)
|
|
|
|
| Service | UUID |
|
|
|---------|------|
|
|
| Monitoring Stack | r0wg4gwoow44kkkc8skc4kwg |
|
|
|
|
## Related
|
|
|
|
- OpenWrt NUC Monitor: `/opt/monitor-nuc.sh`
|
|
- Kernel panic config: `/etc/sysctl.d/99-auto-reboot.conf`
|
|
- WireGuard config: `~/wireguard/home-vpn.conf`
|