diff --git a/docs/network-map.md b/docs/network-map.md
new file mode 100644
index 0000000..2833a3e
--- /dev/null
+++ b/docs/network-map.md
@@ -0,0 +1,292 @@
+# 🌐 Network Map & Topology
+
+**Last Updated:** October 31, 2025
+**Network Range:** 192.168.68.0/22
+**Maintained By:** Weston
+
+---
+
+## 📊 Quick Reference
+
+| Device | IP Address | Purpose |
+|--------|-----------|---------|
+| **TP-Link Router** | 192.168.68.1 | Gateway, DHCP, Mesh Primary |
+| **Foxtrot (Gaming PC)** | 192.168.68.50 | Workstation |
+| **Unraid Server (Tower)** | 192.168.68.51 | Main infrastructure |
+| **PiKVM** | 192.168.68.53 | Server out-of-band management |
+| **Pi-hole (Pi Zero 2W)** | 192.168.68.61 | DNS + Ad-blocking + Unbound |
+| **Code-Server VM** | 192.168.68.70 | Ubuntu headless + VS Code |
+| **TP-Link Mesh Node** | 192.168.71.250 | Office WiFi extender |
+
+---
+
+## 🗺️ Physical Network Topology
+
+```
+                          Internet
+                              │
+                              │ (WAN)
+                              │
+                      ┌───────┴────────┐
+                      │ TP-Link Router │
+                      │  192.168.68.1  │
+                      │ (Mesh Primary) │
+                      └───────┬────────┘
+                              │ (LAN - Mesh Network)
+                              │
+         ┌────────────────────┼────────────────────┐
+         │                    │                    │
+    ┌────┴─────┐         ┌────┴─────┐         ┌────┴─────┐
+    │TP-Link   │         │ Unraid   │         │Pi Zero   │
+    │Mesh Node │         │ Server   │         │Pi-hole   │
+    │ .71.250  │         │ Tower    │         │Unbound   │
+    │ (Office) │         │ .68.51   │         │ .68.61   │
+    └────┬─────┘         └────┬─────┘         └──────────┘
+         │                    │
+    ┌────┴────┐         ┌─────┼────┐
+    │         │         │     │    │
+┌───┴───┐ ┌───┴───┐ ┌───┴───┐ │ ┌──┴───┐
+│Foxtrot│ │Laptop │ │ PiKVM │ │ │VM:   │
+│Gaming │ │(WiFi) │ │.68.53 │ │ │Code  │
+│  PC   │ │       │ │(Direct│ │ │Server│
+│.68.50 │ │       │ │to Svr)│ │ │.68.70│
+└───────┘ └───────┘ └───────┘ │ └──────┘
+                              │
+                        (Server VMs)
+```
+
+---
+
+## 🖥️ Unraid Server Virtual Network
+
+```
+Physical: eth0 (2.5GbE) → bond0 → br0 (192.168.68.51)
+                                   │
+              ┌────────────────────┼────────────────────┐
+              │                    │                    │
+         ┌────┴─────┐        ┌─────┴──────┐       ┌─────┴─────┐
+         │   VMs    │        │   Docker   │       │ Tailscale │
+         │          │        │            │       │    VPN    │
+         └────┬─────┘        └─────┬──────┘       └───────────┘
+              │                    │             100.122.220.126
+              │               ┌────┴─────┐
+         ┌────┴─────┐         │ docker0  │
+         │Code-Srvr │         │172.17.0.1│
+         │  .68.70  │         └────┬─────┘
+         │ (Ubuntu) │              │
+         └──────────┘     ┌──────┬─┴─────┬────────┐
+                          │      │       │        │
+                       ┌──┴──┐ ┌─┴──┐ ┌──┴───┐ ┌──┴──┐
+                       │open-│ │NPM │ │Gitea │ │Guac │
+                       │webui│ │ .4 │ │  .3  │ │ .2  │
+                       │ .5  │ └────┘ └──────┘ └─────┘
+                       └─────┘
+```
+
+---
+
+## 📍 Complete IP Address Table
+
+### Infrastructure & Services
+
+| Device/Service | IP Address | MAC | Type | Notes |
+|---------------|-----------|-----|------|-------|
+| **TP-Link Router** | 192.168.68.1 | - | Physical | Gateway, DHCP, primary mesh |
+| **Foxtrot (Gaming PC)** | 192.168.68.50 | - | Physical | Workstation, static IP |
+| **Unraid Server** | 192.168.68.51 | 58:47:ca:7b:97:b0 | Physical | Main server, static IP |
+| **PiKVM** | 192.168.68.53 | - | Physical | Direct to server, management |
+| **Pi-hole (Pi Zero 2W)** | 192.168.68.61 | - | Physical | DNS/ad-block/Unbound, static |
+| **Code-Server VM** | 192.168.68.70 | - | Virtual | Ubuntu + VS Code, KVM/QEMU |
+| **Laptop** | DHCP | - | Physical | Mobile device, WiFi |
+| **TP-Link Mesh Node** | 192.168.71.250 | - | Physical | Office WiFi extender |
+
+### Docker Containers (172.17.0.0/16)
+
+| Container | Docker IP | Host Port | Purpose |
+|-----------|-----------|-----------|---------|
+| **ApacheGuacamole** | 172.17.0.2 | 4000 | Remote desktop gateway |
+| **Gitea** | 172.17.0.3 | 3002, 22 | Git server |
+| **NginxProxyManager** | 172.17.0.4 | 1880, 7818, 18443 | Reverse proxy |
+| **open-webui** | 172.17.0.5 | 3000 | LLM interface |
+| **Cloudflared** | 172.17.0.6 | 46495 | Cloudflare tunnel |
+| **Vaultwarden** | 172.17.0.7 | 4743 | Password manager |
+
+### VPN
+
+| Service | IP | Network | Purpose |
+|---------|----|---------|---------|
+| **Tailscale** | 100.122.220.126 | 100.64.0.0/10 | Secure remote access |
+
+---
+
+## 🌐 Network Details
+
+**Subnet:** 192.168.68.0/22
+**Netmask:** 255.255.252.0
+**Usable Range:** 192.168.68.1 - 192.168.71.254 (1022 hosts)
+**Gateway:** 192.168.68.1
+**Primary DNS:** 192.168.68.61 (Pi-hole)
+**Secondary DNS:** 9.9.9.9 (Quad9)
+**Broadcast:** 192.168.71.255
+
+---
+
+## 🔌 Port Reference Guide
+
+### Unraid Server Ports
+
+| Service | Port | Protocol | URL |
+|---------|------|----------|-----|
+| **Unraid WebUI** | 80 | HTTP | http://192.168.68.51 |
+| **Unraid SSL** | 443 | HTTPS | https://192.168.68.51 |
+| **SMB** | 445 | TCP | \\\\192.168.68.51 |
+| **SSH** | 22 | TCP | ssh root@192.168.68.51 |
+
+### Container Access
+
+| Service | URL | Port | Notes |
+|---------|-----|------|-------|
+| **open-webui** | http://192.168.68.51:3000 | 3000 | LLM chat interface |
+| **Gitea** | http://192.168.68.51:3002 | 3002 | Git web UI |
+| **Gitea (domain)** | https://gitea.segelschiff.app | 443 | Via Cloudflare |
+| **NPM Web** | http://192.168.68.51:1880 | 1880 | Proxy frontend |
+| **NPM Admin** | http://192.168.68.51:7818 | 7818 | Management UI |
+| **Guacamole** | http://192.168.68.51:4000 | 4000 | Remote desktop |
+| **Vaultwarden** | http://192.168.68.51:4743 | 4743 | Password vault |
+
+### Infrastructure Access
+
+| Service | URL | Default Port |
+|---------|-----|--------------|
+| **PiKVM** | https://192.168.68.53 | 443 |
+| **Pi-hole Admin** | http://192.168.68.61/admin | 80 |
+| **Code-Server** | http://192.168.68.70:8080 | 8080 (typical) |
+
+---
+
+## 🛡️ DNS Configuration
+
+**Primary:** Pi-hole (192.168.68.61)
+- Ad-blocking
+- Local DNS records
+- Query logging
+- DHCP relay
+
+**Upstream:** Unbound (same device)
+- Recursive DNS resolver
+- No forwarding to ISP
+- Privacy-focused
+- DNSSEC validation
+
+**Resolution Flow:**
+```
+Client → Pi-hole (192.168.68.61) → Unbound → Root Servers
+```
+
+**Fallback:** 9.9.9.9 (Quad9) - Privacy-respecting public DNS
+
+---
+
+## 🌐 Remote Access
+
+### Cloudflare Tunnel
+```
+Internet → Cloudflare Edge → Tunnel → NPM → Services
+```
+- **Domain:** *.segelschiff.app
+- **Services Exposed:** Gitea (and others via NPM)
+- **Benefits:** No open ports, DDoS protection, SSL
+- **Container:** Cloudflared (172.17.0.6)
+
+### Tailscale VPN
+```
+Remote Device → Encrypted Tunnel → Unraid (100.122.220.126)
+```
+- **Network:** 100.64.0.0/10 (CGNAT)
+- **Protocol:** WireGuard
+- **Benefits:** Zero-trust, peer-to-peer, NAT traversal
+- **Access:** Full homelab as if local
+
+---
+
+## 📊 Network Performance
+
+| Link | Capacity | Usage | Status |
+|------|----------|-------|--------|
+| **Unraid NIC** | 2.5 Gbps | <1% | Underutilized |
+| **Mesh Backhaul** | Unknown | Unknown | Check model specs |
+| **Internet WAN** | Unknown | Unknown | ISP dependent |
+
+**Observed (eth0):** ~2 Mbps average = 0.08% of 2.5G capacity
+
+---
+
+## 🔧 Troubleshooting Commands
+
+### Connectivity Tests
+```bash
+# Test key infrastructure
+ping 192.168.68.1      # Router
+ping 192.168.68.51     # Unraid
+ping 192.168.68.61     # Pi-hole
+ping 192.168.68.70     # Code-Server VM
+ping 8.8.8.8           # Internet
+
+# DNS tests
+nslookup google.com 192.168.68.61    # Test Pi-hole
+dig @192.168.68.61 example.com       # Detailed DNS query
+```
+
+### Network Status (from Unraid)
+```bash
+# Interfaces
+ip addr show
+ip link show
+
+# Routes
+ip route show
+
+# Active connections
+ss -tulpn
+
+# Docker networks
+docker network ls
+docker network inspect bridge
+```
+
+### VM Network (Code-Server)
+```bash
+# List VMs
+virsh list --all
+
+# Get VM IP
+virsh domifaddr <vm-name>
+
+# VM network info
+virsh net-info default
+```
+
+---
+
+## 📝 Recommendations
+
+### Security
+1. ⚠️ **Separate Gitea SSH port** - Currently conflicts with Unraid SSH (both port 22)
+2. ⚠️ **Implement VLANs** - Segment management/services/workstations
+3. ⚠️ **Firewall hardening** - Move from ACCEPT-all to explicit rules
+
+### Performance
+1. Monitor mesh performance between nodes
+2. Document ISP speeds and plan accordingly
+3. Consider 10GbE upgrade path (future)
+
+### Documentation
+1. ✅ Document Code-Server VM configuration
+2. ✅ Record TP-Link mesh model and capabilities
+3. ✅ Map exact ISP speeds and plan
+
+---
+
+**Last Updated:** October 31, 2025
+**Next Review:** When network topology changes
+**Quick Access:** See README.md for service URLs
diff --git a/docs/quick-start.md b/docs/quick-start.md
new file mode 100644
index 0000000..86d861d
--- /dev/null
+++ b/docs/quick-start.md
@@ -0,0 +1,954 @@
+# 🚀 Quick Start & Emergency Recovery Guide
+
+**Purpose:** Get your homelab back online quickly after disaster
+**Target Time:** 30-60 minutes to basic functionality
+**Last Updated:** October 31, 2025
+
+---
+
+## 🎯 Quick Access Reference
+
+### Essential URLs
+
+| Service | URL | Default Credentials |
+|---------|-----|---------------------|
+| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) |
+| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) |
+| **Vaultwarden** | http://192.168.68.51:4743 | Master password |
+| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) |
+| **Pi-hole** | http://192.168.68.61/admin | (your password) |
+| **PiKVM** | https://192.168.68.53 | admin / admin (default) |
+
+### SSH Access
+
+```bash
+# Local network
+ssh root@192.168.68.51
+
+# Via Tailscale (from anywhere)
+ssh root@100.122.220.126
+
+# Emergency: Use PiKVM for console access
+# https://192.168.68.53
+```
+
+---
+
+## 🆘 Emergency Recovery Scenarios
+
+### Scenario 1: Server Won't Boot 🚨
+
+**Symptoms:**
+- No network connectivity to 192.168.68.51
+- Unraid WebUI unreachable
+- No response to ping
+
+**Recovery Steps:**
+
+1. **Physical Check** (via PiKVM or in person)
+   ```
+   [ ] Server has power (check LED)
+   [ ] Network cable connected to eth0
+   [ ] Monitor shows output (via PiKVM)
+   [ ] USB boot drive is present and detected
+   ```
+
+2. **Use PiKVM for Remote Console**
+   - Access: https://192.168.68.53
+   - Login: admin / admin
+   - View boot process
+   - Check BIOS/boot messages
+
+3. **Common Boot Issues**
+
+   **USB Boot Drive Failure** (Most common!)
+   ```
+   Symptoms: "Boot device not found" or similar
+
+   Fix:
+   1. Have backup USB ready
+   2. Shut down server (via PiKVM power control)
+   3. Replace USB boot drive
+   4. Power on
+   5. Restore configuration from backup
+   ```
+
+   **BIOS Settings Changed**
+   ```
+   Fix:
+   1. Enter BIOS (DEL/F2 during boot)
+   2. Load defaults
+   3. Verify boot order (USB first)
+   4. Save and exit
+   ```
+
+   **Hardware Failure**
+   ```
+   Check:
+   1. RAM seated properly
+   2. All drives detected in BIOS
+   3. CPU fan spinning
+   4. No error beeps
+   ```
+
+4. **Boot from Backup USB**
+   ```
+   Steps:
+   1. Power off server
+   2. Insert backup USB boot drive
+   3. Power on
+   4. Verify boot successful
+   5. Restore configuration:
+      - Tools → Flash Backup → Browse → Select backup ZIP
+      - Reboot
+   ```
+
+**Prevention:**
+- ✅ Keep USB flash backup updated (weekly)
+- ✅ Store backup USB in safe location
+- ✅ Document BIOS settings (screenshots via PiKVM)
+
+---
+
+### Scenario 2: Lost Admin Password
+
+**Unraid Root Password Reset:**
+
+1. **Via PiKVM Console**
+   ```
+   1. Access PiKVM: https://192.168.68.53
+   2. View console in browser
+   3. Wait for login prompt
+   4. Press Ctrl+Alt+F2 (via PiKVM keyboard)
+   5. At terminal: passwd root
+   6. Enter new password twice
+   7. Press Ctrl+Alt+F1 to return to GUI
+   8. Update documentation
+   ```
+
+2. **Via Physical Access**
+   ```
+   1. Connect monitor and keyboard to server
+   2. Press Ctrl+Alt+F2
+   3. Run: passwd root
+   4. Set new password
+   5. Press Ctrl+Alt+F1
+   ```
+
+**Container Passwords:**
+- Check `/mnt/user/appdata/<container-name>/config`
+- Review environment variables in Docker templates
+- Use Vaultwarden if accessible
+- Check this documentation repo in Gitea
+
+---
+
+### Scenario 3: Container Won't Start
+
+**Quick Diagnosis:**
+
+```bash
+# Check container status
+docker ps -a | grep <container-name>
+
+# View recent logs
+docker logs --tail 100 <container-name>
+
+# Look for errors
+docker inspect <container-name> | grep -i error
+```
+
+**Common Fixes:**
+
+**Port Conflict:**
+```bash
+# Find what's using the port
+netstat -tulpn | grep <port>
+
+# Example: Port 3000 already in use
+netstat -tulpn | grep 3000
+
+# Stop conflicting service
+docker stop <conflicting-container>
+```
+
+**Volume Permission Issues:**
+```bash
+# Check ownership
+ls -la /mnt/user/appdata/<container-name>
+
+# Fix permissions (Unraid standard: 99:100)
+chown -R 99:100 /mnt/user/appdata/<container-name>
+
+# Example: Fix Vaultwarden
+chown -R 99:100 /mnt/user/appdata/vaultwarden
+```
+
+**Dependency Missing:**
+```bash
+# Example: Guacamole needs MariaDB
+docker start mariadb
+sleep 10  # Wait for database initialization
+docker start ApacheGuacamole
+
+# Verify dependency is running
+docker ps | grep mariadb
+```
+
+**Resource Exhaustion:**
+```bash
+# Check cache usage
+df -h /mnt/cache
+
+# If cache full (>90%), clean up
+docker system prune -a  # ⚠️ REMOVES UNUSED IMAGES!
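+
+# A minimal sketch, not part of the original procedure: `docker system df`
+# is read-only and reports how much space images, containers, and volumes
+# could free, so it is worth running before deciding on the prune above.
+docker system df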
+
+# Or free space manually
+# See service-inventory.md for cleanup recommendations
+```
+
+---
+
+### Scenario 4: Network Connectivity Issues
+
+**Can't Access from LAN:**
+
+```bash
+# SSH into Unraid (via PiKVM if network down)
+ssh root@192.168.68.51
+
+# Check if br0 is up
+ip addr show br0
+# Should show: 192.168.68.51/22
+
+# Verify IP and routes
+ip route | grep default
+# Should show: default via 192.168.68.1
+
+# Test router connectivity
+ping -c 3 192.168.68.1
+
+# Test internet
+ping -c 3 8.8.8.8
+
+# Test DNS (Pi-hole)
+nslookup google.com 192.168.68.61
+```
+
+**Fix Network Issues:**
+
+```bash
+# Restart networking (from console/PiKVM)
+/etc/rc.d/rc.inet1 restart
+
+# If that doesn't work, reboot
+reboot
+```
+
+**Can't Access Containers:**
+
+```bash
+# Check Docker network
+docker network inspect bridge
+
+# Verify container IP
+docker inspect <container-name> | grep IPAddress
+
+# Test from Unraid host
+curl http://172.17.0.5:8080  # Example: open-webui
+
+# Test port mapping
+curl http://192.168.68.51:3000  # Should reach open-webui
+```
+
+**DNS Not Resolving:**
+
+```bash
+# Test Pi-hole directly
+nslookup google.com 192.168.68.61
+
+# If Pi-hole down, check Pi Zero
+ping 192.168.68.61
+
+# SSH to Pi-hole
+ssh pi@192.168.68.61
+
+# Check Pi-hole status
+pihole status
+
+# Restart if needed
+pihole restartdns
+```
+
+---
+
+### Scenario 5: Array Won't Start
+
+**Symptoms:**
+- Unraid GUI accessible but array shows "Stopped"
+- Disks show errors or missing
+
+**Troubleshooting:**
+
+```bash
+# Check disk health
+smartctl -a /dev/sdb  # Parity
+smartctl -a /dev/sdc  # Disk 1
+
+# View disk assignments
+cat /boot/config/disk.cfg
+
+# Check for filesystem errors (read-only check)
+xfs_repair -n /dev/md1p1
+```
+
+**Common Causes:**
+- Parity sync in progress (wait for completion)
+- Disk failed (check SMART, may need replacement)
+- Unclean shutdown (filesystem check required)
+- Disk assignment changed
+
+**Recovery:**
+
+1. **Start Array in Maintenance Mode**
+   - Click "Start" in Unraid GUI
+   - Select "Maintenance mode" if prompted
+   - Run filesystem check if prompted
+
+2. **Review Logs**
+   - Settings → System Log
+   - Look for disk errors
+   - Check for power events
+
+3. **If Disk Failed**
+   - Follow Unraid disk replacement procedure
+   - Do NOT format or write to disk unnecessarily
+   - Seek help in Unraid forums if uncertain
+
+---
+
+## 🔧 Critical Service Restart Procedures
+
+### Restart Core Services (Proper Order)
+
+**1. Infrastructure First:**
+```bash
+# Start reverse proxy (for routing)
+docker start NginxProxyManager
+
+# Wait for it to be ready
+sleep 5
+docker ps | grep NginxProxyManager
+
+# Start tunnel (for remote access)
+docker start Cloudflared
+
+# Verify both running
+docker ps | grep -E "NginxProxyManager|Cloudflared"
+```
+
+**2. Security Services:**
+```bash
+# Password manager (critical!)
+docker start vaultwarden
+
+# Wait for healthy status
+sleep 10
+docker ps | grep vaultwarden
+# Should show "(healthy)"
+
+# If not healthy, check logs
+docker logs --tail 50 vaultwarden
+```
+
+**3. Development Tools:**
+```bash
+# Git server
+docker start Gitea
+
+# Wait for initialization
+sleep 5
+
+# Remote access gateway
+docker start ApacheGuacamole
+# Note: Needs MariaDB if configured
+```
+
+**4. Monitoring (IMPORTANT!):**
+```bash
+# Database first
+docker start Influxdb
+
+# Wait for DB to initialize
+sleep 15
+
+# Then metrics collector
+docker start Telegraf
+
+# Finally visualization
+docker start Grafana
+
+# Verify all running
+docker ps | grep -E "Influxdb|Telegraf|Grafana"
+```
+
+**5. Optional Services:**
+```bash
+# LLM backend
+docker start ollama
+sleep 10
+
+# LLM interface
+docker start open-webui
+
+# Wait for healthy
+docker ps | grep open-webui
+```
+
+---
+
+### Stop All Services Gracefully
+
+```bash
+# Stop all running containers
+docker stop $(docker ps -q)
+
+# Verify all stopped
+docker ps
+# Should show empty output
+
+# Wait before stopping array
+sleep 5
+
+# Stop array (from GUI)
+# Main → Array Operation → Stop
+```
+
+---
+
+## 📦 Backup & Restore Procedures
+
+### USB Flash Backup (Unraid Configuration)
+
+**Create Backup:**
+1. Navigate to: **Main → Flash → Flash Backup**
+2. Click "Backup Now"
+3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`)
+4. Store securely OFF-SERVER:
+   - OneDrive: `/z_Unraid/Backups/`
+   - External drive
+   - Cloud storage
+
+**Restore from Backup:**
+```
+1. Format new USB drive (if needed)
+2. Copy backup ZIP to new USB
+3. Extract contents to root of USB
+   - config/ directory
+   - bzimage, bzroot, etc.
+4. Safely eject USB
+5. Boot from new USB
+6. Configuration restored automatically
+```
+
+**Frequency:**
+- Weekly minimum
+- After ANY configuration change
+- Before major updates
+
+---
+
+### Container Data Backup
+
+**Critical Directories:**
+
+```
+Priority 1 (CRITICAL):
+/mnt/user/appdata/vaultwarden/          🚨 Your passwords!
+/mnt/user/appdata/gitea/                🚨 Your code repositories!
+
+Priority 2 (Important):
+/mnt/user/appdata/NginxProxyManager/    Proxy configs
+/mnt/user/appdata/Grafana/              Dashboards
+/mnt/user/appdata/Influxdb/             Metrics history
+
+Priority 3 (Optional):
+/mnt/user/appdata/open-webui/           LLM chat history
+```
+
+**Quick Backup Script:**
+
+```bash
+#!/bin/bash
+# Save as: /mnt/user/scripts/backup-critical.sh
+
+BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)"
+mkdir -p "$BACKUP_DIR"
+
+echo "Stopping containers..."
+docker stop vaultwarden Gitea NginxProxyManager
+
+echo "Backing up data..."
+tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden
+tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea
+tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager
+
+echo "Restarting containers..."
+docker start vaultwarden Gitea NginxProxyManager
+
+echo "✅ Backup complete: $BACKUP_DIR"
+ls -lh "$BACKUP_DIR"
+```
+
+**Make Executable:**
+```bash
+chmod +x /mnt/user/scripts/backup-critical.sh
+```
+
+**Run Manually:**
+```bash
+/mnt/user/scripts/backup-critical.sh
+```
+
+**Schedule (User Scripts Plugin):**
+- Frequency: Daily at 2 AM
+- Retention: Keep last 30 days
+
+---
+
+**Restore from Backup:**
+
+```bash
+# Example: Restore Vaultwarden
+docker stop vaultwarden
+
+# Backup current (corrupted) data
+mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old
+
+# Extract backup
+tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C /
+
+# Restart container
+docker start vaultwarden
+
+# Verify working
+curl http://192.168.68.51:4743
+```
+
+---
+
+## ⚡ Quick Commands Reference
+
+### System Status
+
+```bash
+# System uptime and load
+uptime
+
+# Resource usage
+free -h
+df -h
+
+# Array status
+cat /proc/mdcmd
+
+# Docker container summary
+docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
+
+# Temperature (if sensors installed)
+sensors
+
+# Disk health quick check
+smartctl -H /dev/sdb  # Parity
+smartctl -H /dev/sdc  # Disk 1
+```
+
+### Docker Quick Commands
+
+```bash
+# Start all stopped containers
+docker start $(docker ps -aq)
+
+# Stop all running containers
+docker stop $(docker ps -q)
+
+# View logs (last 50 lines)
+docker logs --tail 50 <container-name>
+
+# Follow logs in real-time
+docker logs -f <container-name>
+
+# Restart container
+docker restart <container-name>
+
+# Remove container (⚠️ will lose non-volume data!)
+docker rm <container-name>
+
+# Clean up unused resources
+docker system prune            # Safe cleanup
+docker system prune -a         # ⚠️ Removes unused images too!
+docker system prune --volumes  # ⚠️ Removes unused volumes!
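+
+# A minimal sketch, not part of the original list: show only exited
+# containers, to review what a cleanup would touch before removing anything.
+docker ps -a --filter "status=exited" --format "table {{.Names}}\t{{.Status}}"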
+```
+
+### Network Diagnostics
+
+```bash
+# Check all interfaces
+ip addr show
+
+# Test key infrastructure
+ping -c 3 192.168.68.1   # Router
+ping -c 3 192.168.68.51  # Unraid
+ping -c 3 192.168.68.61  # Pi-hole
+ping -c 3 8.8.8.8        # Internet
+
+# DNS resolution test
+nslookup google.com
+nslookup google.com 192.168.68.61  # Test Pi-hole specifically
+
+# Check listening ports
+netstat -tulpn | grep LISTEN
+
+# Test specific port
+nc -zv 192.168.68.51 3002          # Example: Gitea
+curl -I http://192.168.68.51:3002  # HTTP test
+```
+
+### Quick Health Check Script
+
+```bash
+#!/bin/bash
+# Save as: /mnt/user/scripts/health-check.sh
+
+echo "=== Unraid Health Check ==="
+echo ""
+
+echo "1. Array Status:"
+cat /proc/mdcmd | grep mdState
+
+echo ""
+echo "2. Running Containers:"
+docker ps --format "table {{.Names}}\t{{.Status}}"
+
+echo ""
+echo "3. Disk Usage:"
+df -h | grep -E "cache|disk1|Filesystem"
+
+echo ""
+echo "4. Network Connectivity:"
+ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo "  Router: ✅ OK" || echo "  Router: ❌ FAIL"
+ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo "  Internet: ✅ OK" || echo "  Internet: ❌ FAIL"
+ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo "  Pi-hole: ✅ OK" || echo "  Pi-hole: ❌ FAIL"
+
+echo ""
+echo "5. Critical Services:"
+curl -s http://localhost:4743 >/dev/null && echo "  Vaultwarden: ✅ OK" || echo "  Vaultwarden: ❌ DOWN"
+curl -s http://localhost:3002 >/dev/null && echo "  Gitea: ✅ OK" || echo "  Gitea: ❌ DOWN"
+curl -s http://localhost:7818 >/dev/null && echo "  NPM: ✅ OK" || echo "  NPM: ❌ DOWN"
+
+echo ""
+echo "=== Health Check Complete ==="
+```
+
+**Run:** `bash /mnt/user/scripts/health-check.sh`
+
+---
+
+## 📞 Getting Help
+
+### Pre-flight Checks
+
+Before asking for help, gather this information:
+
+1. **System Diagnostics**
+   - Unraid WebGUI: Tools → Diagnostics → Download
+   - Creates ZIP with all logs
+
+2. **Container Logs**
+   ```bash
+   docker logs <container-name> > container-logs.txt
+   ```
+
+3. **Network Configuration**
+   ```bash
+   ip addr show > network-config.txt
+   ip route show >> network-config.txt
+   ```
+
+4. **Disk Status**
+   ```bash
+   smartctl -a /dev/sdb > disk-smart.txt
+   smartctl -a /dev/sdc >> disk-smart.txt
+   ```
+
+### Community Resources
+
+- **Unraid Forums:** https://forums.unraid.net/
+  - Post diagnostics ZIP
+  - Be specific about symptoms
+  - Include what you've tried
+
+- **r/unraid:** https://reddit.com/r/unraid
+  - Quick questions
+  - Share diagnostics in pastebin
+
+- **Discord:** Unraid Official Discord
+  - Real-time help
+  - Active community
+
+### Emergency Contacts
+
+```
+ISP Support: [Your ISP Phone Number]
+Unraid License: [Store in secure location]
+USB Backup Location: [Document where stored]
+Off-site Backup: [If applicable]
+```
+
+---
+
+## 🎓 Post-Recovery Checklist
+
+After restoring from disaster:
+
+```
+[ ] Unraid array started successfully
+[ ] All critical services running
+    [ ] NginxProxyManager
+    [ ] Cloudflared
+    [ ] Vaultwarden
+    [ ] Gitea
+[ ] Network connectivity verified
+    [ ] Can access Unraid WebUI
+    [ ] Can ping router (192.168.68.1)
+    [ ] Internet working
+    [ ] DNS resolving (Pi-hole)
+[ ] Vaultwarden accessible (test password retrieval)
+[ ] Gitea accessible (verify repositories intact)
+[ ] NPM routing working (test reverse proxy)
+[ ] Monitoring stack restarted
+    [ ] Grafana
+    [ ] InfluxDB
+    [ ] Telegraf
+[ ] External access working
+    [ ] Tailscale connected
+    [ ] Cloudflare tunnel active
+[ ] Backups verified and up-to-date
+[ ] Documentation updated with lessons learned
+[ ] Incident documented in change log (Gitea)
+```
+
+---
+
+## 🔒 Security After Recovery
+
+**Immediately After Disaster Recovery:**
+
+1. **Change Passwords** (if compromise suspected)
+   ```
+   [ ] Unraid root password
+   [ ] Vaultwarden master password
+   [ ] Container admin passwords
+   [ ] Pi-hole admin password
+   [ ] PiKVM password
+   ```
+
+2. **Review Access Logs**
+   ```bash
+   # Check SSH attempts
+   grep "Failed password" /var/log/auth.log | tail -50
+
+   # Check NPM access
+   docker logs NginxProxyManager | grep -i error
+
+   # Check Gitea access
+   docker logs Gitea | grep -i login
+   ```
+
+3. **Verify Firewall Rules**
+   ```bash
+   iptables -L -n -v
+   ```
+
+4. **Check for Unauthorized Changes**
+   ```bash
+   # Review Docker containers
+   docker ps -a
+
+   # Check cron jobs
+   crontab -l
+
+   # Review network interfaces
+   ip addr show
+   ```
+
+---
+
+## 📝 Documentation Updates After Incident
+
+**What to Document:**
+
+1. **What Happened:**
+   - Date/time of incident
+   - Symptoms observed
+   - Root cause (if determined)
+   - Duration of outage
+
+2. **What You Did:**
+   - Steps taken to recover
+   - What worked / didn't work
+   - Resources used (forums, docs, etc.)
+   - Time to recovery
+
+3. **Lessons Learned:**
+   - What could prevent this in future
+   - Process improvements needed
+   - Documentation gaps discovered
+   - Backup improvements needed
+
+4. **Action Items:**
+   - Backups to implement/improve
+   - Monitoring to add
+   - Scripts to create
+   - Hardware to replace/upgrade
+
+**Where to Document:**
+- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md`
+- Update this quick-start guide with new procedures
+- Add to troubleshooting section if recurring issue
+- Commit to Gitea with detailed message
+
+---
+
+## 🚀 Normal Startup Sequence
+
+**From Cold Boot:**
+
+```
+1. Power on server
+   ↓
+2. BIOS POST (~30 seconds)
+   - Hardware check
+   - Memory test
+   - Drive detection
+   ↓
+3. Unraid boots from USB (~1-2 minutes)
+   - Linux kernel loads
+   - Unraid OS starts
+   ↓
+4. Network initializes
+   - br0 interface up
+   - Gets IP: 192.168.68.51
+   ↓
+5. Array auto-starts (if configured)
+   - Parity disk: sdb
+   - Data disk: sdc
+   - Cache: nvme1n1p1
+   ↓
+6. Docker service starts
+   - docker0 bridge created
+   - Networks initialized
+   ↓
+7. Containers auto-start (if enabled)
+   - Infrastructure services first
+   - Then application services
+   ↓
+8. Services available (~3-5 minutes total)
+   ✅ Ready to use!
+```
+
+**Expected Boot Time:** 3-5 minutes
+**If Taking Longer:** Check system log for errors
+
+---
+
+## 🎯 Quick Health Check Command
+
+**Run After Any Restart:**
+
+```bash
+# Quick one-liner health check (each check runs even if an earlier one fails)
+docker ps --format "table {{.Names}}\t{{.Status}}" ; \
+df -h | grep -E "cache|disk1" ; \
+ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL"
+```
+
+---
+
+## 📚 Related Documentation
+
+- **Network Issues:** See `network-map.md`
+- **Service Details:** See `service-inventory.md`
+- **Container Configs:** See `docker-compose/` (when created)
+- **Main Overview:** See `README.md`
+
+---
+
+## 🆘 True Emergency - Complete System Down
+
+**If everything is down and you need immediate help:**
+
+1. **Access via PiKVM**
+   - https://192.168.68.53
+   - Get console access
+   - View what's happening
+
+2. **Check Physical Server**
+   - Power LED on?
+   - Fans spinning?
+   - Drives spinning up?
+   - Network activity lights?
+
+3. **Try Safe Mode Boot**
+   - Boot Unraid in Safe Mode (GUI mode)
+   - Diagnose from console
+
+4. **Community Help**
+   - Unraid Discord (fastest response)
+   - Forums with diagnostics ZIP
+   - r/unraid for quick questions
+
+5. **Document Everything**
+   - Take photos/screenshots via PiKVM
+   - Note exact error messages
+   - Record what you tried
+   - Timeline of events
+
+---
+
+## 💡 Pro Tips
+
+1. **Test Your Backups**
+   - Restore test annually
+   - Verify data integrity
+   - Practice recovery procedures
+
+2. **Keep This Guide Accessible**
+   - Save offline copy to phone/laptop
+   - Print critical sections
+   - Bookmark in browser
+
+3. **Automate Where Possible**
+   - Schedule backup scripts
+   - Set up monitoring alerts
+   - Use User Scripts plugin
+
+4. 
**Document As You Go**
+   - Update after fixing issues
+   - Add new procedures discovered
+   - Note what worked/didn't work
+
+---
+
+**Last Updated:** October 31, 2025
+**Next Review:** Quarterly or after incidents
+**Maintained By:** Weston
+
+---
+
+**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help!
+
+**Keep this guide accessible even when the server is down!**
+💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive!
+
+🚀 **You've got this!**
diff --git a/docs/service-inventory.md b/docs/service-inventory.md
new file mode 100644
index 0000000..3c8a393
--- /dev/null
+++ b/docs/service-inventory.md
@@ -0,0 +1,614 @@
+# 📦 Service Inventory - Complete Container Catalog
+
+**Last Updated:** October 31, 2025
+**Total Containers:** 32 (6 running, 26 stopped)
+**Purpose:** Comprehensive catalog of all services
+
+---
+
+## 📊 Quick Stats
+
+| Metric | Value | Status |
+|--------|-------|--------|
+| **Total Containers** | 32 | - |
+| **Running** | 6 | ✅ 19% |
+| **Stopped** | 26 | ⚠️ 81% |
+| **Total Docker Images** | ~50GB | ⚠️ High |
+| **Cache Usage** | 578GB / 932GB | ⚠️ 63% |
+
+**Key Insight:** 81% of containers are stopped - cleanup opportunity!
+
+---
+
+## 🟢 Running Services (6 containers)
+
+### 1. open-webui ⭐⭐⭐
+
+**Status:** Running (healthy)
+**Container:** open-webui
+**Image:** ghcr.io/open-webui/open-webui:main (4.55GB)
+**Created:** 2025-10-16 (2 weeks ago)
+**Network:** bridge (172.17.0.5)
+**Ports:** 8080 → 3000
+
+**Resources:**
+- CPU: 0.15%
+- Memory: 1.026GB / 60.55GB (1.69%)
+- Storage: 42.4MB
+
+**Purpose:** LLM chat interface (ChatGPT-like UI for local models)
+
+**Dependencies:**
+- ollama (currently STOPPED ❌)
+- OpenAI API key (configured)
+
+**Access:**
+- Local: http://192.168.68.51:3000
+- No authentication by default
+
+**Issues:**
+- ⚠️ Depends on ollama container which is stopped
+- ⚠️ OpenAI API key exposed in environment variables
+
+**Recommendations:**
+1. ✅ **KEEP** - Active LLM interface
+2. Restart ollama container to enable local models
+3. Move API keys to Docker secrets
+4. Enable authentication
+
+**Priority:** HIGH - Core AI/ML service
+
+---
+
+### 2. NginxProxyManager ⭐⭐⭐
+
+**Status:** Running
+**Container:** NginxProxyManager
+**Image:** jlesage/nginx-proxy-manager (189MB)
+**Created:** 2025-10-11 (3 weeks ago)
+**Network:** bridge (172.17.0.4)
+**Ports:** 4443→18443, 8080→1880, 8181→7818
+
+**Resources:**
+- CPU: 0.08%
+- Memory: 77.45MB (0.12%)
+- Storage: 13.4KB
+
+**Purpose:** Reverse proxy with web UI - SSL termination and routing
+
+**Dependencies:** None
+
+**Access:**
+- Admin UI: http://192.168.68.51:7818
+- HTTP: http://192.168.68.51:1880
+- HTTPS: https://192.168.68.51:18443
+
+**Configuration:**
+- Routes traffic to backend services
+- Manages SSL certificates
+- Provides access control
+
+**Recommendations:**
+1. ✅ **KEEP** - Critical infrastructure
+2. Document all proxy rules in Gitea
+3. Verify SSL auto-renewal is configured
+4. Enable MFA if available
+5. Review access logs regularly
+
+**Priority:** CRITICAL - Core infrastructure
+
+---
+
+### 3. Gitea ⭐⭐⭐
+
+**Status:** Running
+**Container:** Gitea
+**Image:** gitea/gitea (180MB)
+**Created:** 2025-10-08 (3 weeks ago)
+**Network:** bridge (172.17.0.3)
+**Ports:** 22→22, 3000→3002
+
+**Resources:**
+- CPU: 0.11%
+- Memory: 114.5MB (0.18%)
+- Storage: 113MB (active repositories!)
+
+**Purpose:** Self-hosted Git server (GitHub alternative)
+
+**Dependencies:** None (internal SQLite)
+
+**Access:**
+- Web: http://192.168.68.51:3002
+- Domain: https://gitea.segelschiff.app
+- SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH)
+
+**Configuration:**
+- Using latest tag (unpinned version)
+- Storage: /mnt/user/appdata/gitea
+
+**Issues:**
+- ⚠️ SSH port 22 conflicts with Unraid SSH
+- ⚠️ Using `latest` tag (version not pinned)
+- ⚠️ Backup strategy unknown
+
+**Recommendations:**
+1. ✅ **KEEP** - Critical for version control
+2. Change SSH port to 2222 to avoid conflict
+3. Pin to specific version tag
+4. Implement automated backups (CRITICAL!)
+5. This is your version control hub - protect it!
+
+**Priority:** CRITICAL - Infrastructure documentation depends on this
+
+---
+
+### 4. ApacheGuacamole ⭐⭐
+
+**Status:** Running (2+ months uptime!)
+**Container:** ApacheGuacamole
+**Image:** jasonbean/guacamole (737MB)
+**Created:** 2025-08-22 (2+ months ago)
+**Network:** bridge (172.17.0.2)
+**Ports:** 8080→4000
+
+**Resources:**
+- CPU: 0.16%
+- Memory: 785.8MB (1.27%)
+- Storage: 46.2MB
+
+**Purpose:** Clientless remote desktop gateway (RDP/VNC/SSH via browser)
+
+**Dependencies:**
+- MariaDB (STOPPED ❌) - **BROKEN DEPENDENCY!**
+
+**Access:**
+- Web: http://192.168.68.51:4000
+
+**Configuration:**
+- MySQL enabled but MariaDB stopped
+- Multiple auth modules: MySQL, LDAP, TOTP, etc.
+
+**Issues:**
+- 🚨 **CRITICAL:** Depends on MariaDB which is stopped!
+- Currently using embedded database (not recommended)
+- Data loss risk without proper database backend
+
+**Recommendations:**
+1. ⚠️ **FIX IMMEDIATELY** - Restart MariaDB or reconfigure
+2. If keeping: Start MariaDB and verify connection
+3. If not using: Stop Guacamole and remove both
+4. Document your use case for remote desktop access
+
+**Priority:** MEDIUM - Fix dependency or remove
+
+---
+
+### 5. Cloudflared ⭐⭐⭐
+
+**Status:** Running (2.5+ months - very stable!)
+**Container:** Unraid-Cloudflared-Tunnel
+**Image:** figro/unraid-cloudflared-tunnel (8.92MB)
+**Created:** 2025-08-10 (2.5+ months ago)
+**Network:** bridge (172.17.0.6)
+**Ports:** 46495→46495 (metrics)
+
+**Resources:**
+- CPU: 0.33% (highest of running containers)
+- Memory: 68.6MB (0.11%)
+- Network I/O: 41.7MB RX / 310KB TX
+
+**Purpose:** Cloudflare Tunnel - secure external access without port forwarding
+
+**Dependencies:** None
+
+**Access:**
+- Metrics: http://192.168.68.51:46495
+- Domain: *.segelschiff.app (managed via Cloudflare)
+
+**Configuration:**
+- Tunnel token configured
+- No auto-update enabled
+- Metrics exposed for monitoring
+
+**Security:**
+- ⚠️ Tunnel token in plain text environment variable
+- ✅ No open ports on router (excellent!)
+
+**Recommendations:**
+1. ✅ **KEEP** - Excellent security practice
+2. Rotate tunnel token periodically
+3. Document which services are exposed
+4. Integrate metrics with monitoring stack
+
+**Priority:** HIGH - Critical for secure remote access
+
+---
+
+### 6. Vaultwarden ⭐⭐⭐
+
+**Status:** Running (healthy) - 3+ months uptime!
+**Container:** vaultwarden
+**Image:** vaultwarden/server (256MB)
+**Created:** 2025-07-31 (3+ months ago)
+**Network:** bridge (172.17.0.7)
+**Ports:** 80→4743
+
+**Resources:**
+- CPU: 0.00% (idle)
+- Memory: 24.96MB (0.04%) - Very lightweight!
+ +**Purpose:** Self-hosted password manager (Bitwarden compatible) + +**Dependencies:** None + +**Access:** +- Web: http://192.168.68.51:4743 +- Admin: http://192.168.68.51:4743/admin + +**Configuration:** +- Signups allowed: true ⚠️ +- Invitations allowed: false βœ… +- WebSocket disabled ⚠️ +- Admin token exposed ⚠️ + +**Issues:** +- 🚨 **CRITICAL:** No backup strategy evident! +- ⚠️ Admin token in plain text +- ⚠️ Signups open (verify intentional) +- ⚠️ WebSocket disabled (reduces functionality) + +**Recommendations:** +1. βœ… **KEEP** - Critical security infrastructure +2. 🚨 **IMPLEMENT BACKUP IMMEDIATELY** - This is your password vault! +3. Close signups after initial setup +4. Rotate admin token and use secrets management +5. Enable WebSocket for better sync +6. Automate daily backups to off-site location + +**Priority:** CRITICAL - Contains all your passwords! + +--- + +## πŸ”΄ Recently Stopped Services (Worth Investigating) + +### 7. ollama ⚠️ + +**Status:** Exited (128) 4 minutes ago +**Image:** ollama/ollama (3.33GB) +**Purpose:** Local LLM inference engine + +**Why It Matters:** open-webui depends on this! + +**Recommendations:** +1. πŸ”§ **RESTART** - Required for open-webui local models +2. Investigate exit code 128 (configuration issue?) +3. Configure GPU acceleration (RTX 4090!) +4. Test with open-webui after restart + +**Action:** `docker start ollama && docker logs -f ollama` + +--- + +### 8. Monitoring Stack (Stopped 12 days ago) 🚨 + +**Containers:** +- Grafana (stopped 12 days) +- InfluxDB (stopped 12 days) +- Telegraf (stopped 12 days) + +**Total Size:** ~1.7GB + +**Why Critical:** Zero observability into system health! + +**Recommendations:** +1. 🚨 **RESTART IMMEDIATELY** - Priority 1! +2. Configure dashboards for: + - Docker container stats + - System resources (CPU, RAM, disk) + - Network traffic + - Temperature sensors +3. Set up alerting for critical issues +4. 
Document in runbook + +**Action:** +```bash +docker start Influxdb +sleep 15 # Wait for DB initialization +docker start Telegraf +docker start Grafana +``` + +--- + +### 9. MariaDB (Stopped 12 days ago) ⚠️ + +**Status:** Exited (0) 12 days ago +**Image:** lscr.io/linuxserver/mariadb (348MB) +**Purpose:** MySQL database for Guacamole + +**Issue:** Guacamole is running but database is stopped! + +**Recommendations:** +1. If using Guacamole: **RESTART** +2. If not using Guacamole: **REMOVE BOTH** +3. Document decision + +--- + +### 10. Database Admin Tools (Stopped 12 days ago) + +**CloudBeaver** - Stopped 12 days +**adminer** - Stopped 12 days + +**Issue:** Two database admin tools - redundant! + +**Recommendations:** +1. **CHOOSE ONE:** + - CloudBeaver: Feature-rich (725MB) + - adminer: Lightweight (118MB) +2. Remove the other +3. Only restart if you need database management + +--- + +## 🟑 Experimental / Inactive Services (Decision Needed) + +### 11. Nextcloud AIO Stack (7 containers!) 🚨 + +**Status:** All stopped 3 weeks ago +**Total Size:** ~7GB Docker images + data +**Containers:** +- nextcloud-aio-mastercontainer +- nextcloud-aio-apache +- nextcloud-aio-nextcloud (2.19GB) +- nextcloud-aio-database (PostgreSQL) +- nextcloud-aio-redis +- nextcloud-aio-onlyoffice (3.79GB!) +- nextcloud-aio-imaginary +- nextcloud-aio-notify-push + +**Data:** /mnt/user/nextcloud (~1GB+) + +**Analysis:** +- Massive resource footprint +- "All-in-One" = heavy coupling +- Stopped for 3 weeks suggests not critical + +**Recommendations:** +**DECISION REQUIRED:** + +**Option A: Remove Everything** +```bash +# Backup data first! 
+cp -r /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d) + +# Remove containers (docker rm does not expand wildcards in container names, +# so list the matching container IDs with a name filter instead) +docker rm $(docker ps -aq --filter "name=nextcloud-aio-") + +# Remove images to free space +docker rmi $(docker images | grep nextcloud | awk '{print $3}') + +# Archive data +tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud +``` +**Saves:** ~7GB+ space + +**Option B: Keep and Restart** +- Document why you need it +- Create restart procedure +- Implement backup strategy +- Monitor resource usage + +**My Recommendation:** Remove unless actively needed. Nextcloud is great, but this All-in-One stack is heavy. + +--- + +### 12. Jellyfin (Stopped 2 weeks ago) ⚠️ + +**Status:** Exited (0) 2 weeks ago +**Image:** jellyfin/jellyfin (1.25GB) +**GPU:** RTX 4090 allocated but idle! + +**Media:** +- Movies: /mnt/user/movies +- TV: /mnt/user/tv shows +- Music: /mnt/user/music + +**Issue:** $1600 GPU sitting idle! + +**Recommendations:** +**If you want a media server:** +1. **RESTART** with hardware transcoding: + ```bash + docker start Jellyfin + ``` +2. Configure NVENC/NVDEC for RTX 4090 +3. Test 4K transcoding performance +4. Switch from `host` network to bridge (security) + +**If you don't need a media server:** +1. Remove GPU allocation from container +2. Free GPU for other projects (AI/ML) + +**Action Required:** Decide on media server strategy + +--- + +### 13. Large AI/ML Containers (Rarely Used) + +**ebook2audiobook** - 20.06GB! (stopped 3 weeks) +**docling-serve** - 14.45GB! (stopped 2 weeks) + +**Total:** 34.5GB for two containers! + +**Analysis:** +- Massive images +- Rarely used (stopped weeks ago) +- Experimental/one-time use? + +**Recommendations:** +1. **REMOVE** both to free 34.5GB +2. If needed again, pull fresh images +3. Document use cases if keeping + +**Potential Savings:** 34.5GB cache space! + +--- + +### 14. 
Productivity Suite (Multiple Stopped) + +**baserow** - Stopped 2 weeks (2.25GB) +**NocoDB** - Stopped 3 weeks (588MB) +**OpenProject** - Stopped 7 weeks (2.87GB) + +**Issue:** Three project management tools - redundant! + +**Recommendations:** +1. **CHOOSE ONE** (or none if not used) +2. Remove the others +3. Migrate data if needed first + +**Potential Savings:** ~5GB + +--- + +### 15. Development Tools + +**n8n** (workflow automation) - Created but never started +**steam-headless** - Created but not running + +**Recommendations:** +- Document if you have plans for these +- Remove if experimental and abandoned + +--- + +## πŸ“‹ Container Decision Matrix + +| Container | Keep? | Action | Priority | +|-----------|-------|--------|----------| +| **open-webui** | βœ… Yes | Keep running, restart ollama | HIGH | +| **NginxProxyManager** | βœ… Yes | Keep, document configs | CRITICAL | +| **Gitea** | βœ… Yes | Keep, fix SSH port, backup | CRITICAL | +| **ApacheGuacamole** | ⚠️ Decide | Fix MariaDB OR remove both | MEDIUM | +| **Cloudflared** | βœ… Yes | Keep, rotate token | HIGH | +| **Vaultwarden** | βœ… Yes | Keep, BACKUP NOW! | CRITICAL | +| **ollama** | βœ… Yes | Restart immediately | HIGH | +| **Monitoring Stack** | βœ… Yes | Restart all 3 containers | CRITICAL | +| **MariaDB** | ⚠️ Conditional | If Guacamole stays | MEDIUM | +| **Nextcloud AIO (7)** | ❌ Remove | Backup data, remove stack | LOW | +| **Jellyfin** | ⚠️ Decide | Use GPU or remove | MEDIUM | +| **ebook2audiobook** | ❌ Remove | Free 20GB | LOW | +| **docling-serve** | ❌ Remove | Free 14.5GB | LOW | +| **baserow/NocoDB/OpenProject** | ❌ Choose 1 | Remove others | LOW | +| **CloudBeaver/adminer** | ⚠️ Choose 1 | Keep one DB admin | LOW | + +--- + +## 🎯 Recommended Action Plan + +### Phase 1: Critical (Do First!) 🚨 + +1. **Backup Vaultwarden** (30 min) + ```bash + docker stop vaultwarden + tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden + docker start vaultwarden + ``` + +2. 
**Backup Gitea** (30 min) + ```bash + docker stop Gitea + tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea + docker start Gitea + ``` + +3. **Restart Monitoring Stack** (15 min) + ```bash + docker start Influxdb && sleep 15 + docker start Telegraf Grafana + # Configure dashboards + ``` + +4. **Restart ollama** (5 min) + ```bash + docker start ollama + docker logs -f ollama + ``` + +### Phase 2: Cleanup (Free Space!) πŸ’Ύ + +5. **Remove Large Unused Containers** (1 hour) + - ebook2audiobook (20GB) + - docling-serve (14.5GB) + - Nextcloud AIO stack (7GB) + - **Saves: ~41GB!** + +6. **Docker System Cleanup** + ```bash + docker system prune -a + # Free unused images and build cache + ``` + +### Phase 3: Decisions (This Week) + +7. **Guacamole + MariaDB** - Keep or remove? +8. **Jellyfin** - Restart with GPU or remove? +9. **Productivity tools** - Choose one, remove others +10. **Database admin** - CloudBeaver or adminer? + +--- + +## πŸ“Š Storage Cleanup Impact + +**Current Cache Usage:** 578GB / 932GB (63%) + +**After Recommended Cleanup:** +- Remove ebook2audiobook: -20GB +- Remove docling-serve: -14.5GB +- Remove Nextcloud AIO: -7GB +- Docker system prune: ~10-20GB +- **Total Freed: ~50-60GB** + +**New Cache Usage:** ~520GB / 932GB (56%) βœ… + +--- + +## πŸ” Security Recommendations + +1. **Secrets Management** - Stop using plain text env vars +2. **Close Open Signups** - Vaultwarden signups should be closed +3. **SSH Port Conflict** - Fix Gitea port 22 conflict +4. **Network Mode** - Move Jellyfin from `host` to `bridge` +5. **Version Pinning** - Stop using `latest` tags + +--- + +## πŸ“ˆ Resource Summary + +**Docker Images Total:** ~50GB +**Container Data:** Varies by appdata +**Cache Impact:** High (63% full) + +**Top Resource Consumers (Images):** +1. ebook2audiobook: 20.06GB +2. docling-serve: 14.45GB +3. Nextcloud stack: ~7GB +4. open-webui: 4.55GB +5. OpenProject: 2.87GB + +--- + +## πŸŽ“ Key Takeaways + +1. 
**6 services are your core** - Keep these running +2. **26 stopped containers** - Cleanup opportunity +3. **~41GB can be freed** - From the large unused containers alone (~50-60GB including a Docker system prune) +4. **No monitoring** - Critical gap (restart Grafana stack!) +5. **Backups are critical** - Vaultwarden and Gitea MUST be backed up + +--- + +**Last Updated:** October 31, 2025 +**Next Review:** After cleanup actions completed +**Maintained By:** Weston diff --git a/quick-start.md b/quick-start.md new file mode 100644 index 0000000..86d861d --- /dev/null +++ b/quick-start.md @@ -0,0 +1,954 @@ +# 🚀 Quick Start & Emergency Recovery Guide + +**Purpose:** Get your homelab back online quickly after a disaster +**Target Time:** 30-60 minutes to basic functionality +**Last Updated:** October 31, 2025 + +--- + +## 🎯 Quick Access Reference + +### Essential URLs + +| Service | URL | Default Credentials | +|---------|-----|---------------------| +| **Unraid Dashboard** | http://192.168.68.51 | root / (your password) | +| **Gitea** | https://gitea.segelschiff.app | Weston / (your password) | +| **Vaultwarden** | http://192.168.68.51:4743 | Master password | +| **NPM Admin** | http://192.168.68.51:7818 | admin@example.com / changeme (first login) | +| **Pi-hole** | http://192.168.68.61/admin | (your password) | +| **PiKVM** | https://192.168.68.53 | admin / admin (default) | + +### SSH Access + +```bash +# Local network +ssh root@192.168.68.51 + +# Via Tailscale (from anywhere) +ssh root@100.122.220.126 + +# Emergency: Use PiKVM for console access +# https://192.168.68.53 +``` + +--- + +## 🆘 Emergency Recovery Scenarios + +### Scenario 1: Server Won't Boot 🚨 + +**Symptoms:** +- No network connectivity to 192.168.68.51 +- Unraid WebUI unreachable +- No response to ping + +**Recovery Steps:** + +1. **Physical Check** (via PiKVM or in person) + ``` + [ ] Server has power (check LED) + [ ] Network cable connected to eth0 + [ ] Monitor shows output (via PiKVM) + [ ] USB boot drive is present and detected + ``` + +2. 
**Use PiKVM for Remote Console** + - Access: https://192.168.68.53 + - Login: admin / admin + - View boot process + - Check BIOS/boot messages + +3. **Common Boot Issues** + + **USB Boot Drive Failure** (Most common!) + ``` + Symptoms: "Boot device not found" or similar + + Fix: + 1. Have backup USB ready + 2. Shut down server (via PiKVM power control) + 3. Replace USB boot drive + 4. Power on + 5. Restore configuration from backup + ``` + + **BIOS Settings Changed** + ``` + Fix: + 1. Enter BIOS (DEL/F2 during boot) + 2. Load defaults + 3. Verify boot order (USB first) + 4. Save and exit + ``` + + **Hardware Failure** + ``` + Check: + 1. RAM seated properly + 2. All drives detected in BIOS + 3. CPU fan spinning + 4. No error beeps + ``` + +4. **Boot from Backup USB** + ``` + Steps: + 1. Power off server + 2. Insert backup USB boot drive + 3. Power on + 4. Verify boot successful + 5. Restore configuration: + - Tools β†’ Flash Backup β†’ Browse β†’ Select backup ZIP + - Reboot + ``` + +**Prevention:** +- βœ… Keep USB flash backup updated (weekly) +- βœ… Store backup USB in safe location +- βœ… Document BIOS settings (screenshots via PiKVM) + +--- + +### Scenario 2: Lost Admin Password + +**Unraid Root Password Reset:** + +1. **Via PiKVM Console** + ``` + 1. Access PiKVM: https://192.168.68.53 + 2. View console in browser + 3. Wait for login prompt + 4. Press Ctrl+Alt+F2 (via PiKVM keyboard) + 5. At terminal: passwd root + 6. Enter new password twice + 7. Press Ctrl+Alt+F1 to return to GUI + 8. Update documentation + ``` + +2. **Via Physical Access** + ``` + 1. Connect monitor and keyboard to server + 2. Press Ctrl+Alt+F2 + 3. Run: passwd root + 4. Set new password + 5. 
Press Ctrl+Alt+F1 + ``` + +**Container Passwords:** +- Check `/mnt/user/appdata//config` +- Review environment variables in Docker templates +- Use Vaultwarden if accessible +- Check this documentation repo in Gitea + +--- + +### Scenario 3: Container Won't Start + +**Quick Diagnosis:** + +```bash +# Check container status +docker ps -a | grep + +# View recent logs +docker logs --tail 100 + +# Look for errors +docker inspect | grep -i error +``` + +**Common Fixes:** + +**Port Conflict:** +```bash +# Find what's using the port +netstat -tulpn | grep + +# Example: Port 3000 already in use +netstat -tulpn | grep 3000 + +# Stop conflicting service +docker stop +``` + +**Volume Permission Issues:** +```bash +# Check ownership +ls -la /mnt/user/appdata/ + +# Fix permissions (Unraid standard: 99:100) +chown -R 99:100 /mnt/user/appdata/ + +# Example: Fix Vaultwarden +chown -R 99:100 /mnt/user/appdata/vaultwarden +``` + +**Dependency Missing:** +```bash +# Example: Guacamole needs MariaDB +docker start mariadb +sleep 10 # Wait for database initialization +docker start ApacheGuacamole + +# Verify dependency is running +docker ps | grep mariadb +``` + +**Resource Exhaustion:** +```bash +# Check cache usage +df -h /mnt/cache + +# If cache full (>90%), clean up +docker system prune -a # ⚠️ REMOVES UNUSED IMAGES! 
+ +# Or free space manually +# See service-inventory.md for cleanup recommendations +``` + +--- + +### Scenario 4: Network Connectivity Issues + +**Can't Access from LAN:** + +```bash +# SSH into Unraid (via PiKVM if network down) +ssh root@192.168.68.51 + +# Check if br0 is up +ip addr show br0 +# Should show: 192.168.68.51/22 + +# Verify IP and routes +ip route | grep default +# Should show: default via 192.168.68.1 + +# Test router connectivity +ping -c 3 192.168.68.1 + +# Test internet +ping -c 3 8.8.8.8 + +# Test DNS (Pi-hole) +nslookup google.com 192.168.68.61 +``` + +**Fix Network Issues:** + +```bash +# Restart networking (from console/PiKVM) +/etc/rc.d/rc.inet1 restart + +# If that doesn't work, reboot +reboot +``` + +**Can't Access Containers:** + +```bash +# Check Docker network +docker network inspect bridge + +# Verify container IP +docker inspect | grep IPAddress + +# Test from Unraid host +curl http://172.17.0.5:8080 # Example: open-webui + +# Test port mapping +curl http://192.168.68.51:3000 # Should reach open-webui +``` + +**DNS Not Resolving:** + +```bash +# Test Pi-hole directly +nslookup google.com 192.168.68.61 + +# If Pi-hole down, check Pi Zero +ping 192.168.68.61 + +# SSH to Pi-hole +ssh pi@192.168.68.61 + +# Check Pi-hole status +pihole status + +# Restart if needed +pihole restartdns +``` + +--- + +### Scenario 5: Array Won't Start + +**Symptoms:** +- Unraid GUI accessible but array shows "Stopped" +- Disks show errors or missing + +**Troubleshooting:** + +```bash +# Check disk health +smartctl -a /dev/sdb # Parity +smartctl -a /dev/sdc # Disk 1 + +# View disk assignments +cat /boot/config/disk.cfg + +# Check for filesystem errors (read-only check) +xfs_repair -n /dev/md1p1 +``` + +**Common Causes:** +- Parity sync in progress (wait for completion) +- Disk failed (check SMART, may need replacement) +- Unclean shutdown (filesystem check required) +- Disk assignment changed + +**Recovery:** + +1. 
**Start Array in Maintenance Mode** + - Click "Start" in Unraid GUI + - Select "Maintenance mode" if prompted + - Run filesystem check if prompted + +2. **Review Logs** + - Settings → System Log + - Look for disk errors + - Check for power events + +3. **If Disk Failed** + - Follow Unraid disk replacement procedure + - Do NOT format or write to disk unnecessarily + - Seek help in Unraid forums if uncertain + +--- + +## 🔧 Critical Service Restart Procedures + +### Restart Core Services (Proper Order) + +**1. Infrastructure First:** +```bash +# Start reverse proxy (for routing) +docker start NginxProxyManager + +# Wait for it to be ready +sleep 5 +docker ps | grep NginxProxyManager + +# Start tunnel (for remote access) +# Container name per service-inventory.md: Unraid-Cloudflared-Tunnel +docker start Unraid-Cloudflared-Tunnel + +# Verify both running +docker ps | grep -E "NginxProxyManager|Cloudflared" +``` + +**2. Security Services:** +```bash +# Password manager (critical!) +docker start vaultwarden + +# Wait for healthy status +sleep 10 +docker ps | grep vaultwarden +# Should show "(healthy)" + +# If not healthy, check logs +docker logs --tail 50 vaultwarden +``` + +**3. Development Tools:** +```bash +# Git server +docker start Gitea + +# Wait for initialization +sleep 5 + +# Remote access gateway +docker start ApacheGuacamole +# Note: Needs MariaDB if configured +``` + +**4. Monitoring (IMPORTANT!):** +```bash +# Database first +docker start Influxdb + +# Wait for DB to initialize +sleep 15 + +# Then metrics collector +docker start Telegraf + +# Finally visualization +docker start Grafana + +# Verify all running +docker ps | grep -E "Influxdb|Telegraf|Grafana" +``` + +**5. 
Optional Services:** +```bash +# LLM backend +docker start ollama +sleep 10 + +# LLM interface +docker start open-webui + +# Wait for healthy +docker ps | grep open-webui +``` + +--- + +### Stop All Services Gracefully + +```bash +# Stop all running containers +docker stop $(docker ps -q) + +# Verify all stopped +docker ps +# Should show empty output + +# Wait before stopping array +sleep 5 + +# Stop array (from GUI) +# Main β†’ Array Operation β†’ Stop +``` + +--- + +## πŸ“¦ Backup & Restore Procedures + +### USB Flash Backup (Unraid Configuration) + +**Create Backup:** +1. Navigate to: **Main β†’ Flash β†’ Flash Backup** +2. Click "Backup Now" +3. Download ZIP file (e.g., `unraid-flash-backup-20251031.zip`) +4. Store securely OFF-SERVER: + - OneDrive: `/z_Unraid/Backups/` + - External drive + - Cloud storage + +**Restore from Backup:** +``` +1. Format new USB drive (if needed) +2. Copy backup ZIP to new USB +3. Extract contents to root of USB + - config/ directory + - bzimage, bzroot, etc. +4. Safely eject USB +5. Boot from new USB +6. Configuration restored automatically +``` + +**Frequency:** +- Weekly minimum +- After ANY configuration change +- Before major updates + +--- + +### Container Data Backup + +**Critical Directories:** + +``` +Priority 1 (CRITICAL): +/mnt/user/appdata/vaultwarden/ 🚨 Your passwords! +/mnt/user/appdata/gitea/ 🚨 Your code repositories! + +Priority 2 (Important): +/mnt/user/appdata/NginxProxyManager/ Proxy configs +/mnt/user/appdata/Grafana/ Dashboards +/mnt/user/appdata/Influxdb/ Metrics history + +Priority 3 (Optional): +/mnt/user/appdata/open-webui/ LLM chat history +``` + +**Quick Backup Script:** + +```bash +#!/bin/bash +# Save as: /mnt/user/scripts/backup-critical.sh + +BACKUP_DIR="/mnt/user/backups/$(date +%Y%m%d_%H%M%S)" +mkdir -p "$BACKUP_DIR" + +echo "Stopping containers..." +docker stop vaultwarden Gitea NginxProxyManager + +echo "Backing up data..." 
+tar -czf "$BACKUP_DIR/vaultwarden.tar.gz" /mnt/user/appdata/vaultwarden +tar -czf "$BACKUP_DIR/gitea.tar.gz" /mnt/user/appdata/gitea +tar -czf "$BACKUP_DIR/npm.tar.gz" /mnt/user/appdata/NginxProxyManager + +echo "Restarting containers..." +docker start vaultwarden Gitea NginxProxyManager + +echo "βœ… Backup complete: $BACKUP_DIR" +ls -lh "$BACKUP_DIR" +``` + +**Make Executable:** +```bash +chmod +x /mnt/user/scripts/backup-critical.sh +``` + +**Run Manually:** +```bash +/mnt/user/scripts/backup-critical.sh +``` + +**Schedule (User Scripts Plugin):** +- Frequency: Daily at 2 AM +- Retention: Keep last 30 days + +--- + +**Restore from Backup:** + +```bash +# Example: Restore Vaultwarden +docker stop vaultwarden + +# Backup current (corrupted) data +mv /mnt/user/appdata/vaultwarden /mnt/user/appdata/vaultwarden.old + +# Extract backup +tar -xzf /mnt/user/backups/20251031_120000/vaultwarden.tar.gz -C / + +# Restart container +docker start vaultwarden + +# Verify working +curl http://192.168.68.51:4743 +``` + +--- + +## ⚑ Quick Commands Reference + +### System Status + +```bash +# System uptime and load +uptime + +# Resource usage +free -h +df -h + +# Array status +cat /proc/mdcmd + +# Docker container summary +docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}" + +# Temperature (if sensors installed) +sensors + +# Disk health quick check +smartctl -H /dev/sdb # Parity +smartctl -H /dev/sdc # Disk 1 +``` + +### Docker Quick Commands + +```bash +# Start all stopped containers +docker start $(docker ps -aq) + +# Stop all running containers +docker stop $(docker ps -q) + +# View logs (last 50 lines) +docker logs --tail 50 + +# Follow logs in real-time +docker logs -f + +# Restart container +docker restart + +# Remove container (⚠️ will lose non-volume data!) +docker rm + +# Clean up unused resources +docker system prune # Safe cleanup +docker system prune -a # ⚠️ Removes unused images too! +docker system prune --volumes # ⚠️ Removes unused volumes! 
+``` + +### Network Diagnostics + +```bash +# Check all interfaces +ip addr show + +# Test key infrastructure +ping -c 3 192.168.68.1 # Router +ping -c 3 192.168.68.51 # Unraid +ping -c 3 192.168.68.61 # Pi-hole +ping -c 3 8.8.8.8 # Internet + +# DNS resolution test +nslookup google.com +nslookup google.com 192.168.68.61 # Test Pi-hole specifically + +# Check listening ports +netstat -tulpn | grep LISTEN + +# Test specific port +nc -zv 192.168.68.51 3002 # Example: Gitea +curl -I http://192.168.68.51:3002 # HTTP test +``` + +### Quick Health Check Script + +```bash +#!/bin/bash +# Save as: /mnt/user/scripts/health-check.sh + +echo "=== Unraid Health Check ===" +echo "" + +echo "1. Array Status:" +cat /proc/mdcmd | grep mdState + +echo "" +echo "2. Running Containers:" +docker ps --format "table {{.Names}}\t{{.Status}}" + +echo "" +echo "3. Disk Usage:" +df -h | grep -E "cache|disk1|Filesystem" + +echo "" +echo "4. Network Connectivity:" +ping -c 2 192.168.68.1 >/dev/null 2>&1 && echo " Router: βœ… OK" || echo " Router: ❌ FAIL" +ping -c 2 8.8.8.8 >/dev/null 2>&1 && echo " Internet: βœ… OK" || echo " Internet: ❌ FAIL" +ping -c 2 192.168.68.61 >/dev/null 2>&1 && echo " Pi-hole: βœ… OK" || echo " Pi-hole: ❌ FAIL" + +echo "" +echo "5. Critical Services:" +curl -s http://localhost:4743 >/dev/null && echo " Vaultwarden: βœ… OK" || echo " Vaultwarden: ❌ DOWN" +curl -s http://localhost:3002 >/dev/null && echo " Gitea: βœ… OK" || echo " Gitea: ❌ DOWN" +curl -s http://localhost:7818 >/dev/null && echo " NPM: βœ… OK" || echo " NPM: ❌ DOWN" + +echo "" +echo "=== Health Check Complete ===" +``` + +**Run:** `bash /mnt/user/scripts/health-check.sh` + +--- + +## πŸ“ž Getting Help + +### Pre-flight Checks + +Before asking for help, gather this information: + +1. **System Diagnostics** + - Unraid WebGUI: Tools β†’ Diagnostics β†’ Download + - Creates ZIP with all logs + +2. **Container Logs** + ```bash + docker logs > container-logs.txt + ``` + +3. 
**Network Configuration** + ```bash + ip addr show > network-config.txt + ip route show >> network-config.txt + ``` + +4. **Disk Status** + ```bash + smartctl -a /dev/sdb > disk-smart.txt + smartctl -a /dev/sdc >> disk-smart.txt + ``` + +### Community Resources + +- **Unraid Forums:** https://forums.unraid.net/ + - Post diagnostics ZIP + - Be specific about symptoms + - Include what you've tried + +- **r/unraid:** https://reddit.com/r/unraid + - Quick questions + - Share diagnostics in pastebin + +- **Discord:** Unraid Official Discord + - Real-time help + - Active community + +### Emergency Contacts + +``` +ISP Support: [Your ISP Phone Number] +Unraid License: [Store in secure location] +USB Backup Location: [Document where stored] +Off-site Backup: [If applicable] +``` + +--- + +## πŸŽ“ Post-Recovery Checklist + +After restoring from disaster: + +``` +[ ] Unraid array started successfully +[ ] All critical services running + [ ] NginxProxyManager + [ ] Cloudflared + [ ] Vaultwarden + [ ] Gitea +[ ] Network connectivity verified + [ ] Can access Unraid WebUI + [ ] Can ping router (192.168.68.1) + [ ] Internet working + [ ] DNS resolving (Pi-hole) +[ ] Vaultwarden accessible (test password retrieval) +[ ] Gitea accessible (verify repositories intact) +[ ] NPM routing working (test reverse proxy) +[ ] Monitoring stack restarted + [ ] Grafana + [ ] InfluxDB + [ ] Telegraf +[ ] External access working + [ ] Tailscale connected + [ ] Cloudflare tunnel active +[ ] Backups verified and up-to-date +[ ] Documentation updated with lessons learned +[ ] Incident documented in change log (Gitea) +``` + +--- + +## πŸ”’ Security After Recovery + +**Immediately After Disaster Recovery:** + +1. **Change Passwords** (if compromise suspected) + ``` + [ ] Unraid root password + [ ] Vaultwarden master password + [ ] Container admin passwords + [ ] Pi-hole admin password + [ ] PiKVM password + ``` + +2. 
**Review Access Logs** + ```bash + # Check SSH attempts + grep "Failed password" /var/log/auth.log | tail -50 + + # Check NPM access + docker logs NginxProxyManager | grep -i error + + # Check Gitea access + docker logs Gitea | grep -i login + ``` + +3. **Verify Firewall Rules** + ```bash + iptables -L -n -v + ``` + +4. **Check for Unauthorized Changes** + ```bash + # Review Docker containers + docker ps -a + + # Check cron jobs + crontab -l + + # Review network interfaces + ip addr show + ``` + +--- + +## πŸ“ Documentation Updates After Incident + +**What to Document:** + +1. **What Happened:** + - Date/time of incident + - Symptoms observed + - Root cause (if determined) + - Duration of outage + +2. **What You Did:** + - Steps taken to recover + - What worked / didn't work + - Resources used (forums, docs, etc.) + - Time to recovery + +3. **Lessons Learned:** + - What could prevent this in future + - Process improvements needed + - Documentation gaps discovered + - Backup improvements needed + +4. **Action Items:** + - Backups to implement/improve + - Monitoring to add + - Scripts to create + - Hardware to replace/upgrade + +**Where to Document:** +- Create incident report: `docs/incidents/YYYY-MM-DD-incident-name.md` +- Update this quick-start guide with new procedures +- Add to troubleshooting section if recurring issue +- Commit to Gitea with detailed message + +--- + +## πŸš€ Normal Startup Sequence + +**From Cold Boot:** + +``` +1. Power on server + ↓ +2. BIOS POST (~30 seconds) + - Hardware check + - Memory test + - Drive detection + ↓ +3. Unraid boots from USB (~1-2 minutes) + - Linux kernel loads + - Unraid OS starts + ↓ +4. Network initializes + - br0 interface up + - Gets IP: 192.168.68.51 + ↓ +5. Array auto-starts (if configured) + - Parity disk: sdb + - Data disk: sdc + - Cache: nvme1n1p1 + ↓ +6. Docker service starts + - docker0 bridge created + - Networks initialized + ↓ +7. 
Containers auto-start (if enabled) + - Infrastructure services first + - Then application services + ↓ +8. Services available (~3-5 minutes total) + βœ… Ready to use! +``` + +**Expected Boot Time:** 3-5 minutes +**If Taking Longer:** Check system log for errors + +--- + +## 🎯 Quick Health Check Command + +**Run After Any Restart:** + +```bash +# Quick one-liner health check +docker ps --format "table {{.Names}}\t{{.Status}}" && \ +df -h | grep -E "cache|disk1" && \ +ping -c 2 192.168.68.1 >/dev/null && echo "Network: OK" || echo "Network: FAIL" +``` + +--- + +## πŸ“š Related Documentation + +- **Network Issues:** See `network-map.md` +- **Service Details:** See `service-inventory.md` +- **Container Configs:** See `docker-compose/` (when created) +- **Main Overview:** See `README.md` + +--- + +## πŸ†˜ True Emergency - Complete System Down + +**If everything is down and you need immediate help:** + +1. **Access via PiKVM** + - https://192.168.68.53 + - Get console access + - View what's happening + +2. **Check Physical Server** + - Power LED on? + - Fans spinning? + - Drives spinning up? + - Network activity lights? + +3. **Try Safe Mode Boot** + - Boot Unraid in Safe Mode (GUI mode) + - Diagnose from console + +4. **Community Help** + - Unraid Discord (fastest response) + - Forums with diagnostics ZIP + - r/unraid for quick questions + +5. **Document Everything** + - Take photos/screenshots via PiKVM + - Note exact error messages + - Record what you tried + - Timeline of events + +--- + +## πŸ’‘ Pro Tips + +1. **Test Your Backups** + - Restore test annually + - Verify data integrity + - Practice recovery procedures + +2. **Keep This Guide Accessible** + - Save offline copy to phone/laptop + - Print critical sections + - Bookmark in browser + +3. **Automate Where Possible** + - Schedule backup scripts + - Set up monitoring alerts + - Use User Scripts plugin + +4. 
**Document As You Go** + - Update after fixing issues + - Add new procedures discovered + - Note what worked/didn't work + +--- + +**Last Updated:** October 31, 2025 +**Next Review:** Quarterly or after incidents +**Maintained By:** Weston + +--- + +**Remember:** Most issues are recoverable. Stay calm, work methodically, document your steps, and don't hesitate to ask for help! + +**Keep this guide accessible even when the server is down!** +💡 **Pro Tip:** Save a copy to your phone/laptop/OneDrive! + +🚀 **You've got this!** diff --git a/service-inventory.md b/service-inventory.md new file mode 100644 index 0000000..3c8a393 --- /dev/null +++ b/service-inventory.md @@ -0,0 +1,614 @@ +# 📦 Service Inventory - Complete Container Catalog + +**Last Updated:** October 31, 2025 +**Total Containers:** 32 (6 running, 26 stopped) +**Purpose:** Comprehensive catalog of all services + +--- + +## 📊 Quick Stats + +| Metric | Value | Status | +|--------|-------|--------| +| **Total Containers** | 32 | - | +| **Running** | 6 | ✅ 19% | +| **Stopped** | 26 | ⚠️ 81% | +| **Total Docker Images** | ~50GB | ⚠️ High | +| **Cache Usage** | 578GB / 932GB | ⚠️ 63% | + +**Key Insight:** 81% of containers are stopped - cleanup opportunity! + +--- + +## 🟢 Running Services (6 containers) + +### 1. 
open-webui ⭐⭐⭐ + +**Status:** Running (healthy) +**Container:** open-webui +**Image:** ghcr.io/open-webui/open-webui:main (4.55GB) +**Created:** 2025-10-16 (2 weeks ago) +**Network:** bridge (172.17.0.5) +**Ports:** 8080 → 3000 (container → host; same notation throughout this inventory) + +**Resources:** +- CPU: 0.15% +- Memory: 1.026GB / 60.55GB (1.69%) +- Storage: 42.4MB + +**Purpose:** LLM chat interface (ChatGPT-like UI for local models) + +**Dependencies:** +- ollama (currently STOPPED ❌) +- OpenAI API key (configured) + +**Access:** +- Local: http://192.168.68.51:3000 +- No authentication by default + +**Issues:** +- ⚠️ Depends on ollama container which is stopped +- ⚠️ OpenAI API key exposed in environment variables + +**Recommendations:** +1. ✅ **KEEP** - Active LLM interface +2. Restart ollama container to enable local models +3. Move API keys to Docker secrets +4. Enable authentication + +**Priority:** HIGH - Core AI/ML service + +--- + +### 2. NginxProxyManager ⭐⭐⭐ + +**Status:** Running +**Container:** NginxProxyManager +**Image:** jlesage/nginx-proxy-manager (189MB) +**Created:** 2025-10-11 (3 weeks ago) +**Network:** bridge (172.17.0.4) +**Ports:** 4443→18443, 8080→1880, 8181→7818 + +**Resources:** +- CPU: 0.08% +- Memory: 77.45MB (0.12%) +- Storage: 13.4KB + +**Purpose:** Reverse proxy with web UI - SSL termination and routing + +**Dependencies:** None + +**Access:** +- Admin UI: http://192.168.68.51:7818 +- HTTP: http://192.168.68.51:1880 +- HTTPS: https://192.168.68.51:18443 + +**Configuration:** +- Routes traffic to backend services +- Manages SSL certificates +- Provides access control + +**Recommendations:** +1. ✅ **KEEP** - Critical infrastructure +2. Document all proxy rules in Gitea +3. Verify SSL auto-renewal is configured +4. Enable MFA if available +5. Review access logs regularly + +**Priority:** CRITICAL - Core infrastructure + +--- + +### 3. 
Gitea ⭐⭐⭐ + +**Status:** Running +**Container:** Gitea +**Image:** gitea/gitea (180MB) +**Created:** 2025-10-08 (3 weeks ago) +**Network:** bridge (172.17.0.3) +**Ports:** 22→22, 3000→3002 + +**Resources:** +- CPU: 0.11% +- Memory: 114.5MB (0.18%) +- Storage: 113MB (active repositories!) + +**Purpose:** Self-hosted Git server (GitHub alternative) + +**Dependencies:** None (internal SQLite) + +**Access:** +- Web: http://192.168.68.51:3002 +- Domain: https://gitea.segelschiff.app +- SSH: ssh://192.168.68.51:22 ⚠️ (conflicts with Unraid SSH) + +**Configuration:** +- Using latest tag (unpinned version) +- Storage: /mnt/user/appdata/gitea + +**Issues:** +- ⚠️ SSH port 22 conflicts with Unraid SSH +- ⚠️ Using `latest` tag (version not pinned) +- ⚠️ Backup strategy unknown + +**Recommendations:** +1. ✅ **KEEP** - Critical for version control +2. Change SSH port to 2222 to avoid conflict +3. Pin to specific version tag +4. Implement automated backups (CRITICAL!) +5. This is your version control hub - protect it! + +**Priority:** CRITICAL - Infrastructure documentation depends on this + +--- + +### 4. ApacheGuacamole ⭐⭐ + +**Status:** Running (2+ months uptime!) +**Container:** ApacheGuacamole +**Image:** jasonbean/guacamole (737MB) +**Created:** 2025-08-22 (2+ months ago) +**Network:** bridge (172.17.0.2) +**Ports:** 8080→4000 + +**Resources:** +- CPU: 0.16% +- Memory: 785.8MB (1.27%) +- Storage: 46.2MB + +**Purpose:** Clientless remote desktop gateway (RDP/VNC/SSH via browser) + +**Dependencies:** +- MariaDB (STOPPED ❌) - **BROKEN DEPENDENCY!** + +**Access:** +- Web: http://192.168.68.51:4000 + +**Configuration:** +- MySQL enabled but MariaDB stopped +- Multiple auth modules: MySQL, LDAP, TOTP, etc. + +**Issues:** +- 🚨 **CRITICAL:** Depends on MariaDB which is stopped! +- Currently using embedded database (not recommended) +- Data loss risk without proper database backend + +**Recommendations:** +1. ⚠️ **FIX IMMEDIATELY** - Restart MariaDB or reconfigure +2. 
If keeping: Start MariaDB and verify connection +3. If not using: Stop Guacamole and remove both +4. Document your use case for remote desktop access + +**Priority:** MEDIUM - Fix dependency or remove + +--- + +### 5. Cloudflared ⭐⭐⭐ + +**Status:** Running (2.5+ months - very stable!) +**Container:** Unraid-Cloudflared-Tunnel +**Image:** figro/unraid-cloudflared-tunnel (8.92MB) +**Created:** 2025-08-10 (2.5+ months ago) +**Network:** bridge (172.17.0.6) +**Ports:** 46495→46495 (metrics) + +**Resources:** +- CPU: 0.33% (highest of running containers) +- Memory: 68.6MB (0.11%) +- Network I/O: 41.7MB RX / 310KB TX + +**Purpose:** Cloudflare Tunnel - secure external access without port forwarding + +**Dependencies:** None + +**Access:** +- Metrics: http://192.168.68.51:46495 +- Domain: *.segelschiff.app (managed via Cloudflare) + +**Configuration:** +- Tunnel token configured +- No auto-update enabled +- Metrics exposed for monitoring + +**Security:** +- ⚠️ Tunnel token in plain text environment variable +- ✅ No open ports on router (excellent!) + +**Recommendations:** +1. ✅ **KEEP** - Excellent security practice +2. Rotate tunnel token periodically +3. Document which services are exposed +4. Integrate metrics with monitoring stack + +**Priority:** HIGH - Critical for secure remote access + +--- + +### 6. Vaultwarden ⭐⭐⭐ + +**Status:** Running (healthy) - 3+ months uptime! +**Container:** vaultwarden +**Image:** vaultwarden/server (256MB) +**Created:** 2025-07-31 (3+ months ago) +**Network:** bridge (172.17.0.7) +**Ports:** 80→4743 + +**Resources:** +- CPU: 0.00% (idle) +- Memory: 24.96MB (0.04%) - Very lightweight! 
+ +**Purpose:** Self-hosted password manager (Bitwarden compatible) + +**Dependencies:** None + +**Access:** +- Web: http://192.168.68.51:4743 +- Admin: http://192.168.68.51:4743/admin + +**Configuration:** +- Signups allowed: true ⚠️ +- Invitations allowed: false ✅ +- WebSocket disabled ⚠️ +- Admin token exposed ⚠️ + +**Issues:** +- 🚨 **CRITICAL:** No backup strategy evident! +- ⚠️ Admin token in plain text +- ⚠️ Signups open (verify intentional) +- ⚠️ WebSocket disabled (reduces functionality) + +**Recommendations:** +1. ✅ **KEEP** - Critical security infrastructure +2. 🚨 **IMPLEMENT BACKUP IMMEDIATELY** - This is your password vault! +3. Close signups after initial setup +4. Rotate admin token and use secrets management +5. Enable WebSocket for better sync +6. Automate daily backups to off-site location + +**Priority:** CRITICAL - Contains all your passwords! + +--- + +## 🔴 Recently Stopped Services (Worth Investigating) + +### 7. ollama ⚠️ + +**Status:** Exited (128) 4 minutes ago +**Image:** ollama/ollama (3.33GB) +**Purpose:** Local LLM inference engine + +**Why It Matters:** open-webui depends on this! + +**Recommendations:** +1. 🔧 **RESTART** - Required for open-webui local models +2. Investigate exit code 128 (configuration issue?) +3. Configure GPU acceleration (RTX 4090!) +4. Test with open-webui after restart + +**Action:** `docker start ollama && docker logs -f ollama` + +--- + +### 8. Monitoring Stack (Stopped 12 days ago) 🚨 + +**Containers:** +- Grafana (stopped 12 days) +- InfluxDB (stopped 12 days) +- Telegraf (stopped 12 days) + +**Total Size:** ~1.7GB + +**Why Critical:** Zero observability into system health! + +**Recommendations:** +1. 🚨 **RESTART IMMEDIATELY** - Priority 1! +2. Configure dashboards for: + - Docker container stats + - System resources (CPU, RAM, disk) + - Network traffic + - Temperature sensors +3. Set up alerting for critical issues +4. 
Document in runbook + +**Action:** +```bash +docker start Influxdb +sleep 15 # Wait for DB initialization +docker start Telegraf +docker start Grafana +``` + +--- + +### 9. MariaDB (Stopped 12 days ago) ⚠️ + +**Status:** Exited (0) 12 days ago +**Image:** lscr.io/linuxserver/mariadb (348MB) +**Purpose:** MySQL database for Guacamole + +**Issue:** Guacamole is running but database is stopped! + +**Recommendations:** +1. If using Guacamole: **RESTART** +2. If not using Guacamole: **REMOVE BOTH** +3. Document decision + +--- + +### 10. Database Admin Tools (Stopped 12 days ago) + +**CloudBeaver** - Stopped 12 days +**adminer** - Stopped 12 days + +**Issue:** Two database admin tools - redundant! + +**Recommendations:** +1. **CHOOSE ONE:** + - CloudBeaver: Feature-rich (725MB) + - adminer: Lightweight (118MB) +2. Remove the other +3. Only restart if you need database management + +--- + +## 🟡 Experimental / Inactive Services (Decision Needed) + +### 11. Nextcloud AIO Stack (7 containers!) 🚨 + +**Status:** All stopped 3 weeks ago +**Total Size:** ~7GB Docker images + data +**Containers:** +- nextcloud-aio-mastercontainer +- nextcloud-aio-apache +- nextcloud-aio-nextcloud (2.19GB) +- nextcloud-aio-database (PostgreSQL) +- nextcloud-aio-redis +- nextcloud-aio-onlyoffice (3.79GB!) +- nextcloud-aio-imaginary +- nextcloud-aio-notify-push + +**Data:** /mnt/user/nextcloud (~1GB+) + +**Analysis:** +- Massive resource footprint +- "All-in-One" = heavy coupling +- Stopped for 3 weeks suggests not critical + +**Recommendations:** +**DECISION REQUIRED:** + +**Option A: Remove Everything** +```bash +# Backup data first! 
+cp -r /mnt/user/nextcloud /mnt/user/backup/nextcloud-$(date +%Y%m%d) + +# Remove containers (docker rm does not expand wildcards, so select them by name filter) +docker rm $(docker ps -aq --filter "name=nextcloud-aio-") + +# Remove images to free space +docker rmi $(docker images | grep nextcloud | awk '{print $3}') + +# Archive data +tar -czf nextcloud-data-backup.tar.gz /mnt/user/nextcloud +``` +**Saves:** ~7GB+ space + +**Option B: Keep and Restart** +- Document why you need it +- Create restart procedure +- Implement backup strategy +- Monitor resource usage + +**My Recommendation:** Remove unless actively needed. Nextcloud is great but this All-in-One stack is heavy. + +--- + +### 12. Jellyfin (Stopped 2 weeks ago) ⚠️ + +**Status:** Exited (0) 2 weeks ago +**Image:** jellyfin/jellyfin (1.25GB) +**GPU:** RTX 4090 allocated but idle! + +**Media:** +- Movies: /mnt/user/movies +- TV: /mnt/user/tv shows +- Music: /mnt/user/music + +**Issue:** $1600 GPU sitting idle! + +**Recommendations:** +**If you want a media server:** +1. **RESTART** with hardware transcoding: + ```bash + docker start Jellyfin + ``` +2. Configure NVENC/NVDEC for RTX 4090 +3. Test 4K transcoding performance +4. Switch from `host` network to bridge (security) + +**If you don't need a media server:** +1. Remove GPU allocation from container +2. Free GPU for other projects (AI/ML) + +**Action Required:** Decide on media server strategy + +--- + +### 13. Large AI/ML Containers (Rarely Used) + +**ebook2audiobook** - 20.06GB! (stopped 3 weeks) +**docling-serve** - 14.45GB! (stopped 2 weeks) + +**Total:** 34.5GB for two containers! + +**Analysis:** +- Massive images +- Rarely used (stopped weeks ago) +- Experimental/one-time use? + +**Recommendations:** +1. **REMOVE** both to free 34.5GB +2. If needed again, pull fresh images +3. Document use cases if keeping + +**Potential Savings:** 34.5GB cache space! + +--- + +### 14. 
Productivity Suite (Multiple Stopped) + +**baserow** - Stopped 2 weeks (2.25GB) +**NocoDB** - Stopped 3 weeks (588MB) +**OpenProject** - Stopped 7 weeks (2.87GB) + +**Issue:** Three project management tools - redundant! + +**Recommendations:** +1. **CHOOSE ONE** (or none if not used) +2. Remove the others +3. Migrate data if needed first + +**Potential Savings:** ~5GB + +--- + +### 15. Development Tools + +**n8n** (workflow automation) - Created but never started +**steam-headless** - Created but not running + +**Recommendations:** +- Document if you have plans for these +- Remove if experimental and abandoned + +--- + +## 📋 Container Decision Matrix + +| Container | Keep? | Action | Priority | +|-----------|-------|--------|----------| +| **open-webui** | ✅ Yes | Keep running, restart ollama | HIGH | +| **NginxProxyManager** | ✅ Yes | Keep, document configs | CRITICAL | +| **Gitea** | ✅ Yes | Keep, fix SSH port, backup | CRITICAL | +| **ApacheGuacamole** | ⚠️ Decide | Fix MariaDB OR remove both | MEDIUM | +| **Cloudflared** | ✅ Yes | Keep, rotate token | HIGH | +| **Vaultwarden** | ✅ Yes | Keep, BACKUP NOW! | CRITICAL | +| **ollama** | ✅ Yes | Restart immediately | HIGH | +| **Monitoring Stack** | ✅ Yes | Restart all 3 containers | CRITICAL | +| **MariaDB** | ⚠️ Conditional | If Guacamole stays | MEDIUM | +| **Nextcloud AIO (7)** | ❌ Remove | Backup data, remove stack | LOW | +| **Jellyfin** | ⚠️ Decide | Use GPU or remove | MEDIUM | +| **ebook2audiobook** | ❌ Remove | Free 20GB | LOW | +| **docling-serve** | ❌ Remove | Free 14.5GB | LOW | +| **baserow/NocoDB/OpenProject** | ❌ Choose 1 | Remove others | LOW | +| **CloudBeaver/adminer** | ⚠️ Choose 1 | Keep one DB admin | LOW | + +--- + +## 🎯 Recommended Action Plan + +### Phase 1: Critical (Do First!) 🚨 + +1. **Backup Vaultwarden** (30 min) + ```bash + docker stop vaultwarden + tar -czf vaultwarden-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/vaultwarden + docker start vaultwarden + ``` + +2. 
**Backup Gitea** (30 min) + ```bash + docker stop Gitea + tar -czf gitea-backup-$(date +%Y%m%d).tar.gz /mnt/user/appdata/gitea + docker start Gitea + ``` + +3. **Restart Monitoring Stack** (15 min) + ```bash + docker start Influxdb && sleep 15 + docker start Telegraf Grafana + # Configure dashboards + ``` + +4. **Restart ollama** (5 min) + ```bash + docker start ollama + docker logs -f ollama + ``` + +### Phase 2: Cleanup (Free Space!) 💾 + +5. **Remove Large Unused Containers** (1 hour) + - ebook2audiobook (20GB) + - docling-serve (14.5GB) + - Nextcloud AIO stack (7GB) + - **Saves: ~41GB!** + +6. **Docker System Cleanup** + ```bash + # WARNING: this also removes ALL stopped containers and the images they use; + # start anything you intend to keep before running it + docker system prune -a + ``` + +### Phase 3: Decisions (This Week) + +7. **Guacamole + MariaDB** - Keep or remove? +8. **Jellyfin** - Restart with GPU or remove? +9. **Productivity tools** - Choose one, remove others +10. **Database admin** - CloudBeaver or adminer? + +--- + +## 📊 Storage Cleanup Impact + +**Current Cache Usage:** 578GB / 932GB (63%) + +**After Recommended Cleanup:** +- Remove ebook2audiobook: -20GB +- Remove docling-serve: -14.5GB +- Remove Nextcloud AIO: -7GB +- Docker system prune: -10 to -20GB (estimate) +- **Total Freed: ~50-60GB** + +**New Cache Usage:** ~520GB / 932GB (56%) ✅ + +--- + +## 🔐 Security Recommendations + +1. **Secrets Management** - Stop using plain text env vars +2. **Close Open Signups** - Vaultwarden signups should be closed +3. **SSH Port Conflict** - Fix Gitea port 22 conflict +4. **Network Mode** - Move Jellyfin from `host` to `bridge` +5. **Version Pinning** - Stop using `latest` tags + +--- + +## 📈 Resource Summary + +**Docker Images Total:** ~50GB +**Container Data:** Varies by appdata +**Cache Impact:** High (63% full) + +**Top Resource Consumers (Images):** +1. ebook2audiobook: 20.06GB +2. docling-serve: 14.45GB +3. Nextcloud stack: ~7GB +4. open-webui: 4.55GB +5. ollama: 3.33GB + +--- + +## 🎓 Key Takeaways + +1. 
**6 services are your core** - Keep these running +2. **26 stopped containers** - Cleanup opportunity +3. **~41GB can be freed** - 50-60GB once `docker system prune` is included +4. **No monitoring** - Critical gap (restart Grafana stack!) +5. **Backups are critical** - Vaultwarden and Gitea MUST be backed up + +--- + +**Last Updated:** October 31, 2025 +**Next Review:** After cleanup actions completed +**Maintained By:** Weston
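
The Phase 1 backups for Vaultwarden and Gitea follow the same stop, archive, start pattern, so they could be wrapped in one helper. The following is a minimal sketch rather than a tested procedure: the function name `backup_container` is hypothetical, and `/mnt/user/backup` as the destination is an assumption borrowed from the Nextcloud example. Container names and appdata paths are the ones listed in Phase 1.

```bash
# Sketch only: stop a container, archive its appdata, then start it again.
# Assumptions: the function name and the destination directory are
# hypothetical; container names and appdata paths come from Phase 1.
backup_container() {
    name="$1"   # container name, e.g. vaultwarden
    src="$2"    # appdata directory to archive
    dest="$3"   # directory that receives the archive

    if [ ! -d "$src" ]; then
        echo "missing source directory: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    archive="$dest/$name-backup-$(date +%Y%m%d).tar.gz"

    # Stop first so database files are not modified mid-archive.
    docker stop "$name" >/dev/null || return 1

    # Archive relative to the parent directory so the tarball contains
    # only the appdata folder, not the full /mnt/user/... path.
    if tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"; then
        status=0
    else
        status=1
    fi

    # Start the container again even when tar failed.
    docker start "$name" >/dev/null || status=1

    if [ "$status" -eq 0 ]; then
        echo "$archive"
    fi
    return "$status"
}

# Example calls (left commented out; /mnt/user/backup is an assumed target):
# backup_container vaultwarden /mnt/user/appdata/vaultwarden /mnt/user/backup
# backup_container Gitea /mnt/user/appdata/gitea /mnt/user/backup
```

Restarting the container regardless of the `tar` result keeps a failed backup from turning into an outage. Scheduling the calls (for example through the User Scripts plugin mentioned in the quick-start guide) and copying the archives off-site are left out of this sketch.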